Recent changes in technology have altered how the cloud can protect your business infrastructure. The goal of cloud-based backup, disaster recovery (DR), and business continuity (BC) services has shifted from providing a disaster recovery solution to providing a business continuity solution. This shift has become necessary because businesses increasingly rely on their systems being available 24x7. Before reading further, reflect on the real cost of your organization operating without email, without its data files, or without its transaction system. Would the business survive? This is the risk that DR and BC technology strategically mitigates. Before reviewing backup, DR, and BC solutions, it is important to define the “cloud”.
The term cloud has broad interpretations. For the purpose of this article, it is defined as a set of remote systems housed in a stable environment that can be accessed over a reasonably fast network such as the Internet. This leaves three basic configurations for the cloud: 1) your own equipment and software in your own remote data center – not just in a closet at your remote sales office; 2) your own equipment and/or software in a vendor’s data center; 3) a vendor’s equipment and software in a vendor’s data center – not necessarily the same vendor, but the invoices come from one entity.
To rate each configuration we need a rating system organized into functional areas. We will use Excellent, Good, Bad, and Ugly for our ratings, and we estimate cost on a scale of 1 to 5, with 5 being the most expensive. The functional areas are RPO, RTO, Effort, and Monitoring. RPO (recovery point objective) refers to the point at which the last backup was taken, which is the point to which you will be able to restore – in short, how much data will be lost. RTO (recovery time objective) refers to the time it takes to restore your backups and get your systems serving users again – in short, how long until you can send email and process an invoice. Effort rates how much sweat it takes to keep the backup system functioning – in short, whether it is viable with your current staff. Monitoring rates how easy it is to confirm that the backups or replicas are working – in short, you need to know whether or not you are protected. Keep two things in mind: good RPO is required for DR, and good RPO and good RTO are both required for BC.
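To make the two objectives concrete, here is a minimal sketch under stated assumptions: backups run at a fixed interval, a backup only counts once it is safely off-site, and recovery steps run one after another. The function names, step names, and all durations are hypothetical and chosen only for illustration.

```python
from datetime import timedelta
from typing import Mapping


def worst_case_data_loss(backup_interval: timedelta,
                         time_to_safety: timedelta = timedelta(0)) -> timedelta:
    """Worst case for RPO: disaster strikes just before the next backup is
    safely off-site, so everything since the previous safe backup is lost."""
    return backup_interval + time_to_safety


def estimated_rto(steps: Mapping[str, timedelta]) -> timedelta:
    """RTO estimate, assuming the recovery steps run sequentially."""
    return sum(steps.values(), timedelta(0))


# Hypothetical figures: nightly tape backup carried off-site the next morning.
loss = worst_case_data_loss(timedelta(hours=24), time_to_safety=timedelta(hours=10))
rto = estimated_rto({
    "procure replacement hardware": timedelta(days=7),
    "rebuild operating systems": timedelta(days=2),
    "restore data from tape": timedelta(days=1),
})
print(loss)  # 1 day, 10:00:00
print(rto)   # 10 days, 0:00:00
```

Shortening the backup interval improves the first number; only changing what has to happen after the disaster improves the second.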
What are the backup, redundancy, and business continuity options, and what are the benefits of each? The configurations below are presented in increasing order of capability and cost. They build upon one another, each level taking the best of the previous one and adding features that result in better DR and BC. Note that this article concentrates on redundancy in the cloud, not on archival backup. Many of the configurations described include archival backup, but some do not.
Local Backup to Removable Media
This is the “old standby” that the SMB market has used since the mainframe, mini, or Intel-based server first arrived within the corporate walls. It works by getting a copy of the data out of the building to protect against disaster, so the RPO is the time of the last backup that was taken off-site. The main issue is RTO: in the event of a disaster, it could take a couple of weeks to find and assemble a platform capable of accepting the restores and running the applications. In addition, swapping physical tapes and disks requires daily scheduling, perseverance, and off-site storage.
Remote Backup of Data
Remote cloud backup became viable a few years ago with the advent of snapshot-capable operating systems and applications and inexpensive Internet bandwidth. Remote backup works by installing an agent on each system and pointing that agent at a cloud backup service provider (the agent is often supplied by the provider). RPO is excellent because these services track changes and send differential updates as needed to keep your backups current. The RTO issue remains, however. If you have a building disaster or even a server failure, how do you get your data back to your facility to be restored? Often there is too much data to download in a reasonable timeframe, so you must call the provider and ask for the data on a USB disk.
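The change tracking that keeps RPO tight can be illustrated with a minimal sketch: split the data into fixed-size blocks, hash each block, and send only the blocks whose hashes differ from those recorded at the previous backup. This is an assumption-laden illustration, not how any particular provider’s agent works; real agents typically rely on OS snapshots and change journals, and the block size and function names here are hypothetical.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 4096  # hypothetical block size in bytes


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """SHA-256 digest of each fixed-size block, recorded after every backup."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(previous: list[str], data: bytes,
                   block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Agent side: return {block_index: bytes} for blocks that differ from the last backup."""
    changes = {}
    for index, digest in enumerate(block_hashes(data, block_size)):
        if index >= len(previous) or previous[index] != digest:
            start = index * block_size
            changes[index] = data[start:start + block_size]
    return changes


def apply_changes(replica: bytes, changes: dict[int, bytes], new_length: int,
                  block_size: int = BLOCK_SIZE) -> bytes:
    """Provider side: resize the stored copy, then patch in only the changed blocks."""
    buf = bytearray(replica[:new_length].ljust(new_length, b"\0"))
    for index, block in changes.items():
        start = index * block_size
        buf[start:start + len(block)] = block
    return bytes(buf)


old = b"a" * 8192
new = b"a" * 4096 + b"b" * 4096
delta = changed_blocks(block_hashes(old), new)
print(sorted(delta))  # [1]: only the second block crosses the network
```

Because only changed blocks travel, frequent small updates are cheap. The same mechanism does nothing for RTO: restoring still means pulling the whole data set back.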
This delay is not the only issue, since you also need new equipment on which to restore. Once again, the RTO can be weeks. Another downside is monitoring: differential synchronization is difficult to monitor because it happens so frequently. As a result, some remote backup services require proactive effort to ensure that they are actually working.
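One way to make frequent differential jobs easier to watch is to alert on the age of each system’s last successful backup rather than on individual jobs. A minimal sketch, assuming you can obtain a last-success timestamp per system from the provider’s agent or portal; the system names, timestamps, and four-hour threshold are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional


def stale_systems(last_success: Mapping[str, Optional[datetime]],
                  max_age: timedelta, now: datetime) -> list[str]:
    """Return systems whose last successful backup is missing or older than max_age."""
    return [
        system
        for system, finished in sorted(last_success.items())
        if finished is None or now - finished > max_age
    ]


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
status = {
    "mail01": datetime(2024, 1, 2, 11, 45, tzinfo=timezone.utc),
    "erp01": datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc),
    "files01": None,  # never reported a successful backup
}
print(stale_systems(status, timedelta(hours=4), now))  # ['erp01', 'files01']
```

A check like this answers the question that matters (are we protected right now?) no matter how many jobs ran in between.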
Remote Replication of System Images
Beyond data backup, operating system technology now offers the ability to replicate an entire system image to a remote storage provider. Full system images are usually replicated once per day rather than continually, so the RPO is good and the replication jobs are easier to monitor. Unfortunately, the RTO issue remains. If you have a local failure, you will have all of the system images, but you will still need new hardware on which to restore them. The hardware may not need to be identical, but you will need to procure it quickly.
Remote Replication of System Images to a Provider with “Run” Capability
Replicating your physical or virtual system images to a provider with available infrastructure gives your organization a business continuity platform. Replication is likely daily, which provides a reasonable RPO and easy monitoring. The RTO is substantially shorter because the provider is able to run your replicated images: in the event of a disaster, the provider uses its own infrastructure to temporarily run your systems in its data center. You revert to the last replica and can be running within a reasonable time. The RTO is not instant because the provider typically stores your images compressed and encrypted, so they must be decrypted and decompressed first. During this restore time, the provider builds the appropriate remote network to make your systems available. The end result is that you can be back up and running within a business day.
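The restore described above has two paths that run side by side: preparing the image and building the network. A rough model, assuming (as is usual) that images are compressed before they are encrypted, so the restore must decrypt and then decompress in sequence, while the network is built concurrently. The function name and all durations are hypothetical.

```python
from datetime import timedelta


def run_capable_rto(decrypt: timedelta, decompress: timedelta,
                    network_build: timedelta, boot: timedelta) -> timedelta:
    """Rough RTO for restoring a compressed, encrypted image at a provider."""
    # Assumes compression happened before encryption, so the restore must
    # decrypt first and then decompress: these two steps run in sequence.
    image_ready = decrypt + decompress
    # The network is built while the image is prepared, so only the longer
    # of the two paths delays the boot.
    return max(image_ready, network_build) + boot


# Hypothetical durations for one system.
print(run_capable_rto(timedelta(hours=2), timedelta(hours=3),
                      timedelta(hours=4), timedelta(hours=1)))  # 6:00:00
```

In this example, speeding up the network build would not help at all; only a faster image preparation path shortens the RTO, which is exactly what the next configuration addresses by keeping a replica ready to turn on.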
Remote Replication of Virtual Images to a Remote Virtual Infrastructure
To reduce the RTO of replicated system images further, the remote storage must be able to apply differential data to a replica that is ready to turn on. This is easily done with virtual servers and with some physical servers. In the event of a disaster, the RPO is the last replication and the RTO can be less than an hour. Thus, if the decision is made to fail over, up to a day of data may be lost (assuming daily replication), but systems will be running and available within an hour. Note that a Remote Desktop Server may be needed to run desktop environments remotely, depending on the architecture of certain transaction systems. The remote infrastructure can be provided by a service provider or be company-owned equipment placed in a remote data center.
Remote Replication of Primary Systems to Running Secondary Systems
Active/passive running systems are the next step in reducing both RPO and RTO. These configurations allow the primary system to replicate to a standby system while the primary is in use. Replication typically occurs whenever a data change threshold is reached, such as 50 MB.
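Threshold-triggered replication can be sketched as a buffer on the primary that ships its accumulated changes to the standby once they reach a size limit. This is a minimal illustration, assuming changes arrive as byte strings and that a hypothetical `ship` callback delivers a batch to the standby; the class name is also hypothetical, and real products may use their own batching rules.

```python
from __future__ import annotations

from typing import Callable

# Roughly 50 MB, matching the example threshold in the text.
THRESHOLD_BYTES = 50 * 1024 * 1024


class ThresholdReplicator:
    """Buffers changes on the primary and ships them once enough have accumulated."""

    def __init__(self, ship: Callable[[list[bytes]], None],
                 threshold: int = THRESHOLD_BYTES):
        self._ship = ship  # hypothetical callback that sends a batch to the standby
        self._threshold = threshold
        self._pending: list[bytes] = []
        self._pending_bytes = 0

    @property
    def pending_bytes(self) -> int:
        """Bytes not yet on the standby: what a failure right now would lose."""
        return self._pending_bytes

    def record_change(self, change: bytes) -> None:
        self._pending.append(change)
        self._pending_bytes += len(change)
        if self._pending_bytes >= self._threshold:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self._ship(self._pending)
            self._pending = []
            self._pending_bytes = 0


shipped: list[list[bytes]] = []
replicator = ThresholdReplicator(shipped.append, threshold=10)  # tiny threshold for the demo
replicator.record_change(b"abcd")
replicator.record_change(b"efghij")  # reaches 10 bytes, so the batch ships
replicator.record_change(b"k")
print(len(shipped), replicator.pending_bytes)  # 1 1
```

The pending buffer is the data at risk, so on a busy system the threshold is reached often and the RPO stays in the minutes. On a quiet system the buffer could sit unshipped, which is why a periodic `flush()` on a timer is a sensible addition.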
Your primary and secondary systems are almost in sync, so the resulting RPO can be minutes. The RTO is also reduced by pre-configuring the secondary system to take over seamlessly, using dynamic DNS in both Microsoft Active Directory and public DNS to redirect users to it. The end result is very little data loss and a very short recovery time. Monitoring is typically done by exception, alerting you when something has stopped working. The downside is complexity and cost: when you have many systems, the effort required to maintain this type of replication is extensive.
Block Level Storage Replication
For larger installations, block-level replication is often chosen for its ease of deployment. Many companies move to a shared storage architecture as their server count grows, so all of the servers’ data (OS and data drives) resides on shared storage devices, in this case SANs (storage area networks). Since all of the data for all of the production servers is on the same device or devices, that data can be replicated to a similar device in a remote data center. This replication can happen periodically or continually. The remote data center must have server hardware ready to process the replicated data. In the event of a failure, the replicated volumes are made available to the standby hardware and the servers are booted at the remote data center. The great advantage of block-level replication is that it is highly scalable: adding more replicated volumes takes little effort.
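The scalability comes from the storage tracking which blocks have been written since the last cycle, so protecting another volume only means enrolling it in the replication group. A minimal in-memory sketch of that idea; the `Volume` and `ReplicationGroup` classes are hypothetical, and real SANs implement this in array firmware behind their own management interfaces.

```python
from __future__ import annotations


class Volume:
    """Hypothetical in-memory volume that records which blocks were written."""

    def __init__(self, name: str, num_blocks: int, block_size: int = 512):
        self.name = name
        self.block_size = block_size
        self.blocks: list[bytes] = [bytes(block_size)] * num_blocks
        self.dirty: set[int] = set()  # blocks written since the last replication cycle

    def write(self, index: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("writes must be exactly one block")
        self.blocks[index] = data
        self.dirty.add(index)


class ReplicationGroup:
    """Pairs each primary volume with a replica at the remote site."""

    def __init__(self) -> None:
        self.pairs: list[tuple[Volume, Volume]] = []

    def add(self, primary: Volume) -> Volume:
        # Initial full copy; after that only dirty blocks are shipped.
        replica = Volume(primary.name + "-replica", len(primary.blocks), primary.block_size)
        replica.blocks = list(primary.blocks)
        primary.dirty.clear()
        self.pairs.append((primary, replica))
        return replica

    def replicate(self) -> int:
        """Ship changed blocks for every volume in the group; return the count shipped."""
        shipped = 0
        for primary, replica in self.pairs:
            for index in sorted(primary.dirty):
                replica.blocks[index] = primary.blocks[index]
                shipped += 1
            primary.dirty.clear()
        return shipped


group = ReplicationGroup()
os_vol, data_vol = Volume("os", 8), Volume("data", 8)
os_replica = group.add(os_vol)
data_replica = group.add(data_vol)  # enrolling another volume is one call
os_vol.write(3, b"x" * 512)
data_vol.write(0, b"y" * 512)
data_vol.write(0, b"z" * 512)  # rewriting the same block ships it only once
print(group.replicate())  # 2
```

Nothing in the replication loop knows which servers own the volumes, which is why the approach scales with little per-server effort.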
Thankfully, we continue to have an increasing number of backup, DR, and BC options to fit varying budgets and levels of need. At the risk of being repetitive, the key parameters for choosing the appropriate backup, DR, or BC solution remain RPO and RTO: in short, how much data can you lose, and how long can your organization operate without its technology systems? These parameters vary by organization. A financial services company cannot be without its transaction system while the financial markets are open. However, a manufacturer that has a facilities issue may be able to function without its transaction system for a couple of days, since no manufacturing is taking place. The one system that seems to be universally required is email. It has evolved from a convenience into an essential business communication system, and for this reason email belongs in the cloud.