This post covers the basics of data replication in automated disaster recovery. Automated disaster recovery means that when a site providing a service goes down, the service will continue running elsewhere without human intervention. If you're willing to trust your automation, this is an ideal situation - one site goes down (for whatever reason) and the other one takes over - automatically - and pretty quickly. In most cases, your users get continued access to the service. The Linux-HA software is well-prepared to do this for you correctly - even when your cluster is split across sites. It has specific technology in it to deal with this situation - which I'll explain in a later post.
In disaster recovery terminology, this kind of solution provides both an excellent recovery point objective (RPO) and an excellent recovery time objective (RTO).
There are a few problems with this though -- the second site has to have enough servers to run the service, the software to run the service, and - the most difficult part - an up-to-date copy of the data the service uses. This post will concentrate on that last aspect of automated disaster recovery.
There are two basic approaches to replicating data across sites: application-specific replication and disk-volume-level replication.
Application-specific Replication
A number of applications (DB2 UDB, Oracle, DHCP, DNS, etc.) have built-in (or add-on) replication mechanisms which will do a great job of keeping two copies of your data in sync - and these methods are typically the best when they're available. However, if your application doesn't have such a method, then you'll have to use disk-volume-level replication.
Disk-volume-level replication
If you're running Linux, a good general way of replicating dynamic data across sites is DRBD. DRBD is a great open source tool which keeps two copies of data in sync, and worries a lot about data integrity -- which copy of the data is up to date, and related things. It works most efficiently when you make one side master and update from only the master side. In a split-site design, this is typically what's needed - both for efficiency and for keeping both your sanity and your data's sanity.
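As a rough sketch of what this looks like in practice, a minimal DRBD resource definition might resemble the following. The hostnames, device names, and addresses here are placeholders, and the exact syntax varies somewhat between DRBD versions - consult the DRBD documentation for your release:

```
resource r0 {
  protocol C;                # synchronous replication (see the discussion below)
  on alice {                 # placeholder hostname for the first site
    device    /dev/drbd0;
    disk      /dev/sdb1;     # placeholder backing device
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on bob {                   # placeholder hostname for the second site
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
```

Protocol C tells DRBD not to report a write as complete until it has reached the disk on both nodes - the synchronous behavior recommended later in this post.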
DRBD is only one method of doing this. Most storage vendors (IBM for example) sell products that can replicate storage volumes across distances. There are also a number of other commercial replication packages, and a few other open source disk replication packages.
If you are going to do automated site failover, then it's highly recommended that you replicate your data synchronously rather than asynchronously.
Synchronous replication means that before a disk write (or database transaction) is reported as complete, the replication software makes sure the other side has a good copy of the data on disk. That way, no writes (or transactions) are ever reported as complete and then somehow lost after a failover.
With asynchronous replication, the replication software ensures only that the write (or transaction) is queued to send to the other machine before the write or transaction is reported as complete. This means that in case of a crash, the last few writes or transactions may be lost - which may compromise your recovery point objective (RPO). If you fail over to the backup site, those last few transactions may be lost permanently. This is obviously a disadvantage. If you failed over automatically due to a communications link being down for a few minutes, those transactions were lost for what may turn out to be very little reason. This is the kind of thing that has been known to encourage people to update their resumes unexpectedly.
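The difference between the two modes can be sketched with a toy model. The class and method names here are purely illustrative - real replication happens at the block or transaction level, not in application code like this:

```python
class Replica:
    """A stand-in for the remote site's storage."""
    def __init__(self):
        self.disk = []


class SyncPrimary:
    """Synchronous: the write reaches the replica before it's acknowledged."""
    def __init__(self, replica):
        self.disk = []
        self.replica = replica

    def write(self, record):
        self.disk.append(record)
        self.replica.disk.append(record)  # wait for the replica's copy...
        return "complete"                 # ...before reporting completion


class AsyncPrimary:
    """Asynchronous: the write is queued locally, then acknowledged."""
    def __init__(self, replica):
        self.disk = []
        self.replica = replica
        self.queue = []

    def write(self, record):
        self.disk.append(record)
        self.queue.append(record)  # queued, but not yet on the replica
        return "complete"          # acknowledged immediately

    def flush(self):
        while self.queue:
            self.replica.disk.append(self.queue.pop(0))
```

If the primary crashes (or the link drops) before the async queue is flushed, writes that were already reported "complete" simply never reach the replica - exactly the lost-transaction scenario described above. The synchronous primary can never get into that state.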
So, with asynchronous replication having rather serious limitations, why do people use it? It's simple really - the speed of light. If your two sites are 1000 km apart, the round trip time to send a packet, write the data, and confirm it is limited by the speed of light in fibre. Normally people think of the speed of light as 300,000 km/sec. However, in fibre, the speed of light is more like 195,000 km/sec. If you want your transactions to complete with no more than 1 ms of delay due to network delays, then you need your round trip ping time to be less than 1 ms. This means that even in the ideal case, the round trip distance can't be more than 195 km (or about 97 km each way). However, with disk delays, and delays induced by store-and-forward algorithms in networking hardware, the practical rule-of-thumb distance is more like 50 km. But, of course, this depends on how write-intensive your application is, and how much delay you can tolerate. I'm not an expert in this area, but perhaps someone reading this can offer their opinions on it. You probably figured out pretty much right away that you can't get this kind of latency over the public Internet - there are just too many hops. This means that for synchronous replication you need a dedicated (and obviously low-latency) network between your sites.
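The arithmetic above is easy to check. This back-of-the-envelope sketch uses only the ~195,000 km/sec figure for light in fibre and ignores disk and switching delays, so it gives the theoretical best case, not the practical ~50 km rule of thumb:

```python
# Approximate speed of light in optical fibre (about 0.65 of c in vacuum).
FIBRE_SPEED_KM_PER_S = 195_000


def max_one_way_distance_km(round_trip_budget_s: float) -> float:
    """Largest site separation whose round-trip propagation delay alone
    fits in the given time budget (ignores disk and network-gear delays)."""
    return FIBRE_SPEED_KM_PER_S * round_trip_budget_s / 2


def round_trip_delay_ms(one_way_km: float) -> float:
    """Propagation-only round-trip time, in milliseconds."""
    return 2 * one_way_km / FIBRE_SPEED_KM_PER_S * 1000


print(max_one_way_distance_km(0.001))  # 97.5 km each way for a 1 ms budget
print(round_trip_delay_ms(1000))       # ~10.3 ms for sites 1000 km apart
```

As the second call shows, sites 1000 km apart pay over 10 ms of round-trip propagation delay per synchronous write before any disk or switching overhead - which is why widely separated sites usually end up with asynchronous replication.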
To summarize - for the things we've talked about so far, you need:
- Two (or more) sites
- Servers on both sites
- Replication software or hardware
- Duplicate storage on both sites
- Dedicated low-latency bandwidth to connect the two sites
Of course, this isn't nearly enough for a complete solution, but it covers most of the basics for replication.
Another resource worth looking at is Christoph Mitasch's thesis on Server-Based Wide Area Data Replication for Disaster Recovery.