Traditionally, people have implemented high availability by installing a high-availability management package like Linux-HA[1] and then configuring it in detail for each application, file system mount, IP address and so on. This traditional method works quite well, but it can be labor intensive - particularly with custom or uncommon applications. You may have to understand the structure of your applications, write some resource agents[2], debug them, and test them in detail. In addition, every time you change your mount structure, or any other detail you've told your HA system about, you have to be sure to update your HA configuration to match - or it might not fail over correctly the next time.
When you have good resource agents, your HA system will also recover from application failures by restarting the applications that have failed. This is a good thing. On the other hand, setting all this up is enough work that virtually no one runs all their applications in an HA configuration. It's just too much work for most applications. I call this traditional boutique-like method "HA at retail". It works well, but it is a little costly to set up and maintain all the details just so.
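To make the "retail" idea concrete, here is a minimal sketch of per-application recovery. It is a toy model, not Linux-HA's actual interface: the `ResourceAgent` class, the `monitor_once` function, and the `state` dictionary standing in for real services are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ResourceAgent:
    """Hypothetical per-resource agent: knows how to start, stop, and check one thing."""
    name: str
    start: Callable[[], None]
    stop: Callable[[], None]
    is_running: Callable[[], bool]


def monitor_once(agents: List[ResourceAgent]) -> List[str]:
    """Check every resource once and restart only the ones that have failed."""
    restarted = []
    for agent in agents:
        if not agent.is_running():
            agent.start()
            restarted.append(agent.name)
    return restarted


# Toy stand-in for real services: a dict records which applications are "up".
state: Dict[str, bool] = {"database": True, "webserver": True}


def make_agent(name: str) -> ResourceAgent:
    """Build one agent per application; in real life each one is hand-written."""
    return ResourceAgent(
        name=name,
        start=lambda: state.__setitem__(name, True),
        stop=lambda: state.__setitem__(name, False),
        is_running=lambda: state[name],
    )


agents = [make_agent(n) for n in state]

state["webserver"] = False        # simulate an application crash
restarted = monitor_once(agents)  # only the failed application is restarted
print(restarted)
```

The cost shows up in `make_agent`: in this toy it is three lambdas, but in practice every application needs its own start, stop, and health-check logic written, debugged, and kept in step with the configuration.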
With virtualization, another approach is possible, and (big surprise), I call it "HA at wholesale". In this paradigm, instead of needing to write scripts for each type of application, you just have one resource agent - one for managing a virtual machine. You also don't need to know the structure of the applications - the OS still starts them in whatever way it has been starting them all along. Wow, this sounds good - less work, fewer chances for errors! As expected, there is still no such thing as a free lunch here - you do wind up with some disadvantages.
For example, you can no longer easily detect the failure of an application. In addition, if an application fails, the only thing you can do about it is reboot the entire virtual machine. Inevitably, this takes longer than just restarting the failed application.
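The trade-off can be sketched with a toy model of the "wholesale" setup, in which the HA layer has a single check at the virtual machine level. The `VirtualMachine` class, `vm_monitor` function, and the application names are hypothetical and only illustrate the idea; they do not correspond to any real HA package's API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class VirtualMachine:
    """Toy VM: the guest OS starts every application at boot."""
    name: str
    running: bool = True
    apps: Dict[str, bool] = field(default_factory=dict)

    def reboot(self) -> None:
        # The guest OS brings all applications back up, as it always has.
        self.running = True
        for app in self.apps:
            self.apps[app] = True


def vm_monitor(vm: VirtualMachine) -> bool:
    """The single VM-level check: it cannot see inside the guest."""
    return vm.running


vm = VirtualMachine("guest1", apps={"database": True, "webserver": True})

vm.apps["webserver"] = False   # an application crashes inside the guest
noticed = not vm_monitor(vm)   # the HA layer still sees a healthy VM

vm.running = False             # now the whole VM goes down
if not vm_monitor(vm):
    vm.reboot()                # the only recovery action available
```

In this sketch the crashed web server goes unnoticed until something takes the whole VM down, and the one available remedy restarts every application in the guest rather than just the one that failed.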
So, HA at wholesale has these properties:
- Simple enough that you can implement it for every machine
- Works well for hardware failures
- When coupled with hardware predictive failure analysis[3] and smart HA software, it can sometimes avoid outages completely
- Can't easily detect or recover from application failures
- The only thing you can do about any failure is reboot the virtual machine
By contrast, HA at retail has these properties:
- It is complex enough that you need to limit how broadly you apply it in your environment
- Works well for hardware failures
- It can easily detect and recover from application failures
- Individual applications can easily be restarted - and don't require a reboot
[1] http://linux-ha.org/
[2] http://linux-ha.org/ResourceAgent
[3] http://www-05.ibm.com/hu/termekismertetok/xseries/dn/pfa.pdf