
07 January 2008



Mike Dolan

Wow! Outstanding - now, how do I set it all up :-)

Alan R.

Thanks Mike!

All these components exist - but they don't all work together in the way one would like. Getting some unification and common code and interfaces for Linux clustering has been one of my goals since I started the Open Cluster Framework (OCF) effort back in 2001. Improving this situation is something that the Linux-HA project is looking at very seriously - getting the best of our technology to work with the best of the Red Hat and IBM Technology, and not make it take a rocket scientist to get it all to interoperate correctly. When this happens (and it's coming), it will be a huge boon to the Linux community. I've alluded to a little (but by no means all) of the work that needs to be done. RealSoonNow(TM) I'm going to significantly update the project's goals - and even more of this work will be outlined there.


"In large systems, this would probably use the ClusterIP capability to provide load distribution (leveling) across multiple LRM proxies."

Ugh... please don't tie things to the ClusterIP (if that is really what you mean :) ). I have enough problems with IP drift in RHCS that Red Hat can't seem to explain :)

Alan R.

IC: Well... I don't think RHCS supports the kind of load-distributing/leveling ClusterIP I'm talking about (as described in my article on load balancing or http://www.linux-ha.org/ClusterIP). Any cluster that can't make its basic capabilities work is broken - and needs to be fixed. I've never heard of such a problem in Linux-HA - at least not in the last 8 years or so. We perform extensive automated tests before each release and we have tens of thousands of clusters in the field.

For the case of a cluster large enough to need more than one LRM proxy server, you have to have a load balancing method - and it's far better to use one that's already part of the product than to invent a new one which wouldn't be as well-tested.

Of course, one could always configure an external load balancer, or LVS, but I'm pretty sure that would be much gnarlier than the ClusterIP idea. For *BSD and other UNIX systems, it won't scale as high without an external load balancer, but that's probably not so bad, since just having the proxy would probably get us to clusters in the hundreds to small thousands range.

The idea in a little more detail is this: There would be two classes of nodes in the cluster - core nodes, and non-core nodes. Non-core nodes wouldn't run the whole cluster stack - they would only run the LRM + an LRM proxy client application. The LRM proxy client application would connect to the LRM via IPC, and also to a corresponding proxy server process via the network.

If a cluster exceeds the capacity of a single machine to run the LRM proxy server for its non-core nodes, one would have to run the proxy server on more than one node at a time. The ClusterIP would then be used to distribute these client-server relationships across the set of LRM proxy server processes.
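As a rough illustration of the distribution idea (this is not Linux-HA code): ClusterIP-style schemes hash each client's address into one of a fixed number of buckets, and each server answers only for the buckets it currently owns. The Python sketch below assumes that model. The function names, the use of CRC32 as the hash, and the proxy names are all hypothetical stand-ins.

```python
import zlib


def bucket_for(client_ip: str, total_buckets: int) -> int:
    """Map a client address to a bucket number in 1..total_buckets.

    CRC32 is only a stand-in hash chosen for this sketch.
    """
    return zlib.crc32(client_ip.encode("ascii")) % total_buckets + 1


def proxy_for(client_ip: str, bucket_owners: dict) -> str:
    """Return the proxy server that owns this client's bucket."""
    return bucket_owners[bucket_for(client_ip, len(bucket_owners))]


def reassign_on_failure(bucket_owners: dict, failed: str) -> dict:
    """Hand the failed server's buckets to the survivors, round-robin.

    Buckets owned by surviving servers are left where they are, so only
    clients of the failed server have to reconnect elsewhere.
    """
    survivors = sorted(set(bucket_owners.values()) - {failed})
    if not survivors:
        raise RuntimeError("no surviving proxy servers")
    new_owners = dict(bucket_owners)
    moved = 0
    for bucket, owner in sorted(bucket_owners.items()):
        if owner == failed:
            new_owners[bucket] = survivors[moved % len(survivors)]
            moved += 1
    return new_owners


# Hypothetical layout: four buckets spread over three proxy servers.
owners = {1: "proxy-a", 2: "proxy-b", 3: "proxy-c", 4: "proxy-a"}
after_failure = reassign_on_failure(owners, "proxy-b")
```

Every non-core node lands in exactly one bucket, so no per-client routing table is needed. When a proxy server fails, only the ownership of its buckets changes.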

Keep in mind that this load balancing function is only needed for clusters which have hundreds or more likely thousands of nodes.

Smallish (16 nodes or fewer) clusters wouldn't have any non-core nodes - and wouldn't need this LRM proxy server capability at all - much less more than one machine running the LRM proxy server.


Thanks for the information. Good food for thought!


Have you looked at CTDB/Samba ?

CTDB is a new clustering application for Linux and AIX (and, with some work, it should run on any other Unix as well) which is used to turn a bunch of nodes into a full all-active Samba cluster.

http://ronniesahlberg.blogspot.com/ my blog about CTDB
http://www.linux.conf.au/programme/presentations where Tridge and I give a presentation about CTDB at linux.conf.au

CTDB is a clustering framework that allows you to run Samba on all nodes at the same time and to export the same share read/write to clients from all nodes in the cluster at the same time.
This solves a lot of problems often associated with NAS server clustering, since most implementations are currently really active/passive failover solutions, and with those you may have issues like: what if the passive node fails to start after failover? What if both nodes become active at the same time? These issues cannot occur in an all-active cluster.

CTDB is currently in production use at many sites and is not a prototype any more. It provides all-active clustering of both your CIFS and NFS server.

Apart from providing very tight integration with Samba and creating a "single image" Samba distributed across multiple nodes, CTDB also supports HA for services such as iSCSI, FTP, HTTP...

Also, an all-active cluster can be VERY VERY VERY fast. Tridge's presentation from linux.conf.au gives some VERY impressive throughput numbers for a single read/write share where clients are accessing the same data through different nodes at the same time.

Alan R.

I'm familiar with Tridge's work. It works extremely well for his application, and he's done a very good job within the constraints of what he's trying to do - which is different from what Linux-HA tries to do.

For Linux-HA, we have an old, slow, kind-of-crufty protocol that works, and it doesn't make any difference to it - because (strangely enough) it's not a parallel application in the same sense at all. Surely we do things in parallel, but they're all planned in parallel, so there is no need for any kind of concurrency control. We create a graph of work to do, then we walk the graph in the correct order.
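To make the "plan first, then walk the graph" idea concrete, here is a minimal Python sketch using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It is not how Linux-HA implements its planner; the action names and the `plan_batches` helper are hypothetical. Each batch holds actions whose prerequisites have all completed, so the actions in one batch could run in parallel without any locking.

```python
from graphlib import TopologicalSorter


def plan_batches(depends_on: dict) -> list:
    """Group actions into batches that respect the dependency graph.

    depends_on maps each action to the set of actions that must
    finish before it may start.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch complete
    return batches


# Hypothetical plan: the filesystem first, then the IP address and the
# database in parallel, and the web server last.
plan = {
    "start_ip": {"mount_fs"},
    "start_db": {"mount_fs"},
    "start_web": {"start_ip", "start_db"},
}
batches = plan_batches(plan)
```

Because the ordering is fixed before anything runs, the parallelism comes from the plan itself and not from runtime coordination between nodes.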

What CTDB does is make parallel applications of certain kinds quite fast, and avoid locking in many (or most) cases. This is a very interesting, but different, problem.

Now, for the full cluster stack idea (with a true parallel filesystem under it), I don't know if anyone has written such a filesystem based on Tridge's ideas. I confess to not having thought of it in that way enough to know if that would be a breakthrough or irrelevant. If it would be, someone (perhaps Tridge) will do it.

The comments to this entry are closed.
