
29 October 2010



Comments


AlTobey

Any clustering system designed for 10k hosts should look a lot more like OSPF than Linux-HA.

Alan R.

That's an interesting (and thoughtful) comment. This doesn't look very much like Linux-HA (or Pacemaker), and it's a long way from a complete solution. An OSPF network with 10K routers is a very big OSPF network. (That means 10K routers, not 10K hosts or 10K switches, since hosts and switches don't participate in OSPF.)

In what specific ways do you think it should look more like OSPF?

Here are my off-the-cuff thoughts on this question...

How is this problem like OSPF? It is trying to manage liveness, and it is trying to be both aware of local network topology and network-efficient.

How is this problem different from OSPF? It's not trying to solve the "let's help independent fiefdoms work together" problem. At this level, all machines are "owned" by the same owner. It is not trying to provide anything more than liveness (it's trying to solve a simpler problem). There is no distributed control (at this level).

Alan R.

Thinking about this design, it seems to me that it's more like DHCP than any other Internet protocol I can think of, since it's centrally managed and has other things in common. For example...

When we boot up, we send a multicast/broadcast packet asking for someone to tell us who we are and how we should be configured (analogous to getting DNS entries and so on from DHCP). Like DHCP clients, our machines "renew their leases" periodically, except we do it a *lot* more often. Instead of measuring lease renewal times in minutes, hours or days, we measure them in seconds (or potentially even in fractions of a second). To compensate for this, we distribute our "DHCP server analog" throughout the network.
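
To make the lease analogy concrete, here is a minimal sketch of how one "DHCP server analog" might track short leases and decide which machines are presumed dead. It is only an illustration of the idea, not the proposal's actual design: the `LeaseTable` class, its method names, the host names and the 2-second lease length are all assumptions made up for this example, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass, field


@dataclass
class LeaseTable:
    """Hypothetical lease tracker for one 'DHCP server analog'."""

    lease_seconds: float = 2.0  # assumed value; the comment only says "seconds"
    expires_at: dict = field(default_factory=dict)  # host name -> expiry time

    def renew(self, host: str, now: float) -> float:
        """Record a renewal (or first contact) and return the new expiry time."""
        expiry = now + self.lease_seconds
        self.expires_at[host] = expiry
        return expiry

    def expired(self, now: float) -> list:
        """Return hosts whose lease has lapsed, i.e. hosts presumed dead."""
        return sorted(h for h, t in self.expires_at.items() if t <= now)


table = LeaseTable(lease_seconds=2.0)
table.renew("node-a", now=0.0)
table.renew("node-b", now=0.0)
table.renew("node-a", now=1.5)  # node-a renews in time; node-b goes silent
print(table.expired(now=2.5))   # node-b's lease ran out at t=2.0
```

In this sketch a missed renewal is the only liveness signal, which is why the lease length directly sets how quickly a failure is noticed and how much renewal traffic each server has to absorb.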

