This post describes a method for automatically assigning host names and IP addresses to servers that is more maintainable than conventional DHCP and well suited to cloud computing and large clusters. It assigns them according to the physical location of each server, and requires zero administrative effort when you add or replace a server.
DHCP is OK, but it costs time and effort to keep up to date as you install and replace servers. If you have a million servers, you'd like the IP addresses to be easily correlated with where the hardware is. For example, you'd like the server name to be something like Drone-R<racknum>-S<servernum> and the IP address to be something like PREFIX.<rack-number>.<server-position-in-rack>. Then your IP address would tell you immediately where the server is in your data center. If you have several hundred servers or more, this is a good thing. But you don't want to hard-wire MAC addresses into DHCP configurations or IP addresses into filesystem images, either. Both are high-maintenance approaches.
This post describes an alternative method of assigning IP addresses that lets you locate your servers easily, and is lower-overhead and more maintainable than the usual way of using DHCP.
This arrangement assumes that you have a managed switch or two for each cabinet of servers, and that you maintain the IP addresses of those switches in DHCP. When you replace a switch, you put its MAC address / IP address pair into DHCP and away you go. But there are many fewer switches than servers - something like 1/49th or so of the number of servers you have.
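For example, the per-switch entries in an ISC dhcpd configuration might look something like this (the switch names and MAC addresses here are made up):

# One fixed-address entry per managed switch -- the only MAC addresses
# you ever have to track by hand.
host switch-rack01 {
  hardware ethernet 00:11:22:33:44:01;   # hypothetical switch MAC
  fixed-address 10.0.0.1;                # first address of rack 1's block
}
host switch-rack02 {
  hardware ethernet 00:11:22:33:44:02;
  fixed-address 10.0.0.51;               # first address of rack 2's block
}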
Once you have this, the rest is simple. You assign the switch IP addresses sparsely, perhaps one per 50 addresses in a subnet - for example, 10.0.0.1, 10.0.0.51, and so on. You then rewrite the network startup to listen for a link discovery packet (LLDP or CDP). This packet contains the management address of the switch and the switch port this NIC is connected to.
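Here is a rough sketch of that listening step, assuming Linux raw packet sockets, Python available at network-startup time, and root privileges; the interface name is just a placeholder:

# Minimal sketch: block until one LLDP frame (EtherType 0x88cc) arrives
# on the given interface, then hand the raw frame back to the caller.
import socket
import struct

ETH_P_LLDP = 0x88CC                          # LLDP EtherType
LLDP_MCAST = bytes.fromhex("0180c200000e")   # LLDP multicast destination
SOL_PACKET = 263                             # from <linux/if_packet.h>
PACKET_ADD_MEMBERSHIP = 1
PACKET_MR_MULTICAST = 0

def wait_for_lldp(interface="eth0"):
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                         socket.htons(ETH_P_LLDP))
    sock.bind((interface, ETH_P_LLDP))
    # Join the LLDP multicast group so the NIC passes the frames up
    # without having to turn on promiscuous mode.
    mreq = struct.pack("iHH8s", socket.if_nametoindex(interface),
                       PACKET_MR_MULTICAST, 6, LLDP_MCAST)
    sock.setsockopt(SOL_PACKET, PACKET_ADD_MEMBERSHIP, mreq)
    frame, _ = sock.recvfrom(2048)           # blocks until the next advertisement
    return frame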
If you assign these switch addresses in a way that reflects your hardware layout, then it's simple to decode an address back into the placement of that switch in your environment.
Basically, the host computes the host name it gives to DHCP, and lets DHCP do its thing. The host name should be structured something like Drone-R<racknum>-S<servernum>. Of course, the script that translates the switch IP address and port into a rack number and server-number offset is site-dependent, but not hard to write. If you have your racks arranged in nice neat rows and columns, you could go further and translate the racknum into a row and column or other local designations for the location of the server.
You give this host name to DHCP, and it assigns you an IP address, DNS servers, and other information based on that name. If you're willing to hard-wire the "other information", the IP address itself is also easily computed, and you can avoid DHCP completely.
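Here is a minimal sketch of such a translation, assuming the sparse numbering above (one switch per 50-address block starting at 10.0.0.1) and cabling where switch port N holds the server in slot N; the layout rules are purely illustrative:

# Hypothetical translation from switch management IP and switch port to a
# host name and a static IP. Real sites will have their own rules.
BLOCK = 50                 # addresses reserved per rack/switch
PREFIX = "10.0.0"          # assumed network prefix

def rack_and_slot(switch_mgmt_ip, switch_port):
    last_octet = int(switch_mgmt_ip.split(".")[-1])
    racknum = (last_octet - 1) // BLOCK + 1   # 10.0.0.1 -> rack 1, 10.0.0.51 -> rack 2
    servernum = switch_port                   # cabling maps switch port N to slot N
    return racknum, servernum

def hostname_and_ip(switch_mgmt_ip, switch_port):
    racknum, servernum = rack_and_slot(switch_mgmt_ip, switch_port)
    hostname = f"Drone-R{racknum}-S{servernum}"
    # The switch holds the first address of its block; the servers follow it.
    ip = f"{PREFIX}.{(racknum - 1) * BLOCK + 1 + servernum}"
    return hostname, ip

# For example: hostname_and_ip("10.0.0.51", 7) -> ("Drone-R2-S7", "10.0.0.58")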
Now when you replace that computer with a different one, the new one automatically takes on the same IP address as the previous one had without any manual effort.
If you aren't familiar with CDP or LLDP - they provide the management address, port information, and a lot of other good stuff by periodically sending layer-two packets to their connected endpoints. Most or all modern managed switches support one or both, and there are no known security issues that arise from enabling them.
It is more maintainable - instead of having to update DHCP tables with new MAC addresses for every server, you only have to maintain the IP/MAC addresses of the switches and the hostname/IP address correlations - leaving the server MAC addresses out entirely.
Of course, this also assumes that you have a rational cabling system in your cabinets - one where the switch port numbers correspond to the positions of the servers in the cabinet. The main downside is that the systems take longer to come up: DHCP servers respond quickly, but you have to wait to hear a link discovery packet, which switches typically send only every 30 to 60 seconds by default. And if you skip DHCP entirely, you will need another method for providing the netmask and the list of DNS servers; the former can be hard-wired, but the latter may need to change over time.
Thanks to Narayan Desai of Argonne National Labs who introduced me to CDP and LLDP a few years back, and inspired these ideas. No doubt people have done this kind of thing before - maybe better. I mentioned this idea at LinuxCon, and people asked me to write it up. So here's a post about it.
What's your reaction to this? Is waiting for an LLDP packet going to take too long? How do you do this at your data center? Do you have a better method you follow?
Is there already a lightweight client that will do the listening? Seems it will need raw socket access (and a promiscuous interface?) to get the relevant Ethernet frames...
Posted by: Howie | 13 September 2012 at 01:56
So answering my own question, for CDP you can use:
tcpdump -c1 -vvv -s0 -i fxp0 "ether multicast and ether[20:2] = 0x2000"
to dump the first received CDP frame as text, and
tcpdump -c1 -vvv -s0 -i fxp0 "ether multicast and ether proto 0x88cc"
does a similar (less nice) job with LLDP.
Posted by: Howie | 13 September 2012 at 02:31
My code in the Assimilation project monitoring system knows how to decode LLDP and CDP packets, is lightweight, and doesn't enable promiscuous mode on Linux. What you really want is to listen on all your interfaces at once, and bring each one up as its packet(s) come in. LLDP has the annoying property that the TLV length is split across two bytes, borrowing one bit from the byte that holds the TLV type.
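To make that concrete, the TLV walk looks roughly like this - a minimal sketch, not the Assimilation code itself, assuming the raw frame still has its 14-byte Ethernet header in front:

# Each LLDP TLV header is 16 bits: a 7-bit type and a 9-bit length, so the
# length borrows the low bit of the byte that holds the type.
def lldp_tlvs(frame):
    offset = 14                                    # skip the Ethernet header
    while offset + 2 <= len(frame):
        tlv_type = frame[offset] >> 1
        tlv_len = ((frame[offset] & 0x01) << 8) | frame[offset + 1]
        if tlv_type == 0:                          # End of LLDPDU
            break
        yield tlv_type, frame[offset + 2 : offset + 2 + tlv_len]
        offset += 2 + tlv_len
# Type 1 is Chassis ID, 2 is Port ID, 8 is the Management Address TLV.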
Posted by: Alan R. | 13 September 2012 at 22:27
I have been having trouble assigning them according to the location of the server. I eventually want them to require zero administrative effort when I replace a server. I have been trying to get this figured out for about a month now because my business needs a disaster recovery plan. I felt that cloud computing would be a great alternative for this.
Posted by: Michael Cornelia | 24 September 2012 at 12:58
Doing this right is definitely a pain in the you-know-what. It will require:
(0) Renumbering your switches' IP addresses as noted above, leaving room for all your servers AND their virtual IPs as well as fixed IPs
(1) Enabling LLDP or CDP in your network
(2) Writing a piece of software to intercept and read the incoming LLDP or CDP packets on all the interfaces you expect CDP or LLDP on
(3) Integrating this CDP/LLDP reader into the bootup process so that it assigns IP addresses as the packets arrive and terminates once all the requested interfaces are assigned (a rough sketch follows below)
(4) Doing a LOT of testing to make this all work.
Obviously, you'd want to start with a small test network to use in creating and testing this software.
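Here is a rough sketch of what step (3) might look like, building on the listener and translation sketches above; the port-ID parsing, prefix length, and the hostname/ip commands are all placeholders for whatever your switches and startup scripts actually use:

# Rough sketch of step (3): wait for a link discovery frame on an expected
# interface, derive the host name and address, and configure the interface.
# Builds on wait_for_lldp(), lldp_tlvs() and hostname_and_ip() sketched earlier.
import re
import socket
import subprocess

def mgmt_ip_and_port(frame):
    """Extract the IPv4 management address and a numeric port from an LLDP
    frame. Both depend on how the switch is configured, so this is only a
    plausible guess at the formats involved."""
    mgmt_ip, port = None, None
    for tlv_type, value in lldp_tlvs(frame):
        if tlv_type == 2:                          # Port ID TLV
            # Byte 0 is the subtype; assume the rest is printable text
            # ending in the port number, e.g. "Gi1/0/17".
            match = re.search(rb"(\d+)$", value[1:])
            port = int(match.group(1)) if match else None
        elif tlv_type == 8:                        # Management Address TLV
            # Byte 0 is the address string length, byte 1 the subtype.
            if value[0] == 5 and value[1] == 1:    # 5 bytes, subtype 1 = IPv4
                mgmt_ip = socket.inet_ntoa(value[2:6])
    return mgmt_ip, port

def configure(interface, prefix_len=24):
    frame = wait_for_lldp(interface)
    mgmt_ip, port = mgmt_ip_and_port(frame)
    hostname, ip = hostname_and_ip(mgmt_ip, port)
    subprocess.run(["hostname", hostname], check=True)
    subprocess.run(["ip", "addr", "add", f"{ip}/{prefix_len}",
                    "dev", interface], check=True)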
Regarding Disaster Recovery - that's a complicated issue, and not particularly related to cloud computing. It is unlikely in most cloud computing scenarios that you'd have _any_ influence over IP address assignments.
Posted by: Alan R. | 24 September 2012 at 20:57