Wouldn't it be wonderful if you could just drop a monitoring package onto your servers with no configuration at all, and have it start up, begin monitoring your servers, and discover your services, dependencies, and switch port connections, all without you doing anything - with practically no load on your monitoring servers, and without setting off any network alarms?
You can now do that with the Assimilation Monitoring Project. Really.
For the skeptics and geeks out there (you know who you are!), this blog post tells you how it works in some detail.
In earlier posts, stealth discovery of servers, services, dependencies, and switch ports was covered in some detail. Earlier posts also explained how the hyperscale O(1) monitoring works. So, I won't repeat those things here.
What's new is the zeroconf-like configuration-less startup. The problem is this: We have a bunch of nanoprobes on an unknown number of machines, some of which may be using our preferred port for other things, and we have a collective management authority (CMA) - and we need to connect them to each other - without configuring anything specific for a given site.
So, how can this be done?
It started with a trip to IANA to get a reserved multicast address. I sent an email to IANA requesting that they reserve a specific multicast address for the Assimilation project. After a few weeks and half a dozen emails back and forth, they agreed that this was the right approach, and they assigned one for the project. The project is now the proud owner of multicast address 224.0.2.5. Here's how it's used:
- When the CMA starts up, it binds to the ANY address on UDP port 1984. It then also joins the multicast group on 224.0.2.5. This socket now hears any packets sent to any of the unicast addresses on the machine, or to our multicast address.
- When a nanoprobe starts up, it tries to bind to the ANY address on UDP port 1984. If that port is already in use, it instead binds to the ANY address on a port assigned by the OS.
- The nanoprobe then sends a single multicast message to our multicast address requesting connection and setup. This message includes some local network configuration. Nanoprobes do not join the multicast group.
- The CMA hears this multicast packet from the nanoprobe. It then records the address and port the nanoprobe sent from in the database, and begins the process of configuring it into the system. Configuration consists of these steps:
  - Analyze the network information and put it into our Neo4j database. This includes the entries for all the NICs and IP addresses on the machine.
  - Send it a packet giving it a variety of configuration information - including the IP addresses and ports to use when contacting the CMA for various purposes.
  - Connect it in a neighbor monitoring arrangement as noted in earlier posts.
  - Send it commands to perform and schedule discovery actions.
- The nanoprobes then obey these commands, which causes all further communication with the CMA to go to the unicast IP addresses given in the configuration packet described above.
- Everyone lives happily ever after.
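For the geeks, here is a rough Python sketch of the handshake described in the steps above. It is not the project's actual code. It assumes IPv4 sockets and a JSON payload, and the function names (`membership_request`, `open_cma_socket`, `announce`, `handle_startup`) and the reply fields are hypothetical names made up for illustration. The multicast group is passed in as a parameter rather than hard-coded.

```python
import json
import socket
import struct

CMA_PORT = 1984  # the well-known port named in this post


def membership_request(group: str) -> bytes:
    """Pack an ip_mreq: the group address followed by INADDR_ANY."""
    return struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton("0.0.0.0"))


def open_cma_socket(group: str, port: int = CMA_PORT) -> socket.socket:
    """CMA side: bind to the ANY address, then join the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def announce(sock: socket.socket, group: str, netconfig: dict,
             port: int = CMA_PORT) -> None:
    """Nanoprobe side: send one multicast datagram. Sending to a group
    does not require joining it."""
    sock.sendto(json.dumps(netconfig).encode("utf-8"), (group, port))


def handle_startup(known: dict, payload: bytes, sender: tuple,
                   cma_unicast: tuple) -> bytes:
    """CMA side: record the sender's (address, port) and build a reply
    telling the nanoprobe which unicast address to use from now on.
    The JSON layout here is an assumption, not the real wire format."""
    known[sender] = json.loads(payload.decode("utf-8"))
    reply = {"cma_address": cma_unicast[0], "cma_port": cma_unicast[1]}
    return json.dumps(reply).encode("utf-8")
```

In a receive loop the CMA would call `recvfrom()` on its socket, pass the data and sender address to `handle_startup()`, and send the returned bytes back to that same sender with `sendto()`. Because the reply goes to whatever address and port the request came from, it does not matter whether the nanoprobe got port 1984 or an OS-assigned one.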
It is important that the nanoprobes be able to tolerate having the preferred port be unavailable when they start. One reason is that in a large and diverse customer set, it's hard to predict who might be using this port already. Another even more important reason is that we want to start a nanoprobe running on the CMA - and the CMA always uses that port.
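A minimal sketch of that fallback, assuming an IPv4 UDP socket (the function name `bind_nanoprobe_socket` is hypothetical, not taken from the project):

```python
import socket

PREFERRED_PORT = 1984  # the preferred port named in this post


def bind_nanoprobe_socket(preferred_port: int = PREFERRED_PORT) -> socket.socket:
    """Bind a UDP socket to the ANY address, preferring preferred_port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind(("", preferred_port))
    except OSError:
        # The preferred port is taken (for example by the CMA on this
        # machine), so port 0 asks the OS to pick any free port.
        sock.bind(("", 0))
    return sock
```

Calling `sock.getsockname()[1]` afterwards shows which port was actually obtained.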
Here are the requirements for this to work:
- There must be a well-known multicast address which most sites can use.
- There must be a well-known port which is free on the CMA.
- The site must permit multicast.
- Multicast must extend between a CMA and all its nanoprobes - but not to other CMAs.
- On the CMA servers, the nanoprobes have to be started after the CMA.
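How far multicast reaches is mostly up to your routers and switches, but the sender has one knob of its own: the multicast TTL. The sketch below shows that standard socket option only as an illustration. This post does not say how the nanoprobes set it, and the function name `make_scoped_sender` is hypothetical.

```python
import socket
import struct


def make_scoped_sender(ttl: int = 1) -> socket.socket:
    """Create a UDP socket whose multicast packets expire after ttl hops.
    A TTL of 1 keeps them on the local subnet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                    struct.pack("B", ttl))
    return sock
```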
Obviously not every site can meet all these requirements - but many can. And for those that can, it makes for a great open-box experience - and definitely a much lower suck factor. Imagine that - monitoring that doesn't suck!
So, what's your reaction to this? Will you be able to run it without configuration? Or will you have to configure in the address of your CMA? Is this cool? Or is it just ho-hum?