Wouldn't it be wonderful if you could just drop a monitoring package onto your servers with no configuration at all, and it all started up, began monitoring your servers, discovered your services, dependencies and switch port connections all without you doing anything - with practically no load on your monitoring servers, and without setting off any network alarms?
You can now do that with the Assimilation Monitoring Project. Really.
For the skeptics and geeks out there (you know who you are!), this blog post tells you how it works in some detail.
In earlier posts, stealth discovery of servers, services, dependencies, and switch ports was covered in some detail. Earlier posts also explained how the hyperscale O(1) monitoring works. So, I won't repeat those things here.
What's new is the zeroconf-like configuration-less startup. The problem is this: We have a bunch of nanoprobes on an unknown number of machines, some of which may be using our preferred port for other things, and we have a collective management authority (CMA) - and we need to connect them to each other - without configuring anything specific for a given site.
So, how can this be done?
It first started off with a trip to IANA - to get a reserved multicast address. I sent an email to IANA requesting that they reserve a specific multicast address for the Assimilation project. After a few weeks and half a dozen emails back and forth, they agreed that this was the right approach, and they assigned one for the project. The project is now the proud owner of multicast address 224.0.2.5. Here's how it's used:
- When the CMA starts up, it binds to the ANY address on UDP port 1984. It then also joins the multicast group on 224.0.2.5. This socket now hears any packets sent to any of the unicast addresses on the machine, or to our multicast address.
- When a nanoprobe starts up, it tries to bind to the ANY address on UDP port 1984. If that port is already being used, it instead binds to the ANY address on a port assigned by the OS.
- The nanoprobe then sends a single multicast message to our multicast address requesting connection and setup. This message includes some local network configuration. Nanoprobes do not join the multicast group.
- The CMA hears this multicast packet from the nanoprobe. It then records the address and port the nanoprobe sent from in the database, and begins the process of configuring it into the system. Configuration consists of these steps:
  - Analyze the network information and put it into our Neo4j database. This includes the entries for all the NICs and IP addresses on the machine.
  - Send it a packet giving it a variety of configuration information - including the IP addresses and ports to use when contacting the CMA for various purposes.
  - Connect it in a neighbor monitoring arrangement as noted in earlier posts.
  - Send it commands to perform and schedule discovery actions.
- The nanoprobes then obey these commands, which causes all further communication with the CMA to be sent to the unicast IP addresses that were sent in the configuration packet from the step above.
- Everyone lives happily ever after.
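For the geeks, the rendezvous in the steps above can be sketched with ordinary UDP sockets. This is only an illustration in Python, not the project's actual code: the function names are made up for this sketch, and the JSON payload is a stand-in for whatever packet format the nanoprobes really use. Only the multicast address and port come from the description above.

```python
import json
import socket
import struct

MCAST_GROUP = "224.0.2.5"  # multicast address assigned to the project
CMA_PORT = 1984            # preferred UDP port


def open_cma_socket(port=CMA_PORT, group=MCAST_GROUP):
    """CMA side: bind to the ANY address, then join the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    # struct ip_mreq: the group address, then the local interface (ANY).
    mreq = struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock


def announce(sock, netconfig, dest=(MCAST_GROUP, CMA_PORT)):
    """Nanoprobe side: send one startup request. No group join is needed
    to send to a multicast address."""
    # Hypothetical message layout, JSON used purely for illustration.
    payload = json.dumps({"type": "startup", "netconfig": netconfig})
    sock.sendto(payload.encode("utf-8"), dest)


def receive_announcement(sock):
    """CMA side: return the sender's address and port plus the decoded
    message, so later replies can go back by unicast."""
    data, (addr, port) = sock.recvfrom(65535)
    return addr, port, json.loads(data)
```

The source address and port returned by `receive_announcement` are what the CMA would record and reply to, which is why it does not matter which port the nanoprobe ended up bound to.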
It is important that the nanoprobes be able to tolerate having the preferred port be unavailable when they start. One reason is that in a large and diverse customer set, it's hard to predict who might be using this port already. Another even more important reason is that we want to start a nanoprobe running on the CMA - and the CMA always uses that port.
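A minimal sketch of that fallback, assuming plain Python sockets rather than the project's own code (the function name is hypothetical): try the preferred port first, and if it is already in use, bind to port 0 so the OS picks a free one.

```python
import errno
import socket

PREFERRED_PORT = 1984  # preferred UDP port


def bind_udp_with_fallback(preferred_port=PREFERRED_PORT):
    """Bind a UDP socket to the ANY address, preferring preferred_port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind(("", preferred_port))
    except OSError as exc:
        if exc.errno != errno.EADDRINUSE:
            sock.close()
            raise
        # Port 0 asks the OS to assign any free port.
        sock.bind(("", 0))
    return sock
```

Calling `bind_udp_with_fallback().getsockname()[1]` shows which port was actually obtained. On a machine that also runs the CMA, that would be an OS-assigned port rather than 1984.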
Here are the requirements for this to work:
- There must be a well-known multicast address which most sites can use.
- There must be a well-known port which is free on the CMA.
- The site must permit multicast.
- Multicast must extend between a CMA and all its nanoprobes - but not to other CMAs.
- On the CMA servers, the nanoprobes have to be started after the CMA.
Obviously not every site can meet all these requirements - but many can. And for those that can, it makes for a great out-of-box experience - and definitely a much lower suck factor. Imagine that - monitoring that doesn't suck!
So, what's your reaction to this? Will you be able to run it without configuration? Or will you have to configure in the address of your CMA? Is this cool? Or is it just ho-hum?
Dev and admin teams struggle these days with keeping up with agile development. DevOps helps by breaking down some of the walls. But one of the biggest challenges is getting the entire development team involved and not just 1 or 2 people who help do deployments. The entire team needs visibility to the production server environments to help support and troubleshoot applications. I read a great blog post about this recently. DevOps
Posted by: ricky | 18 October 2012 at 13:07
One of my hopes for this project is that by continually discovering lots of different aspects of the environment and expressing the relationships between all the different pieces that we can provide an up-to-date database for people to go to learn things about the environment. This is analogous to a CMDB - but much more detailed than is typical, and always up to date.
Switches, servers, services, client/server relationships, configuration parameters and so on will all wind up there. Thanks for the pointer to the article.
We already gather all these types of information - now we just need to provide an awesome interface for finding out what you want without getting overwhelmed.
As a note -- having the URL of the blog post you mentioned would be a good thing.
Posted by: Alan Robertson | 18 October 2012 at 13:37
I'm planning on taking a good bit of time off to make the first Assimilation release available. I'd describe its current state as a well-established proof-of-concept. Hopefully in the time I have before the end of the year, I can make it into a release worth having others try out. Feel free to play with it as is.
Posted by: Alan R. | 16 November 2012 at 06:27