In the area of discovery, one of the things people have gotten most excited about in the Assimilation Monitoring Project is the discovery of clients, servers, and particularly the dependencies between them.
That code is now in the Assimilation code base. We discover client processes, server processes, and their interconnections. In this post, I'll explore how this works and what it looks like in the Neo4j graph database. These dependencies are discovered using Stealth Discovery™ methods - without port scanning or packet sniffing.
If a server process goes down, its clients most likely won't function correctly - so there's a dependency of the client on the server. Imagine your NOC suddenly lights up with a dozen services broken in a cascading failure. To find the root cause, follow the dependencies and look for failed services which aren't clients of other failed services. Because we discover these client->server dependencies (and a number of other types) automatically, the Assimilation code will be able to help you or your DevOps team quickly find the root cause.
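Here's a minimal sketch of that root-cause rule in Python. The service names and the `depends_on` mapping are made up for illustration, and this is not the Assimilation code itself - just the "failed services which aren't clients of other failed services" idea written out.

```python
def root_cause_candidates(depends_on, failed):
    """Return failed services that do not depend on any other failed service.

    depends_on maps a client service to the services it is a client of.
    """
    failed = set(failed)
    return sorted(
        svc
        for svc in failed
        if not (set(depends_on.get(svc, ())) & (failed - {svc}))
    )


# Hypothetical cascading failure: everything below ultimately needs 'database'.
depends_on = {
    "webapp": ["appserver"],
    "appserver": ["database"],
    "reports": ["database"],
    "database": [],
}
failed = ["webapp", "appserver", "reports", "database"]

print(root_cause_candidates(depends_on, failed))
```

With this sample data, only 'database' is left standing as a candidate, since every other failed service is a client of something that also failed.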
On the right is a capture of some of the data from the new client/server discovery code on my test machine. For this example, I did an ssh from 'servidor' to 'servidor'. In the upper right is a green box labelled 'tcp-process/ssh'. This is the client process of the connection. It in turn connects to the violet 10.10.10.5:22 IP:tcpport node by a 'tcpclient' relationship. This port in turn is listened to via the 'tcpservice' relationship by the sshd server process in the green box labelled 'tcp-process/sshd'. As you can see from the graph, both of them are 'runningon' Drone servidor.
So, you can see that if you have a problem with the 'ssh' client, it could be caused by a problem with the 'sshd' service - just by following the tcpclient->tcpservice links. Of course, if the server it's 'runningon' fails, then things will also fail. Similarly, by following the tcpclient->baseip->ipowner relationships, you can see that failure of the NIC eth0 can also cause problems. It's these kinds of complex dependency relationships where Neo4j shines.
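To make the link-following concrete, here's a small in-memory sketch in Python of the example above. The relationship names come from the graph in this post, but the node labels, the edge directions, and the choice to walk edges in either direction are my assumptions for illustration - the real data lives in Neo4j, not in a Python list.

```python
from collections import defaultdict

# (source, relationship, target) triples modelled on the example graph.
# Directions and node labels are assumed, not taken from the real schema.
EDGES = [
    ("tcp-process/ssh", "tcpclient", "10.10.10.5:22"),
    ("tcp-process/sshd", "tcpservice", "10.10.10.5:22"),
    ("tcp-process/ssh", "runningon", "servidor"),
    ("tcp-process/sshd", "runningon", "servidor"),
    ("10.10.10.5:22", "baseip", "10.10.10.5"),
    ("10.10.10.5", "ipowner", "eth0"),
]


def build_index(edges):
    """Index each edge under both endpoints so it can be walked either way."""
    index = defaultdict(set)
    for src, rel, dst in edges:
        index[(src, rel)].add(dst)
        index[(dst, rel)].add(src)
    return index


def follow(index, start, path):
    """Follow a sequence of relationship names and return the nodes reached."""
    current = {start}
    for rel in path:
        reached = set()
        for node in current:
            reached |= index.get((node, rel), set())
        current = reached
    return current


index = build_index(EDGES)
client = "tcp-process/ssh"
print(follow(index, client, ["tcpclient", "tcpservice"]))         # the server process
print(follow(index, client, ["runningon"]))                       # the host
print(follow(index, client, ["tcpclient", "baseip", "ipowner"]))  # the NIC
```

Each path answers one "what could break this client?" question: the server it talks to, the machine it runs on, and the NIC carrying the traffic.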
The sshd process is also listening to port 22 on 10.10.10.200, but no one is connecting to it by that address. It turns out sshd listens to both the IPv4 and the IPv6 ANY addresses. Some applications (like sshd) listen on all addresses, some on just a few, and some (like BIND) look to see which addresses exist when they start and bind to each one individually (funky, eh?). Sshd is a little weird: Linux has a dual IPv4/IPv6 stack, so there's no need to listen to both the IPv6 :: and the IPv4 0.0.0.0 ANY addresses, but apparently sshd has two separate sockets - one for IPv4 clients and one for IPv6 clients.
This data comes from our tcplisteners and tcpclients discovery agents - shell scripts that use netstat and /proc to grab cool stuff...
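To give a feel for what /proc has to offer here, below is a rough Python sketch that picks the listening IPv4 sockets out of text in the format of Linux's /proc/net/tcp. It is not the actual discovery agent (those are shell scripts); the function names and the sample text are invented for illustration, and the address decoding assumes a little-endian machine such as x86.

```python
import socket
import struct

TCP_LISTEN = "0A"  # socket state code for LISTEN in /proc/net/tcp


def decode_ipv4(hex_addr):
    """Decode a field like '0100007F:0016' into ('127.0.0.1', 22)."""
    ip_hex, port_hex = hex_addr.split(":")
    # Assumption: little-endian host, so the address bytes appear reversed.
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def listening_sockets(proc_net_tcp_text):
    """Return (ip, port) pairs for sockets in the LISTEN state."""
    listeners = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) > 3 and fields[3] == TCP_LISTEN:
            listeners.append(decode_ipv4(fields[1]))
    return listeners


# Made-up sample: one listener on 0.0.0.0:22 and one established connection.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 11111 1 0000000000000000 100 0 0 10 0
   1: 050A0A0A:0016 050A0A0A:C350 01 00000000:00000000 02:000A0000 00000000     0        0 22222 2 0000000000000000 20 4 30 10 -1
"""

print(listening_sockets(SAMPLE))
```

On a Linux box you could feed it the real thing by passing in the contents of /proc/net/tcp. Matching each socket to its owning process takes an extra step that this sketch leaves out.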
So, what's your take on this? Will it be useful to you? What do you think we missed? What would be your favorite thing for us to discover?