I haven't written much lately because I've been busy writing tons of code for the new Assimilation Monitoring Project. That project will start showing up here on my blog much more frequently, beginning with this post.
From the perspective of managing a set of applications, a single-tenant data center, or a multi-tenant data center, perhaps the most interesting, useful, and fundamental information one can collect is dependencies.
That is, this service depends on this server and this IP address. The IP address depends on the network interface that provides it, the switch that interface is attached to, and the switches upstream from that switch.
The server depends on its fans, its power, and room cooling. Its power depends on one or more UPSes; its cooling depends on a series of chillers and/or air movers.
This service (perhaps a web server) might depend on a J2EE service or an Enterprise Service Bus. These services might further depend on a database, which might depend on a series of SAN connections to one or more SAN switches, which in turn might depend on one or more SAN arrays.
When a cascading failure occurs, this kind of information helps identify which resource(s) are most likely to be the root cause of the overall failure.
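To show how dependency data narrows down a root cause, here is a minimal sketch of one simple heuristic: among the resources that have failed, the likely root causes are those whose own direct dependencies are all still healthy. The resource names and the `DEPENDS_ON` map are hypothetical, loosely following the web server / J2EE / database / SAN chain described above; this is an illustration, not how the Assimilation Project actually does it.

```python
# Hypothetical dependency map: each resource -> resources it directly depends on.
DEPENDS_ON = {
    "web-server": ["j2ee-service", "server-1", "ip-10.0.0.5"],
    "j2ee-service": ["database"],
    "database": ["san-switch-a"],
    "san-switch-a": ["san-array-1"],
    "san-array-1": [],
    "server-1": ["ups-1"],
    "ip-10.0.0.5": ["eth0"],
    "eth0": ["switch-1"],
    "switch-1": [],
    "ups-1": [],
}


def root_cause_candidates(failed, depends_on):
    """Return failed resources none of whose direct dependencies have failed.

    Heuristic: if a failed resource depends only on healthy resources,
    its failure cannot be explained by anything beneath it.
    """
    failed = set(failed)
    return sorted(
        r for r in failed
        if not any(dep in failed for dep in depends_on.get(r, []))
    )


failed = {"web-server", "j2ee-service", "database", "san-switch-a"}
print(root_cause_candidates(failed, DEPENDS_ON))  # ['san-switch-a']
```

Four alarms collapse into one suspect: the SAN switch, because everything else that failed sits above it in the dependency chain.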
Part of what the Assimilation Project will accomplish is to collect this data using Continuous Stealth Discovery™. I'll write more about that later. But regardless of how this data is collected, it's easy to see that having it available and accurate is important for managing computers in a data center.
This information is most naturally represented as a graph. Although it is mostly your typical Directed Acyclic Graph (DAG), nothing guarantees that the data is acyclic.
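Because cycles can show up in real dependency data, any code that walks the graph has to cope with them. As a minimal sketch (the graph shape and node names are hypothetical), here is a standard three-color depth-first search that either confirms the dependency graph is a DAG or returns one cycle it found:

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of nodes, or None if the graph is a DAG.

    Iterative depth-first search with three colors:
    WHITE = unvisited, GRAY = on the current DFS path, BLACK = finished.
    Reaching a GRAY node again means a back edge, i.e. a cycle.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    for deps in graph.values():
        for d in deps:
            color.setdefault(d, WHITE)  # include nodes that only appear as targets
    for start in list(color):
        if color[start] != WHITE:
            continue
        path = [start]
        stack = [iter(graph.get(start, []))]
        color[start] = GRAY
        while stack:
            child = next(stack[-1], None)
            if child is None:
                # Every dependency of path[-1] has been explored.
                color[path.pop()] = BLACK
                stack.pop()
            elif color[child] == GRAY:
                # child is already on the current path: report the loop.
                return path[path.index(child):] + [child]
            elif color[child] == WHITE:
                color[child] = GRAY
                path.append(child)
                stack.append(iter(graph.get(child, [])))
            # BLACK children were fully explored earlier; skip them.
    return None


print(find_cycle({"app": ["db"], "db": ["san"], "san": []}))
print(find_cycle({"web": ["dns"], "dns": ["ldap"], "ldap": ["dns"]}))
```

The second call finds the mutual dependency between the hypothetical `dns` and `ldap` services, the kind of loop that breaks the "it's just a DAG" assumption.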
Moreover, the structure of the dependencies is not completely predictable in advance, and some of the annotations on links (like the information from LLDP or CDP) can be complex, vendor-specific, and version-specific.
If you've spent time chasing dependencies in a relational database, you already know that computing the transitive closure of a dependency graph there is slow, painful, and ugly. Each time you want to traverse another link, you wind up performing a join. And the cost of that join doesn't depend on the size of the subgraph of dependencies you care about, but on the size of the overall database. The bigger your data center (and the more dependencies it has), the longer it takes to chase the next link in the path.
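To make the join-per-link pattern concrete, here is a minimal sketch using Python's built-in `sqlite3` module and a recursive common table expression (`WITH RECURSIVE`). The `depends_on` table, its columns, and the resource names are hypothetical. Each recursive step joins the dependencies found so far back against the whole `depends_on` table (through its index), so every hop goes back to a structure that covers the entire data center.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per "resource depends on dependency" link.
conn.execute("CREATE TABLE depends_on (resource TEXT, dependency TEXT)")
conn.execute("CREATE INDEX depends_on_resource ON depends_on(resource)")
conn.executemany(
    "INSERT INTO depends_on VALUES (?, ?)",
    [
        ("web-server", "j2ee-service"),
        ("j2ee-service", "database"),
        ("database", "san-switch-a"),
        ("san-switch-a", "san-array-1"),
        ("web-server", "server-1"),
        ("server-1", "ups-1"),
    ],
)

# Each recursive step is another join against the full depends_on table.
# UNION (rather than UNION ALL) discards rows already produced, so a
# dependency cycle cannot make the recursion run forever.
CLOSURE_SQL = """
WITH RECURSIVE closure(name) AS (
    SELECT dependency FROM depends_on WHERE resource = ?
    UNION
    SELECT d.dependency
    FROM depends_on AS d JOIN closure AS c ON d.resource = c.name
)
SELECT name FROM closure ORDER BY name
"""


def all_dependencies(conn, resource):
    """Return the sorted transitive dependencies of resource."""
    return [row[0] for row in conn.execute(CLOSURE_SQL, (resource,))]


print(all_dependencies(conn, "web-server"))
# ['database', 'j2ee-service', 'san-array-1', 'san-switch-a', 'server-1', 'ups-1']
```

The query is correct, but it is the relational database doing one join per level of the dependency chain, which is exactly the cost described above.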
This speed argument applies to any database (Cassandra, MongoDB, etc.) where you have to go back to the index of all the things each time you want to follow the next link in the subgraph.
What also seems obvious is that the most natural way to store this kind of graph data is in a graph database. The speed of this kind of query in a graph database depends only on the size of the subgraph, not on the size of the entire graph for the enterprise. Moreover, with graph query languages like Cypher, you can write a query for this kind of closure that finds the entire subgraph in one trip to the database.
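The reason the graph version scales with the subgraph is that each node holds direct references to its neighbors, so following a link needs no global index lookup (Neo4j describes this property as "index-free adjacency"). Here is a minimal in-memory sketch of that idea in plain Python; the `Resource` class and `dependency_closure` function are hypothetical illustrations, not Neo4j's API.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can go in a set
class Resource:
    """A node that stores direct references to the nodes it depends on."""
    name: str
    depends_on: list = field(default_factory=list)


def dependency_closure(root):
    """Return the sorted names of all transitive dependencies of root.

    Following a link just dereferences an object reference, so the work
    done is proportional to the reachable subgraph, no matter how many
    other Resource objects exist.
    """
    seen = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for dep in node.depends_on:
            if dep not in seen:  # the seen set also protects against cycles
                seen.add(dep)
                stack.append(dep)
    seen.discard(root)
    return sorted(r.name for r in seen)


ups = Resource("ups-1")
server = Resource("server-1", [ups])
db = Resource("database", [server])
web = Resource("web-server", [db, server])
# Lots of unrelated resources elsewhere in the data center; never visited.
others = [Resource(f"host-{i}", [ups]) for i in range(100_000)]
print(dependency_closure(web))  # ['database', 'server-1', 'ups-1']
```

In Cypher, a comparable one-trip query would look something like `MATCH (s {name: $name})-[:DEPENDS_ON*]->(d) RETURN DISTINCT d.name`, where the `DEPENDS_ON` relationship type is again a hypothetical name.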
This is why I'm leaning strongly towards using Neo4j in the Assimilation project. It is a graph database, it's open source, it has bindings for many languages including Python, and it has several very active communities that surround it and contribute to it.
Good post. Sums up why we too are working on using Neo4j for facilities management/equipment monitoring. Would be really interested in more such posts!
Posted by: Account Deleted | 04 June 2012 at 11:03
Hi Luannem,
Thanks for your comments! Sorry I was so slow to respond to it :-(.
If you hang in there, you'll get more posts like this. I'm currently writing about one a week, and I already know what I want to say for at least 3 more weeks of posts (after today).
Because of your interests, maybe you should join the Assimilation mailing list: http://lists.community.tummy.com/cgi-bin/mailman/listinfo/assimilation
Posted by: Alan R. | 30 July 2012 at 08:10