In an earlier posting, I explained why I was looking at using the Neo4J graph database for the Assimilation Project. In the meantime, I made the decision, created a schema for it, and wrote the code to create the database. Although Neo4J is schemaless, every database needs a schema - a set of organizing principles - even if they are pretty flexible and aren't enforced by the database itself.
In this installment, I'll cover the basic schema of how I currently store information about servers in Neo4J. In later installments I'll cover other parts of the schema - like how the rings are represented, how services and dependencies are represented and so on.
Along the way, I'll risk exposing my ignorance of how to store data "properly" or "efficiently" in Neo4J. And certainly, the Assimilation CMA code doesn't yet make any pretense of being efficient or fast. Right now I just want it to work ;-)
For the server-only part of the graph, there are three basic kinds of entities represented. These are:
- Servers (operating system images)
- NICs
- IP addresses
It's pretty easy to see that this particular part of the system is hierarchical. If this were all there were to the overall schema, something like MongoDB might be a better choice. But it's not all there is to a data center...
On the right is an example taken from running tests on my home desktop machine - inappropriately named servidor...
Since this is the Assimilation system, and we monitor the servers with "nanoprobes", it seemed appropriate to name servers "Drones".
This part of the system has three indexes to help find these components. They are:
- Index of server names
- Index of MAC addresses (which point to NICs)
- Index of IP addresses
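To make the role of the three indexes concrete, here is a minimal sketch in plain Python. It is not the CMA code and it does not use Neo4j or py2neo; the `Node` class, the `add_node` helper, and the MAC and IP values are all made up for illustration. Only the server name "servidor" comes from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a graph node."""
    kind: str  # "Drone", "NIC" or "IP"
    key: str   # server name, MAC address or IP address


# One dictionary per index, keyed the same way as the three indexes above.
drone_index = {}  # server name -> Drone node
mac_index = {}    # MAC address -> NIC node
ip_index = {}     # IP address  -> IP node


def add_node(index, kind, key):
    """Return the node already indexed under key, or create and index a new one."""
    return index.setdefault(key, Node(kind, key))


drone = add_node(drone_index, "Drone", "servidor")
nic = add_node(mac_index, "NIC", "00:11:22:33:44:55")  # made-up MAC address
ip = add_node(ip_index, "IP", "10.10.10.5")            # made-up IP address

# Looking a key up again finds the existing node instead of creating a duplicate.
assert add_node(drone_index, "Drone", "servidor") is drone
```

The point of the sketch is only that each index gives a direct way to get from a name, a MAC address, or an IP address to the one node that represents it.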
As you can see I've created a variety of relationships between these various components. These are:
- nicowner - from a NIC to the Drone (server) it's installed in
- ipowner - from an IP address to the NIC that it's associated with
- iphost - from an IP address to the host that its NIC is installed in
- primary IP - from a Drone to its "primary" IP address
One thing to note about Neo4J - you can follow a link in either direction. So, for example, if you want to follow the nicowner link backward from a Drone to its NICs, that is perfectly easy to do. Another fairly obvious point is that the iphost links are redundant with the ipowner/nicowner path - but having them made writing the software a bit simpler - so I put them in. I might change my mind in the future, but they're there for now.
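The two points in that paragraph can be illustrated with a small edge list in plain Python. This is only a sketch, not how Neo4j or the CMA stores anything: the node labels, the `eth0` name, the IP address, and the `follow` / `follow_back` / `host_via_nic` helpers are all assumptions invented for the example.

```python
# Each edge is (source, relationship, target), using the four relationships above.
# Node labels and addresses are made up for illustration.
edges = [
    ("nic:eth0", "nicowner", "drone:servidor"),
    ("ip:10.10.10.5", "ipowner", "nic:eth0"),
    ("ip:10.10.10.5", "iphost", "drone:servidor"),
    ("drone:servidor", "primary IP", "ip:10.10.10.5"),
]


def follow(node, reltype):
    """Follow reltype links forward, from source to target."""
    return [dst for src, rel, dst in edges if src == node and rel == reltype]


def follow_back(node, reltype):
    """Follow reltype links backward, from target to source."""
    return [src for src, rel, dst in edges if dst == node and rel == reltype]


def host_via_nic(ip):
    """Reach the host the long way: ipowner to the NIC, then nicowner to the Drone."""
    return [host for nic in follow(ip, "ipowner") for host in follow(nic, "nicowner")]


# Backward along nicowner: from a Drone to the NICs installed in it.
nics = follow_back("drone:servidor", "nicowner")

# The iphost shortcut and the two-hop path lead to the same host.
same_host = host_via_nic("ip:10.10.10.5") == follow("ip:10.10.10.5", "iphost")

print(nics)
print(same_host)
```

The shortcut saves a hop in the code that needs the host for an IP address, at the cost of one more link to keep consistent.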
There is more to the graphs that the CMA generates now, and there will be even more in the future - and those will be covered in later blog posts.
As a closing note, I drew this graph with a little Python program I wrote that uses Nigel Small's excellent py2neo bindings for Neo4j - which I also use in the Assimilation CMA. My program is about 50 lines of code and feeds into Graphviz's cool and useful dot program - I suspect I'll get to know it a lot better after I've worked with graphs for a while ;-). I became aware of it because it draws all the source dependency graphs that Doxygen generates for the Assimilation web site. Dependencies - graphs - what an obvious combination.
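For readers who haven't used dot, here is a rough idea of what feeding it looks like. This is not my actual drawing program and it leaves out the py2neo part entirely: the `to_dot` function and the sample edges are hypothetical, and the sketch simply turns a list of (source, relationship, target) tuples into dot's text format.

```python
def to_dot(edges, name="drones"):
    """Render (source, relationship, target) tuples as a Graphviz digraph."""
    lines = [f"digraph {name} {{"]
    for src, rel, dst in edges:
        # One labeled arrow per relationship.
        lines.append(f'    "{src}" -> "{dst}" [label="{rel}"];')
    lines.append("}")
    return "\n".join(lines)


# Made-up sample data using two of the relationships described above.
sample_edges = [
    ("nic:eth0", "nicowner", "drone:servidor"),
    ("ip:10.10.10.5", "ipowner", "nic:eth0"),
]

dot_text = to_dot(sample_edges)
print(dot_text)
```

Saved under a name of your choosing, say `sketch.py`, the output can be piped into dot with something like `python sketch.py | dot -Tpng -o drones.png` to get a picture.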
As a post-postscript, I've also started a mailing list for the Assimilation Project - which you can sign up for here: http://lists.community.tummy.com/cgi-bin/mailman/listinfo/assimilation