From a monitoring perspective, one of the most exciting possibilities in the Assimilation project comes from the integration of monitoring and discovery.
We've recently implemented the rules which cause services to be monitored automatically once they're discovered. In other words, you don't have to tell the system to monitor these services; they just get monitored.
For example, if you have a rule which describes how to monitor mysql, then whenever the system sees a mysql service, it can start monitoring it without human intervention.
In this blog post, I'll describe how to tell the system to monitor services by using LSB-style init scripts. Later posts will cover more interesting ways of monitoring.
Monitoring with init scripts
Let's start simple and see how you can define rules that describe how to recognize a service and monitor it with its init script: /etc/init.d/service-name status. For this example, let's look at what it takes to recognize that a service is the secure shell daemon, and then to specify that we monitor it via an LSB-style init script. Although this isn't a particularly good way to monitor a service, it does detect when the service goes away.
The Assimilation Project's tcpdiscovery process discovers a lot about a service - what ports and IP addresses it listens on, what the binary name is, what arguments it was invoked with, what the user id and group id are, and a lot of other things as well.
So, if we want to determine that some service we've discovered is an sshd service, we could look to see if the pathname of the binary has a basename of 'sshd'. Then we would know we could monitor it with /etc/init.d/ssh status. That would normally be considered more reliable than just saying that if it's listening on port 22, it must be sshd.
In Assimilation (and Pacemaker) terminology, we call this kind of a monitoring method an LSB monitoring method. Here's how you would define a rule that does what we've been talking about. You'll notice that this is just JSON (although we allow comment lines).
{
"class": "lsb",
"type": "ssh",
"classconfig": [
["@basename()", "sshd$"]
]
}
This tells the system that this is an LSB class (init script) monitoring rule, and that the service can be monitored via /etc/init.d/ssh (type=ssh). For LSB rules, the rest of the configuration is pattern matching against the things we've discovered, to decide whether this service looks like something this rule can monitor. If you're running upstart or systemd as your init system, there are some exceptions which I'm ignoring for the sake of simplicity.
Each element of the classconfig list is a tuple of this form:
(expression-to-evaluate, regular-expression-to-match)
So, in this case, we have an expression which calls a monitoring-defined function called basename. Basename returns the basename of its argument, or if it isn't given an argument, it operates on the pathname of the binary that is providing the service. In our case, this is "/usr/bin/sshd". This function then returns "sshd". This is then matched against the (anchored) regular expression 'sshd$'. If the regular expression matches, then we know that this service can be monitored by the ssh init script.
The Assimilation system can then start a monitoring action against this service automatically - because now we know how to monitor it.
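To make the matching step concrete, here is a minimal Python sketch of how a rule like the one above could be evaluated against a discovered service. It is not the project's actual code: the `service` dictionary, its `pathname` key, and the helper names `evaluate` and `rule_matches` are all hypothetical, and the use of `re.search` is an assumption about how the pattern is applied.

```python
import json
import os
import re

# The ssh rule from this post, parsed from its JSON form.
RULE = json.loads("""
{
    "class": "lsb",
    "type": "ssh",
    "classconfig": [
        ["@basename()", "sshd$"]
    ]
}
""")

# Hypothetical discovery record; the real discovery data has many more fields.
service = {"pathname": "/usr/bin/sshd"}


def evaluate(expression, svc):
    """Evaluate the one expression form this sketch supports."""
    if expression == "@basename()":
        # With no argument, basename works on the pathname of the binary.
        return os.path.basename(svc["pathname"])
    raise ValueError("unsupported expression: " + expression)


def rule_matches(rule, svc):
    """Return True only if every (expression, regex) tuple matches."""
    for expression, pattern in rule["classconfig"]:
        value = evaluate(expression, svc)
        # Assumption: the regex may match anywhere in the value.
        if value is None or re.search(pattern, value) is None:
            return False
    return True
```

In this sketch, `rule_matches(RULE, service)` is true for the sshd binary, so the service would be monitored through the init script named by the rule's type. A binary such as /usr/sbin/mysqld fails the match and is left for other rules.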
This is the basic template for monitoring many services using init scripts. However, when the service is written in Java or Python, the binary is just the language runtime, so its basename isn't sufficient to tell us that this is the service we intended.
Here's an example for monitoring the Neo4j graph database - which is written in Java.
{
"class": "lsb",
"type": "neo4j-service",
"classconfig": [
["@basename()", "java$"],
["argv[-1]", "org\\.neo4j\\.server\\.Bootstrapper$"]
]
}
As before, we match the binary name - in this case against "java" - but we also match the final argument "argv[-1]" against the string org.neo4j.server.Bootstrapper. The double backslashes are required by JSON. The [-1] notation refers to the last element of the array - something a bit like argv[len(argv)-1]. You can refer to the next-to-the-last element as argv[-2] and so on.
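The escaping and the negative index are easy to check in Python. The sketch below parses the Neo4j rule and applies its second pattern to a made-up argument list; the `argv` contents are hypothetical, and `re.search` is again an assumption about how the pattern is applied.

```python
import json
import re

# Raw string, so the text below is exactly what would sit in the rule file.
rule_text = r"""
{
    "class": "lsb",
    "type": "neo4j-service",
    "classconfig": [
        ["@basename()", "java$"],
        ["argv[-1]", "org\\.neo4j\\.server\\.Bootstrapper$"]
    ]
}
"""
rule = json.loads(rule_text)

# JSON decoding turns each doubled backslash into a single one,
# leaving a regex in which every dot is a literal dot.
pattern = rule["classconfig"][1][1]

# Hypothetical argument list for a running Neo4j server.
argv = ["/usr/bin/java", "-server", "org.neo4j.server.Bootstrapper"]

# argv[-1] is the last element, the same as argv[len(argv) - 1].
last_argument = argv[-1]
matched = re.search(pattern, last_argument) is not None
```

Because the dots are escaped, a string like orgXneo4jXserverXBootstrapper would not match, which is the point of the extra backslashes.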
The name 'argv' is the name of the argument list for the listening process in the Neo4j graph node representing this service. Names like argv are evaluated in the context of both the service node and the server node. If a name is defined as an attribute of either node, the resulting value is used. If the attribute value is a JSON string, then you can also write expressions that look for values inside the JSON.
Expressions consist of names like 'argv' or argv[1], or function calls like @basename. The currently-defined set of functions is shown below:
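One way to picture this name lookup is as a search through an ordered list of attribute dictionaries. The Python sketch below is an illustration only: the function `lookup`, the attribute names other than argv, and the choice to try the service node before the server node are all assumptions, and it does not cover looking inside JSON-valued attributes.

```python
def lookup(name, contexts):
    """Return the value of name from the first context that defines it."""
    for attributes in contexts:
        if name in attributes:
            return attributes[name]
    return None  # the name is not defined in any context


# Hypothetical attributes for the two graph nodes.
service_node = {"argv": ["/usr/bin/java", "org.neo4j.server.Bootstrapper"]}
server_node = {"hostname": "db01"}

# Assumed search order: the service node first, then the server node.
contexts = [service_node, server_node]
argv = lookup("argv", contexts)
hostname = lookup("hostname", contexts)
```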
- argequals searches a list for an argument of the form name=value. The name is given by the argument in args, and the list 'arglist' is assumed to be the list of arguments. If there are two arguments in args, then the first argument is the array value to search in for the name=value string instead of 'arglist'.
- basename returns the basename from a pathname. If no pathname is supplied, then the executable name is assumed.
- dirname returns the directory name from a pathname. If no pathname is supplied, then the executable name is assumed.
- flagvalue returns the value after a flag - for example flagvalue(--debug) would return 5 when examining --debug 5 in the argument list.
- OR returns the value of its first non-NULL argument.
- serviceip searches discovery information for a suitable concrete IP address for a service. The optional argument to this function tells it an expression that provides the map of IP/port combinations discovered for this service. The IP address it returns corresponds to the port returned by serviceport(). When possible, it returns an IPv4 address.
- serviceipport searches discovery information for a suitable ip:port combination. The optional argument to this function tells it an expression that provides the map of IP/port discovery information for this service. The return value is a legal ip:port combination for the given address type (IPv4 or IPv6). It defaults to the lowest port number being listened to by the service. When possible it returns an IPv4 address.
- serviceport searches discovery information for a suitable port for a service. The optional argument to this function tells it an expression that provides the hash table (dict/map) of IP/port discovery information for this service. If a service is listening on multiple ports, it defaults to the lowest port number being listened to by the service.
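To give a feel for what these functions do, here is a rough Python sketch of a few of them. These are not the project's implementations. The signatures are simplified, `first_non_null` stands in for OR, the idea that argequals returns the value part is an assumption, and the `endpoints` list of (ip, port) pairs is a hypothetical stand-in for the real discovery map. The sketch also does not filter out wildcard addresses.

```python
import ipaddress


def flagvalue(flag, arglist):
    """Return the argument that follows flag, or None if there is none."""
    for index, argument in enumerate(arglist[:-1]):
        if argument == flag:
            return arglist[index + 1]
    return None


def argequals(name, arglist):
    """Find an argument of the form name=value and return its value."""
    prefix = name + "="
    for argument in arglist:
        if argument.startswith(prefix):
            return argument[len(prefix):]
    return None


def first_non_null(*values):
    """Stand-in for OR: return the first argument that is not None."""
    for value in values:
        if value is not None:
            return value
    return None


def serviceport(endpoints):
    """Pick the lowest port the service listens on."""
    return min(port for _ip, port in endpoints)


def serviceip(endpoints):
    """Pick an address for the chosen port, preferring IPv4."""
    port = serviceport(endpoints)
    candidates = [ipaddress.ip_address(ip) for ip, p in endpoints if p == port]
    candidates.sort(key=lambda address: address.version)  # 4 sorts before 6
    return candidates[0]


def serviceipport(endpoints):
    """Format the chosen address and port; IPv6 addresses need brackets."""
    address = serviceip(endpoints)
    port = serviceport(endpoints)
    if address.version == 6:
        return "[{}]:{}".format(address, port)
    return "{}:{}".format(address, port)
```

For an argument list containing --debug 5, `flagvalue("--debug", arglist)` yields the string "5". For a service listening on both ::1 and 10.0.0.5 at port 22, `serviceipport` picks the IPv4 address on the lowest port.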
The reasons for some of these functions will become apparent in later blog posts.
As noted earlier, this is not the most interesting kind of monitoring to do (in fact, it's usually very minimal). Later articles will cover the same ground for more interesting (and more complex) kinds of monitoring.
I have a few questions for you readers out there. Does this make sense? Assuming we can handle better monitoring methods (and we can), does the idea of not having to configure monitoring seem worthwhile?
Something I should have made a note of - it automatically configures _monitoring_, not _alerting_. It's much harder to figure out how to alert for a particular service than it is to figure out how to monitor it. Monitoring is apolitical and mostly independent of company, but alerting is not. Who to contact, when to contact them, what's the priority of this service? All good alerting questions. But you don't need to know the answers to these questions to monitor it.
Posted by: Alan R. | 30 December 2013 at 16:01