I've recently been discussing some nitty-gritty details of how packets are laid out on the wire with a member of the Assimilation Project, so it seemed like a good time to explain to everyone how our packets are constructed and deconstructed. This blog post describes how we lay out the bytes on the wire, with a bit of the why and the how.
Type, Length, Value messages
First off, the general low-level format is a variant of Type-Length-Value (TLV) messages. In the software, we refer to these individual TLV fields as Frames, which are in turn concatenated into a FrameSet. One or more FrameSets can be put into a packet. Our packets are UDP (with a user-level retransmission protocol to ensure reliable delivery). It's also worth remembering that in the Assimilation Project, communication between endpoints can consist of only a handful of packets every few months, or even every few years. By design, silence is the norm; communication is the unusual case.
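To make the TLV idea concrete, here is a minimal Python sketch of encoding a Frame and splitting concatenated Frames back apart. The header layout (a 16-bit type and a 16-bit length in network byte order) and the type codes 10 and 11 are assumptions for illustration only, not the project's actual wire format, and the FrameSet's own header is ignored.

```python
import struct

# Hypothetical header layout for illustration only: a 16-bit type and a
# 16-bit length, both in network (big-endian) byte order.
HEADER = struct.Struct(">HH")


def encode_frame(ftype: int, value: bytes) -> bytes:
    """Encode one TLV Frame: type, length of the value, then the value bytes."""
    return HEADER.pack(ftype, len(value)) + value


def decode_frames(data: bytes) -> list[tuple[int, bytes]]:
    """Split concatenated Frames back into (type, value) pairs, first to last."""
    frames = []
    offset = 0
    while offset < len(data):
        if offset + HEADER.size > len(data):
            raise ValueError("truncated Frame header")
        ftype, length = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(data):
            raise ValueError("Frame length runs past the end of the data")
        frames.append((ftype, data[start:end]))
        offset = end
    return frames


# The Frames of a FrameSet, concatenated (type codes 10 and 11 are made up).
frameset = encode_frame(10, b"web01\0") + encode_frame(11, b'{"os": "linux"}\0')
print(decode_frames(frameset))
```

Because every Frame carries its own length, a receiver can step over Frames it doesn't need to interpret.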
Each FrameSet has a type of its own. There are three "magic" classes of TLV Frames which, if present, must appear in this order:
- SignFrame - Digital Signature Frame (mandatory) - indicates the type of signature and provides the signature data. Signatures are not yet cryptographically secure (that's coming).
- CryptFrame - Encryption Frame (optional) - indicates what type of encryption was used to encrypt the Frames that follow it.
- CompressFrame - Compression Frame (optional) - indicates what kind of compression was used to compress the Frames that follow it.
If the packet is neither encrypted nor compressed, then it will only have a digital signature frame before the "payload" frames.
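The ordering rule is easy to check mechanically. This minimal Python sketch works on a list of Frame class names; using plain strings in place of real Frame objects is an assumption made to keep it short.

```python
# The three "magic" Frame classes, in the only order they may appear.
MAGIC_FRAMES = ("SignFrame", "CryptFrame", "CompressFrame")


def check_magic_order(frame_classes: list[str]) -> None:
    """Raise ValueError unless the Frames start with a SignFrame, optionally
    followed by a CryptFrame and then a CompressFrame, with no magic Frames
    appearing anywhere after that."""
    if not frame_classes or frame_classes[0] != "SignFrame":
        raise ValueError("the first Frame must be a SignFrame")
    pos = 1
    # Each optional magic Frame may appear at most once, in this order.
    for optional in ("CryptFrame", "CompressFrame"):
        if pos < len(frame_classes) and frame_classes[pos] == optional:
            pos += 1
    # Everything from here on is payload; a magic Frame here is misplaced.
    for name in frame_classes[pos:]:
        if name in MAGIC_FRAMES:
            raise ValueError(f"{name} appears out of order")


check_magic_order(["SignFrame", "CstringFrame"])  # neither encrypted nor compressed
check_magic_order(["SignFrame", "CryptFrame", "CompressFrame", "IntFrame"])
```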
Other classes of frames include the following:
- SeqnoFrame - a sequence number frame (used by the reliable protocol)
- AddrFrame - a network address (any one of several types)
- CstringFrame - a NUL-terminated C-style string
- IntFrame - an integer (2-, 3-, 4- or 8-byte integers)
- IpPortFrame - an IP address with a port number (can be IPv4 or IPv6)
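As an example of a Frame subclass with a little logic of its own, here is a Python sketch of IntFrame-style encoding for the 2-, 3-, 4- and 8-byte widths listed above. It assumes unsigned, big-endian (network order) integers whose width is implied by the Frame's length field; the post doesn't specify either detail.

```python
# Integer widths listed for IntFrame.
INT_SIZES = (2, 3, 4, 8)


def encode_int(value: int, size: int) -> bytes:
    """Encode a non-negative integer as `size` big-endian bytes."""
    if size not in INT_SIZES:
        raise ValueError(f"unsupported IntFrame size: {size}")
    # int.to_bytes raises OverflowError if the value doesn't fit in `size`
    # bytes (or is negative, since it encodes unsigned by default).
    return value.to_bytes(size, "big")


def decode_int(data: bytes) -> int:
    """Decode an IntFrame value; its width is implied by the Frame length."""
    if len(data) not in INT_SIZES:
        raise ValueError(f"unsupported IntFrame size: {len(data)}")
    return int.from_bytes(data, "big")


def smallest_size(value: int) -> int:
    """Pick the narrowest supported width that can hold `value`."""
    if value < 0:
        raise ValueError("this sketch only handles unsigned integers")
    for size in INT_SIZES:
        if value < 1 << (8 * size):
            return size
    raise OverflowError("value needs more than 8 bytes")


print(encode_int(70000, smallest_size(70000)))
```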
Many TLV types also share a single Frame subclass. For example, HOSTNAMEs, JSDISCOVER, DSCJSON and RSCJSON are all TLV types that correspond to CstringFrame objects. In other words, hostnames, output from discovery operations, discovery requests and resource descriptions are all represented as strings (CstringFrames), but each has its own TLV type.
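Since the TLV type, not the Frame class, says what a value means, a demarshaller needs a table mapping each type to the class that parses it. In this hypothetical Python sketch the numeric codes for HOSTNAME, JSDISCOVER, DSCJSON and RSCJSON are invented; only the names come from the post.

```python
from dataclasses import dataclass


@dataclass
class CstringFrame:
    """A Frame whose value is a single NUL-terminated string."""
    tlv_type: int
    value: str

    @classmethod
    def from_bytes(cls, tlv_type: int, data: bytes) -> "CstringFrame":
        if not data.endswith(b"\0") or b"\0" in data[:-1]:
            raise ValueError("CstringFrame value must be one NUL-terminated string")
        return cls(tlv_type, data[:-1].decode("utf-8"))


# Hypothetical numeric codes; only the names come from the post.
HOSTNAME, JSDISCOVER, DSCJSON, RSCJSON = 20, 21, 22, 23

# Many TLV types, one Frame subclass.
FRAME_CLASS_FOR_TYPE = {
    HOSTNAME: CstringFrame,
    JSDISCOVER: CstringFrame,
    DSCJSON: CstringFrame,
    RSCJSON: CstringFrame,
}


def make_frame(tlv_type: int, data: bytes):
    """Build the right Frame object for a TLV type, keeping the type with it."""
    try:
        frame_class = FRAME_CLASS_FOR_TYPE[tlv_type]
    except KeyError:
        raise ValueError(f"unknown TLV type {tlv_type}") from None
    return frame_class.from_bytes(tlv_type, data)


host = make_frame(HOSTNAME, b"web01\0")
resource = make_frame(RSCJSON, b'{"type": "demo"}\0')
```

Keeping the TLV type on the object means the parsing code is shared while the meaning of each string stays distinct.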
Why Signatures, Encryption and Compression?
Since the nanoprobes run as root and are capable of potentially dangerous operations, it is only prudent to have cryptographically secure digital signatures (these have not yet been implemented). We expect each role (CMA, nanoprobe) to have its own signature. Although each nanoprobe could have its own unique signature, that causes headaches when nanoprobes send each other heartbeats. Per-nanoprobe signatures may eventually be implemented, but initial implementations are likely to be simpler.
The case for encryption is a little less obvious. Although most data from monitoring and discovery is not likely to be security sensitive, in some environments the data will be considered sensitive by the customer - in which case encryption is a good idea.
Compression is needed because our communication uses UDP, which limits message sizes to a little more than 60K bytes. The most voluminous data is expected to be discovery results, which are JSON. Initial experiments indicate that the compression ratio for JSON should be about 15:1, which gives a maximum uncompressed packet size of roughly 900K (60K × 15). Eventually, we may need to add packet fragmentation to accommodate larger discovery results.
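The 15:1 figure is an average, so in practice the question is whether a particular discovery result compresses small enough to fit. This Python sketch checks that, assuming zlib as the compressor (the post doesn't name one) and the IPv4 UDP payload limit of 65,507 bytes. The package list is made-up test data.

```python
import json
import zlib

# Largest UDP payload over IPv4: 65,535 bytes minus a minimal 20-byte IP
# header and the 8-byte UDP header.
MAX_UDP_PAYLOAD = 65_507


def pack_discovery(result: dict, header_bytes: int = 0) -> bytes:
    """Serialize a discovery result as JSON and zlib-compress it.

    Raises ValueError if the compressed data plus `header_bytes` of other
    Frames would still not fit in a single UDP datagram.
    """
    raw = json.dumps(result).encode("utf-8")
    packed = zlib.compress(raw, 9)
    if len(packed) + header_bytes > MAX_UDP_PAYLOAD:
        raise ValueError(f"{len(raw)} bytes of JSON compress to {len(packed)}: too big")
    return packed


# Made-up, repetitive discovery data: a bit over 100K of JSON.
packages = {"packages": [{"name": f"pkg{i}", "version": "1.0.0", "arch": "x86_64"}
                         for i in range(2000)]}
raw_size = len(json.dumps(packages).encode("utf-8"))
packed = pack_discovery(packages)
print(raw_size, len(packed), round(raw_size / len(packed), 1))
```

Data with little redundancy can fail this check regardless of its "JSON-ness", which is where fragmentation would come in.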
Packet Construction (Marshalling)
The process of constructing a message byte string from a FrameSet proceeds as follows:
- A first-to-last pass is made through the Frames to determine the message size, by adding up the Frame sizes.
- Space to hold the packet is allocated.
- A last-to-first pass is made through the Frames, placing the data from each Frame in its correct position in the message.
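The three steps above can be sketched in Python for plain (type, value) Frames. The 16-bit type and 16-bit length header is a hypothetical layout, and this version leaves out the special handling for signature, encryption and compression Frames.

```python
import struct

# Hypothetical header layout: 16-bit type, 16-bit length, network byte order.
HEADER = struct.Struct(">HH")


def marshal(frames: list[tuple[int, bytes]]) -> bytes:
    """Lay out (type, value) Frames using a sizing pass and a placement pass."""
    # Pass 1, first to last: add up the Frame sizes.
    total = sum(HEADER.size + len(value) for _, value in frames)

    # Allocate space for the whole packet once.
    packet = bytearray(total)

    # Pass 2, last to first: each Frame ends where the next one begins.
    end = total
    for ftype, value in reversed(frames):
        start = end - HEADER.size - len(value)
        HEADER.pack_into(packet, start, ftype, len(value))
        packet[start + HEADER.size:end] = value
        end = start
    return bytes(packet)


print(marshal([(1, b"ab"), (2, b"")]))
```

One consequence of walking backwards: when a Frame is written, every Frame after it is already in its final position, so a Frame that needs to transform "the rest of the message" has that data available.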
The encryption and compression Frames perform special operations. Processing things in reverse order may seem odd, but the demarshalling process will reveal the method to my madness...
When a compression Frame is encountered, it replaces the message data following it with a compressed version. If the compression method cannot (or is instructed not to) reduce the size of the message, it shifts the message up over the compression Frame, effectively deleting it. In either case, it adjusts the size of the message accordingly.
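Here is a rough Python sketch of that compression step. It works on byte strings rather than shifting data inside a preallocated buffer, and it assumes zlib plus a hypothetical CompressFrame whose one-byte value names the method; the type code and method code are invented.

```python
import struct
import zlib

HEADER = struct.Struct(">HH")   # hypothetical: 16-bit type, 16-bit length
COMPRESS_TYPE = 3               # hypothetical TLV type code for a CompressFrame
ZLIB_METHOD = 1                 # hypothetical method code meaning "zlib"


def compress_tail(prefix: bytes, tail: bytes) -> bytes:
    """Apply a CompressFrame that sits between `prefix` and `tail`.

    `tail` holds the already-marshalled Frames that follow the CompressFrame.
    If compressing them saves space, return prefix + CompressFrame + compressed
    tail. Otherwise drop the CompressFrame and return prefix + tail unchanged.
    """
    frame = HEADER.pack(COMPRESS_TYPE, 1) + bytes([ZLIB_METHOD])
    packed = zlib.compress(tail)
    if len(frame) + len(packed) >= len(tail):
        # No saving: the "shift the message up over the compression
        # Frame" case, which deletes the Frame.
        return prefix + tail
    return prefix + frame + packed
```

For example, `compress_tail(b"SIG", b"ab")` returns `b"SIGab"`, because two bytes can't be made smaller once the Frame header is counted.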
When an encryption Frame is encountered, it replaces the message data after the encryption Frame with an encrypted version. It is important that the packet be compressed before it is encrypted, because encrypted data doesn't compress well. The last-to-first pass guarantees this order: the compression Frame comes after the encryption Frame, so it is processed first.
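The reason the order matters can be seen in a few lines of Python. The "cipher" here is a toy stand-in (XOR with a SHAKE-256 keystream) chosen only because it is in the standard library; it is not secure and is not what the project uses. zlib is likewise an assumption.

```python
import hashlib
import zlib


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR `data` with a SHAKE-256 keystream derived from `key`.

    For illustration only: NOT secure (there is no nonce, so every message
    reuses the same keystream). XOR is its own inverse, so this also decrypts.
    """
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


key = b"demo key"
payload = b'{"hostname": "web01", "os": "linux"}' * 200

# Compress first, then encrypt: the order the last-to-first pass produces.
compress_then_encrypt = toy_encrypt(key, zlib.compress(payload))
# The wrong order: ciphertext looks random, so zlib finds nothing to squeeze.
encrypt_then_compress = zlib.compress(toy_encrypt(key, payload))

print(len(payload), len(compress_then_encrypt), len(encrypt_then_compress))
```

Any reasonable cipher produces output that looks random, so the same effect holds for real encryption.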
Packet De-Construction (Demarshalling)
The process of demarshalling a message mirrors the marshalling process. A first-to-last pass is made through the message, creating Frames that correspond to the various elements of the message, and the following rules are observed:
- The digital signature is checked, and the role associated with that signature is determined and stored away (i.e., whether this packet came from the CMA or from a nanoprobe).
- If the next frame is an encryption frame, then the remainder of the message is replaced by a decrypted form of itself.
- If the next frame is a compression frame, then the rest of the message is replaced by a decompressed version of itself.
- The remaining frames are then demarshalled in the usual way.
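Putting those rules together, here is a hypothetical end-to-end Python sketch. Several things are swapped in for illustration: HMAC-SHA256 with one key per role stands in for the real (not yet secure) signature scheme, a toy XOR keystream stands in for encryption, zlib is the compressor, and all type codes and keys are invented. Because the signature is checked before decryption, the sketch assumes it covers everything after the SignFrame, as sent.

```python
import hashlib
import hmac
import struct
import zlib

HEADER = struct.Struct(">HH")      # hypothetical: 16-bit type, 16-bit length
SIGN, CRYPT, COMPRESS = 1, 2, 3    # hypothetical type codes for the magic Frames
ROLE_KEYS = {"cma": b"cma-demo-key", "nanoprobe": b"nanoprobe-demo-key"}  # hypothetical
CRYPT_KEY = b"crypt-demo-key"      # hypothetical shared encryption key


def frame(ftype: int, value: bytes = b"") -> bytes:
    return HEADER.pack(ftype, len(value)) + value


def read_frame(data: bytes, offset: int) -> tuple[int, bytes, int]:
    """Return (type, value, offset of the next Frame)."""
    if offset + HEADER.size > len(data):
        raise ValueError("truncated Frame header")
    ftype, length = HEADER.unpack_from(data, offset)
    start = offset + HEADER.size
    if start + length > len(data):
        raise ValueError("truncated Frame value")
    return ftype, data[start:start + length], start + length


def peek_type(data: bytes):
    return HEADER.unpack_from(data, 0)[0] if len(data) >= HEADER.size else None


def toy_crypt(data: bytes) -> bytes:
    """XOR with a SHAKE-256 keystream: symmetric and NOT secure (demo only)."""
    stream = hashlib.shake_256(CRYPT_KEY).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def sign(role: str, body: bytes) -> bytes:
    return hmac.new(ROLE_KEYS[role], body, hashlib.sha256).digest()


def demarshal(packet: bytes) -> tuple[str, list[tuple[int, bytes]]]:
    # The SignFrame comes first; whichever role's key verifies it is the sender.
    ftype, signature, offset = read_frame(packet, 0)
    if ftype != SIGN:
        raise ValueError("packet does not start with a SignFrame")
    body = packet[offset:]
    role = next((r for r in ROLE_KEYS
                 if hmac.compare_digest(signature, sign(r, body))), None)
    if role is None:
        raise ValueError("signature does not match any known role")

    # A CryptFrame means everything after it is encrypted.
    if peek_type(body) == CRYPT:
        _, _, offset = read_frame(body, 0)
        body = toy_crypt(body[offset:])

    # A CompressFrame means everything after it is compressed.
    if peek_type(body) == COMPRESS:
        _, _, offset = read_frame(body, 0)
        body = zlib.decompress(body[offset:])

    # What's left are ordinary payload Frames, read first to last.
    frames, offset = [], 0
    while offset < len(body):
        ftype, value, offset = read_frame(body, offset)
        frames.append((ftype, value))
    return role, frames


# Build a packet the way marshalling would: compress, then encrypt, then sign.
payload = frame(10, b"web01\0")
body = frame(CRYPT) + toy_crypt(frame(COMPRESS) + zlib.compress(payload))
packet = frame(SIGN, sign("nanoprobe", body)) + body
print(demarshal(packet))
```

The magic Frames carry empty values here for brevity; real ones record which signature, encryption or compression method was used.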
The signature, encryption and compression frames are special cases, but because they are required to occur in a fixed order, the amount of special-case code is minimized.
This isn't exactly rocket science, but by doing things in the right order, the whole process stays reasonably straightforward, which in my book is a virtue.
Does this make sense to you? How could the design, or this description of it, be improved?