I'm still on track for a first release of the Assimilation code by the end of the year. But there is one last interesting (meaning tricky) feature to write before this release. All communication is over UDP, which means the OS doesn't guarantee packet delivery, so we need to do that ourselves. From an availability perspective, we need to acknowledge packets at the application layer, and high throughput is not an issue, so nothing much is lost by using UDP (the reasons for that are worthy of their own post). The most interesting part of this is that our protocol needs to be resilient to replay attacks. This post explains what replay attacks are, and how we plan on eliminating them.
First off - what's a replay attack?
A replay attack is when a series of messages is captured and then played back later. Because the replayed messages are identical to the originals, they will be acted on exactly like the originals unless the receiver pays some kind of special attention. Because of their connectionless nature, UDP communications are especially susceptible to this kind of attack. Digital signatures keep an adversary from sending arbitrary messages, but a captured message still carries a valid signature. So an adversary who has a way of capturing messages can replay them and make the system misbehave - possibly in seriously bad ways.
How to deal with replay attacks?
The computer science literature has a number of scholarly articles on replay attacks. But the essence is simple - recognize when a packet has already been acted on. There are a variety of different methods proposed in the literature, but for my purposes, I'm going to concentrate on two variants of one method - the one Linux-HA uses.
Since we are implementing our own protocol, we have control over things like sequence numbers. If you could ensure that sequence numbers go up forever, without ever wrapping around or resetting, then you could easily tell whether a particular packet had been seen before: any packet whose sequence number is not higher than the last one you accepted is old. For a variety of reasons, the Linux-HA heartbeat protocol implements that with a generation number combined with a sequence number. The sequence number resets to zero whenever the protocol is restarted, and the generation number is guaranteed to increase with every restart.
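To make the generation-plus-sequence idea concrete, here is a minimal sketch of the receiving side. It is not the Linux-HA or Assimilation code. The class name `ReplayFilter`, the `accept` method, and the use of a string as the sender identity are all made up for illustration. It also assumes packets from a given sender are accepted strictly in increasing order, so a packet that merely arrives late is rejected along with genuine replays. A real reliable protocol would need to decide how to treat reordering.

```python
from typing import Dict, Optional, Tuple


class ReplayFilter:
    """Tracks the highest (generation, sequence) pair accepted from each sender.

    Hypothetical helper for illustration only.
    """

    def __init__(self) -> None:
        self._last_seen: Dict[str, Tuple[int, int]] = {}

    def accept(self, sender: str, generation: int, sequence: int) -> bool:
        """Return True if the packet is new, False if it looks like a replay."""
        stamp = (generation, sequence)
        last: Optional[Tuple[int, int]] = self._last_seen.get(sender)
        # Tuples compare lexicographically: generation first, then sequence.
        # Anything not strictly newer than the last accepted packet is dropped.
        if last is not None and stamp <= last:
            return False
        self._last_seen[sender] = stamp
        return True
```

Because the generation number is compared first, a sender that restarts with a higher generation and sequence number zero is still accepted, while packets captured under the old generation are rejected no matter how high their sequence numbers were.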
How to manage generation numbers
In Linux-HA, there are two ways to manage generation numbers. One is to store the generation number on persistent storage (i.e., disk) and then to increment it whenever the protocol restarts. This works great if you have persistent storage (some systems don't), and you never restore from backups. The other method is to set the generation number to the current time since the UNIX epoch. This works great as long as you never reboot and set the clock back to a time earlier than its value at the previous protocol restart. The Linux-HA heartbeat protocol implements both of these methods.
For the Assimilation project, I plan on implementing a combination of these two approaches: use a number based on the current time, but also store it on disk. If, when you restart the protocol, the current time is less than the previous generation number, set the generation number to one greater than the previous generation number. This can also fail - but only if you have no persistent storage (or do a restore) and you also set the clock back to before the time it had at the last protocol restart. This combination is an improvement over either method by itself, because both things have to go wrong at once before a generation number can repeat.
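The combined rule can be sketched roughly as follows. This is an illustration under several assumptions, not the planned implementation. The function name `next_generation`, the one-number text file, and the `now` parameter (there only so the clock can be faked) are all hypothetical. The sketch also bumps the generation when the current time is equal to the stored value, not just less than it. That is an extra assumption, added so that two restarts within the same second don't reuse a generation number.

```python
import os
import time
from typing import Optional


def next_generation(path: str, now: Optional[int] = None) -> int:
    """Pick a generation number for this protocol restart and persist it.

    Hypothetical helper: 'path' is a file holding the previous generation.
    """
    if now is None:
        now = int(time.time())  # seconds since the UNIX epoch

    previous: Optional[int] = None
    try:
        with open(path, "r", encoding="ascii") as f:
            previous = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        # No persistent state (or unreadable state): fall back to time only.
        previous = None

    generation = now
    if previous is not None and now <= previous:
        # Clock went backwards (or did not advance): step past the old value.
        generation = previous + 1

    # Write to a temporary file and rename, so a crash mid-write
    # does not leave a truncated generation file behind.
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="ascii") as f:
        f.write(str(generation))
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)
    return generation
```

If the stored file is lost and the clock is also set back, the function has nothing to compare against and will happily return an old value, which is exactly the remaining failure case described above.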
Over the next month or so, I'm planning on implementing the reliable UDP protocol including this method of eliminating replay attacks. As a result, I'm looking for feedback on this. Do you have any suggestions on how to improve this? Are there holes in it or use cases that fail that I missed?
PS: The current release implements digital signatures, but doesn't yet implement any cryptographically useful ones.