Corfu: Global Sequencer

by simbo1905

This the third post in a series about Corfu. In the last post we outlined how to linearizable consistency without a global buffer. This introduced a write contention challenge. In this post we will get into Corfu’s simple fix for this issue. Corfu uses a global ticket service which acts as a sequencer:

Ultimately, append throughput hits a bottleneck at around 180K appends/sec; this is the maximum speed of our sequencer implementation, which is a user-space application running over TCP/IP.

and:

The evaluation for this article was done with a user-space sequencer serving 200K tokens/s, whereas our more recent user-space sequencer is capable of 500K tokens/s.

The global sequence can be a simple 64bit counter representing the next empty page of an infinite logical log file. In Java the `AtomicLong::getAndIncrement` API will do the job. When a client wants to write to the end of the file it simply invokes the global sequencer service to obtain the next number. It then targets its write at the corresponding page within the global log file.

The global sequencer gives us zero contention with a very simple and very fast service. In the Corfu paper they find that they saturate other parts of their evaluation system before the sequencer becomes a bottleneck. It is important to note is that the global sequencer isn’t required for either safety nor correctness. The sequencer is only an optimisation to avoid write contention. Safety is maintained using simple “write-once” semantics. We only need an extra bit in a header for each page to track whether a page has been written to. We then only need a trivial boolean check of that bit to obtain safety.

In a later post we will have to consider what happens if the global sequencer crashes. Theoretically the system can continue to make some process under write contention. In practice I suspect that performance would be so degraded that the knock-on problems would be insurmountable. It is probably best to fail fast when the sequencer dies until it can be replaced. Exactly how that works we will save to a later post.