slash dev slash null

simbo1905’s ramblings about computers

Tag: Paxos algorithm

Corfu: The Protocol

This is the eighth and last in a series of posts about Corfu a strongly consistent and scalable distributed log. To quote the paper:

At its base, CORFU derives its success by scaling Paxos to data center settings and providing a durable engine for totally ordering client requests at tens of Gigabits per second throughput.

It does this through a number of existing techniques:

Our design borrows bits and pieces from various existing methods, yet we invented a new solution in order to scale SMR [state machine replication] to larger cluster scales and deliver aggregate cluster throughput to hundreds of clients.

What I really like about Corfu is that it innovates by bringing existing techniques such as Paxos together into something which is a real breakthrough. The protocol itself is very simple. The established techniques it uses to implement the protocol are very simple. Yet when assembled into the solution they become extremely powerful. This post will put the pieces together and discuss the Corfu protocol.

Read the rest of this entry »

Corfu: Safety Techniques

This the sixth post in a series about Corfu. It will cover some of the basic safety techniques which I call “Write Once”, “Write-Ordering” and “Collaborating Threads”. What is remarkable about Corfu is how it uses such well-established techniques to build something extremely scalable. Read the rest of this entry »

Corfu: Copy Compaction

This the fifth post in a series about Corfu. In the last post we discussed striping a log across many mirrored sets of disks for IO concurrency. In this post we will discuss deletions from the global log and copy collection techniques to free the disk space:

CORFU makes it easy for developers to build applications over the shared log without worrying about garbage collection strategies

Copy collection will also allow us to compact the log even when we don’t use deletes which the paper doesn’t mention. Why do we need compaction under write-only usage? Writes are quantized into pages written to directly by clients in an uncoordinated manner. Under realistic workloads many writes may be much smaller than the page size. Without copy compaction Corfu would squander disk space.

Read the rest of this entry »

Corfu: Stripe & Mirror

This the fourth post in a series about Corfu. The last post introduced a global sequencer to reduce contention on writes to the end of a log file. The idea is to be able to distribute the log file across many machines. In this post we will scale up or fictional example using by striping and mirroring to achieve high performance:

[…] we can append data to the log at the aggregate bandwidth of the cluster […] Moreover, we can support reads at the aggregate cluster bandwidth. Essentially, CORFU’s design decouples ordering from I/O, extracting parallelism from the cluster for all IO while providing single-copy semantics for the shared log.

Read the rest of this entry »

 Corfu: Global Sequencer

This is the third post in a series about Corfu. In the last post we outlined how to linearizable consistency without a global buffer. This introduced a write contention challenge. In this post we will get into Corfu's simple fix for this issue. Corfu uses a global ticket service which acts as a sequencer:

Read the rest of this entry »

Corfu: Linearizable Consistency

This the second post in a series about Corfu. To quote the paper

The setting for CORFU is a data center with a large number of application servers (which we call clients) and a cluster of storage units. Our goal is to provide applications running on the clients with a shared log abstraction implemented over the storage cluster.

Why is a shared log abstraction exposed to clients useful? This post will discuss how a shared log can be used to achieve linearizable consistency. To quote the paper:

The semantics provided for read/write operations is linearizability [Herlihy and Wing 1990], which for a log means that once a log entry is fully written by a client or has been read, any future read will see the same content (unless it is reclaimed via a [delete] operation).

We will then look at a simple adaption that we will later see can be used to scale the system which introduces an additional challenge to solve.

Read the rest of this entry »

Corfu: Scaling log replication

Corfu: A distributed shared log is a paper running to 25 pages brought to my attention by The Morning Paper. What I really like about Corfu is how it borrows bits and pieces of established techniques and puts them together to get something which is a real breakthrough. Corfu achieves linearizable consistency of a fault tolerant distributed log using very simple write-once techniques. The real eye open is that it implements a distributed log interface without an IO bottleneck giving extreme performance. To quote the papers conclusion:

Despite almost forty years of research into replicated storage schemes, the only approach so far to scale up capacity and throughput has been to shard data and trade consistency for performance. In this article, we presented the CORFU system, which breaks this seeming tradeoff by organizing a cluster of drives as a single, shared log. CORFU offers single-copy semantics at cluster-scale speeds, providing a scalable source of atomicity and durability for distributed systems. CORFU’s novel client-centric design eliminates any single I/O bottleneck between numerous clients and the cluster, allowing data to stream to and from the cluster in parallel.


This series of posts will build up a simple fictional example that demonstrates the key techniques. Then I will glue it all together to describe how Corfu itself works and demystify the role Paxos plays.

Read the rest of this entry »

The Trial Of Paxos Algorithm

The Paxos Consensus Algorithm gets a bad rap. I think there is something quite beautiful, simple and intuitive hiding within Paxos; collaboration. Raft which was designed as a replacement restores the master-slave concept to supremacy. That may be a pragmatic thing to do yet I do not consider the master-slave model elegant (note: this is a subjective statement of my aesthetic taste not a statement of utility). Given all the bad press Paxos gets I worry that people will skip over it and miss out on enjoying its elegance before moving onto alternatives. They may then fail to use Paxos for the wrong reasons else avoid it in scenarios where it may be better suited than alternatives.
Fork me on GitHub
I am probably going to regret this but I am going to make the case for the defence of Paxos. I will do this in three parts. First I will have a go at explaining why I think Paxos is cute, elegant and exactly the sort of algorithm you want to truly get your head around. I will be a character witness in the court of public opinion. Second I will say that the accused, Paxos Algorithm, was framed and put on trial for a crime it did not commit. Then we will cross examine some star witnesses for the prosecution and bring in our own celebrity expert witness. Finally I will say that although Paxos should be acquitted of all charges that doesn’t imply you should be using it for every problem you have at hand.

Read the rest of this entry »