slash dev slash null

stuff about puters

Corfu: Paxos Configurator

This the seventh post in a series about the techniques used in Corfu. The techniques described this far are the basic mechanics. Corfu uses these to build a linearizable log with one final ingredient: Paxos. To quote the paper:

At its base, CORFU derives its success by scaling Paxos to data center settings and providing a durable engine for totally ordering client requests at tens of Gigabits per second throughput.

It does this by using Paxos to control the cluster configuration whilst using the simpler techniques outlined in this series of posts to scale the main transactional load.  Read the rest of this entry »

Corfu: Safety Techniques

This the sixth post in a series about Corfu. It will cover some of the basic safety techniques which I call “Write Once”, “Write-Ordering” and “Collaborating Threads”. What is remarkable about Corfu is how it uses such well-established techniques to build something extremely scalable. Read the rest of this entry »

Corfu: Copy Compaction

This the fifth post in a series about Corfu. In the last post we discussed striping a log across many mirrored sets of disks for IO concurrency. In this post we will discuss deletions from the global log and copy collection techniques to free the disk space:

CORFU makes it easy for developers to build applications over the shared log without worrying about garbage collection strategies

Copy collection will also allow us to compact the log even when we don’t use deletes which the paper doesn’t mention. Why do we need compaction under write-only usage? Writes are quantized into pages written to directly by clients in an uncoordinated manner. Under realistic workloads many writes may be much smaller than the page size. Without copy compaction Corfu would squander disk space.

Read the rest of this entry »

Corfu: Stripe & Mirror

This the fourth post in a series about Corfu. The last post introduced a global sequencer to reduce contention on writes to the end of a log file. The idea is to be able to distribute the log file across many machines. In this post we will scale up or fictional example using by striping and mirroring to achieve high performance:

[…] we can append data to the log at the aggregate bandwidth of the cluster […] Moreover, we can support reads at the aggregate cluster bandwidth. Essentially, CORFU’s design decouples ordering from I/O, extracting parallelism from the cluster for all IO while providing single-copy semantics for the shared log.

Read the rest of this entry »

 Corfu: Global Sequencer

This is the third post in a series about Corfu. In the last post we outlined how to linearizable consistency without a global buffer. This introduced a write contention challenge. In this post we will get into Corfu's simple fix for this issue. Corfu uses a global ticket service which acts as a sequencer:

Read the rest of this entry »

Corfu: Linearizable Consistency

This the second post in a series about Corfu. To quote the paper

The setting for CORFU is a data center with a large number of application servers (which we call clients) and a cluster of storage units. Our goal is to provide applications running on the clients with a shared log abstraction implemented over the storage cluster.

Why is a shared log abstraction exposed to clients useful? This post will discuss how a shared log can be used to achieve linearizable consistency. To quote the paper:

The semantics provided for read/write operations is linearizability [Herlihy and Wing 1990], which for a log means that once a log entry is fully written by a client or has been read, any future read will see the same content (unless it is reclaimed via a [delete] operation).

We will then look at a simple adaption that we will later see can be used to scale the system which introduces an additional challenge to solve.

Read the rest of this entry »

Corfu: Scaling log replication

Corfu: A distributed shared log is a paper running to 25 pages brought to my attention by The Morning Paper. What I really like about Corfu is how it borrows bits and pieces of established techniques and puts them together to get something which is a real breakthrough. Corfu achieves linearizable consistency of a fault tolerant distributed log using very simple write-once techniques. The real eye open is that it implements a distributed log interface without an IO bottleneck giving extreme performance. To quote the papers conclusion:

Despite almost forty years of research into replicated storage schemes, the only approach so far to scale up capacity and throughput has been to shard data and trade consistency for performance. In this article, we presented the CORFU system, which breaks this seeming tradeoff by organizing a cluster of drives as a single, shared log. CORFU offers single-copy semantics at cluster-scale speeds, providing a scalable source of atomicity and durability for distributed systems. CORFU’s novel client-centric design eliminates any single I/O bottleneck between numerous clients and the cluster, allowing data to stream to and from the cluster in parallel.

 

This series of posts will build up a simple fictional example that demonstrates the key techniques. Then I will glue it all together to describe how Corfu itself works and demystify the role Paxos plays.

Read the rest of this entry »

C# ASP.NET with SQLServer on Docker and Linux is now “A Thing™”

Those of you who have been following along will know that Scala on Linux is my preferred ecosystem. This past few weeks in the office I have been tinkering with the opensource C# ASP.NET ecosystem. What I came across shocked me to the dotnet core. I have posted some evidence of my findings up on GitHubRead the rest of this entry »

TRex UPaxos Experimental Code

There is now a sketch (hack!) demoing the happy path flow of a UPaxos reconfiguration written up on the TRex wiki.

Paxos Voting Weights

The last UPaxos post took a run through the reconfiguration stall avoiding aspects of the UPaxos paper. In this post, we will take a quick look at how the paper describes hot swapping of nodes using voting weights. Read the rest of this entry »