Rust and Embedded Databases for Paxos
So far on my spike into Rust we have been on a roll. Next on the bucket list is an embedded disk backed B-tree database to act as the Paxos journal for TRex. This is where I have hit my first bump in the road.
To recap Paxos needs to journal a few things to disk. We need the highest promise and the highest accepted messages flushed to disk. If a message gets lost we can potentially have that retransmitted. That fits with a disk backed B-tree. Something like Google’s LevelDB would be perfect. Only it isn’t. Huh?
LevelDB is written in C. That’s unsafe code. It would be a Very Bad Thing™ for me to tie my safe Rust library to an unsafe 3rd party library that could crash the host application. Don’t get me wrong LevelDB would be what I would probably use if it was my application. Only TRex is a library. I don’t want to have C library dependencies to my Rust library. Even if I hide it behind a trait and use C libraries in my demo apps that aren’t enough. Folks would use the C library dependency from a demo app as an endorsed approach. Crash bugs will eventually happen.
A quick scan through crates.io doesn’t come up with a mature pure Rust embedded B-tree database implementation. Bindings to C libraries like LevelDB and SQLite do exist. I can only find experimental examples of anything in Rust. This is probably because anyone wanting a production solution would run a mature C library rather than an immature Rust solution. This means that I am going to have to roll something basic myself.
Thinking it through a disk backed B-tree is a bit of a luxury. Under steady state, only a few uncommitted messages need to be held in-memory within a BTreeMap. A Write Ahead Log (WAL) can be used to journal messages before updating the BTreeMap. Crash recovery can simply scan the WAL to rebuild the in-memory map.
The only slightly tricky part of hand rolling a solution is responding to retransmission requests of committed messages. An append-only file can track the WAL position of each committed message. Retransmission responses can walk the fixed field width positions file and pull the payload to retransmit from the WAL.