A database has many parts: the query executor, networking, query parsing and optimization, schema management, etc. However, we will be starting way down the stack, in what is known as the “Buffer Manager”. The goal of the buffer manager is to efficiently manage the data that the layers above it (such as the query executor) need.

This is complicated by the fact that we’re designing our system to store more data than fits in RAM, which means the buffer manager will be moving data between disk and memory as needed. We will explore managing the disk later – this first version of our buffer manager will only be in memory.

Since we will be storing data to disk, and disks work in blocks (e.g. writes to disk are done in 4kb pages), our buffer manager will be page based. That is, it will allocate and manage pages of data of a certain size. Most databases use a fixed page size, usually between 4kb and 16kb.
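To pin that down, here is the constant we’ll assume for the rest of this post (the 4kb choice is arbitrary within that range):

// Fixed page size for now; 4kb is the low end of the typical range.
const PAGE_SIZE: usize = 4096;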

So what is the basic interface of our buffer manager?

struct BufferManager {
    // ...
}

impl BufferManager {
    fn allocate_page<T>(&self) -> Swip<T> { /* ... */ }
    unsafe fn release_page<T>(&self, swip: Swip<T>) { /* ... */ }
}

As you can see, our BufferManager’s interface is pretty limited. We will later need to add more to deal with the disk, but for now this will suffice. If you’re familiar with standard buffer managers, you’ll notice a lot missing. Usually we’d need methods for loading and pinning frames at the very least. What gives? Also, what is a Swip<T>?

Our BufferManager is based on the work in the awesome paper ‘LeanStore’. This paper describes a simple yet extremely powerful method of managing buffers. Traditionally, a buffer manager owns a hashmap that maps page ids (pids) to frames. Instead, LeanStore uses a notion of a tagged pointer (Swip), leaning on the fact that there are unused bits in a standard 64-bit pointer.

EDIT: This isn’t quite right. We need to add the ability to load from disk, which is essentially “pinning” a page.
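For contrast, the traditional pid-to-frame table looks roughly like this (a sketch, not any particular system’s code):

use std::collections::HashMap;

struct BufferFrame { /* stub; we'll meet the real BufferFrame shortly */ }

// Classical design: every access goes pid -> hashmap -> frame,
// paying for a lookup even when the page is already in memory.
struct ClassicBufferManager {
    frames: HashMap<u64 /* pid */, BufferFrame>,
}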

The idea is that a Swip that looks like a pointer is a pointer to live memory, while a Swip with a bit set in a position no real pointer would use is referencing a page id stored on disk. Thus, if you encounter the former you can immediately dereference it, while if it references something on disk, we need to load that page before dereferencing.

// Assumption: we reserve the high bit as the tag. If it is set, the
// remaining bits hold an on-disk page id rather than a pointer.
const SWIZZLE_MASK: u64 = 1 << 63;

struct Swip<T> {
    ptr: u64,
    _marker: std::marker::PhantomData<T>,
}

impl<T> Swip<T> {
    /// Safety: Caller must ensure the underlying BufferFrame is live
    unsafe fn as_buffer_frame(&self) -> &BufferFrame {
        // A set tag bit would mean this holds a page id, not a pointer.
        assert!(self.ptr & SWIZZLE_MASK == 0);
        (self.ptr as *const BufferFrame).as_ref().unwrap()
    }
}
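To make the two states easy to tell apart, we could add a couple of helpers along these lines (the method names are my own sketch, not from the paper):

impl<T> Swip<T> {
    /// True if this Swip currently holds a raw pointer into memory.
    fn is_swizzled(&self) -> bool {
        self.ptr & SWIZZLE_MASK == 0
    }

    /// The on-disk page id; only meaningful when the tag bit is set.
    fn page_id(&self) -> u64 {
        debug_assert!(!self.is_swizzled());
        self.ptr & !SWIZZLE_MASK
    }
}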

Calling as_buffer_frame is unsafe since we need to be certain that self.ptr points to the valid memory of a live BufferFrame. It’s trivial to create a bad Swip.
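For instance, nothing stops us from conjuring a Swip out of an arbitrary integer (a deliberately broken sketch):

fn make_bad_swip() -> Swip<()> {
    // This compiles fine, but calling as_buffer_frame on the result
    // is undefined behavior: 0x1000 is not a live BufferFrame.
    Swip { ptr: 0x1000, _marker: std::marker::PhantomData }
}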

So what even is a BufferFrame, and more importantly, where did the memory come from and how are we sure it’s live? Well, this is the job of our BufferManager! It is the BufferManager’s job to ensure the memory behind a Swip is valid; conversely, a Swip is the handle that higher levels use to access that memory.

Said another way, we ask for some memory, and the BufferManager provides us the memory via a Swip! You’ll also notice that Swips are generic over T. This is simply a convenience to reduce manual casting later. The Swip doesn’t actually store anything of type T (thus the PhantomData), nor does loading the reference cast to T; it returns a BufferFrame. We will see in a later chapter how the type parameter is used; for now it is safe to ignore.
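To see why the parameter is handy, imagine a node type from a later chapter (BTreeNode is a hypothetical stand-in here):

struct BTreeNode { /* keys, children, ... */ }

fn build_tree(bm: &BufferManager) {
    // The type parameter records what this page will hold; nothing is
    // cast yet, and the frame is still just raw bytes.
    let root: Swip<BTreeNode> = bm.allocate_page();
    // ...
}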

Ok, so let’s look under the hood to see where this memory comes from.

We’re drawing on insights from a LeanStore follow-up paper titled ‘Umbra’. In it, the authors describe using a large, anonymously mmap’ed region. They also show a path to supporting variable-size pages, which is fairly novel and which we will implement eventually, but we will start with a fixed page size for simplicity.

So when we create our BufferManager, we will create this mmap’ed region. As long as every Swip produced by the BufferManager points into this memory segment, a Swip remains valid so long as it doesn’t outlive the BufferManager, and we can use Rust’s lifetimes to enforce exactly that!
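One way to encode that guarantee is a lifetime parameter on the swip itself. Here is a sketch (to keep the code in this post readable, our actual Swip will stay lifetime-free for now):

use std::marker::PhantomData;

// A lifetime-tagged variant: the borrow checker now rejects any
// swip that outlives the BufferManager it was allocated from.
struct ScopedSwip<'a, T> {
    ptr: u64,
    _marker: PhantomData<&'a T>,
}

impl BufferManager {
    fn allocate_page_scoped<'a, T>(&'a self) -> ScopedSwip<'a, T> {
        // Allocation elided; same bump allocation as allocate_page below.
        unimplemented!()
    }
}

With that in mind, here is a first cut of the manager itself: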

use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use memmap2::MmapMut;

struct BufferManager {
    page_data: UnsafeCell<MmapMut>,
    page_size: usize,
    last_page: AtomicUsize,
}

impl BufferManager {
    fn new(max_memory: usize) -> Self {
        // One large anonymous mapping; every page is carved out of it.
        let mmap = MmapMut::map_anon(max_memory).unwrap();

        BufferManager {
            page_data: UnsafeCell::new(mmap),
            page_size: PAGE_SIZE, // the 4kb constant from earlier
            last_page: AtomicUsize::new(0),
        }
    }

    fn allocate_page<T>(&self) -> Swip<T> {
        // Just bump allocate for now
        // TODO: reuse after release_page()
        let page_id = self.last_page.fetch_add(1, Ordering::Relaxed);
        let start = page_id * self.page_size;

        unsafe {
            let mmap = self.page_data.get().as_mut().unwrap();
            let frame = &mut mmap[start..start + self.page_size];

            Swip {
                ptr: frame.as_mut_ptr() as u64,
                _marker: std::marker::PhantomData,
            }
        }
    }
}

From a Rust perspective, we need to create the mapped region and then wrap it in an UnsafeCell. This is because we will at times need to unsafely obtain a mutable reference to the mmap (what good would read-only zeroed-out data be!?).
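Putting it together, exercising the allocator might look like this (the 1gb budget is an arbitrary choice):

fn main() {
    // Reserve 1gb of anonymous memory up front.
    let bm = BufferManager::new(1 << 30);

    // Each call hands back a Swip pointing at a fresh 4kb slice
    // of the mmap'ed region.
    let first: Swip<()> = bm.allocate_page();
    let second: Swip<()> = bm.allocate_page();
}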

(picture of the backing storage and buffer manager)

Of course, safely accessing the memory through Swips is another whole challenge. We can’t just have threads all over the place freely writing into this memory. Therefore, we will build upon Swips to provide a safe way to read and write the underlying data. Next time.

Series

Series Intro: Building a database
Building a database: BufferManager
Building a database: Large Nodes