Ring Buffer

Introduction

First let's talk a bit about First In First Out (FIFO) data structure. A FIFO act like a queue where elements are pushed in (enqueue) and pulled out (dequeue) in the same order. The oldest element in the FIFO will always be the first to be pulled out, hence named First-In First-Out. In case you're wondering Last-In First-Out (LIFO) exists and act like a pile: add element on top, remove elements from the top.

Typically a ring buffer is a FIFO that is implemented to work on a fixed size memory region. Some ring buffer supports a dynamic number of elements and can be implemented with a linked-list (to accommodate with new elements being added to the queue).

Let's only consider ring buffers working on a fixed size memory region. Such buffer can be full, when filled with the maximum number of elements.

A ring buffer is a simple and very useful mechanism to exchange data between processes, but it is a one way communication. In special condition a ring buffer can be made lockless: data can be exchange between two processes without a lock.

Ring buffer example

Let's create a empty ring buffer of 8 empty cells.

Both read and write pointer are initialized to the first empty cell (index 0). When both read and write indices points to the same cell then the ring-buffer is /empty/: there is no elements.

The following is a graphical representation of an empty ring buffer:

  ╓ Read/Write=0
┌─╫─┬───┬───┬───┬───┬───┬───┬───┐
│   │   │   │   │   │   │   │   │
└───┴───┴───┴───┴───┴───┴───┴───┘

Pushing new elements

Let's push one element A into the ring buffer. The ring buffer is not full, we can put the new element in the cell pointed by the write index (0) and increase the write index to point on the next free cell.

After pushing two more elements (B and C), the read index is still 0 but the write index is equal to 3: the ring-buffer has three elements.

After pushing the three elements the ring buffer will look like:

  ┌ Read=0    ┌ Write=3
┌─┼─┬───┬───┬─┼─┬───┬───┬───┬───┐
│ A │ B │ C │   │   │   │   │   │
└───┴───┴───┴───┴───┴───┴───┴───┘

Pulling out elements

Let's pull one element out from the ring buffer. The ring buffer is not empty as the read index is different from the write index. The read index, equal to 0, points to the next element to be read, here A. Once this element is read, and will no longer be needed, the read index can be increased by one (the number of elements pulled-out), now the read index has the value 1.

After pulling A out the ring buffer has two elements left:

      ┌ Read=1┌ Write=3
┌───┬─┼─┬───┬─┼─┬───┬───┬───┬───┐
│   │ B │ C │   │   │   │   │   │
└───┴───┴───┴───┴───┴───┴───┴───┘

We can see that the number of elements present inside the ring buffer can be calculated by subtracting the write index by the read index.

nr_elements = write_index - read_index;

-- This is true only if the write index is greater or equal to the read index.

After pulling-out two more elements (B then C), the fifo is empty:

              ╓ Read/Write=3
┌───┬───┬───┬─╫─┬───┬───┬───┬───┐
│   │   │   │   │   │   │   │   │
└───┴───┴───┴───┴───┴───┴───┴───┘

Full condition

Let's look at what happens when a ring buffer gets full. Continuing with previous buffer state where both read and write indices equal to 3.

Let's push five more values: a,b,c,d,e. Now the write index equal to

8` and has gone beyond the last index of the underlying array (of `8` elements).

This is not really an issue, if we do a simple modulo operation on the write index we can get back to the beginning: 8 mod 8 equal to 0.

  ┌ Write=8%8 ┌ Read=3            ┌ Write=8
┌─┼─┬───┬───┬─┼─┬───┬───┬───┬───┐ │
│   │   │   │ a │ b │ c │ d │ e │
└───┴───┴───┴───┴───┴───┴───┴───┘

We also need to update the way we calculate the number of elements in the ring buffer. To handle the case where the write index can be less than the read index. We can simply do the same operation as before but modulo the array size:

nr_elements = (array_size + write_index - read_index) % array_size;

In our example: (8 + 0 - 3) % 8 which gives 5.

Now if we push three more values: f, g, h, we would have both read and write indices with the same value of 3. But this time the ring buffer is full:

              ╓ Read/Write=3
┌───┬───┬───┬─╫─┬───┬───┬───┬───┐
│ f │ g │ h │ a │ b │ c │ d │ e │
└───┴───┴───┴───┴───┴───┴───┴───┘

This edge case need to be handled, common implementation of lockless ring buffers have a naive but simple solution, which is to not fill all elements in the buffer, thus there is always a "blank" spot in the buffer.

We are going to see later why we need to use two indices and how this edge case can be dealt with without this limitation.

Lockless ring buffer

We have now explored an example on how a ring buffer is commonly used, and have found a limitation of using two indices: the difficulty to distinguish a between an empty buffer and a full buffer.

A ring buffer can be used to exchange data between two threads without ever using a lock, that is possible by strictly using two indices.

To be used without a lock, both read and write pointers, need to always be written by only one thread/process: the Writer process must only write to the write pointer; the Reader process must only write to the read pointer. In other word there can be only one thread that write into the ring buffer and only one thread that read from the same buffer.

Using all elements

I found a neat solution to use all the space available in the buffer: to use the read/write indices modulo two time the buffer size. The idea is to distinguish between an empty and a full buffer. In both case the read and write index points to the same buffer index (modulo the buffer size).

The additional space in the index number can be viewed as a "fold" number The ring buffer is full when indices refer to the same cell on different folds, empty when they are on the same fold.

And we don't need more then two folds, as indices cannot be more apart that the actual buffer size (otherwise you're overwriting stuff).

In the following diagram we can see a full ring buffer with a read index of 3 and a write index of 12:

              ┌ Read=3                        ┌ Write=12
┌───┬───┬───┬─┼─┬───┬───┬───┬───┐╌╌╌╷╌╌╌╷╌╌╌╷╌┼╌╷╌╌╌╷╌╌╌╷╌╌╌╷╌╌╌╷
│ f │ g │ h │ a │ b │ c │ d │ e │ f'╎ g'╎ h'╎   ╎   ╎   ╎   ╎   ╎
└───┴───┴───┴───┴───┴───┴───┴───┘╌╌╌╵╌╌╌╵╌╌╌╵╌╌╌╵╌╌╌╵╌╌╌╵╌╌╌╵╌╌╌╵

Implementation

see ring_buffer.h

Notes

At the time where I initially started writing this I found another article describing the exact same way of using all the space available in a ring buffer.

Memory model

Each processor architecture should define a Memory Model, this is a subtle but important topic. The Memory Model defines the guarantees on memory operations, how memory operations will be seen by other component of the system (other CPUs).

x86 has a Total Store Order (TSO), which means that every CPU in the system will see every writes in the same order as they were executed.

arm has a Weak Memory Order (WMO), which means that two CPU might not see memory writes in the same order, unless explicit memory barrier is used. This is bad in the case of a lockless ring buffer because buffer data are expected to be available in the global memory before increasing the write index, otherwise the other CPU might read garbage. The wmb (write memory barrier) arm instruction will forces all prior write operation to complete before resuming the execution.