Writing a simple pool allocator in C
1. Introduction
I found out about pool allocators some time ago, and I really liked their simplicity and high performance, so I decided to write my own. This article was initially inspired by Dmitry Soshnikov’s article. It’s also worth mentioning that this article was discussed on Hacker News.
Similarly to malloc, a pool allocator allows the user to allocate memory at runtime. The pool allocator, however, is much faster than malloc[1], at the cost of having a fixed pool size. It allows the user to allocate and free memory blocks (referred to as chunks, from now on) in O(1) constant time. This implementation also uses very little memory: when creating the pool, a very small Pool structure is allocated, along with the pool itself. Free chunks are used to store information, so the memory impact is minimal.
Lastly, I would like to mention that this article is based on my ANSI C library for pool allocation, libpool. In the future, I might make small changes to the code on that repository and they won’t be reflected here, so feel free to check it out as well.
2. Writing the initial implementation
The following implementation corresponds to version v1.0.0 of my libpool library, with some notable “visual” differences (e.g. without ANSI restrictions). Some improvements, like being able to resize/reallocate an existing pool, will also be discussed below.
2.1. The Chunk type
In our simple implementation, we will declare a Chunk type with a fixed size. In a more general implementation, we would use a chunk size specified by the programmer at runtime, but that method adds a lot of casts, and I think it would make this article more confusing.
First, let’s have a look at the definition of Chunk.
#define CHUNK_SZ 64

typedef union Chunk Chunk;

union Chunk {
    Chunk* next;
    char arr[CHUNK_SZ];
};
Notice how we are typedef’ing a union, not a struct.
If you are not familiar with C unions, they are syntactically similar to structs, but they can be used to define variables that may hold (at different times) objects of different types and sizes; as opposed to structures, which let the programmer group different data types as a single record. Unions essentially let the programmer access the data in a single area of storage as if it were storing one of many data types at a given time. When you access a member of a union variable (e.g. my_chunk.next or my_chunk.arr), the compiler is responsible for accessing the union variable as if it were the specified member type (in that example, either a pointer or an array of CHUNK_SZ characters, never both).
In this case, using a union is convenient because it lets us “interpret” the Chunk as a pointer to another Chunk depending on the context, while also specifying the real size of the Chunk (i.e. CHUNK_SZ). Also note that the size of a union is the size of the biggest possible member (on a 64-bit computer, sizeof(Chunk*) is 8 bytes, and since the arr member is 64 bytes, that will be the size of the Chunk union).
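The following small test program (a hypothetical snippet, not part of the allocator) illustrates both points: the union is as big as its largest member, and both members share the same storage.

#include <stdio.h>

#define CHUNK_SZ 64

typedef union Chunk Chunk;

union Chunk {
    Chunk* next;
    char arr[CHUNK_SZ];
};

int main(void) {
    Chunk chunk;

    /* The size of the union is the size of its biggest member, 'arr'. */
    printf("sizeof(Chunk*) = %zu\n", sizeof(Chunk*)); /* Typically 8 */
    printf("sizeof(Chunk)  = %zu\n", sizeof(Chunk));  /* 64 */

    /* Both members start at the same address: writing to 'next' overwrites
     * the first bytes of 'arr', and vice versa. */
    chunk.next = NULL;
    printf("&chunk.next = %p\n", (void*)&chunk.next);
    printf("&chunk.arr  = %p\n", (void*)chunk.arr);

    return 0;
}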
However, it’s still not clear why we would need to interpret the Chunk as a pointer depending on the context. First, we need to understand that there will be two (implicit) types of chunks:
- Free chunks: They are not in use by the program. They will use the next pointer.
- Non-free chunks: They have been allocated, so we assume they contain arbitrary user data. Specifically, in this implementation, each chunk can contain (at most) CHUNK_SZ bytes of data.
As mentioned, these types are “implicit”. This means that there isn’t any flags variable that lets us know whether a specified chunk is free; instead, we know that a chunk is free because it will be inside a linked list of free chunks. When we initialize the pool, since all chunks are free, we will use the Chunk.next pointer to link each chunk to its adjacent one, like this:
Figure 1: Four free chunks right after initializing the pool.
The gray area represents the rest of the union that is not used when accessing Chunk.next, and the incomplete red arrow represents a NULL pointer. The creation of this linked list, along with its advantages, will be explained when creating the pool below.
2.2. The Pool structure
We will also declare a Pool structure, which simply contains:
- A pointer to the first element of the linked list of free chunks, which will be used when the user allocates a chunk.
- A pointer to the first element of the chunk array, which will be used when closing the whole pool.
typedef struct Pool Pool;

struct Pool {
    Chunk* free_chunk;
    Chunk* chunk_arr;
};
Again, other members such as the chunk size might be necessary depending on the implementation; in this case it’s known at compile time.
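For reference, a runtime-sized variant might store something like the following; just a sketch of one possible layout, where the struct and member names are made up for illustration, not the code used in this article.

#include <stddef.h> /* size_t */

/* Sketch: a pool whose chunk size is chosen by the caller at runtime. The
 * chunk array becomes a plain byte buffer, and accessing it requires casts,
 * which is why this article sticks to a fixed CHUNK_SZ instead. */
struct GenericPool {
    void*  free_chunk; /* Head of the linked list of free chunks */
    void*  chunk_arr;  /* Start of the chunk array, freed when closing */
    size_t chunk_sz;   /* Size of each chunk, in bytes */
};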
2.3. Creating the pool
We will define a function for creating a Pool with an arbitrary number of chunks.
Pool* pool_new(size_t pool_sz) {
    Pool* pool = malloc(sizeof(Pool));
    if (pool == NULL)
        return NULL;

    pool->chunk_arr = pool->free_chunk = malloc(pool_sz * sizeof(Chunk));
    if (pool->chunk_arr == NULL) {
        free(pool);
        return NULL;
    }

    for (size_t i = 0; i < pool_sz - 1; i++)
        pool->chunk_arr[i].next = &pool->chunk_arr[i + 1];
    pool->chunk_arr[pool_sz - 1].next = NULL;

    return pool;
}
Here’s a brief explanation of each step:
- We allocate the Pool structure that will be returned, using malloc[2].
- We allocate the pool itself, that is, the array of Chunk unions. We initialize both chunk_arr and free_chunk pointers to the same address, since all chunks will be free by default.
- We build the linked list of free chunks. We set the .next member of the Chunk union to the address of the adjacent chunk[3], except for the last free chunk, which will point to NULL.
This is how the pool looks after being returned from pool_new:
Figure 2: Layout of a Pool structure after initialization.
And this is how the pool looks after the user has allocated two chunks. This process will be explained below, but perhaps you are starting to realize the advantages of this method:
Figure 3: Layout of a Pool structure after two allocations.
Since this implementation doesn’t support pool resizing, the only O(n) algorithm occurs when creating the pool itself, since we need to iterate over each chunk to build the linked list described above. The chunk allocation process, on the other hand, has O(1) complexity, since we always have a free chunk waiting for us at Pool.free_chunk. Freeing a chunk is also done in constant time, since we just have to prepend an element to this linked list.
2.4. Allocating chunks
Now that the pool has a pointer to a linked list of free chunks, when the user requests an allocation for a chunk, we just have to:
- Ensure that we haven’t reached the end of the list, that is, ensure the Pool.free_chunk pointer is not NULL.
- The first element of this “free chunks” list will be returned. Before that, remove it from the list by setting the start of the list (i.e. Pool.free_chunk) to what used to be the second element (i.e. Pool.free_chunk.next).
void* pool_alloc(Pool* pool) {
    if (pool == NULL || pool->free_chunk == NULL)
        return NULL;

    Chunk* result    = pool->free_chunk;
    pool->free_chunk = pool->free_chunk->next;
    return result;
}
Now the user can safely overwrite the contents of the pointer returned by pool_alloc, and it will be essentially setting the arr member of the Chunk union. This is fine, since that chunk is no longer part of our “free chunks” list.
Just to emphasize once again, we are not iterating anything, so this process is constant. Allocating and freeing chunks in constant time obviously has great advantages, especially when we have to allocate and free many times per second (e.g. many entities in each tick of a simulation).
2.5. Freeing chunks
Freeing chunks is pretty straightforward, and if you understood the previous sections, I recommend you try to write your own function.
The freeing process simply consists of adding (prepending) a chunk back into the linked list of free chunks. As you can probably tell, this is also a constant process.
void pool_free(Pool* pool, void* ptr) {
    if (pool == NULL || ptr == NULL)
        return;

    Chunk* freed     = ptr;
    freed->next      = pool->free_chunk;
    pool->free_chunk = freed;
}
For example, following the previous figure, this would be the layout after the user frees the first block of memory.
Figure 4: Layout of a Pool structure after freeing a chunk.
Notice how, unlike with arena allocators, we don’t have to free in the same order that we allocated.
2.6. Closing the pool
Finally, once the user is done with the pool itself, they should be able to free it back to the system. This is also pretty intuitive in this implementation, but it will get a bit trickier below.
void pool_close(Pool* pool) {
    if (pool == NULL)
        return;

    free(pool->chunk_arr);
    free(pool);
}
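To wrap up this first implementation, here is a minimal usage sketch; the pool size, variable names and strings are made up for the example, and the functions above are assumed to be in scope.

#include <stdio.h>
#include <string.h>

int main(void) {
    /* Create a pool with 100 chunks of CHUNK_SZ (64) bytes each. */
    Pool* pool = pool_new(100);
    if (pool == NULL)
        return 1;

    /* Each call returns a CHUNK_SZ-byte chunk in O(1). */
    char* a = pool_alloc(pool);
    char* b = pool_alloc(pool);
    strcpy(a, "Hello from chunk A");
    strcpy(b, "Hello from chunk B");
    printf("%s\n%s\n", a, b);

    /* Chunks can be freed in any order, also in O(1). */
    pool_free(pool, b);
    pool_free(pool, a);

    /* Release the pool itself once we are done with it. */
    pool_close(pool);
    return 0;
}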
3. Reallocation problems
When using a pool allocator, at some point you will probably want to be able to resize an existing pool, for example when you run out of chunks. This might not seem too hard, but there are a few caveats.
At first sight, we could reallocate a pool with a few simple steps:
- Reallocate the old chunk array (i.e. my_pool.chunk_arr).
- Link the new chunks together, just like we did when creating the pool.
- Prepend the new chunks to the list of free chunks, just like we did when freeing a previously-allocated chunk.
For example, following the previous figure, if we reallocated the pool to add two more chunks, we would (at first sight) get the following layout.
Figure 5: Layout of a Pool structure after resizing it, with two new chunks.
However, there is an important detail that is easy to miss. When we reallocate the pool (i.e. the array of chunks), the base address of the array might change, so the address of each chunk will also change. This is a problem because:
- The old pointers used to build the linked list of free chunks still point to the old array, so they become invalid. There are a few possible fixes for this, like recalculating the offsets[4] from the old base address, storing offsets instead of pointers (see the sketch at the end of this section), etc.
- The pointers we returned when the user allocated chunks also point to the old array, so they are also invalid. If the user tries to access or free these pointers, a segmentation fault might occur.
This is how the layout will probably look after the reallocation. Incomplete connections crossed out with a single line represent invalid (but non-null) pointers to the old array, which is now invalid.
Figure 6: Resizing problems: old pointers may become invalid.
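As a side note, this is roughly what the “store offsets instead of pointers” idea could look like. It’s just a sketch (reusing the CHUNK_SZ constant from earlier, with made-up type names), not the approach used in the rest of this article, and note that the pointers already handed out to the user would still become invalid.

#include <stddef.h> /* size_t */

#define NO_CHUNK ((size_t)-1) /* End-of-list marker */

/* Each free chunk stores the *index* of the next free chunk instead of its
 * address, so the free list stays valid even if realloc() moves the array. */
union IdxChunk {
    size_t next_idx; /* Index of the next free chunk, or NO_CHUNK */
    char arr[CHUNK_SZ];
};

struct IdxPool {
    union IdxChunk* chunk_arr;
    size_t free_idx; /* Index of the first free chunk, or NO_CHUNK */
    size_t pool_sz;
};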
4. Second implementation: Expanding the pool
Instead of modifying the existing chunk array, we can allocate a new array with the number of chunks we want to add and prepend them to the linked list of free chunks of an existing pool. Although this only lets the pool grow (and not shrink), I think it’s what most implementations will need. Some important details about this approach will be explained below.
The following figure shows how two different Chunk arrays could be allocated separately. The green area denotes the initial chunk array allocated inside pool_new, while the blue area denotes a different chunk array allocated when expanding the pool. The two arrays don’t necessarily have to be adjacent in memory, which is why there is no need for reallocations.
Figure 7: Different chunk arrays after expanding a pool.
Notice how we have to keep track of the start of each array, since we will need to free them separately. In the previous figure, we use two members, chunk_arr0 and chunk_arr1, to denote this, but since we would like to allow the user to expand the pool an arbitrary number of times, we should be able to keep track of an indefinite number of pointers (to the start of the chunk arrays) at runtime.
4.1. Keeping track of the array starts
For keeping track of these pointers, we will create another linked list of “array starts”. We declare an ArrayStart structure which will contain the address of the next element in the linked list (or NULL), along with the pointer to the start of each array.
typedef struct ArrayStart ArrayStart;

struct ArrayStart {
    Chunk* arr;
    ArrayStart* next;
};
Now, instead of storing a single Chunk* in the Pool structure, we store a pointer to the linked list of array starts.
struct Pool {
    Chunk* free_chunk;
    ArrayStart* array_starts; /* Updated */
};
This takes a bit more space in memory, but it’s worth it. Even if we don’t expand the pool, only the size of two more pointers would be needed: the Pool.array_starts pointer to the ArrayStart structure itself, and the (unused) ArrayStart.next member.
4.2. Changes to pool_new and pool_close
The pool_new and pool_close functions need to be modified according to our new ArrayStart structure.
When creating the pool, instead of storing the base address of the chunk array in pool->chunk_arr, we will have to allocate an ArrayStart structure and write it there.
Pool* pool_new(size_t pool_sz) {
    Pool* pool = malloc(sizeof(Pool));
    if (pool == NULL)
        return NULL;

    Chunk* arr = pool->free_chunk = malloc(pool_sz * sizeof(Chunk));
    if (arr == NULL) {
        free(pool);
        return NULL;
    }

    for (size_t i = 0; i < pool_sz - 1; i++)
        arr[i].next = &arr[i + 1];
    arr[pool_sz - 1].next = NULL;

    /* Added */
    pool->array_starts = malloc(sizeof(ArrayStart));
    if (pool->array_starts == NULL) {
        free(arr);
        free(pool);
        return NULL;
    }

    pool->array_starts->next = NULL;
    pool->array_starts->arr  = arr;

    return pool;
}
When closing the pool, we will also need to traverse this array_starts linked list, freeing each chunk array and each ArrayStart structure in the list.
void pool_close(Pool* pool) {
    if (pool == NULL)
        return;

    ArrayStart* array_start = pool->array_starts;
    while (array_start != NULL) {
        ArrayStart* next = array_start->next;
        free(array_start->arr);
        free(array_start);
        array_start = next;
    }

    free(pool);
}
4.3. Expanding without modifying the array
Now that we have a way of storing the start addresses of an indefinite number of chunk arrays, we can implement the expansion method shown in the previous figure. The expansion process is the following:
- Allocate the array of extra chunks that we are trying to add to the pool.
- Link the new chunks together, just like we did when creating the pool.
- Prepend the array of extra chunks to the “free chunks” list, just like we did when freeing chunks.
- Allocate a new ArrayStart structure, and store the start of the new chunk array in it.
- Prepend this new ArrayStart structure to the linked list of “array starts”, stored inside the Pool structure.
And this is the code corresponding to the previous steps.
bool pool_expand(Pool* pool, size_t extra_sz) {
    if (pool == NULL || extra_sz == 0)
        return false;

    /* Step 1 */
    Chunk* extra_arr = malloc(extra_sz * sizeof(Chunk));
    if (extra_arr == NULL)
        return false;

    /* Step 2 */
    for (size_t i = 0; i < extra_sz - 1; i++)
        extra_arr[i].next = &extra_arr[i + 1];

    /* Step 3 */
    extra_arr[extra_sz - 1].next = pool->free_chunk;
    pool->free_chunk = extra_arr;

    /* Step 4 */
    ArrayStart* array_start = malloc(sizeof(ArrayStart));
    if (array_start == NULL) {
        /* Undo step 3 before freeing, so the free list doesn't end up
         * pointing to freed memory. */
        pool->free_chunk = extra_arr[extra_sz - 1].next;
        free(extra_arr);
        return false;
    }

    /* Step 5 */
    array_start->arr = extra_arr;
    array_start->next = pool->array_starts;
    pool->array_starts = array_start;

    return true;
}
Just like in the previous figure, the green and blue regions represent arrays allocated independently, but their respective ArrayStart structures are also included in the diagram.
Figure 8: Layout of a pool after expanding, showing the linked list of array starts.
Naturally, this second implementation is able to allocate and free chunks with the same O(1) efficiency. The memory impact is slightly bigger, but it’s probably worth it if you ever want to resize a pool.
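For example, a caller could combine pool_alloc and pool_expand into a grow-on-demand helper. This is just a sketch; the function name and growth policy are made up for the example, and the functions above are assumed to be in scope.

/* Hypothetical helper: allocate a chunk, expanding the pool whenever it
 * runs out of free chunks. */
void* pool_alloc_or_expand(Pool* pool, size_t extra_sz) {
    void* chunk = pool_alloc(pool);
    if (chunk != NULL)
        return chunk;

    /* No free chunks left; try to append 'extra_sz' new ones. */
    if (!pool_expand(pool, extra_sz))
        return NULL;

    /* This second call can't fail, since we just added free chunks. */
    return pool_alloc(pool);
}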
5. Adding valgrind support
Valgrind is a very useful tool for debugging and profiling programs. Among other things, it lets you detect memory leaks and invalid memory accesses. Personally, I have been using it for some years for detecting memory-related bugs in programs that use the standard library (i.e. malloc, free, etc.), but I didn’t know that the valgrind framework also had support for custom allocators.
The Memcheck manual contains a lot of useful information for us, especially the Memory Pools section. This Red Hat article also contains some useful information about how the valgrind framework can be used with other kinds of allocators.
First, we will need to include valgrind’s headers. You might need to install some valgrind-devel package, depending on your distribution.
#include <valgrind/valgrind.h>
#include <valgrind/memcheck.h>
We will need to register an anchor address (i.e. the address of our Pool structure) with VALGRIND_CREATE_MEMPOOL once, in pool_new. This anchor address is needed to associate each chunk with its specific pool whenever it’s allocated or freed. We will also have to use VALGRIND_DESTROY_MEMPOOL once we are done with the pool, in pool_close.
The VALGRIND_MAKE_MEM_NOACCESS macro will be used when a memory region should be inaccessible by the user (e.g. for the Pool and ArrayStart structures), while the VALGRIND_MAKE_MEM_DEFINED macro will be used whenever we need to read or write to a NOACCESS region (e.g. when accessing the Pool.array_starts member in pool_expand). There is also a VALGRIND_MAKE_MEM_UNDEFINED macro, but we won’t need to call it manually.
Finally, we will have to use VALGRIND_MEMPOOL_ALLOC when allocating a chunk from the pool, and VALGRIND_MEMPOOL_FREE when freeing it. These macros will call VALGRIND_MAKE_MEM_UNDEFINED[5] and VALGRIND_MAKE_MEM_NOACCESS respectively, so we won’t have to worry about that.
With all that in mind, this is how our new code would look:
Pool* pool_new(size_t pool_sz) {
    /* ... */

    VALGRIND_MAKE_MEM_NOACCESS(arr, pool_sz * sizeof(Chunk));
    VALGRIND_MAKE_MEM_NOACCESS(pool->array_starts, sizeof(ArrayStart));
    VALGRIND_MAKE_MEM_NOACCESS(pool, sizeof(Pool));
    VALGRIND_CREATE_MEMPOOL(pool, 0, 0);

    return pool;
}

bool pool_expand(Pool* pool, size_t extra_sz) {
    if (pool == NULL || extra_sz == 0)
        return false;

    VALGRIND_MAKE_MEM_DEFINED(pool, sizeof(Pool));

    /* ... */

    VALGRIND_MAKE_MEM_NOACCESS(extra_arr, extra_sz * sizeof(Chunk));
    VALGRIND_MAKE_MEM_NOACCESS(array_start, sizeof(ArrayStart));
    VALGRIND_MAKE_MEM_NOACCESS(pool, sizeof(Pool));

    return true;
}

void pool_close(Pool* pool) {
    if (pool == NULL)
        return;

    VALGRIND_MAKE_MEM_DEFINED(pool, sizeof(Pool));

    ArrayStart* array_start = pool->array_starts;
    while (array_start != NULL) {
        VALGRIND_MAKE_MEM_DEFINED(array_start, sizeof(ArrayStart));
        /* ... */
    }

    VALGRIND_DESTROY_MEMPOOL(pool);
    free(pool);
}

void* pool_alloc(Pool* pool) {
    if (pool == NULL)
        return NULL;

    VALGRIND_MAKE_MEM_DEFINED(pool, sizeof(Pool));
    if (pool->free_chunk == NULL)
        return NULL;

    VALGRIND_MAKE_MEM_DEFINED(pool->free_chunk, sizeof(void**));

    /* ... */

    VALGRIND_MEMPOOL_ALLOC(pool, result, sizeof(Chunk));
    VALGRIND_MAKE_MEM_NOACCESS(pool->free_chunk, sizeof(void**));
    VALGRIND_MAKE_MEM_NOACCESS(pool, sizeof(Pool));

    return result;
}

void pool_free(Pool* pool, void* ptr) {
    if (pool == NULL || ptr == NULL)
        return;

    VALGRIND_MAKE_MEM_DEFINED(pool, sizeof(Pool));

    /* ... */

    VALGRIND_MAKE_MEM_NOACCESS(pool, sizeof(Pool));
    VALGRIND_MEMPOOL_FREE(pool, ptr);
}
Although it makes the code less readable, adding valgrind support is definitely worth it.
As I mentioned, this is the first time I have used valgrind with a non-standard allocator, so feel free to contribute if you have any suggestions or improvements.
Footnotes:
1. This depends on many factors: the malloc implementation, the hardware, the CPU load, etc. Using the benchmark.sh script from commit 9b101a7 of my libpool repo showed that malloc took ~2.25 seconds to allocate 1 million blocks of 10 kilobytes each, while libpool took ~0.75 seconds (3 times faster). Credits to kragen for his informative replies on HN.
2. We could allocate the Pool and the chunk array with a single call, but I think this would make it less readable. Furthermore, since the Pool structure is small, we could even return it directly on the stack, instead of allocating it and returning a pointer.
3. In this case, we store the “absolute address” of the adjacent chunk, but we could use less space by storing an offset relative to the start of the chunk array. This is not a big deal, though, since we will probably want to use the pool allocator with chunks bigger than a pointer in the first place.
5. If the pool (i.e. the anchor address) was registered with the is_zeroed argument set, valgrind will mark the chunk as DEFINED, instead of UNDEFINED.