Jun 29, 2015

Memory Stomp Allocator

A decade ago a wrote a stomp allocator for Unreal Engine 4. I wrote it out of necessity when I was working on one of the cancelled iterations of Dead Island 2. To avoid introducing engine divergences (a topic I will probably write about at some point) I made the change public and provided a pull request for it. The pull request was accepted, and it was useful enough that Epic wrote about it in blog post and even talked about it on a stream. Being one of those tools that I have always relied on for more than two decades I decided to bring back the old post about it with some modifications.

Symptoms of a memory stomp

The symptoms of a memory stomp could be as clear as an explicit error about a corrupted heap, or as complex as unexpected behavior without any crash at all. That’s why they can be so hard to catch. Just as an example, I had to deal with the case where some CPU particles were behaving in a way that they would end up at a wrong position and with color change every now and then for a few frames. After looking for a logic or content error, I was able to determine that the issue had nothing to do with the code or content changing the color or transform. The issue was a memory stomp on unrelated code which ended up in the same arena. Another example is when pointers get overwritten. In that case you might waste some time thinking that there was a memory stomp of the data being pointed to when in fact the pointer itself was stomped. At that point you are just getting a wrong view of unrelated data. Depending on the nature of the code accessing the pointer, it may or may not crash. Still, the symptom would be similar to that of overwritten data.

Common types of memory stomps

Memory overrun. Reading or writing off past the end of an allocation.

1constexpr size_t numBytes = 1u << 6u;
2uint8_t* const testBytes = new uint8_t[numBytes];
3const size_t someVariable = 1u << 7u; // Memory overrun.
4testBytes[someVariable] = 1u;
5delete[] testBytes;

Memory underrun. Reading or writing memory before the beginning of an allocation.

1constexpr size_t numBytes = 1u << 6u;
2uint8_t* const testBytes = new uint8_t[numBytes];
3const int someVariable = -128; // Memory underrun.
4testBytes[someVariable] = 1u;
5delete[] testBytes;

Operation after free. Reading or writing memory that was already freed.

1constexpr size_t numBytes = 1u << 6u;
2uint8_t* const testBytes = new uint8_t[numBytes];
3delete[] testBytes;
4testBytes[0u] = 1u; // Operation after free.

Detecting stomps

The stomp allocator work by using the memory protection features in the operating systems. The OS allows tagging different memory pages with different access modes. The access modes are OS-dependent, but they all offer four basic protection modes: execute, read, write, and no-access. Based on that the stomp allocator allocates at least two pages. One page where the actual allocation lives, and an extra page that will be used to detect the memory stomp. The allocation requested is pushed to the end of the last valid page in such way that any read or write beyond that point would cause a segmentation fault since you would be trying to access a protected page. Since people say that a picture is worth a thousand words, here is a reference diagram:


Diagram not to scale. The memory allocation information + the sentinel only accounts for 0.39% of the two pages shown.

As it is visible, there is a waste of space since we must deal with full pages, but that’s the price of the feature. But the waste is limited to a full page plus the waste in the space of the first valid page. Considering that, the stomp allocator isn’t something that would be enabled by default. Instead it is a tool to help you find those hard to catch issues, and only in the relevant arenas. Another aspect for safety is the use of a sentinels to detect an underrun as you can write a value such as 0xdeadbeefdeadbeef and check for its integrity. Still, the best option is to provide a mode where the stomp allocator tags the first page as no-access and then the second and subsequent pages are used for the actual allocation.


Diagram not to scale.

The last relevant aspect is that any attempt to read or write from memory already free should fail as it happens. A performance-aware allocator (such as an arena allocator) would add freed pages to a free list and return them when someone else request memory without making a request to the OS for new pages. That is one of the reasons why they are fast. But in the case of the stomp allocator, we either want to request the OS to free those pages, or in the worst case scenario tag all pages in the free list as no-access and make sure the free list is managed as first-in-last-out to increase the chances of catching the memory stomp.

Implementation

The implementation of a stomp allocator can very simple for a naive implementation with good assurance that if there is a stomp it will be found. The naive implementation is to allocate pages as memory is requested (via APIs such as VirtualAlloc() or mmap() depending on the OS) and free and decommit pages as memory is freed (via VirtualFree() and munmap()). While the approach is good when it comes to increasing the odds of finding a memory stomp, those APIs are quite expensive in terms of performance. The alternative is to allocate a whole bunch of pages at launch and then change the protection mode of the pages to be no-access (via VirtualProtect() and mprotect()). When memory is released the stomp allocator just protects the pages again, and when memory is requested the least recently used free pages have the protection lifted and it is sent back for use. That makes the stomp allocator faster but it increases the odd of a page being reused which means that a stomp may not be detected.

Video


Clip from “A Year in Review with Tim Sweeney & Mike Fricker”* from December 31st 2015.*