So you have a memory corruption issue in a piece of software. The scenario is a typical one: a huge pile of weird code runs well most of the time and then, out of the blue, memory gets corrupted, followed by unexpected code execution and a generally unstable application state.
The biggest problem with memory corruption is that a fragment of code modifies a memory block it does not own, with no idea who the actual owner is, while the real owner has no timely way to detect the modification. You only face the consequences, without being able to catch the moment of modification in the first place.
To get to the original cause, an engineer has to jump into a time machine, turn back time and step back to where the trouble originally took place. As developers are not actually issued state-of-the-art time machines, this step is speculative.
CVirtualHeapPtr Class: Memory with Exception-on-Write Access Mode
Windows developers, however, may be aware of the virtual memory API, which among other things lets user mode applications define memory protection modes. This opens a unique opportunity: apply read-only protection (PAGE_READONLY) to a memory block and have an exception raised at the very moment of the unexpected modification, with the call stack pointing to the source of the problem. I refer to this mode of operation as “hardware assisted” because the access violation is detected purely in hardware, with no need for any additional address comparison in code.
Needless to say, this is very convenient for the developer, who does not need to patch the monstrous application all over to compare access addresses against the read-only fragment. Instead, a block marked read-only is immediately treated as such across the whole process, with almost no performance overhead.
ATL provides a set of smart pointer templates backed by allocator classes: CHeapPtr for heap-backed memory blocks allocated with CCRTAllocator, or alternatively CComHeapPtr with CComAllocator, which wraps the CoTaskMemAlloc/CoTaskMemFree API. Let us make another allocator option that mimics this well-known class interface and facilitates corruption detection.
Because the unit of virtual memory allocation is a page, and a protection mode applies to a whole page, the page becomes our allocation granularity. Even a single allocated byte requires SYSTEM_INFO::dwPageSize bytes of virtual memory. Unlike a regular heap manager, we cannot share pages between allocations, or we would be unable to apply protection modes to each block separately. This definitely increases the application's pressure on virtual memory (address space consumption is higher still, since each VirtualAlloc reservation starts on a SYSTEM_INFO::dwAllocationGranularity boundary, typically 64 KB), but it is still acceptable for the sacred task of troubleshooting.
We define a CVirtualAllocator class that is compatible with ATL’s CCRTAllocator but is based on the VirtualAlloc/VirtualFree API. The smart pointer class is then defined as follows:
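A quick sketch of the granularity arithmetic, assuming the page size is a power of two (true for the values Windows reports) and using a 4 KB page only as an example. RoundUpToPage and PageSlack are hypothetical helper names, not part of the article's code.

```cpp
#include <cassert>
#include <cstddef>

// Rounds nSize up to a whole number of pages.
// Assumption: nPageSize is a power of two; overflow for huge nSize is ignored.
constexpr std::size_t RoundUpToPage(std::size_t nSize, std::size_t nPageSize)
{
    return (nSize + nPageSize - 1) & ~(nPageSize - 1);
}

// Bytes committed but never used by an allocation of nSize bytes.
constexpr std::size_t PageSlack(std::size_t nSize, std::size_t nPageSize)
{
    return RoundUpToPage(nSize, nPageSize) - nSize;
}

// A single byte still costs a full page; one byte past a page costs another.
static_assert(RoundUpToPage(1, 4096) == 4096, "one byte costs a full page");
static_assert(RoundUpToPage(4096, 4096) == 4096, "exact fit needs no extra page");
static_assert(RoundUpToPage(4097, 4096) == 8192, "one byte over costs another page");
static_assert(PageSlack(1, 4096) == 4095, "4095 bytes of slack for one byte");
```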
```cpp
template <typename T>
class CVirtualHeapPtr :
    public CHeapPtr<T, CVirtualAllocator>
{
public:
// CVirtualHeapPtr
    CVirtualHeapPtr() throw()
    {
    }
    explicit CVirtualHeapPtr(_In_ T* pData) throw() :
        CHeapPtr<T, CVirtualAllocator>(pData)
    {
    }
    VOID SetProtection(DWORD nProtection)
    {
        // TODO: ... (full implementation is in the source referenced below)
    }
};
```
The SetProtection method sets the memory protection for the memory block. Full code for the classes is available on Trac here (lines 9-132):
- CGlobalVirtualAllocator is a singleton that queries the operating system for the virtual memory page size and provides an alignment method
- CVirtualAllocator is a CCRTAllocator-compatible allocator class
- CVirtualHeapPtr is a smart pointer template class wrapping a pointer to the allocated memory
The use case code is as follows. “SetProtection(PAGE_READONLY)” enables protection on the memory block, so that any attempt to modify it raises an exception at that very moment. “SetProtection(PAGE_READWRITE)” restores the normal mode of operation.
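For illustration, here is a minimal sketch of what a CGlobalVirtualAllocator-like singleton might look like. The class name CPageSizeSingleton and its members are hypothetical, and the sysconf branch is an assumption added only so the sketch also compiles outside Windows; on Windows it queries GetSystemInfo, as the article describes.

```cpp
#include <cassert>
#include <cstddef>
#if defined(_WIN32)
#include <windows.h>
#else
#include <unistd.h>
#endif

// Hypothetical stand-in for CGlobalVirtualAllocator: queries the page size
// once and offers an alignment helper.
class CPageSizeSingleton
{
public:
    static const CPageSizeSingleton& Get()
    {
        // Function-local static: initialized on first use. Thread-safe
        // initialization is guaranteed by C++11 (MSVC from Visual Studio 2015).
        static const CPageSizeSingleton g_Instance;
        return g_Instance;
    }
    std::size_t GetPageSize() const
    {
        return m_nPageSize;
    }
    // Rounds nSize up to a multiple of the page size (a power of two).
    std::size_t Align(std::size_t nSize) const
    {
        return (nSize + m_nPageSize - 1) & ~(m_nPageSize - 1);
    }

private:
    CPageSizeSingleton()
    {
#if defined(_WIN32)
        SYSTEM_INFO Information;
        GetSystemInfo(&Information);
        m_nPageSize = Information.dwPageSize;
#else
        // Assumption: POSIX branch only so the sketch builds outside Windows.
        m_nPageSize = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
#endif
    }
    std::size_t m_nPageSize;
};
```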
```cpp
CVirtualHeapPtr<BYTE> p;
p.Allocate(2);
p[1] = 0x01;
p.SetProtection(PAGE_READONLY);
// NOTE: Compile with /EHa in order to catch the exception
_ATLTRY
{
    p[1] = 0x02;
    // NOTE: We never reach here due to exception
}
_ATLCATCHALL()
{
    // NOTE: Catching the access violation for now to be able to continue execution
}
p.SetProtection(PAGE_READWRITE);
p[1] = 0x03;
```
Once you know which data gets corrupted, this allocator provides an efficient way to detect the offending write. All that remains is to keep the memory read-only and temporarily revert to write access whenever the “legal” modification code is about to run.
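One way to keep the writable window tight is a scope guard that grants write access on entry and reapplies protection on exit, even if an exception unwinds the scope. CWritableScope is a hypothetical helper, not part of the article's source; it only assumes a SetProtection member like the one CVirtualHeapPtr exposes.

```cpp
#include <cassert>

// Hypothetical helper: switches a protected block to nWritable for the
// lifetime of the scope and applies nRestore when the scope ends (normally
// or via exception unwinding). TPtr is any type with a SetProtection member
// accepting the protection value, such as CVirtualHeapPtr<T>.
template <typename TPtr>
class CWritableScope
{
public:
    CWritableScope(TPtr& Ptr, unsigned long nWritable, unsigned long nRestore) :
        m_Ptr(Ptr),
        m_nRestore(nRestore)
    {
        m_Ptr.SetProtection(nWritable);
    }
    ~CWritableScope()
    {
        m_Ptr.SetProtection(m_nRestore);
    }
    CWritableScope(const CWritableScope&) = delete;
    CWritableScope& operator=(const CWritableScope&) = delete;

private:
    TPtr& m_Ptr;
    unsigned long m_nRestore;
};
```

With CVirtualHeapPtr, a legal modification would be wrapped as CWritableScope<CVirtualHeapPtr<BYTE> > Scope(p, PAGE_READWRITE, PAGE_READONLY); inside a block that ends right after the write. The restore value is passed explicitly rather than queried, which keeps the guard itself independent of the Windows API.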
One-shot Read/Write Protection with Guard Pages
Another option is provided by the PAGE_GUARD protection modifier. MSDN says:
A guard page provides a one-shot alarm for memory page access. This can be useful for an application that needs to monitor the growth of large dynamic data structures. For example, there are operating systems that use guard pages to implement automatic stack checking.
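Enabling guard page mode makes it possible to trigger an exception on any access to the protected memory block, even a read. The alarm is one-shot: the system raises a STATUS_GUARD_PAGE_VIOLATION exception and clears the PAGE_GUARD modifier, so subsequent accesses proceed normally until the flag is set again.
```cpp
p.SetProtection(PAGE_READWRITE | PAGE_GUARD);
// NOTE: Even this read raises STATUS_GUARD_PAGE_VIOLATION (once only)
BYTE n = p[0];
```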
CDebugHeapPtr Class: More Options to Catch Memory Corruption Conditions
While setting memory protection attributes on a memory block of interest provides unique troubleshooting opportunities, it still does not cover two important and typical memory misuse scenarios: writing immediately before the allocated block and writing immediately after it. For an array of N items, these are writes to indices -1 and N respectively.
To address these scenarios, we can extend the CVirtualHeapPtr class to additionally provide “sanity pages” with PAGE_NOACCESS protection at the boundaries of the allocation. Because virtual memory allocation is page-granular, padding bytes are needed to extend the block to the page boundary; we can put the padding before or after the payload in order to catch writes past the end or before the start of the block, respectively.
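The layout arithmetic can be sketched as follows, assuming one sanity page on each side of the page-rounded data area. SDebugLayout and ComputeDebugLayout are hypothetical names, and the actual CDebugHeapPtr source may organize this differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout: [sanity page][data pages][sanity page].
// All offsets are relative to the base of the reserved region.
struct SDebugLayout
{
    std::size_t nTotalSize;     // bytes to reserve, sanity pages included
    std::size_t nPayloadOffset; // where the caller's data starts
    std::size_t nPaddingOffset; // where the padding bytes start
    std::size_t nPaddingSize;   // padding needed to fill the data pages
};

// bCatchOverrun == true places the padding first, so the payload ends exactly
// at the trailing sanity page and a write to index N faults immediately.
// bCatchOverrun == false places the payload first, so it starts right after
// the leading sanity page and a write to index -1 faults immediately.
SDebugLayout ComputeDebugLayout(std::size_t nSize, std::size_t nPageSize, bool bCatchOverrun)
{
    // Assumption: nPageSize is a power of two.
    const std::size_t nDataSize = (nSize + nPageSize - 1) & ~(nPageSize - 1);
    SDebugLayout Layout;
    Layout.nTotalSize = nPageSize + nDataSize + nPageSize;
    Layout.nPaddingSize = nDataSize - nSize;
    if (bCatchOverrun)
    {
        Layout.nPaddingOffset = nPageSize;
        Layout.nPayloadOffset = nPageSize + Layout.nPaddingSize;
    }
    else
    {
        Layout.nPayloadOffset = nPageSize;
        Layout.nPaddingOffset = nPageSize + nSize;
    }
    return Layout;
}
```

One trade-off worth keeping in mind: when the payload is pushed to the end of the data pages, its address is only as aligned as the payload size allows (a 10-byte block with 4 KB pages starts at offset 8182), which may matter for types that need natural alignment.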
The figure below shows the memory layout of the data:
Source code for the CDebugHeapPtr class is available on Trac (lines 155-). The sanity pages form a range of inaccessible addresses that immediately cause an access violation exception on any read or write attempt. Under the debugger, such memory is shown as question marks:
The padding space is pre-initialized with the hardcoded value 0x77 and is checked for integrity when the memory block is released. Writes that land in the padding do not fault, because the padding shares accessible pages with the payload, so this check is what catches them.
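A sketch of the padding fill and integrity check, assuming the padding is one contiguous byte range. FillPadding and FindCorruptedPadding are hypothetical names; reporting the offset of the first modified byte, rather than a plain yes/no, helps locate the stray write.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Fill value for the padding bytes, as described above.
const unsigned char g_nPaddingValue = 0x77;

// Called at allocation time to initialize the padding.
void FillPadding(unsigned char* pPadding, std::size_t nSize)
{
    std::memset(pPadding, g_nPaddingValue, nSize);
}

// Called at release time. Returns the index of the first modified padding
// byte, or nSize if the padding is intact.
std::size_t FindCorruptedPadding(const unsigned char* pPadding, std::size_t nSize)
{
    for (std::size_t nIndex = 0; nIndex < nSize; ++nIndex)
        if (pPadding[nIndex] != g_nPaddingValue)
            return nIndex;
    return nSize;
}
```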
Catching the Exceptions
Because the exceptions are generated at run time, they immediately alter the application's execution path and are easy to track and catch. There is no need to bring a feature-rich debugger such as Visual Studio to the production site to catch the exception and its context; instead, a much simpler tool such as LogProcessExceptions can create a minidump file capturing the state of the application. The minidump can then be transferred to a debugger-enabled environment for detailed analysis.
Visual C++ 2010 source code is available from SVN.