So, how many EVRs you can do?

Direct3D based DirectShow video renderers – Video Mixing Renderer 9 and Enhanced Video Renderer – have been notoriously known for consuming resources in a way that you can run at most X simultaneously. There has been no comment published on the topic and questions (e.g. like this: How many VMR 9 can a PC support concurrently) remain unanswered for a long time. Video Mixing Renderer 7 was a good alternative for some time in past until it was cut down to be unable to support hardware scaling (thanks Microsoft!).

The trendy way to render video nowadays is using Enhanced Video Renderer, a Media Foundation subsystem with an interface into DirectShow to take over state of the art video rendering capabilities. So, how many EVRs one can run simultaneously? Chances are that it is less than one could suppose.

The interesting part is that there is no obvious evidence on type of resource running out, which causing next EVR instance to fail to run. And not even run, the failure seem to be coming up at an earlier stage of just connecting pins in stopped state. The failure might be accompanied with errors like E_INVALIDARG, ERROR_FILE_NOT_FOUND, E_UNEXPECTED. The actual limit appear to be loosely correlating to parameters of video output, such as resolution and bitness.

Desperately waiting for clarification, I am sharing the tool to estimate the limit:

  • multiple EVR instances at once, hosted by multiple windows, which can be distributed across multiple monitors
  • choices of resolutions and formats
  • a double click on an individual renderer pops up property page set displaying effective frame rate
  • 32- and 64-bit versions

A binary [Win32x64] and partial Visual C++ .NET 2010 partial source code are available from SVN.

Arithmetics Problem Generator

Having no creativity to build problems myself yet trying to train 9 yo buddy in arithmetics, a small application is here to help generate problem to solve to get familiar with operator priorities and basic math.

Tweaking code back and forth can make it more or less complex. Regular expression replace by pattern erases all answers from end of lines and the text is ready to be printed for the one being trained.

180 - 70 - (6460 + 91 - 6) : 77 + (152 - 74 + 2733 - 67) : (4606 : 47) = ?
(1747 - (153 - 66)) : (43 + 125 - (117 - 32)) - 13 * 1 = 7
21 + 145 - (155 - 79) - 120 : 10 - (99 + 560 : 56 - 96) = 65
2 * (270 + 40) : 62 + (139 - 60 + 8753) : (40 + 29) - (180 - (149 - 36 - (91 - 63))) = 43
(4071 - 12 + 2640 : 80) : (98 - (46 + 35 + 7) + 34) = 93
151 + 39 - 45 - (59 + 62 - 33 - (61 - (70 - 46))) = 94
1 * 11 * 4 - (185 - 73 - (118 + 54 + 14 - 89)) = 29
122 - 83 + 81 - (1 + 8) - (101 - 42 - 5 * 2) = 62
254 - 82 - 65 - ((3342 + 13) : 55 + 118 - 96) + (41 * 2 - 38) * 0 * 95 = 24
(1 * 35 + 204 - 87 - 79 + 4 + 5 - (16 + 2758) : (81 - 43)) * 2133 : 79 : 27 = 9

A binary [Win32] and partial Visual C++ .NET 2010 source code are available from SVN.

If you just need an infinite source of things to crack, it’s here also: 01.txt.

Oops, AMAP_3D_TARGET

Unfortunately, AMAP_3D_TARGET appears to be useless. Internally, surface allocation comes up with the following capabilities for the surface: DDSCAPS_VIDEOMEMORY | DDSCAPS_LOCALVIDMEM | DDSCAPS_OFFSCREENPLAIN | DDSCAPS_3DDEVICE. Sadly, DirectDraw responds (might respond?) with E_NOTIMPL.

While one can advance one step further by removing DDSCAPS_VIDEOMEMORY | DDSCAPS_LOCALVIDMEM, another problem is hit that you are no longer able to create a surface with is both FourCC enabled and in the same time has 3D rendering capabilities. Remove your YUV four character code (such as NV12, YV12, YUY2 you would normally have as codec’s output) or fail miserably with DDERR_INVALIDPIXELFORMAT.

Back to the original problem, the AMAP_3D_TARGET flag failing to work makes it impossible to allocate 3D-enabled DirectDraw surface with DirectShow Video Mixing Renderer. While it might sound as deprecated technology, it is yet a supported Windows SDK component, the most efficient video rendering component, the least hardware sensitive video rendering component, and – the most immediately important – the only way an SDK I am working with can deliver its video overlay.

So, there is no easy way to request a 3D enabled surface through a custom allocator-presenter, with a hook in the IVMRSurfaceAllocator::AllocateSurface updating the allocation flags. It does not make much sense to replace the whole allocator-presenter either: while an earlier DirectShow/Windows SDK provides a decent base for custom allocator-presenter, it is going to eventually hit the same problems mentioned above. And you cannot blit from 3D surface into a plain offscreen surface either without getting E_FAIL.

What appears to still be possible is allocating an addition 3D enabled surface, such as within IVMRImagePresenter::StartPresenting and using it as a replacement surface within IVMRImagePresenter::PresentImage. Having a custom allocator-presenter receive video frame, you can blit the picture into 3D-enabled surface, do your thing and pass the updated VMRPRESENTATIONINFO to default allocator-presenter so that it presents the additional update surface, not the original one.

Read more »

Utility Clearance: Simple SMTP Email Sender

The library implements SMTP client and exposes a simple COM interface to send emails. The interface is simple and straightforward, and the emails can be send from various environments, including such as JavaScript code. The class supports SSL/TLS security and is GMail compliant.

A JScript code snippet below provides a sample use case:

message = new ActiveXObject("AlaxInfo.EmailTools.Message");
message.ServerHost = "mail.alax.info";
//message.ServerPort = 25;
message.Sender = "Sender <test@alax.info>";
message.ToRecipients = "Recipient <address@gmail.com>";
//message.CcRecipients = "";
//message.BccRecipients = "";
message.Subject = "Message Test (Plain)";
message.Body = "This is an e-mail message test:" + "\r\n" +
  "" + "\r\n" +
  "- Security: None" + "\r\n" +
  "- Authentication: Plain Text (PLAIN)" + "\r\n" +
  "";
message.AuthMethods = "plain";
message.AuthName = "test@alax.info";
message.AuthPassword = "12345678";
message.Send();

The capabilities include:

  • No additional dependencies – regsvr32 the DLL and it’s ready to use; SSL/TLS implementation through SChannel API
  • UTF-8 encoding and support for Unicode and international characters
  • Secure SSL connections (SecureSocketsLayer property)
  • Secure Transport Layer Security (TLS) connections with STARTTLS command (TransportLayerSecurity property)
  • Authentication options: plain text (LOGIN, PLAIN), digest (CRAM-MD5)
  • Protocol References:

Read more »

File Mappings: Virtual Memory and Virtual Address Space

More and more applications hit the Windows limit of available address space for 32-bit applications, and the whole concept becomes more important for understanding due to necessity to work things around.

A thing, which is more or less easy to understand, is that a user mode 32-bit application can address 2^32 addresses. The addresses are not directly physical RAM and the operating system is responsible for management of the mapping addresses into RAM as a part of virtual memory manager operation. Paged memory organization is well documented on MSDN, and the questions has been raised numerous times. An interesting question is whether a 32-bit application can effectively manage memory amounts exceeding address space limits.

Back in 80386 times, the systems could address megabytes of RAM in 16-bit code through XMS and EMS services. The application could access “high” memory addresses by requesting mapping portions of RAM into lower megabyte address space. In some way similar technique is also here for 32-bit applications in Windows through use of file mappings.

A regular memory backed file mapping requests Windows to reserve a memory block which becomes available for mapping into address space of one or more processes. Creating file mapping itself does not imply mapping and this leaves a great option for the owner to allocate more data than it can actually map into address space: if 32-bit process virtual address space is fundamentally constrained, the file mapping allocation space is more loosely limited by amount of physical memory and paging file. The application can allocate 2, 3, 4 and more gigabytes of memory – it just cannot still map it all together into address space and make it available simultaneously.

The FileMappingVirtualAddress utility does a simple thing:

  • on startup it allocates (CreateFileMapping) as many 256 MB file mappings as operating system would allow, and shows it in a list
  • each time a user checks a box, the application maps (MapViewOfFile) corresponding file mapping into address space; unchecking a box unmaps the view
  • the caption shows currently used and maximal available virtual address space

A plain 32-bit version of the application allocated 51 blocks for me (which totals in 13 GB of memory, with 8 GB physical RAM installed in the system). The allocation takes place immediately because the operating system does not actually make all this memory prepared for use – the actual pages would be allocated and ready to use on demand when the application requires them.

The most important part made so obvious is that the 32-bit application succeeds in allocating well over 4 GB, which is maximal virtual address space it can ever get.

The virtual address space in use is only 1641 MB and another request to map an additional section with MapViewOfFile would fail (the default address space limit is 2 GB) – space fragmentation make mapping unavailable earlier than we actually use the whole space, since the API would need to allocate contiguous range of addresses to satisfy the request.

32-bit application built with /LARGEADDRESSAWARE parameter might manage to do more allocations: 64-bit versions of Windows provide 4 GB of addresses to 32-bit processes. 32-bit operating systems might also be extending the limit in case of 4GB RAM Tuning (which would typically be 3 GB of space for a process).

Finally, 64-bit build of the application is free from virtual address space limit as the limit is 8 terabytes. The mapping is again instantaneous because actual RAM will be supplied on first request to mapped pages only.

A binary [Win32, Win32 with /LARGEADDRESSAWARE, x64] and partial Visual C++ .NET 2010 partial source code are available from SVN.

DirectShow SDK at Crime Scene Investigation Service

According to CSI: Crime Scene Investigation, Season 06 Episode 13 (17:04), investigators are using snake scope camera with DirectShow AMCAP tool to present the captured image:

Hardware assisted memory corruption detection

So you got a memory corruption issue with a piece of software. It comes in a unique scenario along the line of having a huge pile of weird code running well most of the time and then, right out of the blue, a corruption takes place followed by unexpected code execution and unstable software state in general.

The biggest problem with memory corruption is that a fragment of code is modifying a memory block which it does not own, and it has no idea who actually is the owner of the block, while the real owner has no timely way to detect the modification. You only face the consequences being unable to capture the modification moment in first place.

To get back to the original cause, an engineer has to drop into a time machine, turn back time and step back to where the trouble took originally place. As developers are not actually given state-of-the-art time machines, the time turning step is speculative.

CVirtualHeapPtr Class: Memory with Exception-on-Write access mode

At the same time a Windows platform developer is or might be aware of virtual memory API which among other things provides user mode application with capabilities to define memory protection modes. Having this on hands opens unique opportunity to apply read-only protection (PAGE_READONLY) onto a memory block and have exception raised at the very moment of unexpected memory modification, having call stack showing up a source of the problem. I refer to this mode of operation as “hardware assisted” because the access violation exception/condition would be generated purely in hardware without any need to additionally do any address comparison in code.

Needless to say that this way is completely convenient for the developer as he does not need to patch the monstrous application all around in order to compare access addresses against read-only fragment. Instead, a block defined as read-only will be immediately available as such for the whole process almost without any performance overhead.

As ATL provides a set of memory allocator templates (CHeapPtr for heap backed memory blocks, allocated with CCRTAllocator, alternate options include CComHeapPtr with CComAllocator wrapping CoTaskMemAlloc/CoTaskMemFree API), let us make an alternate allocator option that mimic well-known class interface and would facilitate corruption detection.

Because virtual memory allocation unit is a page, and protection mode is defined for the whole page, this would be the allocation granularity. For a single allocated byte we would need to request SYSTEM_INFO::dwPageSize bytes of virtual memory. Unlike normal memory heap manager, we have no way to share pages between allocations as we would be unable to effectively apply protection modes. This would definitely increase application pressure onto virtual memory, but is still acceptable for the sacred task of troubleshooting.

We define a CVirtualAllocator class to be compatible with ATL’s CCRTAllocator, however based on VirtualAlloc/VirtualFree API. The smart pointer class over memory pointer would be defined as follows:

template <typename T>
class CVirtualHeapPtr :
    public CHeapPtr<T, CVirtualAllocator>
{
public:
// CVirtualHeapPtr
    CVirtualHeapPtr() throw();
    explicit CVirtualHeapPtr(_In_ T* pData) throw();
    VOID SetProtection(DWORD nProtection)
    {
        // TODO: ...
    }
};

The SetProtection method is to define memory protection for the memory block. Full code for the classes is available on Trac here (lines 9-132):

  • CGlobalVirtualAllocator class is a singleton querying operating system for virtual memory page size, and provides alignment method
  • CVirtualAllocator class is a CCRTAllocator-compatible allocator class
  • CVirtualHeapPtr class is smart template class wrapping a pointer to allocated memory

Use case code will be as follows. “SetProtection(PAGE_READONLY)” enables protection on memory block and turns on exception generation at the moment memory block modification attempt. “SetProtection(PAGE_READWRITE)” would restore normal mode of memory operation.

CVirtualHeapPtr<BYTE> p;
p.Allocate(2);
p[1] = 0x01;
p.SetProtection(PAGE_READONLY);
// NOTE: Compile with /EHa on order to catch the exception
_ATLTRY
{
    p[1] = 0x02;
    // NOTE: We never reach here due to exception
}
_ATLCATCHALL()
{
    // NOTE: Catching the access violation for now to be able to continue execution
}
p.SetProtection(PAGE_READWRITE);
p[1] = 0x03;

Given the information what data gets corrupt, the pointer allocator provides an efficient opportunity to detect the violation attempt. The only thing remained is to keep memory read-only, and temporarily revert to write access when the “legal” memory modification code is about to be executed.

Read more »