Infrared Camera in Media Foundation

Surface Pro (5th Gen) infrared camera streamed into Chrome browser in H.264 encoding over WebSocket connection

The screenshot above shows the Surface Pro tablet’s infrared camera (listed as “Microsoft IR Camera Front” on the device) captured live, encoded, and streamed (all of it hosted by a Microsoft Media Foundation Media Session up to this point) over the network using WebSockets into Chrome’s HTML5 video tag by means of Media Source Extensions (MSE).

Why? Because why not.

Unfortunately, Microsoft did not publish or document an API to access the infrared and depth (time-of-flight) cameras so that traditional applications could use the hardware capabilities. Nevertheless, the functionality is available in Universal Windows Platform (UWP); see Windows.Media.Capture.Frames and friends.

The UWP implementation apparently uses Media Foundation behind the scenes, so the functionality could certainly be published for desktop applications as well. Another interesting thing is that my [undocumented] way of accessing the device seems to bypass the frame server and talk to the device directly, including video.

It does not look like Microsoft is planning to extend visibility of these new features to the desktop Media Foundation API, since they keep adding new features one after another without exposing them for public use outside UWP. The UWP API itself is eclectic, and I can’t imagine how one could get a good understanding of it without having a good grip on the underlying API layers.

Media Foundation MP4 Media Source gets a bit too tired when doing too much work

It appears there is a limitation of sorts (read: “a bug”) in the Media Foundation MPEG-4 File Source implementation when it comes to reading long fragmented MP4 files.

When the respective media source is used to read such a file (for which, by the way, it does not offer seeking), the source issues MF_SOURCE_READERF_ENDOFSTREAM before reaching the actual end of the file.

When some software sees a full hour of video in the file…

… Media Foundation primitive, after reading frame 00:58:35.1833333, issues “oh gimme a break” event and reports end of stream.
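
A small portable sketch of how such a premature end of stream could be flagged: compare the timestamp of the last delivered sample with the duration reported for the file. Media Foundation timestamps are in 100-nanosecond units. The helper names `FormatTime` and `EndedPrematurely` and the one-second tolerance are hypothetical and chosen for illustration only; they are not part of any Media Foundation API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Media Foundation expresses time in 100-nanosecond units.
constexpr std::int64_t kUnitsPerSecond = 10'000'000;

// Formats a 100 ns timestamp as hh:mm:ss.fffffff.
std::string FormatTime(std::int64_t time)
{
    assert(time >= 0);
    const std::int64_t seconds = time / kUnitsPerSecond;
    char text[40];
    std::snprintf(text, sizeof text, "%02lld:%02lld:%02lld.%07lld",
        static_cast<long long>(seconds / 3600),
        static_cast<long long>(seconds / 60 % 60),
        static_cast<long long>(seconds % 60),
        static_cast<long long>(time % kUnitsPerSecond));
    return text;
}

// Hypothetical check: true if the stream ended more than `tolerance`
// before the duration reported for the file.
bool EndedPrematurely(std::int64_t lastSampleTime, std::int64_t duration,
    std::int64_t tolerance = kUnitsPerSecond)
{
    return duration - lastSampleTime > tolerance;
}
```

With the timestamp from this post, `FormatTime(35'151'833'333)` yields `00:58:35.1833333`. If the file is taken to be exactly one hour long (an assumption based on the "full hour" above), the check reports a premature end roughly 84.8 seconds short of the duration.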

NVIDIA Video Codec SDK encoder initialization memory leak

It appears that re-initializing an encoding session with the NVIDIA Video Codec SDK produces, or at least might produce, an unexpected memory leak.

So, how does it work exactly?

Status = m_ApiFunctionList.nvEncInitializeEncoder(m_Encoder, &InitializeParams);
assert(Status == NV_ENC_SUCCESS);
// NOTE: Another nvEncInitializeEncoder call
Status = m_ApiFunctionList.nvEncInitializeEncoder(m_Encoder, &InitializeParams);
assert(Status == NV_ENC_SUCCESS); // Still success
Status = m_ApiFunctionList.nvEncDestroyEncoder(m_Encoder);
assert(Status == NV_ENC_SUCCESS);

The root cause is the second nvEncInitializeEncoder call. Admittedly, this might not be exactly how the API is designed to be used, but all returned statuses indicate success, so it is hard to justify the leak by saying that a second initialization call was not expected in the first place. Apparently the implementation overwrites internally allocated resources without properly releasing or reusing them, and without issuing any warning.

Another part of the problem is the eclectic design of the API in the first place. You open a “session” and obtain an “encoder” as a result. Then you initialize the “encoder”, and when you are finished you destroy the “encoder”. Do you destroy the “session”? Oh no, you don’t have any session at all, except that the API for opening a “session” actually opens an “encoder”.

So when I want to initialize an encoder that is already initialized, I destroy the existing “encoder”, open a new “session”, and only then initialize the session-encoder once again with the initialization parameters.
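
The workaround can be sketched as a small wrapper that never initializes the same handle twice. To keep the sketch runnable without a GPU, it uses `FakeEncoderApi`, a stand-in that only counts calls. It is not the NVIDIA API, and both `FakeEncoderApi` and `EncoderSession` are hypothetical names; in real code the three calls would map to opening the encode session, nvEncInitializeEncoder, and nvEncDestroyEncoder.

```cpp
#include <cassert>

// Stand-in for the vendor API, NOT the real SDK: it only counts calls so
// that the wrapper logic can be exercised without a GPU.
struct FakeEncoderApi
{
    int openCount = 0, initializeCount = 0, destroyCount = 0;
    int nextHandle = 1;

    int Open() { ++openCount; return nextHandle++; }
    void Initialize(int /*handle*/) { ++initializeCount; }
    void Destroy(int /*handle*/) { ++destroyCount; }
};

// Hypothetical wrapper: every encoder handle is initialized at most once.
class EncoderSession
{
public:
    explicit EncoderSession(FakeEncoderApi& api) : m_api(api) {}
    ~EncoderSession() { Close(); }
    EncoderSession(const EncoderSession&) = delete;
    EncoderSession& operator=(const EncoderSession&) = delete;

    void Initialize()
    {
        if (m_initialized)
            Close(); // destroy the already initialized encoder first
        if (m_handle == 0)
            m_handle = m_api.Open();
        assert(m_handle != 0);
        m_api.Initialize(m_handle);
        m_initialized = true;
    }

    void Close()
    {
        if (m_handle != 0)
            m_api.Destroy(m_handle);
        m_handle = 0;
        m_initialized = false;
    }

    int Handle() const { return m_handle; }

private:
    FakeEncoderApi& m_api;
    int m_handle = 0;
    bool m_initialized = false;
};
```

Calling `Initialize` twice on the same `EncoderSession` results in two open calls, two initialize calls, and one destroy call, so no handle ever sees a repeated initialization.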

MFCreateVideoSampleFromSurface’s IMFTrackedSample offering

IMFTrackedSample interface is available/allowed in UWP applications. The interface is a useful one when one implements a pool of samples and needs a notification when certain instance can be recycled.

Use this interface to determine whether it is safe to delete or re-use the buffer contained in a sample. One object assigns itself as the owner of the video sample by calling SetAllocator. When all objects release their reference counts on the sample, the owner’s callback method is invoked.

The notification is asynchronous, meaning that when a sample becomes available the notification is scheduled for delivery via the standard (for Media Foundation) IMFAsyncCallback::Invoke call. This is quite convenient.

When this method is called, the sample holds an additional reference count on itself. When every other object releases its reference counts on the sample, the sample invokes the pSampleAllocator callback method. To get a pointer to the sample, call IMFAsyncResult::GetObject on the asynchronous result object given to the callback’s IMFAsyncCallback::Invoke method.

I would not have mentioned this if it was that simple, would I?

One could start feeling problems already while looking at MSDN page:


Minimum supported client – Windows Vista [desktop apps | UWP apps]

Minimum supported server – Windows Server 2008 [desktop apps | UWP apps]

Header – Evr.h

Library – Strmiids.lib

Oh really, Strmiids.lib?

So the problem is that even though the interface itself is whitelisted for UWP and is a Media Foundation interface by nature, it is implemented along with EVR and is effectively exposed to the public only via the MFCreateVideoSampleFromSurface API. That is, the only API function that provides access to a UWP-friendly interface is a UWP-unfriendly function. Bummer.

It took me less than 300 lines of code to implement a video sample class with an IMFTrackedSample implementation that mimics the standard one (goodbye, stock implementation!), so it is not difficult. However, it would be better if the OS implementation were readily available in the first place.
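
For readers unfamiliar with the pattern, here is a portable analogy of "notify the owner when the last reference goes away". It is not Media Foundation code and not the implementation mentioned above: `SamplePool` and `Acquire` are hypothetical names, the pool stands in for the allocator registered via SetAllocator, and recycling happens synchronously in a `std::shared_ptr` deleter rather than through an asynchronous IMFAsyncCallback::Invoke call. The sketch is not thread-safe and requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Portable analogy only: the pool plays the role of the sample allocator.
// The pool itself must be owned by a std::shared_ptr for recycling to work.
class SamplePool : public std::enable_shared_from_this<SamplePool>
{
public:
    explicit SamplePool(std::size_t bufferSize) : m_bufferSize(bufferSize) {}

    // Hands out a buffer; when the last reference is released, the deleter
    // returns the buffer to the pool instead of freeing it.
    std::shared_ptr<Buffer> Acquire()
    {
        std::unique_ptr<Buffer> buffer;
        if (m_free.empty()) {
            buffer = std::make_unique<Buffer>(m_bufferSize);
            ++m_allocated;
        } else {
            buffer = std::move(m_free.back());
            m_free.pop_back();
        }
        assert(buffer != nullptr);
        std::weak_ptr<SamplePool> weakPool = weak_from_this();
        return std::shared_ptr<Buffer>(buffer.release(), [weakPool](Buffer* raw) {
            std::unique_ptr<Buffer> owned(raw);
            if (auto pool = weakPool.lock())
                pool->m_free.push_back(std::move(owned)); // recycle
            // otherwise the pool is gone and `owned` frees the buffer
        });
    }

    std::size_t AllocatedCount() const { return m_allocated; }
    std::size_t FreeCount() const { return m_free.size(); }

private:
    std::size_t m_bufferSize;
    std::size_t m_allocated = 0;
    std::vector<std::unique_ptr<Buffer>> m_free;
};
```

A pool created with `std::make_shared<SamplePool>(size)` hands the same buffer out again once every holder has dropped its reference, which is the behavior a sample pool needs from the tracked sample notification.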

Intel H.264 Video Encoder MFT is ignoring texture synchronization too

Some time ago I wrote about a bug in AMD’s H.264 Video Encoder MFT, where the implementation fails to synchronize access to a Direct3D 11 texture. Intel’s implementation has exactly the same problem: Intel® Quick Sync Video H.264 Encoder MFT processes input textures/frames without acquiring synchronization and can lose actual content.

It is pretty hard to reproduce this problem because it is hardware dependent, and in most cases the data arrives in the texture before the encoder starts processing it, so the problem remains hidden. On certain systems, however, the bug comes up easily and produces a stuttering effect. Since input textures are pooled, when new data is late to arrive in the texture, the H.264 encoder encodes an old video frame. The H.264 output is technically valid; it just stutters on playback because the wrong content was encoded.

For a Media Foundation API consumer it is not easy to work around the problem because Media Foundation does not provide access to data streamed between the primitives internally. A high-level application might not even be aware that the primitives are exchanging synchronization-enabled textures, so it is unclear where the source of the problem is.

Possible solutions to the problem (applicable or not depending on specific case):

  1. do not use synchronization-enabled textures; copy from the properly synchronized texture into a new plain texture before feeding it into the encoder; this might require an additional/special MFT inserted into the pipeline before the encoder
  2. implement a customized subsystem similar to Media Session (Sink Writer) with control over streamed data so that, in particular, one could synchronize (or duplicate) the data before it is fed to the encoder’s IMFTransform::ProcessInput
  3. avoid using vendor-supplied video encoder MFTs as buggy…

Hardware video encoding in Radeon RX Vega M GH Graphics

If you are curious what video encoding capabilities Radeon RX Vega M GH Graphics offers for a Media Foundation application, here are the details. Some introductory information for starters:

The AMD Radeon RX Vega M GH is an integrated GPU in the fastest Intel Kaby-Lake-G SoC. It combines a Kaby-Lake processor, a Vega graphics card and 4 GB HBM2 memory on a single package. The graphics card offers 24 CUs (1536 shaders) and is clocked from 1063 – 1190 MHz.

The quoted source has plenty of benchmarks related to high-resolution gaming; I am, however, interested in the hardware codecs on the chip. The system enumerates two DXGI adapters, so both are present on the chip:

Display Devices

  • Intel(R) HD Graphics 630
    • Instance: PCI\VEN_8086&DEV_591B&SUBSYS_20738086&REV_04\3&11583659&0&10
    • DEVPKEY_Device_Manufacturer: Intel Corporation
    • DEVPKEY_Device_DriverVersion:
  • Radeon RX Vega M GH Graphics
    • Instance: PCI\VEN_1002&DEV_694C&SUBSYS_20738086&REV_C0\4&2BF2E4F6&0&0008
    • DEVPKEY_Device_Manufacturer: Advanced Micro Devices, Inc.
    • DEVPKEY_Device_DriverVersion: 24.20.11026.2001

Then it is interesting that both integrated GPUs have their own video encoders:

  • Intel® Quick Sync Video H.264 Encoder MFT (MFT_ENUM_FLAG_HARDWARE)
  • Intel® Hardware H265 Encoder MFT (MFT_ENUM_FLAG_HARDWARE)

That is, both the Intel and AMD hardware parts come with their own video encoding ASICs, with nothing cut, and together they provide more video encoding capability than one would ever need.

Below is the quote of AMF SDK capabilities of the hardware. The data looks very similar to that of my other system, with a Radeon RX 570 Series:


Direct3D 11 Video Processors

ID3D11VideoContext::VideoProcessorSetOutputTargetRect method:

The target rectangle is the area within the destination surface where the output will be drawn. The target rectangle is given in pixel coordinates, relative to the destination surface. If this method is never called, or if the Enable parameter is FALSE, the video processor writes to the entire destination surface.

Okay, let us try it out by deflating the output rectangle, “creating a margin”.

    OutputPosition.SetRect(0, 0, OutputTextureDescription.Width, OutputTextureDescription.Height);
    OutputPosition.DeflateRect(OutputTextureDescription.Width / 8, OutputTextureDescription.Height / 8);
    pVideoContext->VideoProcessorSetOutputTargetRect(pVideoProcessor, TRUE, OutputPosition);
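
To make the expected geometry concrete, here is a portable sketch of the same arithmetic. The `Rect` struct is a minimal stand-in for the rectangle class used above (presumably ATL/MFC CRect, which is an assumption), and `MakeTargetRect` is a hypothetical helper name.

```cpp
#include <cassert>

// Minimal stand-in for the Win32 RECT / CRect pair used in the snippet above.
struct Rect
{
    long left, top, right, bottom;

    // Shrinks the rectangle by x on the left and right, by y on top and bottom.
    void DeflateRect(long x, long y)
    {
        left += x; top += y; right -= x; bottom -= y;
    }
    long Width() const { return right - left; }
    long Height() const { return bottom - top; }
};

// Target rectangle with a margin of 1/8 of each dimension on every side.
Rect MakeTargetRect(unsigned width, unsigned height)
{
    Rect rect{ 0, 0, static_cast<long>(width), static_cast<long>(height) };
    rect.DeflateRect(static_cast<long>(width / 8), static_cast<long>(height / 8));
    assert(rect.Width() > 0 && rect.Height() > 0);
    return rect;
}
```

For a 1920×1080 output texture this gives the rectangle (240, 135, 1680, 945), that is, a 1440×810 picture centered in the frame, which is what a correctly behaving video processor should draw into.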

Ability to take care of destination rectangle, Radeon RX 570 Series vs. Intel(R) UHD Graphics 630

Why worry, maybe it is just one small bug for today? Oh, no. Forget SetOutputTargetRect; now it is just a plain texture-to-texture operation with the same DXGI format. These two outputs are produced on the same system, just on different GPUs. NVIDIA GeForce GTX 1080 Ti adds a purple tint to the output when it is not expected to:

Ability to keep colors right, NVIDIA GeForce GTX 1080 Ti vs. Intel(R) UHD Graphics 630

This one does not even look like a bug compared to those mentioned above. Even though it was an “optimal quality” request, Radeon’s optimal quality is not really impressive:

Text downscaling, Radeon RX 570 Series vs. Intel(R) UHD Graphics 630