Effects of IMFVideoProcessorControl2::EnableHardwareEffects

IMFVideoProcessorControl2::EnableHardwareEffects method:

Enables effects that were implemented with IDirectXVideoProcessor::VideoProcessorBlt.

[…] Specifies whether effects are to be enabled. TRUE specifies to enable effects. FALSE specifies to disable effects.

All right, it is apparently not IDirectXVideoProcessor, and the MSDN link behind the identifier takes one to the correct Direct3D 11 API method: ID3D11VideoContext::VideoProcessorBlt.

Worse news is that with the effects “enabled”, the transform (the whole thing belongs to Media Foundation’s Swiss Army knife of conversion [with just one blade and no scissors, though] – the Video Processor MFT) fails to deliver proper output and produces a green/black fill instead of the proper image.

Or, perhaps, this counts as a hardware effect.

D3D11_VIDEO_DECODER_BUFFER_DESC Documentation Quality

Direct3D 11 DXVA decoding documentation lacks accuracy. The API, sadly, lacks other things too, but that is a different story.

D3D11_VIDEO_DECODER_BUFFER_DESC as defined in Windows SDK:

MSDN documentation lost the DataSize field, which is – ironically – the most important one along with the buffer type.

The related D3D11_1DDI_VIDEO_DECODERR_BUFFER_DESC structure has both fields, but its Members section has an obvious copy/paste typo:

The structure, and the API itself, are presumably not so popular.

Applying Hardsubs to H.264 Video with GPU

Video adapters currently offer a range of services which enable transcoding of H.264 content with certain modifications (including but not limited to flexible overlays, scaling, mirroring, effects and filters) end-to-end on the GPU, keeping the data as DirectX Graphics Infrastructure resources at all processing stages.

Such specialized processing capabilities are pretty powerful compared to traditional CPU processing, especially taking into consideration the performance of low-end, low-power-consumption systems still equipped with a contemporary GPU.

For a test, I transcoded H.264/MP4 video files of different resolutions, applying a text overlay carrying the time stamp of the video frame being processed. The overlay is complex enough in that it varies frame to frame and uses a standard font with proper rasterization (via DirectWrite). The test re-encoded the H.264 content maintaining the bitrate, without giving too much care to other encoding details (defaults used).

Hardsub Performance Test

The roughly made test was successful on two GPUs:

  1. Intel HD Graphics 4600 (7th gen; Desktop system; Core i7-4790 CPU)
  2. Intel HD Graphics (8th gen; Ultramobile system; Atom x5-Z8300 CPU) – the system is actually a budget Chinese tablet worth about $200, a Cube iWork 10 Ultimate

The test failed on other GPUs:

  • Intel HD Graphics 4000 (7th gen; Mobile system; Core i7-3517U CPU)
  • NVIDIA GeForce GTX 750

The problem – as it looks without getting into details – seems to be the inability of the Media Foundation APIs to fit Direct3D-enabled pipelines together out of the box, e.g. because a certain conversion is missing. It looks like transcoding can still be achieved with some more effort put into it.

As of now, Intel offers its 9th generation GPUs, and the ones tested here are hardware a few years in age…

Compared to a real-time performance baseline of 100% (meaning that it takes one second to process one second of video of the given metrics), both systems managed to do the transcoding relatively efficiently. Even with a roughly built test having its bottleneck at applying the overlay, which takes place serially in a single thread, both systems showed performance sufficient to convert 1920×1080@60 video faster than real time and without maxing the CPU out.

Intel’s seventh-generation desktop GPU managed to do the job far faster.

It is interesting that even a cheap tablet can process a Full HD video stream while loading the CPU at less than 40%. Basically, the performance is sufficient for real-time video processing (including with an external web camera such as a Logitech C930e) with certain processing at 1080p resolution on budget-grade hardware.

Re-encoding Performance

When there is no necessity to keep the real-time processing pace, the cheap tablet showed the ability to do GPU processing of 2K video, which is also good news for those who want to apply budget hardware to high-resolution material.

Apparently, the key factor is the ability of the process to keep the data in video hardware. As Intel GPU H.264 capabilities scale well when used for concurrent multi-stream processing, the numbers promise great performance when recording video in several formats at a time: raw video, video with overlay, a scaled-down version etc.

The table below gives more numbers for the tests conducted:

Re-encoding Performance Numbers

As mentioned above, the overlaying part itself is a single-threaded bottleneck, and presumably it is a reserve that could be used to cut the elapsed time down even further.

Another interesting observation is that while the ultramobile system still uses much of the CPU time (which is okay – it’s not a powerful system by design), the desktop GPU has minimal impact on the CPU while doing a pretty complicated task.

Reference Signal Source: Unity 3D Integration

This is not really practical – just some fun stuff and an attempt to make use of a readily available code snippet. So… a reference signal in the Unity 3D game engine through Media Foundation and Direct3D 11.

Reference signal in Unity 3D

It is basically the Low-Level Native Plugin example project (limited to Direct3D 11 and Visual Studio 2015), which works together with DirectShowReferenceSource-x64.dll and does a few things:

  1. the standard plugin example’s inner loop rendering is commented out
  2. the plugin does COM instantiation of the VideoMediaSourceTextureReader class of the DirectShowReferenceSource project and passes it a texture for further updates:
    • the helper class instantiates a Media Foundation media source and initializes a compatible resolution
    • it also instantiates a Source Reader to conveniently read from the media source
    • it starts producing video frames
  3. the plugin rendering loop makes IVideoMediaSourceTextureReader::Update calls passing Unity time; the helper class matches generated frames against the requested time and updates the texture accordingly

The scene thus contains a plane with a custom material updated on the back end with Media Foundation-produced data.

The demo is not really about performance, because so many things could be done to improve it and make it, well… done right: threading, tighter Direct3D integration, use of a proper Direct2D render target etc. The demo is really about the integration itself. Nevertheless, a 2K texture is streamed well at full rate.

Download links

StressEvr: So, how many EVRs can you do?

Direct3D based DirectShow video renderers – Video Mixing Renderer 9 and Enhanced Video Renderer – have been notoriously known for consuming resources in such a way that you can run at most X of them simultaneously. No comment has been published on the topic, and questions (e.g. this one: How many VMR 9 can a PC support concurrently) have remained unanswered for a long time. Video Mixing Renderer 7 was a good alternative for some time in the past, until it was cut down to the point of no longer supporting hardware scaling (thanks, Microsoft!). The trendy way to render video nowadays is the Enhanced Video Renderer, a Media Foundation subsystem with an interface into DirectShow to take over state-of-the-art video rendering capabilities.

So, how many EVRs can one run simultaneously? Chances are that it is fewer than one would suppose. The interesting part is that there is no obvious evidence of which type of resource runs out and causes the next EVR instance to fail. And not even fail to run: the failure seems to come up at the earlier stage of merely connecting pins in stopped state. The failure may be accompanied by errors like E_INVALIDARG, ERROR_FILE_NOT_FOUND, E_UNEXPECTED. The actual limit appears to correlate loosely with parameters of the video output, such as resolution and bitness.

Desperately waiting for clarification, I am sharing a tool to estimate the limit. It offers:
* multiple EVR instances at once, hosted by multiple windows, which can be distributed across multiple monitors
* choices of resolutions and formats
* a double click on an individual renderer pops up property page set displaying effective frame rate
* 32- and 64-bit versions

Download links