D3D11_VIDEO_DECODER_BUFFER_DESC Documentation Quality

Direct3D 11 DXVA decoding documentation lacks accuracy. The API, sadly, lacks other things too, but that is a different story.

D3D11_VIDEO_DECODER_BUFFER_DESC as defined in Windows SDK:

The MSDN documentation lost the DataSize field, which is, ironically, the most important one along with the buffer type.

The related D3D11_1DDI_VIDEO_DECODERR_BUFFER_DESC structure has both fields, but its Members section has an obvious copy/paste typo:

The structure and the API itself are presumably not so popular.

Media Foundation’s MFT_MESSAGE_SET_D3D_MANAGER with Frame Rate Converter DSP

It might look odd that anyone would try Direct3D mode with a DSP that is not supposed to be Direct3D aware, but still. I am omitting how I even got into such a scenario. The documentation says a few things about MFT_MESSAGE_SET_D3D_MANAGER:

  • This message applies only to video transforms. The client should not send this message unless the MFT returns TRUE for the MF_SA_D3D_AWARE attribute (MF_SA_D3D11_AWARE for Direct3D 11).
  • Do not send this message to an MFT with multiple outputs.
  • An MFT should support this message only if the MFT uses DirectX Video Acceleration for video processing or decoding.
  • If an MFT supports this message, it should also implement the IMFTransform::GetAttributes method and return the value TRUE…
  • If an MFT does not support this message, it should return E_NOTIMPL from ProcessMessage. This is an exception to the general rule that an MFT can return S_OK from any message it ignores.

Frame Rate Converter DSP is a hybrid DMO/MFT, which basically means that it is a “legacy” DMO upgraded to an MFT using a specialized wrapper. It is not supposed to be Direct3D aware and is not documented as such.

However, it could presumably normalize the frame rate of Direct3D-aware samples by dropping or duplicating samples as needed. It could easily be Direct3D aware since, in its simplest implementation, it does not need to change the data. It is easy to see that the MFT satisfies the other conditions: it is a single-output video transform.

The MFT correctly and expectedly does not advertise itself as Direct3D aware. It does not have transform attributes.

However, it fails to comply with the documented behavior of returning E_NOTIMPL for the MFT_MESSAGE_SET_D3D_MANAGER message. The message is defined to be an exception, but the DSP implementation seems to ignore that. The wrapper was possibly created even before the exception was introduced in the first place.

The DSP does not make an exception: it returns a success code as if it handled the message, and so does not act as documented.
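The documented contract and the observed deviation can be summarized in a small model. This is only a sketch in plain Python, not Media Foundation code: the function names are hypothetical, the assumption that a supporting MFT answers with S_OK is mine, and only the numeric values of S_OK and E_NOTIMPL are real constants.

```python
# Plain-Python model of the documented rules; it does not call Media Foundation.
S_OK = 0x00000000
E_NOTIMPL = 0x80004001


def client_may_send_set_d3d_manager(is_video, output_count, reports_d3d_aware):
    """Documented client-side preconditions for MFT_MESSAGE_SET_D3D_MANAGER."""
    return bool(is_video and output_count == 1 and reports_d3d_aware)


def expected_result(supports_message):
    """Documented MFT-side result: E_NOTIMPL unless the message is supported.

    Returning S_OK for the supported case is an assumption of this sketch.
    """
    return S_OK if supports_message else E_NOTIMPL


def complies(supports_message, actual_result):
    """True if the transform's actual result matches the documented one."""
    return actual_result == expected_result(supports_message)


# Frame Rate Converter DSP as described above: a single-output video
# transform that is not Direct3D aware, yet it answers the message with S_OK.
print(client_may_send_set_d3d_manager(True, 1, False))       # False
print(complies(supports_message=False, actual_result=S_OK))  # False
```

A well-behaved client never sends the message here, so the deviation only shows up when the message is sent anyway, as in the scenario described.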

AMD started offering hardware H.265/HEVC video encoder for Media Foundation

This should be good news for those interested in hardware-assisted video encoding: AMD extends the offering in its new hardware and provides an H.265 encoder in an already well-known form factor, as a Microsoft Media Foundation Transform “AMDh265Encoder”:

# System

* Version: 10.0.14393, Windows 10, VER_SUITE_SINGLEUSERTS, VER_NT_WORKSTATION
* Product: PRODUCT_PROFESSIONAL

[…]

# Display Devices

* AMD Radeon (TM) RX 480
* Instance: PCI\VEN_1002&DEV_67DF&SUBSYS_0B371002&REV_C7\4&2D78AB8F&0&0008
* DEVPKEY_Device_Manufacturer: Advanced Micro Devices, Inc.
* DEVPKEY_Device_DriverVersion: 21.19.137.514

[…]

# Category `MFT_CATEGORY_VIDEO_ENCODER`

[…]

## AMDh265Encoder

15 Attributes:

* MFT_TRANSFORM_CLSID_Attribute: {5FD65104-A924-4835-AB71-09A223E3E37B} (Type VT_CLSID)
* MF_TRANSFORM_FLAGS_Attribute: MFT_ENUM_FLAG_HARDWARE
* MFT_ENUM_HARDWARE_VENDOR_ID_Attribute: VEN_1002 (Type VT_LPWSTR)
* MFT_ENUM_HARDWARE_URL_Attribute: AMDh265Encoder (Type VT_LPWSTR)
* MFT_INPUT_TYPES_Attributes: MFVideoFormat_NV12, MFVideoFormat_ARGB32
* MFT_OUTPUT_TYPES_Attributes: MFVideoFormat_HEVC
* MFT_CODEC_MERIT_Attribute: 8 (Type VT_UI4)
* MFT_SUPPORT_DYNAMIC_FORMAT_CHANGE: 1 (Type VT_UI4)
* MF_TRANSFORM_ASYNC: 1 (Type VT_UI4)
* MF_SA_D3D11_AWARE: 1 (Type VT_UI4)
* MF_SA_D3D_AWARE: 1 (Type VT_UI4)
* MF_TRANSFORM_ASYNC_UNLOCK: 0 (Type VT_UI4)
* MFT_GFX_DRIVER_VERSION_ID_Attribute: 1.2.3.4

This follows Intel’s H.265/HEVC hardware compression offering also available in MFT form factor:

## Intel® Hardware H265 Encoder MFT

12 Attributes:

* MFT_TRANSFORM_CLSID_Attribute: {BC10864D-2B34-408F-912A-102B1B867B6C} (Type VT_CLSID)
* MF_TRANSFORM_FLAGS_Attribute: MFT_ENUM_FLAG_HARDWARE
* MFT_ENUM_HARDWARE_VENDOR_ID_Attribute: VEN_8086 (Type VT_LPWSTR)
* MFT_ENUM_HARDWARE_URL_Attribute: AA243E5D-2F73-48c7-97F7-F6FA17651651 (Type VT_LPWSTR)
* MFT_INPUT_TYPES_Attributes: {3231564E-3961-42AE-BA67-FF47CCC13EED}, MFVideoFormat_NV12, MFVideoFormat_ARGB32
* MFT_OUTPUT_TYPES_Attributes: MFVideoFormat_HEVC
* MFT_CODEC_MERIT_Attribute: 7 (Type VT_UI4)
* MFT_SUPPORT_DYNAMIC_FORMAT_CHANGE: 1 (Type VT_UI4)
* MF_TRANSFORM_ASYNC: 1 (Type VT_UI4)
* MFT_GFX_DRIVER_VERSION_ID_Attribute: 0.0.0.3

Intel Quick Sync Video Consumption by Applications

I wrote a few posts on hardware H.264 encoding (e.g. this and the latest one, Applying Hardsubs to H.264 Video with GPU). A blog reader asked a question regarding the availability of the mentioned Intel Quick Sync Video support on the low-end Intel x5-Z8300 CPU.

[…] Intel has advertised that the Cherry Trail CPUs support H264 encoding and / or QSV, but nowhere have I seen a demo of this being used […].
What did you use to encode the video? Is the QSV codec available in the x5-z8300 for possible 720p realtime encoding? I’d like to see this checked in regards to using software like FFmpeg with qsv_h264 -codec and OBS. […]

The picture below explains how applications consume Intel’s hardware video compression offering in Windows.

Intel QSV includes a hardware implementation of the encoder and corresponding drivers which provide a frontend API to software. This includes a component which integrates the codec with Microsoft’s Media Foundation API. Applications choose how to interface with the codec. One way is the Windows API: this is the way stock Microsoft applications work, and the way I used for the video encoding development mentioned on the blog. Other applications prefer to interface through Intel Media SDK, which is an alternate route ending up at the same hardware-backed services.

The Intel x5-Z8300 system in question has H.264 video encoding support integrated into the Windows API, and the services can be consumed without an additional Intel runtime and/or development kit. The codec, according to the benchmarks made earlier, is fast enough to handle real-time 720p video encoding even though the device is a budget one.

Applying Hardsubs to H.264 Video with GPU

Video adapters currently offer a range of services which enable transcoding of H.264 content with certain modifications (including but not limited to flexible overlays, scaling, mirroring, effects and filters) end-to-end on the GPU, keeping data as DirectX Graphics Infrastructure resources at all processing stages.

Such specialized processing capabilities are pretty powerful compared to traditional CPU processing, especially taking into consideration the performance of low-end, low-power systems still equipped with a contemporary GPU.

For a test, I transcoded H.264/MP4 video files of different resolutions, applying a text overlay with the time stamp of the video frame being processed. The overlay is complex enough: it varies from frame to frame and uses a standard font with respective rasterization (using DirectWrite). The test re-encoded H.264 content maintaining bitrate, without giving too much care to other encoding details (defaults used).

Hardsub Performance Test

The roughly made test was successful with two GPUs:

  1. Intel HD Graphics 4600 (7th gen; Desktop system; Core i7-4790 CPU)
  2. Intel HD Graphics (8th gen; Ultramobile system; Atom x5-Z8300 CPU) – the system is actually a budget Chinese tablet worth $200, Cube iWork 10 Ultimate

The test failed on other GPUs:

  • Intel HD Graphics 4000 (7th gen; Mobile system; Core i7-3517U CPU)
  • NVIDIA GeForce GTX 750

The problem, as it looks without getting into details, seems to be the inability of Media Foundation APIs to fit Direct3D-enabled pipelines out of the box, for example because a certain conversion is lacking. It looks like transcoding can be achieved by just putting some more effort into it.

As of now, Intel offers their 9th generation GPUs and the ones being tested are hardware a few years old…

Compared to real time performance of 100% (meaning that it takes one second to process one second of video of given metrics), both systems managed to do the transcoding relatively efficiently. With a roughly built test having a bottleneck at applying overlay, taking place serially in single thread, both systems showed performance sufficient to convert 1920×1080@60 video faster than in real time and without maxing CPU out.
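The 100% baseline can be written as a one-line formula. The sketch below assumes that values above 100% mean faster than real time; the function name and the sample numbers are hypothetical and are not results from the test.

```python
def realtime_performance_percent(video_seconds, elapsed_seconds):
    """Processing speed relative to real time; 100 means exactly real time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return 100.0 * video_seconds / elapsed_seconds


# Hypothetical numbers, not taken from the test: a 60-second clip
# transcoded in 40 seconds runs at 150% of real time.
print(realtime_performance_percent(60, 40))  # 150.0
```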

Intel’s seventh generation desktop GPU managed to do the job much faster.

It is interesting that even a cheap tablet can process a Full HD video stream while loading the CPU less than 40%. Basically, the performance is sufficient for doing real-time video processing (including using an external web camera like Logitech C930E) with certain processing in 1080p resolution using budget-grade hardware.

Re-encoding Performance

When there is no need to keep up with real-time processing pace, the cheap tablet showed the ability to do GPU processing on 2K video, which is also good news for those who want to apply budget hardware to high-resolution material.

Apparently, the key factor is the ability of the process to keep data in video hardware. As Intel GPU H.264 abilities scale well when used for concurrent multi-stream processing, the numbers promise great performance when recording video in several formats at a time: raw video, video with overlay, scaled down etc.

The table below gives more numbers for the tests conducted:

Re-encoding Performance Numbers

As mentioned above, the overlaying part itself is a single-threaded bottleneck, and presumably it is a reserve that can be used to cut elapsed time down even further.

Another interesting observation is that while the ultramobile system still uses much of the CPU time (which is okay, as it is not a powerful system by design), the desktop GPU has minimal impact on the CPU while doing a pretty complicated task.

Applicability of Virtual DirectShow Sources

Virtual DirectShow sources have long been a synonym for a software-only camera implementation exposed to applications along with physical cameras, in a way that applications consume the sources without distinguishing whether the camera is real or virtual. Vivek’s template was a starting point for many:

Capture Source Filter filter (version 0.1) 86 KB zipped, includes binaries.  A sample source filter that emulates a video capture device contributed by Vivek (rep movsd from the public newsgroups).  Thanks Vivek!  TMH has not tested this filter yet.  Ask questions about this on microsoft.public.win32.programmer.directx.video.

With API changes over the years, the sample and the concept are still understood as the method of adding a virtual camera; however, new scenarios exist where the concept no longer works. Typical problems:

  1. 64-bit applications cannot consume 32-bit virtual sources
  2. Virtual sources are not visible or accessible to applications consuming video using Media Foundation API

The diagram below explains the applicability of virtual cameras:

Applicability of Virtual DirectShow Sources

Importantly, virtual sources can only be consumed by DirectShow-based applications of the same bitness.

If the source developer needs to synchronize a virtual source across multiple applications (e.g. video is synthesized by another application and needs to be deliverable to multiple clients), the developer needs to add interprocess synchronization on the back end of the virtual source.

If the developer needs to support both 32- and 64-bit apps, both variants of the virtual source need to be registered, possibly with synchronization of the kind described in the paragraph above.

The only virtual device which is visible to all video capture applications is one implemented by a kernel-level driver (implementations are rare but exist).
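The visibility rules above can be condensed into a small predicate. This is a plain-Python model of the rules as stated in the text, not code that touches DirectShow or Media Foundation; the `Source` and `App` types and the `is_visible` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Source:
    kernel_driver: bool  # True for a kernel-level driver, False for a DirectShow filter
    bits: int            # 32 or 64; only meaningful for DirectShow filters


@dataclass
class App:
    api: str   # "DirectShow" or "MediaFoundation"
    bits: int  # 32 or 64


def is_visible(source, app):
    """Apply the visibility rules stated in the text."""
    if source.kernel_driver:
        return True  # a driver-backed device is visible to every capture application
    # A virtual DirectShow source needs a DirectShow application of the same bitness.
    return app.api == "DirectShow" and app.bits == source.bits


virtual_32 = Source(kernel_driver=False, bits=32)
print(is_visible(virtual_32, App("DirectShow", 32)))       # True
print(is_visible(virtual_32, App("DirectShow", 64)))       # False
print(is_visible(virtual_32, App("MediaFoundation", 32)))  # False
```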


Screen recording using Desktop Duplication API and hardware H.264 encoder

The application takes advantage of three powerful Windows APIs at a time:

MediaFoundationDesktopRecorder initializes a desktop duplication session and sends obtained desktop images to H.264 video encoder producing a standard MP4 recording. Optionally, it can add an audio track capturing data from one of the standard inputs.

The best performance is achieved with a hardware H.264 encoder: not only is the hardware encoder faster, but desktop images are also transferred to the encoder efficiently, without being copied through system memory. With suitable hardware, recording is pretty efficient.

There are certain limitations: the duplication API is Windows 8+, and encoder availability depends on hardware and OS versions. The application lets the API pick the encoder automatically and in the worst-case scenario falls back to a software encoder, which is typically a performance hit.

MediaFoundationDesktopRecorder UI

When started, the application prints initial information, esp. regarding availability of devices, and appends further lines as actions and events take place.

The application uses a configuration file with the same name and location as the application, and the .INI extension. Changes to the configuration file take effect when the application is restarted.

The application registers the Win+F5 and Win+F8 hotkeys globally to start/stop recording when the application is in the background (that is, when the user interacts with another application).

The application generates .MP4 files in the directory of its own location. There will be a video track and, depending on settings, one additional audio track. Video is taken from one of the monitors, and audio from one of the available standard audio input devices.

The application also generates log files at one of the following locations:

  • C:\ProgramData\MediaFoundationDesktopRecorder.log
  • C:\Users\$(UserName)\AppData\Local\MediaFoundationDesktopRecorder.log (in case the first path above is inaccessible, esp. due to insufficient permissions)
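The fallback between the two log locations can be sketched as "try each candidate in order and keep the first that opens". This is an illustration of the idea only, not the application's actual code; `open_log` is a hypothetical helper and the demo uses a temporary directory instead of the real paths.

```python
import os
import tempfile


def open_log(candidates):
    """Open the first log path that can be opened for appending."""
    for path in candidates:
        try:
            return open(path, "a", encoding="utf-8")
        except OSError:
            continue  # e.g. insufficient permissions or a missing directory
    raise OSError("no writable log location")


# Demo: the first candidate sits in a directory that does not exist,
# so the helper falls back to the second one.
with tempfile.TemporaryDirectory() as root:
    blocked = os.path.join(root, "missing-directory", "recorder.log")
    fallback = os.path.join(root, "recorder.log")
    with open_log([blocked, fallback]) as log:
        log.write("started\n")
        print(os.path.basename(log.name))  # recorder.log
```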

Configuration

The configuration .INI file might contain a few settings that set up and alter the behavior of the application:

[Input]
;Video Adapter Description=NVIDIA GeForce GTX 750
Video Output Device Name=\\.\DISPLAY2
;Audio Friendly Name=Stereo Mix (Realtek High Definition Audio)

When started, the application enumerates available video and audio inputs (“found video…”, “found audio…”). These discoveries are compared against the configuration file settings in order to identify the monitor for recording, and possibly the audio input device.

The default behavior is to take the first available monitor, which happens when settings do not instruct otherwise. By default, no audio is recorded. Audio is recorded and added to the resulting file only if an input device is provided explicitly.

The application also prints which devices are taken for further recording (“using adapter…”).
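The matching of the [Input] settings against discovered devices can be sketched with Python's standard `configparser`. This only illustrates the described behavior; `read_input_settings` and `pick_monitor` are hypothetical helpers, and falling back to the first monitor when the configured one is not found is an assumption, since the text does not say what happens in that case.

```python
import configparser

SAMPLE = r"""
[Input]
;Video Adapter Description=NVIDIA GeForce GTX 750
Video Output Device Name=\\.\DISPLAY2
;Audio Friendly Name=Stereo Mix (Realtek High Definition Audio)
"""


def read_input_settings(text):
    """Return the [Input] settings; commented-out or missing keys become None."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["Input"] if parser.has_section("Input") else {}
    return {
        "adapter": section.get("Video Adapter Description"),
        "output": section.get("Video Output Device Name"),
        "audio": section.get("Audio Friendly Name"),
    }


def pick_monitor(found_outputs, settings):
    """Configured monitor if it was found, otherwise the first available one.

    The fallback for a configured-but-missing monitor is an assumption.
    """
    wanted = settings["output"]
    if wanted in found_outputs:
        return wanted
    return found_outputs[0]


settings = read_input_settings(SAMPLE)
print(settings["audio"])  # None, so no audio would be recorded
print(pick_monitor([r"\\.\DISPLAY1", r"\\.\DISPLAY2"], settings))  # \\.\DISPLAY2
```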

[Format]
;Video Frame Rate=30000
;Video Frame Rate Denominator=1001
Video Bitrate=4096000
Video Texture Pool Capacity=24
Video Throttle=70
Audio Bitrate=192000

The default behavior is to identify the monitor’s refresh rate and produce an output file with video at the same frame rate. The Video Frame Rate and Video Frame Rate Denominator settings offer an override for the target file frame rate. With the former value only, it is the frame rate itself. With both values, they define a ratio, e.g. values of 30000 and 1001 result in a 29.97 fps file.
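The arithmetic behind the two settings is a plain ratio. The sketch below is an illustration under the rules just described; `target_frame_rate` is a hypothetical helper, and the monitor rate of 59954/1000 is borrowed from the sample log output further down.

```python
from fractions import Fraction


def target_frame_rate(numerator=None, denominator=None, monitor_rate=Fraction(60)):
    """Frame rate of the output file as an exact ratio.

    With no override the monitor's refresh rate is used; with only a
    numerator the denominator defaults to 1.
    """
    if numerator is None:
        return Fraction(monitor_rate)
    return Fraction(numerator, denominator or 1)


rate = target_frame_rate(30000, 1001)
print(rate, round(float(rate), 2))  # 30000/1001 29.97

# No override: follow the monitor, here 59954/1000 Hz.
print(float(target_frame_rate(monitor_rate=Fraction(59954, 1000))))  # 59.954
```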

Frame rate reduction is a good way to reduce encoding complexity and overall graphics subsystem load.

Bitrate values define respective bitrates for the encoded content.

Details

As recording proceeds, the application grabs new desktop snapshots and sends them to the encoder. There are no specific guarantees about frame rate stability or frame rate reduction when the graphics subsystem is overloaded. When the complexity is excessive, it is expected that some frames might be lost without breaking the playability of the output file.

The application provides additional information when it creates a file, for example:

Using Direct3D 11 at feature level D3D_FEATURE_LEVEL_11_0
Using Desktop Duplication mode: Resolution 1680 x 1050, Refresh Rate 59954/1000, Format DXGI_FORMAT_B8G8R8A8_UNORM
Using path “D:\Projects\...\Output\20160707-070707.mp4”
Using video transform Direct3D 11 Aware, Category MFT_CATEGORY_VIDEO_PROCESSOR, Input MFVideoFormat_ARGB32, Output MFVideoFormat_NV12
Using video transform NVIDIA H.264 Encoder MFT, Direct3D 11 Aware, Category MFT_CATEGORY_VIDEO_ENCODER, Input MFVideoFormat_NV12, Output MFVideoFormat_H264
Started writing…
PPP frames written (QQQ frame timeouts, RRR early frame skips, SSS late frame skips)
Stopped writing
Output file size is TTT bytes

Once recording has started, the application might encounter a condition where a certain hardware resource is no longer available, e.g. the desktop itself is locked by the user. The application will close the file and attempt to automatically restart recording into a new file. The attempts keep going until the user explicitly stops recording.

The application does NOT do the following (among things it could):

  • the application is limited to recording from one monitor only; to record from two at a time it is possible to start several instances, however the produced results will not be synchronized
  • the application does not provide options to record a single window image, to cut a section of the monitor image or to scale the image down
  • the application does not offer a choice of video encoders (e.g. when there are two or more hardware H.264 encoders); it will always use the encoder picked by the system
  • the application only offers a bitrate setting for video encoding
  • the application does not provide flexibility in audio encoding settings; it also expects that the audio device is available throughout the entire recording session (esp. is not unplugged as recording goes)
