AV1 video makes its way to Media Foundation

Microsoft released the AV1 Video Extension (Beta) via their Store:

Play AV1 videos on your Windows 10 device. This extension is an early beta version of the AV1 software decoder that lets you play videos that have been encoded using the AV1 video coding standard developed by the Alliance for Open Media. Since this is an early release, you might see some performance issues when playing AV1 videos. We’re continuing to improve this extension. If you allow apps to be updated automatically, you should get the latest updates and improvements when we release them.

The extension installs a Media Foundation decoder, AV1VideoExtension, for the MFVideoFormat_AV1 video media subtype, dually interfaced for desktop and Store (UWP) applications. The decoder is software-only, without hardware acceleration (via GPU). Let us hope we will see compatible hardware soon, along with vendor-specific implementations offering hardware-assisted decoding.
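For a quick check that the decoder is visible to applications, one can enumerate decoder MFTs accepting AV1 input. Below is a minimal sketch; MFVideoFormat_AV1 may be absent from older SDK headers, so it is defined here from its FOURCC following the standard subtype pattern:

#include <cstdio>
#include <mfapi.h>
#include <mftransform.h>
#include <atlbase.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfuuid.lib")

int main()
{
    ATLENSURE_SUCCEEDED(MFStartup(MF_VERSION));
    // MFVideoFormat_AV1 follows the usual FOURCC-based subtype pattern ('AV01')
    static GUID const SubtypeAV1 { FCC('AV01'), 0x0000, 0x0010, { 0x80, 0x00, 0x00, 0xAA, 0x00, 0x38, 0x9B, 0x71 } };
    MFT_REGISTER_TYPE_INFO InputTypeInformation { MFMediaType_Video, SubtypeAV1 };
    IMFActivate** ppActivates;
    UINT32 nActivateCount = 0;
    // The extension registers a synchronous MFT, hence MFT_ENUM_FLAG_SYNCMFT
    ATLENSURE_SUCCEEDED(MFTEnumEx(MFT_CATEGORY_VIDEO_DECODER, MFT_ENUM_FLAG_SYNCMFT | MFT_ENUM_FLAG_SORTANDFILTER, &InputTypeInformation, nullptr, &ppActivates, &nActivateCount));
    for(UINT32 nIndex = 0; nIndex < nActivateCount; nIndex++)
    {
        CComHeapPtr<WCHAR> pszFriendlyName;
        UINT32 nFriendlyNameLength;
        if(SUCCEEDED(ppActivates[nIndex]->GetAllocatedString(MFT_FRIENDLY_NAME_Attribute, &pszFriendlyName, &nFriendlyNameLength)))
            wprintf(L"%ls\n", static_cast<WCHAR*>(pszFriendlyName)); // expecting AV1VideoExtension among the results
        ppActivates[nIndex]->Release();
    }
    CoTaskMemFree(ppActivates);
    MFShutdown();
    return 0;
}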

## AV1VideoExtension

Attributes:

 * MF_TRANSFORM_FLAGS_Attribute: MFT_ENUM_FLAG_SYNCMFT
 * MFT_INPUT_TYPES_Attributes: MFVideoFormat_AV1
 * MFT_OUTPUT_TYPES_Attributes: MFVideoFormat_NV12, MFVideoFormat_IYUV, MFVideoFormat_420O, MFVideoFormat_P010
 * {3C0FBE52-D034-4115-995D-95B356B9855C}: 1 (Type VT_UI4)
 * {7347C815-79FC-4AD9-877D-ACDF5F46685E}: C:\Program Files\WindowsApps\Microsoft.AV1VideoExtension_1.1.13377.0_x64__8wekyb3d8bbwe\build\x64\av1decodermft_store.dll (Type VT_LPWSTR)
 * {957193AD-9029-4835-A2F2-3EC9AE9BB6C8}: Microsoft.AV1VideoExtension_1.1.13377.0_x64__8wekyb3d8bbwe (Type VT_LPWSTR)
 * {9D8B61A8-6BC8-4BFF-B31F-3A31060AFA3D}: Microsoft.AV1VideoExtension_8wekyb3d8bbwe (Type VT_LPWSTR)
 * {BB49BC51-1810-4C3A-A9CF-D59C4E5B9622}: {4AFB1971-030E-47F7-B991-C8E3BEBB9094} (Type VT_CLSID)
 * {DE106D30-42FB-4767-808D-0FCC6811B0B9}: AV1DecMft (Type VT_LPWSTR)
 * {F9542F80-D069-4EFE-B30D-345536F76AAA}: 0 (Type VT_UI4)
 * {F9A1EF38-F61E-42E6-87B3-309438F9AC67}: 1 (Type VT_UI4)

### IMFTransform

 * Stream Limits: Input 1..1, Output 1..1
 * Streams: Input 1, Output 1

#### Attributes

 * MF_SA_D3D11_AWARE: 0 (Type VT_UI4)
 * CODECAPI_AVDecVideoThumbnailGenerationMode: 0 (Type VT_UI4)
 * {592A2A5A-E797-491A-9738-C0007BE28C52}: ??? (Type VT_UNKNOWN, 0x00000280DCE59790)
 * CODECAPI_AVDecNumWorkerThreads: 0 (Type VT_UI4)
 * MF_SA_D3D_AWARE: 0 (Type VT_UI4)
 * MF_TRANSFORM_ASYNC: 0 (Type VT_UI4)

Getting started with WASAPI

Reader’s question:

… The audio field is very new to me, but I must use WASAPI for one of my projects. Would you mind giving me a direction where I should start learning, in order to be able to implement WASAPI in my project?

WASAPI basics are straightforward:

  • enumerate devices
  • capture audio
  • play (render) audio back

To start WASAPI development I recommend looking at Microsoft’s SDK samples here. The samples include both capture and playback tasks in simple scenarios.
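To illustrate the first item on the list above, here is a minimal device enumeration sketch using the Core Audio IMMDeviceEnumerator API (the capture and render parts are better taken from the samples just mentioned):

#include <cstdio>
#include <initguid.h>
#include <mmdeviceapi.h>
#include <functiondiscoverykeys_devpkey.h>
#include <atlbase.h>
#pragma comment(lib, "ole32.lib")

int main()
{
    ATLENSURE_SUCCEEDED(CoInitializeEx(nullptr, COINIT_MULTITHREADED));
    CComPtr<IMMDeviceEnumerator> pDeviceEnumerator;
    ATLENSURE_SUCCEEDED(pDeviceEnumerator.CoCreateInstance(__uuidof(MMDeviceEnumerator)));
    CComPtr<IMMDeviceCollection> pDeviceCollection;
    // eRender lists playback endpoints; use eCapture for microphones and line inputs
    ATLENSURE_SUCCEEDED(pDeviceEnumerator->EnumAudioEndpoints(eRender, DEVICE_STATE_ACTIVE, &pDeviceCollection));
    UINT nDeviceCount;
    ATLENSURE_SUCCEEDED(pDeviceCollection->GetCount(&nDeviceCount));
    for(UINT nIndex = 0; nIndex < nDeviceCount; nIndex++)
    {
        CComPtr<IMMDevice> pDevice;
        ATLENSURE_SUCCEEDED(pDeviceCollection->Item(nIndex, &pDevice));
        CComPtr<IPropertyStore> pPropertyStore;
        ATLENSURE_SUCCEEDED(pDevice->OpenPropertyStore(STGM_READ, &pPropertyStore));
        PROPVARIANT vFriendlyName;
        PropVariantInit(&vFriendlyName);
        ATLENSURE_SUCCEEDED(pPropertyStore->GetValue(PKEY_Device_FriendlyName, &vFriendlyName));
        wprintf(L"%u: %ls\n", nIndex, vFriendlyName.pwszVal);
        PropVariantClear(&vFriendlyName);
        // An IAudioClient for capture or render is then obtained via pDevice->Activate(...)
    }
    CoUninitialize();
    return 0;
}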

There are a few more samples for WASAPI on UWP as well.

You will need MSDN documentation for Windows Audio Session API (WASAPI) to get details on API calls.


Once you have more specific questions I suggest that you search StackOverflow and MSDN Forums and ask on StackOverflow if you still need help.

Minefields of Intel Media SDK

When it comes to vendor-specific SDKs for hardware-assisted video encoding, Intel Media SDK is perhaps the oldest one among the current vendors: Intel, NVIDIA, AMD. It is also, surprisingly, the worst one. All three vendors offer their SDKs for really similar capabilities, however the kits are designed and structured differently. While NVIDIA and AMD are close in terms of developer convenience, Intel Media SDK is clearly the outsider here.

Debug output to facilitate debugging and troubleshooting? No. Working with this SDK while having memories of AMF SDK's AMFDebug and AMFTrace eventually makes you cry.

Trying to initialize a session against a non-Intel GPU, which is apparently not going to work? No failure until you hit something weird later in an unrelated call. What is MFX_IMPL_HARDWARE2 in the first place? Second in which enumeration exactly, or otherwise, how do I understand which device it is when I select it by Intel's ordinal number? The MFX_IMPL_HARDWAREn flags are not defined to be sequential, and a documentation typo references a non-existing MFX_IMPL_HARDWARE1 flag. NVIDIA and AMD clearly offer this in a more convenient way.
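For reference, a minimal sketch of the session initialization in question (real Media SDK calls, simplified error handling; link against the Media SDK dispatcher): note that nothing here maps the implementation ordinal back to a specific adapter.

#include <mfxvideo.h>

bool InitializeHardwareSession(mfxSession& Session)
{
    mfxVersion Version { { 0, 1 } }; // minimal API version 1.0
    // MFX_IMPL_HARDWARE_ANY requests "any" hardware device; MFXQueryIMPL then
    // reports which ordinal (MFX_IMPL_HARDWARE, MFX_IMPL_HARDWARE2, ...) was
    // picked, but there is no way to match that ordinal to a DXGI adapter
    if(MFXInit(MFX_IMPL_HARDWARE_ANY | MFX_IMPL_VIA_D3D11, &Version, &Session) != MFX_ERR_NONE)
        return false;
    mfxIMPL Implementation;
    MFXQueryIMPL(Session, &Implementation);
    return MFX_IMPL_BASETYPE(Implementation) != MFX_IMPL_SOFTWARE;
}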

Forgot to attach an allocator? You get a meaningless failure code when trying to initialize the encoding context.

Trying to identify the maximal supported resolution for the encoder? Oopsy.

How do I identify whether the runtime/driver is capable of implicitly handling ARGB32 (RGB4) to NV12 conversion? No way without an actual attempt to initialize a context. In which runtime version was the capability introduced? Not documented.
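The closest thing to a capability probe is MFXVideoENCODE_Query; a sketch of probing RGB4 input support follows (assuming an initialized Session), under the caveat just stated that only a real Init attempt gives a definitive answer:

mfxVideoParam VideoParam { };
VideoParam.mfx.CodecId = MFX_CODEC_AVC;
VideoParam.mfx.FrameInfo.FourCC = MFX_FOURCC_RGB4; // ARGB32 input
VideoParam.mfx.FrameInfo.ChromaFormat = MFX_CHROMAFORMAT_YUV444;
VideoParam.mfx.FrameInfo.Width = 1920; // aligned to 16
VideoParam.mfx.FrameInfo.Height = 1088; // aligned for progressive content
VideoParam.mfx.FrameInfo.CropW = 1920;
VideoParam.mfx.FrameInfo.CropH = 1080;
VideoParam.mfx.FrameInfo.FrameRateExtN = 30;
VideoParam.mfx.FrameInfo.FrameRateExtD = 1;
VideoParam.mfx.FrameInfo.PicStruct = MFX_PICSTRUCT_PROGRESSIVE;
VideoParam.IOPattern = MFX_IOPATTERN_IN_VIDEO_MEMORY;
mfxVideoParam CorrectedVideoParam = VideoParam;
mfxStatus Status = MFXVideoENCODE_Query(Session, &VideoParam, &CorrectedVideoParam);
// MFX_ERR_NONE (or an MFX_WRN_ code) hints that the conversion might be
// accepted, but in practice only MFXVideoENCODE_Init settles the question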

mfxExtVPPDoNotUse and mfxExtVPPDoUse… Seriously? Not documented well. An attempt to initialize the structures “differently”, in a way that still makes sense, results in a meaningless error code.

The asynchronous MFXVideoENCODE_EncodeFrameAsync requires that the lifetime of the mfxFrameSurface1 argument be extended until the completion of the asynchronous call… Things like these just have to be documented! One would hate to find this out while troubleshooting unstable operation of the API.
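A minimal sketch of the requirement (assuming an already initialized Session and encoder): the input surface may only be reused once the matching sync point has been waited on.

#include <mfxvideo.h>

mfxStatus EncodeOneFrame(mfxSession Session, mfxFrameSurface1* pSurface, mfxBitstream& Bitstream)
{
    mfxSyncPoint SyncPoint = nullptr;
    mfxStatus Status = MFXVideoENCODE_EncodeFrameAsync(Session, nullptr, pSurface, &Bitstream, &SyncPoint);
    if(Status != MFX_ERR_NONE || !SyncPoint)
        return Status; // MFX_ERR_MORE_DATA and friends: no output produced yet
    // pSurface must stay alive and untouched up to this point; only once
    // SyncOperation reports completion is it safe to reuse or release it
    return MFXVideoCORE_SyncOperation(Session, SyncPoint, 60000 /* ms */);
}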

The hardware encoders have been out there for years, and they are decent. It is surprising that the SDK is not equally good and developer-friendly.

AMD H.264 Video Encoder MFT buggy processing of synchronization-enabled input textures

Even though the AMD H.264 Video Encoder Media Foundation Transform (MFT), AKA AMDh264Encoder, is, generally, not such a badly done piece of software, it still has a few awkward bugs to mention. This time I am going to show this one: the video encoder transform fails to acquire synchronization on input textures.

The problem comes up when keyed-mutex-aware textures knock on the input door of the transform. The Media Foundation samples carry textures created with the D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX flag, which MSDN describes this way:

[…] You can retrieve a pointer to the IDXGIKeyedMutex interface from the resource by using IUnknown::QueryInterface. The IDXGIKeyedMutex interface implements the IDXGIKeyedMutex::AcquireSync and IDXGIKeyedMutex::ReleaseSync APIs to synchronize access to the surface. The device that creates the surface, and any other device that opens the surface by using OpenSharedResource, must call IDXGIKeyedMutex::AcquireSync before they issue any rendering commands to the surface. When those devices finish rendering, they must call IDXGIKeyedMutex::ReleaseSync. […]

A video encoder MFT is supposed to pay attention to the flag and acquire synchronization before the video frame is taken for encoding. The AMD implementation fails to do so; this is a bug, a pretty important one, and it has been around for a while.

The following code snippet (see also text at the bottom of the post) demonstrates the incorrect behavior of the transform.
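The original snippet is not reproduced here in full; the sketch below (hypothetical variable names, assuming an initialized transform and a keyed-mutex texture wrapped into a sample) captures the idea behind the referenced lines:

// The input texture is created with D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX and
// wrapped into pSample; we lock it out before handing it to the encoder
CComPtr<IDXGIKeyedMutex> pKeyedMutex;
ATLENSURE_SUCCEEDED(pTexture->QueryInterface(&pKeyedMutex));
// Holding the acquisition makes the texture inaccessible to any other device
ATLENSURE_SUCCEEDED(pKeyedMutex->AcquireSync(0, INFINITE));
ATLENSURE_SUCCEEDED(pTransform->ProcessInput(0, pSample, 0));
// A correct encoder would now fail on its internal AcquireSync; the AMD MFT
// proceeds, and ProcessOutput happily delivers an encoded H.264 sample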

Execution reaches the breakpoint position and produces an H.264 sample even though the input texture fed into the transform is made inaccessible by the AcquireSync call in line 104.

By contrast, Microsoft’s H.264 Video Encoder implementation, AKA CLSID_MSH264EncoderMFT, implements the correct behavior and triggers a DXGI_ERROR_INVALID_CALL (0x887A0001) failure in line 112.

In the process of preparing the SSCCE above and writing the blog post, I hit another AMD MFT bug, which is perhaps less important but still shows inaccuracy of the internal implementation.

An attempt to send the MFT_MESSAGE_NOTIFY_START_OF_STREAM message in line 96 above, without input and output media types set, triggers a memory access violation:

‘Application.exe’ (Win32): Loaded ‘C:\Windows\System32\DriverStore\FileRepository\c0334550.inf_amd64_cd83b792de8abee9\B334365\atiumd6a.dll’. Symbol loading disabled by Include/Exclude setting.
‘Application.exe’ (Win32): Loaded ‘C:\Windows\System32\DriverStore\FileRepository\c0334550.inf_amd64_cd83b792de8abee9\B334365\atiumd6t.dll’. Symbol loading disabled by Include/Exclude setting.
‘Application.exe’ (Win32): Loaded ‘C:\Windows\System32\DriverStore\FileRepository\c0334550.inf_amd64_cd83b792de8abee9\B334365\amduve64.dll’. Symbol loading disabled by Include/Exclude setting.
Exception thrown at 0x00007FF81FC0E24B (AMDh264Enc64.dll) in Application.exe: 0xC0000005: Access violation reading location 0x0000000000000000.


Getting MF_E_TRANSFORM_NEED_MORE_INPUT from Video Processor MFT’s ProcessOutput just to let it take next input

Another example of how Microsoft Media Foundation can be annoying in small things. So we have the Video Processor MFT, a transform which addresses multiple video conversion tasks:

The video processor MFT is a Microsoft Media Foundation transform (MFT) that performs colorspace conversion, video resizing, deinterlacing, frame rate conversion, rotation, cropping, spatial left and right view unpacking, and mirroring.

It is easy to see that Microsoft does not offer a lot of DSPs, and even fewer of them are GPU friendly. Video Processor MFT is a “swiss army knife” tool: it takes care of video fitting tasks, in an efficient way, in task combinations, and is able to take advantage of GPU processing with a fallback to a software code path, in the manner known earlier from Color Converter DSP and similar components.

Now the main question is, if you are offering just one thing, is there any chance you can do it right?

First of all, the API is not feature rich; it offers just the basics via the IMFVideoProcessorControl interface. Okay, some functionality might not be available in the fallback software code path, but it is still the only Direct3D 11 aware conversion component on offer, so it could still expose more options for those who want to take advantage of GPU-enabled conversions.
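For reference, the basics the interface does offer, sketched below (assuming pTransform is an instantiated Video Processor MFT):

CComPtr<IMFVideoProcessorControl> pVideoProcessorControl;
ATLENSURE_SUCCEEDED(pTransform->QueryInterface(&pVideoProcessorControl));
// Crop to a source sub-rectangle, flip vertically, keep rotation as is
RECT SourceRectangle { 0, 0, 1280, 720 };
ATLENSURE_SUCCEEDED(pVideoProcessorControl->SetSourceRectangle(&SourceRectangle));
ATLENSURE_SUCCEEDED(pVideoProcessorControl->SetMirror(MIRROR_VERTICAL));
ATLENSURE_SUCCEEDED(pVideoProcessorControl->SetRotation(ROTATION_NORMAL));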

With its internal affiliation to the Direct3D 11 Video Processor API, it might be worth mentioning how exactly the Direct3D API is utilized internally, its limitations and perhaps some advanced options to customize the conversion: the underlying API is more flexible than the MFT.

The documentation is not just scarce, it is also incomplete and inconsistent. The MFT's article does not mention its implementation of the IMFVideoProcessorControl2 interface, while the interface itself is described as belonging to the MFT. Besides, I wrote before that this interface is known for giving some trouble.

The MFT is designed to work in Media Foundation pipelines, such as hosted by the Media Session and others. However, it does not take a rocket scientist to realize that if you offer just one thing to developers for a broad range of tasks, the API will be used in various scenarios, including, for example, as a standalone conversion API.

They should have mentioned in the documentation that the MFT behavior is significantly different in GPU and CPU modes, for example in the way the output samples are produced: CPU mode requires the caller to supply a buffer into which output is generated. GPU mode, on the contrary, provides its own output textures with data, taken from an internally managed pool (this can be changed, but it is the default behavior). This matches the expectations of the Media Session API and alike, but those are also poorly documented, so it is not very helpful overall.
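A minimal sketch of handling this difference (hypothetical names; MFT_OUTPUT_STREAM_PROVIDES_SAMPLES in the stream info signals that the transform brings its own samples):

MFT_OUTPUT_STREAM_INFO StreamInformation { };
ATLENSURE_SUCCEEDED(pTransform->GetOutputStreamInfo(0, &StreamInformation));
MFT_OUTPUT_DATA_BUFFER OutputDataBuffer { };
CComPtr<IMFSample> pOutputSample;
if(!(StreamInformation.dwFlags & (MFT_OUTPUT_STREAM_PROVIDES_SAMPLES | MFT_OUTPUT_STREAM_CAN_PROVIDE_SAMPLES)))
{
    // CPU mode: the caller pre-allocates the sample and a buffer of advertised size
    ATLENSURE_SUCCEEDED(MFCreateSample(&pOutputSample));
    CComPtr<IMFMediaBuffer> pMediaBuffer;
    ATLENSURE_SUCCEEDED(MFCreateMemoryBuffer(StreamInformation.cbSize, &pMediaBuffer));
    ATLENSURE_SUCCEEDED(pOutputSample->AddBuffer(pMediaBuffer));
    OutputDataBuffer.pSample = pOutputSample;
}
// In GPU mode OutputDataBuffer.pSample stays nullptr and ProcessOutput returns
// the MFT's own texture-backed sample, which the caller needs to release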

I am finally getting to the reason which inspired me to write this post in the first place: doing input and output with the Video Processor MFT. This is such a fundamental task that MSDN has a few words on it:

When you have configured the input type and output type for an MFT, you can begin processing data samples. You pass samples to the MFT for processing by using the IMFTransform::ProcessInput method, and then retrieve the processed sample by calling IMFTransform::ProcessOutput. You should set accurate time stamps and durations for all input samples passed. Time stamps are not strictly required but help maintain audio/video synchronization. If you do not have the time stamps for your samples it is better to leave them out than to use uncertain values.

If you use the MFT for conversion and set the time stamps accurately, it is easy to achieve “one input – one output” behavior. The MFT is additionally synchronous, so it does not require implementing the asynchronous processing model: it is possible to consume the API in a really straightforward way: media types set, input 1, output 1, input 2, output 2 etc., everything within a single thread, linearly. Note that the MFT does not necessarily produce one output sample for every input sample, it is just possible to manage it this way.

However, even in such a simple scenario the MFT finds a way to horse around. Even when it has finished producing output for given input, it still requires an additional IMFTransform::ProcessOutput call, just to return MF_E_TRANSFORM_NEED_MORE_INPUT and unlock itself for further input. A failure to make this unnecessary call and receive the failure status results in being unable to feed new input in, with MF_E_NOTACCEPTING from IMFTransform::ProcessInput respectively. Even though this sort of matches the documented behavior (for example, in the ASF related documentation: Processing Data in the Encoder), where the MFT host is expected to request output until it is no longer available, there is nothing in the documented contract that prevents the MFT from being friendlier on its end. Given the state and the role of this API, it should have been made super friendly to developers, and Microsoft failed to reach the minimal acceptable level of friendliness here.

The linear code snippet looks this way:

ATLENSURE_SUCCEEDED(pTransform->ProcessInput(0, pSample, 0));
MFT_OUTPUT_DATA_BUFFER OutputDataBuffer = { };
// … (in CPU mode OutputDataBuffer.pSample carries a caller-provided sample, see above)
DWORD nStatus;
ATLENSURE_SUCCEEDED(pTransform->ProcessOutput(0, 1, &OutputDataBuffer, &nStatus));
// … (consume the produced output, release OutputDataBuffer.pSample and pEvents, if any)
// NOTE: Kick the transform to unlock its input
const HRESULT nProcessOutputResult = pTransform->ProcessOutput(0, 1, &OutputDataBuffer, &nStatus);
ATLASSERT(nProcessOutputResult == MF_E_TRANSFORM_NEED_MORE_INPUT);

MediaFoundationDxgiCapabilities: with AMF SDK H.264 encoder related data

Yet another post on the AMD AMF SDK and, hopefully, a helpful tool reference. I updated one of the capability discovery applications (MediaFoundationDxgiCapabilities) so that it includes a printout of AMFVideoEncoderVCE_AVC related properties, similarly to how they are printed for Nvidia video adapters.

Information includes:

  • runtime version (and its availability in the first place!)
  • maximal supported resolution, profile and level
  • formats, with respect to capabilities reported by the Direct3D 11 initialized component; specifically, the data show which surface formats the encoding component can internally convert on the way to the hardware encoder

It looks like this tool was not described in detail earlier, so note that one can also find other DXGI related information in its output (such as, for example, the order of enumeration of DXGI adapters depending on whether an app runs on the iGPU or dGPU of a hybrid system, and DXGI desktop duplication related information).

This is reported directly from AMF, as opposed to information received from the Media Foundation API (which is also partially included, though). On video encoders reported via Media Foundation, and not just H.264 ones, see MediaFoundationVideoEncoderTransforms: Detecting support for hardware H.264 video encoders.

# Display Devices

 * Radeon RX 570 Series
  * Instance: PCI\VEN_1002&DEV_67DF&SUBSYS_E3871DA2&REV_EF\4&2D78AB8F&0&0008
  * DEVPKEY_Device_Manufacturer: Advanced Micro Devices, Inc.
  * DEVPKEY_Device_DriverVersion: 24.20.13017.5001
  * DEVPKEY_Undocumented_LUID: 0.0x0000D1B8

[...]

##### AMD AMF SDK Specific

 * AMF SDK Version: 1.4.9.0 // https://gpuopen.com/gaming-product/advanced-media-framework/
 * AMF Runtime Version: 1.4.9.0

###### AMFVideoEncoderVCE_AVC

 * Acceleration Type: AMF_ACCEL_HARDWARE
 * AMF_VIDEO_ENCODER_CAP_MAX_BITRATE: 100,000,000
 * AMF_VIDEO_ENCODER_CAP_NUM_OF_STREAMS: 16
 * AMF_VIDEO_ENCODER_CAP_MAX_PROFILE: AMF_VIDEO_ENCODER_PROFILE_HIGH
 * AMF_VIDEO_ENCODER_CAP_MAX_LEVEL: 52
 * AMF_VIDEO_ENCODER_CAP_BFRAMES: 0
 * AMF_VIDEO_ENCODER_CAP_MIN_REFERENCE_FRAMES: 1
 * AMF_VIDEO_ENCODER_CAP_MAX_REFERENCE_FRAMES: 16
 * AMF_VIDEO_ENCODER_CAP_MAX_TEMPORAL_LAYERS: 1
 * AMF_VIDEO_ENCODER_CAP_FIXED_SLICE_MODE: 0
 * AMF_VIDEO_ENCODER_CAP_NUM_OF_HW_INSTANCES: 1

####### Input

 * Width Range: 64 - 4,096
 * Height Range: 64 - 2,160
 * Vertical Alignment: 32
 * Format Count: 6
  * Format: AMF_SURFACE_NV12 Native
  * Format: AMF_SURFACE_YUV420P
  * Format: AMF_SURFACE_YV12
  * Format: AMF_SURFACE_BGRA
  * Format: AMF_SURFACE_RGBA
  * Format: AMF_SURFACE_ARGB
 * Memory Type Count: 4
  * Memory Type: AMF_MEMORY_DX11 Native
  * Memory Type: AMF_MEMORY_OPENCL
  * Memory Type: AMF_MEMORY_OPENGL
  * Memory Type: AMF_MEMORY_HOST
 * Interlace Support: 0

####### Output

 * Width Range: 64 - 4,096
 * Height Range: 64 - 2,160
 * Vertical Alignment: 32
 * Format Count: 1
  * Format: AMF_SURFACE_NV12 Native
 * Memory Type Count: 4
  * Memory Type: AMF_MEMORY_DX11 Native
  * Memory Type: AMF_MEMORY_OPENCL 
  * Memory Type: AMF_MEMORY_OPENGL 
  * Memory Type: AMF_MEMORY_HOST 
 * Interlace Support: 0

Note that more detailed information can be obtained using the amf\public\samples\CPPSamples\CapabilityManager application from the SDK itself, if you build and run it.
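For those preferring code, the gist of such a capability query through the AMF API is sketched below (assuming AMF SDK headers and a 64-bit AMF runtime; error handling reduced):

#include <cstdio>
#include <windows.h>
#include "public/include/core/Factory.h"
#include "public/include/components/VideoEncoderVCE.h"

int main()
{
    // The AMF runtime is loaded dynamically; availability of the library and
    // its AMFInit entry point is exactly the "runtime availability" check
    HMODULE hModule = LoadLibraryW(AMF_DLL_NAME); // amfrt64.dll
    if(!hModule)
        return 1;
    AMFInit_Fn AmfInit = reinterpret_cast<AMFInit_Fn>(GetProcAddress(hModule, AMF_INIT_FUNCTION_NAME));
    amf::AMFFactory* pFactory = nullptr;
    AmfInit(AMF_FULL_VERSION, &pFactory);
    amf::AMFContextPtr pContext;
    pFactory->CreateContext(&pContext);
    pContext->InitDX11(nullptr); // default adapter; repeat per adapter as needed
    amf::AMFComponentPtr pComponent;
    pFactory->CreateComponent(pContext, AMFVideoEncoderVCE_AVC, &pComponent);
    amf::AMFCapsPtr pCaps;
    pComponent->GetCaps(&pCaps);
    wprintf(L"Acceleration Type: %d\n", pCaps->GetAccelerationType());
    amf_int64 nMaximalBitrate = 0;
    pCaps->GetProperty(AMF_VIDEO_ENCODER_CAP_MAX_BITRATE, &nMaximalBitrate);
    wprintf(L"AMF_VIDEO_ENCODER_CAP_MAX_BITRATE: %lld\n", nMaximalBitrate);
    amf::AMFIOCapsPtr pInputCaps;
    pCaps->GetInputCaps(&pInputCaps);
    amf_int32 nMinimalWidth, nMaximalWidth;
    pInputCaps->GetWidthRange(&nMinimalWidth, &nMaximalWidth);
    wprintf(L"Input Width Range: %d - %d\n", nMinimalWidth, nMaximalWidth);
    for(amf_int32 nIndex = 0; nIndex < pInputCaps->GetFormatCount(); nIndex++)
    {
        amf::AMF_SURFACE_FORMAT Format;
        amf_bool bNative;
        pInputCaps->GetFormatAt(nIndex, &Format, &bNative);
        wprintf(L"Format: %d%ls\n", Format, bNative ? L" Native" : L"");
    }
    return 0;
}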


Runtime H.264 encoder setting changes with AMD H.264 hardware MFT

One more AMD MFT related post for now. Some time ago I mentioned that Intel's implementation of the hardware H.264 video encoder Media Foundation Transform (MFT) does not correctly implement runtime changes of encoding settings. The respective Intel Developer Zone submission has received no follow-up and, presumably, no attention over time. This was a good moment to check how AMD is doing when it comes to adjustment of encoding settings on an active session.

Let us recap:

  • Microsoft: software encoder supports the feature as documented;
  • Intel: fails to change settings;
  • Nvidia: settings change is supported to the minimal documented extent;
  • AMD: ?

The AMD H.264 hardware encoder fails to support the feature which MSDN documentation mentions as required. The respective request triggers an 0x80004001 E_NOTIMPL “Not implemented” failure.
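For context, the dynamic adjustment in question goes through ICodecAPI on the active encoder MFT; below is a minimal sketch (hypothetical helper, mean bitrate as an example property):

#include <mftransform.h>
#include <strmif.h>
#include <codecapi.h>
#include <atlbase.h>

// Attempts a documented dynamic setting change on a streaming H.264 encoder MFT
HRESULT UpdateMeanBitrate(IMFTransform* pTransform, ULONG nBitrate)
{
    CComPtr<ICodecAPI> pCodecApi;
    HRESULT nResult = pTransform->QueryInterface(&pCodecApi);
    if(FAILED(nResult))
        return nResult;
    CComVariant vValue(nBitrate); // VT_UI4
    // AMD's hardware encoder responds here with 0x80004001 E_NOTIMPL
    return pCodecApi->SetValue(&CODECAPI_AVEncCommonMeanBitRate, &vValue);
}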