Hardware video encoding in Radeon RX Vega M GH Graphics

If you are curious what video encoding capabilities Radeon RX Vega M GH Graphics offers for a Media Foundation application, here are the details. Some introductory information for starters:

The AMD Radeon RX Vega M GH is an integrated GPU in the fastest Intel Kaby-Lake-G SoC. It combines a Kaby-Lake processor, a Vega graphics card and 4 GB HBM2 memory on a single package. The graphics card offers 24 CUs (1536 shaders) and is clocked from 1063 – 1190 MHz.

The quote above comes from a source with plenty of benchmarks for high-resolution gaming; I am, however, interested in the hardware codecs on the chip. The system enumerates two DXGI adapters, so both GPUs are present:

Display Devices

  • Intel(R) HD Graphics 630
    • Instance: PCI\VEN_8086&DEV_591B&SUBSYS_20738086&REV_04\3&11583659&0&10
    • DEVPKEY_Device_Manufacturer: Intel Corporation
    • DEVPKEY_Device_DriverVersion: 24.20.100.6286
  • Radeon RX Vega M GH Graphics
    • Instance: PCI\VEN_1002&DEV_694C&SUBSYS_20738086&REV_C0\4&2BF2E4F6&0&0008
    • DEVPKEY_Device_Manufacturer: Advanced Micro Devices, Inc.
    • DEVPKEY_Device_DriverVersion: 24.20.11026.2001

    It is interesting that each of the two GPUs has its own video encoders:

    Category MFT_CATEGORY_VIDEO_ENCODER

    • Intel® Quick Sync Video H.264 Encoder MFT (MFT_ENUM_FLAG_HARDWARE)
    • Intel® Hardware H265 Encoder MFT (MFT_ENUM_FLAG_HARDWARE)
    • AMDh264Encoder (MFT_ENUM_FLAG_HARDWARE)
    • AMDh265Encoder (MFT_ENUM_FLAG_HARDWARE)

    That is, the Intel and AMD parts each come with their own video encoding ASIC, nothing is cut down, and together they provide more video encoding capacity than one would normally need.

    Below are the AMF SDK capabilities reported for the hardware. The data looks much the same as that of my other Radeon RX 570 Series system:


    Direct3D 11 Video Processors

    ID3D11VideoContext::VideoProcessorSetOutputTargetRect method:

    The target rectangle is the area within the destination surface where the output will be drawn. The target rectangle is given in pixel coordinates, relative to the destination surface. If this method is never called, or if the Enable parameter is FALSE, the video processor writes to the entire destination surface.

    Okay, let us try it out by deflating the output rectangle, “creating a margin”:

    OutputPosition.SetRect(0, 0, OutputTextureDescription.Width, OutputTextureDescription.Height);
    OutputPosition.DeflateRect(OutputTextureDescription.Width / 8, OutputTextureDescription.Height / 8);
    pVideoContext->VideoProcessorSetOutputTargetRect(pVideoProcessor, TRUE, OutputPosition);
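The rectangle arithmetic above can be checked without Windows headers. This is a minimal sketch, assuming a hypothetical `Rect` struct and `Deflate` helper that mimic `CRect::DeflateRect`; neither name comes from the original code.

```cpp
#include <cassert>

// Hypothetical stand-in for RECT/CRect, so the arithmetic compiles anywhere.
struct Rect {
    long left, top, right, bottom;
};

// Same idea as CRect::DeflateRect(x, y): move every side toward the center.
inline Rect Deflate(Rect r, long x, long y) {
    assert(x >= 0 && y >= 0);
    return Rect{r.left + x, r.top + y, r.right - x, r.bottom - y};
}

// Target rectangle with a margin of 1/8 of the surface size on each side,
// matching the SetRect + DeflateRect pair in the snippet above.
inline Rect MarginTargetRect(long width, long height) {
    return Deflate(Rect{0, 0, width, height}, width / 8, height / 8);
}
```

For a 1920×1080 output texture, `MarginTargetRect(1920, 1080)` yields (240, 135) to (1680, 945), which is the area where the video processor is expected to draw.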

    Ability to take care of destination rectangle, Radeon RX 570 Series vs. Intel(R) UHD Graphics 630

    Why worry, maybe it is just one small bug for today? Oh, no. Forget SetOutputTargetRect, now it is just a plain texture-to-texture operation with the same DXGI format. These two images are produced on the same system, just by different GPUs. NVIDIA GeForce GTX 1080 Ti adds a purple tint to the output when it is not expected to:

    Ability to keep colors right, NVIDIA GeForce GTX 1080 Ti vs. Intel(R) UHD Graphics 630

    This one does not even look like a bug compared to those mentioned above. Even though it was an “optimal quality” request, Radeon’s optimal quality is not really impressive:

    Text downscaling, Radeon RX 570 Series vs. Intel(R) UHD Graphics 630

    Nasty bugs in Intel Media SDK

    It might be an “old” version of the Intel Media SDK runtime, but software is still expected to run fine in older environments as well.

    The MFXVideoVPP_Reset API has been available since SDK API 1.0, so there is nothing new about it. In a certain scenario I use the API to change the resolution of the processed video: I update the mfxVideoParam structure accordingly, and then MFXVideoVPP_Reset, MFXVideoVPP_Query and MFXVideoVPP_QueryIOSurf all succeed. Nice.

    The system is an i7-3571U laptop, that is, one equipped with an Intel 3rd Gen CPU. The MFX version reported is 1.11.

    The reset sequence succeeds as expected; however, a subsequent MFXVideoVPP_GetVideoParam call reports unchanged properties… Hey, come on!

    I quote below feedback from Intel I just found on a seemingly similar matter, enjoy:

    for some algorithm, MFXVideoVPP_Reset is not supported, and it will return MFX_ERR_NONE but no effect.
    you can try replace MFXVideoVPP_Reset with MFXVideoVPP_Close & MFXVideoVPP_Init to make it work.
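The suggested workaround can be wrapped into a "verify, then fall back" helper. This is a minimal sketch, not SDK code: `Status`, `Params`, `ResetOrReinit` and the `Vpp` wrapper type are hypothetical stand-ins for mfxStatus, mfxVideoParam and a thin class forwarding to the MFXVideoVPP_* functions, and `SilentResetVpp` is a fake that imitates the behavior described above.

```cpp
#include <cassert>

// Hypothetical stand-ins so the sketch compiles without the SDK headers:
// Status mirrors mfxStatus (0 plays the role of MFX_ERR_NONE) and Params
// carries only the two mfxVideoParam fields compared here.
using Status = int;
constexpr Status kOk = 0;

struct Params {
    unsigned width;
    unsigned height;
};

inline bool SameResolution(const Params& a, const Params& b) {
    return a.width == b.width && a.height == b.height;
}

// Vpp is any wrapper exposing Reset/GetVideoParam/Close/Init that forward to
// the corresponding MFXVideoVPP_* functions (the wrapper itself is assumed).
// Returns the final status; *reinitialized tells whether the fallback ran.
template <typename Vpp>
Status ResetOrReinit(Vpp& vpp, const Params& wanted, bool* reinitialized) {
    assert(reinitialized != nullptr);
    *reinitialized = false;
    Status status = vpp.Reset(wanted);
    if (status == kOk) {
        Params actual{};
        if (vpp.GetVideoParam(&actual) == kOk && SameResolution(actual, wanted))
            return kOk;  // Reset really took effect
    }
    // Reset failed or silently did nothing: close and initialize from scratch.
    vpp.Close();
    status = vpp.Init(wanted);
    *reinitialized = (status == kOk);
    return status;
}

// Fake that imitates the reported behavior: Reset succeeds, changes nothing.
struct SilentResetVpp {
    Params current{640, 480};
    Status Reset(const Params&) { return kOk; }
    Status GetVideoParam(Params* out) { *out = current; return kOk; }
    Status Close() { return kOk; }
    Status Init(const Params& p) { current = p; return kOk; }
};
```

The point of the design is that a success code from Reset is never trusted on its own: the parameters are read back and compared, and the Close and Init pair is used only when the read-back does not match.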


    AV1 video makes its way with Media Foundation

    Microsoft released AV1 Video Extension (Beta) via their store:

    Play AV1 videos on your Windows 10 device. This extension is an early beta version of the AV1 software decoder that lets you play videos that have been encoded using the AV1 video coding standard developed by the Alliance for Open Media. Since this is an early release, you might see some performance issues when playing AV1 videos. We’re continuing to improve this extension. If you allow apps to be updated automatically, you should get the latest updates and improvements when we release them.

    The extension installs a Media Foundation decoder, AV1VideoExtension, for the MFVideoFormat_AV1 video media subtype, usable from both desktop and store (UWP) applications. The decoder is software-only, without GPU hardware acceleration. Let us hope we will soon see compatible hardware and vendor-specific implementations with hardware-assisted decoding.

    ## AV1VideoExtension
    
    13 Attributes:
    
     * MF_TRANSFORM_FLAGS_Attribute: MFT_ENUM_FLAG_SYNCMFT
     * MFT_INPUT_TYPES_Attributes: MFVideoFormat_AV1
     * MFT_OUTPUT_TYPES_Attributes: MFVideoFormat_NV12, MFVideoFormat_IYUV, MFVideoFormat_420O, MFVideoFormat_P010
     * {3C0FBE52-D034-4115-995D-95B356B9855C}: 1 (Type VT_UI4)
     * {7347C815-79FC-4AD9-877D-ACDF5F46685E}: C:\Program Files\WindowsApps\Microsoft.AV1VideoExtension_1.1.13377.0_x64__8wekyb3d8bbwe\build\x64\av1decodermft_store.dll (Type VT_LPWSTR)
     * {957193AD-9029-4835-A2F2-3EC9AE9BB6C8}: Microsoft.AV1VideoExtension_1.1.13377.0_x64__8wekyb3d8bbwe (Type VT_LPWSTR)
     * {9D8B61A8-6BC8-4BFF-B31F-3A31060AFA3D}: Microsoft.AV1VideoExtension_8wekyb3d8bbwe (Type VT_LPWSTR)
     * {BB49BC51-1810-4C3A-A9CF-D59C4E5B9622}: {4AFB1971-030E-47F7-B991-C8E3BEBB9094} (Type VT_CLSID)
     * {DE106D30-42FB-4767-808D-0FCC6811B0B9}: AV1DecMft (Type VT_LPWSTR)
     * {F9542F80-D069-4EFE-B30D-345536F76AAA}: 0 (Type VT_UI4)
     * {F9A1EF38-F61E-42E6-87B3-309438F9AC67}: 1 (Type VT_UI4)
    
    ### IMFTransform
    
     * Stream Limits: Input 1..1, Output 1..1
     * Streams: Input 1, Output 1
    
    #### Attributes
    
     * MF_SA_D3D11_AWARE: 0 (Type VT_UI4)
     * CODECAPI_AVDecVideoThumbnailGenerationMode: 0 (Type VT_UI4)
     * {592A2A5A-E797-491A-9738-C0007BE28C52}: ??? (Type VT_UNKNOWN, 0x00000280DCE59790)
     * CODECAPI_AVDecNumWorkerThreads: 0 (Type VT_UI4)
     * MF_SA_D3D_AWARE: 0 (Type VT_UI4)
     * MF_TRANSFORM_ASYNC: 0 (Type VT_UI4)
    

    Getting started with WASAPI

    Reader’s question:

    … Audio field is very new to me. But I must use WASAPI for one of my project. Would you mind give me a direction where should I start learning in order to be able to implement WASAPI to my project .

    WASAPI basics are straightforward:

    • enumerate devices
    • capture audio
    • play (render) audio back

    To start WASAPI development I recommend looking at Microsoft’s SDK samples here. The samples include both capture and playback tasks in simple scenarios.

    A few more samples for WASAPI on UWP:

    You will need MSDN documentation for Windows Audio Session API (WASAPI) to get details on API calls.

    Related MSDN API links:

    Once you have more specific questions I suggest that you search StackOverflow and MSDN Forums and ask on StackOverflow if you still need help.

    Minefields of Intel Media SDK

    When it comes to vendor-specific SDKs for hardware-assisted video encoding, Intel Media SDK is perhaps the oldest among the current vendors: Intel, NVIDIA, AMD. It is also, surprisingly, the worst one. All three vendors offer SDKs with very similar capabilities, but the kits are designed and structured differently. While NVIDIA and AMD are close in terms of developer convenience, Intel Media SDK clearly trails behind.

    Debug output to facilitate debugging and troubleshooting? No. Working with this while remembering AMF SDK’s AMFDebug and AMFTrace eventually makes you cry.

    Trying to initialize a session against a non-Intel GPU, which is obviously not going to work? No failure until you hit something weird later in an unrelated call. What the hell is MFX_IMPL_HARDWARE2 in the first place? In which enumeration exactly is this device second, and how am I supposed to know what device it is when I select it by Intel’s ordinal number? MFX_IMPL_HARDWAREn flags are not defined to be sequential. A documentation typo references a non-existent MFX_IMPL_HARDWARE1 flag. NVIDIA and AMD clearly offer this in a more convenient way.
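To make the "not sequential" point concrete, here is a minimal sketch of mapping a zero-based adapter ordinal to an implementation flag. The numeric values are an assumption based on the constants in mfxcommon.h and should be checked against the header of the SDK actually in use; `HardwareImplForOrdinal` and the `kImpl…` names are hypothetical.

```cpp
#include <cassert>

// Numeric values believed to match mfxcommon.h; treat them as an assumption
// and prefer the real MFX_IMPL_* constants from the SDK header in real code.
constexpr int kImplHardware  = 0x0002;  // MFX_IMPL_HARDWARE  (default device)
constexpr int kImplHardware2 = 0x0005;  // MFX_IMPL_HARDWARE2 (2nd device)
constexpr int kImplHardware3 = 0x0006;  // MFX_IMPL_HARDWARE3 (3rd device)
constexpr int kImplHardware4 = 0x0007;  // MFX_IMPL_HARDWARE4 (4th device)

// The gap between the first and the second flag is why "flag + ordinal"
// arithmetic does not work.
static_assert(kImplHardware2 != kImplHardware + 1, "flags are not sequential");

// Hypothetical helper: zero-based ordinal (0..3) to implementation flag.
inline int HardwareImplForOrdinal(unsigned ordinal) {
    static const int table[] = {kImplHardware, kImplHardware2,
                                kImplHardware3, kImplHardware4};
    assert(ordinal < 4);
    return table[ordinal];
}
```

A lookup table avoids the tempting `MFX_IMPL_HARDWARE + n` shortcut, which would select the wrong implementation for every device after the first.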

    Forgot to attach an allocator? You get a meaningless failure code when trying to initialize the encoding context.

    Trying to identify the maximum supported resolution for the encoder? Oopsy.

    How do I identify whether the runtime/driver is capable of implicitly handling ARGB32 (RGB4) to NV12 conversion? No way without an actual attempt to initialize the context. In which runtime version was the capability introduced? Not documented.

    mfxExtVPPDoNotUse and mfxExtVPPDoUse… Seriously? Not documented well. An attempt to initialize the structures “differently”, in a way that still makes sense, results in a meaningless error code.

    Asynchronous MFXVideoENCODE_EncodeFrameAsync requires that the lifetime of the mfxFrameSurface1 argument extends until the asynchronous call completes… Things like these just have to be documented! One would hate to find this out while troubleshooting unstable operation of the API.
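One common way to satisfy this lifetime requirement is to keep surfaces in a pool that outlives the encoding calls and to reuse a surface only when the runtime no longer holds it. This is a minimal sketch under assumptions: `FrameSurface` is a hypothetical stand-in for mfxFrameSurface1 that keeps only the `Data.Locked` counter, and `SurfacePool` is an illustrative name, not an SDK type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for mfxFrameData/mfxFrameSurface1; only the Locked
// counter, which the runtime raises while it uses a surface, is modeled.
struct FrameData {
    unsigned short Locked;
};
struct FrameSurface {
    FrameData Data;
};

// Owns the surfaces for the whole encoding session, so a surface passed to
// an asynchronous call stays alive until the call completes.
class SurfacePool {
public:
    explicit SurfacePool(std::size_t count) : surfaces_(count) {
        assert(count > 0);
    }

    // Returns a surface the runtime is not using, or nullptr if all are busy.
    FrameSurface* Acquire() {
        for (FrameSurface& surface : surfaces_) {
            if (surface.Data.Locked == 0)
                return &surface;
        }
        return nullptr;
    }

private:
    std::vector<FrameSurface> surfaces_;  // never resized: pointers stay valid
};
```

When `Acquire` returns nullptr, the caller would typically wait for a pending operation to finish before trying again, rather than allocating a short-lived surface on the stack.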

    The hardware encoders have been out there for years, and they are decent ones. It is surprising that the SDK is not equally good and friendly.

    AMD H.264 Video Encoder MFT buggy processing of synchronization-enabled input textures

    Even though the AMD H.264 Video Encoder Media Foundation Transform (MFT), AKA AMDh264Encoder, is generally a reasonably well-made piece of software, it still has a few awkward bugs worth mentioning. This time I am going to show one of them: the video encoder transform fails to acquire synchronization on input textures.

    The problem comes up when keyed mutex aware textures knock on the input door of the transform. The Media Foundation samples carry textures created with the D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX flag, which MSDN describes this way:

    […] You can retrieve a pointer to the IDXGIKeyedMutex interface from the resource by using IUnknown::QueryInterface. The IDXGIKeyedMutex interface implements the IDXGIKeyedMutex::AcquireSync and IDXGIKeyedMutex::ReleaseSync APIs to synchronize access to the surface. The device that creates the surface, and any other device that opens the surface by using OpenSharedResource, must call IDXGIKeyedMutex::AcquireSync before they issue any rendering commands to the surface. When those devices finish rendering, they must call IDXGIKeyedMutex::ReleaseSync. […]

    A video encoder MFT is supposed to pay attention to the flag and acquire synchronization before the video frame is taken for encoding. The AMD implementation fails to do so, which is a bug, a pretty important one, and it has been around for a while.

    The following code snippet (see also text at the bottom of the post) demonstrates the incorrect behavior of the transform.

    Execution reaches the breakpoint position and produces an H.264 sample even though the input texture fed into the transform is made inaccessible by the AcquireSync call in line 104.

    By contrast, Microsoft’s H.264 Video Encoder implementation, AKA CLSID_MSH264EncoderMFT, behaves correctly and triggers a DXGI_ERROR_INVALID_CALL (0x887A0001) failure in line 112.

    While preparing the SSCCE above and writing this blog post I hit another AMD MFT bug, which is perhaps less important but still reveals inaccuracy in the internal implementation.

    An attempt to send the MFT_MESSAGE_NOTIFY_START_OF_STREAM message in line 96 above, without input and output media types set, triggers a memory access violation:

    ‘Application.exe’ (Win32): Loaded ‘C:\Windows\System32\DriverStore\FileRepository\c0334550.inf_amd64_cd83b792de8abee9\B334365\atiumd6a.dll’. Symbol loading disabled by Include/Exclude setting.
    ‘Application.exe’ (Win32): Loaded ‘C:\Windows\System32\DriverStore\FileRepository\c0334550.inf_amd64_cd83b792de8abee9\B334365\atiumd6t.dll’. Symbol loading disabled by Include/Exclude setting.
    ‘Application.exe’ (Win32): Loaded ‘C:\Windows\System32\DriverStore\FileRepository\c0334550.inf_amd64_cd83b792de8abee9\B334365\amduve64.dll’. Symbol loading disabled by Include/Exclude setting.
    Exception thrown at 0x00007FF81FC0E24B (AMDh264Enc64.dll) in Application.exe: 0xC0000005: Access violation reading location 0x0000000000000000.
