NVIDIA Video Codec SDK encoder initialization memory leak

It appears that re-initializing an encoding session with the NVIDIA Video Codec SDK produces an unexpected memory leak.

So, how does it work exactly?

NVENCSTATUS Status;
Status = m_ApiFunctionList.nvEncInitializeEncoder(m_Encoder, &InitializeParams);
assert(Status == NV_ENC_SUCCESS);
// NOTE: Another nvEncInitializeEncoder call
Status = m_ApiFunctionList.nvEncInitializeEncoder(m_Encoder, &InitializeParams);
assert(Status == NV_ENC_SUCCESS); // Still success
...
Status = m_ApiFunctionList.nvEncDestroyEncoder(m_Encoder);
assert(Status == NV_ENC_SUCCESS);

The root cause of the problem is the second nvEncInitializeEncoder call. Alright, this might not be exactly how the API is designed to work, but the returned statuses all indicate success, so it is a bit hard to justify the leak by saying that the second initialization call was not expected in the first place. Apparently the implementation overwrites internally allocated resources without properly releasing or reusing them, and without triggering a warning of any sort.

Another part of the problem is the eclectic design of the API in the first place. You open a “session” and obtain an “encoder” as a result. Then you initialize the “encoder”, and when you are finished you destroy the “encoder”. Do you destroy the “session”? Oh no, you don’t have any session at all, except that the API for opening a “session” actually opens an “encoder”.

So when I get into a situation where I want to initialize an encoder that is already initialized, I destroy the existing “encoder”, open a new “session”, and then initialize the session-encoder once again with the initialization parameters.
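
The workaround can be sketched as a small wrapper that never initializes the same handle twice. This is a minimal sketch, not SDK code: `EncoderSession`, `FakeEncoderApi` and their method names are hypothetical, and the fake API merely counts calls in place of nvEncOpenEncodeSessionEx, nvEncInitializeEncoder and nvEncDestroyEncoder so that the example is self-contained.

```cpp
#include <cassert>

// Hypothetical stand-in for the NVENC entry points (open session, initialize,
// destroy). It only counts calls so the sketch runs without the SDK.
struct FakeEncoderApi {
    int OpenCount = 0;
    int InitializeCount = 0;
    int DestroyCount = 0;

    void* Open() { ++OpenCount; return new int(0); }
    bool Initialize(void* Encoder, int /*Params*/) { ++InitializeCount; return Encoder != nullptr; }
    void Destroy(void* Encoder) { ++DestroyCount; delete static_cast<int*>(Encoder); }
};

// Owns one "session-encoder" handle and guarantees that an already
// initialized handle is destroyed and reopened instead of re-initialized.
template <typename Api, typename Params>
class EncoderSession {
public:
    explicit EncoderSession(Api& api) : m_Api(api) {}
    ~EncoderSession() { Close(); }
    EncoderSession(const EncoderSession&) = delete;
    EncoderSession& operator=(const EncoderSession&) = delete;

    bool Initialize(const Params& params) {
        if (m_Initialized)
            Close();                    // destroy the existing "encoder" first
        if (!m_Encoder)
            m_Encoder = m_Api.Open();   // open a new "session"
        if (!m_Encoder)
            return false;
        m_Initialized = m_Api.Initialize(m_Encoder, params);
        return m_Initialized;
    }

    void Close() {
        if (m_Encoder) {
            m_Api.Destroy(m_Encoder);
            m_Encoder = nullptr;
        }
        m_Initialized = false;
    }

private:
    Api& m_Api;
    void* m_Encoder = nullptr;
    bool m_Initialized = false;
};
```

With the real SDK, the `Api` type would presumably forward to the m_ApiFunctionList entries used in the snippet above; the point of the wrapper is only that a second `Initialize` goes through destroy and open rather than hitting the same handle again.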

MFCreateVideoSampleFromSurface’s IMFTrackedSample offering

The IMFTrackedSample interface is available/allowed in UWP applications. The interface is useful when one implements a pool of samples and needs a notification when a certain instance can be recycled.

Use this interface to determine whether it is safe to delete or re-use the buffer contained in a sample. One object assigns itself as the owner of the video sample by calling SetAllocator. When all objects release their reference counts on the sample, the owner’s callback method is invoked.

The notification is asynchronous, meaning that when a sample becomes available the notification is scheduled for delivery via the standard (for Media Foundation) IMFAsyncCallback::Invoke call. This is quite convenient.

When this method is called, the sample holds an additional reference count on itself. When every other object releases its reference counts on the sample, the sample invokes the pSampleAllocator callback method. To get a pointer to the sample, call IMFAsyncResult::GetObject on the asynchronous result object given to the callback’s IMFAsyncCallback::Invoke method.

I would not have mentioned this if it was that simple, would I?

One starts sensing problems already when looking at the MSDN page:

Requirements

Minimum supported client – Windows Vista [desktop apps | UWP apps]

Minimum supported server – Windows Server 2008 [desktop apps | UWP apps]

Header – Evr.h

Library – Strmiids.lib

Oh really, Strmiids.lib?

So the problem is that even though the interface itself is whitelisted for UWP and is a Media Foundation interface by nature, it is implemented along with EVR and is effectively exposed to the public only via the MFCreateVideoSampleFromSurface API. That is, the only API function that provides access to the UWP-friendly interface is a UWP-unfriendly function. Bummer.

It took me less than 300 lines of code to implement a video sample class with an IMFTrackedSample implementation that mimics the standard one (goodbye, stock implementation!), so it is not difficult. However, it would be better if the OS implementation were readily available in the first place.
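
The core of such an implementation is the reference counting rule quoted above: the sample keeps one extra reference on itself, and the owner is notified when that is the only reference left. Below is a minimal portable sketch of that rule only. It is not COM code and not the author's class: `TrackedSample` and `Callback` are hypothetical names, the callback is invoked synchronously (the real interface schedules IMFAsyncCallback::Invoke asynchronously), and the counter is not thread-safe.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Simplified model of a tracked sample: while an allocator callback is set,
// the sample holds one reference on itself. When every other reference is
// released, the callback fires and the self-reference is dropped.
class TrackedSample {
public:
    using Callback = std::function<void(TrackedSample*)>;

    static TrackedSample* Create() { return new TrackedSample(); }

    unsigned long AddRef() { return ++m_RefCount; }

    unsigned long Release() {
        const unsigned long count = --m_RefCount;
        if (count == 1 && m_Allocator) {
            // Only the self-reference is left: notify the owner once
            Callback allocator = std::move(m_Allocator);
            m_Allocator = nullptr;
            allocator(this);   // the owner may AddRef to keep the sample pooled
            return Release();  // drop the self-reference taken in SetAllocator
        }
        if (count == 0)
            delete this;
        return count;
    }

    // Mirrors the role of IMFTrackedSample::SetAllocator in this model
    void SetAllocator(Callback allocator) {
        m_Allocator = std::move(allocator);
        AddRef();              // additional reference the sample holds on itself
    }

private:
    TrackedSample() = default;
    ~TrackedSample() = default;

    unsigned long m_RefCount = 1;
    Callback m_Allocator;
};
```

A pool would typically pass a callback that takes its own reference and puts the sample back on the free list; a real implementation would also have to carry the IMFSample methods and deliver the notification through an asynchronous result object.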

Intel H.264 Video Encoder MFT is ignoring texture synchronization too

Some time ago I wrote about a bug in AMD’s H.264 Video Encoder MFT, where the implementation fails to synchronize access to a Direct3D 11 texture. Well, Intel’s implementation has exactly the same problem. Intel® Quick Sync Video H.264 Encoder MFT processes input textures/frames without acquiring synchronization and can lose the actual content.

It is pretty hard to reproduce this problem because it is hardware dependent, and in most cases the data arrives in the texture before the encoder starts processing it, so the problem remains hidden. On certain systems, however, the bug comes up easily and produces a stuttering effect. Since input textures are pooled, when new data is late to arrive in a texture, the H.264 encoder encodes an old video frame. The H.264 output is technically valid: it just stutters on playback because the wrong content was encoded.

For a Media Foundation API consumer it is not easy to work around the problem because Media Foundation does not provide access to the data streamed between the primitives internally. A high level application might not even be aware that the primitives are exchanging synchronization-enabled textures, so it is unclear where the source of the problem is.

Possible solutions to the problem (applicable or not depending on specific case):

  1. do not use synchronization-enabled textures: copy from the properly synchronized texture into a new plain texture before feeding it into the encoder; this might require an additional/special MFT inserted into the pipeline before the encoder
  2. implement a customized subsystem similar to Media Session (Sink Writer), with control over the streamed data so that, in particular, one could synchronize (or duplicate) the data before it is fed to the encoder’s IMFTransform::ProcessInput
  3. avoid using vendor supplied video encoder MFTs as buggy…

Hardware video encoding in Radeon RX Vega M GH Graphics

If you are curious what video encoding capabilities Radeon RX Vega M GH Graphics offers for a Media Foundation application, here are the details. Some introductory information for starters:

The AMD Radeon RX Vega M GH is an integrated GPU in the fastest Intel Kaby-Lake-G SoC. It combines a Kaby-Lake processor, a Vega graphics card and 4 GB HBM2 memory on a single package. The graphics card offers 24 CUs (1536 shaders) and is clocked from 1063 – 1190 MHz.

The source of the quote above has enough benchmarks related to high resolution gaming; I am, however, interested in the hardware codecs on the chip. The system enumerates two DXGI adapters, so they are both present on the chip:

Display Devices

  • Intel(R) HD Graphics 630
    • Instance: PCI\VEN_8086&DEV_591B&SUBSYS_20738086&REV_04\3&11583659&0&10
    • DEVPKEY_Device_Manufacturer: Intel Corporation
    • DEVPKEY_Device_DriverVersion: 24.20.100.6286
  • Radeon RX Vega M GH Graphics
    • Instance: PCI\VEN_1002&DEV_694C&SUBSYS_20738086&REV_C0\4&2BF2E4F6&0&0008
    • DEVPKEY_Device_Manufacturer: Advanced Micro Devices, Inc.
    • DEVPKEY_Device_DriverVersion: 24.20.11026.2001

    Then it is interesting that both integrated GPUs have their own video encoders:

    Category MFT_CATEGORY_VIDEO_ENCODER

    • Intel® Quick Sync Video H.264 Encoder MFT (MFT_ENUM_FLAG_HARDWARE)
    • Intel® Hardware H265 Encoder MFT (MFT_ENUM_FLAG_HARDWARE)
    • AMDh264Encoder (MFT_ENUM_FLAG_HARDWARE)
    • AMDh265Encoder (MFT_ENUM_FLAG_HARDWARE)

    That is, both Intel and AMD hardware parts come with their video encoding ASICs, no reduction, and together they basically provide excessive video encoding capabilities. 

    Below is the quote of AMF SDK capabilities of the hardware. The data looks pretty similar to that of my other system, with Radeon RX 570 Series:


    Direct3D 11 Video Processors

    ID3D11VideoContext::VideoProcessorSetOutputTargetRect method:

    The target rectangle is the area within the destination surface where the output will be drawn. The target rectangle is given in pixel coordinates, relative to the destination surface. If this method is never called, or if the Enable parameter is FALSE, the video processor writes to the entire destination surface.

    Okay, let us try it out by deflating the output rectangle, “creating a margin”.

    OutputPosition.SetRect(0, 0, OutputTextureDescription.Width, OutputTextureDescription.Height);
    OutputPosition.DeflateRect(OutputTextureDescription.Width / 8, OutputTextureDescription.Height / 8);
    pVideoContext->VideoProcessorSetOutputTargetRect(pVideoProcessor, TRUE, OutputPosition);
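
To make the arithmetic of the snippet above concrete, here is a minimal portable sketch of the same computation. `Rect` and `DeflatedTargetRect` are hypothetical names standing in for CRect::SetRect followed by CRect::DeflateRect; the sketch only computes the coordinates and does not touch Direct3D.

```cpp
#include <cassert>

// Plain stand-in for the RECT passed to VideoProcessorSetOutputTargetRect
struct Rect {
    long left;
    long top;
    long right;
    long bottom;
};

// Same result as SetRect(0, 0, width, height) followed by
// DeflateRect(width / 8, height / 8): a margin of one eighth on every side.
inline Rect DeflatedTargetRect(long width, long height) {
    const long marginX = width / 8;
    const long marginY = height / 8;
    return Rect{ marginX, marginY, width - marginX, height - marginY };
}
```

For a 1920×1080 output texture this gives the rectangle (240, 135) to (1680, 945), that is, a 1440×810 target area centered in the surface. Per the documentation quoted above, this is the area the video processor is expected to draw into.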

    Ability to take care of destination rectangle, Radeon RX 570 Series vs. Intel(R) UHD Graphics 630

    Why worry, maybe it is just one small bug for today? Oh, no. Forget SetOutputTargetRect, now just plain texture-to-texture with the same DXGI format. These two are produced on the same system, just different GPUs. NVIDIA GeForce GTX 1080 Ti adds a purple tint to the output when it is basically not expected to:

    Ability to keep colors right, NVIDIA GeForce GTX 1080 Ti vs. Intel(R) UHD Graphics 630

    This one does not even look like a bug compared to those mentioned above. Even though it was an “optimal quality” request, Radeon’s optimal quality is not really impressive:

    Text downscaling, Radeon RX 570 Series vs. Intel(R) UHD Graphics 630

    Nasty bugs in Intel Media SDK

    It might be an “old” version of Intel Media SDK runtime but still it is expected that software is running fine in older environments as well.

    MFXVideoVPP_Reset API has been available since SDK API 1.0; there is nothing new about it. In a certain scenario I use the API to change the resolution of the processed video: I update the mfxVideoParam structure accordingly, and then MFXVideoVPP_Reset, MFXVideoVPP_Query and MFXVideoVPP_QueryIOSurf all succeed – nice.

    The system is an i7-3571U laptop, that is, one equipped with an Intel 3rd Gen CPU. The MFX version reported is 1.11.

    The reset sequence succeeds as expected; however, a subsequent MFXVideoVPP_GetVideoParam reports unchanged properties… Hey, come on!

    I quote below feedback from Intel I just found on a seemingly similar matter, enjoy:

    for some algorithm, MFXVideoVPP_Reset is not supported, and it will return MFX_ERR_NONE but no effect.
    you can try replace MFXVideoVPP_Reset with MFXVideoVPP_Close & MFXVideoVPP_Init to make it work.
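
The advice above, combined with the GetVideoParam check, can be sketched as "reset, verify, and fall back to close and init". This is a minimal sketch under stated assumptions: `VppParams`, `FakeVpp` and `ResetOrReinitialize` are hypothetical names, `VppParams` is a reduced stand-in for mfxVideoParam, and `FakeVpp` imitates a runtime whose reset reports success without effect so that the example runs without the Intel Media SDK.

```cpp
#include <cassert>

// Hypothetical, reduced stand-in for mfxVideoParam: only the frame size matters here
struct VppParams {
    int Width;
    int Height;
};

inline bool operator==(const VppParams& a, const VppParams& b) {
    return a.Width == b.Width && a.Height == b.Height;
}

// Hypothetical stand-in for a runtime where Reset "succeeds" but changes nothing
struct FakeVpp {
    VppParams Active{};
    bool Opened = false;
    int CloseCount = 0;

    bool Init(const VppParams& params) { Active = params; Opened = true; return true; }
    bool Reset(const VppParams&) { return Opened; }  // reports success, no effect
    bool GetVideoParam(VppParams& params) const { params = Active; return Opened; }
    void Close() { Opened = false; ++CloseCount; }
};

// Try Reset first, verify through GetVideoParam that it took effect,
// and fall back to Close followed by Init when it did not.
template <typename Vpp>
bool ResetOrReinitialize(Vpp& vpp, const VppParams& wanted) {
    if (vpp.Reset(wanted)) {
        VppParams actual{};
        if (vpp.GetVideoParam(actual) && actual == wanted)
            return true;   // the reset really changed the parameters
    }
    vpp.Close();
    return vpp.Init(wanted);
}
```

With the real SDK the member functions would presumably wrap MFXVideoVPP_Reset, MFXVideoVPP_GetVideoParam, MFXVideoVPP_Close and MFXVideoVPP_Init on the session, and the comparison would cover whichever mfxVideoParam fields the application changed.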


    AV1 video makes its way with Media Foundation

    Microsoft released AV1 Video Extension (Beta) via their store:

    Play AV1 videos on your Windows 10 device. This extension is an early beta version of the AV1 software decoder that lets you play videos that have been encoded using the AV1 video coding standard developed by the Alliance for Open Media. Since this is an early release, you might see some performance issues when playing AV1 videos. We’re continuing to improve this extension. If you allow apps to be updated automatically, you should get the latest updates and improvements when we release them.

    The extension installs Media Foundation decoder AV1VideoExtension for MFVideoFormat_AV1 video media subtype, dually interfaced for desktop and store (UWP) applications. The decoder is software-only without hardware acceleration (via GPU). Let us hope we will see compatible hardware soon and vendor specific implementation with hardware assisted decoding.

    ## AV1VideoExtension
    
    13 Attributes:
    
     * MF_TRANSFORM_FLAGS_Attribute: MFT_ENUM_FLAG_SYNCMFT
     * MFT_INPUT_TYPES_Attributes: MFVideoFormat_AV1
     * MFT_OUTPUT_TYPES_Attributes: MFVideoFormat_NV12, MFVideoFormat_IYUV, MFVideoFormat_420O, MFVideoFormat_P010
     * {3C0FBE52-D034-4115-995D-95B356B9855C}: 1 (Type VT_UI4)
     * {7347C815-79FC-4AD9-877D-ACDF5F46685E}: C:\Program Files\WindowsApps\Microsoft.AV1VideoExtension_1.1.13377.0_x64__8wekyb3d8bbwe\build\x64\av1decodermft_store.dll (Type VT_LPWSTR)
     * {957193AD-9029-4835-A2F2-3EC9AE9BB6C8}: Microsoft.AV1VideoExtension_1.1.13377.0_x64__8wekyb3d8bbwe (Type VT_LPWSTR)
     * {9D8B61A8-6BC8-4BFF-B31F-3A31060AFA3D}: Microsoft.AV1VideoExtension_8wekyb3d8bbwe (Type VT_LPWSTR)
     * {BB49BC51-1810-4C3A-A9CF-D59C4E5B9622}: {4AFB1971-030E-47F7-B991-C8E3BEBB9094} (Type VT_CLSID)
     * {DE106D30-42FB-4767-808D-0FCC6811B0B9}: AV1DecMft (Type VT_LPWSTR)
     * {F9542F80-D069-4EFE-B30D-345536F76AAA}: 0 (Type VT_UI4)
     * {F9A1EF38-F61E-42E6-87B3-309438F9AC67}: 1 (Type VT_UI4)
    
    ### IMFTransform
    
     * Stream Limits: Input 1..1, Output 1..1
     * Streams: Input 1, Output 1
    
    #### Attributes
    
     * MF_SA_D3D11_AWARE: 0 (Type VT_UI4)
     * CODECAPI_AVDecVideoThumbnailGenerationMode: 0 (Type VT_UI4)
     * {592A2A5A-E797-491A-9738-C0007BE28C52}: ??? (Type VT_UNKNOWN, 0x00000280DCE59790)
     * CODECAPI_AVDecNumWorkerThreads: 0 (Type VT_UI4)
     * MF_SA_D3D_AWARE: 0 (Type VT_UI4)
     * MF_TRANSFORM_ASYNC: 0 (Type VT_UI4)