Incorrect video output of NVIDIA H.264 Encoder MFT

I have previously mentioned issues in AMD’s and Intel’s video encoding drivers, APIs and integration components. Now I have switched my development box to an NVIDIA video card and immediately hit a glitch of theirs too.

NVIDIA GeForce RTX 2060 SUPER offers a really fast video encoder, and consumer hardware from AMD and Intel is simply nowhere near. 3840×2160@144 video can be encoded in under 10 ms per frame:

H.264 (encode time, ms per frame)

640x360@144	1.09
640x360@260	1.02
1280x720@60	2.00
1280x720@120	2.00
1920x1080@60	3.26
1920x1080@72	3.30
1920x1080@90	3.26
1920x1080@120	3.29
1920x1080@144	3.77
2560x1440@60	5.23
2560x1440@72	5.22
2560x1440@90	5.33
2560x1440@120	5.71
2560x1440@144	5.75
3840x2160@30	11.00
3840x2160@60	11.33
3840x2160@72	11.32
3840x2160@90	9.41
3840x2160@120	7.62
3840x2160@144	8.54

HEVC (encode time, ms per frame)

640x360@144	1.00
640x360@260	1.00
1280x720@60	2.05
1280x720@120	2.05
1920x1080@60	4.03
1920x1080@72	4.01
1920x1080@90	4.01
1920x1080@120	4.63
1920x1080@144	4.67
2560x1440@60	4.00
2560x1440@72	4.00
2560x1440@90	4.00
2560x1440@120	4.10
2560x1440@144	4.18
3840x2160@30	8.00
3840x2160@60	8.33
3840x2160@72	8.43
3840x2160@90	8.42
3840x2160@120	7.88
3840x2160@144	6.89
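
One way to read these tables is to compare each per-frame time with the frame interval at the given rate. The sketch below is plain arithmetic on the numbers above. The helper names are made up for illustration, and it assumes the figures are per-frame encode times in milliseconds. A ratio above 1.0 means a single frame takes longer than the interval between frames, which may still be fine if the encoder keeps several frames in flight.

```cpp
// Frame interval in milliseconds at a given frame rate, e.g. 60 fps -> ~16.67 ms.
constexpr double FrameIntervalMs(double framesPerSecond)
{
    return 1000.0 / framesPerSecond;
}

// Share of the frame interval spent on encoding one frame
// (hypothetical helper, not part of any encoder API).
constexpr double FrameBudgetUse(double encodeMs, double framesPerSecond)
{
    return encodeMs / FrameIntervalMs(framesPerSecond);
}
```

With the H.264 figures, `FrameBudgetUse(11.33, 60)` is about 0.68, while `FrameBudgetUse(8.54, 144)` is about 1.23.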

However, this is with their hardware and API, and with Media Foundation integration based on a custom Media Foundation wrapper.

NVIDIA’s Media Foundation encoder transform (MFT), shipped with the video driver, fails to do even a simple thing correctly. Encoding a texture using the NVIDIA MFT:

It looks like the internal color space conversion taking place inside the transform is failing…

NVIDIA HEVC Encoder MFT handles the same input (textures) correctly.

Tiny Windows box doing video encoding

The small box in the bottom right corner of the video is the Chuwi LarkBox, the “World’s Smallest 4K Mini PC”.

With an Intel J4115 CPU and an Intel UHD Graphics 600 GPU, it runs Windows 10 and is capable of rendering and encoding video in real time.

1600×900@60 is a bit too heavy for it, and VLC consumes the pre-buffered content, hitting underflow around the 26th second of playback. VLC’s buggy HLS client implementation also exhibits playback artifacts (and eventually locks up and/or crashes completely; replaying the produced content afterwards confirms that the video stream itself is okay).

The small thing without excessive horsepower uncovered another bug too.

The video content is prepared by a Media Foundation pipeline with the Intel® Quick Sync Video H.264 Encoder MFT as the GPU encoder. Once in a while an encoded frame flashes, exhibiting a lack of proper synchronization in Intel’s MFT.

A broken frame can look like this or otherwise, and is supposedly caused by taking a work item into encoding without bothering to wait for the scheduled GPU work to complete.

However, this is nothing new. I wrote before about the same issue in another vendor’s implementation:

Adding a patch with a D3D11 event query and waiting on it works around the sync issue (which is the reason to call it Intel’s bug in the first place), and so the video posted at the top shows a proper video stream.
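
The waiting part of such a patch could look roughly like the sketch below. This is not the actual patch. The function name `WaitForGpuWork` and the timeout parameter are my own, and the context and query are template parameters so that the loop builds without `<d3d11.h>`. On Windows the intent is to pass an `ID3D11DeviceContext` and an `ID3D11Query` created through `ID3D11Device::CreateQuery` with `D3D11_QUERY_EVENT`.

```cpp
#include <chrono>
#include <thread>

// Sketch only: issue an event query after the rendering commands and poll it
// until the GPU reports that everything before the event has completed.
// Context is expected to provide End() and GetData() with the shape of the
// ID3D11DeviceContext methods of the same names (an assumption of this sketch).
template <typename Context, typename Query>
bool WaitForGpuWork(Context& context, Query* query, std::chrono::milliseconds timeout)
{
    // End() places the event after all commands issued on the context so far.
    context.End(query);

    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;)
    {
        int signaled = 0; // BOOL in D3D11 terms
        // GetData returns S_OK (0) once the result is available and S_FALSE (1)
        // while the GPU is still busy; negative values are failures.
        const long result = context.GetData(query, &signaled, sizeof(signaled), 0);
        if (result < 0)
            return false;
        if (result == 0 && signaled)
            return true;
        if (std::chrono::steady_clock::now() >= deadline)
            return false;
        std::this_thread::yield();
    }
}
```

The texture would be handed to the encoder MFT only after this returns true. A busy poll with `yield` is the simplest form; a real pipeline might prefer to sleep briefly between polls.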

Media Foundation support for Opus 5.1 audio

There is some support for Opus in Windows; unfortunately, it is not documented. IIRC it came to extend media codec support in the Microsoft Edge browser, and since Microsoft Edge internally uses the standard platform media API, Media Foundation, the decoder came in the form of a Media Foundation Transform.

It is interesting that Opus decoding was put deep enough to appear across multiple environments, including even Windows IoT:

However, Microsoft did not update the Media Foundation API itself to indicate the presence of the new codec support. The documentation has no mention of the Opus decoder. The thing has been present in Windows for four years, but it is not exposed to developers…

Apart from this, stock support for Opus, whether in the decoder, the WebM parser, or both, is limited to mono or stereo audio. There is no support for more sophisticated channel configurations, neither in Media Foundation nor in Edge itself. Edge Beta has it because it inherited the capability from Chrome, which in turn bundles libopus through its use of FFmpeg.

5.1 Opus audio fragment played by Edge Beta but not Edge:

Edge Beta’s internals:

Since the limitation is in Media Foundation primitives, other Media Foundation based applications exhibit similar behavior. For example, the Movies and TV application similarly fails on this media file.
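
To tell in advance whether a file falls into the unsupported category, an application can look at the channel count in the Opus identification header. The sketch below parses the “OpusHead” structure defined in RFC 7845, which WebM carries as the track’s CodecPrivate data. The struct and function names are hypothetical, and the more-than-two-channels check only reflects the limitation described above rather than any documented Media Foundation rule.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fields of interest from the "OpusHead" identification header (RFC 7845).
struct OpusHeadInfo
{
    std::uint8_t channelCount;
    std::uint8_t mappingFamily;     // 0: mono/stereo, 1: surround layouts such as 5.1
    std::uint32_t inputSampleRate;  // informational rate of the original input
};

// Returns false if the buffer does not look like a valid OpusHead.
bool ParseOpusHead(const std::uint8_t* data, std::size_t size, OpusHeadInfo& info)
{
    // Layout: magic(8) version(1) channels(1) pre-skip(2) rate(4) gain(2) family(1)
    const std::size_t minimalSize = 19;
    if (data == nullptr || size < minimalSize || std::memcmp(data, "OpusHead", 8) != 0)
        return false;
    if (data[9] == 0) // channel count must be at least 1
        return false;
    info.channelCount = data[9];
    // Multi-byte fields are little-endian.
    info.inputSampleRate = static_cast<std::uint32_t>(data[12])
        | static_cast<std::uint32_t>(data[13]) << 8
        | static_cast<std::uint32_t>(data[14]) << 16
        | static_cast<std::uint32_t>(data[15]) << 24;
    info.mappingFamily = data[18];
    return true;
}

// True for layouts the text above reports as failing with the stock components.
bool NeedsMultichannelSupport(const OpusHeadInfo& info)
{
    return info.channelCount > 2;
}
```

A 5.1 track reports six channels with mapping family 1, so an application could route such files to its own libopus based decoder instead of the stock transform.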

Native registration free COM dependency for .NET 5 application

The Isolated property is supposed to enable referencing in-process COM servers as a registration free COM dependency, but something got broken on the way: Visual Studio 2019 Preview and .NET 5 produce applications that lose the link.

It is still a preview, so hopefully things will get resolved in time.

The reproducer itself is a nice template for checking out C#/C++ COM interop.

Windows 10 SDK 10.0.19041 needs some massaging

In:

#include <unknwn.h>
#include <winrt\base.h>
#include <winrt\Windows.Foundation.h>

int main()
{
}

Out:

1>------ Build started: Project: CppWinrt01, Configuration: Debug x64 ------
1>CppWinrt01.cpp
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(983,26): error C2039: 'wait_for': is not a member of 'winrt::impl'
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(103): message : see declaration of 'winrt::impl'
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(985): message : see reference to class template instantiation 'winrt::impl::consume_Windows_Foundation_IAsyncAction' being compiled
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(1004,26): error C2039: 'wait_for': is not a member of 'winrt::impl'
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(103): message : see declaration of 'winrt::impl'
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(1006): message : see reference to class template instantiation 'winrt::impl::consume_Windows_Foundation_IAsyncActionWithProgress' being compiled
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(1038,26): error C2039: 'wait_for': is not a member of 'winrt::impl'
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(103): message : see declaration of 'winrt::impl'
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(1040): message : see reference to class template instantiation 'winrt::impl::consume_Windows_Foundation_IAsyncOperationWithProgress' being compiled
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(1057,26): error C2039: 'wait_for': is not a member of 'winrt::impl'
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(103): message : see declaration of 'winrt::impl'
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(1059): message : see reference to class template instantiation 'winrt::impl::consume_Windows_Foundation_IAsyncOperation' being compiled
1>Done building project "CppWinrt01.vcxproj" -- FAILED.
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

For the record, stepping down to 10.0.18362 fixes the build:

Concurrent H.264 encoder sessions with AMD GPUs

I was under the impression that AMD hardware allows just one video encoding session and prevents having multiple sessions side by side. This has been the consistent behavior I was seeing, and I always wondered why it had to be that tight.

To my surprise, the actual limit is higher: sixteen (16!) sessions can run in parallel, in particular with the GPU in my dev box…

The problem has been a bug in the AMD driver and/or AMD AMF runtime, which triggered an exception in low latency mode. With this bug it is indeed just one session at a time. Even though the bug has been present for literally years, it is good that AMD engineers do respond on GitHub, and this results in problem identification, a workaround and, I hope, a resolution as well.

The good thing is that only two or more low latency sessions are disallowed. Multiple regular sessions plus zero or one low latency session, up to 16 in total, are still fine. That is, falling back to a non low latency session is a possible workaround.

UPDATE: This is a hardware limitation of the AMD Polaris architecture: only one process can own the low latency hardware queue. Newer GPUs are not affected.
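
The fallback workaround can be sketched as a small bookkeeping class. This is an illustration only and does not touch AMF: the class and enum names are hypothetical, the limits (16 sessions in total, one low latency session) are the ones observed above on my GPU and should be treated as assumptions elsewhere, and the class tracks sessions within a single process only.

```cpp
#include <cstddef>

enum class SessionMode { LowLatency, Regular };

// Decides which mode a new encoder session should be opened in.
class EncoderSessionPlanner
{
public:
    // Returns false when no more sessions can be opened. Otherwise sets
    // 'granted'; a low latency request falls back to Regular when the single
    // low latency slot is already taken.
    bool Acquire(SessionMode requested, SessionMode& granted)
    {
        if (m_regular + m_lowLatency >= maxSessions)
            return false;
        if (requested == SessionMode::LowLatency && m_lowLatency < maxLowLatencySessions)
        {
            ++m_lowLatency;
            granted = SessionMode::LowLatency;
        }
        else
        {
            ++m_regular;
            granted = SessionMode::Regular;
        }
        return true;
    }

    // Must be called with the mode that Acquire() actually granted.
    void Release(SessionMode granted)
    {
        std::size_t& counter = (granted == SessionMode::LowLatency) ? m_lowLatency : m_regular;
        if (counter > 0)
            --counter;
    }

private:
    static constexpr std::size_t maxSessions = 16;          // assumption from the text
    static constexpr std::size_t maxLowLatencySessions = 1; // assumption from the text
    std::size_t m_regular = 0;
    std::size_t m_lowLatency = 0;
};
```

The caller would configure the actual encoder according to `granted` rather than the requested mode, so the second low latency request quietly becomes a regular session instead of failing.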