AMD started offering hardware H.265/HEVC video encoder for Media Foundation

This should be good news for those interested in hardware-assisted video encoding: AMD extends its offering with new hardware and ships an H.265 encoder in the already well-known form factor, as a Microsoft Media Foundation Transform named “AMDh265Encoder”:

# System

* Version: 10.0.14393, Windows 10, VER_SUITE_SINGLEUSERTS, VER_NT_WORKSTATION
* Product: PRODUCT_PROFESSIONAL

[…]

# Display Devices

* AMD Radeon (TM) RX 480
* Instance: PCI\VEN_1002&DEV_67DF&SUBSYS_0B371002&REV_C7\4&2D78AB8F&0&0008
* DEVPKEY_Device_Manufacturer: Advanced Micro Devices, Inc.
* DEVPKEY_Device_DriverVersion: 21.19.137.514

[…]

# Category `MFT_CATEGORY_VIDEO_ENCODER`

[…]

## AMDh265Encoder

15 Attributes:

* MFT_TRANSFORM_CLSID_Attribute: {5FD65104-A924-4835-AB71-09A223E3E37B} (Type VT_CLSID)
* MF_TRANSFORM_FLAGS_Attribute: MFT_ENUM_FLAG_HARDWARE
* MFT_ENUM_HARDWARE_VENDOR_ID_Attribute: VEN_1002 (Type VT_LPWSTR)
* MFT_ENUM_HARDWARE_URL_Attribute: AMDh265Encoder (Type VT_LPWSTR)
* MFT_INPUT_TYPES_Attributes: MFVideoFormat_NV12, MFVideoFormat_ARGB32
* MFT_OUTPUT_TYPES_Attributes: MFVideoFormat_HEVC
* MFT_CODEC_MERIT_Attribute: 8 (Type VT_UI4)
* MFT_SUPPORT_DYNAMIC_FORMAT_CHANGE: 1 (Type VT_UI4)
* MF_TRANSFORM_ASYNC: 1 (Type VT_UI4)
* MF_SA_D3D11_AWARE: 1 (Type VT_UI4)
* MF_SA_D3D_AWARE: 1 (Type VT_UI4)
* MF_TRANSFORM_ASYNC_UNLOCK: 0 (Type VT_UI4)
* MFT_GFX_DRIVER_VERSION_ID_Attribute: 1.2.3.4

This follows Intel’s H.265/HEVC hardware compression offering also available in MFT form factor:

## Intel® Hardware H265 Encoder MFT

12 Attributes:

* MFT_TRANSFORM_CLSID_Attribute: {BC10864D-2B34-408F-912A-102B1B867B6C} (Type VT_CLSID)
* MF_TRANSFORM_FLAGS_Attribute: MFT_ENUM_FLAG_HARDWARE
* MFT_ENUM_HARDWARE_VENDOR_ID_Attribute: VEN_8086 (Type VT_LPWSTR)
* MFT_ENUM_HARDWARE_URL_Attribute: AA243E5D-2F73-48c7-97F7-F6FA17651651 (Type VT_LPWSTR)
* MFT_INPUT_TYPES_Attributes: {3231564E-3961-42AE-BA67-FF47CCC13EED}, MFVideoFormat_NV12, MFVideoFormat_ARGB32
* MFT_OUTPUT_TYPES_Attributes: MFVideoFormat_HEVC
* MFT_CODEC_MERIT_Attribute: 7 (Type VT_UI4)
* MFT_SUPPORT_DYNAMIC_FORMAT_CHANGE: 1 (Type VT_UI4)
* MF_TRANSFORM_ASYNC: 1 (Type VT_UI4)
* MFT_GFX_DRIVER_VERSION_ID_Attribute: 0.0.0.3

How to control IP Video Source JPEG video filter from C# programmatically

User’s question about Alax.Info IP Video Source DirectShow filter for JPEG/M-JPEG video:

As it stands now, in order to configure the video capture source filter, I have to instantiate it […], set the properties (camera URL, and video dimensions), then save the filter state to a data array which is then compiled into the application. That doesn’t provide a feasible system for user configuration at run time: Telling our customers we need to have the IP address(es) of their camera(s) so we can build a custom version of our software for them isn’t going to be very marketable.

[…]

I looked on the Alax.Info Web site, and didn’t see anything that looks like an API for configuration. I also didn’t see anything inside […] that looks like configuration parameters. I find it hard to believe this isn’t a common requirement, so I suspect I’m overlooking something.

Is there an “easy” way to configure the parameters of the Alax.info video capture filter at run time? Where is it documented?

Is there a structure that describes the state info that is captured [into raw configuration data array]?

Are there any other considerations I’ve overlooked?

The C++ and C# sample code (Trac, SVN) appears to be not easy to discover. The samples include the PlayJpegLocation (C++) and PlayJpegLocationSharp (C#) projects, which demonstrate importing the IP Video Source COM interfaces and using them from code.

IP Video Source comes with a COM type library which is well suited for import by development environments, specifically by Visual Studio as a reference for .NET projects. To avoid bitness issues, it is recommended to have both the 32-bit and 64-bit versions of IP Video Source installed.

C# integration is straightforward:

First, one needs to add a reference to the AlaxInfoIpVideoSource type library, the same way DirectShow.NET is added to the project. The type library is imported and is available via .NET COM Interop at this point (specifically, it is visible in Object Browser). In code, the definitions are in the AlaxInfoIpVideoSource namespace; JpegVideoSourceFilter represents the filter class, and the IJpegVideoSourceFilter interface offers filter configuration.

The filter is also compatible with generic COM persistence: it implements the standard IPersistStream, IPersistStreamInit, IPersistPropertyBag and other interfaces, so its state can be saved to a stream or property bag and loaded back.

Intel Quick Sync Video Consumption by Applications

I wrote a few posts on hardware H.264 encoding (e.g. this and the latest one, Applying Hardsubs to H.264 Video with GPU). A blog reader asked about the availability of the mentioned Intel Quick Sync Video support on the low-end Intel x5-Z8300 CPU.

[…] Intel has advertised that the Cherry Trail CPUs support H264 encoding and / or QSV, but nowhere have I seen a demo of this being used […].
What did you use to encode the video? Is the QSV codec available in the x5-z8300 for possible 720p realtime encoding? I’d like to see this checked in regards to using software like FFmpeg with qsv_h264 -codec and OBS. […]

The picture below explains how applications consume Intel’s hardware video compression offering in Windows.

Intel QSV includes a hardware implementation of the encoder and corresponding drivers which provide a frontend API to software. This includes a component which integrates the codec with Microsoft’s Media Foundation API. Applications choose between two routes. One is to interface with the codec using the Windows API: this is the way stock Microsoft applications work, and this is the way I used for the video encoding development mentioned on the blog. Other applications prefer to interface through Intel Media SDK, which is an alternate route ending up at the same hardware-backed services.

The Intel x5-Z8300 system in question has H.264 video encoding support integrated into the Windows API, and the services can be consumed without an additional Intel runtime and/or development kit. The codec, according to the benchmarks made earlier, is fast enough to handle real-time 720p video encoding even though the device is a budget one.

Bug in Media Foundation MPEG-4 File Source related to timestamping video frames of a fragmented MP4 file

A recent update to the Media Foundation platform introduced a new bug related to fragmented MP4 files and H.264 video. The bug shows up consistently with the following file versions:

  • mfplat.dll – 10.0.14393.351 (rs1_release_inmarket.161014-1755)    15-Oct-16 05:48
  • mfmp4srcsnk.dll – 10.0.14393.351 (rs1_release_inmarket.161014-1755)    15-Oct-16 05:45

The nature of the problem is that the MPEG-4 File Source stamps the data incorrectly: frame time stamps seem to get wrong durations and increments, then quickly jump into the future, and on playback this leads to freezes with no obvious cause. As Media Foundation is used by Windows Media Player and the Windows 10 Movies & TV player, the bug is present there as well.
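A symptom of this kind can be spotted by checking whether each time stamp follows from the previous sample's time and duration. The code below is only a minimal sketch of such a check, not the diagnostic used for the original report: the `SampleTime` structure and the function name are hypothetical, the values are assumed to be already collected from the source in presentation order, and the tolerance is up to the caller. Media Foundation expresses sample times and durations in 100-nanosecond units.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of what a source reported for one video sample;
// both values are in 100-nanosecond units
struct SampleTime
{
    std::int64_t nTime;
    std::int64_t nDuration;
};

// Returns the index of the first sample whose time stamp does not follow
// from the previous sample's time and duration within nTolerance,
// or the vector size if the whole sequence is consistent
inline std::size_t FindFirstTimeDiscontinuity(const std::vector<SampleTime>& Samples, std::int64_t nTolerance)
{
    assert(nTolerance >= 0);
    for(std::size_t nIndex = 1; nIndex < Samples.size(); nIndex++)
    {
        const std::int64_t nExpectedTime = Samples[nIndex - 1].nTime + Samples[nIndex - 1].nDuration;
        const std::int64_t nDifference = Samples[nIndex].nTime - nExpectedTime;
        if(nDifference > nTolerance || nDifference < -nTolerance)
            return nIndex;
    }
    return Samples.size();
}
```

For a healthy 30 fps stream the function returns the number of samples; a time stamp that jumps seconds ahead of the previous sample's end is reported at its index.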

The original report is on MSDN Forums.

Presumably it is possible to roll a certain Windows Update package back; alternatively, one has to wait for Microsoft to fix the problem and deliver a new update deploying the fix.

Applying Hardsubs to H.264 Video with GPU

Video adapters currently offer a range of services which enable transcoding of H.264 content with certain modifications (including but not limited to flexible overlays, scaling, mirroring, effects and filters) end-to-end on the GPU, keeping data as a DirectX Graphics Infrastructure resource at all processing stages.

Such specialized processing capabilities are pretty powerful compared to traditional CPU processing, especially taking into consideration the performance of low-end, low-power systems which are still equipped with a contemporary GPU.

For a test, I transcoded H.264/MP4 video files of different resolutions, applying a text overlay with the time stamp of the video frame being processed. The overlay is complex enough: it varies from frame to frame and uses a standard font with the respective rasterization (using DirectWrite). The test re-encoded H.264 content maintaining bitrate, without paying much attention to other encoding details (defaults were used).

Hardsub Performance Test

The rough test was successful with two GPUs:

  1. Intel HD Graphics 4600 (7th gen; Desktop system; Core i7-4790 CPU)
  2. Intel HD Graphics (8th gen; Ultramobile system; Atom x5-Z8300 CPU) – the system is actually a $200 budget Chinese tablet, the Cube iWork 10 Ultimate

The test failed on other GPUs:

  • Intel HD Graphics 4000 (7th gen; Mobile system; Core i7-3517U CPU)
  • NVIDIA GeForce GTX 750

The problem, as it looks without getting into details, seems to be the inability of Media Foundation APIs to fit Direct3D-enabled pipelines out of the box, for example because a certain conversion is missing. It looks like transcoding can be achieved by just putting some more effort into it.

As of now, Intel offers their 9th generation GPUs, and the ones being tested are hardware a few years old…

Compared to real-time performance of 100% (meaning that it takes one second to process one second of video with the given parameters), both systems managed to do the transcoding relatively efficiently. With a roughly built test having a bottleneck at applying the overlay, which takes place serially in a single thread, both systems showed performance sufficient to convert 1920×1080@60 video faster than in real time and without maxing the CPU out.
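The percentage can be computed as in the sketch below. It assumes the figure is media duration divided by processing time, so that values above 100% mean faster than real time; the function names are made up for illustration and the inverse ratio would be an equally valid convention.

```cpp
#include <cassert>

// Assumed definition: media duration relative to processing time, in percent;
// 100 means exactly real time, above 100 is faster than real time
inline double RealTimePerformancePercent(double fMediaDurationSeconds, double fElapsedSeconds)
{
    assert(fMediaDurationSeconds >= 0.0 && fElapsedSeconds > 0.0);
    return 100.0 * fMediaDurationSeconds / fElapsedSeconds;
}

// True when the processing took less time than the media lasts
inline bool IsFasterThanRealTime(double fMediaDurationSeconds, double fElapsedSeconds)
{
    return RealTimePerformancePercent(fMediaDurationSeconds, fElapsedSeconds) > 100.0;
}
```

With this convention, a 60-second clip transcoded in 30 seconds rates 200%.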

Intel’s seventh generation desktop GPU managed to do the job much faster.

It is interesting that even a cheap tablet can process a Full HD video stream while loading the CPU less than 40%. Basically, the performance is sufficient for real-time video processing (including using an external web camera like the Logitech C930E) with certain processing in 1080p resolution using budget-grade hardware.

Re-encoding Performance

When there is no need to keep the real-time processing pace, the cheap tablet showed the ability to do GPU processing of 2K video, which is also good news for those who want to apply budget hardware to high-resolution material.

Apparently, the key factor is the ability of the process to keep data in video hardware. As Intel GPU H.264 abilities scale well when used for concurrent multi-stream processing, the numbers promise great performance when recording video in several formats at a time: raw video, video with overlay, scaled down etc.

The table below gives more numbers for the tests conducted:

Re-encoding Performance Numbers

As mentioned above, the overlaying part itself is a single-threaded bottleneck, and presumably it leaves a reserve that can be used to cut the elapsed time down even further.

Another interesting observation is that while the ultramobile system still uses much of the CPU time (which is okay, as it’s not a powerful system by design), the desktop GPU has minimal impact on the CPU while doing a pretty complicated task.

Applicability of Virtual DirectShow Sources

Virtual DirectShow sources have long been synonymous with software-only camera implementations exposed to applications along with physical cameras, so that applications consume the sources without distinguishing whether the camera is real or virtual. Vivek’s template was a starting point for many:

Capture Source Filter filter (version 0.1) 86 KB zipped, includes binaries.  A sample source filter that emulates a video capture device contributed by Vivek (rep movsd from the public newsgroups).  Thanks Vivek!  TMH has not tested this filter yet.  Ask questions about this on microsoft.public.win32.programmer.directx.video.

With API changes over the years, the sample and the concept are still understood as the method of adding a virtual camera; however, new scenarios exist where the concept no longer works. Typical problems:

  1. 64-bit applications cannot consume 32-bit virtual sources
  2. Virtual sources are not visible and accessible to applications consuming video using Media Foundation API

The diagram below explains the applicability of virtual cameras:

Applicability of Virtual DirectShow Sources

Importantly, virtual sources can only be consumed by DirectShow-based applications of the same bitness.

If the source developer needs to synchronize a virtual source across multiple applications (e.g. video is synthesized by another application and needs to be deliverable to multiple clients), the developer needs to add interprocess synchronization behind the scenes of the virtual source.

If the developer needs to support both 32- and 64-bit apps, both variants of the virtual source need to be registered, possibly with synchronization of the kind described in the paragraph above.

The only virtual device which is visible to all video capture applications is one implemented by a kernel-level driver (implementations are rare but exist).
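The rules above can be condensed into a small predicate. This is only a sketch: the enumerations and the function name are made up for illustration and do not correspond to any DirectShow or Media Foundation API.

```cpp
#include <cassert>

// Hypothetical classification used only for this illustration
enum class Bitness { X86, X64 };
enum class VideoApi { DirectShow, MediaFoundation };
enum class SourceKind { UserModeDirectShowFilter, KernelDriver };

// Tells whether an application sees the virtual source, following the rules
// in the text: a kernel-level driver is visible to everyone, a user mode
// DirectShow filter only to DirectShow applications of the same bitness
inline bool IsSourceVisible(SourceKind Kind, Bitness SourceBitness, VideoApi Api, Bitness ApplicationBitness)
{
    if(Kind == SourceKind::KernelDriver)
        return true; // Source bitness is not relevant here
    return Api == VideoApi::DirectShow && SourceBitness == ApplicationBitness;
}

// Example: a 32-bit virtual filter is seen by a 32-bit DirectShow application
// but not by a 64-bit one
inline void CheckVisibilityExamples()
{
    assert(IsSourceVisible(SourceKind::UserModeDirectShowFilter, Bitness::X86, VideoApi::DirectShow, Bitness::X86));
    assert(!IsSourceVisible(SourceKind::UserModeDirectShowFilter, Bitness::X86, VideoApi::DirectShow, Bitness::X64));
}
```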

C++ #import and x64 builds

I already wrote earlier on 32/64-bit issues with Visual Studio. The problems are not frequent but when they happen they are pretty confusing. Here is another one today.

C++ code is simple:

    #import "libid:59941706-0000-1111-2222-7EE5C88402D2" raw_interfaces_only no_namespace

    CComPtr<IObject> pObject;
    ATLENSURE_SUCCEEDED(pObject.CoCreateInstance(__uuidof(Object)));
    BYTE* pnData;
    ATLENSURE_SUCCEEDED(pObject->Method((ULONG_PTR) (BYTE*) pnData));

A COM method returns a pointer to data – pretty straightforward, what could have gone wrong?

The COM server vendor designed the library for easy .NET integration and defined the pointer argument as an integer value. They expect the values to be used further with the System.Runtime.InteropServices.Marshal class.

32-bit builds worked well and 64-bit builds experienced memory access violations. An attempt to consume the COM server from a C# project showed the same problem: an unexpected exception in the call.

The problem is that a cross-compiler importing a COM type library using LIBID takes the 32-bit library even when it builds 64-bit code. This is a problem for both C++ #import "libid:..." and a .NET COM reference using the identifier.

The type library imports as the following IDL in 32-bit builds:

    [id(1)]
    HRESULT Method(
        [in] unsigned long bufPtr);

64-bit builds are expected to get the following import:

    [id(1)]
    HRESULT Method(
        [in] uint64 bufPtr);

Effectively though, 64-bit builds get the 32-bit import, and the argument which is supposed to carry the cast pointer value is truncated to 32 bits, the ULONG type. The cast to ULONG_PTR in 64-bit C++ code is, of course, not helpful since the value is trimmed further anyway to fit the IDL argument type.

The same happens with the C# build.
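The truncation itself is easy to reproduce without any COM involved. The sketch below uses plain functions as stand-ins for the two imports; the function names and the sample addresses are invented for illustration, and fixed-width integers replace ULONG and uint64 so that the code behaves the same on any platform.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the 32-bit import: the argument is a 32-bit unsigned integer,
// so only the low 32 bits of the pointer value survive the call
inline std::uint64_t PassThrough32BitArgument(std::uint64_t nPointerValue)
{
    const std::uint32_t nArgument = static_cast<std::uint32_t>(nPointerValue);
    return nArgument;
}

// Stand-in for the expected 64-bit import: the value is preserved
inline std::uint64_t PassThrough64BitArgument(std::uint64_t nPointerValue)
{
    return nPointerValue;
}

// True when the value would be damaged by the 32-bit argument type
inline bool IsTruncatedBy32BitArgument(std::uint64_t nPointerValue)
{
    return PassThrough32BitArgument(nPointerValue) != nPointerValue;
}

// Example values: an address below 4 GB survives, one above it does not
inline void CheckTruncationExamples()
{
    assert(!IsTruncatedBy32BitArgument(0x000000007FFE0000ULL));
    assert(IsTruncatedBy32BitArgument(0x000001A2B3C4D5E6ULL));
}
```

This also hints at why such a bug can stay unnoticed for a while: a 64-bit process that happens to allocate the buffer below the 4 GB boundary passes a value that survives the truncation.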

It was the developer’s choice to publish an integer type argument: they wanted this to be “better” and ended up in a bitness mess. If the argument had remained a pointer type in the IDL, then even incorrect bitness would not necessarily have resulted in value truncation.

Altogether, it is unsafe to import [an untrusted] type library using LIBID when it comes to 64-bit builds. It is the 32-bit library that gets taken, and this can result in an incorrect import. Instead, such a build should explicitly point to the 64-bit type library, for example:

    #if defined(_WIN64)
        #import "Win64\ThirdParty.DLL" raw_interfaces_only no_namespace
    #else
        //#import "libid:59941706-0000-1111-2222-7EE5C88402D2" raw_interfaces_only no_namespace
        #import "Win32\ThirdParty.DLL" raw_interfaces_only no_namespace
    #endif

Too bad! libid looked so nice and promising.