Microsoft Media Foundation webcam video capture in one screen of code

Complicated as it is for many things, Media Foundation is still quite simple for the basics. To capture video, the API offers the Source Reader, which uses Media Foundation primitives to build a pipeline that manages the origin of the data (not necessarily a live source as in this example; it can also be a file or a remote resource) and offers on-request reading of the data by the application, without the data being consumed by Media Foundation managed primitives (in this respect the Source Reader is the opposite of the Media Session API).

The simplest use of the Source Reader to read frames from a web camera fits into a few tens of lines of C++ code. The sample VideoCaptureSynchronous project captures video frames in the form of IMFSample samples in fewer than 100 lines of code.
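To give an idea of the flow, here is a minimal sketch of what such a synchronous capture looks like (an illustration only, with error handling and cleanup omitted and the first enumerated camera taken; the VideoCaptureSynchronous project itself is the authoritative version). It prints per-sample details similar to the output quoted below.

#include <atlbase.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <cstdio>
// Link with: mfplat.lib, mf.lib, mfreadwrite.lib, mfuuid.lib, ole32.lib

int main()
{
    CoInitialize(NULL);
    MFStartup(MF_VERSION);

    // Enumerate video capture devices and take the first one
    CComPtr<IMFAttributes> pAttributes;
    MFCreateAttributes(&pAttributes, 1);
    pAttributes->SetGUID(MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE, MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_GUID);
    IMFActivate** ppActivates;
    UINT32 nActivateCount = 0;
    MFEnumDeviceSources(pAttributes, &ppActivates, &nActivateCount);
    WCHAR* pszFriendlyName = NULL;
    UINT32 nFriendlyNameLength = 0;
    ppActivates[0]->GetAllocatedString(MF_DEVSOURCE_ATTRIBUTE_FRIENDLY_NAME, &pszFriendlyName, &nFriendlyNameLength);
    wprintf(L"Friendly Name: %s\n", pszFriendlyName);

    // Create a media source for the device and wrap it into a Source Reader
    CComPtr<IMFMediaSource> pMediaSource;
    ppActivates[0]->ActivateObject(__uuidof(IMFMediaSource), (VOID**) &pMediaSource);
    CComPtr<IMFSourceReader> pSourceReader;
    MFCreateSourceReaderFromMediaSource(pMediaSource, NULL, &pSourceReader);

    // Read frames one by one with blocking calls
    for(; ; )
    {
        DWORD nStreamIndex, nStreamFlags;
        LONGLONG nTime;
        CComPtr<IMFSample> pSample;
        pSourceReader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0, &nStreamIndex, &nStreamFlags, &nTime, &pSample);
        printf("nStreamIndex %u, nStreamFlags 0x%X, nTime %.3f, pSample 0x%p\n", nStreamIndex, nStreamFlags, nTime / 1E7, (IMFSample*) pSample);
    }
}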

Friendly Name: Logitech Webcam C930e
nStreamIndex 0, nStreamFlags 0x100, nTime 1215.074, pSample 0x0000000000000000
nStreamIndex 0, nStreamFlags 0x0, nTime 0.068, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.196, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.324, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.436, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.564, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.676, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.804, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.916, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.044, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.156, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.284, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.396, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.524, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.636, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.764, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.956, pSample 0x000002CAF3805D10
...

The VideoCaptureSynchronous project does not show what to do with the samples or how to request a specific sample format; it just shows the ease of video capture per se. The capture takes place synchronously, with blocking calls requesting and obtaining the samples.

The Media Foundation API is asynchronous by design, and the Source Reader in synchronous mode hides that complexity: the blocking call internally issues a request for a frame, waits until the frame arrives and makes it available.

The Source Reader API offers an asynchronous model too, again in as simple a way as possible.

The VideoCaptureAsynchronous project does the same video capture but asynchronously: the controlling thread just starts the capture, and frames are delivered via a callback on a worker thread as they become available.
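At the core of the asynchronous model is an IMFSourceReaderCallback implementation, which the application passes to the Source Reader via the MF_SOURCE_READER_ASYNC_CALLBACK attribute of MFCreateSourceReaderFromMediaSource; an initial ReadSample call kicks off the chain, and every completed frame re-requests the next one. A rough sketch of such a callback (simplified, with a minimal IUnknown and no error handling):

#include <atlbase.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <cstdio>

class CSourceReaderCallback :
    public IMFSourceReaderCallback
{
public:
    CComPtr<IMFSourceReader> m_pSourceReader;
    volatile LONG m_nReferenceCount = 1;

// IUnknown (simplified; a production implementation would rather use ATL)
    STDMETHOD(QueryInterface)(REFIID InterfaceIdentifier, VOID** ppvObject) override
    {
        if(InterfaceIdentifier != __uuidof(IUnknown) && InterfaceIdentifier != __uuidof(IMFSourceReaderCallback))
            return E_NOINTERFACE;
        *ppvObject = static_cast<IMFSourceReaderCallback*>(this);
        AddRef();
        return S_OK;
    }
    STDMETHOD_(ULONG, AddRef)() override { return InterlockedIncrement(&m_nReferenceCount); }
    STDMETHOD_(ULONG, Release)() override { return InterlockedDecrement(&m_nReferenceCount); }

// IMFSourceReaderCallback
    STDMETHOD(OnReadSample)(HRESULT nStatus, DWORD nStreamIndex, DWORD nStreamFlags, LONGLONG nTime, IMFSample* pSample) override
    {
        // The frame arrives here, on a Media Foundation worker thread
        printf("nStreamIndex %u, nStreamFlags 0x%X, nTime %.3f, pSample 0x%p\n", nStreamIndex, nStreamFlags, nTime / 1E7, pSample);
        // Request the next frame; the controlling thread stays unblocked
        m_pSourceReader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0, NULL, NULL, NULL, NULL);
        return S_OK;
    }
    STDMETHOD(OnFlush)(DWORD nStreamIndex) override { return S_OK; }
    STDMETHOD(OnEvent)(DWORD nStreamIndex, IMFMediaEvent* pEvent) override { return S_OK; }
};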

So when should one use the synchronous model, and when the asynchronous one?

Even though the synchronous model results in cleaner and more reliable code with fewer chances for a mistake, and in most cases the gains of the asynchronous model over it can be neglected (especially by those interested in beginner material like this post), video capture is a real-time process where one does not want to block the controlling thread and would rather have the frames arrive on their own as soon as they are ready. Hence the asynchronous version. Asynchronous can still be simple: VideoCaptureAsynchronous is already more than 100 lines of code, but 120 lines might also be okay.

Download links

  • Source code:
    • VideoCaptureSynchronous: SVN, Trac
    • VideoCaptureAsynchronous: SVN, Trac
  • License: This software is free to use

Reference Signal Source: Direct3D 11 awareness

A few updates to the DirectShowReferenceSource module; today it is about its Media Foundation video media source part.

First, the video media source now handles restarts from the paused state correctly and resumes frame generation from the proper position (not from zero, as before).

Second, the video media source is now Direct3D 11 aware. That is, when it participates in Direct3D 11 enabled topologies, the media source generates the video frames using the DXGI render target variant of Direct2D (see ID2D1Factory::CreateDxgiSurfaceRenderTarget for details) and delivers them downstream as textures. This is, in particular, useful to those who need a signal to feed Direct3D 11 aware transforms and renderers such as DX11VideoRenderer. Specifically, when connected to DX11VideoRenderer, the video media source delivers GPU-only video playback.
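For context, drawing into a Direct3D 11 texture with the DXGI render target variant of Direct2D goes roughly as follows (a simplified sketch rather than the module's actual code; the texture is assumed to be BGRA, bindable as a render target, on a device created with D3D11_CREATE_DEVICE_BGRA_SUPPORT):

#include <atlbase.h>
#include <d3d11.h>
#include <d2d1.h>
#include <d2d1helper.h>
#pragma comment(lib, "d2d1.lib")

// Sketch: render a frame with Direct2D directly into a Direct3D 11 texture via its DXGI surface
HRESULT DrawFrame(ID3D11Texture2D* pTexture)
{
    CComPtr<ID2D1Factory> pFactory;
    D2D1CreateFactory(D2D1_FACTORY_TYPE_MULTI_THREADED, &pFactory);
    const CComQIPtr<IDXGISurface> pDxgiSurface = pTexture;
    const D2D1_RENDER_TARGET_PROPERTIES Properties = D2D1::RenderTargetProperties(
        D2D1_RENDER_TARGET_TYPE_DEFAULT,
        D2D1::PixelFormat(DXGI_FORMAT_B8G8R8A8_UNORM, D2D1_ALPHA_MODE_IGNORE));
    CComPtr<ID2D1RenderTarget> pRenderTarget;
    pFactory->CreateDxgiSurfaceRenderTarget(pDxgiSurface, &Properties, &pRenderTarget);
    pRenderTarget->BeginDraw();
    pRenderTarget->Clear(D2D1::ColorF(D2D1::ColorF::CornflowerBlue)); // actual frame content is drawn here
    const HRESULT nEndDrawResult = pRenderTarget->EndDraw();
    // The texture now holds the rendered frame; it can be wrapped with
    // MFCreateDXGISurfaceBuffer into an IMFSample and delivered downstream
    return nEndDrawResult;
}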

Download links

NVIDIA H.264 Encoder Media Foundation Transform’s REGDB_E_CLASSNOTREG

For years already, Nvidia's H.264 video encoder Media Foundation Transform has been broken, giving a REGDB_E_CLASSNOTREG "Class not registered" failure in certain circumstances, for example when the main display is not the one connected to the Nvidia video adapter.

As simple as this:

// Requires <atlbase.h> (CComPtr), <mfapi.h> and <mftransform.h>; link with mfplat.lib
CoInitialize(NULL);
MFStartup(MF_VERSION);
class __declspec(uuid("{60F44560-5A20-4857-BFEF-D29773CB8040}")) CFoo; // NVIDIA H.264 Encoder MFT
CComPtr<IMFTransform> pTransform;
const HRESULT nCoCreateInstanceResult = pTransform.CoCreateInstance(__uuidof(CFoo));
// NOTE: nCoCreateInstanceResult is 0x80040154 REGDB_E_CLASSNOTREG "Class not registered"

The COM server itself is, of course, present and registered properly via their nvEncMFTH264.dll; it is just broken inside. Called directly via IClassFactory::CreateInstance it gives 0x8000FFFF E_UNEXPECTED "Catastrophic Failure", and so it also fails in Media Foundation API scenarios.
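A lower level reproduction, bypassing the CoCreateInstance shortcut, looks roughly like this (a sketch with the same CLSID as above and error handling omitted):

class __declspec(uuid("{60F44560-5A20-4857-BFEF-D29773CB8040}")) CFoo; // NVIDIA H.264 Encoder MFT
CComPtr<IClassFactory> pClassFactory;
// CoGetClassObject succeeds: nvEncMFTH264.dll is registered and loads fine
CoGetClassObject(__uuidof(CFoo), CLSCTX_INPROC_SERVER, NULL, __uuidof(IClassFactory), (VOID**) &pClassFactory);
CComPtr<IMFTransform> pTransform;
const HRESULT nCreateInstanceResult = pClassFactory->CreateInstance(NULL, __uuidof(IMFTransform), (VOID**) &pTransform);
// NOTE: nCreateInstanceResult is 0x8000FFFF E_UNEXPECTED "Catastrophic failure"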

This is absolutely not fun, Nvidia!

Meet Rainway, a free and easy to use game streaming platform

Rainway Client UI
Designed with speed in mind, Rainway is tuned to avoid impacting the performance of your game. Enjoy 60 FPS streams with super low-latency gameplay. Rainway launched its beta version today, offering a server for self-hosting and an HTML5 web client. The server component is designed to stream using state-of-the-art technology with high performance. On the client side, Rainway offers a quality remote gaming experience in the Chrome and Firefox web browsers.

A few days earlier Rainway released a trailer featuring a vision of the future of gaming in a world of diverse devices and fast networks. Today's beta launch is Rainway's first step towards putting gaming online.

Feel free to register and enjoy the new experience. Also, even though it is about gaming, you are not limited to games: the technology remotes you into a workstation in general and makes remote desktop accessible via a general-purpose browser.

Intel’s hardware H.264 encoder MFT fails to change encoding settings as encoding goes

If I recall correctly, Intel was the first vendor to supply an H.264 hardware video encoder as a Media Foundation Transform (since Windows 7), and overall Intel Quick Sync Video (QSV) was the earliest widely available implementation of hardware-assisted encoding. However, one aspect of such encoding seems to be missing: updating encoder parameters during an encoding session.

The task itself is pretty typical and, outside Media Foundation, is handled for example by libx264:

/* x264_encoder_reconfig:
 * various parameters from x264_param_t are copied.
 * this takes effect immediately, on whichever frame is encoded next;
 * due to delay, this may not be the next frame passed to encoder_encode.
 * if the change should apply to some particular frame, use x264_picture_t->param instead.
 * returns 0 on success, negative on parameter validation error.
 * not all parameters can be changed; see the actual function for a detailed breakdown.
 *
 * since not all parameters can be changed, moving from preset to preset may not always
 * fully copy all relevant parameters, but should still work usably in practice. however,
 * more so than for other presets, many of the speed shortcuts used in ultrafast cannot be
 * switched out of; using reconfig to switch between ultrafast and other presets is not
 * recommended without a more fine-grained breakdown of parameters to take this into account. */
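A typical call, for reference, assuming bitrate is among the parameters being adjusted (a sketch; see the libx264 sources for the exact set of reconfigurable parameters):

// Sketch: lower the target bitrate of an already opened libx264 encoder
x264_param_t Parameters;
x264_encoder_parameters(pEncoder, &Parameters); // start from the encoder's current parameters
Parameters.rc.i_bitrate = 1000; // new target, kbit/s
if(x264_encoder_reconfig(pEncoder, &Parameters) < 0)
    { /* parameter validation failed, the encoder keeps its previous settings */ }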

Intel's comment on the Media Foundation interface for Intel QSV is that the MFT offers only a subset of the available functionality:

These hardware MFT (HMFT) are provided and distributed as part of our graphics drivers. These provide Quick sync (hardware acceleration) support on platforms. Quick Sync Video H.264 Encoder MFT is a fixed function implementation, hence there is limited flexibility to make any changes to the pipeline. Yes, HMFT distributed via graphic driver on win7 includes dx9 support, Win8/8.1 dx11 and Win10 dx12.

Even so, updating settings as encoding goes has been a must for some time. Not only does MSDN mention this, e.g. here in the Codec Properties detail:

In Windows 8, this property can be set at any time during encoding. Changes are applied starting at the next input frame.

But it also looks like Microsoft implements a related feature in their software video encoder without properly documenting it, AND Nvidia follows suit in their MFT implementation. This makes me think that certain materials exist, such as a reference H.264 MFT implementation, which vendors use as a sample, and that Intel in particular does not include this capability in their code.
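For the record, the update itself is issued through the encoder MFT's ICodecAPI, roughly like this (a sketch assuming CODECAPI_AVEncCommonQuality as the property being changed; this is the kind of call the Intel MFT appears to ignore):

#include <atlbase.h>
#include <strmif.h>    // ICodecAPI
#include <codecapi.h>  // CODECAPI_* property GUIDs
#include <mftransform.h>

// Sketch: adjust quality while the encoder MFT is already processing samples;
// per MSDN, in Windows 8 the change applies starting at the next input frame
HRESULT UpdateQuality(IMFTransform* pTransform, ULONG nQuality) // 0..100
{
    const CComQIPtr<ICodecAPI> pCodecApi = pTransform;
    if(!pCodecApi)
        return E_NOINTERFACE;
    VARIANT Value;
    VariantInit(&Value);
    Value.vt = VT_UI4;
    Value.ulVal = nQuality;
    return pCodecApi->SetValue(&CODECAPI_AVEncCommonQuality, &Value);
}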

More technical detail is in my question on the Intel Developer Zone site: Certified Hardware Encoder for H.264 video encoding ignores quality changes.


Video Processor MFT scaling bug

Let us break the silence with a fresh Media Foundation bug.

D3D11 WARNING: ID3D11DeviceContext::PSSetShaderResources: Resource being set to PS shader resource slot 0 is inaccessible because of a previous call to ReleaseSync or GetDC. [ STATE_SETTING WARNING #7: DEVICE_PSSETSHADERRESOURCES_HAZARD]

A Direct3D 11 enabled instance of the Video Processor MFT does something wrong with the data and produces black output…

As the quoted message suggests, the problem is closely related to textures of the D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX kind, and – again, presumably – it is a bug in Microsoft's MFT implementation, since the problem is triggered for no apparent reason and only in some scenarios.

Apparently the internal Direct3D 11 layer itself is capable of doing the scaling, because it is even possible to fool the MFT with incorrect media types and still get the scaling done!

The problem might be tricky to catch if the conversion takes place inside higher level Media Foundation APIs like the Media Session: it is hard to experiment with the MFT while it is managed by the API and the API enforces consistency of the setup. One apparent solution to the problem is to add another custom MFT in between and copy the mutex-enabled texture into a plain one, as sketched below. The Video Processor MFT does the scaling correctly and efficiently when it is not confused by extravagant input.
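The copy step of that workaround is straightforward: acquire the keyed mutex, CopyResource into an ordinary texture of the same description, and release the mutex (a sketch; the destination texture is assumed to be created without the D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX flag, and the acquire key has to match whatever key the producer released with):

#include <atlbase.h>
#include <d3d11.h>

// Sketch: copy a keyed mutex texture into a plain texture so that the
// Video Processor MFT downstream never sees the keyed mutex resource
HRESULT CopyFromKeyedMutexTexture(ID3D11DeviceContext* pDeviceContext, ID3D11Texture2D* pKeyedMutexTexture, ID3D11Texture2D* pPlainTexture)
{
    const CComQIPtr<IDXGIKeyedMutex> pKeyedMutex = pKeyedMutexTexture;
    if(!pKeyedMutex)
        return E_NOINTERFACE;
    const HRESULT nAcquireResult = pKeyedMutex->AcquireSync(0, INFINITE); // key 0 is an assumption, see the note above
    if(FAILED(nAcquireResult))
        return nAcquireResult;
    // Plain GPU-side copy; only pPlainTexture goes downstream
    pDeviceContext->CopyResource(pPlainTexture, pKeyedMutexTexture);
    pKeyedMutex->ReleaseSync(0);
    return S_OK;
}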

H.265/HEVC Video Decoder issues in Windows 10 Fall Creators Update

Microsoft released a new Windows 10 update, and again a new tsunami is approaching.

Media Foundation H.265 decoder: there is no longer a "preliminary documentation" notice in the MSDN article. The decoder is a substandard quality piece of software, but it covers more or less what it has to. In particular, it does back H.265 video playback, including with the use of DXVA2 where applicable. One could have had the impression that the technology was maturing and that, who knows, it might even be updated to at least the quality and feature set of the H.264 decoder.

To make life more interesting and challenging, the Fall Creators Update makes a breaking change. The changes are not yet documented, and they are important: the H.265 decoder is taken away. Luckily, not completely: the decoder (along with the encoder) is moved to a separate Windows Store download, HEVC Video Extension.

Even though it looks like the same decoder, it is packaged differently, and consumers might need to update their code to continue using the decoder.
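One way to deal with the packaging change in code is to check at run time whether an HEVC decoder MFT is registered at all, for example (a sketch; assumes MFStartup has already been called):

#include <atlbase.h>
#include <mfapi.h>
#include <mftransform.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfuuid.lib")

// Sketch: detect whether any H.265/HEVC video decoder MFT is registered
bool IsHevcDecoderAvailable()
{
    const MFT_REGISTER_TYPE_INFO InputTypeInformation { MFMediaType_Video, MFVideoFormat_HEVC };
    IMFActivate** ppActivates = NULL;
    UINT32 nActivateCount = 0;
    if(FAILED(MFTEnumEx(MFT_CATEGORY_VIDEO_DECODER, MFT_ENUM_FLAG_ALL, &InputTypeInformation, NULL, &ppActivates, &nActivateCount)))
        return false;
    for(UINT32 nIndex = 0; nIndex < nActivateCount; nIndex++)
        ppActivates[nIndex]->Release();
    CoTaskMemFree(ppActivates);
    return nActivateCount > 0;
}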

Even though the above makes at least some sense, there is also an obvious bug on top of all this: the 32-bit version of the decoder is BROKEN.

The released variant of the decoder is sealed by a digital signature and enforces integrity checks. This works out with the 64-bit version of the software; in particular, the stock Movies & TV application can indeed play H.265 video on 64-bit Windows, because the 64-bit version of the application is used and it in turn pulls in the 64-bit version of the decoder. However, the 32-bit version of the DLL is broken and does not work, hence 32-bit applications relying on H.265 video decoding capabilities are going to stop working with the Fall Creators Update.

The problem is apparently the integrity check, because if you manage to remove that, the 32-bit version of the H.265/HEVC decoder is operational.

It will take some time for Microsoft to identify and fix the problem. It will be fixed eventually, so if it is important to keep 32-bit apps functional with respect to H.265 video decoding/playback, one should rather postpone the Fall Creators Update.

Further reading: