UWP Media Element fullscreen playback bug

The new platform, Universal Windows Platform (UWP), brings new Media Foundation bugs.

UWP Media Element embeds a Media Foundation video renderer to present the video frames being played back. Under certain conditions, the Media Element control fails to present video inline: only the presentation time keeps incrementing, indicating that playback is in progress. Nevertheless, once expanded to full screen, the video frames are presented as expected.

Video Processor MFT pixel format conversion bug

Not the first, not the last. A Direct3D 11 enabled Media Foundation transform fails to transfer sample attributes while doing the conversion.

Why are attributes important in the first place? Because we can associate data with samples/frames and have it passed along attached to a specific frame as the conversion takes place and as the data transits through the pipeline.

There is no strict rule on whether a transform needs to copy attributes from input to output samples. Attributes are flexible, and in this case they are so flexible that it is not clear what the transforms actually do. Microsoft attempted to bring some order with the MFPKEY_EXATTRIBUTE_SUPPORTED property. Let us have a look at what the documentation says about the processing model:

The input samples might have attributes that must be copied to the corresponding output samples.

  • If the MFT returns VARIANT_TRUE for the MFPKEY_EXATTRIBUTE_SUPPORTED property, the MFT must copy the attributes.
  • If the MFPKEY_EXATTRIBUTE_SUPPORTED property is either VARIANT_FALSE or is not set, the client must copy the attributes.

The words “client must copy the attributes” should be read as this: the MFT does not give a damn about the attributes, so go copy them yourself the way you like.

Needless to say, Video Processor MFT itself has not the faintest idea about this MFPKEY_EXATTRIBUTE_SUPPORTED property in the first place, nor about the behavior it defines.

Microsoft designed Video Processor MFT as a Swiss army knife for basic conversions. The MFT has zero degrees of customization and has multiple code paths inside to perform this or that conversion.

Altogether this means that small bugs inside are endless and the MFT behavior is not consistent across different conversions.

So, on to the bug itself: unlike in other scenarios, when the MFT does pixel format conversion it fails to copy the sample attributes. I feed in a sample with attributes attached and I get output with zero attributes.

In my case the workaround is a wrapper MFT that intercepts IMFTransform::ProcessInput and IMFTransform::ProcessOutput calls and copies the missing attributes.

DirectShowSpy: Who sent EC_ERRORABORT once again

A few years ago the question was already asked: DirectShow Spy: Who Sent EC_ERRORABORT?. The spy already attempted to log a call stack in order to identify the sender of the message. However, over time the tracing capability stopped functioning. There were a few reasons; specifically, limitations of the internal stack unwinder resulted in an inability to capture the information of interest.

It is still important once in a while to trace an event back to its sender, so now it is time to improve the logging.

The updated DirectShow Spy received an option to produce a minidump at the moment of Filter Graph Manager’s IMediaEventSink.Notify call. The advantage of a minidump is that it can be examined retroactively, and depending on the minidump type it is possible to capture a sufficient amount of detail to trace the event back to a certain module and/or filter.

The image below displays an example of a call stack captured by the minidump. For the purpose of demonstration, though, I used EC_COMPLETE instead, without waiting for an actual EC_ERRORABORT.

The call stack shows that the event is sent by the input pin of quartz.dll’s Video Mixing Renderer filter as a part of end-of-stream notification handling, which in turn has a trace of the video decoder and the media file splitter higher on the call stack.

Minidump generation is available in both the 32-bit and 64-bit versions of the spy.

To enable the minidump generation, one needs to add/set the following registry value (that is, set ErrorAbort MiniDump Mode = 2):

  • Key Name: HKEY_LOCAL_MACHINE\SOFTWARE\Alax.Info\Utility\DirectShowSpy
  • Key Name for 32-bit application in 64-bit Windows: HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Alax.Info\Utility\DirectShowSpy
  • Value Name: ErrorAbort MiniDump Mode
  • Value Type: REG_DWORD
  • Value: 0 = Default (same as 1), 1 = Disabled, 2 = Enabled

Then, to control the type of generated minidump file the following registry value can be used:

  • Key Name: HKEY_LOCAL_MACHINE\SOFTWARE\Alax.Info\Utility\DirectShowSpy\Debug
  • Key Name for 32-bit application in 64-bit Windows: HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Alax.Info\Utility\DirectShowSpy\Debug
  • Value Name: MiniDump Type
  • Value Type: REG_DWORD
  • Value: a combination of MINIDUMP_TYPE enumeration flags; the default value of zero produces a MiniDumpNormal minidump

Some useful minidump type flag combinations can be looked up here: What combination of MINIDUMP_TYPE enumeration values will give me the most ‘complete’ mini dump?, and specifically the value 0x1826 expands to the following:
MiniDumpWithFullMemory | MiniDumpWithFullMemoryInfo | MiniDumpWithHandleData | MiniDumpWithThreadInfo | MiniDumpWithUnloadedModules – this gives a “complete” output.
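
To double-check how such a value breaks down into flags, here is a minimal sketch. It redefines the five relevant MINIDUMP_TYPE flag values locally (with a k prefix) so that it builds without Windows headers; DescribeMiniDumpType is a hypothetical helper written for this illustration and is not part of DbgHelp or of the spy.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Flag values as declared for the MINIDUMP_TYPE enumeration in DbgHelp.h,
// redefined here so the sketch does not depend on Windows headers.
constexpr std::uint32_t kMiniDumpWithFullMemory      = 0x00000002;
constexpr std::uint32_t kMiniDumpWithHandleData      = 0x00000004;
constexpr std::uint32_t kMiniDumpWithUnloadedModules = 0x00000020;
constexpr std::uint32_t kMiniDumpWithFullMemoryInfo  = 0x00000800;
constexpr std::uint32_t kMiniDumpWithThreadInfo      = 0x00001000;

// The "complete" combination mentioned in the text.
constexpr std::uint32_t kCompleteDumpType =
    kMiniDumpWithFullMemory | kMiniDumpWithFullMemoryInfo |
    kMiniDumpWithHandleData | kMiniDumpWithThreadInfo |
    kMiniDumpWithUnloadedModules;
static_assert(kCompleteDumpType == 0x1826, "flags must add up to 0x1826");

// Hypothetical helper: names the known flags set in a "MiniDump Type" value.
std::string DescribeMiniDumpType(std::uint32_t value) {
    struct Flag { std::uint32_t bit; const char* name; };
    static const Flag flags[] = {
        { kMiniDumpWithFullMemory,      "MiniDumpWithFullMemory" },
        { kMiniDumpWithHandleData,      "MiniDumpWithHandleData" },
        { kMiniDumpWithUnloadedModules, "MiniDumpWithUnloadedModules" },
        { kMiniDumpWithFullMemoryInfo,  "MiniDumpWithFullMemoryInfo" },
        { kMiniDumpWithThreadInfo,      "MiniDumpWithThreadInfo" },
    };
    if (value == 0)
        return "MiniDumpNormal";
    std::string text;
    for (const Flag& flag : flags) {
        if (value & flag.bit) {
            if (!text.empty()) text += " | ";
            text += flag.name;
            value &= ~flag.bit;  // mark the bit as handled
        }
    }
    if (value != 0) {  // bits outside the five flags this sketch knows about
        if (!text.empty()) text += " | ";
        text += "(other)";
    }
    return text;
}
```

Calling DescribeMiniDumpType(0x1826) lists the same five flags, in ascending bit order. The flag table is deliberately limited to the flags used in this post; any other bit is reported as "(other)".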

In short, to enable .DMP creation on EC_ERRORABORT, the following registry script (shown here for a 32-bit application in 64-bit Windows) needs to be merged in addition to spy registration:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Alax.Info\Utility\DirectShowSpy]
"ErrorAbort MiniDump Mode"=dword:00000002

[HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Alax.Info\Utility\DirectShowSpy\Debug]
"MiniDump Type"=dword:00001826

DirectShowSpy.log file will also mention the minidump file name once it is generated.

About Microsoft FLAC Audio Encoder MFT

This is basically a cross-post of a Stack Overflow answer:

How do I encode raw 48khz/32bits PCM to FLAC using Microsoft Media Foundation?

So, Microsoft introduced a FLAC Media Foundation Transform (MFT) Encoder CLSID_CMSFLACEncMFT in Windows 10, but the codec remains undocumented at the moment.

Supported Media Formats in Media Foundation is similarly out of date and does not reflect the presence of recent additions.

I am not aware of any comment on this, and my opinion is that the codec was added for internal use, but the implementation is merely a standard Media Foundation component without licensing restrictions, so the codec is not restricted either by, for example, field-of-use limitations.

This stock codec seems to be limited to 8, 16 and 24 bit PCM input options (that is, not 32 bits/sample; you need to convert accordingly). The codec is capable of accepting up to 8 channels and flexible sample rates (48828 Hz is okay).

Even though the codec (transform) seems to be working, if you want to produce a file, you also need a suitable container format (multiplexer) which is compatible with MFAudioFormat_FLAC (the identifier has 7 results on Google Search at the moment of the post, which basically means no one is even aware of the codec). Outdated documentation does not reflect actual support for FLAC in stock media sinks.
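
As an illustration of the conversion mentioned above, here is a minimal sketch that repacks 32-bit samples as 24-bit ones before they are fed to the encoder. It assumes the source is signed 32-bit integer PCM (not floating point) and that packed little-endian 24-bit output is wanted; ConvertPcm32ToPcm24 is a hypothetical helper name, not a Media Foundation function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts signed 32-bit integer PCM samples to packed
// 24-bit little-endian PCM by dropping the 8 least significant bits of each
// sample. Interleaved channels pass through unchanged, sample by sample.
std::vector<std::uint8_t> ConvertPcm32ToPcm24(const std::vector<std::int32_t>& input) {
    std::vector<std::uint8_t> output;
    output.reserve(input.size() * 3);
    for (std::int32_t sample : input) {
        // Work on the unsigned bit pattern so the shifts are well defined
        // for negative samples too.
        const std::uint32_t bits = static_cast<std::uint32_t>(sample);
        output.push_back(static_cast<std::uint8_t>(bits >> 8));   // low byte of the 24-bit sample
        output.push_back(static_cast<std::uint8_t>(bits >> 16));  // middle byte
        output.push_back(static_cast<std::uint8_t>(bits >> 24));  // high byte, carries the sign
    }
    assert(output.size() == input.size() * 3);
    return output;
}
```

With 24-bit output, each sample takes 3 bytes, so the block alignment of the resulting PCM media type is 3 times the number of channels. The sketch simply truncates; dithering before truncation is a separate concern and is left out here.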

I borrowed a custom media sink that writes a raw MFT output payload into a file, and such FLAC output is playable as the FLAC frames contain necessary information to parse the bitstream for playback.

For reference, the file itself is: 20180224-175524.flac.

An obvious candidate among stock media sinks, WAVE Media Sink, is unable to accept FLAC input. Even though it potentially could, the implementation is presumably limited to simpler audio formats.

AVI media sink might possibly take FLAC audio, but it seems to be impossible to create an audio-only AVI.

Among the other media sinks there is, however, one which can process FLAC: MPEG-4 File Sink. Again, despite the outdated documentation, the media sink takes FLAC input, so you should be able to create .MP4 files with a FLAC audio track.

Sample file: 20180224-184012.mp4. “FLAC (framed)”

To sum it up:

  • FLAC encoder MFT is present in Windows 10 and is available for use; lacks proper documentation though
  • One needs to take care of conversion of input to compatible format (no direct support for 32-bit PCM)
  • It is possible to manage MFT directly and consume MFT output, then obtain FLAC bitstream
  • Alternatively, it is possible to use stock MP4 media sink to produce output with FLAC audio track
  • Alternatively, it is possible to develop a custom media sink and consume FLAC bitstream from upstream encoder connection

Potentially, the codec is compatible with Transcode API, however the restrictions above apply. The container type needs to be MFTranscodeContainerType_MPEG4 in particular.

The codec is apparently compatible with Media Session API; presumably it is also good for use with Sink Writer API.

In your code, as you attempt to use Sink Writer API, you should similarly have MP4 output, with input possibly converted to a compatible format in your code (either compatible PCM, or compatible FLAC with the encoder MFT managed on your side). Knowing that the MP4 media sink is overall capable of creating a FLAC audio track, you should be able to debug the fine details in your code and fit the components to work together.

Microsoft Media Foundation webcam video capture in one screen of code

Complicated as it is for many things, Media Foundation is still quite simple for the basics. To capture video, Media Foundation offers the Source Reader API, which uses Media Foundation primitives to build a pipeline that manages the origin of the data (not necessarily a live source as in this example; it can also be a file or a remote resource) and offers on-request reading of the data by the application, without consumption by Media Foundation managed primitives (in this respect Source Reader is the opposite of the Media Session API).

The simplest use of Source Reader to read frames from a web camera is simple enough to fit in a few tens of lines of C++ code. The sample VideoCaptureSynchronous project captures video frames in the form of IMFSample samples in fewer than 100 lines of code.

Friendly Name: Logitech Webcam C930e
nStreamIndex 0, nStreamFlags 0x100, nTime 1215.074, pSample 0x0000000000000000
nStreamIndex 0, nStreamFlags 0x0, nTime 0.068, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.196, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.324, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.436, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.564, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.676, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.804, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.916, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.044, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.156, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.284, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.396, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.524, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.636, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.764, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.956, pSample 0x000002CAF3805D10
...

VideoCaptureSynchronous project does not show what to do with the samples or how to request specific format of the samples, it just shows the ease of video capture per se. The capture takes place synchronously with blocking calls requesting and obtaining the samples.

Media Foundation API is asynchronous by its design and Source Reader API in synchronous mode hides the complexity. The blocking call issues a request for a frame internally, waits until the frame arrives and makes it available.

Source Reader API does offer an asynchronous model too, again making it available in as simple a way as possible.

VideoCaptureAsynchronous project is doing the same video capture but asynchronously: the controlling thread just starts capture, and frames are delivered via a callback once they are available, on a worker thread.

So when does one use synchronous and when asynchronous?

The synchronous model results in cleaner and more reliable code with fewer chances for a mistake, and the gains of the asynchronous model can in most cases be neglected, especially by those who are interested in beginner material like this post. Still, video capture is a real-time process where one does not want to block the controlling thread and would rather receive the frames out of nowhere as soon as they are ready. Hence the asynchronous version. Asynchronous can still be simple: VideoCaptureAsynchronous is already more than 100 lines of code, but 120 lines might also be okay.

Download links

  • Source code:
    • VideoCaptureSynchronous: SVN, Trac
    • VideoCaptureAsynchronous: SVN, Trac
  • License: This software is free to use

Reference Signal Source: Direct3D 11 awareness

A few updates to the DirectShowReferenceSource module; today it is the part related to the Media Foundation video Media Source.

First, the video media source now handles restarts from the paused state correctly and resumes frame generation from the proper position (not from zero as before).

Second, the video media source is now Direct3D 11 aware. That is, when it participates in Direct3D 11 enabled topologies, the media source generates the video frames using the DXGI render target variant of Direct2D (see ID2D1Factory::CreateDxgiSurfaceRenderTarget for details) and delivers them downstream as textures. This is, in particular, useful to those who need a signal to feed into Direct3D 11 aware transforms and renderers such as DX11VideoRenderer. Specifically, when connected to DX11VideoRenderer, the video media source enables GPU-only video playback.
