Bug in Media Foundation MPEG-4 File Source related to timestamping video frames of a fragmented MP4 file

A recent update to the Media Foundation platform introduced a new bug related to fragmented MP4 files and H.264 video. The bug shows up consistently with these file versions:

  • mfplat.dll – 10.0.14393.351 (rs1_release_inmarket.161014-1755)    15-Oct-16 05:48
  • mfmp4srcsnk.dll – 10.0.14393.351 (rs1_release_inmarket.161014-1755)    15-Oct-16 05:45

The nature of the problem is that the MPEG-4 File Source stamps the data incorrectly: frame time stamps get wrong durations and increments and quickly jump far into the future, and on playback this leads to puzzling freezes. As Media Foundation is used by Windows Media Player and the Windows 10 Movies & TV player, the bug is present there as well.

The original report is on MSDN Forums.

Presumably it is possible to roll the respective Windows Update package back; alternatively, one has to wait for Microsoft to fix the problem and deliver a new update deploying the fix.
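
A quick way to check whether a machine carries the affected build is to read the file version of mfplat.dll programmatically. Below is a minimal sketch (my own illustration, not part of any Microsoft tooling) using the version info API; the hardcoded System32 path is an assumption for brevity.

#include <windows.h>
#include <vector>
#include <cstdio>
#pragma comment(lib, "Version.lib")

int main()
{
    // The affected module; mfmp4srcsnk.dll can be checked the same way
    const wchar_t* pszPath = L"C:\\Windows\\System32\\mfplat.dll";
    DWORD nHandle = 0;
    const DWORD nSize = GetFileVersionInfoSizeW(pszPath, &nHandle);
    if(nSize == 0)
        return 1;
    std::vector<BYTE> VersionData(nSize);
    if(!GetFileVersionInfoW(pszPath, nHandle, nSize, VersionData.data()))
        return 1;
    VS_FIXEDFILEINFO* pFileInformation = nullptr;
    UINT nLength = 0;
    if(!VerQueryValueW(VersionData.data(), L"\\", reinterpret_cast<void**>(&pFileInformation), &nLength))
        return 1;
    // 10.0.14393.351 is the build the bug was observed with
    wprintf(L"%s version: %u.%u.%u.%u\n", pszPath,
        HIWORD(pFileInformation->dwFileVersionMS), LOWORD(pFileInformation->dwFileVersionMS),
        HIWORD(pFileInformation->dwFileVersionLS), LOWORD(pFileInformation->dwFileVersionLS));
    return 0;
}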

Screen recording using Desktop Duplication API and hardware H.264 encoder

The application takes advantage of three powerful Windows APIs at a time.

MediaFoundationDesktopRecorder initializes a desktop duplication session and sends the obtained desktop images to an H.264 video encoder, producing a standard MP4 recording. Optionally, it can add an audio track capturing data from one of the standard audio inputs.

The best performance is achieved with a hardware H.264 encoder: not only is the hardware encoder itself faster, but desktop images are also transferred to the encoder efficiently, without being copied through system memory. With suitable hardware, recording is quite efficient.

There are certain limitations: the Desktop Duplication API requires Windows 8 or later, and encoder availability depends on hardware and OS version. The application lets the API pick an encoder automatically and, in the worst case, falls back to a software encoder, which typically means a performance hit.

MediaFoundationDesktopRecorder UI

When started, the application prints initial information, especially regarding the availability of devices, and appends further output as actions and events take place.

The application uses a configuration file with the same name and location as the application and an .INI extension. Changes to the configuration file take effect when the application is restarted.
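
For illustration, the sketch below shows how such a file next to the executable could be read with the classic profile API; the setting names match the Configuration section further below, but the helper itself is an assumption of mine rather than the application's actual code.

#include <windows.h>
#include <shlwapi.h>
#include <cstdio>
#pragma comment(lib, "Shlwapi.lib")

int main()
{
    WCHAR pszPath[MAX_PATH];
    GetModuleFileNameW(nullptr, pszPath, MAX_PATH);
    PathRenameExtensionW(pszPath, L".ini"); // same name and location as the application, .INI extension

    WCHAR pszOutputDeviceName[256];
    GetPrivateProfileStringW(L"Input", L"Video Output Device Name", L"", pszOutputDeviceName, _countof(pszOutputDeviceName), pszPath);
    const UINT nVideoBitrate = GetPrivateProfileIntW(L"Format", L"Video Bitrate", 4096000, pszPath);
    wprintf(L"Video Output Device Name: %s, Video Bitrate: %u\n", pszOutputDeviceName, nVideoBitrate);
    return 0;
}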

The application registers the Win+F5 and Win+F8 hotkeys globally to start/stop recording while the application is in the background (that is, while the user interacts with another application).
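
A minimal sketch of the hotkey mechanics, assuming plain RegisterHotKey with the WIN modifier and a thread message loop (the identifiers and the way the application actually dispatches the hotkeys are my assumptions):

#include <windows.h>

int main()
{
    // Hotkey identifiers 1 and 2 are arbitrary; MOD_WIN makes them the global Win+F5/Win+F8 combinations
    RegisterHotKey(nullptr, 1, MOD_WIN, VK_F5); // start recording
    RegisterHotKey(nullptr, 2, MOD_WIN, VK_F8); // stop recording
    MSG Message;
    while(GetMessageW(&Message, nullptr, 0, 0) > 0)
    {
        if(Message.message != WM_HOTKEY)
            continue;
        if(Message.wParam == 1)
            OutputDebugStringW(L"Start recording hotkey pressed\n");
        if(Message.wParam == 2)
            OutputDebugStringW(L"Stop recording hotkey pressed\n");
    }
    return 0;
}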

The application generates .MP4 files in the directory of its own location. There will be a video track and, depending on settings, optionally one additional audio track. Video is taken from one of the monitors, and audio from one of the available standard audio input devices.

The application also generates log files at one of the following locations (a sketch of the fallback logic follows the list):

  • C:\ProgramData\MediaFoundationDesktopRecorder.log
  • C:\Users\$(UserName)\AppData\Local\MediaFoundationDesktopRecorder.log (in case the first path above is inaccessible, esp. due to insufficient permissions)
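
The fallback between the two locations could look roughly like the sketch below: try the ProgramData path first and, if it is not writable, fall back to the per-user local application data directory. This is an illustration under my own assumptions, not the application's actual logging code.

#include <windows.h>
#include <shlobj.h>
#include <string>
#pragma comment(lib, "Shell32.lib")

HANDLE OpenLogFile()
{
    // C:\ProgramData first, then C:\Users\...\AppData\Local if the former is not writable
    static const int g_pIdentifiers[] = { CSIDL_COMMON_APPDATA, CSIDL_LOCAL_APPDATA };
    for(int nIdentifier : g_pIdentifiers)
    {
        WCHAR pszDirectory[MAX_PATH];
        if(FAILED(SHGetFolderPathW(nullptr, nIdentifier, nullptr, SHGFP_TYPE_CURRENT, pszDirectory)))
            continue;
        const std::wstring sPath = std::wstring(pszDirectory) + L"\\MediaFoundationDesktopRecorder.log";
        const HANDLE hFile = CreateFileW(sPath.c_str(), FILE_APPEND_DATA, FILE_SHARE_READ, nullptr, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
        if(hFile != INVALID_HANDLE_VALUE)
            return hFile; // the first writable location wins
    }
    return INVALID_HANDLE_VALUE;
}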

Configuration

The configuration .INI file might contain a few settings that set up and alter the behavior of the application:

[Input]
;Video Adapter Description=NVIDIA GeForce GTX 750
Video Output Device Name=\\.\DISPLAY2
;Audio Friendly Name=Stereo Mix (Realtek High Definition Audio)

When started, the application enumerates (“found video…”, “found audio…”) the available video and audio inputs. These discoveries are compared against the configuration file settings in order to identify the monitor for recording and, possibly, the audio input device.

The default behavior is to take the first available monitor, which happens when the settings do not instruct otherwise. By default, no audio is recorded; audio is recorded and added to the resulting file only if an input device is specified explicitly.

The application also prints which devices are selected for recording (“using adapter…”).
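
For reference, a hedged sketch of such an enumeration stage is shown below: it lists DXGI adapters/outputs (monitors) and Media Foundation audio capture devices, whose names would then be matched against the Video Adapter Description, Video Output Device Name and Audio Friendly Name settings. This is illustrative code, not the recorder's source.

#include <windows.h>
#include <dxgi1_2.h>
#include <mfapi.h>
#include <mfidl.h>
#include <atlbase.h>
#include <cstdio>
#pragma comment(lib, "Dxgi.lib")
#pragma comment(lib, "Mfplat.lib")
#pragma comment(lib, "Mf.lib")
#pragma comment(lib, "Mfuuid.lib")

int main()
{
    ATLVERIFY(SUCCEEDED(CoInitializeEx(nullptr, COINIT_MULTITHREADED)));
    ATLVERIFY(SUCCEEDED(MFStartup(MF_VERSION)));
    // Video inputs: adapters and their outputs (\\.\DISPLAY1, \\.\DISPLAY2, ...)
    CComPtr<IDXGIFactory1> pFactory;
    ATLVERIFY(SUCCEEDED(CreateDXGIFactory1(IID_PPV_ARGS(&pFactory))));
    for(UINT nAdapterIndex = 0; ; nAdapterIndex++)
    {
        CComPtr<IDXGIAdapter1> pAdapter;
        if(pFactory->EnumAdapters1(nAdapterIndex, &pAdapter) == DXGI_ERROR_NOT_FOUND)
            break;
        DXGI_ADAPTER_DESC1 AdapterDescription;
        ATLVERIFY(SUCCEEDED(pAdapter->GetDesc1(&AdapterDescription)));
        for(UINT nOutputIndex = 0; ; nOutputIndex++)
        {
            CComPtr<IDXGIOutput> pOutput;
            if(pAdapter->EnumOutputs(nOutputIndex, &pOutput) == DXGI_ERROR_NOT_FOUND)
                break;
            DXGI_OUTPUT_DESC OutputDescription;
            ATLVERIFY(SUCCEEDED(pOutput->GetDesc(&OutputDescription)));
            wprintf(L"found video: %s, %s\n", AdapterDescription.Description, OutputDescription.DeviceName);
        }
    }
    // Audio inputs: capture devices and their friendly names
    CComPtr<IMFAttributes> pAttributes;
    ATLVERIFY(SUCCEEDED(MFCreateAttributes(&pAttributes, 1)));
    ATLVERIFY(SUCCEEDED(pAttributes->SetGUID(MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE, MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_AUDCAP_GUID)));
    IMFActivate** ppActivates;
    UINT32 nActivateCount;
    ATLVERIFY(SUCCEEDED(MFEnumDeviceSources(pAttributes, &ppActivates, &nActivateCount)));
    for(UINT32 nIndex = 0; nIndex < nActivateCount; nIndex++)
    {
        WCHAR* pszFriendlyName = nullptr;
        UINT32 nFriendlyNameLength = 0;
        if(SUCCEEDED(ppActivates[nIndex]->GetAllocatedString(MF_DEVSOURCE_ATTRIBUTE_FRIENDLY_NAME, &pszFriendlyName, &nFriendlyNameLength)))
        {
            wprintf(L"found audio: %s\n", pszFriendlyName);
            CoTaskMemFree(pszFriendlyName);
        }
        ppActivates[nIndex]->Release();
    }
    CoTaskMemFree(ppActivates);
    MFShutdown();
    CoUninitialize();
    return 0;
}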

[Format]
;Video Frame Rate=30000
;Video Frame Rate Denominator=1001
Video Bitrate=4096000
Video Texture Pool Capacity=24
Video Throttle=70
Audio Bitrate=192000

The default behavior is to identify the monitor's refresh rate and produce an output file with video at the same frame rate. The Video Frame Rate and Video Frame Rate Denominator settings offer an override of the target file frame rate. With only the former value set, it is used as the frame rate directly. With both values set, they define a ratio; e.g. values of 30000 and 1001 result in a 29.97 fps file.

Frame rate reduction is a good way to reduce encoding complexity and overall graphics subsystem load.

The bitrate values define the respective bitrates for the encoded content.
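
As an illustration of how these settings map onto Media Foundation media type attributes, the short sketch below (my assumption of the mapping, not the application's code) sets a 30000/1001 frame rate ratio and the average bitrate on a video media type:

#include <mfapi.h>
#include <atlbase.h>
#pragma comment(lib, "Mfplat.lib")
#pragma comment(lib, "Mfuuid.lib")

void ConfigureVideoMediaType(IMFMediaType* pMediaType)
{
    // Video Frame Rate = 30000 with Video Frame Rate Denominator = 1001 gives a 29.97 fps stream
    ATLVERIFY(SUCCEEDED(MFSetAttributeRatio(pMediaType, MF_MT_FRAME_RATE, 30000, 1001)));
    // Video Bitrate = 4096000 goes into the average bitrate attribute consumed by the encoder
    ATLVERIFY(SUCCEEDED(pMediaType->SetUINT32(MF_MT_AVG_BITRATE, 4096000)));
}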

Details

As recording goes, the application grabs new desktop snapshots and sends them to the encoder. There are no specific guarantees about frame rate stability or reduction when the graphics subsystem is overloaded; when the complexity is excessive, some frames might be dropped without breaking the overall playability of the output file.

The application provides additional information when it creates a file, for example:

Using Direct3D 11 at feature level D3D_FEATURE_LEVEL_11_0
Using Desktop Duplication mode: Resolution 1680 x 1050, Refresh Rate 59954/1000, Format DXGI_FORMAT_B8G8R8A8_UNORM
Using path “D:\Projects\...\Output\20160707-070707.mp4”
Using video transform Direct3D 11 Aware, Category MFT_CATEGORY_VIDEO_PROCESSOR, Input MFVideoFormat_ARGB32, Output MFVideoFormat_NV12
Using video transform NVIDIA H.264 Encoder MFT, Direct3D 11 Aware, Category MFT_CATEGORY_VIDEO_ENCODER, Input MFVideoFormat_NV12, Output MFVideoFormat_H264
Started writing…
PPP frames written (QQQ frame timeouts, RRR early frame skips, SSS late frame skips)
Stopped writing
Output file size is TTT bytes

While recording, the application might hit a condition where a certain hardware resource is no longer available, e.g. the desktop itself is locked by the user. The application will close the file and attempt to automatically restart recording into a new file. The attempts keep going until the user explicitly stops recording.
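
With Desktop Duplication, losing access to the desktop (for example, when it is locked) typically surfaces as DXGI_ERROR_ACCESS_LOST from IDXGIOutputDuplication::AcquireNextFrame. The sketch below illustrates this handling under my own assumptions; it is not the application's actual capture loop.

#include <dxgi1_2.h>
#include <atlbase.h>

// Returns false when the duplication session is lost and recording has to be restarted into a new file
bool CaptureFrame(IDXGIOutputDuplication* pOutputDuplication)
{
    DXGI_OUTDUPL_FRAME_INFO FrameInformation;
    CComPtr<IDXGIResource> pResource;
    const HRESULT nAcquireResult = pOutputDuplication->AcquireNextFrame(500, &FrameInformation, &pResource);
    if(nAcquireResult == DXGI_ERROR_WAIT_TIMEOUT)
        return true; // no new frame within the timeout, keep going
    if(nAcquireResult == DXGI_ERROR_ACCESS_LOST)
        return false; // desktop switch or lock; the caller closes the file and sets up a new session
    ATLVERIFY(SUCCEEDED(nAcquireResult));
    // ... hand the texture behind pResource over to the encoder here ...
    ATLVERIFY(SUCCEEDED(pOutputDuplication->ReleaseFrame()));
    return true;
}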

The application does NOT do the following (among things it could):

  • the application is limited to recording from one monitor only; to record from two at a time, it is possible to start several instances, however the produced results will not be synchronized
  • the application does not provide options to record a single window image, to cut out a section of the monitor image, or to scale the image down
  • the application does not offer a choice of video encoders (e.g. when there are two or more hardware H.264 encoders); it always uses the encoder picked by the system
  • the application only offers a bitrate setting for video encoding
  • the application does not provide flexibility in audio encoding settings; it also expects that the audio device is available throughout the entire recording session (in particular, that it is not unplugged while recording is in progress)

References (Informational)

Download links

Untweetable Video

Two H.264 MP4 files, close to one another. The files are playable, sort of: Windows desktop players (except Media Foundation based ones), QuickTime, Android and iOS devices play them. The files are not flawless, hence the glitches, but they generally make sense.

There is a problem with the second file, which Twitter rejects with “Your media file could not be processed.” The file is handled well by the browser and uploaded to the remote server, but Twitter is unable to convert it on its backend.

 

Reference Signal Source: RGB32/ARGB32 Subtypes, Media Foundation Media Source for Video

An update for Reference signal source for DirectShow DLLs:

  • the source handles RGB subtypes more accurately and lets you specify whether you want MEDIASUBTYPE_RGB32 or MEDIASUBTYPE_ARGB32
  • additionally, the DLL implements a Microsoft Media Foundation media source for the video stream

A more detailed description follows.

RGB32 and ARGB32 are very close and share the same byte structure. Since alpha channel support in video is minimal, the difference mostly comes down to counterpart support in other applications; for example and specifically, hardware-assisted H.264 encoders take the alpha-enabled variant.
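
To illustrate how close the two subtypes are, the sketch below (an assumption of mine, not the DLL's source) fills a 32-bit RGB DirectShow media type where the subtype GUID is the only difference between the RGB32 and ARGB32 variants:

#include <dshow.h>
#pragma comment(lib, "Strmiids.lib")

void InitializeMediaType(AM_MEDIA_TYPE& MediaType, const VIDEOINFOHEADER& VideoInformation, bool bAlphaEnabled)
{
    ZeroMemory(&MediaType, sizeof MediaType);
    MediaType.majortype = MEDIATYPE_Video;
    MediaType.subtype = bAlphaEnabled ? MEDIASUBTYPE_ARGB32 : MEDIASUBTYPE_RGB32; // the only difference between the two
    MediaType.bFixedSizeSamples = TRUE;
    MediaType.lSampleSize = VideoInformation.bmiHeader.biSizeImage;
    MediaType.formattype = FORMAT_VideoInfo;
    MediaType.cbFormat = sizeof VideoInformation;
    MediaType.pbFormat = static_cast<BYTE*>(CoTaskMemAlloc(sizeof VideoInformation));
    CopyMemory(MediaType.pbFormat, &VideoInformation, sizeof VideoInformation);
}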

The IVideoSourceFilter::SetMediaType method takes a vCompression argument which defines the subtype. The RegisterSources sample code shows how the method is used when exposing the reference signal as a video capture device.

A similar IVideoMediaSource::SetMediaType method is applicable to the Media Foundation counterpart (see below).

Both implementations offer only the given subtype by default, but at the same time both accept the other variant as well if an application or peer connection tries to renegotiate the media type. The same applies to changing resolution, etc.: the sources are flexible enough to take a different video format if it is requested.

The other big new thing is the Media Foundation media source, which generates the reference signal as well. There is no option to set it up as a virtual camera because the API does not offer extensibility of that kind; however, the source can be used to generate test content via Media Foundation, and the code remains pretty simple. I am publishing an MfGenerate code snippet which demonstrates the necessary steps to create an MP4 file with video, with the desired properties.

A frame from generated 4096x2304 content in Windows 10 player

As Media Foundation offers H.265 (HEVC) and fragmented MP4 options, they can also be easily used with the source to generate test footage.

The code does the following steps:

  1. Creates a media source (commented out lines show alternate steps to create a media source from a file)
  2. Creates a source reader from media source
  3. Builds an H.264 media type from raw video media type
  4. Creates and configures a sink writer, which is instructed to do its magic of setting up the H.264 encoder (a side note: the code produces 4096×2304 video, however this is only possible once a hardware encoder is enabled; the software encoder rejected the media type)
  5. Implements a loop that reads frames until they run out, feeding them into the encoder/writer (a condensed sketch follows this list)
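
A condensed sketch of these steps is below; it follows the standard Media Foundation source reader/sink writer pattern under my own assumptions (hardcoded bitrate, first video stream only) rather than reproducing the exact MfGenerate snippet.

#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <atlbase.h>
#pragma comment(lib, "Mfplat.lib")
#pragma comment(lib, "Mfreadwrite.lib")
#pragma comment(lib, "Mfuuid.lib")

void Generate(IMFMediaSource* pMediaSource, const wchar_t* pszOutputPath)
{
    // MFStartup(MF_VERSION) is assumed to have been called already
    // 2. Source reader over the media source (alternatively, MFCreateSourceReaderFromURL for a file)
    CComPtr<IMFSourceReader> pSourceReader;
    ATLVERIFY(SUCCEEDED(MFCreateSourceReaderFromMediaSource(pMediaSource, nullptr, &pSourceReader)));

    // 3. Build an H.264 media type from the raw video media type
    CComPtr<IMFMediaType> pVideoMediaType;
    ATLVERIFY(SUCCEEDED(pSourceReader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_VIDEO_STREAM, &pVideoMediaType)));
    UINT32 nWidth = 0, nHeight = 0, nRateNumerator = 0, nRateDenominator = 0;
    ATLVERIFY(SUCCEEDED(MFGetAttributeSize(pVideoMediaType, MF_MT_FRAME_SIZE, &nWidth, &nHeight)));
    ATLVERIFY(SUCCEEDED(MFGetAttributeRatio(pVideoMediaType, MF_MT_FRAME_RATE, &nRateNumerator, &nRateDenominator)));
    CComPtr<IMFMediaType> pOutputMediaType;
    ATLVERIFY(SUCCEEDED(MFCreateMediaType(&pOutputMediaType)));
    ATLVERIFY(SUCCEEDED(pOutputMediaType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video)));
    ATLVERIFY(SUCCEEDED(pOutputMediaType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264)));
    ATLVERIFY(SUCCEEDED(pOutputMediaType->SetUINT32(MF_MT_AVG_BITRATE, 4096000))); // an arbitrary bitrate for the sketch
    ATLVERIFY(SUCCEEDED(MFSetAttributeSize(pOutputMediaType, MF_MT_FRAME_SIZE, nWidth, nHeight)));
    ATLVERIFY(SUCCEEDED(MFSetAttributeRatio(pOutputMediaType, MF_MT_FRAME_RATE, nRateNumerator, nRateDenominator)));
    ATLVERIFY(SUCCEEDED(pOutputMediaType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive)));

    // 4. Sink writer, which internally sets up the H.264 encoder
    CComPtr<IMFSinkWriter> pSinkWriter;
    ATLVERIFY(SUCCEEDED(MFCreateSinkWriterFromURL(pszOutputPath, nullptr, nullptr, &pSinkWriter)));
    DWORD nStreamIndex;
    ATLVERIFY(SUCCEEDED(pSinkWriter->AddStream(pOutputMediaType, &nStreamIndex)));
    ATLVERIFY(SUCCEEDED(pSinkWriter->SetInputMediaType(nStreamIndex, pVideoMediaType, nullptr)));
    ATLVERIFY(SUCCEEDED(pSinkWriter->BeginWriting()));

    // 5. Read frames until they run out, feeding them into the encoder/writer
    for(; ; )
    {
        DWORD nStreamFlags = 0;
        CComPtr<IMFSample> pSample;
        ATLVERIFY(SUCCEEDED(pSourceReader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0, nullptr, &nStreamFlags, nullptr, &pSample)));
        if(nStreamFlags & MF_SOURCE_READERF_ENDOFSTREAM)
            break;
        if(pSample)
            ATLVERIFY(SUCCEEDED(pSinkWriter->WriteSample(nStreamIndex, pSample)));
    }
    ATLVERIFY(SUCCEEDED(pSinkWriter->Finalize()));
}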

The high-level APIs are simple (similar to DirectShow), which is not true of the internals (also similar to DirectShow; even more so).

Media Foundation source is video only for now.

The MF media source is supposed to be seekable (not really tested; not really testable with TopoEdit), and a zero duration produces an infinite feed. The duration is not necessarily taken from the property; it can also be specified via an overridden presentation descriptor attribute. The video format can also be set up through the stream descriptor's media type handler.

Download links

Update – Connecting MF Media Source to MFCaptureD3D Sample application

To quickly connect the MF media source to the Windows SDK MFCaptureD3D sample application, add an #import directive and a few lines of code replacing the source around CPreview::SetDevice, as shown in the image below:

MFCaptureD3D update for custom media source

DirectShowFileMediaSamples Update: Command Line Mode

It appears that the tool was never mentioned before (it was just listed in the general software list). The application takes a media file as input and applies the respective DirectShow demultiplexer to list individual media samples.

DirectShowFileMediaSamples UI

  • for MP4 files the application attempts to use the GDCL MPEG-4 Demultiplexer first
  • it is possible to filter a specific track/stream
  • data can be copied to the clipboard or saved to a file
  • a file can be dragged and dropped onto the application to get it processed

Now the tool has a command line mode too:

DirectShowFileMediaSamples-Win32.exe input-path [/no-video] [/no-audio] [output-path]

  • /no-video – excludes video tracks
  • /no-audio – excludes audio tracks

The default output path is the input path with the extension changed to .TSV. If DirectShowSpy is installed, the file also contains information about the filter graph used (esp. the media types).

For example,

D:\>DirectShowFileMediaSamples-Win32.exe “F:\Media\Ленинград — Экспонат.mp4”

A typical command line use is troubleshooting export/transcoding sessions where, on completion, you need textual information about the export to verify the time accuracy of individual samples: start and stop times, gaps, etc.

Interactively one can also achieve the same goal using GraphStudioNext's built-in Analyzer Filter.

Download links

Media Foundation MPEG-4 Property Handler might report incorrect Video Frame Rate

To follow up the previous post about a Media Foundation bug, here is another one, related to the property handler for MPEG-4 files (.MP4) and specifically the PKEY_Video_FrameRate property, which reports the frame rate of a given media file.

This is the object responsible for filling the columns in Explorer; visually the bug might look like this:

Image001

The values of the properties are also accessible programmatically using the IPropertyStore::GetValue API, in which case they are as follows (a small reading sketch follows the list):

  • PKEY_Video_FrameWidth = 1280 (VT_UI4) // 1,280
  • PKEY_Video_FrameHeight = 720 (VT_UI4) // 720
  • PKEY_Video_FrameRate = 1091345 (VT_UI4) // 1,091,345
  • PKEY_Video_Compression = {34363248-0000-0010-8000-00AA00389B71} (VT_LPWSTR) // FourCC H264
  • PKEY_Video_FourCC = 875967048 (VT_UI4) // 875,967,048
  • PKEY_Video_HorizontalAspectRatio = 1 (VT_UI4) // 1
  • PKEY_Video_VerticalAspectRatio = 1 (VT_UI4) // 1
  • PKEY_Video_StreamNumber = 2 (VT_UI4) // 2
  • PKEY_Video_TotalBitrate = 12123288 (VT_UI4) // 12,123,288
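
The sketch below shows how such values can be read with the shell property system API; it is an illustration of the GetValue call rather than the exact tool used here. Pass the MP4 path as the first argument.

#include <windows.h>
#include <shobjidl.h>
#include <propsys.h>
#include <propkey.h>
#include <atlbase.h>
#include <cstdio>
#pragma comment(lib, "Shell32.lib")
#pragma comment(lib, "Ole32.lib")

int wmain(int argc, wchar_t* argv[])
{
    if(argc < 2)
        return 1; // usage: <application> <path to .mp4>
    ATLVERIFY(SUCCEEDED(CoInitializeEx(nullptr, COINIT_MULTITHREADED)));
    CComPtr<IPropertyStore> pPropertyStore;
    ATLVERIFY(SUCCEEDED(SHGetPropertyStoreFromParsingName(argv[1], nullptr, GPS_DEFAULT, IID_PPV_ARGS(&pPropertyStore))));
    PROPVARIANT Value;
    PropVariantInit(&Value);
    ATLVERIFY(SUCCEEDED(pPropertyStore->GetValue(PKEY_Video_FrameRate, &Value)));
    // VT_UI4 value in 1/1000 fps units; a healthy 50 fps file reports 50000 here, the buggy one 1091345
    wprintf(L"PKEY_Video_FrameRate: %u\n", Value.ulVal);
    PropVariantClear(&Value);
    pPropertyStore.Release();
    CoUninitialize();
    return 0;
}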

The actual frame rate of the file is 50 fps. The file plays well in every media player, so the problem is in the reporting itself. Let us look inside the file to possibly identify the cause. The mdhd box for the video track shows the following information:

Image003

Let us do some math now:

  • Time Scale: 10,000,000
  • Duration: 4,501,200,000 (around 7.5 minutes)
  • Video Sample Count: 22,506

This yields the correct rate of 50 fps (frames per scaled duration). However, the duration value itself is quite big and exceeds the 32-bit range. Now let us try this:

22506 / (4501200000 & ((1 << 32) - 1)) * 10000000

And we get 1,091. Bingo! Arithmetic overflow in the property handler then…
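
The overflow is easy to reproduce with the numbers above: truncate the duration to 32 bits and redo the math. The snippet below is just this arithmetic, not the property handler's actual code; the property itself is expressed in 1/1000 fps units, which is in line with the reported 1,091,345.

#include <cstdint>
#include <cstdio>

int main()
{
    const uint64_t nTimeScale = 10000000;   // mdhd Time Scale
    const uint64_t nDuration = 4501200000;  // mdhd Duration, does not fit into 32 bits
    const uint64_t nSampleCount = 22506;    // video sample count
    const uint64_t nCorrectRate = nSampleCount * nTimeScale / nDuration; // 50 fps
    const uint64_t nOverflowedRate = nSampleCount * nTimeScale / static_cast<uint32_t>(nDuration); // 1091 "fps"
    printf("correct %llu fps, with 32-bit truncated duration %llu fps\n",
        static_cast<unsigned long long>(nCorrectRate), static_cast<unsigned long long>(nOverflowedRate));
    return 0;
}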

See also:

Bonus tool: the FilePropertyStore application, which reads the properties of a file you drag and drop onto it; Win32 and x64 versions.

Image002