Media Foundation incorrectly reports resolution for H.265/HEVC video tracks

Another problem (bug) with the Microsoft Media Foundation MPEG-4 Media Source H.265/HEVC handler is that it ignores the conformance_window_flag flag and the associated values from H.265’s seq_parameter_set_rbsp (see the H.265 spec, F.7.3.2.2.1 General sequence parameter set RBSP syntax).

The problem might or might not be limited to fragmented MP4 variants.

It is overall questionable whether it has been a good idea to report video stream properties using parameter set data. This is not necessarily bad, especially if it had been accurately documented in the first place, but apparently it raises certain issues from time to time, like this one:
Media Foundation and Windows Explorer reporting an incorrect video resolution, 2560×1440 instead of 1920×1080. Perhaps every other piece of software and library does not take the trouble to parse the bitstream and simply forwards the values from the tkhd and/or stsd boxes – why not?

Not the case with Media Foundation primitives, which shake the properties out of the bitstreams and their parameter sets. There is no problem as long as the values match one another throughout the file, of course.

A bigger problem, however, is that when parsing the H.265/HEVC bitstream the media source handler fails to take the cropping window into account… Seriously!

conformance_window_flag equal to 1 indicates that the conformance cropping window offset parameters follow next in the SPS. conformance_window_flag equal to 0 indicates that the conformance cropping window offset parameters are not present.

The popular resolution of 1920×1080, when encoded in 16×16 blocks, effectively consists of 120×68 blocks, that is 1088 luma samples in height. The height of 1080 is obtained by cropping the extra eight lines from the top and/or bottom. By ignoring the cropping, Microsoft’s handler misreports the video size as 1920×1088 even though all parts of the video file carry the correct value of 1080.
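
For reference, the math the handler should apply once the SPS fields are parsed out is a one-liner per dimension. Below is a minimal sketch of it; the SpsConformanceWindow struct is hypothetical, but the field names and the formula follow 7.4.3.2.1 of the spec:

// Minimal sketch: computing the display size from the coded size and the
// conformance window, per H.265 7.4.3.2.1; the struct is hypothetical,
// the field names follow the spec
#include <cstdint>

struct SpsConformanceWindow
{
    uint32_t pic_width_in_luma_samples;   // e.g. 1920
    uint32_t pic_height_in_luma_samples;  // e.g. 1088
    uint32_t chroma_format_idc;           // 1 = 4:2:0
    bool conformance_window_flag;
    uint32_t conf_win_left_offset;
    uint32_t conf_win_right_offset;
    uint32_t conf_win_top_offset;
    uint32_t conf_win_bottom_offset;      // e.g. 4, in units of SubHeightC
};

void GetDisplaySize(SpsConformanceWindow const& sps, uint32_t& width, uint32_t& height)
{
    // SubWidthC/SubHeightC per Table 6-1 of the spec
    uint32_t const subWidthC = (sps.chroma_format_idc == 1 || sps.chroma_format_idc == 2) ? 2u : 1u;
    uint32_t const subHeightC = (sps.chroma_format_idc == 1) ? 2u : 1u;
    width = sps.pic_width_in_luma_samples;
    height = sps.pic_height_in_luma_samples;
    if (sps.conformance_window_flag)
    {
        width -= subWidthC * (sps.conf_win_left_offset + sps.conf_win_right_offset);
        height -= subHeightC * (sps.conf_win_top_offset + sps.conf_win_bottom_offset);
    }
    // 1920x1088 with conf_win_bottom_offset of 4 in 4:2:0 yields 1920x1080
}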

1920×1080 HEVC (meaning it does not play in every browser – beware and use Edge)
 MF_MT_MAJOR_TYPE, vValue {73646976-0000-0010-8000-00AA00389B71} (Type VT_CLSID, MFMediaType_Video, FourCC vids)
 MF_MT_SUBTYPE, vValue {43564548-0000-0010-8000-00AA00389B71} (Type VT_CLSID, MFVideoFormat_HEVC, FourCC HEVC)
 MF_MT_VIDEO_PROFILE, vValue 1 (Type VT_UI4)
 MF_MT_VIDEO_LEVEL, vValue 123 (Type VT_UI4)
 MF_MT_FRAME_SIZE, vValue 8246337209408 (Type VT_UI8, 1920x1088)
 MF_MT_INTERLACE_MODE, vValue 7 (Type VT_UI4)
 MF_MT_FRAME_RATE, vValue 65970697666816 (Type VT_UI8, 15360/256, 60.000)
 MF_MT_AVG_BITRATE, vValue 41976 (Type VT_UI4)
 MF_MT_MPEG4_CURRENT_SAMPLE_ENTRY, vValue 0 (Type VT_UI4)
 MF_MT_MPEG4_SAMPLE_DESCRIPTION, vValue 00 00 00 D1 68 76 63 31 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 07 80 04 38 00 00 00 48 00 00 00 48 00 00 00 00 00 01 0B 48 45 56 43 20 43 6F 64 69 6E 67 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 18 FF FF 00 00 00 7B 68 76 63 43 01 01 00 00 00 40 00 B0 00 00 00 00 7B F0 F0 FC FD F8 F8 3C 00 0B 03 A0 00 01 00 17 40 01 0C 01 FF FF… (Type VT_VECTOR | VT_UI1)
 MF_MT_VIDEO_ROTATION, vValue 0 (Type VT_UI4)
 MF_NALU_LENGTH_SET, vValue 1 (Type VT_UI4)
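
The misreported value is easy to observe with a Source Reader over the file. A minimal sketch (error handling omitted; “video.mp4” stands for the file in question):

#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <cstdio>
#include <wrl/client.h>

#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")

using Microsoft::WRL::ComPtr;

int main()
{
    MFStartup(MF_VERSION);
    ComPtr<IMFSourceReader> reader;
    MFCreateSourceReaderFromURL(L"video.mp4", nullptr, &reader);
    ComPtr<IMFMediaType> mediaType;
    reader->GetCurrentMediaType((DWORD) MF_SOURCE_READER_FIRST_VIDEO_STREAM, &mediaType);
    UINT32 width = 0, height = 0;
    MFGetAttributeSize(mediaType.Get(), MF_MT_FRAME_SIZE, &width, &height);
    printf("MF_MT_FRAME_SIZE: %u x %u\n", width, height); // 1920 x 1088 for the file above
    MFShutdown();
    return 0;
}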

CleanPoint markup fun with a fragmented MP4 file and Media Foundation MPEG-4 Source

The MPEG-4 Media Foundation Source stubbornly keeps marking the second video sample with the MFSampleExtension_CleanPoint flag even though nothing suggests that the video frame is an IDR frame.

The actual video frame is a P frame both in terms of MP4 box formatting and contained NAL units (the video is in fact an “infinite GOP” flavor of recording where all frames are P frames except the very first IDR one).

The problem is specific to fragmented MP4 files (and maybe even a subset of those); however, it is pretty much consistent and shows up with both H.264 and H.265/HEVC video.
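
The behavior is easy to see by dumping the attribute off the samples a Source Reader delivers. A minimal sketch (error handling omitted; “fragmented.mp4” is a placeholder for the file in question):

#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <cstdio>
#include <wrl/client.h>

#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")

using Microsoft::WRL::ComPtr;

int main()
{
    MFStartup(MF_VERSION);
    ComPtr<IMFSourceReader> reader;
    MFCreateSourceReaderFromURL(L"fragmented.mp4", nullptr, &reader);
    for (UINT32 sampleIndex = 0; sampleIndex < 8; sampleIndex++)
    {
        DWORD streamIndex, flags;
        LONGLONG time;
        ComPtr<IMFSample> sample;
        reader->ReadSample((DWORD) MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0, &streamIndex, &flags, &time, &sample);
        if (!sample)
            break;
        // Absent attribute means "not a clean point"
        UINT32 cleanPoint = 0;
        sample->GetUINT32(MFSampleExtension_CleanPoint, &cleanPoint);
        printf("sample %u: CleanPoint %u\n", sampleIndex, cleanPoint);
    }
    MFShutdown();
    return 0;
}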

Use of ICodecAPI interface with a video encoder managed by Media Foundation Sink Writer instance

A bump of a StackOverflow post about a Media Foundation design flaw related to video encoding.

Set attributes via ICodecAPI for a H.264 IMFSinkWriter Encoder

I am trying to tweak the attributes of the H.264 encoder created via ActivateObject() by retrieving the ICodecAPI interface to it. Although I do not get errors, my settings are not taken into account. […]

Media Foundation’s Sink Writer is a simplified API with the encoder configuration question slipped away. The fundamental problem here is that you don’t own the encoder MFT and you are accessing it over the writer’s head, and then the behavior of encoders with respect to changing settings after everything is already set up depends on the implementation, which in the encoder’s case is vendor specific and might vary across hardware.

Your more reliable option is to manage the encoding MFT directly and supply the Sink Writer with already encoded video.

Your potential trick to make things work with less effort is to retrieve the encoder’s IMFTransform as well, and to clear and then set back the input/output media types after you are finished with the ICodecAPI update. By nudging the media types you suggest that the encoder re-configure its internals, and it would do so already having your fine tuning in place. Note that this, generally speaking, might have side effects.
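
A minimal sketch of both steps (error handling trimmed; the CODECAPI_AVEncCommonQuality property and the value 70 are picked for illustration only, an actual encoder might not honor them):

#include <mfidl.h>
#include <mftransform.h>
#include <mfreadwrite.h>
#include <icodecapi.h>
#include <codecapi.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

HRESULT TweakEncoder(IMFSinkWriter* writer, DWORD videoStreamIndex)
{
    // Reach the encoder MFT behind the writer's back
    ComPtr<ICodecAPI> codecApi;
    HRESULT hr = writer->GetServiceForStream(videoStreamIndex, GUID_NULL, IID_PPV_ARGS(&codecApi));
    if (FAILED(hr))
        return hr;
    VARIANT value { };
    value.vt = VT_UI4;
    value.ulVal = 70; // hypothetical quality value
    hr = codecApi->SetValue(&CODECAPI_AVEncCommonQuality, &value);
    if (FAILED(hr))
        return hr;
    // The "nudge": clear the media types and set them back so that the
    // encoder re-configures its internals with the updated settings
    ComPtr<IMFTransform> transform;
    hr = writer->GetServiceForStream(videoStreamIndex, GUID_NULL, IID_PPV_ARGS(&transform));
    if (FAILED(hr))
        return hr;
    ComPtr<IMFMediaType> inputType, outputType;
    transform->GetInputCurrentType(0, &inputType);
    transform->GetOutputCurrentType(0, &outputType);
    transform->SetInputType(0, nullptr, 0);   // clear
    transform->SetOutputType(0, nullptr, 0);  // clear
    transform->SetOutputType(0, outputType.Get(), 0); // output first for encoders
    transform->SetInputType(0, inputType.Get(), 0);
    return S_OK;
}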


Source code to fit 80 columns of text

LLVM Coding Standards – Source Code Width:

Write your code to fit within 80 columns of text. This helps those of us who like to print out code and look at your code in an xterm without resizing it.
The longer answer is that there must be some limit to the width of the code in order to reasonably allow developers to have multiple files side-by-side in windows on a modest display. If you are going to pick a width limit, it is somewhat arbitrary but you might as well pick something standard. Going with 90 columns (for example) instead of 80 columns wouldn’t add any significant value and would be detrimental to printing out code. Also many other projects have standardized on 80 columns, so some people have already configured their editors for it (vs something else, like 90 columns).
This is one of many contentious issues in coding standards, but it is not up for debate.

Is there a rule more stupid than wrapping source code lines just because someone might possibly look at the code in an xterm?

So the source consumes less than 25% of the width of a quite ordinary monitor, wasting all the space on the right. At the same time, the source code lines are objectively long and are massively wrapped around.

Wrapping destroys readability of code.

Re-wrapping source code has an obvious negative effect on change tracking.

I, for one, want to see as much of the source code as possible at a time because it helps to have a picture of what is going on. Information at the end of lines is less important, so it is not a big deal even if it goes beyond the right visible margin, but it is important to have as many LINES of code as possible – I would even prefer to skip blank lines and utilize the IDE’s capabilities to collapse comments, functions, regions and scopes. For this reason some developers even rotate monitors into portrait mode – to see more of the source code at a time.

Fitting 80 columns, and having it not even up for debate, is a clearly genius move to keep devs productive. Through continuous irritation.

Video compression in AVerMedia Live Gamer Ultra GC553

“The next generation of game capture is here.” The device addresses the needs of real time capture of a video signal: offering a pass-through HDMI connection, the box provides a video capture sink with a USB 3.1 Type C interface and makes the video signal available to video capture applications via the standard DirectShow and Media Foundation APIs.

I was interested in whether the device implements video compression, H.264 and/or H.265/HEVC, in hardware. The technical specifications include:

• Max Pass-Through Resolutions: 2160p60 HDR / 1440p144 / 1080p240
• Max Record Resolutions: 2160p30 / 1440p60 / 1080p120 / 1080p60 HDR
• Supported Resolutions (Video input): 2160p, 1440p, 1080p, 1080i, 720p, 576p, 480p
• Record Format: MPEG 4 (H.264+AAC) or (H.265+AAC)*

Notes:
*H.265 Compression and HDR are supported by RECentral

So there is a direct mention of video compression, and given the state of the technology and the price of the box it makes sense to have it there. Logitech C930e camera has been offering H.264 video compression onboard for years.

So is it there in the Ultra thing? NO, IT IS NOT. Pathetic…

One could guess this, of course, from a study of the FAQ section covering third party software configuration. The software is clearly expected to use external compression capabilities. However, popular software is also known to not use the latest stuff, so there was a little chance that a hardware codec was still there. I think it would be fair to state right in the technical specification that the product does not offer any encoding capabilities.

The good thing is that the box offers 10-bit video capture up to 2560×1440@30 – there is not much inexpensive hardware capable of doing such a job.

The specification mentions a high rate 1920×1080@120 mode, but I don’t see it among the effectively advertised capabilities.

Also, the video capture capabilities in the Media Foundation API suggest that it is possible to capture into video memory, bypassing the system memory mapping/copy. Even though this is irrelevant to most applications, some newer ones, including those leveraging the UWP video capture API, could take advantage of it (for example, video capture apps running on low power consumption devices).
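
This is also the level at which the effectively advertised capabilities can be double checked: enumerate the capture source and walk its native media types. A minimal sketch (error handling omitted), which is how I would look for the 1920×1080@120 mode:

#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <cstdio>
#include <wrl/client.h>

#pragma comment(lib, "ole32.lib")
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mf.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")

using Microsoft::WRL::ComPtr;

int main()
{
    CoInitializeEx(nullptr, COINIT_MULTITHREADED);
    MFStartup(MF_VERSION);

    // Enumerate video capture devices
    ComPtr<IMFAttributes> attributes;
    MFCreateAttributes(&attributes, 1);
    attributes->SetGUID(MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE, MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_GUID);
    IMFActivate** devices = nullptr;
    UINT32 deviceCount = 0;
    MFEnumDeviceSources(attributes.Get(), &devices, &deviceCount);

    if (deviceCount > 0)
    {
        // First device for brevity; a real application would pick by friendly name
        ComPtr<IMFMediaSource> source;
        devices[0]->ActivateObject(IID_PPV_ARGS(&source));
        ComPtr<IMFSourceReader> reader;
        MFCreateSourceReaderFromMediaSource(source.Get(), nullptr, &reader);

        // Walk the native media types of the first video stream
        for (DWORD typeIndex = 0; ; typeIndex++)
        {
            ComPtr<IMFMediaType> mediaType;
            if (FAILED(reader->GetNativeMediaType((DWORD) MF_SOURCE_READER_FIRST_VIDEO_STREAM, typeIndex, &mediaType)))
                break;
            UINT32 width = 0, height = 0, numerator = 0, denominator = 1;
            MFGetAttributeSize(mediaType.Get(), MF_MT_FRAME_SIZE, &width, &height);
            MFGetAttributeRatio(mediaType.Get(), MF_MT_FRAME_RATE, &numerator, &denominator);
            printf("%u x %u @ %.3f\n", width, height, denominator ? (double) numerator / denominator : 0.0);
        }
    }

    for (UINT32 index = 0; index < deviceCount; index++)
        devices[index]->Release();
    CoTaskMemFree(devices);
    MFShutdown();
    CoUninitialize();
    return 0;
}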

Media Foundation on Raspberry Pi 3 Model B+

The interesting part with live WebM Media Foundation media source I mentioned in the previous post is that the whole thing works great on… Raspberry Pi 3 Model B+ running Windows 10 IoT Core (RaspberryPi 3B+ Technical Preview Build 17661).

Windows 10 IoT has pretty much the same Media Foundation infrastructure as the other Universal Windows Platform environments (Desktop, Xbox, HoloLens), including the core API, primitives, and support in the XAML MediaElement (MediaPlayerElement). There is no DirectX support on the Raspberry Pi 3 Model B+ and video delivery fails; however, this is a sort of known/expected problem with the Technical Preview build. Audio playback is okay.

The picture above is taken from a C# UWP application (that is, the ARM platform) running a MediaPlayerElement control taking a live audio signal from the network over a Windows.Networking.Sockets.MessageWebSocket connection.

A custom WebM live media source (the platform does not have a capable primitive out of the box) forwards the signal to the media element for low latency audio playback. The codec is Opus and, yes, the stock Media Foundation audio decoder MFT decodes the signal just fine.