Media Foundation on Raspberry Pi 3 Model B+

The interesting part of the live WebM Media Foundation media source I mentioned in the previous post is that the whole thing works great on… Raspberry Pi 3 Model B+ running Windows 10 IoT Core (RaspberryPi 3B+ Technical Preview Build 17661).

Windows 10 IoT has much the same Media Foundation infrastructure as other Universal Windows Platform environments (Desktop, Xbox, HoloLens), including the core API, primitives, and support in XAML MediaElement (MediaPlayerElement). There is no DirectX support on Raspberry Pi 3 Model B+ and video delivery fails; however, this is a known and expected problem with the Technical Preview build. Audio playback is okay.

The picture above is taken from a C# UWP application (that’s the ARM platform) running a MediaPlayerElement control that takes a live audio signal from the network over a Windows.Networking.Sockets.MessageWebSocket connection.

A custom WebM live media source (the platform does not have a capable primitive out of the box) forwards the signal to the media element for low latency audio playback. The codec is Opus and, yes, the stock Media Foundation audio decoder MFT decodes the signal just fine.

Parsing live WebM stream is not so easy

A magic transition from E_BUFFER_NOT_FULL to E_FILE_FORMAT_INVALID in the depths of libwebm

Media.dll!mkvparser::Block::Parse(const mkvparser::Cluster * pCluster) Line 7630    C++
Media.dll!mkvparser::BlockGroup::Parse() Line 7579 C++
Media.dll!mkvparser::Cluster::CreateBlockGroup(__int64 start_offset, __int64 size, __int64 discard_padding) Line 7253 C++
Media.dll!mkvparser::Cluster::CreateBlock(__int64 id, __int64 pos, __int64 size, __int64 discard_padding) Line 7154 C++
Media.dll!mkvparser::Cluster::ParseBlockGroup(__int64 payload_size, __int64 & pos, long & len) Line 6724 C++
Media.dll!mkvparser::Cluster::Parse(__int64 & pos, long & len) Line 6381 C++
Media.dll!mkvparser::Cluster::GetNext(const mkvparser::BlockEntry * pCurr, const mkvparser::BlockEntry * & pNext) Line 7369 C++
Media.dll!WebmLiveMediaSource::HandleData(std::vector,std::allocator > > & AsyncResultVector) Line 757 C++
Media.dll!WebmLiveMediaSource::ReadInvoke(AsyncCallbackT * AsyncCallback, IMFAsyncResult * AsyncResult) Line 1176 C++
Media.dll!AsyncCallbackT::Invoke(IMFAsyncResult * AsyncResult) Line 2344 C++
RTWorkQ.dll!CSerialWorkQueue::QueueItem::ExecuteWorkItem() Unknown
RTWorkQ.dll!CSerialWorkQueue::QueueItem::OnWorkItemAsyncCallback::Invoke() Unknown
RTWorkQ.dll!ThreadPoolWorkCallback() Unknown
ntdll.dll!TppWorkpExecuteCallback() Unknown
ntdll.dll!TppWorkerThread() Unknown
kernel32.dll!BaseThreadInitThunk() Unknown
ntdll.dll!RtlUserThreadStart() Unknown

The problem here is that the stream is live and becomes available in increments, in ultra low latency consumption mode. The libwebm code structure does not suggest it is sufficiently robust for such processing (or maybe it’s good and it’s just one bug? who can tell), even though it apparently has multiple code paths added for live signal. There is no problem parsing a complete file, of course, and even a retry from the same point succeeds once new data is appended.

It looks like a reasonable workaround here is to check whether we are close to the edge of the stream and temporarily ignore errors like this.
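A minimal sketch of that workaround, not the code from my media source. The status names and values, the ParseAction enum, ClassifyParseResult and the edgeThreshold parameter are all hypothetical stand-ins introduced for illustration; the real parser codes would come from libwebm.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the parser status codes mentioned above.
// The numeric values are illustrative placeholders.
enum ParseStatus : long {
    kSuccess = 0,
    kFileFormatInvalid = -2,
    kBufferNotFull = -3,
};

enum class ParseAction { Continue, WaitForData, Fail };

// Decide how to treat a parser result given how close the failing position
// is to the end of the data received so far.
//   position      - stream offset where the parser stopped
//   availableSize - total number of bytes received so far
//   edgeThreshold - assumed tolerance, in bytes, for "close to the edge"
inline ParseAction ClassifyParseResult(long status, std::int64_t position,
                                       std::int64_t availableSize,
                                       std::int64_t edgeThreshold)
{
    if (status >= 0)
        return ParseAction::Continue;
    if (status == kBufferNotFull)
        return ParseAction::WaitForData;
    const std::int64_t remaining = availableSize - position;
    if (status == kFileFormatInvalid && remaining <= edgeThreshold) {
        // Likely an element cut short at the live edge: retry from the same
        // point once more data has been appended.
        return ParseAction::WaitForData;
    }
    return ParseAction::Fail; // an error well inside received data is real
}
```

The threshold is a judgment call: too small and the spurious error still gets through, too large and a genuinely broken stream takes longer to be reported.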

Library parsing performance/efficiency is also a bit questionable. The library is not capable of processing incremental reads via mkvparser::IMkvReader. Instead it keeps stepping back throughout the parsing process, and the live signal source has to keep some of the already processed data because it can be requested once again…

Apparently UWP as a platform has code capable of processing this type of data reliably, as MediaElement has built-in support for the format in Media Source Extensions (MSE) mode. However, this implementation is limited to internal consumers.
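One way to cope with the re-reads is a sliding window over the received bytes that retains a margin behind the parsed position. The sketch below is an assumption about how such a buffer could look, not my actual implementation: LiveWindowBuffer, its methods, its return codes and the keepBehind margin are hypothetical names, and the class does not derive from mkvparser::IMkvReader so that it stays self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Sliding window over a live byte stream addressed by absolute positions.
class LiveWindowBuffer {
public:
    explicit LiveWindowBuffer(std::size_t keepBehind) : m_keepBehind(keepBehind) {}

    // Append newly received bytes at the end of the stream.
    void Append(const unsigned char* data, std::size_t size) {
        m_data.insert(m_data.end(), data, data + size);
    }

    // Copy `size` bytes starting at absolute `position` into `buffer`.
    // Returns 0 on success, 1 if the range is not received yet (retry later),
    // -1 if the range starts before the retained window.
    int Read(std::int64_t position, std::size_t size, unsigned char* buffer) const {
        if (position < m_base)
            return -1;
        if (position + static_cast<std::int64_t>(size) > End())
            return 1;
        const auto first = m_data.begin() + static_cast<std::ptrdiff_t>(position - m_base);
        std::copy(first, first + static_cast<std::ptrdiff_t>(size), buffer);
        return 0;
    }

    // Discard data more than `keepBehind` bytes behind `parsedPosition`,
    // so the parser can still step back a little.
    void Trim(std::int64_t parsedPosition) {
        const std::int64_t keepFrom =
            std::min(parsedPosition - static_cast<std::int64_t>(m_keepBehind), End());
        if (keepFrom <= m_base)
            return;
        const auto drop = static_cast<std::ptrdiff_t>(keepFrom - m_base);
        m_data.erase(m_data.begin(), m_data.begin() + drop);
        m_base = keepFrom;
    }

    std::int64_t Base() const { return m_base; } // first retained position
    std::int64_t End() const { return m_base + static_cast<std::int64_t>(m_data.size()); }

private:
    std::size_t m_keepBehind;
    std::int64_t m_base = 0;          // absolute position of m_data.front()
    std::deque<unsigned char> m_data; // retained bytes
};
```

How large the margin has to be depends on how far back the parser actually steps, which I would measure rather than guess.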

DirectShow VMR-7 bug in Windows 10

The DirectShow Video Mixing Renderer (VMR-7) filter exhibits a (regression?) bug on Windows 10 systems. When aspect ratio preservation is enabled in VMR_ARMODE_LETTER_BOX mode, which quite often makes sense as the default mode, the letterboxing does not work as expected.

The problem is easy to reproduce with a well known DShowPlayerSDK sample application, with an edit enforcing VMR-7 mode. Once video is started, just resize the window and the parts not covered by video will not be erased as expected.

Apparently this worked well earlier.

UpdateVersionInfoGit: Multiple references/hashes

Some time ago I shared an application which I have been using to embed a git reference into binary resources, especially as a post-build event in an automated manner: Embedding a Git reference at build time.

This time I needed a small amendment related to the use of a git repository as a sub-module of another repository. To make troubleshooting easier, when a project is built as part of a bigger build through a sub-module repository reference, the git details of both the repository and its parent can be embedded into resources.

"$(ProjectDir)..\_Bin\Third Party\UpdateVersionInfoGit-$(PlatformName).exe" path "$(ProjectDir).." path "$(ProjectDir)..\.." binary "$(TargetPath)"

The utility accepts multiple path arguments, goes over all of them and concatenates the “git log” output. When multiple paths are given, it is okay for some of them to be invalid or unrelated to git repositories.

Infrared Camera in Media Foundation

Surface Pro (5th Gen) infrared camera streamed into Chrome browser in H.264 encoding over WebSocket connection

The screenshot above shows the Surface Pro tablet’s infrared camera (known as “Microsoft IR Camera Front” on the device) captured live, encoded and streamed (everything is hosted by a Microsoft Media Foundation Media Session up to this point) over the network using WebSockets into Chrome’s HTML5 video tag by means of Media Source Extensions (MSE).

Why? Because why not.

Unfortunately, Microsoft did not publish or document an API to access infrared and depth (time-of-flight) cameras so that traditional applications could use the hardware capabilities. Nevertheless, the functionality is available in Universal Windows Platform (UWP), see Windows.Media.Capture.Frames and friends.

The UWP implementation is apparently using Media Foundation behind the scenes, so the functionality could certainly be published for desktop applications as well. Another interesting thing is that my [undocumented] way to access the device seems to bypass the frame server and talk to the device directly, including video.

It does not look like Microsoft is planning to extend visibility of these new features to the desktop Media Foundation API, since they keep adding new features without exposing them for public use outside UWP. The UWP API itself is eclectic and I can’t imagine how one could get a good understanding of it without having a good grip on the underlying API layers.

Media Foundation MP4 Media Source gets a bit too tired when doing too much work

It appears there is a sort of limitation (read: “a bug”) in the Media Foundation MPEG-4 File Source implementation when it comes to reading long fragmented MP4 files.

When the respective media source is used to read a file (for which, by the way, it does not offer seeking), the source issues MF_SOURCE_READERF_ENDOFSTREAM before reaching the actual end of file.

When some software sees a full hour of video in the file…

… the Media Foundation primitive, after reading frame 00:58:35.1833333, issues an “oh gimme a break” event and reports end of stream.

NVIDIA Video Codec SDK encoder initialization memory leak

It appears that re-initializing an encoding session with NVIDIA Video Codec SDK produces, or might produce, an unexpected memory leak.

So, how does it work exactly?

Status = m_ApiFunctionList.nvEncInitializeEncoder(m_Encoder, &InitializeParams);
assert(Status == NV_ENC_SUCCESS);
// NOTE: Another nvEncInitializeEncoder call
Status = m_ApiFunctionList.nvEncInitializeEncoder(m_Encoder, &InitializeParams);
assert(Status == NV_ENC_SUCCESS); // Still success
Status = m_ApiFunctionList.nvEncDestroyEncoder(m_Encoder);
assert(Status == NV_ENC_SUCCESS);

The root cause of the problem is the second nvEncInitializeEncoder call. Alright, this might not be exactly how the API is designed to work, but the returned statuses all indicate success, so it is a bit hard to justify the leak by saying that a second initialization call was not expected in the first place. Apparently the implementation overwrites internally allocated resources without properly releasing or reusing them. And without triggering any warning of sorts.

Another part of the problem is the eclectic design of the API in the first place. You open a “session” and obtain an “encoder” as a result. Then you initialize the “encoder” and when you are finished you destroy the “encoder”. Do you destroy the “session”? Oh no, you don’t have any session at all, except that the API opening a “session” actually opens an “encoder”.

So when I get into a situation where I want to initialize the encoder and it is already initialized, what I do is destroy the existing “encoder”, open a new “session”, and then initialize the session-encoder once again with the initialization parameters.
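The destroy, reopen, initialize sequence can be wrapped so that a repeated initialization never reaches an already initialized encoder. This is a sketch under stated assumptions: EncoderSession and FakeEncoderApi are hypothetical names, and the fake API reduces the SDK entry points to simplified signatures (the real nvEncOpenEncodeSessionEx, nvEncInitializeEncoder and nvEncDestroyEncoder take SDK parameter structures) so the example builds without the SDK headers.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the SDK function list. It only
// counts sessions so the behavior of the wrapper can be observed.
struct FakeEncoderApi {
    int openSessions = 0; // sessions currently open
    int totalOpened = 0;  // sessions opened over the lifetime
    int Open(void** encoder) { ++openSessions; ++totalOpened; *encoder = this; return 0; }
    int Initialize(void* /*encoder*/, int /*params*/) { return 0; }
    int Destroy(void* /*encoder*/) { --openSessions; return 0; }
};

// Owns at most one encoder and always re-initializes on a fresh session.
template <typename Api>
class EncoderSession {
public:
    explicit EncoderSession(Api& api) : m_api(api) {}
    ~EncoderSession() { Close(); }
    EncoderSession(const EncoderSession&) = delete;
    EncoderSession& operator=(const EncoderSession&) = delete;

    // Initialize or re-initialize: destroy any existing encoder first,
    // then open a new session and initialize it.
    bool Initialize(int params) {
        Close();
        if (m_api.Open(&m_encoder) != 0) {
            m_encoder = nullptr;
            return false;
        }
        if (m_api.Initialize(m_encoder, params) != 0) {
            Close();
            return false;
        }
        return true;
    }

    // Destroy the current encoder, if any.
    void Close() {
        if (m_encoder) {
            m_api.Destroy(m_encoder);
            m_encoder = nullptr;
        }
    }

private:
    Api& m_api;
    void* m_encoder = nullptr;
};
```

With the real SDK, the template argument would be a thin adapter over the function list, and opening a new session costs more than a plain re-initialization would; that is the price of avoiding the leak.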