Demo: Web camera video with MPEG-DASH live broadcasting

A new entry in the series of demonstrations of what one can squeeze out of the Windows Media Foundation Capture Engine API.

This video camera capture demonstration application features a built-in MPEG-DASH (Dynamic Adaptive Streaming over HTTP) server. The concept is straightforward: during video capture, the application takes the video feed and compresses it into H.264/AVC using GPU hardware-assisted encoding. It then retains approximately two minutes of data in memory and generates an MPEG-DASH-compatible view of this data. The view follows the dynamic manifest format specified by ISO/IEC 23009-1. The entire system is integrated with the HTTP Server API and accessible over the network.
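As a rough illustration of the hardware-assisted encoding side (a sketch, not the application's actual code), one typical way to pick up a GPU H.264 encoder is to enumerate hardware encoder MFTs; the NV12 input format here is an assumption:

#include <mfapi.h>
#include <mftransform.h>
#include <mferror.h>
#include <wil/com.h>
#include <wil/result.h>

#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfuuid.lib")

// Sketch: locate a hardware (GPU) H.264 encoder MFT; assumes MFStartup has already been called
wil::com_ptr<IMFTransform> CreateHardwareH264Encoder()
{
	MFT_REGISTER_TYPE_INFO InputType { MFMediaType_Video, MFVideoFormat_NV12 }; // NV12 input is an assumption
	MFT_REGISTER_TYPE_INFO OutputType { MFMediaType_Video, MFVideoFormat_H264 };
	IMFActivate** Activates = nullptr;
	UINT32 ActivateCount = 0;
	THROW_IF_FAILED(MFTEnumEx(MFT_CATEGORY_VIDEO_ENCODER, MFT_ENUM_FLAG_HARDWARE | MFT_ENUM_FLAG_SORTANDFILTER, &InputType, &OutputType, &Activates, &ActivateCount));
	THROW_HR_IF(MF_E_NOT_FOUND, ActivateCount == 0);
	wil::com_ptr<IMFTransform> Transform;
	THROW_IF_FAILED(Activates[0]->ActivateObject(IID_PPV_ARGS(Transform.put())));
	for(UINT32 Index = 0; Index < ActivateCount; Index++)
		Activates[Index]->Release();
	CoTaskMemFree(Activates);
	return Transform;
}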

Since this is pretty standard streaming media (just without the adaptive bitrate capability: the broadcast is done in a single quality), the signal can be played back with something like Google's Shaka Player. As the application keeps the last two minutes of data, you can rewind the web player back to see yourself in the past… And then fast forward yourself into the future once again.

Just Windows platform APIs, Microsoft Windows Media Foundation and C++ code; the only external library is Windows Implementation Libraries (WIL), if that even classifies as an external library. No FFmpeg, no GStreamer and the like. No curl, no libhttpserver or any other web server. That is, as simple as this:

// Formats a duration as an ISO 8601 xs:duration value, e.g. PT2.00S (the 1E7 divisor converts 100 ns units to seconds)
auto const ToSeconds = [] (NanoDuration const& Value, double Multiplier = 1.0) -> std::wstring
{
	return Format(L"PT%.2fS", Multiplier * Value.count() / 1E7);
};

Element Mpd(L"MPD", // ISO/IEC 23009-1:2019; Table 3 — Semantics of MPD element; 5.3.1.2 Semantics
{
	{ L"xmlns", L"urn:mpeg:dash:schema:mpd:2011" },
	//{ L"xmlns", L"xsi", L"http://www.w3.org/2001/XMLSchema-instance" },
	//{ L"xsi", L"schemaLocation", L"urn:mpeg:dash:schema:mpd:2011 http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-DASH_schema_files/DASH-MPD.xsd" },
	{ L"profiles", L"urn:mpeg:dash:profile:isoff-live:2011" },
	{ L"type", L"dynamic" },
	{ L"maxSegmentDuration", ToSeconds(2s) },
	{ L"minBufferTime", ToSeconds(4s) },
	{ L"minimumUpdatePeriod", ToSeconds(1s) },
	{ L"suggestedPresentationDelay", ToSeconds(3s) },
	{ L"availabilityStartTime", FormatDateTime(BaseLiveTime) },
	//{ L"publishTime", FormatDateTime(BaseLiveTime) },
});

The video is compressed once as the capture process goes, and the application is integrated with the native HTTP web server, so the whole thing is pretty scalable: connect multiple clients and this is fine, since the application mostly provides a view into the H.264/AVC data temporarily kept in memory within the retention window. For the same reason, resource consumption of the solution is what you would expect it to be. The playback clients do not even have to play the same historical part of the content:

So okay well, this demo opens a path to next steps at some point: audio, DRM, an HLS version, low latency variants such as LL-HLS, MPEG-DASH segment sequence representations.

So just have the webcam video capture application running, and open the MPEG-DASH manifest http://localhost/MediaFoundationCameraToolkit/Capture/manifest.mpd with https://shaka-player-demo.appspot.com/ using the “Custom Content” option.

Note that the application requires elevated (administrative) access in order to use the HTTP Server API capabilities (AFAIR it is possible to do it another way, but you don’t need that this time).
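For context, this is roughly what the HTTP Server API (http.sys) setup looks like; a minimal sketch rather than the application's actual code, with the URL registration being the part that wants elevation or a pre-created URL reservation:

#include <windows.h>
#include <http.h>
#include <wil/result.h>

#pragma comment(lib, "httpapi.lib")

// Sketch: bring up an HTTP Server API endpoint; the URL matches the manifest location used above
void StartHttpServer()
{
	HTTPAPI_VERSION const Version = HTTPAPI_VERSION_2;
	THROW_IF_WIN32_ERROR(HttpInitialize(Version, HTTP_INITIALIZE_SERVER, nullptr));
	HTTP_SERVER_SESSION_ID ServerSessionId;
	THROW_IF_WIN32_ERROR(HttpCreateServerSession(Version, &ServerSessionId, 0));
	HTTP_URL_GROUP_ID UrlGroupId;
	THROW_IF_WIN32_ERROR(HttpCreateUrlGroup(ServerSessionId, &UrlGroupId, 0));
	HANDLE RequestQueueHandle;
	THROW_IF_WIN32_ERROR(HttpCreateRequestQueue(Version, nullptr, nullptr, 0, &RequestQueueHandle));
	HTTP_BINDING_INFO BindingInfo { };
	BindingInfo.Flags.Present = 1;
	BindingInfo.RequestQueueHandle = RequestQueueHandle;
	THROW_IF_WIN32_ERROR(HttpSetUrlGroupProperty(UrlGroupId, HttpServerBindingProperty, &BindingInfo, sizeof(BindingInfo)));
	// Registering in the http://+:80/... namespace is what requires elevation (or a prior "netsh http add urlacl" reservation)
	THROW_IF_WIN32_ERROR(HttpAddUrlToUrlGroup(UrlGroupId, L"http://+:80/MediaFoundationCameraToolkit/Capture/", 0, 0));
	// ...then loop on HttpReceiveHttpRequest / HttpSendHttpResponse, serving manifest.mpd and media segments from memory
}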

The application, while doing video capture, rendering the 1920×1080@30 stream to the user interface, teeing the signal into additional processing, doing hardware assisted video encoding, packaging, and serving MPEG-DASH content, is not taking too many resources: the footprint is just about what makes good sense.

Oh, and one can also use standard C# tooling to display this sort of video signal; here we go with the standard PlayReady C# sample with a XAML MediaElement inside:

Demo: Live camera video with Microsoft’s Video Stabilization Effect MFT

In continuation of the camera demos, here is another build, this time with Microsoft’s Video Stabilization MFT.

In the context of the Capture Engine application and use of the MFT as an effect, it is used in its default configuration, in particular without explicit low latency mode. This creates a noticeable delay in video transmission. Still, it is what it is – the effect passes the video feed through.
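For reference, opting in to low latency would typically look like the sketch below; CaptureSource and StabilizationTransform stand for already obtained instances, and whether the Video Stabilization MFT actually honors MF_LOW_LATENCY is an assumption:

#include <mfapi.h>
#include <mfcaptureengine.h>
#include <wil/com.h>
#include <wil/result.h>

// Sketch: request low latency processing from the effect MFT and attach it to the preview stream
void AddLowLatencyEffect(IMFCaptureSource* CaptureSource, IMFTransform* StabilizationTransform)
{
	wil::com_ptr<IMFAttributes> Attributes;
	if(SUCCEEDED(StabilizationTransform->GetAttributes(Attributes.put())) && Attributes)
		THROW_IF_FAILED(Attributes->SetUINT32(MF_LOW_LATENCY, 1)); // assumption: the MFT honors this attribute
	THROW_IF_FAILED(CaptureSource->AddEffect(MF_CAPTURE_ENGINE_PREFERRED_SOURCE_STREAM_FOR_VIDEO_PREVIEW, StabilizationTransform));
}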

Still, it is hardware accelerated and is apparently well suited for real-time video processing.

Demo: Live OpenCV Non-local Means Denoising

On occasion I hooked up OpenCV Non-local Means Denoising to the Media Foundation camera capture test application already mentioned in previous posts. It is the regular, non-CUDA implementation wrapped into a Media Foundation Transform so that it can be plugged directly into the Media Foundation Capture Engine.

The original implementation comes from non-real-time denoising, and hence the question is how good it is for real-time video. Unfortunately, it appears to be rather slow. Essentially, it is a Media Foundation wrapper over…

fastNlMeansDenoisingColoredMulti(..., ..., 1, 3, ...);

… with a minimal temporal window of three frames sliding one by one. The OpenCV implementation could obviously be better too, but probably the most efficient improvement would be to take advantage of the CUDA variant.
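For illustration, the sliding three-frame window amounts to roughly the following (a sketch rather than the actual wrapper code; the BGR frame format is an assumption, only the window indices match the call above, and the filtering strength parameters are left at OpenCV defaults):

#include <deque>
#include <vector>
#include <opencv2/photo.hpp>

// Sketch: keep a sliding window of the last three BGR frames and denoise the middle one
std::deque<cv::Mat> FrameWindow;

cv::Mat PushAndDenoise(cv::Mat const& Frame)
{
	FrameWindow.push_back(Frame.clone());
	if(FrameWindow.size() > 3)
		FrameWindow.pop_front();
	if(FrameWindow.size() < 3)
		return Frame; // not enough history yet, pass through
	std::vector<cv::Mat> const Frames(FrameWindow.begin(), FrameWindow.end());
	cv::Mat DenoisedFrame;
	// imgToDenoiseIndex = 1 (the middle frame), temporalWindowSize = 3; remaining parameters are OpenCV defaults
	cv::fastNlMeansDenoisingColoredMulti(Frames, DenoisedFrame, 1, 3);
	return DenoisedFrame;
}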

Well, anyway, the demo is here, to see how slow it is for live…

Demo: Direct3D 11 aware SuperResolution scaler based on AMD framework (updated)

While on it, a quick update to the previous post/application:

  • added dynamic media type change support to upscaler effect MFT
  • the video is automatically upscaled to map 1:1 to effective pixels & window size, as you resize the application window
  • added high DPI awareness markup/support to the application
  • removed FSR 1.1 as it would fail if upscaling falls outside supported range
  • default resolution is changed to 1280×720 for better UX out of the box

Demo: Direct3D 11 aware SuperResolution scaler based on AMD framework

A variant of the previous CaptureEngineVideoCapture demo application, this one featuring the AMD Advanced Media Framework (AMF) SuperResolution scaler for video.

It is basically a live video camera application started in a low resolution mode, and it enables you to switch between GPU-implemented (OpenCL probably?) realtime upscaling modes.

The AMD scaler is wrapped into a Media Foundation Transform and is applied to the Microsoft Media Foundation Capture Engine as a video effect.

Note that the SuperResolution 1.1 mode assumes upscaling in the range of 1.1 to 2.0 only and might not work if you select a video resolution & target scaling resolution outside of that range.


The scaler is fast and fully GPU backed, perfect for real time, however the effect is not that obvious. Still, it’s easy to see for yourself: just run it and that’s it. Maybe next time I will do a side by side comparison, and then also a DNN backed Media Foundation Transform to possibly produce more expressive video output.

Also, it would obviously help to dynamically change the output resolution to be 1:1 with the window size… That is also left for the next experiment.

Demo: Direct3D 11 GPU Windows 11 Virtual Camera

Moving on with Windows Camera experiments.

This time it is a Windows 11 Virtual Camera, a true one like this and this, and not this, which should be totally archived even though people still keep trying.

So what have we got this time? The application derived from the old CaptureEngineVideoCapture sample still features the IMFCaptureEngine API. However, it also temporarily installs a virtual camera based on the beautiful Shadertoy shader “Entangled Illumination” (you can still find the HLSL variant in the resources of the executable downloadable from the bottom of the post).

The image is synthesized on the GPU (hardcoded as 1920×1080, announced as 30 fps but running at the pace of the frame requester) and is then delivered to the Frame Server service for distribution.
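Registration of such a camera goes through the Windows 11 virtual camera API; a minimal sketch of that part, where the friendly name and the media source CLSID string are hypothetical placeholders for the application’s own media source implementation:

#include <mfvirtualcamera.h>
#include <wil/com.h>
#include <wil/result.h>

#pragma comment(lib, "mfsensorgroup.lib")

// Sketch: register a session-scoped software virtual camera with the Frame Server;
// SourceClsidString is a placeholder for the CLSID of the application's custom media source
wil::com_ptr<IMFVirtualCamera> RegisterVirtualCamera()
{
	wchar_t const* FriendlyName = L"Demo Virtual Camera"; // hypothetical
	wchar_t const* SourceClsidString = L"{00000000-0000-0000-0000-000000000000}"; // placeholder
	wil::com_ptr<IMFVirtualCamera> VirtualCamera;
	THROW_IF_FAILED(MFCreateVirtualCamera(MFVirtualCameraType_SoftwareCameraSource,
		MFVirtualCameraLifetime_Session, MFVirtualCameraAccess_CurrentUser,
		FriendlyName, SourceClsidString, nullptr, 0, VirtualCamera.put()));
	THROW_IF_FAILED(VirtualCamera->Start(nullptr)); // the camera becomes visible to other applications
	return VirtualCamera;
}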

Oh my, how poorly this Microsoft Windows Media Foundation stuff is documented! In fact, this demo is proof that despite the quality of the documentation of this technology, an experienced stranger can still take advantage of it.

From the very beginning this video travels through Windows subsystems and processes as GPU data (video memory, D3D11 textures), from synthesis to display composition, without a single video/system memory crossing.
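To give an idea of how a frame stays in video memory along the way, below is the usual pattern for handing a D3D11 texture over to Media Foundation as a sample (a sketch under assumptions, not the application’s actual code; Texture and SampleTime are hypothetical inputs):

#include <d3d11.h>
#include <mfapi.h>
#include <wil/com.h>
#include <wil/result.h>

#pragma comment(lib, "mfplat.lib")

// Sketch: wrap a rendered D3D11 texture into an IMFSample without copying it to system memory
wil::com_ptr<IMFSample> WrapTexture(ID3D11Texture2D* Texture, LONGLONG SampleTime)
{
	wil::com_ptr<IMFMediaBuffer> MediaBuffer;
	THROW_IF_FAILED(MFCreateDXGISurfaceBuffer(__uuidof(ID3D11Texture2D), Texture, 0, FALSE, MediaBuffer.put()));
	wil::com_ptr<IMFSample> Sample;
	THROW_IF_FAILED(MFCreateSample(Sample.put()));
	THROW_IF_FAILED(Sample->AddBuffer(MediaBuffer.get()));
	THROW_IF_FAILED(Sample->SetSampleTime(SampleTime)); // 100 ns units
	THROW_IF_FAILED(Sample->SetSampleDuration(333333)); // about 1/30 s, matching the announced 30 fps
	return Sample;
}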

You can stop preview in the test application (F4 or menu), and while the application is running the virtual video camera is visible to other applications, e.g. Windows Camera app or Skype:
