Demo: Live OpenCV Non-local Means Denoising

While at it, I hooked OpenCV Non-local Means Denoising up to the Media Foundation camera capture test application already mentioned in previous posts. It is the regular, non-CUDA implementation wrapped into a Media Foundation Transform so that it can be plugged directly into the Media Foundation Capture Engine.

The original implementation comes from a non-realtime denoiser, hence the question is how good it is for real-time video. Unfortunately, it appears to be rather slow. Essentially, it is a Media Foundation wrapper over…

fastNlMeansDenoisingColoredMulti(..., ..., 1, 3, ...);

… with a minimal temporal window of three frames, sliding one frame at a time. The OpenCV implementation could obviously be better too, but probably the most effective improvement would be to take advantage of the CUDA variant.
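To illustrate the sliding window, here is a minimal sketch that does not use OpenCV at all. TemporalWindow is a hypothetical helper, not the demo's code. It assumes the "1, 3" arguments in the call above are the index of the frame to denoise and the temporal window size, which is how I read the OpenCV signature. One consequence, if that reading is right: the denoised output always trails the newest captured frame by one frame.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>

// Hypothetical helper (not from the demo): keeps the most recent `Capacity`
// frames in arrival order.
template <typename Frame, std::size_t Capacity = 3>
class TemporalWindow
{
    static_assert(Capacity % 2 == 1, "the window needs a single center frame");

public:
    // Adds a new frame; once the window is full the oldest frame is dropped,
    // so the window slides by exactly one frame per call.
    void Push(Frame frame)
    {
        if (frames_.size() == Capacity)
            frames_.pop_front();
        frames_.push_back(std::move(frame));
    }

    // True once enough frames have arrived to denoise the center frame.
    bool Ready() const { return frames_.size() == Capacity; }

    // Index of the frame to denoise: 1 for a window of 3.
    static constexpr std::size_t CenterIndex() { return Capacity / 2; }

    // The frame that a multi-frame denoiser would produce output for.
    const Frame& Center() const
    {
        assert(Ready());
        return frames_[CenterIndex()];
    }

    const std::deque<Frame>& Frames() const { return frames_; }

private:
    std::deque<Frame> frames_;
};
```

In a real transform, Frame would be an image type, and Frames() plus CenterIndex() would feed the denoising call once Ready() returns true.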

Well, anyway, the demo is here, to see how slow it is for live…

Demo: Direct3D 11 aware SuperResolution scaler based on AMD framework (updated)

While on it, a quick update to the previous post/application:

  • added dynamic media type change support to upscaler effect MFT
  • the video is automatically upscaled to map 1:1 to effective pixels & window size, as you resize the application window
  • added high DPI awareness markup/support to the application
  • removed FSR 1.1 as it would fail if upscaling falls outside supported range
  • default resolution is changed to 1280×720 for better UX out of the box
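
The 1:1 mapping to window size boils down to picking the largest output size that keeps the video aspect ratio and still fits the client area. A minimal sketch of that arithmetic follows; FrameSize and FitToClient are hypothetical names, and the sketch assumes both sizes are in physical pixels (which is what DPI awareness gives you) and truncates fractional pixels.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical size type; 64-bit fields keep the cross-multiplication safe.
struct FrameSize
{
    std::int64_t width;
    std::int64_t height;
};

// Largest size with the video's aspect ratio that fits inside the client area.
// Fractional pixels are truncated.
FrameSize FitToClient(FrameSize video, FrameSize client)
{
    assert(video.width > 0 && video.height > 0);
    // Compare aspect ratios without division: client is "narrower" than video
    // when client.width / client.height <= video.width / video.height.
    if (client.width * video.height <= client.height * video.width)
        return { client.width, client.width * video.height / video.width };  // width-limited
    return { client.height * video.width / video.height, client.height };    // height-limited
}
```

For example, a 1280×720 source in a 1920×1200 client area gives 1920×1080, so the upscaler output can be shown without any further stretching.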

Demo: Direct3D 11 aware SuperResolution scaler based on AMD framework

A variant of previous CaptureEngineVideoCapture demo application which features AMD Advanced Media Framework SuperResolution scaler for video.

It is basically a live video camera application started in low resolution mode, and it enables you to switch between GPU (OpenCL probably?) implemented realtime upscaling modes.

AMD scaler is wrapped into Media Foundation Transform and is applied to Microsoft Media Foundation Capture Engine as a video effect.

Note that SuperResolution 1.1 mode assumes upscaling in range 1.1 to 2.0 only and might not work if you select video resolution & target scaling resolution outside of the range.
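
A quick way to check a resolution pair against that range is sketched below. IsScaleInRange is a hypothetical helper, and it assumes the 1.1 to 2.0 limit applies to the ratio along each axis independently; the AMD framework may define the limit differently.

```cpp
#include <cassert>

// Hypothetical check. Assumption: the limit applies to the scale factor along
// each axis, with both bounds inclusive.
bool IsScaleInRange(int sourceWidth, int sourceHeight, int targetWidth, int targetHeight,
                    double minScale = 1.1, double maxScale = 2.0)
{
    assert(sourceWidth > 0 && sourceHeight > 0);
    const double scaleX = static_cast<double>(targetWidth) / sourceWidth;
    const double scaleY = static_cast<double>(targetHeight) / sourceHeight;
    return scaleX >= minScale && scaleX <= maxScale &&
           scaleY >= minScale && scaleY <= maxScale;
}
```

Under that assumption 640×360 to 1280×720 (a factor of 2.0) passes, while 640×360 to 1920×1080 (a factor of 3.0) does not.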


The scaler is fast and fully GPU backed, perfect for real time; however, the effect is not that obvious. Still, it’s easy to see for yourself: just run it and that’s it. Maybe next time I will do a side-by-side comparison, and then also a DNN backed Media Foundation Transform to possibly get more expressive video output.

Also obviously it would help to dynamically change output resolution to be 1:1 with window size… It is also for the next experiment.

Demo: Direct3D 11 GPU Windows 11 Virtual Camera

Moving on with Windows Camera experiments.

This time it is a Windows 11 Virtual Camera, a true one like this and this, and not this, which should be totally archived even though people still keep on trying.

So what have we got this time? The application derived from the old CaptureEngineVideoCapture sample still features the IMFCaptureEngine API. However, it also temporarily installs a virtual camera based on the beautiful Shadertoy shader “Entangled Illumination” (you can still find the HLSL variant in the resources of the executable downloadable from the bottom of the post).

The image is synthesized on GPU (hardcoded as 1920×1080, announced as 30 fps but runs at the pace of frame requester) and is then delivered to Frame Server service for distribution.

Oh my, how poorly this Microsoft Windows Media Foundation stuff is documented! In fact, this demo is proof that, despite the quality of the documentation, an experienced outsider can still take advantage of this technology.

From the very beginning this video travels through Windows subsystems and processes as GPU (video memory, D3D11 texture) data, from the synthesis to display composition, not a single video/system memory crossing.

You can stop preview in the test application (F4 or menu), and while the application is running the virtual video camera is visible to other applications, e.g. Windows Camera app or Skype:

Microsoft’s MPEG-DASH client implementation inaccurately handles SSR flavor of streaming media

Recent additions to the MPEG-DASH specification (ISO/IEC 23009-1 5th ed., AMD3; also SCTE 214-6) offer new ways to implement low-latency, low-delay playback using so-called Segment Sequence Representations (SSR).

MPEG-DASH manifests advertise this flavor of data by including a respective EssentialProperty.

Before Shaka Player Demo added support for this technology, they ignored the respective adaptation set as “unknown” and “unsupported”, which seems to be the right way to address something you do not understand.
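
The rule behind that behavior, as I understand the DASH specification, is that a client which does not recognize the scheme of an EssentialProperty descriptor should ignore the element carrying it. A minimal sketch of such a filter follows; AdaptationSet and CanProcess are hypothetical, heavily simplified names, not any player's real API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of an adaptation set: only the scheme URIs
// of its EssentialProperty descriptors are kept.
struct AdaptationSet
{
    std::vector<std::string> essentialPropertySchemes;
};

// True only if every EssentialProperty scheme on the set is one the client
// understands; a set with no EssentialProperty descriptors always passes.
bool CanProcess(const AdaptationSet& set, const std::vector<std::string>& supportedSchemes)
{
    return std::all_of(
        set.essentialPropertySchemes.begin(), set.essentialPropertySchemes.end(),
        [&](const std::string& scheme) {
            return std::find(supportedSchemes.begin(), supportedSchemes.end(), scheme)
                   != supportedSchemes.end();
        });
}
```

A client without SSR support would have no entry for urn:mpeg:dash:ssr:2023 in its supported list, so CanProcess returns false for such a set and the set is skipped.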

Here is Microsoft’s approach.

They do support MPEG-DASH playback in, for example, their XAML MediaElement (below is the screenshot from PlayReady sample):

They apparently support adaptation set switching urn:mpeg:dash:adaptation-set-switching:2016 as they mix downloads from regular and SSR-enabled adaptation sets.

However, they do not support segment sequence representations (urn:mpeg:dash:ssr:2023) themselves: they attempt to download content without replacing the $SubNumber$ placeholder. Hence, they ignore the EssentialProperty markup: the adaptation set is marked as something new they do not understand, yet they take their chances and try to play it back.
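
For comparison, here is a sketch of what a client is expected to do with the placeholder before issuing a request. ExpandSegmentTemplate and ReplaceAll are hypothetical helpers and the template string in the usage note is made up; the sketch handles only the plain $Number$ and $SubNumber$ identifiers, not width format tags or the $$ escape.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replaces every occurrence of `token` in `text` with `value`.
std::string ReplaceAll(std::string text, const std::string& token, const std::string& value)
{
    assert(!token.empty());
    for (std::size_t pos = text.find(token); pos != std::string::npos;
         pos = text.find(token, pos + value.size()))
    {
        text.replace(pos, token.size(), value);
    }
    return text;
}

// Simplified expansion of a segment URL template (hypothetical helper).
std::string ExpandSegmentTemplate(const std::string& media,
                                  unsigned long long number,
                                  unsigned long long subNumber)
{
    const std::string url = ReplaceAll(media, "$SubNumber$", std::to_string(subNumber));
    return ReplaceAll(url, "$Number$", std::to_string(number));
}
```

With a made-up template "seg_$Number$_$SubNumber$.m4s", number 7 and sub-number 3 expand to "seg_7_3.m4s". A request that still contains a literal $SubNumber$ is the symptom described above.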

Demo: GPU shader Sobel filter and video capture with Media Foundation Capture Engine API

Back to some experiments…

The current video capture API in Windows is Media Foundation Capture Engine API (AKA IMFCaptureEngine and mfcaptureengine.h). Media Foundation is layered: you can work at lower level with video capture Media Sources, but if you don’t want to go into details you have the Capture Engine.

The application builds on the good old CaptureEngineVideoCapture sample from Windows Classic Samples, and adds an effect in the form of a Media Foundation Transform to the engine.

One of the non-obvious aspects is that the video stream carries data backed by video memory by default. Hence, the effect should preferably handle the stream on the associated Direct3D 11 device.

The effect here is an HLSL equivalent of the Webcam edge glow Shadertoy shader. It processes the video from the web camera on the GPU, via a shader implementing the Sobel operator. The data does not leave video memory and continues its travel along the capture engine to preview visualization.

The shader code can be looked up in the executable resources; it is compiled at runtime.
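
For reference, the Sobel operator itself is small enough to sketch on the CPU. This is not the demo's shader, only a plain C++ illustration of the same 3×3 kernels on a grayscale image; SobelMagnitude is a hypothetical name, and clamping at the borders is my choice for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Gradient magnitude at (x, y) of a row-major grayscale image, computed with
// the 3x3 Sobel kernels. Coordinates outside the image are clamped to the edge.
float SobelMagnitude(const std::vector<float>& image, int width, int height, int x, int y)
{
    assert(width > 0 && height > 0);
    assert(image.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));

    // Reads a pixel with coordinates clamped to the image bounds.
    auto at = [&](int px, int py) {
        px = std::min(std::max(px, 0), width - 1);
        py = std::min(std::max(py, 0), height - 1);
        return image[static_cast<std::size_t>(py) * width + px];
    };

    // Horizontal gradient: right column minus left column, center row doubled.
    const float gx = -at(x - 1, y - 1) + at(x + 1, y - 1)
                     - 2.0f * at(x - 1, y) + 2.0f * at(x + 1, y)
                     - at(x - 1, y + 1) + at(x + 1, y + 1);

    // Vertical gradient: bottom row minus top row, center column doubled.
    const float gy = -at(x - 1, y - 1) - 2.0f * at(x, y - 1) - at(x + 1, y - 1)
                     + at(x - 1, y + 1) + 2.0f * at(x, y + 1) + at(x + 1, y + 1);

    return std::sqrt(gx * gx + gy * gy);
}
```

A pixel shader does the same thing per output pixel, sampling the eight neighbors from the input texture instead of indexing a vector, which is why the operation maps so well to the GPU.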