Something I built on my development machine in order to flash onto an Android set-top box device for the expo, to entertain people for three days in a row… I hope everyone had fun!
Serving MPEG-DASH differs from serving HLS, but as long as you have video packaged in ISO BMFF segments, adding an option to also expose content as HLS (HTTP Live Streaming, RFC 8216) is not too difficult.
Besides being able to stream the webcam signal as MPEG-DASH using http://localhost/MediaFoundationCameraToolkit/Capture/manifest.mpd, I also made it possible to use http://localhost/MediaFoundationCameraToolkit/Capture/master.m3u8 with, for example, the same Shaka Player Demo, or https://hlsjs.video-dev.org/demo/.
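To give an idea of how little is involved, here is a minimal sketch (not the application's actual code) of building an RFC 8216 style media playlist over the same ISO BMFF segments the MPEG-DASH side already serves; the segment and initialization segment names below are hypothetical:

// Minimal sketch of an HLS media playlist over fMP4 (ISO BMFF) segments;
// illustrative only, segment naming is hypothetical
#include <cstdint>
#include <string>
#include <vector>

std::string BuildMediaPlaylist(uint64_t FirstSegmentIndex, std::vector<std::string> const& SegmentUris)
{
    std::string Text;
    Text += "#EXTM3U\n";
    Text += "#EXT-X-VERSION:7\n"; // fMP4 segments referenced via EXT-X-MAP need protocol version 6+
    Text += "#EXT-X-TARGETDURATION:2\n"; // matches maxSegmentDuration on the MPEG-DASH side
    Text += "#EXT-X-MEDIA-SEQUENCE:" + std::to_string(FirstSegmentIndex) + "\n";
    Text += "#EXT-X-MAP:URI=\"init.mp4\"\n"; // initialization segment shared with MPEG-DASH
    for(auto&& SegmentUri : SegmentUris)
    {
        Text += "#EXTINF:2.000,\n"; // nominal segment duration, seconds
        Text += SegmentUri + "\n";
    }
    // no #EXT-X-ENDLIST: this is a live, sliding-window presentation
    return Text;
}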
BTW, hls.js has even better visualization of media buffering:
Since the last demo appears to be quite nice, one more addition: the ability to dump the application's in-memory video content into an MP4 file.
As the application runs and shows the video preview, it keeps an H.264/AVC version of the data in memory in the form of a sliding window. Now you can just hit F8 and have this video – the last two minutes, that is – written into an MP4 file.
What the application does is effectively this: it takes the initialization segment from the MPEG-DASH content and concatenates all media segments the server keeps ready. This gives an ISO BMFF media file of the fragmented MP4 flavor. It is playable, but not nicely, so the application continues and, right in memory and on the fly, re-packages this file into a standard MP4 file (there is a related bug in Windows Media Foundation which needs to be worked around, but it is what it is), using just Windows Media Foundation here and there. This whole process is instant, of course.
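The post does not include the repackaging code itself, but conceptually it is close to the following sketch: a Media Foundation Source Reader over the in-memory fragmented MP4 feeds a Sink Writer with the same H.264/AVC media type, so samples pass through without re-encoding (error handling and the workaround for the mentioned bug are omitted):

// Rough sketch, not the application's actual code: re-packaging a fragmented MP4
// byte stream into a regular MP4 file with stock Media Foundation, no transcoding.
// MFStartup(MF_VERSION) is assumed to have been called already.
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <wil/com.h>
#include <wil/result.h>

void RepackageToMp4(IMFByteStream* FragmentedMp4ByteStream, wchar_t const* OutputPath)
{
    wil::com_ptr<IMFSourceReader> Reader;
    THROW_IF_FAILED(MFCreateSourceReaderFromByteStream(FragmentedMp4ByteStream, nullptr, Reader.put()));
    wil::com_ptr<IMFMediaType> MediaType;
    THROW_IF_FAILED(Reader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_VIDEO_STREAM, MediaType.put()));

    wil::com_ptr<IMFSinkWriter> Writer;
    THROW_IF_FAILED(MFCreateSinkWriterFromURL(OutputPath, nullptr, nullptr, Writer.put()));
    DWORD StreamIndex;
    THROW_IF_FAILED(Writer->AddStream(MediaType.get(), &StreamIndex));
    // Same compressed media type on input and output: samples go through without re-encoding
    THROW_IF_FAILED(Writer->SetInputMediaType(StreamIndex, MediaType.get(), nullptr));
    THROW_IF_FAILED(Writer->BeginWriting());

    for(; ; )
    {
        DWORD Flags;
        LONGLONG Time;
        wil::com_ptr<IMFSample> Sample;
        THROW_IF_FAILED(Reader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0, nullptr, &Flags, &Time, Sample.put()));
        if(Flags & MF_SOURCE_READERF_ENDOFSTREAM)
            break;
        if(Sample)
            THROW_IF_FAILED(Writer->WriteSample(StreamIndex, Sample.get()));
    }
    THROW_IF_FAILED(Writer->Finalize());
}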
And more to this, alternatively you can take this video snapshot even via the web server! Just request http://localhost/MediaFoundationCameraToolkit/Capture/video.mp4 and the application will do the same processing and preparation/export of the video file, but it will be delivered to the browser instead of being saved to the file system.
Have fun!
A new series of demonstrations of what one can squeeze out of the Windows Media Foundation Capture Engine API.
This video camera capture demonstration application features a mounted MPEG-DASH (Dynamic Adaptive Streaming over HTTP) server. The concept is straightforward: during video capture, the application takes the video feed and compresses it in H.264/AVC format using GPU hardware-assisted encoding. It then retains approximately two minutes of data in memory and generates an MPEG-DASH-compatible view of this data. The view follows the dynamic manifest format specified by ISO/IEC 23009-1. The entire system is integrated with the HTTP Server API and accessible over the network.
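Roughly, and this is an illustration rather than the application's exact code, the capture/encode side with the Capture Engine API could look like this: the record sink is given an H.264 media type (the engine picks up a hardware encoder where available) and a sample callback receives the compressed samples, which the application can then retain in its sliding window:

// Rough sketch, not the application's actual code: asking the Capture Engine record
// sink for H.264 and collecting compressed samples via a callback for in-memory retention
#include <mfapi.h>
#include <mfcaptureengine.h>
#include <wil/com.h>
#include <wil/result.h>

void ConfigureRecordSink(IMFCaptureEngine* Engine, IMFCaptureEngineOnSampleCallback* SampleCallback)
{
    wil::com_ptr<IMFCaptureSink> Sink;
    THROW_IF_FAILED(Engine->GetSink(MF_CAPTURE_ENGINE_SINK_TYPE_RECORD, Sink.put()));
    auto const RecordSink = Sink.query<IMFCaptureRecordSink>();

    // H.264/AVC output type; the engine inserts an encoder MFT, hardware-assisted where available
    wil::com_ptr<IMFMediaType> MediaType;
    THROW_IF_FAILED(MFCreateMediaType(MediaType.put()));
    THROW_IF_FAILED(MediaType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video));
    THROW_IF_FAILED(MediaType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264));
    THROW_IF_FAILED(MediaType->SetUINT32(MF_MT_AVG_BITRATE, 8'000'000)); // illustrative bitrate

    DWORD SinkStreamIndex;
    THROW_IF_FAILED(RecordSink->AddStream(MF_CAPTURE_ENGINE_PREFERRED_SOURCE_STREAM_FOR_VIDEO_RECORD, MediaType.get(), nullptr, &SinkStreamIndex));
    // Compressed samples are delivered to the callback instead of a file, so the
    // application can keep, say, the last two minutes in memory
    THROW_IF_FAILED(RecordSink->SetSampleCallback(SinkStreamIndex, SampleCallback));
    // IMFCaptureEngine::StartRecord then starts delivering samples
}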
Since it is pretty standard streaming media (just without the adaptive bitrate capability: the broadcast takes place in just one quality), the signal can be played back with something like Google Shaka Player. As the application keeps the last two minutes of data, you can rewind the web player to see yourself in the past… and then fast-forward yourself into the future once again.
Just Windows platform APIs, Microsoft Windows Media Foundation and C++ code; the only external library is Windows Implementation Libraries (WIL), if that classifies as an external library at all. No FFmpeg, no GStreamer and such. No curl, no libhttpserver or whatever other web servers there are. That is, as simple as this:
auto const ToSeconds = [] (NanoDuration const& Value, double Multiplier = 1.0) -> std::wstring
{
    return Format(L"PT%.2fS", Multiplier * Value.count() / 1E7);
};
Element Mpd(L"MPD", // ISO/IEC 23009-1:2019; Table 3 — Semantics of MPD element; 5.3.1.2 Semantics
{
    { L"xmlns", L"urn:mpeg:dash:schema:mpd:2011" },
    //{ L"xmlns", L"xsi", L"http://www.w3.org/2001/XMLSchema-instance" },
    //{ L"xsi", L"schemaLocation", L"urn:mpeg:dash:schema:mpd:2011 http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-DASH_schema_files/DASH-MPD.xsd" },
    { L"profiles", L"urn:mpeg:dash:profile:isoff-live:2011" },
    { L"type", L"dynamic" },
    { L"maxSegmentDuration", ToSeconds(2s) },
    { L"minBufferTime", ToSeconds(4s) },
    { L"minimumUpdatePeriod", ToSeconds(1s) },
    { L"suggestedPresentationDelay", ToSeconds(3s) },
    { L"availabilityStartTime", FormatDateTime(BaseLiveTime) },
    //{ L"publishTime", FormatDateTime(BaseLiveTime) },
});
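For context, the body of such a dynamic, live-profile manifest typically continues with $Number$-addressed fMP4 segments. The excerpt below is purely illustrative; attribute values and segment naming are hypothetical and not the application's actual output:

// Illustrative only: the kind of body the rest of such a manifest typically carries
auto constexpr ManifestBodyExample = LR"(
  <Period id="0" start="PT0S">
    <AdaptationSet mimeType="video/mp4" segmentAlignment="true">
      <Representation id="video" codecs="avc1.640028" width="1920" height="1080" frameRate="30" bandwidth="8000000">
        <SegmentTemplate timescale="10000000" duration="20000000" startNumber="1"
                         initialization="init.mp4" media="segment-$Number$.m4s"/>
      </Representation>
    </AdaptationSet>
  </Period>
)";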
The video is compressed once as the capture process goes, and the application is integrated with the native HTTP web server, so the whole thing is pretty scalable: connect multiple clients and this is fine, as the application mostly provides a view into the H.264/AVC data temporarily kept in memory within the retention window. For the same reason, resource consumption of the solution is what you would expect it to be. The playback clients do not even have to play the same historical part of the content:
So okay, this demo opens a path to next steps down the road: audio, DRM, an HLS version, low latency variants such as LL-HLS, MPEG-DASH segment sequence representations.
So just have the webcam video capture application running, and open the MPEG-DASH manifest http://localhost/MediaFoundationCameraToolkit/Capture/manifest.mpd with https://shaka-player-demo.appspot.com/ using the “Custom Content” option.
Note that the application requires elevated (administrative) access in order to use the HTTP Server API capabilities (AFAIR it is possible to do it another way, but you don’t need that this time).
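For reference, the serving side boils down to a handful of HTTP Server API calls; a minimal sketch is below (not the application's actual code). The URL registration step is what normally requires elevation, unless a matching URL reservation has been configured beforehand:

// Minimal sketch of serving content with the HTTP Server API (http.sys);
// not the application's actual code, error handling omitted
#include <windows.h>
#include <http.h>
#pragma comment(lib, "httpapi.lib")

void RunServer()
{
    HTTPAPI_VERSION Version = HTTPAPI_VERSION_2;
    HttpInitialize(Version, HTTP_INITIALIZE_SERVER, nullptr);

    HTTP_SERVER_SESSION_ID SessionId;
    HttpCreateServerSession(Version, &SessionId, 0);
    HTTP_URL_GROUP_ID UrlGroupId;
    HttpCreateUrlGroup(SessionId, &UrlGroupId, 0);
    // Registering the URL with http.sys is what typically requires elevation
    HttpAddUrlToUrlGroup(UrlGroupId, L"http://+:80/MediaFoundationCameraToolkit/Capture/", 0, 0);

    HANDLE QueueHandle;
    HttpCreateRequestQueue(Version, nullptr, nullptr, 0, &QueueHandle);
    HTTP_BINDING_INFO BindingInfo { { TRUE }, QueueHandle };
    HttpSetUrlGroupProperty(UrlGroupId, HttpServerBindingProperty, &BindingInfo, sizeof BindingInfo);

    // Receive requests and reply with manifest / segment data kept in memory
    alignas(HTTP_REQUEST) char RequestBuffer[4096];
    auto const Request = reinterpret_cast<HTTP_REQUEST*>(RequestBuffer);
    ULONG BytesReturned;
    while(HttpReceiveHttpRequest(QueueHandle, HTTP_NULL_ID, 0, Request, sizeof RequestBuffer, &BytesReturned, nullptr) == NO_ERROR)
    {
        HTTP_RESPONSE Response { };
        Response.StatusCode = 200;
        Response.pReason = "OK";
        Response.ReasonLength = 2;
        // ... add HTTP_DATA_CHUNK entries pointing at the in-memory manifest or segment ...
        ULONG BytesSent;
        HttpSendHttpResponse(QueueHandle, Request->RequestId, 0, &Response, nullptr, &BytesSent, nullptr, 0, nullptr, nullptr);
    }
}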
The application – doing video capture, rendering the 1920×1080@30 stream to the user interface, teeing the signal into additional processing, doing hardware-assisted video encoding, packaging, and serving MPEG-DASH content – does not take too many resources: just as much as makes good sense.
Oh, and one can also use standard C# tooling to display this sort of video signal; here we go with the standard PlayReady C# Sample with a XAML MediaElement inside:
In continuation of the camera demos, one more build, this time with Microsoft’s Video Stabilization MFT.
In the context of the Capture Engine application and the use of the MFT as an effect, it runs in its default configuration, in particular without explicit low latency mode. This creates a noticeable delay in video transmission. Still, it is what it is – the effect passes the video feed through.
Still, it is hardware accelerated and apparently well suited for real-time video processing.
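For the curious, hooking the stabilization MFT into the Capture Engine as an effect amounts to something along these lines (a sketch, not necessarily the exact code of this build; enabling low latency mode would presumably go through the MF_LOW_LATENCY attribute, which the default configuration leaves unset):

// Sketch of inserting the Video Stabilization MFT as a Capture Engine effect;
// illustrative, not necessarily the exact code of this build
#include <mfcaptureengine.h>
#include <mftransform.h>
#include <wmcodecdsp.h> // CLSID_CMSVideoDSPMFT, the Video Stabilization MFT
#include <wil/com.h>
#include <wil/result.h>

void AddStabilizationEffect(IMFCaptureEngine* Engine)
{
    auto const Transform = wil::CoCreateInstance<IMFTransform>(CLSID_CMSVideoDSPMFT);
    // Optionally: request low latency behavior (not done in the default configuration)
    // wil::com_ptr<IMFAttributes> Attributes;
    // THROW_IF_FAILED(Transform->GetAttributes(Attributes.put()));
    // THROW_IF_FAILED(Attributes->SetUINT32(MF_LOW_LATENCY, 1));

    wil::com_ptr<IMFCaptureSource> Source;
    THROW_IF_FAILED(Engine->GetSource(Source.put()));
    THROW_IF_FAILED(Source->AddEffect(MF_CAPTURE_ENGINE_PREFERRED_SOURCE_STREAM_FOR_VIDEO_PREVIEW, Transform.get()));
}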
On occasion I hooked OpenCV Non-local Means Denoising up to the Media Foundation camera capture test application already mentioned in previous posts. It is the regular, non-CUDA implementation wrapped into a Media Foundation Transform so that it can be plugged directly into the Media Foundation Capture Engine.
The original implementation comes from non-realtime denoising, hence the question of how well it holds up for real-time video. Unfortunately, it appears to be rather slow. Essentially, it is a Media Foundation wrapper over…
fastNlMeansDenoisingColoredMulti(..., ..., 1, 3, ...);
… with a minimal temporal window of three frames sliding one by one. The OpenCV implementation could obviously be tuned better too, but probably the most effective improvement would be to take advantage of the CUDA variant.
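A sketch of that sliding-window arrangement (not the actual MFT code), assuming 8-bit BGR frames arriving one at a time:

// Sketch of the sliding-window use of OpenCV temporal denoising: keep the three most
// recent BGR frames and denoise the middle one, so output trails input by one frame
#include <deque>
#include <vector>
#include <opencv2/photo.hpp>

class DenoiseWindow
{
public:
    // Returns true and fills DenoisedFrame once three frames have been accumulated
    bool Process(cv::Mat const& Frame, cv::Mat& DenoisedFrame)
    {
        m_Frames.push_back(Frame.clone());
        if(m_Frames.size() < 3)
            return false;
        if(m_Frames.size() > 3)
            m_Frames.pop_front();
        std::vector<cv::Mat> const Frames(m_Frames.cbegin(), m_Frames.cend());
        // imgToDenoiseIndex = 1 (the middle frame), temporalWindowSize = 3
        cv::fastNlMeansDenoisingColoredMulti(Frames, DenoisedFrame, 1, 3);
        return true;
    }

private:
    std::deque<cv::Mat> m_Frames;
};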
Well, anyway, the demo is here, to see how slow it is for live use…