While on it, a quick update to the previous post/application:
// Oprogramowanie Roman Ryltsov
A variant of the previous CaptureEngineVideoCapture demo application, which features the AMD Advanced Media Framework SuperResolution scaler for video. It is basically a live video camera application started in low resolution mode, and it lets you switch between GPU-implemented (OpenCL, probably?) realtime upscaling modes. The AMD scaler is wrapped into a Media Foundation Transform and is applied…
Moving on with Windows Camera experiments. This time it is a Windows 11 Virtual Camera, a true one like this and this, and not this, which should be totally archived even though people still keep trying. So what have we got this time? The application, derived from the old CaptureEngineVideoCapture sample, still features the IMFCaptureEngine API. However, it also installs temporary…
We recently came across an article https://lnkd.in/dx2ZUgZX discussing the use of Dolby Laboratories Dialog Intelligence for speech gating. This technology addresses a challenge we’ve encountered in the past, involving standards like ITU-R BS.1770 https://lnkd.in/dhVSRTRB and related methods. The article provides detailed technical information and references, allowing us to focus on the practical implications.
We had the reference Dolby Dialog Intelligence source code as a starting point, and we applied it to live audio streams we already handled. The primary outcome of this processing is the ability to confidently determine whether content contains speech or not. While the Dolby source code was relatively straightforward to integrate, it had some performance limitations. It worked, but its resource consumption didn’t align well with other processing requirements.
Before requesting a production-ready implementation from Dolby, our customer allowed us to investigate further. We discovered that the initial part of the processing involved downsampling the audio signal to 16 kHz. By replacing this step with a proper audio resampler and verifying that the change didn’t affect the speech detection algorithm’s output, we achieved a production-ready speech gating solution: processing complexity was reduced by an order of magnitude.
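To illustrate the kind of downsampling step discussed above, here is a minimal Python sketch. It is not the Dolby code and not the resampler we actually used. It assumes 48 kHz input (so a factor of 3 to reach 16 kHz), and the names `design_lowpass` and `decimate`, the 97-tap filter length, and the 7 kHz cutoff are hypothetical choices made for the example. It shows a windowed-sinc low-pass filter combined with decimation, where only the output samples that are kept are ever computed.

```python
import math


def design_lowpass(num_taps, cutoff):
    """Windowed-sinc (Hamming) low-pass FIR filter.

    cutoff is a fraction of the input sample rate (0 < cutoff < 0.5).
    The taps are normalized so that the gain at DC is 1.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    gain = sum(taps)
    return [t / gain for t in taps]


def decimate(samples, factor, taps):
    """Low-pass filter and keep every factor-th sample.

    Only the retained outputs are computed, so the filtering work is
    divided by `factor` compared with filtering first and dropping after.
    """
    out = []
    for start in range(0, len(samples) - len(taps) + 1, factor):
        acc = 0.0
        for k, t in enumerate(taps):
            acc += t * samples[start + k]
        out.append(acc)
    return out


# Assumed setup: 48 kHz input, 16 kHz output, cutoff just below the new Nyquist.
taps = design_lowpass(97, 7000 / 48000)
```

With these assumed parameters, a 1 kHz tone passes through essentially unchanged, while a 10 kHz tone, which would alias at a 16 kHz sample rate, is strongly attenuated before the samples are dropped.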
Speech gating plays a crucial role in determining the audio loudness of broadcast content. Compliance requirements now demand accurate loudness measurements, preventing any manipulation or cheating with audio levels.
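As a rough illustration of how a speech decision can feed into a loudness figure, the Python sketch below averages energy only over blocks flagged as speech. It is a simplified example, not our production code and not a complete ITU-R BS.1770 meter: it assumes the per-block mean-square energies have already been K-weighted and channel-summed elsewhere, it omits the absolute and relative gates, and the function name `gated_loudness` and its inputs are hypothetical. The only part taken from the standard is the final conversion, -0.691 + 10·log10(mean energy).

```python
import math


def gated_loudness(block_energies, speech_flags):
    """Loudness in LKFS computed over speech blocks only.

    block_energies: per-block mean-square energy, assumed to be
                    K-weighted and summed over channels already.
    speech_flags:   per-block booleans from a speech detector (assumed).
    Returns None when no block is flagged as speech.
    """
    gated = [e for e, is_speech in zip(block_energies, speech_flags) if is_speech]
    if not gated:
        return None
    mean_energy = sum(gated) / len(gated)
    if mean_energy <= 0:
        return float("-inf")
    return -0.691 + 10 * math.log10(mean_energy)
```

For example, with block energies `[1.0, 0.01, 1.0]` and the quiet middle block flagged as non-speech, the quiet block no longer pulls the measurement down.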
Recent additions to the MPEG-DASH specification (ISO/IEC 23009-1 5th ed., AMD3; also SCTE 214-6) offer new ways to implement low-latency, low-delay playback using so-called Segment Sequence Representations (SSR). MPEG-DASH manifests advertise this flavor of data by including a respective EssentialProperty. Before Shaka Player Demo added support for this technology, it ignored the respective adaptation set…
Back to some experiments… The current video capture API in Windows is the Media Foundation Capture Engine API (AKA IMFCaptureEngine and mfcaptureengine.h). Media Foundation is layered: you can work at a lower level with video capture Media Sources, but if you don’t want to go into details you have the Capture Engine. The application continues a good…
In the years 2009 to 2011, engineers from the Microsoft Media Foundation Team shared a series of blog posts containing sample code related to the #MediaFoundation API, a successor to the venerable #DirectShow.
At that time, there was a scarcity of sample source code specifically addressing this topic. Unfortunately, the passage of time and various transformations of blog sites and the Microsoft website took their toll. The original blog posts suffered, and although they were eventually recovered and reinstated as part of the team blog archive https://lnkd.in/drKBW5tW, the source code associated with those posts vanished entirely. The links now led to the dreaded HTTP 404 “Not Found” error.
However, our quest for historical preservation and the benefit of those who remain curious led us to a solution. We unearthed the missing source code and deposited it into a GitHub repository https://lnkd.in/dXRi9PZF. There, it resides — a testament to the past and a resource for those who still harbor interest in the intricacies of the Windows Media Foundation API.
Feel free to explore the repository and delve into the code. After all, sometimes even lost fragments of the digital realm can find their way back home.