Tag: Audio

Resource-Efficient Speech Gating: Leveraging Dolby Dialog Intelligence

We recently came across an article https://lnkd.in/dx2ZUgZX discussing the use of Dolby Laboratories Dialog Intelligence for speech gating. This technology addresses a challenge we’ve encountered in the past, involving standards like ITU-R BS.1770 https://lnkd.in/dhVSRTRB and related methods. The article provides detailed technical information and references, allowing us to focus on the practical implications.

We used the Dolby Dialog Intelligence reference source code as a starting point and applied it to live audio streams we already handled. The primary outcome of this processing is the ability to confidently determine whether content contains speech or not. While the Dolby source code was relatively straightforward to integrate, it had some performance limitations. It worked, but its resource consumption didn’t align well with other processing requirements.

Before requesting a production-ready implementation from Dolby, our customer allowed us to investigate further. We discovered that the initial part of the processing involved downsampling the audio signal to 16 kHz. By replacing this step with a proper audio resampler and verifying that the change didn’t affect the speech detection algorithm’s output, we achieved a production-ready speech gating solution: processing complexity was reduced by an order of magnitude.
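To illustrate the kind of step involved, here is a minimal sketch of an integer-ratio decimator. It assumes a 48 kHz input and a 3:1 reduction to 16 kHz, neither of which is stated above, and the function names, tap count, and cutoff are hypothetical choices for illustration, not the resampler we shipped or anything from the Dolby code. The point it shows is that the filter only needs to be evaluated at the output positions that are kept.

```python
import math

def design_lowpass(num_taps, cutoff):
    """Windowed-sinc low-pass FIR; cutoff is a fraction of the input sample rate."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response, with the x == 0 limit handled explicitly.
        h = 2.0 * cutoff if x == 0 else math.sin(2.0 * math.pi * cutoff * x) / (math.pi * x)
        # Hamming window to limit stop-band ripple.
        w = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(h * w)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalize to unity gain at DC

def decimate(samples, factor, taps):
    """Low-pass filter and keep every `factor`-th output.

    Only the kept outputs are computed. The taps are symmetric, so sliding
    them forward over the input is equivalent to convolution.
    """
    out = []
    for start in range(0, len(samples) - len(taps) + 1, factor):
        acc = 0.0
        for k, t in enumerate(taps):
            acc += t * samples[start + k]
        out.append(acc)
    return out

# Assumed setup: 48 kHz in, 16 kHz out, cutoff just below the new 8 kHz Nyquist limit.
TAPS_48K_TO_16K = design_lowpass(63, 7000 / 48000)
```

A call such as decimate(block, 3, TAPS_48K_TO_16K) then yields roughly one output sample per three input samples. Whatever resampler is used, the check that matters is the one described above: the speech detector's output must stay the same.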

Speech gating plays a crucial role in determining the audio loudness of broadcast content. Compliance requirements now demand accurate loudness measurements, preventing any manipulation or cheating with audio levels.
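As a rough sketch of what gating means for the measurement, the snippet below averages energy over only the blocks a detector has flagged as speech and converts the result with the BS.1770 loudness formula. It assumes a single channel, K-weighted block energies computed elsewhere, and per-block speech flags supplied by the detector. The function name and the data layout are hypothetical, and this is not the Dolby algorithm.

```python
import math

def gated_loudness(block_energies, is_speech):
    """Loudness in LKFS averaged over only the blocks flagged as speech.

    block_energies: mean-square value of each K-weighted block (assumed precomputed)
    is_speech:      per-block speech decisions (assumed to come from a detector)
    """
    kept = [e for e, s in zip(block_energies, is_speech) if s]
    if not kept:
        return None  # no speech found, nothing to measure
    mean_energy = sum(kept) / len(kept)
    if mean_energy <= 0.0:
        return float("-inf")
    # BS.1770 loudness formula for a single channel with unit weight.
    return -0.691 + 10.0 * math.log10(mean_energy)
```

With a quiet non-speech block mixed into louder dialog, the gated figure comes out higher than the plain average over all blocks, which is why the gating decision directly affects the reported loudness.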

Streaming Games to Any Device

In the past, GeekWire featured an article https://lnkd.in/d8FMf3mH on Rainway — a prominent Seattle startup with an ambitious mission: streaming games to any device.

Our role in this endeavor was to contribute essential components to Rainway’s game streaming technology. Among these, a pivotal piece involved transforming the audiovisual content generated by standard games into a format compatible with HTML5. Our primary objective was to extend the gaming experience to remote web browsers.

To achieve this, we repurposed our existing technology and developed a subsystem that efficiently converted monitor video into an H.264/AVC data stream, packaged for compatibility with HTML5 Media Source Extensions (MSE). Through WebRTC transmission, this stream reached remote systems and integrated into web browsers.
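One small packaging detail can be sketched in code, under two assumptions the text above doesn't confirm: that the encoder emits an Annex B byte stream (NAL units separated by start codes) and that the MSE packaging targets fragmented MP4, where each NAL unit is stored with a length prefix instead. The function names are hypothetical and a real packager does far more (parameter sets, fragment boxes, timestamps).

```python
import struct

START_CODE = b"\x00\x00\x01"

def split_annex_b(data):
    """Return the NAL units found in an Annex B byte stream, start codes removed."""
    nals = []
    pos = data.find(START_CODE)
    while pos != -1:
        start = pos + len(START_CODE)
        nxt = data.find(START_CODE, start)
        end = len(data) if nxt == -1 else nxt
        # Drop zero bytes before the next start code (covers the 4-byte 00 00 00 01 form).
        nal = data[start:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
        pos = nxt
    return nals

def to_length_prefixed(data):
    """Re-frame Annex B data as NAL units with 4-byte big-endian length prefixes."""
    return b"".join(struct.pack(">I", len(n)) + n for n in split_annex_b(data))
```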

Throughout our journey, we engaged in thoughtful experiments. Should audio be part of a joint stream with video, or should it be delivered separately? We delved into format intricacies and explored novel ideas. Notably, while some debated the idea of video remaining entirely within the GPU realm, including video encoding, we had already implemented this with production-quality results back in 2017.

The outcome was groundbreaking software that facilitated desktop Windows gaming streaming to HTML5 browsers, mobile devices, and even Xbox consoles.

LDS Temples and Technology: The DirectShow Journey

A while back, we were working on a media subsystem for The Church of Jesus Christ of Latter-day Saints. They needed software-controlled multimedia playback with specific requirements for their temples worldwide.

Now, the attached image isn’t an exact representation of our work, but it captures the essence: LDS and technology go hand in hand.

Back in the day, we used DirectShow as our multimedia framework, and boy, did we face some interesting challenges. One that sticks out in memory is related to audio delivery. Picture this: we had a multi-channel audio output card from AudioScience, Inc., and our task was to schedule audio delivery in perfect sync across multiple physical audio connectors. But wait, there’s more! We also had to toggle outputs on and off while others were already belting out sound. And when we turned on a fresh audio stream, it had to seamlessly match the signal already in play. Oh, and don’t forget: the video part of this signal was streaming nonstop and couldn’t be interrupted.

Now, let me tell you, this wasn’t a walk in the park. The multimedia framework was designed back in the ’90s, with the quaint notion that once you set up your playback topology, you couldn’t tweak anything while the show was running.

But guess what? Our software spread its wings and flew to over a hundred locations worldwide. Many moons have passed, but who knows — it might still be chugging along out there.

Legacy Filters, Modern Solutions: MP4 Support in DirectShow

The Microsoft DirectShow API was introduced long before the widespread adoption of MPEG-4. As MPEG-4 codecs and container formats became standard, DirectShow was, by Microsoft’s own admission, nearing the end of its life.

That’s how this once-popular media framework for Windows found itself without support for MP4 files. Fortunately, there was a handy solution: freely available filters https://gdcl.co.uk/mpeg4/ developed by Geraint Davies. Originally published in 2006, these filters gained popularity over time. Since Geraint had other commitments after the last update, we took the liberty of placing a copy of his work on GitHub https://lnkd.in/dPsZEfpE somewhere around 2015.

Despite the state of DirectShow, these filters still play a role in DirectShow applications. We’ve even made a few updates ourselves, a little bit of everything: a unit test project, some modern C++ and COM code based on Microsoft WIL https://lnkd.in/de5nxif, a COM type library with an integration interface, and various features. One particularly valuable addition is the ability to recover broken recordings.

You see, sometimes applications crash — whether due to external factors or just plain bad luck. And sometimes the cost of “re-doing things right” is too high. Oh, and the cost of data loss is high too! In such cases, we can salvage the broken recording from the crashed application and recover its content. It’s like a digital rescue mission. And in some instances, it’s even automated — like in our partner’s medical software https://lnkd.in/dCrJJRjy, where multi-hour recordings are the norm these days.
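To give a feel for what a broken recording looks like, here is a minimal sketch that walks the top-level boxes of an MP4 file and flags one that lacks a complete "moov" index, which is the usual symptom when a writer dies before finalizing the file. It is only a detection step written for illustration: the function names are hypothetical, it works on bytes already in memory (impractical for multi-hour files), and it is not the recovery logic in the filters.

```python
import struct

def scan_top_level_boxes(data):
    """List (type, offset, size, complete) for each top-level box in MP4 bytes."""
    boxes = []
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            if pos + 16 > len(data):
                break
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif size == 0:
            # Size 0 means the box runs to the end of the file.
            size = len(data) - pos
        if size < header:
            break  # corrupt header, stop scanning
        complete = pos + size <= len(data)
        boxes.append((box_type.decode("latin-1"), pos, size, complete))
        pos += size
    return boxes

def needs_recovery(data):
    """True when no complete 'moov' box is present at the top level."""
    return not any(t == "moov" and ok for t, _, _, ok in scan_top_level_boxes(data))
```

A file cut off mid-write typically shows an "mdat" box whose declared size runs past the end of the file and no "moov" at all. The media data is still there, which is what makes recovery possible.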

Custom Filters to the Rescue: Diagnosing and Solving Media Stream Woes

In a recent customer project, we encountered a longstanding problem related to a broadcast media stream captured by software.

Our customer’s application, built on the DirectShow framework, was responsible for capturing the media stream. The application needed to operate 24/7 and maintain stability during continuous live media broadcasting.

Unfortunately, sporadic crashes occurred due to a third-party DirectShow filter within the topology. This filter triggered memory access violations and crashed the entire application. The vendor of this problematic component was unavailable for maintenance. Given the low incident frequency, the customer was concerned that any update to this part of the application could introduce regressions.

The issue manifested only in the production signal, complicating troubleshooting, and the customer’s data center required remote access via VPN and several remote desktops.

To diagnose and solve the problem effectively, we introduced a lightweight DirectShow filter. This custom filter transparently kept a dump of the most recently captured MPEG-TS data, mirroring the data flow to the faulty component. In the production environment, this solution allowed us to analyze the content in sufficient detail to reproduce the problem outside the production setup.
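The idea behind such a tap can be sketched independently of DirectShow: pass every buffer through untouched while keeping only the last N transport stream packets, so the data seen just before a crash can be written out. The class name and the assumption that buffers arrive aligned to 188-byte packets are ours for this illustration. The real filter was a DirectShow component and its internals are not shown here.

```python
from collections import deque

TS_PACKET_SIZE = 188   # fixed MPEG-TS packet length
TS_SYNC_BYTE = 0x47    # first byte of every MPEG-TS packet

class RecentPacketTap:
    """Pass-through tap that remembers the most recent MPEG-TS packets."""

    def __init__(self, max_packets):
        # A bounded deque discards the oldest packet automatically.
        self._packets = deque(maxlen=max_packets)

    def process(self, chunk):
        """Record the packets in `chunk` and return the chunk unchanged."""
        for off in range(0, len(chunk) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
            packet = chunk[off:off + TS_PACKET_SIZE]
            if packet[0] == TS_SYNC_BYTE:  # skip data that is not packet-aligned
                self._packets.append(bytes(packet))
        return chunk

    def dump(self):
        """Return the retained packets as one contiguous byte string."""
        return b"".join(self._packets)
```

Because memory use is bounded by max_packets, a tap like this can stay in a 24/7 pipeline indefinitely and only produce a file when something goes wrong.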

Eventually, we replaced the problematic third-party DirectShow filter with an in-house development that handled that particular processing step reliably and maintainably.

In summary, our custom DirectShow filter provided crucial insights, enabling us to address the issue and enhance stability in the customer’s live media broadcasting application.

From Webcams to 24-Hour Recordings: A Decade of Medical Video Evolution

For over a decade, we have been supporting a customer’s product in the field of medical video. Our involvement began after their earlier attempts to record from webcams, which had been only partially successful. We applied our expertise to bring the technology to a stable level, and since then we have successfully captured countless hours of footage, using both standard cameras and professional-grade hardware.

One particularly intriguing aspect of this evolution stands out.

In the early days, our recordings were typically brief, lasting only a few minutes. Moreover, the original implementation imposed a strict 10-minute limit within the application. Why? Well, at that time, there was no robust video encoder available that could handle the application’s requirements — hardware constraints, desired quality, and real-time performance. Consequently, the signal was often recorded in an uncompressed format. The time limit served a purpose: managing disk usage and preventing accidental overflows.
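A quick back-of-the-envelope calculation shows why a time limit made sense for uncompressed recording. The figures below are illustrative assumptions, not the product's actual settings: a 640x480 webcam frame, 2 bytes per pixel (as in a packed 4:2:2 format), and 30 frames per second.

```python
def uncompressed_bytes(width, height, bytes_per_pixel, fps, seconds):
    """Raw video size in bytes for the given frame format and duration."""
    return width * height * bytes_per_pixel * fps * seconds

# Assumed format: 640x480, 2 bytes per pixel, 30 fps.
ten_minutes = uncompressed_bytes(640, 480, 2, 30, 10 * 60)
four_hours = uncompressed_bytes(640, 480, 2, 30, 4 * 60 * 60)

print(f"10 minutes: {ten_minutes / 1e9:.2f} GB")  # prints "10 minutes: 11.06 GB"
print(f"4 hours:    {four_hours / 1e9:.2f} GB")   # prints "4 hours:    265.42 GB"
```

Under these assumptions, even a modest webcam signal fills about 11 GB in ten minutes, so an unattended recording could exhaust a disk quickly.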

As the years passed, we transitioned to using proper video encoders for live content. We swiftly expanded our supported resolution to Full HD at 60 frames per second. Hardware-assisted encoding became the norm, and the duration of encoded sessions gradually increased. What started as 10-minute recordings soon extended to 15 minutes, half an hour, and eventually full procedures lasting up to 4 hours. Then came the pivotal question: “Can we record continuously for 24 hours?” The answer was a resounding yes.

Despite the product’s long history, the ongoing support, stable performance, and improvements over time have been well worth the very moderate effort, and yes, it’s still a DirectShow app, with a little Media Foundation inserted.

Simultaneously, the need for data safety during recording intensified. Fortunately, we devised a solution, which we’ll delve into further in an upcoming piece.

Microsoft Media Foundation and HoloLens: Enabling Real-Time Video Communication

A customer approached us with a specific request: they needed a code snippet that could compress video into the H.264/AVC video format. Additionally, they were developing an application for the Microsoft HoloLens, which was the previous generation of mixed reality headsets.

Upon closer examination, we realized that the purpose was to enable video encoding and decoding in a low-latency, real-time, multi-party video conferencing application. The challenge lay in achieving this on low-powered hardware with its unique platform and limitations.

The HoloLens uses the Microsoft UWP platform, which typically allows developers to avoid rewriting software for the different devices the platform supports. However, in this particular case, adaptation was necessary. The only viable approach for handling real-time video was to leverage hardware-assisted video encoding and decoding through Microsoft Media Foundation.

These devices come equipped with specialized GPUs that share similarities with traditional desktop graphics adapters but also have their own distinct characteristics. Essentially, we developed a subsystem specifically tailored for Microsoft HoloLens 1, enabling real-time video capture, compression, decompression, seamless camera integration, Unity 3D integration, and audio hardware support.