Support for FLAC in ISO BMFF with MSE in StreamingServer application

The Chrome platform has supported FLAC in ISO BMFF (fragmented MP4) media since version 62 (October 2017); however, FLAC (and Opus) support overall has not become standard and comprehensive since then.

I hooked up the Microsoft FLAC Audio Encoder MFT into my media streaming application to produce such media and check browser compatibility.

  • /audio.mp4?flac – produces FLAC in ISO BMFF media on the fly, resulting in streamable media of the kind used in Google’s demo
  • /audio.mp4?flac&duration=50 – allows overriding the duration to generate longer content; there is no chunked HTTP transfer [yet], so the content needs to be fully generated before it is sent – beware of requesting really long files
  • /audio.mp4 – without the “flac” specifier, with or without duration, results in single AAC track media
  • /audio.mp4.html?flac – produces a wrapper HTML page offering playback of the media (see below)
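
For reference, the encoder MFT itself can be located with a straightforward Media Foundation enumeration. The sketch below is mine, under the assumption of enumerating audio encoders by FLAC output type, and is not necessarily how the application instantiates the encoder:

#include <mfapi.h>
#include <mftransform.h>
#include <winrt/base.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfuuid.lib")

winrt::com_ptr<IMFTransform> CreateFlacEncoder()
{
    // Enumerate audio encoders capable of producing FLAC output;
    // the Microsoft FLAC Audio Encoder MFT is expected to be among them
    MFT_REGISTER_TYPE_INFO OutputTypeInformation { MFMediaType_Audio, MFAudioFormat_FLAC };
    IMFActivate** Activates;
    UINT32 ActivateCount;
    winrt::check_hresult(MFTEnumEx(MFT_CATEGORY_AUDIO_ENCODER, MFT_ENUM_FLAG_ALL, nullptr, &OutputTypeInformation, &Activates, &ActivateCount));
    WINRT_ASSERT(ActivateCount > 0);
    winrt::com_ptr<IMFTransform> Transform;
    winrt::check_hresult(Activates[0]->ActivateObject(IID_PPV_ARGS(Transform.put())));
    for(UINT32 Index = 0; Index < ActivateCount; Index++)
        Activates[Index]->Release();
    CoTaskMemFree(Activates);
    return Transform;
}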

The HTML wrapper repeats Google’s demo and feeds the FLAC MP4 audio into an HTML5 media element:

std::string Text;
Text.append("<html><body>");
Text.append(Format("<p>Audio MIME type: audio/mp4; codecs=\"%hs\"</p>", Codecs.c_str()));
Text.append("<audio controls />");
Text.append("<script>");
Text.append(Format("const uri = 'audio.mp4%ls';", Request.GetQueryString().c_str()));
Text.append(Format("const mimeType = 'audio/mp4; codecs=\"%hs\"';", Codecs.c_str()));
auto constexpr g_Script = R"script(
    const audio = document.querySelector('audio');
    if (MediaSource.isTypeSupported(mimeType)) {
        const mediaSource = new MediaSource();
        audio.src = URL.createObjectURL(mediaSource);
        mediaSource.addEventListener('sourceopen', function () {
            URL.revokeObjectURL(audio.src);
            const sourceBuffer = mediaSource.addSourceBuffer(mimeType);
            console.log('Fetching audio file...');
            fetch(uri)
                .then(response => response.arrayBuffer())
                .then(data => {
                    sourceBuffer.appendBuffer(data);
                    sourceBuffer.addEventListener('updateend', function () {
                        if (!sourceBuffer.updating && mediaSource.readyState === 'open') {
                            mediaSource.endOfStream();
                            console.log('Audio is ready to play!');
                        }
                    });
                });
        });
    } else {
        console.log('MIME type ' + mimeType + ' is not supported on this platform with MSE.');
    }
)script";
Text.append(g_Script);
Text.append("</script>");
Text.append("</body></html>");
Response->Initialize(HTTP_STATUS_OK, "OK");
Response->AddHeader(HttpHeaderContentType, "text/html; charset=utf-8");
AddAccessControlResponseHeaders(*Response);
Response->AddMemoryEntityChunk(Text);
Response->Send();
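
The Codecs variable above is the RFC 6381 codecs parameter of the produced track. Assuming a hypothetical Flac flag reflecting the query string (my illustration, not the application’s literal code), it could be set as below, so for the FLAC case the page ends up testing audio/mp4; codecs="flac", the same MIME type as in Google’s demo:

std::string const Codecs = Flac ? "flac" : "mp4a.40.2"; // FLAC vs. AAC-LC, per RFC 6381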

The generated FLAC MP4 asset is playable not just in Chrome; it can also be played by:

  • new Edge (obviously, it’s Chromium based)
  • VLC player on Windows
  • MPC-HC player on Windows (libav backed)
  • Safari on macOS (but not on iOS: neither via the HTML wrapper, because MSE is absent there, nor directly)
  • Some Android phone (the “Samsung Internet” browser? WTF; both directly and via the MSE interface from the HTML wrapper)
  • UWP MediaPlayerElement control

The ISO BMFF content is styled for low latency progressive streaming and is simply concatenated into a complete file. For this reason the FLAC content can also be put into an adaptive bitrate streaming media asset such as HLS, and the application does that as well, but it deserves a separate blog post, and support for FLAC in HLS is not as good.

Download links

Binaries:

  • 64-bit: StreamingServer.exe (in .ZIP archive)
  • License: This software is free to use; builds have time based expiration

Reference HTTP Live Streaming (HLS) server application

StreamingServer is the application I am using as an internal testbed for various media processing and encoding primitives. As an application (or service) it is capable of streaming HLS assets, preparing them on the fly without the need to keep and host real media files. The functionality includes:

  1. Encodes and multiplexes ISO BMFF Byte Stream Format media segments with AAC audio and H.264/AVC or H.265/HEVC video, exposing them as HLS assets (see also RFC 8216 “HTTP Live Streaming”; an illustrative playlist sketch follows this list)
  2. Supports video only, audio only, and video and audio assets
  3. Supports parts of the ISO/IEC 23001-7 “Common encryption in ISO base media file format files” specification and implements the ‘cenc’ and ‘cbcs’ encryption schemes with the AES-CTR-128 and AES-CBC-128 encryption modes of operation respectively
  4. Implements encryption layouts as supported by Microsoft PlayReady DRM implementations, and specifically the Microsoft PlayReady sample
  5. Supports live HLS assets, including live finite and live infinite assets
  6. Encoding services are provided by the underlying Media Foundation encoders; due to the state of Media Foundation and, specifically, the awful quality of third party vendor specific integrations, the application (a) might have issues with specific video cards, (b) implements built-in encoding based on the NVIDIA Video Codec SDK for NVIDIA GPUs, (c) offers a software only mode for GPU agnostic operation
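
For reference, a media playlist for such a byte stream format asset could look roughly like the following illustrative sketch per RFC 8216 (segment names and durations are made up; this is not the application’s literal output):

#EXTM3U
#EXT-X-VERSION:6
#EXT-X-TARGETDURATION:2
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-MAP:URI="init.mp4"
#EXTINF:2.000,
segment0.m4s
#EXTINF:2.000,
segment1.m4s
#EXT-X-ENDLIST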

The application assumes just one client, and its streaming services are, generally speaking, limited by a trivial HTTP serving loop. Still, multiple clients should be able to request data in parallel too.

It is possible to dump the produced responses as files for later review. Unless responses are written to files, they are streamed in HTTP chunked mode at the lowest latency.

Quick start

Start the application with privilege elevation to enable its initialization with HTTP Server API services. Unless overridden with command line parameters, the application uses the first available DXGI device for hardware assisted video encoding and exposes its HTTP functionality under the http://localhost/hls base. Open http://localhost/hls/about to get up to date command line and URI syntax, and also to check the status of the application.

Problem resolution

The application is best suited for use with NVIDIA GPUs doing hardware H.264 video encoding. In the case of video encoding issues, it makes sense to start the application with the “-Software” switch to put it into software only mode: video frames will be generated by Direct2D into WIC bitmaps instead of DXGI and Direct3D 11 textures, and video encoders will use system memory backed Media Foundation media buffers and samples.
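
For example, from an elevated command prompt (see the about page mentioned above for the up to date command line syntax):

StreamingServer.exe -Software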

Download links

Binaries:

  • 64-bit: StreamingServer.exe (in .ZIP archive)
  • License: This software is free to use; builds have time based expiration

Microsoft HEVCVideoExtension software H.265/HEVC encoder

The engineering quality of Microsoft’s most recent work around Media Foundation is terrible. It surely passes some internal tests to make sure that software items meet the requirements of the use cases needed by internal products, but the published work gives the impression that there is no one left to care about the API offerings to the wider audience.

One new example of this is how the H.265/HEVC video encoder implemented by the respective Windows Store extension in mfh265enc.dll works.

I have been putting the component into an existing code base in order to extend it with reference software video encoding, now in H.265/HEVC format. Hence the stock software encoder, regardless of its performance and quality metrics.

The encoder started giving nonsensical exceptions and errors, in particular rejecting obviously valid input. After sorting out a few things, I started seeing the MFT producing E_FAIL on the very first video frame it received.

The suspected problem was (and there were not so many other things left) that the output media type was set twice. Both calls were valid, with good arguments, and happened before any payload processing. The second call supplied the same media type, with EXACTLY the same attributes. Both media type setting calls were successful. The whole media type setting story did not produce any errors at the stage of handling the streaming start messages.

Still, the second call apparently ruined the internal state because – and there can be no other explanation – of the shitty quality of the MFT itself.

A code fragment that discards the second media type setting call at the wrapping level gets the MFT back to processing. What can I say…
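
A minimal sketch of such a wrapper, assuming a hypothetical EncoderTransform class over the stock MFT (the names and structure are mine, not the original code):

#include <mfapi.h>
#include <mftransform.h>
#include <winrt/base.h>

struct EncoderTransform
{
    winrt::com_ptr<IMFTransform> m_Transform; // wraps the mfh265enc.dll encoder MFT
    winrt::com_ptr<IMFMediaType> m_OutputMediaType; // output media type set so far, if any

    HRESULT SetOutputType(DWORD StreamIdentifier, IMFMediaType* MediaType, DWORD Flags)
    {
        // Silently succeed on a repeated call supplying a media type identical
        // to the one already in effect, so the MFT only ever sees it once
        if(MediaType && m_OutputMediaType && !(Flags & MFT_SET_TYPE_TEST_ONLY))
        {
            DWORD EqualityFlags = 0;
            if(m_OutputMediaType->IsEqual(MediaType, &EqualityFlags) == S_OK)
                return S_OK;
        }
        HRESULT const Result = m_Transform->SetOutputType(StreamIdentifier, MediaType, Flags);
        if(SUCCEEDED(Result) && MediaType && !(Flags & MFT_SET_TYPE_TEST_ONLY))
            m_OutputMediaType.copy_from(MediaType);
        return Result;
    }
};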

New AMD Radeon RX 6800 in real time low latency video encoding scenarios

AMD is seemingly not making any progress in improving the video encoding ASICs on their video cards. The new stuff looks pretty depressing.

The AMD Radeon RX 5700 XT was a bit of a move forward, just a bit. The new series looks about the same, even a bit slower; it is quite clear that the existing, cheaper NVIDIA offering beats the hell out of the new AMD gear.

Not to even mention that NVIDIA cards are capable of handling larger resolutions, where AMD’s bar is at 3840×2160@90.

The H.265/HEVC picture looks pretty similar.

Windows 10 SDK RTWorkQ.h and C++/WinRT winrt::implements

interface DECLSPEC_UUID("ac6b7889-0740-4d51-8619-905994a55cc6") DECLSPEC_NOVTABLE
    IRtwqAsyncResult : public IUnknown
{
    STDMETHOD(GetState)( _Out_ IUnknown** ppunkState);
    STDMETHOD(GetStatus)();
    STDMETHOD(SetStatus)( HRESULT hrStatus);
    STDMETHOD(GetObject)( _Out_ IUnknown ** ppObject);
    STDMETHOD_(IUnknown *, GetStateNoAddRef)();
};

interface DECLSPEC_UUID("a27003cf-2354-4f2a-8d6a-ab7cff15437e") DECLSPEC_NOVTABLE
    IRtwqAsyncCallback : public IUnknown
{
    STDMETHOD(GetParameters)( _Out_ DWORD* pdwFlags, _Out_ DWORD* pdwQueue );
    STDMETHOD(Invoke)( _In_ IRtwqAsyncResult* pAsyncResult );
};

The interface methods lack pure specifiers. This might be OK for some development, but once you try to inherit your handler class from public winrt::implements<AsyncCallback, IRtwqAsyncCallback> you are in trouble!
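
A minimal reproduction with a trivial callback class of my own (the method bodies are irrelevant; the base interface subobject alone drags in the undefined virtuals):

#include <rtworkq.h>
#include <winrt/base.h>

struct AsyncCallback : winrt::implements<AsyncCallback, IRtwqAsyncCallback>
{
    STDMETHOD(GetParameters)(DWORD*, DWORD*) override
    {
        return E_NOTIMPL; // defaults are good enough for the demonstration
    }
    STDMETHOD(Invoke)(IRtwqAsyncResult*) override
    {
        return S_OK;
    }
};

The class compiles fine, and then the build fails at link time: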

1>Foo.obj : error LNK2001: unresolved external symbol "public: virtual long __cdecl IRtwqAsyncCallback::GetParameters(unsigned long *,unsigned long *)" (?GetParameters@IRtwqAsyncCallback@@UEAAJPEAK0@Z)
1>Foo.obj : error LNK2001: unresolved external symbol "public: virtual long __cdecl IRtwqAsyncCallback::Invoke(struct IRtwqAsyncResult *)" (?Invoke@IRtwqAsyncCallback@@UEAAJPEAUIRtwqAsyncResult@@@Z)

The problem exists in the current Windows 10 SDK and has been there at least since 10.0.18362.0.
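
One possible workaround, a sketch of mine rather than an officially sanctioned fix, is to supply the missing definitions in one of your translation units so that the linker can emit the interface vtables; the stubs are not supposed to ever be called:

STDMETHODIMP IRtwqAsyncResult::GetState(IUnknown**) { return E_NOTIMPL; }
STDMETHODIMP IRtwqAsyncResult::GetStatus() { return E_NOTIMPL; }
STDMETHODIMP IRtwqAsyncResult::SetStatus(HRESULT) { return E_NOTIMPL; }
STDMETHODIMP IRtwqAsyncResult::GetObject(IUnknown**) { return E_NOTIMPL; }
STDMETHODIMP_(IUnknown*) IRtwqAsyncResult::GetStateNoAddRef() { return nullptr; }
STDMETHODIMP IRtwqAsyncCallback::GetParameters(DWORD*, DWORD*) { return E_NOTIMPL; }
STDMETHODIMP IRtwqAsyncCallback::Invoke(IRtwqAsyncResult*) { return E_NOTIMPL; }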


Updated NvcEncode: 5K and 8K low latency video encoding tester for NVIDIA GPUs

Added a few more resolutions to the NvcEncode tool. Resolutions above 4K are tried with the H.264 codec but are expected to fail, since the H.264 codec is limited to resolutions up to 4096 pixels in width or height; so the new ones effectively apply to H.265/HEVC. They work pretty well on NVIDIA GeForce RTX 2060 SUPER:

640x360@144 2.10
640x360@260 2.16
1280x720@60 4.50
1280x720@120 5.38
1280x720@260 5.33
1920x1080@60 8.78
1920x1080@72 8.67
1920x1080@90 8.50
1920x1080@120 9.69
1920x1080@144 6.74
1920x1080@260 4.20
2560x1440@60 8.45
2560x1440@72 9.89
2560x1440@90 10.10
2560x1440@120 6.62
2560x1440@144 5.73
3840x2160@30 17.88
3840x2160@60 13.02
3840x2160@72 10.86
3840x2160@90 10.07
3840x2160@120 7.98
3840x2160@144 6.57
5120x2880@30 27.78
5120x2880@60 13.85
7680x4320@30 27.22
7680x4320@60 26.47

One interesting thing – and it is too visible and consistent to be an occasional fluctuation – is that per frame latency is lower for higher frame rate feeds. The most recent run has a great example of this effect:

3840x2160@30 17.88
3840x2160@60 13.02
3840x2160@72 10.86
3840x2160@90 10.07
3840x2160@120 7.98
3840x2160@144 6.57

I only have an educated guess, and the driver development folks are likely to have a good explanation. This is probably something NVIDIA can improve for those who want the absolutely lowest encoding latencies.

Download links

Binaries:

  • 64-bit: NvcEncode.exe (in .ZIP archive)
  • License: This software is free to use