IMediaObject::Discontinuity while Windows Media Video 9 Encoder has data to process

This is presumably a bug in Windows Media Video 9 Encoder in versions up to and including Windows 7 (fixed in Windows 8.1 at the very least – wmvencod.dll 6.3.9600.17415).

An IMediaObject::Discontinuity call destroys input the DMO already holds: the call reports success and handles the discontinuity correctly. It even drains output as it should, but if at the same time the DMO already has input awaiting processing, this input is gone, and the typical outcome is that a frame (or possibly more?) at the end of the stream is trimmed away.

The call itself is legal and reports S_OK. The method should have returned DMO_E_NOTACCEPTING if it were too early to report a discontinuity, but the DMO does not do that.

The good news is that it is fixed in the most recent version, so this is not a cold case.
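
For applications that still target affected versions, a hedged workaround sketch (not a vendor fix): pull whatever output is already pending before signaling the discontinuity, so the buggy call has no queued input to destroy. pMediaObject and pMediaBuffer (a reusable IMediaBuffer implementation) are assumed to be created already.

for(; ; )
{
	DMO_OUTPUT_DATA_BUFFER OutputDataBuffer;
	ZeroMemory(&OutputDataBuffer, sizeof OutputDataBuffer);
	pMediaBuffer->SetLength(0); // reset the reusable buffer between iterations
	OutputDataBuffer.pBuffer = pMediaBuffer;
	DWORD nStatus = 0;
	if(pMediaObject->ProcessOutput(0, 1, &OutputDataBuffer, &nStatus) != S_OK)
		break; // S_FALSE or failure: no output was produced
	// ... deliver the produced output downstream ...
	if(!(OutputDataBuffer.dwStatus & DMO_OUTPUT_DATA_BUFFERF_INCOMPLETE))
		break; // the DMO reports no more pending output
}
const HRESULT nDiscontinuityResult = pMediaObject->Discontinuity(0);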

The case of incorrect behavior of stock DirectX Media Objects

Since Microsoft Windows Vista, the Media Foundation API offers a set of convenient Digital Signal Processors for video and audio processing. These include conversion helpers for video and audio, dual-interfaced as Media Foundation Transforms (MFTs) and DirectX Media Objects (DMOs). Earlier posts shed some light on DMO interfaces and the use of DMOs in DirectShow graphs with the help of DMO Wrapper Filter.

It appears, however, that some of the stock DMOs exhibit unexpected behavior when it comes to setting media types for input and output streams. The IMediaObject::SetInputType and IMediaObject::SetOutputType methods are expected not only to set media types, but also to test and clear them, depending on the flags passed in the arguments:

There are three modes defined, depending on the value of the dwFlags parameter (a usage sketch follows the list):

  • 0 – [test and] set media type
  • DMO_SET_TYPEF_TEST_ONLY – test only media type
  • DMO_SET_TYPEF_CLEAR – clear media type already set
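
To illustrate, this is how a caller exercises the three modes against the same input stream – a minimal sketch using the same assumed pMediaObject and InputMediaType variables as the snippets below:

// Test only: the media type is validated but must not become current
HRESULT nResult = pMediaObject->SetInputType(0, &InputMediaType, DMO_SET_TYPEF_TEST_ONLY);
// Test and set: on success the media type becomes the current input type
nResult = pMediaObject->SetInputType(0, &InputMediaType, 0);
// Clear: the media type pointer is ignored (may be NULL) and any current type is removed
nResult = pMediaObject->SetInputType(0, NULL, DMO_SET_TYPEF_CLEAR);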

When the DMO backs a DirectShow filter through DMO Wrapper Filter, there might be (and, in case of interactive operation in a GraphEdit-like application, there definitely are) tests, sets, and also resets during the pipeline building stage, so correct support for all flags is important for proper operation.

Unfortunately, some DMOs do not honor the flags and implement incorrect behavior. In particular,

  • Color Converter DSP (CLSID_CColorConvertDMO) does not do it right with DMO_SET_TYPEF_TEST_ONLY and is unable to test media types – instead it sets them
  • Video Resizer DSP (CLSID_CResizerDMO) does it even worse: it ignores both DMO_SET_TYPEF_TEST_ONLY and DMO_SET_TYPEF_CLEAR and attempts to set the media type even if the request was to clear it

The project below (SVN, Trac) demonstrates the problem in action.

An attempt to test and print current media type shows that testing actually sets the media type:

const HRESULT nTrySetInputTypeResult = pMediaObject->SetInputType(0, &InputMediaType, DMO_SET_TYPEF_TEST_ONLY);
_tprintf(_T("nTrySetInputTypeResult 0x%08x\n"), nTrySetInputTypeResult);
PrintInputType(pMediaObject, DMO_E_TYPE_NOT_SET, _T("we only tested, we did not set it"));

Output:

nTrySetInputTypeResult 0x00000000
nGetInputCurrentTypeResult 0x00000000 <<--- Incorrect, we only tested, we did not set it
Input: biWidth 1920, biHeight 1080

An attempt to clear produces a failure, since it is treated as an attempt to set and no real media type was passed as an argument:

const HRESULT nResetInputTypeResult = pMediaObject->SetInputType(0, NULL, DMO_SET_TYPEF_CLEAR);
_tprintf(_T("nResetInputTypeResult 0x%08x%s\n"), nResetInputTypeResult, FAILED(nResetInputTypeResult) ? _T(" <<--- Incorrect") : _T(""));

Output:

nResetInputTypeResult 0x80004003 <<--- Incorrect

The problem is posted to Microsoft Connect as “Digital Signal Processors do not honor flags arguments on DMO interface IMediaObject when setting input/output types”, where you can upvote it if you suffer from the bug, and also read feedback from Microsoft when they follow up.

libx264 illustrated

As libx264 has so many presets and tunes, I was curious how they all relate to one another when it comes to encoding video into H.264. I was more interested in single-pass encoding for live video, so the measurements are for this mode of operation, with the encoder running in CRF mode (constant rate factor, X264_RC_CRF).

So I took the Robotica_1080.wmv HD video in 1440×1080 resolution and batch-transcoded it into H.264 using libx264 (build 128) in various modes of operation (a configuration sketch follows the list):

  • Presets: “ultrafast”, “superfast”, “veryfast”, “faster”, “fast”, “medium”, “slow”, “slower”, “veryslow”
  • Tunes: “film”, “animation”, “grain”, “stillimage”, “psnr”, “ssim”, “fastdecode”, “zerolatency”, “touhou”
  • CRFs: 14, 17, 20, 23, 26
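
For reference, a minimal sketch of how one such run might be configured through the libx264 C API – hedged against the public x264.h of that era; the preset, tune, and CRF values are exactly the strings and numbers above:

#include <x264.h>

x264_param_t Param;
// Pick one preset/tune combination from the matrix above
if(x264_param_default_preset(&Param, "medium", "film") < 0)
	return E_FAIL;
Param.i_width = 1440;
Param.i_height = 1080;
// Single pass, constant rate factor
Param.rc.i_rc_method = X264_RC_CRF;
Param.rc.f_rf_constant = 23; // one of 14, 17, 20, 23, 26
if(x264_param_apply_profile(&Param, "high") < 0)
	return E_FAIL;
x264_t* pEncoder = x264_encoder_open(&Param);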

It is worth mentioning that libx264 does an EXCELLENT job in transcoding in terms of performance. The transcoding operation was a DirectShow graph of the following topology:

Some measurements are obviously not quite accurate, because it is not only encoding time that counts – WMV decoding time counts as well, etc. Still, this should give a good idea of how the modes stand side by side.

For every transcoding run I have the following values (Excel spreadsheet attached below):

  • Processor Time: the number of processor-milliseconds spent on the transcoding; I was measuring on an 8-core system, so at 100% load the processor time could be up to eight times higher than Elapsed Time (below), provided that all cores were used in full
  • Elapsed Time: milliseconds spent on the transcoding, regardless of how many cores were actually in use; because the original clip is 20 seconds long, everything below that is faster than realtime processing
  • Output File Size: size of the resulting MP4 video-only file; some headers count as well, however it is obviously mostly payload data; for a 20-second clip, 20 MB corresponds to an 8 mbit/s bitrate

Another derivative value is:

  • Processor Time/Elapsed Time: shows how fully the multicore system is utilized; some modes are clearly not using all available cores, while others do

Let us start with the pictures.

Average Elapsed Time per Preset/Tune (covering runs with different rate factors) shows that the slow+ presets take exponentially more time to encode. The psnr and ssim tunes transcode slightly faster, while the zerolatency tune is the most expensive.

The ultrafast and superfast presets produced significantly larger files, about 2x as large as those of the other presets.

Once again, note the exponential scale of Elapsed Time; the Processor Time chart is similar:

It is worth mentioning that the fastest presets do not use all CPU cores. Apart from being faster on their own, they leave some CPU time for other processing, which can be useful for live encoding applications and those processing multiple streams at once.

And finally, the detailed dependency of file size on preset and CRF rate. As we already discovered, ultrafast and superfast produce a larger stream, while the output of the other modes does not differ much (within a few percent, mostly on the slowest end). A step of three in rate factor decreases the amount of produced bytes to about 0.7x.

More fun charts can be obtained from the attached .XLS file.

Download links:

Enumerating Media Foundation Transforms (MFTs)

Matthew van Eerde already made a similar wrapper over MFTEnumEx in How to enumerate Media Foundation transforms on your system, and this one extends it with enumeration of attributes, also listing them in a human-friendly way.
This sort of code should perhaps have been among the Media Foundation SDK samples; however, we have what we have.

Media Foundation Transforms (MFTs) are registered and accessed through the registry, and are available for enumeration with and without qualifying criteria. Some of the transforms are dual DMO/MFT, while some are MFT-only, which makes their useful functionality unavailable directly to a DirectShow pipeline. Luckily, the MFT interface is similar to that of DMOs, making it reasonably possible to wrap one into the other. Comparison of MFTs and DMOs shows how the two form factors compare to each other.

The enumeration tool/utility shows which registered MFTs are available in the system. For example, the output from a Windows 7 workstation is provided below.

The output is a good cheat sheet for seeing support of media types in Windows components.
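
The core of such a tool is a straightforward loop over MFTEnumEx results. A minimal sketch (Windows 7 API; error handling mostly omitted) that prints the friendly names of all registered video decoder MFTs:

#include <stdio.h>
#include <mfapi.h>
#include <mftransform.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfuuid.lib")

MFStartup(MF_VERSION);
IMFActivate** ppActivates = NULL;
UINT32 nActivateCount = 0;
if(SUCCEEDED(MFTEnumEx(MFT_CATEGORY_VIDEO_DECODER, MFT_ENUM_FLAG_ALL, NULL, NULL, &ppActivates, &nActivateCount)))
{
	for(UINT32 nIndex = 0; nIndex < nActivateCount; nIndex++)
	{
		LPWSTR pszFriendlyName = NULL;
		UINT32 nFriendlyNameLength = 0;
		// MFT_FRIENDLY_NAME_Attribute holds the human-friendly transform name
		if(SUCCEEDED(ppActivates[nIndex]->GetAllocatedString(MFT_FRIENDLY_NAME_Attribute, &pszFriendlyName, &nFriendlyNameLength)))
		{
			wprintf(L"%s\n", pszFriendlyName);
			CoTaskMemFree(pszFriendlyName);
		}
		ppActivates[nIndex]->Release();
	}
	CoTaskMemFree(ppActivates);
}
MFShutdown();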

Download links:


Using Vista Video Resizer DSP in DirectShow, via DMO Wrapper Filter

Windows Vista introduced helpful video and audio Digital Signal Processors (DSPs) in DMO form factor, which however do not work smoothly with DMO Wrapper Filter and thus cannot be directly used in DirectShow.

There was perhaps no intent in the first place to extend DirectShow functionality with these new components, and no effort was put into providing this mode of operation; however, as long as the new classes are DMOs, it is still possible to tune them up to work in a DirectShow pipeline.

This sample application provides a code snippet showing how Video Resizer DSP can be used in DirectShow. There were some earlier discussions on MSDN Forums, and this complements those guidelines with code.

The idea is the following (a code sketch follows the list):

  • CoCreateInstance the DSP as DMO and add it to DMO Wrapper Filter
  • Use IWMResizerProps::SetFullCropRegion to initialize the DSP
  • Connect input pin
  • Set output type via IMediaObject::SetOutputType
  • IGraphBuilder::ConnectDirect output pin
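
A minimal sketch of these steps, with ATL smart pointers, error handling omitted, and pGraphBuilder and OutputMediaType assumed to exist (the wrapper forwarding QueryInterface to the hosted DMO is what makes IWMResizerProps and IMediaObject reachable on the filter):

#include <atlbase.h>
#include <dshow.h>
#include <mediaobj.h> // IMediaObject
#include <dmodshow.h> // IDMOWrapperFilter
#include <dmoreg.h> // DMOCATEGORY_VIDEO_EFFECT
#include <wmcodecdsp.h> // CLSID_CResizerDMO, IWMResizerProps

// Create DMO Wrapper Filter and load Video Resizer DSP into it
CComPtr<IBaseFilter> pBaseFilter;
pBaseFilter.CoCreateInstance(CLSID_DMOWrapperFilter);
CComQIPtr<IDMOWrapperFilter> pDmoWrapperFilter = pBaseFilter;
pDmoWrapperFilter->Init(CLSID_CResizerDMO, DMOCATEGORY_VIDEO_EFFECT);
pGraphBuilder->AddFilter(pBaseFilter, L"Resizer");
// Initialize the DSP (the wrapper forwards QueryInterface to the DMO)
CComQIPtr<IWMResizerProps> pResizerProps = pBaseFilter;
pResizerProps->SetFullCropRegion();
// ... connect the input pin here ...
// Set the desired output resolution directly on the DMO
CComQIPtr<IMediaObject> pMediaObject = pBaseFilter;
pMediaObject->SetOutputType(0, &OutputMediaType, 0); // assumed DMO_MEDIA_TYPE with doubled height
// ... then IGraphBuilder::ConnectDirect the output pin ...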

The sample application takes a video file (note that for the brevity of the sample not all files will be supported: there is an assumption that files are decoded into the VIDEOINFOHEADER media type, and we limit color spaces to 32-bit RGB only to avoid problems along the way in this tiny sample).

The application takes a file path (I recommend .WMV), creates a DirectShow pipeline, adds a Sample Grabber filter to force the color space to 32-bit RGB, adds the resizer and sets it up to double the video height (but not the width), and plays the video.
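
The 32-bit RGB constraint is applied through the Sample Grabber's media type restriction; a minimal sketch, assuming pSampleGrabber is the filter's ISampleGrabber interface:

#include <qedit.h> // ISampleGrabber

AM_MEDIA_TYPE MediaType;
ZeroMemory(&MediaType, sizeof MediaType);
MediaType.majortype = MEDIATYPE_Video;
MediaType.subtype = MEDIASUBTYPE_RGB32;
// Sample Grabber will now only accept a 32-bit RGB video connection
pSampleGrabber->SetMediaType(&MediaType);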

The application’s filter graph looks like this:

And the video window is stretched twice in height:

A binary [Win32] and Visual C++ .NET 2010 source code [Trac, Subversion] are available from SVN; the important part goes here.

Video Decoder DMO and AM_SAMPLE_PREROLL

This does not seem to be documented anywhere, so it makes sense to mention it. DMO Wrapper Filter hosting a video decoder DMO will receive preroll media samples marked with the AM_SAMPLE_PREROLL flag (alternatively available through IMediaSample::IsPreroll), but it won't even forward these samples to the underlying DMO – they are just ignored.

MSDN says:

Preroll samples are processed but not displayed. They are located in the media stream before the displayable samples.

However, this behavior of DMO Wrapper Filter does not seem to be correct. A DMO might need (in my case at least; maybe there are other scenarios too) to initialize its decoding context from a splice point and then decode further samples. With the wrapper skipping samples this way, the DMO does not receive the splice point data and won't be able to start decoding non-preroll samples until it receives a non-preroll splice point media sample…
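
For reference, a minimal sketch of how a filter or a custom DMO host can detect the samples the wrapper drops (IMediaSample::IsPreroll returns S_OK for a preroll sample and S_FALSE otherwise):

// Minimal sketch: detect preroll samples that DMO Wrapper Filter drops
BOOL IsPrerollMediaSample(IMediaSample* pMediaSample)
{
	// A custom host would still forward such a sample to the decoder so
	// that the decoding context is initialized from the splice point,
	// while suppressing its presentation
	return pMediaSample->IsPreroll() == S_OK;
}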

YV12, Extended Video Renderer Strides, Private DMO and more

Recently it was time to sort out an issue with a video DMO which outputs YV12 video and at the same time is capable of supporting extended video strides, in order to efficiently make a direct connection to Video Mixing Renderer filters.
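
To recap the layout involved: YV12 is planar, with a full-resolution Y plane followed by quarter-resolution V and U planes at half the stride, so under an extended stride the plane pointers are computed as in this minimal sketch (nExtendedStride is the assumed renderer-provided stride, greater than or equal to the frame width):

// Minimal sketch: addressing YV12 planes under an extended stride
BYTE* pbYPlane = pbFrame;
BYTE* pbVPlane = pbYPlane + (SIZE_T) nExtendedStride * nHeight; // V plane follows the full-resolution Y plane
BYTE* pbUPlane = pbVPlane + (SIZE_T) (nExtendedStride / 2) * (nHeight / 2); // half-stride V plane precedes U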

From past experience I already knew that some bugs were definitely involved, but their effect was yet unexplored. For a testbed application, I took the good old FrameRateSample02 application, which generates multiple video feeds and routes them to video renderers:

FrameRateSample02 Application with New Choices

With the new source video choices, the application is capable of constructing filter graphs that use a private DMO (hosted inside the executable) wrapped with DMO Wrapper Filter, with a graph topology shown below:

Filter Graph with a Private DMO
