Greetings from H.265 / HEVC Video Decoder Media Foundation Transform

The H.265 / HEVC Video Decoder Media Foundation Transform has been around for a while, but using Media Foundation even one step off the straightforward basic paths is like walking through a minefield.

The twenty-liner below hits a memory access violation inside IMFTransform::GetInputStreamInfo (“Exception thrown at 0x6D1E71E7 (hevcdecoder.dll) in MfHevcDecoder01.exe: 0xC0000005: Access violation reading location 0x00000000.”):

#include "stdafx.h"
#include <mfapi.h>
#include <mftransform.h>
#include <wmcodecdsp.h>
#include <atlbase.h>
#include <atlcom.h>

#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfuuid.lib")
#pragma comment(lib, "wmcodecdspuuid.lib")

int main()
{
    ATLVERIFY(SUCCEEDED(CoInitialize(NULL)));
    ATLVERIFY(SUCCEEDED(MFStartup(MF_VERSION)));
    CComPtr<IMFTransform> pTransform;
#if 1
    // Path 1: enumerate video decoders that accept HEVC input and activate the first match
    static const MFT_REGISTER_TYPE_INFO InputTypeInformation = { MFMediaType_Video, MFVideoFormat_HEVC };
    IMFActivate** ppActivates;
    UINT32 nActivateCount = 0;
    ATLVERIFY(SUCCEEDED(MFTEnumEx(MFT_CATEGORY_VIDEO_DECODER, 0, &InputTypeInformation, NULL, &ppActivates, &nActivateCount)));
    ATLASSERT(nActivateCount > 0);
    ATLVERIFY(SUCCEEDED(ppActivates[0]->ActivateObject(__uuidof(IMFTransform), (VOID**) &pTransform)));
    // NOTE: the activation array is deliberately not released here to keep the repro short
#else
    // Path 2: instantiate the transform directly by CLSID, bypassing IMFActivate
    ATLVERIFY(SUCCEEDED(pTransform.CoCreateInstance(CLSID_CMSH265EncoderMFT)));
#endif
    MFT_INPUT_STREAM_INFO InputInformation;
    // The access violation quoted above is raised inside this call when path 1 is used
    ATLVERIFY(SUCCEEDED(pTransform->GetInputStreamInfo(0, &InputInformation)));
    return 0;
}

Interestingly, the alternative path that bypasses IMFActivate (see #if above) seems to work fine.

 

Microsoft AAC Encoder’s MF_E_TRANSFORM_NEED_MORE_INPUT after MFT_OUTPUT_STATUS_SAMPLE_READY

Media Foundation AAC Encoder is a pure MFT, as opposed to the legacy DSPs, which expose dual DMO/MFT interfaces and presumably have a higher chance of small artifacts.

The transform is synchronous and is supposed to be simpler inside: fully passive and driven by input/output calls.

Nevertheless, when it advertises MFT_OUTPUT_STATUS_SAMPLE_READY via the IMFTransform::GetOutputStatus call, it might falsely indicate availability of data: the subsequent ProcessOutput call returns MF_E_TRANSFORM_NEED_MORE_INPUT. Documented behavior:

If the method returns the MFT_OUTPUT_STATUS_SAMPLE_READY flag, it means you can generate one or more output samples by calling IMFTransform::ProcessOutput.

MFTs are not required to implement this method. If the method returns E_NOTIMPL, you must call ProcessOutput to determine whether the transform has output data.

The method is optional, but it is implemented on this particular MFT. Also, this MFT is one of the stock transforms that are documented for public use. Microsoft could apparently have done a better job implementing it cleanly.
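
One defensive way to cope with this is to treat the status flag as a hint only and let ProcessOutput be the authority. The sketch below models such a loop in portable C++ against a hypothetical ITransformLike interface and a FakeAacEncoder stand-in; both are inventions for illustration and not Media Foundation types. Only the numeric constants are copied from the Media Foundation headers. With a real IMFTransform the loop would presumably have the same shape, with samples and output buffers added.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Values copied from the Media Foundation headers so the sketch builds anywhere.
constexpr std::int32_t kOk = 0; // S_OK
constexpr std::int32_t kNeedMoreInput = static_cast<std::int32_t>(0xC00D6D72); // MF_E_TRANSFORM_NEED_MORE_INPUT
constexpr std::uint32_t kSampleReady = 0x1; // MFT_OUTPUT_STATUS_SAMPLE_READY

// Hypothetical minimal stand-in for the two IMFTransform calls discussed above.
struct ITransformLike
{
    virtual ~ITransformLike() = default;
    virtual std::int32_t GetOutputStatus(std::uint32_t* flags) = 0;
    virtual std::int32_t ProcessOutput(int* frame) = 0;
};

// Hypothetical fake that mimics the observed behavior: it always advertises
// SAMPLE_READY, yet only delivers as many frames as are really queued.
struct FakeAacEncoder : ITransformLike
{
    std::vector<int> queued;

    std::int32_t GetOutputStatus(std::uint32_t* flags) override
    {
        assert(flags);
        *flags = kSampleReady; // over-optimistic, regardless of the queue state
        return kOk;
    }
    std::int32_t ProcessOutput(int* frame) override
    {
        assert(frame);
        if (queued.empty())
            return kNeedMoreInput;
        *frame = queued.front();
        queued.erase(queued.begin());
        return kOk;
    }
};

// Drains all available output. The status flag is only used to skip a call
// that is known to be pointless; NEED_MORE_INPUT ends the loop without error.
inline std::int32_t DrainOutput(ITransformLike& transform, std::vector<int>& out)
{
    for (;;)
    {
        std::uint32_t flags = 0;
        const std::int32_t statusResult = transform.GetOutputStatus(&flags);
        if (statusResult == kOk && !(flags & kSampleReady))
            return kOk; // the transform itself says nothing is ready
        int frame = 0;
        const std::int32_t result = transform.ProcessOutput(&frame);
        if (result == kNeedMoreInput)
            return kOk; // misleading flag or unimplemented status: not an error
        if (result != kOk)
            return result; // genuine failure
        out.push_back(frame);
    }
}
```

Calling DrainOutput on a FakeAacEncoder with two queued frames collects both and returns success; calling it again on the now empty fake returns success with no frames, even though the status flag still claims that a sample is ready.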


Microsoft Media Foundation code samples online

Media Foundation Team Blog (2009-2011) lost connection with the community some time ago, and its sample code hosted at http://code.msdn.microsoft.com/mfblog passed away too.

Four of the five samples were saved and made available again by user mofo77; one of the two originally missing samples (MFSimpleEncode and MFManagedEncode) is still wanted.

I put the sample code online at GitHub here. If someone happens to have anything that is still missing, please post there or email me to have it pushed to the repository. Feel free to use the samples if, for whatever reason, Media Foundation is what you decided to mess with.

Also, be aware that older Windows SDK Media Foundation samples can be found in:

Since this is turning into a collection of Media Foundation related links, here is some bonus reading:

Now you are well set, GOOD LUCK!

Effects of IMFVideoProcessorControl2::EnableHardwareEffects

IMFVideoProcessorControl2::EnableHardwareEffects method:

Enables effects that were implemented with IDirectXVideoProcessor::VideoProcessorBlt.

[…] Specifies whether effects are to be enabled. TRUE specifies to enable effects. FALSE specifies to disable effects.

All right, it is apparently not IDirectXVideoProcessor, and the MSDN link behind the identifier takes one to the correct Direct3D 11 API method: ID3D11VideoContext::VideoProcessorBlt.

The worse news is that with the effects “enabled”, the transform fails to deliver proper output and produces a green/black fill instead of the proper image. (The whole thing belongs to the Video Processing MFT, Media Foundation’s Swiss knife of conversion, with just one blade and no scissors though.)

Or, perhaps, this counts as a hardware effect.

No conversion with MF_CONNECT_ALLOW_CONVERTER

Microsoft Media Foundation Media Session API topology resolution is way less clear than in DirectShow. The API takes over part of the component connection process and makes it less transparent to the API consumer. Also, while DirectShow Intelligent Connect is agnostic to the usage scenario, Media Foundation Media Session apparently targets playback scenarios, and its topology resolution process is tuned accordingly.

MF_CONNECT_ALLOW_DECODER

Add a decoder transform upstream upstream from this node, if needed to complete the connection. The numeric value of this flag includes the MF_CONNECT_ALLOW_CONVERTER flag. Therefore, setting the MF_CONNECT_ALLOW_DECODER flag sets the MF_CONNECT_ALLOW_CONVERTER flag as well.

[…] If this attribute is not set, the default value is MF_CONNECT_ALLOW_DECODER.

Well, that’s double upstream, which suggests that the thing is impressively reliable. However, it is not.

In a non-playback topology, if a direct connection is impossible and the required conversion is not typical for playback, the MF_CONNECT_ALLOW_CONVERTER flag does not seem to help topology resolution.

Apparently, Microsoft does have suitable code, notably used in the Sink Writer API; however, it does not seem to be available in any form other than a package deal with the Sink Writer object and its own limitations. Media Session API does not implement this (non-playback, that is) style of topology resolution and node connection. Transcode API has the necessary topology resolution code too, but again it comes with its own constraints, making it useless unless you want to do something really simple.
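
For reference, the relationship between the two flag values quoted from the documentation above can be checked in a few lines. This is only a sketch: the enumerator values are copied by hand from MF_CONNECT_METHOD in mfidl.h so that it builds without the Windows SDK, and the ConnectMethod and Includes names are made up for the illustration.

```cpp
#include <cstdint>

// Enumerator values copied by hand from MF_CONNECT_METHOD in mfidl.h;
// the names on the left are local to this sketch.
enum ConnectMethod : std::uint32_t
{
    ConnectDirect = 0x0,         // MF_CONNECT_DIRECT
    ConnectAllowConverter = 0x1, // MF_CONNECT_ALLOW_CONVERTER
    ConnectAllowDecoder = 0x3,   // MF_CONNECT_ALLOW_DECODER
};

// True when every bit of `flag` is also set in `value`.
constexpr bool Includes(std::uint32_t value, std::uint32_t flag)
{
    return (value & flag) == flag;
}

// The decoder flag carries the converter bit, but not the other way round.
static_assert(Includes(ConnectAllowDecoder, ConnectAllowConverter), "decoder implies converter");
static_assert(!Includes(ConnectAllowConverter, ConnectAllowDecoder), "converter does not imply decoder");
```

So the default value already has the converter bit set, and the missing conversion described above is not a matter of forgetting to set a flag.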

 

Reference Signal Source: audio as Media Foundation source

The video part of the reference signal source for DirectShow already received a Media Foundation source interface earlier.

This time, the update implements a separate Media Foundation source for audio. MfGenerate2 sample code gives an idea of how to initialize the source:

using namespace AlaxInfoDirectShowReferenceSource;
CComPtr<IAudioMediaSource> pSource;
__C(pSource.CoCreateInstance(__uuidof(AudioMediaSource)));
__C(pSource->SetMediaType(NULL, g_nSampleRate, g_nChannelCount, g_nBitDepth));
__C(pSource->put_Duration((DOUBLE) g_nDuration));
CComPtr<IMFMediaSource> pAudioMediaSource = pSource;

The source can be given a specific format using the Media Foundation stream descriptor’s media type handler, or set up via a private COM interface.

The source can be given a sampling rate, a channel count (all channels receive the same signal) and a bit depth for PCM audio formats (8, 16..32), or the 32-bit IEEE floating point format.

Video and audio streams can also be combined into an aggregate source (video+audio) to produce a multi-track output. The MfGenerate2 sample shows the approach as well:
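
To make the "all channels receive the same signal" part concrete, here is a minimal portable sketch of filling an interleaved 16-bit PCM buffer. It is not the source's actual implementation: the GenerateTone name, the sine waveform, the frequency and the amplitude are all assumptions chosen for illustration, and only the 16-bit PCM case is covered.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fills an interleaved 16-bit PCM buffer with a sine tone, writing the same
// value to every channel of a frame. Waveform, frequency and amplitude are
// illustrative assumptions, not properties of the actual source.
inline std::vector<std::int16_t> GenerateTone(unsigned sampleRate, unsigned channelCount,
    double frequency, std::size_t frameCount, double amplitude = 0.5)
{
    assert(sampleRate > 0 && channelCount > 0);
    const double kTwoPi = 6.283185307179586;
    std::vector<std::int16_t> samples(frameCount * channelCount);
    for (std::size_t frame = 0; frame < frameCount; ++frame)
    {
        // One value per frame, scaled to the signed 16-bit range
        const double value = amplitude * std::sin(kTwoPi * frequency * frame / sampleRate);
        const std::int16_t sample = static_cast<std::int16_t>(std::lround(value * 32767.0));
        // Interleaved layout: the channels of one frame are stored next to each other
        for (unsigned channel = 0; channel < channelCount; ++channel)
            samples[frame * channelCount + channel] = sample;
    }
    return samples;
}
```

For example, GenerateTone(48000, 2, 1000.0, 480) yields 960 samples (10 ms of stereo audio) in which the left and right values of every frame are identical.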

__D(pVideoMediaSource || pAudioMediaSource, E_UNNAMED);
if(pVideoMediaSource && pAudioMediaSource)
{
    CComPtr<IMFCollection> pCollection;
    __C(MFCreateCollection(&pCollection));
    __C(pCollection->AddElement(pVideoMediaSource));
    __C(pCollection->AddElement(pAudioMediaSource));
    __C(MFCreateAggregateSource(pCollection, &pMediaSource));
} else
    pMediaSource = pVideoMediaSource ? pVideoMediaSource : pAudioMediaSource;
_A(pMediaSource);

The sample project is capable of producing output of this kind:

Download links