Getting MF_E_TRANSFORM_NEED_MORE_INPUT from Video Processor MFT’s ProcessOutput just to let it take next input

Another example of how Microsoft Media Foundation can be annoying in small things. So we have this Video Processor MFT transform, which addresses multiple video conversion tasks:

The video processor MFT is a Microsoft Media Foundation transform (MFT) that performs colorspace conversion, video resizing, deinterlacing, frame rate conversion, rotation, cropping, spatial left and right view unpacking, and mirroring.

It is easy to see that Microsoft does not offer a lot of DSPs, and even fewer of them are GPU friendly. Video Processor MFT is a “swiss army knife” tool: it takes care of video fitting tasks, in an efficient way and in task combinations, and it is able to take advantage of GPU processing with a fallback to a software code path, such as known earlier from the Color Converter DSP and similar.

Now the main question is, if you are offering just one thing, is there any chance you can do it right?

First of all, the API is not feature rich; it offers just the basics via the IMFVideoProcessorControl interface. Okay, some functionality might not be available in the fallback software code path, but this is still the only Direct3D 11 aware conversion component on offer, so it could still expose more options for those who want to take advantage of GPU-enabled conversions.
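
For illustration, here is a minimal sketch of what those basics look like in code. CLSID_VideoProcessorMFT and IMFVideoProcessorControl are the real identifiers; the particular mirroring, rotation and rectangle values are arbitrary examples, and COM is assumed to be initialized already:

#include <atlbase.h>
#include <mfidl.h>
#include <mftransform.h>

CComPtr<IMFTransform> pTransform;
ATLENSURE_SUCCEEDED(pTransform.CoCreateInstance(CLSID_VideoProcessorMFT));
CComPtr<IMFVideoProcessorControl> pVideoProcessorControl;
ATLENSURE_SUCCEEDED(pTransform->QueryInterface(&pVideoProcessorControl));
ATLENSURE_SUCCEEDED(pVideoProcessorControl->SetMirror(MIRROR_NONE));
ATLENSURE_SUCCEEDED(pVideoProcessorControl->SetRotation(ROTATION_NORMAL));
RECT SourceRectangle { 0, 0, 1280, 720 }; // crop rectangle, arbitrary values
ATLENSURE_SUCCEEDED(pVideoProcessorControl->SetSourceRectangle(&SourceRectangle));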

With its internal affiliation with the Direct3D 11 Video Processor API, it would be worth mentioning how exactly the Direct3D API is utilized internally, the limitations, and perhaps some advanced options to customize the conversion: the underlying API is more flexible than the MFT.
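
As a hint of that Direct3D 11 affiliation, this is roughly how a caller opts the MFT into its GPU code path. The sketch assumes an existing ID3D11Device created with video support (pDevice below) and the pTransform instance from the sketch above:

UINT nResetToken;
CComPtr<IMFDXGIDeviceManager> pDeviceManager;
ATLENSURE_SUCCEEDED(MFCreateDXGIDeviceManager(&nResetToken, &pDeviceManager));
// pDevice: ID3D11Device created with D3D11_CREATE_DEVICE_VIDEO_SUPPORT (and multithread protection enabled)
ATLENSURE_SUCCEEDED(pDeviceManager->ResetDevice(pDevice, nResetToken));
ATLENSURE_SUCCEEDED(pTransform->ProcessMessage(MFT_MESSAGE_SET_D3D_MANAGER, reinterpret_cast<ULONG_PTR>(pDeviceManager.p)));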

The documentation is not just scarce, it is also incomplete and inconsistent. The MFT's page does not mention its implementation of the IMFVideoProcessorControl2 interface, while the interface itself is described as belonging to the MFT. Admittedly, I wrote before that this interface is known for giving some trouble.

The MFT is designed to work in Media Foundation pipelines, such as those hosted by Media Session and others. However, it does not take a rocket scientist to realize that if you offer developers just one thing for a broad range of tasks, the API will be used in various scenarios, including, for example, as a standalone conversion API.

They should have mentioned in the documentation that the MFT behaves significantly differently in GPU and CPU modes, for example in the way output samples are produced: CPU mode requires the caller to supply a buffer for the output to be written to, while GPU mode, on the contrary, provides its own output textures with data from an internally managed pool (this can be changed, but it is the default behavior). This is fine for the Media Session API and the like, but those are also poorly documented, so it is not very helpful overall.
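
At least the difference is discoverable at runtime. A sketch of how a caller can detect which of the two behaviors is in effect, instead of hard-coding assumptions, by checking the output stream info flags:

MFT_OUTPUT_STREAM_INFO StreamInformation { };
ATLENSURE_SUCCEEDED(pTransform->GetOutputStreamInfo(0, &StreamInformation));
if(StreamInformation.dwFlags & MFT_OUTPUT_STREAM_PROVIDES_SAMPLES)
{
    // GPU mode: leave MFT_OUTPUT_DATA_BUFFER::pSample null, the MFT returns its own texture-backed samples
}
else
{
    // CPU mode: allocate an IMFSample with a buffer of at least StreamInformation.cbSize bytes
    // and set it as pSample before calling ProcessOutput
}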

I am finally getting to the reason which inspired me to write this post in the first place: doing input and output with the Video Processor MFT. This is such a fundamental task that MSDN does have a few words on it:

When you have configured the input type and output type for an MFT, you can begin processing data samples. You pass samples to the MFT for processing by using the IMFTransform::ProcessInput method, and then retrieve the processed sample by calling IMFTransform::ProcessOutput. You should set accurate time stamps and durations for all input samples passed. Time stamps are not strictly required but help maintain audio/video synchronization. If you do not have the time stamps for your samples it is better to leave them out than to use uncertain values.

If you use the MFT for conversion and you set the time stamps accurately, it is easy to achieve “one input – one output” behavior. The MFT is additionally synchronous, so it does not require implementing the asynchronous processing model, and it is possible to consume the API in a really straightforward way: media types set, input 1, output 1, input 2, output 2 and so on, everything within a single thread, linearly. Note that the MFT does not necessarily produce one output sample for every input sample; it is just possible to manage it this way.
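
For completeness, a sketch of the one-time media type negotiation that precedes the linear sequence; the 1280x720 NV12 input and ARGB32 output are arbitrary examples of a conversion the MFT handles:

CComPtr<IMFMediaType> pInputMediaType;
ATLENSURE_SUCCEEDED(MFCreateMediaType(&pInputMediaType));
ATLENSURE_SUCCEEDED(pInputMediaType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video));
ATLENSURE_SUCCEEDED(pInputMediaType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_NV12));
ATLENSURE_SUCCEEDED(MFSetAttributeSize(pInputMediaType, MF_MT_FRAME_SIZE, 1280, 720));
ATLENSURE_SUCCEEDED(pTransform->SetInputType(0, pInputMediaType, 0));
CComPtr<IMFMediaType> pOutputMediaType;
ATLENSURE_SUCCEEDED(MFCreateMediaType(&pOutputMediaType));
ATLENSURE_SUCCEEDED(pOutputMediaType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video));
ATLENSURE_SUCCEEDED(pOutputMediaType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_ARGB32));
ATLENSURE_SUCCEEDED(MFSetAttributeSize(pOutputMediaType, MF_MT_FRAME_SIZE, 1280, 720));
ATLENSURE_SUCCEEDED(pTransform->SetOutputType(0, pOutputMediaType, 0));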

However, even in such a simple scenario the MFT finds a way to horse around. Even when it has finished producing output for a given input, it still requires an additional IMFTransform::ProcessOutput call, which returns MF_E_TRANSFORM_NEED_MORE_INPUT, just to unlock itself for further input. Failing to make this unnecessary call and receive the failure status results in being unable to feed new input in: IMFTransform::ProcessInput returns MF_E_NOTACCEPTING. Even though this sort of matches the documented behavior (for example, in the ASF related documentation Processing Data in the Encoder), where the MFT host is expected to request output until it is no longer available, nothing in the documented contract prevents the MFT from being friendlier on its end. Given the state and the role of this API, it should have been made super friendly to developers, and Microsoft failed to reach the minimal acceptable level of friendliness here.

Now the linear code snippet looks this way:

ATLENSURE_SUCCEEDED(pTransform->ProcessInput(0, pSample, 0));
MFT_OUTPUT_DATA_BUFFER OutputDataBuffer = { };
// …
DWORD nStatus;
ATLENSURE_SUCCEEDED(pTransform->ProcessOutput(0, 1, &OutputDataBuffer, &nStatus));
// …
// NOTE: Kick the transform to unlock its input
const HRESULT nProcessOutputResult = pTransform->ProcessOutput(0, 1, &OutputDataBuffer, &nStatus);
ATLASSERT(nProcessOutputResult == MF_E_TRANSFORM_NEED_MORE_INPUT);

MediaFoundationDxgiCapabilities: with AMF SDK H.264 encoder related data

Yet another post on the AMD AMF SDK and, hopefully, a helpful tool reference. I updated one of the capability discovery applications (MediaFoundationDxgiCapabilities) so that it includes a printout of AMFVideoEncoderVCE_AVC related properties, similarly to how they are printed for Nvidia video adapters.

Information includes:

  • runtime version (and its availability in the first place!)
  • maximum supported resolution, profile and level
  • formats, with respect to the capabilities reported by the Direct3D 11 initialized component; specifically, the data shows which surface formats the encoding component is internally capable of converting on the way to the hardware encoder

It looks like this tool has not been described here in detail earlier; it also reports other DXGI related information (such as, for example, the order in which DXGI adapters are enumerated depending on whether an app runs on the iGPU or dGPU of a hybrid system, and DXGI desktop duplication related information).

This information is reported directly from AMF, as opposed to information received from the Media Foundation API (which is partially included as well, though). On video encoders reported via Media Foundation, not just H.264 ones, see MediaFoundationVideoEncoderTransforms: Detecting support for hardware H.264 video encoders.
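
A sketch of how this kind of data can be queried through the AMF public headers (the AMFFactory helper from public/common and the capability interfaces from public/include/components); names and signatures follow the 1.4.x SDK headers as I read them, so treat this as an outline rather than the tool's exact code:

#include <cstdio>
#include "public/common/AMFFactory.h"
#include "public/include/components/VideoEncoderVCE.h"

void PrintAvcEncoderCapabilities()
{
    if(g_AMFFactory.Init() != AMF_OK)
        return; // no AMF runtime available in the first place
    amf::AMFContextPtr pContext;
    g_AMFFactory.GetFactory()->CreateContext(&pContext);
    pContext->InitDX11(nullptr); // Direct3D 11 initialized component, as in the printout above
    amf::AMFComponentPtr pEncoder;
    if(g_AMFFactory.GetFactory()->CreateComponent(pContext, AMFVideoEncoderVCE_AVC, &pEncoder) == AMF_OK)
    {
        amf::AMFCapsPtr pCaps;
        pEncoder->GetCaps(&pCaps);
        wprintf(L"Acceleration Type: %d\n", pCaps->GetAccelerationType());
        amf::AMFIOCapsPtr pInputCaps;
        pCaps->GetInputCaps(&pInputCaps);
        amf_int32 nMinimalWidth, nMaximalWidth;
        pInputCaps->GetWidthRange(&nMinimalWidth, &nMaximalWidth);
        wprintf(L"Width Range: %d - %d\n", nMinimalWidth, nMaximalWidth);
        for(amf_int32 nIndex = 0; nIndex < pInputCaps->GetFormatCount(); nIndex++)
        {
            amf::AMF_SURFACE_FORMAT Format;
            amf_bool bNative;
            pInputCaps->GetFormatAt(nIndex, &Format, &bNative);
            wprintf(L"Format: %d%ls\n", Format, bNative ? L" Native" : L"");
        }
    }
    g_AMFFactory.Terminate();
}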

# Display Devices

 * Radeon RX 570 Series
  * Instance: PCI\VEN_1002&DEV_67DF&SUBSYS_E3871DA2&REV_EF\4&2D78AB8F&0&0008
  * DEVPKEY_Device_Manufacturer: Advanced Micro Devices, Inc.
  * DEVPKEY_Device_DriverVersion: 24.20.13017.5001
  * DEVPKEY_Undocumented_LUID: 0.0x0000D1B8

[...]

##### AMD AMF SDK Specific

 * AMF SDK Version: 1.4.9.0 // https://gpuopen.com/gaming-product/advanced-media-framework/
 * AMF Runtime Version: 1.4.9.0

###### AMFVideoEncoderVCE_AVC

 * Acceleration Type: AMF_ACCEL_HARDWARE
 * AMF_VIDEO_ENCODER_CAP_MAX_BITRATE: 100,000,000
 * AMF_VIDEO_ENCODER_CAP_NUM_OF_STREAMS: 16
 * AMF_VIDEO_ENCODER_CAP_MAX_PROFILE: AMF_VIDEO_ENCODER_PROFILE_HIGH
 * AMF_VIDEO_ENCODER_CAP_MAX_LEVEL: 52
 * AMF_VIDEO_ENCODER_CAP_BFRAMES: 0
 * AMF_VIDEO_ENCODER_CAP_MIN_REFERENCE_FRAMES: 1
 * AMF_VIDEO_ENCODER_CAP_MAX_REFERENCE_FRAMES: 16
 * AMF_VIDEO_ENCODER_CAP_MAX_TEMPORAL_LAYERS: 1
 * AMF_VIDEO_ENCODER_CAP_FIXED_SLICE_MODE: 0
 * AMF_VIDEO_ENCODER_CAP_NUM_OF_HW_INSTANCES: 1

####### Input

 * Width Range: 64 - 4,096
 * Height Range: 64 - 2,160
 * Vertical Alignment: 32
 * Format Count: 6
 * Format: AMF_SURFACE_NV12 Native
 * Format: AMF_SURFACE_YUV420P 
 * Format: AMF_SURFACE_YV12 
 * Format: AMF_SURFACE_BGRA 
 * Format: AMF_SURFACE_RGBA 
 * Format: AMF_SURFACE_ARGB 
 * Memory Type Count: 4
 * Memory Type: AMF_MEMORY_DX11 Native
 * Memory Type: AMF_MEMORY_OPENCL 
 * Memory Type: AMF_MEMORY_OPENGL 
 * Memory Type: AMF_MEMORY_HOST 
 * Interlace Support: 0

####### Output

 * Width Range: 64 - 4,096
 * Height Range: 64 - 2,160
 * Vertical Alignment: 32
 * Format Count: 1
  * Format: AMF_SURFACE_NV12 Native
 * Memory Type Count: 4
  * Memory Type: AMF_MEMORY_DX11 Native
  * Memory Type: AMF_MEMORY_OPENCL 
  * Memory Type: AMF_MEMORY_OPENGL 
  * Memory Type: AMF_MEMORY_HOST 
 * Interlace Support: 0

Note that more detailed information can be obtained using the amf\public\samples\CPPSamples\CapabilityManager application from the SDK itself, if you build and run it.

Download links

Runtime H.264 encoder setting changes with AMD H.264 hardware MFT

One more AMD MFT related post for now. Some time ago I mentioned that Intel's implementation of the hardware H.264 video encoder Media Foundation Transform (MFT) does not correctly implement runtime changes of encoding settings. The respective Intel Developer Zone submission has received no follow-up and, presumably, no attention over time. It was a good moment to check how AMD is doing when it comes to adjusting encoding settings on an active session.

Let us recap:

  • Microsoft: software encoder supports the feature as documented;
  • Intel: fails to change settings;
  • Nvidia: settings change is supported to the minimal documented extent;
  • AMD: ?

The AMD H.264 hardware encoder fails to support the feature that MSDN documentation mentions as required: the respective request fails with 0x80004001 E_NOTIMPL “Not implemented”.
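
To be concrete about what such a request looks like, here is a sketch of a mid-session adjustment through the encoder MFT's ICodecAPI; CODECAPI_AVEncCommonMeanBitRate is my example of a typical dynamically adjustable property rather than necessarily the exact one from the experiment:

#include <codecapi.h>

// pTransform here is the AMD H.264 encoder MFT, already configured and streaming
CComPtr<ICodecAPI> pCodecApi;
ATLENSURE_SUCCEEDED(pTransform->QueryInterface(&pCodecApi));
VARIANT vValue { };
vValue.vt = VT_UI4;
vValue.ulVal = 4 * 1000 * 1000; // request 4 Mbps mid-stream
const HRESULT nSetValueResult = pCodecApi->SetValue(&CODECAPI_AVEncCommonMeanBitRate, &vValue);
// Observed: 0x80004001 E_NOTIMPL instead of the documented dynamic change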

AMF SDK in AMD Video Encoder MFTs

I had wrongly assumed that AMD's H.264 video encoder MFT (and other Media Foundation primitives) were not based on the AMF SDK. There were some references to ATI Technologies in the binary (AMDh264Enc64.dll) and, most importantly, I was unable to change the tracing level of the MFT component at runtime. My guess was that if the AMF runtime were shared while the MFT is loaded, then a change of the tracing level would affect the MFT, which was not the case (or I did it wrong). Also, the MFT DLL has no direct reference to the AMF runtime amfrt64.dll.

However, an attempt to use the AMD hardware video encoder incorrectly revealed its AMF base:

2018-10-02 11:14:52.434 7128 [AMFEncoderVCE] Error: …\runtime\src\components\EncoderVCE\EncoderVCEImpl.cpp(3057):Assertion failed:Surface format is not supported
2018-10-02 11:14:52.434 7128 [AMF MFT AMFEngine] Error: …\runtime\src\mft\mft-framework\Engine.cpp(348):AMFEngine(0)::SubmitInput(): SubmitInput() failed, AMF_RESULT(AMF_SURFACE_FORMAT_NOT_SUPPORTED)
2018-10-02 11:14:52.434 7128 [AMFAsyncMFTBase] Error: …\runtime\src\mft\mft-framework\AsyncMFTBase.cpp(1103):AsyncMFTBase(0)::ProcessInput(): SubmitInput() failed, AMF_RESULT(AMF_SURFACE_FORMAT_NOT_SUPPORTED)

Apparently the encoder MFT implementation is built on top of AMF and VCE, perhaps with a static link to the AMF runtime.

A bonus picture (from True Audio Next and Multimedia AMD APIs in game and VR application development) suggests that the AMD MFT shares the runtime with other consumers, which seems to be not exactly accurate (the runtime appears to be a private static AMF_CORE_STATIC dependency there):

AMD’s three ways to spell “enhancement”

From AMD AMF SDK documentation, AMF_Video_Encode_API.pdf:

The typos are not a big deal for development, even though the symbol with the typo is a shortcut to a string with the same typo:

#define AMF_VIDEO_ENCODER_NUM_TEMPORAL_ENHANCMENT_LAYERS L"NumOfTemporalEnhancmentLayers"

The SDK offers a good frontend to the AMD VCE hardware encoders; however, there are a few unfortunate problems:

  • documentation is incomplete: it covers the most important parts but skips too many details
    • as a small example, the use of the important AMF_VIDEO_ENCODER_EXTRADATA property is not covered by the documentation; those needing it are on their own to figure out the answer
  • the SDK is good in its structure, convenient and “transparent” – its debug mode is pretty helpful
  • the alternative method (this post remains in good standing) of consuming the hardware encoders is the Windows Media Foundation Transforms (MFTs), which are stable and efficient, but also not well documented and lacking flexibility; additionally, it seems they are not in active development and do not directly rely on this SDK

As we accept no compromise, our experimental AMF SDK based H.264 encoding MFT ends up slightly more efficient than the vendor's, though not significantly.

Where is ID3D11DeviceChild::GetPrivateDataInterface?

ID3D11DeviceChild, similarly to a few other related interfaces, offers methods including SetPrivateData, SetPrivateDataInterface and GetPrivateData.

The SetPrivateDataInterface option extends SetPrivateData by adding COM reference management to the application-defined data. However, there is no GetPrivateDataInterface… A reasonable assumption is that there is a single collection of keyed application-defined data, so it should be possible to read interface values back using the GetPrivateData method. The behavior should have been documented to avoid confusion.

I would perhaps not have posted this if there were no additional aspect to it. If I can read interface values attached by SetPrivateDataInterface using the GetPrivateData method, should I expect the returned values to be IUnknown::AddRef'ed or not?

ID3D11DeviceChild* p = …
IUnknown* pUnknownA = …
p->SetPrivateDataInterface(__uuidof(MyKey), pUnknownA); // the collection itself holds a reference to pUnknownA
…
// Read the same slot back through the "plain data" getter
IUnknown* pUnknownB;
UINT nDataSize = sizeof pUnknownB;
p->GetPrivateData(__uuidof(MyKey), &nDataSize, &pUnknownB);
// QUES: pUnknownB is AddRef'ed or not?

It is indeed possible to retrieve the interface data this way. With no documented behavior, I would expect no IUnknown::AddRef to be done. Rationale: after all, I am using the method from the pair that is not supposed to deal with interface pointers. An argument against is that, even though taking a raw copy of a pointer is not a big deal, in a multithreaded environment it might happen that the returned unreferenced pointer is gone and invalidated, if a concurrent thread replaces the collection value and the internal IUnknown::Release on the pointer results in object disposal.

My guess about the COM referencing part of the behavior was incorrect: the API does perform IUnknown::AddRef. Moreover, this behavior is documented, in the DXGI section, for the IDXGIObject::GetPrivateData method:

If the data returned is a pointer to an IUnknown, or one of its derivative classes, previously set by IDXGIObject::SetPrivateDataInterface, you must call ::Release() on the pointer before the pointer is freed to decrement the reference count.

Presumably the same behavior applies to the APIs with no explicitly documented behavior, including the ID3D11Device::GetPrivateData and ID3D12Object::GetPrivateData methods and others.
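
Given the confirmed behavior, a read through GetPrivateData should be balanced with a Release. A minimal sketch continuing the snippet above (p and MyKey as before):

IUnknown* pUnknown = nullptr;
UINT nDataSize = sizeof pUnknown;
if(SUCCEEDED(p->GetPrivateData(__uuidof(MyKey), &nDataSize, &pUnknown)) && pUnknown)
{
    // ... use the interface pointer ...
    pUnknown->Release(); // balance the AddRef applied by GetPrivateData
}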

Injecting raw audio data into media pipeline (Russian)

I am reposting a Q&A from elsewhere on injecting raw audio data obtained externally into a Windows API media pipeline (originally in Russian).


Q: … what is the simplest way to turn chunks of bytes in PCM format into a compressed format, for example WMA, using only the means of the Windows SDK? […] As I understand it, without writing my own DirectShow (DS) filter – source or capture? – a stream of bytes cannot be converted. With Media Foundation (MF) I hoped to find an example on the web, but for some reason there is a good example of recording loopback audio into a WAV file, or of converting WAV to WMA, yet using an intermediate file is very inefficient, all the more so since the next task will be streaming this audio over the network in parallel with writing it to a file. Right now I am trying to figure out IMFTransform::ProcessInput, but it wants an IMFSample on input rather than bytes, and I have not yet found concrete examples of getting bytes into an IMFSample. I simply got the impression that both DS and MF require, for such a seemingly simple task, creating COM objects and even registering them in the system. Is there really no simpler way?

A: There is no ready-made solution for pushing data into a DS or MF pipeline. Building the necessary glue yourself is quite a feasible task, which is presumably why Microsoft refrained from providing a ready-made solution that would not suit everyone anyway, for various reasons.

An audio stream is never just a stream of bytes: it also has a format and timing, which is why the components that do work with plain bytes usually operate on multiplexed formats (such as .WAV, for example). Since what you have is precisely chunks of PCM data, this is indeed a task for either a custom DirectShow source filter or a custom Media Foundation media/stream source. Implementing one gives you the necessary glue and, generally speaking, this is the simple way. In particular, it is much simpler than trying to do it through a file.

Neither DS nor MF requires registration in the system. You can do it with registration, of course, but it is not mandatory. Once the necessary class is implemented, you can use it directly while building the topology, without inserting it into the topology through system registration.

Ð’ случае DS вам нужно сделать собственный audio source filter. Сложная часть задачи заключается в том, что вам придётся опереться на довольно старый code base (DirectShow base classes) и в том, что, как бы там ни было, DirectShow API – в конце своего жизненного пути. Тем не менее, в старых SDK есть пример Synth Filter Sample, есть еще пример Ball Filter Sample для видео и другие, которые показывают как сделать source filter и, честно говоря, они довольно компактны. Необходимый вам фильтр будет достаточно простым, когда вы разберётесь что к чему. по использованию фильтра без регистрации вы также сможете найти информацию, к примеру, отсюда Using a DirectShow filter without registering it, via a private CoCreateInstance.

In the case of MF the situation is somewhat similar. One could, of course, build a .WAV format stream in memory and hand it to an MF topology as a byte stream. The API offers that possibility and flexibility, but I would still recommend a custom media source that generates a PCM stream from the chunks you push into it. The advantages of MF are that it is the newer, current API with wider coverage of modern platforms. You may also be able to write the necessary code in C#, if there is a need for that. The bad news is that, structurally, such a COM class is definitely more complex and you will have to dig a bit deeper into the API. There is little information and few samples, and on top of that MF itself hardly offers better and/or clearer capabilities in terms of the standard codecs, sending data to a file or over the network, or development tools. The closest SDK sample is probably the MPEG1Source Sample and, as it seems to me, it is not easy to grasp right away.

If you have no specific API preference, then for this task, and given the situation you describe, I would suggest DirectShow. However, if beyond the described question you have reasons, constraints or grounds that require using Media Foundation, then it may be preferable to do the audio data processing within that API as well. Either way, creating data sources for both APIs is, as I wrote at the beginning, quite a feasible task, and they will work reliably and efficiently.
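
As an aside to the question above, which mentions not finding examples of getting bytes into an IMFSample, here is a minimal sketch of wrapping an externally obtained PCM chunk into a sample; pnChunkData, nChunkSize, nSampleTime and nSampleDuration are placeholders for the caller's data and timing:

#include <atlbase.h>
#include <mfapi.h>

CComPtr<IMFMediaBuffer> pMediaBuffer;
ATLENSURE_SUCCEEDED(MFCreateMemoryBuffer(nChunkSize, &pMediaBuffer));
BYTE* pnBufferData;
ATLENSURE_SUCCEEDED(pMediaBuffer->Lock(&pnBufferData, nullptr, nullptr));
memcpy(pnBufferData, pnChunkData, nChunkSize); // the externally obtained PCM bytes
ATLENSURE_SUCCEEDED(pMediaBuffer->Unlock());
ATLENSURE_SUCCEEDED(pMediaBuffer->SetCurrentLength(nChunkSize));
CComPtr<IMFSample> pSample;
ATLENSURE_SUCCEEDED(MFCreateSample(&pSample));
ATLENSURE_SUCCEEDED(pSample->AddBuffer(pMediaBuffer));
ATLENSURE_SUCCEEDED(pSample->SetSampleTime(nSampleTime));         // 100 ns units
ATLENSURE_SUCCEEDED(pSample->SetSampleDuration(nSampleDuration)); // 100 ns units
// pSample can now be delivered by a custom media source or passed to IMFTransform::ProcessInput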