FaceRig, Media Foundation and Adoriasoft

This is nice. One of the users submitted a Media Foundation capture capability printout (here it goes exactly), with an interesting Media Foundation source in it: FaceRig Virtual Camera.

Media Foundation API is not so friendly for extensibility, and not so popular overall, so a virtual device there is an interesting find.

FaceRig is a program enabling anyone with a webcam to digitally embody any character they want; the software comes from Bucharest, Romania. Here is their announcement regarding the virtual Media Foundation extension:

Support for all-new, all-special FaceRig Virtual Webcam driver. This driver was developed for us by a company employing virtual device drivers specialists, called Adoriasoft. It should have significantly better compatibility than the old virtual webcam. It works with Skype Metro, and a plethora of less known chat apps and browsers (for instance using Chrome for Chatroulette or Omegle is no longer a requirement :) ).

Indeed, their virtual camera has a device path:

`MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_SYMBOLIC_LINK`: \\?\root#image#0000#{65e8773d-8f56-11d0-a3b9-00a0c9223196}\global

The device backed by “hardware representation” is picked up by Media Foundation because starting in Windows 7, Media Foundation automatically supports audio and video capture devices. For video, the device must provide a kernel streaming (KS) minidriver in the video capture category. Media Foundation uses the PnP path to enumerate the device.
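To make the PnP path above easier to read, here is a minimal sketch that splits such a symbolic link into its parts. The structure and function names are made up for illustration, and the layout (prefix, three `#`-separated device parts, an interface class GUID in braces, optional reference string after a backslash) is an assumption drawn from the paths quoted in these posts rather than a documented contract. The GUID in the FaceRig path, `{65e8773d-8f56-11d0-a3b9-00a0c9223196}`, is to my knowledge `KSCATEGORY_CAPTURE`, which matches the "video capture category" wording.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical holder for the parts of a device interface path such as
// \\?\root#image#0000#{65e8773d-8f56-11d0-a3b9-00a0c9223196}\global
struct DeviceInterfacePath {
    std::string enumerator;      // e.g. "root", "usb", "decklink"
    std::string deviceId;        // e.g. "image"
    std::string instanceId;      // e.g. "0000"
    std::string interfaceClass;  // GUID text including braces
    std::string reference;       // e.g. "global"; empty when absent
};

// Returns false when the text does not follow the assumed layout.
bool ParseDeviceInterfacePath(const std::string& path, DeviceInterfacePath& result) {
    const std::string prefix = "\\\\?\\";
    if (path.compare(0, prefix.size(), prefix) != 0)
        return false;
    std::string body = path.substr(prefix.size());

    // The reference string, if any, follows the first backslash after the prefix.
    std::string reference;
    const std::size_t slash = body.find('\\');
    if (slash != std::string::npos) {
        reference = body.substr(slash + 1);
        body.resize(slash);
    }

    // The remaining text is split on '#'.
    std::vector<std::string> parts;
    std::size_t start = 0;
    for (;;) {
        const std::size_t hash = body.find('#', start);
        if (hash == std::string::npos) {
            parts.push_back(body.substr(start));
            break;
        }
        parts.push_back(body.substr(start, hash - start));
        start = hash + 1;
    }
    if (parts.size() != 4)
        return false;

    // A braced GUID is 38 characters long.
    const std::string& guid = parts[3];
    if (guid.size() != 38 || guid[0] != '{' || guid[37] != '}')
        return false;

    result.enumerator = parts[0];
    result.deviceId = parts[1];
    result.instanceId = parts[2];
    result.interfaceClass = guid;
    result.reference = reference;
    return true;
}
```

For the FaceRig path this yields the enumerator `root`, which is what gives the virtual device away: a real webcam would show up under `usb` instead.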

So this Adoriasoft company… appears to be a company from Kharkov, Ukraine, 20 minute drive from me. Well, good job guys!

Media Foundation Video/Audio Capture Capabilities

Just like with DirectShow video capture capability information, it is helpful to understand what exactly the Media Foundation video capture offering is. Specifically:

H.264 related attributes in media types might be not so obvious, and a device might report too many types.

  • Major Type: MFMediaType_Video
  • Compressed: 1
  • 25 Attributes
    • MF_MT_MAJOR_TYPE: MFMediaType_Video
    • MF_MT_SUBTYPE: MFVideoFormat_H264_ES
    • MF_MT_COMPRESSED: 1 (Type VT_UI4)
    • MF_MT_ALL_SAMPLES_INDEPENDENT: 0 (Type VT_UI4)
    • MF_MT_FIXED_SIZE_SAMPLES: 0 (Type VT_UI4)
    • MF_MT_FRAME_SIZE: 755914244240 (Type VT_UI8) // Width 176, Height 144
    • MF_MT_PIXEL_ASPECT_RATIO: 4294967297 (Type VT_UI8) // Numerator 1, Denominator 1
    • MF_MT_INTERLACE_MODE: 2 (Type VT_UI4) // MFVideoInterlace_Progressive
    • MF_MT_FRAME_RATE: 128849018881 (Type VT_UI8) // Numerator 30, Denominator 1
    • MF_MT_FRAME_RATE_RANGE_MIN: 128849018881 (Type VT_UI8) // Numerator 30, Denominator 1
    • MF_MT_FRAME_RATE_RANGE_MAX: 128849018881 (Type VT_UI8) // Numerator 30, Denominator 1
    • MF_MT_AVG_BITRATE: 6003500 (Type VT_UI4)
    • MF_MT_AM_FORMAT_TYPE: {2017BE05-6629-4248-AAED-7E1A47BC9B9C}
    • MF_MT_VIDEO_PROFILE: 257 (Type VT_UI4) // eAVEncH264VProfile_UCConstrainedHigh
    • MF_MT_VIDEO_LEVEL: 40 (Type VT_UI4) // eAVEncH264VLevel4
    • MF_MT_H264_MAX_MB_PER_SEC: F5 00 00 00 00 00 00 00 F5 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    • MF_MT_H264_SUPPORTED_USAGES: 3 (Type VT_UI4)
    • MF_MT_H264_SUPPORTED_RATE_CONTROL_MODES: 15 (Type VT_UI4)
    • MF_MT_H264_SUPPORTED_SYNC_FRAME_TYPES: 10 (Type VT_UI4)
    • MF_MT_H264_SIMULCAST_SUPPORT: 0 (Type VT_UI4)
    • MF_MT_H264_CAPABILITIES: 40 (Type VT_UI4)
    • MF_MT_H264_SUPPORTED_SLICE_MODES: 14 (Type VT_UI4)
    • MF_MT_H264_RESOLUTION_SCALING: 3 (Type VT_UI4)
    • MF_MT_H264_MAX_CODEC_CONFIG_DELAY: 1 (Type VT_UI4)
    • MF_MT_H264_SVC_CAPABILITIES: 1 (Type VT_UI4)
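The 64-bit values in the printout look odd until you see that each packs two 32-bit numbers: the upper half holds the width or numerator, the lower half holds the height or denominator. Below is a minimal sketch of that arithmetic with made-up helper names. In a real application you would rather use the helpers from `mfapi.h`, such as `MFGetAttributeSize` and `MFGetAttributeRatio`.

```cpp
#include <cstdint>

// Two 32-bit values carried in one 64-bit attribute value.
struct PackedPair {
    std::uint32_t high;  // width or numerator
    std::uint32_t low;   // height or denominator
};

// Splits a 64-bit attribute value into its upper and lower halves.
PackedPair UnpackPair(std::uint64_t value) {
    PackedPair pair;
    pair.high = static_cast<std::uint32_t>(value >> 32);
    pair.low = static_cast<std::uint32_t>(value & 0xFFFFFFFFu);
    return pair;
}

// The reverse operation: combines two 32-bit values into one 64-bit value.
std::uint64_t PackPair(std::uint32_t high, std::uint32_t low) {
    return (static_cast<std::uint64_t>(high) << 32) | low;
}
```

With the values above, `UnpackPair(755914244240)` gives 176 and 144, and `UnpackPair(128849018881)` gives 30 and 1, which agrees with the comments in the printout.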

MediaFoundationCaptureCapabilities

CLSID_VideoInputDeviceCategory and Media Foundation

Media Foundation as a video capture API is inflexible. At Microsoft – besides the standard Media Foundation problems of backward compatibility, availability of developer tools and overall awkwardness – they decided to no longer offer video capture extensibility with Media Foundation. Be happy with MFEnumDeviceSources and don’t go anywhere else. They explain that they already provided support for devices backed by kernel streaming drivers:

Starting in Windows 7, Media Foundation automatically supports audio and video capture devices. For video, the device must provide a kernel streaming (KS) minidriver in the video capture category. Media Foundation uses the PnP path to enumerate the device. For audio, Media Foundation uses the Windows Multimedia Device (MMDevice) API to enumerate audio endpoint devices. If the device meets these criteria, there is no need to implement a custom media source.

The next paragraph there is sly:

However, you might want to implement a custom media source for some other type of device or other live data source. There are only a few differences between a live source and other media sources.

Indeed, you can implement a custom media source, however you cannot implement a backing object (Media Foundation Transform – see below) that standard media source would use, and you cannot make your own video source discoverable by applications so that a custom video source is a new option for video capture enabled applications using Media Foundation.

Over the years developers were eagerly interested in various aspects of video capture on the Windows platform using VFW and then DirectShow, including specifically implementing a virtual camera device. For that, Microsoft provided the Push Source Filters Sample, which was then extended into the popular VCam sample that “publishes” a video source device and makes it available to applications enumerating video capture hardware. The latest API, Media Foundation, blocked the opportunity to provide a custom video source.

The interesting thing though is that there is no fundamental problem in allowing such extensibility: just a few pieces are missing.

For starters, MFTEnum enumerates objects in, well, DirectShow’s CLSID_VideoInputDeviceCategory category. This is not documented, but this shows how tightly Media Foundation and DirectShow (and related kernel drivers) are connected.

Category: CLSID_VideoInputDeviceCategory {860BB310-5D01-11D0-BD3B-00A0C911CE86}

Logitech Webcam C930e #0
    MFT_ENUM_HARDWARE_URL_Attribute: \\?\usb#vid_046d&pid_0843&mi_00#6&2314864d&0&0000#{65e8773d-8f56-11d0-a3b9-00a0c9223196}\global (Type VT_LPWSTR)
    MFT_TRANSFORM_CLSID_Attribute: {8AC3587A-4AE7-42D8-99E0-0A6013EEF90F} (Type VT_CLSID)
    MFT_OUTPUT_TYPES_Attributes: 
        MFMediaType_Video MFVideoFormat_YUY2
        MFMediaType_Video MFVideoFormat_MJPG
    MF_TRANSFORM_FLAGS_Attribute: MFT_ENUM_FLAG_HARDWARE

Blackmagic WDM Capture #1
    MFT_ENUM_HARDWARE_URL_Attribute: \\?\decklink#avstream#5&2db0fd5&1&0000#{65e8773d-8f56-11d0-a3b9-00a0c9223196}\decklinkcapture1 (Type VT_LPWSTR)
    MFT_TRANSFORM_CLSID_Attribute: {8AC3587A-4AE7-42D8-99E0-0A6013EEF90F} (Type VT_CLSID)
    MFT_OUTPUT_TYPES_Attributes: 
        MFMediaType_Video MFVideoFormat_UYVY
        MFMediaType_Video MFVideoFormat_v210
        MFMediaType_Video FourCC HDYC
        MFMediaType_Audio MFAudioFormat_PCM
    MF_TRANSFORM_FLAGS_Attribute: MFT_ENUM_FLAG_HARDWARE

Any questions? What MFEnumDeviceSources API does is enumeration in this category, and building device COM objects on top of existing MFTs. Using MFT for video source is actually a smart move. This should have been of course done in DirectShow many years ago, and with DMOs instead of MFTs.

DirectX Media Objects (DMOs) got a compact and powerful form factor. Video and audio source implementation can be nicely put in “zero input one output” DMO and then used by standard objects on top of that. Similarly to DirectShow DMO Wrapper Filter but for source filters. This was never done in DirectShow, unfortunately. In Media Foundation DMOs got their obese brother class: Media Foundation Transform, which is pretty much the same, just bloated.

This time the Media Foundation guys implemented their base block, the MFT, over video capture hardware items, which APIs like MFEnumDeviceSources and MFCreateDeviceSource pick up and use in their backyard.

The frontend code that activates the media source goes straight to the inner MFT to enumerate formats: it calls the MFT’s IMFTransform::GetOutputAvailableType through the standard Media Foundation implementation of the video device source, mfcore’s CDeviceSource class.

MyTransform::GetOutputAvailableType(unsigned long nOutputStreamIdentifier, unsigned long nTypeIndex, IMFMediaType * * ppMediaType) Line 1033 C++
mfcore.dll!CDeviceSource::GetDeviceStreamType(unsigned long) Unknown
mfcore.dll!CDeviceSource::CreateStreams(void) Unknown
mfcore.dll!CDeviceSource::CDeviceSource(struct IMFTransform *,struct _GUID,struct IMFAttributes *,long *) Unknown
mfcore.dll!CDeviceSource::CreateInstance(struct IMFTransform *,struct _GUID,struct IMFAttributes *,struct IMFMediaSource * *) Unknown
mfcore.dll!MFCreateDeviceSource() Unknown

Capture of frames takes place on an RTWorkQ (real-time work queue) worker thread via IMFTransform::ProcessOutput:

MyTransform::ProcessOutput(unsigned long nFlags, unsigned long nBufferCount, MFT_OUTPUT_DATA_BUFFER * pBuffers, unsigned long * pnStatus) Line 1281 C++
mfcore.dll!CDeviceSource::OnMFTEventReceived(struct IMFAsyncResult *) Unknown
mfcore.dll!CDeviceSource::OnMFTEventReceivedAsyncCallback::Invoke(struct IMFAsyncResult *) Unknown
RTWorkQ.dll!CSerialWorkQueue::QueueItem::ExecuteWorkItem(struct IMFAsyncResult *) Unknown
RTWorkQ.dll!CBaseWorkQueue::HandleConcurrentMMCSSEnter(class CRealTimeState *) Unknown
ntdll.dll!TppWorkpExecuteCallback() Unknown
ntdll.dll!TppWorkerThread() Unknown
kernel32.dll!BaseThreadInitThunk() Unknown
ntdll.dll!RtlUserThreadStart() Unknown

That is, the base building block for video capture in Media Foundation is MFT. Excellent! So do they allow registering your own MFT to provide the applications with a custom video device? Not really. The operation of CDeviceSource and Microsoft’s implementation for the MFT (“Device Proxy MFT”) is based on intimate assumptions between the two, and is not documented. When/if this goes public, we will start implementing virtual cameras the same way we did with good old DirectShow.

Not so good H.264 media type

MainConcept’s MP4 Demultiplexer in Annex B mode looks, well… slightly excessively broken.

MainConcept MP4 Demultiplexer Properties

  1. H.264 media type with start codes (H264 FourCC, but here they use legacy subtype informally known as MEDIASUBTYPE_H264_bis) does not require parameter sets as a part of the MPEG2VIDEOINFO structure. If they decided to provide the NAL units nevertheless, these have to be length-prefixed, without start codes. MainConcept does it the Annex B way – not good.
  2. Zero BITMAPINFOHEADER::biSize?
  3. BITMAPINFOHEADER::biBitCount of 24 is hardly correct, but it is not fatal
  4. Additionally, they set up a memory allocator with the default buffer size of 64K and then stream larger samples…
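The difference in item 1 between Annex B start codes and size-prefixed NAL units can be sketched as follows. This is an illustration only, not MainConcept's or Microsoft's code: the function names are made up, and the 2-byte big-endian size prefix is my assumption about the layout expected for parameter sets in the format block.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

typedef std::vector<std::uint8_t> Bytes;

// Splits an Annex B byte stream into NAL unit payloads, dropping the
// 00 00 01 / 00 00 00 01 start codes and any zero padding before them.
std::vector<Bytes> SplitAnnexB(const Bytes& data) {
    std::vector<Bytes> units;
    const std::size_t size = data.size();
    bool inUnit = false;
    std::size_t begin = 0;
    std::size_t i = 0;
    while (i + 3 <= size) {
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
            if (inUnit) {
                // Zero bytes right before a start code are not part of the payload.
                std::size_t end = i;
                while (end > begin && data[end - 1] == 0)
                    --end;
                if (end > begin)
                    units.push_back(Bytes(data.begin() + begin, data.begin() + end));
            }
            begin = i + 3;
            inUnit = true;
            i += 3;
        } else {
            ++i;
        }
    }
    if (inUnit && begin < size)
        units.push_back(Bytes(data.begin() + begin, data.end()));
    return units;
}

// Writes each NAL unit preceded by its size as a 2-byte big-endian value
// (the prefix width is an assumption, see the text above).
Bytes ToLengthPrefixed(const std::vector<Bytes>& units) {
    Bytes output;
    for (std::size_t index = 0; index < units.size(); ++index) {
        const Bytes& unit = units[index];
        if (unit.size() > 0xFFFF)
            throw std::length_error("NAL unit does not fit a 2-byte size prefix");
        output.push_back(static_cast<std::uint8_t>(unit.size() >> 8));
        output.push_back(static_cast<std::uint8_t>(unit.size() & 0xFF));
        output.insert(output.end(), unit.begin(), unit.end());
    }
    return output;
}
```

A decoder that expects the second layout reads the leading `00 00` of a start code as a size of zero, which is one plausible reason why a format block written the Annex B way gets rejected.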

Oh.

Needless to mention that this sort of connection simply has no chances to succeed:

Trying to Connect MainConcept MP4 Demultiplexer and Microsoft H.264 Decoder

Windows 10 AVI Splitter bug

There were a few reports that Windows 10 is unable to play files which played fine in earlier versions of Windows, AVI files specifically.

OK, the problem does exist. Moreover, the problem is in the Windows component that implements the AVI Splitter DirectShow filter. One of the reporters mentioned he had a problem with a DV AVI file. I built one and it indeed showed the problem:

AVI Splitter bug in GraphStudioNext

Playback stops at the same frame every time the filter graph is run. The error is 0x8004020D VFW_E_BUFFER_OVERFLOW “The buffer is not big enough” coming from AVI Splitter’s worker thread. The buffers on the memory allocators look appropriate, so the bug looks related to AVI Splitter implementation details, CBaseMSRWorker class that reads from file and delivers frames downstream.
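As a side note on reading the error value: an HRESULT is a bit field, and a small sketch like the one below (names are made up, layout as I understand it: failure bit on top, then the facility, then a 16-bit code) shows that 0x8004020D is a failure in facility 4, FACILITY_ITF, with code 0x020D. The facility only tells that the code is interface-specific, which is where all the DirectShow VFW_E_xxx values live.

```cpp
#include <cstdint>

// Fields of an HRESULT value, assuming the usual layout:
// bit 31 is the failure flag, bits 16..26 the facility, bits 0..15 the code.
struct HresultParts {
    bool failure;
    std::uint16_t facility;
    std::uint16_t code;
};

HresultParts DecodeHresult(std::uint32_t value) {
    HresultParts parts;
    parts.failure = (value >> 31) != 0;
    parts.facility = static_cast<std::uint16_t>((value >> 16) & 0x7FF);
    parts.code = static_cast<std::uint16_t>(value & 0xFFFF);
    return parts;
}
```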

AVI Splitter bug call stack

The problem exists in both 32-bit and 64-bit versions, but not in Media Foundation. With some luck Microsoft will fix the problem on their side.

Blackmagic Design’s “Decklink Video Capture” filters

Pulling this out from Blackmagic Design Forum thread:

Generally, the recommended interface to the capture cards is the DeckLink API.

A DirectShow interface is available, but provides a subset of the functionality available from the complete DeckLink API.

Please note that the older, user-space DirectShow filters (DeckLink Video Capture) are deprecated in favour of the WDM filters (Blackmagic WDM Capture).

The WDM filters added support for 4K modes in Desktop Video 10.5+.

So the “Decklink Video Capture” filters that wrap the DeckLink SDK and provide convenient DirectShow interface are at their end of life.

Certainly, the most efficient and flexible way to interface with Blackmagic Design hardware is to use their SDK (which is good and easy to use); however, it does not give immediate connectivity to Windows APIs. User mode filters were a good wrapper and provided typical functionality for capture and playback. They had their own issues (e.g. no VideoInfo2 support – interlaced formats treated as progressive and no support for progressive formats that collide with interlaced ones), and some users reported the 64-bit versions to be not quite stable.

WDM filters have been around for some time; specifically, they offer a 32-bit audio capture option which the other filters did not have. From what I remember they lack other capabilities available through the SDK (update – e.g. no timecode support).

Apparently the WDM filters do not offer a playback option via DirectShow. This is not even mentioning the unfortunate Media Foundation – even though “Blackmagic WDM Render” is somehow around and with some luck is listed through MFTEnum:

    Blackmagic WDM Render #3
        MFT_ENUM_HARDWARE_URL_Attribute: \\?\decklink#avstream#5&2db0fd5&1&0000#{65e8773e-8f56-11d0-a3b9-00a0c9223196}\decklinkrender1 (Type VT_LPWSTR)
        MFT_INPUT_TYPES_Attributes: 
            MFMediaType_Video MFVideoFormat_UYVY
            MFMediaType_Video MFVideoFormat_v210
            MFMediaType_Video MFVideoFormat_UYVY
            MFMediaType_Video MFVideoFormat_v210
            MFMediaType_Video MFVideoFormat_UYVY
            MFMediaType_Video MFVideoFormat_v210
            MFMediaType_Video MFVideoFormat_UYVY
            MFMediaType_Video MFVideoFormat_v210
            MFMediaType_Video MFVideoFormat_UYVY
            MFMediaType_Video MFVideoFormat_v210
            MFMediaType_Video FourCC HDYC
            MFMediaType_Video FourCC HDYC
            MFMediaType_Video FourCC HDYC
            MFMediaType_Video FourCC HDYC
            MFMediaType_Video FourCC HDYC
            MFMediaType_Video FourCC HDYC
            MFMediaType_Video FourCC HDYC
            MFMediaType_Video FourCC HDYC
            MFMediaType_Audio MFAudioFormat_PCM
            MFMediaType_Audio MFAudioFormat_PCM
            
        MFT_TRANSFORM_CLSID_Attribute: {8AC3587A-4AE7-42D8-99E0-0A6013EEF90F} (Type VT_CLSID)
        MF_TRANSFORM_FLAGS_Attribute: MFT_ENUM_FLAG_HARDWARE

For more or less serious DirectShow development the best way was and is to wrap the DeckLink SDK with a custom filter and have all options that SDK provides.

Logitech C930e camera and Media Foundation

Logitech’s C930e camera is the first one to be compliant with UVC 1.5 specification:

First 1080p HD webcam to support H.264 with Scalable Video Coding and UVC 1.5 encoding technology. […] The result is a smoother video stream in applications like Skype for Business and Microsoft® Lync® 2013.

More marketing information is there at Logitech. More interesting is what the new capabilities look like from the API side. In addition to the well known Motion JPEG (FourCC MJPG) and YUY2 video, the camera delivers H.264 (FourCC H264) video.

Logitech C930e Webcam

Lync (Skype for Business) is presumably modified to accept that and it communicates to the camera using Media Foundation API.

The camera’s H.264 capabilities are accessible using both APIs, DirectShow and Media Foundation, and there is apparently a mess with driver versions and operating system versions as well. The best results are achieved with the stock driver from Microsoft, that is, without installing the Logitech driver. This report still holds: “The only way I was able to get that stream under Windows 8.x was by NOT USING LOGITECH DRIVERS. This is a UVC 1.5 compatible camera and it will be configured automatically by the OS. With that driver (from Microsoft), use pin 1 (not 0) and you will get a ton of H264 formats.”

A printout of DirectShow capabilities using DirectShowCaptureCapabilities is available here (note KS_H264VIDEOINFO structure). This time it is about what it looks like through Media Foundation.

As a Media Source, it exposes a few attributes and a great many media types (216 + 476), more than through DirectShow as it seems:

    • MF_DEVSOURCE_ATTRIBUTE_MEDIA_TYPE: 76 69 64 73 00 00 10 00 80 00 00 AA 00 38 9B 71 59 55 59 32 00 00 10 00 80 00 00 AA 00 38 9B 71
    • MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_SYMBOLIC_LINK: \\?\usb#vid_046d&pid_0843&mi_00#6&2314864d&0&0000#{e5323777-f976-4f5b-9b55-b94699c46e44}\global (Type `VT_LPWSTR`)
    • MF_DEVSOURCE_ATTRIBUTE_FRIENDLY_NAME: Logitech Webcam C930e (Type `VT_LPWSTR`)
    • MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_CATEGORY: KSCATEGORY_VIDEO_CAMERA
    • MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE: MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_GUID
    • MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_HW_SOURCE: 4 (Type `VT_UI4`)
  • Characteristics: MFMEDIASOURCE_IS_LIVE | MFMEDIASOURCE_CAN_PAUSE
  • Stream 0: Default Selected, Identifier 0x0, Major Type MFMediaType_Video, 216 Media Types
  • Stream 1: Identifier 0x1, Major Type MFMediaType_Video, 476 Media Types
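The MF_DEVSOURCE_ATTRIBUTE_MEDIA_TYPE bytes above are not as opaque as they look. As far as I understand, the blob holds major type and subtype GUIDs in their in-memory layout, so the first three fields are little-endian and the leading four bytes of a FourCC-based GUID are readable ASCII. A minimal sketch with made-up helper names:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Formats 16 bytes holding a GUID in its in-memory (little-endian fields) layout.
std::string FormatGuid(const std::uint8_t* b) {
    char text[37];
    std::snprintf(text, sizeof text,
        "%02X%02X%02X%02X-%02X%02X-%02X%02X-%02X%02X-%02X%02X%02X%02X%02X%02X",
        b[3], b[2], b[1], b[0],   // Data1, stored little-endian
        b[5], b[4],               // Data2, stored little-endian
        b[7], b[6],               // Data3, stored little-endian
        b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);  // Data4, as is
    return text;
}

// Reads the first four bytes of a FourCC-based GUID as text.
std::string FourCC(const std::uint8_t* b) {
    return std::string(reinterpret_cast<const char*>(b), 4);
}
```

Applied to the 32 bytes above, the first half reads as `vids`, that is `73646976-0000-0010-8000-00AA00389B71` or MFMediaType_Video, and the second half reads as `YUY2`, that is MFVideoFormat_YUY2.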

The H.264 formats are marked with subtypes of MFVideoFormat_H264 and MFVideoFormat_H264_ES. A raw print out is downloadable:

Specifically, it is interesting which attributes are there, since with Media Foundation it is a tricky thing to find out quickly. The keys/identifiers are listed below.

Common

  • MF_MT_ALL_SAMPLES_INDEPENDENT
  • MF_MT_AM_FORMAT_TYPE
  • MF_MT_AVG_BITRATE
  • MF_MT_FIXED_SIZE_SAMPLES
  • MF_MT_FRAME_RATE
  • MF_MT_FRAME_RATE_RANGE_MAX
  • MF_MT_FRAME_RATE_RANGE_MIN
  • MF_MT_FRAME_SIZE
  • MF_MT_INTERLACE_MODE
  • MF_MT_MAJOR_TYPE
  • MF_MT_PIXEL_ASPECT_RATIO
  • MF_MT_SUBTYPE

MFVideoFormat_H264, MFVideoFormat_H264_ES

  • MF_MT_COMPRESSED
  • MF_MT_H264_CAPABILITIES
  • MF_MT_H264_MAX_CODEC_CONFIG_DELAY
  • MF_MT_H264_MAX_MB_PER_SEC
  • MF_MT_H264_RESOLUTION_SCALING
  • MF_MT_H264_SIMULCAST_SUPPORT
  • MF_MT_H264_SUPPORTED_RATE_CONTROL_MODES
  • MF_MT_H264_SUPPORTED_SLICE_MODES
  • MF_MT_H264_SUPPORTED_SYNC_FRAME_TYPES
  • MF_MT_H264_SUPPORTED_USAGES
  • MF_MT_H264_SVC_CAPABILITIES
  • MF_MT_VIDEO_LEVEL
  • MF_MT_VIDEO_PROFILE

MFVideoFormat_MJPG

  • MF_MT_SAMPLE_SIZE
  • MF_MT_VIDEO_CHROMA_SITING
  • MF_MT_VIDEO_LIGHTING
  • MF_MT_VIDEO_NOMINAL_RANGE
  • MF_MT_VIDEO_PRIMARIES
  • MF_MT_YUV_MATRIX

MFVideoFormat_YUY2

  • MF_MT_DEFAULT_STRIDE
  • MF_MT_SAMPLE_SIZE
  • MF_MT_VIDEO_CHROMA_SITING
  • MF_MT_VIDEO_LIGHTING
  • MF_MT_VIDEO_NOMINAL_RANGE
  • MF_MT_VIDEO_PRIMARIES
  • MF_MT_YUV_MATRIX