CLSID_VideoInputDeviceCategory and Media Foundation

Media Foundation as a video capture API is inflexible. Besides the standard Media Foundation problems of backward compatibility, availability of developer tools and overall awkwardness, Microsoft decided to no longer offer video capture extensibility with Media Foundation. Be happy with MFEnumDeviceSources and don’t go anywhere else. They explain that they already provided support for devices backed by kernel streaming drivers:

Starting in Windows 7, Media Foundation automatically supports audio and video capture devices. For video, the device must provide a kernel streaming (KS) minidriver in the video capture category. Media Foundation uses the PnP path to enumerate the device. For audio, Media Foundation uses the Windows Multimedia Device (MMDevice) API to enumerate audio endpoint devices. If the device meets these criteria, there is no need to implement a custom media source.

The next paragraph there is sly:

However, you might want to implement a custom media source for some other type of device or other live data source. There are only a few differences between a live source and other media sources.

Indeed, you can implement a custom media source. However, you cannot implement a backing object (a Media Foundation Transform, see below) that the standard media source would use, and you cannot make your own video source discoverable by applications, so a custom video source never becomes a new option in video capture enabled applications that use Media Foundation.

Over the years developers have been keenly interested in various aspects of video capture on the Windows platform, using VFW and then DirectShow. This specifically included implementing a virtual camera device, for which Microsoft provided the Push Source Filters Sample, later extended into the popular VCam sample that “publishes” a video source device and makes it available to applications enumerating video capture hardware. The latest API, Media Foundation, blocked the opportunity to provide a custom video source.

The interesting thing though is that there is no fundamental problem in allowing such extensibility: just a few pieces are missing.

For starters, MFTEnum enumerates objects in, well, DirectShow’s CLSID_VideoInputDeviceCategory category. This is not documented, but this shows how tightly Media Foundation and DirectShow (and related kernel drivers) are connected.

Category: CLSID_VideoInputDeviceCategory {860BB310-5D01-11D0-BD3B-00A0C911CE86}

Logitech Webcam C930e #0
    MFT_ENUM_HARDWARE_URL_Attribute: \\?\usb#vid_046d&pid_0843&mi_00#6&2314864d&0&0000#{65e8773d-8f56-11d0-a3b9-00a0c9223196}\global (Type VT_LPWSTR)
    MFT_TRANSFORM_CLSID_Attribute: {8AC3587A-4AE7-42D8-99E0-0A6013EEF90F} (Type VT_CLSID)
        MFMediaType_Video MFVideoFormat_YUY2
        MFMediaType_Video MFVideoFormat_MJPG

Blackmagic WDM Capture #1
    MFT_ENUM_HARDWARE_URL_Attribute: \\?\decklink#avstream#5&2db0fd5&1&0000#{65e8773d-8f56-11d0-a3b9-00a0c9223196}\decklinkcapture1 (Type VT_LPWSTR)
    MFT_TRANSFORM_CLSID_Attribute: {8AC3587A-4AE7-42D8-99E0-0A6013EEF90F} (Type VT_CLSID)
        MFMediaType_Video MFVideoFormat_UYVY
        MFMediaType_Video MFVideoFormat_v210
        MFMediaType_Video FourCC HDYC
        MFMediaType_Audio MFAudioFormat_PCM

Any questions? What the MFEnumDeviceSources API does is enumerate this category and build device COM objects on top of the existing MFTs. Using an MFT for a video source is actually a smart move. Of course, this should have been done in DirectShow many years ago, with DMOs instead of MFTs.
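Both symbolic links in the listing above carry the same braced GUID right after the last '#'. To my knowledge this is the KSCATEGORY_CAPTURE device interface class, which would match the “KS minidriver in the video capture category” requirement quoted earlier. Below is a minimal sketch, in portable C++, of pulling that GUID out of such a string. The helper name ExtractInterfaceClassGuid is hypothetical, and the assumed layout (the GUID immediately follows the last '#') is inferred from the two samples only, not from any documented format.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the "{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}"
// token that follows the last '#' in a device symbolic link, or an empty
// string when no well-formed braced GUID is found at that position.
std::string ExtractInterfaceClassGuid(const std::string& link) {
    const std::size_t hash = link.rfind('#');
    if (hash == std::string::npos) return std::string();

    const std::size_t open = hash + 1;
    if (open >= link.size() || link[open] != '{') return std::string();

    const std::size_t close = link.find('}', open);
    if (close == std::string::npos) return std::string();

    const std::string guid = link.substr(open, close - open + 1);
    if (guid.size() != 38) return std::string();  // braces + 8-4-4-4-12 digits

    // Validate the interior: hyphens at fixed offsets, hex digits elsewhere.
    for (std::size_t i = 1; i < 37; ++i) {
        const bool hyphenExpected = (i == 9 || i == 14 || i == 19 || i == 24);
        const unsigned char c = static_cast<unsigned char>(guid[i]);
        if (hyphenExpected) {
            if (c != '-') return std::string();
        } else if (!std::isxdigit(c)) {
            return std::string();
        }
    }
    return guid;
}
```

Feeding it the Logitech link from the listing yields the GUID with its braces, and a string without any '#' yields an empty result. Comparing that value across devices is a quick way to confirm that everything listed in the category comes from the same device interface class.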

DirectX Media Objects (DMOs) have a compact and powerful form factor. A video or audio source implementation fits nicely into a “zero input, one output” DMO, which standard objects can then build upon, much like the DirectShow DMO Wrapper Filter but for source filters. This was never done in DirectShow, unfortunately. In Media Foundation DMOs got an obese sibling, the Media Foundation Transform, which is pretty much the same thing, just bloated.
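To make the “zero input, one output” idea concrete, here is a rough sketch in portable C++. It deliberately does not use the real IMediaObject or IMFTransform declarations: SourceStyleTransform, BlackFrameSource, Frame, StreamCounts and EnumerateOutputTypes are all hypothetical stand-ins, with method names only loosely modeled on GetStreamCount, GetOutputAvailableType and ProcessOutput. The point is the shape: the host asks for stream counts, walks the output types by index, and then pulls frames, never pushing any input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins, not the real DMO/MFT types.
struct StreamCounts { unsigned inputs; unsigned outputs; };
struct Frame { std::int64_t timestamp100ns; std::vector<std::uint8_t> data; };

class SourceStyleTransform {
public:
    virtual ~SourceStyleTransform() = default;
    virtual StreamCounts GetStreamCounts() const = 0;
    // Index-based type enumeration; returns false once the index runs out.
    virtual bool GetOutputAvailableType(std::size_t index, std::string& fourcc) const = 0;
    // Produces the next frame; there is no matching "process input" call.
    virtual bool ProcessOutput(Frame& frame) = 0;
};

// A toy source producing black YUY2 frames at an assumed 30 fps.
class BlackFrameSource : public SourceStyleTransform {
public:
    BlackFrameSource(unsigned width, unsigned height) : width_(width), height_(height) {}

    StreamCounts GetStreamCounts() const override { return {0u, 1u}; }

    bool GetOutputAvailableType(std::size_t index, std::string& fourcc) const override {
        if (index != 0) return false;  // only one type on offer
        fourcc = "YUY2";
        return true;
    }

    bool ProcessOutput(Frame& frame) override {
        // Timestamps in 100 ns units, the unit Media Foundation also uses.
        frame.timestamp100ns = frameIndex_ * 10000000LL / 30;
        frame.data.resize(static_cast<std::size_t>(width_) * height_ * 2);
        for (std::size_t i = 0; i < frame.data.size(); ++i) {
            // YUY2 byte order is Y0 U Y1 V: luma 16, chroma 128 gives black.
            frame.data[i] = (i % 2 == 0) ? 16 : 128;
        }
        ++frameIndex_;
        return true;
    }

private:
    unsigned width_;
    unsigned height_;
    std::int64_t frameIndex_ = 0;
};

// Host side: collect every offered output type by walking the index.
std::vector<std::string> EnumerateOutputTypes(const SourceStyleTransform& transform) {
    std::vector<std::string> types;
    std::string fourcc;
    for (std::size_t i = 0; transform.GetOutputAvailableType(i, fourcc); ++i) {
        types.push_back(fourcc);
    }
    return types;
}
```

A generic wrapper written against such an interface would need to know nothing about where the frames come from, which is what makes the form factor attractive for capture sources.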

This time the Media Foundation guys implemented their base block, the MFT, over video capture hardware items, which APIs like MFEnumDeviceSources and MFCreateDeviceSource pick up and use in their backyard.

When frontend code activates the media source, format enumeration goes straight to the inner MFT and its IMFTransform::GetOutputAvailableType, by way of the standard Media Foundation implementation of the video device source, mfcore’s CDeviceSource class:

MyTransform::GetOutputAvailableType(unsigned long nOutputStreamIdentifier, unsigned long nTypeIndex, IMFMediaType * * ppMediaType) Line 1033 C++
mfcore.dll!CDeviceSource::GetDeviceStreamType(unsigned long) Unknown
mfcore.dll!CDeviceSource::CreateStreams(void) Unknown
mfcore.dll!CDeviceSource::CDeviceSource(struct IMFTransform *,struct _GUID,struct IMFAttributes *,long *) Unknown
mfcore.dll!CDeviceSource::CreateInstance(struct IMFTransform *,struct _GUID,struct IMFAttributes *,struct IMFMediaSource * *) Unknown
mfcore.dll!MFCreateDeviceSource() Unknown

Capture of frames takes place on a work queue (RTWorkQ) worker thread via IMFTransform::ProcessOutput:

MyTransform::ProcessOutput(unsigned long nFlags, unsigned long nBufferCount, MFT_OUTPUT_DATA_BUFFER * pBuffers, unsigned long * pnStatus) Line 1281 C++
mfcore.dll!CDeviceSource::OnMFTEventReceived(struct IMFAsyncResult *) Unknown
mfcore.dll!CDeviceSource::OnMFTEventReceivedAsyncCallback::Invoke(struct IMFAsyncResult *) Unknown
RTWorkQ.dll!CSerialWorkQueue::QueueItem::ExecuteWorkItem(struct IMFAsyncResult *) Unknown
RTWorkQ.dll!CBaseWorkQueue::HandleConcurrentMMCSSEnter(class CRealTimeState *) Unknown
ntdll.dll!TppWorkpExecuteCallback() Unknown
ntdll.dll!TppWorkerThread() Unknown
kernel32.dll!BaseThreadInitThunk() Unknown
ntdll.dll!RtlUserThreadStart() Unknown

That is, the base building block for video capture in Media Foundation is MFT. Excellent! So do they allow registering your own MFT to provide the applications with a custom video device? Not really. The operation of CDeviceSource and Microsoft’s implementation for the MFT (“Device Proxy MFT”) is based on intimate assumptions between the two, and is not documented. When/if this goes public, we will start implementing virtual cameras the same way we did with good old DirectShow.
