Intel® RealSense™ Camera in DirectShow/Media Foundation

There is an intersting submission for video capture device capabilities for “The Short-Range Intel® RealSenseâ„¢ Camera F200” camera. Another blog user earlier mentioned they have a good stock of the devices with plans to take advantage of new technology.

Intel Realsense camera ad from Intel website

It sounds like the new cameras offer new opportunities for application in user interaction, in ability to conveniently enhance user experience with things like gestures etc.

This is what the camera looks like on the software side:

  • Intel(R) RealSense(TM) 3D Camera Virtual Driver
  • Intel(R) RealSense(TM) 3D Camera (Front F200) RGB
  • Intel(R) RealSense(TM) 3D Camera (Front F200) Depth

Presumably, there are synchronized video and depth sources. It might so happen that SDK offers other presentations of the data (snapshots for combined data and combined stream?).

So what it is all about in terms of how it looks for a video capture application and APIs? The video sensor offers standard video caps and YUV 4:2:2 video stream at 60 fps at resolutions up to 960×540, higher resolutions up to 1920×1080 at 30 fps. This exceeds USB 2.0 bandwidth, so this is either USB 3.0 device or there is hardware compression, with internal software decompression. The video device does not offers compressed video feed capabilities.

There is another video source named “Depth”. It offers YUY2 feed as well as other options with fancy FourCCs (ILER, IRNI, IVNI, IZNI, RVNI, ZVNI?) which is presumably delivering depth information at 640×480@60. Respective SDK is supposedly have the formats documented.

At 60 frames per second and supposedly low latency, the data should be a source of good real-time information to track gestures and reconstruction of short range 3D scene in front of the camera.

Original DirectShow and Media Foundation capability files:

Additional in-depth information about the technology:

DirectShowSpy: Restore default system behavior

There was a problem reported for registered and relocated DirectShowSpy, which might be causing issues: Deleting faulty DirectShowSpy registry key.

Some users that use a 3rd party tool called DirectShowSpy may encounter errors when logging in to XSplit.

This can be caused by a fault registry key that is introduced when DirectShowSpy is registered to intercept Filter Graph initialization — Filter Graph is used by XSplit. The faulty DirectShowSpy registry key is usually caused by DirectShowSpy program begin relocated after registration.

To workaround this situation, XSplit1 detects the presence of HKEYCLASSESROOT\CLSID{E436EBB3-524F-11CE-9F53-0020AF0BA770}\TreatAs registry key2 when it fails to initialize Filter Graph and exits when it is found. In this case, user must manually correct the DirectShowSpy registration or delete3 the registry key. Only after either is done can XSplit be restarted.

The description of the problem is good, solution is good but incomplete.

DirectShowSpy intercepts a few COM classes, not just one, and removing single registry value is only a partial fix.

DirectShowSpy.dll exports UnregisterTreatAsClasses function to accurately restore operation of system classes. It does registry permission magic and updates all COM classes involved. Default unregistration (DllUnregisterServer, regsvr32 /u) behavior is to restore original classes only in case they are currently overridden by DirectShowSpy. That is, if the DLL is moved (deleted) the broken registrations are retained in the registry during unregistration process.

UnregisterTreatAsClasses resolved this problem by forcing recovery of original classes no matter who is overriding them at the moment.

C:\>rundll32 DirectShowSpy-Win32.dll,UnregisterTreatAsClasses
C:\>rundll32 DirectShowSpy-x64.dll,UnregisterTreatAsClasses

DirectShowSpy: REGDB_E_CLASSNOTREG with IMMDevice::Activate

A DirectShow developer complained on sudden failure of Core Audio IMMDevice::Activate call supposed to instantiate a DirectShow filter for a given device.

The problem appeared to be related to installed DirectShowSpy and its interference with the API calls. The symptom was of the following kind: when Activate was called for different types of objects, the calls all succeeded except interoperation with DirectShow (activation for IBaseFilter), e.g. EnumerateAudioDevices output:

    IAudioClient            0x00000000
    IAudioEndpointVolume    0x00000000
    IAudioMeterInformation  0x00000000
    IAudioSessionManager    0x00000000
    IAudioSessionManager2   0x00000000
    IBaseFilter             REGDB_E_CLASSNOTREG
    IDeviceTopology         0x00000000
    IMFTrustedOutput        0x00000000

When Core Audio is requested to do DirectShow activation, the API creates and instance of System Device Enumerator, which is forwarded the activation call to. DirectShowSpy intercepts these calls, however what it did not do was support for unknown COM interfaces, and support for undocumented IMMDeviceActivator interface which is used internally by the APIs to forward the activation call.

So, System Device Enumerator implements documented ICreateDevEnum and then it also implements undocumented internal IMMDeviceActivator. The entire sequence call is as follows:

// Top level code:

CComPtr<IMMDevice> pDevice = ...; // Audio endpoint interface
pDevice->Activate(..., __uuidof(IBaseFilter), ...)

// API:

STDMETHOD(Activate)(...)
{
    // ...
    if(requested is IBaseFilter)
    {
        CComPtr<IMMDeviceActivator> pDeviceActivator;
        pDeviceActivator.CoCreateInstace(CLSID_SystemDeviceEnum);
        return pDeviceActivator->Activate(pDevice, ...)
    }

DirectShowSpy’s failure to provide IMMDeviceActivator resulted in symptom in question and is fixed with version 1.0.0.2106 and on. The failure code is not so much descriptive, but of course the APIs did not expect external hook and failure is not actually a supposed possible behavior there.

System Device Enumerator matches the known devices to the provided Core Audio device and creates an instance of respective filter – this is how APIs work together. DirectShowSpy prints these calls out to its log.

roatlbase.h(1582): TraceModuleVersion: "D:\...\DirectShowSpy-Win32.dll" version is 1.0.0.2107, running in "D:\...\EnumerateAudioDevices-Win32.exe" at 0x63210000
dllmain.h(36): CDirectShowSpyModule::CDirectShowSpyModule: this 0x633963A4
SystemDeviceEnumeratorSpy.h(669): CSystemDeviceEnumeratorSpyT<...>::CSystemDeviceEnumeratorSpyT: this 0x02F1DA68
SystemDeviceEnumeratorSpy.h(681): CSystemDeviceEnumeratorSpyT<...>::FinalConstruct: pszPath "D:\...\EnumerateAudioDevices-Win32.exe", this 0x02F1DA68, m_dwRef 1
SystemDeviceEnumeratorSpy.h(49): CSystemDeviceEnumeratorSpyT<...>::InternalQueryInterface: 0x02F1DA68, Interface {3B0D0EA4-D0A9-4B0E-935B-09516746FAC0}, Result 0x00000000
SystemDeviceEnumeratorSpy.h(49): CSystemDeviceEnumeratorSpyT<...>::InternalQueryInterface: 0x02F1DA68, Interface {3B0D0EA4-D0A9-4B0E-935B-09516746FAC0}, Result 0x00000000
SystemDeviceEnumeratorSpy.h(808): CSystemDeviceEnumeratorSpyT<...>::Activate: this 0x02F1DA68, InterfaceIdentifier {56A86895-0AD4-11CE-B03A-0020AF0BA770}, pMmDevice 0x0054E7F8
SystemDeviceEnumeratorSpy.h(815): CSystemDeviceEnumeratorSpyT<...>::Activate: nActivateResult 0x00000000 
SystemDeviceEnumeratorSpy.h(673): CSystemDeviceEnumeratorSpyT<...>::~CSystemDeviceEnumeratorSpyT: this 0x02F1DA68

Download links

DirectShowSpy: Automated Media Sample Traces, Load Trace from File

Some time ago DirectShowSpy received a capability to record media sample traces (in specific points of the pipeline). DirectShowSpy UI visualized the chronology of the streaming which facilitated detection of various problems, especially out of order media samples, lost or late delivered samples, thread racing conditions.

A developer can easily integrate his filters with DirectShowSpy media sample tracing, and example of this is a fork of MP4 filters where this powerful option is integrated.

This time DirectShowSpy is improved to offer automated traces. Typical use scenario is automatic creation of media sample trace file so that it could be logged/attached to specific export process and be available for further review. More specifically, an application could run a transcoding session and generate a detailed media sample trace alogn with the output in order to troubleshoot specific issues (dropped frame etc).

DirectShowSpy’s IFilterGraphHelper (implemented by FilterGraphHelper class) object received new additional methods:

interface IFilterGraphHelper : IDispatch
{
    [...]
    [id(7)] HRESULT ResetMediaSampleTrace([in] VARIANT vProcessIdentifier);
    [id(8)] HRESULT LockMediaSampleTrace([out, retval] IUnknown** ppLockUnknown);
    [id(9)] HRESULT GetMediaSampleTrace([in] VARIANT vProcessIdentifier, [out, retval] BSTR* psText);

ResetMediaSampleTrace clears currently recorded in-memory trace. The method is similar to pressing Backspace key in media sample trace UI. Optional argument allows cleanup for specific process (not implemented, everything is reset).

LockMediaSampleTrace creates a COM object which internally references newly created traces so that they don’t get freed with the respective filter graph release. Traces are owned by respective filter graphs and even though are accessible across process boundaries they are disposed with the release of graphs-creators. Unless anyone else accesses and references the traces, in which case they are available for review even is originating graph is already gone. This is the behavior of media sample trace UI, which is automatically referencing new traces as long as UI is visible. Developer can review traces for graphs already terminated (even crashed!). LockMediaSampleTrace method returns an interface pointer which owns an internals thread monitoring and referencing new traces. Once the interface pointer is releases, all referenced traces are released as well.

GetMediaSampleTrace obtains a copy of media sample trace for all or for specific process, in multi-line TSV format similar to text created by user interactively when data is copied to clipboard or saved into file with the help of UI.

All together, a controlling application can create a trace automatically using the following code:

#import "libid:B9EC374B-834B-4DA9-BFB5-C1872CE736FF" raw_interfaces_only // AlaxInfoDirectShowSpy

// ...

CComPtr<AlaxInfoDirectShowSpy::IFilterGraphHelper> pFilterGraphHelper;
pFilterGraphHelper.CoCreateInstance(__uuidof(AlaxInfoDirectShowSpy::FilterGraphHelper));
ULONGLONG nAlaxInfoDirectShowSpyFileVersion = 0;
CComPtr<IUnknown> pMediaSampleTraceLock;
if(pFilterGraphHelper)
{
    const CComQIPtr<AlaxInfoDirectShowSpy::IModuleVersionInformation> pModuleVersionInformation = pFilterGraphHelper;
    if(pModuleVersionInformation)
        _V(pModuleVersionInformation->get_FileVersion((LONGLONG*) &nAlaxInfoDirectShowSpyFileVersion));
    if(nAlaxInfoDirectShowSpyFileVersion >= _VersionInfoHelper::MakeVersion(1, 0, 0, 2060)) // Available in 1.0.0.2060+
    {
        _V(pFilterGraphHelper->ResetMediaSampleTrace(CComVariant((LONG) GetCurrentProcessId())));
        _V(pFilterGraphHelper->LockMediaSampleTrace(&pMediaSampleTraceLock));
    }
}

// DirectShow filter graph runs go here

if(pFilterGraphHelper && nAlaxInfoDirectShowSpyFileVersion >= _VersionInfoHelper::MakeVersion(1, 0, 0, 2060))
    _ATLTRY
    {
        CComBSTR sText;
        __C(pFilterGraphHelper->GetMediaSampleTrace(CComVariant((LONG) GetCurrentProcessId()), &sText));
        if(sText && *sText)
        {
            CPath sPath = Combine(m_sOutputDirectory, _T("MediaSampleTrace.tsv"));
            CAtlFile File;
            __C(File.Create(sPath, GENERIC_WRITE, FILE_SHARE_READ, CREATE_ALWAYS));
            // SUGG: UTF-8 BOM
            CStringA sTextA = Utf8StringFromString(CStringW(sText));
            __C(File.Write(sTextA, (DWORD) sTextA.GetLength()));
        }
    }
    _ATLCATCHALL()
    {
        _Z_EXCEPTION();
    }

Additionally, interactive UI for reviewing media sampla traces (DoMediaSampleTracePropertySheetModal DLL export, see respective BAT file) reviewed a new command to load external TSV file (such as generated with the help of code above!) for review:

DirectShowSpy Media Sample Trace

Once data loaded from file, disabled Refresh link indicates use of non-live data. Context menu command offers option to switch back to live data.

Download links

Microsoft Developer Web Service Day

MSDN Forums bug today earlier was nothing. Here is a new one with Microsoft Connect. A StackOverflow poster found a bug and created a feedback item with MS Connect:

MS Connect before Sign In

Available to anonymous, however invisible to authenticated user after sign in:

MS Connect after Sign In

The content that you requested cannot be found or you do not have permission to view it.

It should be a permission thing, I suppose.

Video for Windows API and Multiple Cameras

A StackOverflow question (already deleted) asked about use of indices when referencing Video for Windows (VFW) capture devices such as in capGetDriverDescription API and other. Video capture with Video for Windows allowed use of up to 10 devices (did anyone have that many at that time?). The numbering was API specific and at the latest stage the documentation described it as:

Plug-and-Play capture drivers are enumerated first, followed by capture drivers listed in the registry, which are then followed by capture drivers listed in SYSTEM.INI.

Even though it is a legacy API, and the API was really really simple and limited in capabilities, it still exists in all Windows versions, there is still some code running, and due to complexity of modern APIs some people still use it in VB.NET and C# projects.

There is, however, a trap involved if someone attempts to use multiple cameras using VFW. VFW drivers are no longer developed since… Let us see what VirtualDub says about dates and how ancient they are:

The newer type of video capture driver in Windows uses the Windows Driver Model (WDM), which was introduced in Windows 98 and 2000. The Microsoft DirectShow API is the primary API to use these drivers. Because the DirectShow API supports a larger variety of commands and settings than VFW, the functionality set of a WDM driver is significantly improved.

DirectShow is a much more complex API than VFW, however, and WDM-model drivers historically have been a lot less stable than their VFW counterparts. It is not unusual to see problems such as capture applications that cannot be closed, because their program execution is stuck in the capture driver. WDM is the proscribed driver model going forward, however, so the situation should improve over time.

All new drivers were WDM drivers for 15+ years. In order to provide backward compatibility later between VFW and WDM, Microsoft came out with Microsoft WDM Image Capture (Win32) driver. Windows versions up to Windows 10 include it, and it is the only VFW driver in the system. In turn, it manages one of existing WDM-driver devices of choice and exposes video capture functionality to VFW applications. If there are two or more WDM drivers, the VFW driver offers to choose between the devices.

VFW Capture Source Dialog

The screenshot displays a long standing bug with this driver: it offers choices of all registered DirectShow video capture devices (it enumerates CLSID_VideoInputDeviceCategory category) and reality is that it can only work with WDM devices and not other (more on this below).

VirtualDub has a mention of this driver as well:

If you have a Windows Driver Model (WDM) driver installed, you may also have an entry in the device list called Microsoft WDM Image Capture (Win32) (VFW). This entry comes from a Microsoft driver called VFWWDM32 and is a wrapper that allows WDM-model drivers to be used through the older Video for Windows (VFW) API. The WDM driver that is adapted can be selected through the Video Source driver dialog.

There are unfortunately some quirks in the way this adapter works, and some video capture devices will work erratically or not at all through this wrapper. Device settings not accessible through VFW will also still not be available when using it. If possible, use the capture device directly in DirectShow mode rather than using the VFWWDM32 driver.

This is works pretty nice with VFW API and applications. Even though the are all ancient and deprecated years ago, the system still has bridge to newer devices and applications can leverage their functionality. The problem is that there is only one VFW driver, and its index is zero. If you need two cameras you’re busted.

VFWWDM32 itself does not use any system exclusive resources and there is no reason why its different instances could not be configured with different WDM devices. However, VFWWDM32 is a simple old wrapper, either thread unsafe or such as implemented as singleton. People complain the operation with two cameras is impossible or unstable. It is still possible to run two different processes (such as, for example, VirtualDub) with two completely different VFWWDM32’s which do not interfere because of process boundary and run fine. WDM device is selected using capDlgVideoSource interactively, developers had hard time to do selection programmatically.

The interesting part is how VFWWDM32 does video capture using WDM. It is a cut corner in development: instead of doing simple DriectShow graph with Source –> Renderer, or Source –> Sample Grabber -> Renderer topology, where the wrapper would easily support all DriectShow video devices, including virtual, they decided to implement it this way:

VFWWDM Filter Graph

One-filter graph, where the filter is the WDM Video Capture Filter for the device in question.

  • the graph is CLSID_FilterGraphPrivateThread type, *FINALLY* it is found what this undocumented flavor of DirectShow filter graph is used for
  • source filter output pins are not terminated, not connected to anything else
  • the graph is never run, produces VFW output in stopped state

Instead, VFWWDM32 uses some private undocumented communication to the WDM filter internals to run the device and receive frames.

Bottom line: VFW is a backward compatibility layer now on top DirectShow. DirectShow and Media Foundation both use WDM drivers to access video capture devices. Artificial constrain caused by simplistic implementation of VFWWDM driver is a limit of one video camera per process at a time.