Understanding Your DirectShow Filter Graph

Many questions in DirectShow development are caused by a developer's lack of understanding of what topology their code effectively built. Intelligent Connect and the RenderXxx methods help add and connect filters, and in the end the developer often does not have the faintest idea what the pipeline looks like.

The DirectShow API provides methods to enumerate filters, pins, and connections, and to obtain detailed information about the filter graph. The API is well documented. The Windows SDK also ships with GraphEdit, which helps build graphs interactively. The ability to publish a graph on the Running Object Table (ROT) and review it from GraphEdit is nothing but powerful. And then we have GraphStudioNext, which makes everything even more convenient.
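
For reference, publishing a graph on the ROT takes just a few lines of code. Below is a minimal ATL-style sketch of the well-known technique from the DirectShow documentation (function and variable names are illustrative):

#include <atlbase.h>
#include <dshow.h>

// Registers the graph on the Running Object Table so that GraphEdit/GraphStudioNext
// can attach to it; revoke later with IRunningObjectTable::Revoke(*pnRegister)
HRESULT AddToRot(IUnknown* pFilterGraphUnknown, DWORD* pnRegister)
{
    CComPtr<IRunningObjectTable> pRunningObjectTable;
    HRESULT nResult = GetRunningObjectTable(0, &pRunningObjectTable);
    if(FAILED(nResult))
        return nResult;
    // GraphEdit looks for ROT items named "FilterGraph <address> pid <process identifier>"
    WCHAR pszItemName[64];
    swprintf_s(pszItemName, L"FilterGraph %08x pid %08x", (DWORD) (DWORD_PTR) pFilterGraphUnknown, GetCurrentProcessId());
    CComPtr<IMoniker> pMoniker;
    nResult = CreateItemMoniker(L"!", pszItemName, &pMoniker);
    if(FAILED(nResult))
        return nResult;
    return pRunningObjectTable->Register(ROTFLAGS_REGISTRATIONKEEPSALIVE, pFilterGraphUnknown, pMoniker, pnRegister);
}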

Still, this does not seem to be sufficient: new questions and misunderstandings keep showing that developers hold false assumptions about the graphs their applications use.

DirectShowSpy goes one step further with its debugging options. With DirectShowSpy one can embed a review UI right into the application being developed and either generate a detailed textual description of filters, connections, and media types, or pass the filter graph to GraphEdit/GraphStudioNext for interactive review with visualized topology. No excuses are left for misunderstanding the built topologies.

The steps below explain in detail how to visualize your application's DirectShow filter graph and generate a textual report on the graph details.

1. For starters, one needs to install DirectShowSpy on the target system. The standard installation is described in the original post.

  • It is necessary that DirectShowSpy of the correct/matching bitness is installed. 32-bit applications use 32-bit DirectShowSpy, and 64-bit applications need the 64-bit DirectShowSpy. .NET applications built as “Any CPU” effectively run as either 32- or 64-bit processes and need a matching spy as well.
  • To cut a long story short, simply download DirectShow*.* from the Toolbox and use DirectShowSpy-Win32-reg-ui.bat or DirectShowSpy-x64-reg-ui.bat to pop up the registration UI. You need local administrator privileges for the registration step (otherwise the spy is still usable through COM, but that is beyond the scope of this post).

2. DirectShowSpy’s FilterGraphHelper object (already mentioned earlier) offers the DoPropertyFrameModal method to pop up the diagnostic UI. The helper needs prior initialization with a graph, filter, or pin interface. C++ code snippet:

#import "libid:B9EC374B-834B-4DA9-BFB5-C1872CE736FF" raw_interfaces_only // AlaxInfoDirectShowSpy
// ...
CComPtr<IFilterGraph2> pFilterGraph;
// ...
CComPtr<AlaxInfoDirectShowSpy::IFilterGraphHelper> pFilterGraphHelper;
ATLENSURE_SUCCEEDED(pFilterGraphHelper.CoCreateInstance(__uuidof(AlaxInfoDirectShowSpy::FilterGraphHelper)));
ATLENSURE_SUCCEEDED(pFilterGraphHelper->put_FilterGraph(pFilterGraph));
ATLENSURE_SUCCEEDED(pFilterGraphHelper->DoPropertyFrameModal(NULL));

C# code snippet:

IFilterGraph2 graph = new FilterGraph() as IFilterGraph2;
// ...
FilterGraphHelper helper = new FilterGraphHelper();
helper.FilterGraph = graph;
helper.DoPropertyFrameModal(0);

Downloadable sample projects (FilterGraphHelperDialog for C# and FilterGraphHelperDialog2 for C++) are available in the Subversion repository or via Trac.

3. The DoPropertyFrameModal method opens a window (its optional argument is a parent window handle) with details about the graph, including copyable diagnostic text, filters, and their property pages, all gathered in a single window.

FilterGraphHelper.DoPropertyFrameModal UI

NOTE: With the root tree element “Filters” selected, the right-side pane contains the text that provides the filter graph description (see image above)!

4. Additionally, it is possible to launch GraphEdit/GraphStudioNext with a hotkey and open the graph – through the ROT – for visual review.

FilterGraphHelper.DoPropertyFrameModal UI (Actions)

Remote Graph in GraphStudioNext

This requires that the Windows SDK proppage.dll is available. It is normally registered along with the Windows SDK; otherwise it can be copied from the SDK to the target system and COM-registered using regsvr32, or copied into the DirectShowSpy folder, in which case DirectShowSpy-Win32-reg-ui.bat (see item 1 above) will see it and offer an additional property page for registration.

5. When no longer needed, DirectShowSpy can be removed from the system using the batch file mentioned in item 1 above.

Whatever debugging you do with a DirectShow filter graph, you need a complete understanding of the filter graph you are dealing with. If you want to provide additional information for a DirectShow related question, copy/pasted diagnostic information should be attached to the question so that others understand exactly what you are dealing with.

Video Capture Issues with Windows 10 Anniversary Update

Windows 10 Anniversary Update brought a breaking change that, in many cases, removed hardware-compressed video formats from the video capture APIs, even though the devices themselves are known to have the respective capabilities.

The magic of the Logitech C930e camera is also lost, since it is no longer available for video capture in hardware-compressed H.264.

Presumably an issue in a middle layer such as Kernel Streaming, it shows up as missing support for some of the video device formats, esp. Motion JPEG (FourCC MJPG) and H.264 (FourCC H264). The depth of the issue also means that the capabilities are equally missing from both the DirectShow and Media Foundation APIs.

Unless someone wants to develop a driver that talks to the hardware directly, bypassing the Kernel Streaming subsystem (such as, for example, “DeckLink Video Capture” for Blackmagic Design devices, which leverages the SDK and not the complementary WDM driver), it seems that the only solution is to wait for a fix from Microsoft. It should be here rather sooner than later, taking into consideration the scope of the problem.

Or rather, try to stop automatic updates if video capture is important to you.

UPDATE: Mike from MSFT commented in this thread:

Hey guys, Mike from the Camera team here. We saw this concern pop up for the first time a couple of days ago, and we’d like to understand exactly what difficulties you are all facing. There are two items that I would like the discussion to focus on.

Firstly, the media types supported on a given machine vary depending on the capabilities of the camera device you’re using. There is no guarantee that a media type such as MJPEG will be supported on all cameras (for example, the Surface Pro 4 / Surface Book cameras do not support it). This means that, by taking a hard dependency on it being available, you’re limiting the portability of your application to the set of cameras that do offer that media type. What applications should do instead, is query the available formats (see: IMFSourceReader::GetNativeMediaType for MediaFoundation, IAMStreamConfig::GetStreamCaps or the IPin::EnumMediaTypes article for DirectShow), and make a selection based on the one that best suits your needs or capabilities.

Secondly, we are expecting that almost all clients using MJPEG frames will be decoding the stream as one of the first steps in their pipeline. In order for this to be done in the most efficient way, with the smallest impact to overall system performance, we want to offload that piece of the process to be taken care of by the platform. This means your application should be able to consume the uncompressed frames, which will have been decoded to NV12 for MediaFoundation applications, and YUY2 for DirectShow applications.

If there are any other issues that you’d like to bring to our attention, please do so. We’d love to understand how we can best help you through this transition, and we’ll be considering your feedback for future planning. Thanks!
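
To put Mike’s advice in DirectShow terms, here is a minimal sketch of such format enumeration (pPin is assumed to be the capture filter’s video output pin; error handling reduced to assertions):

#include <atlbase.h>
#include <dshow.h>

VOID ListCapabilities(IPin* pPin)
{
    CComQIPtr<IAMStreamConfig> pStreamConfig = pPin;
    ATLASSERT(pStreamConfig);
    INT nCount = 0, nSize = 0;
    ATLENSURE_SUCCEEDED(pStreamConfig->GetNumberOfCapabilities(&nCount, &nSize));
    ATLASSERT(nSize == sizeof (VIDEO_STREAM_CONFIG_CAPS));
    for(INT nIndex = 0; nIndex < nCount; nIndex++)
    {
        VIDEO_STREAM_CONFIG_CAPS Capabilities;
        AM_MEDIA_TYPE* pMediaType = NULL;
        ATLENSURE_SUCCEEDED(pStreamConfig->GetStreamCaps(nIndex, &pMediaType, (BYTE*) &Capabilities));
        // Inspect pMediaType->subtype (e.g. MEDIASUBTYPE_YUY2) and the resolution and
        // frame rate ranges in Capabilities, then apply the best match with
        // IAMStreamConfig::SetFormat instead of assuming MJPG or H264 availability
        DeleteMediaType(pMediaType); // helper from the DirectShow base classes
    }
}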

Untweetable Video

Two H.264 MP4 files, close to one another. The files are playable, sort of: Windows desktop players (except Media Foundation based ones), QuickTime, Android and iOS devices play them. The files are not flawless, hence the glitches, but they make sense.

There is a problem with the second file, which Twitter rejects with “Your media file could not be processed.” The file is treated well by the browser and uploaded to the remote server; Twitter is then unable to convert it on its backend.

Little-Known DirectShow VMR-7 Snapshot Problem

There is so little information about this problem out there (not really a bug, rather a miscalculation) because it only comes up with a customized Video Mixing Renderer Filter 7; there is no problem with straightforward use.

In windowless mode the renderer accepts media samples and displays them as configured. The IVMRWindowlessControl::GetCurrentImage method is available to grab the currently presented image and obtain a copy of what is displayed at the moment – the snapshot. The renderer does a favor and converts it to RGB, and the interface method is widely misused as a way to access an uncompressed video frame, esp. in a format compatible with other APIs or for saving to a bitmap (a related earlier post: How To: Save image to BMP file from IBasicVideo or VMR windowless interface).

One of the problems with the method is that it reads back from video memory, which is – in some configurations – an extremely expensive operation and is simply unacceptable because of its overall impact.

This time, however, I am writing about another issue. By default VMR-7 offers a memory allocator of one media sample. It accepts a new frame and then blits it into the video device. Simple. With higher resolutions, higher frame rates, and VMR-7 at the same time being a legacy API working through compatibility layers, we get into a situation where this presentation method becomes a bottleneck. We cannot pre-load the next video frame before getting back from the presentation call. For 60 frames/second video this means that any congestion as short as 17 milliseconds might cost us the chance to present the next video frame of the stream. Visual artifacts of this kind are perceptible.

An efficient solution to this problem is to increase the number of buffers in the video renderer’s memory allocator, and then fill the buffers asynchronously. This works well: we fill the buffers well in advance, so the costly operation does not have to complete within the frame presentation time. The pushing media pipeline pre-loads video buffers efficiently, and the video renderer simply grabs a prepared frame out of the queue and presents it. Terrific!
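
A sketch of one way to request the extra buffers, assuming this is done before the pins are connected and that the VMR-7 input pin honors the suggestion (a custom upstream filter could equally request more buffers during its own allocator negotiation):

// pInputPin is assumed to be the VMR-7 input pin, not yet connected
CComQIPtr<IAMBufferNegotiation> pBufferNegotiation = pInputPin;
if(pBufferNegotiation)
{
    ALLOCATOR_PROPERTIES Properties;
    Properties.cBuffers = 8; // instead of the default single media sample
    Properties.cbBuffer = -1; // -1 leaves a field to regular negotiation
    Properties.cbAlign = -1;
    Properties.cbPrefix = -1;
    ATLENSURE_SUCCEEDED(pBufferNegotiation->SuggestAllocatorProperties(&Properties));
}
// ... then connect the pins; the agreed allocator carries the extra buffers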

The video renderer’s input is thus a queue of media samples. It keeps popping and presenting them, matching their time stamps against the presentation clock and waiting the respective time. Now let us have a look at the snapshot method signature:

HRESULT GetCurrentImage(
  [out] BYTE **lpDib
);
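
For reference, typical use of the call looks like this; the method allocates a packed DIB (a BITMAPINFOHEADER followed by the pixel data) which the caller releases with CoTaskMemFree:

CComPtr<IVMRWindowlessControl> pWindowlessControl; // queried from the VMR-7 filter
// ...
BYTE* pnDib = NULL;
ATLENSURE_SUCCEEDED(pWindowlessControl->GetCurrentImage(&pnDib));
const BITMAPINFOHEADER* pBitmapInfoHeader = (const BITMAPINFOHEADER*) pnDib;
// Pixel data follows the header; save it to a BMP file or copy it elsewhere
CoTaskMemFree(pnDib);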

We have an image, which is good, but the problem is that it is not clear which sample from the queue this image corresponds to. VMR-7 does not report the associated time stamp even though it has this information. The renderer could have accepted a frame already and returned control while the presentation is only scheduled, so the caller cannot derive the time stamp even from the fact that the renderer filter completed the delivery call.

Video Mixing Renderer 9 is presumably subject to the same problem.

In contrast, the EVR counterpart, IMFVideoDisplayControl::GetCurrentImage, already has it:

HRESULT GetCurrentImage(
  [in, out] BITMAPINFOHEADER *pBih,
  [out]     BYTE             **pDib,
  [out]     DWORD            *pcbDib,
  [in, out] LONGLONG         *pTimeStamp
);
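
Typical use, for comparison: the caller provides the header with cbSize set, frees the returned pixel buffer with CoTaskMemFree, and this time receives the presentation time of the snapshotted frame as well:

CComPtr<IMFVideoDisplayControl> pVideoDisplayControl; // obtained from the EVR via MFGetService with MR_VIDEO_RENDER_SERVICE
// ...
BITMAPINFOHEADER BitmapInfoHeader = { sizeof BitmapInfoHeader };
BYTE* pnDib = NULL;
DWORD nDibSize = 0;
LONGLONG nTime = 0;
ATLENSURE_SUCCEEDED(pVideoDisplayControl->GetCurrentImage(&BitmapInfoHeader, &pnDib, &nDibSize, &nTime));
// nTime carries the time stamp (in 100 ns units) that VMR-7 never reports
CoTaskMemFree(pnDib);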

That is, at some point someone asked the right question: “So we have the image, but where is the time stamp?”

Presumably, a VMR-7 custom allocator/presenter can work around this problem, as the presenter processes the time stamp information and can report what the standard VMR-7 does not.

Reference Signal Source: Unity 3D Integration

This is not really practical – just some fun and an attempt to make use of a readily available code snippet. So… reference signal in the Unity 3D game engine through Media Foundation and Direct3D 11.

Reference signal in Unity 3D

It is basically the Low-Level Native Plugin example project (limited to Direct3D 11 and Visual Studio 2015), which works together with DirectShowReferenceSource-x64.dll and does a few things:

  1. the standard plugin example’s inner rendering loop is commented out
  2. the plugin COM-instantiates the VideoMediaSourceTextureReader class of the DirectShowReferenceSource project and passes a texture in for further updates:
    • the helper class instantiates a Media Foundation media source and initializes a compatible resolution
    • it also instantiates a Source Reader to conveniently read from the media source
    • it starts producing video frames
  3. the plugin rendering loop makes IVideoMediaSourceTextureReader::Update calls passing Unity time; the helper class matches generated frames against the requested time and updates the texture respectively

The scene thus contains a plane with a custom material updated on the back end with Media Foundation produced data.

The demo is not really about performance, because so many things could be done to improve it and make it, well… done right: threading, tighter Direct3D integration, use of a proper Direct2D render target, etc. The demo is really about the integration itself. Nevertheless, a 2K texture is streamed well at full rate.

Download links

Reference Signal Source: RGB32/ARGB32 Subtypes, Media Foundation Media Source for Video

An update for the Reference signal source for DirectShow DLLs:

  • the source handles RGB subtypes more accurately and allows specifying whether you want MEDIASUBTYPE_RGB32 or MEDIASUBTYPE_ARGB32
  • additionally, the DLL implements a Microsoft Media Foundation media source for the video stream

A more detailed description follows.

RGB32 and ARGB32 are very close and share the same byte structure. Due to the minimal support for alpha channel in video, the difference mostly comes down to counterpart support in other applications, for example and specifically hardware-assisted H.264 encoders which take the alpha-enabled variant.

The IVideoSourceFilter::SetMediaType method takes a vCompression argument which defines the subtype. The RegisterSources sample code shows how the method is used when exposing the reference signal as a video capture device.

A similar IVideoMediaSource::SetMediaType method applies to the Media Foundation counterpart (see below).

Both implementations offer only the given subtype by default, but at the same time both accept the other variant as well if an application or a peer connection tries to re-agree the media type. The same applies to changing resolution, etc. The sources are flexible enough to take a different video format if anyone requests it.

The other big new thing is the Media Foundation media source, which generates the reference signal as well. There is no option to set it up as a virtual camera, because the API does not offer extensibility of that kind; however, the source can be used to generate test content via Media Foundation, and the code remains pretty simple. I am publishing the MfGenerate code snippet, which demonstrates the necessary steps to create an MP4 file with video, with the desired properties.

A frame from generated 4096×2304 content in Windows 10 player

As Media Foundation offers H.265 (HEVC) and fragmented MP4 options, they can also easily be used with the source to generate test footage.

The code does the following steps (a condensed sketch follows the list):

  1. Creates a media source (commented out lines show the alternate steps to create a media source from a file)
  2. Creates a source reader from the media source
  3. Builds an H.264 media type from the raw video media type
  4. Creates and configures a sink writer, which is instructed to do its magic of setting up the H.264 encoder (a side note – the code produces 4096×2304 video, however this is only possible once the hardware encoder is enabled; the software encoder was rejecting the media type)
  5. Implements a loop of reading frames until they run out, feeding them into the encoder/writer
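
The sketch below condenses those steps into one function. It is not the actual MfGenerate code – names, the bitrate, and the media type shortcuts are illustrative – and it assumes MFStartup(MF_VERSION) has already been called and pMediaSource is the reference signal media source:

#include <atlbase.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")

VOID WriteVideoFile(IMFMediaSource* pMediaSource, LPCWSTR pszPath)
{
    // Step 2: create a source reader from the media source
    CComPtr<IMFSourceReader> pSourceReader;
    ATLENSURE_SUCCEEDED(MFCreateSourceReaderFromMediaSource(pMediaSource, NULL, &pSourceReader));
    CComPtr<IMFMediaType> pInputMediaType;
    ATLENSURE_SUCCEEDED(pSourceReader->GetCurrentMediaType((DWORD) MF_SOURCE_READER_FIRST_VIDEO_STREAM, &pInputMediaType));
    // Step 3: build an H.264 media type from the raw video media type (copying the
    // raw type and overriding the subtype is a shortcut for brevity)
    CComPtr<IMFMediaType> pOutputMediaType;
    ATLENSURE_SUCCEEDED(MFCreateMediaType(&pOutputMediaType));
    ATLENSURE_SUCCEEDED(pInputMediaType->CopyAllItems(pOutputMediaType));
    ATLENSURE_SUCCEEDED(pOutputMediaType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264));
    ATLENSURE_SUCCEEDED(pOutputMediaType->SetUINT32(MF_MT_AVG_BITRATE, 8000000)); // illustrative
    // Step 4: create and configure a sink writer, which sets up the H.264 encoder;
    // to allow a hardware encoder, pass an attribute store with
    // MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS instead of the NULL argument
    CComPtr<IMFSinkWriter> pSinkWriter;
    ATLENSURE_SUCCEEDED(MFCreateSinkWriterFromURL(pszPath, NULL, NULL, &pSinkWriter));
    DWORD nStreamIndex;
    ATLENSURE_SUCCEEDED(pSinkWriter->AddStream(pOutputMediaType, &nStreamIndex));
    ATLENSURE_SUCCEEDED(pSinkWriter->SetInputMediaType(nStreamIndex, pInputMediaType, NULL));
    ATLENSURE_SUCCEEDED(pSinkWriter->BeginWriting());
    // Step 5: read frames until they run out, feeding them into the encoder/writer
    for(; ; )
    {
        DWORD nStreamFlags = 0;
        LONGLONG nTime = 0;
        CComPtr<IMFSample> pSample;
        ATLENSURE_SUCCEEDED(pSourceReader->ReadSample((DWORD) MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0, NULL, &nStreamFlags, &nTime, &pSample));
        if(nStreamFlags & MF_SOURCE_READERF_ENDOFSTREAM)
            break;
        if(pSample)
            ATLENSURE_SUCCEEDED(pSinkWriter->WriteSample(nStreamIndex, pSample));
    }
    ATLENSURE_SUCCEEDED(pSinkWriter->Finalize());
}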

The high-level APIs are simple (similar to DirectShow), which is untrue of the internals (also similar to DirectShow; even more so).

The Media Foundation source is video-only for now.

The MF media source is supposed to be seekable (not really tested; not really testable with topoedit), and a zero duration makes it produce an infinite feed. The duration is not necessarily taken from the property; it can also be specified with an overwritten presentation descriptor attribute. The video format can likewise be set up through the stream descriptor’s media type handler.

Download links

Update – Connecting MF Media Source to MFCaptureD3D Sample application

To quickly connect the MF media source to the Windows SDK MFCaptureD3D sample application, add the #import and a few code lines replacing the source around CPreview::SetDevice, as shown in the image below:

MFCaptureD3D update for custom media source

Intel® RealSense™ Camera in DirectShow/Media Foundation

There is an interesting submission of video capture device capabilities for “The Short-Range Intel® RealSense™ Camera F200”. Another blog user earlier mentioned they have a good stock of the devices, with plans to take advantage of the new technology.

Intel Realsense camera ad from Intel website

It sounds like the new cameras offer new opportunities for applications in user interaction, in the ability to conveniently enhance user experience with things like gestures, etc.

This is what the camera looks like on the software side:

  • Intel(R) RealSense(TM) 3D Camera Virtual Driver
  • Intel(R) RealSense(TM) 3D Camera (Front F200) RGB
  • Intel(R) RealSense(TM) 3D Camera (Front F200) Depth

Presumably, these are synchronized video and depth sources. It might well be that the SDK offers other presentations of the data (snapshots of combined data and a combined stream?).

So what is it all about in terms of how it looks to a video capture application and the APIs? The video sensor offers standard video caps and a YUV 4:2:2 video stream at 60 fps at resolutions up to 960×540, and higher resolutions up to 1920×1080 at 30 fps. This exceeds USB 2.0 bandwidth, so this is either a USB 3.0 device or there is hardware compression with internal software decompression. The video device does not offer compressed video feed capabilities.

There is another video source named “Depth”. It offers a YUY2 feed as well as other options with fancy FourCCs (ILER, IRNI, IVNI, IZNI, RVNI, ZVNI?), which presumably deliver depth information at 640×480@60. The respective SDK supposedly has these formats documented.

At 60 frames per second and supposedly low latency, the data should be a source of good real-time information for tracking gestures and reconstructing the short-range 3D scene in front of the camera.
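
For those who want to see such capabilities for themselves, here is a small self-contained sketch (names illustrative) that enumerates video capture devices via Media Foundation and dumps the native media types, where FourCC subtypes like YUY2 or the vendor-specific ones above show up:

#include <stdio.h>
#include <atlbase.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#pragma comment(lib, "mf.lib")
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")

int main()
{
    ATLENSURE_SUCCEEDED(CoInitializeEx(NULL, COINIT_MULTITHREADED));
    ATLENSURE_SUCCEEDED(MFStartup(MF_VERSION));
    CComPtr<IMFAttributes> pAttributes;
    ATLENSURE_SUCCEEDED(MFCreateAttributes(&pAttributes, 1));
    ATLENSURE_SUCCEEDED(pAttributes->SetGUID(MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE, MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_GUID));
    IMFActivate** ppActivates;
    UINT32 nActivateCount = 0;
    ATLENSURE_SUCCEEDED(MFEnumDeviceSources(pAttributes, &ppActivates, &nActivateCount));
    for(UINT32 nActivateIndex = 0; nActivateIndex < nActivateCount; nActivateIndex++)
    {
        CComPtr<IMFActivate> pActivate;
        pActivate.Attach(ppActivates[nActivateIndex]);
        CComPtr<IMFMediaSource> pMediaSource;
        ATLENSURE_SUCCEEDED(pActivate->ActivateObject(__uuidof(IMFMediaSource), (VOID**) &pMediaSource));
        CComPtr<IMFSourceReader> pSourceReader;
        ATLENSURE_SUCCEEDED(MFCreateSourceReaderFromMediaSource(pMediaSource, NULL, &pSourceReader));
        for(DWORD nTypeIndex = 0; ; nTypeIndex++)
        {
            CComPtr<IMFMediaType> pMediaType;
            if(FAILED(pSourceReader->GetNativeMediaType((DWORD) MF_SOURCE_READER_FIRST_VIDEO_STREAM, nTypeIndex, &pMediaType)))
                break; // MF_E_NO_MORE_TYPES terminates the enumeration
            GUID Subtype;
            ATLENSURE_SUCCEEDED(pMediaType->GetGUID(MF_MT_SUBTYPE, &Subtype));
            UINT32 nWidth = 0, nHeight = 0;
            MFGetAttributeSize(pMediaType, MF_MT_FRAME_SIZE, &nWidth, &nHeight);
            // For FourCC-based subtypes the code is stored in Data1 (RGB subtypes print oddly)
            printf("subtype %.4s, %u x %u\n", (const CHAR*) &Subtype.Data1, nWidth, nHeight);
        }
    }
    CoTaskMemFree(ppActivates);
    MFShutdown();
    CoUninitialize();
    return 0;
}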

Original DirectShow and Media Foundation capability files:

Additional in-depth information about the technology: