Comment on Video Capture Issues with Windows 10 Anniversary Update

There is a comment from MSFT’s Mike M on MSDN Forums on recent issue with compressed video capture. I am pulling it out completely as a quote below:

I’d like to start off by providing you guys a little more context on the behavior you’re encountering.

One of the main reasons that Windows is decoding MJPEG for your applications is because of performance. With the Anniversary Update to Windows 10, it is now possible for multiple applications to access the camera in ways that weren’t possible before. It was important for us to enable concurrent camera access, so Windows Hello, Microsoft Hololens and other products and features could reliably assume that the camera would be available at any given time, regardless of what other applications may be accessing it. One of the reasons this led to the MJPEG decoding is because we wanted to prevent multiple applications from decoding the same stream at the same time, which would be a duplicated effort and thus an unnecessary performance hit. This can be even more noticeable or perhaps trigger error cases on in-market devices with a hardware decoder which may be limited on how many decodes can take place simultaneously. We wanted to prevent applications from unknowingly degrading the user experience due to a platform change.

The reasoning for H.264 being decoded can get a little more complicated (and I’m just learning the details myself as I talk to other members of the team), but the basics revolve around how H.264 allows for encoding parameters to be changed on the camera directly, and how in a situation where multiple applications are making use of this control path, they could interfere with each other. Regarding Roman’s concerns about Lync: both Lync and Skype are partner teams, and we stay in touch throughout the development process, so the camera functionality in those applications will continue to work.

So yes, MJPEG and H.264 being decoded / filtered out is the result of a set of features we needed to implement, and this behavior was planned, designed, tested, and flighted out to our partners and Windows Insiders around the end of January of this year.  We worked with partners to make sure their applications continued to function throughout this change, but we have done a poor job communicating this change out to you guys. We dropped the ball on that front, so I’d like to offer my apologies to you all. We’re working on getting better documentation out, to help answer any questions you may have. Of course, you can always reach out to us via these forums for specific issues, as we monitor them regularly, or file feedback using the Feedback Hub. We’re constantly collecting feedback on this and other issues, so we can better understand the impact on our application developers and customers. If you’re having issues adapting your application code to the NV12 / YUY2 media types, we’d like to support you through the changes you may need to make. To get you started, please refer to the documentation links in my previous post. If there are reasons why working with this format isn’t feasible for your project, please let me know, and I’ll present them to the rest of the team, to try and find the best solution for your case.

Dacuda and Stephan B, I’m curious about your specific situations, since you report that this change is breaking functionality for your customers. Are your customers using custom camera hardware? Is the set of supported cameras restricted by your applications? How do your applications deal with devices like the Surface Pro 4, Surface Book, or Dell Venue Pro, which wouldn’t offer the media types your applications are relying on?
I’d like to wrap up this wall of text by letting you know that your feedback here and through other channels is greatly appreciated and something that’s on our radar. We’re trying to look into what other options we can offer you to be able to improve on this for your (and our) customers, so stay tuned! I invite you to please subscribe to this thread (use the “Alert me” link at the top), and I’ll keep you guys updated on what we find. Thanks!

Basically, it’s bad news for those who consume compressed video from capture devices – the breaking change is intentional. Something is offered in exchange and I hope someone will present the platform changes in a friendly readable document. In particular, Microsoft seems to be adding VP8/9 video decoder and encoder in this new platform version (more later on that perhaps).

Understanding Your DirectShow Filter Graph

Many questions in DirectShow development are caused by lack of developer’s understanding what topology his code effectively built. Intelligent Connect and RenderXxx methods help adding and connecting filters and in the end a developer does not have a faintest idea what the pipeline looks like.

DirectShow API provides methods to enumerate filters, pins, connection and obtained detailed information about the filter graph. The API is well-documented. Then Windows SDK is shipped with GraphEdit which helps building graphs interactively. Ability to publish a graph on ROT and review it from GraphEdit is nothing but powerful. And then we have GraphStudioNext which makes everything even more convenient.

This does not seem sufficient and clear as many new questions and misunderstanding show that developers have false assumptions on graphs their applications use.

DirectShowSpy goes one step further with debugging options. With DirectShowSpy one can embed reviewing UI right into the developed application and either generate detailed textual description of filters, connections, media types as well as pass filter graph to GraphEdit/GraphStudioNext for interactive review with visualized topology. No excuses left any longer for misunderstanding built topologies.

Steps below explain in detail how to visualize your application DirectShow filter graph and generate a textual report on graph details.

1. For starters, one needs to intall DirectShowSpy in target system. Standard installation is mentioned in original post.

  • It is necessary that DirectShowSpy of correct/matching bitness is installed. 32-bit applications use 32-bit DirectShowSpy and 64-bit applications – 64-bit DirectShowSpy. .NET applications built as “Any CPU” are effectively either 32 or 64 bit processes and respectively need a matching spy as well.
  • To cut long story short, simply download DirectShow*.* from Toolbox and use DirectShowSpy-Win32-reg-ui.bat or DirectShowSpy-x64-reg-ui.bat to pop up registration UI. You need local administrator privileges for the registration step (or spy is usable through COM otherwise but it’s beyond scope of this post).

2. DirectShowSpy’s FilterGraphHelper object (already mentioned earlier) offers DoPropertyFrameModal method to pop up diagnostic UI. The helper needs prior initialization with either graph, filter or pin interface. C++ code snippet:

#import "libid:B9EC374B-834B-4DA9-BFB5-C1872CE736FF" raw_interfaces_only // AlaxInfoDirectShowSpy
// ...
CComPtr<IFilterGraph2> pFilterGraph;
// ...
CComPtr<AlaxInfoDirectShowSpy::IFilterGraphHelper> pFilterGraphHelper;

C# code snippet:

IFilterGraph2 graph = new FilterGraph() as IFilterGraph2;
// ...
FilterGraphHelper helper = new FilterGraphHelper();
helper.FilterGraph = graph;

Downloadable sample projects (FilterGraphHelperDialog for C# and FilterGraphHelperDialog2 for C++) are available in Subversion repository or Trac.

3. DoPropertyFrameModal methods opens a window (it’s argument is parent window handle, optional) with details about the graph, including copyable diagnostic text, filters and their property pages all gathered in single window.

FilterGraphHelper.DoPropertyFrameModal UI

NOTE: With root tree element “Filters” selected, the right-side pane contains the text that provides filter graph description (see image above)!

4. Additionally, it is possible to launch GraphEdit/GraphStudioNext with a hotkey and open – through ROT – the graph visually.

FilterGraphHelper.DoPropertyFrameModal UI (Actions)

Remote Graph in GraphStudioNext

This requires that Windows SDK proppage.dll is available. It is normally registered with Windows SDK, and otherwise can be copied from SDK into target system and COM-registered using regsvr32. Or copied into the folder of DirectShowSpy in which case DirectShowSpy-Win32-reg-ui.bat (see item 1 above) file will see it and offer additional property page for registration.

5. When no longer needed, DirectShowSpy can be removed from system using the batch file mentioned above in item 1.

Whatever debugging you do with DirectShow filter graph, you need a complete understanding what filter graph you deal with. If you want to provide additional information to certain DirectShow related question, a copy/pasted diagnostic information needs to be attached to such question so that others understand what you are dealing with exactly.

Little known DirectShow VMR-7 snapshot problem

There is so little information about this problem (not really a bug, rather a miscalculation) out there because it is coming up with customized Video Mixing Renderer Filter 7 and there is no problem with straightforward use.

In windowless mode the renderer is accepting media samples and displays them as configured. IVMRWindowlessControl::GetCurrentImage method is available to grab currently presented image and obtain a copy of what is displayed at the moment – the snapshot. The renderer is doing a favor and converts it to RGB, and the interface method is widely misused as a way to access uncompressed video frame, esp. in format compatible with other APIs or saving to bitmap (a related earlier post: How To: Save image to BMP file from IBasicVideo or VMR windowless interface).

One of the problems with the method is that it reads back from video memory, which is – in some configurations – an extremely expensive operation and is simply unacceptable because of its impact overall.

This time, however, I am posting another issue. By default VMR-7 is offering a memory allocator of one media sample. It accepts a new frame and then blits it into video device. Simple. With higher resolutions, higher frame rates and in the same time having VMR-7 as a legacy API working through compatibility layers, we are getting into situation that this presentation method becomes a bottleneck. We cannot pre-load next video frame before getting back from presentation call. For 60 frames/second video this means that with any congestion 17 millisecond long we might miss a chance to present next video frame of a video stream. Virtual artifact and these things are perceptible.

An efficient solution to address this problem is to increase number of buffers in video renderer’s memory allocator, and then fill buffers asynchronously. This does work well: we fill the buffers well in advance, the costly operation does not have to complete within frame presentation time frame. Pushing media pipeline pre-loads video buffers in efficient way and then video renderer simply grabs out of the queue a prepared frame and presents it. Terrific!

The video renderer’s input is thus a queue of media samples. It keeps popping and presenting them matching their time stamps against presentation clock waiting respective time. Now let us have a look at snapshot method signature:

HRESULT GetCurrentImage(
  [out] BYTE **lpDib

We have an image, that’s good and now the problem is that it is not clear which sample from the queue this image corresponds to. VMR-7 does not report associated time stamp even though it has this information. The problem is that it could have accepted a frame already and returned control, but presentation is only scheduled and the caller cannot derive the time stamp even from the fact that renderer filter completed the delivery call.

Video Mixing Renderer 9 is presumably subject to the same problem.

In constrast, EVR method’s IMFVideoDisplayControl::GetCurrentImage call is already:

HRESULT GetCurrentImage(
  [in, out] BITMAPINFOHEADER *pBih,
  [out]     BYTE             **pDib,
  [out]     DWORD            *pcbDib,
  [in, out] LONGLONG         *pTimeStamp

That is, at some point someone asked the right question: “So we have the image, where is time stamp?”.

Presumably, VMR-7 custom allocator/presenter can work this problem around as presenter processes the time stamp information and can reports what standard VMR-7 does not.

Reference Signal Source: RGB32/ARGB32 Subtypes, Media Foundation Media Source for Video

An update for Reference signal source for DirectShow DLLs:

  • the source is doing more accurately RGB subtypes and allows specification whether you want MEDIASUBSTYPE_RGB32 or MEDIASUBSTYPE_ARGB32
  • additionally the DLL implements Microsoft Media Foundation Media Source for the video stream

A more detailed description follows.

RGB32 and ARGB32 are very close and share the same byte structure, and due to minimal support of alpha channel with video, these are having the difference mostly in counterpart support in other applications, like for example and specifically hardware-assisted H.264 encoders whcih are taking alpha-enabled variant.

IVideoSourceFilter::SetMediaType method takes vCompression argument which defines the subtype. RegisterSources sample code shows how the method is used when exposing reference signal as video capture device.

Similar IVideoMediaSource::SetMediaType methods is applicable to Media Foundation counterpart (see below).

Both implementation only offer the given subtype as default, but in the same time both accept the other variant as well if an application or peer connection is trying to re-agree the media type. Same applies to changing resolution etc. The sources are flexible to take different video format if anyone is requesting it.

The other big new thing is Media Foundation API Media Source which generates reference signal as well. There is no option to set it up as a virtual camera because the API does not offer extensibility of the kind, however the source can be used to generate test content via Media Foundation and the code remains pretty simple. I am publishing MfGenerate code snippet which demonstrates the necessary steps to create an MP4 file with video, with desired properties.

A frame from generated 4096x2304 content in Windows 10 player

As Media Foundation offers H.265 (HEVC) and fragmented MP4 options, they can also be easily used with the source to generate test footage.

The code does the following steps:

  1. Creates a media source (commented out lines show alternate steps to create a media source from a file)
  2. Creates a source reader from media source
  3. Builds an H.264 media type from raw video media type
  4. Creates and configures a sink writer, which is instructed to do its magic setting up H.264 encoder (a side note – the code produces 4096×2304 video, however it is only possible once hardware encoder is enabled; software encoder was rejecting the media type)
  5. Implements a loop of reading frames until they run out feeding them into encoder/writer

High level APIs are simple (similar to DirectShow), which is untrue for the internals (similar to DirectShow; even more so).

Media Foundation source is video only for now.

MF media source is supposed to be seekable (not really tested; not really testable with topoedit), and allows zero duration to produce infinite feed. Duration is not necessarily taken from property, it can also be specified with overwritten presentation descriptor attribute. The video format can also be set up through stream descriptor media type handler.

Download links

Update – Connecting MF Media Source to MFCaptureD3D Sample application

To quickly connect MF media source to Windows SDK MFCaptureD3D Sample application, add #import and a few code lines replacing the source around CPreview::SetDevice as shown on the image below:

MFCaptureD3D update for custom media source

Intel® RealSense™ Camera in DirectShow/Media Foundation

There is an intersting submission for video capture device capabilities for “The Short-Range Intel® RealSense™ Camera F200” camera. Another blog user earlier mentioned they have a good stock of the devices with plans to take advantage of new technology.

Intel Realsense camera ad from Intel website

It sounds like the new cameras offer new opportunities for application in user interaction, in ability to conveniently enhance user experience with things like gestures etc.

This is what the camera looks like on the software side:

  • Intel(R) RealSense(TM) 3D Camera Virtual Driver
  • Intel(R) RealSense(TM) 3D Camera (Front F200) RGB
  • Intel(R) RealSense(TM) 3D Camera (Front F200) Depth

Presumably, there are synchronized video and depth sources. It might so happen that SDK offers other presentations of the data (snapshots for combined data and combined stream?).

So what it is all about in terms of how it looks for a video capture application and APIs? The video sensor offers standard video caps and YUV 4:2:2 video stream at 60 fps at resolutions up to 960×540, higher resolutions up to 1920×1080 at 30 fps. This exceeds USB 2.0 bandwidth, so this is either USB 3.0 device or there is hardware compression, with internal software decompression. The video device does not offers compressed video feed capabilities.

There is another video source named “Depth”. It offers YUY2 feed as well as other options with fancy FourCCs (ILER, IRNI, IVNI, IZNI, RVNI, ZVNI?) which is presumably delivering depth information at 640×480@60. Respective SDK is supposedly have the formats documented.

At 60 frames per second and supposedly low latency, the data should be a source of good real-time information to track gestures and reconstruction of short range 3D scene in front of the camera.

Original DirectShow and Media Foundation capability files:

Additional in-depth information about the technology:

DirectShowSpy: Restore default system behavior

There was a problem reported for registered and relocated DirectShowSpy, which might be causing issues: Deleting faulty DirectShowSpy registry key.

Some users that use a 3rd party tool called DirectShowSpy may encounter errors when logging in to XSplit.

This can be caused by a fault registry key that is introduced when DirectShowSpy is registered to intercept Filter Graph initialization — Filter Graph is used by XSplit. The faulty DirectShowSpy registry key is usually caused by DirectShowSpy program begin relocated after registration.

To workaround this situation, XSplit1 detects the presence of HKEYCLASSESROOT\CLSID{E436EBB3-524F-11CE-9F53-0020AF0BA770}\TreatAs registry key2 when it fails to initialize Filter Graph and exits when it is found. In this case, user must manually correct the DirectShowSpy registration or delete3 the registry key. Only after either is done can XSplit be restarted.

The description of the problem is good, solution is good but incomplete.

DirectShowSpy intercepts a few COM classes, not just one, and removing single registry value is only a partial fix.

DirectShowSpy.dll exports UnregisterTreatAsClasses function to accurately restore operation of system classes. It does registry permission magic and updates all COM classes involved. Default unregistration (DllUnregisterServer, regsvr32 /u) behavior is to restore original classes only in case they are currently overridden by DirectShowSpy. That is, if the DLL is moved (deleted) the broken registrations are retained in the registry during unregistration process.

UnregisterTreatAsClasses resolved this problem by forcing recovery of original classes no matter who is overriding them at the moment.

C:\>rundll32 DirectShowSpy-Win32.dll,UnregisterTreatAsClasses
C:\>rundll32 DirectShowSpy-x64.dll,UnregisterTreatAsClasses

Video for Windows API and Multiple Cameras

A StackOverflow question (already deleted) asked about use of indices when referencing Video for Windows (VFW) capture devices such as in capGetDriverDescription API and other. Video capture with Video for Windows allowed use of up to 10 devices (did anyone have that many at that time?). The numbering was API specific and at the latest stage the documentation described it as:

Plug-and-Play capture drivers are enumerated first, followed by capture drivers listed in the registry, which are then followed by capture drivers listed in SYSTEM.INI.

Even though it is a legacy API, and the API was really really simple and limited in capabilities, it still exists in all Windows versions, there is still some code running, and due to complexity of modern APIs some people still use it in VB.NET and C# projects.

There is, however, a trap involved if someone attempts to use multiple cameras using VFW. VFW drivers are no longer developed since… Let us see what VirtualDub says about dates and how ancient they are:

The newer type of video capture driver in Windows uses the Windows Driver Model (WDM), which was introduced in Windows 98 and 2000. The Microsoft DirectShow API is the primary API to use these drivers. Because the DirectShow API supports a larger variety of commands and settings than VFW, the functionality set of a WDM driver is significantly improved.

DirectShow is a much more complex API than VFW, however, and WDM-model drivers historically have been a lot less stable than their VFW counterparts. It is not unusual to see problems such as capture applications that cannot be closed, because their program execution is stuck in the capture driver. WDM is the proscribed driver model going forward, however, so the situation should improve over time.

All new drivers were WDM drivers for 15+ years. In order to provide backward compatibility later between VFW and WDM, Microsoft came out with Microsoft WDM Image Capture (Win32) driver. Windows versions up to Windows 10 include it, and it is the only VFW driver in the system. In turn, it manages one of existing WDM-driver devices of choice and exposes video capture functionality to VFW applications. If there are two or more WDM drivers, the VFW driver offers to choose between the devices.

VFW Capture Source Dialog

The screenshot displays a long standing bug with this driver: it offers choices of all registered DirectShow video capture devices (it enumerates CLSID_VideoInputDeviceCategory category) and reality is that it can only work with WDM devices and not other (more on this below).

VirtualDub has a mention of this driver as well:

If you have a Windows Driver Model (WDM) driver installed, you may also have an entry in the device list called Microsoft WDM Image Capture (Win32) (VFW). This entry comes from a Microsoft driver called VFWWDM32 and is a wrapper that allows WDM-model drivers to be used through the older Video for Windows (VFW) API. The WDM driver that is adapted can be selected through the Video Source driver dialog.

There are unfortunately some quirks in the way this adapter works, and some video capture devices will work erratically or not at all through this wrapper. Device settings not accessible through VFW will also still not be available when using it. If possible, use the capture device directly in DirectShow mode rather than using the VFWWDM32 driver.

This is works pretty nice with VFW API and applications. Even though the are all ancient and deprecated years ago, the system still has bridge to newer devices and applications can leverage their functionality. The problem is that there is only one VFW driver, and its index is zero. If you need two cameras you’re busted.

VFWWDM32 itself does not use any system exclusive resources and there is no reason why its different instances could not be configured with different WDM devices. However, VFWWDM32 is a simple old wrapper, either thread unsafe or such as implemented as singleton. People complain the operation with two cameras is impossible or unstable. It is still possible to run two different processes (such as, for example, VirtualDub) with two completely different VFWWDM32’s which do not interfere because of process boundary and run fine. WDM device is selected using capDlgVideoSource interactively, developers had hard time to do selection programmatically.

The interesting part is how VFWWDM32 does video capture using WDM. It is a cut corner in development: instead of doing simple DriectShow graph with Source –> Renderer, or Source –> Sample Grabber -> Renderer topology, where the wrapper would easily support all DriectShow video devices, including virtual, they decided to implement it this way:

VFWWDM Filter Graph

One-filter graph, where the filter is the WDM Video Capture Filter for the device in question.

  • the graph is CLSID_FilterGraphPrivateThread type, *FINALLY* it is found what this undocumented flavor of DirectShow filter graph is used for
  • source filter output pins are not terminated, not connected to anything else
  • the graph is never run, produces VFW output in stopped state

Instead, VFWWDM32 uses some private undocumented communication to the WDM filter internals to run the device and receive frames.

Bottom line: VFW is a backward compatibility layer now on top DirectShow. DirectShow and Media Foundation both use WDM drivers to access video capture devices. Artificial constrain caused by simplistic implementation of VFWWDM driver is a limit of one video camera per process at a time.