Comment on Video Capture Issues with Windows 10 Anniversary Update

There is a comment from MSFT’s Mike M on the MSDN Forums about the recent issue with compressed video capture. I am quoting it in full below:

I’d like to start off by providing you guys a little more context on the behavior you’re encountering.

One of the main reasons that Windows is decoding MJPEG for your applications is because of performance. With the Anniversary Update to Windows 10, it is now possible for multiple applications to access the camera in ways that weren’t possible before. It was important for us to enable concurrent camera access, so Windows Hello, Microsoft Hololens and other products and features could reliably assume that the camera would be available at any given time, regardless of what other applications may be accessing it. One of the reasons this led to the MJPEG decoding is because we wanted to prevent multiple applications from decoding the same stream at the same time, which would be a duplicated effort and thus an unnecessary performance hit. This can be even more noticeable or perhaps trigger error cases on in-market devices with a hardware decoder which may be limited on how many decodes can take place simultaneously. We wanted to prevent applications from unknowingly degrading the user experience due to a platform change.

The reasoning for H.264 being decoded can get a little more complicated (and I’m just learning the details myself as I talk to other members of the team), but the basics revolve around how H.264 allows for encoding parameters to be changed on the camera directly, and how in a situation where multiple applications are making use of this control path, they could interfere with each other. Regarding Roman’s concerns about Lync: both Lync and Skype are partner teams, and we stay in touch throughout the development process, so the camera functionality in those applications will continue to work.

So yes, MJPEG and H.264 being decoded / filtered out is the result of a set of features we needed to implement, and this behavior was planned, designed, tested, and flighted out to our partners and Windows Insiders around the end of January of this year.  We worked with partners to make sure their applications continued to function throughout this change, but we have done a poor job communicating this change out to you guys. We dropped the ball on that front, so I’d like to offer my apologies to you all. We’re working on getting better documentation out, to help answer any questions you may have. Of course, you can always reach out to us via these forums for specific issues, as we monitor them regularly, or file feedback using the Feedback Hub. We’re constantly collecting feedback on this and other issues, so we can better understand the impact on our application developers and customers. If you’re having issues adapting your application code to the NV12 / YUY2 media types, we’d like to support you through the changes you may need to make. To get you started, please refer to the documentation links in my previous post. If there are reasons why working with this format isn’t feasible for your project, please let me know, and I’ll present them to the rest of the team, to try and find the best solution for your case.

Dacuda and Stephan B, I’m curious about your specific situations, since you report that this change is breaking functionality for your customers. Are your customers using custom camera hardware? Is the set of supported cameras restricted by your applications? How do your applications deal with devices like the Surface Pro 4, Surface Book, or Dell Venue Pro, which wouldn’t offer the media types your applications are relying on?

I’d like to wrap up this wall of text by letting you know that your feedback here and through other channels is greatly appreciated and something that’s on our radar. We’re trying to look into what other options we can offer you to be able to improve on this for your (and our) customers, so stay tuned! I invite you to please subscribe to this thread (use the “Alert me” link at the top), and I’ll keep you guys updated on what we find. Thanks!

Basically, it is bad news for those who consume compressed video from capture devices: the breaking change is intentional. Something is offered in exchange, and I hope someone will present the platform changes in a friendly, readable document. In particular, Microsoft seems to be adding a VP8/VP9 video decoder and encoder in this platform version (more on that later, perhaps).

Understanding Your DirectShow Filter Graph

Many questions in DirectShow development are caused by the developer not understanding what topology his code has effectively built. Intelligent Connect and the RenderXxx methods help add and connect filters, and in the end the developer does not have the faintest idea what the pipeline looks like.

The DirectShow API provides methods to enumerate filters, pins and connections, and to obtain detailed information about the filter graph. The API is well documented. The Windows SDK also ships GraphEdit, which helps build graphs interactively. The ability to publish a graph on the ROT and review it from GraphEdit is nothing but powerful. And then there is GraphStudioNext, which makes everything even more convenient.
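For reference, publishing a graph on the ROT from application code is a documented technique; a helper along the lines of the snippet in the SDK documentation (names and the buffer size here are illustrative) looks roughly like this:

#include <windows.h>
#include <objbase.h>
#include <strsafe.h>

// Register the graph's IUnknown in the Running Object Table so that
// GraphEdit/GraphStudioNext can connect to it from another process
// (File | Connect to Remote Graph); revoke later with IRunningObjectTable::Revoke
HRESULT AddToRot(IUnknown* pUnkGraph, DWORD* pnRegister)
{
    IRunningObjectTable* pRunningObjectTable = NULL;
    HRESULT nResult = GetRunningObjectTable(0, &pRunningObjectTable);
    if (FAILED(nResult))
        return nResult;
    WCHAR pszItemName[256];
    // The "FilterGraph %08x pid %08x" item name format is what the graph editors look for
    StringCchPrintfW(pszItemName, 256, L"FilterGraph %08x pid %08x",
        (DWORD) (DWORD_PTR) pUnkGraph, GetCurrentProcessId());
    IMoniker* pMoniker = NULL;
    nResult = CreateItemMoniker(L"!", pszItemName, &pMoniker);
    if (SUCCEEDED(nResult))
    {
        nResult = pRunningObjectTable->Register(ROTFLAGS_REGISTRATIONKEEPSALIVE,
            pUnkGraph, pMoniker, pnRegister);
        pMoniker->Release();
    }
    pRunningObjectTable->Release();
    return nResult;
}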

Still, this does not seem to be sufficient, as new questions and misunderstandings keep showing that developers have false assumptions about the graphs their applications use.

DirectShowSpy goes one step further with its debugging options. With DirectShowSpy one can embed a review UI right into the application being developed, and either generate a detailed textual description of the filters, connections and media types, or pass the filter graph to GraphEdit/GraphStudioNext for interactive review of the visualized topology. There is no longer any excuse for misunderstanding the topology that was built.

The steps below explain in detail how to visualize your application’s DirectShow filter graph and generate a textual report on its details.

1. For starters, one needs to install DirectShowSpy on the target system. The standard installation is described in the original post.

  • DirectShowSpy of the correct/matching bitness must be installed: 32-bit applications use the 32-bit DirectShowSpy, and 64-bit applications use the 64-bit one. .NET applications built as “Any CPU” effectively run as either 32-bit or 64-bit processes and likewise need a matching spy.
  • To cut a long story short, simply download DirectShow*.* from the Toolbox and use DirectShowSpy-Win32-reg-ui.bat or DirectShowSpy-x64-reg-ui.bat to pop up the registration UI. Local administrator privileges are required for the registration step (otherwise the spy is still usable through COM, but that is beyond the scope of this post).

2. DirectShowSpy’s FilterGraphHelper object (already mentioned earlier) offers the DoPropertyFrameModal method to pop up diagnostic UI. The helper needs to be initialized beforehand with a graph, filter or pin interface. C++ code snippet:

#import "libid:B9EC374B-834B-4DA9-BFB5-C1872CE736FF" raw_interfaces_only // AlaxInfoDirectShowSpy
// ...
CComPtr<IFilterGraph2> pFilterGraph;
// ...
CComPtr<AlaxInfoDirectShowSpy::IFilterGraphHelper> pFilterGraphHelper;
ATLENSURE_SUCCEEDED(pFilterGraphHelper.CoCreateInstance(__uuidof(AlaxInfoDirectShowSpy::FilterGraphHelper)));
ATLENSURE_SUCCEEDED(pFilterGraphHelper->put_FilterGraph(pFilterGraph));
ATLENSURE_SUCCEEDED(pFilterGraphHelper->DoPropertyFrameModal(NULL)); // NULL: no parent window handle

C# code snippet:

IFilterGraph2 graph = new FilterGraph() as IFilterGraph2;
// ...
FilterGraphHelper helper = new FilterGraphHelper();
helper.FilterGraph = graph;
helper.DoPropertyFrameModal(0); // 0: no parent window handle

Downloadable sample projects (FilterGraphHelperDialog for C# and FilterGraphHelperDialog2 for C++) are available in the Subversion repository and on Trac.

3. The DoPropertyFrameModal method opens a window (its optional argument is a parent window handle) with details about the graph: copyable diagnostic text, the filters and their property pages, all gathered in a single window.

FilterGraphHelper.DoPropertyFrameModal UI

NOTE: With the root tree element “Filters” selected, the right-side pane contains the text of the filter graph description (see the image above).

4. Additionally, it is possible to launch GraphEdit/GraphStudioNext with a hotkey and open the graph visually through the ROT.

FilterGraphHelper.DoPropertyFrameModal UI (Actions)

Remote Graph in GraphStudioNext

This requires that the Windows SDK’s proppage.dll is available. It is normally registered as part of the Windows SDK; otherwise it can be copied from the SDK to the target system and COM-registered using regsvr32, or copied into the DirectShowSpy folder, in which case the DirectShowSpy-Win32-reg-ui.bat file (see item 1 above) will see it and offer an additional property page for registration.

5. When no longer needed, DirectShowSpy can be removed from the system using the batch file mentioned in item 1 above.

Whatever debugging you do with a DirectShow filter graph, you need a complete understanding of the graph you are dealing with. If you want to provide additional information for a DirectShow related question, attach the copied diagnostic text to the question so that others understand exactly what you are dealing with.

Video Capture Issues with Windows 10 Anniversary Update

Windows 10 Anniversary Update brought a breaking change that, in many cases, removed hardware-compressed video formats from the video capture APIs, even though the devices themselves are known to have the respective capabilities.

The magic of the Logitech C930e camera is also lost, since it is no longer available for video capture in hardware-compressed H.264.

Presumably an issue in a middle layer such as Kernel Streaming, it shows up as missing support for some of the video device formats, especially Motion JPEG (FourCC MJPG) and H.264 (FourCC H264). The depth of the issue also means that the capabilities are equally missing from both the DirectShow and Media Foundation APIs.

Unless someone wants to develop a driver that talks to the hardware directly, bypassing the Kernel Streaming subsystem (as, for example, “DeckLink Video Capture” does for Blackmagic Design devices by leveraging the SDK rather than the complementary WDM driver), it seems that the only solution is to wait for a fix from Microsoft. Given the scope of the problem, it should arrive sooner rather than later.

In the meantime, try to hold off automatic updates if video capture is important to you.


UPDATE: Mike from MSFT commented in this thread:

Hey guys, Mike from the Camera team here. We saw this concern pop up for the first time a couple of days ago, and we’d like to understand exactly what difficulties you are all facing. There are two items that I would like the discussion to focus on.

Firstly, the media types supported on a given machine vary depending on the capabilities of the camera device you’re using. There is no guarantee that a media type such as MJPEG will be supported on all cameras (for example, the Surface Pro 4 / Surface Book cameras do not support it). This means that, by taking a hard dependency on it being available, you’re limiting the portability of your application to the set of cameras that do offer that media type. What applications should do instead, is query the available formats (see: IMFSourceReader::GetNativeMediaType for MediaFoundation, IAMStreamConfig::GetStreamCaps or the IPin::EnumMediaTypes article for DirectShow), and make a selection based on the one that best suits your needs or capabilities.

Secondly, we are expecting that almost all clients using MJPEG frames will be decoding the stream as one of the first steps in their pipeline. In order for this to be done in the most efficient way, with the smallest impact to overall system performance, we want to offload that piece of the process to be taken care of by the platform. This means your application should be able to consume the uncompressed frames, which will have been decoded to NV12 for MediaFoundation applications, and YUY2 for DirectShow applications.

If there are any other issues that you’d like to bring to our attention, please do so. We’d love to understand how we can best help you through this transition, and we’ll be considering your feedback for future planning. Thanks!
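As an aside, the formats query that the comment recommends is straightforward with the Source Reader. A minimal sketch, assuming pSourceReader is an IMFSourceReader already created over the camera’s media source (for example with MFCreateSourceReaderFromMediaSource):

#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <atlbase.h>

// Enumerate the native media types of the first video stream and inspect their subtypes
void ListNativeVideoTypes(IMFSourceReader* pSourceReader)
{
    for (DWORD nTypeIndex = 0; ; nTypeIndex++)
    {
        CComPtr<IMFMediaType> pMediaType;
        const HRESULT nResult = pSourceReader->GetNativeMediaType(
            (DWORD) MF_SOURCE_READER_FIRST_VIDEO_STREAM, nTypeIndex, &pMediaType);
        if (FAILED(nResult)) // MF_E_NO_MORE_TYPES ends the enumeration
            break;
        GUID Subtype = GUID_NULL;
        pMediaType->GetGUID(MF_MT_SUBTYPE, &Subtype);
        // Compare Subtype against MFVideoFormat_MJPG, MFVideoFormat_H264,
        // MFVideoFormat_NV12 etc.; after the Anniversary Update the compressed
        // subtypes are no longer expected to appear in this list
    }
}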

Untweetable Video

Two H.264 MP4 files, close to one another. The files are more or less playable: Windows desktop players (except the Media Foundation based ones), QuickTime, Android and iOS devices play them. The files are not flawless, hence the glitches, but they make sense.

There is a problem with the second file, which is rejected by Twitter with “Your media file could not be processed.” The file is handled well by the browser and uploaded to the remote server, but Twitter is unable to convert it on its backend.


Little known DirectShow VMR-7 snapshot problem

There is very little information about this problem (not really a bug, rather a design miscalculation) out there because it only comes up with customized use of the Video Mixing Renderer Filter 7; there is no problem with straightforward use.

In windowless mode the renderer accepts media samples and displays them as configured. The IVMRWindowlessControl::GetCurrentImage method is available to grab the currently presented image and obtain a copy of what is displayed at the moment, the snapshot. The renderer does a favor and converts it to RGB, and the interface method is widely misused as a way to access an uncompressed video frame, especially in a format compatible with other APIs or suitable for saving to a bitmap (a related earlier post: How To: Save image to BMP file from IBasicVideo or VMR windowless interface).
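A minimal sketch of such use, assuming pWindowlessControl is the IVMRWindowlessControl queried from a VMR-7 configured for windowless mode; the method returns a packed DIB which the caller owns:

// Take a snapshot of the currently presented frame
BYTE* pnDib = NULL;
const HRESULT nResult = pWindowlessControl->GetCurrentImage(&pnDib);
if (SUCCEEDED(nResult) && pnDib)
{
    const BITMAPINFOHEADER* pBitmapInfoHeader = (const BITMAPINFOHEADER*) pnDib;
    const BYTE* pnBits = pnDib + pBitmapInfoHeader->biSize; // for 32-bit RGB the pixel bits follow the header directly
    // ... hand pBitmapInfoHeader/pnBits to another API or write a .bmp file ...
    CoTaskMemFree(pnDib); // the snapshot buffer is caller-owned
}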

One of the problems with the method is that it reads back from video memory, which is, in some configurations, an extremely expensive operation and is simply unacceptable because of its overall impact.

This time, however, I am posting about another issue. By default VMR-7 offers a memory allocator of one media sample. It accepts a new frame and then blits it into the video device. Simple. With higher resolutions and higher frame rates, and with VMR-7 at the same time being a legacy API working through compatibility layers, this presentation method becomes a bottleneck. We cannot pre-load the next video frame before returning from the presentation call. For 60 frames/second video this means that any congestion as short as 17 milliseconds can make us miss the chance to present the next video frame of the stream. Visual artifacts of this kind are perceptible.

An efficient solution to this problem is to increase the number of buffers in the video renderer’s memory allocator and then fill the buffers asynchronously. This works well: we fill the buffers well in advance, and the costly operation does not have to complete within the frame presentation time frame. The pushing media pipeline pre-loads the video buffers efficiently, and the video renderer simply grabs a prepared frame out of the queue and presents it. Terrific!
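How exactly the extra buffers are requested depends on who provides the allocator. As a hedged sketch, if the upstream filter is your own and is built on the DirectShow base classes, its output pin can ask for a deeper allocator while negotiating with the VMR-7 input pin; the renderer may still adjust the request, hence the check of the actual values. The class and member names below are illustrative:

// Request several buffers so that frames can be queued ahead of presentation
HRESULT CVideoOutputPin::DecideBufferSize(IMemAllocator* pMemAllocator, ALLOCATOR_PROPERTIES* pRequestProperties)
{
    if (pMemAllocator == NULL || pRequestProperties == NULL)
        return E_POINTER;
    if (pRequestProperties->cBuffers < 8) // assumed queue depth, tune per stream
        pRequestProperties->cBuffers = 8;
    if (pRequestProperties->cbBuffer < (long) m_nFrameSize) // m_nFrameSize: hypothetical member holding the frame size in bytes
        pRequestProperties->cbBuffer = (long) m_nFrameSize;
    ALLOCATOR_PROPERTIES ActualProperties;
    const HRESULT nResult = pMemAllocator->SetProperties(pRequestProperties, &ActualProperties);
    if (FAILED(nResult))
        return nResult;
    // The allocator (possibly the renderer's own) reports what it actually granted
    return ActualProperties.cBuffers > 1 ? S_OK : E_FAIL;
}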

The video renderer’s input is thus a queue of media samples. It keeps popping and presenting them, matching their time stamps against the presentation clock and waiting the respective time. Now let us have a look at the snapshot method’s signature:

HRESULT GetCurrentImage(
  [out] BYTE **lpDib
);

We have an image, which is good, but the problem is that it is not clear which sample from the queue this image corresponds to. VMR-7 does not report the associated time stamp even though it has this information. It could have already accepted a frame and returned control while the presentation is only scheduled, so the caller cannot derive the time stamp even from the fact that the renderer filter completed the delivery call.

Video Mixing Renderer 9 is presumably subject to the same problem.

In contrast, the EVR’s IMFVideoDisplayControl::GetCurrentImage method is already:

HRESULT GetCurrentImage(
  [in, out] BITMAPINFOHEADER *pBih,
  [out]     BYTE             **pDib,
  [out]     DWORD            *pcbDib,
  [in, out] LONGLONG         *pTimeStamp
);

That is, at some point someone asked the right question: “So we have the image, but where is the time stamp?”
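A minimal usage sketch, assuming pVideoDisplayControl is the EVR’s IMFVideoDisplayControl obtained from the playback topology:

BITMAPINFOHEADER BitmapInfoHeader = { sizeof BitmapInfoHeader };
BYTE* pnDib = NULL;
DWORD nDibSize = 0;
LONGLONG nTime = 0; // presentation time, 100 ns units
const HRESULT nResult = pVideoDisplayControl->GetCurrentImage(&BitmapInfoHeader, &pnDib, &nDibSize, &nTime);
if (SUCCEEDED(nResult))
{
    // nTime identifies which sample the snapshot corresponds to,
    // which is exactly what the VMR-7/VMR-9 snapshot API does not report
    CoTaskMemFree(pnDib); // the DIB bits are caller-owned
}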

Presumably, a custom VMR-7 allocator/presenter can work around this problem, since the presenter processes the time stamp information and can report what the standard VMR-7 does not.

Reference Signal Source: Unity 3D Integration

This is not really practical, just some fun stuff and an attempt to make use of a readily available code snippet. So… the reference signal in the Unity 3D game engine, through Media Foundation and Direct3D 11.

Reference signal in Unity 3D

It is basically the Low-Level Native Plugin example project (limited to Direct3D 11 and Visual Studio 2015), which works together with DirectShowReferenceSource-x64.dll and does a few things:

  1. the standard plugin example’s inner rendering loop is commented out
  2. the plugin COM-instantiates the VideoMediaSourceTextureReader class of the DirectShowReferenceSource project and passes it a texture for further updates:
    • the helper class instantiates the Media Foundation media source and initializes a compatible resolution
    • it also instantiates a Source Reader to conveniently read from the media source
    • and starts producing video frames
  3. the plugin rendering loop makes IVideoMediaSourceTextureReader::Update calls passing the Unity time; the helper class matches the generated frames against the requested time and updates the texture accordingly

The scene thus contains a plane with a custom material updated on the backend with Media Foundation produced data.

The demo is not really about performance, because so many things could be done to improve it and make it, well… done right: threading, tighter Direct3D integration, use of a proper Direct2D render target, etc. The demo is really about the integration itself. Nevertheless, a 2K texture streams well at full rate.

Download links

Reference Signal Source: RGB32/ARGB32 Subtypes, Media Foundation Media Source for Video

An update for the Reference signal source for DirectShow DLLs:

  • the source handles RGB subtypes more accurately and allows you to specify whether you want MEDIASUBTYPE_RGB32 or MEDIASUBTYPE_ARGB32
  • additionally, the DLL implements a Microsoft Media Foundation Media Source for the video stream

A more detailed description follows.

RGB32 and ARGB32 are very close and share the same byte structure, and since alpha channel support in video is minimal, the difference mostly matters for counterpart support in other applications, for example and specifically hardware-assisted H.264 encoders which accept the alpha-enabled variant.

The IVideoSourceFilter::SetMediaType method takes a vCompression argument which defines the subtype. The RegisterSources sample code shows how the method is used when exposing the reference signal as a video capture device.

A similar IVideoMediaSource::SetMediaType method applies to the Media Foundation counterpart (see below).

Both implementations offer only the given subtype by default, but at the same time both accept the other variant as well if an application or connected peer tries to re-negotiate the media type. The same applies to changing resolution etc. The sources are flexible enough to take a different video format if anyone requests it.

The other big new thing is the Media Foundation API Media Source, which generates the reference signal as well. There is no option to set it up as a virtual camera, because the API does not offer extensibility of that kind; however, the source can be used to generate test content via Media Foundation, and the code remains pretty simple. I am publishing the MfGenerate code snippet which demonstrates the necessary steps to create an MP4 file with video with the desired properties.

A frame from generated 4096x2304 content in Windows 10 player

As Media Foundation offers H.265 (HEVC) and fragmented MP4 options, they can also be easily used with the source to generate test footage.

The code does the following steps (a sketch with the corresponding API calls follows the list):

  1. Creates a media source (commented out lines show alternate steps to create a media source from a file)
  2. Creates a source reader from the media source
  3. Builds an H.264 media type from the raw video media type
  4. Creates and configures a sink writer, which is instructed to do its magic of setting up the H.264 encoder (a side note: the code produces 4096×2304 video, however this is only possible once the hardware encoder is enabled; the software encoder was rejecting the media type)
  5. Implements a loop that reads frames until they run out, feeding them into the encoder/writer
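Below is a hedged sketch of the same sequence written against the documented Media Foundation interfaces (this is not the MfGenerate snippet itself; MFStartup is assumed to have been called, pMediaSource stands for the reference signal media source created in step 1, and the output path, bitrate and helper name WriteH264File are illustrative):

#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <atlbase.h>

HRESULT WriteH264File(IMFMediaSource* pMediaSource, PCWSTR pszOutputPath)
{
    // 2. Source reader over the media source
    CComPtr<IMFSourceReader> pSourceReader;
    HRESULT nResult = MFCreateSourceReaderFromMediaSource(pMediaSource, NULL, &pSourceReader);
    if (FAILED(nResult))
        return nResult;
    // Raw video media type offered by the source
    CComPtr<IMFMediaType> pRawMediaType;
    nResult = pSourceReader->GetNativeMediaType((DWORD) MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0, &pRawMediaType);
    if (FAILED(nResult))
        return nResult;
    UINT32 nWidth = 0, nHeight = 0, nRateNumerator = 0, nRateDenominator = 0;
    MFGetAttributeSize(pRawMediaType, MF_MT_FRAME_SIZE, &nWidth, &nHeight);
    MFGetAttributeRatio(pRawMediaType, MF_MT_FRAME_RATE, &nRateNumerator, &nRateDenominator);
    // 3. H.264 media type derived from the raw one
    CComPtr<IMFMediaType> pOutputMediaType;
    MFCreateMediaType(&pOutputMediaType);
    pOutputMediaType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    pOutputMediaType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264);
    pOutputMediaType->SetUINT32(MF_MT_AVG_BITRATE, 10 * 1000 * 1000); // assumed bitrate
    pOutputMediaType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
    MFSetAttributeSize(pOutputMediaType, MF_MT_FRAME_SIZE, nWidth, nHeight);
    MFSetAttributeRatio(pOutputMediaType, MF_MT_FRAME_RATE, nRateNumerator, nRateDenominator);
    // 4. Sink writer producing the MP4 file and hosting the H.264 encoder
    CComPtr<IMFSinkWriter> pSinkWriter;
    nResult = MFCreateSinkWriterFromURL(pszOutputPath, NULL, NULL, &pSinkWriter);
    if (FAILED(nResult))
        return nResult;
    DWORD nStreamIndex = 0;
    pSinkWriter->AddStream(pOutputMediaType, &nStreamIndex);
    pSinkWriter->SetInputMediaType(nStreamIndex, pRawMediaType, NULL);
    nResult = pSinkWriter->BeginWriting();
    if (FAILED(nResult))
        return nResult;
    // 5. Read frames until the source runs out, feeding them into the writer
    for (; ; )
    {
        DWORD nStreamFlags = 0;
        CComPtr<IMFSample> pSample;
        nResult = pSourceReader->ReadSample((DWORD) MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0, NULL, &nStreamFlags, NULL, &pSample);
        if (FAILED(nResult) || (nStreamFlags & MF_SOURCE_READERF_ENDOFSTREAM))
            break;
        if (pSample)
            pSinkWriter->WriteSample(nStreamIndex, pSample);
    }
    return pSinkWriter->Finalize();
}

Per the note above, switching the output subtype to MFVideoFormat_HEVC (and, for fragmented MP4, passing sink writer attributes with MF_TRANSCODE_CONTAINERTYPE set accordingly) is expected to be enough to produce the other flavors of test footage.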

The high level APIs are simple (similar to DirectShow), which is not true of the internals (also similar to DirectShow; even more so).

The Media Foundation source is video only for now.

The MF media source is supposed to be seekable (not really tested; not really testable with topoedit), and a zero duration produces an infinite feed. The duration is not necessarily taken from the property; it can also be specified by overwriting the presentation descriptor attribute. The video format can also be set up through the stream descriptor’s media type handler.

Download links

Update – Connecting MF Media Source to MFCaptureD3D Sample application

To quickly connect the MF media source to the Windows SDK MFCaptureD3D sample application, add an #import and a few lines of code replacing the source around CPreview::SetDevice, as shown in the image below:

MFCaptureD3D update for custom media source