Comment on Video Capture Issues with Windows 10 Anniversary Update

There is a comment from MSFT’s Mike M on MSDN Forums on the recent issue with compressed video capture. I am pulling it out in full as a quote below:

I’d like to start off by providing you guys a little more context on the behavior you’re encountering.

One of the main reasons that Windows is decoding MJPEG for your applications is because of performance. With the Anniversary Update to Windows 10, it is now possible for multiple applications to access the camera in ways that weren’t possible before. It was important for us to enable concurrent camera access, so Windows Hello, Microsoft Hololens and other products and features could reliably assume that the camera would be available at any given time, regardless of what other applications may be accessing it. One of the reasons this led to the MJPEG decoding is because we wanted to prevent multiple applications from decoding the same stream at the same time, which would be a duplicated effort and thus an unnecessary performance hit. This can be even more noticeable or perhaps trigger error cases on in-market devices with a hardware decoder which may be limited on how many decodes can take place simultaneously. We wanted to prevent applications from unknowingly degrading the user experience due to a platform change.

The reasoning for H.264 being decoded can get a little more complicated (and I’m just learning the details myself as I talk to other members of the team), but the basics revolve around how H.264 allows for encoding parameters to be changed on the camera directly, and how in a situation where multiple applications are making use of this control path, they could interfere with each other. Regarding Roman’s concerns about Lync: both Lync and Skype are partner teams, and we stay in touch throughout the development process, so the camera functionality in those applications will continue to work.

So yes, MJPEG and H.264 being decoded / filtered out is the result of a set of features we needed to implement, and this behavior was planned, designed, tested, and flighted out to our partners and Windows Insiders around the end of January of this year.  We worked with partners to make sure their applications continued to function throughout this change, but we have done a poor job communicating this change out to you guys. We dropped the ball on that front, so I’d like to offer my apologies to you all. We’re working on getting better documentation out, to help answer any questions you may have. Of course, you can always reach out to us via these forums for specific issues, as we monitor them regularly, or file feedback using the Feedback Hub. We’re constantly collecting feedback on this and other issues, so we can better understand the impact on our application developers and customers. If you’re having issues adapting your application code to the NV12 / YUY2 media types, we’d like to support you through the changes you may need to make. To get you started, please refer to the documentation links in my previous post. If there are reasons why working with this format isn’t feasible for your project, please let me know, and I’ll present them to the rest of the team, to try and find the best solution for your case.

Dacuda and Stephan B, I’m curious about your specific situations, since you report that this change is breaking functionality for your customers. Are your customers using custom camera hardware? Is the set of supported cameras restricted by your applications? How do your applications deal with devices like the Surface Pro 4, Surface Book, or Dell Venue Pro, which wouldn’t offer the media types your applications are relying on?

I’d like to wrap up this wall of text by letting you know that your feedback here and through other channels is greatly appreciated and something that’s on our radar. We’re trying to look into what other options we can offer you to be able to improve on this for your (and our) customers, so stay tuned! I invite you to please subscribe to this thread (use the “Alert me” link at the top), and I’ll keep you guys updated on what we find. Thanks!

Basically, it is bad news for those who consume compressed video from capture devices: the breaking change is intentional. Something is offered in exchange, and I hope someone will present the platform changes in a friendly, readable document. In particular, Microsoft seems to be adding a VP8/9 video decoder and encoder in this new platform version (perhaps more on that later).
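
For those adapting, here is a minimal sketch (my own, not from the quoted post) of how one might check what a capture source exposes after the update, using the Media Foundation Source Reader. It assumes at least one video capture device is present and omits error handling for brevity:

```cpp
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <atlbase.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mf.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")

int main()
{
    CoInitializeEx(NULL, COINIT_MULTITHREADED);
    MFStartup(MF_VERSION);
    {
        // enumerate video capture devices
        CComPtr<IMFAttributes> pAttributes;
        MFCreateAttributes(&pAttributes, 1);
        pAttributes->SetGUID(MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE,
            MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_GUID);
        IMFActivate** ppDevices = NULL;
        UINT32 nDeviceCount = 0;
        MFEnumDeviceSources(pAttributes, &ppDevices, &nDeviceCount);
        if(nDeviceCount > 0)
        {
            CComPtr<IMFMediaSource> pSource;
            ppDevices[0]->ActivateObject(IID_PPV_ARGS(&pSource));
            CComPtr<IMFSourceReader> pReader;
            MFCreateSourceReaderFromMediaSource(pSource, NULL, &pReader);
            // walk the native media types of the first video stream; on the
            // Anniversary Update one would see NV12/YUY2 where MJPG or H264
            // used to show up
            for(DWORD nIndex = 0; ; nIndex++)
            {
                CComPtr<IMFMediaType> pMediaType;
                if(FAILED(pReader->GetNativeMediaType(
                        (DWORD) MF_SOURCE_READER_FIRST_VIDEO_STREAM, nIndex, &pMediaType)))
                    break; // MF_E_NO_MORE_TYPES
                GUID Subtype = GUID_NULL;
                pMediaType->GetGUID(MF_MT_SUBTYPE, &Subtype);
                // compare Subtype against MFVideoFormat_MJPG, MFVideoFormat_H264,
                // MFVideoFormat_NV12, MFVideoFormat_YUY2 as needed
            }
        }
        for(UINT32 nIndex = 0; nIndex < nDeviceCount; nIndex++)
            ppDevices[nIndex]->Release();
        CoTaskMemFree(ppDevices);
    }
    MFShutdown();
    CoUninitialize();
    return 0;
}
```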

Microsoft Developer Web Service Day

The MSDN Forums bug earlier today was nothing. Here is a new one, with Microsoft Connect. A StackOverflow poster found a bug and created a feedback item on MS Connect:

MS Connect before Sign In

Available to anonymous visitors, yet invisible to the authenticated user after sign-in:

MS Connect after Sign In

The content that you requested cannot be found or you do not have permission to view it.

It should be a permission thing, I suppose.

“… you will never get the same high quality video experience that you find with DirectShow”

Microsoft’s James Daily wrote back in 2011 (an incredible response in a public forum from an MS guy, given that the DirectShow branch of the same forum did not see anything close in 10+ years) about how the technologies relate to one another:

Hey, I’m the team’s DShow expert. Trevor asked me to take a look at your post and give my two cents. From looking at the DShow code that you are using in your winforms application I just want you to be aware that by including quartz.dll as a dependency you are using the DirectShow 8 OLE automation objects. These objects have been deprecated for years and are certainly not recommended [this might not be entirely accurate, because generally the stuff in quartz.dll is not deprecated; it is rather orphaned and still awaits its deprecation, like the related stuff from qedit.dll; however, the overall attitude is about right – RR]. At this time Microsoft does not have a supported solution for calling DirectShow code from C# (or any managed language). Please see the “note” at the top of the page at the link below for documented confirmation of this. Because the technology is not supported from the winforms environment it is not possible for us to suggest a supported workaround from managed code.

That said it should be possible to facilitate the functionality that you are looking for by creating a custom EVR presenter. By using a custom presenter you can get direct access to the D3D surface. You can then use the standard D3D constructs to draw directly to the same D3D surface that the EVR is using to blit the video. There are two things to keep in mind about this solution. First you must code this solution in unmanaged C++. Again this is due to the fact that DirectShow is not supported from managed code. Second, this solution is extremely complex and difficult to implement even for the most experienced DirectShow / D3D expert. Because of these two factors it is recommended that you take a serious look at the MediaElement in WPF.

As you know the WPF environment is constructed from the ground up to offer developers a very rich “graphics first” environment. The MediaElement in particular was designed to allow you to mix video with various other UI components seamlessly. This solution will give you the flicker free, “draw over video” solution that you are looking for. The best part is you can do all of this in C#. The bad part of this solution is that the MediaElement is not designed for displaying time sensitive media content. In other words, the MediaElement is notorious for dropping and delaying the display of video frames. There are ways to minimize this such as using SD rather than HD content, use a video accelerated codec, etc. However, you will never get the same high quality video experience that you find with DirectShow.

I hope this will help you understand the current shortcomings of the technologies that you have chosen and help you to focus your efforts on a fully supported and viable solution. If you need any additional clarification please let us know.

and then also:

Unfortunately you can’t really tell the WPF MediaElement to never drop frames. The term we use for this class of issues is “disparate clocks”. In this case WPF is updating the screen at a certain rate (clock 1). The MediaElement (based on WMP) is cranking out video frames at a slightly different rate (clock 2). Given the underlying technologies there is currently no way to synchronize the two clocks and force them to “tick” at the same rate. Since the display will only be updated according to the WPF clock, multiple frames of video may be sent from the MediaElement to WPF between clock ticks. Because of this the MediaElement may appear to drop frames. This is a very common problem in multimedia development and there is no simple solution.

So if you absolutely need frame accuracy in your application then using the MediaElement probably won’t work for you. That said, there are some things that you can do to improve the chances of your content dropping as few frames as possible. Modify your content so that it uses either the h.264 or VC1 codec. Require your users to have modern video HW capable of advanced video acceleration. Use the MPEG 4 or ASF file container. When encoding your content set your frame rate at or below 25 frames per second. Set the resolution of your content to something like 720×480. Set the bitrate to VBR constrained and set an upper limit of between 500 Kbps and 2.5 Mbps.

If you use the guidelines above you will minimize the number of frames that get dropped but you will never be able to completely eliminate them. Also keep in mind that the same frames may not be dropped. For example: if you play video1.asf the first time you might drop frames 200 and 375. On the next run of the same file you may drop frames 143, 678 and 901. This is due to the relatively nondeterministic nature of the Windows OS.

I hope this helps.

Another commenter responded rather angrily:

…fail to include any mention of the DirectShow.NET library. Why? And shame on them for failing to do so. This library helps you use DirectShow in a managed context. There are plenty of code samples to be found….

The answer to this, however, was given in the same thread a couple of times, and it explains that the responses are limited by existing policy:

I cannot comment on 3rd party libraries.

Because the technology is not supported from the winforms environment it is not possible for us to suggest a supported workaround…

Something went terribly wrong in x64 build of Windows Media Video 9 Decoder

Unfortunately, the 64-bit version of the Windows Media Video 9 Decoder is not as good as its 32-bit sister. The 32-bit version is used an order of magnitude more frequently and gives no trouble. The 64-bit version offers a similar feature set, yet it is pretty hard to see it in action: it takes a 64-bit media application to host it, and most media applications are 32-bit (often for a good reason). Even the Windows SDK 7.0 TopoEdit comes pre-built as a Win32 application only (it is provided with source code, though, so one can build an x64 peer after fixing buildability issues and adding the x64 configuration manually).

The decoder is available as a dual DMO/MFT, which enables it for both the DirectShow and Media Foundation APIs, and it exposes the problem in both likewise.
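
For illustration, a minimal sketch of mine (error handling omitted) of hosting the decoder DMO in a DirectShow graph through the stock DMO Wrapper Filter; in Media Foundation the same object is picked up automatically as an MFT during topology resolution:

```cpp
#include <dshow.h>
#include <dmodshow.h>
#include <dmoreg.h>
#include <wmcodecdsp.h>
#include <atlbase.h>
#pragma comment(lib, "strmiids.lib")
#pragma comment(lib, "dmoguids.lib")
#pragma comment(lib, "wmcodecdspuuid.lib")

// wraps the WMV decoder DMO into a filter and adds it to the graph
void AddWmv9DecoderToGraph(IGraphBuilder* pGraphBuilder)
{
    CComPtr<IBaseFilter> pBaseFilter;
    pBaseFilter.CoCreateInstance(CLSID_DMOWrapperFilter);
    // point the wrapper at the Windows Media Video decoder DMO
    CComQIPtr<IDMOWrapperFilter> pDmoWrapperFilter = pBaseFilter;
    pDmoWrapperFilter->Init(CLSID_CWMVDecMediaObject, DMOCATEGORY_VIDEO_DECODER);
    pGraphBuilder->AddFilter(pBaseFilter, L"WMVideo Decoder DMO");
    // from here the filter connects like any other transform filter
}
```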

Once in a while, the 64-bit version of the decoder produces incorrect output, adding white dots where they are not supposed to be.

Because the problem supposedly resides in the decoder itself, it affects everything that decodes Windows Media Video of this flavor, including:

  • Windows Media Player 64-bit
  • GraphEdit x64 from Windows SDK
  • TopoEdit x64 from Windows SDK (built manually)

The only exception is DXVA-accelerated decoding, where hardware-assisted decoding works around the bug and the output is correct.

One of the easy ways to see and reproduce the problem in action is to re-encode the content into anything else in the 64-bit version of GraphEdit: the video decoder there decodes in software and burns the artifacts in.
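
An equivalent programmatic repro is a sketch of mine below (with “test.wmv” as a placeholder for any WMV9 clip): decode in a 64-bit process with the Media Foundation Source Reader. No Direct3D manager is supplied, so DXVA stays out of the way and the software path, where the bug lives, is exercised:

```cpp
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <atlbase.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")

int main()
{
    CoInitializeEx(NULL, COINIT_MULTITHREADED);
    MFStartup(MF_VERSION);
    {
        CComPtr<IMFSourceReader> pReader;
        MFCreateSourceReaderFromURL(L"test.wmv", NULL, &pReader); // placeholder clip name
        // request uncompressed NV12 so that the reader loads the WMV decoder
        CComPtr<IMFMediaType> pMediaType;
        MFCreateMediaType(&pMediaType);
        pMediaType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
        pMediaType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_NV12);
        pReader->SetCurrentMediaType((DWORD) MF_SOURCE_READER_FIRST_VIDEO_STREAM,
            NULL, pMediaType);
        for(; ; )
        {
            DWORD nStreamFlags = 0;
            CComPtr<IMFSample> pSample;
            pReader->ReadSample((DWORD) MF_SOURCE_READER_FIRST_VIDEO_STREAM,
                0, NULL, &nStreamFlags, NULL, &pSample);
            if(nStreamFlags & MF_SOURCE_READERF_ENDOFSTREAM)
                break;
            // pSample holds a software-decoded frame; dump or inspect it here
            // to look for the white-dot artifacts
        }
    }
    MFShutdown();
    CoUninitialize();
    return 0;
}
```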

Crashes in Visual C++ 2012 vs. Visual C++ 2010

Great news for those suffering from Visual Studio 2010 IDE crashes that lose recent source code changes. Visual Studio 2012 is way more stable (even with the Visual Studio 2010 Platform Toolset!) and, when it does crash, it does so without losing editor changes.

The worst thing you seem to be getting is an access violation or stack overflow issue in a child process, so the crash does not kill the IDE itself. The problems seem to be all around cpfe.dll, and they are much less annoying.

If you happen to experience anything of the kind, be sure to vote for the bug report on MS Connect: Memory access violation and unhandled exception in cpfe.dll while editing C++ source code with Visual Studio 2012. Microsoft is still accepting feedback on this release and will hopefully resolve the problem in future VS 2012 updates.

StressEvr: So, how many EVRs you can do?

Direct3D-based DirectShow video renderers, Video Mixing Renderer 9 and Enhanced Video Renderer, have been notorious for consuming resources in a way that lets you run only so many of them simultaneously. No comment has been published on the topic, and questions (e.g. this one: How many VMR 9 can a PC support concurrently) have remained unanswered for a long time. Video Mixing Renderer 7 was a good alternative for some time in the past, until it was cut down and lost hardware scaling support (thanks, Microsoft!). The trendy way to render video nowadays is the Enhanced Video Renderer, a Media Foundation subsystem with an interface into DirectShow, which takes over state-of-the-art video rendering capabilities.

So, how many EVRs can one run simultaneously? Chances are that it is fewer than one would suppose. The interesting part is that there is no obvious evidence of which type of resource runs out and causes the next EVR instance to fail to run. And not even to run: the failure seems to come up at the earlier stage of just connecting pins in stopped state. The failure may be accompanied by errors like E_INVALIDARG, ERROR_FILE_NOT_FOUND, E_UNEXPECTED. The actual limit appears to correlate loosely with the parameters of the video output, such as resolution and bitness. Desperately waiting for clarification, I am sharing a tool to estimate the limit; a sketch of the core idea follows the feature list below. The tool offers:
  • multiple EVR instances at once, hosted by multiple windows, which can be distributed across multiple monitors
  • a choice of resolutions and formats
  • a double click on an individual renderer pops up a property page set displaying the effective frame rate
  • 32- and 64-bit versions
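
As mentioned above the list, here is a rough sketch of the core idea (a simplification of mine, not the tool’s actual source; “test.wmv” is a placeholder clip): keep building graphs, each with its own EVR, until connection fails:

```cpp
#include <dshow.h>
#include <atlbase.h>
#include <vector>
#include <cstdio>
#pragma comment(lib, "strmiids.lib")

int main()
{
    CoInitializeEx(NULL, COINIT_MULTITHREADED);
    {
        std::vector<CComPtr<IGraphBuilder>> Graphs;
        for(; ; )
        {
            CComPtr<IGraphBuilder> pGraphBuilder;
            pGraphBuilder.CoCreateInstance(CLSID_FilterGraph);
            CComPtr<IBaseFilter> pBaseFilter;
            pBaseFilter.CoCreateInstance(CLSID_EnhancedVideoRenderer);
            pGraphBuilder->AddFilter(pBaseFilter, L"EVR");
            // Intelligent Connect prefers filters already in the graph, so the
            // pipeline is built around the EVR added above; note that the graphs
            // stay stopped - failure shows up at pin connection already
            const HRESULT nRenderResult = pGraphBuilder->RenderFile(L"test.wmv", NULL); // placeholder clip
            if(FAILED(nRenderResult))
            {
                printf("Failed at instance %u, 0x%08X\n",
                    (UINT) Graphs.size(), (UINT) nRenderResult);
                break;
            }
            Graphs.push_back(pGraphBuilder); // keep the graph (and its EVR) alive
        }
    }
    CoUninitialize();
    return 0;
}
```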

Download links