Video Capture Issues with Windows 10 Anniversary Update

The Windows 10 Anniversary Update brought a breaking change: in many cases, hardware-compressed video formats disappeared from the video capture APIs, even though the devices themselves are known to have the respective capabilities.

The magic of the Logitech C930e camera is also lost: it is no longer available for video capture in hardware-compressed H.264.

Presumably an issue in a middle layer such as Kernel Streaming, it manifests as missing support for some video device formats, especially Motion JPEG (FourCC MJPG) and H.264 (FourCC H264). The issue sits deep enough that the capabilities are equally missing from both the DirectShow and Media Foundation APIs.

Unless someone wants to develop a driver that talks to the hardware directly, bypassing the Kernel Streaming subsystem (as, for example, “DeckLink Video Capture” does for Blackmagic Design devices by leveraging the SDK rather than the complementary WDM driver), the only solution seems to be waiting for a fix from Microsoft. Given the scope of the problem, it should arrive sooner rather than later.

In the meantime, consider stopping automatic updates if video capture is important to you.


UPDATE: Mike from MSFT commented in this thread:

Hey guys, Mike from the Camera team here. We saw this concern pop up for the first time a couple of days ago, and we’d like to understand exactly what difficulties you are all facing. There are two items that I would like the discussion to focus on.

Firstly, the media types supported on a given machine vary depending on the capabilities of the camera device you’re using. There is no guarantee that a media type such as MJPEG will be supported on all cameras (for example, the Surface Pro 4 / Surface Book cameras do not support it). This means that, by taking a hard dependency on it being available, you’re limiting the portability of your application to the set of cameras that do offer that media type. What applications should do instead, is query the available formats (see: IMFSourceReader::GetNativeMediaType for MediaFoundation, IAMStreamConfig::GetStreamCaps or the IPin::EnumMediaTypes article for DirectShow), and make a selection based on the one that best suits your needs or capabilities.

Secondly, we are expecting that almost all clients using MJPEG frames will be decoding the stream as one of the first steps in their pipeline. In order for this to be done in the most efficient way, with the smallest impact to overall system performance, we want to offload that piece of the process to be taken care of by the platform. This means your application should be able to consume the uncompressed frames, which will have been decoded to NV12 for MediaFoundation applications, and YUY2 for DirectShow applications.

If there are any other issues that you’d like to bring to our attention, please do so. We’d love to understand how we can best help you through this transition, and we’ll be considering your feedback for future planning. Thanks!
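Mike’s advice boils down to enumerating what the device actually offers and picking the closest match, instead of hard-coding MJPG or H264. A minimal sketch of the selection step (plain C++ with FourCC strings standing in for real media types; in a real application the available list would come from IMFSourceReader::GetNativeMediaType or IAMStreamConfig::GetStreamCaps, and the function name here is made up):

```cpp
#include <string>
#include <vector>

// Pick the most preferred format a device actually offers; the preference
// order is an application choice, the available list comes from API
// enumeration in a real program
std::string PickBestFormat(const std::vector<std::string>& Available)
{
    static const std::vector<std::string> g_Preferred { "H264", "MJPG", "NV12", "YUY2" };
    for(const std::string& sFormat : g_Preferred)
        for(const std::string& sCandidate : Available)
            if(sCandidate == sFormat)
                return sFormat;
    return Available.empty() ? std::string() : Available.front(); // fall back to whatever comes first
}
```

With a post-Anniversary-Update camera that no longer exposes H264 or MJPG, selection logic of this kind transparently falls back to NV12 or YUY2 instead of failing outright.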

Media Foundation MPEG-4 Property Handler might report incorrect Video Frame Rate

Following up on the previous post about a Media Foundation bug, here is another one, related to the property handler for MPEG-4 files (.MP4) and specifically the PKEY_Video_FrameRate property, which reports the frame rate of a given media file.

This is the object responsible for filling the columns in Explorer; visually, the bug might look like this:


The property values are also accessible programmatically through the IPropertyStore::GetValue API, in which case they are:

  • PKEY_Video_FrameWidth (VT_UI4): 1,280
  • PKEY_Video_FrameHeight (VT_UI4): 720
  • PKEY_Video_FrameRate (VT_UI4): 1,091,345
  • PKEY_Video_Compression (VT_LPWSTR): {34363248-0000-0010-8000-00AA00389B71} // FourCC H264
  • PKEY_Video_FourCC (VT_UI4): 875,967,048
  • PKEY_Video_HorizontalAspectRatio (VT_UI4): 1
  • PKEY_Video_VerticalAspectRatio (VT_UI4): 1
  • PKEY_Video_StreamNumber (VT_UI4): 2
  • PKEY_Video_TotalBitrate (VT_UI4): 12,123,288

The actual frame rate of the file is 50 fps, and the file plays well in every media player, so the problem is in the reporting itself. Let us look inside the file to possibly identify the cause. The mdhd box for the video track shows the following information:


Let us do some math now:

  • Time Scale: 10,000,000
  • Duration: 4,501,200,000 (around 7.5 minutes)
  • Video Sample Count: 22,506

This works out to the correct rate of 50 fps (22,506 frames over a 450.12-second scaled duration). However, the duration value itself is a pretty big number and exceeds the 32-bit range. Now let us try this one:

22506 / (4501200000 & ((1 << 32) - 1)) * 10000000

And we get roughly 1,091, matching the leading digits of the reported 1,091,345 (the property is expressed in frames per 1,000 seconds). Bingo! An arithmetic overflow in the property handler then…
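The suspected math is easy to reproduce (function names here are hypothetical; the actual property handler code is not public):

```cpp
#include <cstdint>

// Frame rate from the mdhd values, computed with full 64-bit precision
double CorrectRate(uint64_t nSampleCount, uint64_t nTimeScale, uint64_t nDuration)
{
    return static_cast<double>(nSampleCount) * nTimeScale / nDuration;
}

// The suspected bug: the duration gets truncated to 32 bits before the division
double OverflowedRate(uint64_t nSampleCount, uint64_t nTimeScale, uint64_t nDuration)
{
    const uint32_t nTruncatedDuration = static_cast<uint32_t>(nDuration); // 4,501,200,000 & 0xFFFFFFFF = 206,232,704
    return static_cast<double>(nSampleCount) * nTimeScale / nTruncatedDuration;
}
```

With the values above, CorrectRate comes out at exactly 50, while the truncated variant lands at roughly 1,091, matching the bogus report.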


Bonus tool: the FilePropertyStore application, which reads the properties of a file you drag and drop onto it; Win32 and x64 versions.


Common Controls: Versions, Compatibility, WTL

An application appeared to misbehave on Windows XP in a weird way: the rebar control failed to show up.

The application is expected to look much nicer, with a rebar control hosting the menu (implemented as a WTL command bar control) and a toolbar with buttons:

A WTL sample project generated by a Visual Studio wizard would never produce such an effect; the bug was a combination of factors:

  1. The application was built with a newer version of the Windows SDK, which includes support for features (Windows Vista+ Common Controls) more recent than the production environment (Windows XP); the application also targets a Windows Vista+ environment (_WIN32_WINNT >= 0x0600)
  2. Compatibility issues of the Common Controls library
  3. The WTL version (7.5), which did not yet include a workaround for the problem

What caused the bug directly was the REBARBANDINFO structure and its use as an argument with the Common Controls API. As MSDN shows, the structure was amended twice with additional fields.

One of the ways to support multiple versions of a structure definition, and to resolve compatibility issues, is to embed the structure size into the structure payload. In fact, the REBARBANDINFO::cbSize member is there for exactly this reason.

The application normally fills cbSize with the maximal structure size it knows and fills the rest of the fields respectively. The API is expected to check the cbSize member and detect the version compatibility scenario:

  1. cbSize holds exactly the value the API expects (that is, the maximal value known/defined to the API) – the simplest scenario, where the API and the application are on the same page, both using the same version of the “protocol”/interface.
  2. cbSize is smaller than the API can support – the API sees that it is dealing with a sort of legacy application which cannot utilize all the available features, and it acts respectively, supporting the older part of the protocol and keeping defaults or the “old look” for the rest of the implementation. This addresses backward compatibility: the newer API works with applications designed for an older version of the API.
  3. cbSize is greater than the API can support – the API sees that the application is aware of a newer API version and is possibly requesting some of the missing features. The API might ignore the unsupported part, on the assumption that the API evolution took place with some compatibility in mind, and still do the best it can with the existing implementation. Or, the API might just fail to work.
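The tolerant behavior described in scenarios #2 and #3 can be sketched with toy structures (all names below are made up; the real Common Controls expose C APIs, not this helper):

```cpp
#include <cstdint>

// Toy versioned structures in the style of REBARBANDINFO
struct BANDINFO_V1 { uint32_t cbSize; uint32_t nStyle; };
struct BANDINFO_V2 { uint32_t cbSize; uint32_t nStyle; uint32_t nHeaderSize; };

// A tolerant API honors the fields covered by cbSize and ignores the rest;
// the intolerant alternative (the Windows XP rebar behavior) would simply
// reject any cbSize it does not recognize
bool TolerantGetBandInfo(void* pvBandInformation)
{
    const uint32_t cbSize = *static_cast<const uint32_t*>(pvBandInformation);
    if(cbSize < sizeof(BANDINFO_V1))
        return false; // smaller than the oldest known version, nothing we can do
    BANDINFO_V1* pV1 = static_cast<BANDINFO_V1*>(pvBandInformation);
    pV1->nStyle = 1; // fill the fields every version has
    if(cbSize >= sizeof(BANDINFO_V2))
        static_cast<BANDINFO_V2*>(pvBandInformation)->nHeaderSize = 24; // newer fields on request
    return true; // a larger-than-known cbSize still succeeds, the extra tail is ignored
}
```

The rebar control on Windows XP implements the opposite of this for scenario #3: a Vista-sized REBARBANDINFO simply makes the call fail.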

The latter item #3 is the scenario at hand with the rebar control: the application uses the Windows Vista version of the REBARBANDINFO structure, and the Windows XP implementation chooses to completely fail.

While this does not seem to be a bug in the strict sense, the attitude is definitely not developer friendly: there is no reason for the control not to work in its best and default way. With the API acting this way, every developer using it needs to handle the situation explicitly: whenever a Windows Vista enabled application needs to be able to run on a Windows XP system, the code around REBARBANDINFO would look like this:

REBARBANDINFO BandInformation = { sizeof BandInformation, RBBIM_LPARAM };
#if _WIN32_WINNT >= 0x0600
if(GetOsVersion() < 0x00060000 || GetCommCtrlVersion() < 0x00060000) // pre-Vista, Common Controls pre-6.0
    BandInformation.cbSize = REBARBANDINFO_V6_SIZE;
#endif // _WIN32_WINNT >= 0x0600
const BOOL bGetBandInfoResult = Rebar.GetBandInfo(0, &BandInformation);

If the API were nicer to developers, the code would be plain and simple:

REBARBANDINFO BandInformation = { sizeof BandInformation, RBBIM_LPARAM };
const BOOL bGetBandInfoResult = Rebar.GetBandInfo(0, &BandInformation);

To address this problem, WTL 8.0 comes with the RunTimeHelper namespace and its SizeOf_REBARBANDINFO function. It takes care of the details for the developer, choosing the proper size of the structure at runtime. The code is taken back to a simpler shape:

REBARBANDINFO BandInformation = { RunTimeHelper::SizeOf_REBARBANDINFO(), RBBIM_LPARAM };
const BOOL bGetBandInfoResult = Rebar.GetBandInfo(0, &BandInformation);

All in all:

  • be aware of compatibility issues (the same scenario exists with other SDK structures: LVGROUP, LVTILEINFO, MCHITTESTINFO, NONCLIENTMETRICS and others)
  • use the latest version of WTL to have things worked around for you where Microsoft developers were not kind enough to provide a perfect API
  • be aware of, and take advantage of, WTL’s RunTimeHelper class

Booo SRW Locks

Windows Vista introduced a new synchronization API: slim reader/writer (SRW) locks. Being already armed with critical sections, one would perhaps not desperately need an SRW lock, but it offers a great option: both an exclusive (critical-section-like) mode and a shared mode, in which two or more threads can enter the protected section simultaneously. Some time earlier, I already touched on an SRW lock recursion issue.

This time it is more about performance. Let us suppose we have a code fragment:

static const SIZE_T g_nCount = 100000000;
for(SIZE_T nIndex = 0; nIndex < g_nCount; nIndex++)
    AcquireSRWLockShared(&m_Lock), ReleaseSRWLockShared(&m_Lock); // acquired and immediately released

How fast is this? With an idle lock variable the loop takes 1.6 seconds. How fast is it going to be with another shared “reader” on another thread, which once acquired access in shared mode? This part comes up confusing: it appears almost twice as slow, 2.9 seconds, with also a number of contentions (and context switches) on the way.

An SRW lock is advertised as a lightweight and fast API: no recursion, no extra features, no foolproof checks. One could assume it delivers top performance, yet a 50-line custom class can definitely outperform it.

Perhaps the simplest SRW lock can be built on a volatile LONG (or LONGLONG) variable accessed with the interlocked API functions. The rules are simple:

  • the initial value of zero means the idle state
  • a shared “reader” enters by incrementing the value by one; a positive value indicates one or more readers
  • an exclusive “writer” enters by decrementing the value by a hundred (or, rather, by any value big enough in magnitude to exceed the maximal number of simultaneous concurrent readers)
  • on release, the value is decremented (incremented) back
  • on contention, the thread yields execution to continue later with possibly better luck with the shared resource (option: implement a spin count for several attempts in a row, which is better suited to multi-processor systems)

It is clear that concurrent readers touch the value only once each, with a single increment, leaving no opportunity to yield execution due to contention. How fast can it be?

static const SIZE_T g_nCount = 100000000;
for(SIZE_T nIndex = 0; nIndex < g_nCount; nIndex++)
{
    while(InterlockedIncrement(&m_nNativeLock) < 0) // negative value means a writer is active
        InterlockedDecrement(&m_nNativeLock), SwitchToThread(); // roll back and yield
    InterlockedDecrement(&m_nNativeLock); // release shared mode
}

It is 1.3 seconds, regardless of whether there are any shared readers on concurrent threads. It appears that a simple custom SRW lock class is going to be superior to the API:

  • faster due to zero API overhead (inline-compiled code versus WINAPI-convention functions)
  • faster in shared mode, not being subject to the additional overhead and contentions
  • flexible spin counts, making it possible to tune performance for a specific use
  • extensibility:
    • recursion is allowed in shared mode
    • upgrades and downgrades between shared and exclusive modes are easy to implement

The sample code is a Visual Studio 2010 C++ project accessible from the SVN repository.

Update: Critical Section Test. This is how the SRW lock API performance compares to entering/leaving a critical section (provided that the critical section is never locked at entry time). The critical section is about 15% slower to enter and leave.

API: 100M iterations, 1575 ms
API: 100M iterations, 2839 ms
Native: 100M iterations, 1311 ms
Native: 100M iterations, 1326 ms
Critical Section: 100M iterations, 1841 ms

Recursive SRW Locks

Windows Vista added a new synchronization API, Slim Reader/Writer (SRW) Locks, which is a powerful alternative to critical sections. The detailed description is, as always, on MSDN, and what makes it really cool is simple:

  • unlike critical sections, SRW Locks provide reader and writer access synchronization, making it possible for two or more readers to not block one another
  • SRW Locks do not reference any resources and have the size of a pointer, which is the simplest possible scenario; as a result, they do not need a destructor, and their initialization is a simple zeroing of the memory/variable (for which you should nevertheless use the InitializeSRWLock API)

Being lightweight, they are efficient. To understand how they can work at all, one can imagine that a reader tries to InterlockedIncrement a synchronization variable: if the result is positive, it is OK to go; otherwise, the reader should decrement it back, wait and retry. A writer, instead, does an InterlockedAdd with an argument of -0x1000 and checks that the result of the operation is exactly -0x1000.

This post is about a trap one can get into by neglecting one of the SRW lock warnings:

… so SRW locks cannot be acquired recursively. In addition, a thread that owns an SRW lock in shared mode cannot upgrade its ownership of the lock to exclusive mode.

SRW locks cannot be acquired recursively, and it is very easy to make this mistake. If you attempt a recursive acquisition, you are likely to succeed, without a warning, error code, exception or assertion failure. You pass this point, and you can write quite some code before you realize something is wrong.

It can be as simple as this:
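Why is the failure silent? The SRW internals are undocumented, but a counter-based model (an assumption, in the spirit of the description above) makes it plausible: the lock keeps only a count, not the identity of the owning threads, so a second shared acquisition from the same thread looks exactly like a second reader:

```cpp
#include <atomic>

// Hypothetical model of shared acquisition: the lock is a bare counter
std::atomic<long> g_nLock { 0 };

bool TryAcquireShared()
{
    if(g_nLock.fetch_add(1) >= 0)
        return true; // no writer active, reader admitted
    g_nLock.fetch_sub(1); // writer active, roll back
    return false;
}
```

Both the first and the recursive acquisition succeed; with the real SRW lock, the trouble begins once an exclusive waiter arrives between the two, which is exactly the kind of hard-to-reproduce failure this warning is about.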


RegSetKeySecurity, CRegKey::SetKeySecurity and CSecurityDesc

One thing is worth a special mention in connection with the previous post on DirectShow Filter Graph Spy on a Microsoft Windows Vista system: ATL’s CSecurityDesc class caused some time to be wasted.

CRegKey Key;
CSecurityDesc AdministratorsOwnerSecurityDescriptor;
ATLENSURE_SUCCEEDED(HRESULT_FROM_WIN32(Key.SetKeySecurity(OWNER_SECURITY_INFORMATION, &AdministratorsOwnerSecurityDescriptor)));

The code compiles fine, but at runtime the last line fails with error 87 (ERROR_INVALID_PARAMETER, E_INVALIDARG), returned from the RegSetKeySecurity API call. My first guess was that ATL’s CSecurityDesc class had for some reason prepared a wrong descriptor, which caused it to be rejected as an argument. At first glance it looks (I am not sure) as if this class deals, to some extent, with the structures itself rather than using API functions, so it could produce something different from what the API calls expect.

Still, the problem is in the class itself and its cast to the required SECURITY_DESCRIPTOR* type. The class only implements an operator to automatically cast to the const SECURITY_DESCRIPTOR* type, so the following line would not be accepted by the compiler:

Key.SetKeySecurity(OWNER_SECURITY_INFORMATION, AdministratorsOwnerSecurityDescriptor)

However, &AdministratorsOwnerSecurityDescriptor is another level of indirection, effectively of SECURITY_DESCRIPTOR** type; it is accepted by the compiler (the parameter is a PVOID), but it results in an argument that is indeed invalid.
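The trap is easy to model with stand-in types (all names below are hypothetical): a wrapper whose first member is a pointer to the actual structure, and a conversion operator that is not picked up when the address-of operator is applied to the wrapper instead:

```cpp
#include <cstdint>

struct SECURITY_DESCRIPTOR_LIKE { uint8_t nRevision; };

// Model of CSecurityDesc: holds a pointer to the descriptor and only offers
// a conversion to a const pointer
class CSecurityDescLike
{
public:
    CSecurityDescLike() : m_pDescriptor(new SECURITY_DESCRIPTOR_LIKE { 1 }) { }
    ~CSecurityDescLike() { delete m_pDescriptor; }
    operator const SECURITY_DESCRIPTOR_LIKE*() const { return m_pDescriptor; }

private:
    SECURITY_DESCRIPTOR_LIKE* m_pDescriptor;
};
```

The explicit conversion yields the right pointer; the address-of operator compiles just as well when the parameter is a PVOID, but silently yields a pointer to the wrapper itself, one indirection too many, which the API then rejects.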

So, in order to correctly convert a CSecurityDesc to SECURITY_DESCRIPTOR*, it can be done this way:

CRegKey Key;
CSecurityDesc AdministratorsOwnerSecurityDescriptor;