GetFileVersionInfoSize and API sets

The Requirements section of the GetFileVersionInfoSizeW function documentation lists:

Minimum supported client: Windows Vista [desktop apps only]
Minimum supported server: Windows Server 2008 [desktop apps only]
Target Platform: Windows
Header: winver.h (include Windows.h)
Library: Version.lib
DLL: Api-ms-win-core-version-l1-1-0.dll

and this is inaccurate: the actual DLL requirement is api-ms-win-core-version-l1-1-1.dll instead. However, what does it mean exactly? From the Windows API sets documentation:

API Sets rely on operating system support in the library loader to effectively introduce a namespace redirection component into the library binding process. Subject to various inputs, including the API Set name and the binding (import) context, the library loader performs a runtime redirection of the reference to a target host binary that houses the appropriate implementation of the API Set.

The decoupling between implementation and interface contracts provided by API Sets offers many engineering advantages, but can also potentially reduce the number of DLLs loaded in a process.

The “hyphen one” DLL (api-ms-win-core-version-l1-1-1.dll) is missing in Windows Server 2012 R2, so the documented promise of support starting with Windows Server 2008 is incorrect. Windows Server 2012 R2 has only the “hyphen zero” DLL.

The hyphen zero DLL does, however, expose the GetFileVersionInfoSizeExW entry point, so application developers concerned with backward compatibility should switch from GetFileVersionInfoSize to GetFileVersionInfoSizeEx, even though the former is not explicitly documented as deprecated (probably another out-of-date aspect of the documentation).

The same applies to the GetFileVersionInfo functions.
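A minimal sketch of such a switch (the wrapper function is mine, for illustration only; error handling reduced to the essentials):

#include <windows.h>
#include <vector>
#pragma comment(lib, "Version.lib")

// GetFileVersionInfoSizeExW/GetFileVersionInfoExW are exported by the hyphen
// zero API set DLL, so this works on Windows Server 2012 R2 as well
bool GetVersionInfoData(LPCWSTR pszPath, std::vector<BYTE>& Data)
{
    DWORD nHandle; // set to zero by the function, not otherwise used
    const DWORD nSize = GetFileVersionInfoSizeExW(FILE_VER_GET_NEUTRAL, pszPath, &nHandle);
    if(!nSize)
        return false;
    Data.resize(nSize);
    return GetFileVersionInfoExW(FILE_VER_GET_NEUTRAL, pszPath, 0, nSize, Data.data()) != FALSE;
}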

On a related note, the MSDN page “API Sets available in Windows 8.1 and Windows Server 2012 R2” looks accurate and makes no mention of GetFileVersionInfoSize and GetFileVersionInfo.

Incorrect breaking #import behavior in recent (e.g. 16.6.2) MSVC versions

Yesterday’s bug is not the only “news”. Some time ago I already noticed weirdly broken behavior when rebuilding DirectShowSpy.dll with the current version of Visual Studio and MSVC.

Now the problem is becoming clearer.

Here is an example interface:

[
    object,
    uuid(6CE45967-F228-4F7B-8B93-83DC599618CA),
    //dual,
    //oleautomation,
    nonextensible,
    pointer_default(unique)
]
interface IMuxFilter : IUnknown
{
    HRESULT IsTemporaryIndexFileEnabled();
    HRESULT SetTemporaryIndexFileEnabled([in] BOOL bTemporaryIndexFileEnabled);
    HRESULT GetAlignTrackStartTimeDisabled();
    HRESULT SetAlignTrackStartTimeDisabled([in] BOOL bAlignTrackStartTimeDisabled);
    HRESULT GetMinimalMovieDuration([out] LONGLONG* pnMinimalMovieDuration);
    HRESULT SetMinimalMovieDuration([in] LONGLONG nMinimalMovieDuration);
};

Compiled into a type library, it looks okay. The Windows SDK 10.0.18362 COM/OLE Object Viewer shows the correct definition obtained from the type library:

[
    odl,
    uuid(6CE45967-F228-4F7B-8B93-83DC599618CA),
    nonextensible
]
interface IMuxFilter : IUnknown {
    HRESULT _stdcall IsTemporaryIndexFileEnabled();
    HRESULT _stdcall SetTemporaryIndexFileEnabled([in] long bTemporaryIndexFileEnabled);
    HRESULT _stdcall GetAlignTrackStartTimeDisabled();
    HRESULT _stdcall SetAlignTrackStartTimeDisabled([in] long bAlignTrackStartTimeDisabled);
    HRESULT _stdcall GetMinimalMovieDuration([out] int64* pnMinimalMovieDuration);
    HRESULT _stdcall SetMinimalMovieDuration([in] int64 nMinimalMovieDuration);
};
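The wrapper below is produced by an #import of the compiled type library along these lines (the attributes are an assumption; the real project may use different ones):

#import "DirectShowSpy.dll" raw_interfaces_only // reads the embedded type library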

Now here is what happens when MSVC #import pulls it into Win32 (32-bit) code:

struct __declspec(uuid("6ce45967-f228-4f7b-8b93-83dc599618ca"))
IMuxFilter : IUnknown
{
    //
    // Raw methods provided by interface
    //

      virtual HRESULT __stdcall IsTemporaryIndexFileEnabled ( ) = 0;
    virtual HRESULT _VtblGapPlaceholder1( ) { return E_NOTIMPL; }
      virtual HRESULT __stdcall SetTemporaryIndexFileEnabled (
        /*[in]*/ long bTemporaryIndexFileEnabled ) = 0;
    virtual HRESULT _VtblGapPlaceholder2( ) { return E_NOTIMPL; }
      virtual HRESULT __stdcall GetAlignTrackStartTimeDisabled ( ) = 0;
    virtual HRESULT _VtblGapPlaceholder3( ) { return E_NOTIMPL; }
      virtual HRESULT __stdcall SetAlignTrackStartTimeDisabled (
        /*[in]*/ long bAlignTrackStartTimeDisabled ) = 0;
    virtual HRESULT _VtblGapPlaceholder4( ) { return E_NOTIMPL; }
      virtual HRESULT __stdcall GetMinimalMovieDuration (
        /*[out]*/ __int64 * pnMinimalMovieDuration ) = 0;
    virtual HRESULT _VtblGapPlaceholder5( ) { return E_NOTIMPL; }
      virtual HRESULT __stdcall SetMinimalMovieDuration (
        /*[in]*/ __int64 nMinimalMovieDuration ) = 0;
    virtual HRESULT _VtblGapPlaceholder6( ) { return E_NOTIMPL; }
};

WTF _VtblGapPlaceholder1??? That was uncalled for!

It looks like some 32/64 bullshit added by MSVC at some point (a cross-compilation issue?) for no good reason. The spurious placeholders shift every subsequent method by one vtable slot, so client code compiled against this header ends up calling the wrong methods of the actual implementation. A sort of gentle reminder that one should get rid of #import in C++ code.

Please have it fixed; 32-bit code is still very much in use.

#import of Microsoft’s own quartz.dll, for example, has the same invalid gap insertion:

struct __declspec(uuid("56a868bc-0ad4-11ce-b03a-0020af0ba770"))
IMediaTypeInfo : IDispatch
{
    //
    // Raw methods provided by interface
    //

      virtual HRESULT __stdcall get_Type (
        /*[out,retval]*/ BSTR * strType ) = 0;
    virtual HRESULT _VtblGapPlaceholder1( ) { return E_NOTIMPL; }
      virtual HRESULT __stdcall get_Subtype (
        /*[out,retval]*/ BSTR * strType ) = 0;
    virtual HRESULT _VtblGapPlaceholder2( ) { return E_NOTIMPL; }
};
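A workaround, until this is fixed, is to drop #import and declare the interface by hand to match the IDL (a sketch; in a real project the header generated by midl.exe from the same IDL is the natural replacement):

#include <windows.h>
#include <unknwn.h>

// Equivalent of the IDL definition above, without the spurious gap entries
struct __declspec(uuid("6CE45967-F228-4F7B-8B93-83DC599618CA"))
IMuxFilter : IUnknown
{
    virtual HRESULT __stdcall IsTemporaryIndexFileEnabled() = 0;
    virtual HRESULT __stdcall SetTemporaryIndexFileEnabled(BOOL bTemporaryIndexFileEnabled) = 0;
    virtual HRESULT __stdcall GetAlignTrackStartTimeDisabled() = 0;
    virtual HRESULT __stdcall SetAlignTrackStartTimeDisabled(BOOL bAlignTrackStartTimeDisabled) = 0;
    virtual HRESULT __stdcall GetMinimalMovieDuration(LONGLONG* pnMinimalMovieDuration) = 0;
    virtual HRESULT __stdcall SetMinimalMovieDuration(LONGLONG nMinimalMovieDuration) = 0;
};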

Something got broken around version 16.6.1 of Visual C++ compiler

An ancient piece of code started giving trouble:

template <typename T>
class ATL_NO_VTABLE CMediaControlT :
    ...
{
    ...
    STDMETHOD(Run)()
    {
        ...
        T* MSVC_FIX_VOLATILE pT = static_cast<T*>(this); // <<--------------------------------
        CRoCriticalSectionLock GraphLock(pT->m_GraphCriticalSection);
        pT->FilterGraphNeeded();
        __D(pT->GetMediaControl(), E_NOINTERFACE);
        pT->PrepareCue();
        pT->DoRun();
        __if_exists(T::Fire_Running)
        {
            pT->Fire_Running();
        }
    ...
}

When MSVC_FIX_VOLATILE expands to nothing, it appears that the optimizing compiler forgets pT and just uses some variation of the adjusted this, which makes sense overall because a static cast between the two can be resolved at compile time.

However, the problem is that the resulting value of this is wrong, and we get an undefined behavior scenario.

If I make MSVC_FIX_VOLATILE expand to volatile, making the variable pT somewhat “heavier”, the optimizing compiler forgets this instead and uses pT directly, with everything working as expected.
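For reference, a sketch of how such a macro can be defined (the exact version gate is an assumption; _MSC_VER 1926 corresponds to VS 16.6):

// Work around the MSVC 16.6.x optimizer issue described above by making
// the casted pointer volatile
#if defined(_MSC_VER) && _MSC_VER >= 1926
    #define MSVC_FIX_VOLATILE volatile
#else
    #define MSVC_FIX_VOLATILE
#endif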

The problem still exists in the current 16.6.2.

Intel Media SDK H.264 encoder buffer and target bitrate management

I might have mentioned that Intel Media SDK has a ridiculously eclectic design and is a software package for the brave. Something to stay well clear of for as long as you possibly can.

To be on par on the customer support side, Intel did something that caused my Intel Developer Zone account to be blocked. Over time I made a few attempts to restore the account, and only once, out of the blue, did someone follow up with a surprising response: “You do not have the enterprise login account”. That’s unbelievable: I could register the account, I could post like this, I can still request password reset links and receive them, but the problem is that I don’t have an “enterprise account”.

Back to Intel Media SDK, where things are designed to work about as obviously and reliably as their forums. A bit of code from a very basic tutorial:

    //5. Initialize the Media SDK encoder
    sts = mfxENC.Init(&mfxEncParams);
    MSDK_IGNORE_MFX_STS(sts, MFX_WRN_PARTIAL_ACCELERATION);
    MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);

    // Retrieve video parameters selected by encoder.
    // - BufferSizeInKB parameter is required to set bit stream buffer size
    mfxVideoParam par;
    memset(&par, 0, sizeof(par));
    sts = mfxENC.GetVideoParam(&par);
    MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);

    //6. Prepare Media SDK bit stream buffer
    mfxBitstream mfxBS;
    memset(&mfxBS, 0, sizeof(mfxBS));
    mfxBS.MaxLength = par.mfx.BufferSizeInKB * 1000;
    mfxBS.Data = new mfxU8[mfxBS.MaxLength];
    MSDK_CHECK_POINTER(mfxBS.Data, MFX_ERR_MEMORY_ALLOC);

Setting aside the question of which genius decided to measure buffers in kilobytes, the snippet makes great sense: you ask the SDK for the required buffer size and then you provide the space. I myself am even more generous than that: I grant 1024 bytes for every kilobyte in question.

The thing is that the hardware encoder still hits scenarios where it is unable to fit the data into a buffer sized this way. What happens when the encoder has more data on its hands? Maybe it emits a warning: “Well, I just screwed things up, be aware”? A buffer overflow error? A buffer reallocation request? Oh no, the SDK does it smarter: it fills the buffer completely, trimming the excess, which makes the bitstream non-compliant and triggers frame corruption later on the decoder end. Then the encoder continues as if nothing important has happened.
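Since there is no error to catch, about the only defensive measure on the application side is extra headroom plus an after-the-fact sanity check; a minimal sketch (the 4x multiplier is an arbitrary safety margin of mine, not anything the SDK prescribes):

    // Oversize the buffer well beyond the advertised BufferSizeInKB
    mfxBS.MaxLength = par.mfx.BufferSizeInKB * 1024 * 4;
    mfxBS.Data = new mfxU8[mfxBS.MaxLength];

    // ... later, after EncodeFrameAsync/SyncOperation for a frame:
    if (mfxBS.DataOffset + mfxBS.DataLength == mfxBS.MaxLength)
    {
        // The encoder filled the buffer to the brim, which is the only
        // observable hint that the bitstream might have been silently trimmed
    }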

There is an absolute rule in software technology: if a thing is designed in a way that allows it to break in a certain aspect, once in a while a consumer will be hit by this flaw. Maybe just this once the Intel guys thought it would not be the case.

Heterogeneous Media Foundation pipelines

Just a small test here to feature the use of multiple GPUs within a single Media Foundation pipeline. The initial idea is pretty simple: quite a few systems are equipped with multiple GPUs; some have a “free” onboard Intel GPU idling in the presence of a regular video card, while other systems have an integrated “iGPU” and a discrete “dGPU” seamlessly blended by DXGI.

The Media Foundation API does not bring any specific feature set for leveraging multiple GPUs at a time, but it is surely possible to take advantage of them.

The application creates 20-second-long video clips by combining GPUs: one GPU is used for video rendering and the other hosts hardware H.264 video encoding. No system memory is used for uncompressed video; system memory first comes into play when receiving the encoded H.264 bitstream. The Media Foundation pipeline hence is:

  • A Media Source generating video frames off its video stream using the first GPU
  • A transform to bridge the two GPUs
  • An H.264 video encoder transform specific to the second GPU
  • The stock MP4 Media Sink

The pipeline runs in a single media session, pretty much like a normal pipeline. Media Foundation is designed in such a way that primitives do not have to align their GPU usage with the pipeline: surely they have to share devices and textures so that they can all operate together, but the pipeline itself does not impose many limitations there.
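For illustration, a minimal sketch of the per-GPU setup this implies: every DXGI adapter gets its own D3D11 device wrapped into a Media Foundation DXGI device manager, which is then shared with the primitives working on that GPU (error handling trimmed; handing the managers out, for example via MFT_MESSAGE_SET_D3D_MANAGER, is up to the application):

#include <d3d11.h>
#include <d3d10.h> // ID3D10Multithread
#include <mfapi.h>
#include <atlbase.h>

// Create a D3D11 device on the given DXGI adapter and wrap it into an
// IMFDXGIDeviceManager suitable for sharing with Media Foundation primitives
HRESULT CreateDeviceManager(IDXGIAdapter* pAdapter, IMFDXGIDeviceManager** ppDeviceManager)
{
    CComPtr<ID3D11Device> pDevice;
    HRESULT nResult = D3D11CreateDevice(pAdapter, D3D_DRIVER_TYPE_UNKNOWN, NULL,
        D3D11_CREATE_DEVICE_VIDEO_SUPPORT | D3D11_CREATE_DEVICE_BGRA_SUPPORT,
        NULL, 0, D3D11_SDK_VERSION, &pDevice, NULL, NULL);
    if(FAILED(nResult))
        return nResult;
    // Hardware transforms access the device from worker threads
    const CComQIPtr<ID3D10Multithread> pMultithread = pDevice;
    if(pMultithread)
        pMultithread->SetMultithreadProtected(TRUE);
    UINT nResetToken;
    CComPtr<IMFDXGIDeviceManager> pDeviceManager;
    nResult = MFCreateDXGIDeviceManager(&nResetToken, &pDeviceManager);
    if(FAILED(nResult))
        return nResult;
    nResult = pDeviceManager->ResetDevice(pDevice, nResetToken);
    if(FAILED(nResult))
        return nResult;
    *ppDeviceManager = pDeviceManager.Detach();
    return S_OK;
}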

Microsoft Windows [Version 10.0.18363.815]
(c) 2019 Microsoft Corporation. All rights reserved.

C:\Temp>HeterogeneousRecordFile.exe
HeterogeneousRecordFile.exe 20200502.1-11-gd3d16d5 (Release)
d3d16d51e7f2a098c5765d445714f14051c7a68d
HEAD -> master, origin/master
2020-05-09 23:51:54 +0300
--
Found 2 DXGI adapters
Trying heterogeneous configurations…
Using NVIDIA GeForce GTX 1650 to render video and Intel(R) HD Graphics 4600 to encode the content
Using filename HeterogeneousRecordFile-20200509-235406.mp4 for recording
Using Intel(R) HD Graphics 4600 to render video and NVIDIA GeForce GTX 1650 to encode the content
Using filename HeterogeneousRecordFile-20200509-235411.mp4 for recording
Trying trivial configuration with loopback data transfer…
Using NVIDIA GeForce GTX 1650 to render video and NVIDIA GeForce GTX 1650 to encode the content
Using filename HeterogeneousRecordFile-20200509-235416.mp4 for recording
Using Intel(R) HD Graphics 4600 to render video and Intel(R) HD Graphics 4600 to encode the content
Using filename HeterogeneousRecordFile-20200509-235419.mp4 for recording

This is just a simple use case, and I believe there can be others: GPUs are pretty powerful for certain specific tasks, and they are also equipped with video-specific ASICs.

Download links

Binaries:

A readable version of HelloDirectML sample

So it came to my attention that there is a new API in the DirectX family: Direct Machine Learning (DirectML).

Direct Machine Learning (DirectML) is a low-level API for machine learning. It has a familiar (native C++, nano-COM) programming interface and workflow in the style of DirectX 12. You can integrate machine learning inferencing workloads into your game, engine, middleware, backend, or other application. DirectML is supported by all DirectX 12-compatible hardware.

You might want to check out the introduction video if you are interested.

I updated the HelloDirectML code and restructured it to be readable and easy to comprehend. In my variant I have two operators, addition and multiplication, following one another with a UAV resource barrier in between. The code does (1.5 * 2) ^ 2 math in tensor space.
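The crux of the restructured code, the back-to-back dispatches with the barrier in between, looks roughly like this (a fragment with assumed variable names; devices, compiled operators and binding tables are prepared by the rest of the sample):

// pCommandRecorder is IDMLCommandRecorder, pCommandList is the D3D12 command
// list being recorded; both dispatchables are compiled DML operators whose
// binding tables already point at the tensor buffer
pCommandRecorder->RecordDispatch(pCommandList, pFirstOperator, pFirstBindingTable);
// The second operator reads what the first one wrote into the same UAV,
// hence the barrier in between
const CD3DX12_RESOURCE_BARRIER Barrier = CD3DX12_RESOURCE_BARRIER::UAV(pTensorBuffer);
pCommandList->ResourceBarrier(1, &Barrier);
pCommandRecorder->RecordDispatch(pCommandList, pSecondOperator, pSecondBindingTable);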

Here is my fork with the updated HelloDirectML; the top surface code with the tensor math takes less than 150 lines, starting here. If you are a fan of spaghetti style (according to Wikipedia, it appears that what I prefer is referred to as “ravioli code”), the original sample is there.

Hardware video encoding latency with NVIDIA GeForce GTX 1080 Ti

To complete the set of posts [1, 2, 3] on hardware video encoding at lowest latency settings, I am sharing the juiciest part, the application for NVIDIA NVENC. I did not have a 20 series card at hand to run the measurements, and I hope the table below for the GeForce GTX 1080 Ti is eloquent enough.

It is sort of awkward to put the GTX 1080 Ti numbers (latency in milliseconds for every video frame sent to encoding) side by side with those of AMD products, at least the ones I had a chance to check out, so here we go with GeForce GTX 1080 Ti vs. GeForce GTX 1650:

Well, that’s fast, and the GeForce 10 series was released back in 2016.

The numbers show that NVIDIA cards are powerful enough for game experience remoting (what you use Rainway for) in a wide range of video modes, including high frame rates of 144 and up.

I also added 640×360@260 just because I have a real camera (an inexpensive one, with a USB 2.0 connection) operating in this mode with high frame rate capture: the numbers suggest that it is generally possible to remote a high frame rate video signal at blink-of-an-eye speed.
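For reference, “lowest latency settings” with the NVENC API boils down to choices along these lines (a sketch reflecting my reading of low-latency configuration, not necessarily the exact parameters the application uses):

#include <nvEncodeAPI.h> // NVIDIA Video Codec SDK

// Assumes InitializeParams/Config are otherwise filled in (resolution,
// frame rate, codec GUID) before NvEncInitializeEncoder is called
void SetupLowLatency(NV_ENC_INITIALIZE_PARAMS& InitializeParams, NV_ENC_CONFIG& Config)
{
    InitializeParams.presetGUID = NV_ENC_PRESET_LOW_LATENCY_HP_GUID; // low latency, high performance
    InitializeParams.enablePTD = 1; // let the encoder decide picture types itself
    Config.frameIntervalP = 1; // IPPP..., no B frames and so no reordering delay
    Config.gopLength = NVENC_INFINITE_GOPLENGTH; // no periodic key frames
    Config.rcParams.rateControlMode = NV_ENC_PARAMS_RC_CBR; // steady bitrate for streaming
}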

There might be many aspects to compare when it comes to choosing between AMD and NVIDIA products, but when it comes to video streaming, low latency video compression, and hardware assisted video compression in general, the situation is pretty much clear: just grab an NVIDIA thing and do not do what I did when I put an AMD Radeon RX 570 Series video card into my primary development system, thinking that maybe at the time AMD had something cool.

So, here goes the app for NVIDIA hardware.

Download links

Binaries:

  • 64-bit: NvcEncode.exe (in .ZIP archive)
  • License: This software is free to use