Unicode vs. Windows Console

If I run this, what will the output be?

#include <string>
#include <iostream>

#include <winrt\base.h>
#include <winrt\Windows.Foundation.h>
#include <winrt\Windows.Globalization.DateTimeFormatting.h>

#pragma comment(lib, "windowsapp.lib")

int main()
{
	auto const Now = winrt::clock::now();
	winrt::Windows::Globalization::DateTimeFormatting::DateTimeFormatter DateTimeFormatter { L"shortdate longtime" };
	std::wcout << "Now is " << static_cast<std::wstring>(DateTimeFormatter.Format(Now)) << std::endl;
	return 0;
}

Here we go:

What appears to be wrong is the Unicode left-to-right mark character, which kills the console output: the stream stops accepting any further text!
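
My understanding, and it is an assumption rather than something I traced through the runtime sources, is that std::wcout fails to convert U+200E to the console code page, goes into a failed stream state and silently drops everything inserted afterwards. A minimal sketch of checking for and recovering from that state:

#include <iostream>

int main()
{
	std::wcout << L"Before" << std::endl;
	std::wcout << L"\u200E"; // left-to-right mark; the conversion presumably fails here
	if(std::wcout.fail()) // the stream is now in an error state and ignores further output
		std::wcout.clear(); // reset the state so that subsequent output works again
	std::wcout << L"After" << std::endl;
	return 0;
}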

Now if you do this instead:

#include <string>
#include <iostream>

#include <winrt\base.h>
#include <winrt\Windows.Foundation.h>
#include <winrt\Windows.Globalization.DateTimeFormatting.h>

#pragma comment(lib, "windowsapp.lib")

std::wstring Replace(std::wstring const& Input, std::wstring const& A, std::wstring const& B)
{
	std::wstring Output;
	for(size_t C = 0; ; )
	{
		auto const D = Input.find(A, C);
		if(D == Input.npos)
		{
			Output.append(Input.substr(C));
			break;
		}
		Output.append(Input.substr(C, D - C));
		Output.append(B);
		C = D + A.length();
	}
	return Output;
}

int main()
{
	auto const Now = winrt::clock::now();
	winrt::Windows::Globalization::DateTimeFormatting::DateTimeFormatter DateTimeFormatter { L"shortdate longtime" };
	std::wcout << "Now is " << Replace(static_cast<std::wstring>(DateTimeFormatter.Format(Now)), L"\u200E", L"") << std::endl;
	return 0;
}

Then you get what you want, and note that the trailing end of line is now in its place (it was not there in the first run):
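
Another option, which I have seen suggested elsewhere but did not verify as part of this experiment, is to switch the standard output into UTF-16 mode with _setmode, so that wide output bypasses the code page conversion altogether (note that narrow output such as printf or std::cout must not be mixed in after this call):

#include <fcntl.h>
#include <io.h>
#include <iostream>

int main()
{
	_setmode(_fileno(stdout), _O_U16TEXT); // stdout now expects UTF-16 wide output
	std::wcout << L"Now is \u200E2020\u200E" << std::endl; // the marks are passed through instead of breaking the stream
	return 0;
}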

Modern asynchronous C++

The Windows API has offered asynchronous functionality for file, network and other I/O for a long time. It was perhaps one of the easiest ways to make a simple thing messy, ridiculously bloated and sensitive to errors of all sorts.

If you are sane and don't need to squeeze every last bit of performance out of something, you would simply not use overlapped I/O and prefer the blocking versions of the API. One specific advantage synchronous API and blocking calls offer is linearity of code: you see clearly what happens next, and you don't need to go back and forth between multiple functions, completion callbacks and structures that carry transit context. At the cost of threads, memory and blocking you obtain an easier and more reliable way to write code.

At some point C#, as a much more flexibly evolving language, made a move to approach asynchronous programming in a new way: Asynchronous programming in C# | Microsoft Docs. In C++ you remained where you were before, able to do all the same only at the cost of code mess. There have been a few more attempts to make concurrency easier in C++, and eventually coroutines made their way into the modern language.

For a specific task I needed to quickly write some code to grab multiple images and have a Telegram bot throw them over into a channel as a notification measure. Not a superhuman job, but still a good small example of how to do things in parallel with compact C++ code.

MSVC C++17 with /await enables the use of C++ coroutines, and the C++/WinRT language projection supplies a suitable asynchronous API. The code snippet below starts multiple simultaneous tasks, each locating a file, reading it into memory, starting an HTTP POST request and posting the image to a remote web server. The controlling code then synchronizes on completion of all of the tasks, letting them run and complete independently.

struct CompletionContext
{
	CompletionContext(size_t Counter) :
		Counter(static_cast<uint32_t>(Counter))
	{
	}
	void Decrement()
	{
		if(--Counter == 0)
			WI_VERIFY(SetEvent(Event.get()));
	}

	std::atomic_uint32_t Counter;
	winrt::handle Event { CreateEvent(nullptr, TRUE, FALSE, nullptr) };
};

winrt::Windows::Foundation::IAsyncOperation<bool> Process(DateTime Time, Configuration::Channel& Channel, CompletionContext& Context)
{
	auto Decrement = wil::scope_exit([&]() { Context.Decrement(); });
	WI_ASSERT(!Channel.RecordDirectory.empty());
	WI_ASSERT(!Channel.Name.empty());
	auto const TimeEx = system_clock::from_time_t(winrt::clock::to_time_t(Time));
	winrt::Windows::Storage::Streams::IBuffer Buffer;
	WCHAR Path[MAX_PATH];
	PathCombineW(Path, Channel.RecordDirectory.c_str(), Channel.Name.c_str());
	PathCombineW(Path, Path, L"thumbnail.jpg");
	using namespace winrt::Windows::Storage;
	auto const File = co_await StorageFile::GetFileFromPathAsync(Path);
	auto const InputStream = co_await File.OpenAsync(FileAccessMode::Read, StorageOpenOptions::AllowOnlyReaders);
	Buffer = co_await TelegramHelper::ToBuffer(InputStream);
	std::wostringstream Stream;
	Stream << Format(L"ℹ️ Notification") << std::endl;
	Stream << std::endl;
	AppendComputerTime(Stream);
	Stream << "Directory: " << Channel.RecordDirectory << std::endl;
	Stream << "Channel: " << Channel.Name << L" (" << Channel.FriendlyName << L")" << std::endl;
	co_await TelegramHelper::SendPhoto(TelegramHelper::BinaryDocument(Buffer, L"thumbnail.jpg"), Stream.str());
	co_return true;
}
winrt::Windows::Foundation::IAsyncAction Completion(CompletionContext& Context)
{
	co_await winrt::resume_on_signal(Context.Event.get()); // https://docs.microsoft.com/en-us/uwp/cpp-ref-for-winrt/resume-on-signal
	co_return;
}

CompletionContext Context(m_Configuration.ChannelVector.size());
for(auto&& Channel : m_Configuration.ChannelVector)
	Process(Time, Channel, Context);
co_await Completion(Context);

(I am probably just not aware of an existing suitable pattern to synchronize with multiple completions, so I did it with a manual-reset event and waited on it with an existing C++/WinRT helper I was aware of.)
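
A hypothetical alternative, sketched below against the same names as in the snippet above, would be to keep the IAsyncOperation objects returned by Process and co_await each of them in turn; the tasks are started eagerly when Process is called, so the loop only waits for completion and the manual-reset event would become redundant:

std::vector<winrt::Windows::Foundation::IAsyncOperation<bool>> Operations;
Operations.reserve(m_Configuration.ChannelVector.size());
for(auto&& Channel : m_Configuration.ChannelVector)
	Operations.emplace_back(Process(Time, Channel, Context));
for(auto&& Operation : Operations)
	co_await Operation; // resumes immediately if the operation already completed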

So how is this better than what we had before?

First, and perhaps most important, the code remains compact and linear. From this amount of C++ code you would not even guess that it runs highly parallelized. The only blocking is at the last line of the snippet, where we finally wait on completion of all the tasks. Still, the task code is perfectly readable and carries no excessive scaffolding you have to desperately read through trying to figure out what is going on.

Second, the code is concurrent and parallel without any need to manage threads yourself. You don't need to think about how many threads you want or how many CPU cores the system has. The code is just parallel enough and is mapped onto available system resources in a good way. You simply focus on what is important. The scalability aspect will become clearer in the following paragraphs.

Third, note the number of co_await operators: they appear a lot in code around asynchronous operations. The way things work is this: the C++ compiler slices a function whose return type is winrt::Windows::Foundation::IAsync* (for details on this I refer you to the coroutine theory linked earlier; let's just focus on the C++/WinRT part here) into multiple pieces separated by the co_await operators. This is done transparently: you see the function as one solid piece of code, while effectively it is broken into separate pieces joined by an execution context that carries the arguments, local variables and returned value. At every such operator the function can be suspended for as long as necessary (for example, to complete I/O) and then resumed on the same or another thread. As a C++ developer you no longer have to think about the details, as a C++20 compiler is here to help you catch up in efficiency with the C# guys.

Even though this may not be exactly accurate technically, I think it is helpful to imagine that the C++ compiler compiles multiple "subfunctions", breaking the original function at co_await boundaries. Now imagine that it is possible to quickly transfer such a "subfunction" to another CPU core, or to put it aside while executing a "more important" "subfunction" from another execution context. The application is now a deck of small tasks executed in a highly parallel manner, while at the same time the order of tasks within one execution context is safely preserved. You also keep all the nice things you are used to from earlier C++.
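
As a toy illustration of this slicing (my own standalone example, not part of the project above), here is a coroutine whose first piece runs on the calling thread and whose second piece, after the co_await, resumes on a thread pool thread:

#include <iostream>
#include <thread>

#include <winrt\base.h>
#include <winrt\Windows.Foundation.h>

#pragma comment(lib, "windowsapp.lib")

winrt::Windows::Foundation::IAsyncAction ExampleAsync()
{
	std::wcout << L"First piece, thread " << std::this_thread::get_id() << std::endl;
	co_await winrt::resume_background(); // suspension point: the rest of the function is queued to the thread pool
	std::wcout << L"Second piece, thread " << std::this_thread::get_id() << std::endl;
}

int main()
{
	winrt::init_apartment();
	ExampleAsync().get(); // blocking wait just to keep this small example alive until completion
	return 0;
}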

The IAsync*/co_await implementation supplies you with a thread pool that places your many tasks and their function pieces onto the available CPU cores for concurrent execution. That is, lots of these subtasks are distributed evenly across the cores over a reasonable number of threads, with unblocked execution and managed waiting for I/O completion.

All in all, you can now have compact, readable and manageable concurrent code, scalable and with efficient resource consumption, and with much less need to do threading and waiting on your own.

So, what “W” is for in lstrcmpW

So I took the time and submitted a request to update the lstrcmpW documentation because it is inaccurate.

It seemed pretty obvious that the Syntax section shows W types (correct) while the Parameters section shows T types, both trying to document the same thing. Some articles, for CRT functions specifically, solve this by documenting a set of similar functions at once, but that is not the case here.
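
For reference, the essence of the mismatch (declarations simplified; the real winbase.h ones carry WINBASEAPI/WINAPI decorations) is that the W function itself only deals with wide strings, while LPCTSTR belongs to the generic lstrcmp macro that expands to lstrcmpA or lstrcmpW depending on whether UNICODE is defined:

int lstrcmpA(LPCSTR lpString1, LPCSTR lpString2);   // ANSI variant
int lstrcmpW(LPCWSTR lpString1, LPCWSTR lpString2); // wide variant, the subject of the article
// lstrcmp is just a macro selecting one of the two, so T types only make sense for the macro, not for lstrcmpW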

The response was a bit unexpected:

Well, good luck with clarity and accuracy in SDK documentation then.

DirectShow filter built with Visual Studio 2019 Preview to run on Windows Server 2012

I pushed a few commits to my fork of DirectShow Win7 Samples (BaseClasses library specifically).

One of the small problems I happened to deal with is that a filter built with a current/recent toolset produces code incompatible with legacy operating systems, which are still widely present in the wild. This could be solved by using outdated versions of Visual Studio, an outdated Windows SDK etc. That is however not really necessary, because even Visual Studio 2019 Preview builds DirectShow code perfectly (including with the v142 toolset), and you are generally not limited to using the 30 year old codebase alone. I had a filter using Windows Implementation Libraries (WIL) helpers, C++17 code and C++/WinRT for the COM object implementation, however a few rough places in BaseClasses resulted in a filter binary incompatible with the Windows Server 2012 runtime.

A bit of massaging of BaseClasses fixed the problem. I also enabled SDL checks (and this made me fix something in the COutputQueue implementation – not really a bug, but it could be more accurate), and got rid of strsafe.h in favor of the safe CRT string functions.
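
To give an idea of what the strsafe.h change amounts to (an illustrative sketch rather than a verbatim diff from the commits), it is mostly a mechanical swap of the StringCch* calls for their secure CRT counterparts:

WCHAR Name[256];
// before, with strsafe.h:
// StringCchCopyW(Name, _countof(Name), L"Output");
// after, with the safe CRT string functions only:
wcscpy_s(Name, _countof(Name), L"Output");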

While doing that, I noticed that the ancient lstrcmpW API function is documented incorrectly on MSDN.

Internal E_UNEXPECTED in dxgi.dll

Someone asked a question on StackOverflow recently about suspicious debug output messages associated with DXGI/Direct3D initialization: DirectX12: dxgi dll catastrophic failure when creating IDXGIFactory.

onecore\windows\directx\database\helperlibrary\lib\perappusersettingsqueryimpl.cpp(121)\dxgi.dll!00007FFBA0D6D7F8: (caller: 00007FFBA0D4D167) ReturnHr(1) tid(64d8) 8000FFFF Catastrophic failure
onecore\windows\directx\database\helperlibrary\lib\perappusersettingsqueryimpl.cpp(98)\dxgi.dll!00007FFBA0D6D4D0: (caller: 00007FFBA0D3E221) ReturnHr(2) tid(64d8) 8000FFFF Catastrophic failure
onecore\windows\directx\database\helperlibrary\lib\directxdatabasehelper.cpp(999)\dxgi.dll!00007FFBA0D6D4FC: (caller: 00007FFBA0D3E221) ReturnHr(3) tid(64d8) 8000FFFF Catastrophic failure

This problem is not fatal or severe, but it is a long-standing one, and Microsoft folks should look into it because, as the StackOverflow question suggests, it confuses people.

It is also a widespread one; for instance, it can easily be reproduced with one of the apps I posted earlier:

If you start the application in self-debugging mode with the -Debug command line parameter, the debug output is redirected to the console and those messages are immediately visible:

In the referenced StackOverflow answer I also advertise Microsoft's Windows Implementation Libraries (WIL), which I like and use myself where appropriate, and which I think is a good, if underrated, piece of software. No wonder it is used internally in the DXGI implementation.

DirectShow VCam source code

Introducing another popular DirectShow project: Vivek’s source filter which emulates a video capture device. For a long time the code was hosted on P “The March Hare” W’s website, which was eventually taken down.

I placed the recovered project, binaries, README and an upgrade to Visual Studio 2019 on GitHub: roman380/tmhare.mvps.org-vcam: Original Vivek’s popular sample project implementing virtual DirectShow video source filter (github.com)

The project itself is described in the repository, so I will not duplicate the text here.

See also: How to build and run Vivek’s Virtual Camera on Windows 10? on StackOverflow.

Windows SDK DirectShow Samples adapted for Visual Studio 2019

Over 20+ years there has been a steady flow of questions along the lines of "how do I build these projects". Back in the day the problem was more about having exactly matching settings in the application/library projects and the mandatory dependent static library. At some point Microsoft abandoned the samples, then removed them from the SDK completely. Luckily, at some point the samples were returned to the public as "Win7Samples" under the "Windows Classic Samples" repository published on GitHub.

The DirectShow samples there, however, exist in the state in which they were dropped years ago: still functioning and in good standing, but not prepared for building out of the box. So the flow of "how to build" questions is still here.

I made a fork of the repository (branch "directshow" on a fork of Microsoft's repository; "Samples/Win7Samples/multimedia/directshow" from the root of the repository) and upgraded a few of the most popular projects (including AmCap, PushSource, EzRGB24 and the beginner's DShowPlayer application):

Windows-classic-samples/Samples/Win7Samples/multimedia/directshow at directshow · roman380/Windows-classic-samples (github.com)

The code requires Microsoft Visual Studio 2019 (the Community edition is okay) and a current Windows 10 SDK.

To start, clone the fork and locate the README in the directshow folder, then open the solution and build the code in a Debug or Release configuration, for the Win32 or x64 platform.
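
If it helps, getting the code could look like this (the URL is inferred from the repository and branch names above):

git clone --branch directshow https://github.com/roman380/Windows-classic-samples.git
cd Windows-classic-samples\Samples\Win7Samples\multimedia\directshow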