Modern asynchronous C++

The Windows API has offered asynchronously implemented functionality for file, network and other I/O for a long time. It was also perhaps one of the easiest ways to turn a simple task into something messy, ridiculously bloated and sensitive to errors of all sorts.

If you’re sane and you don’t need to squeeze every last bit of performance out of something, you would simply avoid overlapped I/O and prefer the blocking versions of the API. One specific advantage synchronous APIs and blocking calls offer is linearity of code: you see clearly what happens next, and you don’t need to go back and forth between multiple functions, completion routines and structures that carry context in transit. At the cost of threads, memory and blocking, you obtain an easier and more reliable way to write code.

At some point C#, as a much more flexibly developed language, made a move to approach asynchronous programming in a new way: Asynchronous programming in C# | Microsoft Docs. In C++ you remained where you were, able to do all the same things only at the cost of code mess. There have been a few more attempts to make concurrency easier in C++, and eventually coroutines are making their way into modern C++.

So for a specific task I needed to quickly write some code to grab multiple images and have a Telegram bot throw them over into a channel as a notification measure. Not a superhuman job, but still a good small example of how to do things in parallel and keep the C++ code for it compact.

MSVC C++17 with /await enables the use of C++ coroutines, and the C++/WinRT language projection supplies us with a suitable asynchronous API. The code snippet below starts multiple simultaneous tasks, each locating a file, reading it into memory, starting an HTTP POST request and posting the image to a remote web server. The controlling code then synchronizes on completion of all the tasks, letting them run and complete independently.

struct CompletionContext
{
	CompletionContext(size_t Counter) :
		Counter(static_cast<uint32_t>(Counter))
	{
	}
	void Decrement()
	{
		if(--Counter == 0)
			SetEvent(Event.get());
	}

	std::atomic_uint32_t Counter;
	winrt::handle Event { CreateEvent(nullptr, TRUE, FALSE, nullptr) };
};

winrt::Windows::Foundation::IAsyncOperation<bool> Process(DateTime Time, Configuration::Channel& Channel, CompletionContext& Context)
{
	auto Decrement = wil::scope_exit([&]() { Context.Decrement(); });
	auto const TimeEx = system_clock::from_time_t(winrt::clock::to_time_t(Time));
	winrt::Windows::Storage::Streams::IBuffer Buffer;
	WCHAR Path[MAX_PATH];
	PathCombineW(Path, Channel.RecordDirectory.c_str(), Channel.Name.c_str());
	PathCombineW(Path, Path, L"thumbnail.jpg");
	using namespace winrt::Windows::Storage;
	auto const File = co_await StorageFile::GetFileFromPathAsync(Path);
	auto const InputStream = co_await File.OpenAsync(FileAccessMode::Read, StorageOpenOptions::AllowOnlyReaders);
	Buffer = co_await TelegramHelper::ToBuffer(InputStream);
	std::wostringstream Stream;
	Stream << Format(L"ℹ️ Notification") << std::endl;
	Stream << std::endl;
	Stream << L"Directory: " << Channel.RecordDirectory << std::endl;
	Stream << L"Channel: " << Channel.Name << L" (" << Channel.FriendlyName << L")" << std::endl;
	co_await TelegramHelper::SendPhoto(TelegramHelper::BinaryDocument(Buffer, L"thumbnail.jpg"), Stream.str());
	co_return true;
}

winrt::Windows::Foundation::IAsyncAction Completion(CompletionContext& Context)
{
	co_await winrt::resume_on_signal(Context.Event.get());
}

CompletionContext Context(m_Configuration.ChannelVector.size());
for(auto&& Channel : m_Configuration.ChannelVector)
	Process(Time, Channel, Context);
co_await Completion(Context);

(I think I am just not aware of an existing suitable pattern to synchronize with multiple completions, so I made do with a manual event, waiting on it via the existing C++/WinRT helper I knew of.)

So how is this better than what we had before?

First – and perhaps the most important – the code remains compact and linear. From this amount of C++ code you would not even guess that it runs highly parallelized. The only blocking is on the last line of the snippet, where we finally wait for all the tasks to complete. The task code itself stays perfectly readable, with no excessive boilerplate to wade through while trying to figure out what is going on.

Second, the code is concurrent and parallel without any need to manage threads and the like. You don’t need to think about how many threads you want or how many CPU cores the system has. The code is just parallel enough and is mapped onto available system resources in a sensible way; you simply focus on what is important. The scalability aspect will become clearer in the following paragraphs.

Third, note the number of co_await operators: they appear all over the code around asynchronous operations. The way it works is this: the C++ compiler slices a function whose return type is winrt::Windows::Foundation::IAsync* (for details I refer you to coroutine theory; let’s focus on the C++/WinRT part here) into multiple pieces separated by the co_await operators. This is done transparently: you see the function as a solid piece of code, while effectively it is broken into separate pieces joined by an execution context that carries the arguments, local variables and return value. At each such operator the function can be suspended for as long as necessary (for example, to complete I/O) and then resumed on the same or another thread. As a C++ developer you no longer have to think about these details; the C++20 compiler is here to help you catch up in efficiency with the C# guys.

Even though this might not be exactly accurate technically, I think it is helpful to imagine that the C++ compiler compiles multiple “subfunctions”, breaking the original function at co_await boundaries. Now imagine that it is possible to quickly transfer such a “subfunction” to another CPU core, or to put it aside while executing a “more important” “subfunction” from another execution context. The application becomes a deck of small tasks executed in a highly parallel manner, while at the same time the order of tasks within one execution context is safely preserved. And you still have all the nice things you are used to from earlier C++.

The IAsync*/co_await implementation supplies you with a thread pool onto which your tasks and their function pieces are placed for concurrent execution on the available CPU cores: lots of small subtasks, evenly distributed across cores, running on a reasonable number of unblocked threads, with waiting for I/O completion managed for you.

All in all, you can now have compact, well-readable and manageable concurrent code that is scalable and efficient in its resource consumption, with far less need to do the threading and waiting on your own.
