Microsoft “FaceTracker” Face Detection in the form of a Telegram bot

Microsoft Windows operating systems come with a built-in API for face tracking in the Windows.Media.FaceAnalysis namespace. It has been available since Windows 10 “Threshold 1” version 1507, yet it is probably not the most popular API. There are a few reasons for that, but perhaps the most important one is that, like some other recent additions, the API targets UWP development.

Nonetheless, Microsoft published a couple of samples (see Basic face tracking sample), and the feature is perhaps best known for its use in the Windows Store Camera application: in Photo mode the application detects faces, and the documentation suggests that it may cooperate with the hardware to auto-focus on a detected face. Overall, the face detection API has a limited feature set and limited uses, even though it fits nicely not only still image face detection but also video and “tracking”, where faces are followed through a sequence of video frames.

UWP video APIs rarely mention their implementation internals, including the fact that internally this class of functionality is implemented on top of the Media Foundation API. This makes good sense overall, since Media Foundation is the current media API in Windows, but the functionality is not advertised as available directly through Media Foundation: no documentation, no sample code: “this area is restricted to personnel, stay well clear of”.

So I made a quick journey into this restricted territory and plugged FaceTracker into a classic desktop Media Foundation pipeline, which in turn is plugged into a Telegram Bot template (which, in its turn, simply runs as a Windows Service on some Internet-connected box: this way Telegram can be turned into a cheap cloud service).

The demo highlights detected faces in a user-provided clip, like this:

That is, in a conversation with the bot you supply your own video clip, and the bot runs it through the face detector, overlays outlines around the faces it finds, then sends back the re-encoded video. Since Media Foundation is involved, and given that the face detector interfaces well with the media pipeline, the processing takes place on the GPU to the extent possible: DXVA, Direct3D 11, hardware encoder ASIC.
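To make the overlay step more concrete, here is a minimal CPU-side sketch of drawing a face outline into a single frame. It is not the bot's actual code: the bot does this work on the GPU, and the `Frame` and `FaceBox` types, the `DrawFaceOutline` function and the 32-bit BGRA pixel layout are all assumptions made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only; not part of any Windows API.
// FaceBox is a bounding box in pixels, as a face detector might report it.
struct FaceBox {
    uint32_t x, y, width, height;
};

// A frame held in system memory, one 32-bit BGRA value per pixel, row-major.
struct Frame {
    uint32_t width, height;
    std::vector<uint32_t> pixels;
    Frame(uint32_t w, uint32_t h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0) {}
};

// Draws a rectangle outline of the given thickness along the inner edge of
// the box, clipping the box to the frame boundaries.
void DrawFaceOutline(Frame& frame, const FaceBox& box, uint32_t color,
                     uint32_t thickness = 2) {
    assert(frame.pixels.size() ==
           static_cast<std::size_t>(frame.width) * frame.height);

    // Exclusive right/bottom edges of the unclipped box (64-bit to avoid overflow).
    const uint64_t boxRight = static_cast<uint64_t>(box.x) + box.width;
    const uint64_t boxBottom = static_cast<uint64_t>(box.y) + box.height;

    // Visible part of the box after clipping to the frame.
    const uint32_t left = std::min(box.x, frame.width);
    const uint32_t top = std::min(box.y, frame.height);
    const uint32_t right =
        static_cast<uint32_t>(std::min<uint64_t>(boxRight, frame.width));
    const uint32_t bottom =
        static_cast<uint32_t>(std::min<uint64_t>(boxBottom, frame.height));

    for (uint32_t y = top; y < bottom; ++y) {
        for (uint32_t x = left; x < right; ++x) {
            // A pixel belongs to the outline if it lies within `thickness`
            // pixels of any edge of the unclipped box.
            const bool onBorder = (x - box.x < thickness) ||
                                  (y - box.y < thickness) ||
                                  (boxRight - 1 - x < thickness) ||
                                  (boxBottom - 1 - y < thickness);
            if (onBorder) {
                frame.pixels[static_cast<std::size_t>(y) * frame.width + x] = color;
            }
        }
    }
}
```

A caller would invoke `DrawFaceOutline` once per detected face for every decoded frame before passing the frame on to the encoder. In a GPU pipeline the same idea would be expressed as a draw call on the Direct3D 11 texture rather than a loop over pixels in system memory.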

All in all, meet @FaceDetectionDemoBot:

I guess next time the bot will also extract face detection information from clips recorded by iPhones. If I get it right, the recent fruity products do face detection automatically and embed the information right into the clip. This helps cloud storage processing, since the edge devices have already invested their horsepower in the resource-consuming number crunching.

In closing,
