Microsoft Media Foundation webcam video capture in one screen of code

Being complicated for many things, Media Foundation is still quite simple for the basics. To capture video with Media Foundation the API offers Source Reader API which uses Media Foundation primitives to build a pipeline that manages origin of the data (not necessarily a live source as in this example, but can also be a file or remote resource) and offers on-request reading of the data by the application, without consumption by Media Foundation managed primitives (in this aspect Source Reader is opposed to Media Session API).

The simplest use of Source Reader to read frames from a web camera is simple enough to fit a few tens of lines of C++ code. Sample VideoCaptureSynchronous project captures video frames in the form of IMFSample samples in less then 100 lines of code.

Friendly Name: Logitech Webcam C930e
nStreamIndex 0, nStreamFlags 0x100, nTime 1215.074, pSample 0x0000000000000000
nStreamIndex 0, nStreamFlags 0x0, nTime 0.068, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.196, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.324, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.436, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.564, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.676, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.804, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.916, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.044, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.156, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.284, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.396, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.524, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.636, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.764, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.956, pSample 0x000002CAF3805D10
...

VideoCaptureSynchronous project does not show what to do with the samples or how to request specific format of the samples, it just shows the ease of video capture per se. The capture takes place synchronously with blocking calls requesting and obtaining the samples.

Media Foundation API is asynchronous by its design and Source Reader API in synchronous mode hides the complexity. The blocking call issues a request for a frame internally, waits until the frame arrives and makes it available.

Source Reader API does offer asynchronous model too, making it available again in as simple as possible way.

VideoCaptureAsynchronous project is doing the same video capture but asynchronously: the controlling thread just starts capture, and frames are delivered via a callback once they are available, on a worker thread.

So when does one use synchronous and asynchronous?

Even though synchronous model results in cleaner and more reliable code with less chances for a mistake, and the gains over asynchronous model in most cases can be neglected esp. to those who are interested in beginner material like this post, video capture is real time process where one doesn’t want to block controlling thread and instead receive the frames out of nowhere as soon as they are ready. Hence, the asynchronous version. Asynchronous still can be simple: VideoCaptureAsynchronous is already more than 100 lines of code, but 120 lines might be also okay.

Download links

  • Source code:
    • VideoCaptureSynchronous: SVN, Trac
    • VideoCaptureAsynchronous: SVN, Trac
  • License: This software is free to use

One Reply to “Microsoft Media Foundation webcam video capture in one screen of code”

Leave a Reply