JPEG Multi File Video Capture Source Filter, a virtual DirectShow camera

Given that there is already a JPEG Multi File Source Filter that can act as a video source streaming video from local JPEG files, it looked useful to build a virtual camera on top of it. The main difference is this: the existing filter is generic and customizable, it needs to be given a directory with the files, and other settings may also apply. A virtual camera is a filter that has to work out of the box: a video-enabled application, such as AMCap Sample, Media Player Classic, VideoLAN, Skype or Windows Media Encoder, enumerates video capture sources and instantiates the one of interest, which should already be ready to stream.

Implementation Details

The very first question is how to embed an existing filter into a new filter. The two most common methods are:

  • COM aggregation
  • embedding a fully featured graph with a sink/renderer that intercepts media samples downstream and forwards to a higher level filter so that it streams them as a source filter

The COM aggregation method is much easier to implement, but it is subject to a few constraints, the two most important of which are:

  • the embedded filter has to support instantiation as an aggregated object
  • a single underlying filter, not a chain of filters, has to be able to produce the required data

COM aggregation fits the purpose well, so the second embedding method is left for another topic (with some luck to appear very soon: a DirectShow video capture source filter for a real network/IP camera).

The next step is to check whether the implementation of the underlying filter is sufficient. Obviously, a video source that pretends to be a live video capture source needs an endless stream of media samples, while the original implementation streams the JPEG files as media samples only once. We need an option to loop the streaming and automatically repeat the sequence.

Playback looping is added to the original JPEG Multi File Source Filter, and its private controlling interface IJpegMultiFileSourceFilter received additional properties:

interface IJpegMultiFileSourceFilter : IDispatch
{
...
    [propget, id(2)] HRESULT AutoRepeat([out, retval] VARIANT_BOOL* pbAutoRepeat);
    [propput, id(2)] HRESULT AutoRepeat([in] VARIANT_BOOL bAutoRepeat);
    [propget, id(3)] HRESULT RepeatDelay([out, retval] LONG* pnRepeatDelay);
    [propput, id(3)] HRESULT RepeatDelay([in] LONG nRepeatDelay);
...
};

as well as a new property page.
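The effect of AutoRepeat and RepeatDelay on the sequence of samples can be sketched in portable C++. This is only an illustration of the looping rule described above, not the filter’s actual code: PlaybackStep and GetPlaybackStep are hypothetical names, and the assumption is that the repeat delay applies only before the first frame of each repeated pass.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of one streaming step: which file is sent next
// and how long the filter pauses before sending it.
struct PlaybackStep
{
    std::size_t fileIndex;   // index into the sorted list of JPEG files
    long delayMilliseconds;  // pause before this sample (repeat delay only)
};

// Fills `step` for the sample with the given zero-based index and returns true,
// or returns false when the stream ends (no files, or one pass without AutoRepeat).
bool GetPlaybackStep(std::size_t sampleIndex, std::size_t fileCount,
                     bool autoRepeat, long repeatDelay, PlaybackStep& step)
{
    assert(repeatDelay >= 0);
    if (fileCount == 0)
        return false;
    if (sampleIndex >= fileCount && !autoRepeat)
        return false; // original behaviour: every file is streamed once only
    step.fileIndex = sampleIndex % fileCount;
    // Assumption: the pause is inserted between the last and the first frame only
    const bool restarting = sampleIndex >= fileCount && step.fileIndex == 0;
    step.delayMilliseconds = restarting ? repeatDelay : 0;
    return true;
}
```

With three files and AutoRepeat enabled, the sample with index 3 maps back to file 0 and carries the repeat delay, while without AutoRepeat the same call reports the end of the stream.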

As soon as the underlying implementation is ready, it is time to get down to the filter COM class itself. It will be required to implement certain interfaces, to expose certain interfaces of the aggregated object and possibly to hide certain interfaces of the aggregated object. However, the first thing to do is to instantiate the aggregated object, for which the right place is ATL’s FinalConstruct:

class ATL_NO_VTABLE CFilter :
...
{
...
DECLARE_PROTECT_FINAL_CONSTRUCT()

DECLARE_GET_CONTROLLING_UNKNOWN()
...
    CComPtr<IUnknown> m_pInnerUnknown;
    CComPtr<IBaseFilter> m_pInnerBaseFilter;
    CComPtr<IPersistStream> m_pInnerPersistStream;
...
    HRESULT FinalConstruct()
    {
...
        __C(m_pInnerUnknown.CoCreateInstance(__uuidof(JpegMultiFileSourceFilter), GetControllingUnknown()));
        CComQIPtr<IJpegMultiFileSourceFilter> pJpegMultiFileSourceFilter = m_pInnerUnknown;
        __D(pJpegMultiFileSourceFilter, E_NOINTERFACE);
...
        __C(pJpegMultiFileSourceFilter->put_Directory(CComBSTR(sDirectory)));
        __C(pJpegMultiFileSourceFilter->put_AutoRepeat(ATL_VARIANT_TRUE));
        __C(pJpegMultiFileSourceFilter->put_RepeatDelay(nRepeatDelay));
...
        m_pInnerBaseFilter = CComQIPtr<IBaseFilter>(m_pInnerUnknown);
        m_pInnerPersistStream = CComQIPtr<IPersistStream>(m_pInnerUnknown);
        __D(m_pInnerBaseFilter, E_NOINTERFACE);

The aggregated (inner) object is created and its “main” IUnknown pointer is saved as the lifetime reference (for a COM aggregated object this is the only interface pointer whose AddRef/Release affect the internal reference counter – see COM Aggregation details). The object is pre-initialized immediately after creation, and the effective values are taken from the registry key “SOFTWARE\Alax.Info\Media Tools\JPEG MultiFile Video Capture Filter” (see below). It also makes sense to pre-query the object’s most used interfaces to be able to use them directly without querying for them each time they are required.
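The reference counting rule for an aggregated object can be modeled without any COM headers. The sketch below is a simplified, hypothetical model (RefCounted, Counter, DelegatingInterface and InnerObject are illustrative names, and QueryInterface is left out entirely); it only shows why the “main” unknown is the one pointer that keeps the inner object alive.

```cpp
#include <cassert>

// Stand-in for the reference-counting half of IUnknown (QueryInterface omitted).
struct RefCounted
{
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
    virtual ~RefCounted() {}
};

// Plain counter; a real COM object would destroy itself on reaching zero.
class Counter : public RefCounted
{
public:
    unsigned long AddRef() override { return ++m_count; }
    unsigned long Release() override { assert(m_count > 0); return --m_count; }
    unsigned long Count() const { return m_count; }
private:
    unsigned long m_count = 0;
};

// Any interface the inner object hands out other than its "main" unknown:
// AddRef/Release are forwarded to the controlling (outer) unknown.
class DelegatingInterface : public RefCounted
{
public:
    explicit DelegatingInterface(RefCounted& controllingUnknown) :
        m_controllingUnknown(controllingUnknown)
    {
    }
    unsigned long AddRef() override { return m_controllingUnknown.AddRef(); }
    unsigned long Release() override { return m_controllingUnknown.Release(); }
private:
    RefCounted& m_controllingUnknown;
};

// Hypothetical aggregated object with one client-visible interface.
struct InnerObject
{
    explicit InnerObject(RefCounted& controllingUnknown) :
        baseFilter(controllingUnknown)
    {
    }
    Counter mainUnknown;            // non-delegating: tracks the inner lifetime
    DelegatingInterface baseFilter; // models e.g. IBaseFilter seen by clients
};
```

In this model, calling AddRef on baseFilter changes only the outer counter, while calling it on mainUnknown changes only the inner one, which is the behaviour the paragraph above relies on when it keeps the inner IUnknown as the lifetime reference.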

Once the inner object is in place, we need to sort out the list of interfaces to support. Video capture source filters have specific requirements (see Writing Capture Filters), including indication of the pin category and implementation of the IKsPropertySet interface. A preview pin is optional, so we can omit it from the implementation and let the Capture Graph Builder use a Smart Tee Pin Filter to split the stream into preview and capture.

Additionally, various software expects the IAMStreamConfig interface on the output pin and simply fails to use the video capture device in its absence. The underlying filter does not implement this interface, so we have to implement it on the wrapper. Since we need additional interfaces on the pin, and the pin is an object owned by the inner aggregated object, this adds the complexity of wrapping the original pin with a private pin class. An easy implementation of the latter is not quite safe with COM aggregation; however, well-written software that treats IPin as the pin’s “primary” interface, queries additional interfaces from it and holds the reference while using the other interfaces will do just fine with a simple implementation on our side.

Another problem with the underlying filter actually came up later but is worth mentioning from the start. The JPEG Multi File Source Filter streams video media samples with an empty format structure attached to the AM_MEDIA_TYPE media type. This assumes that an instance of JPEG Frame Decoder Filter will be added downstream and supply the missing format by decoding JPEG data. While the filter is flexible enough to dynamically change the media type along with JPEG data changes, this is not compatible with (not implemented by) most software. JPEG Frame Decoder Filter makes a first guess by setting the resolution to 320×240. We need to set the right resolution immediately to avoid media type re-agreement.

To be able to do so we need to pass resolution information from the original (inner) filter to the JPEG Frame Decoder Filter instance. It is also important that the filters might have other filters connected in between, and in most cases there will be at least a Smart Tee Pin Filter. To let the filters communicate with each other in a reliable and documented way, available also to other/future filters, JPEG Frame Decoder Filter introduces a new interface IJpegFrameDecoderFilterSource, which it looks for among the upstream filters when its input pin is connected. Iteratively walking through upstream connections up to the developed video capture source filter, it reaches our implementation and is able to change the default resolution to the effective resolution and expose the correct media type on its output pin right from the start.
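The upstream lookup can be sketched as a walk over a chain of filters. The model below is portable C++ and deliberately leaves DirectShow out: Filter, FindUpstreamExtent and GetInitialExtent are hypothetical names, and the providesExtent flag stands in for a successful query for IJpegFrameDecoderFilterSource. Only the 320×240 fallback comes from the text above.

```cpp
#include <cassert>

struct Extent
{
    long width;
    long height;
};

// Hypothetical stand-in for a filter in the graph.
struct Filter
{
    const Filter* upstream; // filter connected to the input pin, or nullptr
    bool providesExtent;    // models support for IJpegFrameDecoderFilterSource
    Extent extent;          // value the filter would report when it does
};

// Walks upstream starting at `filter`; on the first filter that provides an
// extent, copies it to `extent` and returns true. Leaves `extent` untouched otherwise.
bool FindUpstreamExtent(const Filter* filter, Extent& extent)
{
    for (; filter; filter = filter->upstream)
    {
        if (filter->providesExtent)
        {
            extent = filter->extent;
            return true;
        }
    }
    return false;
}

// Resolution the decoder would advertise at connection time: the upstream
// value when one is found, the 320x240 first guess otherwise.
Extent GetInitialExtent(const Filter* upstream)
{
    Extent extent = { 320, 240 };
    FindUpstreamExtent(upstream, extent);
    return extent;
}
```

A chain of a capture source reporting 640×480 followed by a tee that reports nothing resolves to 640×480, while a chain without any provider falls back to the default guess.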

So the interfaces on the filter:

  • implemented on the filter:
    • IBaseFilter with IMediaFilter and IPersist to provide our own pin enumeration and supply a wrapper class for a pin object
    • IPersistStreamInit and IPersistStream to be friendly with persistence-enabled applications
    • IAMovieSetup to be able to be registered with DirectShow the way it is normally done
    • IFilter, the filter’s private interface, with its IDispatch not exposed directly so that the underlying filter can expose its own
    • IJpegFrameDecoderFilterSource to communicate to downstream JPEG Frame Decoder Filter and provide resolution information
  • hidden implementation on the inner filter (other interfaces implemented by inner filter and not mentioned will be also available “outside”):
  • implemented on the pin:
    • IPin to intercept method calls and possibly override media type enumeration and pin connection conditions; while overriding its methods appeared to be unnecessary, we still have to expose our private implementation of this interface, since software will in most cases query additional interfaces from this interface pointer
    • IKsPropertySet per MSDN requirement
    • IAMStreamConfig to be compatible with a wider range of software, specifically AMCap

Implementation details are available in the reference implementation Filter.h in SVN.

Updated versions of the dependent binaries Acquisition.dll and CodingI.dll are available from SVN.

A few points to mention explicitly:

  • IKsPropertySet::Get marks the pin as a capture pin (PIN_CATEGORY_CAPTURE)
  • the minimal IAMStreamConfig implementation ignores the media type in SetFormat; returns the only available media type in GetFormat, using a 300 fps frame rate value so that it is neither zero nor very low, to avoid collision of frames; and returns one video capability in GetNumberOfCapabilities and GetStreamCaps – this should be sufficient given that we don’t actually accept any changes/configuration on this interface
  • IBaseFilter::EnumPins exposes a private wrapper pin class over original pin, so that we are capable of providing additional interfaces off the exposed output pin; we should have also provided an equal replacement through IBaseFilter::FindPin but this method is actually very rarely used and appears to be safe to omit, probably unless a compatibility issue arises later
  • IJpegFrameDecoderFilterSource::GetDefaultExtent is the way JPEG Frame Decoder Filter obtains original video resolution.
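To put the 300 fps value into numbers: DirectShow expresses frame duration in 100-nanosecond reference time units, which is how the AvgTimePerFrame field of VIDEOINFOHEADER is defined. The arithmetic below is a small sketch; FrameRateToAvgTimePerFrame is a hypothetical helper, and whether the filter fills this exact field is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// DirectShow reference time counts 100-nanosecond units, 10,000,000 per second.
const std::int64_t kReferenceTimeUnitsPerSecond = 10000000;

// Converts an integer frame rate into an average frame duration in reference
// time units, truncating any fractional remainder.
std::int64_t FrameRateToAvgTimePerFrame(std::int64_t framesPerSecond)
{
    assert(framesPerSecond > 0);
    return kReferenceTimeUnitsPerSecond / framesPerSecond;
}
```

For 300 fps this gives 33333 units, that is roughly 3.3 ms per frame, compared with 400000 units (40 ms) for a 25 fps source.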

On initialization the filter will query the registry key “SOFTWARE\Alax.Info\Media Tools\JPEG MultiFile Video Capture Filter” for initialization values. Both HKEY_LOCAL_MACHINE and HKEY_CURRENT_USER may be used; HKEY_CURRENT_USER has priority if present, unless HKEY_LOCAL_MACHINE has an additional non-zero REG_DWORD value named “Force”, in which case it takes priority over HKEY_CURRENT_USER. Initialization values are:

  • “Directory” REG_SZ is the directory containing JPEG files to stream from
  • “Repeat Delay” REG_DWORD is the pause in milliseconds between the last and the first frame when playback loops
  • “Video Width” and “Video Height” REG_DWORD are the original video resolution and should match that of the JPEG files
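The priority rule between the two hives can be sketched with an in-memory model instead of real registry calls. KeyValues, SelectKey and GetDword are hypothetical names, only REG_DWORD values are modeled, and the sketch assumes that one hive is chosen as a whole rather than value by value, which the description above does not state explicitly.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory copy of the filter's key under one registry hive.
struct KeyValues
{
    bool present = false;                        // whether the key exists
    std::map<std::string, unsigned long> dwords; // REG_DWORD values by name
};

// Picks the key that supplies initialization values: HKEY_CURRENT_USER wins
// when present, unless HKEY_LOCAL_MACHINE carries a non-zero "Force" value.
const KeyValues* SelectKey(const KeyValues& localMachine, const KeyValues& currentUser)
{
    if (localMachine.present)
    {
        const auto force = localMachine.dwords.find("Force");
        if (force != localMachine.dwords.end() && force->second != 0)
            return &localMachine;
    }
    if (currentUser.present)
        return &currentUser;
    return localMachine.present ? &localMachine : nullptr;
}

// Reads a REG_DWORD value from the selected key, falling back to a default
// when no key was selected or the value is missing.
unsigned long GetDword(const KeyValues* key, const std::string& name, unsigned long defaultValue)
{
    if (!key)
        return defaultValue;
    const auto value = key->dwords.find(name);
    return value != key->dwords.end() ? value->second : defaultValue;
}
```

With “Force” set to 1 under HKEY_LOCAL_MACHINE, a “Repeat Delay” stored there overrides the one under HKEY_CURRENT_USER; resetting “Force” to 0 hands priority back to the per-user value.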

Another necessary implementation touch is related to time stamps. The original inner JPEG Multi File Source Filter did not control the rate at which media samples are sent downstream. It was assumed that the renderer filter would enforce media sample presentation times and add the necessary delay while playing video frames back. However, this approach does not work in a capture filter: when previewing video, the Smart Tee Pin Filter, which is inserted to make preview samples out of capture samples, strips time stamps and the video renderer renders media samples at the full possible rate. To address this issue, the JPEG Multi File Source Filter received a new property Rate, which defaults to zero, meaning uncontrolled sending of data downstream. With a Rate property value of 1.0, the filter does rate control and adds the necessary delay as if the media samples were captured in real time.
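The pacing decision can be sketched as a small function over 100-nanosecond time values. GetDeliveryDelay is a hypothetical name, and treating Rate values other than 0 and 1.0 as a speed multiplier is an assumption; the text above only defines the meaning of those two values.

```cpp
#include <cassert>
#include <cstdint>

// Returns how long to wait (in 100 ns units) before sending a sample downstream.
//   sampleStartTime: time stamp of the sample relative to the start of streaming
//   elapsedTime:     real time passed since streaming started
//   rate:            0 disables pacing, 1.0 paces samples in real time
std::int64_t GetDeliveryDelay(std::int64_t sampleStartTime, std::int64_t elapsedTime, double rate)
{
    assert(sampleStartTime >= 0 && elapsedTime >= 0);
    if (rate <= 0)
        return 0; // uncontrolled: send as fast as downstream accepts data
    // Assumption: other positive rates scale the schedule proportionally
    const std::int64_t dueTime = static_cast<std::int64_t>(sampleStartTime / rate);
    return dueTime > elapsedTime ? dueTime - elapsedTime : 0;
}
```

A sample stamped at 40 ms that is ready after 10 ms of real time waits another 30 ms at Rate 1.0 and is sent immediately at Rate 0; a sample that is already late is never delayed further.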

Compatibility

Graph Edit and Graph Studio list, insert and are otherwise compatible with the new filter. They make no assumptions about the filter implementation beyond the very basic ones, so they are quite capable of manipulating the new filter.

AMCap Sample recognizes the capture source and renders the filter through Capture Graph Builder, which automatically inserts a Smart Tee Pin Filter. The application properly shows the video from the new filter.

Media Player Classic’s “Open Device…” menu command is capable of connecting to the new filter and the application is showing video.

VideoLAN [unexpectedly] failed to receive video from the new filter. It appears that the application enumerates supported media types and is only capable of accepting those it handles directly. JPEG video with FOURCC code ‘AIJ0’ is not on that list, and the application makes no attempt, as a proper application would, to render the filter through intermediate codecs using Intelligent Connect. The application won’t be able to receive video from the filter until the latter implements decoding into a well-known pixel format such as RGB or YUY2.

Skype beta 4.0.0.176 failed to receive video from the filter, though the filter is seen by the software as a video source. The logs don’t indicate the reason and it seems that the application does not like video capabilities and stops trying to get video from the device.

Summary

  • Virtual DirectShow camera implemented
  • Supported by Graph Edit, Monogram Graph Studio (one has to re-enter file directory from GUI when adding a filter), AMCap, Media Player Classic
  • Does not integrate into VideoLAN because of its [unexpectedly] limited capabilities
  • Does not integrate into Skype for whatever reason, probably just another of a number of bugs in a beta version

Partial reference Visual C++ .NET 2008 source code is available from SVN, release binary included.

11 Replies to “JPEG Multi File Video Capture Source Filter, a virtual DirectShow camera”

  1. The error code 0x80040110 is CLASS_E_NOAGGREGATION “Class does not support aggregation (or class object is remote)” which means that it is unable to create an instance because the class does not support COM aggregation.

    When you create the instance by adding a filter to GraphEdit, no aggregation is used in the first place. However this filter, as mentioned above, embeds another filter (JPEG Multi File Source Filter) through aggregation and the problem is likely to be that it cannot be created.

    That is, you need the following installed and COM-registered (regsvr32):

    • JpegMultiFileVideoCaptureSource.dll – this DLL hosting JPEG Multi File Video Capture Source Filter
    • Acquisition.dll – DLL from https://alax.info/blog/741, which hosts JPEG Multi File Source Filter
    • Other dependency files I see:
      • MSVCR90.DLL and ATL90.DLL – C++ and ATL runtime libraries
      • DBGHELP.DLL – MS debugging tools

      I rebuilt the DLL and updated it in SVN so that the latest build does not reference any of these.

    • If you would like to have the JPEG frames decoded, not only receive raw frames, you will also need CodingI.dll and it needs Intel IPP runtime files installed, see details here

    If the problem persists, I would appreciate your letting me know more about the environment (OS, service packs etc).

  2. Sorry for my late feedback.

    After installing Acquisition.dll, it worked except I still can’t build a working graph.
    Then I’ve tried to install CodingI.dll, as well as IPP libraries. It’s impossible to regsvr CodingI.dll until I put it under \Intel\IPP\6.0.1.070\ia32\bin, since I received “can’t load dlls in waterfall procedure”. Looks like there’s something wrong with the IPP environment.

    Then I manually added HTTP Stream Source filter to GraphEdit and had GraphEdit automatically render the rest of graph, still failed due to same error and error message, and finally GE crashed.

    I’m running WinXP SP2.

    Thanks.

  3. The “waterfall” problem is that Windows still cannot find the IPP DLLs. You need to put the Intel IPP directories on the system search path (environment PATH variable), or you can also try putting GraphEdt.exe into the Intel\IPP\6.0.*\ia32\bin directory. Another option is to copy the IPP DLLs into the Windows\system32 directory.

  4. yeah, you’re right. Simply putting GraphEdit into IPP dir got things working.
    I thought IPP should set its env path by itself. However it was only my own thought, sorry.

    Another problem happened now, I built a JPEG Multi File Source graph and specified a dir containing some jpgs, then ran it, then got “the graph could not change its state” with return code “0x80004005”.

    Do I have to provide some jpgs as same resolution as what gets set in JPEG Frame Decoder filter property page?

  5. Do I have to provide some jpgs as same resolution as what gets set in JPEG Frame Decoder filter property page?

    Yes you do, because resolution is agreed at connection time, and because real JPEGs are not yet available at the moment, filters use values from property page as a first guess.

    If the real JPEGs are of a different resolution, the filters try to re-agree media types. But Video Renderer is not capable of doing so, and you receive this error.

    By the way, you can check RenderHttpMjpegVideo01 Sample which has code to intercept resolution change and restart graph with updated media type. See how you can start this sample from command line.

  6. Do you know there is any RTSP live source filter?

    There is RTSP code from Live555.com, perhaps it may be useful for you.

    Also note Morgan RTP DirectShow Filters Beta from Morgan Multimedia:

    It is a set of DirectShow filters that allows you to perform media-streaming on your Windows PC :

    * Morgan RTP Source Filter (to receive media content over a network).
    * Morgan RTP Destination Filter (to send media content over a network).
