Reference Signal Source: audio as Media Foundation source

The video part of the reference signal source for DirectShow already received a Media Foundation source interface earlier.

This time, the update implements a separate Media Foundation source for audio. The MfGenerate2 sample code gives an idea of how to initialize the source:

using namespace AlaxInfoDirectShowReferenceSource;
CComPtr<IAudioMediaSource> pSource;
// NOTE: pSource needs to be instantiated first (CoCreateInstance of the audio source class, omitted here)
__C(pSource->SetMediaType(NULL, g_nSampleRate, g_nChannelCount, g_nBitDepth));
__C(pSource->put_Duration((DOUBLE) g_nDuration));
CComPtr<IMFMediaSource> pAudioMediaSource = pSource;

The source can be given a specific format using the media type handler of the Media Foundation stream descriptor, or set up via a private COM interface.

The source can be given a sampling rate, a channel count (all channels receive the same signal) and a bit depth for PCM audio formats (8, 16..32 bits), or it can use the 32-bit IEEE floating point format.
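These parameters are related in the usual way. The block alignment and the average byte rate follow from the other three, and the same values end up in a WAVEFORMATEX structure or in the MF_MT_AUDIO_BLOCK_ALIGNMENT and MF_MT_AUDIO_AVG_BYTES_PER_SECOND attributes of a Media Foundation media type. Below is a minimal portable sketch of that arithmetic. PcmFormat and MakePcmFormat are hypothetical names and are not part of the source's private interface. Storing bit depths that are not a multiple of 8 in the next whole byte is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the WAVEFORMATEX members relevant here; not an SDK type.
struct PcmFormat
{
    bool     isFloat;        // true for 32-bit IEEE float, false for integer PCM
    uint32_t samplesPerSec;  // e.g. 48000
    uint16_t channels;       // all channels carry the same signal in the reference source
    uint16_t bitsPerSample;  // 8, 16..32 for PCM; 32 for float
    uint16_t blockAlign;     // bytes per sample frame (all channels together)
    uint32_t avgBytesPerSec; // samplesPerSec * blockAlign
};

// Derives the dependent fields from rate, channel count and bit depth.
// Assumption: bit depths that are not a multiple of 8 occupy the next whole byte.
PcmFormat MakePcmFormat(uint32_t samplesPerSec, uint16_t channels, uint16_t bitsPerSample, bool isFloat)
{
    assert(samplesPerSec > 0 && channels > 0);
    assert(isFloat ? bitsPerSample == 32 : (bitsPerSample == 8 || (bitsPerSample >= 16 && bitsPerSample <= 32)));
    PcmFormat format {};
    format.isFloat = isFloat;
    format.samplesPerSec = samplesPerSec;
    format.channels = channels;
    format.bitsPerSample = bitsPerSample;
    format.blockAlign = static_cast<uint16_t>(channels * ((bitsPerSample + 7) / 8));
    format.avgBytesPerSec = samplesPerSec * format.blockAlign;
    return format;
}
```

For example, 48 kHz stereo 16-bit PCM gives a 4-byte block and 192,000 bytes per second.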

Video and audio streams can also be combined into an aggregate source (video+audio) to produce multi-track output. The MfGenerate2 sample shows this approach as well:

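__D(pVideoMediaSource || pAudioMediaSource, E_UNNAMED);
CComPtr<IMFMediaSource> pMediaSource;
if(pVideoMediaSource && pAudioMediaSource)
{
    // The aggregate source exposes the streams of all collected sources (video+audio)
    CComPtr<IMFCollection> pCollection;
    __C(MFCreateCollection(&pCollection));
    __C(pCollection->AddElement(pVideoMediaSource));
    __C(pCollection->AddElement(pAudioMediaSource));
    __C(MFCreateAggregateSource(pCollection, &pMediaSource));
} else
    pMediaSource = pVideoMediaSource ? pVideoMediaSource : pAudioMediaSource;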

The sample project is capable of producing output of this kind:

Download links

DirectShowSpy: REGDB_E_CLASSNOTREG with IMMDevice::Activate

A DirectShow developer complained about a sudden failure of the Core Audio IMMDevice::Activate call, which is supposed to instantiate a DirectShow filter for a given device.

The problem turned out to be related to an installed DirectShowSpy and its interference with the API calls. The symptom was as follows: when Activate was called for different types of objects, all calls succeeded except the interoperation with DirectShow (activation for IBaseFilter). For example, the EnumerateAudioDevices output:

    IAudioClient            0x00000000
    IAudioEndpointVolume    0x00000000
    IAudioMeterInformation  0x00000000
    IAudioSessionManager    0x00000000
    IAudioSessionManager2   0x00000000
    IBaseFilter             REGDB_E_CLASSNOTREG
    IDeviceTopology         0x00000000
    IMFTrustedOutput        0x00000000

When Core Audio is asked for a DirectShow activation, the API creates an instance of the System Device Enumerator and forwards the activation call to it. DirectShowSpy intercepts these calls. However, it did not support unknown COM interfaces, nor the undocumented IMMDeviceActivator interface, which the APIs use internally to forward the activation call.

So, the System Device Enumerator implements the documented ICreateDevEnum interface and also the undocumented internal IMMDeviceActivator. The entire call sequence is as follows:

// Top level code:

CComPtr<IMMDevice> pDevice = ...; // Audio endpoint interface
pDevice->Activate(..., __uuidof(IBaseFilter), ...);

// API (pseudocode):

    // ...
    if(requested is IBaseFilter)
    {
        // Internal interface of the System Device Enumerator (CLSID_SystemDeviceEnum) instance
        CComPtr<IMMDeviceActivator> pDeviceActivator;
        return pDeviceActivator->Activate(pDevice, ...);
    }

DirectShowSpy’s failure to provide IMMDeviceActivator caused the symptom in question, and this has been fixed in subsequent versions. The failure code is not very descriptive. The APIs did not expect an external hook, though, so a failure at this point is not really an anticipated behavior.
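The general shape of such a fix is delegation. The spy wraps the interfaces it knows, including IMMDeviceActivator, so that it can forward the call to the real object. Any other interface request is delegated to the real object instead of being rejected. Below is a portable sketch of that idea. It uses hypothetical types that stand in for COM: interface names instead of IIDs, results instead of pointers, and no reference counting. It is not DirectShowSpy's actual code, and the undocumented IMMDeviceActivator signature is not modeled.

```cpp
#include <cassert>
#include <string>

enum class Result { Ok, NoInterface, ClassNotRegistered };

// Hypothetical stand-in for the real System Device Enumerator.
struct RealEnumerator
{
    bool Supports(const std::string& iid) const
    {
        return iid == "ICreateDevEnum" || iid == "IMMDeviceActivator" || iid == "ISomethingElse";
    }
    Result Activate(const std::string& requested) const
    {
        return requested == "IBaseFilter" ? Result::Ok : Result::NoInterface;
    }
};

// Hypothetical spy around the real enumerator.
struct SpyEnumerator
{
    const RealEnumerator& real;
    bool fixed; // false models the old spy behavior

    Result QueryInterface(const std::string& iid) const
    {
        if(iid == "ICreateDevEnum")
            return Result::Ok; // always wrapped by the spy
        if(!fixed)
            return Result::NoInterface; // old behavior: everything else rejected
        if(iid == "IMMDeviceActivator")
            return Result::Ok; // wrapped, so Activate can be logged and forwarded
        return real.Supports(iid) ? Result::Ok : Result::NoInterface; // delegate the rest
    }
    Result Activate(const std::string& requested) const
    {
        // A real spy would log the call here before forwarding it.
        return real.Activate(requested);
    }
};

// Models the Core Audio side, which needs IMMDeviceActivator for an IBaseFilter activation.
Result ActivateBaseFilter(const SpyEnumerator& enumerator)
{
    if(enumerator.QueryInterface("IMMDeviceActivator") != Result::Ok)
        return Result::ClassNotRegistered; // models the REGDB_E_CLASSNOTREG seen in the report
    return enumerator.Activate("IBaseFilter");
}
```

A real COM implementation also has to respect COM identity rules. QueryInterface for IUnknown must return the same pointer whichever interface it is called on, so handing out the inner object's pointers directly needs care.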

The System Device Enumerator matches the known devices to the provided Core Audio device and creates an instance of the respective filter; this is how the APIs work together. DirectShowSpy prints these calls to its log:

roatlbase.h(1582): TraceModuleVersion: "D:\...\DirectShowSpy-Win32.dll" version is, running in "D:\...\EnumerateAudioDevices-Win32.exe" at 0x63210000
dllmain.h(36): CDirectShowSpyModule::CDirectShowSpyModule: this 0x633963A4
SystemDeviceEnumeratorSpy.h(669): CSystemDeviceEnumeratorSpyT<...>::CSystemDeviceEnumeratorSpyT: this 0x02F1DA68
SystemDeviceEnumeratorSpy.h(681): CSystemDeviceEnumeratorSpyT<...>::FinalConstruct: pszPath "D:\...\EnumerateAudioDevices-Win32.exe", this 0x02F1DA68, m_dwRef 1
SystemDeviceEnumeratorSpy.h(49): CSystemDeviceEnumeratorSpyT<...>::InternalQueryInterface: 0x02F1DA68, Interface {3B0D0EA4-D0A9-4B0E-935B-09516746FAC0}, Result 0x00000000
SystemDeviceEnumeratorSpy.h(49): CSystemDeviceEnumeratorSpyT<...>::InternalQueryInterface: 0x02F1DA68, Interface {3B0D0EA4-D0A9-4B0E-935B-09516746FAC0}, Result 0x00000000
SystemDeviceEnumeratorSpy.h(808): CSystemDeviceEnumeratorSpyT<...>::Activate: this 0x02F1DA68, InterfaceIdentifier {56A86895-0AD4-11CE-B03A-0020AF0BA770}, pMmDevice 0x0054E7F8
SystemDeviceEnumeratorSpy.h(815): CSystemDeviceEnumeratorSpyT<...>::Activate: nActivateResult 0x00000000 
SystemDeviceEnumeratorSpy.h(673): CSystemDeviceEnumeratorSpyT<...>::~CSystemDeviceEnumeratorSpyT: this 0x02F1DA68

Download links

Reference signal source for DirectShow

Every so often there are tasks that need reference video or video/audio footage with specific properties. These include resolution, frame rate, frame accuracy with content identifying the specific frame, and motion in view. They may also include an amount of motion that is “hard” to process for an encoder tuned for natural video, or specific video and audio synchronization.

There is of course some content available, and sometimes it’s really nice:

Bipbopbipbop video on Youtube

However, once in a while you need 59.94 fps and not 60. Another time you would go with 50, so that millisecond times are well aligned and every second has an equal number of frames. The next time you need a specific aspect ratio override, and then you would prefer a longer clip to a short one.
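The 50 fps argument is easy to check with arithmetic. DirectShow time stamps (REFERENCE_TIME) are in 100-nanosecond units, so a frame lasts 10,000,000 × denominator / numerator units. Below is a minimal sketch with hypothetical helper names, treating 59.94 fps as the usual 60000/1001 ratio:

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t kUnitsPerSecond = 10000000; // 100 ns units, as in REFERENCE_TIME

// Frame rate as a ratio, e.g. 50/1 or 60000/1001 (59.94 fps).
struct FrameRate { int64_t numerator; int64_t denominator; };

// True if every frame lasts a whole number of 100 ns units.
bool HasExactFrameDuration(FrameRate rate)
{
    return (kUnitsPerSecond * rate.denominator) % rate.numerator == 0;
}

// True if every frame lasts a whole number of milliseconds.
bool HasWholeMillisecondFrames(FrameRate rate)
{
    return (1000 * rate.denominator) % rate.numerator == 0;
}

// Start time of a frame, rounded down to 100 ns units. It is computed from the
// frame index rather than accumulated, so rounding errors do not add up.
int64_t FrameStartTime(int64_t frameIndex, FrameRate rate)
{
    assert(rate.numerator > 0 && rate.denominator > 0);
    return frameIndex * kUnitsPerSecond * rate.denominator / rate.numerator;
}
```

At 50 fps every frame lasts exactly 200,000 units (20 ms). At 60 and 59.94 fps the durations are fractional, so time stamps for those rates are best derived from the frame index.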

I converted one of my reference signal sources into DirectShow filters. They can produce an infinite signal, or they can generate a file of a specific format with specific properties.

The filters are Reference Video Source and Reference Audio Source. They are regular filters registered in a separate category (not as virtual video/audio source devices, yet?) and can be instantiated programmatically or in GraphStudioNext/GraphEdit.

DirectShowReferenceSource filters in GraphStudio

The filters come in both 32- and 64-bit versions, with hardcoded properties (for now?): 1280×720@50 32-bit RGB for video and 16-bit PCM mono at 48 kHz for audio. Programmatically, however, the filters can be tuned flexibly using the IAMStreamConfig::SetFormat call:

  • Video:
    • Any resolution
    • 32-bit RGB, top-to-bottom only (the filter internally uses Direct2D/WIC to generate the images)
    • Any positive frame rate
    • Aspect ratio can be overridden using VIDEOINFOHEADER2 format, e.g. to force SD video to be 4:3
  • Audio:
    • Any sample rate
    • 16-bit PCM or 32-bit IEEE floating point format
    • Mono
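For the video pin, the format passed to IAMStreamConfig::SetFormat would carry a VIDEOINFOHEADER or VIDEOINFOHEADER2 structure. Below is a portable sketch of the values such a format holds for 32-bit RGB. It assumes the usual convention that a negative biHeight marks a top-down uncompressed RGB image. The struct is a hypothetical mirror of a few fields, not the SDK definition.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of VIDEOINFOHEADER2/BITMAPINFOHEADER fields.
struct VideoFormatFields
{
    int32_t  width;           // biWidth
    int32_t  height;          // biHeight, negative for a top-down image
    uint16_t bitCount;        // biBitCount, 32 for RGB32
    uint32_t imageSize;       // biSizeImage in bytes
    int64_t  avgTimePerFrame; // AvgTimePerFrame in 100 ns units
    uint32_t aspectX;         // dwPictAspectRatioX
    uint32_t aspectY;         // dwPictAspectRatioY
};

uint32_t Gcd(uint32_t a, uint32_t b)
{
    while(b != 0)
    {
        const uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// RGB32 top-down format; aspectX/aspectY of 0 means square pixels (no override).
VideoFormatFields MakeRgb32Format(int32_t width, int32_t height, int64_t rateNumerator, int64_t rateDenominator,
    uint32_t aspectX = 0, uint32_t aspectY = 0)
{
    assert(width > 0 && height > 0 && rateNumerator > 0 && rateDenominator > 0);
    const uint32_t w = static_cast<uint32_t>(width);
    const uint32_t h = static_cast<uint32_t>(height);
    VideoFormatFields fields {};
    fields.width = width;
    fields.height = -height; // top-to-bottom row order
    fields.bitCount = 32;
    fields.imageSize = w * h * 4; // 4 bytes per pixel, rows are already DWORD aligned
    fields.avgTimePerFrame = 10000000 * rateDenominator / rateNumerator;
    if(aspectX == 0 || aspectY == 0)
    {
        // Square pixels: the picture aspect ratio is width:height, reduced
        const uint32_t divisor = Gcd(w, h);
        aspectX = w / divisor;
        aspectY = h / divisor;
    }
    fields.aspectX = aspectX;
    fields.aspectY = aspectY;
    return fields;
}
```

The default 1280×720@50 format works out to 3,686,400 bytes per frame, 200,000 units per frame and 16:9. An SD frame forced to 4:3 simply passes 4 and 3 as the override.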

The video filter generates an image sequence with its properties burnt in. Each image shows the frame number, the 100 ns time, the time with the frame number within the second, and a circle with a sector filled to reflect the current sub-second time. There is Uh/Oh text inside the circle at sharp second times, and the background color continuously transitions between colors.
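The burnt-in values follow directly from the frame index and the rate. Below is a sketch of that arithmetic, assuming an integer frame rate (as with the default 50 fps) and treating the first frame of each second as the "sharp second" frame. The names are hypothetical, and the filter's Direct2D drawing is not shown.

```cpp
#include <cassert>
#include <cstdint>

struct OverlayValues
{
    int64_t frameNumber;   // running frame number
    int64_t time100ns;     // start time in 100 ns units
    int64_t second;        // whole seconds elapsed
    int64_t frameInSecond; // frame number within the current second
    double  sectorDegrees; // filled part of the circle, 0..360
    bool    sharpSecond;   // first frame of a second (Uh/Oh frame)
};

OverlayValues ComputeOverlay(int64_t frameNumber, int64_t framesPerSecond)
{
    assert(frameNumber >= 0 && framesPerSecond > 0);
    OverlayValues values {};
    values.frameNumber = frameNumber;
    values.time100ns = frameNumber * 10000000 / framesPerSecond;
    values.second = frameNumber / framesPerSecond;
    values.frameInSecond = frameNumber % framesPerSecond;
    values.sectorDegrees = 360.0 * static_cast<double>(values.frameInSecond) / static_cast<double>(framesPerSecond);
    values.sharpSecond = values.frameInSecond == 0;
    return values;
}
```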

The audio filter beeps every second during the first 100 ms of the second, with a different tone for every fifth and every tenth second.
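The beep schedule can be expressed per sample. Below is a sketch of it. The tone frequencies (1000, 2000 and 3000 Hz), the amplitude, and the precedence of the tenth-second tone over the fifth-second one are all assumptions, since the text does not state them.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical tone frequencies; the filter's actual values are not given here.
constexpr double kRegularToneHz = 1000.0;
constexpr double kFifthSecondToneHz = 2000.0;
constexpr double kTenthSecondToneHz = 3000.0;
constexpr double kPi = 3.14159265358979323846;

// Tone frequency used for a given second.
double ToneForSecond(int64_t second)
{
    if(second % 10 == 0)
        return kTenthSecondToneHz; // checked first: every tenth second is also a fifth one
    if(second % 5 == 0)
        return kFifthSecondToneHz;
    return kRegularToneHz;
}

// 16-bit PCM mono sample: a beep during the first 100 ms of each second, silence otherwise.
int16_t ReferenceSample(int64_t sampleIndex, int64_t sampleRate)
{
    assert(sampleIndex >= 0 && sampleRate > 0);
    const int64_t second = sampleIndex / sampleRate;
    const int64_t indexInSecond = sampleIndex % sampleRate;
    if(indexInSecond >= sampleRate / 10) // past the first 100 ms
        return 0;
    const double phase = 2.0 * kPi * ToneForSecond(second) * static_cast<double>(indexInSecond) / static_cast<double>(sampleRate);
    return static_cast<int16_t>(std::lround(16000.0 * std::sin(phase)));
}
```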

DirectShowReferenceSource filters running in GraphStudio

Both filters support the IAMStreamControl interface, and the IAMStreamControl::StopAt method in particular. StopAt limits the signal duration and can be used for time-accurate file creation.

This comes with a sample project that demonstrates ASF file generation with specific properties and duration. The output file generated by the sample is Output.asf.

The ASF file format and WM ASF Writer are chosen for code brevity and to use a stock multiplexer. The downside, of course, is that the multiplexer rescales video to the profile’s resolution and frame rate. Those interested in generating their own content would perhaps use their favorite H.264 and AAC encoders with an MP4 or MKV multiplexer, and the nicer output would then look like Output.mp4.

A good thing about publishing these filters is that, while preparing the test project, I hit a thread safety bug in the GDCL MP4 multiplexer. The bug is presumably present in all or most versions of mp4mux.dll out there. The filter graph may be stopped while video is still streaming, before end-of-stream is reached on the video leg. This is often the case, because the upstream connection would be an H.264 encoder with an internal queue of frames, drained on worker threads while the stop request is processed. In that situation, the multiplexer might generate a memory access violation trying to use a NULL track object that is already gone.
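The general pattern behind a fix for this kind of race is to make the streaming path and the stop path agree on the track object's lifetime. Below is a minimal sketch with hypothetical class names (this is not GDCL's code). The track is only touched under a lock, and a sample that arrives after Stop has released the track is rejected instead of dereferencing a null pointer. In DirectShow terms, the late Receive call would fail with an error such as VFW_E_WRONG_STATE.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical per-stream track state of a multiplexer.
struct Track
{
    int samplesWritten = 0;
};

enum class ReceiveResult { Ok, NotRunning };

class MuxInput
{
public:
    void Start()
    {
        std::lock_guard<std::mutex> lock(m_lock);
        m_track.reset(new Track());
    }

    // Stopping thread: releases the track under the same lock that Receive uses.
    void Stop()
    {
        std::lock_guard<std::mutex> lock(m_lock);
        m_track.reset();
    }

    // Upstream thread (e.g. an encoder draining its queue); may race with Stop.
    ReceiveResult Receive()
    {
        std::lock_guard<std::mutex> lock(m_lock);
        if(!m_track)
            return ReceiveResult::NotRunning; // late sample: reject instead of crashing
        ++m_track->samplesWritten;
        return ReceiveResult::Ok;
    }

    int SamplesWritten()
    {
        std::lock_guard<std::mutex> lock(m_lock);
        return m_track ? m_track->samplesWritten : -1;
    }

private:
    std::mutex m_lock;
    std::unique_ptr<Track> m_track;
};
```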

Download links


Audio playback at non-standard rates in DirectShow

DirectShow streaming, and playback in particular, offers flexible playback rates for scenarios where playback should run slower or faster than real time. For a DirectShow developer, the outer interface is pretty straightforward: IMediaPosition::put_Rate takes a playback rate, and that’s it.

Playback rate. Must not be zero.

The playback rate is expressed as a ratio of the normal speed. Thus, 1.0 is normal playback speed, 0.5 is half speed, and 2.0 is twice speed. For audio streams, changing the rate also changes the pitch.

Even after setting aside reverse playback, which is not supported out of the box and requires some DirectShow magic to implement, there is a nasty problem for those who want to change the playback rate flexibly on the fly.

Rates greater than one are faster than normal. Rates between zero and one are slower than normal. Negative rates are defined as backward playback, but in practice most filters do not support it. Currently none of the standard DirectShow filters support reverse playback.

The problem comes up when an audio-enabled file or stream is being played back and there is an audio renderer in the pipeline. The filter graph connects and plays excellently. However, once you try to change the playback rate too much, the request might fail unexpectedly with the 0x8004025C VFW_E_UNSUPPORTED_AUDIO “Cannot play back the audio stream: the audio format is not supported.” error.

An application that does “almost everything right” is unable to do something as simple as fast-forward playback!

The root of the problem is in the audio renderer. Requests to change the playback rate propagate through the filter graph via the IMediaSeeking interface. The Filter Graph Manager passes the new rate to the renderers, which forward it upstream. The audio renderer refuses rates it does not support, and this breaks the whole thing.

Earlier implementations supposedly had a limit of 50%..200% (“But I cannot call SetRate with more than 2, it returns VFW_E_UNSUPPORTED_AUDIO.”), and since Vista the actual range is somewhat relaxed. Lacking a documentation reference, my educated guess is that the actual playback rate limit is defined by the renderer’s ability to resample the data into a format accepted by the underlying device. That is, a device accepting up to 192 kHz audio could be used to play 44.1 kHz content at rates up to 435%.
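Under that guess, the ceiling is the ratio of the highest sample rate the device accepts to the content's sample rate: 192,000 / 44,100 ≈ 4.35. Below is a sketch of the check, with hypothetical function names. It models only the upper limit, since this guess does not explain a lower bound such as the older 50%.

```cpp
#include <cassert>

// Highest rate the renderer could honor if it can only resample content
// up to the device's maximum sample rate (an educated guess, not documented behavior).
double MaximumPlaybackRate(double contentSampleRate, double deviceMaxSampleRate)
{
    assert(contentSampleRate > 0.0 && deviceMaxSampleRate > 0.0);
    return deviceMaxSampleRate / contentSampleRate;
}

// A forward rate is assumed acceptable if the effective sample rate fits the device.
bool IsRateSupported(double rate, double contentSampleRate, double deviceMaxSampleRate)
{
    return rate > 0.0 && contentSampleRate * rate <= deviceMaxSampleRate;
}
```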

The nasty part of the problem is this: one might want to mute the audio at such rates, or exclude the audio stream entirely, but this is only possible with a transition through the stopped state (due to presumed changes in filter graph topology). Otherwise, the audio renderer blocks the rate change with the error code mentioned above.

So, is there any way to fix the VFW_E_UNSUPPORTED_AUDIO issue while reusing existing components and keeping the user experience smooth on the UI side? One approach is to customize the behavior of the standard audio renderer, the DirectSound Renderer Filter.

The Filter Graph Manager would use the filter’s IMediaSeeking/IMediaPosition interfaces directly, so the filter cannot be added to the filter graph as is. The following is the checklist of required updates:

  • IMediaSeeking needs to be intercepted to accept a wide range of rates, pass some of them through transparently, and fake those accepted in “muted” mode
  • IPin and IMemInputPin interfaces need to be intercepted to accept incoming media samples and either pass them through or, in “muted” mode, suppress them and replace them with IPin::EndOfStream

These tasks make it impossible to keep the standard audio renderer as a normal participant of the filter graph. However, a wrapper COM object can do the job just fine without a single line of audio code. The figure below shows how the standard DirectSound renderer differs from its wrapper.


The complete list of tasks to do in the wrapper:

  • IPin::QueryPinInfo needs to properly report the wrapper filter
  • IPin::EndOfStream needs to suppress the EOS call if we have already “muted” the stream artificially
  • IPin::NewSegment needs to replace the rate argument with 1.0 before forwarding to the real renderer if we decided to “mute” the stream
  • IMemInputPin::Receive and IMemInputPin::ReceiveMultiple need to replace media sample delivery with an EOS if we are muting the stream
  • IBaseFilter::EnumPins and IBaseFilter::FindPin should properly expose the pin wrapper
  • IMediaSeeking::SetRate accepts any rate, decides on muting or transparent operation, then forwards the real or fake value to the real renderer managed internally
  • IMediaSeeking::GetRate reports the accepted rate

As the list shows, the wrapper filter can accept any rate (including negative!) and decide between transparent playback and muted operation for unsupported or otherwise unwanted rates. Neither filter graph re-creation nor stopping is required when changing rates or toggling muting.
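The rate handling of such a wrapper boils down to a few flags. Below is a portable sketch of it. A [0.5, 2.0] range is assumed as safe for the real renderer, although the actual range depends on the system. The class and member names are hypothetical and stand in for the IMediaSeeking, IPin and IMemInputPin methods listed.

```cpp
#include <cassert>

// Hypothetical model of the renderer wrapper's rate and muting decisions.
class RendererWrapper
{
public:
    RendererWrapper(double minimumRealRate, double maximumRealRate)
        : m_minimumRealRate(minimumRealRate), m_maximumRealRate(maximumRealRate)
    {
    }

    // IMediaSeeking::SetRate analog: accepts any non-zero rate (negative included),
    // decides on muting and returns the rate to forward to the real renderer.
    double SetRate(double rate)
    {
        assert(rate != 0.0); // zero is not a valid rate
        m_acceptedRate = rate;
        m_muted = rate < m_minimumRealRate || rate > m_maximumRealRate;
        return m_muted ? 1.0 : rate; // the real renderer only sees rates it accepts
    }

    // IMediaSeeking::GetRate analog: reports the accepted rate, not the forwarded one.
    double GetRate() const { return m_acceptedRate; }

    // IPin::NewSegment analog: the rate argument becomes 1.0 while muted.
    double SegmentRateToForward(double rate) const { return m_muted ? 1.0 : rate; }

    // IMemInputPin::Receive analog: true if the sample goes to the real renderer.
    bool ForwardSample()
    {
        if(!m_muted)
            return true;
        // A real wrapper would call the real pin's EndOfStream the first time it gets here.
        m_endOfStreamSent = true;
        return false;
    }

    // IPin::EndOfStream analog: suppressed if an artificial EOS was already delivered.
    bool ForwardEndOfStream() const { return !m_endOfStreamSent; }

    bool IsMuted() const { return m_muted; }

private:
    double m_minimumRealRate;
    double m_maximumRealRate;
    double m_acceptedRate = 1.0;
    bool m_muted = false;
    bool m_endOfStreamSent = false;
};
```

The sketch does not model the return from muted to transparent playback after an artificial end-of-stream. That transition would likely need more work in a real wrapper, such as a flush.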

A DirectSound renderer filter added to the graph as part of normal graph construction, automatically or otherwise, needs to be replaced by the wrapper as follows:

CLSID ClassIdentifier;
__C(pBaseFilter->GetClassID(&ClassIdentifier)); // IBaseFilter inherits IPersist::GetClassID
// NOTE: DirectSound Renderer Filter, CLSID_DSoundRender
if(ClassIdentifier == CLSID_DSoundRender)
{
    const CComPtr<IPin> pInputPin = _FilterGraphHelper::GetFilterPin(pBaseFilter);
    const CComPtr<IPin> pOutputPin = _FilterGraphHelper::GetPeerPin(pInputPin);
    const CMediaType pMediaType = _FilterGraphHelper::GetPinMediaType(pInputPin);
    const CStringW sName = _FilterGraphHelper::GetFilterName(pBaseFilter);
    // ... the original renderer is detached from the graph and handed to the wrapper, which manages it internally (details omitted)
    CObjectPtr<CFilter> pFilter;
    __C(FilterGraph.AddFilter(pFilter, sName + _T(" (Substitute)")));
    __C(FilterGraph.ConnectDirect(pOutputPin, pFilter->GetInputPin(), pMediaType));
}

CaptureClock: Utility to Check Video/Audio Capture Rates

Someone discovered the utility while browsing my public repository. The app offers to post data back to the website, and the anonymous user accepted the offer and posted a report from this unpublished application. So I have to write a few lines about the tool.

The idea is straightforward. Live capture involves attaching time stamps to media samples, and there is a chance that the time stamps drift away, causing unwanted effects in the captured clip. The application captures video and audio simultaneously, tracks the media sample time stamps, and also compares them against the system clock. By simply letting it run for a few minutes, one can see how the capture is doing and whether any of the timings drift away. When stopped, it puts a report onto the clipboard and optionally posts it back to me online. There is no specific intent behind this data; however, if you want to share data for a device that does drift away, a single click sends me the details.

CaptureClock operation

The output is on clipboard in tab-separated values (TSV) format:

Computer Name   PSI
Windows Version 6.1.7601 Service Pack 1
Video Device    Conexant's BtPCI Capture    @device:pnp:\\?\pci#ven_109e&dev_036e&subsys_18511851&rev_02#4&39c3dd91&0&08f0#{65e8773d-8f56-11d0-a3b9-00a0c9223196}\global
Audio Device    Stereo Mix (Realtek High Defini @device:cm:{33D9A762-90C8-11D0-BD43-00A0C911CE86}\Stereo Mix (Realtek High Defini

System Time  Video Sample Count  Video Sample Time  Relative Video Sample Time  Audio Sample Count  Audio Sample Time  Relative Audio Sample Time
30439        907                 30381              -57                         304                 30291              -147

Or you might prefer to paste it into Excel:

CaptureClock Output in Excel

By the way, this is also an easy way to make sure devices are operational and to check the effective video frame rate.
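Both uses come down to simple arithmetic on the report's columns. Below is a sketch with hypothetical names (not CaptureClock's code), assuming all times are in milliseconds. Under that assumption, the report's 907 video samples over 30,439 ms of system time amount to about 29.8 fps. Drift is the change in the media-versus-system offset between two observations, relative to the elapsed system time. The drift example values are made up.

```cpp
#include <cassert>
#include <cstdint>

// One observation: system clock and media sample time, both in milliseconds.
struct ClockSample
{
    int64_t systemTimeMs;
    int64_t sampleTimeMs;
};

// Offset of the media time stamp relative to the system clock.
int64_t RelativeTime(const ClockSample& sample)
{
    return sample.sampleTimeMs - sample.systemTimeMs;
}

// Drift between two observations in parts per million; negative means
// the media time stamps fall behind the system clock.
double DriftPpm(const ClockSample& first, const ClockSample& last)
{
    const int64_t systemElapsed = last.systemTimeMs - first.systemTimeMs;
    assert(systemElapsed > 0);
    const int64_t offsetChange = RelativeTime(last) - RelativeTime(first);
    return 1e6 * static_cast<double>(offsetChange) / static_cast<double>(systemElapsed);
}

// Effective sample (frame) rate over an elapsed system time.
double EffectiveRate(int64_t sampleCount, int64_t elapsedMs)
{
    assert(elapsedMs > 0);
    return 1000.0 * static_cast<double>(sampleCount) / static_cast<double>(elapsedMs);
}
```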

Download links:

Sample: Simultaneous Audio Playback via Waveform Audio (waveOut) API

This minimalistic sample demonstrates how the [deprecated] Waveform Audio API supports multiple playback streams.

Depending on the command line parameters, the application starts threads. Each thread opens the audio hardware with a separate waveOutOpen call and streams one of the generated sine waves:

  • 1,000 Hz sine wave as 22,050 Hz, Mono, 16-bit PCM (command line parameter “a”)
  • 5,000 Hz sine wave as 32,000 Hz, Mono, 16-bit PCM (command line parameter “b”)
  • 15,000 Hz sine wave as 44,100 Hz, Mono, 16-bit PCM (command line parameter “c”)
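
Each stream thread then does roughly the following:

// Needs <windows.h>, <mmsystem.h> (link with winmm.lib) and <math.h> with _USE_MATH_DEFINES for M_PI
// nSampleRate and nFrequency are per stream, e.g. 22,050 and 1,000 for "a"
WAVEFORMATEX WaveFormatEx = { };
WaveFormatEx.wFormatTag = WAVE_FORMAT_PCM;
WaveFormatEx.nChannels = 1;
WaveFormatEx.nSamplesPerSec = nSampleRate;
WaveFormatEx.wBitsPerSample = 16;
WaveFormatEx.nBlockAlign = WaveFormatEx.nChannels * WaveFormatEx.wBitsPerSample / 8;
WaveFormatEx.nAvgBytesPerSec = WaveFormatEx.nSamplesPerSec * WaveFormatEx.nBlockAlign;
HWAVEOUT hWaveOut;
Check(waveOutOpen(&hWaveOut, WAVE_MAPPER, &WaveFormatEx, NULL, NULL, CALLBACK_NULL));
// One allocation: the WAVEHDR followed by 10 seconds of audio data
WAVEHDR* pWaveHeader;
HGLOBAL hWaveHeader = GlobalAlloc(GMEM_MOVEABLE | GMEM_SHARE, sizeof *pWaveHeader + WaveFormatEx.nAvgBytesPerSec * 10);
pWaveHeader = (WAVEHDR*) GlobalLock(hWaveHeader);
pWaveHeader->lpData = (LPSTR) (BYTE*) (pWaveHeader + 1);
pWaveHeader->dwBufferLength = WaveFormatEx.nAvgBytesPerSec * 10;
pWaveHeader->dwFlags = 0;
pWaveHeader->dwLoops = 0;
#pragma region Generate Actual Data
    SHORT* pnData = (SHORT*) pWaveHeader->lpData;
    SIZE_T nDataCount = pWaveHeader->dwBufferLength / sizeof *pnData;
    for(SIZE_T nIndex = 0; nIndex < nDataCount; nIndex++)
        pnData[nIndex] = (SHORT) (32000 * sin(1.0 * nIndex / WaveFormatEx.nSamplesPerSec * nFrequency * 2 * M_PI));
#pragma endregion 
Check(waveOutPrepareHeader(hWaveOut, pWaveHeader, sizeof *pWaveHeader));
Check(waveOutWrite(hWaveOut, pWaveHeader, sizeof *pWaveHeader));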

The operating system is supposed to mix the waves, and the mixing is easy to hear. It is possible to run the application with multiple waveforms within one process (e.g. with the “abc” command line parameter) and/or to start multiple instances of the application.
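Conceptually, mixing is sample-wise addition. Below is a minimal sketch that assumes the streams are already at a common sample rate and are simply added and clipped to the 16-bit range. The system's actual mixer is more involved and is not shown here, and MixPcm16 is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mixes 16-bit PCM streams of the same rate by summing and clipping.
std::vector<int16_t> MixPcm16(const std::vector<std::vector<int16_t>>& streams)
{
    std::size_t length = 0;
    for(const std::vector<int16_t>& stream : streams)
        length = std::max(length, stream.size());
    std::vector<int16_t> mixed(length, 0);
    for(std::size_t index = 0; index < length; index++)
    {
        int32_t sum = 0; // wide enough for many 16-bit streams
        for(const std::vector<int16_t>& stream : streams)
            if(index < stream.size())
                sum += stream[index];
        // Clip to the 16-bit range instead of letting the value wrap around
        sum = std::min<int32_t>(std::max<int32_t>(sum, -32768), 32767);
        mixed[index] = static_cast<int16_t>(sum);
    }
    return mixed;
}
```

With the 32000 amplitude used in the sample, even two streams at their peaks would exceed the 16-bit range in this naive sum. This is why clipping or headroom matters in any mixer.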

A binary [Win32] and partial Visual C++ .NET 2010 source code are available from SVN.

Utility Clearance: Enumerate Audio ‘MMDevice’s

The utility and its code perform a straightforward enumeration of MMDevices (Vista+; check MSDN for MMDevice API availability), the devices exposed through the MMDevice API, WASAPI and Core Audio API. The code itself is simple, and a ready-to-use binary lets you quickly look up data of interest:

The data is well detailed and comes in an Excel-friendly format (via Copy/Paste):

The code also automatically looks up named Windows SDK constants, such as PKEY_Device_FriendlyName:

    Identifier    {}.{4c1a7642-3f91-43e5-8fcf-b4b1e803d3f9}
    State    DEVICE_STATE_DISABLED    0x02
        {B3F8FA53-0004-438E-9003-51A46E139BFC}, 15    16 bytes of BLOB, DA 07 03 00 02 00 09 00 0E 00 39 00 16 00 C5 02    65
        PKEY_Device_DeviceDesc    Stereo Mix    31
        {B3F8FA53-0004-438E-9003-51A46E139BFC}, 6    Realtek High Definition Audio    31
        {B3F8FA53-0004-438E-9003-51A46E139BFC}, 2    {1}.HDAUDIO\FUNC_01&VEN_10EC&DEV_0888&SUBSYS_80860034&REV_1002\4&37D44F2F&0&0201    31
        {83DA6326-97A6-4088-9453-A1923F573B29}, 3    oem29.inf:AzaliaManufacturerID.NTamd64.6.0:IntcAzAudModel:\func_01&ven_10ec&dev_0888    31
        PKEY_Device_BaseContainerId    {00000000-0000-0000-FFFF-FFFFFFFFFFFF}    72
        PKEY_Device_ContainerId    {00000000-0000-0000-FFFF-FFFFFFFFFFFF}    72
        PKEY_Device_EnumeratorName    HDAUDIO    31
        PKEY_AudioEndpoint_FormFactor    10    19
        PKEY_AudioEndpoint_JackSubType    {DFF21FE1-F70F-11D0-B917-00A0C9223196}    31
        PKEY_DeviceClass_IconPath    %windir%\system32\mmres.dll,-3018    31
        {840B8171-B0AD-410F-8581-CCCC0382CFEF}, 0    316 bytes of BLOB, 01 00 00 00 38 01 00 00 ... 00 00 00 00    65
        PKEY_AudioEndpoint_Association    {00000000-0000-0000-0000-000000000000}    31
        PKEY_AudioEndpoint_Supports_EventDriven_Mode    1    19
        {24DBB0FC-9311-4B3D-9CF0-18FF155639D4}, 3    0    11
        {24DBB0FC-9311-4B3D-9CF0-18FF155639D4}, 4    -1    11
        {9A82A7DB-3EBB-41B4-83BA-18B7311718FC}, 1    65536    19
        {233164C8-1B2C-4C7D-BC68-B671687A2567}, 1    {2}.\\?\hdaudio#func_01&ven_10ec&dev_0888&subsys_80860034&rev_1002#4&37d44f2f&0&0201#{6994ad04-93ef-11d0-a3cc-00a0c9223196}\rtstereomixwave    31
        {5A9125B7-F367-4924-ACE2-0803A4A3A471}, 0    1610612916    19
        {B3F8FA53-0004-438E-9003-51A46E139BFC}, 0    3    19
        PKEY_Device_FriendlyName    Stereo Mix (Realtek High Definition Audio)    31
        PKEY_DeviceInterface_FriendlyName    Realtek High Definition Audio    31
        PKEY_AudioEndpoint_GUID    {4C1A7642-3F91-43E5-8FCF-B4B1E803D3F9}    31
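The trailing number on each line matches the PROPVARIANT type codes (VARENUM): 11 is VT_BOOL (note the 0 and -1 values), 19 is VT_UI4, 31 is VT_LPWSTR, 65 is VT_BLOB and 72 is VT_CLSID. The 16-byte BLOB of property {B3F8FA53-0004-438E-9003-51A46E139BFC}, 15 also decodes cleanly as a SYSTEMTIME layout (eight little-endian 16-bit fields), giving 2010-03-09 (a Tuesday) 14:57:22.709. Reading it as a timestamp is an interpretation, not something documented here. Below is a portable sketch of both lookups, with hypothetical function names:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// Names for the VARENUM values that occur in the listing above.
std::string VarTypeName(int varType)
{
    switch(varType)
    {
    case 11: return "VT_BOOL";
    case 19: return "VT_UI4";
    case 31: return "VT_LPWSTR";
    case 65: return "VT_BLOB";
    case 72: return "VT_CLSID";
    default: return "VT_" + std::to_string(varType);
    }
}

// Same field order as the Windows SYSTEMTIME structure.
struct SystemTimeFields
{
    uint16_t year, month, dayOfWeek, day, hour, minute, second, milliseconds;
};

// Decodes 16 bytes as eight little-endian 16-bit values in SYSTEMTIME order.
SystemTimeFields DecodeSystemTime(const std::array<uint8_t, 16>& bytes)
{
    uint16_t fields[8];
    for(int index = 0; index < 8; index++)
        fields[index] = static_cast<uint16_t>(bytes[2 * index] | (bytes[2 * index + 1] << 8));
    return { fields[0], fields[1], fields[2], fields[3], fields[4], fields[5], fields[6], fields[7] };
}
```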

A binary [Win32, x64] and partial Visual C++ .NET 2010 source code are available from SVN.

See also: