Audio playback at non-standard rates in DirectShow

DirectShow streaming, and playback in particular, offers flexible playback rates for scenarios where playback needs to run slower or faster than real time. For a DirectShow developer, the outer interface is pretty straightforward: IMediaPosition::put_Rate takes the playback rate and that’s it.

Playback rate. Must not be zero.

The playback rate is expressed as a ratio of the normal speed. Thus, 1.0 is normal playback speed, 0.5 is half speed, and 2.0 is twice speed. For audio streams, changing the rate also changes the pitch.

Even after taking out the case of reverse playback, which is not supported out of the box and requires some DirectShow magic to implement, there is a nasty problem for those who want to be able to change playback rate flexibly on the go.

Rates greater than one are faster than normal. Rates between zero and one are slower than normal. Negative rates are defined as backward playback, but in practice most filters do not support it. Currently none of the standard DirectShow filters support reverse playback.

The problem comes up when an audio-enabled file/stream is being played back and there is an audio renderer in the pipeline. The filter graph would connect and play excellently, but once you try to change playback rate too much, the request might fail unexpectedly with 0x8004025C VFW_E_UNSUPPORTED_AUDIO “Cannot play back the audio stream: the audio format is not supported.” error.

An application that “almost does everything right” is unable to do a small thing as simple as fast forward playback!

The root of the problem is in the audio renderer. Requests to change playback rate propagate through the filter graph via the IMediaSeeking interface: Filter Graph Manager sends the new rates through the renderers and further upstream. The audio renderer refuses the rates it does not support, and this breaks the whole thing.

Earlier implementations supposedly had a limit of 50%..200% rate range (one report reads: “But I cannot call SetRate with more than 2, it returns VFW_E_UNSUPPORTED_AUDIO.”), and since Vista the actual range is somewhat relaxed. Having no documentation reference, my educated guess is that the actual playback rate limit is defined by the ability of the renderer to resample the data into a format accepted by the underlying device. That is, a device taking up to 192 kHz audio could be used to play 44.1 kHz content at rates up to 435% (192 / 44.1 ≈ 4.35).
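The guess above boils down to a single division. The sketch below is only an illustration of that arithmetic: EstimateMaxPlaybackRate is a hypothetical helper, not a DirectShow API, and it assumes that the resampling ceiling is the only limiting factor.

```cpp
#include <cassert>

// Hypothetical helper (not part of DirectShow): estimates the highest
// playback rate reachable by resampling, ASSUMING the limit is simply the
// device's maximum sample rate divided by the content's sample rate.
double EstimateMaxPlaybackRate(double deviceMaxSampleRateHz, double contentSampleRateHz)
{
    assert(deviceMaxSampleRateHz > 0.0 && contentSampleRateHz > 0.0);
    return deviceMaxSampleRateHz / contentSampleRateHz;
}
```

With 192000.0 and 44100.0 as arguments the helper yields roughly 4.35, which is where the 435% figure comes from.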

The nasty part of the problem is that even though one might want to mute the audio part at such rates, or exclude the audio substream altogether, this is only possible with a transition through the stopped state (because it changes the filter graph topology). Otherwise the audio renderer blocks the rate change with the mentioned error code.

So, is there any way to fix the VFW_E_UNSUPPORTED_AUDIO issue with reuse of existing components and a smooth user experience on the UI side? One of the approaches is to customize the behavior of the standard audio renderer, DirectSound Renderer Filter.

Filter Graph Manager would use the renderer’s IMediaSeeking/IMediaPosition interfaces directly, so the filter cannot be added into the filter graph as is. The following is the checklist for required updates:

  • IMediaSeeking needs to be intercepted to accept wide range of rates, to pass some of them transparently and fake those accepted in “muted” mode
  • IPin, IMemInputPin interfaces need to be intercepted to accept incoming media sample, to pass them through or suppress and replace with IPin::EndOfStream in “muted” mode

The mentioned tasks make it impossible to have the standard audio renderer as a normal participant of the filter graph. However, a wrapper COM object can achieve the goal just fine without a single line of code doing audio. The figure below shows how the standard DirectSound renderer is different from its wrapper.

[Figure: Wrapper]

The complete list of tasks to do in the wrapper:

  • IPin::QueryPinInfo needs to properly report wrapper filter
  • IPin::EndOfStream needs to suppress EOS call in case we already “muted” artificially
  • IPin::NewSegment needs to replace rate argument with 1.0 before forwarding to real renderer in case we decided to “mute” the stream
  • IMemInputPin::Receive and IMemInputPin::ReceiveMultiple need to replace media sample delivery with an EOS in case we are muting the stream
  • IBaseFilter::EnumPins and IBaseFilter::FindPin should properly expose pin wrapper
  • IMediaSeeking::SetRate accepts any rate and decides on muting or transparent operation, then forwards the real or a fake value to the real renderer managed internally
  • IMediaSeeking::GetRate reports accepted rate
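The pin-related items of the list can be illustrated with a small portable mock. This is a sketch under stated assumptions, not the actual wrapper code: FakeInnerPin and PinWrapper are hypothetical names, the real IPin/IMemInputPin signatures are reduced to argument-less calls, and clearing the end-of-stream flag on a new segment is my assumption about how delivery resumes.

```cpp
#include <cassert>

// Stand-in for the input pin of the real renderer; it only counts calls.
struct FakeInnerPin
{
    int samplesReceived = 0;
    int endOfStreamCalls = 0;
    void Receive() { ++samplesReceived; }
    void EndOfStream() { ++endOfStreamCalls; }
};

// Hypothetical pin wrapper: passes samples through, or replaces them with a
// single end-of-stream notification while the stream is "muted".
class PinWrapper
{
public:
    explicit PinWrapper(FakeInnerPin& inner) : m_inner(inner) {}

    void SetMuted(bool muted) { m_muted = muted; }

    // ASSUMPTION: a new segment lets the inner renderer accept data again.
    void NewSegment() { m_endOfStreamSent = false; }

    // Mirrors IMemInputPin::Receive: deliver the sample, or EOS when muted.
    void Receive()
    {
        if (!m_muted)
            m_inner.Receive();
        else
            SendEndOfStreamOnce();
    }

    // Mirrors IPin::EndOfStream: suppressed if EOS was already faked.
    void EndOfStream() { SendEndOfStreamOnce(); }

private:
    void SendEndOfStreamOnce()
    {
        if (m_endOfStreamSent)
            return;
        m_endOfStreamSent = true;
        m_inner.EndOfStream();
    }

    FakeInnerPin& m_inner;
    bool m_muted = false;
    bool m_endOfStreamSent = false;
};
```

The point of the design is that the real renderer never sees more than one end-of-stream per segment, no matter how many samples are dropped while muted.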

As the list says, the wrapper filter can accept any rate (including negative!) and decide on transparent playback or muted operation for unsupported or otherwise unwanted rates. No filter graph re-creation or stopping is required when changing rates or toggling muting.
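The rate decision itself might look roughly like the portable mock below. It is a sketch, not the real wrapper: FakeInnerRenderer and RateWrapper are hypothetical names, the 0.5..2.0 range of the fake renderer is an assumption chosen for illustration, and HRESULT values are reduced to bool.

```cpp
#include <cassert>

// Stand-in for IMediaSeeking::SetRate of the real DirectSound renderer.
// ASSUMPTION: the supported range is 0.5..2.0, for illustration only.
struct FakeInnerRenderer
{
    double rate = 1.0;
    bool SetRate(double value)
    {
        if (value < 0.5 || value > 2.0)
            return false; // the real filter would fail with VFW_E_UNSUPPORTED_AUDIO
        rate = value;
        return true;
    }
};

// Hypothetical wrapper logic: accept any non-zero rate, operate transparently
// when the inner renderer agrees, otherwise mute and feed it a fake 1.0.
class RateWrapper
{
public:
    explicit RateWrapper(FakeInnerRenderer& inner) : m_inner(inner) {}

    bool SetRate(double rate)
    {
        if (rate == 0.0)
            return false; // playback rate must not be zero
        if (m_inner.SetRate(rate))
        {
            m_muted = false; // transparent operation
        }
        else
        {
            m_inner.SetRate(1.0); // fake value for the real renderer
            m_muted = true;
        }
        m_rate = rate;
        return true;
    }

    // Reports the rate accepted from the caller, not the inner one.
    double GetRate() const { return m_rate; }
    bool IsMuted() const { return m_muted; }

    // Rate to forward with NewSegment: replaced by 1.0 while muted.
    double GetSegmentRate() const { return m_muted ? 1.0 : m_rate; }

private:
    FakeInnerRenderer& m_inner;
    double m_rate = 1.0;
    bool m_muted = false;
};
```

Asking the inner renderer first and muting only on failure keeps the wrapper free of any hard-coded knowledge about which rates the actual device supports.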

A DirectSound renderer filter added to the graph, automatically or otherwise, as a part of normal graph construction needs to be replaced by the wrapper in the following way:

// NOTE: Fragment of a loop over FilterArray, the filters currently in the graph
const CComPtr<IBaseFilter> pBaseFilter = FilterArray[nIndex];
CLSID ClassIdentifier;
if(FAILED(pBaseFilter->GetClassID(&ClassIdentifier)))
    continue;
// NOTE: DirectSound Renderer Filter, CLSID_DSoundRender
//       http://msdn.microsoft.com/en-us/library/windows/desktop/dd375473%28v=vs.85%29.aspx
if(ClassIdentifier != CLSID_DSoundRender)
    continue;
// NOTE: Remember the existing connection before taking the renderer out of the graph
const CComPtr<IPin> pInputPin = _FilterGraphHelper::GetFilterPin(pBaseFilter);
const CComPtr<IPin> pOutputPin = _FilterGraphHelper::GetPeerPin(pInputPin);
const CMediaType pMediaType = _FilterGraphHelper::GetPinMediaType(pInputPin);
const CStringW sName = _FilterGraphHelper::GetFilterName(pBaseFilter);
__C(FilterGraph.RemoveFilter(pBaseFilter));
// NOTE: Wrap the original renderer and put the wrapper into its place
CObjectPtr<CFilter> pFilter;
pFilter.Construct();
pFilter->Initialize(pBaseFilter);
__C(FilterGraph.AddFilter(pFilter, sName + _T(" (Substitute)")));
// NOTE: Reconnect the upstream output pin to the wrapper using the same media type
__C(FilterGraph.ConnectDirect(pOutputPin, pFilter->GetInputPin(), pMediaType));
