The effect of excessive RGB conversion on video streaming performance

Started here: How can I overlay timestamp on the image? on microsoft.public.win32.programmer.directx.video

Let us see whether RGB conversion has any noticeable effect on streaming YUY2 video, the typical output of a video decompressor.

As a reference I am taking a simple YUY2 source -> Video Mixing Renderer Filter (VMR) graph, where the source filter streams the same pre-allocated and pre-initialized data in an infinite loop:

while(WaitForSingleObject(TerminationEvent, 0) == WAIT_TIMEOUT)
{
	ATLENSURE_SUCCEEDED(m_pSourceFilter->InjectMediaSample(m_pnData, m_nDataSize));
	CRoCriticalSectionLock DataLock(m_DataCriticalSection);
	m_pnInjectedFrameCounts[0]++;
}
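
The frame rates quoted below come from a counter like the one incremented in this loop. As a minimal sketch of how such a counter could be turned into a rate (the helper name `FramesPerSecond` is hypothetical and not part of the original sample), assuming the counter is read twice with a known interval in between:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper, not taken from the sample project: converts two
// readings of a frame counter (such as m_pnInjectedFrameCounts[0]) and the
// time elapsed between them into frames per second.
double FramesPerSecond(std::uint64_t previousCount, std::uint64_t currentCount,
	std::chrono::duration<double> elapsed)
{
	assert(currentCount >= previousCount);
	assert(elapsed.count() > 0.0);
	return static_cast<double>(currentCount - previousCount) / elapsed.count();
}
```

For example, reading 100 and then 520 two seconds later gives 210 frames per second.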

Video resolution is 640×480 pixels.

What actually consumes CPU resources here is copying data into the VMR’s media sample buffer and the streaming itself. The VMR might block the call while waiting for rendering to complete; I am leaving this for the default VMR to decide (it might be hardware dependent, etc.).

Running at full pace, the application renders 510 frames per second while consuming virtually no CPU. That is, the VMR waits until each media sample is rendered, which limits streaming to the mentioned number of media samples per second; the rendering itself does not take CPU resources, the thread just waits for the video hardware to complete.

I am inserting a Sample Grabber Filter into the graph, initialized with a 640×480 24-bit RGB media type, between the source and the renderer. No callback, just a filter inserted to insist on the media type. “Before” and “after” code:
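
To put the frame rate into perspective, the amount of data moved per second can be estimated from the frame size. The following is only a back-of-the-envelope sketch: the function names are made up for illustration, and it assumes tightly packed rows without stride padding, which holds for 640-pixel-wide YUY2 (16 bits per pixel), RGB24 and RGB32.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of one uncompressed frame, assuming tightly packed rows
// (no stride padding).
std::uint64_t FrameSizeInBytes(std::uint32_t width, std::uint32_t height,
	std::uint32_t bitsPerPixel)
{
	assert(bitsPerPixel % 8 == 0);
	return static_cast<std::uint64_t>(width) * height * (bitsPerPixel / 8);
}

// Data rate in bytes per second for a given frame size and frame rate.
std::uint64_t BytesPerSecond(std::uint64_t frameSize, std::uint32_t framesPerSecond)
{
	return frameSize * framesPerSecond;
}
```

A 640×480 YUY2 frame is 614,400 bytes, so 510 frames per second corresponds to roughly 313 MB of sample data per second. The same frame takes 921,600 bytes as 24-bit RGB and 1,228,800 bytes as 32-bit RGB.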

#if TRUE && FALSE
	// "After": Sample Grabber forces 24-bit RGB between the source and the renderer
	CComPtr<IBaseFilter> pSampleGrabberBaseFilter;
	ATLENSURE_SUCCEEDED(pSampleGrabberBaseFilter.CoCreateInstance(CLSID_SampleGrabber));
	CComQIPtr<ISampleGrabber> pSampleGrabber = pSampleGrabberBaseFilter;
	ATLENSURE_THROW(pSampleGrabber, E_NOINTERFACE); // Check the interface before its first use
	CMediaType pRgbMediaType;
	pRgbMediaType.AllocateVideoInfo(640, 480, 24);
	ATLENSURE_SUCCEEDED(pSampleGrabber->SetMediaType(pRgbMediaType));
	ATLENSURE_SUCCEEDED(pGraphBuilder->AddFilter(pSampleGrabberBaseFilter, CStringW(_T("24-bit RGB Sample Grabber"))));
	ATLENSURE_SUCCEEDED(pGraphBuilder->Connect(m_pSourceFilter->GetOutputPin(), _FilterGraphHelper::GetFilterPin(pSampleGrabberBaseFilter, PINDIR_INPUT)));
	ATLENSURE_SUCCEEDED(pGraphBuilder->Render(_FilterGraphHelper::GetFilterPin(pSampleGrabberBaseFilter, PINDIR_OUTPUT)));
#else
	// "Before": the source output pin is rendered directly
	ATLENSURE_SUCCEEDED(pGraphBuilder->Render(m_pSourceFilter->GetOutputPin()));
#endif

DirectShow Intelligent Connect inserts two additional filters into the graph: an AVI Decompressor Filter to convert YUY2 to RGB, and a Color Space Converter Filter to give the VMR the upstream flexibility it needs to choose a media type with an extended stride.

Still running at full pace, the application renders only 210 frames per second, while CPU consumption jumps to 30%. What is consuming CPU cycles in the changed filter graph? The conversion from YUY2 to RGB in the AVI Decompressor Filter, and possibly an additional data copy between buffers.
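
To give an idea of the per-pixel work such a conversion involves, here is a minimal sketch of a YUY2 to 24-bit RGB conversion. It is not the AVI Decompressor Filter's actual code, which is not available; it uses the commonly published integer approximation of the BT.601 conversion, and it assumes tightly packed top-down rows and an even width. The names `Clip` and `ConvertYuy2ToRgb24` are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamps an intermediate value to the 0..255 range of a color component.
static std::uint8_t Clip(int value)
{
	return static_cast<std::uint8_t>(value < 0 ? 0 : (value > 255 ? 255 : value));
}

// Illustrative only: converts tightly packed YUY2 (bytes Y0 U Y1 V per pixel
// pair) into tightly packed 24-bit pixels in B, G, R byte order. Stride
// padding and bottom-up row order are deliberately ignored here.
std::vector<std::uint8_t> ConvertYuy2ToRgb24(const std::uint8_t* yuy2,
	std::size_t width, std::size_t height)
{
	assert(width % 2 == 0);
	std::vector<std::uint8_t> rgb(width * height * 3);
	std::uint8_t* out = rgb.data();
	for(std::size_t i = 0; i < width * height / 2; i++)
	{
		const int d = yuy2[1] - 128; // U, shared by both pixels of the pair
		const int e = yuy2[3] - 128; // V, shared by both pixels of the pair
		for(int k = 0; k < 2; k++)
		{
			const int c = 298 * (yuy2[k * 2] - 16) + 128; // scaled luma with rounding
			// Division instead of a shift keeps negative intermediates well
			// defined; they are clipped to zero either way.
			*out++ = Clip((c + 516 * d) / 256); // blue
			*out++ = Clip((c - 100 * d - 208 * e) / 256); // green
			*out++ = Clip((c + 409 * e) / 256); // red
		}
		yuy2 += 4;
	}
	return rgb;
}
```

Even in this simple form, every pixel costs several multiplications, additions and clamps, and the output buffer is one and a half times the size of the input. At 640×480 this is repeated 307,200 times per frame.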

Hardware:

Reference source code (will require additional headers to compile): FrameRateSample01.01.zip (note that Release build binaries are included)

UPDATE: Another test to bring more detail into the performance impact. I am taking the Sample Grabber Filter out and letting the graph run without it, but with the auto-inserted filters still in place, in order to remove the effect of the Sample Grabber Filter itself.

CComPtr<IPin> pSampleGrabberInputPeerPin = _FilterGraphHelper::GetPeerPin(_FilterGraphHelper::GetFilterPin(pSampleGrabberBaseFilter, PINDIR_INPUT));
CComPtr<IPin> pSampleGrabberOutputPeerPin = _FilterGraphHelper::GetPeerPin(_FilterGraphHelper::GetFilterPin(pSampleGrabberBaseFilter, PINDIR_OUTPUT));
ATLENSURE_SUCCEEDED(pGraphBuilder->RemoveFilter(pSampleGrabberBaseFilter));
ATLENSURE_SUCCEEDED(pGraphBuilder->Connect(pSampleGrabberInputPeerPin, pSampleGrabberOutputPeerPin));

The frame rate is slightly higher than in the previous test with the Sample Grabber Filter, but it does not stay constant: it keeps jumping between 210 and 250 fps, with CPU load jumping between 10% and 50% (note that 50% is a 100% load on one of the two CPU cores).

What makes it different from the previous run, where the Sample Grabber Filter was present? A check of the pins’ media types with GraphEdit shows that the filters decided not to connect on a 24-bit RGB media type. Instead, the AVI Decompressor Filter and the Color Space Converter Filter connect on 32-bit RGB. Previously, when the Sample Grabber Filter insisted on 24-bit RGB, the Color Space Converter Filter performed an additional conversion from 24-bit RGB to 32-bit RGB, because the VMR chose to accept data as 32-bit RGB.
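
The extra 24-bit to 32-bit step is cheap per pixel but still touches every byte of every frame. A minimal sketch of such an expansion follows; it is not the Color Space Converter Filter's actual code, the function name `ExpandRgb24ToRgb32` is hypothetical, and the sketch assumes tightly packed pixels and writes zero into the unused fourth byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only: expands tightly packed 3-byte pixels into 4-byte pixels,
// copying the three color bytes and zeroing the fourth (unused) byte.
std::vector<std::uint8_t> ExpandRgb24ToRgb32(const std::vector<std::uint8_t>& rgb24)
{
	assert(rgb24.size() % 3 == 0);
	std::vector<std::uint8_t> rgb32;
	rgb32.reserve(rgb24.size() / 3 * 4);
	for(std::size_t i = 0; i < rgb24.size(); i += 3)
	{
		rgb32.push_back(rgb24[i + 0]);
		rgb32.push_back(rgb24[i + 1]);
		rgb32.push_back(rgb24[i + 2]);
		rgb32.push_back(0); // unused byte
	}
	return rgb32;
}
```

For a 640×480 frame this reads 921,600 bytes and writes 1,228,800 bytes, on top of the YUY2 to RGB conversion that has already taken place.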

Updated binary for the third test: FrameRateSample01-RGB32.zip
