On the efficiency of hardware-assisted JPEG decoding (AMD MFT MJPEG Decoder)

The previous post focused on problems with the hardware decoder MFT shipped as part of the video driver package. This time I am sharing some data on how its inefficiency affects video capture performance, using a high frame rate 260 FPS camera as a test stand. The effect is apparently easier to see at high frame rates, because CPU and GPU hardware is already fast enough to hide it when processing a less demanding signal.

There is already some interest from AMD’s end (why this is exceptional deserves a separate post of its own), and some bug fixes are already under way.

The performance problem is easy to miss because the decoder works without fatal issues and produces the expected output: no failures, no error codes, no deadlocks, and neither the CPU nor any GPU engine is maxed out, so at first glance things look more or less fine…

The test application uses Media Foundation and the Source Reader API to read textures with hardware MFTs enabled, then discards the textures and just prints out the frame rate.

AMD MFT MJPEG Decoder

C:\...\MjpgCameraReader\bin\x64\Release>MjpgCameraReader.exe
 Using camera HD USB Camera
 Using adapter Radeon RX 570 Series
 Using video capture format 640x360@260.004 MFVideoFormat_MJPG
 Using hardware decoder MFT AMD MFT MJPEG Decoder
 Using video frame format 640x384@260.004 MFVideoFormat_YUY2
 72.500 video samples per second captured
 134.000 video samples per second captured
 135.000 video samples per second captured
 134.500 video samples per second captured
 135.500 video samples per second captured
 134.000 video samples per second captured
 134.000 video samples per second captured
 135.000 video samples per second captured
 134.500 video samples per second captured
 133.500 video samples per second captured
 134.000 video samples per second captured

With no visible sign of hitting a bottleneck, the reader process gets only ~134 FPS from the video capture device, about half of the camera’s nominal 260 FPS.

Alax.Info MJPG Video Decoder for AMD Hardware

My replacement for the hardware decoder MFT decodes the same signal and generally has a lot in common with AMD’s own decoder: both MFTs are built on top of the Advanced Media Framework (AMF) SDK. The driver package installs the runtime for this SDK, and it also installs a decoder MFT that is linked against its own copy of the runtime (according to an AMD representative, this statically linked copy shares the same codebase).

C:\...\MjpgCameraReader\bin\x64\Release>MjpgCameraReader.exe
 Using camera HD USB Camera
 Using adapter Radeon RX 570 Series
 Using video capture format 640x360@260.004 MFVideoFormat_MJPG
 Using substitute decoder Alax.Info MJPG Video Decoder for AMD Hardware
 Using video frame format 640x360@260.004 MFVideoFormat_YUY2
 74.000 video samples per second captured
 261.000 video samples per second captured
 261.000 video samples per second captured
 261.000 video samples per second captured
 261.000 video samples per second captured
 260.500 video samples per second captured
 261.000 video samples per second captured
 261.000 video samples per second captured
 261.000 video samples per second captured
 261.000 video samples per second captured
 260.500 video samples per second captured

CPU and GPU utilization levels are similar, but the frame rate is higher. In fact, it is the expected frame rate, since 260 FPS is the rate the camera is supposed to operate at.

1280×720@120 Mode

Interestingly, in the lower FPS mode the AMD MFT threading issues are still present, and on top of that the MFT exhibits two other issues (one of them is a “just ignore” one, per AMD’s comment). At the same time, the video capture rate is no longer reduced: the horsepower of the hardware hides the implementation’s inefficiency.

 Using camera HD USB Camera
 Using adapter Radeon RX 570 Series
 Using video capture format 1280x720@120.000 MFVideoFormat_MJPG
 Using hardware decoder MFT AMD MFT MJPEG Decoder
 Using video frame format 1280x736@120.000 MFVideoFormat_YUY2
 18.500 video samples per second captured
 120.000 video samples per second captured
 120.000 video samples per second captured
 120.000 video samples per second captured
 120.000 video samples per second captured
 120.000 video samples per second captured
 120.000 video samples per second captured
 120.000 video samples per second captured
 120.000 video samples per second captured

Intel Hardware M-JPEG Decoder MFT

AMD is not the only GPU vendor out there, and my development system is equipped with an integrated Intel GPU as well, so why not give it a try?

In AMD’s defense, Intel’s decoder exhibits even worse performance:

C:\...\MjpgCameraReader\bin\x64\Release>MjpgCameraReader.exe
 Using camera HD USB Camera
 Using adapter Intel(R) UHD Graphics 630
 Using video capture format 640x360@260.004 MFVideoFormat_MJPG
 Using hardware decoder MFT Intel® Hardware M-JPEG Decoder MFT
 Using video frame format 640x368@260.004 MFVideoFormat_YUY2
 24.000 video samples per second captured
 63.500 video samples per second captured
 63.500 video samples per second captured
 64.000 video samples per second captured
 63.500 video samples per second captured
 63.000 video samples per second captured
 63.500 video samples per second captured
 62.000 video samples per second captured
 63.500 video samples per second captured
 64.000 video samples per second captured
 63.500 video samples per second captured

At lower relative utilization levels and, again, without visibly hitting any bottleneck, the capture rate drops to about 63 FPS, roughly a quarter of the nominal rate.

And this happens even without the threading problem that I could at least identify in AMD’s case.

The 120 FPS mode does fine:

C:\...\MjpgCameraReader\bin\x64\Release>MjpgCameraReader.exe
 Using camera HD USB Camera
 Using adapter Intel(R) UHD Graphics 630
 Using video capture format 1280x720@120.000 MFVideoFormat_MJPG
 Using hardware decoder MFT Intel® Hardware M-JPEG Decoder MFT
 Using video frame format 1280x720@120.000 MFVideoFormat_YUY2
 77.000 video samples per second captured
 119.000 video samples per second captured
 120.000 video samples per second captured
 121.000 video samples per second captured
 119.000 video samples per second captured
 121.000 video samples per second captured
 120.000 video samples per second captured
 120.000 video samples per second captured
 120.500 video samples per second captured
 119.500 video samples per second captured
 120.000 video samples per second captured

That is, there is an obvious performance issue in Intel’s implementation: it fails to process the lower resolution signal at its original rate, and even at the rate it achieves for the higher resolution signal!

The 1920×1080@60 mode does fine too:

C:\...\MjpgCameraReader\bin\x64\Release>MjpgCameraReader.exe
 Using camera HD USB Camera
 Using adapter Intel(R) UHD Graphics 630
 Using video capture format 1920x1080@60.000 MFVideoFormat_MJPG
 Using hardware decoder MFT Intel® Hardware M-JPEG Decoder MFT
 Using video frame format 1920x1088@60.000 MFVideoFormat_YUY2
 49.500 video samples per second captured
 60.500 video samples per second captured
 59.500 video samples per second captured
 60.000 video samples per second captured
 60.000 video samples per second captured
 60.000 video samples per second captured
 60.000 video samples per second captured
 60.000 video samples per second captured
 60.000 video samples per second captured
 60.000 video samples per second captured
 60.000 video samples per second captured

In closing

The bottom line is that the hardware ASICs are generally good, but the quality of the software MFT layer is not something GPU vendors care much about.

The application below runs the test on the first available GPU and assumes you have a video capture device compatible with the Media Foundation API. It uses the camera’s highest frame rate MJPG format and the hardware decoder MFT associated with that GPU.

One more thing to mention is that video capture takes place through the so-called Microsoft Windows Camera Frame Server (FrameServer) Service, which is notorious and undocumented. Frame Server virtualizes the video capture device, adding processing overhead and cross-process synchronization.

At some point later I will compare capture performance through Frame Server against Media Foundation’s default implementation of the video capture device proxy. I expect, though, that there will be no visible performance difference, as both are, after all, implemented well.

Download links

Binaries:
