{"id":1996,"date":"2020-01-18T12:34:00","date_gmt":"2020-01-18T10:34:00","guid":{"rendered":"https:\/\/alax.info\/blog\/?p=1996"},"modified":"2020-01-19T11:47:46","modified_gmt":"2020-01-19T09:47:46","slug":"on-efficiency-of-hardware-assisted-jpeg-decoding-amd-mft-mjpeg-decoder","status":"publish","type":"post","link":"https:\/\/alax.info\/blog\/1996","title":{"rendered":"On efficiency of hardware-assisted JPEG decoding (AMD MFT MJPEG Decoder)"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">The <a href=\"https:\/\/alax.info\/blog\/1992\">previous post<\/a> focused on problems with the hardware MFT decoder provided as part of the video driver package. This time I am going to share some data on how the inefficiency affects the performance of video capture, using a high frame rate 260 FPS camera as a test stand. Apparently the effect is more visible at high frame rates because CPU and GPU hardware is already fast enough to process a less complicated signal.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">There is already <a href=\"https:\/\/github.com\/GPUOpen-LibrariesAndSDKs\/AMF\/issues\/198\">some interest from AMD&#8217;s end<\/a> (why this is exceptional in itself deserves a separate post), and some bug fixes are already under way.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The performance problem is easy to miss because the decoder overall performs without fatal issues and provides the expected output: no failures, no error codes, no deadlocks, and neither the CPU nor the GPU engine is maxed out, so things look more or less fine at first glance&#8230; The test application uses Media Foundation and the <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/medfound\/source-reader\">Source Reader API<\/a> to read textures in hardware MFT enabled mode and discards the textures, just printing out the frame rate.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">AMD MFT MJPEG Decoder<\/h2>\n\n\n\n<pre 
class=\"wp-block-preformatted\">C:\\...\\MjpgCameraReader\\bin\\x64\\Release>MjpgCameraReader.exe\n Using camera HD USB Camera\n Using adapter Radeon RX 570 Series\n Using video capture format 640x360@260.004 MFVideoFormat_MJPG\n Using hardware decoder MFT AMD MFT MJPEG Decoder\n Using video frame format 640x384@260.004 MFVideoFormat_YUY2\n 72.500 video samples per second captured\n<strong> 134.000 video samples per second captured\n 135.000 video samples per second captured\n 134.500 video samples per second captured\n<\/strong> 135.500 video samples per second captured\n 134.000 video samples per second captured\n 134.000 video samples per second captured\n 135.000 video samples per second captured\n 134.500 video samples per second captured\n 133.500 video samples per second captured\n 134.000 video samples per second captured<\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/amd.png\"><img loading=\"lazy\" decoding=\"async\" width=\"650\" height=\"600\" src=\"https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/amd-650x600.png\" alt=\"\" class=\"wp-image-1997\" srcset=\"https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/amd-650x600.png 650w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/amd-320x295.png 320w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/amd-768x709.png 768w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/amd-1536x1418.png 1536w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/amd-600x554.png 600w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/amd-1040x960.png 1040w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/amd.png 1550w\" sizes=\"auto, (max-width: 650px) 100vw, 650px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">With no sign of hitting a bottleneck the reader process produces ~134 FPS from the video capture device.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Alax.Info MJPG 
Video Decoder for AMD Hardware<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">My replacement for the hardware decoder MFT decodes the same signal and generally shares a lot with AMD&#8217;s own decoder: both MFTs are built on top of the <a href=\"https:\/\/github.com\/GPUOpen-LibrariesAndSDKs\/AMF\">Advanced Media Framework (AMF) SDK<\/a>. The driver package installs the runtime for this SDK and also installs a decoder MFT which is linked against a copy of the runtime (according to an AMD representative, the statically linked copy shares the same codebase).<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">C:\\...\\MjpgCameraReader\\bin\\x64\\Release>MjpgCameraReader.exe\n Using camera HD USB Camera\n Using adapter Radeon RX 570 Series\n Using video capture format 640x360@260.004 MFVideoFormat_MJPG\n Using substitute decoder Alax.Info MJPG Video Decoder for AMD Hardware\n Using video frame format 640x360@260.004 MFVideoFormat_YUY2\n 74.000 video samples per second captured\n<strong> 261.000 video samples per second captured\n 261.000 video samples per second captured\n 261.000 video samples per second captured\n<\/strong> 261.000 video samples per second captured\n 260.500 video samples per second captured\n 261.000 video samples per second captured\n 261.000 video samples per second captured\n 261.000 video samples per second captured\n 261.000 video samples per second captured\n 260.500 video samples per second captured<\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/amd_substitute.png\"><img loading=\"lazy\" decoding=\"async\" width=\"650\" height=\"600\" src=\"https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/amd_substitute-650x600.png\" alt=\"\" class=\"wp-image-1998\" srcset=\"https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/amd_substitute-650x600.png 650w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/amd_substitute-320x295.png 320w, 
https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/amd_substitute-768x709.png 768w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/amd_substitute-1536x1418.png 1536w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/amd_substitute-600x554.png 600w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/amd_substitute-1040x960.png 1040w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/amd_substitute.png 1550w\" sizes=\"auto, (max-width: 650px) 100vw, 650px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">CPU and GPU utilization levels are similar, yet the frame rate is higher. In fact, it is the expected frame rate, because this is the rate the camera is supposed to operate at.<\/p>\n\n\n\n<!--more-->\n\n\n\n<h2 class=\"wp-block-heading\">1280&#215;720@120 Mode<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Interestingly, in the lower FPS mode the AMD MFT threading issues are still present and, moreover, the MFT exhibits two other issues (one of them is a &#8220;just ignore&#8221; one, per AMD&#8217;s comment). 
At the same time, the video capture rate is no longer reduced: the horsepower of the hardware hides the implementation inefficiency.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"> Using camera HD USB Camera\n Using adapter Radeon RX 570 Series\n Using video capture format 1280x720@120.000 MFVideoFormat_MJPG\n Using hardware decoder MFT AMD MFT MJPEG Decoder\n Using video frame format 1280x736@120.000 MFVideoFormat_YUY2\n 18.500 video samples per second captured\n<strong> 120.000 video samples per second captured\n 120.000 video samples per second captured\n 120.000 video samples per second captured\n<\/strong> 120.000 video samples per second captured\n 120.000 video samples per second captured\n 120.000 video samples per second captured\n 120.000 video samples per second captured\n 120.000 video samples per second captured<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Intel Hardware M-JPEG Decoder MFT<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">AMD is not the only GPU vendor out there, and my development system is equipped with an integrated GPU from Intel as well, so why not give it a try?<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In AMD&#8217;s defence, Intel&#8217;s decoder also exhibits subpar performance:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">C:\\...\\MjpgCameraReader\\bin\\x64\\Release>MjpgCameraReader.exe\n Using camera HD USB Camera\n Using adapter Intel(R) UHD Graphics 630\n Using video capture format 640x360@260.004 MFVideoFormat_MJPG\n Using hardware decoder MFT Intel\u00ae Hardware M-JPEG Decoder MFT\n Using video frame format 640x368@260.004 MFVideoFormat_YUY2\n 24.000 video samples per second captured\n<strong> 63.500 video samples per second captured\n 63.500 video samples per second captured\n 64.000 video samples per second captured\n<\/strong> 63.500 video samples per second captured\n 63.000 video samples per second captured\n 63.500 video samples per second captured\n 62.000 video samples per second captured\n 63.500 
video samples per second captured\n 64.000 video samples per second captured\n 63.500 video samples per second captured<\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"650\" height=\"600\" src=\"https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/intel-650x600.png\" alt=\"\" class=\"wp-image-1999\" srcset=\"https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/intel-650x600.png 650w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/intel-320x295.png 320w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/intel-768x709.png 768w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/intel-1536x1418.png 1536w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/intel-600x554.png 600w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/intel-1040x960.png 1040w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/intel.png 1550w\" sizes=\"auto, (max-width: 650px) 100vw, 650px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">At lower relative utilization levels and, again, without visibly hitting any bottleneck, the capture rate is reduced.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">And this happens even without the threading problem that I could at least see in AMD&#8217;s case.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The 120 FPS mode does well:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">C:\\...\\MjpgCameraReader\\bin\\x64\\Release>MjpgCameraReader.exe\n Using camera HD USB Camera\n Using adapter Intel(R) UHD Graphics 630\n Using video capture format 1280x720@120.000 MFVideoFormat_MJPG\n Using hardware decoder MFT Intel\u00ae Hardware M-JPEG Decoder MFT\n Using video frame format 1280x720@120.000 MFVideoFormat_YUY2\n 77.000 video samples per second captured\n<strong> 119.000 video samples per second captured\n 120.000 video samples per second captured\n 121.000 video samples per second captured\n<\/strong> 119.000 video samples per second 
captured\n 121.000 video samples per second captured\n 120.000 video samples per second captured\n 120.000 video samples per second captured\n 120.500 video samples per second captured\n 119.500 video samples per second captured\n 120.000 video samples per second captured<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">That is, there is an obvious performance issue in Intel&#8217;s implementation, since it fails to process the lower resolution signal at its original rate, or even at the rate it achieves for the higher resolution signal!<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">So does 1920&#215;1080@60:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">C:\\...\\MjpgCameraReader\\bin\\x64\\Release>MjpgCameraReader.exe\n Using camera HD USB Camera\n Using adapter Intel(R) UHD Graphics 630\n Using video capture format 1920x1080@60.000 MFVideoFormat_MJPG\n Using hardware decoder MFT Intel\u00ae Hardware M-JPEG Decoder MFT\n Using video frame format 1920x1088@60.000 MFVideoFormat_YUY2\n 49.500 video samples per second captured\n<strong> 60.500 video samples per second captured\n 59.500 video samples per second captured\n 60.000 video samples per second captured\n<\/strong> 60.000 video samples per second captured\n 60.000 video samples per second captured\n 60.000 video samples per second captured\n 60.000 video samples per second captured\n 60.000 video samples per second captured\n 60.000 video samples per second captured\n 60.000 video samples per second captured<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">In closing<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The bottom line is that hardware ASICs are generally good, but the quality of the software MFT layer is not something GPU vendors care much about.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The application below does the testing on the first available GPU, and it assumes you have a video capture device compatible with the Media Foundation API. 
The application uses the highest frame rate MJPG format of the camera and a hardware decoder MFT associated with the GPU.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">One more thing to mention is that video capture takes place through the so-called Microsoft Windows Camera Frame Server (FrameServer) Service, which is <a href=\"https:\/\/social.msdn.microsoft.com\/Forums\/windowsdesktop\/en-US\/9d6a8704-764f-46df-a41c-8e9d84f7f0f3\/mjpg-encoded-media-type-is-not-available-for-usbuvc-webcameras-after-windows-10-version-1607-os\">notorious<\/a> and undocumented. Frame Server virtualizes the video capture device, adding processing overhead and cross-process synchronization.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Some time later I will compare the performance of capturing with Frame Server and with the Media Foundation default implementation of the video capture device proxy. I expect, though, that there is no visible performance difference, as those are, after all, done well.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Download links<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Binaries:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>64-bit:&nbsp;<a href=\"https:\/\/alax.info\/blog\/wp-content\/uploads\/2020\/01\/MjpgCameraReader.exe_.7z\">MjpgCameraReader.exe<\/a>&nbsp;(in .7z archive)<\/li><li>License: This software is free to use<\/li><\/ul>\n","protected":false},"excerpt":{"rendered":"<p>The previous post was focusing on problems with the hardware MFT decoder provided as a part of video driver package. This time I am going to mention some data about how the inefficiency affects performance of video capture using a high frame rate 260 FPS camera as a test stand. 
Apparently the effect is better&hellip; <\/p>\n<p><a class=\"moretag\" href=\"https:\/\/alax.info\/blog\/1996\">Read the full article<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[553,82,162,188,424,426,486],"class_list":["post-1996","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-amd","tag-camera","tag-capture","tag-intel","tag-media-foundation","tag-mft","tag-video"],"_links":{"self":[{"href":"https:\/\/alax.info\/blog\/wp-json\/wp\/v2\/posts\/1996","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/alax.info\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/alax.info\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/alax.info\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/alax.info\/blog\/wp-json\/wp\/v2\/comments?post=1996"}],"version-history":[{"count":0,"href":"https:\/\/alax.info\/blog\/wp-json\/wp\/v2\/posts\/1996\/revisions"}],"wp:attachment":[{"href":"https:\/\/alax.info\/blog\/wp-json\/wp\/v2\/media?parent=1996"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/alax.info\/blog\/wp-json\/wp\/v2\/categories?post=1996"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/alax.info\/blog\/wp-json\/wp\/v2\/tags?post=1996"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}