{"id":1586,"date":"2016-01-09T20:24:02","date_gmt":"2016-01-09T18:24:02","guid":{"rendered":"https:\/\/alax.info\/blog\/?p=1586"},"modified":"2016-01-10T00:33:17","modified_gmt":"2016-01-09T22:33:17","slug":"encoding-h-264-video-using-hardware-mfts","status":"publish","type":"post","link":"https:\/\/alax.info\/blog\/1586","title":{"rendered":"Encoding H.264 video using hardware MFTs"},"content":{"rendered":"<p>Some time ago <a href=\"https:\/\/alax.info\/blog\/1394\">there were some pictures<\/a> explaining performance and other properties of a software H.264 encoder (x264). This time it is the turn of hardware H.264 encoders, two of them, side by side. Neither encoder is new: Intel\u00ae Quick Sync Video H.264 Encoder and NVIDIA H.264 Encoder have already been around for a while. Some would say it is already time for H.265 encoders.<\/p>\n<p>Either way, on my test machine both encoders are available without additionally installed software (that is, no need for Intel Media SDK, NVIDIA NVENC, redistributable files etc.). Out of the box, Windows 10 offers a stock software-only encoder, and hardware encoders in the form of a <a href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/windows\/desktop\/ms703138\">Media Foundation Transform (MFT)<\/a>.<\/p>\n<p>Environment:<\/p>\n<ul>\n<li>OS: Windows 10 Pro<\/li>\n<li>CPU: Intel i7-4790<\/li>\n<li>Video Adapter 1: Intel HD Graphics 4600 (on-board, not connected to monitors)<\/li>\n<li>Video Adapter 2: NVIDIA GeForce GTX 750<\/li>\n<\/ul>\n<p>It is not convenient or fun to do things with Media Foundation, but the good news is that Media Foundation components are well separable. A wrapper that converts MFTs into DirectShow filters makes them available to DirectShow, where it is much easier to run various tests. The pictures below show metrics for encoder defaults (bitrate, profile and the many other options that create a great deal of encoding modes). 
Still, the pictures show that both encoders are well suited to many scenarios, including HD processing, simultaneous data processing etc.<\/p>\n<p><a href=\"https:\/\/alax.info\/blog\/wp-content\/uploads\/2016\/01\/Clipbrd3.png\" rel=\"attachment wp-att-1587\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1587\" src=\"https:\/\/alax.info\/blog\/wp-content\/uploads\/2016\/01\/Clipbrd3.png\" alt=\"Video Encoder MFT Wrapper in GraphStudioNext\" width=\"599\" height=\"103\" srcset=\"https:\/\/alax.info\/blog\/wp-content\/uploads\/2016\/01\/Clipbrd3.png 599w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2016\/01\/Clipbrd3-320x55.png 320w\" sizes=\"auto, (max-width: 599px) 100vw, 599px\" \/><\/a><\/p>\n<p>Test runs are as simple as taking a <a href=\"https:\/\/alax.info\/blog\/1553\">reference video source signal<\/a> of different properties, pushing it through the encoder filter and writing either to a file (to inspect the footage) or to the <a href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/windows\/desktop\/dd390934\">Null Renderer Filter<\/a> to measure performance.<\/p>\n<p>Intel\u00ae Quick Sync Video H.264 Encoder produces files like these: <a href=\"https:\/\/alax.info\/blog\/wp-content\/uploads\/2016\/01\/720x480.mp4\">720&#215;480.mp4<\/a>, <a href=\"https:\/\/alax.info\/blog\/wp-content\/uploads\/2016\/01\/2556x1440.mp4\">2556&#215;1440.mp4<\/a>, which are of decent quality (considering the low bitrate and &#8220;hard to handle&#8221; background changes). NVIDIA H.264 Encoder produces somewhat better output, supposedly by choosing a higher bitrate. Either way, both encoders offer a number of ways to fine-tune the encoding process. 
These include not just bitrate, profile, GOP length and B-frame settings, but even more sophisticated parameters.<\/p>\n<h3>Intel\u00ae Quick Sync Video H.264 Encoder MFT<\/h3>\n<pre><code>CODECAPI_AVEncCommonRateControlMode: VT_UI4 0, default VT_UI4 0, modifiable \/\/ eAVEncCommonRateControlMode_CBR = 0\r\nCODECAPI_AVEncCommonQuality: minimal VT_UI4 0, maximal VT_EMPTY, step VT_EMPTY\r\nCODECAPI_AVEncCommonBufferSize: VT_UI4 3131961357, default VT_UI4 0, modifiable\r\nCODECAPI_AVEncCommonMaxBitRate: default VT_UI4 0\r\nCODECAPI_AVEncCommonMeanBitRate: VT_UI4 3131961357, default VT_UI4 2222000, modifiable\r\nCODECAPI_AVEncCommonQualityVsSpeed: VT_UI4 50, default VT_UI4 50, modifiable\r\nCODECAPI_AVEncH264CABACEnable: modifiable\r\nCODECAPI_AVEncMPVDefaultBPictureCount: VT_UI4 0, default VT_UI4 0, modifiable\r\nCODECAPI_AVEncMPVGOPSize: VT_UI4 128, default VT_UI4 128, modifiable\r\nCODECAPI_AVEncVideoEncodeQP: \r\nCODECAPI_AVEncVideoForceKeyFrame: VT_UI4 0, default VT_UI4 0, modifiable\r\nCODECAPI_AVLowLatencyMode: VT_BOOL 0, default VT_BOOL 0, modifiable\r\nCODECAPI_AVEncVideoLTRBufferControl: VT_UI4 65536, values { VT_UI4 65536, VT_UI4 65537, VT_UI4 65538, VT_UI4 65539, VT_UI4 65540, VT_UI4 65541, VT_UI4 65542, VT_UI4 65543, VT_UI4 65544, VT_UI4 65545, VT_UI4 65546, VT_UI4 65547, VT_UI4 65548, VT_UI4 65549, VT_UI4 65550, VT_UI4 65551, VT_UI4 65552 }, modifiable\r\nCODECAPI_AVEncVideoMarkLTRFrame: \r\nCODECAPI_AVEncVideoUseLTRFrame: \r\nCODECAPI_AVEncVideoEncodeFrameTypeQP: default VT_UI8 111670853658, minimal VT_UI8 0, maximal VT_UI8 219046674483, step VT_UI8 1\r\nCODECAPI_AVEncSliceControlMode: VT_UI4 0, default VT_UI4 2, minimal VT_UI4 2, maximal VT_UI4 2, step VT_UI4 0, modifiable\r\nCODECAPI_AVEncSliceControlSize: VT_UI4 0, minimal VT_UI4 0, maximal VT_UI4 8160, step VT_UI4 1, modifiable\r\nCODECAPI_AVEncVideoMaxNumRefFrame: minimal VT_UI4 0, maximal VT_UI4 16, step VT_UI4 1, modifiable\r\nCODECAPI_AVEncVideoTemporalLayerCount: default VT_UI4 1, minimal VT_UI4 1, maximal 
VT_UI4 3, step VT_UI4 1, modifiable\r\nCODECAPI_AVEncMPVDefaultBPictureCount: VT_UI4 0, default VT_UI4 0, modifiable\r\n<\/code><\/pre>\n<h3>NVIDIA H.264 Encoder MFT<\/h3>\n<pre><code>CODECAPI_AVEncCommonRateControlMode: VT_UI4 0\r\nCODECAPI_AVEncCommonQuality: VT_UI4 65\r\nCODECAPI_AVEncCommonBufferSize: VT_UI4 8923353\r\nCODECAPI_AVEncCommonMaxBitRate: VT_UI4 8923353\r\nCODECAPI_AVEncCommonMeanBitRate: VT_UI4 2974451\r\nCODECAPI_AVEncCommonQualityVsSpeed: VT_UI4 33\r\nCODECAPI_AVEncH264CABACEnable: VT_BOOL -1\r\nCODECAPI_AVEncMPVGOPSize: VT_UI4 50\r\nCODECAPI_AVEncVideoEncodeQP: VT_UI8 26\r\nCODECAPI_AVEncVideoForceKeyFrame: \r\nCODECAPI_AVEncVideoMinQP: VT_UI4 0, minimal VT_UI4 0, maximal VT_UI4 51, step VT_UI4 1\r\nCODECAPI_AVLowLatencyMode: VT_BOOL 0\r\nCODECAPI_AVEncVideoLTRBufferControl: VT_UI4 0, values { VT_I4 65537, VT_I4 65538 }\r\nCODECAPI_AVEncVideoMarkLTRFrame: \r\nCODECAPI_AVEncVideoUseLTRFrame: \r\nCODECAPI_AVEncVideoEncodeFrameTypeQP: VT_UI8 111670853658\r\nCODECAPI_AVEncSliceControlMode: VT_UI4 2, minimal VT_UI4 0, maximal VT_UI4 2, step VT_UI4 1\r\nCODECAPI_AVEncSliceControlSize: VT_UI4 0, minimal VT_UI4 0, maximal VT_UI4 3, step VT_UI4 1\r\nCODECAPI_AVEncVideoMaxNumRefFrame: VT_UI4 1, minimal VT_UI4 0, maximal VT_UI4 16, step VT_UI4 1\r\nCODECAPI_AVEncVideoMeanAbsoluteDifference: VT_UI4 0\r\nCODECAPI_AVEncVideoMaxQP: VT_UI4 51, minimal VT_UI4 0, maximal VT_UI4 51, step VT_UI4 1\r\nCODECAPI_AVEncVideoROIEnabled: VT_UI4 0\r\nCODECAPI_AVEncVideoTemporalLayerCount: minimal VT_UI4 1, maximal VT_UI4 3, step VT_UI4 1\r\n<\/code><\/pre>\n<p>An important property of a hardware encoder is that, even though it does consume some CPU time, most of the complexity is offloaded to video hardware. 
In all single-stream test runs, the CPU (four cores, eight logical processors) was loaded no more than 30%, including the time required to synthesize the image using <a href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/windows\/desktop\/ee719902\">WIC<\/a> and <a href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/windows\/desktop\/dd370990\">Direct2D<\/a> and to convert it to YUV format on the CPU. That is, offloading video encoding to the GPU is a convenient way to free the CPU for real-time video processing applications.<\/p>\n<p>I was mostly interested in how well the encoders can process real-time data, especially when used to record lengthy sessions. Both encoders appear to be fast enough to handle 1920&#215;1080 HD video at frame rates of 60 and higher. The test encoded at the highest rate possible, and the 100% mark on the charts corresponds to the situation where it took one second to synthesize and encode one second of video, no matter what the effective CPU\/GPU load is. That is, values less than 100% indicate the ability to encode video content in real time.<\/p>\n<p><a href=\"https:\/\/alax.info\/blog\/wp-content\/uploads\/2016\/01\/Clipbrd4.png\" rel=\"attachment wp-att-1588\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-1588\" src=\"https:\/\/alax.info\/blog\/wp-content\/uploads\/2016\/01\/Clipbrd4-678x600.png\" alt=\"Intel and NVidia Hardware H.264 Encoders Side by Side\" width=\"625\" height=\"553\" srcset=\"https:\/\/alax.info\/blog\/wp-content\/uploads\/2016\/01\/Clipbrd4-678x600.png 678w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2016\/01\/Clipbrd4-320x283.png 320w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2016\/01\/Clipbrd4-624x552.png 624w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2016\/01\/Clipbrd4.png 763w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/p>\n<p>Basically, the numbers show that both encoders are fast enough to reliably encode a 1080p60 stream.<\/p>\n<p>Looking at it from 
another standpoint, that of processing two or more H.264 encoding sessions at once, the encoder from NVIDIA has an important limitation of two sessions per system (<a href=\"https:\/\/devtalk.nvidia.com\/default\/topic\/800942\/session-count-limitation-for-nvenc-no-maxwell-gpus-with-2-nevenc-sessions-\/\">supposedly related thread<\/a> &#8211; for this or another reason, a test run with three streams fails).<\/p>\n<p><a href=\"https:\/\/alax.info\/blog\/wp-content\/uploads\/2016\/01\/Clipbrd6.png\" rel=\"attachment wp-att-1589\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-1589\" src=\"https:\/\/alax.info\/blog\/wp-content\/uploads\/2016\/01\/Clipbrd6-690x600.png\" alt=\"Intel and NVidia H.264 Encoders in Concurrent Encoding\" width=\"625\" height=\"543\" srcset=\"https:\/\/alax.info\/blog\/wp-content\/uploads\/2016\/01\/Clipbrd6-690x600.png 690w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2016\/01\/Clipbrd6-320x278.png 320w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2016\/01\/Clipbrd6-768x668.png 768w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2016\/01\/Clipbrd6-624x543.png 624w, https:\/\/alax.info\/blog\/wp-content\/uploads\/2016\/01\/Clipbrd6.png 806w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/p>\n<p>Both encoders are hardly suitable for reliable encoding of two 1080p60 streams simultaneously (though some fine-tuning, such as choosing an appropriate encoding mode, might make things faster). However, both look fine for encoding streams of 1080p and lower resolution. Clearly, Intel&#8217;s encoder can be used to encode multiple low-resolution streams in parallel, or to mix real-time encoding with background encoding (provided that background encoding is throttled to let the real-time stream run fast enough). 
If real-time encoding is not necessary, both encoders can do the job as well. With NVIDIA, the application needs to make sure that no more than two sessions run simultaneously, while Intel&#8217;s encoder can be used in a more flexible way.<\/p>\n<p>Also, NVIDIA&#8217;s encoder is slightly faster; however, Intel&#8217;s allows three or more concurrently encoded streams and also accepts RGB input directly, without conversion to YUV.<\/p>\n<p>There is also an Intel\u00ae Hardware H265 Encoder MFT available for H.265 encoding, but that is a story for some other time.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Some time ago there were some pictures explaining performance and other properties of a software H.264 encoder (x264). This time it is the turn of hardware H.264 encoders, two of them, side by side. Neither encoder is new: Intel\u00ae Quick Sync Video H.264 Encoder and NVIDIA H.264 Encoder have already&hellip; <\/p>\n<p><a class=\"moretag\" href=\"https:\/\/alax.info\/blog\/1586\">Read the full 
article<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[78,505,379,504,390,424,506,486],"class_list":["post-1586","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-directshow","tag-gpu","tag-h-264","tag-h-265","tag-hardware","tag-media-foundation","tag-realtime","tag-video"],"_links":{"self":[{"href":"https:\/\/alax.info\/blog\/wp-json\/wp\/v2\/posts\/1586","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/alax.info\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/alax.info\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/alax.info\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/alax.info\/blog\/wp-json\/wp\/v2\/comments?post=1586"}],"version-history":[{"count":0,"href":"https:\/\/alax.info\/blog\/wp-json\/wp\/v2\/posts\/1586\/revisions"}],"wp:attachment":[{"href":"https:\/\/alax.info\/blog\/wp-json\/wp\/v2\/media?parent=1586"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/alax.info\/blog\/wp-json\/wp\/v2\/categories?post=1586"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/alax.info\/blog\/wp-json\/wp\/v2\/tags?post=1586"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}