Intel Developer Zone lockout

Some time ago I found that my account at Intel® Developer Zone was disabled. It was strange, but who knows; let us assume there was a good reason.

For a moment I thought I was using the wrong credentials, but I have them saved. When the password reset email did not show up, it was a bigger surprise: at least these things were supposed to be working. The username reminder did work, and it generally confirmed that I am using the proper sign-in data.

Given that the “Contact Us” form is dedicated to login problems and, once filled in, says “Thank you for contacting Intel. Your information has been submitted and we will respond to your inquiry within 48 hours.”, they seem to be disabling accounts from time to time, and there is an emergency feedback channel for the unexpected cases.

However, I just realized that a couple of weeks or more have already passed, and there has been no response. RIP Intel Developer Zone.

Direct3D 11 render into live HLS asset

Further experiments with Direct3D 11 shadertoy rendering: HTTP Server API integration and serving parts of an HTTP Live Streaming (HLS) asset on demand, using Media Foundation with hardware video encoding. An hls.js player is capable of reading and playing the content, including stepping between quality levels.

A sort of Google Stadia for shadertoys, with video on demand and possibly low latency. Standard HLS latency (I am not following the latest HTTP/2 extensions for lower latency HLS) is of course nowhere near the real ultra-low latency that we have in Rainway for web based game streaming, which is as low as 10-20 milliseconds with HTML5 delivery. However, the approach proves that it is possible to deliver content with on demand rendering.

Perhaps it is possible to use the approach to broadcast live content with server side GPU based post processing. With a single viewer it is easy to change quality levels, because the client requests a new segment without also downloading it in another quality. Consumer grade H.264/H.265 encoders are not normally designed to encode much faster than realtime (1920×1080@100 for H.264 is something to align expectations with, perhaps with only higher end NVIDIA cards offering more), so a quality change can be handled easily, but encoding several qualities at a time might be an excessive load.
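To put the encoder budget into numbers, here is a rough sketch. It assumes that encoder load scales linearly with pixels per second, which is a simplification rather than a measured property of any encoder, and the quality ladder is hypothetical.

```python
# Rough encoder budget check; the linear pixel-throughput model is an assumption.

def pixel_rate(width, height, fps):
    """Pixels per second the encoder must process for one rendition."""
    return width * height * fps

# Reference point from the text: about 1920x1080 at 100 frames per second.
ENCODER_BUDGET = pixel_rate(1920, 1080, 100)

def encoder_load(renditions, budget=ENCODER_BUDGET):
    """Fraction of the budget consumed when all renditions are encoded at once."""
    return sum(pixel_rate(*r) for r in renditions) / budget

# Hypothetical quality ladder, all at 60 fps.
ladder = [(1920, 1080, 60), (1280, 720, 60), (854, 480, 60)]

single = encoder_load(ladder[:1])  # one quality at a time
full = encoder_load(ladder)        # all qualities simultaneously

print(f"single rendition: {single:.0%} of budget")
print(f"full ladder:      {full:.0%} of budget")
```

Under these assumptions a single 1080p60 rendition takes a bit more than half of the budget, while the three-step ladder consumes nearly all of it, leaving no headroom for a second viewer.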

The overall simplicity of HLS syntax makes it possible to format the virtual asset in a flexible way: it can be a true live asset, or it can be a static, fixed length, seek-enabled asset with on demand rendering from a randomly accessed point.
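The difference between the two flavors is small at the playlist level. Below is a minimal sketch of a media playlist generator, not the code of the tool itself: the function name and the segment URI template are made up for illustration, while the tags are standard HLS ones.

```python
import math

def media_playlist(first_index, count, duration, live, uri="segment-{index}.ts"):
    """Build HLS media playlist text.

    live=False: fixed length, seek-enabled asset (VOD type, closed with ENDLIST).
    live=True:  sliding window of segments that the player keeps re-requesting.
    The URI template is a hypothetical naming scheme.
    """
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        # Target duration must be an integer not less than any segment duration.
        f"#EXT-X-TARGETDURATION:{math.ceil(duration)}",
        f"#EXT-X-MEDIA-SEQUENCE:{first_index}",
    ]
    if not live:
        lines.append("#EXT-X-PLAYLIST-TYPE:VOD")
    for index in range(first_index, first_index + count):
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri.format(index=index))
    if not live:
        # No more segments will follow, so the player can seek freely.
        lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"

vod = media_playlist(0, 3, 2.0, live=False)
live = media_playlist(10, 3, 2.0, live=True)
print(vod)
print(live)
```

In both cases the segments themselves do not have to exist in advance: the server can render and encode a segment when its URI is requested.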

I would also like to use this opportunity to mention another beautiful shader, “The Universe Within” by Martijn “BigWings” Steinrucken, which is running in my screenshot.

The SDK API 1.9 adds SkipFrame…

If there were a prize for the messiest SDK, Intel Media SDK would be a favorite. They seem to have taken special care to make things confusing, unclear, inconvenient to use and downright perplexing.

So there is no clear signal as to which versions of the SDK support the SkipFrame field. One has to query, and the query itself is not straightforward: one needs to build a multi-piece structure with a request for multiple things, among which this field is zeroed on the way back if the functionality is not supported. That could be fine if other vendors had not shown that there are much friendlier ways to expose features to developers.

Going further: the member itself is documented as introduced in SDK version 1.9. Good to know! Let us continue reading:

The enumeration itself is available since SDK version 1.11. That’s a twist!

To summarize, it is likely to be unsafe to do anything with this functionality (a small thing among so many there) before SDK 1.9. With SDK versions 1.9 and 1.10 the values are undefined, because the enumeration was only introduced in SDK 1.11 and then extended in 1.13. Regardless of SDK version, one needs to build a query (which alone makes you feel miserable if you happen to know how capability discovery is implemented by NVIDIA), because even though the field might be known to the SDK runtime, its implementation might be missing.
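The resulting decision logic can be written down as a small sketch. It only encodes the version boundaries mentioned above and does not call the SDK: the function names are hypothetical, and `echoed_value` stands for whatever the capability query returned for the field.

```python
def skip_frame_status(major, minor):
    """Classify an SDK API version by what the text says about SkipFrame."""
    version = (major, minor)
    if version < (1, 9):
        return "field not available"
    if version < (1, 11):
        return "field present, values undefined"
    if version < (1, 13):
        return "enumeration defined"
    return "enumeration extended"

def skip_frame_usable(major, minor, echoed_value):
    """Conservative policy (an assumption, not an SDK rule): require a version
    with defined values AND a query that did not zero the field on the way back."""
    return (major, minor) >= (1, 11) and echoed_value != 0

for api in [(1, 8), (1, 9), (1, 10), (1, 11), (1, 13)]:
    print(api, skip_frame_status(*api))
```

The second function reflects the point that a new enough version alone is not sufficient: the runtime may know the field and still zero it in the query result.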

However, as often happens, there is a silver lining if you look hard enough: we have to thank Intel for the capability because AMD does not offer it at all.

Presenting Shadertoy output at low latency with DXGI and Direct3D 11

A few nice updates today for the Direct3D 11 shadertoy rendering tool I posted here earlier. As a load tool it benefits from some flexibility, and it is also a demonstration of hardware capabilities.

First of all, the -EnumerateAdapters switch is now more verbose about DXGI and prints more flags, e.g.:

Factory: DXGI_FEATURE_PRESENT_ALLOW_TEARING 1
 Adapter 0: Radeon RX 570 Series, Vendor 0x1002, Adapter 0.0000CE52, Flags DXGI_ADAPTER_FLAG3_ACG_COMPATIBLE | DXGI_ADAPTER_FLAG3_SUPPORT_MONITORED_FENCES | DXGI_ADAPTER_FLAG3_KEYED_MUTEX_CONFORMANCE
   Output 0: \\.\DISPLAY4, BitsPerColor 10, ColorSpace DXGI_COLOR_SPACE_RGB_FULL_G22_NONE_P709, HardwareCompositionSupport DXGI_HARDWARE_COMPOSITION_SUPPORT_FLAG_FULLSCREEN | DXGI_HARDWARE_COMPOSITION_SUPPORT_FLAG_CURSOR_STRETCHED
   Output 1: \\.\DISPLAY5, BitsPerColor 10, ColorSpace DXGI_COLOR_SPACE_RGB_FULL_G22_NONE_P709, HardwareCompositionSupport DXGI_HARDWARE_COMPOSITION_SUPPORT_FLAG_FULLSCREEN | DXGI_HARDWARE_COMPOSITION_SUPPORT_FLAG_CURSOR_STRETCHED
 Adapter 1: Intel(R) UHD Graphics 630, Vendor 0x8086, Adapter 0.0000C9B9, Flags DXGI_ADAPTER_FLAG3_ACG_COMPATIBLE | DXGI_ADAPTER_FLAG3_SUPPORT_MONITORED_FENCES | DXGI_ADAPTER_FLAG3_KEYED_MUTEX_CONFORMANCE
 Adapter 2: Microsoft Basic Render Driver, Vendor 0x1414, Adapter 0.0000CE2B, Flags DXGI_ADAPTER_FLAG3_SOFTWARE | DXGI_ADAPTER_FLAG3_ACG_COMPATIBLE | DXGI_ADAPTER_FLAG3_SUPPORT_MONITORED_FENCES | DXGI_ADAPTER_FLAG3_KEYED_MUTEX_CONFORMANCE
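For reading such output, here is a small sketch of how the adapter flags bit mask maps to the printed names. The numeric values follow the DXGI_ADAPTER_FLAG3 declaration in dxgi1_6.h; the helper functions and the vendor table are illustrative and not part of the tool.

```python
from enum import IntFlag

class AdapterFlag3(IntFlag):
    # Values as declared for DXGI_ADAPTER_FLAG3 in dxgi1_6.h
    REMOTE = 0x01
    SOFTWARE = 0x02
    ACG_COMPATIBLE = 0x04
    SUPPORT_MONITORED_FENCES = 0x08
    SUPPORT_NON_MONITORED_FENCES = 0x10
    KEYED_MUTEX_CONFORMANCE = 0x20

# Well-known PCI vendor identifiers
VENDORS = {0x1002: "AMD", 0x10DE: "NVIDIA", 0x8086: "Intel", 0x1414: "Microsoft"}

def describe_flags(value):
    """Format a raw flags value the way the listing above prints it."""
    flags = AdapterFlag3(value)
    names = [f"DXGI_ADAPTER_FLAG3_{flag.name}" for flag in AdapterFlag3 if flag in flags]
    return " | ".join(names) or "DXGI_ADAPTER_FLAG3_NONE"

def is_hardware_adapter(value):
    """Software adapters, such as Microsoft Basic Render Driver, carry the SOFTWARE bit."""
    return not (AdapterFlag3(value) & AdapterFlag3.SOFTWARE)

print(VENDORS[0x1002], describe_flags(0x2C), is_hardware_adapter(0x2C))
print(VENDORS[0x1414], describe_flags(0x2E), is_hardware_adapter(0x2E))
```

A load tool would typically skip adapters with the SOFTWARE flag when picking a default.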

I also added a few more shadertoy options, now available through the -ShaderIndex switch with values 0..3. The value of 3 picks an HLSL version of Neon Tunnel by alro. Apart from being cool, this shader is really simple and lightweight, so it enables high FPS rendering while still offering smooth, perceptible motion in view.

The new switch -OutputMode 1 enables a textual overlay in the top left corner. It is implemented using Direct2D with DirectWrite, and interop with Direct3D 11, to cooperatively render and print over graphics before sending frames to presentation.

With shader rendering, interop with the overlay and a really simple single threaded design, the process reaches pretty high frame rates.

The tool now implements better support for variable refresh rate monitors and low latency presentation. The latency in windowed mode falls below two frame sync intervals (on Radeon RX 570 Series; use PresentMon to collect statistics):

(DXGI): SyncInterval=0 Flags=512 1.81 ms/frame (551.3 fps, 59.8 fps displayed, 25.39 ms latency) Composed: Flip

With a 144 Hz monitor, it can still be under 2/144 of a second:

(DXGI): SyncInterval=0 Flags=512 0.68 ms/frame (1469.1 fps, 131.3 fps displayed, 13.45 ms latency) Composed: Flip  
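The claim about two sync intervals can be checked against the two summary lines above with a short sketch. The regular expression is written for exactly this line format, and the 60 Hz refresh rate of the first monitor is an assumption derived from the 59.8 fps displayed. Flags=512 is 0x200, the value of DXGI_PRESENT_ALLOW_TEARING.

```python
import re

PATTERN = re.compile(
    r"SyncInterval=(?P<sync>\d+) Flags=(?P<flags>\d+) "
    r"(?P<frame_ms>[\d.]+) ms/frame "
    r"\((?P<fps>[\d.]+) fps, (?P<displayed>[\d.]+) fps displayed, "
    r"(?P<latency_ms>[\d.]+) ms latency\)"
)

def parse(line):
    """Extract numeric fields from a summary line like the ones quoted above."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("unrecognized line format")
    fields = match.groupdict()
    return {
        "sync": int(fields["sync"]),
        "flags": int(fields["flags"]),
        "frame_ms": float(fields["frame_ms"]),
        "fps": float(fields["fps"]),
        "displayed": float(fields["displayed"]),
        "latency_ms": float(fields["latency_ms"]),
    }

def within_two_intervals(latency_ms, refresh_hz):
    """True if latency is below two vertical sync intervals of the monitor."""
    return latency_ms < 2 * 1000.0 / refresh_hz

line_60 = "(DXGI): SyncInterval=0 Flags=512 1.81 ms/frame (551.3 fps, 59.8 fps displayed, 25.39 ms latency) Composed: Flip"
line_144 = "(DXGI): SyncInterval=0 Flags=512 0.68 ms/frame (1469.1 fps, 131.3 fps displayed, 13.45 ms latency) Composed: Flip"

for line, hz in [(line_60, 60), (line_144, 144)]:
    stats = parse(line)
    print(hz, stats["latency_ms"], round(2 * 1000.0 / hz, 2),
          within_two_intervals(stats["latency_ms"], hz))
```

Two intervals are about 33.33 ms at 60 Hz and about 13.89 ms at 144 Hz, so both measurements fit.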

When in fullscreen mode, direct flipping mode results in really low latency of just a few milliseconds.

(DXGI): SyncInterval=0 Flags=0 1.82 ms/frame (550.9 fps, 550.9 fps displayed, 3.57 ms latency) Hardware Composed: Independent Flip

Using the -WaitableObject 3 switch in windowed mode to enable DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT shows that a single threaded rendering loop indeed spends some time sleeping, with a respective reduction in GPU engine load and frame rate.

Have fun!

Download links

Binaries:

  • 64-bit: RenderSc.exe (in .7z archive)
  • License: This software is free to use

Sakura Bliss version of Direct3D 11 rendering and presentation load tool

Today I am sharing another tool, which I use to simulate load from time to time (such as load against video/desktop streaming through Rainway).

The application renders an HLSL variant of the Sakura Bliss shadertoy (by Philippe Desgranges) using a Direct3D 11 swap chain in DXGI_SWAP_EFFECT_FLIP_DISCARD mode.

For starters, the shader is mesmerizing on its own, fantastic work!

Just as with the earlier application with another shadertoy, the HLSL source code can be extracted from application resources, and the shaders are compiled at runtime.

The application offers some important command line switches to configure the workload as needed.

Syntax: RenderSc [options]

Options: 
  -DisableDebugOutput - Disable forward of debug output to console output in debug mode (should appear before -Debug)
  -Debug[:<Normal|Full|<Value>>] - Enable self-debugging capability with specific minidump type
  -EnumerateAdapters - Enumerate DXGI adapters and exit
  -AdapterIndex <index> - Specify DXGI adapter index (default is 0)
  -OutputIndex <index> - Specify DXGI output index (default is 0)
  -Resolution <width> <height> - Specify resolution of generated video (default is 1920 x 1080)
  -Format <format> - Specify DXGI format (b8g8r8a8, r8g8b8a8, r10g10b10a2, r16g16b16a16; default is b8g8r8a8)
  -SwapChainBufferCount <count> - Specify DXGI swapchain buffer count (default is 2)
  -Fullscreen - Start in full screen mode (otherwise use Alt+Enter to switch)
  -Rate <numerator> <denominator> - Specify frame rate for fullscreen mode (default is 144 Hz)
  -PresentSyncInterval <interval> - Use specific presentation synchronization interval (default is 0)

Full-screen mode can be requested from command line as well as enabled or disabled by Alt+Enter.

It is possible to configure some important parameters, which you should be aware of from the MSDN documentation on DXGI and Direct3D 11. One specific thing to mention is that it is possible to request DXGI_FORMAT_R10G10B10A2_UNORM and DXGI_FORMAT_R16G16B16A16_FLOAT formats. To reduce the amount of rendering, the -PresentSyncInterval 1 parameter can be used: it defines the first argument to the IDXGISwapChain::Present call.
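As a rough illustration of what these settings mean for the workload, here is a sketch that estimates the raw size of the swap chain buffers for each -Format value and the presentation rate cap implied by a sync interval. The helper names are hypothetical, and real driver allocations may be larger because of alignment and padding.

```python
# Bits per pixel of the DXGI formats behind the -Format values
BITS_PER_PIXEL = {
    "b8g8r8a8": 32,      # DXGI_FORMAT_B8G8R8A8_UNORM
    "r8g8b8a8": 32,      # DXGI_FORMAT_R8G8B8A8_UNORM
    "r10g10b10a2": 32,   # DXGI_FORMAT_R10G10B10A2_UNORM
    "r16g16b16a16": 64,  # DXGI_FORMAT_R16G16B16A16_FLOAT
}

def swap_chain_bytes(width, height, fmt, buffer_count=2):
    """Raw pixel data size of all swap chain buffers (ignores driver padding)."""
    return width * height * BITS_PER_PIXEL[fmt] // 8 * buffer_count

def presented_rate_cap(refresh_hz, sync_interval):
    """Upper bound on presented frames per second.

    Interval 0 presents without waiting for vertical blank, so there is no cap;
    interval n presents at most once per n vertical blanks.
    """
    if sync_interval == 0:
        return None
    return refresh_hz / sync_interval

print(swap_chain_bytes(1920, 1080, "b8g8r8a8"))
print(swap_chain_bytes(1920, 1080, "r16g16b16a16"))
print(presented_rate_cap(144, 1))
```

With the defaults (1920 x 1080, two buffers) the 16-bit float format doubles the amount of buffer data compared to the 8-bit one, and a sync interval of 1 ties rendering to the monitor refresh rate instead of running as fast as the GPU allows.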

Download links

Binaries:

  • 64-bit: RenderSc.exe (in .7z archive)
  • License: This software is free to use

A few interesting observations about NVIDIA Turing video encoders

| GPU | NVIDIA GeForce GTX 1080 Ti | NVIDIA GeForce GTX 1660 Ti |
|---|---|---|
| Microarchitecture | Pascal | Turing |
| H.264 NV_ENC_CAPS_SUPPORT_FIELD_ENCODING | Yes | No |
| H.264 NV_ENC_PRESET_HP_GUID entropyCodingMode | NV_ENC_H264_ENTROPY_CODING_MODE_CAVLC | NV_ENC_H264_ENTROPY_CODING_MODE_CABAC |
| H.265/HEVC NV_ENC_CAPS_NUM_MAX_BFRAMES | 0 | 5 |
| H.265/HEVC NV_ENC_CAPS_SUPPORT_TEMPORAL_AQ | No | Yes |

Apart from the capabilities, a whitepaper mentions these H.265/HEVC improvements:

Turing GPUs also ship with an enhanced NVENC encoder unit that adds support for H.265 (HEVC) 8K encode at 30 fps. The new NVENC encoder provides up to 25% bitrate savings for HEVC and up to 15% bitrate savings for H.264.