Presenting Shadertoy output at low latency with DXGI and Direct3D 11

A few nice updates today for the Direct3D 11 Shadertoy rendering tool I posted here earlier. As a load tool, it benefits from some flexibility, and it also serves as a demonstration of hardware capabilities.

First of all, the -EnumerateAdapters switch is now more verbose about DXGI and prints more flags, e.g.:

Factory: DXGI_FEATURE_PRESENT_ALLOW_TEARING 1
 Adapter 0: Radeon RX 570 Series, Vendor 0x1002, Adapter 0.0000CE52, Flags DXGI_ADAPTER_FLAG3_ACG_COMPATIBLE | DXGI_ADAPTER_FLAG3_SUPPORT_MONITORED_FENCES | DXGI_ADAPTER_FLAG3_KEYED_MUTEX_CONFORMANCE
   Output 0: \\.\DISPLAY4, BitsPerColor 10, ColorSpace DXGI_COLOR_SPACE_RGB_FULL_G22_NONE_P709, HardwareCompositionSupport DXGI_HARDWARE_COMPOSITION_SUPPORT_FLAG_FULLSCREEN | DXGI_HARDWARE_COMPOSITION_SUPPORT_FLAG_CURSOR_STRETCHED
   Output 1: \\.\DISPLAY5, BitsPerColor 10, ColorSpace DXGI_COLOR_SPACE_RGB_FULL_G22_NONE_P709, HardwareCompositionSupport DXGI_HARDWARE_COMPOSITION_SUPPORT_FLAG_FULLSCREEN | DXGI_HARDWARE_COMPOSITION_SUPPORT_FLAG_CURSOR_STRETCHED
 Adapter 1: Intel(R) UHD Graphics 630, Vendor 0x8086, Adapter 0.0000C9B9, Flags DXGI_ADAPTER_FLAG3_ACG_COMPATIBLE | DXGI_ADAPTER_FLAG3_SUPPORT_MONITORED_FENCES | DXGI_ADAPTER_FLAG3_KEYED_MUTEX_CONFORMANCE
 Adapter 2: Microsoft Basic Render Driver, Vendor 0x1414, Adapter 0.0000CE2B, Flags DXGI_ADAPTER_FLAG3_SOFTWARE | DXGI_ADAPTER_FLAG3_ACG_COMPATIBLE | DXGI_ADAPTER_FLAG3_SUPPORT_MONITORED_FENCES | DXGI_ADAPTER_FLAG3_KEYED_MUTEX_CONFORMANCE

I also added a few more Shadertoy options, selected with the -ShaderIndex switch and values 0..3. The value 3 picks an HLSL version of Neon Tunnel by alro. Apart from being cool, this shader is really simple and lightweight, so it enables high FPS rendering while still offering smooth, perceptible motion in view.

The new switch -OutputMode 1 enables a text overlay in the top left corner. It is implemented using Direct2D with DirectWrite and interop with Direct3D 11, so the text is rendered over the graphics before frames are sent to presentation.

With shader rendering, overlay interop and a really simple single-threaded design, the process reaches pretty high frame rates.

The tool now implements better support for variable refresh rate monitors and low latency presentation. The latency in windowed mode falls below two frame sync intervals (on Radeon RX 570 Series; use PresentMon to collect statistics):

(DXGI): SyncInterval=0 Flags=512 1.81 ms/frame (551.3 fps, 59.8 fps displayed, 25.39 ms latency) Composed: Flip

With a 144 Hz monitor, it can still be under 2/144 of a second:

(DXGI): SyncInterval=0 Flags=512 0.68 ms/frame (1469.1 fps, 131.3 fps displayed, 13.45 ms latency) Composed: Flip  
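The "two sync intervals" comparison for the two windowed-mode measurements above is plain arithmetic, and a small sketch makes it easy to repeat for other monitors. The function names here are hypothetical and not part of the tool. The 60 Hz refresh rate for the first measurement is an assumption, inferred from the 59.8 fps displayed figure.

```cpp
#include <cstdio>

// Two refresh intervals, in milliseconds, for a display running at refreshHz.
constexpr double twoIntervalBudgetMs(double refreshHz) {
    return 2.0 * 1000.0 / refreshHz;
}

// True when a measured latency fits inside the two-interval budget.
constexpr bool withinTwoIntervals(double latencyMs, double refreshHz) {
    return latencyMs < twoIntervalBudgetMs(refreshHz);
}

// Prints the measured latency next to the budget it is compared against.
void printBudget(double latencyMs, double refreshHz) {
    std::printf("%.2f ms measured, %.2f ms budget at %.0f Hz: %s\n",
                latencyMs, twoIntervalBudgetMs(refreshHz), refreshHz,
                withinTwoIntervals(latencyMs, refreshHz) ? "within" : "over");
}
```

Calling printBudget(25.39, 60.0) and printBudget(13.45, 144.0) compares the reported latencies with budgets of roughly 33.33 ms and 13.89 ms respectively.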

In fullscreen mode, direct flipping results in a really low latency of just a few milliseconds:

(DXGI): SyncInterval=0 Flags=0 1.82 ms/frame (550.9 fps, 550.9 fps displayed, 3.57 ms latency) Hardware Composed: Independent Flip
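Assuming the Flags field in these statistics lines carries the flags passed to the DXGI Present call, the value 512 (0x200) seen in windowed mode corresponds to DXGI_PRESENT_ALLOW_TEARING, while fullscreen presentation uses no flags. A minimal decoder sketch follows. The constants are redefined locally with the values from the DXGI headers so that it builds without the Windows SDK, and describePresentFlags is a hypothetical helper, not something the tool exposes.

```cpp
#include <cstdint>
#include <string>

// Name/value pairs mirroring the DXGI_PRESENT_* constants from the DXGI headers.
struct PresentFlagName {
    std::uint32_t value;
    const char* name;
};

static const PresentFlagName kPresentFlags[] = {
    {0x001, "DXGI_PRESENT_TEST"},
    {0x002, "DXGI_PRESENT_DO_NOT_SEQUENCE"},
    {0x004, "DXGI_PRESENT_RESTART"},
    {0x008, "DXGI_PRESENT_DO_NOT_WAIT"},
    {0x010, "DXGI_PRESENT_STEREO_PREFER_RIGHT"},
    {0x020, "DXGI_PRESENT_STEREO_TEMPORARY_MONO"},
    {0x040, "DXGI_PRESENT_RESTRICT_TO_OUTPUT"},
    {0x100, "DXGI_PRESENT_USE_DURATION"},
    {0x200, "DXGI_PRESENT_ALLOW_TEARING"},
};

// Turns a numeric flags value into a "A | B" style description.
std::string describePresentFlags(std::uint32_t flags) {
    if (flags == 0) {
        return "0";
    }
    std::string text;
    std::uint32_t remaining = flags;
    for (const PresentFlagName& flag : kPresentFlags) {
        if (flags & flag.value) {
            if (!text.empty()) text += " | ";
            text += flag.name;
            remaining &= ~flag.value;
        }
    }
    // Bits that are not in the table are reported as a decimal remainder.
    if (remaining != 0) {
        if (!text.empty()) text += " | ";
        text += std::to_string(remaining);
    }
    return text;
}
```

With this helper, describePresentFlags(512) yields DXGI_PRESENT_ALLOW_TEARING, which matches the tearing support reported by -EnumerateAdapters at the factory level.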

Using the -WaitableObject 3 switch in windowed mode enables DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT and shows that the single-threaded rendering loop indeed spends some time sleeping, with a respective reduction in GPU engine load and frame rate.

Have fun!

Download links

Binaries:

  • 64-bit: RenderSc.exe (in .7z archive)
  • License: This software is free to use
