This sample shows how to use D3D11 Deferred Rendering contexts to lower the CPU overhead and improve performance when rendering large numbers of objects per frame, in situations where instancing is not feasible.
This sample has the following app-specific controls:
Device | Input | Result |
---|---|---|
mouse | Left-Click Drag | Rotate the view |
keyboard | Arrow Keys | Translate the view left/right/forward/back |
keyboard | W/S | Translate the view forward/back |
keyboard | A/D | Translate the view left/right |
keyboard | TAB | Toggle the HUD |
keyboard | F1 | Toggle help display |
keyboard | F2 | Toggle the Test mode HUD |
gamepad | Right ThumbStick | Rotate the camera |
gamepad | Left ThumbStick | Move forward/backward, Slide left/right |
Direct3D11 introduced the concept of a deferred context (DC). DCs allow an application to issue D3D calls on multiple threads simultaneously by recording those calls in a driver level command list which is later executed on the main render thread against the Immediate Context (IC).
This DirectX 11 SDK sample illustrates the benefit of using Direct3D11 deferred contexts to fill command lists on worker threads and amortize API and driver CPU load across multiple CPU cores.
A deferred context is a special ID3D11DeviceContext that can be called on a different thread, in parallel with the main thread that is issuing commands to the immediate context. Unlike the immediate context, calls made on a deferred context are not sent to the GPU at the time of the call; they are recorded into a command list that is executed later. A command list can also be executed multiple times to replay a sequence of GPU work against different input data.
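The record-then-execute behavior described above can be modeled without any graphics API. The sketch below is a portable C++ analogy, not Direct3D code: `FrameInput`, `CommandList`, `DeferredContextModel`, `Execute`, and `RecordOnceReplayTwice` are hypothetical names introduced only for illustration. In real D3D11 code the corresponding calls are `ID3D11Device::CreateDeferredContext`, `ID3D11DeviceContext::FinishCommandList`, and `ID3D11DeviceContext::ExecuteCommandList`.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for the data that recorded commands read when they run.
struct FrameInput {
    int offset;            // input that may differ between replays
    std::vector<int> log;  // values "drawn", in execution order
};

// Model of a command list: recorded calls that have not been executed yet.
using CommandList = std::vector<std::function<void(FrameInput&)>>;

// Model of a deferred context: calls are stored instead of being executed.
class DeferredContextModel {
public:
    void Draw(int objectId) {
        pending_.push_back([objectId](FrameInput& in) {
            in.log.push_back(objectId + in.offset);
        });
    }

    // Similar in spirit to FinishCommandList: returns what was recorded
    // and leaves the recorder empty for the next list.
    CommandList Finish() {
        CommandList out;
        out.swap(pending_);
        return out;
    }

private:
    CommandList pending_;
};

// Similar in spirit to ExecuteCommandList on the immediate context.
void Execute(const CommandList& list, FrameInput& input) {
    for (const auto& command : list) command(input);
}

// Records two draws once, then executes the same list against two inputs.
std::vector<int> RecordOnceReplayTwice() {
    DeferredContextModel dc;
    dc.Draw(1);
    dc.Draw(2);
    const CommandList list = dc.Finish();  // nothing has been executed yet

    FrameInput first{0, {}};
    FrameInput second{10, {}};
    Execute(list, first);
    Execute(list, second);  // replay against different input data

    std::vector<int> result = first.log;
    result.insert(result.end(), second.log.begin(), second.log.end());
    return result;
}
```

Nothing is appended to a log while `Draw` is called; work only happens inside `Execute`, which is the property that lets recording move off the main thread.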
This documentation will not go into the syntax and low level mechanics of using deferred contexts and command lists in Direct3D11. For that information please refer to Microsoft's DirectX Graphics Documentation on command lists [MicrosoftDirect3D] and my GDC 2013 presentation on deferred contexts [DUDASh43].
This sample is best evaluated by running it on your local machine and observing the performance of the different render strategies at different object counts, with and without animation, and as the number of render threads varies. Afterwards, investigate the render strategy code to understand how it works.
It is quite possibly a cop-out not to explain everything fully in this documentation. However, every engine and every use case is different. The use of deferred contexts depends heavily on the data set and the runtime environment, so there is no single recommended technique for every case.
What follows is a high-level overview of the render strategies in the sample application and some analysis of their strengths and weaknesses.
Immediate context rendering, aka "what most engines do", issues all commands against the D3D11 immediate context. Draws are grouped to minimize state changes and API calls per draw. At the limit, a single draw should consist of a few buffer bindings and a draw call.
The VTF variant of immediate context rendering is the same as normal immediate context rendering, with the single change that per-object data is pushed into a texture that is populated at the start of rendering and used by a large number of draws. This allows the application to batch up per-object updates.
Instancing allows for efficient drawing of large numbers of objects that share the same vertex buffer, shaders and render state. It has by far the lowest CPU overhead for drawing large numbers of objects, but it comes with significant restrictions on what can be batched together.
It should be noted that the instancing render strategy in this sample always uses VTF. Additionally, it must sort all objects to be rendered by vertex buffer. Because this sample is simple, we are able to make use of instancing, but in a general engine other state, shaders and mesh data will differ enough to make general use of instancing for all draws infeasible.
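The sort-by-vertex-buffer step can be sketched as a grouping pass that turns a flat object list into one batch per mesh. This is a minimal illustration, not the sample's actual code: `SceneObject`, `InstancedBatch`, `BuildInstancedBatches`, and the use of an integer `meshId` to stand for a vertex buffer are all assumptions.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical scene object: meshId stands in for "which vertex buffer".
struct SceneObject {
    int meshId;
    int objectId;
};

// One instanced draw: every object in the batch shares the same mesh.
struct InstancedBatch {
    int meshId;
    std::vector<int> objectIds;
};

// Groups objects by mesh so each group can be submitted as one instanced draw.
// Batches come out ordered by meshId; objects keep their relative order.
std::vector<InstancedBatch> BuildInstancedBatches(const std::vector<SceneObject>& objects) {
    std::map<int, std::vector<int>> byMesh;
    for (const SceneObject& object : objects) {
        byMesh[object.meshId].push_back(object.objectId);
    }

    std::vector<InstancedBatch> batches;
    batches.reserve(byMesh.size());
    for (auto& entry : byMesh) {
        batches.push_back({entry.first, std::move(entry.second)});
    }
    return batches;
}
```

The number of draw calls becomes the number of distinct meshes rather than the number of objects. In a general engine the grouping key would also have to include shaders and render state, which is where the restrictions mentioned above start to bite.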
Deferred contexts allow us to create command lists in parallel which are then executed on the main thread. This can allow us to efficiently assemble everything that needs to be drawn. In this sample we have the option to draw using DCs with or without using VTF.
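One way to picture this strategy is to split the object list into contiguous ranges, let each worker thread record its own list, and then submit the lists in order on the main thread. The sketch below models that flow with standard C++ threads only; it is not the sample's code, and `RecordedDraws`, `RecordInParallel`, and `ExecuteInOrder` are hypothetical names. A recorded "draw" is reduced to an object index so the ordering is easy to inspect.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Model of one command list: the object indices it would draw, in order.
using RecordedDraws = std::vector<int>;

// Each worker records a contiguous range of objects into its own list.
// Workers never touch each other's lists, so no locking is needed.
std::vector<RecordedDraws> RecordInParallel(std::size_t objectCount, std::size_t threadCount) {
    if (threadCount == 0) return {};

    std::vector<RecordedDraws> lists(threadCount);
    std::vector<std::thread> workers;
    const std::size_t perThread = (objectCount + threadCount - 1) / threadCount;

    for (std::size_t t = 0; t < threadCount; ++t) {
        const std::size_t begin = std::min(objectCount, t * perThread);
        const std::size_t end = std::min(objectCount, begin + perThread);
        workers.emplace_back([&lists, t, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                lists[t].push_back(static_cast<int>(i));
            }
        });
    }
    for (std::thread& worker : workers) worker.join();
    return lists;
}

// Main-thread step: submit the recorded lists one after another.
std::vector<int> ExecuteInOrder(const std::vector<RecordedDraws>& lists) {
    std::vector<int> submitted;
    for (const RecordedDraws& list : lists) {
        submitted.insert(submitted.end(), list.begin(), list.end());
    }
    return submitted;
}
```

Recording runs concurrently, but submission stays serial and deterministic, so the final draw order matches the single-threaded order regardless of which worker finishes first.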
The sample has an option to draw for all render strategies using Vertex Texture Fetch (VTF). This method skips the per-object uniform constant buffer map and update in favor of a single monolithic texture that contains per instance world matrix and instance color. Additionally, it will bind a second vertex stream that contains per instance UV information which is used to load data from the bound texture in the vertex shader.
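To make the texture layout concrete, the following sketch computes where one instance's data would start in such a texture. The layout is an assumption, not taken from the sample: each instance is given five four-component float texels (four for the world matrix rows, one for the color), and an instance never straddles two rows. `kTexelsPerInstance`, `TexelCoord`, `InstanceBaseTexel`, and `RowsNeeded` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Assumed layout: 4 texels for the 4x4 world matrix + 1 texel for the color.
constexpr std::size_t kTexelsPerInstance = 5;

struct TexelCoord {
    std::size_t x;
    std::size_t y;
};

// Returns the first texel of an instance's data. The per-instance vertex
// stream would carry this coordinate (or a UV derived from it) so the
// vertex shader knows where to load from.
TexelCoord InstanceBaseTexel(std::size_t instanceIndex, std::size_t textureWidth) {
    assert(textureWidth >= kTexelsPerInstance);
    const std::size_t instancesPerRow = textureWidth / kTexelsPerInstance;
    return {(instanceIndex % instancesPerRow) * kTexelsPerInstance,
            instanceIndex / instancesPerRow};
}

// Number of texture rows needed to hold instanceCount instances.
std::size_t RowsNeeded(std::size_t instanceCount, std::size_t textureWidth) {
    assert(textureWidth >= kTexelsPerInstance);
    const std::size_t instancesPerRow = textureWidth / kTexelsPerInstance;
    return (instanceCount + instancesPerRow - 1) / instancesPerRow;
}
```

With a 256-texel-wide texture this layout fits 51 instances per row and leaves one texel unused at the end of each row, trading a little space for simpler addressing in the shader.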
It should be noted that at low draw call counts the overhead of using VTF is such that the application can become bottlenecked on updating that texture. As draw/instance counts go up VTF becomes faster as the overhead is amortized and the benefit of reduced buffer mapping outweighs the overhead. In situations where it is possible for an application to easily assemble this monolithic texture of instance data, it is advisable to do so.
However, for normal engine rendering, it is often the case that each draw call has a significant amount of per-instance data mapped into constant buffers, and in these cases making use of VTF would be problematic.
The bottom line is that if you can use this technique, it should be faster than individual mapping of instance data for large numbers of relatively homogeneous draws.
The entire reason for using or not using deferred contexts revolves around performance. There is a potential to parallelize CPU load onto idle CPU cores and improve performance.
You will be interested in using deferred context command lists if:
NVIDIA® GameWorks™ Documentation Rev. 1.0.220830 ©2014-2022. NVIDIA Corporation and affiliates. All Rights Reserved.