1. RTX-Dev Features

The features covered here are for the current mainline versions on RTX-Dev branches. Older versions and topic branches may have additional features which have been superseded or are still undergoing refinement.

2. Hybrid Translucency

Supported versions: RTX-4.23, RTX-4.24, RTX-4.25

Soul City scene with translucency enabled, but hybrid translucency disabled. Cascade particles disappear.

Hybrid translucency restores Cascade particles by merging ray tracing and raster translucency.

Hybrid translucency addresses the need for both raster and ray traced translucency in the same scene. In base UE4, ray traced translucency and raster translucency are mutually exclusive. Ray traced translucency produces high quality reflections, which greatly enhance the look of objects like windows, but raster translucency is often more efficient for today’s particle effects.

2.1. How to Use It

  1. Enable the Render Property for hybrid translucency.

  2. Enable ray traced translucency (r.RayTracing.Translucency).

  3. Enable ray traced hybrid translucency (r.RayTracing.HybridTranslucency).

  4. Adjust the threshold setting (r.RayTracing.HybridTranslucency.DepthThreshold) for your content.

  5. Disable ray tracing on any translucent geometry or effect that does not benefit from it.

  6. Check translucent materials for proper behavior.

  7. Evaluate roughness and normal maps on surfaces showing excessive noise.
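
Steps 2 through 4 above can be sketched as a small helper that builds the corresponding console commands. The CVar names are taken from the list; the value 1 for "enabled" and the starting threshold of 1.0 are assumptions, and the function name is purely illustrative.

```python
# Sketch: build console commands for steps 2-4 of the list above.
# CVar names come from the text; the values are assumptions to tune per project.

def hybrid_translucency_commands(depth_threshold=1.0):
    """Return console commands that turn on hybrid translucency.

    depth_threshold is in world units (1.0 is about 1 cm with standard UE4 units).
    """
    if depth_threshold <= 0:
        raise ValueError("depth_threshold must be positive")
    settings = [
        ("r.RayTracing.Translucency", 1),        # step 2 (value assumed)
        ("r.RayTracing.HybridTranslucency", 1),  # step 3 (value assumed)
        ("r.RayTracing.HybridTranslucency.DepthThreshold", depth_threshold),  # step 4
    ]
    return [f"{name} {value}" for name, value in settings]


if __name__ == "__main__":
    for command in hybrid_translucency_commands(1.0):
        print(command)
```

The render property in step 1 and the content checks in steps 5 through 7 are editor tasks and are not covered by this sketch.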

2.2. What to Expect

With hybrid translucency, large translucent surfaces like windows will gain the detailed reflections seen in ray traced reflections. This will provide a significant immediate leap in visual fidelity. The translucency will also fall back to normal raster effects when the layer count exceeds the current setting, preventing layers from simply vanishing. Additionally, legacy Cascade particle systems will no longer disappear when ray tracing is enabled. On the downside, the rendering will lose the order independence of ray traced translucency and the full refraction support that ray tracing brings. However, it is important to understand that sorting and refraction behave the same as in your raster content today. This means that hybrid translucency will often save content creation work by behaving similarly to the raster transparency you are used to.

While hybrid translucency offers a much lighter burden on altered behaviors compared to full ray traced translucency, a few issues do need to be considered. First, it uses a threshold value to identify when surfaces are the ‘same’ between the raster and ray traced worlds. This is how the two worlds are matched up to allow the results to be combined. Next, the reflections for hybrid translucency have the same general properties of ray traced reflections in that rough surfaces can produce noisy results. Unlike reflections, hybrid translucency does not have a denoising pass to help filter this down. As a result, the effect is best used with smooth surfaces, like glass. The good news here is that most translucent surfaces with substantial reflections are smooth. Finally, an art pass over translucent materials may be required no matter what version of translucent ray tracing is being used.

Adjusting hybrid translucency’s threshold value will take a small amount of testing with your content. The threshold value is expressed in world space units, so a value of 1.0 means that two surfaces separated by less than 1 cm (assuming standard use of UE4 units) will be considered the same. Due to inaccuracies inherent in floating point math, this test requires a degree of slop, and the right value will generally depend on the scale of the world. (Surfaces further from the viewer will require higher thresholds.) If the value is too low, translucent surfaces will either show no translucency effect or show a z-fighting-like artifact. If it is too large, a particle in front of a window may be mistaken for part of the window. The right value is the smallest one that avoids the z-fighting-like artifacts.

An art pass over the materials is desirable to ensure proper compatibility and the best results. The shading model used today for raster translucency can be highly forgiving of odd material setups, but with the more accurate shading model used in ray tracing, some odd results may occur. Some issues seen previously include normals facing the wrong way and glass materials marked as 100% metallic. The good news is that adjusting these to more correct physical properties tends to improve the raster look as well. Finally, there might be special effects baked into materials, like a reflection texture. These don't make sense in a world where the reflection is generated through ray tracing, so any such effect would need to be factored out into a separate tree using the ray tracing material graph switch node.

2.3. Multiple Layers

Hybrid translucency offers a quality control for how many layers it supports through the CVar r.RayTracing.HybridTranslucency.Layers. Each layer corresponds to a level of overlap in screen space to support ray traced translucent effects. Only the closest layers are captured, and all more distant layers will just use the standard raster translucency effects. With a view looking through multiple layers of glass, only the top layer will have a high-quality ray traced reflection with the default Layers value of 1. Moving to a value of 2, the scene would render high quality reflections on the next closest layer as well. Each layer has an additional performance and memory cost. The performance impact of supporting more layers is generally minimal, as the cost is only incurred when the overlap actually occurs. At 12 bytes per pixel, the memory overhead for each additional layer is 24 MB at 1080p resolution. Generally, having only 1-2 layers is enough to capture the important visual effects, as further ones are going to be obscured by the overlapping elements.
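
The memory figure above follows from simple arithmetic, sketched below. The 12 bytes per pixel comes from the text; the helper names are illustrative, and the sketch assumes the layer buffer is allocated at full render resolution.

```python
# Sketch: per-layer memory cost of the hybrid translucency buffer.
BYTES_PER_PIXEL_PER_LAYER = 12  # figure quoted in the text


def layer_memory_bytes(width, height, layers=1):
    """Bytes needed for the given number of layers at width x height."""
    return width * height * BYTES_PER_PIXEL_PER_LAYER * layers


def to_megabytes(num_bytes):
    """Convert bytes to decimal megabytes (1 MB = 1,000,000 bytes)."""
    return num_bytes / 1_000_000


if __name__ == "__main__":
    for name, (w, h) in {"1080p": (1920, 1080), "4K": (3840, 2160)}.items():
        print(name, round(to_megabytes(layer_memory_bytes(w, h)), 1), "MB per layer")
```

At 1920 x 1080 this gives 24,883,200 bytes per layer, which is the roughly 24 MB quoted above. At 3840 x 2160 the same layer costs four times as much.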

2.4. Half-Resolution Transparency

As part of the application step, hybrid translucency offers an upscale, enabling lower resolution evaluation of the ray tracing. The half resolution option (r.RayTracing.HybridTranslucency.HalfRes) has a few different modes to choose from:

0 - Full resolution translucency

1 - Half resolution translucency, line interleaved

2 - Half resolution translucency, checkerboarded, 4-tap upsample

3 - Half resolution translucency, checkerboarded, 2-tap upsample

Option 2 should be the highest quality filtering option, but options 1 and 3 may offer somewhat better performance. The interleaved solution in option 1 may appear sharper, but could have trouble reconstructing nearly horizontal lines.
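
To make the difference between the sampling patterns concrete, the sketch below marks which pixels would be ray traced in each mode. It assumes that line interleaving traces alternating rows and that the checkerboard traces pixels where x + y is even; the text does not spell out either detail, and the upsample filters are not modeled.

```python
# Sketch: which pixels are ray traced in each HalfRes mode (patterns assumed).

def is_traced(x, y, mode):
    """Return True if pixel (x, y) is ray traced rather than reconstructed."""
    if mode == 0:
        return True                # full resolution
    if mode == 1:
        return y % 2 == 0          # assumed: every other row
    if mode in (2, 3):
        return (x + y) % 2 == 0    # assumed: checkerboard
    raise ValueError(f"unknown HalfRes mode {mode}")


def traced_fraction(width, height, mode):
    """Fraction of pixels in a width x height tile that are ray traced."""
    traced = sum(is_traced(x, y, mode) for y in range(height) for x in range(width))
    return traced / (width * height)


if __name__ == "__main__":
    for mode in (1, 2):
        print("mode", mode)
        for y in range(4):
            print("".join("#" if is_traced(x, y, mode) else "." for x in range(8)))
```

Under these assumptions, modes 1 through 3 all trace half of the pixels. With whole rows missing in mode 1, a thin feature that stays within a skipped row has no traced samples to reconstruct from, which is consistent with the difficulty on nearly horizontal lines noted above.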

2.5. How It Works

Hybrid translucency works by ray tracing only the translucent objects requesting ray traced effects and storing the results of these interactions in an offscreen buffer. The interactions only contain the shading (including reflections), opacity, and distance from the camera. Only the first <N> layers are captured based on the setting mentioned above. When translucent objects are later rasterized, any object expecting a ray traced result looks it up in the offscreen buffer and determines the correct interaction based on distance from the camera. If no interaction is found, it uses the normal raster shading. This ensures that the compositing maintains compatibility with rasterized translucency.
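
The capture and lookup described above can be modeled for a single pixel as follows. This is a CPU-side sketch only, not the shader code: the Interaction type, the function names, and the nearest-match rule are assumptions, while the stored fields, the layer limit, and the threshold test follow the text.

```python
# Sketch: one pixel of the hybrid translucency capture and lookup.
from dataclasses import dataclass


@dataclass
class Interaction:
    distance: float  # distance from the camera
    shading: tuple   # shaded color, reflections included
    opacity: float


def capture_layers(hits, max_layers):
    """Keep only the max_layers interactions closest to the camera."""
    return sorted(hits, key=lambda hit: hit.distance)[:max_layers]


def find_interaction(layers, raster_distance, depth_threshold):
    """Return the stored interaction matching this raster fragment, or None."""
    candidates = [layer for layer in layers
                  if abs(layer.distance - raster_distance) < depth_threshold]
    if not candidates:
        return None
    return min(candidates, key=lambda layer: abs(layer.distance - raster_distance))


def shade_fragment(layers, raster_distance, raster_shading, depth_threshold=1.0):
    """Use the ray traced result when one matches, else the raster shading."""
    match = find_interaction(layers, raster_distance, depth_threshold)
    return match.shading if match is not None else raster_shading
```

With three overlapping surfaces and a layer limit of 2, the two nearest surfaces pick up their ray traced shading and the third falls through to raster shading, so no layer disappears.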

3. Reflection Optimizations

Supported versions: RTX-4.23, RTX-4.24, RTX-4.25

The reflection optimizations in RTX-Dev are generally invisible to the user. They either specialize the code for common cases where the simplification improves performance, or they rearrange the code to encourage better code generation. The ones integrated into each RTX-Dev branch vary from version to version as the reflections shader changes in mainline UE4, sometimes due to these features being merged as pull requests. These all happen with no intervention or tuning from the user.

The options RTX-Dev has provided over its history of releases include:

  • Shader specialization for number of bounces

  • Specialization/refactoring for number of samples

  • Fix uninitialized variables to shrink live register ranges

  • Specialize for low clearcoat values

  • Optionally, replace clearcoat with environment probes

  • Terminate reflection path via accumulated roughness

  • Reorder top and bottom traces to reduce live variables

Some additional improvements to reflection performance do exist, but they fall under the general category of light management, so projects interested in reflection improvements should also review the lighting optimizations described in section 4.

4. Lighting Optimizations and Improvements

Lighting is a huge portion of the ray tracing pipeline, and as a result, many of the customizations for RTX-Dev are focused on lighting. Importantly, lighting can mean both direct lighting in the scene (ray traced shadows) as well as lighting in ray traced effects (the lighting seen in ray traced reflections and translucency).

4.1. Light Priorities

Supported versions: RTX-4.23, RTX-4.24, RTX-4.25

An important aspect of both performance and quality is selecting the right lights to use in ray tracing effects. While this can be done by tweaking content, the RTX branches provide the ability to sort lights based on heuristics as to which are likely to be most important. Light prioritization has two separate phases: direct interactions and indirect interactions. The direct interactions are the lights rendered in the "Lighting" phase of rendering and cover the lights and shadows in the scene. The indirect interactions are the lights used in ray tracing effects like reflections, where the effect of the lights is only visible in these passes.

This shot shows indirect versus direct lighting effects. The near geometry is being directly lit by the light source. The image in the mirror is an example of indirect lighting in that the view is seeing the lighting after the extra bounce of the mirror.

4.1.1. Direct Interactions

The prioritization of direct light interactions is used to manage the costs of shadow casting lights in the scene. It only applies to the direct ray traced shadows visible in the scene, and it can help manage the default light settings. UE4 defaults the "Casts Ray Traced Shadows" property to true for all lights in a scene. This means that while a content team might have set up most of the scene to receive only static shadows in raster, all the lights will cast fully dynamic shadows when ray tracing is enabled. As ray traced dynamic shadows often cost less than shadow maps, this isn’t necessarily a bad thing. However, it can result in a scene having one dynamic shadowing light with raster shadows and 1000 fully dynamic shadow casting lights with ray tracing. Such a scenario can obviously lead to sub-par performance.

Light prioritization can be enabled through either of these CVars:

r.RayTracing.Shadow.MaxLights
r.RayTracing.Shadow.MaxDenoisedLights

When both are set to -1 (the default), all lights in the scene are rendered exactly as they would be with base UE4. Setting either to 0 or higher turns on priority sorting of lights. Setting MaxLights to 4 will cap the number of ray traced lights in the scene to 4 and fall back to static shadowing for any lights ranked as less important. Importantly, any light that would have received a dynamic shadow under rasterization automatically gets a ray traced shadow, ensuring no necessary dynamic shadows are dropped. (Setting MaxLights to 0 essentially means using ray traced shadows only on lights that would cast dynamic shadows without ray tracing.)

MaxDenoisedLights has a more nuanced meaning. The cost of most ray traced shadows is often dominated by the cost of denoising the area light shadow. Without denoising, these lights can often be quite fast. Setting MaxDenoisedLights to 8 will ensure that only the 8 highest priority area lights receive denoising. This means that lower priority lights will have noisy shadows, but they will still cast ray traced shadows. Generally, MaxDenoisedLights would either be set to -1 or a value less than MaxLights. (It is meaningless to denoise more lights than you ray trace.) Due to the way batching works in UE4, multiples of 4 are the best choice for MaxDenoisedLights.

Finally, one important option is useful alongside the prioritization of direct light interactions. The CVar r.RayTracing.Shadow.FallBackToSharp allows lights that cast ray traced shadows but do not receive denoising to cast sharp shadows rather than noisy ones. The sharp shadows can be less visually distracting than the speckled noise produced by soft shadows with no denoising.
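
The interaction of the three shadow CVars can be summarized with a small model. This is a sketch of the behavior described above, not the engine code. It assumes the lights are already sorted by priority, that lights which are dynamic under rasterization do not consume a MaxLights slot, and that lights without area cast hard shadows which need no denoising. The Light type and mode names are illustrative.

```python
# Sketch: how MaxLights, MaxDenoisedLights and FallBackToSharp might combine.
from dataclasses import dataclass


@dataclass
class Light:
    name: str
    dynamic_in_raster: bool  # would cast a dynamic shadow without ray tracing
    has_area: bool           # soft shadows from area lights are what need denoising


def assign_shadow_modes(lights_by_priority, max_lights=-1,
                        max_denoised_lights=-1, fall_back_to_sharp=False):
    """Map each light name to 'denoised', 'noisy', 'sharp', or 'static'.

    lights_by_priority must already be sorted, most important first.
    """
    modes = {}
    capped_count = 0    # ray traced lights admitted through MaxLights
    denoised_count = 0
    for light in lights_by_priority:
        if not light.dynamic_in_raster:
            if 0 <= max_lights <= capped_count:
                modes[light.name] = "static"  # fall back to static shadowing
                continue
            capped_count += 1
        if not light.has_area:
            modes[light.name] = "sharp"  # assumed: hard shadows need no denoising
        elif max_denoised_lights < 0 or denoised_count < max_denoised_lights:
            modes[light.name] = "denoised"
            denoised_count += 1
        else:
            modes[light.name] = "sharp" if fall_back_to_sharp else "noisy"
    return modes
```

With both caps left at -1, every light is ray traced and every area light is denoised, matching the base UE4 behavior. With MaxLights at 0, only the lights that are dynamic under rasterization keep ray traced shadows.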

Infiltrator demo with default ray traced shadowing light settings, 6.5 FPS, 739 shadow casting lights, with almost 18 ms in shadow denoising alone.

Infiltrator demo with shadowing lights capped at 32 and denoised lights capped at 4. Shadows lacking denoising are forced to sharp, as can be seen on the character shadow to the lower left. Performance increases to 28.7 FPS.

Infiltrator demo with no ray traced shadowing lights. Only one shadowing light is computed for the scene, due to the raster setup. 32 FPS.

Overall, the best policy is to do a tuning pass over content to properly set the ray tracing shadow properties of lights, as artists fully understand the scene they are working with. However, prioritization provides a good solution to managing the performance prior to accomplishing such a pass while testing out ray tracing. Additionally, prioritization can offer a solid safety net for cases where the content tuning has failed for some reason. As the priority is a heuristic, it is guaranteed to sometimes make a decision an artist might consider incorrect; however, experience with real content has shown the present heuristic to be reasonable and stable (unlikely to produce flickering).

4.1.2. Indirect Interactions

These are the lights used to apply lighting in ray traced effects like reflections. In many ways, this prioritization is more important than the direct interactions, because while an unlimited number of shadow casting lights can be handled by the engine, it has a cap on the number of ray traced lights (256 in 4.24). If the scene contains more than the maximum number of lights, base UE4 simply treats them as "first come, first served" for the purposes of ray tracing. A night light might be at the far end of the level, but if it is the first item in the list of lights, it’ll take a slot possibly bumping out the flashlight in the character’s hand. Obviously, content tuning can still play a role here, but in a large level it is harder to exclude lights as statically never being important in reflections.

Prioritization for ray traced lights is controlled via the CVar r.RayTracing.Lighting.MaxLights. Setting it to -1 disables the prioritization and reverts to the engine’s default behavior of first come, first served. Any other value enables prioritization and sets a maximum for the number of lights considered in ray traced effects. The default is the engine’s maximum number of supported lights (256). Setting the cap lower is often a reasonable optimization, as 256 truly important lights in a second order effect like a ray traced reflection is often overkill. Many times, reducing the limit to something like 32 will produce minimal or no change in the quality of the rendering. The light priorities are governed by closeness to the viewer, intensity, and how much they interact with the field of view. (Dropping a dim light directly behind the viewer pointed away from them may produce an incorrect image, but it is likely hard to recognize as incorrect.)
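
The difference between first come, first served and prioritized selection can be illustrated with a toy scoring function. The actual heuristic is not given in the text beyond its three inputs (closeness, intensity, interaction with the field of view), so the formula, the 0.25 out-of-view weight, and the dictionary layout below are purely hypothetical.

```python
# Sketch: capping ray traced lights with a hypothetical priority score.
import math

ENGINE_MAX_RAY_TRACED_LIGHTS = 256  # engine cap quoted in the text for 4.24


def priority(light, viewer_pos):
    """Hypothetical score: brighter, closer, in-view lights rank higher."""
    distance = math.dist(light["position"], viewer_pos)
    view_weight = 1.0 if light["in_view"] else 0.25  # assumed weighting
    return light["intensity"] * view_weight / (1.0 + distance * distance)


def select_lights(lights, viewer_pos, max_lights=ENGINE_MAX_RAY_TRACED_LIGHTS):
    """Return the lights used in ray traced effects for the given MaxLights."""
    if max_lights < 0:
        # Prioritization disabled: first come, first served up to the engine cap.
        return lights[:ENGINE_MAX_RAY_TRACED_LIGHTS]
    ranked = sorted(lights, key=lambda light: priority(light, viewer_pos), reverse=True)
    return ranked[:min(max_lights, ENGINE_MAX_RAY_TRACED_LIGHTS)]
```

In the night light example above, a distant night light listed first would lose its slot to the nearby flashlight as soon as prioritization is enabled with a small cap.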

In addition to just capping the number of lights to improve performance, the prioritization process allows the user to cap the number of ray traced lights casting shadows. Just as shadows might be turned off for less important lights in the main scene to improve performance, the same can be done with less important lights in the ray traced effects. The CVar r.RayTracing.Lighting.MaxShadowLights controls this threshold.

4.2. Light Function Support

Supported versions: RTX-4.24, RTX-4.25

Light material functions are a UE4 feature that has been missing from indirect lighting effects in ray tracing. As a result, an object lit by a light that uses a light function appears incorrect in ray traced reflections. The RTX-4.24 branch adds the ability to evaluate light material functions in ray traced effects to ensure that these are not lost. Importantly, this is presently limited to only 16 light function lights in a single scene; however, the limit is quite easy to raise as it is merely a compile-time constant.

Light function support can be disabled by setting the r.RayTracing.LightFunction CVar to 0.

Without light functions, the lighting in the mirror lacks the geometric pattern.

With light functions, the lighting in the mirror matches the lighting in the world.

4.3. Light Channel Masks

Supported versions: RTX-4.24, RTX-4.25

Base UE4 ignores light channel masks when evaluating lights in ray traced effects. This results in lighting behaving differently in effects like reflections and translucency than it does in the main scene. This RTX-Dev feature causes light channel masks to be obeyed in ray traced reflections and translucency. It presently does not apply to global illumination as this may require additional effort to determine the 'correct' application of light channels. (Is the channel supposed to affect only the first interaction along the light path, the last interaction, or some other formulation?) Generally, GI as a physically driven effect makes little sense in combination with something non-physical like light channel masks.

Adherence to Shadow Flags in Ray Traced Lighting

Supported versions: RTX-4.23, RTX-4.24, RTX-4.25

Base UE4 treats all lights as shadowed during ray traced light evaluations. This results in lights marked as non-shadowed casting shadows in reflections, but not casting shadows in the main view. This RTX-Dev change propagates the shadow casting property to the ray traced version of the lights and evaluates it faithfully in ray traced effects such as reflections and translucency.

4.4. Light Evaluation in Miss Shaders

Supported versions: RTX-4.23, RTX-4.24

Mainline support: 4.25

By default, UE4 evaluates all lights as part of the ray generation shader after getting the result of a shadow ray cast. This has the unfortunate effect of placing the light evaluation code in the same register allocation context as the rather large ray generation shader. Further, it prevents code specialization that might be beneficial, as the lighting code needs to handle any generic case. Fortunately, there is a way to handle both issues via indirection (like callable shaders provide) for free by placing the evaluation in the miss shader. The trace already guarantees an indirect invocation of the miss shader, and lighting only needs to be evaluated on a shadow miss. Further, since the miss shader used is selected via an index, the miss shader can be specialized for the evaluation. The benefit is even greater with light functions, as the light function evaluation can be merged into the miss shader as well. All this adds up to a reduction in the cost of light evaluation. This capability is enabled by the CVar r.RayTracing.LightingMissShader (on by default). (The CVar is named r.RayTracing.MissShaderLighting in RTX-4.23.) Performance improvements are as much as 5-10%, with larger gains occurring with more lights and features such as light functions.

5. Enhancements for Shadows

The RTX-Dev branches have added several capabilities to enhance the behavior and performance of shadows. All of these items are fairly minor tweaks, but they add up to a sizeable amount of extra flexibility and performance for developers working with ray traced content.

5.1. Scissored Shadow Computations

Supported versions: RTX-4.24, RTX-4.25

This optimization reintroduces support to restrict DispatchRays calls to only the region of the viewport known to be potentially receiving shadows from the light. This optimization was available in prior versions of UE4, but was removed in the process of resolving support for multiple views. The revised version only supports the optimization when a single view is being used. The single view case is clearly the most common one, so specializing the code path to improve its performance is extremely valuable. Performance gains vary greatly by scene and view.

5.2. Occlusion Cull Direction

Supported versions: RTX-4.23, RTX-4.24, RTX-4.25

By default, UE4 considers both front and back facing triangles as occluders for the purposes of occlusion (shadows, AO, GI, and skylights). This often results in extra shadows cast from triangles which are backface culled when rendering with shadow maps. Ray tracing effects can counteract this by disabling two sided geometry. (There is a per-effect CVar to do this.) However, face culling isn’t enough to fully match the behavior experienced with rasterization, due to face orderings being reversed for ray tracing. With rasterization, the visibility check travels from the light toward the surface, while in ray tracing, the visibility check travels from the surface toward the light. UE4 culls backfaces when tracing occlusion rays against single-sided geometry, in order to compute more accurate umbras. (Umbra is decided by the occlusion point closest to the surface.) This causes a content compatibility challenge in that both shadow maps and ray tracing are culling backfaces even though the traversal directions are reversed. To resolve this issue, RTX-Dev makes the direction selectable through a CVar:

r.RayTracing.OcclusionCullDirection

The default of zero culls backfaces just like base UE4, and a value of one changes it to culling frontfaces. Importantly, the change is only meaningful if two-sided geometry (r.RayTracing.Shadows.EnableTwoSidedGeometry for shadows) is disabled.
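
The effect of the cull direction can be checked with a small geometric model. This sketch assumes a triangle counts as front facing when the ray travels against its normal; the function names are illustrative and the engine's actual winding conventions are not modeled.

```python
# Sketch: which triangles occlude a ray for each OcclusionCullDirection value.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def is_occluder(ray_direction, triangle_normal, cull_direction=0, two_sided=False):
    """Return True if the triangle can block a ray travelling in ray_direction."""
    if two_sided:
        return True  # culling only applies when two-sided geometry is disabled
    front_facing = dot(ray_direction, triangle_normal) < 0
    if cull_direction == 0:
        return front_facing      # default: cull backfaces
    return not front_facing      # 1: cull frontfaces


if __name__ == "__main__":
    ceiling_normal = (0.0, 0.0, -1.0)    # ceiling triangle facing down into the room
    surface_to_light = (0.0, 0.0, 1.0)   # ray tracing: from the floor up to the light
    light_to_surface = (0.0, 0.0, -1.0)  # shadow map: from the light down to the floor
    print(is_occluder(surface_to_light, ceiling_normal, 0))  # occludes with ray tracing
    print(is_occluder(light_to_surface, ceiling_normal, 0))  # culled in the shadow map
    print(is_occluder(surface_to_light, ceiling_normal, 1))  # culled with direction 1
```

In this example, a downward-facing ceiling triangle below the light is a backface to the light, so it is culled with shadow maps. For the ray travelling from the floor it is a frontface and casts a shadow under the default setting. Setting the cull direction to 1 makes the ray traced result agree with the shadow map.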

5.3. Shadow Backface Culling for Transmissive Shading Models

Supported versions: RTX-4.25

When casting shadow rays, UE4 first checks the pixel in the gbuffer to determine whether it actually needs to cast the ray. The test includes not only simple checks, such as whether the pixel lies in the light’s area of influence, but also whether the surface is backfacing to the light. These are great optimizations, as the fastest ray is the one not fired, and marking surfaces that receive no light as in shadow handles this perfectly. Unfortunately, some material types allow light to transmit through the surface, which means that a light behind the surface may still contribute lighting. RTX-Dev allows the user to disable the backface culling for these material types to properly match the results from rasterization.

r.RayTracing.Shadows.CullTransmissives

Enabling CullTransmissives matches the behavior from standard UE4, and this is the default in the RTX-Dev branches. Setting CullTransmissives to zero will more closely match the behavior from shadow maps.

5.4. Disable Shadow Casting from Translucent Materials

Supported versions: RTX-4.24, RTX-4.25

Unlike shadow maps, ray traced shadows treat translucent materials as shadow casting. While ray tracing can benefit from partially translucent shadows, that is not supported by UE4 today, and translucent materials cast opaque shadows. UE4 does have a per-material flag to identify whether a material should cast shadows or not, but fixing this material by material in a large project may be time consuming. To help rectify the issue, RTX-Dev offers a CVar to treat all translucent materials as non-shadow casting to match the default behavior from rasterization.

r.RayTracing.ExcludeTranslucentsFromShadows

Importantly, while this CVar is not read-only, it should typically only be set prior to launching the project. Without having it set at startup, some objects may get built for ray tracing without the flag being activated. Some of their properties will potentially be cached, and changing the flag dynamically will appear to have no effect. This CVar defaults to 0 to match UE4 behavior out of the box. Setting it to one will more closely match shadow map behavior.

6. Optimizations for CPU Overhead

Counterintuitively, ray tracing can end up being CPU-bound in UE4 rather than GPU-bound. The primary reason for this is the gathering and maintenance of the BVH and shader binding table. With the BVH, the issues are straightforward and less severe. As a general principle, ray tracing has not yet received the same level of refinement in the culling and traversal used to gather the data placed in the BVH. Ray tracing may also gather more meshes than the set drawn in raster, as the BVH needs to represent objects outside the view frustum. In addition, entities derived from instanced static meshes, like foliage, are impacted more heavily by the loss of optimizations used with rasterization. 4.24 substantially reduced some of these costs, but further room for optimization exists. For the shader binding table (SBT), the costs are a bit more complex. The SBT needs to gather references to all the textures and constant buffers used by materials. This is essentially a duplication of all the resource setup done for rasterizing the materials in the base pass. On top of this, the retained nature of ray tracing adds residency tracking overhead.

The CPU overhead for ray tracing has the further complication that it is split between the render thread and the RHI thread. The BVH and material gather operations occur on the render thread, but the material processing and residency management for the SBT happen on the RHI thread. Often, the RHI thread concerns are worse, but this is not absolute. Obviously, a bottleneck in either will hold back performance.

6.1. General RHI Thread Optimizations

To reduce the CPU load, the RTX-Dev branches have 3 optimizations aimed at the costs of setting up and managing the SBT.

6.1.1. Optimized Set

Supported versions: RTX-4.23, RTX-4.24

Mainline support: 4.25 (Alternate version)

The SBT needs to use a set to manage unique elements in a few different cases. As it turns out, the standard UE4 TSet implementation is a poor match to the needs. TSet is more optimized toward the management of a smaller set of large objects, where the SBT needs to manage a large set of small objects (just pointers). Replacing TSet with an implementation tuned toward the needs of the SBT dramatically reduces the costs of managing the SBT.

6.1.2. Per-Heap Residency Management

Supported versions: RTX-4.23, RTX-4.24

Every time a DispatchRays call is made, a check needs to occur to ensure that all resources referenced in the SBT will be resident for the command buffer. Given that scenes often have several thousand objects, each with a material referencing multiple textures, this can easily become a problem of updating the residency status of tens of thousands of resources. Conveniently, most of the static textures live in heaps, and a residency update on a resource in a heap simply passes through as a residency update on the heap. To reduce the update complexity, this optimization simply changes the SBT residency management to track by heaps from the start where possible. Instead of thousands of resource tracking operations per DispatchRays, now the same work is accomplished in dozens.
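
The saving from tracking by heap can be seen by counting the distinct items that need a residency update. This is a counting model only, not the D3D12 residency code; the function name and the (resource_id, heap_id) tuple layout are assumptions for illustration.

```python
# Sketch: residency updates needed per DispatchRays, by resource versus by heap.

def residency_updates(resources, track_by_heap):
    """Count distinct residency updates for a list of (resource_id, heap_id).

    heap_id is None for resources that do not live in a shared heap.
    """
    tracked = set()
    for resource_id, heap_id in resources:
        if track_by_heap and heap_id is not None:
            tracked.add(("heap", heap_id))          # one update covers the whole heap
        else:
            tracked.add(("resource", resource_id))  # tracked individually
    return len(tracked)


if __name__ == "__main__":
    # Hypothetical scene: 3000 textures spread over 12 heaps, plus 5 standalone resources.
    scene = [(i, i % 12) for i in range(3000)] + [(3000 + i, None) for i in range(5)]
    print(residency_updates(scene, track_by_heap=False))
    print(residency_updates(scene, track_by_heap=True))
```

For the hypothetical scene in the usage example, per-resource tracking needs 3005 updates while per-heap tracking needs 17, in line with the thousands-to-dozens reduction described above.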

6.1.3. Batched Descriptor Updates

Supported versions: RTX-4.23, RTX-4.24

One of the more expensive operations remaining in SBT management after the improvements above is the maintenance of the descriptor heap. Every SBT entry needs descriptor ranges carved out and populated via CopyDescriptors. Batching the CopyDescriptors calls together helps reduce the overhead of managing the heap. Overall, this is a smaller advantage than the other RHI thread optimizations, but heap management is a non-trivial cost after the other optimizations are in place.

6.2. Parallel Instance Gathering

Supported versions: RTX-4.23, RTX-4.24

Mainline support: 4.25

Parallel instance gathering addresses the render thread costs by moving much of the work to gather the BVH nodes and materials into worker threads. Parallelizing this work can reduce the costs of the render thread by more than 1 ms in cases with heavy numbers of ray tracing instances. This essentially brings ray tracing object management more in line with raster geometry management. This support is a back port from the development branch of UE4, so it is closely related to the version found in 4.25. On branches prior to 4.25, the CVar r.RayTracing.ParallelGather is provided to allow disabling the optimization for debugging purposes.

6.3. Delayed Ray Tracing Setup

Supported versions: RTX-4.23, RTX-4.24

Mainline support: 4.25 (alternate solution)

In addition to the total amount of CPU work generated by ray tracing, the timing of that work can also play an important factor in performance. Because the setup of the SBT is a large CPU task occurring on the RHI thread, it can result in an extended period with no command lists getting submitted to the GPU. This can result in the GPU running out of work and going idle. When combined with operations like occlusion query feedback, this can result in serializations between the CPU and the GPU. The end effect is that both CPU and GPU are partially idle waiting on one another at different parts of the frame. Moving the SBT setup later in the frame allows other GPU work to get kicked off prior to the RHI thread getting blocked with the large amount of material setup work. This reduces the tendency of the GPU to run idle, alleviating serialization issues. To enable this, the RTX-Dev branches introduce the CVar r.RayTracing.LateInit. The best setting may be dependent on content, so some experimentation may be needed. Available modes are as follows:

0 - Leave ray tracing structure setup in its normal place in the render loop after the Z prepass

1 - After early occlusion query submission (ensure occlusion queries begin processing before blocking the CPU)

2 - After base pass rendering

With 4.25, the same effect is handled through an alternate method which cannot be turned off.

6.4. Parallel AddMeshBatch Processing

Supported versions: RTX-4.24, RTX-4.25

Dynamic objects in the scene such as skeletal meshes and instanced static meshes require additional per-frame mesh batch processing when used in ray tracing. In scenes with large numbers of these objects, the serial processing on the render thread easily becomes a bottleneck. Moving the majority of the work to parallel worker threads allows the render thread to proceed while this work happens in the background. The improvement will be highly content dependent, but 5% gains are very realistic on common content scenarios.

r.RayTracing.ParallelMeshBatchSetup

6.5. Optimized Instanced Static Mesh (ISM) Gathering

Supported versions: RTX-4.24, RTX-4.25

ISMs cover several rendering object types in UE4, including Hierarchical Instanced Static Meshes (HISMs) and foliage. Since these generate large numbers of instances in the bounding volume hierarchy, they utilize a special culling pass. This feature improves the efficiency of this culling pass while also skipping unnecessary material setup in certain cases. As with all optimizations impacting a specific type of object, gains will be content dependent, but can be on the order of 10%.

6.6. Streamlined Material Sorting Setup

Supported versions: RTX-4.24, RTX-4.25

While the material sorting pass offers substantial gains in GPU efficiency for reflections and global illumination, it can have a cost in CPU cycles. The primary cost in base UE4 tied to material sorting is the need to set up an additional shader table to handle the deferred material pass. In reality, the material setup for this case is drastically simpler than typical cases, as it is one shader with a unique element of user data for all slots. This optimization takes advantage of that and provides a version of SetHitGroup able to do just this. The simplified path reduces costs on both the render thread and the RHI thread, with gains of 5% or less typical.

r.RayTracing.BatchMaterials

6.7. Aggregate Performance Gains

This table shows how the CPU optimizations in RTX-4.24 add up on a heavy piece of content such as the Infiltrator demo. Importantly, these results require no content modification.

Baseline Performance        65.6 ms
RHI Thread Optimizations    58.2 ms
Parallel Instance Gather    36.9 ms
Late Initialization         36.0 ms
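
To put the timings in the table above into relative terms, the following minimal Python sketch computes the percentage reduction of each row against the baseline and against the row before it. It assumes the rows are cumulative (each row includes the optimizations listed above it), which the table suggests but does not state. The function names are illustrative and not part of any RTX-Dev tooling.

```python
# Timings copied from the table above, in milliseconds.
TIMINGS_MS = [
    ("Baseline Performance", 65.6),
    ("RHI Thread Optimizations", 58.2),
    ("Parallel Instance Gather", 36.9),
    ("Late Initialization", 36.0),
]


def percent_reduction(before_ms, after_ms):
    """Percentage of time saved going from before_ms to after_ms."""
    return 100.0 * (before_ms - after_ms) / before_ms


def summarize(timings):
    """Return (label, ms, % saved vs baseline, % saved vs previous row).

    Assumption: rows are cumulative, so comparing a row with the one
    before it isolates the effect of the newly added optimization.
    """
    baseline = timings[0][1]
    previous = baseline
    rows = []
    for label, ms in timings:
        rows.append((label, ms,
                     percent_reduction(baseline, ms),
                     percent_reduction(previous, ms)))
        previous = ms
    return rows


if __name__ == "__main__":
    for label, ms, vs_base, vs_prev in summarize(TIMINGS_MS):
        print(f"{label:<26} {ms:5.1f} ms  "
              f"{vs_base:5.1f}% vs baseline  {vs_prev:5.1f}% vs previous")
```

Under that assumption, the final row corresponds to roughly a 45% reduction relative to the baseline, with most of it arriving at the Parallel Instance Gather step.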

7. Asymmetric ScreenPercentage

Supported versions: RTX-4.23, RTX-4.24, RTX-4.25

Standard UE4 interprets all instances of ScreenPercentage as a scale factor on both the X and Y dimensions of the render target. The workload therefore scales with the square of the ScreenPercentage: a value of 50% means only every fourth pixel is rendered. This becomes a challenge for the ray tracing ScreenPercentage scaling factors, because ray tracing requires the scaling to be an integer scale of the framebuffer size. The first step down is therefore a 2:1 factor, a ScreenPercentage of 50%, which traces only 25% of the pixels on the screen. This is a very large reduction in quality. The asymmetric ScreenPercentage enhancement factors the user-specified ScreenPercentage into separate X and Y scale factors. The translation works as shown in the table below:

ScreenPercentage   X Scale   Y Scale   Pixel Fraction
100%               1.0       1.0       1.0
70%                0.5       1.0       0.5
50%                0.5       0.5       0.25

With this improvement in place, the ray tracing effects can operate at reduced cost without as drastic a loss in quality.
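
One way to reproduce the table above is to pick the pair of integer divisors whose pixel fraction is closest to the square of the requested ScreenPercentage. The Python sketch below does that. The selection rule, the function name asymmetric_scale, and the max_divisor parameter are assumptions made for illustration; the source only gives the three rows of the table, not the rule the engine actually uses.

```python
from itertools import product


def asymmetric_scale(screen_percentage, max_divisor=2):
    """Return (x_scale, y_scale) for a requested ScreenPercentage.

    Hypothetical rule: choose integer divisors (dx, dy), reducing X
    before Y, so that 1 / (dx * dy) is as close as possible to the
    pixel fraction a symmetric scale would give, (percentage / 100)^2.
    """
    target = (screen_percentage / 100.0) ** 2
    best = None
    for dx, dy in product(range(1, max_divisor + 1), repeat=2):
        # Reduce X first and keep the two divisors within one step.
        if dy > dx or dx - dy > 1:
            continue
        error = abs(1.0 / (dx * dy) - target)
        if best is None or error < best[0]:
            best = (error, dx, dy)
    _, dx, dy = best
    return 1.0 / dx, 1.0 / dy


if __name__ == "__main__":
    for percentage in (100, 70, 50):
        x, y = asymmetric_scale(percentage)
        print(f"{percentage:>3}%  X={x}  Y={y}  pixels={x * y}")
```

With the default max_divisor of 2 the sketch only ever returns the three combinations shown in the table. How values below 50% are handled is not covered by the source, so the sketch makes no claim about them.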

RealisticRendering at ScreenPercentage 100 - 64 FPS

RealisticRendering at ScreenPercentage 70 - 76 FPS

RealisticRendering at ScreenPercentage 50 - 84 FPS
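
For comparison with the pixel fractions, the frame rates above can be converted into frame times and relative speedups. The short Python sketch below uses only the three figures quoted in the captions; the helper names are illustrative.

```python
# (ScreenPercentage, frames per second) from the captions above.
MEASUREMENTS = [(100, 64.0), (70, 76.0), (50, 84.0)]


def frame_time_ms(fps):
    """Convert frames per second into milliseconds per frame."""
    return 1000.0 / fps


def speedup(baseline_fps, fps):
    """Frame rate relative to the ScreenPercentage 100 baseline."""
    return fps / baseline_fps


if __name__ == "__main__":
    baseline_fps = MEASUREMENTS[0][1]
    for percentage, fps in MEASUREMENTS:
        print(f"ScreenPercentage {percentage:>3}: "
              f"{frame_time_ms(fps):6.2f} ms per frame, "
              f"{speedup(baseline_fps, fps):.4f}x vs 100")
```

The frame rate rises less than the pixel fraction falls, since the ScreenPercentage here scales only the ray tracing effects rather than the whole frame.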

8. BVH Visualization

Supported versions: RTX-4.25

While most RTX-Dev features address performance or image quality, the BVH visualization modes are geared toward debugging scene setup. The BVH visualization showflags enable an artist or developer to understand how the objects in the scene interact in the acceleration structure. The available modes are:

VisualizeBVHOverlap – Display the count of volumes overlapping pixels in the gbuffer.

VisualizeBVHComplexity – Display the count of volumes between the eye and the pixel in the gbuffer.

These visualization modes allow one to understand the density of objects in the BVH and the amount of overlap between objects. These metrics can provide insight into where ray tracing traversal costs might be driven up by the way the geometry is laid out. Since traversal deals with bounding volume hierarchies, excessive overlap may mean that a large number of objects need to be tested to find the final intersection.
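
The two metrics can be illustrated on a toy scene of axis-aligned bounding boxes. The Python sketch below counts, for a single pixel, how many boxes contain the visible surface point (overlap) and how many boxes the segment from the eye to that point passes through (complexity). It is a conceptual illustration only: the AABB class and both functions are hypothetical, and the engine's showflags are not implemented this way as far as the source indicates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AABB:
    lo: tuple  # (x, y, z) minimum corner
    hi: tuple  # (x, y, z) maximum corner

    def contains(self, point):
        return all(l <= c <= h for l, c, h in zip(self.lo, point, self.hi))

    def intersects_segment(self, start, end):
        """Slab test limited to the segment parameter range [0, 1]."""
        t_min, t_max = 0.0, 1.0
        for l, h, s, e in zip(self.lo, self.hi, start, end):
            d = e - s
            if d == 0.0:
                # Segment is parallel to this slab: must start inside it.
                if s < l or s > h:
                    return False
                continue
            t0, t1 = (l - s) / d, (h - s) / d
            if t0 > t1:
                t0, t1 = t1, t0
            t_min, t_max = max(t_min, t0), min(t_max, t1)
            if t_min > t_max:
                return False
        return True


def overlap_count(boxes, surface_point):
    """Number of volumes that contain the visible surface point."""
    return sum(box.contains(surface_point) for box in boxes)


def complexity_count(boxes, eye, surface_point):
    """Number of volumes crossed between the eye and the surface point."""
    return sum(box.intersects_segment(eye, surface_point) for box in boxes)


if __name__ == "__main__":
    boxes = [
        AABB((0.0, 0.0, 0.0), (2.0, 2.0, 2.0)),    # contains the point
        AABB((1.0, 1.0, 1.0), (3.0, 3.0, 3.0)),    # overlaps the first box
        AABB((-3.0, 1.0, 1.0), (-2.0, 2.0, 2.0)),  # between eye and point
        AABB((5.0, 0.0, 0.0), (6.0, 2.0, 2.0)),    # behind the point
    ]
    eye = (-5.0, 1.5, 1.5)
    point = (1.5, 1.5, 1.5)
    print("overlap:", overlap_count(boxes, point))
    print("complexity:", complexity_count(boxes, eye, point))
```

In this toy scene the box behind the surface point contributes to neither count, while the box sitting between the eye and the point raises the complexity count without affecting overlap.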

VisualizeBVHOverlap mode showing the number of volumes each pixel in the gbuffer is contained within. The area around the bookshelf to the top left shows that there is a fair amount of overlap in the region.

VisualizeBVHComplexity mode showing the number of volumes crossed between the viewer and the object in the scene. The halos around objects provide a rough sense of where bounding volumes for objects lie.

Reference view of the same Realistic Rendering scene used above in the visualization screenshots.

These CVars control the display of the data to help adapt the visualization to your scene:

r.RayTracing.VisualizeBVH.ColorMap
r.RayTracing.VisualizeBVH.Encoding
r.RayTracing.VisualizeBVH.Range
r.RayTracing.VisualizeBVH.RangeMin

The ColorMap CVar allows the user to select different color mappings to make the data easier to understand. The Encoding CVar selects between logarithmic and linear mappings for the density to help discern detail. Finally, the Range and RangeMin CVars allow the user to adjust the scale to best fit the data being examined.
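
To see why a logarithmic encoding helps discern detail, the Python sketch below normalizes a volume count into the 0 to 1 range that a color map would consume, using either a linear or a logarithmic mapping. The function name, the parameter names, and the treatment of the range as a minimum and maximum count are all assumptions for illustration; the source does not document how the engine interprets the values of these CVars.

```python
import math


def normalize_count(count, range_min, range_max, encoding="linear"):
    """Map a volume count to [0, 1] for lookup in a color map.

    Assumption: range_min and range_max are the counts that map to the
    two ends of the color map. Counts outside the range are clamped.
    """
    if range_max <= range_min:
        raise ValueError("range_max must be greater than range_min")
    clamped = min(max(count, range_min), range_max)
    if encoding == "linear":
        return (clamped - range_min) / (range_max - range_min)
    if encoding == "log":
        # log1p keeps a count equal to range_min at exactly 0.
        return (math.log1p(clamped - range_min)
                / math.log1p(range_max - range_min))
    raise ValueError(f"unknown encoding: {encoding}")


if __name__ == "__main__":
    for count in (0, 1, 2, 4, 8, 16):
        linear = normalize_count(count, 0, 16, "linear")
        log = normalize_count(count, 0, 16, "log")
        print(f"count={count:>2}  linear={linear:.3f}  log={log:.3f}")
```

A logarithmic mapping spreads low counts across more of the color map, which makes small differences visible in sparse regions, while a linear mapping preserves proportional differences at high counts.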

An important aspect of the BVH visualization tools is that they are imperfect approximations. They estimate compiled BLAS nodes via AABB nodes, so they do not represent any efficiencies internal to the BLAS nodes. Additionally, certain mesh types are presently unsupported, as the BLASes do not correspond 1:1 with the scene proxy objects in the engine. With these limitations in mind, the BVH visualizations can provide insight into how scene construction may be impacting ray traversal performance.

