Command-Line Interface
In addition to the UI and the software API, Graphics Captures may be created by CLI tools. Like the GUI tools, the CLI tools can intercept a user application and capture the application’s graphical API calls and application resources into a form that can then be replayed in a standalone manner. The collection of these API calls and their resources is colloquially referred to as a “capture file.” In the Nsight Graphics tool, this capture file has the ngfx-capture extension.
Important Features
Capture a series of application calls and resources into a standalone replayable format.
Supports both single- and multi-frame capture; up to 60 frames.
Captures can be replayed in a standalone manner. A best effort is made to replay on configurations that differ from the one on which the capture was made.
How to Capture
The ngfx-capture tool is a command line executable used to launch a target application with the Nsight Graphics capture libraries injected so that graphics capture files can be generated.
To generate a capture, first launch the target application using the ngfx-capture tool by configuring the target application’s executable path, working directory, command line arguments, and any environment variables. These options are detailed in the Capture Command Line Argument and Options section.
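If you script captures, it can help to assemble the launch command programmatically. The following is a minimal sketch using a hypothetical helper, `build_capture_launch`, that only uses flags listed in `ngfx-capture --help` (`--exe`, `--wd`, `--env`, `--args`). The `NAME=VALUE` form for `--env` entries is an assumption, and the help text does not say how application arguments that begin with `-` (such as `-d3d12`) are treated after `--args`.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def build_capture_launch(
    exe: str,
    working_dir: Optional[str] = None,
    app_args: Sequence[str] = (),
    env: Sequence[str] = (),
    tool: str = "ngfx-capture",
) -> List[str]:
    """Hypothetical helper: return argv that launches `exe` under ngfx-capture."""
    cmd = [tool, "--exe", exe]
    if working_dir:
        cmd += ["--wd", working_dir]  # --wd is the short alias of --working-dir
    if env:
        # Assumption: each entry is written as "NAME=VALUE".
        cmd += ["--env", *env]
    if app_args:
        # --args accepts several values; it is placed last here as a precaution.
        cmd += ["--args", *app_args]
    return cmd


if __name__ == "__main__":
    argv = build_capture_launch(
        r"C:\VulkanSDK\1.3.246.0\Bin\vkcube.exe",
        working_dir=r"C:\VulkanSDK\1.3.246.0\Bin",
    )
    print(argv)
    if shutil.which(argv[0]):  # launch only when ngfx-capture is on PATH
        subprocess.run(argv, check=False)
```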
Once launched, ngfx-capture should produce some console output indicating the target application was successfully launched.
> ngfx-capture.exe --exe "C:\VulkanSDK\1.3.246.0\Bin\vkcube.exe"
Launching C:\VulkanSDK\1.3.246.0\Bin\vkcube.exe ...
[vkcube.exe] Connection Established: 2023-08-09 11:08:05
Additionally, upon a successful launch the target application should display the Nsight Graphics HUD if a supported graphics API is being used.
When you would like to capture, either press the capture hotkey (default F11) or click the capture button on the Nsight Graphics HUD. Alternatively, ngfx-capture can be configured to capture automatically after a given timeout or once a given number of frames has been presented.
When the capture process begins, the application briefly pauses while initial data and state are collected. The target application then resumes and the frames are captured. When the capture completes, the capture file is written to disk and ngfx-capture prints capture statistics to the console.
[vkcube.exe] STARTING CAPTURE: 2023-08-09 11:12:19
[vkcube.exe] Initializing Capture ......................... 100% (58ms)
[vkcube.exe] Capture Begin ................................ 100% (177ms)
[vkcube.exe] Capturing 1 Frame ............................ 100% (40ms)
[vkcube.exe] ENDING CAPTURE 2023-08-09 11:12:19
[vkcube.exe] Capture End .................................. 100% (40ms)
[vkcube.exe] Generating Screenshot ........................ 100% (16ms)
[vkcube.exe] Encoding Object Info ......................... 100% (37ms)
[vkcube.exe] Capture Finalize ............................. 100% (22ms)
[vkcube.exe] Finalizing Capture File ...................... 100% (4ms)
[vkcube.exe] CAPTURE STATS
[vkcube.exe] Capture Time ................................. 400ms
[vkcube.exe] Event Count .................................. 6
[vkcube.exe] Resource Count ............................... 32
[vkcube.exe] FILE STATS
[vkcube.exe] File Size .................................... 158 KiB
[vkcube.exe] Data Chunk Count ............................. 15
[vkcube.exe] File Write Speed ............................. 277.424 MiB/s
[vkcube.exe] COMPRESSION STATS
[vkcube.exe] Compression Mode ............................. Normal (LZ4)
[vkcube.exe] Uncompressed Size ............................ 2.67 MiB
[vkcube.exe] Compression Speed ............................ 176.997 MiB/s
[vkcube.exe] Compression Ratio ............................ 5.794%
[vkcube.exe] Saved to C:\Users\soandso\Documents\NVIDIA Nsight Graphics\GraphicsCaptures\vkcube_2023_08_09_11_12_19.ngfx-capture
By default, the capture file is saved to ${MY_DOCUMENTS}\NVIDIA Nsight Graphics\GraphicsCaptures, and its filename is the target process name with a timestamp appended. The output directory and file name can be changed at launch time with the --output-dir and -o/--output-file options.
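To predict where a capture will land without parsing console output, the default name can be reconstructed. This sketch infers the `<process>_YYYY_MM_DD_HH_MM_SS.ngfx-capture` pattern from the sample above (`vkcube.exe` captured at 2023-08-09 11:12:19 became `vkcube_2023_08_09_11_12_19.ngfx-capture`); it is not a published specification. The caller supplies the resolved Documents folder, and `default_capture_path` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def default_capture_path(documents: str, process_name: str, when: datetime) -> PureWindowsPath:
    """Guess the default capture path when -o/--output-dir are not given."""
    stem = PureWindowsPath(process_name).stem  # "vkcube.exe" -> "vkcube"
    name = f"{stem}_{when.strftime('%Y_%m_%d_%H_%M_%S')}.ngfx-capture"
    return PureWindowsPath(documents) / "NVIDIA Nsight Graphics" / "GraphicsCaptures" / name
```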
Capture Command Line Argument and Options
For a detailed list of the command line options, pass -h or --help to the ngfx-capture executable.
NVIDIA Nsight Graphics Capture CLI Tool
Usage: ngfx-capture [OPTIONS]
Options:
-h,--help Print this help message and exit
--version Display program version information and exit
Launch Type:
Launch type.
[Exactly 1 of the following options is required]
Application Launch Options:
-e,--exe TEXT Executable path.
Application Launch Options:
--working-dir,--wd TEXT Working directory.
--args TEXT ... Arguments to pass to executable.
--env TEXT ... Environment variables to inject into process.
--no-hud Disable the Nsight Graphics Capture HUD.
--hud-position TEXT Set the initial position of the Nsight Graphics Capture HUD. May also be used to make the HUD hidden. (default; "Top Left")
--new-console Create a separate, new console from the one the ngfx-capture executable is launched in.
--terminate-after-capture Terminate the application after capture is complete.
Capture Output Options:
-o,--output-file TEXT Output capture file name
--output-dir TEXT Output directory
-n,--frame-count :UINT in [1 - 600] The number of frames to capture.
--bundle-replayer Bundle the ngfx-replay replayer and its dependencies within the capture file (default).
--no-bundle-replayer Do not bundle the ngfx-replay replayer and its dependencies within the capture file.
--non-portable Disable portability of this capture. This will reduce capture size and may lead to increased performance for the system on which the application was captured. The capture may not be able to be replayed on a different system, however.
--compression-level-high Higher Compression. Captures may be generated/load more slowly but with reduced disk space.
--compression-library-zstd Compress using ZSTD library
--compression-library-lz4 Compress using LZ4 library (default)
--no-compression Disable Compression. Not recommended unless debugging.
Capture Triggers:
Select between manual and programmatic capture triggers.
[At most 1 of the following options are allowed]
Options:
--capture-hotkey Capture by Hotkey (default; F11).
--capture-frame :>=1 Capture a specific frame (1-based). The frame number must be greater than 1.
--capture-countdown-timer Capture after a given countdown timer (milliseconds).
Recompression Options:
--recompress-small-data-threshold UINT [16384] Needs: --recompress
Threshold for small data dictionary recompression
Host Visible Video Memory (HVVM) Mode:
HVVM (also known as GPU_UPLOAD heaps) does not support coherent buffer update monitoring, nor capture/replay memory. These options provide workarounds to accommodate these limitations.
[At most 1 of the following options are allowed]
Options:
--hvvm-demote Demote HVVM to system memory (default).
--hvvm-disable Disable HVVM memory. Allocations using HVVM will fail at runtime.
--hvvm-manual-tracking Enable HVVM but require manual tracking of updates via API calls.
--hvvm-cpu-hash Enable HVVM and track updates via CPU hash.
Ray Tracing Options:
--use-rtas-serialize-api Serialization acceleration structures using the serialization API.
--max-sbt-size UINT Max shader binding table deep copy size in bytes. 0 indicates unbounded size.
D3D12 Options:
--d3d12-indirect-sbt-buffer-size UINT Configure the size in bytes of the static buffer used to unroll arguments for ExecuteIndirect(DISPATCH_RAYS).
--d3d12-spoof-resize-buffers-success During capture, instead of executing ResizeBuffers it will be bypassed to return S_OK.
Troubleshooting Knobs:
--passthrough Launch the target application, tracking child process launches, but without injecting capture code. This allows the application to run normally without any capture overhead or modifications. This is useful for disambiguating bad launch arguments from other problems.
--ignore-incompatible If enabled, the frame will attempt to capture despite any incompatibilities. Possible outcomes of proceeding despite an incompatibility include a crash, hang, rendering errors, or incorrect data. Use this option only when you are certain that the incompatibility will not impact your analysis.
--no-lazy-data-collection Disables lazy collection of resource data and instead capture all data at the start of capture.
--block-on-first-incompatibility If enabled, a blocking message box will report the first incompatibility encountered by the application.
--no-block-on-first-incompatibility Disable blocking incompatibility warnings.
--no-block-on-interfering-application Disable blocking on interfering application warnings.
--no-internal-pipeline-caches Disables the internal usage of pipeline caches.
--no-uncached-memory-demotion Disables demoting uncached write-combined memory into cached memory. Cached memory allows for increased capture performance but may impact application GPU performance if write-combined memory is heavily used.
--no-streamline-capture Disable the streamline informational capture feature that shows streamline calls as comments in the event list
--no-process-injection Launch the target application without any process injection. This mode acts as a simple launcher without any interference from process injection. This is useful for disambiguating bad launch arguments from other problems.
--no-vulkan-write-watch-memory Disable the use of write watch to track host visible memory updates.
--no-vulkan-capture-replay-memory Disable overriding device memory allocation flags with VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_BIT and VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_CAPTURE_REPLAY_BIT. This is necessary if the application is later binding addressable buffers but incorrectly excluded the flags on the associated memory.
--no-vulkan-private-data-lookups Disable internal usage of private data objects through VK_EXT_private_data.
Troubleshooting Knobs:
Embed Logging Options
[At most 1 of the following options are allowed]
Troubleshooting Knobs:
--embed-logging Capture log messages more severe than info (warnings, errors, fatal) (default)
Note that captured log messages are embedded with the capture file and can be dumped during replay with the --metadata-logs option
--embed-logging-verbose Capture all log messages regardless of severity
--no-embed-logging Disable capture of log messages entirely
Troubleshooting Knobs:
Resource Data Options
Troubleshooting Knobs:
--capture-full-gpu-allocs Capture full memory allocations (i.e. ID3D12Heap or VkDeviceMemory) as opposed to individual resources (i.e. ID3D12Resource or VkBuffer). This will increase the memory consumption and overhead of capture, but may be necessary for addressing issues where applications read outside the bounds of their defined buffers. This will also reduce the portability of the capture.
Recompression Options:
--recompress Recompress an existing capture file using a higher compression rate
To learn more about Nsight Graphics's ngfx-capture utility, see the documentation at
http://devtools.nvidia.com/docs/Staging/devtools/Dev/Grfx/nsight-graphics/public/UserGuide/graphics-capture-cli.html.
For full documentation, see http://https://docs.nvidia.com/nsight-graphics/index.html.
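Several option groups above are mutually exclusive, and some options take bounded values. Before launching a long scripted run, a quick pre-flight check can catch obvious mistakes. This is a sketch built only from the constraints printed in the help text: "Exactly 1" for the launch type, "At most 1" for capture triggers, HVVM mode, and embedded logging, and the `[1 - 600]` range for `-n,--frame-count`. The function name `check_capture_args` is hypothetical, and ngfx-capture itself remains the authority on what it accepts.

```python
from typing import Dict, List, Sequence, Set

# "At most 1 of the following options" groups, as listed by `ngfx-capture --help`.
EXCLUSIVE_GROUPS: Dict[str, Set[str]] = {
    "Capture Triggers": {"--capture-hotkey", "--capture-frame", "--capture-countdown-timer"},
    "HVVM Mode": {"--hvvm-demote", "--hvvm-disable", "--hvvm-manual-tracking", "--hvvm-cpu-hash"},
    "Embed Logging": {"--embed-logging", "--embed-logging-verbose", "--no-embed-logging"},
}
FRAME_COUNT_RANGE = (1, 600)  # from "-n,--frame-count :UINT in [1 - 600]"


def check_capture_args(argv: Sequence[str]) -> List[str]:
    """Return a list of problems in an ngfx-capture argv (empty if none found).

    Naive: it does not tell ngfx-capture flags apart from application
    arguments passed through --args.
    """
    problems: List[str] = []
    # Launch Type: exactly one of -e/--exe is required.
    if sum(arg in ("-e", "--exe") for arg in argv) != 1:
        problems.append("exactly one of -e/--exe is required")
    flags = {arg for arg in argv if arg.startswith("--")}
    for group, members in EXCLUSIVE_GROUPS.items():
        used = sorted(flags & members)
        if len(used) > 1:
            problems.append(f"{group}: at most one of {used} is allowed")
    lo, hi = FRAME_COUNT_RANGE
    for i, arg in enumerate(argv):
        if arg in ("-n", "--frame-count"):
            value = argv[i + 1] if i + 1 < len(argv) else ""
            if not (value.isdigit() and lo <= int(value) <= hi):
                problems.append(f"{arg} must be an integer in [{lo}, {hi}], got {value!r}")
    return problems
```

For example, passing both `--capture-hotkey` and `--capture-frame 5` is reported as a Capture Triggers conflict.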
How to Replay
The ngfx-replay tool is a command line executable used to launch a replay of a graphics capture, such as one created with the ngfx-capture command line executable.
To replay a capture, invoke ngfx-replay with the capture file path and any desired options. These options are detailed in the Replay Command Line Argument and Options section.
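A scripted replay can be assembled the same way. The following minimal sketch uses a hypothetical helper, `build_replay`, and only flags shown in `ngfx-replay --help`: `--loop-count` (the long form of `-n`), `--vsync-off`, and `--present-hidden`. The capture file is the required positional argument.

```python
import shutil
import subprocess
from typing import List, Optional


def build_replay(capture: str, loops: Optional[int] = None, vsync_off: bool = False,
                 hidden: bool = False, tool: str = "ngfx-replay") -> List[str]:
    """Hypothetical helper: return argv that replays `capture` with a few options."""
    cmd = [tool]
    if loops is not None:
        if loops < 1:  # help text: "-n,--loop-count UINT:>=1"
            raise ValueError("--loop-count must be >= 1")
        cmd += ["--loop-count", str(loops)]
    if vsync_off:
        cmd.append("--vsync-off")
    if hidden:
        cmd.append("--present-hidden")
    cmd.append(capture)  # required positional filename, placed last for readability
    return cmd


if __name__ == "__main__":
    argv = build_replay("GravityMark_2023_08_16_08_24_52.ngfx-capture", loops=100, vsync_off=True)
    print(argv)
    if shutil.which(argv[0]):  # launch only when ngfx-replay is on PATH
        subprocess.run(argv, check=False)
```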
Once launched, ngfx-replay should print information about the capture, followed by the replay initialization steps, and finally console output indicating that the replay loop has begun.
> ngfx-replay.exe GravityMark_2023_08_16_08_24_52.ngfx-capture
NVIDIA Nsight Graphics Capture Replayer
Loading File ................................... 100% (20ms)
Capture Information:
> Process: MyGame
> Command Line: "MyGame.exe" -d3d12
> Time: 2023_08_16_08_24_52
> Nsight Version: 2023.4.0
> Operating System: Windows 11 (21H2)
> Primary GPU: NVIDIA GeForce RTX 3090
> Driver Vendor: NVIDIA
> Driver Version: 536.52
> Primary API Version: D3D12
Initializing Function Stream ................... 100% (2ms)
Creating Resources ............................. 100% (291ms)
Initializing Resources ......................... 100% (1145ms)
Decoding Function Stream ....................... 100% (1ms)
Function Stream Optimization ................... 100% (0ms)
Function Stream Pre-Pass ....................... 100% (0ms)
Initializing Resource Reset .................... 100% (259ms)
Initializing Execution Engine .................. 100% (2ms)
Replaying Capture:
> 141.19 FPS (Frame: 7.08 ms, Reset: 1.37 ms)
> 144.24 FPS (Frame: 6.93 ms, Reset: 1.37 ms)
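To collect results from automated replays, the `> Key: Value` capture information and the per-sample FPS lines can be parsed. The regular expressions below are inferred from the sample output above and are not a documented format. In that sample, the printed FPS closely matches 1000 divided by the frame time in milliseconds (1000 / 7.08 ≈ 141.2), which suggests the FPS figure excludes reset time. That is an inference, not a stated guarantee. `parse_replay_output` is a hypothetical helper.

```python
import re
from typing import Dict, List, Tuple

FPS_LINE = re.compile(
    r"^> (?P<fps>[\d.]+) FPS \(Frame: (?P<frame>[\d.]+) ms, Reset: (?P<reset>[\d.]+) ms\)$"
)
INFO_LINE = re.compile(r"^> (?P<key>[^:]+): (?P<value>.+)$")


def parse_replay_output(text: str) -> Tuple[Dict[str, str], List[Tuple[float, float, float]]]:
    """Return (capture information, [(fps, frame_ms, reset_ms), ...])."""
    info: Dict[str, str] = {}
    samples: List[Tuple[float, float, float]] = []
    for raw in text.splitlines():
        line = raw.strip()
        # Check FPS lines first: they would also match the generic "Key: Value" pattern.
        if m := FPS_LINE.match(line):
            samples.append((float(m["fps"]), float(m["frame"]), float(m["reset"])))
        elif m := INFO_LINE.match(line):
            info[m["key"]] = m["value"]
    return info, samples
```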
Replay Expectations
Capture files contain a trace of all API calls that occurred between capture begin and capture end. They also contain the data for mid-trace host-visible buffer updates, and enough information to recreate all objects used in the API function trace. During replay, the API calls and host-visible buffer updates are replayed in the order in which they occurred. By default, this replay is issued serially and as fast as possible, while the recording of command lists or command buffers that occurred during the frame is multithreaded. As a result, graphics replays are generally expected to be at least as GPU-bound as the original application. See Replay Command Line Argument and Options for the options that control replay multithreading and related behavior.
By default, the replayer assumes that buffers, textures, and other objects must have their data reset before each successive iteration of the replay loop, so that every iteration replays correctly. This reset is typically expensive in wall-clock time. To help you understand the cost of the captured workload, the tool's command-line output reports the frame cost separately from the reset cost. External FPS tools and profilers typically do not distinguish between the two, so the replayer also injects API-specific markers to help separate the frame cost from the reset cost. Reset behavior is configurable; see Replay Command Line Argument and Options.
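As a worked estimate of how reset cost affects what an external tool might observe, take the first sample line of the replay output above (Frame: 7.08 ms, Reset: 1.37 ms). This assumes both times are in milliseconds and that the printed FPS is derived from frame time alone, which the sample numbers are consistent with but which is an inference. It also ignores the wait-for-idle and present overhead, so a real external measurement of the full loop could differ.

```latex
\begin{aligned}
\text{FPS}_{\text{frame}} &\approx \frac{1000}{t_{\text{frame}}} = \frac{1000}{7.08} \approx 141.2 \\
\text{FPS}_{\text{loop}}  &\approx \frac{1000}{t_{\text{frame}} + t_{\text{reset}}} = \frac{1000}{7.08 + 1.37} = \frac{1000}{8.45} \approx 118.3
\end{aligned}
```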
Note also that a wait-for-idle operation is performed after both the frame workload and the reset workload, so all pending work on all GPU queues completes before replay continues. This ensures that the frame and reset work do not conflict with each other. Keeping the two from overlapping is also preferable for users who want to profile GPU activity.
Replay Compatibility
Generally, the replayer strives to be backward compatible, meaning that a capture file from a previous version of the tool should continue to work with the latest replayer. Note that the reverse is not true; newer captures may contain data that an older replayer does not know how to handle.
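Given that rule, a script can warn before handing a capture to an older replayer. This sketch compares dotted numeric versions: the capture's version could come from the "Nsight Version" field in the Capture Information header printed at replay startup (shown above), while the replayer's own version would come from `--version`, whose output format is not shown here, so extracting the number is left to the caller. The helper names are hypothetical, and a `False` result does not guarantee a successful replay, since GPU, driver, and OS differences still matter.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '2023.4.0' into (2023, 4, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def replayer_may_be_too_old(capture_version: str, replayer_version: str) -> bool:
    """True if the capture was written by a newer tool version than the replayer."""
    cap, rep = parse_version(capture_version), parse_version(replayer_version)
    width = max(len(cap), len(rep))
    # Pad with zeros so that "2023.4" and "2023.4.0" compare as equal.
    cap += (0,) * (width - len(cap))
    rep += (0,) * (width - len(rep))
    return cap > rep
```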
It is also assumed that a capture from a particular GPU, driver, and OS will continue to replay on that same GPU, driver, and OS. The product strives to replay more broadly as well, that is, on other GPUs and drivers, but this portability depends partly on the options used during capture; for example, --non-portable and --capture-full-gpu-allocs both reduce the portability of a capture. See Capture Command Line Argument and Options for details.
Finally, some conditions make correct replay difficult or impossible; for instance, alignment requirements may differ, especially between GPU vendors. The replayer makes a best effort to detect these conditions and exit gracefully.
Replay Command Line Argument and Options
For a detailed list of the command line options, pass -h or --help to the ngfx-replay executable.
NVIDIA Nsight Graphics Replayer CLI Tool
Usage: ngfx-replay [OPTIONS] filename
Positionals:
filename TEXT REQUIRED Graphics capture file
Options:
-h,--help Print this help message and exit
--version Display program version information and exit
--quiet Quiet all console logging (verbosity level 0)
-v,--verbose Verbose console logging (verbosity level 3)
--verbosity-level UINT Console logging verbosity level
--no-present-blit Disable replayer blit from render target to swapchain. Will appear blank but GPU workload will be closer to real application.
--no-seh Suppress catching and processing Win32 structured exceptions
Application Replay Options:
-n,--loop-count UINT:>=1 Replay the specified number of loops
--perf-report-dir TEXT Collect replay performance information to the specified dir
--fixed-timestamps Replay multi-frame captures no faster than the FPS they were captured at
--temp-resource-dir TEXT Create temporary resources files in the specified dir (default=system temp dir)
--no-multithreaded-record Suppress multithreaded record of queue work
--no-multithreaded-pipeline-create Suppress multithreaded creation of all pipelines
--no-multithreaded-rt-pipeline-create Suppress multithreaded creation of ray tracing pipelines
--no-multithreaded-init Suppress multithreaded resource initialization
--max-worker-threads UINT Max number of worker threads used for initialization, command recording, and reset
--no-object-reset-uid UINT ... Skip resetting specific object data in replay loop (specified by uid)
--no-initialized-in-frame-detection Skip checks to detect if an object is fully initialized in frame and therefore can skip reset
--no-internal-perf-markers Hide Nsight internal perf marker usage (e.g. reset, blit-on-present, etc.)
--inject-full-frame-perf-marker Add an perf marker to wrap the non-internal frame work
--no-sysmem-fallback Disable falling back to sysmem when video memory allocations fail for replay resources
Primary Reset Modes:
Primary modes for resource reset
[At most 1 of the following options are allowed]
Options:
--reset Reset all dirty object data in replay loop (default)
--no-reset Skip resetting all object data in replay loop
--reset-only TEXT:{compute,mappable,nonmappable,raster} ...
Selectively opt-in to class(es) of resources to reset
Additional Reset Options:
--reset-force-all-regions Reset all regions of objects even if they are not considered to be dirty
--skip-mapped-memory-and-descriptor-updates-after-iteration-zero
Skip CPU data and descriptor updates after iteration zero
--max-vidmem-bytes-reset-allocation UINT
Max bytes of local memory allowable for replayer's internal reset buffers, used for buffers, textures, heaps, device memory. If exceeded the replayer will spill over to sysmem
Presentation Mode:
Strategy for present to screen
[At most 1 of the following options are allowed]
Options:
--present-wb Force borderless window mode (default)
--present-app Use application presentation mode
--present-hidden Hide replay window
VSync Mode:
Strategy for controlling vsync
[At most 1 of the following options are allowed]
Options:
--vsync-app Use application vsync mode (default)
--vsync-off Force vsync off
--vsync-on Force vsync on
Device Selection:
Explicit override options of GPU device for replay. If none are specified the best match for the capture device will be used.
[At most 1 of the following options are allowed]
Device Selection:
--device-name TEXT Select device by name regex
--device-vendor TEXT Select device by vendor name regex
--device-index UINT Select device by system defined index
Replayer Bundling:
--bundle-replayer Replay capture via bundled version of ngfx-replay executable and with bundled resources
--bundle-replayer-no-rename Needs: --bundle-replayer
Do not rename the ngfx-replay executable when replaying from bundle
--bundle-replayer-dir TEXT Needs: --bundle-replayer
Extract the bundle to the specified directory, creating the directory if it does not exist. By default, a temporary directory is used and cleaned after replay.
--bundle-replayer-extract-only Needs: --bundle-replayer
Extracts the replayer without issuing a replay
Metadata Output:
Options to output some type of metadata and exit, rather than replaying
[At most 1 of the following options are allowed]
Options:
--metadata Print metadata and exit
--metadata-screenshot TEXT Save metadata screenshot (final present embedded in capture) to path and exit (*.png|tga|bmp|jpg supported)
--metadata-functions Print function stream and exit
--metadata-logs Print all captured log messages and exit
--metadata-logs-errors Print all errors from captured log messages and exit
Multibuffer Options:
Multibuffering options
[At most 1 of the following options are allowed]
Application Replay Options:
--multibuffer Enable multi-buffering of the recording, syncs, descriptors, and memory to potentially minimize reset cost
--multibuffer-record-and-sync Enable multi-buffering of recording and syncs to potentially minimize reset cost
Application Profile Options:
Application settings profile options
[At most 1 of the following options are allowed]
Options:
--no-app-profile Disable all profile overrides for the replayer.
Troubleshooting Knobs:
--diagnostic-checkpoints Enable API specific GPU checkpoints (aka breadcrumbs)
--no-internal-pipeline-caches Disable internal usage of pipeline caches
--no-pipeline-caches Disable all usage of pipeline caches
--timeout-interval UINT Set timeouts in milliseconds (default: 2000)
--force-trace-rays-dimensions-to-zero Force the trace rays dimensions to zero. This is useful for debugging purposes.
--no-memory-mapped-file Disable memory mapped file reader
--no-aftermath-replay Disable replay of Aftermath calls from the original application
--no-ngx-replay Disable initialization and replay of the NGX API calls from the original application
--no-dstorage-replay Disable initialization and replay of DirectStorage API calls from the original application
--no-crash-reporting Disable crash reporting
--no-stack-in-crash-reporting Disable showing the callstack in the crash report
--no-block-on-incompatibility Do not show popup for replay incompatibilities; continue automatically
--no-bundled-dlss-plugins Disable using DLSS plugins from the captured application. Instead the plugins deployed by the replayer will be used
--dlss-plugin-path TEXT Load DLSS plugins from the specified path
DX12 Agility Runtime Options:
Options to control which DX12 Agility runtime to use, or the default system dlls
[At most 1 of the following options are allowed]
Troubleshooting Knobs:
--force-dx12-agility-preview Force the usage of the DX12 preview agility runtime
To learn more about Nsight Graphics's ngfx-replay utility, see the documentation at
http://devtools.nvidia.com/docs/Staging/devtools/Dev/Grfx/nsight-graphics/public/UserGuide/graphics-capture-cli.html.
For full documentation, see http://https://docs.nvidia.com/nsight-graphics/index.html.
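The replayer bundling options carry "Needs: --bundle-replayer" constraints. The following sketch encodes those dependencies and builds a command that extracts the bundled replayer without replaying. It presumably only helps for captures that were made with the replayer bundled, which the ngfx-capture help lists as the default (--bundle-replayer). `missing_dependencies` and `build_bundle_extract` are hypothetical helpers.

```python
from typing import Dict, List, Optional, Sequence

# "Needs:" relationships listed in `ngfx-replay --help`.
REQUIRES: Dict[str, str] = {
    "--bundle-replayer-no-rename": "--bundle-replayer",
    "--bundle-replayer-dir": "--bundle-replayer",
    "--bundle-replayer-extract-only": "--bundle-replayer",
}


def missing_dependencies(argv: Sequence[str]) -> List[str]:
    """List options in argv whose required companion option is absent."""
    present = set(argv)
    return [f"{opt} needs {dep}" for opt, dep in REQUIRES.items()
            if opt in present and dep not in present]


def build_bundle_extract(capture: str, out_dir: Optional[str] = None,
                         tool: str = "ngfx-replay") -> List[str]:
    """argv that extracts the bundled replayer from `capture` without replaying."""
    cmd = [tool, "--bundle-replayer", "--bundle-replayer-extract-only"]
    if out_dir:
        # Per the help text, the directory is created if it does not exist.
        cmd += ["--bundle-replayer-dir", out_dir]
    cmd.append(capture)
    return cmd
```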