16.1. Clara Deploy GPU Profiler


Clara Deploy GPU Profiler monitors GPU utilization in real time, from both compute and memory perspectives, while a job runs on the Clara Platform.

  • Clara GPU Profiler Workflow: Clara Deploy GPU profiler enables the profiler Workflow.
  • Multi-Level Profiling: Multi-level profiling is part of profiler Workflow and is integrated with the tool.
  • Multi-GPU tracking: The profiler supports tracking and monitoring of multiple GPUs.
  • Nsight-systems time series graph: Clara Deploy GPU profiler supports profiling of time series data from Nsight-systems. This feature requires Nsight-systems to be installed on the system, and the operator must also be prepared for Nsight-systems profiling.

A Clara Deploy workflow is defined as a combination of a Clara Deploy Job, multi-level profiling controls (System, Operator, Nsight), and other controls (Save, Export, and Capture).

The user can select the Pipeline and Dataset in any order. Once both the Pipeline and Dataset buckets are filled, the Execute option is enabled. Clicking Execute starts the job on the Clara Deploy Platform. Once the job is complete, system-level details are displayed by default, and other controls are enabled as specified by the user in the pipeline definition.

Pipeline update

Each operator in the pipeline must be updated with a profiler field to enable it for the GPU profiler. A sample profiler field for the bone operator in the bone pipeline is shown below. The user can update individual keys to enable selective profiling activities.


profiler:
  totalTimeStr: "ROI generator Elapsed Time"
  opModules: ["Readinginput", "Histogramgeneration", "Volumethresholding", "VolumeMerging", "MinMaxfinding", "Normalization", "filtering", "3Dconnectedcomponentanalysis", "PublishingOutput"]
  cpurun: False
  rerun: True
  entrypoint: "python roi.py"
  cpurun_params: "NA"
  rerun_params: "-e NVIDIA_CLARA_NOSYNCLOCK=TRUE -e ROI_OPERATION=organ"
  nsys: True
  ncomp: False
  nsight_out_path: "/app/out"
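
As a rough illustration of how the keys in this field relate to the profiling levels, the sketch below inspects a profiler mapping and reports which levels it would enable. The function name `enabled_levels`, the level labels, and the dictionary representation are hypothetical. Only the key names (`totalTimeStr`, `opModules`, `nsys`) come from the sample above, and the mapping of keys to levels follows the descriptions in this section rather than the tool's actual implementation.

```python
def enabled_levels(profiler):
    """Return the profiling levels a profiler mapping would enable.

    `profiler` is a plain dict mirroring one operator's profiler field,
    or None when the operator has no profiler field (hypothetical representation).
    """
    # System-level profiling is available even without a profiler field.
    levels = ["system"]
    if not profiler:
        return levels
    if profiler.get("totalTimeStr"):
        levels.append("operator-runtime-plot")
    if profiler.get("opModules"):
        levels.append("operator")
    if profiler.get("nsys") is True:
        levels.append("nsight-systems")
    return levels


bone = {
    "totalTimeStr": "ROI generator Elapsed Time",
    "opModules": ["Normalization", "filtering"],
    "rerun": True,
    "nsys": True,
    "ncomp": False,
}
print(enabled_levels(bone))  # ['system', 'operator-runtime-plot', 'operator', 'nsight-systems']
print(enabled_levels(None))  # ['system']
```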

Multi-level profiling is part of the Clara Deploy workflow and includes System, Operator, and Nsight profiling. It is enabled by defining the profiler field for each operator in the pipeline definition. If the profiler field is omitted for all operators, the user can still use the tool for system-level profiling.

System Profiling

System-level profiling displays the following plots:

  • Job Timings in seconds
  • Average % GPU Activity
  • GPU % Activity during the Job
  • Percentage GPU Memory occupancy by processes during the Job.
  • Operator runtime plot

On a multi-GPU system, each GPU can be selected to display its activity during the job.

Operator Runtime Plot

The operator runtime plot is an additional plot that is part of system-level profiling. It is generated if the operator's profiler details include the totalTimeStr key. totalTimeStr is the string that the tool looks for in the operator log to register the operator's runtime. For example, in the bone operator, ROI generator Elapsed Time gives the runtime of the operator. In the logs, this must appear in the following format:


<any_string><KEY><any_string>: <time_in_milliseconds>

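A minimal sketch of reading a runtime from a log line in the format above. It assumes the key is matched as a plain substring and that everything after the last colon is the time in milliseconds. The function name `find_runtime_ms` and the sample log lines are hypothetical; the profiler's own parsing code is not shown in this documentation.

```python
def find_runtime_ms(log_lines, key):
    """Return the time (ms) from the first well-formed line containing `key`, else None."""
    for line in log_lines:
        if key not in line:
            continue
        # Everything after the last ':' is expected to be the time in milliseconds.
        _, sep, value = line.rpartition(":")
        if not sep:
            continue
        try:
            return float(value.strip())
        except ValueError:
            continue  # extra text after ':' does not fit the expected format
    return None


log = [
    "Loading volume",
    "ROI generator Elapsed Time: 1532.7",
    "ROI generator Elapsed Time: 12 ms",  # malformed: unit text after ':'
]
print(find_runtime_ms(log, "ROI generator Elapsed Time"))  # 1532.7
```

In this sketch a line such as `... : 12 ms` is skipped, because the unit text after the colon prevents the value from being read as a number.
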
The log must be written so that the time is in milliseconds and is separated from the rest of the text by a colon (:). No text other than the time in milliseconds may follow the colon.

Operator Profiling

Only operators that have opModules populated in the profiler details of the pipeline definition are displayed in operator-level profiling.

Operator Modules

The profiler can pick up modules within an operator and report their runtimes. To enable this, populate the opModules key in the profiler field of the operator description in the pipeline. For example, all the modules of the bone operator are listed as:


opModules: ["Readinginput", "Histogramgeneration", "Volumethresholding", "VolumeMerging", "MinMaxfinding", "Normalization", "filtering", "3Dconnectedcomponentanalysis", "PublishingOutput"]
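
The sketch below illustrates how per-module timings could be collected from an operator log, adding up repeated occurrences of a keyword. It assumes module log lines use the same `<text><keyword><text>: <time_in_milliseconds>` shape as the operator runtime line. That assumption, the function name `sum_module_times`, and the sample log are illustrative and not taken from the tool itself.

```python
from collections import defaultdict


def sum_module_times(log_lines, op_modules):
    """Add up the logged times (ms) for each keyword in `op_modules`."""
    totals = defaultdict(float)
    for line in log_lines:
        text, sep, value = line.rpartition(":")
        if not sep:
            continue  # no ':' separator, so not a timing line
        try:
            elapsed = float(value.strip())
        except ValueError:
            continue  # text after ':' is not a number
        for module in op_modules:
            if module in text:
                totals[module] += elapsed
    return dict(totals)


log = [
    "Normalization: 10.5",
    "filtering: 4.0",
    "Normalization: 2.5",
    "Done",
]
print(sum_module_times(log, ["Normalization", "filtering"]))
# {'Normalization': 13.0, 'filtering': 4.0}
```

Because matching here is by substring, a keyword that also shows up in unrelated timing lines would inflate that module's total.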

If a keyword appears more than once in a log with timings, the timings are added up to report the total for that module. For correct results, keywords in opModules must be unique and must not appear in any other log output of the operator.

Re-run

An operator can be re-run outside the Clara platform as a Docker container. The user must set rerun to True and provide the rerun_param key in the profiler setting of the operator. All parameters required to run the Docker container must be present as a string in the rerun_param key.

CPU-run

Currently not supported.

Nsight Profiling

For Nsight profiling, Nsight-systems 2020.3 (https://developer.nvidia.com/rdp/assets/nsight-systems-2020-3-linux-deb-installer) must be installed on the system that runs Clara Deploy GPU Profiler.

The operator must be prepared for Nsight-systems profiling in GPU Profiler, as follows:

  • Install nsight-systems-2019.3.7.5 on the system.
  • Copy the contents of /usr/local/cuda/nsight-systems-2019.3.7.5/ to the working directory of the operator, in a folder named nsight-systems-2019.3.7.5/. For reference, see the operator roi-generator-perf, packaged with the GPU profiler.
  • Alternatively, nsight-systems-2019.3.7.5 can be installed while creating the operator.
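
The copy step can be sketched in Python as follows. This is only an illustration: the helper name `copy_nsight` and the operator directory in the usage comment are hypothetical, and the same result can be achieved with any file copy tool. `shutil.copytree` with `dirs_exist_ok=True` requires Python 3.8 or later.

```python
import shutil
from pathlib import Path

# Install location named in the steps above.
NSIGHT_DIR = Path("/usr/local/cuda/nsight-systems-2019.3.7.5")


def copy_nsight(operator_dir, source=NSIGHT_DIR):
    """Copy the Nsight-systems install into a same-named folder in the operator's working directory."""
    source = Path(source)
    if not source.is_dir():
        raise FileNotFoundError(f"Nsight-systems not found at {source}")
    target = Path(operator_dir) / source.name
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target


# Example call (hypothetical operator directory):
# copy_nsight("./roi-generator-perf")
```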

The version of Nsight in the operator differs from the version on the system running the Profiler. Ensure that Nsight-systems is active and running on both the system and the operator before using the profiler. Only default parameters are supported for Nsight-systems; this version of the tool offers no parameter control from the GPU profiler. The following fields in the profiler setting must be updated for the operator in the pipeline:


nsys: True
nsys_out_path: # path to save raw data
rerun_param:

Nsight-compute is currently not supported in Clara Deploy GPU profiler.

© Copyright 2018-2020, NVIDIA Corporation. All rights reserved. Last updated on Jun 28, 2023.