16.1. Clara Deploy GPU Profiler
Clara Deploy GPU Profiler monitors GPU utilization in real time, from both compute and memory perspectives, while a job executes on Clara Platform.
Clara GPU Profiler Workflow: Clara Deploy GPU Profiler enables the profiler Workflow.
Multi-Level Profiling: Multi-level profiling is part of the profiler Workflow and is integrated with the tool.
Multi-GPU tracking: The profiler supports multi-GPU tracking and monitoring.
Nsight-systems time series graph: Clara Deploy GPU Profiler supports profiling of time series data from Nsight-systems. Nsight-systems must be installed on the system for this feature, and the operator must also be prepared for Nsight-systems profiling.
A Clara Deploy workflow is defined as a combination of a Clara Deploy Job, Multi-level profiling controls (System, Operator, Nsight) and other controls (Save, Export and Capture).
The user can select the Pipeline and Dataset in any order. Once both the Pipeline and Dataset buckets are filled, the Execute option is enabled. Clicking Execute starts the job on Clara Deploy Platform. Once the job is complete, system-level details are displayed by default and other controls are enabled as specified by the user in the pipeline definition.
16.1.2.1. Pipeline update
Each operator in the pipeline must be updated with a profiler field to enable it for GPU Profiler. A sample profiler field is shown below for the bone operator in the bone pipeline. The user can update the field to enable selective profiling activities.
profiler:
  totalTimeStr: "ROI generator Elapsed Time"
  opModules: ["Readinginput", "Histogramgeneration", "Volumethresholding",
              "VolumeMerging", "MinMaxfinding", "Normalization",
              "filtering", "3Dconnectedcomponentanalysis", "PublishingOutput"]
  cpurun: False
  rerun: True
  entrypoint: "python roi.py"
  cpurun_params: "NA"
  rerun_params: "-e NVIDIA_CLARA_NOSYNCLOCK=TRUE -e ROI_OPERATION=organ"
  nsys: True
  ncomp: False
  nsight_out_path: "/app/out"
Multi-level profiling is part of the Clara Deploy workflow and includes System, Operator and Nsight profiling. It is enabled by defining the profiler field in the pipeline definition of each operator. If the profiler field is missing for all operators, the user can still use the tool for System-level profiling.
16.1.3.1. System Profiling
System-level profiling displays the following plots:
Job Timings in seconds
Average % GPU Activity
GPU % Activity during the Job
Percentage GPU Memory occupancy by processes during the Job.
Operator runtime plot
On a multi-GPU system, each GPU can be selected to display its activity during the job.
16.1.3.1.1. Operator Runtime Plot
Operator runtime is an additional plot that is part of system-level profiling. This plot is generated if the operator has profiler details with the totalTimeStr key. totalTimeStr is the string that the tool looks for in the operator log to register the runtime of the operator. For example, in the bone operator, "ROI generator Elapsed Time" gives the runtime of the operator. In the logs this must be in the following format:
<any_string><KEY><any_string>: <time_in_milliseconds>
The log line must be written so that the time is in milliseconds and is separated from the rest of the text by a colon (:). There must be no text other than the time in milliseconds after the colon.
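The log format above can be illustrated with a minimal Python sketch. This is not the profiler's actual parser: the function name extract_total_time_ms, the sample log lines, and the choice to return the first matching line are all assumptions made for illustration.

```python
import re

def extract_total_time_ms(log_text, key):
    """Return the time in milliseconds from the first line matching
    <any_string><KEY><any_string>: <time_in_milliseconds>, or None."""
    # The number must be the only text after the colon (hypothetical strictness,
    # mirroring the rule stated in the text).
    pattern = re.compile(re.escape(key) + r".*:\s*(\d+(?:\.\d+)?)\s*$")
    for line in log_text.splitlines():
        match = pattern.search(line)
        if match:
            return float(match.group(1))
    return None

# Hypothetical operator log, for illustration only.
log = "\n".join([
    "INFO loading volume",
    "INFO ROI generator Elapsed Time (total): 1532.4",
    "INFO done",
])
print(extract_total_time_ms(log, "ROI generator Elapsed Time"))
```

A line such as "ROI generator Elapsed Time: 12 ms" would not match in this sketch, because text follows the number after the colon.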
16.1.3.2. Operator Profiling
Only operators that have opModules enabled in the profiler details of the pipeline definition are displayed in operator-level profiling.
16.1.3.2.1. Operator Modules
Modules within an operator can be picked up by the profiler to report the runtime of each module. The opModules key must be populated in the profiler field of the operator description in the pipeline. For example, for the bone operator all the modules are listed as:
opModules: ["Readinginput", "Histogramgeneration",
"Volumethresholding", "VolumeMerging",
"MinMaxfinding", "Normalization", "filtering",
"3Dconnectedcomponentanalysis", "PublishingOutput"]
If a keyword is repeated in a log with timings, the timings are added up to report the total timing of the module. Keywords in opModules must be unique and must not be used in any other logging details of the operator, so that the reported timings are correct.
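The summing behavior for repeated keywords can be sketched as follows. This is a hypothetical illustration, not the tool's implementation: it assumes module log lines use the same "<KEY>...: <time_in_milliseconds>" layout as the operator runtime line, and the function name sum_module_timings and the sample log are invented for the example.

```python
import re
from collections import defaultdict

def sum_module_timings(log_text, op_modules):
    """Add up the timings of every log line that mentions a module keyword."""
    totals = defaultdict(float)
    for line in log_text.splitlines():
        for module in op_modules:
            # Assumed layout: <any_string><KEY><any_string>: <time_in_milliseconds>
            match = re.search(re.escape(module) + r".*:\s*(\d+(?:\.\d+)?)\s*$", line)
            if match:
                totals[module] += float(match.group(1))
    return dict(totals)

# Hypothetical log in which "Normalization" is reported twice.
log = "\n".join([
    "Normalization pass 1: 10.5",
    "filtering: 4",
    "Normalization pass 2: 2.5",
])
timings = sum_module_timings(log, ["Normalization", "filtering"])
print(timings)
```

The sketch also shows why keywords must not appear in unrelated log lines: any line that contains a keyword and ends in a number would be counted toward that module.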
16.1.3.2.2. Re-run
An operator can be re-run outside of Clara Platform, as a Docker container. The user must set rerun to True and provide the rerun_param key in the profiler setting of the operator. All parameters required to run the Docker container must be present as a single string in the rerun_param key.
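As a rough illustration of what "all parameters as a single string" means, the sketch below splits such a string into arguments for a docker run command. How the profiler actually launches the container is not described here, so the function build_rerun_command, the image name, and the command layout are assumptions; only the general docker run [OPTIONS] IMAGE form is standard.

```python
import shlex

def build_rerun_command(image, rerun_param):
    """Compose a docker run argument list from a single parameter string."""
    # shlex.split honors shell-style quoting, so quoted values stay intact.
    return ["docker", "run", *shlex.split(rerun_param), image]

# Hypothetical image name; the parameter string follows the style of the
# sample profiler field (environment variables passed with -e).
command = build_rerun_command(
    "example/bone-operator:latest",
    "-e NVIDIA_CLARA_NOSYNCLOCK=TRUE -e ROI_OPERATION=organ",
)
print(" ".join(command))
```

Because everything is passed as one string, each option needs to be separated by whitespace exactly as it would be on the command line.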
16.1.3.2.3. CPU-run
Currently not supported.
16.1.3.3. Nsight Profiling
For Nsight profiling, Nsight-systems 2020.3 (https://developer.nvidia.com/rdp/assets nsight-systems-2020-3-linux-deb-installer) must be installed on the system running Clara Deploy GPU Profiler.
The operator must also be prepared for Nsight-systems profiling in GPU Profiler. The steps are as follows:
Install nsight-systems-2019.3.7.5 on the system.
Copy the contents of /usr/local/cuda/nsight-systems-2019.3.7.5/ to the working directory of the operator, in a folder named nsight-systems-2019.3.7.5/. Refer to the operator roi-generator-perf, packaged with the GPU profiler. Alternatively, nsight-systems-2019.3.7.5 can be installed while creating the operator.
The version of Nsight-systems is different in the operator and in the system running the Profiler. Ensure that both the system and the operator have Nsight-systems active and running before using the profiler. Default parameters are used for Nsight-systems, and no parameter control is available from the GPU profiler in this version of the tool.
The following fields in the profiler setting must be updated for the operator in the pipeline:
nsys: True
nsys_out_path: # path to save raw data
rerun_param:
Nsight-compute is currently not supported in Clara Deploy GPU profiler.
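Taken together, the profiler keys described in this section determine which views are populated for an operator. The sketch below summarizes those conditions; it is an illustrative reading of the text rather than the tool's logic, and the function name enabled_profiling and the dictionary representation of the profiler field are assumptions.

```python
def enabled_profiling(profiler):
    """List the profiling views available for one operator.

    profiler is the operator's profiler field as a dict, or None if absent.
    """
    # System-level profiling remains available without any profiler field.
    views = ["System"]
    if not profiler:
        return views
    if profiler.get("totalTimeStr"):
        views.append("Operator runtime plot")
    if profiler.get("opModules"):
        views.append("Operator")
    if profiler.get("nsys") is True:
        views.append("Nsight")
    return views

print(enabled_profiling(None))
print(enabled_profiling({"totalTimeStr": "ROI generator Elapsed Time", "nsys": True}))
```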