15.1. Clara Deploy GPU Profiler
Clara Deploy GPU Profiler monitors GPU utilization in real time, from both compute and memory perspectives, while a job executes on Clara Platform.
- Clara GPU Profiler Workflow: Clara Deploy GPU Profiler enables the profiler workflow.
- Multi-Level Profiling: Multi-level profiling is part of the profiler workflow and is integrated with the tool.
- Multi-GPU tracking: The profiler supports multi-GPU tracking and monitoring.
- Nsight-system time series graph: Clara Deploy GPU Profiler supports profiling of time series data from Nsight-systems. Nsight-systems must be installed on the system for this feature, and the operator must also be prepared for Nsight-systems profiling.
15.1.2. Profiler Workflow
The Clara Deploy workflow is defined as a combination of a Clara Deploy job, multi-level profiling controls (System, Operator, Nsight), and other controls (Save, Export and Capture).
The user can select the Pipeline and Dataset in any order. Once the Pipeline and Dataset buckets are filled, the Execute option is enabled. Clicking Execute starts the job on Clara Deploy Platform. Once the job is complete, system level details are displayed by default, and other controls are enabled as specified by the user in the pipeline definition.
15.1.2.1. Pipeline update
Each operator in the pipeline must be updated with a profiler field to enable it for the GPU profiler. A sample profiler field is shown below for the bone operator in the bone pipeline. The user can update the field to enable selective profiling activities.
profiler:
  totalTimeStr: "ROI generator Elapsed Time"
  opModules: ["Reading input", "Histogram generation", "Volume thresholding", "Volume Merging", "Min Max finding", "Normalization", "filtering", "3D connected component analysis", "Publishing Output"]
  cpurun: False
  rerun: True
  entrypoint: "python roi.py"
  cpurun_params: "NA"
  rerun_params: "-e NVIDIA_CLARA_NOSYNCLOCK=TRUE -e ROI_OPERATION=organ"
  nsys: True
  ncomp: False
  nsight_out_path: "/app/out"
15.1.3. Multi-Level Profiling
Multi-level profiling is part of the Clara Deploy workflow and includes System, Operator and Nsight profiling. It can be enabled by defining the profiler field in the pipeline definition of each operator. If the profiler field is missing for all operators, the user can still use the tool for System level profiling.
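The rules in this section can be sketched in Python. This is only an illustration, not the profiler's own code: the function name available_levels and the dictionary layout are assumptions, and the conditions are taken from this section (System profiling always works, Operator profiling needs opModules, Nsight profiling needs nsys set to True).

```python
def available_levels(operators):
    """Return the profiling levels that could be shown for a pipeline.

    `operators` maps an operator name to its `profiler` field (a dict),
    or to None when the operator has no `profiler` field.
    The function name and data layout are hypothetical.
    """
    levels = ["System"]  # system level profiling needs no profiler field
    fields = [p for p in operators.values() if p]
    # Operator profiling is shown only for operators that list opModules.
    if any(p.get("opModules") for p in fields):
        levels.append("Operator")
    # Nsight profiling requires nsys: True on at least one operator.
    if any(p.get("nsys") is True for p in fields):
        levels.append("Nsight")
    return levels
```

A pipeline whose operators have no profiler field at all would yield only the System level.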
15.1.3.1. System Profiling
System level profiling displays the following plots.
- Job Timings in seconds
- Average % GPU Activity
- GPU % Activity during the Job
- Percentage GPU Memory occupancy by processes during the Job
- Operator runtime plot
On a multi-GPU system, each GPU can be selected to display its activity during the job.
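As a rough illustration of what the per-GPU plots summarize, the sketch below reduces a series of utilization samples to an average GPU activity and a peak memory occupancy per GPU. The sample tuple layout and the function name summarize_gpu_samples are assumptions made for this example; how the profiler actually collects and aggregates its samples is not described here.

```python
from collections import defaultdict


def summarize_gpu_samples(samples):
    """Summarize GPU samples per GPU index.

    `samples` is an iterable of hypothetical tuples:
    (gpu_index, gpu_activity_percent, mem_used, mem_total),
    with mem_used and mem_total in the same unit.
    """
    activity = defaultdict(list)
    mem_pct = defaultdict(list)
    for gpu, util, used, total in samples:
        activity[gpu].append(util)
        mem_pct[gpu].append(100.0 * used / total)
    return {
        gpu: {
            "avg_gpu_activity_pct": sum(activity[gpu]) / len(activity[gpu]),
            "peak_mem_pct": max(mem_pct[gpu]),
        }
        for gpu in activity
    }
```

Keying the result by GPU index mirrors the per-GPU selection described above.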
15.1.3.1.1. Operator Runtime Plot
Operator runtime is an additional plot that is part of system level profiling. This plot is generated if the operator has profiler details with totalTimeStr. totalTimeStr is the string that the tool looks for in the operator log to register the runtime of the operator. For example, for the bone operator, ROI generator Elapsed Time gives the runtime of the operator.
The log line must be written so that this string is separated from the time by : and the time is in milliseconds. No text other than the time in milliseconds may follow the :.
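The log rule above can be checked with a small Python sketch. It is not the profiler's parser: the function name find_total_time_ms is hypothetical, and tolerating whitespace around the colon is an assumption.

```python
import re


def find_total_time_ms(log_text, total_time_str):
    """Return the first runtime (in ms) logged after `total_time_str`, or None.

    A line matches only if the string is followed by ':' and then nothing
    but the time in milliseconds. Hypothetical helper for illustration.
    """
    pattern = re.compile(
        re.escape(total_time_str) + r"\s*:\s*(\d+(?:\.\d+)?)\s*$"
    )
    for line in log_text.splitlines():
        match = pattern.search(line)
        if match:
            return float(match.group(1))
    return None
```

A line that carries extra text after the time, such as a trailing unit, does not match, which reflects the requirement that only the time may follow the colon.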
15.1.3.2. Operator Profiling
Only operators that have opModules enabled in the profiler details in the pipeline definition are displayed in operator level profiling.
15.1.3.2.1. Operator Modules
Modules within an operator can be picked up by the profiler to report their runtime. The opModules key must be populated in profiler in the operator description in the pipeline. For example, for the bone operator all the modules are listed as:
opModules: ["Reading input", "Histogram generation", "Volume thresholding", "Volume Merging", "Min Max finding", "Normalization", "filtering", "3D connected component analysis", "Publishing Output"]
If a keyword is repeated in a log with timings, the timings are added up to report the timing of the module. Keywords in opModules must be unique and must not be used in any other logging details of the operator, so that the reported timings stay correct.
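The accumulation of repeated keywords might look like the following Python sketch. It assumes that module timings are logged in the same keyword, colon, milliseconds form as the operator runtime, which this section does not state explicitly; the function name sum_module_times_ms is hypothetical.

```python
import re
from collections import defaultdict


def sum_module_times_ms(log_text, op_modules):
    """Add up the logged timings for each keyword in `op_modules`.

    Assumes each timing line ends with '<keyword>: <time in ms>'.
    Hypothetical helper, not the profiler's implementation.
    """
    totals = defaultdict(float)
    for line in log_text.splitlines():
        for name in op_modules:
            match = re.search(
                re.escape(name) + r"\s*:\s*(\d+(?:\.\d+)?)\s*$", line
            )
            if match:
                totals[name] += float(match.group(1))
    return dict(totals)
```

The sketch also shows why keywords must be unique: any other log line ending in a keyword, a colon, and a number would be counted as a module timing.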
An operator can be re-run without Clara Platform, as a Docker container. The user must provide rerun as True, and the rerun_param key, in the profiler setting of the operator. All parameters required to run the Docker container must be present as a string in rerun_param.
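To make the role of the parameter string concrete, here is a minimal Python sketch of how a docker run command line could be assembled from a profiler field. Everything about the assembly is an assumption: the helper name build_rerun_command, the image argument, the key spelling rerun_params (taken from the sample profiler field), and the choice to pass entrypoint as the container command. The profiler may build its command differently.

```python
import shlex


def build_rerun_command(image, profiler):
    """Build a `docker run` argument list from a profiler field.

    `image` is a hypothetical container image name. The parameter string
    is split shell-style so that '-e NAME=VALUE' pairs become separate
    arguments. Returns None when rerun is not enabled.
    """
    if not profiler.get("rerun"):
        return None
    cmd = ["docker", "run", "--rm"]
    cmd += shlex.split(profiler.get("rerun_params", ""))
    cmd.append(image)
    # Assumption: the entrypoint string is passed as the container command.
    cmd += shlex.split(profiler.get("entrypoint", ""))
    return cmd
```

The resulting list could be handed to subprocess.run; keeping it as a list avoids shell quoting problems in the parameter string.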
Currently not supported
15.1.3.3. Nsight Profiling
For Nsight profiling, Nsight-systems 2020.3 (https://developer.nvidia.com/rdp/assets/nsight-systems-2020-3-linux-deb-installer) must be installed on the same system as Clara Deploy GPU Profiler.
The operator must be prepared for Nsight-systems profiling in GPU Profiler. The steps are:
- Install nsight-systems-2019.3.7.5 on the system.
- Copy the contents of /usr/local/cuda/nsight-systems-2019.3.7.5/ to the working directory of the operator, into a folder named nsight-systems-2019.3.7.5/. Refer to the operator roi-generator-perf, packaged with the GPU profiler.
- Alternatively, nsight-systems-2019.3.7.5 can be installed while creating the operator.
Note that the version of Nsight in the operator differs from the version on the system running the profiler. Ensure that both the system and the operator have Nsight-systems active and running before using the profiler. Only the default Nsight-systems parameters are supported; this version of the tool offers no parameter control from the GPU profiler.
The following fields in the profiler setting must be updated for the operator in the pipeline.
nsys: True
nsys_out_path: # path to save raw data
rerun_param:
Nsight-compute is currently not supported in Clara Deploy GPU profiler.