4. Occupancy Calculator Python Interface
4.1. Introduction
NVIDIA Nsight Compute features a Python-based interface for performing occupancy calculations and analysis for kernels on NVIDIA GPUs. These APIs are designed to help developers understand and optimize the utilization of GPU resources to achieve better performance for their kernels.
The module is called ncu_occupancy and works with Python 3.7 or later.[1] It can be found in the extras/python directory of your NVIDIA Nsight Compute package.
[1] On Linux machines, you also need a GNU-compatible libc and libgcc_s.so.
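If the extras/python directory is not already on your Python path, one option is to add it at runtime before importing the module. The helper below is a minimal sketch: the function name add_ncu_python_path is hypothetical, and it only checks the documented minimum Python version and prepends the directory to sys.path.

```python
import os
import sys


def add_ncu_python_path(ncu_install_dir: str) -> str:
    """Prepend <ncu_install_dir>/extras/python to sys.path and return that path.

    Hypothetical helper; ncu_install_dir is the root of your Nsight Compute package.
    """
    # ncu_occupancy requires Python 3.7 or later.
    if sys.version_info < (3, 7):
        raise RuntimeError("ncu_occupancy requires Python 3.7 or later")
    module_dir = os.path.join(ncu_install_dir, "extras", "python")
    # Avoid adding duplicate entries when the helper is called repeatedly.
    if module_dir not in sys.path:
        sys.path.insert(0, module_dir)
    return module_dir
```

After calling it with your installation root, import ncu_occupancy as usual. Setting the PYTHONPATH environment variable to the same directory before starting Python achieves the same effect without code.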
4.2. API Reference
This section documents the contents of the ncu_occupancy package, which can be found in the extras/python directory of your NVIDIA Nsight Compute installation.
- class ncu_occupancy.OccupancyCalculator
Provides methods to calculate occupancy and to analyze ways to improve it for a given GPU.
- Parameters
- Returns
An instance of the occupancy calculator.
- Return type
OccupancyCalculator
- get_occupancy_limiters(occupancy_parameters: ncu_occupancy.OccupancyParameters) → list
Get the occupancy limiters for the given occupancy parameters.
- Parameters
occupancy_parameters (OccupancyParameters) – The input parameters for the occupancy calculation.
- Returns
The occupancy limiters.
- Return type
list
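Conceptually, a limiter is the resource that allows the fewest resident blocks per SM. The sketch below models that idea in plain Python and is not the ncu_occupancy implementation: the GPU limits are illustrative assumed values (get_gpu_data returns real ones), allocation granularity and barriers are ignored, and grouping the warp-slot and block-count caps under "BLOCKS" is an assumption.

```python
import math

# Illustrative per-SM limits (assumed values; query get_gpu_data for real ones).
GPU = {
    "threads_per_warp": 32,
    "max_warps_per_sm": 64,
    "max_thread_blocks_per_sm": 32,
    "registers_per_sm": 65536,
    "smem_per_sm": 102400,
}


def block_limits(threads_per_block, regs_per_thread, smem_per_block, gpu=GPU):
    """Resident blocks per SM allowed by each resource (simplified model)."""
    warps_per_block = math.ceil(threads_per_block / gpu["threads_per_warp"])
    # Assumption: warp slots and the hardware block cap are grouped under "BLOCKS".
    limits = {
        "BLOCKS": min(gpu["max_thread_blocks_per_sm"],
                      gpu["max_warps_per_sm"] // warps_per_block),
    }
    if regs_per_thread > 0:
        regs_per_block = regs_per_thread * warps_per_block * gpu["threads_per_warp"]
        limits["REGISTERS"] = gpu["registers_per_sm"] // regs_per_block
    if smem_per_block > 0:
        limits["SHARED_MEMORY"] = gpu["smem_per_sm"] // smem_per_block
    return limits


def occupancy_limiters(threads_per_block, regs_per_thread, smem_per_block, gpu=GPU):
    """Names of all resources that tie for the smallest block limit."""
    limits = block_limits(threads_per_block, regs_per_thread, smem_per_block, gpu)
    lowest = min(limits.values())
    return sorted(name for name, limit in limits.items() if limit == lowest)
```

Returning every tied resource, rather than a single one, reflects that raising only one of several equal limits does not increase the number of resident blocks.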
- get_optimal_occupancy(occupancy_parameters: ncu_occupancy.OccupancyParameters, occupancy_variable_list: list = [<OccupancyVariable.THREADS_PER_BLOCK: 0>]) → dict
Get the optimal occupancy configuration.
Optimal occupancy is calculated by varying the values of the input occupancy variables while keeping all other occupancy variable values constant. If no occupancy variable list is provided, OccupancyVariable.THREADS_PER_BLOCK is used by default.
- Parameters
occupancy_parameters (OccupancyParameters) – The input parameters for the occupancy calculation.
occupancy_variable_list (list of OccupancyVariable, optional) – The list of occupancy variables to consider for the optimal occupancy calculation. At most two occupancy variables can be specified. (default: OccupancyVariable.THREADS_PER_BLOCK)
- Returns
The optimal occupancy configuration. The dictionary contains the following key-value pairs:
- 'optimal_occupancy': (float) The optimal occupancy.
- Return type
dict
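The "vary one variable, hold the rest constant" search described above can be sketched as a brute-force sweep over block sizes. This is a simplified stand-in, not the library's algorithm: the GPU limits are assumed illustrative values, allocation granularity is ignored, and the 'threads_per_block' key in the result is hypothetical (only 'optimal_occupancy' is documented).

```python
import math

# Illustrative per-SM limits (assumed values, not read from a real GPU).
GPU = {
    "threads_per_warp": 32,
    "max_warps_per_sm": 64,
    "max_thread_blocks_per_sm": 32,
    "registers_per_sm": 65536,
    "smem_per_sm": 102400,
    "max_thread_block_size": 1024,
}


def sm_occupancy(threads_per_block, regs_per_thread, smem_per_block, gpu=GPU):
    """Simplified occupancy: resident warps divided by the SM's warp capacity."""
    warps = math.ceil(threads_per_block / gpu["threads_per_warp"])
    blocks = min(gpu["max_thread_blocks_per_sm"], gpu["max_warps_per_sm"] // warps)
    if regs_per_thread > 0:
        regs_per_block = regs_per_thread * warps * gpu["threads_per_warp"]
        blocks = min(blocks, gpu["registers_per_sm"] // regs_per_block)
    if smem_per_block > 0:
        blocks = min(blocks, gpu["smem_per_sm"] // smem_per_block)
    return blocks * warps / gpu["max_warps_per_sm"]


def optimal_threads_per_block(regs_per_thread, smem_per_block, gpu=GPU):
    """Sweep block sizes (multiples of the warp size), holding other inputs fixed."""
    best = {"optimal_occupancy": -1.0, "threads_per_block": None}
    step = gpu["threads_per_warp"]
    for tpb in range(step, gpu["max_thread_block_size"] + 1, step):
        occupancy = sm_occupancy(tpb, regs_per_thread, smem_per_block, gpu)
        # Strict '>' keeps the smallest block size among ties.
        if occupancy > best["optimal_occupancy"]:
            best = {"optimal_occupancy": occupancy, "threads_per_block": tpb}
    return best
```

With two variables, the same idea becomes a nested sweep (for example with itertools.product) over both value ranges. Preferring the smallest block size on ties is a design choice of this sketch; other tie-breaking rules are equally valid.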
- get_resource_utilization(occupancy_parameters: ncu_occupancy.OccupancyParameters) → dict
Get the resource utilization for the given occupancy parameters.
- Parameters
occupancy_parameters (OccupancyParameters) – The input parameters for the occupancy calculation.
- Returns
Resource utilization. The dictionary contains the following key-value pairs:
- 'sm_occupancy': (float) The occupancy of the SMs.
- 'allocated_blocks': (int) The number of allocated blocks out of the total possible blocks per SM.
- 'resource_utilization': (dict) The resource utilization for each resource, i.e. threads, registers, and shared memory. This dictionary contains the following key-value pairs:
  - '<resource name>': (dict) The resource utilization for that resource. This dictionary contains the following key-value pairs:
- Return type
dict
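To show how the documented top-level keys ('sm_occupancy', 'allocated_blocks', 'resource_utilization') relate to each other, the sketch below computes them in a simplified model with assumed GPU limits. For brevity it stores a single utilization fraction per resource instead of the per-resource dictionary described above, and the resource key names are hypothetical.

```python
import math

# Illustrative per-SM limits (assumed values, not read from a real GPU).
GPU = {
    "threads_per_warp": 32,
    "max_warps_per_sm": 64,
    "max_thread_blocks_per_sm": 32,
    "registers_per_sm": 65536,
    "smem_per_sm": 102400,
}


def resource_utilization(threads_per_block, regs_per_thread, smem_per_block, gpu=GPU):
    """Share of each per-SM resource consumed by the resident blocks (simplified)."""
    tpw = gpu["threads_per_warp"]
    warps = math.ceil(threads_per_block / tpw)
    blocks = min(gpu["max_thread_blocks_per_sm"], gpu["max_warps_per_sm"] // warps)
    if regs_per_thread > 0:
        blocks = min(blocks, gpu["registers_per_sm"] // (regs_per_thread * warps * tpw))
    if smem_per_block > 0:
        blocks = min(blocks, gpu["smem_per_sm"] // smem_per_block)
    # Threads are counted in whole warps, since a partial warp still occupies a warp slot.
    active_threads = blocks * warps * tpw
    return {
        "sm_occupancy": blocks * warps / gpu["max_warps_per_sm"],
        "allocated_blocks": blocks,
        "resource_utilization": {  # hypothetical keys, one fraction per resource
            "threads": active_threads / (gpu["max_warps_per_sm"] * tpw),
            "registers": active_threads * regs_per_thread / gpu["registers_per_sm"],
            "shared_memory": blocks * smem_per_block / gpu["smem_per_sm"],
        },
    }
```

For resource_utilization(256, 64, 16384), this model allocates 4 blocks: registers are fully used (1.0) while shared memory sits at 0.64, which points to registers as the resource to reduce.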
- get_sm_occupancy(occupancy_parameters: ncu_occupancy.OccupancyParameters) → float
Calculate the occupancy of the SMs for the given occupancy parameters.
- Parameters
occupancy_parameters (OccupancyParameters) – The input parameters for the occupancy calculation.
- Returns
The occupancy of the SMs.
- Return type
float
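SM occupancy is conventionally the ratio of resident warps to the maximum number of warps per SM. The sketch below computes that ratio in a simplified model that also rounds each warp's register usage up to reg_allocation_unit_size (a key returned by get_gpu_data). The unit size of 256 and the other GPU values are assumptions, and shared memory and warp allocation granularity are ignored, so results can differ from get_sm_occupancy.

```python
import math

# Illustrative per-SM limits (assumed values, not read from a real GPU).
GPU = {
    "threads_per_warp": 32,
    "max_warps_per_sm": 64,
    "max_thread_blocks_per_sm": 32,
    "registers_per_sm": 65536,
    "reg_allocation_unit_size": 256,
    "smem_per_sm": 102400,
}


def round_up(value, unit):
    """Round value up to the next multiple of unit."""
    return math.ceil(value / unit) * unit


def sm_occupancy(threads_per_block, regs_per_thread, smem_per_block, gpu=GPU):
    """Resident warps / max warps per SM, with per-warp register rounding."""
    warps_per_block = math.ceil(threads_per_block / gpu["threads_per_warp"])
    blocks = min(gpu["max_thread_blocks_per_sm"], gpu["max_warps_per_sm"] // warps_per_block)
    if regs_per_thread > 0:
        # Registers are allocated per warp in whole allocation units.
        regs_per_warp = round_up(regs_per_thread * gpu["threads_per_warp"],
                                 gpu["reg_allocation_unit_size"])
        warps_by_regs = gpu["registers_per_sm"] // regs_per_warp
        blocks = min(blocks, warps_by_regs // warps_per_block)
    if smem_per_block > 0:
        blocks = min(blocks, gpu["smem_per_sm"] // smem_per_block)
    return blocks * warps_per_block / gpu["max_warps_per_sm"]
```

In this model, going from 32 to 33 registers per thread with 256-thread blocks drops occupancy from 1.0 to 0.75, because 33 × 32 = 1056 registers per warp round up to 1280.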
- class ncu_occupancy.OccupancyLimiter
Enum representing the occupancy limiters.
- REGISTERS
Register usage is the occupancy limiter.
- SHARED_MEMORY
Shared memory usage is the occupancy limiter.
- BLOCKS
Block size is the occupancy limiter.
- BARRIERS
Barrier usage is the occupancy limiter.
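One way to act on the limiters is to map each one to a tuning hint. This sketch assumes that get_occupancy_limiters returns OccupancyLimiter members (the element type is not spelled out in this section) and uses a local stand-in enum so it runs without the package. The hint texts are general CUDA tuning advice, not output of the tool.

```python
from enum import Enum


class Limiter(Enum):
    """Stand-in with the same member names as ncu_occupancy.OccupancyLimiter.

    The values are arbitrary; only the names are taken from this section.
    """
    REGISTERS = 0
    SHARED_MEMORY = 1
    BLOCKS = 2
    BARRIERS = 3


# Keyed by member name so real OccupancyLimiter members could be passed too (assumption).
HINTS = {
    "REGISTERS": "Reduce registers per thread, e.g. with __launch_bounds__ or nvcc -maxrregcount.",
    "SHARED_MEMORY": "Reduce shared memory per block or select a larger shared memory configuration.",
    "BLOCKS": "Try a different block size.",
    "BARRIERS": "Reduce barrier usage per block.",
}


def tuning_hints(limiters):
    """Return one hint per limiter, in the order given."""
    return [HINTS[limiter.name] for limiter in limiters]
```

Looking hints up by .name rather than by enum identity keeps the helper independent of the stand-in class.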
- class ncu_occupancy.OccupancyParameters
OccupancyParameters is a dataclass that holds configuration parameters for occupancy calculations.
Shared memory size configuration (bytes). (default: 0)
- Type
Shared memory (bytes) per block. (default: 2048)
- Type
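Because OccupancyParameters is a dataclass, the standard dataclasses module can list its fields and defaults, which helps because this section does not name every field. The helpers below work on any dataclass; ExampleParameters and its field names are hypothetical and exist only so the sketch runs without ncu_occupancy.

```python
import dataclasses
from dataclasses import dataclass


def field_defaults(cls) -> dict:
    """Map each field of a dataclass (class or instance) to its default value.

    Fields without a default map to dataclasses.MISSING.
    """
    defaults = {}
    for f in dataclasses.fields(cls):
        if f.default is not dataclasses.MISSING:
            defaults[f.name] = f.default
        elif f.default_factory is not dataclasses.MISSING:
            defaults[f.name] = f.default_factory()
        else:
            defaults[f.name] = dataclasses.MISSING
    return defaults


def vary(params, field_name: str, values):
    """Copies of a dataclass instance with one field changed per copy."""
    return [dataclasses.replace(params, **{field_name: v}) for v in values]


@dataclass
class ExampleParameters:
    # Hypothetical fields for illustration; the real field names are not listed here.
    threads_per_block: int = 256
    shared_memory_per_block: int = 2048
```

With the package importable, field_defaults(ncu_occupancy.OccupancyParameters) lists the actual field names and defaults, and vary produces inputs where one value changes while the rest stay constant.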
- class ncu_occupancy.OccupancyVariable
Enum representing the occupancy variables.
- THREADS_PER_BLOCK
Threads per block.
- REGISTERS_PER_THREAD
Registers per thread.
- SHARED_MEMORY_PER_BLOCK
Shared memory per block.
- BLOCK_BARRIERS
Block barriers.
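The rules stated for the occupancy_variable_list argument of get_optimal_occupancy (default to THREADS_PER_BLOCK, at most two variables) and the search space they imply can be sketched as follows. The enum is a local stand-in, and the candidate values and de-duplication are assumptions; the library may validate or sweep differently.

```python
import itertools
from enum import Enum


class Variable(Enum):
    """Stand-in with the member names of ncu_occupancy.OccupancyVariable.

    THREADS_PER_BLOCK = 0 matches the default shown in this section; the other
    values are assumed.
    """
    THREADS_PER_BLOCK = 0
    REGISTERS_PER_THREAD = 1
    SHARED_MEMORY_PER_BLOCK = 2
    BLOCK_BARRIERS = 3


# Hypothetical candidate values to sweep for each variable.
CANDIDATES = {
    Variable.THREADS_PER_BLOCK: list(range(32, 1025, 32)),
    Variable.REGISTERS_PER_THREAD: list(range(32, 256, 32)),
    Variable.SHARED_MEMORY_PER_BLOCK: [0, 2048, 8192, 32768],
    Variable.BLOCK_BARRIERS: [1, 2, 4],
}


def normalize_variables(variables=None):
    """Apply the documented rules: default to THREADS_PER_BLOCK, at most two variables."""
    if not variables:
        return [Variable.THREADS_PER_BLOCK]
    unique = list(dict.fromkeys(variables))  # drop duplicates, keep order
    if len(unique) > 2:
        raise ValueError("at most two occupancy variables can be specified")
    return unique


def candidate_grid(variables=None):
    """Every combination of candidate values for the selected variables."""
    selected = normalize_variables(variables)
    return [dict(zip(selected, combo))
            for combo in itertools.product(*(CANDIDATES[v] for v in selected))]
```

Limiting the sweep to two variables keeps the grid small: here one variable yields 32 candidates and two yield 32 × 7 = 224.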
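- ncu_occupancy.get_gpu_data(major: int, minor: int) → dict
Get the GPU data for the given compute capability version.
- Parameters
major (int) – The major compute capability version.
minor (int) – The minor compute capability version.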
- Returns
The GPU data for the given compute capability version. The dictionary contains the following key-value pairs:
- 'cc_major': (int) The major compute capability version of the GPU.
- 'cc_minor': (int) The minor compute capability version of the GPU.
- 'sm_version': (str) The SM version of the GPU.
- 'threads_per_warp': (int) The number of threads per warp.
- 'max_warps_per_sm': (int) The maximum number of warps per SM.
- 'max_threads_per_sm': (int) The maximum number of threads per SM.
- 'max_thread_blocks_per_sm': (int) The maximum number of thread blocks per SM.
- 'block_barriers_per_sm': (int) The number of block barriers per SM.
- 'smem_per_sm': (int) The shared memory (bytes) per SM.
- 'max_shared_mem_per_block': (int) The maximum shared memory (bytes) per block.
- 'registers_per_sm': (int) The number of registers per SM.
- 'max_regs_per_block': (int) The maximum number of registers per block.
- 'max_regs_per_thread': (int) The maximum number of registers per thread.
- 'reg_allocation_unit_size': (int) The register allocation unit size.
- 'reg_allocation_granularity': (str) The register allocation granularity.
- 'shared_mem_allocation_unit_size': (int) The shared memory allocation unit size.
- 'warps_allocation_granularity': (str) The warp allocation granularity.
- 'max_thread_block_size': (int) The maximum thread block size.
- 'shared_mem_size_configs': (list of int) The shared memory size configurations (bytes).
- 'warp_reg_allocation_granularities': (list of int) The warp register allocation granularities.
- Return type
dict
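The keys returned by get_gpu_data are enough to derive simple per-SM bounds for a given block size. The sketch below uses an illustrative dictionary containing a subset of the documented keys; its values are assumptions, not output of get_gpu_data, and the helper name block_size_summary is hypothetical.

```python
import math

# Illustrative dictionary with a subset of the documented keys (assumed values).
EXAMPLE_GPU = {
    "threads_per_warp": 32,
    "max_warps_per_sm": 64,
    "max_threads_per_sm": 2048,
    "max_thread_blocks_per_sm": 32,
    "registers_per_sm": 65536,
    "max_thread_block_size": 1024,
}


def block_size_summary(gpu: dict, threads_per_block: int) -> dict:
    """Derive simple per-SM bounds for one block size from get_gpu_data-style keys."""
    if not 0 < threads_per_block <= gpu["max_thread_block_size"]:
        raise ValueError("block size must be in (0, max_thread_block_size]")
    warps_per_block = math.ceil(threads_per_block / gpu["threads_per_warp"])
    return {
        "warps_per_block": warps_per_block,
        # Resident blocks are capped by both warp slots and the hardware block limit.
        "max_resident_blocks": min(gpu["max_thread_blocks_per_sm"],
                                   gpu["max_warps_per_sm"] // warps_per_block),
        # Register budget per thread if every thread slot is filled (ignores granularity).
        "regs_per_thread_at_full_occupancy": gpu["registers_per_sm"] // gpu["max_threads_per_sm"],
    }
```

With the package available, the same helper could be called as block_size_summary(ncu_occupancy.get_gpu_data(8, 0), 96), following the documented get_gpu_data(major, minor) signature; compute capability 8.0 is only an example. With the assumed values above, a 96-thread block allows 21 resident blocks, which fill 63 of the 64 warp slots.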