4. Occupancy Calculator Python Interface

4.1. Introduction

NVIDIA Nsight Compute features a Python-based interface for performing occupancy calculations and analysis for kernels on NVIDIA GPUs. These APIs are designed to help developers understand and optimize the utilization of GPU resources to achieve better performance for their kernels.

The module is called ncu_occupancy and works with Python 3.7 or later [1]. It can be found in the extras/python directory of your NVIDIA Nsight Compute package.

[1] On Linux machines you will also need a GNU-compatible libc and libgcc_s.so.
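The following is a minimal sketch of making the module importable from a script that lives outside the package directory. The installation path is a placeholder, and add_extras_to_path is a helper name introduced here for illustration; it is not part of NVIDIA Nsight Compute.

```python
import sys
from pathlib import Path


def add_extras_to_path(install_dir):
    """Put <install_dir>/extras/python at the front of sys.path.

    Returns the directory that was added, as a string.
    """
    extras_dir = str(Path(install_dir) / "extras" / "python")
    if extras_dir not in sys.path:
        sys.path.insert(0, extras_dir)
    return extras_dir


# Placeholder location: substitute the root of your own installation.
extras_dir = add_extras_to_path("/path/to/nsight-compute")

try:
    import ncu_occupancy  # resolves only if the path above is a real installation
except ImportError:
    ncu_occupancy = None  # module not found; nothing else in this sketch depends on it
```

Setting the PYTHONPATH environment variable to the same directory is an equivalent alternative that keeps path handling out of the script.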

4.2. API Reference

This section documents the contents of the ncu_occupancy package, which can be found in the extras/python directory of your NVIDIA Nsight Compute installation.

class ncu_occupancy.OccupancyCalculator

Provides methods to calculate occupancy for a given GPU and to analyze ways to improve it.

Parameters
  • computeCapabilityMajor (int) – The major compute capability version of the GPU.

  • computeCapabilityMinor (int) – The minor compute capability version of the GPU.

Returns

An instance of the occupancy calculator.

Return type

OccupancyCalculator

get_occupancy_limiters(occupancy_parameters: ncu_occupancy.OccupancyParameters) -> list

Get the occupancy limiters for the given occupancy parameters.

Parameters

occupancy_parameters (OccupancyParameters) – The input parameters for the occupancy calculation.

Returns

list of OccupancyLimiter
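As a hedged sketch of how this method might be used, the helper below turns the returned limiters into their enum names. limiter_names is a name introduced here for illustration. The compute capability 8.0 and the register count are arbitrary example inputs, and passing the constructor arguments positionally is an assumption.

```python
def limiter_names(calculator, params):
    """Return the names of the limiters reported for params.

    calculator is expected to behave like ncu_occupancy.OccupancyCalculator
    and params like ncu_occupancy.OccupancyParameters.
    """
    return [limiter.name for limiter in calculator.get_occupancy_limiters(params)]


try:
    import ncu_occupancy  # needs extras/python on sys.path
except ImportError:
    ncu_occupancy = None  # the helper above still works with any compatible object

if ncu_occupancy is not None:
    # Assumption: the constructor accepts (major, minor) positionally.
    calculator = ncu_occupancy.OccupancyCalculator(8, 0)
    params = ncu_occupancy.OccupancyParameters(registers_per_thread=64)
    print(limiter_names(calculator, params))
```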

get_optimal_occupancy(occupancy_parameters: ncu_occupancy.OccupancyParameters, occupancy_variable_list: list = [<OccupancyVariable.THREADS_PER_BLOCK: 0>]) -> dict

Get the optimal occupancy configuration.

The optimal occupancy is calculated by varying the values of the occupancy variables in the input list while keeping all other occupancy variables constant. If no occupancy variable list is provided, OccupancyVariable.THREADS_PER_BLOCK is varied by default.

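Parameters
  • occupancy_parameters (OccupancyParameters) – The input parameters for the occupancy calculation.

  • occupancy_variable_list (list of OccupancyVariable) – The occupancy variables to vary. Defaults to [OccupancyVariable.THREADS_PER_BLOCK].

Returns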

The optimal occupancy configuration. The dictionary contains the following key-value pairs:
  • 'optimal_occupancy': (float) The optimal occupancy.

  • 'occupancy_variable_config':
    • For a single input occupancy variable: list of tuples of ranges.

    • For two input occupancy variables: list of tuples of value combinations. Each tuple contains the values in the same order as the input occupancy variable list.

Return type

dict
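The two-variable form of the returned dictionary can be unpacked as in the sketch below. format_optimal is a helper introduced here, and the sample dictionary is invented for illustration (including the scale of the occupancy value); it is not output from a real GPU.

```python
def format_optimal(result, variable_names):
    """Render a two-variable get_optimal_occupancy result as text lines.

    variable_names must be in the same order as the occupancy variable list
    that was passed to get_optimal_occupancy.
    """
    lines = [f"optimal occupancy: {result['optimal_occupancy']}"]
    for combo in result["occupancy_variable_config"]:
        pairs = ", ".join(f"{name}={value}" for name, value in zip(variable_names, combo))
        lines.append("  " + pairs)
    return lines


# Made-up result. In practice it would come from a call such as
#   calculator.get_optimal_occupancy(
#       params,
#       [OccupancyVariable.THREADS_PER_BLOCK, OccupancyVariable.REGISTERS_PER_THREAD])
sample_result = {
    "optimal_occupancy": 100.0,
    "occupancy_variable_config": [(128, 32), (256, 32)],
}

for line in format_optimal(sample_result, ["threads_per_block", "registers_per_thread"]):
    print(line)
```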

get_resource_utilization(occupancy_parameters: ncu_occupancy.OccupancyParameters) -> dict

Get the resource utilization for the given occupancy parameters.

Parameters

occupancy_parameters (OccupancyParameters) – The input parameters for the occupancy calculation.

Returns

Resource utilization. The dictionary contains the following key-value pairs:
  • 'sm_occupancy': (float) The occupancy of the SMs.

  • 'allocated_blocks': (int) The number of allocated blocks out of the total possible blocks per SM.

  • 'resource_utilization': (dict) The resource utilization for each resource, i.e. threads, registers, and shared memory. This dictionary contains the following key-value pairs:
    • '<resource name>': (dict) The utilization of that resource. This dictionary contains the following key-value pairs:
      • 'resource_per_block': (int) The amount of the resource used per block.

      • 'unused_resource_count': (int) The unused resource count per SM.

      • 'unallocated_blocks': (int) The number of unallocated blocks per SM.

Return type

dict
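Because the result is a nested dictionary, it can help to flatten it into rows before printing or logging. The sketch below shows one way to do that. utilization_rows is a helper introduced here, and the sample report, including its resource names and numbers, is invented for illustration.

```python
def utilization_rows(report):
    """Flatten a get_resource_utilization result into tuples of
    (resource name, resource_per_block, unused_resource_count, unallocated_blocks).
    """
    rows = []
    for name, usage in report["resource_utilization"].items():
        rows.append((
            name,
            usage["resource_per_block"],
            usage["unused_resource_count"],
            usage["unallocated_blocks"],
        ))
    return rows


# Made-up report. The resource names used as keys here are placeholders;
# inspect the dictionary returned by your installation for the actual keys.
sample_report = {
    "sm_occupancy": 50.0,
    "allocated_blocks": 4,
    "resource_utilization": {
        "threads": {"resource_per_block": 256, "unused_resource_count": 1024, "unallocated_blocks": 4},
        "registers": {"resource_per_block": 16384, "unused_resource_count": 0, "unallocated_blocks": 0},
    },
}

for row in utilization_rows(sample_report):
    print(row)
```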

get_sm_occupancy(occupancy_parameters: ncu_occupancy.OccupancyParameters) -> float

Calculate the occupancy of the SMs for the given occupancy parameters.

Parameters

occupancy_parameters (OccupancyParameters) – The input parameters for the occupancy calculation.

Returns

The occupancy of the SMs.

Return type

float
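A common use of this method is to compare several candidate block sizes. The sketch below does so with dataclasses.replace, which relies on OccupancyParameters being a dataclass as documented. sweep_threads_per_block is a helper introduced here; the compute capability, register count, and candidate sizes are arbitrary example inputs, and passing the constructor arguments positionally is an assumption.

```python
from dataclasses import replace


def sweep_threads_per_block(calculator, base_params, candidates):
    """Return {threads_per_block: occupancy} for each candidate block size.

    All fields of base_params other than threads_per_block are kept unchanged.
    """
    return {
        threads: calculator.get_sm_occupancy(replace(base_params, threads_per_block=threads))
        for threads in candidates
    }


try:
    import ncu_occupancy  # needs extras/python on sys.path
except ImportError:
    ncu_occupancy = None  # the helper above still works with any compatible object

if ncu_occupancy is not None:
    # Assumption: the constructor accepts (major, minor) positionally.
    calculator = ncu_occupancy.OccupancyCalculator(8, 0)
    base = ncu_occupancy.OccupancyParameters(registers_per_thread=40)
    for threads, occupancy in sweep_threads_per_block(calculator, base, [64, 128, 256, 512]).items():
        print(threads, occupancy)
```

For an exhaustive search over one or two variables, get_optimal_occupancy performs the variation itself; a manual sweep like this is mainly useful when only a few specific configurations are of interest.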

class ncu_occupancy.OccupancyLimiter

Enum representing the occupancy limiters.

REGISTERS

Register usage is the occupancy limiter.

SHARED_MEMORY

Shared memory usage is the occupancy limiter.

BLOCKS

Block size is the occupancy limiter.

BARRIERS

Barrier usage is the occupancy limiter.

class ncu_occupancy.OccupancyParameters

OccupancyParameters is a dataclass that holds configuration parameters for occupancy calculations.

shared_mem_size

Shared memory size configuration (bytes). (default: 0)

Type

int

threads_per_block

Number of threads per block. (default: 256)

Type

int

registers_per_thread

Number of registers per thread. (default: 32)

Type

int

shared_mem_per_block

Shared memory (bytes) per block. (default: 2048)

Type

int

num_block_barriers

Number of block barriers. (default: 1)

Type

int

class ncu_occupancy.OccupancyVariable

Enum representing the occupancy variables.

THREADS_PER_BLOCK

Threads per block.

REGISTERS_PER_THREAD

Registers per thread.

SHARED_MEMORY_PER_BLOCK

Shared memory per block.

BLOCK_BARRIERS

Block barriers.

ncu_occupancy.get_gpu_data(major: int, minor: int) -> dict

Get the GPU data for the given compute capability version.

Parameters
  • major (int) – The major compute capability version of the GPU.

  • minor (int) – The minor compute capability version of the GPU.

Returns

The GPU data for the given compute capability version. The dictionary contains the following key-value pairs:
  • 'cc_major': (int) The major compute capability version of the GPU.

  • 'cc_minor': (int) The minor compute capability version of the GPU.

  • 'sm_version': (str) The SM version of the GPU.

  • 'threads_per_warp': (int) The number of threads per warp.

  • 'max_warps_per_sm': (int) The maximum number of warps per SM.

  • 'max_threads_per_sm': (int) The maximum number of threads per SM.

  • 'max_thread_blocks_per_sm': (int) The maximum number of thread blocks per SM.

  • 'block_barriers_per_sm': (int) The number of block barriers per SM.

  • 'smem_per_sm': (int) The shared memory (bytes) per SM.

  • 'max_shared_mem_per_block': (int) The maximum shared memory (bytes) per block.

  • 'registers_per_sm': (int) The number of registers per SM.

  • 'max_regs_per_block': (int) The maximum number of registers per block.

  • 'max_regs_per_thread': (int) The maximum number of registers per thread.

  • 'reg_allocation_unit_size': (int) The register allocation unit size.

  • 'reg_allocation_granularity': (str) The register allocation granularity.

  • 'shared_mem_allocation_unit_size': (int) The shared memory allocation unit size.

  • 'warps_allocation_granularity': (str) The warp allocation granularity.

  • 'max_thread_block_size': (int) The maximum thread block size.

  • 'shared_mem_size_configs': (list of int) The shared memory size configurations (bytes).

  • 'warp_reg_allocation_granularities': (list of int) The warp register allocation granularities.

Return type

dict
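One possible use of this dictionary is to sanity-check a launch configuration against the per-block limits before running an occupancy calculation. The sketch below assumes only the documented keys. check_launch_limits is a helper introduced here, and the sample values are illustrative placeholders rather than data reported for any particular GPU.

```python
def check_launch_limits(gpu_data, threads_per_block, registers_per_thread, shared_mem_per_block):
    """Return a list of messages, one per requested value that exceeds its limit.

    gpu_data is a dictionary shaped like the result of ncu_occupancy.get_gpu_data.
    An empty list means none of the three checked limits is exceeded.
    """
    requested = {
        "max_thread_block_size": ("threads_per_block", threads_per_block),
        "max_regs_per_thread": ("registers_per_thread", registers_per_thread),
        "max_shared_mem_per_block": ("shared_mem_per_block", shared_mem_per_block),
    }
    problems = []
    for limit_key, (name, value) in requested.items():
        limit = gpu_data[limit_key]
        if value > limit:
            problems.append(f"{name}={value} exceeds {limit_key}={limit}")
    return problems


# Placeholder values. In practice, use e.g. gpu_data = ncu_occupancy.get_gpu_data(8, 0).
sample_gpu_data = {
    "max_thread_block_size": 1024,
    "max_regs_per_thread": 255,
    "max_shared_mem_per_block": 49152,
}

print(check_launch_limits(sample_gpu_data, 2048, 32, 2048))
```

The check is deliberately limited to three direct comparisons. It does not model allocation unit sizes or granularities, which the occupancy calculator itself accounts for.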