cupti.pm_sampling#

The cupti.pm_sampling module provides a Pythonic layer over the CUPTI PM Sampling API, built on top of cupti.cupti.

APIs#

class cupti.pm_sampling.Collector(device_index: int = 0)#

Bases: object

Per-device PM Sampling collector.

enable() → None#

Initializes CUPTI state and enables PM sampling on the device.

This call prepares profiler infrastructure for the process, resolves the device chip name for host-side metric handling, and creates the PM sampling object required by later APIs.

Call this before configure(), start(), stop(), decode(), or disable().

Raises:

ValueError – If the device_index passed during instantiation is invalid.
MemoryError – If PM sampling object creation fails due to out-of-memory.
RuntimeError – If PM sampling is already enabled on the device, or if the nvperf* libraries could not be loaded.
PermissionError – If CUPTI reports insufficient privileges (ERROR_INSUFFICIENT_PRIVILEGES).
cupti.cupti.cuptiError – If there is an internal CUPTI error.
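The exception types listed above can be mapped to distinct recovery paths. The following is a minimal sketch, assuming collector is an already-constructed Collector; the helper name try_enable and the status strings it returns are hypothetical and not part of cupti.pm_sampling.

```python
def try_enable(collector):
    """Call collector.enable() and map the documented exceptions to a status string.

    try_enable and the status strings are illustrative only; they are not
    part of the cupti.pm_sampling module.
    """
    try:
        collector.enable()
    except PermissionError:
        # CUPTI reported insufficient privileges for profiling.
        return "insufficient-privileges"
    except ValueError:
        # The device_index given to the Collector constructor was invalid.
        return "invalid-device-index"
    except MemoryError:
        # The PM sampling object could not be created.
        return "out-of-memory"
    except RuntimeError:
        # Already enabled on this device, or the nvperf* libraries failed to load.
        return "already-enabled-or-nvperf-missing"
    return "enabled"
```

Any exception not matched by one of these clauses propagates to the caller unchanged.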

configure( *, metrics: list[str], hardware_buffer_size: int = 536870912, sampling_interval: int = 10000, trigger_mode: TriggerMode = TriggerMode.GPU_TIME_INTERVAL, hw_buffer_append_mode: HardwareBuffer_AppendMode = HardwareBuffer_AppendMode.KEEP_OLDEST, single_pass_metric_set_name: str | None = None, ) → None#

Applies PM sampling configuration for the enabled collector.

The configuration defines which metrics are collected and how sampling is triggered by the GPU. This method can be used to configure or reconfigure the active PM sampling object.

Parameters:

metrics – Metric names to collect in the PM sampling session.
hardware_buffer_size – Size (in bytes) of the hardware buffer that stores raw PM sampling data.
sampling_interval – Sampling period interpreted according to trigger_mode (SYSCLK cycles or nanoseconds).
trigger_mode – PM sampling trigger mode.
hw_buffer_append_mode – Append mode for records in the hardware buffer.
single_pass_metric_set_name – Optional single-pass metric set name used to initialize the host metrics evaluator for this configuration.

Raises:

RuntimeError – If the collector has not been enabled yet.
ValueError – If hardware_buffer_size or sampling_interval is non-positive, or if the requested configuration is not supported. Unsupported configurations can happen when metrics cannot be collected in a single pass, when TriggerMode.GPU_TIME_INTERVAL is not supported on the GPU, or when HardwareBuffer_AppendMode.KEEP_LATEST is not supported on the GPU.

Notes

TriggerMode.GPU_TIME_INTERVAL is not supported on the Turing GPU architecture or on the GA100 GPU. It is supported on Ampere GA10x and later GPU architectures.
HardwareBuffer_AppendMode.KEEP_LATEST is not supported on the Turing GPU architecture. It is supported on Ampere and later GPU architectures.
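To give a feel for the default values, the arithmetic below converts them into more familiar units. It is a sketch under two assumptions that the text above does not state outright: that sampling_interval is in nanoseconds under TriggerMode.GPU_TIME_INTERVAL, and that each provisioned sample slot corresponds to one sampling interval. The helper names are hypothetical.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000
BYTES_PER_MIB = 1024 * 1024


def samples_per_second(sampling_interval_ns):
    """Sampling rate implied by an interval given in nanoseconds (assumed unit)."""
    if sampling_interval_ns <= 0:
        raise ValueError("sampling_interval must be positive")
    return NANOSECONDS_PER_SECOND / sampling_interval_ns


def seconds_covered(max_samples, sampling_interval_ns):
    """Wall time spanned by max_samples slots, assuming one interval per slot."""
    return max_samples * sampling_interval_ns / NANOSECONDS_PER_SECOND


def buffer_size_mib(hardware_buffer_size):
    """Hardware buffer size expressed in MiB."""
    return hardware_buffer_size / BYTES_PER_MIB


print(samples_per_second(10000))      # 100000.0 samples per second
print(seconds_covered(10000, 10000))  # 0.1 seconds
print(buffer_size_mib(536870912))     # 512.0 MiB
```

Under these assumptions, the default interval yields 100,000 samples per second, and 10,000 sample slots span roughly a tenth of a second of GPU activity.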

start() → None#

Starts periodic PM sampling on the enabled device.

Once started, the GPU collects configured metrics into the PM sampling hardware buffer according to the active trigger mode and interval.

Raises:

RuntimeError – If the collector has not been enabled yet, if configure() has not been called, or if PM sampling has already started.

stop() → None#

Stops PM sampling metric collection on the device.

After stopping, no new PM samples are collected until start() is called again.

Raises:

RuntimeError – If the collector has not been enabled yet, if configure() has not been called, or if PM sampling has already stopped.
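Because start() and stop() must be paired, one convenient pattern is to wrap them in a context manager so that stop() runs even if the measured code raises. The following is a minimal sketch; the name sampling is hypothetical, and collector is assumed to be a Collector that has already been enabled and configured.

```python
from contextlib import contextmanager


@contextmanager
def sampling(collector):
    """Start PM sampling on entry and stop it on exit, including on error.

    sampling is an illustrative helper, not part of cupti.pm_sampling.
    """
    collector.start()
    try:
        yield collector
    finally:
        # Runs whether the body finished normally or raised.
        collector.stop()


# Usage, assuming `collector` is enabled and configured and `run_workload`
# is a caller-supplied function that launches GPU work:
#
#     with sampling(collector):
#         run_workload()
```

If start() itself raises, stop() is not called, which avoids triggering the "already stopped" error on a session that never began.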

decode( max_samples: int = 10000, ) → CounterData#

Decodes PM sampling hardware-buffer records into a counter data image.

This reads collected PM sampling data from the device-side hardware buffer and writes it into a host counter data image. Counter-data info and per-sample timestamps are stored in the returned CounterData; iterate it to evaluate metrics per completed sample.

Parameters:

max_samples – Maximum number of samples to provision in the decoded counter data image.

Returns:

CounterData for the session metrics (see CounterData.metrics).

Raises:

RuntimeError – If the collector has not been enabled yet.
ValueError – If max_samples is non-positive.
MemoryError – If CUPTI reports ERROR_OUT_OF_MEMORY while decoding.
cupti.cupti.cuptiError – For other CUPTI failures.

disable() → None#

Disables PM sampling and tears down profiler state for this process.

If PM sampling is currently enabled, this destroys the PM sampling object for the device. The method also deinitializes profiler state initialized during enable().
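The methods above are intended to be called in a fixed order: enable(), configure(), start(), stop(), decode(), disable(). The following sketch strings them together for a single measurement. The function name collect_samples and the workload callable are hypothetical. Materializing the samples into a list before disable() is a defensive choice made here, since this page does not say whether a CounterData can still be iterated after the collector is disabled.

```python
def collect_samples(collector, metrics, workload, max_samples=10000):
    """Run one enable/configure/start/stop/decode/disable cycle.

    collector is expected to be a cupti.pm_sampling.Collector; workload is
    any zero-argument callable that launches the GPU work to be measured.
    Returns a list of the completed samples.
    """
    collector.enable()
    try:
        collector.configure(metrics=metrics)
        collector.start()
        try:
            workload()
        finally:
            # Always stop a session that was successfully started.
            collector.stop()
        counter_data = collector.decode(max_samples=max_samples)
        # Evaluate samples now, while the collector is still enabled.
        return list(counter_data)
    finally:
        # Tear down profiler state even if configure/start/decode failed.
        collector.disable()
```

Only the metrics keyword is passed to configure() here, so every other setting keeps its documented default.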

Data structures#

class cupti.pm_sampling.Sample( start_timestamp: int, end_timestamp: int, metric_values: list[float], )#

Bases: object

One completed PM sampling interval with timestamps and metric values.

start_timestamp#

Start of the sampling window.

Type: int

end_timestamp#

End of the sampling window.

Type: int

metric_values#

One value per configured metric; index i matches metrics[i] on the CounterData this sample came from.

Type: list[float]
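Since metric_values is positional, pairing it with the metric names makes a sample easier to inspect. The following is a minimal sketch using only the three documented Sample attributes; the helper name describe_sample is hypothetical, and the timestamp unit is left unstated because this page does not specify it.

```python
def describe_sample(metrics, sample):
    """Return (window_length, values_by_name) for one sample.

    metrics is the list of metric names from the CounterData the sample
    came from; sample exposes start_timestamp, end_timestamp, and
    metric_values as documented above.
    """
    if len(metrics) != len(sample.metric_values):
        raise ValueError("metrics and metric_values differ in length")
    # Length of the sampling window, in the same unit as the timestamps.
    window_length = sample.end_timestamp - sample.start_timestamp
    # Index i of metric_values corresponds to metrics[i].
    values_by_name = dict(zip(metrics, sample.metric_values))
    return window_length, values_by_name
```

The length check guards against pairing a sample with the metric list of a different CounterData.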

class cupti.pm_sampling.CounterData( *, image: ndarray, metrics: list[str], chip_name: str, num_total_samples: int, num_populated_samples: int, num_completed_samples: int, start_timestamps: list[int], end_timestamps: list[int], )#

Bases: object

Decoded PM sampling results from Collector.decode().

Exposes how many samples were captured and which metrics were collected. Iterating yields each completed sample as a Sample (timestamps and evaluated metric values).

metrics#

Metric names in evaluation order; aligns with each Sample.metric_values list.

Type: list[str]

num_total_samples#

Total sample slots in this result.

Type: int

num_populated_samples#

Slots that contain any data.

Type: int

num_completed_samples#

Samples that are fully complete; iteration yields this many Sample objects (indices 0 … num_completed_samples - 1).

Type: int

Iteration: for sample in counter_data yields one Sample per completed sample. len(counter_data) returns num_completed_samples.

Indexing and slicing: counter_data[i] returns the i-th completed sample. Negative indexing is supported (for example, counter_data[-1] returns the last completed sample). counter_data[a:b:c] returns a list[Sample] following standard Python slice semantics. Valid completed-sample indices run from 0 through num_completed_samples - 1.
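The sequence behavior described above is enough to write simple aggregations without touching the underlying image. The following sketch relies only on len(), iteration, slicing, and the metrics attribute; the helper names mean_per_metric and last_samples are hypothetical.

```python
def mean_per_metric(counter_data):
    """Average each metric over all completed samples.

    Returns a dict mapping metric name to mean value, or an empty dict
    when there are no completed samples.
    """
    count = len(counter_data)  # equals num_completed_samples
    if count == 0:
        return {}
    totals = [0.0] * len(counter_data.metrics)
    for sample in counter_data:
        for index, value in enumerate(sample.metric_values):
            totals[index] += value
    return {
        name: total / count
        for name, total in zip(counter_data.metrics, totals)
    }


def last_samples(counter_data, count):
    """Return up to `count` of the most recent completed samples as a list."""
    if count <= 0:
        return []
    # Slicing follows standard Python semantics and returns a list of Sample.
    return counter_data[-count:]
```

Both helpers consider completed samples only, because that is what iteration and indexing expose.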