cupti.pm_sampling
The cupti.pm_sampling module provides the Pythonic layer for the CUPTI PM Sampling API, built on top of cupti.cupti.
APIs
- class cupti.pm_sampling.Collector(device_index: int = 0)
Bases: object
Per-device PM Sampling collector.
- enable() -> None
Initializes CUPTI state and enables PM sampling on the device.
This call prepares profiler infrastructure for the process, resolves the device chip name for host-side metric handling, and creates the PM sampling object required by later APIs.
Call this before configure(), start(), stop(), decode(), or disable().
- Raises:
ValueError – If the device_index passed during instantiation is invalid.
MemoryError – If PM sampling object creation fails due to out-of-memory.
RuntimeError – If PM sampling is already enabled on the device, or if the nvperf* libraries could not be loaded.
PermissionError – If CUPTI reports insufficient privileges (ERROR_INSUFFICIENT_PRIVILEGES).
cupti.cupti.cuptiError – If there is an internal CUPTI error.
- configure(
      *,
      metrics: list[str],
      hardware_buffer_size: int = 536870912,
      sampling_interval: int = 10000,
      trigger_mode: TriggerMode = TriggerMode.GPU_TIME_INTERVAL,
      hw_buffer_append_mode: HardwareBuffer_AppendMode = HardwareBuffer_AppendMode.KEEP_OLDEST,
      single_pass_metric_set_name: str | None = None,
  )
Applies PM sampling configuration for the enabled collector.
The configuration defines which metrics are collected and how sampling is triggered by the GPU. This method can be used to configure or reconfigure the active PM sampling object.
- Parameters:
metrics – Metric names to collect in the PM sampling session.
hardware_buffer_size – Size (in bytes) of the hardware buffer that stores raw PM sampling data.
sampling_interval – Sampling period, interpreted according to trigger_mode (SYSCLK cycles for GPU_SYSCLK_INTERVAL, nanoseconds for GPU_TIME_INTERVAL).
trigger_mode – PM sampling trigger mode.
hw_buffer_append_mode – Append mode for records in the hardware buffer.
single_pass_metric_set_name – Optional single-pass metric set name used to initialize the host metrics evaluator for this configuration.
- Raises:
RuntimeError – If the collector has not been enabled yet.
ValueError – If hardware_buffer_size or sampling_interval is non-positive, or if the requested configuration is not supported. Unsupported configurations can happen when metrics cannot be collected in a single pass, when TriggerMode.GPU_TIME_INTERVAL is not supported on the GPU, or when HardwareBuffer_AppendMode.KEEP_LATEST is not supported on the GPU.
Notes
TriggerMode.GPU_TIME_INTERVAL is not supported on the Turing GPU architecture or the GA100 GPU. It is supported on Ampere GA10x and later GPU architectures.
HardwareBuffer_AppendMode.KEEP_LATEST is not supported on the Turing GPU architecture. It is supported on Ampere and later GPU architectures.
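To make the default values above concrete, the following sketch converts them into more familiar units. It does not import cupti: the TriggerMode class below is a local stand-in that mirrors the enumeration values listed on this page, and samples_per_second and its sysclk_hz parameter are hypothetical helpers introduced only for illustration. It assumes the interval is in nanoseconds for GPU_TIME_INTERVAL and in SYSCLK cycles for GPU_SYSCLK_INTERVAL.

```python
from enum import IntEnum
from typing import Optional


class TriggerMode(IntEnum):
    """Local stand-in mirroring the values listed for cupti.pm_sampling.TriggerMode."""
    GPU_SYSCLK_INTERVAL = 0
    GPU_TIME_INTERVAL = 1


def samples_per_second(sampling_interval: int,
                       trigger_mode: TriggerMode,
                       sysclk_hz: Optional[float] = None) -> float:
    """Hypothetical helper: convert a sampling interval into an approximate rate."""
    if sampling_interval <= 0:
        raise ValueError("sampling_interval must be positive")
    if trigger_mode == TriggerMode.GPU_TIME_INTERVAL:
        # Assumption: the interval is expressed in nanoseconds in this mode.
        return 1e9 / sampling_interval
    # Assumption: the interval is in SYSCLK cycles, so the caller supplies the clock rate.
    if sysclk_hz is None:
        raise ValueError("sysclk_hz is required for GPU_SYSCLK_INTERVAL")
    return sysclk_hz / sampling_interval


# Defaults copied from the configure() signature.
DEFAULT_HARDWARE_BUFFER_SIZE = 536870912
DEFAULT_SAMPLING_INTERVAL = 10000

buffer_mib = DEFAULT_HARDWARE_BUFFER_SIZE / (1024 * 1024)
rate = samples_per_second(DEFAULT_SAMPLING_INTERVAL, TriggerMode.GPU_TIME_INTERVAL)
print(f"hardware buffer: {buffer_mib:.0f} MiB")
print(f"default sample rate: {rate:.0f} samples/s")
```

Under these assumptions the default hardware buffer is 512 MiB and the default interval corresponds to one sample every 10 microseconds.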
- start() -> None
Starts periodic PM sampling on the enabled device.
Once started, the GPU collects configured metrics into the PM sampling hardware buffer according to the active trigger mode and interval.
- Raises:
RuntimeError – If the collector has not been enabled yet, if configure() has not been called, or if PM sampling has already started.
- stop() -> None
Stops PM sampling metric collection on the device.
After stopping, no new PM samples are collected until start() is called again.
- Raises:
RuntimeError – If the collector has not been enabled yet, if configure() has not been called, or if PM sampling has already stopped.
- decode(
      max_samples: int = 10000,
  )
Decodes PM sampling hardware-buffer records into a counter image.
This reads collected PM sampling data from the device-side hardware buffer and writes it into a host counter data image. Counter-data info and per-sample timestamps are stored in the returned CounterData; iterate it to evaluate metrics per completed sample.
- Parameters:
max_samples – Maximum number of samples to provision in the decoded counter data image.
- Returns:
CounterData for the session metrics (see CounterData.metrics).
- Raises:
RuntimeError – If the collector has not been enabled yet.
ValueError – If max_samples is non-positive.
MemoryError – If CUPTI reports ERROR_OUT_OF_MEMORY while decoding.
cupti.cupti.cuptiError – For other CUPTI failures.
Data structures
- class cupti.pm_sampling.Sample( )
Bases: object
One completed PM sampling interval with timestamps and metric values.
- metric_values
One value per configured metric; index i matches metrics[i] on the CounterData this sample came from.
- Type:
- class cupti.pm_sampling.CounterData(
      *,
      image: ndarray,
      metrics: list[str],
      chip_name: str,
      num_total_samples: int,
      num_populated_samples: int,
      num_completed_samples: int,
      start_timestamps: list[int],
      end_timestamps: list[int],
  )
Bases: object
Decoded PM sampling results from Collector.decode().
Exposes how many samples were captured and which metrics were collected. Iterating yields each completed sample as a Sample (timestamps and evaluated metric values).
- metrics
Metric names in evaluation order; aligns with each Sample.metric_values list.
- num_completed_samples
Samples that are fully complete; iteration yields this many Sample objects (indices 0…num_completed_samples - 1).
- Type: int
- Iteration:
for sample in counter_data yields a Sample per completed sample. len() returns num_completed_samples.
- Indexing and slicing:
counter_data[i] returns the i-th completed sample. Negative indexing is supported (for example, counter_data[-1] returns the last completed sample). counter_data[a:b:c] returns a list[Sample] following standard Python slice semantics. Valid completed-sample indices map to 0 through num_completed_samples - 1.
Note
Iteration performs metric evaluation on the host and is expensive.
Note
Instances can be serialized to defer or offload metric extraction to another process or an offline workflow.
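Because metric_values is positional and iteration is described as expensive, one reasonable pattern is to iterate once and keep name-to-value mappings. The sketch below illustrates that idea under stated assumptions: sample_to_dict and evaluate_once are hypothetical helpers, the metric names are placeholders, and _FakeCounterData is a stand-in that exposes only the documented metrics attribute and iteration so the example runs without cupti.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterator, List


def sample_to_dict(metrics: List[str], sample: Any) -> Dict[str, float]:
    """Pair each metric name with the value at the same index."""
    return dict(zip(metrics, sample.metric_values))


def evaluate_once(counter_data: Any) -> List[Dict[str, float]]:
    """Iterate a single time and cache the results, since evaluation happens on the host."""
    names = counter_data.metrics
    return [sample_to_dict(names, sample) for sample in counter_data]


class _FakeCounterData:
    """Stand-in for illustration only; it is not part of cupti."""

    def __init__(self, metrics: List[str], samples: List[Any]) -> None:
        self.metrics = metrics
        self._samples = samples

    def __iter__(self) -> Iterator[Any]:
        return iter(self._samples)


fake_data = _FakeCounterData(
    metrics=["metric_a", "metric_b"],  # placeholder names, not real metrics
    samples=[
        SimpleNamespace(metric_values=[1.0, 2.0]),
        SimpleNamespace(metric_values=[3.0, 4.0]),
    ],
)
rows = evaluate_once(fake_data)
for row in rows:
    print(row)
```

Keeping the evaluated rows in a plain list lets later code index or filter them without triggering metric evaluation again.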
Enumerations
- class cupti.pm_sampling.TriggerMode(value)
Bases: IntEnum
See CUpti_PmSampling_TriggerMode.
- COUNT = 2
- GPU_SYSCLK_INTERVAL = 0
- GPU_TIME_INTERVAL = 1
- class cupti.pm_sampling.DecodeStopReason(value)
Bases: IntEnum
See CUpti_PmSampling_DecodeStopReason.
- COUNT = 3
- COUNTER_DATA_FULL = 1
- END_OF_RECORDS = 2
- OTHER = 0
- class cupti.pm_sampling.HardwareBuffer_AppendMode(value)
Bases: IntEnum
See CUpti_PmSampling_HardwareBuffer_AppendMode.
- KEEP_LATEST = 1
- KEEP_OLDEST = 0