Thermovision
- class nsight.thermovision.ThermalController(
- thermal_mode: Literal['auto', 'manual', 'off'] = 'auto',
- thermal_wait: int | None = None,
- thermal_cont: int | None = None,
- thermal_timeout: int | None = None,
- verbose: bool = False,
)
Bases: object
GPU thermal monitoring and throttling prevention.
Manages GPU temperature and prevents thermal throttling by pausing profiling when the GPU gets too hot and resuming after cooling.
- Parameters:
- init()
Initialize NVML and get GPU handle.
- Return type:
bool
- Returns:
True if temperature retrieval is supported, False otherwise.
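For illustration only, a probe of this kind could look like the sketch below. It assumes the pynvml bindings (the nvidia-ml-py package); the source says only that init() initializes NVML, so the use of pynvml and the helper name probe_temperature_support are assumptions, not ThermalController's actual code.

```python
def probe_temperature_support(gpu_index: int = 0) -> bool:
    """Hypothetical helper: report whether the GPU temperature can be read via NVML."""
    try:
        import pynvml  # assumption: NVML is reached through the pynvml bindings
    except ImportError:
        return False  # bindings not installed, so no temperature readings

    try:
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
        # A successful read means temperature retrieval is supported on this device.
        pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
    except pynvml.NVMLError:
        return False  # no driver, no such device, or the query is unsupported
    # NVML is left initialized on success so later temperature reads can reuse it.
    return True
```

A caller would typically skip thermal guarding entirely when the probe returns False.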
- throttle_guard()
Check thermal state and pause if GPU is too hot.
Thermal headroom is the temperature margin remaining before the GPU starts throttling.
Operates in two modes:
- Auto mode: Automatically adjusts thermal_cont based on workload
- Manual mode: Uses user-provided thresholds without adaptation
Adaptive Algorithm (in auto mode):
1. When thermal headroom reaches thermal_cont after cooling, start counting iterations
2. Run kernel as headroom drops from thermal_cont toward thermal_wait
3. When headroom drops below thermal_wait, analyze iteration count:
   - Few iterations (<TARGET_MIN_ITERATIONS): GPU heats quickly → increase thermal_cont (cool more)
   - Many iterations (>TARGET_MAX_ITERATIONS): GPU heats slowly → decrease thermal_cont (cool less)
4. Wait until GPU cools back to thermal_cont, then repeat
- Return type:
None
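The adaptive loop described for throttle_guard() in auto mode can be sketched as a small state machine. This is a simplified illustration, not the library's implementation: the class AdaptiveGuard, the constant values, and the one-degree adjustment step CONT_STEP are all hypothetical, and headroom readings are passed in by the caller rather than read from NVML.

```python
from dataclasses import dataclass

# Hypothetical values: the source names these constants but does not give their values.
TARGET_MIN_ITERATIONS = 5
TARGET_MAX_ITERATIONS = 20
CONT_STEP = 1  # hypothetical adjustment step, in degrees of headroom


@dataclass
class AdaptiveGuard:
    thermal_wait: int  # pause once headroom drops below this
    thermal_cont: int  # resume once headroom recovers to this
    iterations: int = 0
    cooling: bool = False

    def step(self, headroom: int) -> bool:
        """Return True if the kernel may run at this headroom reading."""
        if self.cooling:
            if headroom < self.thermal_cont:
                return False  # still cooling back toward thermal_cont
            self.cooling = False
            self.iterations = 0  # cooled down: start counting iterations

        if headroom < self.thermal_wait:
            # Too hot: adapt thermal_cont from how many iterations fit in the window.
            if self.iterations < TARGET_MIN_ITERATIONS:
                self.thermal_cont += CONT_STEP  # heats quickly, cool more
            elif self.iterations > TARGET_MAX_ITERATIONS:
                # Heats slowly, cool less, but keep thermal_cont above thermal_wait.
                self.thermal_cont = max(self.thermal_wait + 1,
                                        self.thermal_cont - CONT_STEP)
            self.cooling = True
            return False

        self.iterations += 1
        return True
```

In manual mode the same pause and resume checks would apply with the adjustment branch skipped, so thermal_cont stays at the user-provided value.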