NVIDIA Tegra
NVIDIA Tegra Linux Driver Package

Development Guide
28.3 Release


 
Power Management for TX2/TX2i Devices
 
Interacting Features
Kernel-Space Power Saving Features
Chipset Power States
Clock and Voltage Management
CPU Power Management
Idle Management with cpuidle
Memory Power Management
EMC Frequency Scaling Policy
Max-Q and Max-P Power Efficiency
Thermal Management
Linux Thermal Framework
Thermal Zone
Configuring a Thermal Zone Using the Device Tree
Thermal Sensing in Linux
NCT Sensors
BPMP Sensors
Thermal Cooling
Fan Management
Clock Throttling
Software Thermal Shutdown
Thermal Management in BPMP
Thermal Sensing
SOC_THERM
AOTAG
BPMP Thermal Framework
Hardware Throttling
Hardware Thermal Shutdown
Software-based Power Consumption Modeling
Related Tools and Techniques
3D Frequency Scaling
Setting Frequencies
Maximizing Jetson TX2 Performance
Using CPU Hot Plug
The NVIDIA® Tegra® system-on-a-chip and the NVIDIA® Tegra® Board Support Package (BSP) provide many features related to power management, thermal management, and electrical management. These features deliver the best user experience possible given the constraints of a particular platform. The target user experience is the perception that the device provides:
Uniformly high performance
Excellent battery life
Perfect stability
Comfortable and cool to the touch
This topic describes the power, thermal, and electrical management features visible to software, as well as some tools and related techniques.
Interacting Features
Power, thermal, and electrical management features dynamically constrain knobs, such as:
Clock gate settings
Clock frequencies
Power gate (or regulator enable) settings
Voltages
Processor power state (i.e., which idle state is selected for the CPU)
Peripheral power state (i.e., which idle state is selected for an I/O controller)
Chipset power state
Availability of CPU cores to the OS
Some of these knobs are constrained by more than one feature. For example, cpufreq implements load-based scaling: it adjusts the CPU frequency according to how busy the CPU is. CPU thermal management, however, can override the target frequency of cpufreq. Consequently, before attempting to debug power, performance, thermal, or electrical problems, familiarize yourself with all of the power, thermal, and electrical management features present in the BSP.
Kernel-Space Power Saving Features
This topic describes the features that Tegra implements to save power and extend battery life. Many of these features are implemented by the Linux kernel, with support from firmware and hardware, and without significant involvement from user-space.
Chipset Power States
The supported power states are listed in order of increasing flexibility or configurability:
Off: There is only one way for a system to be off.
SC7: Deep Sleep (SC7) is minimally flexible and offers a small amount of configurability. For example, prior to entering Deep Sleep, software can select which of the many hardware wake events can wake the chip from Deep Sleep.
Active: The active state is extraordinarily flexible in terms of power and performance. It encompasses activity levels from low-power audio playback through peak performance scenarios. Power consumption in the active state can range from tens of milliwatts up to multiple watts.
Supported Power States
The following table describes the supported power states.
Power State | Functionality | Characteristics
Off | Power rails | None of the power rails supplying Tegra and DRAM are powered.
Off | State | No state is maintained in Tegra or DRAM.
Off | Exiting | The only way to exit this state is via a cold-boot (into active mode).
Deep Sleep (SC7) | Power rails | VDD_RTC, VDDIO_DDR, VDDIO_SYS, and DRAM power rails are powered on. VDD_CORE and VDD_CPU are powered off.
Deep Sleep (SC7) | State | Tegra maintains a small amount of state in the PMC block. DRAM maintains state.
Deep Sleep (SC7) | Exiting | Exit from this state occurs based on a pre-defined set of wake events into active mode.
Active | Power rails | VDD_RTC, VDDIO_DDR, VDDIO_SYS, VDD_CORE, and DRAM power rails are powered on. Other power rails (including VDD_CPU) may be on.
Active | State | Software actively manages the power states of the many devices comprising Tegra.
Active | Exiting | Software can initiate a transition from Active to any other power state.
Power State Mapping to Linux
Tegra BSP maps these hardware power states onto Linux Power States as follows.
OS Power State | Chipset Power State | Comments
Off | Off | -
Suspend | SC7 | Software can choose whether to enter SC7 before the OS enters suspend.
Running/Idle (display on or off) | Active | Many of the devices within Tegra may be idle or disabled. They are under driver control. For example, VDD_CPU may be powered off and the companion CPU may be power-gated.
 
Note:
NVIDIA Tegra X2 uses SCy instead of LPx for the name of the chipset power state.
Clock and Voltage Management
Because the maximum frequency that a clock domain can sustain depends on its supply voltage, dynamic voltage scaling is closely related to frequency scaling: higher frequencies require higher voltages, and lower frequencies permit lower voltages.
Most clock register manipulation on Tegra X2 is handled by power management firmware running on the Boot and Power Management Processor (BPMP). A Linux kernel driver exposes a somewhat simplified view of the physical clock tree to software on the main CPU via the Linux Common Clock Framework.
Each of the significant clock domains on the chip has its own dedicated clock source known as a Noise Aware Frequency Lock Loop (NAFLL).
Regulator Framework
The Linux regulator framework provides an abstraction that allows regulator consumer drivers to dynamically adjust voltage or current regulators at runtime, without knowledge of the underlying hardware power tree.
The framework provides a mechanism that platform initialization code can use to declare a power tree topology and assign a driver that provides regulators for each node in the hardware power tree. Such a driver is called a regulator provider driver.
Tegra BSP configures the platform power tree appropriately for Tegra devices. Additionally, drivers within Tegra BSP act as regulator consumers, where appropriate.
When porting Tegra BSP to a new platform, ensure that:
The platform power tree is configured correctly to match the underlying hardware.
All drivers for peripheral devices correctly make use of the regulator consumer APIs.
The Device Tree and board configuration file information for your new platform avoids conflicts between functions using the same I/O pads. BSP drivers registering as regulator consumers can cause I/O pads on the chip to be unavailable for other functions.
The Tegra core power rails (VDD_CORE, VDD_CPU, VDD_SRAM, VDD_GPU) are under the direct control of the BPMP firmware. They are configured via the BPMP device tree blob, which is distinct from the Linux device tree blob.
CPU Power Management
Tegra CPU power management strategy includes dynamic frequency scaling with dynamic voltage scaling, idle power states, and core management tuned for the Tegra X2 architecture.
Frequency Management with cpufreq
Tegra BSP implements CPU Dynamic Frequency Scaling (DFS) with the Linux cpufreq subsystem. The cpufreq subsystem comprises:
Platform drivers to implement the clock adjustment mechanism.
Governors to implement frequency scaling policies.
Core framework to connect governors to platform drivers.
The policy for frequency scaling depends on which cpufreq governor is selected at runtime.
For details consult the information available at:
<top>/kernel/kernel/kernel-4.4/Documentation/cpu-freq/
For each Tegra hardware reference design, a cpufreq governor is selected and tuned to achieve a balance between power and performance.
When a governor requests a CPU frequency change, the Tegra-specific cpufreq platform driver reconciles that request with the thermal and electrical limits currently in force. The driver then updates the clock speed of the CPU.
Tegra X2 uses an NAFLL to clock each CPU. The NAFLLs are configured for AVFS. Hardware, with the assistance of the BPMP, ensures that the CPU voltage is appropriate for the NAFLL to deliver requested CPU frequencies.
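The following Python sketch shows one way to inspect the cpufreq policy that is currently in effect for a core. It assumes the standard Linux cpufreq sysfs attributes (scaling_governor, scaling_min_freq, scaling_max_freq, scaling_cur_freq), which this guide does not list explicitly. The function name and the sysfs_root parameter are hypothetical and exist only for illustration and off-target testing.

```python
from pathlib import Path


def read_cpufreq_policy(cpu, sysfs_root="/sys"):
    """Return the governor and frequency limits (in kHz) for one CPU core.

    Assumes the standard Linux cpufreq sysfs layout. `sysfs_root` is a
    hypothetical parameter that lets the function be pointed at a copy of
    the tree for testing.
    """
    base = Path(sysfs_root) / "devices/system/cpu" / f"cpu{cpu}" / "cpufreq"

    def read(name):
        return (base / name).read_text().strip()

    return {
        "governor": read("scaling_governor"),
        "min_khz": int(read("scaling_min_freq")),
        "max_khz": int(read("scaling_max_freq")),
        "cur_khz": int(read("scaling_cur_freq")),
    }

# On a target device, read_cpufreq_policy(0) would report the policy for cpu0.
```

Because thermal or electrical limits can cap the frequency, cur_khz may stay below max_khz even under full load.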
Idle Management with cpuidle
The Linux cpuidle infrastructure supports the implementation of SoC-specific idle states for each CPU core. cpuidle lacks direct support for idle states applicable to an entire CPU cluster and for idle states extending beyond a CPU cluster.
For more information on the Linux cpuidle infrastructure, see:
<top>/kernel/kernel/kernel-4.4/Documentation/cpuidle/
NVIDIA provides a Tegra-specific cpuidle driver that plugs into the cpuidle framework to enable CPU idle power management.
Per-Core CPU Idle States
The Tegra cpuidle driver exposes the following per-core CPU idle state:
Core State | Description
C1 | The CPU core is clock-gated.
Transition CPU Cluster to Idle
When the final core within a cluster transitions to idle, the Tegra-specific cpuidle driver can transition the CPU cluster to a cluster idle state.
Cluster State | Description
CC1 | The CPU cluster’s clock is halted.
Additionally, as the final core within a cluster transitions to idle, the Tegra cpuidle driver optionally disables any SoC resources where the CPU was the last active user.
For example, the final CPU core transitioning to idle can optionally do one or more of the following:
Transition DRAM to self-refresh
Clock-gate MC/EMC
Halt various PLLs
CPU Idle
The idle task is scheduled when there are no runnable tasks left in the runqueue for a particular core. This task, through the cpuidle driver and cpuidle governor, selects a low-power state for the core and puts the core into that state, where it stays until an interrupt wakes it up to process more work.
When the last active core in a CPU cluster goes into an idle or offline state, the idle task puts the entire CPU cluster in a low-power state.
CCPLEX Idle States
Core states are denoted Cx states, cluster states are denoted as CCx states, and CCPlex states are denoted as CCPx states. The table below summarizes the different states available on the T186.
Category | State | Meaning
Core states | C1 | Clock-gating
Cluster states | CC1 | Auto clock-gating
Cluster states | CC3 | fmax@Vmin
CCPlex states | CCP3 | EMC reduction state during CC3
Using KConfig and Device Tree Node to Enable cpuidle
Use KConfig and the Device Tree node to enable cpuidle.
To enable cpuidle from the configuration file
Set the following option:
CONFIG_CPU_IDLE=y
To enable cpuidle from device tree
Use the following compatibility string if not already enabled:
cpuidle {
    compatible = "nvidia,tegra18x-cpuidle";
    status = "okay";
};
To get and set the core power state of the CPU
On a Denver core, the pathnames of the nodes that represent core states are:
C1: /sys/devices/system/cpu/cpu<x>/cpuidle/state0/*
On an A57 core, the pathnames are:
C1: /sys/devices/system/cpu/cpu<x>/cpuidle/state0/*
To get the status of a core power state on core <x>, read the appropriate node. To set the status, write an ASCII 0 or 1 to the node.
Note:
ASCII 1 corresponds to “disabled,” and 0 to “enabled.”
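As an illustration of the get and set operations above, the following Python sketch lists the idle states of a core and toggles one of them. It assumes the standard Linux cpuidle sysfs attributes name and disable inside each state<y> directory; this guide refers to those nodes only as state0/*. The function names and the sysfs_root parameter are hypothetical. Writing to the node normally requires root privileges.

```python
from pathlib import Path


def _cpuidle_dir(cpu, sysfs_root):
    return Path(sysfs_root) / "devices/system/cpu" / f"cpu{cpu}" / "cpuidle"


def list_idle_states(cpu, sysfs_root="/sys"):
    """Return index, name, and disabled flag for each idle state of a core."""
    state_dirs = sorted(
        _cpuidle_dir(cpu, sysfs_root).glob("state[0-9]*"),
        key=lambda p: int(p.name[len("state"):]),
    )
    return [
        {
            "index": int(d.name[len("state"):]),
            "name": (d / "name").read_text().strip(),
            # ASCII 1 means "disabled" and 0 means "enabled".
            "disabled": (d / "disable").read_text().strip() != "0",
        }
        for d in state_dirs
    ]


def set_idle_state_disabled(cpu, state, disabled, sysfs_root="/sys"):
    """Write ASCII 1 (disable) or 0 (enable) to the state's disable node."""
    node = _cpuidle_dir(cpu, sysfs_root) / f"state{state}" / "disable"
    node.write_text("1" if disabled else "0")
```

For example, set_idle_state_disabled(1, 0, True) would disable state0 on cpu1 until it is re-enabled or the system reboots.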
To get cluster states
To get the status of the cluster states enabled for each cluster, read the appropriate node for the type of cores in the cluster:
For A57 cores, read:
/sys/kernel/debug/cpuidle_a57/deepest_cc_state
For Denver cores, read:
/sys/kernel/debug/cpuidle_denver/deepest_cc_state
The value returned is:
1: Only CC1 is enabled
For example, to get the status of the cluster states on a Denver cluster, run the following command:
cat /sys/kernel/debug/cpuidle_denver/deepest_cc_state
To get the per-core state statistics
To get the number of times the kernel requested a specified core to enter a specified state, read the following node:
cat /sys/devices/system/cpu/cpu<x>/cpuidle/state<y>/usage
To get the number of times a specified core actually entered a specified state, run the following command:
cat /sys/kernel/debug/tegra_mce/cstats
The command requests information from MCE/MTS, which actually sets the state. MCE/MTS may decline to set a requested state, for example because the actual idle time for the core is less than the crossover threshold value.
Note:
Background translations in Denver can bloat the Denver idle state counts.
To get the total time in microseconds that a specified core has spent in a specified state since boot, read the following node:
cat /sys/devices/system/cpu/cpu<x>/cpuidle/state<y>/time
For example, to get the number of times the kernel requested Denver core 2 to enter the idle state exposed as state1, run the following command:
cat /sys/devices/system/cpu/cpu2/cpuidle/state1/usage
To get the total time in microseconds that Denver core 2 has spent in the idle state exposed as state1, run the following command:
cat /sys/devices/system/cpu/cpu2/cpuidle/state1/time
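The usage and time nodes can be combined to estimate how long a core stays in a state each time that state is requested. The following Python sketch does this for every state of one core. The function name, the sysfs_root parameter, and the derived mean_us figure are illustrative additions and are not part of the BSP.

```python
from pathlib import Path


def idle_state_stats(cpu, sysfs_root="/sys"):
    """Return usage count, total time, and mean time per request for each state."""
    base = Path(sysfs_root) / "devices/system/cpu" / f"cpu{cpu}" / "cpuidle"
    stats = {}
    for state_dir in base.glob("state[0-9]*"):
        usage = int((state_dir / "usage").read_text())
        time_us = int((state_dir / "time").read_text())
        stats[state_dir.name] = {
            "usage": usage,
            "time_us": time_us,
            # Avoid dividing by zero for states that were never requested.
            "mean_us": time_us / usage if usage else 0.0,
        }
    return stats
```

A short mean time relative to the state's crossover threshold suggests that requests for the state are often not worthwhile, which is one reason MCE/MTS may decline them.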
To disable cpuidle at boot time
Remove or disable the compatibility string nvidia,tegra18x-cpuidle from the appropriate device tree file.
To disable a core/cluster power state at boot time
Remove or disable the appropriate core/cluster state nodes from the following device trees:
tegra186-a57-cpuidle.dtsi
tegra186-denver-cpuidle.dtsi
Memory Power Management
Tegra chipsets include power saving features whose operation is largely invisible to software at runtime. Most of those features are statically enabled at boot, according to settings in the boot configuration table (BCT).
Additionally, Tegra BSP implements EMC frequency scaling, which is dynamic frequency scaling for the memory controller (EMC/MC) and DRAM. This is a critical power saving feature that requires tuning and characterization for each new printed circuit board design.
The calibration results include a BCT and an EMC DVFS table specific to the board design. The EMC DVFS table must be included in the platform BPMP device tree file.
EMC Frequency Scaling Policy
The following factors affect EMC frequency scaling policy at runtime:
The entries in the EMC DVFS table
The average memory bandwidth being used (as measured by hardware)
Requests made by various device drivers (cpufreq, graphics drivers, USB, HDMI™, and display)
Any limits dynamically imposed by thermal throttling
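One way to picture how these factors could combine is the toy model below. It is not the algorithm used by the BPMP firmware, which this guide does not describe. It assumes that driver requests and the measured bandwidth each act as a frequency floor, that thermal throttling acts as a cap, and that the result must be an entry in the EMC DVFS table. All names are hypothetical.

```python
def select_emc_rate(table_hz, requests_hz, bandwidth_floor_hz, thermal_cap_hz=None):
    """Pick an EMC rate from the DVFS table (illustrative model only).

    table_hz           -- rates in the EMC DVFS table
    requests_hz        -- minimum rates requested by device drivers
    bandwidth_floor_hz -- minimum rate implied by measured bandwidth
    thermal_cap_hz     -- optional maximum imposed by thermal throttling
    """
    table = sorted(table_hz)
    floor = max([bandwidth_floor_hz, *requests_hz])
    allowed = [f for f in table if thermal_cap_hz is None or f <= thermal_cap_hz]
    if not allowed:
        # The cap is below every table entry: fall back to the lowest rate.
        allowed = table[:1]
    for rate in allowed:
        if rate >= floor:
            return rate  # lowest allowed rate that satisfies every floor
    return allowed[-1]   # floors exceed the cap: the cap wins
```

In this model the thermal cap always takes precedence over driver requests, which mirrors the way thermal limits override cpufreq targets elsewhere in the BSP.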
Max-Q and Max-P Power Efficiency
Jetson TX2 is designed for good power efficiency through the selection of the PMIC and regulators and through power tree optimization. Power efficiency peaks when the module power consumption is 7.5W, and remains good up to 15W. When module power is limited to the lower budget, the module is in Max-Q mode (7.5W for TX2, 10W for TX2i). When it is limited to the higher budget, the module is in Max-P mode (15W for TX2, 20W for TX2i). For each mode, several configurations with various CPU frequencies and numbers of online cores are possible.
Capping the memory, CPU, and GPU frequencies at pre-qualified levels confines the module to the target mode. The configurations pre-defined by NVIDIA are as follows.
NVPModel Clock Configuration for Jetson TX2
Mode Name | EDP | MAX-Q | MAX-P | MAX-P | MAX-P
Power Budget | n/a | 7.5W | 15W | 15W | 15W
Mode ID | 0 | 1 | 2 | 3 | 4
Online A57 CPU | 4 | 4 | 4 | 4 | 1
Online D20 CPU | 2 | 0 | 2 | 0 | 1
A57 CPU Maximal Frequency (MHz) | 2000 | 1200 | 1400 | 2000 | 345
D20 CPU Maximal Frequency (MHz) | 2000 | n/a | 1400 | n/a | 2000
GPU Maximal Frequency (MHz) | 1300 | 850 | 1122 | 1122 | 1122
Memory Maximal Frequency (MHz) | 1866 | 1331 | 1600 | 1600 | 1600
The default mode is MAX-P (id:3).
NVPModel Clock Configuration for Jetson TX2i UCM1 Profile
Mode Name | EDP | MAX-Q | MAX-P | MAX-P | MAX-P
Power Budget | n/a | 10W | 20W | 20W | 20W
Mode ID | 0 | 1 | 2 | 3 | 4
Online A57 CPU | 4 | 4 | 4 | 4 | 1
Online D20 CPU | 2 | 0 | 2 | 0 | 1
A57 CPU Maximal Frequency (MHz) | 1920 | 1200 | 1400 | 1920 | 345
D20 CPU Maximal Frequency (MHz) | 1958 | n/a | 1400 | n/a | 1958
GPU Maximal Frequency (MHz) | 1236 | 850 | 1122 | 1122 | 1122
Memory Maximal Frequency (MHz) | 1600 | 1600 | 1600 | 1600 | 1600
The default mode is MAX-P (id:3).
NVPModel Clock Configuration for Jetson TX2i UCM2 Profile
Mode Name | EDP | MAX-Q | MAX-P | MAX-P | MAX-P
Power Budget | n/a | 10W | 20W | 20W | 20W
Mode ID | 0 | 1 | 2 | 3 | 4
Online A57 CPU | 4 | 4 | 4 | 4 | 1
Online D20 CPU | 2 | 0 | 2 | 0 | 1
A57 CPU Maximal Frequency (MHz) | 1420 | 1200 | 1400 | 1420 | 345
D20 CPU Maximal Frequency (MHz) | 1497 | n/a | 1400 | n/a | 1497
GPU Maximal Frequency (MHz) | 918 | 850 | 918 | 918 | 918
Memory Maximal Frequency (MHz) | 1600 | 1600 | 1600 | 1600 | 1600
The default mode is MAX-P (id:3).
To change the mode
Run the following command:
sudo /usr/sbin/nvpmodel -m x
where x is a mode ID: 0, 1, 2, 3, or 4.
To find the current mode
Run the following command:
sudo /usr/sbin/nvpmodel -q
To learn about other options
Run the following command:
/usr/sbin/nvpmodel -h
Once you set a mode, the module stays in that mode until you change it. The mode persists across power cycles and SC7.
You can define your own custom mode by adding a mode definition to the following file:
<top>/device/t18x/t186/nvpmodel.conf.
The following is an example entry for mode 2:
< POWER_MODEL ID=2 NAME=MAXP_CORE_ALL >
# cpu core settings
/sys/devices/system/cpu/cpu1/online 1
/sys/devices/system/cpu/cpu2/online 1
/sys/devices/system/cpu/cpu3/online 1
/sys/devices/system/cpu/cpu4/online 1
/sys/devices/system/cpu/cpu5/online 1
 
# cpu clock settings
# A57 cluster
/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq 0
/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq 1400000
 
# Denver cluster
/sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq 1400000
 
# gpu clock settings
/sys/devices/17000000.gp10b/devfreq/17000000.gp10b/min_freq 0
/sys/devices/17000000.gp10b/devfreq/17000000.gp10b/max_freq 1120000000
 
# emc clock settings
/sys/kernel/nvpmodel_emc_cap/emc_iso_cap 1600000000
The frequency unit of measure for the CPU is kHz. The unit for the GPU and EMC is Hz. You must assign a unique number in the ID field. Test your use case to determine:
How many active cores to use
The frequencies for each CPU cluster, the GPU, and the EMC
The frequencies selected by the user are subject to the EDP limit defined in mode 0.
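Before adding a custom mode, it can help to check the file for duplicate IDs and to review the values that each mode writes. The following Python sketch parses the entry format shown in the example above. It is not part of nvpmodel, and the function name is hypothetical. It assumes that every setting line has the form "path value" with an integer value, and it ignores any other bracketed header lines.

```python
import re

# Matches headers such as: < POWER_MODEL ID=2 NAME=MAXP_CORE_ALL >
HEADER = re.compile(r"<\s*POWER_MODEL\s+ID=(\d+)\s+NAME=(\S+?)\s*>")


def parse_power_models(text):
    """Return {mode_id: {"name": str, "settings": {path: int}}}."""
    models = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("<"):
            match = HEADER.match(line)
            if match is None:
                current = None  # some other header: ignore its body
                continue
            mode_id = int(match.group(1))
            if mode_id in models:
                raise ValueError(f"duplicate mode ID {mode_id}")
            current = {"name": match.group(2), "settings": {}}
            models[mode_id] = current
            continue
        parts = line.split()
        if current is not None and len(parts) == 2:
            # CPU values are in kHz; GPU and EMC values are in Hz.
            current["settings"][parts[0]] = int(parts[1])
    return models
```

For the example entry, the parsed settings map the cpu0 scaling_max_freq path to 1400000 (kHz) and the emc_iso_cap path to 1600000000 (Hz).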
Thermal Management
Thermal Management is essential for system stability and quality of user experience. The Tegra X2 thermal management provides the following capabilities:
Sensing: for temperature reporting
Cooldown: for removing heat via the fan and for controlling heat via the software clock throttling
Slowdown: for hardware clock throttling
Shutdown: for orderly software shutdown and hardware thermal reset
Previously, Tegra thermal management was performed by the software on the main CPU. Thermal management in Tegra X2 is performed by:
Drivers for the on-die thermal sensors
Boot and Power Management Processor (BPMP) for slowdown and hardware thermal reset
The following table identifies each thermal management action and the associated module for the Tegra chip.
Thermal Action | Module Name | Tegra X2
Sensing | soctherm.c | BPMP firmware
Sensing | aotag.c | BPMP firmware
Sensing | nct1008.c | Kernel software
Cooldown for software throttling | tegraXX_throttle.c | Kernel software
Cooldown for software throttling | pwm_fan.c | Kernel software
Slowdown for hardware throttling | soctherm.c | BPMP firmware
Software shutdown | thermal_core.c | Kernel software
Hardware shutdown | soctherm.c and aotag.c | BPMP firmware
Linux Thermal Framework
The Linux thermal framework provides generic user-space and kernel-space interfaces for working with devices that measure temperature and devices used to control temperatures. The central component of the framework is the Thermal Zone.
More information about the Linux thermal framework is available at:
<top>/kernel/Documentation/thermal/sysfs-api.txt
Thermal Zone
A thermal zone is a virtual object that represents an area on the die whose temperature is monitored and controlled. A working thermal zone is more than a sensor; it is an object with the following components:
Temperature sensor
Cooling device
Trip points
Governor
Tegra BSP includes drivers that provide interfaces defined by these components.
This topic introduces these components and demonstrates how they form a thermal zone on a Tegra-based device.
Configuring a Thermal Zone Using the Device Tree
The thermal zone provides knobs to tune the thermal response of the zone. Tegra BSP provides several thermal zones tuned to provide optimum thermal performance. These provided thermal zones can be modified by editing the entries in the device tree. Users can define which sensors to use, the temperature limits, and the cooling actions taken at those limits. In most cases, a device that becomes too hot can be fixed by tuning the thermal zone.
The following code snippet provides an example of a thermal zone for Tegra X2. This thermal zone monitors the temperature of the THERMAL_ZONE_CPU sensor.
The clock throttling is performed using the CPU-balanced cooling device when the passive trip point, trip_bthrot, is crossed at 95.5 degrees Celsius.
BCPU-therm {
    status = "okay";
    polling-delay-passive = <500>;
    thermal-zone-params {
        governor-name = "step_wise";
    };
    trips {
        trip_critical {
            temperature = <101000>;
            type = "critical";
            hysteresis = <0>;
            writable;
        };
        trip_bthrot {
            temperature = <95500>;
            type = "passive";
            hysteresis = <0>;
            writable;
        };
    };
    cooling-maps {
        map0 {
            trip = <&{/thermal-zones/BCPU-therm/trips/trip_bthrot}>;
            cdev-type = "cpu-balanced";
            cooling-device = <&{/bthrot_cdev/cpu_balanced} THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
        };
    };
};
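At runtime, the zones and trip points defined in the device tree appear under the Linux thermal sysfs class. The following Python sketch reads them back so that the configured values can be checked on a running system. It assumes the standard Linux attributes type, temp, trip_point_<n>_temp, and trip_point_<n>_type, which this guide does not list. The function name and the sysfs_root parameter are hypothetical.

```python
from pathlib import Path


def read_thermal_zones(sysfs_root="/sys"):
    """Return {zone type: {"temp_c": float, "trips": [(type, temp_c), ...]}}."""
    zones = {}
    thermal_class = Path(sysfs_root) / "class/thermal"
    for zone in sorted(thermal_class.glob("thermal_zone*")):
        zone_type = (zone / "type").read_text().strip()
        # Temperatures are reported in milli-Celsius.
        temp_c = int((zone / "temp").read_text()) / 1000.0
        trip_files = sorted(
            zone.glob("trip_point_*_temp"),
            key=lambda p: int(p.name.split("_")[2]),
        )
        trips = []
        for trip_file in trip_files:
            index = trip_file.name.split("_")[2]
            trip_type = (zone / f"trip_point_{index}_type").read_text().strip()
            trips.append((trip_type, int(trip_file.read_text()) / 1000.0))
        zones[zone_type] = {"temp_c": temp_c, "trips": trips}
    return zones
```

For the zone in the example above, the expected trips would be a critical trip at 101.0 and a passive trip at 95.5 degrees Celsius.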
More information about thermal knobs is available at:
<top>/kernel/Documentation/devicetree/bindings/thermal.txt
Temperature Sensors
A temperature sensor in a thermal zone is responsible for reporting the temperature in milli-Celsius. Tegra has several types of temperature sensors spread across the die and board.
For more information see Thermal Sensing in Linux.
Trip Points
Trip points are used to communicate with the thermal zone. A trip point identifies the temperature at which to perform a thermal action. Trip points are classified as active or passive, based on the type of cooling they trigger. A trip point is classified as critical if it triggers a thermal shutdown. A cooling map specifies how a cooling device is associated with certain trip points. Tegra BSP supports fan control and clock throttling as cooling actions.
Cooling Devices
Most cooling devices do not actually remove heat; only a fan cooling device does. The Linux thermal framework makes this distinction by classifying fan cooling as an “active” cooling device. Clock throttling, which is the other type of cooling device, is classified as a “passive” cooling device.
For more information see Fan Management and Clock Throttling.
Governors
Thermal management requires some form of feedback control system that keeps the device within a safe operating temperature. The governor implements this feedback control loop. While the Linux thermal framework provides many different governors, Tegra BSP provides a simple Proportional Integral Derivative (PID) controller for all passive throttling needs.
Tegra BSP Specific Thermal Zones
Platform specific thermal zones are provided in the Tegra BSP. They are tuned to provide the best performance within the thermal constraints of the device. Each thermal zone uses a temperature sensor that is controlled by either the Linux kernel or the BPMP firmware as described in the following table.
Thermal Zone | Thermal Sensor ABI Name | Cooling Action | Balanced Throttle Temperature in Degrees Celsius
BCPU-therm | THERMAL_ZONE_CPU | cpu_balanced | 95.5
MCPU-therm | THERMAL_ZONE_AUX | cpu_balanced | 95.5
GPU-therm | THERMAL_ZONE_GPU | - | -
PLL-therm | THERMAL_ZONE_PLLX | - | -
AO-therm | THERMAL_ZONE_AO | gpu_balanced | 93.5
Tboard_tegra | tmp451 | - | -
Tdiode_tegra | tmp451 | - | -
thermal-fan-est | - | pwm_fan | 51.0
For more information see Thermal Management in BPMP.
For information on AOTAG versus GPU-therm for GPU balanced throttles, see AOTAG in Thermal Sensing.
All gains achieved by tuning are limited by the Thermal Design Power (TDP) of the system. Tuning cannot remedy a faulty TDP, and removing all the thermal zones does not guarantee maximum performance because it can cause irreversible damage to the device and/or resets.
Thermal Sensing in Linux
The Tegra platform includes several drivers for temperature sensing.
NCT Sensors
Tegra BSP includes a driver for devices such as:
NCT1008
NCT72
TMP451
These devices can sense their own temperature as well as the temperature of a remote diode. Tegra platforms have these sensors set up as follows:
Thermal Zone | Thermal Sensor | Sensed Location
Tdiode_tegra | Remote sensor | Temperature on die near GPU
Tboard_tegra | Local sensor | Temperature of the board
Tegra BSP configures this sensor to operate in an extended mode, which widens the measurable temperature range to between -64 and 191 degrees Celsius.
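The extended range of -64 to 191 degrees Celsius spans exactly 256 values, which matches an 8-bit reading stored with an offset of 64. The following Python sketch decodes such a reading. It assumes the offset-binary encoding described in the TMP451 datasheet and covers only the whole-degree byte. The function and constant names are hypothetical, and the sketch does not reflect how the kernel driver is written.

```python
EXTENDED_RANGE_OFFSET_C = 64  # assumption: offset-binary encoding in extended mode


def decode_extended_temp(raw_byte):
    """Convert an 8-bit extended-mode reading to whole degrees Celsius."""
    if not 0 <= raw_byte <= 0xFF:
        raise ValueError("raw_byte must fit in 8 bits")
    return raw_byte - EXTENDED_RANGE_OFFSET_C
```

With this encoding, a raw value of 0x00 corresponds to -64 degrees, 0x40 to 0 degrees, and 0xFF to 191 degrees.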
Operation During SC7
On many platforms, the voltage rail that powers the sensor is gated when Tegra enters SC7 state. Consequently, the sensor is stopped when Tegra enters SC7 and turned back on when Tegra exits SC7 state.
Thermal Capabilities
The NCT sensors generate thermal events for:
Thermal zone trip points
Hardware thermal shutdown
Correction Offset
The NCT sensors allow software to program a static offset temperature for the remote sensor. This accounts for any inaccuracy that may be present in the sensor hardware. Tegra BSP reads the offset from the Device Tree and programs it into the offset register on boot. The offset is calculated and validated via oil bath experiments.
BPMP Sensors
With Tegra X2, the soctherm and aotag drivers in the Linux kernel are replaced with the tegra_bpmp_thermal sensor driver. This module registers itself as the sensor device driver with the Linux thermal framework for all the thermal sensors, except the NCT sensors.
Each BPMP sensor is exposed using the Application Binary Interface (ABI) and is given an ABI name as shown in the table in Tegra BSP Specific Thermal Zones. The BPMP sensors, named without the THERMAL_ZONE prefix, are CPU, AUX, GPU, PLLX, and AO. Each of these BPMP sensors has one trip allocated, which is used as a thermal zone trip point.
The tegra_bpmp_thermal driver walks through the list of thermal trips in a thermal zone based on the current temperature. It then comes up with a set of trips to program the BPMP sensor that is specified in the Thermal Zone. The driver then uses the following thermal message requests (MRQ) to communicate with the BPMP thermal framework.
CMD_THERMAL_QUERY_ABI
CMD_THERMAL_GET_TEMP
CMD_THERMAL_SET_TRIP
CMD_THERMAL_GET_NUM_ZONES
The driver receives a CMD_THERMAL_HOST_TRIP_REACHED MRQ message when a particular sensor crosses a trip point. This is then relayed back to the Linux thermal framework.
For more information on these thermal management features provided as part of the Tegra BSP, see Thermal Management in BPMP.
Thermal Cooling
Tegra BSP provides thermal management using fan control and throttling of various clocks in the system.
Fan Management
Tegra BSP provides active cooling in the form of fan control. The cooling device pwm-fan provides:
Fan speed control by programming the PWM controller
Ramp-up and ramp-down control to change the speed of the fan smoothly
Fan control during various power states
The PWM-RPM mapping, and the various ramp rates, are stored as part of the device tree binary. The pwm-fan cooling device maps these PWM values to a cooling state. The fan cooling device can be attached to monitor the temperature of any of the Tegra BSP sensors. As the temperature increases, the governor progressively picks a deeper cooling state for the fan. This results in a higher RPM for the fan which then results in more cooling.
Tegra thermal management uses the fan as the first line of defense to delay clock throttling until a much higher temperature.
Clock Throttling
Tegra BSP provides thermal cooling by throttling various clocks in the system. When a rising temperature crosses a trip point, clock throttling relies on the DVFS capabilities of the clocks to reduce their operating frequency, and thereby the voltage of the rail that powers the clock. This lowered frequency and voltage reduces the power consumption which then helps in controlling the temperature.
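The benefit of lowering both frequency and voltage follows from the standard first-order model of dynamic power in CMOS logic. The model is general background rather than something this guide states, and the symbols are introduced here for illustration: α is the switching activity factor, C is the switched capacitance, V is the rail voltage, and f is the clock frequency.

```latex
P_{\mathrm{dyn}} \approx \alpha \, C \, V^{2} f
```

Because a lower frequency also permits a lower voltage, and power depends on the square of the voltage, throttling reduces dynamic power faster than the frequency reduction alone would suggest.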
Because cooling is achieved by reducing the clock frequency, there is a direct impact on performance and user experience. If a device feels warm and seems sluggish, it may be due to thermal throttling of the clocks. This can be remedied by tuning the thermal zones that use the following Tegra BSP balanced cooling devices:
gpu_balanced
cpu_balanced
emergency_balanced
Each of these balanced cooling devices provides several cooling states that translate to a maximum allowable operating frequency for the CPU, GPU, and EMC clocks. These frequencies are optimized to provide the best possible performance at a given temperature. The frequency tables for these clocks are part of the device tree binary.
The governor uses the current temperature of the sensor as the input to the feedback control loop and uses the output as the new cooling state for the cooling device. As the device heats up, the governor progressively picks a higher cooling state, which results in a lower frequency cap for all the clocks and therefore more cooling. Tegra BSP performs this thermal throttling of the clocks to maintain the junction temperature of the die within the recommended safe limits.
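The feedback loop can be pictured with the simplified sketch below, in which a higher cooling state selects a lower frequency cap. It steps the state up or down by one on each poll, loosely in the spirit of a step-wise policy. It is not the governor shipped in the BSP, and the function names, the hysteresis handling, and the frequency table are hypothetical values chosen for illustration.

```python
def next_cooling_state(temp_mc, trip_mc, state, max_state, hysteresis_mc=0):
    """Return the next cooling state for one polling interval.

    Temperatures are in milli-Celsius. The state rises by one while the
    temperature is at or above the trip point and falls by one once the
    temperature drops below the trip point minus the hysteresis.
    """
    if temp_mc >= trip_mc:
        return min(state + 1, max_state)
    if temp_mc < trip_mc - hysteresis_mc:
        return max(state - 1, 0)
    return state


def frequency_cap_khz(state, caps_khz):
    """Map a cooling state to a frequency cap; deeper states cap lower."""
    return caps_khz[min(state, len(caps_khz) - 1)]


# Hypothetical cap table, indexed by cooling state (state 0 = no throttling).
EXAMPLE_CAPS_KHZ = [2000000, 1400000, 1200000, 345000]
```

Feeding successive temperature samples through next_cooling_state and looking up the result in the cap table shows the cap dropping as the zone heats past the trip point and recovering as it cools.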
Software Thermal Shutdown
The thermal zones also define a special type of trip point called a critical trip point that triggers a software shutdown. This special trip point allows the operating system to save its state and perform an orderly shutdown before a hardware reset due to high temperature. Tegra BSP defines one critical trip point per thermal zone. Users can set the lower limit for the orderly shutdown. A thermal shutdown occurs after all the other cooling strategies have failed. It is considered a rare event. The shutdown limits are as follows.
Thermal Zone | Shutdown Limit in Degrees Celsius
BCPU-therm | 101.0
MCPU-therm | 101.0
Tdiode_tegra | 107
Thermal Management in BPMP
The Tegra BSP thermal management features are part of the firmware running on the BPMP, so they are available on Tegra platforms running any host operating system (host OS) on the CPU.
Thermal Sensing
The BPMP firmware hosts the soctherm and aotag drivers for the on-die thermal sensors as follows:
Sensor Group | Thermal Sensor | ABI Name | Sensed Location
AOTAG | AOTAG | THERMAL_ZONE_AO | Near the GPU
SOC_THERM | PLLX | THERMAL_ZONE_PLLX | Between GPU and L2 RAM
SOC_THERM | AUX (x2) | THERMAL_ZONE_AUX | Within Tegra
SOC_THERM | CPU (x4) | THERMAL_ZONE_CPU | Within A57 cluster
SOC_THERM | GPU (x2) | THERMAL_ZONE_GPU | Within the GPU
SOC_THERM
SOC_THERM is a collection of on-chip ring oscillators whose frequencies vary with temperature. To convert a measured frequency to a temperature, the oscillating frequency of the sensor at a fixed temperature must be known in advance and stored in the on-chip fuses.
The soctherm driver reads these fuses during boot and calibrates the sensor. Once the calibration is complete, the temperature sensor reports the temperature in degrees Celsius with a precision of 0.5 °C.
Sensors and Sensor Groups
The temperature sensors on the chip are logically grouped into sensor groups, based on their proximity to certain hardware blocks. The sensor groups are represented as a single sensor to the host OS and the BPMP firmware.
For example, Tegra X2 has four temperature sensors in the A57 cluster. These are grouped as CPU sensors that are represented as THERMAL_ZONE_CPU to the operating system running on the CPUs. SOC_THERM reports the temperature of a given group by taking the maximum of all the sensors present in the group.
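The group reduction described above amounts to taking the maximum over the sensors in each group. The following Python sketch shows that calculation. The function name and the example readings are hypothetical, and the real reduction happens in hardware and firmware rather than in host software.

```python
def group_temperatures_mc(readings_mc):
    """Reduce per-sensor readings (milli-Celsius) to one value per group.

    readings_mc maps a group name to the list of readings from the
    individual sensors in that group. Each group reports its hottest sensor.
    """
    result = {}
    for group, values in readings_mc.items():
        if not values:
            raise ValueError(f"group {group!r} has no sensor readings")
        result[group] = max(values)
    return result
```

For example, four CPU sensors reading 61500, 63000, 62000, and 60500 milli-Celsius would be reported as a single THERMAL_ZONE_CPU temperature of 63000.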
Thermal Event Detection
Thermal sensors can report the temperature when the current temperature crosses a software programmed trip point. The sensors are capable of monitoring several of these software trip points to perform the following thermal actions:
Report when the thermal trip point has been crossed
Trigger a hardware thermal shutdown
Trigger hardware throttling
Voltage Rail Dependencies
To provide accurate temperature sensing, the sensors require a minimum voltage. Additionally, the sensors cannot operate when the rail is power-gated.
When the system is in a low-power state, the firmware provides the following modes of operation:
No temperature measurements during SC7: Because the rail powering the sensor is power-gated during SC7 state, the oscillator is not running. Therefore, the frequency-to-temperature conversion may result in inaccurate values. To ensure no spurious temperature reports from the sensors, stop the sensor before entering SC7 state.
The firmware provides the AOTAG sensor for measuring the temperature SC7 state. When SC7 state is exited, the sensors are restarted.
Fallback to PLLX sensor on Tegra X2: To ensure accurate temperature readings during minimum voltage, use the PLLX oscillator. On platforms where the minimum voltage is not guaranteed, the firmware falls back on the PLLX oscillator with a programmable offset. The result is that all the sensors invalidate their oscillators and use the PLLX oscillator with the added offset. This fallback on the PLLX oscillator allows for continuous temperature measurement, even at lower voltage levels.
The programmable offset compensates for the fact that the PLLX oscillator is farther from the monitored hardware block than the oscillator it replaces. The host OS continues to use all the thermal zones without side effects. The offset ensures that the CPU sensor reports a more accurate temperature than the unadjusted PLLX sensor. The host OS must therefore continue to use the appropriate sensors for measuring CPU temperatures.
AOTAG
The Always-On Thermal Alert Generator (AOTAG) is a ring oscillator based temperature sensor. It is present in the always-on power domain and can monitor temperatures even when the device is in SC7 state. Other than this distinction, the AOTAG sensor operation is the same as any of the SOC_THERM sensors.
Thermal Event Detection
Like the SOC_THERM sensors, the AOTAG sensor can generate interrupts. In addition, it can monitor two software-programmed levels, which the BSP uses as:
Thermal zone trip points
Hardware thermal shutdown
Fallback for GPU for Tegra X2
The AOTAG sensor is better suited than the GPU sensor in SOC_THERM for monitoring GPU temperatures because of:
The proximity of the AOTAG sensor to the GPU
The ability of AOTAG to measure temperatures at all voltage levels, including in SC7 state
BPMP Thermal Framework
The BPMP firmware hosts a thermal framework that:
Registers thermal sensors as thermal zones, as identified in Thermal Sensing.
Allows BPMP modules to register trip points on the thermal zones.
Allows the host OS to register trip points using thermal MRQ messages.
Provides trip point management and reporting.
The thermal framework maintains a list of trip points per sensor that includes the current trip point from the host OS, and various BPMP modules. As the temperatures change, the framework examines the list of current trip points and notifies the owner of the trip of the temperature change. The notification is sent using a callback for the BPMP owned trips and the thermal MRQ command CMD_THERMAL_HOST_TRIP_REACHED for trips that are owned by the host OS.
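The owner-based dispatch described above can be modeled with a short Python sketch. This is an illustration under stated assumptions, not the BPMP implementation: the `Trip` class, the `"bpmp"`/`"host"` owner labels, the `send_host_mrq` hook, and the choice to notify only on upward crossings are all hypothetical. Only the command name CMD_THERMAL_HOST_TRIP_REACHED comes from the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Trip:
    temp_c: float
    owner: str  # "bpmp" or "host" (hypothetical labels)
    callback: Optional[Callable[[float], None]] = None  # BPMP-owned trips only


def notify_owners(trips: List[Trip], prev_c: float, now_c: float,
                  send_host_mrq: Callable[[str, float], None]) -> None:
    """Notify the owner of every trip crossed upward between two samples."""
    for trip in sorted(trips, key=lambda t: t.temp_c):
        if prev_c < trip.temp_c <= now_c:
            if trip.owner == "host":
                # Host-owned trips are reported through the thermal MRQ.
                send_host_mrq("CMD_THERMAL_HOST_TRIP_REACHED", trip.temp_c)
            elif trip.callback is not None:
                # BPMP-owned trips are reported through a callback.
                trip.callback(trip.temp_c)


events = []
trips = [
    Trip(60.0, "bpmp", callback=lambda t: events.append(("callback", t))),
    Trip(65.0, "host"),
]
notify_owners(trips, prev_c=58.0, now_c=66.0,
              send_host_mrq=lambda cmd, t: events.append((cmd, t)))
print(events)
# [('callback', 60.0), ('CMD_THERMAL_HOST_TRIP_REACHED', 65.0)]
```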
The primary thermal MRQ requests handled by the framework are:
CMD_THERMAL_QUERY_ABI
CMD_THERMAL_GET_TEMP
CMD_THERMAL_SET_TRIP
CMD_THERMAL_GET_NUM_ZONES
For details on these MRQ requests, see the API documentation.
Because a given sensor can have several trip points, the thermal framework must ensure that a notification is generated whenever any of them is crossed. For example, if THERMAL_ZONE_CPU has trip points at 55C, 60C, 65C, and 70C, the thermal framework sends one notification each time the temperature crosses 55C, 60C, 65C, or 70C.
Additionally, the framework implements hysteresis to prevent sending too many notifications. So, for the above example, the framework:
Sends one notification when the temperature reaches 55C
Waits until the temperature drops below 54C
Sends another notification when the temperature rises back to 55C
To achieve the above notifications, the thermal framework sets low trip points on the sensors to receive events that the temperature has dropped below the limit.
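The hysteresis behavior in the 55C example can be sketched as a small state machine in Python. The class name is hypothetical, and the 1-degree hysteresis is an assumption inferred from the 55C/54C pair in the example. The sketch models the behavior only and does not reflect how the firmware is written.

```python
class HysteresisTrip:
    """One trip point that re-arms only after the temperature falls
    below a low trip point (trip temperature minus hysteresis)."""

    def __init__(self, trip_c, hysteresis_c=1.0):
        self.trip_c = trip_c
        self.low_trip_c = trip_c - hysteresis_c  # 54C for a 55C trip
        self.armed = True

    def update(self, temp_c):
        """Return True if this sample should generate a notification."""
        if self.armed and temp_c >= self.trip_c:
            self.armed = False  # suppress repeats while the zone stays hot
            return True
        if not self.armed and temp_c < self.low_trip_c:
            self.armed = True   # temperature dropped below the low trip
        return False


trip = HysteresisTrip(55.0)
samples = [53.0, 55.0, 55.5, 54.5, 55.0, 53.5, 55.0]
print([trip.update(t) for t in samples])
# [False, True, False, False, False, False, True]
```

The dip to 54.5C does not re-arm the trip, so the return to 55.0C is silent. Only after the 53.5C sample does the next 55.0C reading notify again.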
Hardware Throttling
Each element in a power delivery system has limitations, such as:
A battery can supply only a certain amount of current without shutting down.
A regulator can provide only a certain amount of current before it fails to maintain its output voltage.
An inductor in a switching regulator may overheat if the ripple current is too large.
These limitations can result in fast transient electrical and thermal events such as:
Over-current at the battery
Voltage droop at the PMIC
Temperature spikes
The firmware refers to these as OC alarms and triggers hardware throttling of the clocks to handle these events.
Impact
Similar to software throttling, hardware throttling may cause lower performance. However, since these events are rare and transient in nature, the user experience is minimally impacted.
The host OS is not notified of these events, but it can detect the drop in the clocks using performance measurement tools that sample the CPU cycle counters. While thermal management in the host OS works to keep the temperature under control, hardware throttling clamps down the clocks to handle these events.
Throttle Points and Vector Configuration
The BPMP device tree binary holds the various throttle points and the throttle settings that govern when and how the throttling is performed. The soctherm driver in the firmware programs the hardware and handles any resulting interrupts due to these events. The throttle points can be modified by changing the BPMP device tree.
The throttle temperatures are as follows:
Thermal Zone        Hardware Throttle Limit in Degrees Celsius    Hardware Throttling
THERMAL_ZONE_CPU    97.5                                          Heavy
THERMAL_ZONE_AUX    98.5                                          Heavy
The hardware throttling levels are as follows:
Hardware Throttling    Clock Throttled Percentage
Heavy                  87.5
Medium                 75
Lite                   50
Throttle vectors are optimized for limiting peak current consumption while maximizing performance. To manage peak current consumption, the firmware supports capping the CPU and GPU clocks at three levels (lite, medium, and heavy), as described in the device tree bindings. This capping prevents the CPU and GPU from drawing more current than their voltage regulators can supply.
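As a worked example of the throttling levels, the following Python sketch computes the effective clock for each level. It assumes that "Clock Throttled Percentage" is the share of the clock that is removed, so that heavy throttling leaves 12.5% of the nominal rate. The source does not state this explicitly. The function name and the 2,000,000 kHz nominal clock are hypothetical.

```python
# Percentages from the hardware throttling levels table.
THROTTLE_PERCENT = {"heavy": 87.5, "medium": 75.0, "lite": 50.0}


def throttled_clock_khz(nominal_khz, level):
    """Return the effective clock after throttling.

    ASSUMPTION: the percentage is the portion of the clock removed.
    """
    removed = THROTTLE_PERCENT[level] / 100.0
    return nominal_khz * (1.0 - removed)


for level in ("lite", "medium", "heavy"):
    print(level, throttled_clock_khz(2_000_000, level))
# lite 1000000.0
# medium 500000.0
# heavy 250000.0
```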
Design Considerations
Designing fail-safe measures into the Power Management Integrated Circuit (PMIC) or the battery controller that shut down the device when these events occur results in a bad user experience. Similarly, designing power delivery hardware for worst-case loads results in large and costly components. Consequently, Tegra systems are designed for power delivery systems that are adequate for common loads. Additionally, Tegra systems actively manage the components to avoid exceeding the design limits. When these events are transient in nature, the case for this active management becomes more compelling.
Hardware Thermal Shutdown
The final fail-safe that the firmware provides is a hardware thermal reset, or thermtrip. If software and hardware throttling cannot control the heat generation in the system and the software becomes unresponsive, the Tegra system asserts the reset pin on the PMIC as the hardware shutdown mechanism.
The following are the thermtrip temperatures in Tegra X2:
Thermal Zone        Shutdown Limit in Degrees Celsius
THERMAL_ZONE_CPU    101.5
THERMAL_ZONE_AUX    101.5
THERMAL_ZONE_AO     101.5
The following are thermtrip temperatures for Tegra X2 systems:
Thermal Zone        Shutdown Limit in Degrees Celsius
THERMAL_ZONE_CPU    100.0
THERMAL_ZONE_AUX    100.0
THERMAL_ZONE_AO     100.5
Software-based Power Consumption Modeling
The Jetson TX2 module has 3-channel INA3221 power monitors at I2C addresses 0x40 and 0x41.
The information from the INA3221 power monitors can be read using sysfs nodes. The naming convention for sysfs nodes is as follows:
Node                   Description
rail_name_<N>          Exports the rail name.
in_current<N>_input    Exports rail current in mA.
in_voltage<N>_input    Exports rail voltage in mV.
in_power<N>_input      Exports rail power in mW.
Where <N> is the channel number, 0 to 2.
 
Note:
The INA driver may also present other nodes. Do not modify any INA sysfs node value. Modifying these values can result in damage to your device.
For the Jetson TX2 module power monitors at I2C addresses 0x40 and 0x41, the sysfs nodes for reading rail name, voltage, current, and power are at:
/sys/bus/i2c/drivers/ina3221x/0-0040/iio:device0
/sys/bus/i2c/drivers/ina3221x/0-0041/iio:device1
The rail names for I2C address 0x40 are:
Rail Name                  Description
Channel 0: VDD_SYS_GPU     GPU power rail.
Channel 1: VDD_SYS_SOC     SOC power rail.
Channel 2: VDD_4V0_WIFI    WIFI power rail.
 
Note:
On Jetson TX2i, channel 2 is NOT connected to any rail.
 
The rail names for I2C address 0x41 are:
Rail Name                 Description
Channel 0: VDD_IN         Main module power input.
Channel 1: VDD_SYS_CPU    CPU power rail.
Channel 2: VDD_SYS_DDR    DDR power rail.
The Jetson TX2 Developer Kit carrier board has 3-channel INA3221 power monitors at I2C addresses 0x42 and 0x43. The sysfs nodes to read rail name, voltage, current and power are at:
/sys/bus/i2c/drivers/ina3221x/0-0042/iio:device2
/sys/bus/i2c/drivers/ina3221x/0-0043/iio:device3
The rail names for I2C address 0x42 are:
Rail Name                   Description
Channel 0: VDD_MUX          Carrier board power input.
Channel 1: VDD_5V_IO_SYS    Carrier board 5 V supply.
Channel 2: VDD_3V3_SYS      Carrier board 3.3 V supply.
The rail names for I2C address 0x43 are:
Rail Name                                               Description
Channel 0: VDD_3V3_IO_SLP                               Carrier board 3.3 V sleep supply.
Channel 1: VDD_1V8_IO (Name on schematic is VDD_1V8)    Carrier board 1.8 V supply.
Channel 2: VDD_3V3_SYS_M2                               3.3 V supply for M.2 Key E connector.
Examples
To read the channel 0 rail name (i.e., VDD_IN) of the INA3221 at 0x41, execute the command:
cat /sys/bus/i2c/drivers/ina3221x/0-0041/iio:device1/rail_name_0
To read VDD_IN current, voltage, and power, execute the commands:
cat /sys/bus/i2c/drivers/ina3221x/0-0041/iio:device1/in_current0_input
cat /sys/bus/i2c/drivers/ina3221x/0-0041/iio:device1/in_voltage0_input
cat /sys/bus/i2c/drivers/ina3221x/0-0041/iio:device1/in_power0_input
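The same reads can be scripted. The following Python sketch collects all four nodes for one channel. The function name and the returned dictionary keys are hypothetical, and the sketch assumes each numeric node contains a plain integer as text. It only reads nodes and never writes to them.

```python
from pathlib import Path


def read_ina3221_channel(device_dir, channel):
    """Read rail name, current (mA), voltage (mV), and power (mW)
    for one INA3221 channel from its sysfs directory."""
    if channel not in (0, 1, 2):
        raise ValueError("INA3221 channels are numbered 0-2")
    base = Path(device_dir)

    def node(name):
        return (base / name).read_text().strip()

    return {
        "rail": node(f"rail_name_{channel}"),
        # ASSUMPTION: numeric nodes hold integer text.
        "current_ma": int(node(f"in_current{channel}_input")),
        "voltage_mv": int(node(f"in_voltage{channel}_input")),
        "power_mw": int(node(f"in_power{channel}_input")),
    }


if __name__ == "__main__":
    device = "/sys/bus/i2c/drivers/ina3221x/0-0041/iio:device1"
    if Path(device).is_dir():  # only present on the target device
        print(read_ina3221_channel(device, 0))
```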
 
Note:
In terms of accuracy, assume a 5% guard band for INA measurements greater than 200 mW. Below that, accuracy can deviate by as much as 15%.
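The guard band can be applied to a reading as in the following Python sketch. It treats the guard band as a symmetric plus-or-minus tolerance and applies the 15% figure to readings of exactly 200 mW. Both choices are assumptions, and the function name is hypothetical.

```python
def power_bounds_mw(measured_mw):
    """Return (low, high) bounds in mW for an INA power reading.

    ASSUMPTION: the guard band is a symmetric +/- tolerance, and a
    reading of exactly 200 mW takes the wider 15% band.
    """
    percent = 5 if measured_mw > 200 else 15
    delta = measured_mw * percent / 100
    return measured_mw - delta, measured_mw + delta


print(power_bounds_mw(1000))  # (950.0, 1050.0)
print(power_bounds_mw(100))   # (85.0, 115.0)
```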
Related Tools and Techniques
This section describes the tools and techniques to manage power.
3D Frequency Scaling
3D frequency scaling is enabled by default.
To disable 3D frequency scaling
Run the following command:
echo 0 > /sys/devices/17000000.gp10b/enable_3d_scaling
To enable 3D frequency scaling
Run the following command:
echo 1 > /sys/devices/17000000.gp10b/enable_3d_scaling
Setting Frequencies
Use the following procedures to set frequencies and report current frequency settings.
To get system clock information
Run the following command:
cat /sys/kernel/debug/clk/clk_tree
To print the CPU lower boundary, upper boundary, and current frequency
Run the following commands:
cat /sys/devices/system/cpu/cpuX/cpufreq/cpuinfo_min_freq
cat /sys/devices/system/cpu/cpuX/cpufreq/cpuinfo_max_freq
cat /sys/devices/system/cpu/cpuX/cpufreq/cpuinfo_cur_freq
To change the CPU upper boundary
Run the following command:
echo <cpu_freq> > /sys/devices/system/cpu/cpuX/cpufreq/scaling_max_freq
To change the CPU lower boundary
Run the following command:
echo <cpu_freq> > /sys/devices/system/cpu/cpuX/cpufreq/scaling_min_freq
To set the static CPU frequency
Run the following commands:
echo <cpu_freq> > /sys/devices/system/cpu/cpuX/cpufreq/scaling_min_freq
echo <cpu_freq> > /sys/devices/system/cpu/cpuX/cpufreq/scaling_max_freq
Where:
<cpu_freq> is the frequency value available at:
/sys/devices/system/cpu/cpuX/cpufreq/scaling_available_frequencies
<X> is the CPU core number.
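The static-frequency procedure can be scripted as in the following Python sketch. The function name and the `root` parameter are hypothetical. Unlike the fixed order shown above, the sketch chooses the write order so that the minimum never exceeds the maximum at any step, because cpufreq can reject such a setting. It must run as root on the target.

```python
from pathlib import Path

CPU_ROOT = "/sys/devices/system/cpu"


def set_static_cpu_freq(cpu, freq_khz, root=CPU_ROOT):
    """Pin one CPU core to a single frequency (kHz) that is listed in
    scaling_available_frequencies."""
    cpufreq = Path(root) / f"cpu{cpu}" / "cpufreq"
    available = {
        int(f)
        for f in (cpufreq / "scaling_available_frequencies").read_text().split()
    }
    if freq_khz not in available:
        raise ValueError(f"{freq_khz} is not one of {sorted(available)}")

    current_max = int((cpufreq / "scaling_max_freq").read_text())
    # Keep min <= max at every step: raise the ceiling first when moving
    # up, set the floor first otherwise.
    order = ["scaling_max_freq", "scaling_min_freq"]
    if freq_khz <= current_max:
        order.reverse()
    for name in order:
        (cpufreq / name).write_text(str(freq_khz))
```

Calling the function once per core number pins every core. The `root` parameter exists only so that the sketch can be exercised against a scratch directory.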
To print the GPU lower boundary, upper boundary, and current frequency
Run the following commands:
cat /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/min_freq
cat /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/max_freq
cat /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/cur_freq
To change the GPU upper boundary
Run the following command:
echo <gpu_freq> > /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/max_freq
To change the GPU lower boundary
Run the following command:
echo <gpu_freq> > /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/min_freq
To set the static GPU frequency
Run the following commands:
echo <gpu_freq> > /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/min_freq
echo <gpu_freq> > /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/max_freq
Where <gpu_freq> is the value available in:
/sys/devices/17000000.gp10b/devfreq/17000000.gp10b/available_frequencies
To print the EMC lower boundary, upper boundary, and current frequency
Run the following commands:
cat /sys/kernel/debug/bpmp/debug/clk/emc/min_rate
cat /sys/kernel/debug/bpmp/debug/clk/emc/max_rate
cat /sys/kernel/debug/bpmp/debug/clk/emc/rate
To change the EMC upper boundary
Run the following command:
echo <emc_freq> > /sys/kernel/debug/bpmp/debug/clk/emc/max_rate
To change the EMC lower boundary
Run the following command:
echo <emc_freq> > /sys/kernel/debug/bpmp/debug/clk/emc/min_rate
To set static EMC frequency
Run the following commands:
echo 1 > /sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/emc/state
echo <emc_freq> > /sys/kernel/debug/bpmp/debug/clk/emc/rate
Where <emc_freq> is a frequency value between the EMC min_rate and max_rate.
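The three writes can be wrapped with a range check, as in the following Python sketch. The function name and the `emc_dir` parameter are hypothetical, and the sketch assumes the rate nodes hold plain integer text in whatever unit the debugfs nodes use. It requires root access and a mounted debugfs on the target.

```python
from pathlib import Path

EMC_DIR = "/sys/kernel/debug/bpmp/debug/clk/emc"


def set_static_emc_rate(rate, emc_dir=EMC_DIR):
    """Lock the EMC clock to a fixed rate between min_rate and max_rate."""
    emc = Path(emc_dir)
    low = int((emc / "min_rate").read_text())
    high = int((emc / "max_rate").read_text())
    if not low <= rate <= high:
        raise ValueError(f"rate {rate} is outside [{low}, {high}]")
    # Same order as the procedure above: lock, enable, then set the rate.
    for name, value in (("mrq_rate_locked", 1), ("state", 1), ("rate", rate)):
        (emc / name).write_text(str(value))
```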
Maximizing Jetson TX2 Performance
Tegra BSP provides the jetson_clocks.sh script to maximize Jetson TX2 performance by statically setting the CPU, GPU, and EMC clocks to their maximum frequencies. The script can also show the current clock settings, store the current clock settings in a file, and restore clock settings from a file. The jetson_clocks.sh script is available at:
$HOME/jetson_clocks.sh
Basic usage is as follows:
jetson_clocks.sh [options]
Options             Description
--show              Displays the current settings.
--store [file]      Stores the current settings to a file. The default file is l4t_dfs.conf.
--restore [file]    Restores the saved settings from the file. The default file is l4t_dfs.conf.
To show the current settings
Execute the command:
sudo ${HOME}/jetson_clocks.sh --show
To store the current settings
Execute the command:
sudo ${HOME}/jetson_clocks.sh --store
To maximize Jetson TX2 performance
Execute the command:
sudo ${HOME}/jetson_clocks.sh
To restore the previous settings
Execute the command:
sudo ${HOME}/jetson_clocks.sh --restore
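The store, maximize, and restore steps are often used together around a benchmark run. The following Python sketch wraps them in a context manager. The `max_performance` name and the injectable `run` parameter are hypothetical, and the sketch relies only on the script options listed above.

```python
import os
import subprocess
from contextlib import contextmanager

SCRIPT = os.path.expanduser("~/jetson_clocks.sh")


@contextmanager
def max_performance(conf_file, run=subprocess.check_call):
    """Store the clock settings, maximize the clocks, and restore the
    stored settings on exit, even if the body raises."""
    run(["sudo", SCRIPT, "--store", conf_file])
    run(["sudo", SCRIPT])
    try:
        yield
    finally:
        run(["sudo", SCRIPT, "--restore", conf_file])
```

A benchmark placed inside `with max_performance("l4t_dfs.conf"):` then runs with maximized clocks, and the stored settings are restored afterward.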
Using CPU Hot Plug
Manage CPU hot plug with the following procedures.
To manually turn on/off slave CPUs
1. Run the following command to turn on the slave CPU:
echo 1 > /sys/devices/system/cpu/cpuX/online
2. Run the following command to turn the slave CPU off:
echo 0 > /sys/devices/system/cpu/cpuX/online
To check CPU state
Run the following command:
cat /sys/devices/system/cpu/cpuX/online
Where <X> is the CPU core number.
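The hot plug nodes can also be driven from a script. The following Python sketch is a minimal example with hypothetical function names and a hypothetical `root` parameter. It assumes the `online` node reads back as 0 or 1, and it must run as root on the target.

```python
from pathlib import Path

CPU_ROOT = "/sys/devices/system/cpu"


def set_cpu_online(cpu, online, root=CPU_ROOT):
    """Write 1 or 0 to the cpuX/online node."""
    (Path(root) / f"cpu{cpu}" / "online").write_text("1" if online else "0")


def is_cpu_online(cpu, root=CPU_ROOT):
    """Return True if the cpuX/online node reads back as 1."""
    return (Path(root) / f"cpu{cpu}" / "online").read_text().strip() == "1"
```

For example, `set_cpu_online(3, False)` followed by `is_cpu_online(3)` takes core 3 offline and then confirms its state.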