NVIDIA Tegra
DRIVE 5.0 Linux Open Source Software

Development Guide
5.0.10.3 Release


 
Thermal Management
 
On-Die Thermal Sensors
Determining Sensor Calibration
Viewing On-Die Thermal Information
Off-Die Thermal Sensors
Tj-therm Thermal Sensor
Thermal Shutdown
AOTAG Sensor
External Thermal Sensor
PLLX Thermal Sensor
Thermal Protection During Boot
User Space Thermal Alert
The target platform has various thermal sensors that sense the temperatures of different regions of the chip. These sensors are used for thermal management.
On-Die Thermal Sensors
On-die thermal sensors are NVIDIA® Tegra® internal thermal sensors.
Tegra Internal Thermal Sensors
Sensor | Description
CPU (BCPU) thermal sensor | Sensor for the A57 CPU group temperature. This sensor is owned by the Tegra SOCTHERM HW block.
AUX (Denver) (MCPU) thermal sensor | Sensor for the Denver CPU group temperature. This sensor is owned by the Tegra SOCTHERM HW block.
PLLX thermal sensor | Fallback sensor. When the temperature readings of other sensors are invalid, temperature readings from this sensor are used. This sensor is proximate to the Tegra CPUs. This sensor is owned by the Tegra SOCTHERM HW block.
GPU thermal sensor | Sensor for the internal GPU temperature. This sensor is owned by the Tegra SOCTHERM HW block.
AOTAG (Always-On Thermal Alarm Generator) thermal sensor | Sensor used for measuring Tegra die temperature and Tegra-internal GPU temperature. This sensor is owned by the Tegra Power Management Controller (PMC) HW block.
Determining Sensor Calibration
Before the thermal sensors on the chip are usable, they must be calibrated. The thermal sensors on many Tegra A01 version chips are not calibrated. On those HW platforms, the Tegra internal thermal sensors are not functional until they are calibrated.
To determine whether the Tegra internal thermal sensors are calibrated
On the target, enter:
cat /sys/kernel/debug/bpmp/debug/fuse/tsensor/calib_ok
 
This command produces one of the following results:
0: The sensors are uncalibrated. The on-chip temperature sensors do not report meaningful values, and actions tied to these sensors do not take effect.
1: The sensors are calibrated and functional.
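The check above can be wrapped for use in scripts. The following is a minimal sketch, not part of the release: the function name tsensor_calibrated and its return codes are hypothetical, and the optional path argument exists only so the logic can be exercised off-target. It assumes the calib_ok node holds exactly 0 or 1, as described above, and that the caller has permission to read debugfs.

```shell
# Default debugfs node named in this guide; pass a path to override it.
CALIB_NODE=/sys/kernel/debug/bpmp/debug/fuse/tsensor/calib_ok

# Print "calibrated" (return 0) or "uncalibrated" (return 1).
# Return 2 if the node cannot be read or holds an unexpected value.
tsensor_calibrated() {
    node="${1:-$CALIB_NODE}"
    if [ ! -r "$node" ]; then
        echo "cannot read $node" >&2
        return 2
    fi
    value=$(cat "$node")
    case "$value" in
        1) echo "calibrated"; return 0 ;;
        0) echo "uncalibrated"; return 1 ;;
        *) echo "unexpected value: $value" >&2; return 2 ;;
    esac
}
```

A script could call tsensor_calibrated before relying on any on-die temperature reading and skip thermal logic when it returns a non-zero status.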
Viewing On-Die Thermal Information
To view all the thermal zones present in the system
Enter:
cat /sys/class/thermal/thermal_zone*/type
For example:
cat /sys/class/thermal/thermal_zone0/type
BCPU-therm
To view temperatures for all the thermal zones in the system
Enter:
cat /sys/class/thermal/thermal_zone*/temp
To view a consolidated list
Enter:
cat /sys/class/thermal/thermal_zone*/t[ye][pm][ep]
 
root@tegra-ubuntu:~# cat /sys/class/thermal/thermal_zone*/t[ye][pm][ep]
40000
BCPU-therm
40000
MCPU-therm
46000
GPU-therm
40000
PLL-therm
40000
AO-therm
42250
Tdiode_tegra
 
The values shown by this sysfs interface are in millidegrees Celsius.
Note:
Because this sysfs interface is generic for all thermal sensors, the output above also lists external (off-die) thermal sensor information if an external sensor is enabled on your platform.
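The consolidated listing above prints temperatures and types on alternating lines. As a minimal sketch, the hypothetical helper below (list_thermal_zones is an illustrative name, not a tool shipped with the release) prints one line per zone with its type and its temperature converted to degrees Celsius. It relies only on the type and temp nodes shown above; the optional argument that overrides the sysfs root is an assumption added for off-target testing.

```shell
# Print "zone type temperature_in_degC" for every thermal zone under a root
# directory (default: /sys/class/thermal).
list_thermal_zones() {
    root="${1:-/sys/class/thermal}"
    for zone in "$root"/thermal_zone*; do
        [ -r "$zone/type" ] && [ -r "$zone/temp" ] || continue
        ztype=$(cat "$zone/type")
        # Skip zones whose temperature cannot be read right now.
        milli=$(cat "$zone/temp" 2>/dev/null) || continue
        sign=""
        if [ "$milli" -lt 0 ]; then
            sign="-"
            milli=$((-milli))
        fi
        # Integer math only: split millidegrees into whole and fractional parts.
        printf '%s %s %s%d.%03d\n' "${zone##*/}" "$ztype" "$sign" \
            "$((milli / 1000))" "$((milli % 1000))"
    done
}
```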
To view a particular zone's temperature
1. Find the number of the zone whose type matches the sensor of interest.
2. Enter:
cat /sys/class/thermal/thermal_zone<id>/type
cat /sys/class/thermal/thermal_zone<id>/temp
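The two steps above can be combined into a lookup by type name. This is a sketch under stated assumptions: find_thermal_zone is a hypothetical helper name, and its second argument (an alternative sysfs root) is included only so the function can be tried off-target.

```shell
# Print the directory of the first thermal zone whose type equals $1.
# $2 optionally overrides the sysfs root (default: /sys/class/thermal).
find_thermal_zone() {
    wanted="$1"
    root="${2:-/sys/class/thermal}"
    for zone in "$root"/thermal_zone*; do
        [ -r "$zone/type" ] || continue
        if [ "$(cat "$zone/type")" = "$wanted" ]; then
            echo "$zone"
            return 0
        fi
    done
    echo "no thermal zone of type $wanted" >&2
    return 1
}
```

For example, cat "$(find_thermal_zone GPU-therm)/temp" would read the GPU zone temperature without hard-coding the zone number.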
Off-Die Thermal Sensors
An off-die thermal sensor is also called an external thermal sensor. For more information, see External Thermal Sensor.
Tj-therm Thermal Sensor
Tj-therm is a virtual temperature sensor, not a physical one, that reports the Tegra die temperature. The Tj-therm driver reads the temperature from the different on-die temperature sensors, applies the necessary offset corrections, and returns the maximum hotspot temperature among them.
This virtual sensor provides a single temperature value for use by customer applications, abstracting away the complexity of multiple on-die sensors. Because the sensor is a polled aggregation of multiple on-die sensors, its reading can lag by roughly one polling period.
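The aggregation described above amounts to taking a maximum over offset-corrected readings. The sketch below only illustrates that idea; it is not the Tj-therm driver, which runs in the kernel. The function name max_corrected_temp and the per-sensor offsets supplied as input are hypothetical, since this guide does not list the actual offset values.

```shell
# Read "temperature offset" pairs (millidegrees Celsius) from stdin, one
# sensor per line, and print the highest offset-corrected temperature.
# A missing offset is treated as 0.
max_corrected_temp() {
    max=""
    while read -r temp offset; do
        [ -n "$temp" ] || continue
        corrected=$((temp + ${offset:-0}))
        if [ -z "$max" ] || [ "$corrected" -gt "$max" ]; then
            max=$corrected
        fi
    done
    [ -n "$max" ] && echo "$max"
}
```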
The default polling period is 500 ms. You can change it from user space with the following command:
root@tegra-ubuntu:~# echo x > /sys/class/thermal/thermal_zone<y>/polling_delay
where x is the new polling period in milliseconds and y is the thermal zone number.
The thermal zone number can be identified using the following procedure:
1. Identify the zone that corresponds to Tj-therm by checking the type of each thermal zone:
root@tegra-ubuntu:~# cat /sys/class/thermal/thermal_zone*/type
BCPU-therm
MCPU-therm
GPU-therm
PLL-therm
AO-therm
Tj-therm
2. Note the number of the zone whose type is Tj-therm and use it as <id>. For example, to read that zone's temperature:
root@tegra-ubuntu:~# cat /sys/class/thermal/thermal_zone<id>/temp
For example, to set a polling delay of 200 ms, assuming that thermal_zone5 is Tj-therm:
root@tegra-ubuntu:~# echo 200 > /sys/class/thermal/thermal_zone5/polling_delay
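When the polling period is changed from a script, it can help to validate the value and read it back. The following is a minimal sketch: set_polling_delay is a hypothetical helper name, and it assumes the polling_delay node described above is present and writable in the given zone directory (root privileges on the target).

```shell
# Write a new polling period in milliseconds ($2) to the polling_delay node
# of a thermal zone directory ($1), then print the value read back.
set_polling_delay() {
    zone_dir="$1"
    delay_ms="$2"
    case "$delay_ms" in
        ''|*[!0-9]*)
            echo "delay must be a non-negative integer" >&2
            return 1 ;;
    esac
    node="$zone_dir/polling_delay"
    if [ ! -w "$node" ]; then
        echo "cannot write $node" >&2
        return 1
    fi
    echo "$delay_ms" > "$node" && cat "$node"
}
```

With thermal_zone5 as the Tj-therm zone, set_polling_delay /sys/class/thermal/thermal_zone5 200 would be expected to print 200.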
For Tj-therm kernel documentation and device tree bindings, see:
Documentation/thermal/nvidia,tegra-tj-thermal.txt
Documentation/devicetree/bindings/thermal/nvidia/tegra-tj-thermal.txt
Thermal Shutdown
Note:
DRIVE PX 2 P2379 uses only the external thermal sensor to trigger shutdown.
When the Tegra temperature exceeds the thermal shutdown limit, an immediate shutdown is triggered. The following thermal sensors are configured to trigger thermal shutdown:
Always-On Thermal Alarm Generator (AOTAG) sensor
External thermal sensor
PLLX thermal sensor
AOTAG Sensor
AOTAG is armed with a thermal shutdown limit. If the AOTAG temperature exceeds the shutdown threshold, AOTAG triggers an immediate thermal shutdown.
AOTAG triggers thermal shutdown by asserting a THERMAL_SHUTDOWN_TEGRA signal to the Power Management IC (off-chip PMIC). THERMAL_SHUTDOWN_TEGRA is an active-low signal that is connected from the Tegra to the EN0 signal of the PMIC chip.
AOTAG shutdown is configured in two stages of software: once in the bootloader (MB1 stage) and again in the BPMP firmware init stage.
For configuration in MB1 stage, refer to Thermal Protection During Boot.
For configuration in BPMP firmware device tree (DT), refer to the following DT node and property:
aotag {
    thermtrip = <xxxxxx>; /* in millidegrees Celsius */
};
 
The thermtrip DT property above defines the shutdown limit value.
Refer to this DT property in the BPMP Firmware DT of your platform to find the AOTAG thermal shutdown value.
External Thermal Sensor
For the thermal shutdown settings of this sensor, refer to External Thermal Sensor->Shutdown Limits.
PLLX Thermal Sensor
Tegra shutdown for a CPU hotspot is enabled via the SOCTHERM PLLX thermal sensor. A thermal shutdown limit value is set for this sensor. If the CPU temperature exceeds the shutdown limit, SOCTHERM triggers thermal shutdown by asserting a THERMAL_SHUTDOWN_TEGRA signal to the Power Management IC (off-chip PMIC).
PLLX thermal sensor shutdown is enabled in BPMP firmware init. In the BPMP firmware DT, the following DT node and property are used to configure PLLX thermal shutdown:
soctherm {
    thermtrip {
        thermtrip = <THERMAL_ZONE_PLLX xxxxxx>; /* in millidegrees Celsius */
    };
};
The thermtrip DT property above defines the shutdown limit value.
Refer to this DT property in the BPMP firmware DT of your platform to find the PLLX thermal sensor shutdown value.
Thermal Protection During Boot
Tegra uses the external thermal sensor, the Always-On Thermal Alarm Generator (AOTAG) mechanism, or both to protect itself during boot.
Via External Thermal Sensor
At power-on, the external thermal sensor is configured with a thermal shutdown limit of 108 degrees Celsius. This sensor senses the Tegra die temperature. During early boot, the shutdown threshold is changed to the qualified value. When the Tegra temperature reaches the thermal shutdown limit, an immediate shutdown is triggered.
For more information, see:
External Thermal Sensor->Shutdown Limits
Via AOTAG
Note:
This section is not applicable for DRIVE PX 2 P2379.
AOTAG is armed in the early boot (MB1 bootloader) with the following settings:
Cool-down threshold value.
Shutdown threshold value.
Cool-down timeout value.
If the AOTAG temperature reaches or exceeds the cool-down threshold, boot is paused and one of the following occurs:
If, within the cool-down timeout period, the temperature drops below the cool-down threshold, boot continues.
If the cool-down timeout period expires and the temperature is still above the cool-down threshold, AOTAG triggers thermal shutdown.
At any point in time after boot, when the Tegra die temperature exceeds the AOTAG shutdown threshold, AOTAG triggers thermal shutdown.
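The cool-down behavior during boot can be modeled as a small decision loop. The sketch below is an illustration only and is not the MB1 implementation: the function name boot_thermal_decision, the fixed sampling interval, and the "undecided" result are assumptions introduced here. It covers the cool-down path only, not the separate shutdown threshold.

```shell
# Decide the boot outcome from temperature samples read on stdin
# (millidegrees Celsius, one per line, taken every interval).
# Arguments: cool-down threshold, cool-down timeout (ms), sample interval (ms).
# Prints "continue" or "shutdown".
boot_thermal_decision() {
    threshold="$1"
    timeout_ms="$2"
    interval_ms="$3"
    elapsed=0
    while read -r temp; do
        # Boot proceeds as soon as a sample is below the cool-down threshold.
        if [ "$temp" -lt "$threshold" ]; then
            echo "continue"
            return 0
        fi
        # Still at or above the threshold when the timeout has expired.
        if [ "$elapsed" -ge "$timeout_ms" ]; then
            echo "shutdown"
            return 0
        fi
        elapsed=$((elapsed + interval_ms))
    done
    # Samples ran out before either outcome was reached.
    echo "undecided"
    return 1
}
```

With hypothetical values, a threshold of 90000, a timeout of 3000 ms, and samples of 95000, 92000, and 89000 taken 1000 ms apart lead to "continue", because the third sample falls below the threshold before the timeout expires.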
MB1 AOTAG settings are provided via the MB1 BCT config file:
<TOP>drive-t186ref-foundation/platform-support/bct/misc/tegra186-mb1-bct-misc-<*>.cfg
 
The following variables are used in the above configuration file:
##### aotag variables #####
aotag.boot_temp_threshold = xxxxxx; # Shutdown threshold in Milli degC
aotag.cooldown_temp_threshold = xxxxx; # In Milli degC
aotag.cooldown_temp_timeout = xxxxx; # In Milli seconds
aotag.enable_shutdown = 1; # Set 1 to enable AOTAG shutdown
 
To find the AOTAG setting values for your platform, look up the variables above in its MB1 BCT config file.
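One way to list these values is to filter the config file for the aotag variables. The following is a minimal sketch, assuming the file uses the "aotag.name = value; # comment" layout shown above; show_aotag_settings is a hypothetical helper name, and the path to pass is the MB1 BCT misc config file of your platform.

```shell
# Print "name value" for each aotag.* assignment in an MB1 BCT misc config
# file ($1), dropping the trailing semicolon and any comment.
show_aotag_settings() {
    sed -n 's/^[[:space:]]*aotag\.\([A-Za-z0-9_]*\)[[:space:]]*=[[:space:]]*\([^;[:space:]]*\).*/\1 \2/p' "$1"
}
```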
User Space Thermal Alert
For more information, see the kernel documentation and device tree bindings:
Documentation/thermal/userspace_alert.txt
Documentation/devicetree/bindings/thermal/userspace-alert.txt
The user-space thermal alert can be configured to use any thermal zone.
To receive alerts for changes in the Tegra junction temperature, configure the user-space thermal alert to use the Tj-therm thermal zone.