NVIDIA Tegra
NVIDIA DRIVE OS 5.1 Linux

Developer Guide
5.1.0.2 Release


 
Thermal Management
 
About the tj-max Thermal Sensor
Reading the tj-max Temperature
Configuring the tj-max Sensor
Internal Thermal Sensors
Determining Sensor Calibration
Viewing Thermal Sensor Information
External Thermal Sensors
Thermal Shutdown
TSENSE Thermal Sensor
GPU Group Thermal Sensors
AUX Group Thermal Sensors
Always-On Thermal Alarm Generator Sensor
Thermal Protection During Boot
Using the External Thermal Sensor
Using the Always-On Thermal Alarm Generator Sensor
User Space Thermal Alert
The target platform has various thermal sensors that sense the temperature of different regions of the chip. These sensors are used for thermal management.
NVIDIA software provides the Tj-max (Tj-therm) virtual sensor, which reports the temperature of the hottest point on the chip across the different zones.
About the tj-max Thermal Sensor
Tj-max (Tj-therm) is a virtual temperature sensor that reports the hottest point temperature on the SoC. It is not a real thermal sensor. The Tj-max (Tj-therm) module reads the temperature from the different on-die temperature sensors, applies the necessary offset corrections, and returns the maximum hotspot temperature among them.
The purpose of this virtual sensor is to provide a single temperature value that abstracts away the complexity of multiple internal SoC sensors. Because the Tj-max (Tj-therm) sensor is a polled aggregation of multiple internal SoC sensors, its reading can lag by approximately one polling period.
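The aggregation described above can be sketched in a few lines of Python. This is an illustrative model only, not the BPMP firmware implementation: the function name tj_max_mdc and the dictionary layout are hypothetical, and the sketch assumes that the hotspot offset is added to each sensor reading before the maximum is taken.

```python
def tj_max_mdc(readings_mdc, offsets_mdc):
    """Return the hottest offset-corrected temperature, in millidegrees Celsius.

    readings_mdc: zone name -> raw sensor reading (mdC)
    offsets_mdc:  zone name -> hotspot offset (mdC), assumed to be additive
    Zones that have an offset but no reading are skipped.
    """
    corrected = [
        readings_mdc[zone] + offset
        for zone, offset in offsets_mdc.items()
        if zone in readings_mdc
    ]
    if not corrected:
        raise ValueError("no configured zone has a reading")
    return max(corrected)


# Illustrative values only; real readings and offsets are platform-specific.
readings = {"cpu": 38000, "gpu": 39500, "aux": 38000}
offsets = {"cpu": 12500, "gpu": 11000, "aux": 13000}
print(tj_max_mdc(readings, offsets))  # 51000
```

In this example the AUX zone has neither the highest raw reading nor a dominant margin, yet it determines the result once its offset is applied, which is why the virtual sensor can read higher than any individual zone.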
Reading the tj-max Temperature
The thermal zone number for tj-max must be identified using these procedures.
To read the tj-max temperature on Linux
1. Identify the zone that corresponds to tj-therm by checking the type of each thermal zone, for example:
root@tegra-ubuntu:~# cat /sys/class/thermal/thermal_zone*/type
CPU-therm
GPU-therm
AUX-therm
AO-therm
tj-therm
Tdiode_tegra
2. Note the number of the zone whose type is tj-therm, and read its temperature, where <id> is that zone number:
root@tegra-ubuntu:~# cat /sys/class/thermal/thermal_zone<id>/temp
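The two Linux steps above can be combined into one lookup. The following is a minimal sketch, assuming the standard sysfs layout used in the commands above; the function name read_zone_temp_mdc and its sysfs_root parameter are hypothetical and exist only so the lookup can be pointed at a different directory.

```python
from pathlib import Path


def read_zone_temp_mdc(zone_type, sysfs_root="/sys/class/thermal"):
    """Return the temperature (mdC) of the first thermal zone of the given type."""
    for zone in sorted(Path(sysfs_root).glob("thermal_zone*")):
        # Each zone directory exposes its name in 'type' and its reading in 'temp'.
        if (zone / "type").read_text().strip() == zone_type:
            return int((zone / "temp").read_text().strip())
    raise LookupError(f"no thermal zone of type {zone_type!r}")


# On a target, for example:
#   read_zone_temp_mdc("tj-therm") / 1000.0   -> temperature in degrees Celsius
```

Looking up the zone by type rather than by a fixed number avoids depending on the zone numbering, which can differ between platforms.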
To read the tj-max temperature on QNX
Execute the following command:
cat /dev/nvthermmon/thermmon_tree
Sample Output on QNX
In this sample, Tj-max appears in the tj-therm row, with zone ID 7 and a temperature of 57500 mdC (57.5 degrees Celsius).
ZoneName       Sensor/ZoneID   Temp (mdC)   Trip Enabled
CPU-therm      5               38000        false
GPU-therm      3               39500        false
AUX-therm      4               38000        false
AO-therm       6               36500        false
tj-therm       7               57500        false
Tboard_tegra   0               -1           false
Tdiode_tegra   1               -1           false
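Output in this form can be parsed with a short script. The sketch below assumes that every data row has exactly four whitespace-separated fields (name, zone ID, temperature, trip flag), as in the sample output; the function name parse_thermmon_tree is hypothetical, and the real output format may differ between releases.

```python
def parse_thermmon_tree(text):
    """Map zone name -> (zone_id, temp_mdc) from thermmon_tree-style output.

    Lines that do not look like 'name id temp trip' (such as the header)
    are skipped.
    """
    zones = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue
        name, zone_id, temp, _trip = fields
        try:
            zones[name] = (int(zone_id), int(temp))
        except ValueError:
            continue  # not a data row
    return zones


sample = """\
ZoneName Sensor/ZoneID Temp (mdC) Trip Enabled
CPU-therm 5 38000 false
tj-therm 7 57500 false
Tboard_tegra 0 -1 false
"""
print(parse_thermmon_tree(sample)["tj-therm"])  # (7, 57500)
```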
Configuring the tj-max Sensor
The real thermal sensors used for calculating the Tj-max (Tj-therm) value are configured as part of the BPMP firmware device tree. The Tj-max (Tj-therm) device tree node specifies which real thermal sensors are used, along with the hotspot offset for each, in millidegrees Celsius.
For detailed information about the BPMP firmware device tree, consult Configuring Power on BPMP Firmware.
The following is an example of the Tj-max (Tj-therm) device tree node in the BPMP firmware device tree:
tj_max {
    poll_period = <500>;
    tz_list {
        cpu = <THERMAL_ZONE_PLLX 12500>;
        gpu = <THERMAL_ZONE_GPU 11000>;
        cv = <THERMAL_ZONE_AUX 13000>;
    };
};
poll_period: Specifies the interval, in milliseconds, at which the physical sensors are polled. If the property is absent, the default poll period of 500 ms is used.
tz_list: Specifies the thermal zone sensors, such as the CPU, GPU, and CV sensors, that the BPMP firmware polls to calculate the Tj_max sensor value.
It contains one property per thermal zone to be polled by the Tj_max sensor. The property names may be chosen arbitrarily. Each property value must be a two-element array:
Element 0: Thermal zone ID, an unsigned 32-bit integer.
Element 1: Offset temperature in units of 0.001°C, a signed 32-bit integer. The offset temperature values of 12500, 11000, 13000 are example values for those respective zones and vary from platform to platform.
Thermal zone IDs are defined in thermal-t194.h, located at:
<top>/drive-t186ref-foundation/platform-config/bpmp_dtsi/t194/include/mach-t194/thermal-t194.h
The thermal zone IDs include:
THERMAL_ZONE_CPU
THERMAL_ZONE_GPU
THERMAL_ZONE_AUX
THERMAL_ZONE_PLLX
THERMAL_ZONE_AO
Other values are invalid and are ignored. The property must specify both elements to be considered valid. At least one valid zone property must be present to enable the Tj_max sensor. Tj_max is not started if the tz_list node is missing.
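The validity rules above can be modeled as follows. This is a hedged sketch, not the firmware's parser: the function names valid_tz_entries and tj_max_enabled are hypothetical, and zone IDs are represented by their symbolic names as strings because the numeric values live in thermal-t194.h and are not reproduced here.

```python
# Symbolic zone IDs listed in the text; numeric values are defined in thermal-t194.h.
VALID_ZONES = {
    "THERMAL_ZONE_CPU",
    "THERMAL_ZONE_GPU",
    "THERMAL_ZONE_AUX",
    "THERMAL_ZONE_PLLX",
    "THERMAL_ZONE_AO",
}


def valid_tz_entries(tz_list):
    """Keep only properties of the form (zone_id, offset_mdc) with a known zone ID."""
    valid = {}
    for name, value in tz_list.items():
        if len(value) == 2 and value[0] in VALID_ZONES:
            valid[name] = tuple(value)
    return valid


def tj_max_enabled(tz_list):
    """Tj_max starts only if tz_list exists and has at least one valid property."""
    return tz_list is not None and bool(valid_tz_entries(tz_list))


tz_list = {
    "cpu": ("THERMAL_ZONE_PLLX", 12500),
    "bogus": ("THERMAL_ZONE_FOO", 1000),  # unknown zone ID: ignored
    "short": ("THERMAL_ZONE_GPU",),       # missing offset: ignored
}
print(valid_tz_entries(tz_list))  # {'cpu': ('THERMAL_ZONE_PLLX', 12500)}
print(tj_max_enabled(None))       # False
```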
Internal Thermal Sensors
The SoC internal thermal sensors, located on different regions of the SoC chip, are as follows.
Sensor: Description
CPU group and TSENSE group thermal sensors: These sensors are intended for CPU zone sensing.
AUX group thermal sensor: This sensor is for Computer Vision/SoC zone sensing.
GPU group thermal sensor: This sensor is for GPU zone sensing.
AOTAG (Always-On Thermal Alarm Generator) thermal sensor: This sensor is for SoC die temperature sensing.
Determining Sensor Calibration
Some early Xavier A01 chips do not have calibrated thermal sensors. On these platforms, the SoC internal thermal sensors are not functional.
To determine whether SoC internal thermal sensors are calibrated
On a Linux target system, execute the command:
cat /sys/kernel/debug/bpmp/debug/fuse/tsensor/calib_ok
On a QNX target system, execute the command:
cat /dev/nvbpmpdebugfs/bpmp_debug/fuse/tsensor/calib_ok
The command prints one of the following values:
0: Uncalibrated. The on-chip temperature sensors do not report meaningful values, and none of the actions tied to these sensors take effect.
1: Calibrated. The sensors operate normally.
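A script that depends on the internal sensors can check this flag before trusting any reading. The sketch below assumes that the calib_ok node contains a single 0 or 1, optionally followed by a newline; the function names sensors_calibrated and read_calibration are hypothetical.

```python
from pathlib import Path

# Linux path from the procedure above; on QNX the node lives under /dev/nvbpmpdebugfs.
LINUX_CALIB_OK = "/sys/kernel/debug/bpmp/debug/fuse/tsensor/calib_ok"


def sensors_calibrated(calib_ok_text):
    """Interpret the contents of calib_ok: '1' is calibrated, '0' is uncalibrated."""
    value = calib_ok_text.strip()
    if value == "1":
        return True
    if value == "0":
        return False
    raise ValueError(f"unexpected calib_ok value: {value!r}")


def read_calibration(path=LINUX_CALIB_OK):
    """Read the calib_ok node and return True if the sensors are calibrated."""
    return sensors_calibrated(Path(path).read_text())


print(sensors_calibrated("1\n"))  # True
print(sensors_calibrated("0\n"))  # False
```

Raising an error on any other value, rather than treating it as uncalibrated, makes an unexpected node format visible instead of silently disabling thermal handling in the caller.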
Viewing Thermal Sensor Information
Use these procedures to view thermal sensor information on Linux or QNX systems as identified.
On Linux
To view all the thermal zones present in the system
Execute the command:
cat /sys/class/thermal/thermal_zone*/type
For example:
cat /sys/class/thermal/thermal_zone0/type
BCPU-therm
To view temperatures for all the thermal zones in the system
Execute the command:
cat /sys/class/thermal/thermal_zone*/temp
To view a consolidated list
Execute the command:
cat /sys/class/thermal/thermal_zone*/t[ye][pm][ep]
For example:
root@tegra-ubuntu:~# cat /sys/class/thermal/thermal_zone*/t[ye][pm][ep]
33000
CPU-therm
33500
GPU-therm
32500
AUX-therm
33000
AO-therm
46000
tj-therm
34000
Tdiode_tegra
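The temperature values are in millidegrees Celsius. Because the shell expands temp before type within each zone directory, each temperature is printed on the line before its zone type.
Note:
The sysfs output also lists external thermal sensor information (Tdiode_tegra) if it is enabled on your platform, because this sysfs interface is generic for all thermal sensors.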
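In the consolidated sample output, each temperature line is immediately followed by the type of the same zone. A minimal sketch for pairing them up is shown here; the function name pair_temp_and_type is hypothetical, and the code assumes strictly alternating temperature and type lines as in the sample.

```python
def pair_temp_and_type(lines):
    """Return {zone_type: degrees_celsius} from alternating temp/type lines."""
    values = [line.strip() for line in lines if line.strip()]
    if len(values) % 2:
        raise ValueError("expected an even number of non-empty lines")
    # Even positions hold temperatures in mdC; odd positions hold the zone type.
    return {
        values[i + 1]: int(values[i]) / 1000.0
        for i in range(0, len(values), 2)
    }


output = "33000\nCPU-therm\n46000\ntj-therm\n"
print(pair_temp_and_type(output.splitlines()))
# {'CPU-therm': 33.0, 'tj-therm': 46.0}
```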
To view the temperature for a particular thermal zone
1. Find the zone number whose type matches the sensor of interest.
2. Execute the command:
cat /sys/class/thermal/thermal_zone<id>/type
cat /sys/class/thermal/thermal_zone<id>/temp
On QNX
To view all thermal zones and their temperature
Execute the command:
cat /dev/nvthermmon/thermmon_tree
To view a particular zone temperature
Execute the command:
thermmon_control --get_zone_temp <zone_id>
External Thermal Sensors
The external thermal sensor is located on the target board, not on the SoC.
For the thermal shutdown settings of the external thermal sensor, consult the External Thermal Sensors topic.
Thermal Shutdown
When the SoC temperature exceeds the thermal shutdown limit, an immediate shutdown is triggered by asserting the THERMAL_SHUTDOWN_TEGRA signal to the off-chip Power Management IC (PMIC).
The following thermal sensors are configured to trigger thermal shutdown:
TSENSE thermal sensor, instrumented for the CPU hotspot
GPU group thermal sensors, instrumented for the GPU hotspot
AUX group thermal sensor, instrumented for the Computer Vision (CV) and SoC hotspot
Always-On Thermal Alarm Generator (AOTAG) sensor, for early boot protection and redundant thermal shutdown on SoC die temperature
External thermal sensor, for redundant thermal shutdown on SoC die temperature
TSENSE Thermal Sensor
SoC shutdown for CPU hotspot is enabled using the TSENSE thermal sensor by setting a thermal shutdown limit value. If the TSENSE temperature exceeds the shutdown limit, thermal shutdown is triggered.
TSENSE thermal sensor shutdown is enabled in the BPMP firmware init stage. In the BPMP firmware Device Tree, the following Device Tree node and property are used to configure TSENSE thermal shutdown:
soctherm {
    thermtrip {
        thermtrip = <THERMAL_ZONE_PLLX 91000>; /* in millidegrees Celsius */
    };
};
Where 91000 is the shutdown limit value defined by the thermtrip Device Tree property. The value varies from platform to platform.
To locate the TSENSE thermal sensor shutdown value, consult the property in the BPMP firmware Device Tree of your platform.
GPU Group Thermal Sensors
SoC shutdown for the GPU hotspot is enabled using the GPU group thermal sensors. If the GPU group sensor temperature exceeds the shutdown limit, thermal shutdown is triggered.
GPU group thermal sensor shutdown is enabled in the BPMP firmware init stage. In the BPMP firmware Device Tree, the following Device Tree node and property are used to configure GPU group thermal shutdown:
soctherm {
    thermtrip {
        thermtrip = < THERMAL_ZONE_GPU <xxxxxx> >; /* in millidegrees Celsius */
    };
};
Where <xxxxxx> is the shutdown value for the GPU thermal zone.
AUX Group Thermal Sensors
SoC shutdown for the Computer Vision and SoC hotspot is enabled using the AUX group thermal sensor. If the AUX group sensor temperature exceeds the shutdown limit, thermal shutdown is triggered.
AUX group thermal sensor shutdown is enabled in the BPMP firmware init stage. In the BPMP firmware Device Tree, the following Device Tree node and property are used to configure AUX group thermal shutdown:
soctherm {
    thermtrip {
        thermtrip = < THERMAL_ZONE_AUX <xxxxxx> >; /* in millidegrees Celsius */
    };
};
Where <xxxxxx> is the shutdown value for the AUX thermal zone.
Always-On Thermal Alarm Generator Sensor
Always-On Thermal Alarm Generator (AOTAG) is armed with a thermal shutdown limit. If the AOTAG temperature exceeds the shutdown threshold, AOTAG triggers an immediate thermal shutdown.
AOTAG shutdown is configured in two stages of software: in the bootloader (MB1 stage) and in the BPMP firmware init stage.
For configuration in MB1 stage, consult Thermal Protection During Boot.
For configuration in BPMP firmware device tree (DT), refer to the following Device Tree node and property:
aotag {
    thermtrip = <xxxxxx>; /* in millidegrees Celsius */
};
Where the Device Tree property thermtrip defines the shutdown limit value and <xxxxxx> is the actual value.
To find the AOTAG thermal shutdown value, refer to this Device Tree property in the BPMP Firmware Device Tree of your platform.
Thermal Protection During Boot
The SoC uses the external thermal sensors or Always-On Thermal Alarm Generator (AOTAG) mechanisms, or both, to protect itself during boot.
Using the External Thermal Sensor
At power-on, the external thermal sensor, which can sense the SoC die temperature, is configured with a thermal shutdown limit of 108 degrees Celsius. During early boot in MB1, this shutdown threshold is changed to the qualified value. When the SoC temperature reaches the thermal shutdown limit, an immediate shutdown is triggered.
Using the Always-On Thermal Alarm Generator Sensor
The AOTAG is armed, in the early boot MB1 bootloader, with the following settings:
Cool-down threshold value
Shutdown threshold value
Cool-down timeout value
If the AOTAG temperature reaches or exceeds the cool-down threshold, boot is paused, with one of two outcomes:
If the temperature drops below the cool-down threshold within the cool-down timeout period, boot continues.
If the cool-down timeout period expires and the temperature has not dropped below the cool-down threshold, AOTAG triggers thermal shutdown.
At any time after boot, if the Tegra die temperature exceeds the AOTAG shutdown threshold, AOTAG triggers thermal shutdown.
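The boot-time cool-down behavior can be modeled with a small decision function. This is an illustrative sketch only, not the MB1 implementation: the function name aotag_boot_decision, the temp_at callback, and the step_ms sampling interval are all hypothetical modeling choices, and the sketch covers only the cool-down pause, not the separate shutdown threshold.

```python
def aotag_boot_decision(temp_at, cooldown_mdc, timeout_ms, step_ms=100):
    """Return "continue" or "shutdown" for the boot-time cool-down check.

    temp_at: callable giving the AOTAG temperature (mdC) t milliseconds
             after the check begins (a stand-in for reading the sensor)
    """
    if temp_at(0) < cooldown_mdc:
        return "continue"  # below the threshold: boot is never paused
    # Boot is paused; re-check the temperature until the timeout expires.
    for t in range(step_ms, timeout_ms + 1, step_ms):
        if temp_at(t) < cooldown_mdc:
            return "continue"  # cooled down within the timeout period
    return "shutdown"  # timeout expired without cooling below the threshold


# Cools by 1 mdC per millisecond from 85000 mdC: drops below 80000 in time.
print(aotag_boot_decision(lambda t: 85000 - t, 80000, 30000))  # continue
# Stays at 85000 mdC for the whole timeout period.
print(aotag_boot_decision(lambda t: 85000, 80000, 30000))      # shutdown
```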
The MB1 AOTAG settings are provided in the MB1 BCT configuration file:
<top>/drive-t186ref-foundation/platform-config/bct/t194/misc/tegra194-mb1-bct-misc-<*>.cfg
The variables used in the configuration file are as follows:
##### aotag variables #####
aotag.boot_temp_threshold = 90000; # Shutdown threshold, in millidegrees Celsius
aotag.cooldown_temp_threshold = 80000; # In millidegrees Celsius
aotag.cooldown_temp_timeout = 30000; # In milliseconds
aotag.enable_shutdown = 1; # Set to 1 to enable AOTAG shutdown
To locate the AOTAG setting values, consult the settings in the MB1 BCT configuration file of your platform.
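The AOTAG values can be extracted from a BCT configuration file with a short script. The sketch below assumes the `aotag.<name> = <integer>;` syntax shown in the excerpt above and ignores everything else; the function name parse_aotag_settings is hypothetical, and a real configuration file may contain syntax that this pattern does not cover.

```python
import re

# Matches lines such as: aotag.cooldown_temp_timeout = 30000; # comment
_AOTAG_LINE = re.compile(r"^\s*aotag\.(\w+)\s*=\s*(-?\d+)\s*;")


def parse_aotag_settings(cfg_text):
    """Return {variable_name: integer_value} for every aotag.* assignment."""
    settings = {}
    for line in cfg_text.splitlines():
        match = _AOTAG_LINE.match(line)
        if match:
            settings[match.group(1)] = int(match.group(2))
    return settings


cfg = """\
##### aotag variables #####
aotag.boot_temp_threshold = 90000; # Shutdown threshold in Milli degC
aotag.cooldown_temp_threshold = 80000; # In Milli degC
aotag.cooldown_temp_timeout = 30000; # In Milli seconds
aotag.enable_shutdown = 1; # Set 1 to enable AOTAG shutdown
"""
settings = parse_aotag_settings(cfg)
print(settings["cooldown_temp_timeout"])  # 30000
```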
User Space Thermal Alert
For details, consult the kernel documentation and device tree bindings available at:
<top>/Documentation/thermal/userspace_alert.txt
<top>/Documentation/devicetree/bindings/thermal/userspace-alert.txt
User space thermal alert can be configured to use any thermal zone.
For customer use cases that require alerts on changes in the SoC junction temperature, configure the user-space thermal alert to use the tj-therm thermal zone.