Power Management for Jetson Xavier NX Series and Jetson AGX Xavier Series Devices
Applies to Jetson Xavier NX series and Jetson AGX Xavier series
The NVIDIA® Jetson AGX Xavier™ series of devices includes devices with 16 GB and 32 GB of memory.
The NVIDIA® Jetson Xavier™ NX series of devices also includes devices with 16 GB and 32 GB of memory. Their features are very similar to those of Jetson AGX Xavier series.
This document often refers to Jetson Xavier NX series and Jetson AGX Xavier series devices as a group by the shorter name Jetson Xavier. When it refers specifically to the Jetson Xavier NX or Jetson AGX Xavier series of devices or to a specific device, it uses the appropriate specific name. The term “Jetson Xavier” is used solely for convenience, and is not the proper name of any NVIDIA product or group of products.
Jetson Xavier and NVIDIA® Jetson™ Board Support Package (BSP) provide many features related to power management, thermal management, and electrical management. These features deliver the best user experience possible given the constraints of a particular platform. The target user experience ensures the perception that the device provides:
• Uniformly high performance
• Excellent battery life
• Perfect stability
• A surface that is comfortable and cool to the touch
This topic describes the power, thermal, and electrical management features visible to software, as well as some tools and related techniques.
Interacting Features
Power, thermal, and electrical management features place dynamic constraints on many operational settings (“knobs”), such as:
• Clock gate settings
• Clock frequencies
• Power gate (or regulator enable) settings
• Voltages
• Processor power state (i.e., which idle state is selected for the CPU)
• Peripheral power state (i.e., which idle state is selected for an I/O controller)
• Chipset power state
• Availability of CPU cores to the OS
Some of these knobs are constrained by more than one feature. For example, cpufreq implements load-based scaling, adjusting the CPU frequency according to how busy the CPU is. CPU thermal management, however, can override the target frequency that cpufreq selects. Consequently, before you attempt to debug power, performance, thermal, or electrical problems, familiarize yourself with all of the power, thermal, and electrical management features in BSP.
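The interaction described above can be pictured as several independent caps applied to a single knob. The following is a minimal sketch, not a BSP interface: the function name and the example cap values are hypothetical and serve only to illustrate why the frequency you observe may differ from the one the governor requested.

```python
def effective_frequency_khz(governor_target_khz, caps_khz):
    """Return the frequency applied once every active cap is honored.

    governor_target_khz: frequency requested by load-based scaling.
    caps_khz: mapping of feature name -> maximum frequency it currently allows.
    """
    limit = min(caps_khz.values(), default=governor_target_khz)
    return min(governor_target_khz, limit)


# Hypothetical values: the governor asks for 2188 MHz while thermal
# management currently allows at most 1200 MHz.
caps = {"thermal": 1_200_000, "power_mode": 2_265_600}
print(effective_frequency_khz(2_188_000, caps))
```

When you debug an unexpectedly low clock, checking each cap separately in this way shows which feature is the limiting one.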
Kernel Space Power Saving Features
This section describes BSP features that save power and extend battery life. Many of these features are implemented by the Linux kernel, with support from firmware and hardware, and without significant involvement from the user space.
Chipset Power States
The supported power states are listed in order of increasing flexibility or configurability:
• Off: There is only one way for a system to be off.
• Deep Sleep (SC7) offers a small amount of configurability. For example, prior to entering Deep Sleep, software can select which of the many hardware wake events can wake the chip from Deep Sleep.
• Active state is extraordinarily flexible in terms of power and performance. It encompasses activity levels from low power audio playback through peak performance. Power consumption in Active state can range from tens of milliwatts to several watts.
Supported Power States
The supported power states are:
Power State | Functionality | Characteristics |
Off | Power rails | None of the power rails supplying the SoC and DRAM are powered. |
State | No state is maintained in the SoC or DRAM. |
Exiting | Into Active state via cold boot. |
Deep Sleep (SC7) | Power rails | VDD_RTC, VDDIO_DDR, VDDIO_SYS, and DRAM power rails are powered on. VDD_CORE and VDD_CPU are powered off. |
State | The SoC maintains a small amount of state information in the PMC block. DRAM maintains state. |
Exiting | Into Active state via a pre-defined set of wake events. |
Active | Power rails | VDD_RTC, VDDIO_DDR, VDDIO_SYS, VDD_CORE, and DRAM rails are powered on. Other power rails, including VDD_CPU, may be powered on. |
State | Software actively manages the power states of the devices that make up the SoC. |
Exiting | Software can initiate a transition from Active to any other power state. |
Power State Mapping to Linux
BSP maps hardware power states to Linux power states as follows.
Chipset Power State | Linux Power State | Comments |
Off | Off | — |
Deep Sleep (SC7) | Suspend to RAM | Software can choose whether to enter Deep Sleep before the OS enters Suspend. |
Active | Running/Idle (display on or off) | Many SoC devices may be idle or disabled under driver control. For example, VDD_GPU may be powered off and the companion GPU may be power-gated. |
Note: | For Jetson Xavier the name of the chipset power state is SCy. This differs from most other Jetson devices, for which it is LPx. |
Deep Sleep (SC7)
You can initiate deep sleep from the user space with this command if the systemd init system is in use:
$ sudo systemctl suspend
Alternatively, you can use this:
$ sudo bash -c "echo mem > /sys/power/state"
The first method of entering deep sleep is preferred because it cooperates better with systemd, which maintains the Linux runlevel. Use the second method if your system is not running systemd.
The system can be awakened from deep sleep by common wake sources available on Jetson platforms:
Wake Source | Usage |
Power button | Press and release the power button on the Jetson device. If the power button is not available, connect then disconnect the power button pin and ground. |
RTC alarm | Before entering low power state, program the RTC alarm with the command: $ sudo bash -c "echo `date '+%s' -d '+ 10 seconds'` > /sys/class/rtc/rtc0/wakealarm" |
Micro USB cable hotplug | Connect or disconnect a micro-USB cable to the USB micro-B port for flashing the device. |
USB remote wakeup | Press any key on a USB keyboard connected to the device. |
Wake on LAN | On another machine on the same LAN, enter: $ sudo etherwake -i <interface> <MAC_address_of_target> |
SD card detection | Insert or remove SD card. |
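As an illustration of the RTC alarm wake source, the following sketch combines the two sysfs writes shown above: it programs the alarm and then enters deep sleep. It assumes the same sysfs paths used in this section and must run as root. The function names are hypothetical.

```python
import time
from pathlib import Path

WAKEALARM = Path("/sys/class/rtc/rtc0/wakealarm")  # path from the table above
STATE = Path("/sys/power/state")                   # path from this section


def wake_time(delay_s, now=None):
    """Return the absolute wake time in whole seconds since the epoch."""
    now = time.time() if now is None else now
    return int(now) + int(delay_s)


def suspend_for(delay_s):
    """Program the RTC alarm, then enter deep sleep (requires root)."""
    # Assumption: writing 0 clears an alarm that is already programmed.
    WAKEALARM.write_text("0\n")
    WAKEALARM.write_text(f"{wake_time(delay_s)}\n")
    STATE.write_text("mem\n")
```

Calling suspend_for(10) corresponds to the RTC alarm command in the table followed by the echo mem method of entering deep sleep.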
Clock and Voltage Management
Dynamic voltage scaling is closely related to frequency scaling because higher clock frequencies generally require higher supply voltages, and lower frequencies permit lower voltages.
Most clock register manipulation on Jetson Xavier is handled by the BPMP firmware, which is the power management firmware running on the Boot and Power Management Processor (BPMP). A Linux kernel driver exposes a somewhat simplified view of the physical clock tree to software on the main CPU via the Linux Common Clock Framework.
Each of the significant clock domains on the chip has its own dedicated clock source known as a Noise Aware Frequency Lock Loop (NAFLL).
Regulator Framework
The Linux regulator framework provides an abstraction that allows regulator consumer drivers to dynamically adjust voltage or current regulators at runtime, without knowledge of the underlying hardware power tree.
The framework provides a mechanism that platform initialization code can use to declare a power tree topology and assign a driver that provides regulators for each node in the hardware power tree. Such a driver is called a regulator provider driver.
BSP configures the platform power tree appropriately for Jetson Xavier. Additionally, drivers within BSP act as regulator consumers, where appropriate.
When you port BSP to a new platform, you must ensure that:
• The platform power tree is configured to match the underlying hardware.
• All drivers for peripheral devices use the regulator consumer APIs correctly.
• The device tree and board configuration file information for your new platform avoid conflicts between functions using the same I/O pads. BSP drivers registering as regulator consumers can cause I/O pads on the chip to be unavailable for other functions.
The SoC core power rails (VDD_CORE, VDD_CPU, VDD_GPU, VDD_CV) are under the direct control of the BPMP firmware. They are configured via the BPMP device tree blob (which is distinct from the Linux device tree blob).
CPU Power Management
The CPU power management strategy uses dynamic frequency scaling with dynamic voltage scaling, idle power states, and core management tuned for the Jetson Xavier architecture.
Frequency Management with cpufreq
BSP implements CPU Dynamic Frequency Scaling (DFS) with the Linux cpufreq subsystem. The cpufreq subsystem comprises:
• Platform drivers to implement the clock adjustment mechanism
• Governors to implement frequency scaling policies
• A core framework to connect governors to platform drivers
The policy for frequency scaling depends on which cpufreq governor is selected at runtime.
For details, see the information at:
<top>/kernel/kernel-4.9/Documentation/cpu-freq/
For each Jetson hardware reference design, NVIDIA selects a cpufreq governor and tunes it to achieve a balance between power and performance.
When a governor requests a CPU frequency change, the cpufreq platform driver reconciles that request with constraints imposed by thermal or electrical limits, and updates the CPU clock speed.
Jetson Xavier uses an NAFLL to clock each CPU. The NAFLLs are configured for adaptive voltage and frequency scaling (AVFS). Hardware, with the assistance of the BPMP, ensures that the CPU voltage is appropriate for the NAFLL to deliver requested CPU frequencies.
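To see which governor is selected and which frequency limits are currently in force, you can read the standard Linux cpufreq sysfs nodes. The following is a minimal sketch that assumes the generic cpufreq sysfs layout. The function name and the root parameter are hypothetical conveniences.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def read_cpufreq_policy(cpu=0, root=CPU_ROOT):
    """Return the governor and frequency limits (in kHz) for one CPU core."""
    base = Path(root) / f"cpu{cpu}" / "cpufreq"
    return {
        "governor": (base / "scaling_governor").read_text().strip(),
        "min_khz": int((base / "scaling_min_freq").read_text()),
        "max_khz": int((base / "scaling_max_freq").read_text()),
        "cur_khz": int((base / "scaling_cur_freq").read_text()),
    }
```

On a target device, read_cpufreq_policy(0) returns the policy for core 0. A max_khz value lower than expected suggests that a power mode or a thermal limit is capping the governor.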
Idle Management with cpuidle
The Linux cpuidle infrastructure supports the implementation of SoC-specific idle states for each CPU core. cpuidle lacks direct support for idle states applicable to an entire CPU cluster and for idle states extending beyond a CPU cluster.
For more information about the Linux cpuidle infrastructure, see:
<top>/kernel/kernel-4.9/Documentation/cpuidle/
NVIDIA provides an SoC-specific cpuidle driver that plugs into the cpuidle framework to enable CPU idle power management.
CPU Idle
For each core there is an idle task which is scheduled when no other runnable tasks are left in the run queue for that core. This task places the core in a low-power state selected by the cpuidle governor. The core stays in that state until an interrupt wakes it up to process more work.
When the last active core in a CPU cluster goes into an idle or offline state, the idle task puts the entire CPU cluster in a low-power state.
Idle States
The table below summarizes the CPU core and cluster idle states available on Jetson Xavier, and the BSP software support for them.
Type of State | State | Meaning | Software Support |
Core state | C1 | Clock gating | Supported |
C6 | Virtual retention (power gating and architecture state restored by MTS) | Supported * |
C7 | Power gating | Not supported |
Cluster state | CC1 | Auto clock gating | Supported |
CC3 | fmax@Vmin or specified idle frequency | Supported |
CC6 | Cluster power gating (includes non-CPU logic) | Supported † |
* C6 is disabled by default because the min-residency-us device tree property of C6 is set to 0xffffffff. |
† Because C6 is disabled by default, there is no chance to enter CC6. |
Core states are denoted as Cx states, and cluster states are denoted as CCx states.
To enable CPU idle
To enable CPU idle you must enable the appropriate kernel configuration option and the appropriate device tree node. (Enabling either one alone is not effective.)
• To enable CPU idle in the configuration file, set this option:
CONFIG_CPU_IDLE=y
• To enable CPU idle in the device tree, enable the device tree node cpuidle:
cpuidle {
compatible = "nvidia,tegra19x-cpuidle";
status = "okay";
};
To disable cpuidle at boot time
• Disable the device tree node cpuidle.
To display CPU idle status
• To determine whether CPU idle is enabled, query sysfs with this command:
$ cat /sys/devices/system/cpu/cpuidle/current_driver
If CPU idle is enabled, the command displays:
tegra19x_cpuidle_driver
To enable/disable a core/cluster power state at boot time
• To enable a core/cluster power state, set the following properties of the appropriate core/cluster state node:
• status to "okay"
• min-residency-us to a reasonable value.
For example, to enable power state C6 with min-residency-us = 50000:
C6: c6 {
compatible = "nvidia,tegra194-cpuidle-core";
state-name = "Virtual core powergate";
wakeup-latency-us = <2000>;
min-residency-us = <50000>;
power = <60>;
pmstate = <0x6>;
arm,psci-suspend-param= <0x6>;
status = "okay";
};
• To disable a core/cluster power state, use either of the following procedures. (Both procedures apply to the device tree file tegra194-cpuidle.dtsi.)
• Remove or disable the appropriate core/cluster state node.
or
• Modify the appropriate core/cluster state node by setting the property min-residency-us to a high value, e.g., 0xffffffff.
For example, to disable power state C6:
C6: c6 {
compatible = "nvidia,tegra194-cpuidle-core";
state-name = "Virtual core powergate";
wakeup-latency-us = <2000>;
min-residency-us = <0xffffffff>;
power = <60>;
pmstate = <0x6>;
arm,psci-suspend-param= <0x6>;
status = "okay";
};
To get and set a CPU core’s power state
The pathnames of the nodes that represent core power states are:
/sys/devices/system/cpu/cpu<x>/cpuidle/state<y>
Where:
• <x> is a core ID.
• <y> is the index of the core power state: 0 for C1, or 1 for C6.
Note: | A core power state’s status is 1 if the state is disabled, and 0 if it is enabled. This is the reverse of the usual Boolean sense of 0 and 1. |
To get the status of core power state <y> on core <x>, read the appropriate node. To set the status, write an ASCII 0 or 1 to the node.
Following are several useful commands for getting and setting the core power state:
• To display the name of the core power state with index <y>, enter the command:
$ cat /sys/devices/system/cpu/cpu<x>/cpuidle/state<y>/name
For example, this command displays C1:
$ cat /sys/devices/system/cpu/cpu<x>/cpuidle/state0/name
This command displays C6:
$ cat /sys/devices/system/cpu/cpu<x>/cpuidle/state1/name
• To get the status of core power state <y> on core <x>:
$ cat /sys/devices/system/cpu/cpu<x>/cpuidle/state<y>/disable
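• To change the status of core power state <y> on CPU core <x>, where <b> is 0 to enable the state or 1 to disable it (writing the node requires root privileges):
$ sudo bash -c "echo <b> > /sys/devices/system/cpu/cpu<x>/cpuidle/state<y>/disable"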
Note: | Remember that a status of 1 disables the core power state, and 0 enables it. |
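Because the disable node uses an inverted sense, wrapping it in helpers that speak in terms of "enabled" can reduce mistakes. The following sketch assumes the sysfs paths given above. The function names and the root parameter are hypothetical.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def state_dir(cpu, state, root=CPU_ROOT):
    """Return the sysfs directory for idle state <state> of core <cpu>."""
    return Path(root) / f"cpu{cpu}" / "cpuidle" / f"state{state}"


def is_state_enabled(cpu, state, root=CPU_ROOT):
    """True if the idle state is enabled, that is, its disable node reads 0."""
    return (state_dir(cpu, state, root) / "disable").read_text().strip() == "0"


def set_state_enabled(cpu, state, enabled, root=CPU_ROOT):
    """Enable or disable an idle state. Writing the real node requires root."""
    value = "0" if enabled else "1"  # inverted sense: 1 means disabled
    (state_dir(cpu, state, root) / "disable").write_text(value + "\n")
```

For example, set_state_enabled(2, 1, False) disables C6 on core 2, which is equivalent to writing 1 to the corresponding disable node.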
To get cluster states
• To get the status of the cluster states enabled for each cluster, read this node:
/sys/kernel/debug/tegra_cpuidle/deepest_cc_state
The value returned is:
• 1: Only CC1 is enabled
• 6: CC1 and CC6 are enabled
To get the per-core state usage statistics
• To get the number of times the kernel requested a specified core to enter a specified state, read this node:
$ cat /sys/devices/system/cpu/cpu<x>/cpuidle/state<y>/usage
• To get the number of times a specified core actually entered a specified state, enter the command:
$ cat /sys/kernel/debug/tegra_mce/cstats
For example, to get the number of times that the kernel has requested core 2 to enter state C6, enter the command:
$ cat /sys/devices/system/cpu/cpu2/cpuidle/state1/usage
To get the total time in microseconds that a specified core has spent in a specified state since boot, read the following node:
$ cat /sys/devices/system/cpu/cpu<x>/cpuidle/state<y>/time
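The per-state name, usage, and time nodes can be collected in one pass. The sketch below assumes the sysfs layout described in this section. The function name and the root parameter are hypothetical. Note that usage counts requests made by the kernel, not the entries reported by tegra_mce.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def idle_state_stats(cpu, root=CPU_ROOT):
    """Return {state name: (usage count, time in microseconds)} for one core."""
    stats = {}
    cpuidle = Path(root) / f"cpu{cpu}" / "cpuidle"
    for state in sorted(cpuidle.glob("state[0-9]*")):
        name = (state / "name").read_text().strip()
        usage = int((state / "usage").read_text())
        time_us = int((state / "time").read_text())
        stats[name] = (usage, time_us)
    return stats
```

Sampling idle_state_stats(2) before and after a workload and subtracting the two results gives the idle behavior of core 2 during that workload.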
Memory Power Management
NVIDIA SoC chipsets include power saving features whose operation is largely invisible to software at runtime. Most of those features are statically enabled at boot, according to settings in the boot configuration table (BCT).
Additionally, BSP implements EMC frequency scaling, which is dynamic frequency scaling for the memory controller (EMC/MC) and DRAM. This is a critical power saving feature that requires tuning and characterization for each new printed circuit board design.
The calibration results include a BCT and an EMC DVFS table specific to the board design. The EMC DVFS table must be included in the platform BPMP device tree file.
EMC Frequency Scaling Policy
The following factors affect EMC frequency scaling policy at runtime:
• The entries in the EMC DVFS table
• The average memory bandwidth used (as measured by hardware)
• Requests made by various device drivers (cpufreq, graphics drivers, USB, HDMI™, and display)
• Any limits dynamically imposed by thermal throttling
Supported Modes and Power Efficiency
Jetson Xavier is designed with a high efficiency Power Management Integrated Circuit (PMIC), voltage regulators, and power tree to optimize power efficiency. It supports several optimized power budgets, such as 10 watts, 15 watts, and 30 watts. For each power budget, several configurations are possible with various CPU frequencies and numbers of cores online.
Capping the memory, CPU, and GPU frequencies, and number of online CPU, GPU TPC, DLA and PVA cores at a pre-qualified level confines the module to the target mode. The configurations pre-defined by NVIDIA are as follows.
NVPModel clock configuration for Jetson Xavier NX series |
Property | Mode |
15W | 15W | 15W | 10W | 10W | 10W * | 20W | 20W | 20W |
Power budget | 15W | 15W | 15W | 10W | 10W | 10W | 20W | 20W | 20W |
Mode ID | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
Online CPU | 2 | 4 | 6 | 2 | 4 | 4 | 2 | 4 | 6 |
CPU maximal frequency (MHz) | 1900 | 1400 | 1400 | 1500 | 1200 | 1900 | 1900 | 1400 | 1400 |
GPU TPC | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
GPU maximal frequency (MHz) | 1100 | 1100 | 1100 | 800 | 800 | 510 | 1100 | 1100 | 1100 |
DLA cores | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
DLA maximal frequency (MHz) | 1100 | 1100 | 1100 | 900 | 900 | 900 | 1100 | 1100 | 1100 |
PVA cores | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
PVA maximal frequency (MHz) | 600 | 600 | 600 | 400 | 400 | 400 | 600 | 600 | 600 |
CVNAS maximal frequency (MHz) | 576 | 576 | 576 | 460.8 | 460.8 | 460.8 | 576 | 576 | 576 |
Memory maximal frequency (MHz) | 1600 | 1600 | 1600 | 1600 | 1600 | 1600 | 1866 | 1866 | 1866 |
SOC clocks maximal frequency (MHz) 10W & 15W modes | adsp 300 ape 150 axi_cbb 204 bpmp 384 bpmp_apb 408 host1x 204 isp 576 | display 600 display_hub 300 nvcsi 314 nvdec 665.6 nvenc 499.2 nvjpg 371.2 pex 250 | rce 384 sce 345.6 se 473.6 tsec 371.2 vi 460.8 vic 601.6 |
SOC clocks maximal frequency (MHz) 20W modes | adsp 300 ape 150 axi_cbb 204 bpmp 384 bpmp_apb 408 host1x 204 isp 576 | display 600 display_hub 300 nvcsi 314 nvdec 793.6 nvenc 729.6 nvjpg 460.8 pex 250 | rce 384 sce 345.6 se 704 tsec 371.2 vi 460.8 vic 601.6 |
* The default mode is 10W (mode ID 5). |
NVPModel clock configuration for Jetson AGX Xavier with 16 GB, 32 GB, or 64 GB |
Property | Mode |
MAXN | 10W | 15W | 30W | 30W | 30W | 30W | 15W * |
Power budget | n/a | 10W | 15W | 30W | 30W | 30W | 30W | 15W |
Mode ID | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
Online CPU | 8 | 2 | 4 | 8 | 6 | 4 | 2 | 4 |
CPU maximal frequency (MHz) | 2265.6 | 1200 | 1200 | 1200 | 1450 | 1780 | 2100 | 2188 |
GPU TPC | 4 | 2 | 4 | 4 | 4 | 4 | 4 | 4 |
GPU maximal frequency (MHz) | 1377 | 520 | 670 | 900 | 900 | 900 | 900 | 670 |
DLA cores | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
DLA maximal frequency (MHz) | 1395.2 | 550 | 750 | 1050 | 1050 | 1050 | 1050 | 115.2 |
PVA cores | 2 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
PVA maximal frequency (MHz) | 1088 | 0 | 550 | 760 | 760 | 760 | 760 | 115.2 |
CVNAS maximal frequency (MHz) | 1356.8 | 601.6 | 716.8 | 1011.2 | 1011.2 | 1011.2 | 1011.2 | 115.2 |
Memory maximal frequency (MHz) | 2133 | 1066 | 1333 | 1600 | 1600 | 1600 | 1600 | 1333 |
SOC clocks maximal frequency (MHz) All modes | adsp 300 ape 150 axi_cbb 408 bpmp 896 bpmp_apb 408 display 800 display_hub 400 | csi 400 host1x 408 isp 1190.4 nvdec 1190.4 nvenc 1075.2 nvjpg 716.8 pex 500 | rce 819.2 sce 729.6 se 1036.8 tsec 1036.8 vi 998.4 vic 1036.8 |
* The default mode is 15W (mode ID 7). Default mode is intended to improve desktop application performance. PVA and DLA are not in use and run at minimal frequency. |
NVPModel clock configuration for Jetson AGX Xavier Industrial |
Property | Mode |
MAXN | 20W | 40W* | 40W | 40W | 40W |
Power budget | n/a | 20W | 40W | 40W | 40W | 40W |
Mode ID | 0 | 1 | 2 | 3 | 4 | 5 |
Online CPU | 8 | 4 | 8 | 6 | 4 | 2 |
CPU maximal frequency (MHz) | | 1200 | 1200 | 1450 | 1780 | 2035.2 |
GPU TPC | 4 | 4 | 4 | 4 | 4 | 4 |
GPU maximal frequency (MHz) | 1211.3 | 670 | 900 | 900 | 900 | 900 |
DLA cores | 2 | 2 | 2 | 2 | 2 | 2 |
DLA maximal frequency (MHz) | 1228.8 | 750 | 1050 | 1050 | 1050 | 1050 |
PVA cores | 2 | 1 | 1 | 1 | 1 | 1 |
PVA maximal frequency (MHz) | 947.2 | 550 | 760 | 760 | 760 | 760 |
CVNAS maximal frequency (MHz) | 1203.2 | 716.8 | 1011.2 | 1011.2 | 1011.2 | 1011.2 |
Memory maximal frequency (MHz) | 2133 | 1600 | 1600 | 1600 | 1600 | 1600 |
SOC clocks maximal frequency (MHz) All modes | adsp 300 ape 150 axi_cbb 408 bpmp 755.2 bpmp_apb 408 display 768 display_hub 358.4 | csi 400 host1x 408 isp 1011.2 nvdec 960 nvenc 870.4 nvjpg 563.2 pex 500 | rce 678.4 sce 588.8 se 857.6 tsec 806.4 vi 819.2 vic 819.2 |
* The default mode is 40W (mode ID 2). |
To change the power mode
• Enter the command:
$ sudo /usr/sbin/nvpmodel -m <x>
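Where <x> is the ID of a power mode defined for your device, as listed in the tables above (for example, 0 through 8 for Jetson Xavier NX series, or 0 through 7 for Jetson AGX Xavier).
• Alternatively, use the nvpmodel GUI front end. For more information, see To use the nvpmodel GUI, later in this topic.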
Once you set a power mode, the module stays in that mode until you change it. The mode persists across power cycles and SC7.
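If you select power modes from a script, it can help to check the mode ID against the predefined range before invoking nvpmodel. The sketch below uses only the -m option shown above. The family keys and function names are hypothetical, and the ID limits are copied from the tables in this section, so custom modes that you add yourself fall outside them.

```python
import subprocess

# Highest predefined mode ID per device family, taken from the tables above.
# The dictionary keys are hypothetical labels, not names used by nvpmodel.
MAX_MODE_ID = {"xavier-nx": 8, "agx-xavier": 7, "agx-xavier-industrial": 5}


def nvpmodel_set_command(family, mode_id):
    """Build the command line that selects a predefined power mode."""
    if not 0 <= mode_id <= MAX_MODE_ID[family]:
        raise ValueError(f"mode {mode_id} is not predefined for {family}")
    return ["sudo", "/usr/sbin/nvpmodel", "-m", str(mode_id)]


def set_power_mode(family, mode_id):
    """Run nvpmodel. Raises CalledProcessError if the tool reports failure."""
    subprocess.run(nvpmodel_set_command(family, mode_id), check=True)
```

For example, set_power_mode("agx-xavier", 2) runs the same command as entering sudo /usr/sbin/nvpmodel -m 2 at the prompt.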
To display the current power mode
• Enter the command:
$ sudo /usr/sbin/nvpmodel -q
• Alternatively, see the mode displayed to the right of the NVIDIA icon in the nvpmodel window’s menu bar. For more information, see To use the nvpmodel GUI, later in this topic.
To learn about other options
• Enter the command:
$ /usr/sbin/nvpmodel -h
To define a custom power mode
• Add a mode definition to the file:
/etc/nvpmodel.conf
This is an example entry for mode 2:
< POWER_MODEL ID=2 NAME=MODE_15W >
CPU_ONLINE CORE_0 1
CPU_ONLINE CORE_1 1
CPU_ONLINE CORE_2 1
CPU_ONLINE CORE_3 1
CPU_ONLINE CORE_4 0
CPU_ONLINE CORE_5 0
CPU_ONLINE CORE_6 0
CPU_ONLINE CORE_7 0
CPU_DENVER_0 MIN_FREQ 1200000
CPU_DENVER_0 MAX_FREQ 1200000
CPU_DENVER_1 MIN_FREQ 1200000
CPU_DENVER_1 MAX_FREQ 1200000
GPU MIN_FREQ 0
GPU MAX_FREQ 670000000
EMC MAX_FREQ 1331200000
DLA_CORE MAX_FREQ 750000000
DLA_FALCON MAX_FREQ 450000000
PVA_VPS MAX_FREQ 550000000
PVA_CORE MAX_FREQ 385000000
The unit of measure for CPU frequency is kilohertz. The unit for GPU and EMC frequency is hertz. You must assign each custom mode a unique number in the ID field. Test your use case to determine:
• How many active cores to use
• The frequency for each CPU cluster, and the GPU and EMC frequencies
The frequencies you select are subject to the MaxN limit defined in mode 0.
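Before you add a custom mode, it can be useful to inspect the existing definitions programmatically, for example to confirm that a new ID is unique. The following sketch parses POWER_MODEL entries in the format of the example above. It is an assumption-laden illustration rather than a complete parser for /etc/nvpmodel.conf: it ignores every other section type, and the function names are hypothetical.

```python
import re

HEADER = re.compile(r"<\s*POWER_MODEL\s+ID=(\d+)\s+NAME=(\S+)\s*>")


def parse_power_models(text):
    """Parse POWER_MODEL blocks into {id: {"name": ..., "settings": [...]}}."""
    models, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        header = HEADER.match(line)
        if header:
            current = {"name": header.group(2), "settings": []}
            models[int(header.group(1))] = current
        elif line.startswith("<"):
            current = None  # some other section type; skip its contents
        elif current is not None:
            parts = line.split()
            if len(parts) == 3:
                param, arg, value = parts
                # Keep non-numeric values as strings instead of failing.
                value = int(value) if value.isdigit() else value
                current["settings"].append((param, arg, value))
    return models


def online_cores(model):
    """Count the cores that a parsed mode brings online."""
    return sum(v for p, _, v in model["settings"] if p == "CPU_ONLINE")
```

Applied to the example entry for mode 2, online_cores() reports four cores, which matches the Online CPU row for the 15W mode in the Jetson AGX Xavier table.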
Fan Mode Control
Jetson Xavier supports two modes of fan operation named “quiet” and “cool.” BSP manages fan start, fan speed, and stop states based on the trip point temperatures configured for the selected mode.
Fan Mode Configuration
Every fan speed step is associated with a trip point temperature and a corresponding hysteresis. The following tables show the configurations predefined by NVIDIA.
Fan mode configuration for Jetson Xavier NX series |
Fan mode “quiet” |
Trip temperature * | 0 | 46 | 60 | 68 | 76 |
Hysteresis * | 0 | 8 | 8 | 7 | 7 |
Fan PWM value | 0 | 130 | 160 | 200 | 255 |
Fan mode “cool” |
Trip temperature * | 0 | 35 | 45 | 53 | 61 |
Hysteresis * | 0 | 8 | 8 | 7 | 7 |
Fan PWM value | 0 | 140 | 170 | 200 | 255 |
* Trip temperature and hysteresis in degrees Celsius. |
Fan mode configuration for Jetson AGX Xavier series |
Fan mode “quiet” |
Trip temperature * | 0 | 50 | 63 | 72 | 81 |
Hysteresis * | 0 | 18 | 8 | 8 | 8 |
Fan PWM value | 0 | 77 | 120 | 160 | 255 |
Fan mode “cool” |
Trip temperature * | 0 | 35 | 53 | 62 | 73 |
Hysteresis * | 0 | 9 | 8 | 8 | 9 |
Fan PWM value | 0 | 77 | 120 | 160 | 255 |
* Trip temperature and hysteresis in degrees Celsius. |
The framework implements hysteresis to prevent frequent changes in fan speed. For Jetson AGX Xavier series devices, as an example, when fan mode is set to “quiet” with the default settings shown above, the framework performs these actions:
• Turns on the fan when the temperature rises to 50° C
• Turns off the fan when the temperature falls to 32° C
• Turns on the fan again when the temperature rises to 50° C, and so on
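The hysteresis behavior listed above can be modeled in a few lines. This is a simplified sketch of the idea, not the pwm-fan driver's implementation: it assumes each step engages when the temperature reaches its trip point and releases when the temperature falls to the trip point minus the hysteresis. The function name is hypothetical, and the step values are the “quiet” settings for Jetson AGX Xavier series from the table above.

```python
# (trip temperature, hysteresis, PWM value) from the "quiet" table above.
AGX_QUIET = [(0, 0, 0), (50, 18, 77), (63, 8, 120), (72, 8, 160), (81, 8, 255)]


def next_fan_pwm(temp_c, current_pwm, steps=AGX_QUIET):
    """Return the PWM value to apply for a new temperature reading."""
    pwm = 0
    for trip_c, hyst_c, step_pwm in steps:
        if step_pwm == 0:
            continue  # the first step represents "fan off"
        if current_pwm >= step_pwm:
            # Step already engaged: hold it until the release point is reached.
            active = temp_c > trip_c - hyst_c
        else:
            # Step not engaged: wait for the trip point itself.
            active = temp_c >= trip_c
        if active:
            pwm = max(pwm, step_pwm)
    return pwm
```

With these values the fan starts at 50° C (PWM 77), keeps running while the temperature stays above 32° C, and stops when the temperature falls to 32° C, as described in the list above.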
Default Fan Mode
For Jetson Xavier the fan mode is set to “quiet” by default. It is defined as FAN_CONFIG DEFAULT in the configuration file /etc/nvpmodel.conf.
To change the default fan mode
Set the default fan mode by putting the following property in nvpmodel.conf:
< FAN_CONFIG DEFAULT=<fan_mode> >
Where <fan_mode> is quiet or cool.
To change the fan mode
• Enter the command:
$ sudo /usr/sbin/nvpmodel -d <fan_mode>
Where <fan_mode> is quiet or cool.
To identify the current fan mode
• Enter the command:
$ sudo /usr/sbin/nvpmodel -q
Once you set a fan mode, the module stays in that mode until you change it. The mode persists across power cycles and SC7.
Thermal Management
Thermal management is essential for system stability and quality of user experience. Jetson Xavier thermal management provides the following capabilities:
• Sensing: for on-board and on-die thermal sensor temperature reporting
• Cooldown: for removing heat via the fan and for controlling heat via software clock throttling
• Slowdown: for hardware clock throttling
• Shutdown: for orderly software shutdown and hardware thermal shutdown
Thermal management in Jetson Xavier is performed by:
• The Linux kernel, which monitors on-board thermal sensors, performs cooldown, and supports software and hardware thermal shutdown
• The Boot and Power Management Processor (BPMP), which monitors on-die thermal sensors, and performs slowdown and hardware thermal shutdown
The following table identifies each thermal management action and the associated module for the SoC.
Thermal Action | Linux Device Driver | Associated Module |
Sensing | soctherm.c | BPMP firmware |
aotag.c | BPMP firmware |
nct1008.c | Kernel software |
Cooldown for software throttling | tegraXX_throttle.c | Kernel software |
Cooldown for fan | pwm_fan.c | Kernel software |
Slowdown for hardware throttling | soctherm.c | BPMP firmware |
Software shutdown | thermal_core.c | Kernel software |
Hardware shutdown | soctherm.c and aotag.c | BPMP firmware |
nct1008.c | Kernel software |
Linux Thermal Framework
The Linux thermal framework provides generic user space and kernel space interfaces for working with devices that measure or control temperature. The central component of the framework is the thermal zone.
For more information about the Linux thermal framework, see:
<top>/kernel/kernel-4.9/Documentation/thermal/sysfs-api.txt
Thermal Zone
A thermal zone is a virtual object that represents an area on the die whose temperature is monitored and controlled. A thermal zone acts as an object with the following components:
• Temperature sensor
• Cooling device
• Trip points
• Governor
BSP includes drivers that provide interfaces to these components.
This topic introduces these components and demonstrates how they form a thermal zone on an NVIDIA SoC.
Configuring a Thermal Zone Using the Device Tree
A thermal zone provides knobs to tune the thermal response of the zone. BSP provides several thermal zones tuned to provide optimum thermal performance. You can modify the provided thermal zones by editing their entries in the device tree: you can define which sensors to use, the temperature limits, and the cooling actions taken at those limits. In most cases, a device that becomes too hot can be corrected by tuning its thermal zone.
The following code snippet provides an example of a thermal zone for Jetson AGX Xavier. This thermal zone monitors the temperature of the THERMAL_ZONE_GPU sensor. Clock throttling is performed using the GPU-balanced cooling device when the passive trip point, trip_bthrot, is crossed at 88° C.
GPU-therm {
    status = "okay";
    polling-delay-passive = <500>;
    thermal-zone-params {
        governor-name = "step_wise";
    };
    trips {
        trip_critical {
            temperature = <93500>;
            type = "critical";
            hysteresis = <0>;
            writable;
        };
        trip_bthrot {
            temperature = <88000>;
            type = "passive";
            hysteresis = <0>;
            writable;
        };
    };
    cooling-maps {
        map0 {
            trip = <&{/thermal-zones/GPU-therm/trips/trip_bthrot}>;
            cdev-type = "gpu-balanced";
            cooling-device = <&{/bthrot_cdev/gpu_balanced} THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
        };
    };
};
For more information about thermal knobs, see:
<top>/kernel/kernel-4.9/Documentation/devicetree/bindings/thermal/thermal.txt
Temperature Sensors
A temperature sensor in a thermal zone is responsible for reporting the temperature in millidegrees Celsius. An NVIDIA SoC has several types of temperature sensors on the die and board.
For more information, see Thermal Sensing in Linux.
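Because sensors report in millidegrees Celsius, values read from the Linux thermal sysfs interface must be divided by 1000 before they are displayed. The sketch below assumes the generic /sys/class/thermal layout (a type node and a temp node per zone). The function name and the root parameter are hypothetical.

```python
from pathlib import Path

THERMAL_ROOT = Path("/sys/class/thermal")


def read_thermal_zones(root=THERMAL_ROOT):
    """Return {zone type: temperature in degrees Celsius}."""
    temps = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        zone_type = (zone / "type").read_text().strip()
        try:
            millidegrees = int((zone / "temp").read_text())
        except (OSError, ValueError):
            continue  # skip a zone whose sensor cannot be read right now
        temps[zone_type] = millidegrees / 1000.0
    return temps
```

On a target device the keys of the returned dictionary are the thermal zone names, such as GPU-therm.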
Trip Points
Thermal management uses trip points to communicate with thermal zones. A trip point identifies the temperature at which to perform a thermal action.
Trip points are classified as active or passive, based on the type of cooling they trigger. A trip point is classified as critical if it triggers a thermal shutdown. A cooling map specifies how a cooling device is associated with certain trip points. Jetson BSP supports two cooling actions: fan control and clock throttling.
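The trip points of a zone can be listed at runtime. The following sketch assumes that the zone directory exposes the generic Linux thermal sysfs nodes trip_point_<n>_type and trip_point_<n>_temp. The function name is hypothetical.

```python
from pathlib import Path


def read_trip_points(zone_dir):
    """Return the zone's trip points as (index, type, temperature in deg C)."""
    trips = []
    for type_file in Path(zone_dir).glob("trip_point_*_type"):
        index = int(type_file.name.split("_")[2])
        temp_file = type_file.with_name(f"trip_point_{index}_temp")
        trip_type = type_file.read_text().strip()
        temp_c = int(temp_file.read_text()) / 1000.0
        trips.append((index, trip_type, temp_c))
    return sorted(trips)
```

For a zone defined with one critical and one passive trip point, the result contains one tuple for each, ordered by trip index.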
Cooling Devices
A cooling device reduces the temperature of a power dissipating device. There are essentially two types of cooling devices:
• An active cooling device, such as a fan, reduces the temperature of a power dissipating device by removing heat.
• A passive cooling device, such as software or hardware clock throttling, reduces temperature by reducing device performance, and so reducing heat dissipation.
For more information, see Thermal Cooling.
Governors
Thermal management requires some form of feedback control system that keeps the device within a safe operating temperature. A governor implements this feedback control loop. While the Linux thermal framework provides many different governors, BSP provides a simple Proportional Integral Derivative (PID) controller for all passive throttling needs.
BSP-Specific Thermal Zones
BSP defines platform-specific thermal zones. The zones are tuned to provide the best performance within the thermal constraints of the device. Each thermal zone uses a temperature sensor that is controlled by either the Linux kernel or the BPMP firmware, as described in the following table.
Thermal Zone | Thermal Sensor | Associated Module |
CPU-therm | THERMAL_ZONE_CPU | BPMP firmware |
GPU-therm | THERMAL_ZONE_GPU | BPMP firmware |
AUX-therm | THERMAL_ZONE_AUX | BPMP firmware |
AO-therm | THERMAL_ZONE_AO | BPMP firmware |
Tdiode_tegra | tmp451 | Linux kernel |
PMIC-Die | PMIC | Linux kernel |
Tboard_tegra | tmp451 | Linux kernel |
thermal-fan-est | Weighted average of CPU-therm, GPU-therm, & AUX-therm (3:3:4) | Linux kernel |
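The thermal-fan-est row can be read as a weighted mean of three zone temperatures. The sketch below assumes a plain weighted average with the 3:3:4 ratio from the table. The actual estimator may apply additional filtering that this sketch does not model, and the function name is hypothetical.

```python
def fan_estimate_c(cpu_c, gpu_c, aux_c, weights=(3, 3, 4)):
    """Weighted average of CPU-therm, GPU-therm, and AUX-therm temperatures."""
    w_cpu, w_gpu, w_aux = weights
    total = w_cpu + w_gpu + w_aux
    return (w_cpu * cpu_c + w_gpu * gpu_c + w_aux * aux_c) / total


# Example: (3*50 + 3*60 + 4*40) / 10
print(fan_estimate_c(50, 60, 40))
```

Because AUX-therm carries the largest weight, a change in that zone moves the estimate more than an equal change in CPU-therm or GPU-therm.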
For more information, see Thermal Management in BPMP.
Gains achieved by tuning are limited by the Thermal Design Power (TDP) of the system. Tuning cannot remedy a faulty TDP. Removing all of the thermal zones does not guarantee maximum performance, and can cause resets and/or irreversible damage to the device.
Thermal Management in Linux
The Linux kernel provided by BSP includes several drivers for on-board and on-die temperature sensing.
Thermal Sensors
Jetson AGX Xavier has several types of sensors to support hardware and software cooling strategies.
NCT Sensors
BSP includes a driver for on-board sensor devices such as:
• NCT1008
• NCT72
• TMP451
Note: | Jetson Xavier NX series devices do not integrate an external thermal sensor. |
These devices can sense their own temperature as well as the temperature of a remote diode. NVIDIA SoC platforms have these sensors set up as follows:
Thermal Zone | Thermal Sensor | Sensed Location |
Tdiode_tegra | Remote sensor | Temperature on die near GPU |
Tboard_tegra | Local sensor | Temperature of the board |
BSP configures these sensors to operate in an extended mode to increase the temperature range to −64° C to 191° C.
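The extended range spans 256 values, which is consistent with an 8-bit reading offset by 64. The sketch below assumes that encoding (a raw value of 0 represents −64° C) for the whole-degree part of the reading. Check the data sheet for your sensor before relying on it. The function name is hypothetical.

```python
EXTENDED_OFFSET_C = 64  # assumed offset applied in extended mode


def raw_to_celsius(raw):
    """Convert an 8-bit extended-mode reading to whole degrees Celsius."""
    if not 0 <= raw <= 255:
        raise ValueError("raw reading must fit in 8 bits")
    return raw - EXTENDED_OFFSET_C
```

Under this assumption the raw values 0 and 255 map to the two ends of the range stated above.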
Operation During SC7
On many platforms, the voltage rail that powers the sensor is gated when the SoC enters the SC7 state. Consequently, the sensor is stopped when the SoC enters SC7 and restarted when it exits that state.
Thermal Capabilities
The NCT sensors generate thermal events for:
• Thermal zone trip points
• Hardware thermal shutdown
Correction Offset
The NCT sensors allow software to program a static offset temperature for the remote sensor. This accounts for any inaccuracy that may be present in the sensor hardware. BSP reads the offset from the device tree and programs it into the offset register on boot. The offset is calculated and validated via oil bath experiments.
BPMP Sensors
Jetson Xavier replaces the soctherm and aotag drivers in the Linux kernel with the tegra_bpmp_thermal sensor driver. This module registers itself as the sensor device driver with the Linux thermal framework for all the thermal sensors except the NCT sensors.
Each BPMP sensor is exposed using the Application Binary Interface (ABI), and has an ABI name as shown in the table in BSP-Specific Thermal Zones. BPMP sensors, without the thermal_zone prefix, work as described in the following paragraphs. All BPMP sensors have one programmable temperature threshold (one trip), allocated for a thermal zone trip point.
The tegra_bpmp_thermal driver walks through the list of thermal trip points in a thermal zone based on the current temperature. It then comes up with a trip to program the BPMP sensor that is specified in the thermal zone. The driver then uses the following thermal message requests (MRQs) to communicate with the BPMP thermal framework.
• CMD_THERMAL_QUERY_ABI
• CMD_THERMAL_GET_TEMP
• CMD_THERMAL_SET_TRIP
• CMD_THERMAL_GET_NUM_ZONES
The driver receives a CMD_THERMAL_HOST_TRIP_REACHED MRQ message when a particular sensor crosses a trip. The message is then relayed back to the Linux thermal framework.
For more information on these thermal management features provided as part of BSP, see Thermal Management in BPMP.
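Because the sensors are registered with the Linux thermal framework, their zones can normally be inspected through the generic thermal sysfs interface. The sketch below assumes the standard layout (/sys/class/thermal/thermal_zone*/type and temp, with temp in millidegrees Celsius); the function name is hypothetical, and zones that cannot be read are skipped.

```python
from pathlib import Path


def read_thermal_zones(root="/sys/class/thermal"):
    """Return {zone type: temperature in degrees C} for readable zones."""
    zones = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            name = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            # A zone whose sensor is stopped may fail to report a value.
            continue
        zones[name] = millideg / 1000.0
    return zones
```

Calling read_thermal_zones() on a device would be expected to return entries keyed by the zone names listed in BSP-Specific Thermal Zones.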
Thermal Cooling
BSP provides thermal management using fan control and throttling of various clocks in the system.
Fan Management
BSP provides active cooling by fan management through the cooling device pwm-fan, which provides:
• Fan speed control by programming the PWM controller
• Ramp-up and ramp-down control to change the speed of the fan smoothly
• Fan control during various power states
The PWM-RPM mapping, and the various ramp rates, are stored as part of the device tree binary. The pwm-fan cooling device maps these PWM values to a cooling state. The fan cooling device can be attached to monitor the temperature of any of the BSP sensors. As the temperature increases, the governor picks a progressively deeper cooling state for the fan. This results in a higher RPM for the fan, which produces more cooling.
SoC thermal management uses the fan as the first line of defense to delay clock throttling until a much higher temperature is reached.
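The ramp-up and ramp-down behavior can be pictured as limiting how far the PWM value moves per update. This is only a sketch of the idea: the step size, the integer PWM values, and the function names are assumptions, and the real ramp rates come from the device tree.

```python
def ramp_step(current_pwm, target_pwm, max_step):
    """Move one update toward target_pwm, changing by at most max_step."""
    delta = target_pwm - current_pwm
    delta = max(-max_step, min(max_step, delta))
    return current_pwm + delta


def ramp_sequence(start_pwm, target_pwm, max_step):
    """List the PWM values visited while ramping from start to target."""
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    sequence = [start_pwm]
    while sequence[-1] != target_pwm:
        sequence.append(ramp_step(sequence[-1], target_pwm, max_step))
    return sequence
```

With a step of 40, a change from 0 to 100 passes through 40 and 80 rather than jumping directly to the target.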
Software Clock Throttling
BSP provides thermal cooling by throttling various clocks in the system. When a thermal sensor’s temperature rises above a throttling trip point, clock throttling employs the DVFS capabilities of the clocks to reduce their operating frequencies, and thereby the voltages of the rails that power the clocks. This reduction in frequency and voltage reduces power consumption which helps to control the temperature.
Because BSP provides cooling by reducing the clock frequency, it directly impacts performance and the user experience. If a device feels warm and seems sluggish, it may be due to thermal throttling on the clocks. This can be remedied by tuning the thermal zones provided in the following BSP balanced cooling devices:
• gpu_balanced
• cpu_balanced
• aux_balanced
• emergency_balanced
Each of these balanced cooling devices provides several cooling states, each of which translates to a maximum allowable operating frequency for the CPU, GPU, and EMC clocks. These frequencies are optimized to provide the best possible performance at a given temperature. The frequency tables for these clocks are part of the device tree binary.
The governor uses the current temperature of a thermal zone as an input to the feedback control loop. Similarly, it uses the output of the control loop to set a new cooling state for the thermal zone’s cooling device. As the device heats up, the governor picks progressively deeper cooling states, which result in lower frequency caps for all of the clocks, and potentially greater cooling. BSP performs this thermal throttling of the clocks to maintain the junction temperature of the die within the recommended safe limits. For software throttling trip temperatures, see the table in Thermal Specifications.
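The mapping from a cooling state to a maximum allowable frequency can be sketched as a table lookup followed by a clamp. The frequency values below are hypothetical placeholders, not the values stored in the device tree, and the function names are illustrative.

```python
# Hypothetical caps in kHz, indexed by cooling state (0 = no throttling).
EXAMPLE_CPU_CAPS_KHZ = [2000000, 1600000, 1200000, 800000]


def cap_for_state(state, caps=EXAMPLE_CPU_CAPS_KHZ):
    """Return the frequency cap for a cooling state, clamping the index."""
    index = max(0, min(state, len(caps) - 1))
    return caps[index]


def apply_cap(requested_khz, state, caps=EXAMPLE_CPU_CAPS_KHZ):
    """Limit a requested frequency to the cap of the current cooling state."""
    return min(requested_khz, cap_for_state(state, caps))
```

A request below the cap passes through unchanged, so throttling only affects workloads that would otherwise run faster than the cap.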
Software Thermal Shutdown
The thermal zones also define a special type of trip point called a critical trip point that triggers a software shutdown. This special trip point allows the operating system to save its state and perform an orderly shutdown before high temperature forces a hardware reset. BSP defines one critical trip point per thermal zone. Users can set the lower limit for the orderly shutdown. A thermal shutdown occurs after all the other cooling strategies have failed. It is considered a rare event. For software shutdown trip temperatures, see the table in Thermal Specifications.
Hardware Thermal Shutdown
The on-board sensor is configured to trigger hardware shutdown when all other cooling strategies have failed, and in particular, after software shutdown has failed to occur when it should. For hardware shutdown limits, see the table in Thermal Specifications.
Thermal Management in BPMP
BSP thermal management features are part of the firmware running on BPMP for Jetson platforms running any host operating system (host OS) on the CPU.
Thermal Sensing
The BPMP firmware hosts the soctherm and aotag drivers for the on-die thermal sensors as follows:
Thermal Sensor Block | Thermal Sensor | ABI Name | Sensed Location |
---|
AOTAG | AOTAG | THERMAL_ZONE_AO | Co-located with TDIODE in pad-ring |
SOC_THERM | PLLX | THERMAL_ZONE_PLLX | Center of CPU cluster |
SOC_THERM | AUX (x3) | THERMAL_ZONE_AUX | Near CV cluster, SoC cluster |
SOC_THERM | CPU | THERMAL_ZONE_CPU | Center of CPU cluster |
SOC_THERM | GPU | THERMAL_ZONE_GPU | Center of GPU |
SOC_THERM
SOC_THERM is the collection of on-chip ring oscillators whose frequency changes are based on temperature. To convert a measured frequency to a temperature, the oscillating frequency of the sensor, at a fixed temperature, must be known in advance and stored in the on-chip fuses.
The soctherm driver uses these fuses during boot and calibrates the sensor. Once the calibration is complete, the temperature sensor reports the temperature, in degrees Celsius, with a 0.5° C precision margin.
Sensors and Sensor Groups
The temperature sensors on the chip are logically grouped into sensor groups, based on their proximity to certain hardware blocks. Each sensor group is represented as a single sensor to the host OS and the BPMP firmware.
For example, Jetson AGX Xavier has two temperature sensors in the SOC cluster and one near the CV cluster. These are grouped as AUX sensors that are represented as THERMAL_ZONE_AUX to the operating system running on the CPUs. SOC_THERM reports the temperature of a given group by taking the maximum of all the sensors in the group.
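The group reading described above reduces to taking the maximum over the group's member sensors. A minimal sketch, with a hypothetical function name:

```python
def group_temperature(sensor_temps_c):
    """Return the temperature reported for a sensor group (the maximum)."""
    if not sensor_temps_c:
        raise ValueError("a sensor group needs at least one sensor")
    return max(sensor_temps_c)
```

For the three AUX sensors, readings of 61.0, 63.5, and 62.0 degrees C would be reported as 63.5 degrees C for THERMAL_ZONE_AUX.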
Thermal Event Detection
Thermal sensors can report the temperature when the current temperature crosses a software programmed trip point. The sensors are capable of monitoring several of these software trip points to perform the following thermal actions:
• Report when the thermal trip point has been crossed
• Trigger a hardware thermal shutdown
• Trigger hardware throttling
Voltage Rail Dependencies
To provide accurate temperature sensing, the sensors require a minimum voltage. Additionally, the sensors cannot operate when the rail is power-gated.
When the system is in a low-power state, the firmware provides the following modes of operation:
• No temperature measurements during SC7: Because the rail powering the sensor is power-gated in the SC7 state, the oscillator is not running. Therefore, the frequency-to-temperature conversion may result in inaccurate values. To ensure no spurious temperature reports from the sensors, stop the sensor before entering the SC7 state.
The firmware provides the AOTAG sensor for measuring temperature in the SC7 state. When the SC7 state is exited, the sensors are restarted.
• Fallback to PLLX sensor on Jetson Xavier: To ensure accurate temperature readings during minimum voltage, use the PLLX sensor’s oscillator. On platforms where the minimum voltage is not guaranteed, the firmware falls back on the PLLX sensor’s oscillator with a programmable offset. The result is that all the sensors invalidate their oscillators and use the PLLX sensor’s oscillator with the added offset. This fallback on the PLLX sensor’s oscillator allows for continuous temperature measurement, even at lower voltage levels.
As a side effect of PLLX fallback, the programmable offset compensates for the fact that the PLLX sensor’s oscillator is farther away from the oscillator that it is replacing. The host OS continues to use all the thermal zones without side effects. The offset ensures that the CPU sensor reports more accurate temperatures than the PLLX sensor. The host OS must therefore continue to use the right sensors for measuring the CPU temperatures.
AOTAG
The Always-On Thermal Alert Generator (AOTAG) is a ring oscillator based temperature sensor. It is in the always-on power domain and can monitor temperatures even when the device is in the SC7 state. Apart from this distinction, the AOTAG sensor operates the same as any of the SOC_THERM sensors.
Thermal Event Detection
Just like the SOC_THERM sensor, the AOTAG sensor can generate interrupts. Additionally, it can monitor two software programmed levels that BSP uses as:
• Thermal zone trip points
• Hardware thermal shutdown
BPMP Thermal Framework
The BPMP firmware hosts a thermal framework to:
• Register thermal sensors as thermal zones as identified in Thermal Sensing
• Allow BPMP modules to register trips on the thermal zones
• Allow the host OS to register trips using thermal MRQ messages
• Provide trip management and reporting
The thermal framework maintains a list of trips per sensor that includes the current trip from the host OS and various BPMP modules. As temperatures change, the framework examines the list of current trips and notifies the owners of the trips of the changes. The notification is sent using a callback for the BPMP owned trips and the thermal MRQ command CMD_THERMAL_HOST_TRIP_REACHED for trips that are owned by the host OS.
The primary thermal MRQ requests handled by the framework are:
• CMD_THERMAL_QUERY_ABI
• CMD_THERMAL_GET_TEMP
• CMD_THERMAL_SET_TRIP
• CMD_THERMAL_GET_NUM_ZONES
Since there can be several trips on a given sensor, the thermal framework must ensure that a notification is generated whenever a given trip is crossed. For example, if THERMAL_ZONE_CPU has trips at 55°, 60°, 65°, and 70° C, the thermal framework sends one notification for each trip as the temperature crosses 55°, 60°, 65°, and 70° C.
Additionally, the framework implements hysteresis to prevent sending too many notifications. So, for the above example, the framework:
• Sends one notification when the temperature reaches 55° C
• Waits until the temperature drops below 54° C
• Sends another notification when the temperature rises back to 55° C
To perform these notifications, the thermal framework sets low trips on the sensors to receive events that the temperature has dropped below the limit.
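The notification and hysteresis behavior described above can be modeled in a few lines. This sketch assumes a 1° C hysteresis (inferred from the 55° C and 54° C values in the example) and a simple armed/disarmed flag per trip; it is a model of the described behavior, not the firmware's implementation.

```python
def trip_notifications(samples_c, trips_c, hysteresis_c=1.0):
    """Return (temperature, trip) pairs for each notification sent.

    A trip notifies once when the temperature reaches it, then stays
    quiet until the temperature falls below trip - hysteresis_c.
    """
    armed = {trip: True for trip in trips_c}
    events = []
    for temp in samples_c:
        for trip in sorted(trips_c):
            if armed[trip] and temp >= trip:
                events.append((temp, trip))
                armed[trip] = False
            elif not armed[trip] and temp < trip - hysteresis_c:
                armed[trip] = True  # low trip crossed; re-arm
    return events
```

With trips at 55, 60, 65, and 70 and samples of 50, 55, 54.5, 55.5, 53.5, and 55, only the first and last samples at 55 produce a notification, because the dip to 54.5 does not fall below 54.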
Hardware Throttling
Each element in a power delivery system includes limitations such as:
• The amount of current a battery can supply without shutting down
• The amount of current a regulator can provide before it fails to maintain its output voltage
• The amount of ripple current an inductor in a switching regulator can tolerate without overheating
These limitations can result in fast transient electrical and thermal events such as:
• Overcurrent at the battery
• Voltage drop at the PMIC
• Temperature spikes
The firmware refers to these events as OC alarms, and triggers hardware throttling of the clocks to handle them.
Impact
Like software throttling, hardware throttling may reduce performance. Because the triggering events are rare and transient in nature, though, the user experience is minimally impacted.
The host OS is not notified of these events, but you can detect the drop in clock rates by using a performance measuring tool that samples the CPU cycle counters. While thermal management in the host OS seeks to control temperature on an ongoing basis, hardware throttling clamps down the clocks to handle events.
Throttle Points and Vector Configuration
The BPMP device tree binary holds the various throttle points and the throttle settings that govern when and how throttling is performed. The soctherm driver in the firmware programs the hardware and handles any interrupts resulting from these events. You can change the throttle points by changing the BPMP device tree.
This table shows the hardware throttling levels.
Hardware Throttling | Clock Throttled Percentage |
---|
Heavy | 87.5 |
Medium | 75 |
Light | 50 |
Throttle vectors are optimized for limiting peak current consumption while maximizing performance. To manage peak current consumption, the firmware supports capping the CPU and GPU clocks at three levels (light, medium, and heavy), as described in the device tree bindings. Clock capping prevents the CPU and GPU from drawing more current than their voltage regulators can supply. For hardware throttling trip temperatures, see the table in Thermal Specifications.
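To get a feel for the three throttling levels, the sketch below estimates the effective clock rate at each level. It assumes that "Clock Throttled Percentage" means the fraction by which the clock is reduced, so heavy throttling leaves 12.5% of the nominal rate; that reading of the table is an assumption, and the function name is hypothetical.

```python
# Percentages from the hardware throttling table above.
THROTTLE_PERCENT = {"light": 50.0, "medium": 75.0, "heavy": 87.5}


def throttled_rate(nominal_hz, level):
    """Estimate the effective clock rate under a given throttling level.

    Assumption: the table value is the percentage removed from the clock.
    """
    percent = THROTTLE_PERCENT[level]
    return nominal_hz * (100.0 - percent) / 100.0
```

Under this assumption, a 2 GHz clock would run at an effective 1 GHz, 500 MHz, and 250 MHz for light, medium, and heavy throttling respectively.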
Design Considerations
Designing failsafe measures into Power Management Integrated Circuits (PMICs), or using the battery controller to shut down the device when the events described here occur, results in a bad user experience. Similarly, designing power delivery hardware for worst-case loads results in large and costly components.
Consequently, NVIDIA SoCs are designed for use with power delivery systems that are adequate for common loads. NVIDIA SoCs actively manage their components to avoid exceeding their design limits. When events are transient, the advantage of this approach to power management becomes more compelling.
Hardware Thermal Shutdown
The final failsafe for firmware thermal management is a hardware thermal reset or thermtrip. If software and hardware throttling are unable to control heat generation in the system, and the software becomes unresponsive, the SoC asserts the reset pin on the PMIC as the hardware shutdown mechanism.
For hardware shutdown limits, see the table in Thermal Specifications.
Thermal Specifications
This table lists the trip temperatures for each thermal zone and cooling action.
Thermal Zone | Thermal Sensor | Cooling Action | Jetson Xavier NX 10W‑30W | Jetson AGX Xavier 10W‑30W 1 | Jetson AGX Xavier maxEDP 2 | Jetson AGX Xavier Industrial 20W‑40W 3 | Jetson AGX Xavier Industrial maxEDP 4 |
CPU-therm | THERMAL_ZONE_PLLX | SW throttling | 90.5° C | 90.0° C | 86.0° C | 103.0° C | 96.0° C |
CPU-therm | THERMAL_ZONE_PLLX | HW throttling | 94.5° C | 94.0° C | 90.0° C | 107.0° C | 100.0° C |
CPU-therm | THERMAL_ZONE_PLLX | SW shutdown | 96.0° C | 95.5° C | 91.5° C | 108.5° C | 101.5° C |
CPU-therm | THERMAL_ZONE_PLLX | HW shutdown | 96.5° C | 96.0° C | 92.0° C | 109.0° C | 102.0° C |
GPU-therm | THERMAL_ZONE_GPU | SW throttling | 91.5° C | 92.5° C | 88.0° C | 102.0° C | 98.0° C |
GPU-therm | THERMAL_ZONE_GPU | HW throttling | 95.5° C | 96.5° C | 92.0° C | 106.0° C | 102.0° C |
GPU-therm | THERMAL_ZONE_GPU | SW shutdown | 97.0° C | 98.0° C | 93.5° C | 107.5° C | 103.5° C |
GPU-therm | THERMAL_ZONE_GPU | HW shutdown | 97.5° C | 98.5° C | 94.0° C | 108.0° C | 104.0° C |
AUX-therm | THERMAL_ZONE_AUX | SW throttling | 90.0° C | 89.0° C | 82.0° C | 100.5° C | 92.5° C |
AUX-therm | THERMAL_ZONE_AUX | HW throttling | 94.0° C | 93.0° C | 86.0° C | 104.5° C | 96.5° C |
AUX-therm | THERMAL_ZONE_AUX | SW shutdown | 95.5° C | 94.5° C | 87.5° C | 106.0° C | 98.0° C |
AUX-therm | THERMAL_ZONE_AUX | HW shutdown | 96.0° C | 95.0° C | 88.0° C | 106.5° C | 98.5° C |
AO-therm | THERMAL_ZONE_AO | HW shutdown | 109.0° C | 109.0° C | 109.0° C | 119.0° C | 119.0° C |
PMIC-Die | PMIC thermal sensor | HW shutdown | 120.0° C | 120.0° C | 120.0° C | 120.0° C | 120.0° C |
Tboard-tegra | TMP451 local sensor | HW shutdown | | 107.0° C | 107.0° C | 125.0° C | 125.0° C |
Tdiode-tegra | TMP451 external sensor | HW shutdown | | 109.0° C | 109.0° C | 117.0° C | 117.0° C |
thermal-fan-est 5 | Weighted average of CPU, GPU and AUX | Fan speed control | 46.0° C | 50.0° C | 50.0° C | 50.0° C | 50.0° C |
1 Jetson AGX Xavier uses the 10W‑30W thermal specification when it is flashed with jetson-xavier.conf.
2 Jetson AGX Xavier uses the maxEDP thermal specification when it is flashed with jetson-xavier-maxn.conf.
3 Jetson AGX Xavier Industrial uses the 20W‑40W thermal specification when it is flashed with jetson-agx-xavier-industrial.conf.
4 Jetson AGX Xavier Industrial uses the maxEDP thermal specification when it is flashed with jetson-agx-xavier-industrial-mxn.conf.
5 The thermal-fan-est thermal zone’s trip temperature depends on the selected fan mode. For details, see Fan Mode Control.
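As a reading aid for the table, the sketch below classifies a CPU-therm temperature against the Jetson AGX Xavier 10W-30W column. The thresholds are copied from that column; treating each trip as inclusive (temperature greater than or equal to the threshold) is an assumption, as are the names used.

```python
# CPU-therm trips for Jetson AGX Xavier 10W-30W, highest first (degrees C).
CPU_THERM_AGX_10W_30W = [
    (96.0, "HW shutdown"),
    (95.5, "SW shutdown"),
    (94.0, "HW throttling"),
    (90.0, "SW throttling"),
]


def cooling_action(temp_c, trips=CPU_THERM_AGX_10W_30W):
    """Return the most severe action whose trip has been reached, or None."""
    for threshold_c, action in trips:
        if temp_c >= threshold_c:
            return action
    return None
```

The ordering shows the intended escalation: software throttling engages first, and hardware shutdown is the last resort only half a degree above the software shutdown trip.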
Software-Based Power Consumption Modeling
Jetson Xavier NX series and Jetson AGX Xavier series modules integrate a three-channel INA3221 power monitor whose information can be read using sysfs nodes. The following table shows the naming convention for sysfs nodes.
Command | Description |
rail_name_<N> | Sets/gets the rail name. |
in_current<N>_input | Gets rail current in milliamperes. |
in_voltage<N>_input | Gets rail voltage in millivolts. |
in_power<N>_input | Gets rail power in milliwatts. |
crit_current_limit_<N> | Sets/gets rail instantaneous current limit in milliamperes. |
warn_current_limit_<N> | Sets/gets rail average current limit in milliamperes. |
Where <N> is a channel number 0-2. |
Note: | The INA driver may also present other nodes. Do not modify any INA sysfs node value. Modifying these values can result in damage to the device. |
Jetson Xavier NX Series
The Jetson Xavier NX module has one INA3221 power monitor at I2C address 0x40. The sysfs nodes to read for rail names, voltage, current, power, and instantaneous and average current limit are at:
/sys/bus/i2c/drivers/ina3221x/7-0040/iio:device0
The rail names for I2C address 0x40 are:
Rail Name | Description |
Channel 0: VDD_IN | System 5V power rail |
Channel 1: VDD_CPU_GPU | CPU + GPU combined power rail |
Channel 2: VDD_SOC | SoC power rail |
Note: | The VDD_IN rail overcurrent thresholds for average current are 3 A for 10W and 15W power modes, and 4 A for 20W power modes. For instantaneous current they are 5 A for all power modes. When module current consumption exceeds the configured limits, the INA3221 triggers CPU and GPU hardware clock throttling to prevent shutdown and physical damage. The nvpmodel GUI applet notifies user space processes of overcurrent events. For more details, see the section Voltage and Current Monitor in the topic Hardware Setup. |
Jetson AGX Xavier Series
The Jetson AGX Xavier series modules have two 3-channel INA3221 power monitors at I2C addresses 0x40 and 0x41. The sysfs nodes to read for rail names, voltage, current, power, and instantaneous and average current limit are at:
/sys/bus/i2c/drivers/ina3221x/1-0040/iio:device0
/sys/bus/i2c/drivers/ina3221x/1-0041/iio:device1
The rail names for I2C address 0x40 are:
Rail Name | Description |
Channel 0: GPU | GPU power rail |
Channel 1: CPU | CPU power rail |
Channel 2: SOC | SoC power rail |
The rail names for I2C address 0x41 are:
Rail Name | Description |
Channel 0: CV | CV power rail |
Channel 1: VDDRQ | DDR power rail |
Channel 2: SYS5V | System 5V power rail |
Examples
• To read the channel-1 rail name of the INA3221 at 0x40, enter the command:
$ cat /sys/bus/i2c/drivers/ina3221x/1-0040/iio:device0/rail_name_1
• To read channel-1 current, voltage, and power, enter the commands:
$ cat /sys/bus/i2c/drivers/ina3221x/1-0040/iio:device0/in_current1_input
$ cat /sys/bus/i2c/drivers/ina3221x/1-0040/iio:device0/in_voltage1_input
$ cat /sys/bus/i2c/drivers/ina3221x/1-0040/iio:device0/in_power1_input
• To read the channel‑1 instantaneous current limit, enter the command:
$ cat /sys/bus/i2c/drivers/ina3221x/1-0040/iio:device0/crit_current_limit_1
• To set the channel‑1 instantaneous current limit, enter the command:
$ echo <current> > /sys/bus/i2c/drivers/ina3221x/1-0040/iio:device0/crit_current_limit_1
• To read the channel-1 average current limit, enter the command:
$ cat /sys/bus/i2c/drivers/ina3221x/1-0040/iio:device0/warn_current_limit_1
• To set the channel-1 average current limit, enter the command:
$ echo <current> > /sys/bus/i2c/drivers/ina3221x/1-0040/iio:device0/warn_current_limit_1
Where <current> is the current limit to be set for the rail, in milliamperes.
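The read-only examples above can be combined into one helper that collects a rail's name, current, voltage, and power. This sketch uses only the node names from the naming-convention table; the function name is hypothetical, and it assumes each node contains a single numeric value as text.

```python
from pathlib import Path


def read_rail(device_dir, channel):
    """Read name, current (mA), voltage (mV), and power (mW) for a channel."""
    base = Path(device_dir)

    def read(node):
        return (base / node).read_text().strip()

    return {
        "name": read(f"rail_name_{channel}"),
        "current_ma": float(read(f"in_current{channel}_input")),
        "voltage_mv": float(read(f"in_voltage{channel}_input")),
        "power_mw": float(read(f"in_power{channel}_input")),
    }
```

On a Jetson AGX Xavier, a call such as read_rail("/sys/bus/i2c/drivers/ina3221x/1-0040/iio:device0", 1) would be expected to describe the CPU rail. The helper only reads nodes and never writes to them.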
Related Tools and Techniques
This section describes the tools and techniques for managing power.
CPU Hot Plug
You may use the following procedures to manage CPU hot plugging.
To turn secondary CPUs on or off manually
• Enter this command to turn a secondary CPU on:
$ echo 1 > /sys/devices/system/cpu/cpu<x>/online
• Enter this command to turn a secondary CPU off:
$ echo 0 > /sys/devices/system/cpu/cpu<x>/online
To check a CPU’s state
• Enter the command:
$ cat /sys/devices/system/cpu/cpu<x>/online
Where <x> is the CPU core number.
To check online CPU cores
• Enter the command:
$ cat /sys/devices/system/cpu/online
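The online file uses the kernel's CPU list format, in which ranges and single cores are separated by commas (for example, 0-3,5). A small parser, with a hypothetical function name, turns that string into explicit core numbers:

```python
def parse_cpu_list(text):
    """Expand a CPU list string such as "0-3,5" into a list of core numbers."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus
```

This makes it straightforward to check from a script whether a particular core is currently available to the OS.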
CPU Frequency Scaling
The default CPU frequency governor is schedutil.
To list available CPU frequency governors
• Enter the command:
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
To select a CPU frequency governor
• Enter the command:
$ echo <name> > /sys/devices/system/cpu/cpu<x>/cpufreq/scaling_governor
Where:
• <name> is the name of the governor to be selected
• <x> is the CPU core number
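When a governor is selected from a script, it can help to check the requested name against scaling_available_governors first. The following is a sketch under that assumption; the function name and the root parameter are illustrative, and writing to the real sysfs node requires root privileges.

```python
from pathlib import Path


def set_governor(name, cpu=0, root="/sys/devices/system/cpu"):
    """Select a cpufreq governor after checking that it is available."""
    cpufreq = Path(root) / f"cpu{cpu}" / "cpufreq"
    available = (cpufreq / "scaling_available_governors").read_text().split()
    if name not in available:
        raise ValueError(f"{name!r} not in available governors: {available}")
    # Mirror `echo <name> > scaling_governor`, including the newline.
    (cpufreq / "scaling_governor").write_text(name + "\n")
    return available
```

Raising an error for an unknown name avoids a silent failure when a governor is not built into the running kernel.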
GPU 3D Frequency Scaling
GPU 3D frequency scaling is enabled by default.
To disable 3D frequency scaling
• Enter the command:
$ echo 0 > /sys/devices/17000000.gv11b/enable_3d_scaling
To enable 3D frequency scaling
• Enter the command:
$ echo 1 > /sys/devices/17000000.gv11b/enable_3d_scaling
Getting and Setting Frequencies
Use the following procedures to set frequencies and report current frequency settings.
Note: | In all of these procedures, <x> is a CPU core number. For example, to apply a command to CPU core 1, replace cpu<x> with cpu1. |
To get system clock information
• Enter the command:
$ cat /sys/kernel/debug/bpmp/debug/clk/clk_tree
To print the CPU lower boundary, upper boundary, and current frequency
• Enter the commands:
$ cat /sys/devices/system/cpu/cpu<x>/cpufreq/cpuinfo_min_freq
$ cat /sys/devices/system/cpu/cpu<x>/cpufreq/cpuinfo_max_freq
$ cat /sys/devices/system/cpu/cpu<x>/cpufreq/cpuinfo_cur_freq
To change the CPU upper boundary
• Enter the command:
$ echo <cpu_freq> > /sys/devices/system/cpu/cpu<x>/cpufreq/scaling_max_freq
To change the CPU lower boundary
• Enter the command:
$ echo <cpu_freq> > /sys/devices/system/cpu/cpu<x>/cpufreq/scaling_min_freq
To set the static CPU frequency
• Enter the commands:
$ echo <cpu_freq> > /sys/devices/system/cpu/cpu<x>/cpufreq/scaling_min_freq
$ echo <cpu_freq> > /sys/devices/system/cpu/cpu<x>/cpufreq/scaling_max_freq
Where:
• <cpu_freq> is the frequency value available at:
/sys/devices/system/cpu/cpu<x>/cpufreq/scaling_available_frequencies
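Because <cpu_freq> must be one of the values listed in scaling_available_frequencies, a script can snap a desired frequency to the nearest listed value before writing it. This is a minimal sketch; the function name is hypothetical, and cpufreq values are assumed to be in kHz as is standard for the Linux cpufreq interface.

```python
def nearest_available(requested_khz, available_text):
    """Pick the listed frequency closest to requested_khz.

    available_text is the content of scaling_available_frequencies.
    Ties are resolved in favor of the lower frequency.
    """
    freqs = sorted(int(value) for value in available_text.split())
    if not freqs:
        raise ValueError("no available frequencies listed")
    return min(freqs, key=lambda f: (abs(f - requested_khz), f))
```

For a made-up list of "115200 729600 1190400 2265600", a request for 1000000 kHz resolves to 1190400 kHz, which can then be written to scaling_min_freq and scaling_max_freq.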
To print the GPU lower boundary, upper boundary, and current frequency
• Enter the commands:
$ cat /sys/devices/17000000.gv11b/devfreq/17000000.gv11b/min_freq
$ cat /sys/devices/17000000.gv11b/devfreq/17000000.gv11b/max_freq
$ cat /sys/devices/17000000.gv11b/devfreq/17000000.gv11b/cur_freq
To change the GPU upper boundary
• Enter the command:
$ echo <gpu_freq> > /sys/devices/17000000.gv11b/devfreq/17000000.gv11b/max_freq
To change the GPU lower boundary
• Enter the command:
$ echo <gpu_freq> > /sys/devices/17000000.gv11b/devfreq/17000000.gv11b/min_freq
To set the static GPU frequency
• Enter the commands:
$ echo <gpu_freq> > /sys/devices/17000000.gv11b/devfreq/17000000.gv11b/min_freq
$ echo <gpu_freq> > /sys/devices/17000000.gv11b/devfreq/17000000.gv11b/max_freq
Where <gpu_freq> is the value available in:
/sys/devices/17000000.gv11b/devfreq/17000000.gv11b/available_frequencies
To print the EMC lower boundary, upper boundary, and current frequency
• Enter the commands:
$ cat /sys/kernel/debug/bpmp/debug/clk/emc/min_rate
$ cat /sys/kernel/debug/bpmp/debug/clk/emc/max_rate
$ cat /sys/kernel/debug/bpmp/debug/clk/emc/rate
To change the EMC upper boundary
• Enter the command:
$ echo <emc_freq> > /sys/kernel/debug/bpmp/debug/clk/emc/max_rate
To change the EMC lower boundary
• Enter the command:
$ echo <emc_freq> > /sys/kernel/debug/bpmp/debug/clk/emc/min_rate
To set static EMC frequency
• Enter the commands:
$ echo 1 > /sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked
$ echo 1 > /sys/kernel/debug/bpmp/debug/clk/emc/state
$ echo <emc_freq> > /sys/kernel/debug/bpmp/debug/clk/emc/rate
Where <emc_freq> is a frequency value between EMC minimum and maximum frequencies.
To set static VIC frequency
Maximizing Jetson Xavier Performance
BSP provides the jetson_clocks script to maximize a Jetson AGX Xavier device’s performance by setting the static maximum frequencies of the CPU, GPU, and EMC clocks. You can also use the script to show current clock settings, store current clock settings into a file, and restore clock settings from a file.
The script is available at:
/usr/bin/jetson_clocks
To run the script, enter:
$ jetson_clocks [options]
Option | Description |
--show | Displays the current settings. |
--store [<file>] | Stores the current settings to a file. The default file is l4t_dfs.conf. |
--restore [<file>] | Restores the saved settings from a file. The default file is l4t_dfs.conf. |
--fan | Sets the maximum PWM fan speed. |
To show the current settings
• Enter the command:
$ sudo /usr/bin/jetson_clocks --show
To store the current settings
• Enter the command:
$ sudo /usr/bin/jetson_clocks --store
To maximize Jetson AGX Xavier series performance
• Enter the command:
$ sudo /usr/bin/jetson_clocks
To maximize Jetson AGX Xavier series performance and fan speed
• Enter the command:
$ sudo /usr/bin/jetson_clocks --fan
Note: | Starting with release 32.4, jetson_clocks no longer sets maximum fan speed by default. If you prefer the old behavior, use the --fan option. |
To restore the previous settings
• Enter the command:
$ sudo /usr/bin/jetson_clocks --restore
To check CPU state
• Enter the command:
$ cat /sys/devices/system/cpu/cpu<x>/online
Fan Speed Control
To set fan speed manually
• Enter the command:
$ echo <PWM_duty_cycle> > /sys/devices/pwm-fan/target_pwm
Where <PWM_duty_cycle> is a value in the range [0,255].
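If you think of fan speed as a percentage, it has to be scaled to the [0,255] range before it is written to target_pwm. A small conversion sketch, with a hypothetical function name and a linear mapping assumed:

```python
def percent_to_pwm(percent):
    """Convert a fan speed percentage (0-100) to a PWM value in [0, 255]."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(percent * 255 / 100)
```

For example, 20 percent maps to a PWM value of 51 and 100 percent maps to 255.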
To get the fan speed measured by the tachometer
• Enter the command:
$ cat /sys/devices/generic_pwm_tachometer/hwmon/hwmon1/rpm
nvpmodel GUI
The nvpmodel GUI is a GUI front end for the nvpmodel command line tool. It is an easy way to access power-related functionality and information.
To use the nvpmodel GUI
The nvpmodel GUI is represented by an NVIDIA icon on the right side of the Ubuntu desktop’s top bar:
The current power mode is displayed next to the NVIDIA icon. In the illustration above, the current mode is MODE 15W.
• To switch the current power mode, click the NVIDIA icon to open a dropdown menu from the icon. Click “Power mode” to open a submenu of power modes.
Click the power mode you want to set.
• To run tegrastats, click the NVIDIA icon to open the dropdown menu.
Click “Run tegrastats” to spawn a terminal window and run tegrastats.
The tegrastats display provides power-related information such as CPU, GPU, and EMC frequencies and the temperatures of thermal zones registered to the system.
• If system input voltage drops below a safe level, the nvpmodel GUI displays a desktop notification to warn you that the system is being throttled back to avoid a shutdown due to insufficient power. When the system is thermally throttled, the GUI displays a similar notification to show that the device is operating at lowered speed to reduce heat generation.
These are examples of notifications:
Note: | The look and feel of the nvpmodel GUI are different in different desktop environments. A notable example is the LXDE desktop, where the “Power mode” menu item is not shown in the NVIDIA icon’s dropdown menu. |