NVIDIA Tegra
DRIVE 5.0 Linux Open Source Software

Development Guide
5.0.10.3 Release


 
Discrete GPU
 
GPU Clock Controls
Monitoring Discrete GPU Events
Discrete GPU Thermal Alert
The following sections relate to the discrete GPU.
GPU Clock Controls
The following sections describe the GPU clock controls.
Setting GPU Clocks Target Frequencies
A privileged application can set target frequencies for discrete GPU clock domains.
When the application opens a handle to the GPU device, it implicitly starts a clock session. For this clock session, the application can set target frequencies for GPU clock domains. The clock arbiter in the GPU driver tries to meet the requested target frequencies as long as the clock session is active, that is, as long as the device handle is valid. Closing the device handle cancels all target frequency requests. An application may open multiple handles to a GPU device, and therefore have multiple clock sessions. Multiple applications may set target frequencies for the same clock domain.
Clock target frequencies from multiple sessions are coalesced. Provided that thermal and power limits are not exceeded, the actual clock frequency will generally be at least the highest requested frequency.
For a given clock domain, the clock arbiter looks for the optimal voltage/frequency point that meets at least the highest requested frequency. If power or thermal conditions change, the clock arbiter tries to maintain the frequency by increasing the voltage. It may have to fall back to a lower frequency to meet power and thermal constraints.
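The coalescing behavior described above can be sketched as follows. This is an illustrative model only, not the driver's actual arbiter: the function name, kHz units, and the fallback policy are assumptions made for the example.

```python
# Illustrative sketch (not the actual driver code): how a clock arbiter
# might coalesce per-session target requests onto discrete V/F points.

def arbitrate(requests_khz, vf_points_khz):
    """Pick the lowest available V/F point that meets the highest request.

    requests_khz:  target frequencies requested by active clock sessions
    vf_points_khz: supported frequency points, each tied to an optimized voltage
    Falls back to the highest available point if no point meets the request.
    """
    if not requests_khz:
        return min(vf_points_khz)          # no active request: lowest point
    target = max(requests_khz)             # coalesce: honor the greatest request
    candidates = [f for f in sorted(vf_points_khz) if f >= target]
    return candidates[0] if candidates else max(vf_points_khz)

# Two sessions request 900 MHz and 1200 MHz; the V/F table is coarse-grained.
print(arbitrate([900000, 1200000], [405000, 900000, 1230000, 1530000]))  # prints 1230000
```

Note that under power or thermal pressure the real arbiter may additionally cap the result below the selected point, as described above.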
An application can receive a notification when a clock domain frequency changes.
When an application exits, the target frequencies requested by that application are cancelled.
The libnvrm_gpu library provides APIs to:
Get a list of all configurable clock domains and their related clock ranges.
Get a list of frequency points for a given domain; each point corresponds to an optimized voltage.
Set the target frequency for one or multiple clock domains, either synchronously or asynchronously.
Wait for completion of one or multiple asynchronous requests.
Get notifications whenever a clock domain frequency changes.
The library API is defined in nvrm_gpu.h, and the sample code implements a command-line tool to control GPU clocks.
To Build the GPU Clock Controls Sample Application
To build a sample application on the host, execute:
cd <top>/drive-<platform|ver>-<os>/samples/nvrm/gpu/clocks
make clean
make
To Run the GPU Clock Controls Sample Application
Here is a step-by-step example that shows how to tweak the discrete GPU clock controls.
1. Get the list of available GPUs with the following command:
./nvrm_gpu_info -list
Here is a sample output from the above command:
gpu-index probe name
======================================
2000 nvgpu:/dev/nvhost-gpu
2001 nvgpu:/dev/nvgpu-pci/card-0000:04:00.0
2. Define an environment variable with the following command so that all commands apply to the discrete GPU:
export NVRM_GPU_DEFAULT_DEVICE_INDEX=2001
3. Check the list of configurable clock domains for discrete GPU (on PCI bus) with the following command:
./nvrm_gpuclk get domains
It should return MCLK (Memory Clock) and GPCCLK (Main Graphics Core Clock).
4. Get clock ranges in MHz with the following command, using the -m option:
./nvrm_gpuclk get range -m
5. Get frequency points in MHz for configurable clock domains with the following command:
./nvrm_gpuclk get points -m
It is also possible to query a specific clock domain, for instance GPCCLK:
./nvrm_gpuclk get points -m gpcclk
6. Set target MHz for a given clock domain with the following command:
./nvrm_gpuclk set target -m gpcclk 2581 -w -1 &
 
Note:
Because all target frequencies requested by an application are cancelled when the application exits, the command is run in the background with the -w -1 parameter to wait indefinitely. This makes it possible to check that the target frequencies were taken into account.
7. Set target MHz for multiple clock domains at once with the following command:
./nvrm_gpuclk set target -m gpcclk 2581 mclk 3003 -w 5000 &
In the above example, the -w 5000 parameter makes the application wait 5 seconds before exiting. As a consequence, the requested target frequencies only apply for this duration.
8. Get actual MHz for configurable clock domains with the following command:
./nvrm_gpuclk get actual -m
It is possible to query the actual MHz for specific clock domains by simply adding clock domain names at the end of the above command.
9. Test frequency points for a given domain with the following command:
./nvrm_gpuclk set points -m gpcclk -i 10
The above command sets all V/F points of GPCCLK in ascending, then descending order. The -i option specifies a delay, here 10 milliseconds (ms), between two consecutive V/F point settings.
10. Test all frequency points combinations with the following command:
./nvrm_gpuclk set points -m -i 10
The above command attempts all combinations of V/F points for all clock domains, in ascending and then descending order.
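The traversal order used by the all-domains sweep can be sketched as below. This is an illustrative model, not the sample's source; the function name and the kHz point values are assumptions for the example.

```python
# Illustrative sketch: enumerating all V/F point combinations across clock
# domains in ascending, then descending order, as "set points" does when no
# specific domain is named.
from itertools import product

def sweep_combinations(domains):
    """domains: dict of clock domain name -> sorted list of V/F points (kHz).
    Yields one (domain -> frequency) setting per step, ascending then descending."""
    names = sorted(domains)
    ascending = list(product(*(domains[n] for n in names)))
    for combo in ascending + ascending[::-1]:
        yield dict(zip(names, combo))

steps = list(sweep_combinations({"gpcclk": [405000, 1230000],
                                 "mclk": [810000, 3003000]}))
print(len(steps))  # prints 8: 2 x 2 combinations, traversed in both directions
```

In the real tool, the -i delay would be inserted between consecutive steps.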
11. Monitor clock changes with the following command:
./nvrm_gpuclk monitor
The above command displays actual frequencies for all clock domains each time there is a clock change caused by other processes, or caused by power or thermal conditions.
Monitoring Discrete GPU Events
Event sessions allow monitoring events from the discrete GPU (dGPU).
Sample events are:
Target frequency update.
Target frequency not possible for a given clock domain.
Temperature above threshold.
On each session, it is possible to define an event filter to monitor only a subset of the available events.
The event filter is provided when opening the session. All events that occur after the session is opened, and that are part of the event filter, are reported on that event session.
The GPU timestamp in nanoseconds is provided together with the event identifier.
An application can open multiple event sessions and define a different event filter for each of them. Multiple applications can also open event sessions.
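The filtering semantics can be sketched as follows. This is an illustrative model only, not the nvrm_gpu API: the class, the event names, and the use of a host timestamp in place of the GPU timestamp are all assumptions for the example.

```python
# Illustrative sketch (hypothetical event names, not the nvrm_gpu API): an
# event session reports only the events present in the filter supplied at
# open time, each paired with a timestamp in nanoseconds.
import time

class EventSession:
    def __init__(self, event_filter):
        self.event_filter = frozenset(event_filter)  # fixed when the session opens
        self.queue = []

    def post(self, event):                    # driver side: an event occurs
        if event in self.event_filter:
            self.queue.append((time.monotonic_ns(), event))

    def read(self):                           # application side: non-blocking read
        return self.queue.pop(0) if self.queue else None

s = EventSession({"ALARM_TARGET_VF_NOT_POSSIBLE", "ALARM_THERMAL_ABOVE_THRESHOLD"})
s.post("VF_UPDATE")                           # not in the filter: dropped
s.post("ALARM_THERMAL_ABOVE_THRESHOLD")       # in the filter: reported
ts, ev = s.read()
print(ev)                                     # prints ALARM_THERMAL_ABOVE_THRESHOLD
```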
The libnvrm_gpu library provides APIs to:
Open an event monitoring session and set an event filter.
Read events from the event session (blocking or non-blocking).
Close the event monitoring session.
The list of events and the library APIs are defined in nvrm_gpu.h.
For examples, refer to the nvrm_gpuclk or nvrm_gputhermal sample code.
Discrete GPU Thermal Alert
Setting Discrete GPU Thermal Alert
A privileged application can set the thermal alert limit for discrete GPU thermal domains.
When the application opens a handle to the GPU device, it implicitly starts a thermal session. For this thermal session, the application can set the thermal alert limit for discrete GPU thermal domains.
An application can get a notification when the temperature goes from below x mC to above x mC.
Note:
x represents the thermal alert temperature limit.
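The crossing condition above is edge-triggered: an alert fires when the temperature rises past the limit, not while it merely stays above it. A minimal sketch of that behavior, with a hypothetical function name and sample values chosen for illustration:

```python
# Illustrative sketch: an alert fires only on the upward crossing, i.e. when
# the temperature goes from below the limit x (in mC) to at or above it.
def crossings(samples_mC, limit_mC):
    """Return indices of samples where the temperature crosses the limit upward."""
    alerts = []
    below = True
    for i, t in enumerate(samples_mC):
        if below and t >= limit_mC:
            alerts.append(i)               # upward crossing: notify
            below = False
        elif t < limit_mC:
            below = True                   # re-arm once temperature drops back
    return alerts

# Limit 60000 mC (60 C): only the two upward crossings trigger alerts.
print(crossings([55000, 61000, 62000, 58000, 60500], 60000))  # prints [1, 4]
```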
The libnvrm_gpu library provides APIs to:
Handle a discrete GPU event session (open/filter/read/close).
Read the discrete GPU temperature (in milli degrees Celsius).
Set the thermal alert limit for discrete GPU thermal domains. This operation is synchronous.
Get notifications whenever the temperature goes from below x mC to above x mC.
The library API is defined in nvrm_gpu.h, and the sample code implements a command-line tool to control the discrete GPU thermal alert limit.
API to configure the GPU thermal alert limit:
/**
 * Set thermal alert limit
 *
 * @param hDevice - GPU device handle
 * @param temperature_mC - thermal alert limit value in milli degrees Celsius
 *
 * @return NvSuccess indicates that the request was successful
 */
NvError NvRmGpuDeviceThermalAlertSetLimit(NvRmGpuDevice *hDevice,
                                          int32_t temperature_mC);
Building Discrete GPU Thermal Alert Sample Application
To build a sample application on the host, use the following command:
cd <top>/drive-<platform|ver>-<os>/samples/nvrm/gpu/thermal
make clean
make
Running Discrete GPU Thermal Alert Sample Application
1. Define an environment variable with the following command so that all commands apply to the discrete GPU:
export NVRM_GPU_DEFAULT_DEVICE_INDEX=2001
2. Run the sample with the following command:
./nvrm_gputhermal --temp_mC <x mC>
User Space Thermal Alert
This feature lets userspace applications know whether a thermal alert occurred for a particular temperature trip value.
It also notifies userspace, through the sysfs nodes listed below, when the temperature crosses the trip temperature value.
/sys/bus/platform/devices/userspace-alert/thermal_alert_block
Read and write values:
Read: Blocks the read call until an alert occurs or the specified duration times out. When an alert occurs, or if an alert is already present, the read returns immediately with value "1". If no alert is present and the timeout expires, the read returns with value "0".
Write: Sets the timeout duration (in milliseconds) to wait when no alert is present. By default, this value is "0" if not set; a value of "0" waits indefinitely. When an alert occurs, or if an alert is already present, the write value has no significance.
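The read/write semantics of thermal_alert_block can be modeled in a few lines. This is an illustrative userspace model, not the kernel driver; the class name is an assumption for the example.

```python
# Illustrative sketch (not the kernel driver): thermal_alert_block semantics
# modeled with a threading.Event. "Writing" stores the timeout in ms;
# "reading" blocks until an alert or the timeout, returning "1" on alert and
# "0" on timeout. A stored timeout of 0 means wait indefinitely.
import threading

class ThermalAlertBlock:
    def __init__(self):
        self.alert = threading.Event()
        self.timeout_ms = 0                       # default: wait indefinitely

    def write(self, timeout_ms):
        self.timeout_ms = timeout_ms

    def read(self):
        timeout_s = None if self.timeout_ms == 0 else self.timeout_ms / 1000.0
        return "1" if self.alert.wait(timeout_s) else "0"

node = ThermalAlertBlock()
node.write(50)                                    # 50 ms timeout, no alert yet
print(node.read())                                # times out, prints "0"
node.alert.set()                                  # an alert occurs
print(node.read())                                # returns immediately, prints "1"
```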
/sys/bus/platform/devices/userspace-alert/thermal_alert
Read-only value.
Read: Returns "1" if an alert is present, otherwise returns "0".
To change trip temperature
The trip temperature can be modified at runtime because write permission is set in the device tree properties of the trip temperature.
If an alert occurs, increasing the trip temperature to a value higher than the current temperature turns off the alert. Userspace can then wait for another alert with the increased trip temperature.
To find the trip temperature sysfs node for the userspace-alert cooling device, the following script can be used:
#!/bin/bash
cd /sys/class/thermal;
#type="cpu-balanced gpu-balanced userspace-alert";
type="$1";
if [ -z "$type" ]; then
    echo " "
    echo "*******************************************"
    echo "Must pass cdev name to find the trip point"
    echo "*******************************************"
    echo " "
    exit
fi;
for t in thermal_zone*[0-9]; do
    for i in ${t}/cdev*[0-9]; do
        if [ -d /sys/class/thermal/${i} ]; then
            for j in $type; do
                if [ "$j" == "`cat ${i}/type`" ]; then
                    echo " "
                    trip=`cat ${i}_trip_point`
                    echo "$t: `cat ${t}/type`"
                    echo "$i: `cat ${i}/type`"
                    echo "`pwd`/${t}/trip_point_${trip}_temp:`cat ${t}/trip_point_${trip}_temp`"
                fi;
            done;
        fi;
    done;
done;
Name the above script as find_trip_of_cdev.sh, and use the following command to find the thermal trip sysfs:
root@tegra-ubuntu:~# ./find_trip_of_cdev.sh userspace-alert
thermal_zone5: Tdiode_tegra
thermal_zone5/cdev0: userspace-alert
/sys/class/thermal/thermal_zone5/trip_point_5_temp:60000
 
Note:
thermal_zone5: Tdiode_tegra: This is the thermal zone under which the userspace-alert cooling device is configured.
thermal_zone5/cdev0: userspace-alert: This is the cooling device name.
/sys/class/thermal/thermal_zone5/trip_point_5_temp:60000: This is the trip temperature associated with the userspace-alert cooling device. When the "Tdiode_tegra" thermal zone temperature crosses this trip temperature, the alert is triggered.
To set a different thermal trip (for example, 70 degrees Celsius), use the following command:
root@tegra-ubuntu:~# echo 70000 > /sys/class/thermal/thermal_zone5/trip_point_5_temp
To configure Device Tree for userspace alert
For DT configuration details, refer to the kernel documentation:
Documentation/devicetree/bindings/thermal/userspace-alert.txt
Userspace alert is currently configured under Tdiode thermal zone, which is the external thermal sensor measurement of Tegra die temperature.
Userspace alert DT settings can be found in the kernel dtsi file:
arch/arm64/boot/dts/nvidia/t18x/tegra186-platforms/tegra186-vcm31-thermal.dtsi