Power and Thermals

This chapter provides information about CPU power and thermal management settings.

Power Tunings

GB200 NVL72 provides a new set of features for tuning performance per watt on common workloads such as DL inference and training. These features reduce the number of application tuning steps and make optimal use of available power, which is a critical resource.

Power profiles: NVIDIA uses its hardware and software architecture expertise to optimize performance and power for key data center use cases, such as LLM training and inference, while taking into account user constraints such as Max-Q (best performance per watt) versus Max-P (maximum performance).

NVIDIA has developed pretuned GPU workload power profiles that provide immediate, optimized power configurations, so customers do not need to tune the power configuration knobs manually. Customers select one or more profiles based on their workload and performance-per-watt goal (Max-Q vs Max-P), and the profile infrastructure automatically configures the internal and external knobs for the top data center use cases. The infrastructure supports multiple knobs, such as TGP, GPC clocks, and Mem clocks, and each profile sets at least one of them. NVIDIA derives the pretuned knob values for each workload profile by analyzing data from various workloads. Profiles carry a priority, and when profiles conflict, the infrastructure automatically chooses among them based on priority. Customers can configure the profiles using the BCM, DCGMI, NVSMI, or OOB (Redfish APIs on the HMC).
Power smoothing: Bulk synchronous workloads, such as batch processing or scientific computing, generally start and stop at the same time across the data center. This causes large power swings during the application and at the beginning and end of the workload, which can lead to failures at the utility, transformer, and UPS levels. Power Smoothing has been implemented to handle these situations; refer to the application note for more information.
Power Balancing/Sloshing: In data center design, the power provisioned for a rack or cluster is constrained and fixed at build time, which frequently limits GPU performance. This creates an opportunity for GPUs to use power that other components in the rack or module leave unused, improving performance within the allocated power limits of the rack or cluster.
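
The arithmetic behind power balancing can be illustrated with a minimal sketch. It is not NVIDIA's implementation: the `Component` type, the `unused_power` and `rebalance` helpers, the equal-share policy, and all wattage values are hypothetical and chosen only to show how power left unused by other components could raise GPU limits without exceeding what the rack was provisioned for.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A non-GPU consumer in the rack (hypothetical model)."""
    name: str
    provisioned_w: float  # power reserved for it at build time
    draw_w: float         # power it is actually drawing now


def unused_power(components):
    """Sum the provisioned-but-unused power across components."""
    return sum(max(c.provisioned_w - c.draw_w, 0.0) for c in components)


def rebalance(gpu_limits_w, gpu_max_w, unused_w):
    """Split unused_w equally across GPUs, capping each at gpu_max_w.

    Equal sharing is an assumption made for illustration; power that
    cannot be used because of the cap is simply left unallocated.
    """
    if not gpu_limits_w or unused_w <= 0:
        return list(gpu_limits_w)
    share = unused_w / len(gpu_limits_w)
    return [min(limit + share, gpu_max_w) for limit in gpu_limits_w]


# Hypothetical numbers, not taken from any product specification.
others = [
    Component("cpu", provisioned_w=600.0, draw_w=400.0),
    Component("nic", provisioned_w=200.0, draw_w=150.0),
]
gpu_limits = [1000.0, 1000.0, 1000.0, 1000.0]

spare = unused_power(others)                       # 200 W + 50 W = 250 W
new_limits = rebalance(gpu_limits, 1200.0, spare)  # each GPU gains 62.5 W
print(new_limits)
```

In this example the new GPU limits plus the actual draw of the other components add up to the same total that was provisioned originally, so the rack stays within its allocated power while the GPUs gain headroom.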