Managing and Customizing TuneD Profiles

TuneD is a system tuning service that provides profiles for optimizing system performance for various use cases. The NVIDIA BaseOS software includes the nvidia-tuned-profiles package, which provides pre-configured TuneD profiles optimized for different NVIDIA DGX platforms and use cases.

About NVIDIA TuneD Profiles

The nvidia-tuned-profiles package installs profiles for various DGX systems to /usr/lib/tuned/profiles/. These profiles are categorized as follows:

Platform-Specific Performance Profiles

  • dgx-a100-performance, dgx-a800-performance - Optimized for DGX A100/A800 systems

  • dgx-h100-performance, dgx-h200-performance, dgx-h800-performance - Optimized for DGX H100/H200/H800 systems

  • dgx-b200-performance - Optimized for DGX B200 systems

Crashdump Profiles

  • dgx-a100-crashdump, dgx-a800-crashdump - Crashdump configuration for A100/A800 systems

  • dgx-h100-crashdump, dgx-h200-crashdump - Crashdump configuration for H100/H200 systems

  • dgx-b200-crashdump - Crashdump configuration for B200 systems

Base and Common Profiles

  • dgx-base - Base profile with common DGX settings and includes cachefilesd overrides

  • nvidia-base - Base profile with NVIDIA-specific settings and service overrides

  • nvidia-x86-64-performance - Performance profile for x86_64 architectures

  • nvidia-crashdump-core - Core crashdump configuration

  • nvidia-no-mitigations - Disables CPU mitigations for better performance

Understanding Profile Inheritance

Most NVIDIA profiles use the include directive to inherit settings from base profiles. This creates a hierarchy where platform-specific profiles build upon common base configurations:

  • nvidia-base - Provides core NVIDIA settings including:

    • CPU governor set to performance

    • ARP tuning (arp_announce=2, arp_ignore=1) for hosts with multiple network interfaces

    • Service management for docker and nvidia-persistenced

    • Kernel parameter init_on_alloc=0, which stops the kernel from zeroing memory at allocation time

    • nvidia-peermem module loading configuration

  • dgx-base - Provides DGX-specific settings including:

    • Configuration for cachefilesd service (requires /raid to be mounted)

    • Service overrides to ensure proper startup dependencies

  • Platform profiles (for example, dgx-h100-performance) - Include both nvidia-base and dgx-base, then add:

    • Platform-specific bootloader parameters

    • Hardware-specific module parameters

    • Console and IOMMU settings
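To see the inheritance chain of a particular profile on your system, you can follow the include lines by hand or use a script like the one below. This is a simplified sketch, not TuneD's own loader. It assumes that custom profiles in /etc/tuned/<name>/ take precedence over packaged ones in /usr/lib/tuned/profiles/<name>/, which matches the paths used in this guide. It reads only the include= key of the [main] section and splits that value on commas or semicolons. It does not expand TuneD variables or detect include cycles. The function names and the TUNED_PROFILE_DIRS variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: print a TuneD profile's include chain, parents first,
# i.e. in the order the settings are layered (later entries override earlier).
# Assumed search order: custom profiles first, then packaged profiles.
TUNED_PROFILE_DIRS=${TUNED_PROFILE_DIRS:-"/etc/tuned /usr/lib/tuned/profiles"}

# Print the path of NAME's tuned.conf from the first directory that has one.
find_profile_conf() (
  for dir in $TUNED_PROFILE_DIRS; do
    if [ -f "$dir/$1/tuned.conf" ]; then
      echo "$dir/$1/tuned.conf"
      exit 0
    fi
  done
  exit 1
)

# Print the names listed in the include= key of the [main] section, one per line.
profile_includes() {
  awk '
    /^\[/ { in_main = ($0 == "[main]") }
    in_main && /^[ \t]*include[ \t]*=/ {
      sub(/^[^=]*=[ \t]*/, "")
      n = split($0, names, "[ \t]*[,;][ \t]*")
      for (i = 1; i <= n; i++) if (names[i] != "") print names[i]
    }' "$1"
}

# Recursively print included profiles before the profile itself.
# No cycle detection: a profile that (indirectly) includes itself never terminates.
resolve_profile() (
  conf=$(find_profile_conf "$1") || { echo "profile not found: $1" >&2; exit 1; }
  for inc in $(profile_includes "$conf"); do
    resolve_profile "$inc" || exit 1
  done
  echo "$1"
)
```

After sourcing the file, run resolve_profile dgx-h100-crashdump to list every profile that contributes settings to that profile. A profile name can appear more than once if several profiles include it.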

Listing Available TuneD Profiles

To view all available TuneD profiles on your system:

sudo tuned-adm list

To view the currently active profile:

sudo tuned-adm active

To verify the current profile is properly applied:

sudo tuned-adm verify

To check the status of the TuneD service:

sudo systemctl status tuned
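tuned-adm list prints every profile TuneD can find, including the stock TuneD profiles. To show only the profiles installed by nvidia-tuned-profiles, you can use a small helper like the one below. This sketch assumes that the NVIDIA profile names start with dgx- or nvidia- (true for every profile listed above) and that they live in /usr/lib/tuned/profiles/. list_nvidia_profiles is a hypothetical name, not part of TuneD.

```shell
#!/bin/sh
# Hypothetical helper (not part of TuneD): list profile directories whose
# names start with "dgx-" or "nvidia-" and that contain a tuned.conf file.
# Usage: list_nvidia_profiles [PROFILE_DIR]   (default: /usr/lib/tuned/profiles)
list_nvidia_profiles() (
  dir=${1:-/usr/lib/tuned/profiles}
  for conf in "$dir"/dgx-*/tuned.conf "$dir"/nvidia-*/tuned.conf; do
    # An unmatched glob stays literal, so skip anything that is not a real file.
    [ -f "$conf" ] || continue
    basename "$(dirname "$conf")"
  done | sort
)
```

Source the file and run list_nvidia_profiles with no argument to scan the default directory.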

Cloning and Modifying an Existing Profile

You can clone an existing NVIDIA profile and customize it for your specific needs. Custom profiles should be created in /etc/tuned/ to avoid conflicts with package updates.

  1. Identify the profile to clone by listing available profiles:

    sudo tuned-adm list
    
  2. Create a new directory for your custom profile:

    sudo mkdir -p /etc/tuned/my-custom-dgx-profile
    
  3. Copy the configuration from an existing profile. For example, to clone the dgx-h100-performance profile:

    sudo cp /usr/lib/tuned/profiles/dgx-h100-performance/tuned.conf /etc/tuned/my-custom-dgx-profile/
    

    The original dgx-h100-performance profile contains:

    [main]
    include=nvidia-base,dgx-base
    summary=TuneD Profile for DGX H100
    
    [bootloader]
    cmdline_iommu=iommu=pt
    cmdline_console=console=tty0 console=ttyS0,115200n8
    cmdline_pci=pci=realloc=off
    
  4. Edit the custom profile configuration:

    sudo vi /etc/tuned/my-custom-dgx-profile/tuned.conf
    
  5. Modify settings as needed. For example, you might want to add custom network tuning or adjust kernel parameters:

    [main]
    include=nvidia-base,dgx-base
    summary=Custom DGX H100 profile with network tuning
    
    [bootloader]
    cmdline_iommu=iommu=pt
    cmdline_console=console=tty0 console=ttyS0,115200n8
    cmdline_pci=pci=realloc=off
    # Add custom boot parameters
    cmdline_hugepages=hugepagesz=2M hugepages=8192
    
    [sysctl]
    # Add custom network tuning
    net.core.rmem_max=268435456
    net.core.wmem_max=268435456
    net.core.rmem_default=67108864
    net.core.wmem_default=67108864
    net.ipv4.tcp_rmem=4096 87380 134217728
    net.ipv4.tcp_wmem=4096 65536 134217728
    # Reduce swappiness for workloads with large memory requirements
    vm.swappiness=10
    
  6. Save the file and activate your custom profile:

    sudo tuned-adm profile my-custom-dgx-profile
    
  7. Verify the profile is active:

    sudo tuned-adm active
    sudo tuned-adm verify
    

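You can also script steps 2 through 5, which is handy when you roll out the same customization to several systems. The sketch below copies the whole profile directory, so any helper files the profile refers to are copied as well. It refuses to overwrite an existing profile and then rewrites the summary= line. clone_profile is a hypothetical helper. It assumes the summary text contains no | or & characters, because sed would interpret them.

```shell
#!/bin/sh
# Hypothetical helper: clone a TuneD profile directory and set a new summary.
# Usage: clone_profile SRC_DIR DEST_DIR "New summary"
# Assumption: the summary text contains no "|" or "&" (special to sed).
clone_profile() (
  src_dir=$1 dest_dir=$2 summary=$3
  [ -f "$src_dir/tuned.conf" ] || { echo "no tuned.conf in $src_dir" >&2; exit 1; }
  # Never overwrite an existing custom profile.
  [ ! -e "$dest_dir" ] || { echo "$dest_dir already exists" >&2; exit 1; }
  # Copy everything, then rewrite only the summary line of tuned.conf.
  mkdir -p "$dest_dir" &&
    cp -R "$src_dir/." "$dest_dir/" &&
    sed "s|^summary=.*|summary=$summary|" "$src_dir/tuned.conf" > "$dest_dir/tuned.conf"
)
```

Run it as root with /usr/lib/tuned/profiles/dgx-h100-performance as the source and /etc/tuned/my-custom-dgx-profile as the destination. Then edit the copy and activate it as in the steps above.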
Creating a Custom Profile from Scratch

If you need to create a completely custom profile rather than modifying an existing one:

  1. Create a directory for your new profile:

    sudo mkdir -p /etc/tuned/my-dgx-custom
    
  2. Create a new tuned.conf file:

    sudo vi /etc/tuned/my-dgx-custom/tuned.conf
    
  3. Add your profile configuration. Here’s an example that builds on NVIDIA base profiles:

    [main]
    include=nvidia-base
    summary=Custom DGX performance profile for AI workloads
    description=Optimized profile for training large language models
    
    [bootloader]
    # Enable IOMMU in passthrough mode for better performance
    cmdline_iommu=iommu=pt
    # Allocate hugepages for better memory performance
    cmdline_hugepages=hugepagesz=2M hugepages=16384
    # Disable CPU mitigations for maximum performance
    cmdline_mitigations=mitigations=off
    
    [sysctl]
    # Network tuning for distributed training
    net.core.rmem_max=268435456
    net.core.wmem_max=268435456
    net.core.rmem_default=67108864
    net.core.wmem_default=67108864
    net.ipv4.tcp_rmem=4096 87380 134217728
    net.ipv4.tcp_wmem=4096 65536 134217728
    
    # Memory management
    vm.swappiness=10
    vm.dirty_ratio=40
    vm.dirty_background_ratio=10
    
    # Disable NUMA balancing for better performance in GPU workloads
    kernel.numa_balancing=0
    
    [modules]
    # Load nvidia driver with relaxed ordering enabled
    nvidia=NVreg_EnablePCIERelaxedOrderingMode=1
    

    Common profile sections and options:

    • [main] - Profile metadata (summary, description) and the include directive

    • [cpu] - CPU-related settings (governor, energy policy)

    • [sysctl] - Kernel parameters

    • [bootloader] - Kernel boot parameters

    • [disk] - Storage settings (readahead, scheduler)

    • [vm] - Virtual memory settings

    • [service] - Service enable/disable directives

    • [modules] - Kernel module options

    • [script] - Custom scripts to run

  4. Activate the new profile:

    sudo tuned-adm profile my-dgx-custom
    
  5. Verify the profile is active and properly applied:

    sudo tuned-adm active
    sudo tuned-adm verify
    

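The kernel reserves the hugepages requested on its command line at boot, and that memory is no longer available for ordinary allocations. It is therefore worth checking how much memory a hugepages= value actually takes: the page count multiplied by the page size. For the example above, 16384 pages of 2 MiB is 32768 MiB (32 GiB). The 8192 pages in the cloned profile earlier reserve 16 GiB. The helpers below are a hypothetical sketch. One does the arithmetic; the other reads the HugePages_Total and Hugepagesize lines from /proc/meminfo (or another file in the same format) after the reboot.

```shell
#!/bin/sh
# Hypothetical helpers for sizing and checking a hugepages= boot parameter.

# Memory reserved by COUNT hugepages of SIZE_MIB MiB each, in MiB.
# Usage: hugepages_mib COUNT SIZE_MIB
hugepages_mib() {
  echo $(( $1 * $2 ))
}

# Show the hugepage pool size and page size from a meminfo-format file.
# Usage: hugepages_status [MEMINFO_FILE]   (default: /proc/meminfo)
hugepages_status() {
  grep -E '^(HugePages_Total|Hugepagesize):' "${1:-/proc/meminfo}"
}
```

For example, hugepages_mib 16384 2 prints 32768.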
Using TuneD Profile Merging

TuneD supports merging multiple profiles to combine their settings. This is useful when you want to combine the optimizations from multiple profiles without creating a completely new profile.

Applying Multiple Profiles

You can apply multiple profiles at once, and TuneD will merge their configurations. Profiles are applied in order, with later profiles overriding settings from earlier ones.

Note

TuneD merges profiles automatically without validating the logical consistency of the combined settings. Carefully review the profiles you’re merging to avoid conflicting configurations. For example, combining a profile optimized for high throughput with one optimized for power saving could result in counterproductive settings.

  1. To apply multiple profiles using the merge functionality:

    sudo tuned-adm profile <profile1> <profile2> <profile3>
    

    Example - combining NVIDIA base settings with architecture-specific optimizations:

    sudo tuned-adm profile nvidia-base nvidia-x86-64-performance
    
  2. Verify the merged profile is active:

    sudo tuned-adm active
    

    The output will show all active profiles separated by spaces.
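The override rule (later profiles win) can be illustrated with a toy model. The model looks up one option in a list of tuned.conf files processed in order and reports the last value it finds. This is a simplified, hypothetical sketch for reasoning about precedence. TuneD's real merge also follows include directives and has plugin-specific behavior. Treat tuned-adm verify and the running system, not this script, as the source of truth.

```shell
#!/bin/sh
# Toy model (not TuneD code): report the value of KEY in [SECTION] after
# merging tuned.conf files in the order given; later files override earlier ones.
# Usage: merged_option SECTION KEY FILE...
merged_option() (
  section=$1 key=$2
  shift 2
  awk -v sect="[$section]" -v key="$key" '
    FNR == 1 { cur = "" }                     # each file starts outside any section
    /^\[/ { cur = $0; next }
    cur == sect {
      line = $0
      sub(/^[ \t]+/, "", line)
      eq = index(line, "=")
      if (eq == 0) next
      k = substr(line, 1, eq - 1)
      sub(/[ \t]+$/, "", k)
      if (k == key) {
        v = substr(line, eq + 1)
        sub(/^[ \t]+/, "", v)
        val = v; found = 1                    # later assignments replace earlier ones
      }
    }
    END { if (found) print val; else exit 1 }' "$@"
)
```

Pass the tuned.conf files of two profiles in the same order you would pass the profiles to tuned-adm profile. Under this simplified model, the output is the value that wins.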

Creating a Profile that Includes Other Profiles

You can also create a custom profile that explicitly includes other profiles using the include directive:

  1. Create a new custom profile directory:

    sudo mkdir -p /etc/tuned/my-merged-profile
    
  2. Create a tuned.conf file that includes other profiles:

    sudo vi /etc/tuned/my-merged-profile/tuned.conf
    
  3. Use the include directive to merge base profiles and add customizations:

    [main]
    summary=Custom merged DGX profile
    include=dgx-base,nvidia-x86-64-performance
    
    [sysctl]
    # Additional custom settings that extend the included profiles
    vm.swappiness=5
    net.core.netdev_max_backlog=5000
    
    [cpu]
    # Override CPU settings from included profiles
    governor=performance
    

    In this example:

    • The profile includes settings from both dgx-base and nvidia-x86-64-performance

    • Additional custom settings are layered on top

    • Settings defined in this profile will override those from included profiles

  4. Activate the merged profile:

    sudo tuned-adm profile my-merged-profile
    

Common Profile Merge Examples

Example 1: Performance with security mitigations disabled

sudo tuned-adm profile dgx-h100-performance nvidia-no-mitigations

This combines the DGX H100 performance optimizations with CPU security mitigations disabled for maximum performance. Only use this in isolated, trusted environments.

Example 2: Base configuration with architecture-specific tuning

sudo tuned-adm profile nvidia-base nvidia-x86-64-performance

This creates a generic high-performance NVIDIA profile by combining:

  • nvidia-base: Core NVIDIA settings (CPU governor, network tuning, service management)

  • nvidia-x86-64-performance: Architecture-specific optimizations for x86_64 systems

After applying, verify with:

sudo tuned-adm active
# Output: Current active profile: nvidia-base nvidia-x86-64-performance
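In scripts, it can be convenient to turn that output into a list of profile names. The sketch below assumes the output format shown above: a single line starting with "Current active profile:". The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers that parse `tuned-adm active` output read from stdin.

# Print the active profile names, one per line.
active_profiles() {
  sed -n 's/^Current active profile: //p' | tr ' ' '\n' | sed '/^$/d'
}

# Succeed if the profile named in $1 is among the active ones.
profile_is_active() {
  active_profiles | grep -qx -- "$1"
}
```

For example, the pipeline sudo tuned-adm active | profile_is_active nvidia-no-mitigations succeeds only when nvidia-no-mitigations is part of the active merge.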

Example 3: Crashdump-enabled profile

sudo tuned-adm profile dgx-h100-crashdump

The crashdump profiles (like dgx-h100-crashdump) include the corresponding platform performance profile and nvidia-crashdump-core, then add the crashkernel memory reservation:

[main]
include=dgx-h100-performance,nvidia-crashdump-core
summary=TuneD Profile for DGX H100 with Crashdump Enabled

[bootloader]
cmdline_crashkernel=crashkernel=1G-:2048M

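The crashkernel=1G-:2048M parameter reserves 2048 MB of memory for the crash kernel on systems with at least 1 GB of RAM. Kernel panic behavior comes from the included nvidia-crashdump-core profile.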
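After you switch to a crashdump profile and reboot, you can confirm that the reservation took effect. /proc/cmdline shows the crashkernel= parameter the kernel booted with. /sys/kernel/kexec_crash_size reports the size of the reserved crash kernel region in bytes, or 0 if nothing is reserved. The helper below is a hypothetical sketch that converts that value to MiB. The file path is a parameter, so you can point the helper at another file.

```shell
#!/bin/sh
# Hypothetical helper: report the crash kernel reservation in MiB.
# Usage: crashkernel_reserved_mib [SIZE_FILE]   (default: /sys/kernel/kexec_crash_size)
crashkernel_reserved_mib() (
  bytes=$(cat "${1:-/sys/kernel/kexec_crash_size}") || exit 1
  if [ "$bytes" -eq 0 ]; then
    echo "no crash kernel memory reserved" >&2
    exit 1
  fi
  echo $(( bytes / 1048576 ))
)
```

Run grep -o 'crashkernel=[^ ]*' /proc/cmdline, then crashkernel_reserved_mib. With crashkernel=1G-:2048M on a system with more than 1 GB of RAM, the helper should report 2048.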

Important

When merging profiles, settings from profiles listed later in the command override settings from earlier profiles. Plan your profile order accordingly.

Viewing Profile Contents

To examine the contents of a profile before using it:

  1. View system-provided profiles:

    cat /usr/lib/tuned/profiles/<profile-name>/tuned.conf
    

    For example, to view the dgx-h100-performance profile:

    cat /usr/lib/tuned/profiles/dgx-h100-performance/tuned.conf
    

    This will show:

    [main]
    include=nvidia-base,dgx-base
    summary=TuneD Profile for DGX H100
    
    [bootloader]
    cmdline_iommu=iommu=pt
    cmdline_console=console=tty0 console=ttyS0,115200n8
    cmdline_pci=pci=realloc=off
    
  2. View custom profiles:

    cat /etc/tuned/<profile-name>/tuned.conf
    

Example: Examining Key Profiles

nvidia-base profile - The foundation for most NVIDIA profiles:

[main]
summary=Base NVIDIA tuning configuration

[service]
service.docker=start,enable,file:/usr/lib/tuned/profiles/nvidia-base/docker-override.conf
service.nvidia-persistenced=start,enable,file:/usr/lib/tuned/profiles/nvidia-base/nvidia-persistenced-override.conf

[cpu]
governor=performance

[bootloader]
cmdline_init_on_alloc=init_on_alloc=0

[modules]
nvidia-peermem=+r opt1=noop

[sysctl]
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.default.arp_ignore = 1

dgx-a100-performance profile - Platform-specific configuration:

[main]
include=nvidia-base,dgx-base
summary=TuneD Profile for DGX A100

[bootloader]
cmdline_iommu=iommu=pt
cmdline_console=console=tty0 console=ttyS1,115200n8

[modules]
nvidia=NVreg_EnablePCIERelaxedOrderingMode=1

nvidia-crashdump-core profile - Crashdump configuration:

[sysctl]
kernel.panic_on_unrecovered_nmi = 1
kernel.unknown_nmi_panic = 1
kernel.hardlockup_panic = 1
kernel.panic_on_io_nmi = 1
kernel.softlockup_panic = 1
kernel.panic_on_oops = 1
kernel.hung_task_panic = 1
kernel.panic_on_rcu_stall = 1
kernel.panic = 30

Disabling TuneD

If you need to disable TuneD and revert all tuning changes:

sudo tuned-adm off
sudo systemctl stop tuned
sudo systemctl disable tuned

To re-enable TuneD:

sudo systemctl enable tuned
sudo systemctl start tuned
sudo tuned-adm profile <profile-name>

Disabling Security Mitigations for Maximum Performance

The nvidia-no-mitigations profile disables CPU security mitigations (such as Spectre and Meltdown protections) to achieve maximum system performance. This section provides detailed instructions on when and how to disable these mitigations safely.

Understanding Security Mitigations

Many modern CPUs are affected by hardware security vulnerabilities (such as Spectre, Meltdown, L1TF, MDS, and others) that enable side-channel attacks. The Linux kernel implements mitigations for these vulnerabilities, but these protections can reduce system performance by 5-30%, depending on the workload.

Performance Impact:

  • Memory-intensive workloads: 5-10% overhead

  • System call-heavy workloads: 15-30% overhead

  • GPU compute workloads: 3-8% overhead (varies by operation)

Security Considerations:

Disabling mitigations should only be done in environments where:

  • Systems are physically isolated or on trusted networks

  • No untrusted code or containers are executed

  • Multi-tenant workloads are not running

  • Maximum performance is critical and security risks are understood and accepted

Danger

Disabling CPU security mitigations removes protections against known CPU vulnerabilities including Spectre, Meltdown, L1TF, MDS, TAA, and others. Only disable mitigations in trusted, isolated environments where you control all code execution.

Using the nvidia-no-mitigations Profile

The nvidia-no-mitigations profile contains a simple configuration:

[main]
summary=NVIDIA no mitigations settings

[bootloader]
cmdline_mitigations=mitigations=off

This adds the mitigations=off kernel parameter, which disables all optional CPU vulnerability mitigations.

Method 1: Applying nvidia-no-mitigations with Your Platform Profile

To combine your platform-specific profile with the no-mitigations setting:

  1. Check your current active profile:

    sudo tuned-adm active
    

    Example output: Current active profile: dgx-h100-performance

  2. Apply your platform profile merged with nvidia-no-mitigations:

    sudo tuned-adm profile dgx-h100-performance nvidia-no-mitigations
    

    Replace dgx-h100-performance with your actual platform profile (for example, dgx-a100-performance, dgx-b200-performance, and so forth).

  3. Verify the profile is active:

    sudo tuned-adm active
    

    Output should show: Current active profile: dgx-h100-performance nvidia-no-mitigations

  4. Reboot the system for the kernel command line changes to take effect:

    sudo reboot
    
  5. After reboot, verify the mitigations are disabled:

    cat /proc/cmdline | grep mitigations
    

    You should see mitigations=off in the output.

  6. Check the current mitigation status:

    grep . /sys/devices/system/cpu/vulnerabilities/*
    

    Entries that reported a “Mitigation:” status before should now report “Vulnerable”. Entries for issues that do not affect your CPU continue to report “Not affected”.

Method 2: Creating a Custom Profile with Mitigations Disabled

If you want to create a permanent custom profile that includes your platform settings and disabled mitigations:

  1. Create a custom profile directory:

    sudo mkdir -p /etc/tuned/dgx-h100-no-mitigations
    
  2. Create the profile configuration:

    sudo vi /etc/tuned/dgx-h100-no-mitigations/tuned.conf
    
  3. Add the following configuration:

    [main]
    include=dgx-h100-performance
    summary=DGX H100 Performance with Security Mitigations Disabled
    
    [bootloader]
    cmdline_mitigations=mitigations=off
    

    This profile inherits all settings from dgx-h100-performance and adds the mitigations=off parameter.

  4. Activate the custom profile:

    sudo tuned-adm profile dgx-h100-no-mitigations
    
  5. Verify the profile is active:

    sudo tuned-adm active
    sudo tuned-adm verify
    
  6. Reboot the system:

    sudo reboot
    
  7. After reboot, verify mitigations are disabled as shown in Method 1.
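Only the include= line differs between platforms, so you can wrap steps 1 to 3 in a small function that generates the same profile for any platform profile. This is a hypothetical sketch. The default destination name follows the dgx-h100-no-mitigations pattern used above: the -performance suffix is replaced with -no-mitigations. The summary wording is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: write a profile that includes PLATFORM_PROFILE and adds
# mitigations=off, mirroring the dgx-h100-no-mitigations example.
# Usage: write_no_mitigations_profile PLATFORM_PROFILE [DEST_DIR]
# Default DEST_DIR: /etc/tuned/<platform>-no-mitigations
write_no_mitigations_profile() (
  base=$1
  dest=${2:-/etc/tuned/${base%-performance}-no-mitigations}
  [ ! -e "$dest/tuned.conf" ] || { echo "$dest/tuned.conf already exists" >&2; exit 1; }
  mkdir -p "$dest" || exit 1
  cat > "$dest/tuned.conf" <<EOF
[main]
include=$base
summary=$base with security mitigations disabled

[bootloader]
cmdline_mitigations=mitigations=off
EOF
  echo "$dest"
)
```

Run as root, write_no_mitigations_profile dgx-a100-performance would create /etc/tuned/dgx-a100-no-mitigations/tuned.conf. Then continue with the activation, verification, and reboot steps above.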

Method 3: Selective Mitigation Disabling

If you want more granular control, you can disable specific mitigations instead of all of them:

  1. Create a custom profile:

    sudo mkdir -p /etc/tuned/dgx-h100-selective-mitigations
    sudo vi /etc/tuned/dgx-h100-selective-mitigations/tuned.conf
    
  2. Configure selective mitigations:

    [main]
    include=dgx-h100-performance
    summary=DGX H100 with Selective Mitigations
    
    [bootloader]
    # Disable specific mitigations individually
    cmdline_spectre_v2=spectre_v2=off
    cmdline_spec_store_bypass=spec_store_bypass_disable=off
    cmdline_l1tf=l1tf=off
    cmdline_mds=mds=off
    cmdline_tsx_async_abort=tsx_async_abort=off
    cmdline_kpti=nopti
    

    Available options for selective disablement:

    • spectre_v2=off - Disable Spectre Variant 2 mitigations

    • spec_store_bypass_disable=off - Disable Spectre Variant 4 mitigations

    • l1tf=off - Disable L1 Terminal Fault mitigations

    • mds=off - Disable Microarchitectural Data Sampling mitigations

    • tsx_async_abort=off - Disable TSX Asynchronous Abort (TAA) mitigations

    • nopti - Disable Page Table Isolation (Meltdown mitigation)

  3. Activate the profile and reboot:

    sudo tuned-adm profile dgx-h100-selective-mitigations
    sudo reboot
    

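When you configure several individual parameters, it is easy to miss one. After the reboot, a check like the following confirms that every parameter you configured is on the running kernel's command line. It is a hypothetical sketch that does exact, whitespace-delimited matching against /proc/cmdline or any file in the same format.

```shell
#!/bin/sh
# Hypothetical helper: report kernel parameters missing from a command line.
# Usage: missing_kernel_params CMDLINE_FILE PARAM...
# Prints "missing: PARAM" for each absent parameter; exits non-zero if any are missing.
missing_kernel_params() (
  cmdline=" $(cat "$1") "     # pad with spaces so every parameter is space-delimited
  shift
  status=0
  for param in "$@"; do
    case $cmdline in
      *" $param "*) ;;         # exact, whole-word match
      *) echo "missing: $param"; status=1 ;;
    esac
  done
  exit "$status"
)
```

For the profile above, run missing_kernel_params /proc/cmdline spectre_v2=off spec_store_bypass_disable=off l1tf=off mds=off tsx_async_abort=off nopti. The helper prints nothing and succeeds when all six parameters are present.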
Verifying Mitigation Status

After disabling mitigations and rebooting, verify the configuration:

  1. Check kernel command line:

    cat /proc/cmdline
    

    Look for mitigations=off or your specific mitigation parameters.

  2. Check CPU vulnerability status:

    grep . /sys/devices/system/cpu/vulnerabilities/*
    

    Example output with mitigations disabled:

    /sys/devices/system/cpu/vulnerabilities/itlb_multihit:Not affected
    /sys/devices/system/cpu/vulnerabilities/l1tf:Vulnerable
    /sys/devices/system/cpu/vulnerabilities/mds:Vulnerable; SMT vulnerable
    /sys/devices/system/cpu/vulnerabilities/meltdown:Vulnerable
    /sys/devices/system/cpu/vulnerabilities/spec_store_bypass:Vulnerable
    /sys/devices/system/cpu/vulnerabilities/spectre_v1:Vulnerable
    /sys/devices/system/cpu/vulnerabilities/spectre_v2:Vulnerable
    /sys/devices/system/cpu/vulnerabilities/tsx_async_abort:Not affected
    

    The Vulnerable status indicates mitigations are disabled.

  3. Compare with enabled mitigations (for reference):

    Example output with mitigations enabled:

    /sys/devices/system/cpu/vulnerabilities/l1tf:Mitigation: PTE Inversion
    /sys/devices/system/cpu/vulnerabilities/mds:Mitigation: Clear buffers
    /sys/devices/system/cpu/vulnerabilities/meltdown:Mitigation: PTI
    /sys/devices/system/cpu/vulnerabilities/spec_store_bypass:Mitigation: SSB disabled
    /sys/devices/system/cpu/vulnerabilities/spectre_v1:Mitigation: usercopy/swapgs barriers
    /sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Enhanced IBRS
    

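On a system with many vulnerability entries, a count per status can make the before-and-after comparison easier to read. This is a hypothetical sketch that classifies each file under /sys/devices/system/cpu/vulnerabilities by the start of its content, following the status strings shown above. Anything else, for example an "Unknown" status, is counted as other.

```shell
#!/bin/sh
# Hypothetical helper: count CPU vulnerability entries by reported status.
# Usage: vuln_summary [DIR]   (default: /sys/devices/system/cpu/vulnerabilities)
vuln_summary() (
  dir=${1:-/sys/devices/system/cpu/vulnerabilities}
  mitigated=0 vulnerable=0 unaffected=0 other=0
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    status=$(cat "$f")
    case $status in
      Mitigation*)       mitigated=$((mitigated + 1)) ;;
      Vulnerable*)       vulnerable=$((vulnerable + 1)) ;;
      "Not affected"*)   unaffected=$((unaffected + 1)) ;;
      *)                 other=$((other + 1)) ;;   # e.g. "Unknown: ..."
    esac
  done
  echo "mitigated=$mitigated vulnerable=$vulnerable not_affected=$unaffected other=$other"
)
```

Run it once with mitigations enabled and once after switching profiles and rebooting. After a successful switch, the mitigated count should drop and the vulnerable count should rise.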
Re-enabling Security Mitigations

If you need to re-enable security mitigations:

Method 1: Switch back to standard profile

sudo tuned-adm profile dgx-h100-performance
sudo reboot

Method 2: Remove nvidia-no-mitigations from merged profile

If you were using a merged profile:

# Instead of: dgx-h100-performance nvidia-no-mitigations
# Use:
sudo tuned-adm profile dgx-h100-performance
sudo reboot

Method 3: Delete custom profile

If you created a custom profile:

sudo tuned-adm profile dgx-h100-performance
sudo rm -rf /etc/tuned/dgx-h100-no-mitigations
sudo reboot

After rebooting, verify mitigations are enabled:

grep . /sys/devices/system/cpu/vulnerabilities/*

You should see Mitigation entries instead of Vulnerable status.

Performance Testing Recommendations

When disabling mitigations, measure the actual performance impact for your specific workload:

  1. Benchmark with mitigations enabled (baseline):

    sudo tuned-adm profile dgx-h100-performance
    sudo reboot
    # Run your performance benchmarks and record results
    
  2. Benchmark with mitigations disabled:

    sudo tuned-adm profile dgx-h100-performance nvidia-no-mitigations
    sudo reboot
    # Run the same benchmarks and compare results
    
  3. Calculate the performance improvement:

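    • If the improvement is less than 5%, consider keeping mitigations enabled for security

    • If the improvement is between 5% and 10%, weigh the gain against your security requirements

    • If the improvement is greater than 10%, the trade-off may be worthwhile in trusted environments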

    • Document your findings and revisit the decision periodically
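A small helper can apply those thresholds consistently across benchmark runs. It is a hypothetical sketch. It assumes a higher-is-better metric such as throughput; for a lower-is-better metric such as latency, compute the reduction instead. It uses the 5% and 10% cut-offs from the list above.

```shell
#!/bin/sh
# Hypothetical helpers for comparing benchmark runs.
# Assumes a higher-is-better metric (for example, samples per second).

# Percent change from BASELINE (mitigations on) to NEW (mitigations off).
# Usage: pct_improvement BASELINE NEW
pct_improvement() {
  awk -v base="$1" -v new="$2" 'BEGIN {
    base += 0; new += 0                       # force numeric comparison
    if (base <= 0) { print "baseline must be positive" > "/dev/stderr"; exit 1 }
    printf "%.1f\n", (new - base) * 100 / base
  }'
}

# Map a percent improvement onto the guidance in this section.
# Usage: mitigation_tradeoff PERCENT
mitigation_tradeoff() {
  awk -v p="$1" 'BEGIN {
    p += 0
    if (p < 5) print "under 5%: consider keeping mitigations enabled"
    else if (p > 10) print "over 10%: disabling may be worthwhile in trusted environments"
    else print "5-10%: weigh the gain against your security requirements"
  }'
}
```

For example, mitigation_tradeoff "$(pct_improvement 100 112)" combines both steps; the first helper prints 12.0 for these inputs.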

Practical Use Case Scenarios

Here are some common scenarios and recommended profile configurations.

Scenario 1: Standard Production DGX H100 System

For most production workloads, use the default platform profile:

sudo tuned-adm profile dgx-h100-performance

This profile includes all necessary optimizations from nvidia-base and dgx-base.

Scenario 2: High-Performance GPU Training

For GPU training workloads requiring maximum performance, use the default platform profile:

sudo tuned-adm profile dgx-h100-performance

This profile provides platform-specific optimizations including IOMMU settings, console configuration, and PCI settings optimized for DGX H100 systems.

To verify the profile is active:

sudo tuned-adm active

Output: Current active profile: dgx-h100-performance

Scenario 3: High-Performance Inference Server

Create a custom profile optimized for inference workloads with low latency requirements:

sudo mkdir -p /etc/tuned/dgx-inference
sudo vi /etc/tuned/dgx-inference/tuned.conf

Add the following content:

[main]
include=dgx-h100-performance
summary=Optimized for inference workloads

[bootloader]
# Isolate CPUs for inference processes (adjust based on your CPU count)
cmdline_isolcpus=isolcpus=8-63
# Allocate hugepages for better memory access
cmdline_hugepages=hugepagesz=2M hugepages=8192

[sysctl]
# Minimize latency
vm.swappiness=1
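# Optimize for response time
# Kernels 5.13 and later no longer expose these two scheduler settings as
# sysctls (they moved to debugfs), so TuneD cannot apply them there; check
# with "sysctl kernel.sched_latency_ns" before relying on them.
kernel.sched_latency_ns=1000000
kernel.sched_min_granularity_ns=100000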

Then activate:

sudo tuned-adm profile dgx-inference
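The isolcpus=8-63 value above leaves CPUs 0-7 for the operating system and isolates the remaining CPUs of a 64-CPU system. A small hypothetical helper can derive the range from the CPU count that nproc --all reports. It assumes the CPUs are numbered contiguously from 0 and isolates everything above the first N. On systems with SMT or several NUMA nodes, you may prefer to choose the isolated CPUs by topology, for example from the output of lscpu -e.

```shell
#!/bin/sh
# Hypothetical helper: build an isolcpus= range that keeps the first
# HOUSEKEEPING CPUs for the OS and isolates the rest.
# Usage: isolcpus_range HOUSEKEEPING [TOTAL_CPUS]   (default: nproc --all)
isolcpus_range() (
  keep=$1
  total=${2:-$(nproc --all)}
  if [ "$keep" -lt 1 ] || [ "$keep" -ge "$total" ]; then
    echo "HOUSEKEEPING must be between 1 and $((total - 1))" >&2
    exit 1
  fi
  echo "isolcpus=$keep-$((total - 1))"
)
```

isolcpus_range 8 prints the value to place after cmdline_isolcpus= in the profile.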

Scenario 4: System with Crashdump Debugging Required

When you need to capture crash information for debugging:

sudo tuned-adm profile dgx-h100-crashdump

This automatically configures crashkernel memory reservation and panic behavior.

Scenario 5: Multi-Node Training Cluster with Network Optimization

For distributed training across multiple nodes requiring network tuning:

sudo mkdir -p /etc/tuned/dgx-distributed-training
sudo vi /etc/tuned/dgx-distributed-training/tuned.conf

Add the following content:

[main]
include=dgx-h100-performance
summary=Optimized for multi-node distributed training

[sysctl]
# Network buffer tuning for high-throughput connections
net.core.rmem_max=536870912
net.core.wmem_max=536870912
net.core.rmem_default=134217728
net.core.wmem_default=134217728
net.ipv4.tcp_rmem=4096 87380 268435456
net.ipv4.tcp_wmem=4096 65536 268435456

# Increase connection backlog
net.core.netdev_max_backlog=10000
net.ipv4.tcp_max_syn_backlog=8192

# TCP tuning for high-speed networks
net.ipv4.tcp_congestion_control=bbr
net.core.default_qdisc=fq

Then activate:

sudo tuned-adm profile dgx-distributed-training
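tuned-adm verify reports whether the profile as a whole is applied. To spot-check individual values from the profile above, for example that BBR and the fq qdisc are actually in use, you can compare them against /proc/sys directly. This is a hypothetical sketch. SYSCTL_ROOT exists only so the function can be tested against a fake directory. Whitespace in multi-value settings such as tcp_rmem is normalized before comparing.

```shell
#!/bin/sh
# Hypothetical helper: compare live sysctl values with KEY=VALUE expectations.
# Usage: check_sysctls KEY=VALUE...
# SYSCTL_ROOT defaults to /proc/sys; override it only for testing.
check_sysctls() (
  root=${SYSCTL_ROOT:-/proc/sys}
  status=0
  for pair in "$@"; do
    key=${pair%%=*}
    want=${pair#*=}
    # net.core.default_qdisc -> $root/net/core/default_qdisc
    path=$root/$(printf '%s' "$key" | tr . /)
    if [ ! -r "$path" ]; then
      echo "$key: not available"
      status=1
      continue
    fi
    # Multi-value settings such as tcp_rmem are tab-separated; normalize to single spaces.
    got=$(tr -s ' \t' '  ' < "$path")
    if [ "$got" = "$want" ]; then
      echo "$key: ok"
    else
      echo "$key: expected '$want', found '$got'"
      status=1
    fi
  done
  exit "$status"
)
```

For example, check_sysctls net.ipv4.tcp_congestion_control=bbr net.core.default_qdisc=fq net.core.rmem_max=536870912 prints one line per key and returns non-zero if any value differs.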

Best Practices

  • Always create custom profiles in /etc/tuned/ to avoid conflicts with package updates

  • Test custom profiles on non-production systems before deploying to production

  • Use tuned-adm verify to ensure profiles are correctly applied

  • Document any custom settings and the reasons for the changes

  • For platform-specific systems, start with the appropriate NVIDIA profile and customize as needed

  • Use profile merging when you want to combine features from multiple profiles

  • Monitor system performance after applying or changing profiles to ensure desired results

  • Remember that bootloader changes (in the [bootloader] section) require a system reboot to take effect

  • Use tuned-adm profile_info <profile-name> to see detailed information about what a profile does

Note

After installing NVIDIA BaseOS, the appropriate dgx-<platform>-performance profile should be automatically activated. Verify this with sudo tuned-adm active during initial setup.

Additional Resources

For more information about TuneD configuration options and advanced usage, refer to:

  • TuneD documentation: man tuned and man tuned-adm

  • TuneD configuration guide: man tuned.conf

  • /usr/share/doc/tuned/ directory for additional documentation