Platform Software Patches and Configurations
This section provides information about software patches and configuration settings that are required or recommended for the Grace platform.
Linux Kernel Patches
This section provides information about the Linux Kernel patches that support the Grace platform.
Upstream Bare Metal Linux Kernel
The tables in this section list Linux kernel patches that are upstream, meaning that they have been accepted into the mainline Linux kernel tree.
Note
There might be circumstances where additional, dependent patches are required to support the patches listed in these tables (for example, when the patch listed is part of a larger series).
The Git description is pulled directly from the main Linux kernel Git log and is intended to help with searches and comparisons. The description might contain spelling and grammatical errors.
The following table contains patches that enable functions and are required for bare metal support on the Grace platform.
| LKML Discussion | Git Commit | Git Description | Minimum Linux Kernel Release |
|---|---|---|---|
| | | PCI: Mark some NVIDIA GPUs to avoid bus reset | v5.13 |
| | | PCI: Add support for ACPI _RST reset method | v5.15 |
| | | dma-mapping: remove bogus test for pfn_valid from dma_map_resource | v5.16 |
| | | i2c: tegra: Add the ACPI support | v5.17 |
| | | spi: tegra210-quad: use devm call for cdata memory | v5.17 |
| | | i2c: tegra: use i2c_timings for bus clock freq | v5.17 |
| | | gpio: tegra186: Add IRQ per bank for Tegra241 | v5.17 |
| | | gpio: tegra186: Add support for Tegra241 | v5.17 |
| | | i2c: tegra: Add SMBus block read function | v5.18 |
| | | device property: Add fwnode_irq_get_byname | v5.18 |
| | | i2c: smbus: Use device_*() functions instead of of_*() | v5.18 |
| | | docs: firmware-guide: ACPI: Add named interrupt doc | v5.18 |
| | | spi: tegra210-quad: use device_reset method | v5.18 |
| | | spi: tegra210-quad: add new chips to compatible | v5.18 |
| | | spi: tegra210-quad: add acpi support | v5.18 |
| | | spi: tegra210-quad: combined sequence mode | v5.18 |
| | | spi: tegra210-quad: Multi-cs support | v6.0 |
| | | arm64: tegra: Enable Tegra SPI & QSPI in deconfig | v6.1 |
| | | genirq: Use a maple tree for interrupt descriptor management | v6.5 |
| | | PCI: Extend ACS configurability | v6.11 |
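The minimum releases in these tables can be compared against a running kernel. The following is a minimal sketch, not an official tool: the helper names `parse_kernel_release` and `meets_minimum` are hypothetical, and the comparison only looks at the major and minor version. Distribution kernels often backport patches, so a failed comparison does not prove that a patch is missing.

```python
import re
from typing import Tuple


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as 'v6.11' or '6.8.0-45-generic'."""
    match = re.match(r"v?(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(running: str, minimum: str) -> bool:
    """Return True when the running release is at or above the minimum release."""
    # Tuples compare numerically, so (6, 11) is correctly newer than (6, 5).
    return parse_kernel_release(running) >= parse_kernel_release(minimum)


if __name__ == "__main__":
    # Example release string; on a live system, platform.release() from the
    # standard library returns the equivalent of `uname -r`.
    print(meets_minimum("6.8.0-45-generic", "v6.11"))
```

Comparing integer tuples instead of raw strings matters here, because a plain string comparison would rank "6.5" above "6.11".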
The following table contains patches that resolve critical issues and hardware errata.
| LKML Discussion | Git Commit | Git Description | Minimum Linux Kernel Release |
|---|---|---|---|
| | | i2c: tegra: Set ACPI node as primary fwnode | v6.2 |
| | | i2c: tegra: Fix PEC support for SMBUS block read | v6.4 |
| | | irqchip/gicv3: Workaround for NVIDIA erratum T241-FABRIC-4 | v6.4 |
| | | drm/ast: Fix ARM compatibility | v6.4 |
| | | tpm_tis_spi: Account for SPI header when allocating TPM SPI xfer buffer | v6.10 |
| | | PCI: Use downstream bridges for distributing resources | v6.15 |
The following table contains patches that resolve faults in enablement patches and other issues that have been discovered while testing the Grace platform with various workloads.
| LKML Discussion | Git Commit | Git Description | Minimum Linux Kernel Release |
|---|---|---|---|
| | | spi: tegra210-quad: Fix combined sequence | v6.1 |
| | | spi: tegra210-quad: Don’t initialise DMA if not supported | v6.1 |
| | | spi: tegra210-quad: Fix duplicate resource error | v6.1 |
| | | mm: remember young/dirty bit for page migrations | v6.1 |
| | | rtc: efi: Enable SET/GET WAKEUP services as optional | v6.2 |
| | | KVM: arm64: GICv4.1: Fix race with doorbell on VPE activation/deactivation | v6.2 |
| | | ACPI/IORT: Update SMMUv3 DeviceID support | v6.2 |
| | | spi: tegra210-quad: Fix validate combined sequence | v6.3 |
| | | spi: tegra210-quad: Fix iterator outside loop | v6.3 |
| | | spi: tegra210-quad: set half duplex flag | v6.3 |
| | | arm64: kaslr: don’t pretend KASLR is enabled if offset < MIN_KIMG_ALIGN | v6.3 |
| | | ACPI: processor: Reorder acpi_processor_driver_init() | v6.3 |
| | | thermal: core: Introduce thermal_cooling_device_present() | v6.3 |
| | | thermal: core: Introduce thermal_cooling_device_update() | v6.3 |
| | | ACPI: processor: thermal: Update CPU cooling devices on cpufreq policy changes | v6.3 |
| | | thermal: core: Drop excessive lockdep_assert_held() calls | v6.3 |
| | | PCI/AER: Configure ECRC only if AER is native | v6.3 |
| | | spi: Add TPM HW flow flag | v6.4 |
| | | spi: tegra210-quad: Enable TPM wait polling | v6.4 |
| | | arm64: module: rework module VA range selection | v6.5 |
| | | tpm_tis-spi: Add hardware wait polling | v6.6 |
| | | iommu/arm-smmu-v3: Fix soft lockup triggered by arm_smmu_mm_invalidate_range | v6.6 |
| | | mm/page_alloc: fix min_free_kbytes calculation regarding ZONE_MOVABLE | v6.6 |
| | | mm/mglru: fix underprotected page cache | v6.7 |
| | | mm/mglru: try to stop at high watermarks | v6.7 |
| | | mm/mglru: respect min_ttl_ms with memcgs | v6.7 |
| | | mm/mglru: reclaim offlined memcgs harder | v6.7 |
| | | PCI/MSI: Prevent MSI hardware interrupt number truncation | v6.8 |
| | | PCI/ASPM: Update save_state when configuration changes | v6.9 |
| | | gpio: tegra186: Fix tegra186_gpio_is_accessible() check | v6.9 |
| | | swiotlb: Fix double-allocation of slots due to broken alignment handling | v6.9 |
| | | swiotlb: Honour dma_alloc_coherent() alignment in swiotlb_alloc() | v6.9 |
| | | swiotlb: Fix alignment checks when both allocation and DMA masks are present | v6.9 |
| | | iommu/dma: Force swiotlb_max_mapping_size on an untrusted device | v6.9 |
| | | swiotlb: extend buffer pre-padding to alloc_align_mask if necessary | v6.9 |
| | | arm64: tlb: Fix TLBI RANGE operand | v6.10 |
| | | arm64: tlb: Improve __TLBI_VADDR_RANGE() | v6.10 |
| | | arm64: tlb: Allow range operation for MAX_TLBI_RANGE_PAGES | v6.10 |
| | | PCI: Clear Secondary Status errors after enumeration | v6.10 |
| | | PCI/DOE: Support discovery version 2 | v6.10 |
| | | cpufreq/cppc: Don’t compare desired_perf in target() | v6.11 |
| | | mm: fix old/young bit handling in the faulting path | v6.11 |
| | | i2c: tegra: Do not mark ACPI devices as irq safe | v6.11 |
| | | ACPI: PRM: Find EFI_MEMORY_RUNTIME block for PRM handler and context | v6.12 |
| | | mm/gup: stop leaking pinned pages in low memory conditions | v6.12 |
| | | mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases | v6.12 |
| | | mm/gup: handle NULL pages in unpin_user_pages() | v6.12 |
| | | PCI: Fix pci_enable_acs() support for the ACS quirks | v6.12 |
| | | cppc_cpufreq: Use desired perf if feedback ctrs are 0 or unchanged | v6.13 |
| | | cppc_cpufreq: Remove HiSilicon CPPC workaround | v6.13 |
| | | mm/vmscan: wake up flushers conditionally to avoid cgroup OOM | v6.13 |
| | | ACPI/HMAT: Move HMAT messages to pr_debug() | v6.14 |
| | | ACPI: PRM: Remove unnecessary strict handler address checks | v6.14 |
| | | RDMA/mlx5: Fix a WARN during dereg_mr for DM type | v6.14 |
| | | Fix mmu notifiers for range-based invalidates | v6.14 |
| | | PCI/ACS: Fix ‘pci=config_acs=’ parameter | v6.15 |
| | | iommu: Skip PASID validation for devices without PASID capability | v6.15 |
| | | spi: tegra210-quad: use WARN_ON_ONCE instead of WARN_ON for timeouts | v6.15 |
| | | spi: tegra210-quad: add rate limiting and simplify timeout error message | v6.15 |
| | | i2c: tegra: check msg length in SMBUS block read | v6.16 |
| | | i2c: tegra: Fix reset error handling with ACPI | v6.16 |
| | | spi: tegra210-quad: modify chip select (CS) deactivation | v6.16 |
| | | i2c: tegra: Use internal reset when reset property is not available | v6.17 |
| | | drm/ast: Use msleep instead of mdelay for edid read | v6.18 |
| | n/a | spi: tegra210-quad: Fix timeout handling | n/a |
| | n/a | spi: tegra210-quad: Check hardware status on timeout | n/a |
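When a kernel is based on an older release, the tables above indicate which patches still have to be backported. The sketch below illustrates that selection on a handful of rows copied from the tables; `release_key` and `patches_to_backport` are hypothetical helper names, and entries marked n/a (not yet in a tagged release) are always reported.

```python
import re
from typing import List, Optional, Tuple

# A few (Git description, minimum release) rows copied from the tables above.
SAMPLE_PATCHES: List[Tuple[str, str]] = [
    ("genirq: Use a maple tree for interrupt descriptor management", "v6.5"),
    ("PCI/MSI: Prevent MSI hardware interrupt number truncation", "v6.8"),
    ("PCI: Extend ACS configurability", "v6.11"),
    ("spi: tegra210-quad: Fix timeout handling", "n/a"),
]


def release_key(release: str) -> Optional[Tuple[int, int]]:
    """Convert 'v6.11' to (6, 11); return None for 'n/a' or other text."""
    match = re.fullmatch(r"v(\d+)\.(\d+)", release)
    return (int(match.group(1)), int(match.group(2))) if match else None


def patches_to_backport(base: str, patches: List[Tuple[str, str]]) -> List[str]:
    """List patches whose minimum release is newer than the base release."""
    base_key = release_key(base)
    if base_key is None:
        raise ValueError(f"base release must look like 'v6.8', got {base!r}")
    needed = []
    for description, minimum in patches:
        key = release_key(minimum)
        # No tagged release (n/a) means the patch is missing from every base.
        if key is None or key > base_key:
            needed.append(description)
    return needed


if __name__ == "__main__":
    for description in patches_to_backport("v6.8", SAMPLE_PATCHES):
        print(description)
```

As the note at the start of this section points out, a listed patch might belong to a larger series, so the result is a starting list and not a complete dependency set.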
The following table contains optional patches that improve performance on Grace platforms.
| LKML Discussion | Git Commit | Git Description | Minimum Linux Kernel Release |
|---|---|---|---|
| | | ACPI: thermal: Add Thermal fast Sampling Period (_TFP) support | v6.8 |
| | | ACPI: processor: reduce CPUFREQ thermal reduction pctg for Tegra241 | v6.8 |
| | | ACPI: arm64: export acpi_arch_thermal_cpufreq_pctg() | v6.8 |
| | | mm: allow deferred splitting of arbitrary anon large folios | v6.8 |
| | | mm: non-pmd-mappable, large folios for folio_add_new_anon_rmap() | v6.8 |
| | | mm: thp: introduce multi-size THP sysfs interface | v6.8 |
| | | mm: thp: support allocation of anonymous multi-size THP | v6.8 |
| | | selftests/mm/kugepaged: restore thp settings at exit | v6.8 |
| | | selftests/mm: factor out thp settings management | v6.8 |
| | | selftests/mm: support multi-size THP interface in thp_settings | v6.8 |
| | | selftests/mm/khugepaged: enlighten for multi-size THP | v6.8 |
| | | selftests/mm/cow: generalize do_run_with_thp() helper | v6.8 |
| | | selftests/mm/cow: add tests for anonymous multi-size THP | v6.8 |
| | | mm: clarify the spec for set_ptes() | v6.9 |
| | | mm: thp: batch-collapse PMD with set_ptes() | v6.9 |
| | | mm: introduce pte_advance_pfn() and use for pte_next_pfn() | v6.9 |
| | | arm64/mm: convert pte_next_pfn() to pte_advance_pfn() | v6.9 |
| | | x86/mm: convert pte_next_pfn() to pte_advance_pfn() | v6.9 |
| | | mm: tidy up pte_next_pfn() definition | v6.9 |
| | | arm64/mm: convert READ_ONCE(*ptep) to ptep_get(ptep) | v6.9 |
| | | arm64/mm: convert set_pte_at() to set_ptes(…, 1) | v6.9 |
| | | arm64/mm: convert ptep_clear() to ptep_get_and_clear() | v6.9 |
| | | arm64/mm: new ptep layer to manage contig bit | v6.9 |
| | | arm64/mm: split __flush_tlb_range() to elide trailing DSB | v6.9 |
| | | arm64/mm: wire up PTE_CONT for user mappings | v6.9 |
| | | arm64/mm: implement new wrprotect_ptes() batch API | v6.9 |
| | | arm64/mm: implement new [get_and_]clear_full_ptes() batch APIs | v6.9 |
| | | mm: add pte_batch_hint() to reduce scanning in folio_pte_batch() | v6.9 |
| | | arm64/mm: implement pte_batch_hint() | v6.9 |
| | | arm64/mm: __always_inline to improve fork() perf | v6.9 |
| | | arm64/mm: automatically fold contpte mappings | v6.9 |
| | | arm64/io: Provide a WC friendly __iowriteXX_copy() | v6.10 |
| | | net: hns3: Remove io_stop_wc() calls after __iowrite64_copy() | v6.10 |
| | | IB/mlx5: Use __iowrite64_copy() for write combining stores | v6.10 |
The following table contains optional patches that enable functions or resolve faults in performance tooling on Grace platforms.
| LKML Discussion | Git Commit | Git Description | Minimum Linux Kernel Release |
|---|---|---|---|
| | | ACPICA: Add support for ARM Performance Monitoring Unit Table | v5.19 |
| | | ACPI: ARM Performance Monitoring Unit Table (APMT) initial support | v6.2 |
| | | perf: arm_cspmu: Add support for ARM CoreSight PMU driver | v6.2 |
| | | perf: arm_cspmu: Add support for NVIDIA SCF and MCF attribute | v6.2 |
| | | ACPI: APMT: Fix kerneldoc and indentation | v6.2 |
| | | perf: arm_cspmu: Fix modular builds due to missing MODULE_LICENSE()s | v6.2 |
| | | perf: arm_cspmu: Fix build failure on x86_64 | v6.2 |
| | | perf: arm_cspmu: Fix module cyclic dependency | v6.2 |
| | | perf: arm_cspmu: Fix variable dereference warning | v6.4 |
| | | arm64: defconfig: Enable ARM CoreSight PMU driver | v6.4 |
| | | perf: arm_cspmu: Set irq affinitiy only if overflow interrupt is used | v6.5 |
| | | perf/arm_cspmu: Fix event attribute type | v6.5 |
| | | perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE | v6.5 |
| | | ACPI/APMT: Don’t register invalid resource | v6.5 |
| | | perf/arm_cspmu: Decouple APMT dependency | v6.5 |
| | | perf/arm_cspmu: Clean up ACPI dependency | v6.5 |
| | | coresight: etm4x: Allocate and device assign ‘struct etmv4_drvdata’ earlier | v6.6 |
| | | coresight: etm4x: Drop iomem ‘base’ argument from etm4_probe() | v6.6 |
| | | coresight: etm4x: Change etm4_platform_driver driver for MMIO devices | v6.6 |
| | | coresight: platform: acpi: Ignore the absence of graph | v6.6 |
| | | coresight: etm4x: Add ACPI support in platform driver | v6.6 |
| | | arm_pmu: acpi: Add a representative platform device for TRBE | v6.6 |
| | | perf vendor events arm64: Update N2 and V2 metrics and events using Arm telemetry repo | v6.6 |
| | | perf: arm_cspmu: Reject events meant for other PMUs | v6.7 |
| | | perf cs-etm: Fix incorrect or missing decoder for raw trace | v6.7 |
| | | perf: arm_cspmu: Separate Arm and vendor module | v6.7 |
| | | coresight: trbe: Enable ACPI based TRBE devices | v6.8 |
| | | coresight: trbe: Add a representative coresight_platform_data for TRBE | v6.8 |
| | | perf parse-events: Make legacy events lower priority than sysfs/JSON | v6.8 |
| | | arm64: Add Neoverse-V2 part | v6.10 |
| | | tools headers arm64: Sync arm64’s cputype.h with the kernel sources | v6.10 |
| | | perf cs-etm: Create decoders after both AUX and HW_ID search passes | v6.12 |
| | | perf: cs-etm: Allocate queues for all CPUs | v6.12 |
| | | perf: cs-etm: Move traceid_list to each queue | v6.12 |
| | | perf: cs-etm: Create decoders based on the trace ID mappings | v6.12 |
| | | perf: cs-etm: Only save valid trace IDs into files | v6.12 |
| | | perf: cs-etm: Support version 0.1 of HW_ID packets | v6.12 |
| | | perf: cs-etm: Print queue number in raw trace dump | v6.12 |
| | | coresight: Remove unused ETM Perf stubs | v6.12 |
| | | coresight: Clarify comments around the PID of the sink owner | v6.12 |
| | | coresight: Move struct coresight_trace_id_map to common header | v6.12 |
| | | coresight: Expose map arguments in trace ID API | v6.12 |
| | | coresight: Make CPU id map a property of a trace ID map | v6.12 |
| | | coresight: Use per-sink trace ID maps for Perf sessions | v6.12 |
| | | coresight: Remove pending trace ID release mechanism | v6.12 |
| | | coresight: Emit sink ID in the HW_ID packets | v6.12 |
| | | coresight: Make trace ID map spinlock local to the map | v6.12 |
| | | perf arm-spe: Rename arm_spe__synth_data_source_generic() | v6.13 |
| | | perf arm-spe: Rename the common data source encoding | v6.13 |
| | | perf arm-spe: Introduce arm_spe__is_homogeneous() | v6.13 |
| | | perf arm-spe: Use metadata to decide the data source feature | v6.13 |
| | | perf arm-spe: Remove the unused ‘midr’ field | v6.13 |
| | | perf arm-spe: Add Neoverse-V2 to common data source encoding list | v6.13 |
| | | perf arm-spe: Add Cortex CPUs to common data source encoding list | v6.13 |
| | | perf: arm_cspmu: nvidia: remove unsupported SCF events | v6.14 |
| | | perf: arm_cspmu: nvidia: fix sysfs path in the kernel doc | v6.14 |
| | | perf: arm_cspmu: nvidia: enable NVLINK-C2C port filtering | v6.14 |
| | | perf: arm_cspmu: nvidia: monitor all ports by default | v6.14 |
| | | arm64: amu: Delay allocating cpumask for AMU FIE support | v6.14 |
| | | arch_topology: init capacity_freq_ref to 0 | v6.15 |
| | | cpufreq: Allow arch_freq_get_on_cpu to return an error | v6.15 |
| | | cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry | v6.15 |
| | | arm64: Provide an AMU-based version of arch_freq_get_on_cpu | v6.15 |
| | | arm64: Update AMU-based freq scale factor on entering idle | v6.15 |
| | | arm64: Utilize for_each_cpu_wrap for reference lookup | v6.15 |
The following table contains patches that support NVIDIA CUDA® features on Grace platforms.
| LKML Discussion | Git Commit | Git Description | Minimum Linux Kernel Release |
|---|---|---|---|
| | | mm: Convert page kmemcg type to a page memcg flag | v5.11 |
| | | RDMA/umem: Support importing dma-buf as user memory region | v5.12 |
| | | vfio/pci: remove vfio_pci_nvlink2 | v5.13 |
| | | mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() | v5.18 |
| | | mm/migrate_device.c: copy pte dirty bit to page | v6.0 |
| | | mm/migrate_device.c: add missing flush_cache_page() | v6.0 |
| | | mm/migrate_device.c: flush TLB while holding PTL | v6.0 |
Linux Kernel Configs
This section provides information about the Linux Kernel config settings for the Grace platform.
The following table contains config settings that enable functions and are required for bare metal support on the Grace platform.
| Kernel Config | Description |
|---|---|
| CONFIG_NR_CPUS=512 | Supports the maximum Grace configuration. |
| CONFIG_NODES_SHIFT=6 | Supports the maximum Grace configuration. |
| CONFIG_ARM_SMMU_V3_SVA=y | Supports shared virtual addressing. |
| CONFIG_ARM64_PMEM=y | Supports persistent memory. |
| CONFIG_ARM_SDE_INTERFACE=y | Supports RAS notifications. |
| CONFIG_BLK_DEV_PMEM=m | Enables the persistent memory block device. |
| CONFIG_DEVICE_MIGRATION=y | Enables device physical page migration. |
| CONFIG_DEVICE_PRIVATE=y | Supports unaddressable device memory; only required when using the NVIDIA GPU Driver. |
| CONFIG_GPIO_TEGRA186=y | Supports the GPIO interface. |
| CONFIG_HOTPLUG_PCI_PCIE=y | Supports PCIe native hotplug. |
| CONFIG_IOMMU_DEFAULT_PASSTHROUGH=n | Disables IOMMU translation bypass for DMA. Refer to Input-Output Memory Management Unit Passthrough in the NVIDIA Grace Performance Tuning Guide for more information. |
| CONFIG_IOMMU_SVA=y | Required for the NVIDIA Unified Virtual Memory driver to detect and enable ATS support for the CUDA stack. |
| CONFIG_PCIE_DPC=y | Supports downstream port containment. |
| CONFIG_PCIE_EDR=y | Enables Error Disconnect Recover (EDR) support. |
| CONFIG_SPI_TEGRA210_QUAD=y | Supports the QSPI controller. |
| CONFIG_TCG_TIS_SPI=y | Supports the TPM SPI interface. |
| CONFIG_MTD_SPI_NOR=y | Supports the SPI NOR flash device. |
| CONFIG_IPMI_SSIF=m | Supports the SMBus interface to the BMC. |
| arch/arm64/include/asm/irq.h: #if defined(CONFIG_ARM_GIC_V3_ITS) #define NR_IRQS (1 << 19) #endif | Supports the maximum Grace configuration. Not required when the kernel carries 721255b9826bd11c7a38b585905fc2dd0fb94e52. |
| CONFIG_ARCH_TEGRA_241_SOC=y | Supports reading Grace fuses. This is only available on kernels that carry 8402074f30238ee1bdc70b843932cd7350830ab6. |
| CONFIG_TEGRA_IVC=y | Enables the Inter Processor Communication framework. |
| CONFIG_USB_XHCI_PCI_RENESAS=y or m | Enables the Renesas xHCI controller. |
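A kernel's config file can be checked against the required settings programmatically. The sketch below is one possible approach under stated assumptions: `REQUIRED` holds only a subset of the table above, the names `parse_config` and `find_mismatches` are hypothetical, and the input is assumed to use the standard kernel `.config` text format (many distributions install it as `/boot/config-<release>`, but that location is not guaranteed).

```python
import re
from typing import Dict, List, Set, Tuple

# Subset of the required settings from the table above. Each option maps to
# its acceptable values; "n" also covers options that are absent or unset.
REQUIRED: Dict[str, Set[str]] = {
    "CONFIG_NR_CPUS": {"512"},  # strict match on the documented value
    "CONFIG_IOMMU_DEFAULT_PASSTHROUGH": {"n"},
    "CONFIG_PCIE_DPC": {"y"},
    "CONFIG_GPIO_TEGRA186": {"y"},
    "CONFIG_USB_XHCI_PCI_RENESAS": {"y", "m"},
}

_SET_LINE = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET_LINE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text: str) -> Dict[str, str]:
    """Parse kernel .config text into {option: value}; unset options map to 'n'."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        set_match = _SET_LINE.match(line)
        if set_match:
            values[set_match.group(1)] = set_match.group(2)
            continue
        unset_match = _UNSET_LINE.match(line)
        if unset_match:
            values[unset_match.group(1)] = "n"
    return values


def find_mismatches(
    required: Dict[str, Set[str]], actual: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Return (option, actual value) pairs that do not satisfy the requirement."""
    mismatches = []
    for name, allowed in required.items():
        value = actual.get(name, "n")  # a missing option counts as disabled
        if value not in allowed:
            mismatches.append((name, value))
    return mismatches
```

To use it, read the config text for the kernel under test, pass it to `parse_config`, and report whatever `find_mismatches(REQUIRED, ...)` returns. An empty list means every listed option matched.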
The following table contains the recommended config settings that provide performance improvements for certain workloads.
| Kernel Config | Description |
|---|---|
| CONFIG_ARM64_64K_PAGES=y | Uses a 64K page size; required when using the NVIDIA GPU Driver. |
| CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y | Sets the default CPU frequency governor to performance. |
| CONFIG_CPU_FREQ_GOV_SCHEDUTIL=y | Supports the schedutil CPU frequency governor. |
| CONFIG_PREEMPT_DYNAMIC=y | Allows dynamic preemption tuning using the preempt= boot parameter. |
| CONFIG_PREEMPT_NONE=y | Defaults dynamic preemption tuning to preempt=none for throughput. |
| CONFIG_DMABUF_HEAPS=y | Enables DMA-BUF memory heaps. |
| CONFIG_DMABUF_HEAPS_SYSTEM=y | Enables the system dmabuf heap. |
| CONFIG_DMI_SYSFS=y | Enables the export of raw DMI table data. |
| CONFIG_INIT_ON_ALLOC_DEFAULT_ON=n | Disables heap memory zeroing on allocation by default. |
| CONFIG_IOMMU_DEFAULT_DMA_LAZY=y | Improves IOMMU performance by enabling lazy mode. |
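Two of these settings have effects that are visible from user space on a running system: the page size and the active CPU frequency governor. The following sketch checks both. The function names are hypothetical, and the sysfs path shown is the usual cpufreq location for CPU 0, which is absent when no cpufreq driver is loaded.

```python
import os
from pathlib import Path
from typing import Optional

# Usual cpufreq sysfs location for CPU 0; assumed present only when a
# cpufreq driver is active on the system.
GOVERNOR_PATH = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor")


def page_size_is_64k(page_size: Optional[int] = None) -> bool:
    """Return True when the page size is 64 KiB (CONFIG_ARM64_64K_PAGES=y)."""
    if page_size is None:
        # Ask the running kernel for its page size in bytes.
        page_size = os.sysconf("SC_PAGE_SIZE")
    return page_size == 64 * 1024


def read_governor(path: Path = GOVERNOR_PATH) -> Optional[str]:
    """Return the active governor name, or None when the file is unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    print("64K pages:", page_size_is_64k())
    print("cpu0 governor:", read_governor())
```

The governor reported at run time can differ from the compiled-in default, because it can be changed after boot through the same sysfs file.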
The following table contains optional config settings that enable the performance tooling functions on Grace platforms.
| Kernel Config | Description |
|---|---|
| CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU=m | Enables the ARM CoreSight PMU driver. |
| CONFIG_NVIDIA_CORESIGHT_PMU_ARCH_SYSTEM_PMU=m | Enables the NVIDIA ARM CoreSight PMU driver. |
| CONFIG_ARM_SPE_PMU=m | Enables access to the ARM SPE registers. |
The following table contains config settings that are required when supporting partner diagnostics on Grace platforms.
| Kernel Config | Description |
|---|---|
| CONFIG_ACPI_APEI_EINJ=m | Provides a hardware error injection mechanism. Used for debugging and testing APEI features. |
| CONFIG_ARM_FFA_TRANSPORT=m | Enables the Arm Firmware Framework driver. |
| CONFIG_ARM_FFA_SMCCC=y | Enables the Arm Secure Monitor Call Calling Convention (SMCCC). |
| CONFIG_CPU_FREQ_STAT=y | Exports CPU frequency statistics information through sysfs. |
| CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y | Enables the “conservative” governor. |
| CONFIG_CPU_FREQ_GOV_ONDEMAND=y | Enables the “ondemand” governor. |
| CONFIG_CPU_FREQ_GOV_POWERSAVE=y | Enables the “powersave” governor. |
| CONFIG_CPU_FREQ_GOV_USERSPACE=y | Enables the “userspace” governor, which allows user-space utilities to set the CPU frequency. |
| CONFIG_STRICT_DEVMEM=y | Filter access to the |
| CONFIG_SENSORS_ACPI_POWER=m | Enables power telemetry through hwmon. Enable when sysfs endpoints for hardware power monitoring are not present. |