Understanding Your Grace Machine

After you boot your Grace machine, run the sudo ipmitool fru print command and check the information about the NVIDIA Grace module.

Here is the sample output from a Grace CPU Superchip machine. The FRU Device Description is PG535, and the Product Name is C2.

FRU Device Description  : PG535 (ID 192)
Board Mfg Date          : [REDACTED]
Board Mfg               : NVIDIA
Board Product           : PG535
Board Serial            : [REDACTED]
Board Part Number       : 699-2G535-0200-DV2
Product Manufacturer    : NVIDIA
Product Name            : C2
Product Part Number     : 900-2G535-0000-000
Product Version         : B-R00
Product Serial          : [REDACTED]

Here is the sample output from a Grace Hopper Superchip machine. The FRU Device Description is PG530, and the Product Name is GH200 480GB.

FRU Device Description  : PG530 (ID 133)
Board Mfg Date          : [REDACTED]
Board Mfg               : NVIDIA
Board Product           : PG530
Board Serial            : [REDACTED]
Board Part Number       : 699-2G530-0206-QS1
Product Manufacturer    : NVIDIA
Product Name            : GH200 480GB
Product Part Number     : 900-2G530-0000-000
Product Version         : A-R00
Product Serial          : [REDACTED]

Here is the sample output from a Grace Blackwell Superchip machine. The FRU Device Description is ProcMod_0, the Board Product is PG548, and the Product Name begins with GB200.

FRU Device Description  : ProcMod_0 (ID 71)
Board Mfg Date          : [REDACTED]
Board Mfg               : NVIDIA
Board Product           : PG548
Board Serial            : [REDACTED]
Board Part Number       : 699-2G548-1201-RC3
Board Extra             : Version: S
Board Extra             : Rework:
Product Manufacturer    : NVIDIA
Product Name            : GB200 1CPU:2GPU Board PC
Product Part Number     : 699-2G548-1201-RC3
Product Version         : C05
Product Serial          : [REDACTED]
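
The three sample outputs above differ in the Board Product field, so a script can use that field to tell the platforms apart. The following is a minimal Python sketch, not an NVIDIA tool. The function names and the mapping table are illustrative assumptions built only from the sample outputs on this page, and when the command reports several FRU devices the sketch keeps only the first occurrence of each field.

```python
# Mapping taken from the sample outputs above; other board IDs may exist.
BOARD_TO_PLATFORM = {
    "PG535": "Grace CPU Superchip",
    "PG530": "Grace Hopper Superchip",
    "PG548": "Grace Blackwell Superchip",
}


def parse_fru_fields(text):
    """Return 'Field : value' pairs from ipmitool fru print output.

    Only the first occurrence of each field name is kept (a simplification).
    """
    fields = {}
    for line in text.splitlines():
        # Split at the first colon so values such as "1CPU:2GPU" stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), value.strip())
    return fields


def identify_grace_platform(text):
    """Map the Board Product field to a platform name, or 'unknown'."""
    board = parse_fru_fields(text).get("Board Product", "")
    return BOARD_TO_PLATFORM.get(board, "unknown")


sample = """\
FRU Device Description  : ProcMod_0 (ID 71)
Board Product           : PG548
Product Name            : GB200 1CPU:2GPU Board PC
"""
print(identify_grace_platform(sample))
```

To use it on a live system, pass the text captured from sudo ipmitool fru print to identify_grace_platform.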

Checking the CPUs

The lscpu command-line utility in Linux gathers CPU architecture information from sysfs and /proc/cpuinfo and displays it in a terminal.

After you boot your Grace machine, run the lscpu command and check the CPUs.

Here is the sample output from a Grace CPU Superchip machine:

Architecture:                   aarch64
  CPU op-mode(s):                 64-bit
  Byte Order:                     Little Endian
CPU(s):                         144
  On-line CPU(s) list:            0-143
Vendor ID:                      ARM
  Model:                          0
  Thread(s) per core:             1
  Core(s) per socket:             72
  Socket(s):                      2
  Stepping:                       r0p0
  Frequency boost:                disabled
  CPU max MHz:                    3582.0000
  CPU min MHz:                    81.0000
  BogoMIPS:                       2000.00
  Flags:                          fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp a
                                  simdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4
                                  asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs
                                  sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesh
                                  a3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh bti
Caches (sum of all):
  L1d:                           9 MiB (144 instances)
  L1i:                           9 MiB (144 instances)
  L2:                            144 MiB (144 instances)
  L3:                            228 MiB (2 instances)
NUMA:
  NUMA node(s):                  2
  NUMA node0 CPU(s):             0-71
  NUMA node1 CPU(s):             72-143
Vulnerabilities:
  Itlb multihit:                 Not affected
  L1tf:                          Not affected
  Mds:                           Not affected
  Meltdown:                      Not affected
  Mmio stale data:               Not affected
  Retbleed:                      Not affected
  Spec store bypass:             Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:                    Mitigation; __user pointer sanitization
  Spectre v2:                    Not affected
  Srbds:                         Not affected
  Tsx async abort:               Not affected

From this output, you can see the number of CPU sockets, the number of cores per socket, the number of hardware threads per core, and the maximum and minimum CPU frequencies. You can also find the total sizes of the L1, L2, and L3 caches.
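
As a quick consistency check on these fields, the logical CPU count should equal sockets times cores per socket times threads per core. The sketch below is a minimal Python illustration, assuming the "Key: value" layout shown in the sample output. The function names parse_lscpu and logical_cpus are hypothetical helpers, not part of lscpu.

```python
def parse_lscpu(text):
    """Map each 'Key: value' line of lscpu output to a dict entry.

    Lines without a value (section headers, wrapped flag lines) are skipped.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


def logical_cpus(info):
    """Sockets x cores per socket x threads per core."""
    return (int(info["Socket(s)"])
            * int(info["Core(s) per socket"])
            * int(info["Thread(s) per core"]))


# Lines copied from the Grace CPU Superchip sample output above.
sample = """\
CPU(s):                         144
  Thread(s) per core:             1
  Core(s) per socket:             72
  Socket(s):                      2
"""
info = parse_lscpu(sample)
print(logical_cpus(info), "logical CPUs; lscpu reports", info["CPU(s)"])
```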

Here is the sample output of a Grace Hopper Superchip system:

Architecture:                   aarch64
  CPU op-mode(s):                 64-bit
  Byte Order:                     Little Endian
CPU(s):                         72
  On-line CPU(s) list:            0-71
Vendor ID:                      ARM
  Model:                          0
  Thread(s) per core:             1
  Core(s) per socket:             72
  Socket(s):                      1
  Stepping:                       r0p0
  Frequency boost:                disabled
  CPU max MHz:                    3591.0000
  CPU min MHz:                    81.0000
  BogoMIPS:                       2000.00
  Flags:                          fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp
                                  cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha
                                  512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpod
                                  p sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint s
                                  vei8mm svebf16 i8mm bf16 dgh bti
Caches (sum of all):
  L1d:                           4.5 MiB (72 instances)
  L1i:                           4.5 MiB (72 instances)
  L2:                            72 MiB (72 instances)
  L3:                            114 MiB (1 instance)
NUMA:
  NUMA node(s):                  9
  NUMA node0 CPU(s):             0-71
  NUMA node1 CPU(s):
  NUMA node2 CPU(s):
  NUMA node3 CPU(s):
  NUMA node4 CPU(s):
  NUMA node5 CPU(s):
  NUMA node6 CPU(s):
  NUMA node7 CPU(s):
  NUMA node8 CPU(s):
Vulnerabilities:
  Itlb multihit:                 Not affected
  L1tf:                          Not affected
  Mds:                           Not affected
  Meltdown:                      Not affected
  Mmio stale data:               Not affected
  Retbleed:                      Not affected
  Spec store bypass:             Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:                    Mitigation; __user pointer sanitization
  Spectre v2:                    Not affected
  Srbds:                         Not affected
  Tsx async abort:               Not affected

Note

This output shows nine NUMA nodes. The first node corresponds to the Grace CPU, the second to the Hopper GPU, and the remaining seven nodes correspond to NVIDIA Multi-Instance GPU (MIG) instances.

The seven MIG instances can be ignored if MIG mode is not being used.

Here is the sample output of a system with two Grace Blackwell Superchips:

Architecture:             aarch64
  CPU op-mode(s):         64-bit
  Byte Order:             Little Endian
CPU(s):                   144
  On-line CPU(s) list:    0-143
Vendor ID:                ARM
  Model:                  0
  Thread(s) per core:     1
  Core(s) per socket:     72
  Socket(s):              2
  Stepping:               r0p0
  Frequency boost:        disabled
  CPU max MHz:            3384.0000
  CPU min MHz:            81.0000
  BogoMIPS:               2000.00
  Flags:                  fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb pa
                          ca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh bti
Caches (sum of all):
  L1d:                    9 MiB (144 instances)
  L1i:                    9 MiB (144 instances)
  L2:                     144 MiB (144 instances)
  L3:                     228 MiB (2 instances)
NUMA:
  NUMA node(s):           34
  NUMA node0 CPU(s):      0-71
  NUMA node1 CPU(s):      72-143
  NUMA node2 CPU(s):
  NUMA node3 CPU(s):
  NUMA node4 CPU(s):
  NUMA node5 CPU(s):
  NUMA node6 CPU(s):
  NUMA node7 CPU(s):
  NUMA node8 CPU(s):
  NUMA node9 CPU(s):
  NUMA node10 CPU(s):
  NUMA node11 CPU(s):
  NUMA node12 CPU(s):
  NUMA node13 CPU(s):
  NUMA node14 CPU(s):
  NUMA node15 CPU(s):
  NUMA node16 CPU(s):
  NUMA node17 CPU(s):
  NUMA node18 CPU(s):
  NUMA node19 CPU(s):
  NUMA node20 CPU(s):
  NUMA node21 CPU(s):
  NUMA node22 CPU(s):
  NUMA node23 CPU(s):
  NUMA node24 CPU(s):
  NUMA node25 CPU(s):
  NUMA node26 CPU(s):
  NUMA node27 CPU(s):
  NUMA node28 CPU(s):
  NUMA node29 CPU(s):
  NUMA node30 CPU(s):
  NUMA node31 CPU(s):
  NUMA node32 CPU(s):
  NUMA node33 CPU(s):
Vulnerabilities:
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Not affected
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; __user pointer sanitization
  Spectre v2:             Not affected
  Srbds:                  Not affected
  Tsx async abort:        Not affected

This output shows 34 NUMA nodes. NUMA node0 and NUMA node1 correspond to the Grace CPUs on the first and second GB200 Superchips, respectively. NUMA node2 and NUMA node10 correspond to the two Blackwell GPUs on the first GB200 Superchip, and NUMA node18 and NUMA node26 correspond to the two Blackwell GPUs on the second GB200 Superchip. The seven nodes that follow each GPU node (nodes 3-9, 11-17, 19-25, and 27-33) correspond to the NVIDIA Multi-Instance GPU (MIG) instances for that GPU.

The seven MIG instances per GPU can be ignored if MIG mode is not being used.
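
The node numbering in the two GPU-equipped samples follows a simple pattern: CPU nodes come first, and each GPU then takes one node followed by seven MIG nodes. The Python sketch below reproduces that arithmetic. It is inferred from the sample outputs on this page only, and the function name gpu_numa_layout is hypothetical, so other systems or driver versions may enumerate nodes differently.

```python
def gpu_numa_layout(cpu_nodes, gpus, mig_nodes_per_gpu=7):
    """Return [(gpu_node, [mig_nodes...]), ...] for the pattern seen above.

    Assumes CPU nodes are numbered 0..cpu_nodes-1 and each GPU occupies
    one node followed by mig_nodes_per_gpu MIG nodes.
    """
    layout = []
    node = cpu_nodes  # the first GPU node comes right after the CPU nodes
    for _ in range(gpus):
        mig = list(range(node + 1, node + 1 + mig_nodes_per_gpu))
        layout.append((node, mig))
        node += 1 + mig_nodes_per_gpu
    return layout


# Two Grace CPUs and four Blackwell GPUs: 2 + 4 * (1 + 7) = 34 nodes.
for gpu_node, mig_nodes in gpu_numa_layout(cpu_nodes=2, gpus=4):
    print(f"GPU node {gpu_node}, MIG nodes {mig_nodes[0]}-{mig_nodes[-1]}")
```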

Checking the Non-Uniform Memory Access Settings

The lscpu output includes basic information about the Non-Uniform Memory Access (NUMA) settings on your Grace machine.

To understand more about the NUMA settings, run the numactl -H command. Here is the sample output from a Grace CPU Superchip machine:

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67
68 69 70 71
node 0 size: 245090 MB
node 0 free: 99633 MB
node 1 cpus: 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110
111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128
129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
node 1 size: 245317 MB
node 1 free: 126895 MB
node distances:
node 0   1
  0: 10 40
  1: 40 10

The output shows that there are two NUMA nodes on this machine, which CPU cores belong to each NUMA node, and how much memory each node has in total and free. The output also shows the node distances between NUMA nodes, which help the kernel scheduler run application threads on the CPU cores that are closest to the memory where their data resides.
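
To work with the node distances in a script, the table at the end of the numactl -H output can be read into a nested dictionary. This is a minimal Python sketch that assumes the layout shown in the sample above. The function name parse_node_distances is a hypothetical helper.

```python
def parse_node_distances(text):
    """Return {from_node: {to_node: distance}} from numactl -H output."""
    lines = text.splitlines()
    # Locate the "node distances:" marker; the header row follows it.
    start = next(i for i, line in enumerate(lines)
                 if line.strip() == "node distances:")
    header = [int(x) for x in lines[start + 1].split()[1:]]
    distances = {}
    for line in lines[start + 2:]:
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            break  # end of the distance table
        distances[int(label)] = dict(zip(header, map(int, rest.split())))
    return distances


# Distance table copied from the Grace CPU Superchip sample above.
sample = """\
node distances:
node 0   1
  0: 10 40
  1: 40 10
"""
d = parse_node_distances(sample)
print("local:", d[0][0], "remote:", d[0][1])
```

A lower value means the memory is closer to the node, so in this sample each node reaches its own memory at distance 10 and the other socket's memory at distance 40.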

Here is the sample output from a Grace Hopper Superchip system:

available: 9 nodes (0-8)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70
71
node 0 size: 490310 MB
node 0 free: 166560 MB
node 1 cpus:
node 1 size: 95232 MB
node 1 free: 92094 MB
node 2 cpus:
node 2 size: 0 MB
node 2 free: 0 MB
node 3 cpus:
node 3 size: 0 MB
node 3 free: 0 MB
node 4 cpus:
node 4 size: 0 MB
node 4 free: 0 MB
node 5 cpus:
node 5 size: 0 MB
node 5 free: 0 MB
node 6 cpus:
node 6 size: 0 MB
node 6 free: 0 MB
node 7 cpus:
node 7 size: 0 MB
node 7 free: 0 MB
node 8 cpus:
node 8 size: 0 MB
node 8 free: 0 MB
node distances:
node     0   1    2    3    4    5    6    7    8
     0:  10  80   80   80   80   80   80   80   80
     1:  80  10   255  255  255  255  255  255  255
     2:  80  255  10   255  255  255  255  255  255
     3:  80  255  255  10   255  255  255  255  255
     4:  80  255  255  255  10   255  255  255  255
     5:  80  255  255  255  255  10   255  255  255
     6:  80  255  255  255  255  255  10   255  255
     7:  80  255  255  255  255  255  255  10   255
     8:  80  255  255  255  255  255  255  255  10

As mentioned in Checking the CPUs, if MIG is not used, the final seven NUMA nodes can be ignored.
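
One way to pick out the nodes that matter is to classify each node by whether it has CPUs and whether it has memory. The Python sketch below does this for numactl -H text. The labels "cpu", "memory-only", and "empty" and the function name classify_numa_nodes are illustrative assumptions, not numactl terminology.

```python
import re


def classify_numa_nodes(text):
    """Label each node in numactl -H output as cpu, memory-only, or empty."""
    cpus, sizes = {}, {}
    for line in text.splitlines():
        m = re.match(r"node (\d+) cpus:(.*)", line)
        if m:
            cpus[int(m.group(1))] = m.group(2).split()
            continue
        m = re.match(r"node (\d+) size: (\d+) MB", line)
        if m:
            sizes[int(m.group(1))] = int(m.group(2))
    labels = {}
    for node, size in sizes.items():
        if cpus.get(node):
            labels[node] = "cpu"          # node with CPU cores
        elif size > 0:
            labels[node] = "memory-only"  # for example, GPU memory
        else:
            labels[node] = "empty"        # for example, unused MIG node
    return labels


# Abbreviated from the Grace Hopper sample above (CPU list shortened).
sample = """\
available: 9 nodes (0-8)
node 0 cpus: 0 1 2
node 0 size: 490310 MB
node 0 free: 166560 MB
node 1 cpus:
node 1 size: 95232 MB
node 1 free: 92094 MB
node 2 cpus:
node 2 size: 0 MB
node 2 free: 0 MB
"""
print(classify_numa_nodes(sample))
```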

Here is the sample output from a system with two Grace Blackwell Superchips:

available: 34 nodes (0-33)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
node 0 size: 490551 MB
node 0 free: 481845 MB
node 1 cpus: 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
node 1 size: 490736 MB
node 1 free: 472190 MB
node 2 cpus:
node 2 size: 188416 MB
node 2 free: 188415 MB
node 3 cpus:
node 3 size: 0 MB
node 3 free: 0 MB
node 4 cpus:
node 4 size: 0 MB
node 4 free: 0 MB
node 5 cpus:
node 5 size: 0 MB
node 5 free: 0 MB
node 6 cpus:
node 6 size: 0 MB
node 6 free: 0 MB
node 7 cpus:
node 7 size: 0 MB
node 7 free: 0 MB
node 8 cpus:
node 8 size: 0 MB
node 8 free: 0 MB
node 9 cpus:
node 9 size: 0 MB
node 9 free: 0 MB
node 10 cpus:
node 10 size: 188416 MB
node 10 free: 188415 MB
node 11 cpus:
node 11 size: 0 MB
node 11 free: 0 MB
node 12 cpus:
node 12 size: 0 MB
node 12 free: 0 MB
node 13 cpus:
node 13 size: 0 MB
node 13 free: 0 MB
node 14 cpus:
node 14 size: 0 MB
node 14 free: 0 MB
node 15 cpus:
node 15 size: 0 MB
node 15 free: 0 MB
node 16 cpus:
node 16 size: 0 MB
node 16 free: 0 MB
node 17 cpus:
node 17 size: 0 MB
node 17 free: 0 MB
node 18 cpus:
node 18 size: 188416 MB
node 18 free: 188415 MB
node 19 cpus:
node 19 size: 0 MB
node 19 free: 0 MB
node 20 cpus:
node 20 size: 0 MB
node 20 free: 0 MB
node 21 cpus:
node 21 size: 0 MB
node 21 free: 0 MB
node 22 cpus:
node 22 size: 0 MB
node 22 free: 0 MB
node 23 cpus:
node 23 size: 0 MB
node 23 free: 0 MB
node 24 cpus:
node 24 size: 0 MB
node 24 free: 0 MB
node 25 cpus:
node 25 size: 0 MB
node 25 free: 0 MB
node 26 cpus:
node 26 size: 188416 MB
node 26 free: 188415 MB
node 27 cpus:
node 27 size: 0 MB
node 27 free: 0 MB
node 28 cpus:
node 28 size: 0 MB
node 28 free: 0 MB
node 29 cpus:
node 29 size: 0 MB
node 29 free: 0 MB
node 30 cpus:
node 30 size: 0 MB
node 30 free: 0 MB
node 31 cpus:
node 31 size: 0 MB
node 31 free: 0 MB
node 32 cpus:
node 32 size: 0 MB
node 32 free: 0 MB
node 33 cpus:
node 33 size: 0 MB
node 33 free: 0 MB
node distances:
node   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33
  0:  10  40  80  80  80  80  80  80  80  80  80  80  80  80  80  80  80  80  120  120  120  120  120  120  120  120  120  120  120  120  120  120  120  120
  1:  40  10  120  120  120  120  120  120  120  120  120  120  120  120  120  120  120  120  80  80  80  80  80  80  80  80  80  80  80  80  80  80  80  80
  2:  80  120  10  11  11  11  11  11  11  11  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40
  3:  80  120  11  10  11  11  11  11  11  11  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40
  4:  80  120  11  11  10  11  11  11  11  11  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40
  5:  80  120  11  11  11  10  11  11  11  11  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40
  6:  80  120  11  11  11  11  10  11  11  11  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40
  7:  80  120  11  11  11  11  11  10  11  11  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40
  8:  80  120  11  11  11  11  11  11  10  11  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40
  9:  80  120  11  11  11  11  11  11  11  10  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40
 10:  80  120  40  40  40  40  40  40  40  40  10  11  11  11  11  11  11  11  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40
 11:  80  120  40  40  40  40  40  40  40  40  11  10  11  11  11  11  11  11  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40
 12:  80  120  40  40  40  40  40  40  40  40  11  11  10  11  11  11  11  11  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40
 13:  80  120  40  40  40  40  40  40  40  40  11  11  11  10  11  11  11  11  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40
 14:  80  120  40  40  40  40  40  40  40  40  11  11  11  11  10  11  11  11  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40
 15:  80  120  40  40  40  40  40  40  40  40  11  11  11  11  11  10  11  11  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40
 16:  80  120  40  40  40  40  40  40  40  40  11  11  11  11  11  11  10  11  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40
 17:  80  120  40  40  40  40  40  40  40  40  11  11  11  11  11  11  11  10  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40
 18:  120  80  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  10  11  11  11  11  11  11  11  40  40  40  40  40  40  40  40
 19:  120  80  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  11  10  11  11  11  11  11  11  40  40  40  40  40  40  40  40
 20:  120  80  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  11  11  10  11  11  11  11  11  40  40  40  40  40  40  40  40
 21:  120  80  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  11  11  11  10  11  11  11  11  40  40  40  40  40  40  40  40
 22:  120  80  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  11  11  11  11  10  11  11  11  40  40  40  40  40  40  40  40
 23:  120  80  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  11  11  11  11  11  10  11  11  40  40  40  40  40  40  40  40
 24:  120  80  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  11  11  11  11  11  11  10  11  40  40  40  40  40  40  40  40
 25:  120  80  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  11  11  11  11  11  11  11  10  40  40  40  40  40  40  40  40
 26:  120  80  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  10  11  11  11  11  11  11  11
 27:  120  80  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  11  10  11  11  11  11  11  11
 28:  120  80  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  11  11  10  11  11  11  11  11
 29:  120  80  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  11  11  11  10  11  11  11  11
 30:  120  80  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  11  11  11  11  10  11  11  11
 31:  120  80  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  11  11  11  11  11  10  11  11
 32:  120  80  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  11  11  11  11  11  11  10  11
 33:  120  80  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  11  11  11  11  11  11  11  10

You can check NUMA nodes based on the descriptions in Checking the CPUs. If MIG is not used, the trailing seven NUMA nodes after each GPU NUMA node can be ignored.

Checking the GPU

Running the nvidia-smi command displays the status of the GPU in the system.

Here is sample output from a Grace Hopper Superchip system:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.06             Driver Version: 535.104.06   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  GH200 480GB                    Off | 00000009:01:00.0 Off |                    0 |
| N/A   29C    P0             108W / 900W |      0MiB / 97871MiB |      8%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Here is sample output from a system with two Grace Blackwell Superchips:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.120                Driver Version: 570.120        CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA Graphics Device         On  |   00000008:01:00.0 Off |                    0 |
| N/A   35C    P0            174W / 1200W |       1MiB / 189471MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA Graphics Device         On  |   00000009:01:00.0 Off |                    0 |
| N/A   35C    P0            158W / 1200W |       1MiB / 189471MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA Graphics Device         On  |   00000018:01:00.0 Off |                    0 |
| N/A   35C    P0            154W / 1200W |       1MiB / 189471MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA Graphics Device         On  |   00000019:01:00.0 Off |                    0 |
| N/A   35C    P0            169W / 1200W |       1MiB / 189471MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
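
The table layout above is meant for people. For scripts, nvidia-smi also offers a CSV query mode through its --query-gpu and --format options. The Python sketch below is a minimal illustration that requests the index, name, and memory.total fields. The function names are hypothetical, and the sample line is hand-written from the Grace Hopper values above rather than captured from a real run.

```python
import csv
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text):
    """Turn 'index, name, memory' CSV lines into a list of dicts."""
    gpus = []
    for row in csv.reader(text.splitlines(), skipinitialspace=True):
        if len(row) != 3:
            continue  # skip blank or unexpected lines
        index, name, memory_mib = (field.strip() for field in row)
        gpus.append({"index": int(index), "name": name,
                     "memory_mib": int(memory_mib)})
    return gpus


def query_gpus():
    """Run nvidia-smi and parse its CSV output (needs an NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_csv(out.stdout)


# Illustrative line only; real names and sizes depend on the system.
sample = "0, GH200 480GB, 97871\n"
print(parse_gpu_csv(sample))
```

On a machine with the driver installed, query_gpus() returns one dictionary per GPU.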

Checking the Memory

A common way to check the memory on your Grace system is to run the sudo dmidecode -t memory command. Here is the sample output from a Grace CPU Superchip machine:

# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 3.6.0 present.
# SMBIOS implementations newer than version 3.5.0 are not
# fully supported by this version of dmidecode.
Handle 0x000B, DMI type 16, 23 bytes
Physical Memory Array
        Location: System Board Or Motherboard
        Use: System Memory
        Error Correction Type: Single-bit ECC
        Maximum Capacity: 480 GB
        Error Information Handle: No Error
        Number Of Devices: 2
Handle 0x000C, DMI type 17, 92 bytes
Memory Device
       Array Handle: 0x000B
       Error Information Handle: 0x0000
       Total Width: 540 bits
       Data Width: 480 bits
       Size: 240 GB
       Form Factor: Die
       Set: None
       Locator: Not Specified
       Bank Locator: Not Specified
       Type: LPDDR5
       Type Detail: None
       Speed: Unknown
       Manufacturer: Not Specified
       Serial Number: 9223381974177924187
       Asset Tag: Not Specified
       Part Number: Not Specified
       Rank: 1
       Configured Memory Speed: Unknown
       Minimum Voltage: Unknown
       Maximum Voltage: Unknown
       Configured Voltage: Unknown
       Memory Technology: DRAM
       Memory Operating Mode Capability: None
       Firmware Version: Not Specified
       Module Manufacturer ID: Unknown
       Module Product ID: Unknown
       Memory Subsystem Controller Manufacturer ID: Unknown
       Memory Subsystem Controller Product ID: Unknown
       Non-Volatile Size: None
       Volatile Size: None
       Cache Size: None
       Logical Size: None
Handle 0x000D, DMI type 17, 92 bytes
Memory Device
       Array Handle: 0x000B
       Error Information Handle: 0x0000
       Total Width: 540 bits
       Data Width: 480 bits
       Size: 240 GB
       Form Factor: Die
       Set: None
       Locator: Not Specified
       Bank Locator: Not Specified
       Type: LPDDR5
       Type Detail: None
       Speed: Unknown
       Manufacturer: Not Specified
       Serial Number: 9223382071351559259
       Asset Tag: Not Specified
       Part Number: Not Specified
       Rank: 1
       Configured Memory Speed: Unknown
       Minimum Voltage: Unknown
       Maximum Voltage: Unknown
       Configured Voltage: Unknown
       Memory Technology: DRAM
       Memory Operating Mode Capability: None
       Firmware Version: Not Specified
       Module Manufacturer ID: Unknown
       Module Product ID: Unknown
       Memory Subsystem Controller Manufacturer ID: Unknown
       Memory Subsystem Controller Product ID: Unknown
       Non-Volatile Size: None
       Volatile Size: None
       Cache Size: None
       Logical Size: None

The output shows two LPDDR5 memory devices of 240 GB each, one for each Grace chip, which matches the 480 GB maximum capacity reported for the memory array.
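
To total the installed memory from this output, a script can collect the Size field of each Memory Device entry. The following Python sketch assumes the field layout shown in the sample above. The function name memory_device_sizes_gb is a hypothetical helper.

```python
import re


def memory_device_sizes_gb(text):
    """Return the size in GB of every memory device that reports one."""
    sizes = []
    for line in text.splitlines():
        # Match only the plain "Size:" field, not "Volatile Size:" and so on.
        m = re.fullmatch(r"Size: (\d+) (GB|MB)", line.strip())
        if m:
            value = int(m.group(1))
            sizes.append(value if m.group(2) == "GB" else value / 1024)
    return sizes


# Abbreviated from the dmidecode sample above.
sample = """\
Memory Device
       Size: 240 GB
       Volatile Size: None
Memory Device
       Size: 240 GB
       Cache Size: None
"""
sizes = memory_device_sizes_gb(sample)
print(len(sizes), "devices,", sum(sizes), "GB total")
```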