H200 Node Provisioning

  1. Download the H200 image archive to the /root directory on the head node.

    wget https://support2.brightcomputing.com/h200-parker/DGXOS-6.3.1-H200.tar.gz -P /root
    
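  2. Use cm-create-image to create the H200 software image from the archive so that it is available in cmsh. When prompted, confirm that the base distribution repositories are enabled, then enter c to continue.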

    cm-create-image --fromarchive /root/DGXOS-6.3.1-H200.tar.gz --imagename dgx-6.3.1-h200-image --skipdist

    Running validate base tar........................ [  OK  ]

    Running sanity check............................. [  OK  ]

    Running unpack base tar.......................... [  OK  ]
        ******************** IMPORTANT ****************************
        Please confirm that the base distribution repositories for
        the software image are enabled. For instructions on how to
        enable repositories for your software image, please refer
        the administrator's manual.


        Image creation can be resumed in one of the following ways:
        -----------------------------------------------------------
        1. Enter 'e' to exit, and configure repositories.
            Then, restart program with the -d (--fromdir) option.
            cm-create-image -d /cm/images/dgx-6.3.1-h200-image -n dgx-6.3.1-h200-image

        2. Open a new console, and configure repositories.
            Then enter 'c' on this console, to continue software
            image creation.

        ***********************************************************

    Continue(c)/Exit(e)? c


    Finalize base distribution....................... [  OK  ]

    Copying cm repo files............................ [  OK  ]

    Validating repo configuration.................... [  OK  ]

    Finalizing image services........................ [  OK  ]

    Installing CM packages........................... [  OK  ]

    Finalizing cluster services...................... [  OK  ]

    Copying cluster certificate to image............. [  OK  ]

    Adding/Updating software image................... [  OK  ]
    
  3. In cmsh, enter softwareimage mode and verify that the H200 image has been created.

    cmsh
    softwareimage
    ls

    Name (key)                        Path (key)                                   Kernel version      Nodes
    --------------------------------- -------------------------------------------- ------------------- --------
    dgx-6.3.1-h200-image              /cm/images/dgx-6.3.1-h200-image              5.15.0-1063-nvidia  0
    
  4. Add the bonding kernel module to the H200 image.

    cmsh
    softwareimage
    use dgx-6.3.1-h200-image
    kernelmodules
    add bonding
    commit
    
  5. In category mode, clone the dgx-h100 category to a new dgx-h200 category and set its software image to the H200 image.

    cmsh
    category
    clone dgx-h100 dgx-h200
    use dgx-h200
    set softwareimage dgx-6.3.1-h200-image
    commit
    
  6. Assign the dgx-h200 category to the H200 node, then power the node on.
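
The final step has no commands of its own, so the following is a minimal sketch of how it might be scripted with `cmsh -c`. The node name `dgx-h200-01`, the `run_cmsh` helper, and the `DRY_RUN` switch are hypothetical and used for illustration only; the `power -n` form is assumed to match your cmsh version, so check it against the administrator's manual before use. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch only: assign the dgx-h200 category to a node and power it on.
# NODE is a hypothetical hostname; replace it with your H200 node's name.
NODE="${NODE:-dgx-h200-01}"
CATEGORY="dgx-h200"

# cmsh -c accepts a semicolon-separated list of cmsh commands.
ASSIGN_CMD="device; use ${NODE}; set category ${CATEGORY}; commit"
# Assumption: the device-mode power command accepts -n <node> on.
POWER_CMD="device; power -n ${NODE} on"

# Hypothetical helper: with DRY_RUN=1 (the default) the command is printed
# instead of executed, so the script can be reviewed safely first.
run_cmsh() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "cmsh -c \"$1\""
  else
    cmsh -c "$1"
  fi
}

run_cmsh "$ASSIGN_CMD"
run_cmsh "$POWER_CMD"
```

Run it once as-is to review the printed commands, then rerun with `DRY_RUN=0` on the head node to apply them. The same commands can also be entered interactively in cmsh, in the style of the earlier steps.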