Cloud-init Configuration File#
This section provides instructions for creating a cloud-init configuration file for the Ubuntu Automated Server Installation.
Modifying the Configuration File#
The following text is an outline of an example configuration file. It will not work as is; it requires the additional modifications described in the sections below. Refer to Ubuntu Automated Server Installation for more details.
Begin the configuration file with the following header:
#cloud-config
autoinstall:
  version: 1
Define a default user (the example uses ubuntu), the locale, and the keyboard layout.
  ##
  ## Set initial system and user information
  ## use mkpasswd -m sha-512 <password> to create a password
  ##
  identity:
    realname: DGX Ubuntu User
    hostname: dgx-host
    password: <PASSWORD HASH>
    username: ubuntu
  locale: en_US
  keyboard:
    layout: en
    variant: us
  reporting:
    builtin:
      type: print
The network section describes the network configuration and supports fixed addresses, DHCP, and various other network options. The names of the network interfaces are system-dependent. For example, the primary management ports on DGX systems are:
DGX-1: enp1s0f0
DGX-2: enp6s0
DGX A100: enp226s0
  ##
  ## Network Configuration
  ##
  network:
    version: 2
    ethernets:
      enp1s0f0:
        dhcp4: yes
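The network section also supports fixed addresses. As a minimal sketch, assuming the DGX-1 management port enp1s0f0 and placeholder addresses from the 192.0.2.0/24 documentation range (these values are assumptions; replace them with the ones for your network), a static configuration might look like this:

```yaml
  ##
  ## Network Configuration (static address sketch)
  ## All addresses below are placeholders, not values from this guide
  ##
  network:
    version: 2
    ethernets:
      enp1s0f0:
        dhcp4: no
        addresses:
          - 192.0.2.10/24
        gateway4: 192.0.2.1
        nameservers:
          addresses:
            - 192.0.2.53
```

The fragment is indented so that it sits under the autoinstall: key, in place of the DHCP example.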
Update the Subiquity installer to the edge channel. The NVIDIA repositories also require Apt preferences to be set up, which the version of Subiquity shipped with the Ubuntu 20.04 ISO images does not support.
  refresh-installer:
    channel: edge
    update: yes
Provide details about the additional NVIDIA repositories. Refer to Drive Partitioning below for more information.
  ##
  ## Enable this for using the remote repositories
  ##
  apt:
    <Repository details for the CUDA Compute and DGX Repository>
    conf: |
      Dpkg::Options {
        "--force-confdef";
        "--force-confold";
      };
Configure storage.
This section describes the storage configuration, including the swap configuration and drive partitioning. Setting the swap size to 0 disables the swap partition. Refer to Drive Partitioning.
Setting the reorder_uefi flag to false tells the installer not to change the boot order to make the currently booted entry (BootCurrent) the first option.
  ##
  ## Storage Configuration
  ##
  storage:
    config:
      <Partition and other configurations>
    swap:
      size: 0
    grub:
      reorder_uefi: false
Enable the SSH server.
You can also set a default SSH key.
  ##
  ## SSH Server
  ##
  ssh:
    install-server: yes
    allow-pw: yes
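To set a default SSH key, the ssh section accepts a list of public keys under authorized-keys. The following is a sketch only: the key string is a placeholder, and turning off password logins is an optional choice that this guide does not require.

```yaml
  ##
  ## SSH Server with a default key (sketch)
  ##
  ssh:
    install-server: yes
    ## Optional: disable password logins once a key is in place
    allow-pw: no
    authorized-keys:
      ## Placeholder: paste the contents of your public key file here
      - <SSH PUBLIC KEY>
```

The listed keys are installed for the user defined in the identity section.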
Provide a list of packages that should be installed.
Refer to the comments in this text for instructions on changing the package names for specific DGX systems and on enabling or disabling features.
  ##
  ## Packages
  ##
  packages:
    ##
    ## NVIDIA DGX system configurations and system tools
    ## Replace dgx-a100 for other DGX systems:
    ##   dgx1 for DGX-1
    ##   dgx2 for DGX-2
    ##   dgx-a100 for DGX A100
    ##
    - dgx-a100-system-configurations
    - dgx-a100-system-tools
    - dgx-a100-system-tools-extra
    ## Remove this if you don't want to use cachefilesd
    - nvidia-conf-cachefilesd
    ## Remove this if boot drive encryption is enabled and you don't
    ## want the passphrase dialog only visible on the serial console
    - nvidia-ipmisol
    ##
    ## NVIDIA CUDA driver and tools
    ## Change the driver version to the branch you want to install
    ##
    - datacenter-gpu-manager
    - libnvidia-nscq-450
    - linux-modules-nvidia-450-server-generic
    - nvidia-driver-450-server
    - nvidia-modprobe
    - nv-persistence-mode
    ## Uncomment these to support the NVswitch on DGX-2 and DGX A100
    ## Ensure that the driver version matches with the versions above
    # - libnvidia-nscq-450
    # - nvidia-fabricmanager-450
    ##
    ## Mellanox drivers and tools
    ##
    - mlnx-ofed-all
    - nvidia-mlnx-ofed-misc
    ##
    ## NVIDIA container support
    ##
    - docker-ce
    - nv-docker-options
    - nvidia-docker2
    ##
    ## NVIDIA system management tools
    ##
    - nvsm
    - nvidia-motd
Add any additional software packages you want to install during autoinstall.
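As an illustration of the substitutions described in the package comments, the excerpt below sketches the system-specific entries for a DGX-2. The dgx2-* package names are derived by applying the stated replacement rule (dgx2 for dgx-a100) and should be verified against the repository; htop is a hypothetical extra package chosen only to show where additions go. The remaining packages from the full list are omitted here.

```yaml
  packages:
    ## DGX-2 variant: dgx-a100 replaced with dgx2 (names assumed from the rule above)
    - dgx2-system-configurations
    - dgx2-system-tools
    - dgx2-system-tools-extra
    ## NVSwitch support, matching the 450 driver branch used in this guide
    - libnvidia-nscq-450
    - nvidia-fabricmanager-450
    ## Example of an additional package (illustrative choice, not required)
    - htop
```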
Finally, add a list of additional commands to be executed at the end of the installation.
Disable unattended upgrades
Disable the ondemand governor so that the system defaults to performance mode
Enable DCGM and OpenIBD services
Enable nv-peer-mem
  ##
  ## Commands executed after completion of the installation
  ##
  late-commands:
    - curtin in-target --target=/target -- apt purge -y unattended-upgrades
    - curtin in-target --target=/target -- systemctl disable ondemand
    - curtin in-target --target=/target -- systemctl enable dcgm openibd
    - curtin in-target --target=/target -- update-rc.d nv_peer_mem defaults
    # DGX A100 …
    - curtin in-target -- mlnx_pxe_setup.bash
Drive Partitioning#
  storage:
    config:
      - id: disk-sda
        type: disk
        ptable: gpt
        path: /dev/sda
        name: osdisk
        wipe: superblock-recursive
      - id: partition-sda1
        type: partition
        device: disk-sda
        number: 1
        size: 512M
        flag: boot
        grub_device: true
      - id: partition-sda2
        type: partition
        device: disk-sda
        number: 2
        size: 100G
      - id: format-partition-sda1
        type: format
        fstype: fat32
        label: efi
        volume: partition-sda1
      - id: format-partition-sda2
        type: format
        fstype: ext4
        label: root
        volume: partition-sda2
      - id: root-mount
        type: mount
        path: /
        device: format-partition-sda2
        options: errors=remount-ro
        passno: 1
      - id: boot-mount
        type: mount
        path: /boot/efi
        device: format-partition-sda1
        passno: 1
      - id: disk-sdb
        type: disk
        ptable: gpt
        path: /dev/sdb
        name: raid
        wipe: superblock-recursive
      - id: partition-sdb1
        type: partition
        device: disk-sdb
        number: 1
      - id: format-partition-sdb1
        type: format
        fstype: ext4
        label: raid
        volume: partition-sdb1
      - id: raid-mount
        type: mount
        path: /raid
        device: format-partition-sdb1
        passno: 2