Install Operating System on Bare Metal Nodes#

Note

This is an example workflow with specific servers NVIDIA had in its test bed. Consult the BMC manual of your server for detailed instructions.

All physical nodes (RTX PRO Servers and an HGX B200) are provisioned with Ubuntu 24.04 LTS Server.

Accessing BMC Interface — Example of an RTX PRO Server#

  1. Open a web browser and navigate to the BMC IP address of your server.

  2. Log in with your BMC credentials.

  3. After logging in, you will see the BMC interface.

  4. Navigate to the Remote Console section.

_images/physical-ai-bmc-rtx-pro.png

Figure 3 BMC interface for the RTX PRO Server showing the Remote Console section#

Accessing BMC Interface — HGX B200#

  1. Open a web browser and navigate to the BMC IP address of your HGX B200 Server.

  2. Log in with your BMC credentials.

  3. After logging in, you will see the Baseboard Management Controller interface.

Provisioning Ubuntu 24.04 via BMC#

  1. In the BMC interface, go to Remote Console.

  2. Launch the remote console (Java-based or HTML5 console).

  3. At the top of the remote console window, click “Media”.

_images/physical-ai-bmc-remote-console.png

Figure 4 Remote console window with the Media menu and Mount Virtual Media dialog#

  1. Browse and select the Ubuntu 24.04 Server ISO file from your local disk.

  2. Click “Mount” or “Mount all media”.

  3. Wait approximately 10 minutes for the server to boot from the ISO.

  4. The Ubuntu installation screen then appears.

_images/physical-ai-ubuntu-language.png

Figure 5 Ubuntu 24.04 Server installer — language selection screen#

  1. Proceed with the Ubuntu installation:

    • Select the language and keyboard layout.

    • Configure the network (static IP or DHCP, as needed).

    • Configure storage (use the entire disk for a simple setup).

    • Create a user account (recommended username: “nvidia”).

    • Enable the OpenSSH server (IMPORTANT: check this option).

    • Do NOT select any additional snaps or packages (minimal installation).

    • Complete the installation and reboot.

_images/physical-ai-ubuntu-os-type.png

Figure 6 Ubuntu 24.04 Server installer — OS type selection (Ubuntu Server Minimized recommended)#

_images/physical-ai-ubuntu-ssh-config.png

Figure 7 Ubuntu 24.04 Server installer — SSH configuration (Install OpenSSH server must be checked)#

  1. After the reboot, the server should boot into Ubuntu 24.04.

The reboot is triggered from the installation complete screen:

_images/physical-ai-ubuntu-install-complete.png

Figure 8 Ubuntu 24.04 Server installation complete screen#
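
After the reboot, one way to confirm that a node is running the expected release is to read VERSION_ID from /etc/os-release. The sketch below is an illustration only: the function name os_version_id is hypothetical, and the optional file argument exists just to make the function easy to try against a sample file.

```shell
# Print VERSION_ID from an os-release style file (default: /etc/os-release).
# os_version_id is a hypothetical helper name, not part of Ubuntu.
os_version_id() {
  # Source the file in a subshell so its variables do not leak into the caller.
  ( . "${1:-/etc/os-release}" && printf '%s\n' "$VERSION_ID" )
}

# Example check on a freshly installed node:
#   [ "$(os_version_id)" = "24.04" ] && echo "Ubuntu 24.04 detected"
```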

Prepare the Future Kubernetes Nodes#

The control plane and worker (GPU) nodes need only a minimal set of OS packages. From here on, the only requirement is the SSH server, which was already installed during the OS installation.

Update the OS Packages and Disable Sudo Prompt#

On the GPU worker and control plane nodes, update and upgrade the OS packages:

sudo apt -y update && sudo apt -y upgrade
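
An upgrade that replaces the kernel or core libraries may need a reboot before it takes effect. On Ubuntu, packages commonly signal this by creating the file /var/run/reboot-required. The sketch below assumes that mechanism is present on the node (it depends on a helper package that a minimized install may lack); the function name reboot_needed is hypothetical.

```shell
# Return 0 if the reboot flag file exists (default: /var/run/reboot-required).
# The optional argument lets you point the check at a different file.
reboot_needed() {
  [ -f "${1:-/var/run/reboot-required}" ]
}

# Example use after the upgrade:
#   reboot_needed && sudo reboot
```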

Install a text editor for the configuration edits that follow. This example uses vim; choose a different editor if you prefer:

sudo apt -y install vim

The nodes need passwordless (NOPASSWD) sudo. On all nodes, run:

sudo visudo

Find the following lines and change them to match:

# Allow members of group sudo to execute any command
%sudo   ALL=(ALL:ALL) NOPASSWD: ALL

Save the file and exit the editor.
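
To confirm that the change took effect, something like the following sketch can be used. It is not part of the original workflow: both function names are hypothetical, and the pattern matches only the exact rule shown above. It relies on sudo -n, which fails instead of prompting, and on visudo -c, which checks the sudoers syntax without opening an editor.

```shell
# Return 0 if the given sudoers file grants the sudo group NOPASSWD access.
# Reading /etc/sudoers itself requires root, for example:
#   sudo cat /etc/sudoers | sudoers_has_nopasswd /dev/stdin
sudoers_has_nopasswd() {
  grep -Eq '^%sudo[[:space:]]+ALL=\(ALL:ALL\)[[:space:]]+NOPASSWD:[[:space:]]*ALL[[:space:]]*$' "$1"
}

# Confirm on a live node that sudo no longer prompts for a password.
verify_passwordless_sudo() {
  sudo -k                     # forget any cached credentials
  sudo -n true || return 1    # -n: fail instead of prompting
  sudo -n visudo -c           # check sudoers syntax without opening an editor
}
```

Run verify_passwordless_sudo as the account created during installation; a non-zero exit status means sudo still asks for a password or the sudoers file has a syntax error.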

Prepare the Installation Node#

The installation node can run any Linux OS that supports Docker, Podman, or a similar container runtime. It needs IPv4 connectivity to the future Kubernetes nodes.

Generate a private-public RSA SSH keypair:

ssh-keygen -t rsa
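
If the key pair has to be created from a script, ssh-keygen can also run without prompts. The sketch below is one possible approach rather than a required step: the function name ensure_keypair is hypothetical, the 4096-bit key size is an assumption, and -N "" creates the key without a passphrase, which may not fit your security policy.

```shell
# Create an RSA key pair at the given path unless one already exists.
# Default path: ~/.ssh/id_rsa (the public key is written next to it as .pub).
ensure_keypair() {
  key=${1:-"$HOME/.ssh/id_rsa"}
  [ -f "$key" ] && return 0          # keep an existing key untouched
  mkdir -p "$(dirname "$key")"
  # -b 4096: key size; -N "": empty passphrase; -f: output file; -q: quiet
  ssh-keygen -q -t rsa -b 4096 -N "" -f "$key"
}
```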

Then copy the public key to every future Kubernetes node. Example command for one node:

ssh-copy-id -i ~/.ssh/id_rsa.pub <your login>@<a node IP>

Now verify that you can SSH to every future Kubernetes node. Example command for one node:

ssh <your login>@<a node IP>

You should no longer be prompted for a password.
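
With more than a few nodes, the copy and test steps can be wrapped in a loop. The following is a minimal sketch under stated assumptions: the node list file (one IP address or host name per line) and both function names are hypothetical. BatchMode=yes makes ssh fail rather than fall back to a password prompt, so a failure indicates that key-based login is not working for that node.

```shell
# Print one "login@host" target per line from a node list file,
# skipping blank lines and lines that start with '#'.
node_targets() {
  login=$1
  file=$2
  while IFS= read -r host || [ -n "$host" ]; do
    case $host in ''|'#'*) continue ;; esac
    printf '%s@%s\n' "$login" "$host"
  done < "$file"
}

# Copy the public key to every node, then confirm key-based login works.
# Usage: setup_nodes <your login> <node list file>
setup_nodes() {
  for target in $(node_targets "$1" "$2"); do
    ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "$target" || return 1
    ssh -o BatchMode=yes "$target" true || return 1
  done
}
```

ssh-copy-id still asks for each node's password once; after that, the BatchMode check should pass without any prompt.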

Next, if Docker is your preferred runtime and is not yet installed, follow the official Docker installation guide. Alternatively, install any other container runtime of your choice.

The installation node also needs git and connectivity to GitHub. The command below assumes a Debian- or Ubuntu-based node:

sudo apt -y update && sudo apt -y install git
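
Before moving on, it can help to confirm that the required tools are on the PATH of the installation node. The sketch below is only an illustration; the function name missing_tools is hypothetical, and the tool names in the example reflect the options mentioned above (git plus Docker or Podman).

```shell
# Print the name of every listed command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example: git is required, and at least one container runtime.
#   missing_tools git
#   [ -z "$(missing_tools docker)" ] || [ -z "$(missing_tools podman)" ] \
#     || echo "no container runtime found"
```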