Installation#
Important
Ensure the basic prerequisites have been met before proceeding with the setup.
Note
Manual setup instructions are also available in the Manual Installation section. However, the automated setup is recommended as it provides a streamlined installation process with minimal manual intervention.
Download Resources#
Create a directory (for example,
~/cns) to store the resources and navigate to it.mkdir ~/cns cd ~/cns
Download the
local-kubernetes-setup-playbook.tar.gzfrom the Local Developer Setup resource using an NGC API Key.You can also do this on the command line as follows after setting the
API_KEYenvironment variable.curl -fsSLO "https://api.ngc.nvidia.com/v2/org/nvidia/team/holoscan-for-media/resources/local-developer-setup/versions/0.4.0/files/local-kubernetes-setup-playbook.tar.gz" -H "Authorization: Bearer ${API_KEY}"
Decompress the downloaded package in the current directory.
tar -xzvf local-kubernetes-setup-playbook.tar.gz
Install Ansible#
Run the following command to install Ansible:
./ansible_setup.sh
This script installs Ansible version 8.0.0 along with other necessary components in a virtual Python environment. If the setup succeeds, the script creates a directory
.cnswith theactivateanddeactivatescripts.Open the
~/.bashrcfile and add the following lines to suppress some informational Ansible warnings:export ANSIBLE_LOCALHOST_WARNING=False export ANSIBLE_INVENTORY_UNPARSED_WARNING=False
Save the file and return to the shell.
Reload your
.bashrcfile:source ~/.bashrc
To run any Ansible command, activate the Python virtual environment first:
source .cns/bin/activate
To exit the virtual environment at any time, run:
deactivate
More information on python virtual environments can be found here.
Cluster Installation#
The default
config.yamlvalues are sufficient to run the reference applications in a basic configuration. Edit the file to enable persistent storage or configure a load balancer as these features are disabled by default.Other settings allow you to override the default selection of SR-IOV network interfaces, configure basic GPU time slicing using a single value, change the shared memory and hugepages configuration, or disable platform monitoring, as needed.
Run the automation using the following command:
ANSIBLE_LOG_PATH=./install_$(date +'%Y%m%d_%H%M%S').log ansible-playbook h4m_playbook.yaml -e action=install -e "NGC_API_KEY=<API-KEY>" -vv
where:
<API-KEY>is an NGC API Key (legacy NGC API keys are not supported)The
-vvflag enables verbose output for debugging. You can change it to-vfor less detail or-vvvor-vvvvfor increasingly more detailed logs.
Important
During the installation process, you will be required to input the password for sudo access.
If the playbook fails to run with the following message, reboot and rerun the playbook:
TASK [Show message if reboot failed] ************************************** ok: [localhost] = { "msg": "Reboot failed. Please manually reboot the system and try again." }
If the system gets rebooted after applying the SR-IOV Network Node Policy, you need to complete some tasks after the reboot.
TASK [Apply the SR-IOV Network Node Policy] ****************************************************************************************************************** changed: [localhost] TASK [Wait for NIC Resources to be ready] ******************************************************************************************************************** FAILED - RETRYING: [localhost]: Wait for NIC Resources to be ready (10 retries left). FAILED - RETRYING: [localhost]: Wait for NIC Resources to be ready (9 retries left). FAILED - RETRYING: [localhost]: Wait for NIC Resources to be ready (8 retries left).
Allow some time for the system to reboot and all components to fully initialize. Run the below command to complete the remaining tasks:
ANSIBLE_LOG_PATH=./install_$(date +'%Y%m%d_%H%M%S').log ansible-playbook h4m_playbook.yaml -e action=install --start-at-task="Wait for NIC Resources to be ready" -vv
Cluster Uninstallation#
To uninstall, run:
ANSIBLE_LOG_PATH=./uninstall_$(date +'%Y%m%d_%H%M%S').log ansible-playbook h4m_playbook.yaml -e action=uninstall --ask-become-pass -vv
After uninstalling, it is important to reboot the system to completely remove any components loaded by CNS.