Installation

Important

Ensure the basic prerequisites have been met before proceeding with the setup.

Note

Manual setup instructions are also available in the Manual Installation section. However, the automated setup is recommended as it provides a streamlined installation process with minimal manual intervention.

Download Resources

  1. Create a directory (for example, ~/cns) to store the resources and navigate to it.

    mkdir ~/cns
    cd ~/cns
    
  2. Download the local-kubernetes-setup-playbook.tar.gz from the Local Developer Setup resource using an NGC API Key.

    Alternatively, set the API_KEY environment variable to your NGC API Key and download the file from the command line:

    curl -fsSLO "https://api.ngc.nvidia.com/v2/org/nvidia/team/holoscan-for-media/resources/local-developer-setup/versions/0.4.0/files/local-kubernetes-setup-playbook.tar.gz" -H "Authorization: Bearer ${API_KEY}"
    
  3. Decompress the downloaded package in the current directory.

    tar -xzvf local-kubernetes-setup-playbook.tar.gz
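
The download and extraction steps above can be guarded with two small checks. This is a minimal sketch and not part of the playbook package: the function names `require_api_key` and `archive_is_readable` are hypothetical, and the sketch assumes only standard shell and `tar` behavior.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; they are NOT shipped with the playbook package.

# Fail early if API_KEY is unset or empty, before calling curl.
require_api_key() {
    if [ -z "${API_KEY:-}" ]; then
        echo "API_KEY is not set; export it before downloading." >&2
        return 1
    fi
}

# Return 0 if the given file is a readable gzip-compressed tar archive.
# "tar -tzf" lists the archive contents without extracting anything.
archive_is_readable() {
    tar -tzf "$1" > /dev/null 2>&1
}

# Example usage (commented out so that sourcing this file has no side effects):
#   require_api_key || exit 1
#   archive_is_readable local-kubernetes-setup-playbook.tar.gz \
#       && tar -xzvf local-kubernetes-setup-playbook.tar.gz
```

Checking the archive before extraction helps catch a truncated download or an error page saved in place of the tarball, for example when the API key was rejected.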
    

Install Ansible

  1. Run the following command to install Ansible:

    ./ansible_setup.sh
    

    This script installs Ansible version 8.0.0 and other required components in a Python virtual environment. If the setup succeeds, the script creates a .cns directory containing the activate and deactivate scripts.

  2. Open the ~/.bashrc file and add the following lines to suppress some informational Ansible warnings:

    export ANSIBLE_LOCALHOST_WARNING=False
    export ANSIBLE_INVENTORY_UNPARSED_WARNING=False
    

    Save the file and return to the shell.

    Reload your .bashrc file:

    source ~/.bashrc
    
  3. To run any Ansible command, activate the Python virtual environment first:

    source .cns/bin/activate
    

    To exit the virtual environment at any time, run:

    deactivate
    

    For more information on Python virtual environments, see the Python venv documentation.
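
Step 2 above can also be scripted so that re-running the setup does not add the same lines to ~/.bashrc twice. The following is a minimal sketch. The helper name `append_once` is hypothetical, and the sketch assumes the target file ends with a newline, as a typical ~/.bashrc does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if that exact line
# is not already present.
append_once() {
    line=$1
    file=$2
    # Create the file if it does not exist yet.
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example usage (commented out so that sourcing this file changes nothing):
#   append_once 'export ANSIBLE_LOCALHOST_WARNING=False' ~/.bashrc
#   append_once 'export ANSIBLE_INVENTORY_UNPARSED_WARNING=False' ~/.bashrc
#   source ~/.bashrc
```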

Cluster Installation

  1. The default config.yaml values are sufficient to run the reference applications in a basic configuration. Persistent storage and the load balancer are disabled by default; edit the file to enable either feature.

    Other settings allow you to override the default selection of SR-IOV network interfaces, configure basic GPU time slicing using a single value, change the shared memory and hugepages configuration, or disable platform monitoring, as needed.

  2. Run the automation using the following command:

    ANSIBLE_LOG_PATH=./install_$(date +'%Y%m%d_%H%M%S').log ansible-playbook h4m_playbook.yaml -e action=install -e "NGC_API_KEY=<API-KEY>" -vv
    

    where:

    • <API-KEY> is an NGC API Key (legacy NGC API keys are not supported)

    • The -vv flag enables verbose output for debugging. Use -v for less detail, or -vvv or -vvvv for progressively more detailed logs.

    Important

    • During installation, you are prompted to enter the password for sudo access.

    • If the playbook fails to run with the following message, reboot and rerun the playbook:

      TASK [Show message if reboot failed] **************************************
      ok: [localhost] => {
         "msg": "Reboot failed. Please manually reboot the system and try again."
      }
      
    • If the system reboots after applying the SR-IOV Network Node Policy, the playbook run is interrupted and the remaining tasks must be completed after the reboot. The output before the reboot looks like this:

      TASK [Apply the SR-IOV Network Node Policy] ******************************************************************************************************************
      changed: [localhost]
      
      TASK [Wait for NIC Resources to be ready] ********************************************************************************************************************
      FAILED - RETRYING: [localhost]: Wait for NIC Resources to be ready (10 retries left).
      FAILED - RETRYING: [localhost]: Wait for NIC Resources to be ready (9 retries left).
      FAILED - RETRYING: [localhost]: Wait for NIC Resources to be ready (8 retries left).
      

      Allow some time for the system to reboot and for all components to fully initialize. Then run the following command to complete the remaining tasks:

      ANSIBLE_LOG_PATH=./install_$(date +'%Y%m%d_%H%M%S').log ansible-playbook h4m_playbook.yaml -e action=install --start-at-task="Wait for NIC Resources to be ready" -vv
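
Every ansible-playbook invocation above depends on the virtual environment from the Install Ansible section being active, which is easy to forget after a reboot. The following pre-flight check is a minimal sketch. The function name `venv_has_command` is hypothetical, and the sketch assumes that .cns is a standard Python virtual environment whose activate script exports the VIRTUAL_ENV variable.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of the playbook package.
# Assumption: the activate script exports VIRTUAL_ENV, as standard
# Python virtual environments do.

# Return 0 only if a virtual environment is active AND the named
# command can be found on PATH.
venv_has_command() {
    [ -n "${VIRTUAL_ENV:-}" ] && command -v "$1" > /dev/null 2>&1
}

# Example usage (commented out so that sourcing this file has no side effects):
#   venv_has_command ansible-playbook || {
#       echo "Run 'source .cns/bin/activate' first." >&2
#       exit 1
#   }
```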
      

Cluster Uninstallation

To uninstall, run:

ANSIBLE_LOG_PATH=./uninstall_$(date +'%Y%m%d_%H%M%S').log ansible-playbook h4m_playbook.yaml -e action=uninstall --ask-become-pass -vv

After uninstalling, reboot the system to completely remove any components loaded by CNS.