Kubernetes Deployment

This section covers the Kubernetes configuration steps performed on the BCM head node.

  1. In a root shell on the head node, run the Kubernetes setup script (cm-kubernetes-setup).

  2. Select Deploy to continue.

  3. Select the newest version certified for NVIDIA AI Enterprise.

  4. (Optional) Enter a private registry server if one is required.

  5. Keep the default settings for the cluster.

  6. Select yes to allow the cluster to be used from the head node.

  7. Select internalnet, since that is the network the Kubernetes control plane nodes use.

  8. Select the 3 knodes for the control plane.

  9. Select the dgx-h100 category for Kubernetes workers.

  10. Skip selecting individual worker nodes.

  11. Select the 3 knodes to be etcd nodes.

  12. Accept the default values.

  13. Select the Calico plugin.

  14. Select no for installing the Kyverno Policy Engine.

  15. (Optional) If an NVAIE license has been provided, select yes and enter the details on the following page. Otherwise, select no.

  16. Select the following operators to install:

    • NVIDIA GPU Operator

    • Network Operator

    • Prometheus Adapter

    • Prometheus Operator Stack

    • cm-jupyter-kernel-operator

    • cm-kubernetes-mpi-operator

  17. Select the latest version certified for NVIDIA AI Enterprise.

  18. Select the latest version certified for NVIDIA AI Enterprise.

  19. Leave the custom YAML file blank.

  20. Select both cdi.enabled and nfd.enabled.

  21. Leave the custom YAML file blank.

  22. Configure the Network Operator by selecting the following:

    • nfd.enabled

    • sriovNetworkOperator.enabled

    • deployCR

    • secondaryNetwork.deploy

    • secondaryNetwork.cniPlugins.deploy

    • secondaryNetwork.multus.deploy

    • secondaryNetwork.ipamPlugin.deploy

  23. Deploy all the addons.

  24. Leave the ports as default.

  25. Choose yes to install the Permission Manager.

  26. Select both enabled and default for the local storage path.

  27. Leave the storage path as default.

  28. Choose to save the configuration and deploy.

  29. Keep the default file path for the config file and continue.

  30. Once the Kubernetes setup has finished, verify that all nodes are online and report a Ready status.

    root@bcm10-headnode:~# kubectl get nodes
    NAME       STATUS   ROLES                  AGE     VERSION
    dgx-01     Ready    worker                 5m56s   v1.28.10
    dgx-02     Ready    worker                 5m49s   v1.28.10
    knode-01   Ready    control-plane,master   6m26s   v1.28.10
    knode-02   Ready    control-plane,master   5m47s   v1.28.10
    knode-03   Ready    control-plane,master   5m56s   v1.28.10
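
The check in the last step can also be scripted rather than read by eye. The following is a minimal sketch, not part of the BCM tooling: check_nodes_ready is a hypothetical helper name, and it assumes the STATUS value is the second whitespace-separated column, as in the output above. It treats any status other than exactly "Ready" (including "Ready,SchedulingDisabled") as a failure.

```shell
# Hypothetical helper (not a BCM or kubectl command).
# Reads `kubectl get nodes --no-headers` output on stdin and
# returns non-zero if any node's STATUS column is not exactly "Ready".
check_nodes_ready() {
    awk '
        # Column 1 is the node name, column 2 is the status.
        $2 != "Ready" { print "not ready: " $1 " (" $2 ")"; bad = 1 }
        END { exit bad + 0 }
    '
}

# Usage on the head node (assumes kubectl is already configured for root):
#   kubectl get nodes --no-headers | check_nodes_ready && echo "all nodes Ready"
```

Reading from standard input keeps the helper independent of the cluster, so it can be tried against saved output before being used on a live system.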