Kubernetes Security Hardening#

This section of the install guide details a series of Kubernetes security hardening recommendations for NVIDIA Mission Control’s user and admin space Kubernetes clusters. Apply these recommendations after setting up Kubernetes via the BCM-provided wizard.

Run this hardening guide after installing Kubernetes, but before installing NVIDIA Mission Control services such as autonomous job recovery or autonomous hardware recovery.

User and Admin Cluster Background#

To promote the principle of least privilege, NVIDIA Mission Control segments services so that only services that must have access to the out of band (OOB) network can reach it. User space includes the services that do not require access to the OOB network; admin space includes the services that do.

To implement this separation, NVIDIA Mission Control splits the control plane into two distinct Kubernetes clusters: the user space cluster, which runs services that do not need OOB network access, and the admin space cluster, which runs services that do.

The user space cluster is used for the deployment of Run:ai; if Run:ai is not in use, the user space cluster is not deployed. It includes three control plane nodes along with the B200 / GB200 compute trays.

The admin space cluster is used for the deployment of NetQ, Grafana, Prometheus, Loki, NMC autonomous job recovery, and NMC autonomous hardware recovery. It is always present in every DGX GB200 and later NMC deployment. It includes three control plane nodes only.

Securing Control Nodes#

Only administrators should have ssh access to the control nodes that underlie the user and admin space Kubernetes clusters.

To achieve this, we leverage BCM to restrict access to the user and admin space control node categories. To restrict ssh access to the user space control nodes to administrators only, run the following commands in cmsh:

[a03-p1-head-01->category[k8s-system-user]]% set usernodelogin never
[a03-p1-head-01->category*[k8s-system-user*]]% commit
[a03-p1-head-01->category[k8s-system-user]]%

Note in the above example, replace k8s-system-user with the category name used for your user space Kubernetes control node category.
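As a quick sanity check, you can read the setting back from cmsh. This is a sketch: `k8s-system-user` is a placeholder for your category name, and the `command -v` guard simply skips the query on hosts without cmsh.

```shell
# Read back the usernodelogin setting for the user space control node category.
category="k8s-system-user"   # replace with your category name
if command -v cmsh >/dev/null 2>&1; then
    setting=$(cmsh -c "category; use $category; get usernodelogin")
else
    setting="(cmsh not available on this host)"
fi
# After the hardening step above, this should report "never".
echo "usernodelogin for $category: $setting"
```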

Now apply the same restriction to the admin space control nodes:

[a03-p1-head-01->category[k8s-system-admin]]% set usernodelogin never
[a03-p1-head-01->category*[k8s-system-admin*]]% commit
[a03-p1-head-01->category[k8s-system-admin]]%

Note in the above example, replace k8s-system-admin with the category name used for your admin space Kubernetes control node category.

BCM has now configured PAM to deny ssh connections on these nodes. To restore ssh access for specific users, you must explicitly add those users and groups to the PAM allowlist.
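To spot-check the restriction, attempt a non-root ssh login to one of the control nodes. This is a sketch: the node and user names are placeholders for your environment; with `usernodelogin` set to `never`, PAM should reject the login.

```shell
# Attempt an ssh login as a regular (non-admin) user; BatchMode avoids an
# interactive password prompt. The hostname and username are placeholders.
result="DENIED"
if ssh -o BatchMode=yes -o ConnectTimeout=5 regularuser@a03-p1-k8s-admin-01 true 2>/dev/null; then
    result="ALLOWED"
fi
# Expected result after hardening: DENIED
echo "ssh login for regularuser: $result"
```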

Restricting kubectl Access#

Certificate based authentication#

Access to Kubernetes is limited by default: only the root user can use kubectl.

Administrators can grant kubectl access to additional users by modifying the file permissions associated with ~/.kube/config. By default, this file is owned by root, and read and write access is restricted solely to root.

BCM also provides tooling to add users to Kubernetes clusters. This creates new certificates and populates ~/.kube/config in the user’s home directory.

An example:

cm-kubernetes-setup --cluster k8s-admin --add-user johndoe --role cluster-admin
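After adding a user, you can verify that their generated credentials work with `kubectl auth can-i`. This is a sketch: the kubeconfig path follows the johndoe example above, and the guard skips the check on hosts without kubectl or the file.

```shell
# Check what the new user's kubeconfig allows; for the cluster-admin role,
# the broadest check should report "yes".
kubeconfig="/home/johndoe/.kube/config"   # path for the user added above
if command -v kubectl >/dev/null 2>&1 && [ -f "$kubeconfig" ]; then
    answer=$(kubectl --kubeconfig "$kubeconfig" auth can-i '*' '*' 2>/dev/null)
else
    answer="(kubectl or kubeconfig not available here)"
fi
echo "can-i '*' '*': $answer"
```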

Configuring head nodes to restrict network access#

The Kubernetes API is available via the nginx instance on the head nodes, which proxies traffic back to the Kubernetes cluster(s). By default, nginx listens on 10443/tcp and passes traffic to 6443/tcp on the Kubernetes admin space cluster (k8s-admin in our example).

A snippet of the nginx.conf file:

stream {
        upstream kube-k8s-admin {
            server a03-p1-netq-x86-01:6443;
            server b04-p1-netq-x86-03:6443;
            server b03-p1-netq-x86-02:6443;
        }
        server {
            listen 10443;
            proxy_pass kube-k8s-admin;
            proxy_next_upstream on;
            proxy_connect_timeout 300ms;
        }
}

You can test this by using curl from any node in the user space; a curl exit code of 28 indicates a timeout. In this example, we test from a rack of DGX GB200 compute nodes:

pdsh -g rack=B05 'curl -m 1 -k -s https://master:10443 >/dev/null 2>&1; [ $? -eq 28 ] && echo "TIMEOUT" || echo "WORKING"' | dshbak -c
----------------
b05-p1-dgx-05-c[01-18]
----------------
WORKING

To restrict access, we’ll configure shorewall on the head nodes.

Add rules to accept traffic from admin space nodes#

for admin_ip in $(cmsh -c "device list -t headnode -f ip" | tail -n +1; cmsh -c "device list -c k8s-system-admin -f ip" | tail -n +1); do \
    for headnode in $(cmsh -c "device list -t headnode -f hostname" | tail -n +1); do \
        echo "Configuring shorewall on $headnode to accept $admin_ip/32 on port 10443/tcp" && \
        cmsh -c "device; use $headnode; roles; use firewall; openports; add accept nat 10443 tcp fw $admin_ip/32; commit"; \
    done; \
done

Note: substitute the k8s-system-admin category name in the example above with the actual category name used on your cluster if it differs.

Add rule to drop all other traffic from non-matching nodes#

for headnode in $(cmsh -c "device list -t headnode -f hostname" | tail -n +1); do \
    echo "Configuring shorewall on $headnode to deny remaining traffic" && \
    cmsh -c "device; use $headnode; roles; use firewall; openports; add drop nat 10443 tcp fw 0.0.0.0/0; commit"; \
done

Configuration via BCM (cmsh) automatically restarts the shorewall daemon on the head nodes.
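You can confirm the rules are active by inspecting the firewall state on a head node. This is a sketch: it assumes the openports rules above land in the iptables nat table, and the guard skips hosts where iptables is unavailable.

```shell
# On a head node, count nat-table rules that reference port 10443.
# After the configuration above, there should be one accept rule per
# admin space node plus the final drop rule.
if command -v iptables >/dev/null 2>&1; then
    rules=$(iptables -t nat -L -n 2>/dev/null | grep -c 10443 || true)
else
    rules="(iptables not available on this host)"
fi
echo "nat rules matching 10443: $rules"
```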

Validate that we now drop traffic#

pdsh -g rack=B05 'curl -m 1 -k -s https://master:10443 >/dev/null 2>&1; [ $? -eq 28 ] && echo "TIMEOUT" || echo "WORKING"' | dshbak -c
----------------
b05-p1-dgx-05-c[01-18]
----------------
TIMEOUT

Configuring Kubernetes nodes to restrict network access#

Network and Kyverno policies are deployed using Helm charts from the NVIDIA NGC NMC Collections and are used to restrict network access to the Kubernetes clusters. For example, to fetch the nmc-kyverno-policies chart from NGC:

helm fetch https://helm.ngc.nvidia.com/nvidia/nv-mission-control/charts/nmc-kyverno-policies-2.0.12.tgz --username='$oauthtoken' --password=<YOUR API KEY>

For NGC access, refer to the NGC User Guide.

Deploying network policies with Helm (nmc-network-policies)#

NVIDIA Mission Control’s Kubernetes installation wizard provisions Calico out of the box for network security. Calico is an open source network security solution for Kubernetes. Using Calico, administrators can manage network access between services within the SuperPod.

Deploy the nmc-network-policies Helm chart to apply additional Calico network policies to the admin (k8s-admin) and user (k8s-user) Kubernetes clusters. Configure a values file to specify the cluster type and the BCM head node CIDR (bcmHeadNodeCidr).

The following approach is recommended for switching between clusters via the command line from the BCM head node:

List available Kubernetes modules#

module avail kubernetes
------------------------------------------------------------------------------------------- /cm/local/modulefiles -------------------------------------------------------------------------------------------
kubernetes/k8s-admin/1.32.7-1.1  kubernetes/k8s-user/1.32.7-1.1

Display which modules are presently loaded#

module list
Currently Loaded Modulefiles:
 1) shared   2) cluster-tools/11.0   3) cm-image/11.0   4) cmd   5) cmsh   6) cm-setup/11.0   7) slurm/slurm/24.11   8) kubernetes/k8s-admin/1.32.7-1.1

Set access to a cluster#

module load kubernetes/k8s-user

Switch between Kubernetes clusters#

module swap kubernetes/k8s-admin kubernetes/k8s-user
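After a module load or swap, it is worth confirming which cluster kubectl now targets before applying policies. This is a sketch; the guard skips hosts without kubectl.

```shell
# Show the kubeconfig context selected by the currently loaded module.
if command -v kubectl >/dev/null 2>&1; then
    ctx=$(kubectl config current-context 2>/dev/null || echo "(no context set)")
else
    ctx="(kubectl not available on this host)"
fi
echo "current context: $ctx"
```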

Apply network policies to the k8s-admin cluster#

For the k8s-admin cluster:

  1. Log in to the BCM Head Node and load the k8s-admin module so kubectl targets the admin cluster.

  2. Download the nmc-network-policies chart from NGC to the BCM Head Node:

    helm fetch https://helm.ngc.nvidia.com/nvidia/nv-mission-control/charts/nmc-network-policies-2.0.12.tgz --username='$oauthtoken' --password=<YOUR API KEY>
    
  3. Create a values file (for example, values-admin.yaml) that sets the cluster type to admin and the BCM head node CIDR:

    # Specify admin cluster
    clusterType:
      user: false
      admin: true
    # CIDR of the BCM head nodes
    bcmHeadNodeCidr: "203.0.113.0/24"   # Replace with your BCM head node CIDR
    
  4. Install or upgrade the chart:

    helm upgrade --install nmc-network-policies ./nmc-network-policies-2.0.12.tgz -f values-admin.yaml --wait
    
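Once the chart is installed on a cluster, you can confirm the policies landed. The exact policy names depend on the chart contents, so this sketch only lists what is present; the guard skips hosts without kubectl, and the Calico resource name is an assumption based on the standard Calico CRDs.

```shell
# List Kubernetes NetworkPolicy objects across all namespaces, plus any
# Calico cluster-wide policies the chart may have created.
if command -v kubectl >/dev/null 2>&1; then
    policies=$(kubectl get networkpolicy -A 2>/dev/null; \
               kubectl get globalnetworkpolicies.crd.projectcalico.org 2>/dev/null)
    [ -n "$policies" ] || policies="(no policies visible from this host)"
else
    policies="(kubectl not available on this host)"
fi
echo "$policies"
```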

Apply network policies to the k8s-user cluster#

For the k8s-user cluster where Run:ai is deployed:

  1. Log in to the BCM Head Node and load the k8s-user module so kubectl targets the user cluster.

  2. Download the nmc-network-policies chart from NGC to an accessible location on the BCM Head Node. For example:

    helm fetch https://helm.ngc.nvidia.com/nvidia/nv-mission-control/charts/nmc-network-policies-2.0.12.tgz --username='$oauthtoken' --password=<YOUR API KEY>
    
  3. Create a values file (for example, values-user.yaml) that sets the cluster type to user and the BCM head node CIDR:

    # Specify user cluster
    clusterType:
      admin: false
      user: true
    # CIDR of the BCM head nodes
    bcmHeadNodeCidr: "203.0.113.0/24"   # Replace with your BCM head node CIDR
    
  4. Install or upgrade the chart:

    helm upgrade --install nmc-network-policies ./nmc-network-policies-2.0.12.tgz -f values-user.yaml --wait
    

Configuring Border Top Of Rack (BTOR) switching#

To secure access to the kube-apiserver TCP port, we’ll configure an ACL to restrict access.

Perform the following configuration on both BTOR switches.

  1. Create IP ACL rules for each control plane address that is permitted to communicate with the kube-apiserver. Replace the IPs in the example with the addresses on your system (head node IPs, VIP, and admin space Kubernetes cluster nodes):

    nv set acl acl-kubeapi type ipv4
    nv set acl acl-kubeapi rule 10 match ip tcp dest-port 6443
    nv set acl acl-kubeapi rule 10 match ip source-ip 7.241.28.8/31
    nv set acl acl-kubeapi rule 10 action permit
    nv set acl acl-kubeapi rule 20 match ip tcp dest-port 6443
    nv set acl acl-kubeapi rule 20 match ip source-ip 7.241.28.10/32
    nv set acl acl-kubeapi rule 20 action permit
    nv set acl acl-kubeapi rule 30 match ip tcp dest-port 6443
    nv set acl acl-kubeapi rule 30 match ip source-ip 7.241.28.12/31
    nv set acl acl-kubeapi rule 30 action permit
    nv set acl acl-kubeapi rule 40 match ip tcp dest-port 6443
    nv set acl acl-kubeapi rule 40 match ip source-ip 7.241.28.14/32
    nv set acl acl-kubeapi rule 40 action permit
    
  2. Create a catch-all deny rule:

    nv set acl acl-kubeapi rule 50 match ip tcp dest-port 6443
    nv set acl acl-kubeapi rule 50 action deny
    
  3. Apply the ACL to the physical interfaces, NOT to the bond interfaces to which the hosts are connected. Each host is connected twice to each BTOR. You must use in-band physical ports, NOT out-of-band physical ports:

    nv set interface swp3s1 acl acl-kubeapi outbound
    nv set interface swp3s2 acl acl-kubeapi outbound
    nv set interface swp2s2 acl acl-kubeapi outbound
    nv set interface swp6s3 acl acl-kubeapi outbound
    nv set interface swp5s0 acl acl-kubeapi outbound
    
  4. Save and apply the configuration:

    nv config apply
    nv config save
    
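To review the result on each BTOR switch, NVUE can display the applied ACL. This is a sketch: the output format of `nv show acl` depends on your Cumulus Linux version, and the guard simply skips the command when run off-switch.

```shell
# On the switch, display the ACL to confirm the permit/deny rules above.
if command -v nv >/dev/null 2>&1; then
    acl_view=$(nv show acl acl-kubeapi)
else
    acl_view="(nv not available; run this on the BTOR switch)"
fi
echo "$acl_view"
```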

Configuring Kyverno#

The Kubernetes API can also be hardened to restrict capabilities to only those services that need them. To do this, NVIDIA Mission Control leverages Kyverno, a widely used declarative policy engine (a Kubernetes admission controller), to apply governance policies.

Installing Kyverno#

For both the k8s-user and k8s-admin Kubernetes clusters, the same approach and set of commands are used to install the Kyverno engine.

  1. Add the Kyverno Helm repository and create the namespace:

    helm repo add kyverno https://kyverno.github.io/kyverno/
    kubectl create ns kyverno
    kubectl label namespace kyverno zarf.dev/agent=ignore
    

    Note

    The label command ensures that Zarf won’t interfere with the Kyverno namespace.

  2. Copy the values-kyverno.yaml file to an accessible location on the BCM Head Node.

    ngc registry resource download-version "nvidia/nv-mission-control/nmc-k8s-security:2.0.12"
    

    This file configures resource limits and tolerations for Kyverno controllers to run on control plane nodes. Use it as a template to configure the resource limits and tolerations for your environment. For instance, you might need to increase the memory limits if you have a lot of policies.

    admissionController:
      container:
        resources:
          limits:
            memory: 3840Mi # increase this if you have a lot of policies
          requests:
            cpu: 1000m
            memory: 1280Mi # increase this if you have a lot of policies
    
  3. Install Kyverno:

    helm install kyverno kyverno/kyverno --namespace kyverno --version 3.5.2 --values values-kyverno.yaml --wait --timeout 2m
    
  4. Verify the installation:

    kubectl get pods -n kyverno
    

    All Kyverno pods should be in Running state before proceeding.
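Beyond pod status, you can confirm that Kyverno registered its admission webhooks, since policy enforcement depends on them. A sketch, guarded for hosts without kubectl:

```shell
# Kyverno enforces policies through validating webhooks; confirm they are
# registered with the API server.
if command -v kubectl >/dev/null 2>&1; then
    webhooks=$(kubectl get validatingwebhookconfigurations 2>/dev/null | grep -c kyverno || true)
else
    webhooks="(kubectl not available on this host)"
fi
echo "kyverno validating webhook configurations: $webhooks"
```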

Applying Security Policies#

The user and admin space Kubernetes clusters require different Kyverno policies because they run different software.

Apply policies to the k8s-admin cluster#

For the admin space Kubernetes cluster (k8s-admin):

  1. Log in to the BCM Head Node and ensure the k8s-admin module is loaded:

    module load kubernetes/k8s-admin
    
  2. Download the nmc-kyverno-policies Helm chart from the NGC NMC Collections (refer to the note above).

    helm fetch https://helm.ngc.nvidia.com/nvidia/nv-mission-control/charts/nmc-kyverno-policies-2.0.12.tgz --username='$oauthtoken' --password=<YOUR API KEY>
    
  3. Create a values file (for example, values-kyverno-admin.yaml) that sets the cluster type to admin:

    clusterType:
      admin: true
      user: false
    
  4. Install or upgrade the chart:

    helm upgrade --install nmc-kyverno-policies ./nmc-kyverno-policies-2.0.12.tgz -f values-kyverno-admin.yaml -n kyverno --wait
    
  5. Verify the policies are applied:

    kubectl get clusterpolicy
    

    You should see the applied ClusterPolicy resources listed.

Apply policies to the k8s-user cluster#

For the user space Kubernetes cluster (k8s-user) where Run:ai is deployed:

  1. Log in to the BCM Head Node and ensure the k8s-user module is loaded:

    module load kubernetes/k8s-user
    
  2. Download the nmc-kyverno-policies chart from NGC to the BCM Head Node:

    helm fetch https://helm.ngc.nvidia.com/nvidia/nv-mission-control/charts/nmc-kyverno-policies-2.0.12.tgz --username='$oauthtoken' --password=<YOUR API KEY>
    
  3. Create a values file (for example, values-kyverno-user.yaml) that sets the cluster type to user:

    clusterType:
      user: true
      admin: false
    
  4. Install or upgrade the chart:

    helm upgrade --install nmc-kyverno-policies ./nmc-kyverno-policies-2.0.12.tgz -f values-kyverno-user.yaml -n kyverno --wait
    
  5. Verify the policies are applied:

    kubectl get clusterpolicy
    

    You should see the applied ClusterPolicy resources listed.

Validating Policy Enforcement#

After applying policies, you can verify they are enforcing security restrictions:

# Check for policy violations
kubectl get policyreport -A

# View details of a specific policy
kubectl describe clusterpolicy disallow-privileged-containers

The enforcement mode is set to Enforce in these policies, meaning non-compliant resources are blocked at admission time. When reviewing the output of these commands, you may observe violations for pre-existing system components in infrastructure namespaces such as kube-system, kube-public, and tigera-operator. This is expected behavior for essential components that require elevated privileges. These resources are allowed to continue operating because of the allowExistingViolations: true setting in each policy, which permits resources that existed before the policy was applied while logging their violations for audit and compliance visibility. However, any attempt to create NEW resources or UPDATE existing ones with similar security violations, even in system namespaces, will be rejected. This approach provides protection against non-compliant workloads while maintaining operational continuity for essential infrastructure components.
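One way to exercise enforcement end to end is a server-side dry run of a privileged pod, which admission should reject while a policy such as disallow-privileged-containers is enforcing. This is a sketch: the image choice and the guard are assumptions for illustration.

```shell
# Attempt to create a privileged pod with a server-side dry run; with a
# disallow-privileged-containers policy in Enforce mode, admission should fail.
verdict="REJECTED"
if command -v kubectl >/dev/null 2>&1; then
    if kubectl run priv-test --image=registry.k8s.io/pause:3.9 --dry-run=server \
        --overrides='{"spec":{"containers":[{"name":"priv-test","image":"registry.k8s.io/pause:3.9","securityContext":{"privileged":true}}]}}' \
        >/dev/null 2>&1; then
        verdict="ADMITTED (policy not enforcing?)"
    fi
else
    verdict="(kubectl not available on this host)"
fi
echo "privileged pod dry run: $verdict"
```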

Configuring PermissionManager#

When adding new users to Kubernetes, it is important to restrict user capabilities to only those that are necessary.

Run:ai features an advanced authorization policy mechanism to control access and rights for the user space Kubernetes cluster. If users do not need kubectl access, Run:ai’s authorization mechanism is the best way to restrict functionality for the user space cluster.

If users need access to the user space Kubernetes cluster, PermissionManager is included with the Kubernetes deployment. PermissionManager is an open source tool for Kubernetes role-based access control (RBAC) management. To configure PermissionManager, refer to the PermissionManager documentation.
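As an illustration of the kind of restriction such RBAC management produces, a minimal Role and RoleBinding scoped to a single namespace might look like the following. The user, role, and namespace names are hypothetical; adapt them to your environment.

```yaml
# Hypothetical example: grant user "johndoe" read-only access to pods and
# pod logs in the "team-a" namespace only.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: team-a
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-johndoe
  namespace: team-a
subjects:
  - kind: User
    name: johndoe
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```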