Kubernetes Security Hardening#

This section of the install guide details a series of Kubernetes security hardening recommendations for NVIDIA Mission Control’s user and admin space Kubernetes clusters. These recommendations should be applied after setting up Kubernetes via the BCM-provided wizard.

Apply this hardening guide after installing Kubernetes, but before installing NVIDIA Mission Control services such as autonomous job recovery or autonomous hardware recovery.

User and Admin Cluster Background#

To promote the principle of least privilege, NVIDIA Mission Control segments services so that only those which must have access to the out of band (OOB) network receive it. User space contains the services that do not require OOB network access; admin space contains the services that do.

To implement this separation, NVIDIA Mission Control splits the control plane into two distinct Kubernetes clusters: the user space cluster, which runs services that do not need OOB network access, and the admin space cluster, which runs services that do.

The user space cluster is used for the deployment of Run:ai. If Run:ai is not in use, then the user space cluster is not deployed. The user space cluster includes three control plane nodes along with the B200 / GB200 compute trays.

The admin space cluster is used for the deployment of NMX-M, Grafana, Prometheus, Loki, NMC autonomous job recovery, and NMC autonomous hardware recovery. It is always present in every DGX GB200 and later NMC deployment. It includes three control plane nodes only.

Securing Control Nodes#

Only administrators should have ssh access to the control nodes that underlie the user and admin space Kubernetes clusters.

To achieve this, we will use BCM to restrict login on the user and admin space control node categories. To restrict ssh access to administrators only on the user space control nodes, run the following cmsh commands:

[a03-p1-head-01->category[k8s-user]]% set usernodelogin never
[a03-p1-head-01->category*[k8s-user*]]% commit
[a03-p1-head-01->category[k8s-user]]%

Note that in the above example, k8s-user should be replaced with the name of your user space Kubernetes control node category.

Now we will also introduce this restriction for the admin space control nodes:

[a03-p1-head-01->category[k8s-admin]]% set usernodelogin never
[a03-p1-head-01->category*[k8s-admin*]]% commit
[a03-p1-head-01->category[k8s-admin]]%

Note that in the above example, k8s-admin should be replaced with the name of your admin space Kubernetes control node category.
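
You can confirm the setting took effect on both categories with cmsh (category names as above; get simply prints the current value of the parameter):

cmsh -c "category; use k8s-user; get usernodelogin"
cmsh -c "category; use k8s-admin; get usernodelogin"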

BCM has now configured sshd’s PAM stack on these nodes to deny user logins. To add back ssh access for specific users or groups, they must be explicitly added to the PAM whitelist.
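
As a generic illustration of how a PAM whitelist can work (this is not necessarily the exact file or module stack that BCM manages for usernodelogin; consult the BCM documentation before editing PAM by hand), pam_listfile can be used to permit only named users over ssh:

# /etc/pam.d/sshd (illustrative): allow only the users named in the list file
auth    required    pam_listfile.so item=user sense=allow file=/etc/ssh/allowed_users onerr=fail

# /etc/ssh/allowed_users (illustrative path): one username per line
opsadmin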

Restricting kubectl Access#

Certificate based authentication#

Access to Kubernetes is limited by default: only the root user can use kubectl.

Administrators can grant kubectl access to additional users by modifying the file permissions associated with ~/.kube/config. By default, this file is owned by root, and read and write access is restricted solely to root.
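
To confirm the default locked-down state, you can inspect the ownership and permission bits of the admin kubeconfig (the path assumes the default root-owned configuration):

# Print owner, group, and octal mode of the root kubeconfig
stat -c '%U:%G %a' /root/.kube/config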

BCM also provides tooling to add users to Kubernetes clusters. This creates new certificates and populates ~/.kube/config in the user’s home directory.

An example:

cm-kubernetes-setup --cluster k8s-admin --add-user johndoe --role cluster-admin
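
After adding a user this way, a quick sanity check, run as the new user, is to confirm that the generated kubeconfig authenticates and carries the expected permissions:

# List the permissions the new kubeconfig grants
kubectl auth can-i --list

# For a cluster-admin role, a broad check should return "yes"
kubectl auth can-i '*' '*' --all-namespaces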

Configuring head nodes to restrict network access#

The Kubernetes API is exposed via the nginx instance on the head nodes, which proxies traffic back to the Kubernetes cluster(s). By default it is configured to listen on 10443/tcp and pass traffic to 6443/tcp on the admin space Kubernetes cluster, k8s-admin in our example.

A snippet of the nginx.conf file:

stream {
        upstream kube-k8s-admin {
            server a03-p1-nmxm-x86-01:6443;
            server b04-p1-nmxm-x86-03:6443;
            server b03-p1-nmxm-x86-02:6443;
        }
        server {
            listen 10443;
            proxy_pass kube-k8s-admin;
            proxy_next_upstream on;
            proxy_connect_timeout 300ms;
        }
}

You can test this by using curl from any node in the user space. In this example, we’ll test using a rack of DGX GB200 compute nodes:

pdsh -g rack=B05 'curl -m 1 -k -s https://master:10443 >/dev/null 2>&1; [ $? -eq 28 ] && echo "TIMEOUT" || echo "WORKING"' | dshbak -c
----------------
b05-p1-dgx-05-c[01-18]
----------------
WORKING

To restrict access, we’ll configure shorewall on the head nodes.

Add rules to accept traffic from admin space nodes#

for admin_ip in $(cmsh -c "device list -t headnode -f ip" | tail -n +1; cmsh -c "device list -c k8s-system-admin -f ip" | tail -n +1); do \
    for headnode in $(cmsh -c "device list -t headnode -f hostname" | tail -n +1); do \
        echo "Configuring shorewall on $headnode to accept $admin_ip/32 on port 10443/tcp" && \
        cmsh -c "device; use $headnode; roles; use firewall; openports; add accept nat 10443 tcp fw $admin_ip/32; commit"; \
    done; \
done

Add rule to drop all other traffic from non-matching nodes#

for headnode in $(cmsh -c "device list -t headnode -f hostname" | tail -n +1); do \
    echo "Configuring shorewall on $headnode to deny remaining traffic" && \
    cmsh -c "device; use $headnode; roles; use firewall; openports; add drop nat 10443 tcp fw 0.0.0.0/0; commit"; \
done

Configuration via BCM (cmsh) will automatically restart the shorewall daemon on the head nodes.
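
To review the rules that were added, cmsh’s list command can be used inside the openports submode (a sketch; substitute your head node hostname, and the exact output format depends on your BCM version):

cmsh -c "device; use <headnode>; roles; use firewall; openports; list"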

Validate that we now drop traffic#

pdsh -g rack=B05 'curl -m 1 -k -s https://master:10443 >/dev/null 2>&1; [ $? -eq 28 ] && echo "TIMEOUT" || echo "WORKING"' | dshbak -c
----------------
b05-p1-dgx-05-c[01-18]
----------------
TIMEOUT

Configuring Kubernetes nodes to restrict network access#

Note

Please reach out to your NVIDIA representative for the latest example configurations and suggested policies for Kyverno and Calico Network Policies intended for both the NMC (k8s-admin) and Run:ai (k8s-user) clusters.

The following documentation references the policy manifests that will be provided.

Configuring Calico#

NVIDIA Mission Control’s Kubernetes installation wizard provisions Calico out of the box for network security. Calico is an open source network security solution for Kubernetes. Using Calico, administrators can manage network access between services within the SuperPod.

Additional Calico policies can be applied to the user (k8s-user) and admin (k8s-admin) Kubernetes clusters in order to control network traffic. The policies are different for each of these clusters because different services are running in each cluster.

The following approach is recommended for switching between clusters via the command line from the BCM head node:

List available Kubernetes modules#

module avail kubernetes
------------------------------------------------------------------------------------------- /cm/local/modulefiles -------------------------------------------------------------------------------------------
kubernetes/k8s-admin/1.32.7-1.1  kubernetes/k8s-user/1.32.7-1.1

Display which modules are presently loaded#

module list
Currently Loaded Modulefiles:
 1) shared   2) cluster-tools/11.0   3) cm-image/11.0   4) cmd   5) cmsh   6) cm-setup/11.0   7) slurm/slurm/24.11   8) kubernetes/k8s-admin/1.32.7-1.1

Set access to a cluster#

module load kubernetes/k8s-user

Switch between Kubernetes clusters#

module swap kubernetes/k8s-admin kubernetes/k8s-user

Apply policy to the k8s-admin cluster#

For the k8s-admin cluster, apply the recommended policies via the following:

  1. Log in to the BCM Head Node and make sure that you’ve loaded the k8s-admin module so kubectl is configured for that cluster.

  2. Either create a new file or copy the 0_admin_network_policies.yaml to an accessible location on the BCM Head Node.

  3. Modify the 0_admin_network_policies.yaml file to replace the placeholder CIDR:

    spec:
     nets:
     # TODO: Replace this value with your BCM Head Node CIDR
     - "203.0.113.0/24" # placeholder example (RFC 5737 reserved safe address range)
    
  4. Validate first and then apply the policy:

    # Validate
    kubectl apply --dry-run=client -f 0_admin_network_policies.yaml
    
    # Apply
    kubectl apply -f 0_admin_network_policies.yaml
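
After the apply, you can confirm the objects defined in the manifest were created; kubectl can list them directly from the same file. The same check works for the user space manifest on the k8s-user cluster.

# List the live resources defined in the manifest
kubectl get -f 0_admin_network_policies.yaml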
    

Apply policy to the k8s-user cluster#

For the k8s-user cluster where Run:ai is deployed, apply the recommended policies via the following:

  1. Log in to the BCM Head Node and make sure that you’ve loaded the k8s-user module so kubectl is configured for that cluster.

  2. Either create a new file or copy the 0_user_network_policies.yaml to an accessible location on the BCM Head Node.

  3. Modify the 0_user_network_policies.yaml file to replace the placeholder CIDR:

    spec:
     nets:
     # TODO: Replace this value with your BCM Head Node CIDR
     - "203.0.113.0/24" # placeholder example (RFC 5737 reserved safe address range)
    
  4. Validate first and then apply the policy:

    # Validate
    kubectl apply --dry-run=client -f 0_user_network_policies.yaml
    
    # Apply
    kubectl apply -f 0_user_network_policies.yaml
    

Configuring Kyverno#

The Kubernetes API can also be hardened to restrict capabilities to only those services that need them. To do this, NVIDIA Mission Control leverages Kyverno, an open source, declarative policy engine that applies governance policies to Kubernetes.

For the user and admin space Kubernetes clusters, the same set of commands is used to install Kyverno.

To begin installing Kyverno, run the following commands:

helm repo add kyverno https://kyverno.github.io/kyverno/
kubectl create ns kyverno
kubectl label namespace kyverno zarf.dev/agent=ignore

Note the importance of the label command above: it ensures that Zarf does not interfere with the Kyverno namespace.
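
You can verify the label is in place before proceeding:

# The zarf.dev/agent=ignore label should appear on the kyverno namespace
kubectl get namespace kyverno --show-labels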

Next, run the following commands to create Kyverno’s configuration files:

cat <<EOF > values-kyverno.yaml
admissionController:
  container:
    resources:
      limits:
        memory: 3840Mi
      requests:
        cpu: 1000m
        memory: 1280Mi
  replicas: 1
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Exists
backgroundController:
  replicas: 1
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Exists
cleanupController:
  replicas: 1
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Exists
cleanupJobs:
  admissionReports:
    tolerations:
    - effect: NoSchedule
      key: node-role.kubernetes.io/master
      operator: Exists
    - effect: NoSchedule
      key: node-role.kubernetes.io/control-plane
      operator: Exists
  clusterAdmissionReports:
    tolerations:
    - effect: NoSchedule
      key: node-role.kubernetes.io/master
      operator: Exists
    - effect: NoSchedule
      key: node-role.kubernetes.io/control-plane
      operator: Exists
policyReportsCleanup:
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Exists
reportsController:
  replicas: 1
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Exists
webhooksCleanup:
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Exists
EOF
cat <<EOF > values-kyverno-policies.yaml
policyExclude:
  disallow-capabilities:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
  disallow-host-namespaces:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
        - calico-system
        - kube-system
        - prometheus
        - tigera-operator
  disallow-host-path:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
        - calico-system
        - prometheus
        - cm
        - '*-restricted'
        - local-path-storage
        - tigera-operator
  disallow-host-ports:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
        - calico-system
        - prometheus
  disallow-privileged-containers:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - local-path-storage
        - default
        - calico-system
        - cm
validationFailureAction: Enforce
EOF

Now, to finish installing Kyverno, please run the following commands:

helm install kyverno kyverno/kyverno --namespace kyverno --version 3.4.4 --values values-kyverno.yaml --wait --timeout 2m
helm install kyverno-policies kyverno/kyverno-policies --namespace kyverno --version 3.4.4 --values values-kyverno-policies.yaml
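
Once the charts are installed, standard kubectl checks confirm that the Kyverno controllers are running and that the baseline policies from the kyverno-policies chart exist:

# Kyverno controller pods should reach Running
kubectl -n kyverno get pods

# Baseline pod-security ClusterPolicies installed by kyverno-policies
kubectl get clusterpolicies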

Kyverno is now installed, but the policies specific to the user and admin space clusters have not yet been applied. The user and admin space Kubernetes clusters require different Kyverno policies because they run different software. If Run:ai is deployed, run the following command to configure the user space Kubernetes cluster:

kubectl apply -f 0_user_policies.yaml

To configure Kyverno for the admin space Kubernetes cluster, run the following command:

kubectl apply -f 0_admin_policies.yaml
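
With the policies in Enforce mode, a quick way to confirm enforcement end to end is to attempt to create a pod that violates one of the baseline rules in a namespace that is not excluded (the namespace and pod names below are illustrative; clean up afterwards):

kubectl create namespace policy-test
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: privileged-test
  namespace: policy-test
spec:
  containers:
  - name: privileged-test
    image: busybox
    command: ["sleep", "3600"]
    securityContext:
      privileged: true
EOF
# Expect the apply to be rejected by the Kyverno admission webhook
kubectl delete namespace policy-test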

Configuring PermissionManager#

When adding new users to Kubernetes, it is important to restrict their capabilities to only those that are necessary.

Run:ai features an advanced authorization policy mechanism to control access and rights for the user space Kubernetes cluster. If users do not need kubectl access, Run:ai’s authorization mechanism is the best way to restrict functionality for the user space cluster.

If users do need kubectl access to the user space Kubernetes cluster, PermissionManager is included with the Kubernetes deployment. PermissionManager is an open source tool for Kubernetes role based access control (RBAC) management. To configure PermissionManager, please see its documentation.
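
For reference, a least-privilege grant expressed directly in Kubernetes RBAC (independent of PermissionManager, which automates the creation of such objects) looks like the following; the namespace, user, and verbs are illustrative:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: team-a
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: team-a
subjects:
- kind: User
  name: johndoe
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io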