Kubernetes Security Hardening#

This section of the install guide details a series of Kubernetes security hardening recommendations for NVIDIA Mission Control’s user and admin space Kubernetes clusters. These recommendations should be applied after setting up Kubernetes via the BCM-provided wizard.

This hardening guide should be executed after installing Kubernetes, but before installing NVIDIA Mission Control services such as autonomous job recovery or autonomous hardware recovery.

User and Admin Cluster Background#

To promote the principle of least privilege, NVIDIA Mission Control segments services so that only services which must have access to the out-of-band (OOB) network are granted it. User space includes the services that do not require OOB network access; admin space includes the services that do.

To implement this separation of user and admin spaces, NVIDIA Mission Control segments the control plane into two distinct Kubernetes clusters: the user space cluster, which runs services that do not need OOB network access, and the admin space cluster, which runs services that do.

The user space cluster is used for the deployment of Run:ai; if Run:ai is not in use, this cluster is not deployed. It includes three control plane nodes along with the B200 / GB200 compute trays.

The admin space cluster is used for the deployment of NetQ, Grafana, Prometheus, Loki, NMC autonomous job recovery, and NMC autonomous hardware recovery. It is always present in every DGX GB200 and later NMC deployment. It includes three control plane nodes only.

Securing Control Nodes#

Only administrators should have ssh access to the control nodes that underlie the user and admin space Kubernetes clusters.

To achieve this, we will leverage BCM to restrict access to the user and admin space control node categories. To restrict ssh access to only administrators for the user space control nodes, please run the following command:

[a03-p1-head-01->category[k8s-system-user]]% set usernodelogin never
[a03-p1-head-01->category*[k8s-system-user*]]% commit
[a03-p1-head-01->category[k8s-system-user]]%

Note in the above example, the k8s-system-user should be replaced with the category name used for your user space Kubernetes control node category.

Now we will also introduce this restriction for the admin space control nodes:

[a03-p1-head-01->category[k8s-system-admin]]% set usernodelogin never
[a03-p1-head-01->category*[k8s-system-admin*]]% commit
[a03-p1-head-01->category[k8s-system-admin]]%

Note in the above example, the k8s-system-admin should be replaced with the category name used for your admin space Kubernetes control node category.

BCM has now configured PAM to deny ssh connections. To restore ssh access for specific users, you must explicitly add those users and groups to the PAM whitelist.

Restricting kubectl Access#

Certificate based authentication#

Access to Kubernetes is limited by default: only the root user can use kubectl.

Administrators can grant kubectl access to additional users by modifying the file permissions on ~/.kube/config. By default, this file is owned by root, with read and write access restricted solely to root.
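
As a sketch of this permission model (using a temporary file as a stand-in for ~/.kube/config), the owner-only read/write restriction looks like this:

```shell
# Stand-in for ~/.kube/config; in practice the real file is owned by root.
KCFG="$(mktemp)"

# Restrict the kubeconfig to owner read/write only (mode 600).
chmod 600 "$KCFG"

# Show the resulting octal mode.
stat -c '%a' "$KCFG"

rm -f "$KCFG"
```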

BCM also provides tooling to add users to Kubernetes clusters. This creates new certificates and populates ~/.kube/config in the user's home directory.

An example:

cm-kubernetes-setup --cluster k8s-admin --add-user johndoe --role cluster-admin

Configuring head nodes to restrict network access#

The Kubernetes API is available via the nginx instance on the head nodes, which proxies traffic back to the Kubernetes cluster(s). By default, nginx listens on 10443/tcp and passes traffic to 6443/tcp on the Kubernetes admin space cluster, k8s-admin in our example.

A snippet of the nginx.conf file:

stream {
        upstream kube-k8s-admin {
            server a03-p1-netq-x86-01:6443;
            server b04-p1-netq-x86-03:6443;
            server b03-p1-netq-x86-02:6443;
        }
        server {
            listen 10443;
            proxy_pass kube-k8s-admin;
            proxy_next_upstream on;
            proxy_connect_timeout 300ms;
        }
}

You can test this by using curl from any node in the user space. In this example, we’ll test using a rack of DGX GB200 compute nodes:

pdsh -g rack=B05 'curl -m 1 -k -s https://master:10443 >/dev/null 2>&1; [ $? -eq 28 ] && echo "TIMEOUT" || echo "WORKING"' | dshbak -c
----------------
b05-p1-dgx-05-c[01-18]
----------------
WORKING
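
The timeout check embedded in the pdsh command above can be factored into a small helper function. This is a sketch assuming the same `master` hostname and 10443 port used in the example:

```shell
# Report TIMEOUT when curl times out (exit code 28), WORKING for any other result.
check_kubeapi() {
    local host="$1" port="$2"
    curl -m 1 -k -s "https://${host}:${port}" >/dev/null 2>&1
    if [ "$?" -eq 28 ]; then
        echo "TIMEOUT"
    else
        echo "WORKING"
    fi
}

check_kubeapi master 10443
```

Note that only a timeout (exit code 28) is treated as blocked; a refused connection still reports WORKING, matching the behavior of the one-liner above.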

To restrict access, we’ll configure shorewall on the head nodes.

Add rules to accept traffic from admin space nodes#

for admin_ip in $(cmsh -c "device list -t headnode -f ip" | tail -n +1; cmsh -c "device list -c k8s-system-admin -f ip" | tail -n +1); do \
    for headnode in $(cmsh -c "device list -t headnode -f hostname" | tail -n +1); do \
        echo "Configuring shorewall on $headnode to accept $admin_ip/32 on port 10443/tcp" && \
        cmsh -c "device; use $headnode; roles; use firewall; openports; add accept nat 10443 tcp fw $admin_ip/32; commit"; \
    done; \
done

Note: the k8s-system-admin category name in the example above may need to be substituted with the actual category name used on your cluster if it differs.

Add rule to drop all other traffic from non-matching nodes#

for headnode in $(cmsh -c "device list -t headnode -f hostname" | tail -n +1); do \
    echo "Configuring shorewall on $headnode to deny remaining traffic" && \
    cmsh -c "device; use $headnode; roles; use firewall; openports; add drop nat 10443 tcp fw 0.0.0.0/0; commit"; \
done

Configuration via BCM (cmsh) will automatically restart the shorewall daemon on the head nodes.

Validate that we now drop traffic#

pdsh -g rack=B05 'curl -m 1 -k -s https://master:10443 >/dev/null 2>&1; [ $? -eq 28 ] && echo "TIMEOUT" || echo "WORKING"' | dshbak -c
----------------
b05-p1-dgx-05-c[01-18]
----------------
TIMEOUT

Configuring Kubernetes nodes to restrict network access#

Note

Please download the latest example configurations and suggested policies for Kyverno and Calico Network Policies intended for both the NMC (k8s-admin) and Run:ai (k8s-user) clusters from the NVIDIA NGC Catalog.

The following documentation references the policy manifests available in the NGC Catalog. For assistance with NGC access please see the NGC User Guide.

Configuring Calico#

NVIDIA Mission Control’s Kubernetes installation wizard provisions Calico out of the box for network security. Calico is an open source network security solution for Kubernetes. Using Calico, administrators can manage network access between services within the SuperPod.

Additional Calico policies can be applied to the user (k8s-user) and admin (k8s-admin) Kubernetes clusters in order to control network traffic. The policies are different for each of these clusters because different services are running in each cluster.
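
As an illustration of the shape of such a policy, the following is a hypothetical sketch, not one of the NGC-provided manifests; the real policies should be taken from 0_admin_network_policies.yaml / 0_user_network_policies.yaml. It allows ingress only from a head node CIDR via Calico's `nets` field:

```yaml
# Hypothetical sketch of a Calico policy; the metadata name and selector
# are illustrative, not taken from the NGC manifests.
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: allow-headnode-example
spec:
  selector: all()
  types:
    - Ingress
  ingress:
    - action: Allow
      source:
        nets:
          # TODO: Replace this value with your BCM Head Node CIDR
          - "203.0.113.0/24" # placeholder example (RFC 5737 reserved safe address range)
```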

The following approach is recommended for switching between clusters via the command line from the BCM head node:

List available Kubernetes modules#

module avail kubernetes
------------------------------------------------------------------------------------------- /cm/local/modulefiles -------------------------------------------------------------------------------------------
kubernetes/k8s-admin/1.32.7-1.1  kubernetes/k8s-user/1.32.7-1.1

Display which modules are presently loaded#

module list
Currently Loaded Modulefiles:
 1) shared   2) cluster-tools/11.0   3) cm-image/11.0   4) cmd   5) cmsh   6) cm-setup/11.0   7) slurm/slurm/24.11   8) kubernetes/k8s-admin/1.32.7-1.1

Set access to a cluster#

module load kubernetes/k8s-user

Switch between Kubernetes clusters#

module swap kubernetes/k8s-admin kubernetes/k8s-user

Apply policy to the k8s-admin cluster#

For the k8s-admin cluster, apply the recommended policies via the following:

  1. Login to the BCM Head Node and make sure that you’ve loaded the k8s-admin module so kubectl will be configured for that cluster.

  2. Either create a new file or copy the 0_admin_network_policies.yaml to an accessible location on the BCM Head Node.

  3. Modify the 0_admin_network_policies.yaml file to replace the placeholder CIDR:

    spec:
     nets:
     # TODO: Replace this value with your BCM Head Node CIDR
     - "203.0.113.0/24" # placeholder example (RFC 5737 reserved safe address range)
    
  4. Validate first and then apply the policy:

    # Validate
    kubectl apply --dry-run=client -f 0_admin_network_policies.yaml
    
    # Apply
    kubectl apply -f 0_admin_network_policies.yaml
    

Apply policy to the k8s-user cluster#

For the k8s-user cluster where Run:ai is deployed, apply the recommended policies via the following:

  1. Login to the BCM Head Node and make sure that you’ve loaded the k8s-user module so kubectl will be configured for that cluster.

  2. Either create a new file or copy the 0_user_network_policies.yaml to an accessible location on the BCM Head Node, then modify it to replace the placeholder CIDR:

    spec:
     nets:
     # TODO: Replace this value with your BCM Head Node CIDR
     - "203.0.113.0/24" # placeholder example (RFC 5737 reserved safe address range)
    
  3. Validate first and then apply the policy:

    # Validate
    kubectl apply --dry-run=client -f 0_user_network_policies.yaml
    
    # Apply
    kubectl apply -f 0_user_network_policies.yaml
    

Configuring Border Top Of Rack (BTOR) switching#

To secure access to the kubeapi-server TCP port, we’ll configure an ACL to restrict access.

The following config must be performed on both BTOR switches.

  1. Create IP ACLs for each address in the control plane to permit communication with kubeapi-server. Replace the IPs in the example with the IPs on the current system (head node IPs, VIP, and admin K8s cluster node IPs):

    nv set acl acl-kubeapi type ipv4
    nv set acl acl-kubeapi rule 10 match ip tcp dest-port 6443
    nv set acl acl-kubeapi rule 10 match ip source-ip 7.241.28.8/31
    nv set acl acl-kubeapi rule 10 action permit
    nv set acl acl-kubeapi rule 20 match ip tcp dest-port 6443
    nv set acl acl-kubeapi rule 20 match ip source-ip 7.241.28.10/32
    nv set acl acl-kubeapi rule 20 action permit
    nv set acl acl-kubeapi rule 30 match ip tcp dest-port 6443
    nv set acl acl-kubeapi rule 30 match ip source-ip 7.241.28.12/31
    nv set acl acl-kubeapi rule 30 action permit
    nv set acl acl-kubeapi rule 40 match ip tcp dest-port 6443
    nv set acl acl-kubeapi rule 40 match ip source-ip 7.241.28.14/32
    nv set acl acl-kubeapi rule 40 action permit
    
  2. Create catch-all deny ACLs:

    nv set acl acl-kubeapi rule 50 match ip tcp dest-port 6443
    nv set acl acl-kubeapi rule 50 action deny
    
  3. Apply to the physical interfaces, NOT the bond interfaces that the hosts are connected to. Each host is connected twice to each BTOR. You must use in-band physical ports, NOT out-of-band physical ports:

    nv set interface swp3s1 acl acl-kubeapi outbound
    nv set interface swp3s2 acl acl-kubeapi outbound
    nv set interface swp2s2 acl acl-kubeapi outbound
    nv set interface swp6s3 acl acl-kubeapi outbound
    nv set interface swp5s0 acl acl-kubeapi outbound
    
  4. Save and apply the configuration:

    nv config apply
    nv config save
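
After applying, the ACL and its interface bindings can be reviewed. These `nv show` commands follow the Cumulus Linux NVUE CLI, which mirrors the `nv set` paths used above; verify them against the NVUE version running on your BTOR switches:

```
nv show acl acl-kubeapi
nv show acl acl-kubeapi rule
nv show interface swp3s1 acl
```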
    

Configuring Kyverno#

The Kubernetes API can also be hardened to restrict capabilities to only those services which need them. To do this, NVIDIA Mission Control (NMC) leverages Kyverno, a widely employed declarative engine (a Kubernetes admission controller) to apply governance policies.
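
As an illustration of the kind of policy involved, the following is a hedged sketch of a privileged-container restriction in the style of the well-known Kyverno policy library; it is not one of the NMC manifests from the NGC Catalog, and field names should be checked against the Kyverno version installed:

```yaml
# Hypothetical sketch; the actual NMC policies ship in the NGC Catalog manifests.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged-containers
spec:
  validationFailureAction: Enforce   # block non-compliant resources at admission
  background: true                   # also report on existing resources
  rules:
    - name: privileged-containers
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        allowExistingViolations: true   # pre-existing violators keep running but are reported
        message: >-
          Privileged mode is disallowed. securityContext.privileged
          must be unset or set to false.
        pattern:
          spec:
            =(ephemeralContainers):
              - =(securityContext):
                  =(privileged): "false"
            =(initContainers):
              - =(securityContext):
                  =(privileged): "false"
            containers:
              - =(securityContext):
                  =(privileged): "false"
```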

Installing Kyverno#

For both the k8s-user and k8s-admin Kubernetes clusters, the same approach and set of commands is used to install the Kyverno engine.

  1. Add the Kyverno Helm repository and create the namespace:

    helm repo add kyverno https://kyverno.github.io/kyverno/
    kubectl create ns kyverno
    kubectl label namespace kyverno zarf.dev/agent=ignore
    

    Note

    The label command ensures that Zarf won’t interfere with the Kyverno namespace.

  2. Copy the values-kyverno.yaml file to an accessible location on the BCM Head Node.

    This file configures resource limits and tolerations for Kyverno controllers to run on control plane nodes.

  3. Install Kyverno:

    helm install kyverno kyverno/kyverno --namespace kyverno --version 3.5.2 --values values-kyverno.yaml --wait --timeout 2m
    
  4. Verify the installation:

    kubectl get pods -n kyverno
    

    All Kyverno pods should be in Running state before proceeding.

Applying NMC Security Policies#

The user and admin space Kubernetes clusters require different Kyverno policies because they run different software.

Note

Download example NMC security policy manifests from the NVIDIA NGC Catalog.

  • values-kyverno.yaml - Kyverno engine configuration (same for both clusters)

  • 0_admin_policies.yaml - Admin cluster (k8s-admin) policies

  • 0_user_policies.yaml - User cluster (k8s-user) policies

Apply policies to the k8s-admin cluster#

For the admin space Kubernetes cluster (k8s-admin):

  1. Login to the BCM Head Node and ensure the k8s-admin module is loaded:

    module load kubernetes/k8s-admin
    
  2. Copy the 0_admin_policies.yaml file to an accessible location on the BCM Head Node.

  3. Validate and apply the policy:

    # Validate
    kubectl apply --dry-run=client -f 0_admin_policies.yaml
    
    # Apply
    kubectl apply -f 0_admin_policies.yaml
    
  4. Verify the policies are applied:

    kubectl get clusterpolicy
    

    You should see the applied ClusterPolicy resources listed.

Apply policies to the k8s-user cluster#

For the user space Kubernetes cluster (k8s-user) where Run:ai is deployed:

  1. Login to the BCM Head Node and ensure the k8s-user module is loaded:

    module load kubernetes/k8s-user
    
  2. Copy the 0_user_policies.yaml file to an accessible location on the BCM Head Node.

  3. Validate and apply the policy:

    # Validate
    kubectl apply --dry-run=client -f 0_user_policies.yaml
    
    # Apply
    kubectl apply -f 0_user_policies.yaml
    
  4. Verify the policies are applied:

    kubectl get clusterpolicy
    

    You should see the applied ClusterPolicy resources listed.

Validating Policy Enforcement#

After applying policies, you can verify they are enforcing security restrictions:

# Check for policy violations
kubectl get policyreport -A

# View details of a specific policy
kubectl describe clusterpolicy disallow-privileged-containers

The enforcement mode is set to Enforce in these policies, meaning non-compliant resources will be blocked at admission time.

When reviewing the output of these commands, you may observe violations for pre-existing system components in infrastructure namespaces such as kube-system, kube-public, and tigera-operator. This is expected behavior for essential components that require elevated privileges. These resources are allowed to continue operating due to the allowExistingViolations: true setting in each policy, which permits resources that existed before the policy was applied to keep running while logging their violations for audit and compliance visibility.

However, any attempt to create NEW resources or UPDATE existing ones with similar security violations, even in system namespaces, will be rejected. This approach provides protection against non-compliant workloads while maintaining operational continuity for essential infrastructure components.

Configuring PermissionManager#

When adding new users to Kubernetes, it is important to restrict user capabilities to only those capabilities that are necessary for the user.

Run:ai features an advanced authorization policy mechanism to control access and rights for the user space Kubernetes cluster. If users do not need kubectl access, Run:ai’s authorization mechanism is the best way to restrict functionality for the user space cluster.

If users need access to the user space Kubernetes cluster, PermissionManager is included with the Kubernetes deployment. PermissionManager is an open source tool for Kubernetes role based access control (RBAC) management. To configure PermissionManager, please see their documentation.
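
As an illustration of the kind of narrowly scoped RBAC a tool like PermissionManager manages, the following hypothetical Role and RoleBinding grant a user read-only access to pods in a single namespace. The user, role, and namespace names are illustrative, not part of the NMC deployment:

```yaml
# Hypothetical example: read-only pod access for one user in one namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: team-a
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: team-a
subjects:
  - kind: User
    name: johndoe
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```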