Manual Installation#
Attention
This is an alternative to the automated installation. If you have used the automation, move to the Reference Applications section.
Important
Ensure the basic prerequisites have been met before proceeding with the setup.
Cluster Installation#
Follow these steps to install Cloud Native Stack (CNS) on your system:
Clone the Cloud Native Stack repository from GitHub, and navigate to the playbooks directory.
git clone --branch v25.7.2 https://github.com/NVIDIA/cloud-native-stack.git
cd cloud-native-stack/playbooks
Edit the cns_version.yaml file to specify version 15.1:
cns_version: 15.1
Open the cns_values_15.1.yaml file and configure the following settings according to your requirements, as described below:
enable_gpu_operator: yes
enable_network_operator: yes
enable_rdma: yes
deploy_ofed: yes
storage: no
monitoring: yes
loadbalancer: no
loadbalancer_ip: ""
cns_validation: yes
enable_gpu_operator and enable_network_operator must be set to yes so that the NVIDIA GPU Operator and the NVIDIA Network Operator are deployed.
If the DOCA-OFED driver is already installed on your host system, set deploy_ofed to no.
To enable persistent storage, set storage to yes. This deploys the Local Path Provisioner and the NFS Provisioner as storage options.
To deploy the monitoring stack, set monitoring to yes. This deploys Prometheus and Grafana with GPU metrics. After the stack is installed, access Grafana at http://<node-ip>:32222 with the credentials admin/cns-stack.
To deploy MetalLB, set loadbalancer to yes and set loadbalancer_ip to the node/host IP address (for example, 10.117.20.50/32).
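If monitoring is enabled, you can later confirm that Grafana is reachable with a quick HTTP check once the stack is installed (a sketch; replace <node-ip> with your node's address):
# Expect an HTTP status code (for example, 200 or a redirect) once Grafana is up.
curl -s -o /dev/null -w "%{http_code}\n" http://<node-ip>:32222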
Modify the hosts file located at ./hosts to reflect your local machine configuration:
[master]
localhost ansible_connection=local
[nodes]
Edit the ./files/network-operator-value.yaml file (not ./files/network-operator-values.yaml) to disable NFD deployment by the network operator:
nfd:
  enabled: false
  deployNodeFeatureRules: false
Modify the NicClusterPolicy Custom Resource (CR) by replacing the file located at ./files/nic-cluster-policy.yaml with the following file, which removes the rdmaSharedDevicePlugin configuration:
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  {% if deploy_ofed %}
  ofedDriver:
    readinessProbe:
      initialDelaySeconds: 10
      periodSeconds: 30
    forcePrecompiled: false
    terminationGracePeriodSeconds: 300
    livenessProbe:
      initialDelaySeconds: 30
      periodSeconds: 30
    upgradePolicy:
      autoUpgrade: true
      drain:
        deleteEmptyDir: true
        enable: true
        force: true
        timeoutSeconds: 300
        podSelector: ''
      maxParallelUpgrades: 1
      safeLoad: false
      waitForCompletion:
        timeoutSeconds: 0
    startupProbe:
      initialDelaySeconds: 10
      periodSeconds: 20
    image: doca-driver
    repository: nvcr.io/nvidia/mellanox
    version: 25.04-0.6.1.0-2
  {% endif %}
  secondaryNetwork:
    cniPlugins:
      image: plugins
      repository: ghcr.io/k8snetworkplumbingwg
      version: v1.6.2-update.1
      imagePullSecrets: []
    multus:
      image: multus-cni
      repository: ghcr.io/k8snetworkplumbingwg
      version: v4.1.0
      imagePullSecrets: []
    ipamPlugin:
      image: whereabouts
      repository: ghcr.io/k8snetworkplumbingwg
      version: v0.7.0
      imagePullSecrets: []
Ensure you are in the ~/cloud-native-stack/playbooks directory. Create a Python virtual environment (venv) in which to install CNS:
sudo apt update
sudo apt install python3-pip python3-venv sshpass -y
python3 -m venv ".cns"
Run the following command to activate the virtual environment and begin the CNS installation:
source .cns/bin/activate
pip install --upgrade pip
bash setup.sh install
Wait for ten to fifteen minutes for the installation to complete.
Note
During the installation of CNS, it may need to reboot the system, which can result in the following error:
TASK [reboot the system] *************************************************************************************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "elapsed": 0, "msg": "Running reboot with local connection would reboot the control node.", "rebooted": false}

If this occurs, manually reboot the system. After it restarts, re-activate the virtual environment and re-run the CNS installation command.
To deactivate the virtual environment, run:
deactivate
Verify the installation:
Check the status of the node:
kubectl get nodes
NAME   STATUS   ROLES                  AGE   VERSION
h4m    Ready    control-plane,worker   8h    v1.32.2
Verify that all NVIDIA Network Operator and NVIDIA GPU Operator pods are running:
kubectl get pods --all-namespaces | grep -E "network-operator|nvidia-gpu-operator"
NAMESPACE             NAME                                                              READY   STATUS      RESTARTS   AGE
network-operator      cni-plugins-ds-rbz46                                              1/1     Running     0          17h
network-operator      kube-multus-ds-jdwz5                                              1/1     Running     0          17h
network-operator      mofed-ubuntu22.04-84df8f497b-ds-kgjdr                             1/1     Running     0          17h
network-operator      network-operator-84798648dc-jhd9l                                 1/1     Running     0          17h
network-operator      whereabouts-2hc6n                                                 1/1     Running     0          17h
nvidia-gpu-operator   gpu-feature-discovery-7w6pq                                       1/1     Running     0          17h
nvidia-gpu-operator   gpu-operator-1727018588-node-feature-discovery-gc-65c5f8cf45tlp   1/1     Running     0          17h
nvidia-gpu-operator   gpu-operator-1727018588-node-feature-discovery-master-56b7qsghn   1/1     Running     0          17h
nvidia-gpu-operator   gpu-operator-1727018588-node-feature-discovery-worker-rckps       1/1     Running     0          17h
nvidia-gpu-operator   gpu-operator-849f9c989-gr4sv                                      1/1     Running     0          17h
nvidia-gpu-operator   nvidia-container-toolkit-daemonset-cnkv8                          1/1     Running     0          17h
nvidia-gpu-operator   nvidia-cuda-validator-fg28g                                       0/1     Completed   0          17h
nvidia-gpu-operator   nvidia-dcgm-exporter-vqpl5                                        1/1     Running     0          17h
nvidia-gpu-operator   nvidia-device-plugin-daemonset-5v5md                              1/1     Running     0          17h
nvidia-gpu-operator   nvidia-driver-daemonset-gmbjq                                     2/2     Running     0          17h
nvidia-gpu-operator   nvidia-operator-validator-x8527                                   1/1     Running     0          17h
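As an optional extra check (a sketch, assuming a single GPU node), confirm that the GPU Operator advertises the GPU as an allocatable resource:
# Should print a non-zero count such as "1" once the device plugin is running.
kubectl get nodes -o json | jq '.items[].status.allocatable["nvidia.com/gpu"]'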
Installing Operators#
Deploy Cert Manager#
Cert-Manager is a Kubernetes add-on that automates the management and issuance of TLS certificates from various issuing sources. Deploying Cert-Manager in the cluster allows TLS certificates to be automatically managed as Kubernetes secrets.
Add the Jetstack Helm repository:
helm repo add jetstack https://charts.jetstack.io
helm repo update
Use Helm to install Cert-Manager into the cert-manager namespace:
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.18.2 \
  --set crds.enabled=true
Wait for two to three minutes for the installation to complete.
Verify that all pods in the cert-manager namespace have the Ready status:
kubectl get pods -n cert-manager -o wide
NAME                                       READY   STATUS    RESTARTS   AGE     IP              NODE             NOMINATED NODE   READINESS GATES
cert-manager-56cc584bd4-r8jbs              1/1     Running   0          2m48s   192.168.34.32   h4m-dev-system   <none>           <none>
cert-manager-cainjector-7cfc74b84b-bpdc7   1/1     Running   0          2m48s   192.168.34.31   h4m-dev-system   <none>           <none>
cert-manager-webhook-784f6dd68-qt48v       1/1     Running   0          2m48s   192.168.34.30   h4m-dev-system   <none>           <none>
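Optionally, you can go beyond pod status and confirm that certificates can actually be issued. The following sketch, modeled on the upstream cert-manager verification flow, creates a throwaway namespace, a self-signed Issuer, and a test Certificate (all names below are placeholders, not part of this installation):
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: cert-manager-test
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: test-selfsigned
  namespace: cert-manager-test
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: selfsigned-cert
  namespace: cert-manager-test
spec:
  dnsNames:
    - example.com
  secretName: selfsigned-cert-tls
  issuerRef:
    name: test-selfsigned
EOF
# The Certificate should report READY=True within a few seconds:
kubectl get certificate -n cert-manager-test
# Remove the test resources afterwards:
kubectl delete namespace cert-manager-test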
Install SR-IOV Network Operator#
SR-IOV Network Operator is responsible for configuring the SR-IOV components in the cluster.
Clone the SR-IOV network operator repository from GitHub and navigate to the sriov-network-operator directory:
git clone --branch v1.5.0 \
  https://github.com/k8snetworkplumbingwg/sriov-network-operator.git
cd sriov-network-operator
Edit the deployment/sriov-network-operator-chart/values.yaml file (relative to the repository root) to enable the admission controller configuration. This configures the operator-webhook and the network-resources-injector for installation; both are disabled by default. Refer to the file below to view the configuration changes relative to the default values, including the specific recommended image versions.
Or replace the default values.yaml file with this one:
operator:
  tolerations:
    - key: "node-role.kubernetes.io/master"
      operator: "Exists"
      effect: "NoSchedule"
    - key: "node-role.kubernetes.io/control-plane"
      operator: "Exists"
      effect: "NoSchedule"
  nodeSelector: {}
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
              - key: "node-role.kubernetes.io/master"
                operator: In
                values: [""]
        - weight: 1
          preference:
            matchExpressions:
              - key: "node-role.kubernetes.io/control-plane"
                operator: In
                values: [""]
  nameOverride: ""
  fullnameOverride: ""
  resourcePrefix: "openshift.io"
  cniBinPath: "/opt/cni/bin"
  clusterType: "kubernetes"
  # minimal amount of time (in minutes) the operator will wait before removing
  # stale SriovNetworkNodeState objects (objects that doesn't match node with the daemon)
  # "0" means no extra delay, in this case the CR will be removed by the next reconciliation cycle (may take up to five minutes)
  staleNodeStateCleanupDelayMinutes: "30"
  metricsExporter:
    port: "9110"
    certificates:
      secretName: "metrics-exporter-cert"
    prometheusOperator:
      enabled: false
      serviceAccount: "prometheus-k8s"
      namespace: "monitoring"
      deployRules: false
  admissionControllers:
    enabled: true
    certificates:
      secretNames:
        operator: "operator-webhook-cert"
        injector: "network-resources-injector-cert"
      certManager:
        # When enabled, makes use of certificates managed by cert-manager.
        enabled: true
        # When enabled, certificates are generated via cert-manager and then name will match the name of the secrets
        # defined above
        generateSelfSigned: true
      # If not specified, no secret is created and secrets with the names defined above are expected to exist in the
      # cluster. In that case, the ca.crt must be base64 encoded twice since it ends up being an env variable.
      custom:
        enabled: false
        # operator:
        #   caCrt: |
        #     -----BEGIN CERTIFICATE-----
        #     MIIMIICLDCCAdKgAwIBAgIBADAKBggqhkjOPQQDAjB9MQswCQYDVQQGEwJCRTEPMA0G
        #     ...
        #     -----END CERTIFICATE-----
        #   tlsCrt: |
        #     -----BEGIN CERTIFICATE-----
        #     MIIMIICLDCCAdKgAwIBAgIBADAKBggqhkjOPQQDAjB9MQswCQYDVQQGEwJCRTEPMA0G
        #     ...
        #     -----END CERTIFICATE-----
        #   tlsKey: |
        #     -----BEGIN EC PRIVATE KEY-----
        #     MHcl4wOuDwKQa+upc8GftXE2C//4mKANBC6It01gUaTIpo=
        #     ...
        #     -----END EC PRIVATE KEY-----
        # injector:
        #   caCrt: |
        #     -----BEGIN CERTIFICATE-----
        #     MIIMIICLDCCAdKgAwIBAgIBADAKBggqhkjOPQQDAjB9MQswCQYDVQQGEwJCRTEPMA0G
        #     ...
        #     -----END CERTIFICATE-----
        #   tlsCrt: |
        #     -----BEGIN CERTIFICATE-----
        #     MIIMIICLDCCAdKgAwIBAgIBADAKBggqhkjOPQQDAjB9MQswCQYDVQQGEwJCRTEPMA0G
        #     ...
        #     -----END CERTIFICATE-----
        #   tlsKey: |
        #     -----BEGIN EC PRIVATE KEY-----
        #     MHcl4wOuDwKQa+upc8GftXE2C//4mKANBC6It01gUaTIpo=
        #     ...
        #     -----END EC PRIVATE KEY-----

sriovOperatorConfig:
  # deploy sriovOperatorConfig CR with the below values
  deploy: true
  # node selectors for sriov-network-config-daemon
  configDaemonNodeSelector:
    beta.kubernetes.io/os: "linux"
    network.nvidia.com/operator.mofed.wait: "false"
  # log level for both operator and sriov-network-config-daemon
  logLevel: 2
  # disable node draining when configuring SR-IOV, set to true in case of a single node
  # cluster or any other justifiable reason
  disableDrain: false
  # sriov-network-config-daemon configuration mode. either "daemon" or "systemd"
  configurationMode: daemon
  # feature gates to enable/disable
  featureGates: {}

# Example for supportedExtraNICs values ['MyNIC: "8086 1521 1520"']
supportedExtraNICs: []

# Image URIs for sriov-network-operator components
images:
  operator: nvcr.io/nvidia/mellanox/sriov-network-operator:network-operator-25.4.0
  sriovConfigDaemon: nvcr.io/nvidia/mellanox/sriov-network-operator-config-daemon:network-operator-25.4.0
  sriovCni: ghcr.io/k8snetworkplumbingwg/sriov-cni:v2.8.1
  ibSriovCni: ghcr.io/k8snetworkplumbingwg/ib-sriov-cni:v1.2.1
  ovsCni: ghcr.io/k8snetworkplumbingwg/ovs-cni-plugin:v0.38.2
  rdmaCni: ghcr.io/k8snetworkplumbingwg/rdma-cni:v1.3.0
  sriovDevicePlugin: ghcr.io/k8snetworkplumbingwg/sriov-network-device-plugin:v3.9.0
  resourcesInjector: ghcr.io/k8snetworkplumbingwg/network-resources-injector:v1.7.0
  webhook: nvcr.io/nvidia/mellanox/sriov-network-operator-webhook:network-operator-25.4.0
  metricsExporter: ghcr.io/k8snetworkplumbingwg/sriov-network-metrics-exporter
  metricsExporterKubeRbacProxy: gcr.io/kubebuilder/kube-rbac-proxy:v0.15.0

imagePullSecrets: []
extraDeploy: []
Install the SR-IOV Network Operator. Make sure you are in the root of the cloned sriov-network-operator repository:
helm install sriov-network-operator ./deployment/sriov-network-operator-chart \
  -n sriov-network-operator \
  --create-namespace \
  --wait
Wait for one to two minutes for the operator installation to complete.
Verify that all pods in the sriov-network-operator namespace have the Ready status:
kubectl get pods -n sriov-network-operator
NAME                                      READY   STATUS    RESTARTS   AGE
network-resources-injector-m8dwx          1/1     Running   0          75s
operator-webhook-9wwjm                    1/1     Running   0          75s
sriov-network-config-daemon-j2xw7         1/1     Running   0          75s
sriov-network-operator-5bfc88d89c-bkrzz   1/1     Running   0          82s
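Optionally, before creating the custom resources listed below, you can confirm that the config daemon has generated a SriovNetworkNodeState object for your node (a quick sanity check):
kubectl -n sriov-network-operator get sriovnetworknodestates.sriovnetwork.openshift.io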
Create the following custom resources for a proper network configuration:
SriovNetworkNodePolicy (refer to Configure SR-IOV Network Node Policy)
SriovNetwork (refer to Configure SR-IOV Network)
Configure SR-IOV Network Node Policy#
Identify the node name using the following command:
kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name
Use the retrieved node name to replace <node_name> in the following steps.
Use the following command to identify the interface names corresponding to your NIC.
Replace <node_name> with the name of the node determined in step 1, and adjust the link-speed filter based on your network environment:
kubectl -n sriov-network-operator \
  get sriovnetworknodestates.sriovnetwork.openshift.io <node_name> -o json | \
  jq '.status.interfaces[] | select(.linkSpeed | test("^[1-9][0-9]{4,} Mb/s$")) | .name'
"enp3s0f0"
"enp3s0f1"
Create an SriovNetworkNodePolicy CR with the following content in the sriov_policy.yaml file:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: media-a-tx-pool
  namespace: sriov-network-operator
spec:
  nodeSelector:
    feature.node.kubernetes.io/rdma.capable: "true"
  resourceName: media_a_tx_pool
  priority: 99
  mtu: 1500
  numVfs: 16
  nicSelector:
    pfNames: ["<interface_name_0>#0-15"]
  deviceType: netdevice
  isRdma: true
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: media-a-rx-pool
  namespace: sriov-network-operator
spec:
  nodeSelector:
    feature.node.kubernetes.io/rdma.capable: "true"
  resourceName: media_a_rx_pool
  priority: 99
  mtu: 1500
  numVfs: 16
  nicSelector:
    pfNames: ["<interface_name_1>#0-15"]
  deviceType: netdevice
  isRdma: true
Replace <interface_name_0> and <interface_name_1> in the above snippet with the interface names you obtained when identifying the interface names.
Create the SriovNetworkNodePolicy CRs using the following command:
kubectl apply -f sriov_policy.yaml
sriovnetworknodepolicy.sriovnetwork.openshift.io/media-a-tx-pool created
sriovnetworknodepolicy.sriovnetwork.openshift.io/media-a-rx-pool created
Note
After applying this file, the system might become temporarily unreachable. The system will become accessible again; allow some time for all components to fully initialize.
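One way to track progress while the node is being reconfigured is to poll the node state's sync status (a sketch; the syncStatus field is reported by the SriovNetworkNodeState CR and moves from InProgress to Succeeded):
kubectl -n sriov-network-operator \
  get sriovnetworknodestates.sriovnetwork.openshift.io <node_name> \
  -o jsonpath='{.status.syncStatus}{"\n"}'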
Wait for one to two minutes for the sriov-device-plugin pod to have Ready status.
Check the pod status using the following command:
kubectl get pods -n sriov-network-operator
NAME                                      READY   STATUS    RESTARTS   AGE
network-resources-injector-m8dwx          1/1     Running   0          7m55s
operator-webhook-9wwjm                    1/1     Running   0          7m55s
sriov-device-plugin-dql8q                 1/1     Running   0          67s
sriov-network-config-daemon-j2xw7         1/1     Running   0          7m55s
sriov-network-operator-5bfc88d89c-bkrzz   1/1     Running   0          8m2s
Wait one to two minutes for virtual functions to get created.
Verify that the two pools each have a positive value for the node before proceeding. For example:
kubectl get node <node_name> -o json | \
  jq '.status.allocatable | with_entries(select(.key|test("^openshift.io/.+pool$")))'
{
  "openshift.io/media_a_rx_pool": "16",
  "openshift.io/media_a_tx_pool": "16"
}
Replace <node_name> with the name of the node determined in step 1.
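As an optional host-level cross-check (replace <interface_name_0> with one of the interfaces selected earlier), you can confirm that the virtual functions exist on the physical NIC:
# Should print 16, matching numVfs in the policy.
cat /sys/class/net/<interface_name_0>/device/sriov_numvfs
# Lists one "vf N" entry per virtual function.
ip link show <interface_name_0> | grep "vf "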
Configure SR-IOV Network#
This configuration requires that you create an sriov_network.yaml file that refers to the resourceName values
defined in the SriovNetworkNodePolicy.
Create an SriovNetwork CR for the chosen network interfaces.
In the example below, two networks are created for each port. The first is configured to use the Whereabouts plugin for dynamic IP Address Management (IPAM). The second is configured with static IPAM to allow manual and fixed assignment of IP addresses.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: media-a-tx-net
  namespace: sriov-network-operator
spec:
  ipam: |
    {
      "type": "whereabouts",
      "range": "192.168.100.0/24",
      "exclude": [
        "192.168.100.0/26",
        "192.168.100.128/25"
      ]
    }
  networkNamespace: default
  resourceName: media_a_tx_pool
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: media-a-rx-net
  namespace: sriov-network-operator
spec:
  ipam: |
    {
      "type": "whereabouts",
      "range": "192.168.100.0/24",
      "exclude": [
        "192.168.100.0/25",
        "192.168.100.128/26"
      ]
    }
  networkNamespace: default
  resourceName: media_a_rx_pool
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: media-a-tx-net-static
  namespace: sriov-network-operator
spec:
  ipam: |
    {
      "type": "static"
    }
  networkNamespace: default
  resourceName: media_a_tx_pool
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: media-a-rx-net-static
  namespace: sriov-network-operator
spec:
  ipam: |
    {
      "type": "static"
    }
  networkNamespace: default
  resourceName: media_a_rx_pool
The IP ranges for each network are listed in the following table:

Resource Name     Network Names             Static IPAM Range     Dynamic IPAM Range
media_a_tx_pool   media-a-tx-net(-static)   192.168.100.0-63      192.168.100.64-127
media_a_rx_pool   media-a-rx-net(-static)   192.168.100.128-191   192.168.100.192-255
Create the SriovNetwork CRs using the following command:
kubectl apply -f sriov_network.yaml
sriovnetwork.sriovnetwork.openshift.io/media-a-tx-net created
sriovnetwork.sriovnetwork.openshift.io/media-a-rx-net created
sriovnetwork.sriovnetwork.openshift.io/media-a-tx-net-static created
sriovnetwork.sriovnetwork.openshift.io/media-a-rx-net-static created
Execute the following command to validate successful creation of the SriovNetwork:
kubectl get network-attachment-definitions
NAME                    AGE
media-a-rx-net          5m48s
media-a-rx-net-static   5m48s
media-a-tx-net          5m48s
media-a-tx-net-static   5m48s
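For reference, a workload consumes one of these networks by naming it in the k8s.v1.cni.cncf.io/networks annotation and requesting the matching SR-IOV resource. The following is a minimal sketch only; the pod name and image are placeholders and not part of this installation:
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: sriov-test-pod                 # placeholder name for illustration
  annotations:
    k8s.v1.cni.cncf.io/networks: media-a-tx-net
spec:
  containers:
  - name: test
    image: ubuntu:22.04                # any general-purpose image works here
    command: ["sleep", "infinity"]
    resources:
      requests:
        openshift.io/media_a_tx_pool: "1"
      limits:
        openshift.io/media_a_tx_pool: "1"
EOF
# The secondary interface (typically net1) should be listed inside the pod:
kubectl exec sriov-test-pod -- cat /proc/net/dev
kubectl delete pod sriov-test-pod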
Advanced Configuration#
Configure CPU Manager#
The CPU management policy in Kubernetes enables exclusive CPU allocation for containers within Guaranteed Quality of Service (QoS) pods. To enable this, change the cpuManagerPolicy from none to static.
Identify the name of the node and drain it:
kubectl get nodes
kubectl drain --ignore-daemonsets <node_name>
Stop the Kubelet:
sudo systemctl stop kubelet
Remove the old CPU manager state file. By default, the path to this file is /var/lib/kubelet/cpu_manager_state. This clears the state maintained by the CPU Manager so that the cpusets created by the new policy won't conflict with it.
sudo rm /var/lib/kubelet/cpu_manager_state
Edit the Kubelet configuration file, /var/lib/kubelet/config.yaml, and add the following lines:
cpuManagerPolicy: "static"
reservedSystemCPUs: "0-1"
Start the Kubelet:
sudo systemctl start kubelet
Uncordon the node:
kubectl uncordon <node_name>
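To confirm that the new policy took effect, you can inspect the regenerated state file (a quick check; the file is JSON):
# Should contain "policyName":"static" after the Kubelet restart.
sudo cat /var/lib/kubelet/cpu_manager_state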
Configuring GPU Time-Slicing#
By default, on a workstation with a single GPU, only one container (and pod) that uses the GPU can be scheduled. However, time-slicing allows the same GPU to be shared among different containers (and pods) by creating replicas.
Create time-slicing-config-all.yaml based on the following example. Configure the number of time-sliced GPU replicas to make available for shared access, for example, 4:
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config-all
data:
  any: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4
Add the config map to the same namespace as the GPU operator:
kubectl create -n nvidia-gpu-operator -f time-slicing-config-all.yaml
Configure the device plugin with the config map and set the default time-slicing configuration:
kubectl patch clusterpolicies.nvidia.com/cluster-policy \
  -n nvidia-gpu-operator --type merge \
  -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config-all","default": "any"}}}}'
Verify the GPU replica count on the node, using the following command:
kubectl describe node | grep nvidia.com/gpu.replicas
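You can also confirm that the node now advertises the replicated GPU count as schedulable resources (a sketch, assuming the 4-replica configuration above):
# Should print "4" once the device plugin has restarted with the new configuration.
kubectl get nodes -o json | jq '.items[].status.allocatable["nvidia.com/gpu"]'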
Performance Configuration#
The Rivermax SDK, which enables ST 2110 streaming, takes advantage of GPUDirect and uses huge pages for performance. Huge pages are a memory-management technique used in modern computer systems to improve performance for memory-intensive applications and large memory transfers. Enabling them is strongly recommended to achieve the best performance.
Check that the nvidia-peermem module is loaded:
lsmod | grep nvidia
nvidia_peermem         16384  0
nvidia_uvm           4956160  4
nvidia_drm            122880  3
nvidia_modeset       1355776  5 nvidia_drm
nvidia              54296576 87 nvidia_uvm,nvidia_peermem,nvidia_modeset
ib_uverbs             196608  3 nvidia_peermem,rdma_ucm,mlx5_ib
video                  73728  4 asus_wmi,amdgpu,asus_nb_wmi,nvidia_modeset
If the nvidia-peermem module is not loaded, run the following commands to load the module and make it load automatically at boot:
sudo modprobe nvidia-peermem
echo "nvidia-peermem" | sudo tee /etc/modules-load.d/nvidia-peermem.conf
Configure shmmax and HugePages:
Configure shmmax as needed (for example, set to 2 GiB):
sudo sysctl -w kernel.shmmax=2147483648
If huge pages are being set for the first time on the system, the setting can be made persistent across reboots using:
echo 'kernel.shmmax=2147483648' | sudo tee -a /etc/sysctl.conf
If the persistent setting needs to be updated, edit the existing value in /etc/sysctl.conf and apply the settings using:
sudo sysctl -p
Enable the HugePages allocation:
Check HugePage size:
cat /proc/meminfo | grep Hugepagesize
Hugepagesize:       2048 kB
Calculate and configure HugePages (for example, allocate 10 GiB).
Calculate the required number of HugePages based on the HugePage size (a small shell sketch of this calculation appears after the note below):
10 GiB = 10240 MiB
Number of HugePages = 10240 MiB / 2 MiB = 5120
Set HugePages:
sudo sysctl -w vm.nr_hugepages=5120
If huge pages are being set for the first time on the system, the setting can be made persistent across reboots using:
echo 'vm.nr_hugepages=5120' | sudo tee -a /etc/sysctl.conf
If the persistent setting needs to be updated, edit the existing value in /etc/sysctl.conf and apply the settings using:
sudo sysctl -p
Note
Allocating HugePages reserves 10 GiB of memory, reducing the total memory available for other applications.
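The calculation above can also be scripted. The following sketch (variable names are illustrative) derives vm.nr_hugepages from the desired allocation and the HugePage size reported by the system:
DESIRED_GIB=10                                                  # total memory to reserve as HugePages
HUGEPAGE_KB=$(awk '/Hugepagesize/ {print $2}' /proc/meminfo)    # page size in kB (2048 for 2 MiB pages)
echo $(( DESIRED_GIB * 1024 * 1024 / HUGEPAGE_KB ))             # prints 5120 for 10 GiB with 2 MiB pages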
Restart the Kubelet:
sudo systemctl restart kubelet
Cluster Uninstallation#
Ensure you are in the ~/cloud-native-stack/playbooks directory.
To uninstall CNS, run the following commands:
sudo iptables -P INPUT ACCEPT
sudo iptables -P FORWARD ACCEPT
sudo iptables -P OUTPUT ACCEPT
source .cns/bin/activate
bash setup.sh uninstall
After uninstalling CNS, it is important to reboot the system to completely remove any components loaded by CNS.