Advanced Configuration

Reboot Node Gracefully

Graceful reboot instructions can be found in the OpenShift documentation under Understanding node rebooting.

The steps are listed here for ease of use:

  1. Mark the node as unschedulable:

    oc adm cordon <node>
    
  2. Drain the node to evict all running pods:

    oc adm drain <node> --ignore-daemonsets --delete-emptydir-data --force
    
  3. From the jump node, access the node in debug mode:

    oc debug node/<node>
    
  4. Change your root directory to /host:

    chroot /host
    
  5. Restart the node:

    systemctl reboot
    
  6. After the reboot is complete, mark the node as schedulable by running the following command:

    oc adm uncordon <node>
    
  7. Verify that the node is ready (optional scripted checks are sketched after this procedure):

    oc get node <node>
    
    NAME      STATUS   ROLES           AGE     VERSION
    <node>    Ready    master,worker   5d21h   v1.24.16+2e1e137
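
If you prefer to script the checks around steps 2 and 7, the following sketch is one optional way to confirm that the node has been drained before the reboot and that it returns to Ready afterward. It is not part of the documented procedure; adjust the timeout to your environment:

    # Before the reboot: list any pods still running on the node
    # (DaemonSet pods are expected to remain).
    oc get pods --all-namespaces --field-selector spec.nodeName=<node>

    # After the reboot and uncordon: wait for the node to report Ready.
    oc wait --for=condition=Ready node/<node> --timeout=10m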
    

Configuring GPU Time-Slicing

By default, if a node has N GPUs, at most N containers (and pods) that request a GPU can run on that node at the same time. Time-slicing allows multiple containers (and pods) to share the same GPU by advertising each physical GPU as a number of replicas, as illustrated in the sketch below.
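
For example, with the configuration created in this procedure (4 physical GPUs, each advertised as 4 replicas), the node exposes 16 nvidia.com/gpu resources, so up to 16 pods similar to the following hypothetical sketch can run concurrently. The pod name and image are placeholders; note that the container still requests nvidia.com/gpu: 1 and receives a time-sliced share rather than a dedicated GPU:

    apiVersion: v1
    kind: Pod
    metadata:
        name: gpu-timeslice-demo                        # placeholder name for this sketch
    spec:
        restartPolicy: Never
        containers:
        - name: cuda-sample
          image: nvcr.io/nvidia/cuda:12.2.0-base-ubi8   # example image; substitute your own
          command: ["nvidia-smi"]
          resources:
              limits:
                  nvidia.com/gpu: 1                     # one time-sliced replica, not a full physical GPU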

  1. Create time-slicing-config-all.yaml based on the following example. Configure the number of time-sliced GPU replicas to make available for shared access, for example, 4:

    apiVersion: v1
    kind: ConfigMap
    metadata:
        name: time-slicing-config-all
    data:
        any: |-
            version: v1
            flags:
                migStrategy: none
            sharing:
                timeSlicing:
                    resources:
                    - name: nvidia.com/gpu
                      replicas: 4
    
  2. Add the config map to the same namespace as the GPU operator:

    oc create -n nvidia-gpu-operator -f time-slicing-config-all.yaml
    
  3. Configure the device plugin with the config map and set the default time-slicing configuration:

    oc patch clusterpolicies.nvidia.com/gpu-cluster-policy \
        -n nvidia-gpu-operator --type merge \
        -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config-all", "default": "any"}}}}'
    
  4. Confirm that the node advertises additional GPU resources (additional scripted checks are sketched after this procedure):

    oc describe node <node-name>
    

    The output varies according to the GPUs in your node and the configuration that you apply.

    The key considerations are as follows:

    • The nvidia.com/gpu.count label reports the number of physical GPUs in the machine.

    • The nvidia.com/gpu.product label has the suffix -SHARED appended to the product name.

    • The nvidia.com/gpu.replicas label indicates the expected number of replicas per GPU; the reported capacity corresponds to count × replicas (4 × 4 = 16 in this example).

    ...
    Labels:
                nvidia.com/gpu.count=4
                nvidia.com/gpu.product=NVIDIA-L40S-SHARED
                nvidia.com/gpu.replicas=4
    Capacity:
        nvidia.com/gpu: 16
        ...
    Allocatable:
        nvidia.com/gpu: 16
        ...
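
If you want to confirm the same information from a script, the following optional sketch checks that the ClusterPolicy was patched with the time-slicing config and prints the GPU count a node advertises. The jsonpath expressions assume the resource names used earlier in this procedure:

    # Show the device plugin config now referenced by the ClusterPolicy.
    oc get clusterpolicies.nvidia.com/gpu-cluster-policy \
        -o jsonpath='{.spec.devicePlugin.config}{"\n"}'

    # Print the number of nvidia.com/gpu resources the node advertises
    # (count x replicas, 16 in the example above).
    oc get node <node-name> \
        -o jsonpath='{.status.allocatable.nvidia\.com/gpu}{"\n"}'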
    

For more advanced configuration, see Time-Slicing GPUs in Kubernetes.
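
Finally, as an optional end-to-end check that is not part of the procedure above, you can schedule more GPU pods than there are physical GPUs and confirm that they all run. The deployment name and image below are placeholders chosen for this sketch:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
        name: time-slicing-verification        # placeholder name for this sketch
    spec:
        replicas: 8                            # more pods than the 4 physical GPUs in the example node
        selector:
            matchLabels:
                app: time-slicing-verification
        template:
            metadata:
                labels:
                    app: time-slicing-verification
            spec:
                containers:
                - name: cuda-sample
                  image: nvcr.io/nvidia/cuda:12.2.0-base-ubi8   # example image; substitute your own
                  command: ["sleep", "infinity"]
                  resources:
                      limits:
                          nvidia.com/gpu: 1    # each pod consumes one time-sliced share

If time-slicing is active, all 8 pods reach the Running state even though the node has only 4 physical GPUs; without it, the extra pods stay Pending.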