
Created on Nov 16, 2020 by Boris Kovalev, Vitaliy Razinkov

Scope

This Reference Deployment Guide (RDG) explains how to build the highest performing Kubernetes (K8s) cluster capable of hosting the most demanding distributed workloads, running on top of an NVIDIA GPU and an NVIDIA Mellanox end-to-end InfiniBand fabric. 

Abbreviations and Acronyms

Term      Definition
AOC       Active Optical Cable
AI        Artificial Intelligence
CNI       Container Network Interface
CR        Custom Resource
DAC       Direct Attach Copper cable
DHCP      Dynamic Host Configuration Protocol
EDR       Enhanced Data Rate - 100Gb/s
GPU       Graphics Processing Unit
HDR       High Data Rate - 200Gb/s
HPC       High Performance Computing
IB        InfiniBand
K8s       Kubernetes
ML        Machine Learning
MOFED     Mellanox OpenFabrics Enterprise Distribution
PF        Physical Function
QSG       Quick Start Guide
RDMA      Remote Direct Memory Access
SR-IOV    Single Root Input Output Virtualization
VF        Virtual Function


Introduction

Provisioning Machine Learning (ML) and High Performance Computing (HPC) cloud solutions can be a complicated task. Proper design and careful selection of software and hardware components can be gating factors for a successful deployment.

This document will guide you through a complete solution cycle including design, component selection, technology overview and deployment steps. 

The solution is provisioned on top of GPU-enabled servers over an NVIDIA Mellanox end-to-end InfiniBand fabric. 
The NVIDIA GPU and SR-IOV Network Operators make it possible to run GPU-accelerated and native RDMA workloads, such as HPC, Big Data, ML, AI and other applications, over the InfiniBand fabric.

The following processes are described below:

  1. K8s cluster deployment by Kubespray over bare metal nodes with Ubuntu 20.04 OS.
  2. NVIDIA GPU Operator deployment.
  3. InfiniBand fabric configuration.
  4. POD deployment example.

This document covers a single Kubernetes controller deployment scenario.

For high-availability cluster deployment, please refer to https://github.com/kubernetes-sigs/kubespray/blob/master/docs/ha-mode.md

Solution Architecture

Key Components and Technologies

  • NVIDIA® T4 GPU

    The NVIDIA® T4 GPU is based on the NVIDIA Turing architecture and packaged in an energy-efficient 70-watt small PCIe form factor. T4 is optimized for mainstream computing environments, and features multi-precision Turing Tensor Cores and RT Cores. Combined with accelerated containerized software stacks from NGC, T4 delivers revolutionary performance at scale to accelerate cloud workloads, such as high-performance computing, deep learning training and inference, machine learning, data analytics, and graphics.
  • NVIDIA MLNX-OS®
    NVIDIA MLNX-OS is Mellanox's InfiniBand/VPI switch operating system for data centers with storage, enterprise, high-performance, machine learning, Big Data computing and cloud fabrics.
  • NVIDIA Mellanox ConnectX InfiniBand adapters 
    NVIDIA Mellanox® ConnectX® InfiniBand smart adapters with acceleration engines deliver best-in-class network performance and efficiency, enabling low-latency, high throughput and high message rates for applications at SDR, QDR, DDR, FDR, EDR and HDR InfiniBand speeds.
  • NVIDIA Mellanox smart InfiniBand switch systems
    NVIDIA Mellanox smart InfiniBand switch systems deliver the highest performance and port density for high performance computing (HPC), AI, Web 2.0, big data, clouds, and enterprise data centers. Support for 36 to 800-port configurations at up to 200Gb/s per port, allows compute clusters and converged data centers to operate at any scale, reducing operational costs and infrastructure complexity.
  • NVIDIA Mellanox LinkX® InfiniBand Cables
    NVIDIA Mellanox LinkX cables and transceivers are designed to maximize the performance of High Performance Computing networks, which require high-bandwidth, low-latency connections between compute nodes and switch nodes. DAC cables are available in lengths of up to 7m; AOCs are available in lengths of under 30m on lowest-cost OM2 fiber and up to 100m on OM3/OM4 multimode fiber. DACs and AOCs support data rates of QDR (40Gb/s), FDR10 (40Gb/s), FDR (56Gb/s), EDR (100Gb/s), HDR100 (100Gb/s) and HDR (200Gb/s).
  • Kubernetes
    Kubernetes (K8s) is an open-source container orchestration platform for deployment automation, scaling, and management of containerized applications.
  • Kubespray (From Kubernetes.io)
    Kubespray is a composition of Ansible playbooks, inventory, provisioning tools, and domain knowledge for generic OS/Kubernetes clusters configuration management tasks and provides:
    • A highly available cluster
    • Composable attributes
    • Support for most popular Linux distributions
  • NVIDIA GPU Operator
    NVIDIA GPU Operator uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPUs.
    These components include the NVIDIA drivers (to enable CUDA), Kubernetes device plugin for GPUs, the NVIDIA Container Runtime, automatic node labelling, 
    DCGM based monitoring and others.
  • RDMA 
    Remote Direct Memory Access (RDMA) is a technology that allows computers in a network to exchange data without involving the processor, cache or operating system of either computer.
    Like locally based Direct Memory Access (DMA), 
    RDMA improves throughput and performance and frees up compute resources.
  • SR-IOV Network Operator 
    The SR-IOV Network Operator is designed to help the user provision and configure the SR-IOV CNI plugin and SR-IOV device plugin in OpenShift and Kubernetes clusters.

Logical Design

The logical design includes the following layers:

  • One compute layer: 
    1. Deployment node
    2. K8s Master node
    3. 2 x K8s Worker nodes, each with two NVIDIA T4 GPUs and one Mellanox ConnectX adapter. 
  • Two separate networking layers: 
    1. Management network
    2. High-speed InfiniBand (IB) fabric  

Kubernetes cluster deployment Logical Design

Fabric Design

In this RDG, we describe a small-scale solution with a single switch.

Simple Setup with One Switch

In a single switch case, by using an NVIDIA Mellanox QM8700 InfiniBand HDR Switch System you can connect up to 40 servers with NVIDIA Mellanox LinkX HDR 200Gb/s QSFP56 DAC cables.

Scaled Setup for InfiniBand Fabric

For assistance in designing the scaled InfiniBand topology, use the Mellanox InfiniBand Topology Generator, an online cluster configuration tool that offers flexible cluster configurations and sizes.
For a scaled setup, we recommend using Mellanox Unified Fabric Manager (UFM®).

Bill of Materials (BoM)

The following hardware setup is utilized in the distributed K8s configuration described in this guide:

Kubernetes cluster deployment BoM

The above table does not contain Kubernetes Management network connectivity components.


Deployment and Configuration

The deployment is validated using Ubuntu 20.04 OS and Kubespray v2.14.2.

Wiring

The first port of each NVIDIA Mellanox HCA on each Worker node is wired to the NVIDIA Mellanox switch using NVIDIA Mellanox LinkX HDR 200Gb/s QSFP56 DAC cables.

 Kubernetes cluster deployment Wiring

Network

Prerequisites

  • InfiniBand fabric
    • Switch
      NVIDIA Mellanox QM8700
    • Switch OS
      NVIDIA MLNX-OS®
  • Management Network 
    DHCP and DNS services are part of the IT infrastructure. The component installation and configuration are not covered in this guide.

Network Configuration

Below are the server names with their relevant network configurations.


Server/Switch type     Server/Switch name    High-speed network (HDR)    Management network (1 GigE)
Master Node            node1                 -                           eno0: DHCP, 192.168.1.40
Worker Node            node2                 ibs6f0: none                eno0: DHCP, 192.168.1.10
Worker Node            node3                 ibs6f0: none                eno0: DHCP, 192.168.1.11
Deployment Node        sl-depl-node          -                           eno0: DHCP, 192.168.1.43
High-speed switch      swx-mld-ib67          none                        mgmt0: DHCP, 192.168.1.38

ibs6f0 interfaces do not require any additional configuration.
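
To confirm that each node received the management IP address listed above, the address assignment can be verified on every node. Below is a minimal check sketch; the interface names eno0 and ibs6f0 are taken from the table above and may differ on your servers.

Server Console
$ ip -br addr show eno0      # management interface with its DHCP-assigned address
$ ip -br link show ibs6f0    # high-speed IB interface (no IP configuration required)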

InfiniBand Fabric Configuration

Below is a list of recommendations and prerequisites that are important for the configuration process:

  • Refer to the MLNX-OS User Manual to become familiar with the switch software (located at support.mellanox.com)
  • Upgrade the switch software to the latest MLNX-OS version
  • InfiniBand Subnet Manager (SM) is required to configure InfiniBand fabric properly

There are three ways to run an InfiniBand SM in the InfiniBand fabric:

  1. Start the SM on one or more managed switches. This is a very convenient and quick operation which allows for easier InfiniBand 'plug & play'.
  2. Run OpenSM daemon on one or more servers by executing the /etc/init.d/opensmd command. It is recommended to run the SM on a server in case there are 648 nodes or more.
  3. Use Unified Fabric Management (UFM®). 
    UFM is a powerful platform for scale-out computing, eliminates the complexity of fabric management, provides deep visibility into traffic, and optimizes fabric performance.

In this guide, we launch the InfiniBand SM on the InfiniBand switch (method 1). Below are the configuration steps for the chosen method.

To enable the SM on one of the managed switches:

  1. Login to the switch and enter the next configuration commands (swx-mld-ib67 is our switch name):

    IB switch configuration
    Mellanox MLNX-OS Switch Management
    
    switch login: admin
    Password: 
     
    swx-mld-ib67 [standalone: master] > enable 
    swx-mld-ib67 [standalone: master] # configure terminal
    swx-mld-ib67 [standalone: master] (config) # ib smnode swx-mld-ib67 enable 
    swx-mld-ib67 [standalone: master] (config) # ib smnode swx-mld-ib67 sm-priority 0
    
    swx-mld-ib67 [standalone: master] (config) # ib sm virt enable
    swx-mld-ib67 [standalone: master] (config) # write memory
    swx-mld-ib67 [standalone: master] (config) # reload
     
  2. Once the switch reboots, check the switch configuration. It should look like the following:

    Switch config example
    Mellanox MLNX-OS Switch Management
    
    switch login: admin
    Password: 
    
    swx-mld-ib67 [standalone: master] > enable 
    swx-mld-ib67 [standalone: master] # configure terminal
    swx-mld-ib67 [standalone: master] (config) # show running-config 
    ##
    ## Running database "initial"
    ## Generated at 2020/12/16 17:40:41 +0000
    ## Hostname: swx-mld-ib67
    ## Product release: 3.9.1600
    ##
    
    ##
    ## Running-config temporary prefix mode setting
    ##
    no cli default prefix-modes enable
    
    ##
    ## Subnet Manager configuration
    ##
       ib sm virt enable
    
    ##
    ## Other IP configuration
    ##
       hostname swx-mld-ib67
    
    ##
    ## Other IPv6 configuration
    ##
    no ipv6 enable
    
    ##
    ## Local user account configuration
    ##
       username admin password 7 $6$6GZ8Q0RF$FZW9pc23JJkwwOJTq85xZe1BJgqQV/m6APQNPkagZlTEUgKMWLr5X3Jq2hsUyB.K5nrGdDNUaSLiK2xupnIJo1
       username monitor password 7 $6$z1.r4Kl7$TIwaNf7uXNxZ9UdGdUpOO9kVug0shRqGtu75s3dSrY/wY1v1mGjrqQLNPHvHYh5HAhVuUz5wKzD6H/beYeEqL.
    
    ##
    ## AAA remote server configuration
    ##
    # ldap bind-password ********
    # radius-server key ********
    # tacacs-server key ********
    
    ##
    ## Network management configuration
    ##
    # web proxy auth basic password ********
    
    ##
    ## X.509 certificates configuration
    ##
    #
    # Certificate name system-self-signed, ID 12d0989d8623825b71bc25f9bc02de813fc9fe2a
    # (public-cert config omitted since private-key config is hidden)
    
    
    ##
    ## IB nodename to GUID mapping
    ##
       ib smnode swx-mld-ib67 create
       ib smnode swx-mld-ib67 enable
       ib smnode swx-mld-ib67 sm-priority 0
    ##
    ## Persistent prefix mode setting
    ##
    cli default prefix-modes enable

Nodes Configuration

General Prerequisites:

  • Hardware
    All the K8s Worker nodes have the same hardware specification (see BoM for details).
  • Host BIOS
    Verify that you are using a SR-IOV supported server platform for K8s Worker nodes, and review the BIOS settings in the hardware documentation to enable SR-IOV in the BIOS.
  • Host OS
    Ubuntu Server 20.04 operating system should be installed on all servers with OpenSSH server packages.
  • Experience with Kubernetes
    Make sure to familiarize yourself with the Kubernetes Cluster architecture. 

Host OS Prerequisites 

Make sure the Ubuntu Server 20.04 operating system is installed on all servers with the OpenSSH server packages, and create a non-root user account with passwordless sudo privileges.
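
A minimal sketch of creating such an account is shown below, assuming the account name user that is used throughout this guide; the passwordless sudo configuration itself is covered in the Non-root User Account Prerequisites section below.

Server Console
$ sudo adduser user              # create the deployment user account
$ sudo usermod -aG sudo user     # add the account to the sudo group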

Update the Ubuntu software packages by running the following commands:

Server console
$ sudo apt-get update
$ sudo apt-get upgrade -y
$ sudo reboot

Non-root User Account Prerequisites 

In this solution, we added the following line to the end of the /etc/sudoers file:

Server Console
$ sudo vim /etc/sudoers

#includedir /etc/sudoers.d

#K8s cluster deployment user with sudo privileges without password
user ALL=(ALL) NOPASSWD:ALL
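
To verify that passwordless sudo works for the deployment user, a quick non-interactive check can be run. This is an optional sanity-check sketch, assuming the account name user.

Server Console
$ su - user
$ sudo -n true && echo "passwordless sudo is configured"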

Software Prerequisites

  1. Disable/blacklist the Nouveau NVIDIA driver on the Worker node servers by running the commands below, or paste each line into the terminal:

    Server Console
    $ sudo su -
    # lsmod |grep nouv
    # bash -c "echo blacklist nouveau > /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
    # bash -c "echo options nouveau modeset=0 >> /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
    # update-initramfs -u
    # reboot
    $ lsmod |grep nouv
  2. Install NVIDIA Mellanox MOFED and upgrade the firmware on the Worker node servers by running the commands below, or paste each line into the terminal:

    Server Console
    $ sudo su -
    # apt-get install rdma-core
    # wget -qO - https://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox | sudo apt-key add -
    # curl https://linux.mellanox.com/public/repo/mlnx_ofed/latest/ubuntu20.04/mellanox_mlnx_ofed.list --output /etc/apt/sources.list.d/mellanox_mlnx_ofed.list
    # apt update
    # apt install -y mlnx-ofed-kernel-only
    # wget http://www.mellanox.com/downloads/firmware/mlxup/4.15.2/SFX/linux_x64/mlxup
    # chmod +x mlxup
    # ./mlxup --online -u
    # reboot
  3. Set up the IB port link on the Worker node servers.

    Server Console
    root@node2:~# ibdev2netdev
    ...
    mlx5_2 port 1 ==> ibs6f0 (Down)
    mlx5_3 port 1 ==> ibs6f1 (Down)
    ...
    
    root@node2:~# vim /etc/netplan/00-installer-config.yaml
    
    # This is the network config written by 'subiquity'
    network:
      ethernets:
        ibs6f0: {}
        eno1:
          dhcp4: true
      version: 2
    
    root@node2:~# netplan apply
    
    root@node2:~# ibdev2netdev
    ...
    mlx5_2 port 1 ==> ibs6f0 (Up)
    mlx5_3 port 1 ==> ibs6f1 (Down)
    ...
  4. Set netns to exclusive mode to allow network namespace isolation for RDMA workloads on the Worker node servers.

    Server Console
    root@node2:~# vim /etc/modprobe.d/ib_core.conf
    
    # Set netns to exclusive mode for namespace isolation
    options ib_core netns_mode=0
    
    root@node2:~# update-initramfs -u
    root@node2:~# reboot
  5. Check netns mode and InfiniBand devices on the Worker node servers.

    Server Console
    $ rdma system
    netns exclusive
    
    $ ls -la /dev/infiniband/
    total 0
    drwxr-xr-x  2 root root      300 Jan 26 16:26 .
    drwxr-xr-x 22 root root     5100 Jan 26 16:55 ..
    crw-------  1 root root 231,  64 Jan 26 16:26 issm0
    crw-------  1 root root 231,  65 Jan 26 16:26 issm1
    crw-------  1 root root 231,  66 Jan 26 16:26 issm2
    crw-------  1 root root 231,  67 Jan 26 16:26 issm3
    crw-rw-rw-  1 root root  10,  57 Jan 26 16:26 rdma_cm
    crw-------  1 root root 231,   0 Jan 26 16:26 umad0
    crw-------  1 root root 231,   1 Jan 26 16:26 umad1
    crw-------  1 root root 231,   2 Jan 26 16:26 umad2
    crw-------  1 root root 231,   3 Jan 26 16:26 umad3
    crw-rw-rw-  1 root root 231, 192 Jan 26 16:26 uverbs0
    crw-rw-rw-  1 root root 231, 193 Jan 26 16:26 uverbs1
    crw-rw-rw-  1 root root 231, 194 Jan 26 16:26 uverbs2
    crw-rw-rw-  1 root root 231, 195 Jan 26 16:26 uverbs3
    
    
    
    $ ls -la /sys/class/infiniband
    total 0
    drwxr-xr-x  2 root root 0 Jan 11 13:52 .
    drwxr-xr-x 82 root root 0 Jan 11 13:52 ..
    lrwxrwxrwx  1 root root 0 Jan 11 13:53 mlx5_0 -> ../../devices/pci0000:11/0000:11:02.0/0000:13:00.0/infiniband/mlx5_0
    lrwxrwxrwx  1 root root 0 Jan 11 13:53 mlx5_1 -> ../../devices/pci0000:11/0000:11:02.0/0000:13:00.1/infiniband/mlx5_1
    lrwxrwxrwx  1 root root 0 Jan 11 13:52 mlx5_2 -> ../../devices/pci0000:ae/0000:ae:00.0/0000:af:00.0/infiniband/mlx5_2
    lrwxrwxrwx  1 root root 0 Jan 11 13:52 mlx5_3 -> ../../devices/pci0000:ae/0000:ae:00.0/0000:af:00.1/infiniband/mlx5_3

All Worker nodes must have the same configuration and the same PCIe card placement.

Check that the IB interface is UP; a quick check is sketched below.
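
A minimal sketch of this check on a Worker node is shown below; the device name mlx5_2 and interface name ibs6f0 are the ones used in this deployment and may differ on your hardware.

Server Console
$ ibdev2netdev | grep ibs6f0              # expect: mlx5_2 port 1 ==> ibs6f0 (Up)
$ ibstat mlx5_2 | grep -E "State|Rate"    # expect State: Active and the negotiated link rate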

K8s Cluster Deployment and Configuration

The Kubernetes cluster in this solution will be installed using Kubespray with a non-root user account from a Deployment node.

SSH Private Key and SSH Passwordless Login

  • Log in to the Deployment node as the deployment user (in this case, user) and create an SSH key pair for configuring passwordless authentication by running the following commands:

    Deployment Node Console
    $ ssh-keygen
    
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/user/.ssh/id_rsa):
    Created directory '/home/user/.ssh'.
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /home/user/.ssh/id_rsa.
    Your public key has been saved in /home/user/.ssh/id_rsa.pub.
    The key fingerprint is:
    SHA256:PaZkvxV4K/h8q32zPWdZhG1VS0DSisAlehXVuiseLgA user@sl-depl-node
    The key's randomart image is:
    +---[RSA 2048]----+
    |      ...+oo+o..o|
    |      .oo   .o. o|
    |     . .. . o  +.|
    |   E  .  o +  . +|
    |    .   S = +  o |
    |     . o = + o  .|
    |      . o.o +   o|
    |       ..+.*. o+o|
    |        oo*ooo.++|
    +----[SHA256]-----+


  • Copy your SSH public key to all nodes in your deployment by running the following command (ssh-copy-id accepts the private key path, such as ~/.ssh/id_rsa, and installs the corresponding public key).

    Deployment Node Console
    Sample:
    $ ssh-copy-id -i ~/.ssh/id_rsa user@192.168.1.40
    
    /usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/user/.ssh/id_rsa.pub"
    The authenticity of host '192.168.1.40 (192.168.1.40)' can't be established.
    ECDSA key fingerprint is SHA256:uyglY5g0CgPNGDm+XKuSkFAbx0RLaPijpktANgXRlD8.
    Are you sure you want to continue connecting (yes/no)? yes
    /usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
    /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
    user@192.168.1.40's password:
    
    Number of key(s) added: 1
    
    Now try logging into the machine, with:   "ssh 'user@192.168.1.40'"
    and check to make sure that only the key(s) you wanted were added.


  • Check SSH connectivity to all nodes in your deployment by running the following command:

    Deployment Node Console
    Sample:
    $ ssh user@192.168.1.40
    Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-52-generic x86_64)
    
     * Documentation:  https://help.ubuntu.com
     * Management:     https://landscape.canonical.com
     * Support:        https://ubuntu.com/advantage
    
      System information as of Mon Jan 11 17:23:23 IST 2021
    
      System load:  0.0               Processes:             216
      Usage of /:   6.5% of 68.40GB   Users logged in:       1
      Memory usage: 2%                IP address for ens160: 192.168.1.40
      Swap usage:   0%
    
     * Introducing self-healing high availability clusters in MicroK8s.
       Simple, hardened, Kubernetes for production, from RaspberryPi to DC.
    
         https://microk8s.io/high-availability
    
    8 packages can be updated.
    8 of these updates are security updates.
    To see these additional updates run: apt list --upgradable
    
    New release '20.04.1 LTS' available.
    Run 'do-release-upgrade' to upgrade to it.
    
    Your Hardware Enablement Stack (HWE) is supported until April 2023.
    
    Last login: Mon Jan 11 17:04:04 2021 from 192.168.1.43
    
    
    user@node1:~$ exit

Kubespray Deployment and Configuration
  1. Install dependencies for running Kubespray with Ansible on the Deployment server.

    Deployment Node Console
    $ cd ~
    $ sudo apt -y install python3-pip jq
    $ wget https://github.com/kubernetes-sigs/kubespray/archive/v2.14.2.tar.gz
    $ tar -zxf v2.14.2.tar.gz
    $ cd kubespray-2.14.2
    $ sudo pip3 install -r requirements.txt
    The default folder for subsequent commands is ~/kubespray-2.14.2.
  2. Create a new cluster configuration.

    Deployment Node Console
    $ cp -rfp inventory/sample inventory/mycluster
    $ declare -a IPS=(192.168.1.40 192.168.1.10 192.168.1.11)
    $ CONFIG_FILE=inventory/mycluster/hosts.yaml python3 contrib/inventory_builder/inventory.py ${IPS[@]}

    As a result, the inventory/mycluster/hosts.yaml file will be created.

    Review and change the host configuration file - inventory/mycluster/hosts.yaml (an optional Ansible connectivity check is sketched after this list).


    Below is an example for this deployment.

    Deployment Node Console
    $ sudo vim inventory/mycluster/hosts.yaml
    all:
      hosts:
        node1:
          ansible_host: 192.168.1.40
          ip: 192.168.1.40
          access_ip: 192.168.1.40
        node2:
          ansible_host: 192.168.1.10
          ip: 192.168.1.10
          access_ip: 192.168.1.10
        node3:
          ansible_host: 192.168.1.11
          ip: 192.168.1.11
          access_ip: 192.168.1.11
      children:
        kube-master:
          hosts:
            node1:
        kube-node:
          hosts:
            node2:
            node3:
        etcd:
          hosts:
            node1:
        k8s-cluster:
          children:
            kube-master:
            kube-node:
        calico-rr:
          hosts: {}
  3. Review and change cluster installation parameters in the files:
          >  inventory/mycluster/group_vars/all/all.yml 
          >  inventory/mycluster/group_vars/k8s-cluster/k8s-cluster.yml

    In inventory/mycluster/group_vars/all/all.yml, uncomment the following line so that metrics can collect data about cluster resource usage:

    Deployment Node Console
    $ sudo vim inventory/mycluster/group_vars/all/all.yml
    
    ## The read-only port for the Kubelet to serve on with no authentication/authorization. Uncomment to enable.
    kube_read_only_port: 10255

    In inventory/mycluster/group_vars/k8s-cluster/k8s-cluster.yml, set the default Kubernetes CNI by setting the desired kube_network_plugin value (default: calico), and enable multi-networking by setting kube_network_plugin_multus: true.

    Deployment Node Console
    $ sudo vim inventory/mycluster/group_vars/k8s-cluster/k8s-cluster.yml
    
    ...
    
    # Choose network plugin (cilium, calico, contiv, weave or flannel. Use cni for generic cni plugin)
    # Can also be set to 'cloud', which lets the cloud provider setup appropriate routing
    kube_network_plugin: calico
    
    # Setting multi_networking to true will install Multus: https://github.com/intel/multus-cni
    kube_network_plugin_multus: true
    
    ...
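
Before running the playbook in the next step, an optional connectivity check can be performed from the Deployment node to confirm that Ansible can reach every host in the inventory. This is a minimal sketch; it relies on the SSH keys and the non-root user configured earlier.

Deployment Node Console
$ ansible -i inventory/mycluster/hosts.yaml all -m ping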

Deploy K8s Cluster by Kubespray Ansible Playbook

Deployment Node Console
$ ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml 
This step may take a while to complete.

An example of a successful playbook completion looks like this:

Deployment Node Console
PLAY RECAP ***************************************************************************************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=0   
node1                      : ok=617  changed=101  unreachable=0    failed=0   
node2                      : ok=453  changed=58   unreachable=0    failed=0   
node3                      : ok=410  changed=53   unreachable=0    failed=0   


Monday 30 November 2020  10:48:14 +0300 (0:00:00.265)       0:13:49.321 ********** 
=============================================================================== 
kubernetes/master : kubeadm | Initialize first master ------------------------------------------------------------------------------------ 55.94s
kubernetes/kubeadm : Join to cluster ----------------------------------------------------------------------------------------------------- 37.65s
kubernetes/master : Master | wait for kube-scheduler ------------------------------------------------------------------------------------- 21.97s
download : download_container | Download image if required ------------------------------------------------------------------------------- 21.34s
kubernetes-apps/ansible : Kubernetes Apps | Start Resources ------------------------------------------------------------------------------ 14.85s
kubernetes/preinstall : Update package management cache (APT) ---------------------------------------------------------------------------- 12.49s
download : download_file | Download item ------------------------------------------------------------------------------------------------- 11.45s
etcd : Install | Copy etcdctl binary from docker container ------------------------------------------------------------------------------- 10.57s
download : download_file | Download item -------------------------------------------------------------------------------------------------- 9.37s
kubernetes/preinstall : Install packages requirements ------------------------------------------------------------------------------------- 9.18s
etcd : wait for etcd up ------------------------------------------------------------------------------------------------------------------- 8.78s
etcd : Configure | Check if etcd cluster is healthy --------------------------------------------------------------------------------------- 8.62s
download : download_file | Download item -------------------------------------------------------------------------------------------------- 8.24s
kubernetes-apps/network_plugin/multus : Multus | Start resources -------------------------------------------------------------------------- 7.32s
download : download_container | Download image if required -------------------------------------------------------------------------------- 6.61s
policy_controller/calico : Start of Calico kube controllers ------------------------------------------------------------------------------- 4.92s
download : download_file | Download item -------------------------------------------------------------------------------------------------- 4.76s
kubernetes-apps/cluster_roles : Apply workaround to allow all nodes with cert O=system:nodes to register ---------------------------------- 4.56s
download : download_container | Download image if required -------------------------------------------------------------------------------- 4.48s
download : download | Download files / images --------------------------------------------------------------------------------------------- 4.28s

Label the Worker nodes with the node-role.kubernetes.io/worker label by running the following commands on the K8s Master node:

K8s Master Node Console
# kubectl label nodes node2 node-role.kubernetes.io/worker=
# kubectl label nodes node3 node-role.kubernetes.io/worker=

K8s Deployment Verification

Verifying the Kubernetes cluster deployment can be done from the root user account on the K8s Master node.

Below is an example output of the K8s cluster deployment information, using the default Kubespray configuration with the Calico CNI plugin.

To ensure that the Kubernetes cluster is installed correctly, run the following commands:

K8s Master Node Console
root@node1:~# kubectl get nodes -o wide
NAME    STATUS   ROLES    AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
node1   Ready    master   16d   v1.19.2   192.168.1.40   <none>        Ubuntu 18.04.5 LTS   5.4.0-52-generic   docker://19.3.12
node2   Ready    worker   16d   v1.19.2   192.168.1.10   <none>        Ubuntu 20.04.1 LTS   5.4.0-56-generic   docker://19.3.12
node3   Ready    worker   16d   v1.19.2   192.168.1.11   <none>        Ubuntu 20.04.1 LTS   5.4.0-56-generic   docker://19.3.12


     
root@node1:~# kubectl get pod -n kube-system -o wide
NAME                                          READY   STATUS    RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
calico-kube-controllers-b885f5f4-8cr8s        1/1     Running   0          23m   192.168.1.10   node2   <none>           <none>
calico-node-8bb6p                             1/1     Running   1          24m   192.168.1.10   node2   <none>           <none>
calico-node-9hnd4                             1/1     Running   0          24m   192.168.1.40   node1   <none>           <none>
calico-node-qm7z9                             1/1     Running   1          24m   192.168.1.11   node3   <none>           <none>
coredns-dff8fc7d-5n645                        1/1     Running   0          30s   10.233.92.4    node3   <none>           <none>
coredns-dff8fc7d-6qqcc                        1/1     Running   0          32s   10.233.96.1    node2   <none>           <none>
dns-autoscaler-66498f5c5f-vhz22               1/1     Running   0          23m   10.233.90.2    node1   <none>           <none>
kube-apiserver-node1                          1/1     Running   0          25m   192.168.1.40   node1   <none>           <none>
kube-controller-manager-node1                 1/1     Running   0          25m   192.168.1.40   node1   <none>           <none>
kube-multus-ds-amd64-cgz57                    1/1     Running   0          50s   192.168.1.40   node1   <none>           <none>
kube-multus-ds-amd64-jwhwj                    1/1     Running   0          50s   192.168.1.10   node2   <none>           <none>
kube-multus-ds-amd64-qj4dh                    1/1     Running   0          50s   192.168.1.11   node3   <none>           <none>
kube-proxy-ddjjm                              1/1     Running   0          24m   192.168.1.11   node3   <none>           <none>
kube-proxy-j4228                              1/1     Running   0          24m   192.168.1.10   node2   <none>           <none>
kube-proxy-qsb2g                              1/1     Running   0          25m   192.168.1.40   node1   <none>           <none>
kube-scheduler-node1                          1/1     Running   0          25m   192.168.1.40   node1   <none>           <none>
kubernetes-dashboard-667c4c65f8-7xdxf         1/1     Running   0          23m   10.233.92.1    node3   <none>           <none>
kubernetes-metrics-scraper-54fbb4d595-6mtgd   1/1     Running   0          23m   10.233.92.2    node3   <none>           <none>
nginx-proxy-node2                             1/1     Running   0          23m   192.168.1.10   node2   <none>           <none>
nginx-proxy-node3                             1/1     Running   0          24m   192.168.1.11   node3   <none>           <none>
nodelocaldns-67s2w                            1/1     Running   0          23m   192.168.1.10   node2   <none>           <none>
nodelocaldns-mmb2r                            1/1     Running   0          23m   192.168.1.11   node3   <none>           <none>
nodelocaldns-zxlzl                            1/1     Running   0          23m   192.168.1.40   node1   <none>           <none>

NVIDIA GPU Operator Installation for K8s cluster 

  1. The preferred method to deploy the NVIDIA GPU Operator is using Helm from the K8s Master node. Install Helm from the official installer script.

    K8s Master Node Console
    # curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
    # chmod 700 get_helm.sh
    # ./get_helm.sh
  2. Add the NVIDIA Helm repository.

    K8s Master Node Console
    # helm repo add nvidia https://nvidia.github.io/gpu-operator 
    # helm repo update
  3. Deploy NVIDIA GPU Operator.

    K8s Master Node Console
    # helm install --wait --generate-name nvidia/gpu-operator
    
    "nvidia" has been added to your repositories
    root@sl-k8s-master:~# helm repo update
    Hang tight while we grab the latest from your chart repositories...
    ...Successfully got an update from the "nvidia" chart repository
    Update Complete. ⎈Happy Helming!⎈
    root@sl-k8s-master:~# helm install --wait --generate-name nvidia/gpu-operator
    NAME: gpu-operator-1610381204
    LAST DEPLOYED: Mon Jan 11 18:06:50 2021
    NAMESPACE: default
    STATUS: deployed
    REVISION: 1
    TEST SUITE: None
    
    
    K8s Master Node Console
    # helm ls
    NAME                    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
    gpu-operator-1610381204 default         1               2021-01-11 18:06:50.465874914 +0200 IST deployed        gpu-operator-1.4.0      1.4.0
  4. Verify the NVIDIA GPU Operator installation (wait ~5-10 minutes for the operator installation to finish). An additional functional check with nvidia-smi is sketched after this list.

    K8s Master Node Console
    # kubectl get pod -A -o wide
    
    NAMESPACE                NAME                                                              READY   STATUS      RESTARTS   AGE     IP             NODE    NOMINATED NODE   READINESS GATES
    default                  gpu-operator-1610455631-node-feature-discovery-master-c8dbgrnpf   1/1     Running     0          6m52s   10.233.90.5    node1   <none>           <none>
    default                  gpu-operator-1610455631-node-feature-discovery-worker-24zlr       1/1     Running     0          6m52s   10.233.92.4    node3   <none>           <none>
    default                  gpu-operator-1610455631-node-feature-discovery-worker-47mbw       1/1     Running     0          6m52s   10.233.90.4    node1   <none>           <none>
    default                  gpu-operator-1610455631-node-feature-discovery-worker-qmnmj       1/1     Running     0          6m52s   10.233.96.1    node2   <none>           <none>
    default                  gpu-operator-7d4649d96c-2d2xj                                     1/1     Running     4          6m52s   10.233.90.3    node1   <none>           <none>
    gpu-operator-resources   gpu-feature-discovery-4h8dh                                       1/1     Running     0          75s     10.233.92.11   node3   <none>           <none>
    gpu-operator-resources   gpu-feature-discovery-c4fzh                                       1/1     Running     0          75s     10.233.96.5    node2   <none>           <none>
    gpu-operator-resources   nvidia-container-toolkit-daemonset-5hpng                          1/1     Running     0          4m19s   10.233.96.2    node2   <none>           <none>
    gpu-operator-resources   nvidia-container-toolkit-daemonset-n7mkv                          1/1     Running     0          4m19s   10.233.92.5    node3   <none>           <none>
    gpu-operator-resources   nvidia-dcgm-exporter-mjpg7                                        1/1     Running     0          2m5s    10.233.92.10   node3   <none>           <none>
    gpu-operator-resources   nvidia-dcgm-exporter-smmpp                                        1/1     Running     0          2m5s    10.233.96.4    node2   <none>           <none>
    gpu-operator-resources   nvidia-device-plugin-daemonset-7tvqh                              1/1     Running     0          3m5s    10.233.92.7    node3   <none>           <none>
    gpu-operator-resources   nvidia-device-plugin-daemonset-p9djf                              1/1     Running     0          3m5s    10.233.96.3    node2   <none>           <none>
    gpu-operator-resources   nvidia-device-plugin-validation                                   0/1     Completed   0          2m8s    10.233.92.8    node3   <none>           <none>
    gpu-operator-resources   nvidia-driver-daemonset-5cxb7                                     1/1     Running     0          5m41s   192.168.1.10   node2   <none>           <none>
    gpu-operator-resources   nvidia-driver-daemonset-b5dlv                                     1/1     Running     0          5m41s   192.168.1.11   node3   <none>           <none>
    gpu-operator-resources   nvidia-driver-validation                                          0/1     Completed   2          3m54s   10.233.92.6    node3   <none>           <none>
    ...
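
As an additional functional check, nvidia-smi can be run inside one of the driver daemonset pods. Below is a minimal sketch; the pod name is taken from the example listing above and will differ in your deployment.

K8s Master Node Console
# kubectl -n gpu-operator-resources get pods -o wide | grep nvidia-driver-daemonset
# kubectl -n gpu-operator-resources exec nvidia-driver-daemonset-5cxb7 -- nvidia-smi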

SR-IOV Network Operator Installation for K8s Cluster 

SR-IOV networking is an additional feature of a Kubernetes cluster.

To make it work, several components need to be provisioned and configured.

SR-IOV Network Operator Deployment Steps

  • Initialize the supported SR-IOV NIC types on selected nodes.
  • Provision SR-IOV device plugin executable on selected nodes.
  • Provision SR-IOV CNI plugin executable on selected nodes.
  • Manage configuration of SR-IOV device plugin on host.
  • Generate net-att-def CRs for SR-IOV CNI plugin.

Prerequisites

Install general dependencies on the Master node server by running the commands below.

Server Console
# apt-get install jq make gcc -y
# snap install skopeo --edge --devmode
# snap install go --classic
# export GOPATH=$HOME/go
# export PATH=$GOPATH/bin:$PATH

Below is a detailed step-by-step description of an SR-IOV Network Operator installation.

  1. Install Whereabouts CNI.

    You can install this plugin with a Daemonset, using the following commands:

    K8s Master Node Console
    # kubectl apply -f https://raw.githubusercontent.com/openshift/whereabouts-cni/master/doc/daemonset-install.yaml
    # kubectl apply -f https://raw.githubusercontent.com/openshift/whereabouts-cni/master/doc/whereabouts.cni.cncf.io_ippools.yaml
    # kubectl apply -f https://raw.githubusercontent.com/openshift/whereabouts-cni/master/doc/whereabouts.cni.cncf.io_overlappingrangeipreservations.yaml

    To ensure the plugin is installed correctly, run the following command:

    K8s Master Node Console
    # kubectl get pods -A
    NAMESPACE                NAME                                                              READY   STATUS      RESTARTS   AGE
    .......
    kube-system              whereabouts-nsw6x                                                 1/1     Running     0          22d
    kube-system              whereabouts-pnhvn                                                 1/1     Running     1          27d
    kube-system              whereabouts-pv694                                                 1/1     Running     0          27d
  2. Clone this GitHub repository.

    K8s Master Node Console
    # cd /root
    # go get github.com/k8snetworkplumbingwg/sriov-network-operator


  3. Deploy the operator.

    By default, the operator will be deployed in the 'sriov-network-operator' namespace of the Kubernetes cluster. You can later check whether the deployment finished successfully.

    K8s Master Node Console
    # cd go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/
    # make deploy-setup-k8s
  4. Check the status of the SriovNetworkNodeState CRs to find all the SR-IOV capable devices in the cluster.

    In our deployment, we chose the IB interface named ibs6f0.

    K8s Master Node Console
    # kubectl -n sriov-network-operator get sriovnetworknodestates.sriovnetwork.openshift.io node2 -o yaml
    ...
    	    deviceID: 101b
            driver: mlx5_core
            linkType: IB
            mac: 00:00:03:87:fe:80:00:00:00:00:00:00:98:03:9b:03:00:9f:cd:b6
            mtu: 4092
            name: ibs6f0
            numVfs: 8
            pciAddress: 0000:af:00.0
            totalvfs: 8
            vendor: 15b3
      -     deviceID: 101b
            driver: mlx5_core
            linkType: IB
            mac: 00:00:0b:0f:fe:80:00:00:00:00:00:00:98:03:9b:03:00:9f:cd:b7
            mtu: 4092
            name: ibs6f1
            pciAddress: 0000:af:00.1
            totalvfs: 8
            vendor: 15b3
    ...
  5. Using the chosen IB interface, create a SriovNetworkNodePolicy CR.

    K8s Master Node Console
    # cd /root
    # mkdir YAMLs
    # cd YAMLs/
    # vim policy.yaml
    
    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: policy-ib0
      namespace: sriov-network-operator
    spec:
      resourceName: "mlnx_ib0"
      nodeSelector:
        feature.node.kubernetes.io/custom-rdma.available: "true"
      priority: 10
      numVfs: 8
      nicSelector:
        vendor: "15b3"
        deviceID: "101b"
        pfNames: [ "ibs6f0" ]
      isRdma: true
      linkType: ib
  6. Apply the SriovNetworkNodePolicy.

    K8s Master Node Console
    # kubectl apply -f policy.yaml
  7. Check the Operator deployment after the policy activation.

    K8s Master Node Console
    # kubectl -n sriov-network-operator get all
    NAME                                          READY   STATUS    RESTARTS   AGE
    pod/sriov-cni-bzdsv                           2/2     Running   0          59s
    pod/sriov-cni-vsjbt                           2/2     Running   0          9m6s
    pod/sriov-device-plugin-9ghjx                 1/1     Running   0          9m6s
    pod/sriov-device-plugin-hkzct                 1/1     Running   0          12s
    pod/sriov-network-config-daemon-8x749         1/1     Running   0          22m
    pod/sriov-network-config-daemon-k7plr         1/1     Running   0          61s
    pod/sriov-network-operator-79b8bb586f-ptgr6   1/1     Running   0          22m
    
    NAME                                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                                 AGE
    daemonset.apps/sriov-cni                     2         2         2       2            2           beta.kubernetes.io/os=linux,node-role.kubernetes.io/worker=   9m6s
    daemonset.apps/sriov-device-plugin           2         2         2       2            2           beta.kubernetes.io/os=linux,node-role.kubernetes.io/worker=   9m6s
    daemonset.apps/sriov-network-config-daemon   2         2         2       2            2           beta.kubernetes.io/os=linux,node-role.kubernetes.io/worker=   22m
    
    NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/sriov-network-operator   1/1     1            1           22m
    
    NAME                                                DESIRED   CURRENT   READY   AGE
    replicaset.apps/sriov-network-operator-79b8bb586f   1         1         1       22m
  8. Create a Network Attachment Definition in a file named sriov-ib0.yaml.

    File sample
    # vim sriov-ib0.yaml
    
    
    apiVersion: k8s.cni.cncf.io/v1
    kind: NetworkAttachmentDefinition
    metadata:
      annotations:
        k8s.v1.cni.cncf.io/resourceName: openshift.io/mlnx_ib0
      name: sriovib0
      namespace: default
    spec:
      config: |-
        {
          "cniVersion": "0.3.1",
          "name": "sriovib0",
          "plugins": [
            {
              "type": "ib-sriov",
              "link_state": "enable",
              "rdmaIsolation": true,
              "ibKubernetesEnabled": false,
              "ipam": {
                "datastore": "kubernetes",
                "kubernetes": {
                  "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
                },
                "log_file": "/tmp/whereabouts.log",
                "log_level": "debug",
                "type": "whereabouts",
                "range": "192.168.101.0/24"
              }
            }
          ]
        }


  9. Apply the Network Attachment Definition.

    K8s Master Node Console
    # kubectl apply -f sriov-ib0.yaml
  10. Verify the Network Attachment Definition installation.

    K8s Master Node Console
    # kubectl get network-attachment-definitions.k8s.cni.cncf.io
    NAME       AGE
    sriovib0   28d
  11. Check Worker node 2.

    Worker Node 2
    # kubectl describe nodes node2
    Name:               node2
    Roles:              worker
    Labels:             beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=linux
                        feature.node.kubernetes.io/cpu-cpuid.ADX=true
                        feature.node.kubernetes.io/cpu-cpuid.AESNI=true
                        feature.node.kubernetes.io/cpu-cpuid.AVX=true
                        feature.node.kubernetes.io/cpu-cpuid.AVX2=true
                        feature.node.kubernetes.io/cpu-cpuid.AVX512BW=true
                        feature.node.kubernetes.io/cpu-cpuid.AVX512CD=true
                        feature.node.kubernetes.io/cpu-cpuid.AVX512DQ=true
                        feature.node.kubernetes.io/cpu-cpuid.AVX512F=true
                        feature.node.kubernetes.io/cpu-cpuid.AVX512VL=true
                        feature.node.kubernetes.io/cpu-cpuid.FMA3=true
                        feature.node.kubernetes.io/cpu-cpuid.HLE=true
                        feature.node.kubernetes.io/cpu-cpuid.IBPB=true
                        feature.node.kubernetes.io/cpu-cpuid.MPX=true
                        feature.node.kubernetes.io/cpu-cpuid.RTM=true
                        feature.node.kubernetes.io/cpu-cpuid.STIBP=true
                        feature.node.kubernetes.io/cpu-cpuid.VMX=true
                        feature.node.kubernetes.io/cpu-rdt.RDTCMT=true
                        feature.node.kubernetes.io/cpu-rdt.RDTL3CA=true
                        feature.node.kubernetes.io/cpu-rdt.RDTMBA=true
                        feature.node.kubernetes.io/cpu-rdt.RDTMBM=true
                        feature.node.kubernetes.io/cpu-rdt.RDTMON=true
                        feature.node.kubernetes.io/custom-rdma.available=true
                        feature.node.kubernetes.io/custom-rdma.capable=true
                        feature.node.kubernetes.io/kernel-config.NO_HZ=true
                        feature.node.kubernetes.io/kernel-config.NO_HZ_IDLE=true
                        feature.node.kubernetes.io/kernel-version.full=5.4.0-56-generic
                        feature.node.kubernetes.io/kernel-version.major=5
                        feature.node.kubernetes.io/kernel-version.minor=4
                        feature.node.kubernetes.io/kernel-version.revision=0
                        feature.node.kubernetes.io/memory-numa=true
                        feature.node.kubernetes.io/pci-0300_102b.present=true
                        feature.node.kubernetes.io/pci-0302_10de.present=true
                        feature.node.kubernetes.io/pci-0302_10de.sriov.capable=true
                        feature.node.kubernetes.io/storage-nonrotationaldisk=true
                        feature.node.kubernetes.io/system-os_release.ID=ubuntu
                        feature.node.kubernetes.io/system-os_release.VERSION_ID=20.04
                        feature.node.kubernetes.io/system-os_release.VERSION_ID.major=20
                        feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=04
                        kubernetes.io/arch=amd64
                        kubernetes.io/hostname=node2
                        kubernetes.io/os=linux
                        node-role.kubernetes.io/worker=
                        nvidia.com/gpu.present=true
    Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                        nfd.node.kubernetes.io/extended-resources:
                        nfd.node.kubernetes.io/feature-labels:
                          cpu-cpuid.ADX,cpu-cpuid.AESNI,cpu-cpuid.AVX,cpu-cpuid.AVX2,cpu-cpuid.AVX512BW,cpu-cpuid.AVX512CD,cpu-cpuid.AVX512DQ,cpu-cpuid.AVX512F,cpu-...
                        nfd.node.kubernetes.io/worker.version: v0.6.0
                        node.alpha.kubernetes.io/ttl: 0
                        sriovnetwork.openshift.io/state: Idle
                        volumes.kubernetes.io/controller-managed-attach-detach: true
    CreationTimestamp:  Tue, 01 Dec 2020 17:22:46 +0200
    Taints:             <none>
    Unschedulable:      false
    Lease:
      HolderIdentity:  node2
      AcquireTime:     <unset>
      RenewTime:       Wed, 30 Dec 2020 14:30:59 +0200
    Conditions:
      Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
      ----                 ------  -----------------                 ------------------                ------                       -------
      NetworkUnavailable   False   Mon, 07 Dec 2020 16:00:20 +0200   Mon, 07 Dec 2020 16:00:20 +0200   CalicoIsUp                   Calico is running on this node
      MemoryPressure       False   Wed, 30 Dec 2020 14:31:06 +0200   Mon, 07 Dec 2020 15:59:48 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available
      DiskPressure         False   Wed, 30 Dec 2020 14:31:06 +0200   Mon, 07 Dec 2020 15:59:48 +0200   KubeletHasNoDiskPressure     kubelet has no disk pressure
      PIDPressure          False   Wed, 30 Dec 2020 14:31:06 +0200   Mon, 07 Dec 2020 15:59:48 +0200   KubeletHasSufficientPID      kubelet has sufficient PID available
      Ready                True    Wed, 30 Dec 2020 14:31:06 +0200   Mon, 07 Dec 2020 15:59:53 +0200   KubeletReady                 kubelet is posting ready status. AppArmor enabled
    Addresses:
      InternalIP:  192.168.1.10
      Hostname:    node2
    Capacity:
      cpu:                    32
      ephemeral-storage:      229700940Ki
      hugepages-1Gi:          0
      hugepages-2Mi:          0
      memory:                 197754972Ki
      nvidia.com/gpu:         2
      openshift.io/mlnx_ib0:  8
      pods:                   110
    Allocatable:
      cpu:                    31900m
      ephemeral-storage:      211692385954
      hugepages-1Gi:          0
      hugepages-2Mi:          0
      memory:                 197402572Ki
      nvidia.com/gpu:         2
      openshift.io/mlnx_ib0:  8
      pods:                   110
    System Info:
      Machine ID:                 646aa8cc13d14c47ac112babe9daf77c
      System UUID:                37383638-3330-5a43-3238-3435304d3647
      Boot ID:                    049266f0-98a5-48a8-b225-e118d1508ae1
      Kernel Version:             5.4.0-56-generic
      OS Image:                   Ubuntu 20.04.1 LTS
      Operating System:           linux
      Architecture:               amd64
      Container Runtime Version:  docker://19.3.12
      Kubelet Version:            v1.19.2
      Kube-Proxy Version:         v1.19.2
    PodCIDR:                      10.233.65.0/24
    PodCIDRs:                     10.233.65.0/24
    Non-terminated Pods:          (15 in total)
      Namespace                   Name                                                           CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
      ---------                   ----                                                           ------------  ----------  ---------------  -------------  ---
      default                     gpu-operator-1606837056-node-feature-discovery-worker-sjh9c    0 (0%)        0 (0%)      0 (0%)           0 (0%)         28d
      gpu-operator-resources      gpu-feature-discovery-lfzpz                                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         22d
      gpu-operator-resources      nvidia-container-toolkit-daemonset-z8zbj                       0 (0%)        0 (0%)      0 (0%)           0 (0%)         22d
      gpu-operator-resources      nvidia-dcgm-exporter-d4s79                                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         22d
      gpu-operator-resources      nvidia-device-plugin-daemonset-jm4sm                           0 (0%)        0 (0%)      0 (0%)           0 (0%)         22d
      gpu-operator-resources      nvidia-driver-daemonset-7kj6c                                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         22d
      kube-system                 calico-node-9st8w                                              150m (0%)     300m (0%)   64M (0%)         500M (0%)      28d
      kube-system                 kube-multus-ds-amd64-xhzwv                                     100m (0%)     100m (0%)   90Mi (0%)        90Mi (0%)      28d
      kube-system                 kube-proxy-l45cj                                               0 (0%)        0 (0%)      0 (0%)           0 (0%)         28d
      kube-system                 nginx-proxy-node2                                              25m (0%)      0 (0%)      32M (0%)         0 (0%)         28d
      kube-system                 nodelocaldns-lvqwb                                             100m (0%)     0 (0%)      70Mi (0%)        170Mi (0%)     28d
      kube-system                 whereabouts-nsw6x                                              100m (0%)     100m (0%)   50Mi (0%)        50Mi (0%)      22d
      sriov-network-operator      sriov-cni-r2vlr                                                0 (0%)        0 (0%)      0 (0%)           0 (0%)         9m40s
      sriov-network-operator      sriov-device-plugin-zschv                                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         4m50s
      sriov-network-operator      sriov-network-config-daemon-7qtmt                              0 (0%)        0 (0%)      0 (0%)           0 (0%)         22d
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource               Requests        Limits
      --------               --------        ------
      cpu                    475m (1%)       500m (1%)
      memory                 316200960 (0%)  825058560 (0%)
      ephemeral-storage      0 (0%)          0 (0%)
      hugepages-1Gi          0 (0%)          0 (0%)
      hugepages-2Mi          0 (0%)          0 (0%)
      nvidia.com/gpu         0               0
      openshift.io/mlnx_ib0  0               0
    Events:                  <none>
  12. Check Worker node 3.
Worker Node 3
# kubectl describe nodes node3
Name:               node3
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    feature.node.kubernetes.io/cpu-cpuid.ADX=true
                    feature.node.kubernetes.io/cpu-cpuid.AESNI=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX2=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512BW=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512CD=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512DQ=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512F=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512VL=true
                    feature.node.kubernetes.io/cpu-cpuid.FMA3=true
                    feature.node.kubernetes.io/cpu-cpuid.HLE=true
                    feature.node.kubernetes.io/cpu-cpuid.IBPB=true
                    feature.node.kubernetes.io/cpu-cpuid.MPX=true
                    feature.node.kubernetes.io/cpu-cpuid.RTM=true
                    feature.node.kubernetes.io/cpu-cpuid.STIBP=true
                    feature.node.kubernetes.io/cpu-cpuid.VMX=true
                    feature.node.kubernetes.io/cpu-hardware_multithreading=true
                    feature.node.kubernetes.io/cpu-rdt.RDTCMT=true
                    feature.node.kubernetes.io/cpu-rdt.RDTL3CA=true
                    feature.node.kubernetes.io/cpu-rdt.RDTMBA=true
                    feature.node.kubernetes.io/cpu-rdt.RDTMBM=true
                    feature.node.kubernetes.io/cpu-rdt.RDTMON=true
                    feature.node.kubernetes.io/custom-rdma.available=true
                    feature.node.kubernetes.io/custom-rdma.capable=true
                    feature.node.kubernetes.io/kernel-config.NO_HZ=true
                    feature.node.kubernetes.io/kernel-config.NO_HZ_IDLE=true
                    feature.node.kubernetes.io/kernel-version.full=5.4.0-56-generic
                    feature.node.kubernetes.io/kernel-version.major=5
                    feature.node.kubernetes.io/kernel-version.minor=4
                    feature.node.kubernetes.io/kernel-version.revision=0
                    feature.node.kubernetes.io/memory-numa=true
                    feature.node.kubernetes.io/pci-0300_102b.present=true
                    feature.node.kubernetes.io/pci-0302_10de.present=true
                    feature.node.kubernetes.io/pci-0302_10de.sriov.capable=true
                    feature.node.kubernetes.io/storage-nonrotationaldisk=true
                    feature.node.kubernetes.io/system-os_release.ID=ubuntu
                    feature.node.kubernetes.io/system-os_release.VERSION_ID=20.04
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.major=20
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=04
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node3
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    nvidia.com/gpu.present=true
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    nfd.node.kubernetes.io/extended-resources:
                    nfd.node.kubernetes.io/feature-labels:
                      cpu-cpuid.ADX,cpu-cpuid.AESNI,cpu-cpuid.AVX,cpu-cpuid.AVX2,cpu-cpuid.AVX512BW,cpu-cpuid.AVX512CD,cpu-cpuid.AVX512DQ,cpu-cpuid.AVX512F,cpu-...
                    nfd.node.kubernetes.io/worker.version: v0.6.0
                    node.alpha.kubernetes.io/ttl: 0
                    sriovnetwork.openshift.io/state: Idle
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 01 Dec 2020 17:22:53 +0200
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  node3
  AcquireTime:     <unset>
  RenewTime:       Wed, 30 Dec 2020 14:36:15 +0200
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Mon, 07 Dec 2020 15:40:51 +0200   Mon, 07 Dec 2020 15:40:51 +0200   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Wed, 30 Dec 2020 14:36:19 +0200   Mon, 07 Dec 2020 15:40:42 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Wed, 30 Dec 2020 14:36:19 +0200   Mon, 07 Dec 2020 15:40:42 +0200   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Wed, 30 Dec 2020 14:36:19 +0200   Mon, 07 Dec 2020 15:40:42 +0200   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Wed, 30 Dec 2020 14:36:19 +0200   Mon, 07 Dec 2020 15:40:44 +0200   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  192.168.1.11
  Hostname:    node3
Capacity:
  cpu:                    64
  ephemeral-storage:      229698892Ki
  hugepages-1Gi:          0
  hugepages-2Mi:          0
  memory:                 197747532Ki
  nvidia.com/gpu:         2
  openshift.io/mlnx_ib0:  7
  pods:                   110
Allocatable:
  cpu:                    63900m
  ephemeral-storage:      211690498517
  hugepages-1Gi:          0
  hugepages-2Mi:          0
  memory:                 197395132Ki
  nvidia.com/gpu:         2
  openshift.io/mlnx_ib0:  0
  pods:                   110
System Info:
  Machine ID:                 c9f34445383f445eb44cd27fb90634e8
  System UUID:                37383638-3330-5a43-3238-3435304d3643
  Boot ID:                    20be7b74-ce7d-4180-b904-48135f823819
  Kernel Version:             5.4.0-56-generic
  OS Image:                   Ubuntu 20.04.1 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.12
  Kubelet Version:            v1.19.2
  Kube-Proxy Version:         v1.19.2
PodCIDR:                      10.233.66.0/24
PodCIDRs:                     10.233.66.0/24
Non-terminated Pods:          (17 in total)
  Namespace                   Name                                                           CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                                                           ------------  ----------  ---------------  -------------  ---
  default                     gpu-operator-1606837056-node-feature-discovery-worker-4mxl8    0 (0%)        0 (0%)      0 (0%)           0 (0%)         28d
  default                     rdma-test-pod                                                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         22d
  gpu-operator-resources      gpu-feature-discovery-kbkwt                                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         28d
  gpu-operator-resources      nvidia-container-toolkit-daemonset-fmvgk                       0 (0%)        0 (0%)      0 (0%)           0 (0%)         28d
  gpu-operator-resources      nvidia-dcgm-exporter-nlwhx                                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         28d
  gpu-operator-resources      nvidia-device-plugin-daemonset-k7c99                           0 (0%)        0 (0%)      0 (0%)           0 (0%)         28d
  gpu-operator-resources      nvidia-driver-daemonset-sslgt                                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         28d
  kube-system                 calico-node-hmsml                                              150m (0%)     300m (0%)   64M (0%)         500M (0%)      28d
  kube-system                 coredns-84646c885d-zh86b                                       100m (0%)     0 (0%)      70Mi (0%)        170Mi (0%)     22d
  kube-system                 kube-multus-ds-amd64-4r25b                                     100m (0%)     100m (0%)   90Mi (0%)        90Mi (0%)      28d
  kube-system                 kube-proxy-bbd99                                               0 (0%)        0 (0%)      0 (0%)           0 (0%)         28d
  kube-system                 nginx-proxy-node3                                              25m (0%)      0 (0%)      32M (0%)         0 (0%)         28d
  kube-system                 nodelocaldns-xw9tq                                             100m (0%)     0 (0%)      70Mi (0%)        170Mi (0%)     28d
  kube-system                 whereabouts-pnhvn                                              100m (0%)     100m (0%)   50Mi (0%)        50Mi (0%)      28d
  sriov-network-operator      sriov-cni-kcz44                                                0 (0%)        0 (0%)      0 (0%)           0 (0%)         10m
  sriov-network-operator      sriov-device-plugin-gmmdr                                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         4s
  sriov-network-operator      sriov-network-config-daemon-xq9c9                              0 (0%)        0 (0%)      0 (0%)           0 (0%)         28d
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource               Requests        Limits
  --------               --------        ------
  cpu                    575m (0%)       500m (0%)
  memory                 389601280 (0%)  1003316480 (0%)
  ephemeral-storage      0 (0%)          0 (0%)
  hugepages-1Gi          0 (0%)          0 (0%)
  hugepages-2Mi          0 (0%)          0 (0%)
  nvidia.com/gpu         0               0
  openshift.io/mlnx_ib0  1               1
Events:                  <none>
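
As an optional shortcut, the per-node checks above can be summarized with a simple grep over the kubectl describe output. The loop below is only a convenience sketch; it assumes the worker node names used in this guide (node2 and node3) and prints the nvidia.com/gpu and openshift.io/mlnx_ib0 lines from the Capacity, Allocatable and Allocated resources sections.

K8s Master Node Console
# for n in node2 node3; do echo "== $n =="; kubectl describe node $n | grep -E 'nvidia.com/gpu|openshift.io/mlnx_ib0'; done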

Deployment Verification

  1. Create a sample Deployment (the container image must include CUDA and InfiniBand performance tools):

    K8s Master Node Console
    # vim sample-depl.yaml
    
    
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: sample-pod
      labels:
        app: sriov
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: sriov
      template:
        metadata:
          labels:
            app: sriov
          annotations:
            k8s.v1.cni.cncf.io/networks: sriovib0
        spec:
          containers:
          - image: <Container Image Name>
            name: mlnx-inbox-ctr
            securityContext:
              capabilities:
                add: [ "IPC_LOCK" ]
            resources:
              requests:
                openshift.io/mlnx_ib0: '1'
                nvidia.com/gpu: 1
              limits:
                openshift.io/mlnx_ib0: '1'
                nvidia.com/gpu: 1
            command:
            - sh
            - -c
            - sleep inf
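
    Before deploying, it is worth confirming that the sriovib0 network referenced by the k8s.v1.cni.cncf.io/networks annotation exists as a NetworkAttachmentDefinition. The check below assumes it was created in the default namespace (the namespace the sample PODs run in); adjust the -n argument if the SriovNetwork configured earlier places it elsewhere. The sriovib0 object should appear in the output.

    K8s Master Node Console
    # kubectl get network-attachment-definitions -n default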
  2. Deploy the sample Deployment.

    K8s Master Node Console
    # kubectl apply -f sample-depl.yaml
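
    Optionally, wait until both replicas are available before continuing; kubectl rollout status blocks until the Deployment (named sample-pod in the YAML above) reports a successful rollout.

    K8s Master Node Console
    # kubectl rollout status deployment/sample-pod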
  3. Verify that the PODs are running.

    K8s Master Node Console
    # kubectl get pod -o wide
    NAME                                                              READY   STATUS    RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
    gpu-operator-1610455631-node-feature-discovery-master-c8dbgrnpf   1/1     Running   0          20h   10.233.90.5    node1   <none>           <none>
    gpu-operator-1610455631-node-feature-discovery-worker-24zlr       1/1     Running   4          20h   10.233.92.31   node3   <none>           <none>
    gpu-operator-1610455631-node-feature-discovery-worker-47mbw       1/1     Running   1          20h   10.233.90.4    node1   <none>           <none>
    gpu-operator-1610455631-node-feature-discovery-worker-qmnmj       1/1     Running   2          20h   10.233.96.20   node2   <none>           <none>
    gpu-operator-7d4649d96c-2d2xj                                     1/1     Running   4          20h   10.233.90.3    node1   <none>           <none>
    sample-pod-65b94586b4-8k784                                       1/1     Running   0          17h   10.233.92.37   node3   <none>           <none>
    sample-pod-65b94586b4-8xn6m                                       1/1     Running   0          17h   10.233.96.27   node2   <none>           <none>
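
    Multus records the secondary networks attached to each POD in the k8s.v1.cni.cncf.io/network-status annotation. As an optional check, the annotation can be inspected to confirm that the sriovib0 attachment was wired into the POD (the pod name is taken from the output above; the grep context size is arbitrary).

    K8s Master Node Console
    # kubectl get pod sample-pod-65b94586b4-8k784 -o yaml | grep -A 20 'k8s.v1.cni.cncf.io/network-status'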
  4. Check the GPU in a container.

    K8s Master Node Console
    # kubectl exec -it sample-pod-65b94586b4-8k784 -- bash
    root@sample-pod-65b94586b4-8k784:/tmp# nvidia-smi
    Wed Jan 13 09:38:49 2021
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: N/A      |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla T4            On   | 00000000:37:00.0 Off |                    0 |
    | N/A   48C    P8    16W /  70W |      0MiB / 15109MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
    
    
    root@sample-pod-65b94586b4-8k784:/# exit
    exit
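
    The same check can be run non-interactively across both replicas. The short loop below simply lists the GPU visible inside each POD; it assumes the app=sriov label defined in the sample Deployment.

    K8s Master Node Console
    # for p in $(kubectl get pods -l app=sriov -o name); do echo "== $p =="; kubectl exec $p -- nvidia-smi -L; done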
  5. Check network adapters.

    K8s Master Node Console
    # kubectl exec -it sample-pod-65b94586b4-8k784 -- bash
    
    root@sample-pod-65b94586b4-8k784:/tmp# ip a s
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
    2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
        link/ipip 0.0.0.0 brd 0.0.0.0
    4: eth0@if48: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
        link/ether 8a:87:13:3b:bd:c4 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet 10.233.92.37/32 scope global eth0
           valid_lft forever preferred_lft forever
    49: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
        link/infiniband 00:00:0e:e3:fe:80:00:00:00:00:00:00:60:cc:fa:35:1d:14:a4:cc brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
        inet 192.168.101.1/24 brd 192.168.101.255 scope global net1
           valid_lft forever preferred_lft forever
    
    root@sample-pod-65b94586b4-8k784:/tmp# ibdev2netdev
    mlx5_9 port 1 ==> net1 (Up)
    
    root@sample-pod-65b94586b4-8k784:/# exit
    exit
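
    For a closer look at the InfiniBand port, ibv_devinfo reports the port state, active MTU and link layer, assuming the container image also ships the standard ibverbs utilities. The device name below (mlx5_9) matches the ibdev2netdev output of this particular POD and will differ between PODs.

    K8s Master Node Console
    # kubectl exec sample-pod-65b94586b4-8k784 -- ibv_devinfo -d mlx5_9 | grep -E 'state|active_mtu|link_layer'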
  6. Run an RDMA write bandwidth stress benchmark (ib_write_bw) over the InfiniBand fabric. The general command syntax is shown first, followed by the exact commands used between the two sample PODs.

    Server

    ib_write_bw -a -d mlx5_0 &

    Client

    ib_write_bw -a -F $server_IP -d mlx5_0 --report_gbits


    Open two consoles to the K8s Master node.

    1. In the first console (server side), run the following commands:

      K8s Master Node Console
      # kubectl exec -it sample-pod-65b94586b4-8k784 -- bash
      
      root@sample-pod-65b94586b4-8k784:/tmp# ibdev2netdev
      mlx5_9 port 1 ==> net1 (Up)
      root@sample-pod-65b94586b4-8k784:/tmp# ib_write_bw -a -d mlx5_9 &
      [1] 1081
      root@sample-pod-65b94586b4-8k784:/tmp#
      ************************************
      * Waiting for client to connect... *
      ************************************
    2. In the second console (client side), run the following commands:

      K8s Master Node Console
      # kubectl exec -it sample-pod-65b94586b4-8xn6m -- bash
      
      root@sample-pod-65b94586b4-8xn6m:/tmp# ibdev2netdev
      mlx5_7 port 1 ==> net1 (Up)
      root@sample-pod-65b94586b4-8xn6m:/tmp# ib_write_bw -a -F 192.168.101.1 -d mlx5_7 --report_gbits
    3. Results:

      K8s Master Node Console
      Server:
      ---------------------------------------------------------------------------------------
                          RDMA_Write BW Test
       Dual-port       : OFF          Device         : mlx5_9
       Number of qps   : 1            Transport type : IB
       Connection type : RC           Using SRQ      : OFF
       CQ Moderation   : 100
       Mtu             : 4096[B]
       Link type       : IB
       Max inline data : 0[B]
       rdma_cm QPs     : OFF
       Data ex. method : Ethernet
      ---------------------------------------------------------------------------------------
       local address: LID 0x0d QPN 0x0ec6 PSN 0xa49cc3 RKey 0x0e0400 VAddr 0x007fa17b1ef000
       remote address: LID 0x0c QPN 0x0bac PSN 0xa6c47a RKey 0x0a0400 VAddr 0x007f54c7554000
      ---------------------------------------------------------------------------------------
       #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
       8388608    5000             96.58              96.54              0.001439
      ---------------------------------------------------------------------------------------
      
      
      
      Client:
      ---------------------------------------------------------------------------------------
                          RDMA_Write BW Test
       Dual-port       : OFF          Device         : mlx5_7
       Number of qps   : 1            Transport type : IB
       Connection type : RC           Using SRQ      : OFF
       TX depth        : 128
       CQ Moderation   : 100
       Mtu             : 4096[B]
       Link type       : IB
       Max inline data : 0[B]
       rdma_cm QPs     : OFF
       Data ex. method : Ethernet
      ---------------------------------------------------------------------------------------
       local address: LID 0x0c QPN 0x0ba9 PSN 0xf563c8 RKey 0x0a0400 VAddr 0x007fd3ff9cb000
       remote address: LID 0x0d QPN 0x0ec3 PSN 0x9445de RKey 0x0e0400 VAddr 0x007fac5f879000
      ---------------------------------------------------------------------------------------
       #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
       2          5000             0.11               0.11               6.680528
       4          5000             0.23               0.22               6.961376
       8          5000             0.48               0.43               6.746956
       16         5000             0.96               0.86               6.703131
       32         5000             1.90               1.80               7.021876
       64         5000             3.82               3.59               7.014856
       128        5000             7.45               7.02               6.853576
       256        5000             14.69              14.38              7.019255
       512        5000             28.51              27.75              6.774283
       1024       5000             54.31              48.41              5.909477
       2048       5000             82.91              80.73              4.927545
       4096       5000             95.75              95.62              2.918237
       8192       5000             95.88              95.88              1.462960
       16384      5000             96.18              96.15              0.733546
       32768      5000             96.49              96.37              0.367604
       65536      5000             96.54              96.53              0.184124
       131072     5000             96.56              96.55              0.092081
       262144     5000             96.56              96.56              0.046041
       524288     5000             96.57              96.57              0.023024
       1048576    5000             96.58              96.57              0.011512
       2097152    5000             96.58              96.52              0.005753
       4194304    5000             96.58              96.56              0.002878
       8388608    5000             96.58              96.56              0.001439
      ---------------------------------------------------------------------------------------
      
      
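
    The same POD pair can be reused for a latency measurement. perftest ships ib_write_lat alongside ib_write_bw, and it follows the same server/client convention; the device names and server IP below are the ones used in the bandwidth test above.

    Server (first console)

    ib_write_lat -a -d mlx5_9

    Client (second console)

    ib_write_lat -a -F 192.168.101.1 -d mlx5_7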
  7. Delete the sample deployment by running:

    K8s Master Node Console
    root@sample-pod-65b94586b4-lz5x9:/tmp# exit
    # kubectl delete -f sample-depl.yaml
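
    To confirm the cleanup, query the Deployment and its PODs again; both commands should report that the resources no longer exist.

    K8s Master Node Console
    # kubectl get deployment sample-pod
    # kubectl get pods -l app=sriov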

          

Done!



Boris Kovalev

Boris Kovalev has worked for the past several years as a Solutions Architect, focusing on NVIDIA Networking/Mellanox technology, and is responsible for complex machine learning, Big Data and advanced VMware-based cloud research and design. Boris previously spent more than 20 years as a senior consultant and solutions architect at multiple companies, most recently at VMware. He has written multiple reference designs covering VMware, machine learning, Kubernetes, and container solutions, which are available on the Mellanox Documents website.



Vitaliy Razinkov

Over the past few years, Vitaliy Razinkov has been working as a Solutions Architect on the NVIDIA Networking team, responsible for research and design of complex Kubernetes/OpenShift and leading Microsoft solutions. He previously spent more than 25 years in senior positions at several companies. Vitaliy has written several reference design guides on Microsoft technologies, RoCE/RDMA accelerated machine learning in Kubernetes/OpenShift, and container solutions, all of which are available on the NVIDIA Networking Documentation website.








Notice

This document is provided for information purposes only and shall not be regarded as a warranty of a certain functionality, condition, or quality of a product. Neither NVIDIA Corporation nor any of its direct or indirect subsidiaries and affiliates (collectively: “NVIDIA”) make any representations or warranties, expressed or implied, as to the accuracy or completeness of the information contained in this document and assumes no responsibility for any errors contained herein. NVIDIA shall have no liability for the consequences or use of such information or for any infringement of patents or other rights of third parties that may result from its use. This document is not a commitment to develop, release, or deliver any Material (defined below), code, or functionality.
NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any other changes to this document, at any time without notice.
Customer should obtain the latest relevant information before placing orders and should verify that such information is current and complete.
NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgement, unless otherwise agreed in an individual sales agreement signed by authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA hereby expressly objects to applying any customer general terms and conditions with regards to the purchase of the NVIDIA product referenced in this document. No contractual obligations are formed either directly or indirectly by this document.
NVIDIA products are not designed, authorized, or warranted to be suitable for use in medical, military, aircraft, space, or life support equipment, nor in applications where failure or malfunction of the NVIDIA product can reasonably be expected to result in personal injury, death, or property or environmental damage. NVIDIA accepts no liability for inclusion and/or use of NVIDIA products in such equipment or applications and therefore such inclusion and/or use is at customer’s own risk.
NVIDIA makes no representation or warranty that products based on this document will be suitable for any specified use. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to evaluate and determine the applicability of any information contained in this document, ensure the product is suitable and fit for the application planned by customer, and perform the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this document. NVIDIA accepts no liability related to any default, damage, costs, or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.
No license, either expressed or implied, is granted under any NVIDIA patent right, copyright, or other NVIDIA intellectual property right under this document. Information published by NVIDIA regarding third-party products or services does not constitute a license from NVIDIA to use such products or services or a warranty or endorsement thereof. Use of such information may require a license from a third party under the patents or other intellectual property rights of the third party, or a license from NVIDIA under the patents or other intellectual property rights of NVIDIA.
Reproduction of information in this document is permissible only if approved in advance by NVIDIA in writing, reproduced without alteration and in full compliance with all applicable export laws and regulations, and accompanied by all associated conditions, limitations, and notices.
THIS DOCUMENT AND ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, “MATERIALS”) ARE BEING PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL NVIDIA BE LIABLE FOR ANY DAMAGES, INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF ANY USE OF THIS DOCUMENT, EVEN IF NVIDIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards customer for the products described herein shall be limited in accordance with the Terms of Sale for the product.

Trademarks
NVIDIA, the NVIDIA logo, and Mellanox are trademarks and/or registered trademarks of NVIDIA Corporation and/or Mellanox Technologies Ltd. in the U.S. and in other countries. Other company and product names may be trademarks of the respective companies with which they are associated.

Copyright
© 2022 NVIDIA Corporation & affiliates. All Rights Reserved.