NVIDIA GPUDirect Storage Release Notes

Release information for NVIDIA® GPUDirect® Storage version 0.9v1.

1. Introduction

Release information for NVIDIA® GPUDirect® Storage (GDS) for developers and users.

GDS is the newest addition to the GPUDirect family. GDS enables a direct data path for direct memory access (DMA) transfers between GPU memory and storage, which avoids a bounce buffer through the CPU. This direct path increases system bandwidth and decreases the latency and utilization load on the CPU.

GDS is enabled on distributed filesystems such as DDN Exascaler®, WekaFS, and VAST. GDS documents and online resources provide additional context for understanding GPUDirect Storage and using it optimally.

To learn more about GDS, refer to the following blogs:

2. New Features

This section provides information about the new features in this release.

The following features have been added since the 0.7 release:
  • Support for load balancing policy for user-space RDMA.
  • Support for user-space RDMA stats.
  • Support for the configuration file in gdsio.

3. MOFED and Filesystem Requirements

Here are the MOFED and filesystem requirements for GDS:

  • Ubuntu 18.04 and 20.04
  • MOFED 5.1-2.5.8.0 and later, which supports NVMe, NVMeOF, and NFSoRDMA (VAST) on Linux kernel 4.15.x and 5.4.x

    You need to install MOFED before you install GDS. Refer to Installing GPUDirect Storage for more information about installing MOFED.

  • The following distributed filesystems:
    • WekaFS 3.8.0
    • DDN Exascaler 5.2
    • VAST 3.4

4. Improvements

The following improvements have been made to GDS since version 0.7.

  • Performance improvements for compatibility mode and the non-registered GDS IO path.
  • Buffer pool for the non-GDS path in compatibility mode.
  • Improvements to the gdsio tool v1.1 for testing random IO buffer, offsets, and sizes.

5. Installing GPUDirect Storage

This section provides the steps to install GDS.

The GDS package contains three Debian packages:
  • gds_0.9.0_amd64.deb
  • gds-tools_0.9.0_amd64.deb
  • nvidia-fs_2.3_amd64.deb
Note: Each component has a README file. For example, for gds-tools, the README file is in the /usr/local/CUDA-X.y/tools/ directory.

To install GDS, complete the following steps:

  1. Run the following command to check the current status of IOMMU.
    $ dmesg | grep -i iommu
    • If IOMMU is disabled, the output includes IOMMU disabled. Proceed to step 3 to install MOFED.
    • If IOMMU is enabled, complete step 2 to disable it.
  2. Disable IOMMU.
    1. Run the following command.
      $ sudo vi /etc/default/grub
    2. Add one of the following options to the GRUB_CMDLINE_LINUX_DEFAULT option.
      • If you have an AMD CPU, add amd_iommu=off.
      • If you have an Intel® CPU, add intel_iommu=off.

      If there are already other options, enter a space to separate the options, for example, GRUB_CMDLINE_LINUX_DEFAULT="console=tty0 amd_iommu=off".

    3. Run the following commands.
      $ sudo update-grub
      $ sudo reboot
    4. After the system reboots, to verify that the change took effect, run the following command.
      $ cat /proc/cmdline
  3. Install MOFED 5.1.
    Note: This is required for NVMe, NVMeOF, and NFS.
    1. To install GDS support with the MOFED 5.1 package, run the following command:
      $ sudo ./mlnxofedinstall --with-nfsrdma --with-nvmf --enable-gds --add-kernel-support
    2. Run the following command:
      $ sudo update-initramfs -u -k `uname -r`
    3. Restart your system.
      Important: You must install MOFED before you install GDS.
  4. Install GDS.
    Note: To download additional packages by using the Ubuntu Advanced Package Tool (APT), ensure that the machine can access your network.
    1. Download the Debian packages to your local client and complete the following tasks:
      1. Install the NVIDIA driver by using the APT package manager.

        The driver that is installed by using the NVIDIA-Linux-x86_64.<version>.run file is not supported with the nvidia-gds package.

      2. Install the GDS local repository Debian package that matches your Ubuntu distribution and CUDA toolkit version:
        • For Ubuntu 20.04, run the following command.
          $ sudo dpkg -i gpudirect-storage-local-repo-ubuntu2004-cuda-x.y-0.9.0_1.0-1_amd64.deb
        • For Ubuntu 18.04, run the following command.
          $ sudo dpkg -i gpudirect-storage-local-repo-ubuntu1804-cuda-x.y-0.9.0_1.0-1_amd64.deb
      3. To register the repository key and refresh the package index, run the following commands.
        $ sudo apt-key add /var/gpudirect-storage-local-repo-*/7fa2af80.pub
        $ sudo apt-get update
    2. Install the GDS-related packages by using the nvidia-gds metapackage.

      If GDS version 0.8.0 is installed, before you upgrade to version 0.9.0, run the following commands in the order in which they are listed:

      $ sudo dpkg --purge nvidia-fs
      $ sudo dpkg --purge gds-tools
      $ sudo dpkg --purge gds
    3. To store the major version of the current NVIDIA driver in a shell variable, run the following command.
      $ NVIDIA_DRV_VERSION=$(cat /proc/driver/nvidia/version | grep Module | awk '{print $8}' | cut -d '.' -f 1)
      • On DGX-based systems, or systems with NVIDIA prebuilt kernels, run the following commands to install nvidia-gds with correct dependencies:
        $ sudo apt install nvidia-gds nvidia-dkms-${NVIDIA_DRV_VERSION}-server
        $ sudo modprobe nvidia_fs
      • For systems that have the nvidia-dkms-${NVIDIA_DRV_VERSION} package installed, run the following commands.
        $ sudo apt install nvidia-gds
        $ sudo modprobe nvidia_fs
    4. To verify that the metapackage has been installed, run the following command.
      $ dpkg -s nvidia-gds
      The following output shows that the metapackage has been installed:
      Package: nvidia-gds
      Status: install ok installed
      Priority: optional
      Section: multiverse/devel
      Installed-Size: 7
      Maintainer: cudatools <cudatools@nvidia.com>
      Architecture: amd64
      Source: gds-ubuntu1804
      Version: 0.9.0.15-1
      Provides: gds
      Depends: libcufile0, gds-tools, nvidia-fs
      Description: Metapackage for GPU Direct Storage
      GPU Direct Storage metapackage
    5. To verify that GDS is installed, run gdscheck.
      $ /usr/local/cuda-x.y/gds/tools/gdscheck.py -p
      The following output shows that GDS version 0.9.0 has been installed:
      GDS release version (beta): 0.9.0.15
      nvidia_fs version:  2.3 libcufile version: 2.3
      cuFile CONFIGURATION:
      NVMe           : Supported
      NVMeOF         : Unsupported
      SCSI           : Unsupported
      SCALEFLUX CSD  : Unsupported
      LUSTRE         : Unsupported
      NFS            : Unsupported
      WEKAFS         : Supported
      USERSPACE RDMA : Supported
      --MOFED peer direct  : enabled
      --rdma library       : Loaded (libcufile_rdma.so)
      --rdma devices       : Configured
      --rdma_device_status : Up: 1 Down: 0
      properties.use_compat_mode : 1
      properties.use_poll_mode : 0
      properties.poll_mode_max_size_kb : 4
      properties.max_batch_io_timeout_msecs : 5
      properties.max_direct_io_size_kb : 16384
      properties.max_device_cache_size_kb : 131072
      properties.max_device_pinned_mem_size_kb : 33554432
      properties.posix_pool_slab_size_kb : 4096 1048576 16777216
      properties.posix_pool_slab_count : 128 64 32
      properties.rdma_peer_affinity_policy : RoundRobin
      fs.generic.posix_unaligned_writes : 0
      fs.lustre.posix_gds_min_kb: 0
      fs.weka.rdma_write_support: 0
      profile.nvtx : 0
      profile.cufile_stats : 3
      miscellaneous.api_check_aggressive : 0
      GPU INFO:
      GPU index 0 Tesla T4 bar:1 bar size (MB):256 supports GDS
      GPU index 1 Tesla T4 bar:1 bar size (MB):256 supports GDS
      GPU index 2 Tesla T4 bar:1 bar size (MB):256 supports GDS
      GPU index 3 Tesla T4 bar:1 bar size (MB):256 supports GDS
      IOMMU : disabled
      Platform verification succeeded

  5. If you have a RAID array with NVMe, run the following commands.
    $ sudo umount -l /dev/md1
    $ sudo mount -o data=ordered /dev/md1 /raid
  6. Update /etc/fstab with the following line:
    /dev/md1 /raid ext4 defaults,nofail,discard,data=ordered 0 0
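
Steps 1 and 2 of the procedure above check and disable IOMMU by hand. As a minimal sketch of the same idea, the hypothetical helpers below classify a kernel command line and append an option to GRUB_CMDLINE_LINUX_DEFAULT in a file. Both function names are assumptions made for illustration, and the second helper is best tried on a copy of /etc/default/grub before it is applied to the real file.

```shell
# Sketch only: iommu_option_state and add_grub_option are hypothetical helpers.

# Classify a kernel command line by its IOMMU option. Prints "off" when
# amd_iommu=off or intel_iommu=off is present, otherwise "not-off".
iommu_option_state() {
    case " $1 " in
        *" amd_iommu=off "*|*" intel_iommu=off "*) echo "off" ;;
        *) echo "not-off" ;;
    esac
}

# Append an option to GRUB_CMDLINE_LINUX_DEFAULT in the given file unless
# the option is already on that line. A backup is written to <file>.bak.
# Assumes the option contains no "/" or "&" characters. An empty existing
# value results in a leading space before the option.
add_grub_option() {
    file=$1
    option=$2
    # Nothing to do when the option is already present.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=\".*${option}" "$file"; then
        return 0
    fi
    cp "$file" "$file.bak" || return 1
    sed "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"/GRUB_CMDLINE_LINUX_DEFAULT=\"\1 ${option}\"/" \
        "$file.bak" > "$file"
}
```

After a reboot, iommu_option_state "$(cat /proc/cmdline)" would print off if the change took effect. The update-grub and reboot commands from step 2 are still required after the file is edited.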
After you install GDS, to verify filesystem support, run the following command.
  $ /usr/local/CUDA-X.y/tools/gdscheck -p
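
The gdscheck output lists each storage type with a Supported or Unsupported status. As a rough sketch, assuming the output keeps the "name : status" layout shown in the installation steps, a hypothetical helper such as list_supported can reduce the output to the supported entries only.

```shell
# Sketch only: list_supported is a hypothetical helper, not part of GDS.
# It reads gdscheck -p output on standard input and prints the names of
# the entries whose status is exactly "Supported".
list_supported() {
    awk -F: '$2 ~ /^[ \t]*Supported[ \t]*$/ {
        name = $1
        gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim surrounding whitespace
        print name
    }'
}
```

The helper could be used by piping the gdscheck -p command above into list_supported. Lines such as the GPU INFO entries or the properties settings are ignored because their second field is not a status.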

6. Uninstalling GPUDirect Storage

This section provides the steps to uninstall GDS.

To uninstall GDS, run the following commands in the order in which they are listed:
  1. $ sudo dpkg --purge gds-tools
  2. $ sudo dpkg --purge gds
  3. $ sudo dpkg --purge nvidia-fs
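
The ordered removal above can also be scripted. The following is a minimal sketch under stated assumptions: purge_gds_packages is a hypothetical function name, and the default command assumes that sudo is available. Passing a different command as arguments allows a dry run.

```shell
# Sketch only: purge_gds_packages is a hypothetical helper.
# It purges the GDS packages in the order given in this section and stops
# at the first failure. Optional arguments replace the purge command,
# for example: purge_gds_packages echo dpkg --purge
purge_gds_packages() {
    if [ "$#" -eq 0 ]; then
        set -- sudo dpkg --purge
    fi
    for pkg in gds-tools gds nvidia-fs; do
        "$@" "$pkg" || return 1
    done
}
```

Running purge_gds_packages echo dpkg --purge prints the three commands without removing anything, which is a convenient way to review the order first.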

7. Minor Updates and Bug Fixes

Here are the updates and bug fixes since version 0.7.

The following minor updates and bug fixes were made after version 0.7:
  • Bug fixes in the nvidia-fs driver, libcufile, and GDS tools to improve resiliency and supportability.

8. Known Issues

This section provides information about the known issues in GDS.

A hang is observed in NFS environments when the process crashes.

9. Deprecations

This section provides information about the APIs that have been deprecated.

No APIs have been deprecated since version 0.7.

Notices

Notice

This document is provided for information purposes only and shall not be regarded as a warranty of a certain functionality, condition, or quality of a product. NVIDIA Corporation (“NVIDIA”) makes no representations or warranties, expressed or implied, as to the accuracy or completeness of the information contained in this document and assumes no responsibility for any errors contained herein. NVIDIA shall have no liability for the consequences or use of such information or for any infringement of patents or other rights of third parties that may result from its use. This document is not a commitment to develop, release, or deliver any Material (defined below), code, or functionality.

NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any other changes to this document, at any time without notice.

Customer should obtain the latest relevant information before placing orders and should verify that such information is current and complete.

NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgement, unless otherwise agreed in an individual sales agreement signed by authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA hereby expressly objects to applying any customer general terms and conditions with regards to the purchase of the NVIDIA product referenced in this document. No contractual obligations are formed either directly or indirectly by this document.

NVIDIA products are not designed, authorized, or warranted to be suitable for use in medical, military, aircraft, space, or life support equipment, nor in applications where failure or malfunction of the NVIDIA product can reasonably be expected to result in personal injury, death, or property or environmental damage. NVIDIA accepts no liability for inclusion and/or use of NVIDIA products in such equipment or applications and therefore such inclusion and/or use is at customer’s own risk.

NVIDIA makes no representation or warranty that products based on this document will be suitable for any specified use. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to evaluate and determine the applicability of any information contained in this document, ensure the product is suitable and fit for the application planned by customer, and perform the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this document. NVIDIA accepts no liability related to any default, damage, costs, or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.

No license, either expressed or implied, is granted under any NVIDIA patent right, copyright, or other NVIDIA intellectual property right under this document. Information published by NVIDIA regarding third-party products or services does not constitute a license from NVIDIA to use such products or services or a warranty or endorsement thereof. Use of such information may require a license from a third party under the patents or other intellectual property rights of the third party, or a license from NVIDIA under the patents or other intellectual property rights of NVIDIA.

Reproduction of information in this document is permissible only if approved in advance by NVIDIA in writing, reproduced without alteration and in full compliance with all applicable export laws and regulations, and accompanied by all associated conditions, limitations, and notices.

THIS DOCUMENT AND ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, “MATERIALS”) ARE BEING PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL NVIDIA BE LIABLE FOR ANY DAMAGES, INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF ANY USE OF THIS DOCUMENT, EVEN IF NVIDIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards customer for the products described herein shall be limited in accordance with the Terms of Sale for the product.

VESA DisplayPort

DisplayPort and DisplayPort Compliance Logo, DisplayPort Compliance Logo for Dual-mode Sources, and DisplayPort Compliance Logo for Active Cables are trademarks owned by the Video Electronics Standards Association in the United States and other countries.

HDMI

HDMI, the HDMI logo, and High-Definition Multimedia Interface are trademarks or registered trademarks of HDMI Licensing LLC.

OpenCL

OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc.

Trademarks

NVIDIA, the NVIDIA logo, DGX, DGX-1, DGX-2, Tesla, and Quadro are trademarks and/or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.