NVIDIA GPUDirect Storage Troubleshooting Guide

The NVIDIA® GPUDirect® Storage Troubleshooting Guide describes how to debug and isolate performance and functional problems that are related to GDS and is intended for systems administrators and developers.

1. Introduction

This guide describes how to debug and isolate the NVIDIA® GPUDirect® Storage (GDS)-related performance and functional problems and is intended for systems administrators and developers.

GDS enables a direct data path for direct memory access (DMA) transfers between GPU memory and storage, which avoids a bounce buffer through the CPU. This direct path increases system bandwidth and decreases the latency and utilization load on the CPU.

Creating this direct path involves distributed filesystems like DDN EXAScaler® parallel filesystem solutions (based on the Lustre filesystem) and WekaFS, so the GDS environment is composed of multiple software and hardware components. This guide addresses questions that are related to the GDS installation and helps you triage functionality and performance issues. For non-GDS issues, contact the respective OEM or filesystems vendor to understand and debug the issue.


For GDS support, contact GPUDirectStorageSupportExt@nvidia.com.

The following GDS technical specifications and guides provide additional context for the optimal use of and understanding of the solution:

On the NVIDIA Developer Blog, read: GPUDirect Storage: A Direct Path Between Storage and GPU Memory.

2. Troubleshooting Installation

The following scenarios might occur during GDS installation.

2.1. Before You Install GDS

Here are the software requirements that you need to install GDS:

  • Ubuntu 18.04 and 20.04
  • MOFED 5.1-0.6.6.0 and later, which supports NVMe, NVMeoF, and NFSoRDMA (VAST) on Linux kernels 4.15.x and 5.4.x
  • The following distributed filesystems:
    • WekaFS 3.8.0
    • DDN Exascaler 5.2
    • VAST

2.2. Verifying a Successful GDS Installation

This section provides information about how you can verify whether your GDS installation was successful.

To verify that the GDS installation was successful, run gdscheck:
$ /usr/local/gds/tools/gdscheck -p

The output of this command shows whether the EXAScaler® or WekaIO filesystem that is installed on the system supports GDS. The output also shows whether PCIe ACS is enabled on any of the PCI switches.

Note: For best GDS performance, disable PCIe ACS.

Here is sample output for this example:

GDS release version (beta-partner): 0.8
nvidia_fs version:  2.2 libcufile version: 2.2
CUfile CONFIGURATION:
LUSTRE support : 1
WEKAFS support : 1
properties.use_compat_mode : 0
properties.use_poll_mode : 0
properties.poll_mode_max_size_kb : 32
properties.max_batch_io_timeout_msecs : 5
properties.max_direct_io_size_kb : 16384
properties.max_device_cache_size_kb : 131072
properties.max_device_pinned_mem_size_kb : 33554432
fs.generic.posix_unaligned_writes : 0
fs.lustre.posix_gds_min_kb: 0
profile.nvtx : 0
profile.cufile_stats : 0
GPU INFO:
GPU Index: 0 bar:1 bar size (MB):32768
GPU Index: 1 bar:1 bar size (MB):32768
GPU Index: 2 bar:1 bar size (MB):32768
GPU Index: 3 bar:1 bar size (MB):32768
GPU Index: 4 bar:1 bar size (MB):32768
GPU Index: 5 bar:1 bar size (MB):32768
GPU Index: 6 bar:1 bar size (MB):32768
GPU Index: 7 bar:1 bar size (MB):32768
GPU Index: 8 bar:1 bar size (MB):32768
GPU Index: 9 bar:1 bar size (MB):32768
GPU Index: 10 bar:1 bar size (MB):32768
GPU Index: 11 bar:1 bar size (MB):32768
GPU Index: 12 bar:1 bar size (MB):32768
GPU Index: 13 bar:1 bar size (MB):32768
GPU Index: 14 bar:1 bar size (MB):32768
GPU Index: 15 bar:1 bar size (MB):32768
GPU index 0 Tesla V100-SXM3-32GB-H supports GDS
GPU index 1 Tesla V100-SXM3-32GB-H supports GDS
GPU index 2 Tesla V100-SXM3-32GB-H supports GDS
GPU index 3 Tesla V100-SXM3-32GB-H supports GDS
GPU index 4 Tesla V100-SXM3-32GB-H supports GDS
GPU index 5 Tesla V100-SXM3-32GB-H supports GDS
GPU index 6 Tesla V100-SXM3-32GB-H supports GDS
GPU index 7 Tesla V100-SXM3-32GB-H supports GDS
GPU index 8 Tesla V100-SXM3-32GB-H supports GDS
GPU index 9 Tesla V100-SXM3-32GB-H supports GDS
GPU index 10 Tesla V100-SXM3-32GB-H supports GDS
GPU index 11 Tesla V100-SXM3-32GB-H supports GDS
GPU index 12 Tesla V100-SXM3-32GB-H supports GDS
GPU index 13 Tesla V100-SXM3-32GB-H supports GDS
GPU index 14 Tesla V100-SXM3-32GB-H supports GDS
GPU index 15 Tesla V100-SXM3-32GB-H supports GDS
Platform verification succeeded

2.3. Installed GDS Libraries and Tools

Here is some information about how you can determine which GDS libraries and tools you installed.

To determine the GDS libraries and tools you installed, review the /usr/local/gds/ path:
$ ls -lh /usr/local/gds/
total 24k
drwxr-xr-x 3 root root 4.0K Apr 20 13:26 drivers
drwxr-xr-x 2 root root 4.0K Apr 21 23:08 lib
-rw-r--r-- 1 root root 4.5K Apr 20 13:26 README
drwxr-xr-x 2 root root 4.0K Apr 20 13:26 scripts
drwxr-xr-x 6 root root 4.0K Apr 21 23:50 tools

2.4. JSON Config Parameters Used by GDS

Here is a list of the JSON configuration parameters that are used by GDS.

Table 1. JSON Configuration Parameters

logging:dir
    Specifies the log directory for the cufile.log file. If this parameter is not set, the log file is created in the application's current working directory. The default value is the current working directory.

logging:level
    Indicates the type of messages that will be logged:
      • ERROR logs only critical errors.
      • INFO logs informational messages, including errors.
      • DEBUG logs error, informational, and library debugging messages.
      • TRACE is the lowest log level and logs all possible messages in the library.
    The default value is ERROR.

properties:max_direct_io_size_kb
    Indicates the maximum unit of IO size, in KB, that is exchanged between the cuFile library and the storage system. The default value is 16384 (16 MB).

properties:max_device_cache_size_kb
    Indicates the maximum per-GPU memory size, in KB, that can be reserved for internal bounce buffers. The default value is 131072 (128 MB).

properties:max_device_pinned_mem_size_kb
    Indicates the maximum per-GPU memory size, in KB, including the memory for the internal bounce buffers, that can be pinned. The default value is 33554432 (32 GB).

properties:use_poll_mode
    Boolean that indicates whether the cuFile library uses polling or a synchronous wait for the storage to complete IO. Polling might be useful for small IO transactions. The default value is false.

properties:poll_mode_max_size_kb
    The maximum IO size, in KB, that is used as the threshold when poll mode is set to true.

properties:allow_compat_mode
    If this parameter is set to true, the cuFile APIs remain functional by falling back to the POSIX compatibility mode when the GDS path is not available. The purpose is to test newer filesystems, to support environments where GDS applications do not have the kernel driver installed or before the driver is installed, or to complete comparison tests.

properties:rdma_dev_addr_list
    Provides the list of IPv4 addresses for all the interfaces that can be used for RDMA.

properties:rdma_load_balancing_policy
    Specifies the load balancing policy for RDMA memory registration. The default value is RoundRobin, which uses only the NICs that are closest to the GPU for memory registration, in a round-robin fashion.

fs:generic:posix_unaligned_writes
    If this parameter is set to true, the GDS path is disabled for unaligned writes, and they go through the POSIX compatibility mode.

fs:lustre:posix_gds_min_kb
    Applicable only to the EXAScaler® filesystem; provides an option to fall back to the POSIX compatible mode for IO sizes that are smaller than the set KB value. This applies to reads and writes.

fs:weka:rdma_write_support
    If this parameter is set to true, cuFileWrite uses RDMA writes instead of falling back to POSIX writes for a WekaFS mount.

denylist:drivers
    An administrative setting that disables supported storage drivers on the node.

denylist:devices
    An administrative setting that disables specific supported block devices on the node. Not applicable for DFS.

denylist:mounts
    An administrative setting that disables specific mounts in the supported GDS-enabled filesystems on the node.

denylist:filesystems
    An administrative setting that disables specific supported GDS-ready filesystems on the node.

profile:nvtx
    Boolean; if set to true, NVTX traces are generated for profiling.

profile:cufile_stats
    An integer that ranges between 0 and 3, in increasing order of verbosity, that shows GDS per-process user-space statistics.

CUFILE_ENV_PATH_JSON is an environment variable that overrides the default /etc/cufile.json path for a specific application instance. Use it to apply different settings per application, for example to further restrict access with the denylist options when the application is not ready for a particular filesystem or mount path.
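
The usual approach is to export CUFILE_ENV_PATH_JSON in the shell before launching the application, as shown in Enabling a Different cufile.log File for Each Application. As a minimal sketch only, and assuming the library reads the variable when it is first initialized rather than at load time, a process could also set it programmatically before its first cuFile call; the path /opt/myapp/cufile.json is a placeholder:

#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <stdio.h>
#include <cufile.h>

int main(void)
{
    /* Placeholder path; must be set before the first cuFile API call
       so that the library picks up the per-application configuration. */
    setenv("CUFILE_ENV_PATH_JSON", "/opt/myapp/cufile.json", 1);

    CUfileError_t status = cuFileDriverOpen();
    if (status.err != CU_FILE_SUCCESS) {
        fprintf(stderr, "cuFileDriverOpen failed, error %d\n", status.err);
        return 1;
    }
    /* ... perform cuFile IO ... */
    cuFileDriverClose();
    return 0;
}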

2.5. Determining Which Version of GDS is Installed

Here is some information about how you can determine your GDS version.

To determine which version of GDS you have, run the following command:
$ /usr/local/gds/tools/gdscheck -v
Review the output, for example:
GDS release version (beta): 0.8
nvidia_fs version:  2.0 libcufile version: 2.2

3. API Errors

This section provides information about the API errors you might get when using GDS.

3.1. CU_FILE_DRIVER_NOT_INITIALIZED

Here is some information about the CU_FILE_DRIVER_NOT_INITIALIZED API error.

If the cuFileDriverOpen API is not called, errors that are encountered in the implicit driver initialization are reported as cuFile errors when cuFileBufRegister or cuFileHandleRegister is called.
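
To surface initialization problems at a well-defined point, you can call cuFileDriverOpen explicitly at application startup and check its status before any handles or buffers are registered. Here is a minimal sketch (error handling only, no IO):

#include <stdio.h>
#include <cufile.h>

int main(void)
{
    /* Explicit driver initialization: failures are reported here
       instead of inside a later cuFileHandleRegister or
       cuFileBufRegister call. */
    CUfileError_t status = cuFileDriverOpen();
    if (status.err != CU_FILE_SUCCESS) {
        fprintf(stderr, "cuFileDriverOpen failed with error %d\n", status.err);
        return 1;
    }

    /* ... open files, register handles and buffers, perform IO ... */

    cuFileDriverClose();
    return 0;
}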

3.2. CU_FILE_DEVICE_NOT_SUPPORTED

Here is some information about the CU_FILE_DEVICE_NOT_SUPPORTED error.

GDS is supported only on NVIDIA Tesla® or Quadro® model graphics processing units (GPUs) that support compute mode and have a compute major capability greater than or equal to 6. This includes V100 and T4 cards.
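
One way to confirm this requirement from code is to query the compute capability of each visible device with the CUDA runtime and check that the major version is at least 6. Here is a short sketch:

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed\n");
        return 1;
    }
    for (int i = 0; i < count; i++) {
        struct cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess)
            continue;
        /* GDS requires a compute major capability of 6 or higher. */
        printf("GPU %d (%s): compute %d.%d -> %s\n",
               i, prop.name, prop.major, prop.minor,
               prop.major >= 6 ? "meets GDS requirement" : "not supported");
    }
    return 0;
}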

3.3. CU_FILE_IO_NOT_SUPPORTED

Here is some information about the CU_FILE_IO_NOT_SUPPORTED error.

GDS is currently supported only on the EXAScaler® and WekaIO filesystems. If the file descriptor is from a local filesystem, or from a mount that is not GDS ready, the API returns this error.

Here is a list of common reasons for this error:
  • The file descriptor belongs to an unsupported filesystem.
  • The specified fd is not a regular UNIX file.
  • O_DIRECT is not specified on the file (see the sketch after this list).
  • Any combination of encryption, compression, and compliance settings is set on the fd.

    For example, FS_COMPR_FL | FS_ENCRYPT_FL | FS_APPEND_FL | FS_IMMUTABLE_FL.

    Note: These settings are allowed when compat_mode is set to true.
  • Any combination of unsupported file modes is specified in the open call for the fd, for example:
O_APPEND | O_NOCTTY | O_NONBLOCK | O_DIRECTORY | O_NOFOLLOW | O_TMPFILE
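
The following sketch opens a file with O_DIRECT and registers it with cuFileHandleRegister; if the underlying filesystem or mount is not GDS ready, the registration step is where the error is reported. The path /mnt/gds/test.dat is a placeholder for a file on a GDS-enabled mount:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <cufile.h>

int main(void)
{
    /* O_DIRECT is required for the GDS path; the path is a placeholder. */
    int fd = open("/mnt/gds/test.dat", O_RDWR | O_DIRECT);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    CUfileDescr_t descr;
    memset(&descr, 0, sizeof(descr));
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;

    CUfileHandle_t handle;
    CUfileError_t status = cuFileHandleRegister(&handle, &descr);
    if (status.err != CU_FILE_SUCCESS) {
        fprintf(stderr, "cuFileHandleRegister failed, error %d\n", status.err);
        close(fd);
        return 1;
    }

    /* ... cuFileRead/cuFileWrite ... */

    cuFileHandleDeregister(handle);
    close(fd);
    return 0;
}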

3.4. CU_FILE_CUDA_MEMORY_TYPE_INVALID

Here is some information about the CU_FILE_CUDA_MEMORY_TYPE_INVALID error.

Physical memory for cudaMallocManaged memory is allocated dynamically at the first use and currently does not provide a mechanism to expose physical memory or Base Address Register (BAR) memory to pin for use in GDS. However, GDS indirectly supports cudaMallocManaged memory when the memory is used as an unregistered buffer with cuFileWrite and cuFileRead.
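
As a sketch of this indirect path, managed memory can be passed to cuFileWrite without a prior cuFileBufRegister call (registering managed memory would return CU_FILE_CUDA_MEMORY_TYPE_INVALID). The helper below is hypothetical and assumes a file handle that was registered as shown in the previous section:

#include <stdio.h>
#include <cuda_runtime.h>
#include <cufile.h>

/* Write `size` bytes of managed memory to `fh` at file offset 0.
   The buffer is intentionally NOT registered with cuFileBufRegister:
   managed (cudaMallocManaged) memory is only supported as an
   unregistered buffer with cuFileRead/cuFileWrite. */
static int write_managed(CUfileHandle_t fh, size_t size)
{
    void *buf = NULL;
    if (cudaMallocManaged(&buf, size, cudaMemAttachGlobal) != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged failed\n");
        return -1;
    }
    cudaMemset(buf, 0xab, size);      /* fill with a test pattern */
    cudaDeviceSynchronize();

    ssize_t ret = cuFileWrite(fh, buf, size, 0 /* file offset */, 0 /* buffer offset */);
    if (ret < 0)
        fprintf(stderr, "cuFileWrite failed: %zd\n", ret);

    cudaFree(buf);
    return ret < 0 ? -1 : 0;
}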

4. Basic Troubleshooting

This section provides information about basic troubleshooting for GDS.

4.1. Log Files for the GDS Library

Here is some information about troubleshooting the GDS library log files.

A cufile.log file is created in the same location where the application binaries are located. Currently the maximum log file size is 32MB. If the log file size increases to greater than 32MB, the log file is truncated and logging is resumed on the same file.

4.2. Enabling a Different cufile.log File for Each Application

You can enable a different cufile.log file for each application.

There are several relevant cases:
  • If the logging:dir property in the default /etc/cufile.json file is not set, by default, the cufile.log file is generated in the current working directory of the application.
  • If the logging:dir property is set in the default /etc/cufile.json file, the log file is created in the specified directory path.
Note: This is usually not recommended for scenarios where multiple applications use the libcufile.so library.
For example:
 "logging": {
    // log directory, if not enabled 
    // will create log file under current working      
    // directory
      "dir": "/opt/gdslogs/",
}

The cufile.log will be created as a /opt/gdslogs/cufile.log file.

To enable a different cufile.log file for each application, override the default JSON path by completing the following steps:

  1. Export CUFILE_ENV_PATH_JSON="/opt/myapp/cufile.json".
  2. Edit the /opt/myapp/cufile.json file.
    "logging": {
        // log directory, if not enabled 
        // will create log file under current working 
        // directory
        "dir": "/opt/myapp",
    }
  3. Run the application.
  4. To check for logs, run $ ls -l /opt/myapp/cufile.log.

4.3. Enabling Tracing GDS Library API Calls

There are different logging levels, which can be enabled in the /etc/cufile.json file.

By default, the logging level is set to ERROR. Logging has a performance impact as the verbosity level increases (INFO, DEBUG, and TRACE), so the higher levels should be enabled only to debug field issues.
Set the level in the logging section of the /etc/cufile.json file:
"logging": {
// log directory, if not enabled // will create log file under local directory//"dir": "/home/<xxxx>",

// ERROR|WARN|INFO|DEBUG|TRACE (in decreasing order of priority)
"level": "ERROR"
},

4.4. cuFileHandleRegister Error

Here is some information about the cuFileHandleRegister error.

If you see this error in the cufile.log file when an IO is issued:
“cuFileHandleRegister error: GPUDirect Storage not supported on current file.”
Here are some reasons why this error might occur:
  • The filesystem is not supported by GDS.

    See CU_FILE_IO_NOT_SUPPORTED for more information.

  • The DIRECT_IO functionality is not supported for the mount on which the file resides.

For more information, enable tracing in the /etc/cufile.json file.

4.5. Troubleshooting Applications that Return cuFile Errors

Here is some information about how to troubleshoot cuFile errors.

To debug these errors:

  1. See the cufile.h file for more information about errors that are returned by the API.
  2. If the IO was submitted to the GDS driver, check whether there are any errors in GDS stats.

    If the IO fails, the error stats should provide information about the type of error.

    See Finding the GDS Driver Statistics for more information.

  3. Enable GDS library tracing and monitor the cufile.log file.
  4. Enable GDS Driver debugging:
    $ echo 1 >/sys/module/nvidia_fs/parameters/dbg_enabled
After the driver debug logs are enabled, you might get more information about the error.
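
When triaging these errors, it also helps to print both fields of the returned CUfileError_t: err carries the cuFile status, and for CU_FILE_CUDA_DRIVER_ERROR the cu_err field carries the underlying CUresult. The helper below is a hypothetical sketch that assumes cufileop_status_error from cufile.h is available for converting the status to a string:

#include <stdio.h>
#include <cuda.h>
#include <cufile.h>

/* Hypothetical helper: log both the cuFile status and, when present,
   the underlying CUDA driver error carried in cu_err. */
static void report_cufile_error(const char *api, CUfileError_t status)
{
    if (status.err == CU_FILE_SUCCESS)
        return;
    fprintf(stderr, "%s failed: %s (%d)\n",
            api, cufileop_status_error(status.err), status.err);
    if (status.err == CU_FILE_CUDA_DRIVER_ERROR) {
        const char *name = NULL;
        cuGetErrorName(status.cu_err, &name);
        fprintf(stderr, "  underlying CUDA driver error: %s\n",
                name ? name : "unknown");
    }
}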

4.6. cuFile-* Errors with No Activity in the GDS Statistics

This section provides information about a scenario where there are cuFile errors but no activity in the GDS statistics.

This issue means that the API failed in the GDS library. You can enable tracing by setting the appropriate logging level in the /etc/cufile.json file to get more information about the failure in cufile.log.

4.7. CUDA Runtime and Driver Mismatch with Error Code 35

Here is some information about how to resolve CUDA error 35.

Error code 35 from the CUDA documentation points to cudaErrorInsufficientDriver, which indicates that the installed NVIDIA CUDA® driver is older than the CUDA runtime library. This is not a supported configuration. For the application to run, you must update the NVIDIA display driver.

Note: cuFile tools depend on CUDA runtime 10.1 and later. You must ensure that the installed CUDA runtime is compatible with the installed CUDA driver and is at the recommended version.
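
To confirm whether a mismatch is the problem, compare the versions reported by the CUDA runtime; the following sketch prints both and flags the unsupported configuration:

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int runtime_ver = 0, driver_ver = 0;
    cudaRuntimeGetVersion(&runtime_ver);   /* version the application was built against */
    cudaDriverGetVersion(&driver_ver);     /* maximum version the installed driver supports */
    printf("CUDA runtime: %d, CUDA driver: %d\n", runtime_ver, driver_ver);
    if (driver_ver < runtime_ver)
        printf("Driver is older than the runtime (cudaErrorInsufficientDriver); "
               "update the NVIDIA display driver.\n");
    return 0;
}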

4.8. CUDA API Errors when Running the cuFile-* APIs

Here is some information about CUDA API errors.

The GDS library uses the CUDA driver APIs. If a CUDA API call fails, an error code is reported; refer to the error codes in the CUDA Libraries documentation for more information.
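
For example, a CUDA driver API error code (CUresult) can be converted into a readable name and description with cuGetErrorName and cuGetErrorString; the helper below is a hypothetical sketch:

#include <stdio.h>
#include <cuda.h>

/* Hypothetical helper that prints a readable message for a CUresult. */
static void print_cuda_error(CUresult res)
{
    const char *name = NULL, *desc = NULL;
    cuGetErrorName(res, &name);
    cuGetErrorString(res, &desc);
    fprintf(stderr, "CUDA driver error %d: %s - %s\n",
            (int)res, name ? name : "unknown", desc ? desc : "no description");
}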

4.9. Finding GDS Driver Statistics

Here is some information about how you can find the driver statistics.

To find the GDS Driver Statistics, run the following command:
$ cat /proc/driver/nvidia-fs/stats

See Finding GDS Statistics for more information about statistics.

GDS Driver kernel statistics for READ/WRITE are available only for the EXAScaler® filesystem. See the WekaIO filesystem-specific section for more information about READ/WRITE.

4.10. Tracking IO Activity that Goes Through the GDS Driver

Here is some information about tracking IO activity.

In the GDS Driver statistics, the ops row shows the active IO operations, and the Read and Write fields show the operations currently in flight. This indicates how many total IOs are in flight across all applications in the kernel. If there is a bottleneck in user space, the number of active IOs will be less than the number of threads that are submitting IO. Additionally, to get more details about the Read and Write bandwidth numbers, look at the counters in the Read/Write rows.

4.11. Read/Write Bandwidth and Latency Numbers in GDS Stats

Here is some information about Read/Write bandwidth and latency numbers in GDS.

Measured latencies begin when the IO is submitted and end when the IO completion is received by the GDS kernel driver. Userspace latencies are not reported. This should indicate whether the bottleneck is in user space or in the backend disks and fabric.

Note: The WekaIO filesystem reads do not go through the nvidia-fs driver, so Read/Write bandwidth stats are not available for WekaIO filesystem by using this interface.

See the WekaIO filesystem-specific statistics for more information.

4.12. Tracking Registration and Deregistration of GPU Buffers

This section provides information about registering and deregistering GPU buffers.

In the GDS Driver stats, look for the active field in the BAR1-map stats row. Pinning and unpinning GPU memory through cuFileBufRegister and cuFileBufDeregister is an expensive operation. If you notice a large number of registrations (n) and deregistrations (free) in the nvidia-fs stats, this can hurt performance.
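
A common way to avoid this cost is to register a buffer once, reuse it across many IOs, and deregister it only at teardown. The helper below is a hypothetical sketch; it assumes a file handle that was registered as shown earlier, and the buffer size and iteration count are placeholders:

#include <stdio.h>
#include <cuda_runtime.h>
#include <cufile.h>

/* Hypothetical helper: register the GPU buffer once, reuse it for
   many reads, and deregister it only at the end, instead of paying the
   cuFileBufRegister/cuFileBufDeregister cost on every IO. */
static int read_many(CUfileHandle_t fh, size_t buf_size, int iterations)
{
    void *gpu_buf = NULL;
    if (cudaMalloc(&gpu_buf, buf_size) != cudaSuccess)
        return -1;

    CUfileError_t status = cuFileBufRegister(gpu_buf, buf_size, 0);
    if (status.err != CU_FILE_SUCCESS) {
        fprintf(stderr, "cuFileBufRegister failed: %d\n", status.err);
        cudaFree(gpu_buf);
        return -1;
    }

    for (int i = 0; i < iterations; i++) {
        off_t file_offset = (off_t)i * (off_t)buf_size;
        ssize_t ret = cuFileRead(fh, gpu_buf, buf_size, file_offset, 0);
        if (ret < 0) {
            fprintf(stderr, "cuFileRead failed at iteration %d: %zd\n", i, ret);
            break;
        }
    }

    cuFileBufDeregister(gpu_buf);   /* single deregistration at teardown */
    cudaFree(gpu_buf);
    return 0;
}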

Notices

Notice

This document is provided for information purposes only and shall not be regarded as a warranty of a certain functionality, condition, or quality of a product. NVIDIA Corporation (“NVIDIA”) makes no representations or warranties, expressed or implied, as to the accuracy or completeness of the information contained in this document and assumes no responsibility for any errors contained herein. NVIDIA shall have no liability for the consequences or use of such information or for any infringement of patents or other rights of third parties that may result from its use. This document is not a commitment to develop, release, or deliver any Material (defined below), code, or functionality.

NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any other changes to this document, at any time without notice.

Customer should obtain the latest relevant information before placing orders and should verify that such information is current and complete.

NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgement, unless otherwise agreed in an individual sales agreement signed by authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA hereby expressly objects to applying any customer general terms and conditions with regards to the purchase of the NVIDIA product referenced in this document. No contractual obligations are formed either directly or indirectly by this document.

NVIDIA products are not designed, authorized, or warranted to be suitable for use in medical, military, aircraft, space, or life support equipment, nor in applications where failure or malfunction of the NVIDIA product can reasonably be expected to result in personal injury, death, or property or environmental damage. NVIDIA accepts no liability for inclusion and/or use of NVIDIA products in such equipment or applications and therefore such inclusion and/or use is at customer’s own risk.

NVIDIA makes no representation or warranty that products based on this document will be suitable for any specified use. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to evaluate and determine the applicability of any information contained in this document, ensure the product is suitable and fit for the application planned by customer, and perform the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this document. NVIDIA accepts no liability related to any default, damage, costs, or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.


No license, either expressed or implied, is granted under any NVIDIA patent right, copyright, or other NVIDIA intellectual property right under this document. Information published by NVIDIA regarding third-party products or services does not constitute a license from NVIDIA to use such products or services or a warranty or endorsement thereof. Use of such information may require a license from a third party under the patents or other intellectual property rights of the third party, or a license from NVIDIA under the patents or other intellectual property rights of NVIDIA.

Reproduction of information in this document is permissible only if approved in advance by NVIDIA in writing, reproduced without alteration and in full compliance with all applicable export laws and regulations, and accompanied by all associated conditions, limitations, and notices.

THIS DOCUMENT AND ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, “MATERIALS”) ARE BEING PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL NVIDIA BE LIABLE FOR ANY DAMAGES, INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF ANY USE OF THIS DOCUMENT, EVEN IF NVIDIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards customer for the products described herein shall be limited in accordance with the Terms of Sale for the product.

VESA DisplayPort

DisplayPort and DisplayPort Compliance Logo, DisplayPort Compliance Logo for Dual-mode Sources, and DisplayPort Compliance Logo for Active Cables are trademarks owned by the Video Electronics Standards Association in the United States and other countries.

HDMI

HDMI, the HDMI logo, and High-Definition Multimedia Interface are trademarks or registered trademarks of HDMI Licensing LLC.

OpenCL

OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc.


Trademarks

NVIDIA, the NVIDIA logo, DGX, DGX-1, DGX-2, Tesla, and Quadro are trademarks and/or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.