cuFile Compatibility Mode

Use Cases

Compatibility mode allows cuFile APIs to be used in the following scenarios:

  • Developers who build GPUDirect Storage applications with cuFile APIs but do not have a supported hardware configuration.
  • Developers who build applications for GPUs with CUDA compute capability > 6 that do not expose BAR space.
  • Deployments where nvidia-fs.ko is not loaded or cannot be loaded.
  • Deployments where the Linux distribution does not support GPUDirect Storage.
  • Deployments where the filesystem may not be supported with GPUDirect Storage.
  • Deployments where the network links are not enabled with RDMA support.
  • Deployments where the configuration is not optimal for GPUDirect Storage.
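One of the scenarios above is a deployment where nvidia-fs.ko is not loaded. As a diagnostic aid, the following minimal sketch checks /proc/modules for a module named nvidia_fs, which is the name under which the kernel lists nvidia-fs.ko (hyphens in module names become underscores). It assumes a Linux host; both function names are hypothetical and are not part of the cuFile API, and the cuFile library decides whether to use compatibility mode on its own, independently of any such check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns true if `listing` (text in /proc/modules
 * format, one module per line, module name first) contains a line whose
 * first field is exactly "nvidia_fs". */
bool module_list_has_nvidia_fs(const char *listing)
{
    static const char name[] = "nvidia_fs";
    const size_t len = sizeof name - 1;

    assert(listing != NULL);
    for (const char *line = listing; *line != '\0';) {
        /* Compare the first field only, so "nvidia_fsx" or a mention of
         * nvidia_fs in another module's "used by" column does not match. */
        if (strncmp(line, name, len) == 0) {
            char next = line[len];
            if (next == ' ' || next == '\t' || next == '\n' || next == '\0')
                return true;
        }
        const char *nl = strchr(line, '\n');
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return false;
}

/* Hypothetical helper: returns 1 if nvidia_fs appears in /proc/modules,
 * 0 if it does not, and -1 if /proc/modules cannot be opened. */
int nvidia_fs_loaded(void)
{
    FILE *f = fopen("/proc/modules", "r");
    if (f == NULL)
        return -1;

    char buf[512];
    bool at_line_start = true;
    int found = 0;
    while (fgets(buf, sizeof buf, f) != NULL) {
        /* Only test chunks that begin a new line; a very long line may be
         * split across several fgets() calls. */
        if (at_line_start && module_list_has_nvidia_fs(buf)) {
            found = 1;
            break;
        }
        at_line_start = strchr(buf, '\n') != NULL;
    }
    fclose(f);
    return found;
}
```

Inside a Docker container, /proc/modules reflects the host kernel, so the module can appear loaded even when its character devices are not exposed to the container, which is another way to force compatibility mode.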

Behavior

The cuFile library provides a compatibility mode in which cuFile reads and writes use the POSIX pread, pwrite, and aio_submit APIs to transfer data through host memory, copying it to or from GPU memory when applicable. The behavior of compatibility mode with cuFile APIs is determined by the following configuration parameters.
  • "allow_compat_mode" (default: true): If true, falls back to compatibility mode when the library detects that the buffer or the opened file descriptor cannot use GPUDirect Storage.
  • "force_compat_mode" (default: false): If true, forces all IO to use compatibility mode. Alternatively, the administrator can unload nvidia_fs.ko or not expose the character devices in the Docker container environment.
  • "gds_rdma_write_support" (default: true): If false, forces compatibility mode for writes even when the underlying filesystem is capable of GPUDirect Storage writes.
    Note: If set to false, this option overrides and disables any filesystem-specific option that enables RDMA writes.
  • "posix_unaligned_writes" (default: false): If true, forces compatibility mode for writes where the file offset and/or IO size is not aligned to a page boundary (4 KB).
  • "lustre:posix_gds_min_kb" (default: 0): For a Lustre filesystem, if greater than 0, compatibility mode is used for IO sizes from 1 KB up to posix_gds_min_kb KB.
    Note: This option forces POSIX mode even if "allow_compat_mode" is set to false.
  • "weka:rdma_write_support" (default: false): If false, all writes to WekaFS use compatibility mode.
    Note: If set to false, the cuFile library uses the POSIX path regardless of whether "allow_compat_mode" is true or false.
  • "gpfs:gds_write_support" (default: false): If false, all writes to IBM Spectrum Scale use compatibility mode.
    Note: If set to false, the cuFile library uses the POSIX path regardless of whether "allow_compat_mode" is true or false.
  • "rdma_dynamic_routing" (default: false) and "rdma_dynamic_routing_order": If "rdma_dynamic_routing" is set to true and "rdma_dynamic_routing_order" is set to [ "SYS_MEM" ], all IO for distributed filesystems (DFS) uses compatibility mode.

In addition to the configuration options above, compatibility mode is used as a fallback in the following cases:

  • No BAR1 memory in the GPU: use compatibility mode.
  • For WekaFS or IBM Spectrum Scale mounts, if no rdma_dev_addr_list is specified or registering the memory region (MR) with the IB device fails: use compatibility mode.
  • Bounce buffers cannot be allocated in GPU memory: use compatibility mode.
  • For WekaFS and IBM Spectrum Scale, if the kernel returns -ENOTSUP for a GPUDirect Storage read or write: the IO operation is retried internally using compatibility mode.
  • cuFile Stream and cuFile Batch APIs on IBM Spectrum Scale or WekaFS: all async and batch operations internally use compatibility mode IO.
  • The nvidia_fs.ko driver is not loaded: all IO operations use compatibility mode.
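The -ENOTSUP retry and the general compatibility-mode data path (a POSIX read into host memory followed by a copy to GPU memory) can be sketched as follows. This is an illustration under stated assumptions, not the cuFile implementation: direct_read_fn and h2d_copy_fn are hypothetical hooks, direct_unsupported is a stand-in for a kernel that rejects the direct read, and host_memcpy_copy uses plain memcpy where a CUDA program would use cudaMemcpy with cudaMemcpyHostToDevice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical hook for a direct (GPUDirect Storage) read attempt.
 * Returns the number of bytes read, or a negative errno value. */
typedef ssize_t (*direct_read_fn)(int fd, void *dev_buf, size_t size, off_t offset);

/* Hypothetical host-to-device copy hook; returns 0 on success. */
typedef int (*h2d_copy_fn)(void *dev_buf, const void *host_buf, size_t n);

/* Compatibility-mode read: pread() into a host bounce buffer in chunks,
 * then copy each chunk to the destination buffer. */
static ssize_t compat_read(int fd, void *dev_buf, size_t size, off_t offset,
                           void *bounce, size_t bounce_size, h2d_copy_fn copy)
{
    assert(bounce != NULL && bounce_size > 0);
    size_t done = 0;
    while (done < size) {
        size_t chunk = size - done < bounce_size ? size - done : bounce_size;
        ssize_t n = pread(fd, bounce, chunk, offset + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            return -errno;
        }
        if (n == 0)
            break; /* end of file */
        if (copy((char *)dev_buf + done, bounce, (size_t)n) != 0)
            return -EIO;
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Try the direct path; if it reports -ENOTSUP, retry the same IO in
 * compatibility mode. Other errors are returned unchanged. */
ssize_t read_with_fallback(int fd, void *dev_buf, size_t size, off_t offset,
                           direct_read_fn direct, void *bounce,
                           size_t bounce_size, h2d_copy_fn copy)
{
    ssize_t ret = direct(fd, dev_buf, size, offset);
    if (ret == -ENOTSUP)
        ret = compat_read(fd, dev_buf, size, offset, bounce, bounce_size, copy);
    return ret;
}

/* Stand-in for a direct read that the kernel rejects with -ENOTSUP. */
ssize_t direct_unsupported(int fd, void *dev_buf, size_t size, off_t offset)
{
    (void)fd; (void)dev_buf; (void)size; (void)offset;
    return -ENOTSUP;
}

/* Stand-in copy that treats the destination as ordinary host memory; a CUDA
 * program would call cudaMemcpy(..., cudaMemcpyHostToDevice) instead. */
int host_memcpy_copy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
    return 0;
}
```

Retrying only on -ENOTSUP, rather than on every error, keeps genuine IO failures such as -EIO visible to the caller instead of silently masking them with a second attempt.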

Limitations

  • Compatibility mode does not work on GPUs with CUDA compute capability less than 6.
  • GDS compatibility mode has been tested and works with GDS-enabled filesystems and environments. It has not been tested with all other filesystems.
© Copyright 2024, NVIDIA. Last updated on Apr 3, 2024.