GPUDirect Storage Parameters

This section describes the JSON configuration parameters used by GDS.

When GDS is installed, the /etc/cufile.json parameter file is installed with default values. The implementation allows for generic GDS settings and parameters specific to a file system or storage partner.

Note: Consider compat_mode for systems or mounts that are not yet set up with GDS support.
Table 1. GPUDirect Storage cufile.json Variables
Parameter Default Value Description
logging:dir CWD Location of the GDS log file.
logging:level ERROR Verbosity of logging.
profile:nvtx false Boolean that, if set to true, generates NVTX traces for profiling.
profile:cufile_stats 0 Enable cuFile IO stats. Level 0 means no cuFile statistics.
properties:max_direct_io_size_kb 16384 Maximum IO chunk size (4K aligned) used by cuFile for each IO request (in KB).
properties:max_device_cache_size_kb 131072 Maximum device memory size (4K aligned) for reserving bounce buffers for the entire GPU (in KB).
properties:max_device_pinned_mem_size_kb 33554432 Maximum per-GPU memory size in KB, including the memory for the internal bounce buffers, that can be pinned.
properties:use_poll_mode false Boolean that indicates whether the cuFile library uses polling or synchronous wait for the storage to complete IO. Polling might be useful for small IO transactions. Refer to Poll Mode below.
properties:poll_mode_max_size_kb 4 Maximum IO request size (4K aligned), in KB, at or below which the library polls for completion.
properties:allow_compat_mode false If true, enables the compatibility mode, which allows cuFile to issue POSIX read/write. To switch to GDS-enabled I/O, set this to false. Refer to Compatibility Mode below.
properties:rdma_dev_addr_list empty Provides the list of IPv4 addresses for all the interfaces that can be used for RDMA.
properties:rdma_load_balancing_policy RoundRobin Specifies the load balancing policy for RDMA memory registration. By default, this value is set to RoundRobin. Here are the valid values that can be used for this property:

FirstFit - Suitable for cases where numGpus matches numPeers and the GPU PCIe lane width is greater than or equal to the peer PCIe lane width.

MaxMinFit - Tries to assign peers so that sharing is minimized. Suitable for cases where all GPUs are loaded uniformly.

RoundRobin - Uses only the NICs that are closest to the GPU for memory registration, in a round-robin fashion.

RoundRobinMaxMin - Similar to RoundRobin, but uses the peers with the least sharing.

Randomized - Uses only the NICs that are closest to the GPU for memory registration, in a randomized fashion.
properties:rdma_dynamic_routing false Boolean parameter applicable only to network-based file systems. It can be enabled on platforms where GPUs and NICs do not share a common PCIe root port.
properties:rdma_dynamic_routing_order   The routing order applies only if rdma_dynamic_routing is enabled. Specifies an ordered list of routing policies; when an IO is routed, the policies are tried in order and the first one that fits is selected.
fs:generic:posix_unaligned_writes false Setting to true forces the use of a POSIX write instead of cuFileWrite for unaligned writes.
fs:lustre:posix_gds_min_kb 4KB Applicable only to the EXAScaler filesystem, for both reads and writes. IO size threshold (4K aligned): for requests at or below this size, cuFile uses POSIX read/write.
fs:lustre:rdma_dev_addr_list empty Provides the list of IPv4 addresses for all the interfaces that can be used by a single Lustre mount. This property is used by the cuFile dynamic routing feature to infer preferred RDMA devices.
fs:lustre:mount_table empty Specifies a dictionary of IPv4 mount addresses against a Lustre mount point. This property is used by the cuFile dynamic routing feature. Refer to the default cufile.json for sample usage.
fs:nfs:rdma_dev_addr_list empty Provides the list of IPv4 addresses for all the interfaces a single NFS mount can use. This property is used by the cuFile dynamic routing feature to infer preferred RDMA devices.
fs:nfs:mount_table empty Specifies a dictionary of IPv4 mount addresses against an NFS mount point. This property is used by the cuFile dynamic routing feature. Refer to the default cufile.json for sample usage.
fs:weka:rdma_write_support false If set to true, cuFileWrite will use RDMA writes instead of falling back to POSIX writes for a WekaFS mount.
fs:weka:rdma_dev_addr_list empty Provides the list of IPv4 addresses for all the interfaces a single WekaFS mount can use. This property is also used by the cuFile dynamic routing feature to infer preferred RDMA devices.
fs:weka:mount_table empty Specifies a dictionary of IPv4 mount addresses against a WekaFS mount point. This property is used by the cuFile dynamic routing feature. Refer to the default cufile.json for sample usage.
denylist:drivers   Administrative setting that disables supported storage drivers on the node.
denylist:devices   Administrative setting that disables specific supported block devices on the node. Not applicable for DFS.
denylist:mounts   Administrative setting that disables specific mounts in the supported GDS-enabled filesystems on the node.
denylist:filesystems   Administrative setting that disables specific supported GDS-ready filesystems on the node.
Note: Workload/application-specific parameters can be set by using the CUFILE_ENV_PATH_JSON environment variable that is set to point to an alternate cufile.json file, for example, CUFILE_ENV_PATH_JSON=/home/gds_user/my_cufile.json.
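As a minimal sketch of the note above, the following Python writes an alternate configuration file and prepares an environment in which CUFILE_ENV_PATH_JSON points to it. The helper names (write_override_config, env_with_cufile_config) are hypothetical, and the nested JSON layout is an assumption based on the colon-separated names in Table 1. This section does not state whether the library supplies defaults for keys that the alternate file omits, so copying the installed /etc/cufile.json and editing the copy is the more conservative route.

```python
import json
import os
import tempfile


def write_override_config(params, directory):
    """Write params as JSON to <directory>/my_cufile.json and return the path."""
    path = os.path.join(directory, "my_cufile.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f, indent=2)
    return path


def env_with_cufile_config(config_path, base_env=None):
    """Return a copy of the environment with CUFILE_ENV_PATH_JSON set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUFILE_ENV_PATH_JSON"] = config_path
    return env


# Example: enable NVTX traces (profile:nvtx in Table 1) for one workload only.
workdir = tempfile.mkdtemp()
config_path = write_override_config({"profile": {"nvtx": True}}, workdir)
workload_env = env_with_cufile_config(config_path)
# Pass workload_env as env= to subprocess.run(...) when launching the workload.
print(workload_env["CUFILE_ENV_PATH_JSON"])
```

Setting the variable only in the child process environment keeps the system-wide /etc/cufile.json untouched for other applications.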
There are two mode types that you can set in the cufile.json configuration file: poll mode (properties:use_poll_mode) and compatibility mode (properties:allow_compat_mode).

From a benchmarking and performance perspective, the default settings work very well across a variety of IO loads and use cases. We recommend that you use the default values for max_direct_io_size_kb, max_device_cache_size_kb, and max_device_pinned_mem_size_kb unless a storage provider has a specific recommendation, or analysis and testing show better performance after you change one or more of the defaults.
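When reviewing a tuned configuration, it can help to see which of these three sizes differ from the defaults in Table 1. The sketch below is illustrative only: the helper names are hypothetical, and the input is assumed to be the "properties" object of a cufile.json file already loaded as a Python dictionary.

```python
# Defaults as listed in Table 1 (all values in KB).
TABLE_DEFAULTS_KB = {
    "max_direct_io_size_kb": 16384,
    "max_device_cache_size_kb": 131072,
    "max_device_pinned_mem_size_kb": 33554432,
}

# Table 1 describes these two parameters as 4K aligned.
ALIGNED_4K = ("max_direct_io_size_kb", "max_device_cache_size_kb")


def changed_from_defaults(properties):
    """Return {name: (default, configured)} for sizes that differ from Table 1."""
    changed = {}
    for name, default in TABLE_DEFAULTS_KB.items():
        value = properties.get(name, default)
        if value != default:
            changed[name] = (default, value)
    return changed


def not_4k_aligned(properties):
    """Return the 4K-aligned parameters whose KB value is not a multiple of 4."""
    return [n for n in ALIGNED_4K if n in properties and properties[n] % 4 != 0]


tuned = {"max_direct_io_size_kb": 8192, "max_device_cache_size_kb": 131072}
print(changed_from_defaults(tuned))
print(not_4k_aligned(tuned))
```

Any entry that changed_from_defaults reports is a candidate to justify with a vendor recommendation or with measurements from the target workload.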

The cufile.json file is designed to be extensible: parameters can be generic and apply to all supported file systems (fs:generic), or be specific to one file system (for example, fs:lustre). The fs:generic:posix_unaligned_writes parameter enables the use of the POSIX write path when unaligned writes are encountered. Unaligned writes are generally sub-optimal, as they can require read-modify-write operations.
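The colon-separated names used in this section (for example, fs:generic:posix_unaligned_writes) can be read as paths into nested JSON objects. The following sketch shows that mapping; set_param is a hypothetical helper, and the nested layout is an assumption that should be checked against the default cufile.json installed on the system.

```python
import json


def set_param(config, colon_path, value):
    """Set a value in a nested dict using a colon-separated parameter name."""
    keys = colon_path.split(":")
    node = config
    for key in keys[:-1]:
        # Create intermediate objects such as "fs" and "generic" on demand.
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config


config = {}
set_param(config, "fs:generic:posix_unaligned_writes", True)  # generic setting
set_param(config, "fs:lustre:posix_gds_min_kb", 4)  # file-system-specific setting
print(json.dumps(config, indent=2))
```

The generic and Lustre-specific settings end up as sibling objects under "fs", which is what allows settings for further file systems to be added without affecting the existing ones.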

If the target workload generates unaligned writes, you might want to set posix_unaligned_writes to true, as the POSIX path for handling unaligned writes might be more performant, depending on the target filesystem and underlying storage. Also, in this case, the POSIX path will write to the page cache (system memory).
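To gauge whether a workload issues unaligned writes, a check such as the following can be applied to a trace of write offsets and sizes. This is a sketch under a stated assumption: "unaligned" is taken to mean that the file offset or the size is not a multiple of 4 KiB, matching the "4K aligned" wording in Table 1. The exact rule that cuFile applies is not given in this section, and the function names are hypothetical.

```python
ALIGNMENT_BYTES = 4096  # assumption: alignment unit is 4 KiB


def is_aligned(file_offset, size, alignment=ALIGNMENT_BYTES):
    """Return True if both the file offset and the size are multiples of alignment."""
    return file_offset % alignment == 0 and size % alignment == 0


def write_path(file_offset, size, posix_unaligned_writes):
    """Model which path a write takes for a given posix_unaligned_writes setting."""
    if posix_unaligned_writes and not is_aligned(file_offset, size):
        return "posix"
    return "cufile"


# (offset, size) pairs in bytes from a hypothetical workload trace.
trace = [(0, 4096), (100, 4096), (8192, 1000)]
unaligned = [w for w in trace if not is_aligned(*w)]
print(f"{len(unaligned)} of {len(trace)} writes are unaligned")
```

If a large share of the writes is unaligned, that is a signal to benchmark the workload with posix_unaligned_writes set to true as well as false.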

The fs:lustre:posix_gds_min_kb setting routes any IO whose size is less than or equal to the configured value through the POSIX read/write path rather than the cuFile path. With Lustre, the POSIX path can have lower latency for small IO sizes.
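The threshold rule described above can be modeled as follows. This is an illustrative sketch of the decision only, not the library's implementation, and lustre_io_path is a hypothetical name.

```python
def lustre_io_path(io_size_bytes, posix_gds_min_kb):
    """Return "posix" for IO at or below the threshold, otherwise "cufile"."""
    threshold_bytes = posix_gds_min_kb * 1024
    return "posix" if io_size_bytes <= threshold_bytes else "cufile"


# With a threshold of 4 KB, a 4096-byte request takes the POSIX path
# and anything larger takes the cuFile path.
for size in (512, 4096, 4097, 1048576):
    print(size, lustre_io_path(size, posix_gds_min_kb=4))
```

Because the comparison is inclusive, a request exactly equal to the threshold uses the POSIX path.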

The GDS parameters are among several elements that factor into delivered storage IO performance. It is advisable to start with the defaults and only make changes based on recommendations from a storage vendor or based on empirical data obtained during testing and measurements of the target workload.