This section describes the JSON configuration parameters used by GDS.
When GDS is installed, the /etc/cufile.json parameter file is installed with default values. The implementation allows for generic GDS settings and parameters specific to a file system or storage partner.
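The colon-separated parameter names below mirror the nesting of the JSON file. As a rough sketch of that layout, using only keys from the table that follows (the log directory path is illustrative; when logging:dir is unset, the log file goes to the current working directory):

```json
{
    "logging": {
        "dir": "/var/log/gds",
        "level": "ERROR"
    },
    "profile": {
        "nvtx": false,
        "cufile_stats": 0
    },
    "properties": {
        "max_direct_io_size_kb": 16384,
        "use_poll_mode": false,
        "allow_compat_mode": false
    },
    "fs": {
        "generic": {
            "posix_unaligned_writes": false
        }
    }
}
```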
| Parameter | Default Value | Description |
| --- | --- | --- |
| logging:dir | CWD | Location of the GDS log file. |
| logging:level | ERROR | Verbosity of logging. |
| profile:nvtx | false | Boolean which, if set to true, generates NVTX traces for profiling. |
| profile:cufile_stats | 0 | Enables cuFile IO statistics. Level 0 means no cuFile statistics. |
| profile:io_batchsize | 128 | Maximum size of the batch allowed. |
| properties:max_direct_io_size_kb | 16384 | Maximum IO chunk size (4K aligned) used by cuFile for each IO request, in KB. |
| properties:max_device_cache_size_kb | 131072 | Maximum device memory size (4K aligned) for reserving bounce buffers for the entire GPU, in KB. |
| properties:max_device_pinned_mem_size_kb | 33554432 | Maximum per-GPU memory size, in KB, that can be pinned, including the memory for the internal bounce buffers. |
| properties:use_poll_mode | false | Boolean that indicates whether the cuFile library uses polling or a synchronous wait for the storage to complete IO. Polling can be useful for small IO transactions. Refer to Poll Mode below. |
| properties:poll_mode_max_size_kb | 4 | Maximum IO request size (4K aligned), in KB, at or below which the library polls for IO completion. |
| properties:allow_compat_mode | false | If true, enables compatibility mode, which allows cuFile to issue POSIX read/writes. Set this to false to use GDS-enabled IO. Refer to Compatibility Mode below. |
| properties:rdma_dev_addr_list | empty | Provides the list of IPv4 addresses for all the interfaces that can be used for RDMA. |
| properties:rdma_load_balancing_policy | RoundRobin | Specifies the load balancing policy for RDMA memory registration. The default is RoundRobin. Other valid values: MaxMinFit, which tries to assign peers so that there is the least sharing and is suitable when all GPUs are loaded uniformly; RoundRobinMaxMin, which is similar to RoundRobin but uses the peers with the least sharing; and Randomized, which uses only the NICs closest to the GPU for memory registration, in a randomized fashion. |
| properties:rdma_dynamic_routing | false | Boolean parameter applicable only to network-based file systems. Can be enabled on platforms where GPUs and NICs do not share a common PCIe root port. |
| properties:rdma_dynamic_routing_order | | Applies only if rdma_dynamic_routing is enabled. Specifies an ordered list of routing policies, selected on a first-fit basis when routing an IO. |
| fs:generic:posix_unaligned_writes | false | If true, forces the use of a POSIX write instead of cuFileWrite for unaligned writes. |
| fs:lustre:posix_gds_min_kb | 4 | Applicable only to the EXAScaler filesystem, for both reads and writes. IO threshold (4K aligned), in KB, at or below which cuFile uses a POSIX read/write. |
| fs:lustre:rdma_dev_addr_list | empty | Provides the list of IPv4 addresses for all the interfaces that a single Lustre mount can use. Used by the cuFile dynamic routing feature to infer preferred RDMA devices. |
| fs:lustre:mount_table | empty | Specifies a dictionary of IPv4 mount addresses against a Lustre mount point. Used by the cuFile dynamic routing feature. Refer to the default cufile.json for sample usage. |
| fs:nfs:rdma_dev_addr_list | empty | Provides the list of IPv4 addresses for all the interfaces that a single NFS mount can use. Used by the cuFile dynamic routing feature to infer preferred RDMA devices. |
| fs:nfs:mount_table | empty | Specifies a dictionary of IPv4 mount addresses against an NFS mount point. Used by the cuFile dynamic routing feature. Refer to the default cufile.json for sample usage. |
| fs:weka:rdma_write_support | false | If true, cuFileWrite uses RDMA writes instead of falling back to POSIX writes for a WekaFS mount. |
| fs:weka:rdma_dev_addr_list | empty | Provides the list of IPv4 addresses for all the interfaces that a single WekaFS mount can use. Also used by the cuFile dynamic routing feature to infer preferred RDMA devices. |
| fs:weka:mount_table | empty | Specifies a dictionary of IPv4 mount addresses against a WekaFS mount point. Used by the cuFile dynamic routing feature. Refer to the default cufile.json for sample usage. |
| denylist:drivers | | Administrative setting that disables supported storage drivers on the node. |
| denylist:devices | | Administrative setting that disables specific supported block devices on the node. Not applicable for DFS. |
| denylist:mounts | | Administrative setting that disables specific mounts in the supported GDS-enabled filesystems on the node. |
| denylist:filesystems | | Administrative setting that disables specific supported GDS-ready filesystems on the node. |
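As an illustration of the denylist section, a sketch with hypothetical entries (the driver, device, mount, and filesystem names below are examples only, not recommendations; consult the default cufile.json for the exact expected format):

```json
{
    "denylist": {
        "drivers": [ "nvme" ],
        "devices": [ "/dev/nvme0n1" ],
        "mounts": [ "/mnt/scratch" ],
        "filesystems": [ "ext4" ]
    }
}
```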
The cuFile API set includes an interface to put the driver in poll mode. Refer to cuFileDriverSetPollMode() in the cuFile API Reference Guide for more information. When poll mode is set, a read or write request whose size is less than or equal to properties:poll_mode_max_size_kb (4 KB by default) causes the library to poll for IO completion rather than block (sleep). For workloads with small IO sizes, enabling poll mode may reduce latency.
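For example, a cufile.json fragment that enables polling for requests of 8 KB or less might look like this (the 8 KB threshold is illustrative):

```json
{
    "properties": {
        "use_poll_mode": true,
        "poll_mode_max_size_kb": 8
    }
}
```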
There are several scenarios where GDS might be unavailable or unsupported: for example, when the GDS software is not installed, the target file system is not GDS supported, or O_DIRECT cannot be enabled on the target file. When compatibility mode is enabled and GDS is not functional for the IO target, code that uses the cuFile APIs falls back to the standard POSIX read/write path. To learn more about compatibility mode, refer to cuFile Compatibility Mode.
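Enabling that fallback behavior is a one-line change in cufile.json:

```json
{
    "properties": {
        "allow_compat_mode": true
    }
}
```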
From a benchmarking and performance perspective, the default settings work well across a variety of IO loads and use cases. We recommend that you use the default values for max_direct_io_size_kb, max_device_cache_size_kb, and max_device_pinned_mem_size_kb unless a storage provider has a specific recommendation, or analysis and testing show better performance after you change one or more of the defaults.
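If a storage vendor does recommend different sizing, these three parameters can be overridden together; the fragment below simply restates the shipped defaults as a starting point:

```json
{
    "properties": {
        "max_direct_io_size_kb": 16384,
        "max_device_cache_size_kb": 131072,
        "max_device_pinned_mem_size_kb": 33554432
    }
}
```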
The cufile.json file is designed to be extensible: parameters can either apply to all supported file systems (fs:generic) or be file system specific (fs:lustre). The fs:generic:posix_unaligned_writes parameter enables the use of the POSIX write path when unaligned writes are encountered. Unaligned writes are generally suboptimal because they can require read-modify-write operations.
If the target workload generates unaligned writes, you might want to set posix_unaligned_writes to true, as the POSIX path for handling unaligned writes might be more performant, depending on the target filesystem and underlying storage. Also, in this case, the POSIX path will write to the page cache (system memory).
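For a workload dominated by unaligned writes, the override looks like this:

```json
{
    "fs": {
        "generic": {
            "posix_unaligned_writes": true
        }
    }
}
```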
The fs:lustre:posix_gds_min_kb setting routes IO through the POSIX read/write path, rather than the cuFile path, when the IO size is less than or equal to posix_gds_min_kb. On Lustre, for small IO sizes, the POSIX path can have better (lower) latency.
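For instance, to send all IO of 4 KB or less on a Lustre mount through the POSIX path (the threshold value is illustrative):

```json
{
    "fs": {
        "lustre": {
            "posix_gds_min_kb": 4
        }
    }
}
```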
The GDS parameters are among several elements that factor into delivered storage IO performance. It is advisable to start with the defaults and only make changes based on recommendations from a storage vendor or based on empirical data obtained during testing and measurements of the target workload.