nvidia.dali.fn.readers.numpy

nvidia.dali.fn.readers.numpy(*, device=None, name=None, bytes_per_sample_hint=[0], cache_header_information=False, dont_use_mmap=False, file_filter='*.npy', file_list=None, file_root=None, files=None, fill_value=0.0, initial_fill=1024, lazy_init=False, num_shards=1, out_of_bounds_policy='error', pad_last_batch=False, prefetch_queue_depth=1, preserve=False, random_shuffle=False, read_ahead=False, register_buffers=True, rel_roi_end=None, rel_roi_shape=None, rel_roi_start=None, roi_axes=[], roi_end=None, roi_shape=None, roi_start=None, seed=-1, shard_id=0, shuffle_after_epoch=False, shuffle_after_epoch_seed=None, skip_cached_images=False, stick_to_shard=False, tensor_init_bytes=1048576, use_o_direct=False)

Reads NumPy arrays from a directory.

This operator can be used in the following modes:

Read all files from a directory indicated by file_root that match the given file_filter.
Read file names from a text file indicated by the file_list argument.
Read files listed in the files argument.
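The three modes above differ only in which keyword arguments are passed. The sketch below is illustrative: reader_kwargs and make_pipeline are hypothetical helper names, files.txt is an assumed file name, and DALI is imported lazily so the argument-building part runs without it. The pipeline is built CPU-only (device_id=None).

```python
import os


def reader_kwargs(mode, root, names):
    """Build keyword arguments for fn.readers.numpy for one of the three modes.

    Hypothetical helper: `root` is a data directory, `names` are file names
    relative to it.
    """
    if mode == "directory":
        # Traverse file_root and keep the files that match file_filter.
        return {"file_root": root, "file_filter": "*.npy"}
    if mode == "file_list":
        # One file name per line, relative to the location of the list file.
        list_path = os.path.join(root, "files.txt")
        with open(list_path, "w") as f:
            f.write("\n".join(names) + "\n")
        return {"file_list": list_path}
    if mode == "files":
        # Explicit paths, treated as relative to file_root when it is given.
        return {"file_root": root, "files": list(names)}
    raise ValueError(f"unknown mode: {mode!r}")


def make_pipeline(kwargs, batch_size=2):
    # Imported here so reader_kwargs can be used without DALI installed.
    from nvidia.dali import fn, pipeline_def

    @pipeline_def(batch_size=batch_size, num_threads=2, device_id=None)
    def numpy_pipeline():
        return fn.readers.numpy(device="cpu", name="Reader", **kwargs)

    return numpy_pipeline()
```

With DALI installed, a pipeline returned by make_pipeline can be built with build() and executed with run(), which returns one batch of arrays per call.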

Note

The gpu backend requires cuFile/GDS support (418.x driver family or newer), which is shipped with the CUDA toolkit starting from CUDA 11.4. Please check the GDS documentation for more details.

The gpu reader reads the files in chunks. The size of the chunk can be controlled process-wide with an environment variable DALI_GDS_CHUNK_SIZE. Valid values are powers of 2 between 4096 and 16M, with the default being 2M. For convenience, the value can be specified with a k or M suffix, applying a multiplier of 1024 and 2^20, respectively.
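The rules for DALI_GDS_CHUNK_SIZE can be checked before the variable is set. The following is a minimal sketch of those rules as stated above, not DALI's own parser; parse_chunk_size is a hypothetical helper name, and setting the variable before any reader is created is an assumption.

```python
import os


def parse_chunk_size(text):
    """Convert a value such as '4096', '64k' or '2M' to bytes and validate it."""
    text = text.strip()
    multiplier = 1
    if text.endswith("k"):
        multiplier, text = 1024, text[:-1]
    elif text.endswith("M"):
        multiplier, text = 1 << 20, text[:-1]
    size = int(text) * multiplier  # raises ValueError for non-numeric input
    is_power_of_two = size > 0 and (size & (size - 1)) == 0
    if not is_power_of_two or not (4096 <= size <= (16 << 20)):
        raise ValueError(f"chunk size must be a power of 2 in [4096, 16M], got {size}")
    return size


# Validate first, then export the setting for the whole process.
chunk = "1M"
parse_chunk_size(chunk)
os.environ["DALI_GDS_CHUNK_SIZE"] = chunk
```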

Supported backends

‘cpu’
‘gpu’

Keyword Arguments:

bytes_per_sample_hint¶ (int or list of int, optional, default = [0]) –
Output size hint, in bytes per sample.

If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
cache_header_information¶ (bool, optional, default = False) – If set to True, the header information for each file is cached, improving access speed.
dont_use_mmap¶ (bool, optional, default = False) –
If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.

Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance.
file_filter¶ (str, optional, default = ‘*.npy’) –
If a value is specified, the string is interpreted as a glob string to filter the list of files in the sub-directories of the file_root.

This argument is ignored when file paths are taken from file_list or files.
file_list¶ (str, optional) –
Path to a text file that contains filenames (one per line) where the filenames are relative to the location of that file or to file_root, if specified.

This argument is mutually exclusive with files.
file_root¶ (str, optional) –
Path to a directory that contains the data files.

If not using file_list or files, this directory is traversed to discover the files. file_root is required in this mode of operation.
files¶ (str or list of str, optional) –
A list of file paths to read the data from.

If file_root is provided, the paths are treated as being relative to it.

This argument is mutually exclusive with file_list.
fill_value¶ (float, optional, default = 0.0) – Determines the padding value when out_of_bounds_policy is set to “pad”.
initial_fill¶ (int, optional, default = 1024) –
Size of the buffer that is used for shuffling.

If random_shuffle is False, this parameter is ignored.
lazy_init¶ (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.
num_shards¶ (int, optional, default = 1) –
Partitions the data into the specified number of parts (shards).

This is typically used for multi-GPU or multi-node training.
out_of_bounds_policy¶ (str, optional, default = ‘error’) –
Determines the policy when reading outside of the bounds of the numpy array.

Here is a list of the supported values:
- "error" (default): Attempting to read outside of the bounds of the array will produce an error.
- "pad": The array will be padded as needed with zeros or any other value that is specified with the fill_value argument.
- "trim_to_shape": The ROI will be cut to the bounds of the array.
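The three policies listed above can be illustrated on a one-dimensional array. This is a plain-Python sketch of the described semantics, not the operator's implementation; read_roi_1d is a hypothetical helper, and the ROI is given as a half-open [start, end) index range.

```python
def read_roi_1d(data, start, end, policy="error", fill_value=0.0):
    """Read data[start:end] while applying an out-of-bounds policy."""
    n = len(data)
    if policy == "error":
        if start < 0 or end > n:
            raise IndexError(f"ROI [{start}, {end}) is outside of [0, {n})")
        return list(data[start:end])
    if policy == "trim_to_shape":
        # Cut the ROI to the bounds of the array.
        lo, hi = max(start, 0), min(end, n)
        return list(data[lo:hi])
    if policy == "pad":
        # Keep the requested shape and fill the missing elements.
        return [data[i] if 0 <= i < n else fill_value for i in range(start, end)]
    raise ValueError(f"unknown policy: {policy!r}")


data = [1.0, 2.0, 3.0, 4.0]
padded = read_roi_1d(data, 2, 6, policy="pad", fill_value=0.0)
trimmed = read_roi_1d(data, 2, 6, policy="trim_to_shape")
```

Here padded keeps the requested length of four elements, while trimmed contains only the two elements that exist in the array.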
pad_last_batch¶ (bool, optional, default = False) –
If set to True, pads the shard by repeating the last sample.

Note

If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.
prefetch_queue_depth¶ (int, optional, default = 1) –
Specifies the number of batches to be prefetched by the internal Loader.

This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.
preserve¶ (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
random_shuffle¶ (bool, optional, default = False) –
Determines whether to randomly shuffle data.

A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.
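The buffer-based shuffling described above can be sketched in plain Python. This illustrates the general technique only; the operator's internal sampling may differ, and buffered_shuffle is a hypothetical name.

```python
import random


def buffered_shuffle(samples, initial_fill, seed=0):
    """Yield samples in a locally shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for sample in samples:
        # Data is read sequentially into the buffer.
        buffer.append(sample)
        if len(buffer) >= initial_fill:
            # Once the buffer is full, emit a randomly chosen element.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Drain whatever is left at the end of the epoch in random order.
    rng.shuffle(buffer)
    yield from buffer


order = list(buffered_shuffle(range(10), initial_fill=4))
```

In this sketch a buffer of size 1 leaves the order unchanged, and a larger buffer lets a sample move further from its original position, which is why the buffer size controls how thorough the shuffle is.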
read_ahead¶ (bool, optional, default = False) –
Determines whether the accessed data should be read ahead.

For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.
register_buffers¶ (bool, optional, default = True) –
Applies only to the gpu backend type.

Warning

This argument is temporarily disabled and kept only for backward compatibility. It will be re-enabled in a future release.

If set to True, the device I/O buffers will be registered with cuFile. This is not recommended if the sample sizes vary a lot.
rel_roi_end¶ (float or list of float or TensorList of float, optional) –
End of the region-of-interest, in relative coordinates (range [0.0 - 1.0]).

This argument is incompatible with “roi_end”, “roi_shape” and “rel_roi_shape”.
rel_roi_shape¶ (float or list of float or TensorList of float, optional) –
Shape of the region-of-interest, in relative coordinates (range [0.0 - 1.0]).

This argument is incompatible with “roi_shape”, “roi_end” and “rel_roi_end”.
rel_roi_start¶ (float or list of float or TensorList of float, optional) –
Start of the region-of-interest, in relative coordinates (range [0.0 - 1.0]).

This argument is incompatible with “roi_start”.
roi_axes¶ (int or list of int, optional, default = []) –
Order of dimensions used for the ROI anchor and shape arguments, as dimension indices.

If not provided, all the dimensions should be specified in the ROI arguments.
roi_end¶ (int or list of int or TensorList of int, optional) –
End of the region-of-interest, in absolute coordinates.

This argument is incompatible with “rel_roi_end”, “roi_shape” and “rel_roi_shape”.
roi_shape¶ (int or list of int or TensorList of int, optional) –
Shape of the region-of-interest, in absolute coordinates.

This argument is incompatible with “rel_roi_shape”, “roi_end” and “rel_roi_end”.
roi_start¶ (int or list of int or TensorList of int, optional) –
Start of the region-of-interest, in absolute coordinates.

This argument is incompatible with “rel_roi_start”.
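The incompatibility rules stated for the ROI arguments can be collected into a single check that runs before the reader is constructed. The sketch below encodes only the rules written above; check_roi_args is a hypothetical helper and is not part of DALI.

```python
EXTENT_ARGS = {"roi_end", "rel_roi_end", "roi_shape", "rel_roi_shape"}
START_ARGS = {"roi_start", "rel_roi_start"}


def check_roi_args(**roi):
    """Return the ROI arguments that were set, or raise if they conflict."""
    given = {name for name, value in roi.items() if value is not None}
    unknown = given - EXTENT_ARGS - START_ARGS - {"roi_axes"}
    if unknown:
        raise TypeError(f"unexpected ROI arguments: {sorted(unknown)}")
    # Absolute and relative start cannot be mixed.
    if START_ARGS <= given:
        raise ValueError("roi_start and rel_roi_start are incompatible")
    # The extent is given by at most one of: end or shape, absolute or relative.
    extent = sorted(given & EXTENT_ARGS)
    if len(extent) > 1:
        raise ValueError(f"incompatible ROI arguments: {', '.join(extent)}")
    return {name: roi[name] for name in given}


# A 64x64 window anchored at (10, 20) along the first two dimensions.
roi = check_roi_args(roi_start=[10, 20], roi_shape=[64, 64], roi_axes=[0, 1])
```

The returned dictionary can be passed on as keyword arguments, for example fn.readers.numpy(file_root=..., **roi).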
seed¶ (int, optional, default = -1) – Random seed; if not set, one will be assigned automatically.
shard_id¶ (int, optional, default = 0) – Index of the shard to read.
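To make the roles of num_shards and shard_id concrete, the sketch below splits a dataset into contiguous index ranges. The floor-based formula is an assumption about how the partitioning works and should be checked against the DALI sharding documentation; shard_range is a hypothetical helper.

```python
def shard_range(shard_id, num_shards, dataset_size):
    """Return the half-open [start, end) sample range assigned to one shard."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    start = dataset_size * shard_id // num_shards
    end = dataset_size * (shard_id + 1) // num_shards
    return start, end


# Ten samples split across three shards, e.g. one shard per GPU.
ranges = [shard_range(i, 3, 10) for i in range(3)]
sizes = [end - start for start, end in ranges]
```

When the dataset size is not divisible by num_shards, the shards end up with different sizes, which is the situation the pad_last_batch argument addresses.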
shuffle_after_epoch¶ (bool, optional, default = False) –
If set to True, the reader shuffles the entire dataset after each epoch.

stick_to_shard and random_shuffle cannot be used when this argument is set to True.
shuffle_after_epoch_seed¶ (int, optional) –
Random seed for the dataset shuffling performed after each epoch.

If not provided, a fixed default seed is used, which results in the same shuffling pattern across different training runs. Providing a custom seed allows for different shuffle patterns across training runs, which may be desirable for better statistical properties.

Note

When using multiple DALI pipelines (e.g., for multi-GPU training), all pipeline instances should use the same shuffle_after_epoch_seed to ensure a consistent global shuffle across all shards.

Note

This argument has no effect unless shuffle_after_epoch is set to True.
skip_cached_images¶ (bool, optional, default = False) –
If set to True, loading the data will be skipped when the sample is in the decoder cache.

In this case, the output of the loader will be empty.
stick_to_shard¶ (bool, optional, default = False) –
Determines whether the reader should stick to a data shard instead of going through the entire dataset.

If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.
tensor_init_bytes¶ (int, optional, default = 1048576) – Hint for how much memory to allocate per sample.
use_o_direct¶ (bool, optional, default = False) –
If set to True, the data will be read directly from the storage bypassing system cache.

Mutually exclusive with dont_use_mmap=False; that is, setting use_o_direct to True also requires dont_use_mmap=True.
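Several arguments on this page constrain one another. As a convenience, the constraints stated in the descriptions above can be checked in one place before constructing the reader. This is a sketch limited to those stated rules; check_reader_args is a hypothetical helper and does not reflect the operator's own validation.

```python
def check_reader_args(files=None, file_list=None, file_root=None,
                      random_shuffle=False, stick_to_shard=False,
                      shuffle_after_epoch=False,
                      dont_use_mmap=False, use_o_direct=False):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if files is not None and file_list is not None:
        problems.append("files and file_list are mutually exclusive")
    if files is None and file_list is None and file_root is None:
        problems.append("file_root is required when neither files nor file_list is given")
    if shuffle_after_epoch and (random_shuffle or stick_to_shard):
        problems.append("shuffle_after_epoch cannot be combined with "
                        "random_shuffle or stick_to_shard")
    if use_o_direct and not dont_use_mmap:
        problems.append("use_o_direct=True requires dont_use_mmap=True")
    return problems


# "/data" is a placeholder path used only for illustration.
ok = check_reader_args(file_root="/data", dont_use_mmap=True, use_o_direct=True)
bad = check_reader_args(file_root="/data", use_o_direct=True)
```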