nvidia.dali.fn.readers.numpy
- nvidia.dali.fn.readers.numpy(*, bytes_per_sample_hint=[0], cache_header_information=False, dont_use_mmap=False, file_filter='*.npy', file_list=None, file_root=None, files=None, fill_value=0.0, initial_fill=1024, lazy_init=False, num_shards=1, out_of_bounds_policy='error', pad_last_batch=False, prefetch_queue_depth=1, preserve=False, random_shuffle=False, read_ahead=False, register_buffers=True, rel_roi_end=None, rel_roi_shape=None, rel_roi_start=None, roi_axes=[], roi_end=None, roi_shape=None, roi_start=None, seed=-1, shard_id=0, shuffle_after_epoch=False, skip_cached_images=False, stick_to_shard=False, tensor_init_bytes=1048576, use_o_direct=False, device=None, name=None)
- Reads NumPy arrays from a directory.
- This operator can be used in the following modes:
- Read all files from a directory indicated by file_root that match the given file_filter.
- Read file names from a text file indicated by the file_list argument.
- Read files listed in the files argument.
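As an illustration of the first mode, here is a minimal sketch of a CPU-only pipeline that reads every *.npy file under a directory. The helper name build_numpy_pipeline, the data_dir parameter, and the batch size and thread count are placeholders chosen for this example. nvidia.dali is imported inside the function so that the snippet can be loaded on a machine where DALI is not installed.

```python
def build_numpy_pipeline(data_dir, batch_size=4):
    """Build a CPU-only DALI pipeline that reads *.npy files from data_dir.

    data_dir and batch_size are illustrative placeholders, not DALI defaults.
    """
    # Lazy import: DALI is only required when the pipeline is actually built.
    from nvidia.dali import fn, pipeline_def

    # device_id=None requests a pipeline that does not use the GPU.
    @pipeline_def(batch_size=batch_size, num_threads=2, device_id=None)
    def numpy_pipeline():
        # Mode 1: discover files under file_root that match file_filter.
        return fn.readers.numpy(
            device="cpu",
            file_root=data_dir,
            file_filter="*.npy",
        )

    return numpy_pipeline()
```

For the other two modes, the file_filter argument would be replaced by file_list (a path to a text file) or files (a list of paths), as described in the list above.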
 - Note: The gpu backend requires cuFile/GDS support (418.x driver family or newer), which is shipped with the CUDA toolkit starting from CUDA 11.4. Please check the GDS documentation for more details. The gpu reader reads the files in chunks. The chunk size can be controlled process-wide with the environment variable DALI_GDS_CHUNK_SIZE. Valid values are powers of 2 between 4096 and 16M, with the default being 2M. For convenience, the value can be specified with a k or M suffix, applying a multiplier of 1024 and 2^20, respectively.
- Supported backends
- ‘cpu’ 
- ‘gpu’ 
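The rule for DALI_GDS_CHUNK_SIZE values stated in the note above can be sketched as a small validator. This is not DALI's own parser: parse_gds_chunk_size is a hypothetical helper, and it assumes the suffixes are spelled exactly k and M as written in the note.

```python
def parse_gds_chunk_size(text):
    """Parse a chunk-size string such as "4096", "64k" or "2M" into bytes.

    Hypothetical helper that mirrors the documented rule; the real parser
    inside DALI may accept or reject other spellings.
    """
    multipliers = {"k": 1024, "M": 2**20}
    text = text.strip()
    if not text:
        raise ValueError("empty chunk size")

    # Split off an optional k / M suffix.
    multiplier = 1
    if text[-1] in multipliers:
        multiplier = multipliers[text[-1]]
        text = text[:-1]
    if not text.isdigit():
        raise ValueError(f"not a non-negative integer: {text!r}")

    size = int(text) * multiplier
    # A positive integer is a power of two when it has exactly one bit set.
    is_power_of_two = size > 0 and (size & (size - 1)) == 0
    if not is_power_of_two or not 4096 <= size <= 16 * 2**20:
        raise ValueError(
            f"chunk size must be a power of 2 between 4096 and 16M, got {size}"
        )
    return size
```

Under these assumptions, "2M" (the documented default) resolves to 2097152 bytes, while values such as "3000" or "32M" are rejected.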
 
 - Keyword Arguments:
- bytes_per_sample_hint (int or list of int, optional, default = [0]) – Output size hint, in bytes per sample. If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
- cache_header_information (bool, optional, default = False) – If set to True, the header information for each file is cached, improving access speed.
- dont_use_mmap (bool, optional, default = False) – If set to True, the Loader will use plain file I/O instead of trying to map the file in memory. Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance with mapped files.
- file_filter (str, optional, default = ‘*.npy’) – If a value is specified, the string is interpreted as a glob string to filter the list of files in the sub-directories of file_root. This argument is ignored when file paths are taken from file_list or files.
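To make the glob filtering concrete, here is a rough sketch using Python's fnmatch module. It is only an approximation: filter_files is a hypothetical helper, and matching the pattern against the base name of each path (rather than the whole relative path) is an assumption of this sketch, not something the description above specifies.

```python
import fnmatch
import posixpath


def filter_files(paths, file_filter="*.npy"):
    """Keep the paths whose base name matches the glob pattern.

    Assumption: the pattern is applied to the base name only. The paths are
    treated as '/'-separated strings so the result is platform independent.
    """
    return [
        path
        for path in paths
        if fnmatch.fnmatchcase(posixpath.basename(path), file_filter)
    ]
```

With the default pattern, a candidate list of a.npy, b.txt, and sub/c.npy would be reduced to a.npy and sub/c.npy.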
- file_list (str, optional) – Path to a text file that contains filenames (one per line), where the filenames are relative to the location of that file or to file_root, if specified. This argument is mutually exclusive with files.
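The path resolution described for file_list can be sketched as follows. resolve_file_list is a hypothetical helper written for illustration; skipping blank lines is an assumption of the sketch.

```python
import os


def resolve_file_list(file_list_path, file_root=None):
    """Return the paths named in a file list, one entry per line.

    Entries are taken relative to file_root when it is given, otherwise
    relative to the directory containing the list file.
    Assumption: blank lines are ignored.
    """
    base = file_root if file_root is not None else os.path.dirname(file_list_path)
    with open(file_list_path) as f:
        names = [line.strip() for line in f if line.strip()]
    return [os.path.join(base, name) for name in names]
```

A list file containing the two lines a.npy and sub/b.npy would therefore resolve next to the list file itself, unless file_root points somewhere else.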
- file_root (str, optional) – Path to a directory that contains the data files. If not using file_list or files, this directory is traversed to discover the files. file_root is required in this mode of operation.
- files (str or list of str, optional) – A list of file paths to read the data from. If file_root is provided, the paths are treated as being relative to it. This argument is mutually exclusive with file_list.
- fill_value (float, optional, default = 0.0) – Determines the padding value when out_of_bounds_policy is set to “pad”.
- initial_fill (int, optional, default = 1024) – Size of the buffer that is used for shuffling. If random_shuffle is False, this parameter is ignored.
- lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.
- num_shards (int, optional, default = 1) – Partitions the data into the specified number of parts (shards). This is typically used for multi-GPU or multi-node training.
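One way to picture the partitioning controlled by num_shards and shard_id is as contiguous index ranges over the dataset. The formula below is an assumption made for illustration (shard_bounds is a hypothetical helper); the reader's actual boundaries may be computed differently.

```python
def shard_bounds(dataset_size, num_shards, shard_id):
    """Return the half-open range [start, end) of samples for one shard.

    Assumed formula: boundaries at floor(dataset_size * i / num_shards),
    which spreads any remainder across the shards.
    """
    if num_shards < 1 or not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must satisfy 0 <= shard_id < num_shards")
    start = dataset_size * shard_id // num_shards
    end = dataset_size * (shard_id + 1) // num_shards
    return start, end
```

In a multi-GPU job, each process would typically pass the same num_shards and its own rank as shard_id, so that the ranges cover the dataset without overlap.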
- out_of_bounds_policy (str, optional, default = ‘error’) – Determines the policy when reading outside of the bounds of the NumPy array. Here is a list of the supported values:
- "error" (default): Attempting to read outside of the bounds of the array will produce an error.
- "pad": The array will be padded as needed with zeros or any other value that is specified with the fill_value argument.
- "trim_to_shape": The ROI will be cut to the bounds of the array.
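The three policies can be illustrated on a one-dimensional list. This is a simplified sketch of the documented behavior, not the operator's implementation; read_roi_1d and its start and end parameters are hypothetical names used only here.

```python
def read_roi_1d(data, start, end, policy="error", fill_value=0.0):
    """Read data[start:end] under one of the three out-of-bounds policies."""
    n = len(data)
    if policy == "error":
        # Any index outside [0, n) is rejected.
        if start < 0 or end > n:
            raise IndexError(f"ROI [{start}, {end}) is outside [0, {n})")
        return list(data[start:end])
    if policy == "pad":
        # Out-of-bounds positions are filled with fill_value.
        return [data[i] if 0 <= i < n else fill_value for i in range(start, end)]
    if policy == "trim_to_shape":
        # The ROI is clipped to the valid index range.
        return list(data[max(start, 0):min(end, n)])
    raise ValueError(f"unknown policy: {policy!r}")
```

For a four-element array and an ROI that extends two elements past the end, "pad" returns four values (two of them fill_value), "trim_to_shape" returns the two valid values, and "error" raises.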
 
- pad_last_batch (bool, optional, default = False) – If set to True, pads the shard by repeating the last sample. Note: If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.
- prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches to be prefetched by the internal Loader. This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.
- preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
- random_shuffle (bool, optional, default = False) – Determines whether to randomly shuffle data. A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.
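The buffer-based shuffling described for random_shuffle and initial_fill can be approximated with a short generator. This is a sketch of the general technique (a fixed-size shuffle buffer), not DALI's code; buffered_shuffle and its seed handling are assumptions of the example.

```python
import random


def buffered_shuffle(samples, initial_fill, seed=0):
    """Yield samples in a locally shuffled order using a bounded buffer.

    Samples are read sequentially into a buffer of initial_fill entries;
    once the buffer is full, one random entry is emitted per new sample.
    """
    rng = random.Random(seed)
    buffer = []
    for sample in samples:
        buffer.append(sample)
        if len(buffer) >= initial_fill:
            # Swap a random entry to the end and emit it.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Drain what is left once the input is exhausted.
    rng.shuffle(buffer)
    yield from buffer
```

A consequence of this design is that the shuffle is only local: the first emitted sample always comes from the first initial_fill samples read, so a larger buffer gives better mixing at the cost of memory.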
- read_ahead (bool, optional, default = False) – Determines whether the accessed data should be read ahead. For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.
- register_buffers (bool, optional, default = True) – Applies only to the gpu backend type. Warning: This argument is temporarily disabled and left for backward compatibility. It will be reenabled in future releases. If true, the device I/O buffers will be registered with cuFile. It is not recommended if the sample sizes vary a lot.
- rel_roi_end (float or list of float or TensorList of float, optional) – End of the region-of-interest, in relative coordinates (range [0.0 - 1.0]). This argument is incompatible with “roi_end”, “roi_shape” and “rel_roi_shape”.
- rel_roi_shape (float or list of float or TensorList of float, optional) – Shape of the region-of-interest, in relative coordinates (range [0.0 - 1.0]). This argument is incompatible with “roi_shape”, “roi_end” and “rel_roi_end”.
- rel_roi_start (float or list of float or TensorList of float, optional) – Start of the region-of-interest, in relative coordinates (range [0.0 - 1.0]). This argument is incompatible with “roi_start”.
- roi_axes (int or list of int, optional, default = []) – Order of dimensions used for the ROI anchor and shape arguments, as dimension indices. If not provided, all the dimensions should be specified in the ROI arguments.
- roi_end (int or list of int or TensorList of int, optional) – End of the region-of-interest, in absolute coordinates. This argument is incompatible with “rel_roi_end”, “roi_shape” and “rel_roi_shape”.
- roi_shape (int or list of int or TensorList of int, optional) – Shape of the region-of-interest, in absolute coordinates. This argument is incompatible with “rel_roi_shape”, “roi_end” and “rel_roi_end”.
- roi_start (int or list of int or TensorList of int, optional) – Start of the region-of-interest, in absolute coordinates. This argument is incompatible with “rel_roi_start”.
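The incompatibility rules among the ROI arguments can be summarized in a small check: at most one of the two start arguments, and at most one of the four end/shape arguments. check_roi_args is a hypothetical helper that only restates those rules; it does not reproduce the operator's own validation or error messages.

```python
def check_roi_args(**kwargs):
    """Raise ValueError if the given ROI arguments are mutually incompatible.

    Arguments left as None are treated as not provided.
    """
    starts = {"roi_start", "rel_roi_start"}
    extents = {"roi_end", "roi_shape", "rel_roi_end", "rel_roi_shape"}
    given = {name for name, value in kwargs.items() if value is not None}

    unknown = given - starts - extents
    if unknown:
        raise TypeError(f"not an ROI argument: {sorted(unknown)}")
    if len(given & starts) > 1:
        raise ValueError("roi_start and rel_roi_start are incompatible")
    if len(given & extents) > 1:
        raise ValueError(
            f"only one of {sorted(extents)} may be given, got {sorted(given & extents)}"
        )
```

For example, roi_start together with roi_shape passes the check, whereas roi_end together with roi_shape, or roi_start together with rel_roi_start, is rejected.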
- seed (int, optional, default = -1) – Random seed; if not set, one will be assigned automatically.
- shard_id (int, optional, default = 0) – Index of the shard to read.
- shuffle_after_epoch (bool, optional, default = False) – If set to True, the reader shuffles the entire dataset after each epoch. stick_to_shard and random_shuffle cannot be used when this argument is set to True.
- skip_cached_images (bool, optional, default = False) – If set to True, loading the data will be skipped when the sample is in the decoder cache. In this case, the output of the loader will be empty.
- stick_to_shard (bool, optional, default = False) – Determines whether the reader should stick to a data shard instead of going through the entire dataset. If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.
- tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
- use_o_direct (bool, optional, default = False) – If set to True, the data will be read directly from the storage, bypassing the system cache. Mutually exclusive with dont_use_mmap=False; that is, dont_use_mmap must be set to True when this option is enabled.