nvidia.dali.fn.readers.file
nvidia.dali.fn.readers.file(*inputs, **kwargs)
- Reads file contents and returns file-label pairs.

  This operator can be used in the following modes:

  - Listing files from a directory, assigning labels based on subdirectory structure.

    In this mode, the directory indicated in the file_root argument should contain one or more subdirectories. The files in these subdirectories are listed and assigned labels based on the lexicographical order of the subdirectories. If you provide the file_filters argument with a list of glob strings, the operator will list files matching at least one of the patterns. Otherwise, a default set of filters is used (see the default value of file_filters for details).

    For example, this directory structure:

      <file_root>/0/image0.jpg
      <file_root>/0/world_map.jpg
      <file_root>/0/antarctic.png
      <file_root>/1/cat.jpeg
      <file_root>/1/dog.tif
      <file_root>/2/car.jpeg
      <file_root>/2/truck.jp2

    by default will yield the following outputs:

      <contents of 0/image0.jpg>      0
      <contents of 0/world_map.jpg>   0
      <contents of 0/antarctic.png>   0
      <contents of 1/cat.jpeg>        1
      <contents of 1/dog.tif>         1
      <contents of 2/car.jpeg>        2
      <contents of 2/truck.jp2>       2

    and with file_filters = ["*.jpg", "*.jpeg"] will yield the following outputs:

      <contents of 0/image0.jpg>      0
      <contents of 0/world_map.jpg>   0
      <contents of 1/cat.jpeg>        1
      <contents of 2/car.jpeg>        2

  - Use file names and labels stored in a text file.

    The file_list argument points to a file which contains one file name and label per line. Example:

      dog.jpg 0
      cute kitten.jpg 1
      doge.png 0

    The file names can contain spaces in the middle, but cannot contain trailing whitespace.

  - Use file names and labels provided as a list of strings and integers, respectively.

  As with other readers, the (file, label) pairs returned by this operator can be randomly shuffled and various sharding strategies can be applied. See documentation of this operator’s arguments for details.

 - Supported backends
- ‘cpu’ 
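
The directory-listing mode can be approximated in plain Python to see how labels fall out of the layout. The sketch below is an illustration only and not DALI's implementation: the helper names `list_labeled_files` and `build_example_tree` are hypothetical, the filter tuple is a subset of the documented defaults, and sorting file names within each subdirectory is a choice of the sketch (the documentation only states that labels follow the lexicographical order of the subdirectories).

```python
import fnmatch
import os
import tempfile

# Subset of the documented default file_filters, for illustration only.
EXAMPLE_FILTERS = ("*.jpg", "*.jpeg", "*.png", "*.tif", "*.jp2")


def list_labeled_files(file_root, file_filters=EXAMPLE_FILTERS, case_sensitive=False):
    """Return (relative_path, label) pairs; label = index of the sorted subdirectory."""
    patterns = list(file_filters)
    if not case_sensitive:
        patterns = [p.lower() for p in patterns]
    subdirs = sorted(
        d for d in os.listdir(file_root)
        if os.path.isdir(os.path.join(file_root, d))
    )
    pairs = []
    for label, subdir in enumerate(subdirs):
        # Sorting file names here is an assumption of this sketch.
        for name in sorted(os.listdir(os.path.join(file_root, subdir))):
            candidate = name if case_sensitive else name.lower()
            # Keep the file if it matches at least one glob pattern.
            if any(fnmatch.fnmatchcase(candidate, p) for p in patterns):
                pairs.append((f"{subdir}/{name}", label))
    return pairs


def build_example_tree(root):
    """Create empty files mirroring the directory structure from the example."""
    layout = {
        "0": ["image0.jpg", "world_map.jpg", "antarctic.png"],
        "1": ["cat.jpeg", "dog.tif"],
        "2": ["car.jpeg", "truck.jp2"],
    }
    for subdir, names in layout.items():
        os.makedirs(os.path.join(root, subdir))
        for name in names:
            open(os.path.join(root, subdir, name), "wb").close()


with tempfile.TemporaryDirectory() as root:
    build_example_tree(root)
    all_pairs = list_labeled_files(root)
    jpeg_pairs = list_labeled_files(root, ["*.jpg", "*.jpeg"])

print(all_pairs)
print(jpeg_pairs)
```

If the ordering is a plain string comparison, as this sketch assumes, a subdirectory named 10 sorts before one named 2, so zero-padded names such as 02 and 10 keep labels in numeric order.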
 
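
The text-file mode relies on each line holding a file name followed by a label, where the name may contain spaces. One way to read such lines, shown here as a minimal sketch rather than DALI's actual parser, is to split once from the right so that only the last whitespace-separated token is treated as the label. The function name `parse_file_list` is hypothetical, and skipping blank lines is a choice of this sketch.

```python
def parse_file_list(text):
    """Parse 'filename label' lines into (filename, int_label) pairs."""
    pairs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are skipped in this sketch
        # Split on the last run of whitespace: everything before it is the name.
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected '<filename> <label>'")
        name, label = parts
        pairs.append((name, int(label)))
    return pairs


example = "dog.jpg 0\ncute kitten.jpg 1\ndoge.png 0\n"
entries = parse_file_list(example)
print(entries)
```

Splitting from the right is what lets `cute kitten.jpg` survive as a single name; it also shows why trailing whitespace after the name is ambiguous in this format.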
 - Keyword Arguments
- bytes_per_sample_hint (int or list of int, optional, default = [0]) – Output size hint, in bytes per sample. If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size. 
- case_sensitive_filter (bool, optional, default = False) – If set to True, the filter will be matched case-sensitively, otherwise case-insensitively. 
- dont_use_mmap (bool, optional, default = False) – If set to True, the Loader will use plain file I/O instead of trying to map the file in memory. Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance when files are memory-mapped. 
- file_filters (str or list of str, optional, default = ['*.jpg', '*.jpeg', '*.png', '*.bmp', '*.tif', '*.tiff', '*.pnm', '*.ppm', '*.pgm', '*.pbm', '*.jp2', '*.webp', '*.flac', '*.ogg', '*.wav']) – A list of glob strings to filter the list of files in the sub-directories of the file_root. This argument is ignored when file paths are taken from file_list or files.
- file_list (str, optional) – Path to a text file that contains one whitespace-separated filename label pair per line. The filenames are relative to the location of that file or to file_root, if specified. This argument is mutually exclusive with files.
- file_root (str, optional) – Path to a directory that contains the data files. If not using file_list or files, this directory is traversed to discover the files. file_root is required in this mode of operation.
- files (str or list of str, optional) – A list of file paths to read the data from. If file_root is provided, the paths are treated as being relative to it. When using files, the labels are taken from the labels argument or, if it was not supplied, contain the indices at which each file appeared in the files list. This argument is mutually exclusive with file_list.
- initial_fill (int, optional, default = 1024) – Size of the buffer that is used for shuffling. If random_shuffle is False, this parameter is ignored.
- labels (int or list of int, optional) – Labels accompanying contents of files listed in the files argument. If not used, sequential 0-based indices are used as labels. 
- lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor. 
- num_shards (int, optional, default = 1) – Partitions the data into the specified number of parts (shards). This is typically used for multi-GPU or multi-node training. 
- pad_last_batch (bool, optional, default = False) – If set to True, pads the shard by repeating the last sample. Note: If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset. 
- prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches to be prefetched by the internal Loader. This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread. 
- preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used. 
- random_shuffle (bool, optional, default = False) – Determines whether to randomly shuffle data. A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.
- read_ahead (bool, optional, default = False) – Determines whether the accessed data should be read ahead. For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses. 
- seed (int, optional, default = -1) – Random seed. If not provided, it will be populated based on the global seed of the pipeline. 
- shard_id (int, optional, default = 0) – Index of the shard to read. 
- shuffle_after_epoch (bool, optional, default = False) – If set to True, the reader shuffles the entire dataset after each epoch. stick_to_shard and random_shuffle cannot be used when this argument is set to True.
- skip_cached_images (bool, optional, default = False) – If set to True, loading of the data is skipped when the sample is in the decoder cache. In this case, the output of the loader will be empty. 
- stick_to_shard (bool, optional, default = False) – Determines whether the reader should stick to a data shard instead of going through the entire dataset. If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training. 
- tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
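
The interaction between random_shuffle and initial_fill described above (data is read sequentially into a buffer, and samples are then picked from it at random) follows the general shape of a shuffle buffer. The sketch below is a generic version of that technique, not DALI's internal code: the function name `buffered_shuffle`, the replacement strategy, and the way the buffer is drained at the end are all assumptions made for illustration.

```python
import random


def buffered_shuffle(samples, initial_fill, seed=0):
    """Yield samples in a partially shuffled order using a fixed-size buffer."""
    if initial_fill < 1:
        raise ValueError("initial_fill must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for sample in samples:
        if len(buffer) < initial_fill:
            buffer.append(sample)  # sequential reads fill the buffer first
            continue
        # Emit a random buffered sample and put the newly read one in its slot.
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = sample
    # Input exhausted: emit whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


shuffled = list(buffered_shuffle(range(20), initial_fill=4, seed=0))
print(shuffled)
```

In this sketch a sample can never be emitted more than `initial_fill - 1` positions earlier than where it was read, so a small buffer gives only local shuffling, and a buffer of size 1 leaves the order unchanged. A larger buffer mixes more widely at the cost of memory.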