nvidia.dali.experimental.dynamic.experimental.readers.Fits#
- class nvidia.dali.experimental.dynamic.experimental.readers.Fits(max_batch_size=None, name=None, device='cpu', num_inputs=None, *, dont_use_mmap=None, dtypes=None, file_filter=None, file_list=None, file_root=None, files=None, hdu_indices=None, initial_fill=None, lazy_init=None, num_shards=None, pad_last_batch=None, prefetch_queue_depth=None, random_shuffle=None, read_ahead=None, seed=None, shard_id=None, shuffle_after_epoch=None, skip_cached_images=None, stick_to_shard=None, tensor_init_bytes=None)#
- __init__(max_batch_size=None, name=None, device='cpu', num_inputs=None, *, dont_use_mmap=None, dtypes=None, file_filter=None, file_list=None, file_root=None, files=None, hdu_indices=None, initial_fill=None, lazy_init=None, num_shards=None, pad_last_batch=None, prefetch_queue_depth=None, random_shuffle=None, read_ahead=None, seed=None, shard_id=None, shuffle_after_epoch=None, skip_cached_images=None, stick_to_shard=None, tensor_init_bytes=None)#
Reads Fits image HDUs from a directory.
This operator can be used in the following modes:
1. Read all files from a directory indicated by file_root that match given file_filter.
2. Read file names from a text file indicated in file_list argument.
3. Read files listed in files argument.

The number of outputs per sample corresponds to the length of the hdu_indices argument. By default, the first HDU with data is read from each file, so the number of outputs defaults to 1.
- Supported backends
‘cpu’
‘gpu’
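The three discovery modes can be sketched as follows. This is a hedged sketch: the import path and parameter names are taken from this page, while the data paths, file names, and the helper `build_fits_readers` are illustrative placeholders. Writing the file_list text file uses plain Python; the reader construction is wrapped in a function so it only runs where DALI and the data are actually available.

```python
import os
import tempfile

# Mode 2 reads file names from a plain text file, one name per line;
# names are resolved relative to the text file (or to file_root, if given).
tmpdir = tempfile.mkdtemp()
list_path = os.path.join(tmpdir, "file_list.txt")
with open(list_path, "w") as f:
    f.write("galaxy_001.fits\ngalaxy_002.fits\n")


def build_fits_readers(list_path):
    """Sketch only: one reader per discovery mode (requires DALI + data)."""
    from nvidia.dali.experimental.dynamic.experimental.readers import Fits

    # Mode 1: traverse file_root and keep files matching file_filter.
    by_glob = Fits(file_root="/data/fits", file_filter="*.fits")
    # Mode 2: take file names from the text file created above.
    by_list = Fits(file_list=list_path)
    # Mode 3: pass the paths explicitly (relative to file_root if it is set).
    by_files = Fits(file_root="/data/fits", files=["galaxy_001.fits"])
    return by_glob, by_list, by_files
```

Note that file_list is mutually exclusive with files, and file_filter is ignored whenever paths come from file_list or files.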
- Keyword Arguments:
dont_use_mmap¶ (bool, optional, default = False) –
If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.
Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance.
dtypes¶ (DALIDataType or list of DALIDataType, optional) –
Data types of the respective outputs.
If specified, it must be a list of types of the respective outputs. By default, all outputs are assumed to be UINT8.
file_filter¶ (str, optional, default = ‘*.fits’) –
If a value is specified, the string is interpreted as a glob pattern to filter the list of files in the sub-directories of file_root.
This argument is ignored when file paths are taken from file_list or files.
file_list¶ (str, optional) –
Path to a text file that contains filenames (one per line). The filenames are relative to the location of the text file or to file_root, if specified.
This argument is mutually exclusive with files.
file_root¶ (str, optional) –
Path to a directory that contains the data files.
If not using file_list or files, this directory is traversed to discover the files. file_root is required in this mode of operation.
files¶ (str or list of str, optional) –
A list of file paths to read the data from.
If file_root is provided, the paths are treated as being relative to it.
This argument is mutually exclusive with file_list.
hdu_indices¶ (int or list of int, optional, default = [2]) – HDU indices to read. If not provided, the first HDU after the primary will be yielded. Since HDUs are indexed starting from 1, the default value is hdu_indices = [2]. The length of the provided hdu_indices list defines the number of outputs per sample.
initial_fill¶ (int, optional, default = 1024) –
Size of the buffer that is used for shuffling.
If random_shuffle is False, this parameter is ignored.
lazy_init¶ (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.
num_shards¶ (int, optional, default = 1) –
Partitions the data into the specified number of parts (shards).
This is typically used for multi-GPU or multi-node training.
pad_last_batch¶ (bool, optional, default = False) –
If set to True, pads the shard by repeating the last sample.
Note
If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.
prefetch_queue_depth¶ (int, optional, default = 1) –
Specifies the number of batches to be prefetched by the internal Loader.
This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.
random_shuffle¶ (bool, optional, default = False) –
Determines whether to randomly shuffle data.
A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.
read_ahead¶ (bool, optional, default = False) –
Determines whether the accessed data should be read ahead.
For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.
seed¶ (int, optional, default = -1) – Random seed; if not set, one will be assigned automatically.
shard_id¶ (int, optional, default = 0) – Index of the shard to read.
shuffle_after_epoch¶ (bool, optional, default = False) –
If set to True, the reader shuffles the entire dataset after each epoch.
stick_to_shard and random_shuffle cannot be used when this argument is set to True.
skip_cached_images¶ (bool, optional, default = False) –
If set to True, loading data is skipped when the sample is present in the decoder cache.
In this case, the output of the loader will be empty.
stick_to_shard¶ (bool, optional, default = False) –
Determines whether the reader should stick to a data shard instead of going through the entire dataset.
If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.
tensor_init_bytes¶ (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
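To make shard_id, num_shards, and pad_last_batch concrete, the snippet below is a plain-Python illustration (not DALI internals) of one common way a dataset is split into contiguous shards. When the dataset size is not divisible by num_shards, shard sizes differ by one sample, which is the uneven-shard situation that pad_last_batch addresses.

```python
def shard_bounds(num_samples, shard_id, num_shards):
    """Half-open [begin, end) range of sample indices for one shard."""
    begin = num_samples * shard_id // num_shards
    end = num_samples * (shard_id + 1) // num_shards
    return begin, end

# 10 samples over 3 shards: bounds (0, 3), (3, 6), (6, 10),
# i.e. shard sizes 3, 3 and 4; the last shard is one sample longer.
bounds = [shard_bounds(10, i, 3) for i in range(3)]
print(bounds)
```

With stick_to_shard=True a reader keeps reading the same range every epoch; otherwise it can advance through the other shards across epochs.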
- next_epoch(batch_size=None, ctx=None)#
Obtains an iterator that goes over the next epoch from the reader.
The return value is an iterator that returns either individual samples (if batch_size is None and was not specified at construction) or batches (if batch_size was specified here or at construction).
This iterator will go over the dataset (or shard, if sharding was specified at construction) once.
Note
The iterator must be traversed completely before the next call to next_epoch is made. Therefore, it is impossible to traverse one reader using two iterators. If another iterator is necessary, create a separate reader instance.
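The single-pass contract of next_epoch can be sketched like this (a hedged sketch: `consume_one_epoch` is an illustrative helper, not part of the API; only the next_epoch call itself comes from this page):

```python
def consume_one_epoch(reader, batch_size=None):
    """Fully drain one epoch before asking for the next one.

    The iterator returned by next_epoch must be traversed completely
    before next_epoch is called again, so we count items until the
    iterator is exhausted and only then return.
    """
    count = 0
    for _ in reader.next_epoch(batch_size=batch_size):
        count += 1  # one sample (batch_size=None) or one batch per step
    return count
```

With a reader sharded at construction, each call walks that shard exactly once; a second concurrent iterator over the same reader is not supported, so create a separate reader instance when two iterators are needed.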