nvidia.dali.fn.readers.nemo_asr
- nvidia.dali.fn.readers.nemo_asr(*inputs, **kwargs)
- Reads automatic speech recognition (ASR) data (audio, text) from an NVIDIA NeMo-compatible manifest.
- Example manifest file:
  {"audio_filepath": "path/to/audio1.wav", "duration": 3.45, "text": "this is a nemo tutorial"}
  {"audio_filepath": "path/to/audio1.wav", "offset": 3.45, "duration": 1.45, "text": "same audio file but using offset"}
  {"audio_filepath": "path/to/audio2.wav", "duration": 3.45, "text": "third transcript in this example"}
- Note: Only the audio_filepath field is mandatory. If duration is not specified, the whole audio file will be used. A missing text field produces an empty string as the text.
- Warning: Handling of the duration and offset fields is not yet implemented. The current implementation always reads the whole audio file.
- This reader produces between 1 and 4 outputs:
  - Decoded audio data: float, shape=(audio_length,)
  - (optional, if read_sample_rate=True) Audio sample rate: float, shape=(1,)
  - (optional, if read_text=True) Transcript text as a null-terminated string: uint8, shape=(text_len + 1,)
  - (optional, if read_idxs=True) Index of the manifest entry: int64, shape=(1,)
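The manifest is plain JSON Lines, one object per line, and the text output is a null-terminated uint8 sequence. The following is a minimal sketch in plain Python, not part of DALI. It writes a manifest like the example above and converts the documented text output back into a string. The helper names (write_manifest, expected_text_bytes, decode_text_output) are hypothetical, and UTF-8 encoding of the transcript is an assumption.

```python
import json
import os
import tempfile

# Manifest entries modeled on the example above. Only "audio_filepath" is
# mandatory; "offset", "duration" and "text" are optional.
entries = [
    {"audio_filepath": "path/to/audio1.wav", "duration": 3.45,
     "text": "this is a nemo tutorial"},
    {"audio_filepath": "path/to/audio2.wav", "duration": 3.45},  # no "text"
]


def write_manifest(path, records):
    """Write one JSON object per line, as in the example manifest (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            if "audio_filepath" not in record:
                raise ValueError("every entry needs an audio_filepath")
            f.write(json.dumps(record) + "\n")


def expected_text_bytes(record):
    """Byte values matching the documented text output: the transcript
    (empty if "text" is missing) plus a null terminator.
    UTF-8 encoding is an assumption of this sketch."""
    return list(record.get("text", "").encode("utf-8")) + [0]


def decode_text_output(values):
    """Convert a null-terminated uint8 sequence back into a Python string."""
    raw = bytes(values)
    return raw.split(b"\0", 1)[0].decode("utf-8")


manifest_path = os.path.join(tempfile.mkdtemp(), "manifest.json")
write_manifest(manifest_path, entries)
```

You can pass the resulting file to the reader as manifest_filepaths=[manifest_path]. The audio paths here are placeholders, so the files must exist before a pipeline can read them. To recover a transcript, convert one sample of the text output to NumPy (for example with TensorListCPU.at(i)) and pass it to decode_text_output.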
 - Supported backends
- ‘cpu’ 
 
 - Keyword Arguments:
- manifest_filepaths (str or list of str) – List of paths to NeMo-compatible manifest files.
- bytes_per_sample_hint (int or list of int, optional, default = [0]) – Output size hint, in bytes per sample. If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
- dont_use_mmap (bool, optional, default = False) – If set to True, the Loader will use plain file I/O instead of trying to map the file in memory. Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance with it.
- downmix (bool, optional, default = True) – If True, downmix all input channels to mono. If downmixing is turned on, the decoder always produces 1-D output.
- dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) – Output data type. Supported types: INT16, INT32, and FLOAT.
- initial_fill (int, optional, default = 1024) – Size of the buffer that is used for shuffling. If random_shuffle is False, this parameter is ignored.
- lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor. 
- max_duration (float, optional, default = 0.0) – If a value greater than 0 is provided, it specifies the maximum allowed duration, in seconds, of the audio samples. Samples with a duration longer than this value will be ignored.
- min_duration (float, optional, default = 0.0) – If a value greater than 0 is provided, it specifies the minimum allowed duration, in seconds, of the audio samples. Samples with a duration shorter than this value will be ignored.
- normalize_text (bool) – Warning: The argument normalize_text is no longer used and will be removed in a future release.
- num_shards (int, optional, default = 1) – Partitions the data into the specified number of parts (shards). This is typically used for multi-GPU or multi-node training.
- pad_last_batch (bool, optional, default = False) – If set to True, pads the shard by repeating the last sample. Note: If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.
- prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches to be prefetched by the internal Loader. This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.
- preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used. 
- quality (float, optional, default = 50.0) – Resampling quality, where 0 is the lowest and 100 is the highest. 0 corresponds to 3 lobes of the sinc filter, 50 gives 16 lobes, and 100 gives 64 lobes.
- random_shuffle (bool, optional, default = False) – Determines whether to randomly shuffle data. A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.
- read_ahead (bool, optional, default = False) – Determines whether the accessed data should be read ahead. For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.
- read_idxs (bool, optional, default = False) – Whether to output the indices of samples as they occur in the manifest file as a separate output.
- read_sample_rate (bool, optional, default = True) – Whether to output the sample rate for each sample as a separate output 
- read_text (bool, optional, default = True) – Whether to output the transcript text for each sample as a separate output 
- sample_rate (float, optional, default = -1.0) – If specified, the target sample rate, in Hz, to which the audio is resampled. 
- seed (int, optional, default = -1) – Random seed. If not provided, it will be populated based on the global seed of the pipeline.
- shard_id (int, optional, default = 0) – Index of the shard to read. 
- shuffle_after_epoch (bool, optional, default = False) – If True, the reader shuffles the whole dataset after each epoch.
- skip_cached_images (bool, optional, default = False) – If set to True, loading data will be skipped when the sample is in the decoder cache. In this case, the output of the loader will be empty.
- stick_to_shard (bool, optional, default = False) – Determines whether the reader should stick to a data shard instead of going through the entire dataset. If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect the accuracy of the training.
- tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
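The min_duration and max_duration rules can be illustrated with a small filter over the example manifest. This is a plain-Python sketch, not DALI's implementation. The boundaries follow the text: a bound of 0 is disabled, and samples strictly shorter or longer than the bound are dropped. Keeping entries that have no duration field is an assumption, and the function names are hypothetical.

```python
# Durations from the example manifest above (other fields omitted).
manifest = [
    {"audio_filepath": "path/to/audio1.wav", "duration": 3.45},
    {"audio_filepath": "path/to/audio1.wav", "offset": 3.45, "duration": 1.45},
    {"audio_filepath": "path/to/audio2.wav", "duration": 3.45},
]


def keep_entry(entry, min_duration=0.0, max_duration=0.0):
    """Apply the documented min/max duration rules; a bound of 0 is disabled."""
    duration = entry.get("duration")
    if duration is None:
        # Assumption: entries without a "duration" field are not filtered.
        return True
    if min_duration > 0 and duration < min_duration:
        return False  # shorter than min_duration: ignored
    if max_duration > 0 and duration > max_duration:
        return False  # longer than max_duration: ignored
    return True


def filter_manifest(entries, min_duration=0.0, max_duration=0.0):
    """Return the entries that survive both duration bounds."""
    return [e for e in entries if keep_entry(e, min_duration, max_duration)]
```

For example, min_duration=2.0 keeps the two 3.45 s entries, and max_duration=2.0 keeps only the 1.45 s entry.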
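The note under pad_last_batch, that an entire batch of repeated samples can appear, can be made concrete with a small model of num_shards and shard_id. This sketch assumes two things: shards are contiguous, near-equal index ranges, and pad_last_batch pads every shard with its last sample up to the batch count of the largest shard. Both are assumptions about DALI's behavior, and shard_bounds and padded_shard are hypothetical helpers.

```python
def shard_bounds(shard_id, num_shards, dataset_size):
    """[start, end) of a contiguous, near-equal shard (assumed split rule)."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    start = dataset_size * shard_id // num_shards
    end = dataset_size * (shard_id + 1) // num_shards
    return start, end


def padded_shard(shard_id, num_shards, dataset_size, batch_size):
    """Sample indices of one shard with pad_last_batch=True (assumed model):
    the shard is padded with its last sample up to the batch count of the
    largest shard."""
    start, end = shard_bounds(shard_id, num_shards, dataset_size)
    indices = list(range(start, end))
    if not indices:
        return indices
    largest = max(
        e - s
        for s, e in (shard_bounds(i, num_shards, dataset_size)
                     for i in range(num_shards))
    )
    num_batches = -(-largest // batch_size)  # ceiling division
    return indices + [indices[-1]] * (num_batches * batch_size - len(indices))
```

With 9 manifest entries, num_shards=2 and a batch size of 2, this model gives shard 0 the indices [0, 1, 2, 3, 3, 3]. Its third batch, [3, 3], consists entirely of repeated samples.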
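The quality argument is documented only at three points: 0 gives 3 lobes, 50 gives 16, and 100 gives 64. Exactly one quadratic passes through those points. From a·50² + b·50 = 13 and a·100² + b·100 = 61, we get a = 0.007 and b = -0.09, so lobes(q) = 0.007q² - 0.09q + 3. The sketch below evaluates that curve. Whether DALI uses this curve for intermediate quality values is an assumption, and lobes_for_quality is a hypothetical name.

```python
def lobes_for_quality(quality):
    """Quadratic through the documented points (0, 3), (50, 16), (100, 64).
    Values between those points are an assumption, not documented behavior."""
    if not 0.0 <= quality <= 100.0:
        raise ValueError("quality must be in [0, 100]")
    return round(0.007 * quality ** 2 - 0.09 * quality + 3)
```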
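The description of random_shuffle and initial_fill matches the common buffered-shuffle technique. Data is read sequentially into a buffer of initial_fill samples. Once the buffer is full, each newly read sample replaces a randomly chosen one that is emitted. This is a generic sketch of that technique, not DALI's internal code. The details of batching and epoch boundaries in DALI may differ, and buffered_shuffle is a hypothetical name.

```python
import random


def buffered_shuffle(samples, initial_fill, seed=None):
    """Shuffle a stream using a fixed-size buffer of initial_fill samples."""
    if initial_fill < 1:
        raise ValueError("initial_fill must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for sample in samples:  # data is read sequentially
        if len(buffer) < initial_fill:
            buffer.append(sample)  # still filling the buffer
            continue
        # Buffer is full: emit a randomly chosen sample and put the new one in its slot.
        i = rng.randrange(len(buffer))
        yield buffer[i]
        buffer[i] = sample
    rng.shuffle(buffer)  # drain whatever remains in random order
    yield from buffer
```

A small buffer gives weak mixing: with initial_fill=1 the output order equals the input order. The first emitted sample always comes from the first initial_fill samples read. This is why a larger buffer trades memory for better randomness.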