stages.audio.inference.asr_nemo#
Module Contents#
Classes#
InferenceAsrNemoStage | Stage that performs speech recognition inference using a NeMo model.
API#
- class stages.audio.inference.asr_nemo.InferenceAsrNemoStage#
Bases: nemo_curator.stages.base.ProcessingStage[nemo_curator.tasks.FileGroupTask | nemo_curator.tasks.DocumentBatch | nemo_curator.tasks.AudioBatch, nemo_curator.tasks.AudioBatch]
Stage that performs speech recognition inference using a NeMo model.
Args:
  - model_name (str): Name of the NeMo speech recognition model. See the full list at https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/all_chkpt.html
  - asr_model (Any): ASR model object. Defaults to None.
  - filepath_key (str): Key of the data object that holds the path to the audio file. Defaults to “audio_filepath”.
  - pred_text_key (str): Key of the field that stores the predicted transcription for an audio sample. Defaults to “pred_text”.
  - name (str): Stage name. Defaults to “ASR_inference”.
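As a rough illustration of the arguments above, the sketch below mirrors the documented fields and defaults in a plain dataclass. `AsrStageConfig` is a hypothetical stand-in written for this page, not the real `InferenceAsrNemoStage`, and the model name is only a placeholder value.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class AsrStageConfig:
    """Hypothetical stand-in mirroring the documented arguments (not the real stage)."""

    model_name: str                       # name of a NeMo ASR checkpoint
    asr_model: Any | None = None          # model object; None until one is supplied or loaded
    filepath_key: str = "audio_filepath"  # entry key holding the audio path
    pred_text_key: str = "pred_text"      # entry key receiving the transcription
    name: str = "ASR_inference"           # stage name


# Placeholder model name, chosen only for illustration.
default_cfg = AsrStageConfig(model_name="example_asr_model")

# Overriding the keys adapts the configuration to data that uses other field names.
custom_cfg = AsrStageConfig(
    model_name="example_asr_model",
    filepath_key="path",
    pred_text_key="hypothesis",
)
```

Only `model_name` has no usable default here, which matches the documented attribute list where it is the one value a caller is expected to supply.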
- asr_model: Any | None#
None
- check_cuda() → torch.device#
- filepath_key: str#
‘audio_filepath’
- inputs() → tuple[list[str], list[str]]#
Define the input attributes required by this stage.
Returns: Tuple of (top_level_attrs, data_attrs) where:
  - top_level_attrs: [“data”] (requires FileGroupTask.data to be populated)
- model_name: str#
None
- outputs() → tuple[list[str], list[str]]#
Define the output attributes produced by this stage.
Returns: Tuple of (top_level_attrs, data_attrs) where:
  - top_level_attrs: [“data”] (populates FileGroupTask.data)
  - data_attrs: [self.filepath_key, self.pred_text_key] (audio file path and predicted text)
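One way to read the `outputs()` contract is as the list of keys each data entry should carry once the stage has run. The sketch below checks an entry against the documented output `data_attrs`. `missing_output_keys` is a hypothetical helper, and entries are assumed to be plain dictionaries.

```python
def missing_output_keys(
    entry: dict,
    filepath_key: str = "audio_filepath",
    pred_text_key: str = "pred_text",
) -> list[str]:
    """Return the documented output keys that are absent from one data entry."""
    required = [filepath_key, pred_text_key]  # mirrors data_attrs from outputs()
    return [key for key in required if key not in entry]


complete = {"audio_filepath": "a.wav", "pred_text": "hello world"}
incomplete = {"audio_filepath": "b.wav"}  # no transcription stored yet
```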
- pred_text_key: str#
‘pred_text’
- process(
- task: nemo_curator.tasks.FileGroupTask | nemo_curator.tasks.DocumentBatch | nemo_curator.tasks.AudioBatch,
Process an audio task by reading the audio file and running ASR inference.
Args: tasks: List of FileGroupTask containing a path to an audio file for inference.
Returns: List of SpeechObject with self.filepath_key. If errors occur, the task is returned with error information stored.
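The flow described for `process` can be sketched with plain Python: collect the file paths, run one batched transcription call, then emit one entry per file that carries both keys. Everything here is an assumption for illustration: `run_asr_on_paths` and `fake_transcribe` are hypothetical, the real stage works on task objects rather than bare lists, and the error handling mentioned above is not reproduced.

```python
from typing import Callable


def run_asr_on_paths(
    paths: list[str],
    transcribe: Callable[[list[str]], list[str]],
    filepath_key: str = "audio_filepath",
    pred_text_key: str = "pred_text",
) -> list[dict]:
    """Pair each audio path with its predicted text (illustrative only)."""
    texts = transcribe(paths)  # one batched call, in the shape of transcribe(files)
    if len(texts) != len(paths):
        raise ValueError("transcribe must return one text per input file")
    return [
        {filepath_key: path, pred_text_key: text}
        for path, text in zip(paths, texts)
    ]


def fake_transcribe(files: list[str]) -> list[str]:
    """Stand-in for a real model: returns a dummy transcript per file."""
    return [f"transcript of {f}" for f in files]


entries = run_asr_on_paths(["a.wav", "b.wav"], fake_transcribe)
```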
- setup(
- _worker_metadata: nemo_curator.backends.base.WorkerMetadata = None,
Initialise the heavy object self.asr_model (a nemo_asr.models.ASRModel).
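Deferring the model load to `setup` keeps the stage object cheap to construct before a worker starts. A minimal sketch of that pattern follows. `LazyAsrStage` and its `loader` argument are hypothetical, the "load only when `asr_model` is None" condition is an assumption, and the real stage presumably loads a NeMo model by `model_name` rather than calling an injected function.

```python
from typing import Any, Callable, Optional


class LazyAsrStage:
    """Hypothetical stand-in showing deferred initialisation of a heavy model."""

    def __init__(
        self,
        model_name: str,
        loader: Callable[[str], Any],
        asr_model: Optional[Any] = None,
    ) -> None:
        self.model_name = model_name
        self.asr_model = asr_model  # stays None until setup() runs
        self._loader = loader

    def setup(self) -> None:
        # Assumption: load only if no model object was supplied up front.
        if self.asr_model is None:
            self.asr_model = self._loader(self.model_name)


load_calls: list[str] = []


def dummy_loader(name: str) -> dict:
    """Pretend to load a model; records the call so the laziness is visible."""
    load_calls.append(name)
    return {"loaded": name}


stage = LazyAsrStage("example_asr_model", dummy_loader)
calls_before_setup = len(load_calls)  # nothing is loaded at construction time
stage.setup()
stage.setup()  # second call is a no-op because the model is already set
```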
- setup_on_node(
- _node_info: nemo_curator.backends.base.NodeInfo | None = None,
- _worker_metadata: nemo_curator.backends.base.WorkerMetadata = None,
Setup method called once per node in distributed settings. Override this method to perform node-level initialization.
Args:
  - node_info (NodeInfo, optional): Information about the node (provided by some backends)
  - worker_metadata (WorkerMetadata, optional): Information about the worker (provided by some backends)
- transcribe(files: list[str]) → list[str]#
Run inference with the speech recognition model.
Args: files: list of audio file paths.
Returns: list of predicted texts.
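Depending on the NeMo version and model type, a model's own `transcribe` call may return plain strings or hypothesis objects that expose a `.text` attribute. The sketch below normalises either shape to `list[str]`, matching the documented return type. `to_texts` and `FakeHypothesis` are hypothetical, and whether the real stage performs this normalisation is an assumption.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeHypothesis:
    """Hypothetical stand-in for an object that carries a .text field."""

    text: str


def to_texts(outputs: list[Any]) -> list[str]:
    """Convert a list of strings or .text-bearing objects to plain strings."""
    return [item if isinstance(item, str) else item.text for item in outputs]


from_strings = to_texts(["hello", "world"])
from_objects = to_texts([FakeHypothesis("hello"), FakeHypothesis("world")])
```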