Important

You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.

Models

End-to-end ASR models typically follow an encoder-decoder design: the encoder performs acoustic modeling, i.e., it converts the speech waveform into features, and the decoder converts those features into text. The encoder contains the bulk of the trainable parameters and is usually the focus of SSL in ASR. Consequently, any architecture that can serve as an encoder in ASR models can be pre-trained using SSL. For an overview of the model architectures currently supported in NeMo’s ASR collection, refer to ASR Models. SSL models also follow the encoder-decoder style. During downstream fine-tuning, the encoder is retained, whereas the decoder used during SSL is replaced with a module specific to the downstream task. Refer to Checkpoints to see how this is accomplished in NeMo.
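The encoder-retention idea can be illustrated with a toy, framework-free sketch. Everything below is hypothetical: `Encoder`, `EncoderDecoderModel`, `ssl_decoder`, `toy_text_decoder`, and `fine_tune_from_ssl` are illustrative stand-ins, not NeMo classes or APIs, and the "features" and "decoding" are deliberately trivial. The sketch only shows the structural point: the fine-tuned model reuses the very same pre-trained encoder object, while the SSL decoder is swapped for a downstream head.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-ins for illustration only; these are NOT NeMo classes.


@dataclass
class Encoder:
    """Toy acoustic model: maps a waveform to a sequence of feature values."""
    weights: Dict[str, float] = field(default_factory=lambda: {"scale": 1.0})
    frame_size: int = 4

    def __call__(self, waveform: List[float]) -> List[float]:
        # Split the waveform into fixed-size frames and emit one scaled mean per frame.
        scale = self.weights["scale"]
        frames = [waveform[i:i + self.frame_size]
                  for i in range(0, len(waveform), self.frame_size)]
        return [scale * sum(f) / len(f) for f in frames]


@dataclass
class EncoderDecoderModel:
    """Generic encoder-decoder: the decoder consumes the encoder's features."""
    encoder: Encoder
    decoder: Callable[[List[float]], object]

    def __call__(self, waveform: List[float]) -> object:
        return self.decoder(self.encoder(waveform))


def ssl_decoder(features: List[float]) -> List[float]:
    # Placeholder for an SSL pretext head; here it just passes features through.
    return features


def toy_text_decoder(features: List[float]) -> str:
    # Placeholder downstream head: maps each feature to a toy output symbol.
    return "".join("a" if x > 0 else "_" for x in features)


def fine_tune_from_ssl(ssl_model: EncoderDecoderModel,
                       downstream_decoder: Callable[[List[float]], object]) -> EncoderDecoderModel:
    # Keep the pre-trained encoder (same object, same weights); drop the SSL decoder.
    return EncoderDecoderModel(encoder=ssl_model.encoder, decoder=downstream_decoder)


if __name__ == "__main__":
    pretrained = EncoderDecoderModel(Encoder({"scale": 2.0}), ssl_decoder)
    asr_model = fine_tune_from_ssl(pretrained, toy_text_decoder)
    wave = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]
    print(pretrained(wave))  # [2.0, -2.0]
    print(asr_model(wave))   # a_
```

Because the encoder object is shared rather than copied, any weights learned during SSL carry over unchanged into the downstream model; in a real training setup the encoder would then typically be fine-tuned together with the new head.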