
NeMo-Speech


Featured Community Checkpoints

Community fine-tunes built on NVIDIA NeMo ASR checkpoints and published on Hugging Face. For NVIDIA-published checkpoints, see ASR Model Checkpoints and the NVIDIA Hugging Face organization.

Note

Community checkpoints are maintained by their authors, not by the NeMo team. Use each model’s Hugging Face model card and the framework project linked below for up-to-date setup and inference instructions.

| Checkpoint | What’s special | Framework |
| --- | --- | --- |
| akera/parakeet-tdt-salt | SALT multilingual ASR for 10 East African languages. Hybrid TDT+CTC FastConformer (600M), fine-tuned from parakeet-tdt-0.6b-v3. | NeMo |
| johannhartmann/parakeet_de_med | German medical documentation ASR (PEFT). WER 11.73% → 3.28% on a 122-sample medical eval set. | NeMo |
| qenneth/parakeet-tdt-0.6b-v3-finetuned-for-ATC | ATC English ASR on jacktol/ATC-ASR-Dataset. Test WER 5.99%. | NeMo |
| KasuleTrevor/parakeet-0.6b-cv-sw-5hr_v9 | Swahili ASR fine-tune on ~5 hours of Common Voice data. | NeMo |
| NeurologyAI/neuro-parakeet-mlx | German medical/neurology ASR for Apple Silicon. WER 1.04% on the author’s medical validation set. | MLX |
| cstr/parakeet-tdt-0.6b-v3-GGUF | Quantised Parakeet TDT (Q4_K ~467 MB). 25 EU languages, word-level timestamps. | GGUF (CrispASR) |
| cstr/canary-1b-v2-GGUF | Quantised Canary 1B (Q4_K ~673 MB). Multilingual ASR and speech translation. | GGUF (CrispASR) |
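
Several entries above report word error rate (WER). As a rough aid for reading those figures, the sketch below computes WER as word-level edit distance divided by the number of reference words, and the relative reduction between two WER values. It assumes plain whitespace tokenization with no text normalization; the authors' own scoring setups are not described on this page and may differ. The function names `word_error_rate` and `relative_wer_reduction` are illustrative, not part of NeMo.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: words are split on whitespace and compared exactly,
    with no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(before: float, after: float) -> float:
    """Fraction by which WER dropped, relative to the starting WER."""
    return (before - after) / before


# Figures taken from the johannhartmann/parakeet_de_med row above.
reduction = relative_wer_reduction(11.73, 3.28)
print(f"Relative WER reduction: {reduction:.1%}")
```

WER values measured on different test sets, such as the 122-sample medical set and the ATC test split, are not directly comparable with each other.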

Submit a Community Checkpoint

To suggest a checkpoint for this page, open a GitHub issue with the Hugging Face model link, NeMo base checkpoint, task, languages, evaluation results, and inference framework.
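
One way to keep a submission complete is to collect the six requested items in a small structure before writing the issue. The sketch below is only an illustration: the `CheckpointSubmission` class, its field names, and the output layout are hypothetical, not an official NeMo issue template, and the example values are placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckpointSubmission:
    """Hypothetical container for the items requested in a submission."""
    model_link: str           # Hugging Face model link
    base_checkpoint: str      # NeMo base checkpoint the model starts from
    task: str                 # e.g. ASR or speech translation
    languages: List[str]      # languages the model covers
    evaluation_results: str   # metric, value, and evaluation set
    inference_framework: str  # e.g. NeMo, MLX, GGUF

    def to_issue_body(self) -> str:
        """Render the fields as plain text, one item per line."""
        lines = [
            f"Hugging Face model link: {self.model_link}",
            f"NeMo base checkpoint: {self.base_checkpoint}",
            f"Task: {self.task}",
            f"Languages: {', '.join(self.languages)}",
            f"Evaluation results: {self.evaluation_results}",
            f"Inference framework: {self.inference_framework}",
        ]
        return "\n".join(lines)


# Placeholder values only; replace them with your model's details.
submission = CheckpointSubmission(
    model_link="https://huggingface.co/your-org/your-model",
    base_checkpoint="parakeet-tdt-0.6b-v3",
    task="ASR",
    languages=["en", "de"],
    evaluation_results="WER <value> on <evaluation set>",
    inference_framework="NeMo",
)
print(submission.to_issue_body())
```

The printed text can be pasted into the body of the GitHub issue.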


Copyright © 2026, NVIDIA Corporation.