Sarvam

Sarvam language models are Mixture-of-Experts (MoE) models from Sarvam AI. Megatron Bridge supports the Sarvam dense-MLA and MoE variants through its Sarvam bridge implementations.

Supported Variants

| Variant | Hugging Face ID | Notes |
| --- | --- | --- |
| Sarvam 30B | sarvamai/sarvam-30b | 30B total, 3B active, 128 experts, top-6 |
| Sarvam 105B | sarvamai/sarvam-105b | 105B total, 10.3B active, 128 experts, top-8 |
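To make the sparsity of the two variants concrete, the short sketch below computes the fraction of parameters and the fraction of experts that are active per token. The figures are copied from the variant table; the dictionary layout and the helper name `sparsity_summary` are illustrative assumptions, not part of Megatron Bridge.

```python
# Figures copied from the variant table; the data layout is illustrative only.
VARIANTS = {
    "sarvamai/sarvam-30b": {"total_b": 30.0, "active_b": 3.0, "experts": 128, "top_k": 6},
    "sarvamai/sarvam-105b": {"total_b": 105.0, "active_b": 10.3, "experts": 128, "top_k": 8},
}


def sparsity_summary(spec):
    """Return (active parameter fraction, selected expert fraction) for one variant."""
    return spec["active_b"] / spec["total_b"], spec["top_k"] / spec["experts"]


for hf_id, spec in VARIANTS.items():
    param_frac, expert_frac = sparsity_summary(spec)
    print(f"{hf_id}: {param_frac:.1%} of parameters active, "
          f"{expert_frac:.1%} of experts selected per token")
```

Both variants activate roughly a tenth of their parameters per token, which is why memory planning is driven by the total size while per-token compute follows the active size.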

Architecture Notes

  • Sarvam MoE models use QKV layernorm and Grouped Query Attention.

  • MoE layers map router weights, expert bias, shared experts, and per-expert gate/up/down projections.

  • Examples use --trust-remote-code for Hugging Face loading.
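The roles of the router weights, expert bias, and shared experts named above can be sketched with a toy top-k router. This is a minimal illustration under stated assumptions, not the Sarvam or Megatron Bridge implementation: it assumes sigmoid scoring, assumes the expert bias only influences which experts are selected, and assumes gate weights are the unbiased scores renormalized over the selected experts. The function names `route_token` and `moe_forward` are hypothetical, and each "expert" here is a scalar function standing in for a gate/up/down MLP.

```python
import math


def route_token(router_logits, expert_bias, top_k):
    """Pick top_k experts for one token and return {expert_index: gate_weight}.

    Assumptions (not confirmed by the source): scores are sigmoid(logit),
    the bias affects selection only, and gate weights are the unbiased
    scores renormalized over the selected experts.
    """
    scores = [1.0 / (1.0 + math.exp(-z)) for z in router_logits]
    biased = [s + b for s, b in zip(scores, expert_bias)]
    # Rank experts by biased score and keep the top_k indices.
    chosen = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}


def moe_forward(x, router_logits, expert_bias, experts, shared_expert, top_k):
    """Add the shared expert's output to the gate-weighted routed experts."""
    gates = route_token(router_logits, expert_bias, top_k)
    routed = sum(w * experts[i](x) for i, w in gates.items())
    return shared_expert(x) + routed


# Toy usage: 4 scalar "experts" with top-2 routing.
experts = [lambda x, k=k: k * x for k in range(1, 5)]
logits = [0.0, 2.0, -1.0, 1.0]
gates = route_token(logits, [0.0] * 4, top_k=2)
y = moe_forward(1.0, logits, [0.0] * 4, experts, lambda x: 0.5 * x, top_k=2)
print(sorted(gates), y)
```

In a checkpoint conversion, each of these pieces (the router weights that produce the logits, the per-expert bias, the shared expert, and every routed expert's gate/up/down projections) needs its own weight mapping, which is what the MoE bullet above refers to.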

Examples

For checkpoint import/export, round-trip validation, and inference commands, see the Sarvam examples README.