Sarvam
Sarvam is a family of mixture-of-experts (MoE) language models from Sarvam AI. Megatron Bridge supports the Sarvam dense-MLA and MoE variants through its Sarvam bridge implementations.
Supported Variants
| Variant | Hugging Face ID | Notes |
|---|---|---|
| Sarvam 30B | | 30B total, 3B active, 128 experts top-6 |
| Sarvam 105B | | 105B total, 10.3B active, 128 experts top-8 |
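As a rough reading aid for the Notes column, the sketch below computes what share of parameters is active per token and what share of routed experts is selected for each variant. The figures are copied from the table; the dictionary layout and the helper names `active_fraction` and `routed_expert_fraction` are illustrative and not part of Megatron Bridge.

```python
# Figures copied from the table above (parameter counts in billions).
VARIANTS = {
    "Sarvam 30B": {"total_b": 30.0, "active_b": 3.0, "experts": 128, "top_k": 6},
    "Sarvam 105B": {"total_b": 105.0, "active_b": 10.3, "experts": 128, "top_k": 8},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of all parameters that participate in one token's forward pass."""
    return active_b / total_b


def routed_expert_fraction(experts: int, top_k: int) -> float:
    """Share of routed experts selected for one token."""
    return top_k / experts


for name, v in VARIANTS.items():
    params = active_fraction(v["total_b"], v["active_b"])
    routed = routed_expert_fraction(v["experts"], v["top_k"])
    print(f"{name}: {params:.2%} of parameters active, {routed:.2%} of experts routed")
```

The active-parameter share is larger than the routed-expert share because attention, embeddings, and any shared experts run for every token regardless of routing.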
Architecture Notes
Sarvam MoE models use QKV layernorm and Grouped Query Attention.
MoE layers map router weights, expert bias, shared experts, and per-expert gate/up/down projections.
Examples use `--trust-remote-code` for Hugging Face loading.
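To make the roles of the mapped MoE tensors concrete, here is a toy, scalar-valued sketch of how router weights, an expert bias, a shared expert, and per-expert gate/up/down projections typically interact in this style of MoE layer. It is not the Sarvam or Megatron Bridge implementation. The sigmoid scoring, the choice to apply the bias only during expert selection, and the SwiGLU form of each expert are assumptions borrowed from common MoE designs, and every function name here is hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def route_token(logits, expert_bias, top_k):
    """Select top_k experts for one token; return (indices, mixing weights)."""
    scores = [sigmoid(x) for x in logits]  # assumed: sigmoid affinity per expert
    # Assumed: the bias shifts which experts get selected, not how they are mixed.
    biased = [s + b for s, b in zip(scores, expert_bias)]
    chosen = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    weights = [scores[i] / total for i in chosen]  # renormalize over chosen experts
    return chosen, weights


def make_expert(gate: float, up: float, down: float):
    """Scalar stand-in for a gate/up/down MLP: down * (silu(gate*x) * (up*x))."""
    def expert(x: float) -> float:
        g = gate * x
        return down * ((g * sigmoid(g)) * (up * x))
    return expert


def moe_forward(x, logits, expert_bias, experts, shared_expert, top_k):
    """Shared expert runs for every token; routed experts are mixed by weight."""
    chosen, weights = route_token(logits, expert_bias, top_k)
    routed = sum(w * experts[i](x) for i, w in zip(chosen, weights))
    return shared_expert(x) + routed


random.seed(0)
NUM_EXPERTS, TOP_K = 128, 6  # expert count and top-k of the 30B variant
experts = [
    make_expert(random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1))
    for _ in range(NUM_EXPERTS)
]
shared = make_expert(0.5, 0.5, 0.5)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]  # stand-in router output
bias = [0.0] * NUM_EXPERTS

chosen, weights = route_token(logits, bias, TOP_K)
print("selected experts:", chosen)
print("layer output:", moe_forward(1.0, logits, bias, experts, shared, TOP_K))
```

In a real layer the logits come from multiplying the token's hidden state by the router weight matrix, and each expert operates on vectors rather than scalars; the scalar form only keeps the data flow easy to follow.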
Examples
For checkpoint import/export, round-trip validation, and inference commands, see the Sarvam examples README.
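For orientation, the `--trust-remote-code` flag used by the examples has a direct counterpart in the Hugging Face `transformers` API: the `trust_remote_code=True` keyword argument, which allows modeling code shipped in the model repository to run. The sketch below shows plain `transformers` loading only, on the assumption that the example scripts forward the flag this way; it is not a Megatron Bridge command. The helper name `load_sarvam_from_hf` is hypothetical, and the model ID is left as a parameter because this page does not list it.

```python
def load_sarvam_from_hf(model_id: str):
    """Load tokenizer and model, permitting the repo's custom modeling code to run.

    `model_id` is the Hugging Face repository ID of the variant to load.
    """
    # Imported lazily so this helper can be defined without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # trust_remote_code=True executes Python code downloaded from the model repo,
    # so only enable it for repositories you trust.
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    return tokenizer, model


# Usage (supply the Hugging Face ID of the chosen variant):
# tokenizer, model = load_sarvam_from_hf(model_id)
```

Loading the full model this way needs enough memory for all expert weights, not only the active ones, since every expert must be resident even though each token uses a few.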