Falcon H1#

Falcon H1 is a hybrid language model family that combines Mamba, attention, and MLP blocks in each decoder layer. Megatron Bridge supports Falcon H1 through a custom provider and model implementation.

Supported Variants#

The public examples target the smallest instruction checkpoint:

Falcon-H1-0.5B-Instruct: https://huggingface.co/tiiuae/Falcon-H1-0.5B-Instruct

The bridge is registered for the FalconH1ForCausalLM architecture and can auto-detect compatible Falcon H1 checkpoints when their Hugging Face config uses the falcon_h1 model type.

Architecture Notes#

Uses a custom FalconH1ModelProvider and FalconH1Model.
Each decoder block is represented as a parallel Mamba, attention, and MLP layer.
The bridge maps Mamba input projections, convolution weights, state-space parameters, QKV projections, and gated MLP weights.
Falcon H1 checkpoints use custom Hugging Face code; examples pass --trust-remote-code.

Examples#

For checkpoint conversion, tokenizer asset export, round-trip validation, and inference, see the Falcon H1 examples README.