Mistral#

Mistral AI develops frontier large language models with a focus on efficiency and performance. The Mistral family spans both dense and Mixture-of-Experts architectures and features innovations such as sliding window attention and grouped query attention for efficient long-context handling.

Mistral models are supported via the Bridge system, which auto-detects the model configuration and maps weights between Hugging Face and Megatron formats.

Available Models#

Megatron Bridge supports the following Mistral model variants:

  • Mistral Small 3 (24B): 24B parameters with 128K context length

  • Mistral 7B: 7B parameters; an efficient dense baseline model

  • Mistral 7B Instruct: Instruction-tuned variant

Additional Mistral models (including MoE variants like Mixtral) may be supported through the standard conversion pipeline.
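
For example, a Mixtral checkpoint would go through the same entry point. The following is a minimal sketch rather than a confirmed support matrix: it assumes auto-detection recognizes the MoE architecture, and mistralai/Mixtral-8x7B-v0.1 is used purely for illustration.

from megatron.bridge import AutoBridge

# Hypothetical: an MoE checkpoint routed through the same conversion path.
# Auto-detection inspects the HF config to select the matching architecture bridge.
bridge = AutoBridge.from_hf_pretrained("mistralai/Mixtral-8x7B-v0.1")
provider = bridge.to_megatron_provider()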

Model Architecture Features#

  • Sliding Window Attention: Efficient attention mechanism for long sequences (see the sketch after this list)

  • Grouped Query Attention (GQA): Memory-efficient attention mechanism

  • Rotary Positional Embeddings (RoPE): Relative position encoding

  • SwiGLU Activation: Gated linear units in the feedforward network

  • Extended Context: Support for sequences up to 128K tokens (Mistral Small 3)

  • YaRN RoPE Scaling: Advanced RoPE scaling for extended context lengths
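
To make the sliding-window idea concrete, here is a small framework-free sketch of the attention mask it induces: each query attends to at most window keys (itself plus the window - 1 preceding positions) rather than the full causal prefix. Mistral 7B uses a 4096-token window; the tiny sizes below are for readability only.

# Minimal sketch of a causal sliding-window attention mask (x = may attend).
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    # Query i may attend to key j iff j <= i (causal) and i - j < window,
    # i.e. at most `window` of the most recent positions, including itself.
    return [[(j <= i) and (i - j < window) for j in range(seq_len)]
            for i in range(seq_len)]

for row in sliding_window_mask(seq_len=8, window=4):
    print("".join("x" if allowed else "." for allowed in row))

Each printed row is one query position; the band of x characters stays window wide, which is what bounds attention memory and compute for long sequences.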

Conversion with 🤗 Hugging Face#

Load HF → Megatron#

from megatron.bridge import AutoBridge

# Example: Mistral Small 3 24B
bridge = AutoBridge.from_hf_pretrained("mistralai/Mistral-Small-24B-Base-2501")
provider = bridge.to_megatron_provider()

# Optionally configure parallelism before instantiating the model
provider.tensor_model_parallel_size = 2
provider.pipeline_model_parallel_size = 1

model = provider.provide_distributed_model(wrap_with_ddp=False)
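
Note that provide_distributed_model expects torch.distributed (and Megatron's model parallel state) to be initialized, so with tensor_model_parallel_size = 2 the snippet is normally run under a distributed launcher with one process per GPU. A launch sketch, assuming the code above is saved as load_mistral.py (a hypothetical file name):

torchrun --nproc_per_node=2 load_mistral.py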

Import Checkpoint from HF#

python examples/conversion/convert_checkpoints.py import \
  --hf-model mistralai/Mistral-Small-24B-Base-2501 \
  --megatron-path /checkpoints/mistral_small_24b_megatron

Export Megatron → HF#

from megatron.bridge import AutoBridge

# Load the bridge from HF model ID
bridge = AutoBridge.from_hf_pretrained("mistralai/Mistral-Small-24B-Base-2501")

# Export a trained Megatron checkpoint to HF format
bridge.export_ckpt(
    megatron_path="/results/mistral_small_24b/checkpoints/iter_0000500",
    hf_path="/exports/mistral_small_24b_hf",
)
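
The exported directory is a standard Hugging Face checkpoint, so it can be sanity-checked with the transformers library. A minimal sketch, assuming transformers is installed and reusing the export path from above:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the exported checkpoint exactly like any other HF Mistral model.
model = AutoModelForCausalLM.from_pretrained("/exports/mistral_small_24b_hf")
tokenizer = AutoTokenizer.from_pretrained("/exports/mistral_small_24b_hf")

inputs = tokenizer("What is artificial intelligence?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))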

Run Inference on Converted Checkpoint#

python examples/conversion/hf_to_megatron_generate_text.py \
  --hf_model_path mistralai/Mistral-Small-24B-Base-2501 \
  --megatron_model_path /checkpoints/mistral_small_24b_megatron \
  --prompt "What is artificial intelligence?" \
  --max_new_tokens 100 \
  --tp 2

For more details, see examples/conversion/hf_to_megatron_generate_text.py.

Recipes#

Training recipes for Mistral models are not currently available. The Bridge supports checkpoint conversion for inference and deployment use cases.

Hugging Face Model Cards & References#

Hugging Face Model Cards#

  • Mistral Small 3 (24B): https://huggingface.co/mistralai/Mistral-Small-24B-Base-2501

  • Mistral Small 3 (24B) Instruct: https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501

  • Mistral 7B v0.1: https://huggingface.co/mistralai/Mistral-7B-v0.1

  • Mistral 7B Instruct v0.2: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2

Technical Papers#

  • Mistral 7B: https://arxiv.org/abs/2310.06825

Additional Resources#

  • Mistral AI Website: https://mistral.ai/

  • Mistral Documentation: https://docs.mistral.ai/