Mistral
Mistral AI models are efficient transformer decoder models featuring sliding window attention for long context support. Mistral-Nemo is a 12B model developed jointly with NVIDIA.
Available Models
- Mistral-7B: v0.1, v0.2, v0.3
- Mistral-7B-Instruct: v0.1, v0.2, v0.3
- Mistral-Nemo-Instruct-2407: 12B
Architecture
MistralForCausalLM
Example HF Models
Example Recipes
Try with NeMo AutoModel
1. Install (full instructions):
2. Clone the repo to get the example recipes:
3. Run the recipe from inside the repo:
Run with Docker
1. Pull the container and mount a checkpoint directory:
2. Navigate to the AutoModel directory (where the recipes are):
3. Run the recipe:
See the Installation Guide and LLM Fine-Tuning Guide.
Fine-Tuning
See the LLM Fine-Tuning Guide.