Mistral

User Guide (Latest Version)

Released in September 2023, Mistral AI’s Mistral 7B is a classic decoder-only transformer Large Language Model with significant enhancements. The Mistral AI team implemented Grouped-Query Attention and Sliding Window Attention, the latter of which leverages FlashAttention-2. More information about the model and how it was trained is available in the companion paper “Mistral 7B”.
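The sliding-window scheme can be sketched as an attention mask: each token attends only causally, and only to the most recent `window` tokens. Mistral 7B uses a 4096-token window; the function name and the tiny sizes below are purely illustrative, not part of any NeMo or Mistral API.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Boolean mask: entry [i][j] is True when position i may attend to j.

    Two conditions, as in Sliding Window Attention:
      * causal:  j <= i (no attending to future tokens)
      * window:  i - j < window (only the `window` most recent tokens)
    """
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]

# Tiny example: 6 positions, window of 3.
mask = sliding_window_mask(6, 3)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

Each row attends to at most `window` positions, so attention cost per token stays constant with sequence length; information from older tokens still propagates across layers, since each layer extends the effective receptive field by another window.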

| Feature                                    | Status |
|--------------------------------------------|--------|
| Data parallelism                           | ✓      |
| Tensor parallelism                         | ✓      |
| Pipeline parallelism                       | ✓      |
| Interleaved Pipeline Parallelism Schedule  | N/A    |
| Sequence parallelism                       | ✓      |
| Selective activation checkpointing         | ✓      |
| Gradient checkpointing                     | ✓      |
| Partial gradient checkpointing             | ✓      |
| FP32/TF32                                  | ✓      |
| AMP/FP16                                   | ✓      |
| BF16                                       | ✓      |
| TransformerEngine/FP8                      | ✓      |
| Multi-GPU                                  | ✓      |
| Multi-Node                                 | ✓      |
| Inference                                  | N/A    |
| Slurm                                      | ✓      |
| Base Command Manager                       | ✓      |
| Base Command Platform                      | ✓      |
| Distributed data preprocessing             | ✓      |
| NVfuser                                    | ✓      |
| P-Tuning and Prompt Tuning                 | ✓      |
| IA3 and Adapter learning                   | ✓      |
| Distributed Optimizer                      | ✓      |
| Distributed Checkpoint                     | ✓      |
| Fully Sharded Data Parallel                | N/A    |
Last updated on Jun 19, 2024.