# Mistral Models

This page provides detailed technical specifications for the Mistral model family supported by NeMo Customizer. For information about supported features and capabilities, refer to Tested Models.

## Mistral-7B-Instruct-v0.3

| Property | Value |
|---|---|
| Creator | Mistral AI |
| Architecture | transformer |
| Description | Mistral-7B-Instruct-v0.3 is an instruction-tuned model optimized for dialogue and instruction-following tasks. |
| Max I/O Tokens | 4096 |
| Parameters | 7 billion |
| Training Data | Not specified |
| Default Name | `mistralai/Mistral-7B-Instruct-v0.3` |
| HuggingFace | `mistralai/Mistral-7B-Instruct-v0.3` |

### Training Options

- LoRA: 1x 80GB GPU, tensor parallel size 1
- Full SFT: 1x 80GB GPU, tensor parallel size 1
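
The options above can be sketched as a customization job request. This is an illustrative assumption, not a verified API call: the field names (`config`, `dataset`, `hyperparameters`, `finetuning_type`), the dataset name, and the endpoint in the comment are all placeholders you should check against your NeMo Customizer release.

```python
def build_lora_job(model: str, dataset: str) -> dict:
    """Assemble a LoRA fine-tuning request body (field names are assumptions)."""
    return {
        "config": model,                   # default model name from the table above
        "dataset": {"name": dataset},      # hypothetical dataset reference
        "hyperparameters": {
            "training_type": "sft",
            "finetuning_type": "lora",     # swap for full-weight SFT if desired
            "epochs": 2,
            "lora": {"adapter_dim": 16},
        },
    }

job = build_lora_job("mistralai/Mistral-7B-Instruct-v0.3", "my-dataset")
# POST this body to your Customizer jobs endpoint, e.g.:
# requests.post(f"{CUSTOMIZER_URL}/v1/customization/jobs", json=job)
```

Either training option fits on a single 80 GB GPU for this model, so the job body does not need any parallelism overrides.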

### Deployment Configuration

- LoRA:
  - NIM Image: `nvcr.io/nim/nvidia/llm-nim:1.15.5`
  - GPU Count: 1x 80GB
- Full SFT:
  - NIM Image: `nvcr.io/nim/nvidia/llm-nim:1.15.5`
  - GPU Count: 1x 80GB
  - Additional Environment Variables:
    - `NIM_MODEL_PROFILE`: `vllm`
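
As a minimal sketch, the deployment settings above can be expressed as a small spec builder. The key names (`image`, `gpu`, `env`) are illustrative assumptions about the shape of a deployment spec; only the image tag, GPU count, and environment variable values come from the list above.

```python
def build_deployment_config(full_sft: bool) -> dict:
    """Sketch a NIM deployment spec for Mistral-7B-Instruct-v0.3.

    Key names are illustrative assumptions; values come from the
    Deployment Configuration list above.
    """
    config = {
        "image": "nvcr.io/nim/nvidia/llm-nim:1.15.5",  # NIM image for both variants
        "gpu": 1,                                      # 1x 80GB GPU in both cases
        "env": {},
    }
    if full_sft:
        # Full-SFT checkpoints additionally require the vLLM model profile.
        config["env"]["NIM_MODEL_PROFILE"] = "vllm"
    return config
```

The only difference between the two variants is the extra `NIM_MODEL_PROFILE` environment variable for full-SFT checkpoints; LoRA adapters deploy with the same image and GPU count but no profile override.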

## Ministral-3-3B-Instruct-2512

| Property | Value |
|---|---|
| Creator | Mistral AI |
| Architecture | transformer |
| Description | Ministral-3-3B-Instruct-2512 is a compact instruction-tuned model from Mistral AI designed for efficient deployment. |
| Max I/O Tokens | 4096 |
| Parameters | 3 billion |
| Training Data | Not specified |
| Default Name | `mistralai/Ministral-3-3B-Instruct-2512` |
| HuggingFace | `mistralai/Ministral-3-3B-Instruct-2512` |

### Training Options

- LoRA: 1x 80GB GPU, tensor parallel size 1
- Full SFT: 2x 80GB GPU, tensor parallel size 1

Note: Deployment using NIM is not supported for this model.

## Ministral-3-3B-Reasoning-2512

| Property | Value |
|---|---|
| Creator | Mistral AI |
| Architecture | transformer |
| Description | Ministral-3-3B-Reasoning-2512 is a compact model from Mistral AI optimized for reasoning tasks. |
| Max I/O Tokens | 4096 |
| Parameters | 3 billion |
| Training Data | Not specified |
| Default Name | `mistralai/Ministral-3-3B-Reasoning-2512` |
| HuggingFace | `mistralai/Ministral-3-3B-Reasoning-2512` |

### Training Options

- LoRA: 1x 80GB GPU, tensor parallel size 1
- Full SFT: 2x 80GB GPU, tensor parallel size 1

Note: Deployment using NIM is not supported for this model.