Release Notes#

Release 2.0.6-variant#

This release contains model updates outlined in the following sections.

Mistral Medium 3.5 128B#

This is the initial release of Mistral Medium 3.5 128B. This NIM is part of the NIM Certified offering. For more information on this model, see the model card.

For GPU support, refer to the support matrix for Mistral Medium 3.5 128B.

Note the following limitations:

  • Mistral Medium 3.5 128B

    • For the /v1/responses endpoint, objects with "type": "input_text" are not supported. Use comparable fields instead, such as "input": "<text>" or "content": "<text>".

    • Prompts with Unicode characters in the range from 0x0e0020 to 0x0e007f can produce unpredictable responses. You should filter these characters out of a prompt before submitting the prompt to the VLM.

    • Per-request chat_template and chat_template_kwargs overrides are not supported for Mistral tokenizer-based models.

    • An HTTP 500 Internal Server Error may mask upstream errors, such as sending a request that contains an unsupported image format.

    • The /v1/chat/completions, /v1/messages, and /v1/completions endpoints silently accept out-of-range or wrong-type parameter values and return HTTP 200 OK responses. Responses may contain unexpected or invalid data. Some parameter validation is handled by the vLLM schema, while other validation is handled by the OpenAI schema. NIM reports parameter validation from vLLM only.

    • list-model-profiles may classify a profile as runnable on SKUs not listed in the support matrix; deployment may fail with a CUDA out-of-memory error.

    • Reasoning is not supported on the Anthropic /v1/messages endpoint.

    • Use the reasoning_effort request parameter to control reasoning behavior. The following values are supported:

      • "high" — Enable reasoning.

      • "none" — Disable reasoning (default). The model responds directly without generating reasoning tokens.

For information about past updates and older versions, refer to the previous release notes.

Mistral Small 4#

This is the initial release of Mistral Small 4. This NIM is part of the NIM Certified offering. For more information on this model, see the model card.

For GPU support, refer to the support matrix for Mistral Small 4.

Note the following limitations:

  • Mistral Small 4

    • For the /v1/responses endpoint, objects with "type": "input_text" are not supported. Use comparable fields instead, such as "input": "<text>" or "content": "<text>".

    • Prompts with Unicode characters in the range from 0x0e0020 to 0x0e007f can produce unpredictable responses. You should filter these characters out of a prompt before submitting the prompt to the VLM.

    • Per-request chat_template and chat_template_kwargs overrides are not supported for Mistral tokenizer-based models.

    • An HTTP 500 Internal Server Error may mask upstream errors, such as sending a request that contains an unsupported image format.

    • The /v1/chat/completions, /v1/messages, and /v1/completions endpoints silently accept out-of-range or wrong-type parameter values and return HTTP 200 OK responses. Responses may contain unexpected or invalid data. Some parameter validation is handled by the vLLM schema, while other validation is handled by the OpenAI schema. NIM reports parameter validation from vLLM only.

    • list-model-profiles may classify a profile as runnable on SKUs not listed in the support matrix; deployment may fail with a CUDA out-of-memory error.

    • Reasoning is not supported on the Anthropic /v1/messages endpoint.

    • Use the reasoning_effort request parameter to control reasoning behavior. The following values are supported:

      • "high" — Enable reasoning.

      • "none" — Disable reasoning (default). The model responds directly without generating reasoning tokens.

For information about past updates and older versions, refer to the previous release notes.