Release Notes#

Release 1.2.0#

Summary#

This is the initial release of nemoretriever-parse. For more information, see nemoretriever-parse Overview.

Visual Language Models#

Limitations#

  • Only one image per request is supported (see the request sketch after this list).

  • Text input is not allowed.

  • System messages are not allowed.
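
The limitations above translate directly into request shape. The following is a minimal sketch of a conforming request, assuming an OpenAI-compatible /v1/chat/completions endpoint on the default local port and the model name nvidia/nemoretriever-parse; the endpoint, port, and model name are assumptions, so adjust them for your deployment. The request carries exactly one image, no free-text input, and no system message.

```python
import base64
import requests

# Assumptions: the NIM is reachable on localhost:8000 and exposes an
# OpenAI-compatible chat completions endpoint; the model name below is
# illustrative -- substitute the name reported by /v1/models.
ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL = "nvidia/nemoretriever-parse"

with open("page.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "model": MODEL,
    "messages": [
        {
            # A single user message: no system message, no text part,
            # and exactly one image, per the limitations above.
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                }
            ],
        }
    ],
}

response = requests.post(ENDPOINT, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```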

Release 1.1.1#

Summary#

This patch release fixes CUDA runtime errors seen on AWS and Azure instances.

Visual Language Models#

Limitations#

  • PEFT is not supported.

  • Following Meta’s guidance, function calling is not supported.

  • Following Meta’s guidance, only one image per request is supported (see the request sketch after this list).

  • Following Meta’s guidance, system messages are not allowed with images.

  • Following the official vLLM implementation, images are always added to the front of user messages.

  • Maximum concurrency can be low when using the vLLM backend.

  • Image and vision encoder Prometheus metrics are not available with the vLLM backend.

  • With context lengths larger than 32K tokens, the accuracy of Llama-3.2-90B-Vision-Instruct can degrade.
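
Several of these limitations concern request shape rather than deployment. The sketch below shows one way to stay within them, assuming an OpenAI-compatible /v1/chat/completions endpoint on the default local port and the model name meta/llama-3.2-90b-vision-instruct (both are assumptions; adjust for your deployment): a single user message with exactly one image, the prompt text alongside it, and no system message because an image is present.

```python
import base64
import requests

# Assumptions: the NIM is reachable on localhost:8000 with an OpenAI-compatible
# endpoint; the model name is illustrative -- check /v1/models on your deployment.
ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL = "meta/llama-3.2-90b-vision-instruct"

with open("chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "model": MODEL,
    # No system message: system messages are not allowed when an image
    # is included in the request.
    "messages": [
        {
            "role": "user",
            "content": [
                # Exactly one image per request. Regardless of where it appears
                # in `content`, the backend prepends it to the user message
                # (see the vLLM-related limitation above).
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
                {"type": "text", "text": "Describe the key trend in this chart."},
            ],
        }
    ],
    "max_tokens": 256,
}

response = requests.post(ENDPOINT, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```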

Release 1.1.0#

Summary#

This is the 1.1.0 release of NIM for VLMs.

Visual Language Models#

Limitations#

  • PEFT is not supported.

  • Following Meta’s guidance, function calling is not supported.

  • Following Meta’s guidance, only one image per request is supported.

  • Following Meta’s guidance, system messages are not allowed with images.

  • Following the official vLLM implementation, images are always added to the front of user messages.

  • Maximum concurrency can be low when using the vLLM backend.

  • Image and vision encoder Prometheus metrics are not available with the vLLM backend.

  • With context lengths larger than 32K tokens, the accuracy of Llama-3.2-90B-Vision-Instruct can degrade.

  • When deploying an optimized profile on AWS A10G, you might see the error [TensorRT-LLM][ERROR] ICudaEngine::createExecutionContextWithoutDeviceMemory: Error Code 1: Cuda Runtime (an illegal memory access was encountered). Use the vLLM backend instead (see the launch sketch below).
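
As a workaround for the A10G error above, the container can be started with a vLLM profile selected explicitly instead of the optimized TensorRT-LLM profile. The sketch below uses the Docker SDK for Python; the image tag, port, and the NIM_MODEL_PROFILE value are assumptions based on typical NIM deployments, so list the profiles available in your container (for example with its list-model-profiles utility, if present) and substitute the actual vLLM profile ID.

```python
import os
import docker
from docker.types import DeviceRequest

# Assumptions: the image tag, port, and profile value below are placeholders.
# Replace VLLM_PROFILE with a vLLM profile ID reported by your container.
IMAGE = "nvcr.io/nim/meta/llama-3.2-90b-vision-instruct:1.1.0"  # placeholder tag
VLLM_PROFILE = "vllm"  # placeholder; use the exact profile ID for your GPUs

client = docker.from_env()
container = client.containers.run(
    IMAGE,
    detach=True,
    environment={
        "NGC_API_KEY": os.environ["NGC_API_KEY"],
        # Overrides automatic profile selection so the TensorRT-LLM
        # optimized profile is not chosen on A10G.
        "NIM_MODEL_PROFILE": VLLM_PROFILE,
    },
    device_requests=[DeviceRequest(count=-1, capabilities=[["gpu"]])],
    ports={"8000/tcp": 8000},
    shm_size="16g",
)
print(f"Started {container.short_id}; the API should come up on port 8000.")
```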