# Release Notes

## Release 1.2.0

### Summary
This is the initial release of nemoretriever-parse. For more information, see nemoretriever-parse Overview.
### Visual Language Models

#### Limitations

- Only one image per request is supported. 
- Text input is not allowed. 
- System messages are not allowed (see the request sketch after this list).
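
Together, these limitations mean a request carries a single user message with exactly one image and nothing else. The following is a minimal sketch, assuming a local deployment that exposes the OpenAI-compatible chat completions endpoint on port 8000 and registers the model as `nvidia/nemoretriever-parse`; verify the host, port, and model name against your own deployment (for example, by querying `GET /v1/models`).

```python
# Minimal sketch of a nemoretriever-parse request that respects the limitations above:
# one image per request, no text content, and no system message.
# Assumptions: the NIM listens on localhost:8000 and the model is named
# "nvidia/nemoretriever-parse". Confirm both against your deployment.
import base64

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

# Encode the page image as a base64 data URL.
with open("page.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="nvidia/nemoretriever-parse",
    messages=[
        {
            "role": "user",
            # Exactly one image item; no text part and no system message.
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                }
            ],
        }
    ],
)

print(response.choices[0].message.content)
```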
## Release 1.1.1

### Summary
This patch release fixes CUDA runtime errors seen on AWS and Azure instances.
### Visual Language Models

#### Limitations

- PEFT is not supported. 
- Following Meta’s guidance, function calling is not supported. 
- Following Meta’s guidance, only one image per request is supported. 
- Following Meta’s guidance, system messages are not allowed with images. 
- Following the official vLLM implementation, images are always added to the front of user messages (see the request sketch after this list).
- Maximum concurrency can be low when using the vLLM backend. 
- Image and vision encoder Prometheus metrics are not available with the vLLM backend. 
- With context lengths larger than 32K, the accuracy of Llama-3.2-90B-Vision-Instruct can degrade.
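
As an illustration of how these constraints shape a request, the following is a minimal sketch, assuming a local NIM for VLMs deployment on port 8000 that serves `meta/llama-3.2-90b-vision-instruct`; the host, port, and model name are assumptions to check against your own deployment.

```python
# Minimal sketch of a request that respects the limitations above:
# one image, no system message alongside the image, and the image placed
# before the text part (the vLLM implementation moves images to the front
# of the user message in any case).
# Assumptions: the NIM listens on localhost:8000 and serves the model name
# "meta/llama-3.2-90b-vision-instruct". Confirm both against your deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

response = client.chat.completions.create(
    model="meta/llama-3.2-90b-vision-instruct",
    messages=[
        # No system message, because the request includes an image.
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sample.jpg"},
                },
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```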
## Release 1.1.0

### Summary
This is the 1.1.0 release of NIM for VLMs.
### Visual Language Models

#### Limitations

- PEFT is not supported. 
- Following Meta’s guidance, function calling is not supported. 
- Following Meta’s guidance, only one image per request is supported. 
- Following Meta’s guidance, system messages are not allowed with images. 
- Following the official vLLM implementation, images are always added to the front of user messages. 
- Maximum concurrency can be low when using the vLLM backend. 
- Image and vision encoder Prometheus metrics are not available with the vLLM backend. 
- With context lengths larger than 32K, the accuracy of Llama-3.2-90B-Vision-Instruct can degrade.
- When deploying an optimized profile on AWS A10G, you might see `[TensorRT-LLM][ERROR] ICudaEngine::createExecutionContextWithoutDeviceMemory: Error Code 1: Cuda Runtime (an illegal memory access was encountered)`. Use the vLLM backend instead.