# Release Notes

## Release 1.2.0

### Summary
This is the initial release of nemoretriever-parse. For more information, see nemoretriever-parse Overview.
### Visual Language Models

#### Limitations

- Only one image per request is supported. 
- Text input is not allowed. 
- System messages are not allowed (see the request sketch after this list).
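
Together, these limitations mean a request carries a single user message with exactly one image and nothing else. The following is a minimal sketch, assuming a local deployment that exposes the OpenAI-compatible chat completions endpoint on port 8000 and registers the model as `nvidia/nemoretriever-parse`; verify the host, port, and model name against your own deployment (for example, by querying `GET /v1/models`).

```python
# Minimal sketch of a nemoretriever-parse request that respects the limitations above:
# one image per request, no text content, and no system message.
# Assumptions: the NIM listens on localhost:8000 and the model is named
# "nvidia/nemoretriever-parse". Confirm both against your deployment.
import base64

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

# Encode the page image as a base64 data URL.
with open("page.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="nvidia/nemoretriever-parse",
    messages=[
        {
            "role": "user",
            # Exactly one image item; no text part and no system message.
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                }
            ],
        }
    ],
)

print(response.choices[0].message.content)
```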
## Release 1.1.1

### Summary
This patch release fixes CUDA runtime errors seen on AWS and Azure instances.
### Visual Language Models

#### Limitations

- PEFT is not supported. 
- Following Meta’s guidance, function calling is not supported. 
- Following Meta’s guidance, only one image per request is supported. 
- Following Meta’s guidance, system messages are not allowed with images. 
- Following the official vLLM implementation, images are always added to the front of user messages (see the request sketch after this list).
- Maximum concurrency can be low when using the vLLM backend. 
- Image and vision encoder Prometheus metrics are not available with the vLLM backend. 
- With context lengths larger than 32K, the accuracy of Llama-3.2-90B-Vision-Instruct can degrade.
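
As an illustration of how these constraints shape a request, the following is a minimal sketch, assuming a local NIM for VLMs deployment on port 8000 that serves `meta/llama-3.2-90b-vision-instruct`; the host, port, and model name are assumptions to check against your own deployment.

```python
# Minimal sketch of a request that respects the limitations above:
# one image, no system message alongside the image, and the image placed
# before the text part (the vLLM implementation moves images to the front
# of the user message in any case).
# Assumptions: the NIM listens on localhost:8000 and serves the model name
# "meta/llama-3.2-90b-vision-instruct". Confirm both against your deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

response = client.chat.completions.create(
    model="meta/llama-3.2-90b-vision-instruct",
    messages=[
        # No system message, because the request includes an image.
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sample.jpg"},
                },
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```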
## Release 1.1.0

### Summary
This is the 1.1.0 release of NIM for VLMs.
### Visual Language Models

#### Limitations

- PEFT is not supported. 
- Following Meta’s guidance, function calling is not supported. 
- Following Meta’s guidance, only one image per request is supported. 
- Following Meta’s guidance, system messages are not allowed with images. 
- Following the official vLLM implementation, images are always added to the front of user messages. 
- Maximum concurrency can be low when using the vLLM backend. 
- Image and vision encoder Prometheus metrics are not available with the vLLM backend. 
- With context lengths larger than 32K, the accuracy of Llama-3.2-90B-Vision-Instruct can degrade.
- When deploying an optimized profile on AWS A10G, you might see `[TensorRT-LLM][ERROR] ICudaEngine::createExecutionContextWithoutDeviceMemory: Error Code 1: Cuda Runtime (an illegal memory access was encountered)`. Use the vLLM backend instead.