Release Notes for NVIDIA NIM for Image OCR (NeMo Retriever OCR)

This documentation contains the release notes for NVIDIA NIM for Image OCR (NeMo Retriever OCR).

Release 1.4.0

Added the nemotron-ocr-v2 model.
Added OCR v2 support for English and multilingual document text extraction.
Added the NIM_OCR_MODEL_VERSION environment variable to select multilingual OCR (v2, default) or English OCR (v2-en).
Added FP16 TensorRT profiles for supported NVIDIA data center GPUs.
Added a Helm chart for nemotron-ocr-v2 deployments.
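As an illustration of the NIM_OCR_MODEL_VERSION selection described above, the following minimal sketch validates a chosen value and renders it as a container environment argument. The helper names (`ocr_model_version_env`, `docker_env_args`) are hypothetical and not part of the NIM. The sketch assumes that `v2` and `v2-en` are the only accepted values, because those are the only ones named in these notes.

```python
# Values named in the release note: "v2" (multilingual, default) and "v2-en" (English).
# Assumption: no other values are accepted.
VALID_OCR_MODEL_VERSIONS = ("v2", "v2-en")
DEFAULT_OCR_MODEL_VERSION = "v2"


def ocr_model_version_env(version=None):
    """Return the environment entry that selects the OCR model version."""
    chosen = DEFAULT_OCR_MODEL_VERSION if version is None else version
    if chosen not in VALID_OCR_MODEL_VERSIONS:
        raise ValueError(
            f"unsupported NIM_OCR_MODEL_VERSION {chosen!r}; "
            f"expected one of {VALID_OCR_MODEL_VERSIONS}"
        )
    return {"NIM_OCR_MODEL_VERSION": chosen}


def docker_env_args(env):
    """Flatten an environment mapping into `-e KEY=VALUE` style arguments."""
    args = []
    for key, value in sorted(env.items()):
        args.extend(["-e", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Select the English-only model instead of the multilingual default.
    print(docker_env_args(ocr_model_version_env("v2-en")))
```

The resulting arguments can be appended to whatever container launch command a deployment already uses. Leaving the variable unset keeps the multilingual default.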

Setting NIM_TRITON_ENABLE_MODEL_CONTROL=true causes a race condition in which model warmup is attempted before the models are loaded.
On NVIDIA A10G and NVIDIA L4 GPUs, the multilingual OCR v2 TensorRT profiles can produce lower OCR accuracy than the PyTorch profile. To resolve this issue, refer to Use PyTorch on A10G and L4 GPUs.
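A deployment script might want to detect when the A10G and L4 accuracy issue above applies. The following is a rough sketch under stated assumptions: the function name `should_prefer_pytorch` is hypothetical, the GPU name is assumed to be a string such as the one reported by `nvidia-smi`, and the check covers only the multilingual `v2` model named in the known issue. It matches whole tokens so that `L40` or `L40S` is not mistaken for `L4`.

```python
# GPU models named in the known issue.
AFFECTED_GPU_MODELS = {"A10G", "L4"}
# The issue is reported for the multilingual OCR v2 profiles.
MULTILINGUAL_VERSION = "v2"


def should_prefer_pytorch(gpu_name, model_version="v2"):
    """Return True when the known issue suggests using the PyTorch profile.

    Matching is done on whole whitespace-separated tokens, so "NVIDIA L40S"
    does not match "L4" and "NVIDIA A10" does not match "A10G".
    """
    tokens = set(gpu_name.upper().replace("-", " ").split())
    on_affected_gpu = bool(tokens & AFFECTED_GPU_MODELS)
    return model_version == MULTILINGUAL_VERSION and on_affected_gpu


if __name__ == "__main__":
    for name in ("NVIDIA A10G", "NVIDIA L4", "NVIDIA L40S", "NVIDIA H100"):
        print(name, should_prefer_pytorch(name))
```

The sketch only flags the condition. How to switch to the PyTorch profile is covered in the referenced section, Use PyTorch on A10G and L4 GPUs.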

Renamed existing models to the new Nemotron brand. The following model is impacted:
- The nemoretriever-ocr-v1 model is now named nemotron-ocr-v1.
Added fixes for high and critical vulnerabilities.
Added performance optimizations.
Added the following new environment variables. For details, refer to environment variables.
- NIM_TRITON_DATA_MAX_BATCH_SIZE
- NIM_TRITON_DYNAMIC_BATCHING_ENABLED
- NIM_TRITON_ENABLE_ASYNC_MODEL_EXECUTION
- NIM_TRITON_ENABLE_PIPELINE_TIMING
- NIM_TRITON_GPU_DECODING_BATCH_THRESHOLD
- NIM_TRITON_MODEL_MAX_BATCH_SIZE
- NIM_TRITON_MODEL_MAX_QUEUE_DELAY_MICROSECONDS
- NIM_TRITON_PIPELINE_MAX_BATCH_SIZE
- NIM_TRITON_PIPELINE_MAX_QUEUE_DELAY_MICROSECONDS
- NIM_TRITON_PIPELINE_TIMING_INTERVAL
- NIM_TRITON_WORKER_INSTANCE_COUNT
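When auditing a deployment, it can help to see which of the variables listed above are explicitly set. The following sketch is illustrative only: the helper name `configured_triton_overrides` is hypothetical, and it makes no assumption about each variable's type, default, or valid range, which are described in the environment variables reference.

```python
import os

# The environment variables introduced in this release, as listed above.
NEW_TRITON_VARIABLES = (
    "NIM_TRITON_DATA_MAX_BATCH_SIZE",
    "NIM_TRITON_DYNAMIC_BATCHING_ENABLED",
    "NIM_TRITON_ENABLE_ASYNC_MODEL_EXECUTION",
    "NIM_TRITON_ENABLE_PIPELINE_TIMING",
    "NIM_TRITON_GPU_DECODING_BATCH_THRESHOLD",
    "NIM_TRITON_MODEL_MAX_BATCH_SIZE",
    "NIM_TRITON_MODEL_MAX_QUEUE_DELAY_MICROSECONDS",
    "NIM_TRITON_PIPELINE_MAX_BATCH_SIZE",
    "NIM_TRITON_PIPELINE_MAX_QUEUE_DELAY_MICROSECONDS",
    "NIM_TRITON_PIPELINE_TIMING_INTERVAL",
    "NIM_TRITON_WORKER_INSTANCE_COUNT",
)


def configured_triton_overrides(environ=None):
    """Return the subset of the new variables that is set, with raw string values."""
    source = os.environ if environ is None else environ
    return {name: source[name] for name in NEW_TRITON_VARIABLES if name in source}


if __name__ == "__main__":
    # Prints only the variables that are set in the current process environment.
    for name, value in configured_triton_overrides().items():
        print(f"{name}={value}")
```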

The following are the known issues that are fixed in this version:

Fixed an issue with the persistence.enabled Helm chart value. The persistent storage options (persistence.storageClass, persistence.existingClaim, and hostPath.enabled) are now fully functional.
Memory is now freed after inference requests complete.
NIM_TRITON_MODEL_INSTANCE_COUNT now controls only the number of model instances, not the number of pipeline workers.

Setting NIM_TRITON_ENABLE_MODEL_CONTROL=true causes a race condition in which model warmup is attempted before the models are loaded.

This is a patch release of the Image OCR NIM (NeMo Retriever OCR).
Image OCR NIM (NeMo Retriever OCR) now selects the smallest sufficient TensorRT profile for the configured NIM_TRITON_MAX_BATCH_SIZE instead of loading all TensorRT profiles simultaneously.
HTTP responses with code 422 now have body formats that comply with the OpenAPI standard.
Added the NIM_TRITON_MAX_QUEUE_DELAY_MICROSECONDS alias for backward compatibility with the EA release of Image OCR NIM (NeMo Retriever OCR). For details, refer to environment variables.
Changed the NIM_TRITON_MAX_QUEUE_DELAY_MICROSECONDS default value from 0 to 10000. For details, refer to environment variables.
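The "smallest sufficient TensorRT profile" behavior noted above can be pictured with a short sketch. This is not the NIM's implementation. The function name `select_trt_profile` and the example profile batch sizes are hypothetical, and the sketch assumes that each profile is characterized by the maximum batch size it supports.

```python
def select_trt_profile(profile_max_batch_sizes, configured_max_batch_size):
    """Pick the smallest profile whose maximum batch size covers the configured one.

    profile_max_batch_sizes: maximum batch size of each available profile.
    configured_max_batch_size: the value of NIM_TRITON_MAX_BATCH_SIZE.
    """
    candidates = [
        size for size in profile_max_batch_sizes
        if size >= configured_max_batch_size
    ]
    if not candidates:
        raise ValueError(
            f"no profile supports a batch size of {configured_max_batch_size}"
        )
    return min(candidates)


if __name__ == "__main__":
    # Hypothetical profile sizes, chosen only for illustration.
    available = [1, 8, 32]
    print(select_trt_profile(available, 4))
```

Loading only one profile rather than every profile is the change the release note describes. The selection rule above is one plausible reading of "smallest sufficient".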

The persistence.enabled value and all dependent configuration flags are currently non-functional in the NIM Helm chart.

This is the first General Access release of the NVIDIA NIM for Image OCR (NeMo Retriever OCR).
Added TensorRT-optimized engines for GPUs with CUDA compute capability 12.0, 10.0, 9.0, 8.9, 8.6, or 8.0.
The NIM_TRITON_OPTIMIZATION_MODE environment variable is no longer supported.
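To check a GPU against the compute capabilities listed above, a small lookup is enough. The sketch below is illustrative: the function names are hypothetical, the capability is assumed to be a string such as `8.9`, and it assumes that an exact match against the listed values is what determines whether an optimized engine exists.

```python
# Compute capabilities named in the release note, as (major, minor) pairs.
SUPPORTED_COMPUTE_CAPABILITIES = {
    (12, 0), (10, 0), (9, 0), (8, 9), (8, 6), (8, 0),
}


def parse_compute_capability(text):
    """Convert a string such as "8.9" into a (major, minor) tuple of ints."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def has_trt_engine(major, minor):
    """Return True if the release note lists this compute capability."""
    return (major, minor) in SUPPORTED_COMPUTE_CAPABILITIES


if __name__ == "__main__":
    for capability in ("8.9", "7.5"):
        print(capability, has_trt_engine(*parse_compute_capability(capability)))
```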

The persistence.enabled value and all dependent configuration flags are currently non-functional in the NIM Helm chart.

Upgraded to use Triton Inference Server 25.08 to address CVEs.
Added Triton Ensemble Configuration, which supports configuring the underlying Triton ensemble model pipeline.
Added the NIM_TRITON_PINNED_MEMORY_POOL_MB environment variable.
Added the NIM_TRITON_ENABLE_MODEL_CONTROL environment variable.
Added the NIM_TRITON_IDLE_BYTES_LIMIT environment variable.
Added the NIM_TRITON_FLUSH_INTERVAL environment variable.
Added the NIM_TRITON_RATE_LIMIT environment variable.

The persistence.enabled value and all dependent configuration flags are currently non-functional in the NIM Helm chart.

This is the first Early Access release of the NVIDIA NIM for Image OCR (NeMo Retriever OCR).