Release Notes for NVIDIA NIM for Image OCR (NeMo Retriever OCR)#
This documentation contains the release notes for NVIDIA NIM for Image OCR (NeMo Retriever OCR).
Release 1.3.0#
Highlights#
Renamed existing models to the new Nemotron brand. The following model is impacted:
The nemoretriever-ocr-v1 model is now named nemotron-ocr-v1.
Added fixes for high-severity and critical vulnerabilities.
Added performance optimizations.
Added the following new environment variables. For details, refer to environment variables.
NIM_TRITON_DATA_MAX_BATCH_SIZE
NIM_TRITON_DYNAMIC_BATCHING_ENABLED
NIM_TRITON_ENABLE_ASYNC_MODEL_EXECUTION
NIM_TRITON_ENABLE_PIPELINE_TIMING
NIM_TRITON_GPU_DECODING_BATCH_THRESHOLD
NIM_TRITON_MODEL_MAX_BATCH_SIZE
NIM_TRITON_MODEL_MAX_QUEUE_DELAY_MICROSECONDS
NIM_TRITON_PIPELINE_MAX_BATCH_SIZE
NIM_TRITON_PIPELINE_MAX_QUEUE_DELAY_MICROSECONDS
NIM_TRITON_PIPELINE_TIMING_INTERVAL
NIM_TRITON_WORKER_INSTANCE_COUNT
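As an illustration, the new variables can be passed to the container at launch time. Only the environment-variable names below come from these release notes; the image path, tag, port, and the specific values are assumptions and should be adapted to your deployment.

```shell
# Illustrative launch sketch: variable names are from this release;
# image name, tag, port, and values are assumptions.
docker run --rm --gpus all \
  -p 8000:8000 \
  -e NGC_API_KEY \
  -e NIM_TRITON_MODEL_MAX_BATCH_SIZE=32 \
  -e NIM_TRITON_DYNAMIC_BATCHING_ENABLED=true \
  -e NIM_TRITON_WORKER_INSTANCE_COUNT=2 \
  nvcr.io/nim/nvidia/nemotron-ocr-v1:1.3.0
```

Refer to the environment variables documentation for the accepted values and defaults of each variable.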
Fixed Known Issues#
The following are the known issues that are fixed in this version:
Fixed an issue with the persistence.enabled helm chart value. Persistent storage options (persistence.storageClass, persistence.existingClaim, hostPath.enabled) are now fully functional.
Memory is now freed after inference requests complete.
NIM_TRITON_MODEL_INSTANCE_COUNT now controls only the number of model instances, not the number of pipeline workers.
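With the persistence fix, persistent storage can be configured through the chart values. The following values override is a sketch: the key names come from these notes, while the storage class name and claim name are assumptions for your cluster.

```yaml
# Illustrative helm values override; key names are from these notes,
# storageClass and claim names are assumptions.
persistence:
  enabled: true
  storageClass: standard
  # existingClaim: my-nim-cache   # alternatively, reuse an existing PVC
hostPath:
  enabled: false
```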
Known Issues#
Setting NIM_TRITON_ENABLE_MODEL_CONTROL=true causes a race condition in which model warmup is attempted before the models are loaded.
Release 1.2.1#
Summary#
This is a patch release of the Image OCR NIM (NeMo Retriever OCR).
Image OCR NIM (NeMo Retriever OCR) now selects the smallest sufficient TensorRT profile for the configured NIM_TRITON_MAX_BATCH_SIZE instead of loading all TensorRT profiles simultaneously.
HTTP responses with code 422 now have body formats that comply with the OpenAPI standard.
Added the NIM_TRITON_MAX_QUEUE_DELAY_MICROSECONDS alias for backward compatibility with the EA release of Image OCR NIM (NeMo Retriever OCR). For details, refer to environment variables.
Changed the NIM_TRITON_MAX_QUEUE_DELAY_MICROSECONDS default value from 0 to 100. For details, refer to environment variables.
Known Issues#
The persistence.enabled value and all related dependent configuration flags are currently non-functional in the NIM helm chart.
Release 1.2.0#
Summary#
This is the first General Access release of the NVIDIA NIM for Image OCR (NeMo Retriever OCR).
Added TensorRT (TRT) optimized engines for CUDA GPU Compute Capabilities 12.0, 10.0, 9.0, 8.9, 8.6, and 8.0.
The NIM_TRITON_OPTIMIZATION_MODE environment variable is no longer supported.
Known Issues#
The persistence.enabled value and all related dependent configuration flags are currently non-functional in the NIM helm chart.
Release 1.1.0#
Summary#
Upgraded to use Triton Inference Server 25.08 to address CVEs.
Added Triton Ensemble Configuration, which supports configuring the underlying Triton Ensemble model pipeline.
Added the NIM_TRITON_PINNED_MEMORY_POOL_MB environment variable.
Added the NIM_TRITON_ENABLE_MODEL_CONTROL environment variable.
Added the NIM_TRITON_IDLE_BYTES_LIMIT environment variable.
Added the NIM_TRITON_FLUSH_INTERVAL environment variable.
Added the NIM_TRITON_RATE_LIMIT environment variable.
Known Issues#
The persistence.enabled value and all related dependent configuration flags are currently non-functional in the NIM helm chart.
Release 1.0.0#
Summary#
This is the first Early Access release of the NVIDIA NIM for Image OCR (NeMo Retriever OCR).
Known Issues#
This release only supports a single GPU-agnostic PyTorch backend profile.