Release Notes for NVIDIA NIM for Image OCR (NeMo Retriever OCR)

This documentation contains the release notes for NVIDIA NIM for Image OCR (NeMo Retriever OCR).

Release 1.4.0

Highlights

  • Added the nemotron-ocr-v2 model.

  • Added OCR v2 support for English and multilingual document text extraction.

  • Added the NIM_OCR_MODEL_VERSION environment variable to select multilingual OCR (v2, default) or English OCR (v2-en).

  • Added FP16 TensorRT profiles for supported NVIDIA data center GPUs.

  • Added a Helm chart for nemotron-ocr-v2 deployments.
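As a rough illustration of the NIM_OCR_MODEL_VERSION setting described above, the following Python sketch validates a requested value and builds the environment mapping that a container launcher could pass to the NIM. The helper name ocr_model_env is hypothetical and not part of the NIM; only the variable name and the values v2 (multilingual, default) and v2-en (English) come from these notes.

```python
from typing import Dict, Optional

# Values taken from the release notes: v2 is multilingual (default), v2-en is English.
ALLOWED_VERSIONS = {"v2": "multilingual OCR (default)", "v2-en": "English OCR"}
DEFAULT_VERSION = "v2"


def ocr_model_env(version: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: return the environment entry that selects the OCR model."""
    chosen = DEFAULT_VERSION if version is None else version
    if chosen not in ALLOWED_VERSIONS:
        raise ValueError(
            f"unsupported NIM_OCR_MODEL_VERSION {chosen!r}; "
            f"expected one of {sorted(ALLOWED_VERSIONS)}"
        )
    return {"NIM_OCR_MODEL_VERSION": chosen}


if __name__ == "__main__":
    # Select the English-only model for this deployment.
    print(ocr_model_env("v2-en"))
```

Rejecting unknown values before launch is a design choice for this sketch; how the NIM itself handles an unrecognized value is not stated in these notes.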

Known Issues

  • Setting NIM_TRITON_ENABLE_MODEL_CONTROL=true causes a race condition in which model warmup is attempted before the models are loaded.

  • On NVIDIA A10G and NVIDIA L4 GPUs, the multilingual OCR v2 TensorRT profiles can produce lower OCR accuracy than the PyTorch profile. To resolve this issue, refer to Use PyTorch on A10G and L4 GPUs.

Release 1.3.0

Highlights

  • Renamed existing models to the new Nemotron brand. The following model is affected:

    • The nemoretriever-ocr-v1 model is now named nemotron-ocr-v1.

  • Added fixes for high and critical vulnerabilities.

  • Added performance optimizations.

  • Added the following new environment variables. For details, refer to environment variables.

    • NIM_TRITON_DATA_MAX_BATCH_SIZE

    • NIM_TRITON_DYNAMIC_BATCHING_ENABLED

    • NIM_TRITON_ENABLE_ASYNC_MODEL_EXECUTION

    • NIM_TRITON_ENABLE_PIPELINE_TIMING

    • NIM_TRITON_GPU_DECODING_BATCH_THRESHOLD

    • NIM_TRITON_MODEL_MAX_BATCH_SIZE

    • NIM_TRITON_MODEL_MAX_QUEUE_DELAY_MICROSECONDS

    • NIM_TRITON_PIPELINE_MAX_BATCH_SIZE

    • NIM_TRITON_PIPELINE_MAX_QUEUE_DELAY_MICROSECONDS

    • NIM_TRITON_PIPELINE_TIMING_INTERVAL

    • NIM_TRITON_WORKER_INSTANCE_COUNT

Fixed Known Issues

The following known issues are fixed in this release:

  • Fixed an issue with the persistence.enabled Helm chart value. Persistent storage options (persistence.storageClass, persistence.existingClaim, hostPath.enabled) are now fully functional.

  • Memory is now freed after inference requests complete.

  • NIM_TRITON_MODEL_INSTANCE_COUNT now controls only the number of model instances, not the number of pipeline workers.

Known Issues

  • Setting NIM_TRITON_ENABLE_MODEL_CONTROL=true causes a race condition in which model warmup is attempted before the models are loaded.

Release 1.2.1

Summary

  • This is a patch release of the Image OCR NIM (NeMo Retriever OCR).

  • Image OCR NIM (NeMo Retriever OCR) now selects the smallest sufficient TensorRT profile for the configured NIM_TRITON_MAX_BATCH_SIZE instead of loading all TensorRT profiles simultaneously.

  • HTTP responses with code 422 now have body formats that comply with the OpenAPI standard.

  • Added the NIM_TRITON_MAX_QUEUE_DELAY_MICROSECONDS alias for backward compatibility with the EA release of Image OCR NIM (NeMo Retriever OCR). For details, refer to environment variables.

  • Changed the NIM_TRITON_MAX_QUEUE_DELAY_MICROSECONDS default value from 0 to 10000. For details, refer to environment variables.
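The "smallest sufficient TensorRT profile" behavior described above can be pictured with a short Python sketch. This is an illustration only, under the assumption that each profile is characterized by the maximum batch size it supports; the function name pick_profile and the example sizes are hypothetical, and the NIM's actual selection logic is not documented in these notes.

```python
from typing import Iterable, Optional


def pick_profile(profile_max_batch_sizes: Iterable[int], requested: int) -> Optional[int]:
    """Return the smallest profile batch size that can serve `requested`.

    Returns None when no profile is large enough.
    """
    # Keep only profiles that can hold the configured batch size.
    sufficient = [size for size in profile_max_batch_sizes if size >= requested]
    return min(sufficient) if sufficient else None


if __name__ == "__main__":
    # Made-up profile sizes, not the NIM's actual profiles.
    print(pick_profile([1, 8, 32], requested=4))  # prints 8
```

Loading one profile rather than all of them trades flexibility for a smaller memory footprint, which is the motivation the bullet above implies.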

Known Issues

  • The persistence.enabled value and all dependent configuration flags are currently non-functional in the NIM Helm chart.

Release 1.2.0

Summary

  • This is the first General Access release of NVIDIA NIM for Image OCR (NeMo Retriever OCR).

  • Added TensorRT-optimized engines for GPUs with CUDA compute capability 12.0, 10.0, 9.0, 8.9, 8.6, and 8.0.

  • The NIM_TRITON_OPTIMIZATION_MODE environment variable is no longer supported.
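To check a GPU against the compute capabilities listed above, a minimal Python sketch could look like the following. The function name has_trt_engine is hypothetical, and the sketch assumes an exact match on (major, minor); these notes do not say how the NIM behaves on GPUs with other compute capabilities.

```python
# Compute capabilities listed in the 1.2.0 release notes, as (major, minor) pairs.
SUPPORTED_CAPABILITIES = {(12, 0), (10, 0), (9, 0), (8, 9), (8, 6), (8, 0)}


def has_trt_engine(major: int, minor: int) -> bool:
    """Hypothetical check: is this compute capability in the listed set?"""
    return (major, minor) in SUPPORTED_CAPABILITIES


if __name__ == "__main__":
    # Example: check a GPU that reports compute capability 8.9.
    print(has_trt_engine(8, 9))
```

If PyTorch is installed, torch.cuda.get_device_capability() returns a (major, minor) tuple that can be passed to this function.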

Known Issues

  • The persistence.enabled value and all dependent configuration flags are currently non-functional in the NIM Helm chart.

Release 1.1.0

Summary

Known Issues

  • The persistence.enabled value and all dependent configuration flags are currently non-functional in the NIM Helm chart.

Release 1.0.0

Summary

This is the first Early Access release of the NVIDIA NIM for Image OCR (NeMo Retriever OCR).

Known Issues

  • This release supports only a single GPU-agnostic PyTorch backend profile.