Deploy NeMo 2.0 LLMs by Exporting to Inference-Optimized Libraries
The NeMo Export-Deploy library provides scripts and APIs to export NeMo 2.0 models to two inference-optimized libraries, TensorRT-LLM and vLLM, and to deploy the exported models with NVIDIA Triton Inference Server or Ray Serve.
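The sketch below illustrates one such export-and-deploy flow: converting a NeMo checkpoint to a TensorRT-LLM engine and serving it with Triton. It assumes the `nemo.export.tensorrt_llm.TensorRTLLM` and `nemo.deploy.DeployPyTriton` classes from the NeMo framework; exact parameter names can vary between releases, and the checkpoint path, engine directory, and model name here are placeholders.

```python
from nemo.export.tensorrt_llm import TensorRTLLM
from nemo.deploy import DeployPyTriton

# Export the NeMo 2.0 checkpoint to a TensorRT-LLM engine.
exporter = TensorRTLLM(model_dir="/tmp/trtllm_engine")  # where the engine is written
exporter.export(
    nemo_checkpoint_path="/opt/checkpoints/llama3-8b",  # hypothetical checkpoint path
    model_type="llama",
)

# Deploy the exported engine behind NVIDIA Triton Inference Server.
nm = DeployPyTriton(model=exporter, triton_model_name="llama3-8b")
nm.deploy()  # start Triton and load the model
nm.serve()   # block and serve inference requests
```

Exporting to vLLM or serving with Ray Serve follows the same two-step pattern, with the exporter and deployment classes swapped accordingly.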