# NVIDIA NIM: Comprehensive Deployment, Configuration, and Model Reference Guides

This collection consolidates NVIDIA NIM documentation covering end‑to‑end setup, containerization, GPU profiling, and cloud or on‑premise deployment for a wide range of AI models, including multimodal, vision‑language, protein‑design, large‑language, safety, and specialized inference services. It provides detailed model cards, licensing and safety information, environment‑variable configurations, multi‑node and air‑gapped deployment strategies, performance profiling, and API usage, including status‑polling and parameter tuning. Users can leverage these resources to quickly install, scale, troubleshoot, and integrate NVIDIA AI models across diverse workloads and hardware environments.

## Getting Started & Installation

- [Use this page when you need to free up space in a WSL2 AI Workbench session, such as removing a single or all NIM images, compacting the VHD, or fully uninstalling NVIDIA AI Workbench and its associated directories.](https://docs.nvidia.com/nim/wsl2/latest/deleting-nims.html.md)
- [Use this page when you need to run the NVIDIA NIM Boltz2 protein‑structure prediction service locally, e.g., to set up Docker, authenticate with an NGC API key, cache the model, and issue example `curl` or Python inference calls.
It’s also the reference for configuring GPU resources, environment variables, and troubleshooting container startup or shutdown.](https://docs.nvidia.com/nim/bionemo/boltz2/latest/getting-started.html.md)

## Model Fine‑Tuning & Training

- [Read this page when you’re preparing to fine‑tune a Llama Nemotron Nano VL model on NVIDIA NIM, setting up the Docker container, providing the HuggingFace or TRTLLM checkpoint, or converting the checkpoint to FP8 for higher inference throughput.](https://docs.nvidia.com/nim/vision-language-models/latest/fine-tune-model.html.md)

## Model Profiles & Runtime Configuration

- [NVIDIA NIM for Large Language Models](https://docs.nvidia.com/nim/large-language-models/llms.txt): Complete documentation for NIM LLM, covering overview, get started, deployment on AWS/Azure/GCP/Kubernetes, advanced use cases (LoRA, tool calling, MCP), API reference, environment variables, and troubleshooting. 46 pages.
- [Read this page when you’re preparing to deploy NV‑CLIP on a GPU and need to select the correct model profile: for example, if you are choosing a profile ID to set in `NIM_MANIFEST_PROFILE`, verifying that your GPU is in the supported list and that an optimized TensorRT engine (or a fallback generic ONNX engine) exists, or when you want to confirm the available fp16 quantization options for that GPU.](https://docs.nvidia.com/nim/nvclip/latest/profiles.html.md)

## Container & Service Configuration

- [When you need to launch a NemoGuard JailbreakDetect NIM container (such as deploying it on a multi‑GPU host, restricting it to specific GPUs, configuring caching, log levels, or port exposure, or running it in an air‑gapped environment), you should read this page to set the correct `--gpus` arguments, `-e` environment variables, and volume mounts.](https://docs.nvidia.com/nim/nemoguard-jailbreakdetect/latest/configuration.html.md)
- [When setting up or updating a Llama 3.1 NemoGuard 8B ContentSafety microservice in a container, the LLM should consult this page
to correctly select GPUs, configure environment variables (e.g., `NIM_HTTP_API_PORT`, `NIM_MODEL_PROFILE`), and bind‑mount cache or manifest volumes so the container can download and serve the model on the target host.](https://docs.nvidia.com/nim/llama-3-1-nemoguard-8b-contentsafety/latest/configuration.html.md)
- [The LLM should review this page when deploying a GenMol NIM container: pulling the image, setting `NVIDIA_VISIBLE_DEVICES`, mapping host ports (e.g., 8000 or 60001/60002) for single or multi‑instance runs, caching checkpoints locally with `-v /opt/nim/.cache`, and adjusting `NIM_LOG_LEVEL` for the desired verbosity.](https://docs.nvidia.com/nim/bionemo/genmol/latest/configure-nim.html.md)
- [When you are deploying or upgrading the NVIDIA NIM for Image OCR (PaddleOCR), read this page to learn the new environment variables, supported GPU updates, and recent bug fixes that affect configuration or runtime behavior. It also lists known issues (e.g., `list-model-profiles` failures and empty `/v1/manifest` responses) that can help you troubleshoot problems during deployment.](https://docs.nvidia.com/nim/ingestion/table-extraction/latest/release-notes.html.md)
- [An LLM should read this page whenever it needs to configure or deploy NVIDIA’s multimodal safety models on Kubernetes, troubleshoot model performance or observability issues, or verify the supported NVIDIA GPUs and software stack for compliant usage.](https://docs.nvidia.com/nim/multimodal-safety/latest.md)
- [When a developer wants to run the MolMIM NIM locally or on a GPU‑enabled server, they should read this guide to pull the correct Docker image, set the GPU runtime, expose the API port, and optionally mount a local cache to avoid re‑downloading the checkpoint.
This ensures the container starts quickly with the proper authentication and GPU visibility.](https://docs.nvidia.com/nim/bionemo/molmim/latest/deployment-guide.html.md)
- [When you need to tune the RFdiffusion NIM for a production or development environment (such as setting the log verbosity for debugging, benchmarking local performance, or ensuring logs capture only the necessary details), you should consult this page. It also helps when configuring Docker runs or automated scripts that rely on specific log levels or when troubleshooting unexpected server behavior.](https://docs.nvidia.com/nim/bionemo/rfdiffusion/latest/advanced-usage.html.md)
- [An engineer should read this page before deploying the NVIDIA NIM Image OCR model to verify licensing, usage restrictions, and data handling requirements. A compliance officer should review it when preparing documentation for a new project to confirm that the use case satisfies the governing terms.](https://docs.nvidia.com/nim/ingestion/image-ocr/latest/eula.html.md)

## Cloud & Kubernetes Deployment

- [Read this page whenever you plan to deploy an NVIDIA NIM model on Google Kubernetes Engine and need to confirm the required GPU hardware, cluster configuration steps, or integration details before starting the deployment. This helps ensure the cluster meets the hardware prerequisites and that you follow the correct installation procedure.](https://docs.nvidia.com/nim/cloud/gke/latest/index.html.md)
- [Read this page when you need to deploy NVIDIA NIM for Visual Generative AI on a disconnected or air‑gapped system, such as when you must pre‑download model profiles to an internet‑connected machine, transfer the cache to an isolated environment, and launch the container with the correct `NIM_MODEL_PROFILE` and cache mounts.
It is also useful when configuring `TORCH_CACHE_PATH` and `TORCH_EXTENSIONS_DIR` for system‑specific optimizations in a non‑networked deployment.](https://docs.nvidia.com/nim/visual-genai/latest/deploy-air-gap.html.md)

## Bioinformatics & Protein Modeling

- [The LLM should read this page whenever it needs to integrate AlphaFold2 predictions using NVIDIA NIM, such as building a bioinformatics pipeline that calls the `predict-structure-from-sequence`, `predict-msa-from-sequence`, or `predict-structure-from-MSA` endpoints, or when debugging the required input parameters and expected PDB output for a specific protein sequence.](https://docs.nvidia.com/nim/bionemo/alphafold2/latest/endpoints.html.md)
- [An LLM should read this page when it needs to understand how to deploy the AlphaFold2‑Multimer NIM, integrate it into a CADD pipeline, and properly cite the original AlphaFold2 paper. It is also useful for verifying compliance with NVIDIA AI Foundation Models and planning resource deployment.](https://docs.nvidia.com/nim/bionemo/alphafold2-multimer/latest/overview.html.md)
- [The LLM should read this page when it needs to update or troubleshoot a ProteinMPNN deployment, such as when planning to upgrade the NIM version, integrating the model with the advanced protein design platform, or optimizing performance on a new GPU instance.](https://docs.nvidia.com/nim/bionemo/proteinmpnn/latest/release-notes.html.md)
- [The LLM should read this page when a researcher or developer needs to generate or refine protein sequences from a backbone model (e.g., when specifying custom constraints such as fixed residues, omitted or biased amino acids, or PSSM guidance, or when choosing soluble vs. non‑soluble models), and when they need to confirm that the NIM microservice is operational via the health‑check endpoint.](https://docs.nvidia.com/nim/bionemo/proteinmpnn/latest/endpoints.html.md)
- [Use this page when you need to run the NVIDIA NIM Boltz2 protein‑structure prediction service locally, e.g., to
set up Docker, authenticate with an NGC API key, cache the model, and issue example `curl` or Python inference calls. It’s also the reference for configuring GPU resources, environment variables, and troubleshooting container startup or shutdown.](https://docs.nvidia.com/nim/bionemo/boltz2/latest/inference.html.md)
- [Check the release notes whenever you plan to upgrade or integrate NVIDIA NIM for OpenFold2, particularly if you need the new mmCIF input support (for custom structural templates) or want to benefit from the TensorRT‑optimized inference performance and memory savings in version 1.1.0.](https://docs.nvidia.com/nim/bionemo/openfold2/latest/release-notes.html.md)
- [Read this page when you need a quick, MSA‑free structure prediction for a protein sequence (≤1024 aa), for example, as a computational biologist prototyping a de novo protein or a drug‑design team that requires a fast, API‑driven fold prediction without generating or accessing multiple‑sequence alignments.](https://docs.api.nvidia.com/nim/reference/meta-esmfold-infer.md)

## Document Processing & OCR

- [When you are deploying or upgrading the NVIDIA NIM for Image OCR (PaddleOCR), read this page to learn the new environment variables, supported GPU updates, and recent bug fixes that affect configuration or runtime behavior.
It also lists known issues (e.g., `list-model-profiles` failures and empty `/v1/manifest` responses) that can help you troubleshoot problems during deployment.](https://docs.nvidia.com/nim/ingestion/table-extraction/latest/release-notes.html.md)
- [Use this guide when you need to convert PDF or image documents into structured, markdown‑formatted text (with optional bounding boxes and content class labels) using NVIDIA’s nemoretriever‑parse model, or when troubleshooting common hallucination issues during transcription.](https://docs.nvidia.com/nim/vision-language-models/1.2.0/examples/retriever/overview.html.md)

## Multimodal & Vision-Language Models

- [Use this page when you’re building a multimodal RAG or semantic search pipeline that must embed images and text on NVIDIA GPUs, or when you need to integrate zero‑shot image classification into a camera‑monitoring workflow and want to leverage TensorRT‑optimized inference. It’s also useful when planning a scalable, secure deployment of NV‑CLIP in a data‑center or cloud environment.](https://docs.nvidia.com/nim/nvclip/latest/introduction.html.md)
- [Use this page when you need to convert PDF or image documents into structured, markdown‑formatted text (with optional bounding boxes and content class labels) using NVIDIA’s nemoretriever‑parse model, or when troubleshooting common hallucination issues during transcription.](https://docs.nvidia.com/nim/vision-language-models/1.2.0/examples/retriever/overview.html.md)

## Large Language Model References & Model Cards

- [The model card should be read when a developer or researcher needs to understand Phi‑3 Mini’s training data, safety alignment, benchmark performance, and deployment requirements to decide if it fits a specific production or research use case. It’s also useful for planning inference on supported hardware, choosing the correct token limits, and ensuring compliance with the MIT license.](https://docs.api.nvidia.com/nim/reference/microsoft-phi-3-mini-4k.md)
- [The LLM should review this page when a developer or data scientist is planning to integrate SEA‑LION‑7B‑Instruct into a project, needs to verify licensing, supported hardware (Lovelace, Linux, Triton + TensorRT‑LLM), input/output parameters, or compare its benchmark scores to other models for model selection.](https://docs.api.nvidia.com/nim/reference/aisingapore-sea-lion-7b-instruct.md)
- [Read this page whenever you are preparing to use the Llama 3.3 70B Instruct model (before a commercial deployment, during any fine‑tuning or safety‑testing workflow, or when designing a multilingual or tool‑enabled application) to understand its specs, licensing, safety safeguards, and responsible‑use requirements.](https://docs.api.nvidia.com/nim/reference/meta-llama-3_3-70b-instruct.md)
- [The LLM should read this page when integrating the Rakuten AI 7b chat model into a system that needs custom prompt formatting, token limits, temperature or top‑p control, or streaming output via SSE.
It is also helpful when debugging API calls, handling stop sequences, or reproducing deterministic results with a fixed seed.](https://docs.api.nvidia.com/nim/reference/rakuten-rakutenai-7b-chat-infer.md)
- [The LLM should read this page when a researcher or developer needs to generate or refine protein sequences from a backbone model (e.g., when specifying custom constraints such as fixed residues, omitted or biased amino acids, or PSSM guidance, or when choosing soluble vs. non‑soluble models), and when they need to confirm that the NIM microservice is operational via the health‑check endpoint.](https://docs.nvidia.com/nim/bionemo/proteinmpnn/latest/endpoints.html.md)

## Inference API Usage & Status Polling

- [The LLM should read this page when it needs to
set up or debug the bi‑directional gRPC streaming for the Studio Voice NIM, compile language‑specific protobufs, or configure Docker/Podman model caching for efficient inference on an NVIDIA GPU.](https://docs.nvidia.com/nim/maxine/studio-voice/latest/advanced-usage.html.md)
- [The LLM should read this page when it needs to integrate or troubleshoot AlphaFold2 predictions using NVIDIA NIM, such as building a bioinformatics pipeline that calls the `predict-structure-from-sequence`, `predict-msa-from-sequence`, or `predict-structure-from-MSA` endpoints, or when debugging the required input parameters and expected PDB output for a specific protein sequence.](https://docs.nvidia.com/nim/bionemo/alphafold2/latest/endpoints.html.md)
- [Use this page after submitting an asynchronous Gemma inference request that returned a **202** status, i.e., when you need to poll the `/v1/status/{requestId}` endpoint to retrieve the final result of that earlier request.](https://docs.api.nvidia.com/nim/reference/google-gemma-3-27b-it-statuspolling.md)
- [When an async inference request to NVIDIA’s API returns a 202 “Accepted” status, the LLM should use this status‑polling endpoint to fetch the final result or error once the job completes. For example, after submitting a multimodal inference job, the client can poll `GET /v1/status/{requestId}` to retrieve the completed output before proceeding with downstream processing.](https://docs.api.nvidia.com/nim/reference/microsoft-phi-4-multimodal-instruct-statuspolling.md)
- [When developing an application that needs to generate weather forecasts or related predictions using NVIDIA’s CorrDiff model via the API, read this page to learn how to set `input_id`, `samples`, `steps`, and `seed`.
It also outlines acceptable ranges and default values for each parameter.](https://docs.api.nvidia.com/nim/reference/nvidia-corrdiff-infer.md)
- [Use this page when you need to call the Earth‑2 CorrDiff inference API (e.g., sending data to `/v1/infer` or checking `/v1/health/live` and `/v1/health/ready`) or when generating client code or documentation from the OpenAPI spec. It’s also helpful for debugging or verifying the service version before deploying or integrating the model.](https://docs.nvidia.com/nim/earth-2/corrdiff/latest/api-reference.html.md)

## Performance Benchmarking

- [Use this page when you need to benchmark an NVIDIA NIM object detection model with `genai-perf` to verify its throughput and latency under production‑like workloads, or when setting up a Docker container, dataset, and benchmark parameters before deploying the model to a live service.](https://docs.nvidia.com/nim/ingestion/object-detection/latest/performance.html.md)

## Audio & Speech

- [NVIDIA NIM for Maxine LipSync](https://docs.nvidia.com/nim/maxine/lipsync/latest.md)

## Compliance & Licensing

- [An engineer should read this page before deploying the NVIDIA NIM Image OCR model to verify licensing, usage restrictions, and data handling requirements.
A compliance officer should review it when preparing documentation for a new project to confirm that the use case satisfies the governing terms.](https://docs.nvidia.com/nim/ingestion/image-ocr/latest/eula.html.md)

## Specialized Domains & Applications

- [When a developer or system administrator needs to prototype, deploy, or scale AI‑driven global weather forecasts using NVIDIA’s FourCastNet NIM, they should read this page to understand its capabilities, deployment options, and how it fits into the broader Earth‑2 ecosystem.](https://docs.nvidia.com/nim/earth-2/fourcastnet/latest/overview.html.md)
- [When setting up or updating a Llama 3.1 NemoGuard 8B ContentSafety microservice in a container, the LLM should consult this page to correctly select GPUs, configure environment variables (e.g., `NIM_HTTP_API_PORT`, `NIM_MODEL_PROFILE`), and bind‑mount cache or manifest volumes so the container can download and serve the model on the target host.](https://docs.nvidia.com/nim/llama-3-1-nemoguard-8b-contentsafety/latest/configuration.html.md)
- [When you need to tune the RFdiffusion NIM for a production or development environment (such as setting the log verbosity for debugging, benchmarking local performance, or ensuring logs capture only the necessary details), you should consult this page. It also helps when configuring Docker runs or automated scripts that rely on specific log levels or when troubleshooting unexpected server behavior.](https://docs.nvidia.com/nim/bionemo/rfdiffusion/latest/advanced-usage.html.md)
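
The status‑polling entries under "Inference API Usage & Status Polling" describe a pattern in which an asynchronous request returns **202** and the client then polls `GET /v1/status/{requestId}` until the final result or an error is available. A minimal Python sketch of that loop follows. It is an illustration under stated assumptions, not the documented client: `BASE_URL` is a hypothetical placeholder, the bearer‑token header is assumed rather than taken from the linked pages, and the helper names `http_fetch` and `poll_status` are invented for this example.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Any, Callable, Tuple

# Hypothetical placeholder: substitute the base URL given on the linked reference page.
BASE_URL = "https://example.invalid"


def http_fetch(request_id: str, api_key: str, base_url: str = BASE_URL) -> Tuple[int, bytes]:
    """Issue one GET /v1/status/{requestId}; return (HTTP status, raw body)."""
    req = urllib.request.Request(
        f"{base_url}/v1/status/{request_id}",
        # Assumption: the API accepts a bearer token; check the reference page.
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; hand them back as ordinary (status, body) pairs.
        return err.code, err.read()


def poll_status(
    fetch: Callable[[str], Tuple[int, bytes]],
    request_id: str,
    interval_s: float = 1.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch(request_id) until the status is no longer 202.

    Returns the parsed JSON body on 200, raises RuntimeError on any other
    non-202 status, and raises TimeoutError after max_attempts pending replies.
    """
    for _ in range(max_attempts):
        status, body = fetch(request_id)
        if status == 202:
            sleep(interval_s)  # still pending; wait before asking again
            continue
        if status == 200:
            return json.loads(body)
        raise RuntimeError(f"request {request_id} failed with HTTP {status}")
    raise TimeoutError(f"request {request_id} still pending after {max_attempts} polls")
```

A caller would typically bind the credentials first, for example `poll_status(lambda rid: http_fetch(rid, api_key), request_id)`. Passing `fetch` and `sleep` as parameters keeps the loop testable without network access and makes it straightforward to swap in a different HTTP client or a backoff schedule.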