# NVIDIA NeMo Guardrails – Complete Guide for Setup, Configuration, Development, and Integration  

This documentation collection provides end‑to‑end instructions for deploying and customizing NVIDIA NeMo Guardrails, covering command‑line usage, configuration files, custom initialization, and release notes. It includes detailed references for the Colang 2.0 language (actions, flow control, variables, debugging, standard library, multimodal and timing flows), advanced features such as multimodal safety checks, streaming, embeddings, OpenTelemetry tracing, and Docker deployment, as well as integration guides for LangChain, Vertex AI, NVIDIA AI Endpoints, and AlignScore. The material also explains how to implement jailbreak detection, output rails, knowledge‑base retrieval, and prompt customization, helping developers build secure, traceable, and production‑ready LLM applications.

## Overview & Getting Started
- [Use this page whenever you need to generate or validate responses that must be short and concise (a single one- or two-sentence statement), for example when preparing chat replies for the ABC bot or ensuring compliance with the prescribed style guidelines.](docs.nvidia.com/nemo/guardrails/latest/getting-started/5-output-rails/README.html.md)
- [When a developer is building a trustworthy, safe, and secure LLM application that must restrict or steer conversations for compliance, this page explains how to define and implement programmable guardrails. It also details how to securely connect your LLM to external tools or services, ensuring a robust and controlled integration.](docs.nvidia.com/nemo/guardrails/latest.md)
- [Use the Colang Guide when you’re building or extending guardrails for an LLM‑based chatbot—such as defining user intents, bot responses, branching logic, context variables, or custom actions—to ensure the conversation flows safely and predictably.](docs.nvidia.com/nemo/guardrails/latest/user-guides/colang-language-syntax-guide.html.md)
- [Use this page whenever you’re building or debugging the ABC Bot with NeMo Guardrails—e.g., when adding input moderation, configuring output moderation, setting up topical rails, or integrating RAG. It’s also useful when troubleshooting a guardrail failure or validating that the bot’s responses remain on‑topic and safe.](docs.nvidia.com/nemo/guardrails/latest/getting-started/3-demo-use-case/README.html.md)
- [Read this page whenever you are building a NeMo Guardrails‑based interactive system that must handle multiple modalities (text, voice, gestures, posture, etc.)—for example, when defining greeting logic for an avatar or testing gesture‑based user and bot events with the Guardrails CLI.](docs.nvidia.com/nemo/guardrails/latest/colang-2/getting-started/multimodal-rails.html.md)
- [Read this page when you are an experienced developer who wants to deepen your knowledge of Colang 2.0 syntax and features, or when you are preparing to use the upcoming RAG/agents examples and the Guardrails Library in version 0.10.0.](docs.nvidia.com/nemo/guardrails/latest/colang-2/getting-started/recommended-next-steps.html.md)

## Installation & Setup
- [The LLM should consult this page whenever it needs to run, debug, or evaluate Guardrails from the command line—e.g., launching an interactive chat with `nemoguardrails chat`, starting an actions or Guardrails server, converting a Colang 1.0 directory to 2.0, running a fact‑checking or moderation evaluation, or selecting an LLM provider via `nemoguardrails find-providers`. This document also lists the exact options and flags (such as `--config`, `--verbose`, and `--port`) to tailor the CLI behavior for testing or production deployments.](docs.nvidia.com/nemo/guardrails/latest/user-guides/cli.html.md)
- [Use this guide when you want to deploy NeMo Guardrails quickly with Docker, for example to spin up a local or cloud server for testing or prototyping a safety‑enhanced LLM chat. It’s also useful if you need to run optional AlignScore fact‑checking or jailbreak‑detection services in separate containers before launching your own custom configuration.](docs.nvidia.com/nemo/guardrails/latest/user-guides/advanced/using-docker.html.md)
- [When deploying AlignScore as part of a fact‑checking micro‑service, consult this page to install the `alignscore` package, choose the correct Python (not 3.11) and PyTorch 2.0.1 versions, set the `ALIGN_SCORE_PATH` and `ALIGN_SCORE_DEVICE` variables, and launch the server with the desired model(s) on the chosen port.](docs.nvidia.com/nemo/guardrails/latest/user-guides/advanced/align-score-deployment.html.md)
- [Read this page when you are preparing to host a NeMo Guardrails‑enabled LLM on NVIDIA AI Endpoints and need to verify prerequisites, configure the endpoint, and test usage and guardrail enforcement before launching a compliant chatbot or data‑analysis service.](docs.nvidia.com/nemo/guardrails/latest/user-guides/llm/nvidia-ai-endpoints/index.html.md)
- [When a developer is preparing to run an LLM through NVIDIA's Guardrails and needs step‑by‑step guidance to connect to either the NVIDIA API Catalog or Vertex AI, this page provides the required setup and usage instructions. It covers how to invoke the hosted LLMs and configure the guardrail policies for each platform.](docs.nvidia.com/nemo/guardrails/latest/user-guides/llm/index.html.md)
- [Read this page when you are setting up or evaluating an NVIDIA NeMo Guardrails‑enabled LLM on Vertex AI—especially during the prerequisite check, configuration of guardrail parameters, or when reviewing evaluation results to fine‑tune compliance and safety.](docs.nvidia.com/nemo/guardrails/latest/user-guides/llm/vertexai/index.html.md)
- [When you need to enable token streaming for a supported LLM, integrate custom or external async token generators, and capture token‑usage statistics while applying NeMo Guardrails, read this page.](docs.nvidia.com/nemo/guardrails/latest/user-guides/advanced/streaming.html.md)
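
The streaming entry above refers to consuming tokens from an async generator. The following is a minimal sketch of that consumption pattern using only the standard library. `fake_token_stream` is a hypothetical stand-in for whatever async token generator your setup provides; it is not part of the NeMo Guardrails API, and the streaming page remains the reference for the real calls.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_token_stream(text: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for an external async token generator."""
    for token in text.split(" "):
        await asyncio.sleep(0)  # hand control back to the event loop between tokens
        yield token


async def collect_tokens(stream: AsyncIterator[str]) -> List[str]:
    """Consume the stream one token at a time and keep every token."""
    tokens: List[str] = []
    async for token in stream:
        tokens.append(token)
    return tokens


def run_demo() -> List[str]:
    """Drive the async consumer from synchronous code."""
    return asyncio.run(collect_tokens(fake_token_stream("hello from a streamed reply")))
```

In a real application the body of the `async for` loop is where each chunk would be forwarded to the client or counted for usage statistics.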

## Configuration Guides
- [Use this guide whenever you need to set up or modify a NeMo Guardrails deployment—e.g., choosing LLMs and embeddings, defining custom actions, configuring input/output rails, or integrating a knowledge‑base for RAG. It’s also the reference for adding custom initialization code, tracing, or error‑handling logic before launching the CLI Chat or server.](docs.nvidia.com/nemo/guardrails/latest/user-guides/configuration-guide/index.html.md)
- [The LLM should consult this page when you are setting up or tweaking a NeMo Guardrails deployment—e.g., choosing an LLM provider, configuring reasoning‑trace handling, specifying per‑task models, or setting up embeddings and vector search.](docs.nvidia.com/nemo/guardrails/latest/user-guides/configuration-guide/llm-configuration.html.md)
- [When you’re extending a Guardrails deployment with custom logic—such as adding a database connection in `config.py`, passing extra settings via `custom_data`, or registering a new LLM or embedding provider—you should read this page to learn how to implement `init(app: LLMRails)` and use `register_llm_provider` or `register_embedding_provider`.](docs.nvidia.com/nemo/guardrails/latest/user-guides/configuration-guide/custom-initialization.html.md)
- [The LLM should read this page whenever it needs to set up or modify a NeMo Guardrails deployment—e.g., when adding an actions‑server URL, overriding prompts for a specific model, enabling multi‑step generation for a 100 B+ LLM, or injecting custom data into the Colang runtime.](docs.nvidia.com/nemo/guardrails/latest/user-guides/configuration-guide/general-options.html.md)
- [Use this page when you need to apply or test different guardrail configurations for a NeMo model—such as toggling content filters, adjusting severity levels, or comparing policy performance—so you can load and switch between multiple config files in the API. It also guides you when setting up a project that supports multiple guardrail profiles for A/B testing or deployment pipelines.](docs.nvidia.com/nemo/guardrails/latest/user-guides/multi-config-api/index.html.md)
- [Developers should read this page when they need to extend or fine‑tune NeMo Guardrails—for example, when adding a new LLM engine, overriding default prompts for a specific model or prompting mode, or creating a custom task like “summarize_text” and integrating its Jinja2 prompt into the workflow.](docs.nvidia.com/nemo/guardrails/latest/user-guides/advanced/prompt-customization.html.md)
- [When configuring NeMo Guardrails to improve the efficiency of its guardrails or knowledge‑base search—such as enabling caching, switching from FastEmbed to OpenAI embeddings, or creating a custom `EmbeddingsIndex` subclass—an LLM should consult this page to understand the default settings, batch behavior, and the registration steps required for a custom provider.](docs.nvidia.com/nemo/guardrails/latest/user-guides/advanced/embedding-search-providers.html.md)
- [Use this page whenever you’re adding or debugging multimodal safety checks in a NeMo Guardrails system—e.g., when you need to configure vision guardrails, determine whether a model accepts image URLs or requires base‑64 encoding, or verify that prompts and image sizes stay within the model’s context limits.](docs.nvidia.com/nemo/guardrails/latest/user-guides/multimodal.html.md)
- [Use this page whenever you’re setting up or troubleshooting the knowledge‑base feature in NeMo Guardrails – for example, adding new Markdown files to the `kb` folder, verifying the embedding model defined in `config.yml`, or debugging the similarity search that supplies `$relevant_chunks` to the LLM.](docs.nvidia.com/nemo/guardrails/latest/user-guides/configuration-guide/knowledge-base.html.md)
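
The custom-initialization entry above mentions implementing `init(app: LLMRails)` in `config.py`, for example to add a database connection. A minimal sketch of that shape follows. `connect_to_database` is a hypothetical placeholder, and the sketch assumes the connection is shared with actions through `register_action_param`; check the linked page for the exact behavior in your version.

```python
from typing import TYPE_CHECKING, Any, Dict

if TYPE_CHECKING:  # imported for type checkers only, so the sketch runs without the package
    from nemoguardrails import LLMRails


def connect_to_database() -> Dict[str, Any]:
    """Hypothetical placeholder for opening a real database connection."""
    return {"connected": True}


def init(app: "LLMRails") -> None:
    """Run once when the configuration is loaded."""
    db = connect_to_database()
    # Assumption: actions that declare a `db` parameter receive this object.
    app.register_action_param("db", db)
```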

## Advanced Features & Safety
- [When developing or debugging a NeMo Guardrails‑enabled model, the LLM should read this page to learn how to load the guardrail config, set output variables, and run generation so it can capture detailed logs and audit trail data during inference or training. It’s also useful when you need to fine‑tune guardrail options or retrieve extra diagnostic information to verify policy compliance or troubleshoot unexpected outputs.](docs.nvidia.com/nemo/guardrails/latest/user-guides/detailed-logging/index.html.md)
- [Read this page whenever you’re setting up, debugging, or fine‑tuning NeMo Guardrails and need to know how to capture only the output variables you care about or how to enable detailed rail‑level logging. It’s especially useful when you want to reduce log verbosity, inspect which input or output rails fired, or troubleshoot why a message was blocked or refused.](docs.nvidia.com/nemo/guardrails/latest/user-guides/detailed-logging/README.html.md)
- [The LLM should read this page when configuring or updating guardrails to detect jailbreak attempts, such as adding new jailbreak detection heuristics to the input rail or testing the rail to ensure it blocks malicious prompts.](docs.nvidia.com/nemo/guardrails/latest/user-guides/jailbreak-detection-heuristics/index.html.md)
- [An LLM should read this page whenever it needs to add lightweight jailbreak detection to its guardrails workflow—e.g., when deploying a production bot that must reject malicious prompts before expensive LLM calls, or when experimenting with threshold tuning in a local testing environment.](docs.nvidia.com/nemo/guardrails/latest/user-guides/jailbreak-detection-heuristics/README.html.md)
- [Use this page when you need to call NeMo Guardrails’ async API from synchronous code and must manage the event loop. It explains how the `nest_asyncio` patch works and how to disable it with `DISABLE_NEST_ASYNCIO` if the patch causes unexpected issues.](docs.nvidia.com/nemo/guardrails/latest/user-guides/advanced/nested-async-loop.html.md)
- [When you’re extending a Guardrails deployment with custom logic… you should read this page … event‑based API.](docs.nvidia.com/nemo/guardrails/latest/user-guides/advanced/event-based-api.html.md)
- [When you need to enable OpenTelemetry tracing to debug or log LLM calls during local development, or to monitor request flow and performance in a staging or production environment.](docs.nvidia.com/nemo/guardrails/latest/user-guides/tracing/quick-start.html.md)
- [Read this page when you need to enable or debug distributed tracing for a NeMo Guardrails application—e.g., when configuring the OpenTelemetry SDK, selecting a console or OTLP exporter for development or production, or verifying that your chosen exporter/collector (Jaeger, Zipkin, Datadog, etc.) is compatible.](docs.nvidia.com/nemo/guardrails/latest/user-guides/tracing/opentelemetry-integration.html.md)
- [Use this guide whenever you’re building or refining a Guardrails flow that needs to capture user‑supplied data (e.g., names, dates, queries, lists, or any contextual information) and store it in context variables for later use.](docs.nvidia.com/nemo/guardrails/latest/user-guides/advanced/extract-user-provided-values.html.md)
- [The LLM should consult the Bot Message Instructions page whenever it is configured to parse inline comments for bot actions, such as when defining conversation flows that require custom tone, style, or content constraints.](docs.nvidia.com/nemo/guardrails/latest/user-guides/advanced/bot-message-instructions.html.md)
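
The nested-async-loop entry above mentions the `DISABLE_NEST_ASYNCIO` variable and calling an async API from synchronous code. The sketch below illustrates the general pattern with the standard library only. `generate_reply` is a hypothetical stand-in for an async guardrails call, and the sketch assumes the variable has to be set before the library is imported.

```python
import asyncio
import os

# Assumption: the variable is read at import time, so set it before importing the library.
os.environ["DISABLE_NEST_ASYNCIO"] = "True"


async def generate_reply(prompt: str) -> str:
    """Hypothetical stand-in for an async guardrails call."""
    await asyncio.sleep(0)
    return f"echo: {prompt}"


def generate_reply_sync(prompt: str) -> str:
    """Call the async function from sync code without nesting event loops."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so starting one is safe.
        return asyncio.run(generate_reply(prompt))
    # A loop is already running: the caller should await the coroutine directly.
    raise RuntimeError("already inside an event loop; await generate_reply() instead")
```

With the patch disabled, code that already runs inside an event loop should await the async API directly rather than go through a synchronous wrapper.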

## Colang Language Reference
- [When a developer needs to write, troubleshoot, or extend Colang flows for a NeMo Guardrails application—such as implementing event‑generation logic, configuring action sequences, or using CSL functions—the LLM should consult this Language Reference. It is also useful when debugging Python actions, integrating LLM calls, or optimizing flow control and variable expressions.](docs.nvidia.com/nemo/guardrails/latest/colang-2/language-reference/index.html.md)
- [Use this guide whenever you’re building or debugging Colang scripts that involve bot or user actions—especially when you need to coordinate multiple actions, enforce a specific order (e.g., finish speaking before a gesture), group actions with `and`/`or`, or stop an in‑progress action.](docs.nvidia.com/nemo/guardrails/latest/colang-2/language-reference/working-with-actions.html.md)
- [When developing or refactoring a Colang flow that needs conditional logic, event‑driven branching, looping, or early exit behavior—such as implementing user‑dependent responses, retry loops, or handling success/failure of sub‑flows—you should consult this page.](docs.nvidia.com/nemo/guardrails/latest/colang-2/language-reference/flow-control.html.md)
- [Read this page when you’re developing or debugging a Colang story in NeMo Guardrails—for example, when a flow name is misspelled, a runtime error occurs, or you need to inspect the active flow hierarchy and event logs during development.](docs.nvidia.com/nemo/guardrails/latest/colang-2/language-reference/development-and-debugging.html.md)
- [When you need to configure or debug advanced LLM flow behaviors—such as auto‑generating continuation flows, polling LLM responses, or logging user and bot intent activities in NeMo Guardrails—you should refer to this page for syntax and usage examples.](docs.nvidia.com/nemo/guardrails/latest/colang-2/language-reference/csl/lmm.html.md)
- [When the LLM must add timed actions to a Colang guardrail script—such as pausing for a specific number of seconds, starting a repeating timer, or detecting silent periods from the user or bot—it should consult the timing.flows page for the appropriate flows.](docs.nvidia.com/nemo/guardrails/latest/colang-2/language-reference/csl/timing.html.md)
- [When you’re building a Colang‑based chatbot that needs core, timing, LLM, avatar, guardrail, or attention flows, the LLM should consult this page to learn how to import the corresponding library (e.g., `import llm`).](docs.nvidia.com/nemo/guardrails/latest/colang-2/language-reference/the-standard-library.html.md)
- [When you need to configure or debug variable scoping, expression evaluation, or accessing flow, action, or event member variables, the LLM should refer to this page.](docs.nvidia.com/nemo/guardrails/latest/colang-2/language-reference/working-with-variables-and-expressions.html.md)

## LangChain Integration
- [Read this page when you’re building or debugging a LangChain application that needs to enforce NVIDIA NeMo Guardrails—e.g., adding guardrails to a chain, wrapping a RunnableRails pipeline, or integrating Guardrails with tools or LangSmith.](docs.nvidia.com/nemo/guardrails/latest/user-guides/langchain/index.html.md)
- [Read this page when a developer is building a LangChain pipeline that must enforce safety, compliance, or policy checks, or when they need to debug and verify that guardrails correctly intercept unsafe outputs during chain execution.](docs.nvidia.com/nemo/guardrails/latest/user-guides/langchain/chain-with-guardrails/index.html.md)
- [Read this page when integrating NeMo Guardrails into a LangChain chain for a production chatbot to enforce compliance rules and prevent jailbreak attempts.](docs.nvidia.com/nemo/guardrails/latest/user-guides/langchain/chain-with-guardrails/README.html.md)
- [Use this guide when you need to embed NeMo Guardrails into a LangChain workflow—either by wrapping a chain with `RunnableRails` to enforce safety checks or by registering a chain as an action inside a Guardrails config. Also refer to it when setting up LangSmith tracing to capture execution traces from your LangChain applications.](docs.nvidia.com/nemo/guardrails/latest/user-guides/langchain/langchain-integration.html.md)
- [The LLM should consult this guide when building a LangChain application that requires safety, compliance, or tool‑powered logic, for example to wrap an LLM or a chain with NeMo Guardrails, customize `input_key`/`output_key`, disable prompt passthrough, or embed tools such as a calculator in a dialogue rail.](docs.nvidia.com/nemo/guardrails/latest/user-guides/langchain/runnable-rails.html.md)
- [Use this guide when you need to add custom validation logic to a Guardrails workflow—e.g., when you want the LLM to check user messages for prohibited keywords or patterns before generating a response, or to plug in a custom API or vector‑store check as an action.](docs.nvidia.com/nemo/guardrails/latest/user-guides/langchain/runnable-as-action/README.html.md)
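
The runnable-as-action entry above describes checking user messages for prohibited keywords. A minimal sketch of such a check follows. `KeywordCheck` is a hypothetical class with a Runnable-style `invoke` method; in a real integration it would typically subclass LangChain's `Runnable` and be registered as an action, and both steps are omitted here so the sketch runs without extra packages.

```python
from typing import Iterable, List


class KeywordCheck:
    """Hypothetical checker exposing a Runnable-style `invoke` method."""

    def __init__(self, prohibited: Iterable[str]) -> None:
        # Normalize once so every check is case-insensitive.
        self.prohibited: List[str] = [word.lower() for word in prohibited]

    def invoke(self, text: str) -> bool:
        """Return True when the text contains any prohibited keyword."""
        lowered = text.lower()
        return any(word in lowered for word in self.prohibited)


checker = KeywordCheck(["proprietary"])
```

A guardrails flow could then refuse to respond whenever the check returns `True`.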

## Production & Deployment
- [When you’re setting up or troubleshooting a Guardrails‑enabled deployment, the LLM should consult this page to understand the guardrails workflow, how it hooks into the LLM, and the required server configuration. This ensures that both the interaction layer and backend are correctly aligned to enforce safety constraints.](docs.nvidia.com/nemo/guardrails/latest/architecture/index.html.md)
- [If you’re planning to integrate NeMo Guardrails with an LLM, deciding whether to deploy it in production, or building a question‑answering knowledge base, the FAQ explains the current alpha status, supported LLM providers, and how to adapt example configs for robustness.](docs.nvidia.com/nemo/guardrails/latest/faqs.html.md)
- [Read this page when you need to run, debug, or evaluate Guardrails … (release notes).](docs.nvidia.com/nemo/guardrails/latest/release-notes.html.md)

## Security & Red Teaming
- [Consult this page when you need guidance on red teaming a NeMo Guardrails deployment.](docs.nvidia.com/nemo/guardrails/latest/security/red-teaming.html.md)