# Custom LLM Frameworks for the NVIDIA NeMo Guardrails library

> Replace the LLM framework layer to connect LiteLLM, an in-house orchestrator, or any non-default LLM stack to the NVIDIA NeMo Guardrails library.

The NVIDIA NeMo Guardrails library has two layers of LLM extensibility: providers and frameworks. Most users only need the provider layer. This guide is for the smaller set of cases that need to replace the framework layer itself.

## The Two-Layer Model

```text
Framework Layer (system-wide, swappable)
|-- DefaultFramework (built-in, all OpenAI-compatible HTTP)
|     |-- openai (provider)
|     |-- nim (provider)
|     |-- ollama (provider)
|     '-- <your custom provider>
|-- LangChainFramework (built-in, opt-in)
|     '-- LangChain providers
'-- <YourCustomFramework>
      '-- <your providers>
```

A *provider* is the name a user sets as `engine:` in `config.yml`; your framework dispatches on that label. In `DefaultFramework`, `openai`, `nim`, and `ollama` are provider names that all dispatch to the same `OpenAIChatModel` runtime. They differ only in default base URLs and small per-provider conventions. In `LangChainFramework`, each provider name dispatches to its own LangChain class. Your framework decides whether multiple provider names share one runtime or each name has its own. Adding a provider is the right move when you want to plug in one new backend and the surrounding framework's behavior is fine. For details, refer to [Custom LLM Providers](/configure-guardrails/custom-initialization/custom-llm-providers) and [Custom LLM Model](/configure-guardrails/custom-initialization/custom-llm-model).
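The sketch below illustrates that choice with a hypothetical framework-side dispatch table; it is not code from the library. The provider names `acme`, `acme-eu`, and `vendor-x`, the two model classes, and the `example.com` URLs are placeholders. Two names share one runtime and differ only in their default base URL, while the third maps to its own class.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Type, Union


@dataclass
class SharedHTTPModel:
    """Hypothetical runtime shared by several provider names."""

    model: str
    base_url: str


@dataclass
class VendorXModel:
    """Hypothetical runtime used by exactly one provider name."""

    model: str
    base_url: str


# Provider name -> (runtime class, default base URL). All entries are placeholders.
PROVIDERS: Dict[str, Tuple[Type, str]] = {
    "acme": (SharedHTTPModel, "https://acme.example.com/v1"),
    "acme-eu": (SharedHTTPModel, "https://eu.acme.example.com/v1"),
    "vendor-x": (VendorXModel, "https://vendor-x.example.com/api"),
}


def build(provider_name: str, model_name: str) -> Union[SharedHTTPModel, VendorXModel]:
    """Resolve a provider name to a runtime instance carrying its defaults."""
    try:
        cls, base_url = PROVIDERS[provider_name]
    except KeyError:
        raise ValueError(f"Unknown provider {provider_name!r}") from None
    return cls(model=model_name, base_url=base_url)
```

For example, `build("acme-eu", "m1")` returns a `SharedHTTPModel` whose `base_url` is the EU default, while `build("vendor-x", "m1")` returns a `VendorXModel`.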

A *framework* owns the entire LLM stack: how models are constructed, how providers are looked up, and how resources are released at shutdown. Adding a framework is the right move when you want to replace the entire stack (for example, route everything through LiteLLM, a proprietary in-house orchestrator, or a service mesh).

| Decision                                                           | Pick a provider                                                | Pick a framework                                          |
| ------------------------------------------------------------------ | -------------------------------------------------------------- | --------------------------------------------------------- |
| You need one new engine alongside the existing ones                | Yes                                                            | No                                                        |
| You have one new HTTP backend with custom auth                     | Yes (subclass `OpenAICompatibleClient` if it is OpenAI-shaped) | No                                                        |
| You want all engines to flow through your own gateway              | No                                                             | Yes                                                       |
| You want to disable LangChain entirely and replace it with LiteLLM | No                                                             | Yes                                                       |
| You want per-call observability hooks across every model           | Maybe                                                          | Yes if you also need to control construction and shutdown |

In practice, almost every customization is a provider. Reserve a custom framework for cases where you are replacing more than one engine and need shared lifecycle management across them.

## The LLMFramework Contract

The protocol is `nemoguardrails.types.LLMFramework`. It is `@runtime_checkable`, so callers can verify a framework with `isinstance(instance, LLMFramework)`, although that check only confirms the methods exist; it does not compare signatures or check that `reset` is `async`. As a Python `Protocol`, it expresses a contract rather than enforcing one. Nothing prevents you from passing an object that duck-types most of it, but the rest of the NVIDIA NeMo Guardrails library assumes both invariants below hold:

1. The registered object structurally matches the `LLMFramework` protocol (the four methods and their signatures listed below).
2. Its `reset` attribute is an `async` coroutine function. The registry awaits it directly during test teardown.

A custom framework implements four methods.

```python
from typing import Any, Dict, List, Optional

from nemoguardrails import LLMModel

class MyFramework:
    def create_model(
        self,
        model_name: str,
        provider_name: str,
        model_kwargs: Optional[Dict[str, Any]] = None,
    ) -> LLMModel: ...

    def register_provider(self, name: str, provider_cls: Any) -> None: ...

    def get_provider_names(self) -> List[str]: ...

    async def reset(self) -> None: ...
```
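Because `isinstance` alone does not catch a synchronous `reset`, a test or startup check can verify both invariants explicitly. The sketch below defines a local stand-in protocol, `FrameworkLike`, so it runs without the library; in real code you would check against `nemoguardrails.types.LLMFramework` instead. `check_framework` is a hypothetical helper, not a library function.

```python
import inspect
from typing import Any, Dict, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class FrameworkLike(Protocol):
    """Local stand-in that mirrors the four LLMFramework methods."""

    def create_model(
        self,
        model_name: str,
        provider_name: str,
        model_kwargs: Optional[Dict[str, Any]] = None,
    ) -> Any: ...

    def register_provider(self, name: str, provider_cls: Any) -> None: ...

    def get_provider_names(self) -> List[str]: ...

    async def reset(self) -> None: ...


def check_framework(obj: Any) -> None:
    """Raise TypeError if obj violates either invariant listed above."""
    # Invariant 1: a runtime_checkable isinstance() only checks that the
    # four members exist. It does not compare signatures.
    if not isinstance(obj, FrameworkLike):
        raise TypeError(f"{type(obj).__name__} is missing framework methods")
    # Invariant 2: reset is awaited, so it must be an `async def` method.
    if not inspect.iscoroutinefunction(obj.reset):
        raise TypeError(f"{type(obj).__name__}.reset must be declared 'async def'")
```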

### `create_model`

Called once per `models:` entry in `config.yml` when `LLMRails` builds its task models. `model_name` is the value of `model:`, `provider_name` is the value of `engine:`, and `model_kwargs` carries everything from the entry's `parameters` block plus a few platform keys like `mode`. Your framework decides what `provider_name` means. Typically, you use it to dispatch to a specific `LLMModel` class or to pick provider-specific defaults. Return any object that implements `LLMModel`. For details, refer to [Custom LLM Model](/configure-guardrails/custom-initialization/custom-llm-model).
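A minimal sketch of the kwargs handling a `create_model` implementation might perform, assuming you want per-provider defaults that the entry's `parameters` block can override. `prepare_kwargs`, `PROVIDER_DEFAULTS`, and the `timeout` key are hypothetical, and `PLATFORM_KEYS` contains only `mode` because that is the one platform key this guide names.

```python
from typing import Any, Dict, Optional

# Keys the library adds on top of `parameters`. Only `mode` is named in this
# guide; treat this set as an assumption to extend for your version.
PLATFORM_KEYS = {"mode"}

# Hypothetical per-provider defaults; values from `parameters` override them.
PROVIDER_DEFAULTS: Dict[str, Dict[str, Any]] = {
    "my_engine": {"temperature": 0.0, "timeout": 30},
}


def prepare_kwargs(
    provider_name: str, model_kwargs: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Merge provider defaults with the entry's parameters, minus platform keys."""
    user = dict(model_kwargs or {})  # copy: never mutate the caller's dict
    for key in PLATFORM_KEYS:
        user.pop(key, None)
    return {**PROVIDER_DEFAULTS.get(provider_name, {}), **user}
```

Copying before popping matters because the same dictionary may be inspected elsewhere after `create_model` returns.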

The framework owns construction. It can cache and reuse expensive resources, such as HTTP clients, gRPC channels, and auth tokens. It can also inject defaults for headers, timeouts, and retries, or short-circuit on a registered custom provider. Review `DefaultFramework` and `LangChainFramework` for two contrasting implementations.
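The caching idea can be sketched as a dictionary keyed on connection settings. This follows the approach this guide attributes to `DefaultFramework`, but with a simplified key. `PooledFramework`, `client_for`, and `FakeClient` are hypothetical names, and `FakeClient` stands in for something like an `httpx.AsyncClient`.

```python
from typing import Dict, Optional, Tuple

ClientKey = Tuple[str, Optional[str], float]


class FakeClient:
    """Stand-in for an expensive shared resource such as an HTTP client."""

    def __init__(self, base_url: str, api_key: Optional[str], timeout: float) -> None:
        self.base_url = base_url
        self.api_key = api_key
        self.timeout = timeout


class PooledFramework:
    """Builds at most one client per distinct (base_url, api_key, timeout)."""

    def __init__(self) -> None:
        self._clients: Dict[ClientKey, FakeClient] = {}

    def client_for(
        self, base_url: str, api_key: Optional[str], timeout: float = 30.0
    ) -> FakeClient:
        key: ClientKey = (base_url, api_key, timeout)
        client = self._clients.get(key)
        if client is None:
            # First model entry for this backend: construct and remember it.
            client = FakeClient(base_url, api_key, timeout)
            self._clients[key] = client
        return client
```

Because every field in the key is hashable, two `models:` entries that point at the same backend with the same credentials and timeout receive the same client object.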

### `register_provider`

Called by user code (usually from a `config.py`) to add a custom class your framework should dispatch to. Implementations typically just record the class in an in-memory dict; `create_model` then checks that dict before falling back to its built-in dispatch.

### `get_provider_names`

Returns the list of provider names this framework knows about, including built-ins and anything registered at runtime. Used by tooling (`nemoguardrails find_providers`) and for debugging.
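The sketch below shows how `register_provider` and `get_provider_names` interact with the lookup order of `create_model`. `ProviderTable`, `resolve`, and the `alpha`/`beta` built-ins are hypothetical; `resolve` stands for the check at the top of a `create_model` implementation. In this sketch, registering a name that matches a built-in shadows it, and `get_provider_names` reports each name once.

```python
from typing import Any, Dict, List


class AlphaModel:
    """Placeholder for a built-in runtime class."""


class BetaModel:
    """Placeholder for another built-in runtime class."""


class ProviderTable:
    """Runtime registrations take precedence over built-in dispatch."""

    BUILTINS: Dict[str, Any] = {"alpha": AlphaModel, "beta": BetaModel}

    def __init__(self) -> None:
        self._providers: Dict[str, Any] = {}

    def register_provider(self, name: str, provider_cls: Any) -> None:
        # Just a dict write; a later registration replaces an earlier one.
        self._providers[name] = provider_cls

    def resolve(self, provider_name: str) -> Any:
        # The lookup create_model would run before its built-in dispatch.
        if provider_name in self._providers:
            return self._providers[provider_name]
        return self.BUILTINS[provider_name]  # KeyError for unknown names

    def get_provider_names(self) -> List[str]:
        # Built-ins plus runtime registrations, deduplicated and sorted.
        return sorted(set(self.BUILTINS) | set(self._providers))
```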

### `reset`

`reset` is called at process or test boundaries to release framework-owned resources. It must:

* Close any pooled HTTP clients, gRPC channels, file handles, or database connections.
* Clear any registered-provider state if you want a clean slate. Some frameworks, such as `DefaultFramework`, separate `aclose` from `clear_providers` and call both from `reset`; others may prefer to keep registrations.
* Be idempotent: calling `reset` twice in a row must not raise.
* Be safe to call from a running event loop: the registry's `_areset_frameworks` awaits it directly.

After `reset`, the instance must remain usable. New resources are constructed lazily on the next `create_model` call.

Today `reset` is invoked only by the test suite; the runtime does not call it on `nemoguardrails server` shutdown. Implement it for test isolation, not for production cleanup.
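The requirements above can be met by splitting teardown the way this guide describes for `DefaultFramework`. In this minimal sketch, `ResettableFramework` and `FakeAsyncClient` are hypothetical, and `FakeAsyncClient` stands in for a pooled client with an async `aclose()`, such as `httpx.AsyncClient`. Detaching the client reference before closing it keeps `reset` idempotent, and the client is rebuilt lazily on the next request.

```python
from typing import Any, Dict, Optional


class FakeAsyncClient:
    """Stand-in for a pooled client with an async close method."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True


class ResettableFramework:
    def __init__(self) -> None:
        self._client: Optional[FakeAsyncClient] = None
        self._providers: Dict[str, Any] = {}

    def _get_client(self) -> FakeAsyncClient:
        # Lazily (re)build the shared client, including after a reset.
        if self._client is None:
            self._client = FakeAsyncClient()
        return self._client

    async def aclose(self) -> None:
        # Detach first, so a second call finds nothing left to close.
        client, self._client = self._client, None
        if client is not None:
            await client.aclose()

    def clear_providers(self) -> None:
        self._providers.clear()

    async def reset(self) -> None:
        # Both halves are safe to repeat, so reset is idempotent.
        await self.aclose()
        self.clear_providers()
```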

## Minimal Working Example

The example below is fully self-contained and runs end-to-end with no
dependencies beyond the library itself. The model is an "echo" implementation
that returns a fixed string for every prompt. Swap in real HTTP calls or SDK
invocations after you verify that the registration and dispatch path works.
Refer to [Custom LLM Model](/configure-guardrails/custom-initialization/custom-llm-model)
for the canonical `httpx`-based pattern.

Create a config directory `my_config/` next to your smoke-test script with
two files:

```text
my_config/
├── config.py    # framework + LLMModel definitions, registered at import time
└── config.yml   # references the framework's engine name
```

`my_config/config.py`:

```python
from typing import Any, Dict, List, Optional

from nemoguardrails import LLMModel, LLMResponse, LLMResponseChunk, register_framework, set_default_framework

class EchoLLMModel:
    """Returns a canned response. Useful as a skeleton or in offline tests."""

    def __init__(self, model: str, response: str = "echo", **kwargs: Any):
        self._model = model
        self._response = response
        self._default_kwargs = kwargs

    @property
    def model_name(self) -> str:
        return self._model

    @property
    def provider_name(self) -> Optional[str]:
        return "my_engine"

    @property
    def provider_url(self) -> Optional[str]:
        return None

    async def generate_async(self, prompt, *, stop=None, **kwargs) -> LLMResponse:
        return LLMResponse(content=self._response)

    async def stream_async(self, prompt, *, stop=None, **kwargs):
        yield LLMResponseChunk(delta_content=self._response)
        yield LLMResponseChunk(finish_reason="stop")

class MyFramework:
    def __init__(self):
        self._providers: Dict[str, Any] = {}

    def create_model(
        self,
        model_name: str,
        provider_name: str,
        model_kwargs: Optional[Dict[str, Any]] = None,
    ) -> LLMModel:
        kwargs = dict(model_kwargs) if model_kwargs else {}
        kwargs.pop("mode", None)
        if provider_name in self._providers:
            return self._providers[provider_name](model=model_name, **kwargs)
        return EchoLLMModel(model=model_name, **kwargs)

    def register_provider(self, name: str, provider_cls: Any) -> None:
        self._providers[name] = provider_cls

    def get_provider_names(self) -> List[str]:
        return sorted({"my_engine", *self._providers})

    async def reset(self) -> None:
        # Release any framework-scoped resources you hold (HTTP clients,
        # connection pools, caches). The echo framework only owns a registry
        # dict, so clearing it is sufficient. A real framework typically
        # closes a shared `httpx.AsyncClient` here.
        self._providers.clear()

register_framework("my", MyFramework())
set_default_framework("my")
```

`my_config/config.yml`:

```yaml
models:
  - type: main
    engine: my_engine
    model: echo
    parameters:
      response: "echo from echo"
```

### Trying it out

Run a smoke test from the parent directory of `my_config/`. `LLMRails`
imports `config.py` automatically, which triggers the `register_framework`
and `set_default_framework` calls at the bottom of that file:

```python
# smoke.py (next to my_config/)
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./my_config")
app = LLMRails(config)

result = app.generate(messages=[{"role": "user", "content": "hi"}])
print(result["content"])  # -> echo from echo
```

If the smoke test prints `echo from echo`, the framework is wired up. From
there, replace `EchoLLMModel.generate_async` and `stream_async` with real
backend calls. Refer to
[Custom LLM Model](/configure-guardrails/custom-initialization/custom-llm-model).
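When you replace the echo model, one quick sanity check is that the chunks from `stream_async` reassemble into the text `generate_async` would return. The sketch below assumes only the `delta_content` and `finish_reason` fields that the echo example uses; `Chunk` is a stand-in for `LLMResponseChunk`, and `fake_stream` and `collect` are hypothetical helpers.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List, Optional, Tuple


@dataclass
class Chunk:
    """Stand-in for LLMResponseChunk, limited to the fields the example uses."""

    delta_content: Optional[str] = None
    finish_reason: Optional[str] = None


async def fake_stream(text: str) -> AsyncIterator[Chunk]:
    # Hypothetical backend: emit the text word by word, then a final
    # chunk that carries only the finish reason.
    for i, word in enumerate(text.split(" ")):
        yield Chunk(delta_content=word if i == 0 else " " + word)
    yield Chunk(finish_reason="stop")


async def collect(stream: AsyncIterator[Chunk]) -> Tuple[str, Optional[str]]:
    """Concatenate deltas and keep the last finish reason seen."""
    parts: List[str] = []
    finish: Optional[str] = None
    async for chunk in stream:
        if chunk.delta_content:
            parts.append(chunk.delta_content)
        if chunk.finish_reason is not None:
            finish = chunk.finish_reason
    return "".join(parts), finish


text, reason = asyncio.run(collect(fake_stream("echo from echo")))
print(text, reason)  # prints: echo from echo stop
```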

After `register_framework("my", MyFramework())`, the framework is selectable in three ways:

1. Process-wide default at import time. Set the environment variable before importing the NVIDIA NeMo Guardrails library:

   ```bash
   export NEMOGUARDRAILS_LLM_FRAMEWORK=my
   ```

   The registry reads `NEMOGUARDRAILS_LLM_FRAMEWORK` at module load and uses it as the active framework name.
2. Programmatic flip in `config.py`. Call `set_default_framework("my")` after registering. All subsequent `LLMRails` constructions use it.
3. Targeted dispatch. If you want different frameworks for different model entries, route directly with `framework.create_model` in your own initialization code (advanced; not the standard path).

`config.yml` entries do not name the framework; they name a provider. The framework is implicit in whichever one is active.

```yaml
models:
  - type: main
    engine: my_engine
    model: my-flagship-model
    parameters:
      temperature: 0.2
```

## Reference Implementations

Review these production-grade frameworks:

* [`nemoguardrails/llm/frameworks/default.py`](https://github.com/NVIDIA-NeMo/Guardrails/blob/develop/nemoguardrails/llm/frameworks/default.py): `DefaultFramework`. Pools `OpenAICompatibleClient` instances keyed on `(base_url, api_key, timeouts, headers, query)`. Splits lifecycle into `aclose` (HTTP teardown), `clear_providers` (registry teardown), and `reset` (both, used in tests).
* [`nemoguardrails/integrations/langchain/llm_adapter.py`](https://github.com/NVIDIA-NeMo/Guardrails/blob/develop/nemoguardrails/integrations/langchain/llm_adapter.py): `LangChainFramework`. Defers to `nemoguardrails.integrations.langchain.providers` for registration, calls `init_langchain_model` for construction, and wraps the result in `LangChainLLMAdapter`. Has a no-op `reset` because the LangChain side has no pooled state of its own.
* [`nemoguardrails/llm/frameworks/registry.py`](https://github.com/NVIDIA-NeMo/Guardrails/blob/develop/nemoguardrails/llm/frameworks/registry.py): `register_framework`, `get_framework`, `set_default_framework`, `get_default_framework`, `_areset_frameworks`. Read this to understand the environment variable, lazy lookup, and registration behavior.

## Failure Modes

### Registering a provider before any framework is active

`register_provider` from `nemoguardrails.llm.providers` resolves the active framework with `get_default_framework()` and calls `framework.register_provider` on it. The registry has a built-in `default` framework that is constructed lazily on first access, so this almost always works without explicit setup. The failure mode appears only when the user sets `NEMOGUARDRAILS_LLM_FRAMEWORK` to a name that has not been registered yet:

```bash
export NEMOGUARDRAILS_LLM_FRAMEWORK=my
```

```python
# config.py runs BEFORE `register_framework("my", ...)`
from nemoguardrails.llm.providers import register_provider

# EchoLLMModel is the class from the minimal example above.
register_provider("echo", EchoLLMModel)
# KeyError: Unknown framework 'my'. Available frameworks: []
```

To fix this, register the framework before registering any provider, or leave `NEMOGUARDRAILS_LLM_FRAMEWORK` unset until after `register_framework` has run.
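The ordering problem can be reproduced with a toy model of the registry. `ToyRegistry` is hypothetical and much simpler than `registry.py` (for example, it has no `langchain` entry and a shorter error message). It only mimics the behavior described in this section: the active name comes from `NEMOGUARDRAILS_LLM_FRAMEWORK`, and that name is resolved to a framework object only when one is requested, so any lookup fails until the named framework is registered.

```python
import os
from typing import Any, Dict, Mapping, Optional

ENV_VAR = "NEMOGUARDRAILS_LLM_FRAMEWORK"


class ToyRegistry:
    """Simplified model of env-driven, lazily resolved framework selection."""

    def __init__(self, environ: Optional[Mapping[str, str]] = None) -> None:
        env = os.environ if environ is None else environ
        # The active *name* is captured up front, like a read at module load.
        self._active = env.get(ENV_VAR, "default")
        # object() is a placeholder for the built-in default framework.
        self._frameworks: Dict[str, Any] = {"default": object()}

    def register_framework(self, name: str, framework: Any) -> None:
        self._frameworks[name] = framework

    def get_default_framework(self) -> Any:
        # The name is only resolved here, so registration order matters.
        if self._active not in self._frameworks:
            raise KeyError(f"Unknown framework {self._active!r}")
        return self._frameworks[self._active]
```

With `environ={"NEMOGUARDRAILS_LLM_FRAMEWORK": "my"}`, `get_default_framework()` raises `KeyError` until `register_framework("my", ...)` has run, and succeeds afterward.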

### Unknown framework on activation

```python
set_default_framework("typo")
# KeyError: Unknown framework 'typo'. Register it first or use one of: ['default', 'langchain']
```

The two built-in names always appear in this hint because the registry knows them by default. If you are working with only your own framework, register it first, then call `set_default_framework`.

## Best Practices

1. Treat `reset` as a hard contract, not a hint. Test it. Pooled HTTP connections that survive across tests cause surprising flakes elsewhere.
2. Prefer composition over inheritance. `MyFramework` does not need to subclass `DefaultFramework`. The protocol is small enough to implement from scratch.
3. Pool HTTP clients on the framework when multiple `models:` entries share a backend. `create_model` runs once per entry at `LLMRails` startup, so a model can safely build its own client. When two entries point at the same backend, only the framework can deduplicate them. `DefaultFramework._get_or_create_client` keys clients by `(base_url, api_key, ...)` for exactly this case.
4. Do not import LangChain in a default-framework-style implementation. The whole point of swapping the framework layer is to avoid pulling in dependencies you do not need. Keep your imports tight.
5. Document your framework's provider taxonomy. `get_provider_names` is what `nemoguardrails find_providers` shows users.

## Related Topics

* [Custom LLM Model](/configure-guardrails/custom-initialization/custom-llm-model) - Implement the `LLMModel` protocol that your framework constructs.
* [Custom LLM Providers](/configure-guardrails/custom-initialization/custom-llm-providers) - LangChain `BaseLLM`/`BaseChatModel` providers (uses `engine: langchain`).
* [Init Function](/configure-guardrails/custom-initialization/init-function) - Where `register_framework` and `set_default_framework` calls usually go.
* [Configuration Reference](/configure-guardrails/configuration-reference) - `config.yml` schema and the `engine`, `model`, and `parameters` fields.