# nemoguardrails.testing.fake_model

Framework-agnostic fake LLM model for testing guardrails configurations.

This module exposes `FakeLLMModel`, a lightweight implementation of the
`LLMModel` protocol used by NeMo Guardrails. It is intended for tests
where a deterministic, scripted set of responses is preferable to calling
a real model provider.

## Module Contents

### Classes

| Name                                                              | Description                                                                  |
| ----------------------------------------------------------------- | ---------------------------------------------------------------------------- |
| [`FakeLLMModel`](#nemoguardrails-testing-fake_model-FakeLLMModel) | Framework-agnostic fake LLM for testing. Implements the `LLMModel` protocol. |

### API

```python
class nemoguardrails.testing.fake_model.FakeLLMModel(
    responses: typing.Optional[typing.List[str]] = None,
    llm_responses: typing.Optional[typing.List[nemoguardrails.types.LLMResponse]] = None,
    llm_exception: typing.Optional[Exception] = None,
    token_usage: typing.Optional[typing.List[typing.Dict[str, int]]] = None,
    should_return_token_usage: bool = False
)
```

Framework-agnostic fake LLM for testing. Implements the `LLMModel` protocol.

**Parameters:**

- **`responses`** (`Optional[List[str]]`, default `None`): A list of plain string responses. Each call to `generate_async` (or `stream_async`) consumes the next entry. Mutually exclusive with `llm_responses`.
- **`llm_responses`** (`Optional[List[LLMResponse]]`, default `None`): A list of `nemoguardrails.types.LLMResponse` objects, useful when tool calls or structured fields need to be asserted. Takes precedence over `responses` when provided.
- **`llm_exception`** (`Optional[Exception]`, default `None`): An exception instance to raise on every generation, useful for exercising error-handling paths.
- **`token_usage`** (`Optional[List[Dict[str, int]]]`, default `None`): A list of token usage dictionaries, one per response. Each entry may include the `prompt_tokens`, `completion_tokens`, and `total_tokens` keys.
- **`should_return_token_usage`** (`bool`, default `False`): When `True`, populate `LLMResponse.usage` from `token_usage`.

```python
nemoguardrails.testing.fake_model.FakeLLMModel._get_usage() -> typing.Optional[nemoguardrails.types.UsageInfo]
```

```python
nemoguardrails.testing.fake_model.FakeLLMModel._next_response() -> nemoguardrails.types.LLMResponse
```

```python
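nemoguardrails.testing.fake_model.FakeLLMModel.generate_async(
    prompt,
    stop=None,
    **kwargs
) -> nemoguardrails.types.LLMResponse
```

*async*: Returns the next scripted response as an `LLMResponse`, or raises `llm_exception` if one was configured.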
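
Because `llm_exception` is raised on every generation, a scripted fake is convenient for exercising fallback code. The sketch below uses a hypothetical stand-in, `ScriptedModel`, that returns plain strings instead of `LLMResponse` objects to stay self-contained; `call_with_fallback` is a hypothetical piece of code under test, not part of NeMo Guardrails.

```python
import asyncio
from typing import List, Optional


class ScriptedModel:
    """Hypothetical stand-in mirroring the documented generate_async behaviour."""

    def __init__(
        self,
        responses: Optional[List[str]] = None,
        llm_exception: Optional[Exception] = None,
    ) -> None:
        self._responses = list(responses or [])
        self._llm_exception = llm_exception
        self._calls = 0

    async def generate_async(self, prompt: str, stop=None, **kwargs) -> str:
        # A configured exception is raised on every call, not just the first.
        if self._llm_exception is not None:
            raise self._llm_exception
        text = self._responses[self._calls]
        self._calls += 1
        return text


async def call_with_fallback(model, prompt: str) -> str:
    # Hypothetical code under test: return a canned message if the model fails.
    try:
        return await model.generate_async(prompt)
    except RuntimeError:
        return "Sorry, the model is unavailable."


ok = asyncio.run(call_with_fallback(ScriptedModel(responses=["42"]), "question"))
failing = ScriptedModel(llm_exception=RuntimeError("provider down"))
first_try = asyncio.run(call_with_fallback(failing, "question"))
second_try = asyncio.run(call_with_fallback(failing, "question"))
print(ok)          # 42
print(second_try)  # Sorry, the model is unavailable.
```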

```python
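nemoguardrails.testing.fake_model.FakeLLMModel.stream_async(
    prompt,
    stop=None,
    **kwargs
)
```

*async*: Streaming counterpart of `generate_async`. Each call consumes the next scripted entry, just as `generate_async` does.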
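
Consuming a streamed reply in a test typically means iterating with `async for` and joining the chunks. This page does not document what `stream_async` yields, so the sketch below assumes an async generator of string chunks and uses a hypothetical stand-in, `ScriptedStreamingModel`, that splits each scripted reply on spaces; the real chunk boundaries may differ.

```python
import asyncio
from typing import AsyncIterator, List


class ScriptedStreamingModel:
    """Hypothetical stand-in; the chunking is an assumption, not FakeLLMModel's."""

    def __init__(self, responses: List[str]) -> None:
        self._responses = list(responses)
        self._calls = 0

    async def stream_async(self, prompt: str, stop=None, **kwargs) -> AsyncIterator[str]:
        # Consume the next scripted entry, as generate_async would.
        text = self._responses[self._calls]
        self._calls += 1
        # Assumed chunking: emit the text word by word, re-adding the spaces.
        for i, word in enumerate(text.split(" ")):
            yield word if i == 0 else " " + word


async def collect(model, prompt: str) -> str:
    # Gather every streamed chunk and rebuild the full reply.
    chunks = []
    async for chunk in model.stream_async(prompt):
        chunks.append(chunk)
    return "".join(chunks)


model = ScriptedStreamingModel(["The answer is 42.", "Second reply."])
first = asyncio.run(collect(model, "question"))
second = asyncio.run(collect(model, "question"))
print(first)   # The answer is 42.
print(second)  # Second reply.
```

Because each call consumes a fresh entry, the second `collect` call sees the second scripted reply.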