> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemo/guardrails/llms.txt.
> For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://docs.nvidia.com/nemo/guardrails/_mcp/server.

# nemoguardrails.evaluate.cli.evaluate

## Module Contents

### Functions

| Name                                                                   | Description                                                                              |
| ---------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- |
| [`fact_checking`](#nemoguardrails-evaluate-cli-evaluate-fact_checking) | Evaluate the performance of the fact-checking rails defined in a Guardrails application. |
| [`hallucination`](#nemoguardrails-evaluate-cli-evaluate-hallucination) | Evaluate the performance of the hallucination rails defined in a Guardrails application. |
| [`moderation`](#nemoguardrails-evaluate-cli-evaluate-moderation)       | Evaluate the performance of the moderation rails defined in a Guardrails application.    |
| [`topical`](#nemoguardrails-evaluate-cli-evaluate-topical)             | Evaluates the performance of the topical rails defined in a Guardrails application.      |

### Data

[`app`](#nemoguardrails-evaluate-cli-evaluate-app)

### API

```python
nemoguardrails.evaluate.cli.evaluate.fact_checking(
    config: str = typer.Option(help='The path...,
    dataset_path: str = typer.Option('nemoguardrail...,
    num_samples: int = typer.Option(50, help='Numb...,
    create_negatives: bool = typer.Option(True, help='cr...,
    output_dir: str = typer.Option('eval_outputs/...,
    write_outputs: bool = typer.Option(True, help='Wr...
)
```

Evaluate the performance of the fact-checking rails defined in a Guardrails application.

This command computes accuracy for fact-checking.
Negatives can be created synthetically by an LLM that acts as an adversary and modifies the answer to make it incorrect.

**Parameters:**

The path to the guardrails config. Defaults to "config".

Path to the folder containing the dataset. Defaults to "nemoguardrails/evaluate/data/factchecking/sample.json".

Number of samples to be evaluated. Defaults to 50.

Create synthetic negative samples. Defaults to True.

Path to the folder where the outputs will be written. Defaults to "eval\_outputs/factchecking".

Write outputs to the output directory. Defaults to True.

```python
nemoguardrails.evaluate.cli.evaluate.hallucination(
    config: str = typer.Option(help='The path...,
    dataset_path: str = typer.Option('nemoguardrail...,
    num_samples: int = typer.Option(50, help='Numb...,
    output_dir: str = typer.Option('eval_outputs/...,
    write_outputs: bool = typer.Option(True, help='Wr...
)
```

Evaluate the performance of the hallucination rails defined in a Guardrails application.

This command computes accuracy for hallucination detection.

**Parameters:**

The path to the guardrails config. Defaults to "config".

Dataset path. Defaults to "nemoguardrails/evaluate/data/hallucination/sample.txt".

Number of samples to evaluate. Defaults to 50.

Output directory. Defaults to "eval\_outputs/hallucination".

Write outputs to file. Defaults to True.

```python
nemoguardrails.evaluate.cli.evaluate.moderation(
    config: str = typer.Option(help='The path...,
    dataset_path: str = typer.Option('nemoguardrail...,
    num_samples: int = typer.Option(50, help='Numb...,
    check_input: bool = typer.Option(True, help='Ev...,
    check_output: bool = typer.Option(True, help='Ev...,
    output_dir: str = typer.Option('eval_outputs/...,
    write_outputs: bool = typer.Option(True, help='Wr...,
    split: str = typer.Option('harmful', hel...
)
```

Evaluate the performance of the moderation rails defined in a Guardrails application.

This command computes accuracy for jailbreak detection and output moderation.

**Parameters:**

The path to the guardrails config. Defaults to "config".

Path to the dataset containing prompts.
Defaults to "nemoguardrails/evaluate/data/moderation/harmful.txt".

Number of samples to evaluate. Defaults to 50.

Evaluate the input self-check rail. Defaults to True.

Evaluate the output self-check rail. Defaults to True.

Output directory for predictions.
Defaults to "eval\_outputs/moderation".

Write outputs to file. Defaults to True.

Whether prompts are harmful or helpful. Defaults to "harmful".

```python
nemoguardrails.evaluate.cli.evaluate.topical(
    config: typing.List[str] = typer.Option(default=[''], ...,
    verbose: bool = typer.Option(default=False,...,
    test_percentage: float = typer.Option(default=0.3, h...,
    max_tests_intent: int = typer.Option(default=3, hel...,
    max_samples_intent: int = typer.Option(default=0, hel...,
    results_frequency: int = typer.Option(default=10, he...,
    sim_threshold: float = typer.Option(default=0.0, h...,
    random_seed: int = typer.Option(default=None, ...,
    output_dir: str = typer.Option(default=None, ...
)
```

Evaluates the performance of the topical rails defined in a Guardrails application.
Computes accuracy for canonical form detection, next step generation, and next bot message generation.
Only a single Guardrails application can be specified in the config option.

**Parameters:**

Path to a directory containing configuration files of the Guardrails application for evaluation.
Can also point to a single configuration file. Defaults to \[""].

If the chat should be verbose and output the prompts. Defaults to False.

Percentage of the samples for an intent to be used as test set. Defaults to 0.3.

Maximum number of test samples per intent to be used when testing.
If value is 0, no limit is used. Defaults to 3.

Maximum number of samples per intent indexed in vector database.
If value is 0, all samples are used. Defaults to 0.

Print evaluation intermediate results using this step. Defaults to 10.

Minimum similarity score to select the intent when exact match fails. Defaults to 0.0.

Random seed used by the evaluation. Defaults to None.

Output directory for predictions. Defaults to None.

```python
nemoguardrails.evaluate.cli.evaluate.app = typer.Typer()
```