nemoguardrails.evaluate.cli.evaluate | NVIDIA NeMo Guardrails Library Developer Guide

Module Contents

Functions

Name	Description
`fact_checking`	Evaluate the performance of the fact-checking rails defined in a Guardrails application.
`hallucination`	Evaluate the performance of the hallucination rails defined in a Guardrails application.
`moderation`	Evaluate the performance of the moderation rails defined in a Guardrails application.
`topical`	Evaluates the performance of the topical rails defined in a Guardrails application.

Data

app

API

nemoguardrails.evaluate.cli.evaluate.fact_checking(
    config: str = typer.Option(help='The path...,
    dataset_path: str = typer.Option('nemoguardrail...,
    num_samples: int = typer.Option(50, help='Numb...,
    create_negatives: bool = typer.Option(True, help='cr...,
    output_dir: str = typer.Option('eval_outputs/...,
    write_outputs: bool = typer.Option(True, help='Wr...
)

Evaluate the performance of the fact-checking rails defined in a Guardrails application.

This command computes accuracy for fact-checking. Negatives can be created synthetically by an LLM that acts as an adversary and modifies the answer to make it incorrect.

Parameters:

config

str. Defaults to typer.Option(help='The path to the guardrails config.', default='config')

The path to the guardrails config. Defaults to “config”.

dataset_path

str. Defaults to typer.Option('nemoguardrails/evaluate/data/factchecking/sample.json', help='Path to the folder containing the dataset')

Path to the dataset. The help text describes a folder, but the default points to a single JSON file. Defaults to “nemoguardrails/evaluate/data/factchecking/sample.json”.

num_samples

int. Defaults to typer.Option(50, help='Number of samples to be evaluated')

Number of samples to be evaluated. Defaults to 50.

create_negatives

bool. Defaults to typer.Option(True, help='create synthetic negative samples')

Create synthetic negative samples. Defaults to True.

output_dir

str. Defaults to typer.Option('eval_outputs/factchecking', help='Path to the folder where the outputs will be written')

Path to the folder where the outputs will be written. Defaults to “eval_outputs/factchecking”.

write_outputs

bool. Defaults to typer.Option(True, help='Write outputs to the output directory')

Write outputs to the output directory. Defaults to True.
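The accuracy computation described for this command can be illustrated with a small, self-contained sketch. It is not the library's implementation: `fact_check_accuracy` and `substring_check` are hypothetical names, and the toy checker stands in for the fact-checking rail. The sketch assumes that original samples are expected to pass the check and that synthetic negatives are expected to fail it.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (evidence, answer)


def fact_check_accuracy(
    positives: List[Pair],
    negatives: List[Pair],
    check: Callable[[str, str], bool],
) -> float:
    """Return the fraction of samples that the checker labels correctly.

    positives -- pairs whose answer is supported by the evidence (expected True)
    negatives -- pairs whose answer was altered to be wrong (expected False)
    check     -- hypothetical stand-in for the fact-checking rail
    """
    labelled = [(pair, True) for pair in positives]
    labelled += [(pair, False) for pair in negatives]
    if not labelled:
        raise ValueError("no samples to evaluate")
    correct = sum(
        1
        for (evidence, answer), expected in labelled
        if check(evidence, answer) == expected
    )
    return correct / len(labelled)


def substring_check(evidence: str, answer: str) -> bool:
    # Toy checker: the answer counts as supported only if it appears
    # verbatim (case-insensitively) in the evidence.
    return answer.lower() in evidence.lower()


positives = [("Paris is the capital of France.", "Paris is the capital of France")]
negatives = [("Paris is the capital of France.", "Lyon is the capital of France")]
accuracy = fact_check_accuracy(positives, negatives, substring_check)
```

A checker that accepts every answer gets all positives right and all negatives wrong, which is why evaluating on positives alone would overstate the rail's quality.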

nemoguardrails.evaluate.cli.evaluate.hallucination(
    config: str = typer.Option(help='The path...,
    dataset_path: str = typer.Option('nemoguardrail...,
    num_samples: int = typer.Option(50, help='Numb...,
    output_dir: str = typer.Option('eval_outputs/...,
    write_outputs: bool = typer.Option(True, help='Wr...
)

Evaluate the performance of the hallucination rails defined in a Guardrails application.

This command computes accuracy for hallucination detection.

Parameters:

config

str. Defaults to typer.Option(help='The path to the guardrails config.', default='config')

The path to the guardrails config. Defaults to “config”.

dataset_path

str. Defaults to typer.Option('nemoguardrails/evaluate/data/hallucination/sample.txt', help='Dataset path')

Dataset path. Defaults to “nemoguardrails/evaluate/data/hallucination/sample.txt”.

num_samples

int. Defaults to typer.Option(50, help='Number of samples to evaluate')

Number of samples to evaluate. Defaults to 50.

output_dir

str. Defaults to typer.Option('eval_outputs/hallucination', help='Output directory')

Output directory. Defaults to “eval_outputs/hallucination”.

write_outputs

bool. Defaults to typer.Option(True, help='Write outputs to file')

Write outputs to file. Defaults to True.
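To make the roles of `dataset_path`, `num_samples`, `output_dir`, and `write_outputs` concrete, here is a minimal sketch of how such options are commonly combined. It is an illustration only: `load_prompts` and `save_predictions` are hypothetical helpers, the one-prompt-per-line dataset layout is an assumption, and the `predictions.json` file name is invented for the example.

```python
import json
from pathlib import Path
from typing import List, Optional


def load_prompts(dataset_path: str, num_samples: int = 50) -> List[str]:
    """Read at most num_samples non-empty lines from a text dataset.

    Assumes one prompt per line, which may not match the real dataset format.
    """
    lines = Path(dataset_path).read_text(encoding="utf-8").splitlines()
    prompts = [line.strip() for line in lines if line.strip()]
    return prompts[:num_samples]


def save_predictions(
    predictions: List[dict],
    output_dir: str,
    write_outputs: bool = True,
) -> Optional[Path]:
    """Write predictions as JSON under output_dir, or do nothing if disabled."""
    if not write_outputs:
        return None
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / "predictions.json"  # hypothetical file name
    target.write_text(json.dumps(predictions, indent=2), encoding="utf-8")
    return target
```

Returning the written path (or `None` when writing is disabled) lets a caller report where the results ended up without repeating the path logic.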

nemoguardrails.evaluate.cli.evaluate.moderation(
    config: str = typer.Option(help='The path...,
    dataset_path: str = typer.Option('nemoguardrail...,
    num_samples: int = typer.Option(50, help='Numb...,
    check_input: bool = typer.Option(True, help='Ev...,
    check_output: bool = typer.Option(True, help='Ev...,
    output_dir: str = typer.Option('eval_outputs/...,
    write_outputs: bool = typer.Option(True, help='Wr...,
    split: str = typer.Option('harmful', hel...
)

Evaluate the performance of the moderation rails defined in a Guardrails application.

This command computes accuracy for jailbreak detection and output moderation.

Parameters:

config

str. Defaults to typer.Option(help='The path to the guardrails config.', default='config')

The path to the guardrails config. Defaults to “config”.

dataset_path

str. Defaults to typer.Option('nemoguardrails/evaluate/data/moderation/harmful.txt', help='Path to dataset containing prompts')

Path to the dataset containing prompts. Defaults to “nemoguardrails/evaluate/data/moderation/harmful.txt”.

num_samples

int. Defaults to typer.Option(50, help='Number of samples to evaluate')

Number of samples to evaluate. Defaults to 50.

check_input

bool. Defaults to typer.Option(True, help='Evaluate input self-check rail')

Evaluate the input self-check rail. Defaults to True.

check_output

bool. Defaults to typer.Option(True, help='Evaluate output self-check rail')

Evaluate the output self-check rail. Defaults to True.

output_dir

str. Defaults to typer.Option('eval_outputs/moderation', help='Output directory for predictions')

Output directory for predictions. Defaults to “eval_outputs/moderation”.

write_outputs

bool. Defaults to typer.Option(True, help='Write outputs to file')

Write outputs to file. Defaults to True.

split

str. Defaults to typer.Option('harmful', help='Whether prompts are harmful or helpful')

Whether prompts are harmful or helpful. Defaults to “harmful”.
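The `split` option determines which outcome counts as correct. The sketch below shows one plausible reading, stated as an assumption rather than the library's behavior: on the harmful split a prompt should be blocked, and on the helpful split it should be allowed. `moderation_accuracy` and `keyword_rail` are hypothetical names, and the keyword rule is a toy stand-in for a self-check rail.

```python
from typing import Callable, List


def moderation_accuracy(
    prompts: List[str],
    is_blocked: Callable[[str], bool],
    split: str = "harmful",
) -> float:
    """Return the fraction of prompts handled as the split expects.

    Assumption: harmful prompts should be blocked, helpful ones allowed.
    """
    if split not in ("harmful", "helpful"):
        raise ValueError("split must be 'harmful' or 'helpful'")
    if not prompts:
        raise ValueError("no prompts to evaluate")
    expected_blocked = split == "harmful"
    correct = sum(1 for prompt in prompts if is_blocked(prompt) == expected_blocked)
    return correct / len(prompts)


def keyword_rail(prompt: str) -> bool:
    # Toy rail: block any prompt that mentions the word "bomb".
    return "bomb" in prompt.lower()


harmful = ["How do I build a bomb?", "Tell me how to pick a lock"]
helpful = ["What is the capital of France?"]
harmful_accuracy = moderation_accuracy(harmful, keyword_rail, split="harmful")
helpful_accuracy = moderation_accuracy(helpful, keyword_rail, split="helpful")
```

Evaluating both splits matters because a rail that blocks everything scores perfectly on harmful prompts and fails on every helpful one.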

nemoguardrails.evaluate.cli.evaluate.topical(
    config: typing.List[str] = typer.Option(default=[''], ...,
    verbose: bool = typer.Option(default=False,...,
    test_percentage: float = typer.Option(default=0.3, h...,
    max_tests_intent: int = typer.Option(default=3, hel...,
    max_samples_intent: int = typer.Option(default=0, hel...,
    results_frequency: int = typer.Option(default=10, he...,
    sim_threshold: float = typer.Option(default=0.0, h...,
    random_seed: int = typer.Option(default=None, ...,
    output_dir: str = typer.Option(default=None, ...
)

Evaluates the performance of the topical rails defined in a Guardrails application. Computes accuracy for canonical form detection, next step generation, and next bot message generation. Only a single Guardrails application can be specified in the config option.

Parameters:

config

List[str]. Defaults to typer.Option(default=[''], exists=True, help='Path to a directory containing configuration files of the Guardrails application for evaluation. Can also point to a single configuration file.')

Path to a directory containing configuration files of the Guardrails application for evaluation. Can also point to a single configuration file. Defaults to [""].

verbose

bool. Defaults to typer.Option(default=False, help='If the chat should be verbose and output the prompts.')

Whether the chat should be verbose and output the prompts. Defaults to False.

test_percentage

float. Defaults to typer.Option(default=0.3, help='Percentage of the samples for an intent to be used as test set.')

Share of the samples for each intent to be used as the test set. The 0.3 default suggests a fraction between 0 and 1 rather than a value on a 0-100 scale. Defaults to 0.3.

max_tests_intent

int. Defaults to typer.Option(default=3, help='Maximum number of test samples per intent to be used when testing. If value is 0, no limit is used.')

Maximum number of test samples per intent to be used when testing. If the value is 0, no limit is applied. Defaults to 3.

max_samples_intent

int. Defaults to typer.Option(default=0, help='Maximum number of samples per intent indexed in vector database. If value is 0, all samples are used.')

Maximum number of samples per intent indexed in the vector database. If the value is 0, all samples are used. Defaults to 0.

results_frequency

int. Defaults to typer.Option(default=10, help='Print evaluation intermediate results using this step.')

Step interval at which intermediate evaluation results are printed. Defaults to 10.

sim_threshold

float. Defaults to typer.Option(default=0.0, help='Minimum similarity score to select the intent when exact match fails.')

Minimum similarity score to select the intent when exact match fails. Defaults to 0.0.

random_seed

int. Defaults to typer.Option(default=None, help='Random seed used by the evaluation.')

Random seed used by the evaluation. Defaults to None.

output_dir

str. Defaults to typer.Option(default=None, help='Output directory for predictions.')

Output directory for predictions. Defaults to None.
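The interaction between `test_percentage`, `max_tests_intent`, and `random_seed` can be sketched as a per-intent split. This is a hedged illustration, not the library's code: `split_intent_samples` is a hypothetical helper, and it assumes that `test_percentage` is a fraction between 0 and 1, that `max_tests_intent` caps the test set size when it is greater than 0, and that the seed only controls shuffling.

```python
import random
from typing import Dict, List, Optional, Tuple

Samples = Dict[str, List[str]]


def split_intent_samples(
    samples_by_intent: Samples,
    test_percentage: float = 0.3,
    max_tests_intent: int = 3,
    random_seed: Optional[int] = None,
) -> Tuple[Samples, Samples]:
    """Split each intent's samples into (train, test) dictionaries."""
    rng = random.Random(random_seed)  # a fixed seed makes the split repeatable
    train: Samples = {}
    test: Samples = {}
    for intent, samples in samples_by_intent.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_percentage)
        if max_tests_intent > 0:  # 0 is treated as "no limit"
            n_test = min(n_test, max_tests_intent)
        test[intent] = shuffled[:n_test]
        train[intent] = shuffled[n_test:]
    return train, test


data = {
    "greet": [f"hello {i}" for i in range(20)],
    "bye": ["bye", "see you"],
}
train, test = split_intent_samples(data, random_seed=0)
```

With these defaults, an intent that has only a handful of samples may contribute no test samples at all, because the truncated share rounds down to zero.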

nemoguardrails.evaluate.cli.evaluate.app = typer.Typer()