nemoguardrails.evaluate.cli.evaluate

Module Contents

Functions

| Name | Description |
| --- | --- |
| fact_checking | Evaluate the performance of the fact-checking rails defined in a Guardrails application. |
| hallucination | Evaluate the performance of the hallucination rails defined in a Guardrails application. |
| moderation | Evaluate the performance of the moderation rails defined in a Guardrails application. |
| topical | Evaluates the performance of the topical rails defined in a Guardrails application. |

Data

app

API

nemoguardrails.evaluate.cli.evaluate.fact_checking(
config: str = typer.Option(help='The path...,
dataset_path: str = typer.Option('nemoguardrail...,
num_samples: int = typer.Option(50, help='Numb...,
create_negatives: bool = typer.Option(True, help='cr...,
output_dir: str = typer.Option('eval_outputs/...,
write_outputs: bool = typer.Option(True, help='Wr...
)

Evaluate the performance of the fact-checking rails defined in a Guardrails application.

This command computes accuracy for fact-checking. Negatives can be created synthetically by an LLM that acts as an adversary and modifies the answer to make it incorrect.
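To illustrate the accuracy computation described above, here is a minimal sketch, not the library's implementation. It assumes each sample carries a ground-truth flag (True for an original answer, False for a synthetically corrupted one) and that the rail returns a boolean verdict. `Sample` and `fact_check_accuracy` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """Hypothetical record: one answer plus the rail's verdict on it."""

    is_factual: bool         # True for an original answer, False for a synthetic negative
    rail_says_factual: bool  # verdict returned by the fact-checking rail (assumed boolean)


def fact_check_accuracy(samples: List[Sample]) -> float:
    """Return the fraction of samples where the rail's verdict matches the label."""
    if not samples:
        raise ValueError("need at least one sample")
    correct = sum(s.is_factual == s.rail_says_factual for s in samples)
    return correct / len(samples)


samples = [
    Sample(is_factual=True, rail_says_factual=True),    # positive, accepted
    Sample(is_factual=True, rail_says_factual=False),   # positive, wrongly rejected
    Sample(is_factual=False, rail_says_factual=False),  # negative, rejected
    Sample(is_factual=False, rail_says_factual=False),  # negative, rejected
]
accuracy = fact_check_accuracy(samples)
```

Pairing every positive with a synthetic negative keeps the two classes balanced, so a rail that accepts everything would score only about 50% under this scheme.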

Parameters:

config
str. Defaults to typer.Option(help='The path to the guardrails config.', default='config')

The path to the guardrails config. Defaults to “config”.

dataset_path
str. Defaults to typer.Option('nemoguardrails/evaluate/data/factchecking/sample.json', help='Path to the folder containing the dataset')

Path to the folder containing the dataset. Defaults to “nemoguardrails/evaluate/data/factchecking/sample.json”.

num_samples
int. Defaults to typer.Option(50, help='Number of samples to be evaluated')

Number of samples to be evaluated. Defaults to 50.

create_negatives
bool. Defaults to typer.Option(True, help='create synthetic negative samples')

Create synthetic negative samples. Defaults to True.

output_dir
str. Defaults to typer.Option('eval_outputs/factchecking', help='Path to the folder where the outputs will be written')

Path to the folder where the outputs will be written. Defaults to “eval_outputs/factchecking”.

write_outputs
bool. Defaults to typer.Option(True, help='Write outputs to the output directory')

Write outputs to the output directory. Defaults to True.
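The parameters above map to command-line options. The sketch below builds an argument list for them under an assumption: since the signature declares no explicit option names, it uses Typer's default spelling (underscores become hyphens, and booleans get a `--flag` / `--no-flag` pair). `to_cli_args` is a hypothetical helper, and the exact option names should be confirmed with the command's `--help` output.

```python
from typing import Any, Dict, List


def to_cli_args(params: Dict[str, Any]) -> List[str]:
    """Convert keyword parameters into Typer-style command-line arguments.

    Assumes Typer's default option naming; not taken from the library itself.
    """
    args: List[str] = []
    for name, value in params.items():
        option = name.replace("_", "-")
        if isinstance(value, bool):
            # Boolean options are exposed as --flag / --no-flag by default.
            args.append(f"--{option}" if value else f"--no-{option}")
        else:
            args.extend([f"--{option}", str(value)])
    return args


args = to_cli_args(
    {
        "config": "config",
        "num_samples": 20,
        "create_negatives": False,
        "output_dir": "eval_outputs/factchecking",
    }
)
```

The resulting list can be appended to whatever command prefix the installed version of the CLI uses for this function.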

nemoguardrails.evaluate.cli.evaluate.hallucination(
config: str = typer.Option(help='The path...,
dataset_path: str = typer.Option('nemoguardrail...,
num_samples: int = typer.Option(50, help='Numb...,
output_dir: str = typer.Option('eval_outputs/...,
write_outputs: bool = typer.Option(True, help='Wr...
)

Evaluate the performance of the hallucination rails defined in a Guardrails application.

This command computes accuracy for hallucination detection.

Parameters:

config
str. Defaults to typer.Option(help='The path to the guardrails config.', default='config')

The path to the guardrails config. Defaults to “config”.

dataset_path
str. Defaults to typer.Option('nemoguardrails/evaluate/data/hallucination/sample.txt', help='Dataset path')

Dataset path. Defaults to “nemoguardrails/evaluate/data/hallucination/sample.txt”.

num_samples
int. Defaults to typer.Option(50, help='Number of samples to evaluate')

Number of samples to evaluate. Defaults to 50.

output_dir
str. Defaults to typer.Option('eval_outputs/hallucination', help='Output directory')

Output directory. Defaults to “eval_outputs/hallucination”.

write_outputs
bool. Defaults to typer.Option(True, help='Write outputs to file')

Write outputs to file. Defaults to True.
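The interplay of `dataset_path`, `num_samples`, `output_dir`, and `write_outputs` can be sketched as follows. This is an illustration under assumptions, not the library's code: it assumes the text dataset holds one prompt per line, and the helper names and the `predictions.json` file name are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def load_prompts(dataset_path: str, num_samples: int) -> List[str]:
    """Read non-empty lines from a text file, keeping at most num_samples."""
    lines = Path(dataset_path).read_text(encoding="utf-8").splitlines()
    prompts = [line.strip() for line in lines if line.strip()]
    return prompts[:num_samples]


def maybe_write(results: List[dict], output_dir: str, write_outputs: bool) -> None:
    """Write results as JSON into output_dir only when write_outputs is set."""
    if not write_outputs:
        return
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Hypothetical file name, chosen for this example only.
    (out / "predictions.json").write_text(json.dumps(results, indent=2), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    dataset = Path(tmp) / "sample.txt"
    dataset.write_text("q1\nq2\n\nq3\n", encoding="utf-8")
    prompts = load_prompts(str(dataset), num_samples=2)
    out_dir = Path(tmp) / "eval_outputs" / "hallucination"
    maybe_write([{"prompt": p} for p in prompts], str(out_dir), write_outputs=True)
    written = (out_dir / "predictions.json").exists()
```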

nemoguardrails.evaluate.cli.evaluate.moderation(
config: str = typer.Option(help='The path...,
dataset_path: str = typer.Option('nemoguardrail...,
num_samples: int = typer.Option(50, help='Numb...,
check_input: bool = typer.Option(True, help='Ev...,
check_output: bool = typer.Option(True, help='Ev...,
output_dir: str = typer.Option('eval_outputs/...,
write_outputs: bool = typer.Option(True, help='Wr...,
split: str = typer.Option('harmful', hel...
)

Evaluate the performance of the moderation rails defined in a Guardrails application.

This command computes accuracy for jailbreak detection and output moderation.

Parameters:

config
str. Defaults to typer.Option(help='The path to the guardrails config.', default='config')

The path to the guardrails config. Defaults to “config”.

dataset_path
str. Defaults to typer.Option('nemoguardrails/evaluate/data/moderation/harmful.txt', help='Path to dataset containing prompts')

Path to the dataset containing prompts. Defaults to “nemoguardrails/evaluate/data/moderation/harmful.txt”.

num_samples
int. Defaults to typer.Option(50, help='Number of samples to evaluate')

Number of samples to evaluate. Defaults to 50.

check_input
bool. Defaults to typer.Option(True, help='Evaluate input self-check rail')

Evaluate the input self-check rail. Defaults to True.

check_output
bool. Defaults to typer.Option(True, help='Evaluate output self-check rail')

Evaluate the output self-check rail. Defaults to True.

output_dir
str. Defaults to typer.Option('eval_outputs/moderation', help='Output directory for predictions')

Output directory for predictions. Defaults to “eval_outputs/moderation”.

write_outputs
bool. Defaults to typer.Option(True, help='Write outputs to file')

Write outputs to file. Defaults to True.

split
str. Defaults to typer.Option('harmful', help='Whether prompts are harmful or helpful')

Whether prompts are harmful or helpful. Defaults to “harmful”.
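The `split` parameter tells the evaluation which outcome counts as correct. A minimal sketch of that idea follows, assuming that a harmful prompt should be flagged by the rail and a helpful prompt should pass. `moderation_accuracy` is a hypothetical function and does not reproduce the library's scoring code.

```python
from typing import List


def moderation_accuracy(flagged: List[bool], split: str) -> float:
    """Fraction of prompts handled correctly for the given split.

    flagged[i] is True when the rail blocked prompt i (assumed boolean verdict).
    """
    if split not in ("harmful", "helpful"):
        raise ValueError(f"unknown split: {split!r}")
    if not flagged:
        raise ValueError("need at least one prediction")
    expected = split == "harmful"  # harmful prompts should be flagged, helpful ones should not
    correct = sum(f == expected for f in flagged)
    return correct / len(flagged)


harmful_acc = moderation_accuracy([True, True, False, True], split="harmful")
helpful_acc = moderation_accuracy([False, False, True, False], split="helpful")
```

Evaluating both splits matters: a rail that blocks everything scores perfectly on harmful prompts and fails every helpful one.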

nemoguardrails.evaluate.cli.evaluate.topical(
config: typing.List[str] = typer.Option(default=[''], ...,
verbose: bool = typer.Option(default=False,...,
test_percentage: float = typer.Option(default=0.3, h...,
max_tests_intent: int = typer.Option(default=3, hel...,
max_samples_intent: int = typer.Option(default=0, hel...,
results_frequency: int = typer.Option(default=10, he...,
sim_threshold: float = typer.Option(default=0.0, h...,
random_seed: int = typer.Option(default=None, ...,
output_dir: str = typer.Option(default=None, ...
)

Evaluates the performance of the topical rails defined in a Guardrails application. Computes accuracy for canonical form detection, next step generation, and next bot message generation. Only a single Guardrails application can be specified in the config option.

Parameters:

config
List[str]. Defaults to typer.Option(default=[''], exists=True, help='Path to a directory containing configuration files of the Guardrails application for evaluation. Can also point to a single configuration file.')

Path to a directory containing configuration files of the Guardrails application for evaluation. Can also point to a single configuration file. Defaults to [""].

verbose
bool. Defaults to typer.Option(default=False, help='If the chat should be verbose and output the prompts.')

Whether the chat should be verbose and output the prompts. Defaults to False.

test_percentage
float. Defaults to typer.Option(default=0.3, help='Percentage of the samples for an intent to be used as test set.')

Share of the samples for an intent to be used as the test set, given as a fraction (0.3 means 30%). Defaults to 0.3.

max_tests_intent
int. Defaults to typer.Option(default=3, help='Maximum number of test samples per intent to be used when testing. If value is 0, no limit is used.')

Maximum number of test samples per intent to be used when testing. If value is 0, no limit is used. Defaults to 3.

max_samples_intent
int. Defaults to typer.Option(default=0, help='Maximum number of samples per intent indexed in vector database. If value is 0, all samples are used.')

Maximum number of samples per intent indexed in vector database. If value is 0, all samples are used. Defaults to 0.

results_frequency
int. Defaults to typer.Option(default=10, help='Print evaluation intermediate results using this step.')

Print evaluation intermediate results using this step. Defaults to 10.

sim_threshold
float. Defaults to typer.Option(default=0.0, help='Minimum similarity score to select the intent when exact match fails.')

Minimum similarity score to select the intent when exact match fails. Defaults to 0.0.

random_seed
int. Defaults to typer.Option(default=None, help='Random seed used by the evaluation.')

Random seed used by the evaluation. Defaults to None.

output_dir
str. Defaults to typer.Option(default=None, help='Output directory for predictions.')

Output directory for predictions. Defaults to None.
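How `test_percentage`, `max_tests_intent`, `max_samples_intent`, and `random_seed` might interact can be sketched as below. This is an assumption-laden illustration rather than the library's algorithm: it treats `test_percentage` as a fraction, shuffles each intent's samples with a seeded generator, and treats 0 as "no limit" as the parameter descriptions state. `split_intent_samples` is a hypothetical name.

```python
import random
from typing import Dict, List, Optional, Tuple


def split_intent_samples(
    samples: Dict[str, List[str]],
    test_percentage: float = 0.3,
    max_tests_intent: int = 3,
    max_samples_intent: int = 0,
    random_seed: Optional[int] = None,
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Split each intent's utterances into a test set and an indexed set."""
    rng = random.Random(random_seed)  # seeded generator makes the split reproducible
    test_set: Dict[str, List[str]] = {}
    index_set: Dict[str, List[str]] = {}
    for intent, utterances in samples.items():
        shuffled = list(utterances)
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_percentage)
        if max_tests_intent > 0:  # 0 means no cap on test samples
            n_test = min(n_test, max_tests_intent)
        test_set[intent] = shuffled[:n_test]
        remaining = shuffled[n_test:]
        if max_samples_intent > 0:  # 0 means index all remaining samples
            remaining = remaining[:max_samples_intent]
        index_set[intent] = remaining
    return test_set, index_set


samples = {
    "greet": [f"hello {i}" for i in range(20)],
    "bye": [f"bye {i}" for i in range(5)],
}
test_set, index_set = split_intent_samples(samples, random_seed=42)
```

With the defaults, an intent with 20 samples would set aside 6 for testing by percentage, but the cap of 3 applies, leaving 17 to be indexed.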

nemoguardrails.evaluate.cli.evaluate.app = typer.Typer()