Fiddler Guardrails Integration

View as Markdown

Fiddler Guardrails utilizes Fiddler Trust Models in a specialized low-latency, high-throughput configuration. Guardrails can be used to guard Large Language Model (LLM) applications against user threats, such as prompt injection or harmful and inappropriate content, and LLM hallucinations.

Currently, only Fiddler Trust Models (Faithfulness and Safety) - Fiddler’s in-house, purpose-built SLMs - are available for guardrail use. Future model releases and model updates/improvements will also be available for guardrail use.

Setup

  1. Ensure that you have access to a valid Fiddler environment. To obtain one, please contact us.

  2. Create a new Fiddler environment key and set the FIDDLER_API_KEY environment variable to this key to authenticate into the Fiddler service.

Update your config.yml file to include the following settings:

1rails:
2 config:
3 fiddler:
4 fiddler_endpoint: https://testfiddler.ai # Replace this with your fiddler environment
5 safety_threshold: .2 # Any value greater than this threshold will trigger a violation
6 faithfulness_threshold: .3 # Any value less than this threshold will trigger a violation
7 input:
8 flows:
9 - fiddler user safety
10 output:
11 flows:
12 - fiddler bot safety
13 - fiddler bot faithfulness

Usage

Once configured, the Fiddler Guardrails integration will automatically:

  1. Detect unsafe, offensive, harmful, or jailbreaking inputs into the LLM
  2. Detect unsafe, offensive, or harmful outputs from the LLM
  3. Detect potential hallucinated outputs from the LLM

Customization

You can configure the thresholds for both the safety and hallucination detection using the safety_threshold and the faithfulness_threshold parameters

Error Handling

Fiddler Guardrails will not block inputs in the event of any API failure.

Notes

For more information about Fiddler Guardrails, please visit the Fiddler Guardrails documentation.