Polygraf Integration
Polygraf offers a state-of-the-art PII detection and masking API designed to help identify and protect sensitive information in your text. This integration enables NeMo Guardrails to use Polygraf for PII detection and masking in input, output, and retrieval flows.
Setup
- Obtain a Polygraf API key and set it as the POLYGRAF_API_KEY environment variable so the integration can authenticate cloud requests.
- Pick the endpoint that matches your deployment:
  - For Polygraf cloud, use https://governance.api.polygraf.ai/gcp/pii/text-detect.
  - For self-hosted deployments, set this to your service endpoint (the local default is typically http://localhost:8000/v1/pii/text-detect).
- Update your config.yml file to include the Polygraf settings.
PII detection config
The detection flow blocks the input, output, or retrieval text when Polygraf detects PII that matches one of the configured entity types.
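The blocking decision can be pictured with a minimal Python sketch. It is not the integration's actual code: the helper name should_block and the entity dictionary shape (entity_type, start, end) are assumptions made for illustration, and it assumes that a message is blocked as soon as any detected entity type appears in the configured list.

```python
from typing import Any, Iterable, Mapping, Sequence


def should_block(
    entities: Sequence[Mapping[str, Any]], configured: Iterable[str]
) -> bool:
    """Return True when any detected entity type is in the configured list.

    Hypothetical helper: the real action may apply different matching rules.
    """
    wanted = set(configured)
    return any(ent.get("entity_type") in wanted for ent in entities)


# Assumed detection result for "Hi John, my email is john@example.com".
detected = [{"entity_type": "Email", "start": 21, "end": 37}]

print(should_block(detected, ["Person", "Email"]))  # True
print(should_block(detected, ["Phone"]))  # False
```

With this reading, narrowing the entities list narrows what gets blocked: text containing only an email address passes a rail configured for Phone alone.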
PII masking config
The masking flow replaces detected PII spans with <EntityType> placeholders. For example, "Hi John, my email is john@example.com" becomes "Hi <Person>, my email is <Email>".
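The placeholder substitution can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the integration's implementation: the function name mask_spans and the entity fields (entity_type, start, end) are hypothetical, and the spans are assumed to be valid and non-overlapping.

```python
from typing import Any, Mapping, Sequence


def mask_spans(text: str, entities: Sequence[Mapping[str, Any]]) -> str:
    """Replace each [start, end) span with a <EntityType> placeholder."""
    masked = text
    # Work from the rightmost span to the leftmost so that earlier
    # offsets stay valid after each replacement changes the length.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"<{ent['entity_type']}>"
        masked = masked[: ent["start"]] + placeholder + masked[ent["end"] :]
    return masked


text = "Hi John, my email is john@example.com"
entities = [
    {"entity_type": "Person", "start": 3, "end": 7},
    {"entity_type": "Email", "start": 21, "end": 37},
]
print(mask_spans(text, entities))  # Hi <Person>, my email is <Email>
```

Replacing right to left avoids recomputing offsets, since a placeholder is rarely the same length as the text it replaces.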
Retrieval Flows
To detect or mask PII in retrieved documents, configure the retrieval entities and enable the retrieval flow variant:
Usage
Once configured, the Polygraf integration can automatically:
- Detect or mask PII in user inputs before they are processed by the LLM.
- Detect or mask PII in LLM outputs before they are sent back to the user.
- Detect or mask PII in retrieved chunks before they are sent to the LLM.
The polygraf_detect_pii and polygraf_mask_pii actions in nemoguardrails/library/polygraf/actions.py handle the PII detection and masking processes, respectively.
Entity Types
You can customize the PII handling behavior by modifying the entities lists under input, output, and retrieval. Entity labels should match the labels returned by your Polygraf deployment. Common entities include:
- Person
- Email
- Phone
For a complete list of supported entity types, refer to the Polygraf documentation.
Failure Handling
The integration is fail-closed: a Polygraf failure must not allow potentially-PII text to pass through the rail.
- Provider/network failure (timeout, DNS, TLS, non-200 response, invalid JSON, malformed response shape). The underlying HTTP helper raises ValueError, which the actions catch internally. polygraf detect pii on … returns True (the rail blocks the message). polygraf mask pii on … replaces the entire payload with the <REDACTED> placeholder. The actions log a structural warning (failure category only); request bodies, response bodies, and entity values are never logged.
- Malformed entity span (Polygraf returns an entry without a known entity_type, or with non-integer offsets, or with offsets outside 0 <= start < end <= len(text)). The actions also fail closed: detection blocks the message and masking redacts the whole payload, rather than silently skipping the malformed span and forwarding the rest.
- Default timeout: 30 seconds per call. Slow or unreachable endpoints cannot hang the rail pipeline.
- Missing API key: if POLYGRAF_API_KEY is not set, the integration logs a warning since cloud endpoints typically reject unauthenticated requests, and proceeds to call the endpoint without an Authorization header.
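The fail-closed behavior described above can be sketched as follows. This is a simplified illustration, not the code in actions.py: fetch is a hypothetical stand-in for the HTTP helper (assumed to return a list of entity dictionaries and to raise ValueError on provider failure), the function names are invented, the "known entity_type" check is approximated as "a non-empty string", and filtering by configured entities is omitted.

```python
import logging
from typing import Any, Callable, List, Mapping

logger = logging.getLogger("polygraf_sketch")
REDACTED = "<REDACTED>"

Entities = List[Mapping[str, Any]]
Fetch = Callable[[str], Entities]  # hypothetical stand-in for the HTTP helper


def validate_spans(text: str, entities: Entities) -> None:
    """Raise ValueError if any entity span is malformed."""
    for ent in entities:
        etype, start, end = ent.get("entity_type"), ent.get("start"), ent.get("end")
        if not isinstance(etype, str) or not etype:
            raise ValueError("malformed_span")
        if type(start) is not int or type(end) is not int:
            raise ValueError("malformed_span")
        if not (0 <= start < end <= len(text)):
            raise ValueError("malformed_span")


def detect_fail_closed(text: str, fetch: Fetch) -> bool:
    """Return True (block) on detected PII or on any failure."""
    try:
        entities = fetch(text)
        validate_spans(text, entities)
    except ValueError:
        # Log a fixed message only; never the text or the entities.
        logger.warning("PII detection failed; blocking message")
        return True
    return len(entities) > 0


def mask_fail_closed(text: str, fetch: Fetch) -> str:
    """Return masked text, or the full-payload placeholder on any failure."""
    try:
        entities = fetch(text)
        validate_spans(text, entities)
    except ValueError:
        logger.warning("PII masking failed; redacting payload")
        return REDACTED
    masked = text
    # Replace right to left so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        masked = masked[: ent["start"]] + f"<{ent['entity_type']}>" + masked[ent["end"] :]
    return masked
```

Validating every span before masking any of them is what makes the behavior all-or-nothing: a single bad offset redacts the whole payload instead of leaving part of the text unmasked.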