GLiNER Integration
GLiNER is a generalist and lightweight model for named entity recognition. NVIDIA GLiNER-PII is an adaptation that detects a wide range of PII categories. This integration enables NeMo Guardrails to use GLiNER-PII for PII detection and masking in input, output, and retrieval flows.
Prerequisites
To use the NVIDIA-hosted NIMs, set your NVIDIA API key:
You can obtain an API key at build.nvidia.com.
You will also need to install the NeMo Guardrails library.
Configure Guardrails
Create a config/ directory with one subdirectory per use case. The examples below cover two flows — PII detection and PII masking — both targeting the NVIDIA-hosted GLiNER-PII and Llama 3.1 8B NIM endpoints.
NeMo Guardrails loads every .yml / .yaml file in the directory passed to --config and merges them into a single configuration. Keeping each flow in its own subdirectory prevents the detection and masking rule sets from colliding; the Chat CLI then selects a flow with --config config/pii_detection or --config config/pii_masking.
nvidia/gliner-pii does not appear in the configs below because it is the default value of rails.config.gliner.model. You only need to set that field explicitly if you want to use a different model.
PII Detection
The detection flow blocks any input or output that contains PII. To implement this flow, save the config below as config/pii_detection/config.yml.
PII Masking
The masking flow replaces detected PII with label placeholders before the LLM processes the text, rather than blocking the request outright. For example, Hi, I am John. My email is john@example.com becomes Hi, I am [FIRST_NAME]. My email is [EMAIL]. To implement this flow, save the config below as config/pii_masking/config.yml.
Run the Guardrails Chat CLI
Start an interactive chat session by pointing --config at the subdirectory for the flow you want to test:
With PII detection enabled, any message containing PII is blocked before reaching the LLM:
With PII masking enabled, PII is replaced in-place before the LLM sees the message:
Use the Python SDK
Deploy NIMs Locally
Running both NIMs locally eliminates network round-trips and removes the NVIDIA API key requirement for inference. You still need an NGC Personal API key — generate one at org.ngc.nvidia.com/setup/api-keys with at least the NGC Catalog service selected — to pull the Docker images and download the model artifacts.
GPU Requirements
The Llama NIM auto-selects the optimal TensorRT-LLM profile (FP16 or INT8) based on available hardware. An A10G (24 GB) or L4 (24 GB) is the practical minimum for comfortable headroom; a T4 (16 GB) may work but is not officially supported.
Note:
nvidia/gliner-piiis pre-GA (1.0.0-rc1). The GPU requirements above are estimates based on the GLiNER encoder-only architecture because NVIDIA has not officially published requirements yet.
Start the Containers
Export your NGC Personal API key as NGC_API_KEY:
Important: The key must start with
nvapi-. You can generate one at org.ngc.nvidia.com/setup/api-keys (select at least the NGC Catalog service) or at build.nvidia.com — both portals issue interchangeablenvapi-keys. Legacy NGC keys (older format, not starting withnvapi-) will cause the GLiNER container to fail during model-artifact download.If you already have an
NVIDIA_API_KEYstarting withnvapi-, you can reuse it:Alternatively, you can pass the key directly at container runtime — this avoids overwriting any existing
NGC_API_KEYin your environment:
On a multi-GPU host, pin each container to a distinct GPU with --gpus '"device=N"' instead of --gpus all. Without an explicit device, both NIMs default to GPU 0 and compete for memory. The examples below assign GLiNER to GPU 0 and Llama to GPU 1; adjust the indices to match your host.
The GLiNER-PII NIM runs on port 8000 (GPU 0):
Map the Llama 3.1 8B Instruct NIM to port 8001 (GPU 1) to avoid a conflict with GLiNER:
Wait until both containers log Application startup complete before proceeding.
Update the Configuration
Update both config.yml files (under config/pii_detection/ and config/pii_masking/) with the local-endpoint versions below, removing the api_key_env_var fields:
PII detection:
PII masking:
Reuse the CLI and SDK Workflows
With the containers running and config updated, rerun the CLI and SDK commands from Run the Guardrails Chat CLI and Use the Python SDK. No other changes are required.
API Specification
The GLiNER-PII NIM exposes an OpenAI-compatible chat completions endpoint.
Chat Completions Endpoint
Extract entities from text.
Request Body:
Example Request:
Response Body:
The response follows the OpenAI chat completions format. The choices[0].message.content field contains a JSON string with the detected entities.
Parsed Content Fields:
EntitySpan Object:
Example Parsed Content:
Supported Entity Types
The NVIDIA GLiNER-PII NIM supports these PII categories:
Configuration Options
Testing
The GLiNER integration tests in tests/test_gliner.py use mocked API responses, so they don’t require a running server.
To run them:
For a self-hosted alternative using the nvidia/gliner-PII model directly, the examples/deployment/gliner_server/ directory provides a reference implementation. The server exposes a POST /v1/extract endpoint. If you use it, set server_endpoint to http://localhost:1235/v1/extract. Refer to the deployment README for setup instructions.
For more information on GLiNER, refer to the GLiNER GitHub repository.