Detect Jailbreak Attempts with NVIDIA NemoGuard JailbreakDetect NIM
Learn how to block adversarial prompts and jailbreak attempts using NVIDIA NemoGuard JailbreakDetect NIM.
By following this tutorial, you learn how to configure jailbreak detection using the NeMo Guardrails library. You will secure an application LLM and test block prompt injection and jailbreak attempts automatically.
Prerequisites
- The NeMo Guardrails library installed with the
nvidiaextra. - A personal NVIDIA API key generated on https://build.nvidia.com/.
Configure Guardrails
-
Create a configuration directory:
-
Save the following as
config/config.yml:The Nemoguard Jailbreak Detect model does not use any prompts, so you don’t need to create a
prompts.ymlfile for this model.For more information about the configuration parameters, refer to the Configuration Reference.
Run the Guardrails chat application
-
Set the NVIDIA_API_KEY environment variable. Guardrails uses this to access models hosted on https://build.nvidia.com/.
-
Run the interactive chat application.
-
Enter a malicious jailbreak prompt.
This prompt is a truncated version of the Do Anything Now prompt.
The model recognizes a jailbreak attempt and blocks it from the Application LLM.
-
Enter a safe non-jailbreak prompt.
The model returns the following response.
Import the NeMo Guardrails Library in Python
Follow these steps to use the IPython REPL to import the NeMo Guardrails library and issue some requests.
-
Install the IPython REPL and run it to interpret the Python code below.
-
Load the guardrails configuration you created earlier.
-
Verify guardrails with a malicious jailbreak attempt.
The model returns:
-
Verify guardrails with a safe request.
The model returns:
Deploy the NVIDIA NemoGuard JailbreakDetect NIM locally
This section shows how to run the NVIDIA NemoGuard JailbreakDetect NIM microservice locally while still using the build.nvidia.com hosted main model. The prerequisites for running the microservice are:
- The NeMo Guardrails library installed.
- NVIDIA NGC API key with the necessary permissions.
- Docker installed.
- NVIDIA Container Toolkit installed.
- System requirements specified in the NVIDIA NemoGuard JailbreakDetect NIM Support Matrix.
To run the NVIDIA NemoGuard JailbreakDetect NIM in a Docker container, follow these steps:
-
Update the
config.ymlfile you created earlier to point to a local NIM deployment rather than build.nvidia.com. The following configuration updates thenim_base_urlto point tohttp://localhost:8123, which tells the NeMo Guardrails library to make requests to the local NIM deployment. The Guardrails configuration must match the NIM Docker container configuration for them to communicate. -
Start the NemoGuard JailbreakDetect NIM Docker container. Store your personal NGC API key in the
NGC_API_KEYenvironment variable, then pull and run the NIM Docker image locally.-
Log in to your NVIDIA NGC account.
Export your personal NGC API key to an environment variable.
Log in to the NGC registry by running the following command.
-
Download the container.
-
Create a model cache directory on the host machine.
-
Run the container with the cache directory mounted.
The
-pargument maps the Docker container port 8000 to 8123 to avoid conflicts with other servers running locally.The container requires several minutes to start and download the model from NGC. You can monitor the progress by running the
docker logs nemoguard-jailbreakdetectcommand. -
Confirm the service is ready to respond to inference requests.
This returns the following response.
-
-
Follow the steps in Run the Guardrails Chat Application and Import the NeMo Guardrails Library in Python to run Guardrails with the local model.
Next Steps
- NVIDIA NemoGuard JailbreakDetect NIM documentation
- Configuration Reference for all configuration options