Custom LLM Providers#

Note

The time to complete this tutorial is approximately 15 minutes.

About Custom LLM Providers#

You can use LLM providers other than NVIDIA NIM with the NeMo Guardrails microservice. The provider can be hosted locally or through an external service. The following steps show how to configure additional LLM providers by editing the config.yml file and setting the appropriate environment variables.

The microservice supports multiple LLM engines in a single configuration. It recognizes and integrates the specified LLM providers based on your config.yml file and environment variables.

Only OpenAI-compatible LLM providers are supported.

Understanding Configuration File Changes#

The config.yml file, located at the root of your configuration store, identifies the LLM providers. Define each model with the model_id, engine, model, and base_url fields. Provide optional provider-specific settings in the parameters field if needed.

The following example defines three models and uses two LLM providers, nim and openai.

models:
  - model_id: meta/llama-3.1-8b-instruct
    engine: nim
    model: meta/llama-3.1-8b-instruct
    base_url: https://integrate.api.nvidia.com/v1

  - model_id: davinci-002
    engine: openai
    model: davinci-002
    base_url: https://api.openai.com/v1

  - model_id: gpt-4o
    engine: openai
    model: gpt-4o
    base_url: https://api.openai.com/v1
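
If a provider requires additional settings, pass them in the optional parameters field. The following entry is a minimal sketch: the temperature and max_tokens keys are illustrative assumptions, and the keys that are accepted depend on the provider.

models:
  - model_id: gpt-4o
    engine: openai
    model: gpt-4o
    base_url: https://api.openai.com/v1
    # Illustrative only; supported keys depend on the provider.
    parameters:
      temperature: 0.2
      max_tokens: 256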

Each model is identified by the model_id field, which must be unique for each entry. This allows Guardrails clients to call the same model hosted at multiple URLs, such as a staging and a production cluster.

The following configuration demonstrates this concept. Client requests can refer to either the staging/my-awesome-llm or the prod/my-awesome-llm model to select between the staging and production clusters.

models:
  - model_id: staging/my-awesome-llm
    engine: nim
    model: my-awesome-llm
    base_url: https://staging.models.com/v1

  - model_id: prod/my-awesome-llm
    engine: nim
    model: my-awesome-llm
    base_url: https://production.models.com/v1
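
As a usage sketch, a client selects the deployment by passing the corresponding model_id in the request body. This assumes the microservice is already running and reachable on ${GUARDRAILS_PORT}, as shown in the example later on this page; switch the model value to prod/my-awesome-llm to route the request to the production cluster.

curl -X 'POST' \
  "http://localhost:${GUARDRAILS_PORT}/v1/guardrail/completions" \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "staging/my-awesome-llm",
  "prompt": "what can you do for me?",
  "max_tokens": 16
}'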

Using Environment Variables#

Guardrails simplifies authentication by using the engine field of the model configuration to look up API keys stored in environment variables as follows.

  • When the engine value is set to nim, the NeMo Guardrails microservice gets the API key from the environment variable $NVIDIA_API_KEY.

  • When the engine value is set to openai, the NeMo Guardrails microservice gets the API key from the environment variable $OPENAI_API_KEY.

  • For other engine values, the NeMo Guardrails microservice gets the API key from the X-Model-Authorization header field in the client request, as shown in the sketch after this list. For more information, refer to Custom HTTP Headers.
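
The following request is a minimal sketch of that last case. The custom/my-model identifier and its provider are hypothetical placeholders for a model configured with an engine other than nim or openai; the provider API key is passed per request in the X-Model-Authorization header.

# "custom/my-model" is a hypothetical model_id served by a non-nim, non-openai engine.
curl -X 'POST' \
  "http://localhost:${GUARDRAILS_PORT}/v1/guardrail/completions" \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -H "X-Model-Authorization: <your provider API key>" \
  -d '{
  "model": "custom/my-model",
  "prompt": "what can you do for me?",
  "max_tokens": 16
}'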

  1. Set the environment variable values for the container.

    Identify the engine names in the config.yml file, such as nim or openai. Create environment variables as needed for each engine.

export OPENAI_API_KEY="<OpenAI API key>"
export NVIDIA_API_KEY="<NVIDIA API key>"

Example#

  1. Create a config.yml file with contents like the following:

    models:
      - model_id: davinci-002
        engine: openai
        model: davinci-002
        base_url: https://api.openai.com/v1
    
  2. Start the microservice container, specifying -e arguments for environment variables:

    docker run -d -p $GUARDRAILS_PORT:$GUARDRAILS_PORT \
      --name guardrails-ms \
      --platform linux/amd64 \
      -e NIM_ENDPOINT_API_KEY=$NVIDIA_API_KEY \
      -e OPENAI_API_KEY=$OPENAI_API_KEY \
      -e CONFIG_STORE_PATH=$CONFIG_STORE_PATH \
      -e DEFAULT_CONFIG_ID=$DEFAULT_CONFIG_ID \
      -v ./config-store:$CONFIG_STORE_PATH \
      nvcr.io/nvidia/nemo-microservices/guardrails:25.06
    
    • Replace $OPENAI_API_KEY, $CONFIG_STORE_PATH, and the other variables with your actual values, or export them as environment variables before you run the command.

    • If an LLM provider does not require an API key or specific environment variables, omit the corresponding -e argument.

  3. Run inference:

    curl -X 'POST' \
      "http://localhost:${GUARDRAILS_PORT}/v1/guardrail/completions" \
      -H 'Accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{
      "model": "davinci-002",
      "prompt": "what can you do for me?",
      "max_tokens": 16,
      "stream": false,
      "temperature": 1,
      "top_p": 1,
      "frequency_penalty": 0,
      "presence_penalty": 0
    }'
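
If the provider serves a chat model, such as gpt-4o in the earlier configuration, the request body uses messages instead of prompt. The following sketch assumes that gpt-4o is also defined in your config.yml and that the chat endpoint path is /v1/guardrail/chat/completions; confirm the exact path in the API reference for your release.

curl -X 'POST' \
  "http://localhost:${GUARDRAILS_PORT}/v1/guardrail/chat/completions" \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "gpt-4o",
  "messages": [
    {"role": "user", "content": "what can you do for me?"}
  ],
  "max_tokens": 16,
  "stream": false
}'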