Custom LLM Providers#

Note

The time to complete this tutorial is approximately 15 minutes.

About Custom LLM Providers#

You can use LLM providers other than NVIDIA NIM with the NeMo Guardrails microservice. The providers can be hosted locally or accessed through external services. The following steps show how to configure additional LLM providers by editing the config.yml file and setting the required environment variables.

The microservice supports multiple LLM engines in a single configuration. It recognizes and integrates the specified LLM providers based on your config.yml file and environment variables.

Only OpenAI-compatible LLM providers are supported.

Understanding Configuration File Changes#

The config.yml file, located at the root of your configuration store, identifies the LLM providers to use. Define each model with an engine, a model name, and any required parameters.

The following example defines three models and uses two LLM providers, nim and openai.

models:
  - engine: nim
    model: meta/llama-3.1-8b-instruct
    parameters:
      base_url: https://integrate.api.nvidia.com/v1

  - engine: openai
    model: davinci-002
    parameters:
      api_key: $OPENAI_API_KEY
      base_url: https://api.openai.com/v1

  - engine: openai
    model: gpt-4o
    parameters:
      api_key: $OPENAI_API_KEY
      base_url: https://api.openai.com/v1

Each model must be associated with exactly one LLM provider through the engine field. This one-to-one relationship keeps the configuration unambiguous and prevents conflicts.

The following configuration is invalid because my-awesome-llm is associated with two LLM providers:

models:
  - engine: abc
    model: my-awesome-llm
    parameters:
      api_key: $OPENAI_API_KEY
      base_url: https://my-awesome-llm.com/v1

  - engine: xyz
    model: my-awesome-llm   # Conflict: Same model with a different provider
    parameters:
      base_url: https://my-awesome-llm.com/v1

Using Environment Variables#

For LLM providers that require authentication or other settings through environment variables, configuring the variables is a two-step process:

  1. Identify the variable names in the config.yml file, such as OPENAI_API_KEY.

    In the config.yml file, you reference the environment variable names within the parameters field, along with other parameters such as base_url.

  2. Set the environment variable values for the container.

    Ensure that the variables are set in your deployment environment. When running with Docker, specify them using the -e argument.
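
    For example, in a Bash shell you can export the values before you start the container. The keys shown here are placeholders, not real credentials:

    export OPENAI_API_KEY="<your-openai-api-key>"   # placeholder; referenced by the openai engine in config.yml
    export NVIDIA_API_KEY="<your-nvidia-api-key>"   # placeholder; passed as NIM_ENDPOINT_API_KEY when starting the container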

Example#

  1. Create a config.yml file with contents like the following:

    models:
      - engine: openai
        model: davinci-002
        parameters:
          api_key: $OPENAI_API_KEY
          base_url: https://api.openai.com/v1
    
  2. Start the microservice container, specifying -e arguments for environment variables:

    
    docker run -d -p $GUARDRAILS_PORT:$GUARDRAILS_PORT \
      --name guardrails-ms \
      --platform linux/amd64 \
      -e NIM_ENDPOINT_API_KEY=$NVIDIA_API_KEY \
      -e OPENAI_API_KEY=$OPENAI_API_KEY \
      -e CONFIG_STORE_PATH=$CONFIG_STORE_PATH \
      -e DEFAULT_CONFIG_ID=$DEFAULT_CONFIG_ID \
      -v ./config-store:$CONFIG_STORE_PATH \
      nvcr.io/nvidia/nemo-microservices/guardrails:25.04
    
    • Replace $OPENAI_API_KEY, $CONFIG_STORE_PATH, and the other variables with your actual values, or export them as environment variables before you run the command.

    • If an LLM provider does not require an API key or specific environment variables, omit the corresponding -e argument and do not include the API key in config.yml.
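
      For instance, a minimal entry for a locally hosted, OpenAI-compatible provider that needs no key might look like the following sketch. The model name and base_url are illustrative assumptions, not values from this tutorial:

      models:
        - engine: openai
          model: my-local-llm                    # hypothetical model name
          parameters:
            base_url: http://localhost:8000/v1   # hypothetical local endpoint; no api_key entry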

  3. Run inference:

    curl -X 'POST' \
      "http://localhost:${GUARDRAILS_PORT}/v1/guardrail/completions" \
      -H 'Accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{
      "model": "davinci-002",
      "prompt": "what can you do for me?",
      "max_tokens": 16,
      "stream": false,
      "temperature": 1,
      "top_p": 1,
      "frequency_penalty": 0,
      "presence_penalty": 0
    }'
    

    As an alternative to specifying the OpenAI API key as an environment variable, you can specify the X-Model-Authorization header. For more information, refer to Custom HTTP Headers.
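
    For illustration, the following request repeats the earlier completion call but sends the key in the X-Model-Authorization header instead of relying on the OPENAI_API_KEY environment variable. The header value format shown here is an assumption; confirm the details in Custom HTTP Headers:

    curl -X 'POST' \
      "http://localhost:${GUARDRAILS_PORT}/v1/guardrail/completions" \
      -H 'Accept: application/json' \
      -H 'Content-Type: application/json' \
      -H "X-Model-Authorization: ${OPENAI_API_KEY}" \
      -d '{
      "model": "davinci-002",
      "prompt": "what can you do for me?",
      "max_tokens": 16,
      "stream": false
    }'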