Working with Multiple NIM for LLMs

Note

The time to complete this tutorial is approximately 15 minutes.

You can use a single NeMo Guardrails microservice deployment to provide guardrails for multiple NIM microservice instances. To define the available models, add a config.yml file at the root of your configuration store.

/config-store
├── config.yml
├── config-1
│   └── ...
├── config-2
│   └── ...
└── config-3
    └── ...
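The layout above can be created locally with a few shell commands. The directory names below are the placeholder names from the tree, not required names:

```shell
# Create a local configuration store with three guardrail configurations.
# The directory names mirror the placeholder layout above.
mkdir -p config-store/config-1 config-store/config-2 config-store/config-3

# The config.yml at the root defines the models shared by all configurations.
touch config-store/config.yml
```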

For example, the following snippet defines three models, meta/llama-3.1-70b-instruct, meta/llama3-8b-instruct, and mistralai/mixtral-8x22b-instruct-v0.1, by providing a different base URL for each of them, corresponding to three NIM instances.

models:
- engine: nim
  model: meta/llama-3.1-70b-instruct
  parameters:
    base_url: http://0.0.0.0:9999/v1

- engine: nim
  model: meta/llama3-8b-instruct
  parameters:
    base_url: http://0.0.0.0:7799/v1

- engine: nim
  model: mistralai/mixtral-8x22b-instruct-v0.1
  parameters:
    base_url: http://0.0.0.0:5599/v1

Whenever a request is made to any of these models, the corresponding NIM instance is used. Requests for any other model are handled by the default LLM provider.
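To illustrate the routing rule, the following sketch resolves a requested model name to its configured base URL, falling back to a default provider when the model is not listed. The MODELS list mirrors the config.yml snippet above; the resolve_base_url helper and the default URL are illustrative assumptions, not part of the microservice API:

```python
# Minimal sketch of the model-to-NIM routing described above.
# The MODELS list mirrors the config.yml snippet; DEFAULT_BASE_URL is a
# hypothetical stand-in for the default LLM provider.
MODELS = [
    {"engine": "nim", "model": "meta/llama-3.1-70b-instruct",
     "parameters": {"base_url": "http://0.0.0.0:9999/v1"}},
    {"engine": "nim", "model": "meta/llama3-8b-instruct",
     "parameters": {"base_url": "http://0.0.0.0:7799/v1"}},
    {"engine": "nim", "model": "mistralai/mixtral-8x22b-instruct-v0.1",
     "parameters": {"base_url": "http://0.0.0.0:5599/v1"}},
]
DEFAULT_BASE_URL = "http://0.0.0.0:8000/v1"  # hypothetical default provider

def resolve_base_url(model_name: str) -> str:
    """Return the NIM base URL for a listed model, else the default provider."""
    for entry in MODELS:
        if entry["model"] == model_name:
            return entry["parameters"]["base_url"]
    return DEFAULT_BASE_URL

# A request for a listed model routes to its NIM instance:
print(resolve_base_url("meta/llama3-8b-instruct"))  # http://0.0.0.0:7799/v1
# Any other model falls back to the default provider:
print(resolve_base_url("some-other/model"))         # http://0.0.0.0:8000/v1
```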

When a model is defined in the config.yml file at the root of the configuration store, you can use it in a guardrail configuration by referencing its name. For example, you can specify the meta/llama-3.1-70b-instruct model for the self-check input rail.

    models:
      - type: self_check_input_llm
        model: meta/llama-3.1-70b-instruct

      ...
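For context, a guardrail configuration that activates the self-check input rail with this model might look like the following sketch. The rails section follows open-source NeMo Guardrails conventions and is an assumption here, not copied from the microservice reference:

```yaml
# Sketch of a guardrail configuration referencing a model defined in the
# root config.yml by name. The rails section is an assumed convention.
models:
  - type: self_check_input_llm
    model: meta/llama-3.1-70b-instruct

rails:
  input:
    flows:
      - self check input
```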