# Working with Multiple NIM for LLMs

> **Note:** The time to complete this tutorial is approximately 15 minutes.
You can use a single NeMo Guardrails microservice deployment to provide guardrails for multiple NIM microservices instances.
You add a `config.yml` file at the root of your configuration store, which defines the available models.
```
/config-store
├── config.yml
├── config-1
│   └── ...
├── config-2
│   └── ...
└── config-3
    └── ...
```
For example, the following snippet defines three models, `meta/llama-3.1-70b-instruct`, `meta/llama3-8b-instruct`, and `mistralai/mixtral-8x22b-instruct-v0.1`, by providing a different base URL for each of them, corresponding to three NIM instances.
```yaml
models:
  - engine: nim
    model: meta/llama-3.1-70b-instruct
    parameters:
      base_url: http://0.0.0.0:9999/v1
  - engine: nim
    model: meta/llama3-8b-instruct
    parameters:
      base_url: http://0.0.0.0:7799/v1
  - engine: nim
    model: mistralai/mixtral-8x22b-instruct-v0.1
    parameters:
      base_url: http://0.0.0.0:5599/v1
```
Whenever a request is made to one of these models, the corresponding NIM instance handles it. A request to any other model is handled by the default LLM provider.
When a model is defined in the `config.yml` file at the root of the configuration store, you can use the model in a guardrail configuration by providing the name.
For example, you can specify the `meta/llama-3.1-70b-instruct` model for the self-check input rail.
```yaml
models:
  - type: self_check_input_llm
    model: meta/llama-3.1-70b-instruct
  ...
```
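The name-based reference above resolves against the root `config.yml`. As a sketch of that lookup, the snippet below matches a rail's `model` field against the root-level model definitions; `resolve` and the dictionary shape are illustrative assumptions, not the microservice's internal code.

```python
# Root config.yml models from the earlier example, reduced to a
# name -> base_url mapping for illustration.
ROOT_MODELS = {
    "meta/llama-3.1-70b-instruct": "http://0.0.0.0:9999/v1",
    "meta/llama3-8b-instruct": "http://0.0.0.0:7799/v1",
    "mistralai/mixtral-8x22b-instruct-v0.1": "http://0.0.0.0:5599/v1",
}

# A rail entry referencing a root-level model by name, as in the
# guardrail configuration snippet above.
rail_model = {"type": "self_check_input_llm",
              "model": "meta/llama-3.1-70b-instruct"}

def resolve(rail: dict) -> str:
    """Resolve a rail's model reference to the NIM base URL defined at
    the root of the configuration store; fail loudly on unknown names."""
    name = rail["model"]
    if name not in ROOT_MODELS:
        raise ValueError(f"model {name!r} is not defined in the root config.yml")
    return ROOT_MODELS[name]

print(resolve(rail_model))  # http://0.0.0.0:9999/v1
```

Referencing models by name this way keeps each guardrail configuration free of deployment details: the base URLs live in one place, the root `config.yml`.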