Using External Endpoints#
You might want to use external endpoints instead of deploying the corresponding resources locally. Follow the steps below and update the Docker Compose configuration before deployment.
Remote LLM Endpoint#
The default Docker Compose deployment launches the Llama 3.1 70B NIM as the LLM, but you might want to use a different LLM depending on your specific needs. You can switch to a different LLM by adjusting the configuration.
To change LLMs, open the ``config.yaml`` file and update the ``model`` and ``base_url`` fields as needed.
By default, it looks like the following:
```yaml
tools:
  graph_db:
    type: neo4j
    params:
      host: !ENV ${GRAPH_DB_HOST}
      port: !ENV ${GRAPH_DB_BOLT_PORT}
      username: !ENV ${GRAPH_DB_USERNAME}
      password: !ENV ${GRAPH_DB_PASSWORD}
    tools:
      embedding: nvidia_embedding
  vector_db:
    type: milvus
    params:
      host: !ENV ${MILVUS_DB_HOST}
      port: !ENV ${MILVUS_DB_GRPC_PORT}
    tools:
      embedding: nvidia_embedding
  chat_llm:
    type: llm
    params:
      model: "nvdev/meta/llama-3.1-70b-instruct"
      base_url: "https://integrate.api.nvidia.com/v1"
      max_tokens: 2048
      temperature: 0.2
      top_p: 0.7
      api_key: !ENV ${NVIDIA_API_KEY}
  summarization_llm:
    type: llm
    params:
      model: "nvdev/meta/llama-3.1-70b-instruct"
      base_url: "https://integrate.api.nvidia.com/v1"
      max_tokens: 2048
      temperature: 0.2
      top_p: 0.7
      api_key: !ENV ${NVIDIA_API_KEY}
  notification_llm:
    type: llm
    params:
      model: "nvdev/meta/llama-3.1-70b-instruct"
      base_url: "https://integrate.api.nvidia.com/v1"
      max_tokens: 2048
      temperature: 0.2
      top_p: 0.7
      api_key: !ENV ${NVIDIA_API_KEY}
  nvidia_embedding:
    type: embedding
    params:
      model: "nvdev/nvidia/llama-3.2-nv-embedqa-1b-v2"
      base_url: "https://integrate.api.nvidia.com/v1"
      api_key: !ENV ${NVIDIA_API_KEY}
  nvidia_reranker:
    type: reranker
    params:
      model: "nvidia/llama-3.2-nv-rerankqa-1b-v2"
      base_url: "https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking"
      api_key: !ENV ${NVIDIA_API_KEY}
  notification_tool:
    type: alert_sse_notifier
    params:
      endpoint: "http://127.0.0.1:60000/via-alert-callback"
functions:
  summarization:
    type: batch_summarization
    params:
      batch_size: 6  # Use even batch size if speech recognition enabled.
      batch_max_concurrency: 20
      prompts:
        caption: "Write a concise and clear dense caption for the provided warehouse video, focusing on irregular or hazardous events such as boxes falling, workers not wearing PPE, workers falling, workers taking photographs, workers chitchatting, forklift stuck, etc. Start and end each sentence with a time stamp."
        caption_summarization: "You should summarize the following events of a warehouse in the format start_time:end_time:caption. For start_time and end_time use . to separate seconds, minutes, hours. If during a time segment only regular activities happen, then ignore them, else note any irregular activities in detail. The output should be bullet points in the format start_time:end_time: detailed_event_description. Don't return anything else except the bullet points."
        summary_aggregation: "You are a warehouse monitoring system. Given the caption in the form start_time:end_time: caption, Aggregate the following captions in the format start_time:end_time:event_description. If the event_description is the same as another event_description, aggregate the captions in the format start_time1:end_time1,...,start_timek:end_timek:event_description. If any two adjacent end_time1 and start_time2 is within a few tenths of a second, merge the captions in the format start_time1:end_time2. The output should only contain bullet points. Cluster the output into Unsafe Behavior, Operational Inefficiencies, Potential Equipment Damage and Unauthorized Personnel"
    tools:
      llm: summarization_llm
      db: graph_db
  ingestion_function:
    type: graph_ingestion
    params:
      batch_size: 1
      image: false
      cot: false
      top_k: 5
    tools:
      llm: chat_llm
      db: graph_db
  retriever_function:
    type: graph_retrieval
    params:
      batch_size: 1
      image: false
      cot: false
      top_k: 5
    tools:
      llm: chat_llm
      db: graph_db
  notification:
    type: notification
    params:
      events: []
    tools:
      llm: chat_llm
      notification_tool: notification_tool
context_manager:
  functions:
    - summarization
    - ingestion_function
    - retriever_function
    - notification
```
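Values tagged ``!ENV`` are substituted from environment variables (typically set in the ``.env`` file) when the configuration is loaded. A minimal illustration (the key value is a placeholder):

```yaml
# With NVIDIA_API_KEY=nvapi-*** set in the environment (e.g., via the .env file),
# the following resolves to that value at load time:
api_key: !ENV ${NVIDIA_API_KEY}
```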
Change the ``model`` and ``base_url`` to the new LLM in the respective LLM tool section of the ``config.yaml`` file. Examples:
Using the GPT-4o model for ``chat_llm``:

```yaml
tools:
  chat_llm:
    params:
      model: "gpt-4o"
      base_url: "https://api.openai.com/v1"
      api_key: !ENV ${OPENAI_API_KEY}
```
Similarly, change the ``engine``, ``model``, and ``base_url`` to the new LLM in the ``guardrails/config.yml`` file:

```yaml
models:
  - type: main
    engine: openai
    model: gpt-4o
    parameters:
      base_url: https://api.openai.com/v1
```
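Because this example reads the key through ``!ENV ${OPENAI_API_KEY}``, the variable must be available to the containers, for example via the ``.env`` file (the key below is a placeholder):

```bash
OPENAI_API_KEY=sk-***
```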
Using the ``deepseek-r1`` model for ``chat_llm``:

```yaml
tools:
  chat_llm:
    params:
      model: "deepseek-ai/deepseek-r1"
      base_url: "https://integrate.api.nvidia.com/v1"
      api_key: !ENV ${NVIDIA_API_KEY}
```
Similarly, change the ``engine``, ``model``, and ``base_url`` to the new LLM in the ``guardrails/config.yml`` file:

```yaml
models:
  - type: main
    engine: nim
    model: deepseek-ai/deepseek-r1
    parameters:
      base_url: https://integrate.api.nvidia.com/v1
```
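The same pattern works for any OpenAI-compatible endpoint, including an LLM NIM you host yourself. A sketch, assuming a self-hosted endpoint at ``your-llm-host:8000`` (a placeholder, not part of the blueprint):

```yaml
tools:
  chat_llm:
    params:
      model: "meta/llama-3.1-70b-instruct"      # model name as exposed by your endpoint
      base_url: "http://your-llm-host:8000/v1"  # placeholder OpenAI-compatible endpoint
      api_key: !ENV ${NVIDIA_API_KEY}           # or whatever key your endpoint expects
```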
Note: When using endpoints from build.nvidia.com, you must set the ``NVIDIA_API_KEY`` environment variable in the ``.env`` file. Refer to Using NIMs from build.nvidia.com for obtaining the API key.
Remote Embedding and Reranker Endpoint#
Remote embedding and reranker endpoints can be used by updating the ``config.yaml`` file:
```yaml
tools:
  nvidia_embedding:
    type: embedding
    params:
      model: "nvidia/llama-3.2-nv-embedqa-1b-v2"
      base_url: "https://integrate.api.nvidia.com/v1"
      api_key: !ENV ${NVIDIA_API_KEY}
  nvidia_reranker:
    type: reranker
    params:
      model: "nvidia/llama-3.2-nv-rerankqa-1b-v2"
      base_url: "https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking"
      api_key: !ENV ${NVIDIA_API_KEY}
```
Similarly, change the ``engine``, ``model``, and ``base_url`` to the remote embedding endpoint in the ``guardrails/config.yml`` file:
```yaml
models:
  - type: embeddings
    engine: nim
    model: nvidia/llama-3.2-nv-embedqa-1b-v2
    parameters:
      base_url: "https://integrate.api.nvidia.com/v1"
```
Remote RIVA ASR Endpoint#
To use a remote RIVA ASR endpoint, set the following environment variables in the ``.env`` file:
```bash
export RIVA_ASR_SERVER_URI="grpc.nvcf.nvidia.com"
export RIVA_ASR_GRPC_PORT=443
export RIVA_ASR_SERVER_IS_NIM=true
export RIVA_ASR_SERVER_USE_SSL=true
export RIVA_ASR_SERVER_API_KEY=nvapi-***
export RIVA_ASR_SERVER_FUNC_ID="d8dd4e9b-fbf5-4fb0-9dba-8cf436c8d965"
```
Set the ``RIVA_ASR_SERVER_API_KEY`` environment variable in the ``.env`` file as shown in Using Riva ASR NIM from build.nvidia.com.
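The same variables can also point to a Riva ASR server you host yourself. A sketch, assuming a local non-NIM server without SSL at ``your-riva-host:50051`` (placeholder values, not part of the blueprint):

```bash
export RIVA_ASR_SERVER_URI="your-riva-host"  # placeholder: your Riva ASR server host
export RIVA_ASR_GRPC_PORT=50051              # assumed gRPC port of the local server
export RIVA_ASR_SERVER_IS_NIM=false          # not a build.nvidia.com NIM endpoint
export RIVA_ASR_SERVER_USE_SSL=false         # plain gRPC for a local server
```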
For more details about the environment variables, refer to the VSS Deployment-Time Configuration Glossary section.