Docker Compose Pattern Examples
- This page provides complete, tested compose file examples for common multi-container patterns in AI Workbench.
For compose file specifications, see Multi-Container Environments (Docker Compose).
For step-by-step procedures, see Use Multi-Container Environments.
For conceptual background, see Docker Compose Environments.
Pattern 1: NIM Model Selection with Profiles
Use Case
- Run one of several NIM models based on available GPU resources.
Users select a model size that fits their hardware. Multiple models share the same port and interface. Profiles enable easy switching without maintaining separate compose files.
- This pattern works well for:
Development and testing with different model sizes
Demos where users have varying hardware capabilities
Projects where model selection happens at deployment time
Key Features
Multiple services with identical interfaces
Profile-based service selection
Variable GPU requirements per model
Shared network configuration
Model cache volume management
Example Configuration
Compose file with three model variants:
services:
  llama-3.1-8b-instruct:
    image: nvcr.io/nim/meta/llama-3.1-8b-instruct:latest
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    ports:
      - "8000:8000"
    volumes:
      - type: bind
        source: /tmp
        target: /opt/nim/.cache/
    environment:
      - NGC_API_KEY=${NVIDIA_API_KEY:?Error NVIDIA_API_KEY not set}
    networks:
      - app-network
    profiles:
      - meta/llama-3.1-8b-instruct

  llama-3.1-70b-instruct:
    image: nvcr.io/nim/meta/llama-3.1-70b-instruct:latest
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 2
              capabilities: [gpu]
    ports:
      - "8000:8000"
    volumes:
      - type: bind
        source: /tmp
        target: /opt/nim/.cache/
    environment:
      - NGC_API_KEY=${NVIDIA_API_KEY:?Error NVIDIA_API_KEY not set}
    networks:
      - app-network
    profiles:
      - meta/llama-3.1-70b-instruct

  llama-3.1-405b-instruct:
    image: nvcr.io/nim/meta/llama-3.1-405b-instruct:latest
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 8
              capabilities: [gpu]
    ports:
      - "8000:8000"
    volumes:
      - type: bind
        source: /tmp
        target: /opt/nim/.cache/
    environment:
      - NGC_API_KEY=${NVIDIA_API_KEY:?Error NVIDIA_API_KEY not set}
    networks:
      - app-network
    profiles:
      - meta/llama-3.1-405b-instruct

networks:
  app-network:
    driver: bridge
Configuration Notes
- GPU count varies by model size:
8B model: 1 GPU
70B model: 2 GPUs
405B model: 8 GPUs
- All models use the same port (8000):
Only one model can run at a time because every service publishes host port 8000. This is intentional for easy model switching.
- Model cache mounted to /tmp:
Change source: /tmp to a dedicated directory for persistent caching. Ensure the directory has write permissions.
- NGC_API_KEY is required:
Set the NVIDIA_API_KEY secret in AI Workbench. The compose file passes it to the container as NGC_API_KEY and stops with an error if the variable is not set.
- Profiles match model names:
Select the profile matching your desired model in the AI Workbench UI. Only the selected model’s service will start.
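The three services above differ only in image, GPU count, and profile. One possible way to reduce the repetition is a Compose extension field combined with a YAML anchor. This is a sketch, not part of the tested example: the x-nim-common name and the NIM_CACHE_DIR variable are assumptions introduced for illustration, and only the 8B service is shown.

```yaml
# Shared settings defined once; Compose ignores top-level keys that start with "x-".
x-nim-common: &nim-common
  runtime: nvidia
  ports:
    - "8000:8000"
  volumes:
    - type: bind
      # NIM_CACHE_DIR is a hypothetical variable; it falls back to /tmp as in the example above.
      source: ${NIM_CACHE_DIR:-/tmp}
      target: /opt/nim/.cache/
  environment:
    - NGC_API_KEY=${NVIDIA_API_KEY:?Error NVIDIA_API_KEY not set}
  networks:
    - app-network

services:
  llama-3.1-8b-instruct:
    # Merge the shared settings, then add what is specific to this model.
    <<: *nim-common
    image: nvcr.io/nim/meta/llama-3.1-8b-instruct:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    profiles:
      - meta/llama-3.1-8b-instruct

networks:
  app-network:
    driver: bridge
```

The 70B and 405B services would follow the same shape, changing only the image, the GPU count, and the profile name.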
Pattern 2: Full RAG Pipeline with Multiple Services
Use Case
- Run a complete RAG system with ingestion, retrieval, generation, and frontend services.
Each component runs in its own container with specific GPU assignments. Multiple profiles enable running different subsets of services. Services communicate over a shared network with healthcheck dependencies.
- This pattern works well for:
Production RAG applications
End-to-end AI pipelines
Applications requiring multiple specialized models
Systems with document processing, vector search, and generation
Key Features
Multiple GPU-accelerated services on different GPUs
Profile-based deployment modes (local, ingest, rag, vectordb, guardrails)
Service dependencies with healthchecks
Persistent storage with volumes
Web service integration with NVWB_TRIM_PREFIX
Example Configuration
Compose file excerpt showing key services:
services:
  # LLM for response generation
  nim-llm:
    container_name: nim-llm-ms
    image: nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5:1.13.1
    volumes:
      - ${MODEL_DIRECTORY:-/tmp}:/opt/nim/.cache
    ports:
      - "8999:8000"
    environment:
      NGC_API_KEY: ${NGC_API_KEY}
    shm_size: 20gb
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['${LLM_MS_GPU_ID:-1}']
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "python3", "-c", "import requests; requests.get('http://localhost:8000/v1/health/ready')"]
      interval: 10s
      timeout: 20s
      retries: 100
    profiles: ["local"]

  # Embedding model
  nemoretriever-embedding-ms:
    container_name: nemoretriever-embedding-ms
    image: nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2:1.10.0
    volumes:
      - ${MODEL_DIRECTORY:-/tmp}:/opt/nim/.cache
    ports:
      - "9080:8000"
    environment:
      NGC_API_KEY: ${NGC_API_KEY}
    shm_size: 16GB
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['${EMBEDDING_MS_GPU_ID:-0}']
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/v1/health/ready"]
      interval: 30s
      timeout: 20s
      retries: 3
      start_period: 10m
    profiles: ["local"]

  # RAG orchestrator server
  rag-server:
    container_name: rag-server
    image: nvcr.io/nvidia/blueprint/rag-server:2.3.0
    command: --port 8081 --host 0.0.0.0 --workers 8
    environment:
      APP_VECTORSTORE_URL: "http://milvus:19530"
      APP_LLM_SERVERURL: "nim-llm:8000"
      APP_EMBEDDINGS_SERVERURL: "nemoretriever-embedding-ms:8000"
      NVIDIA_API_KEY: ${NGC_API_KEY}
    ports:
      - "8081:8081"
    shm_size: 5gb
    profiles: ["rag"]

  # Frontend UI
  rag-frontend:
    container_name: rag-frontend
    image: nvcr.io/nvidia/blueprint/rag-frontend:2.3.0
    ports:
      - "8090:3000"
    depends_on:
      - rag-server
    environment:
      VITE_API_CHAT_URL: "http://rag-server:8081/v1"
      NVWB_TRIM_PREFIX: "true"
    profiles: ["rag"]

  # Vector database (GPU-accelerated)
  milvus:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.5.3-gpu
    command: ["milvus", "run", "standalone"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9010
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-./volumes/milvus}:/var/lib/milvus
    ports:
      - "19530:19530"
    depends_on:
      - etcd
      - minio
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['${VECTORSTORE_GPU_DEVICE_ID:-0}']
              capabilities: [gpu]
    profiles: ["vectordb"]

  # Supporting services
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.19
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-./volumes/etcd}:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    profiles: ["vectordb"]

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2025-02-28T09-55-16Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9011:9011"
      - "9010:9010"
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-./volumes/minio}:/minio_data
    command: minio server /minio_data --console-address ":9011" --address ":9010"
    profiles: ["vectordb"]

  redis:
    image: redis/redis-stack
    ports:
      - "6379:6379"
    profiles: ["ingest"]

volumes:
  nim_cache:
    external: true

networks:
  default:
    name: nvidia-rag
Configuration Notes
- Services are organized by profile:
local: GPU-accelerated inference services (NIMs)
rag: RAG orchestration and frontend
vectordb: Vector database and dependencies
ingest: Document ingestion pipeline
guardrails: Optional content safety services (not shown in this excerpt)
- GPU assignment uses device_ids:
device_ids: ['${LLM_MS_GPU_ID:-1}'] assigns a specific GPU. Environment variables (LLM_MS_GPU_ID, EMBEDDING_MS_GPU_ID, VECTORSTORE_GPU_DEVICE_ID) control which GPU each service uses. The value after :- is the default used when a variable is not set.
- Services communicate by name:
APP_LLM_SERVERURL: "nim-llm:8000" connects to the nim-llm service. All services share the nvidia-rag network.
- Healthchecks report when a service is ready:
The NIM services define healthchecks that query the /v1/health/ready endpoint with Python or curl. A dependent service waits for a healthy dependency only when depends_on uses the long form with condition: service_healthy. The short list form used in this excerpt controls start order only.
- NVWB_TRIM_PREFIX enables proxy for frontend:
The rag-frontend service is accessible through AI Workbench’s proxy. Backend services do not need this variable.
- Volumes provide persistent storage:
Model caches mounted at /opt/nim/.cache persist between container restarts. Set MODEL_DIRECTORY to store them somewhere other than the default /tmp. By default, vector database, etcd, and MinIO data are stored under ./volumes/.
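Compose waits for a dependency's healthcheck only when depends_on is written in the long form with condition: service_healthy. The fragment below is a minimal sketch of how rag-server could wait for the two NIM services, which already define healthchecks in the excerpt. It assumes the local and rag profiles are enabled together, because a dependency in an inactive profile cannot be started.

```yaml
services:
  rag-server:
    depends_on:
      # Long-form dependency: start rag-server only after the healthcheck
      # on each NIM service reports healthy, not merely after it starts.
      nim-llm:
        condition: service_healthy
      nemoretriever-embedding-ms:
        condition: service_healthy
```

With retries: 100 and interval: 10s on nim-llm, this can delay rag-server for several minutes while the model loads, which is usually the intended behavior.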
Pattern 3: Custom Microservices with Build Contexts
Use Case
- Build and run custom application services alongside supporting infrastructure.
Your own code runs in containers built from Dockerfiles. Services communicate through a shared network and message queues. Supporting services provide databases, caching, and observability.
- This pattern works well for:
Custom AI applications with multiple components
Microservices architectures
Applications requiring specialized build steps
Integration with external APIs and services
Key Features
Custom Dockerfiles with build contexts
Service-to-service communication via networks
Message queues and task systems (Celery, Redis)
Observability with tracing (Jaeger)
Persistent volumes for shared data
Example Configuration
Compose file excerpt with custom services (the pdf-api and tts-service services referenced in depends_on are not shown):
services:
  # Custom API service
  api-service:
    build:
      context: .
      dockerfile: services/APIService/Dockerfile
    ports:
      - "8002:8002"
    environment:
      - PDF_SERVICE_URL=http://pdf-service:8003
      - AGENT_SERVICE_URL=http://agent-service:8964
      - TTS_SERVICE_URL=http://tts-service:8889
      - REDIS_URL=redis://redis:6379
    depends_on:
      - redis
      - pdf-service
      - agent-service
      - tts-service
    networks:
      - app-network

  # Agent service with GPU access
  agent-service:
    build:
      context: .
      dockerfile: services/AgentService/Dockerfile
    ports:
      - "8964:8964"
    environment:
      - NVIDIA_API_KEY=${NVIDIA_API_KEY}
      - REDIS_URL=redis://redis:6379
      - MODEL_CONFIG_PATH=/app/config/models.json
    volumes:
      - ./models.json:/app/config/models.json
    depends_on:
      - redis
    networks:
      - app-network

  # PDF processing service
  pdf-service:
    build:
      context: .
      dockerfile: services/PDFService/Dockerfile
    ports:
      - "8003:8003"
    environment:
      - REDIS_URL=redis://redis:6379
      - MODEL_API_URL=http://pdf-api:8004
    depends_on:
      - redis
      - pdf-api
    networks:
      - app-network

  # Celery worker for async tasks
  celery-worker:
    build:
      context: services/PDFService/PDFModelService
      dockerfile: Dockerfile.worker
    environment:
      - CELERY_BROKER_URL=redis://redis:6379/0
      - CELERY_RESULT_BACKEND=redis://redis:6379/0
    volumes:
      - pdf_temp:/tmp/pdf_conversions
    depends_on:
      - redis
    restart: unless-stopped
    networks:
      - app-network

  # Supporting services
  redis:
    image: redis:latest
    ports:
      - "6379:6379"
    command: redis-server --appendonly no
    networks:
      - app-network

  minio:
    image: minio/minio:latest
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      - MINIO_ROOT_USER=minioadmin
      - MINIO_ROOT_PASSWORD=minioadmin
    volumes:
      - ./data/minio:/data
    command: minio server /data --console-address ":9001"
    networks:
      - app-network

  # Observability
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686" # UI
      - "4317:4317" # OTLP gRPC
      - "4318:4318" # OTLP HTTP
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    networks:
      - app-network

volumes:
  pdf_temp:

networks:
  app-network:
    driver: bridge
Configuration Notes
- Services use custom Dockerfiles:
build.context sets the build directory. build.dockerfile specifies the Dockerfile path relative to that directory. AI Workbench builds these images when starting compose.
- Services communicate through service names:
PDF_SERVICE_URL=http://pdf-service:8003 references the pdf-service by name. All services must be on the same network.
- Redis provides message queue and caching:
Multiple services connect to the same Redis instance. Celery uses Redis as broker and result backend.
- Volumes share data between services:
The pdf_temp volume is shared between pdf-api and celery-worker. Bind mounts (./models.json) inject configuration files.
- Dependencies ensure startup order:
depends_on starts redis before services that need it. It does not wait for a dependency to be healthy unless the dependency defines a healthcheck and depends_on uses condition: service_healthy.
- Jaeger provides distributed tracing:
Services can send traces to Jaeger for observability. The Jaeger UI is accessible at http://localhost:16686.
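For a service to send traces to the jaeger container, it needs to know the collector address. The fragment below is a sketch that assumes agent-service is instrumented with an OpenTelemetry SDK. The source does not say which tracing library the services use, so treat the variables as one possible wiring rather than the project's actual setup.

```yaml
services:
  agent-service:
    environment:
      # Standard OpenTelemetry SDK variables. They only take effect if the
      # service code uses an OpenTelemetry SDK (an assumption in this sketch).
      - OTEL_SERVICE_NAME=agent-service
      - OTEL_EXPORTER_OTLP_PROTOCOL=grpc
      # Reach Jaeger by service name on its OTLP gRPC port, not via localhost.
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4317
    depends_on:
      - jaeger
```

Using the service name jaeger works because both containers are attached to app-network.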
Best Practices Across All Patterns
- Use named networks for better isolation:
Create explicit networks instead of relying on the default network. This makes service communication explicit and easier to debug.
- Define healthchecks for critical services:
Healthchecks, combined with depends_on conditions, prevent dependent services from starting before their dependencies are ready. Use HTTP endpoints or simple commands that verify service readiness.
- Use environment variables for configuration:
Reference secrets and configuration through ${VARIABLE_NAME} syntax. Set variables in AI Workbench or .env files. Never hardcode sensitive values in compose files.
- Pin image versions in production:
Use specific tags (image:1.2.3) instead of latest. This ensures reproducible deployments across environments.
- Use volumes for persistent data:
Model caches, databases, and application data should use volumes. This prevents data loss when containers are removed and recreated.
- Organize services with profiles:
Group related services into profiles for different deployment scenarios. This enables flexible deployments without maintaining multiple compose files.
- Document GPU requirements clearly:
Comment GPU assignments and memory requirements. This helps users understand hardware requirements before deployment.
- Use service names for inter-service communication:
Services on the same network can reach each other by service name. Avoid using localhost or IP addresses for service-to-service calls.
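Several of these practices can be combined in one small file. The sketch below is illustrative only: the cache and worker service names, the example/worker image, the API_KEY variable, and the redis:7.2 tag are assumptions chosen for the example, not values from the patterns above.

```yaml
services:
  cache:
    # Pinned tag instead of latest (7.2 is an illustrative choice).
    image: redis:7.2
    volumes:
      # Named volume so data survives container recreation.
      - cache_data:/data
    healthcheck:
      # redis-cli exits non-zero if the server does not answer.
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 5
    networks:
      - app-network

  worker:
    # Hypothetical application image.
    image: example/worker:1.0.0
    environment:
      # Reach the cache by service name, not localhost.
      - CACHE_URL=redis://cache:6379
      # Fail fast if the secret is missing instead of hardcoding it.
      - API_KEY=${API_KEY:?API_KEY not set}
    depends_on:
      cache:
        condition: service_healthy
    networks:
      - app-network

volumes:
  cache_data:

networks:
  app-network:
    driver: bridge
```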