Docker Compose Pattern Examples#

Overview#

This reference provides complete examples of common multi-container patterns in AI Workbench.

Each pattern addresses a specific use case with full compose file examples. Use these as starting points and adapt them to your specific requirements.

The patterns cover three common scenarios: model selection, full pipelines, and custom microservices.

Model selection lets users choose between multiple NIMs based on available hardware. Full pipelines demonstrate complex multi-service AI applications. Custom microservices show how to build and connect your own services.

All examples use real-world configurations from NVIDIA reference projects.

These patterns are tested and production-ready. Adapt image names, ports, and configurations to match your specific needs.

Pattern 1: NIM Model Selection with Profiles#

Use Case#

Run one of several NIM models based on available GPU resources.

Users select a model size that fits their hardware. Multiple models share the same port and interface. Profiles enable easy switching without maintaining separate compose files.

This pattern works well for:
  • Development and testing with different model sizes

  • Demos where users have varying hardware capabilities

  • Projects where model selection happens at deployment time

Key Features#

  • Multiple services with identical interfaces

  • Profile-based service selection

  • Variable GPU requirements per model

  • Shared network configuration

  • Model cache volume management

Example Configuration#

Compose file with three model variants:

services:
  llama-3.1-8b-instruct:
    image: nvcr.io/nim/meta/llama-3.1-8b-instruct:latest
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    ports:
      - "8000:8000"
    volumes:
      - type: bind
        source: /tmp
        target: /opt/nim/.cache/
    environment:
      - NGC_API_KEY=${NVIDIA_API_KEY:?Error NVIDIA_API_KEY not set}
    networks:
      - app-network
    profiles:
      - meta/llama-3.1-8b-instruct

  llama-3.1-70b-instruct:
    image: nvcr.io/nim/meta/llama-3.1-70b-instruct:latest
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 2
              capabilities: [gpu]
    ports:
      - "8000:8000"
    volumes:
      - type: bind
        source: /tmp
        target: /opt/nim/.cache/
    environment:
      - NGC_API_KEY=${NVIDIA_API_KEY:?Error NVIDIA_API_KEY not set}
    networks:
      - app-network
    profiles:
      - meta/llama-3.1-70b-instruct

  llama-3.1-405b-instruct:
    image: nvcr.io/nim/meta/llama-3.1-405b-instruct:latest
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 8
              capabilities: [gpu]
    ports:
      - "8000:8000"
    volumes:
      - type: bind
        source: /tmp
        target: /opt/nim/.cache/
    environment:
      - NGC_API_KEY=${NVIDIA_API_KEY:?Error NVIDIA_API_KEY not set}
    networks:
      - app-network
    profiles:
      - meta/llama-3.1-405b-instruct

networks:
  app-network:
    driver: bridge

Configuration Notes#

GPU count varies by model size:
  • 8B model: 1 GPU

  • 70B model: 2 GPUs

  • 405B model: 8 GPUs

All models use the same port (8000):

Because every service publishes the same host port, only one model can run at a time. This is intentional: switching profiles changes the model without changing the endpoint your application calls.

Model cache mounted to /tmp:

Change source: /tmp to a dedicated directory for persistent caching. Ensure the directory has write permissions.
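
For example, a persistent cache mount might look like the following sketch. The host path is only a placeholder; substitute a writable directory on your system:

    volumes:
      - type: bind
        source: /path/to/nim-cache    # placeholder host path; must exist and be writable
        target: /opt/nim/.cache/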

NGC_API_KEY is required:

Set the NVIDIA_API_KEY secret in AI Workbench. The compose file validates this variable is set before starting.

Profiles match model names:

Select the profile matching your desired model in the AI Workbench UI. Only the selected model’s service will start.
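
For reference, the same selection can be made with the Docker Compose CLI outside of AI Workbench:

docker compose --profile meta/llama-3.1-8b-instruct up -d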

Using This Pattern#

Step One: Configure the NVIDIA_API_KEY secret.
  1. Select Project Tab > Environment > Secrets

  2. Add NVIDIA_API_KEY with your NGC API key

Step Two: Select a model profile.
  1. Select Project Tab > Environment > Compose

  2. Select profile from dropdown (e.g., meta/llama-3.1-8b-instruct)

Step Three: Start the compose environment.
  1. Select Start

  2. Monitor logs for model download and startup

  3. Wait for the service to become ready

Success: The selected NIM is accessible at http://localhost:8000
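
As a quick smoke test, you can query the NIM's OpenAI-compatible API from the host. The example below assumes the 8B profile was selected; the model field must match the selected model:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 64
  }'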

Pattern 2: Full RAG Pipeline with Multiple Services#

Use Case#

Run a complete RAG system with ingestion, retrieval, generation, and frontend services.

Each component runs in its own container with specific GPU assignments. Multiple profiles enable running different subsets of services. Services communicate over a shared network with healthcheck dependencies.

This pattern works well for:
  • Production RAG applications

  • End-to-end AI pipelines

  • Applications requiring multiple specialized models

  • Systems with document processing, vector search, and generation

Key Features#

  • Multiple GPU-accelerated services on different GPUs

  • Profile-based deployment modes (local, ingest, rag, vectordb, guardrails)

  • Service dependencies with healthchecks

  • Persistent storage with volumes

  • Web service integration with NVWB_TRIM_PREFIX

Example Configuration#

Compose file excerpt showing key services:

services:
  # LLM for response generation
  nim-llm:
    container_name: nim-llm-ms
    image: nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5:1.13.1
    volumes:
      - ${MODEL_DIRECTORY:-/tmp}:/opt/nim/.cache
    ports:
      - "8999:8000"
    environment:
      NGC_API_KEY: ${NGC_API_KEY}
    shm_size: 20gb
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['${LLM_MS_GPU_ID:-1}']
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "python3", "-c", "import requests; requests.get('http://localhost:8000/v1/health/ready')"]
      interval: 10s
      timeout: 20s
      retries: 100
    profiles: ["local"]

  # Embedding model
  nemoretriever-embedding-ms:
    container_name: nemoretriever-embedding-ms
    image: nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2:1.10.0
    volumes:
      - ${MODEL_DIRECTORY:-/tmp}:/opt/nim/.cache
    ports:
      - "9080:8000"
    environment:
      NGC_API_KEY: ${NGC_API_KEY}
    shm_size: 16GB
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['${EMBEDDING_MS_GPU_ID:-0}']
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/v1/health/ready"]
      interval: 30s
      timeout: 20s
      retries: 3
      start_period: 10m
    profiles: ["local"]

  # RAG orchestrator server
  rag-server:
    container_name: rag-server
    image: nvcr.io/nvidia/blueprint/rag-server:2.3.0
    command: --port 8081 --host 0.0.0.0 --workers 8
    environment:
      APP_VECTORSTORE_URL: "http://milvus:19530"
      APP_LLM_SERVERURL: "nim-llm:8000"
      APP_EMBEDDINGS_SERVERURL: "nemoretriever-embedding-ms:8000"
      NVIDIA_API_KEY: ${NGC_API_KEY}
    ports:
      - "8081:8081"
    shm_size: 5gb
    profiles: ["rag"]

  # Frontend UI
  rag-frontend:
    container_name: rag-frontend
    image: nvcr.io/nvidia/blueprint/rag-frontend:2.3.0
    ports:
      - "8090:3000"
    depends_on:
      - rag-server
    environment:
      VITE_API_CHAT_URL: "http://rag-server:8081/v1"
      NVWB_TRIM_PREFIX: "true"
    profiles: ["rag"]

  # Vector database (GPU-accelerated)
  milvus:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.5.3-gpu
    command: ["milvus", "run", "standalone"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9010
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-./volumes/milvus}:/var/lib/milvus
    ports:
      - "19530:19530"
    depends_on:
      - etcd
      - minio
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['${VECTORSTORE_GPU_DEVICE_ID:-0}']
              capabilities: [gpu]
    profiles: ["vectordb"]

  # Supporting services
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.19
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-./volumes/etcd}:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    profiles: ["vectordb"]

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2025-02-28T09-55-16Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9011:9011"
      - "9010:9010"
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-./volumes/minio}:/minio_data
    command: minio server /minio_data --console-address ":9011" --address ":9010"
    profiles: ["vectordb"]

  redis:
    image: redis/redis-stack
    ports:
      - "6379:6379"
    profiles: ["ingest"]

volumes:
  nim_cache:
    external: true

networks:
  default:
    name: nvidia-rag

Configuration Notes#

Services are organized by profile:
  • local: GPU-accelerated inference services (NIMs)

  • rag: RAG orchestration and frontend

  • vectordb: Vector database and dependencies

  • ingest: Document ingestion pipeline

  • guardrails: Optional content safety services

GPU assignment uses device_ids:

device_ids: ['${LLM_MS_GPU_ID:-1}'] assigns a specific GPU. Environment variables (LLM_MS_GPU_ID, EMBEDDING_MS_GPU_ID, VECTORSTORE_GPU_DEVICE_ID) control which GPU each service uses; the value after :- is the default applied when the variable is not set.

Services communicate by name:

APP_LLM_SERVERURL: "nim-llm:8000" connects to the nim-llm service. All services share the nvidia-rag network.

Healthchecks ensure proper startup order:

A dependent service waits for its dependency's healthcheck to pass only when depends_on uses the long-form syntax with condition: service_healthy; the short form shown in this excerpt only waits for the dependency container to start. The healthchecks above probe HTTP readiness endpoints with Python requests or curl.
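
A minimal sketch of the long-form syntax (service names are placeholders; the dependency must define a healthcheck for condition: service_healthy to work):

services:
  my-service:
    depends_on:
      my-dependency:
        condition: service_healthy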

NVWB_TRIM_PREFIX enables proxy for frontend:

The rag-frontend service is accessible through AI Workbench’s proxy. Backend services do not need this variable.

Volumes provide persistent storage:

Model caches mounted at /opt/nim/.cache persist between restarts. Vector database, etcd, and MinIO data are stored under ./volumes/.

Using This Pattern#

Step One: Set required environment variables.

Create a .env file in your project root:

NGC_API_KEY=your_ngc_api_key
MODEL_DIRECTORY=/path/to/model/cache
LLM_MS_GPU_ID=1
EMBEDDING_MS_GPU_ID=0
VECTORSTORE_GPU_DEVICE_ID=0

Step Two: Create required volumes.

docker volume create nim_cache
mkdir -p volumes/milvus volumes/etcd volumes/minio

Step Three: Select profiles in AI Workbench.
  1. Select Project Tab > Environment > Compose

  2. Select profiles: local, rag, vectordb

  3. Select Start

Step Four: Wait for all services to become healthy.

Monitor compose output for healthcheck status. NIMs may take several minutes to download and initialize.
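
From the host, you can also probe the NIM readiness endpoints directly using the ports published in the excerpt above:

curl http://localhost:8999/v1/health/ready   # nim-llm (published as 8999:8000)
curl http://localhost:9080/v1/health/ready   # nemoretriever-embedding-ms (published as 9080:8000)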

Success: Access the RAG frontend through the AI Workbench proxy URL.

Pattern 3: Custom Microservices with Build Contexts#

Use Case#

Build and run custom application services alongside supporting infrastructure.

Your own code runs in containers built from Dockerfiles. Services communicate through a shared network and message queues. Supporting services provide databases, caching, and observability.

This pattern works well for:
  • Custom AI applications with multiple components

  • Microservices architectures

  • Applications requiring specialized build steps

  • Integration with external APIs and services

Key Features#

  • Custom Dockerfiles with build contexts

  • Service-to-service communication via networks

  • Message queues and task systems (Celery, Redis)

  • Observability with tracing (Jaeger)

  • Persistent volumes for shared data

Example Configuration#

Compose file excerpt showing the custom services (supporting services referenced by the custom containers, such as pdf-api and tts-service, are omitted for brevity):

services:
  # Custom API service
  api-service:
    build:
      context: .
      dockerfile: services/APIService/Dockerfile
    ports:
      - "8002:8002"
    environment:
      - PDF_SERVICE_URL=http://pdf-service:8003
      - AGENT_SERVICE_URL=http://agent-service:8964
      - TTS_SERVICE_URL=http://tts-service:8889
      - REDIS_URL=redis://redis:6379
    depends_on:
      - redis
      - pdf-service
      - agent-service
      - tts-service
    networks:
      - app-network

  # Agent service with GPU access
  agent-service:
    build:
      context: .
      dockerfile: services/AgentService/Dockerfile
    ports:
      - "8964:8964"
    environment:
      - NVIDIA_API_KEY=${NVIDIA_API_KEY}
      - REDIS_URL=redis://redis:6379
      - MODEL_CONFIG_PATH=/app/config/models.json
    volumes:
      - ./models.json:/app/config/models.json
    depends_on:
      - redis
    networks:
      - app-network

  # PDF processing service
  pdf-service:
    build:
      context: .
      dockerfile: services/PDFService/Dockerfile
    ports:
      - "8003:8003"
    environment:
      - REDIS_URL=redis://redis:6379
      - MODEL_API_URL=http://pdf-api:8004
    depends_on:
      - redis
      - pdf-api
    networks:
      - app-network

  # Celery worker for async tasks
  celery-worker:
    build:
      context: services/PDFService/PDFModelService
      dockerfile: Dockerfile.worker
    environment:
      - CELERY_BROKER_URL=redis://redis:6379/0
      - CELERY_RESULT_BACKEND=redis://redis:6379/0
    volumes:
      - pdf_temp:/tmp/pdf_conversions
    depends_on:
      - redis
    restart: unless-stopped
    networks:
      - app-network

  # Supporting services
  redis:
    image: redis:latest
    ports:
      - "6379:6379"
    command: redis-server --appendonly no
    networks:
      - app-network

  minio:
    image: minio/minio:latest
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      - MINIO_ROOT_USER=minioadmin
      - MINIO_ROOT_PASSWORD=minioadmin
    volumes:
      - ./data/minio:/data
    command: minio server /data --console-address ":9001"
    networks:
      - app-network

  # Observability
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"  # UI
      - "4317:4317"    # OTLP GRPC
      - "4318:4318"    # OTLP HTTP
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    networks:
      - app-network

volumes:
  pdf_temp:

networks:
  app-network:
    driver: bridge

Configuration Notes#

Services use custom Dockerfiles:

build.context sets the build directory and build.dockerfile specifies the Dockerfile path relative to that context. AI Workbench builds these images when it starts the compose environment.

Services communicate through service names:

PDF_SERVICE_URL=http://pdf-service:8003 references the pdf-service by name. All services must be on the same network.
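
To confirm that name resolution works between containers, you can issue a request from one service to another. The /health path below is hypothetical; substitute an endpoint your service actually exposes, and note that curl must be available in the api-service image:

docker compose exec api-service curl http://pdf-service:8003/health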

Redis provides message queue and caching:

Multiple services connect to the same Redis instance. Celery uses Redis as broker and result backend.
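
A quick way to confirm Redis is up is to ping it from its own container:

docker compose exec redis redis-cli ping
# Expected output: PONG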

Volumes share data between services:

The pdf_temp volume is shared between celery-worker and the PDF model API service (pdf-api, omitted from this excerpt). Bind mounts (./models.json) inject configuration files.

Dependencies ensure startup order:

depends_on starts redis before the services that need it, but on its own it only controls start order. A dependent service does not wait for a dependency to be ready unless that dependency defines a healthcheck and the long-form condition: service_healthy is used.

Jaeger provides distributed tracing:

Services can send traces to Jaeger for observability. The Jaeger UI is accessible at http://localhost:16686; OTLP traces are accepted on ports 4317 (gRPC) and 4318 (HTTP).

Using This Pattern#

Step One: Ensure your Dockerfiles exist.

Verify that all paths in build.dockerfile are correct, and test building the images locally before using them in compose.
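
For example, to test two of the builds from the project root before starting compose (paths taken from the compose file above):

docker build -f services/APIService/Dockerfile -t api-service:test .
docker build -f services/AgentService/Dockerfile -t agent-service:test .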

Step Two: Set required environment variables.

Create a .env file:

NVIDIA_API_KEY=your_api_key
ELEVENLABS_API_KEY=your_elevenlabs_key
MAX_CONCURRENT_REQUESTS=5

Step Three: Build and start services.
  1. Select Project Tab > Environment > Compose

  2. Select Start

  3. First start will build all custom images (may take several minutes)

Step Four: Verify services are running.

Check logs for each service. Test service endpoints to ensure communication works.
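
Outside the AI Workbench UI, the standard Compose commands are useful here:

docker compose ps                    # list services and their current status
docker compose logs -f api-service   # follow logs for a single service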

Success: All services are running and communicating through the shared network.

Best Practices Across All Patterns#

Use named networks for better isolation:

Create explicit networks instead of relying on the default network. Makes service communication explicit and easier to debug.

Define healthchecks for critical services:

Prevents dependent services from starting before dependencies are ready. Use HTTP endpoints or simple commands that verify service readiness.

Use environment variables for configuration:

Reference secrets and configuration through ${VARIABLE_NAME} syntax. Set variables in AI Workbench or .env files. Never hardcode sensitive values in compose files.

Pin image versions in production:

Use specific tags (image:1.2.3) instead of latest. Ensures reproducible deployments across environments.

Use volumes for persistent data:

Model caches, databases, and application data should use volumes. Prevents data loss when containers restart.

Organize services with profiles:

Group related services into profiles for different deployment scenarios. Enables flexible deployments without maintaining multiple compose files.
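
For reference, multiple profiles can be enabled together when invoking Compose directly:

docker compose --profile local --profile rag --profile vectordb up -d
# or equivalently:
COMPOSE_PROFILES=local,rag,vectordb docker compose up -d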

Document GPU requirements clearly:

Comment GPU assignments and memory requirements. Helps users understand hardware requirements before deployment.

Use service names for inter-service communication:

Services on the same network can reach each other by service name. Avoid using localhost or IP addresses for service-to-service calls.