Service Port and GPU Reference

The following tables list every service in the self-hosted deployment, along with its port mappings and default GPU assignment.

Core Application Services

Service	Container Name	Host Port(s)	Container Port(s)	Default GPU ID	Notes
RAG Server	`rag-server`	8081	8081	N/A (CPU)	Main RAG API endpoint
Ingestor Server	`ingestor-server`	8082	8082	N/A (CPU)	Document ingestion API
RAG Frontend	`rag-frontend`	8090	3000	N/A (CPU)	Web UI
NeMo Retriever Library Runtime	`nv-ingest-ms-runtime`	7670, 7671, 8265	7670, 7671, 8265	N/A (CPU)	Main orchestrator (Ray dashboard: 8265)

NIM Microservices

Service	Container Name	Host Port(s)	Container Port(s)	Default GPU ID	Environment Variable	Notes
LLM	`nim-llm-ms`	8999	8000	1	`LLM_MS_GPU_ID`	Main language model
Text Embedding	`nemotron-embedding-ms`	9080	8000	0	`EMBEDDING_MS_GPU_ID`	Optional text embeddings (profile: text-embed)
VLM Embedding	`nemotron-vlm-embedding-ms`	9081	8000	0	`VLM_EMBEDDING_MS_GPU_ID`	Default vision-language embeddings
Ranking	`nemotron-ranking-ms`	1976	8000	0	`RANKING_MS_GPU_ID`	Reranking model
VLM	`nemotron-3-nano-omni-30b-a3b-reasoning`	1977	8000	5	`VLM_MS_GPU_ID`	Vision-language model (opt-in, profile: vlm-only, vlm-generation, vlm-rag)
VLM Captioning	`nemotron-nano-12b-v2-vl`	1978	8000	6	`VLM_CAPTIONING_MS_GPU_ID`	Image captioning model (opt-in, profile: ingest, vlm-rag, vlm-generation)
VLM Reranker	`nemotron-ranking-vl-ms`	1979	8000	0	`RANKING_VL_MS_GPU_ID`	Vision-language reranking model (opt-in, profile: vlm-rerank, vlm-rag)
Nemotron Parse	`compose-nemotron-parse-1`	8015, 8016, 8017	8000, 8001, 8002	1	`NEMOTRON_PARSE_MS_GPU_ID`	PDF parsing (opt-in, profile: nemotron-parse)
RIVA ASR	`compose-audio-1`	8021, 8022	50051, 9000	0	`AUDIO_MS_GPU_ID`	Audio speech recognition (opt-in, profile: audio)
Page Elements	`compose-page-elements-1`	8000, 8001, 8002	8000, 8001, 8002	0	`YOLOX_MS_GPU_ID`	Object detection for pages
Graphic Elements	`compose-graphic-elements-1`	8003, 8004, 8005	8000, 8001, 8002	0	`YOLOX_GRAPHICS_MS_GPU_ID`	Graphics detection
Table Structure	`compose-table-structure-1`	8006, 8007, 8008	8000, 8001, 8002	0	`YOLOX_TABLE_MS_GPU_ID`	Table structure detection
Nemotron OCR	`compose-nemotron-ocr-1`	8012, 8013, 8014	8000, 8001, 8002	0	`OCR_MS_GPU_ID`	OCR service (default)

Vector Database and Infrastructure

Service	Container Name	Host Port(s)	Container Port(s)	Default GPU ID	Environment Variable	Notes
Elasticsearch	`elasticsearch`	9200	9200	N/A (CPU)	N/A	Default vector database
Redis	`compose-redis-1`	6379	6379	N/A (CPU)	N/A	Task queue
Milvus	`milvus-standalone`	19530, 9091	19530, 9091	0	`VECTORSTORE_GPU_DEVICE_ID`	Vector database (Profile: milvus)
SeaweedFS Object Store	`seaweedfs`	9010, 9011	9010, 9011	N/A (CPU)	N/A	S3-compatible object storage
Milvus etcd	`milvus-etcd`	N/A	2379	N/A (CPU)	N/A	Metadata storage (Profile: milvus)

Note

Opt-in NIM Services:

The following NIM services are opt-in and require explicit Docker Compose profile activation:

VLM (nemotron-3-nano-omni-30b-a3b-reasoning): Use profile vlm-only, vlm-generation, or vlm-rag to enable the vision-language model.
VLM Reranker (nemotron-ranking-vl-ms): Use profile vlm-rerank or vlm-rag to enable the vision-language reranking model.
VLM Captioning (nemotron-nano-12b-v2-vl): Use profile vlm-generation, vlm-rag, or ingest to enable the image captioning model.
Nemotron Parse (compose-nemotron-parse-1): Use profile nemotron-parse to enable advanced PDF parsing.
RIVA ASR (compose-audio-1): Use profile audio to enable audio speech recognition.

To activate one of these services, add --profile <profile-name> to the docker compose command when you launch the services. For example:

USERID=$(id -u) docker compose -f deploy/compose/nims.yaml --profile nemotron-parse up -d
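
Docker Compose accepts the --profile flag more than once, and it also reads a comma-separated list from the COMPOSE_PROFILES environment variable. The sketch below builds the repeated flags with a hypothetical helper, profile_flags (not part of the deployment scripts), and prints the resulting command rather than running it. The profile combination is only an illustration; whether it fits your hardware is an assumption you should verify.

```shell
# profile_flags is a hypothetical helper, not part of the deployment scripts.
# It turns a list of profile names into repeated --profile flags.
profile_flags() {
  local flags=""
  local name
  for name in "$@"; do
    flags="$flags --profile $name"
  done
  # Strip the leading space before printing.
  printf '%s\n' "${flags# }"
}

# Dry run: print the command for review. Remove "echo" to launch the services.
# The command substitution is left unquoted on purpose so the flags split into words.
echo USERID="$(id -u)" docker compose -f deploy/compose/nims.yaml \
  $(profile_flags nemotron-parse audio) up -d

# Equivalent form using the COMPOSE_PROFILES variable that Docker Compose reads:
# COMPOSE_PROFILES=nemotron-parse,audio USERID=$(id -u) \
#   docker compose -f deploy/compose/nims.yaml up -d
```

Printing the command first makes it easy to confirm which profiles will be activated before any containers start.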

Tip

Customizing GPU Allocations:

Set GPU IDs using environment variables in deploy/compose/.env before launching services.
For NIM services that expose container ports 8000, 8001, and 8002 (e.g., page-elements), these ports correspond to the HTTP API, gRPC, and metrics endpoints, respectively.
Services whose Notes column lists a profile start only when that Docker Compose profile is specified using --profile <name>.
Multiple services can share the same GPU (e.g., embedding, ranking, and ingestion services default to GPU 0).
For multi-GPU setups on A100 SXM or B200, see step 3 in the deployment procedure.
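
As an illustration of the first point, a fragment of deploy/compose/.env might look like the following. The variable names come from the tables above; the GPU IDs are illustrative values for a hypothetical four-GPU host, not recommended settings.

```shell
# deploy/compose/.env (fragment). GPU IDs below are illustrative, not recommendations.

# Move the LLM off its default GPU 1 onto GPU 2.
LLM_MS_GPU_ID=2

# Keep the embedding and ranking services together on GPU 0 (their default).
VLM_EMBEDDING_MS_GPU_ID=0
RANKING_MS_GPU_ID=0

# Give the OCR service its own GPU instead of sharing GPU 0.
OCR_MS_GPU_ID=3
```

Variables that are not set in the file should fall back to the defaults listed in the Default GPU ID column.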

Note

Port Conflict Resolution:

If you have port conflicts with existing services:

Stop the conflicting services, or
Modify the port mappings in the respective Docker Compose YAML files (e.g., change "8081:8081" to "8181:8081" to expose the service on host port 8181).
If you change a port mapping, also update any environment variables that reference that port (e.g., APP_VECTORSTORE_URL).
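
Before launching, it can help to check which host ports are already taken. The following is a rough sketch that assumes bash, since it relies on bash's /dev/tcp pseudo-device. The helpers port_in_use and check_ports are hypothetical names introduced here. The check only detects listeners reachable on 127.0.0.1, so a "free" result is a hint rather than a guarantee.

```shell
# Rough pre-flight check for host port conflicts. Requires bash (/dev/tcp).

# port_in_use (hypothetical helper): succeeds if a TCP connection to
# 127.0.0.1:<port> is accepted, meaning something is already listening there.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# check_ports (hypothetical helper): prints one status line per port argument.
check_ports() {
  local port
  for port in "$@"; do
    if port_in_use "$port"; then
      echo "$port in use"
    else
      echo "$port free"
    fi
  done
}

# A few host ports from the tables above; extend the list as needed.
check_ports 8081 8082 8090 8999 9200
```

If a port is reported as in use, either stop the listener or change the host side of the mapping as described above.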