Service Port and GPU Reference
The following tables list every service in the self-hosted deployment, with its host and container port mappings and its default GPU assignment.
Core Application Services

| Service | Container Name | Host Port(s) | Container Port(s) | Default GPU ID | Notes |
|---|---|---|---|---|---|
| RAG Server | | 8081 | 8081 | N/A (CPU) | Main RAG API endpoint |
| Ingestor Server | | 8082 | 8082 | N/A (CPU) | Document ingestion API |
| RAG Frontend | | 8090 | 3000 | N/A (CPU) | Web UI |
| NV-Ingest Runtime | | 7670, 7671, 8265 | 7670, 7671, 8265 | N/A (CPU) | Main orchestrator (Ray dashboard: 8265) |

NIM Microservices

| Service | Container Name | Host Port(s) | Container Port(s) | Default GPU ID | Environment Variable | Notes |
|---|---|---|---|---|---|---|
| LLM | | 8999 | 8000 | 1 | | Main language model |
| Embedding | | 9080 | 8000 | 0 | | Text embeddings |
| VLM Embedding | `nemoretriever-vlm-embedding-ms` | 9081 | 8000 | 0 | | Vision-language embeddings (opt-in, profile: vlm-embed) |
| Ranking | | 1976 | 8000 | 0 | | Reranking model |
| VLM | `nemo-vlm-microservice` | 1977 | 8000 | 5 | | Vision-language model (opt-in, profiles: vlm-only, vlm-generation) |
| Nemotron Parse | `compose-nemotron-parse-1` | 8015, 8016, 8017 | 8000, 8001, 8002 | 1 | | PDF parsing (opt-in, profile: nemotron-parse) |
| RIVA ASR | `compose-audio-1` | 8021, 8022 | 50051, 9000 | 0 | | Automatic speech recognition (opt-in, profile: audio) |
| Page Elements | | 8000, 8001, 8002 | 8000, 8001, 8002 | 0 | | Object detection for pages |
| Graphic Elements | | 8003, 8004, 8005 | 8000, 8001, 8002 | 0 | | Graphics detection |
| Table Structure | | 8006, 8007, 8008 | 8000, 8001, 8002 | 0 | | Table structure detection |
| NeMo Retriever OCR | | 8012, 8013, 8014 | 8000, 8001, 8002 | 0 | | OCR service (default) |

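Page Elements, Graphic Elements, Table Structure, NeMo Retriever OCR, and Nemotron Parse each publish three consecutive host ports that map to container ports 8000, 8001, and 8002 (HTTP API, gRPC, and metrics). The helper below is a minimal sketch that derives all three host ports from the first one. `nim_ports` is a hypothetical name, and it assumes the default layout shown in this table, so recheck it if you have remapped any ports.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the HTTP, gRPC, and metrics host ports of a
# three-port NIM, given its first host port. Assumes the default layout
# from the table (first port = HTTP, +1 = gRPC, +2 = metrics).
nim_ports() {
  local base="$1"
  # Reject empty input or anything containing a non-digit.
  case "$base" in
    ''|*[!0-9]*)
      echo "usage: nim_ports <first-host-port>" >&2
      return 1
      ;;
  esac
  echo "http=${base} grpc=$((base + 1)) metrics=$((base + 2))"
}
```

For example, `nim_ports 8012` prints `http=8012 grpc=8013 metrics=8014` for the OCR service. Assuming the NIM exposes the common `/v1/health/ready` route, you could then probe the HTTP port with `curl -s http://localhost:8012/v1/health/ready`.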
Vector Database and Infrastructure

| Service | Container Name | Host Port(s) | Container Port(s) | Default GPU ID | Environment Variable | Notes |
|---|---|---|---|---|---|---|
| Milvus | | 19530, 9091 | 19530, 9091 | 0 | | Vector database |
| Milvus MinIO | | 9010, 9011 | 9010, 9011 | N/A (CPU) | N/A | Object storage |
| Milvus etcd | | N/A | 2379 | N/A (CPU) | N/A | Metadata storage |
| Redis | | 6379 | 6379 | N/A (CPU) | N/A | Task queue |
| Elasticsearch | | 9200 | 9200 | N/A (CPU) | N/A | Profile: elasticsearch |

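Before launching, you can check whether any of the default host ports in these tables is already taken on the machine. The sketch below assumes bash built with `/dev/tcp` support (some builds disable it, in which case every port reports as free) and probes only 127.0.0.1. The port list covers the default services and leaves out the profile-only ports (9081, 1977, 8015 to 8017, 8021, 8022, 9200); the variable and function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: report which default host ports from the tables are already
# taken. Assumes bash with /dev/tcp support and probes 127.0.0.1 only.

# Default (non-profile) host ports, as a whitespace-separated list.
DEFAULT_HOST_PORTS="8081 8082 8090 7670 7671 8265 8999 9080 1976
8000 8001 8002 8003 8004 8005 8006 8007 8008 8012 8013 8014
19530 9091 9010 9011 6379"

# Succeeds if something accepts a TCP connection on 127.0.0.1:<port>.
# The subshell opens fd 3 to the port and closes it again on exit.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Print "<port> in use" or "<port> free" for each port argument.
check_ports() {
  local port
  for port in "$@"; do
    if port_in_use "$port"; then
      echo "$port in use"
    else
      echo "$port free"
    fi
  done
}
```

Run `check_ports $DEFAULT_HOST_PORTS | grep 'in use'` before `docker compose up` (the variable is left unquoted on purpose so it splits into separate ports). Once the deployment is running, its own containers hold these ports, so a clean result only matters before launch.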
Note
Opt-in NIM Services:
The following NIM services are opt-in and start only when you activate their Docker Compose profile:
- VLM Embedding (`nemoretriever-vlm-embedding-ms`): use profile `vlm-embed` for vision-language embeddings.
- VLM (`nemo-vlm-microservice`): use profile `vlm-only` or `vlm-generation` for the vision-language model.
- Nemotron Parse (`compose-nemotron-parse-1`): use profile `nemotron-parse` for advanced PDF parsing.
- RIVA ASR (`compose-audio-1`): use profile `audio` for automatic speech recognition.
To activate one of these services, add `--profile <profile-name>` to the `docker compose` command. For example:

```bash
USERID=$(id -u) docker compose -f deploy/compose/nims.yaml --profile nemotron-parse up -d
```
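To enable several opt-in services at once, repeat `--profile` once per profile. The sketch below builds those flags from short service keys. The keys (`vlm-embed`, `vlm`, `parse`, `asr`) and both function names are hypothetical; only the profile names come from the opt-in list. It picks `vlm-only` for the VLM, so swap in `vlm-generation` if that is the mode you need.

```bash
#!/usr/bin/env bash
# Sketch: map hypothetical short service keys to the Compose profiles
# listed in the note, then emit one "--profile <name>" pair per service.
profile_for() {
  case "$1" in
    vlm-embed) echo "vlm-embed" ;;       # VLM Embedding
    vlm)       echo "vlm-only" ;;        # VLM (or use vlm-generation)
    parse)     echo "nemotron-parse" ;;  # Nemotron Parse
    asr)       echo "audio" ;;           # RIVA ASR
    *)         return 1 ;;
  esac
}

profile_flags() {
  local svc name flags=""
  for svc in "$@"; do
    if ! name=$(profile_for "$svc"); then
      echo "unknown opt-in service: $svc" >&2
      return 1
    fi
    flags="$flags --profile $name"
  done
  # Drop the leading space added by the first iteration.
  printf '%s\n' "${flags# }"
}
```

Usage: `USERID=$(id -u) docker compose -f deploy/compose/nims.yaml $(profile_flags parse asr) up -d`. Docker Compose also reads a comma-separated `COMPOSE_PROFILES` variable, so `COMPOSE_PROFILES=nemotron-parse,audio` achieves the same result without a helper.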
Tip
Customizing GPU Allocations:
- Set GPU IDs using environment variables in `deploy/compose/.env` before launching services.
- For services that publish three ports (for example, Page Elements: 8000, 8001, 8002), the ports correspond to the HTTP API, gRPC, and metrics endpoints, respectively.
- Services marked with “Profile:” start only when that Docker Compose profile is specified with `--profile <name>`.
- Multiple services can share the same GPU (for example, the embedding, ranking, and ingestion services default to GPU 0).
- For multi-GPU setups on A100 SXM or B200, see step 3 in the deployment procedure.
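Editing `deploy/compose/.env` by hand works; if you script GPU assignments instead, an idempotent setter avoids duplicate keys. This is a sketch: `set_env_var` is a hypothetical helper that assumes plain `KEY=VALUE` lines (no `export` prefix), and the variable name in the usage line is a placeholder, not a name defined by this deployment.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in a dotenv file such as
# deploy/compose/.env, replacing an existing KEY= line or appending one.
# Assumes plain KEY=VALUE lines without an "export" prefix.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  tmp=$(mktemp) || return 1
  if [ -f "$file" ] && grep -q "^${key}=" "$file"; then
    # Rewrite only lines that start with "KEY="; copy all others unchanged.
    awk -v k="$key" -v v="$value" \
      'index($0, k "=") == 1 { print k "=" v; next } { print }' \
      "$file" > "$tmp"
  else
    {
      if [ -f "$file" ]; then
        cat "$file"
        # Add a newline first if the file does not already end with one.
        if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
          echo
        fi
      fi
      printf '%s=%s\n' "$key" "$value"
    } > "$tmp"
  fi
  # Write back through the existing file so its permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Usage: `set_env_var deploy/compose/.env EXAMPLE_LLM_GPU_ID 2` (placeholder name; use the variables from the Environment Variable column and your release's `.env`). `nvidia-smi -L` lists the GPU indices available on the host.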
Note
Port Conflict Resolution:
If a port conflicts with an existing service, do one of the following:
- Stop the conflicting service.
- Modify the port mapping in the respective Docker Compose YAML file (for example, change `"8081:8081"` to `"8181:8081"` to expose the service on host port 8181), then update any environment variables that reference that port (for example, `APP_VECTORSTORE_URL`).
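If you choose to remap, the edit is a substitution on the quoted `"HOST:CONTAINER"` string. A minimal sketch, assuming the mapping is written double-quoted exactly as in the example; `remap_host_port` is a hypothetical helper, and the Compose file path in the usage line is a placeholder because the file that defines each service depends on your checkout.

```bash
#!/usr/bin/env bash
# Hypothetical helper: change the host side of a quoted "HOST:CONTAINER"
# port mapping in a Compose file, keeping the original as FILE.bak.
remap_host_port() {
  local file="$1" old_host="$2" new_host="$3" container="$4"
  if ! grep -q "\"${old_host}:${container}\"" "$file"; then
    echo "mapping \"${old_host}:${container}\" not found in $file" >&2
    return 1
  fi
  cp "$file" "$file.bak" || return 1
  # Rewrite the first matching mapping on each line of the backup into FILE.
  sed "s/\"${old_host}:${container}\"/\"${new_host}:${container}\"/" \
    "$file.bak" > "$file"
}
```

Usage: `remap_host_port deploy/compose/<file-defining-the-service>.yaml 8081 8181 8081`. The helper only touches the Compose file; any environment variable or client URL that still points at the old host port needs the same change.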