Use OCI S3 Object Storage with NVIDIA NIM for LLMs#
NVIDIA NIM for LLMs supports loading models from Oracle Cloud Infrastructure (OCI) Object Storage using the Amazon S3 Compatibility API. Use this option when your models are hosted in an OCI Object Storage bucket and you want NVIDIA NIM for LLMs to fetch and serve them directly from that location over the S3-compatible API. For more information, refer to the OCI Object Storage Amazon S3 Compatibility API documentation.
Note
This feature is supported only on the multi-LLM NIM container.
Requirements#
OCI S3-compatible endpoint: The S3-compatible URL for your OCI Object Storage namespace and region. NVIDIA NIM for LLMs uses this endpoint to connect to your OCI Object Storage.
Model Path: The path to your model repository in the OCI Object Storage bucket. This tells NVIDIA NIM for LLMs where to find and load your models.
Credentials: Provide AWS-style credentials for the S3-compatible endpoint using either environment variables or a shared credentials file.
OCI S3-compatible Endpoint#
Set AWS_ENDPOINT_URL to your OCI Object Storage S3-compatible endpoint:
-e AWS_ENDPOINT_URL="https://<namespace>.compat.objectstorage.<region>.oraclecloud.com"
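If you have the AWS CLI installed, you can verify the endpoint and your credentials before starting the container. This check assumes your access key and secret key are already exported as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (see Loading Credentials below):
aws s3 ls \
  --region <region> \
  --endpoint-url "https://<namespace>.compat.objectstorage.<region>.oraclecloud.com"
A successful call lists the buckets in your namespace.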
Model Path#
Set NIM_MODEL_NAME to your model repository directory in the following format:
s3repo://<org>/<model-repo>[:<version>]
Examples:
s3repo://meta-llama/Meta-Llama-3-8B
s3repo://meta-llama/Meta-Llama-3-8B:1.14
Alternatively, you can specify the bucket explicitly:
s3repo://<bucket>/<org>/<model-name>[:<version>]
Example:
s3repo://llama-models/meta/llama-3.1-8b
For more details about this environment variable, see Model Configuration.
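As a quick sanity check, you can confirm that the model repository exists under the expected prefix. The command below reuses the bucket-explicit example above and assumes the AWS CLI with the same credentials and endpoint:
aws s3 ls s3://llama-models/meta/llama-3.1-8b/ \
  --endpoint-url "https://<namespace>.compat.objectstorage.<region>.oraclecloud.com"
The listing should show the model files (for example, configuration, tokenizer, and weight files).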
Loading Credentials#
Configure the region and credentials for the S3-compatible endpoint using the following environment variables:
AWS_REGION
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_SESSION_TOKEN (optional, for temporary credentials)
Note
For more details about these environment variables, refer to Remote Model Repository.
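If you prefer a shared credentials file over environment variables, a minimal sketch looks like the following. The profile name, mount path, and use of AWS_SHARED_CREDENTIALS_FILE are illustrative; AWS_REGION and AWS_ENDPOINT_URL are still passed as environment variables:
# credentials file on the host, in the standard AWS shared credentials format
[default]
aws_access_key_id = <your-access-key-id>
aws_secret_access_key = <your-secret-access-key>
Mount the file into the container and point the SDK at it when you run the container:
-v "$HOME/.aws/credentials:/opt/nim/aws-credentials:ro" \
-e AWS_SHARED_CREDENTIALS_FILE=/opt/nim/aws-credentials \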
Example#
The following example uses AWS environment variables to load credentials and mounts a local cache directory to persist downloads:
docker run --rm --name=test \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=64GB \
-e NGC_API_KEY=$NGC_API_KEY \
-e AWS_REGION="us-sanjose-1" \
-e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
-e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
-e AWS_ENDPOINT_URL="https://<namespace>.compat.objectstorage.<region>.oraclecloud.com" \
-e NIM_MODEL_NAME="s3repo://meta-llama/Meta-Llama-3-8B" \
-v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
-u $(id -u) \
-p 8000:8000 \
nvcr.io/nim/nvidia/llm-nim:1.14.0
Note
This feature works with all supported backends. By default, the backend is TensorRT-LLM. You can set NIM_MODEL_PROFILE to tensorrt_llm, vllm, or sglang to choose a specific backend.
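For example, to force the vLLM backend, add the profile to the docker run command above:
-e NIM_MODEL_PROFILE=vllm \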
Tip
Mounting a cache volume (for example, -v $HOME/.cache/nim:/opt/nim/.cache) avoids re-downloading models on each container start. See Local Model Cache.
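Once the container is up and the model has finished loading, you can verify the deployment through the OpenAI-compatible API. First list the served models, then send a request using the model ID reported by that call (the ID shown here is illustrative):
curl -s http://localhost:8000/v1/models
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Meta-Llama-3-8B", "prompt": "Hello", "max_tokens": 32}'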