ModelExpress is a model weight distribution service for faster worker startup in larger Dynamo clusters. Instead of every worker downloading the full model from storage, one worker can publish model weight availability and later workers can pull compatible tensors from that source over NIXL/RDMA. ModelExpress can also pair with ModelStreamer to stream safetensors directly from object storage into GPU memory.
Use ModelExpress when model rollout time, autoscale cold start, or fleet-wide model updates matter more than the simplicity of a shared PVC. For smaller clusters, start with Model Caching.
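The worker runtime selects a ModelExpress load format (--load-format mx on newer images, or mx-source / mx-target on older split-loader images).
ModelStreamer sources can be s3://, gs://, az://, or a local path.
The operator injects MODEL_EXPRESS_URL into all Dynamo pods from the platform modelExpressURL setting.
Set the ModelExpress server URL through the dynamo-operator.modelExpressURL value when installing the Dynamo platform.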
If the ModelExpress server is installed separately, point dynamo-operator.modelExpressURL at that service. The operator injects the value into worker pods as MODEL_EXPRESS_URL.
Use a runtime image that includes the modelexpress Python package. For ModelStreamer, the image also needs runai-model-streamer and the relevant object-storage SDK dependencies.
Use the load format supported by your runtime image. ModelExpress v0.3 and newer document the unified mx loader. Some older Dynamo images expose mx-source and mx-target loader names instead.
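The choice between the unified and split loader names can be sketched as a small helper. This is an illustration only, not part of ModelExpress: the function name load_format_args and its role values are hypothetical. It also assumes that the older mx-source and mx-target names are passed through the same --load-format flag, with mx-source on the worker that publishes weights and mx-target on workers that pull them. Confirm the names against your runtime image before relying on it.

```python
from __future__ import annotations


def load_format_args(unified_loader: bool, role: str = "target") -> list[str]:
    """Return CLI arguments selecting a ModelExpress load format.

    unified_loader: True for images that support the unified mx loader.
    role: "source" or "target"; only used for older split-loader images.
          The role-to-name mapping is an assumption, not a documented API.
    """
    if unified_loader:
        # Newer images: one loader name covers both publishing and pulling.
        return ["--load-format", "mx"]
    if role not in ("source", "target"):
        raise ValueError(f"role must be 'source' or 'target', got {role!r}")
    # Older split-loader images: separate loader names per role.
    return ["--load-format", f"mx-{role}"]
```

Keeping the decision in one place makes it easier to switch a fleet from the split names to the unified loader when images are upgraded.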
If the ModelExpress server cache is on a non-shared volume, workers cannot read the server’s local cache path. Set MODEL_EXPRESS_NO_SHARED_STORAGE=1 on worker pods so the client streams model files from the server over gRPC.
Use this path when the server has an RWO PVC, runs in a different namespace, or the cluster has no RDMA fabric available. Shared-filesystem mode is still faster when available.
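The conditions above reduce to a simple check. The sketch below is illustrative: needs_grpc_streaming and extra_worker_env are hypothetical helper names, and the only setting taken from this page is MODEL_EXPRESS_NO_SHARED_STORAGE=1.

```python
from __future__ import annotations


def needs_grpc_streaming(server_pvc_is_rwo: bool,
                         different_namespace: bool,
                         rdma_available: bool) -> bool:
    """Return True when workers should stream model files over gRPC.

    Mirrors the three cases listed above: the server cache is on an RWO PVC,
    the server runs in a different namespace, or there is no RDMA fabric.
    """
    return server_pvc_is_rwo or different_namespace or not rdma_available


def extra_worker_env(server_pvc_is_rwo: bool,
                     different_namespace: bool,
                     rdma_available: bool) -> dict[str, str]:
    """Build the extra environment variables for a worker pod."""
    if needs_grpc_streaming(server_pvc_is_rwo, different_namespace, rdma_available):
        # Workers cannot rely on the server's local cache path here.
        return {"MODEL_EXPRESS_NO_SHARED_STORAGE": "1"}
    # Shared-filesystem mode: no override needed.
    return {}
```

For example, extra_worker_env(True, False, True) yields the override, while a deployment with a shared cache volume, the same namespace, and RDMA available gets an empty mapping.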
Set MX_MODEL_URI to the model location when the first worker should stream safetensors directly from object storage or a locally mounted path.
Credentials are consumed by the storage SDKs in the worker pod. They do not flow through the ModelExpress server.
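A value for MX_MODEL_URI can be sanity-checked before it is written into a pod spec. The following is a minimal sketch using only the Python standard library. The helper name classify_model_uri is hypothetical, and the accepted schemes are assumed to be the ones listed on this page (s3://, gs://, az://, or a local path); ModelExpress itself may accept more or fewer.

```python
from urllib.parse import urlsplit

# Object-storage schemes named on this page (assumed to be the full set).
OBJECT_STORE_SCHEMES = {"s3", "gs", "az"}


def classify_model_uri(uri: str) -> str:
    """Classify an MX_MODEL_URI value as "object-storage" or "local-path".

    Raises ValueError for empty values and for schemes not listed above.
    """
    if not uri:
        raise ValueError("MX_MODEL_URI must not be empty")
    scheme = urlsplit(uri).scheme
    if scheme in OBJECT_STORE_SCHEMES:
        # Credentials for these are read by the storage SDK in the worker pod.
        return "object-storage"
    if scheme == "":
        # No scheme: treat the value as a path mounted into the worker pod.
        return "local-path"
    raise ValueError(f"unsupported scheme {scheme!r} in {uri!r}")
```

With placeholder values, classify_model_uri("s3://example-bucket/model") is classified as object storage and classify_model_uri("/models/example") as a local path, while an https:// URL is rejected.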