Fluid is an open-source, cloud-native data orchestration and acceleration platform for Kubernetes. It virtualizes and accelerates data access from various sources (object storage, distributed file systems, cloud storage), making it ideal for AI, machine learning, and big data workloads.
You can install Fluid on any Kubernetes cluster using Helm.
Prerequisites:
- kubectl >= 1.18
- Helm >= 3.5

Quick Install:
For advanced configuration, see the Fluid Installation Guide.
WebUFS allows mounting HTTP/HTTPS sources as filesystems.
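A minimal sketch of such a Dataset, paired with an AlluxioRuntime as the cache engine, might look like the following. The source URL is a placeholder, and the replica count and in-memory cache quota are assumptions to adjust for the size of your files.

```yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: webufs-model              # Fluid names the resulting PVC after the Dataset
spec:
  mounts:
    - mountPoint: "https://example.com/path/to/model"   # placeholder URL (assumption)
      name: webufs-model
---
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: webufs-model              # must match the Dataset name so the two bind
spec:
  replicas: 1                     # number of cache workers (assumed)
  tieredstore:
    levels:
      - mediumtype: MEM           # cache in memory
        path: /dev/shm
        quota: 2Gi                # cache capacity per worker (assumed)
        high: "0.95"
        low: "0.7"
```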
After applying, Fluid creates a PersistentVolumeClaim (PVC) named webufs-model containing the files.
Mount an S3 bucket as a Fluid Dataset.
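As a sketch under stated assumptions, an S3-backed Dataset could be declared as below. The Secret name `s3-secret`, the bucket path, and the MinIO-style endpoint are hypothetical placeholders; the endpoint options are only needed for S3-compatible stores other than AWS S3.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: s3-secret                 # hypothetical Secret name
stringData:
  aws.accessKeyId: "<your-access-key-id>"
  aws.secretKey: "<your-secret-key>"
---
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: s3-model
spec:
  mounts:
    - mountPoint: "s3://<bucket>/<path>"        # placeholder bucket and prefix
      name: s3-model
      options:
        # Assumed values for an in-cluster MinIO service; omit for AWS S3.
        alluxio.underfs.s3.endpoint: "http://minio:9000"
        alluxio.underfs.s3.disable.dns.buckets: "true"
      encryptOptions:               # credentials are read from the Secret above
        - name: aws.accessKeyId
          valueFrom:
            secretKeyRef:
              name: s3-secret
              key: aws.accessKeyId
        - name: aws.secretKey
          valueFrom:
            secretKeyRef:
              name: s3-secret
              key: aws.secretKey
---
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: s3-model                  # must match the Dataset name
spec:
  replicas: 1                     # assumed
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 2Gi                # assumed; size to your model
        high: "0.95"
        low: "0.7"
```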
The resulting PVC is named s3-model.
Limitations:
Workaround: Download and Upload to S3/MinIO
Example Pod to Download and Upload:
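One possible shape for such a Pod is sketched below; it is not a definitive implementation. It assumes an in-cluster MinIO service at `http://minio:9000`, a hypothetical Secret `s3-secret` holding the access keys, and enough ephemeral storage on the node to hold the model under `/tmp/model`. The Pod name, image, and region value are also assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: download-hf-to-s3         # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: downloader
      image: python:3.11-slim     # assumed base image
      env:
        - name: MODEL_NAME
          value: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
        - name: BUCKET
          value: hf-models
        - name: ENDPOINT_URL
          value: "http://minio:9000"            # assumed MinIO endpoint
        - name: AWS_DEFAULT_REGION
          value: us-east-1                      # assumed; the AWS CLI expects a region
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef: {name: s3-secret, key: aws.accessKeyId}
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef: {name: s3-secret, key: aws.secretKey}
      command: ["sh", "-c"]
      args:
        - |
          set -eu
          pip install --no-cache-dir huggingface_hub awscli
          # Download the full model repository to local disk.
          python -c "import os; from huggingface_hub import snapshot_download; snapshot_download(repo_id=os.environ['MODEL_NAME'], local_dir='/tmp/model')"
          # Create the bucket if it does not exist yet (ignore the error if it does).
          aws --endpoint-url "$ENDPOINT_URL" s3 mb "s3://$BUCKET" || true
          # Upload the files under a prefix matching the model name.
          aws --endpoint-url "$ENDPOINT_URL" s3 cp /tmp/model "s3://$BUCKET/$MODEL_NAME" --recursive
```

Gated models would additionally need a Hugging Face token exposed to the container, which this sketch leaves out.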
You can then use s3://hf-models/deepseek-ai/DeepSeek-R1-Distill-Llama-8B as the mount point of your Dataset.
Mount the Fluid-generated PVC in your DynamoGraphDeployment:
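The field layout of DynamoGraphDeployment differs between Dynamo releases, so the sketch below uses a plain Pod to show the standard Kubernetes volume stanza instead. The `volumes` and `volumeMounts` entries are what you would carry over to the worker's pod template; where exactly they go in the custom resource is something to check against the CRD for your version. The Pod name, image, and mount path are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: model-reader              # hypothetical name
spec:
  containers:
    - name: worker
      image: busybox:1.36         # stand-in image, used only to list the mounted files
      command: ["sh", "-c", "ls -R /model && sleep 3600"]
      volumeMounts:
        - name: model-cache
          mountPath: /model       # assumed path; point your model loader here
          readOnly: true
  volumes:
    - name: model-cache
      persistentVolumeClaim:
        claimName: s3-model       # PVC created by Fluid for the Dataset
```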
When deploying Llama 3.3 70B with Fluid as the caching layer, we observed the best performance with a single-node cache that holds 100% of the model files locally. Scheduling the vLLM worker pod on the same node as the Fluid cache eliminated network I/O bottlenecks, which gave the fastest model startup time and the highest inference efficiency in our tests.
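A single-node cache of this kind can be sketched with one Alluxio worker and a DataLoad that warms the whole dataset. The Dataset name `s3-model`, the `default` namespace, the local cache path, and the quota are assumptions; the quota needs to exceed the total size of the model files for the cache to hold all of them.

```yaml
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: s3-model                  # assumed Dataset name
spec:
  replicas: 1                     # one cache worker, so all cached data sits on one node
  tieredstore:
    levels:
      - mediumtype: SSD
        path: /mnt/fluid-cache    # hypothetical local disk path
        quota: 200Gi              # assumed; must exceed the total model size
        high: "0.95"
        low: "0.7"
---
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: s3-model-warmup           # hypothetical name
spec:
  dataset:
    name: s3-model
    namespace: default            # assumed namespace
  loadMetadata: true
  target:
    - path: /                     # preload every file in the dataset
      replicas: 1
```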
The associated DynamoGraphDeployment uses pod affinity to schedule the vLLM worker on the same node as the Alluxio cache worker.
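The affinity rule itself is standard Kubernetes and is shown here on a plain Pod as a sketch; the same `affinity` block would be carried into the worker's pod template of the DynamoGraphDeployment, at whatever field path your CRD version provides. The label values are assumptions about how Fluid labels Alluxio worker pods, so confirm them with `kubectl get pods --show-labels` before relying on them. The Pod name, image placeholder, and mount path are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vllm-worker               # hypothetical name
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: alluxio              # assumed label
              role: alluxio-worker      # assumed label
              release: s3-model         # assumed: runtime/Dataset name
          topologyKey: kubernetes.io/hostname   # "same node" granularity
  containers:
    - name: worker
      image: "<your-vllm-worker-image>"         # placeholder
      volumeMounts:
        - name: model-cache
          mountPath: /model
          readOnly: true
  volumes:
    - name: model-cache
      persistentVolumeClaim:
        claimName: s3-model
```

A `required` rule keeps the worker pending if the cache node has no free capacity; `preferredDuringSchedulingIgnoredDuringExecution` is the softer alternative when that trade-off is unacceptable.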