Large Language Models (1.1.0)

Multi-node Deployment

Note

Requires NIM version 1.1.0+

Some models are too large to be deployed on a single node, even when using multiple GPUs. For these models, you can split the model weights across nodes, and across the GPUs within each node, by deploying NIM on multiple nodes that all have access to the model weights.
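As a rough illustration of what "split across nodes and across the GPUs within each node" means, the following sketch assigns each weight shard to a (node, local GPU) pair. It assumes one shard per GPU, the same number of GPUs on every node, and a contiguous layout that fills one node before moving to the next. The function name `shard_placement` is hypothetical and is not part of NIM; NIM's actual partitioning scheme is not described here.

```python
def shard_placement(num_shards: int, gpus_per_node: int) -> list[tuple[int, int]]:
    """Map each shard index to a (node index, local GPU index) pair.

    Assumes one shard per GPU and identical GPU counts on every node;
    shards fill node 0 first, then node 1, and so on.
    """
    if num_shards < 1 or gpus_per_node < 1:
        raise ValueError("num_shards and gpus_per_node must be positive")
    # divmod(shard, gpus_per_node) yields (node index, GPU index within that node).
    return [divmod(shard, gpus_per_node) for shard in range(num_shards)]


if __name__ == "__main__":
    # Hypothetical example: 16 shards spread over nodes with 8 GPUs each.
    for shard, (node, gpu) in enumerate(shard_placement(16, 8)):
        print(f"shard {shard:2d} -> node {node}, GPU {gpu}")
```

With 16 shards and 8 GPUs per node, shards 0 to 7 land on node 0 and shards 8 to 15 land on node 1, so no single node has to hold the full set of weights.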

To determine whether your model requires multi-node deployment, find the number of GPUs required for your desired model in the Model Support Matrix. If you don't have a single node with at least that many GPUs, you must use multi-node deployment.
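The check above reduces to a simple comparison, and the same numbers give a lower bound on how many nodes you need. The sketch below assumes every node has the same number of GPUs; the function names are hypothetical, and the GPU counts in the example are made up rather than taken from the Model Support Matrix. The actual GPU layouts NIM supports for a given model may impose further constraints beyond this arithmetic.

```python
def requires_multi_node(required_gpus: int, gpus_per_node: int) -> bool:
    """Return True if no single node has enough GPUs for the model."""
    return gpus_per_node < required_gpus


def min_nodes(required_gpus: int, gpus_per_node: int) -> int:
    """Smallest number of identical nodes whose combined GPUs meet the requirement."""
    if required_gpus < 1 or gpus_per_node < 1:
        raise ValueError("GPU counts must be positive")
    # Integer ceiling division: round up so the last partial node is counted.
    return (required_gpus + gpus_per_node - 1) // gpus_per_node


if __name__ == "__main__":
    # Hypothetical model needing 16 GPUs, deployed on nodes with 8 GPUs each.
    print(requires_multi_node(16, 8), min_nodes(16, 8))
```

For the hypothetical 16-GPU model on 8-GPU nodes, this prints `True 2`: a single node is insufficient, and at least two nodes are needed.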

Multi-node deployment requires coordinating the creation of NIM containers across multiple nodes and setting up a method of communication between those containers. The recommended approach for this orchestration is to use Kubernetes with nim-deploy.

© Copyright © 2024, NVIDIA Corporation. Last updated on Sep 9, 2024.