Grove is a Kubernetes API designed to address the orchestration challenges of modern AI workloads, particularly disaggregated inference systems. It integrates with NVIDIA Dynamo for AI infrastructure management.
Grove was originally motivated by the challenges of orchestrating multinode, disaggregated inference systems. It provides a single, consistent API that lets users define, configure, and scale prefill, decode, and other components such as routing within one custom resource.
Grove enables disaggregated serving by breaking down large language model inference into separate, specialized components that can be independently scaled and managed. This architecture provides several advantages:
Grove implements disaggregated serving through several custom Kubernetes resources that provide declarative composition of role-based pod groups:
A PodCliqueSet is the top-level Grove object. It defines a group of components that are managed and colocated together.
A PodClique represents a group of pods with a specific role (e.g., leader, worker, frontend).
A PodCliqueScalingGroup is a set of PodCliques that scale and are scheduled together. It suits tightly coupled roles, such as prefill leader and worker components, that need coordinated scaling behavior.
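The relationship between PodCliqueSet, PodClique, and PodCliqueScalingGroup can be pictured with a small Python model. This is an illustrative sketch only: the class and field names (`Clique`, `ScalingGroup`, `CliqueSet`, `replicas`) are hypothetical stand-ins rather than Grove's actual API schema, and the pod-count arithmetic assumes that each scaling-group replica holds a full copy of its member cliques.

```python
from dataclasses import dataclass, field


@dataclass
class Clique:
    """A group of pods sharing one role (hypothetical model of a PodClique)."""
    role: str
    replicas: int  # number of pods in this clique


@dataclass
class ScalingGroup:
    """Cliques that scale together (hypothetical model of a PodCliqueScalingGroup)."""
    name: str
    cliques: list[Clique]
    replicas: int = 1  # assumption: each replica is a full copy of every member clique

    def pod_count(self) -> int:
        return self.replicas * sum(c.replicas for c in self.cliques)


@dataclass
class CliqueSet:
    """Top-level object (hypothetical model of a PodCliqueSet)."""
    name: str
    standalone_cliques: list[Clique] = field(default_factory=list)
    scaling_groups: list[ScalingGroup] = field(default_factory=list)

    def pod_count(self) -> int:
        standalone = sum(c.replicas for c in self.standalone_cliques)
        grouped = sum(g.pod_count() for g in self.scaling_groups)
        return standalone + grouped


# Prefill leader and workers scale as one unit; the frontend scales on its own.
prefill = ScalingGroup(
    name="prefill",
    cliques=[Clique("prefill-leader", 1), Clique("prefill-worker", 3)],
    replicas=2,
)
serving = CliqueSet(
    name="llm-serving",
    standalone_cliques=[Clique("frontend", 2)],
    scaling_groups=[prefill],
)
print(serving.pod_count())
```

In this model, raising `prefill.replicas` adds one leader and three workers at a time, which is the coordinated scaling behavior the scaling group is meant to capture.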
Grove provides several specialized features that make it particularly well-suited for disaggregated serving:
PodCliques and PodCliqueScalingGroups allow users to specify flexible gang-scheduling requirements at multiple levels within a PodCliqueSet to prevent resource deadlocks and ensure all components of a disaggregated system start together.
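The deadlock that gang scheduling avoids can be shown with a toy all-or-nothing admission check. This is a minimal sketch under stated assumptions, not the algorithm used by Grove or its scheduler: the function names, the GPU counts, and the single shared pool of GPUs are all hypothetical.

```python
def gang_fits(free_gpus: int, gang: dict[str, int]) -> bool:
    """Return True only if every role in the gang can be placed at once.

    `gang` maps a role name to the number of GPUs that role needs to start.
    """
    return sum(gang.values()) <= free_gpus


def schedule(free_gpus: int, gangs: list[dict[str, int]]) -> list[int]:
    """Admit gangs in order, never placing only part of a gang.

    Returns the indices of the admitted gangs.
    """
    admitted = []
    for index, gang in enumerate(gangs):
        if gang_fits(free_gpus, gang):
            free_gpus -= sum(gang.values())
            admitted.append(index)
    return admitted


# Two deployments, each needing 4 GPUs for prefill and 2 for decode.
deployment_a = {"prefill": 4, "decode": 2}
deployment_b = {"prefill": 4, "decode": 2}

print(schedule(8, [deployment_a, deployment_b]))
```

With 8 free GPUs, only the first deployment is admitted. A scheduler that placed pods one role at a time could instead give both prefill groups 4 GPUs each, leaving neither deployment able to start its decode pods.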
Grove supports pluggable horizontal autoscaling solutions that scale PodCliqueSet, PodClique, and PodCliqueScalingGroup custom resources independently, each based on its own metrics and requirements.
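As one illustration of independent scaling, the replica formula documented for the Kubernetes Horizontal Pod Autoscaler can be applied separately to each component. The sketch below rests on assumptions: the component names, metric readings, and targets are made up, the autoscaler's tolerance band is omitted, and Grove's pluggable autoscalers are not required to use this formula.

```python
import math


def desired_replicas(current: int, metric: float, target: float) -> int:
    """Kubernetes HPA formula, ceil(current * metric / target), floored at 1 here."""
    return max(1, math.ceil(current * metric / target))


# Hypothetical readings per component: (current replicas, observed metric, target metric).
components = {
    "prefill": (2, 30, 10),   # observed queue depth is three times its target
    "decode": (4, 50, 100),   # observed utilization is half its target
}

plan = {name: desired_replicas(*reading) for name, reading in components.items()}
print(plan)
```

Here prefill scales up while decode scales down in the same evaluation, which is only possible when each component is scaled on its own metric.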
Grove lets users specify network topology pack and spread constraints to optimize for both network performance and service availability. This is crucial for disaggregated systems, whose components need efficient inter-node communication.
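The difference between pack and spread constraints can be sketched with a toy node selector. This is an illustrative model only: the `pack` and `spread` functions, the node names, and the use of a rack as the topology domain are hypothetical and do not reflect how Grove expresses or enforces these constraints.

```python
from collections import defaultdict


def group_by_rack(nodes: dict[str, str]) -> dict[str, list[str]]:
    """Group node names by rack, with nodes sorted for deterministic output."""
    by_rack = defaultdict(list)
    for node, rack in sorted(nodes.items()):
        by_rack[rack].append(node)
    return dict(by_rack)


def pack(nodes: dict[str, str], count: int) -> list[str]:
    """Pick `count` nodes from a single rack, or [] if no rack is large enough."""
    by_rack = group_by_rack(nodes)
    for rack in sorted(by_rack):
        if len(by_rack[rack]) >= count:
            return by_rack[rack][:count]
    return []


def spread(nodes: dict[str, str], count: int) -> list[str]:
    """Pick up to `count` nodes, taking one from each rack in turn."""
    by_rack = group_by_rack(nodes)
    racks = sorted(by_rack)
    chosen = []
    while len(chosen) < count and any(by_rack[r] for r in racks):
        for rack in racks:
            if by_rack[rack] and len(chosen) < count:
                chosen.append(by_rack[rack].pop(0))
    return chosen


# Hypothetical cluster: node name -> rack label.
cluster = {"n1": "rack-a", "n2": "rack-a", "n3": "rack-b", "n4": "rack-b"}

print(pack(cluster, 2))    # both nodes share a rack, favoring network locality
print(spread(cluster, 2))  # nodes land in different racks, favoring availability
```

Packing suits pods that exchange large amounts of data, while spreading suits replicas that should not share a failure domain.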
Grove lets users prescribe, in a declarative specification, the order in which PodCliques must start. Pod startup is decoupled from pod creation and scheduling, which ensures that disaggregated components initialize in the proper order.
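A declared startup order is essentially a dependency graph, and the resulting start sequence can be derived with a topological sort. The sketch below uses Python's standard `graphlib` module. The clique names and the `starts_after` mapping are hypothetical examples, not Grove's actual field names or its implementation.

```python
from graphlib import TopologicalSorter

# Hypothetical declaration: clique -> cliques that must have started first.
starts_after = {
    "frontend": set(),
    "decode-leader": {"frontend"},
    "decode-worker": {"decode-leader"},
    "prefill-leader": {"frontend"},
    "prefill-worker": {"prefill-leader"},
}


def startup_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group cliques into waves; every clique in a wave may start in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the declared order is circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(startup_waves(starts_after), start=1):
    print(number, wave)
```

All pods could already be created and scheduled before the first wave begins. The ordering only governs when each clique is allowed to start, which mirrors the decoupling described above.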
Grove specifically supports:
Grove is aligned with NVIDIA Dynamo so that the two integrate cleanly within the AI infrastructure stack.
Grove is also aligning its release schedule with NVIDIA Dynamo, with the finalized release cadence reflected in the project roadmap.
The integration creates a comprehensive platform where:
Grove represents a significant advancement in Kubernetes-based orchestration for AI workloads by:
Grove relies on KAI Scheduler for resource allocation and scheduling.
To deploy KAI Scheduler, see the KAI Scheduler Deployment Guide.
For installation instructions, see the Grove Installation Guide.
For practical examples of Grove-based multinode deployments, see the Multinode Deployment Guide, which demonstrates multinode disaggregated serving scenarios.
For the latest updates on Grove, refer to the official project on GitHub.
The Dynamo Kubernetes Platform can also install Grove and KAI Scheduler as part of the platform installation. See the Dynamo Kubernetes Platform Deployment Installation Guide for details.