Nvidia has introduced Grove, an open source Kubernetes API for orchestrating AI inference workloads across thousands of GPUs. Available through GitHub or as a modular component within Nvidia’s Dynamo platform, Grove makes managing large-scale AI systems a streamlined, Kubernetes-native process.
By open sourcing Grove, Nvidia is extending its high-performance orchestration technology beyond its own proprietary ecosystem, allowing developers and cloud operators to adopt, customise, and scale AI inference workloads collaboratively. The move bridges proprietary GPU infrastructure and open, community-driven AI deployment frameworks, a step towards more scalable and transparent AI systems management.
Technically, Grove introduces autoscaling that treats related resources as a single Kubernetes entity, scaling everything from individual components up to full service replicas. It also addresses the growing preference for disaggregated inference, which splits workloads into a prefill (context processing) phase and a decode (token generation) phase, each running on dedicated hardware resources.
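To make this concrete, the sketch below declares a prefill/decode group as a single custom resource and applies it with the Kubernetes Python client. It is a minimal illustration only: the resource kind is taken from Nvidia’s description, but the API group, version, and every spec field are assumptions rather than Grove’s published schema, so the actual CRDs on GitHub will differ.

```python
# A minimal sketch: declaring a disaggregated serving group as one custom
# resource. The API group/version and spec fields are illustrative
# assumptions, not Grove's published schema.
from kubernetes import config, dynamic
from kubernetes.client import api_client


def apply_inference_group() -> None:
    # Connect via the local kubeconfig, following the official
    # dynamic-client examples for the kubernetes Python package.
    client = dynamic.DynamicClient(
        api_client.ApiClient(configuration=config.load_kube_config())
    )
    # Look up the (hypothetical) custom resource type on the cluster.
    group = client.resources.get(
        api_version="grove.example/v1alpha1", kind="PodCliqueScalingGroup"
    )
    manifest = {
        "apiVersion": "grove.example/v1alpha1",
        "kind": "PodCliqueScalingGroup",
        "metadata": {"name": "llm-serving", "namespace": "inference"},
        "spec": {
            # Each replica is a complete prefill+decode unit, scaled as
            # one Kubernetes entity rather than as unrelated Deployments.
            "replicas": 2,
            "cliques": [
                {"name": "prefill", "pods": 4},  # context processing
                {"name": "decode", "pods": 8},   # token generation
            ],
        },
    }
    group.create(body=manifest, namespace="inference")
```

The point of the single-resource shape is that scaling up adds a whole prefill-plus-decode unit at once, rather than letting the two phases drift out of proportion.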
Going beyond traditional gang scheduling, Grove’s PodCliqueScalingGroups bundle tightly coupled Kubernetes pods so they can be scheduled and scaled efficiently as a unit. Nvidia engineers explained: “When scaling for additional capacity, Grove creates complete replicas, defines spread constraints that distribute these replicas across the cluster for high availability, while keeping each replica’s components network-packed for optimal performance.”
The engineers added: “The result is a coordinated deployment of multicomponent AI systems preventing resource fragmentation, avoiding partial deployments, and enabling stable, efficient operation of complex model-serving pipelines at scale.”
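Grove manages this placement automatically, but for intuition the behaviour the engineers describe resembles combining two stock Kubernetes scheduling primitives: topology spread constraints to distribute replicas across failure domains, and pod affinity to keep each replica’s components network-packed. The fragment below uses only standard Kubernetes fields; the label keys are hypothetical examples.

```python
# For intuition only: Grove handles placement itself, but the described
# effect is comparable to combining these two stock Kubernetes scheduling
# fields. The label keys ("app", "replica-id") are hypothetical.
pod_spec_fragment = {
    "topologySpreadConstraints": [
        {
            # Spread complete replicas across zones for high availability.
            "maxSkew": 1,
            "topologyKey": "topology.kubernetes.io/zone",
            "whenUnsatisfiable": "DoNotSchedule",
            "labelSelector": {"matchLabels": {"app": "llm-serving"}},
        }
    ],
    "affinity": {
        "podAffinity": {
            # Pack one replica's prefill and decode pods onto the same
            # node (or nearby nodes) so their traffic stays network-local.
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "topologyKey": "kubernetes.io/hostname",
                    "labelSelector": {
                        "matchLabels": {"replica-id": "replica-0"}
                    },
                }
            ]
        }
    },
}
```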
As AI inference workloads continue to expand, Grove offers an open source, Kubernetes-native way to orchestrate multicomponent, multi-GPU AI systems at scale.



