NVIDIA Grove

Type: Tool
Tags: NVIDIA, Grove, Dynamo, Kubernetes, inference, disaggregated serving, topology-aware scheduling, gang scheduling
Related: NVIDIA-Dynamo, Dynamo-Disaggregated-Serving, Dynamo-Planner, Dynamo-Profiler, KAI-Scheduler, NVIDIA-Run-ai, NVIDIA-Cloud-Accelerator-NCX, NIXL, TensorRT-LLM, NVIDIA-NIM
Sources: https://docs.nvidia.com/ncx/index.html; https://github.com/ai-dynamo/grove; https://docs.nvidia.com/dynamo/latest/kubernetes-deployment/multinode/topology-aware-scheduling; https://docs.nvidia.com/dynamo/latest/components/planner; https://www.nvidia.com/en-us/software/run-ai/
Last Updated: 2026-04-29

Summary

Grove is a Kubernetes API for orchestrating multi-component AI inference workloads. NVIDIA NCX describes it as a modular component of NVIDIA-Dynamo, and the public Grove project describes it as a single declarative interface for workloads ranging from single-pod deployments to multi-node, disaggregated systems. Grove matters for NVIDIA inference because it supplies the deployment structure that large disaggregated serving systems need: topology-aware placement, gang scheduling, and autoscaling.

Detail

Grove lets operators describe an entire inference serving system as a single custom resource. Components such as prefill, decode, routing, leader, worker, frontend, or other roles are modeled as PodCliques and related scaling/scheduling groups.
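To make the "one resource, many roles" idea concrete, the sketch below models a serving system as a single declarative object that lists its role groups. It is only an illustration in plain Python: the names `RoleGroup` and `ServingSystem`, and their fields, are hypothetical and are not Grove's actual custom resource schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RoleGroup:
    """One role in the serving system (hypothetical stand-in for a PodClique)."""
    name: str
    replicas: int
    gpus_per_pod: int = 0


@dataclass(frozen=True)
class ServingSystem:
    """The whole system described as one object (hypothetical shape, not Grove's schema)."""
    name: str
    roles: Tuple[RoleGroup, ...] = ()

    def total_pods(self) -> int:
        # Every replica of every role is one pod.
        return sum(role.replicas for role in self.roles)

    def total_gpus(self) -> int:
        # GPU demand of the whole system, useful when sizing a cluster.
        return sum(role.replicas * role.gpus_per_pod for role in self.roles)


# Assumed example: one frontend, two prefill workers, four decode workers.
system = ServingSystem(
    name="llm-disagg",
    roles=(
        RoleGroup("frontend", replicas=1),
        RoleGroup("prefill", replicas=2, gpus_per_pod=4),
        RoleGroup("decode", replicas=4, gpus_per_pod=2),
    ),
)
```

Because the whole system lives in one object, questions such as "how many pods and GPUs does this deployment need in total?" can be answered from a single description instead of from several independently managed workloads.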

The motivation is that modern inference systems often need behavior Kubernetes does not provide natively: multi-node scaling units, hierarchical gang scheduling, explicit startup ordering, and topology-aware placement. This matters for LLM serving because tightly coupled components, such as disaggregated prefill and decode workers, pipeline components, and routers with their workers, can fail or underperform when their pods are scheduled independently and without topology constraints.
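The all-or-nothing nature of gang scheduling can be illustrated with a toy placement function. This is a minimal sketch under stated assumptions, not the algorithm used by Grove or KAI Scheduler: the function name `gang_place`, the node names, and the first-fit heuristic are all chosen here for illustration.

```python
from typing import Dict, List, Optional


def gang_place(
    pod_gpu_requests: List[int], free_gpus: Dict[str, int]
) -> Optional[Dict[int, str]]:
    """Place every pod of the gang, or none of them.

    Returns a mapping of pod index to node name, or None if the gang does not fit.
    """
    remaining = dict(free_gpus)  # work on a copy so a failed attempt leaves no partial state
    placement: Dict[int, str] = {}
    # Toy heuristic: place the largest requests first, each on the first node that fits.
    order = sorted(range(len(pod_gpu_requests)), key=lambda i: -pod_gpu_requests[i])
    for idx in order:
        need = pod_gpu_requests[idx]
        node = next((n for n in sorted(remaining) if remaining[n] >= need), None)
        if node is None:
            return None  # one pod cannot be placed, so the whole gang is rejected
        remaining[node] -= need
        placement[idx] = node
    return placement


# Assumed example cluster: two nodes with 8 and 4 free GPUs.
cluster = {"node-a": 8, "node-b": 4}
fits = gang_place([4, 4, 2], cluster)
rejected = gang_place([8, 8], cluster)
```

In the second call one 8-GPU pod would fit on its own, but the function returns `None` instead of a partial placement. That is the property that keeps a half-started prefill/decode group from holding GPUs while waiting for peers that can never be scheduled. Since first-fit is only a heuristic, this sketch may reject a gang that a smarter packing could place.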

In NVIDIA’s stack, Grove connects NVIDIA-Dynamo inference deployment to KAI-Scheduler placement. The Dynamo documentation says that topology-aware scheduling uses Grove, ClusterTopology resources, and KAI Scheduler so that related pods can be placed within an appropriate topology domain, such as a rack or block.
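The idea of keeping related pods inside one topology domain can be sketched as a search for the narrowest domain with enough free capacity. The code below is a simplified illustration, not the behavior of KAI Scheduler or the ClusterTopology resource: the function `pick_domain`, the two-level rack/block hierarchy, and the node labels are assumptions, rack names are assumed to be unique across blocks, and per-node fragmentation is ignored.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

# node name -> (block, rack, free GPUs); labels are hypothetical
Node = Tuple[str, str, int]


def pick_domain(nodes: Dict[str, Node], gpus_needed: int) -> Optional[Tuple[str, str]]:
    """Return (level, domain) for the narrowest domain that can hold the whole group."""
    # Try the narrowest level first (rack), then widen to block.
    for level, field_index in (("rack", 1), ("block", 0)):
        capacity: Dict[str, int] = defaultdict(int)
        for block, rack, free in nodes.values():
            domain = (block, rack)[field_index]
            capacity[domain] += free
        candidates = [d for d, cap in capacity.items() if cap >= gpus_needed]
        if candidates:
            # Prefer the tightest fit to limit fragmentation; break ties by name.
            best = min(candidates, key=lambda d: (capacity[d], d))
            return level, best
    return None  # no single rack or block can host the group


# Assumed example inventory.
inventory = {
    "n1": ("b1", "r1", 4),
    "n2": ("b1", "r1", 4),
    "n3": ("b1", "r2", 8),
    "n4": ("b2", "r3", 2),
}
```

With this inventory, a group needing 8 GPUs fits inside a single rack, a group needing 12 GPUs has to widen to a block, and a group needing 32 GPUs fits in no single domain. A real scheduler would also have to check that individual pods fit on individual nodes, which this sketch deliberately leaves out.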

Connections

  • NVIDIA-Dynamo - Grove is a modular Dynamo component for declarative multi-component inference deployments.
  • Dynamo-Disaggregated-Serving - Grove is relevant when prefill and decode workers need coordinated scheduling.
  • Dynamo-Planner - Planner scaling decisions need a placement layer to add/remove workers safely.
  • Dynamo-Profiler - generated deployment recommendations can flow into Grove-managed Kubernetes deployments.
  • KAI-Scheduler - Grove relies on KAI for topology-aware and gang-scheduled placement.
  • NVIDIA-Run-ai - NVIDIA Run:ai product materials list Grove as a topology-optimized serving component.
  • NVIDIA-Cloud-Accelerator-NCX - NCX lists Grove among the modular software components for AI cloud operators.
  • NIXL - disaggregated serving often depends on fast KV cache and tensor movement across topology-aware placements.
  • TensorRT-LLM - optimized LLM runtimes are common payloads for multi-component inference services.
  • NVIDIA-NIM - NIM deployments can sit above the inference-serving substrate that Dynamo/Grove manage.

Source Excerpts

  • “Grove is a Kubernetes API that provides a single declarative interface.”
  • “KAI Scheduler is required by Grove for topology-aware pod placement.”