NVIDIA Dynamo Goes Enterprise As Gcore Rolls Out One-Click AI Inference

NVIDIA Dynamo Moves From Codebase To Production As Gcore Launches One-Click Managed AI Inference With 6x Throughput And Lower Costs

Gcore turns NVIDIA’s open source Dynamo into a fully managed, one-click inference service, helping enterprises deploy faster, cheaper, and more efficient AI workloads across any cloud or on-prem setup.

Gcore has integrated NVIDIA’s open-source Dynamo inference framework into its AI stack, transforming the community-built technology into a fully managed, production-ready service for enterprises. Delivered as a one-click deployment, the offering is now available on Gcore Everywhere Inference and Gcore Everywhere AI across public, private, hybrid and on-prem environments.

The move shifts Dynamo from codebase to infrastructure, removing operational barriers that typically slow open-source adoption while enabling organisations to run large-scale generative AI and inference workloads without managing routing, KV cache logic or GPU scheduling.

The performance gains are significant. Gcore reports up to 6× higher throughput and up to 2× lower latency, alongside improved GPU utilisation, reduced wasted cycles, lower cost per token and stronger ROI.

Dynamo is designed to address persistent inference bottlenecks such as GPU underutilisation, static resource allocation, memory limits and inefficient data movement. It uses techniques including prefill/decode disaggregation, KV cache-aware routing, dynamic scheduling and efficient inter-node communication to process more requests on the same hardware.
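To illustrate one of these techniques, here is a minimal, hypothetical sketch of KV cache-aware routing. This is not Dynamo's actual implementation; it simply shows the underlying idea: send each request to the worker that already holds the longest matching token prefix in its KV cache, so prefill work is not repeated, and fall back to the least-loaded worker when no cache hit exists.

```python
# Illustrative sketch only: KV cache-aware routing, not NVIDIA Dynamo's
# real code. Worker and route() are hypothetical names for this example.
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    active_requests: int = 0
    cached_prefixes: set = field(default_factory=set)

    def prefix_overlap(self, tokens: tuple) -> int:
        # Length of the longest prefix of `tokens` already in this
        # worker's KV cache (longer overlap = less prefill to redo).
        for n in range(len(tokens), 0, -1):
            if tokens[:n] in self.cached_prefixes:
                return n
        return 0


def route(workers: list, tokens: tuple) -> Worker:
    # Prefer the worker with the most reusable cache; break ties by
    # choosing the one with fewer in-flight requests.
    chosen = max(
        workers,
        key=lambda w: (w.prefix_overlap(tokens), -w.active_requests),
    )
    chosen.active_requests += 1
    # Record every prefix of this request so later requests that share
    # a prefix (e.g. the same system prompt) route to the warm cache.
    for n in range(1, len(tokens) + 1):
        chosen.cached_prefixes.add(tokens[:n])
    return chosen


workers = [Worker("gpu-0"), Worker("gpu-1")]
first = route(workers, ("sys", "prompt", "question-A"))
# Shares the ("sys", "prompt") prefix, so it lands on the same worker.
second = route(workers, ("sys", "prompt", "question-B"))
```

In a real system the cache index would track token-block hashes and evictions rather than whole prefixes, but the routing decision — weigh cache overlap against load — is the same shape.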

Seva Vayner, Product Director of Edge Cloud and AI at Gcore, said: “Modern inference isn’t just ‘run a model’: it’s batching, routing, dynamic workloads, longer contexts, and tight SLOs. In that reality, small scheduling and utilisation losses become big performance and cost penalties. By integrating Dynamo as a managed service in Gcore, we bring advanced GPU optimisation directly into the runtime path so customers see higher effective throughput and steadier tail latency, without operating the complexity themselves.”

Dynamo-powered inference is available now, with live demonstrations planned at Mobile World Congress and NVIDIA GTC.
