One-click deployment of NVIDIA's open-source inference framework across public, private, hybrid, and on-prem environments
LUXEMBOURG, Feb. 25, 2026 /PRNewswire/ -- Gcore, the global infrastructure and software provider for AI, cloud, network, and security solutions, today announced the integration of NVIDIA Dynamo into its AI inference solutions. Delivered as a fully managed, one-click deployment, the integration brings significant GPU efficiency gains: up to 6x higher throughput and 2x lower latency. Dynamo is available now on Gcore Everywhere Inference and Gcore Everywhere AI.
NVIDIA Dynamo is an open-source inference framework designed to accelerate and optimize large-scale generative AI inference. Dynamo addresses the core challenges businesses face when running inference at scale: GPU underutilization, static resource allocation, memory bottlenecks, and data transfer inefficiency.
Gcore is delivering Dynamo as a fully managed solution, pre-optimized for popular inference models. Customers can activate Dynamo with a single click within the Gcore Customer Portal, without managing routing, KV cache logic, or GPU scheduling. This builds on Gcore's commitment to simplifying AI deployment through its intuitive, easy-to-use platform. The Dynamo integration is supported across private cloud, hybrid, and on-premises inference environments on Gcore Everywhere AI and Everywhere Inference.
Seva Vayner, Product Director of Edge Cloud and AI at Gcore, comments: "Modern inference isn't just 'run a model'—it's batching, routing, dynamic workloads, longer contexts, and tight SLOs. In that reality, small scheduling and utilization losses become big performance and cost penalties. By integrating Dynamo as a managed service in Gcore, we bring advanced GPU optimization directly into the runtime path so customers see higher effective throughput and steadier tail latency, without operating the complexity themselves."
Beyond performance gains, NVIDIA Dynamo delivers meaningful cost optimization by increasing GPU utilization and reducing wasted cycles during decode and cache recomputation. By disaggregating prefill and decode, applying KV cache-aware routing, and leveraging NIXL for efficient inter-node communication, Dynamo enables more requests to be processed on the same hardware. This lowers cost per token and improves overall ROI, and Gcore makes these efficiencies particularly easy to access at scale.
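To illustrate the idea behind KV cache-aware routing, the minimal Python sketch below routes each incoming prompt to the worker whose cache already holds the longest matching token prefix, so fewer prompt tokens must be recomputed during prefill. This is a simplified, hypothetical model for illustration only; the class and function names are invented here and do not reflect Dynamo's actual implementation.

```python
# Illustrative sketch of KV cache-aware routing (hypothetical, not Dynamo's code):
# send each request to the worker with the longest cached prompt prefix,
# maximizing KV cache reuse and minimizing recomputed prefill tokens.

def shared_prefix_len(a: list[int], b: list[int]) -> int:
    """Length of the common token prefix between two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

class KVCacheAwareRouter:
    def __init__(self, num_workers: int):
        # Each worker remembers the token sequences whose KV entries it has cached.
        self.caches: list[list[list[int]]] = [[] for _ in range(num_workers)]

    def route(self, prompt_tokens: list[int]) -> tuple[int, int]:
        """Pick the worker with the best cached prefix overlap.

        Returns (worker_id, tokens_reused_from_cache).
        """
        best_worker, best_overlap = 0, 0
        for wid, cached in enumerate(self.caches):
            overlap = max(
                (shared_prefix_len(prompt_tokens, seq) for seq in cached),
                default=0,
            )
            if overlap > best_overlap:
                best_worker, best_overlap = wid, overlap
        # Record the new prompt so future requests can reuse its KV entries.
        self.caches[best_worker].append(prompt_tokens)
        return best_worker, best_overlap

router = KVCacheAwareRouter(num_workers=2)
w1, reused1 = router.route([1, 2, 3, 4])  # cold cache: no tokens reused
w2, reused2 = router.route([1, 2, 3, 9])  # shares a 3-token prefix with the first
```

In this toy model, the second request lands on the same worker as the first and reuses its three-token shared prefix, skipping that portion of prefill. A production router must additionally weigh worker load and cache eviction, which this sketch omits.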
Dynamo-powered inference is available today on Gcore Everywhere Inference and Gcore Everywhere AI. Visit Gcore at MWC (Barcelona, March 2–5) or GTC (San Jose, March 16–19) for an in-person demonstration of NVIDIA Dynamo on Gcore.
About Gcore
Gcore is a global infrastructure and software provider for AI, cloud, network, and security solutions. Headquartered in Luxembourg, Gcore operates its own sovereign infrastructure across six continents, delivering ultra-low latency and compliance-ready performance for mission-critical workloads. Its AI-native cloud stack combines software innovation with hyperscaler-grade functionality, enabling enterprises and service providers to build, train, and scale AI everywhere—across public, private, and hybrid environments. By integrating AI, compute, networking, and security into a single platform, Gcore accelerates digital transformation and empowers organizations to unlock the full potential of AI-driven services.
Logo - https://mma.prnewswire.com/media/2527184/5820804/Gcore_Logo.jpg