Delivering a validated AI Factory stack for agent workloads on Nebius AI Cloud with DataRobot
March 18, 2026
AI agents are increasingly being deployed into live business workflows, where sustained inference performance, runtime governance and unit economics matter as much as model quality. Systems must handle continuous token generation, maintain low latency under load and enforce governance controls while keeping inference costs predictable.
At NVIDIA GTC 2026, Nebius and DataRobot, with NVIDIA, announced a validated AI Factory stack designed for production-grade agent workloads.
The DataRobot Agent Workforce Platform, co-engineered with NVIDIA, is now supported on Nebius AI Cloud through a validated configuration optimized for running agent workloads in production environments. This brings together comprehensive agent lifecycle management and a validated AI Factory stack purpose-built for sustained token throughput, ultra-low latency, workload isolation and production-grade elasticity with end-to-end governance.
The result is an end-to-end agent lifecycle management platform that gives enterprises a production-ready path for deploying AI agents into live workflows.
Nebius AI Cloud provides the infrastructure foundation for running agent workloads. Nebius Token Factory provides serverless access to models for generating tokens. DataRobot sits above these layers, providing the platform used to build, govern, and operate AI agents that rely on those models. Together, these components form the validated AI Factory stack used in this configuration:
DataRobot’s Agent Workforce Platform provides the enterprise layer for building, governing, deploying, and operating AI agents.
NVIDIA AI infrastructure and AI software — including NVIDIA NIM microservices and NVIDIA NeMo Guardrails integrated into DataRobot’s platform — provide the validated AI software components used to support agent execution, policy enforcement, and enterprise controls.
Nebius AI Cloud provides the purpose-built infrastructure layer for running agent workloads. This includes serverless experimentation through Nebius Token Factory, as well as dedicated clusters supporting NVIDIA NIM-based deployments for sustained production inference.
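As an illustration of the serverless layer, Token Factory-style endpoints follow the widely used OpenAI-compatible chat-completions request shape. The sketch below only assembles such a request without sending it; the base URL, model ID, and API key are placeholders for illustration, not real Nebius values.

```python
import json

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Assemble an OpenAI-style chat-completion request as (url, headers, body).

    Nothing is sent over the network; this only shows the request shape an
    agent layer would issue against a serverless model endpoint.
    """
    url = f"{base_url.rstrip('/')}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body

# Placeholder values only; consult the provider's docs for real endpoints.
url, headers, body = build_chat_request(
    base_url="https://api.tokenfactory.example.com/v1",
    api_key="YOUR_API_KEY",
    model="example/open-model",
    prompt="Summarize this support ticket.",
)
print(url)
```

Because the endpoint speaks a standard protocol, platforms like DataRobot can swap between serverless experimentation and dedicated NIM-based deployments by changing only the base URL and credentials.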
To validate the solution's performance, NVIDIA Dynamo was recently benchmarked on NVIDIA HGX B200 systems on Nebius AI Cloud, using gpt-oss-12b and the Mooncake Traces dataset. The results confirm the performance characteristics of this stack:
Up to 245,000 tokens per second total throughput on 8× HGX B200 systems;
Time-to-first-token measured in hundreds of milliseconds under load;
Zero-error operation in validated aggregated configurations;
KV-aware routing improving throughput by up to 17% and reducing latency by up to 47%;
Token processing costs on the order of pennies per million tokens on a single HGX B200 node.
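The cost figure follows from simple arithmetic: dollars per node-hour divided by tokens generated per node-hour. A minimal sketch, where the hourly rate is a made-up placeholder rather than a published Nebius price:

```python
def cost_per_million_tokens(node_hourly_usd: float, tokens_per_second: float) -> float:
    """Convert a node-hour price and sustained throughput into $ per 1M tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return node_hourly_usd / tokens_per_hour * 1_000_000

# From the benchmark: 245,000 tok/s across 8 nodes -> 30,625 tok/s per node.
per_node_tps = 245_000 / 8

# Hypothetical $10/node-hour rate, for illustration only.
print(round(cost_per_million_tokens(10.0, per_node_tps), 4))
```

At sustained per-node throughput in the tens of thousands of tokens per second, a wide range of plausible hourly rates lands in the cents-per-million-tokens range, which is where the "pennies" figure comes from.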
This benchmarking exercise demonstrated that deployments integrating the NVIDIA NVLink scale-up fabric deliver leading, production-ready throughput, while other configurations introduce transport-layer bottlenecks.
These findings reinforce the importance of validated infrastructure design for sustained agent workloads. Nebius AI Cloud provides the high-performance GPU infrastructure, while Nebius Token Factory exposes optimized model endpoints with transparent token-level usage, autoscaling, and built-in access controls. DataRobot extends this foundation with the agent control layer — governance, policy enforcement, lifecycle management, and agent-level observability — required to operate AI agents safely and reliably in production environments.
For a full breakdown of benchmarking, including methodology, configuration details, and architectural analysis, see the technical blog published by DataRobot.
Connect with Nebius (booth #713) and DataRobot (booth #104) at GTC 2026.
See our sessions at GTC: Enterprise AI at Speed: Simplifying AI Factory Deployments at Scale [S81852] (presented by DataRobot), and Scale Inference Using Open Models: How Nebius Token Factory Delivers Control and Efficiency [S82234] (presented by Nebius).