
The AI cloud will be won at the software layer
Yesterday we announced that Clarifai, a core AI engineering and research team, is joining Nebius. This follows the recently announced agreement to acquire Eigen AI. Both are deliberate moves on the same bet: that the AI infrastructure opportunity will be decided at the software layer.
Execution speed is the moat
Over the last couple of years, we have built a global cloud for AI from the ground up: data centers, GPU capacity, networking, operations, customer support, and a long list of built-in platform services and software that let any user or organization move faster than ever from AI prototype to production. All of this came together at an unusual pace. And that pace matters. The AI cloud is not a slow-moving market; execution speed is becoming the moat. The companies that win will be the ones that can build, integrate, and innovate faster than the market around them.
But building the infrastructure layer is only the first part.
Training remains essential. Frontier model development, fine-tuning, research workloads, and large-scale experimentation will continue to require massive amounts of compute. This is not going to go away.
But the center of gravity in AI is shifting. As models move from labs into products, inference becomes the workload that grows every day: every user, every prompt, every API call, every agentic workflow, every production deployment. Training is where models are created. Inference is where AI becomes a product.
That shift changes what customers need from an AI cloud. Access to GPUs is not enough. They need models to run fast, reliably, and, critically, cost-efficiently. A break in an API chain now means customer churn. Predictable performance at scale is imperative.
This is why Token Factory has become central to our AI cloud strategy.
Running inference well is a full-stack problem
Inference has three layers that need to work together:
The first is infrastructure: capacity, networking, storage, scheduling, reliability, and operational discipline. Nebius has built that foundation quickly and deliberately.
On top of that sits the model and runtime layer: optimization, quantization, deployment patterns. The work required to make models faster and cheaper to serve without compromising quality.
Finally, the system layer, where everything comes together: serving, orchestration, routing, hardware support, reliability, observability. All the practical engineering that bridges model and infrastructure so they perform in production.
Most providers focus on one or two of those layers. We are building an AI cloud where infrastructure and inference are designed to work together, always optimized across the complete stack. That is the difference between renting accelerators and delivering a production AI platform.
Eigen AI and Clarifai: two layers, one stack
The pace of AI infrastructure does not always leave room to build every capability organically. Some parts of the stack we build ourselves. For others, the right move is to bring in top-notch teams that have already spent years solving those hard problems. The key criterion is alignment with our vision: in this case, strengthening a layer of the stack that is pivotal to where we are going. Eigen AI and Clarifai do that in different ways.
Eigen AI strengthens the model and runtime layer: optimization, quantization, deployment patterns, and the work required to make models faster and cheaper to serve without compromising quality.
Clarifai strengthens the system layer: serving, orchestration, hardware support, reliability, and the practical engineering needed to make customer workloads run in production.
Together, they accelerate the inference stack we are building: fast, efficient, reliable, hardware-aware, and deeply integrated with Nebius cloud infrastructure.
Critically, inference is not a one-size-fits-all workload. Some customers optimize for latency, some for cost, some for compliance or data locality. Some want a managed cloud experience. Others run in hybrid, on-premise, or air-gapped environments. Many need flexibility across diverse hardware platforms. We are deeply conscious of that reality, and it has been a key design principle of everything we are building since the beginning of our journey.
Clarifai’s technology adds important experience here: running across different environments and hardware platforms, including cloud, on-premise, and air-gapped deployments. For customers, that flexibility can be the deciding factor. For Nebius, it expands what Token Factory can become.
Research with a production path
Nebius already has a strong AI research team: publishing at top AI conferences and shipping practical work in AI for software engineering, including the SWE-Rebench leaderboards.
Now Matthew Zeiler, Clarifai’s founder and CEO, will join us as SVP, Research to lead this organization. Matt is one of the early deep-learning and computer-vision builders, and he has spent years on the practical side of AI systems. His experience fits the direction we are taking: research that connects directly to production performance.
The next areas are clear. Agentic workloads are changing the shape of inference: more steps, more context, more tool use, more sensitivity to latency and cost. Physical AI brings another class of hard problems, especially around multimodal and real-world workloads.
The pace is the strategy
The broader point is simple: we are building the software layer of our AI cloud with the same commitment we brought to delivering our global infrastructure.
During our first year, we built the foundation. We have since hardened it for production at scale, stress-tested by the most demanding AI teams in the world. Now we are accelerating the inference stack on top of it through deep engineering and product focus, strong partnerships, research, and selective acquisitions of teams that already understand critical parts of the problem.
Teams like Eigen AI and Clarifai get us there faster.



