Watch the talks: Videos from Nebius AI Cloud Unveiled meetup

We wrapped up our first series of technical meetups with stops in Paris, London and San Francisco, each providing a deep dive into AI Cloud and AI Studio, the work of Nebius’ in-house AI R&D team, and our contributions to open source. Adam Grzywaczewski, Senior Deep Learning Data Scientist at NVIDIA, also joined us in London. Here, you can watch all the talks from the first event in the series.

How we build cloud for AI workloads

Gleb Kholodov, Head of Foundation Services at Nebius, shared insights into our hardware and software challenges, the decision-making process and the architecture behind Nebius AI Cloud.

NVIDIA’s talk: Laying foundations for the future of AI

Adam Grzywaczewski, Senior Deep Learning Data Scientist at NVIDIA, explored key highlights and guidance for choosing the best architecture for your AI projects.

Marrying Slurm and Kubernetes for workload management

Grigorii Rochev, Senior SRE, gave a talk on Soperator, the open-source Kubernetes operator for Slurm that we released six months ago. Soperator addresses the challenge of integrating Slurm with K8s, helping to manage the complexity of Slurm environments and compensate for the lack of native autoscaling. It also introduces additional features not available in either vanilla Slurm or vanilla Kubernetes.

What it takes to win the large-scale training game: we made the mistakes, so you don’t have to

Vasily Pantyukhin, our Head of Customer Experience, shared how to avoid the top mistakes and master best practices for scaling AI models.

Improving agentic systems with test-time computation

Boris Yangel, Head of AI R&D at Nebius and a research engineer with more than a decade of experience leading complex AI projects, spoke about his team’s recent research on combining guided search with agent inference, and how these techniques enable us to build better software engineering agents.

Inference: all you need to know about it

Last but not least, Head of Product Nikita Vdovushkin and Product Manager Roman Gaev spoke about their experience building Nebius AI Studio and its core Inference Service, which serves open-source GenAI models on a per-token basis. They also shared tips on choosing inference providers and models for your specific application.

Explore Nebius AI Cloud

Explore Nebius AI Studio

