Leveraging high-speed, rack-scale GPU interconnect with NVIDIA GB200 NVL72
Let’s explore one of the key features that makes the new NVIDIA GB200 NVL72 stand out: the fifth-generation NVIDIA NVLink™ scale-up fabric. We’ll discuss how it redefines infrastructure by moving beyond the traditional 8-GPU NVLink domain. You’ll see a practical example of how to take advantage of this capability. Finally, we’ll examine a real-world use case: pre-training the Nemotron-4 340B LLM.
Managed SkyPilot API Server on Nebius AI Cloud: Technical overview and setup
We’re launching Managed SkyPilot API Server on Nebius AI Cloud. It’s a fully managed service that transforms SkyPilot from a single-user tool into a shared platform where teams can pool resources, coordinate workloads and stop worrying about infrastructure operations.
Behind the AI Cloud “Aether” release: Giving enterprises the control they’ve been asking for
At Nebius, we’ve spent the past year working closely with enterprises that are moving AI projects from experiments to business-critical systems. The challenges they raise aren’t about “getting more GPUs” — they’re about how to govern, secure and scale AI infrastructure without creating bottlenecks for their teams. That’s the backdrop for our latest AI Cloud 3.0 release, named “Aether.”
Nebius meets enterprise-level security standards: ISO 27001, SOC 2 Type II (including HIPAA) and more
Today, we are thrilled to announce that we have achieved major security and compliance milestones. Independent third-party audits have verified that our security controls meet the requirements of SOC 2 Type II (including HIPAA) and align with the principles of NIS2 and DORA. We also obtained ISO 27001 certification and strengthened our practices by incorporating principles from ISO 27701, 27018, 27799 and 27032, as well as the standalone ISO 22301.
Scaling videogen with Baseten Inference Stack on Nebius
Serving AI companies and enterprises with text-to-video inference is no small feat. These teams demand enterprise-ready performance — at scale, with low latency, and high reliability. In this post, we’ll unpack the state-of-the-art engineering that enables Nebius and Baseten to deliver production-grade video generation — and show you how to test it yourself.
Nebius September digest: Microsoft deal, NVIDIA Exemplar Status & benchmark results
September was a landmark month for Nebius. From a major new customer for our AI infrastructure to industry-leading performance recognition, we’ve made strides that directly strengthen the systems you rely on.
Nebius achieves NVIDIA Exemplar Status on NVIDIA H200 GPUs for training workloads
We’re proud to announce that Nebius is one of the first NVIDIA Cloud Partners to achieve NVIDIA Exemplar Status on NVIDIA H200 GPUs for training workloads. This recognition validates that Nebius meets NVIDIA’s rigorous standards for performance, resiliency, and scalability — addressing one of the most pressing challenges in AI infrastructure: ensuring consistent workload performance and predictable cost across clouds.
How tokenizers work in AI models: A beginner-friendly guide
Before AI can generate text, answer questions or summarize information, it first needs to read and understand human language. That’s where tokenization comes in. A tokenizer takes raw text and breaks it into smaller pieces, or tokens. These tokens may represent whole words, parts of words or even individual characters, and each is mapped to a unique numerical ID that models can process mathematically. In this article, we’ll explore how tokenizers work, examine common approaches and walk through the basics of building one yourself.
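The text-to-ID mapping described above can be sketched in a few lines. This is a deliberately minimal word-level tokenizer; production tokenizers use subword schemes such as BPE or WordPiece, but the core idea of assigning each token a unique numerical ID is the same. All names here are illustrative:

```python
def build_vocab(corpus):
    """Assign a unique ID to each distinct token, in order of first appearance."""
    vocab = {}
    for text in corpus:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def encode(text, vocab, unk_id=None):
    """Convert raw text into a list of token IDs; unknown tokens map to unk_id."""
    return [vocab.get(token, unk_id) for token in text.lower().split()]

corpus = ["the cat sat", "the dog ran"]
vocab = build_vocab(corpus)
print(vocab)                          # {'the': 0, 'cat': 1, 'sat': 2, 'dog': 3, 'ran': 4}
print(encode("the dog sat", vocab))  # [0, 3, 2]
```

Subword tokenizers improve on this by splitting rare words into reusable fragments, which keeps the vocabulary small while avoiding unknown tokens.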
Build a multi-agent AI customer support system
This guide walks you through building a production-ready, multi-agent AI system by using the Google Agent Development Kit (ADK) and the A2A protocol, powered by Nebius AI Studio models. With sentiment detection, RAG-powered answers and escalation handling, you can automate customer queries end-to-end.
Model distillation with compute: How to set it up
Model distillation is a practical way to shrink large models into efficient versions that run faster and cost less. As parameter counts climb into the billions, distilling LLMs makes it possible to cut GPU memory use, speed up inference and simplify deployment. In this blog, we’ll explain how the method works, why GPU compute matters, and what to keep in mind when moving from research models to production systems.
Setting up RAG-powered content generation with Nebius AI Studio and Qdrant
Learn how to build a smart, scalable content generator by using Nebius AI’s Llama 3.3-70B and Qdrant’s vector search. This RAG-based system lets you upload brand-specific documents and get custom social posts, article drafts and more, rooted in your actual company data.
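The retrieval step of a RAG pipeline can be sketched without any external services. In the sketch below, a toy bag-of-words embedding and an in-memory cosine-similarity search stand in for the real embedding model and Qdrant’s vector database; the retrieved document is then folded into the prompt sent to the LLM. Every function and document here is a hypothetical placeholder:

```python
import math

def embed(text):
    """Toy embedding: bag-of-words counts over a tiny fixed vocabulary.
    A real pipeline would call a learned embedding model instead."""
    vocab = ["brand", "voice", "launch", "pricing", "support"]
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Rank stored documents by similarity to the query (the 'R' in RAG)."""
    qv = embed(query)
    return sorted(docs, key=lambda d: cosine(qv, embed(d)), reverse=True)[:k]

docs = [
    "our brand voice is friendly and direct",
    "pricing starts at ten dollars per seat",
]
top = retrieve("what is our brand voice", docs, k=1)
prompt = f"Using this context: {top[0]}\nDraft a social post in our brand voice."
print(prompt)
```

Swapping the toy pieces for a real embedding model and a Qdrant collection changes the storage and search layers, but the overall flow — embed the query, fetch the nearest documents, assemble the prompt — stays the same.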
Incident post-mortem analysis: us-central1 service disruption on September 3, 2025
A detailed analysis of the incident on September 3, 2025, that led to service outages in the us-central1 region.
The incident impacted API operations and Console functionality due to persistent routing loops between network domains, while other regions remained operational.
What is Jupyter Notebook in the context of AI
Jupyter Notebook is a browser-based tool for interactive coding, data exploration and documentation. It lets you run code step by step while combining results, visualizations and explanations in one place. Widely used in machine learning, it speeds up experimentation, ensures reproducibility and makes collaboration easier. This article looks at how Jupyter supports ML workflows, its key features and the tasks it handles best.