Even larger workloads are welcome! We raised quotas last month, so you can now access up to 32 NVIDIA H200 GPUs on demand in both the US and Europe. Speaking of Europe, we’ll be revealing many of our plans for the region at NVIDIA GTC Paris next week. Our research and other initiatives have continued as well — today’s digest covers it all.
We’re heading to Paris for two great events taking place at the same venue! Nebius’ ambition is to be the default-choice AI cloud in Europe and beyond. Why do we believe we can achieve this? Attend our sessions, meet us at our booths and talk to our customers to find out.
We’ve raised our self-service quotas once again. Now you can deploy even larger workloads entirely on your own, without contacting our team, via the Nebius console, API, CLI, Terraform or any of our integrations.
Our AI R&D team’s research paper, “Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents,” has been accepted to ICML 2025 — here’s the preprint on arXiv. The acceptance rate for this year’s conference after peer review was just 27%, not counting desk rejections.
SWE-rebench is our new benchmark for evaluating agentic LLMs on a continuously updated and decontaminated set of real-world software engineering tasks mined from public GitHub repositories.
As a peer-reviewed industry benchmark suite, MLPerf® Training by MLCommons® is one of the most trustworthy sources of data about AI cloud performance in the industry. We achieved a time to train of 124.5 minutes and 244.6 minutes for Llama 3.1 405B on 128-node and 64-node clusters, respectively. These numbers demonstrate near-linear scaling of Nebius infrastructure: doubling the node count almost exactly halves the training time.
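To see just how close to linear that is, here is the arithmetic on the two numbers quoted above:

```python
# Scaling efficiency implied by the MLPerf Training results above.
t_64 = 244.6   # minutes to train Llama 3.1 405B on 64 nodes
t_128 = 124.5  # minutes to train on 128 nodes

ideal_t_128 = t_64 / 2            # perfect linear scaling would halve the time
efficiency = ideal_t_128 / t_128  # how close we get to that ideal
print(f"Scaling efficiency at 128 nodes: {efficiency:.1%}")  # ~98.2%
```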
SGLang, an LLM inference framework, teamed up with Nebius to supercharge DeepSeek R1’s performance for real-world use. The team achieved a 2× throughput boost and significantly lower latency.
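If you want to try a similar setup yourself, here is a minimal sketch of querying a DeepSeek R1 model served by SGLang through its OpenAI-compatible API. The base URL assumes a local deployment on SGLang’s default port; point it at your own endpoint instead:

```python
# Query an SGLang server (OpenAI-compatible API) running DeepSeek R1.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",  # SGLang's default serving port; adjust to your deployment
    api_key="EMPTY",                       # SGLang requires no real key unless you configure one
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "Summarize tensor parallelism in two sentences."}],
)
print(response.choices[0].message.content)
```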
Converge Bio is redefining precision medicine by combining single-cell RNA sequencing with large language models to unlock patient-level therapeutic insights. With Nebius’ AI-native infrastructure, they’ve trained a full-transcriptome foundation model (Converge-SC) capable of processing 20,000+ genes per cell — delivering state-of-the-art accuracy, explainability and speed for drug discovery and clinical development.
Recent advancements in LLMs have opened new possibilities for generative molecular drug design. Researchers from YerevaNN and Yerevan State University presented three models, continuously pre-trained on Nebius on a novel corpus of 110M molecules with computed properties, totaling 40B tokens. A genetic algorithm integrates the models to optimize molecules with promising properties.
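To make that loop concrete, here is a toy sketch of an LLM-in-the-loop genetic algorithm of the kind described. The llm_propose and score functions below are simple stand-ins for the paper’s pre-trained models and property oracles, not the authors’ code:

```python
import random

ALPHABET = "CNOPS"  # toy stand-in for SMILES tokens

def llm_propose(parents: list[str], n: int) -> list[str]:
    """Stand-in for the LLM: in the paper, offspring would be sampled
    from the pre-trained model conditioned on the parent molecules."""
    children = []
    for _ in range(n):
        p = list(random.choice(parents))
        p[random.randrange(len(p))] = random.choice(ALPHABET)  # point mutation
        children.append("".join(p))
    return children

def score(mol: str) -> float:
    """Stand-in for a computed-property oracle (e.g., a docking score)."""
    return mol.count("N") - mol.count("S")

def optimize(seed_pool: list[str], generations: int = 10,
             pop: int = 100, elite: int = 10) -> str:
    population = list(seed_pool)
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[:elite]                       # keep the most promising molecules
        offspring = llm_propose(parents, pop - elite)  # LLM plays the mutation/crossover operator
        population = parents + offspring
    return max(population, key=score)

print(optimize(["CCOCC", "CCNCC"]))
```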
In our annual award recognizing startups that are leveraging AI in healthcare and life sciences, the judges reviewed all 103 semifinalists and selected three nominees in each of the four categories — along with seven remarkable companies that we’ve included as honorable mentions.
Our Cloud Solutions Architect Alex Kim walked through how to get Llama 4 and Qwen3 running on Nebius, which recently integrated with SkyPilot, using SGLang as the serving framework.
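As a rough illustration of that workflow, the sketch below uses SkyPilot’s Python API to provision GPUs and start an SGLang server. The accelerator shape, model id and port are illustrative, and cloud selection (e.g., Nebius) follows your SkyPilot configuration:

```python
# Provision a GPU node with SkyPilot and serve Qwen3 with SGLang.
import sky

task = sky.Task(
    setup="pip install 'sglang[all]'",
    run=(
        "python -m sglang.launch_server "
        "--model-path Qwen/Qwen3-32B --tp 8 --host 0.0.0.0 --port 30000"
    ),
)
task.set_resources(sky.Resources(accelerators="H100:8", ports=[30000]))

# Provisions the cluster on whichever cloud your SkyPilot config selects
# (e.g., Nebius) and runs the server there.
sky.launch(task, cluster_name="sglang-qwen3")
```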
Over the past weeks, we have made several significant observability improvements to the Nebius AI Cloud. Among them: advanced monitoring metrics in our web console and API, out-of-the-box Grafana dashboards, and monitoring and logging features that let customers upload custom metrics and logs to our cloud.
Observability services are now documented in one place — everything on Monitoring and Logging is available in a single section to help you track resource metrics, set up alerts and analyze logs for better infrastructure visibility and reliability.
Nebius AI Studio keeps evolving. Billing now has its own dedicated section — learn how to manage payment methods and view usage. You can integrate Helicone to track costs and metrics of your model runs, or connect Managed MLflow to gain more control and visibility across the ML workflow.
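As an example of what the Helicone integration can look like, the sketch below routes AI Studio requests through Helicone’s gateway so that usage and latency are tracked. The endpoint URLs, header names and model id are assumptions based on the two products’ docs; verify them against the current documentation before relying on this:

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.helicone.ai/v1",  # Helicone's generic gateway (check current docs)
    api_key=os.environ["NEBIUS_API_KEY"],       # your AI Studio API key
    default_headers={
        "Helicone-Auth": f"Bearer {os.environ['HELICONE_API_KEY']}",
        "Helicone-Target-Url": "https://api.studio.nebius.com",  # AI Studio endpoint (assumed)
    },
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # illustrative model id
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```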
We are now accepting pre-orders for NVIDIA GB200 NVL72 and NVIDIA HGX B200 clusters to be deployed in our data centers in the United States and Finland from early 2025. Based on NVIDIA Blackwell, the architecture to power a new industrial revolution of generative AI, these new clusters deliver a massive leap forward over existing solutions.