Built with Nebius

We regularly share how our clients build workloads for data processing, model training, fine-tuning and inference with the help of our architects. We also share links to open-source models trained on Nebius.

vLLM: Advancing open-source LLM inference

vLLM is an open-source framework under the Linux Foundation, designed to optimize LLM inference at scale. It enables organizations to deploy and serve large language models with greater efficiency, reducing infrastructure costs and enhancing performance.

Goal: To develop and continuously optimize the vLLM framework so that organizations can serve large language models efficiently, at scale and at lower cost.

Solution: The Nebius team provided vLLM with reliable access to cutting-edge compute accelerators and compute clusters for large-scale inference experiments.

Result: With Nebius, vLLM has successfully optimized inference performance for transformer-based models, including DeepSeek R1. The project has achieved high-throughput inference, seamless scalability, and integration of advanced features like multi-head latent attention and multi-token prediction.

  • Inference
  • Open-source
Zero hardware-related issues
Consistently accurate hardware performance metrics
Compute clusters to run DeepSeek R1

Enhancing AI-powered search

Brave Software, with over 80 million users, develops a fast, privacy-focused browser and Brave Search, an independent search engine. Its AI-powered feature, Answer with AI, provides real-time, privacy-centric summaries for user queries.

Goal: To generate AI-driven search responses with modern compute infrastructure.

Solution: Brave uses Terraform for provisioning and HAProxy for load balancing, ensuring efficient AI inference, real-time response generation and seamless traffic scaling.

Result: With Nebius, Brave runs large AI models with nearly 100% compute utilization, delivering real-time AI summaries for over 11 million queries daily. The scalable infrastructure allows Brave Search to provide faster, more relevant answers while maintaining strict privacy standards.

  • Inference
  • Web search
  • AI summaries
10–70B LLM parameters
1.3B search queries per month
11M+ AI-generated answers daily

Cost-efficient AI deployment platform

The CentML Platform powers open-source model deployment with automated compute optimizations and flexible configurations. CentML delivers state-of-the-art inference at reduced costs, without vendor lock-in.

Goal: To give customers access to a highly performant, cost-optimized full-stack solution for AI deployment.

Solution: CentML uses Nebius compute alongside ML techniques to optimize its inference platform, delivering flexible scaling, streamlined deployments and enhanced hardware utilization for AI models.

Result: Significant cost savings, improved reliability and scalability, and enhanced EU-based compute capabilities. Customers can reduce infrastructure complexity and securely deploy open-source LLMs.

  • Inference
  • Open-source
5x lower costs compared to other major providers
Enhanced compliance with EU compute requirements
1 week to get the cluster online

Stable Diffusion inference

TheStage AI builds inference simulators and DNN optimization tools for a wide range of hardware, significantly reducing GPU costs.

Goal: To enhance the capabilities of the TheStage AI platform with a focus on the Stable Diffusion architecture.

Solution: To run tests on the TheStage AI acceleration framework, particularly on the computationally intensive UNet component, using the open-source Stable Diffusion v1-5 model by RunwayML.

Result: Two acceleration methods — quantization and structured sparsification — were implemented using NVIDIA H100 Tensor Core GPUs for efficient INT8 and sparse computation. The project resulted in a significant reduction in the number of GPUs needed for inference.
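
As a rough illustration of the two methods named above, the sketch below applies structured sparsification and then INT8 quantization to a small list of weights. It is not TheStage AI's implementation. The 2:4 pruning pattern (keep the two largest-magnitude values in each group of four), the symmetric per-tensor scale, and the function names `prune_2_of_4`, `quantize_int8` and `dequantize` are all assumptions chosen for the example.

```python
def prune_2_of_4(values):
    """Structured sparsification (assumed 2:4 pattern): in every group of
    four values, keep the two with the largest magnitude and zero the rest."""
    if len(values) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(values), 4):
        group = values[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: map floats to integer codes
    in [-127, 127] using a single scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from INT8 codes."""
    return [c * scale for c in codes]


# Hypothetical weights, for illustration only.
weights = [0.5, -1.27, 0.02, 0.9, -0.1, 0.3, 0.25, -0.05]
sparse = prune_2_of_4(weights)          # half of the entries become zero
codes, scale = quantize_int8(sparse)    # 8-bit integer codes plus one scale
restored = dequantize(codes, scale)     # approximate reconstruction
```

Each restored value differs from its pruned original by at most half a quantization step (`scale / 2`), which is the accuracy cost traded for smaller and cheaper computation.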

  • H100
  • Inference
  • Stable Diffusion
4x leap in speed over the early version of the framework running on A100
~500 ms to process one image during inference
1B parameters in the model

Training a generative AI foundation model

Recraft is an AI design tool that lets users create and edit digital illustrations, vector art, icons and 3D graphics in a uniform brand style.

Goal: To train the first generative AI model for designers from scratch.

Solution: To use all the key parts of Nebius AI and train with PyTorch and Kubeflow, with NCCL (the NVIDIA Collective Communications Library) used for the hardware setup.

Result: Thanks to the contributions from the Nebius support and architect teams, Recraft overcame hardware configuration challenges and achieved remarkable system stability.

  • GenAI
  • Training
20B model parameters
Comparable to DALL·E 3, with 49% preference on the PartiPrompts benchmark
54% preference over Midjourney v6 on the same benchmark

Streamlining music creation through AI

Wubble is a cutting-edge AI platform designed to empower businesses to generate high-quality, royalty-free music instantly, streamlining creative processes and unlocking limitless possibilities for marketing, advertising, podcasts, games, stores and more.

Goal: To optimize AI operations and model deployment for scalable, efficient and low-latency music generation.

Solution: Leveraging Nebius’ infrastructure and Kubernetes, Wubble built a scalable system for managing workloads and deployments.

Result: The company achieved high-capacity inference, QLoRA adaptation and faster audio analysis pipelines. These advancements reduced the time to first token and ensured reliable performance, while integration with GCP enabled robust scalability and efficient resource utilization.

  • Media
  • Inference
  • LoRA
3B+ model parameters
100+ genres the model is conversant in
1.8 seconds: reduced time to first token generation

Quantum chemistry for drug and material discovery

Simulacra AI is transforming the quantum chemistry field by automatically generating high-precision datasets for molecular dynamics models at scale.

Goal: To build a scalable foundational wave-function model for molecular systems that can generate high-accuracy datasets for drug and material discovery pipelines.

Solution: Simulacra AI used Nebius infrastructure to overcome scalability and efficiency challenges.

Result: Simulacra AI delivers next-generation molecular data, enabling any company to refine in silico pipelines without relying on broad internal infrastructure to train models.

  • Training
  • Research
  • Quantum tech
100M+ model parameters
90% faster: "Thanks to Nebius infrastructure, our largest models take 10–20 minutes to compile for pre-training, compared to over 2 hours previously"
H100 + H200 NVIDIA Tensor Core GPU fleet
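
The "90% faster" figure can be sanity-checked against the quoted compile times. The sketch below reads it as a reduction in compile time and assumes a baseline of exactly 120 minutes, although the source says "over 2 hours". The helper name `time_reduction_pct` is hypothetical.

```python
def time_reduction_pct(old_minutes, new_minutes):
    """Percentage reduction in wall-clock time going from old to new."""
    return (1 - new_minutes / old_minutes) * 100


# Assumption: "over 2 hours" is taken as exactly 120 minutes, so the
# results below are lower bounds on the real reduction.
baseline_minutes = 120
low = time_reduction_pct(baseline_minutes, 20)   # slowest quoted compile time
high = time_reduction_pct(baseline_minutes, 10)  # fastest quoted compile time
print(f"{low:.1f}% to {high:.1f}% less compile time")
```

Under that assumption the reduction falls between about 83% and 92%, which is in line with the rounded 90% figure.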

Advancing molecular generation

Quantori is the end-to-end data, technology and digital services partner of choice for leading biopharma and healthcare organizations worldwide.

Goal: To develop an AI framework that generates molecules with precise 3D shapes, enhancing drug discovery and material design.

Solution: Quantori employs a pipeline based on an Equivariant Diffusion Model and a Structure Seer model trained on 1.6M molecules from the ChEMBL database. The pipeline generates molecular structures using shape descriptors.

Result: After 1,500 training epochs, the model successfully generated chemically sound molecules that closely resemble real molecules in shape. The approach enables rapid molecular ideation, predicting valid 3D conformations with optimized properties.

  • Training
  • Drug discovery
1.6M molecules from ChEMBL (dataset size)
1,500 epochs of training
High similarity to reference geometries
