Built with Nebius
We regularly share our clients' experiences on how they build workloads for data processing, model training, fine-tuning and inference with the help of our architects. We also share links to open-source models trained in Nebius.

vLLM is an open-source framework under the Linux Foundation, designed to optimize LLM inference at scale. It enables organizations to deploy and serve large language models with greater efficiency, reducing infrastructure costs and enhancing performance.
Goal: To develop and continuously optimize the vLLM framework for efficient LLM inference, enabling organizations to serve large language models at lower cost while maintaining scalability and performance.
Solution: The Nebius team provided vLLM with reliable access to cutting-edge compute accelerators and compute clusters for large-scale inference experiments.
Result: With Nebius, vLLM has successfully optimized inference performance for transformer-based models, including DeepSeek R1. The project has achieved high-throughput inference, seamless scalability, and integration of advanced features like multi-head latent attention (MLA) and multi-token prediction.
Brave Software, with over 80 million users, develops a fast, privacy-focused browser and Brave Search, an independent search engine. Its AI-powered feature, Answer with AI, provides real-time, privacy-centric summaries for user queries.
Goal: To generate AI-driven search responses with modern compute infrastructure.
Solution: Brave uses Terraform for provisioning and HAProxy for load balancing, ensuring efficient AI inference, real-time response generation and seamless traffic scaling.
Result: With Nebius, Brave runs large AI models with nearly 100% compute utilization, delivering real-time AI summaries for over 11 million queries daily. The scalable infrastructure allows Brave Search to provide faster, more relevant answers while maintaining strict privacy standards.
The CentML Platform powers open-source model deployment with automated compute optimizations and flexible configurations. CentML delivers state-of-the-art inference at reduced costs, without vendor lock-in.
Goal: To give customers access to a highly performant, cost-optimized full-stack solution for AI deployment.
Solution: CentML uses Nebius compute alongside ML techniques to optimize their inference platform, delivering flexible scaling, streamlined deployments and enhanced hardware utilization for AI models.
Result: Significant cost savings, improved reliability and scalability, and enhanced EU-based compute capabilities. Customers can reduce infrastructure complexity and securely deploy open-source LLMs.
TheStage AI builds inference simulators and DNN optimization tools for a wide range of hardware, significantly reducing GPU costs.
Goal: To enhance the capabilities of the TheStage AI platform, with a focus on the Stable Diffusion architecture.
Solution: TheStage AI tested its acceleration framework on the open-source Stable Diffusion v1-5 model by RunwayML, concentrating on the computationally intensive UNet component.
Result: Two acceleration methods — quantization and structured sparsification — were implemented using NVIDIA H100 Tensor Core GPUs for efficient INT8 and sparse computation. The project resulted in a significant reduction in the number of GPUs needed for inference.
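For readers unfamiliar with the two methods named above, here is a minimal, illustrative sketch in plain Python. It is not TheStage AI's implementation, which this page does not describe. It assumes symmetric per-tensor INT8 quantization and a 2:4 sparsity pattern (two non-zero values in every group of four weights), and the function names `prune_2_of_4`, `quantize_int8` and `dequantize` are hypothetical.

```python
def prune_2_of_4(weights):
    """Structured sparsification (assumed 2:4 pattern): in each full group
    of four weights, zero the two with the smallest magnitude."""
    pruned = list(weights)
    full = len(pruned) - len(pruned) % 4  # a trailing partial group is left as-is
    for start in range(0, full, 4):
        group = range(start, start + 4)
        keep = sorted(group, key=lambda i: abs(pruned[i]), reverse=True)[:2]
        for i in group:
            if i not in keep:
                pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric per-tensor quantization (an assumption): map floats to
    integers in [-127, 127] using a single shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the INT8 representation."""
    return [v * scale for v in q]


# Toy weight vector, made up for illustration only.
weights = [0.50, -0.02, 0.10, -1.27, 0.03, 0.90, -0.40, 0.01]
sparse = prune_2_of_4(weights)
q, scale = quantize_int8(sparse)
restored = dequantize(q, scale)
```

In this toy run half of the weights become zero and the rest are stored as 8-bit integers plus one scale factor. Savings of this kind in memory and arithmetic are what make it possible to serve a model on fewer GPUs.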
Recraft is an AI design tool that lets users create and edit digital illustrations, vector art, icons and 3D graphics in a uniform brand style.
Goal: To train the first generative AI model for designers from scratch.
Solution: Recraft used all the key parts of Nebius AI, running PyTorch with Kubeflow and using NCCL in the hardware setup.
Result: Thanks to the contributions from the Nebius support and architect teams, Recraft overcame hardware configuration challenges and achieved remarkable system stability.
Wubble is an AI platform that lets businesses generate high-quality, royalty-free music instantly for marketing, advertising, podcasts, games, stores and more.
Goal: To optimize AI operations and model deployment for scalable, efficient and low-latency music generation.
Solution: Leveraging Nebius’ infrastructure and Kubernetes, Wubble built a scalable system for managing workloads and deployments.
Result: The company achieved high-capacity inference, QLoRA adaptation and faster audio analysis pipelines. These advancements reduced the time to first token and ensured reliable performance, while integration with GCP enabled robust scalability and efficient resource utilization.
Simulacra AI is transforming the quantum chemistry field by automatically generating high-precision datasets for molecular dynamics models at scale.
Goal: To build a scalable foundational wave-function model for molecular systems that can generate high-accuracy datasets for drug and material discovery pipelines.
Solution: Simulacra AI used Nebius infrastructure to overcome scalability and efficiency challenges.
Result: Simulacra AI delivers next-generation molecular data, enabling any company to refine in silico pipelines without relying on broad internal infrastructure to train models.
Quantori is the end-to-end data, technology and digital services partner of choice for leading biopharma and healthcare organizations worldwide.
Goal: To develop an AI framework that generates molecules with precise 3D shapes, enhancing drug discovery and material design.
Solution: Quantori employs a pipeline based on an Equivariant Diffusion Model and the Structure Seer model, trained on 1.6M molecules from the ChEMBL database. The pipeline generates molecular structures using shape descriptors.
Result: After 1,500 training epochs, the model successfully generated chemically sound molecules that closely resemble real molecules in shape. The approach enables rapid molecular ideation, predicting valid 3D conformations with optimized properties.