Built with Nebius

We regularly share our clients' experiences of building workloads for data processing, model training, fine-tuning and inference with the help of our architects. We also share links to open-source models trained on Nebius.

Stanford

CRISPR-GPT is an LLM-powered agent system developed by scientists from Stanford, Princeton and Google DeepMind. It automates gene-editing experiments, from CRISPR system selection and gene-editing RNA design to data analysis.

Recraft V4

To train its latest model, Recraft collaborated with Nebius and deployed NVIDIA HGX B200, achieving a seamless transition from the NVIDIA Hopper architecture with minimal code changes. Through early access testing and hands-on engineering support, the company validated Blackwell’s capabilities for large-scale workloads.

Prime Intellect

Prime Intellect partners with every major cloud provider to supply GPUs, but Nebius is their go-to cloud for flexible on-demand utilization and access to frontier hardware. Their latest PoC with the NVIDIA GB200 NVL72 from Nebius delivered advanced performance out of the box.

Higgsfield AI

With Nebius as a co-engineering collaborator and NVIDIA HGX B200 GPUs as the hardware foundation, Higgsfield AI built a training pipeline that stayed stable under sustained load and supported one of the fastest scale-ups ever seen in the application layer of generative AI.

vLLM

Using Nebius’ infrastructure, vLLM — a leading open-source LLM inference framework — is testing and optimizing its inference capabilities under different conditions, enabling high-performance, low-cost model serving in production environments.

Compugen

Compugen’s team trained a unique model for spatial immune feature prediction on Nebius, enabling deeper analysis across vast proprietary libraries. This will help uncover previously invisible immune patterns and help promising solutions move from code to clinic more efficiently.

Helical

Helical helps pharma and biotech companies reach scientific breakthroughs faster by scaling virtual experimentation. By relying on Nebius’ purpose-built clusters with reliable connectivity and storage integration, Helical closes the gap between foundation models and scientific outcomes.

SGLang

SGLang, a pioneering LLM inference framework, teamed up with Nebius AI Cloud to supercharge DeepSeek R1’s performance for real-world use. The SGLang team achieved a 2× boost in throughput and markedly lower latency on one node.

StringZilla by Unum

Boosting biological data processing capabilities is essential to draw breakthrough insights from rapidly growing DNA, RNA, and protein datasets. Powered by Nebius, Unum optimized StringZilla — an open-source, high-speed string processing library.

Slingshot AI

Slingshot AI is developing a foundation LLM for psychology to address the global need for mental health support. Ash is a chatbot designed to provide long-term support by helping users identify patterns and meet developmental goals. By partnering with Nebius, Slingshot ran Ash’s large-scale AI training, fine-tuning and inference.

Recraft

Recraft, recently funded in a round led by Khosla Ventures and former GitHub CEO Nat Friedman, has built the first generative AI model for designers. Featuring 20 billion parameters, the model was trained from scratch on Nebius.

TheStage AI

The inference market has grown so significantly that inefficiencies between revenue and inference costs have emerged. TheStage AI closes this gap by providing an automatic neural network analyzer and optimizer.

Krisp

Krisp’s work with us lies in the field of Accent Localization, an AI-powered real-time voice conversion technology that converts call center agents’ accented speech into US-native-sounding speech.

Dubformer

Dubformer is a secure AI dubbing and end-to-end localization solution that guarantees broadcast quality in over 70 languages. The company runs its two most resource-intensive tasks on Nebius: model training and model deployment.

Unum

In our field, effective partnerships that harness complementary strengths can drive significant breakthroughs. Such is the case with the collaboration between Nebius and Unum, an AI research lab known for developing compact and efficient AI models.

London Institute for Mathematical Sciences

How well can LLMs abstract problem-solving rules, and how can such ability be tested? Research by the London Institute for Mathematical Sciences, conducted using our infrastructure, helps explain the causes of LLM imperfections.

Simulacra AI

Simulacra AI is combining ab initio quantum chemistry with deep learning to build a scalable large wavefunction model (LWM) to generate high-accuracy datasets for drug and material discovery pipelines.

Quantori

Quantori’s cheminformatics department recently embarked on a research initiative to develop a molecular generation pipeline leveraging stable diffusion and their own recent developments in the area of graph-convolutional networks.

Converge Bio

Converge Bio is pioneering the use of LLMs to analyze single-cell RNA sequencing data. Their work aims to transform how scientists understand disease mechanisms and therapeutic responses.

SynthLabs

SynthLabs significantly simplified its training infrastructure setup using the TractoAI serverless platform. SynthLabs research engineers leveraged TractoAI’s distributed offline inference capabilities to accelerate the release of the first open-source reasoning dataset.

YerevaNN

Researchers from YerevaNN and Yerevan State University present three models, continuously pre-trained on Nebius on a novel corpus of 110M molecules with computed properties, totaling 40B tokens. A genetic algorithm integrates the models to optimize molecules with promising properties.

Positronic Robotics

Positronic Robotics is a startup that creates AI-based robot control systems. The company trains ML models on Nebius AI Cloud, developing tools that allow robots to handle cleaning tasks more effectively than humans.

TrialHub

TrialHub leverages RAG-optimized LLMs, semantic search and other advanced language modeling techniques to extract quantifiable insights from over 80,000 trusted medical sources. With Nebius’ expert support, TrialHub launched its 250-million vector database in days.

SieveStack

Powered by Nebius and TractoAI infrastructure, SieveStack scales high-precision data generation, a significant advantage for training a multi-layer foundational model stack that yields progressively more nuanced insights into drug interactions with the human body.

xAID

xAID is building the ultimate AI assistant for medical imaging. With training cycles lasting over five days on noisy clinical data, xAID relies on Nebius AI Cloud for uninterrupted, high-performance computing at scale and expert MLOps support.

Chatfuel

Using Nebius AI Studio to run a cascade of Llama-405B models along with a custom SDK, Chatfuel, a leading AI-powered customer engagement automation platform, achieved significant efficiency gains with much better response quality and interaction speed for its AI chatbot agents.
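The cascade pattern Chatfuel describes can be sketched in a few lines: try a cheaper model first and escalate to a larger one only when the answer fails a confidence check. This is a generic illustration, not Chatfuel’s SDK — the model names and the `call_model` stub are hypothetical.

```python
# Illustrative model-cascade sketch. Model names and `call_model`
# are hypothetical stand-ins, not a real inference API.

def call_model(name: str, prompt: str) -> tuple[str, float]:
    """Stand-in for an inference call; returns (answer, confidence)."""
    canned = {
        "llama-small": ("short answer", 0.55),
        "llama-405b": ("detailed answer", 0.95),
    }
    return canned[name]

def cascade(prompt: str, threshold: float = 0.8) -> str:
    """Escalate through models, cheapest first, until one is confident."""
    for model in ("llama-small", "llama-405b"):
        answer, confidence = call_model(model, prompt)
        if confidence >= threshold:
            return answer
    return answer  # fall back to the largest model's answer

print(cascade("How do I reset my password?"))
```

Routing most traffic to the small model is what yields the cost and latency gains; only low-confidence queries pay for the 405B pass.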

Lynx Analytics

Lynx Analytics bridges industry and tech expertise to deliver AI-powered solutions to enterprises across life sciences, telecom, retail and finance. The team uses Nebius’ elastic AI infrastructure to accelerate delivery across projects and maintain high GPU utilization, keeping performance and costs optimized.

vLLM: Advancing open-source LLM inference

vLLM is an open-source framework under the Linux Foundation, designed to optimize LLM inference at scale. It enables organizations to deploy and serve large language models with greater efficiency, reducing infrastructure costs and enhancing performance.

Goal: To develop and continuously optimize the vLLM framework for efficient LLM inference, enabling organizations to serve large language models at lower costs while ensuring scalability and performance optimization.

Solution: The Nebius team provided vLLM with reliable access to cutting-edge compute accelerators and compute clusters for large-scale inference experiments.

Result: With Nebius, vLLM has successfully optimized inference performance for transformer-based models, including DeepSeek R1. The project has achieved high-throughput inference, seamless scalability, and integration of advanced features like multi-latent attention and multi-token prediction.

  • Inference
  • Open-source
Zero
hardware-related issues
Consistently
accurate hardware performance metrics
Compute clusters to run
DeepSeek R1

Enhancing AI-powered search

Brave Software, with over 80 million users, develops a fast, privacy-focused browser and Brave Search, an independent search engine. Its AI-powered feature, Answer with AI, provides real-time, privacy-centric summaries for user queries.

Goal: To generate AI-driven search responses with modern compute infrastructure.

Solution: Brave uses Terraform for provisioning and HAProxy for load balancing, ensuring efficient AI inference, real-time response generation and seamless traffic scaling.

Result: With Nebius, Brave runs large AI models with nearly 100% compute utilization, delivering real-time AI summaries for over 11 million queries daily. The scalable infrastructure allows Brave Search to provide faster, more relevant answers while maintaining strict privacy standards.

  • Inference
  • Web search
  • AI summaries
10–70B
LLM parameters
1.3B
search queries per month
11M+
AI-generated answers daily
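Brave’s HAProxy-based load balancing can be pictured with a minimal configuration sketch. The backend names, addresses and health-check path below are illustrative assumptions, not Brave’s actual setup.

```
# Minimal HAProxy sketch for balancing inference traffic across GPU
# nodes. Names and addresses are hypothetical.
frontend inference_in
    bind *:8080
    default_backend gpu_workers

backend gpu_workers
    balance leastconn            # prefer the least-loaded worker for
                                 # long-running generation requests
    option httpchk GET /health
    server gpu1 10.0.0.11:8000 check
    server gpu2 10.0.0.12:8000 check
```

`leastconn` suits LLM serving better than round-robin because generation requests have highly variable durations.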

Cost-efficient AI deployment platform

The CentML Platform powers open-source model deployment with automated compute optimizations and flexible configurations. CentML delivers state-of-the-art inference at reduced costs, without vendor lock-in.

Goal: Give customers access to a highly performant, cost-optimized full stack solution for AI deployment.

Solution: CentML uses Nebius compute alongside ML techniques to optimize their inference platform, delivering flexible scaling, streamlined deployments and enhanced hardware utilization for AI models.

Result: Significant cost savings, improved reliability and scalability, and enhanced EU-based compute capabilities. Customers can reduce infrastructure complexity and securely deploy open-source LLMs.

  • Inference
  • Open-source
5×
lower costs compared to other major providers
Enhanced
compliance with EU compute requirements
1 week
to get cluster online

Stable diffusion inference

TheStage AI builds inference simulators and DNN optimization tools for a wide range of hardware, significantly reducing GPU costs.

Goal: To enhance the capabilities of TheStage AI platform with a focus on the Stable Diffusion architecture.

Solution: To run tests on TheStage AI acceleration framework, particularly on the computationally intensive UNet component, using the open-source Stable Diffusion v1-5 model by RunwayML.

Result: Two acceleration methods — quantization and structured sparsification — were implemented using NVIDIA H100 Tensor Core GPUs for efficient INT8 and sparse computation. The project resulted in a significant reduction in the number of GPUs needed for inference.

  • H100
  • Inference
  • Stable diffusion
4x leap
in speed over the early version of the framework running on A100
~500 ms
to process one image during inference
1B parameters
of the model
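Of the two acceleration methods mentioned, quantization is the easier to illustrate. Below is a generic sketch of symmetric INT8 post-training quantization — a textbook scheme, not TheStage AI’s proprietary implementation.

```python
# Generic symmetric INT8 quantization sketch (not TheStage AI's code).
# Floats are mapped to int8 via a single per-tensor scale; inference
# then runs on the int8 values, roughly quartering weight memory.

def quantize_int8(weights):
    """Map float weights to int8 with one per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.031, 1.27]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Per-weight rounding error is bounded by scale / 2.
err = max(abs(a - b) for a, b in zip(weights, approx))
print(q, err)
```

Structured sparsification is complementary: it removes whole blocks of near-zero weights so the remaining dense blocks map efficiently onto the H100’s INT8 and sparse compute units.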

Training gen AI foundational model

Recraft is an AI design tool that lets users create and edit digital illustrations, vector art, icons and 3D graphics in a uniform brand style.

Goal: To train the first generative AI model for designers from scratch.

Solution: To utilize all the key parts of Nebius AI, building the training stack on PyTorch and Kubeflow, with NCCL handling multi-GPU communication.

Result: Thanks to the contributions from the Nebius support and architect teams, Recraft overcame hardware configuration challenges and achieved remarkable system stability.

  • GenAI
  • Training
20B
model parameters
Comparable
to DALL·E 3 with 49% preference on PartiPrompts benchmark
54%
preference over Midjourney v6 on the same benchmark

Streamlining music creation through AI

Wubble is a cutting-edge AI platform designed to empower businesses to generate high-quality, royalty-free music instantly, streamlining creative processes and unlocking limitless possibilities for marketing, advertising, podcasts, games, stores and more.

Goal: To optimize AI operations and model deployment for scalable, efficient and low-latency music generation.

Solution: Leveraging Nebius’ infrastructure and Kubernetes, Wubble built a scalable system for managing workloads and deployments.

Result: The company achieved high-capacity inference, QLoRA adaptation and faster audio analysis pipelines. These advancements reduced the time to first token and ensured reliable performance, while integration with GCP enabled robust scalability and efficient resource utilization.

  • Media
  • Inference
  • LoRA
3B+
model parameters
100+ genres
the model is conversant in
1.8 seconds
Reduced time to first token generation
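Wubble’s Kubernetes-managed deployments can be pictured with a minimal Deployment manifest for a GPU inference service. The image name, replica count and port are illustrative assumptions, not Wubble’s actual configuration.

```yaml
# Illustrative Kubernetes Deployment for a GPU inference service.
# Image, replicas and port are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: music-inference
spec:
  replicas: 2
  selector:
    matchLabels:
      app: music-inference
  template:
    metadata:
      labels:
        app: music-inference
    spec:
      containers:
        - name: server
          image: registry.example.com/music-model:latest
          resources:
            limits:
              nvidia.com/gpu: 1   # one GPU per replica
          ports:
            - containerPort: 8000
```

Declaring the GPU in `resources.limits` lets the scheduler place each replica on a node with a free accelerator, which is what makes horizontal scaling of inference workloads straightforward.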

Quantum chemistry for drug and material discovery

Simulacra AI is transforming the quantum chemistry field by automatically generating high-precision datasets for molecular dynamics models at scale.

Goal: Build a scalable foundational wave-function model for molecular systems that can generate high-accuracy datasets for pipelines of drug and material discovery.

Solution: Simulacra AI used Nebius infrastructure to overcome scalability and efficiency challenges.

Result: Simulacra AI delivers next-generation molecular data, enabling any company to refine in silico pipelines without relying on broad internal infrastructure to train models.

  • Training
  • Research
  • Quantum tech
100M+
model parameters
90% faster
Thanks to Nebius infrastructure, our largest models take 10–20 minutes to compile for pre-training, compared to over 2 hours previously
H100 + H200
NVIDIA Tensor Core GPU fleet

Advancing molecular generation

Quantori is the end-to-end data, technology and digital services partner of choice for leading biopharma and healthcare organizations worldwide.

Goal: To develop an AI framework that generates molecules with precise 3D shapes, enhancing drug discovery and material design.

Solution: Quantori employs a pipeline based on an Equivariant Diffusion Model and a Structure Seer model trained on 1.6M molecules from the ChEMBL database. The pipeline generates molecular structures using shape descriptors.

Result: After 1,500 training epochs, the model successfully generated chemically sound molecules that closely resemble real molecules in shape. The approach enables rapid molecular ideation, predicting valid 3D conformations with optimized properties.

  • Training
  • Drug discovery
1.6M
molecules from ChEMBL — dataset size
1,500 epochs
Training duration
High similarity
to reference geometries

Start your journey today

Explore the platform