Managed Soperator
A fully managed Slurm-on-Kubernetes solution for simplified AI training on NVIDIA GPU clusters.
One-click cluster setup
Launch your training environment in minutes, not days. Our solution handles everything — from node provisioning to pre-installed dependencies — so you can start scheduling jobs instantly with zero infrastructure configuration.
Fault-tolerant training
Train your models without stress. Automatic health checks and recovery keep your jobs running, even during hardware or node failures. Integrated monitoring dashboards and logging give you detailed visibility into the cluster and full control over it.
Maximum GPU utilization
Make the most of your AI hardware. Smart scheduling and topology-aware job placement boost efficiency for large-scale training. Optimized dependencies ensure quick execution of your model training frameworks.
How to launch a Slurm cluster in minutes
This video demonstrates how quickly you can set up and launch a Slurm cluster for AI training by using Managed Soperator.

Try Managed Soperator
Sign up for the console, add billing details and set up your cluster parameters. That’s it! The system will provision the compute and deploy Slurm automatically.
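Once the cluster is running, jobs are submitted the usual Slurm way, with a batch script. The snippet below is a minimal sketch of how such a script could be generated, not part of Managed Soperator itself. The `JobSpec` class, the `render_sbatch_script` helper, the job name, the node and GPU counts and the `train.py` command are all hypothetical; only the `#SBATCH` options are standard Slurm.

```python
from dataclasses import dataclass


@dataclass
class JobSpec:
    """Hypothetical container for the few sbatch options used in this sketch."""
    name: str
    nodes: int
    gpus_per_node: int
    command: str
    time_limit: str = "01:00:00"


def render_sbatch_script(spec: JobSpec) -> str:
    """Render a Slurm batch script as text. Nothing is submitted here."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={spec.name}",
        f"#SBATCH --nodes={spec.nodes}",
        f"#SBATCH --gpus-per-node={spec.gpus_per_node}",
        "#SBATCH --ntasks-per-node=1",
        f"#SBATCH --time={spec.time_limit}",
        "#SBATCH --output=%x-%j.out",  # %x expands to the job name, %j to the job ID
        "#SBATCH --requeue",           # allow Slurm to requeue the job, e.g. after a node failure
        "",
        f"srun {spec.command}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values are assumptions chosen for illustration only.
    spec = JobSpec(name="demo-train", nodes=2, gpus_per_node=8,
                   command="python train.py")
    print(render_sbatch_script(spec))
```

Saving the output to a file such as `job.sh` and running `sbatch job.sh` on the cluster would queue the job. The right values depend on your cluster parameters.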
Running Slurm on Kubernetes
Managed Soperator is powered by Soperator, our custom-made Kubernetes operator for Slurm. It allows us to combine the advanced job scheduling capabilities of Slurm and the cloud-native flexibility of Kubernetes in one training environment.
How it works
A shared root filesystem provides a single file environment for all nodes of the cluster, which simplifies package management and makes the cluster easier to scale.
Open source solution
At Nebius, we believe that only together can we create better technologies. That’s why we made Soperator open source, giving ML enthusiasts and HPC practitioners the opportunity to use this technology for their own projects and improve it according to their needs.
Slurm-on-Kubernetes solutions by Nebius
| | Managed Soperator | Professional Soperator | Soperator |
|---|---|---|---|
| Solution | Slurm-based clusters | Slurm-based clusters | Kubernetes operator for Slurm |
| Delivery model | Self-service app | Professional service | Open-source software |
| Cloud environment | Nebius | Nebius | Cloud agnostic |
| Pre-installed AI/ML drivers and libraries | Yes | Yes | Yes |
| All types of containers supported | Yes | Yes | Yes |
| Passive health checks | Coming soon | Yes | No |
| Active health checks | Coming soon | Yes | No |
| Topology-aware job scheduling | Coming soon | Yes | No |
| Auto-healing mechanism | Coming soon | Yes | On Nebius cloud only |
| Free software, consumption-based pricing | Yes | Yes | Yes |
Getting started
Sign up for the console to get started immediately with Managed Soperator, or contact our team for Professional Soperator, where we handle the complete installation and configuration.
Questions and answers
Slurm is an open source, fault-tolerant and highly scalable cluster management and job scheduling system for large and small Linux clusters.
SchedMD
By partnering directly with SchedMD, the developer of the Slurm Workload Manager, Nebius provides exceptional support to Slurm users. SchedMD's robust Slurm workload manager streamlines job scheduling and resource allocation. Its scalability and reliability make it a versatile solution that can meet a variety of business needs.

Slurm is a registered trademark of SchedMD LLC.