One-click cluster setup

Launch your training environment in minutes, not days. Our solution handles everything — from node provisioning to pre-installed dependencies — so you can start scheduling jobs instantly with zero infrastructure configuration.

Fault-tolerant training

Train your models without stress. Automatic health checks and recovery keep your jobs running, even during hardware or node failures. Integrated monitoring dashboards and logging give you advanced visibility into the cluster and full control over it.

Maximum GPU utilization

Make the most of your AI hardware. Smart scheduling and topology-aware job placement boost efficiency for large-scale training. Optimized dependencies ensure quick execution of your model training frameworks.

How to launch a Slurm cluster in minutes

This video demonstrates how quickly you can set up and launch a Slurm cluster for AI training by using Managed Soperator.

Try Managed Soperator

Sign up for the console, add billing details and set up your cluster parameters. That’s it! The system will provision the compute and deploy Slurm automatically.

Get started

Running Slurm on Kubernetes

Managed Soperator is powered by our custom-made Kubernetes operator for Slurm. We give you the advanced job scheduling capabilities of Slurm and the cloud-native flexibility of Kubernetes in one training environment.
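To give a sense of what Slurm job scheduling looks like in practice, here is a minimal Python sketch that renders a batch script for a multi-node GPU job. The `#SBATCH` options are standard Slurm directives; the helper name `render_sbatch_script`, the job name, the node and GPU counts, and `train.py` are illustrative placeholders, not values taken from Managed Soperator.

```python
def render_sbatch_script(job_name, nodes, gpus_per_node, command):
    """Return the text of a Slurm batch script for a multi-node GPU job.

    All argument values are supplied by the caller; nothing here is
    specific to Managed Soperator (hypothetical helper for illustration).
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --gpus-per-node={gpus_per_node}",
        "#SBATCH --ntasks-per-node=1",
        # %x expands to the job name and %j to the job ID
        "#SBATCH --output=%x-%j.out",
        "",
        # srun launches the command on every allocated node
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values: a 2-node job with 8 GPUs per node
    print(render_sbatch_script("train-demo", 2, 8, "python train.py"))
```

Saving the output to a file and passing it to `sbatch` on a cluster login node would typically queue the job; the exact workflow on your cluster may differ.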

How it works

A shared root filesystem gives all nodes of the cluster a single file environment, which simplifies package management and makes the cluster easier to scale.

Open source solution

At Nebius, we believe that only together can we create better technologies. That’s why we made Soperator open source, giving ML enthusiasts and HPC practitioners the opportunity to use this technology in their own projects and improve it to fit their needs.

GitHub Repository

Slurm-on-Kubernetes solutions by Nebius

Managed Soperator

Professional Soperator

Soperator

Solution

Slurm-based clusters

Kubernetes operator for Slurm

Delivery model

Self-service app

Professional service

Open-source software

Cloud environment

Nebius

Cloud agnostic

Pre-installed AI/ML drivers and libraries

Yes

All types of containers supported

Yes

Passive health checks

Yes

No

Active health checks

Yes

No

Topology-aware job scheduling

Yes

No

Auto-healing mechanism

Yes

on Nebius cloud only

Free software, consumption-based pricing

Yes

Getting started

Sign up for the console to get started immediately with Managed Soperator, or contact our team for Professional Soperator where we handle complete installation and configuration.

Get started Contact us

Related services

Managed Service for Kubernetes®

A fully managed container orchestrator optimized for modern AI workloads.

Compute

Virtual machines and block storage for any AI and ML workloads.

Questions and answers

Slurm is an open-source, fault-tolerant and highly scalable cluster management and job scheduling system for large and small Linux clusters.

Learn more

Slurm is a registered trademark of SchedMD LLC.
