Webinar. How Slurm meets Kubernetes: introducing Soperator

Managing distributed multi-node ML training on Slurm can be challenging. Soperator, our open-source Kubernetes operator for Slurm, offers a streamlined solution for ML and HPC engineers, making it easier to manage and scale workloads.

Join our live webinar, where we’ll demonstrate how Soperator can manage a multi-node GPU cluster to simplify operations and boost productivity.

Wednesday, December 4, 18:00 (UTC+1)

Who it’s for

  • ML engineers running distributed training.

  • HPC professionals managing large-scale workloads.

  • DevOps teams supporting ML and HPC environments.

In this webinar, you’ll learn how Soperator:

  • Simplifies workload management across multiple GPU nodes.

  • Utilizes a shared root filesystem to reduce setup and scaling complexity.

  • Delivers Slurm job scheduling functionality in a modern and convenient form.

When
Wednesday, December 4, 18:00 (UTC+1). We’ll finish around 19:00, after the Q&A session.

Where
Zoom. You will receive the link after registration.

Hosts

Mikhail Mokrushin

Managed Schedulers Team Leader

Alexander Kim

Solutions Architect