Webinar. How Slurm meets Kubernetes: introducing Soperator
Managing distributed multi-node ML training on Slurm can be challenging. Soperator, our open-source Kubernetes operator for Slurm, offers a streamlined solution for ML and HPC engineers, making it easier to manage and scale workloads.
Join our live webinar, where we’ll demonstrate how Soperator can manage a multi-node GPU cluster to simplify operations and boost productivity.
December 4, Wednesday, 18:00 (UTC+1)
Who should attend
ML engineers running distributed training, HPC professionals managing large-scale workloads, and DevOps teams supporting ML and HPC environments.
In this webinar, you’ll learn how this solution:
- Simplifies workload management across multiple GPU nodes.
- Utilizes a shared root filesystem to reduce setup and scaling complexity.
- Delivers Slurm job scheduling functionality in a modern and convenient form.
When
December 4, Wednesday, 18:00 (UTC+1). We’ll finish around 19:00, after the Q&A.
Where
Zoom. You will receive the link after registration.