Announcing Managed Service for MLflow in public preview
MLflow is a widely adopted industry tool that streamlines workflows across the model development cycle. We are making MLflow more accessible to a broad audience of ML practitioners by offering it as a managed solution.
ML model development is a time-consuming and complicated process. It involves tracking and controlling many parameters and variables, and it requires well-synchronized collaboration between the different roles on your ML team. Although training models and deploying ML pipelines in-house is quickly becoming common practice, it remains far more complex than traditional software development.
For many ML teams, MLflow has become a favorite tool for streamlining workflows in the model development cycle. Having seen the struggles of many ML practitioners, we decided to make MLflow more accessible to a broad audience of ML enthusiasts by bringing the tool to the cloud.
Today, we are excited to introduce the public preview of Managed Service for MLflow.
With MLflow on Nebius, your ML team will get:
- A fully managed and ready-to-work solution that enables you to deliver production-ready models faster with zero infrastructure maintenance
- A transparent ML pipeline that gives your ML engineers precise control over the model development process
- An efficient collaboration tool that lets you organize and seamlessly share training outcomes and model assets across the team
Comprehensive platform for ML workflows
MLflow is open-source software created to simplify the model development process and help ML teams deliver predictable results in an inherently uncertain environment. It has a modular structure in which functionality is defined by components: MLflow Tracking and MLflow Model Registry are the most fundamental ones, providing experiment tracking and model management capabilities respectively. Today, the MLflow community keeps extending the product with more advanced components for generative AI and LLM-focused workflows.
Managed Service for MLflow provides the key functionality necessary for the modern MLOps lifecycle: reproducibility of ML experiments, efficient model management, and improved cross-functional collaboration.
Figure 1. How Managed Service for MLflow works
Streamlined experimentation process
Unlike conventional software development, building ML models involves more uncertainty and requires a lot of trial and error. Training also runs in a costly compute environment, so every additional attempt has a noticeable impact on the project budget.
Managed Service for MLflow captures metadata from your training cluster and makes the parameters of every run and experiment visible to your ML engineers. This spares your team a lot of guesswork and lets them reach the desired results in fewer iterations.
Figure 2. Experiments in Managed Service for MLflow
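For illustration, here is a minimal sketch of what run tracking looks like from a training script using the standard MLflow Python client. The experiment name, run name, and logged values are hypothetical placeholders, not part of the Nebius setup.

```python
import mlflow

# Group related runs under one experiment (name is a hypothetical placeholder).
mlflow.set_experiment("image-classifier-baseline")

with mlflow.start_run(run_name="resnet50-lr-3e-4"):
    # Log the hyperparameters you want to compare across runs.
    mlflow.log_param("learning_rate", 3e-4)
    mlflow.log_param("batch_size", 256)

    # ... training loop would go here ...

    # Log metrics per epoch so they appear as curves in the MLflow UI.
    for epoch, val_acc in enumerate([0.71, 0.78, 0.81]):
        mlflow.log_metric("val_accuracy", val_acc, step=epoch)
```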
Convenient ML model management
MLflow Model Registry collects and stores all parameters of your models in a single shared space. This model catalog contains detailed information about training outcomes, including model history, lineage, performance, hyperparameters, labels, and more. The Model Registry makes it easier to see which model is currently in production and which is in testing, and to build comprehensive CI/CD pipelines.
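As a hedged sketch (not a prescribed Nebius workflow), registering a model version and promoting it between stages with the MLflow client looks roughly like this; the model name and run ID are placeholders.

```python
import mlflow
from mlflow.tracking import MlflowClient

# Register the model logged by a finished run as a new version
# in the Model Registry (run ID and model name are placeholders).
result = mlflow.register_model(
    model_uri="runs:/<run-id>/model",
    name="churn-predictor",
)

# Promote the new version to Staging; earlier versions remain in the
# registry with their full history and lineage.
client = MlflowClient()
client.transition_model_version_stage(
    name="churn-predictor",
    version=result.version,
    stage="Staging",
)
```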
Space for collaborative work
As a central source of data about your ML pipeline, MLflow is a convenient tool for organizing collaboration across the ML lifecycle. Product managers, ML engineers, data scientists, and MLOps engineers all get comprehensive, up-to-date information about every iteration your models go through.
Cloud-based MLflow on Nebius
Managed Service for MLflow is available as a cloud-hosted solution, so you can take full advantage of the tool without worrying about hosting, databases, security, or updates. Even if your team lacks MLOps expertise, you get a ready-to-go service with minimal onboarding effort.
At Nebius, we understand how costly and difficult it is to organize and launch the ML pipeline. That’s why we use our experience and expertise to take care of infrastructure maintenance, freeing up ML practitioners from having to manage server-related issues.
From the user's perspective, it works like traditional SaaS: you activate the product in the cloud console and start using it. All dependent services, such as virtual machines and storage, are provisioned automatically under the hood.
Getting started
Managed Service for MLflow works with endpoints in Nebius as well as with external endpoints in other public clouds. You just need to insert a line of code into your training script, as in the sketch below. If you already have a GPU cluster running on Nebius, you can seamlessly connect MLflow to it and start capturing metadata without any risk of disruption or performance degradation.
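A minimal sketch of that change, assuming your script already uses the MLflow client: point it at the managed endpoint (the URI below is a placeholder for the address shown in the console) and, optionally, turn on autologging.

```python
import mlflow

# Placeholder: replace with the endpoint of your Managed MLflow instance.
mlflow.set_tracking_uri("https://<your-managed-mlflow-endpoint>")

# Optional: autologging captures framework-level parameters, metrics, and
# models without further changes to the training code.
mlflow.autolog()
```

Everything the script logs after this point shows up in the managed MLflow UI.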
Managed Service for MLflow is now in preview and available to customers free of charge. We plan to extend the product with more functional blocks, additional integrations, and deeper interconnection with other Nebius cloud services.