Using SkyPilot and Kubernetes for multi-node fine-tuning of Llama 3.1
When adapting a large language model to your domain or specialized application, you want efficiency and a reasonable degree of simplicity. The combination of Managed Kubernetes and SkyPilot that we walk through today provides exactly that, although it is not the only option. Meta Llama-3.1-8B is just an example here; you can apply the same method to many other LLMs.
This tutorial guides you through setting up distributed multi-node fine-tuning of LLMs using our Managed Kubernetes and SkyPilot. You’ll learn how to:
Deploy a Kubernetes cluster optimized for AI training.
Set up distributed fine-tuning of LLMs.
Monitor training progress and resources.
The benefits of this approach include reduced operational overhead, improved scalability and performance, cost-effective resource utilization and simple management of training jobs and resources.
Managed Service for Kubernetes is a fully managed container orchestration service that simplifies deploying and scaling containerized applications. It handles infrastructure management, supports AI workloads and provides easy-to-use logging and monitoring. This allows ML teams to focus on core tasks rather than managing infrastructure. One of our previous blog posts elaborates on this idea.
SkyPilot is an open-source framework for running machine learning and batch jobs on any cloud or Kubernetes cluster. It simplifies deploying and managing AI workloads by abstracting infrastructure complexities. Key features include automatic cloud selection, managed spot instances and easy scaling of distributed tasks. Built on Ray, SkyPilot enables seamless distributed training across multiple nodes. Its integration with Kubernetes allows running tasks on both cloud and on-premises clusters, offering a comprehensive solution for efficient and cost-effective AI workload management.
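To make the workflow concrete, here is a minimal sketch of how a multi-node job can be launched with the SkyPilot CLI once the Kubernetes cluster is available; the cluster name, node count, GPU type and task file are illustrative placeholders rather than values prescribed by this tutorial's configuration:
# Verify which infrastructure (clouds or Kubernetes) SkyPilot can use
sky check
# Launch an illustrative two-node job with 8x H100 per node; task.yaml is a placeholder
sky launch -c llama31 --num-nodes 2 --gpus H100:8 task.yaml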
The Nebius Solution Library provides a complete Terraform configuration for provisioning a Kubernetes cluster optimized for AI training. The configuration includes:
Unless you change the node settings, the number of nodes of each type will be the same as in the code block above. For more detailed info about available VM types, visit this docs page.
2. Initialize and deploy using the provided environment script:
# Initialize the environment and set access tokens
source ./environment.sh
# Initialize Terraform and deploy the cluster
terraform init
terraform apply # this will take a while
Observe the created resources in the console:
Kubernetes cluster
Node groups
3. Configure kubectl after deployment and check the Kubernetes context:
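How the kubeconfig reaches your machine depends on the Solution Library outputs, so the following is only a minimal sketch that assumes a context for the new cluster already exists; the context name is a placeholder:
# List available contexts and switch to the newly created cluster
kubectl config get-contexts
kubectl config use-context <your-cluster-context>
# Confirm that all nodes, including the GPU node group, are Ready
kubectl get nodes -o wide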
To save the fine-tuned model, transfer the output files from the cluster to Nebius’ S3-compatible Object Storage:
SSH into the SkyPilot cluster: ssh llama31
Inside the cluster, run:
# See https://docs.nebius.com/iam/service-accounts/access-keys/#configure for how to obtain the required credentials
aws configure set aws_access_key_id "${NEBIUS_ACCESS_KEY_ID}"
aws configure set aws_secret_access_key "${NEBIUS_SECRET_ACCESS_KEY}"
aws configure set region 'eu-north1'
aws configure set endpoint_url 'https://storage.eu-north1.nebius.cloud:443'
aws s3 cp /mnt/data/${MODEL_SIZE}-lora-output s3://your-nebius-bucket/${MODEL_SIZE}-lora --recursive
Alternatively, this command could be added at the end of the run section of the SkyPilot task configuration for automatic upload post-training.
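To confirm that the upload succeeded, you can list the bucket from any machine where the same credentials and endpoint are configured; the bucket name is a placeholder:
# List the uploaded adapter files
aws s3 ls s3://your-nebius-bucket/ --recursive --endpoint-url 'https://storage.eu-north1.nebius.cloud:443'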
After fine-tuning your LLMs with LoRA adapters, you may need a platform to host them for scalable and efficient inference. Nebius AI Studio is designed specifically for this purpose: it aims to provide the most cost-efficient per-token inference of open-source models on the market. Supporting more than 30 base models, Studio allows you to perform inference at any scale, and the infrastructure automatically adapts to your current load.
Nebius AI Studio includes a per-token inference feature for LoRA adapters, currently available in preview mode. You can request access to this feature via the Studio interface. Integrating this capability enables you to streamline the deployment of your fine-tuned models, guaranteeing both scalability and cost-effectiveness in production environments.
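To monitor training progress, check the job queue of the SkyPilot cluster: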
$ sky queue
Fetching and parsing job queue...
Job queue of cluster llama31
ID NAME SUBMITTED STARTED DURATION RESOURCES STATUS LOG
1 - 10 mins ago 10 mins ago 10m 11s 2x[H100:8] RUNNING ~/sky_logs/sky-2024-10-24-18-46-36-728907
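While the job is running, you can stream its output and check GPU utilization with standard SkyPilot and SSH commands; the cluster name and job ID below match the queue output above:
# Stream the logs of job 1 on the llama31 cluster
sky logs llama31 1
# Check GPU utilization on the cluster's head node
ssh llama31 nvidia-smi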