We’re excited to announce our integration with SkyPilot, an open-source framework that simplifies running AI and batch workloads across cloud platforms. This collaboration enables direct access to Nebius AI Cloud resources via SkyPilot: if you’re already using SkyPilot, or considering it for your workloads, you can now use our AI Cloud through the same interface. It’s yet another way to access Nebius, alongside our API, CLI, Terraform recipes and other tools.
SkyPilot is a framework designed to optimize GPU availability and reduce costs when running AI workloads. It provides unified access to cloud resources through simple configuration files.
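SkyPilot is distributed as a regular Python package. A minimal setup sketch — the `skypilot[nebius]` extra name and the `sky check nebius` subcommand are assumptions to verify against the SkyPilot documentation for your version; the guards make the snippet safe to run even offline:

```shell
# Hypothetical quickstart -- the "skypilot[nebius]" extra name is an assumption;
# check the SkyPilot installation docs for your version.
pip install --quiet "skypilot[nebius]" || true   # guarded: no-op if offline

if command -v sky >/dev/null 2>&1; then
  SKY_STATUS="installed"
  sky check nebius || true   # reports whether Nebius credentials are detected
else
  SKY_STATUS="not installed"
fi
echo "SkyPilot: $SKY_STATUS"
```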
Create a simple YAML configuration file named sky.yaml:
```yaml
resources:
  cloud: nebius
  accelerators: H100:8
  region: eu-north1

file_mounts:
  /my_data:
    source: nebius://my-nebius-bucket  # must be unique; replace with your own bucket name

setup: |
  echo "Setup will be executed on every `sky launch` command on all nodes"

run: |
  echo "Run will be executed on every `sky exec` command on all nodes"
  echo "Do we have GPUs?"
  nvidia-smi
  echo "Do we have data?"
  ls -l /my_data
```
Launch your job:
$ sky launch -c sky-test sky.yaml
YAML to run: sky.yaml
Considered resources (1 node):
--------------------------------------------------------------------------------------------------------------
CLOUD INSTANCE vCPUs Mem(GB) ACCELERATORS REGION/ZONE COST ($) CHOSEN
--------------------------------------------------------------------------------------------------------------
Nebius gpu-h100-sxm_8gpu-128vcpu-1600gb 128 1600 H100:8 eu-north1 23.60 ✔
--------------------------------------------------------------------------------------------------------------
Launching a new cluster 'sky-test'. Proceed? [Y/n]: Y
⚙︎ Launching on Nebius eu-north1.
└── Instance is up.
✓ Cluster launched: sky-test. View logs: sky api logs -l sky-2025-02-27-11-36-07-257438/provision.log
⚙︎ Syncing files.
✓ Setup detached.
⚙︎ Job submitted, ID: 1
├── Waiting for task resources on 1 node.
└── Job started. Streaming logs... (Ctrl-C to exit log streaming; job will not be killed)
...
(setup pid=3800) Setup will be executed on every command on all nodes
(task, pid=3800) Command 'sky' not found, but can be installed with:
(task, pid=3800) sudo apt install beneath-a-steel-sky
(task, pid=3800) Run will be executed on every command on all nodes
(task, pid=3800) Do we have GPUs?
(task, pid=3800) Mon Mar 24 12:35:14 2025
(task, pid=3800) +-----------------------------------------------------------------------------------------+
(task, pid=3800) | NVIDIA-SMI 550.127.08 Driver Version: 550.127.08 CUDA Version: 12.4 |
(task, pid=3800) |-----------------------------------------+------------------------+----------------------|
(task, pid=3800) | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
(task, pid=3800) | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
(task, pid=3800) | | | MIG M. |
(task, pid=3800) |=========================================+========================+======================|
(task, pid=3800) | 0 NVIDIA H100 80GB HBM3 On | 00000000:8A:00.0 Off | 0 |
(task, pid=3800) | N/A 27C P0 67W / 700W | 1MiB / 81559MiB | 0% Default |
(task, pid=3800) | | | Disabled |
(task, pid=3800) +-----------------------------------------+------------------------+----------------------+
(task, pid=3800) +-----------------------------------------------------------------------------------------+
(task, pid=3800) | Processes: |
(task, pid=3800) | GPU GI CI PID Type Process name GPU Memory Usage |
(task, pid=3800) |=========================================================================================|
(task, pid=3800) | No running processes found |
(task, pid=3800) +-----------------------------------------------------------------------------------------+
(task, pid=3800) Do we have data?
(task, pid=3800) total 377487364
(task, pid=3800) -rw-r--r-- 1 ubuntu ubuntu 32212254720 Mar 10 14:21 file_1
(task, pid=3800) -rw-r--r-- 1 ubuntu ubuntu 32212254720 Mar 10 14:25 file_10
(task, pid=3800) -rw-r--r-- 1 ubuntu ubuntu 32212254720 Mar 10 14:26 file_11
(task, pid=3800) -rw-r--r-- 1 ubuntu ubuntu 32212254720 Mar 10 14:26 file_12
(task, pid=3800) -rw-r--r-- 1 ubuntu ubuntu 32212254720 Mar 10 14:21 file_2
(task, pid=3800) -rw-r--r-- 1 ubuntu ubuntu 32212254720 Mar 10 14:22 file_3
(task, pid=3800) -rw-r--r-- 1 ubuntu ubuntu 32212254720 Mar 10 14:22 file_4
(task, pid=3800) -rw-r--r-- 1 ubuntu ubuntu 32212254720 Mar 10 14:23 file_5
(task, pid=3800) -rw-r--r-- 1 ubuntu ubuntu 32212254720 Mar 10 14:23 file_6
(task, pid=3800) -rw-r--r-- 1 ubuntu ubuntu 32212254720 Mar 10 14:24 file_7
(task, pid=3800) -rw-r--r-- 1 ubuntu ubuntu 32212254720 Mar 10 14:24 file_8
(task, pid=3800) -rw-r--r-- 1 ubuntu ubuntu 32212254720 Mar 10 14:25 file_9
(task, pid=3800) drwxr-xr-x 2 ubuntu ubuntu 4096 Mar 24 12:35 joblogs
✓ Job finished (status: SUCCEEDED).
Here’s a brief overview of what SkyPilot just did:
- Provisioned a Nebius AI Cloud instance.
- Mounted the my-nebius-bucket cloud bucket to /my_data on the VM.
- Executed the setup and run commands to verify GPU availability and access to the mounted data.
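A side note on the "Command 'sky' not found" lines in the log: the echo strings in the YAML wrap `sky launch` and `sky exec` in backticks, and inside double quotes the shell treats backticks as command substitution — so the VM (which has no sky CLI) tried to run `sky launch` as a command and dropped the text from the output. Escaping the backticks keeps them literal:

```shell
# Inside double quotes, `...` is command substitution; backslash-escape the
# backticks to print them literally instead of executing `sky launch`.
msg="Setup will be executed on every \`sky launch\` command on all nodes"
echo "$msg"
```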
High-performance NVIDIA Quantum InfiniBand connections make Nebius AI Cloud ideal for distributed training. Here’s a sample configuration for testing network connectivity:
```yaml
resources:
  cloud: nebius
  accelerators: H100:8
  region: eu-north1

num_nodes: 2

setup: |
  sudo apt install perftest -y

run: |
  MASTER_ADDR=$(echo "$SKYPILOT_NODE_IPS" | head -n1)
  if [ "${SKYPILOT_NODE_RANK}" == "0" ]; then
    ib_send_bw --report_gbits -n 1000 -F > /dev/null
  elif [ "${SKYPILOT_NODE_RANK}" == "1" ]; then
    echo "Testing connection to: $MASTER_ADDR"
    sleep 2  # Wait for master to start
    ib_send_bw $MASTER_ADDR --report_gbits -n 1000 -F
  fi
```
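The run section relies on two environment variables that SkyPilot injects on every node: SKYPILOT_NODE_IPS (the cluster's IP addresses, one per line, head node first) and SKYPILOT_NODE_RANK (this node's zero-based index). A standalone sketch of the head-address lookup, using illustrative IPs:

```shell
# Illustrative values -- on a real cluster SkyPilot injects these variables.
SKYPILOT_NODE_IPS="$(printf '192.168.0.14\n192.168.0.15')"  # head node first
SKYPILOT_NODE_RANK=1                                        # this node's index

# The first line of SKYPILOT_NODE_IPS is the head node, as in the run section:
MASTER_ADDR=$(echo "$SKYPILOT_NODE_IPS" | head -n1)
echo "$MASTER_ADDR"   # prints 192.168.0.14
```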
$ sky launch -c test-sky sky.yaml
YAML to run: sky.yaml
Running on cluster: test-sky
⚙︎ Launching on Nebius eu-north1.
...
(worker1, rank=1, pid=33870, ip=192.168.0.15) ----------------------------------------------------------------
(worker1, rank=1, pid=33870, ip=192.168.0.15) Send BW Test
(worker1, rank=1, pid=33870, ip=192.168.0.15) Dual-port : OFF Device : mlx5_0
(worker1, rank=1, pid=33870, ip=192.168.0.15) Number of qps : 1 Transport type : IB
(worker1, rank=1, pid=33870, ip=192.168.0.15) Connection type : RC Using SRQ : OFF
(worker1, rank=1, pid=33870, ip=192.168.0.15) PCIe relax order: ON
(worker1, rank=1, pid=33870, ip=192.168.0.15) ibv_wr* API : ON
(worker1, rank=1, pid=33870, ip=192.168.0.15) TX depth : 128
(worker1, rank=1, pid=33870, ip=192.168.0.15) CQ Moderation : 1
(worker1, rank=1, pid=33870, ip=192.168.0.15) Mtu : 4096[B]
(worker1, rank=1, pid=33870, ip=192.168.0.15) Link type : IB
(worker1, rank=1, pid=33870, ip=192.168.0.15) Max inline data : 0[B]
(worker1, rank=1, pid=33870, ip=192.168.0.15) rdma_cm QPs : OFF
(worker1, rank=1, pid=33870, ip=192.168.0.15) Data ex. method : Ethernet
(worker1, rank=1, pid=33870, ip=192.168.0.15) ----------------------------------------------------------------
(worker1, rank=1, pid=33870, ip=192.168.0.15) local address: LID 0x1334 QPN 0x0131 PSN 0xcdddde
(worker1, rank=1, pid=33870, ip=192.168.0.15) remote address: LID 0x132f QPN 0x0131 PSN 0x90f79b
(worker1, rank=1, pid=33870, ip=192.168.0.15) ----------------------------------------------------------------
(worker1, rank=1, pid=33870, ip=192.168.0.15) #bytes #iterations BW peak[Gb/sec] BW average[Gb/sec] MsgRate[Mpps]
(worker1, rank=1, pid=33870, ip=192.168.0.15) 65536 1000 361.82 361.67 0.689839
(worker1, rank=1, pid=33870, ip=192.168.0.15) ----------------------------------------------------------------
✓ Job finished (status: SUCCEEDED).
We are now accepting pre-orders for NVIDIA GB200 NVL72 and NVIDIA HGX B200 clusters to be deployed in our data centers in the United States and Finland from early 2025. Based on NVIDIA Blackwell, the architecture to power a new industrial revolution of generative AI, these new clusters deliver a massive leap forward over existing solutions.