
Nebius partners with Positronic on Physical AI Leaderboard (PhAIL)
Physical AI is moving fast, but the field has lacked a rigorous, real-world benchmark to measure progress. Demo videos and lab success rates tell only part of the story. The operators who decide whether to deploy robotics at scale need harder numbers: throughput, reliability and reproducibility on genuine commercial tasks.
Today, Nebius is excited to announce our role as a founding consortium partner of the Physical AI Leaderboard (PhAIL).
Unlike existing benchmarks that report abstract success rates, PhAIL measures the metrics that matter on an actual shop floor: Units Per Hour (UPH) and Mean Time Between Failures or Assists (MTBF/A). Every run is recorded and published with synchronized video, robot telemetry and scoring logs, so any result can be independently audited. Positronic developed the evaluation methodology and operates the benchmark rigs. The inaugural results, including comparisons to human and teleoperated baselines, are live now at phail.ai.
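To make the two metrics concrete, here is a minimal Python sketch of how they could be computed from a single run summary. The `RunLog` fields and both formulas are assumptions for illustration (UPH as completed units divided by elapsed hours, MTBF/A as run time divided by the number of failures or assists); the exact definitions are those in the PhAIL methodology, and the numbers below are made up, not benchmark results.

```python
from dataclasses import dataclass


@dataclass
class RunLog:
    """Hypothetical summary of one evaluation run (field names are assumptions)."""
    duration_s: float      # wall-clock run time, in seconds
    units_completed: int   # items the robot handled successfully
    interventions: int     # failures plus human assists during the run


def units_per_hour(run: RunLog) -> float:
    """Throughput: completed units divided by elapsed hours."""
    return run.units_completed / (run.duration_s / 3600.0)


def mtbf_a_seconds(run: RunLog) -> float:
    """Mean time between failures or assists, in seconds.

    Returns infinity when the run had no failures or assists.
    """
    if run.interventions == 0:
        return float("inf")
    return run.duration_s / run.interventions


# Illustrative values only: a 30-minute run, 60 units, 3 interventions.
run = RunLog(duration_s=1800.0, units_completed=60, interventions=3)
print(units_per_hour(run))   # 120.0
print(mtbf_a_seconds(run))   # 600.0
```

Reporting both numbers together matters: a model can post a high UPH while needing frequent assists, which a success rate alone would hide.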
As part of the consortium, Nebius will provide its vertically integrated AI infrastructure for fine-tuning and evaluation of robot models. Nebius AI Cloud is well suited for physical AI workloads and includes a managed service for data and compute workflows in robotics. Nebius has integrated NVIDIA OSMO
PhAIL is designed to be open and reproducible from end to end. Positronic publishes a free fine-tuning dataset collected through teleoperated demonstrations, along with open-source training scripts that any team can use to prepare their model for evaluation. The benchmark hardware is a Franka Research 3 arm with a Robotiq 2F-85 gripper in the DROID configuration, a widely available setup that other teams can reproduce. Evaluation is ‘blind’: model checkpoints are rotated randomly so the operator does not know which model is running. Full methodology is documented in the PhAIL white paper.
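The blind rotation can be pictured with a short Python sketch. This is not Positronic's procedure, only one way to get the same property under stated assumptions: the function name `blind_schedule`, the `trial-NNN` labels and the checkpoint names are all hypothetical. Each checkpoint is repeated a fixed number of times, the order is shuffled, and the operator sees only opaque trial labels while the label-to-checkpoint key is kept separate.

```python
import random


def blind_schedule(checkpoints, runs_per_model, seed=None):
    """Return (operator_view, key) for a blinded evaluation session.

    operator_view: opaque trial labels in run order (all the operator sees).
    key: mapping from trial label to checkpoint, kept away from the operator.
    """
    rng = random.Random(seed)
    # One entry per planned run, then shuffle so the order carries no signal.
    trials = [ckpt for ckpt in checkpoints for _ in range(runs_per_model)]
    rng.shuffle(trials)
    key = {f"trial-{i:03d}": ckpt for i, ckpt in enumerate(trials)}
    operator_view = list(key)
    return operator_view, key


# Hypothetical checkpoint names, two runs each.
operator_view, key = blind_schedule(["model-a", "model-b", "model-c"], 2, seed=7)
print(operator_view)  # labels only; the mapping in `key` stays sealed until scoring
```

Keeping the key apart from the run order is what lets scores be matched to models after the fact without the operator's expectations influencing assists or scoring.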
If you’re building physical AI models, the path to participation is open: download the dataset, fine-tune and submit your checkpoint for evaluation on Positronic’s rigs. The consortium launching PhAIL already includes Toloka, the human data infrastructure for frontier AI. If you represent a hardware vendor, simulation platform, academic lab or industry operator and want to help shape what PhAIL measures next, the consortium is actively welcoming new members.
Read Positronic’s full blog post




