Meet SWE-rebench-V2: A multilingual, executable dataset for training Software Engineering Agents

We’re introducing SWE-rebench-V2, the next iteration of our large-scale dataset of reinforcement learning (RL) environments for training autonomous software engineering (SWE) agents.

Despite rapid progress in SWE agents, training remains constrained by limited access to diverse, executable, open-source data. To address this gap, we built a fully automated, language-agnostic pipeline for extracting real-world software engineering tasks at scale, allowing us to collect and release this new dataset of training tasks.

SWE-rebench-V2 includes:

  • 32,000+ executable tasks, each with a pre-built Docker environment;

  • Coverage across 20 programming languages;

  • 100,000+ additional tasks derived from pull requests.

This release is designed to support large-scale, multilingual RL training.

Key features

Executable environments

Every task comes with a pre-built Docker container, enabling reproducible execution out of the box. Environments are configured automatically using a custom interactive setup agent that resolves dependencies and prepares the runtime.
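To make "reproducible execution out of the box" concrete, here is a minimal sketch of how a task's tests might be run inside its pre-built image. The image name, the test command, and the helper names `build_docker_command` and `run_task` are all hypothetical; the post does not describe the dataset's actual image naming or evaluation harness.

```python
import subprocess
from typing import List


def build_docker_command(image: str, test_cmd: str) -> List[str]:
    """Build the argv for running a test command in a throwaway container."""
    return [
        "docker", "run",
        "--rm",               # remove the container once it exits
        "--network", "none",  # assumption: tests should not need network access
        image,
        "sh", "-c", test_cmd,
    ]


def run_task(image: str, test_cmd: str, timeout_s: int = 1800) -> bool:
    """Run the tests and report success as a zero exit code (hypothetical helper)."""
    result = subprocess.run(
        build_docker_command(image, test_cmd),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.returncode == 0
```

Because the dependencies are already baked into the image, a harness along these lines only has to start the container and read the exit code; no per-task installation step is needed at training time.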

Multilingual at scale

The dataset spans 20 programming languages, including both major ecosystems and underrepresented languages such as Lua and Scala. This enables research on cross-language reasoning and improves robustness beyond Python-only datasets.
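One practical consequence of a multilingual dataset is that training batches can be balanced by language so that smaller ecosystems are not drowned out. The sketch below shows one simple way to do that; the `language` field name and the `sample_balanced` helper are assumptions for illustration, not the dataset's documented schema or the authors' sampling strategy.

```python
import random
from collections import defaultdict
from typing import Dict, List


def sample_balanced(tasks: List[Dict], per_language: int, seed: int = 0) -> List[Dict]:
    """Draw up to `per_language` tasks from each language group."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    by_language = defaultdict(list)
    for task in tasks:
        by_language[task["language"]].append(task)  # hypothetical field name

    batch = []
    for language in sorted(by_language):
        group = by_language[language]
        # Take the whole group when it is smaller than the quota.
        batch.extend(rng.sample(group, min(per_language, len(group))))
    return batch
```

With a quota of 2 per language, a pool of ten Python tasks and two Lua tasks would yield two of each, rather than a batch dominated by Python.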

Quality filtering and structured metadata

Tasks are filtered and labeled using an ensemble of LLMs. We also extract method signatures exercised by the test suite and generate problem descriptions for pull request–derived tasks, ensuring complete and structured training signals.
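As an illustration of ensemble labeling, the sketch below aggregates the labels returned by several models with a strict majority vote. This is an assumed aggregation rule chosen for simplicity; the post does not specify how the ensemble's outputs are combined, and the label values and the `ensemble_label` helper are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def ensemble_label(votes: Sequence[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the majority label, or None when agreement is too low."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # Require strictly more than `min_agreement` of the votes to agree.
    return label if count / len(votes) > min_agreement else None


print(ensemble_label(["keep", "keep", "drop"]))  # keep
print(ensemble_label(["keep", "drop"]))          # None (no strict majority)
```

Returning `None` on a split vote lets a pipeline route ambiguous tasks to a stricter filter or drop them, instead of silently accepting a coin flip.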

Technical report

Alongside the dataset, we are releasing a technical report detailing the extraction pipeline, the filtering methodology, and a diagnostic study evaluating modern models on these tasks.

author
Ibragim Badertdinov
Lead ML Engineer at Nebius