SWE-rebench dataset: More than 21,000 verifiable tasks for SWE agents
Our AI R&D team announces the open-source release of SWE-rebench, a dataset of more than 21,000 real-world, interactive software engineering tasks. For the detailed methodology, please see our accompanying technical report on arXiv.
The development of capable LLM-based software engineering (SWE) agents requires large-scale, diverse training data that reflects real-world scenarios, yet such datasets are scarce. At Nebius, one of our aims is to democratize AI and empower developers to build capable agents on top of open models. This is why we are releasing SWE-rebench, the next iteration of our work on curating datasets for agentic software engineering. It addresses this need with tasks mined and validated from thousands of open-source GitHub repositories by our fully automated pipeline.
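To illustrate what "verifiable" means here, the sketch below shows a fail-to-pass check in the style popularized by SWE-bench: a candidate task is kept only if its associated tests fail before the golden patch is applied and pass afterwards. This is a minimal sketch under that assumption, not the actual pipeline code; the helper names and the git-apply step are hypothetical.

```python
# Illustrative sketch of a SWE-bench-style fail-to-pass validation check.
# Function names, the git-apply step, and the test command are assumptions;
# this is not the actual Nebius pipeline code.
import subprocess

def tests_pass(repo_dir: str, test_cmd: list[str]) -> bool:
    """Run the repository's target tests and report whether they pass."""
    result = subprocess.run(test_cmd, cwd=repo_dir, capture_output=True)
    return result.returncode == 0

def is_verifiable(repo_dir: str, patch_file: str, test_cmd: list[str]) -> bool:
    """A task is verifiable if its tests fail before the golden patch
    and pass after it is applied."""
    failed_before = not tests_pass(repo_dir, test_cmd)
    # Apply the golden patch associated with the mined task.
    subprocess.run(["git", "apply", patch_file], cwd=repo_dir, check=True)
    passed_after = tests_pass(repo_dir, test_cmd)
    return failed_before and passed_after
```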
Key features of the SWE-rebench dataset
- Massive scale: More than 21,000 interactive tasks from more than 3,400 GitHub repositories.
- Automated collection: Each task is collected via an automated process powered by a combination of carefully engineered heuristics and LLMs.
- Rich annotations: Includes installation configurations, dependency versions, and LLM-assessed quality scores (see the loading sketch after this list).
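To make the annotations concrete, here is a minimal sketch of loading and inspecting a task with the Hugging Face `datasets` library. The dataset ID, split name, and field names below are assumptions for illustration; the dataset card documents the actual schema.

```python
# A minimal loading sketch; the dataset ID ("nebius/SWE-rebench"), the split,
# and all field names are assumed for illustration. Consult the dataset card
# for the real schema.
from datasets import load_dataset

ds = load_dataset("nebius/SWE-rebench", split="train")  # assumed ID and split

task = ds[0]
# Each record should pair the task itself with the metadata listed above
# (installation configuration, dependency versions, quality scores).
print(task.get("repo"))               # hypothetical field: source GitHub repository
print(task.get("problem_statement"))  # hypothetical field: the task description
```

From there, the LLM-assessed quality scores can be used to filter tasks before training or evaluation.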
Alongside the dataset, we are releasing a technical report on arXiv that describes the task collection and validation methodology in detail.
We believe SWE-rebench will be a valuable resource for training more capable SWE agents and for benchmarking new models on realistic, interactive tasks. As an example, a curated subset of tasks mined with the SWE-rebench methodology already powers our public SWE-rebench leaderboard.