Context: Reproducing game bugs, in our case crash bugs in continuously evolving games like Minecraft, is a notoriously manual and time-consuming process that is difficult to automate. Despite the success of LLM-driven bug reproduction in other software domains, games, with their complex interactive environments, remain largely unaddressed.
Objective: This paper introduces BugCraft, a novel end-to-end framework designed to automate the reproduction of crash bugs in Minecraft directly from user-submitted bug reports, addressing the critical gap in automated game bug reproduction.
Method: BugCraft employs a two-stage approach. First, a Step Synthesizer leverages LLMs and Minecraft Wiki knowledge to transform bug reports into high-quality, structured steps to reproduce (S2R). Second, an Action Model, powered by vision-based LLM agents (GPT-4o and GPT-4.1) and a custom macro API, executes these S2R steps within Minecraft to trigger the reported crash.
Results: Evaluated on BugCraft-Bench, our framework with GPT-4.1 successfully reproduced 34.9% of crash bugs end-to-end, outperforming the strongest baseline by 37% (relative). The Step Synthesizer achieved 66.28% accuracy in generating correct bug reproduction plans, highlighting its effectiveness in interpreting and structuring bug report information.
Figure 1: Overview of the BugCraft framework
Figure 2: The BugCraft framework, illustrating the two-stage process of S2R synthesis and action model execution.
Our framework processes bug reports through a pipeline of preprocessing, step synthesis, and action execution; a minimal code sketch of this two-stage design follows the figures below. The Step Synthesizer employs knowledge augmentation and multi-stage refinement to generate high-quality reproduction steps.
Figure 3: Step Synthesizer Component
Figure 4: Action Model Component
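To make the two-stage design concrete, here is a minimal Python sketch of how such a pipeline could be wired together. All class and method names (StepSynthesizer, ActionModel, plan_action, crash_detected, etc.) are illustrative assumptions, not the actual BugCraft API.

```python
# Illustrative sketch of a two-stage crash-reproduction pipeline.
# Names used here are hypothetical, not the actual BugCraft implementation.
from dataclasses import dataclass


@dataclass
class BugReport:
    title: str
    body: str


class StepSynthesizer:
    """Stage 1: turn a raw bug report into structured steps to reproduce (S2R)."""

    def __init__(self, llm, wiki_knowledge):
        self.llm = llm              # text LLM used for synthesis and refinement
        self.wiki = wiki_knowledge  # scraped Minecraft Wiki content

    def synthesize(self, report: BugReport) -> list[str]:
        context = self.wiki.lookup(report.body)      # knowledge augmentation
        draft = self.llm.draft_s2r(report, context)  # initial S2R draft
        return self.llm.refine_s2r(draft)            # multi-stage refinement


class ActionModel:
    """Stage 2: execute S2R steps in-game via a vision-based agent and a macro API."""

    def __init__(self, vision_llm, macro_api):
        self.agent = vision_llm   # vision-capable LLM (e.g., GPT-4o or GPT-4.1)
        self.macros = macro_api   # custom macro API controlling the game

    def execute(self, steps: list[str]) -> bool:
        for step in steps:
            screenshot = self.macros.capture_screen()
            action = self.agent.plan_action(step, screenshot)
            self.macros.run(action)
        return self.macros.crash_detected()  # True if the reported crash occurred
```

In this shape, the S2R list is the only interface between the two stages, which keeps each component independently testable and swappable.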
Our evaluation demonstrates that BugCraft with GPT-4.1 successfully reproduced 34.9% of crash bugs end-to-end on the BugCraft-Bench dataset (86 valid reports), outperforming the strongest baseline by 37% (relative). This highlights the potential of LLM agents for automated bug reproduction in complex game environments.
- GPT-4.1 success rate: 34.9% (30 out of 86)
- GPT-4o success rate: 30.2% (26 out of 86)
- Oracle coverage (both models combined): 41.9% (36 out of 86)
When combining both models, 20 bugs were reproduced by both, 10 only by GPT-4.1, and 6 only by GPT-4o, demonstrating complementary strengths.
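As a quick arithmetic check, these overlap counts reproduce the per-model and combined (oracle) figures reported above; the numbers below are taken directly from the text.

```python
# Sanity check: per-model success counts and oracle (union) coverage
# recomputed from the reported overlap between GPT-4.1 and GPT-4o.
both, only_gpt41, only_gpt4o, total = 20, 10, 6, 86

gpt41 = both + only_gpt41                # 30
gpt4o = both + only_gpt4o                # 26
oracle = both + only_gpt41 + only_gpt4o  # 36

for name, n in [("GPT-4.1", gpt41), ("GPT-4o", gpt4o), ("Oracle", oracle)]:
    print(f"{name}: {n}/{total} = {n / total:.1%}")
# GPT-4.1: 30/86 = 34.9%
# GPT-4o: 26/86 = 30.2%
# Oracle: 36/86 = 41.9%
```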
- Step Synthesizer accuracy: 66.28% (57 out of 86 reports)
- Inter-rater agreement: Cohen's kappa = 0.70, percentage agreement = 83.0%
BugCraft outperformed all baseline models, achieving a 37% relative improvement over OpenAI's Computer Use Agent and a 10-fold cost reduction compared to human experts.
| System | Success Rate | Time | Cost |
|---|---|---|---|
| Human Expert | ~83% | 20 min | $28.20 |
| BugCraft (GPT-4.1) | 34.9% | 10 min | $1.16 |
| BugCraft (GPT-4o) | 30.2% | 15.56 min | $1.45 |
| OpenAI CUA | 25.5% | 6.37 min | $0.65 |
| UI-TARS-1.5-7B | 0.0% | 3.27 min | $0.02 |
For detailed failure analysis and comprehensive breakdowns, please refer to our full paper.
We introduce BugCraft-Bench, a curated dataset of 86 Minecraft crash bug reports, carefully selected and validated for reproducibility. This dataset serves as a benchmark for evaluating automated bug reproduction systems in game environments.
To enhance our framework with comprehensive game knowledge, we scraped the official Minecraft Wiki using the MediaWiki API. The extracted content is processed and used for fuzzy matching during bug reproduction, reducing reliance on LLM pre-training data.
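A minimal sketch of how wiki pages could be fetched and fuzzy-matched against terms from a bug report is shown below. The API base URL, page titles, and helper names are assumptions for illustration; the actual BugCraft scraping and matching code may differ.

```python
# Sketch of wiki scraping + fuzzy matching; not the exact BugCraft implementation.
# The endpoint URL and example titles are assumptions for illustration.
import difflib
import requests

WIKI_API = "https://minecraft.wiki/api.php"  # assumed MediaWiki endpoint


def fetch_wikitext(title: str) -> str:
    """Fetch a page's raw wikitext via the standard MediaWiki parse API."""
    params = {"action": "parse", "page": title, "prop": "wikitext", "format": "json"}
    resp = requests.get(WIKI_API, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["parse"]["wikitext"]["*"]


def match_wiki_pages(term: str, known_titles: list[str], n: int = 3) -> list[str]:
    """Fuzzy-match a term from a bug report against scraped wiki page titles."""
    return difflib.get_close_matches(term, known_titles, n=n, cutoff=0.6)


# Example: map a (misspelled) term mentioned in a bug report to candidate articles.
titles = ["Crossbow", "Crafting Table", "Command Block", "Chunk format"]
print(match_wiki_pages("comand block", titles))  # -> ['Command Block']
```

difflib.get_close_matches from the Python standard library is one lightweight option for the fuzzy-matching step; a dedicated string-matching library could be substituted without changing the overall flow.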
Access our curated dataset and contribute to advancing automated game bug reproduction research.
If you use BugCraft in your research, please cite our paper:
@inproceedings{Yapagci2025Agents,
author = {Yapagci, Eray and Ozturk, Yavuz Alp Sencer and Tuzun, Eray},
title = {{Agents in the Sandbox: End-to-End Crash Bug Reproduction for Minecraft}},
year = {2025},
booktitle = {Proceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering (ASE)},
note = {To appear. Code available at \url{https://bugcraft2025.github.io}}
}
For more information, please contact: