DI-BENCH: Benchmarking Large Language Models on Dependency Inference with Testable Repositories at Scale

Linghao Zhang; Junhao Wang; Shilin He; Chaoyun Zhang; Yu Kang; Bowen Li; Jiaheng Wen; Chengxing Xie; Maoquan Wang; Yufan Huang; Elsie Nallipogu; Qingwei Lin; Yingnong Dang; Saravan Rajmohan; Dongmei Zhang; Qi Zhang

ACL 2025


Abstract

Large Language Models have advanced automated software development; however, correctly inferring dependencies, that is, identifying the internal components and external packages a repository requires to run successfully, remains a challenge. Existing studies show that dependency-related issues cause over 40% of the runtime errors observed in generated repositories. To address this, we introduce DI-BENCH, a large-scale benchmark and evaluation framework specifically designed to assess LLMs' capability for dependency inference. The benchmark features 581 repositories with testing environments across Python, C#, Rust, and JavaScript. Extensive experiments with textual and execution-based metrics reveal that the current best-performing model achieves only a 48% execution pass rate on Python, indicating significant room for improvement. DI-BENCH offers a new perspective for evaluating LLM performance at the repository level, paving the way for more robust end-to-end software synthesis.
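The abstract names textual and execution-based metrics without defining them here. As a hedged illustration only, the sketch below assumes a textual metric computed as set-level precision, recall, and F1 between a model's predicted dependency names and a reference list, and an execution pass rate taken as the fraction of repositories whose tests pass with the inferred dependencies. The functions `normalize`, `dependency_scores`, and `execution_pass_rate` are hypothetical and are not DI-BENCH's actual evaluation code; the name normalization borrows the PEP 503 convention purely for illustration.

```python
# Hypothetical sketch: NOT DI-BENCH's actual evaluation code.
import re
from dataclasses import dataclass


def normalize(name: str) -> str:
    # PEP 503-style normalization (assumption): lowercase and collapse
    # runs of '-', '_' and '.' into a single '-'.
    return re.sub(r"[-_.]+", "-", name).lower()


@dataclass(frozen=True)
class DepScores:
    precision: float
    recall: float
    f1: float


def dependency_scores(predicted: list[str], reference: list[str]) -> DepScores:
    """Set-level precision/recall/F1 of predicted vs. reference dependency names."""
    pred = {normalize(n) for n in predicted}
    ref = {normalize(n) for n in reference}
    if not pred and not ref:
        # Nothing to infer and nothing predicted: treat as a perfect match.
        return DepScores(1.0, 1.0, 1.0)
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return DepScores(precision, recall, f1)


def execution_pass_rate(test_passed: list[bool]) -> float:
    """Fraction of repositories whose test suite passed (hypothetical definition)."""
    return sum(test_passed) / len(test_passed) if test_passed else 0.0


if __name__ == "__main__":
    scores = dependency_scores(["numpy", "Requests", "PyYAML"], ["numpy", "requests", "torch"])
    print(scores)  # precision, recall and F1 are each 2/3 here
    print(execution_pass_rate([True, False, True, True]))  # 0.75
```

Set-level scoring like this ignores version constraints and the language-specific build files (for example `Cargo.toml` for Rust or `package.json` for JavaScript); it only conveys the intuition that a textual match can look good while the repository still fails to run, which is why an execution-based metric is reported alongside it.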



Topics

Artificial Intelligence > Core AI > Foundation Models; Natural Language Processing > Applications > Information Retrieval

Keywords

benchmark, evaluation, software development, large language model, dependency inference
