LibEvolutionEval: A Benchmark and Study for Version-Specific Code Generation

Sachit Kuhar; Wasi Uddin Ahmad; Zijian Wang; Nihal Jain; Haifeng Qian; Baishakhi Ray; Murali Krishna Ramanathan; Xiaofei Ma; Anoop Deoras

2025 NAACL NAACL 2025

LibEvolutionEval: A Benchmark and Study for Version-Specific Code Generation

Abstract

AbstractRecent advancements in code completion models have primarily focused on local file contexts. However, these studies do not fully capture the complexity of real-world software development, which often requires the use of rapidly-evolving public libraries. To address this gap, we introduce LibEvolutionEval, a comprehensive study that emphasizes the need to understand library evolution to perform accurate in-line code completions. LibEvolutionEvaloffers a version-specific code-completion task across eight libraries as they evolve over the years, along with an in-depth analysis of the evolution of two widely used and well-maintained public libraries: PyTorch and Matplotlib. We evaluate several popular models and find that public library evolution significantly affects their performance. To mitigate this, we explored how retrieving version-specific library documentation and prompt-based techniques can enhance model capability in dealing with these fast-evolving packages. This suggests a promising path forward for better handling fast-evolving libraries. Our tasks will be made publicly available upon acceptance.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🧭 Keyword Pioneer — library evolution

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio