Software Engineering

567 directly classified papers

Papers

PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving EMNLP 2025

What Can Youth Learn About Artificial Intelligence and Machine Learning in One Hour? Examining How Hour of Code Activities Address the Five Big Ideas of AI AAAI 2025

Revisit Self-Debugging with Self-Generated Tests for Code Generation ACL 2025

LLM-Assisted Translation of Legacy FORTRAN Codes to C++: A Cross-Platform Study NAACL 2025

FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation ACL 2025

CAD-Recode: Reverse Engineering CAD Code from Point Clouds ICCV 2025

What Do Machine Learning Researchers Mean by “Reproducible”? AAAI 2025

Deriving Semantic Checkers from Tests to Detect Silent Failures in Production Distributed Systems OSDI 2025

What can Large Language Models Capture about Code Functional Equivalence? NAACL 2025

Benchmarking Long-Context Language Models on Long Code Understanding ACL 2025

TestEval: Benchmarking Large Language Models for Test Case Generation NAACL 2025

CoRet: Improved Retriever for Code Editing ACL 2025

VisualCoder: Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning NAACL 2025

Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs? ACL 2025

CodeRAG-Bench: Can Retrieval Augment Code Generation? NAACL 2025

More Than a Score: Probing the Impact of Prompt Specificity on LLM Code Generation IJCNLP 2025

Hallucinations in Code Change to Natural Language Generation: Prevalence and Evaluation of Detection Metrics IJCNLP 2025

DSBC : Data Science task Benchmarking with Context engineering IJCNLP 2025

Automating the Expansion of Instrument Typicals in Piping and Instrumentation Diagrams (P&IDs) AAAI 2025

Overlapping Context with Variable-Length Stride Increases Diversity when Training Large Language Model for Code ACL 2025

Bridging the AI Gap: Evaluating the Impact of an AI Education Program for Caregivers on Parental Leave AAAI 2025

PDC & DM-SFT: A Road for LLM SQL Bug-Fix Enhancing COLING 2025

LLM Evaluate: An Industry-Focused Evaluation Tool for Large Language Models COLING 2025

Transforming Code Understanding: Clustering-Based Retrieval for Improved Summarization in Domain-Specific Languages COLING 2025

LMR-BENCH: Evaluating LLM Agent’s Ability on Reproducing Language Modeling Research EMNLP 2025