Software Engineering

567 directly classified papers

Papers

PromptSuite: A Task-Agnostic Framework for Multi-Prompt Generation EMNLP 2025

Can LLMs Help You at Work? A Sandbox for Evaluating LLM Agents in Enterprise Environments EMNLP 2025

JSON Whisperer: Efficient JSON Editing with LLMs EMNLP 2025

SWE-MERA: A Dynamic Benchmark for Agenticly Evaluating Large Language Models on Software Engineering Tasks EMNLP 2025

CodeRAG-Bench: Can Retrieval Augment Code Generation? NAACL 2025

LinkAlign: Scalable Schema Linking for Real-World Large-Scale Multi-Database Text-to-SQL EMNLP 2025

Automating the Expansion of Instrument Typicals in Piping and Instrumentation Diagrams (P&IDs) AAAI 2025

TestEval: Benchmarking Large Language Models for Test Case Generation NAACL 2025

VisualCoder: Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning NAACL 2025

Seamlessly Integrating Tree-Based Positional Embeddings into Transformer Models for Source Code Representation ACL 2025

What can Large Language Models Capture about Code Functional Equivalence? NAACL 2025

SelfRACG: Enabling LLMs to Self-Express and Retrieve for Code Generation EMNLP 2025

Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems NAACL 2025

CodeScientist: End-to-End Semi-Automated Scientific Discovery with Code-based Experimentation ACL 2025

AssertionBench: A Benchmark to Evaluate Large-Language Models for Assertion Generation NAACL 2025

The Human in Interactive Machine Learning: Analysis and Perspectives for Ambient Intelligence (Abstract Reprint) IJCAI 2025

LLM-Assisted Translation of Legacy FORTRAN Codes to C++: A Cross-Platform Study NAACL 2025

MLDebugging: Towards Benchmarking Code Debugging Across Multi-Library Scenarios ACL 2025

Towards Effectively Leveraging Execution Traces for Program Repair with Code LLMs NAACL 2025

M2RC-EVAL: Massively Multilingual Repository-level Code Completion Evaluation ACL 2025

FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation ACL 2025

ToolCoder: A Systematic Code-Empowered Tool Learning Framework for Large Language Models ACL 2025

Revisit Self-Debugging with Self-Generated Tests for Code Generation ACL 2025

Automated CAD Modeling Sequence Generation from Text Descriptions via Transformer-Based Large Language Models ACL 2025

Classifier-Augmented Generation for Structured Workflow Prediction EMNLP 2025