VDebugger: Harnessing Execution Feedback for Debugging Visual Programs

Xueqing Wu; Zongyu Lin; Songyan Zhao; Te-Lin Wu; Pan Lu; Nanyun Peng; Kai-Wei Chang

2024 EMNLP EMNLP 2024

VDebugger: Harnessing Execution Feedback for Debugging Visual Programs

Abstract

AbstractVisual programs are executable code generated by large language models to address visual reasoning problems. They decompose complex questions into multiple reasoning steps and invoke specialized models for each step to solve the problems. However, these programs are prone to logic errors, with our preliminary evaluation showing that 58% of the total errors are caused by program logic errors. Debugging complex visual programs remains a major bottleneck for visual reasoning. To address this, we introduce **VDebugger**, a novel critic-refiner framework trained to localize and debug visual programs by tracking execution step by step. VDebugger identifies and corrects program errors leveraging detailed execution feedback, improving interpretability and accuracy. The training data is generated through an automated pipeline that injects errors into correct visual programs using a novel mask-best decoding technique. Evaluations on six datasets demonstrate VDebugger’s effectiveness, showing performance improvements of up to 3.2% in downstream task accuracy. Further studies show VDebugger’s ability to generalize to unseen tasks, bringing a notable improvement of 2.3% on the unseen COVR task.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision and Deep Learning and Machine Learning

🧭 Keyword Pioneer — task accuracy

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Xueqing Wu , Zongyu Lin , Songyan Zhao , Te-Lin Wu , Pan Lu , Nanyun Peng , Kai-Wei Chang

Topics

Artificial Intelligence > Core AI > Multimodal Learning Machine Learning > Application Areas > Efficient Computing Artificial Intelligence > Core AI > Reasoning Computer Vision > Core AI > Multimodal Learning Deep Learning > Learning Types > Self-Supervised Learning

Keywords

visual reasoning error correction program execution execution feedback visual program program debugging task accuracy critic-refiner framework

Download PDF

Related papers

EmbodiedBERT: Cognitively Informed Metaphor Detection Incorporating Sensorimotor Information 2024

Mitigating Matthew Effect: Multi-Hypergraph Boosted Multi-Interest Self-Supervised Learning for Conversational Recommendation 2024

Learning to Extract Structured Entities Using Language Models 2024

Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis 2024

CSSL: Contrastive Self-Supervised Learning for Dependency Parsing on Relatively Free Word Ordered and Morphologically Rich Low Resource Languages 2024