Tools Fail: Detecting Silent Errors in Faulty Tools

Jimin Sun; So Yeon Min; Yingshan Chang; Yonatan Bisk

EMNLP 2024

Abstract

Tools have become a mainstay of LLMs, allowing them to retrieve knowledge not in their weights, to perform tasks on the web, and even to control robots. However, most ontologies and surveys of tool-use have assumed the core challenge for LLMs is choosing the tool. Instead, we introduce a framework for tools more broadly which guides us to explore a model’s ability to detect “silent” tool errors, and reflect on how to plan. This more directly aligns with the increasingly popular use of models as tools. We provide an initial approach to failure recovery with promising results both on a controlled calculator setting and embodied agent planning.

Topics

Artificial Intelligence > Core AI > Agent Systems; Artificial Intelligence > Core AI > AI Safety; Artificial Intelligence > Core AI > Large Language Models; Machine Learning > Learning Types > Evaluation

Keywords

error detection; tool use; embodied agent; silent error; large language model; failure recovery; tool error detection
