Beyond Text: Unveiling Multimodal Proficiency of Large Language Models with MultiAPI Benchmark

Xiao Liu; Jianfeng Lin; Jiawei Zhang

2024 ACL ACL 2024

Beyond Text: Unveiling Multimodal Proficiency of Large Language Models with MultiAPI Benchmark

Abstract

AbstractThe proliferation of Large Language Models like ChatGPT has significantly advanced language understanding and generation, impacting a broad spectrum of applications. However, these models predominantly excel in text-based tasks, overlooking the complexity of real-world multimodal information. This study introduces MultiAPI, a pioneering comprehensive large-scale API benchmark dataset aimed at expanding LLMs’ proficiency in multimodal contexts. Developed collaboratively through ChatGPT, MultiAPI consists of 187 diverse API calls and 1,799 contextual prompts, offering a unique platform evaluation of tool-augmented LLMs handling multimodal tasks. Through comprehensive experiments, our findings reveal that while LLMs demonstrate proficiency in API call decision-making, they face challenges in domain identification, function selection, and argument generation. What’s more, we surprisingly notice that auxiliary context can actually impair the performance. An in-depth error analysis paves the way for a new paradigm to address these challenges, suggesting a potential direction for future LLM research.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Natural Language Processing

🧭 Keyword Pioneer — api benchmark

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Xiao Liu , Jianfeng Lin , Jiawei Zhang

Topics

Artificial Intelligence > Core AI > Multimodal Learning Natural Language Processing > Resources & Methods > Large Language Models Artificial Intelligence > Core AI > Large Language Models Artificial Intelligence > Core AI > Multi-Modal Learning Deep Learning > Optimization & Theory > Evaluation

Keywords

multimodal learning large language model api benchmark tool-augmented learning

Download PDF

Related papers

Reinforcement Learning-Driven LLM Agent for Automated Attacks on LLMs 2024

EtymoLink: A Structured English Etymology Dataset 2024

Turkish Delights: A Dataset on Turkish Euphemisms 2024

Subjectivity Detection in English News using Large Language Models 2024

Does DetectGPT Fully Utilize Perturbation? Bridging Selective Perturbation to Fine-tuned Contrastive Learning Detector would be Better 2024