The Impact of Language on Arithmetic Proficiency: A Multilingual Investigation with Cross-Agent Checking Computation

Chung-Chi Chen; Hiroya Takamura; Ichiro Kobayashi; Yusuke Miyao

2024 NAACL NAACL 2024

The Impact of Language on Arithmetic Proficiency: A Multilingual Investigation with Cross-Agent Checking Computation

Abstract

AbstractThis paper critically examines the arithmetic capabilities of Large Language Models (LLMs), uncovering significant limitations in their performance. Our research reveals a notable decline in accuracy for complex calculations involving large numbers, with addition and subtraction tasks showing varying degrees of proficiency. Additionally, we challenge the notion that arithmetic is language-independent, finding up to a 10% difference in performance across twenty languages. The study also compares self-verification methods with cross-agent collaborations, showing that a single model often outperforms collaborative approaches in basic arithmetic tasks. These findings suggest a need to reassess the effectiveness of LLMs in tasks requiring numerical accuracy and precision.

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Chung-Chi Chen , Hiroya Takamura , Ichiro Kobayashi , Yusuke Miyao

Topics

Natural Language Processing > Resources & Methods > Large Language Models Natural Language Processing > Resources & Methods > Multilingual NLP

Keywords

large language model

Download PDF

Related papers

Working Alliance Transformer for Psychotherapy Dialogue Classification 2024

Named Entity Recognition Under Domain Shift via Metric Learning for Life Sciences 2024

Assessing Logical Puzzle Solving in Large Language Models: Insights from a Minesweeper Case Study 2024

TelME: Teacher-leading Multimodal Fusion Network for Emotion Recognition in Conversation 2024

Extractive Summarization with Text Generator 2024