ACL 2025
ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving
Abstract
While Large Language Models (LLMs) have shown impressive capabilities in math problem-solving tasks, their robustness to noisy inputs is not well studied. We propose ArithmAttack to examine how robust LLMs are when they encounter noisy prompts containing extra punctuation marks. While easy to implement, ArithmAttack does not cause any information loss, since no words are added to or deleted from the context. We evaluate the robustness of eight LLMs, including Llama3, Mistral, Mathstral, and DeepSeek, on noisy GSM8K and MultiArith datasets. Our experiments suggest that all the studied models are vulnerable to such noise, with more noise leading to poorer performance.
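The idea of punctuation-only noise can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `add_punctuation_noise` and `strip_noise`, the six-symbol noise alphabet, the choice to insert at word boundaries, and the definition of the noise ratio as a fraction of the word count are all assumptions made for illustration.

```python
import random

# Assumed noise alphabet; the abstract only says "punctuation marks".
PUNCTUATION = [".", ",", "!", "?", ";", ":"]


def add_punctuation_noise(text, noise_ratio=0.1, seed=None):
    """Insert standalone punctuation tokens at random word boundaries.

    The number of inserted tokens is int(word_count * noise_ratio),
    which is an assumed definition of the noise level. Original words
    are never altered, added, or removed.
    """
    rng = random.Random(seed)
    words = text.split()
    n_insert = int(len(words) * noise_ratio)
    noisy = list(words)
    for _ in range(n_insert):
        # randint is inclusive on both ends, so appending at the end is allowed.
        position = rng.randint(0, len(noisy))
        noisy.insert(position, rng.choice(PUNCTUATION))
    return " ".join(noisy)


def strip_noise(text):
    """Drop standalone punctuation tokens, recovering the original words."""
    return [token for token in text.split() if token not in PUNCTUATION]


question = "Tom has 3 apples and buys 5 more. How many apples does he have?"
noisy_question = add_punctuation_noise(question, noise_ratio=0.5, seed=0)
print(noisy_question)
```

Because the noise consists only of extra standalone tokens, removing them returns the original word sequence. That is the sense in which this kind of perturbation loses no information, provided the input does not already contain standalone punctuation tokens.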
Topics
Artificial Intelligence > Core AI > AI Safety
Natural Language Processing > Applications > Question Answering
Artificial Intelligence > Core AI > Large Language Models
Artificial Intelligence > Core AI > Reasoning
Deep Learning > Models > Large Language Models
Machine Learning > Learning Types > Evaluation
Machine Learning > Learning Types > Robustness
Deep Learning > Optimization & Theory > Evaluation