Decoding Stumpers: Large Language Models vs. Human Problem-Solvers

Alon Goldstein; Miriam Havin; Roi Reichart; Ariel Goldstein

EMNLP 2023

Abstract

This paper investigates the problem-solving capabilities of Large Language Models (LLMs) by evaluating their performance on stumpers, unique single-step intuition problems that pose challenges for human solvers but are easily verifiable. We compare the performance of four state-of-the-art LLMs (Davinci-2, Davinci-3, GPT-3.5-Turbo, GPT-4) to human participants. Our findings reveal that the new-generation LLMs excel in solving stumpers and surpass human performance. However, humans exhibit superior skills in verifying solutions to the same problems. This research enhances our understanding of LLMs’ cognitive abilities and provides insights for enhancing their problem-solving potential across various domains.

Topics

Artificial Intelligence > Core AI > Foundation Models
Interdisciplinary > Cognitive Science > Cognitive Modeling
Artificial Intelligence > Core AI > Large Language Models
Deep Learning > Models > Large Language Models
Machine Learning > Optimization & Theory > Evaluation
Machine Learning > Learning Types > Large Language Models

Keywords

human evaluation, cognitive ability, comparative evaluation, problem solving, human performance, large language model, solution verification
