EMNLP 2023
Decoding Stumpers: Large Language Models vs. Human Problem-Solvers
Abstract
This paper investigates the problem-solving capabilities of Large Language Models (LLMs) by evaluating their performance on stumpers, unique single-step intuition problems that pose challenges for human solvers but are easily verifiable. We compare the performance of four state-of-the-art LLMs (Davinci-2, Davinci-3, GPT-3.5-Turbo, GPT-4) to human participants. Our findings reveal that the new-generation LLMs excel in solving stumpers and surpass human performance. However, humans exhibit superior skills in verifying solutions to the same problems. This research enhances our understanding of LLMs’ cognitive abilities and provides insights for enhancing their problem-solving potential across various domains.
Authors
Topics
Artificial Intelligence > Core AI > Foundation Models
Interdisciplinary > Cognitive Science > Cognitive Modeling
Artificial Intelligence > Core AI > Large Language Models
Deep Learning > Models > Large Language Models
Machine Learning > Optimization & Theory > Evaluation
Machine Learning > Learning Types > Large Language Models