From Generation to Selection: Findings of Converting Analogical Problem-Solving into Multiple-Choice Questions

Donghyeon Shin; Seungpil Lee; Klea Lena Kovacec; Sundong Kim

2024 EMNLP EMNLP 2024

From Generation to Selection: Findings of Converting Analogical Problem-Solving into Multiple-Choice Questions

Abstract

AbstractAs artificial intelligence reasoning abilities gain prominence, generating reliable benchmarks becomes crucial. The Abstract and Reasoning Corpus (ARC) offers challenging problems yet unsolved by AI. While ARC effectively assesses reasoning, its generation-based evaluation overlooks other assessment aspects. Bloom’s Taxonomy suggests evaluating six cognitive stages: Remember, Understand, Apply, Analyze, Evaluate, and Create. To extend ARC’s focus beyond the Create stage, we developed MC-LARC, a multiple-choice format suitable for assessing stages like Understand and Apply in Large Language Models (LLMs). Our evaluation of ChatGPT4V’s analogical reasoning using MC-LARC confirmed that this format supports LLMs’ reasoning capabilities and facilitates evidence analysis. However, we observed LLMs using shortcuts in MC-LARC tasks. To address this, we propose a self-feedback framework where LLMs identify issues and generate improved options. MC-LARC is available at https://mc-larc.github.io/.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — arc dataset

🐣 Hot Topic Early Bird — analogical reasoning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Donghyeon Shin , Seungpil Lee , Klea Lena Kovacec , Sundong Kim

Topics

Artificial Intelligence > Core AI > Foundation Models Natural Language Processing > Applications > Text Classification Artificial Intelligence > Core AI > Large Language Models Artificial Intelligence > Core AI > Reasoning Machine Learning > Learning Types > Evaluation

Keywords

multiple choice reasoning evaluation analogical reasoning large language model benchmark generation arc dataset multiple choice format

Download PDF

Related papers

EmbodiedBERT: Cognitively Informed Metaphor Detection Incorporating Sensorimotor Information 2024

Mitigating Matthew Effect: Multi-Hypergraph Boosted Multi-Interest Self-Supervised Learning for Conversational Recommendation 2024

Learning to Extract Structured Entities Using Language Models 2024

Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis 2024

CSSL: Contrastive Self-Supervised Learning for Dependency Parsing on Relatively Free Word Ordered and Morphologically Rich Low Resource Languages 2024