MiQA: A Benchmark for Inference on Metaphorical Questions

Iulia Comșa; Julian Eisenschlos; Srini Narayanan

IJCNLP 2022

Abstract

We propose a benchmark to assess the capability of large language models to reason with conventional metaphors. Our benchmark combines the previously isolated topics of metaphor detection and commonsense reasoning into a single task that requires a model to make inferences by accurately selecting between the literal and metaphorical register. We examine the performance of state-of-the-art pre-trained models on binary-choice tasks and find a large discrepancy between the performance of small and very large models, going from chance to near-human level. We also analyse the largest model in a generative setting and find that although human performance is approached, careful multiple-shot prompting is required.

Topics

Machine Learning > Learning Types > Zero-Shot Learning; Natural Language Processing > Understanding > Semantic Analysis; Natural Language Processing > Applications > Question Answering; Artificial Intelligence > Core AI > Reasoning; Deep Learning > Models > Large Language Models

Keywords

zero-shot learning; benchmark evaluation; prompt engineering; commonsense reasoning; metaphorical reasoning; metaphor detection; large language model; metaphorical question
