MAIR: A Massive Benchmark for Evaluating Instructed Retrieval

Weiwei Sun; Zhengliang Shi; Wu Jiu Long; Lingyong Yan; Xinyu Ma; Yiding Liu; Min Cao; Dawei Yin; Zhaochun Ren

EMNLP 2024


Abstract

Recent information retrieval (IR) models are pre-trained and instruction-tuned on massive datasets and tasks. This enables them to perform well on a wide range of tasks and potentially to generalize to unseen tasks when given instructions. However, existing IR benchmarks cover a limited scope of tasks, which makes them insufficient for evaluating the latest IR models. In this paper, we propose MAIR (Massive Instructed Retrieval Benchmark), a heterogeneous IR benchmark that includes 126 distinct IR tasks across 6 domains, collected from existing datasets. We benchmark state-of-the-art instruction-tuned text embedding models and re-ranking models. Our experiments show that instruction-tuned models generally outperform non-instruction-tuned models on MAIR. Our results also suggest that current instruction-tuned text embedding models and re-ranking models remain ineffective on certain long-tail tasks. MAIR is publicly available at https://github.com/sunnweiwei/Mair.
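To make "instructed retrieval" concrete, here is a minimal sketch of how an instruction-tuned embedding model is typically evaluated on a benchmark of many tasks: the task instruction is encoded together with the query, documents are ranked by similarity, and a per-task score is macro-averaged across tasks. This is not MAIR's evaluation code. The bag-of-words `embed` function stands in for a real embedding model, and the helper names, the toy task, and the choice of linear-gain nDCG@10 as the metric are all assumptions made for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a text embedding model: a bag-of-words vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def instructed_search(instruction: str, query: str, docs: dict, k: int = 10) -> list:
    """Rank documents for a query, encoding the task instruction with the query (assumed convention)."""
    q = embed(f"{instruction} {query}")
    scores = {doc_id: cosine(q, embed(text)) for doc_id, text in docs.items()}
    return sorted(scores, key=lambda d: scores[d], reverse=True)[:k]


def ndcg_at_k(ranking: list, qrels: dict, k: int = 10) -> float:
    """Linear-gain nDCG@k: DCG of the ranking divided by DCG of the ideal ranking."""
    dcg = sum(qrels.get(d, 0) / math.log2(i + 2) for i, d in enumerate(ranking[:k]))
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg else 0.0


def evaluate(tasks: list, k: int = 10) -> float:
    """Macro-average nDCG@k over tasks, so every task counts equally."""
    scores = [
        ndcg_at_k(instructed_search(t["instruction"], t["query"], t["docs"], k), t["qrels"], k)
        for t in tasks
    ]
    return sum(scores) / len(scores)


# A single hypothetical task; a benchmark would contain many tasks across domains.
tasks = [
    {
        "instruction": "Given a math question, retrieve the theorem that solves it.",
        "query": "sum of angles in a triangle",
        "docs": {
            "d1": "theorem the angles of a triangle sum to 180 degrees",
            "d2": "recipe for baking bread",
            "d3": "triangle inequality theorem",
        },
        "qrels": {"d1": 1},
    }
]

ranking = instructed_search(tasks[0]["instruction"], tasks[0]["query"], tasks[0]["docs"])
print(ranking, evaluate(tasks))
```

Macro-averaging keeps a few large tasks from dominating the aggregate score. This matters for a benchmark whose point is coverage of many heterogeneous, including long-tail, tasks.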

Topics

Machine Learning > Application Areas > Efficient Computing; Natural Language Processing > Applications > Information Retrieval; Machine Learning > Learning Types > Evaluation; Deep Learning > Techniques > Transfer Learning; Deep Learning > Learning Types > Transfer Learning; Machine Learning > Application Areas > Information Retrieval; Artificial Intelligence > Core AI > Information Retrieval

Keywords

benchmark evaluation; information retrieval; dense retrieval; instruction tuning; text embedding; information retrieval benchmark
