Finding Diamonds in Conversation Haystacks: A Benchmark for Conversational Data Retrieval

Yohan Lee; Yongwoo Song; Sangyeop Kim

EMNLP 2025

Abstract

We present the Conversational Data Retrieval (CDR) benchmark, the first comprehensive test set for evaluating systems that retrieve conversation data for product insights. With 1.6k queries across five analytical tasks and 9.1k conversations, our benchmark provides a reliable standard for measuring conversational data retrieval performance. Our evaluation of 16 popular embedding models shows that even the best models reach an NDCG@10 of only about 0.51, revealing a substantial gap between document retrieval and conversational data retrieval capabilities. Our work identifies challenges unique to conversational data retrieval (implicit state recognition, turn dynamics, contextual references) and provides practical query templates and a detailed error analysis across task categories. The benchmark dataset and code are available at https://github.com/l-yohai/CDR-Benchmark.
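
For readers unfamiliar with the headline metric, the following is a minimal sketch of how NDCG@10 is commonly computed for a single query. It is not taken from the benchmark's code: the function names, the linear-gain formulation, and the toy conversation IDs are assumptions made for illustration, and the benchmark's own evaluation script may differ in details such as the gain function or tie handling.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance grades (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_ids, relevance_by_id, k=10):
    """NDCG@k for one query.

    ranked_ids: conversation IDs in the order the retriever returned them.
    relevance_by_id: hypothetical mapping from conversation ID to a graded
        relevance label; IDs missing from the mapping count as irrelevant (0).
    """
    gains = [relevance_by_id.get(conv_id, 0) for conv_id in ranked_ids]
    # The ideal ranking places the most relevant conversations first.
    ideal = sorted(relevance_by_id.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant conversations exist for this query
    return dcg_at_k(gains, k) / ideal_dcg


# Toy example with made-up IDs: the only relevant conversation is ranked second.
score = ndcg_at_k(["conv_7", "conv_3", "conv_9"], {"conv_3": 1})
```

A benchmark-level figure such as the one reported above is typically the mean of this per-query score over all queries.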

Topics

Natural Language Processing > Applications > Information Retrieval
Natural Language Processing > Resources & Methods > Text Representation
Data Science & Analytics > Applications > Information Retrieval
Machine Learning > Learning Types > Retrieval-Augmented Generation
Machine Learning > Application Areas > Information Retrieval
Artificial Intelligence > Core AI > Information Retrieval

Keywords

benchmark evaluation, information retrieval, dialogue system, embedding model, conversational data retrieval, natural language embedding, contextual reference
