Analysis of Automated Document Relevance Annotation for Information Retrieval in Oil and Gas Industry

João Vitor Mariano Correia; Murilo Missano Bell; João Vitor Robiatti Amorim; Jonas Queiroz; Daniel Pedronette; Ivan Rizzo Guilherme; Felipe Lima de Oliveira

EMNLP 2025

Abstract

The lack of high-quality test collections challenges Information Retrieval (IR) in specialized domains. This work addresses the issue by comparing supervised classifiers against zero-shot Large Language Models (LLMs) for automated relevance annotation in the oil and gas industry, using human expert judgments as the benchmark. A supervised classifier, trained on limited expert data, outperforms the LLMs and achieves an F1-score that surpasses even that of a second human annotator. The study also empirically confirms that LLMs are susceptible to unfairly favoring technologically similar retrieval systems. While LLMs lack precision in this context, a well-engineered classifier offers an accurate and practical path to scaling evaluation datasets within a human-in-the-loop framework that empowers, rather than replaces, human expertise.
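The comparison described in the abstract, scoring each candidate annotator against expert judgments with an F1-score, can be illustrated with a minimal sketch. It assumes binary relevance labels (1 = relevant, 0 = not relevant) and F1 computed on the relevant class; the abstract does not state the label scheme or averaging used. The function name `f1_score`, the annotator names, and all label values below are hypothetical toy data, not results from the paper.

```python
from typing import Dict, List, Sequence


def f1_score(gold: Sequence[int], pred: Sequence[int]) -> float:
    """F1 for the relevant class (label 1), using `gold` as the reference."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical expert judgments for eight query-document pairs (toy data).
expert: List[int] = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical candidate annotators (toy data, for illustration only).
candidates: Dict[str, List[int]] = {
    "candidate_a": [1, 0, 1, 0, 0, 1, 1, 0],
    "candidate_b": [1, 1, 1, 1, 1, 0, 1, 1],
}

# Score every candidate against the same expert benchmark.
scores = {name: f1_score(expert, labels) for name, labels in candidates.items()}

for name, score in scores.items():
    print(f"{name}: F1 = {score:.3f}")
```

Scoring a second human annotator with the same function and the same expert labels would give the kind of human reference point the abstract mentions when it compares the classifier with a second annotator.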

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Learning Types > Zero-Shot Learning
Natural Language Processing > Applications > Information Retrieval
Computer Science > Applications > Information Retrieval
Machine Learning > Learning Types > Transfer Learning
Machine Learning > Learning Types > Classification

Keywords

zero-shot learning; information retrieval; supervised learning; document classification; document relevance; document annotation; supervised classifier; large language model; relevance annotation
