2025 EMNLP EMNLP 2025

Grounded Answers from Multi-Passage Regulations: Learning-to-Rank for Regulatory RAG

Abstract

AbstractRegulatory compliance questions often require aggregating evidence from multiple, interrelated sections of long, complex documents. To support question-answering (QA) in this setting, we introduce ObliQA-MP, a dataset for multi-passage regulatory QA, extending the earlier ObliQA benchmark (CITATION), and improve evidence quality with an LLM–based validation step that filters out ~20% of passages missed by prior natural language inference (NLI) based filtering. Our benchmarks show a notable performance drop from single- to multi-passage retrieval, underscoring the challenges of semantic overlap and structural complexity in regulatory texts. To address this, we propose a feature-based learning-to-rank (LTR) framework that integrates lexical, semantic, and graph-derived information, achieving consistent gains over dense and hybrid baselines. We further add a lightweight score-based filter to trim noisy tails and an obligation-centric prompting technique. On ObliQA-MP, LTR improves retrieval (Recall@10/MAP@10/nDCG@10) over dense, hybrid, and fusion baselines. Our generation approach, based on domain-specific filtering plus prompting, achieves strong scores using the RePAS metric (CITATION) on ObliQA-MP, producing faithful, citation-grounded answers. Together, ObliQA-MP and our validation and RAG systems offer a stronger benchmark and a practical recipe for grounded, citation-controlled QA in regulatory domains.

🌉 Interdisciplinary Bridge — Computer Science and Machine Learning and Natural Language Processing
🧭 Keyword Pioneer — multi-passage retrieval
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio