EACL 2026

RAGVUE: A Diagnostic View for Explainable and Automated Evaluation of Retrieval-Augmented Generation

Abstract

Evaluating Retrieval-Augmented Generation (RAG) systems remains a challenging task: existing metrics often collapse heterogeneous behaviors into single scores and provide little insight into whether errors arise from retrieval, reasoning, or grounding. In this paper, we introduce RAGVUE, a diagnostic and explainable framework for automated, reference-free evaluation of RAG pipelines. RAGVUE decomposes RAG behavior into retrieval quality, answer relevance and completeness, strict claim-level faithfulness, and judge calibration. Each metric includes a structured explanation, making the evaluation process transparent. Our framework supports both manual metric selection and fully automated agentic evaluation. It also provides a Python API, CLI, and a local Streamlit interface for interactive usage. In comparative experiments, RAGVUE surfaces fine-grained failures that existing tools such as RAGAS often overlook. We showcase the full RAGVUE workflow and illustrate how it can be integrated into research pipelines and practical RAG development. The source code and detailed instructions on usage are publicly available on GitHub.
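To make the idea of a claim-level metric paired with a structured explanation concrete, the following is a minimal illustrative sketch, not RAGVUE's actual API or scoring method. The names `MetricResult`, `is_supported`, and `strict_faithfulness` are hypothetical, and the word-overlap check is a toy stand-in for whatever judge a real framework would use; it is assumed here only so the example runs without external dependencies.

```python
from dataclasses import dataclass, field


@dataclass
class MetricResult:
    """Hypothetical container: one metric's score plus a structured explanation."""
    name: str
    score: float
    explanation: dict = field(default_factory=dict)


def is_supported(claim: str, contexts: list[str]) -> bool:
    """Toy stand-in for a judge (an assumption, not RAGVUE's method):
    a claim counts as supported if all of its words occur in a single context."""
    claim_words = set(claim.lower().split())
    return any(claim_words <= set(ctx.lower().split()) for ctx in contexts)


def strict_faithfulness(claims: list[str], contexts: list[str]) -> MetricResult:
    """Score the fraction of claims supported by the retrieved contexts and
    record which claims failed, so the score comes with its own explanation."""
    unsupported = [c for c in claims if not is_supported(c, contexts)]
    # With no claims there is nothing to verify; this sketch scores that as 0.0.
    score = 1.0 - len(unsupported) / len(claims) if claims else 0.0
    return MetricResult(
        name="strict_faithfulness",
        score=score,
        explanation={
            "total_claims": len(claims),
            "unsupported_claims": unsupported,
        },
    )


contexts = ["paris is the capital of france"]
claims = [
    "paris is the capital of france",
    "paris has two million residents",
]
result = strict_faithfulness(claims, contexts)
print(result.score, result.explanation["unsupported_claims"])
```

Returning the unsupported claims alongside the score is what separates a diagnostic result from a single collapsed number: a reader can see which statement failed rather than only that something did.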
