Retrieval-based Evaluation for LLMs: A Case Study in Korean Legal QA

Cheol Ryu; Seolhwa Lee; Subeen Pang; Chanyeol Choi; Hojun Choi; Myeonggee Min; Jy-yong Sohn

EMNLP 2023

Abstract

While large language models (LLMs) have demonstrated significant capabilities in text generation, their use in areas requiring domain-specific expertise, such as law, must be approached cautiously. This caution is warranted because LLM-generated texts can contain factual errors. Motivated by this issue, we propose Eval-RAG, a new evaluation method for LLM-generated texts. Unlike existing methods, Eval-RAG evaluates the validity of generated texts based on related documents collected by a retriever. In other words, Eval-RAG adopts the idea of retrieval-augmented generation (RAG) for the purpose of evaluation. Our experimental results on Korean legal question-answering (QA) tasks show that conventional LLM-based evaluation methods align better with lawyers' evaluations when combined with Eval-RAG. In addition, our qualitative analysis shows that Eval-RAG successfully finds factual errors in LLM-generated texts that existing evaluation methods miss.
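The core idea described in the abstract, retrieving related documents and supplying them to the evaluator alongside the generated answer, can be illustrated with a minimal sketch. This is not the authors' implementation: the token-overlap retriever, the function names `retrieve` and `build_eval_prompt`, the prompt wording, and the 1-to-5 scale are all assumptions made for illustration, and the example sentences in the usage note are made-up toy data, not statements of actual law.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def retrieve(query, corpus, k=2):
    """Toy retriever (assumption): rank documents by token overlap with the query.

    A real system would more likely use a sparse or dense retriever; this
    stand-in only keeps the sketch self-contained.
    """
    query_counts = Counter(tokenize(query))

    def overlap(doc):
        doc_counts = Counter(tokenize(doc))
        return sum(min(count, doc_counts[tok]) for tok, count in query_counts.items())

    # sorted() is stable, so tied documents keep their corpus order.
    return sorted(corpus, key=overlap, reverse=True)[:k]


def build_eval_prompt(question, answer, documents):
    """Assemble a hypothetical evaluation prompt grounded in retrieved documents."""
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(documents))
    return (
        "You are evaluating an answer to a legal question.\n"
        "Use only the reference documents below to judge factual validity.\n\n"
        f"Reference documents:\n{context}\n\n"
        f"Question: {question}\n"
        f"Answer to evaluate: {answer}\n\n"
        "Rate the factual validity from 1 (incorrect) to 5 (fully supported)."
    )
```

In use, the question and the generated answer would be concatenated into a retrieval query, the top documents passed to `build_eval_prompt`, and the resulting prompt sent to whichever evaluator LLM is available. For example, with a toy corpus containing "A tenant may claim return of the deposit when the lease ends." and a question about deposit return, that sentence would rank first and appear as reference `[1]` in the prompt.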

Authors

Cheol Ryu, Seolhwa Lee, Subeen Pang, Chanyeol Choi, Hojun Choi, Myeonggee Min, Jy-yong Sohn

Topics

Natural Language Processing > Applications > Fact-Checking
Natural Language Processing > Applications > Information Retrieval
Natural Language Processing > Applications > Question Answering
Machine Learning > Learning Types > Retrieval-Augmented Generation

Keywords

retrieval augmented generation, language model evaluation, llm evaluation, evaluation metric, retrieval-augmented generation, factual error detection, factual error, legal question answering, korean legal qa, korean legal
