Leveraging Large Language Models for NLG Evaluation: Advances and Challenges

Zhen Li; Xiaohan Xu; Tao Shen; Can Xu; Jia-Chen Gu; Yuxuan Lai; Chongyang Tao; Shuai Ma

EMNLP 2024

Abstract

In the rapidly evolving domain of Natural Language Generation (NLG) evaluation, introducing Large Language Models (LLMs) has opened new avenues for assessing generated content quality, e.g., coherence, creativity, and context relevance. This paper aims to provide a thorough overview of leveraging LLMs for NLG evaluation, a burgeoning area that lacks a systematic analysis. We propose a coherent taxonomy for organizing existing LLM-based evaluation metrics, offering a structured framework to understand and compare these methods. Our detailed exploration includes critically assessing various LLM-based methodologies, as well as comparing their strengths and limitations in evaluating NLG outputs. By discussing unresolved challenges, including bias, robustness, domain-specificity, and unified evaluation, this paper seeks to offer insights to researchers and advocate for fairer and more advanced NLG evaluation techniques.

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Optimization & Theory > Theory
Natural Language Processing > Applications > Machine Translation
Artificial Intelligence > Core AI > Large Language Models
Natural Language Processing > Applications > Text Generation
Natural Language Processing > Applications > Natural Language Generation

Keywords

natural language generation; model evaluation; text generation; evaluation metric; text quality; large language model; generated content
