ALERT: An LLM-powered Benchmark for Automatic Evaluation of Recommendation Explanations

Yichuan Li; Xinyang Zhang; Chenwei Zhang; Mao Li; Tianyi Liu; Pei Chen; Yifan Gao; Kyumin Lee; Kaize Ding; Zhengyang Wang; Zhihan Zhang; Jingbo Shang; Xian Li; Trishul Chilimbi

NAACL 2025

Abstract

Recommendation explanation systems have become increasingly vital with the widespread adoption of recommender systems. However, existing recommendation explanation evaluation benchmarks suffer from limited item diversity, impractical user profiling requirements, and unreliable and unscalable evaluation protocols. We present ALERT, a model-agnostic recommendation explanation evaluation benchmark. The benchmark comprises three main contributions: 1) a diverse dataset encompassing 15 Amazon e-commerce categories with 2,761 user-item interactions, incorporating implicit preferences through purchase histories; 2) two novel LLM-powered automatic evaluators that enable scalable and human-preference-aligned evaluation of explanations; and 3) a robust divide-and-aggregate approach that synthesizes multiple LLM judgments, achieving 70% concordance with expert human evaluation and substantially outperforming existing methods. ALERT facilitates comprehensive evaluation of recommendation explanations across diverse domains, advancing the development of more effective explanation systems.
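
The abstract does not spell out the divide-and-aggregate procedure, so the following is only a minimal sketch of the general idea under stated assumptions: several LLM judgments on the same explanation pair are combined by majority vote, and concordance is measured as the fraction of pairs where the aggregated verdict matches the human label. The function names, the verdict labels ("A", "B", "tie"), and the tie-breaking rule are hypothetical and are not taken from the paper.

```python
from collections import Counter


def aggregate(judgments):
    """Combine several verdicts for one explanation pair by majority vote.

    Each verdict is 'A', 'B', or 'tie' (hypothetical labels).
    If the two most common verdicts have equal counts, return 'tie'.
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    ranked = Counter(judgments).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "tie"
    return ranked[0][0]


def concordance(predicted, human):
    """Fraction of pairs where the aggregated verdict equals the human label."""
    if not predicted or len(predicted) != len(human):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == h for p, h in zip(predicted, human))
    return matches / len(human)


# Toy data (invented for illustration): three LLM verdicts per pair.
llm_verdicts = [
    ["A", "A", "B"],
    ["B", "B", "B"],
    ["A", "B", "tie"],
    ["A", "A", "tie"],
]
human_labels = ["A", "B", "A", "A"]

aggregated = [aggregate(v) for v in llm_verdicts]
score = concordance(aggregated, human_labels)
```

On this toy data the aggregated verdicts agree with the human labels on three of four pairs. A real evaluator would need the paper's actual judging criteria and aggregation rule, which this sketch does not attempt to reproduce.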

Topics

Artificial Intelligence > Core AI > Foundation Models
Machine Learning > Application Areas > Domain Adaptation
Data Science & Analytics > Applications > Recommender Systems

Keywords

natural language generation, human preference alignment, recommender system, automatic evaluation, large language model
