Papers

33 papers found
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Aman Singh Thakur, Kartik Choudhary, Venkat Srinik Ramayapally et al.
2025 ACL
2025 AACL
JuStRank: Benchmarking LLM Judges for System Ranking
Ariel Gera, Odellia Boni, Yotam Perlitz et al.
2025 ACL
2025 COLING
2025 EMNLP
Fooling the LVLM Judges: Visual Biases in LVLM-Based Evaluation
Yerin Hwang, Dongryeol Lee, Kyungmin Min et al.
2025 EMNLP
CourtReasoner: Can LLM Agents Reason Like Judges?
Sophia Simeng Han, Yoshiki Takashima, Shannon Zejiang Shen et al.
2025 EMNLP
Audio-Aware Large Language Models as Judges for Speaking Styles
Cheng-Han Chiang, Xiaofei Wang, Chung-Ching Lin et al.
2025 EMNLP
Can You Trick the Grader? Adversarial Persuasion of LLM Judges
Yerin Hwang, Dongryeol Lee, Taegwan Kang et al.
2025 EMNLP
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Lianghui Zhu, Xinggang Wang, Xinlong Wang
2025 ICLR
JudgeBench: A Benchmark for Evaluating LLM-Based Judges
Sijun Tan, Siyuan Zhuang, Kyle Montgomery et al.
2025 ICLR
2025 IJCNLP
Becoming Experienced Judges: Selective Test-Time Learning for Evaluators
Seungyeon Jwa, Daechul Ahn, Reokyoung Kim et al.
2026 EACL
Who Judges the Judge? Evaluating LLM-as-a-Judge for French Medical open-ended QA
Ikram Belmadani, Oumaima El Khettari, Pacôme Constant dit Beaufils et al.
2026 EACL
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng et al.
2023 NeurIPS
EvalAssist: LLM-as-a-Judge Simplified
Michael Desmond, Zahra Ashktorab, Werner Geyer et al.
2025 AAAI
Humans or LLMs as the Judge? A Study on Judgement Bias
Guiming Hardy Chen, Shunian Chen, Ziche Liu et al.
2024 EMNLP
Can LLM be a Personalized Judge?
Yijiang River Dong, Tiancheng Hu, Nigel Collier
2024 EMNLP
Direct Judgement Preference Optimization
PeiFeng Wang, Austin Xu, Yilun Zhou et al.
2025 EMNLP
Improve LLM-as-a-Judge Ability as a General Ability
Jiachen Yu, Shaoning Sun, Xiaohui Hu et al.
2025 EMNLP
MR. Judge: Multimodal Reasoner as a Judge
Renjie Pi, Haoping Bai, Qibin Chen et al.
2025 EMNLP
Agent-as-Judge for Factual Summarization of Long Narratives
Yeonseok Jeong, Minsoo Kim, Seung-won Hwang et al.
2025 EMNLP