Evaluation

10 directly classified papers

Papers

Evaluating Text Style Transfer Evaluation: Are There Any Reliable Metrics? NAACL 2025

Towards a Principled Evaluation of Knowledge Editors ACL 2025

ToMBench: Benchmarking Theory of Mind in Large Language Models ACL 2024

BenchIE^FL: A Manually Re-Annotated Fact-Based Open Information Extraction Benchmark ACL 2024

HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits ACL 2024

The Devil is in the Details: On the Pitfalls of Event Extraction Evaluation ACL 2023

ADBench: Anomaly Detection Benchmark NeurIPS 2022

BenchIE: A Framework for Multi-Faceted Fact-Based Open Information Extraction Evaluation ACL 2022

TabPert : An Effective Platform for Tabular Perturbation EMNLP 2021

Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach EMNLP 2021