Evaluation
22 directly classified papers
Papers per year
2020: 1
2021: 1
2022: 3
2023: 3
2024: 8
2025: 6
Papers
Towards a Principled Evaluation of Knowledge Editors
ACL 2025
Video-Bench: Human-Aligned Video Generation Benchmark
CVPR 2025
EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark
CVPR 2025
DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation
AAAI 2025
Standard Quality Criteria Derived from Current NLP Evaluations for Guiding Evaluation Design and Grounding Comparability and AI Compliance Assessments
ACL 2025
(Towards) Scalable Reliable Automated Evaluation with Large Language Models
ACL 2025
Benchmark Data Repositories for Better Benchmarking
NIPS 2024
StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code
ACL 2024
chrF-S: Semantics Is All You Need
EMNLP 2024
MSLC24: Further Challenges for Metrics on a Wide Landscape of Translation Quality
EMNLP 2024
MetricX-24: The Google Submission to the WMT 2024 Metrics Shared Task
EMNLP 2024
Expanding the FLORES+ Multilingual Benchmark with Translations for Aragonese, Aranese, Asturian, and Valencian
EMNLP 2024
Adaptive Labeling for Efficient Out-of-distribution Model Evaluation
NIPS 2024
Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)
NIPS 2024
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion
NIPS 2023
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric
ACL 2023
Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks
EMNLP 2023
Evaluating the Knowledge Dependency of Questions
EMNLP 2022
Automated Evaluation Metric for Terminology Consistency in MT
EMNLP 2022
ACES: Translation Accuracy Challenge Sets for Evaluating Machine Translation Metrics
EMNLP 2022
SeqScore: Addressing Barriers to Reproducible Named Entity Recognition Evaluation
EMNLP 2021
Dscorer: A Fast Evaluation Metric for Discourse Representation Structure Parsing
ACL 2020