Evaluation
345 directly classified papers
Papers per year
2014: 1
2016: 3
2017: 1
2018: 9
2019: 21
2020: 34
2021: 32
2022: 50
2023: 28
2024: 90
2025: 76
Papers
Benchmarking Segmentation Models with Mask-Preserved Attribute Editing
CVPR 2024
LaRE^2: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection
CVPR 2024
Rethinking FID: Towards a Better Evaluation Metric for Image Generation
CVPR 2024
FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models
CVPR 2024
Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large Language Models
EMNLP 2024
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object
CVPR 2024
VBench: Comprehensive Benchmark Suite for Video Generative Models
CVPR 2024
VLEU: a Method for Automatic Evaluation for Generalizability of Text-to-Image Models
EMNLP 2024
Do LLMs Overcome Shortcut Learning? An Evaluation of Shortcut Challenges in Large Language Models
EMNLP 2024
VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
EMNLP 2024
On Training Data Influence of GPT Models
EMNLP 2024
MetaMetrics-MT: Tuning Meta-Metrics for Machine Translation via Human Preference Calibration
EMNLP 2024
Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-Context Models
EMNLP 2024
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
EMNLP 2024
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
CVPR 2024
Downstream Trade-offs of a Family of Text Watermarks
EMNLP 2024
LawBench: Benchmarking Legal Knowledge of Large Language Models
EMNLP 2024
POSIX: A Prompt Sensitivity Index For Large Language Models
EMNLP 2024
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
CVPR 2024
TOWER: Tree Organized Weighting for Evaluating Complex Instructions
EMNLP 2024
The Instinctive Bias: Spurious Images lead to Illusion in MLLMs
EMNLP 2024
L-Eval: Instituting Standardized Evaluation for Long Context Language Models
ACL 2024
Scaling Laws of Synthetic Images for Model Training ... for Now
CVPR 2024
Greed is All You Need: An Evaluation of Tokenizer Inference Methods
ACL 2024
VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation
EMNLP 2024