Evaluation (Deep Learning › Optimization & Theory)
345 papers directly classified under this topic
Papers per year
2014: 1
2016: 3
2017: 1
2018: 9
2019: 21
2020: 34
2021: 32
2022: 50
2023: 28
2024: 90
2025: 76
Papers
Improving Accuracy and Calibration via Differentiated Deep Mutual Learning
CVPR 2025
FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models
CVPR 2024
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object
CVPR 2024
CORES: Convolutional Response-based Score for Out-of-distribution Detection
CVPR 2024
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
CVPR 2024
VLEU: a Method for Automatic Evaluation for Generalizability of Text-to-Image Models
EMNLP 2024
Scaling Laws of Synthetic Images for Model Training ... for Now
CVPR 2024
Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-Context Models
EMNLP 2024
LawBench: Benchmarking Legal Knowledge of Large Language Models
EMNLP 2024
Do LLMs Overcome Shortcut Learning? An Evaluation of Shortcut Challenges in Large Language Models
EMNLP 2024
Towards Reproducible, Automated, and Scalable Anomaly Detection
AAAI 2024
Accelerating Adversarially Robust Model Selection for Deep Neural Networks via Racing
AAAI 2024
Can Large Language Models Understand Real-World Complex Instructions?
AAAI 2024
Discretization-Induced Dirichlet Posterior for Robust Uncertainty Quantification on Regression
AAAI 2024
Benchmarking Segmentation Models with Mask-Preserved Attribute Editing
CVPR 2024
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
EMNLP 2024
Integrated Decision Gradients: Compute Your Attributions Where the Model Makes Its Decision
AAAI 2024
Impact of Decoding Methods on Human Alignment of Conversational LLMs
ACL 2024
A Systematic Analysis on the Temporal Generalization of Language Models in Social Media
ACL 2024
Knowledge Acquisition through Continued Pretraining is Difficult: A Case Study on r/AskHistorians
ACL 2024
Beyond Text: Unveiling Multimodal Proficiency of Large Language Models with MultiAPI Benchmark
ACL 2024
Empowering CAM-Based Methods with Capability to Generate Fine-Grained and High-Faithfulness Explanations
AAAI 2024
LLMs as Narcissistic Evaluators: When Ego Inflates Evaluation Scores
ACL 2024
Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models
ACL 2024
Challenging Large Language Models with New Tasks: A Study on their Adaptability and Robustness
ACL 2024