Research Explorer
Deep Learning › Optimization & Theory › Evaluation
345 directly classified papers
Papers per year
2014: 1
2016: 3
2017: 1
2018: 9
2019: 21
2020: 34
2021: 32
2022: 50
2023: 28
2024: 90
2025: 76
Papers
Measuring the Instability of Fine-Tuning
ACL 2023
REV: Information-Theoretic Evaluation of Free-Text Rationales
ACL 2023
Feature Likelihood Divergence: Evaluating the Generalization of Generative Models Using Samples
NeurIPS 2023
ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification
NeurIPS 2023
RDumb: A simple approach that questions our progress in continual test-time adaptation
NeurIPS 2023
Discriminative Feature Attributions: Bridging Post Hoc Explainability and Inherent Interpretability
NeurIPS 2023
Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes
NeurIPS 2023
A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection
EMNLP 2023
On the Calibration of Large Language Models and Alignment
EMNLP 2023
NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark
EMNLP 2023
Measuring the Knowledge Acquisition-Utilization Gap in Pretrained Language Models
EMNLP 2023
AIO-P: Expanding Neural Performance Predictors beyond Image Classification
AAAI 2023
Training Meta-Surrogate Model for Transferable Adversarial Attack
AAAI 2023
Re-Examining Summarization Evaluation across Multiple Quality Criteria
EMNLP 2023
LINe: Out-of-Distribution Detection by Leveraging Important Neurons
CVPR 2023
Shortcomings of Top-Down Randomization-Based Sanity Checks for Evaluations of Deep Neural Network Explanations
CVPR 2023
Why Is the Winner the Best?
CVPR 2023
Zero-Shot Model Diagnosis
CVPR 2023
Pseudointelligence: A Unifying Lens on Language Model Evaluation
EMNLP 2023
Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics
ACL 2022
BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation
NAACL 2022
Logic Rule Guided Attribution with Dynamic Ablation
AAAI 2022
Model Doctor: A Simple Gradient Aggregation Strategy for Diagnosing and Treating CNN Classifiers
AAAI 2022
Do Feature Attribution Methods Correctly Attribute Features?
AAAI 2022
MINIMAL: Mining Models for Universal Adversarial Triggers
AAAI 2022