Evaluation

345 directly classified papers

Papers

Measuring the Instability of Fine-Tuning ACL 2023

REV: Information-Theoretic Evaluation of Free-Text Rationales ACL 2023

Feature Likelihood Divergence: Evaluating the Generalization of Generative Models Using Samples NIPS 2023

ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification NIPS 2023

RDumb: A simple approach that questions our progress in continual test-time adaptation NIPS 2023

Discriminative Feature Attributions: Bridging Post Hoc Explainability and Inherent Interpretability NIPS 2023

Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes NIPS 2023

A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection EMNLP 2023

On the Calibration of Large Language Models and Alignment EMNLP 2023

NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark EMNLP 2023

Measuring the Knowledge Acquisition-Utilization Gap in Pretrained Language Models EMNLP 2023

AIO-P: Expanding Neural Performance Predictors beyond Image Classification AAAI 2023

Training Meta-Surrogate Model for Transferable Adversarial Attack AAAI 2023

Re-Examining Summarization Evaluation across Multiple Quality Criteria EMNLP 2023

LINe: Out-of-Distribution Detection by Leveraging Important Neurons CVPR 2023

Shortcomings of Top-Down Randomization-Based Sanity Checks for Evaluations of Deep Neural Network Explanations CVPR 2023

Why Is the Winner the Best? CVPR 2023

Zero-Shot Model Diagnosis CVPR 2023

Pseudointelligence: A Unifying Lens on Language Model Evaluation EMNLP 2023

Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics ACL 2022

BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation NAACL 2022

Logic Rule Guided Attribution with Dynamic Ablation AAAI 2022

Model Doctor: A Simple Gradient Aggregation Strategy for Diagnosing and Treating CNN Classifiers AAAI 2022

Do Feature Attribution Methods Correctly Attribute Features? AAAI 2022

MINIMAL: Mining Models for Universal Adversarial Triggers AAAI 2022