Evaluation

1654 directly classified papers

Papers

Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation EMNLP 2018

Deep learning evaluation using deep linguistic processing NAACL 2018

Annotation Artifacts in Natural Language Inference Data NAACL 2018

Behavior Analysis of NLI Models: Uncovering the Influence of Three Factors on Robustness NAACL 2018

The Importance of Calibration for Estimating Proportions from Annotations NAACL 2018

Whistle-blowing ASRs: Evaluating the Need for More Inclusive Speech Recognition Systems INTERSPEECH 2018

A Boo(n) for Evaluating Architecture Performance ICML 2018

Sentence-Level Fluency Evaluation: References Help, But Can Be Spared! CoNLL 2018

Taylor’s law for Human Linguistic Sequences ACL 2018

The price of debiasing automatic metrics in natural language evaluation ACL 2018

Inherent Biases in Reference-based Evaluation for Grammatical Error Correction ACL 2018

Improving Text-to-SQL Evaluation Methodology ACL 2018

Efficient Online Scalar Annotation with Bounded Support ACL 2018

Named Entity Recognition on Code-Switched Data: Overview of the CALCS 2018 Shared Task ACL 2018

Locally Private Hypothesis Testing ICML 2018

Algorithmic Assurance: An Active Approach to Algorithmic Testing using Bayesian Optimisation NIPS 2018

In Search of Coherence and Consensus: Measuring the Interpretability of Statistical Topics JMLR 2018

Neural Quality Estimation of Grammatical Error Correction EMNLP 2018

On the Impact of Various Types of Noise on Neural Machine Translation ACL 2018

Exploring gap filling as a cheaper alternative to reading comprehension questionnaires when evaluating machine translation for gisting EMNLP 2018

A Methodology for Evaluating Interaction Strategies of Task-Oriented Conversational Agents EMNLP 2018

Practical Methods for Graph Two-Sample Testing NIPS 2018

Online control of the false discovery rate with decaying memory NIPS 2017

On Optimal Generalizability in Parametric Learning NIPS 2017

Finite Sample Analysis of the GTD Policy Evaluation Algorithms in Markov Setting NIPS 2017