Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data

R. Andrew Kreek; Emilia Apostolova

EMNLP 2018


Abstract

Industry datasets used for text classification are rarely created for that purpose. In most cases, the data and target predictions are a by-product of accumulated historical data, which is typically fraught with noise present both in the text-based documents and in the target labels. In this work, we address the question of how well performance metrics computed on noisy, historical data reflect performance on the future input that the machine learning model is intended for. The results demonstrate the utility of dirty training datasets for building prediction models that are applied to cleaner (and different) prediction inputs.
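
One reason metrics on noisy historical data can misstate performance on cleaner inputs is that label noise in the evaluation set distorts the score itself. The sketch below is a minimal illustration, not the authors' experimental setup: it assumes binary labels, symmetric label noise that flips each gold label with probability flip_rate independently of the model's predictions, and a hypothetical classifier that is right with probability clean_acc. Under those assumptions the accuracy measured against noisy labels is clean_acc * (1 - flip_rate) + (1 - clean_acc) * flip_rate. The function names expected_noisy_accuracy and simulate are illustrative only, and the sketch models label noise, not noise in the document text.

```python
import random


def expected_noisy_accuracy(clean_acc: float, flip_rate: float) -> float:
    """Accuracy measured against noisy labels, assuming binary labels and
    symmetric noise that flips each gold label with probability flip_rate,
    independently of the model's predictions (an illustrative assumption)."""
    # Scored correct if the prediction is right and the label was not flipped,
    # or if the prediction is wrong and the flip landed on that wrong class.
    return clean_acc * (1 - flip_rate) + (1 - clean_acc) * flip_rate


def simulate(n: int, clean_acc: float, flip_rate: float,
             seed: int = 0) -> tuple[float, float]:
    """Return (accuracy vs. clean labels, accuracy vs. noisy labels) for a
    hypothetical binary classifier that is correct with probability clean_acc."""
    rng = random.Random(seed)
    clean_hits = noisy_hits = 0
    for _ in range(n):
        gold = rng.randint(0, 1)
        # Hypothetical model: correct with probability clean_acc.
        pred = gold if rng.random() < clean_acc else 1 - gold
        # Noisy historical label: flipped with probability flip_rate.
        noisy = 1 - gold if rng.random() < flip_rate else gold
        clean_hits += pred == gold
        noisy_hits += pred == noisy
    return clean_hits / n, noisy_hits / n


if __name__ == "__main__":
    clean, noisy = simulate(n=100_000, clean_acc=0.90, flip_rate=0.20)
    print(f"accuracy vs. clean labels: {clean:.3f}")
    print(f"accuracy vs. noisy labels: {noisy:.3f}")
    print(f"closed-form noisy accuracy: {expected_noisy_accuracy(0.90, 0.20):.3f}")
```

With clean_acc = 0.90 and flip_rate = 0.20, the closed form gives 0.90 * 0.80 + 0.10 * 0.20 = 0.74, so under these assumptions a model that is 90% accurate on clean input would appear only about 74% accurate when scored against the noisy labels. Real historical noise is rarely this uniform, so the identity is a rough intuition rather than a correction to apply directly.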

Topics

Machine Learning > Learning Types > Weakly Supervised Learning
Machine Learning > Application Areas > Domain Adaptation
Natural Language Processing > Applications > Text Classification

Keywords

domain adaptation, text classification, noisy label, training datum
