How to account for mispellings: Quantifying the benefit of character representations in neural content scoring models

Brian Riordan; Michael Flor; Robert Pugh

2019 ACL ACL 2019

How to account for mispellings: Quantifying the benefit of character representations in neural content scoring models

Abstract

AbstractCharacter-based representations in neural models have been claimed to be a tool to overcome spelling variation in in word token-based input. We examine this claim in neural models for content scoring. We formulate precise hypotheses about the possible effects of adding character representations to word-based models and test these hypotheses on large-scale real world content scoring datasets. We find that, while character representations may provide small performance gains in general, their effectiveness in accounting for spelling variation may be limited. We show that spelling correction can provide larger gains than character representations, and that spelling correction improves the performance of models with character representations. With these insights, we report a new state of the art on the ASAP-SAS content scoring dataset.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — content scoring

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Brian Riordan , Michael Flor , Robert Pugh

Topics

Machine Learning > Core Methods > Representation Learning Machine Learning > Application Areas > Efficient Computing Natural Language Processing > Applications > Text Classification Deep Learning > Learning Types > Representation Learning

Keywords

text classification character representation neural model spelling correction content scoring text processing neural network

Download PDF

Related papers

What do phone embeddings learn about Phonology? 2019

Unsupervised Morphological Segmentation for Low-Resource Polysynthetic Languages 2019

Understanding Undesirable Word Embedding Associations 2019

Inferential Machine Comprehension: Answering Questions by Recursively Deducing the Evidence Chain from Text 2019

Domain Adaptation of Neural Machine Translation by Lexicon Induction 2019