Improving Native Language Identification by Using Spelling Errors

Lingzhen Chen; Carlo Strapparava; Vivi Nastase

ACL 2017

Abstract

In this paper, we explore spelling errors, a previously under-explored source of information, for detecting the native language of a writer. We find that character n-grams from misspelled words are highly indicative of the author's native language. In combination with other lexical features, spelling error features lead to a 1.2% improvement in accuracy on classifying texts in the TOEFL11 corpus by the author's native language, compared to systems participating in the NLI shared task.
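As an illustration of the core feature described in the abstract, the following is a minimal sketch of collecting character n-grams from misspelled words. It is not the authors' implementation: the helper names (`char_ngrams`, `misspelling_features`), the toy vocabulary `VOCAB` used as a stand-in spell checker, the whitespace tokenization, and the n-gram range are all assumptions made for this example.

```python
from collections import Counter


def char_ngrams(word, n_min=1, n_max=3):
    """Return all character n-grams of `word` for n in [n_min, n_max]."""
    return [word[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(word) - n + 1)]


def misspelling_features(text, vocabulary, n_min=1, n_max=3):
    """Count character n-grams taken only from out-of-vocabulary tokens.

    Assumption: a token counts as misspelled when it is alphabetic and
    absent from `vocabulary`. A real system would use a proper spell checker.
    """
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    misspelled = [t for t in tokens if t.isalpha() and t not in vocabulary]
    features = Counter()
    for word in misspelled:
        features.update(char_ngrams(word, n_min, n_max))
    return misspelled, features


# Hypothetical toy vocabulary and sentence, for illustration only.
VOCAB = {"i", "have", "a", "very", "nice", "apartment", "in", "the", "city"}
misspelled, feats = misspelling_features(
    "I have a very nice appartment in the city.", VOCAB, n_min=2, n_max=3
)
print(misspelled)
print(feats.most_common(5))
```

The resulting counts could be fed, alongside other lexical features, into any standard text classifier. The choice of classifier and feature weighting is left open here because the abstract does not specify it.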

Topics

Machine Learning > Core Methods > Classification
Natural Language Processing > Applications > Text Classification
Machine Learning > Application Areas > Text Classification

Keywords

text classification; character n-gram; lexical feature; native language identification; spelling error

Related papers

A* CCG Parsing with a Supertag and Dependency Factored Model 2017

Detecting annotation noise in automatically labelled data 2017

Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2017

Annotating tense, mood and voice for English, French and German 2017

Word Embedding for Response-To-Text Assessment of Evidence 2017