#Turki$hTweets: A Benchmark Dataset for Turkish Text Correction

Asiye Tuba Koksal; Ozge Bozal; Emre Yürekli; Gizem Gezici

EMNLP 2020

Abstract

#Turki$hTweets is a benchmark dataset for the task of correcting user misspellings, introduced as the first public Turkish dataset in this area. #Turki$hTweets provides correct/incorrect word annotations with a detailed misspelling category formulation based on real user data. We evaluated four state-of-the-art approaches on our dataset to present a preliminary analysis for the sake of reproducibility.

Keywords

benchmark dataset; Turkish language; text normalization; text correction; misspelling detection; misspelling correction; Turkish text
