Two-Step Classification using Recasted Data for Low Resource Settings

Shagun Uppal; Vivek Gupta; Avinash Swaminathan; Haimin Zhang; Debanjan Mahata; Rakesh Gosangi; Rajiv Ratn Shah; Amanda Stent

AACL 2020

Abstract

An NLP model’s ability to reason should be independent of language. Previous work uses Natural Language Inference (NLI) to probe the reasoning ability of models, mostly in high-resource languages such as English. To address the scarcity of data in low-resource languages such as Hindi, we use data recasting to create NLI datasets from four existing text classification datasets. Through experiments, we show that our recasted dataset is devoid of statistical irregularities and spurious patterns. We further study the consistency of the textual entailment models' predictions and propose a consistency regulariser to remove pairwise inconsistencies in those predictions. We propose a novel two-step classification method that uses textual entailment predictions for the classification task. We further improve performance by using a joint objective for classification and textual entailment. Supporting experimental results highlight the benefits of data recasting and the improvements in classification performance obtained with our approach.
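As a rough illustration of the two ideas named in the abstract, the sketch below recasts one labelled classification example into premise-hypothesis pairs and then classifies a text by picking the label whose hypothesis scores highest for entailment. It is a minimal sketch under stated assumptions, not the authors' implementation: the hypothesis template, the label names, the `entailed` / `not-entailed` tags, and the word-overlap scorer `toy_entail_score` are all hypothetical stand-ins, and a trained textual entailment model would take the scorer's place in practice.

```python
from typing import Callable, List, Tuple

# Hypothetical hypothesis template; the wording used in the paper is not given here.
TEMPLATE = "This text is about {label}."


def recast(text: str, gold: str, labels: List[str]) -> List[Tuple[str, str, str]]:
    """Turn one labelled example into (premise, hypothesis, nli_label) triples.

    The hypothesis built from the gold label is tagged "entailed";
    hypotheses built from every other label are tagged "not-entailed".
    """
    return [
        (text, TEMPLATE.format(label=lab), "entailed" if lab == gold else "not-entailed")
        for lab in labels
    ]


def two_step_classify(
    text: str,
    labels: List[str],
    entail_score: Callable[[str, str], float],
) -> str:
    """Step 1: score each label hypothesis for entailment against the text.
    Step 2: return the label whose hypothesis receives the highest score."""
    scores = {lab: entail_score(text, TEMPLATE.format(label=lab)) for lab in labels}
    return max(scores, key=scores.get)


def toy_entail_score(premise: str, hypothesis: str) -> float:
    """Stand-in scorer (assumption): fraction of hypothesis words found in the premise."""
    premise_words = set(premise.lower().split())
    hypothesis_words = set(hypothesis.lower().rstrip(".").split())
    return len(premise_words & hypothesis_words) / len(hypothesis_words)


if __name__ == "__main__":
    labels = ["sports", "politics"]
    for triple in recast("The team won the match", "sports", labels):
        print(triple)
    print(two_step_classify("the sports team won", labels, toy_entail_score))
```

Recasting in this way multiplies each classification example into one NLI pair per label, which is one plausible reason the technique suits low-resource settings; the actual pair construction used in the paper may differ.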

Topics

Natural Language Processing > Applications > Text Classification; Machine Learning > Learning Types > Transfer Learning; Natural Language Processing > Applications > Natural Language Inference

Keywords

text classification; natural language inference; low-resource language; textual entailment; data recasting
