JCT at SemEval-2022 Task 4-A: Patronism Detection in Posts Written in English using Preprocessing Methods and various Machine Learning Methods

Yaakov HaCohen-Kerner; Ilan Meyrowitsch; Matan Fchima

NAACL 2022

Abstract

In this paper, we describe our submissions to SemEval-2022 Subtask 4-A, “Patronizing and Condescending Language Detection: Binary Classification”. We developed several models for this subtask, applying 11 supervised machine learning methods and 9 preprocessing methods. Our best submission was a model built with BertForSequenceClassification. Our experiments indicate that a preprocessing stage is essential for a successful model. The dataset for Subtask 1 is highly imbalanced. The F1-scores obtained with the oversampled training dataset were higher than those obtained with the original training dataset.
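The abstract does not say which oversampling technique was used. As an illustration only, the sketch below assumes plain random oversampling (duplicating minority-class posts until the classes are balanced) and includes a binary F1 computation of the kind used to compare results. The function names `random_oversample` and `f1_score_binary`, the toy posts, and the label encoding (1 for patronizing, 0 otherwise) are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Balance classes by duplicating randomly chosen minority-class examples.

    Assumption: simple duplication; the paper's actual method may differ.
    """
    rng = random.Random(seed)

    # Group the examples by class label.
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_label.values())

    pairs = []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        pairs.extend((text, label) for text in items + extra)

    rng.shuffle(pairs)
    return [t for t, _ in pairs], [l for _, l in pairs]


def f1_score_binary(y_true, y_pred, positive=1):
    """F1 of the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy imbalanced training set: 2 positive posts, 8 negative posts.
texts = [f"post {i}" for i in range(10)]
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

balanced_texts, balanced_labels = random_oversample(texts, labels)
counts = Counter(balanced_labels)

# One true positive, one missed positive, no false positives.
example_f1 = f1_score_binary([1, 0, 1, 0], [1, 0, 0, 0])
```

In a setup like this, oversampling would normally be applied to the training split only, so that scores on the development and test data still reflect the original class distribution.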