Complex words identification using word-level features for SemEval-2020 Task 1

Jenny A. Ortiz-Zambrano; Arturo Montejo-Ráez

2021 ACL ACL 2021

Complex words identification using word-level features for SemEval-2020 Task 1

Abstract

AbstractThis article describes a system to predict the complexity of words for the Lexical Complexity Prediction (LCP) shared task hosted at SemEval 2021 (Task 1) with a new annotated English dataset with a Likert scale. Located in the Lexical Semantics track, the task consisted of predicting the complexity value of the words in context. A machine learning approach was carried out based on the frequency of the words and several characteristics added at word level. Over these features, a supervised random forest regression algorithm was trained. Several runs were performed with different values to observe the performance of the algorithm. For the evaluation, our best results reported a M.A.E score of 0.07347, M.S.E. of 0.00938, and R.M.S.E. of 0.096871. Our experiments showed that, with a greater number of characteristics, the precision of the classification increases.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🧭 Keyword Pioneer — word-level feature

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Jenny A. Ortiz-Zambrano , Arturo Montejo-Ráez

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning Machine Learning > Core Methods > Classification Machine Learning > Core Methods > Regression Machine Learning > Core Methods > Representation Learning Machine Learning > Learning Types > Weakly Supervised Learning Machine Learning > Learning Types > Supervised Learning

Keywords

feature extraction lexical semantics lexical complexity semantic analysis supervised learning semantic similarity feature engineering supervised regression word embedding random forest word-level feature word complexity random forest regression word level feature

Download PDF

Related papers

Out-of-Scope Intent Detection with Self-Supervision and Discriminative Training 2021

A Non-Autoregressive Edit-Based Approach to Controllable Text Simplification 2021

How Did This Get Funded?! Automatically Identifying Quirky Scientific Achievements 2021

Exploring Discourse Structures for Argument Impact Classification 2021

Language Embeddings for Typology and Cross-lingual Transfer Learning 2021