ARTS: Assessing Readability & Text Simplicity

Björn Engelmann; Christin Katharina Kreutz; Fabian Haak; Philipp Schaer

EMNLP 2024

Abstract

Automatic text simplification aims to reduce a text’s complexity. Its evaluation should quantify how easy a text is to understand. Datasets with text-level simplicity labels are a prerequisite for developing such evaluation approaches. However, current publicly available datasets do not meet this need, as they mainly treat text simplification as a relational concept (“How much simpler has this text gotten compared to the original version?”) or assign discrete readability levels. This work addresses the problem of Assessing Readability & Text Simplicity. We present ARTS, a method for language-independent construction of datasets for simplicity assessment. We propose using pairwise comparisons of texts in conjunction with an Elo algorithm to produce a simplicity ranking and simplicity scores. Additionally, we provide one high-quality human-labeled and three GPT-labeled simplicity datasets. Our results show a high correlation between human and LLM-based labels, allowing for an effective and cost-efficient way to construct large synthetic datasets.
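As an illustration of how pairwise comparisons can be turned into a ranking with an Elo algorithm, the following is a minimal sketch using the textbook Elo update. The abstract does not specify the variant used in ARTS, so the function names, the starting rating of 1000, the K-factor of 32, and the scale of 400 are all assumptions chosen for illustration, not the authors' settings.

```python
from collections import defaultdict


def expected_score(r_a, r_b, scale=400.0):
    """Probability that the text rated r_a is judged simpler than the one rated r_b
    under the standard Elo logistic model (scale=400 is an assumed default)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def elo_scores(comparisons, k=32.0, initial=1000.0):
    """Process (simpler_text, harder_text) judgments in order and return Elo ratings.

    k and initial are hypothetical defaults, not values taken from the paper.
    """
    ratings = defaultdict(lambda: initial)
    for simpler, harder in comparisons:
        r_s, r_h = ratings[simpler], ratings[harder]
        gain = k * (1.0 - expected_score(r_s, r_h))
        ratings[simpler] = r_s + gain  # text judged simpler gains rating
        ratings[harder] = r_h - gain   # the other text loses the same amount
    return dict(ratings)


# Toy judgments: each pair reads "first text was judged simpler than second".
judgments = [("a", "b"), ("b", "c"), ("a", "c")]
scores = elo_scores(judgments)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this sketch a higher rating means a simpler text, and sorting by rating yields the simplicity ranking. The result depends on the order in which comparisons are processed, which is a known property of sequential Elo updates.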

Topics

Machine Learning > Application Areas > Domain Adaptation
Machine Learning > Application Areas > Fairness
Natural Language Processing > Applications > Text Classification
Computer Science > Applications > Document Analysis
Natural Language Processing > Resources & Methods > Language Modeling
Machine Learning > Core Methods > Ranking
Natural Language Processing > Applications > Summarization
Artificial Intelligence > Core AI > Natural Language Processing
Natural Language Processing > Applications > Text Simplification

Keywords

text classification, readability assessment, text simplification, elo rating, pairwise comparison, language model, automatic evaluation, elo algorithm, simplicity ranking
