Detection of Puffery on the English Wikipedia

Amanda Bertsch; Steven Bethard

EMNLP 2021

Abstract

On Wikipedia, an online crowdsourced encyclopedia, volunteers enforce the encyclopedia’s editorial policies. Wikipedia’s policy on maintaining a neutral point of view has inspired recent research on bias detection, including “weasel words” and “hedges”. Yet to date, little work has been done on identifying “puffery,” phrases that are overly positive without a verifiable source. We demonstrate that collecting training data for this task requires some care, and construct a dataset by combining Wikipedia editorial annotations and information retrieval techniques. We compare several approaches to predicting puffery, and achieve an F1 score of 0.963 by incorporating citation features into a RoBERTa model. Finally, we demonstrate how to integrate our model with Wikipedia’s public infrastructure to give back to the Wikipedia editor community.

Topics

Deep Learning > Architectures > Transformers
Natural Language Processing > Applications > Fact-Checking
Natural Language Processing > Applications > Text Classification
Natural Language Processing > Resources & Methods > Large Language Models
Computer Science > Applications > Information Retrieval
Interdisciplinary > Social > Education
Deep Learning > Models > Transformers

Keywords

text classification, information retrieval, RoBERTa model, puffery detection
