Does BERT Learn as Humans Perceive? Understanding Linguistic Styles through Lexica

Shirley Anugrah Hayati; Dongyeop Kang; Lyle Ungar

EMNLP 2021

Abstract

People convey their intention and attitude through the linguistic styles of the text that they write. In this study, we investigate lexicon usage across styles through two lenses: human perception and machine word importance, since words differ in the strength of the stylistic cues that they provide. To collect labels of human perception, we curate a new dataset, Hummingbird, on top of benchmark style datasets. We have crowd workers highlight the representative words in a text that make them think the text has one of the following styles: politeness, sentiment, offensiveness, and five emotion types. We then compare these human word labels with word importance derived from a popular fine-tuned style classifier, BERT. Our results show that BERT often treats content words that are not relevant to the target style as important for style prediction, whereas humans do not perceive them that way, even though for some styles (e.g., positive sentiment and joy) human- and machine-identified words share significant overlap.
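The comparison described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that human labels are a set of highlighted words, that the classifier's word importance is a word-to-score mapping, and that agreement is measured as the Jaccard overlap between the human set and the top-k machine words. The abstract does not specify the paper's actual importance method or agreement metric, and the function names and example data below are hypothetical.

```python
def top_k_words(importance, k):
    """Return the k words with the highest importance scores."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard overlap between two word sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical data for one sentence labeled for politeness.
# human_words: words that annotators highlighted as stylistic cues.
# machine_importance: per-word scores from some attribution method (assumed).
human_words = {"thank", "please"}
machine_importance = {"thank": 0.9, "please": 0.2, "report": 0.7, "the": 0.05}

machine_words = top_k_words(machine_importance, k=2)
overlap = jaccard(human_words, machine_words)
print(sorted(machine_words), round(overlap, 3))
```

In this toy case the classifier ranks the content word "report" above the style cue "please", which lowers the overlap with the human highlights; this is the kind of mismatch the abstract reports.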

Topics

Artificial Intelligence > Core AI > Interpretability
Natural Language Processing > Understanding > Sentiment Analysis
Interdisciplinary > Linguistics > Computational Linguistics
Natural Language Processing > Applications > Sentiment Analysis
Natural Language Processing > Understanding > Text Classification

Keywords

sentiment analysis, text classification, human perception, feature importance, model interpretability, linguistic style, word importance, style classification
