Learning Interpretable Style Embeddings via Prompting LLMs

Ajay Patel; Delip Rao; Ansh Kothary; Kathleen McKeown; Chris Callison-Burch

EMNLP 2023

Abstract

Style representation learning builds content-independent representations of author style in text. To date, no large dataset of texts with stylometric annotations across a wide range of style dimensions has been compiled, perhaps because the linguistic expertise needed for such annotation would be prohibitively expensive. Current style representation approaches therefore rely on unsupervised neural methods that disentangle style from content to create style vectors. These approaches, however, yield uninterpretable representations, which complicates their use in downstream applications such as authorship attribution, where auditing and explainability are critical. In this work, we use prompting to perform stylometry on a large number of texts and generate a synthetic stylometry dataset. We then use this synthetic data to train human-interpretable style representations that we call LISA embeddings. We release our synthetic dataset (StyleGenome) and our interpretable style embedding model (LISA) as resources.
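To make "human-interpretable style representation" concrete, the following is a minimal toy sketch, not the authors' implementation and not the LISA model's API. It assumes an embedding in which every dimension is tied to a named style attribute with a score in [0, 1]. The attribute names, the scores, and the helper functions are all hypothetical and chosen only for illustration.

```python
from math import sqrt

# Hypothetical style attributes: each embedding dimension has a readable name.
ATTRIBUTES = [
    "uses formal language",
    "uses emojis",
    "uses long sentences",
    "uses first-person pronouns",
]


def cosine(u, v):
    """Cosine similarity between two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_attributes(vec, k=2):
    """Return the k highest-scoring (attribute, score) pairs for one text."""
    ranked = sorted(zip(ATTRIBUTES, vec), key=lambda p: p[1], reverse=True)
    return ranked[:k]


def shared_attributes(u, v, k=2):
    """Return the k attributes contributing most to the dot product of u and v."""
    contrib = [(name, a * b) for name, a, b in zip(ATTRIBUTES, u, v)]
    return sorted(contrib, key=lambda p: p[1], reverse=True)[:k]


# Made-up scores for two texts (illustrative values, not model output).
text_a = [0.9, 0.1, 0.8, 0.2]
text_b = [0.8, 0.0, 0.7, 0.3]

print(cosine(text_a, text_b))
print(top_attributes(text_a))
print(shared_attributes(text_a, text_b))
```

Because each dimension carries a name, a similarity score between two texts can be audited by listing the attributes that contribute most to it, which is the kind of explanation an opaque dense vector cannot offer.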

Topics

Artificial Intelligence > Core AI > Interpretability
Artificial Intelligence > Core AI > Large Language Models
Machine Learning > Core Methods > Embedding Learning
Deep Learning > Techniques > Prompt Engineering
Natural Language Processing > Understanding > Semantic Analysis
Natural Language Processing > Applications > Text Classification
Natural Language Processing > Resources & Methods > Large Language Models
Interdisciplinary > Linguistics > Computational Linguistics

Keywords

prompt engineering, authorship attribution, style representation, interpretable embedding, large language model, style representation learning
