EACL 2026

Cognitive Signatures of Multi-Word Expressions: Reading-Time and Surprisal

Abstract

This study investigates whether eye-tracking measures predict whether a word is the final token of a multi-word expression (MWE), focusing on two understudied MWE types: fixed expressions (e.g., due to) and phrasal verbs (e.g., turn out). Using mixed-effects logistic regression, we compared tokens in MWE contexts with the same tokens in non-MWE contexts. The results reveal a clear difference in processing between the two types: for fixed expressions, reading-time measures significantly predict MWEhood, whereas phrasal verbs show no consistent predictive effects. We also compared the reading-time models with models that included GPT-2 surprisal as a predictor. While surprisal does predict MWEhood, it fails to capture the distinction between the two MWE types. These findings highlight the need to consider MWE typology in models of formulaic language processing.
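
For readers unfamiliar with the surprisal predictor, the sketch below illustrates the standard definition: the negative log probability of a token given its preceding context. It is only an illustration under stated assumptions. The abstract does not specify the logarithm base or how sub-word pieces are combined, so base 2 (bits) and summation over sub-tokens are assumptions here, the function names are hypothetical, and the probabilities are invented values rather than GPT-2 outputs or results from the study.

```python
import math


def surprisal_bits(prob: float) -> float:
    """Surprisal of one token in bits: -log2 P(token | context)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def word_surprisal_bits(subtoken_probs) -> float:
    """Score a word split into several sub-word pieces.

    Assumption: the word's surprisal is the sum of its pieces' surprisals,
    which follows from the chain rule of probability.
    """
    return sum(surprisal_bits(p) for p in subtoken_probs)


# Invented conditional probabilities, for illustration only.
single_piece_word = [0.5]        # a word the tokenizer keeps whole
two_piece_word = [0.25, 0.5]     # a word split into two sub-tokens

print(word_surprisal_bits(single_piece_word))
print(word_surprisal_bits(two_piece_word))
```

A more predictable token receives a lower value, so a final MWE token that is highly expected after its first element (such as "to" after "due") would be expected to score low on this measure.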
