Cognitive Signatures of Multi-Word Expressions: Reading-Time and Surprisal

Diego Alves; Sergei Bagdasarov; Elke Teich

EACL 2026

Abstract

This study investigates whether eye-tracking measures predict whether a word is the final token of a multi-word expression (MWE), focusing on two understudied MWE types: fixed expressions (e.g., due to) and phrasal verbs (e.g., turn out). Using mixed-effects logistic regression, we compared tokens in MWE contexts with the same tokens in non-MWE contexts. The results reveal a clear difference in processing between the two types. For fixed expressions, reading-time measures significantly predict MWEhood. In contrast, phrasal verbs show no consistent predictive effects. Additionally, we compared the reading-time models to models that included GPT-2 surprisal as a predictor. While surprisal does predict MWEhood, it fails to capture the distinction between the two MWE types. These findings highlight the need to consider MWE typology in models of formulaic language processing.
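The abstract does not define surprisal, so the following is only a minimal sketch of the standard definition: the negative log probability of a word given its preceding context, summed over subword tokens when a word is split into several pieces. The function names, the choice of bits (log base 2) as the unit, and the probabilities are assumptions made for illustration. They are not taken from the paper and are not GPT-2 outputs.

```python
import math


def token_surprisal(prob: float) -> float:
    """Surprisal in bits of one token, given its conditional probability."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def word_surprisal(subword_probs: list[float]) -> float:
    """Surprisal of a word split into subword tokens.

    By the chain rule, the word's probability is the product of its
    subword probabilities, so its surprisal is the sum of their surprisals.
    """
    return sum(token_surprisal(p) for p in subword_probs)


# Hypothetical probabilities, invented for illustration only:
# the same word form as the final token of an MWE and outside one.
in_mwe_context = word_surprisal([0.5])
non_mwe_context = word_surprisal([0.125])
print(in_mwe_context, non_mwe_context)
```

In a setup like the one the abstract describes, a per-token value of this kind would be entered as a predictor alongside, or in place of, the reading-time measures.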

Topics

Machine Learning > Core Methods > Regression; Natural Language Processing > Understanding > Semantic Analysis; Natural Language Processing > Generation > Language Modeling

Keywords

multi-word expression; reading time; mixed-effects logistic regression; formulaic language
