Cheese it up: CamemBERT Outperforms Large Language Models for Identification of French Multi-word Expressions

Sergei Bagdasarov; Diego Alves; Elke Teich

EACL 2026

Abstract

In recent years, language models, both encoder-only and generative, have been applied to a variety of downstream NLP tasks, including sequence labeling tasks such as automatic multi-word expression identification (MWEI). Multiple studies show that, in general, fine-tuned encoder-only models like BERT tend to outperform pretrained generative LLMs on downstream tasks (Arzideh et al., 2025; Ochoa et al., 2025; Bucher and Martini, 2024; Sebok et al., 2025). However, such comparisons are sparse for MWEI, particularly for French, in part due to the lack of comprehensive gold-standard datasets. In this study, we address this research gap by comparing CamemBERT with gpt-oss and Qwen3 for MWEI, using the French subcorpus of the newly released PARSEME dataset. CamemBERT outperforms both LLMs by large margins in precision, recall, and F1. We complement this numerical evaluation with a qualitative analysis of prediction errors.
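To make the reported metrics concrete, the following is a minimal sketch of exact-match precision, recall, and F1 over identified multi-word expressions. It is not the paper's evaluation code: the representation of an MWE as a (sentence id, set of token positions) pair, the function name `prf1`, and the toy data are all assumptions made for illustration.

```python
from typing import FrozenSet, Set, Tuple

# Assumed representation: an MWE is (sentence id, positions of its tokens).
# Token positions form a set so that discontinuous expressions are covered.
MWE = Tuple[int, FrozenSet[int]]


def prf1(gold: Set[MWE], pred: Set[MWE]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 between gold and predicted MWEs."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (hypothetical): three gold MWEs, two predictions, one exact match.
gold = {(0, frozenset({1, 2})), (0, frozenset({5, 7})), (1, frozenset({0, 3}))}
pred = {(0, frozenset({1, 2})), (1, frozenset({0, 2}))}

p, r, f = prf1(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Under this exact-match assumption, a prediction that covers only part of a gold expression counts as both a false positive and a false negative; a token-level or partial-match variant would score such cases more leniently.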

Topics

Natural Language Processing > Applications > Text Classification
Natural Language Processing > Resources & Methods > Large Language Models

Keywords

sequence labeling, encoder-only model, multi-word expression identification, large language model
