Language Modeling

1090 directly classified papers

Papers

Gathering Compositionality Ratings of Ambiguous Noun-Adjective Multiword Expressions in Galician (NAACL 2025)

A European Portuguese corpus annotated for verbal idioms (NAACL 2025)

Podcast Outcasts: Understanding Rumble’s Podcast Dynamics (NAACL 2025)

Beyond Cairo: Sa’idi Egyptian Arabic Literary Corpus Construction and Analysis (NAACL 2025)

Choose Your Words Wisely: Domain-adaptive Masking Makes Language Models Learn Faster (NAACL 2025)

Punctuation Restoration Improves Structure Understanding without Supervision (NAACL 2025)

Restoring Missing Spaces in Scraped Hebrew Social Media (NAACL 2025)

MissRAG: Addressing the Missing Modality Challenge in Multimodal Large Language Models (ICCV 2025)

Flaw or Artifact? Rethinking Prompt Sensitivity in Evaluating LLMs (EMNLP 2025)

Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs (EMNLP 2025)

A Continuous Approach to Metaphorically Motivated Regular Polysemy in Language Models (CoNLL 2025)

PrimeX: A Dataset of Worldview, Opinion, and Explanation (EMNLP 2025)

Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index (EMNLP 2025)

Which Word Orders Facilitate Length Generalization in LMs? An Investigation with GCG-Based Artificial Languages (EMNLP 2025)

neDIOM: Dataset and Analysis of Nepali Idioms (COLING 2025)

Pre-trained Language Models Learn Remarkably Accurate Representations of Numbers (EMNLP 2025)

FinMoE: A MoE-based Large Chinese Financial Language Model (COLING 2025)

Exploring morphology-aware tokenization: A case study on Spanish language modeling (EMNLP 2025)

Evaluating Financial Literacy of Large Language Models through Domain Specific Languages for Plain Text Accounting (COLING 2025)

TRACE: Training and Inference-Time Interpretability Analysis for Language Models (EMNLP 2025)

Evaluating Structural and Linguistic Quality in Urdu DRS Parsing and Generation through Bidirectional Evaluation (COLING 2025)

VRCP: Vocabulary Replacement Continued Pretraining for Efficient Multilingual Language Models (COLING 2025)

Studying the Effect of Hindi Tokenizer Performance on Downstream Tasks (COLING 2025)

RAP: A Metric for Balancing Repetition and Performance in Open-Source Large Language Models (NAACL 2025)

Overview of the First Workshop on Language Models for Low-Resource Languages (LoResLM 2025) (COLING 2025)