Language Modeling

1090 directly classified papers

Papers

Priority on High-Quality: Selecting Instruction Data via Consistency Verification of Noise Injection EMNLP 2025

PersianMCQ-Instruct: A Comprehensive Resource for Generating Multiple-Choice Questions in Persian COLING 2025

IPA CHILDES & G2P+: Feature-Rich Resources for Cross-Lingual Phonology and Phonemic Language Modeling CONLL 2025

bAI-bAI: A Context-Aware Transliteration System for Baybayin Scripts COLING 2025

Semantic Frame Induction from a Real-World Corpus ACL 2025

Segment First or Comprehend First? Explore the Limit of Unsupervised Word Segmentation with Large Language Models ACL 2025

Not quite Sherlock Holmes: Language model predictions do not reliably differentiate impossible from improbable events ACL 2025

skLEP: A Slovak General Language Understanding Benchmark ACL 2025

Yankari: Monolingual Yoruba Dataset ACL 2025

Re3Syn: A Dependency-Based Data Synthesis Framework for Long-Context Post-training ACL 2025

Beyond Text Compression: Evaluating Tokenizers Across Scales ACL 2025

A Continuous Approach to Metaphorically Motivated Regular Polysemy in Language Models ACL 2025

Compositionality and Event Retrieval in Complement Coercion: A Study of Language Models in a Low-resource Setting ACL 2025

GBEM-UA: Gender Bias Evaluation and Mitigation for Ukrainian Large Language Models ACL 2025

Pre-trained Language Models Learn Remarkably Accurate Representations of Numbers EMNLP 2025

On the Path to Make Ukrainian a High-Resource Language ACL 2025

Enhanced Noun-Noun Compound Interpretation through Textual Enrichment EMNLP 2025

The Learning Dynamics of Subword Segmentation for Morphologically Diverse Languages IJCNLP 2025

Train It and Forget It: Merge Lists are Unnecessary for BPE Inference in Language Models EMNLP 2025

Odysseus Navigates the Sirens’ Song: Dynamic Focus Decoding for Factual and Diverse Open-Ended Text Generation ACL 2025

Training compute-optimal transformer encoder models EMNLP 2025

Deconstructing Attention: Investigating Design Principles for Effective Language Modeling IJCNLP 2025

Multi-token Mask-filling and Implicit Discourse Relations EMNLP 2025

DadmaTools V2: an Adapter-Based Natural Language Processing Toolkit for the Persian Language COLING 2025

Information Locality as an Inductive Bias for Neural Language Models ACL 2025