Language Modeling

1090 directly classified papers

Papers

Revisiting Word Embeddings in the LLM Era IJCNLP 2025

Not quite Sherlock Holmes: Language model predictions do not reliably differentiate impossible from improbable events ACL 2025

Overlapping Context with Variable-Length Stride Increases Diversity when Training Large Language Model for Code ACL 2025

Speed Without Sacrifice: Fine-Tuning Language Models with Medusa and Knowledge Distillation in Travel Applications ACL 2025

FOCUS: A Benchmark for Targeted Socratic Question Generation via Source-Span Grounding IJCNLP 2025

AQuilt: Weaving Logic and Self-Inspection into Low-Cost, High-Relevance Data Synthesis for Specialist LLMs EMNLP 2025

Realistic Training Data Generation and Rule Enhanced Decoding in LLM for NameGuess EMNLP 2025

NOVA-63: Native Omni-lingual Versatile Assessments of 63 Disciplines EMNLP 2025

Improbable Bigrams Expose Vulnerabilities of Incomplete Tokens in Byte-Level Tokenizers EMNLP 2025

Evaluating Pretrained Causal Language Models for Synonymy ACL 2025

AI-Driven Multicultural Identity Preservation AAAI 2025

skLEP: A Slovak General Language Understanding Benchmark ACL 2025

Global Eye: Breaking the “Fixed Thinking Pattern” during the Instruction Expansion Process ACL 2025

From Human Reading to NLM Understanding: Evaluating the Role of Eye-Tracking Data in Encoder-Based Models ACL 2025

Segment First or Comprehend First? Explore the Limit of Unsupervised Word Segmentation with Large Language Models ACL 2025

BehaviorBox: Automated Discovery of Fine-Grained Performance Differences Between Language Models ACL 2025

GRaMPa: Subword Regularisation by Skewing Uniform Segmentation Distributions with an Efficient Path-counting Markov Model ACL 2025

RATIONALYST: Pre-training Process-Supervision for Improving Reasoning ACL 2025

Do Language Models Understand Honorific Systems in Javanese? ACL 2025

MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling Capabilities ACL 2025

Unveiling the Potential of BERT-family: A New Recipe for Building Scalable, General and Competitive Large Language Models ACL 2025

Re3Syn: A Dependency-Based Data Synthesis Framework for Long-Context Post-training ACL 2025

Beyond Text Compression: Evaluating Tokenizers Across Scales ACL 2025

Semantic Frame Induction from a Real-World Corpus ACL 2025

IMPARA-GED: Grammatical Error Detection is Boosting Reference-free Grammatical Error Quality Estimator ACL 2025