Text Representation

2246 directly classified papers

Papers

From Human Reading to NLM Understanding: Evaluating the Role of Eye-Tracking Data in Encoder-Based Models ACL 2025

TriEmbed: Bridge the Gap between Text and Token Indices with Embedding Reparameterization ACL 2025

Unsupervised Morphological Tree Tokenizer ACL 2025

CogStack-KCL-UCL at ArchEHR-QA 2025: Investigating Hybrid LLM Approaches for Grounded Clinical Question Answering ACL 2025

Rubic2: Ensemble Model for Russian Lemmatization ACL 2025

Detecting Bias and Intersectional Bias in Italian Word Embeddings and Language Models ACL 2025

Measuring Gender Bias in Language Models in Farsi ACL 2025

GRaMPa: Subword Regularisation by Skewing Uniform Segmentation Distributions with an Efficient Path-counting Markov Model ACL 2025

Tokenization is Sensitive to Language Variation ACL 2025

Splintering Nonconcatenative Languages for Better Tokenization ACL 2025

SemEval-2025 Task 1: AdMIRe - Advancing Multimodal Idiomaticity Representation ACL 2025

LSC-Eval: A General Framework to Evaluate Methods for Assessing Dimensions of Lexical Semantic Change Using LLM-Generated Synthetic Data ACL 2025

mStyleDistance: Multilingual Style Embeddings and their Evaluation ACL 2025

Field to Model: Pairing Community Data Collection with Scalable NLP through the LiFE Suite ACL 2025

Modeling Complex Semantics Relation with Contrastively Fine-Tuned Relational Encoders ACL 2025

ClaimCatchers at SemEval-2025 Task 7: Sentence Transformers for Claim Retrieval ACL 2025

Dictionaries to the Rescue: Cross-Lingual Vocabulary Transfer for Low-Resource Languages Using Bilingual Dictionaries ACL 2025

Experiential Semantic Information and Brain Alignment: Are Multimodal Models Better than Language Models? ACL 2025

Long-Term Development of Attitudes towards Schizophrenia and Depression in Scientific Abstracts ACL 2025

Can Uniform Meaning Representation Help GPT-4 Translate from Indigenous Languages? ACL 2025

ComicScene154: A Scene Dataset for Comic Analysis EMNLP 2025

LawToken: a single token worth more than its constituents CONLL 2025

Constrained Non-negative Matrix Factorization for Guided Topic Modeling of Minority Topics EMNLP 2025

The iRead4Skills Intelligent Complexity Analyzer EMNLP 2025

Enhancing Character-Level Understanding in LLMs through Token Internal Structure Learning ACL 2025