Text Representation

2246 directly classified papers

Papers

Universal Patterns of Grammatical Gender in Multilingual Large Language Models EMNLP 2025

FaMTEB: Massive Text Embedding Benchmark in Persian Language EMNLP 2025

Entropy2Vec: Crosslingual Language Modeling Entropy as End-to-End Learnable Language Representations EMNLP 2025

Tracing Definitions: Lessons from Alliance Contracts in the Biopharmaceutical Industry EMNLP 2025

Mind the Query: A Benchmark Dataset towards Text2Cypher Task EMNLP 2025

Embedding Style Beyond Topics: Analyzing Dispersion Effects Across Different Language Models COLING 2025

Whose Palestine Is It? A Topic Modelling Approach to National Framing in Academic Research EMNLP 2025

Evaluating Contextualized Representations of (Spanish) Ambiguous Words: A New Lexical Resource and Empirical Analysis NAACL 2025

Detecting Inconsistencies in Narrative Elements of Cross Lingual Nakba Texts COLING 2025

Sinhala Encoder-only Language Models and Evaluation ACL 2025

Retrieval of Parallelizable Texts Across Church Slavic Variants COLING 2025

From Syntax to Semantics: Evaluating the Impact of Linguistic Structures on LLM-Based Information Extraction ACL 2025

The iRead4Skills Intelligent Complexity Analyzer EMNLP 2025

Finding Diamonds in Conversation Haystacks: A Benchmark for Conversational Data Retrieval EMNLP 2025

Improving Model Evaluation using SMART Filtering of Benchmark Datasets NAACL 2025

Matina: A Large-Scale 73B Token Persian Text Corpus NAACL 2025

Fine-Grained Change Point Detection for Topic Modeling with Pitman-Yor Process JMLR 2025

ComicScene154: A Scene Dataset for Comic Analysis EMNLP 2025

Comparable Corpora: Opportunities for New Research Directions COLING 2025

Transfer of Structural Knowledge from Synthetic Languages ACL 2025

SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts EMNLP 2025

Zero-Shot Cross-Sentential Scientific Relation Extraction via Entity-Guided Summarization IJCNLP 2025

SELEXINI – a large and diverse automatically parsed corpus of French COLING 2025

Constrained Non-negative Matrix Factorization for Guided Topic Modeling of Minority Topics EMNLP 2025

PARSE: LLM Driven Schema Optimization for Reliable Entity Extraction EMNLP 2025