All Entities are Not Created Equal: Examining the Long Tail for Ultra-Fine Entity Typing

Advait Deshmukh; Ashwin Umadi; Dananjay Srinivas; Maria Leonor Pacheco

EMNLP 2025

Abstract

Due to their capacity to acquire world knowledge from large corpora, pre-trained language models (PLMs) are extensively used in ultra-fine entity typing, a task whose label space is extremely large. In this work, we explore the limitations of the knowledge acquired by PLMs by proposing a novel heuristic to approximate the pre-training distribution of entities when the pre-training data is unknown. We then systematically demonstrate that entity-typing approaches relying solely on the parametric knowledge of PLMs struggle significantly with entities at the long tail of the pre-training distribution, and that knowledge-infused approaches can compensate for some of these shortcomings. Our findings suggest that we need to go beyond PLMs to produce solutions that perform well on infrequent entities.
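
The abstract does not spell out the frequency heuristic or the evaluation protocol, so the following is only a minimal sketch of the general idea of frequency-stratified evaluation, not the authors' method. It assumes a hypothetical `freq` dictionary of proxy entity counts (for example, counts taken from some accessible reference corpus standing in for the unknown pre-training data), splits examples into equal-size buckets from rarest to most frequent entity, and reports a macro-averaged F1 per bucket. The names `Example`, `macro_f1`, and `bucket_by_frequency` are illustrative, and the metric convention noted in the docstring is an assumption.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Example:
    entity: str      # entity mention, used to look up its proxy frequency
    gold: Set[str]   # gold ultra-fine types
    pred: Set[str]   # predicted types


def macro_f1(examples: List[Example]) -> float:
    """Macro-averaged F1 over a set of examples.

    Assumed convention: precision is averaged over examples with at least one
    predicted type, recall over examples with at least one gold type, and F1 is
    the harmonic mean of the two averages.
    """
    p_sum, p_count, r_sum, r_count = 0.0, 0, 0.0, 0
    for ex in examples:
        overlap = len(ex.gold & ex.pred)
        if ex.pred:
            p_sum += overlap / len(ex.pred)
            p_count += 1
        if ex.gold:
            r_sum += overlap / len(ex.gold)
            r_count += 1
    precision = p_sum / p_count if p_count else 0.0
    recall = r_sum / r_count if r_count else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def bucket_by_frequency(
    examples: List[Example], freq: Dict[str, int], num_buckets: int = 3
) -> List[List[Example]]:
    """Sort examples from rarest to most frequent entity and split them into
    roughly equal-size buckets; entities missing from `freq` count as 0."""
    ranked = sorted(examples, key=lambda ex: freq.get(ex.entity, 0))
    size = max(1, math.ceil(len(ranked) / num_buckets))
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


if __name__ == "__main__":
    # Hypothetical proxy counts standing in for the unknown pre-training distribution.
    freq = {"Paris": 50_000, "Jane Austen": 8_000, "Zyxorbia": 3}
    examples = [
        Example("Paris", {"city", "location", "capital"}, {"city", "location"}),
        Example("Jane Austen", {"person", "writer", "novelist"}, {"person", "writer"}),
        Example("Zyxorbia", {"location", "country"}, {"person"}),
    ]
    for name, bucket in zip(["tail", "torso", "head"], bucket_by_frequency(examples, freq)):
        print(name, round(macro_f1(bucket), 3))
```

Comparing the score of the rarest bucket against the most frequent one gives the kind of head-versus-tail contrast the abstract describes. Equal-size buckets are just one design choice; fixed thresholds on (log-scaled) proxy counts would be an equally reasonable alternative.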

Topics

Natural Language Processing > Understanding > Named Entity Recognition

Keywords

pre-trained language model, long-tail distribution, knowledge infusion, entity knowledge, ultra-fine entity typing