Random Label Forests: An Ensemble Method with Label Subsampling For Extreme Multi-Label Problems

Sheng-Wei Chen; Chih-Jen Lin

EMNLP 2024

Abstract

Text classification is an essential topic in natural language processing, and each text is often associated with multiple labels. The number of labels has grown steadily in recent years, especially in e-commerce applications, so many existing multi-label learning methods need a large amount of memory to handle text-related e-commerce problems. One way to address this space concern is to use a distributed system that shares the memory requirement across machines. We propose “random label forests,” a distributed ensemble method with label subsampling, for handling an extremely large number of labels. Random label forests reduce memory usage per computer while maintaining competitive performance on real-world data sets.
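The abstract names the idea (an ensemble whose members each see only a subsample of the labels) but does not give the base learner, the sampling scheme, or the aggregation rule. The sketch below is therefore only an illustration under stated assumptions, not the authors' algorithm: each member is a ridge-regression one-vs-rest scorer (a stand-in chosen for brevity), label subsets are drawn uniformly without replacement, and a label's final score is the average over the members trained on it. All function names and parameters are hypothetical.

```python
import numpy as np

def sample_label_subsets(n_labels, n_models, subset_size, rng):
    """Draw one uniform random label subset per ensemble member (assumed scheme)."""
    return [rng.choice(n_labels, size=subset_size, replace=False)
            for _ in range(n_models)]

def train_member(X, Y, labels, reg=1.0):
    """Fit ridge-regression one-vs-rest scorers for the given labels only.

    Ridge regression is an assumption made for this sketch.
    Returns a weight matrix of shape (n_features, len(labels)), so the
    member's memory grows with the subset size, not the full label count.
    """
    d = X.shape[1]
    A = X.T @ X + reg * np.eye(d)
    return np.linalg.solve(A, X.T @ Y[:, labels])

def predict_ensemble(X, members, n_labels):
    """Average each label's scores over the members that were trained on it."""
    scores = np.zeros((X.shape[0], n_labels))
    counts = np.zeros(n_labels)
    for labels, W in members:
        scores[:, labels] += X @ W   # labels are unique within a subset
        counts[labels] += 1
    covered = counts > 0             # labels no member saw keep score 0
    scores[:, covered] /= counts[covered]
    return scores, counts

# Synthetic data purely for illustration.
rng = np.random.default_rng(0)
n, d, L = 200, 20, 50
X = rng.normal(size=(n, d))
Y = (X @ rng.normal(size=(d, L)) > 1.0).astype(float)

subsets = sample_label_subsets(L, n_models=8, subset_size=20, rng=rng)
# In a distributed setting each member could be trained on a separate machine.
members = [(s, train_member(X, Y, s)) for s in subsets]
scores, counts = predict_ensemble(X, members, L)
```

In this sketch each member stores a 20 x 20 weight matrix rather than the 20 x 50 matrix a single full model would need, which is the per-machine memory saving the abstract refers to. How many members and how large a subset are needed to keep accuracy competitive is an empirical question that the abstract does not answer.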

Topics

Machine Learning > Core Methods > Classification
Natural Language Processing > Applications > Text Classification
Machine Learning > Learning Types > Multi-Label Classification
Deep Learning > Learning Types > Ensemble Learning

Keywords

ensemble learning, multi-label classification, distributed learning, distributed computing, ensemble method, random forest, label subsampling, extreme multi-label
