Multi-Label Field Classification for Scientific Documents using Expert and Crowd-sourced Knowledge

Rebecca Gelles; James Dunham

EMNLP 2024

Abstract

Taxonomies of scientific research seek to describe complex domains of activity that are overlapping and dynamic. We address this challenge by combining knowledge curated by the Wikipedia community with the input of subject-matter experts to identify, define, and validate a system of 1,110 granular fields of study for use in multi-label classification of scientific publications. The result is capable of categorizing research across subfields of artificial intelligence, computer security, semiconductors, genetics, virology, immunology, neuroscience, biotechnology, and bioinformatics. We then develop and evaluate a solution for zero-shot classification of publications in terms of these fields.
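The abstract does not describe how the zero-shot classifier works, so the following is only a generic sketch of zero-shot multi-label assignment, not the authors' method. It assumes each field has a short text description, scores a document against every description with bag-of-words cosine similarity, and keeps every field whose score clears a threshold. The field descriptions, the `classify` function, and the `threshold` value are all hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical field descriptions; the real taxonomy has 1,110 fields.
FIELDS = {
    "virology": "viruses viral infection host cells replication",
    "computer security": "malware network intrusion detection encryption attacks",
    "bioinformatics": "computational analysis of genome sequence data algorithms",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(document, fields=FIELDS, threshold=0.2):
    """Return every field whose description is similar enough to the document.

    No labeled training documents are used (zero-shot), and a document may
    receive zero, one, or several fields (multi-label). The threshold is an
    assumed value, not one taken from the paper.
    """
    doc_vec = Counter(tokenize(document))
    scores = {
        name: cosine(doc_vec, Counter(tokenize(description)))
        for name, description in fields.items()
    }
    selected = [name for name, score in scores.items() if score >= threshold]
    return sorted(selected, key=lambda name: -scores[name])


if __name__ == "__main__":
    print(classify("We detect malware using network intrusion detection"))
    print(classify(
        "computational analysis of viral genome sequence data "
        "and viral replication in host cells"
    ))
```

A practical system would more likely replace the word-count vectors with learned text embeddings, but the scoring and thresholding steps would keep the same shape.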

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Learning Types > Zero-Shot Learning
Natural Language Processing > Applications > Text Classification
Machine Learning > Learning Paradigms > Zero-Shot Learning
Machine Learning > Learning Types > Multi-Label Classification
Machine Learning > Core Methods > Multi-Label Classification

Keywords

zero-shot learning, text classification, multi-label classification, knowledge base, scientific document, zero-shot classification, scientific publication, knowledge engineering
