ACL 2024
Toward Reliable Ad-hoc Scientific Information Extraction: A Case Study on Two Materials Datasets
Abstract
We explore the ability of GPT-4 to perform ad-hoc schema-based information extraction from scientific literature. Specifically, we assess whether it can, with a basic one-shot prompting approach over the full text of the included manuscripts, replicate two existing materials science datasets: one pertaining to multi-principal element alloys (MPEAs) and one to silicate diffusion. We collaborate with materials scientists to perform a detailed manual error analysis to assess where and why the model struggles to faithfully extract the desired information, and draw on their insights to suggest research directions to address this broadly important task.