Large Language Models Provide Human-Level Medical Text Snippet Labeling

Ibtihel Amara; Haiyang Yu; Fan Zhang; Yuchen Liu; Benny Li; Chang Liu; Rupesh Kartha; Akshay Goel

2024 NAACL NAACL 2024

Large Language Models Provide Human-Level Medical Text Snippet Labeling

Abstract

AbstractThis study evaluates the proficiency of Large Language Models (LLMs) in accurately labeling clinical document excerpts. Our focus is on the assignment of potential or confirmed diagnoses and medical procedures to snippets of medical text sourced from unstructured clinical patient records. We explore how the performance of LLMs compare against human annotators in classifying these excerpts. Employing a few-shot, chain-of-thought prompting approach with the MIMIC-III dataset, Med-PaLM 2 showcases annotation accuracy comparable to human annotators, achieving a notable precision rate of approximately 92% relative to the gold standard labels established by human experts.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Natural Language Processing

🧭 Keyword Pioneer — medical labeling

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Ibtihel Amara , Haiyang Yu , Fan Zhang , Yuchen Liu , Benny Li , Chang Liu , Rupesh Kartha , Akshay Goel

Topics

Artificial Intelligence > Learning Paradigms > Few-Shot Learning Natural Language Processing > Applications > Text Classification Natural Language Processing > Resources & Methods > Large Language Models

Keywords

few-shot learning text classification chain-of-thought prompting clinical annotation medical labeling

Download PDF

Related papers

Working Alliance Transformer for Psychotherapy Dialogue Classification 2024

Named Entity Recognition Under Domain Shift via Metric Learning for Life Sciences 2024

Assessing Logical Puzzle Solving in Large Language Models: Insights from a Minesweeper Case Study 2024

TelME: Teacher-leading Multimodal Fusion Network for Emotion Recognition in Conversation 2024

Extractive Summarization with Text Generator 2024