Empowering Healthcare Practitioners with Language Models: Structuring Speech Transcripts in Two Real-World Clinical Applications

Jean-Philippe Corbeil; Asma Ben Abacha; George Michalopoulos; Phillip Swazinna; Miguel Del-Agua; Jerome Tremblay; Akila Jeeson Daniel; Cari Bader; Kevin Cho; Pooja Krishnan; Nathan Bodenstab; Thomas Lin; Wenxuan Teng; Francois Beaulieu; Paul Vozila

EMNLP 2025

Abstract

Large language models (LLMs) such as GPT-4o and o1 have demonstrated strong performance on clinical natural language processing (NLP) tasks across multiple medical benchmarks. Nonetheless, two high-impact NLP tasks (structured tabular reporting from nurse dictations and medical order extraction from doctor-patient consultations) remain underexplored due to data scarcity and sensitivity, despite active industry efforts. Practical solutions to these real-world clinical tasks can significantly reduce the documentation burden on healthcare providers, allowing greater focus on patient care. In this paper, we investigate these two challenging tasks using private and open-source clinical datasets, evaluating the performance of both open- and closed-weight LLMs and analyzing their respective strengths and limitations. Furthermore, we propose an agentic pipeline for generating realistic, non-sensitive nurse dictations, enabling structured extraction of clinical observations. To support further research in both areas, we release SYNUR and SIMORD, the first open-source datasets for nurse observation extraction and medical order extraction.

Topics

Artificial Intelligence > Core AI > Foundation Models; Artificial Intelligence > Core AI > Human-AI Interaction; Artificial Intelligence > Core AI > Large Language Models; Natural Language Processing > Applications > Information Extraction; Healthcare & Medicine > Clinical > Clinical NLP; Healthcare & Medicine > Clinical > Medical AI; Healthcare & Medicine > Clinical > Medical NLP

Keywords

speech transcription; clinical natural language processing; clinical NLP; speech transcript; clinical documentation; large language model; structured extraction; agentic pipeline; medical order extraction; structured reporting
