COLING 2025
Page Stream Segmentation with LLMs: Challenges and Applications in Insurance Document Automation
Abstract
Page Stream Segmentation (PSS) is critical for automating document processing in industries like insurance, where unstructured document collections are common. This paper explores the use of large language models (LLMs) for PSS, applying parameter-efficient fine-tuning to real-world insurance data. Our experiments show that LLMs outperform baseline models in page- and stream-level segmentation accuracy. However, stream-level calibration remains challenging, especially for high-stakes applications. We evaluate post-hoc calibration and Monte Carlo dropout, finding limited improvement. Future work will integrate active learning to enhance model calibration and support deployment in practical settings.
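The distinction between page-level and stream-level accuracy can be illustrated with a minimal sketch. It assumes a common framing of PSS that the abstract does not spell out: each page receives a binary label saying whether it starts a new document, and a stream counts as correct only if every document boundary is recovered. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import List


def segment_stream(starts: List[int]) -> List[List[int]]:
    """Group page indices into documents from per-page 'starts a new document' flags."""
    docs: List[List[int]] = []
    for page, is_start in enumerate(starts):
        # The first page always opens a document, whatever its flag says.
        if is_start or not docs:
            docs.append([page])
        else:
            docs[-1].append(page)
    return docs


def page_accuracy(pred: List[int], gold: List[int]) -> float:
    """Fraction of pages whose predicted flag matches the gold flag."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def stream_exact_match(pred: List[int], gold: List[int]) -> bool:
    """True only if the predicted segmentation equals the gold segmentation."""
    return segment_stream(pred) == segment_stream(gold)


# Toy stream of six pages (assumed labels, for illustration only).
gold = [1, 0, 0, 1, 0, 1]
pred = [1, 0, 1, 1, 0, 1]  # one spurious break at page index 2

print(segment_stream(gold))
print(segment_stream(pred))
print(page_accuracy(pred, gold))
print(stream_exact_match(pred, gold))
```

Under this framing, a single wrong page flag leaves page-level accuracy high while the whole stream counts as wrong, which is one plausible reason stream-level confidence is harder to calibrate than page-level confidence.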
Topics
Deep Learning > Techniques > Pretraining
Natural Language Processing > Applications > Text Classification
Computer Science > Applications > Document Analysis
Computer Vision > Processing > Semantic Segmentation
Machine Learning > Learning Types > Transfer Learning
Computer Vision > Domain-Specific > Document Analysis
Natural Language Processing > Applications > Document Analysis