Learning to Extract Semantic Structure From Documents Using Multimodal Fully Convolutional Neural Networks

Xiao Yang; Ersin Yumer; Paul Asente; Mike Kraley; Daniel Kifer; C. Lee Giles

CVPR 2017

Abstract

We present an end-to-end, multimodal, fully convolutional network for extracting semantic structures from document images. We consider document semantic structure extraction as a pixel-wise segmentation task, and propose a unified model that classifies pixels based not only on their visual appearance, as in the traditional page segmentation task, but also on the content of underlying text. Moreover, we propose an efficient synthetic document generation process that we use to generate pretraining data for our network. Once the network is trained on a large set of synthetic documents, we fine-tune the network on unlabeled real documents using a semi-supervised approach. We systematically study the optimum network architecture and show that both our multimodal approach and the synthetic data pretraining significantly boost the performance.
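The idea of classifying each pixel from both its visual appearance and the content of the underlying text can be illustrated with a small sketch. This is not the authors' network. It assumes, hypothetically, that visual features are already available as an (H, W, Cv) array, that each text region comes with a bounding box and a fixed-size embedding vector, and that the two are fused by channel concatenation followed by a 1x1 convolution (a per-pixel linear layer). The function names, shapes, and random weights are all illustrative.

```python
import numpy as np

def build_text_embedding_map(height, width, boxes, embeddings, dim):
    """Paint each text box's embedding onto every pixel the box covers.

    Boxes are (top, left, bottom, right) with exclusive bottom/right.
    Pixels outside every box keep a zero vector.
    """
    text_map = np.zeros((height, width, dim), dtype=np.float32)
    for (top, left, bottom, right), vec in zip(boxes, embeddings):
        text_map[top:bottom, left:right, :] = vec
    return text_map

def classify_pixels(visual_features, text_map, weights, bias):
    """Fuse both modalities and score every pixel independently."""
    # Channel-wise concatenation: (H, W, Cv + Ct)
    fused = np.concatenate([visual_features, text_map], axis=-1)
    # A 1x1 convolution is a linear layer applied at each pixel.
    logits = fused @ weights + bias
    # Numerically stable softmax over the class axis.
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs.argmax(axis=-1), probs

# Toy inputs (all values are made up for illustration).
rng = np.random.default_rng(0)
H, W, CV, CT, NUM_CLASSES = 8, 8, 4, 3, 5
visual = rng.standard_normal((H, W, CV)).astype(np.float32)
text_map = build_text_embedding_map(
    H, W, boxes=[(1, 1, 3, 7)], embeddings=[np.ones(CT, dtype=np.float32)], dim=CT
)
weights = rng.standard_normal((CV + CT, NUM_CLASSES)).astype(np.float32)
bias = np.zeros(NUM_CLASSES, dtype=np.float32)

labels, probs = classify_pixels(visual, text_map, weights, bias)
print(labels.shape, probs.shape)
```

In a trained model the weights would be learned and the visual features would come from convolutional layers; the sketch only shows how a per-pixel label map can depend on both modalities at once.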

Authors

Xiao Yang, Ersin Yumer, Paul Asente, Mike Kraley, Daniel Kifer, C. Lee Giles

Topics

Machine Learning > Learning Types > Semi-Supervised Learning

Computer Vision > Processing > Image Segmentation

Keywords

semi-supervised learning, pixel-wise segmentation, document image, semantic structure extraction, multimodal fully convolutional network, synthetic document generation
