DocUNet: Document Image Unwarping via a Stacked U-Net

Ke Ma; Zhixin Shu; Xue Bai; Jue Wang; Dimitris Samaras

2018 CVPR CVPR 2018

DocUNet: Document Image Unwarping via a Stacked U-Net

Abstract

Capturing document images is a common way for digitizing and recording physical documents due to the ubiquitousness of mobile cameras. To make text recognition easier, it is often desirable to digitally flatten a document image when the physical document sheet is folded or curved. In this paper, we develop the first learning-based method to achieve this goal. We propose a stacked U-Net with intermediate supervision to directly predict the forward mapping from a distorted image to its rectified version. Because large-scale real-world data with ground truth deformation is difficult to obtain, we create a synthetic dataset with approximately 100 thousand images by warping non-distorted document images. The network is trained on this dataset with various data augmentations to improve its generalization ability. We further create a comprehensive benchmark that covers various real-world conditions. We evaluate the proposed model quantitatively and qualitatively on the proposed benchmark, and compare it with previous non-learning-based methods.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🧭 Keyword Pioneer — document unwarping

🐣 Hot Topic Early Bird — synthetic dataset

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Ke Ma , Zhixin Shu , Xue Bai , Jue Wang , Dimitris Samaras

Topics

Artificial Intelligence > Core AI > Multimodal Learning Machine Learning > Application Areas > Data Augmentation

Keywords

image restoration data augmentation synthetic dataset document unwarping stacked u-net

Download PDF

Related papers

Multi-Shot Pedestrian Re-Identification via Sequential Decision Making 2018

Multi-Cue Correlation Filters for Robust Visual Tracking 2018

Pointwise Convolutional Neural Networks 2018

Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking 2018

Image Generation From Scene Graphs 2018