Representation Learning for Information Extraction from Form-like Documents

Bodhisattwa Prasad Majumder; Navneet Potti; Sandeep Tata; James Bradley Wendt; Qi Zhao; Marc Najork

ACL 2020

Abstract

We propose a novel approach that uses representation learning to extract structured information from form-like document images. The extraction system uses knowledge of the types of the target fields to generate extraction candidates, and a neural network architecture learns a dense representation of each candidate based on neighboring words in the document. These learned representations are not only useful in solving the extraction task for unseen document templates from two different domains but are also interpretable, as we show using loss cases.
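The two ideas named in the abstract, type-based candidate generation and a candidate's neighborhood of nearby words, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Token` class, the regular expressions in `TYPE_PATTERNS`, and the window sizes `max_dx` and `max_dy` are all hypothetical choices made for illustration, and the neural network that would turn each neighborhood into a dense representation is left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    x: float  # horizontal center, normalized to [0, 1] (assumed convention)
    y: float  # vertical center, normalized to [0, 1] (assumed convention)


# Hypothetical type detectors, one per field type. A real system would
# likely use far more robust detectors than these toy patterns.
TYPE_PATTERNS = {
    "date": re.compile(r"\d{1,2}/\d{1,2}/\d{4}"),
    "amount": re.compile(r"\$?\d+\.\d{2}"),
}


def generate_candidates(tokens, field_type):
    """Return indices of tokens whose text matches the field's type."""
    pattern = TYPE_PATTERNS[field_type]
    return [i for i, tok in enumerate(tokens) if pattern.fullmatch(tok.text)]


def neighbors(tokens, idx, max_dx=0.5, max_dy=0.1):
    """Collect (text, dx, dy) for tokens inside a window around a candidate.

    The offsets are relative to the candidate, so the same neighborhood
    looks alike wherever the candidate sits on the page.
    """
    cand = tokens[idx]
    result = []
    for j, tok in enumerate(tokens):
        if j == idx:
            continue
        dx, dy = tok.x - cand.x, tok.y - cand.y
        if abs(dx) <= max_dx and abs(dy) <= max_dy:
            result.append((tok.text, round(dx, 3), round(dy, 3)))
    return result


# Toy document: tokens with made-up positions.
tokens = [
    Token("Invoice", 0.10, 0.05),
    Token("Date:", 0.20, 0.05),
    Token("12/03/2019", 0.35, 0.05),
    Token("Total", 0.10, 0.90),
    Token("$120.00", 0.30, 0.90),
]

date_candidates = generate_candidates(tokens, "date")
date_neighbors = neighbors(tokens, date_candidates[0])
```

In this toy document the only date candidate is the token `12/03/2019`, and its neighborhood contains `Invoice` and `Date:` on the same line. A learned model would then score the candidate from such neighboring words and their relative positions.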
