DeepPaperComposer: A Simple Solution for Training Data Preparation for Parsing Research Papers

Meng Ling; Jian Chen

EMNLP 2020

Abstract

We present DeepPaperComposer, a simple solution for preparing highly accurate (100%) training data, without manual labeling, for extracting content from scholarly articles using convolutional neural networks (CNNs). We used our approach to generate data and trained CNNs to extract eight categories of both textual content (titles, abstracts, headers, figure and table captions, and other texts) and non-textual content (figures and tables) from 30 years of IEEE VIS conference papers, of which a third were scanned bitmap PDFs. We curated this dataset and named it VISpaper-3K. We then show initial benchmark performance using VISpaper-3K over itself and CS-150 with YOLOv3 and Faster-RCNN. We open-source DeepPaperComposer, our training data generation tool, and release the resulting annotation data, VISpaper-3K, to promote reproducible research.
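The abstract does not describe how labels are obtained without manual annotation. One plausible reading, offered here only as an illustrative sketch and not as the authors' implementation, is that pages are composed from components whose category is already known, so each bounding box is exact by construction. Everything in the code below is an assumption made for illustration: the `Component` type, the `compose_page` and `to_yolo` helpers, the single-column layout, the page size, and the ordering of class ids. The label line format (class id followed by normalized center x, center y, width, height) is the plain-text convention commonly used with YOLO-style detectors.

```python
from dataclasses import dataclass

# The eight categories named in the abstract. The ordering, and therefore
# the numeric class ids, is an assumption made for this sketch.
CLASSES = [
    "title", "abstract", "header", "figure_caption",
    "table_caption", "other_text", "figure", "table",
]


@dataclass
class Component:
    """Hypothetical page component with a known category and pixel size."""
    label: str
    width: int
    height: int


def compose_page(components, page_w=850, page_h=1100, margin=50, gap=20):
    """Stack components top to bottom in one column.

    Returns a list of (label, x, y, w, h) boxes in pixels. Because the
    composer chooses every position itself, the boxes need no manual labeling.
    """
    boxes = []
    y = margin
    for c in components:
        if c.width > page_w - 2 * margin:
            raise ValueError(f"component {c.label!r} is wider than the page body")
        if y + c.height > page_h - margin:
            break  # page is full; remaining components are not placed
        boxes.append((c.label, margin, y, c.width, c.height))
        y += c.height + gap
    return boxes


def to_yolo(boxes, page_w=850, page_h=1100):
    """Convert pixel boxes to YOLO-style label lines with normalized values."""
    lines = []
    for label, x, y, w, h in boxes:
        cls = CLASSES.index(label)
        xc = (x + w / 2) / page_w
        yc = (y + h / 2) / page_h
        lines.append(f"{cls} {xc:.6f} {yc:.6f} {w / page_w:.6f} {h / page_h:.6f}")
    return lines


if __name__ == "__main__":
    page = [Component("title", 750, 100), Component("figure", 400, 300)]
    for line in to_yolo(compose_page(page)):
        print(line)
```

A real composer would also have to render the component images onto the page and vary layouts, fonts, and scan noise; the sketch covers only the bookkeeping that makes the annotations exact.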

Topics

Machine Learning > Application Areas > Data Augmentation
Deep Learning > Architectures > Neural Networks
Computer Vision > Analysis > Object Detection
Computer Vision > Processing > Image Segmentation
Computer Science > Applications > Document Analysis
Computer Vision > Processing > Image Processing
Deep Learning > Learning Types > Deep Learning
Deep Learning > Learning Types > Data Augmentation

Keywords

semantic segmentation, object detection, information extraction, document parsing, document analysis, convolutional neural network, scientific document, training data generation, image extraction, table detection, figure detection
