Document Analysis
278 directly classified papers
Papers per year
2005: 1
2007: 1
2009: 1
2011: 1
2013: 2
2014: 1
2015: 1
2016: 1
2017: 3
2018: 7
2019: 10
2020: 19
2021: 16
2022: 31
2023: 44
2024: 43
2025: 94
2026: 2
Papers
PPTSER: A Plug-and-Play Tag-guided Method for Few-shot Semantic Entity Recognition on Visually-rich Documents
ACL 2024
LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding
COLING 2024
Reading between the Lines: Image-Based Order Detection in OCR for Chinese Historical Documents
AAAI 2024
DocLLM: A Layout-Aware Generative Language Model for Multimodal Document Understanding
ACL 2024
Towards Real-World Writing Assistance: A Chinese Character Checking Benchmark with Faked and Misspelled Characters
ACL 2024
Multimodal Table Understanding
ACL 2024
Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On
CVPR 2024
Enhancing Vision-Language Pre-training with Rich Supervisions
CVPR 2024
Layout-Agnostic Scene Text Image Synthesis with Diffusion Models
CVPR 2024
Bridging the Gap Between End-to-End and Two-Step Text Spotting
CVPR 2024
Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing
CVPR 2024
PAGED: A Benchmark for Procedural Graphs Extraction from Documents
ACL 2024
Tell Me What’s Next: Textual Foresight for Generic UI Representations
ACL 2024
PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion
ACL 2024
In-Image Neural Machine Translation with Segmented Pixel Sequence-to-Sequence Model
EMNLP 2023
SynthNID: Synthetic Data to Improve End-to-end Bangla Document Key Information Extraction
EMNLP 2023
Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document
EMNLP 2023
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
EMNLP 2023
DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading
EMNLP 2023
PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents
EMNLP 2023
EDIS: Entity-Driven Image Search over Multimodal Web Content
EMNLP 2023
Improving Table Structure Recognition With Visual-Alignment Sequential Coordinate Modeling
CVPR 2023
TrOCR: Transformer-Based Optical Character Recognition with Pre-trained Models
AAAI 2023
A Critical Analysis of Document Out-of-Distribution Detection
EMNLP 2023
Self-Supervised Implicit Glyph Attention for Text Recognition
CVPR 2023