Document Analysis

278 directly classified papers

Papers

PPTSER: A Plug-and-Play Tag-guided Method for Few-shot Semantic Entity Recognition on Visually-rich Documents ACL 2024

LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding COLING 2024

Reading between the Lines: Image-Based Order Detection in OCR for Chinese Historical Documents AAAI 2024

DocLLM: A Layout-Aware Generative Language Model for Multimodal Document Understanding ACL 2024

Towards Real-World Writing Assistance: A Chinese Character Checking Benchmark with Faked and Misspelled Characters ACL 2024

Multimodal Table Understanding ACL 2024

Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On CVPR 2024

Enhancing Vision-Language Pre-training with Rich Supervisions CVPR 2024

Layout-Agnostic Scene Text Image Synthesis with Diffusion Models CVPR 2024

Bridging the Gap Between End-to-End and Two-Step Text Spotting CVPR 2024

Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing CVPR 2024

PAGED: A Benchmark for Procedural Graphs Extraction from Documents ACL 2024

Tell Me What’s Next: Textual Foresight for Generic UI Representations ACL 2024

PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion ACL 2024

In-Image Neural Machine Translation with Segmented Pixel Sequence-to-Sequence Model EMNLP 2023

SynthNID: Synthetic Data to Improve End-to-end Bangla Document Key Information Extraction EMNLP 2023

Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document EMNLP 2023

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model EMNLP 2023

DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading EMNLP 2023

PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents EMNLP 2023

EDIS: Entity-Driven Image Search over Multimodal Web Content EMNLP 2023

Improving Table Structure Recognition With Visual-Alignment Sequential Coordinate Modeling CVPR 2023

TrOCR: Transformer-Based Optical Character Recognition with Pre-trained Models AAAI 2023

A Critical Analysis of Document Out-of-Distribution Detection EMNLP 2023

Self-Supervised Implicit Glyph Attention for Text Recognition CVPR 2023