Document Analysis

74 directly classified papers

Papers

DongbaMIE: A Multimodal Information Extraction Dataset for Evaluating Semantic Understanding of Dongba Pictograms (EMNLP 2025)

RFL: Simplifying Chemical Structure Recognition with Ring-Free Language (AAAI 2025)

PDFMathTranslate: Scientific Document Translation Preserving Layouts (EMNLP 2025)

Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding (CVPR 2025)

PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures (AAAI 2025)

Out of Length Text Recognition with Sub-String Matching (AAAI 2025)

From Charts to Fair Narratives: Uncovering and Mitigating Geo-Economic Biases in Chart-to-Text (EMNLP 2025)

MMDocIR: Benchmarking Multimodal Retrieval for Long Documents (EMNLP 2025)

Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation (ACL 2025)

Zero-Shot Styled Text Image Generation, but Make It Autoregressive (CVPR 2025)

Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition (CVPR 2025)

SSAN: A Symbol Spatial-Aware Network for Handwritten Mathematical Expression Recognition (AAAI 2025)

TAMER: Tree-Aware Transformer for Handwritten Mathematical Expression Recognition (AAAI 2025)

InstructOCR: Instruction Boosting Scene Text Spotting (AAAI 2025)

Finding Needles in Images: Can Multi-modal LLMs Locate Fine Details? (ACL 2025)

DocSAM: Unified Document Image Segmentation via Query Decomposition and Heterogeneous Mixed Learning (CVPR 2025)

LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating (ACL 2025)

SERVAL: Surprisingly Effective Zero-Shot Visual Document Retrieval Powered by Large Vision and Language Models (EMNLP 2025)

LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network (AAAI 2024)

Grab What You Need: Rethinking Complex Table Structure Recognition with Flexible Components Deliberation (AAAI 2024)

Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding (EMNLP 2024)

Towards Automated Chinese Ancient Character Restoration: A Diffusion-Based Method with a New Dataset (AAAI 2024)

Bridging the Gap Between End-to-End and Two-Step Text Spotting (CVPR 2024)

Effective Synthetic Data and Test-Time Adaptation for OCR Correction (EMNLP 2024)

Enhanced Optical Character Recognition by Optical Sensor Combined with BERT and Cosine Similarity Scoring (Student Abstract) (AAAI 2024)