Document Analysis

278 directly classified papers

Papers

LAW: Legal Agentic Workflows for Custody and Fund Services Contracts COLING 2025

SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design EMNLP 2025

CalligraphicOCR for Chinese Calligraphy Recognition EMNLP 2025

Structural Patent Classification Using Label Hierarchy Optimization EMNLP 2025

Automating the Expansion of Instrument Typicals in Piping and Instrumentation Diagrams (P&IDs) AAAI 2025

SERVAL: Surprisingly Effective Zero-Shot Visual Document Retrieval Powered by Large Vision and Language Models EMNLP 2025

MMDocIR: Benchmarking Multimodal Retrieval for Long Documents EMNLP 2025

WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild? EMNLP 2025

VisFinEval: A Scenario-Driven Chinese Multimodal Benchmark for Holistic Financial Understanding EMNLP 2025

PreP-OCR: A Complete Pipeline for Document Image Restoration and Enhanced OCR Accuracy ACL 2025

Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval ACL 2025

NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts ACL 2025

SCITAT: A Question Answering Benchmark for Scientific Tables and Text Covering Diverse Reasoning Types ACL 2025

P²Net: Parallel Pointer-based Network for Key Information Extraction with Complex Layouts ACL 2025

Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings ACL 2025

READoc: A Unified Benchmark for Realistic Document Structured Extraction ACL 2025

AID-Agent: An LLM-Agent for Advanced Extraction and Integration of Documents ACL 2025

Hidden Forms: A Dataset to Fill Masked Interfaces from Language Commands ACL 2025

Page Stream Segmentation with LLMs: Challenges and Applications in Insurance Document Automation COLING 2025

Bringing Suzhou Numerals into the Digital Age: A Dataset and Recognition Study on Ancient Chinese Trade Records NAACL 2025

PRIM: Towards Practical In-Image Multilingual Machine Translation EMNLP 2025

Towards Comprehensive Lecture Slides Understanding: Large-scale Dataset and Effective Method ICCV 2025

M-LongDoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework EMNLP 2025

DocSAM: Unified Document Image Segmentation via Query Decomposition and Heterogeneous Mixed Learning CVPR 2025

FlexDoc: Parameterized Sampling for Diverse Multilingual Synthetic Documents for Training Document Understanding Models EMNLP 2025