Document Analysis

278 directly classified papers

Papers

LATTE: Improving Latex Recognition for Tables and Formulae with Iterative Refinement AAAI 2025

Continuous Fingerspelling Dataset for Indian Sign Language IJCNLP 2025

LAW: Legal Agentic Workflows for Custody and Fund Services Contracts COLING 2025

CalligraphicOCR for Chinese Calligraphy Recognition EMNLP 2025

SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design EMNLP 2025

M-LongDoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework EMNLP 2025

ChartGaze: Enhancing Chart Understanding in LVLMs with Eye-Tracking Guided Attention Refinement EMNLP 2025

PRIM: Towards Practical In-Image Multilingual Machine Translation EMNLP 2025

TVQACML: Benchmarking Text-Centric Visual Question Answering in Multilingual Chinese Minority Languages EMNLP 2025

SheetDesigner: MLLM-Powered Spreadsheet Layout Generation with Rule-Based and Vision-Based Reflection EMNLP 2025

MultiDocFusion: Hierarchical and Multimodal Chunking Pipeline for Enhanced RAG on Long Industrial Documents EMNLP 2025

WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild? EMNLP 2025

VisFinEval: A Scenario-Driven Chinese Multimodal Benchmark for Holistic Financial Understanding EMNLP 2025

SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement EMNLP 2025

SERVAL: Surprisingly Effective Zero-Shot Visual Document Retrieval Powered by Large Vision and Language Models EMNLP 2025

Is Cognition Consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding EMNLP 2025

MMDocIR: Benchmarking Multimodal Retrieval for Long Documents EMNLP 2025

CHURRO: Making History Readable with an Open-Weight Large Vision-Language Model for High-Accuracy, Low-Cost Historical Text Recognition EMNLP 2025

GraDeT-HTR: A Resource-Efficient Bengali Handwritten Text Recognition System utilizing Grapheme-based Tokenizer and Decoder-only Transformer EMNLP 2025

PDFMathTranslate: Scientific Document Translation Preserving Layouts EMNLP 2025

FlexDoc: Parameterized Sampling for Diverse Multilingual Synthetic Documents for Training Document Understanding Models EMNLP 2025

Structural Patent Classification Using Label Hierarchy Optimization EMNLP 2025

CourtNav: Voice-Guided, Anchor-Accurate Navigation of Long Legal Documents in Courtrooms EMNLP 2025

Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval ACL 2025

Arctic-TILT. Business Document Understanding at Sub-Billion Scale ACL 2025