DAPE: Harmonizing Content-Position Encoding for Versatile Dense Visual Prediction

Xiuquan Hou; Meiqin Liu; Senlin Zhang; Shaoyi Du

AAAI 2026

Abstract

Dense visual prediction tasks, including object detection and segmentation, inherently require precise and discriminative positional information to delineate object boundaries and pixel regions. Recent DETR-based frameworks advance dense prediction through iterative attention applied to content queries, with sampled proposals serving as position references. However, this paradigm suffers from a misaligned sampling distribution and insufficient interaction between content and position features, which limits encoding effectiveness. To overcome these limitations, we investigate the encoding paradigm for content-position harmonization and propose an effective predictor for dense visual tasks, termed DAPE (DETR with hArmonized content-Position Encoding). DAPE introduces explicit position encoding to facilitate content enhancement while maintaining low memory overhead. To achieve this, DAPE comprises a Shifted Query Sampler (SQS) that enforces strict alignment between the distributions of content and position queries, and a 2D Low-Rank Position Encoder (LRPE) that progressively modulates attention maps based on the aligned representations. DAPE provides a unified solution for various dense prediction tasks. Extensive experiments on object detection, instance segmentation, and few-shot detection benchmarks demonstrate that DAPE achieves state-of-the-art performance while reducing memory consumption.

Topics

Deep Learning > Architectures > Transformers
Computer Vision > Analysis > Object Detection
Computer Vision > Processing > Image Segmentation

Keywords

object detection, instance segmentation, dense prediction, position encoding
