Scene Understanding

1887 directly classified papers

Papers

Proactive Scene Decomposition and Reconstruction ICCV 2025

PVMamba: Parallelizing Vision Mamba via Dynamic State Aggregation ICCV 2025

Supercharging Floorplan Localization with Semantic Rays ICCV 2025

Image-Guided Shape-from-Template Using Mesh Inextensibility Constraints ICCV 2025

Less is More: Empowering GUI Agent with Context-Aware Simplification ICCV 2025

Unified Reconstruction of Static and Dynamic Scenes from Events CVPR 2025

InstaScene: Towards Complete 3D Instance Decomposition and Reconstruction from Cluttered Scenes ICCV 2025

DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering CVPR 2025

RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges CVPR 2025

MammAlps: A Multi-view Video Behavior Monitoring Dataset of Wild Mammals in the Swiss Alps CVPR 2025

Wonderland: Navigating 3D Scenes from a Single Image CVPR 2025

VisHall3D: Monocular Semantic Scene Completion from Reconstructing the Visible Regions to Hallucinating the Invisible Regions ICCV 2025

Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models CVPR 2025

Focal Plane Visual Feature Generation and Matching on a Pixel Processor Array ICCV 2025

DiffSim: Taming Diffusion Models for Evaluating Visual Similarity ICCV 2025

TopicGeo: An Efficient Unified Framework for Geolocation ICCV 2025

DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion ICCV 2025

HOMO-Feature: Cross-Arbitrary-Modal Image Matching with Homomorphism of Organized Major Orientation ICCV 2025

Vision-Language Models Struggle to Align Entities across Modalities ACL 2025

Where am I? Cross-View Geo-localization with Natural Language Descriptions ICCV 2025

Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking COLING 2025

Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation ICCV 2025

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering ACL 2025

Bridging Language and Scenes through Explicit 3-D Model Construction COLING 2025

UAVScenes: A Multi-Modal Dataset for UAVs ICCV 2025