Papers
2,653 papers found
Minute-Long Videos with Dual Parallelisms
Zeqing Wang, Bowen Zheng, Xingyi Yang et al.
MUTrack: A Memory-Aware Unified Representation Framework for Visual Tracking
Weijing Wu, Qihua Liang, Bineng Zhong et al.
Retrieval-driven Reasoning for Deliberative Visual Classification
Jianye Xie, Lianyong Qi, Fan Wang et al.
SCALAR: Scale-wise Controllable Visual Autoregressive Learning
Ryan Xu, Dongyang Jin, Yancheng Bai et al.
Look-Back: Implicit Visual Re-focusing in MLLM Reasoning
Shuo Yang, Yuwei Niu, Yuyang Liu et al.
VAEVQ: Enhancing Discrete Visual Tokenization Through Variational Modeling
Sicheng Yang, Xing Hu, Qiang Wu et al.
When Eyes and Ears Disagree: Can MLLMs Discern Audio-Visual Confusion?
Qilang Ye, Wei Zeng, Meng Liu et al.
MedEyes: Learning Dynamic Visual Focus for Medical Progressive Diagnosis
Chunzheng Zhu, Yangfang Lin, Shen Chen et al.
Learning DFAs from Positive Examples Only via Word Counting
Benjamin Bordais, Daniel Neider
Making Visual Dialogue More Engaging: A New Task, Method, and Metric
Guanghui Ye, Huan Zhao, Yingxue Gao et al.
Leveraging Visual Blur Perception Characteristics for EEG Decoding
Wenchao Liu, Hongwei Li, Zhouyang Xu et al.
Multigranular Evaluation for Brain Visual Decoding
Weihao Xia, Cengiz Oztireli
Steering Visuomotor Policy in Open Worlds via Cross-View Goal Alignment
Shaofei Cai, Zhancun Mu, Anji Liu et al.
VPN: Visual Prompt Navigation
Shuo Feng, Zihan Wang, Yuchen Li et al.
UNeMo: Collaborative Visual-Language Reasoning and Navigation via a Multimodal World Model
Changxin Huang, Lv Tang, Zhaohuan Zhan et al.
SIFThinker: Spatially-Aware Image Focus for Visual Reasoning
Zhangquan Chen, Ruihui Zhao, Chuwei Luo et al.
Visual Bridge: Universal Visual Perception Representations Generating
Yilin Gao, Shuguang Dou, Junzhou Li et al.
rMMEA: Robust Multi-Modal Entity Alignment with Missing and Noise Visual Modality
Lingbing Guo, Zhuo Chen, Yichi Zhang et al.
Enhancing Spatial Reasoning Through Visual and Textual Thinking
Xun Liang, Xin Guo, Zhongming Jin et al.
E-Logic Prompt: Unified Energy-Logic Framework for Continual Visual Question Answering
Jiayao Tan, Tianle Liu, Fuyuan Hu et al.
MAVERIX: Multimodal Audio-Visual Evaluation and Recognition IndeX
Liuyue Xie, Avik Kuthiala, George Z Wei et al.
Learning Optimal Prompt Ensemble for Multi-source Visual Prompt Transfer
Enming Zhang, Liwen Cao, Yanru Wu et al.
GUI-Eyes: Tool-Augmented Perception for Visual Grounding in GUI Agents
Chen Chen, Jiawei Shao, Dakuan Lu et al.