Papers
Visual Bridge: Universal Visual Perception Representations Generating
Yilin Gao, Shuguang Dou, Junzhou Li et al.
rMMEA: Robust Multi-Modal Entity Alignment with Missing and Noise Visual Modality
Lingbing Guo, Zhuo Chen, Yichi Zhang et al.
AV-Edit: Multimodal Generative Sound Effect Editing via Audio-Visual Semantic Joint Control
Xinyue Guo, Xiaoran Yang, Lipan Zhang et al.
Enhancing Spatial Reasoning Through Visual and Textual Thinking
Xun Liang, Xin Guo, Zhongming Jin et al.
Guided Perturbation Sensitivity (GPS): Detecting Adversarial Text via Embedding Stability and Word Importance
Bryan E. Tuck, Rakesh M. Verma
MAVERIX: Multimodal Audio-Visual Evaluation and Recognition IndeX
Liuyue Xie, Avik Kuthiala, George Z Wei et al.
AMS-KV: Adaptive KV Caching in Multi-Scale Visual Autoregressive Transformers
Boxun Xu, Yu Wang, Zihu Wang et al.
Activating Visual Context and Commonsense Reasoning Through Masked Prediction in VLMs
Jiaao Yu, Shenwei Li, Mingjie Han et al.
Learning Optimal Prompt Ensemble for Multi-source Visual Prompt Transfer
Enming Zhang, Liwen Cao, Yanru Wu et al.
Parameter-Free Clustering via Self-Supervised Consensus Maximization
Lijun Zhang, Suyuan Liu, Siwei Wang et al.
Seeing Is Believing: Rich-Context Hallucination Detection for MLLMs via Backward Visual Grounding
Pinxue Guo, Chongruo Wu, Xinyu Zhou et al.
VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use
Zhehao Zhang, Ryan A. Rossi, Tong Yu et al.
Bayesian Network Structural Consensus via Greedy Min-Cut Analysis
Pablo Torrijos, Jose M. Puerta, Juan A. Aledo et al.
Ordinal Secretaries with Advice
Hasti Nourmohammadi, Ying Cao, Bo Sun et al.
SMPRO: Self-Supervised Visual Preference Alignment via Differentiable Multi-Preference Multi-Group Ranking
Sirnam Swetha, Rui Meng, Shwetha Ram et al.
Visual-Friendly Concept Protection via Selective Adversarial Perturbations
Xiaoyue Mi, Fan Tang, You Wu et al.
Traffic Signal Plans Explorer: A General Framework for Visualising Traffic Evolution
Francesco Doria, Francesco Percassi, Marco Maratea et al.
A Visualized Framework for Event Cooperation with Generative Agents
Yuyang Tian, Shunqiang Mao, Wenchang Gao et al.
AgentSeer: Visualizing and Evaluating Temporal Actions in Agentic AI Systems
Ilham Wicaksono, Zekun Wu, Rahul Patel et al.
SPORTSQL: An Interactive System for Real-Time Sports Reasoning and Visualization
Sebastian Martinez, Naman Ahuja, Fenil Bardoliya et al.
XL-DURel: Finetuning Sentence Transformers for Ordinal Word-in-Context Classification
Sachin Yadav, Dominik Schlechtweg