Papers
2,653 papers found
Temporally Streaming Audio-Visual Synchronization for Real-World Videos
Jordan G Voas, Wei-Cheng Tseng, Layne Berry et al.
Learning Visual Grounding from Generative Vision and Language Model
Shijie Wang, Dahun Kim, Ali Taalimi et al.
When Visual State Space Model Meets Backdoor Attacks
Sankalp Nagaonkar, Achyut Mani Tripathi, Ashish Mishra
PTQ4VM: Post-Training Quantization for Visual Mamba
Younghyun Cho, Changhun Lee, Seonggon Kim et al.
WiGNet: Windowed Vision Graph Neural Network
Gabriele Spadaro, Marco Grangetto, Attilio Fiandrotti et al.
From Visual Explanations to Counterfactual Explanations with Latent Diffusion
Tung Luu, Nam Le, Duc Le et al.
Scene-LLM: Extending Language Model for 3D Visual Reasoning
Rao Fu, Jingyu Liu, Xilun Chen et al.
CusConcept: Customized Visual Concept Decomposition with Diffusion Models
Zhi Xu, Shaozhe Hao, Kai Han
Improving Accuracy and Generalization for Efficient Visual Tracking
Ram Zaveri, Shivang Patel, Yu Gu et al.
SUM: Saliency Unification through Mamba for Visual Attention Modeling
Alireza Hosseini, Amirhossein Kazerouni, Saeed Akhavan et al.
Adaptive Deviation Learning for Visual Anomaly Detection with Data Contamination
Anindya Sundar Das, Guansong Pang, Monowar Bhuyan
Dance Any Beat: Blending Beats with Visuals in Dance Video Generation
Xuanchen Wang, Heng Wang, Dongnan Liu et al.
Enhancing Visual Classification using Comparative Descriptors
Hankyeol Lee, Gawon Seo, Wonseok Choi et al.
AdQuestA: Knowledge-Guided Visual Question Answer Framework for Advertisements
Neha Choudhary, Poonam Goyal, Devashish Siwatch et al.
Learning to Visually Connect Actions and their Effects
Paritosh Parmar, Eric Peh, Basura Fernando
3D Part Segmentation via Geometric Aggregation of 2D Visual Features
Marco Garosi, Riccardo Tedoldi, Davide Boscaini et al.
Diffusion-Based Visual Anagram as Multi-Task Learning
Zhiyuan Xu, Yinhe Chen, Huan-ang Gao et al.
Visual Robustness Benchmark for Visual Question Answering (VQA)
Farhan Ishmam, Ishmam Tashdeed, Talukder Asir Saadat et al.
Dataset Augmentation by Mixing Visual Concepts
Md Abdullah Al Rahat Kutubi, Hemanth Venkateswara
Enhancing Skin Disease Diagnosis: Interpretable Visual Concept Discovery with SAM
Xin Hu, Janet Wang, Jihun Hamm et al.
Breaking the Frame: Visual Place Recognition by Overlap Prediction
Tong Wei, Philipp Lindenberger, Jiří Matas et al.
OpenCowID: Zero-Shot Visual Identification of Dairy Cows
Omkar Prabhune, Younghyun Kim
Grounding Descriptions in Images informs Zero-Shot Visual Recognition
Shaunak Halbe, Junjiao Tian, K J Joseph et al.
Direct Visual Grounding by Directing Attention of Visual Tokens
Parsa Esmaeilkhani, Longin Jan Latecki
Enhancing Visual Planning with Auxiliary Tasks and Multi-token Prediction
Ce Zhang, Yale Song, Ruta Desai et al.