Papers
AudioViewer: Learning To Visualize Sounds
Chunjin Song, Yuchi Zhang, Willis Peng et al.
Global-Local Self-Distillation for Visual Representation Learning
Tim Lebailly, Tinne Tuytelaars
LAVA: Label-Efficient Visual Learning and Adaptation
Islam Nassar, Munawar Hayat, Ehsan Abbasnejad et al.
Benchmarking Visual Localization for Autonomous Navigation
Lauri Suomela, Jussi Kalliola, Atakan Dag et al.
Context-Empowered Visual Attention Prediction in Pedestrian Scenarios
Igor Vozniak, Philipp Müller, Lorena Hell et al.
BirdSoundsDenoising: Deep Visual Audio Denoising for Bird Sounds
Youshan Zhang, Jialu Li
Towards Visual Saliency Explanations of Face Verification
Yuhang Lu, Zewei Xu, Touradj Ebrahimi
DREAM: Visual Decoding From Reversing Human Visual System
Weihao Xia, Raoul de Charette, Cengiz Oztireli et al.
A Visual Active Search Framework for Geospatial Exploration
Anindya Sarkar, Michael Lanier, Scott Alfeld et al.
Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing
Yating Xu, Conghui Hu, Gim Hee Lee
VD-GR: Boosting Visual Dialog With Cascaded Spatial-Temporal Multi-Modal Graphs
Adnen Abdessaied, Lei Shi, Andreas Bulling
LAVSS: Location-Guided Audio-Visual Spatial Audio Separation
Yuxin Ye, Wenming Yang, Yapeng Tian
Learning Robust Deep Visual Representations From EEG Brain Recordings
Prajwal Singh, Dwip Dalal, Gautam Vashishtha et al.
MIVC: Multiple Instance Visual Component for Visual-Language Models
Wenyi Wu, Qi Li, Wenliang Zhong et al.
FocusTune: Tuning Visual Localization Through Focus-Guided Sampling
Son Tung Nguyen, Alejandro Fontan, Michael Milford et al.
Benchmarking Out-of-Distribution Detection in Visual Question Answering
Xiangxi Shi, Stefan Lee
Personalized Face Inpainting With Diffusion Models by Parallel Visual Attention
Jianjin Xu, Saman Motamed, Praneetha Vaddamanu et al.
Instruct Me More! Random Prompting for Visual In-Context Learning
Jiahao Zhang, Bowen Wang, Liangzhi Li et al.
Annotation-Free Audio-Visual Segmentation
Jinxiang Liu, Yu Wang, Chen Ju et al.
Neural Image Compression Using Masked Sparse Visual Representation
Wei Jiang, Wei Wang, Yue Chen
Interaction Region Visual Transformer for Egocentric Action Anticipation
Debaditya Roy, Ramanathan Rajendiran, Basura Fernando
MVAD: A Multiple Visual Artifact Detector for Video Streaming
Chen Feng, Duolikun Danier, Fan Zhang et al.
Data-Efficient 3D Visual Grounding via Order-Aware Referring
Tung-Yu Wu, Sheng-Yu Huang, Yu-Chiang Frank Wang
Fine-Grained Spatial and Verbal Losses for 3D Visual Grounding
Sombit Dey, Ozan Unal, Christos Sakaridis et al.
VHS: High-Resolution Iterative Stereo Matching with Visual Hull Priors
Markus Plack, Hannah Dröge, Leif Van Holland et al.