Papers
3,673 papers found
Self-Supervised Domain Adaptation for Visual Navigation With Global Map Consistency
Eun Sun Lee, Junho Kim, Young Min Kim
ForeSI: Success-Aware Visual Navigation Agent
Mahdi Kazemi Moghaddam, Ehsan Abbasnejad, Qi Wu et al.
Visual Understanding of Complex Table Structures From Document Images
Sachin Raja, Ajoy Mondal, C. V. Jawahar
Perceptual Consistency in Video Segmentation
Yizhe Zhang, Shubhankar Borse, Hong Cai et al.
MovieCLIP: Visual Scene Recognition in Movies
Digbalay Bose, Rajat Hebbar, Krishna Somandepalli et al.
Audio-Visual Efficient Conformer for Robust Speech Recognition
Maxime Burchi, Radu Timofte
LayerDoc: Layer-Wise Extraction of Spatial Hierarchical Structure in Visually-Rich Documents
Puneet Mathur, Rajiv Jain, Ashutosh Mehra et al.
Towards Disturbance-Free Visual Mobile Manipulation
Tianwei Ni, Kiana Ehsani, Luca Weihs et al.
VLC-BERT: Visual Question Answering With Contextualized Commonsense Knowledge
Sahithya Ravi, Aditya Chinchure, Leonid Sigal et al.
Unsupervised Audio-Visual Lecture Segmentation
Darshan Singh S., Anchit Gupta, C. V. Jawahar et al.
Self-Supervised Pyramid Representation Learning for Multi-Label Visual Analysis and Beyond
Cheng-Yen Hsieh, Chih-Jung Chang, Fu-En Yang et al.
MixVPR: Feature Mixing for Visual Place Recognition
Amar Ali-bey, Brahim Chaib-draa, Philippe Giguère
Match Cutting: Finding Cuts With Smooth Visual Transitions
Boris Chen, Amir Ziai, Rebecca S. Tucker et al.
Pixel-Wise Prediction Based Visual Odometry via Uncertainty Estimation
Hao-Wei Chen, Ting-Hsuan Liao, Hsuan-Kung Yang et al.
Guiding Visual Question Answering With Attention Priors
Thao Minh Le, Vuong Le, Sunil Gupta et al.
Vis2Rec: A Large-Scale Visual Dataset for Visit Recommendation
Michaël Soumm, Adrian Popescu, Bertrand Delezoide
Efficient Visual Tracking With Exemplar Transformers
Philippe Blatter, Menelaos Kanakis, Martin Danelljan et al.
Audio-Visual Face Reenactment
Madhav Agarwal, Rudrabha Mukhopadhyay, Vinay P. Namboodiri et al.
Exploiting Visual Context Semantics for Sound Source Localization
Xinchi Zhou, Dongzhan Zhou, Di Hu et al.
Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion
Shruti Agarwal, Liwen Hu, Evonne Ng et al.
SimGlim: Simplifying Glimpse Based Active Visual Reconstruction
Abhishek Jha, Soroush Seifi, Tinne Tuytelaars
Barlow Constrained Optimization for Visual Question Answering
Abhishek Jha, Badri Patro, Luc Van Gool et al.
Visually Explaining 3D-CNN Predictions for Video Classification With an Adaptive Occlusion Sensitivity Analysis
Tomoki Uchiyama, Naoya Sogi, Koichiro Niinuma et al.
Hear the Flow: Optical Flow-Based Self-Supervised Visual Sound Source Localization
Dennis Fedorishin, Deen Dayal Mohan, Bhavin Jawade et al.