Papers
3,673 papers found
3D Part Segmentation via Geometric Aggregation of 2D Visual Features
Marco Garosi, Riccardo Tedoldi, Davide Boscaini et al.
Generating Visual Explanations from Deep Networks using Implicit Neural Representations
Michal Byra, Henrik Skibbe
Diffusion-Based Visual Anagram as Multi-Task Learning
Zhiyuan Xu, Yinhe Chen, Huan-ang Gao et al.
Visual Robustness Benchmark for Visual Question Answering (VQA)
Farhan Ishmam, Ishmam Tashdeed, Talukder Asir Saadat et al.
Dataset Augmentation by Mixing Visual Concepts
Md Abdullah Al Rahat Kutubi, Hemanth Venkateswara
Enhancing Skin Disease Diagnosis: Interpretable Visual Concept Discovery with SAM
Xin Hu, Janet Wang, Jihun Hamm et al.
OpenCowID: Zero-Shot Visual Identification of Dairy Cows
Omkar Prabhune, Younghyun Kim
CADE: Continual Weakly-supervised Video Anomaly Detection with Ensembles
Satoshi Hashimoto, Tatsuya Konishi, Tomoya Kaichi et al.
SVD-Det: A Lightweight Framework for Video Forgery Detection Using Semantic and Visual Defect Cues
Tsung-Shan Yang, Tianyu Zhang, Feng Qian et al.
Grounding Descriptions in Images informs Zero-Shot Visual Recognition
Shaunak Halbe, Junjiao Tian, K J Joseph et al.
Direct Visual Grounding by Directing Attention of Visual Tokens
Parsa Esmaeilkhani, Longin Jan Latecki
AuViRe: Audio-visual Speech Representation Reconstruction for Deepfake Temporal Localization
Christos Koutlis, Symeon Papadopoulos
Enhancing Visual Planning with Auxiliary Tasks and Multi-token Prediction
Ce Zhang, Yale Song, Ruta Desai et al.
Understanding the Visual Projection Space of Multimodal LLMs
Sungheon Jeong, Yoojeong Song, Hyungjoon Kim
Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising
Yan-Bo Lin, Kevin Lin, Zhengyuan Yang et al.
VOCAL: Visual Odometry via ContrAstive Learning
Chi-Yao Huang, Zeel Bhatt, Yezhou Yang
Self-Supervised Visual Prompting for Cross-Domain Road Damage Detection
Xi Xiao, Zhuxuanzi Wang, Mingqiao Mo et al.
CLIP's Visual Embedding Projector is a Few-shot Cornucopia
Mohammad Fahes, Tuan-Hung Vu, Andrei Bursuc et al.
Online Episodic Memory Visual Query Localization with Egocentric Streaming Object Memory
Zaira Manigrasso, Matteo Dunnhofer, Antonino Furnari et al.
HiMix: Hierarchical Visual-Textual Mixing Network for Lesion Segmentation
Soojin Hwang, Jaeyoon Sim, Won Hwa Kim
See, Record, Do: Automated Generation of UI Workflows from Tutorial Videos
Adam Beauchaine, Craig Shue
ChartQA-X: Generating Explanations for Visual Chart Reasoning
Shamanthak Hegde, Pooyan Fazli, Hasti Seifi
iBERT: Interpretable Embeddings via Sense Decomposition
Vishal Anand, Milad Alshomary, Kathleen McKeown
A Computational Approach to Visual Metonymy
Saptarshi Ghosh, Linfeng Liu, Tianyu Jiang
Word Surprisal Correlates with Sentential Contradiction in LLMs
Ning Shi, Bradley Hauer, David Basil et al.