Papers
Do You See Me : A Multidimensional Benchmark for Evaluating Visual Perception in Multimodal LLMs
Aditya Sanjiv Kanade, Tanuja Ganu
Detecting Subtle Sense Shift with Polysemy-Aware Trends
Ondřej Herman, Pavel Rychlý
Progressive Visual Refinement for Multi-modal Summarization
Ye Xiong, Hidetaka Kamigaito, Soichiro Murakami et al.
Chronocept: Instilling a Sense of Time in Machines
Krish Goel, Sanskar Pandey, KS Mahadevan et al.
Probabilistic Bilingual Subword Segmentation with Latent Subword Alignment
Shoto Nishida, Daiki Matsui, Takashi Ninomiya et al.
DRIVINGVQA: A Dataset for Interleaved Visual Chain-of-Thought in Real-World Driving Scenarios
Charles Corbière, Simon Roburin, Syrielle Montariol et al.
Seeing Words Differently: Visual Embeddings for Robust English-Arabic Machine Translation
Mahdi Alshaikh Saleh, Irfan Ahmad
The SlangTrack Dataset: Supporting the Detection of Words Used in Slang Senses
Afnan Mohammed Aloraini, Riza Batista-Navarro, Goran Nenadic et al.
Hebrew Diacritics Restoration using Visual Representation
Yair Elboher, Yuval Pinter
AnimatedLLM: Explaining LLMs with Interactive Visualizations
Zdeněk Kasner, Ondrej Dusek
AgentSense: Virtual Sensor Data Generation Using LLM Agents in Simulated Home Environments
Zikang Leng, Megha Thukral, Yaqi Liu et al.
TDSNNs: Competitive Topographic Deep Spiking Neural Networks for Visual Cortex Modeling
Deming Zhou, Yuetong Fang, Zhaorui Wang et al.
Open-World Object Counting in Videos
Niki Amini-Naieni, Andrew Zisserman
AbductiveMLLM: Boosting Visual Abductive Reasoning Within MLLMs
Boyu Chang, Qi Wang, Xi Guo et al.
VMChill: A Dataset for Fine-Grained Visual-Musical Synergy
Xiaowei Chi, Zeyue Tian, Jialiang Chen et al.
Primary Visual Cortex Inspired Point Cloud Analysis Framework
Jisheng Dang, Delin Deng, Bimei Wang et al.
SCAN: Self-Calibrated AutoregressioN for High-Quality Visual Generation
Zhanzhou Feng, Qingpei Guo, Jingdong Chen et al.
AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection
Bin-Bin Gao, Yue Zhou, Jiangtao Yan et al.
BrainLMM: A Label-Free Framework for Mapping Multi-Semantic Representation in the Human Visual Cortex
Tan Gao, Mufan Xue, Haofang Zheng et al.
Concepts from Representations: Post-hoc Concept Bottleneck Models via Sparse Decomposition of Visual Representations
Shizhan Gong, Xiaofan Zhang, Qi Dou
DAPE: Harmonizing Content-Position Encoding for Versatile Dense Visual Prediction
Xiuquan Hou, Meiqin Liu, Senlin Zhang et al.
MSPCaps: A Multi-Scale Patchify Capsule Network with Cross-Agreement Routing for Visual Recognition
Yudong Hu, Yueju Han, Rui Sun et al.
SatireDecoder: Visual Cascaded Decoupling for Enhancing Satirical Image Comprehension
Yue Jiang, Haiwei Xue, Minghao Han et al.
ResProto-FD: Visual-Language Residual Prototype Sets for Generalized Face Forgery Detection
Jiuyao Jing, Yu Zheng, Chunlei Peng
Do Audio-Visual Segmentation Models Truly Segment Sounding Objects?
Jia Li, Wenjie Zhao, Ziru Huang et al.