Papers
18,421 papers found
A Simple yet Effective Layout Token in Large Language Models for Document Understanding
Zhaoqing Zhu, Chuwei Luo, Zirui Shao et al.
Assessing and Learning Alignment of Unimodal Vision and Language Models
Le Zhang, Qian Yang, Aishwarya Agrawal
Associative Transformer
Yuwei Sun, Hideya Ochiai, Zhirong Wu et al.
A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs
Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.
Asynchronous Collaborative Graph Representation for Frames and Events
Dianze Li, Jianing Li, Xu Liu et al.
ATA: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting
Yizhe Tang, Zhimin Sun, Yuzhen Du et al.
A Tale of Two Classes: Adapting Supervised Contrastive Learning to Binary Imbalanced Datasets
David Mildenberger, Paul Hager, Daniel Rueckert et al.
A Theory of Learning Unified Model via Knowledge Integration from Label Space Varying Domains
Dexuan Zhang, Thomas Westfechtel, Tatsuya Harada
AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward
Haonan Han, Xiangzuo Wu, Huan Liao et al.
ATP: Adaptive Threshold Pruning for Efficient Data Encoding in Quantum Neural Networks
Mohamed Afane, Gabrielle Ebbrecht, Ying Wang et al.
ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models
Xubing Ye, Yukang Gan, Yixiao Ge et al.
Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration
Haipeng Fang, Sheng Tang, Juan Cao et al.
Attention Distillation: A Unified Approach to Visual Characteristics Transfer
Yang Zhou, Xu Gao, Zichong Chen et al.
Attention IoU: Examining Biases in CelebA using Attention Maps
Aaron Serianni, Tyler Zhu, Olga Russakovsky et al.
Attraction Diminishing and Distributing for Few-Shot Class-Incremental Learning
Li-Jun Zhao, Zhen-Duo Chen, Yongxin Wang et al.
Attribute-formed Class-specific Concept Space: Endowing Language Bottleneck Model with Better Interpretability and Scalability
Jianyang Zhang, Qianli Luo, Guowu Yang et al.
Attribute-Missing Multi-view Graph Clustering
Bowen Zhao, Qianqian Wang, Zhengming Ding et al.
AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers
Jiazhi Guan, Kaisiyuan Wang, Zhiliang Xu et al.
Audio-Visual Instance Segmentation
Ruohao Guo, Xianghua Ying, Yaru Chen et al.
Audio-Visual Semantic Graph Network for Audio-Visual Event Localization
Liang Liu, Shuaiyong Li, Yongqiang Zhu
Augmented Deep Contexts for Spatially Embedded Video Coding
Yifan Bian, Chuanbo Tang, Li Li et al.
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
Federico Cocchi, Nicholas Moratelli, Marcella Cornia et al.
Augmenting Perceptual Super-Resolution via Image Quality Predictors
Fengjia Zhang, Samrudhdhi B. Rangrej, Tristan Aumentado-Armstrong et al.
A Unified Approach to Interpreting Self-supervised Pre-training Methods for 3D Point Clouds via Interactions
Qiang Li, Jian Ruan, Fanghao Wu et al.
A Unified Framework for Heterogeneous Semi-supervised Learning
Marzi Heidari, Abdullah Alchihabi, Hao Yan et al.