Transformers

1816 directly classified papers

Papers

Transformer-Based Inpainting for Real-Time 3D Streaming in Sparse Multi-Camera Setups WACV 2026

DF-Mamba: Deformable State Space Modeling for 3D Hand Pose Estimation in Interactions WACV 2026

LASOR: Towards Clinically Transparent and Explainable Ophthalmic Report Generation via Lesion-Aware Segmentation WACV 2026

Trajectory Tactics: When Transformers Learn Exploration to Generate Online Signature WACV 2026

Dense Retrieval with Quantity Comparison Intent ACL 2025

CSIRO-LT at SemEval-2025 Task 11: Adapting LLMs for Emotion Recognition for Multiple Languages SEMEVAL 2025

DictAS: A Framework for Class-Generalizable Few-Shot Anomaly Segmentation via Dictionary Lookup ICCV 2025

TexGarment: Consistent Garment UV Texture Generation via Efficient 3D Structure-Guided Diffusion Transformer CVPR 2025

TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model CVPR 2025

Cross-modal Information Flow in Multimodal Large Language Models CVPR 2025

HOPE at TSAR 2025 Shared Task: Balancing Control and Complexity in Readability-Controlled Text Simplification EMNLP 2025

Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves CVPR 2025

SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation CVPR 2025

Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection CVPR 2025

PerLA: Perceptive 3D Language Assistant CVPR 2025

Open-ended Hierarchical Streaming Video Understanding with Vision Language Models ICCV 2025

SpectralAR: Spectral Autoregressive Visual Generation ICCV 2025

SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking CVPR 2025

RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion CVPR 2025

CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation ICCV 2025

X-Prompt: Generalizable Auto-Regressive Visual Learning with In-Context Prompting ICCV 2025

3D Mesh Editing using Masked LRMs ICCV 2025

LHM: Large Animatable Human Reconstruction Model for Single Image to 3D in Seconds ICCV 2025

OrderChain: Towards General Instruct-Tuning for Stimulating the Ordinal Understanding Ability of MLLM ICCV 2025

CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models ICCV 2025