Transformers

9294 directly classified papers

Papers

A Study of Finetuning Video Transformers for Multi-view Geometry Tasks AAAI 2026

Gradient as Conditions: Rethinking HOG for All-in-one Image Restoration AAAI 2026

Better Matching, Less Forgetting: A Quality-Guided Matcher for Transformer-based Incremental Object Detection AAAI 2026

DLVINet: Advancing Dual-Lens Video Inpainting Beyond Parallax Constraints AAAI 2026

TRT: Harnessing Tensor Ring Transformer for Hyperspectral Image Super-Resolution AAAI 2026

CloudMamba: Grouped Selective State Spaces for Point Cloud Analysis AAAI 2026

Masked Clustering Prediction for Unsupervised Point Cloud Pre-training AAAI 2026

CoMA: Compositional Human Motion Generation with Multi-modal Agents AAAI 2026

CtrlFuse: Mask-Prompt Guided Controllable Infrared and Visible Image Fusion AAAI 2026

Adapt-As-You-Walk Through the Clouds: Training-Free Online Test-Time Adaptation of 3D Vision-Language Foundation Models AAAI 2026

Distillation Dynamics: Towards Understanding Feature-Based Distillation in Vision Transformers AAAI 2026

Cumulant Attention in Vision Transformers (Student Abstract) AAAI 2026

PriorRG: Prior-Guided Contrastive Pre-training and Coarse-to-Fine Decoding for Chest X-ray Report Generation AAAI 2026

EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering AAAI 2026

3DTeethSAM: Taming SAM2 for 3D Teeth Segmentation AAAI 2026

UM-Text: A Unified Multimodal Model for Image Understanding and Visual Text Editing AAAI 2026

EchoMimicV3: 1.3B Parameters Are All You Need for Unified Multi-Modal and Multi-Task Human Animation AAAI 2026

MSTDiff: Multiscale-Aware Transformer Diffusion Network for Video Object Detection AAAI 2026

Temporal Object-Aware Vision Transformer for Few-Shot Video Object Detection AAAI 2026

GeoMoE: Divide-and-Conquer Motion Field Modeling with Mixture-of-Experts for Two-View Geometry AAAI 2026

MR-COSMO: Visual-Text Memory Recall and Direct CrOSs-MOdal Alignment Method for Query-Driven 3D Segmentation AAAI 2026

Target Refocusing via Attention Redistribution for Open-Vocabulary Semantic Segmentation: An Explainability Perspective AAAI 2026

SSR-SAM: Retrieval-Style Segment Anything Model for Semi-Supervised Ultra-High-Resolution Image Segmentation AAAI 2026

HDRMovieformer: A Transformer Framework and Benchmark for Cinematic SDR-to-HDR Conversion AAAI 2026

Drive-R1: Bridging Reasoning and Planning in VLMs for Autonomous Driving with Reinforcement Learning AAAI 2026