Transformers

9294 directly classified papers

Papers

ConstructAI: From Real-Time Safety Insight to Skill Growth in Deployed Construction AI Systems AAAI 2026

Autoregressive Styled Text Image Generation, but Make it Reliable WACV 2026

SPAR-Det: Segmentation-guided and Prior-Aided Routing for Small Object Detection WACV 2026

Scalable Video Action Anticipation with Cross Linear Attentive Memory WACV 2026

Spatially-Guided Self-Attention Refinement for Zero-Shot Hair Segmentation (Student Abstract) AAAI 2026

Feasibility-Aware Masked Transformer for the Pickup-and-Delivery Problem with Time Windows (Student Abstract) AAAI 2026

Causal-LLM: Towards Predictive and Interpretable Spatiotemporal Foundation Models AAAI 2026

PEaRL: Pathway-Enhanced Representation Learning for Gene and Pathway Expression Prediction from Histology WACV 2026

Cross-Modal Event Encoder: Bridging Image-Text Knowledge to Event Streams WACV 2026

DiT-VTON: Diffusion Transformer Framework for Unified Multi-Category Virtual Try-On and Virtual Try-All with Integrated Image Editing WACV 2026

Pose-Diverse Multi-View Virtual Try-on from a Single Frontal Image via Diffusion Transformer WACV 2026

Power of Boundary and Reflection: Semantic Transparent Object Segmentation using Pyramid Vision Transformer with Transparent Cues WACV 2026

CraftSVG: Multi-Object Text-to-SVG Synthesis via Layout Guided Diffusion WACV 2026

Language-Guided and Motion-Aware Gait Representation for Generalizable Recognition AAAI 2026

STMI: Segmentation-Guided Token Modulation with Cross-Modal Hypergraph Interaction for Multi-Modal Object Re-Identification AAAI 2026

Hymavi: A Hybrid Mamba-Attention Network in Multi-View Framework for Volumetric Medical Image Segmentation WACV 2026

TimeRefine: Temporal Grounding with Time Refining Video LLM WACV 2026

SasMamba: A Lightweight Structure-Aware Stride State Space Model for 3D Human Pose Estimation WACV 2026

TTVAE: Transformer-Based Generative Modeling for Tabular Data Generation (Abstract Reprint) AAAI 2026

Learning to Parse and Reconstruct: Bidirectional Modeling of Question-to-Program Mapping AAAI 2026

MHA2MLA-VLM: Enabling DeepSeek’s Economical Multi-Head Latent Attention Across Vision-Language Models AAAI 2026

OW-Rep: Open World Object Detection with Instance Representation Learning WACV 2026

RapidMV: Leveraging Spatio-Angular Latent Space for Efficient and Consistent Text-to-Multi-View Synthesis WACV 2026

Distilling Offline Action Detection Models into Real-Time Streaming Models WACV 2026

ISALux: Illumination and Semantics-Aware Transformer Employing Mixture of Experts for Low Light Image Enhancement WACV 2026