TransiT: Transient Transformer for Non-line-of-sight Videography

Ruiqian Li; Siyuan Shen; Suan Xia; Ziheng Wang; Xingyue Peng; Chengxuan Song; Yingsheng Zhu; Tao Wu; Shiying Li; Jingyi Yu

ICCV 2025

Abstract

High-quality, high-speed videography using Non-Line-of-Sight (NLOS) imaging benefits autonomous navigation, collision prevention, and post-disaster search and rescue. Current solutions must trade frame rate against image quality. High frame rates, for example, can be achieved by reducing either the per-point scanning time or the scanning density, but at the cost of lowering the information density of individual frames. Fast scanning further reduces the signal-to-noise ratio, and different scanning systems exhibit different distortion characteristics. In this work, we design and employ a new Transient Transformer architecture, called TransiT, to achieve real-time NLOS recovery under fast scans. TransiT directly compresses the temporal dimension of the input transients to extract features, which reduces computation cost and meets high frame rate requirements. It further adopts a feature fusion mechanism and employs a spatial-temporal Transformer to help capture features of NLOS transient videos. Moreover, TransiT applies transfer learning to bridge the gap between synthetic and real-measured data. In real experiments, TransiT reconstructs NLOS videos at 64 × 64 resolution and 10 frames per second from sparse 16 × 16 transients measured at an exposure time of 0.4 ms per point. We will make our code and dataset available to the community.
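The figures quoted in the abstract can be related with a little arithmetic. The sketch below is not taken from the paper: it assumes the scan visits grid points one after another and that exposure time is the only cost per point (no mirror settling, readout, or reconstruction overhead). The function names are hypothetical.

```python
# Back-of-the-envelope scan budget, under the assumptions stated above:
# points are scanned sequentially and exposure is the only per-point cost.

def scan_time_ms(grid_side: int, exposure_ms: float) -> float:
    """Time to scan one square grid_side x grid_side frame, in milliseconds."""
    return grid_side * grid_side * exposure_ms


def max_frame_rate(grid_side: int, exposure_ms: float) -> float:
    """Upper bound on frames per second for the given scan settings."""
    return 1000.0 / scan_time_ms(grid_side, exposure_ms)


sparse_ms = scan_time_ms(16, 0.4)    # 256 points  -> 102.4 ms per frame
sparse_fps = max_frame_rate(16, 0.4) # about 9.77 frames per second
dense_ms = scan_time_ms(64, 0.4)     # 4096 points -> 1638.4 ms per frame
dense_fps = max_frame_rate(64, 0.4)  # about 0.61 frames per second

print(f"16 x 16 scan: {sparse_ms:.1f} ms per frame, {sparse_fps:.2f} fps")
print(f"64 x 64 scan: {dense_ms:.1f} ms per frame, {dense_fps:.2f} fps")
```

Under these assumptions, a sparse 16 × 16 scan at 0.4 ms per point takes about 102 ms, which is consistent with the roughly 10 frames per second reported. Scanning the 64 × 64 output grid directly at the same exposure would take about 1.6 s per frame, which illustrates why the reconstruction starts from sparse measurements.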

Topics

Deep Learning > Architectures > Transformers
Computer Vision > Analysis > 3D Vision
Computer Vision > Processing > Video Processing
Artificial Intelligence > Core AI > Computer Vision

Keywords

transfer learning, video reconstruction, spatial-temporal transformer, non-line-of-sight imaging, sparse-view imaging, transient transformer
