WACV 2026

BiNAR: A Bi-Modal Framework for Non-Aligned RGB-IR 3D Reconstruction via Gaussian Splatting

Abstract

Existing RGB-IR (infrared) bi-modal 3D reconstruction methods generally struggle to handle non-aligned multi-modal data, whose modalities differ significantly in resolution and spectral characteristics, while also achieving high-precision, pixel-level reconstruction. Non-aligned RGB-IR 3D reconstruction and rendering is a new research domain. To address it, we propose BiNAR, a bi-modal framework that directly processes non-aligned data captured by conventional RGB and IR cameras and generates high-resolution, pixel-level aligned renderings. BiNAR first performs cross-modal multi-camera joint calibration to estimate the intrinsic and extrinsic parameters of the RGB and IR cameras and bring them into a unified coordinate system. It then fuses features from both modalities in the Unified Gaussian Field and jointly optimizes the Gaussians to obtain a cross-modally consistent 3D scene representation. Experimental results show that BiNAR significantly outperforms existing single-modal and bi-modal Gaussian splatting methods in rendering quality, achieving a sub-pixel average reprojection error of 0.242 px and improving IR PSNR by 8.00 dB. We also build a pixel-level aligned RGB-IR dataset that covers a variety of indoor and outdoor scenes and includes real temperature data, providing a reliable benchmark for future multi-modal research. The code and dataset are available at https://github.com/jankin-wang/BiNAR.
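The abstract reports two quantitative results: a calibration reprojection error and an IR PSNR gain. The sketch below shows how these metrics are conventionally computed. It is not BiNAR's code. It assumes a distortion-free pinhole camera with intrinsics K and extrinsics (R, t), and images flattened to intensity lists with a known peak value. The function names `project`, `mean_reprojection_error`, and `psnr` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]
Pixel = Tuple[float, float]


def project(K: Mat3, R: Mat3, t: Vec3, X: Vec3) -> Pixel:
    """Project world point X to pixel (u, v).

    Assumption: ideal pinhole model with no lens distortion.
    """
    # Camera coordinates: Xc = R @ X + t
    xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Homogeneous pixel coordinates K @ Xc, then divide by the third component
    p = [sum(K[i][j] * xc[j] for j in range(3)) for i in range(3)]
    return p[0] / p[2], p[1] / p[2]


def mean_reprojection_error(K: Mat3, R: Mat3, t: Vec3,
                            points: Sequence[Vec3],
                            observed: Sequence[Pixel]) -> float:
    """Average Euclidean distance (in px) between projected and detected points."""
    if len(points) != len(observed) or not points:
        raise ValueError("need equal-length, non-empty point lists")
    total = 0.0
    for X, (u_obs, v_obs) in zip(points, observed):
        u, v = project(K, R, t, X)
        total += math.hypot(u - u_obs, v - v_obs)
    return total / len(points)


def psnr(pred: Sequence[float], ref: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("need equal-length, non-empty images")
    mse = sum((a - b) ** 2 for a, b in zip(pred, ref)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

For a sense of scale, PSNR is logarithmic in the mean squared error. At a fixed peak value, an 8.00 dB gain therefore corresponds to reducing the MSE by a factor of 10^0.8, or about 6.3.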
