ECCV 2022

D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding