ECCV 2024

"X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and its Emergent Cross-modal Reasoning"

Authors

Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles

Related papers

Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos 2024

Learning Camouflaged Object Detection from Noisy Pseudo Label 2024

ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation 2024

FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition 2024

UniCode: Learning a Unified Codebook for Multimodal Large Language Models 2024