OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies

Lingdong Kong; Youquan Liu; Lai Xing Ng; Benoit R. Cottereau; Wei Tsang Ooi

CVPR 2024

Abstract

Event-based semantic segmentation (ESS) is a fundamental yet challenging task for event camera sensing. The difficulties in interpreting and annotating event data limit its scalability. While domain adaptation from images to event data can help mitigate this issue, there exist data representational differences that require additional effort to resolve. In this work, for the first time, we synergize information from image, text, and event-data domains and introduce OpenESS to enable scalable ESS in an open-world, annotation-efficient manner. We achieve this goal by transferring the semantically rich CLIP knowledge from image-text pairs to event streams. To pursue better cross-modality adaptation, we propose a frame-to-event contrastive distillation and a text-to-event semantic consistency regularization. Experimental results on popular ESS benchmarks show that our approach outperforms existing methods. Notably, we achieve 53.93% and 43.31% mIoU on DDD17 and DSEC-Semantic, respectively, without using either event or frame labels.
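To make the idea of contrastive distillation between paired frame and event features more concrete, here is a minimal sketch of a generic InfoNCE-style loss in NumPy. It is not the authors' formulation: the abstract does not specify the loss, so the function name `info_nce`, the temperature value of 0.07, the feature shapes, and the random toy data are all assumptions made purely for illustration.

```python
import numpy as np

def info_nce(event_feats, frame_feats, temperature=0.07):
    """Generic InfoNCE loss for N paired (event, frame) embeddings.

    Hypothetical helper for illustration only; row i of event_feats is
    assumed to be the positive match of row i of frame_feats.
    """
    # L2-normalize so that dot products are cosine similarities.
    e = event_feats / np.linalg.norm(event_feats, axis=1, keepdims=True)
    f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    # (N, N) similarity matrix, scaled by the temperature.
    logits = e @ f.T / temperature
    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; average their negative log-likelihood.
    return float(-np.mean(np.diag(log_prob)))

# Toy data (assumed): 8 frame embeddings of dimension 16.
rng = np.random.default_rng(0)
frame = rng.normal(size=(8, 16))

# Event features that closely track their paired frame features.
aligned = info_nce(frame + 0.01 * rng.normal(size=(8, 16)), frame)
# Event features paired with the wrong frames (row order reversed).
shuffled = info_nce(frame[::-1], frame)
print(aligned, shuffled)
```

In this toy setup the correctly paired features give a much lower loss than the mismatched ones, which is the signal a contrastive objective of this kind uses to pull event embeddings toward their corresponding frame embeddings.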

Topics

Machine Learning > Learning Types > Contrastive Learning

Computer Vision > Processing > Semantic Segmentation

Machine Learning > Learning Types > Transfer Learning

Deep Learning > Learning Types > Contrastive Learning

Deep Learning > Learning Types > Multi-Modal Learning

Deep Learning > Learning Types > Domain Adaptation

Keywords

contrastive learning, transfer learning, domain adaptation, knowledge distillation, multi-modal learning, open vocabulary, event-based semantic segmentation
