Optimizing Vision-Language Model for Road Crossing Intention Estimation

Roy Uziel; Oded Bialer

WACV 2025

Abstract

Identifying a pedestrian's intention to cross the road is crucial for autonomous driving, as it alerts the system to stop or slow down. However, determining crossing intention from video is challenging because it requires extracting complex high-level semantics. This paper introduces ClipCross, a novel classification framework optimized to extract high-level semantic features using the vision-language model CLIP for determining crossing intention. Existing CLIP-based methods perform poorly on this task, as CLIP's image and text encoders fail to capture the nuanced semantic distinctions between crossing and non-crossing intention images. ClipCross addresses this by optimizing a set of CLIP text embeddings to extract high-level semantic features, which a multi-layer perceptron uses to distinguish between crossing and non-crossing intentions. ClipCross achieves state-of-the-art performance on the crossing intention estimation benchmark datasets PIE, PSI, and JAAD.
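
The pipeline described in the abstract can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation. It assumes that the "high-level semantic features" are cosine similarities between an image embedding and a set of learnable text embeddings, which the abstract does not state explicitly. Random vectors stand in for real CLIP outputs, and all names and sizes (`DIM`, `NUM_TEXT`, `HIDDEN`, `similarity_features`, `mlp_forward`) are hypothetical.

```python
import math
import random

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def similarity_features(image_emb, text_embs):
    """Cosine similarity between one image embedding and each text embedding.

    Assumption: these similarities play the role of the semantic features.
    """
    img = l2_normalize(image_emb)
    return [sum(a * b for a, b in zip(img, l2_normalize(t))) for t in text_embs]

def mlp_forward(features, w1, b1, w2, b2):
    """One hidden ReLU layer followed by a sigmoid crossing probability."""
    hidden = [max(0.0, sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))

rng = random.Random(0)
DIM, NUM_TEXT, HIDDEN = 8, 4, 5  # toy sizes chosen for illustration only

def rand_vec(n):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# Stand-ins: a real system would take image_emb from CLIP's image encoder
# and initialise text_embs from CLIP's text encoder before optimising them.
image_emb = rand_vec(DIM)
text_embs = [rand_vec(DIM) for _ in range(NUM_TEXT)]

# Randomly initialised MLP weights (untrained).
w1 = [rand_vec(NUM_TEXT) for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
w2 = rand_vec(HIDDEN)
b2 = 0.0

features = similarity_features(image_emb, text_embs)
prob = mlp_forward(features, w1, b1, w2, b2)
print(f"features={features}")
print(f"p(crossing)={prob:.3f}")
```

In a trained system, `text_embs` and the MLP weights would be fitted on labeled crossing and non-crossing examples. The sketch covers only the inference path, with untrained random values.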

Topics

Artificial Intelligence > Core AI > Autonomous Vehicles
Artificial Intelligence > Core AI > Multimodal Learning
Computer Vision > Domain-Specific > Autonomous Driving
Machine Learning > Learning Types > Multi-Modal Learning
Artificial Intelligence > Core AI > Computer Vision

Keywords

autonomous driving, vision-language model, semantic feature, pedestrian intention, road crossing, crossing intention
