EI-Nexus: Towards Unmediated and Flexible Inter-Modality Local Feature Extraction and Matching for Event-Image Data
Abstract
Event cameras offer high temporal resolution and high dynamic range, yet inter-modality local feature extraction and matching for event-image data has received little research attention. We propose EI-Nexus, an unmediated and flexible framework that integrates two modality-specific keypoint extractors and a feature matcher. To achieve keypoint extraction that is robust to viewpoint and modality changes, we introduce Local Feature Distillation (LFD), which transfers viewpoint consistency from a well-learned image extractor to the event extractor, ensuring robust feature correspondence. Furthermore, Context Aggregation (CA) brings a remarkable enhancement in feature matching. We further establish the first two inter-modality feature matching benchmarks, MVSEC-RPE and EC-RPE, to assess relative pose estimation on event-image data. Our approach outperforms traditional methods that rely on explicit modal transformation, offering more unmediated and adaptable feature extraction and matching, and achieves better keypoint similarity and state-of-the-art results on the MVSEC-RPE and EC-RPE benchmarks. The source code and benchmarks will be made publicly available at EI-Nexus.