CrossPlace: Cross-modal Place Recognition between Omnidirectional Cameras and LiDAR via a Unified Feature Space

¹Institute for Engineering Research (I3E)
²Valencian Graduate School and Research Network for Artificial Intelligence (valgrAI)

Abstract

This paper presents CrossPlace, a method for cross-modal place recognition between heterogeneous sensor modalities, specifically fisheye cameras and LiDAR. Place recognition is the fundamental capability of a mobile robot to determine its most likely location within a database from a sensory input query. In cross-modal place recognition, the goal is to localize using a sensor different from the one originally used to construct the database. The core contribution of this paper is a unified feature space that integrates intensity, depth and semantic information. Both the database entries and the queries are obtained by embedding sensor readings with the same CrossPlace model, which ensures a consistent representation across modalities. Consequently, a database constructed from LiDAR can be queried with fisheye images, and vice versa, using a single shared architecture. A comprehensive data transformation and preprocessing pipeline is also presented. Specifically, CrossPlace comprises three independent branches, which process intensity, depth and semantic information, respectively. Each branch consists of a CosPlace model for image embedding, with weights shared across sensor modalities. Late fusion through concatenation of the intensity, depth and semantic embeddings yields the best overall performance. We conduct an exhaustive evaluation on the KITTI-360 dataset, where CrossPlace surpasses state-of-the-art techniques across all metrics, establishing a new standard for cross-modal place recognition in urban and highway environments. The results demonstrate the effectiveness of our unified approach for place recognition across different sensor modalities, with robust performance maintained across operating environments.
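The late-fusion and retrieval steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: random vectors stand in for the outputs of the three CosPlace branches, the function names `l2_normalize`, `fuse` and `retrieve` are hypothetical, and both the per-branch normalization and the use of cosine similarity for matching are assumptions, since the abstract does not specify them.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit L2 norm."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)

def fuse(intensity, depth, semantic):
    """Late fusion by concatenation (hypothetical helper).

    Each branch embedding is normalized, the three are concatenated,
    and the result is normalized again so that a dot product between
    two fused descriptors equals their cosine similarity.
    """
    parts = [l2_normalize(np.asarray(p, dtype=np.float64))
             for p in (intensity, depth, semantic)]
    return l2_normalize(np.concatenate(parts, axis=-1))

def retrieve(query, database, k=1):
    """Return indices of the k database descriptors most similar to the query."""
    sims = database @ query          # cosine similarity per database entry
    return np.argsort(-sims)[:k]     # highest similarity first

# Toy data: random vectors stand in for the branch embeddings (assumption).
rng = np.random.default_rng(0)
n_places, dim = 5, 8
db_branches = [rng.normal(size=(n_places, dim)) for _ in range(3)]
database = fuse(*db_branches)        # e.g. descriptors built from one sensor

# Query: place 3 observed by the other sensor, simulated as a noisy copy.
query_branches = [b[3] + 0.05 * rng.normal(size=dim) for b in db_branches]
query = fuse(*query_branches)

top = retrieve(query, database, k=1)
print("Best match index:", top[0])
```

Because database entries and queries pass through the same fusion function, the descriptors are directly comparable regardless of which sensor produced them, which is the property the unified feature space is meant to provide.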

Acknowledgements

The Ministry of Science, Innovation and Universities (Spain) has funded this work through FPU23/00587 (M. Alfaro) and FPU21/04969 (J.J. Cabrera). This work is part of the projects PID2023-149575OB-I00, funded by MICIU/AEI/10.13039/501100011033 and by FEDER UE, and CIPROM/2024/8, funded by Generalitat Valenciana.


Successful retrieval example in the 2D-3D modality (urban environment, sequence 00) using CrossPlace for place recognition between fisheye images and LiDAR.

Retrieval example with a slight error in the 3D-2D modality (urban environment, sequence 18) using CrossPlace for place recognition between fisheye images and LiDAR.

Successful retrieval example in the 2D-3D modality (urban environment, sequence 18) using CrossPlace for place recognition between fisheye images and LiDAR.

Successful retrieval example in the 2D-3D modality (highway environment, sequence 07) using CrossPlace for place recognition between fisheye images and LiDAR.

Failed retrieval example in the 3D-2D modality (highway environment, sequence 03) using CrossPlace for place recognition between fisheye images and LiDAR.

Successful retrieval example in the 3D-2D modality (highway environment, sequence 07) using CrossPlace for place recognition between fisheye images and LiDAR.

CrossPlace Evaluation Results

[Figure panels: Highway Environment - Sequence 03 (2D-3D Modality, 3D-2D Modality); Urban Environment - Sequence 00 (2D-3D Modality, 3D-2D Modality)]

CrossPlace Further Test Results

[Figure panels: Highway Environment - Sequence 07 (2D-3D Modality, 3D-2D Modality); Urban Environment - Sequence 18 (2D-3D Modality, 3D-2D Modality)]
