Robust Place Recognition under Illumination Changes using pseudo-LiDAR from Omnidirectional Images

Institute for Engineering Research (I3E) · Valencian Graduate School and Research Network for Artificial Intelligence (valgrAI)

The pseudo-LiDAR place recognition problem is addressed in two steps: (1) the omnidirectional image is transformed into a 3D point cloud by means of a depth map estimated with Distill Any Depth [1], and (2) the point cloud is embedded into a global descriptor with the MinkUNeXt architecture [2].
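Step (1) amounts to back-projecting each depth pixel along its viewing ray. The following is a minimal sketch of that back-projection, assuming the omnidirectional image has been unwrapped to an equirectangular panorama so that each pixel maps to a direction on the unit sphere; the exact catadioptric projection model used in the paper may differ.

```python
import numpy as np

def panorama_depth_to_pointcloud(depth: np.ndarray) -> np.ndarray:
    """Back-project an equirectangular depth map (H, W) into an (N, 3) point cloud.

    Hypothetical sketch: assumes each pixel of the unwrapped panorama
    corresponds to an (azimuth, elevation) ray on the unit sphere.
    """
    h, w = depth.shape
    # Azimuth spans the full 360 deg of the panorama; elevation its vertical FoV.
    azimuth = np.linspace(-np.pi, np.pi, w)
    elevation = np.linspace(np.pi / 2, -np.pi / 2, h)
    az, el = np.meshgrid(azimuth, elevation)
    # Spherical-to-Cartesian conversion, scaled by the estimated depth.
    x = depth * np.cos(el) * np.cos(az)
    y = depth * np.cos(el) * np.sin(az)
    z = depth * np.sin(el)
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Drop invalid pixels (e.g., zero depth in the mirror's blind spot).
    return points[depth.reshape(-1) > 0]
```

The resulting pseudo-LiDAR cloud can then be quantized into sparse voxels and fed to MinkUNeXt to produce the global descriptor.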


We propose a novel data augmentation technique, Distilled Depth Variations, which selectively estimates depth using different distilled variants of Depth Anything V2 (small, base, large) [3] and Distill Any Depth (small, base, large) [1]. This method introduces depth distortions based on the predictions of the less robust models (e.g., the small and base variants). By simulating the inaccuracies of weaker depth estimators, this approach enhances the model's resilience to the depth estimation errors inherent in pseudo-LiDAR generation pipelines.
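A minimal sketch of this sampling strategy is shown below. The `estimators` dictionary (variant name → inference callable) is a hypothetical interface, not the authors' implementation; it is assumed to contain small, base, and large checkpoints of the two depth estimators.

```python
import random
from typing import Callable, Dict

import numpy as np

def distilled_depth_variation(
    image: np.ndarray,
    estimators: Dict[str, Callable[[np.ndarray], np.ndarray]],
    weak_variants: tuple = ("small", "base"),
    p_weak: float = 0.5,
) -> np.ndarray:
    """Pick a depth-estimator variant at random for this training sample.

    With probability p_weak, a less robust distilled variant (small/base)
    produces the depth map, injecting realistic depth distortions into
    training; otherwise a large variant is used. The probability value and
    the name-matching convention are illustrative assumptions.
    """
    if random.random() < p_weak:
        name = random.choice(
            [k for k in estimators if any(v in k for v in weak_variants)]
        )
    else:
        name = random.choice([k for k in estimators if "large" in k])
    return estimators[name](image)
```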

Abstract

Visual Place Recognition (VPR) systems often struggle with variations in scene appearance caused by illumination changes and different acquisition platforms. This paper proposes an alternative framework that leverages depth estimation to overcome these challenges. Our approach transforms omnidirectional images from catadioptric cameras into depth maps using Distill Any Depth [1], a state-of-the-art depth estimator based on Depth Anything V2 [3]. These depth maps are then converted into pseudo-LiDAR point clouds, which serve as input to the MinkUNeXt architecture for generating global-appearance descriptors. A key innovation lies in our novel data augmentation technique that exploits different distilled variants of depth estimation models to enhance robustness across varying conditions. Despite training on a limited set of images captured only under cloudy conditions, our system demonstrates strong performance when evaluated across diverse lighting scenarios and previously unseen environments of the COLD database [4]. Experiments show that our approach provides a viable alternative to traditional VPR methods, with competitive results across all tested scenarios. Furthermore, the generated pseudo-LiDAR data offers an additional benefit: it enables training 3D processing architectures on abundant data without expensive LiDAR hardware. This work presents a fundamentally different approach to scene representation for VPR, with promising implications for robot localization in changing environments.

Retrieval at Cloudy

Place recognition with query and database under the same illumination condition (query: cloudy, database: cloudy).

Freiburg-A

Saarbrücken-A

Retrieval at Night

Place recognition with query and database under different illumination conditions (query: night, database: cloudy).

Freiburg-A

Saarbrücken-A

Retrieval at Sunny

Place recognition with query and database under different illumination conditions (query: sunny, database: cloudy).

Freiburg-A

Saarbrücken-B
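In all three retrieval settings above, localization reduces to nearest-neighbor search over the global descriptors: the query descriptor is compared against the database descriptors and the closest match determines the retrieved place. A minimal sketch follows, assuming L2-normalized descriptors (e.g., as produced by MinkUNeXt), so maximum cosine similarity coincides with minimum Euclidean distance; the normalization convention is an assumption, not confirmed by the source.

```python
import numpy as np

def retrieve_top1(query_desc: np.ndarray, db_descs: np.ndarray) -> int:
    """Return the index of the database descriptor closest to the query.

    Shapes: query_desc (D,), db_descs (M, D). Descriptors are normalized
    so that the dot product ranks candidates by cosine similarity.
    """
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    return int(np.argmax(db @ q))
```

Because the database is always built from cloudy images, the night and sunny experiments measure how well the pseudo-LiDAR representation bridges the illumination gap.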

Comparison with other methods


Comparison in terms of Recall@1 (R@1) with state-of-the-art methods.


Comparison in terms of Recall@1% (R@1%) with state-of-the-art methods.
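Recall@1 counts a query as correct if its single nearest neighbor is a true match, while Recall@1% relaxes this to the top 1% of the database. Below is a minimal sketch of the metric, under the simplifying assumption of a single ground-truth database index per query; real VPR evaluations typically accept any database image within a position threshold as correct.

```python
import numpy as np

def recall_at_k(dists: np.ndarray, gt: np.ndarray, k: int) -> float:
    """Recall@K: fraction of queries whose ground-truth match is in the top K.

    dists: (Q, M) distance matrix between query and database descriptors.
    gt:    (Q,) index of the correct database entry for each query
           (single-match assumption for illustration).
    For Recall@1%, set k = max(1, round(0.01 * M)).
    """
    topk = np.argsort(dists, axis=1)[:, :k]
    return float(np.mean([gt[i] in topk[i] for i in range(len(gt))]))
```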

Bibliography:
[1] He, X., Guo, D., Li, H., Li, R., Cui, Y., Zhang, C.: Distill any depth: Distillation creates a stronger monocular depth estimator. arXiv preprint arXiv:2502.19204 (2025)
[2] Cabrera, J.J., Santo, A., Gil, A., Viegas, C., Payá, L.: MinkUNeXt: Point cloud-based large-scale place recognition using 3D sparse convolutions. arXiv preprint arXiv:2403.07593 (2024) https://doi.org/10.48550/arXiv.2403.07593
[3] Yang, L., Kang, B., Huang, Z., Zhao, Z., Xu, X., Feng, J., Zhao, H.: Depth Anything V2. Advances in Neural Information Processing Systems 37, 21875-21911 (2024). https://doi.org/10.48550/arXiv.2406.09414
[4] Pronobis, A., Caputo, B.: COLD: the COsy Localization Database. The International Journal of Robotics Research 28(5), 588-594 (2009) https://doi.org/10.1177/0278364909103912
[5] Cabrera, J.J., Román, V., Gil, A., Reinoso, O., Payá, L.: An experimental evaluation of siamese neural networks for robot localization using omnidirectional imaging in indoor environments. Artificial Intelligence Review 57(198) (2024) https://doi.org/10.1007/s10462-024-10840-0
[6] Alfaro, M., Cabrera, J.J., Jiménez, L.M., Reinoso, O., Payá, L.: Triplet Neural Networks for the Visual Localization of Mobile Robots. In: Proceedings of the 21st International Conference on Informatics in Control, Automation and Robotics (ICINCO 2024), vol. 2, pp. 125-132 (2024). https://doi.org/10.5220/0012927400003822
[7] Izquierdo, S., Civera, J.: Optimal transport aggregation for visual place recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 17658-17668 (2024). https://doi.org/10.48550/arXiv.2311.15937
[8] Keetha, N., Mishra, A., Karhade, J., Jatavallabhula, K.M., Scherer, S., Krishna, M., Garg, S.: AnyLoc: towards universal visual place recognition. IEEE Robotics and Automation Letters (2023) https://doi.org/10.1109/LRA.2023.3343602
[9] Berton, G., Masone, C., Caputo, B.: Rethinking visual geo-localization for large-scale applications. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4878-4888 (2022). https://doi.org/10.48550/arXiv.2204.02287
[10] Berton, G., Trivigno, G., Caputo, B., Masone, C.: Eigenplaces: Training viewpoint robust models for visual place recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 11080-11090 (2023). https://doi.org/10.48550/arXiv.2308.10832
[11] Ali-Bey, A., Chaib-Draa, B., Giguère, P.: MixVPR: Feature mixing for visual place recognition. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 2998-3007 (2023). https://doi.org/10.48550/arXiv.2303.02190

Acknowledgements

The Ministry of Science, Innovation and Universities (Spain) has funded this work through FPU23/00587 (M. Alfaro) and FPU21/04969 (J.J. Cabrera). This work is part of the projects PID2023-149575OB-I00, funded by MICIU/AEI/10.13039/501100011033 and by FEDER UE, and CIPROM/2024/8, funded by Generalitat Valenciana.
