MinkUNeXt: Point Cloud-based Large-scale Place Recognition using 3D Sparse Convolutions

Juan José Cabrera, Antonio Santo, Arturo Gil, Carlos Viegas, Luis Payá

1 Institute for Engineering Research (I3E)
2 Valencian Graduate School and Research Network for Artificial Intelligence (valgrAI)
3 ADAI Field Tech Lab

Abstract

This paper presents MinkUNeXt, an effective and efficient architecture for place recognition from point clouds, built entirely on the new 3D MinkNeXt Block: a residual block composed of 3D sparse convolutions that follows the design philosophy established by recent Transformers while relying purely on simple 3D convolutions. Feature extraction is performed at different scales by a U-Net encoder-decoder network, and those multi-scale features are aggregated into a single descriptor by Generalized Mean Pooling (GeM). The proposed architecture demonstrates that the current state of the art can be surpassed by relying only on conventional 3D sparse convolutions, without resorting to more complex and sophisticated proposals such as Transformers, attention layers or deformable convolutions. A thorough assessment of the proposal has been carried out on the Oxford RobotCar, In-house, KITTI and USyd datasets, where MinkUNeXt outperforms other state-of-the-art methods.
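To make the two ingredients above concrete, the sketch below shows how a ConvNeXt-style residual block of 3D sparse convolutions and a GeM pooling head can be written with MinkowskiEngine (the library behind the "Mink" prefix). Channel widths, the expansion ratio and the initial p are illustrative defaults, not the paper's exact configuration.

        import torch
        import torch.nn as nn
        import MinkowskiEngine as ME


        class MinkNeXtBlock(nn.Module):
            """Residual block in the spirit of the paper: a 3x3x3 sparse convolution
            followed by an inverted bottleneck (1x1x1 expand -> GELU -> 1x1x1 project),
            mirroring the Transformer/ConvNeXt layout with plain sparse convolutions."""

            def __init__(self, channels: int, expansion: int = 4):
                super().__init__()
                self.conv = ME.MinkowskiConvolution(channels, channels, kernel_size=3, dimension=3)
                self.norm = ME.MinkowskiBatchNorm(channels)
                self.expand = ME.MinkowskiConvolution(channels, expansion * channels, kernel_size=1, dimension=3)
                self.act = ME.MinkowskiGELU()
                self.project = ME.MinkowskiConvolution(expansion * channels, channels, kernel_size=1, dimension=3)

            def forward(self, x: ME.SparseTensor) -> ME.SparseTensor:
                out = self.project(self.act(self.expand(self.norm(self.conv(x)))))
                return out + x  # stride-1 convs keep the coordinate map, so the residual sum is valid


        class MinkGeM(nn.Module):
            """Generalized Mean (GeM) pooling of a sparse tensor into one descriptor per
            cloud: f = (mean(x^p))^(1/p), with p learned (p=1 gives average pooling,
            large p approaches max pooling)."""

            def __init__(self, p: float = 3.0, eps: float = 1e-6):
                super().__init__()
                self.p = nn.Parameter(torch.tensor(p))
                self.eps = eps
                self.avg = ME.MinkowskiGlobalAvgPooling()

            def forward(self, x: ME.SparseTensor) -> torch.Tensor:
                powered = ME.SparseTensor(
                    x.F.clamp(min=self.eps).pow(self.p),
                    coordinate_map_key=x.coordinate_map_key,
                    coordinate_manager=x.coordinate_manager,
                )
                return self.avg(powered).F.pow(1.0 / self.p)  # (batch_size, channels)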

Retrieval at the Oxford RobotCar Dataset

[Interactive figure: example retrievals between runs 2014-11-14-16-34-33 and 2015-11-12-11-22-05.]

Retrieval at the In-house Dataset

[Interactive figures: example retrievals in the Business District (B.D.), Residential Area (R.A.) and University Sector (U.S.), between Run 1 and Run 5 of each.]
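Each of these examples is produced by the same retrieval step: the query cloud is embedded into a global descriptor and matched against the descriptors of the mapped submaps by nearest-neighbour search. A minimal sketch of that step with MinkowskiEngine follows; the model is assumed to be a trained network returning one descriptor per cloud, and the voxel size and function names are illustrative, not taken from the released implementation.

        import numpy as np
        import torch
        import MinkowskiEngine as ME


        @torch.no_grad()
        def embed(model: torch.nn.Module, points: np.ndarray, voxel_size: float = 0.01) -> torch.Tensor:
            """Voxelize one point cloud (N x 3) and return its global descriptor (1 x D)."""
            coords = ME.utils.sparse_quantize(torch.from_numpy(points).float(), quantization_size=voxel_size)
            coords = ME.utils.batched_coordinates([coords])   # prepend the batch index
            feats = torch.ones(coords.shape[0], 1)            # constant per-voxel input feature
            return model(ME.SparseTensor(feats, coords))


        def retrieve(query_desc: torch.Tensor, db_descs: torch.Tensor, k: int = 1) -> torch.Tensor:
            """Indices of the k database descriptors closest (Euclidean) to the query."""
            dists = torch.cdist(query_desc, db_descs)         # (1, N) distance row
            return dists.topk(k, largest=False).indices.squeeze(0)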

Comparison with other methods

[Table: Comparison in terms of Recall@1 (R@1) and Recall@1% (R@1%) with state-of-the-art methods on the baseline protocol.]

[Table: Comparison in terms of Recall@1 (R@1) and Recall@1% (R@1%) with state-of-the-art methods on the refined protocol.]

[Table: Comparison in terms of Recall@1 (R@1) and Recall@1% (R@1%) with state-of-the-art methods on the further test protocol.]
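For reference, Recall@N counts a query as correctly localized if at least one of its N nearest database descriptors was captured within the ground-truth radius of the query position (25 m in the usual PointNetVLAD-style evaluation), and Recall@1% sets N to 1% of the database size. A minimal NumPy sketch of the metric, with illustrative names and shapes:

        import numpy as np


        def recall_at_n(query_descs, db_descs, query_pos, db_pos, n_values=(1,), radius=25.0):
            """Recall@N over all queries.

            query_descs: (Q, D) and db_descs: (M, D) global descriptors;
            query_pos: (Q, 2) and db_pos: (M, 2) ground-truth positions in metres.
            """
            desc_dists = np.linalg.norm(query_descs[:, None, :] - db_descs[None, :, :], axis=-1)
            ranking = np.argsort(desc_dists, axis=1)          # nearest descriptor first
            hits = {n: 0 for n in n_values}
            for q in range(len(query_descs)):
                geo = np.linalg.norm(db_pos[ranking[q]] - query_pos[q], axis=1)
                for n in n_values:
                    if (geo[:n] <= radius).any():             # a true match within the top N
                        hits[n] += 1
            return {n: hits[n] / len(query_descs) for n in n_values}


        # Recall@1% on a database of M submaps:
        # recall_at_n(qd, dd, qp, dp, n_values=(1, max(1, round(0.01 * len(dd)))))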


BibTeX


        @article{cabrera2025minkunext,
        title = {MinkUNeXt: Point cloud-based large-scale place recognition using 3D sparse convolutions},
        journal = {Array},
        volume = {28},
        pages = {100569},
        year = {2025},
        issn = {2590-0056},
doi = {10.1016/j.array.2025.100569},
        url = {https://www.sciencedirect.com/science/article/pii/S2590005625001961},
        author = {Juan José Cabrera and Antonio Santo and Arturo Gil and Carlos Viegas and Luis Payá},
        keywords = {Place recognition, LiDAR, Point cloud embedding, 3D sparse convolutions},
        abstract = {This paper presents MinkUNeXt, an effective and efficient architecture for place-recognition from point clouds entirely based on the new 3D MinkNeXt Block, a residual block composed of 3D sparse convolutions that follows the philosophy established by recent Transformers but purely using simple 3D convolutions. Feature extraction is performed at different scales by a U-Net encoder–decoder network and the feature aggregation of those features into a single descriptor is carried out by a Generalized Mean Pooling (GeM). The proposed architecture demonstrates that it is possible to surpass the current state-of-the-art by only relying on conventional 3D sparse convolutions without making use of more complex and sophisticated proposals such as Transformers, Attention-Layers or Deformable Convolutions. A thorough assessment of the proposal has been carried out using the Oxford RobotCar, the In-house, the KITTI and the USyd datasets. As a result, MinkUNeXt proves to outperform other methods in the state-of-the-art. The implementation is publicly available at https://juanjo-cabrera.github.io/projects-MinkUNeXt/.}
      }

Acknowledgements

This work has been funded by the Ministry of Science, Innovation and Universities (Spain) through the grant FPU21/04969 (J.J. Cabrera). It is also part of the projects PID2023-149575OB-I00, funded by MICIU/AEI/10.13039/501100011033 and by FEDER, UE, and CIPROM/2024/8, funded by Generalitat Valenciana.
