MinkUNeXt: Point Cloud-based Large-scale Place Recognition using 3D Sparse Convolutions

1 Institute for Engineering Research (I3E)
2 Valencian Graduate School and Research Network for Artificial Intelligence (valgrAI)
3 ADAI Field Tech Lab

Abstract

This paper presents MinkUNeXt, an effective and efficient architecture for place recognition from point clouds, based entirely on the new 3D MinkNeXt Block, a residual block composed of 3D sparse convolutions that follows the philosophy established by recent Transformers while using only simple 3D convolutions. Feature extraction is performed at different scales by a U-Net encoder-decoder network, and the resulting features are aggregated into a single descriptor by Generalized Mean (GeM) pooling. The proposed architecture demonstrates that it is possible to surpass the current state of the art by relying only on conventional 3D sparse convolutions, without resorting to more complex proposals such as Transformers, attention layers or deformable convolutions. The proposal has been thoroughly assessed on the Oxford RobotCar and In-house datasets, where MinkUNeXt outperforms other state-of-the-art methods.
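As an illustration of the aggregation step mentioned in the abstract, the following is a minimal sketch of Generalized Mean (GeM) pooling in plain Python. It is not the authors' implementation: the function name `gem_pool`, the default exponent `p = 3.0` and the clamping constant `eps` are assumptions chosen for the example (in practice `p` is often a learnable parameter). Each output channel is computed as ((1/N) Σ x_i^p)^(1/p) over the N local feature vectors.

```python
from typing import List, Sequence


def gem_pool(features: Sequence[Sequence[float]],
             p: float = 3.0,
             eps: float = 1e-6) -> List[float]:
    """Pool N local feature vectors of equal length into one descriptor.

    Hypothetical helper for illustration; p and eps are assumed defaults.
    """
    if not features:
        raise ValueError("need at least one feature vector")
    n = len(features)
    dim = len(features[0])
    pooled = []
    for c in range(dim):
        # Clamp to eps so the power is well defined for non-positive values.
        mean_pow = sum(max(f[c], eps) ** p for f in features) / n
        pooled.append(mean_pow ** (1.0 / p))
    return pooled


if __name__ == "__main__":
    local_features = [[1.0, 2.0], [3.0, 2.0]]
    print(gem_pool(local_features, p=1.0))   # p = 1 reduces to average pooling
    print(gem_pool(local_features, p=50.0))  # large p approaches max pooling
```

With `p = 1` the operation reduces to average pooling, and as `p` grows it approaches max pooling, which is why GeM is usually described as a generalization of both.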

Retrieval at the Oxford RobotCar Dataset

Sequences shown: 2014-11-14-16-34-33 and 2015-11-12-11-22-05

Retrieval at the In-house Dataset

Business District (B.D.): Run 1 and Run 5
Residential Area (R.A.): Run 1 and Run 5
University Sector (U.S.): Run 1 and Run 5

Comparison with other methods

Comparison in terms of Recall@1 (R@1) and Recall@1% (R@1%) with state-of-the-art methods.
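For readers unfamiliar with these metrics, the sketch below shows one common way to compute Recall@1 and Recall@1% for descriptor retrieval. It is an illustrative assumption, not the evaluation code used for the comparison: the function names, the Euclidean distance, the `positives` mapping (query index to the set of database indices counted as the same place) and the rule of taking the top 1% of the database with a minimum of one candidate are all choices made for this example.

```python
import math
from typing import Dict, Sequence, Set


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recall_at_k(queries: Sequence[Sequence[float]],
                database: Sequence[Sequence[float]],
                positives: Dict[int, Set[int]],
                k: int) -> float:
    """Fraction of queries with a true match among the k nearest entries."""
    hits = 0
    evaluated = 0
    for qi, q in enumerate(queries):
        true_matches = positives.get(qi, set())
        if not true_matches:
            continue  # assumption: queries without ground truth are skipped
        ranked = sorted(range(len(database)),
                        key=lambda di: euclidean(q, database[di]))
        if true_matches.intersection(ranked[:k]):
            hits += 1
        evaluated += 1
    return hits / evaluated if evaluated else 0.0


def recall_at_one_percent(queries, database, positives) -> float:
    """Recall with k set to 1% of the database size (at least 1)."""
    k = max(1, round(len(database) / 100))
    return recall_at_k(queries, database, positives, k)


if __name__ == "__main__":
    database = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [10.0, 10.0]]
    queries = [[0.1, 0.0], [4.9, 5.0], [9.0, 9.0]]
    positives = {0: {1}, 1: {2}, 2: {0}}
    print(recall_at_k(queries, database, positives, k=1))
    print(recall_at_one_percent(queries, database, positives))
```

A query counts as correctly localized as soon as any one of its true matches appears in the top-k list, so Recall@1% is never lower than Recall@1 once the database holds at least 100 entries.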

Bibliography:
[1] Uy, M. A., Pham, Q. H., Hua, B.-S., Nguyen, T., & Yeung, S.-K. (2018). PointNetVLAD: Deep point cloud based retrieval for large-scale place recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4470-4479).
[2] Zhang, J., Hua, B.-S., & Yeung, S.-K. (2019). PCAN: 3D Attention Map Learning Using Contextual Information for Point Cloud Based Retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 12436-12445).
[3] Sun, Y., & Chen, X. (2020). DAGC: Data-Augmentation and Graph Convolution Network for 3D Point Cloud Classification. IEEE Transactions on Multimedia, 22(9), 2237-2249.
[4] Liu, Z., Tang, H., Lin, Y., & Han, S. (2019). LPD-Net: 3D Point Cloud Learning for Large-Scale Place Recognition and Environment Analysis. In Proceedings of the IEEE International Conference on Computer Vision (pp. 12308-12317).
[5] Xia, Y., Chen, X., & Zhang, H. (2021). SOE-Net: A Self-Attention and Orientation Encoding Network for Point Cloud based Place Recognition. IEEE Transactions on Multimedia, 23, 2751-2763.
[6] Komorowski, J. (2021). MinkLoc3D: Point Cloud Based Large-Scale Place Recognition. In Proceedings of the IEEE International Conference on 3D Vision (pp. 1080-1088).
[7] Hui, L., Yi, L., Wu, Z., Qi, C. R., & Guibas, L. J. (2021). PPT-Net: Point Pair Transformation Network for Efficient Point Cloud Registration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1578-1587).
[8] Fan, L., Pang, J., Yang, Y., & Tian, Y. (2022). SVT-Net: Supervised Volumetric Transformer Network for Place Recognition Using 3D Point Clouds. IEEE Transactions on Neural Networks and Learning Systems.
[9] Xu, H., Zhang, J., & Yeung, S.-K. (2021). TransLoc3D: A Transformer Network for Place Recognition using 3D Point Clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3677-3686).
[10] Komorowski, J. (2022). Improving Large-Scale Place Recognition Using MinkLoc3D and Sparse Voxelization. IEEE Transactions on Image Processing, 31, 3054-3068.

BibTeX

@misc{cabrera2024minkunext,
  title={MinkUNeXt: Point Cloud-based Large-scale Place Recognition using 3D Sparse Convolutions},
  author={J. J. Cabrera and A. Santo and A. Gil and C. Viegas and L. Payá},
  year={2024},
  eprint={2403.07593},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}