• Optical Instruments
  • Vol. 46, Issue 5, 1 (2024)
Dong LIU, Rongfu ZHANG*, Junxiang QIN, Junzhe GONG, and Zhibin CAO
Author Affiliations
  • School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
    DOI: 10.3969/j.issn.1005-5630.202308160108
    Dong LIU, Rongfu ZHANG, Junxiang QIN, Junzhe GONG, Zhibin CAO. Architectural style classification algorithm fusing CNN and Transformer[J]. Optical Instruments, 2024, 46(5): 1
    References

    [1] ZHANG L M, SONG M L, LIU X, et al. Recognizing architecture styles by hierarchical sparse coding of blocklets[J]. Information Sciences, 2014, 254: 141-154.

    [2] XU Z, TAO D C, ZHANG Y, et al. Architectural style classification using multinomial latent logistic regression[C]13th European Conference on Computer Vision - ECCV 2014. Zurich, Switzerland: Springer, 2014: 600-615.

    [5] WANG R, GU D H, WEN Z J, et al. Intraclass classification of architectural styles using visualization of CNN[C]5th International Conference on Artificial Intelligence and Security. New York: Springer, 2019: 205-216.

    [6] YI Y K, ZHANG Y H, MYUNG J. House style recognition using deep convolutional neural network[J]. Automation in Construction, 2020, 118: 103307.

    [7] ZHAO H S, JIA J Y, KOLTUN V. Exploring self-attention for image recognition[C]Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 10073-10082.

    [8] RAMACHANDRAN P, PARMAR N, VASWANI A, et al. Stand-alone self-attention in vision models[C]Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver: ACM, 2019: 7.

    [9] WANG B, ZHANG S L, ZHANG J F, et al. Architectural style classification based on CNN and channel-spatial attention[J]. Signal, Image and Video Processing, 2023, 17: 99-107.

    [10] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]Annual Conference on Neural Information Processing Systems 2017. Long Beach: NIPS, 2017: 5998-6008.

    [11] PENG Z L, HUANG W, GU S Z, et al. Conformer: Local features coupling global representations for visual recognition[C]Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 357-366.

    [12] CHEN Y P, DAI X Y, CHEN D D, et al. Mobile-Former: Bridging MobileNet and Transformer[C]Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 5260-5269.

    [13] SANDLER M, HOWARD A, ZHU M L, et al. MobileNetV2: Inverted residuals and linear bottlenecks[C]2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4510-4520.

    [14] CORDONNIER J B, LOUKAS A, JAGGI M. On the relationship between self-attention and convolutional layers[C]8th International Conference on Learning Representations. Addis Ababa: ICLR, 2020.

    [15] SRINIVAS A, LIN T Y, PARMAR N, et al. Bottleneck transformers for visual recognition[C]Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 16514-16524.

    [16] TOUVRON H, CORD M, DOUZE M, et al. Training data-efficient image transformers & distillation through attention[C]International Conference on Machine Learning. PMLR, 2021: 10347-10357.

    [17] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas: IEEE, 2016: 770-778.

    [18] SCHROFF F, KALENICHENKO D, PHILBIN J. FaceNet: A unified embedding for face recognition and clustering[C]Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 815-823.

    [19] BARZ B, DENZLER J. WikiChurches: A fine-grained dataset of architectural styles with real-world challenges[J]. arXiv preprint arXiv:2108.06959, 2021.

    [20] ZHANG H Y, CISSÉ M, DAUPHIN Y N, et al. mixup: Beyond empirical risk minimization[C]6th International Conference on Learning Representations. Vancouver: ICLR, 2018.

    [21] SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 2818-2826.

    [22] LIU Z, LIN Y T, CAO Y, et al. Swin Transformer: Hierarchical vision transformer using shifted windows[C]Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 9992-10002.

    [23] CHEN Z S, XIE L X, NIU J W, et al. Visformer: The vision-friendly transformer[C]Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 569-578.

    [24] LAMAS A, TABIK S, CRUZ P, et al. MonuMAI: Dataset, deep learning pipeline and citizen science based app for monumental heritage taxonomy and classification[J]. Neurocomputing, 2021, 420: 266-280.
