Published in: BMC Medical Informatics and Decision Making 1/2023

Open Access 01-12-2023 | Research

Swin Unet3D: a three-dimensional medical image segmentation network combining vision transformer and convolution

Authors: Yimin Cai, Yuqing Long, Zhenggong Han, Mingkun Liu, Yuchen Zheng, Wei Yang, Liming Chen


Abstract

Background

Semantic segmentation of brain tumors plays a critical role in clinical treatment, especially for three-dimensional (3D) magnetic resonance imaging, which is widely used in clinical practice. Automatic segmentation of the 3D structure of brain tumors helps physicians quickly assess tumor properties such as shape and size, improving the efficiency of preoperative planning and the odds of successful surgery. Over the past decades, 3D convolutional neural networks (CNNs) have dominated automatic segmentation of 3D medical images, and these architectures have achieved good results. However, to limit the number of parameters, practitioners generally keep the kernel size of 3D convolutions at or below \(7 \times 7 \times 7\), which in turn limits the ability of CNNs to learn long-distance dependency information. The Vision Transformer (ViT) excels at learning long-distance dependencies in images, but it carries a large number of parameters. Worse, when training data are insufficient, ViT fails to learn local dependency information in its early layers, even though learning such local information in early layers has a large impact on segmentation performance.
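The cubic growth of parameter count with kernel size motivates the \(7 \times 7 \times 7\) ceiling mentioned above. As a minimal illustration (not from the paper), the following sketch counts the learnable parameters of a single 3D convolution layer for a few kernel sizes:

```python
def conv3d_params(in_ch: int, out_ch: int, k: int, bias: bool = True) -> int:
    """Learnable parameters of a 3D convolution with a k x k x k kernel.

    Each of the out_ch filters has in_ch * k^3 weights, plus one optional bias.
    """
    return out_ch * (in_ch * k ** 3 + (1 if bias else 0))

# Parameter count grows with k^3: doubling k roughly multiplies the cost by 8.
for k in (3, 7, 15):
    print(k, conv3d_params(64, 64, k))
```

For 64 input and output channels, moving from a \(7^3\) to a \(15^3\) kernel inflates a single layer from about 1.4M to about 13.8M parameters, which is why larger receptive fields are usually obtained by stacking layers rather than enlarging kernels.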

Methods

This paper proposes the Swin Unet3D model, which formulates voxel segmentation of medical images as a sequence-to-sequence prediction. The feature-extraction sub-module of the model is designed as a parallel structure of convolution and ViT branches, so that every layer of the model can adequately learn both global and local dependency information in the image.
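The parallel convolution/ViT idea can be sketched in miniature. The toy block below is an assumption-laden stand-in, not the authors' implementation: a crude box filter plays the role of the convolutional (local) branch, a single-head softmax self-attention over all voxel tokens plays the role of the ViT (global) branch, and the two outputs are fused by summation:

```python
import numpy as np

def local_branch(x: np.ndarray) -> np.ndarray:
    """Local operator: 3-voxel box average along each spatial axis
    (a stand-in for a small-kernel 3D convolution)."""
    out = x.copy()
    for axis in range(3):  # average over the three spatial axes
        out = (np.roll(out, 1, axis) + out + np.roll(out, -1, axis)) / 3.0
    return out

def global_branch(x: np.ndarray) -> np.ndarray:
    """Global operator: single-head self-attention over all voxels
    (a stand-in for a ViT block; every voxel attends to every other)."""
    d, h, w, c = x.shape
    tokens = x.reshape(-1, c)                    # (N, C) voxel tokens
    scores = tokens @ tokens.T / np.sqrt(c)      # (N, N) pairwise similarity
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax rows sum to 1
    return (attn @ tokens).reshape(d, h, w, c)

def parallel_block(x: np.ndarray) -> np.ndarray:
    """Fuse local and global dependency information by summation."""
    return local_branch(x) + global_branch(x)

x = np.random.default_rng(0).standard_normal((4, 4, 4, 8))
y = parallel_block(x)
print(y.shape)  # spatial resolution and channel count are preserved
```

The point of the parallel layout is that both branches see the same input at every stage, so even the earliest layers receive local (convolutional) and global (attention) signals simultaneously, rather than relying on depth for one of the two.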

Results

On the Brats2021 validation dataset, our proposed model achieves Dice coefficients of 0.840, 0.874, and 0.911 on the ET, TC, and WT channels, respectively. On the Brats2018 validation dataset, it achieves Dice coefficients of 0.716, 0.761, and 0.874 on the corresponding channels.

Conclusion

We propose a new segmentation model that combines the advantages of the Vision Transformer and convolution, achieving a better balance between the number of model parameters and segmentation accuracy. The code is available at https://github.com/1152545264/SwinUnet3D.
Metadata
Publication date: 01-12-2023
Publisher: BioMed Central
Electronic ISSN: 1472-6947
DOI: https://doi.org/10.1186/s12911-023-02129-z
