Published in: International Journal of Computer Assisted Radiology and Surgery 5/2021

Open Access 01-05-2021 | Original Article

Simulation-to-real domain adaptation with teacher–student learning for endoscopic instrument segmentation

Authors: Manish Sahu, Anirban Mukhopadhyay, Stefan Zachow

Abstract

Purpose

Segmentation of surgical instruments in endoscopic video streams is essential for automated surgical scene understanding and process modeling. However, relying on fully supervised deep learning for this task is challenging because manual annotation consumes the valuable time of clinical experts.

Methods

We introduce a teacher–student learning approach that learns jointly from annotated simulation data and unlabeled real data to tackle the challenges in simulation-to-real unsupervised domain adaptation for endoscopic image segmentation.
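The summary above does not spell out the training objective. As a hedged illustration only, the sketch below assumes a Mean Teacher-style scheme: a student network is trained with a supervised loss on annotated simulation frames plus a consistency loss that pulls its predictions on unlabeled real frames towards those of a teacher, whose weights are an exponential moving average (EMA) of the student's. The soft Dice and mean-squared losses, the `cons_weight` factor and `alpha=0.99` are illustrative assumptions, not the authors' reported choices; networks and input perturbations are abstracted away as probability maps and hypothetical parameter dictionaries.

```python
import numpy as np


def soft_dice_loss(prob, mask, eps=1e-6):
    """Supervised term on labeled simulation frames: 1 - soft Dice overlap."""
    inter = np.sum(prob * mask)
    return float(1.0 - (2.0 * inter + eps) / (np.sum(prob) + np.sum(mask) + eps))


def consistency_loss(student_prob, teacher_prob):
    """Unsupervised term on unlabeled real frames: mean squared difference
    between student and teacher predictions. In a real training loop the
    teacher output is a fixed target (no gradient flows into the teacher)."""
    return float(np.mean((student_prob - teacher_prob) ** 2))


def joint_loss(sim_prob, sim_mask, real_student_prob, real_teacher_prob, cons_weight):
    """Supervised loss on simulation plus weighted consistency loss on real data."""
    return soft_dice_loss(sim_prob, sim_mask) + cons_weight * consistency_loss(
        real_student_prob, real_teacher_prob
    )


def ema_update(teacher, student, alpha=0.99):
    """Teacher weights become an exponential moving average of student weights;
    the teacher is never updated by gradient descent directly."""
    return {name: alpha * teacher[name] + (1.0 - alpha) * student[name] for name in teacher}


# Toy usage with hypothetical parameter dicts and 4x4 probability maps.
teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
teacher = ema_update(teacher, student)  # each teacher weight moves 1% towards the student
ones = np.ones((4, 4))
print(joint_loss(ones, ones, ones, np.zeros((4, 4)), cons_weight=0.5))  # -> 0.5
```

In this kind of scheme the student takes gradient steps on `joint_loss`, and `ema_update` is applied after each step, so the teacher provides smoother targets on real frames than the student itself would.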

Results

Empirical results on three datasets highlight the effectiveness of the proposed framework over current approaches for the endoscopic instrument segmentation task. Additionally, we provide an analysis of the major factors affecting performance on all datasets to highlight the strengths and failure modes of our approach.

Conclusions

We show that our proposed approach can successfully exploit unlabeled real endoscopic video frames and improve generalization performance over pure simulation-based training and the previous state of the art. This takes us one step closer to effective segmentation of surgical instruments in the annotation-scarce setting.
Footnotes
1
EndoVis Sub-challenges: 2015, 2017, 2018, 2019 [https://endovis.grand-challenge.org].
 
2
pixel-intensity: random brightness and contrast shift, posterisation, solarisation, random gamma shift, random HSV color space shift, histogram equalization and contrast limited adaptive histogram equalization.
 
3
pixel-corruption: Gaussian noise, motion blurring, image compression, dropout, random fog simulation and image embossing.
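Footnotes 2 and 3 only name the augmentation families. The NumPy sketch below implements a subset of them (posterisation, solarisation, random gamma shift, random brightness and contrast shift, Gaussian noise, dropout) for H×W×C `uint8` images. The parameter ranges and the "one intensity transform, then one corruption" composition in the hypothetical `augment` helper are assumptions for illustration, not the configuration used in the paper; the remaining listed transforms (HSV shift, histogram equalization, CLAHE, motion blur, compression, fog, embossing) are omitted.

```python
import numpy as np


def posterize(img, bits=4):
    """Posterisation: keep only the `bits` most significant bits per channel."""
    mask = np.uint8((0xFF << (8 - bits)) & 0xFF)
    return img & mask


def solarize(img, threshold=128):
    """Solarisation: invert every value at or above `threshold`."""
    return np.where(img >= threshold, 255 - img, img).astype(np.uint8)


def random_gamma(img, rng, low=0.7, high=1.5):
    """Random gamma shift on intensities normalised to [0, 1]."""
    gamma = rng.uniform(low, high)
    out = 255.0 * (img / 255.0) ** gamma
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def brightness_contrast(img, rng, max_brightness=0.2, max_contrast=0.2):
    """Random contrast scaling followed by a random brightness offset."""
    contrast = 1.0 + rng.uniform(-max_contrast, max_contrast)
    brightness = 255.0 * rng.uniform(-max_brightness, max_brightness)
    out = img.astype(np.float64) * contrast + brightness
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def gaussian_noise(img, rng, sigma=10.0):
    """Additive zero-mean Gaussian noise, clipped back to the 8-bit range."""
    out = img.astype(np.float64) + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def pixel_dropout(img, rng, p=0.05):
    """Dropout: zero whole pixels (all channels) with probability `p`."""
    keep = rng.random(img.shape[:2]) >= p
    return img * keep[..., None].astype(img.dtype)


def augment(img, rng):
    """Apply one random pixel-intensity transform, then one pixel-corruption transform."""
    intensity = [
        lambda x: posterize(x, bits=int(rng.integers(3, 7))),
        lambda x: solarize(x, threshold=int(rng.integers(128, 256))),
        lambda x: random_gamma(x, rng),
        lambda x: brightness_contrast(x, rng),
    ]
    corruption = [
        lambda x: gaussian_noise(x, rng),
        lambda x: pixel_dropout(x, rng),
    ]
    img = intensity[rng.integers(len(intensity))](img)
    return corruption[rng.integers(len(corruption))](img)


# Usage on a random stand-in for a simulated frame.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
augmented = augment(frame, rng)
```

Each transform only changes pixel values, never geometry, so the instrument masks of the simulated frames stay valid without any re-warping.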
 
Metadata
Title
Simulation-to-real domain adaptation with teacher–student learning for endoscopic instrument segmentation
Authors
Manish Sahu
Anirban Mukhopadhyay
Stefan Zachow
Publication date
01-05-2021
Publisher
Springer International Publishing
Published in
International Journal of Computer Assisted Radiology and Surgery / Issue 5/2021
Print ISSN: 1861-6410
Electronic ISSN: 1861-6429
DOI
https://doi.org/10.1007/s11548-021-02383-4