Skip to main content
Top
Published in: International Journal of Computer Assisted Radiology and Surgery 5/2021

01-05-2021 | Cataract | Original Article

Against spatial–temporal discrepancy: contrastive learning-based network for surgical workflow recognition

Authors: Tong Xia, Fucang Jia

Published in: International Journal of Computer Assisted Radiology and Surgery | Issue 5/2021

Login to get access

Abstract

Purpose

Automatic workflow recognition from surgical videos is fundamental and significant for developing context-aware systems in modern operating rooms. Although many approaches have been proposed to tackle challenges in this complex task, there are still many problems such as the fine-grained characteristics and spatial–temporal discrepancies in surgical videos.

Methods

We propose a contrastive learning-based convolutional recurrent network with multi-level prediction to tackle these problems. Specifically, split-attention blocks are employed to extract spatial features. Through a mapping function in the step-phase branch, the current workflow can be predicted on two mutual-boosting levels. Furthermore, a contrastive branch is introduced to learn the spatial–temporal features that eliminate irrelevant changes in the environment.

Results

We evaluate our method on the Cataract-101 dataset. The results show that our method achieves an accuracy of 96.37% with only surgical step labels, which outperforms other state-of-the-art approaches.

Conclusion

The proposed convolutional recurrent network based on step-phase prediction and contrastive learning can leverage fine-grained characteristics and alleviate spatial–temporal discrepancies to improve the performance of surgical workflow recognition.
Appendix
Available only for authorised users
Literature
1.
go back to reference Cleary K, Kinsella A, Mun SK (2005) Or 2020 workshop report: Operating room of the future. Int Congr Ser 1281:832–838CrossRef Cleary K, Kinsella A, Mun SK (2005) Or 2020 workshop report: Operating room of the future. Int Congr Ser 1281:832–838CrossRef
2.
go back to reference Padoy N (2019) Machine and deep learning for workflow recognition during surgery. Minim Invasive Ther Allied Technol 28(2):82–90CrossRef Padoy N (2019) Machine and deep learning for workflow recognition during surgery. Minim Invasive Ther Allied Technol 28(2):82–90CrossRef
3.
go back to reference Maier-Hein L, Vedula SS, Speidel S, Navab N, Kikinis R, Park A, Eisenmann M, Feussner H, Forestier G, Giannarou S, Hashizume M, Katic D, Kenngott H, Kranzfelder M, Malpani A, März K, Neumuth T, Padoy N, Pugh C, Schoch N, Stoyanov D, Taylor R, Wagner M, Hager GD, Jannin P (2017) Surgical data science for next-generation interventions. Nat Biomed Eng 1(9):691–696CrossRef Maier-Hein L, Vedula SS, Speidel S, Navab N, Kikinis R, Park A, Eisenmann M, Feussner H, Forestier G, Giannarou S, Hashizume M, Katic D, Kenngott H, Kranzfelder M, Malpani A, März K, Neumuth T, Padoy N, Pugh C, Schoch N, Stoyanov D, Taylor R, Wagner M, Hager GD, Jannin P (2017) Surgical data science for next-generation interventions. Nat Biomed Eng 1(9):691–696CrossRef
4.
go back to reference Schoeffmann K, Taschwer M, Sarny S, Münzer B, Primus MJ, Putzgruber D (2018) Cataract-101: video dataset of 101 cataract surgeries. In: Proceedings of the 9th ACM multimedia systems conference, pp 421–425 Schoeffmann K, Taschwer M, Sarny S, Münzer B, Primus MJ, Putzgruber D (2018) Cataract-101: video dataset of 101 cataract surgeries. In: Proceedings of the 9th ACM multimedia systems conference, pp 421–425
5.
go back to reference Loukas C (2018) Video content analysis of surgical procedures. Surg Endosc 32(2):553–568CrossRef Loukas C (2018) Video content analysis of surgical procedures. Surg Endosc 32(2):553–568CrossRef
6.
go back to reference Quellec G, Lamard M, Cochener B, Cazuguel G (2014) Real-time segmentation and recognition of surgical tasks in cataract surgery videos. IEEE Trans Med Imaging 33(12):2352–2360CrossRef Quellec G, Lamard M, Cochener B, Cazuguel G (2014) Real-time segmentation and recognition of surgical tasks in cataract surgery videos. IEEE Trans Med Imaging 33(12):2352–2360CrossRef
7.
go back to reference Twinanda AP, Yengera G, Mutter D, Marescaux J, Padoy N (2019) Rsdnet: Learning to predict remaining surgery duration from laparoscopic videos without manual annotations. IEEE Trans Med Imaging 38(4):1069–1078CrossRef Twinanda AP, Yengera G, Mutter D, Marescaux J, Padoy N (2019) Rsdnet: Learning to predict remaining surgery duration from laparoscopic videos without manual annotations. IEEE Trans Med Imaging 38(4):1069–1078CrossRef
8.
go back to reference Blum T, Feußner H, Navab N (2010) Modeling and segmentation of surgical workflow from laparoscopic video. In: MICCAI. pp. 400-407 Blum T, Feußner H, Navab N (2010) Modeling and segmentation of surgical workflow from laparoscopic video. In: MICCAI. pp. 400-407
9.
go back to reference Twinanda AP, Shehata S, Mutter D, Marescaux J, de Mathelin M, Padoy N (2017) Endonet: A deep architecture for recognition tasks on laparoscopic videos. IEEE Trans Med Imaging 36(1):86–97CrossRef Twinanda AP, Shehata S, Mutter D, Marescaux J, de Mathelin M, Padoy N (2017) Endonet: A deep architecture for recognition tasks on laparoscopic videos. IEEE Trans Med Imaging 36(1):86–97CrossRef
10.
go back to reference Jin Y, Dou Q, Chen H, Yu L, Qin J, Fu C, Heng PA (2018) SV-RCnet: Workflow recognition from surgical videos using recurrent convolutional network. IEEE Trans Med Imaging 37(5):1114–1126CrossRef Jin Y, Dou Q, Chen H, Yu L, Qin J, Fu C, Heng PA (2018) SV-RCnet: Workflow recognition from surgical videos using recurrent convolutional network. IEEE Trans Med Imaging 37(5):1114–1126CrossRef
11.
go back to reference Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780CrossRef Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780CrossRef
12.
go back to reference He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: CVPR. pp 770–778 He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: CVPR. pp 770–778
13.
go back to reference Jin Y, Li H, Dou Q, Chen H, Qin J, Fu CW, Heng PA (2020) Multi-task recurrent convolutional network with correlation loss for surgical video analysis. Med Image Anal 59:101572CrossRef Jin Y, Li H, Dou Q, Chen H, Qin J, Fu CW, Heng PA (2020) Multi-task recurrent convolutional network with correlation loss for surgical video analysis. Med Image Anal 59:101572CrossRef
14.
go back to reference Lin TY, RoyChowdhury A, Maji S (2015) Bilinear cnn models for fine-grained visual recognition. In: ICCV. pp 1450–1457 Lin TY, RoyChowdhury A, Maji S (2015) Bilinear cnn models for fine-grained visual recognition. In: ICCV. pp 1450–1457
15.
go back to reference Chen MH, Li B, Bao Y, AlRegib G, Kira Z (2020) Action segmentation with joint self-supervised temporal domain adaptation. In: CVPR. pp 9454–9463 Chen MH, Li B, Bao Y, AlRegib G, Kira Z (2020) Action segmentation with joint self-supervised temporal domain adaptation. In: CVPR. pp 9454–9463
16.
go back to reference Charriere K, Quellec G, Lamard M, Martiano D, Cazuguel G, Coatrieux G, Cochener B (2017) Real-time analysis of cataract surgery videos using statistical models. Multimed Tools Appl 76(21):22473–22491CrossRef Charriere K, Quellec G, Lamard M, Martiano D, Cazuguel G, Coatrieux G, Cochener B (2017) Real-time analysis of cataract surgery videos using statistical models. Multimed Tools Appl 76(21):22473–22491CrossRef
17.
go back to reference Lalys F, Riffaud L, Bouget D, Jannin P (2011) A framework for the recognition of high-level surgical tasks from video images for cataract surgeries. IEEE Trans Biomed Eng 59(4):966–976CrossRef Lalys F, Riffaud L, Bouget D, Jannin P (2011) A framework for the recognition of high-level surgical tasks from video images for cataract surgeries. IEEE Trans Biomed Eng 59(4):966–976CrossRef
18.
19.
go back to reference Schroff F, Kalenichenko D, Philbin J (2015) Facenet: A unified embedding for face recognition and clustering. In: CVPR. pp 815–823 Schroff F, Kalenichenko D, Philbin J (2015) Facenet: A unified embedding for face recognition and clustering. In: CVPR. pp 815–823
20.
go back to reference Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. arXiv preprint. arXiv:2002.05709 Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. arXiv preprint. arXiv:​2002.​05709
21.
go back to reference Zhang H, Wu C, Zhang Z, Zhu Y, Zhang Z, Lin H, Sun Y, He T, Mueller J, Manmatha R, Li M, Smola A (2020) Resnest: Split-attention networks. arXiv preprint. arXiv:2004.08955 Zhang H, Wu C, Zhang Z, Zhu Y, Zhang Z, Lin H, Sun Y, He T, Mueller J, Manmatha R, Li M, Smola A (2020) Resnest: Split-attention networks. arXiv preprint. arXiv:​2004.​08955
22.
go back to reference Lo BPL, Darzi A, Yang GZ (2003) Episode classification for the analysis of tissue/instrument interaction with multiple visual cues. In: MICCAI. pp 230–237 Lo BPL, Darzi A, Yang GZ (2003) Episode classification for the analysis of tissue/instrument interaction with multiple visual cues. In: MICCAI. pp 230–237
23.
go back to reference Deng J, Dong W, Socher R, Li L, Li K, Li F-F (2009) Imagenet: A large-scale hierarchical image database. In: CVPR. pp 248–255 Deng J, Dong W, Socher R, Li L, Li K, Li F-F (2009) Imagenet: A large-scale hierarchical image database. In: CVPR. pp 248–255
24.
go back to reference Qi B, Qin X, Liu J, Xu Y, Chen Y (2019) A deep architecture for surgical workflow recognition with edge information. In: 2019 IEEE international conference on bioinformatics and biomedicine (BIBM), pp 1358–1364 Qi B, Qin X, Liu J, Xu Y, Chen Y (2019) A deep architecture for surgical workflow recognition with edge information. In: 2019 IEEE international conference on bioinformatics and biomedicine (BIBM), pp 1358–1364
Metadata
Title
Against spatial–temporal discrepancy: contrastive learning-based network for surgical workflow recognition
Authors
Tong Xia
Fucang Jia
Publication date
01-05-2021
Publisher
Springer International Publishing
Keyword
Cataract
Published in
International Journal of Computer Assisted Radiology and Surgery / Issue 5/2021
Print ISSN: 1861-6410
Electronic ISSN: 1861-6429
DOI
https://doi.org/10.1007/s11548-021-02382-5

Other articles of this Issue 5/2021

International Journal of Computer Assisted Radiology and Surgery 5/2021 Go to the issue