01-05-2021 | Cataract | Original Article
Against spatial–temporal discrepancy: contrastive learning-based network for surgical workflow recognition
Authors: Tong Xia, Fucang Jia
Published in: International Journal of Computer Assisted Radiology and Surgery, Issue 5/2021
Abstract
Purpose
Automatic workflow recognition from surgical videos is fundamental to developing context-aware systems in modern operating rooms. Although many approaches have been proposed for this complex task, challenges remain, such as the fine-grained nature of surgical actions and the spatial–temporal discrepancies present in surgical videos.
Methods
We propose a contrastive learning-based convolutional recurrent network with multi-level prediction to address these problems. Specifically, split-attention blocks are employed to extract spatial features. Through a mapping function in the step-phase branch, the current workflow can be predicted at two mutually boosting levels. Furthermore, a contrastive branch is introduced to learn spatial–temporal features that suppress irrelevant changes in the environment.
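The abstract does not specify the exact form of the contrastive objective; as an illustration only, the following is a minimal NumPy sketch of an InfoNCE-style loss, a common choice for contrastive representation learning. The function names and the single-anchor formulation are assumptions, not the authors' implementation.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for a single anchor embedding.

    anchor, positive: 1-D feature vectors of dimension d.
    negatives: 2-D array of shape (n, d) with n negative embeddings.
    All vectors are L2-normalized so similarities are cosine similarities.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a = normalize(anchor)
    p = normalize(positive)
    n = normalize(negatives)

    pos_sim = np.dot(a, p) / temperature   # similarity to the positive pair
    neg_sim = (n @ a) / temperature        # similarities to the negatives
    logits = np.concatenate(([pos_sim], neg_sim))
    # Cross-entropy with the positive pair as the target class (index 0):
    # loss = -log( exp(pos) / sum(exp(all)) )
    return -pos_sim + np.log(np.sum(np.exp(logits)))
```

Minimizing such a loss pulls embeddings of the same surgical state together and pushes apart embeddings of different states, which is one way a contrastive branch can learn features that are stable under irrelevant environmental variation.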
Results
We evaluate our method on the Cataract-101 dataset. The results show that our method achieves an accuracy of 96.37% with only surgical step labels, which outperforms other state-of-the-art approaches.
Conclusion
The proposed convolutional recurrent network based on step-phase prediction and contrastive learning can leverage fine-grained characteristics and alleviate spatial–temporal discrepancies to improve the performance of surgical workflow recognition.