International Journal of Computer Assisted Radiology and Surgery

Open Access 01-11-2021 | Original Article

Self-supervised representation learning for surgical activity recognition

Authors: Daniel Paysan, Luis Haug, Michael Bajka, Markus Oelhafen, Joachim M. Buhmann

Published in: International Journal of Computer Assisted Radiology and Surgery | Issue 11/2021

Abstract

Purpose: Virtual reality-based simulators have the potential to become an essential part of surgical education. To make full use of this potential, they must be able to automatically recognize the activities performed by users and assess them. Since annotations of trajectories by human experts are expensive, there is a need for methods that can learn to recognize surgical activities in a data-efficient way.

Methods: We use self-supervised training of deep encoder–decoder architectures to learn representations of surgical trajectories from video data. These representations allow for semi-automatic extraction of features that capture information about semantically important events in the trajectories. Such features are processed as inputs of an unsupervised surgical activity recognition pipeline.

Results: Our experiments document that the performance of hidden semi-Markov models used for recognizing activities in a simulated myomectomy scenario benefits from using features extracted from representations learned while training a deep encoder–decoder network on the task of predicting the remaining surgery progress.

Conclusion: Our work is an important first step in the direction of making efficient use of features obtained from deep representation learning for surgical activity recognition in settings where only a small fraction of the existing data is annotated by human domain experts and where those annotations are potentially incomplete.
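To illustrate why predicting the remaining surgery progress needs no expert annotation, the following minimal sketch builds per-frame regression targets from nothing but the length of a recording. The abstract does not define the target precisely, so the linear definition used here (fraction of the recording still to come) and the function name `remaining_progress_targets` are assumptions made for illustration, not the authors' implementation.

```python
from typing import List


def remaining_progress_targets(num_frames: int) -> List[float]:
    """Return one self-supervised regression target per frame.

    Assumption: the target for frame t (0-based) is the fraction of the
    recording still to come, 1 - t / (num_frames - 1). It is derived from
    the recording length alone, so no human labels are required.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be positive")
    if num_frames == 1:
        # A single frame is both the first and the last one.
        return [0.0]
    last = num_frames - 1
    return [1.0 - t / last for t in range(num_frames)]


# Targets for a hypothetical 5-frame recording.
print(remaining_progress_targets(5))  # [1.0, 0.75, 0.5, 0.25, 0.0]
```

Under this assumption, an encoder–decoder network could be trained to regress these values from the frames, and its intermediate representations would then serve as the source of features for the downstream recognition pipeline.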

Title: Self-supervised representation learning for surgical activity recognition
Authors: Daniel Paysan, Luis Haug, Michael Bajka, Markus Oelhafen, Joachim M. Buhmann
Publication date: 01-11-2021
Publisher: Springer International Publishing
Published in: International Journal of Computer Assisted Radiology and Surgery / Issue 11/2021
Print ISSN: 1861-6410
Electronic ISSN: 1861-6429
DOI: https://doi.org/10.1007/s11548-021-02493-z
