Published in: International Journal of Computer Assisted Radiology and Surgery 11/2021

Open Access 01-11-2021 | Original Article

Self-supervised representation learning for surgical activity recognition

Authors: Daniel Paysan, Luis Haug, Michael Bajka, Markus Oelhafen, Joachim M. Buhmann

Abstract

Purpose: Virtual reality-based simulators have the potential to become an essential part of surgical education. To realize this potential, they must be able to automatically recognize and assess the activities that users perform. Because annotation of trajectories by human experts is expensive, there is a need for methods that can learn to recognize surgical activities in a data-efficient way.

Methods: We use self-supervised training of deep encoder–decoder architectures to learn representations of surgical trajectories from video data. These representations allow for semi-automatic extraction of features that capture information about semantically important events in the trajectories. The features then serve as inputs to an unsupervised surgical activity recognition pipeline.

Results: Our experiments show that the performance of hidden semi-Markov models used for recognizing activities in a simulated myomectomy scenario benefits from features extracted from representations learned while training a deep encoder–decoder network to predict the remaining surgery progress.

Conclusion: Our work is an important first step toward making efficient use of features obtained from deep representation learning for surgical activity recognition in settings where only a small fraction of the existing data is annotated by human domain experts and where those annotations are potentially incomplete.
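As a rough illustration of the pretext task named in the Results, the sketch below builds per-frame regression targets for "remaining surgery progress" from nothing but the length of a recording, which is why such a task needs no human annotation. The linear definition (1.0 at the first frame, 0.0 at the last) and the function name `remaining_progress_targets` are assumptions made for illustration; the abstract does not specify the exact target the authors use.

```python
from typing import List


def remaining_progress_targets(num_frames: int) -> List[float]:
    """Return one self-supervised target per frame of a recording.

    Assumption (not from the paper): remaining progress decreases
    linearly from 1.0 at the first frame to 0.0 at the last frame.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if num_frames == 1:
        # A single-frame recording has no remaining progress.
        return [0.0]
    last = num_frames - 1
    # Frame t has (last - t) frames still to come out of `last` in total.
    return [(last - t) / last for t in range(num_frames)]


if __name__ == "__main__":
    # Targets for a hypothetical five-frame recording.
    print(remaining_progress_targets(5))
```

An encoder–decoder network trained to regress these targets from the frames could then supply intermediate representations as features for a downstream model, under the assumptions stated above.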
Metadata
Title
Self-supervised representation learning for surgical activity recognition
Authors
Daniel Paysan
Luis Haug
Michael Bajka
Markus Oelhafen
Joachim M. Buhmann
Publication date
01-11-2021
Publisher
Springer International Publishing
Published in
International Journal of Computer Assisted Radiology and Surgery / Issue 11/2021
Print ISSN: 1861-6410
Electronic ISSN: 1861-6429
DOI
https://doi.org/10.1007/s11548-021-02493-z
