Surgical Gesture Classification from Video Data

Béjar Haro, Benjamín; Zappella, Luca; Vidal, René

doi:10.1007/978-3-642-33415-3_5

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7510)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

Abstract

Much of the existing work on automatic classification of gestures and skill in robotic surgery is based on kinematic and dynamic cues, such as time to completion, speed, forces, torque, or robot trajectories. In this paper we show that, in a typical surgical training setup, video data can be equally discriminative. To that end, we propose and evaluate three approaches to surgical gesture classification from video. In the first, we model each video clip of a surgical gesture as the output of a linear dynamical system (LDS) and use metrics in the space of LDSs to classify new video clips. In the second, we use spatio-temporal features extracted from each video clip to learn a dictionary of spatio-temporal words and use a bag-of-features (BoF) approach to classify new video clips. In the third, we use multiple kernel learning to combine the LDS and BoF approaches. Our experiments show that methods based on video data perform as well as state-of-the-art approaches based on kinematic data.
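The bag-of-features idea in the second approach can be illustrated with a minimal sketch. This is not the authors' implementation: the 2-D toy descriptors stand in for real spatio-temporal features, plain k-means stands in for whatever dictionary learning the paper uses, and a chi-square nearest-neighbor rule stands in for the actual classifier. All function names (`learn_dictionary`, `bof_histogram`, `classify`) are hypothetical.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centers, f):
    """Index of the dictionary word closest to descriptor f."""
    return min(range(len(centers)), key=lambda k: sq_dist(centers[k], f))

def learn_dictionary(features, k, iters=20):
    """Plain k-means (Lloyd's algorithm), seeded with the first k descriptors."""
    centers = [list(f) for f in features[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in features:
            groups[nearest(centers, f)].append(f)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def bof_histogram(centers, clip_features):
    """Normalized histogram of word occurrences for one clip."""
    counts = Counter(nearest(centers, f) for f in clip_features)
    n = len(clip_features)
    return [counts[j] / n for j in range(len(centers))]

def chi2(h1, h2):
    """Chi-square distance between two histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

def classify(centers, train, clip_features):
    """1-nearest-neighbor label; train is a list of (histogram, label) pairs."""
    h = bof_histogram(centers, clip_features)
    return min(train, key=lambda t: chi2(t[0], h))[1]

# Toy data (assumption): gesture "A" descriptors lie mostly near (0, 0),
# gesture "B" descriptors mostly near (5, 5).
clip_a = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1)]
clip_b = [(5.0, 5.0), (5.2, 4.9), (4.9, 5.1), (0.1, 0.0)]
centers = learn_dictionary(clip_a + clip_b, k=2)
train = [(bof_histogram(centers, clip_a), "A"),
         (bof_histogram(centers, clip_b), "B")]
new_clip = [(0.1, 0.1), (0.0, 0.2), (5.1, 5.0)]
print(classify(centers, train, new_clip))  # prints A
```

Each clip is reduced to a fixed-length histogram regardless of how many descriptors it contains, which is what lets clips of different durations be compared with a single distance or kernel.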

Author information

Authors and Affiliations

Center for Imaging Science, Johns Hopkins University, USA
Benjamín Béjar Haro, Luca Zappella & René Vidal

Editor information

Editors and Affiliations

Inria Sophia Antipolis, Project Team Asclepios, 06902, Sophia-Antipolis, France
Nicholas Ayache & Hervé Delingette
MIT, CSAIL, 02139, Cambridge, MA, USA
Polina Golland
Information and Communication, Nagoya University, 464-8603, Headquarters, Nagoya, Japan
Kensaku Mori

About this paper

Cite this paper

Béjar Haro, B., Zappella, L., Vidal, R. (2012). Surgical Gesture Classification from Video Data. In: Ayache, N., Delingette, H., Golland, P., Mori, K. (eds) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2012. Lecture Notes in Computer Science, vol 7510. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33415-3_5

DOI: https://doi.org/10.1007/978-3-642-33415-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33414-6
Online ISBN: 978-3-642-33415-3
eBook Packages: Computer Science, Computer Science (R0)
