Published in: International Journal of Computer Assisted Radiology and Surgery 5/2021

Open Access 01-05-2021 | Original Article

Towards markerless surgical tool and hand pose estimation

Authors: Jonas Hein, Matthias Seibold, Federica Bogo, Mazda Farshad, Marc Pollefeys, Philipp Fürnstahl, Nassir Navab


Abstract

Purpose: 

Tracking of tools and surgical activity is becoming increasingly important in the context of computer-assisted surgery. In this work, we present a data generation framework, a dataset, and baseline methods to facilitate further research on markerless hand and instrument pose estimation in realistic surgical scenarios.

Methods: 

We developed a rendering pipeline to create inexpensive yet realistic synthetic data for model pretraining. We then propose a pipeline to capture and label high-quality real data with hand and object pose ground truth in an experimental setup. We furthermore present three state-of-the-art RGB-based pose estimation baselines.

Results: 

We evaluate three baseline models on the proposed datasets. The best-performing baseline achieves an average tool 3D vertex error of 16.7 mm on synthetic data and 13.8 mm on real data, which is comparable to the state of the art in RGB-based hand/object pose estimation.
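The 3D vertex error reported above is commonly computed as the mean per-vertex Euclidean distance between the predicted and ground-truth model vertices, assumed to be in correspondence. The paper's exact evaluation code is not given here, so the following is a minimal illustrative sketch of that metric, with hypothetical toy vertex lists:

```python
import math

def mean_vertex_error(pred_vertices, gt_vertices):
    """Mean per-vertex Euclidean distance, in the units of the inputs (e.g. mm).

    pred_vertices, gt_vertices: sequences of (x, y, z) tuples of equal length,
    assumed to be in one-to-one correspondence (same mesh topology).
    """
    assert len(pred_vertices) == len(gt_vertices) and pred_vertices
    total = sum(math.dist(p, g) for p, g in zip(pred_vertices, gt_vertices))
    return total / len(pred_vertices)

# Toy example: a uniform 1 mm offset along x gives an error of exactly 1 mm.
gt = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
pred = [(x + 1.0, y, z) for (x, y, z) in gt]
print(mean_vertex_error(pred, gt))  # → 1.0
```

Averaging over vertices (rather than, say, reporting the maximum distance) makes the metric robust to localized mesh errors while still penalizing global pose offsets.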

Conclusion: 

To the best of our knowledge, we propose the first synthetic and real data generation pipelines that produce hand and object pose labels for open surgery. We present three baseline models for object and combined object/hand pose estimation from RGB frames. Our realistic synthetic data generation pipeline may help overcome the data bottleneck in the surgical domain and can easily be transferred to other medical applications.
Publisher: Springer International Publishing
Print ISSN: 1861-6410
Electronic ISSN: 1861-6429
DOI: https://doi.org/10.1007/s11548-021-02369-2
