Original Article | Published 01-11-2019
Fusing information from multiple 2D depth cameras for 3D human pose estimation in the operating room
Lasse Hansen, Marlin Siebert, Jasper Diesel, Mattias P. Heinrich
International Journal of Computer Assisted Radiology and Surgery, Issue 11/2019
Abstract
Purpose
For many years, deep convolutional neural networks have achieved state-of-the-art results on a wide variety of computer vision tasks. 3D human pose estimation is no exception, and results on public benchmarks are impressive. However, specialized domains such as the operating room pose additional challenges: clinical settings involve severe occlusions, clutter and difficult lighting conditions, and privacy concerns of patients and staff make it necessary to use unidentifiable data. In this work, we aim to bring robust human pose estimation to the clinical domain.
Methods
We propose a 2D–3D information fusion framework that makes use of a network of multiple depth cameras and strong pose priors. In a first step, 2D joint probabilities are predicted from single depth images. This information is then fused in a shared voxel space, yielding a rough estimate of the 3D pose. Final joint positions are obtained by regressing into the latent pose space of a pre-trained convolutional autoencoder.
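The core fusion step described above could be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes pinhole cameras with known intrinsics and world-to-camera extrinsics, and fuses per-camera 2D joint probability maps by back-projecting each voxel center into every view and averaging the sampled probabilities (the function and variable names are hypothetical).

```python
import numpy as np

def fuse_heatmaps(heatmaps, Ks, T_wc, voxel_centers):
    """Average multi-view 2D joint probabilities in a shared voxel grid.

    heatmaps      : (C, J, H, W) per-camera 2D joint probability maps
    Ks            : (C, 3, 3) camera intrinsic matrices
    T_wc          : (C, 4, 4) world-to-camera rigid transforms
    voxel_centers : (V, 3) voxel centers in world coordinates
    returns       : (J, V) fused per-joint voxel probabilities
    """
    C, J, H, W = heatmaps.shape
    V = voxel_centers.shape[0]
    fused = np.zeros((J, V))
    counts = np.zeros(V)
    homog = np.hstack([voxel_centers, np.ones((V, 1))])  # (V, 4) homogeneous coords
    for c in range(C):
        cam_pts = (T_wc[c] @ homog.T).T[:, :3]           # voxel centers in camera frame
        in_front = cam_pts[:, 2] > 0                     # keep points in front of camera
        proj = (Ks[c] @ cam_pts.T).T                     # pinhole projection
        uv = proj[:, :2] / np.maximum(proj[:, 2:3], 1e-9)
        u = np.round(uv[:, 0]).astype(int)               # nearest-neighbor sampling
        v = np.round(uv[:, 1]).astype(int)
        valid = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
        fused[:, valid] += heatmaps[c][:, v[valid], u[valid]]
        counts[valid] += 1
    return fused / np.maximum(counts, 1)                 # mean over observing views
```

A rough 3D pose estimate then follows by taking, for each joint, the voxel with the highest fused probability: `voxel_centers[fused.argmax(axis=1)]`. The refinement into a learned pose prior (the autoencoder's latent space) would operate on this rough estimate.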
Results
We evaluate our approach against several baselines on the challenging MVOR dataset. Best results are obtained when fusing 2D information from multiple views and constraining the predictions with learned pose priors.
Conclusions
We present a robust 3D human pose estimation framework based on a multi-depth-camera network in the operating room. Using depth images as the only input modality makes our approach especially interesting for clinical applications, as it preserves the anonymity of patients and staff.