Published in: International Journal of Computer Assisted Radiology and Surgery | Issue 6/2019

01-06-2019 | Original Article

Face detection in the operating room: comparison of state-of-the-art methods and a self-supervised approach

Authors: Thibaut Issenhuth, Vinkle Srivastav, Afshin Gangi, Nicolas Padoy

Abstract

Purpose

Face detection is a necessary component for the automatic analysis and assistance of human activities during surgical procedures. Efficient face detection algorithms can help detect and identify the persons present in the room and can also be used to automatically anonymize the data. However, current algorithms trained on natural images do not generalize well to operating room (OR) images. In this work, we compare state-of-the-art face detectors on OR data and present an approach to train a face detector for the OR by exploiting non-annotated OR images.

Methods

We compare six state-of-the-art face detectors on clinical data using multi-view OR faces, a dataset of OR images capturing real surgical activities. We then propose to use self-supervision, a domain adaptation method, for the task of face detection in the OR. The approach uses non-annotated images to fine-tune a state-of-the-art detector for the OR without any human supervision.
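
The abstract does not describe how the training signal is obtained from non-annotated images. One common recipe for this kind of adaptation is pseudo-labelling: run a pretrained detector on the unlabelled images, keep only its high-confidence boxes, and use those boxes as training targets for fine-tuning. The sketch below illustrates only the selection step and is an assumption, not necessarily the authors' procedure. The function name `build_pseudo_labels`, the `detector` callable, and the `min_score` value of 0.9 are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels
Detection = Tuple[float, Box]             # (confidence score, box)


def build_pseudo_labels(
    images: Dict[str, object],
    detector: Callable[[object], List[Detection]],
    min_score: float = 0.9,               # hypothetical threshold, not from the paper
) -> Dict[str, List[Box]]:
    """Keep only confident detections as training targets (assumed recipe)."""
    pseudo_labels: Dict[str, List[Box]] = {}
    for image_id, image in images.items():
        confident = [box for score, box in detector(image) if score >= min_score]
        if confident:                     # skip images with no confident face
            pseudo_labels[image_id] = confident
    return pseudo_labels


# Usage with a stand-in detector that returns fixed outputs per image.
def fake_detector(image: object) -> List[Detection]:
    canned = {
        "frame_a": [(0.97, (10.0, 10.0, 40.0, 40.0)), (0.35, (60.0, 5.0, 80.0, 30.0))],
        "frame_b": [(0.50, (0.0, 0.0, 20.0, 20.0))],
    }
    return canned[image]


labels = build_pseudo_labels({"frame_a": "frame_a", "frame_b": "frame_b"}, fake_detector)
print(labels)
```

In this sketch the low-confidence boxes are simply discarded. The resulting dictionary would then serve as annotation input to whatever fine-tuning routine the chosen detector provides.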

Results

The results show that the best model, namely the tiny face detector, yields an average precision of 0.556 at an intersection over union (IoU) threshold of 0.5. Our self-supervised model using non-annotated clinical data outperforms this result by 9.2%.
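
To make the metric concrete, here is a minimal sketch of how intersection over union and average precision at an IoU threshold of 0.5 can be computed. It assumes axis-aligned boxes given as `(x1, y1, x2, y2)`, greedy one-to-one matching in descending score order, and a non-interpolated average precision over a single image. Benchmark protocols differ in details such as interpolation and the handling of ignored regions, so this is an illustration, not the evaluation code used for the reported numbers. All function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[float, Box]],  # (score, box) pairs
    ground_truth: List[Box],
    iou_threshold: float = 0.5,
) -> float:
    """Non-interpolated AP: mean of the precision at each true positive."""
    if not ground_truth:
        return 0.0
    matched = [False] * len(ground_truth)
    true_pos, false_pos, ap = 0, 0, 0.0
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Find the best still-unmatched ground-truth box for this detection.
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if not matched[idx] and overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx >= 0 and best_iou >= iou_threshold:
            matched[best_idx] = True
            true_pos += 1
            ap += (true_pos / (true_pos + false_pos)) / len(ground_truth)
        else:
            false_pos += 1
    return ap


gt_boxes = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
dets = [
    (0.9, (0.0, 0.0, 10.0, 10.0)),      # exact match
    (0.8, (50.0, 50.0, 60.0, 60.0)),    # false positive
    (0.7, (20.0, 20.0, 30.0, 30.0)),    # exact match
]
print(average_precision(dets, gt_boxes))
```

A detection counts as correct only if it overlaps an unmatched ground-truth face with IoU of at least the threshold, so a second detection on an already matched face is scored as a false positive.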

Conclusion

We present the first comparison of state-of-the-art face detectors on OR images and show that results can be significantly improved by using self-supervision on non-annotated data.

Title: Face detection in the operating room: comparison of state-of-the-art methods and a self-supervised approach
Authors: Thibaut Issenhuth, Vinkle Srivastav, Afshin Gangi, Nicolas Padoy
Publication date: 01-06-2019
Publisher: Springer International Publishing
Published in: International Journal of Computer Assisted Radiology and Surgery / Issue 6/2019
Print ISSN: 1861-6410
Electronic ISSN: 1861-6429
DOI: https://doi.org/10.1007/s11548-019-01944-y
