Simultaneous Segmentation and Pose Estimation of Humans Using Dynamic Graph Cuts

Kohli, Pushmeet; Rihan, Jonathan; Bray, Matthieu; Torr, Philip H. S.

doi:10.1007/s11263-007-0120-6

Published: 10 January 2008

Volume 79, pages 285–298, (2008)

International Journal of Computer Vision

Abstract

This paper presents a novel algorithm for integrated segmentation and 3D pose estimation of a human body from multiple views. Unlike other state-of-the-art methods, which focus on either segmentation or pose estimation individually, our approach tackles the two tasks together. Our method works by optimizing a cost function based on a Conditional Random Field (CRF). This has the advantage that all information in the image (edges, background and foreground appearances), as well as prior information on the shape and pose of the subject, can be combined and used in a Bayesian framework. Optimizing such a cost function would previously have been computationally infeasible; however, our recent research in dynamic graph cuts allows this to be done much more efficiently than before. We demonstrate the efficacy of our approach on challenging motion sequences. Although we target the human pose inference problem in this paper, our method is completely generic and can be used to segment and infer the pose of any rigid, deformable or articulated object.
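
The joint optimization described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a 1D "image", a hypothetical one-parameter pose (the centre of a fixed-width shape prior), hand-picked integer costs, and an exhaustive search over pose candidates. For each candidate pose it builds a binary energy from data, shape-prior and smoothness terms and minimizes it exactly with an s-t minimum cut, then keeps the pose whose best segmentation has the lowest energy. Every name below (`unary_costs`, `min_cut_labels`, `joint_segment_and_pose`, `chain_energy`) is invented for illustration. The sketch also recomputes the max-flow from scratch for every pose, whereas dynamic graph cuts would reuse computation from the previous solve.

```python
from collections import deque

def unary_costs(image, centre, half_width=1):
    """Per-pixel costs: a data term plus a shape prior centred on the pose (toy values)."""
    fg, bg = [], []
    for i, pixel in enumerate(image):
        inside = abs(i - centre) <= half_width
        fg.append((3 if pixel == 0 else 0) + (0 if inside else 2))
        bg.append((3 if pixel == 1 else 0) + (2 if inside else 0))
    return fg, bg

def chain_energy(labels, fg, bg, smooth):
    """Energy of a labelling: unary costs plus a penalty per label change."""
    unary = sum(fg[i] if lab else bg[i] for i, lab in enumerate(labels))
    changes = sum(labels[i] != labels[i + 1] for i in range(len(labels) - 1))
    return unary + smooth * changes

def min_cut_labels(fg, bg, smooth):
    """Minimise the chain energy exactly via max-flow (Edmonds-Karp)."""
    n = len(fg)
    s, t = n, n + 1
    cap = [[0] * (n + 2) for _ in range(n + 2)]
    for i in range(n):
        cap[s][i] = bg[i]  # cut when pixel i lands on the sink side (background)
        cap[i][t] = fg[i]  # cut when pixel i stays on the source side (foreground)
    for i in range(n - 1):
        cap[i][i + 1] = cap[i + 1][i] = smooth  # cut when neighbours disagree
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            break  # no path left: the flow is maximal
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck
    # Pixels still reachable from the source form the foreground side of the cut.
    labels = [1 if parent[i] != -1 else 0 for i in range(n)]
    return flow, labels

def joint_segment_and_pose(image, candidate_centres, smooth):
    """Pick the pose whose optimal segmentation has the lowest energy."""
    best = None
    for centre in candidate_centres:
        fg, bg = unary_costs(image, centre)
        energy, labels = min_cut_labels(fg, bg, smooth)
        if best is None or energy < best[0]:
            best = (energy, centre, labels)
    return best

IMAGE = [0, 1, 0, 1, 1, 1, 0, 0]  # pixel 1 is a noisy foreground response
SMOOTH = 2
best_energy, best_centre, best_labels = joint_segment_and_pose(
    IMAGE, range(len(IMAGE)), SMOOTH)
print(best_centre, best_energy, best_labels)
```

With this toy data the search selects centre 4 with energy 7 and labels `[0, 0, 0, 1, 1, 1, 0, 0]`: the shape prior and smoothness term together suppress the isolated noisy pixel, which is the kind of benefit the abstract attributes to combining image evidence with shape and pose priors.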

Author information

Authors and Affiliations

Department of Computing, Oxford Brookes University, Wheatley Campus, Oxford, OX33 1HX, UK
Pushmeet Kohli, Jonathan Rihan, Matthieu Bray & Philip H. S. Torr

Corresponding author

Correspondence to Pushmeet Kohli.

About this article

Cite this article

Kohli, P., Rihan, J., Bray, M. et al. Simultaneous Segmentation and Pose Estimation of Humans Using Dynamic Graph Cuts. Int J Comput Vis 79, 285–298 (2008). https://doi.org/10.1007/s11263-007-0120-6

Received: 07 April 2006
Accepted: 26 December 2007
Published: 10 January 2008
Issue Date: September 2008
DOI: https://doi.org/10.1007/s11263-007-0120-6
