Abstract
This paper presents a novel algorithm for integrated segmentation and 3D pose estimation of a human body from multiple views. Unlike other state-of-the-art methods, which focus on either segmentation or pose estimation individually, our approach tackles the two tasks together. Our method works by optimizing a cost function based on a Conditional Random Field (CRF). This has the advantage that all information in the image (edges, background and foreground appearances), as well as prior information on the shape and pose of the subject, can be combined and used in a Bayesian framework. Optimizing such a cost function would previously have been computationally infeasible; however, our recent research on dynamic graph cuts allows this to be done much more efficiently than before. We demonstrate the efficacy of our approach on challenging motion sequences. Although we target the human pose inference problem in this paper, our method is completely generic and can be used to segment and infer the pose of any rigid, deformable or articulated object.
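As a rough illustration of the idea summarized above, and not the authors' implementation, the following Python sketch minimizes a toy binary CRF energy jointly over a segmentation and a pose-like parameter. Everything in it is an assumption made for illustration: the 1D strip of pixels, the cost values, the scalar "pose" `theta` (the left edge of a fixed-width template), and the function names `energy`, `min_cut_labels`, `unary_for_pose` and `pose_search`. For each pose hypothesis the segmentation is found exactly by an s-t min-cut, and the pose with the lowest energy is kept.

```python
from collections import deque

def energy(labels, unary, edges, lam):
    """E(x) = sum_i unary[i][x_i] + lam * (number of edges whose ends disagree)."""
    data = sum(unary[i][x] for i, x in enumerate(labels))
    smooth = lam * sum(labels[i] != labels[j] for i, j in edges)
    return data + smooth

def min_cut_labels(unary, edges, lam):
    """Minimise the energy above exactly with an s-t min-cut (Edmonds-Karp).
    Source side = label 0 (background), sink side = label 1 (foreground).
    Assumes all costs are non-negative."""
    n = len(unary)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i, (cost0, cost1) in enumerate(unary):
        cap[s][i] += cost1  # cut when pixel i takes label 1
        cap[i][t] += cost0  # cut when pixel i takes label 0
    for i, j in edges:
        cap[i][j] += lam
        cap[j][i] += lam
    total = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            break  # no path left: the flow is maximal
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
        total += bottleneck
    # Pixels still reachable from the source in the residual graph get label 0.
    labels = [0 if parent[i] != -1 else 1 for i in range(n)]
    return total, labels

def unary_for_pose(observed, theta, width):
    """Hypothetical unary costs: an appearance term plus a shape prior that
    favours foreground inside the template [theta, theta + width)."""
    unary = []
    for i, pixel in enumerate(observed):
        cost0, cost1 = (2, 0) if pixel == 1 else (0, 2)
        if theta <= i < theta + width:
            cost0 += 1  # background inside the template is penalised
        else:
            cost1 += 1  # foreground outside the template is penalised
        unary.append((cost0, cost1))
    return unary

def pose_search(observed, width, lam=1):
    """Exhaustive outer search over theta, with one exact min-cut per hypothesis."""
    edges = [(i, i + 1) for i in range(len(observed) - 1)]
    best = None
    for theta in range(len(observed) - width + 1):
        value, labels = min_cut_labels(unary_for_pose(observed, theta, width), edges, lam)
        if best is None or value < best[0]:
            best = (value, theta, labels)
    return best

observed = [0, 1, 0, 0, 1, 1, 1, 0, 1, 0]  # noisy 1D "image", made up for this sketch
best = pose_search(observed, width=3)
print("energy, pose, segmentation:", best)
```

The sketch rebuilds the graph and recomputes the flow from scratch for every pose hypothesis. A dynamic graph cut approach would instead reuse the flow from the previous hypothesis when only some capacities change, which is where the efficiency gain described in the abstract would come from.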
Cite this article
Kohli, P., Rihan, J., Bray, M. et al. Simultaneous Segmentation and Pose Estimation of Humans Using Dynamic Graph Cuts. Int J Comput Vis 79, 285–298 (2008). https://doi.org/10.1007/s11263-007-0120-6
DOI: https://doi.org/10.1007/s11263-007-0120-6