
Exploring the Spatial Hierarchy of Mixture Models for Human Pose Estimation

Yuandong Tian1, C. Lawrence Zitnick2, and Srinivasa G. Narasimhan1

1Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA
yuandong@cs.cmu.edu
srinivas@cs.cmu.edu

2Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA
larryz@microsoft.com

Abstract. Human pose estimation requires a versatile yet well-constrained spatial model for grouping locally ambiguous parts into a globally consistent hypothesis. Previous works either use local deformable models that deviate from a fixed template, or use a global mixture representation in the pose space. In this paper, we propose a new hierarchical spatial model that can capture an exponential number of poses with a compact mixture representation on each part. Using latent nodes, it can represent high-order spatial relationships among parts while permitting exact inference. Unlike recent hierarchical models that associate each latent node with a mixture of appearance templates (such as HoG), we use the hierarchical structure as a pure spatial prior, avoiding the large and often confounding appearance space. We verify the effectiveness of this model in three ways. First, samples representing human-like poses can be drawn from our model, showing its ability to capture high-order dependencies of parts. Second, our model achieves accurate reconstruction of unseen poses compared to a nearest-neighbor pose representation. Finally, our model achieves state-of-the-art performance on three challenging datasets and substantially outperforms recent hierarchical models.
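The abstract mentions that human-like poses can be sampled from the hierarchical mixture model. As a rough illustration only (not the authors' code), the sketch below shows ancestral sampling from a generic tree-structured spatial model in which each part carries a discrete mixture label, a child's label depends on its parent's label, and the child's position is a Gaussian offset from its parent determined by the (parent, child) label pair. The tree, the number of components K, and all parameters (trans, offset, sigma) are hypothetical placeholders, not values from the paper.

```python
# Illustrative sketch of ancestral sampling from a tree-structured
# hierarchical mixture spatial model. All structures and parameters
# here are hypothetical, chosen only to make the sampler runnable.
import numpy as np

rng = np.random.default_rng(0)

K = 3                                   # mixture components per part (hypothetical)
children = {0: [1, 2], 1: [3], 2: [4]}  # tree over parts; 0 is the root (hypothetical)
edges = [(p, c) for p in children for c in children[p]]

# Transition probabilities between parent and child mixture labels
# (one row per parent label, each row a distribution over child labels).
trans = {e: rng.dirichlet(np.ones(K), size=K) for e in edges}
# Mean offset of a child given the (parent label, child label) pair.
offset = {e: rng.normal(0.0, 20.0, size=(K, K, 2)) for e in edges}
sigma = 3.0                             # positional noise in pixels (hypothetical)

def sample_pose(root_pos=(0.0, 0.0)):
    """Draw one pose by top-down (ancestral) sampling over the part tree."""
    labels = {0: int(rng.integers(K))}
    pos = {0: np.asarray(root_pos, dtype=float)}
    stack = [0]
    while stack:
        p = stack.pop()
        for c in children.get(p, []):
            # Child label drawn conditioned on the parent's label.
            labels[c] = int(rng.choice(K, p=trans[(p, c)][labels[p]]))
            # Child position is a noisy offset from the parent, where the
            # mean offset depends on both mixture labels.
            mu = offset[(p, c)][labels[p], labels[c]]
            pos[c] = pos[p] + mu + rng.normal(0.0, sigma, size=2)
            stack.append(c)
    return labels, pos

labels, pos = sample_pose()
print(labels)
print({k: np.round(v, 1) for k, v in pos.items()})
```

Because the latent labels live on a tree, both sampling (as above) and MAP inference remain tractable; exact inference in such a model can be carried out with standard dynamic programming over the tree, which is consistent with the exact-inference claim in the abstract.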

LNCS 7576, p. 256 ff.



© Springer-Verlag Berlin Heidelberg 2012