ECCV 2012 - LNCS 7572-7578 and 7583-7585

Combining Per-frame and Per-track Cues for Multi-person Action Recognition

Sameh Khamis, Vlad I. Morariu, and Larry S. Davis

University of Maryland, College Park, USA

Abstract. We propose a model that combines per-frame and per-track cues for action recognition. With multiple targets in a scene, our model simultaneously captures the natural harmony of an individual’s action in a scene and the flow of actions of an individual in a video sequence, inferring valid tracks in the process. Our motivation is that a discordant action is unlikely in a structured scene, both at the track level and at the frame level (e.g., a person dancing in a crowd of joggers). While sampling approaches could be used for inference in our model, we instead devise a global inference algorithm that decomposes the problem and solves the subproblems exactly and efficiently, recovering a globally optimal joint solution in several cases. Finally, we improve on state-of-the-art action recognition results on two publicly available datasets.

LNCS 7572, p. 116 ff.
