ECCV 2012 - LNCS 7572-7578 and 7583-7585

Exploring Bag of Words Architectures in the Facial Expression Domain

Karan Sikka, Tingfan Wu, Josh Susskind, and Marian Bartlett

Machine Perception Laboratory, University of California San Diego, USA
ksikka@mplab.ucsd.edu
ting@mplab.ucsd.edu
josh@mplab.ucsd.edu
marni@mplab.ucsd.edu

Abstract. Automatic facial expression recognition (AFER) has advanced substantially over the past two decades. This work explores the application of bag of words (BoW), a mature approach for object and scene recognition, to AFER. We first highlight the reasons why the task for BoW differs for AFER compared to object and scene recognition, and then propose suitable extensions to the BoW architecture for the AFER task. These extensions address some of the limitations of current state-of-the-art appearance-based approaches to AFER. Our BoW architecture is based on the spatial pyramid framework, augmented by multiscale dense SIFT features and a recently proposed approach for object classification: locality-constrained linear coding and max-pooling. Combining these, we achieve a powerful facial representation that works well even with linear classifiers. We show that a well-designed BoW architecture can provide a performance benefit for AFER, and we empirically evaluate elements of the proposed architecture. The proposed BoW approach surpasses previous state-of-the-art results, achieving an average recognition rate of 96% on AFER for two public datasets.
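The coding and pooling stages named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that local descriptors (for example dense SIFT) and a codebook have already been computed, and it uses the commonly described k-nearest-neighbor approximation of locality-constrained linear coding. The function names, the values of `k` and `beta`, and the 1x1, 2x2, 4x4 pyramid grid are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np


def llc_encode(X, B, k=5, beta=1e-4):
    """Approximate LLC codes for descriptors X (N, D) over codebook B (M, D).

    k and beta are illustrative defaults (assumptions), not values from the paper.
    """
    n, m = X.shape[0], B.shape[0]
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = (X ** 2).sum(1)[:, None] - 2.0 * X @ B.T + (B ** 2).sum(1)[None, :]
    nearest = np.argsort(d2, axis=1)[:, :k]
    codes = np.zeros((n, m))
    for i in range(n):
        z = B[nearest[i]] - X[i]              # shift the k neighbors to the origin
        C = z @ z.T                           # local covariance, shape (k, k)
        C += (beta * np.trace(C) + 1e-12) * np.eye(k)  # regularize for stability
        w = np.linalg.solve(C, np.ones(k))
        codes[i, nearest[i]] = w / w.sum()    # enforce the sum-to-one constraint
    return codes


def spatial_pyramid_max_pool(codes, xy, levels=(1, 2, 4)):
    """Max-pool codes inside each pyramid cell and concatenate the results.

    xy holds descriptor positions normalized to [0, 1); the grid sizes in
    `levels` are an assumption made for this sketch.
    """
    pooled = []
    for g in levels:
        cell = np.minimum((xy * g).astype(int), g - 1)
        cell_id = cell[:, 1] * g + cell[:, 0]
        for c in range(g * g):
            mask = cell_id == c
            if mask.any():
                pooled.append(codes[mask].max(axis=0))
            else:
                pooled.append(np.zeros(codes.shape[1]))  # empty cell
    return np.concatenate(pooled)
```

With a codebook of M words and the grid above, each face image yields a vector of length 21 × M, which could then be passed to a linear classifier as the abstract describes.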

LNCS 7584, p. 250 ff.
