2012 ICPR


ThPSAT1	Main Hall
Poster Shotgun (11): PR	Regular Session

08:30-09:00, Paper ThPSAT1.1
Object Clique Representation for Scenes Classification
Chen, Jingjing	Tianjin Univ.
Cao, Xiaochun	Tianjin Univ.
Zhang, Bao	Tianjin Univ.
Keywords: Classification and Clustering, Scene Understanding Abstract: High-level visual recognition such as scene classification is a challenging task in computer vision. In this paper, we propose an image descriptor based on semantic cliques obtained by high-order pure dependence, and the image is represented by a vector whose elements denote the probability of containing each object clique. Compared with using single objects as attributes, such a representation carries corresponding semantic information, making it more suitable for high-level visual recognition tasks. The experiments show that using object cliques as attributes for scene representation improves the accuracy of image classification.

08:30-09:00, Paper ThPSAT1.2
Towards Breast Ultrasound Image Segmentation Using Multi-Resolution Pixel Descriptors
Rodrigues, Rafael	U.B.I. - Univ. da Beira Interior
Pinheiro, Antonio	U.B.I. - Univ. da Beira Interior
Braz, Rui	U.B.I. - Univ. da Beira Interior
Pereira, Manuela	U.B.I. - Univ. da Beira Interior
Moutinho, Jose	U.B.I. - Univ. da Beira Interior
Keywords: Pattern Recognition for Bioinformatics, Segmentation, Color and Texture, Medical Image Analysis and Registration Abstract: Breast ultrasound images are an important diagnostic factor for breast cancer detection. However, ultrasound imaging is intrinsically degraded by noise, which makes it difficult to detect masses or nodules and, most importantly, to evaluate their size and shape. Computer-aided diagnosis is a major help when it comes to analyzing this type of medical imaging. A fully automated and computationally efficient method for breast ultrasound segmentation is proposed. The algorithm classifies the images, with Support Vector Machines and Discriminant Analysis classifiers, based on a pixel descriptor formed with the information from anisotropic diffusion, band-pass filtering and scale-space curvature. The final segmentation is obtained by applying a set of heuristic rules, based on the ultrasound image characteristics, to select among the classifiers' results. The final segmentation results yielded good overall accuracy, precision and recall rates.

08:30-09:00, Paper ThPSAT1.3
Hand-Dorsa Vein Recognition Based on Multi-Level Keypoint Detection and Local Feature Matching
Tang, Yinhang	Beihang Univ.
Huang, Di	Beihang Univ.
Wang, Yunhong	Beihang Univ.
Keywords: Biometrics Abstract: As a new biometric for person authentication, hand-dorsa vein has attracted increasing attention in recent years. This paper proposes a novel approach for hand-dorsa vein recognition, which makes use of multi-level keypoint detection and SIFT feature based local matching. In order to overcome the difficulty in finding local features on NIR images of hand dorsa, a multi-level keypoint detection approach, composed of Harris-Laplace and Hessian-Laplace detectors, is designed to localize enough keypoints so that more discriminative information can be highlighted. Then SIFT based local matching efficiently associates these keypoints between hand dorsa of the same individual. The experimental results achieved on the NCUT database clearly indicate the effectiveness of the proposed method for hand-dorsa vein recognition.

08:30-09:00, Paper ThPSAT1.4
Jensen Divergence Based SPD Matrix Means and Applications
Nielsen, Frank	Sony Computer Science Lab. Inc
Liu, Meizhu	Siemens Corp. Res.
Ye, Xiaojing	Georgia Tech.
Vemuri, Baba	Univ. of Florida
Keywords: Classification and Clustering, Image and Video Processing Abstract: Finding the mean of matrices is becoming increasingly important in modern signal processing problems that involve matrix-valued images. In this paper, we define the mean for a set of symmetric positive definite (SPD) matrices based on information-theoretic divergences as the unique minimizer of the averaged divergences, and compare it with the means computed using the Riemannian and Log-Euclidean metrics. For the class of divergences induced by the convexity gap of a matrix functional, we present a fast iterative concave-convex optimization scheme with guaranteed convergence to efficiently approximate those divergence-based means.

08:30-09:00, Paper ThPSAT1.5
Keyword Clustering for Automatic Categorization
Zhao, Qinpei	School of Computing, Univ. of Eastern Finland
Rezaei, Mohammad	School of Computing, Univ. of Eastern Finland
Chen, Hao	School of Computing, Univ. of Eastern Finland
Fränti, Pasi	Univ. of Eastern Finland
Keywords: Classification and Clustering, Document Understanding, Machine Learning and Data Mining Abstract: Processing short texts is becoming a trend in information retrieval. Because short texts rarely carry external information, they are more challenging to process than full documents. In this paper, keyword clustering is studied for automatic categorization. To obtain the semantic similarity of the keywords, the broad-coverage lexical resource WordNet is employed. We introduce a semantic hierarchical clustering. For automatic keyword categorization, a validity index for determining the number of clusters is proposed. The minimum value of the index indicates the potentially appropriate categorization. Experimental results indicate that the index is effective.

08:30-09:00, Paper ThPSAT1.6
Hybrid Content Based Image Retrieval Combining Multi-Objective Interactive Genetic Algorithm and SVM
Pighetti, Romaric	Lab. I3S, UMR UNS-CNRS 7271
Pallez, Denis	Lab. I3S, UMR UNS-CNRS 7271
Precioso, Frederic	Lab. I3S, UMR UNS-CNRS 7271
Keywords: Machine Learning and Data Mining, Classification and Clustering, Multimedia Analysis, Indexing and Retrieval Abstract: The number of images contained in repositories or available on the Internet has exploded in recent years. In order to efficiently retrieve one or several images from a database, the development of Content-Based Image Retrieval (CBIR) systems has become an intensively active research area. However, most proposed systems are keyword-based and few involve the end user during the search (through relevance feedback). Visual low-level descriptors are then substituted for keywords, but there is a gap between visual description and user expectations. We propose a new framework which combines a multi-objective interactive genetic algorithm, allowing a trade-off between image features and user evaluations, and a support vector machine to learn the user relevance feedback. We test our system on the SIMPLIcity database, commonly used in the literature to evaluate CBIR systems using a genetic algorithm, and it outperforms the recent frameworks.

08:30-09:00, Paper ThPSAT1.7
Statistical Modeling and Signal Selection in Multivariate Time Series Pattern Classification
Liu, Ruoqian	Univ. of Michigan - Dearborn
Xu, Shen	Univ. of Michigan - Dearborn
Fang, Chen	Univ. of Michigan - Dearborn
Liu, Yung-wen	Univ. of Michigan - Dearborn
Kochhar, Dev	Ford Motor Company
Murphey, Yi	Univ. of Michigan - Dearborn
Keywords: Feature Reduction and Manifold Learning, Statistical, Syntactic and Structural Pattern Recognition, Pattern Recognition for Bioinformatics Abstract: This paper presents an algorithm for selecting a compact subset of relevant signals for pattern classification problems involving multivariate time series (MTS) data. The algorithm uses a statistical causality modeling method to select relevant signals, and a moving average correlation analysis method to remove redundant signals. The MTS signal selection algorithm was evaluated through a case study: driver wellness classification. From a set of 20 time series signals, the signal selection algorithm selected a subset of 9 signals that are independent and most relevant to the pattern class. We trained a driver wellness classification system using Random Forest (RF) with the input of the 20 original signals, and another system with the selected 9 signals. The experimental results show that the system using the selected 9 signals consistently performed better than the system using the original set of 20 signals over different sizes of RF.

08:30-09:00, Paper ThPSAT1.8
Multimodal Biometric Authentication Based on Iris Pattern and Pupil Light Reflex
Yano, Vitor	Unicamp
Zimmer, Alessandro	UFPR
Ling, Lee Luan	Unicamp
Keywords: Biometrics, Image and Video Processing, Detection, Separation and Segmentation Abstract: Biometrics-based authentication is a method of personal identification that has some advantages over the password and object-based ones, mainly for the user, who doesn't need to carry or memorize anything. However, this kind of identification is also subject to problems. Besides the technology-related possibilities of fraud, such as system invasion, database corruption or algorithm injection, some of the commonly used biometric features can be faked. Furthermore, most cases of false rejection are related to the quality of the acquired sample. This paper proposes a multimodal biometric authentication method which combines dynamic features of the human pupil reflex with iris pattern recognition for better performance. A prototype system has been implemented and tested with 59 volunteers. Experimental results presented an EER of 2.44%.

08:30-09:00, Paper ThPSAT1.9
A Study on Semi-Supervised Dissimilarity Representation
Dinh, Viet Cuong	Delft Univ. of Tech.
Duin, Robert	TU Delft
Loog, Marco	Delft Univ. of Tech. / Univ. of Copenhagen
Keywords: Statistical, Syntactic and Structural Pattern Recognition, Machine Learning and Data Mining, Classification and Clustering Abstract: In the dissimilarity representation approach, objects are represented by their dissimilarities with respect to a representation set, rather than by features. Up to now, the representation or prototype set has usually been selected from the training data, limiting the different aspects that can be captured, especially when the training data set is small. This paper studies the performance change if the object's representation is extended by including also test data into the representation set in a semi-supervised setting. Experiments on a set of standard data show that the semi-supervised setting can substantially improve the performance of the dissimilarity based representation especially for the small sample size problem.
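
Editorial illustration (not from the paper): the idea of representing objects by their dissimilarities to a representation set, and of extending that set with unlabeled test objects, can be sketched as follows. The Euclidean distance, the function name and the toy points are assumptions made only for this sketch; the abstract does not state which dissimilarity measures or classifiers the authors use.

```python
import math

def dissimilarity_representation(objects, representation_set, dissimilarity=math.dist):
    """Map each object to its vector of dissimilarities to the representation set.

    `dissimilarity` defaults to Euclidean distance, which is an assumption of
    this sketch rather than a choice documented in the abstract.
    """
    return [[dissimilarity(x, r) for r in representation_set] for x in objects]

# Toy data (hypothetical): two labeled training points and one unlabeled test point.
train = [(0.0, 0.0), (1.0, 0.0)]
test = [(0.0, 1.0)]

# Conventional setting: the representation set is drawn from the training data only.
supervised = dissimilarity_representation(train, train)

# Semi-supervised setting: test objects also join the representation set,
# so every object gains one extra dissimilarity dimension per test object.
semi_supervised = dissimilarity_representation(train, train + test)
```

Any classifier can then be trained on the resulting vectors; the semi-supervised variant only changes the columns of the representation, not the labels used for training.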

08:30-09:00, Paper ThPSAT1.10
Multi-Class Ada-Boost Classification of Object Poses through Visual and Infrared Image Information Fusion
Changrampadi, Mohamed	Chalmers Univ. of Tech.
Yun, Yixiao	Chalmers Univ. of Tech.
Gu, Irene Yu-Hua	Chalmers Univ. of Tech.
Keywords: Pattern Recognition for Search, Retrieval and Visualization, Classification and Clustering, Machine Learning and Data Mining Abstract: This paper presents a novel method for pose classification using fusion of visual and thermal infrared (IR) images. We propose a novel tree structure multi-class classification scheme with visual and IR sub-classifiers. These sub-classifiers are different from the conventional one-against-all or one-against-one strategies, in that we handle the multi-class problem directly. We propose to use an accuracy score for the fusion of visual and IR sub-classifiers. In addition, we propose to use the original Haar features plus an extra one, and a multi-threshold weak learner to obtain weak hypotheses. The experimental results on a visual and IR image dataset containing 3018 face images in three poses show that the proposed classifier achieves a high classification rate of 99.50% on the test set. Comparisons are made to a fused one-vs-all method, a classifier with visual band only, and a classifier with IR band only. Results provide further support to the proposed method.

08:30-09:00, Paper ThPSAT1.11
Cluster-Classification Bayesian Networks for Head Pose Estimation
Kafai, Mehran	Univ. of California, Riverside
Bhanu, Bir	Univ. of California
An, Le	Univ. of California, Riverside
Keywords: Classification and Clustering, Biometrics, Machine Learning and Data Mining Abstract: Head pose estimation is critical in many applications such as face recognition and human-computer interaction. Various classifiers such as LDA, SVM, or nearest neighbor are widely used for this purpose; however, the recognition rates are limited due to the limited discriminative power of these classifiers for discretized pose estimation. In this paper, we propose a head pose estimation method using a Cluster-Classification Bayesian Network (CCBN), specifically designed for classification after clustering. A pose layout is defined where similar poses are assigned to the same block. This increases the discriminative power within the same block when similar yet different poses are present. We achieve the highest recognition accuracy on two public databases (CAS-PEAL and FEI) compared to the state-of-the-art methods.

08:30-09:00, Paper ThPSAT1.12
Improving Cross-Validation Based Classifier Selection Using Meta-Learning
Krijthe, Jesse Hendrik	Delft Univ. of Tech.
Ho, Tin Kam	Bell Lab. Alcatel-Lucent
Loog, Marco	Delft Univ. of Tech. / Univ. of Copenhagen
Keywords: Classification and Clustering, Statistical, Syntactic and Structural Pattern Recognition Abstract: In this paper we compare classifier selection using cross-validation with meta-learning, using as meta-features both the cross-validation errors and other measures characterizing the data. Through simulation experiments we demonstrate situations where meta-learning offers better classifier selections than ordinary cross-validation. The results provide some evidence to support meta-learning not just as a more time-efficient classifier selection technique than cross-validation, but potentially as a more accurate one. They also support the usefulness of data complexity estimates as meta-features for classifier selection.

08:30-09:00, Paper ThPSAT1.13
Jensen-Shannon Graph Kernel Using Information Functionals
Bai, Lu	Univ. of York
Hancock, Edwin	Univ. of York
Ren, Peng	China Univ. of Petroleum (Huadong)
Keywords: Statistical, Syntactic and Structural Pattern Recognition, Machine Learning and Data Mining, Classification and Clustering Abstract: In recent work we have shown how to use the von Neumann entropy to construct a Jensen-Shannon kernel on graphs. The kernel is defined as the difference in entropies between a product graph and the separate graphs being compared. To develop this graph kernel further, in this paper we explore how to render the computation of the Jensen-Shannon kernel more efficient by using the information functionals defined by Dehmer to compute the required entropies. We illustrate how the resulting Jensen-Shannon graph kernels can be used for the purposes of graph clustering. Experimental results reveal that the method gives good classification performance on graphs extracted from an object recognition dataset and several bioinformatics datasets.

08:30-09:00, Paper ThPSAT1.14
Graph Clustering Using Graph Entropy Complexity Traces
Bai, Lu	Univ. of York
Hancock, Edwin	Univ. of York
Ren, Peng	China Univ. of Petroleum (Huadong)
Han, Lin	Univ. of York
Keywords: Machine Learning and Data Mining, Statistical, Syntactic and Structural Pattern Recognition, Pattern Recognition for Bioinformatics Abstract: In this paper, we aim to present a principled approach to the problem of depth-based complexity characterisation of graphs. Our idea is to decompose graphs into substructures of increasing size, and then to measure the complexity of these substructures using Shannon entropy or von-Neumann entropy. We commence by identifying the dominant vertex in a graph. From the dominant vertex, we construct subgraphs of increasing K layers, so-called semidiameter subgraphs. We then measure how the entropy varies with increasing K layer semidiameter subgraphs. We construct a vector of subgraph entropies for each graph, a depth-based complexity trace, and then perform graph clustering in the principal components space of the vectors. We explore our approach on both synthetic data and datasets from the domain of bioinformatics.

08:30-09:00, Paper ThPSAT1.15
Face Recognition in Multi-Camera Surveillance Videos
An, Le	Univ. of California, Riverside
Bhanu, Bir	Univ. of California
Yang, Songfan	Univ. of California, Riverside
Keywords: Pattern Recognition for Surveillance and Security, Biometrics, Classification and Clustering Abstract: Recognizing faces in surveillance videos becomes difficult due to the poor quality of the probe data in terms of resolution, noise, blurriness, and varying lighting conditions. In addition, the poses of probe data are usually not frontal view, contrary to the standard format of the gallery data. The discrepancy between the two types of the data makes the existing recognition algorithm less accurate in real-world data. In this paper, we propose a multi-camera video based face recognition framework using a novel image representation called Unified Face Image (UFI), which is synthesized from multiple camera feeds. Within a temporal window the probe frames from different cameras are warped towards a template frontal face and then averaged. The generated UFI is a frontal view of the subject that incorporates information from different cameras. We use SIFT flow as a high level alignment tool to warp the faces. Experimental results show that by using the fused face, the recognition performance is better than the result of any single camera. The proposed framework can be adapted to any multi-camera video based recognition method using any feature descriptors or classifiers.

08:30-09:00, Paper ThPSAT1.16
Digital Privacy: Replacing Pedestrians from Google Street View Images
Nodari, Angelo	Univ. degli studi dell'Insubria
Vanetti, Marco	Univ. degli Studi dell'Insubria
Gallo, Ignazio	Univ. degli Studi dell'Insubria
Keywords: Pattern Recognition for Surveillance and Security, Inpainting and Superimposing, Detection, Separation and Segmentation Abstract: Given the lack of modern techniques to ensure the digital privacy of individuals, we want to pave the way for a new approach to make pedestrians in cityscape images anonymous. To address these concerns, we propose an automated method to replace any unknown pedestrian with another one which is extracted from a controlled and authorized dataset. The techniques used up to now to make people anonymous are based mainly on the blurring of people's faces, but even so it is possible to trace the identity of the subject starting from his clothing, personal items, hairstyle, the place and time where the photo was taken. The proposed method aims to make the pedestrians completely anonymous, and consists of four phases: first we identify the area where the pedestrian is located, then we separate the pedestrian from the background, next we select the most similar pedestrian from a controlled dataset, and finally we substitute it. Our case study is Google Street View because it is one of the online services which suffers most from this kind of privacy issue. The experimental results show how this technique can overcome the problems of digital privacy with promising results.

08:30-09:00, Paper ThPSAT1.17
Thresholding-Based Segmentation Revisited Using Mixtures of Generalized Gaussian Distributions
Boulmerka, Aissa	École Nationale Supérieure d'Informatique
Allili, Mohand Said	Univ. du Québec en Outaouais
Keywords: Classification and Clustering, Segmentation, Color and Texture, Statistical, Syntactic and Structural Pattern Recognition Abstract: This paper presents a new approach to image-thresholding-based segmentation. It considerably improves existing methods by efficiently modeling non-Gaussian and multi-modal class-conditional distributions. The proposed approach seamlessly: 1) extends Otsu's method to arbitrary numbers of thresholds and 2) extends the Kittler and Illingworth minimum error thresholding to non-Gaussian and multi-modal class-conditional data. We use the recently proposed mixture of generalized Gaussian distributions (MoGG) modeling, which efficiently represents heavy-tailed data, as well as multi-modal histograms with flat and sharply-shaped peaks. Experiments performed on synthetic data and real-world image segmentation show the performance of the proposed approach in comparison with recent state-of-the-art techniques.
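
Editorial illustration (not from the paper): as background for the abstract above, the classical single-threshold Otsu method that the authors extend can be sketched in a few lines. This is only the textbook baseline, which picks the threshold maximizing the between-class variance of a gray-level histogram; it does not implement the MoGG-based extension, and the function name and toy histogram are assumptions of the sketch.

```python
def otsu_threshold(hist):
    """Return the bin index t that maximizes between-class variance.

    Class 0 holds bins 0..t and class 1 holds bins t+1..end.
    `hist` is a list of non-negative counts, one per gray level.
    """
    total = sum(hist)
    if total == 0:
        raise ValueError("histogram is empty")
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight0 = 0          # number of pixels in class 0 so far
    sum0 = 0.0           # sum of gray levels in class 0 so far
    best_t, best_score = 0, -1.0
    for t, count in enumerate(hist):
        weight0 += count
        sum0 += t * count
        weight1 = total - weight0
        if weight0 == 0 or weight1 == 0:
            continue     # one class is empty, so the variance is undefined
        mean0 = sum0 / weight0
        mean1 = (sum_all - sum0) / weight1
        # Between-class variance up to the constant factor 1 / total**2.
        score = weight0 * weight1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t

# Toy bimodal histogram with 8 gray levels (made-up counts).
toy_hist = [10, 10, 0, 0, 0, 0, 10, 10]
toy_threshold = otsu_threshold(toy_hist)
```

For the toy histogram, any threshold between the two modes separates them equally well; the strict comparison keeps the first such bin.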

08:30-09:00, Paper ThPSAT1.18
Face Analysis of Aggressive Moods in Automobile Driving Using Mutual Subspace Method
Moriyama, Tsuyoshi	Tokyo Pol. Univ.
Khiat, Abdelaziz	Nissan Motor Co., Ltd.
Shimomura, Noriko	Nissan Motor Co., Ltd.
Keywords: Gesture and Behavior Analysis, Pattern Recognition for Bioinformatics, Feature Reduction and Manifold Learning Abstract: Aggressive affections of automobile drivers such as irritation often cause unpleasant experiences and ultimately road rage. Detecting their cues from drivers' behaviors and obviating undesirable consequences is the most important role of automobile navigation for future safe driving. Facial expressions have been found to be a useful indicator of the driver's affection due to the robustness in monitoring drivers compared with other sensors. Affection consists of two kinds of factors: emotion (impulsive and strong) and mood (long lasting and subtle), where mood biases which emotions arise. Although moods dominate emotions, the conventional approach in facial expression analysis has focused on emotion rather than mood in this context. The technical difficulty in analyzing moods is that there is no neutral expression that can serve as a firm reference for classifying facial expressions, because the neutral expression is the mood itself and varies over time. The proposed method parameterizes appearance changes of a face image sequence using the mutual subspace method, and estimates the levels of aggressive mood, i.e., irritation and tension. Experimental results that used simulated facial expressions gave the optimal configuration of the proposed method.

08:30-09:00, Paper ThPSAT1.19
Automated Apple Stem End and Calyx Detection Using Evolution-COnstructed Features
Lillywhite, Kirt	Brigham Young Univ.
Tippetts, Beau	Brigham Young Univ.
Lee, Dah-Jye	Brigham Young Univ.
Keywords: Statistical, Syntactic and Structural Pattern Recognition, Features and Image Descriptors Abstract: A majority of consumers list flavor, freedom from bruises and blemishes, and crispness as the most important characteristics of apples. There is a need for reliable automatic inspection processes to identify bruises and blemishes, allowing apples to be sent to the fresh fruit market. Several references state that distinguishing stem end and calyx from true defects is the main challenge for automated apple sorting systems. This research presents the application of the general object recognition algorithm, Evolutionary COnstructed features, to the problem of correctly distinguishing bruises and blemishes from the stem end and calyx of apples. The use of this algorithm demonstrates the feasibility of using machine vision technology with off-the-shelf optical and electronics components to detect true bruises and blemishes on apples with 94% accuracy.

08:30-09:00, Paper ThPSAT1.20
Top-K Correlated Subgraph Query for Data Streams
Pan, Shirui	Univ. of Tech. Sydney
Zhu, Xingquan	Florida Atlantic Univ.
Fang, Meng	Univ. of Tech. Sydney
Keywords: Machine Learning and Data Mining, Statistical, Syntactic and Structural Pattern Recognition Abstract: Given a query graph q, correlated subgraph query intends to find the graph structures which are most correlated to q. This problem is fundamental for many pattern recognition applications involving structured data like graphs. Currently available studies on correlation mining from graph data are all designed for static datasets. However, in real-life applications, data may arrive continuously in a streaming fashion at high speed. In this paper we investigate the problem of top-k correlated subgraph query over streams. By employing the Hoeffding bound in the candidate discovery process and carefully maintaining a candidate list over the stream, a novel algorithm, Hoe-PG, is proposed to incrementally identify the top-k correlated subgraphs in a sliding window over the stream. Experiments show that the proposed method is several times more efficient than its peer with respect to runtime and memory consumption, and is able to maintain high precision and recall for stream-based graph query.
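
Editorial illustration (not from the paper): the Hoeffding bound named in the abstract above has a simple closed form. For n independent observations of a quantity whose values span a range R, the sample mean exceeds the true mean by more than epsilon with probability at most delta when epsilon = sqrt(R^2 ln(1/delta) / (2n)) (one-sided form). The sketch below only evaluates this formula; the function name and the example numbers are assumptions, and how Hoe-PG actually applies the bound to candidate subgraphs is not described in the abstract.

```python
import math

def hoeffding_epsilon(value_range, delta, n):
    """One-sided Hoeffding error margin for the mean of n bounded observations.

    value_range: width R of the interval the observations fall in
    delta:       allowed failure probability, 0 < delta < 1
    n:           number of independent observations seen so far
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# Hypothetical usage: the margin shrinks as more stream items are observed.
margin_small_sample = hoeffding_epsilon(value_range=1.0, delta=0.05, n=100)
margin_large_sample = hoeffding_epsilon(value_range=1.0, delta=0.05, n=10000)
```

In stream mining, a margin of this kind is typically used to decide when enough data has been seen to commit to or discard a candidate with bounded risk.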

08:30-09:00, Paper ThPSAT1.21
Timed and Probabilistic Automata for Automatic Animal Call Recognition
Duan, Shufei	Queensland Univ. of Tech.
Zhang, Jinglan	Queensland Univ. of Tech.
Roe, Paul	Queensland Univ. of Tech.
Towsey, Michael	Queensland Univ. of Tech.
Buckingham, Lawrence	Queensland Univ. of Tech.
Keywords: Statistical, Syntactic and Structural Pattern Recognition, Machine Learning and Data Mining, Classification and Clustering Abstract: Automatic call recognition is vital for environmental monitoring. Pattern recognition has been applied in automatic species recognition for years. However, few studies have applied formal syntactic methods to species call structure analysis. This paper introduces a novel method that adopts timed and probabilistic automata for automatic species recognition, using acoustic components as the primitives. We demonstrate this on one Australian bird species, the Eastern Yellow Robin.

08:30-09:00, Paper ThPSAT1.22
Automatic Fuzzy Clustering Based on Mistake Analysis
Ben, Shenglan	Nanjing Univ. of Science and Tech.
Jin, Zhong	Nanjing Univ. of Science and Tech.
Yang, Jingyu	Nanjing Univ. of Science and Tech.
Keywords: Classification and Clustering Abstract: This paper presents a robust fuzzy clustering algorithm which can perform clustering without pre-assigning the number of clusters and is not sensitive to the initialization of cluster centers. This is achieved by iteratively splitting and merging operations under the guidance of mistake measurements. In every step of the iteration, we first split the cluster containing data points belonging to different classes, and then merge the wrongly divided cluster pair. A validity index is proposed based on the two mistake measurements to determine the termination of the clustering process. Experimental results confirm the effectiveness and robustness of the proposed clustering algorithm.

08:30-09:00, Paper ThPSAT1.23
Utilizing Co-Occurrence Patterns for Semantic Concept Detection in Images
Feng, Linan	Univ. of California Riverside
Bhanu, Bir	Univ. of California
Keywords: Pattern Recognition for Search, Retrieval and Visualization, Image and Video Understanding, Detection, Separation and Segmentation Abstract: Semantic concept detection is an important open problem in concept-based image understanding. In this paper, we develop a method inspired by social network analysis to solve the semantic concept detection problem. The novel idea proposed is the detection and utilization of concept co-occurrence patterns as contextual clues for improving individual concept detection. We detect the patterns as hierarchical communities by graph modularity optimization in a network with nodes and edges representing individual concepts and co-occurrence relationships. We evaluate the effect of detected co-occurrence patterns in the application scenario of automatic image annotation. Experimental results on SUN'09 and OSR datasets demonstrate our approach achieves significant improvements over popular baselines.

08:30-09:00, Paper ThPSAT1.24
Hypergraph Matching Based on Marginalized Constrained Compatibility
Su, Jiang	Univ. of Electronic Science and Tech. of China
Le, Dong	Univ. of Electronic Science and Tech. of China
Ren, Peng	China Univ. of Petroleum (Huadong)
Hancock, Edwin	Univ. of York
Keywords: Statistical, Syntactic and Structural Pattern Recognition Abstract: We aim to match two hypergraphs via pairwise characterization of multiple relationships. To this end, we introduce a technique referred to as Marginalized Constrained Compatibility Estimation (MCCE), which transforms the compatibility tensor representing hyperedge similarities into a compatibility matrix representing edge similarities. We then cluster graph vertices associated with the compatibility matrix and extract its dominant set as the optimal matches. Our MCCE-based method overcomes the information loss arising in arithmetic average, which is commonly used for marginalization in the hypergraph matching literature. Experiments demonstrate the effectiveness of our method.

08:30-09:00, Paper ThPSAT1.25
Bias Analyses of Spontaneous Facial Expression Database
Wang, Zhaoyu	Univ. of Science and Tech. of China, Hefei, Anhui, P.R.C
Wang, Shangfei	Univ. of Science and Tech. of China
Zhu, Yachen	Univ. of Science and Tech. of China
Ji, Qiang	RPI
Keywords: Gesture and Behavior Analysis Abstract: In this paper, cross-corpora evaluations are used to analyze the bias of spontaneous facial expression databases. Local binary pattern, Gabor, eigenface and fisherface features are extracted and applied to the four spontaneous expression databases: USTC-NVIE, VAM, Belfast Naturalistic and SEMAINE to recognize arousal (high/low) and valence (positive/negative) respectively. Experimental results indicate that there exists bias among different spontaneous expression databases. The emotion-induction methods, the variety of subjects and the quantity of raters may have caused such a bias.

08:30-09:00, Paper ThPSAT1.26
Multiple HOG Templates for Gait Recognition
Liu, Yushu	Fudan Univ.
Zhang, Junping	Fudan Univ.
Wang, Chen	Fudan Univ.
Wang, Liang	Inst. of Automation, Chinese Acad. of Sciences
Keywords: Biometrics, Features and Image Descriptors, Image and Video Processing Abstract: In the gait recognition field, template-based approaches such as Gait Energy Image (GEI) and Chrono-Gait Image (CGI) can achieve good recognition performance with low computational cost. Meanwhile, CGI can preserve temporal information better than GEI. However, they pay less attention to the local shape features. To preserve temporal information and generate more abundant local shape features, we generate multiple HOG templates by extracting the Histogram of Oriented Gradients (HOG) of GEI and CGI templates. Experiments show that, compared with several published approaches, our proposed multiple HOG templates achieve better performance for gait recognition.

08:30-09:00, Paper ThPSAT1.27
Distance Matrices As Invariant Features for Classifying MoCap Data
Vieira, Antonio Wilson	Univ. Federal de Minas Gerais
Lewiner, Thomas	PUC-Rio
Schwartz, William	Federal Univ. of Minas Gerais
Campos, Mario Montenegro	Univ. Federal de Minas Gerais
Keywords: Gesture and Behavior Analysis, Classification and Clustering, Statistical, Syntactic and Structural Pattern Recognition Abstract: This work introduces a new representation for Motion Capture data (MoCap) that is invariant under rigid transformation and robust for classification and annotation of MoCap data. This representation relies on distance matrices that fully characterise the class of identical postures up to the body position or orientation. This high dimensional feature descriptor is tailored using PCA and incorporated into an action graph based classification scheme. Classification experiments on publicly available data show the accuracy and robustness of the proposed MoCap representation.
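
Editorial illustration (not from the paper): the invariance claimed in the abstract above follows from the fact that rigid transformations preserve distances between points. A minimal sketch, assuming a posture is given as a list of 3D joint positions and using Euclidean distance; the helper names, the three-joint posture and the transformation parameters are hypothetical, and the PCA and action graph stages mentioned in the abstract are not shown.

```python
import math

def distance_matrix(joints):
    """Pairwise Euclidean distances between the 3D joints of one posture."""
    return [[math.dist(p, q) for q in joints] for p in joints]

def rigid_transform(joints, angle, offset):
    """Rotate joints about the z-axis by `angle` radians, then translate by `offset`."""
    c, s = math.cos(angle), math.sin(angle)
    dx, dy, dz = offset
    return [(c * x - s * y + dx, s * x + c * y + dy, z + dz) for x, y, z in joints]

def matrices_close(a, b, tol=1e-9):
    """Element-wise comparison of two equally sized matrices within a tolerance."""
    return all(abs(u - v) <= tol
               for row_a, row_b in zip(a, b)
               for u, v in zip(row_a, row_b))

# Hypothetical three-joint posture (coordinates are made up for illustration).
posture = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.5)]
moved = rigid_transform(posture, angle=0.7, offset=(3.0, -1.0, 2.0))

# The same posture seen from another position and orientation
# yields the same distance matrix, up to floating-point error.
invariant = matrices_close(distance_matrix(posture), distance_matrix(moved))
```

Because the matrix is symmetric with a zero diagonal, only its upper triangle carries information, which is one reason a dimensionality reduction step is natural afterwards.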

08:30-09:00, Paper ThPSAT1.28
Inference Bag of Features Using Sparse Coding for Image Classification
Peng, Yu	Univ. of Newcastle, Australia
Min, Xu	Univ. of Tech. Sydney
Jin, Jesse	Univ. of Newcastle, Australia
Luo, Suhuai	Univ. of Newcastle, Australia
Ni, Zefeng	Univ. of Tech. Sydney
Keywords: Classification and Clustering, Image and Video Processing, Features and Image Descriptors Abstract: In this paper, we propose an original inference bag of features (BoF) method for image classification. Current BoF methods construct a visual word dictionary (VWD) from training images. More training data are desired for a higher classification rate. However, more training data increase the size of the VWD as well as the testing time. Fixing the size of the VWD, as current methods do, guarantees processing speed but may leave available training data unused. Our method addresses this dilemma. We use three sets of images: training, inference and testing images. Using sparse coding, the VWD is constructed from the inference images, the number of which is fixed. Posterior probabilities of visual words over classes are learned from training images in a Bayesian framework. In testing, testing images are represented by visual words in the VWD. The choice of representing visual words determines the classification decision. We compared our method with two popular methods on gender classification and vehicle type classification and achieved promising results.

08:30-09:00, Paper ThPSAT1.29
Efficient Sequence Kernel-Based Genome-Wide Prediction of Transcription Factors
Kuksa, Pavel	NEC Lab. America Inc.
Keywords: Pattern Recognition for Bioinformatics, Classification and Clustering, Machine Learning and Data Mining Abstract: With whole genome sequences of many organisms readily available, and with full functional characterization of the genes still lacking, computational functional analysis of whole genomes is a target of intensive research. Of particular interest is the prediction of regulatory functions, such as regulation of gene expression by transcription factors (TFs), proteins that bind to DNA to promote or suppress transcription of their target genes. Identification of these transcription factors at the genome level (i.e. from their sequence) lays a basis for further analysis and understanding of gene regulatory networks and can serve as a starting point for targeted high-throughput experiments. In this work, we address the question of predicting whether an (uncharacterized) protein is a transcription factor or not given its amino acid sequence. We cast this problem as a classification task: we use sequence features as input variables and output a functional class (TF or non-TF). We show that our proposed method can identify TFs with high accuracy at the whole genome level, both within a given organism and across different organisms, and can also identify novel TF families with high accuracy.

08:30-09:00, Paper ThPSAT1.30
Predicting Battery Life from Usage Trajectory Patterns
Takahashi, Toshihiro	IBM
Ide, Tsuyoshi	IBM
Keywords: Machine Learning and Data Mining, Feature Reduction and Manifold Learning, Statistical, Syntactic and Structural Pattern Recognition Abstract: This paper addresses the task of predicting the battery capacity degradation ratio for a given usage pattern. This is an interesting pattern recognition task, where each usage pattern is represented as a trajectory in a feature space, and the prediction model captures the previous usage trajectory patterns. The main technical challenge here is how to build a good model from a limited number of training samples. To tackle this, we introduce a new smoothing technique in the trajectory space. The trajectory smoothing technique is shown to be equivalent to a novel regularization scheme for linear regression. Using real Li-ion battery data, we show that our approach outperforms existing methods.

08:30-09:00, Paper ThPSAT1.31
An Improved K-Means Document Clustering Using Wikipedia Hierarchical Ontology
Hassan, Mostafa	Centre for Pattern Analysis and Machine Intelligence (CPAMI), Univ.
Karray, Fakhri	Univ. of Waterloo
Kamel, Mohamed S	Univ. of Waterloo
Keywords: Machine Learning and Data Mining, Classification and Clustering, Statistical, Syntactic and Structural Pattern Recognition Abstract: Text document clustering is one of the crucial tasks in text mining. It is used in many different text mining applications. One of the most commonly used algorithms for document clustering is the k-means algorithm, the main drawback of which is that its output performance is very sensitive to its initial clusters' centroids. In this work, we present a technique to initialize the centroids based on background knowledge structure extracted from one of the largest online knowledge repositories: Wikipedia. Results show that the proposed model is efficient and promising, as it outperforms the accuracy of the conventional k-means clustering, as well as other conventional algorithms for document clustering.

08:30-09:00, Paper ThPSAT1.32
Multi-Task Co-Clustering Via Nonnegative Matrix Factorization
Xie, Saining	Shanghai Jiao Tong Univ.
Lu, Hongtao	Shanghai Jiao Tong Univ.
He, Yangcheng	Shanghai Jiao Tong Univ.
Keywords: Machine Learning and Data Mining, Classification and Clustering, Document Understanding Abstract: Recent results have empirically proved that, given several related tasks with different data distributions and an algorithm that can utilize both the task-specific and cross-task knowledge, clustering performance of each task can be significantly enhanced. This kind of unsupervised learning method is called multi-task clustering. We focus on tackling the multi-task clustering problem via a 3-factor nonnegative matrix factorization. The object of our approach consists of two parts: (1) Within-task co-clustering: co-cluster the data in the input space individually. (2) Cross-task regularization: Learn and refine the relations of feature spaces among different tasks. We show that our approach has a sound information theoretic background and the experimental evaluation shows that it outperforms many state-of-the-art single-task or multi-task clustering methods.

08:30-09:00, Paper ThPSAT1.33
Designing Various Component Analysis at Will
Kimura, Akisato	NTT Corp.
Sugiyama, Masashi	Tokyo Inst. of Tech.
Sakano, Hitoshi	NTT
Kameoka, Hirokazu	NTT Corp.
Keywords: Statistical, Syntactic and Structural Pattern Recognition, Machine Learning and Data Mining, Feature Reduction and Manifold Learning Abstract: This paper provides a generic framework of component analysis (CA) methods introducing a new expression for scatter matrices and Gram matrices, called Generalized Pairwise Expression (GPE). This expression is quite compact but highly powerful. The framework includes not only (1) the standard CA methods but also (2) several regularization techniques, (3) weighted extensions, (4) some clustering methods, and (5) their semi-supervised extensions. This paper also presents quite a simple methodology for designing a desired CA method from the proposed framework: Adopting the known GPEs as templates, and generating a new method by combining these templates appropriately.

08:30-09:00, Paper ThPSAT1.34
Hierarchical Multilevel Object Recognition Using Markov Model
Attamimi, Muhammad	The Univ. of Electro-Communications
Nakamura, Tomoaki	The Univ. of Electro-Communications
Nagai, Takayuki	The Univ. of Electro-Communications
Keywords: Statistical, Syntactic and Structural Pattern Recognition, 2D/3D Object Detection and Recognition, Vision for Robotics Abstract: In this study, we address the issue of multilevel object recognition. Multilevel object recognition is object recognition at various levels, that is, simultaneous recognition of an object's instance, category, material, etc. At each level, many recognition methods have been proposed in the literature. Therefore, it is straightforward to design a multilevel object recognition system using conventional methods independently. However, these "levels" are related to each other and form a hierarchical structure. Hence, the recognition performance can be improved by considering the consistency of the recognition results at all levels. To model the consistency, we formulate the problem as finding the Viterbi path in a Markov model, since the consistent recognition results can be thought of as the most likely sequence of states. We implemented the proposed multilevel object recognition system and evaluated it to show its validity.

08:30-09:00, Paper ThPSAT1.35
A Linear Max K-Min Classifier
Dong, Mingzhi	Beijing Univ. of Posts and Telecommunications
Yin, Liang	Beijing Univ. of Posts and Telecommunications
Deng, Weihong	Beijing Univ. of Posts and Telecommunications
Wang, Qiang	Beijing Univ. of Posts and Telecommunications
Yuan, Caixia	Beijing Univ. of Posts and Telecommunications (BUPT). P.R.Ch
Guo, Jun	Beijing Univ. of Posts and Telecommunications
Shang, Li	Intel China Res. Center
Ma, Liwei	Intel China Res. Center
Keywords: Classification and Clustering Abstract: Over the past decades, the mathematical modeling of classifiers has always been a hot topic in the field of pattern recognition. The maximin classifier, which pays strong attention to the worst instance of each class, has achieved excellent performance in a great number of applications. However, maximin classifiers only consider the outermost boundary point or points of each class. Thus, this paper proposes a more robust Linear Max K-min (LMKM) Classifier for 2-class classification problems by finding a hyperplane which best classifies the K worst cases. The original objective function is reformulated into a linear programming problem with 2N constraints, which can be solved with high computational efficiency, where N indicates the number of training samples. Our algorithm is tested on 18 publicly available 2-class classification datasets, and the experimental results show that the classification performance of LMKM is competitive with Linear Support Vector Machine (SVM) and Logistic Regression (LR).

08:30-09:00, Paper ThPSAT1.36
A Dual-Staged Classification-Selection Approach for Automated Update of Biometric Templates
Rattani, Ajita	Univ. of Cagliari
Marcialis, Gian Luca	Univ. of Cagliari
Granger, Eric	École de Tech. supérieure
Roli, Fabio	Univ. of Cagliari
Keywords: Biometrics Abstract: In the emerging field of adaptive biometrics, systems aim to adapt enrolled templates to variations in samples observed during operations. However, despite numerous advantages, few commercial vendors have adopted auto-update procedures in their products. This is due to limitations associated with existing adaptation schemes. This paper proposes a dual-staged template adaptation scheme that captures 'informative' operational samples with significant variations without increasing the vulnerability to impostor intrusion. This is achieved through a two-staged classification-selection approach driven by the harmonic function and risk minimization technique, over a graph-based representation of (enrolment and operational) samples. Experimental results on the DIEE fingerprint data set, explicitly collected for evaluating adaptive biometric systems, demonstrate that the proposed scheme results in a 67% reduction in error over the baseline system (without adaptation), outperforming state-of-the-art methods.

08:30-09:00, Paper ThPSAT1.37
The Bayesian Logistic Regression in Pattern Recognition Problems under Concept Drift
Turkov, Pavel	Tula State Univ.
Krasotkina, Olga	Tula State Univ.
Mottl, Vadim	Computing Center of the Russian Acad. of Sciences
Keywords: Machine Learning and Data Mining, Classification and Clustering Abstract: In practice, we constantly face the challenge of processing pattern recognition data flows with a time-varying target concept, i.e., a changing statistical relationship between class memberships and observable characteristics of entities to be perceived by the recognition system. In this paper, a mathematical and algorithmic framework is proposed for handling concept drift in pattern recognition problems on the basis of the Bayesian treatment of logistic regression as an appropriate mathematical instrument for inferring a time-varying decision rule. The pattern recognition procedure resulting from this approach is a numerical implementation of the general dynamic programming principle, and has linear computational complexity with respect to the length of the time series, in contrast to the polynomial complexity of pattern recognition procedures of a general kind.

08:30-09:00, Paper ThPSAT1.38
Adaptive Selection of Ensembles for Imbalanced Class Distributions
Radtke, Paulo Vinicius Wolski	École de Tech. supérieure
Granger, Eric	École de Tech. supérieure
Sabourin, R.	École de Tech. supérieure
Gorodnichy, Dmitry	Canada Border Services Agency
Keywords: Pattern Recognition for Surveillance and Security, Biometrics Abstract: Boolean combination (BC) techniques have been shown to efficiently integrate the responses of multiple diversified classifiers in the ROC space to improve the overall accuracy and reliability of pattern recognition systems. In practice, since class distributions are often imbalanced and change over time, the BC of classifiers, and thus selection of ensembles, should be adapted to reflect operational conditions. Although the impact on classification performance of imbalanced distributions may be addressed using ensemble-based techniques, this is difficult to observe from ROC curves. However, given a desired false positive rate and class imbalance, performing BC in the Precision-Recall Operating Characteristic (PROC) space with skewed data may lead to a higher level of performance. In this paper, an adaptive system is proposed that initially generates several PROC curves, each one from data with a different level of skew. Then, during operations, the class imbalance is periodically estimated, and used to approximate the most accurate BC of classifiers among operational points of these curves. Simulation results indicate that this approach maintains a high level of accuracy that is comparable to full Boolean re-combination (as required for a specific level of imbalance), but for a significantly lower computational cost.

08:30-09:00, Paper ThPSAT1.39
Applying Scattering Operators for Face Recognition: A Comparative Study
Chang, Kuang-Yu	Acad. Sinica
Lin, Cheng-Fu	Res. Center for Information Tech. Innovation, Acad.
Chen, Chu-Song	Acad. Sinica
Hung, Yi-Ping	National Taiwan Univ.
Keywords: Pattern Recognition for Bioinformatics Abstract: Face identification is the problem of determining whether two face images depict the same person or not. This is difficult due to variations in scale, pose, lighting, background, expression, hairstyle, and glasses. Thus, a powerful feature descriptor with local-deformation tolerance ability and discriminating capability is essential to fulfill all these variations. In this paper, we present a local descriptor, scattering operator, which includes multi-scale and multi-direction co-occurrence information. It is computed with a cascade of wavelet decompositions and complex modulus. This scattering representation is locally translation invariant and can linearize deformations. We evaluate the abilities of this Gabor-based scattering operator by an effective face recognition paradigm and show that this descriptor outperforms the compared descriptors.

08:30-09:00, Paper ThPSAT1.40
Single-Frame Hand Gesture Recognition Using Color and Depth Kernel Descriptors
Zhu, Xiaolong	The Univ. of Hong Kong
Wong, Kwan-Yee Kenneth	The Univ. of Hong Kong
Keywords: Gesture and Behavior Analysis, Features and Image Descriptors, Human Computer Interaction Abstract: This paper presents a flexible method for single-frame hand gesture recognition by fusing information from color and depth images. Existing methods usually focus on designing intuitive features for color and depth images. On the contrary, our method first extracts common patch-level features, and fuses them by means of kernel descriptors. Linear SVM is then adopted to predict the class label efficiently. In our experiments on two American Sign Language (ASL) datasets, we demonstrate that our approach recognizes each sign accurately with only a small number of training samples, and is robust to the change of distance between the hand and the camera.

08:30-09:00, Paper ThPSAT1.41
An Efficient Method for Occluded Face Recognition
Liu, Wentao	Tsinghua Univ.
Xie, Xudong	Tsinghua Univ.
Lam, Kin-Man	The Hong Kong Pol. Univ.
Keywords: Biometrics Abstract: During the last two decades, a series of subspace methods have succeeded in achieving a satisfactory performance for face recognition tasks, but have always failed when partial occlusions occur. This paper combines the subspace techniques with probabilistic models, and aims at achieving invariance to occlusions. The concept underlying the proposed method is that two faces with the same identity, even though one of them is partially occluded, tend to be similar in the uncorrupted areas. The similarity value measured from the error distributions can then be exploited for identification. Experiments show the robustness of this novel method against various kinds of occlusion.

08:30-09:00, Paper ThPSAT1.42
Collaborative PLSA for Multi-View Clustering
Jiang, Yu	Inst. of Automation, Chinese Acad. of Science
Liu, Jing	National Lab. of Pattern Recognition, Inst.
Li, Zechao	Inst. of Automation, Chinese Acad. of Science
Lu, Hanqing	Inst. of Automation, Chinese Acad. of Science
Keywords: Classification and Clustering, Machine Learning and Data Mining Abstract: In the real world, data has multi-view representations from different feature spaces. Multi-view clustering algorithms allow leveraging information from multiple views of the data, and this may substantially improve the clustering result obtained by using a single view. In this paper, we propose a novel algorithm called Collaborative PLSA (C-PLSA) for multi-view clustering, which works on the assumption that the clustering from one view should agree with the clustering from another view. The proposed C-PLSA combines individual PLSA models on two different views, and introduces a regularizer to force both clustering results to agree across the two views. To solve the regularized problem, an alternating optimization algorithm based on generalized EM (GEM) is adopted for maximum likelihood estimation. Experiments on two real-world datasets, i.e., Reuters multilingual text and Corel images, demonstrate the improved performance of our proposed method over some related work.

08:30-09:00, Paper ThPSAT1.43
A Cross-Device Matching Fingerprint Database from Multi-Type Sensors
Jia, Xiaofei	Institute of Automation, Chinese Acad. of Sciences
Yang, Xin	Institute of Automation, Chinese Acad. of Sciences
Zang, Yali	Inst. of Automation, Chinese Acad. of Sciences
Zhang, Ning	Institute of Automation, Chinese Acad. of Sciences
Tian, Jie	Institute of Automation, Chinese Acad. of Sciences
Keywords: Biometrics Abstract: Databases play an important role in evaluating the performance of fingerprint identification algorithms. But which of them can be used to test interoperability? Few databases can test the performance of an algorithm on images acquired by different sensors. To solve this problem, we create the FingerPass cross-device matching fingerprint database, which consists of almost 80 thousand fingerprint images from 90 subjects on nine different fingerprint sensors. Unlike other databases, we take both technology type and interaction type into consideration when choosing the sensors. It can test the interoperability of an algorithm at both the sensor level and the sensor type level, so FingerPass can be used to test the performance of a cross-device matching algorithm for sensors of a specific type or of different types. We apply the VeriFinger fingerprint recognition algorithm to it, and the experimental results indicate that the FingerPass cross-device matching database is a challenge for fingerprint algorithms.

08:30-09:00, Paper ThPSAT1.44
Object Categorization Via Sparse Representation of Local Features
Wang, Jin	Deakin Univ. Australia
Sun, Xiangping	Deakin Univ.
Chen, Ronghua	Deakin Univ.
She, Mary Fenghua	Deakin Univ.
Wang, Qiang	CSR Zhuzhou Inst. Co., Ltd., China
Keywords: Classification and Clustering, Features and Image Descriptors, Image and Video Understanding Abstract: Sparse representation has been introduced into computer vision to address many recognition problems. In this paper, we propose a new framework for object categorization based on sparse representation of local features. Unlike most previous sparse coding based methods in object classification, which only use sparse coding to extract high-level features, the proposed method incorporates sparse representation and classification into a unified framework. Therefore, it does not need a further classifier. Experimental results show that the proposed method achieves accuracy better than or comparable to that of the well-known bag-of-features representation with various classifiers.

08:30-09:00, Paper ThPSAT1.45
Hash-Based Structural Similarity for Semi-Supervised Learning on Attribute Graphs
Hido, Shohei	Preferred Infrastructure
Kashima, Hisashi	The Univ. of Tokyo
Keywords: Machine Learning and Data Mining, Feature Reduction and Manifold Learning, Classification and Clustering Abstract: We present an efficient method to compute similarity between graph nodes by comparing their neighborhood structures rather than proximity. The key is to use a hash for avoiding expensive subgraph comparison. Experiments show that the proposed algorithm performs well in semi-supervised node classification.

08:30-09:00, Paper ThPSAT1.46
Commensurate Dimensionality Reduction for Extended Local Ternary Patterns
Liao, Wen-Hung	National Chengchi Univ.
Keywords: Feature Reduction and Manifold Learning, Classification and Clustering Abstract: We present a systematic approach to reduce the dimensionality of the feature vector for local binary/ternary patterns. The proposed framework examines the distribution of uniform patterns in different image sets to formulate a procedure to assign dimensionality to uniform and non-uniform patterns. Unlike previous methods where all the information from non-uniform patterns is discarded or merged into a single dimension, the proposed commensurate dimensionality reduction (CDR) technique attempts to retain valuable information from all contributory factors. Experiments and comparative analysis have validated the efficacy of the newly defined CDR-ELTP descriptor in terms of noise resistance and texture classification.


ThPSAT2	Multi-Purpose Hall
Poster Shotgun (12): SS	Regular Session

08:30-09:00, Paper ThPSAT2.1
Anomalous Tie Plate Detection for Railroad Inspection
Li, Ying	IBM T. J. Watson Res. Center
Keywords: Image and Video Processing, Machine Learning and Data Mining, Motion, Tracking and Video Analysis Abstract: This paper describes our latest work on identifying anomalous tie plates to automate railroad inspection using machine vision technology. Specifically, we have developed a completely automatic detection scheme to recognize tie plates with anomalous spiking patterns using various video analytics. In particular, each tie plate is first represented by four characteristic regions-of-interest (ROI), then each ROI is fed into a pre-trained SVM (Support Vector Machine) model, and classified to be either spike- or spike hole-related. Next, the dissimilarity between the current tie plate and a reference set of tie plates in a sliding window is measured and analyzed. Based on that, it is finally recognized as either an anomalous or a normal tie plate. Preliminary experiments conducted on a set of videos captured by an imaging system of our own design achieved average precision, recall and false alarm rates of 88%, 92.8% and 2.16%, respectively. This validates the promising direction of applying machine vision technology to assist in railroad inspection.

08:30-09:00, Paper ThPSAT2.2
Query by Humming Via Hierarchical Filters
Guo, Zhiyuan	Beijing Univ. of Posts and Telecommunications
Wang, Qiang	Beijing Univ. of Posts and Telecommunications
Yin, Liang	Beijing Univ. of Posts and Telecommunications
Liu, Gang	Beijing Univ. of Posts and Telecommunications
Guo, Jun	Beijing Univ. of Posts and Telecommunications
Keywords: Multimedia Analysis, Indexing and Retrieval, Speech and Audio Processing, Speech and Audio Analysis Abstract: This paper proposes an effective implementation of a query by humming (QBH) system via hierarchical filters. First, locality sensitive hashing (LSH) is used to screen candidate fragments. Second, linear scaling (LS) is applied to filter out more false candidates, and a new method called linear alignment (LA) is presented to locate accurate boundaries of fragments. Then recursive alignment (RA) is employed for the remaining ones. Finally, the scaling factor (SF) scores are fused with the RA scores to rank the songs. Experiments conducted on a database of 5,000 MIDI files show that the proposed approach achieves a relative improvement in mean reciprocal rank of up to 37.3% compared with the state-of-the-art method.
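
The evaluation measure named in the abstract above, mean reciprocal rank (MRR), and the notion of relative improvement can be sketched as follows. The rank lists are made-up toy values, not data from the paper, and the function names are chosen for illustration.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank, where rank is the 1-based position of the
    correct song in the result list returned for each query."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def relative_improvement(new_score, old_score):
    """Improvement of new_score over old_score as a fraction of old_score."""
    return (new_score - old_score) / old_score

# Hypothetical ranks of the correct song for three queries.
baseline_ranks = [2, 2, 4]
improved_ranks = [1, 2, 4]

baseline_mrr = mean_reciprocal_rank(baseline_ranks)
improved_mrr = mean_reciprocal_rank(improved_ranks)
gain = relative_improvement(improved_mrr, baseline_mrr)
```

With these toy ranks the baseline MRR is 1.25/3, the improved MRR is 1.75/3, and the relative improvement is 0.4, i.e. 40%.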

08:30-09:00, Paper ThPSAT2.3
Smoothness-Constrained Face Photo-Sketch Synthesis Using Sparse Representation
Chang, Liang	Beijing Normal Univ.
Deng, Xiaoming	Inst. of Automation, Chinese Acad. of Sciences
Zhou, Mingquan	Beijing Normal Univ.
Duan, Fuqing	Beijing Normal Univ.
Wu, Zhongke	Beijing Normal Univ.
Keywords: Image and Video Processing Abstract: Face photo-sketch and sketch-photo synthesis have important uses in law enforcement. It is challenging to synthesize face sketches from photos because the drawing techniques and styles of artists' depictions are hard to learn. Synthesizing face photos from sketches is also hard due to its ill-posed nature. In order to avoid the mosaic effects of existing photo-sketch methods, we propose a smoothness-constrained photo-sketch synthesis method via sparse representation. The work is an extension of the previous work. The method is modeled as the minimization of an energy function, a large scale convex optimization problem with l_1-norm constraint. Since previous optimization methods cannot feasibly solve our problem, we propose an iterative optimization approach, which decomposes the large scale optimization into a sequence of small scale optimizations and solves them iteratively to obtain an approximately optimal solution. The same synthesis strategy can also be used to synthesize photos from sketches. Experiments show its effectiveness.

08:30-09:00, Paper ThPSAT2.4
Towards Making Thinning Algorithms Robust against Noise in Sketch Images
Chatbri, Houssem	Univ. of Tsukuba
Kameyama, Keisuke	Univ. of Tsukuba
Keywords: Image and Video Processing, Enhancement, Restoration and Filtering Abstract: We introduce an adaptation framework based on scale space filtering for making thinning algorithms robust against noise in sketch images. The framework takes a sketch image as input, produces a set of Gaussian blurred images of the input sketch and uses a thinning algorithm to produce thinned versions of the blurred images. The algorithm's output is then the thinned image with the best performance measurement. Experiments using the proposed framework embedding state-of-the-art thinning algorithms show robustness against various types of noise.

08:30-09:00, Paper ThPSAT2.5
Tempo Variation Based Multilayer Filters for Query by Humming
Wang, Qiang	Beijing Univ. of Posts and Telecommunications
Guo, Zhiyuan	Beijing Univ. of Posts and Telecommunications
Li, Baoxiang	Beijing Univ. of Posts and Telecommunications
Liu, Gang	Beijing Univ. of Posts and Telecommunications
Guo, Jun	Beijing Univ. of Posts and Telecommunications
Keywords: Speech and Audio Processing, Speech and Audio Analysis, Multimedia Analysis, Indexing and Retrieval Abstract: In this paper we propose a methodology of multilayer filters based on tempo variation for realizing a query by humming (QBH) system. Firstly the original query clip is used to search for the candidate songs. If the results are unreliable, the clip is linearly scaled twice for more candidates. If the results are still unreliable, the clip is scaled more times for retrieval. To sort all the candidates, a new matching algorithm called key transposition recursive alignment (KTRA) is presented, which improves the retrieval accuracy. Experimental results on the 2010 MIREX QBH query corpus show that the proposed method can achieve a relative improvement of 20.9% as well as an acceleration factor of 2.09 simultaneously compared to a state-of-the-art method.

08:30-09:00, Paper ThPSAT2.6
Find Dominant Bins of a Histogram by Sparse Representation
Guo, Xin	Beijing Univ. of Posts and Telecommunications
Zhao, Zhicheng	Beijing Univ. of Posts and Telecommunications
Cai, Anni	Beijing Univ. of Posts and Telecommunications
Keywords: Image and Video Processing, Classification and Clustering, Low-Level Vision Abstract: The bag of words (BoW) method has been widely used for image (feature) representation and has gained great success for its simplicity and effectiveness. However, due to the unsupervised clustering, visual words are treated equally for all classes and are not discriminative for classification. We found that only a few words are activated when samples from one class are sparsely represented over the visual words. Based on this observation, we propose an approach to find the dominant and useful bins in the image histogram for each class with a sparse representation technique. The resulting histogram with only dominant bins then becomes more discriminative for classification. Experiments on three widely used datasets demonstrate superior performance of the proposed approach over the standard BoW method.

08:30-09:00, Paper ThPSAT2.7
Character Extraction in Web Image for Text Recognition
Su, Bolan	National Univ. of Singapore
Lu, Shijian	-
Phan, Trung Quy	National Univ. of Singapore
Tan, Chew-Lim	National Univ. of Singapore
Keywords: Enhancement, Restoration and Filtering, Character and Text Recognition, Classification and Clustering Abstract: Images with text are frequently used on the Internet for different purposes. Automatic recognition of text from web images plays an important role in the extraction and retrieval of web information. However, web images are usually of low resolution with artifacts and special effects, which makes word recognition a challenging task even after the text has been localized. In this paper, we propose a robust text recognition technique to efficiently convert web images into text format. The proposed technique first makes use of L0 norm smoothing to increase the edge contrast of the input web images. The images are then binarized on each color channel. A connected component analysis follows to identify the possible character components. Finally, the character candidates are recognized by the OCR engine after skew correction. Extensive experiments have been conducted on the latest ICDAR 2011 robust reading competition dataset for born-digital text. The experimental results show the superior performance of our proposed technique.

08:30-09:00, Paper ThPSAT2.8
Comparison of Restoration Quality on Square and Hexagonal Grids Using Normalized Convolution
Linnér, Elisabeth	Uppsala Univ.
Strand, Robin	Uppsala Univ.
Keywords: Enhancement, Restoration and Filtering, Image and Video Processing Abstract: Normalized convolution can be used to restore information that has been lost from an image, such as dead pixels, using the remaining information, and ignoring the incorrect pixels. It is known that the representation quality of an image consisting of a given number of pixels depends on how these pixels are distributed. In this paper, we investigate whether the ability to restore information using normalized convolution is affected by the sampling grid of the image. We compare square and hexagonal grids, and find that, in general, more pixels can be restored in hexagonal grids.
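
As an illustration of the restoration idea in the abstract above, here is a minimal sketch of normalized convolution on a square grid only. The 3x3 box kernel, the toy image and the function name are assumptions, and the hexagonal-grid case studied in the paper is not covered. Each output value is a kernel-weighted average of the trusted pixels, with dead pixels given certainty 0 so that they are ignored.

```python
def normalized_convolution(image, certainty, kernel):
    """Normalized convolution on a square grid.

    Each output pixel is sum(kernel * certainty * image) divided by
    sum(kernel * certainty) over the kernel neighbourhood, so pixels
    with certainty 0 do not contribute. The kernel is assumed to be
    square, odd-sized and symmetric.
    """
    rows, cols = len(image), len(image[0])
    radius = len(kernel) // 2
    restored = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            numerator = 0.0
            denominator = 0.0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        weight = kernel[dr + radius][dc + radius] * certainty[rr][cc]
                        numerator += weight * image[rr][cc]
                        denominator += weight
            # Leave the pixel at 0.0 if no trusted neighbour is in reach.
            restored[r][c] = numerator / denominator if denominator > 0 else 0.0
    return restored

# Toy 3x3 image with constant intensity 5.0 and a dead centre pixel.
image = [[5.0, 5.0, 5.0],
         [5.0, 0.0, 5.0],
         [5.0, 5.0, 5.0]]
certainty = [[1.0, 1.0, 1.0],
             [1.0, 0.0, 1.0],
             [1.0, 1.0, 1.0]]
box_kernel = [[1.0, 1.0, 1.0],
              [1.0, 1.0, 1.0],
              [1.0, 1.0, 1.0]]

restored = normalized_convolution(image, certainty, box_kernel)
```

In this toy case the dead centre pixel is restored to 5.0 from its eight trusted neighbours. The sketch filters every pixel; an application might instead replace only the pixels whose certainty is 0.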

08:30-09:00, Paper ThPSAT2.9
Color-Line Vector Field and Local Color Component Decomposition for Smoothing and Denoising of Color Images
Shirai, Keiichiro	Shinshu Univ.
Okuda, Masahiro	The Univ. of Kitakyushu
Ikehara, Masaaki	Keio Univ.
Keywords: Image and Video Processing, Enhancement, Restoration and Filtering, Coding and Compression Abstract: Most conventional smoothing and denoising methods for color images deal with each color channel independently, which results in discolorations due to unbalancing the relation between the color components. In this paper, we propose a smoothing algorithm to reduce discolorations based on "color-lines". Our iterative algorithm consists of a local color decomposition step by color-line vectors and an iterative filtering step. Our numerical simulation shows that the method improves image quality with less discolorations while keeping the smoothing capability.

08:30-09:00, Paper ThPSAT2.10
Blind Image Deblurring Based on Sparse Prior of Dictionary Pair
Li, Haisen	Northwestern Pol. Univ.
Zhang, Yanning	Northwestern Pol. Univ.
Zhang, Haichao	Northwestern Pol. Univ.
Zhu, Yu	Northwestern Pol. Univ.
Sun, Jinqiu	Northwestern Pol. Univ.
Keywords: Image and Video Processing, Enhancement, Restoration and Filtering, Statistical, Syntactic and Structural Pattern Recognition Abstract: Blind image deblurring, which aims at obtaining the sharp image from a blurred one, is a widespread problem in image processing. Traditional image deblurring methods always use deconvolution to remove the blur kernel's effect. However, deconvolution is so sensitive to noise that artifacts inevitably appear in the deblurring results, even when regularity terms are introduced as constraints. In this paper, we propose a novel blind image deblurring method based on the sparse prior of dictionary pair, estimating the sparse coefficient, sharp image and blur kernel alternately. The proposed method could avoid the deconvolution problem, which is an ill-posed problem, and obtain the result with fewer artifacts. Compared with the state-of-the-art method, experimental results demonstrate that the proposed method could obtain better performance.

08:30-09:00, Paper ThPSAT2.11
Patch-Based Image Colorization
Bugeau, Aurélie	Univ. Bordeaux, Lab.
Ta, Vinh-Thong	Univ. Bordeaux
Keywords: Enhancement, Restoration and Filtering, Computational Photography, Image and Video Processing Abstract: Image colorization consists in adding colors to grayscale images. Two approaches are mainly used: the first one consists in using manually pre-defined color inputs while the second considers an entire colored image as a color example to transfer. The work presented here is in the second category. Indeed, we propose a simple patch-based image colorization based on an input image as a color example. First, we introduce a general colorization model into which many methods from the literature can be cast. Second, we describe our method which is based on patch descriptors of luminance features and a color prediction model with a general distance selection strategy. We also propose to perform a Total Variation (TV) regularization on the colorized image to ensure the spatial color coherency of the final result. Finally, experiments show the potential of our approach to automatically colorize grayscale images. Comparisons with methods from the literature are also provided.

08:30-09:00, Paper ThPSAT2.12
Collaborative and Compressive High-Resolution Imaging
Zhang, Yanning	Northwestern Pol. Univ.
Zhang, Haichao	Northwestern Pol. Univ.
Huang, Thomas	Univ. of Illinois at Urbana-Champaign
Keywords: Image and Video Processing, Enhancement, Restoration and Filtering, Computational Photography Abstract: We present a novel collaborative and compressive high-resolution image acquisition method in this paper. The proposed approach acquires several coded low resolution observations via the designed image formation process. The imaging process is achieved via random convolution followed by subsampling, which is practical for hardware implementation. The latent high resolution image is then recovered via a joint optimization scheme in a collaborative manner. An efficient optimization algorithm is developed for recovering the underlying high-resolution image. Experimental results compared with several related imaging schemes have clearly demonstrated the effectiveness of the proposed method.

08:30-09:00, Paper ThPSAT2.13
Hamiltonian Monte Carlo Estimator for Abrupt Motion Tracking
Wang, Fasheng	Dalian Maritime Univ.
Lu, Mingyu	Dalian Maritime Univ.
Keywords: Image and Video Processing, Image and Video Understanding Abstract: In this paper, we propose a Hamiltonian Markov Chain Monte Carlo based tracking algorithm for abrupt motion tracking within the Bayesian filtering framework. In this tracking scheme, the object states are augmented by introducing a momentum item and the Hamiltonian Dynamics (HD) is integrated into the traditional MCMC based tracking method. The HD has some excellent properties which are crucial in constructing MCMC updates. A new object state is proposed through constructing a trajectory according to HD, implemented using the Leapfrog method. The state proposed by the HD can keep a certain distance from the current object state but nevertheless have a high acceptance probability, which consequently bypasses the slow exploration of the state space suffered by the traditional random-walk proposal distribution. In addition, the proposed tracking algorithm can avoid being trapped in local maxima, a problem suffered by conventional MCMC based tracking algorithms. Experimental results reveal that our approach is efficient and effective in handling various types of abrupt motions compared to several alternatives.
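The Leapfrog proposal mentioned in this abstract can be sketched generically as follows. This is a minimal illustration of leapfrog integration and the usual Metropolis acceptance probability for a one-dimensional state, assuming a toy potential U(q) = q²/2 and unit mass; the function names, step size and number of steps are hypothetical and are not taken from the paper, whose state space and target density are those of the tracker.

```python
import math


def leapfrog(q, p, grad_u, step, n_steps):
    """Simulate Hamiltonian dynamics with half/full/half momentum updates."""
    p = p - 0.5 * step * grad_u(q)          # initial half step for the momentum
    for i in range(n_steps):
        q = q + step * p                    # full step for the position
        if i < n_steps - 1:
            p = p - step * grad_u(q)        # full momentum step between position steps
    p = p - 0.5 * step * grad_u(q)          # final half step for the momentum
    return q, p


def acceptance_probability(u, q0, p0, q1, p1):
    """Metropolis acceptance probability min(1, exp(H(q0, p0) - H(q1, p1)))."""
    h0 = u(q0) + 0.5 * p0 * p0
    h1 = u(q1) + 0.5 * p1 * p1
    return min(1.0, math.exp(h0 - h1))


def u(q):
    return 0.5 * q * q                      # toy potential (assumed): standard Gaussian target


def grad_u(q):
    return q


q_new, p_new = leapfrog(1.0, 0.5, grad_u, step=0.1, n_steps=20)
accept = acceptance_probability(u, 1.0, 0.5, q_new, p_new)
```

Because the integrator nearly conserves the Hamiltonian, the proposed state can lie far from the starting state while the acceptance probability stays close to 1, which is the property the abstract relies on.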

08:30-09:00, Paper ThPSAT2.14
Depth Image Enhancement for Kinect Using Region Growing and Bilateral Filter
Chen, Li	Hunan Univ.
Hui, Lin	Hunan Univ.
Li, Shutao	Hunan Univ.
Keywords: Image and Video Processing, Enhancement, Restoration and Filtering Abstract: Microsoft's Kinect as a recent 3D sensor has attracted considerable research attention in the fields of computer vision and pattern recognition. But its depth image suffers from the problem of poor accuracy caused by invalid pixels, noise and unmatched edges. In this paper, an efficient approach is proposed to improve the quality of Kinect's depth image. Using its corresponding color image, the pixels with wrong depth values are detected and removed using a region growing method. To accurately estimate the values of invalid pixels, a joint bilateral filter is used to fill the holes. Considering the special noise property of Kinect sensor, an adaptive bilateral filter is proposed to effectively reduce the noise of the depth image. Experimental results show that the proposed method significantly improves the quality of depth image by successfully filling the holes, eliminating the unmatched edges and reducing the noise.

08:30-09:00, Paper ThPSAT2.15
Multi-Modality Movie Scene Detection Using Kernel Canonical Correlation Analysis
Gao, Guangyu	Beijing Univ. of Posts and Telecommunications
Ma, Huadong	Beijing Univ. of Posts and Telecommunications
Keywords: Image and Video Processing, Multimedia Analysis, Indexing and Retrieval, Detection, Separation and Segmentation Abstract: Scene detection is the fundamental step for efficiently accessing and browsing videos. In this paper, we propose to segment a movie into scenes using fused visual and audio features. The movie is first segmented into shots by an accelerating algorithm, and the key frames are extracted later. While feature movies are often filmed in open and dynamic environments using moving cameras and have continuously changing contents, we focus on the association extraction of visual and audio features. Then, based on the Kernel Canonical Correlation Analysis (KCCA), all these features are fused for scene detection. Finally, spatial-temporal coherent shots construct the similarity graph which is partitioned to generate the scene boundaries. We conduct extensive experiments on several movies, and the results show that our approach can efficiently detect the scene boundaries with a satisfactory performance.

08:30-09:00, Paper ThPSAT2.16
Descriptor Correlation Analysis for Remote Sensing Image Multi-Scale Classification
dos Santos, Jefersson Alex	Univ. of Campinas
Faria, Fábio Augusto	Univ. of Campinas
Torres, Ricardo	Inst. of Computing, Univ. of Campinas
Rocha, Anderson	Univ. of Campinas
Gosselin, Philippe Henri	CNRS
Philipp-Foliguet, Sylvie	ENSEA/UCP/CNRS
Falcao, Alexandre Xavier	State Univ. of Campinas
Keywords: Remote Sensing, Features and Image Descriptors, Classification and Clustering Abstract: This paper addresses the problem of multi-scale classification of remote sensing images by: (i) showing that using multiple scales improves classification results, but not all scales have the same importance; (ii) showing that image descriptors do not offer the same contribution at all scales and some of them are very correlated; (iii) introducing a simple approach to automatically select segmentation scales, descriptors, and classifiers based on correlation and accuracy analysis.

08:30-09:00, Paper ThPSAT2.17
Bayesian Image Enlargement for Mixed-Resolution Video
Tian, Jing	Wuhan Univ. of Science and Tech.
Chen, Li	Wuhan Univ. of Science and Tech.
Keywords: Enhancement, Restoration and Filtering, Image and Video Processing, Image and Video Understanding Abstract: Many scalable video compression techniques utilize a mixed-resolution scheme, which down-samples some frames at the encoder to be reduced-resolution frames while keeping resolutions of other frames unchanged as full resolutions, in order to achieve higher compression gain. Image enlargement technique is required at the decoder to recover the original full-resolution frames for this mixed-resolution video system setup. This paper proposes a Bayesian approach to enlarge the reduced-resolution frame via its maximum a posterior estimation, using the information from the observed reduced-resolution frame, plus more detailed information extracted from available neighboring frames in full resolution. Experiments are conducted to demonstrate the superior performance of the proposed approach.

08:30-09:00, Paper ThPSAT2.18
An Improved Surround Suppression Model Based on Orientation Contrast for Boundary Detection
Zhang, Hui	Beijing Jiaotong Univ.
Xie, Bojun	Beijing JiaoTong Univ.
Yu, Jian	Beijing Jiaotong Univ.
Keywords: Detection, Separation and Segmentation Abstract: This paper proposes an unsupervised bottom-up boundary detection algorithm, which is an improved surround suppression model based on orientation contrast. First, the candidate boundary set is obtained by the edge focusing algorithm. Second, the orientation contrast map is constructed using the response of Gabor filter. The suppression term is computed on orientation contrast map using steerable filter, which can effectively differentiate step edge from texture edge. Using low-level image features, the boundary map can be used as preprocessing step for image segmentation and/or object detection. The detection approach has been validated on Rug dataset and the average of figure of merit shows an improvement of 15%.

08:30-09:00, Paper ThPSAT2.19
Improving Texture Description in Remote Sensing Image Multi-Scale Classification Tasks by Using Visual Words
dos Santos, Jefersson Alex	Univ. of Campinas
Penatti, Otávio	Univ. of Campinas
Torres, Ricardo	Inst. of Computing, Univ. of Campinas
Gosselin, Philippe Henri	CNRS
Philipp-Foliguet, Sylvie	ENSEA/UCP/CNRS
Falcao, Alexandre Xavier	State Univ. of Campinas
Keywords: Remote Sensing, Features and Image Descriptors, Segmentation, Color and Texture Abstract: Although texture features are important for region-based classification of remote sensing images, the literature shows that texture descriptors usually have poor performance when compared and combined with color descriptors. In this paper, we propose a bag-of-visual-words (BOW) "propagation" approach to extract texture features from a hierarchy of regions. This strategy improves the efficacy of the features as it encodes texture information independently of the region shape. Experiments show that the proposed approach improves the classification results when compared with global descriptors using the bounding box padding strategy.

08:30-09:00, Paper ThPSAT2.20
A Splitting Algorithm for Directional Regularization and Sparsification
Rakêt, Lars Lau	Univ. of Copenhagen
Nielsen, Mads	Univ. of Copenhagen
Keywords: Image and Video Processing, Enhancement, Restoration and Filtering, Segmentation, Color and Texture Abstract: We present a new split-type algorithm for the minimization of a p-harmonic energy with added data fidelity term. The half-quadratic splitting reduces the original problem to two straightforward problems, that can be minimized efficiently. The minimizers to the two sub-problems can typically be computed pointwise and are easily implemented on massively parallel processors. Furthermore the splitting method allows for the computation of solutions to a large number of more advanced directional regularization problems. In particular we are able to handle robust, non-convex data terms, and to define a 0-harmonic regularization energy where we sparsify directions by means of an L0 norm.

08:30-09:00, Paper ThPSAT2.21
A New Algorithm for Labeling Connected-Components and Calculating the Euler Number, Connected-Component Number, and Hole Number
He, Lifeng	Aichi Prefectural Univ.
Chao, Yuyan	Nagoya Sangyo Univ.
Suzuki, Kenji	The Univ. of Chicago
Keywords: Image and Video Processing, Features and Image Descriptors, Image and Video Understanding Abstract: Labeling connected components and calculating the Euler number, connected-component number, and hole number in a binary image are usually necessary for image analysis, pattern recognition, and computer (robot) vision. This paper presents a new algorithm for calculating the Euler number, connected-component number, and hole number in a binary image by labeling connected components in the binary image. The experimental results demonstrated that our algorithm is more efficient than conventional algorithms.
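To make the three quantities in this abstract concrete, the sketch below computes them with a plain breadth-first flood fill, which is not the paper's algorithm. It assumes 8-connectivity for foreground components and 4-connectivity for background holes, and uses the relation Euler number = components minus holes; the helper names are hypothetical.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_components(grid, value, neighbours):
    """Count connected regions of cells equal to `value` using breadth-first search."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in neighbours:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == value and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def euler_number(image):
    """Return (components, holes, components - holes) for a binary image of 0s and 1s."""
    cols = len(image[0])
    # Pad with background so that all outer background forms a single region.
    padded = ([[0] * (cols + 2)]
              + [[0] + list(row) + [0] for row in image]
              + [[0] * (cols + 2)])
    components = count_components(padded, 1, N8)
    holes = count_components(padded, 0, N4) - 1   # subtract the outer background
    return components, holes, components - holes


ring = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
ring_result = euler_number(ring)
two_dots = [[1, 0, 1]]
dots_result = euler_number(two_dots)
```

This two-pass approach visits every pixel twice; the abstract's claim is precisely that the quantities can be obtained more efficiently during labeling.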

08:30-09:00, Paper ThPSAT2.22
Quality Metrics for Practical Face Recognition
Abaza, Ayman	West Virginia High Tech. Consortium Foundation
Bourlai, Thirimachos	WVU
Harrison, Mary Ann	West Virginia High Tech. Consortium Foundation
Keywords: Image and Video Processing, Biometrics, Enhancement, Restoration and Filtering Abstract: In biometric studies, quality evaluation of input data is very important, and has proven to have a direct relation with system performance. Quality measures can provide real-time feedback to reduce the number of poor quality submissions to the system. Another benefit is that they can predict and improve the authentication performance (e.g., by using quality-dependent thresholds). The main focus of this paper is image quality assessment for face recognition. First, we evaluate a number of techniques that measure image quality factors, namely contrast, brightness, focus, sharpness, and illumination. Second, via a set of experiments measuring the sensitivity of each metric to quality change, we select the most practical measure(s) for each quality factor. Finally, we propose a novel face image quality index (FQI) that combines the five aforementioned quality factors. Via a set of statistical significance tests, we illustrate and support that FQI is a promising quality measure that can be used as an alternative to some benchmark face image quality measures.

08:30-09:00, Paper ThPSAT2.23
Video Storyboard Design Using Delaunay Graphs
Chowdhury, Ananda	Jadavpur Univ.
Kuanar, Sanjay	Jadavpur Univ.
Panda, Rameswar	Jadavpur Univ.
Das, Moloy	Jadavpur Univ.
Keywords: Image and Video Processing, Multimedia Analysis, Indexing and Retrieval, Motion, Tracking and Video Analysis Abstract: Design of video storyboards has emerged as a popular research area in the multimedia community. Different pattern clustering techniques are applied to extract the key frames from a video sequence to form a storyboard. In this paper, we propose an automatic method for the selection of key frames of a video sequence using Delaunay graphs. We prune certain edges from the Delaunay graph using an iterative strategy where overall reduction in the global standard deviation of edge lengths is maximized. Resulting connected components in the graph correspond to the separate clusters. The proposed algorithm also utilizes edge information in addition to the color histogram information to achieve semantic dependency between different video frames. Performance of our algorithm is evaluated using Fidelity, Shot Reconstruction Degree and Compression Ratio. Experiments on standard video datasets indicate the supremacy of the proposed method over a previous Delaunay clustering-based key frame extraction algorithm.

08:30-09:00, Paper ThPSAT2.24
Cluster-Based Vector-Attribute Filtering for CT and MRI Enhancement
Kiwanuka, Fred Noah	Univ. of Groningen
Wilkinson, Michael H.F.	Univ. of Groningen
Keywords: Image and Video Processing, Classification and Clustering, Medical Image Analysis and Registration Abstract: Morphological attribute filters modify images based on properties or attributes of connected components. Usually, attribute filtering is based on a scalar property which has relatively little discriminating power. Vector-attribute filtering allows better description of characteristic features for 2D images. In this paper, we extend vector attribute filtering by incorporating unsupervised pattern recognition, where connected components are clustered based on the similarity of feature vectors. We show that the performance of these new filters is better than that of scalar attribute filters in enhancement of objects in medical volumes.

08:30-09:00, Paper ThPSAT2.25
Fast and Efficient Multichannel Image Completion Using Local Similarity
Shenoda, Sameh	Univ. Teknologi Petronas
Faye, Ibrahima	Univ. Teknologi Petronas
Rohaya, Dayang	Univ. Teknologi Petronas
Keywords: Enhancement, Restoration and Filtering, Image and Video Processing Abstract: Reconstruction and repairing of missing parts or scratches of digital historical images is an important trend which has been extensively used in artwork restoration. Image completion is an active subject in image and video processing, which deals with the recovery of original data. Most previous image completion techniques spend considerable time on an extensive search to find the best texture to repair the damaged area. In addition to that, visual artifacts appear when the damaged area is large. In this paper, we present a fast texture synthesis and image completion method without the extensive searching process. The proposed method is based on the local similarity of the natural image and two-sided hole filling for completion. The method is fast and gives high quality results compared to other existing methods. It reduces the time from hundreds of seconds to a few microseconds and is able to repair a large damaged area without shadow.

08:30-09:00, Paper ThPSAT2.26
Human Action Recognition by Bagging Data Dependent Representation
Zhou, Wen	Inst. of automation, Chinese Acad. of sciences
Wang, Chunheng	Inst. of Automation Chinese Acad. of Sciences
Xiao, Baihua	Inst. of Automation, Chinese Acad. of Sciences
Zhang, Zhong	Inst. of Automation, Chinese Acad. of Sciences
Ma, Long	Inst. of Automation, Chinese Acad. of Sciences, China
Keywords: Image and Video Understanding, Pattern Recognition for Surveillance and Security Abstract: Traditional methods based on bag-of-word representation cannot handle the problem when a test distribution differs from the training distribution. In this paper, we propose a novel method for human action recognition by bagging data dependent representation. Different from traditional methods, the proposed method represents each video by several histograms. These histograms are obtained by bagging from primitive features several times according to an estimated prior in both training and testing. This prior reflects the training distribution. There are two advantages of the proposed method. First, it alleviates the distribution difference between training set and test set. Second, the bagging operation reduces noise and improves the performance significantly. Experimental results demonstrate the effectiveness of the proposed method.

08:30-09:00, Paper ThPSAT2.27
A Fast Wavelet-Packet-Based Algorithm for Texture Synthesis
Hsin, Hsi-Chin	National United Univ.
Sung, Tze-Yun	Chung Hua Univ.
Ko, Lu-Ting	National Central Univ.
Keywords: Image and Video Processing, Segmentation, Color and Texture, Image and Video Understanding Abstract: We propose a fast texture synthesis algorithm based on wavelet packet transform. It decomposes the input image into wavelet packet coefficients, then a 2-step matching, specifically coarse matching based on low frequency wavelet packet coefficients followed by fine matching based on high frequency wavelet packet coefficients, is used for the texture synthesis task. Experimental results show that the proposed algorithm is preferable in terms of computation time.

08:30-09:00, Paper ThPSAT2.28
Fast Image Super Resolution Via Local Regression
Gu, Shuhang	Huazhong Univ. of Science&Tech. Wuhan , PR China
Sang, Nong	Huazhong Univ. of Science and Tech.
Ma, Fan	Huazhong Univ. of Science&Tech. Wuhan , PR China
Keywords: Image and Video Processing, Enhancement, Restoration and Filtering Abstract: In this paper, we propose a super resolution method based on linear regression in different middle-frequency texture categories. We benefit from the hypothesis that the mapping from middle-frequency manifold to high-frequency manifold is similar locally, and use a simple linear regression method to learn mapping functions in different areas of the middle-frequency manifold. Different from previous works, our method only uses the database to learn the mapping functions in different categories in the training phase; we then only need to save these mapping functions instead of a huge external database to get the missing details. We tested our algorithm on different images and compared the SR results with other methods. It takes about 0.3 seconds to upscale an image of 256-by-256 pixels by a factor of 2 along each axis using a single core of a Core i5 CPU laptop. Compared with existing SR methods, our method can save a lot of running time as well as generate high quality SR results (comparable with state-of-the-art methods). Furthermore, our method predicts the missing high frequency information pixel by pixel, so it is trivially parallel and can be conveniently optimized for speed.

08:30-09:00, Paper ThPSAT2.29
Moving Objects Segmentation from Compressed Surveillance Video Based on Motion Estimation
Xie, Danfeng	Chinese Acad. of Sciences R&D Center for Internet of Things
Wang, Shizheng	Chinese Acad. of Sciences
Keywords: Image and Video Processing, Image and Video Understanding, Multimedia Analysis, Indexing and Retrieval Abstract: Pixel-domain analysis, the mainstream approach to analyze surveillance video, has always been a hot issue in academia and industry. However, with the increasing volume and resolution of surveillance video, the flexibility and efficiency of fast processing is garnering more significance. Under this circumstance, surveillance video analysis in the compressed domain is indeed of strategic importance from the angle of balancing visual perception and processing speed, especially in modeling background and segmenting moving objects. Therefore, a compressed domain based scheme is proposed to model background and segment moving objects based on Motion Estimation (ME) in this paper. The main work and achievements are as follows: 1) a background modeling method with Motion Vector (MV) based on ME is applied to the compressed domain; 2) a method of region modification for Moving Objects Segmentation based on ME is proposed. Experimental results show that our approach can realize moving objects extraction in a fast and effective way.

08:30-09:00, Paper ThPSAT2.30
A Puppet Interface for the Development of an Intuitive Computer Animation System
Narukawa, Hiroki	The Univ. of Tokyo
Pantuwong, Natapon	The Univ. of Tokyo
Sugimoto, Masanori	The Univ. of Tokyo
Keywords: Image and Video Processing, Vision for Graphics, Motion, Tracking and Video Analysis Abstract: In this paper, we introduce a puppet interface for the development of an intuitive animation system. Our puppet interface does not require any special devices and any type of puppet can be used. The puppet interface is developed by attaching ten visible markers onto a puppet. The user can manipulate the pose of the puppet interface to produce the desired motion in front of a camera. The puppet's joint angle information is captured by the camera and used to retrieve a suitable motion from a motion database. The retrieved motion data is transferred to a 3D character model to generate an animation. To avoid the occlusion problem, we propose an algorithm that estimates the joint angle by determining the position and rotation of markers adjacent to the occluded marker. Experiments confirmed that our proposed algorithm can generate correct results, although some occluded markers remained.

08:30-09:00, Paper ThPSAT2.31
Activity Detection in the Wild Using Video Metadata
McCloskey, Scott	McGill Univ. Honeywell
Davalos, Pedro	Honeywell
Keywords: Multimedia Analysis, Indexing and Retrieval, Motion, Tracking and Video Analysis, Scene Understanding Abstract: We use video metadata to perform activity detection from videos in the wild, particularly the TRECVID dataset. Unlike previous activity datasets (KTH, Weizmann, UCF sports, etc.), this test set is assembled from videos captured with a wide range of cameras, resulting in videos with different frame rates, audio/video bitrates, and resolutions. Because these measures correlate with the quality of the camera, and because different camera hardware may be used to capture different events (e.g., people likely bring nicer cameras to weddings than on fishing trips), we expect that usable correlations exist between metadata and events. Using SVM-based classification of a feature vector of metadata features, we demonstrate that such correlations do exist. While the performance of this method is worse than that of traditional visual features, we demonstrate that metadata features complement such approaches using score fusion.

08:30-09:00, Paper ThPSAT2.32
Accurate Genomic Signal Recovery Using Compressed Sensing
Uddin, Bakhtiyar	Univ. of Texas
Celebi, Emre M.	Louisiana State Univ.
Kingravi, Hassan	Georgia Inst. of Tech.
Schaefer, Gerald	Loughborough Univ.
Keywords: Detection, Separation and Segmentation, Coding and Compression, Pattern Recognition for Bioinformatics Abstract: Microarrays are massively parallel biosensors that can simultaneously detect and quantify a large number of different genomic particles. A DNA microarray is a nucleic acid-based microarray that contains probe spots testing a multitude of targets in one experiment. Ideas from compressive sensing have been utilized in different ways in the analysis of DNA microarrays. One of the proposed methods is compressed microarrays, where each spot contains copies of several probes and the total number of spots is lower, resulting in significantly reduced costs due to cheaper array manufacturing. In this paper, we perform compressed microarray experiments with real aCGH data and demonstrate the accuracy of various recovery methods. Our experimental results suggest that the measurements that can be captured by compressed microarrays can be recovered accurately using the proposed norm-minimization methods.

08:30-09:00, Paper ThPSAT2.33
Improvements of Dynamic Texture Synthesis for Video Coding
Hou, Zhiqiang	Wuhan Univ.
Hu, Ruimin	National Engineering Res. Center for MultimediaSoftware, Wuh
Wang, Zhongyuan	National Engineering Res. Center for MultimediaSoftware, Wuh
Han, Zhen	Wuhan Univ.
Keywords: Coding and Compression Abstract: We describe an algorithm for dynamic texture synthesis of video sequences of frames exhibiting certain stationary properties over time, such as sea-waves, whirlwind or moving crowds. The algorithm is based on taking into account the similarity among reference images in the video inter-frame coding. It allowed us to better express the time-varying relationship of the dynamic texture and to extend the algorithm described in [9].

08:30-09:00, Paper ThPSAT2.34
Video Stabilization Based on High Degree B-Spline Smoothing
Wang, Yue	(1) Inst. for Infocomm Res. Univ.
Chang, Richard	Inst. for Infocomm Res.
Chua, Teck Wee	Inst. for Infocomm Res.
Leman, Karianto	Inst. for Infocomm Res.
Pham, Nam Trung	Inst. for Infocomm Res.
Keywords: Image and Video Processing, Pattern Recognition for Surveillance and Security, Motion, Tracking and Video Analysis Abstract: Unmanned Aerial Vehicles (UAV) have become widely used as they present many advantages for surveillance applications. However, most computer vision algorithms cannot be applied to the video sequences from UAV due to video shaking or blurredness. We propose a method to stabilize the image sequence from UAV cameras. This approach consists of three steps: a keypoint detection step, then a homography estimation, and finally a motion compensation using high degree B-spline smoothing. Experimental results from real videos show that our method can run in real time and provide good performance.

08:30-09:00, Paper ThPSAT2.35
Battle-Lemarie Wavelet Pyramid for Improved GSM Image Denoising
Mudugamuwa, Damith J.	UTS
He, Xiangjian	Univ. of Tech. Sydney
Jia, Wenjing	Univ. of Tech. Sydney
Keywords: Enhancement, Restoration and Filtering, Image and Video Processing, Coding and Compression Abstract: Removing noise from a digital image is a challenging problem. Application of Gaussian Scale Mixtures (GSM) in the wavelet domain has been reported to be one of the most effective denoising algorithms, published to date. The performance of this algorithm depends on the chosen wavelet representation. In this paper, we introduce an improved wavelet pyramid representation based on the Battle-Lemarie wavelet which favors the GSM denoising performance. We present the experimental denoising results using the proposed pyramid representation, and they outperform state-of-the-art GSM denoising results reported in the literature.

08:30-09:00, Paper ThPSAT2.36
Fast Computation of Orthogonal Polar Harmonic Transforms
Hoang, Thai V.	INRIA
Tabbone, Salvatore	Univ. Lorraine
Keywords: Image and Video Processing, Features and Image Descriptors Abstract: This paper presents a method for the computation of polar harmonic transforms that is fast and efficient. The method is based on the inherent recurrence relations among harmonic functions that are used in the definition of the radial and angular kernels of the transforms. The employment of these relations leads to recursive strategies for fast computation of harmonic function-based kernels. Polar harmonic transforms were recently proposed and have shown nice properties for image representation and pattern recognition. The proposed method is 10 times faster than direct computation and five times faster than fast computation of Zernike moments.

08:30-09:00, Paper ThPSAT2.37
Made to Measure Top-Hats
Meyer, Fernand	Mines-ParisTech
Keywords: Image and Video Processing, Detection, Separation and Segmentation, Enhancement, Restoration and Filtering Abstract: Morphological filters are robust simplification filters of images as they are increasing and idempotent. Among them, openings suppress bright details and closings black details. Subtracting from an image its opening recovers bright details; subtracting an image from its closing recovers dark details. Both residue operators are called top-hats. A gray tone image may be seen as a topographic surface. Its flooding constitutes a particular closing which simplifies the surface by creating lakes and at the same time preserves the contours of the unflooded parts. The highest flooding of an image f below an image h≥f is of particular interest. By an adequate choice of h, some catchment basins of f remain unflooded, others are flooded but not completely filled, the last ones being completely filled by a lake. The paper shows how 2 successive floodings may be used for a tailored filtering of images or a tailored extraction of image structures. The first constraining function is designed so as to highlight particular wells of f, which will not be completely filled by lakes after the first flooding. The lakes created by this first constrained flooding are analysed and classified in two classes: a class to remain dry and a class to be completely filled by a lake after the second flooding. A second constraining function is created, equal to f on all minima which should remain dry and equal to the maximal gray tone everywhere else. The result of this second flooding is a tailored filtering of the initial image and its residue a tailored top-hat of its black details. The sequence of dual operators, based upon razings (the dual operator of floodings, obtained by flooding the negative of an image and taking the negative of the result), makes it possible to extract bright details.
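The two classical top-hats defined at the start of this abstract (image minus its opening, and closing minus the image) can be sketched in a few lines. The example below is a 1-D illustration only, assuming a flat structuring element of a given radius with windows truncated at the borders; the function names are hypothetical, and the tailored, flooding-based top-hats that the paper actually proposes are not implemented here.

```python
def erode(signal, radius):
    """Flat erosion: minimum over a window of the given radius (truncated at borders)."""
    n = len(signal)
    return [min(signal[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def dilate(signal, radius):
    """Flat dilation: maximum over a window of the given radius (truncated at borders)."""
    n = len(signal)
    return [max(signal[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def opening(signal, radius):
    return dilate(erode(signal, radius), radius)    # removes narrow bright details


def closing(signal, radius):
    return erode(dilate(signal, radius), radius)    # removes narrow dark details


def white_top_hat(signal, radius):
    """Image minus its opening: keeps the bright details the opening removed."""
    return [a - b for a, b in zip(signal, opening(signal, radius))]


def black_top_hat(signal, radius):
    """Closing minus the image: keeps the dark details the closing filled in."""
    return [b - a for a, b in zip(signal, closing(signal, radius))]


peak = [1, 1, 5, 1, 1]      # one narrow bright detail
pit = [4, 4, 0, 4, 4]       # one narrow dark detail
bright = white_top_hat(peak, 1)
dark = black_top_hat(pit, 1)
```

With radius 1, the narrow peak survives only in the white top-hat and the narrow pit only in the black top-hat, which matches the description of openings suppressing bright details and closings suppressing dark ones.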

08:30-09:00, Paper ThPSAT2.38
Real-Time Disparity Estimation Using Line-Wise Hybrid Recursive Matching and Cross-Bilateral Median Up-Sampling
Riechert, Christian	Fraunhofer Inst. for Telecommunications - Heinrich Hertz Inst.
Zilly, Frederik Leonhard	Fraunhofer Heinrich Hertz Inst.
Müller, Marcus	Fraunhofer Inst. for Telecommunications - Heinrich Hertz Ins
Kauff, Peter	Fraunhofer Inst. for Telecommunications - Heinrich Hertz Ins
Keywords: Image and Video Processing, Stereo and Image-Based Modeling Abstract: In this paper a combination of an initial disparity estimation using the line-wise hybrid recursive matcher and a subsequent post-processing and up-sampling step using variations of cross-bilateral filtering is presented. The proposed algorithm is real-time capable for image resolutions up to HD and scales well with large disparity ranges. It is specifically designed to allow for a high degree of parallelization and for temporally consistent disparity maps for use in video processing. In terms of quality the proposed method can compete with most recent real-time or near-real-time capable disparity estimators.

08:30-09:00, Paper ThPSAT2.39
Background Subtraction Via Early Recurrence in Dynamic Scenes
Shi, Xun	York Univ.
Tsotsos, John	York Univ.
Keywords: Image and Video Processing, Low-Level Vision, Enhancement, Restoration and Filtering Abstract: A biologically motivated model of background subtraction is proposed. The two-step computation borrows the idea from the low-level inhibitive processing of the two-pathway primate visual system. A spatiotemporal representation consistent with the dorsal pathway is computed and refined via center-surround inhibition. This representation catches perceptually salient foreground regions, and is further used to inhibit fine-scale visual features that are confined to the ventral pathway, leading to a high-spatially-accurate representation containing mostly foreground pixels. Output of our work is attached to a state-of-the-art visual saliency model. Results using real dynamic scenes are compared with ground truth, which confirmed that our early recurrent processing can effectively remove background.

08:30-09:00, Paper ThPSAT2.40
Optic Disc Localization by Projection with Vessel Distribution and Appearance Characteristics
Zhang, Dongbo	Xiangtan Univ.
Yi, Yao	Coll. of Information Engineering Xiangtan Univ.
Shang, Xingyu	Xiangtan Univ.
Peng, Yinghui	Xiangtan Univ.
Keywords: Detection, Separation and Segmentation, Medical Image Analysis and Registration Abstract: Considering the vessel distribution and optic disc (OD) appearance characteristics comprehensively, a novel OD localization method based on 1-D projection is proposed. The horizontal location is determined by vascular scatter degree, an evaluation index of vessel distribution. And the vertical location is found by brightness and edge gradient around OD. The proposed method was tested on four publicly-available databases and a self-selection database. The OD was successfully located in 357 images out of 380 images (94%). And the proposed method shows good robustness in both normal and diseased images.

08:30-09:00, Paper ThPSAT2.41
Compression of GPS Trajectories Using Optimized Approximation
Chen, Minjie	Univ. of Eastern Finland
Xu, Mantao	Carestream Health Corp. Shanghai, China
Fränti, Pasi	Univ. of Eastern Finland
Keywords: Coding and Compression Abstract: A large number of GPS trajectories, which include users' spatial and temporal information, have been collected by geo-positioning mobile phones in recent years. The massive volumes of trajectory data place heavy burdens on both network transmission and data storage. To overcome these difficulties, the GPS trajectory compression algorithm (GTC) was proposed recently; it optimizes both the data reduction by trajectory simplification and the coding procedure using the quantized data. In this paper, instead of using the greedy solution of the GTC algorithm, the approximation process is optimized jointly with the encoding step via dynamic programming. In addition, Bayes' theorem is applied to improve the robustness of probability estimation for encoded values. The proposed solution has the same time complexity as the GTC algorithm in the decoding procedure, and experimental results show that its bit-rate is around 80% of that of the GTC algorithm.

08:30-09:00, Paper ThPSAT2.42
A Genetic Algorithm Based Approach for Combining Binary Image Operators
Dornelles, Marta Magda	Univ. Estadual de Santa Cruz
Hirata, Nina S. T.	Univ. of São Paulo
Keywords: Image and Video Processing, Machine Learning and Data Mining, Classification and Clustering Abstract: Combining several binary image operators, each one based on different windows, has proven to be an effective way to produce operators with better performance than designing single operators based on one window only. To facilitate the combination task that so far is done manually, we propose a genetic algorithm (GA) based approach. It consists of the definition of a collection of candidate windows and the use of a GA to select a subset of them that will determine the operators to be combined. Experimental results show that the proposed GA based approach produces combinations that are consistently better than those obtained manually, and indicate that the proposed window collections do contain relevant windows.

08:30-09:00, Paper ThPSAT2.43
Fast JPEG Image Retrieval Using Optimised Huffman Tables
Edmundson, David	Loughborough Univ.
Schaefer, Gerald	Loughborough Univ.
Keywords: Multimedia Analysis, Indexing and Retrieval, Features and Image Descriptors Abstract: With image databases expanding rapidly, fast retrieval solutions are highly sought after. Since most images are compressed in JPEG format, compressed-domain retrieval algorithms based on DCT coefficients can be employed to speed up feature extraction and comparison during retrieval. However, this approach is limited as the complete image files need to be read and partially decoded to obtain the required coefficient data. In this paper, we present a very fast method for retrieving JPEG compressed images. Our method is based on the Huffman tables contained in the JPEG header, which can be optimised on a per-image basis not only to improve compression rates but also to provide a very useful image descriptor. We show that feature extraction and comparison based on optimised Huffman tables take only a fraction of the time required by common image retrieval algorithms, while resulting in only a relatively small drop in retrieval accuracy.

08:30-09:00, Paper ThPSAT2.44
Strategies for Multiple Feature Fusion with Hierarchical HMM: Application to Activity Recognition from Wearable Audiovisual Sensors
Pinquier, Julien	Univ. Paul Sabatier
Karaman, Svebor	Lab.
Letoupin, Laetitia	Lab.
Guyot, Patrice	IRIT
Megret, Remi	Univ. of Bordeaux
Benois-Pineau, Jenny	Lab.
Gaëstel, Yann	INSERM U.897
Dartigues, Jean-Francois	INSERM U.897
Keywords: Multimedia Analysis, Indexing and Retrieval, Classification and Clustering Abstract: In this paper, we further develop the research on recognition of activities, in videos recorded with wearable cameras, with Hierarchical Hidden Markov Model classifiers. The visual scenes being of a strong complexity in terms of motion and visual content, good performances have been obtained using multiple visual and audio cues. The adequate fusion of features from physically different description spaces remains an open issue not only for this particular task, but in multiple problems of pattern recognition. A study of optimal fusion strategies in the HMM framework is proposed. We design and exploit early, intermediate and late fusions with emitting states in the H-HMM. The results obtained on a corpus recorded by healthy volunteers and patients in a longitudinal dementia study allow choosing optimal fusion strategies as a function of target activity.

08:30-09:00, Paper ThPSAT2.45
Local Color Editing Using Color Classification and Boundary Inpainting
Su, Zhuo	Sun Yat-sen Univ.
Yang, Xue	Sun Yat-sen Univ.
Luo, Xiaonan	Sun Yat-sen Univ.
Wang, Dong	South China Agricultural Univ. of Informatics
Keywords: Enhancement, Restoration and Filtering, Image and Video Processing, Segmentation, Color and Texture Abstract: Color editing plays an important role in image processing; it aims to change the original image color according to specific image color characteristics. We present a new interactive local color editing method. The user first draws strokes specifying the target region whose color is to be transferred, and this region is segmented using K-means clustering. Then patch-based inpainting is applied to achieve a natural transition along the boundary. In the course of color editing, only the color of the pixels in the segmented region is transferred, while the others, including the boundary region and the background, remain unchanged. The experimental results show that our method can achieve visually satisfying local color editing results.


ThAT1	Main Hall
Invited Talk Session-IV	Regular Session
Chair: Kise, Koichi	Graduate School of Engineering, Osaka Prefecture Univ.
Co-Chair: Marinai, Simone	Univ. degli Studi di Firenze

09:00-09:40, Paper ThAT1.1
Three Approaches of Scene Text Recognition: An Informal Comparison on Difficult Images (Invited Talk)
Kim, Jin Hyung	KAIST
Keywords: Abstract: Three KAIST approaches for scene text recognition will be presented in this talk: color-based, edge-based, and part-based approaches. Although features of color, edge and part-relationship are utilized in all three approaches, they differ in their main focus. The color-based approach focuses on image segmentation mainly based on color, while the edge-based approach focuses on edge following to extract text objects. The part-based approach is an attempt to directly pinpoint the existence of character parts in an image. Each of the three approaches has merits and demerits. The text extraction results of the three approaches will be shown on some representative images known as 'difficult' in the community, so that one may get a feel for how the approaches will behave on other difficult images.

09:40-10:00, Paper ThAT1.2
A Learning Framework for Degraded Document Image Binarization Using Markov Random Field
Su, Bolan	National Univ. of Singapore
Lu, Shijian	-
Tan, Chew-Lim	National Univ. of Singapore
Keywords: Document Analysis Systems, Segmentation, Color and Texture, Classification and Clustering Abstract: Document image binarization is an important preprocessing technique for document image analysis that segments the text from the document image backgrounds. Many techniques have been proposed and successfully applied in different applications, such as document image retrieval. However, these techniques may perform poorly on degraded document images. In this paper, we propose a learning framework that makes use of the Markov Random Field to improve the performance of existing document image binarization methods on degraded document images. Extensive experiments on the recent Document Image Binarization Contest datasets demonstrate significant improvements of the existing binarization methods when our proposed framework is applied.

10:00-10:20, Paper ThAT1.3
Learning Features for Predicting OCR Accuracy
Ye, Peng	Univ. of Maryland, Coll. Park
Doermann, David	Univ. of Maryland
Keywords: Document Understanding, Features and Image Descriptors, Performance Evaluation Abstract: In this paper, we present a new method for assessing the quality of degraded document images using unsupervised feature learning. The goal is to build a computational model to automatically predict OCR accuracy of a degraded document image without a reference image. Current approaches for this problem typically rely on hand-crafted features whose design is based on heuristic rules that may not be generalizable. In contrast, we explore an unsupervised feature learning framework to learn effective and efficient features for predicting OCR accuracy. Our experimental results, on a set of historic newspaper images, show that the proposed method outperforms a baseline method which combines features from previous works.

10:20-10:40, Paper ThAT1.4
String-Level Learning of Confidence Transformation for Chinese Handwritten Text Recognition
Wang, Da-Han	Inst. of Automation, Chinese Acad. of Sciences
Liu, Cheng-Lin	Inst. of Automation, Chinese Acad. of Sciences
Keywords: Character and Text Recognition, Handwriting Recognition Abstract: Handwritten text recognition systems commonly combine character classification confidence scores and context models for evaluating candidate segmentation-recognition paths, and the classification confidence is usually optimized at character level. On comparing the performance of class-dependent and class-independent confidence transformation (CT), this paper proposes two regularized class-dependent CT methods, and particularly, a string-level confidence learning method under the Minimum Classification Error (MCE) criterion. In experiments of online Chinese handwritten text recognition, the string-level confidence learning method was shown to effectively improve the recognition performance.


ThAT2	Multi-Purpose Hall
Segmentation and Classification	Regular Session
Chair: Crandall, David	Indiana Univ.
Co-Chair: Lee, Chan-Su	Yeungnam Univ.

09:00-09:20, Paper ThAT2.1
Discovering Relevant Spatial Filterbanks for VHR Image Classification
Tuia, Devis	Lausanne EPFL
Volpi, Michele	Univ. of Lausanne, Inst. of geomatics and risk analysis
Dalla Mura, Mauro	Fondazione Bruno Kessler
Rakotomamonjy, Alain	Univ. de Rouen
Flamary, Rémi	Univ. de Rouen
Keywords: Remote Sensing, Feature Reduction and Manifold Learning, Detection, Separation and Segmentation Abstract: In very high resolution (VHR) image classification it is common to use spatial filters to enhance the discrimination among land uses that have similar spectral properties but different spatial characteristics. However, the filter types that can be used are numerous (e.g. textural, morphological, Gabor, wavelets) and the user must pre-select a family of features, as well as their specific parameters. This results in feature spaces that are high dimensional and redundant, thus requiring long and suboptimal feature selection phases. In this paper, we propose to discover the relevant filters as well as their parameters with a sparsity promoting regularization and an active set algorithm that iteratively adds the most promising features to the model. This way, we explore the filters/parameters input space efficiently (which is infinitely large for continuous parameters) and construct the optimal filterbank for classification without any information other than the types of filters to be used.

09:40-10:00, Paper ThAT2.3
A Renewed Image Annotation Baseline by Image Embedding and Tag Correlation
Liu, Rujie	Fujitsu Res. & Development Center
Wang, Yuehong	Fujitsu Res. and Development Center
Yu, Hao	Fujitsu Res. and Development Center
Naoi, Satoshi	Fujitsu R&D Center Co., LTD
Keywords: Multimedia Analysis, Indexing and Retrieval, Image and Video Understanding, Pattern Recognition for Search, Retrieval and Visualization Abstract: This paper presents a renewed image annotation baseline method under the nearest neighbor tag transfer framework. Two key problems are considered in this paper: (1) which images are selected as the neighbors; (2) how their keywords are transferred. Firstly, a soft neighbor selection scheme is designed using an image embedding technique, with which we can give more weight to the crucial neighbors in decision making. Next, diffused tag propagation is introduced to allow one tag to be propagated to other relevant tags. In addition, the above two measures are formulated into an optimization framework to further improve the prediction performance. Experimental results on a standard database show that the proposed approaches outperform the current state-of-the-art methods.

10:00-10:20, Paper ThAT2.4
Structured Sparse Coding for Image Representation Based on L1-Graph
Ou, Weihua	Huazhong Univ. of Science and Tech.
You, Xinge	Huazhong Univ. of Science and Tech.
Cheung, Yiu-ming	Hong Kong Baptist Univ.
Peng, Qinmu	Hong Kong Baptist Univ.
Gong, Mingming	Huazhong Univ. of Science and Tech.
Jiang, Xiubao	Huazhong Univ. of Science and Tech.
Keywords: Image and Video Understanding, Image and Video Processing, Coding and Compression Abstract: Sparse coding seeks over-complete bases to obtain a high-level image representation for image analysis. In many applications, the image data might reside on a low dimensional manifold embedded in a high dimensional ambient space. However, standard sparse coding cannot exploit the manifold structure. In this paper, we propose a novel structured sparse coding method based on the L1-graph, in which the geometric structure of the image data is considered explicitly. Specifically, a new regularization term based on the L1-graph is incorporated into the standard sparse coding framework and a fast iterative thresholding algorithm is developed to solve the optimization problem. With this coding scheme, the codes obtained by our algorithm for similar data points in the high dimensional space are more similar than those obtained by standard sparse coding. Experiments demonstrate the efficacy of the proposed method for image representation on two benchmark databases.

10:20-10:40, Paper ThAT2.5
Stable Discriminative Dictionary Learning Via Discriminative Deviation
Khan, Nazar	Univ. of Central Florida
Tappen, Marshall	Univ. of Central Florida
Keywords: Image and Video Understanding, 2D/3D Object Detection and Recognition, Coding and Compression Abstract: Discriminative learning of sparse-code based dictionaries tends to be inherently unstable. We show that using a discriminative version of the deviation function to learn such dictionaries leads to a more stable formulation that can handle the reconstruction/discrimination trade-off in a principled manner. Results on Graz02 and UCF Sports datasets validate the proposed formulation.


ThAT3	Room 101+102
Geometry and Calibration	Regular Session
Chair: Heikkilä, Janne	Univ. of Oulu
Co-Chair: Kanatani, Kenichi	Okayama Univ.

09:00-09:20, Paper ThAT3.1
Direct Least Square Fitting of Ellipsoids
Ying, Xianghua	Peking Univ.
Yang, Li	Peking Univ.
Kong, Jing	Peking Univ.
Hou, Yongbo	Peking Univ.
Guan, Sheng	Peking Univ.
Zha, Hongbin	Peking Univ.
Keywords: 2D/3D Object Detection and Recognition, Stereo and Image-Based Modeling Abstract: Least square fitting of quadratic surfaces is a fundamental problem in pattern recognition, computer vision, graphics, and medical imaging analysis. This paper investigates approaches to ellipsoid-specific fitting. In the 2D case, Fitzgibbon's ellipse-specific fitting approach outperforms others since it is extremely robust, efficient, and easy to implement. This paper attempts to extend it from 2D to 3D for ellipsoid-specific fitting. The extension seems straightforward at first glance. However, we found that making the extension feasible is not easy, as discussed in the main text. Experimental results demonstrate the validity of the proposed approach.
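For orientation, the 2D ellipse-specific fit that this abstract takes as its starting point can be sketched as follows. This is a minimal sketch of the 2D case only, written in the partitioned form often used for numerical stability (an assumption of this example, not something stated in the abstract); it is not the paper's 3D extension, and the function name is illustrative. The fit minimizes the algebraic distance of the conic a x^2 + b xy + c y^2 + d x + e y + f = 0 subject to 4ac - b^2 = 1, which forces the result to be an ellipse.

```python
import numpy as np


def fit_ellipse(x, y):
    """Return conic coefficients (a, b, c, d, e, f), defined up to scale."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d1 = np.column_stack([x * x, x * y, y * y])      # quadratic part of the design matrix
    d2 = np.column_stack([x, y, np.ones_like(x)])    # linear part of the design matrix
    s1 = d1.T @ d1
    s2 = d1.T @ d2
    s3 = d2.T @ d2
    t = -np.linalg.solve(s3, s2.T)                   # linear coefficients as a function of (a, b, c)
    m = s1 + s2 @ t                                  # reduced scatter matrix
    # Left-multiply by the inverse of the constraint matrix [[0,0,2],[0,-1,0],[2,0,0]].
    m = np.vstack([m[2] / 2.0, -m[1], m[0] / 2.0])
    _, vecs = np.linalg.eig(m)
    vecs = np.real(vecs)
    # The ellipse solution is the eigenvector with 4ac - b^2 > 0.
    cond = 4.0 * vecs[0] * vecs[2] - vecs[1] ** 2
    a1 = vecs[:, np.argmax(cond)]
    return np.concatenate([a1, t @ a1])


# Points sampled on the ellipse (x - 1)^2 / 9 + (y + 2)^2 / 4 = 1.
angles = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
coeffs = fit_ellipse(1.0 + 3.0 * np.cos(angles), -2.0 + 2.0 * np.sin(angles))
coeffs = coeffs / coeffs[0]  # normalize so that a = 1
```

Because the coefficients are only defined up to scale, the usage example divides by the first coefficient before comparing against the known conic 4x^2 + 9y^2 - 8x + 36y + 4 = 0.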

09:20-09:40, Paper ThAT3.2
Exploiting P-Fold Symmetries for Faster Polynomial Equation Solving
Ask, Erik	Lund Univ.
Kuang, Yubin	Lund Univ.
Astroem, Kalle	Lund Univ.
Keywords: Stereo and Image-Based Modeling, Vision for Robotics, Low-Level Vision Abstract: Numerous geometric problems in computer vision involve the solution of systems of polynomial equations. This is true for problems with minimal information, but also for finding stationary points for overdetermined problems. The state-of-the-art is based on the use of numerical linear algebra on the large but sparse coefficient matrix that represents the expanded original equation set. In this paper we present two simplifications that can be used (i) if the zero vector is one of the solutions or (ii) if the equations display certain p-fold symmetries. We evaluate the simplifications on a few example problems and demonstrate that significant speed increases are possible without losing accuracy.

09:40-10:00, Paper ThAT3.3
Homography Estimation from Correspondences of Local Elliptical Features
Chum, Ondrej	Czech Tech. Univ. in Prague
Matas, Jiri	CTU Prague
Keywords: Stereo and Image-Based Modeling Abstract: We propose a novel unified approach for homography estimation from two or more correspondences of local elliptical features. The method finds a homography defined by first-order Taylor expansions at two (or more) points. The approximations are affine transformations that are constrained by the ellipse-to-ellipse correspondences. Unlike methods based on projective invariants of conics, the proposed method generates only a single homography model per pair of ellipse correspondences. We show experimentally that the proposed method generates models of precision comparable to or better than the state-of-the-art at lower computational cost.

10:00-10:20, Paper ThAT3.4
Camera Self Calibration Based on Direct Image Alignment
Sugimoto, Shigeki	Tokyo Inst. of Tech.
Okutomi, Masatoshi	Tokyo Inst. of Tech.
Keywords: Vision for Robotics, Stereo and Image-Based Modeling, Motion, Tracking and Video Analysis Abstract: We propose a camera self calibration method based on direct image alignment (DIA) for estimating camera parameters including lens-distortion components from planar scenes. We formulate a cost function without the inverse of the lens distortion function for avoiding the difficulty in the image alignment between two lens-distorted images. We also use a backward warp cost for improving the convergence instability due to inherent ambiguities in calibration parameters and blurring effects through image warps. We show our method leads to a comparable performance with the de-facto standard Matlab toolbox for non-self calibration, even though the proposed method is for self calibration.

10:20-10:40, Paper ThAT3.5
Fast and Efficient Vanishing Point Detection in Indoor Images
Gerogiannis, Demetrios	Univ. of Ioannina
Nikou, Christophoros	Univ. of Ioannina
Likas, Aristidis	Univ. of Ioannina, Department of Computer Science
Keywords: Vision for Robotics, Low-Level Vision Abstract: A method for detecting a vanishing point in structured images is presented. The method relies on the detection of line segments from an edge map by representing clusters of edge points by the long axes of highly eccentric ellipses. The extracted lines provide a set of candidate vanishing points computed by their intersections, which are assigned weights proportional to the lengths of the line segments they belong to. Then, a voting scheme is applied through an accumulator array generated by gridding the image frame. The votes of each grid cell are weighted by the Pi-sigmoid kernel allowing cells to contribute to their


ThAT4	Hall 200
Object Detection	Regular Session
Chair: Mikolajczyk, Krystian	Univ. of Surrey
Co-Chair: Fisher, Robert	Univ. of Edinburgh

09:00-09:20, Paper ThAT4.1
Efficient Incremental Learning of Boosted Classifiers for Object Detection
Sharma, Pramod	Univ. of Southern California
Huang, Chang	NEC Lab.
Nevatia, Ram	USC
Keywords: 2D/3D Object Detection and Recognition, Machine Learning and Data Mining, Pattern Recognition for Surveillance and Security Abstract: Significant progress has been made towards learning a generalized offline object detector. However, when a generalized offline detector is applied on new datasets, it often misses some instances of the object or produces false alarms in the background scene. We propose a novel and efficient incremental learning method, which improves the performance of an offline trained detector. Our approach adjusts the parameters of an offline trained cascade of boosted classifiers using manually labeled online samples. Experiments demonstrate both the efficiency and the effectiveness of our approach.

09:20-09:40, Paper ThAT4.2
Unsupervised Model Selection for View-Invariant Object Detection in Surveillance Environments
Siddiquie, Behjat	SRI International
Feris, Rogerio	IBM Res.
Datta, Ankur	IBM T. J. Watson Res. Center
Davis, Larry	Univ. of Maryland
Keywords: 2D/3D Object Detection and Recognition, Pattern Recognition for Surveillance and Security, Motion, Tracking and Video Analysis Abstract: We propose a novel approach for view-invariant vehicle detection in traffic surveillance videos. Instead of building a monolithic object detector that can model all possible viewpoints, we learn a large array of efficient view-specific models corresponding to different camera views (source domains). When presented with an unseen viewpoint (target domain), closely related models in the source domain are selected for detection based on a novel discriminatively trained distance metric function, which takes into account scene geometry, vehicle motion patterns, and the generalizing ability of the models. Extensive experimental evaluation on a challenging test set, consisting of images collected from fifty different surveillance cameras, demonstrates that our unsupervised approach can outperform complex methods that utilize labeled training data from the target domain, both in terms of speed as well as accuracy.

09:40-10:00, Paper ThAT4.3
Multi-View Multi-Class Object Detection Via Exemplar Compounding
Ma, Kai	Univ. of Illinois at Chicago
Ben-Arie, Jezekiel	Univ. of Illinois at Chicago
Keywords: 2D/3D Object Detection and Recognition, Detection, Separation and Segmentation, Image and Video Processing Abstract: To address the multi-view multi-class object detection problem, we propose a method named Vector Array Recognition by Indexing and Sequencing (VARIS). VARIS is able to find optimal similarity matching between the input image and pre-stored exemplars while allowing wide geometrical variations which are limited only by topology constraints. Aggregated similarity is further enhanced by matching the input image with compound exemplars. The exemplar compounding procedure also reduces the number of exemplars necessary for each class. Our experiments show that VARIS with exemplar compounding achieves state-of-the-art performance on PASCAL VOC2007 dataset with a reasonable computational cost.

10:00-10:20, Paper ThAT4.4
Mining Sub-Categories for Object Detection
Dai, Jifeng	Tsinghua Univ.
Feng, Jianjiang	Tsinghua Univ.
Zhou, Jie	Tsinghua Univ.
Keywords: 2D/3D Object Detection and Recognition, Detection, Separation and Segmentation Abstract: The visual concept of an object category is usually composed of a set of sub-categories corresponding to different sub-classes, perspectives, spatial configurations, etc. Existing detector training algorithms usually require extensive supervisory information to achieve a satisfactory performance for sub-categorization. In this paper, we propose a detector training algorithm which can automatically mine meaningful sub-categories utilizing only the image contents within the training bounding boxes. The number of sub-categories can also be determined automatically. The mined sub-categories are of medium size and could be further labeled for a variety of applications such as sub-category detection and meta-data transfer. Promising detection results are obtained on the challenging PASCAL VOC dataset.

10:20-10:40, Paper ThAT4.5
Semantic Windows Mining in Sliding Window Based Object Detection
Zhang, Junge	CASIA
Zhao, Xin	Univ. of Science and Tech. of China
Huang, Yongzhen	Inst. of Automation, Chinese Acad. of Sciences
Kaiqi, Huang	CAS Inst. of Automation (CASIA)
Tan, Tieniu	CASIA
Keywords: 2D/3D Object Detection and Recognition, Pattern Recognition for Surveillance and Security Abstract: This paper studies the problem of end-to-end windows mining directly from detection output. Traditional object detection systems approach this problem in an ad-hoc manner, for example with Non-Maximum Suppression (NMS). Beyond NMS, multi-class context modeling has been explored thoroughly in recent years. But all these methods put their emphasis on eliminating false positive windows rather than improving recall. To address this, we first study the problem and propose semantic windows mining. To improve recall, we propose Selective Forward Search (SFS), which keeps most of the semantic windows while substantially reducing the number of false positives. After SFS, to improve precision, we present end-to-end windows mining by means of similarity refining optimized for mean Average Precision (mAP) and overlap regression. We show a noticeable improvement on the PASCAL VOC datasets in both recall and precision.
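The Non-Maximum Suppression step that this abstract names as the traditional baseline can be sketched as below. This is a common greedy variant given for context only; it is not the Selective Forward Search the paper proposes. The box format (x1, y1, x2, y2), the function names and the default overlap threshold are assumptions of the example.

```python
def iou(a, b):
    # Intersection over union of two boxes given as (x1, y1, x2, y2).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, thresh=0.5):
    # Visit windows from highest to lowest score and keep a window only if it
    # does not overlap any already-kept window by more than thresh.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # indices of the surviving windows
```

In the usage example the second window overlaps the first with an IoU of about 0.68, so it is suppressed, while the distant third window is kept.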


ThAT5	Hall 300
Gait and Action Analysis	Regular Session
Chair: Ji, Qiang	RPI
Co-Chair: Nixon, Mark	Univ. of Southampton

09:00-09:20, Paper ThAT5.1
Action Recognition Via Sparse Representation of Characteristic Frames
Lu, Guoliang	Graduate School of Information Science and Technology, Hokkaido Uni
Kudo, Mineichi	Hokkaido Univ.
Toyama, Jun	Hokkaido Univ.
Keywords: Pattern Recognition for Surveillance and Security, Gesture and Behavior Analysis, Pattern Recognition for Bioinformatics Abstract: To achieve efficient action recognition, some recent works propose to select a small number of frames from a video sequence instead of using the entire sequence. In this study, for the purpose of frame selection, we propose to represent a frame by a combination of local and global descriptors instead of the silhouette used in our previous approach. Action recognition is then executed on the basis of the selected frames. An experiment on the KTH database shows that, at the minimum number of frames needed to achieve the best recognition rate, the frames selected by the proposed framework are better than those chosen by the two selection methods it is compared with.

09:20-09:40, Paper ThAT5.2
On Including Quality in Applied Automatic Gait Recognition
Matovski, Darko	Univ. of Southampton
Nixon, Mark	Univ. of Southampton
Mansfield, Tony	National Physical Lab.
Mahmoodi, Sasan	Univ. of Southampton
Keywords: Biometrics Abstract: Many gait recognition approaches use silhouette data. Imperfections in silhouette extraction have a negative effect on the performance of a gait recognition system. In this paper we extend quality metrics for gait recognition and evaluate new ways of using quality to improve a recognition system. We demonstrate use of quality to improve silhouette data and select gait cycles of best quality. The potential of the new approaches has been demonstrated experimentally on a challenging dataset, showing how recognition capability can be dramatically improved. Our practical study also shows that acquiring samples of adequate quality in arbitrary environments is difficult and that including quality analysis can improve performance markedly.

09:40-10:00, Paper ThAT5.3
Can Gait Fluctuations Improve Gait Recognition?
Makihara, Yasushi	The Inst.
Fujihara, Yusuke	Osaka Univ.
Yagi, Yasushi	Osaka Univ.
Keywords: Biometrics, Pattern Recognition for Surveillance and Security, Motion, Tracking and Video Analysis Abstract: Gait recognition performance is often degraded by intra-subject gait fluctuations such as temporal fluctuations due to non-uniform evolution of phase (gait stance) and spatial fluctuations in arm swings or posture within the same phase. Therefore, we first propose a method for gait recognition using a phase-normalized image sequence to overcome the temporal fluctuations. However, it has been noticed that gait fluctuations actually contain some useful individuality (e.g., degree of arm swing fluctuations). Hence, we propose a score-level fusion framework for gait recognition using the gait fluctuation features as well as the phase-normalized image sequence. Experiments with a public gait database of 100 subjects show the effectiveness of the proposed method.

10:00-10:20, Paper ThAT5.4
Can Gait Biometrics Be Spoofed?
Hadid, Abdenour	Univ. of Oulu
Ghahramani, Mohammad	Univ. of Oulu
Kellokumpu, Vili	Univ. of Oulu
Pietikäinen, Matti	Univ. of Oulu
Bustard, John	Univ. of Southampton
Nixon, Mark	Univ. of Southampton
Keywords: Biometrics, Pattern Recognition for Surveillance and Security Abstract: Gait recognition is a relatively new biometric and no effort has yet been devoted to studying spoofing attacks against gait recognition systems. Broadly speaking, a spoofing attack occurs when a person tries to imitate the clothing and/or walking style of someone else in order to gain illegitimate access and advantages. To gain insight into the performance of current gait biometric systems when confronted with spoofing attacks, we provide in this paper the first investigation in the literature on how clothing can be used to spoof a target and evaluate the performance of two state-of-the-art recognition methods on a novel gait spoofing database recorded at the University of Southampton. The experiments point out very interesting findings that can be used as a reference for future investigations by the research community.

10:20-10:40, Paper ThAT5.5
Gait-Based Gender Classification in Unconstrained Environments
Lu, Jiwen	UIUC
Wang, Gang	Nanyang Tech. Univ.
Huang, Thomas	Univ. of Illinois at Urbana-Champaign
Keywords: Biometrics, Gesture and Behavior Analysis Abstract: This paper investigates the problem of gait-based gender classification in unconstrained environments. Different from existing human gait analysis and recognition methods which assume that humans walk in controlled environments, we aim to recognize human gender from uncontrolled gaits in which people can walk freely and the walking direction of human gaits may be time-varying in a single video clip. Given each gait sequence collected in an uncontrolled manner, we first obtain human silhouettes using background subtraction and cluster them into several clusters. For each cluster, we compute the averaged gait image as features. Then, we learn a distance metric under which the intraclass variations are minimized and the interclass variations are maximized, simultaneously, such that more discriminative information can be exploited for classification. Experimental results on our dataset demonstrate the efficacy of the proposed approach.


ThPAT6	Room 201+202
Poster Session (11, 12)	Poster Session


ThBT1	Main Hall
Scene Text	Regular Session
Chair: Llados, Josep	Computer Vision Center
Co-Chair: Uchida, Seiichi	Kyushu Univ.

11:20-11:40, Paper ThBT1.1
Convolutional Neural Networks Applied to House Numbers Digit Classification
Sermanet, Pierre	New York Univ.
Chintala, Soumith	New York Univ.
LeCun, Yann	New York Univ.
Keywords: Character and Text Recognition, Features and Image Descriptors, Neural Networks Abstract: We classify digits of real-world house numbers using convolutional neural networks (ConvNets). ConvNets are hierarchical feature learning neural networks whose structure is biologically inspired. Unlike many popular vision approaches that are hand-designed, ConvNets can automatically learn a unique set of features optimized for a given task. We augment the traditional ConvNet architecture by learning multi-stage features and by using Lp pooling, and establish a new state-of-the-art of 95.10% accuracy on the SVHN dataset (48% error improvement). Furthermore, we analyze the benefits of different pooling methods and multi-stage features in ConvNets. The source code and a tutorial are available at eblearn.sf.net.
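Illustrative note: the Lp pooling mentioned in the abstract above can be sketched as follows. This is a simplified, unweighted version over non-overlapping windows, not the authors' implementation; the function name lp_pool and its parameters are hypothetical. With p = 1 it reduces to average pooling of absolute values, and as p grows it approaches max pooling.

```python
def lp_pool(image, size=2, p=2.0):
    """Unweighted Lp pooling over non-overlapping size x size windows.

    `image` is a list of equal-length rows of numbers. This is a
    simplified sketch: no window weighting is applied, which is an
    assumption made here for brevity.
    """
    rows = len(image) // size
    cols = len(image[0]) // size
    pooled = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Collect absolute values inside the current window.
            window = [
                abs(image[r * size + i][c * size + j])
                for i in range(size)
                for j in range(size)
            ]
            # Lp norm of the window, normalized by the window size.
            mean_p = sum(v ** p for v in window) / len(window)
            out_row.append(mean_p ** (1.0 / p))
        pooled.append(out_row)
    return pooled


if __name__ == "__main__":
    feature_map = [[3, 4, 1, 1],
                   [0, 0, 1, 1]]
    print(lp_pool(feature_map, size=2, p=2.0))
```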

11:40-12:00, Paper ThBT1.2
Sharpness Estimation for Document and Scene Images
Kumar, Jayant	Univ. of Maryland Coll. Park
Chen, Francine	FX Palo Alto Lab.
Doermann, David	Univ. of Maryland
Keywords: Camera-Based Document Analysis, Image and Video Understanding, Document Understanding Abstract: Images of document pages have different characteristics than images of natural scenes, and so the sharpness measures developed for natural scene images do not necessarily extend to document images primarily composed of text. We present an efficient and simple method for effectively estimating the sharpness/blurriness of document images that also performs well on natural scenes. Our method can be used to predict the sharpness in scenarios where images are blurred due to camera-motion (or hand-shake), defocus, or inherent properties of the imaging system. The proposed method outperforms the perceptually-based, no-reference sharpness work of [1] and [4], which was shown to perform better than 14 other no-reference sharpness measures on the LIVE dataset.

12:00-12:20, Paper ThBT1.3
Text Detection in Natural Scenes Using Gradient Vector Flow-Guided Symmetry
Phan, Trung Quy	National Univ. of Singapore
Palaiahnakote, Shivakumara	National Univ. of Singapore
Tan, Chew-Lim	National Univ. of Singapore
Keywords: Camera-Based Document Analysis, Detection, Separation and Segmentation Abstract: In this paper, we propose a novel method for text detection in natural scenes. Gradient Vector Flow is first used to extract both intra-character and inter-character symmetries. In the second step, we group horizontally aligned symmetry components into text lines based on several constraints on sizes, positions and colors. Finally, to remove false positives, we employ a learning-based approach which makes use of Histogram of Oriented Gradients features. The main advantage of the proposed method lies in the use of both the text features and the gap (i.e., inter-character) features. Existing techniques typically extract only the former and ignore the latter. Experiments on the benchmark ICDAR 2003 dataset show the good detection performance of our method on natural scene text.

12:20-12:40, Paper ThBT1.4
Wavelet-Gradient-Fusion for Video Text Binarization
Roy, Sangheeta	Kolkata
Palaiahnakote, Shivakumara	National Univ. of Singapore
Roy, Partha Pratim	Univ. François Rabelais
Tan, Chew-Lim	National Univ. of Singapore
Keywords: Character and Text Recognition, Enhancement, Restoration and Filtering, Image and Video Processing Abstract: Achieving a good character recognition rate in video images is not as easy as achieving the same from scanned documents because of the low resolution and complex backgrounds of video images. In this paper, we propose a new method using fusion of horizontal, vertical and diagonal information obtained by the wavelet and the gradient on text line images to enhance the text information. We apply k-means with k=2 on row-wise and column-wise pixels separately to extract possible text information. The union operation on row-wise and column-wise clusters provides the text candidate information. With the help of the Canny edge map of the input image, the method identifies the disconnections based on mutual nearest neighbor criteria on end points and it compares the disconnected area with the text candidates to restore the missing information. Next, the method uses connected component analysis to merge some sub-components based on nearest neighbor criteria. The foreground (text) and background (non-text) are separated based on the new observation that the color values at the edge pixels of the components are larger than the color values of the pixels inside the component. Finally, we use Google Tesseract OCR to validate our results and the results are compared with the baseline thresholding techniques to show that the proposed method is superior to existing methods in terms of recognition rate on 236 video and 258 ICDAR 2003 text lines.
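Illustrative note: the "k-means with k=2" step applied to one row (or column) of pixel values can be sketched as below. This is a generic one-dimensional two-means clustering, not the authors' code; the function name two_means_1d and the choice to initialize the centers at the minimum and maximum value are assumptions.

```python
def two_means_1d(values, iterations=20):
    """Cluster scalar values into two groups with k-means (k = 2).

    Returns (labels, (low_center, high_center)); label 1 marks the
    high-valued cluster. Centers start at the min and max of the data,
    which is an assumption made for this sketch.
    """
    low, high = float(min(values)), float(max(values))
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to the nearer center.
        labels = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        low_members = [v for v, lab in zip(values, labels) if lab == 0]
        high_members = [v for v, lab in zip(values, labels) if lab == 1]
        # Update step: keep the old center if a cluster is empty.
        new_low = sum(low_members) / len(low_members) if low_members else low
        new_high = sum(high_members) / len(high_members) if high_members else high
        if new_low == low and new_high == high:
            break
        low, high = new_low, new_high
    return labels, (low, high)


if __name__ == "__main__":
    row = [10, 12, 11, 200, 210, 205]
    print(two_means_1d(row))
```

Under this sketch, the high-valued cluster of each row and column would serve as the candidate text pixels that are later combined by the union operation.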

12:40-13:00, Paper ThBT1.5
End-To-End Text Recognition with Convolutional Neural Networks
Wang, Tao	Stanford Univ.
Wu, David J.	Stanford Univ.
Coates, Adam	Stanford Univ.
Ng, Andrew	Stanford Univ.
Keywords: Character and Text Recognition, Neural Networks, Features and Image Descriptors Abstract: Full end-to-end text recognition in natural images is a challenging problem that has received much attention recently. Traditional systems in this area have relied on elaborate models incorporating carefully hand-engineered features or large amounts of prior knowledge. In this paper, we take a different route and combine the representational power of large, multilayer neural networks together with recent developments in unsupervised feature learning, which allows us to use a common framework to train highly-accurate text detector and character recognizer modules. Then, using only simple off-the-shelf methods, we integrate these two modules into a full end-to-end, lexicon-driven, scene text recognition system that achieves state-of-the-art performance on standard benchmarks, namely Street View Text and ICDAR 2003.


ThBT2	Multi-Purpose Hall
Segmentation	Regular Session
Chair: Haindl, Michael	Inst. of Information Theory and Automation
Co-Chair: Shapiro, Linda	Univ. of Washington

11:20-11:40, Paper ThBT2.1
Concurrent Segmentation of Categorized Objects from an Image Collection
Wang, Le	Xi'an Jiaotong Univ.
Xue, Jianru	Xi'an Jiaotong Univ.
Zheng, Nanning	Xi'an Jiaotong Univ.
Hua, Gang	Stevens Inst. of Tech.
Keywords: Segmentation, Color and Texture, Detection, Separation and Segmentation Abstract: We propose a method for automatic segmentation of categorized objects from a collection of images in the same category, which employs a single auto-context model learned from all images without the need for pixel-level labels. Instead of extracting the salient objects from each image one by one, we extract the objects from all images simultaneously. The segmentation of the salient objects is iteratively performed, where the auto-context model is incrementally learned based on new segmentations of all images at each iteration. Upon convergence, we obtain not only the clean segmentations of the salient objects, but also an auto-context classifier learned on all images which can readily be exploited to segment categorized objects from a new image. Our experiments validated the efficacy of our proposed approach.

11:40-12:00, Paper ThBT2.2
Efficient Semantic Segmentation with Gaussian Processes and Histogram Intersection Kernels
Freytag, Alexander	Friedrich Schiller Univ. Jena
Fröhlich, Björn	Friedrich Schiller Univ. Jena
Rodner, Erik	Friedrich-Schiller Univ. Jena
Denzler, Joachim	Friedrich-Schiller Univ. of Jena
Keywords: Scene Understanding, Machine Learning and Data Mining Abstract: Semantic interpretation and understanding of images is an important goal of visual recognition research and offers a large variety of possible applications. One step towards this goal is semantic segmentation, which aims for automatic labeling of image regions and pixels with category names. Since typical images contain several million pixels, the use of kernel-based methods for the task of semantic segmentation is limited due to the computation times involved. In this paper, we overcome this drawback by exploiting efficient kernel calculations using the histogram intersection kernel for fast and exact Gaussian process classification. Our results show that non-parametric Bayesian methods can be utilized for semantic segmentation without sparse approximation techniques. Furthermore, in experiments, we show a significant benefit in terms of classification accuracy compared to state-of-the-art methods.
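Illustrative note: the histogram intersection kernel named in the abstract above has the standard definition k(x, y) = sum_i min(x_i, y_i). A minimal sketch of the kernel and the resulting Gram matrix follows; the function names are hypothetical, and the efficient evaluation scheme the abstract refers to is not reproduced here.

```python
def histogram_intersection(x, y):
    """Histogram intersection kernel: sum of element-wise minima."""
    if len(x) != len(y):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(x, y))


def gram_matrix(histograms):
    """Naive O(n^2 * d) kernel matrix over a list of histograms."""
    return [
        [histogram_intersection(h_i, h_j) for h_j in histograms]
        for h_i in histograms
    ]


if __name__ == "__main__":
    hists = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    for row in gram_matrix(hists):
        print(row)
```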

12:00-12:20, Paper ThBT2.3
Learning Non-Target Items for Interesting Clothes Segmentation in Fashion Images
Grana, Costantino	Univ. degli Studi di Modena e Reggio Emilia
Calderara, Simone	Univ. of Modena and Reggio Emilia
Borghesani, Daniele	Univ. degli Studi di Modena e Reggio Emilia
Cucchiara, Rita	Univ. degli Studi di Modena e Reggio Emilia
Keywords: Segmentation, Color and Texture, Statistical, Syntactic and Structural Pattern Recognition, 2D/3D Object Detection and Recognition Abstract: In this paper we propose a color-based approach for skin detection and interest garment selection aimed at an automatic segmentation of pieces of clothing. For both purposes, the color description is extracted by an iterative energy minimization approach and an automatic initialization strategy is proposed by learning geometric constraints and shape cues. Experiments confirm the good performance of this technique both in the context of skin removal and in the context of classification of garments.

12:20-12:40, Paper ThBT2.4
Relaxed Cheeger Cut for Image Segmentation
Paulhac, Ludovic	Univ. Bordeaux, Lab. UMR 5800, F-33400 Talence, France
Ta, Vinh-Thong	Univ. Bordeaux
Megret, Remi	Univ. of Bordeaux
Keywords: Segmentation, Color and Texture, Detection, Separation and Segmentation, Image and Video Processing Abstract: In this paper, we study and evaluate the application to image segmentation of a p-Laplacian based relaxation of the Cheeger Cut problem. Based on an l1 relaxation of the initial clustering problem, we show that these methods can outperform well-known graph-based approaches, e.g., the min-cut/max-flow algorithm or l2 spectral clustering, for unsupervised and very weakly supervised image segmentation. Experimental results demonstrate the benefits and the relevance of the proposed methodology, especially for a noisy image or when very few pixels are labeled for interactive image segmentation.

12:40-13:00, Paper ThBT2.5
Segmentation and Scene Modeling for MIL-Based Target Localization
Sankaranarayanan, Karthik	IBM Res.
Davis, James W.	Ohio State Univ.
Keywords: Motion, Tracking and Video Analysis, Scene Understanding, Segmentation, Color and Texture Abstract: Existing techniques for object tracking with Multiple Instance Learning take the approach of extracting low-level patches of fixed size and aspect ratios within each image, and employ many simplistic assumptions. In this work, we propose an approach that automatically utilizes image segments as input primitives to develop a multi-level segmentation-based system, and build a target model refinement procedure that learns the optimal model corresponding to the target object. To go beyond existing restrictive assumptions, we further develop automatic scene environmental models to assign prior probabilities to segment instances of belonging to the target vs scene. We demonstrate impressive qualitative and quantitative results with tracking sequences in typical outdoor surveillance settings.


ThBT3	Room 101+102
Object Recognition	Regular Session
Chair: Boyer, Kim	Rensselaer Pol. Inst.
Co-Chair: Nevatia, Ram	USC

11:20-11:40, Paper ThBT3.1
Region of Interest Detection Using Indoor Structure and Saliency Map
Kataoka, Kaori	NTT Cyber Space Lab.
Sudo, Kyoko	NTT Corp.
Morimoto, Masashi	NTT Corp.
Keywords: Segmentation, Color and Texture, Scene Understanding, Classification and Clustering Abstract: Detecting and identifying Regions of Interest (ROIs) is an important task for navigation and retrieval services. In this paper, we focus on indoor scene images and detect object regions such as shop signs and merchandise. Our method is based on two approaches; 1) Indoor structure analysis from a single image by learning the types of scenes. 2) Detect ROIs by taking advantage of the relationship of expected locations of planes and objects. We conduct a detection experiment and demonstrate the effectiveness of our proposal.

11:40-12:00, Paper ThBT3.2
Trajectory-Based Fisher Kernel Representation for Action Recognition in Videos
Ghanem, Bernard	King Abdullah Univ. of Science and Tech.
Atmosukarto, Indriyati	Advanced Digital Sciences Center
Ahuja, Narendra	UIUC
Keywords: Motion, Tracking and Video Analysis, Image and Video Understanding Abstract: Action recognition is an important computer vision problem that has many applications including video indexing and retrieval, event detection, and video summarization. In this paper, we propose to apply the Fisher kernel framework to action recognition in videos. The Fisher kernel framework combines the strengths of generative and discriminative models. In this approach, given the trajectories extracted from a video and a generative Gaussian Mixture Model (GMM), we use the Fisher Kernel method to describe how much the parameters of the GMM are modified to best fit the video trajectories. We experiment with using the Fisher Kernel vector to create the video representation and to train an SVM classifier. We further extend our framework to select the most discriminative trajectories using the MIL-KNN framework. We compare the performance of our proposed approach to the current state-of-the-art bag-of-features (BOF) approach on three datasets. Experimental results show that our proposed approach is better than the existing state-of-the-art method [wang2011] and that the selected discriminative trajectories are descriptive of the action class.

12:00-12:20, Paper ThBT3.3
Learning to Describe Color Composition of Visual Objects
Liu, Yuanliu	Xi'an Jiaotong Univ.
Liang, Yudong	Xi'an Jiaotong Univ.
Yuan, Zejian	Xi'an Jiaotong Univ.
Zheng, Nanning	Xi'an Jiaotong Univ.
Keywords: Segmentation, Color and Texture, Classification and Clustering Abstract: Color composition is an important cue for image retrieval and object classification. In this paper we address the problem of inferring the color composition of visual objects from the pixel-level color distribution over the basic color terms. We build a discriminative model to tag each region with a dominant color and an associate one. We learn the human preference and co-occurrence patterns of the color names from weakly labeled real-world images. Experimental results on the ImageNet-Attribute data set and the Ebay data set show that our model can effectively describe the color composition of real-world images.

12:20-12:40, Paper ThBT3.4
A Stable Graph-Based Representation for Object Recognition through High-Order Matching
Albarelli, Andrea	Univ. Ca' Foscari di Venezia
Bergamasco, Filippo	Univ. Ca' Foscari di Venezia
Rossi, Luca	Univ. Ca' Foscari di Venezia
Vascon, Sebastiano	Univ. Ca' Foscari di Venezia
Torsello, Andrea	Univ. Ca' Foscari
Keywords: 2D/3D Object Detection and Recognition, Features and Image Descriptors, Pattern Recognition for Search, Retrieval and Visualization Abstract: Many object recognition techniques perform some flavour of point pattern matching between a model and a scene. Such points are usually selected through a feature detection algorithm that is robust to a class of image transformations, and a suitable descriptor is computed over them in order to get a reliable matching. Moreover, some approaches take an additional step by casting the correspondence problem into a matching between graphs defined over feature points. The motivation is that the relational model would add more discriminative power; however, the overall effectiveness strongly depends on the ability to build a graph that is stable with respect to both changes in the object appearance and spatial distribution of interest points. In fact, widely used graph-based representations have been shown to suffer some limitations, especially with respect to changes in the Euclidean organization of the feature points. In this paper we introduce a technique to build relational structures over corner points that does not depend on the spatial distribution of the features.


ThBT4	Hall 200
Biomedical Application	Regular Session
Chair: Kita, Yasuyo	National Inst. of Advanced Industrial Science and Technology
Co-Chair: Schaefer, Gerald	Loughborough Univ.

11:20-11:40, Paper ThBT4.1
Effective Multiple Classifier Systems for Breast Thermogram Analysis
Krawczyk, Bartosz	Wroclaw Univ. of Tech.
Schaefer, Gerald	Loughborough Univ.
Keywords: Classification and Clustering, Machine Learning and Data Mining, Computer-Aided Diagnosis and Surgery Abstract: Breast cancer is the most commonly diagnosed form of cancer in women. Thermography, which uses cameras with sensitivities in the thermal infrared, has been shown to provide an interesting modality for detecting breast cancer as it is able to detect small tumors and hence can lead to earlier diagnosis. In this paper, we present an effective approach to breast thermogram analysis that utilises features describing bilateral symmetries from an image, and utilises a classifier ensemble for decision making. Importantly, our classification approach addresses the problem of imbalanced class distribution that is common in medical decision making. We do this by constructing feature subspaces from balanced data subsets and train different classifiers on different subspaces. To combine the individual classifiers, we use a neural network as classifier fuser. We show our approach to work well and to lead to significantly improved performance compared to canonical classifiers and classifier ensembles.

11:40-12:00, Paper ThBT4.2
Applying Textural Features to the Classification of HEp-2 Cell Patterns in IIF Images
Di Cataldo, Santa	Pol. di Torino
Bottino, Andrea	Pol. di Torino
Ficarra, Elisa	Pol. di Torino, Italy
Macii, Enrico	Pol. di Torino
Keywords: Classification and Clustering, Features and Image Descriptors, Computer-Aided Diagnosis and Surgery Abstract: The analysis of anti-nuclear antibodies in HEp-2 cells by indirect immunofluorescence (IIF) is fundamental for the diagnosis of important immune pathologies; in particular, classifying the staining pattern of the cell is critical for the differential diagnosis of several types of diseases. Current tests based on human evaluation are time-consuming and suffer from very high variability, which impacts on the reliability of the results. As a solution to this problem, in this work we propose a technique that performs automated classification of the staining pattern. Our method combines textural feature extraction and a two-step feature selection scheme to select a limited number of image attributes that are best suited to the classification purpose and then recognizes the staining pattern by means of a Support Vector Machine module. Experiments on IIF images showed that our method is able to identify staining patterns with average accuracy of about 87%.

12:00-12:20, Paper ThBT4.3
Classification of Biological Cells Using Bio-Inspired Descriptors
Bel haj ali, Wafa	I3S, CNRS/Univ. of Nice Sophia Antipolis
Piro, Paolo	Istituto Italiano di Tecnologia (IIT)
Giampaglia, Dario	I3S, CNRS/Univ. of Nice Sophia Antipolis
Pourcher, Thierry	CEA/Univ. of Nice Sophia Antipolis
Nock, Richard	Univ. des Antilles et de la Guyane
Barlaud, Michel	CNRS/Univ. of Nice-Sophia Antipolis
Keywords: Machine Learning and Data Mining, Classification and Clustering, Image and Video Processing Abstract: This paper proposes a novel automated approach for the categorization of cells in fluorescence microscopy images. Our supervised classification method aims at recognizing patterns of unlabeled cells based on an annotated dataset. First, the cell images need to be indexed by encoding them in a feature space. For this purpose, we propose tailored bio-inspired features relying on the distribution of contrast information. Then, a supervised learning algorithm is proposed for classifying the cells. We carried out experiments on cellular images related to the diagnosis of autoimmune diseases, testing our classification method on the HEp-2 Cells dataset of Foggia et al. (CBMS 2010). Results show classification precision larger than 96% on average, thus confirming the promise of our approach for the challenging task of cellular image classification for computer-aided diagnosis.

12:20-12:40, Paper ThBT4.4
Robust Regularized Feature Selection for Iris Recognition Via Linear Programming
Wang, Libin	Inst. of Automation, Chinese Acad. of Science
Sun, Zhenan	Inst. of Automation, Chinese Acad. of Sciences.
Tan, Tieniu	CASIA
Keywords: Biometrics, Feature Reduction and Manifold Learning, Machine Learning and Data Mining Abstract: Ordinal measures are robust image descriptors for encoding discriminative features of iris images. However, there are many tunable parameters in ordinal filters which can generate an over-complete feature pool. This paper proposes a novel feature selection method based on linear programming, which can learn a compact and effective ordinal feature set for iris recognition. Firstly, the large margin principle is employed to obtain strong generalization capacity. Secondly, discriminative information for each feature is added to make the model more robust to noise. Finally, non-negative weight is not only interpretable but also suitable for a linear model. Additionally, the model can be efficiently solved by the Simplex algorithm. The comparative experiments are conducted on the CASIA-Iris-V4 database, and the results show that our method outperforms other state-of-the-art feature selection methods, including Adaboost and sparsity-based methods.

12:40-13:00, Paper ThBT4.5
HEp-2 Cell Classification in IIF Images Using ShareBoost
Ersoy, Ilker	Univ. of Missouri Columbia
Bunyak, Filiz	Univ. of Missouri
Peng, Jing	Montclair State Univ.
Palaniappan, Kannappan	Univ. of Missouri
Keywords: Pattern Recognition for Bioinformatics, Classification and Clustering, Medical Image Analysis and Registration Abstract: Indirect immunofluorescence (IIF) imaging is a method used for detection of antinuclear auto-antibodies (ANA) for the diagnosis of autoimmune diseases. We present a feature extraction and classification scheme to classify the fluorescence staining patterns of HEp-2 cells in IIF images. We propose a set of complementary features that are sensitive to staining pattern variations among classes. Our feature set utilizes local shape measures via Hessian matrix, gradient features using our adaptive robust structure tensors and texture features. We apply our multi-view ShareBoost algorithm to this set using each feature descriptor as a separate view. ShareBoost utilizes a single re-sampling distribution for all views that helps the classifier to exploit the interplay between subspaces and is robust to noisy labels. Our experimental results show an average of over 90 percent accuracy in classification of six HEp-2 cell types.


ThBT5	Hall 300
Gesture and Action Analysis-I	Regular Session
Chair: Sugimoto, Akihiro	National Inst. of Informatics
Co-Chair: Wolf, Christian	INSA de Lyon

11:20-11:40, Paper ThBT5.1
Action Recognition with Discriminative Mid-Level Features
Liu, Cuiwei	Beijing Inst. of Tech.
Kong, Yu	SUNY at Buffalo
Wu, Xinxiao	Beijing Lab. of Intelligent Information Technology, School of
Jia, Yunde	Beijing Inst. of Tech.
Keywords: Gesture and Behavior Analysis Abstract: This paper presents a novel random forest learning framework to construct a discriminative and informative mid-level feature from low-level features. Since a single low-level feature based representation is not enough to capture the variations of human appearance, multiple low-level features (i.e., optical flow and histogram of gradient 3D features) are fused to further improve recognition performance. The mid-level feature is employed by a random forest classifier for robust action recognition. Experiments on two publicly available action datasets demonstrate that using both the mid-level feature and the fusion of multiple low-level features leads to a superior performance over previous methods.

11:40-12:00, Paper ThBT5.2
Robust 3D Human Pose Estimation Via Dual Dictionaries Learning
Ji, Hao	BUPT
Su, Fei	Beijing University of Posts and Telecommunications, Beijing, China
Keywords: Gesture and Behavior Analysis Abstract: In this paper, a new dual dictionaries learning (DDL) method is proposed for robust 3D human pose estimation. The performance and applicability of traditional methods are limited by a lack of robustness to corrupted observations caused by occlusions or poor background subtraction. Our DDL approach aims at simultaneously constructing two overcomplete dictionaries, called the visual observation dictionary (VBD) and the body configuration dictionary (BCD), with a shared sparse representation (SSR) regularization with respect to every data sample. Under such regularization, the two dictionaries are tied together and the 3D pose estimation problem can be reduced to a simple l1 optimization problem given a new test visual observation. We also propose an efficient algorithm based on the inexact Augmented Lagrange Multiplier (IALM) method to solve the above DDL optimization model. Experimental results on the HumanEva database show the superiority of our approach over several current state-of-the-art methods.

12:00-12:20, Paper ThBT5.3
Sparse Granger Causality Graphs for Human Action Classification
Yi, Saehoon	Rutgers Univ.
Pavlovic, Vladimir	Rutgers Univ.
Keywords: Gesture and Behavior Analysis, Classification and Clustering, Statistical, Syntactic and Structural Pattern Recognition Abstract: Basic understanding and recognition of human actions can be accomplished by modeling the spatiotemporal relationship among major skeletal joints. In this work we present an approach that models human actions using temporal causal relations of joint movements. The relations form a graph with joints as nodes and edges induced by the Granger causality measure between pairs of joint point processes. Each human action is then represented by a distinct sparse causality graph. Experiments on motion capture data illustrate the robustness of this approach and its advantages over state-of-the-art methods.

12:20-12:40, Paper ThBT5.4
Incorporating Contextual Knowledge to Dynamic Bayesian Networks for Event Recognition
Wang, Xiaoyang	RPI
Ji, Qiang	RPI
Keywords: Statistical, Syntactic and Structural Pattern Recognition, Pattern Recognition for Surveillance and Security, Classification and Clustering Abstract: This paper proposes a new Probabilistic Graphical Model (PGM) to incorporate the scene, event object interaction and the event temporal contexts into Dynamic Bayesian Networks (DBNs) for event recognition in surveillance videos. We first construct the event DBNs for modeling the events from their own appearance and kinematic observations, and then extend the DBN to incorporate the contexts for boosting event recognition performance. Unlike the existing context methods, our model incorporates various contexts into one unified model. Experiments on natural scene surveillance videos show that the contexts can effectively improve the event recognition performance even with great challenges like large intra-class variations and low image resolution.

12:40-13:00, Paper ThBT5.5
A Novel Probabilistic Approach Utilizing Clip Attribute As Hidden Knowledge for Event Recognition
Wang, Xiaoyang	RPI
Ji, Qiang	RPI
Keywords: Statistical, Syntactic and Structural Pattern Recognition, Pattern Recognition for Surveillance and Security, Classification and Clustering Abstract: This paper proposes a novel probabilistic approach to utilize clip attributes as hidden knowledge for event recognition. Event recognition in surveillance videos is very challenging due to its large intra-class variations and relatively low image resolution. The clip attributes, which are available only during training, provide auxiliary hidden information about the variation of the event appearance. Utilizing such hidden knowledge can help better model the joint probability distribution between an event and its observations, and thus improve the recognition performance. We propose a probabilistic model to systematically incorporate the clip attributes into the event recognition. Experiments on real surveillance data show improved event recognition performance with the use of the clip attributes.


ThPSBT1	Main Hall
Poster Shotgun (13): PR	Regular Session

14:00-14:30, Paper ThPSBT1.1
Metric Learning for Graph Based Semi-Supervised Human Pose Estimation
Pourdamghani, Nima	Sharif Univ. of Tech.
Rabiee, Hamid Reza	Sharif Univ. of Tech.
Zolfaghari, Mohammadreza	Sharif Univ. of Tech.
Keywords: Feature Reduction and Manifold Learning, Gesture and Behavior Analysis, Machine Learning and Data Mining Abstract: Discriminative approaches to human pose estimation have become popular in recent years. These approaches face a big challenge: similar inputs might correspond to very dissimilar poses. This property misleads mapping functions that rely on Euclidean distances in the input space. In this paper, we use the distances between the labels of the training data to learn a metric and map the input data to a space where this problem is minimized. Our mapping is linear and hence preserves the manifold structure of the input data. We benefit from the unlabeled data to estimate this manifold in the new space as a nearest neighbor graph. We finally utilize Tikhonov regularization to find a smooth estimation of the labels over this manifold. Experimental results show the superiority of the proposed method both in the amount of required training data and the performance of labeling.

14:00-14:30, Paper ThPSBT1.2
Prediction of Drowsy Driving by Monitoring Driver's Behavior
Matsuo, Haruo	Nissan Motor Co., Ltd.
Khiat, Abdelaziz	Nissan Motor Co., Ltd.
Keywords: Gesture and Behavior Analysis, Human Computer Interaction Abstract: In this paper, we present a system that monitors the increasing frequency of driver's subsidiary behavior as a revealing indicator of driver's decrease in arousal level that would cause drowsy driving. The considered involuntary subsidiary behavior events are yawning, self-touching hand motion and head twisting movement, which increased as a signal of struggle against sleepiness in a driving simulator experiment. We show that by combining this index with conventional indexes of eye closure rate and head sway, we obtained better results than conventional drowsiness detection systems. The results obtained in proving ground driving conditions were satisfactory and the timing of the generated feedback was acceptable to most of the volunteers who tested our prototype.

14:00-14:30, Paper ThPSBT1.3
Accelerated Robust Sparse Coding for Fast Face Recognition
Liu, Guanglu	Xiamen Univ.
Yan, Yan	Xiamen Univ.
Wang, Hanzi	Xiamen Univ.
Keywords: Biometrics Abstract: In this paper, we propose an accelerated robust sparse coding (ARSC) method which is based on the combination of linear regression based classification and robust sparse coding for fast face recognition. Firstly, linear regression based classification (LRC) is used to select the candidate face set which can effectively reduce the search scope. Then, robust sparse coding (RSC) is applied to perform accurate face identification in the selected face set. Extensive experimental results on various face databases show that ARSC can greatly reduce the computational complexity while achieving high accuracy performance.

14:00-14:30, Paper ThPSBT1.4
Accurate Iris Localization Using Contour Segments
Li, Haiqing	Chinese Acad. of Sciences
Sun, Zhenan	Inst. of Automation, Chinese Acad. of Sciences.
Tan, Tieniu	casia
Keywords: Biometrics, Detection, Separation and Segmentation Abstract: We consider the problem of locating pupillary and limbic boundaries in iris images captured in non-cooperative environments. This work presents an efficient segment search algorithm, which takes advantage of shape information and learned iris boundary detectors, to enable exclusion of most noisy edges and extraction of genuine pupillary contour segments. Pupillary boundaries can then be accurately fitted as ellipses using the extracted segments. To locate limbic boundaries more stably, the shapes of pupillary boundaries constrain limbic boundary localization by adding inferred points during ellipse fitting. Extensive experiments on the challenging CASIA-Iris-Thousand iris image database demonstrate the effectiveness and efficiency of the proposed method.

14:00-14:30, Paper ThPSBT1.5
A Multiple Kernel Learning Framework for Detecting Altered Fingerprints
Tiribuzi, Michela	Univ. of Perugia
Pastorelli, Marco	Univ. of Perugia
Valigi, Paolo	Univ. of Perugia
Ricci, Elisa	Univ. of Perugia
Keywords: Biometrics, Classification and Clustering Abstract: The accurate performance achieved by current biometric recognition systems based on automated fingerprint analysis has induced criminals to evade system identification by altering their fingerprints on purpose. In this paper, we propose a novel approach for detecting altered fingerprints. Our method is based on the combination of multiple complementary features, such as minutiae density maps and orientation entropy features, describing the discontinuity of the orientation field at multiple scales. Unlike previous works, we propose to learn the correct weights of different features by adopting a Multiple Kernel Learning framework to enhance the discriminative power of an SVM classifier. Experimental results demonstrate that the proposed approach achieves competitive performance with state-of-the-art methods.

14:00-14:30, Paper ThPSBT1.6
Matching of Multi-Resolution Image for Remote Sensing Glacier Detection
Guermazi, Ahmed	Univ. of Savoie
Valet, Lionel	Univ. of Savoie
Bolon, Philippe	Pol. / Univ. of Savoie
Keywords: Classification and Clustering, Remote Sensing Abstract: Multi-resolution images provide a rich source of information, but combining their data is still difficult due to their different characteristics. In this paper, a multi-source data fusion approach is presented. A common space for information representation is generated by selecting points of interest that are then connected through a Delaunay triangulation. The obtained topology allows the classification of landforms based on all the available attributes. This approach is illustrated on glacier detection in the Alps.

14:00-14:30, Paper ThPSBT1.7
Exploring the Effects of Video Length on Gait Recognition
Martín-Félez, Raúl	Univ. Jaume I (Castellón)
Ortells, Javier	Univ. Jaume I de Castelló
Mollineda, Ramón A.	Univ. Jaume I
Keywords: Statistical, Syntactic and Structural Pattern Recognition, Classification and Clustering, Biometrics Abstract: This paper delves into the effectiveness of a gait recognition process depending on the length of the video sequence used. To this end, a well-known gait representation, the Gait Energy Image (GEI), is incrementally computed from gait cycles in the order they occur. The main objective is to assess the problem of the minimum number of gait cycles required to obtain discriminant GEIs. An experimental study is conducted on two public databases covering both indoor and outdoor environments, and variable covariate conditions. Results show that a few gait cycles are enough to succeed in gait recognition.
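As an illustration of computing a GEI incrementally from gait cycles in the order they occur, the following minimal sketch keeps a running pixel-wise mean of binary silhouettes. It assumes the silhouettes are already segmented, aligned, and of equal size; the function name `update_gei` and the list-of-lists image layout are hypothetical choices for this sketch, not details taken from the paper.

```python
def update_gei(gei, count, cycle_frames):
    """Fold the silhouettes of one more gait cycle into a running-mean GEI.

    gei          -- current GEI as a list of rows of floats, or None before any frame
    count        -- number of frames already averaged into gei
    cycle_frames -- aligned binary silhouettes (lists of rows of 0/1) of one cycle
    Returns the updated (gei, count) pair.
    """
    for frame in cycle_frames:
        count += 1
        if gei is None:
            # The first frame initialises the running mean.
            gei = [[float(p) for p in row] for row in frame]
        else:
            # Incremental mean: m_k = m_{k-1} + (x_k - m_{k-1}) / k
            for r, row in enumerate(frame):
                for c, p in enumerate(row):
                    gei[r][c] += (p - gei[r][c]) / count
    return gei, count


# Two toy 2x2 silhouettes forming one (hypothetical) gait cycle.
cycle = [[[1, 0], [0, 1]], [[1, 1], [0, 0]]]
gei, n = update_gei(None, 0, cycle)   # GEI after the first cycle
gei, n = update_gei(gei, n, cycle)    # GEI after a second cycle
```

Calling `update_gei` once per detected cycle yields the sequence of GEIs whose discriminative power could then be compared as a function of the number of cycles used.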

14:00-14:30, Paper ThPSBT1.8
Online Adaptive Learning for Multi-Camera People Counting
Li, Jingwen	Chinese Acad. of Sciences
Huang, Lei	Inst. of Automation, Chinese Acad. of Sciences
Liu, Changping	Inst. of Automation, Chinese Acad. of Sciences
Keywords: Pattern Recognition for Surveillance and Security, Motion, Tracking and Video Analysis Abstract: People counting has attracted much attention in video surveillance. This paper proposes an online adaptive learning people counting system across multiple cameras with partially overlapping fields of view (FOVs). The main novelties of this system are: 1) we propose an online adaptive learning scheme to detect and count people in order to make the system adaptive to various scenes. The system can update the Gaussian Mixture Model (GMM) based classifier online by automatically collecting samples with high confidence; 2) we present an approach to gather the number of people from multiple cameras. The system uses similarity measurement combined with homography transformation to find the corresponding people in overlapping FOVs and finally integrates the counting results of multiple cameras. Experimental results show that the proposed system can adapt to different scenes and count the pedestrians across multiple cameras accurately.

14:00-14:30, Paper ThPSBT1.9
Recognition of Facial Expressions Using Locally Weighted and Adjusted Order Pseudo Zernike Moments
Ahmady, Maryam	Islamic Azad Univ. Qazvin Branch
Rashidy Kanan, Hamidreza	Islamic Azad Univ. Qazvin Branch
Keywords: Gesture and Behavior Analysis, Statistical, Syntactic and Structural Pattern Recognition, Human Computer Interaction Abstract: Recently, various approaches to facial expression recognition have been proposed, but they do not provide a powerful way to recognize expressions from partially occluded facial images. Moreover, they are usually global, and the different areas of a facial image are treated as equally important. In this paper, we propose a novel facial expression recognition approach based on locally weighted and adjusted order Pseudo Zernike Moments (PZM). PZM is one of the best descriptors that are robust to noise and rotation. In our system, the proposed method employs a local PZM to represent faces partitioned into patches. In addition, the maximum order of PZM is adjusted based on the importance of the local areas. An extensive experimental investigation is conducted using the JAFFE, FG-Net and Radboud Faces databases. The encouraging experimental results demonstrate that the proposed method significantly improves on other methods. Moreover, our system is robust to changes in age, ethnicity, and gender.

14:00-14:30, Paper ThPSBT1.10
Robust Object Recognition Via Third-Party Collaborative Representation
Wu, Yang	Kyoto Univ.
Minoh, Michihiko	Kyoto Univ.
Mukunoki, Masayuki	Kyoto Univ.
Lao, Shihong	OMRON Social Solutions Co., LTD
Keywords: Classification and Clustering, Pattern Recognition for Surveillance and Security, Pattern Recognition for Search, Retrieval and Visualization Abstract: A simple and effective method is proposed for object recognition via collaborative representation with ridge regression. Unlike existing sparse representation and collaborative representation based approaches, the proposal does not need extensive training samples for each testing class and it is robust to localization errors and large within-class variations, thus being applicable to various real-world object recognition tasks instead of handling only the well-controlled face recognition problem. Its discriminative power is explored from a third-party dataset which can be different from the training and testing datasets. It therefore enables using an existing dictionary for testing new data without time-consuming data annotation and model re-training. As an example, the proposal is extensively tested on the representative and very challenging task of person re-identification, defining novel state-of-the-art results on widely adopted benchmark datasets using only simple and common features.

14:00-14:30, Paper ThPSBT1.11
Iris Image Classification Based on Color Information
Zhang, Hui	Shanghai Inst.
Sun, Zhenan	Inst. of Automation, Chinese Acad. of Sciences.
Tan, Tieniu	casia
Wang, Jianyu	SITP
Keywords: Biometrics, Segmentation, Color and Texture, Features and Image Descriptors Abstract: Iris recognition systems using iris images captured in visible light have several advantages compared to those using near infrared (NIR) images, and draw attention from biometrics researchers. The acquisition of color iris images does not require special cameras, and preserves the color information of the iris. The color information can be used as an important clue for iris classification, which improves the performance of iris recognition on non-ideal iris images. In this paper, we propose a novel color feature for iris classification, named iris color Texton, using the RGB, HSI and lαβ color spaces. Extensive experiments are performed on three databases. The proposed iris color Texton shows advantages in iris image classification based on color information.

14:00-14:30, Paper ThPSBT1.12
Tablet Owner Authentication Based on Behavioral Characteristics of Multi-Touch Actions
Nakamura, Kumi	Osaka Univ.
Kono, Kazuhiro	Kansai Univ.
Ito, Yoshimichi	Osaka Univ.
Babaguchi, Noboru	Osaka Univ.
Keywords: Pattern Recognition for Bioinformatics Abstract: In this paper, we propose a method for tablet owner authentication based on behavioral characteristics of multi-touch actions. In recent years, password authentication has been generally used in tablet-type devices that have multi-touch screens as input interfaces. However, it is inappropriate because tablet-type devices are smaller than desktop/laptop computers, and entering a password on them is difficult. It would be more appropriate to use the multi-touch screen for owner authentication of tablet-type devices, and thus authentication using multiple fingers' actions, called multi-touch actions, is quite natural. The proposed method is based on dynamic time warping, which has been commonly used for authentication using pen-tablet or single-finger actions, but the use of multi-touch actions raises a problem, namely finding the correspondence between trajectories and fingers. A procedure for solving this problem is also presented in this paper. Using the proposed method, we evaluate the authentication accuracies for several types of multi-touch actions through experiments from the viewpoint of equal error rate (EER). Experimental results show that the EER of authentication by drawing zig-zag shapes is the best among multi-touch actions and is about 8.3%.
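For readers unfamiliar with dynamic time warping, the following is a minimal textbook sketch of the DTW distance between two single-finger trajectories given as lists of (x, y) points. It is not the authors' procedure: the trajectory-to-finger correspondence step described in the abstract is not shown, and the function name `dtw_distance` and the Euclidean local cost are assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two sequences of (x, y) points."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost: Euclidean distance between the two touch points.
            d = math.hypot(a[i - 1][0] - b[j - 1][0], a[i - 1][1] - b[j - 1][1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# The same stroke sampled at different speeds warps to zero distance.
enrolled = [(0, 0), (1, 0)]
attempt = [(0, 0), (0, 0), (1, 0)]
print(dtw_distance(enrolled, attempt))  # 0.0
```

In an authentication setting, such a distance would typically be compared against a threshold chosen on enrollment data; sweeping that threshold is what produces the false-accept and false-reject rates from which an EER is read off.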

14:00-14:30, Paper ThPSBT1.13
A New Statistical Model for Activity Discovery and Recognition in Pervasive Environments
Chikhaoui, Belkacem	Univ. of Sherbrooke
Wang, Shengrui	Univ. of Sherbrooke
Pigot, Hélène	Univ. of Sherbrooke
Keywords: Machine Learning and Data Mining, Classification and Clustering, Gesture and Behavior Analysis Abstract: This paper presents a new unsupervised statistical model for human activity discovery and recognition in pervasive environments. The activities are encoded in sequences recorded by non-intrusive sensors disseminated in the environment. Our model studies the relationship between the activities and the sequential patterns from the sequence analysis perspective. Activity discovery is formulated as an optimization problem which is solved by maximization of the likelihood of data. We present experimental results on real datasets recorded in smart homes for persons performing their activities of daily living. The results obtained demonstrate the suitability of our model for activity discovery and recognition and how it outperforms most of the widely used approaches.

14:00-14:30, Paper ThPSBT1.14
A Ranking-Based Cascade Approach for Unbalanced Data
Bria, Alessandro	Univ. of Cassino and L. M.
Marrocco, Claudio	Univ. degli Studi di Cassino e del Lazio Meridionale
Molinara, Mario	Univ. degli Studi di Cassino e del Lazio Meridionale
Tortorella, Francesco	Univ. degli Studi di Cassino e del Lazio Meridionale
Keywords: Statistical, Syntactic and Structural Pattern Recognition, Classification and Clustering, Machine Learning and Data Mining Abstract: In this paper we present a cascade-based framework for object detection in which the node classifiers are trained by a learning algorithm based on ranking instead of classification error. Such an approach is particularly suited to handling the asymmetry between the positive and negative classes, which is a major problem in object detection applications. Previously proposed methods that focus on this problem, such as AsymBoost, rely on an asymmetric weight updating mechanism of the samples based on a parameter k which estimates the degree of skewing between the classes. In practice, this parameter is difficult to choose and requires significant tuning during the training phase. In contrast, our approach is nonparametric and has been shown to provide slightly better performance than AsymBoost on a real detection problem.

14:00-14:30, Paper ThPSBT1.15
Statistical Origin-Destination Generation with Multiple Sources
Morimura, Tetsuro	IBM Res. - Tokyo
Kato, Sei	IBM Res. - Tokyo
Keywords: Machine Learning and Data Mining, Statistical, Syntactic and Structural Pattern Recognition, Gesture and Behavior Analysis Abstract: Any trajectory is always generated with its origin and destination. Origin-destination (OD) generation for trips plays an important role in many applications such as trajectory mining, traffic simulation, or marketing. In previous work on traffic pattern recognition, microscopic ODs for limited areas are estimated with probe-car data, while macroscopic ODs for broad areas are usually generated by using road-traffic-census data. In this paper, we propose a microscopic OD determination method for broad areas with the same data and landmark information, which is based on an L1-regularized Poisson regression. We demonstrate performance improvements over baseline methods in numerical experiments with a massive data set from Tokyo.

14:00-14:30, Paper ThPSBT1.16
Protein Structure Similarity Based on Multi-View Images Generated from 3D Molecular Visualization
Suryanto, Chendra Hadi	Univ. of Tsukuba
Jiang, Shukun	Univ. of Tsukuba
Fukui, Kazuhiro	Univ. of Tsukuba
Keywords: Pattern Recognition for Bioinformatics, Classification and Clustering, Statistical, Syntactic and Structural Pattern Recognition Abstract: Comparing the structures of proteins is one of the most challenging problems in structural biology. Root Mean Square Distance (RMSD) has become a standard measurement to calculate the similarity between two protein structures. However, to get the best result one has to align and superpose the two protein structures, which raises issues related to finding the best alignment technique. In this paper, we propose a new approach to protein structure comparison using canonical angles between two subspaces generated from multiple views of the protein structure visualization. The main advantage of our approach is that no protein alignment is required. Moreover, since we also consider the various visualization types of the 3D protein structures (backbone, ribbons, and rockets), our protein descriptors contain more elaborate structures and characteristics of the protein, which possibly cannot be represented by only a single visualization geometry. The validity of our proposed method is shown by experiments on classifications of four classes of protein in which our approach exhibited better performance than the two well-known methods of combinatorial extension alignment and the Gauss integral tuning.

14:00-14:30, Paper ThPSBT1.17
Learning to Detect Traffic Signs: Comparative Evaluation of Synthetic and Real-World Datasets
Møgelmose, Andreas	Aalborg Univ.
Trivedi, Mohan	Univ. of California, San Diego
Moeslund, Thomas	Aalborg Univ.
Keywords: Machine Learning and Data Mining, Pattern Recognition for Surveillance and Security Abstract: This study compares the performance of sign detection based on synthetic training data to the performance of detection based on real-world training images. Viola-Jones detectors are created for 4 different traffic signs with both synthetic and real data, and varying numbers of training samples. The detectors are tested and compared. The result is that while others have successfully used synthetic training data in a classification context, it does not seem to be a good solution for detection. Even when the synthetic data covers a large part of the parameter space, it still performs significantly worse than real-world data.

14:00-14:30, Paper ThPSBT1.18
Human Action Recognition Using Action Trait Code
Lin, Shih-Yao	National Taiwan Univ.
Shie, Chuen Kai	National Taiwan Univ.
Chen, Shen-Chi	National Taiwan Univ.
Lee, Ming-Sui	National Taiwan Univ.
Hung, Yi-Ping	National Taiwan Univ.
Keywords: Gesture and Behavior Analysis, Classification and Clustering Abstract: Recognizing actions that have similar movements is a challenging problem. We divide the human action understanding task into two issues. One is a classical action recognition task, where we employ a probabilistic model to learn and recognize human actions. The second is an action categorization task, where we classify actions based on quantized human movement. In this paper, we present an approach called Action Trait Code (ATC) for human action classification. ATC represents an action with a set of velocity types derived from the average velocity of each body part. An effective graph model based on ATC classification is employed for learning and recognizing human actions. To examine recognition accuracy, we evaluate our approach on the Cornell Kinect Activity Database and compare it with a hierarchical maximum entropy Markov model (MEMM). In addition, the results on a self-collected action database demonstrate that our proposed approach not only achieves high recognition accuracy but also runs in real time.

14:00-14:30, Paper ThPSBT1.19
Combining Gradient Histograms Using Orientation Tensors for Human Action Recognition
Almeida Perez, Eder	Univ. Federal de Juiz de Fora
Fernandes Mota, Virginia	Univ. Federal de Juiz de Fora
Maciel, Luiz Maurílio	Univ. Federal de Juiz de Fora
Sad, Dhiego	Univ. Federal de Juiz de Fora
Bernardes Vieira, Marcelo	Univ. Federal de Juiz de Fora
Keywords: Pattern Recognition for Search, Retrieval and Visualization, Image and Video Understanding Abstract: We present a method for human action recognition based on the combination of Histograms of Gradients into orientation tensors. It uses only information from HOG3D: no features or points of interest are extracted. The resulting raw histograms obtained per frame are combined into an orientation tensor, making it a simple, fast to compute and effective global descriptor. The addition of new videos and/or new action categories does not require any recomputation or changes to the previously computed descriptors. Our method reaches a 92.01% recognition rate on KTH, comparable to the best local approaches. For the Hollywood2 dataset, our recognition rate is lower than that of local approaches but is fairly competitive, making the method suitable when the dataset is frequently updated or response time is a major application issue.

14:00-14:30, Paper ThPSBT1.20
Discovering Regular and Consistent Behavioral Patterns in Topical Tweeting
Dey, Lipika	Tata Consultancy Services
Gaonkar, Bhakti	Tata Consultancy Services
Keywords: Classification and Clustering, Pattern Recognition for Search, Retrieval and Visualization Abstract: The study of user activity and information in microblogging sites like Twitter has gained momentum as a way to provide real insights into user influence, action prediction, and information flow optimization. In this paper we present a wavelet-based clustering mechanism that can group users according to their temporal activity profiles. Our study establishes that users of different professions with different objectives can be effectively segregated using temporal profile clustering.

14:00-14:30, Paper ThPSBT1.21
Learning to Predict Super Resolution Wavelet Coefficients
Kumar, Neeraj	Indian Inst. of Tech. Guwahati
Rai, Naveen Kumar	Indian Inst. of Tech. Guwahati
Sethi, Amit	Indian Inst. of Tech. Guwahati
Keywords: Neural Networks, Image and Video Processing Abstract: We develop a wavelet domain learning based technique for single image super resolution (SI SR). First, we learn a mapping between a patch of approximate coefficients (ACs) and the detail coefficients (DCs) corresponding to the center location of the patch using Neural Networks. We then obtain an SR image by using an approximate version of the original image (scaled as per the DWT size requirements of the final image) as ACs and by predicting the corresponding DCs using the mapping thus learnt. Our results compare favorably to both mature techniques and other state-of-the-art learning based techniques.

14:00-14:30, Paper ThPSBT1.22
Breath Rate Monitoring During Sleep Using Near-IR Imagery and PCA
Martinez, Manuel	Karlsruhe Inst. of Tech.
Stiefelhagen, Rainer	Karlsruhe Inst. of Tech. & Fraunhofer IOSB,Karlsruhe
Keywords: Gesture and Behavior Analysis, Remote Sensing, Medical Image Analysis and Registration Abstract: We present a vision based method to estimate the respiration rate of subjects from their chest movements. In contrast to alternative approaches, our method is fully automated, non-invasive, robust to occlusions, and only depends on off-the-shelf hardware. We project a fixed infrared (IR) dot pattern. The dots are detected using a camera with a matching IR filter. We estimate the dots' barycenters with sub-pixel precision and we track them over a 30-second sliding window. We merge all trajectories using Principal Component Analysis (PCA) and use Autoregressive (AR) Spectral Analysis to estimate the respiratory rate. The system was evaluated on 9 subjects and on a range of simulated scenarios using an artificial chest.

14:00-14:30, Paper ThPSBT1.23
Importance-Weighted Label Prediction for Active Learning with Noisy Annotations
Zhao, Liyue	Univ. of Central Florida
Sukthankar, Gita	Univ. of Central Florida
Sukthankar, Rahul	Google
Keywords: Machine Learning and Data Mining Abstract: This paper presents a practical method for pool-based active learning that is robust to annotation noise. Our work is inspired by recent approaches to active learning in two different noise-free settings: importance-weighted methods for streams and unbiased pool-based techniques. In our proposed method, we employ an ensemble of classifiers to guide the label requests from a pool of unlabeled training data. We demonstrate, using several standard datasets, that the proposed approach, which employs label prediction in combination with importance-weighting, significantly improves active learning in the presence of annotation noise. Moreover, the ease with which the proposed method can be implemented should make it widely applicable to a broad range of real-world applications.

14:00-14:30, Paper ThPSBT1.24
KL Based Data Fusion for Target Tracking
Peng, Jing	Montclair State Univ.
Palaniappan, Kannappan	Univ. of Missouri
Candemir, Sema	Univ. of Missouri
Seetharaman, Guna	AFRL/RITB
Keywords: Statistical, Syntactic and Structural Pattern Recognition, Motion, Tracking and Video Analysis Abstract:

14:00-14:30, Paper ThPSBT1.25
Logo Spotting for Document Categorization
Le, Viet Phuong	La Rochelle Univ.
Visani, Muriel	Univ. of La Rochelle
Tran, Cao De	Can Tho Univ.
Ogier, Jean-Marc	Univ. de la Rochelle
Keywords: Pattern Recognition for Search, Retrieval and Visualization, Graphics Recognition Abstract: Logo spotting is of great interest because it enables categorizing the document images of a digital library of scanned documents according to their sources, without any costly semantic analysis of their textual transcript. In this paper, we present an approach for logo spotting, based on the matching of keypoints extracted both from the query document images and a given set of logos (gallery) using SIFT. In order to filter the matching points and keep only the most relevant, we compare the spatial distribution of the matching keypoints in the query image and in the logo gallery. We test our approach on a large collection of real world documents using a well-known benchmark database of logos and show that our approach achieves good performance compared to state-of-the-art approaches.

14:00-14:30, Paper ThPSBT1.26
Flow Modeling and Skin-Based Gaussian Pruning to Recognize Gestural Actions Using HMM
Ahmad, Omer Rashid	IESK, OvG Univ. Magdeburg Germany
Al-Hamadi, Ayoub	IESK, Otto-von-Guericke-Univ. Magdeburg
Keywords: Gesture and Behavior Analysis, Classification and Clustering, Motion, Tracking and Video Analysis Abstract: In this paper, we propose a novel approach to recognize human hand/arm actions in the context of gesture recognition. The main idea is to model the flow information through a mixture of Gaussians, perform skin-based Gaussian pruning, and compute inter-level linking of non-pruned Gaussians using Kullback-Leibler (KL) divergence. Next, we compute the temporal features from the matched Gaussians, which are classified by a Hidden Markov Model (HMM) to recognize the gestural action. The proposed approach is tested on six gestural actions taken in real situations and achieves 98% recognition. In addition, we perform a comparative analysis of different matching approaches, in which the KL divergence outperforms the others.
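To make the idea of linking Gaussians by KL divergence concrete, here is a minimal sketch using the closed-form KL divergence between two univariate Gaussians and a nearest-match rule. The abstract does not state the dimensionality of the Gaussians or the exact linking rule, so the univariate case, the symmetrised divergence, and the names `kl_gauss` and `link_gaussian` are all assumptions made for illustration.

```python
import math


def kl_gauss(mu0, sigma0, mu1, sigma1):
    """KL( N(mu0, sigma0^2) || N(mu1, sigma1^2) ) for univariate Gaussians."""
    return (math.log(sigma1 / sigma0)
            + (sigma0 ** 2 + (mu0 - mu1) ** 2) / (2.0 * sigma1 ** 2)
            - 0.5)


def link_gaussian(query, candidates):
    """Return the index of the candidate closest to query under symmetrised KL.

    query and every candidate are (mu, sigma) pairs; symmetrising the
    divergence is an illustrative choice, not something stated in the abstract.
    """
    def sym_kl(p, q):
        return kl_gauss(p[0], p[1], q[0], q[1]) + kl_gauss(q[0], q[1], p[0], p[1])

    return min(range(len(candidates)), key=lambda i: sym_kl(query, candidates[i]))


# A component at one level is linked to its nearest component at the next level.
previous = (0.0, 1.0)
current_level = [(5.0, 1.0), (0.2, 1.1), (-3.0, 2.0)]
print(link_gaussian(previous, current_level))  # 1
```

For full-covariance multivariate components the same closed form generalises with traces and determinants of the covariance matrices; that version is omitted here to keep the sketch short.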

14:00-14:30, Paper ThPSBT1.27
Unsupervised Domain Adaptation of Virtual and Real Worlds for Pedestrian Detection
Vázquez Bermúdez, David	CVC-UAB
López Peña, Antonio M.	CVC-UAB
Ponsa, Daniel	CVC-UAB
Keywords: Machine Learning and Data Mining, Detection, Separation and Segmentation, Classification and Clustering Abstract: Vision-based object detectors are crucial for different applications. They rely on learnt object models. Ideally, we would like to deploy our vision system in the scenario where it must operate, and lead it to self-learn how to distinguish the objects of interest, i.e., without human intervention. However, the learning of each object model requires labelled samples collected through a tiresome manual process. For instance, we are interested in exploring the self-training of a pedestrian detector for driver assistance systems. Our first approach to avoid manual labelling consisted in the use of samples coming from realistic computer graphics, so that their labels are automatically available. This would make possible the desired self-training of our pedestrian detector. However, as we showed in Vazquez et al., there may be a dataset shift between virtual and real worlds. In order to overcome it, we propose the use of unsupervised domain adaptation techniques that avoid human intervention during the adaptation process. In particular, this paper explores the use of the transductive SVM (T-SVM) learning algorithm in order to adapt virtual and real worlds for pedestrian detection.

14:00-14:30, Paper ThPSBT1.28
Correcting Pose Estimation with Implicit Occlusion Detection and Rectification
Radwan, Ibrahim	Univ. of Canberra
Dhall, Abhinav	Australian National Univ.
Goecke, Roland	Univ. of Canberra
Keywords: Pattern Recognition for Surveillance and Security, Occlusion and Shadow Detection, Gesture and Behavior Analysis Abstract: Recently, articulated pose estimation methods based on the pictorial structure framework have received much attention in computer vision. However, the performance of these approaches has been limited due to the presence of self-occlusion. This paper deals with the problem of handling self-occlusion in the pictorial structure framework. We propose an exemplar-based framework for implicit occlusion detection and rectification. Our framework can be applied as a general post-processing plug-in following any pose estimation approach to rectify errors due to self-occlusion and to improve the accuracy. The proposed framework outperforms a state-of-the-art pictorial structure approach for human pose estimation on the HumanEva dataset.

14:00-14:30, Paper ThPSBT1.29
A Comparison Study on Appearance-Based Object Recognition
Hsu, Gee-Sern	National Taiwan Univ. of Science and Tech.
Truong, Loc	National Taiwan Univ. of Science and Tech.
Chung, Sheng-Lun	National Taiwan Univ. of Science and Tech.
Keywords: Machine Learning and Data Mining, Classification and Clustering, Pattern Recognition for Search, Retrieval and Visualization Abstract: Appearance-based methods are mostly exploited in the recognition of specific objects, especially faces, while methods with local features are often applied to the recognition of generic objects. Only a few works report the performance of appearance-based methods applied to generic object recognition. This paper offers a comparison study to extend our understanding in this regard. The appearance features considered include those extracted by PCA, DCT, and Gabor Transformation, and the classifiers include kNN, LDA, Naive Bayes, artificial neural networks and support vector machines. We assume that the objects in the training data can be segmented manually, but those in the test data must be segmented automatically. Therefore, a view-based segmentation approach is proposed to meet this requirement. Experiments were conducted on the COIL-100 database to determine which pair of appearance feature and classifier yields the best performance.

14:00-14:30, Paper ThPSBT1.30
Label-Noise Reduction with Support Vector Machines
Fefilatyev, Sergiy	Univ. of South Florida
Shreve, Matthew	Univ. of South Florida
Kramer, Kurt	Univ. of South Florida
Hall, Larry	Univ. of South Florida
Goldgof, Dmitry	Univ. of South Florida
Kasturi, Rangachar	Univ. of South Florida
Daly, Kendra	Univ. of South Florida
Remsen, Andrew	Coll. of Marine Science, Univ. of South Florida
Bunke, Horst	Univ. of Bern
Keywords: Machine Learning and Data Mining Abstract: The problem of detection of label-noise in large datasets is investigated. We consider applications where data is susceptible to label error and a human expert is available to verify a limited number of such labels in order to cleanse the data. We show the support vectors of a Support Vector Machine (SVM) contain almost all of these noisy labels. Therefore, the verification of support vectors allows efficient cleansing of the data. Empirical results are presented for two experiments. In the first experiment, two datasets from the character recognition domain are used and artificial random noise is applied in their labeling. In the second experiment, a large dataset of plankton images, that contains inadvertent human label error, is considered. It is shown that up to 99% of all label-noise from such datasets can be detected by verifying just the support vectors of the SVM classifier.

14:00-14:30, Paper ThPSBT1.31
Subspace Segmentation with Minimal Squared Frobenius Norm Representation
Wei, Siming	Zhejiang Univ.
Yu, Yizhou	The Univ. of Hong Kong
Keywords: Classification and Clustering, Motion, Tracking and Video Analysis, Pattern Recognition for Surveillance and Security Abstract: We introduce a novel subspace segmentation method called Minimal Squared Frobenius Norm Representation (MSFNR). MSFNR performs data clustering by solving a convex optimization problem. We theoretically prove that in the noiseless case, MSFNR is equivalent to the classical Factorization approach and always classifies data correctly. In the noisy case, we show that on both synthetic and real-world datasets, MSFNR is much faster than most state-of-the-art methods while achieving comparable segmentation accuracy.

14:00-14:30, Paper ThPSBT1.32
Discriminative and Generative Vocabulary Tree for Vein Image Recognition
Wang, Jinjun	Epson Res. and Development
Xiao, Jing	Epson Res. and Development
Keywords: Biometrics, Classification and Clustering, Pattern Recognition for Search, Retrieval and Visualization Abstract: Vein image recognition based on modeling the shape or geometrical layout of feature points is a generative approach, and its performance is usually limited by segmentation error due to poor vein image quality. This paper instead proposes to model the discriminative appearance of local image patches using the vocabulary tree model. The discriminative approach is further extended to consider the geometrical alignment error of feature points under Bayesian inference theory, thus making the proposed algorithm both discriminative and generative. Experimental results clearly show the superior performance of our method over either generative or discriminative approaches. In addition, both the discriminative and the generative parts of the method are implemented using the same vocabulary tree model, which makes our algorithm generic and efficient for other similar problems.

14:00-14:30, Paper ThPSBT1.33
Facial Expression Classification on Web Images
Richter, Matthias	Karlsruhe Inst. of Tech.
Gehrig, Tobias	Karlsruhe Inst. of Tech. (KIT)
Ekenel, Hazim Kemal	Karlsruhe Inst. of Tech.
Keywords: Classification and Clustering, Machine Learning and Data Mining, Features and Image Descriptors Abstract: In this paper, we present a novel database, which is obtained from the web. It contains 4761 manually labeled images of seven basic expressions performed by a large number of subjects of different gender, age and ethnicity. Furthermore, we develop feature descriptors based on the discrete cosine transform (DCT), local binary patterns (LBP), and Gabor filters, which share a uniform formulation in terms of regions around key points. We explore several strategies to find an optimal selection of these key points. The system achieves 86.2%, 85.9% and 84.4% accuracy on the web image database using the Gabor, LBP, and DCT descriptors, respectively.

14:00-14:30, Paper ThPSBT1.34
Shift Invariance and Border Distortion in Wavelet-Based Electricity Load Forecasting
Rana, Mashud	Univ. of Sydney
Koprinska, Irena	Univ. of Sydney
Keywords: Machine Learning and Data Mining, Neural Networks Abstract: We consider a wavelet-based approach for electricity load prediction. The wavelet transform is used to decompose the load into different frequency components that are predicted separately using machine learning algorithms. We compare the performance of the standard wavelet transform, which is shift-variant, with a non-decimated transform, which is shift-invariant. Our results show that the use of a shift-invariant transform considerably improves the prediction accuracy. We also propose a new approach for signal extension which minimizes the border distortion when decomposing the data. It shows superior performance in comparison to three standard methods. Our evaluation is done using two years of Australian electricity load data.
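The difference between a shift-variant and a shift-invariant transform can be illustrated with a minimal sketch. The abstract does not say which wavelet or implementation the authors used; the one-level Haar detail filter, the circular boundary handling, and all function names below are assumptions chosen only for illustration.

```python
def haar_detail_decimated(x):
    """One-level Haar detail coefficients with downsampling by 2 (shift-variant)."""
    return [(x[i] - x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def haar_detail_undecimated(x):
    """One-level Haar detail coefficients at every sample, with circular
    wrap-around at the border (non-decimated, shift-invariant)."""
    n = len(x)
    return [(x[i] - x[(i + 1) % n]) / 2 for i in range(n)]


def shift(x, k):
    """Circularly shift the list x to the right by k positions."""
    k %= len(x)
    return x[-k:] + x[:-k] if k else list(x)


signal = [0, 0, 4, 0, 0, 0, 0, 0]
shifted = shift(signal, 1)

# Decimated: shifting the input by one sample changes the coefficients
# themselves, not just their positions.
dec_original = haar_detail_decimated(signal)
dec_shifted = haar_detail_decimated(shifted)

# Undecimated: the coefficients of the shifted input are simply the
# shifted coefficients of the original input.
und_original = haar_detail_undecimated(signal)
und_shifted = haar_detail_undecimated(shifted)
```

In this toy case the decimated coefficients change sign when the input moves by one sample, while the undecimated coefficients only move with it. A predictor trained on the coefficients may therefore benefit from the undecimated form.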

14:00-14:30, Paper ThPSBT1.35
Group Expression Intensity Estimation in Videos Via Gaussian Processes
Dhall, Abhinav	Australian National Univ.
Goecke, Roland	Univ. of Canberra
Keywords: Gesture and Behavior Analysis Abstract: Facial expression analysis has been a very active field of research in recent years. This paper proposes a method for finding the apex of an expression, e.g. happiness, in a video containing a group of people based on expression intensity estimation. The proposed method is directly applied to video summarisation based on group happiness and timestamps; further, a novel Gaussian Process Regression based expression intensity estimation method is described. To demonstrate its performance, experiments on smile intensity estimation are performed and compared to other regression based techniques. The smile intensity estimator is extended to group happiness intensity estimation. The proposed intensity estimator can be extended easily for other expressions. The experiments are performed on an 'in the wild' dataset. Quantitative results are presented for comparison of our happiness-intensity detector. A user study was also conducted to verify the results of the proposed method.

14:00-14:30, Paper ThPSBT1.36
An Active Learning Approach to Frequent Itemset-Based Text Clustering
Marcacini, Ricardo Marcondes	Univ. of São Paulo
Corrêa, Geraldo Nunes	Univ. of São Paulo - USP - São Carlos, SP, Brazil
Rezende Oliveira, Solange	Univ. of São Paulo
Keywords: Classification and Clustering, Machine Learning and Data Mining Abstract: Frequent itemset-based text clustering has emerged as a promising approach to the automatic organization of text documents, because it allows high clustering accuracy combined with understandable cluster descriptors. However, the clustering results may not be satisfactory because they do not reflect the user's point of view. In this context, active learning is an interesting approach to incorporate the user's knowledge in the text clustering task by querying the users about the data. We introduce an active learning approach to frequent itemset-based text clustering called AL2FIC. In our approach, the users can provide feedback directly on the cluster descriptors without the need to know the document labels. An experimental evaluation on real text collections demonstrated that our AL2FIC approach significantly increases the text clustering performance even when only a few descriptors are selected by the users.

14:00-14:30, Paper ThPSBT1.37
Multi-View Facial Expression Recognition Using Local Appearance Features
Hesse, Nikolas	CAOS GmbH
Gehrig, Tobias	Karlsruhe Inst. of Tech. (KIT)
Gao, Hua	Karlsruhe Inst. of Tech.
Ekenel, Hazim Kemal	Karlsruhe Inst. of Tech.
Keywords: Gesture and Behavior Analysis, Classification and Clustering, Features and Image Descriptors Abstract: In this paper, we present a multi-view facial expression classification system. The system utilizes local features extracted around automatically located facial landmark points using pose-dependent active appearance models. A pose-dependent ensemble of support vector machine classifiers assigns the given sample to one of the six basic expression classes. Extensive experiments have been conducted on the BU-3DFE database, comparing normalized landmark coordinates, discrete cosine transform, local binary patterns, and scale invariant feature transform based features, as well as combinations of shape and appearance features for classification. We evaluate the influence of AAM fitting errors, F-score feature selection, and expression intensity levels on classification accuracy. Features selected from a combination of normalized landmark coordinates and DCT-based features result in a correct classification rate of 74.1%, outperforming automatic state-of-the-art multi-view expression recognition systems.

14:00-14:30, Paper ThPSBT1.38
Locating High-Density Clusters with Noisy Queries
Cao, Chen	Shenzhen Inst. of Advanced Tech. Chinese Acad. of S
Chen, Shifeng	Shenzhen Inst. of Advanced Tech. Chinese Academy of Sci
Zou, Changqing	Shenzhen Inst. of Advanced Tech. Chinese Acad. of S
Liu, Jianzhuang	Shenzhen Inst. of Advanced Tech. Chinese Acad. of S
Keywords: Feature Reduction and Manifold Learning, Classification and Clustering, Machine Learning and Data Mining Abstract: Semi-supervised learning (SSL) relies on a few labeled samples to explore the data's intrinsic structure through pairwise smooth transduction. The performance of SSL depends mainly on two factors: (1) the accuracy of labeled queries, (2) the integrity of manifolds in the data distribution. Both of these qualities would be poor in real applications, as data often consist of several irrelevant clusters and discrete noise. In this paper we propose a novel framework to simultaneously remove discrete noise and locate the high-density clusters. Experiments demonstrate that our algorithm is quite effective in solving several problems such as non-feedback image re-ranking and image co-segmentation.

14:00-14:30, Paper ThPSBT1.39
Towards Automated Classification of Fine-Art Painting Style: A Comparative Study
Arora, Ravneet	Rutgers Univ. Computer Science Department
Elgammal, Ahmed	Rutgers Univ.
Keywords: Pattern Recognition for Art, Cultural Heritage and Entertainment, Classification and Clustering, Machine Learning and Data Mining Abstract: This paper presents a comparative study of different classification methodologies for the task of fine-art genre classification. A two-level comparative study is performed for this classification problem. The first level reviews the performance of discriminative vs. generative models, while the second level addresses the features of the paintings and compares semantic-level features vs. low-level and intermediate-level features present in the painting.

14:00-14:30, Paper ThPSBT1.40
Mining Residential Household Information from Low-Resolution Smart Meter Data
Fusco, Francesco	IBM
Yoon, Ji Won	IBM Res.
Wurst, Michael	IBM Res.
Keywords: Machine Learning and Data Mining, Classification and Clustering Abstract: The implementation of electricity smart meters has raised a number of privacy concerns, related to all sorts of information about the nature of the residents that could be inferred from readings of the power consumption. In this paper we attempt to classify households according to different classes, ranging from the presence of kids and of specific appliances to the employment status and education level of the residents. We apply a wide range of features and classification methods and measure the achievable accuracy. It is shown that, at a time resolution of 30 minutes, only a few of the investigated problems give satisfactory accuracy, while most of them would require a higher sampling frequency that is not practical for smart meters.

14:00-14:30, Paper ThPSBT1.41
Predicting Onsets of Genocide with Sparse Additive Models
Semenovich, Dimitri	UNSW
Sowmya, Arcot	Univ. of New South Wales
Goldsmith, Benjamin E	Univ. of Sydney
Keywords: Machine Learning and Data Mining, Pattern Recognition for Art, Cultural Heritage and Entertainment Abstract: Prevention of genocide is one of the most important challenges before the international community. In this paper we apply recent machine learning techniques to forecast the onset of political instability and genocide. Specifically, we employ sparse additive models which are both flexible and maintain interpretability of the results. Our model demonstrates a reasonable degree of forecasting performance over the hold-out period 1988-2003.

14:00-14:30, Paper ThPSBT1.42
Learning Dynamic Bayesian Network Discriminatively for Human Activity Recognition
Wang, Xiaoyang	RPI
Ji, Qiang	RPI
Keywords: Statistical, Syntactic and Structural Pattern Recognition, Machine Learning and Data Mining, Pattern Recognition for Surveillance and Security Abstract: The purpose of this paper is to develop an approach to learn a dynamic Bayesian network (DBN) discriminatively for human activity recognition. The DBN is a generative model widely used for modeling temporal events in human activity recognition. The parameters of DBN models are usually learned by maximizing likelihood or expected likelihood. However, activity is often recognized by identifying the activity class with the highest posterior probability. Hence, there is a discrepancy between the learning and classification criteria. In this paper, we focus on developing a discriminative parameter learning approach for hybrid DBNs that has a consistent criterion during training and testing. Our approach is applicable to parameter learning with both complete data and incomplete data, and empirical studies show the proposed discriminative learning approach outperforms the maximum likelihood or EM algorithm in activity recognition tasks.

14:00-14:30, Paper ThPSBT1.43
Temporal Feature Selection for Time-Series Prediction
Hido, Shohei	Preferred Infrastructure
Morimura, Tetsuro	IBM Res. - Tokyo
Keywords: Feature Reduction and Manifold Learning, Machine Learning and Data Mining, Classification and Clustering Abstract: We present a feature selection method for multivariate time-series prediction. It aims to use the best sliding window size and delay for each explanatory variable, which are usually fixed. The idea is to convert the original time-series into a set of cumulative sums of different lengths. The combinations of cumulative sum variables obtaining nonzero weights in sparse learning algorithms represent the optimal temporal effects from explanatory variables to the target variable. Experiments show that the method performs better than conventional methods in regression problems.
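The cumulative-sum idea can be illustrated with a small sketch. It shows only why a sliding window with a given delay and size can be written as a difference of two cumulative sums. It does not reproduce the authors' sparse learning step, and the function names and toy series are hypothetical.

```python
def cumulative_sum_features(series, max_len):
    """For each time t >= max_len - 1, return [c_1(t), ..., c_max_len(t)],
    where c_k(t) = series[t] + series[t-1] + ... + series[t-k+1]."""
    rows = []
    for t in range(max_len - 1, len(series)):
        row, running = [], 0.0
        for k in range(1, max_len + 1):
            running += series[t - k + 1]
            row.append(running)
        rows.append(row)
    return rows


def window_sum(series, t, delay, size):
    """Sum of `size` consecutive values, the most recent one lying
    `delay` steps before time t."""
    return sum(series[t - delay - i] for i in range(size))


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
features = cumulative_sum_features(series, max_len=4)

# A window of size 2 delayed by 1 step equals c_3 - c_1, i.e. weights
# (+1 on c_3, -1 on c_1) and zero weight on every other cumulative sum.
last = features[-1]              # features at t = 5
via_cumsums = last[2] - last[0]  # c_3(5) - c_1(5)
direct = window_sum(series, t=5, delay=1, size=2)
```

Under this reading, a sparse linear model that assigns nonzero weights to only two cumulative-sum features implicitly selects one window size and one delay.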


ThPSBT2	Multi-Purpose Hall
Poster Shotgun (14): CV	Regular Session

14:00-14:30, Paper ThPSBT2.1
A Modified KLT Multiple Objects Tracking Framework Based on Global Segmentation and Adaptive Template
Xue, Kang	Beijing Inst. of Tech.
Vela, Patricio	Georgia Tech.
Liu, Yue	Beijing Inst. of Tech.
Wang, Yongtian	Beijing Inst. of Tech.
Keywords: Motion, Tracking and Video Analysis Abstract: This paper presents a modified Kanade-Lucas-Tomasi (KLT) tracking framework for multiple object tracking applications. First, the framework includes a global pixel-level probabilistic model and an adaptive RGB template model that make the traditional KLT tracker more robust when tracking multiple objects under partial occlusion. In addition, a Merge and Split algorithm is introduced in the proposed framework to handle complete occlusions. The advantage of our method is demonstrated on a variety of challenging video sequences.

14:00-14:30, Paper ThPSBT2.2
Pairwise Similarities for Scene Segmentation Combining Color and Depth Data
Bergamasco, Filippo	Univ. Ca' Foscari di Venezia
Albarelli, Andrea	Univ. Ca' Foscari di Venezia
Torsello, Andrea	Univ. Ca' Foscari
Favaro, Martina	Univ. di Padova
Zanuttigh, Pietro	Univ. di Padova
Keywords: Segmentation, Color and Texture, Classification and Clustering, Image and Video Processing Abstract: The advent of cheap consumer-level depth-aware cameras and the steady advances in dense stereo algorithms urge the exploitation of combined photometric and geometric information to attain a more robust scene understanding. To this end, segmentation is a fundamental task, since it can be used to feed meaningfully grouped data to the following steps in a more complex pipeline. Color segmentation has been explored thoroughly in the image processing literature, just as geometric-based clustering has been widely adopted with 3D data. We introduce a novel approach that mixes both features to overcome the ambiguity that arises when using only one kind of information. This idea has already appeared in recent techniques; however, they often work by combining color and depth data in a common Euclidean space. By contrast, we avoid any embedding by virtue of a game-theoretic clustering schema that leverages specially crafted pairwise similarities.

14:00-14:30, Paper ThPSBT2.3
Probabilistic Invariant Image Representation and Associated Distance Measure
Scandaliaris, Jorge	Univ. Nacional de Tucumán
Sanfeliu, Alberto	Univ. Pol. de Catalunya
Keywords: Segmentation, Color and Texture, Physics-Based Vision Abstract: Varying illumination is a limiting factor for many computer vision applications, especially in outdoor settings. Invariant image representations aim to reduce this effect and provide the subsequent processing steps (image segmentation, edge detection, object recognition, etc.) with a more stable view, closer to the surface reflectances present in the scene than to the illumination. In this work we present an invariant image representation that integrates several key observations in a probabilistic way, together with an associated probabilistic distance measure. They can be used as a measure of similarity between the surfaces represented by a given pair of pixels, even under illumination color changes.

14:00-14:30, Paper ThPSBT2.4
Single Axis Relative Rotation from Orthogonal Lines
Elqursh, Ali	Rutgers Univ.
Elgammal, Ahmed	Rutgers Univ.
Keywords: Scene Understanding Abstract: We present an efficient algorithm that computes the relative pose between two calibrated views given that the rotation is around a single axis. The algorithm is suited for indoor and urban environments that have an abundance of orthogonal lines. We also show how this algorithm can be used within a hypothesize-and-test framework to simultaneously detect orthogonal lines and compute the relative rotation without explicit structure computation. We study the performance of the algorithm using synthetic and real datasets.

14:00-14:30, Paper ThPSBT2.5
Automatic Annotation of Court Games with Structured Output Learning
Yan, Fei	Univ. of Surrey
Kittler, Josef	Univ. of Surrey
Mikolajczyk, Krystian	Univ. of Surrey
Windridge, David	Univ. of Surrey
Keywords: Motion, Tracking and Video Analysis, Machine Learning and Data Mining, Image and Video Understanding Abstract: We investigate the application of structured output learning (SOL) in automatic annotation of court games. We formulate the problem of event classification in court games as one of learning a mapping from features to structured labels, and employ structured SVM to achieve a max-margin solution. We compare closely the more popular generative approach based on the hidden Markov model (HMM) with our discriminative approach on both artificial games and two real world tennis games, and demonstrate the advantage of our method.

14:00-14:30, Paper ThPSBT2.6
Articulated Particle Filter for Hand Tracking
Ros, German	Computer Vision Center and Computer Science Dpt of UAB
Martinez-del-Rincon, Jesus	Kingston Univ.
Garcia-Mateos, Gines	Univ. de Murcia
Keywords: Motion, Tracking and Video Analysis Abstract: This paper proposes a new version of the Particle Filter, called the Articulated Particle Filter (ArPF), which has been specifically designed for efficient sampling of the hierarchical spaces generated by articulated objects. Our approach decomposes the articulated motion into layers for efficiency, making use of a careful modeling of the diffusion noise along with its propagation through the articulations. This increases accuracy and prevents divergence. The algorithm is tested on hand tracking because of the complex hierarchical articulated nature of the hand. For this purpose, a new dataset generation tool for quantitative evaluation is also presented in this paper.

14:00-14:30, Paper ThPSBT2.7
A Direction Change-Based Algorithm for Polygonal Approximation
Liu, Han	King Abdullah Univ. of Science and Tech.
Zhang, Xiangliang	King Abdullah Univ. of Science and Tech.
Rockwood, Alyn	KAUST
Keywords: 2D/3D Object Detection and Recognition, Features and Image Descriptors, Pattern Recognition for Search, Retrieval and Visualization Abstract: A linear-time algorithm is proposed for polygonal approximation of digital curves. The direction changes of the x- and y-coordinates are traced to generate a new, compact representation of curves. The algorithm, Direction Change-based Polygonal Approximation (DCPA), has two advantages: linear time complexity and insensitivity to parameter setting. Benchmark results demonstrate the competitive performance of DCPA using standard assessment techniques.
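As a rough illustration of tracing direction changes in a single pass, the sketch below keeps only the curve points where the sign of the x-step or the y-step changes. It is a simplified stand-in and not the DCPA algorithm itself, whose compact representation and subsequent steps are not described in the abstract; the function names and the example curve are hypothetical.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def direction_change_points(curve):
    """Keep the endpoints of `curve` plus every point where the sign of the
    x-step or the y-step differs from that of the previous step.
    Runs in a single pass, i.e. in time linear in the number of points."""
    if len(curve) < 3:
        return list(curve)
    kept = [curve[0]]
    prev = (sign(curve[1][0] - curve[0][0]), sign(curve[1][1] - curve[0][1]))
    for i in range(1, len(curve) - 1):
        cur = (sign(curve[i + 1][0] - curve[i][0]),
               sign(curve[i + 1][1] - curve[i][1]))
        if cur != prev:
            kept.append(curve[i])
        prev = cur
    kept.append(curve[-1])
    return kept


# A digital curve that runs right, then up, then left.
curve = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2)]
vertices = direction_change_points(curve)
```

This naive rule keeps a vertex at every step of a digitized diagonal, so a practical method needs further processing of the direction-change sequence.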

14:00-14:30, Paper ThPSBT2.8
DFlow and DField: New Features for Capturing Object and Image Relationships
Kisilev, Pavel	IBM Res. Haifa
Freedman, Daniel	IBM Res.
Walach, Eugene	IBM
Tzadok, Asaf	IBM Res. Lab. in Haifa
Keywords: Features and Image Descriptors, Scene Understanding, Image and Video Understanding Abstract: In this paper we propose two new types of features useful for problems in which one wants to describe object or image relationships rather than objects or images themselves. The features are based on the notion of distribution flow, as derived from the classic Transportation Problem. Two variants of such features, the Distribution Flow (DFlow) and Displacement Field (DField), are defined and studied. The proposed features show promising results in two different applications, Inter- and Intra-Class Relationship Characterization, and improve on simple concatenation of corresponding pairs of histograms.

14:00-14:30, Paper ThPSBT2.9
Efficient and Robust Image Descriptor for GUI Object Classification
Dubrovina, Anastasia	Tech. Israel Inst. of Tech.
Kisilev, Pavel	IBM Res. Haifa
Freedman, Daniel	IBM Res.
Schein, Sagi	HP
Bergman, Ruth	HP Lab. Israel
Keywords: Features and Image Descriptors, 2D/3D Object Detection and Recognition, Graphics Recognition Abstract: In this paper we address the problem of Graphical User Interface (GUI) object classification, which is essential for image-based software automation tools. The main challenge in GUI object classification is the large variation in an object's appearance within a given class, even if the same operating system or browser is considered. We assume that the GUI objects differ by geometric and "skin"-related transformations. The geometric transformations may be of two types: (1) scaling (potentially anisotropic), due to changes in screen resolutions as well as naturally occurring differences in sizes and aspect ratios; and (2) translation. The "skin" changes are related to the variety of themes and colors provided by today's operating systems, and these changes are somewhat harder to quantify. Our approach is to capture the look-and-feel information that remains relatively unchanged under "skin" transformations, and not color information that may change greatly. To cope with these challenges, we propose a novel type of image descriptor developed specifically for the GUI object classification task. The proposed descriptor is based on the 1D version of the Fourier-Mellin Transform. In addition, we incorporate information about object image gradients and the percentage of the white color. We show experimentally that the proposed descriptor is robust to various geometric and skin-related transformations of GUI objects. Using the Support Vector Machine along with the proposed descriptor yields superior performance compared to existing image descriptors.

14:00-14:30, Paper ThPSBT2.10
Learning Robust Color Name Models from Web Images
Schauerte, Boris	Karlsruhe Inst. of Tech.
Stiefelhagen, Rainer	Karlsruhe Inst. of Tech. & Fraunhofer IOSB,Karlsruhe
Keywords: Segmentation, Color and Texture, Machine Learning and Data Mining, Image and Video Understanding Abstract: We use images that have been collected using an Internet search engine to train color name models for color naming and recognition tasks. Considering color histogram bands as being words of an image and the color names as classes, we use the supervised latent Dirichlet allocation to train our model. To pre-process the training data, we use state-of-the-art salient object detection and a Kullback-Leibler divergence based outlier detection. In summary, we achieve state-of-the-art performance on the eBay data set and improve the similarity between labels assigned by our model and human observers by approximately 14%.

14:00-14:30, Paper ThPSBT2.11
Time to Contact Estimation on Paracatadioptric Cameras
Benamar, Fatima Zahra	GSCM-LRIT, Mohammed V-Agdal Univ. Rabat, Morocco and MIS, U
Demonceaux, Cédric	Le2i, UMR CNRS 6306
El Fkihi, Sanaa	ENSIAS Mohammed V Univ. Souissi
Mouaddib, El Mustapha	MIS Lab. Univ. de Picardie Jules Verne
Aboutajdine, Driss	Univ. Mohamed V -Faculty of sciences-LRIT
Keywords: Vision for Robotics, Image and Video Processing, Stereo and Image-Based Modeling Abstract: Time to contact or time to collision (TTC) is the time available to a robot before reaching an object. In this paper, we propose to estimate this time using a catadioptric camera embedded on the robot. Whereas many works have shown the utility of this kind of camera in robotic applications (monitoring, localisation, motion, ...), few deal with the problem of time to contact estimation on it. We therefore propose a new method to define and estimate the TTC on a catadioptric camera. This method will be validated on simulated and real data.

14:00-14:30, Paper ThPSBT2.12
Robust Depth Regularization Explicitly Constrained by Camera Motion
Zarrouati-Vissiere, Nadège	DGA, Mines ParisTech
Aldea, Emanuel	SYSNAV
Rouchon, Pierre	Ec. des Mines ParisTech
Keywords: Motion, Tracking and Video Analysis, Stereo and Image-Based Modeling, Vision for Robotics Abstract: The objective of our work is to reconstruct the dense structure of a static scene observed by a monocular camera system following a known trajectory. Our main contribution is a TV-L^1 energy functional that directly estimates the unknown depth field given the camera motion, thus avoiding the intermediate estimation of an optical flow field with additional geometric constraints. Our method has two main advantages: we highlight a practical minimal parametrization for the given assumptions (static scene, known camera motion) and we solve the resulting variational problem using an efficient, discontinuity-preserving formulation.

14:00-14:30, Paper ThPSBT2.13
Attention-Driven Segmentation of Cluttered 3D Scenes
Potapova, Ekaterina	Vienna Univ. of Tech.
Zillich, Michael	Vienna Univ. of Tech.
Vincze, Markus	TU Wien
Keywords: Vision for Robotics, Segmentation, Color and Texture, Scene Understanding Abstract: Vision is an essential part in robotic systems, where attention plays an important role to cope with the complexity of the real world. Attention mechanisms have been proposed in the past to guide search and also segmentation of objects. Building on recent advances in affordable 3D sensing we first attend to objects using a novel saliency map, based on color and depth information. We then segment attended objects using an edge map that uses color, depth and curvature within a probabilistic framework. We present an improvement over existing methods regarding the quality of attention points, in terms of their location within the object and the number of attended objects. Together the proposed attention points and probabilistic edges lead to a significant improvement of segmentation results compared to existing methods of active segmentation.

14:00-14:30, Paper ThPSBT2.14
Learning Symmetrical Model for Head Pose Estimation
Dahmane, Afifa	USTHB Univ. USTL Lille1
Larabi, Slimane	USTHB Univ. Algiers, Algeria
Djeraba, Chabane	UMR USTL/CNRS 8022
Bilasco, Ioan Marius	Univ. Lille 1
Keywords: Features and Image Descriptors, Motion, Tracking and Video Analysis Abstract: This paper tackles the problem of head pose estimation which has been considered an important research task for decades. The proposed approach selects a set of features from the symmetrical parts of the face. The size of bilateral symmetrical area of the face is a good indicator of the Yaw head pose. We train a Decision Tree model in order to recognize head pose with regard to the areas of symmetry. The approach does not need the location of interest points on face and is robust to partial occlusion. Tests were performed on a different dataset from that used for training the model and the results demonstrate that the change in the size of the regions that contain a bilateral symmetry provides accurate pose estimation.

14:00-14:30, Paper ThPSBT2.15
Robust Range Image Segmentation Based on Coplanarity of Superpixels
Wan, Ji	Inst. of Computing Tech. Chinese Acad.
Xia, Tian	Inst. of Computing Tech. Chinese Acad.
Tang, Sheng	Inst. of Computing Tech. Chinese Academy of Sciences
Li, Jintao	Inst. of Computing Tech. Chinese Acad. of Sciences
Keywords: Segmentation, Color and Texture, Scene Understanding, Low-Level Vision Abstract: With the help of new Kinect-style sensors, it is now convenient to obtain 3D information about a scene. However, we still suffer from problems caused by noise arising from the device. In this paper, we propose an effective method for extracting planar regions from noisy range data, comprising an efficient superpixel extraction algorithm and a robust grouping algorithm based on coplanarity of superpixels. To evaluate the performance of the proposed method, we built a benchmark framework based on the ICPR-HARL 2012 dataset. Experimental results show that the proposed method achieves desirable performance under real-world conditions.

14:00-14:30, Paper ThPSBT2.16
Unsupervised Dynamic Texture Segmentation Using Local Descriptors in Volumes
Chen, Jie	Univ. of Oulu, Finland
Keywords: Segmentation, Color and Texture Abstract: Dynamic texture (DT) is an extension of texture to the temporal domain. How to improve the performance and efficiency of DT segmentation is still a challenging problem. In this paper, we improve the performance of a recently published DT segmentation method. We compute the histogram of the spatiotemporal local texture descriptor in one volume and employ the segmentation results of previous frame for the segmentation of the current frame. Experimental results show that our approach improves the performance and efficiency of DT segmentation compared to the state-of-the-art methods.

14:00-14:30, Paper ThPSBT2.17
A Viewpoint-Independent Statistical Method for Fall Detection
Zhang, Zhong	Univ. of Texas at Arlington
Liu, Weihua	Univ. of Texas at Arlington
Metsis, Vangelis	Univ. of Texas at Arlington
Athitsos, Vassilis	Univ. of Texas at Arlington
Keywords: Motion, Tracking and Video Analysis Abstract: The goal of a fall detection system is to automatically detect cases where a human falls and may have been injured. We propose a statistical method based on Kinect depth cameras that makes a decision based on information about how the human moved during the last few frames. Our method proposes novel features to be used for fall detection, and combines those features using a Bayesian framework. Our experiments explicitly evaluate the ability of our method to use training data collected from one viewpoint in order to recognize falls from a different viewpoint. We obtain promising results on a challenging dataset that we have made public and that contains, in addition to falls, several similar-looking events such as sitting down, picking up objects from under the bed, or tying shoelaces.

14:00-14:30, Paper ThPSBT2.18
Sparse Stereo by Edge-Based Search Using Dynamic Programming
Witt, Jonas	Hamburg Univ. of Tech.
Weltin, Uwe	IZT at the Hamburg Univ. of Tech.
Keywords: Vision for Robotics, Low-Level Vision Abstract: In this paper, a novel edge-based stereo matching technique is presented. Depth discontinuities are specifically accounted for in the choice of support regions. We employ dynamic programming along edge segments to efficiently enforce inter-scanline consistency. Although it does not perform a global optimization over the whole image, we show that our approach performs successfully on the Middlebury benchmark datasets while being computationally feasible in real time.

14:00-14:30, Paper ThPSBT2.19
A Random Walk Approach for Multiatlas-Based Segmentation
Morin, Jean-Philippe	Ec. de Tech. superieure
Desrosiers, Christian	Ec. de Tech. superieure
Duong, Luc	Ec. de Tech. superieure
Keywords: Segmentation, Color and Texture, Scene Understanding, Features and Image Descriptors Abstract: Although atlas-based methods simplify the segmentation process by making it more automated, such methods are often very sensitive to the computationally expensive image registration step. Also, existing methods based on a parametric deformation model may fail when the transformation between the atlas and target images cannot be properly described with this model. This paper presents a novel and efficient atlas-based segmentation method based on random walks. Unlike most atlas-based approaches, this method combines the registration and label propagation steps in a single efficient framework and does not depend on a specific deformation model. Experiments conducted on benchmark images show the accuracy and efficiency of our method.

14:00-14:30, Paper ThPSBT2.20
Enhancing Object Detection Performance by Integrating Motion Objectness and Perceptual Organization
Spampinato, Concetto	Univ. of Catania, DIIT
Palazzo, Simone	Univ. of Catania
Keywords: 2D/3D Object Detection and Recognition, Features and Image Descriptors, Segmentation, Color and Texture Abstract: In this paper we propose a method to improve the performance of motion detection algorithms by estimating the probability that a detected blob (i.e. a group of pixels identified as foreground) is actually an object of interest. The system exploits "objectness" and perceptual organization to estimate general properties of real-world objects such as convexity, symmetry, well-defined boundary, visual contrast and cohesiveness. The measures of these properties are given as input to a naive Bayes classifier, which is trained to distinguish objects of interest from false positives. The system was trained and tested on two "real-life" environments (underwater and vehicular monitoring) and the results showed an increase in the performance of four state-of-the-art motion detection algorithms of about 15%. We also tested our approach on the CAVIAR dataset and, although the system was not trained on that specific object class (people), it was able to increase the object detection performance by about 10%.

14:00-14:30, Paper ThPSBT2.21
Saliency-Seeded Localizing Region-Based Active Contour for Automatic Natural Object Segmentation
Gao, Shangbing	Nanjing Univ. of Science and Tech.
Yang, Jian	Nanjing Univ. of Science and Tech.
Keywords: Segmentation, Color and Texture, Image and Video Processing Abstract: In this paper, we propose a new saliency-seeded active contour based automatic natural object segmentation method. It is known that using salient regions or pixels can easily give the approximate location of the desired object in the map. The salient object points are employed as the seeds of a convex hull to generate the initial contour for our automatic object segmentation system. In contrast with localizing region-based active contours that require considerable user interaction, the proposed method does not require it, i.e., the segmentation task is fulfilled in a fully automatic manner. Extensive experimental results on a large variety of natural images confirm that our framework can reliably and automatically extract the object from a complex background.

14:00-14:30, Paper ThPSBT2.22
Generalized Ordinary Moment Based Blur Invariant Descriptors for Face Recognition with Degraded Images
Makaremi, Iman	Univ. of Windsor
Ahmadi, Majid	Univ. of Windsor
Keywords: Features and Image Descriptors, Statistical, Syntactic and Structural Pattern Recognition, Pattern Recognition for Surveillance and Security Abstract: In this paper, we introduce an alternative definition for ordinary moments and use them to redefine moment based blur invariants. With this change, we are able to increase the discriminative power significantly, as well as robustness to blur. The superiority of our proposed method is illustrated in comparison to another similar descriptor in an experiment on the FRGC database.

14:00-14:30, Paper ThPSBT2.23
Semantic Hough Transform Based Object Detection with Partial Least Squares
Tang, Jianyu	Xiamen Univ.
Wang, Hanzi	Xiamen Univ.
Keywords: 2D/3D Object Detection and Recognition Abstract: Codebooks play a decisive role in Hough Transform based object detection. We propose a novel approach that generates the codebooks by parametric regression and integrates semantic information drawn from objects and background. Clustering is a popular method for deriving codebooks, but it generally relies on some parameters, which heavily affect the performance of the approaches. By exploiting Partial Least Squares and tuning only one parameter, we map the most informative latent components of an image patch directly to the displacement vectors from the possible object centroids to the patch, and obtain the Parameterized Semantic Codebook Group (PSCG). Experiments show that PSCG generates accurate voting vectors and achieves superior performance on some challenging datasets.

14:00-14:30, Paper ThPSBT2.24
Night Removal by Color Estimation and Sparse Representation
Fu, Huiyuan	Beijing Univ. of Posts and Telecommunications
Ma, Huadong	Beijing Univ. of Posts and Telecommunications
Wu, Shixin	Beijing Univ. of Posts and Telecommunications
Keywords: Computational Photography Abstract: Night removal is highly desired in both computational photography and computer vision applications. However, few works have addressed this goal. This paper proposes an effective algorithm for removing the night from a single input image. We present a new Color Estimation Model (CEM) for transforming the image from "night" to "day", along with a guided statistical Dark-to-Day (D2D) prior for performance optimization. To restore the noisy and blurred image after CEM, our algorithm uses a sparse representation whose dictionary is trained on dozens of corresponding day-time images under different illuminations. Extensive experiments on natural images show that our algorithm can achieve convincing results.

14:00-14:30, Paper ThPSBT2.25
Single Camera Multi-Person Tracking Based on Crowd Simulation
Jin, Zhixing	Univ. of California, Riverside
Bhanu, Bir	Univ. of California
Keywords: Motion, Tracking and Video Analysis, Physics-Based Vision, 2D/3D Object Detection and Recognition Abstract: Tracking individuals in video sequences, especially in crowded scenes, is still a challenging research topic in the area of pattern recognition and computer vision. However, current single camera tracking approaches are mostly based on visual features only. The novelty of the approach proposed in this paper is the integration of evidence from a crowd simulation algorithm into a pure vision based method. Based on a state-of-the-art tracking-by-detection method, the integration is achieved by evaluating particle weights with an additional prediction of individual positions, which is obtained from the crowd simulation algorithm. Our experimental results indicate that, by integrating simulation, multi-person tracking performance measures such as MOTP and MOTA can be increased by about 2% and 5% on average, which provides significant evidence for the effectiveness of our approach.

14:00-14:30, Paper ThPSBT2.26
Illuminant Segmentation in Non-Uniformly Lit Scenes
Huynh, Cong Phuoc	National ICT Australia (NICTA)
Robles-Kelly, Antonio	NICTA
Keywords: Segmentation, Color and Texture, Physics-Based Vision, Detection, Separation and Segmentation Abstract: In this paper, we present a method for segmenting illuminants in non-uniformly lit scenes. Here, we view the illuminant colour at an image location as a mixture of the segmented illuminants. Based on the dichromatic structure of the image radiance space, we perform soft-clustering on the set of dichromatic planes corresponding to the neighbourhoods of pixel-sites in the image. We solve the soft-clustering problem with a deterministic annealing approach where the cost function is formulated based on the maximum entropy principle. We show results on real-world imagery and provide comparisons to an alternative method.

14:00-14:30, Paper ThPSBT2.27
Appearance-Based Object Recognition Using Weighted Longest Increasing Subsequence
Kusuma Negara, I Gede Putra	Inst. for Infocomm Res.
Szabo, Attila	Inst. for Infocomm Res.
Li, Yiqun	Inst. for Infocomm Res.
Lee, Jimmy Addison	Inst. for Infocomm Res.
Keywords: 2D/3D Object Detection and Recognition, Multimedia Analysis, Indexing and Retrieval, Scene Understanding Abstract: We propose in this paper a novel weighted longest increasing subsequence (LIS) to improve the performance of appearance-based object recognition. The LIS is employed to find the true keypoint matches that have consistent geometric order in both query and gallery images. Then, the similarity between query and gallery images is measured by the sum of the weights of the true keypoints. The experimental results show that our approach outperforms the SURF and SURF + RANSAC Homography approaches.

14:00-14:30, Paper ThPSBT2.28
Automated Person Segmentation in Videos
Bhole, Chetan	Univ. of Rochester
Pal, Chris	École Pol. Montréal
Keywords: Motion, Tracking and Video Analysis, Segmentation, Color and Texture, Detection, Separation and Segmentation Abstract: This paper deals with automatically segmenting a person from challenging videos using a pose detector. A state of the art pose detector is used to detect the pose of a person from a frame in the video sequence. The pose is used to extract color and optical flow features to train a conditional random field to provide segmentation on multiple frames. Location from the pose is used to refine the results. No additional training data is required by the method. We also show how the pose results can be improved by our model.

14:00-14:30, Paper ThPSBT2.29
Semantic Saliency Using K-TR Theory of Visual Perception
Varadarajan, Karthik Mahesh	TU Wien
Vincze, Markus	TU Wien
Keywords: Low-Level Vision, Scene Understanding, Cognitive and Embodied Vision Abstract: Saliency in 2D imagery has been receiving increasing attention over the last few years owing to the need to minimize computation requirements through visual search space reduction, especially in the field of domestic robotics. Saliency and preattention mechanisms such as the Itti-Koch model have largely been focused on multi-scale local features mimicking low level attention processes in the visual system, without any regard for the semantic content of the scene and therefore any cognitive grounding in visual processing. The "k-TR" theory presents the first attempt at a true cognitive understanding of scenes by explaining visual perception and object recognition in terms of Recognition by Component Affordances (RBCA). The k-TR model presents a bi-layer recognition process through a combination of local, global, semantic and affordance features. The k-TR theory provides psychophysical, neurobiological, linguistic and evolutionary studies to support the theory and explains recognition of over 250 categories of common household objects. The features used by k-TR for object representation, termed k-TRONs, are available from the publicly available Affordance Network database (AfNet). In this paper, we use the k-TRON features, in particular the 35+ affordance features, in order to incorporate semantic context into saliency models. Saliency or surprise for pre-attention is modeled in the form of affordance aberrations. By using affordance aberration features for conspicuity map generation, we show that the resulting saliency and attention points more closely resemble the salient regions or surprise regions generated by the human visual system, hence providing superior performance in comparison to the Itti framework. Furthermore, by learning affordance affinities from test subjects, the degree of influence of each affordance aberration towards visual saliency is estimated and incorporated into the overall saliency model.

14:00-14:30, Paper ThPSBT2.30
Pedestrian Tracking in Low Contrast Regions Using Component Silhouette and Aggregated Background Model
Hsu, Gee-Sern	National Taiwan Univ. of Science and Tech.
Nguyen, Hong Phuoc	National Taiwan Univ. of Science and Tech.
Chien-Hung, Wu	Artificial Vision Lab.
Chung, Sheng-Lun	National Taiwan Univ. of Science and Tech.
Keywords: Motion, Tracking and Video Analysis Abstract: We propose an approach for pedestrian detection and tracking in low contrast regions. The approach is composed of two modules. Module-1 improves the pixel-based Mixture of Gaussians (MOG) by aggregated background modeling and varying interval differences. Module-2 exploits local patch variance (LPV) and partial silhouette template (PST) for improving the incomplete foregrounds often observed in low contrast scenes regardless of the approaches. Experiments show that the proposed approach performs satisfactorily.

14:00-14:30, Paper ThPSBT2.31
Robust Video Stabilization Based on Bounded Path Planning
Song, Chunhe	Univ. of Waterloo
Hai, Zhao	Northeastern Univ.
Jing, Wei	Northeastern Univ.
Bi, Yuanguo	School of Information Science and Tech. NortheasternUniver
Keywords: Low-Level Vision, Motion, Tracking and Video Analysis, Pattern Recognition for Surveillance and Security Abstract: This paper presents a novel video stabilization algorithm based on bounded acceleration, which consists of three steps: motion estimation, virtual path generation and motion compensation. The key insight of this paper is that, to generate a reasonably smooth virtual path, we need to 1) minimize the acceleration along the virtual path, and 2) limit the offset of translations along the x and y directions to a certain range. Compared to state-of-the-art video stabilization methods, the proposed algorithm can generate more reasonable virtual paths but has fewer parameters to be adjusted. Theoretical analysis and practical experimental results demonstrate the effectiveness of the proposed method.

14:00-14:30, Paper ThPSBT2.32
A New Depth Descriptor for Pedestrian Detection in RGB-D Images
Wang, Ningbo	Zhejiang Univ.
Gong, Xiaojin	Zhejiang Univ.
Liu, Jilin	Zhejiang Univ.
Keywords: 2D/3D Object Detection and Recognition, Features and Image Descriptors Abstract: With the development of depth camera technology, it is feasible to get high quality color and depth images synchronously in real time. Thus, RGB-D-based applications are becoming more and more popular, such as pedestrian detection in RGB-D data. As the key point in this application is to search for better descriptions, in this paper we propose a new feature descriptor, Pyramid Depth Self-Similarities (PDSS), for depth images. It is based on the idea that depth information of people has high local self-similarities. The experiments, where RGB-D data is collected by a Kinect sensor, prove that PDSS is an effective complement to Histogram of Oriented Depth (HOD). Furthermore, the combination of Histogram of Oriented Gradients (HOG), HOD and PDSS improves the detection performance.

14:00-14:30, Paper ThPSBT2.33
Motion Segmentation Using Curve Fitting on Lagrangian Particle Trajectories
Narayan, Sanath	Indian Inst. of Science, Bangalore
Kalpathi, Ramakrishnan	Indian Inst. of Science, Bangalore
Keywords: Motion, Tracking and Video Analysis, Pattern Recognition for Surveillance and Security Abstract: In this paper we present a segmentation algorithm to extract foreground object motion in a moving camera scenario without any preprocessing step such as tracking selected features, video alignment, or foreground segmentation. By viewing it as a curve fitting problem on advected particle trajectories, we use RANSAC to find the polynomial that best fits the camera motion and identify all trajectories that correspond to the camera motion. The remaining trajectories are those due to the foreground motion. By using the superposition principle, we subtract the motion due to the camera from the foreground trajectories and obtain the true object-induced trajectories. We show that our method performs on par with state-of-the-art techniques, with an execution time speed-up of 10x-40x. We compare the results on real-world datasets such as UCF-ARG, UCF Sports and Liris-HARL. We further show that it can be used to perform video alignment.

14:00-14:30, Paper ThPSBT2.34
Framework for Quantitative Performance Evaluation of Shape Decomposition Algorithms
Lewin, Sergej	Univ. of Münster
Jiang, Xiaoyi	Univ. of Münster
Clausing, Achim	Univ. of Münster
Keywords: Segmentation, Color and Texture, Image and Video Processing, Features and Image Descriptors Abstract: Despite intensive research on shape decomposition algorithms, their performance evaluation remains qualitative today. The intention of this work is to close this gap by proposing a general framework for quantitative performance evaluation of shape decomposition algorithms. The proposed framework is supervised in nature and based on a benchmark database from a large-scale psychological study with manually specified ground truth. We discuss several variants of dissimilarity functions for comparing two decompositions. A preliminary comparison study using five shape decomposition methods and an ensemble technique demonstrates the usefulness of our approach. In particular, the quantitative results coincide well with visual comparison of decompositions.

14:00-14:30, Paper ThPSBT2.35
Plane Based Multi Camera Calibration under Unknown Correspondence Using ICP-Like Approach
Kawabata, Satoshi	National Inst. of Advanced Industrial Science and Technology (A
Kawai, Yoshihiro	National Inst. of Advanced Industrial Science and Tech.
Keywords: Stereo and Image-Based Modeling, Vision for Robotics, Geometric and Photometric Registration Abstract: In the present paper, we propose a plane based multi camera calibration method for the case in which point correspondences among camera images are not given beforehand. One can encounter this situation when calibrating a set of fixed cameras by observing a partial region of a large reference plane with a repeated pattern. Basically, all the camera parameters except relative poses are estimated by applying a homography based calibration to a set of observed reference planes in different poses for each camera. The problem is how to obtain fine correspondences among camera images to estimate the relative poses even though there are no commonly observed points. When a set of cameras simultaneously observes a part of a large reference plane with an infinitely repeating pattern, the problem can be considered as an alignment of sets of the large reference planes in different poses observed by different cameras while optimizing camera parameters. Therefore, we integrate the iterative closest point (ICP) approach for estimating (virtual) corresponding points into the optimization process. This approach works well with the low quality initial value given by a linear solver. Our experiment shows that our method outperforms a standard nonlinear optimization approach over all the camera parameters in corresponding point detection.

14:00-14:30, Paper ThPSBT2.36
Glocal Shape Context Descriptor in Cluttered Images
Li, Shimiao	Inst. for Infocomm Res. A*STAR
Xiong, Wei	Inst. for Infocomm Res. A*STAR
Nguyen, Tan Dat	Panasonic R&D Center, Singapore
Keywords: Features and Image Descriptors, Detection, Separation and Segmentation Abstract: Shape context has been proven to be an effective method for both local feature matching and global context description. In this paper, we propose a method to build a glocal shape context descriptor in cluttered images. By using the proposed keypoint centered multiple scale edge detection (KMSED) method, glocal shape context encodes fine-scale edges in the keypoint center region while coarse-scale edges in the outer region. In this way, local and global image information are encoded at the same time into a 68 dimension feature vector. Experiments show that the proposed glocal shape context makes significant enhancement over the local shape context descriptor and outperforms SIFT under severe illumination change and high JPEG compression.

14:00-14:30, Paper ThPSBT2.37
Scalable Image Co-Segmentation Using Color and Covariance Features
Zhang, Shijie	Tianjin Univ.
Feng, Wei	Tianjin Univ.
Wan, Liang	Tianjin Univ.
Zhang, Jiawan	Tianjin Univ.
Jiang, Jianmin	Tianjin Univ.
Keywords: Segmentation, Color and Texture, Low-Level Vision, Classification and Clustering Abstract: This paper focuses on producing fast and accurate co-segmentation of a pair of images that is scalable and able to use multimodal features. We present a general solution for this purpose and specifically propose a non-iterative and fully unsupervised method using pointwise color and regional covariance features for image co-segmentation. The scalability and generality of our method are mainly attributable to the superpixel-level irregular graph formulation and multi-feature joint clustering. Through a unified similarity metric, the contributions of multiple features are finally embodied in the co-segmentation energy function. Experiments on a common dataset validate the superior scalability of our method over state-of-the-art alternatives and its capability of generating comparable or even better labeling accuracy at the same time. We also find that multi-feature co-segmentation usually produces better labeling accuracy than using a single color feature only.

14:00-14:30, Paper ThPSBT2.38
Recognizing Surface Qualities from Natural Images Based on Learning to Rank
Abe, Takashi	Tohoku Univ.
Okatani, Takayuki	Tohoku Univ.
Deguchi, Koichiro	Tohoku Univ.
Keywords: Features and Image Descriptors, Machine Learning and Data Mining Abstract: This paper proposes a method for estimating the quantitative values of some attributes associated with surface qualities of an object, such as glossiness and transparency, from its image. Our approach is to learn functions that compute such attribute values from the input image by using training data given in the form of relative information. To be specific, each sample of the training data indicates, for a pair of images, which one is greater in terms of the target attribute. The functions are learned based on learning to rank. This approach enables us to deal with natural images, which cannot be dealt with in previous works, which are based on CG synthesized images for both training and testing. We created data sets using the Flickr Material Database for four attributes (glossiness, transparency, smoothness, and coldness) and learned the functions representing the values of these attributes. We present experimental results showing that the learned functions perform very promisingly in estimating the attribute values.

14:00-14:30, Paper ThPSBT2.39
Unsupervised Multi-Target Trajectory Detection, Learning and Analysis in Complicated Environments
Liu, Hong	Peking Univ.
Li, Jiang	Peking Univ.
Keywords: Motion, Tracking and Video Analysis, Image and Video Understanding, Pattern Recognition for Surveillance and Security Abstract: Trajectory analysis is very important to human behavior analysis in smart surveillance systems based on video processing. A key challenge is that human trajectories have no prior model and must be learned and updated online, while interaction between targets complicates the problem. This paper describes a novel integrated framework for multiple human trajectory detection, learning and analysis in complicated environments. First, a modified feature-spatial representation (MFSR) for the Cam-Shift tracking algorithm is proposed to obtain trajectories. Then, a piecewise multilevel learning method is adopted to learn the trajectory patterns by using spectral clustering and a Hidden Markov Model. Finally, a cascade detector is established for anomaly analysis based on the learned information, which allows obviously abnormal trajectories to be quickly distinguished from normal ones. Extensive experiments demonstrate good results for our framework, which can be applied in further selective video analysis.

14:00-14:30, Paper ThPSBT2.40
Position Estimation of Near Point Light Sources Using Clear Hollow Sphere
Aoto, Takahito	Nara Inst. of Science and Tech.
Taketomi, Takafumi	Nara Inst. of Science and Tech.
Sato, Tomokazu	Nara Inst. of Science and Tech.
Mukaigawa, Yasuhiro	Osaka Univ.
Yokoya, Naokazu	Nara Inst. of Science and Tech.
Keywords: Stereo and Image-Based Modeling, Vision for Graphics Abstract: We present a novel method for estimating 3-D positions of near light sources by using highlights on the outside and inside of a single clear hollow sphere. Conventionally, positions of near light sources have been estimated by using observed highlights on multiple reference objects, e.g., mirror balls. Unlike these approaches, our method does not require geometric calibration for multiple reference objects, which results in an easy setup for measuring 3-D positions of light sources. In experiments, the accuracy of light positions estimated by the proposed method is evaluated using both simulation and real data.


ThCT1	Main Hall
Handwriting	Regular Session
Chair: Tan, Chew-Lim	National Univ. of Singapore
Co-Chair: Liu, Cheng-Lin	Inst. of Automation, Chinese Acad. of Sciences

14:30-14:50, Paper ThCT1.1
Combining Online and Offline Systems for Arabic Handwriting Recognition
Mansour, Hany Ahmed	The American Univ. in Cairo
Abdelazeem, Sherif	American Univ. in Cairo
Keywords: Handwriting Recognition Abstract: The purpose of this research is to improve the recognition rate of online Arabic handwriting recognition using HMM (Hidden Markov Model). Delayed strokes are removed from the online Arabic word to avoid the difficulty and the confusion caused by the delayed strokes in the recognition process. A new technique for extracting offline features by dividing the image into non-uniform horizontal segments is presented. The integration between online and offline approaches has proven to give a better performance. With the combination we could increase the system performance over the best individual recognizer by 2.38%.

14:50-15:10, Paper ThCT1.2
Sparse Descriptor for Lexicon Reduction in Handwritten Arabic Documents
Chherawala, Youssouf	Ec. de Tech. Supérieure
Wisnovsky, Robert	Inst. of Islamic Studies, McGill Univ.
Cheriet, Mohammed	École de Tech. supérieure
Keywords: Character and Text Recognition, Document Understanding, Historical Document Analysis Abstract: Arabic words have a rich structure. They are made of subwords (groups of connected letters) and diacritical marks (dots). This paper proposes a sparse descriptor specifically designed for lexicon reduction in handwritten Arabic documents. The topological and geometrical features of subwords are extracted from the skeleton image, based on the concept of local density. The sparse descriptor is then formed as a 3-bin histogram, describing the distribution of the skeleton pixels' local density (low, medium or high). This descriptor is then extended to the Arabic word descriptor (AWD), which combines information from all the subwords and diacritics of an Arabic word. This approach is easy to implement and has only one free parameter. It has been evaluated on the Ibn Sina and IFN/ENIT databases with promising results.

15:10-15:30, Paper ThCT1.3
Offline Signature Verification and Forgery Detection Using a 2-D Geometric Warping Approach
Kennard, Douglas J.	Brigham Young Univ.
Barrett, William A.	Brigham Young Univ.
Sederberg, Thomas W.	Brigham Young Univ.
Keywords: Handwriting Recognition, Document Analysis Systems Abstract: We present a method of discriminating between authentic and forged signatures using 2-D geometric warping. After an initial coarse-alignment step, we use an automatic morphing correspondence algorithm to compute 2-D geometric warps that align the strokes of a questioned signature with those of known reference examples. We use distance maps to compute a difference metric, and then either accept the signature as genuine or reject it as a forgery depending on how different it is from the reference examples. Our method achieves equal error rate (EER) accuracies of about 94%-96% on our English dataset of blind forgeries and 87%-91% on casual forgeries (unpracticed imitations). Further evaluation of our method using the SigComp2011 competition dataset shows that our accuracies for skilled forgeries are comparable to those of several other recent methods. We are particularly encouraged by the performance of our method on the Chinese portion of the dataset, in which our EER accuracy (74%) is better than all but one of the systems that participated in the 2011 competition.

15:30-15:50, Paper ThCT1.4
Exploiting Ruling Line Artifacts in Writer Identification
Chen, Jin	Lehigh Univ.
Lopresti, Daniel	Lehigh Univ.
Keywords: Document Analysis Systems, Historical Document Analysis, Performance Evaluation Abstract: In this paper, we address the writer identification problem for noisy handwritten documents written on a substrate of pre-printed ruling lines. Instead of attempting to remove rulings and to recover broken strokes, we incorporate rulings to help with the identification task through the use of new displacement features. Experiments involving 61 writers and 4,890 handwritten text lines show that our technique is effective, with a relative 10% performance gain over the baseline system which attempts to remove ruling lines and recover broken strokes.

15:50-16:10, Paper ThCT1.5
The ILGDB Database of Realistic Pen-Based Gestural Commands
Renau-Ferrer, Ney	Univ. Rennes 2, UMR IRISA
Li, Peiyu	INSA de Rennes
Delaye, Adrien	IRISA - INSA
Anquetil, Eric	IRISA/INSA
Keywords: Performance Evaluation, Graphics Recognition, Human Computer Interaction Abstract: In this paper, we introduce the Intuidoc-Loustic Gestures DataBase (ILGDB), a new publicly available database of realistic pen-based gestures for evaluation of recognition systems in pen-enabled interfaces. ILGDB was collected in a real world context and in an immersive environment. As it contains a large number of unconstrained user-defined gestures, ILGDB offers a unique diversity of content that is likely to serve as a precious tool for benchmarking of gesture recognition systems. We report first baseline experimental results on the task of Writer-Dependent gesture recognition.


ThCT2	Multi-Purpose Hall
Imaging and Segmentation	Regular Session
Chair: Chen, Yen-wei	Ritsumeikan Univ.
Co-Chair: Kita, Yasuyo	National Inst. of Advanced Industrial Science and Technology

14:30-14:50, Paper ThCT2.1
Local Water Diffusion Phenomenon Clustering from High Angular Resolution Diffusion Imaging (HARDI)
Giot, Romain	Lab. GREYC, ENSICAEN - Univ.
Charrier, Christophe	Univ. de Caen
Descoteaux, Maxime	Univ. de Sherbrooke
Keywords: Medical Image Analysis and Registration, Classification and Clustering Abstract: The understanding of neurodegenerative diseases undoubtedly passes through the study of human brain white matter fiber tracts. To date, diffusion magnetic resonance imaging (dMRI) is the only technique to obtain information about the neural architecture of the human brain, thus permitting the study of white matter connections and their integrity. However, a remaining challenge for the dMRI community is to better characterize complex fiber crossing configurations, where diffusion tensor imaging (DTI) is limited but high angular resolution diffusion imaging (HARDI) now brings solutions. This paper investigates the development of both identification and classification processes for the local water diffusion phenomenon based on HARDI data to automatically detect imaging voxels where there are single and crossing fiber bundle populations. The technique is based on knowledge extraction processes and is validated on a dMRI phantom dataset with ground truth.

14:50-15:10, Paper ThCT2.2
Texture and Shape in Fluorescence Pattern Identification for Auto-Immune Disease Diagnosis
Snell, Violet	Univ. of Surrey
Christmas, William	Univ. of Surrey
Kittler, Josef	Univ. of Surrey
Keywords: Medical Image Analysis and Registration Abstract: Automation of HEp-2 cell pattern classification would drastically improve the accuracy and throughput of diagnostic services for many auto-immune diseases, but it has proven difficult to reach a sufficient level of precision. Correct diagnosis relies on a subtle assessment of texture type in microscopic images of indirect immunofluorescence (IIF), which so far has eluded reliable replication through automated measurements. We introduce a combination of spectral analysis and multi-scale digital filtering to extract the most discriminative variables from the cell images. We also apply multi-stage classification techniques to make optimal use of the limited labelled data set. Overall error rate of 1.6% is achieved in recognition of 6 different cell patterns, which drops to 0.5% if only positive samples are considered.

15:10-15:30, Paper ThCT2.3
A New Convex Variational Model for Liver Segmentation
Peng, Jialin	Zhejiang Univ.
Wang, Jinwei	Zhejiang Univ.
Kong, Dexing	Zhejiang Univ.
Yang, Wenhui	Univ. of Science and Tech. of China
Keywords: Medical Image Analysis and Registration, Segmentation, Color and Texture, Computer-Aided Diagnosis and Surgery Abstract: Due to intensity overlapping, blurred edges and complex backgrounds with clutter features, liver segmentation is still a challenging task. In this paper, we address it with a constrained convex variational model, which can definitely avoid leakage through anatomical knowledge from users. A novel heuristic intensity model is proposed to suppress irrelevant strong edges and constrain the segmentation. Both global and local region appearance information are integrated to model higher level features such as local context. As a result, weak liver boundaries and fine structures can be stably delineated according to the information from neighborhood and nearby layers. No precise prior segmentation is needed and few seeds without shape restriction, about three seeds, are adequate to capture fine structures. The initialization is also very easy. Moreover, an accelerated primal-dual algorithm is proposed to efficiently and globally optimize the model. Our method is validated on MICCAI dataset and produces a high score of 80.6. It can be used to segment other abdominal organs.

15:30-15:50, Paper ThCT2.4
Super-Resolution of MR Volumetric Images Using Sparse Representation and Self-Similarity
Iwamoto, Yutaro	Ritsumeikan Univ.
Han, Xian-Hua	Ritsumeikan Univ.
Sasatani, So	Ritsumeikan Univ.
Taniguchi, Kazuki	Ritsumeikan Univ.
Xiong, Wei	Inst. for Infocomm Res. A-STAR
Chen, Yen-wei	Ritsumeikan Univ.
Keywords: Medical Image Analysis and Registration, Enhancement, Restoration and Filtering Abstract: Magnetic resonance imaging can only acquire volume data with finite resolution due to various factors. In particular, the resolution in the slice direction is much lower than that in the in-plane direction, yielding unrealistic visualizations. To solve this problem, interpolation techniques have conventionally been applied. However, classical interpolation techniques generally cause some artifact noise such as jaggedness and blurring in the edge regions. In this paper, we propose a new super-resolution framework for generating high-resolution data in the slice direction. In the proposed approach, we estimate the high-frequency component using a learning-based super-resolution technique with sparse representation and prove that the dictionary can be constructed using the in-plane frame as the input data without any other high-resolution data as training. Furthermore, we optimize estimated high-resolution data by adding a new regularization term with a non-local means algorithm. Experiments confirm that our proposed method is more effective than the conventional methods.

15:50-16:10, Paper ThCT2.5
A Novel Framework for Segmentation of Stroke Lesions in Diffusion Weighted MRI Using Multiple B-Value Data
Mujumdar, Shashank	International Inst. of Information Tech.
Sivaswamy, Jayanthi	IIIT Hyderabad
Varma, Ravi	KIMS Hospital, Hyderabad
L.T., Kishore	CARE Hospital, Hyderabad
Keywords: Medical Image Analysis and Registration, Detection, Separation and Segmentation Abstract: Diffusion Weighted MR Imaging (DWI) is routinely used for early detection of cerebral ischemic stroke. DWI with higher b-values (b=2000) provides improved sensitivity, higher conspicuity and reduced artifacts, and thus improves the detectability of the smallest infarcts compared with conventional DWI (b=1000). We propose a novel framework for accurately detecting stroke regions by combining information from multiple sources: b2000, b1000 data and the apparent diffusion coefficient map. The detected lesions are finally segmented using an active contour approach. The proposed method was tested on 41 datasets acquired with different protocols. A comparison of our method with a leading method validates the effectiveness of our approach. The median dice coefficient, sensitivity and specificity for stroke segmentation were 0.84, 87.07% and 99.90% respectively. The strength of the proposed method is its ability to capture (and accurately segment) the small (and large) lesions in the data which are often missed by segmentation methods operating on a single b-value data.


ThCT3	Room 101+102
Applications	Regular Session
Chair: Chen, Chu-Song	Acad. Sinica
Co-Chair: Ohta, Yuichi	Univ. of Tsukuba

14:30-14:50, Paper ThCT3.1
Connecting the Dots: Triadic Clustering of Crowdsourced Data to Map Dirt Roads
Huynh, Andrew	Univ. of California, San Diego
Lin, Albert	California Inst. of Telecommunications and Information Tech.
Keywords: Segmentation, Color and Texture, Pattern Recognition for Art, Cultural Heritage and Entertainment, Classification and Clustering Abstract: Road segmentation is a critical application of satellite and aerial remote sensing. Traditional attempts to apply machine learning and computer vision have yielded good results but rely on specific characteristics, such as the contrast of paved roads to their surroundings or contextual clues. However, these methods still lack the sensitivity of human perception when identifying rural, non-paved, or less defined roads. We propose combining crowdsourced human labeling with triadic linear clustering to accurately map rural roads across the sparsely populated Mongolian steppe. From 600,000 road annotations made by 8000 volunteer participants we selected a random dataset to apply our proposed approach. We report the performance of the method used compared to the current state-of-the-art in automated and semi-automated satellite image road detection.

14:50-15:10, Paper ThCT3.2
Real-Time Staircase Detection from a Wearable Stereo System
Lee, Young Hoon	Univ. of Southern California
Leung, Tung-Sing	Univ. of Southern California
Medioni, Gerard	Univ. of Southern California
Keywords: Vision for Robotics, Stereo and Image-Based Modeling, Scene Understanding Abstract: We address the problem of staircase detection, in the context of a navigation aid for the visually impaired. The requirements for such a system are robustness to viewpoint, distance, scale, real-time operation, high detection rate and low false alarm rate. Our approach uses classifiers trained using Haar features and Adaboost learning. This first stage does detect staircases, but produces many false alarms. The false alarm rate is drastically reduced by using spatial context in the form of the estimated ground plane, and by enforcing temporal consistency. We have validated our approach on many real sequences under various weather conditions, and are presenting some of the quantitative results here.

15:10-15:30, Paper ThCT3.3
Optimal Consensus Set and Preimage of 4-Connected Circles in a Noisy Environment
Andres, Eric	Univ. of Poitiers, Lab. XLIM
Largeteau-Skapin, Gaelle	XLIM-SIC Department, Univ. of Poitiers
Zrour, Rita	Univ. of Poitiers
Sugimoto, Akihiro	National Inst. of Informatics
Kenmochi, Yukiko	Univ. Paris-Est
Keywords: 2D/3D Object Detection and Recognition, Detection, Separation and Segmentation Abstract: This paper explores the problem of fitting special forms of annuli that correspond to 4-connected digital circles to a given set of points in 2D images in the presence of noise by maximizing the number of inliers, namely the consensus set. We prove that the optimal solutions can be described by solutions with three points on the annulus boundary. These solutions correspond to vertices of the preimage of the annulus in the parameter space, thus allowing us to build the preimage.

15:30-15:50, Paper ThCT3.4
Learning Action Symbols for Hierarchical Grammar Induction
Lee, Kyuhwa	Imperial Coll. London
Kim, Tae-Kyun	Imperial Coll. London
Demiris, Yiannis	Imperial Coll. London
Keywords: Vision for Robotics, Gesture and Behavior Analysis, Image and Video Understanding Abstract: We present an unsupervised method of learning action symbols from video data, which self-tunes the number of symbols to effectively build hierarchical activity grammars. A video stream is given as a sequence of unlabeled segments. Similar segments are incrementally grouped to form a hierarchical tree structure. The tree is cut into clusters where each cluster is used to train an action symbol. Our goal is to find a good set of clusters, i.e. symbols, where regularities are best captured in the learned representation, i.e. the induced grammar. Our method is two-fold: 1) create a candidate set of symbols from initial clusters, 2) build an activity grammar and measure model complexity and likelihood to assess the quality of the candidate set of symbols. We propose a balanced model comparison method which avoids the problem commonly found in model complexity computations where one measurement term dominates the other. Our experiments on the towers of Hanoi and human dancing videos show that our method can discover the optimal number of action symbols effectively.

15:50-16:10, Paper ThCT3.5
Glass Object Localization by Joint Inference of Boundary and Depth
Wang, Tao	Australian National Univ. National ICT Australia (NICTA)
He, Xuming	National ICT Australia and Australian National Univ.
Barnes, Nick	NICTA
Keywords: 2D/3D Object Detection and Recognition, Vision for Robotics Abstract: We address the problem of localizing glass objects with a multi-modal RGB-D camera. Our method integrates the intensity and depth information from a single view point, and builds a Markov Random Field that predicts glass boundary and region jointly. Based on the localization, we also reconstruct the depth of the scene and fill in the missing depth values. The efficacy of our algorithm is validated on a new RGB-D Glass dataset of 43 distinct glass objects.


ThCT4	Hall 200
3D Imaging and Application	Regular Session
Chair: Hancock, Edwin	Univ. of York
Co-Chair: Jiang, Xiaoyi	Univ. of Münster

14:30-14:50, Paper ThCT4.1
Image Matting with Color and Depth Information
Lu, Ting	Hunan Univ.
Li, Shutao	Hunan Univ.
Keywords: Detection, Separation and Segmentation, Image and Video Processing Abstract: This paper describes an efficient image matting method that combines color and depth information. First, the depth image is segmented by a variational level set. Then morphological operators, dilation and erosion, are used to form a trimap of the ROI (Region of Interest). Finally, with the preprocessed depth image, color image and trimap as inputs, an RGB-D Bayesian matting method is proposed to estimate the alpha matte. Experiments show that our proposed RGB-D matting algorithm can significantly improve the quality of the matting result with an automatically generated trimap.

14:50-15:10, Paper ThCT4.2
A Comprehensive Polarisation Model for Surface Orientation Recovery
Zhang, Lichi	Univ. of York
Hancock, Edwin	Univ. of York
Keywords: Image and Video Processing, Physics-Based Vision, Stereo and Image-Based Modeling Abstract: In this paper a polarisation model which predicts surface reflection as a function of refractive index and angle of incidence is introduced. We present the underlying physics of polarisation which is based on the Fresnel theory and Malus' law. The proposed model can be used to recover the shape of the objects for images taken under polarised light. The traditional way of shape recovery using diffuse polarisation is inaccurate due to noise in the degree of polarisation (DOP) measurements, computed from images captured under unpolarised light, which are small at most locations on the surface. The proposed model improves results as the DOP values are relatively larger in polarised light; we present a number of experimental results to demonstrate this.

15:10-15:30, Paper ThCT4.3
Depth Image Up-Sampling Using Ant Colony Optimization
Tian, Jing	Wuhan Univ. of Science and Tech.
Chen, Li	Wuhan Univ. of Science and Tech.
Keywords: Enhancement, Restoration and Filtering, Image and Video Processing Abstract: An accurate depth map at high resolution is required in many 3D video concepts. Given a low-resolution depth map, this paper studies how to enhance its resolution with a registered high-resolution color image. The idea of the proposed approach is that pixels with similar color values and small distances should have similar depth values, while color discontinuities indicate sharp depth changes at object edges. Therefore, the known depth values in the input depth map can be propagated to estimate the unknown depth values of their neighboring pixels with similar color values and small distances in the high-resolution depth map. Different from conventional approaches, the proposed approach utilizes the ant colony optimization (ACO) technique to dispatch artificial ants moving on a coupled graph, which consists of a depth map and a color image, and propagate the known depth information from the observed low-resolution depth map to its up-sampled counterpart. Experimental results show that the proposed approach achieves high-resolution depth maps of better quality than those of conventional approaches.

15:30-15:50, Paper ThCT4.4
Faithful Spatio-Temporal Disocclusion Filling Using Local Optimization
Schmeing, Michael	Univ. of Münster
Jiang, Xiaoyi	Univ. of Münster
Keywords: Image and Video Processing, Stereo and Image-Based Modeling, Occlusion and Shadow Detection Abstract: We present a novel method to fill disoccluded regions occurring in Depth Image Based Rendering (DIBR) in a faithful way. Given a video stream and a corresponding depth map, DIBR can render arbitrary new views of a scene. Areas that are not visible in the reference view need to be filled after warping. We present a novel framework for this task which can reconstruct the disoccluded regions by taking temporally neighboring frames into account. An efficient optimization scheme is employed to find faithful filling regions. This way, in contrast to common methods, we can fill disocclusions with their true color values, yielding high-quality view synthesis results.

15:50-16:10, Paper ThCT4.5
Point Cloud Transport
Nakajima, Hozuma	Osaka Univ.
Makihara, Yasushi	The Inst.
Hsu, Hsu	Osaka Univ.
Mitsugami, Ikuhisa	Osaka Univ.
Nakazawa, Mitsuru	Osaka Univ.
Yamazoe, Hirotake	Osaka Univ.
Habe, Hitoshi	Kinki Univ.
Yagi, Yasushi	Osaka Univ.
Keywords: Image and Video Processing Abstract: In this paper we propose a method for temporal interpolation of a point cloud undergoing occlusions and topological changes. The point cloud is first merged into fine clusters, which are then further merged into coarse clusters for each source and target shape. In conjunction with trash box bins to cope with occlusions, a coarse correspondence between a source and a target shape is found that minimizes the transportation cost in the earth mover's distance framework. Subsequently, a fine correspondence is found in a similar way based on the coarse correspondence constraint to suppress locally isolated motion. Finally, the source and target point clouds are transported based on the fine correspondence. Experiments with point cloud sequences captured by a Kinect range finder show promising results.


ThCT5	Hall 300
Gesture and Action Analysis-II	Regular Session
Chair: Nevatia, Ram	USC
Co-Chair: Mukaigawa, Yasuhiro	Osaka Univ.

14:30-14:50, Paper ThCT5.1
Human Actions Recognition from Streamed Motion Capture
Barnachon, Mathieu	Univ. de Lyon, CNRS, Univ. Lyon 1, LIRIS UMR5205
Bouakaz, Saida	Univ. Claude Bernard Lyon1, liris Lab.
Boufama, Boubakeur	Univ. of Windsor
Guillou, Erwan	Univ. de Lyon, CNRS, Univ. Lyon 1, LIRIS UMR5205
Keywords: Gesture and Behavior Analysis, Motion, Tracking and Video Analysis, Human Computer Interaction Abstract: This paper introduces a new method for streamed action recognition using Motion Capture (MoCap) data. First, the histograms of action poses, extracted from MoCap data, are computed according to Hausdorff distance. Then, using a dynamic programming algorithm and an incremental histogram computation, our proposed solution recognizes actions in real time from streams of poses. The comparison of histograms for recognition was achieved using Bhattacharyya distance. Furthermore, the learning phase has remained very efficient with respect to both time and complexity. We have shown the effectiveness of our solution by testing it on large datasets, obtained from animation databases. In particular, we were able to achieve excellent recognition rates that have outperformed the existing methods.

14:50-15:10, Paper ThCT5.2
Inertial-Sensor-Based Walking Action Recognition Using Robust Step Detection and Inter-Class Relationships
Ngo, Thanh Trung	The Inst. of Scientific and Industrial Research, Osaka Universit
Makihara, Yasushi	The Inst.
Nagahara, Hajime	Kyushu Univ.
Mukaigawa, Yasuhiro	Osaka Univ.
Yagi, Yasushi	Osaka Univ.
Keywords: Gesture and Behavior Analysis, Pattern Recognition for Surveillance and Security, Detection, Separation and Segmentation Abstract: This paper tackles a challenging problem of inertial sensor-based recognition for similar walking action classes. We solve two remaining problems of existing methods in the case of walking actions: action signal segmentation and recognition of similar action classes. First, to robustly segment the walking action under drastic changes such as speed, intensity, or style, we rely on the likelihood of heel strike that is computed employing a scale-space technique. Second, to improve the classification performance with similar action classes, we incorporate the inter-class relationship. In experiments, the proposed algorithms were positively validated with 97 subjects and five similar walking action classes, namely walking on flat ground, up/down stairs, and up/down a slope.

15:10-15:30, Paper ThCT5.3
Correlations between 48 Human Actions Improve Their Detection
Burghouts, Gertjan	TNO
Schutte, Klamer	TNO Defence, Security and Safety
Keywords: Gesture and Behavior Analysis, Motion, Tracking and Video Analysis, Features and Image Descriptors Abstract: Many human actions are correlated, because of compound and/or sequential actions, and similarity. Indeed, human actions are highly correlated in human annotations of 48 actions in the 4,774 videos from visint.org. We exploit such correlations to improve the detection of these 48 human actions, ranging from simple actions such as walk to complex actions such as exchange. We apply a basic pipeline of STIP features, a Random Forest to quantize the features into histograms, and an SVM classifier. First, we show that the sampling for the Random Forest can be improved by exploiting the correlations between human actions. Second, we show that exploiting all 48 actions' posteriors for detecting a particular action further improves detection in general. We demonstrate a 50% relative improvement for human action detection in 1,294 realistic test videos.

15:30-15:50, Paper ThCT5.4
Gesture Recognition System Based on Adaptive Resonance Theory
Park, Paul K. J.	Samsung Advanced Inst. of Tech.
Lee, Jun Haeng	Samsung Advanced Inst. of Tech.
Shin, Chang Woo	Samsung Advanced Inst. of Tech.
Ryu, Hyun-Surk	Samsung Advanced Inst. of Tech.
Kang, Byung-Chang	Samsung Advanced Inst. of Tech.
Carpenter, Gail A.	Boston Univ.
Grossberg, Stephen	Boston Univ.
Keywords: Gesture and Behavior Analysis, Neural Networks, Human Computer Interaction Abstract: We report on the moving hand gesture recognition technique using Adaptive Resonance Theory (ART). To detect the start and end points of a continuous moving gesture (known as the "gesture spotting" problem), we propose the adaptive distributed prediction technique. Our results show that, unlike conventional non-recurrent neural networks, the proposed technique can be utilized usefully in reliable real-time learning (2000 times faster than with alternative methods) and recognition of continuously moving patterns.

15:50-16:10, Paper ThCT5.5
Sparse Shift-Invariant Representation of Local 2D Patterns and Sequence Learning for Human Action Recognition
Baccouche, Moez	Orange Lab. R&D - LIRIS INSA Lyon
Mamalet, Franck	Orange Lab.
Wolf, Christian	INSA de Lyon
Garcia, Christophe	LIRIS - Insa de Lyon
Baskurt, Atilla	LIRIS, INSA Lyon
Keywords: Machine Learning and Data Mining, Neural Networks, Image and Video Understanding Abstract: Most existing methods for action recognition mainly rely on manually engineered features which, despite their good performance, are highly problem dependent. We propose in this paper a fully automated model, which learns to classify human actions without using any prior knowledge. A convolutional sparse auto-encoder learns to extract sparse shift-invariant representations of the 2D local patterns present in each video frame. The evolution of these mid-level features is learned by a Recurrent Neural Network trained to classify each sequence. Experimental results on the KTH dataset show that the proposed approach outperforms existing models which rely on learned features, and gives comparable results with the best related works.


ThPBT6	Room 201+202
Poster Session (13, 14)	Poster Session

Technical Program for Thursday November 15, 2012