ICPR16


Tu1PL	G.Cancun T1
Maria Petrou Prize - Michal Irani	Plenary Session


TuAT1	G.Cancun T1.A
TuAMO1	Oral Session

10:30-10:50, Paper TuAT1.1
Joint Learning Dictionary and Discriminative Features for High Dimensional Data
Wei, Xian	Shanghai Jiao Tong Univ
Li, Yuanxiang	Shanghai Jiao Tong Univ
Shen, Hao	Tech. Univ. München
Murphey, Yi	Univ. of Michigan-Dearborn
Keywords: Classification and clustering, Dimensionality reduction and manifold learning, Representation and analysis in pixel/voxel images Abstract: Recently, sparse representation (SR) over a redundant dictionary has become a popular way of representing the data. It has been verified as an efficient and useful tool to promote the discrimination between signals. This work develops a joint learning approach to find the low dimensional discriminative features for high dimensional data. To avoid the high computational cost of direct sparse coding on large scale input data, we first learn SR in an orthogonal projected space over a task-driven sparsifying dictionary. We then exploit the discriminative projection on SR. The whole learning process is treated as an optimization problem of trace quotient maximization, which involves an orthogonal projection on original data space, a dictionary and a discriminative projection on sparse codes. The related cost function is well defined on a product manifold of the Stiefel manifold, the Oblique manifold and the Grassmann manifold. Finally, we employ a stochastic gradient descent algorithm on the smooth product manifold to maximize the cost function. Our numerical experiments on visual recognition demonstrate the effectiveness of the proposed algorithm, in comparison with the state of the arts.

10:50-11:10, Paper TuAT1.2
Network Entropy Analysis Using the Maxwell-Boltzmann Partition Function
Wang, Jianjia	Univ. of York
Wilson, Richard	Univ. of York
Hancock, Edwin	Univ. of York
Keywords: Statistical, syntactic and structural pattern recognition, Semi-supervised learning and spectral methods, Classification and clustering Abstract: In this paper, we use the Maxwell-Boltzmann partition function to compute network entropy. The partition function is used to model the energy level population statistics where the network is in thermodynamic equilibrium with a heat-bath. Here the network Hamiltonian operator defines a set of energy levels occupied by particles in thermal equilibrium. These energy levels are given by the eigenvalues of the normalized Laplacian matrix. In other words, we investigate a thermalised version of the system normally studied in spectral graph theory, where the thermalisation accounts for noise in the system. We provide a systematic study of the entropy resulting from this characterization. Compared to previous work based on using von Neumann network entropy, this thermodynamic quantity is effective in characterizing changes of network structure and distinguishing different types of network models (e.g. ErdH{o}s-R'{e}nyi random graphs, small world networks, and scale free networks). Numerical experiments on real world data-sets are presented to evaluate the qualitative and quantitative differences in performance.

11:10-11:30, Paper TuAT1.3
Fusion and Community Detection in Multi-Layer Graphs
Gligorijevic, Vladimir	Imperial Coll. London
Panagakis, Yannis	Imperial Coll. London, Department of Computing
Zafeiriou, Stefanos	IMPERIAL Coll. OF LONDON
Keywords: Classification and clustering, Dimensionality reduction and manifold learning, Machine learning and data mining Abstract: Relational data arising in many domains can be represented by networks (or graphs) with nodes capturing entities and edges representing relationships between these entities. Community detection in networks has become one of the most important problems having a broad range of applications. Until recently, the vast majority of papers have focused on discovering community structures in a single network. However, with the emergence of multi-view network data in many real-world applications and consequently with the advent of multi-layer graph representation, community detection in multi-layer graphs has become a new challenge. Multi-layer graphs provide complementary views of connectivity patterns of the same set of vertices. Fusion of the network layers is expected to achieve better clustering performance. In this paper, we propose two novel methods, coined as WSSNMTF (Weighted Simultaneous Symmetric Non-Negative Matrix Tri-Factorization) and NG-WSSNMTF (Natural Gradient WSSNMTF), for fusion and clustering of multi-layer graphs. Both methods are robust with respect to missing edges and noise. We compare the performance of the proposed methods with two baseline methods, as well as with three state-of-the-art methods on synthetic and three real-world datasets. The experimental results indicate superior performance of the proposed methods.

11:50-12:10, Paper TuAT1.5
Cross-Heterogeneous-Database Age Estimation with Co-Representation among Them
Tian, Qing	Nanjing Univ. of Aeronautics and Astronautics
Chen, Songcan	Nanjing Univ. of Aeronautics & Astronautics
Keywords: Classification and clustering, Dimensionality reduction and manifold learning, Transfer learning Abstract: Human age estimation is an important research topic and can find its applications in such as commodity recommendation and security monitoring. The establishment of existing estimators basically follows a same pipeline, i.e., an estimator is built from a given training dataset like FG-NET and then evaluated on a holdout testing set to determine its effectiveness. In doing so, a usually-followed assumption is that both training and testing sets should share the same age distribution and the same feature representation, implying that 1) once the true age of a human image to be tested is out of the age range of the training set, a mis-estimation is naturally inevitable; 2) estimators built on datasets with different feature representations cannot be directly applied to make predictions on testing sets of each other unless re-trained, because their features are usually different (i.e., the databases are heterogeneous). To the best of our knowledge, the age distributions of different aging databases are usually not consistent and complementary to each other. Motivated by this fact, in order to incorporate such a complementarity in age distributions to improve the generalization ability of the age estimator, in this paper we propose a unified cross-heterogeneous-database age estimation method by first projecting the training samples, usually represented with different features, of different aging databases into a common feature space, and then constructing an age estimator in their mixed sample space. By this way, the age-distribution-incompleteness of the aging datasets can be alleviated by co-representation among them and thus the discriminating ability of the age estimator can be reinforced. Finally, experimental results demonstrate the superiority of the proposed method.

12:10-12:30, Paper TuAT1.6
A Novel Entropy-Based Graph Signature from the Average Mixing Matrix
Bai, Lu	Central Univ. of Finance and Ec
Rossi, Luca	Aston Univ
Cui, Lixin	School of Information, Central Univ. of Finance and Ec
Hancock, Edwin	Univ. of York
Keywords: Statistical, syntactic and structural pattern recognition Abstract: In this paper, we propose a novel entropic signature for graphs, where we probe the graphs by means of continuous-time quantum walks. More precisely, we characterise the structure of a graph through its average mixing matrix. The average mixing matrix is a doubly-stochastic matrix that encapsulates the time-averaged behaviour of a continuous-time quantum walk on the graph, i.e., the ij-th element of the average mixing matrix represents the time-averaged transition probability of a continuous-time quantum walk from the vertex v_i to the vertex v_j. With this matrix to hand, we can associate a probability distribution with each vertex of the graph. We define a novel entropic signature by concatenating the average Shannon entropy of these probability distributions with their Jensen-Shannon divergence. We show that this new entropic measure can encaspulate the rich structural information of the graphs, thus allowing to discriminate between different structures. We explore the proposed entropic measure on several graph datasets abstracted from bioinformatics databases and we compare it with alternative entropic signatures in the literature. The experimental results demonstrate the effectiveness and efficiency of our method.

12:10-12:30, Paper TuAT1.19
Inexact MDL for Linear Manifold Clusters
Haralick, Robert	City Univ. of New York, Graduate Center
Diky, Art	City Univ. of New York, Graduate Center
Su, Xing	City Univ. of New York, Graduate Center
Kiang, Nancy Y.	NASA Goddard Inst. for Space Studies
Keywords: Classification and clustering, Dimensionality reduction and manifold learning Abstract: We present a regularization technique based on the minimum description length (MDL) principle for the linear manifold clustering. We suggest an inexact minimum description length method based on describing the data structure as linear manifold clusters. We examine the behavior of the proposed method and compare it performance against simulated clustering results of various dimensionality and structure. Finally, we empirically evaluate the proposed technique on a climate data.


TuAT2	G.Cancun T1.B
TuAMO2	Oral Session

10:30-10:50, Paper TuAT2.1
An Efficient Hough Transform for Multi-Instance Object Recognition and Pose Estimation
Yoruk, Erdem	Vispera Information Tech
Oner, Kaan Taha	Vispera Information Tech
Akgul, Ceyhun Burak	Vispera Information Tech
Attachments: Supplementary material Keywords: 2D/3D object detection and recognition, Scene understanding, Content based image retrieval and data mining Abstract: Generalized Hough transform, when applied to object detection, recognition and pose estimation, can be susceptible to spurious voting depending on the Hough space to be used and hypotheses to be voted. This often necessitates additional computational steps like non-maxima suppression and geometric consistency checks, which can be costly and prevent voting based methods from being precise and scalable for large numbers of target classes and crowded scenes. In this paper, we propose an efficient and refined Hough transform for simultaneous detection, recognition and exact pose estimation, which can efficiently accommodate up to multiple tens of co-visible query instances and multiple thousands of visually similar classes. Specifically, we match SURF features from a given query image to a database of model features with known poses, and in contrast to existing techniques, for each matched pair, we analytically compute a concise set of 6 degrees-of-freedom pose hypotheses, for which the geometric relationship of the correspondence remains invariant. We also introduce an indirect but equivalent representation for those correspondence-specific poses, termed as emph{feature aligning affine transformations}, which results in a Hough voting scheme as cheap and refined as line drawing in raster grids. Owing to minimized voting redundancy, we can obtain a very sparse and stable Hough image, which can be readily used to read off instances and poses without dedicated steps of non-maxima suppression and geometric verification. Experimented on an extensive emph{Grocery Products} dataset, our method significantly outperforms the sate-of-the-art with near real time overall cost.

10:50-11:10, Paper TuAT2.2
A Bag of Relevant Regions for Visual Place Recognition in Challenging Environments
Maldonado-Ramírez, Angel Alejandro	CINVESTAV Campus Saltillo
Torres-Méndez, Luz Abril	CINVESTAV Campus Saltillo
Castelán, Mario	CINVESTAV Campus Saltillo
Attachments: Supplementary material Keywords: Vision for robotics, 2D/3D object detection and recognition, Segmentation, features and descriptors Abstract: In this paper, we present a method for vision-based place recognition in environments with a high content of similar features and that are prone to variations in illumination. The high similarity of features makes difficult the disambiguation between two different places. The novelty of our method relies on using the Bag of Words (BoW) approach to derive an image descriptor from a set of relevant regions, which are extracted using a visual attention algorithm. We name our approach Bag of Relevant Regions (BoRR). The descriptor of each relevant region is built by using a 2D histogram of the chromatic channels of the CIELab color space. We have compared our results with those using state of the art descriptors that include the BoW and demonstrate that our approach performs better in most of the cases.

11:10-11:30, Paper TuAT2.3
Edge Chain Detection by Applying Helmholtz Principle on Gradient Magnitude Map
Lu, Xiaohu	School of Remote Sensing and Information Engineering Wuhan Univ
Yao, Jian	Wuhan Univ
Li, Li	Wuhan Univ
Liu, Yahui	Wuhan Univ
Zhang, Wei	Shandong Univ
Keywords: 2D/3D object detection and recognition, Image and video analysis and understanding, Low-level vision Abstract: In this paper, we present an efficient edge chains detection algorithm by applying the Helmholtz principle on the gradient magnitude map of an image. A edge chain validation method is proposed which uses the ``relative number of false alarms" (RNFA) instead of the ``number of false alarms" (NFA). The edge chains are detected first and then validated according to their RNFA values. In this way, edge chains that are weak in gradients but meaningful in vision can be detected. To evaluate the proposed edge chain detector in quantity, an edge chain detection benchmark which consists of 25 labeled images in different scenes is built. The proposed edge chain detector is tested in this benchmark, and the experimental results sufficiently demonstrate that the proposed edge chain detector outperform the state-of-the-art methods.

11:30-11:50, Paper TuAT2.4
Taxonomy Augmented Object Recognition
Wang, Xiaoyang	Nokia Bell Labs
Zhao, Yue	Minzu Univ. of China
Ji, Qiang	RPI
Keywords: 2D/3D object detection and recognition, Classification and clustering Abstract: Realistic scene object recognition in computer vision still faces great challenges due to the large intra-class variation of object images caused by factors like object appearance variation and viewpoint change. To address this challenge, we propose to exploit the semantic relationships embedded in object taxonomy for improved object recognition. Specifically, we exploit the relationships in the object taxonomy to augment the learning of object classifiers. We utilize two types of relationships in the taxonomy, including the overall relationship and the local relationship. Our proposed approach jointly incorporates both the overall relationship as the loss function for classifier learning, and the local relationship as classifier learning constraints. Experiments on benchmark datasets demonstrate the effectiveness of our method in incorporating taxonomy for object recognition compared to the state-art-the-art methods.

11:50-12:10, Paper TuAT2.5
Centerline Detection on Partial Mesh Scans by Confidence Vote in Accumulation Map
Kerautret, Bertrand	LORIA, Univ. De Lorraine
Krähenbühl, Adrien	Univ. De Lorraine, LORIA
Debled-Rennesson, Isabelle	LORIA - Nancy Univ
Lachaud, Jacques-Olivier	Lab. of Mathematics (UMR CNRS 5127), Univ. Savoie Mon
Keywords: 2D/3D object detection and recognition, 3D shape recovery, Representation and analysis in pixel/voxel images Abstract: This paper proposes an original method for extracting the centerline of 3D objects given only partial mesh scans as input data. Its principle relies on the construction of a normal vector accumulation map build by casting digital rays from input vertices. This map is then pruned according to a confidence voting rule: confidence in a point increases if this point has maximal votes along a ray. Points with high confidence accurately delineate the centerline of the object. The resulting centerline is robust enough to allow the reconstruction of the associated graph by a simple morphological processing of the confidence and a geodesic tracking. The overall process is unsupervised and only depends on a user-chosen maximal object radius. Experiments show a good behavior on standard mesh scans. Moreover, the proposed method is not only competitive with state-of-the-art methods on perfect data, but appears to be much more reliable on imperfect or damaged data, like holes, partial scans, noise, and scans from only one direction.

12:10-12:30, Paper TuAT2.6
Detection and Localization with Multi-Scale Models
Ohn-Bar, Eshed	Univ. of California, San Diego
Trivedi, Mohan	Univ. of California, San Diego
Keywords: 2D/3D object detection and recognition, Signal, image and video processing, Image and video analysis and understanding Abstract: Object detection and localization in images involve a multi-scale reasoning process. First, responses of object detectors are known to vary with image scale. Second, contextual relationships on a part-level, object-level, and scene-level appear at different scales of the image. This paper studies efficient modeling of these two components by training multi-scale template models. The input to the proposed algorithm involves image features computed at varying image scales, hence operating on volumes in the feature pyramid. The approach generalizes single-scale, local-region detection approaches (e.g. sliding window or region proposals), jointly learning detection and localization cues. Extending the common single-scale detection to a multi-scale volume allows learning scale-specific models as well as analyzing the importance of contextual information at different scales. Experimental analysis on the PASCAL VOC dataset shows the method to considerably improve both detection and localization performance for different type of features, histogram of oriented gradients and deep convolutional neural network features.


TuAT3	Maya T2.A
TuAMO3	Oral Session

10:30-10:50, Paper TuAT3.1
Restoration of Images with Wavefront Aberrations
Zelenka, Claudius	Kiel Univ
Koch, Reinhard	Kiel Univ
Attachments: Supplementary material Keywords: Enhancement, restoration and filtering, Signal, image and video processing Abstract: This contribution deals with image restoration in optical systems with coherent illumination, which is an important topic in astronomy, coherent microscopy and radar imaging. Such optical systems suffer from wavefront distortions, which are caused by imperfect imaging components and conditions. Known image restoration algorithms work well for incoherent imaging, they fail in case of coherent images. In this paper a novel wavefront correction algorithm is presented, which allows image restoration under coherent conditions. In most coherent imaging systems, especially in astronomy, the wavefront deformation is known. Using this information, the proposed algorithm allows a high quality restoration even in case of severe wavefront distortions. We present two versions of this algorithm, which are an evolution of the Gerchberg-Saxton and the Hybrid-Input-Output algorithm. The algorithm is verified on simulated and real microscopic images.

10:50-11:10, Paper TuAT3.2
Depth Map Upsampling by Self-Guided Residual Interpolation
Konno, Yousuke	Tokyo Inst. of Tech
Tanaka, Masayuki	Tokyo Inst. of Tech
Okutomi, Masatoshi	Tokyo Inst. of Tech
Yanagawa, Yukiko	OMRON Corp
Kinoshita, Koichi	Omron
Kawade, Masato	OMRON
Keywords: Enhancement, restoration and filtering, Signal, image and video processing, Vision for robotics Abstract: In this paper, we propose a simple and effective depth upsampling technique using self-guided residual interpolation. The original residual interpolation requires guidance information such as high-resolution RGB color image. However, self-guided residual interpolation requires only a single depth map. In the proposed algorithm, a tentative estimation of a high-resolution depth map is first generated from an input low-resolution depth map. Then, re-interpolation is applied to the residual domain, which is defined by differences between the input depth map and the tentative estimate. A precise high-resolution depth map is obtainable by interpolating in the residual domain. Experimental results demonstrate that our algorithm can outperform state-of-the-art depth map upsampling algorithms.

11:10-11:30, Paper TuAT3.3
Adaptive Boosting for Image Denoising: Beyond Low-Rank Representation and Sparse Coding
Wang, Bo	Texas A&M Univ
Lu, Tao	Wuhan Univ
Xiong, Zixiang	Texas A&M Univ
Keywords: Enhancement, restoration and filtering, Low-level vision Abstract: In the past decade, much progress has been made in image denoising due to the use of low-rank representation and sparse coding. In the meanwhile, state-of-the-art algorithms also rely on an iteration step to boost the denoising performance. However, the boosting step is fixed or non-adaptive. In this work, we perform rank-1 based fixed-point analysis, then, guided by our analysis, we develop the first adaptive boosting algorithm, whose convergence is guaranteed. Preliminary results on the same image dataset show that AB uniformly outperforms existing denoising algorithms on every image and at each noise level, with more gains at higher noise levels.

11:30-11:50, Paper TuAT3.4
Day/Night Unconstrained Image Dehazing
Santra, Sanchayan	Indian Statistical Inst
Chanda, Bhabatosh	Indian Statistical Inst
Keywords: Enhancement, restoration and filtering, Image and video analysis and understanding, Signal, image and video processing Abstract: Images taken under fog or haze have their visibility reduced due to the existence of aerosols in the atmosphere. Image dehazing methods try to recover haze-free versions of these images by removing the effect of haze. Methods proposed till now are exclusively for daytime scene images or for night-time scene. The method we propose here can dehaze an image independent of whether it was captured during the day or night. To achieve this we have relaxed the image formation model to take into account spatially varying atmospheric light that may be present in night-time images. We estimate the contributing airlights and also the patches that are affected by it. We then remove the effect of airlight to obtain haze-free image. We demonstrate the results obtained by using our method on day and night-time images. We have compared out results with that of recently reported methods and the results show the effectiveness of our method.

11:50-12:10, Paper TuAT3.5
Anchored Fusion for Image Restoration
Timofte, Radu	ETH Zurich
Keywords: Enhancement, restoration and filtering, Coding, compression and super-resolution Abstract: Recently, a series of advances were made for image restoration tasks such as image denoising and single image super-resolution. It is particularly remarkable that methods employing different formulations and assumptions achieve comparable top performances. Moreover, the top methods operate at their best on some particular image contents and poorer on other. No method is the best on all the image contents. The methods are complementary in both formulation and performance. We propose a locally adaptive fusion of results of such methods towards an improved restoration result. We work patch-wise and partition the patch space such that per each partition to train anchored regressors from the fused methods' output patches to the fusion target result. At test our anchored fusion method applies efficiently the anchored regressors corresponding to the input patches to be fused. Whilst having a low time complexity, we achieve significant improvements over the fused state-of-the-art methods on standard test images for both image denoising and super-resolution tasks (e.g. 0.1 - 0.5dB PSNR).

12:10-12:30, Paper TuAT3.6
Image Restoration and Segmentation Using the Ambrosio-Tortorelli Functional and Discrete Calculus
Foare, Marion	LAMA, Univ. Savoie Mont Blanc
Lachaud, Jacques-Olivier	Lab. of Mathematics (UMR CNRS 5127), Univ. Savoie Mon
Talbot, Hugues	Univ. Paris-Est - ESIEE
Keywords: Enhancement, restoration and filtering, Segmentation, features and descriptors, Image and video analysis and understanding Abstract: We are interested in some essential tasks of image processing and image analysis, particularly image segmentation, image restoration and image cartooning. For these tasks, the Mumford-Shah (MS) functional has emerged as a very powerful formulation, seeking to represent an image with a piecewise smooth approximation. This approximation can be viewed as a denoising and simplication process, and the frontiers of the smooth pieces can be interpreted as a segmentation. In practice, however, the MS functional is difficult to optimize due to the joint simplification and contour delineation that it requires. Despite that difficulty, MS has served as an inspiration to many very well known and widely used methods, such as the Rudin-Osher-Fatemi (ROF) restoration method, TV denoising, or the Chan-Vese and Boykov-Jolly segmentation methods, and many others. The original formulation of MS is in the continuous domain but purely discrete versions have also been proposed (Graph Cuts). After recalling the MS functional, we present a new digital formulation of the Ambrosio-Tortorelli (AT) functional, an accurate approximation of MS that converges toward it.


TuAT4	Maya T2.B
TuAMO4	Oral Session

10:30-10:50, Paper TuAT4.1
Computationally Efficient Template-Based Face Recognition
Wu, Yue	Information Sciences Inst
AbdAlmageed, Wael	Univ. of Southern California
Rawls, Stephen	ISI
Natarajan, Prem	ISI
Keywords: Face recognition Abstract: Classically, face recognition depends on computing the similarity (or distance) between a pair of face images and/or their respective representations, where each subject is represented by one image. Template-based face recognition was introduced by the release of IARPA's Janus Benchmark-A (IJB-A) dataset, in which each enrolled subject is represented by a group of one or more images, called a emph{template}. The group of images comprising a template might have been acquired using different head poses, illuminations, ages and facial expressions. Template images could come from still images or video frames. Therefore, measuring the similarity between templates representing two subjects significantly increases the number of pairwise image comparisons (i.e., O(NM), where N and M are the number of image templates being compared). As the number of enrolled subjects, K, increases, both computational and space requirements become computationally prohibitive. To address this challenge, we present a novel approximate nearest-neighbor (ANN) search-based solution. Given a query template, ANN methods are used to find similar face images. Retrieved images are used to construct a template pool that is used to find the correct identity of the query subject. The proposed approach largely reduces the number of imposter template-pair comparisons. Experimental results on the IJB-A dataset show that the proposed approach achieves significant speed-up and storage savings, without sacrificing accuracy.

10:50-11:10, Paper TuAT4.2
Face Verification with Three-Dimensional Point Cloud by Using Deep Belief Networks
Jhuang, Dong-Han	National Taipei Univ. Taiwan (R.O.C)
Lin, Daw-Tung	National Taipei Univ
Chi_hung, Tai	National Taipei Univ
Keywords: Face recognition, 2D/3D object detection and recognition, Artificial neural networks Abstract: Developing reliable and robust face verification systems has been a tough challenge in computer vision, for several decades. The variation in illumination and head pose may seriously inhibit the accuracy of two-dimensional face recognition. With the invention of a depth map sensor, more three-dimensional volume data can be processed to mitigate the problem associated with face verification. This paper presents a three-dimensional face verification approach that includes three phases. First, point cloud library is applied to estimate features such as normal vectors and principal curvatures of every point on a human face point cloud acquired from three-dimensional depth sensor. Next, we adopt deep belief networks to train the identification model using extracted features. Finally, face verification is accomplished by using the pre-trained deep belief networks to justify if new incoming face point cloud feature is the one we specified. The experimental results demonstrate that the proposed system performs exceptionally well with about 96.43% verification accuracy.

11:10-11:30, Paper TuAT4.3
Efficient Video Face Recognition by Using Fisher Vector Encoding of Binary Features
Martínez-Díaz, Yoanna	CENATAV
Chang, Leonardo	CENATAV
Hernández, Noslen	Pontifical Catholic Univ. of Rio De Janeiro (PUC-Rio)
Mendez-Vazquez, Heydi	Advanced Tech. Application Center
Sucar, Luis Enrique	INAOE
Keywords: Face recognition, Biometric systems and applications Abstract: One of the main problems of recognizing faces in videos is to achieve accurate algorithms which can be used in real-time applications. Recently, Fisher Vector representation of local descriptors (e.g., SIFT) has gained widespread popularity, achieving good recognition rates. In this work, we propose to use Fisher Vector encoding of binary features for video face recognition, in order to speed up the computation time of the representation. The experimental evaluation was conducted on the challenging YouTube Faces database, showing that the proposed method is very efficient, and has an accuracy comparable with state-of-the-art methods.

11:30-11:50, Paper TuAT4.4
Learning Face Recognition from Limited Training Data Using Deep Neural Networks
Peng, Xi	Rutgers Univ
Ratha, Nalini	IBM Res
Pankanti, Sharath	IBM Res
Keywords: Face recognition, Deep learning, Signal, image and video processing Abstract: Often deep learning methods are associated with huge amounts of training data. The deeper the network gets, the larger is the need for training data. Large amount of labeled data helps the network learn about the variations it needs to handle in the prediction stage. It is not easy for every one to get access to huge amounts of labeled data leaving a few to have the luxury to design very deep networks. In this paper, we propose to flatten the disparity by using the modeling methods to minimize the need for huge amounts of data for training a deep network. Using face recognition as an example, we demonstrate how limited labeled data can be leveraged to obtain near state of the art performance with generalization capability across multiple databases. In addition, we show that the normalization in the overall network can improve the speed and resource requirement for the prediction/inferencing stage.

11:50-12:10, Paper TuAT4.5
Template Regularized Sparse Coding for Face Verification
Xu, Hongyu	Univ. of Maryland
Zheng, Jingjing	Univ. of Maryland, Coll. Park
Alavi, Azadeh	Univ. of Maryland
Chellappa, Rama	Univ. of Maryland
Keywords: Face recognition, Transfer learning Abstract: In this paper, we propose a novel regularized sparse coding approach for template-based unconstrained face verification. Unlike traditional verification tasks, which require the evaluation on image-to-image or video-to-video pairs, template-based face verification/recognition methods can exploit training and/or gallery data containing a mixture of both images or videos from the person of interest. The proposed regularized sparse coding approach addresses the adaptation to training and gallery data using three steps. First, we construct a reference dictionary, which represents the training set. Then we learn the discriminative sparse codes of the templates for verification through the proposed template regularized sparse coding approach. Finally, we measure the similarity between templates. An efficient algorithm is employed to learn the template regularized sparse codes. Extensive experiments on the template-based verification benchmark dataset show that the proposed approach outperforms several state-of-the-art methods.

12:10-12:30, Paper TuAT4.6
Compact Multi-Scale Periocular Recognition Using SAFE Features
Alonso-Fernandez, Fernando	Halmstad Univ
Mikaelyan, Anna	Halmstad Univ
Bigun, Josef	Halmstad Univ
Keywords: Other Biometric applications, Face recognition, Iris recognition Abstract: In this paper, we present a new approach for periocular recognition based on the Symmetry Assessment by Feature Expansion (SAFE) descriptor, which encodes the presence of various symmetric curve families around image key points. We use the sclera center as single key point for feature extraction, highlighting the object-like identity properties that concentrates to this unique point of the eye. As it is demonstrated, such discriminative properties can be encoded with a reduced set of symmetric curves. Experiments are done with a database of periocular images captured with a digital camera. We test our system against reference periocular features, achieving top performance with a considerably smaller feature vector (given by the use of a single key point). All the systems tested also show a nearly steady correlation between acquisition distance and performance, and they are also able to cope well when enrolment and test images are not captured at the same distance. Fusion experiments among the available systems are also provided.


TuAT5	Maya T2.C
TuAMO5	Oral Session

10:30-10:50, Paper TuAT5.1
Ultrasound Image Analysis for Myopathy Detection: Relating Muscle Image Biomarkers to Severity of Disease
Billings, Seth	Johns Hopkins Univ. Applied Physics Lab
Albayda, Myma	Johns Hopkins School of Medicine
Burlina, Philippe	Johns Hopkins Univ. Applied Physics Lab
Keywords: Computer-aided detection and diagnosis, Biological image and signal analysis, Representation and analysis in pixel/voxel images Abstract: This study focuses on using ultrasound (US) biomarkers for characterizing myopathies and in particular myositis. US offers an opportunity to deliver diagnostics in clinical settings at a fraction of the cost and discomfort entailed in current workflows. US is also better suited for usage in under-resourced environments. This paper is focused on studying the link between biomarkers related to absolute and relative echo intensity of muscle tissue and the presence and severity of myositis disease. We show that there is good correlation between these biomarkers and the severity of muscle disease rated by the Heckmatt criteria. A moderate correlation is also found between these biomarkers and muscles categorized by healthy vs. diseased status of each patient. Experimental data involving 37 patients (9 polymyositis, 3 dermatomyositis, 9 inclusion body myositis, and 16 healthy patients) and seven muscle groups show correlations up to 0.91.

10:50-11:10, Paper TuAT5.2
Hybrid Deep Learning for Reflectance Confocal Microscopy Skin Images
Kaur, Parneet	Rutgers Univ
Dana, Kristin	Rutgers Univ
Cula, Gabriela Oana	Johnson and Johnson
Mack, Catherine	Johnson and Johnson
Keywords: Medical image and signal analysis, Biological image and signal analysis Abstract: Reflectance Confocal Microscopy (RCM) is used for evaluation of human skin disorders and the effects of skin treatments by imaging the skin layers at different depths. Traditionally, clinical experts manually categorize the images captured into different skin layers. This time-consuming labeling task impedes the convenient analysis of skin image datasets. In recent automated image recognition tasks, deep learning with convolutional neural nets (CNN) has achieved remarkable results. However in many clinical settings, training data is often limited and insufficient for CNN training. For recognition of RCM skin images, we demonstrate that a CNN trained on a moderate size dataset leads to low accuracy. We introduce a hybrid deep learning approach which uses traditional texton-based feature vectors as input to train a deep neural network. This hybrid method uses fixed filters in the input layer instead of tuned filters, yet superior performance is achieved. Our dataset consists of 1500 images from 15 RCM stacks belonging to six different categories of skin layers. We show that our hybrid deep learning approach performs with a test accuracy of 82% compared with 51% for CNN. We also compare the results with additional proposed methods for RCM image recognition and show improved accuracy.

11:10-11:30, Paper TuAT5.3
Breast Segmentation Using Fuzzy C-Means and Anatomical Priors in DCE-MRI
Marrone, Stefano	Univ. of Naples Federico II
Piantadosi, Gabriele	Federico II Di Napoli
Fusco, Roberta	Univ. of Naples Federico II
Petrillo, Antonella	National Cancer Inst. of Naples ‘Fondazione Pascale’
Sansone, Mario	Univ. of Naples Federico II
Sansone, Carlo	Univ. of Naples Federico II
Keywords: Medical image and signal analysis, Computer-aided detection and diagnosis Abstract: Dynamic Contrast Enhanced - Magnetic Resonance Imaging (DCE-MRI) is gaining popularity as complementary diagnostic tool for breast cancer. In a typical Computer Aided Detection (CAD) processing, the identification and segmentation of the breast parenchyma is a crucial stage aimed to reduce computational effort and increase reliability, by reducing the number of voxels to analyse and removing foreign tissues and air. The aim of this work is to propose a fully-automated geometrical- based breast-mask extraction method in DCE-MRI, that combines three 2D Fuzzy C-Means clustering and geometrical breast anatomy characterization. In particular, seven well defined key- points have been considered in order to accurately segment breast parenchyma from air and chest-wall. The proposed approach has been validated on 30 DCE-MRI studies. The median segmentation accuracy and Dice similarity index were 97.86 (±0.49) and 92.66 (±1.48) respectively with p < 0.05, and 100% of neoplastic lesion coverage.

11:30-11:50, Paper TuAT5.4
Severity Grading of Psoriatic Plaques Using Deep CNN Based Multi-Task Learning
Pal, Anabik	Indian Statistical Inst
Chaturvedi, Akshay	Indian Statistical Inst
Garain, Utpal	Indian Statistical Inst
Chandra, Aditi	Indian Statistical Inst
Chatterjee, Raghunath	Indian Statistical Unit
Keywords: Computer-aided detection and diagnosis, Medical image and signal analysis, Classification and clustering Abstract: This paper addresses the problem of automatic machine analysis based severity scoring of psoriasis skin disease. Three different disease parameters namely, erythema, scaling and induration are considered for such severity grading. Given an image containing a psoriatic plaque the task is to predict severity scores for all the three parameters. This paper presents a novel deep CNN based architecture for achieving the task. Apart from viewing this task as three different single task learning (STL) problems (i.e. three different classification problems), a new multi-task learning (MTL) is also presented where the three classification tasks are treated as interdependent and thereby the neural net is trained accordingly. A new annotated data set consisting of seven hundred and seven (707) images has been constructed on which the performance of the severity scoring algorithms have been reported. Several competing baselines are considered to compare the performance of STL and MTL approaches. Experimental result shows that the deep CNN based architectures (both the STL and MTL) achieve promising performances, MTL producing slightly superior results to that of STL.

11:50-12:10, Paper TuAT5.5
Automatic Cutting Plane Identification for Computer-Aided Analysis of Intracranial Aneurysms
Jerman, Tim	Univ. of Ljubljana
Likar, Bostjan	Univ. of Ljubljana
Pernus, Franjo	Univ. of Ljubljana
Spiclin, Ziga	Univ. of Ljubljana
Chien, Aichi	Univ. of California, Los Angeles Medical School
Keywords: Computer-aided detection and diagnosis, Medical image and signal analysis, Modeling, simulation and visualization Abstract: Clinical studies have established the importance of morphologic measurements of intracranial aneurysm size, neck width, aspect ratio and other shape indices for the assessment of the risk of rupture and selection of the best treatment option. Obtaining these morphologic measurements requires segmentation of vascular structures in an angiographic image, reconstruction of a 3D vascular surface mesh, and isolation of the aneurysm surface mesh from parent vessels. High variability of aneurysm shapes appearing at various anatomical locations renders isolation of the aneurysm a critical step, which, if performed poorly, may adversely impact the morphologic measurements and possibly corrupt important information used for clinical assessment of the aneurysm. Previous quantitative validation of isolation methods showed that manual cutting plane based isolation is more accurate than isolation by state-of-the-art automatic methods. In this paper, we propose and quantitatively validate a novel method for automated cutting plane identification and aneurysm isolation based solely on analysis of the vascular surface mesh.

12:10-12:30, Paper TuAT5.6
Evaluation of Feature Descriptors for Cancerous Tissue Recognition
Stanitsas, Panagiotis	Univ. of Minnesota
Cherian, Anoop	Australian National Univ
Li, Xinyan	Univ. of Minnesota
Truskinovsky, Alexandeer	Roswell Park Cancer Inst
Morellas, Vassilios	Univ. of Minnesota
Papanikolopoulos, Nikos	Univ. of Minnesota
Keywords: Computer-aided detection and diagnosis, Medical image and signal analysis, Segmentation, features and descriptors Abstract: Computer-Aided Diagnosis (CAD) has witnessed a rapid growth over the past decade, providing a variety of automated tools for the analysis of medical images. In surgical pathology, such tools enhance the diagnosing capabilities of pathologists by allowing them to review and diagnose a larger number of cases daily. Geared towards developing such tools, the main goal of this paper is to identify useful computer vision based feature descriptors for recognizing cancerous tissues in histopathologic images. To this end, we use images of Hematoxylin & Eosin-stained microscopic sections of breast and prostate carcinomas, and myometrial leiomyosarcomas, and provide an exhaustive evaluation of several state of the art feature representations for this task. Among the various image descriptors that we chose to compare, including representations based on convolutional neural networks, Fisher vectors, and sparse codes, we found that working with covariance based descriptors shows superior performance on all three types of cancer considered. While covariance descriptors are known to be effective for texture recognition, it is the first time that they are demonstrated to be useful for the proposed task and evaluated against deep learning models. Our experiments show that using covariance based descriptors lead to 92.83%, 91.51%, and 98.10% classification accuracy for the recognition of breast carcinomas, prostate carcinomas, and myometrial leiomyosarcomas, respectively.


Tu2PL	G.Cancun T1
Ricardo Baeza-Yates	Plenary Session

14:00-15:00, Paper Tu2PL.1
Visual Congruent Ads for Image Search (I)
Baeza-Yates, Ricardo	UPF
Keywords: Image and video analysis and understanding Abstract: The quality of user experience online is affected by the relevance and placement of advertisements. We propose a new system for selecting and displaying visual advertisements in image search result sets. Our method compares the visual similarity of candidate ads to the image search results and selects the most visually similar ad to be displayed. The method further selects an appropriate location in the displayed image grid to minimize the perceptual visual differences between the ad and its neighbors. We conduct an experiment with about 900 users and find that our proposed method provides significant improvement in the users' overall satisfaction with the image search experience, without diminishing the users' ability to see the ad or recall the advertised brand.


TuPT1	Poster Session Hall
TuP1	Poster Session

15:00-17:10, Paper TuPT1.1
Reducing the Computational Cost of Shape Matching with the Distance Set
Iwata, Kazunori	Hiroshima City Univ
Keywords: Statistical, syntactic and structural pattern recognition Abstract: The distance set is known to be a versatile local descriptor of shape. As this is simply a set of ordinary distances between sample points on a shape, it is easy to construct and use. More importantly, it remains invariant under many settings and deformations, unlike other typical descriptors. However, in shape matching with distance sets, there is a tradeoff between performance and computational feasibility. In this paper, we present a new descriptor by improving the choice and order of elements in the distance set. We show that our descriptor is more efficient for shape matching from the viewpoint of computer algorithms. Additionally, we demonstrate that, although our descriptor runs more quickly in practice, it is equivalent to the original distance set in terms of shape retrieval.

15:00-17:10, Paper TuPT1.2
Convex Optimization Approach for Multi-Label Feature Selection Based on Mutual Information
Lim, Hyunki	Chung-Ang Univ
Kim, Dae-Won	Chung-Ang Univ
Keywords: Classification and clustering, Dimensionality reduction and manifold learning Abstract: We propose a convex optimization approach for multi-label feature selection. The effective feature subset can be obtained through finding a global optima of a convex objective function for multi-label feature selection. However conventional greedy approaches are prone to suboptimal result. In this paper, the mathematical procedures and considerations for the optimization approach are presented for multi-label feature selection based on mutual information. We compared the proposed method with conventional greedy search based methods to show the potential of optimization based multi-label feature selection.

15:00-17:10, Paper TuPT1.3
Semi-Supervised Image Labelling Using Barycentric Graph Embeddings
Robles-Kelly, Antonio	NICTA
Wei, Ran	NICTA
Keywords: Statistical, syntactic and structural pattern recognition, Classification and clustering Abstract: Here, we turn our attention to barycentric embeddings and examine their utility for semi-supervised image labelling tasks. To this end, we view the pixels in the image as vertices in a graph and their pairwise affinities as weights of the edges between them. Abstracted in this manner, we can pose the semi-supervised labelling problem into a graph theoretic setting where the labels are assigned based upon the distance in the embedding space between the nodes corresponding to the unlabelled pixels and those whose labels are in hand. We do this using a barycentric embedding approach which naturally leads to a setting in which the embedding coordinates can be computed by solving a system of linear equations. Moreover, the method presented here can incorporate side information such as that delivered by colour priors used elsewhere in the literature for semi-supervised colour image labelling. We illustrate the utility of our method for both, colour and hyperspectal image labelling and compare our results against other techniques elsewhere in literature.

15:00-17:10, Paper TuPT1.4
Measuring Regularity of Network Patterns by Grid Approximations Using the LLL Algorithm
Hajdu, Andras	Univ. of Debrecen, Hungary
Harangi, Balazs	Univ. of Debrecen
Besenczi, Renátó	Univ. of Debrecen
Lazar, Istvan	Univ. of Debrecen
Emri, Gabriella	Univ. of Debrecen
Hajdu, Lajos	Univ. of Debrecen
Tijdeman, Robert	Leiden Univ
Keywords: Statistical, syntactic and structural pattern recognition, Medical image and signal analysis, Computer-aided detection and diagnosis Abstract: In a recent work, we have proposed a novel way to approximate point sets with grids using the LLL algorithm, which operates in polynomial time. Now, we show how this approach can be applied to pattern recognition purposes with interpreting the rate of approximation as a new feature for regularity measurement. Our practical problem is the characterization of pigment networks in skin lesions. For this task we also introduce a novel image processing method for the extraction of the pigment network. Then, we show how our grid approximation framework can be applied with specializing it for the recognition of hexagonal patterns. The classification performance of our approach for the pigment network characterization problem is measured on a database annotated by a clinical expert. Throughout the paper we address several practical issues that may help to apply our general framework to other practical tasks, as well.

15:00-17:10, Paper TuPT1.5
Distance-Preserving Vector Space Embedding for the Closest String Problem
Nienkötter, Andreas	Univ. of Münster
Jiang, Xiaoyi	Univ. of Münster
Keywords: Statistical, syntactic and structural pattern recognition, Pattern Recognition for Bioinformatics Abstract: The closest string problem is a core problem in computational biology with applications in other fields like coding theory. Many algorithms exist to solve this problem, but due to its inherent high computational complexity (typically NP-hard), it can only be solved efficiently by restricting the search space to a specific range of parameters. Often, the run-time of these algorithms is exponential in the maximum distance between strings, restricting these solutions to very small distances. Recently, a prototype embedding method has been proposed to solve the similar generalized median problem for arbitrary objects. In this approach, objects are transformed into vector space using prototype embedding. The problem is solved in vector space and afterwards inversely transformed back into original space. This method has been successfully applied to generalized median computation in several domains where the computational complexity is inherently high. In this work, we apply prototype embedding to the closest string problem. We show that different embedding methods can result in a very good and fast approximation of the closest string, independent of the maximum distance and other parameters.

15:00-17:10, Paper TuPT1.6
Quantum Thermodynamics of Time Evolving Networks
Minello, Giorgia	Univ. CA' FOSCARI DI VENEZIA
Torsello, Andrea	Univ. Ca' Foscari Venezia
Hancock, Edwin	Univ. of York
Keywords: Statistical, syntactic and structural pattern recognition, Reinforcement learning and temporal models Abstract: In this paper, we present a novel thermodynamic framework for graphs that can be used to analyze time evolving networks, relating the thermodynamics variables to macroscopic changes in network topology, and linking major structural transition to phase changes in the thermodynamic picture. We start from a recent quantum-mechanical characterization of the structure of a network relating the graph Laplacian to a density operator and resulting in a characterization of the network's entropy. Then we adopt a Schrödinger picture of the dynamics of the network, resulting in an estimation of a hidden time-varying Hamiltonian from the data, from which we derive a measure of Energy exchange. From these variables, using the thermodynamic identity, we obtain temperature under the assumption of constant volume of the system. Evaluation of real-world data shows that the thermodynamic variables thus extracted are effective in detecting critical events occurring during network evolution.

15:00-17:10, Paper TuPT1.7
Taking into Account Stereoisomerism in the Prediction of Molecular Properties
Grenier, Pierre-Anthony	Univ. De Caen, CNRS UMR 6072, GREYC, ENSICAEN
Brun, Luc	ENSICAEN
Villemin, Didier	Cnrs Umr 6507 Lcmt, Ensicaen
Keywords: Statistical, syntactic and structural pattern recognition, Support vector machines and kernel methods, Pattern Recognition for Bioinformatics Abstract: The prediction of molecule's properties through Quantitative Structure Activity (resp. Property) Relationships are two active research fields named QSAR and QSPR. Within these frameworks Graph kernels allow to combine a natural encoding of a molecule by a graph with classical statistical tools such as SVM or kernel ridge regression. Unfortunately some molecules encoded by a same graph and differing only by the three dimensional orientation of their atoms in space have different properties. Such molecules are called stereoisomers. These latter properties can not be predicted by usual graph methods which do not encode stereoisomerism. In a previous paper, we proposed to encode the stereoisomerism property of each atom by a local subgraph, called minimal stereo subgraph, and we designed a kernel based on the comparison of bags of such subgraphs. However, the encoding of a molecule by a bag of subgraphs induces an important loss of information. In this paper, we propose a new kernel based both on the spatial relationships between minimal stereo subgraphs and the local neighbourhood of each minimal stereo subgraph within its supergraph. Our experiments show the benefits of taking into account such information.

15:00-17:10, Paper TuPT1.8
Similarity Measures for Title Matching
Gali, Najlah	Univ. of Eastern Finland
Mariescu-Istodor, Radu	Univ. of Eastern Finland
Fränti, Pasi	Univ. of Eastern Finland
Keywords: Statistical, syntactic and structural pattern recognition, Symbolic learning, Classification and clustering Abstract: In many web applications, users query a place name, a photo name, and other entity names using search words that include alternate spellings, abbreviations, and variants that are similar, but not identical to the title associated with the desired entity. Given two titles, an effective similarity measure should be able to determine whether the titles represent the same entity or not. In this paper, we evaluate 21 measures with the aim at detecting the most appropriate measure for matching the titles. Results show that Soft-TFIDF performs the best.

15:00-17:10, Paper TuPT1.9
Fast Kernel SVM Training Via Support Vector Identification
Mao, Xue	National Lab. of Pattern Recognition, Inst. of Automat
Fu, Zhouyu	Monash Univ
Wu, Ou	Inst. of Automation, CAS
Hu, Weiming	National Lab. of Pattern Recognition, Inst
Keywords: Support vector machines and kernel methods, Classification and clustering Abstract: Training kernel SVM on large datasets suffers from high computational complexity and requires a large amount of memory. However, a desirable property of SVM is that its decision function is solely determined by the support vectors, a subset of training examples with non-vanishing weights. This motivates a novel efficient algorithm for training kernel SVM via support vector identification. The efficient training algorithm involves two steps. In the first step, we randomly sample the training data without replacement several times, each time a small subset of training data is sampled. Then a kernel SVM is trained on each subset, and the resulting kernel SVM models are used to identify the support vectors on the margin. In the second step, an optimization problem is solved to estimate the Lagrange multipliers corresponding to these support vectors. After obtaining the support vectors and Lagrange multipliers, we can approximate the decision function of kernel SVM. Due to the cubic complexity of standard kernel SVM training algorithm, training many kernel SVMs on small subsets of training data is much more efficient than training a single kernel SVM on the whole training data especially for large datasets. Therefore, our algorithm has better scalability than kernel SVM. Besides, training SVMs on each subset can be done independently, and hence our algorithm can be easily parallelized for further speedup. Since our algorithm only identifies the support vectors on the margin, it produces less number of support vectors as compared to that produced by standard kernel SVM. This makes our algorithm more efficient in prediction too. Experimental results show that our method outperforms state-of-the-art methods and achieves performance on par with the kernel SVM albeit with much improved efficiency.

15:00-17:10, Paper TuPT1.10
Adapting Instance Weights for Unsupervised Domain Adaptation Using Quadratic Mutual Information and Subspace Learning
Khan, Mohammad Nazmul Alam	Oklahoma State Univ
Heisterkamp, Douglas	Oklahoma State Univ
Keywords: Transfer learning, Classification and clustering, 2D/3D object detection and recognition Abstract: Domain adaptation (DA) algorithms utilize a label-rich old dataset (domain) to build a machine learning model (classification, detection etc.) in a label-scarce new dataset with different data distribution. Recent approaches transform cross-domain data into a shared subspace by minimizing the shift between their marginal distributions. In this paper, we propose a novel iterative method to learn a common subspace based on non-parametric quadratic mutual information (QMI) between data and corresponding class labels. We extend a prior work of discriminative subspace learning based on maximization of QMI and integrate instance weighting into the QMI formulation. We propose an adaptive weighting model to identify relevant samples that share underlying similarity across domains and ignore irrelevant ones. Due to difficulty of applying cross-validation, an alternative strategy is integrated with the proposed algorithm to setup model parameters. A set of comprehensive experiments on benchmark datasets is conducted to prove the efficacy of our proposed framework over state-of-the-art approaches.

15:00-17:10, Paper TuPT1.11
A Simple Approach for Unsupervised Domain Adaptation
Guo, Xifeng	National Univ. of Defense Tech
Chen, Wei	National Univ. of Defense Tech
Yin, Jianping	National Univ. of Defense Tech
Keywords: Transfer learning, Machine learning and data mining, Support vector machines and kernel methods Abstract: Domain adaptation (DA) aims to eliminate the difference between the distribution of labeled source domain on which a classifier is trained and that of unlabeled or partly labeled target domain to which the classifier is to be applied. Compared with the semi-supervised domain adaptation where some labeled data from target domain is utilized to help train the classifier, the unsupervised domain adaptation where no labels can be seen from the target domain is without doubt more challenging. Most published approaches suffer from high complexity of design or implementation. In this paper, we propose a simple method for unsupervised domain adaptation which minimizes domain shift by projecting each instance from source and target domains into a common feature space using a linear kernel function. Our method is extremely simple without hyper-parameters (it can be implemented in two lines of Matlab code) but still outperforms the state-of-the-art domain adaptation approaches on standard benchmark datasets.

15:00-17:10, Paper TuPT1.12
Multi-Task Learning for One Class Svm with Additional New Features
Xue, Yongjian	Univ. of Tech. of Troyes
Beauseroy, Pierre	Inst. Charles Delaunay
Keywords: Transfer learning, Support vector machines and kernel methods, Machine learning and data mining Abstract: In real applications of one class classification, new features may be added due to some practical or technical reason. While lacking of representative samples for the new features, multi-task learning idea could be used to bring some information from the former learning model. Based on the above assumption, a new multi-task learning approach is proposed to deal with the training of the updated system when adding new measurements. In the model, a parameter is introduced to control the information needed from the former model and an heuristic search method is also established to get a corresponding proper value. Experiments conducted on toy data and real data set show that the new method could decrease the probability of false positive rapidly, while keeping the probability of false negative approximately stable as the number of samples for new introduced features increases.

15:00-17:10, Paper TuPT1.13
Semantic-Free Attributes for Image Classification
Oliveau, Quentin	Télécom ParisTech
Sahbi, Hichem	CNRS, TELECOM ParisTech
Keywords: Classification and clustering Abstract: Attributes are defined as mid-level image characteristics shared among different categories. These characteristics are suitable in order to handle classification problems especially when training data are scarce. In this paper, we design discriminative real-valued attributes by learning nonlinear inductive maps. Our method is based on solving a constrained optimization problem that mixes three criteria; the first one aims to predict real-valued attributes with a high precision while the second criterion maximizes their discrimination power. We also consider a particular smoothness term that provides gradual representations for data belonging to the same categories, resulting into highly discriminant and predictable attributes. Experiments conducted for the particular task of fine-grained image classification - with relatively small training sets - show that our attribute design approach is very competitive and outperforms related attribute learning methods.

15:00-17:10, Paper TuPT1.14
Co-Regularized Kernel K-Means for Multi-View Clustering
Ye, Yongkai	National Univ. of Defense Tech
Liu, Xinwang	Computer School, National Univ. of Defense Tech
Yin, Jianping	National Univ. of Defense Tech
Zhu, En	National Univ. of Defense Tech
Keywords: Classification and clustering Abstract: In clustering applications, multiple views of the data are often available. Although clustering could be done within each view independently, exploiting information across views is promising to gain clustering accuracy improvement. A common assumption in the field of multi-view learning is that the clustering results from multiple views should be consistent with a latent clustering. However, the potential noise among some views would make this assumption difficult to be satisfied, which finally hurts the clustering performance. To address this issue, we propose a novel clustering algorithm where the intrinsic clustering is found by maximizing the sum of weighted similarities between clusterings of different views. Weights that indicate the qualities of views are learned simultaneously along with the latent clustering and clusterings of different views. A three-step alternative algorithm is designed to solve the problem efficiently. Empirical comparisons with a number of baselines on various datasets confirm the efficacy of our approach.

15:00-17:10, Paper TuPT1.15
Latent Model Ensemble with Auto-Localization
Sun, Miao	Univ. of Missouri
Han, Tony	Univ. of Missouri
Xun, Xu	Sony Electronics
Keywords: Classification and clustering, Deep learning Abstract: Deep Convolutional Neural Networks (CNN) have exhibited superior performance in many visual recognition tasks including image classification, object detection, and scene labeling, due to their large learning capacity and resistance to overfit. For the image classification task, most of the current deep CNN-based approaches take the whole size-normalized image as input and have achieved quite promising results. Compared with the previously dominating approaches based on feature extraction, pooling, and classification, the deep CNN-based approaches mainly rely on the learning capability of deep CNN to achieve superior results: the burden of minimizing intra-class variation while maximizing inter-class difference is entirely dependent on the implicit feature learning component of deep CNN; we rely upon the implicitly learned filters and pooling component to select the discriminative regions, which correspond to the activated neurons. However, if the irrelevant regions constitute a large portion of the image of interest, the classification performance of the deep CNN, which takes the whole image as input, can be heavily affected. To solve this issue, we propose a novel latent CNN framework, which treats the most discriminate region as a latent variable. We can jointly learn the global CNN with the latent CNN to avoid the aforementioned big irrelevant region issue, and our experimental results show the evident advantage of the proposed latent CNN over traditional deep CNN: latent CNN outperforms the state-of-the-art performance of deep CNN on standard benchmark datasets including the CIFAR-10, CIFAR-100, MNIST and PASCAL VOC 2007 Classification dataset.

15:00-17:10, Paper TuPT1.16
Discovering Characteristic Landmarks on Ancient Coins Using Convolutional Networks
Kim, Jongpil	Rutgers, the State Univ. of New Jersey
Pavlovic, Vladimir	Rutgers Univ
Keywords: Classification and clustering, Deep learning Abstract: We propose a novel method to find characteristic landmarks and recognize ancient Roman imperial coins using deep convolutional neural networks (CNNs) combined with expert-designed domain hierarchies. We first propose a new framework to recognize the Roman coin which exploits the hierarchical knowledge structure embedded in the coin domain, which we combine with the CNN-based category classifiers. We next formulate an optimization problem to discover class-specific salient coin regions. Analysis of discovered salient regions confirms that they are largely consistent with human expert annotations. Experimental results show that the proposed framework is able to effectively recognize the ancient Roman coins as well as successfully identify landmarks in a general fine-grained classification problem. For this research, we have collected a new Roman coin dataset where all coins are annotated and consist of obverse (head) and reverse (tail) images.

15:00-17:10, Paper TuPT1.17
PLSNet: A Simple Network Using Partial Least Squares Regression for Image Classification
Hasegawa, Ryoma	Meijo Univ
Hotta, Kazuhiro	Meijo Univ
Keywords: Classification and clustering, Deep learning Abstract: PCANet is a simple network using Principal Component Analysis (PCA) for image classification and obtained high accuracies on a variety of datasets. PCA projects explanatory variables on a subspace that the first component has the largest variance. On the other hand, Partial Least Squares (PLS) regression projects explanatory variables on a subspace that the first component has the largest covariance between explanatory and objective variables, and the objective variables are predicted from the subspace. If class labels are used as objective variables for PLS, the subspace is suitable for classification. Stacked PLS is a simple network using PLS for image classification and obtained high accuracy on the MNIST database. However, the performance of Stacked PLS was inferior to PCANet on the others. One of differences between Stacked PLS and PCANet is network architecture. In this paper, we combine the network architecture of PCANet with PLS and propose a new image classification method called PLSNet. It obtained higher accuracies than PCANet on the MNIST and the CIFAR-10 datasets. Furthermore, we change how to make filters for extracting features at the second convolution layer, and we call it Improved PLSNet. It obtained higher accuracies than PLSNet. In addition, we give it deeper network architecture, and we call it Deep Improved PLSNet. It obtained higher accuracies than Improved PLSNet.

15:00-17:10, Paper TuPT1.18
Simultaneous Dual-Views Reconstruction with Adaptive Dictionary and Low-Rank Representation
Yi, Shuangyan	Harbin Inst. of Tech. Shenzhen Graduate School
He, Zhenyu	Harbin Inst. of Tech. Shenzhen Graduate School
Li, Yi	Harbin Inst. of Tech. Shenzhen Graduate School
Cheung, Yiu-ming	Hong Kong Baptist Univ
Chen, Wen-Sheng	Shenzhen Univ
Keywords: Classification and clustering, Dimensionality reduction and manifold learning Abstract: Low-Rank Representation (LRR) is an effective self-expressiveness method, which uses the observed data itself as the dictionary to reconstruct the original data. LRR focuses on representing the global low-dimensional information, but ignores the real fact that data often resides on low-dimensional manifolds embedded in a high-dimensional data. Therefore, LRR can not capture the non-linear geometric structures within data. As well known, locality preserving projections (LPP) is able to preserve the intrinsic geometry structure embedded in high-dimensional data. To this end, we treat the projected data by LPP as an adaptive dictionary, and such a dictionary can capture the intrinsic geometry structures of data. In this way, our method is in favor of the global low-rank representation. Specifically, the proposed method provides a way to reconstruct the original data from two views, and hence we call this proposed method as Simultaneous Dual-Views Reconstruction with Adaptive Dictionary and Low-Rank Representation. The proposed method can be used for unsupervised feature extraction and subspace clustering. Experiments on benchmark databases show the excellent performance of this proposed method in comparison with other state-of-the-art methods.

15:00-17:10, Paper TuPT1.19
Multi-Label Classification with Meta-Label-Specific Features
Sun, Lu	Hokkaido Univ
Kudo, Mineichi	Hokkaido Univ
Kimura, Keigo	Hokkaido Univ
Keywords: Classification and clustering, Dimensionality reduction and manifold learning, Machine learning and data mining Abstract: Multi-label classification has attracted many attentions in various fields, such as text categorization and semantic image annotation. Aiming to classify an instance into multiple labels, various multi-label classification methods have been proposed. However, the existing methods typically build models in the identical feature (sub)space for all labels, possibly inconsistent with real-world problems. In this paper, we develop a novel method based on the assumption that meta-labels with specific features exist in the scenario of multi-label classification. The proposed method consists of meta-label learning and specific feature selection. Experiments on twelve benchmark multi-label datasets show the efficiency of the proposed method compared with several state-of-the-art methods.

15:00-17:10, Paper TuPT1.20
Bus Trajectory Identification by Map-Matching
Raymond, Rudy	IBM Res. - Tokyo
Imamichi, Takashi	IBM Brazil
Keywords: Classification and clustering, Dimensionality reduction and manifold learning, Machine learning and data mining Abstract: We study the problem of identifying vehicle trajectories from the sequences of noisy geospatial-temporal datasets. Nowadays we witness the accumulation of vehicle trajectory datasets in the form of the sequences of GPS points. However, in many cases the sequences of GPS points are sparse and noisy so that identifying the actual trajectories of vehicles is hard. Although there are many advanced map-matching techniques claiming to achieve high accuracy to deal with the problem, only few public datasets that come with ground truth trajectories for supporting the claims. On the other hand, some cities are releasing their bus datasets for real-time monitoring and analytics. Since buses are expected to run on predefined routes, such datasets are highly valuable for map-matching and other pattern recognition applications. Nevertheless, some buses in reality appear not following their predefined routes and behave anomalously. We propose a simple and robust technique based on the combination of map-matching, bag-of-roads, and dimensionality reduction for their route identification. Experiments on datasets of buses in the city of Rio de Janeiro confirm the high accuracy of our method.

15:00-17:10, Paper TuPT1.21
Two-Dimensional PCA Hashing
Mao, Minqi	Zhejiang Normal Univ
Zheng, Zl	UCM
Chen, Zy	ZJNU
Ye, Rh	ZJNU
He, Xw	ZJNU
Keywords: Classification and clustering, Dimensionality reduction and manifold learning, Machine learning and data mining Abstract: Recently, hashing algorithm catches amounts of sight in the field of machine learning. Most existing hashing methods directly utilize a vector, which can be piped by an image matrix, as unit and adopt some projection functions to project the original data into several dimensions of real values. Then each of these projected real values is quantized or hashed into zero-one bit by thresholding, such as locality sensitive hashing (LSH) and principal component analysis hashing (PCAH). However, the plain elongation of image may cause the curse of dimensionality. In this paper, different with PCAH method, a two dimensional (2D) image is directly used to feature extraction by two-dimensional principal component analysis (2DPCA) and 2DPCAH performs hashing using these 2DPCA-projected data. Furthermore, starting with 2DPCA-projected data, we apply iterative quantization technology, which aims to finding a rotation matrix data so as to minimize the quantization error of mapping these data to a zero-centered binary hypercube. The experimental results indicate that 2DPCAH, 2DPCA-RR and 2DPCA-ITQ are competitive compared with traditional PCAH and some other classic methods.

15:00-17:10, Paper TuPT1.22
Sampling Based Approximate Spectral Clustering Ensemble for Partitioning Datasets
Moazzen, Yaser	Istanbul Tech. Univ
Taşdemir, Kadim	Antalya International Univ
Keywords: Classification and clustering, Semi-supervised learning and spectral methods Abstract: Spectral clustering is able to extract clusters with various characteristics without a parametric model, however it is infeasible for large datasets due to its high computational cost and memory requirement. Approximate spectral clustering (ASC) addresses this challenge by a representative-based partitioning approach which first finds a set of data representatives either by sampling or quantization, then applies spectral clustering on them. To achieve an optimal partitioning with ASC, several sampling or quantization methods together with advanced similarity criteria have been recently proposed. While quantization is more accurate than sampling in expense of heavy computation, and geodesic based hybrid similarity criteria are often more informative than others, there is no unique solution optimum for all datasets. Alternatively, we propose to use ensemble learning to produce a consensus partitioning constructed from different set of representatives and similarity criteria. The proposed ensemble (SASCE) not only produces a relatively more accurate partitioning but also eliminates the need to determine the best pair (the optimum set of representatives and the optimum similarity). Thanks to the efficient similarity definition on the representative level, the SASCE can be powerful for clustering small and medium datasets, outperforming traditional clustering approaches and their ensembles.

15:00-17:10, Paper TuPT1.23
Integrating Hidden Markov Models Based on Mixture-Of-Templates and K-NN2 Ensemble for Activity Recognition
Kim, Yong-Joong	POSTECH
Kim, Yonghyun	POSTECH
Ahn, Juhyun	Pohang Univ. of Science and Tech
Kim, Daijin	POSTECH
Keywords: Classification and clustering, Signal, image and video processing, Other applications Abstract: This paper considers the activity recognition problem using inertial sensor data. It is a challenging temporal pattern recognition problem as the sensor data can be easily mixed with noise and also has large intra-class variation, resulting from different characteristic of people doing same activity, and inter-class similarity among several similar activities. To handle these problems concentrating on the classification method, this paper proposes a novel ensemble scheme of hidden Markov models for activity recognition. To improve the performance of activity recognition, our method models the outputs of multiple hidden Markov models by using enhanced template-based classifier fusion method, in which multiple local templates are generated as Mixture-of-Templates, and k-NN2 ensemble method is proposed to elaborately recognize activities. To show the effectiveness of the proposed method, we carried out several experiments on UCI Human Activity Recognition dataset and compared our method with several alternative methods. As a result, our method outperforms other methods.

15:00-17:10, Paper TuPT1.24
Weighted K-Nearest Neighbor Revisited
Bicego, Manuele	Univ. of Verona
Loog, Marco	Delft Univ. of Tech. / Univ. of Copenhagen
Keywords: Classification and clustering, Statistical, syntactic and structural pattern recognition Abstract: In this paper we show that weighted K-Nearest Neighbor, a variation of the classic K-Nearest Neighbor, can be reinterpreted from a classifier combining perspective, specifically as a fixed combiner rule, the sum rule. Subsequently, we experimentally demonstrate that it can be rather beneficial to consider other combining schemes as well. In particular, we focus on trained combiners and illustrate the positive effect these can have on classification performance.

15:00-17:10, Paper TuPT1.25
A New Geometrical Approach for Solving the Supervised Pattern Recognition Problem
Valev, Ventzeslav	Inst. of Mathematics and Informatics, Bulgarian Acad. of S
Yanev, Nicola	Univ. of Sofia and Inst. of Mathematics and Informatics
Krzyzak, Adam	-Concordia Univ
Keywords: Classification and clustering, Statistical, syntactic and structural pattern recognition, Machine learning and data mining Abstract: This paper explores the supervised pattern recognition problem based on feature partitioning. This formulation leads to a new problem in computational geometry. The supervised pattern recognition problem is formulated as an heuristic good clique cover problem satisfying the k-nearest neighbors rule. First it is applied a heuristic algorithm for partitioning a graph into a minimal number of cliques. Next cliques are merged using the k-nearest neighbors rule. An important advantage of this approach is the decomposition of a problem involving l classes into l optimization problems involving a single class. The computational complexity of the method, computational procedures, and classification rules are discussed. A geometrical interpretation of the solution is also given. Using the proposed approach, the geometrical structure of the training set is utilized in the best possible way.

15:00-17:10, Paper TuPT1.26
Hybrid Markov Blanket Discovery
Gao, Tian	Rensselaer Pol. Inst
Ji, Qiang	RPI
Keywords: Machine learning and data mining, Model selection Abstract: In a Bayesian Network (BN), a target node is independent of all other nodes given its Markov Blanket (MB). By finding the MB, many problem can be solved directly or indirectly. There exist predominately two different approaches to finding the MB: the score-based and the constraint-based algorithms. We introduce a new Markov Blanket learning algorithm, Hybrid Markov Blanket (HMB) discovery, by combining these two different approaches. Specifically, HMB first employs a score-based method for finding the parents and children (PC) of the target node. HMB then introduces an efficient constraint-based approach to finding target node's spouses without enforcing the symmetry constraint that is required by existing constraint-based methods. In comparison, HMB achieves a better accuracy than the traditional constraint-based approaches and a better efficiency than the existing score-based approaches. In addition, HMB is theoretically proven sound and complete. Empirical results on synthetic and standard MB discovery datasets demonstrate the superior performance of HMB.

15:00-17:10, Paper TuPT1.27
Online Discriminant Projective Non-Negative Matrix Factorization
Liao, Qing	The Hong Kong Univ. of Science and Tech.
Zhang, Qian	The Hong Kong Univ. of Science and Tech.

15:00-17:10, Paper TuPT1.28
A Robust UAV Landing Site Detection System Using Mid-Level Discriminative Patches
Guo, Xufeng	Queensland Univ. of Tech
Denman, Simon	Queensland Univ. of Tech
Fookes, Clinton	Queensland Univ. of Tech
Sridha, Sridharan	Queensland Univ. of Tech
Keywords: Machine learning and data mining, Scene understanding, Motion, tracking and video analysis Abstract: The forced landing problem has become one of the main impediments to UAV's entering civilian airspace. Unfortunately there is no robust forced landing site detection system that will reliably detect a safe landing site. One of the main reasons for this is the difficulty in considering the various classes of surface, to determine whether they are safe or not. We propose a robust UAV landing site detection system using mid-level discriminative patches. The training and tuning process uses a dataset containing 1600 randomly selected Google map images with weak labels. We then show how the output from multiple mid-level discriminative patch detectors can be combined to indicate the level or danger for a given region. The proposed technique reliably detects safe landing areas in UAV imagery, and achieves improved performance over the state-of-the art. The proposed system outperforms the baseline system by 29.4% for completeness and 33.9% for correctness, and is invariant to the changes of illumination, sharpness and resolution of images.

15:00-17:10, Paper TuPT1.29
Multi-Stage Multi-Task Feature Learning Via Adaptive Threshold
Fan, Ya-Ru	Univ. of Electronic Science and Tech. of China
Wang, Yilun	Univ. of Electronic Science and Tech. of China
Huang, Ting-Zhu	Univ. of Electronic Science and Tech. of China
Keywords: Model selection, Machine learning and data mining, Classification and clustering Abstract: Multi-task feature learning aims to identify the shared features among tasks to improve generalization. Recent works have shown that the non-convex learning model often returns a better solution than the convex alternatives. Thus a non-convex model based on the capped-ell_{1},ell_{1} regularization was proposed in cite{Gong2013}, and the corresponding efficient multi-stage multi-task feature learning algorithm (MSMTFL) was presented. However, this method harnesses a fixed threshold in the capped-ell_{1},ell_{1} regularization. The lack of adaptivity might result in suboptimal practical performance. In this paper we propose to employ an adaptive threshold in the capped-ell_{1},ell_{1} regularized formulation, and the corresponding variant of MSMTFL will incorporate an additional scheme to adaptively determine the threshold. Considering that this threshold aims to distinguish true nonzero components of large magnitude from others, the heuristic of detecting the ``first significant jump" proposed in cite{Wang2010} is applied here to adaptively determine its value. The preliminary theoretical analysis is provided to guarantee the feasibility of the proposed method. Several numerical experiments demonstrate the proposed method outperforms existing state-of-the-art feature learning approaches.

15:00-17:10, Paper TuPT1.30
Enhancing Label Inference Algorithms Considering Vertex Importance in Graph-Based Semi-Supervised Learning
Oh, Byonghwa	Hyundai Card Corp
Yang, Jihoon	Sogang Univ
Keywords: Semi-supervised learning and spectral methods, Classification and clustering, Machine learning and data mining Abstract: Graph-based semi-supervised learning has recently come into focus for to its two defining phases: graph construction, which converts the data into a graph, and label inference, which predicts the appropriate labels for unlabeled data using the constructed graph. And the label inference is based on the smoothness assumption of semi-supervised learning. In this study, we propose an enhanced label inference approach which incorporates the importance of each vertex into the existing inference algorithms to improve the prediction capabilities of the algorithms. We also present extensions of three algorithms which are capable of taking the vertex importance variable to apply in learning. Experiments show that our algorithms perform better than the base algorithms on a variety of datasets, especially when the data is less smooth over the graphs.

15:00-17:10, Paper TuPT1.31
Optimistic Semi-Supervised Least Squares Classification
Krijthe, Jesse Hendrik	Delft Univ. of Tech
Loog, Marco	Delft Univ. of Tech. / Univ. of Copenhagen
Keywords: Semi-supervised learning and spectral methods, Classification and clustering, Statistical, syntactic and structural pattern recognition Abstract: The goal of semi-supervised learning is to improve supervised classifiers by using additional unlabeled training examples. In this work we study a simple self-learning approach to semi-supervised learning applied to the least squares classifier. We show that a soft-label and a hard-label variant of self-learning can be derived by applying block coordinate descent to two related but slightly different objective functions. The resulting soft-label approach is related to an idea about dealing with missing data that dates back to the 1930s. We show that the soft-label variant typically outperforms the hard-label variant on benchmark datasets and partially explain this behaviour by studying the relative difficulty of finding good local minima for the corresponding objective functions.

15:00-17:10, Paper TuPT1.32
Constrained Local and Global Consistency for Semi-Supervised Learning
de Sousa, Celso	Univ. De São Paulo
Batista, Gustavo	Univ. De São Paulo
Keywords: Semi-supervised learning and spectral methods, Machine learning and data mining Abstract: One of the widely used algorithms for graph-based semi-supervised learning (SSL) is the Local and Global Consistency (LGC). Such an algorithm can be viewed as a convex optimization problem that balances fitness on labeled examples and smoothness on the graph using a graph Laplacian. In this paper, we provide a novel graph-based SSL algorithm incorporating two normalization constraints into the regularization framework of LGC. We prove that our method has closed-form solution and generalizes two existing methods, being more flexible than the original ones. Through experiments on benchmark data sets, we show the effectiveness of our method, which consistently outperforms the competing methods.

15:00-17:10, Paper TuPT1.32
A Tight Convex Upper Bound on the Likelihood of a Finite Mixture
Mezuman, Elad	IBM Res
Weiss, Yair	Hebrew Univ
Keywords: Model selection Abstract: The likelihood function of a finite mixture model is a non-convex function with multiple local maxima and commonly used iterative algorithms such as EM will converge to different solutions depending on initial conditions. In this paper we ask: is it possible to assess how far we are from the global maximum of the likelihood? Since the likelihood of a finite mixture model can grow unboundedly by centering a Gaussian on a single datapoint and shrinking the covariance, we constrain the problem by assuming that the parameters of the individual models are members of a large discrete set (e.g. estimating a mixture of two Gaussians where the means and variances of both Gaussians are members of a set of a million possible means and variances). For this setting we show that a simple upper bound on the likelihood can be computed using convex optimization and we analyze conditions under which the bound is guaranteed to be tight. This bound can then be used to assess the quality of solutions found by EM (where the final result is projected on the discrete set) or any other mixture estimation algorithm. For any dataset our method allows us to find a finite mixture model together with a dataset-specific bound on how far the likelihood of this mixture is from the global optimum of the likelihood.

15:00-17:10, Paper TuPT1.33
Object-Aware Tracking
Bogun, Ivan	Florida Inst. of Tech
Ribeiro, Eraldo	Florida Inst. of Tech
Keywords: Semi-supervised learning and spectral methods, Support vector machines and kernel methods, Motion, tracking and video analysis Abstract: In this paper, we address the problem of visual tracking in videos without using a pre-learned model of the object. This type of model-free tracking is a hard problem because of limited information about the object, abrupt object motion, and shape deformation. We propose to integrate an object-agnostic prior, called objectness, which is designed to measure the likelihood of a given location to contain an object of any type, into structured tracking framework. Our objectness prior is based on image segmentation and edges; thus, it does not require training data. By extending a structured tracker with the prior, we introduce a new tracker which we call ObjStruck. We extensively evaluate our tracker on publicly available datasets and show that objectness prior improves tracking accuracy.

15:00-17:10, Paper TuPT1.34
Graph Edit Distance As a Quadratic Program
Bougleux, Sébastien	Normandie Univ. UNICAEN
Gaüzère, Benoit	Normandie Univ. INSA De Rouen, LITIS
Brun, Luc	ENSICAEN
Keywords: Statistical, syntactic and structural pattern recognition Abstract: The graph edit distance (GED) measures the amount of distortion needed to transform a graph into another graph. Such a distance, developed in the context of error-tolerant graph matching, is one of the most flexible tool used in structural pattern recognition. However, the computation of the exact GED is NP-complete. Hence several suboptimal solutions, such as the ones based on bipartite assignments with edition, have been proposed. In this paper we propose a binary quadratic programming problem whose global minimum corresponds to the exact GED. This problem is interpreted a quadratic assignment problem (QAP) where some constraints have been relaxed. This allows to adapt the integer projected fixed point algorithm, initially designed for the QAP, to efficiently compute an approximate GED by finding an interesting local minimum. Experiments show that our method remains quite close to the exact GED for datasets composed of small graphs, while keeping low execution times on datasets composed of larger graphs.

15:00-17:10, Paper TuPT1.35
Graph Model Boosting for Structural Data Recognition
Miyazaki, Tomo	Tohoku Univ
Omachi, Shinichiro	Tohoku Univ
Keywords: Statistical, syntactic and structural pattern recognition, Active and ensemble learning, Model selection Abstract: This paper presents a novel method for structural data recognition using a large number of graph models. Broadly, existing methods for structral data recognition have two crucial problems: 1) only a single model is used to capture structural variation, 2) naive classification rules are used, such as nearest neighbor method. In this paper, we propose to strengthen both capturing structural variation and the classification ability. The proposed method constructs a large number of graph models and trains decision tree classifiers with the models. There are two contributions of this paper. The first contribution is a novel graph model which can be constructed by straightforward calculation. This calculation enables us to construct many models in feasible time. The second contribution is a novel approach to capture structural variation. We construct a large number of our models in a boosting framework so that we can capture structural variation comprehensively. Consequently, we are able to perform structural data recognition with the powerful classification ability and comprehensive structural variation. In experiments, we show that the proposed method achieves significant results and outperforms the existing methods.

15:00-17:10, Paper TuPT1.36
Unsupervised Automatic Attribute Discovery Method Via Multi-Graph Clustering
Liu, Liangchen	Univ. of Queensland
Nie, Feiping	NWPU
Zhang, Teng	The Univ. of Queensland
Wiliem, Arnold	The Univ. of Queensland
Lovell, Brian Carrington	The Univ. of Queensland
Keywords: Scene understanding, Classification and clustering Abstract: Recently, various automated attribute discovery methods have been developed to discover useful visual attributes from a given set of images. Despite of the progress made, most of methods consider the supervised scenario which assumes the existence of labelled data. Recent results suggest that it is possible to discover attributes from a set of unlabelled data. In this work, we propose a novel unsupervised attribute discovery method utilising multi-graph approach that preserves both local neighbourhood structure as well as class separability. We employ multiple similarity graphs that encode various relationships between image exemplars. For evaluation, we first investigate the performance of the proposed approach to address a clustering task. Then we apply our proposed method to automatically discover visual attributes and compare with various automatic attribute discovery and hashing methods. The results show that our proposed method is able to improve the performance in the clustering task. Furthermore, when evaluated using the recent meaningfulness metric, the proposed method outperforms the other unsupervised attribute discovery methods.

15:00-17:10, Paper TuPT1.37
Context Aware Nonnegative Matrix Factorization Clustering
Tripodi, Rocco	Ca' Foscari Univ
Vascon, Sebastiano	Univ. Ca' Foscari of Venice
Pelillo, Marcello	Ca' Foscari Univ
Keywords: Classification and clustering, Face recognition, Document Understanding Abstract: In this article we propose a method to refine the clustering results obtained with the nonnegative matrix factorization (NMF) technique, imposing consistency constraints on the final labeling of the data. The research community focused its effort on the initialization and on the optimization part of this method, without paying attention to the final cluster assignments. We propose a game theoretic framework in which each object to be clustered is represented as a player, which has to choose its cluster membership. The information obtained with NMF is used to initialize the strategy space of the players and a weighted graph is used to model the interactions among the players. These interactions allow the players to choose a cluster which is coherent with the clusters chosen by similar players, a property which is not guaranteed by NMF, since it produces a soft clustering of the data. The results on common benchmarks show that our model is able to improve the performances of many NMF formulations.

15:00-17:10, Paper TuPT1.38
Finding Rigid Sub-Structure Patterns from 3D Point-Sets
Chen, Zihe	State Univ. of New York at Buffalo
Chen, Danyang	State Univ. of New York at Buffalo
Ding, Hu	State Univ. of New York at Buffalo
Huang, Ziyun	State Univ. of New York at Buffalo
Li, Zheshuo	State Univ. of New York at Buffalo
Sehgal, Nitasha	State Univ. of New York at Buffalo
Fritz, Andrew	State Univ. of New York at Buffalo
Berezney, Ronald	State Univ. of New York at Buffalo
Xu, Jinhui	State Univ. of New York at Buffalo
Keywords: Statistical, syntactic and structural pattern recognition, Pattern Recognition for Bioinformatics Abstract: In this paper, we study the following rigid sub-structure pattern reconstruction problem: given a set of n input structures (i.e. point-sets), partition each structure into k rigid sub-structures so that the nk rigid substructures can be grouped into k clusters with each of them containing exact one rigid substructure from every input structure and the total clustering cost is minimized, where the clustering cost of a cluster is the total distance between a pattern reconstructed for this cluster and every member rigid substructure. Different from most of the existing models for pattern reconstruction (where each input point-set is often treated as a single structure), our model views each input point-set as a collection of k rigid substructures, and aims to extract similar rigid substructures from each input point-set to form k rigid clusters. The problem is motivated by an interesting biological application for determining the topological structure of chromosomes inside the cell nucleus. We propose a highly effective and practical solution based on a number of new insights to pattern reconstruction, clustering, and motion detection. We validate our method on synthetic, biological and motion tracking data sets. Experimental results suggest that our approach yields a near optimal solution.


TuPT2	Poster Session Hall
TuP2	Poster Session

15:00-17:10, Paper TuPT2.1
Sequential Factorization for Nonrigid Structure from Motion Via LBFGS
Dong, Qiulei	Inst. of Automation, Chinese Acad. of Sciences
Hu, Hao	Shandong Univ
Keywords: 3D shape recovery Abstract: How to implement an effective factorization for nonrigid structure from motion(NRSFM) has attracted much attention in recent years. Addressing this problem, we propose a novel sequential factorization method without extra priors other than the basis low-rank prior, consisting of a motion estimation module and a 3D shape recovery module. In the motion estimation module, for improving the estimation accuracy, a novel objective function is designed for jointly pursuing the Euclidean corrective matrix and the shape coefficient matrix. And an iterative minimization algorithm is explored to solve the designed objective function based on the Limited-memory Broyden-Fletcher-Goldfarb-Shanno approach(LBFGS), naturally leading to the rotation matrix. In the 3D shape recovery module, a simple iterative algorithm is introduced for effectively computing the 3D deformable shapes with the estimated rotation matrix. The proposed extra-prior-free method is easy to implement and it can achieve an effective tradeoff between estimation accuracy and computational speed, since only several classic techniques are involved. Extensive experimental results demonstrate the effectiveness of the proposed method in comparison to five state-of-the-art methods.

15:00-17:10, Paper TuPT2.2
A Novel Photometric Stereo Method with Nonisotropic Point Light Sources
Nie, Ying	Shenzhen Inst. of Advanced Tech. Chinese Acad. of S
Song, Zhan	CAS/CUHK
Keywords: 3D shape recovery, Illumination and reflectance modeling, Vision sensors Abstract: This paper presents a photometric stereo method with nonisotropic point light sources. Subject to the non-uniform lighting conditions produced by the nonisotropic point sources, each incident light ray should be precisely determined so as to realize an accurate calculation of surface normal. In the proposed method, radiance model of the light source is firstly introduced to the classical photometric stereo framework. By considering the distance and angular attenuations of incident light rays, a precise description for the lighting field can be established. Based on the initial 3D reconstruction result, an iterative process is introduced to optimize the primary light model parameters with respect to the unknown distance factor. The experimental setup is quite simple, which only consists of some LEDs and one camera. And the experimental results show that, with the proposed method, accuracy of the reconstructed surface normal can be greatly improved in comparison with some conventional light models.

15:00-17:10, Paper TuPT2.3
Registration and Entire Shape Acquisition for Grid Based Active One-Shot Scanning Techniques
Hirukawa, Takuto	Kagoshima Univ
Morinaga, Hiroki	Kagoshima Univ
Kawasaki, Hiroshi	Kagoshima Univ
Furukawa, Ryo	Hiroshima City Univ
Keywords: 3D shape recovery, Image based modeling, Vision for graphics Abstract: One-shot active stereo using structured light is a practical solution for dynamic scene acquisition. Basically, those methods are based on encoding positional information of the pixel into the single projected pattern. A disadvantage of such methods is decreases of the spatial resolution caused by requiring a certain area of the pattern to encode the positional information. Among those methods, grid-based patterns are promising at the point of accuracy and robustness, since triangulation for 3D reconstruction is conducted with light-sectioning method and a line detection is usually a stable image processing. However, no shapes are recovered between the grid lines, and thus, the whole reconstructed shape tends to be sparse. To deal with the problem, integrating multiple shapes that are sequentially captured using registration algorithm such as ICP is one solution. In previous work, we show that naive ICP works poorly for grid-like structured point clouds, and proposed a specialized ICP algorithm for aligning a set of grid-like structured 3D shapes. In this paper, we extend this approach and propose a process for entire shape modeling by capturing objects from all the directions using turn table, and integrating into a single shape using our improved ICP. To achieve this, setting good initial 3D shapes is important. For solution, we interpolation grid shapes to create smooth surface so that common ICP works. Comprehensive experiments are conducted to show the strength of our method compared to common ICP.

15:00-17:10, Paper TuPT2.4
Three Dimensional Pose Estimation of Mouse from Monocular Images in Compact Systems
Salem, Ghadi	National Inst. of Health
Krynitsky, Jonathan	National Inst. of Health
Pohida, Tom	National Inst. of Health
Hayes, Monson	George Mason Univ
Burgos-Artizzu, Xavier Paolo	Transmural Biotech
Keywords: 3D shape recovery, Motion, tracking and video analysis Abstract: Video-based activity and behavior analysis for mice has garnered wide attention in biomedical research. Animal facilities hold large numbers of mice housed in `home-cages' densely stored within ventilated racks. Automated analysis of mice activity in their home-cages can provide a new set of sensitive measures for detecting abnormalities and time-resolved deviation from baseline behavior. Large scale monitoring in animal facilities requires minimal footprint hardware that integrates seamlessly with the ventilated racks. Compactness of hardware imposes use of fisheye lenses positioned in close proximity to the cage. In this paper, we estimate the 3D pose of a mouse from fisheye distorted monocular monochromatic images using a novel adaptation of a structured forests algorithm. The method utilizes classification decision trees leveraging their versatility to store arbitrary information in the leaf-nodes. During training, the samples arriving at each node are mapped from continuous pose space to discrete class labels such that similar poses are grouped in the same class. The node splitting function is trained by optimizing a classification objective function rather than a high-dimensional regression one. The leaf-nodes store the pose parameters for the set of samples reaching the node. A prediction model preserving the structural relationship of the pose is formed based on the samples in the leaf-nodes. We apply the method to what we believe is the first known training set for 3D recovery of mouse key points from monocular images. We compare the results of our approach to those obtained via standard regression techniques.

15:00-17:10, Paper TuPT2.5
3D Motion Estimation by Evidence Gathering
Abuzaina, Anas	Univ. of Southampton
Nixon, Mark	Univ. of Southampton
Carter, John	Univ. of Southampton
Keywords: 3D shape recovery, Motion, tracking and video analysis, Physics-based vision Abstract: In this paper we introduce an algorithm for 3D motion estimation in point clouds that is based on Chasles’ kinematic theorem. The proposed algorithm estimates 3D motion parameters directly from the data by exploiting the geometry of rigid transformation using an evidence gathering technique in a Hough-voting-like approach. The algorithm provides an alternative to the feature description and matching pipelines commonly used by numerous 3D object recognition and registration algorithms, as it does not involve keypoint detection and feature descriptor computation and matching. To the best of our knowledge, this is the first research to use kinematics theorems in an evidence gathering framework for motion estimation and surface matching without the use of any given correspondences. Moreover, we propose a method for voting for 3D motion parameters using a one-dimensional accumulator space, which enables voting for motion parameters more efficiently than other methods that use up to 7-dimensional accumulator spaces.

15:00-17:10, Paper TuPT2.6
A Bayesian Part-Based Approach to 3D Human Pose and Camera Estimation
Brau, Ernesto	Boston Coll
Jiang, Hao	Boston Coll
Keywords: 3D shape recovery, Scene understanding Abstract: We present a Bayesian framework for estimating 3D human pose and camera from a single RGB image. We develop a generative model where a 3D pose is rendered onto an image (via the camera), which then generates a detection probability map for each body part. We represent a human pose with a set of 3D cylinders in space, one for each body part, and we place kinematic and self-intersection priors on the model. Importantly, we use a graphics engine (e.g., OpenGL) to render the pose, and use its built-in capabilities for color blending to efficiently compute the likelihood of the model given the observed probability maps, which are obtained by running a convolutional neural network classifier on a test image. We explore the space of 3D poses and camera configurations via the Hybrid Monte Carlo algorithm, with sampling moves designed specifically for this problem. We train the parameters of our prior and likelihood distributions using annotated poses from the CMU mocap database, and test our algorithm on two benchmark datasets, where we compare performance against state-of-the-art methods. Additionally, we demonstrate the flexibility of our framework by incorporating a likelihood function for depth images and showing the associated performance gains.

15:00-17:10, Paper TuPT2.7
Room Reconstruction from a Single Spherical Image by Higher-Order Energy Minimization
Fukano, Kosuke	Waseda Univ
Mochizuki, Yoshihiko	Waseda Univ
Simo-Serra, Edgar	Waseda Univ
Iizuka, Satoshi	Waseda Univ
Sugimoto, Akihiro	National Inst. of Informatics
Ishikawa, Hiroshi	Waseda Univ
Keywords: 3D shape recovery, Scene understanding, Classification and clustering Abstract: We propose a method to reconstruct a simple room from a single spherical image, i.e., to identify structural planes that form the ceiling, the floor, and the walls. A spherical image records the light that falls on a single viewpoint from all directions. Because there is no need to correlate geometrical information from multiple images, it facilitates the robust reconstruction of precise structure of the room. In our method, we first detect line segments in the image, which we then classify into those that form the boundaries of the structural planes and those that do not. The classification is a large combinatorial problem, which we solve with graph cuts as a minimization problem of a higher-order energy that combines the various measures of likelihood that one, two, or three line segments are part of the boundary. Finally, we estimate the planes that constitute the room from the line segments classified as residing on the boundaries. We evaluate the proposed method on synthetic and real images.

15:00-17:10, Paper TuPT2.8
Model-Based Face Reconstruction Using SIFT Flow Registration and Spherical Harmonics
Wu, Fanzi	The Chinese Univ. of Hong Kong
Li, Songnan	CUHK
Zhao, Tianhao	The Chinese Univ. of Hong Kong
Ngan, King Ngi	The Chinese Univ. of Hong Kong
Keywords: 3D shape recovery, Shape modeling and encoding Abstract: In this paper, we propose a robust method for face reconstruction using a single color image. A 3D morphable model is used to reconstruct a smooth 3D face shape. To find the correspondence between model vertices and image pixels, landmarks are updated using SIFT flow which is illumination and rotation invariant. To reconstruct more detailed information, depth values are refined using a shape from shading method which approximates lighting condition by spherical harmonics. We test the proposed method on a set of real world images and compare reconstructed results with depth maps captured by a depth camera. The average error is around 3.3 mm.

15:00-17:10, Paper TuPT2.10
Salient Object Detection in Tracking Shots
Muthuswamy, Karthik	Nanyang Tech. Univ
Rajan, Deepu	Nanyang Tech. Univ
Keywords: Motion, tracking and video analysis, Image and video analysis and understanding, Low-level vision Abstract: Tracking shots have posed a significant challenge for salient region detection due to the presence of highly competing background motion. In this paper, we propose a computationally efficient technique to detect salient objects in a tracking shot. We first separate the tracked foreground pixels from the background by accounting for the variability of the pixels in a set of frames. The focus of the tracked foreground pixels is utilized as a measure of saliency of objects in the scene. We evaluate the performance of this method by comparing the salient region detection with ground truth data of the location of the salient object that are manually generated. The results of the evaluation show that the proposed method is able to achieve superior salient object detection performance with very low computational load.

15:00-17:10, Paper TuPT2.11
Aligning Movies with Scripts by Exploiting Temporal Ordering Constraints
Naim, Iftekhar	Univ. of Rochester
Mamun, Abdullah Al	Univ. of Rochester
Song, Young Chol	Univ. of Rochester
Luo, Jiebo	-Univ. of Rochester
Kautz, Henry	Univ. of Rochester
Gildea, Daniel	Univ. of Rochester
Keywords: Motion, tracking and video analysis, Image and video analysis and understanding, Multimedia analysis, indexing and retrieval Abstract: Scripts provide rich textual annotation of movies, including dialogs, character names, and other situational descriptions. Exploiting such rich annotations requires aligning the sentences in the scripts with the corresponding video frames. Previous work on aligning movies with scripts predominantly relies on time-aligned closed-captions or subtitles, which are not always available. In this paper, we focus on automatically aligning faces in movies with their corresponding character names in scripts without requiring closed-captions/subtitles. We utilize the intuition that faces in a movie generally appear in the same sequential order as their names are mentioned in the script. We first apply standard techniques for face detection and tracking, and cluster similar face tracks together. Next, we apply a generative Hidden Markov Model (HMM) and a discriminative Latent Conditional Random Field (LCRF) to align the clusters of face tracks with the corresponding character names. Our alignment models (especially LCRF) significantly outperform the previous state-of-the-art on two different movie datasets and for a wide range of face clustering algorithms.

15:00-17:10, Paper TuPT2.12
Temporally Subsampled Detection for Accurate and Efficient Face Tracking and Diarization
Le, Nam	Idiap Res. Inst
Heili, Alexander	Idiap Res. Inst
Wu, Di	Idiap Res. Inst
Odobez, Jean-Marc	IDIAP Res. Inst
Keywords: Motion, tracking and video analysis, Image and video analysis and understanding, Multimedia analysis, indexing and retrieval Abstract: Face diarization, i.e. face tracking and clustering within video documents, is useful and important for video indexing and fast browsing but it is also a difficult and time consuming task. In this paper, we address the tracking aspect and propose a novel algorithm with two main contributions. First, we propose an approach that leverages state-of-the-art deformable part-based model (DPM) face detector with a multi-cue discriminant tracking-by-detection framework that relies on automatically learned long-term time-interval sensitive association costs specific to each document type. Secondly to improve performance, we propose an explicit false alarm removal step at the track level to efficiently filter out wrong detections (and resulting tracks). Altogether, the method is able to skip frames, i.e. process only 3 to 4 frames per second - thus cutting down computational cost - while performing better than state-of-the-art methods as evaluated on three public benchmarks from different context including a movie and broadcast data.

15:00-17:10, Paper TuPT2.13
Unknown Object Tracking in 360-Degree Camera Images
Delforouzi, Ahmad	Res. Group for Pattern Recognition, Univ. of Siegen
Tabatabaei, Seyed Amir Hossein	Res. Group for Pattern Recognition, Univ. of Siegen
Shirahama, Kimiaki	Univ. of Siegen
Grzegorzek, Marcin	Univ. of Siegen
Keywords: Motion, tracking and video analysis, Image and video analysis and understanding, Vision sensors Abstract: In this paper, a method for unknown object tracking in output images from 360-degree cameras called Modified Training-Learning-Detection (MTLD) is presented. The proposed method is based on the recently introduced Training-Learning-Detection (TLD) scheme in the literature. The flaws of the TLD approach have been detected and significant modifications have been proposed to enhance and elaborate the scheme. Unlike TLD, MTLD is capable of detecting the unknown objects of interest in 360-degree images. According to the experimental results, the proposed method significantly outperforms the TLD method in terms of detection rate and implementation cost.

15:00-17:10, Paper TuPT2.14
Structure and Appearance Preserving Network Flow for Multi-Object Tracking
Pu, Shi	Beijing Univ. of Posts and Telecommunications
Zhang, Honggang	Beijing Univ. of Posts and Telecommunications
Zhao, Kaili	Beijing Univ. of Posts and Telecommunications
Keywords: Motion, tracking and video analysis, Occlusion and shadow detection Abstract: Tracking-by-detection with temporal smoothness has recently attracted increasing attentions in the field of multi-object tracking. Occlusions and clutter are two key problems. To address these problems, this paper proposes a new structure and appearance preserving network flow (SAPNF) with tracking-by-detection, introducing spatial structural configuration and appearance overlapping constraint from frame to frame. One crucial aspect in SAPNF is to consider structure information and appearance smoothness simultaneously which benefits from each other. Unlike previous studies that only learn spatial information or appearance smoothness, a unified min-cost flow with the proposed new structure and appearance induces to track multi-object in crowded and cluttered scenes. Experiments on PETS and TUD benchmarks show that SAPNF performs the comparative results in comparison with alternative methods.

15:00-17:10, Paper TuPT2.15
Human Tracking in Crowded Scenes Using Target Information at Previous Frames
Takada, Hiromasa	Meijo Univ
Hotta, Kazuhiro	Meijo Univ
Janney, Pranam	Defence Science and Tech. Group, Department of Defence
Keywords: Motion, tracking and video analysis, Occlusion and shadow detection, Active and ensemble learning Abstract: Human tracking in crowded scenes is a challenging problem because of frequent occlusion and presence of the tracking in similar regions. In this paper, we propose an online human tracking method which can handle occlusion and targets with similar regions. Our method compares the target region with a surrounding region and targets with similar regions at current frame. In addition, we also compare the target region at current and previous frames. We reduce the probabilities of uncommon colors at current and previous frames thereby improving the tracking accuracy. The effectiveness of the proposed method has been demonstrated via comparison with state-of-the-art trackers on the PETS2009 dataset.

15:00-17:10, Paper TuPT2.15
Detection and Characterization of Intrinsic Symmetry of 3D Shapes
Mukhopadhyay, Anirban	Zuse Inst. Berlin
Bhandarkar, Suchendra	Univ. of Georgia
Porikli, Fatih	Anu / Nicta
Keywords: Shape modeling and encoding, 3D shape recovery, Vision for graphics Abstract: A comprehensive framework for detection and characterization of partial intrinsic symmetry over 3D shapes is proposed. To identify prominent symmetric regions which overlap in space and vary in form, the proposed framework is decoupled into a Correspondence Space Voting (CSV) procedure followed by a Transformation Space Mapping (TSM) procedure. In the CSV procedure, significant symmetries are first detected by identifying surface point pairs on the input shape that exhibit local similarity in terms of their intrinsic geometry while simultaneously maintaining an intrinsic distance structure at a global level. To allow detection of potentially overlapping symmetric shape regions, a global intrinsic distance-based voting scheme is employed to ensure the inclusion of only those point pairs that exhibit significant intrinsic symmetry. In the TSM procedure, the Functional Map framework is employed to generate the final map of symmetries between point pairs. The TSM procedure ensures the retrieval of the underlying dense correspondence map throughout the 3D shape that follows a particular symmetry. The TSM procedure is also shown to result in the formulation of a metric symmetry space where each point in the space represents a specific symmetry transformation and the distance between points represents the complexity between the corresponding transformations. Experimental results show that the proposed framework can successfully analyze complex 3D shapes that possess rich symmetries.

15:00-17:10, Paper TuPT2.16
Appearance Changes Detection During Tracking
Chen, Wei	National Univ. of Defense Tech
Guo, Xifeng	National Univ. of Defense Tech
Liu, Xinwang	Computer School, National Univ. of Defense Tech
Zhu, En	National Univ. of Defense Tech
Yin, Jianping	National Univ. of Defense Tech
Keywords: Motion, tracking and video analysis, Occlusion and shadow detection, Deep learning Abstract: Correlation tracker has made a huge success in visual object tracking. However, it is mainly because that the tracker cannot catch the occurrence of appearance changes, tracking based on correlation filters often drifts due to the unexpected appearance changes caused by occlusion, deformation and background clutter. In this paper, we propose a new method to detect the case when the tracker encountered the unexpected appearance changes. This method uses the following points:1)Filter response curve would decreases dramatically when target suffers heavy appearance changes. 2)Features extracted from deeper layers of convolutional neural networks (CNNs) have more semantics information and features extracted from shadower layers have more spatial information. Extensive experimental results on several public benchmark datasets show that the proposed method can deal with the appearance changes effectively.

15:00-17:10, Paper TuPT2.17
Statistical Consensus Matching Framework for Image Registration
Feigin, Micha	Massachusetts Inst. of Tech
Ranger, Bryan	Massachusetts Inst. of Tech
Anthony, Brian	Massachusetts Inst. of Tech
Keywords: Motion, tracking and video analysis, Reconstruction and camera motion estimation, Stereo and multiple view geometry Abstract: A common method for image alignment in computer vision is finding the maximum consensus transformation for a set of features in the images. This is commonly done using randomized methods such as RANSAC. While relatively robust when strong features are involved, these methods do not deal well with ambiguous features where maximum likelihood does not provide the best match between the images, a common case with modalities such as medical ultrasound, thermal imaging and cross modality registration. They also do not inherently allow for the application of external knowledge regarding possible configurations to aid in the registration. In this paper we present a novel statistical framework for maximum consensus image alignment which is both robust in the presence of weak features (features not providing one-to-one matches) while at the same time providing an inherent natural ability for integrating external knowledge. Our methods is able to collect information not only from finding good matches, but also from improbable and partially ambiguous matches. We demonstrate our framework in the context of medical ultrasound image registration. In our test cases, our method succeeded where other state of the art methods we compared to failed to provide satisfactory results with over 17% of the samples.

15:00-17:10, Paper TuPT2.18
Beyond Verbs: Understanding Actions in Videos with Text
Naha, Shujon	Univ. of Manitoba
Wang, Yang	Univ. of Manitoba
Keywords: Motion, tracking and video analysis, Scene understanding Abstract: We consider the problem of joint modeling of videos and corresponding textual descriptions (e.g. sentences, or phrases). Our approach consists of three components: the video representation, the textual representation, and a joint model that links videos and text. Our video representation uses the state-of-the-art deep 3D ConvNet to captures the semantic information in the video. Our textual representation uses the recent advancement in learning word and sentence vectors from large text corpus. The joint model is learned to score the correct (video, text) pairs higher than the incorrect ones. We demonstrate our approach in several applications: 1) retrieving sentences given a video; 2) retrieving videos given a sentence; 3) zero-shot action recognition in videos.

15:00-17:10, Paper TuPT2.19
User Interest Profiling Using Tracking-Free Coarse Gaze Estimation
Bartoli, Federico	Univ. of Florence
Lisanti, Giuseppe	Univ. Degli Studi Di Firenze
Seidenari, Lorenzo	Media Integration and Communication Center - Univ
Del Bimbo, Alberto	Univ. of Florence
Keywords: Motion, tracking and video analysis, Scene understanding Abstract: Understanding where people attention focuses is a challenging and extremely valuable task that can be solved using computer vision technologies. In this paper we address this problem on surveillance-like scenarios, where head and body imagery are usually low resolution. We propose a method to profile the attention of people moving in a known space. We exploit coarse gaze estimation and a novel model based on optical flow to improve attention prediction without the need of a tracker. Removing the tracker dependency makes the method applicable also on highly crowded scenarios. The proposed method is able to obtain comparable performance with respect to state of the art solutions in terms of Mean Average Angular Error (MAAE) on the TownCentre dataset. We also test our approach on the publicly available MuseumVisitors dataset showing an improvement both in terms of MAAE and in terms of accuracy in the estimation of visitors' profile.

15:00-17:10, Paper TuPT2.20
Modeling Human-Skeleton Motion Patterns Using Conditional Deep Boltzmann Machine
Lei, Peng	Oregon State Univ
Todorovic, Sinisa	Oregon State Univ
Attachments: Supplementary material Keywords: Motion, tracking and video analysis, Shape modeling and encoding Abstract: This paper addresses the problem of modeling longrange motion patterns of a 3D human skeleton performing an activity. This problem is important, as such a model can be used in many applications, including person tracking via 3D pose estimation, and probabilistic sampling of realistic 3D skeleton sequences conducting different activities with different motion styles. To this end, we formulate a new generative model, called conditional deep Boltzmann machine (CDBM). CDBM defines a joint distribution of two hidden layers and 3D-skeleton pose predictions in the near future given human skeleton observations from the recent past. Our CDBM extends the conditional restricted Boltzmann machine (CRBM) and the factored conditional restricted Boltzmann machine (FCRBM) by introducing an additional hidden layer and removing the style layer, while preserving the computational efficiency of CRBM and FCRBM. The new hidden variables are aimed at capturing longrange and high-order spatiotemporal interactions among human body joints, and thus enable CDBM to effectively model 3D motion sequences with different activities and motion styles with a single set of parameters. Our experiments on the benchmark Motion Capture and HumanEva datasets demonstrate that our CDBM outperforms CRBM and achieves on par performance with FCRBM both in 3D pose based person tracking and realistic 3D skeleton sequence generating.

15:00-17:10, Paper TuPT2.21
Robust Real-Time Performance-Driven 3D Face Tracking
Pham, Hai	Rutgers, the State Univ. of New Jersey
Pavlovic, Vladimir	Rutgers Univ
Cai, Jianfei	Nanyang Tech. Univ
Cham, Tat-Jen	Nanyang Tech. Univ
Attachments: Supplementary material Keywords: Motion, tracking and video analysis, Vision for graphics Abstract: We introduce a novel robust hybrid 3D face tracking framework from RGBD video streams, which is capable of tracking head pose and facial actions without pre-calibration or intervention from a user. In particular, we emphasize on improving the tracking performance in instances where the tracked subject is at a large distance from the cameras, and the quality of point cloud deteriorates severely. This is accomplished by the combination of a flexible 3D shape regressor and the joint 2D+3D optimization on shape parameters. Our approach fits facial blendshapes to the point cloud of the human head, while being driven by an efficient and rapid 3D shape regressor trained on generic RGB datasets. As an on-line tracking system, the identity of the unknown user is adapted on-the-fly resulting in improved 3D model reconstruction and consequently better tracking performance. The result is a robust RGBD face tracker capable of handling a wide range of target scene depths, whose performances are demonstrated in our extensive experiments better than those of the state-of-the-arts.

15:00-17:10, Paper TuPT2.22
Pedestrain Detection from Motion: A Spatial-Temporal Approach Based on Walking Actions
Kilicarslan, Mehmet	Indiana Univ. Univ. Indianapolis
Zheng, Jiang Yu	Indiana Univ. Purdue Univ. Indianapolis
Raptis, Konstantinos	Indiana Univ. Univ. Indianapolis
Keywords: Motion, tracking and video analysis, Vision sensors, Human body motion and gesture based interaction Abstract: Pedestrian detection is a challenging problem studied over decades. Most algorithms are based on human appearance. Only few works consider motion as a feature component. In this paper, however, we tackle this problem only considering short periods of pedestrian walking. This motion does not depend on the variations of pedestrian pose, body shape, illumination, and background. We model pedestrian motion that has unique properties compare to background and rigid objects motion in spatial-temporal motion profiles. This observation helps us to identify pedestrian leg motion along with body motion over a short time period. Our method also works for a vehicle borne camera where background also moves. We achieved more robust results by dealing with crowds, and other degenerating cases of human motion against background and dynamic scenes. The method has a low computational cost on a motion profile and it can be combined with a shape-based method as pre-screening for reducing the false positives. It also provides a feasible way to find human behaviors.

15:00-17:10, Paper TuPT2.23
Epipolar Geometry Based on Line Similarity
Ben-Artzi, Gil	HEBREW Univ
Halperin, Tavi	The Hebrew Univ. of Jerusalem
Werman, Michael	Huji
Peleg, Shmuel	The Hebrew Univ. of Jerusalem
Keywords: Stereo and multiple view geometry, Reconstruction and camera motion estimation Abstract: It is known that epipolar geometry can be computed from three epipolar line correspondences but this computation is rarely used in practice since there are no simple methods to find corresponding lines. Instead, methods for finding corresponding points are widely used. This paper proposes a similarity measure between lines that indicates whether two lines are corresponding epipolar lines and enables finding epipolar line correspondences as needed for the computation of epipolar geometry. A similarity measure between two lines, suitable for video sequences of a dynamic scene, has been previously described. This paper suggests a stereo matching similarity measure suitable for images. It is based on the quality of stereo matching between the two lines, as corresponding epipolar lines yield a good stereo correspondence. Instead of an exhaustive search over all possible pairs of lines, the search space is substantially reduced when two corresponding point pairs are given. We validate the proposed method using real-world images and compare it to state-of-the-art methods. We found this method to be more accurate by a factor of five compared to the standard method using seven corresponding points and comparable to the 8-points algorithm.

15:00-17:10, Paper TuPT2.24
Bridge Motion to Collision Alarming Using Driving Video
Kilicarslan, Mehmet	Indiana Univ. Univ. Indianapolis
Zheng, Jiang Yu	Indiana Univ. Purdue Univ. Indianapolis
Keywords: Vision for robotics, Motion, tracking and video analysis, Vision sensors Abstract: the objective of this work is to compute the Timeto- Collision (TTC) of surrounding vehicles of a vehicle using motion information in driving video. The key advantage in this work is the extraction of potential danger without vehicle detection and recognition in prior, but directly from the motion divergence in the video. We analyze the trace expansion both horizontally and vertically condensed in the collision sensitive zones in the driving video. Long term motion is stably obtained through filtering in the spatial-temporal video profiles at collision sensitive parts in the video. This overcomes the accuracy problem in object recognition and saved the computation cost tremendously in the real time sensing. The fine velocity computation yields reasonable TTC accuracy so that the video camera can achieve

15:00-17:10, Paper TuPT2.25
On Spatio-Temporal Saliency Detection in Videos Using Multilinear PCA
Sidibe, Desire	Univ. De Bourgogne
Rastgoo, Mojdeh	Univ. De Bourgogne
Meriaudeau, Fabrice	LE2I
Keywords: Biologically motivated vision, Early vision, Motion, tracking and video analysis Abstract: Visual saliency is an attention mechanism which helps to focus on regions of interest instead of processing the whole image or video data. Detecting salient objects in still images has been widely addressed in literature with several formulations and methods. However, visual saliency detection in videos has attracted little attention, although motion information is an important aspect of visual perception. A common approach for obtaining a spatio-temporal saliency map is to combine a static saliency map and a dynamic saliency map. In this paper, we extend a recent saliency detection approach based on principal component analysis (PCA) which have shwon good results when applied to static images. In particular, we explore different strategies to include temporal information into the PCA-based approach. The proposed models have been evaluated on a publicly available dataset which contain several videos of dynamic scenes with complex background, and the results show that processing the spatio-tempral data with multilinear PCA achieves competitive results against state-of-the-art methods.

15:00-17:10, Paper TuPT2.26
RGB-D Saliency Detection under Bayesian Framework
Wang, Songtao	Beijing Acad. of Science and Tech
Zhou, Zhen	Harbin Univ. of Science and Tech
Qu, HanBing	Beijing Acad. of Science and Tech. Beijing
Li, Bin	Beijing Acad. of Science and Tech
Keywords: Cognitive and embodied vision, Signal, image and video processing, Other applications Abstract: In this paper, we propose a saliency detection model for RGB-D images based on the contrasting features of color and depth within a Bayesian framework. The depth feature map is extracted based on super-pixel contrast computation with spatial priors. We model the depth saliency map by approximating the density of depth-based contrast features using a Gaussian distribution. Similar to the depth saliency computation, the color saliency map is computed using a Gaussian distribution based on multi-scale contrasts in super-pixels by exploiting low-level cues. By assuming that color-based and depth-based contrast features are conditionally independent given the classes, a discriminative mixed-membership naive Bayes model is used to calculate the final saliency map from the depth saliency and color saliency probabilities by applying Bayes' theorem. The Gaussian distribution parameter can be estimated in the DMNB model by a variational inference-based expectation maximization algorithm. Experimental results on a recent eye-tracking database show that the proposed model performs better than other existing models.

15:00-17:10, Paper TuPT2.27
Depth Estimation with Cascade Occlusion Culling Filter for Light-Field Cameras
Zhou, Wenhui	Hangzhou Dianzi Univ
Lumsdaine, Andrew	Indiana Univ. Bloomington
Lin, Lili	Zhejiang Gongshang Univ
Keywords: Computational photography Abstract: Depth recovery from a light-field camera is an essential and interesting problem. One of its most challenges is to get accurate estimation for the depth discontinuities and occluded regions. We propose a simple and efficient solution with a cascade occlusion culling filter. It is a cascade processing corresponding to the different manifestations of occlusions at ray-level, pixel-level and image-level. (i) At ray-level, any potential occluded ray will be filtered out in depth-cue responses computation, and then reliable multiple-cue cost volumes are constructed. (ii) At pixel-level, occlusions generally result in weak depth discontinuity ramp edges. These discontinuities will be preserved and enhanced by a multiple-cue cost-volume filter with edge preserving property. (iii) At image-level, occlusion is embodied in the uncertain regions. In order to obtain optimal depth estimation of uncertain regions, an iterative depth optimization framework is applied to integrate the aforementioned filtered multiple-cue cost volumes with their confidences. We show that our method has good depth-discontinuity preserving property, and is insensitive to the surface color / texture discontinuities at the same time. Experimental results on Lytro Illum camera data demonstrate the effectiveness and robustness of our method which has excellent performance in handling discontinuities and occlusions.

15:00-17:10, Paper TuPT2.28
Accurate and Fast Micro Lenses Depth Maps from a 3D Point Cloud in Light Field Cameras
Ferreira, Rodrigo T.	ISR Coimbra
Goncalves, Nuno	Inst. for Systems and Robotics - Univ. of Coimbra
Attachments: Supplementary material Keywords: Computational photography, Reconstruction and camera motion estimation, Stereo and multiple view geometry Abstract: Light field cameras capture a scene's multi-directional light field with one image, allowing the estimation of depth. In this paper, we introduce a fully automatic method for depth estimation from a single plenoptic image running a RANSAC-like algorithm for feature matching. The novelty about our method is the global method to back project correspondences found using photometric similarity to obtain a 3D virtual point cloud and different methods to build a depth map from the 3D point cloud generated. We use lenses with different focal-lengths in a multiple depth map refining phase, generating a dense depth map. Tests with simulations and real images are presented and compared with the state of the art, showing comparable accuracy for substantial less computational time.

15:00-17:10, Paper TuPT2.29
Deep Structured-Output Regression Learning for Computational Color Constancy
Qian, Yanlin	Tampere Univ. of Tech
Chen, Ke	Tampere Univ. of Tech
Kamarainen, Joni-Kristian	Tampere Univ. of Tech
Nikkanen, Jarno	Intel Finland
Matas, Jiri	CTU Prague
Keywords: Computational photography, Statistical, syntactic and structural pattern recognition Abstract: The color constancy problem is addressed by structured-output regression on the values of the fully-connected layers of a convolutional neural network. The AlexNet and the VGG are considered and VGG slightly outperformed AlexNet. Best results were obtained with the first fully-connected ”fc6” layer and with multi-output support vector regression. Ex- periments on the SFU Color Checker and Indoor Dataset benchmarks demonstrate that our method achieves competitive performance, outperforming the state of the art on the SFU indoor benchmark.

15:00-17:10, Paper TuPT2.30
Absorptive Scattering Model for Rough Laminar Surfaces
Dahlan, Hadi Affendy	Univ. of York
Hancock, Edwin	Univ. of York
Keywords: Illumination and reflectance modeling Abstract: This paper introduces a new light scattering model for surfaces with rough boundaries and absorption. This is an extension to Ragheb-Hancock model. The new model adds an absorption term proportional of the squared cosine of the light incidence angle, and satisfies conservation of energy. To test the accuracy of the model, we have used the CUReT database. The model was compared with alternatives such as the Jensen model, the Oren-Nayar model, and the original Ragheb-Hancock model. The results show that the new model produces the best fits to the data. Interestingly the model is capable of predicting absorption in dominant colored samples, a feature not possible with the original models studied. The absorption parameter of the new model provides is also informative of surface structure and composition, especially for layered dielectric materials.

15:00-17:10, Paper TuPT2.31
Hierarchical Hough Forests for View-Independent Action Recognition
Hilsenbeck, Barbara	Fraunhofer IOSB
Münch, David	Fraunhofer IOSB
Kieritz, Hilke	Fraunhofer IOSB
Hübner, Wolfgang	Fraunhofer IOSB
Arens, Michael	Fraunhofer IOSB
Keywords: Image based modeling, Classification and clustering Abstract: Appearance-based action recognition can be considered as a natural extension of appearance-based object detection from the spatial to the spatio-temporal domain. Although this step seems natural, most action recognition approaches are evaluated in isolation. Towards this end the contribution of the paper is twofold. First, a view-independent approach to action recognition is proposed and second the sensitivity w.r.t. a combination of person detection and action recognition is evaluated. Action recognition is performed in a hierarchical manner: First, the relative camera orientation in the scene is estimated and second, the action is determined using view-dependent Hough forests. The proposed approach is evaluated on the multi-view i3DPost dataset and its performance is compared to single-step approaches using Hough forests. The results suggest that the recognition rate increases, when using the proposed hierarchical method compared to the single-step approaches. Further, the performance rates of hierarchical Hough forests on groundtruth data are compared to the results of hierarchical Hough forests in combination with a person detector.


TuPT3	Poster Session Hall
TuP3	Poster Session

15:00-17:10, Paper TuPT3.1
Multiclass Labeling Via Gaussian Mixture Model Interactive Classifier an Efficient Interactive Technique for Heuristic Model Segmentation in Laplacian Coordinates
Ms., Juwairiyah Naeem	National Univ. of Sciences and Tech
Rao, Naveed Iqbal	National Univ. of Sciences and Tech. Ofsignals
Attachments: Supplementary material Keywords: Segmentation, features and descriptors Abstract: Abstract— This paper presents a fast and accurate interactive algorithm to address the problem of multiclass image segmentation called GMMIC (Gaussian Mixture Model-based Interactive Classifier). The proposed framework has an imperative feature of classification of complex or Indistinct and Indistinguishable Boundary images. The algorithm extracts objects of interest from seeds as small as dots provided by the user. This work is inspired by Casaca et al. seed based image segmentation method but to reduce the processing time and obtain remarkable accuracy, GMMIC uses a unique multilayered approach. The Gaussian Mixture Model assigns color labels to pixels and later seeds are provided in the Laplacian Coordinates graph. This paper demonstrates the outperformance of the proposed framework in terms of better boundary fitting and greater efficiency with minimum user intervention in comparison with five prominent algorithms taken from the McGuiness et al. review and based on comparison with 500 images from the Berkeley datasets (BSD500 and BSD300) and 50 images from the Grab cut dataset (PASCAL VOC 2007).

15:00-17:10, Paper TuPT3.2
Dominant Plane Recognition in Interior Scenes from a Single Image
Osuna-Coutiño, J. A. de Jesús	Inst. Nacional De Astrofísica, Óptica Y Electrónica
Martinez-Carranza, Jose	Inst. Nacional De Astrofisica Optica Y Electronica
Arias-Estrada, Miguel	Inst. Nacional De Astrofísica, Óptica Y Electrónica (INAOE)
Mayol-Cuevas, Walterio	Univ. of Bristol
Keywords: Segmentation, features and descriptors, 2D/3D object detection and recognition, Texture and color analysis Abstract: Recognition of dominant planes is an important task used in areas such as robot navigation, augmented reality, 3-D reconstruction, among others. There are several approaches for recognizing planar structures, however, most of these approaches are based on processing two or more images captured from different camera views or on processing 3D data in the form of point clouds associated to the camera images. An alternative is to process a single image seeking to interpret areas of the images where planar structure may be observed, thus removing parallax dependency, but adding the challenge of having to correctly interpret image ambiguities. Motivated by the latter, this work presents initial results of a novel methodology for dominant planes recognition in a single image by combining three key strategies: a learning algorithm, a segmentation scheme and a contour detection method. We constraint our approach to work with interior scenes as an attempt to identify key elements that may help in the recognition process in this sort of scenes. In this sense, our results show a recognition accuracy of 60.17% with an error of 3.14%, which indicate the feasibility of our approach.

15:00-17:10, Paper TuPT3.3
Efficient Segmentation for Region-Based Image Retrieval Using Edge Integrated Minimum Spanning Tree
Liu, Yang	Beihang Univ
Huang, Lei	Beihang Univ
Wang, Siqi	Beihang Univ
Liu, Xianglong	Beihang Univ
Lang, Bo	Beihang Univ
Keywords: Segmentation, features and descriptors, Content based image retrieval and data mining Abstract: Region-based Image Retrieval (RBIR), which bases itself on image segmentation rather than global features or key-point-based local features, is a branch of Content-based Image Retrieval. This paper proposes a novel RBIR-oriented image segmentation algorithm named Edge Integrated Minimum Spanning Tree (EI-MST). The difference between EI-MST and the traditional MST-based methods is that EI-MST generates MSTs over edge-maps rather than the original images, which achieved high retrieval performance cooperating with state-of-the-art matching strategies. In addition, by limiting the nodes in every MST with adaptive scale selection, EI-MST is efficient especially when processing high resolution images. The experiments on four popular public datasets proved that, EI-MST is capable of achieving higher retrieval accuracy over four widely used segmentation methods while only consuming moderate amount of time in both online and offline parts of RBIR systems.

15:00-17:10, Paper TuPT3.4
Weakly-Supervised Segmentation by Combining CNN Feature Maps and Object Saliency Maps
Wataru, Shimoda	Univ. Electro-Comm., Tokyo
Yanai, Keiji	The Univ. of Electro-Commnications, Tokyo
Attachments: Supplementary material Keywords: Segmentation, features and descriptors, Deep learning Abstract: In general, CNN based semantic segmentation methods assume pixel-wise annotation is available, which is costly to obtain in general. On the other hand, image-level annotations is much easier to obtain than pixel-level annotation. Then, in this work, we focus on weakly-supervised semantic segmentation which is known as task of using training data with only imagelevel annotations. In this work, we propose a new CNN-based semantic segmentation method which uses both and activation features extracted from forwarding of the CNN and object saliency maps obtained by back-propagation. As a CNN, we use the VGG-16 pre-trained with 1000-class ILSVRC datasets and fine-tuned it with multilabel training using only image-level labeled dataset. By the experiments, we show that the proposed method achieved nearly state-of-the-art results with the PASCAL VOC 2012 dataset.

15:00-17:10, Paper TuPT3.5
Affine Properties of the Relative Position PHI-Descriptor
Matsakis, Pascal	Univ. of Guelph
Keywords: Segmentation, features and descriptors, Image and video analysis and understanding Abstract: Colour, texture, shape, and relative position descriptors are fundamental visual descriptors. In particular, a relative position descriptor is a quantitative representation of the relative position of two spatial objects, and a basis from which models of spatial relationships (like inside, above, around, near) can be derived. The affine properties of visual descriptors have been the subject of much attention. Here, we focus on the PHI-descriptor, introduced in recent work. We show that it is a relative position descriptor with remarkable affine properties, and we illustrate these properties with two experiments.

15:00-17:10, Paper TuPT3.6
Optical Flow Co-Occurrence Matrices: A Novel Spatiotemporal Feature Descriptor
Caetano, Carlos Antônio	Univ. Federal De Minas Gerais
dos Santos, Jefersson Alex	Univ. Federal De Minas Gerais
Schwartz, William	Federal Univ. of Minas Gerais
Keywords: Segmentation, features and descriptors, Image and video analysis and understanding, Motion, tracking and video analysis Abstract: Suitable feature representation is essential for performing video analysis and understanding in applications within the smart surveillance domain. In this paper, we propose a novel spatiotemporal feature descriptor based on co-occurrence matrices computed from the optical flow magnitude and orientation. Our method, called Optical Flow Co-occurrence Matrices (OFCM), extracts a robust set of measures known as Haralick features to describe the flow patterns by measuring meaningful properties such as contrast, entropy and homogeneity of co-occurrence matrices to capture local space-time characteristics of the motion through the neighboring optical flow magnitude and orientation. We evaluate the proposed method on the action recognition problem by applying a visual recognition pipeline involving bag of local spatio-temporal features and SVM classification. The experimental results, carried on three well-known datasets (KTH, UCF Sports and HMDB51), demonstrate that OFCM outperforms the results achieved by several widely employed spatio-temporal feature descriptors such as HOF, HOG3D and MBH, indicating its suitability to be used as video representation.

15:00-17:10, Paper TuPT3.7
Robust Feature Matching by Learning Descriptor Covariance with Viewpoint Synthesis
Taira, Hajime	Tokyo Inst. of Tech
Torii, Akihiko	Tokyo Inst. of Tech
Okutomi, Masatoshi	Tokyo Inst. of Tech
Keywords: Segmentation, features and descriptors, Image based modeling, Reconstruction and camera motion estimation Abstract: For images taken from very different viewpoints, we propose a new feature matching algorithm that provides accurate matches while preserving high matchability. Our method first synthesizes images by simulating the viewpoint changes. It then learns variation of local feature descriptors induced by the viewpoint changes. Finally, we robustly match feature descriptors by measuring the similarity using the learned variation. Our method is particularly useful for matching new query images to target image archived in a database. We demonstrate the benefits of the proposed method in terms of accuracy and computational time through experiments using several wide-baseline image datasets.

15:00-17:10, Paper TuPT3.8
Object Segmentation Using Low-Rank Representation with Multiple Block-Diagonal Priors
Dai, Lingzheng	Nanjing Univ. of Science and Tech
Ding, Jundi	Nanjing Univ. of Science and Tech
Chen, Jinhui	Nanjing Univ. of Science and Tech
Li, Jun	Nanjing Univ. of Science and Tech
Yang, Jian	Nanjing Univ. of Science and Tech
Keywords: Segmentation, features and descriptors, Low-level vision Abstract: This paper addresses the problem of segmenting objects for natural images by leveraging multiple segmentation methods. Existing image segmentation algorithms mostly partition the image into some coherent segments instead of extracting the object entirely. We observe that basic elements (e.g., superpixels) in the common segment produced by many methods are highly-correlated -- they belong to the same object and generally lead to a low-rank structure. To make use of this information, this paper presents a novel approach to learn a low-rank affinity by leveraging various algorithms for segmenting the objects. First, starting from the segments produced using various algorithms, the label information for basic elements of an image, segmentation-to-superpixel label, is generated. Second, the derived labels are used to construct the multiple block-diagonal priors which are integrated into the subsequent affinity learning process. Finally the segmentation result is achieved by applying the spectral clustering technique on the obtained affinity matrix. Comprehensive experiments on MSRC, Alpert's and Berkeley segmentation datasets validate that our proposed approach achieves superior results as compared to each individual method. Moreover, our method is shown to be competitive in comparison to the state-of-the-art methods.

15:00-17:10, Paper TuPT3.9
USEQ: Ultra-Fast Superpixel Extraction Via Quantization
Huang, Chun-Rong	National Chung Hsing Univ
Wang, Wei-An	National Chung Hsing Univ
Lin, Szu-Yu	National Chung Hsing Univ
Lin, Yen-Yu	Acad. Sinica
Attachments: Supplementary material Keywords: Segmentation, features and descriptors, Low-level vision Abstract: We propose a novel superpixel extraction method named USEQ to generate regular and compact superpixels. To reduce the computational burden of iterative optimization procedures used in most recent approaches, the spatial and color quantizations are performed in advance to represent pixels and superpixels. Maximum a posteriori estimation in both pixel and region levels is then adopted to aggregate pixels into spatially and visually coherent superpixels. The resultant superpixels are extremely efficient to generate and can more precisely adhere to object boundaries. Compared to the state-of-the-art approaches to superpixel extraction, USEQ can achieve better or competitive performance in terms of boundary recall, undersegmentation error and achievable segmentation accuracy, and is significantly faster than these approaches.

15:00-17:10, Paper TuPT3.10
Robust Region Extraction of Moving Objects in Dynamic Background
Mori, Shun	NTT Comware Corp
Kasahara, Yuya	Tohoku Univ
Abe, Toru	Tohoku Univ
Suganuma, Takuo	Tohoku Univ
Keywords: Segmentation, features and descriptors, Pattern Recognition for Surveillance and Security, Scene understanding Abstract: We propose a novel background subtraction method for robust region extraction of moving objects in the dynamic background. In our method, a set of recently observed frame (reference) images is used as a background model. To withstand the constant fluctuations of background appearance, a current frame (input) image is compared with every reference image, and pixels in the input image deviated from those in the reference images are extracted as moving object regions. Such a simply-structured background model enables our method to adaptively vary the size of a spatial block for comparing the input image with the reference images. Setting an appropriate size block for each subject pixel in the input image, our method increases the stability of image comparison, and then improves the robustness of region extraction. Experimental results indicate that our method outperforms the existing methods in the region extraction accuracy of moving objects in the dynamic background.

15:00-17:10, Paper TuPT3.11
Feature Descriptor Based on Local Intensity Order Relations of Pixel Group
Liao, Wen-Hung	National Chengchi Univ
Wu, Chia-Chen	Dept. of Computer Science, National Chengchi Univ
Lin, Ming-Ching	Dept. of Computer Science, National Chengchi Univ
Keywords: Segmentation, features and descriptors, Representation and analysis in pixel/voxel images, Image and video analysis and understanding Abstract: Robust image features are essential in building effective image recognition engines. These features can be constructed according to various principles, such the distribution of local gradients (Histogram of Oriented Gradients, HOG), the relationship between two pixels (Local Binary Descriptors, LBD), or local intensity order statistics (Local Intensity Order Patterns, LIOP). Because the feature dimension grows quickly as one considers the ordering relations of a group of N (N>2) pixels, few researchers have exploited local order statistics among a pixel set to define an image feature. In this paper, we propose a novel approach to construct a feature descriptor using local intensity order relations (LIOR) in a pixel group. In contrast to LIOP where the feature dimension increases drastically with the number of elements in a set, the size of LIOR is manageable. Moreover, LIOR ensures the stability of ordering by encoding the intensity differences as weights. Two different strategies for assigning the weights have been devised and tested. Experimental results indicate that the proposed methods yield better or comparable performance for different types of image degradation when compared to the original LIOP. Additionally, the storage requirement is significantly lower when the number of pixels in a group increases.

15:00-17:10, Paper TuPT3.12
EISeg: Effective Interactive Segmentation
Xian, Min	Utah State Univ
Xu, Fei	Utah State Univ
Cheng, Heng-Da	Utah State Univ
Zhang, Yingtao	Harbin Inst. of Tech
Ding, Jianrui	Harbin Inst. of Tech
Keywords: Segmentation, features and descriptors, Signal, image and video processing Abstract: Interactive image segmentation is a popular and challenging task. User interactions, e.g., setting seeds or specifying bounding box, play a critical role in determining the performance of all interactive segmentation approaches. However, most methods focus on improving segmentation performance by integrating higher level information; and to the best of our knowledge, no work has been done to improve the effectiveness of user interactions yet. In this paper, we propose the effective interactive segmentation (EISeg) method based on Neutro-Connectedness, which provides user with objective visual clues for guiding interactions. The experiments demonstrate that the proposed EISeg method guides interaction effectively, and achieves better results with much less user interaction (averagely 2.3 foreground and 1.8 background seeds/image) than state-of-the-art approaches.

15:00-17:10, Paper TuPT3.13
Comparative Study of Descriptors with Dense Key Points
Chatoux, Hermine	Univ. of Poitiers, Xlim Lab
Lecellier, François	Univ. of Poitiers, Xlim Lab
Fernandez-Maloigne, Christine	Univ. of Poitiers, Xlim Lab
Keywords: Segmentation, features and descriptors, Signal, image and video processing, Image and video analysis and understanding Abstract: A great deal of features detectors and descriptors are proposed every years for several computer vision applications. In this paper, we concentrate on dense detector applied to different descriptors. Eight descriptors are compared, three from gradient based family (SIFT, SURF, DAISY), others from binary category (BRIEF, ORB, BRISK, FREAK and LATCH). These descriptors are created and defined with certain invariance properties. We want to verify their invariances with various geometric and photometric transformations, varying one at a time. Deformations are computed from an original image. Descriptors are tested on five transformations: scale, rotation, viewpoint, illumination plus reflection. Overall, descriptors display the right invariances. Objective of this paper is to establish a reproducible protocol to test descriptors invariances.

15:00-17:10, Paper TuPT3.14
Bags of Spatial Relations and Shapes Features for Structural Object Description
Clément, Michaël	Univ. Paris Descartes
Kurtz, Camille	Univ. Paris Descartes
Wendling, Laurent	Univ. Paris Descartes
Keywords: Segmentation, features and descriptors, Statistical, syntactic and structural pattern recognition, Image and video analysis and understanding Abstract: We introduce a novel bags-of-features framework based on relative position descriptors, modeling both spatial relations and shape information between the pairwise structural subparts of objects. First, we propose a hierarchical approach for the decomposition of complex objects into structural subparts, as well as their description using the concept of Force Histogram Decomposition (FHD). Then, an original learning methodology is presented, in order to produce discriminative hierarchical spatial features for object classification tasks. The cornerstone is to build an homogeneous vocabulary of shapes and spatial configurations occurring across the objects at different scales of decomposition. An advantage of this learning procedure is its compatibility with traditional bags-of-features frameworks, allowing for hybrid representations of both structural and local features. Classification results obtained on two datasets of images highlight the interest of this approach based on hierarchical spatial relations descriptors to recognize structured objects.

15:00-17:10, Paper TuPT3.15
Multiscale Line Segment Detector for Robust and Accurate SfM
Salaun, Yohann	LIGM,
Marlet, Renaud	École Des Ponts ParisTech
Monasse, Pascal	Ec. Des Ponts ParisTech
Keywords: Segmentation, features and descriptors, Stereo and multiple view geometry Abstract: We propose a multiscale extension of a well-known line segment detector, LSD. We show that its multiscale nature makes it much less prone to over-segmentation, more robust to low contrast and less sensitive to noise, while keeping the parameterless advantage of LSD and still being fast. Moreover, we show that in scenes with little or no feature points, but where it is however possible to perform structure from motion from matched line segments, the accuracy is significantly improved. This provides an objective and automatic quantitative assessment of our detector that goes much beyond the usual qualitative visual inspection found in the literature.

15:00-17:10, Paper TuPT3.16
An Integrated Descriptor for Texture Classification
Nguyen, Vu-Lam	Univ. of Cergy-Pontoise
Vu, Ngoc-Son	ENSEA
Phan, Hai-Hong	Univ. of Cergy-Pontoise
Gosselin, Philippe Henri	ETIS / ENSEA - Univ. De Cergy-Pontoise - CNRS
Attachments: Supplementary material Keywords: Segmentation, features and descriptors, Texture and color analysis, Signal, image and video processing Abstract: Regarding texture features, Local-based methods such as Local Binary Pattern (LBP) and its variants are computationally efficient high-performing but sensitive to noise, and suffering global structure information loss. By contrast, filter-based counterparts, the scattering transform for instance, are tolerant to noise and translation but often lack of small local structure information. In this paper we propose an integration of those to take full advantages of both local and global features. In this way, LBP is used for extracting local features while the Scattering Transform feature plays the role of a global descriptor. In addition to the combination of these two state-of-the-art features, we further integrate a new preprocessing technique called biologically-inspired filtering(BF) as well as an efficient PCA classifier. Intensive experiments conducted on many texture benchmarks such as CUReT, UIUC, KTH-TIPS2b, and OUTEX show that our combined method not only outweighs each one which stands alone but also competes with state-of-the-art on the experimented datasets.

15:00-17:10, Paper TuPT3.17
Learning Rotation Invariant Convolutional Filters for Texture Classification
Marcos, Diego	Univ. of Zurich
Volpi, Michele	Univ. of Zurich
Tuia, Devis	Univ. of Zurich
Keywords: Texture and color analysis, Artificial neural networks, Representation and analysis in pixel/voxel images Abstract: We present a method for learning discriminative filters using a shallow Convolutional Neural Network (CNN). We encode rotation invariance directly in the model by tying the weights of groups of filters to several rotated versions of the canonical filter in the group. These filters can be used to extract rotation invariant features well-suited for image classification. We test this learning procedure on a texture classification benchmark, where the orientations of the training images differ from those of the test images. We obtain results comparable to the state-of-the-art. Compared to standard shallow CNNs, the proposed method obtains higher classification performance while reducing by an order of magnitude the number of parameters to be learned.

15:00-17:10, Paper TuPT3.18
Virtual Hexagonal and Multi-Scale Operator for Fuzzy Rank Order Texture Classification Using One-Dimensional Generalised Fourier Analysis
Brandtberg, Tomas	Linkoping Univ
Keywords: Texture and color analysis, Classification and clustering, Statistical, syntactic and structural pattern recognition Abstract: This paper presents a study on a family of local hexagonal and multi-scale operators useful for texture analysis. The hexagonal grid shows an attractive rotation symmetry with uniform neighbour distances. The operator depicts a closed connected curve (1D periodic). It is resized within a scale interval during the conversion from the original square grid to the virtual hexagonal grid. Complementary image features, together with their tangential first-order hexagonal derivatives, are calculated. The magnitude/phase information from the Fourier or Fractional Fourier Transform (FFT, FrFT) are accumulated in thirty different Cartesian (polar for visualisation) and multi-scale domains. Simultaneous phase-correlation of a subset of the data gives an estimate of scaling/rotation relative the references. Similarity metrics are used as template matching. The sample, unseen by the system, is classified into the group with the maximum fuzzy rank order. An instantiation of a 12-point hexagonal operator (radius=2) is first successfully evaluated on a set of thirteen Brodatz images (no scaling/rotation). Then it is evaluated on the more challenging KTH-TIPS2b texture dataset (scaling/rotation, varying pose/illumination). A confusion matrix and cumulative fuzzy rank order summaries show, for example, that the correct class is top-ranked 44-50% and top-three ranked 68-76% of all sample images. A similar evaluation, using a box-like 12-point mask of square grids, gives overall lower accuracies. Finally, the FrFT parameter is an additional tuning parameter influencing the accuracies significantly.

15:00-17:10, Paper TuPT3.19
Three-Dimensional Gaussian Mixture Texture Model
Haindl, Michael	Inst. of Information Theory and Automation
Havlicek, Vojtech	Inst. of Information Theory and Automation
Keywords: Texture and color analysis, Enhancement, restoration and filtering, Coding, compression and super-resolution Abstract: Visual texture modeling based on multidimensional mathematical models is the prerequisite for both robust material recognition as well as for image restoration, compression or numerous physically correct virtual reality applications. A novel multispectral visual texture modeling method based on a descriptive, unusually complex, three-dimensional, spatial Gaussian mixture model is presented. Texture synthesis benefits from easy computation of arbitrary conditional distributions from the model. The model is inherently multispectral thus it does not suffer with the spectral quality compromises of the spectrally factorized alternative approaches. The model is especially well suited for multispectral textile textures and it can also describe the most advanced textural representation in the form of a bidirectional texture function (BTF).

15:00-17:10, Paper TuPT3.20
Texture Classification with Discrete Fractional Fourier Transform
Zheng, Liying	Harbin Engineering Univ
Chu, Yan	Harbin Engineering Univ
Keywords: Texture and color analysis, Segmentation, features and descriptors, Classification and clustering Abstract: Based on the fact that the discrete fractional Fourier transform (DFrFT) of a signal consists of not only spatial but frequency characteristics, a novel texture classification algorithm is presented in this paper. A 1-D local DFrFT is proposed and the fractional frequency histogram is established to represent a texture. Moreover, the conceptions such as the between-class scatter and the within-class scatter, the coarse-to-fine searching, and the minimum χ2-statistic distance are also adopted to complete the classification. The simulation results on four groups of benchmark texture images validate the performance of the proposed texture classification method.

15:00-17:10, Paper TuPT3.21
Texture Image Segmentation Using Fused Features and Active Contour
Gao, Mingqi	Chongqing Univ
Chen, Hengxin	Chongqing Univ
Zheng, Shenhai	Chongqing Univ
Fang, Bin	Chongqing Univ
Zhang, Lin	Chongqing Univ
Keywords: Texture and color analysis, Segmentation, features and descriptors, Signal, image and video processing Abstract: This paper introduces an effective active contour model for texture segmentation. To improve the robustness against noise and illumination, a novel descriptor named local statistical variation degree (LSVD) is presented to express textural features, which uses corner point deletion and isolated region detection operations to eliminate image patches unrelated with object regions. And then the fused features combined LSVD with Gabor can be constructed to express image structure in many scene. During the texture segmentation stage, a factorization based fitting energy is proposed to measure the weights of representative features in the features computed from image regions. This fitting energy can be used to localize region boundary more accurately. Moreover, a boundary shrinking method is put forward to improve the reliability of representative features. By comparing our proposed method with the recent texture segmentation models on synthetic images and natural images, we demonstrate that the novel active contour model can obtain accurate segmentation results and is robust to noise, illumination and position of initial contour.


TuPT4	Poster Session Hall
TuP4	Poster Session

15:00-17:10, Paper TuPT4.1
Spatio-Temporal Pattern Recognition of Dendritic Spines and Protein Dynamics Using Live Multichannel Fluorescence Microscopy
On, Vincent	Univ. of California, Riverside
Zahedi, Atena	Univ. of California Riverside
Ethell, Iryna	Univ. of California Riverside
Bhanu, Bir	Univ. of California
Keywords: Pattern Recognition for Bioinformatics, Biological image and signal analysis, Modeling, simulation and visualization Abstract: Actin-regulating proteins, such as cofilin, are essential in regulating the shape of dendritic spines, and synaptic plasticity in both neuronal functionality as well as in neurodegeneration related to aging. The analysis of the motility of cofilin in fluorescence video-microscopy allows the discovery of its effects on cell functions. However, the flow of cofilin has not been analyzed to date by automatic means. This paper presents a novel automated pattern recognition system to analyze protein trafficking in neurons. Using spatio-temporal information present in multichannel fluorescence videos, the system generates a temporal maximum intensity projection that enhances the signal-to-noise ratio of important biological structures, segments and tracks dendritic spines, and quantifies the flux and density of proteins in spines. The temporal dynamics of spines is used to generate spine energy images which are used to automatically classify the shape of dendritic spines as stubby, mushroom, or thin. By tracking these spines over time and using their intensity profiles, the system is able to analyze the flux patterns of cofilin and other fluorescently stained proteins. The cofilin flux patterns is found to be correlated with the dynamically changing dendritic spine shapes. The results are presented using multichannel fluorescence videos.

15:00-17:10, Paper TuPT4.2
Contextual Similarity Regularized Metric Learning for Person Re-Identification
Wang, Jin	School of Automation, Huazhong Univ. of Science and Tech
Zheng, Wang	School of Computer, Wuhan Univ
Gao, Changxin	Huazhong Univ. of Science and Tech
Junkang, Zhu	School of Physics, Nanjing Univ
Sang, Nong	Huazhong Univ. of Science and Tech
Huang, Rui	School of Automation, Huazhong Univ. of Science and Tech
Keywords: Pattern Recognition for Bioinformatics, Machine learning and data mining, Image and video analysis and understanding Abstract: Person re-identification, aiming to match a specific person among non-overlapping cameras, has attracted plenty of attention in recent years. It can be regarded as a visual retrieval task, namely given a query person image, ranking all gallery images according to their similarities to the query. Conventionally, this similarity function is learnt by forcing intra-distances to be small while inter-distances to be large, which are referred to as individual similarity constraints. In this paper, we propose to learn the similarity function by taking into account of both individual similarity constraints and contextual similarity constraints. The context of a query is defined as its k-nearest neighbors in the gallery. We argue that if two images are from the same person, apart from the visual likeness between them, denoted as the individual similarity, they should also possess similar k-nearest neighbors in the gallery, denoted as the contextual similarity. Motivated by this assumption, we propose a new Contextual Similarity Regularized Metric Learning (CSRML) method for person re-identification. The contextual similarity regularization term forces two images of the same person to share similar context. Both individual and contextual similarity constraints are encoded by a large margin logistic loss function and the final problem is solved by the stochastic gradient descent algorithm. Experiments on the challenging VIPeR and CUHK01 datasets show that our approach achieves very competitive performance.

15:00-17:10, Paper TuPT4.3
Pattern Localization in Historical Document Images Via Template Matching
En, Sovann	Univ. of Rouen
Petitjean, Caroline	Univ. De Rouen
Nicolas, Stephane	Univ. of Rouen
Heutte, Laurent	Univ. De Rouen
Jurie, Frederic	Univ. of Caen
Keywords: Pattern Recognition for Search, Retrieval and Visualization, Content based image retrieval and data mining, Pattern Recognition for Art, Cultural Heritage and Entertainment Abstract: Template matching is a classical and essential step in many pattern recognition, object detection or video tracking systems. This paper aims at integrating and evaluating different template matching methods in the context of pattern spotting in historical document images --- i.e. the search for occurrences of a given visual pattern in document images. Given a query image, our pattern spotting system first computes the similarity score between the query signature and the signatures of a few regions provided by a region proposal algorithm. The top ranked regions are then selected for further processing. Template Matching is then applied in the neighborhood of the selected regions to precisely locate and rank the candidate windows that maximize the matching score. This paper builds upon popular template matching approaches and provides a unified testing framework for historical document image pattern spotting. The experimentations offer an insight on how to choose the most promising techniques for historical document images. This paper also proposes an improvement over these standard template matching approaches to significantly increase the overall performance.

15:00-17:10, Paper TuPT4.4
Human Activity Prediction Based on Sub-Volume Relationship Descriptor
Lee, Dong-Gyu	Korea Univ
Lee, Seong-Whan	Korea Univ
Keywords: Pattern Recognition for Surveillance and Security, Human body motion and gesture based interaction Abstract: In this paper, we address the problem of recognizing unfinished human activity from partially observed videos. Specifically, we propose a novel human activity descriptor, which can represent pairwise relationships among human activities in a compact manner using pre-trained Convolutional Neural Networks (CNNs) by capturing the discriminative sub-volume. The potentially important relationship among all pairwise subvolumes, called key-volumes, is automatically captured using global and local motion activation and the ratio of the participant. The captured key-volumes without prior knowledge hold discriminative information related to the unfinished activity. The key-volume information is considered in the descriptor construction procedure. Training a CNN model for a particular purpose requires a lot of resources, such as large amount of labeled data and computing power, despite its representational power. Thus, we develop a method to utilize pre-trained CNN without any additional model training procedure. The low-level features can be extracted through existing CNN toolkits. For a real application, the proposed method may be more costeffective while implementing a smart surveillance system to understand human activity. In our experiments, we compare the performances of the proposed method with other state-of-theart human activity prediction methods for two public datasets; the results of the experiments show that the proposed method outperforms these competing methods.

15:00-17:10, Paper TuPT4.5
Granular Trajectory Based Anomaly Detection for Surveillance
Maiorano, Francesco	Univ. of Naples "Parthenope"
Petrosino, Alfredo	Univ. of Naples "Parthenope"
Keywords: Pattern Recognition for Surveillance and Security, Machine learning and data mining, Classification and clustering Abstract: Surveillance systems are widely spread in many public and private places of social life such as streets, offices, universities, hospitals and parking lots. The amount of data to be processed grows together with the need of skilled experts able to interpret it. For this reason, the scientific community puts a great effort in conceiving approaches able to detect anomalies automatically. Whichever the systems that gather the data, be they heat sensors, cameras or gps trackers, anomalous event detection must relies on approaches able to promptly detect suspicious activities. To tackle this problem, we present an online point-wise approach to the analysis of spatio-temporal (trajectories) datasets in order to detect anomalous behaviors. We model trajectory dataset points as a Rough Set and use the ROSE outlier detection algorithm, performing a sliding temporal window scan to detect anomalies in real time. We demonstrate the robustness of this method using the CUHK crowd and the University of Udine trajectory datasets, which provide different combinations of trajectory scenarios: one group or multiple groups, one anomaly or multiple anomalies.

15:00-17:10, Paper TuPT4.6
Mutual Interaction Model between the Number of People in Real Space and the Number of Tweets in Virtual Space
Onishi, Masaki	NationalInstituteofAdvancedIndustrialScienceandTechnology(AIST)
Nakashima, Shinnosuke	Japan Advanced Inst. of Science and Tech. (JAIST)
Keywords: Pattern Recognition for Surveillance and Security, Machine learning and data mining, Multimedia analysis, indexing and retrieval Abstract: The relationship between human behavior and SNS (social networking service) activity is growing stronger every year as SNS websites such as Twitter and Facebook develop and expand. Accordingly, a significant number of studies related to SNS have also been reported. However, the majority of this research, such as that which examines the transmission of information via SNS websites, focuses entirely on the virtual world of SNS and few studies have established connections between virtual and real space. This paper focuses on one SNS website, Twitter, and proposes a method for analyzing the statistical relationship between the number of tweets posted on Twitter and the number of people visiting a real world location in the form of a mutual interaction model consisting of nodes and links. We conducted an experiment to verify the proposed method, in which we compared measurements of the number of passers-by in Akihabara with the number of tweets posted.

15:00-17:10, Paper TuPT4.7
Location-Associated Indoor Behavior Analysis of Multiple Persons
Suzuuchi, Syota	Graduate School of Information Science and Tech. Hokkaido U
Kudo, Mineichi	Hokkaido Univ
Keywords: Pattern Recognition for Surveillance and Security, Motion, tracking and video analysis Abstract: Recording of the activity of people working in an office or in a living room is important for several goals: to design an evacuation route, to measure the degree of ADL (Activity of Daily Living) of single-living elderly persons, and to analyze the working contents of people, and so on. Camera systems are available for these goals, but they are weak for the light condition change (not available at dark in general) and invasive for the users’ privacy. Therefore, we use an infrared ceiling sensor network for localizing moving people and use some extra pieces of evidence for identifying them. Such ID evidence is taken from a finger-vein authentication system at the entrance/exit and from the staying duration at their personal desks. The kind of activity of individuals is determined by the locations and the staying duration related to those activities, e.g., spending two minutes or more at kitchen means “cooking.” A challenging task is to find how to recover missing IDs and related their activities. This sometimes happens when two or more persons cross or meet at a place or someone sits at a desk silently, because a motion sensor reacts only when something with human temperature “moves” in the focus of view. This system succeeded in recording of the activity of three persons for three hours at 87.4% and six persons for seven hours at 15.0%.

15:00-17:10, Paper TuPT4.8
Person Re-Identification Using Co-Occurrence Attributes of Physical and Adhered Human Characteristics
Nishiyama, Masashi	Tottori Univ
Nakano, Shota	Tottori Univ
Yotsumoto, Tappei	Tottori Univ
Yoshimura, Hiroki	Tottori Univ
Iwai, Yoshio	Tottori Univ
Sugahara, Kazunori	Tottori Univ
Keywords: Pattern Recognition for Surveillance and Security, Soft biometrics, Other Biometric applications Abstract: We propose a novel method for extracting features from images of people using co-occurrence attributes, which are then used for person re-identification. Existing methods extract features based on simple attributes such as gender, age, hair style, or clothing. Our method instead extracts more informative features using co-occurrence attributes, which are combinations of physical and adhered human characteristics (e.g., a man wearing a suit, 20-something woman, or long hair and wearing a skirt). Our co-occurrence attributes were designed using prior knowledge of methods used by public websites that search for people. Our method first trains co-occurrence attribute classifiers. Given an input image of a person, we generate a feature by vectorizing confidences estimated using the classifiers and compute a distance between input and reference vectors with a metric learning technique. Our experiments using a number of publicly available datasets show that our method substantially improved the matching performance of the person re-identification results, when compared with existing methods. We also demonstrated how to analyze the most important co-occurrence attributes.

15:00-17:10, Paper TuPT4.9
Kernel Hierarchical PCA for Person Re-Identification
Prates, Raphael	Univ. Federal De Minas Gerais (UFMG)
Schwartz, William	Federal Univ. of Minas Gerais
Keywords: Pattern Recognition for Surveillance and Security, Support vector machines and kernel methods, Dimensionality reduction and manifold learning Abstract: Person re-identification (Re-ID) maintains a global identity for an individual while he moves along a large area covered by multiple cameras. Re-ID enables a multi-camera monitoring of individual activity that is critical for surveillance systems. However, the low-resolution images combined with the different poses, illumination conditions and camera viewpoints make person Re-ID a challenging problem. To reach a higher matching performance, state-of-the-art methods map the data to a nonlinear feature space where they learn a cross-view matching function using training data. Kernel PCA is a statistical method that learns a common subspace that captures most of the variability of samples using a small number of vector basis. However, Kernel PCA disregards that images were captured by distinct cameras, a critical problem in person Re-ID. Differently, Hierarchical PCA (HPCA) captures a consensus projection between multiblock data (e.g, two camera views), but it is a linear model. Therefore, we propose the Kernel Hierarchical PCA (Kernel HPCA) to tackle camera transition and dimensionality reduction in a unique framework. To the best of our knowledge, this is the first work to propose a kernel extension to the multiblock HPCA method. Experimental results demonstrate that Kernel HPCA reaches a matching performance comparable with state-of-the-art nonlinear subspace learning methods at PRID450S and VIPeR datasets. Furthermore, Kernel HPCA reaches a better combination of subspace learning and dimensionality requiring significantly lower subspace dimensions.

15:00-17:10, Paper TuPT4.10
Building Semantic Understanding Beyond Deep Learning from Sound and Vision
Souza, Fillipe Dias Moreira de	Univ. of South Florida
Camara Chavez, Guillermo	Federal Univ. of Ouro Preto
Sarkar, Sudeep	Univ. of South Florida
Keywords: Statistical, syntactic and structural pattern recognition, Image and video analysis and understanding, Deep learning Abstract: Deep learning-based models have recently been widely successful at outperforming traditional approaches in several computer vision applications such as image classification, object recognition and action recognition. However, those models are not naturally designed to learn structural information that can be important to tasks such as human pose estimation and structured semantic interpretation of video events. In this paper, we demonstrate how to build structured semantic understanding of audio-video events by reasoning on multiple-label decisions of deep visual models and auditory models using Grenander's structures for imposing semantic consistency. The proposed structured model does not require joint training of the structural semantic dependencies and deep models. Instead they are independent components linked by Grenander's structures. Furthermore, we exploited Grenander's structures as a means to facilitate and enrich the model with fusion of multimodal sensory data; in particular, auditory features with visual features. Overall, we observed improvements in the quality of semantic interpretations using deep models and auditory features in combination with Grenander's structures, reflecting as numerical improvements of up to 11.5% and 12.3% in precision and recall, respectively.

15:00-17:10, Paper TuPT4.11
An Edge-Based Matching Kernel on Commute-Time Spanning Trees
Bai, Lu	Central Univ. of Finance and Ec
Cui, Lixin	School of Information, Central Univ. of Finance and Ec
Escolano, Francisco	Univ. of Alicante
Hancock, Edwin	Univ. of York
Keywords: Statistical, syntactic and structural pattern recognition, Machine learning and data mining Abstract: Bai and Hancock recently proposed a novel edge-based matching kernel for graphs~cite{DBLP:conf/caip/BaiZWH15}, by aligning depth-based representations. Unfortunately, one drawback arising in their kernel is the computational inefficiency for large graphs. This follows the fact that their kernel is essentially defined on directed line graphs. Moreover, the computational complexity of the kernel is cubic in the vertex number of the line graph. Since the directed line graph is a dual representation and each vertex represents a directed arc residing on the edge of the original graph. For a graph having n vertices, there may be at most n(n-1) directed arcs residing on the edges and thus at most n(n-1) vertices in the directed line graph. As a result, computing the kernel through the line graphs may require time complexity O(n^6) for the worst case, making the kernel unapplicable for graphs having hundreds of vertices. The aim of this paper is to overcome this inefficiency, by proposing a new edge-based matching kernel. In order to cope with large graph structures, we propose to construct a sparser version of the original graph using the simplification method introduced in~cite{DBLP:journals/pr/QiuH07}. More specifically, we compute the minimum spanning tree over the commute time matrix of a graph. This spanning tree representation minimizes the number of edges of the original graph while preserving most of its structural information. With this simplification method to hand, the new edge-based matching kernel between two graphs is then computed on the directed line graphs transformed from their respective minimum spanning trees. We show that this strategy significantly reduces the computational complexity to O(n^3). We evaluate the performance of the proposed kernels on several standard graph datasets. The experimental results demonstrate the effectiveness and efficiency.

15:00-17:10, Paper TuPT4.12
Look Who Is NOT Talking: Assessing Engagement Levels in Panel Conversations
Cote, Melissa	Univ. of Victoria
Dash, Amanda	Univ. of Victoria
Branzan Albu, Alexandra	Univ. of Victoria
Keywords: Group interaction: analysis of verbal and non-verbal communication, Gesture and Behavior Analysis Abstract: Nonverbal cues constitute a significant part of human communication. Traditionally the object of psychology, nonverbal communication studies now permeate fields such as social signal processing and human computer interaction. The ubiquity of digital recordings of human social interactions and of free sharing platforms offers many opportunities for the automated analysis of group interaction dynamics; yet, most research relies on multimodal cues and strict setups, which are incompatible with this vast pool of video data. In this paper, we focus on the automatic identification of non-talking participants in videos of panel conversations acquired in uncontrolled environments, based solely on visual nonverbal cues. Our approach characterizes human body motion with a novel feature descriptor based on a non-linear model of pixel change history; motor behavioral patterns derived from this descriptor are then utilized via supervised machine learning to identify non-talking participants in each frame and provide an assessment of the participants' engagement levels. Performance evaluation on a challenging dataset demonstrated the effectiveness of our approach to detect non-speakers, with an overall F-score of 86.2%, as well as its robustness to varied settings. To the best of our knowledge, this is the first attempt at identifying non-talking participants for engagement level assessments from a computer vision viewpoint, which has several relevant applications, such as content-based video retrieval and video summarization.

15:00-17:10, Paper TuPT4.13
Human Group Activity Recognition Based on Modelling Moving Regions Interdependencies
Stephens, Kyle	Univ. of York
Bors, Adrian	Univ. of York
Keywords: Group interaction: analysis of verbal and non-verbal communication, Gesture and Behavior Analysis, Segmentation, features and descriptors Abstract: In this research study, we model the interdependency of actions performed by people in a group in order to identify their activity. Unlike single human activity recognition, in interacting groups the local movement activity is usually influenced by the other persons in the group. We propose a model to describe the discriminative characteristics of group activity by considering the relations between motion flows and the locations of moving regions. The inputs of the proposed model are jointly represented in time-space and time-movement spaces. These spaces are modelled using Kernel Density Estimation (KDE) which is then fed into a machine learning classifier. Unlike in other group-based human activity recognition algorithms, the proposed methodology is automatic and does not rely on any pedestrian detection or on the manual annotation of tracks.

15:00-17:10, Paper TuPT4.14
3D Gesture-Based Interaction for Immersive Experience in Mobile VR
Yousefi, Shahrouz	Linnaeus Univ
Tewele, Mhretab Kidane	Manomotion
Real Delgado, Yeray	ManoMotion
Chana, Julio	Manomotion
Reski, Nico	Linnaeus Univ
Attachments: Supplementary material Keywords: Gesture and Behavior Analysis, Image and video analysis and understanding, Human Computer Interaction Abstract: In this paper we introduce a novel solution for real-time 3D hand gesture analysis using the embedded 2D camera of a mobile device. The presented framework is based on forming a large database of hand gestures including the ground truth information of hand poses and details of finger joints in 3D. For a query frame captured by the mobile device's camera in real time, the gesture analysis system finds the best match from the database. Once the best match is found, the corresponding ground truth information will be used for interaction in the designed interface. The presented framework performs an extremely efficient gesture analysis (more than 30 fps) in flexible lighting condition and complex background with dynamic movement of the mobile device. The introduced work is implemented in Android and tested in Gear VR headset.


TuPT5	Poster Session Hall
TuP5	Poster Session

15:00-17:10, Paper TuPT5.1
Pollen Recognition in Optical Microscopy by Matching Multifocal Image Sequences
Filipovych, Roman	Univ. of Pennsylvania
Daood, Amar	Florida Inst. of Tech
Ribeiro, Eraldo	Florida Inst. of Tech
Bush, Mark	Florida Inst. of Tech
Keywords: Biological image and signal analysis, Classification and clustering, Pattern Recognition for Search, Retrieval and Visualization Abstract: We present a novel technique for pollen identification from sets of multifocal image sequences obtained from optical microscopy. Our algorithm analyzes the visual texture of pollen grains for each focal image, and performs identification using a fast sequence-matching algorithm. Although we develop a pollen-recognition protocol, the method is applicable to other microscopy object-recognition tasks. The proposed method requires little manual interaction, and does not rely on specialized imaging procedures such as florescence and deconvolution. We test our method on images of tropical fossil pollen.

15:00-17:10, Paper TuPT5.3
A Bone Marrow Cavity Segmentation Method Using Wavelet-Based Texture Feature
Shigeta, Hironori	Osaka Univ
Mashita, Tomohiro	Osaka Univ
Kikuta, Junichi	Osaka Univ
Seno, Shigeto	Osaka Univ
Takemura, Haruo	Osaka Univ
Matsuda, Hideo	Osaka Univ
Ishii, Masaru	Osaka Univ
Keywords: Biological image and signal analysis, Texture and color analysis, Segmentation, features and descriptors Abstract: A better understanding of in vivo bio images is expected to contribute to the discovery of new drugs and mechanisms of disease. To improve the contributions of in vivo bioimaging, the extraction of a particular region is required in order to detect a particular cell's motion because manual image processing of a massive number of images is unrealistic. One of the issues for automatic image-segmentation is that conditions of image-taking are variable. Thus, some manual input and/or manual tuning of some parameters is required to adjust each image. To reduce manual operation for image processing of bone marrow cavity segmentation, we focused on the texture pattern of bone marrow cavity. In this paper, we propose a bone marrow cavity segmentation method using support vector machine and wavelet-based texture feature. The proposed method does not require manual inputs to obtain distribution of intensity before processing, because the texture patterns of bone marrow cavity regions are integrated into the system in advance. Moreover, it is applicable to a particular frame in an image sequence in which the condition of fluorescent material is variable because it does not require temporal variation or initial frame for the segmentation. In the experiment, we evaluated our method with nine types of mother wavelets and several sets of scale parameters. The bone marrow cavity segmentation, using graph-cuts with our texture pattern classification, performs well without manual inputs by a user.

15:00-17:10, Paper TuPT5.4
Sub-Classification Strategies for Tree Species Recognition
Ben Ameur, Rihab	Univ. De Savoie
Valet, Lionel	Univ. of Savoie
Coquin, Didier	Univ. De Savoie
Keywords: Classification and clustering Abstract: In the context of tree species recognition, botanists knowledge was used in different works specially when recognising tree species through leaves. In this paper, two sub-classification strategies for tree species recognition are proposed. For each sub-classification strategy, Basic belief assignment (Bba) was determined and obtained data were fused thanks to a totally adaptive fusion system implemented in the general framework of belief functions.

15:00-17:10, Paper TuPT5.6
HEp-2 Specimen Classification Via Deep CNNs and Pattern Histogram
Li, Hongwei	Sun Yat-Sen Univ
Huang, Hao	Sun Yat-Sen Univ
Zheng, Wei-Shi	Queen Mary Univ. of London
Xie, Xiaohua	Sun Yat-Sen Univ
Zhang, Jianguo	Univ. of Dundee
Keywords: Computer-aided detection and diagnosis, Medical image and signal analysis, Biological image and signal analysis Abstract: Automatic classification of Human Epithelial Type-2 (HEp-2) specimen patterns is an important yet challenging problem in medical image analysis. Most prior works have primarily focused on cells images classification problem which is one of the early essential steps in the system pipeline, while less attention has been paid to the classification of whole-specimen ones. In this work, a specimen pattern recognition system combining convolutional neural networks (CNNs) and pattern histogram was proposed. The pattern histograms were obtained based on the prediction of each single cell inside the specimens. Two strategies were designed to predicted the pattern of a whole specimen: 1) the most dominant cell pattern in pattern histogram was represented as the specimen pattern, 2) the pattern histograms were employed as bags of patterns and then were trained and predicted separately by a SVM classifier. Experimental results show that the proposed system is effective and achieves high classification accuracy on public benchmark datasets. We further evaluate the robustness of the proposed framework by testing trained CNNs on another different dataset, demonstrating that the system is robust to inter-lab data.

15:00-17:10, Paper TuPT5.7
A Novel Filterbank Especially Designed for the Classification of Colonic Polyps
Wimmer, Georg	Univ. of Salzburg
Häfner, Michael	St. Elisabeth Hospital
Uhl, Andreas	Univ. of Salzburg
Keywords: Computer-aided detection and diagnosis, Segmentation, features and descriptors, Texture and color analysis Abstract: This work proposes a filter bank based texture analysis method and applies it for the classification of colonic polyps. The filter masks of the filter bank are especially designed to enable a distinction between different types of polyps. The filter bank consists of four types of filters, where each filter is based on the Gaussian distribution respectively derivatives of the Gaussian distribution. Three of the four types of filters are directional sensitive. To achieve rotation invariance, only the maximum and minimum filter responses are accounted over differently oriented version of the directional filters. The final feature vector consists of the histograms over the filter responses. The method is tested on eight HD-endoscopic image databases. Five state-of-the-art approaches are additionally applied to the databases to compare their results with those of our filter bank approach. Experiments show that our proposed method clearly outperforms the state-of-the art approaches.

15:00-17:10, Paper TuPT5.8
L-CNN: Exploiting Labeling Latency in a CNN Learning Framework
Afridi, Muhammad Jamal	Michigan State Univ
Ross, Arun	Michigan State Univ
Shapiro, Erik	Michigan State Univ
Keywords: Medical image and signal analysis, Deep learning, Transfer learning Abstract: A supervised learning system requires labeled data during the training phase. Obtaining labels can be an expensive process, especially in medical imaging applications where a qualified expert may be needed to carefully analyze images and annotate them. This constrains the amount of labeled data available. This study explores the possibility of incorporating labeling behavior (viz., labeling latency) in a supervised convolutional neural network (CNN) framework in order to improve its performance in the presence of limited labeled data. The problem of “spot" detection in MRI scans is considered in this work. In this two-class problem, (a) labeling behavior is available only during the training phase unlike traditional features that are available both during training and testing; and (b) the labeling behavior is associated with only one class (the positive samples) unlike other side information that is available for all classes. To address these issues, a new CNN architecture referred to as L-CNN is designed. The proposed method utilizes the labeling behavior of the expert to cluster the labeled data into multiple categories; a source CNN is then trained to distinguish between these categories. Next, a transfer learning paradigm is used where a target CNN is initialized using this source CNN and its weights updated with the limited labeled data that is available. Experimental results on an existing MRI database show that the proposed L-CNN performs better than a conventional CNN and, further, significantly outperforms the previous state-of-the-art, thereby establishing a new baseline for spot detection in MRI.

15:00-17:10, Paper TuPT5.9
Spatially Constrained Sparse Regression for the Data-Driven Discovery of Neuroimaging Biomarkers
Kumar, Kuldeep	Univ. of Quebec, Ec. De Tech. Superieure
Desrosiers, Christian	Ec. De Tech. Superieure
Chaddad, Ahmad	Ec. De Tech. Superieure
Toews, Matthew	école De Tech. Supérieure
Keywords: Medical image and signal analysis, Machine learning and data mining, Statistical, syntactic and structural pattern recognition Abstract: Sparse multivariate regression techniques like Lasso and Elastic Net are among the most popular approaches for the identification of biomarkers related to brain diseases like Alzheimer’s. Because they use L1 norm to enforce sparsity, these approaches are often sensitive to differences in voxel intensities within the same scan or across subjects. Also, when few samples are available, such approaches can select voxels that are only correlated by chance, leading to disconnected features that do not correspond to any significant brain structure. To address these challenges, we propose a novel sparse regression method that uses the L0 norm for sparse regularization, and imposes spatial consistency constraints on the selected features without requiring an atlas of pre-defined regions. This method uses an efficient optimization strategy based on the Alternating Direction Method of Multipliers (ADMM), that can scale to large data matrices. The performance of the proposed method is evaluated using synthetic data and 3429 T1-weighted (MP-RAGE) images from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. Experimental results show our method to outperform Lasso and Elastic Net regression in the recovery of spatially consistent features corresponding to known neuroimaging biomarkers.

15:00-17:10, Paper TuPT5.10
Exposing and Modeling Underlying Mechanisms in ALS with Machine Learning
Gordon, Jonathan	Ben-Gurion Univ
Lerner, Boaz	Ben-Gurion Univ
Keywords: Medical image and signal analysis, Modeling, simulation and visualization Abstract: We develop methodologies and apply machine-learning algorithms to a database of ALS patients to expose and model underlying mechanisms and relations in the disease. We view the disease state as an ordinal variable (with values between 4 for normal function and 0 for complete loss of function), and show that ordinal classification applied to the data has an advantage over classification that does not utilize the ordinal nature of the domain. To identify important physiological and lab test variables in relation to patient functionality, we rank variables with a decision tree that predicts future disease state using current and past variable instantiations. In addition, we cluster data of patient functionalities in performing daily tasks into higher level groupings of body segments and show how certain variables relate more concretely to certain groupings than to others, thus reducing the dimensionality of the disease state representation in a natural manner that was found to be medically interpretable. Finally, we learn Bayesian networks to detect predictors within the Markov blanket of the disease-state variable and to expose relations among the predictors and with the disease state, as well as to identify value combinations of the predictors that distinguish severe and mild patients.


TuBT1	G.Cancun T1.A
TuPMO1	Oral Session

17:10-17:30, Paper TuBT1.1
Automatic Generation of Biclusters from Gene Expression Data Using Multi-Objective Simulated Annealing Approach
Sahoo, Pracheta	IIT Patna
Acharya, Sudipta	IIT Patna
Saha, Sriparna	IIT Patna
Keywords: Classification and clustering, Pattern Recognition for Bioinformatics, Machine learning and data mining Abstract: The invention of microarray technology aids in the successful monitoring of the gene expression patterns. Biclustering is a method in which a number of co-regulated genes are identified over subset of conditions. Our aim is to detect all the non trivial biclusters having low mean squared residue(MSR) and high row variance. In this paper, we have proposed a multi-objective simulated annealing based solution framework to solve the biclustering problem from gene expression data sets. Two objective functions MSR and row-variance capturing two important properties of biclusters are optimized in parallel using the search capability of multi-objective simulated annealing based optimization technique, AMOSA. A new encoding strategy and several different search operators are defined for fast convergence of the algorithm. We have done experiment on two real-life data sets and obtained results are quantified by using several cluster validity indices. We have compared our obtained results with some state-of-the-art biclustering techniques.

17:30-17:50, Paper TuBT1.2
A Benchmark of Classifiers on Feature Drifting Data Streams
Barddal, Jean Paul	Pontifícia Univ. Católica Do Paraná (PUCPR)
Gomes, Heitor Murilo	Pontificia Univ. Catolica Do Parana
Britto, Alceu	Pontificia Univ. Católica Do Paraná
Enembreck, Fabrício	Pontifícia Univ. Católica Do Parná
Keywords: Classification and clustering, Machine learning and data mining Abstract: The ever increasing data generation confronts both practitioners and researchers on handling massive and sequentially generated amounts of information, the so-called data streams. In this context, a lot of effort has been put on the extraction of useful patterns from streaming scenarios. Learning from data streams embeds a variety of problems, and by far, the most challenging is concept drift, i.e. changes in data distribution. In this paper, we focus on a specific type of drift uncommonly assessed in the literature: feature drifts. Feature drifts occur whenever a subset of features becomes, or ceases to be, relevant to the concept to be learned. We propose and review several feature drifting data stream generators and use them to benchmark state-of-the-art data stream classification algorithms and their combination with drift detectors. Results show that, although drift detectors enable slight quicker recovery to feature drifts, best results are obtained by Hoeffding Adaptive Tree, the only learner that performs dynamic feature selection as streams progress.

17:50-18:10, Paper TuBT1.3
Overcoming Feature Drifts Via Dynamic Feature Weighted K-Nearest Neighbor Learning
Barddal, Jean Paul	Pontifícia Univ. Católica Do Paraná (PUCPR)
Gomes, Heitor Murilo	Pontificia Univ. Catolica Do Parana
Granatyr, Jones	Pontifical Univ. Catholic of Paraná - Graduate on Computer
Britto, Alceu	Pontificia Univ. Católica Do Paraná
Enembreck, Fabrício	Pontifícia Univ. Católica Do Parná
Keywords: Classification and clustering, Machine learning and data mining Abstract: Extracting useful knowledge from data streams is problematic, mainly due to changes in their data distribution, a phenomenon named concept drift. Recently, studies have shown that most of existing algorithms for learning from data streams do not encompass techniques for a specific kind of drift: feature drifts. Feature drifts occur when features become, or cease to be, relevant to the learning task. In this paper, we propose an extension to the k-nearest neighbor classifier, so its distances' computations are weighted according to their current discriminative power. On our proposal, the discriminative power of features is given by entropy, which is swiftly computed over a sliding window. Empirical evidence shows that our approach is able to overcome several existing algorithms in accuracy and feature drift adaptation, while at the expense of bounded processing time and memory space.

18:10-18:30, Paper TuBT1.4
Partial Multi-View Clustering Using Graph Regularized NMF
Rai, Nishant	Indian Inst. of Tech. Kanpur
Negi, Sumit	Xerox Res. India
Chaudhury, Santanu	Indian Inst. of Tech. Delhi
Deshmukh, Om	Xerox Res. Center India
Keywords: Classification and clustering, Machine learning and data mining Abstract: Real-world datasets consist of data representations (views) from different sources which often provide information complementary to each other. Multi-view learning algorithms aim at exploiting the complementary information present in different views for clustering and classification tasks. Several multi-view clustering methods that aim at partitioning objects into clusters based on multiple representations of the object have been proposed. Almost all of the proposed methods assume that each example appears in all views or at least there is one view containing all examples. In real-world settings this assumption might be too restrictive. Recent work on Partial View Clustering addresses this limitation by proposing a Non-negative Matrix Factorization based approach called PVC. Our work extends the PVC work in two directions. First, the current PVC algorithm is designed specifically for two-view datasets. We extend this algorithm for the k partial-view scenario. Second, we extend our k partial-view algorithm to include view specific graph laplacian regularization. This enables the proposed algorithm to exploit the intrinsic geometry of the data distribution in each view. The proposed method, which is referred to as GPMVC (Graph Regularized Partial Multi-View Clustering), is compared against 7 baseline methods (including PVC) on 5 publicly available text and image datasets. In all settings the proposed GPMVC method outperforms all baselines. For the purpose of reproducibility, we provide access to our code


TuBT2	G.Cancun T1.B
TuPMO2	Oral Session

17:10-17:30, Paper TuBT2.1
Facial Landmark Detection Based on an Ensemble of Local Weighted Regressors During Real Driving Situation
Jeong, Mira	Keimyung Univ
Kwak, Jun-Yong	Keimyung Univ
Ko, ByoungChul	Keimyung Univ
Nam, JaeYeal	Keumyung Univ
Keywords: Motion, tracking and video analysis, Active and ensemble learning, Facial expression recognition Abstract: In this study, we propose a novel method for facial landmark detection (FLD) based on an ensemble of local weighted regressors and a global face shape model under real driving situations. Unlike other FLD approaches, the method proposed in this study first detects the nose region instead of a face-bounding box as a reference point for estimating the offset from a landmark and a reference point. Next, a weighted random forest regressor (WRFR) is used for designing a regressor that maintains the generality while utilizing a small number of decision trees. During the training period, some of the trees having low accuracy are removed and the remaining trees of the WRFR have different weights according to their regression accuracy. As a global face shape model, we use the spatial relationship between three landmarks to identify erroneous estimates of the local regressors and provide valid alternatives. Using the unified framework of the proposed FLD, our algorithm is robust to facial expressions and partial occlusions caused by a subject’s hair or sunglasses. For our experiment, using a near-infrared camera, we constructed a benchmark dataset for FLD under real driving situations, which we call the Face Alignment Dataset used In Driving (FADID). The proposed algorithm was successfully applied to various driving sequences in FADID, and the results show that its FLD detection performance is better than that of other state-of-the-art methods.

17:30-17:50, Paper TuBT2.2
Angle and Volume-Preserving Mapping of Organ Volume Model Based on Modified Self-Organizing Deformable Model
Miyauchi, Shoko	Kyushu Univ
Morooka, Ken'ichi	Kyushu Univ
Tsuji, Tokuo	Kanazawa Univ
Miyagi, Yasushi	Kyushu Univ
Fukuda, Takaichi	Kumamoto Univ
Kurazume, Ryo	Kyushu Univ
Keywords: Shape modeling and encoding, Computer-aided detection and diagnosis Abstract: This paper proposes a new method for mapping volume models of human organs onto a target volume with simple shapes. The proposed method is based on our modified Self-organizing Deformable Model (mSDM) which finds the one-to-one mapping with no foldovers between an arbitrary object surface model and a target surface. By extending mSDM to apply to organ volume models, the proposed method, called volumetric SDM (vSDM), establishes the one-to-one correspondence between the volume model and its target volume. At the same time, vSDM preserves geometrical properties of the original model before and after the mapping. In addition, vSDM allows to control the mapping of interior structures of the organ model onto specific regions inside the target volume. These characteristics of vSDM enables to easily find a reliable correspondence between different volume models via a common target volume.

17:50-18:10, Paper TuBT2.3
Boosting VLAD with Double Assignment Using Deep Features for Action Recognition in Videos
Duta, Ionut Cosmin	Univ. of Trento
Nguyen, Tuan Anh	Univ. of Tokyo
Aizawa, Kiyoharu	The Univ. of Tokyo
Ionescu, Bogdan	Univ. Pol. of Bucharest
Sebe, Nicu	Univ. of Trento
Keywords: Motion, tracking and video analysis, Image and video analysis and understanding, Multimedia analysis, indexing and retrieval Abstract: The encoding method is an important factor for an action recognition pipeline. One of the key points for the encoding method is the assignment step. A very widely used super-vector encoding method is the vector of locally aggregated descriptors (VLAD), with very competitive results in many tasks. However, it considers only hard assignment and the criteria for the assignment is performed only from the features side, by looking for which visual word the features are voting. In this work we propose to encode deep features for videos using a double assignment VLAD (DA-VLAD). In addition to the traditional assignment for VLAD we perform a second assignment by taking into account the perspective from the codebook side: which are the nearest features to a visual word and not only which is the nearest centroid for the features as the standard assignment. Another important factor for the performance of an action recognition system is the feature extraction step. Recently, deep features obtained state-of-the-art results in many tasks, being also adopted for action recognition with competitive results over hand-crafted features. This work includes a pipeline to extract local deep features for videos using any available network as a black box and we show competitive results including the case when the network was trained for another task or another dataset. Our DA-VLAD encoding method outperforms the traditional VLAD and we obtain state-of-the-art results on UCF50 dataset and competitive results on UCF101 dataset.

18:10-18:30, Paper TuBT2.4
Online Weighted One-Class Ensemble for Feature Selection in Background/Foreground Separation
Silva, Caroline	Univ. De La Rochelle
Bouwmans, Thierry	Lab. of Mathematics, Images and Applications, Univ. Of
Frelicot, Carl	Univ. De La Rochelle
Keywords: Motion, tracking and video analysis, Active and ensemble learning, Model selection Abstract: Background subtraction (BS) is one of the key steps for detecting moving objects in video surveillance applications. In the last few years, many BS methods have been developed to handle the different challenges met in video surveillance but the role and the relevance of the visual features used has been less investigated. In this paper, we present an Online Weighted Ensemble of One-Class SVMs (Support Vector Machines) able to select suitable features for each pixel to distinguish the foreground objects from the background. In addition, our proposal uses a mechanism to update the relative importance of each feature over time. Moreover, a heuristic approach is used to reduce the complexity of the background model maintenance while maintaining the robustness of the background model. Results on two datasets show the pertinence of the approach


TuBT3	Maya T2.A
TuPMO3	Oral Session

17:10-17:30, Paper TuBT3.1
A New Efficient Measure for Accuracy Prediction and Its Application to Multistream-Based Unsupervised Adaptation
Ogawa, Tetsuji	Waseda Univ
Mallidi, Sri Harish	Johns Hopkins Univ
Dupoux, Emmanuel	Ec. Des Hautes Etudes Sciences Sociale
Jordan, Cohen	Spelamode
Feldman, Naomi	Univ. of Maryland
Hermansky, Hynek	Johns Hopkins Univ
Keywords: Automatic speech and speaker recognition, Artificial neural networks Abstract: A new efficient measure for predicting estimation accuracy is proposed and successfully applied to multistream-based unsupervised adaptation of ASR systems to address data uncertainty when the ground-truth is unknown. The proposed measure is an extension of the M-measure, which predicts confidence in the output of a probability estimator by measuring the divergences of probability estimates spaced at specific time intervals. In this study, the M-measure was extended by considering the latent phoneme information, resulting in an improved reliability. Experimental comparisons carried out in a multistream-based ASR paradigm demonstrated that the extended M-measure yields a significant improvement over the original M-measure, especially under narrow-band noise conditions.

17:30-17:50, Paper TuBT3.2
Multi-Modal Neural Conditional Ordinal Random Fields for Agreement Level Estimation
Rakicevic, Nemanja	Imperial Coll. London
Rudovic, Ognjen	Imperial Coll
Petridis, Stavros	Imperial Coll. London
Pantic, Maja	Imperial Coll. London
Keywords: Automatic speech and speaker recognition, Affective computing Abstract: The ability to automatically detect the extent of agreement or disagreement a person expresses is an important indicator of inter-personal relations and emotion expression. Most of existing methods for automated analysis of human agreement from audio-visual data perform agreement detection using either audio or visual modality of human interactions. However, this is suboptimal as expression of different agreement levels is composed of various facial and vocal cues specific to the target level. To this end, we propose the first approach for multi-modal estimation of agreement intensity levels. Specifically, our model leverages the feature representation power of Multimodal Neural Networks (NN) and discriminative power of Conditional Ordinal Random Fields (CORF) to achieve dynamic classification of agreement levels from videos. We show on the MAHNOB-Mimicry database of dyadic human interactions that the proposed approach outperforms its uni-modal and linear counterparts, and related models that can be applied to the target task.

17:50-18:10, Paper TuBT3.3
Real Time Artificial Auditory Systems for Cluttered Environments
Dentamaro, Giuseppe	Pol. Di Bari
Cardellicchio, Angelo	Pol. Di Bari
Guaragnella, Cataldo	Pol. Di Bari
Keywords: Audio and acoustic processing and analysis, Sensor array & multichannel signal processing, Signal, image and video processing Abstract: In this paper, a three dimensional sound steering system which exploits pointing error minimization techniques is presented. The system uses a three-microphones array located on the vertexes of an equilateral triangle, which capture audio signals generated from a sound source to control a stepper motor based steering unit. The system works in reverberant and noisy environments, and is able to estimate the Direction Of Arrival (DOA) of an audio source located in a semispace delimited by the plane of the array. The DOA of the audio source is estimated in a two-step process: the first step determines the Time Delay Of Arrival (TDOA) for each pair of microphones, the second step uses the TDOAs previously estimated to compute the DOA. The system works in real time and can achieve state-of-the art performances using only three microphones, representing a significant improvement with respect to the existing approaches.

18:10-18:30, Paper TuBT3.4
A Pertinent Evaluation of Automatic Video Summary
Kannappan, Sivapriyaa	Aberystwyth Univ
Liu, Yonghuai	Aberystwyth Univ
Tiddeman, Bernard Paul	Aberystwyth Univ
Keywords: Signal, image and video processing, Performance Evaluation, Motion, tracking and video analysis Abstract: Video summarization is useful to find a concise representation of the original video, nevertheless its evaluation is somewhat challenging. This paper proposes a simple and efficient method for precisely evaluating the video summaries produced by the existing techniques. This method includes two steps. The first step is to establish a set of matched frames between automatic summary (AT) and the ground truth summary (GT) through two-way search, in which the similarity between two frames are measured using correlation coefficient. The second step is to estimate the consistency among these established matches, so that the difference among these frames in the AT and GT are preserved respectively. To accomplish this, a compatibility matrix is built based on the features extracted from each of these frames. The consistency values among these matched frames are estimated as the eigenvector of this matrix corresponding to the maximum eigenvalue. Such matched frames with a small enough consistency value will be rejected, leading to more accurate performance estimation of the video summarization techniques. Experimental results based on a publicly accessible dataset shows that the proposed method is effective in finding true matches and provide more realistic measurement of the performance for various techniques.


TuBT4	Maya T2.B
TuPMO4	Oral Session

17:10-17:30, Paper TuBT4.1
Multiple Facial Action Unit Recognition by Learning Joint Features and Label Relations
Shan, Wu	Univ. of Science and Tech. of China
Wang, Shangfei	Univ. of Science and Tech. of China
Ji, Qiang	RPI
Keywords: Facial expression recognition Abstract: Although both feature dependencies and label dependencies are crucial for facial action unit (AU) recognition, little work addresses them simultaneously till now. To address this limitation, we propose a 4-layer Restrict Boltzmann Machine (RBM) to simultaneously capture feature level and AU level dependencies to recognize multiple AUs. Specifically, the bottom two layers of the RBM model capture dependencies among image features, while the top two layers capture the high order dependencies among AU labels. An efficient learning algorithm is introduced to jointly learn all layers to leverage the interactions among different layers. Experiments on two benchmark databases demonstrate the effectiveness of the proposed approach in modelling complex AU relationships from both features and labels jointly, and its improved performance over the existing methods.

17:30-17:50, Paper TuBT4.2
Magnifying Subtle Facial Motions for 4D Expression Recognition
Zhen, Qingkai	Buaa.edu.cn
Huang, Di	Beihang Univ
Wang, Yunhong	Beihang Univ
Drira, Hassen	LIFL (UMR Lille1/CNRS8022), Univ. De Lille1
Ben Amor, Boulbaba	Lifl Umr 8022
Daoudi, Mohamed	TELECOM Lille
Keywords: Facial expression recognition Abstract: In this paper, we propose an effective approach for automatic 4D Facial Expression Recognition (FER). The flow of 3D facial scans is first modeled to capture spatial deformations based on the recently-developed Riemannian approach, namely Dense Scalar Fields (DSF), where registration and comparison of neighboring 3D face frames are jointly led. The deformations are then fed into a temporal filtering based magnification step to amplify the slight facial actions over time. The proposed method allows revealing subtle (hidden) deformations which enhances the performance in classification. We evaluate our approach on the BU-4DFE dataset, and the state-of-art accuracy up to 94.18% is achieved, which is superior to the top one so far reported, clearly demonstrating its effectiveness.

17:50-18:10, Paper TuBT4.3
Selective Deep Features for Micro-Expression Recognition
Patel, Devangini	Univ. of Oulu
Hong, Xiaopeng	Univ. of Oulu
Zhao, Guoying	Univ. of Oulu
Keywords: Facial expression recognition, Affective computing, Deep learning Abstract: Micro-expression recognition is a challenging task in computer vision field due to the repressed facial appearance and short duration. Previous work for micro-expression recognition have used hand-crafted features like LBP-TOP, Gabor filter and optical flow. This paper is the first work to explore the possible use of deep learning for micro-expression recognition task. Due to the lack of data for micro-expression, training a CNN model from micro-expression data is not feasible. Instead, transfer learning from objects and facial expressions based CNN models are used. The aim is to use feature selection to remove the irrelevant deep features for our task. This work extends evolutionary algorithms to search an optimal set of deep features so that it does not overfit the training data and generalizes well for the test data. Promising results are presented for various micro-expression datasets.

18:10-18:30, Paper TuBT4.4
Transferring from Face Recognition to Face Attribute Prediction through Adaptive Selection of Off-The-Shelf CNN Representations
Zhong, Yang	KTH, Royal Inst. of Tech
Sullivan, Josephine	Royal Inst. of Tech. (KTH)
Li, Haibo	KTH, Royal Inst. of Tech
Keywords: Facial expression recognition, Deep learning, Image and video analysis and understanding Abstract: This paper addresses the problem of transferring CNNs pre-trained for face recognition to a face attribute prediction task. To transfer an off-the-shelf CNN to a novel task, a typical solution is to fine-tune the network towards the novel task. As demonstrated in state-of-the-art face attribute prediction approach, fine-tuning the high-level CNN hidden layer by using labeled attribute data leads to significant performance improvements. In this paper, however, we tackle the same problem but through a different approach. Rather than using an end-to-end network, we select face representations from off-the-shelf hierarchical CNN representations for recognizing different attributes. Through such an adaptive representation selection, even without any fine-tuning, our results still outperform state-of-the-art face attribute prediction approach on the latest large-scale dataset for an error rate reduction of more than 20%. Moreover, by using intensive empirical probes, we have identified several key factors that are significant for achieving promising face attribute prediction performance. These results attempt to gain and update our understandings of the nature of CNN features and how they can be better applied to the transferred novel tasks.


TuBT5	Maya T2.C
TuPMO5	Oral Session

17:10-17:30, Paper TuBT5.1
Dual Active Contours Model for HR-Pqct Cortical Bone Segmentation
Hafri, Mohamed	Univ. of Orleans
Jennane, Rachid	Univ. of Orleans
Lespessailles, Eric	Univ. of Orleans
Toumi, Hechmi	Univ. of Orleans
Keywords: Medical image and signal analysis, Segmentation, features and descriptors Abstract: The segmentation of the bone in HR-pQCT (High Resolution peripheral Quantitative Computed Tomography) images remains a challenging task due to the image characteristics and the complex structure of the bone (cortical and trabecular). In this paper, we address the problem of separating the cortical bone from the background and the trabecular bone. We propose a novel approach to segment the cortical bone using dual active contours. This new concept allows the two contours to interact with each other and evolve to delineate the cortical bone. The energy of the two contours is based on the local information along the curve to be able to handle the intensity inhomogeneity, the noise characteristics and the motion artefacts in such images. Unlike state of art methods, the proposed technique can segment accurately the cortical bone in the different sites (proximal and ultra-distal). Finally, to test the robustness and accuracy of our proposed method, we compared the computed segmentation results of each method to the ground truth. Our proposed approach gives a higher Dice similarity coefficient in both proximal and ultra-distal sites.

17:30-17:50, Paper TuBT5.2
Learned vs. Engineered Features for Fine-Grained Classification of Aquatic Macroinvertebrates
Riabchenko, Ekaterina	Tampere Univ. of Tech
Meissner, Kristian	Fresh Water Centre, Finnish Environment Inst. Jyväskylä, Fi
Ahmad, Iftikhar	Department of Signal Processing, Tampere Univ. of Tech
Iosifidis, Alexandros	Tampere Univ. of Tech
Tirronen, Ville	Department of Mathematical, Information Tech. Univ. O
Moncef, Gabbouj	Tampere Univ. of Tech
Kiranyaz, Serkan	Tampere Univ. of Tech
Keywords: Classification and clustering Abstract: Aquatic macroinvertebrate biomonitoring is an efﬁcient way of assessment of slow and subtle anthropogenic changes and their effect on water quality. It is imperative to have reliable identiﬁcation and counts of the various taxa occurring in samples as these form the basis for the quality indices used to infer the ecological status of the aquatic ecosystem. In this paper, we try to close the gap between human taxa identiﬁcation accuracy (typically 90-95% on 30-40 classes of macroinvertebrates) and results of automatic ﬁne-grained classiﬁcation by introducing a novel technique based on Convolutional Neural Networks (CNN). CNN learns optimal features for macroinvertebrate classiﬁcation and achieves near human accuracy when tested on 29 macroinvertebrate classes. Moreover, we perform comparative evaluation of the learned features against the hand-crafted features, which have been commonly used in classical approaches, and conﬁrm superiority of the learned deep features over the engineered ones.

17:50-18:10, Paper TuBT5.3
Bag of Temporal Co-Occurrence Words for Retrieval of Focal Liver Lesions Using 3D Multiphase Contrast-Enhanced CT Images
Xu, Yingying	Zhejiang Univ
Lin, Lanfen	Zhejiang Univ
Hu, Hongjie	Sir Run Run Shaw Hospital
Wang, Dan	Sir Run Run Shaw Hospital
Liu, Yitao	Sir Run Run Shaw Hospital
Wang, Jian	Ritsumeikan Univ
Han, Xian-Hua	Ritsumeikan Univ
Chen, Yen-wei	Ritsumeikan Univ
Keywords: Content based image retrieval and data mining, Medical image and signal analysis, Computer-aided detection and diagnosis Abstract: Computer-aided diagnosis (CAD) systems have been verified to have the potential to assist radiologists in clinical diagnosis to detect and characterize focal liver lesions (FLLs) based on single- or multiphase contrast-enhanced computed tomography (CT) images. Features extracted from multiphase contrast-enhanced CT images carry more important diagnostic information i.e. enhancement pattern and demonstrate much stronger discriminative ability compared to those of single-phase CT images. In this paper, we propose a new method for multiphase image feature generation called the bag of temporal co-occurrence words (BoTCoW). A temporal co-occurrence image connecting intensity from multiphase images is constructed. Then the bag of visual word (BoVW) model is employed on the temporal co-occurrence images to extract temporal feature. The proposed method effectively captures temporal enhancement information and demonstrates the distribution of the evolution patterns. The effectiveness of this method is validated in a retrieval system using 132 FLLs with confirmed pathology type. The preliminary results show that the proposed BoTCoW method outperforms the previously proposed temporal features and multiphase features based on the BoVW model.

18:10-18:30, Paper TuBT5.4
A Feature Fusion Framework for Hashing
Jhuo, I-Hong	Acad. Sinica
Weng, Li	IGN - French Mapping Agency
Cheng, Wen-Huang	Acad. Sinica
Lee, D.T.	Inst. of Information Science
Keywords: Content based image retrieval and data mining, Pattern Recognition for Search, Retrieval and Visualization Abstract: A hash algorithm converts data into compact strings. In the multimedia domain, effective hashing is the key to large-scale similarity search in high-dimensional feature space. A limit of existing hashing techniques is that they typically use single features. In order to improve search performance, it is necessary to utilize multiple features. Due to the compactness requirement, concatenation of hash values from different features is not an optimal solution. Thus a fusion process is desired. In this paper, we solve the multiple feature fusion problem by a hash bit selection framework. Given multiple features, we derive an n-bit hash value of improved performance compared with hash values of the same length computed from each individual feature. The framework utilizes a feature-independent hash algorithm to generate a sufficient number of bits from each feature, and selects n bits from the hash bit pool by leveraging pair-wise label information. The metric bit reliability is used for ranking the bits. It is estimated by bit-level hypothesis testing. In addition, we also take into account the dependence among bits. A weighted graph is constructed for refined bit selection, where the bit reliability is used as vertex weights and the mutual information among hash bits is used as edge weights. We demonstrate our framework with LSH. Extensive experiments confirm that our method is effective, and outperforms several state-of-the-art methods.

Technical Program for Tuesday December 6, 2016