ICPR16


Mo1PL	G.Cancun T1
King Sun Fu Prize - Robert Haralick	Plenary Session


MoAT1	G.Cancun T1.A
MoAMO1	Oral Session

10:30-10:50, Paper MoAT1.1
Regularizing AdaBoost with Validation Sets of Increasing Size
Meijer, Dirk	Delft Univ. of Tech
Tax, David	Delft Univ. of Tech
Keywords: Active and ensemble learning, Classification and clustering Abstract: AdaBoost is an iterative algorithm to construct classifier ensembles. It quickly achieves high accuracy by focusing on objects that are difficult to classify. Because of this, AdaBoost tends to overfit when subjected to noisy datasets. We observe that this can be partially prevented with the use of validation sets, taken from the same noisy training set. But using less than the full dataset for training hurts the performance of the final classifier ensemble. We introduce ValidBoost, a regularization of AdaBoost that takes validation sets from the dataset, increasing in size with each iteration. ValidBoost achieves performance similar to AdaBoost on noise-free datasets and improved performance on noisy datasets, as it performs similarly at first, but does not start to overfit when AdaBoost does.

10:50-11:10, Paper MoAT1.2
BeamECOC: A Local Search for the Optimization of the ECOC Matrix
Zor, Cemre	Univ. of Surrey
Yanikoglu, Berrin	Sabanci Univ
Merdivan, Erinc	Sabanci Univ
Windeatt, Terry	Univ. of Surrey
Kittler, Josef	Univ. of Surrey
Alpaydin, Ethem	Bogazici Univ
Keywords: Active and ensemble learning, Classification and clustering, Machine learning and data mining Abstract: Error Correcting Output Coding (ECOC) is a multi-class classification technique in which multiple binary classifiers are trained according to a preset code matrix such that each one learns a separate dichotomy of the classes. While ECOC is one of the best solutions for multi-class problems, one issue which makes it suboptimal is that the training of the base classifiers is done independently of the generation of the code matrix. In this paper, we propose to modify a given ECOC matrix to improve its performance by reducing this decoupling. The proposed algorithm uses Beam Search to iteratively modify the original matrix, using validation accuracy as a guide. It does not involve further training of the classifiers and can be applied to any ECOC matrix. We evaluate the accuracy of the proposed algorithm (BeamECOC) using 10-fold cross-validation experiments on 6 UCI datasets, using random code matrices of different sizes, and base classifiers of different strengths. Compared to the random ECOC approach, BeamECOC increases the average cross-validation accuracy in 83.3% of the experimental settings involving all datasets, and gives better results than the state-of-the-art in 75% of the scenarios. By employing BeamECOC, it is also possible to reduce the number of columns of a random matrix down to 13% and still obtain results that are not significantly different, or at times even better.

11:10-11:30, Paper MoAT1.3
Loss Factors for Learning Boosting Ensembles from Imbalanced Data
Soleymani Samarin, Roghayeh	Le Lab. D’imagerie, De Vision Et D’intelligence Artificie
Granger, Eric	École De Tech. Supérieure
Fumera, Giorgio	Univ. of Cagliari
Keywords: Active and ensemble learning, Classification and clustering, Performance Evaluation Abstract: Class imbalance is an issue in many real world applications because classification algorithms tend to misclassify instances from the class of interest when its training samples are outnumbered by those of other classes. Several variations of the AdaBoost ensemble method have been proposed in the literature to learn from imbalanced data based on re-sampling. However, their loss factor is based on standard accuracy, which still biases performance towards the majority class. This problem is mitigated using cost-sensitive Boosting algorithms, although it can be avoided at the outset by modifying the loss factor calculation. In this paper, two loss factors, based on F-measure and G-mean, are proposed that are more suitable to deal with imbalanced data during the Boosting learning process. The performance of standard AdaBoost and of three specialized versions for class imbalance (SMOTEBoost, RUSBoost, and RB-Boost) is empirically evaluated using the proposed loss factors, both on synthetic data and on a real-world face re-identification task. Experimental results show a significant performance improvement on AdaBoost and RUSBoost with the proposed loss factors.
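
The contrast between accuracy and the two measures named in the abstract can be illustrated with a minimal sketch. It only computes the standard F-measure and G-mean from confusion counts on a toy imbalanced set; it is not the authors' loss factor, and all function names and the toy data are assumptions made for illustration.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0

def f_measure(tp, fp, tn, fn, beta=1.0):
    """Standard F-beta score on the positive (minority) class."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0

def g_mean(tp, fp, tn, fn):
    """Geometric mean of the true positive rate and true negative rate."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)

# Toy data (hypothetical): 5 positives, 95 negatives, and a classifier
# that always predicts the majority (negative) class.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
counts = confusion_counts(y_true, y_pred)

print("accuracy :", accuracy(*counts))
print("F-measure:", f_measure(*counts))
print("G-mean   :", g_mean(*counts))
```

On this toy set the always-negative classifier scores high on accuracy while both F-measure and G-mean drop to zero, which is the bias towards the majority class that the abstract refers to.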

11:30-11:50, Paper MoAT1.4
An Empirical Investigation into the Inconsistency of Sequential Active Learning
Loog, Marco	Delft Univ. of Tech. / Univ. of Copenhagen
Yang, Yazhou	Delft Univ. of Tech
Keywords: Active and ensemble learning, Classification and clustering, Performance Evaluation Abstract: In active learning, one aims to acquire labeled samples that are particularly useful for training a classifier. In sequential active learning, this sample selection is done in a one-at-a-time manner where the choice of sample t + 1 may depend on the current state of the classifier and the t labeled data points already available. In their deviation from standard random sampling, current active learning schemes typically introduce severe sampling bias. Even though this fact has been acknowledged in the more theoretical contributions covering active learning, the more popular approaches largely ignore this bias. This work empirically investigates the consequences of their actions and sets out to identify the pros and cons of this way of dealing with the problem of active learning. Even though current techniques can provide excellent approaches to learning, we conclude that they provide inconsistent solutions and therefore, in a strict sense, do not solve the problem of active learning.

11:50-12:10, Paper MoAT1.5
Meta-Regression Based Pool Size Prediction Scheme for Dynamic Selection of Classifiers
Roy, Anandarup	École De Tech. Supérieure
Menelau Oliveira e Cruz, Rafael	École De Tech. Supérieure
Sabourin, Robert	École De Tech. Supérieure
Cavalcanti, George	Univ. Federal De Pernambuco
Keywords: Active and ensemble learning, Machine learning and data mining, Classification and clustering Abstract: Dynamic selection (DS) is a mechanism to select one or an ensemble of competent classifiers from a pool of base classifiers, in order to classify a specific test sample. The size of this pool is user defined and yet crucial to control the computational complexity and performance of a DS. An appropriate pool size depends on the choice of base classifiers, the underlying DS method used, and more importantly, the characteristics of the given problem. After the DS method and the base classifiers are selected, an appropriate pool size for a given problem can be obtained by the repetitive application of the DS with a variety of sizes, after which a selection is performed. Since this brute force approach is computationally expensive, researchers usually set the pool size to a pre-specified value. However, this strategy may reduce the performance of the DS method. Instead, we propose a meta-regression model in order to predict a suitable pool size, based on the intrinsic classification complexity of a problem. In our strategy, we obtain the best pool sizes for a number of data sets, using the brute force approach. Additionally, we extract meta-features that represent classification complexity of a problem. These two pieces of information are associated by means of meta-regression models. Finally, for an unseen problem, we predict the pool size using this model and the classification complexity information. We carry out the experiments on 64 two-class data sets and with several well-known DS methods. We also consider variants of meta-regression techniques and report prediction results. We further analyze these results using a statistical test. Finally, we investigate the performance of a DS and observe that DS performs equivalently for predicted and the best pool sizes.

12:10-12:30, Paper MoAT1.6
Composing Ensembles by a Stochastic Approach under Execution Time Constraint
Hajdu, Andras	Univ. of Debrecen, Hungary
Toman, Henrietta	Univ. of Debrecen
Kovacs, Laszlo	Univ. of Debrecen
Hajdu, Lajos	Univ. of Debrecen
Keywords: Active and ensemble learning, Model selection, 2D/3D object detection and recognition Abstract: Ensemble-based systems are primarily analyzed on how the accuracy of the ensemble depends on that of its members. In this paper, we extend this model with adding a natural constraint regarding a time limit within which the ensemble should make the decision. For this aim, we consider both the execution time and accuracy of each member. Then, we solve the problem on how to find the most accurate ensemble, where the sum of the execution times of its members remains below the limit. As a decision rule, we analyze a majority voting-based one generalized to be applicable in single object detection scenarios. The optimization task leads to a non-separable Knapsack problem, which is addressed using stochastic considerations. The proposed methodology is also validated experimentally for the localization of the optic disc in retinal images.
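
The selection problem described above can be sketched for a very small pool. The sketch below is not the stochastic method of the paper: it uses exhaustive search, and it assumes that members err independently so that majority-vote accuracy can be computed in closed form. The function names and the example pool are hypothetical.

```python
from itertools import combinations

def majority_accuracy(accs):
    """Probability that a strict majority of independent members is correct.

    accs holds the individual accuracies; independence is an assumption.
    """
    # dist[k] = probability that exactly k of the members seen so far are correct
    dist = [1.0]
    for p in accs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)   # this member is wrong
            new[k + 1] += q * p     # this member is correct
        dist = new
    n = len(accs)
    return sum(q for k, q in enumerate(dist) if 2 * k > n)

def best_ensemble(members, time_limit):
    """Exhaustively pick the subset with the highest majority-vote accuracy.

    members is a list of (accuracy, execution_time) pairs. Only subsets whose
    total execution time stays within time_limit are considered.
    Returns (accuracy, tuple_of_member_indices).
    """
    best = (0.0, ())
    for r in range(1, len(members) + 1):
        for idx in combinations(range(len(members)), r):
            if sum(members[i][1] for i in idx) <= time_limit:
                acc = majority_accuracy([members[i][0] for i in idx])
                if acc > best[0]:
                    best = (acc, idx)
    return best

# Hypothetical pool: one slow, accurate member and three fast, weaker ones.
pool = [(0.9, 5.0), (0.8, 1.0), (0.8, 1.0), (0.8, 1.0)]
print(best_ensemble(pool, time_limit=3.0))
print(best_ensemble(pool, time_limit=5.0))
```

The objective here is not a sum of per-member values, since adding a member changes the contribution of all the others. Exhaustive search also scales exponentially with pool size, so it is only practical for toy pools like this one.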


MoAT2	G.Cancun T1.B
MoAMO2	Oral Session

12:10-12:30, Paper MoAT2.7
Flip-Avoiding Interpolating Surface Registration for Skull Reconstruction
Xie, Shudong	National Univ. of Singapore
Leow, Wee Kheng	National Univ. of Singapore
Keywords: 3D shape recovery Abstract: Reconstruction of skulls from defective models is a very important and challenging task in craniofacial surgery, forensics, and anthropology. Existing methods typically reconstruct approximating surfaces that regard corresponding points on the target skull as soft constraints, thus incurring non-zero error even for non-defective parts and high overall reconstruction error. This paper proposes a novel method that non-rigidly registers an interpolating surface that regards corresponding target points as hard constraints, thus achieving low reconstruction error. To overcome the shortcoming of interpolating surface, a flip-avoiding method is used to detect and exclude conflicting hard constraints that would otherwise cause surface patches to flip. Comprehensive test results show that our method is more accurate than existing methods and it is robust against severe outliers such as radiation artifacts in CT due to dental implants.

12:10-12:30, Paper MoAT2.8
Automatic Feature Extraction Using CNN for Robust Active One-Shot Scanning
Sagawa, Ryusuke	AIST
Shiba, Yuki	Kagoshima Univ
Hirukawa, Takuto	Kagoshima Univ
Ono, Satoshi	Kagoshima Univ
Kawasaki, Hiroshi	Kagoshima Univ
Furukawa, Ryo	Hiroshima City Univ
Keywords: 3D shape recovery, Deep learning Abstract: Active one-shot scanning techniques have been widely used for various applications. Stereo-based active one-shot scanning embeds positional information regarding the image plane of a projector onto a projected pattern to retrieve correspondences entirely from a captured image. Many combinations of patterns and decoding algorithms for active one-shot scanning have been proposed. If the capturing environment lacks the assumed conditions, such as the absence of strong external lights, then reconstruction using those methods is degraded, because the pattern decoding fails. In this paper, we propose a general reconstruction algorithm that can be used for any kind of pattern without strict assumptions. The technique is based on an efficient feature extraction function that can drastically reduce redundant information from the raw pixel values of patches of captured images. Shapes are reconstructed by efficiently finding correspondences between a captured image and the pattern using low-dimensional feature vectors. Such a function is created automatically by a convolutional neural network using a large database of pattern images that are efficiently synthesized using a GPU with wide variation of depth and surface orientation. Experimental results show that our technique can be used for several existing patterns without any ad hoc algorithm or information regarding the scene or the sensor.

12:10-12:30, Paper MoAT2.9
Moving Object Reconstruction in Monocular Video Data Using Boundary Generation
Bullinger, Sebastian	Fraunhofer Inst. of Optronics, System Tech. and Image
Bodensteiner, Christoph	Fraunhofer IOSB
Wuttke, Sebastian	Fraunhofer IOSB Ettlingen
Arens, Michael	Fraunhofer IOSB
Keywords: 3D shape recovery, Reconstruction and camera motion estimation, Stereo and multiple view geometry Abstract: We present a method to reconstruct the three-dimensional shape of a moving instance of a known object category in video data. We exploit state-of-the-art semantic segmentation techniques to extract the object's two-dimensional shape in each frame. Therefore, our method is robust to occlusion, handles stationary objects and extends naturally to multiple video sequences. We apply Structure from Motion (SfM) to previously generated object images in order to compute a three-dimensional representation of the object. Our approach allows us to remove outliers in SfM reconstructions and to compute clean object meshes by leveraging previously computed semantic segmentations and virtual camera positions. We evaluate the accuracy of our method using a multi-view dataset of a moving vehicle. A laser scan serves as ground truth. We applied our algorithm on publicly available video data and on 25 sequences from our dataset. The algorithm achieves an average point distance of 3.3 cm evaluated on seven trajectories contained in the dataset.

12:10-12:30, Paper MoAT2.10
Aligning the Dissimilar: A Probabilistic Feature-Based Point Set Registration Approach
Danelljan, Martin	Linköping Univ
Meneghetti, Giulia	Linköping Univ
Khan, Fahad Shahbaz	Linköping Univ
Felsberg, Michael	Linköping Univ
Keywords: 3D shape recovery Abstract: 3D-point set registration is an active area of research in computer vision. In recent years, probabilistic registration approaches have demonstrated superior performance for many challenging applications. Generally, these probabilistic approaches rely on the spatial distribution of the 3D-points, and only recently color information has been integrated into such a framework, significantly improving registration accuracy. Other than local color information, high-dimensional 3D shape features have been successfully employed in many applications such as action recognition and 3D object recognition. In this paper, we propose a probabilistic framework to integrate high-dimensional 3D shape features with color information for point set registration. The 3D shape features are distinctive and provide complementary information beneficial for robust registration. We validate our proposed framework by performing comprehensive experiments on the challenging Stanford Lounge dataset, acquired by a RGB-D sensor, and an outdoor dataset captured by a Lidar sensor. The results clearly demonstrate that our approach provides superior results both in terms of robustness and accuracy compared to state-of-the-art probabilistic methods.

12:10-12:30, Paper MoAT2.11
Indexing Mayan Hieroglyphs with Neural Codes
Roman-Rangel, Edgar	Univ. of Geneva
Marchand-Maillet, Stephane	Univ. of Geneva
Keywords: Shape modeling and encoding, Pattern Recognition for Art, Cultural Heritage and Entertainment, Classification and clustering Abstract: We present an approach for unsupervised computation of local shape descriptors, which relies on the use of linear autoencoders for characterizing local regions of complex shapes. The proposed approach responds to the need for a robust scheme to index binary images using local descriptors, which arises when only a few examples of the complete images are available for training, making parameter learning in traditional neural network schemes inaccurate. Given the possibility of using linear operations during the encoding phase, the computation of the proposed local descriptor can be fast once the parameters of the encoding function are learned. After conducting a vast search, we identified the optimal dimensionality of the resulting local descriptor to be only 128 dimensions, which allows for efficient further operations on them, such as the construction of bag representations for shape retrieval and classification. We evaluated the proposed approach by indexing a collection of complex binary images, whose instances contain compounds of hieroglyphs from the ancient Maya civilization. Our retrieval experiments show that the proposed approach achieves competitive retrieval performance when compared with hand-crafted local descriptors.

12:10-12:30, Paper MoAT2.12
Real-Time Performance Improvement of Vehicle Detection System by Inter and Intra Frame Adaptation of RoI
Das, Apurba	Tata Consultancy Services
Pv, Sharfudheen	Tata Consultancy Services
K, Ruppin	Tata Consultancy Services


MoAT3	Maya T2.A
MoAMO3	Oral Session

10:30-10:50, Paper MoAT3.1
Semisupervised Manifold Learning for Color Transfer between Multiview Images
Liao, Danping	Zhejiang Univ
Qian, Yuntao	Zhejiang Univ
Li, Ze-Nian	Simon Fraser Univ
Keywords: Signal, image and video processing, Texture and color analysis Abstract: In multiview image stitching, the colors of images in a scene might vary when images are taken under different illumination or camera settings. A common way to produce a seamless stitched image is to transform the colors of a target image to match that of a source image. In this paper we present a color transfer method based on two premises: first, pixels in the generated image should have similar colors with their corresponding pixels in the source image. Second, pixels with similar colors should still have similar colors after color transfer. Our method can be considered as a semisupervised manifold learning approach, where the corresponding pixels of the input images serve as the labeled data. Our goal is to learn a final image which not only shares the same colors with the source image but also has the same image structure with the target image. While manifold learning methods aim to find an embedded space to represent the data with minimum structure loss, the proposed method further constrains the solution space using the labeled data. This paper introduces a parametric linear method and a nonparametric nonlinear method to tackle different types of color changes. We formalize our goal for color transfer as a quadratic cost function with a quadratic regularizer, which is convex differentiable and easy to solve. Experimental results show the effectiveness of our methods both quantitatively and qualitatively.

10:50-11:10, Paper MoAT3.2
Camera Self-Calibration from Tracking of Moving Persons
Tang, Zheng	Univ. of Washington
Lin, Yen-Shuo	National Chiao Tung Univ
Lee, Kuan-Hui	Univ. of Washington
Hwang, Jenq-Neng	Univ. of Washington
Chuang, Jen-Hui	National Chiao Tung Univ
Fang, Zhijun	Shanghai Univ. of Engineering Science
Keywords: Signal, image and video processing, Motion, tracking and video analysis, Image and video analysis and understanding Abstract: In a video surveillance system with a single static camera, tracking results of moving persons can be effectively used for camera self-calibration. However, the current methods need to depend on robustness of both tracking and segmentation procedures. RANSAC has been widely used to remove outliers in finding the vertical vanishing point and the horizon line, but the performance is degraded when the proportion of outliers is high. Last but not least, all of them require excessive simplifications in the algorithmic procedures resulting in increasing reprojection error. In this paper, a robust segmentation and tracking system is applied to provide accurate estimation of head and foot locations of moving persons. The noise in the computation of vanishing points is handled by mean shift clustering and Laplace linear regression through convex optimization. We also propose to use the estimation of distribution algorithm (EDA) to search for the local optimal solution for camera calibration that minimizes average reprojection error on the ground plane, while relaxing the assumptions on camera parameters. Promising evaluations of the performance of our proposed method on real scenes are presented.

11:10-11:30, Paper MoAT3.3
Efficient JPEG Decompression by the Alternating Direction Method of Multipliers
Sorel, Michal	Inst. of Information Theory and Automation, Czech Acad. Of
Bartoš, Michal	Inst. of Information Theory and Automation, Czech Acad. Of
Keywords: Enhancement, restoration and filtering, Signal, image and video processing Abstract: Standard decompression of JPEG images produces artifacts along edges and a disturbing checkerboard pattern. To reduce these artifacts, decompression can be formulated as an image reconstruction problem within a Bayesian maximum a posteriori probability framework. In this type of problem, the prior information about an image is typically given by the l₁ norm of its sparse domain representation. In this paper, we show how the solution of this problem can be achieved very efficiently using the alternating direction method of multipliers if the sparsity domain forms a tight frame. The proposed algorithm restores images without disturbing JPEG artifacts in several iterations, typically considerably fewer than competing algorithms. The quality of reconstruction, both visually and in terms of SNR, primarily depends on the tight frame used.

11:30-11:50, Paper MoAT3.4
Detection of Groups in Crowd Considering Their Activity State
Nakamura, Kazuaki	Osaka Univ
Ono, Tsukasa	Osaka Univ
Babaguchi, Noboru	Osaka Univ
Keywords: Signal, image and video processing, Image and video analysis and understanding, Statistical, syntactic and structural pattern recognition Abstract: In this paper, we focus on the problem of group detection in crowd, which is a task of partitioning a set of pedestrians in a scene into small subsets called groups based on their trajectories. Most previous methods use only a single model for representing a relationship between trajectories of pedestrians who belong to the same group. However, such a relationship would vary depending on the activity state (e.g. walking together, approaching, splitting, and so on) of the group. In this paper, we propose a novel group detection method which can cope with a variation of groups' activity state. The proposed method constructs different models for each activity state in order to appropriately evaluate the relationship of pedestrians' trajectories. In addition, our method regards groups' activity state as hidden variables and estimates their probability distributions, which are used for integrating the constructed models. The proposed method outperforms existing methods in the experiment on the public dataset.

11:50-12:10, Paper MoAT3.5
Manifold Regularized Multi-View Subspace Clustering for Image Representation
Wang, Lei	Xidian Univ
Li, Danping	Xidian Univ
He, Tiancheng	Houston Methodist Res. Inst
Xue, Zhong	Houston Methodist Res. Inst. Weill Cornell Medicine
Keywords: Signal, image and video processing, Image and video analysis and understanding Abstract: Subspace clustering refers to the task of clustering a collection of points drawn from a high-dimensional space into a union of multiple subspaces that best fits them. State-of-the-art approaches have been proposed for tackling this clustering problem by using the low-rank or sparse optimization techniques. However, most of the traditional subspace clustering methods are developed for single-view data and are not directly applicable to the multi-view scenario. In this paper, we present a Manifold Regularized Multi-view Subspace Clustering (MRMSC) method to better incorporate the correlated and complementary information from different views. MRMSC yields a unified affinity representation by joint optimization across different views. To respect the data manifold locally, the graph Laplacian is constructed to maintain the intrinsic geometrical structure of each view. In the multi-view integration, a sparsity constraint is imposed to the unified affinity representation in order to better reflect the data relationship from multiple views or features. In experiments, we compared the performance of clustering using MRMSC with the single-view and concatenate-multi-view methods on different datasets. The results showed that better clustering performance can be achieved by fusing the multiple features with a unified affinity representation by MRMSC.

12:10-12:30, Paper MoAT3.6
High Dynamic Range Image Processing Using Manifold-Based Ordering
Lezoray, Olivier	Univ. De Caen Normandie
Keywords: Signal, image and video processing, Computational photography, Enhancement, restoration and filtering Abstract: Very few research works have addressed the problem of directly manipulating raw HDR vectors for general HDR image processing. In this paper a framework is proposed towards this aim and is based on a new representation of HDR images in the form of an ordering of vectors and an index image. This makes it possible to formulate vector-preserving image processing methods dedicated to HDR images. The ordering relies on three steps: dictionary learning, manifold learning, and out of sample extension. The performance of the proposed approach is illustrated with innovative examples of HDR image filtering and enhancement.


MoAT4	Maya T2.B
MoAMO4	Oral Session

10:30-10:50, Paper MoAT4.1
Implicit Hybrid Video Emotion Tagging by Integrating Video Content and Users' Multiple Physiological Responses
Chen, Shiyu	Univ. of Science and Tech. of China
Wang, Shangfei	Univ. of Science and Tech. of China
Wu, Chongliang	USTC
Gao, Zhen	Univ. of Science and Tech. of China
Shi, Xiaoxiao	Univ. of Science and Tech. of China
Ji, Qiang	RPI
Keywords: Affective computing Abstract: The intrinsic interactions among a video's emotion tag, its content, and a user's spontaneous response while consuming the video can be leveraged to improve video emotion tagging, but this capability has not been thoroughly exploited yet. In this paper, we propose an implicit hybrid video emotion tagging approach by integrating video content and users' multiple physiological responses, which are only required during training. Specifically, multiple physiological signals during training construct a better emotion tagging model from video content. We add similarity constraints on the classifier mapping functions during training to capture the relationships among different kinds of features. We modify the traditional support vector machine with these constraints to improve video emotion tagging. Efficient learning algorithms of the proposed model are also developed. Experiments on three benchmark databases demonstrate the effectiveness and superior performance of our proposed method for implicitly integrating multiple physiological responses to improve video emotion tagging.

10:50-11:10, Paper MoAT4.2
Employing Subjects’ Information As Privileged Information for Emotion Recognition from EEG Signals
Shan, Wu	Univ. of Science and Tech. of China
Wang, Shangfei	Univ. of Science and Tech. of China
Zhu, Yachen	Univ. of Science and Tech. of China
Gao, Zhen	Univ. of Science and Tech. of China
Yue, Lihua	Univ. of Science and Tech. of China
Ji, Qiang	RPI
Keywords: Affective computing Abstract: Current research of emotion recognition from electroencephalogram (EEG) signals rarely considers common patterns embodied in multiple subjects and individual patterns for each subject simultaneously. Therefore, in this paper, we propose a novel emotion recognition approach using subjects or subject groups as privileged information, which is only available during training. First, five frequency features are extracted from each channel of the EEG signals, and features are selected by statistical tests. Then, we propose two three-node Bayesian networks to capture the joint probability distribution function of emotion labels, EEG features, and subjects or subject groups during training. Through the learned joint probability distribution, the Bayesian networks model both common and individual emotion patterns simultaneously. During testing, emotion labels can be estimated from EEG features only by marginalizing over the privileged information, i.e. subjects or subject groups. Experimental results on three benchmark databases, i.e. the MAHNOB-HCI database, the DEAP database and the USTC-ERVS database, demonstrate that our approach incorporating subjects and clusters achieves better emotion recognition performance than training a classifier for each subject, as well as training a classifier without subject information on the whole dataset.

11:10-11:30, Paper MoAT4.3
Online Speaker Emotion Tracking with a Dynamic State Transition Model
Cirakman, Ozgun	Istanbul Tech. Univ
Gunsel, Bilge	Istanbul Tech. Univ
Keywords: Affective computing, Human Computer Interaction Abstract: Although emotional state recognition from voice has been extensively studied, little effort has focused on online emotion recognition. Since the duration and intensity of emotional experiences change over time, it is hard to employ existing static transition models while monitoring emotional states, especially in an online setting. To overcome this difficulty we introduce a method which incorporates particle filter tracking for switching observation models with emotional state classification. Adopting the Active Field State Space (AFSS) used in modeling human social interactions, a dynamic state transition model is formulated in the continuous arousal-valence-stance space. Under the assumption that the target posterior of each emotional state is a GMM with an unknown number of mixture components, the observation model is constructed through a training scheme where DPM models of the emotional states are learned via SMC sampling. Online speaker emotional state labeling performance of the proposed method has been evaluated on long speech sequences containing emotional drift and transitions. Test sequences are simulated from EmoDB based on the AFSS interaction model. It is shown that formulating the emotional state classification as an online tracking problem provides a considerable improvement over the standard maximum likelihood classification approach. Test results demonstrate that the introduced method achieves 83% accuracy in an online setting, which is comparable to the performance of existing offline methods.

11:30-11:50, Paper MoAT4.4
Face Alignment with Cascaded Bidirectional LSTM Neural Networks
Chen, Yu	Nanjing Univ. of Science and Tech
Qian, Jianjun	Nanjing Univ. of Science and Tech
Yang, Jian	Nanjing Univ. of Science and Tech
Jin, Zhong	Nanjing Univ. of Science and Tech
Keywords: Biometric systems and applications, Deep learning, Face recognition Abstract: Face alignment is an important issue in many computer vision problems. The key problem is to find the nonlinear mapping from face image or feature to landmark locations. In this paper, we propose a novel cascaded approach with bidirectional Long Short Term Memory (LSTM) neural networks to approximate this nonlinear mapping. The cascaded structure is used to reduce the complexity of this problem and accelerate the algorithm by conducting the coarse-to-fine search. In each cascaded module, features of landmarks are delivered as inputs into the bidirectional LSTM network. The depth of the network guarantees the ability to learn highly complex mapping. The recurrent connections in LSTM explore the relationships of different landmarks and ensure that the shape of the face is maintained. On several challenging public databases, our approach achieves state-of-the-art performances.

11:50-12:10, Paper MoAT4.5
Sleep Position Classification from a Depth Camera Using Bed Aligned Maps
Grimm, Timo	Karlsruhe Inst. of Tech. Karlsruhe, Germany
Martinez, Manuel	Karlsruhe Inst. of Tech
Benz, Andreas	Thoraxklinik-Heidelberg GmbH, Heidelberg, Germany
Stiefelhagen, Rainer	Karlsruhe Inst. of Tech. & Fraunhofer IOSB, Karlsruhe
Keywords: Biometric systems and applications, Forensic biometrics and its applications, Deep learning Abstract: Sleep position is an important feature used to assess the quality and quantity of an individual's sleep. Furthermore, it is related to sleep disorders like sleep apnoea and snoring, and needs to be tracked in nursing homes to avoid pressure ulcers. Therefore, body position is registered during sleep studies, generally using a gravity sensor attached to the chest. We suggest a non-intrusive and cost-efficient approach to detect the sleep position based on a single depth camera. Compared to alternative state-of-the-art approaches, ours requires no calibration and has been evaluated in a real setting comprising 78 patients from a sleep laboratory. We use Bed Aligned Maps to extract a low-resolution descriptor from a depth map which is aligned to the bed position. This descriptor is then classified using Convolutional Neural Networks, achieving an accuracy of 94.0%, thus outperforming current state-of-the-art algorithms and even the contact sensor from the sleep laboratory, which achieves an accuracy of 91.9%.

12:10-12:30, Paper MoAT4.6
Learning Effective Gait Features Using LSTM
Feng, Yang	Univ. of Rochester
Li, Yuncheng	Univ. of Rochester
Luo, Jiebo	Univ. of Rochester
Attachments: Supplementary material Keywords: Gait recognition, Deep learning, Segmentation, features and descriptors Abstract: Human gait is an important biometric feature for person identification in surveillance videos because it can be collected at a distance without subject cooperation. Most existing gait recognition methods are based on the Gait Energy Image (GEI). Although the spatial information in one gait sequence can be well represented by GEI, the temporal information is lost. To solve this problem, we propose a new feature learning method for gait recognition. Not only can the learned feature preserve temporal information in a gait sequence, but it can also be applied to cross-view gait recognition. Heatmaps extracted by a convolutional neural network (CNN) based pose estimation method are used to describe the gait information in one frame. To model a gait sequence, the LSTM recurrent neural network is naturally adopted. Our LSTM model can be trained with unlabeled data, where the identity of the subject in a gait sequence is unknown. When labeled data are available, our LSTM works as a frame-to-frame view transformation model (VTM). Experiments on a gait benchmark demonstrate the efficacy of our method.


MoAT5	Maya T2.C
MoAMO5	Oral Session

10:30-10:50, Paper MoAT5.1
Automatic Detection of Laser-Induced Structures in Live Cell Fluorescent Microscopy Images Using Snakes with Geometric Constraints
Kondrat'ev, Alexandr Yu.	Lomonosov Moscow State Univ
Sorokin, Dmitry V.	Masaryk Univ
Keywords: Biological image and signal analysis, Biologically motivated vision, Segmentation, features and descriptors Abstract: The existence of reliable evaluation datasets for cell image registration algorithms is crucial for quantitative comparison of registration approaches. A new technique for creating real live cell image sequences for this purpose was introduced recently. These datasets contain stable structures bleached by an argon laser in the cell nucleus. In this work, we propose an approach for automatic detection of laser-induced linear structures in live cell fluorescent microscopy images. Compared to a previous linear laser-induced structure detection approach, our method employs an active contours model with a Hessian-based image energy term for linear structure enhancement and a geometry energy term controlling the geometric relations between the structures. It uses position-adaptive tension parameter values to adjust the snakes' behavior in problematic regions (end points and intersection points) and a temporally consistent scheme where the results from the previous frame are used as an initial approximation for the current frame. Our approach was successfully applied to real live cell microscopy image sequences, and an experimental comparison with an existing laser-induced structure detection method based on minimal paths has been performed.

10:50-11:10, Paper MoAT5.2
Skin Lesion Segmentation in Clinical Images Using Deep Learning
Jafari, Mohammad Hossein	Isfahan Univ. of Tech
Karimi, Nader	Isfahan Univ. of Tech
Nasr-Esfahani, Ebrahim	Isfahan Univ. of Tech
Samavi, Shadrokh	McMaster Univ
Soroushmehr, S.M. Reza	Univ. of Michigan
Ward, Kevin	Department of Emergency Medicine, Virginia Commonwealth Univ
Najarian, Kayvan	Univ. of Michigan
Keywords: Biological image and signal analysis, Deep learning, Classification and clustering Abstract: Melanoma is the most aggressive form of skin cancer and is on the rise. There exists a research trend for computerized analysis of suspicious skin lesions for malignancy using images captured by digital cameras. Analysis of these images is usually challenging due to the existence of disturbing factors such as illumination variations and light reflections from the skin surface. One important stage in the diagnosis of melanoma is segmentation of the lesion region from normal skin. In this paper, a method for accurate extraction of the lesion region is proposed that is based on deep learning approaches. The input image, after being preprocessed to reduce noise artifacts, is fed to a deep convolutional neural network (CNN). The CNN combines local and global contextual information and outputs a label for each pixel, producing a segmentation mask that shows the lesion region. This mask is further refined by some post-processing operations. The experimental results show that our proposed method can outperform the existing state-of-the-art algorithms in terms of segmentation accuracy.

11:10-11:30, Paper MoAT5.3
Machine Learning Framework Incorporating Expert Knowledge in Tissue Image Annotation
Kromp, Florian	Labdia Lab. GmbH
Ambros, Inge	Children's Cancer Res. Inst
Weiss, Tamara	Children's Cancer Res. Inst
Bogen, Dominik	Children's Cancer Res. Inst
Dodig, Helena	Children's Cancer Res. Inst
Berneder, Maria	Children's Cancer Res. Inst
Gerber, Teresa	Children's Cancer Res. Inst
Taschner-Mandl, Sabine	Children's Cancer Res. Inst
Ambros, Peter F.	Children's Cancer Res. Inst
Hanbury, Allan	Vienna Univ. of Tech
Keywords: Biological image and signal analysis, Machine learning and data mining Abstract: The annotation of cellular nuclei in images of tissue sections is a time consuming but crucial task in quantitative microscopy. We present a machine learning framework incorporating expert knowledge enabling biologists to annotate a large number of nuclear images in a reasonable time. The proposed system is designed to generate three successive levels of annotation, each presenting more details until single nuclei are annotated. Moreover, the output of each level is used to update the model of the next level, increasing the performance speed of the system. A crucial task is the separation of aggregated nuclei. This task is modeled as an Integer Linear Program (ILP), based on the output of an ensemble segmentation and solved by using a genetic algorithm. To incorporate user input at runtime, we use a modified version of an Online Random Forest (ORF). The proposed system was tested by biologists annotating images of ganglioneuroma tissue sections. Results show that the time biologists need to annotate an image is considerably reduced after the system has been trained.

11:30-11:50, Paper MoAT5.4
Quantitative Analysis of Facial Paralysis Based on Limited-Orientation Modified Circular Gabor Filters
Ngo, Truc Hung	Ritsumeikan Univ
Chen, Yen-wei	Ritsumeikan Univ
Seo, Masataka	Ritsumeikan Univ
Matsushiro, Naoki	Osaka Pol. Hospital
Xiong, Wei	Inst. for Infocomm Res. A-STAR
Keywords: Biological image and signal analysis, Medical image and signal analysis, Computer-aided detection and diagnosis Abstract: The diagnosis of disease with the aid of computer programs has been developing more and more in recent years. This paper presents an approach which is based on frequency technique for the objective quantitative analysis of facial paralysis. In this method, limited-orientation modified circular Gabor filters (LO-MCGFs) are used to enhance the desirable frequencies in images. Then, features are extracted from the filtered images for classification. The first advantage of the LO-MCGF is that its inner passbands are uniform, so it helps remove noise and control frequencies more effectively. The second benefit is that the LO-MCGF utilizes the existing robust characteristics of circular Gabor filter for rotation invariant texture regions. Hence, the LO-MCGF-based technique improves remarkably the accuracies of score estimation for some expressions whose local textures are invariant in rotation. Finally, the limited filtered regions, or limited propagation orientations, help the LO-MCGF focus on only some specific spaces. Therefore, the LO-MCGF can avoid the influences of irrelevant regions. In other words, it improves the spatial localization. For overall evaluation, experiments show that our proposed method is superior to other contemporary techniques tested on a dynamic facial expression database.

11:50-12:10, Paper MoAT5.5
Skin Disease Classification versus Skin Lesion Characterization: Achieving Robust Diagnosis Using Multi-Label Deep Neural Networks
Liao, Haofu	Univ. of Rochester
Li, Yuncheng	Univ. of Rochester
Luo, Jiebo	Univ. of Rochester
Keywords: Computer-aided detection and diagnosis, Medical image and signal analysis Abstract: In this study, we investigate what a practically useful approach is in order to achieve robust skin disease diagnosis. A direct approach is to target the ground truth diagnosis labels, while an alternative approach instead focuses on determining skin lesion characteristics that are more visually consistent and discernible. We argue that, for computer-aided skin disease diagnosis, it is both more realistic and more useful that lesion type tags should be considered as the target of an automated diagnosis system, such that the system can first achieve a high accuracy in describing skin lesions and in turn facilitate disease diagnosis using lesion characteristics in conjunction with other evidence. To further meet such an objective, we employ convolutional neural networks (CNNs) for both the disease-targeted and lesion-targeted classifications. We have collected a large-scale and diverse dataset of 75665 skin disease images from six publicly available dermatology atlases. We then train and compare disease-targeted and lesion-targeted classifiers. For disease-targeted classification, only 27.6% top-1 accuracy and 57.9% top-5 accuracy are achieved, with a mean average precision (mAP) of 0.42. In contrast, for lesion-targeted classification, we can achieve a much higher mAP of 0.70.

12:10-12:30, Paper MoAT5.6
Remote Photoplethysmography Based on Implicit Living Skin Tissue Segmentation
Bobbia, Serge	Univ. Bourgogne Franche-Comté, Le2i Lab
Benezeth, Yannick	Univ. De Bourgogne
Dubois, Julien	Univ. Bourgogne Franche-Comté, Le2i Lab
Keywords: Biological image and signal analysis, Segmentation, features and descriptors, Signal, image and video processing Abstract: Region of interest selection is an essential part of remote photoplethysmography (rPPG) algorithms. Most of the time, face detection provided by supervised learning of physical appearance features, coupled with skin detection, is used for region of interest selection. However, both methods have several limitations, and we propose to implicitly select living skin tissue via its particular pulsatility feature. The input video stream is decomposed into several temporal superpixels from which pulse signals are extracted. A pulsatility measure for each temporal superpixel is then used to merge pulse traces and estimate the photoplethysmogram signal. This makes it possible to select skin tissue and, furthermore, to favor areas where the pulse trace is more predominant. Experimental results show that our method performs better than state-of-the-art algorithms without any critical face or skin detection.


Mo2PL	G.Cancun T1
J. K. Aggarwal Prize - Fei Fei Li	Plenary Session


MoPT1	Poster Session Hall
MoP1	Poster Session

15:00-17:10, Paper MoPT1.1
A Joint Facial Point Detection Method of Deep Convolutional Network and Shape Regression
Yang, Tangqin	Univ. of Electronic Science and Tech. of China
Shu, Chang	Univ. of Electronic Science and Tech. of China
Zhou, Ning	Univ. of Electronic Science and Tech. of China
Keywords: Deep learning, Face recognition, Facial expression recognition Abstract: Facial landmark detection is a challenging task with broad applications. Many approaches have been proposed with varying degrees of success. Regression based methods update the facial point positions iteratively. The mean shape, or shapes sampled from the training set, are often used as the initialization, which may sometimes lead to a local minimum during the update due to the offset between initial positions and target positions. On the other hand, convolutional network based methods show superior accuracy, at the cost of a complex and unwieldy deep model architecture. To address the above limitations, a new approach for facial landmark detection is proposed in this paper. The key idea is to combine deep convolutional networks with a shape regression approach. Deep convolutional networks in the first level provide a highly robust initial shape, while the following regression fine-tunes the initial prediction to achieve high accuracy. Extensive experiments show that the regression based methods are very sensitive to initialization and that the proposed approach (i) achieves high accuracy on public datasets, in particular outperforming existing methods in challenging conditions like large pose expression variation, and (ii) reduces model complexity drastically compared to previous methods.

15:00-17:10, Paper MoPT1.2
Complexity of Multiverse Networks and Their Multilayer Generalization
Littwin, Etai	TAU
Wolf, Lior	Tel Aviv Univ
Keywords: Deep learning, Face recognition, Machine learning and data mining Abstract: Multiverse networks were recently proposed as a method for promoting more effective transfer learning. While an extensive analysis was proposed, this analysis failed to capture two main aspects of such networks: (i) the rank of the representation is much lower than the rank predicted by the analysis; and (ii) the contribution of increased multiplicity in such networks diminishes quickly. In this work, we propose additional analysis of multiverse networks which addresses both deficits. A major contribution of our work is quantifying the Rademacher complexity of the multiverse network. It is shown that the complexity upper bound of multiverse networks is significantly lower than that of conventional networks, and diminishes by a factor of $\sqrt{k}$, k being the multiplicity. In addition, we generalize the notion of multiverse networks to multilayer multiverse networks. We derive a Rademacher complexity formula for such networks and present experimental results.

15:00-17:10, Paper MoPT1.3
Siamese Network Features for Image Matching
Melekhov, Iaroslav	Aalto Univ
Kannala, Juho	Aalto Univ
Rahtu, Esa	Univ. of Oulu
Keywords: Deep learning, Image and video analysis and understanding, Multimedia analysis, indexing and retrieval Abstract: Finding matching images across large datasets plays a key role in many computer vision applications such as structure-from-motion (SfM), multi-view 3D reconstruction, image retrieval, and image-based localisation. In this paper, we propose finding matching and non-matching pairs of images by representing them with neural network based feature vectors, whose similarity is measured by Euclidean distance. The feature vectors are obtained with convolutional neural networks which are learnt from labeled examples of matching and non-matching image pairs by using a contrastive loss function in a Siamese network architecture. Previously Siamese architecture has been utilised in facial image verification and in matching local image patches, but not yet in generic image retrieval or whole-image matching. Our experimental results show that the proposed features improve matching performance compared to baseline features obtained with networks which are trained for image classification task. The features generalize well and improve matching of images of new landmarks which are not seen at training time. This is despite the fact that the labeling of matching and non-matching pairs is imperfect in our training data. The results are promising considering image retrieval applications, and there is potential for further improvement by utilising more training image pairs with more accurate ground truth labels.
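The contrastive loss mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the commonly used margin-based form of the loss on Euclidean distance; the function names and the default `margin` value are hypothetical and are not taken from the paper.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(feat_a, feat_b, is_match, margin=1.0):
    """Margin-based contrastive loss for one image pair (assumed form).

    Matching pairs are penalised by their squared distance.
    Non-matching pairs are penalised only if closer than `margin`.
    """
    d = euclidean_distance(feat_a, feat_b)
    if is_match:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2
```

In this form, a non-matching pair that is already farther apart than the margin contributes zero loss, so training effort concentrates on pairs that are still confused.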

15:00-17:10, Paper MoPT1.4
Exploring Deep Learning Based Solutions in Fine Grained Activity Recognition in the Wild
Cao, Song	Univ. of Southern California
Nevatia, Ram	USC
Keywords: Deep learning, Pattern Recognition for Surveillance and Security Abstract: In this paper, we explore the usage of deep learning based solutions in fine grained activity recognition in the wild. As a powerful tool, deep learning has been widely used in image classification, object detection and activity recognition. We focus on implementing deep learning methods into the more complicated fine grained activity recognition problems. We test our solutions on MPII activity dataset with 410 activities. We find that due to the challenges of large intra class variances, small inter class variances, and limited training samples per activity, the classical two stream deep ConvNets method does not perform that well for fine grained activity recognition. Observing these issues, we propose a solution to directly use deep features learned from ImageNet in an SVM. In experiments, we achieve a 20 percent improvement compared to the classical two stream deep ConvNets solutions, on MPII fine grained activity challenge videos.

15:00-17:10, Paper MoPT1.5
Robust Deep Appearance Models
Quach, Kha Gia	Concordia Univ
Duong, Chi Nhan	Univ. of Science, HCMC
Luu, Khoa	Carnegie Mellon Univ
Bui, Tien D.	Concordia Univ
Keywords: Deep learning, Representation and analysis in pixel/voxel images, Image based modeling Abstract: This paper presents a novel Robust Deep Appearance Models (RDAMs) approach to learn the non-linear correlation between shape and texture of face images. In this approach, two crucial components of face images, i.e. shape and texture, are represented by Deep Boltzmann Machines and Robust Deep Boltzmann Machines (RDBM), respectively. The RDBM, an alternative form of Robust Boltzmann Machines, can separate corrupted/occluded pixels in the texture modeling to achieve better reconstruction results. The two models are connected by Restricted Boltzmann Machines at the top layer to jointly learn and capture the variations of both facial shapes and appearances. This paper also introduces new fitting algorithms with occlusion awareness through the mask obtained from the RDBM reconstruction. The proposed approach is evaluated in various applications by using challenging face datasets, i.e. Labeled Face Parts in the Wild (LFPW), Helen, EURECOM and AR databases, to demonstrate its robustness and capabilities.

15:00-17:10, Paper MoPT1.6
A Transitive Aligned Weisfeiler-Lehman Subtree Kernel
Bai, Lu	Central Univ. of Finance and Ec
Rossi, Luca	Aston Univ
Cui, Lixin	School of Information, Central Univ. of Finance and Ec
Hancock, Edwin	Univ. of York
Keywords: Support vector machines and kernel methods Abstract: In this paper, we develop a new transitive aligned Weisfeiler-Lehman subtree kernel. This kernel not only overcomes the shortcoming of ignoring correspondence information between isomorphic substructures that arises in existing R-convolution kernels, but also guarantees the transitivity between the correspondence information that is not available for existing matching kernels. Our kernel outperforms state-of-the-art graph kernels in terms of classification accuracy on standard graph datasets.

15:00-17:10, Paper MoPT1.7
Robust Kernel Principal Nested Spheres
Awate, Suyash	Indian Inst. of Tech. (IIT) Bombay
Dhar, Manik	IIT Bombay
Kulkarni, Nilesh	Indian Inst. of Tech. Bombay
Keywords: Support vector machines and kernel methods, Dimensionality reduction and manifold learning, Classification and clustering Abstract: Kernel principal component analysis (kPCA) learns nonlinear modes of variation in the data by nonlinearly mapping the data to kernel feature space and performing (linear) PCA in the associated reproducing kernel Hilbert space (RKHS). However, several widely-used Mercer kernels map data to a Hilbert sphere in RKHS. For such directional data in RKHS, linear analyses can be unnatural or suboptimal. Hence, we propose an alternative to kPCA by extending principal nested spheres (PNS) to RKHS without needing the explicit lifting map underlying the kernel, but solely relying on the kernel trick. It generalizes the model for the residual errors by penalizing the Lp norm / quasi-norm to enable robust learning from corrupted training data. Our method, termed robust kernel PNS (rkPNS), relies on the Riemannian geometry of the Hilbert sphere in RKHS. Relying on rkPNS, we propose a novel algorithm for classification and evaluate it on several real-world datasets, where rkPNS improves over the state of the art.

15:00-17:10, Paper MoPT1.8
Kernelized Covariance for Action Recognition
Cavazza, Jacopo	Istituto Italiano Di Tecnologia
Zunino, Andrea	IIT
San Biagio, Marco	Istituto Italiano Di Tecnologia
Murino, Vittorio	Istituto Italiano Di Tecnologia
Attachments: Supplementary material Keywords: Support vector machines and kernel methods, Gesture and Behavior Analysis Abstract: In this paper we aim at increasing the descriptive power of the covariance matrix, which is limited to capturing only linear mutual dependencies between variables. We present a rigorous and principled mathematical pipeline to recover the kernel trick for computing the covariance matrix, enhancing it to model more complex, non-linear relationships conveyed by the raw data. To this end, we propose Kernelized-COV, which generalizes the original covariance representation without compromising the efficiency of the computation. In the experiments, we validate the proposed framework against many previous approaches in the literature, scoring on par with or better than the state of the art on benchmark datasets for 3D action recognition.

15:00-17:10, Paper MoPT1.9
Refining Pre-Image Via Error Compensation for KPCA-Based Pattern Denoising
Li, Jianwu	School of Computer Science and Tech. Beijing Inst. Of
Tu, Qiang	Beijing Inst. of Tech
Yan, Ziye	Beijing Inst. of Tech
Keywords: Support vector machines and kernel methods, Signal, image and video processing Abstract: Finding the pre-image is crucial for kernel principal component analysis (KPCA) based pattern de-noising. This paper proposes to learn the systematic error of some classical methods of pre-image finding, and to refine the obtained pre-image via error compensation. Experiments based on simulated data as well as real-world data demonstrate that the proposed approach can effectively improve the results from two classical pre-image methods: gradient descent and distance constraint.

15:00-17:10, Paper MoPT1.10
One-Class Slab Support Vector Machine
Fragoso, Victor	West Virginia Univ
Scheirer, Walter	Univ. of Notre Dame
Hespanha, Joao	UCSB
Turk, Matthew	UC Santa Barbara
Attachments: Supplementary material Keywords: Support vector machines and kernel methods, Statistical, syntactic and structural pattern recognition, Machine learning and data mining Abstract: This work introduces the one-class slab SVM (OCSSVM), a one-class classifier that aims at improving the performance of the one-class SVM. The proposed strategy reduces the false positive rate and increases the accuracy of detecting instances from novel classes. To this end, it uses two parallel hyperplanes to learn the normal region of the decision scores of the target class. OCSSVM extends one-class SVM since it can scale and learn non-linear decision functions via kernel methods. The experiments on two publicly available datasets show that OCSSVM can consistently outperform the one-class SVM and perform comparably to or better than other state-of-the-art one-class classifiers.
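The difference between a single-hyperplane decision and the two-parallel-hyperplane "slab" described above can be sketched as follows. This is only an illustration under stated assumptions: it uses a linear score rather than a kernel, and the names `w`, `rho`, `lower`, and `upper` are hypothetical placeholders for parameters that a trained model would supply.

```python
def linear_score(w, x):
    """Decision score of x along the direction w (linear case only)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def one_class_accepts(w, x, rho):
    """Single hyperplane: accept every score on one side of the offset."""
    return linear_score(w, x) >= rho


def slab_accepts(w, x, lower, upper):
    """Two parallel hyperplanes: accept only scores inside the slab."""
    return lower <= linear_score(w, x) <= upper
```

With the single hyperplane, an arbitrarily large score is still accepted; bounding the score from both sides is what lets a slab reject such instances.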

15:00-17:10, Paper MoPT1.11
On Regularization Parameter Estimation under Covariate Shift
Kouw, Wouter Marco	Delft Univ. of Tech
Loog, Marco	Delft Univ. of Tech. / Univ. of Copenhagen
Keywords: Transfer learning, Model selection, Classification and clustering Abstract: This paper identifies a problem with the usual procedure for L²-regularization parameter estimation in a domain adaptation setting. In such a setting, there are differences between the distributions generating the training data (source domain) and the test data (target domain). The usual cross-validation procedure requires validation data, which cannot be obtained from the unlabeled target data. The problem is that if one decides to use source validation data, the regularization parameter is underestimated. One possible solution is to scale the source validation data through importance weighting, but we show that this correction is not sufficient. We conclude the paper with an empirical analysis of the effect of several importance weight estimators on the estimation of the regularization parameter.
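The importance-weighting correction that the abstract examines (and reports as insufficient) can be sketched as a weighted validation error. The sketch assumes one-dimensional inputs and known Gaussian source and target densities, which is an idealisation; in practice the weights must be estimated, and all names here are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weights(xs, source, target):
    """Weights w(x) = p_target(x) / p_source(x) for (mean, std) pairs."""
    return [gaussian_pdf(x, *target) / gaussian_pdf(x, *source) for x in xs]


def weighted_validation_error(errors, weights):
    """Importance-weighted mean of per-sample source validation errors."""
    return sum(w * e for w, e in zip(weights, errors)) / len(errors)
```

When source and target coincide, every weight equals one and the estimate reduces to the ordinary validation error.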

15:00-17:10, Paper MoPT1.12
Unsupervised Cyber Bullying Detection in Social Networks
Di Capua, Michele	Univ. of Milan
Di Nardo, Emanuel	Univ. of Naples "Parthenope"
Petrosino, Alfredo	Univ. of Naples "Parthenope"
Keywords: Classification and clustering, Document Understanding, Artificial neural networks Abstract: Modern young people (digital natives) have grown up in an era dominated by new technologies where communications are pushed to quite a real-time level, and pose no limits in establishing relationships with other people or communities. However, the speed of evolution does not allow young people to consciously distinguish acceptable behaviors from potentially harmful ones, and a new phenomenon known as cyber bullying is emerging with increasing evidence, attracting the attention of educators and media. Cyber bullying is defined as "willful and repeated harm inflicted through the use of electronic devices". In this paper we propose a possible solution for automatic detection of bully traces over a social network, using techniques derived from NLP (Natural Language Processing) and machine learning. Specifically, we design a model inspired by Growing Hierarchical SOMs, able to efficiently cluster documents containing bully traces, built upon semantic and syntactic features of textual sentences. We fine-tuned our model to work with the social network Twitter, but we also tested the model against other social networks such as YouTube and Formspring. Finally, we report our results, showing that the proposed unsupervised approach could be effectively used with good performance in some scenarios.

15:00-17:10, Paper MoPT1.13
Fast Random K-Labelsets for Large-Scale Multi-Label Classification
Kimura, Keigo	Hokkaido Univ
Kudo, Mineichi	Hokkaido Univ
Sun, Lu	Hokkaido Univ
Kohjaku, Sadamori	Hokkaido Univ
Keywords: Classification and clustering, Machine learning and data mining Abstract: Multi-label classification (MLC), which allows instances to have multiple labels, has received a surge of interest in recent years due to its wide range of applications such as image annotation and document tagging. One of the simplest ways to solve MLC problems is the label powerset method (LP), which regards all possible label subsets as classes. LP makes traditional multi-class classifiers such as multi-class SVM applicable, but it suffers from the increased number of classes. Therefore, several improvements have been made for LP to be scaled to large problems with many labels. Random k-labELsets (RAkEL), proposed by Tsoumakas et al., solves this problem by randomly sampling a small number of labels and taking an ensemble of the resulting models. However, RAkEL needs all instances for constructing each model and thus suffers from high computational complexity. In this paper, we propose a new fast algorithm for RAkEL. First, we assign each training instance to a small number of models. Then LP is applied for each model with only the assigned instances. Experiments on twelve benchmark datasets demonstrated that the proposed algorithm works faster than the conventional methods while keeping accuracy. In the best case, it was 100 times faster than the baseline method (LP) and 30 times faster than the original RAkEL.
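The label powerset transformation and the random sampling of small labelsets described above can be sketched in a few lines. This illustrates only the baseline ideas (LP and the labelset sampling step of RAkEL), not the fast instance-assignment scheme proposed in the paper; the function names and the `seed` parameter are hypothetical.

```python
import random


def label_powerset_classes(label_sets):
    """Map each distinct label subset to one class id (label powerset)."""
    class_of = {}
    ys = []
    for labels in label_sets:
        key = frozenset(labels)
        if key not in class_of:
            class_of[key] = len(class_of)
        ys.append(class_of[key])
    return ys, class_of


def random_k_labelsets(all_labels, k, n_models, seed=0):
    """Draw one random k-subset of the labels per ensemble member."""
    rng = random.Random(seed)
    pool = sorted(all_labels)
    return [sorted(rng.sample(pool, k)) for _ in range(n_models)]


def restrict(label_sets, labelset):
    """Project every instance's labels onto one sampled labelset."""
    keep = set(labelset)
    return [set(labels) & keep for labels in label_sets]
```

Each ensemble member would then train an ordinary multi-class classifier on `label_powerset_classes(restrict(label_sets, labelset))`, which keeps the number of classes per model at most 2^k.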

15:00-17:10, Paper MoPT1.14
Ensemble-Driven Support Vector Clustering: From Ensemble Learning to Automatic Parameter Estimation
Huang, Dong	South China Agricultural Univ
Wang, Chang-Dong	Sun Yat-Sen Univ
Lai, Jian-huang	Sun Yat-Sen Univ
Liang, Yun	South China Agricultural Univ
Bian, Shan	Bianshan@scau.edu.cn
Chen, Yu	South China Agricultural Univ
Keywords: Classification and clustering, Machine learning and data mining Abstract: Support vector clustering (SVC) is a versatile clustering technique that is able to identify clusters of arbitrary shapes by exploiting the kernel trick. However, one hurdle that restricts the application of SVC lies in its sensitivity to the kernel parameter and the trade-off parameter. Although many extensions of SVC have been developed, to the best of our knowledge, there is still no algorithm that is able to effectively estimate the two crucial parameters in SVC without supervision. In this paper, we propose a novel support vector clustering approach termed ensemble-driven support vector clustering (EDSVC), which for the first time tackles the automatic parameter estimation problem for SVC based on ensemble learning, and is capable of producing robust clustering results in a purely unsupervised manner. Experimental results on multiple real-world datasets demonstrate the effectiveness of our approach.

15:00-17:10, Paper MoPT1.15
Axioms to Characterize Efficient Incremental Clustering
Bandyopadhyay, Sambaran	IBM Res
Musti, Narasimha Murty	Indian Inst. of Science
Keywords: Classification and clustering, Machine learning and data mining Abstract: Although clustering has been one of the central tasks in machine learning for the last few decades, analysis of clustering irrespective of any particular algorithm was not undertaken for a long time. In the recent literature, axiomatic frameworks have been proposed for clustering and its quality. But none of the proposed frameworks has concentrated on the computational aspects of clustering, which are essential in current big data analytics. In this paper, we propose an axiomatic framework for clustering which considers both the quality and the computational complexity of clustering algorithms. The axioms we propose necessarily associate the problem of clustering with the important concepts of incremental learning and divide-and-conquer learning. We also propose an order-independent incremental clustering algorithm which satisfies all of these axioms in some constrained manner.

15:00-17:10, Paper MoPT1.16
WENN for Individualized Cleaning in Imbalanced Data
Guan, Hongjiao	Harbin Inst. of Tech
Zhang, Yingtao	Harbin Inst. of Tech
Xian, Min	Utah State Univ
Cheng, Heng-Da	Utah State Univ
Tang, Xianglong	Harbin Inst. of Tech
Keywords: Classification and clustering, Machine learning and data mining Abstract: This paper proposes individualized cleaning for diverse imbalanced data sets. Existing techniques for data cleaning have difficulties with rare cases and outliers in the minority class, especially in highly unbalanced data. This drawback leads to incomplete and imprecise removal of examples. In order to enhance the robustness and perform thorough data cleaning, we propose a weighted edited nearest neighbor (WENN), which detects and removes noisy examples from both classes intelligently. It considers the individual characteristics of each imbalanced data set, involving global class imbalance and local distribution. The main idea of the proposed method is to carefully put more focus on the majority class than the minority class during data cleaning. Extensive experiments over synthetic and real data clearly validate the superiority of our approach against other data cleaning methods.
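For context, the unweighted edited nearest neighbor rule that WENN builds on can be sketched as below. This is the plain baseline, which removes an example when the majority of its k nearest neighbors carries a different label; the weighting scheme of WENN is not specified in the abstract and is not reproduced here. The function name and default `k` are assumptions.

```python
from collections import Counter


def edited_nearest_neighbor(points, labels, k=3):
    """Return indices of examples kept by the plain (unweighted) ENN rule."""
    keep = []
    for i, p in enumerate(points):
        # Rank all other examples by squared Euclidean distance to p.
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: sum((a - b) ** 2 for a, b in zip(p, points[j])))
        votes = Counter(labels[j] for j in others[:k])
        majority_label = votes.most_common(1)[0][0]
        # Keep the example only if its neighborhood agrees with its label.
        if majority_label == labels[i]:
            keep.append(i)
    return keep
```

On imbalanced data this plain rule tends to delete isolated minority examples, which is the kind of behavior the abstract's individualized cleaning aims to handle more carefully.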

15:00-17:10, Paper MoPT1.17
Multi-Label Learning with Global Density Fusion Mapping Features
Guo, Yumeng	Tongji Univ. and the Hong Kong Pol. Univ
Chung, Fu-lai	Hong Kong Pol. Univ
Li, Guozheng	Tongji Univ
Keywords: Classification and clustering, Machine learning and data mining Abstract: Multi-label learning, where each instance is assigned to multiple categories simultaneously, is a prevalent problem in data analysis. Previous approaches typically learn from multi-label data by employing the original feature space in the discrimination process of all class labels. However, this traditional strategy might be suboptimal, as the original feature space contains irrelevant or redundant information, which affects the performance of classification. In this paper, we propose another strategy to learn from multi-label data, where a reconstructed feature space is exploited to improve the classification performance. Accordingly, an intuitive yet effective algorithm named ATOM, i.e. multi-label learning with globAl densiTy fusiOn Mapping features, is proposed. ATOM first reconstructs feature spaces specific to each and no label by conducting cluster analysis on its belonging instances, then utilizes density fusion to excavate optimum centers from the cluster center union, and finally performs classification by querying the reconstructed feature spaces. Comprehensive experimental results on a total of 12 benchmark data sets clearly validate the superiority of ATOM against other competitive algorithms.

15:00-17:10, Paper MoPT1.18
A New Density Kernel in Density Peak Based Clustering
Hou, Jian	Bohai Univ
Pelillo, Marcello	Ca' Foscari Univ
Keywords: Classification and clustering, Machine learning and data mining Abstract: The clustering algorithm by fast search and find of density peaks has been shown to be a promising clustering approach. However, this algorithm involves manual selection of cluster centers, which is not convenient in practical applications. In this paper we discuss the correlation between density peaks and cluster centers. As a result, we present a new local density estimation method to highlight the uniqueness of cluster centers by making use of the farthest ones in nearest neighbors of data. Furthermore, we propose to use density normalization to deal with the density difference among clusters. Given the number of clusters, our algorithm is able to accomplish the clustering process without human intervention and improve the clustering results. In experiments on several datasets, our algorithm is shown to clearly outperform the original one with both cutoff and Gaussian kernels.
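
As background for the cutoff and Gaussian kernels mentioned above, here is a minimal sketch of the two per-point quantities used by the original density peak method: the local density rho and the distance delta to the nearest point of higher density. It does not implement the new density kernel or the density normalization proposed in this paper. The function name `density_peak_quantities` and the parameter name `d_c` (cutoff distance) are illustrative assumptions.

```python
import numpy as np

def density_peak_quantities(X, d_c, kernel="gaussian"):
    """Return (rho, delta) for each row of X, following the original method.

    rho:   local density (cutoff count or Gaussian-weighted sum)
    delta: distance to the nearest point with strictly higher density;
           for a point with no denser point, the largest distance to any point.
    """
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    if kernel == "cutoff":
        # Count neighbors closer than d_c; subtract 1 to exclude the point itself.
        rho = (D < d_c).sum(axis=1) - 1.0
    else:
        # Gaussian kernel; subtract the self-contribution exp(0) = 1.
        rho = np.exp(-(D / d_c) ** 2).sum(axis=1) - 1.0
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = rho > rho[i]
        delta[i] = D[i, higher].min() if higher.any() else D[i].max()
    return rho, delta
```

Points with both large rho and large delta are the candidates for cluster centers; in the original method they are picked by hand from a plot of rho against delta, which is the manual step the abstract refers to.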

15:00-17:10, Paper MoPT1.19
Global Convergence of a Novel Hybrid Gene Clustering Algorithm
Yan Gong, Gong	School of mathematical science, Dalian Univ. of Tech.
Yan Liu, Yan	School of information science and engineering, Dalian Pol. Univ.
Huanan Wang, Huanan	School of mathematical science, Dalian Univ. of Tech.
Jing Wang, Jing	School of telecommunications, Dalian Univ. of Tech.
Wei Wu, Wei	School of mathematical science, Dalian Univ. of Tech.

15:00-17:10, Paper MoPT1.20
Multiple Instance Learning with Multiple Positive and Negative Target Concepts
Karem, Andrew	Univ. of Louisville
Frigui, Hichem	Univ. of Louisville
Keywords: Classification and clustering, Machine learning and data mining, Dimensionality reduction and manifold learning Abstract: We introduce a new algorithm that maps multiple instance data using both positive and negative target concepts into a data representation suitable for standard classification. Multiple instance data are characterized by bags, which are in turn characterized by a variable number of feature vectors or instances. Each bag has a known positive or negative label, but the label of any given instance within a bag is unknown. First, we use the Fuzzy Clustering of Multiple Instance data (FCMI) algorithm to identify K+ positive target concepts, which represent points in the feature space that are close to instances from positive bags and distant from instances from negative bags. We use a simple K-means clustering algorithm to identify K- negative target concepts that supplement the positive target concepts. Next we demonstrate how the positive and negative target concepts can be used to embed each bag, which has a variable number of instances, into a feature vector with fixed dimension. A key advantage of embedded instance space feature vectors is that standard machine learning algorithms may be used in training and testing multiple instance data. Another advantage of our embedding is that it provides a simple and intuitive interpretation of the data. We show that using our feature embedding, coupled with standard classifiers such as support vector machines or k-nearest neighbors, can outperform state-of-the-art Multiple Instance Learning classifiers on benchmark datasets.
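
To illustrate how a bag with a variable number of instances can become a fixed-length vector, the sketch below embeds a bag by taking, for each target concept, the distance to the closest instance in the bag. The abstract does not state the exact embedding function, so the minimum-distance choice, the function name `embed_bag`, and the toy values are assumptions made only for illustration.

```python
import numpy as np

def embed_bag(bag, concepts):
    """Embed a bag (n_instances x d) as a vector with one entry per concept.

    Assumption: each entry is the Euclidean distance from the concept to the
    closest instance in the bag; the paper's actual embedding may differ.
    """
    bag = np.asarray(bag, dtype=float)
    concepts = np.asarray(concepts, dtype=float)
    # dists[i, j] = distance between instance i and concept j
    dists = np.linalg.norm(bag[:, None, :] - concepts[None, :, :], axis=-1)
    return dists.min(axis=0)

# Hypothetical concepts: the first stands in for a positive target concept,
# the second for a negative one.
concepts = [[0.0, 0.0], [3.0, 0.0]]
small_bag = [[0.0, 0.0], [3.0, 4.0]]
large_bag = [[1.0, 0.0], [2.0, 0.0], [9.0, 9.0]]
features = np.stack([embed_bag(small_bag, concepts), embed_bag(large_bag, concepts)])
```

Both bags map to vectors of length K+ + K- (here 2), so the resulting rows of `features` can be passed to any standard classifier.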

15:00-17:10, Paper MoPT1.21
An Optimal Multiclass Classifier Design
Fiori, Marcelo	Univ. De La República
Di Martino, Matias	Facultad De Ingeniería, Univ. De La República
Fernández, Alicia	Facultad De Ingeniería, Univ. De La República
Keywords: Classification and clustering, Machine learning and data mining, Performance Evaluation Abstract: The use of different evaluation measures for classification tasks has gained a significant amount of attention in the past decade, especially for problems with multiple and imbalanced classes. However, the optimization of classifiers with respect to these measures is still heuristic, using ad-hoc rules with classical accuracy-optimized classifiers. We propose a classifier designed specifically to optimize one of the possible measures, namely, the so-called G-mean. Nevertheless, the technique is general, and it can be used to optimize generic evaluation measures. The optimization algorithm to train the classifier is described, and the numerical scheme is tested, showing its usability and robustness. The code is publicly available, as well as the datasets used throughout this paper.
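
As a reminder of the measure being optimized, the G-mean is commonly defined as the geometric mean of the per-class recalls. The sketch below computes it from label arrays; it only evaluates the measure and is not the authors' optimization algorithm. The function name `g_mean` is an illustrative assumption.

```python
import numpy as np

def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (classes taken from y_true)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    # Recall of class c: fraction of true-c examples that were predicted as c.
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.prod(recalls) ** (1.0 / len(classes)))
```

Unlike accuracy, the G-mean drops to zero as soon as one class is never recognized, which is why it is often preferred for imbalanced problems: predicting only the majority class can give high accuracy but a G-mean of 0.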

15:00-17:10, Paper MoPT1.22
Misclassification Tolerable Learning for Robust Pedestrian Orientation Classification
Kawanishi, Yasutomo	Nagoya Univ
Deguchi, Daisuke	Nagoya Univ
Ide, Ichiro	Nagoya Univ
Murase, Hiroshi	Nagoya Univ
Fujiyoshi, Hironobu	Chubu Univ
Keywords: Classification and clustering, Machine learning and data mining, Support vector machines and kernel methods Abstract: In this paper, we propose a pedestrian orientation classification method which reduces "fatal" misclassifications by cost-relaxation of tolerable misclassifications in the training of one-against-all classifiers. For each binary classifier among the one-against-all classifiers, we introduce a new class group, "conceptually similar classes," whose class labels are similar to the positive class. In the case of pedestrian orientation classification, the conceptually similar classes are defined as the orientations neighboring the positive orientation. We consider the misclassification of the conceptually similar classes as the positive class to be a tolerable misclassification. By relaxing the cost of the tolerable misclassifications, our proposed classification method reduces fatal misclassifications of non-similar classes. We evaluated the effectiveness of the cost relaxation on several public datasets and confirmed that the proposed method outperforms the normal SVM on all of the datasets under the soft criterion, achieving a 78.63% recognition rate on the PDC Dataset.

15:00-17:10, Paper MoPT1.23
Incremental Construction of Rule Ensembles Using Classifiers Produced by Different Class Orderings
Yildiz, Olcay Taner	Isik Univ
Ulas, Aydin	Argela A.S
Keywords: Classification and clustering, Model selection, Machine learning and data mining Abstract: In this paper, we discuss a novel approach to incrementally construct a rule ensemble. The approach constructs an ensemble from a dynamically generated set of rule classifiers. Each classifier in this set is trained by using a different class ordering. We investigate criteria including accuracy, ensemble size, and the role of starting point in the search. Fusion is done by averaging. Using 22 data sets, floating search finds small, accurate ensembles in polynomial time.

15:00-17:10, Paper MoPT1.24
Distributed Data Augmented Support Vector Machine on Spark
Nguyen, Tu Dinh	Deakin Univ
Nguyen, Vu	Deakin Univ
Le, Trung	HCMc Univ. of Pedagogy
Phung, Dinh	Deakin Univ
Keywords: Support vector machines and kernel methods, Machine learning and data mining, Classification and clustering Abstract: Support vector machines (SVMs) are widely used for classification in machine learning and data mining tasks. However, they have traditionally been applied to small to medium datasets. The recent need to scale up with data size has attracted research attention to developing new methods and implementations for SVMs to perform tasks at scale. Distributed SVMs are relatively new and have been studied only recently, but a distributed implementation for SVM with data augmentation has not been developed. This paper introduces a distributed data augmentation implementation for SVM on Apache Spark, a recent advanced and popular platform for distributed computing that has been employed widely in research as well as in industry. We term our implementation sparkling vector machine (SkVM), which supports both classification and regression tasks by scanning through the data exactly once. In addition, we further develop a framework to handle data with new classes arriving under an online classification setting, where new data points can have labels that have not previously been seen, a problem we term label-drift classification. We demonstrate the scalability of our proposed method on large-scale datasets with more than one hundred million data points. The experimental results show that the predictive performance of our method is comparable to or better than that of the baselines, whilst the execution time is faster by an order of magnitude.

15:00-17:10, Paper MoPT1.25
Object Verification in Two Views Using Sparse Representation
Hsu, Shih-Chung	National Tsing Hua Univ
Chang, I-Cheng	National Dong Hwa Univ
Huang, Chung-Lin	Asia Univ
Keywords: Classification and clustering, Pattern Recognition for Surveillance and Security Abstract: This paper proposes an object verification method using sparse representation (SR). SR has been applied for intuitive object description for its higher de-noising ability. However, most SR does not show the compactness of a dictionary. Our method comprises three major modules. First, we train the sparse matrix by using Boost K-Singular Value Decomposition (Boost K-SVD). Second, we project the training set onto the sparse matrix to obtain a training sparse vector set. Third, we combine two training sparse vector sets of the same and different objects from two views to generate a positive/negative combined sparse vector set. Finally, a Support Vector Machine (SVM) classifier is trained using these combined sparse vectors. Our contributions are: (1) a sparser dictionary than K-SVD is obtained, (2) the optimal sparse matrix has a better Restricted Isometry Property (RIP), and (3) the optimal SR is generated and applied to the verification process with high accuracy. The experimental results show that our verification method has higher accuracy than the other methods.

15:00-17:10, Paper MoPT1.26
Co-Regularized Collective Matrix Factorization for Joint Matrix Completion
Deng, Yujie	Univ. of Kansas
Lan, Chao	Univ. of Kansas
Huan, Jun	Univ. of Kansas
Keywords: Transfer learning, Machine learning and data mining Abstract: Collective matrix factorization (CMF) is a popular technique for joint matrix completion. However, it relies on a strong assumption that matrices share a common low-rank structure, which may not be easily satisfied in practice. To lift this limitation, this paper introduces a novel joint matrix completion method based on a relaxed assumption. Specifically, we allow the matrix structures to be different, but assume their induced subspaces lie close to each other. Then, we propose a method that penalizes the distance between these subspaces while learning different factorization models for different matrices. Compared with the state-of-the-art solution, our solution has lower model complexity and hence suffers less from over-fitting. We demonstrate its effectiveness in experiments.

15:00-17:10, Paper MoPT1.27
Exploiting Local and Global Geometric Data Relationships in Support Vector Data Description
Mygdalis, Vasileios	Aristotle Univ. of Thessaloniki
Tefas, Anastasios	Aristotle Univ. of Thessaloniki
Pitas, Ioannis	-
Keywords: Support vector machines and kernel methods, Classification and clustering Abstract: In this paper, we describe a one-class classification method based on Support Vector Data Description, which exploits multiple graph structures in its optimization process. We derive a generic solution which can be employed for supervised one-class classification tasks. The devised method can produce linear or non-linear decision functions, depending on the adopted kernel function. In our experiments, we simultaneously adopted two graphs that describe local and global geometric training data relationships, respectively. We evaluated the proposed classifier on publicly available datasets, where its performance compared favorably against closely related methods.

15:00-17:10, Paper MoPT1.28
Multiple Kernel Learning Using Data Envelopment Analysis and Feature Vector Selection and Projection
Saikia, Gitimoni	IIT Guwahati
Shivagunde, Saroj	IIT Guwahati
Saradhi, Vijaya	IIT Guwahati
Kannao, Raghvendra	IIT Guwahati
Guha, Prithwijit	Department of EEE, IIT Guwahati
Keywords: Support vector machines and kernel methods, Machine learning and data mining Abstract: Multiple kernel learning methods combine a set of base kernels to produce an optimal one for a certain classification or regression problem. But selecting a set of base kernels from a plethora of kernels is not automated. We provide a criterion to select efficient base kernels. Automating the selection process of efficient base kernels requires less time and effort than manually selecting them. However, learning the weights with which the selected kernels are to be combined is still a costly process. To calculate these combination weights, we first evaluate the efficiency of a kernel on the basis of two parameters, viz. the Trace and Alignment of the kernel matrix, using data envelopment analysis. A base kernel can be selected if its efficiency is 100%. After selecting a set of the most efficient kernels, we combine them in proportion to their efficiencies. Also, we want to control the complexity of the model by means of data selection. We use a feature vector selection method to cast data points to a limited number of features and then apply classical algorithms to solve our classification problems.

15:00-17:10, Paper MoPT1.29
Kernel Alignment for Unsupervised Transfer Learning
Redko, Ievgen	Lab. Hubert Curien, Univ. Jean Monnet
Bennani, Younès	Univ. of Paris 13 - LIPN (UMR 7030 CNRS)
Keywords: Transfer learning, Classification and clustering Abstract: The ability of a human being to extrapolate previously gained knowledge to other domains inspired a new family of methods in machine learning called transfer learning. Transfer learning is often based on the assumption that objects in both target and source domains share some common feature and/or data space. In this paper, we propose a simple and intuitive approach that iteratively minimizes the distance between source and target task distributions by optimizing the kernel target alignment (KTA). We show that this procedure is suitable for transfer learning by relating it to Hilbert-Schmidt Independence Criterion (HSIC) and Quadratic Mutual Information (QMI) maximization. We run our method on benchmark computer vision data sets and show that it can outperform some state-of-the-art methods.
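
For reference, the alignment between two kernel matrices is usually defined as their Frobenius inner product divided by the product of their Frobenius norms. The sketch below computes only this score; the iterative optimization described in the abstract is not reproduced. The function name `kernel_alignment` and the toy matrices are illustrative assumptions.

```python
import numpy as np

def kernel_alignment(K1, K2):
    """Alignment <K1, K2>_F / (||K1||_F * ||K2||_F) between two kernel matrices."""
    K1 = np.asarray(K1, dtype=float)
    K2 = np.asarray(K2, dtype=float)
    # np.linalg.norm on a 2-D array defaults to the Frobenius norm.
    return float(np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2)))

# Toy example: alignment of a kernel with the ideal target kernel y y^T.
y = np.array([1.0, 1.0, -1.0, -1.0])
target = np.outer(y, y)
score_identity = kernel_alignment(np.eye(4), target)   # uninformative kernel
score_ideal = kernel_alignment(target, target)         # perfectly aligned kernel
```

The score equals 1 when the two matrices are proportional, and for positive semidefinite kernels it lies between 0 and 1, so it can serve as a similarity to be maximized.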

15:00-17:10, Paper MoPT1.30
Training Robust Models Using Random Projection
Nguyen, Xuan Vinh	Univ. of Melbourne
Erfani, Sarah M.	The Univ. of Melbourne
Paisitkriangkrai, Sakrapee	The Univ. of Melbourne
Bailey, James	Department of Computing and Information Systems, Univ. of M
Leckie, Christopher	The Univ. of Melbourne
Kotagiri, Rao	Univ. of Melbourne
Attachments: Supplementary material Keywords: Artificial neural networks, Machine learning and data mining Abstract: Regularization plays an important role in machine learning systems. We propose a novel methodology for model regularization using random projection. We demonstrate the technique on neural networks, since such models usually comprise a very large number of parameters, calling for strong regularizers. It has been shown recently that neural networks are sensitive to two kinds of samples: (i) adversarial samples, which are generated by imperceptible perturbations of previously correctly classified samples, yet the network will misclassify them; and (ii) fooling samples, which are completely unrecognizable, yet the network will classify them with extremely high confidence. In this paper, we show how robust neural networks can be trained using random projection. We show that while random projection acts as a strong regularizer, boosting model accuracy similarly to other regularizers such as weight decay and dropout, it is far more robust to adversarial noise and fooling samples. We further show that random projection also helps to improve the robustness of traditional classifiers, such as Random Forest and Gradient Boosting Machines.

15:00-17:10, Paper MoPT1.31
Transfer Learning for Rare Cancer Problems Via Discriminative Sparse Gaussian Graphical Model
Saha, Budhaditya	Deakin Univ
Gupta, Sunil Kumar	Deakin Univ
Phung, Dinh	Deakin Univ
Venkatesh, Svetha	Deakin Univ
Keywords: Transfer learning, Classification and clustering, Machine learning and data mining Abstract: Mortality prediction of rare cancer types with a small number of high-dimensional samples is a challenging task. We propose a transfer learning model where both classes in rare cancers (target task) are modeled in a joint framework by transferring knowledge from the source task. The knowledge transfer is at the data level, where only “related” data points are chosen to train the target task. Moreover, using both the positive and negative classes in training enhances the discrimination power of the proposed framework. Overall, this approach boosts the generalization performance of the target task with a small number of data points. The formulation of the proposed framework is convex and expressed as a primal problem. We convert this to a dual problem and solve it efficiently with the alternating direction method of multipliers. Our experiments with both synthetic and three real-world datasets show that our framework outperforms state-of-the-art single-task, multi-task, and transfer learning baselines.


MoPT2	Poster Session Hall
MoP2	Poster Session

15:00-17:10, Paper MoPT2.1
Semantic Segmentation Priors for Object Discovery
Martin Garcia, German	Rheinische Friedrich-Wilhelms-Univ. Bonn
Husain, Farzad	Inst. De Robotica I Informatica Industrial (CSIC-UPC)
Schulz, Hannes	Rheinische Friedrich-Wilhelms-Univ. Bonn
Frintrop, Simone	Univ. of Bonn
Torras, Carme	CSIC-UPC
Behnke, Sven	Rheinische Friedrich-Wilhelms-Univ. Bonn
Keywords: 2D/3D object detection and recognition Abstract: Reliable object discovery in realistic indoor scenes is a necessity for many computer vision and service robot applications. In these scenes, semantic segmentation methods have made huge advances in recent years. Such methods can provide useful prior information for object discovery by removing false positives and by delineating object boundaries. We propose a novel method that combines bottom-up object discovery and semantic priors for producing generic object candidates in RGB-D images. We use a deep learning method for semantic segmentation to classify colour and depth superpixels into meaningful categories. Separately for each category, we use saliency to estimate the location and scale of objects, and superpixels to find their precise boundaries. Finally, object candidates of all categories are combined and ranked. We evaluate our approach on the NYU Depth V2 dataset and show that we outperform other state-of-the-art object discovery methods in terms of recall.

15:00-17:10, Paper MoPT2.2
Theoretical Criterion for Image Matching Using GPT Correlation
Zhang, Shizhi	Tokyo Inst. of Tech
Wakahara, Toru	Hosei Univ
Yamashita, Yukihiko	Tokyo Inst. of Tech
Keywords: 2D/3D object detection and recognition Abstract: GAT (Global Affine Transformation) and GPT (Global Projection Transformation) correlation matching were successively proposed by Wakahara and Yamashita. They use affine transformation (AT) and 2D projection transformation (PT), respectively, to maximize the normalized cross-correlation value between a template and a GAT/GPT-superimposed input image. In theory, to maximize the degree of matching via normalized cross-correlation, the L2 norm of both images should be normalized. However, the criteria of conventional GAT/GPT correlation techniques did not take into account the conservation of the L2 norm. This research solves the above-mentioned problem, which might impair the matching ability of GAT/GPT correlations. In particular, we focus on the enhanced GPT correlation recently proposed by Wakahara et al. to calculate optimal PT parameters simultaneously. Then, we propose an improved criterion satisfying the requirement of L2 norm conservation, together with the enhanced GPT correlation algorithm with norm normalization. Experimental results using images artificially deformed by 2D projection transformation show that the proposed method clearly outperforms conventional GPT correlation techniques with or without norm normalization in terms of both convergence speed and matching ability.
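
The role of the L2 norm in the matching criterion can be seen in a minimal sketch of normalized cross-correlation between two equally sized images. The mean subtraction step and the function name `normalized_cross_correlation` are assumptions for illustration; the GAT/GPT optimization of transformation parameters is not shown.

```python
import numpy as np

def normalized_cross_correlation(f, g):
    """Correlation of two same-shaped images after mean removal and L2 normalization.

    Assumption: both images have non-zero variance, so neither norm is zero.
    """
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    f = f - f.mean()
    g = g - g.mean()
    # For a 2-D array, np.linalg.norm gives the L2 norm of the flattened image.
    return float(np.sum(f * g) / (np.linalg.norm(f) * np.linalg.norm(g)))
```

Because both images are divided by their L2 norms, the score is bounded by 1 in absolute value and is unaffected by a gain or offset change in either image. A geometric transformation applied to only one image can change its norm, which is the conservation issue the abstract raises.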

15:00-17:10, Paper MoPT2.3
Moving Object Detection from a Point Cloud Using Photometric and Depth Consistencies
Takabe, Atsushi	Nara Inst. of Science and Tech
Takehara, Hikari	Nara Inst. of Science and Tech
Kawai, Norihiko	Nara Inst. of Science and Tech
Sato, Tomokazu	Nara Inst. of Science and Tech
Machida, Takashi	Toyota Central R&D Labs., Inc
Nakanishi, Satoru	Toyota Central R&D Labs., Inc
Yokoya, Naokazu	Nara Inst. of Science and Tech
Keywords: 2D/3D object detection and recognition Abstract: 3D models of outdoor environments have been used for several applications such as a virtual earth system and a vision-based vehicle safety system. 3D data for constructing such 3D models are often measured by an on-vehicle system equipped with laser rangefinders, cameras, and GPS/IMU. However, 3D data of moving objects on streets lead to inaccurate 3D models when modeling outdoor environments. To solve this problem, this paper proposes a moving object detection method for point clouds by minimizing an energy function based on photometric and depth consistencies assuming that input data consist of synchronized point clouds, images, and camera poses from a single sequence captured with a moving on-vehicle system.

15:00-17:10, Paper MoPT2.4
An Empirical Study of Deformable Part Model with Fast Feature Pyramid
Yang, Jun	Peking Univ. Shenzhen Graduate School
Li, Ge	Peking Univ. Shenzhen Graduate School
Wang, Wenmin	Peking Univ. Shenzhen Graduate School
Wang, Ronggang	Peking Univ. Shenzhen Graduate School
Keywords: 2D/3D object detection and recognition Abstract: The performance of an object detection system relies heavily on two components: an object model to capture the compositional relationship among the object body and its parts, and a feature representation to describe object appearance. In this work, we present an empirical study of combining two such state-of-the-art components: Deformable Part Model (DPM), a proven effective and flexible part-based object model which originally adopts the Histogram of Oriented Gradients (HOG) feature, and Aggregated Channel Features (ACF), a unified feature representation framework with fast pyramid calculation which was originally used in a rigid template matching scheme. DPM is known to work well but is slow, while ACF has previously been shown to yield a massive speedup with only a minor loss in accuracy compared to competing features including HOG. By combining the two, our hope is to achieve the best of both worlds: the object structure representation power of DPM and the computational efficiency of ACF. Our experiments show that while ACF with heterogeneous feature channels could improve the accuracy of DPM, the run time benefit introduced by fast pyramid approximation is rather limited.

15:00-17:10, Paper MoPT2.5
Robust Hand Detection in Vehicles
Le, T. Hoang Ngan	Carnegie Mellon Univ
Zhu, Chenchen	Carnegie Mellon Univ
Zheng, Yutong	Carnegie Mellon Univ
Luu, Khoa	Carnegie Mellon Univ
Savvides, Marios	Carnegie Mellon Univ
Keywords: 2D/3D object detection and recognition Abstract: The problems of hand detection have been widely addressed in many areas, e.g. human-computer interaction environments, driver behavior monitoring, etc. However, the detection accuracy of recent hand detection systems is still far from the demands of practice due to a number of challenges, e.g. hand variations, heavy occlusions, low resolution and strong lighting conditions. This paper presents the Multiple Scale Faster Region-based Convolutional Neural Network (MS-FRCNN) to handle the problems of hand detection in digital images collected under challenging conditions. Our proposed method introduces a multiple scale deep feature extraction approach in order to handle the challenging factors and provide a robust hand detection algorithm. The method is evaluated on the challenging hand database, i.e. the Vision for Intelligent Vehicles and Applications (VIVA) Challenge, and compared against various recent hand detection methods. Our proposed method achieves state-of-the-art results, with a detection accuracy 20% higher than the second best one in the VIVA challenge.

15:00-17:10, Paper MoPT2.6
A Fast Approach for Traffic Panel Detection from Natural Scene Images
Li, Zhen-Mao	School of Electronic and Information Engineering, Beijing Jiaoto
Huang, Lin-Lin	School of Electronic and Information Engineering, Beijing Jiaoto
Keywords: 2D/3D object detection and recognition Abstract: Traffic panels contain rich text and symbolic information for transportation and scene understanding. Fast detection of traffic panels facilitates text information extraction but has received little attention from the community. In this paper, we propose a fast and robust approach for rectangular traffic panel detection from traffic scene images. Considering the rectangular shape of traffic panels, we first extract candidate line segments in the gray-level image using a fast line segment detector, and panel proposals are formed from the line segments. The proposals are then filtered using a multi-stage classifier with multiple features. The surviving proposals undergo a post-processing refinement procedure using geometric constraints to give the bounding boxes of detected panels. Experimental results on real images from Baidu Street View demonstrate the effectiveness of the proposed method.

15:00-17:10, Paper MoPT2.7
3D Point Cloud Object Detection with Multi-View Convolutional Neural Network
Pang, Guan	Univ. of Southern California
Neumann, Ulrich	Univ. of Southern California
Keywords: 2D/3D object detection and recognition, Artificial neural networks Abstract: Efficient detection of three-dimensional (3D) objects in point clouds is a challenging problem. 3D descriptor matching and 3D scanning-window search with a detector are both time-consuming due to the three-dimensional complexity. One solution is to project the 3D point cloud into 2D images and thus transform the 3D detection problem into 2D space, but projection at multiple viewpoints and rotations produces a large number of 2D detection tasks, which limits the performance and complexity of the 2D detection algorithm that can be chosen. We propose to use a convolutional neural network (CNN) for the 2D detection task, because it can handle all viewpoints and rotations for the same class of object together, and can predict multiple classes of objects with the same network, without the need for an individual detector for each object class. We further improve the detection efficiency by concatenating two extra levels of early rejection networks with binary outputs before the multi-class detection network. Experiments show that our method has competitive overall performance with a speed-up of at least one order of magnitude compared with the latest 3D point cloud detection methods.

15:00-17:10, Paper MoPT2.8
Pedestrian Detection with a Resolution-Aware Convolutional Network
Yamada, Keiichi	Meijo Univ
Keywords: 2D/3D object detection and recognition, Artificial neural networks, Deep learning Abstract: Pedestrian detection from in-vehicle camera images for the purpose of advanced driver assistance systems is of particular importance in cases of low-resolution pedestrians, because it is desirable to detect the pedestrian as far from the vehicle as possible to effectively provide safe driving support for the driver. Most previous studies on pedestrian detection, however, have focused on pedestrians with comparatively high resolutions. Unfortunately, the scale invariant assumption does not hold in the case of low resolution, and the pedestrian detection performance suffers greatly as the resolution decreases. Against this background, this paper deals with pedestrian detection from an image including low-resolution pedestrians. We present a method of detecting pedestrians with a resolution-aware CNN-based architecture, RACNN, that learns low-level image features with resolution information. Furthermore, we demonstrate the advantage of the proposed method through an evaluation on the Caltech-USA dataset.

15:00-17:10, Paper MoPT2.9
COATL - a Learning Architecture for Online Real-Time Detection and Classification Assistance for Environmental Data
Langenkämper, Daniel	Biodata Mining Group, Bielefeld Univ
Nattkemper, Tim W.	Bielefeld Univ
Keywords: 2D/3D object detection and recognition, Biological image and signal analysis, Human Computer Interaction Abstract: We propose a machine learning based approach to real-time detection and classification assistance for images from unknown environments. While systems for detecting and classifying regular structures like faces in still images are well established, the task of e.g. detecting new morphotypes/objects in an environment is much more complex. The morphotypes/objects are not guaranteed to have a priori known characteristics such as facial landmarks; thus adaptive and data-driven approaches are needed that are trained online with a growing number of available labels provided by a human observer. Our approach combines an effective point-of-interest computation with an online learning network for real-time proposition of classes. In an example application using images from an arctic deep-sea underwater observatory we report a maximum accuracy of 0.87 and a top-3 accuracy of 0.97 in the range from 10 to 1499 available labels. The average accuracy in that range is 0.74, with the top-3 accuracy being 0.93. Additionally, a convolutional neural network achieves an accuracy of 0.965 at 1500 labels. Points of interest are computed in approximately 3.7 seconds, right after loading a new image. Proposing a class for a computationally determined point of interest or a manually selected one takes about 0.2 seconds. If the image is already in cache the time is reduced to 0.1 seconds. The suggestion of points of interest combined with a proposition of putative classes allows for significantly faster labeling times in new environments.
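
Since the abstract reports both accuracy and top-3 accuracy, a short sketch of how a top-k accuracy is typically computed may help: a prediction counts as correct when the true class is among the k highest-scoring classes. The function name `top_k_accuracy` and the toy scores are illustrative assumptions; they are not taken from the paper.

```python
import numpy as np

def top_k_accuracy(scores, labels, k=3):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: array of shape (n_samples, n_classes); labels: integer class indices.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    # Indices of the k largest scores per row (order within the top k is irrelevant).
    top_k = np.argsort(-scores, axis=1)[:, :k]
    return float(np.mean((top_k == labels[:, None]).any(axis=1)))

# Toy scores for three samples and three classes (made-up values).
toy_scores = [[0.1, 0.5, 0.4],
              [0.7, 0.2, 0.1],
              [0.2, 0.3, 0.5]]
toy_labels = [2, 1, 0]
```

In a labeling assistant, top-3 accuracy reflects how often the correct class appears among three suggestions shown to the human observer, which is why it can be much higher than plain (top-1) accuracy.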

15:00-17:10, Paper MoPT2.10
Segmentation Methods for Detection of Stationary Vehicles in Combined Elevation and Optical Data
Bulatov, Dimitri	Fraunhofer IOSB
Schilling, Hendrik	Fraunhofer IOSB Ettlingen
Keywords: 2D/3D object detection and recognition, Classification and clustering, Scene understanding Abstract: Detection of vehicles in remote sensing data represents a captivating and challenging task that has been studied for many years. The state-of-the-art detection tools can be subdivided into implicit and explicit methods; the latter provide detection results by means of some explicitly characterizing features. Mostly, these methods rely on optical aerial images in which vehicles appear distorted. However, 3D elevation data and orthophotos are increasingly available and typically used to perform a full context-based scene analysis of which vehicles are an indispensable part. In this paper, we propose to combine elevation and optical data for segmentation of vehicle-like objects. To do this, several strategies, their advantages and disadvantages, will be discussed. Since this segmentation method also produces numerous false alarms, we will briefly describe its part in the complete vehicle detection pipeline. The results indicate that sensor data fusion is crucial for obtaining the most accurate results in a reasonable time. For example, using trapezoids or stripes formed in optical and elevation data allows one to detect almost all targets with a very high accuracy exceeding the results obtained from single sensor data. We perform an extensive evaluation of all presented methods and outline the main ideas to overcome the existing shortcomings and to achieve a closer embedding of vehicle detection into the process of urban terrain reconstruction from sensor data.

15:00-17:10, Paper MoPT2.11
A Shape Preserving Approach for Salient Object Detection Using Convolutional Neural Networks
Kim, Jongpil	Rutgers, the State Univ. of New Jersey
Pavlovic, Vladimir	Rutgers Univ
Keywords: 2D/3D object detection and recognition, Deep learning Abstract: Determining visual saliency is one of the fundamental problems in computer vision as the saliency not only identifies the most informative parts of a visual scene but may also reduce computational complexity by filtering out irrelevant segments of the scene. In this paper, we propose a novel salient object detection method that combines a shape-preserving saliency prediction driven by a convolutional neural network with the mid- and low-level region preserving image information. Our model learns a saliency shape dictionary, which is subsequently used to train a CNN to predict the salient class of a target region and estimate the full but coarse saliency map of the target image. The map is then refined using image specific low-to-mid level information. The performance evaluation on popular benchmark datasets shows that the proposed method outperforms existing state-of-the-art methods in saliency detection.

15:00-17:10, Paper MoPT2.12
Deep Learning for Integrated Hand Detection and Pose Estimation
Chen, Tzu-Yang	National Taiwan Univ
Wu, Min-Yu	National Taiwan Univ
Hsieh, Yu-Hsun	National Taiwan Univ. Computer Science and Information Eng
Fu, Li-Chen	National Taiwan Univ
Keywords: 2D/3D object detection and recognition, Deep learning, Artificial neural networks Abstract: We propose a novel framework which integrates human hand detection and pose estimation into one single pipeline. Unlike most previous works, which focus only on the pose estimation part subject to some strong assumptions or rely on a weak detector to detect human hands, we employ a deep learning architecture to complete both aforementioned tasks. By letting three different neural networks share the convolutional layers, this deep learning architecture can efficiently and accurately detect human hands and compute their hand pose configuration. Moreover, we propose a new energy function to optimize the predicted result of the convolutional neural network. To validate the proposed framework, experiments have been conducted and the results show that our approach is highly reliable and suitable for real-world applications.

15:00-17:10, Paper MoPT2.13
Multi-Spectral Pedestrian Detection Based on Accumulated Object Proposal with Fully Convolutional Networks
Choi, Hangil	Yonsei Univ
Kim, Seungryong	Yonsei Univ
Park, Kihong	Yonsei Univ
Sohn, Kwanghoon	Yonsei Univ
Attachments: Supplementary material Keywords: 2D/3D object detection and recognition, Deep learning, Classification and clustering Abstract: This paper presents a method for detecting a pedestrian by leveraging multi-spectral image pairs. Our approach is based on the observation that a multi-spectral image, especially a far-infrared (FIR) image, enables us to overcome inherent limitations for pedestrian detection under challenging circumstances, such as dark environments. For that task, multi-spectral color-FIR image pairs are used in a synergistic manner for pedestrian detection through deep convolutional neural network (CNN) learning and support vector regression (SVR). For inferring the confidence of a pedestrian, we first learn CNNs between color images (or FIR images) and bounding box annotations of pedestrians, respectively. Furthermore, for each object proposal, we extract intermediate activation features from the network, and learn the probability of a pedestrian using SVR. To improve the detection performance, the learned probability of a pedestrian for each proposal is accumulated on the image domain. Based on the pedestrian confidence estimated from each network and the accumulated pedestrian probabilities, the most probable pedestrian is finally localized among object proposal candidates. Thanks to the high robustness of multi-spectral imaging in dark environments and the high discriminative power of deep CNNs, our framework is shown to surpass state-of-the-art pedestrian detection methods on a multi-spectral pedestrian benchmark.

15:00-17:10, Paper MoPT2.14
Object Proposals Using CNN-Based Edge Filtering
Waris, Muhammad Adeel	Tampere Univ. of Tech
Iosifidis, Alexandros	Tampere Univ. of Tech
Moncef, Gabbouj	Tampere Univ. of Tech
Keywords: 2D/3D object detection and recognition, Deep learning, Image based modeling Abstract: With the success of deep learning in the last few years, the object detection community shifted from processing exhaustive sliding windows to a smaller set of object proposals using more powerful and deep visual representations. Object proposals increase the accuracy and speed up the detection process by reducing the search space. In this paper we propose a novel idea of filtering irrelevant edges using semantic image filtering and true objectness learnt within the convolutional layers of a CNN. Our approach localizes proposals well by producing highly accurate bounding boxes and reduces the number of proposals. The greatest benefit of our approach is that it can be integrated into any existing method exploiting edge-based objectness to achieve consistently high recall across various intersection over union thresholds. Unlike other supervised methods, our approach does not require bounding box annotations for training. Experiments on the PASCAL VOC 2007 dataset demonstrate that our approach improves the state-of-the-art model by a significant margin.

15:00-17:10, Paper MoPT2.15
A Multi-Scale Cascade Fully Convolutional Network Face Detector
Yang, Zhenheng	Univ. of Southern California
Nevatia, Ram	USC
Attachments: Supplementary material Keywords: 2D/3D object detection and recognition, Face recognition Abstract: Face detection is challenging as faces in images could be present at arbitrary locations and in different scales. We propose a three-stage cascade structure based on fully convolutional neural networks (FCNs). It first proposes the approximate locations where the faces may be, then aims to find the accurate location by zooming in on the faces. Each level of the FCN cascade is a multi-scale fully-convolutional network, which generates scores at different locations and in different scales. A probability map is generated after each FCN stage. Probable face regions are selected and fed to the next stage. The number of proposals is decreased after each level, and the areas of regions are decreased to more precisely fit the face. Compared to passing proposals directly between stages, passing probable regions can decrease the number of proposals and reduce the cases where the first stage does not propose good bounding boxes. We show that by using FCN and probability maps, the FCN cascade face detector can achieve strong performance on public datasets.

15:00-17:10, Paper MoPT2.16
Egocentric Hand Detection Via Region Growth
Huang, Shao	Univ. of Chinese Acad. of Sciences, China
Wang, Weiqiang	Univ. of Chinese Acad. of Sciences
Lu, Ke	Graduate Univ. of Chinese Acad. of Sciences
Keywords: 2D/3D object detection and recognition, Human body motion and gesture based interaction, Motion, tracking and video analysis Abstract: Wearable cameras used to record daily life are attracting researchers' attention, and a large number of ego-related applications have been developed in recent years. Hand detection is one of the key steps for tasks like gesture recognition, action recognition and understanding hand-based interaction in egocentric videos, since humans are accustomed to interacting with objects using their hands. In this work a novel region growth approach is proposed for egocentric hand detection. The proposed method first identifies seed regions most likely to contain parts of the hand based on the distribution of hand-related matches. Then hand regions are gradually located by extending from the seed regions. Finally a whole hand is obtained according to the scores of adjacent superpixels, which are evaluated based on four egocentric cues: contrast, distance, previous hand position and skin appearance. The experimental results on two publicly available datasets demonstrate that this work achieves satisfactory performance.

15:00-17:10, Paper MoPT2.17
Scene Text Detection Based on Multi-Scale SWT and Edge Filtering
Feng, Yuanyuan	Xi'an Jiaotong Univ
Song, Yonghong	Xi'an Jiaotong Univ
Zhang, Yuanlin	Xian JiaoTong Univ
Keywords: 2D/3D object detection and recognition, Image and video analysis and understanding Abstract: This paper presents a text detection method based on multi-scale Stroke Width Transform (SWT). First, an image pyramid is built and SWT is performed on each level of the pyramid. Second, edge components are filtered using two novel features, stroke pair ratio (SPR) and edge density of a connected component (EDC). Next, the remaining edge components on each level are grouped into text lines, and these lines are projected back onto a single image. Finally, candidate text lines are verified by integrating block level features and line level features. The multi-scale mechanism makes it possible to detect text degraded by reflection or blurring, and the two features prove to be both effective and efficient in filtering non-text edges. Moreover, experimental results on the ICDAR Robust Reading Competition datasets show that the proposed text detection method provides promising performance.

15:00-17:10, Paper MoPT2.18
Saliency-Guided Selective Magnification for Company Logo Detection
Eggert, Christian	Univ. of Augsburg
Winschel, Anton	Univ. of Augsburg
Zecha, Dan	Univ. of Augsburg
Lienhart, Rainer	Univ. of Augsburg
Keywords: 2D/3D object detection and recognition, Image and video analysis and understanding Abstract: Fast R-CNN is a well-known approach to object detection which is generally reported to be robust to scale changes. In this paper we examine the influence of scale within the detection pipeline in the case of company logo detection. We demonstrate that Fast R-CNN encounters problems when handling objects which are significantly smaller than the receptive field of the utilized network. In order to overcome these difficulties, we propose a saliency-guided multiscale approach that does not rely on building a full image pyramid. We use the feature representation computed by Fast R-CNN to directly classify large objects while at the same time predicting salient regions which contain small objects with high probability. Only selected regions are magnified and a new feature representation for these enlarged regions is calculated. Feature representations from both scales are used for classification, improving the detection quality of small objects while keeping the computational overhead low. Compared to a naive magnification strategy we are able to retain 79% of the performance gain while only spending 36% of the computation time.

15:00-17:10, Paper MoPT2.19
Multi-Orientation Scene Text Detection with Multi-Information Fusion
Pei, Wei-Yi	Univ. of Science and Tech. Beijing
Yang, Chun	Univ. of Science and Tech. Beijing
Yin, Xu-Cheng	Univ. of Science and Tech. Beijing
Kau, Lih-Jen	National Taipei Univ. of Tech
Keywords: 2D/3D object detection and recognition, Image and video analysis and understanding, Deep learning Abstract: We construct a robust and precise multi-orientation text detection system for scene images which can extensively locate possible characters with multi-information fusion. In our method, an adaptive multi-channel character grouping algorithm is first proposed to extract all possible character candidates robustly, and an AdaBoost classifier is then used to properly identify character candidates as characters or non-characters. A single-link clustering with distance metric learning is thereafter used to adaptively group characters into text regions, and an effective hybrid filter with Convolutional Neural Networks (CNN), AdaBoost and Bayesian classifiers is finally designed to precisely verify the extracted text regions. Our proposed technology is extensively evaluated on several public multi-orientation scene text datasets, e.g., MSRA-TD500 and USTB-SV1K, and is much better than state-of-the-art methods.

15:00-17:10, Paper MoPT2.20
Automatic Building Extraction from Oblique Aerial Images
Sun, Xiaofeng	Inst. of Automation, Chinese Acad. of Sciences
Shen, Shuhan	Inst. of Automation, Chinese Acad. of Sciences
Hu, Zhanyi	Inst. of Automation, Chinese Acad. of Sciences
Keywords: 2D/3D object detection and recognition, Image based modeling, Reconstruction and camera motion estimation Abstract: In this paper we propose an automatic urban building extraction method for oblique aerial images. Five steps are included in this method: point cloud generation, grid partition, feature extraction, building detection and building reconstruction. Taking advantage of recent progress in large-scale Structure from Motion (SfM) and Multiple View Stereo (MVS), a dense point cloud is generated first. Then, we project the point cloud into a regularly spaced grid in the XY plane, and convert the building extraction problem into an image segmentation problem. By combining the strengths of the geometric and spectral attributes, three complementary features are extracted and an MRF based graph model along with an energy function is created. Points belonging to buildings are recognized by minimizing this function, and prismatic 3D building models are reconstructed accordingly.

15:00-17:10, Paper MoPT2.21
Enhancing the Runtime of JUDOCA Detector
Gabr, Mohamed	German Univ. in Cairo
Elias, Rimon	German Univ. in Cairo
Keywords: 2D/3D object detection and recognition, Low-level vision, Early vision Abstract: In this work, two enhancement methods are proposed to speed up junction detection performed by the JUDOCA detector. The first enhancement method minimizes the number of junction candidates on which the circular kernel is applied. This is achieved by introducing a suppression technique that takes both the thin and thick edge images into consideration. The second method works on relaxing the step of checking the edge continuity. Instead of traversing the edge pixel by pixel, using Bresenham’s algorithm, sample pixels along the edge are considered. Test results show that an enhancement factor of 62% to 75% in runtime can be reached as a result of applying both enhancements.
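The second enhancement described above, checking edge continuity on sample pixels rather than on every pixel of the line, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the set-of-pixels edge representation, and the evenly spaced sampling rule are assumptions made only for illustration.

```python
def bresenham(x0, y0, x1, y1):
    """All integer pixels on the line from (x0, y0) to (x1, y1)."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    points = []
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy
    return points


def continuous_full(edge_pixels, p, q):
    """Strict check: every pixel on the Bresenham line must be an edge pixel."""
    return all(pt in edge_pixels for pt in bresenham(*p, *q))


def continuous_sampled(edge_pixels, p, q, n_samples=4):
    """Relaxed check (assumed sampling rule): test only evenly spaced samples."""
    (x0, y0), (x1, y1) = p, q
    for i in range(n_samples + 1):
        t = i / n_samples
        sample = (round(x0 + t * (x1 - x0)), round(y0 + t * (y1 - y0)))
        if sample not in edge_pixels:
            return False
    return True


# Hypothetical edge image: a horizontal edge with a one-pixel gap at x = 3.
edge = {(x, 0) for x in range(9)} - {(3, 0)}
full_result = continuous_full(edge, (0, 0), (8, 0))
sampled_result = continuous_sampled(edge, (0, 0), (8, 0))
```

The relaxed check visits far fewer pixels, at the cost of possibly missing small gaps that fall between samples, as the one-pixel gap in this example does.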

15:00-17:10, Paper MoPT2.22
In the Saddle: Chasing Fast and Repeatable Features
Aldana-Iuit, Javier	Czech Tech. Univ. in Prague
Mishkin, Dmytro	Czech Tech. Univ. in Prague
Chum, Ondrej	Czech Tech. Univ. in Prague
Matas, Jiri	CTU Prague
Keywords: 2D/3D object detection and recognition, Low-level vision, Stereo and multiple view geometry Abstract: A novel similarity-covariant feature detector is presented that extracts points whose neighborhoods, when treated as a 3D intensity surface, have a saddle-like intensity profile. The saddle condition is verified efficiently by intensity comparisons on two concentric rings that must have exactly two dark-to-bright and two bright-to-dark transitions satisfying certain geometric constraints. Experiments show that the Saddle features are general, evenly spread and appear in high density in a range of images. The Saddle detector is among the fastest proposed. In comparison with detectors of similar speed, the Saddle features show superior matching performance on a number of challenging datasets.
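The transition-counting idea behind the saddle condition can be sketched as follows. This is a simplified illustration under stated assumptions, not the Saddle detector itself: it examines a single ring, labels each sample as bright or dark by comparison with one hypothetical reference value, and ignores the geometric constraints mentioned in the abstract.

```python
def count_transitions(ring, reference):
    """Count (dark-to-bright, bright-to-dark) transitions around a closed ring.

    `ring` is a list of intensities sampled in order around a circle;
    `reference` is an assumed threshold separating dark from bright.
    """
    bright = [value > reference for value in ring]
    dark_to_bright = bright_to_dark = 0
    n = len(bright)
    for i in range(n):
        current, following = bright[i], bright[(i + 1) % n]  # wrap around
        if not current and following:
            dark_to_bright += 1
        elif current and not following:
            bright_to_dark += 1
    return dark_to_bright, bright_to_dark


def has_saddle_pattern(ring, reference):
    """True when the ring alternates bright/dark/bright/dark exactly once each."""
    return count_transitions(ring, reference) == (2, 2)


saddle_ring = [10, 10, 1, 1, 10, 10, 1, 1]  # two bright arcs, two dark arcs
step_ring = [10, 10, 10, 10, 1, 1, 1, 1]    # one bright arc, one dark arc
```

Here `has_saddle_pattern(saddle_ring, 5)` holds, while `step_ring`, which resembles a plain intensity step, yields only one transition of each kind.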

15:00-17:10, Paper MoPT2.23
Minimal Filtered Channel Features for Pedestrian Detection
Kuranuki, Yoshiki	Yamaha Motor CO. LTD
Patras, Ioannis	Queen Mary Univ. of London
Keywords: 2D/3D object detection and recognition, Machine learning and data mining Abstract: This paper addresses the problem of efficient pedestrian detection using features that are extracted by convolving feature channels with a very small number of filters. The method uses as feature channels low level features such as LUV colour and HOG, and trains a boosted decision forest on top of the learned features. The feature selection is guided by a greedy search or by an exhaustive search on a small number of scaled versions of simple horizontal, vertical and uniform filters. Extensive results on the challenging Caltech dataset show that with only 3 filters we obtain state-of-the-art results, achieving an 18.3% miss rate. Using optical flow as an additional input, we further improve the results and obtain a 15.5% miss rate.

15:00-17:10, Paper MoPT2.24
Fast Template Matching Using Brick Partitioning and Initial Threshold
Toyama, Fubito	Utsunomiya Univ
Mori, Hiroshi	Utsunomiya Univ. Information Systems Science
Shoji, Kenji	Utsunomiya Univ
Keywords: 2D/3D object detection and recognition, Motion, tracking and video analysis, Industrial image analysis Abstract: Template matching is a technique for finding the part of a reference image which matches a template image. This paper presents a new fast template matching algorithm which can detect the most similar position. In the proposed method, first, an effective initial threshold is calculated using the Winner Update Algorithm. Next, very fast template matching is achieved by using this initial threshold in the Multilevel Successive Elimination Algorithm. Furthermore, Brick Partitioning, a new partitioning method, is used to reduce the computational cost of comparing a template with each position within the reference image. Experimental results indicate that the proposed method can search faster than previous methods.
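Why a good initial threshold speeds up elimination-based matching can be seen in a small sketch. It uses the plain (single-level) successive elimination bound on a 1D signal, not the multilevel algorithm or Brick Partitioning from the paper, and the `initial_threshold` parameter is a stand-in for whatever value the Winner Update step would supply. The bound itself follows from the triangle inequality: the sum of absolute differences (SAD) between a window and the template is never smaller than the absolute difference of their sums.

```python
def sea_match(signal, template, initial_threshold=float("inf")):
    """Best SAD match of `template` in `signal`, pruning with a sum bound.

    Returns (best_position, best_sad, skipped_positions). The threshold must
    be strictly greater than the true best SAD, or the match is pruned too.
    """
    m = len(template)
    template_sum = sum(template)
    best_pos, best_sad = None, initial_threshold
    skipped = 0
    window_sum = sum(signal[:m])
    for pos in range(len(signal) - m + 1):
        if pos > 0:
            # Slide the window sum by one element in O(1).
            window_sum += signal[pos + m - 1] - signal[pos - 1]
        # Lower bound on the SAD; if it cannot beat the best, skip the SAD.
        if abs(window_sum - template_sum) >= best_sad:
            skipped += 1
            continue
        sad = sum(abs(s - t) for s, t in zip(signal[pos:pos + m], template))
        if sad < best_sad:
            best_pos, best_sad = pos, sad
    return best_pos, best_sad, skipped


signal = [0, 0, 5, 9, 5, 0, 0, 1]
template = [5, 9, 5]
without_threshold = sea_match(signal, template)
with_threshold = sea_match(signal, template, initial_threshold=1)
```

Both calls find the same position, but the call that starts from a tight threshold skips the full SAD computation at more positions.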

15:00-17:10, Paper MoPT2.25
A Multiscale Method for Shape Recognition of Overlapping Elliptical Particles
de Langlard, Mathieu	CEA
Al Saddik, Hania	Inst. National De Recherche Agronomique
Lamadie, Fabrice	CEA
Charton, Sophie	CEA
Debayle, Johan	Ec. Nationale Supérieure Des Mines De Saint-Etienne
Keywords: 2D/3D object detection and recognition, Occlusion and shadow detection Abstract: The particle size distribution (PSD) of a dispersed phase is a fundamental geometrical characteristic that needs to be determined from digital images for many industrial processes involving a multiphase flow. Nevertheless, when dealing with 2-D images, only the projections of the particles are visualized and therefore the particles can overlap each other. In this way, this paper aims to develop and evaluate a method for recognizing overlapping elliptical particles from 2-D images. It is based on four main steps: binarization of the studied image, detection of the contour segments, grouping of the segments and decomposition of the clusters. The proposed multiscale method enables small and large clusters of particles to be detected. The performance of the method has been evaluated both on synthetic images and on real images and also compared to a standard method of the literature.

15:00-17:10, Paper MoPT2.26
Dynamic Collaboration of Far-Infrared and Visible Spectrum for Human Detection
Blondel, Paul	Univ. Picardie Jules Vernes (UPJV) - MIS Lab
Potelle, Alex	Univ. Picardie Jules Vernes (UPJV)
Pegard, Claude	MIS
Lozano, Rogelio	Univ. Tech. De Compiègne (UTC)
Lara Alabazares, David	Univ. Autonomous of Tamaulipas
Attachments: Supplementary material Keywords: 2D/3D object detection and recognition, Pattern Recognition for Search, Retrieval and Visualization, Vision for robotics Abstract: This paper is about the collaborative use of a far-infrared spectrum human detector and a visible spectrum human detector; the idea is to make these two detectors of different nature collaborate to automatically adapt the human detection to any changes in luminosity and in the infrared emission of the scene. Our collaborative detection approach handles: 1) gradual luminosity changes due, for instance, to the passage from night to day (and vice-versa), 2) sudden luminosity changes due, for instance, to navigation in a forest (when going through a glade in a forest), 3) infrared emission saturation when the global temperature of the scene is very high and does not allow people to be distinguished in infrared. Our detection approach makes it possible to detect people 24 hours a day and regardless of the weather conditions. Furthermore, the proposed approach is relatively fast: it is practically as fast as using one detector alone even though two are used at the same time.

15:00-17:10, Paper MoPT2.27
Quaternion-Type Moments Combining Both Color and Depth Information for RGB-D Object Recognition
Chen, Beijing	Nanjing Univ. of Information Science & Tech. Nanjing,
Yang, Jianhao	Nanjing Univ. of Information Science & Tech. Nanjing,
Ding, Mengru	Nanjing Univ. of Information Science & Tech. Nanjing,
Liu, Tianliang	Nanjing Univ. of Posts and Telecommunications, JiangsuProvi
Zhang, Xinpeng	School of Communication & Information Engineering, Shanghai Univ
Keywords: 2D/3D object detection and recognition, Representation and analysis in pixel/voxel images Abstract: The existing quaternion-type moments (QTMs) are based on the quaternion representation (QR) of color images. However, this representation creates redundancy when using four-dimensional quaternions to represent color images with three components. In this paper, for RGB-D images, the QR is improved by combining both color and depth information, which is invariant to lighting and color variations. The improved QR fully utilizes the four-dimensional quaternion domain. The new QTMs (NQTMs) are defined using the improved QR. They are combined with the quaternion back-propagation neural network (QBPNN) for RGB-D object recognition. The experimental results demonstrate that the NQTMs outperform our previous QTMs considering only color information.

15:00-17:10, Paper MoPT2.28
Symmetry-Based Object Proposal for Text Detection
Zhang, Xuelei	Huazhong Univ. of Science and Tech
Zheng, Zhang	Huazhong Univ. of Science and Tech
Zhang, Chengquan	Huazhong Univ. of Science and Tech
Bai, Xiang	Huazhong Univ. of Sci. and Tech
Keywords: 2D/3D object detection and recognition, Scene understanding Abstract: Scene text detection and recognition have become active research topics in computer vision. In this paper, we focus on the detection of text proposals from wild images. Text proposal methods attempt to generate a relatively small set of bounding box proposals that are most likely to contain text. Different from previous methods that merge similar regions based on properties of individual regions, we assume that text words bear a strong symmetry property. We propose a new algorithm that exploits the symmetry property to directly generate word-level proposals. The proposal generation process uses region features, and the ranking process makes use of the symmetry structures in text groups. Experiments on two standard datasets demonstrate that the proposed algorithm achieves state-of-the-art performance, especially in the case of a smaller number of proposals.

15:00-17:10, Paper MoPT2.29
Enhanced Face Detection Using Body Part Detections for Wearable Cameras
Brown, Lisa	IBM T. J. Watson Res. Center
Quanfu, Fan	IBM T. J. Watson Res. Center
Keywords: 2D/3D object detection and recognition, Scene understanding, Security issues Abstract: With the recent broad acceptance of body worn cameras (BWC) for police departments, there is an increased need to perform video analytics for this domain. However, body worn cameras pose several challenges including severe motion blur, barrel camera distortion from wide angle lenses, close proximity and odd viewing angles, and poor lighting conditions. In this paper, we evaluate the performance of several state of the art face detection approaches including Aggregate Channel Features [1] and Faster R-CNN [2] and show their limitations in this domain. We then describe how face detection can be improved for BWC by corroborating information from body parts detection. We design a system using 0-1 linear integer programming to optimize the matching of body parts detections for each person in the scene and to maximize the hit rate of faces with supporting evidence. By leveraging information from body parts detection, we are able to improve the average precision by nearly two percent.

15:00-17:10, Paper MoPT2.30
3D Object Recognition from Large-Scale Point Clouds with Global Descriptor and Sliding Window
Gunji, Naoyuki	NTT Corp
Niigaki, Hitoshi	Nippon Telegraph and Telephone Corp
Tsutsuguchi, Ken	NTT Media Intelligence Lab. NTT Corp
Kurozumi, Takayuki	NTT Corp
Kinebuchi, Tetsuya	NTT
Keywords: 2D/3D object detection and recognition, Shape modeling and encoding Abstract: We propose a novel method for the recognition of objects that match a given 3D model in large-scale scene point clouds captured in indoor environments with a laser range finder. Since large-scale indoor point clouds are greatly damaged by noise such as clutter, occlusion, holes, and measurement errors, it is difficult to exactly identify local correspondences between points in a target model point cloud and points in a scene point cloud, based on similarities between local descriptors computed at keypoints on both point clouds. To avoid such a problem, we suggest using a sliding window to match the input model against pieces of scene point clouds, both of which are represented with Bag-of-Features (BoF). A BoF representation of a window is efficiently calculated by using the integral image, which stores accumulated BoF vectors. Though BoF is robust to partial noise, it does not preserve any spatial information. We therefore propose a method to make a global descriptor of a window which is almost invariant to horizontal rotations of an object inside the divided window and roughly preserves spatial information by dividing the sliding window into several parts. Experiments on real world data show that our approach offers better performance than a baseline method in terms of precision and recall.
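The use of an integral image of accumulated BoF vectors can be sketched as below. The sketch assumes, purely for illustration, that the scene has been divided into a 2D grid of cells, each holding a small visual-word histogram; the grid layout, function names and data are hypothetical and not taken from the paper.

```python
def build_integral(cell_hists):
    """Integral image of histograms: entry [y][x] sums all cells above-left.

    `cell_hists[y][x]` is the BoF histogram (list of K counts) of one cell.
    The result has one extra row and column of zeros at the top and left.
    """
    rows, cols = len(cell_hists), len(cell_hists[0])
    k = len(cell_hists[0][0])
    integral = [[[0] * k for _ in range(cols + 1)] for _ in range(rows + 1)]
    for y in range(rows):
        for x in range(cols):
            for b in range(k):
                integral[y + 1][x + 1][b] = (
                    cell_hists[y][x][b]
                    + integral[y][x + 1][b]
                    + integral[y + 1][x][b]
                    - integral[y][x][b]
                )
    return integral


def window_bof(integral, y0, x0, y1, x1):
    """BoF histogram of cells in rows y0..y1-1 and columns x0..x1-1."""
    k = len(integral[0][0])
    return [
        integral[y1][x1][b] - integral[y0][x1][b]
        - integral[y1][x0][b] + integral[y0][x0][b]
        for b in range(k)
    ]


# Hypothetical 2 x 3 grid of cells with K = 2 visual words.
cells = [
    [[1, 0], [0, 2], [1, 1]],
    [[0, 1], [3, 0], [0, 0]],
]
integral = build_integral(cells)
right_block = window_bof(integral, 0, 1, 2, 3)  # columns 1-2, both rows
whole_grid = window_bof(integral, 0, 0, 2, 3)
```

After the one-time accumulation, each window histogram costs four lookups per visual word, regardless of window size, which is what makes exhaustive sliding-window matching affordable.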

15:00-17:10, Paper MoPT2.31
Single Image Depth Estimation Using Joint Local-Global Features
Mohaghegh, Hoda	Isfahan Univ. of Tech
Karimi, Nader	Isfahan Univ. of Tech
Soroushmehr, S.M. Reza	Univ. of Michigan
Samavi, Shadrokh	McMaster Univ
Najarian, Kayvan	Univ. of Michigan
Keywords: 2D/3D object detection and recognition, Statistical, syntactic and structural pattern recognition, Multimedia analysis, indexing and retrieval Abstract: Inferring scene depth from a single monocular image is an essential component in several computer vision applications such as 3D modeling and robotics. This process is an ill-posed problem. To tackle this challenging problem, previous efforts have been focusing on exploiting only global or local depth aware properties. We propose a model that incorporates both of them to obtain significantly more accurate depth estimates than using either global or local properties alone. Specifically, we formulate single image depth estimation as a K nearest neighbor search problem at both image level and patch level. At each level, a set of rich depth aware features, describing monocular depth cues, is employed in a nearest-neighbor regression model. By comparing the results with and without patch based fusion, the importance of our joint local-global framework becomes clear. The experimental results also demonstrate superior performance compared with existing data-driven approaches in both quantitative and qualitative analyses with a significantly simpler algorithm than others.
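The nearest-neighbor regression step can be illustrated with a minimal sketch. It assumes Euclidean distance on feature vectors and an unweighted mean of the K nearest training depths; the abstract does not specify these choices, and the feature values below are made up for illustration.

```python
import math


def knn_depth(train, query, k=3):
    """Predict depth as the mean depth of the k nearest training samples.

    `train` is a list of (feature_vector, depth) pairs; `query` is a feature
    vector of the same length. Distance metric and averaging are assumptions.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest = by_distance[:k]
    return sum(depth for _, depth in nearest) / len(nearest)


# Hypothetical training set: (depth-aware feature vector, depth in metres).
train = [
    ([0.0, 0.0], 1.0),
    ([1.0, 0.0], 2.0),
    ([0.0, 1.0], 3.0),
    ([10.0, 10.0], 50.0),
]
predicted = knn_depth(train, [0.1, 0.1], k=3)
```

In a joint local-global setting such as the one described, the same routine could be applied once with image-level features and once per patch, with the two estimates fused afterwards.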

15:00-17:10, Paper MoPT2.32
A Multi-Modal RGB-D Object Recognizer
Fäulhammer, Thomas	Vienna Univ. of Tech
Zillich, Michael	Vienna Univ. of Tech
Prankl, Johann	Vienna Univ. of Tech
Vincze, Markus	TU Wien
Keywords: 2D/3D object detection and recognition, Vision for robotics, Classification and clustering Abstract: In this paper we propose a multi-modal object recognition system that uses a two-step hypothesis verification approach to improve runtime efficiency. The system uses local and global appearance and shape features, generating many possibly competing hypotheses, which are then verified such that the scene can be optimally explained in terms of recognized object models. The modification introduced in this time-consuming step reduces runtime considerably, while maintaining recognition performance. We evaluate recognition performance for various feature extraction modalities on the publicly available Willow Garage RGB-D dataset and show runtime improvements of a factor of 2 to 10.


MoPT3	Poster Session Hall
MoP3	Poster Session

15:00-17:10, Paper MoPT3.1
Fast Motion Deblurring Using Gyroscopes and Strong Edge Prediction
Zhao, Jiacai	Huazhong Univ. of Science and Tech
Ma, Jie	Huazhong Univ. of Science and Tech
Fang, Bin	Huazhong Univ. of Science and Tech
Quan, Siwen	Huazhong Univ. of Science and Tech
Hu, Fangyu	Huazhong Univ. of Science and Tech
Keywords: Enhancement, restoration and filtering Abstract: This paper presents a fast deblurring algorithm to remove camera motion blur from a single photograph using built-in gyroscopes and strong edge prediction. An inaccurate blur kernel or point spread function (PSF) usually leads to an unsatisfying restored result. Hence, we propose a robust three-phase method for accurate PSF estimation. In the first stage, we utilize the embedded gyroscopes to compute a coarse version of the PSF from the camera’s angular velocity during an exposure. In order to reduce the execution time of the later PSF modification, we introduce a patch selection procedure in the second stage to choose a suitable region from the blurry image based on the size of the coarse PSF estimated in stage one. The third phase aims to modify the coarse PSF to obtain an accurate one by predicting strong edges from an estimated latent image. In our experiments, we compare the restoration performance of several state-of-the-art approaches including ours and find that the proposed method outperforms others qualitatively as well as quantitatively. In addition, our method is also compared with the multi-scale approach without gyroscope data and shows shorter processing time and comparable deblurring quality. To the best of our knowledge, this is the first work that combines the sensor-aided method with the image-based approach to estimate the blur kernel.

15:00-17:10, Paper MoPT3.2
Learning-Based Single Image Dehazing Via Genetic Programming
Lee, Chulwoo	Northumbria Univ
Shao, Ling	Northumbria Univ
Keywords: Enhancement, restoration and filtering, Image and video analysis and understanding Abstract: A genetic programming (GP)-based framework to learn the effective feature representation for image dehazing is proposed in this work. In GP, an individual program is randomly generated and genetically evolved to achieve the desired goal. To make GP estimate haze in an input image, a set of operators and operands is designed, each of which is a primitive of a GP program. Specifically, we provide four basic features as candidates, and also include function operators to construct sophisticated representations of these features. After the entire GP process finishes, we obtain a near-optimal compact descriptor for haze estimation. Experimental results demonstrate that the proposed algorithm enhances the visual quality of haze-degraded images both objectively and subjectively.

15:00-17:10, Paper MoPT3.3
Enhancement of Low Light Level Images with Coupled Dictionary Learning
Yang, Jie	Inst. of Automation, Chinese Acad. of Sciences
Jiang, Xinwei	Inst. of Automation, Chinese Acad. of Sciences
Liu, Cheng-Lin	Inst. of Automation, Chinese Acad. of Sciences
Pan, Chunhong	Inst. of Automation, Chinese Acad. of Sciences
Keywords: Enhancement, restoration and filtering, Low-level vision, Statistical, syntactic and structural pattern recognition Abstract: Low Light Level Images (LLLIs) are captured with exceptionally low brightness and low contrast, and cannot be enhanced satisfactorily with ordinary methods. In this paper, we propose an LLLI enhancement method using coupled dictionary learning. During the training stage, a pair of dictionaries and a linear mapping function are learned simultaneously. The dictionary pair aims to describe the raw LLLIs and their enhanced versions, and the linear mapping function models the correspondence between the representations of the dictionary pair. In the enhancement process, the resulting image is generated through dictionary mapping from patches of the input LLLI. We adopt a clustering strategy to improve the robustness of coupled dictionary learning, and propose an improved algorithm for fast implementation. Experimental results on real images demonstrate the effectiveness of our method.

15:00-17:10, Paper MoPT3.4
Intelligent Neural Computing-Based Way for Multi-Sensor Imaging Radar Data Fusion: An Aggregated Maximum Entropy Model-Based Regularization Approach
Shkvarko, Yuriy	Res. Professor/CINVESTAV Del IPN, Unidad Guadalajara
López Ruíz, Josué Antonio	CINVESTAV Del IPN, Unidad Guadalajara
Santos Arce, Stewart René	Univ. De Guadalajara
García Torales, Guillermo	Univ. De Guadalajara
Keywords: Enhancement, restoration and filtering, Sensor array & multichannel signal processing, Coding, compression and super-resolution Abstract: We address and compare two new frameworks for neural network (NN) computing-based feature enhanced (FE) fusion of remote sensing (RS) imagery acquired with different coherent radar sensing modalities. Both approaches exploit aggregation of the descriptive experiment design regularization (DEDR) based and the theoretical informatics inspired maximum entropy (ME) regularization paradigms for iterative minimization of the energy function (EF) of the multistate MENN with adaptive adjustments of the NN’s synaptic weights and bias inputs. Two distinct ways employed for the construction and minimization of the NNs’ EFs specify two corresponding different fusion frameworks addressed as the DEDR-MENN(1) and the DEDR-MENN(2) techniques, respectively. The DEDR-MENN(1) framework is based on the weighted aggregation of the individual fused sensor’s objective functions, while the DEDR-MENN(2) performs the FE fusion employing the theoretical informatics inspired unified model-based MENN framework. We compare the computational implementation and performance issues of both addressed methods and demonstrate, through computer simulations with real-world radar imagery, the superior operational performance attained with the DEDR-MENN(2) approach.

15:00-17:10, Paper MoPT3.5
Characterizing the Structure Tensor Using Gamma Distributions
Oskarsson, Magnus	Lund Univ
Keywords: Enhancement, restoration and filtering, Signal, image and video processing, Computational photography Abstract: The structure tensor is a powerful tool describing the local intensity structure of an image or image sequence. In this paper we give a model for the noise distribution of the components of the tensor. In order to do so we have also investigated some properties of the gamma distribution. We show that, given an input image corrupted with Gaussian noise, the noise in the structure tensor can be modeled well by gamma distributions. We apply our model to automatic contrast enhancement of images taken under poor illumination. We show how our noise model can be used for automatic parameter selection in the filtering process, giving powerful results without the need for cumbersome parameter tuning.
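
For readers unfamiliar with the structure tensor, a minimal sketch follows. It is not the author's implementation: the function name `structure_tensor`, the central-difference gradients and the unweighted square window are all assumptions chosen for brevity, and the gamma noise model from the abstract is not reproduced here. The caller is assumed to keep the window inside the image.

```python
def structure_tensor(img, y, x, radius=1):
    """Return (Jxx, Jxy, Jyy) summed over a square window centred at (y, x).

    Hypothetical helper: `img` is a list of rows of floats, and the window
    of side 2*radius+1 is assumed to lie entirely inside the image.
    """
    h, w = len(img), len(img[0])
    jxx = jxy = jyy = 0.0
    for v in range(y - radius, y + radius + 1):
        for u in range(x - radius, x + radius + 1):
            # Central differences, clamped at the image border.
            ix = (img[v][min(u + 1, w - 1)] - img[v][max(u - 1, 0)]) / 2.0
            iy = (img[min(v + 1, h - 1)][u] - img[max(v - 1, 0)][u]) / 2.0
            jxx += ix * ix
            jxy += ix * iy
            jyy += iy * iy
    return jxx, jxy, jyy


# Intensity increases with x only, so all gradient energy ends up in Jxx.
ramp = [[float(x) for x in range(5)] for _ in range(5)]
tensor = structure_tensor(ramp, 2, 2)
```

Each component is a sum of products of noisy gradients, which is why a dedicated noise model for the tensor components, as proposed in the abstract, is needed rather than the Gaussian model of the input image.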

15:00-17:10, Paper MoPT3.6
A Hyperspectral Image Restoration Method Based on Analysis Sparse Filter
Han, Chang	Huazhong Univ. of Science and Tech
Sang, Nong	Huazhong Univ. of Science and Tech
Gao, Changxin	Huazhong Univ. of Science and Tech
Keywords: Enhancement, restoration and filtering, Signal, image and video processing, Inpainting and Superimposing Abstract: The cosparse analysis model has shown superior performance in image reconstruction. However, this analysis framework has not yet been exploited for the hyperspectral image restoration task. An analysis operator learning method called GOAL (GeOmetric Analysis operator Learning) is applied to hyperspectral images. Considering the correlation between the hyperspectral bands, the hyperspectral images are cropped into cube cells to obtain training samples. To avoid the window effect caused by the image patch strategy, the analysis sparse filter method, which provides global support from local information of the image, is adopted. Denoising and inpainting experiments are carried out on hyperspectral images. Comparison with the state-of-the-art method shows that our method is robust and efficient.

15:00-17:10, Paper MoPT3.7
Comparative Evaluation of Deblurring Techniques for Fresnel Lens Computational Imaging
Nikonorov, Artem	Samara State Aerospace Univ. (National Res. Univ
Kazanskiy, Nikolay	Image Processing Systems Inst. of the Russian Acad. of Sci
Skidanov, Roman	Image Processing Systems Inst. of the Russian Acad. of Sci
Petrov, Maksim	Samara State Aerospace Univ. (National Res. Univ
Bibikov, Sergey	Samara State Aerospace Univ. (National Res. Univ
Yakimov, Pavel	Image Processing System Inst. of Russian Acad. of Sciences
Yuzifovich, Yuriy	Samara State Aerospace Univ. (National Res. Univ
Fursov, Vladimir	Samara State Univ
Keywords: Enhancement, restoration and filtering, Signal, image and video processing, Texture and color analysis Abstract: With the suggested computational post-processing workflow for correcting optical distortions, the Fresnel lens can finally be used in lightweight and inexpensive computer vision sensors. Common methods for image enhancement do not comprehensively address the blurring artifacts caused by strong chromatic aberrations in images produced by a simple Fresnel optical system. To deliver image quality acceptable for general-purpose color imaging, we propose computational post-capture processing to enhance the quality of images acquired with a 256-level Fresnel lens. The PSNR quality measure is then applied to estimate the resulting quality for different deblurring techniques. A novel technique that removes chromatic blur without computationally expensive deconvolution can be considered a breakthrough as it finally enables in-camera post-processing.

15:00-17:10, Paper MoPT3.8
Patch Sparsity Based Image Inpainting Using Local Patch Statistics and Steering Kernel Descriptor
Ghorai, Mrinmoy	INDIAN STATISTICAL Inst. KOLKATA
Mandal, Sekhar	Indian Inst. of Engineering Science and Tech. Shibpur
Chanda, Bhabatosh	Indian Statistical Inst
Keywords: Enhancement, restoration and filtering, Texture and color analysis, Segmentation, features and descriptors Abstract: This paper presents a sparse representation based image inpainting method using feature extraction based on local patch geometric structure. In local patch analysis, we approximate the target region by a weighted average of local patches that occur frequently within the neighborhood. Local patch statistics are applied to find the most relevant neighbors for each target patch. We also apply a local steering kernel (LSK) based feature to preserve geometric structure and texture sharpness in the target region. To take advantage of non-local self-similarity, i.e. the redundancy of similar patches in natural images, we find the candidate patches from the whole source region. Based on these local and non-local priors, we propose a sparse representation framework for image inpainting. Our proposed method is tested on a wide range of natural images. The experimental results show the superiority of the proposed method compared to some of the previous approaches.

15:00-17:10, Paper MoPT3.9
Using a MRF-BP Model with Color Adaptive Training for Underwater Color Restoration
Ponce Hinestroza, Abraham Noe	CINVESTAV
Torres-Méndez, Luz Abril	CINVESTAV Campus Saltillo
Drews, Paulo	Univ. Federal Do Rio Grande - FURG
Keywords: Enhancement, restoration and filtering, Transfer learning, Other applications Abstract: For underwater robotics applications involving monitoring and inspection tasks, it is important to capture quality color images in real time. In this paper, we propose a statistical learning method with an automatic selection of the training set for restoring the color of underwater images. Our statistical model is a Markov Random Field with Belief Propagation (MRF-BP). The quality of the results depends strongly on the trained correlations between the degraded image and its corresponding color image. However, it is not possible to have color ground truth data given the inherent conditions of underwater environments. Thus, we build a color adaptive training set by applying a multiple color space analysis to those frames that present a high change in their distribution from the previous frame and use only those frames for training. Experimental results on real underwater video sequences demonstrate that our approach is feasible, even when visibility conditions are poor, as our method can recover and discriminate between different colors in objects that may seem similar to the human eye.

15:00-17:10, Paper MoPT3.10
Change Detection in Marine Observatory Image Streams Using Bi-Domain Feature Clustering
Möller, Torben	Bielefeld Univ. Biodata Mining Group
Nilssen, Ingunn	Statoil ASA and Norwegian Univ. of Science and Tech. (
Nattkemper, Tim W.	Bielefeld Univ
Keywords: Image and video analysis and understanding, 2D/3D object detection and recognition, Machine learning and data mining Abstract: Vision based environmental monitoring using fixed cameras generates large image collections, creating a bottleneck in data analysis. In areas with limited background knowledge of the monitored habitat, this bottleneck can often not be overcome by traditional pattern recognition methods. A new change detection method to identify interesting events such as presence and behavior of different species is proposed. The change detection method uses the new Bi-Domain Feature Clustering (BDFC). BDFC integrates the location of a feature vector in the feature space as well as the location in the image into the clustering. Firstly, BDFC is applied to a time dependent representation of the image stream to identify regions of similar change. Secondly it is applied to a time independent representation to group these changes into categories. These categories can rapidly be assessed by a human observer to bypass the time consuming inspection of the whole data set. To make the posterior browsing of detected changes more efficient, a relevance factor computed for each category is proposed. The approach is demonstrated with experimental runs, using images from the Lofoten Vesteralen ocean observatory, showing the potential to harvest changes of interest and novelties in large image collections.

15:00-17:10, Paper MoPT3.11
An MCMC-Based Prior Sub-Hypergraph Matching in Presence of Outliers
Zhang, Ruonan	Peking Univ
Wenmin, Wang	Peking Univ
Ronggang, Wang	Peking Univ
Keywords: Image and video analysis and understanding, 2D/3D object detection and recognition, Representation and analysis in pixel/voxel images Abstract: Correspondence problems are very challenging due to the complexity of real-world scenes. Some hypergraph matching methods have been proposed to improve the recall of the solution, but they introduce numerous outliers because precision is rarely considered. To solve this issue, we propose a sub-hypergraph matching method that is robust thanks to better integration of geometric information and that reduces the difficulty of the NP-hard problem arising in hypergraphs. To narrow the search space and solve the optimization problem, a new prior strategy and cell-algorithm in a Markov Chain Monte Carlo (MCMC) framework are proposed for sub-hypergraph matching. The experiments show that our proposed method significantly outperforms other state-of-the-art algorithms.

15:00-17:10, Paper MoPT3.12
3D GLOH Features for Human Action Recognition
Abdulmunem, Ashwan Anwer	Cardiff Univ
Lai, Yu-Kun	Cardiff Univ
Sun, Xianfang	Cardiff Univ
Keywords: Image and video analysis and understanding, Classification and clustering, Motion, tracking and video analysis Abstract: Human action recognition from videos has wide applicability and has received significant interest. In this work, to better identify spatio-temporal characteristics, we propose a novel 3D extension of Gradient Location and Orientation Histograms, which provides discriminative local features representing not only the gradient orientations, but also their relative locations. We further propose a human action recognition system based on the Bag of Visual Words model, by combining the new 3D GLOH local features with Histograms of Oriented Optical Flow (HOOF) global features. Along with the idea from our recent work to extract features only in salient regions, our overall system outperforms existing feature descriptors for human action recognition on challenging real-world video datasets.

15:00-17:10, Paper MoPT3.13
With Whom Do I Interact? Detecting Social Interactions in Egocentric Photo-Streams
Aghaei, Maedeh	Univ. of Barcelona
Dimiccoli, Mariella	Computer Vision Center
Radeva, Petia	CVC
Keywords: Image and video analysis and understanding, Deep learning Abstract: Given a user wearing a low frame rate wearable camera during a day, this work aims to automatically detect the moments when the user engages in a social interaction, solely by reviewing the photos automatically captured by the worn camera. The proposed method, inspired by the sociological concept of F-formation, exploits the distance and orientation of the individuals appearing in the scene, with respect to the user, from a bird-view perspective. As a result, the interaction pattern over the sequence can be understood as a two-dimensional time series that corresponds to the temporal evolution of the distance and orientation features. A Long Short-Term Memory-based Recurrent Neural Network is then trained to classify each time series. Experimental evaluation over a dataset of 30,000 images has shown promising results for the proposed method for social interaction detection in egocentric photo-streams.

15:00-17:10, Paper MoPT3.14
Video2Vec: Learning Semantic Spatio-Temporal Embeddings for Video Representation
Hu, Sheng-Hung	Arizona State Univ
Li, Yikang	Arizona State Univ
Li, Baoxin	Arizona State Univ
Keywords: Image and video analysis and understanding, Deep learning, Signal, image and video processing Abstract: We propose to learn semantic spatio-temporal embeddings for videos to support high-level video analysis. The first step of the proposed embedding employs a deep architecture consisting of two channels of convolutional neural networks (capturing appearance and local motion) followed by their corresponding Gated Recurrent Unit encoders for capturing longer-term temporal structure of the CNN features. The resultant spatio-temporal representation (a vector) is used to learn a mapping via a multilayer perceptron to the word2vec semantic embedding space, leading to a semantic interpretation of the video vector that supports high-level analysis. We demonstrate the usefulness and effectiveness of this new video representation by experiments on action recognition, zero-shot video classification, and “word-to-video” retrieval, using the UCF-101 dataset.

15:00-17:10, Paper MoPT3.15
Effective Surface Normals Based Action Recognition in Depth Images
Nguyen, Xuan Son	Univ. of Lorraine
Nguyen, Thanh Phuong	Univ. of Toulon
Charpillet, François	Inria
Keywords: Image and video analysis and understanding, Gesture and Behavior Analysis Abstract: In this paper, we propose a new local descriptor for action recognition in depth images. The proposed descriptor relies on surface normals in 4D space of depth, time, spatial coordinates and higher-order partial derivatives of depth values along spatial coordinates. In order to classify actions, we follow the traditional Bag-of-words (BoW) approach, and propose two encoding methods termed Multi-Scale Fisher Vector (MSFV) and Temporal Sparse Coding based Fisher Vector Coding (TSCFVC) to form global representations of depth sequences. The high-dimensional action descriptors resulting from the two encoding methods are fed to a linear SVM for efficient action classification. Our proposed methods are evaluated on two public benchmark datasets, MSRAction3D and MSRGesture3D. The experimental results show the effectiveness of the proposed methods on both datasets.

15:00-17:10, Paper MoPT3.16
Mutually Incoherent Pose Bases for Action Recognition
Qian, Yinzhong	Fudan Univ
Chen, Wenbin	Fudan Univ
Shen, I-fan	Fudan Univ
Keywords: Image and video analysis and understanding, Gesture and Behavior Analysis, Image based modeling Abstract: We propose mutually incoherent pose bases for action recognition in static images, each of which implicitly represents the co-occurrence of poselets. First, action-specific poselets are trained. To suppress detection ambiguity, we cluster poselet activations by the overlap of the predicted torso bounds of each poselet. The pose feature of a person performing an action can then be extracted as a vector composed of poselet detections. In dictionary training, the challenge is that the dictionary is overcomplete, so a small perturbation in the pose feature can cause a significant change in the sparse code and thus change the classification result. In our framework, a penalty that induces the pose bases to become mutually incoherent is added to the objective function. We evaluate the method on the PASCAL VOC 2012 Action dataset and the Ikizler 5-Action dataset; experimental results show strong performance compared with counterparts and baselines.

15:00-17:10, Paper MoPT3.17
Collaborative Segmentation and Classification for Remote Sensing Image Analysis
Troya-Galvis, Andrés	ICube, Univ. De Strasbourg
Gançarski, Pierre	ICube, Univ. De Strasbourg
Berti-Équille, Laure	Qatar Computing Res. Inst. Hamad Bin Khalifa Univ
Keywords: Image and video analysis and understanding, Machine learning and data mining, Other applications Abstract: In this article we present CoSC, a generic framework for collaborative segmentation and classification. The framework is guided by both radiometric homogeneity based criteria and implicit semantic criteria to segment and extract the objects of a given thematic class. We present a proof-of-concept case-study and show that CoSC is able to reach higher confidence for object classification and results in significant improvement of the whole segmentation.

15:00-17:10, Paper MoPT3.18
Key Frame Extraction for Salient Activity Recognition
Kulhare, Sourabh	Rochester Inst. of Tech
Sah, Shagan	Rochester Inst. of Tech
Pillai, Suhas	Rochester Inst. of Tech
Ptucha, Raymond	Rochester Inst. of Tech
Keywords: Image and video analysis and understanding, Motion, tracking and video analysis, Deep learning Abstract: Surveillance cameras have become big business, with most metropolitan cities spending millions of dollars to watch residents from street corners, public transportation hubs, and body cameras on officials. Watching and processing the petabytes of streaming video is a daunting task, making automated and user-assisted methods of searching and understanding videos critical to their success. Although numerous techniques have been developed, large scale video classification remains a difficult task due to excessive computational requirements. In this paper, we conduct an in-depth study to investigate effective architectures and semantic features for efficient and accurate solutions to activity recognition. We investigate different color spaces and optical flow, and introduce a novel deep learning fusion architecture for multi-modal inputs. The introduction of key frame extraction, instead of using every frame or a random representation of video data, makes our methods computationally tractable. Results further indicate that transforming the image stream into a compressed color space reduces computational requirements with minimal effect on accuracy.

15:00-17:10, Paper MoPT3.20
Frame Level Annotations for Tennis Videos
Sukhwani, Mohak	IIIT Hyderabad
Jawahar, C. V.	IIIT
Keywords: Image and video analysis and understanding, Multimedia analysis, indexing and retrieval, Signal, image and video processing Abstract: Content based indexing is critical to effective access to multimedia data. To this end, visual data is often annotated with textual content for bridging the semantic gap. In this paper, we present a method to generate frame level fine grained annotations for a given video clip. Access to the frame level fine grained annotations leads to rich, dense and meaningful semantic associations between the text and video. This in turn makes video retrieval systems more robust and accurate. We demonstrate the use of probabilistic label consistent sparse coding and dictionary learning with a K-SVD algorithm to generate 'fine grained' annotations for a class of videos, namely lawn tennis. The algorithm simultaneously learns a classifier and a dictionary to generate the frame level annotations for the tennis videos using available textual descriptions. The utility of the proposed algorithm is demonstrated on a publicly available tennis dataset comprising tennis match videos from the Olympic Games.

15:00-17:10, Paper MoPT3.21
A Rank Minimization-Based Late Fusion Method for Multi-Label Image Annotation
Yao, Yao	Beijing Inst. of Tech
Xin, Xin	Beijing Inst. of Tech
Guo, Ping	Beijing Normal Univ
Keywords: Image and video analysis and understanding, Scene understanding Abstract: Image annotation is a hard multi-label learning problem which aims at automatically tagging each input image with relevant keywords reflecting its semantic concepts. Recently, several late fusion methods were proposed to improve the accuracy of image annotation. However, these late fusion methods need normalization of the confidence score vectors of independent models corresponding to distinct representations, and choosing a good normalization function is difficult. In this paper, we propose a new method of late fusion for image annotation based on rank minimization. The proposed method avoids normalization by transforming confidence score vectors into pairwise relationship matrices. An optimal matrix is then obtained by solving a minimization problem. With the optimal matrix, a fused confidence score vector can be recovered, which gives the final prediction of tags. Experiments on the standard Corel5K dataset and our Campus-Indoor dataset confirm the effectiveness of our late fusion method for image annotation.
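
To illustrate why a pairwise relationship matrix removes the need for score normalization, here is a minimal sketch. The sign encoding and the name `pairwise_matrix` are assumptions, since the abstract does not give the exact matrix definition, and the rank minimization step that fuses the matrices is not shown.

```python
def pairwise_matrix(scores):
    """T[i][j] is +1 if tag i scores above tag j, -1 if below, 0 if tied.

    Hypothetical encoding: only the relative order of the scores is kept.
    """
    n = len(scores)
    return [
        [(scores[i] > scores[j]) - (scores[i] < scores[j]) for j in range(n)]
        for i in range(n)
    ]


# Two models that rank three tags identically but score on different scales.
model_a = [0.9, 0.2, 0.5]
model_b = [40.0, -3.0, 12.0]
same = pairwise_matrix(model_a) == pairwise_matrix(model_b)
```

Because the matrix depends only on the ordering of the scores, models with incomparable score ranges yield directly comparable matrices.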

15:00-17:10, Paper MoPT3.22
Noise Stable Image Registration Using Random Resample Consensus
Nakazawa, Atsushi	Kyoto Univ
Keywords: Image and video analysis and understanding, Segmentation, features and descriptors, Reconstruction and camera motion estimation Abstract: Image registration is an important and fundamental problem in computer vision and image processing. Although there are currently a large number of image registration algorithms such as RANSAC and its extensions, image registration under very noisy conditions remains difficult when a sufficient number of correct corresponding points cannot be obtained. This paper addresses this issue by introducing a random resample consensus (RANRESAC) strategy, which achieves robust registration where it is difficult to obtain enough correct correspondence pairs. In contrast to RANSAC, the proposed RANRESAC newly generates corresponding points for the images using the hypothesis transformation function, and verifies correctness by evaluating the similarity of the local features at the newly sampled points. To confirm the effectiveness of the proposed method, we first conducted a preliminary experiment that evaluates the similarity of the texture and orientation components of the SURF local descriptor in images with several levels of added noise. As a result, we observed that the texture component is more stable than the orientation component. Based on this finding, we designed the RANRESAC algorithm and performed experiments using an open image registration dataset. The proposed method outperforms the RANSAC, MSAC and Optimal RANSAC algorithms under high-noise conditions.
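
For context, the baseline that the abstract contrasts against can be sketched as follows. This is plain RANSAC for a 2D translation, not RANRESAC: the function name, tolerance, iteration count and toy data are all assumptions. Per the abstract, RANRESAC instead verifies each hypothesis by resampling new points and comparing local features there, which requires image data and is not shown.

```python
import random


def ransac_translation(pairs, tol=1.0, iters=50, seed=0):
    """Estimate a 2D translation (dx, dy) from point pairs containing outliers."""
    rng = random.Random(seed)
    best, best_inliers = None, -1
    for _ in range(iters):
        # Minimal sample: one correspondence fully determines a translation.
        (x0, y0), (x1, y1) = rng.choice(pairs)
        dx, dy = x1 - x0, y1 - y0
        # Count the given correspondences that agree with this hypothesis.
        inliers = sum(
            1
            for (a, b), (c, d) in pairs
            if abs(c - a - dx) <= tol and abs(d - b - dy) <= tol
        )
        if inliers > best_inliers:
            best, best_inliers = (dx, dy), inliers
    return best, best_inliers


# Six correspondences shifted by (3, -2), plus two gross outliers.
points = [(0, 0), (1, 4), (5, 2), (7, 7), (2, 9), (8, 1)]
pairs = [((x, y), (x + 3, y - 2)) for x, y in points]
pairs += [((4, 4), (20, 20)), ((6, 3), (-9, 15))]
model, support = ransac_translation(pairs)
```

The consensus count here relies entirely on the given correspondences, which is the dependence that becomes fragile when noise leaves too few correct pairs.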

15:00-17:10, Paper MoPT3.23
Robust Road Detection from a Single Image
Zhang, Junkang	Southeast Univ
Xia, Siyu	Southeast Univ
Lu, Kaiyue	Australian National Univ
Pan, Hong	School of Automation, Southeast Univ
Qin, A. K.	RMIT Univ
Keywords: Image and video analysis and understanding, Segmentation, features and descriptors, Scene understanding Abstract: Road detection from images is a challenging task in computer vision. Previous methods are not robust, because their features and classifiers cannot adapt to different circumstances. To overcome this problem, we propose to apply unsupervised feature learning for road detection. Specifically, we develop an improved encoding function and add a feature selection process to obtain robust and discriminative road features. Besides, a road segmentation algorithm is proposed to extract road regions from the learned feature maps, in which a tree structure is established to represent the hierarchical relations of various regions segmented by multiple thresholds, and a two-loop optimization is then employed to select the most stable regions as road areas. Experimental results on several challenging datasets justify the effectiveness of our method.

15:00-17:10, Paper MoPT3.24
Detection of Glare in Night Photography
Singh, Mandakinee	Samsung Res. & Development, Bangalore
Tiwari, Rajesh	Samsung R&D
Swami, Kunal	Samsung R&D Inst. India - Bangalore
Vijayvargiya, Ajay	Samsung R&D Institute
Keywords: Image and video analysis and understanding, Segmentation, features and descriptors, Signal, image and video processing Abstract: Glare is a hardware problem that occurs because of light trapped in the lens elements. It is a common problem in photography when capturing a scene that contains a bright source or that is taken in a very bright environment. Glare can hide useful information in the image and can make foreground objects blurry and deformed. In this paper, we propose a novel method to detect glare, focusing mainly on the scenario where users photograph a scene containing a light source in an outdoor environment at night. The method combines three different masks of the original image to detect the glare. The first mask is obtained by segmenting the original image using our improved Bernsen’s local thresholding method. To obtain the second mask, we binarize the original image by simple thresholding to get the specular hot-spot of light present in the original image, and for the third mask, we apply thresholding on each RGB channel of the original image. Finally, glare is detected using connected component computation on the aforementioned masks. The proposed solution detects the glare-affected area with very good accuracy.
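
Two of the building blocks named in the abstract, simple thresholding and connected component computation, can be sketched as below. The function names, the threshold value of 230 and the use of 4-connectivity are assumptions; the improved Bernsen thresholding, the per-channel RGB mask and the rule for combining the three masks are not described in the abstract and are not reproduced.

```python
from collections import deque


def threshold(gray, t):
    """Binary mask: 1 where the intensity is at least t, else 0."""
    return [[1 if p >= t else 0 for p in row] for row in gray]


def connected_components(mask):
    """Return the 4-connected regions of 1-pixels as lists of (row, col)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[y][x] = True
            queue, region = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


# Toy night image: one bright blob of three pixels and one isolated bright pixel.
gray = [
    [10, 10, 10, 10, 10],
    [10, 250, 255, 10, 10],
    [10, 245, 10, 10, 10],
    [10, 10, 10, 10, 240],
]
regions = connected_components(threshold(gray, 230))
```

Region size is one plausible cue for separating an extended hot-spot from isolated bright pixels, though the abstract does not say which region properties the method uses.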

15:00-17:10, Paper MoPT3.25
Vector R-Ordering Based Selection of Segments for Video Skimming
V K, Vivekraj	Indian Inst. of Tech. Roorkee
Sen, Debashis	Indian Inst. of Tech. - Kharagpur
Raman, Balasubramanian	Indian Inst. of Tech. Roorkee, Roorkee, India
Keywords: Image and video analysis and understanding, Signal, image and video processing Abstract: Video skimming is a process of generating a shorter yet fully comprehensible version of a given video as its dynamic summary. A generic skimming system involves division of the video into segments and selecting the segments based on their suitability. The suitability is often obtained considering various features of the video and combining their individual contributions. Suggesting that the combination causes loss of information, we propose collective representation of the individual contributions in the form of a vector and use vector reduced (R)-ordering to judge the suitability. R-ordering based tree-structured organization and similarity levels of the video segments are employed to determine the suitability. Comparing with user generated summaries, we show that a video summary generated by a general skimming approach using R-ordering will be more effective in covering the important parts of a given video than when a feature combination is used.
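
Reduced (R-) ordering ranks vectors by a scalar derived from each vector, commonly its distance to a reference vector. A minimal sketch follows; the Euclidean distance, the reference vector `ideal` and the three-feature segment vectors are assumptions for illustration, and the tree-structured organization and similarity levels mentioned in the abstract are not shown.

```python
import math


def r_order(vectors, reference):
    """Return indices of `vectors` sorted by Euclidean distance to `reference`."""
    return sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], reference))


# Hypothetical per-segment feature contributions, kept as vectors rather than summed.
segments = [(0.9, 0.8, 0.7), (0.1, 0.2, 0.1), (0.5, 0.9, 0.6)]
ideal = (1.0, 1.0, 1.0)
order = r_order(segments, ideal)
```

The ranking uses every component of each vector jointly instead of first collapsing the features into a single combined score.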

15:00-17:10, Paper MoPT3.26
Prediction Based Seam Carving for Video Retargeting
Kaur, Harpreet	Department of Computer Science and Engineering, Indian Inst
Kour, Swarnjeet	Department of Computer Science and Engineering, Indian Inst
Sen, Debashis	Indian Inst. of Tech. - Kharagpur
Keywords: Image and video analysis and understanding, Signal, image and video processing Abstract: This paper presents a prediction based spatio-temporal seam carving scheme for video retargeting. It resizes the video while maintaining an appropriate balance between spatial and temporal coherence. In a video frame, the proposed approach finds a ‘temporal’ seam by using Kalman filter estimation and then modifies it with the help of a ‘spatial’ seam, considering both spatial and temporal coherency. Unlike in image retargeting, it is of utmost importance in retargeting a video frame to consider temporal coherency along with spatial coherency when removing or replicating unimportant background portions. This ensures that only an insignificant amount of motion artifacts is introduced during resizing. The proposed Kalman filter based approach not only predicts a spatio-temporal seam to mark a portion of the frame where a spatially and temporally coherent seam is more likely to be found, but also has low time complexity. The proposed approach outperforms other state-of-the-art video retargeting methods, as illustrated by experimental results.
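
The ‘spatial’ seam referred to above is conventionally found by dynamic programming over a per-pixel energy map. The sketch below shows that standard single-frame step only; the function name and the toy energy values are assumptions, and the Kalman filter prediction of the ‘temporal’ seam is not reproduced.

```python
def min_vertical_seam(energy):
    """Return one column index per row for the minimum-energy vertical seam.

    Consecutive rows may differ by at most one column (8-connected seam).
    """
    h, w = len(energy), len(energy[0])
    cost = [row[:] for row in energy]
    # Forward pass: cheapest cumulative cost of reaching each pixel from the top.
    for y in range(1, h):
        for x in range(w):
            cost[y][x] += min(cost[y - 1][max(x - 1, 0):min(x + 2, w)])
    # Backward pass: start at the cheapest bottom pixel and walk upwards.
    x = min(range(w), key=lambda c: cost[h - 1][c])
    seam = [x]
    for y in range(h - 1, 0, -1):
        lo, hi = max(x - 1, 0), min(x + 2, w)
        x = min(range(lo, hi), key=lambda c: cost[y - 1][c])
        seam.append(x)
    return seam[::-1]


# Low-energy pixels (value 1) form a diagonal-then-straight path.
energy = [
    [5, 1, 5, 5],
    [5, 5, 1, 5],
    [5, 5, 1, 5],
]
seam = min_vertical_seam(energy)
```

Running this independently on each frame gives seams that can jump between frames, which is the temporal incoherence that the prediction-based scheme is designed to avoid.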

15:00-17:10, Paper MoPT3.27
Multi-Focus Image Fusion Using Quaternion Wavelet Transform
Zheng, Xueni	Jiangnan Univ
Luo, Xiaoqing	Jiangnan Univ
Zhang, Zhancheng	Suzhou Univ. of Science and Tech
Wu, Xiaojun	Jiangnan Univ
Keywords: Signal, image and video processing, Image based modeling, Image and video analysis and understanding Abstract: To avoid the introduction of false information during the fusion process, a novel multi-focus image fusion method is proposed in the quaternion wavelet transform domain. To obtain the dependency in different high frequency subbands, a quaternion wavelet contextual hidden Markov model (Q-CHMM) is established for modeling quaternion wavelet coefficients. For better image representations, several features are proposed by analyzing the transform coefficients, phases of coefficients and the statistical attribution of coefficients. Different from the traditional fusion methods based on a single feature, a comprehensive feature is constructed by using a quaternion matrix to fuse the high frequency subbands. Experimental results demonstrate that the proposed method possesses good fusion performance.


MoPT4	Poster Session Hall
MoP4	Poster Session

15:00-17:10, Paper MoPT4.1
View Invariant Gait Recognition Using Only One Uniform Model
Yu, Shiqi	Shenzhen Univ
Wang, Qing	Shenzhen Univ
Shen, Linlin	Shenzhen Univ
Huang, Yongzhen	Inst. of Automation, Chinese Acad. of Sciences
Keywords: Gait recognition Abstract: Gait recognition has proved useful in human identification at a distance. But view variance of gait features is always a great challenge because of the difference in appearance. If the view of the probe is different from that of the gallery, a view transformation model can be employed to convert the gait feature from one view to another. But most existing models need to estimate the view angle first, and can work for only one view pair. They cannot convert multi-view data to one specific view efficiently. We employ one deep model based on auto-encoders for view invariant gait extraction. The model can synthesize gait features in a progressive way by stacked multi-layer auto-encoders. The unique advantage is that it can extract view invariant features from any view using only one model, and view estimation is not needed. The proposed method is evaluated on a large dataset, CASIA Gait Dataset B. The experimental results show that it can achieve state-of-the-art performance, and the improvement is more obvious when the view variance is larger.

15:00-17:10, Paper MoPT4.3
Learning Shape Variations of Motion Trajectories for Gait Analysis
Devanne, Maxime	Telecom Lille/LIFL
Wannous, Hazem	Univ. of Lille1
Berretti, Stefano	Univ. of Florence
Pala, Pietro	Univ. of Firenze
Daoudi, Mohammed	Télécom Lille/CRIStAL (UMR 9189)
Del Bimbo, Alberto	Univ. of Florence
Keywords: Gait recognition, Gesture and Behavior Analysis Abstract: The analysis of human gait is increasingly investigated due to its wide range of potential applications in various domains, such as rehabilitation, deficiency diagnosis, surveillance and movement optimization. In addition, the release of depth sensors offers new opportunities to achieve gait analysis in a non-intrusive context. In this paper, we propose a gait analysis method from depth sequences that analyzes each step separately so as to be robust to gait duration and incomplete cycles. We analyze the shape of the motion trajectory as the signature of the gait and consider shape variations within a Riemannian manifold to learn step models. During classification, the derivation of each performed step is evaluated in an online manner to qualitatively analyze the gait. Experiments are carried out in the context of abnormal gait detection and person re-identification through gait recognition. Results demonstrate the potential of the method in both scenarios.

15:00-17:10, Paper MoPT4.4
Learning Robust Features for Gait Recognition by Maximum Margin Criterion
Balazia, Michal	Faculty of Informatics, Masaryk Univ
Sojka, Petr	Faculty of Informatics, Masaryk Univ
Keywords: Gait recognition, Pattern Recognition for Surveillance and Security, Biometric systems and applications Abstract: In the field of gait recognition from motion capture data, designing human-interpretable gait features is a common practice of many fellow researchers. To refrain from ad-hoc schemes and to find maximally discriminative features we may need to explore beyond the limits of human interpretability. This paper contributes to the state-of-the-art with a machine learning approach for extracting robust gait features directly from raw joint coordinates. The features are learned by a modification of Linear Discriminant Analysis with Maximum Margin Criterion so that the identities are maximally separated and, in combination with an appropriate classifier, used for gait recognition. Experiments on the CMU MoCap database show that this method outperforms eight other relevant methods in terms of the distribution of biometric templates in respective feature spaces expressed in four class separability coefficients. Additional experiments indicate that this method is a leading concept for rank-based classifier systems.

15:00-17:10, Paper MoPT4.5
Leveraging Intra-Class Variations to Improve Large Vocabulary Gesture Recognition
Conly, Christopher	Univ. of Texas at Arlington
Dillhoff, Alex	Univ. of Texas at Arlington
Athitsos, Vassilis	Univ. of Texas at Arlington
Keywords: Gesture and Behavior Analysis, Human body motion and gesture based interaction, Motion, tracking and video analysis Abstract: Large vocabulary gesture recognition using a training set of limited size is a challenging problem in computer vision. With few examples per gesture class, researchers often employ exemplar-based methods such as Dynamic Time Warping (DTW). This paper makes two contributions in the area of exemplar-based gesture recognition: 1) it introduces Multiple-Pass DTW (MP-DTW), a method in which scores from multiple DTW passes focusing on different gesture properties are combined, and 2) it introduces a new set of features modeling intra-class variation of several gesture properties that can be used in conjunction with MP-DTW or DTW. We demonstrate that these techniques provide substantial improvement over DTW in both user-dependent and user-independent experiments on American Sign Language (ASL) datasets, even when using noisy data generated by RGB-D skeleton detectors. We further show that using these techniques in a large vocabulary system with a limited training set provides significantly better results compared to Long Short-Term Memory (LSTM) network and Hidden Markov Model (HMM) approaches.

15:00-17:10, Paper MoPT4.6
Human Pose Estimation Based on Human Limbs
Liang, Guoqiang	Xi'an Jiaotong Univ
Lan, Xuguang	Xi'an Jiaotong Univ
Wang, Jiang	Baidu Res
Zheng, Nanning	Xi'an Jiaotong Univ
Keywords: Gesture and Behavior Analysis, Image and video analysis and understanding, Deep learning Abstract: Modeling the relationship among human joints is one of the most important components in human pose estimation. Previous methods usually define this relationship as geometric constraints on the relative location of two neighboring joints. In this definition, the local image appearance of the region connecting two neighboring joints is ignored. In fact, this image appearance, called a human limb, plays an important role in human joint localization in the human visual system. To make full use of this local image appearance, we propose to solve a new task: human limb detection. We combine it with human joint localization in one deep convolutional neural network. After getting coarse results, we employ a graphical model to remove false positive detections. In addition, shallow and deep features are combined in this model. We evaluate our method on the FLIC and LSP datasets. The experimental results show the effectiveness of our method.

15:00-17:10, Paper MoPT4.7
A Fast and Accurate Motion Descriptor for Human Action Recognition Applications
Ghorbel, Enjie	IRSEEM and URIA(Mines Douai)
Boutteau, Remi	Esigelec Irseem
Boonaert, Jacques	Ec. Des Mines De DOUAI
Savatier, Xavier	IRSEEM
Lecoeuche, Stephane	URIA - Mines Douai
Keywords: Gesture and Behavior Analysis, Image and video analysis and understanding, Human body motion and gesture based interaction Abstract: With the availability of the recent human skeleton extraction algorithm introduced by Shotton et al., interest in skeleton-based action recognition methods has been renewed. Despite the importance of the low-latency aspect in applications, it can be noted that the majority of recent approaches have not been evaluated in terms of computational cost. In this paper, a novel fast and accurate human action descriptor named Kinematic Spline Curves (KSC) is introduced. This descriptor is built by interpolating the kinematics of joints (position, velocity and acceleration). To overcome the anthropometric and the execution rate variabilities, we respectively propose the use of a skeleton normalization and a temporal normalization. For this purpose, a new temporal normalization method based on the Normalized Accumulated kinetic Energy (NAE) of the human skeleton is suggested. Finally, the classification step is performed using a linear Support Vector Machine (SVM). Experimental results on challenging benchmarks show the efficiency of our approach in terms of recognition accuracy and computational latency.

15:00-17:10, Paper MoPT4.8
Unsupervised Mouse Behavior Analysis: A Data-Driven Study of Mice Interactions
Katsageorgiou, Vasiliki-Maria	Istituto Italiano Di Tecnologia
Zanotto, Matteo	Istituto Italiano Di Tecnologia
Huang, Huiping	Istituto Italiano Di Tecnologia
Ferretti, Valentina	Istituto Italiano Di Tecnologia
Papaleo, Francesco	Istituto Italiano Di Tecnologia (IIT)
Sona, Diego	Istituto Italiano Di Tecnologia (IIT)
Murino, Vittorio	Istituto Italiano Di Tecnologia
Keywords: Gesture and Behavior Analysis, Image and video analysis and understanding, Machine learning and data mining Abstract: Automatic analysis of rodent behavior has been receiving growing attention in recent years, since rodents are the reference species for many neuroscientific studies, with social interaction being among the most important subjects studied. Systems that are employed in these studies are mainly based on tracking of mice and activity classification through supervised learning methods, trained on datasets manually annotated by experts. In this paper, we introduce a completely unsupervised way of analysing tracking data for the automatic identification of social and non-social behaviors using models capable of spotting regularities in the data. In particular, a mean-covariance Restricted Boltzmann Machine is employed to abstract higher-level behavioral configurations of mice interacting in an arena for a long time.

15:00-17:10, Paper MoPT4.9
A Novel Fingerprint Classification Method Based on Deep Learning
Ruxin, Wang	School of Mathematical Science Univ. of Chinese Acad. Of
Congying, Han	School of Mathematical Science Univ. of Chinese Acad. Of
Tiande, Guo	School of Mathematical Science Univ. of Chinese Acad. Of
Keywords: Fingerprint recognition, Classification and clustering, Deep learning Abstract: Fingerprint classification is an effective technique for reducing the number of candidate fingerprints in the matching stage of an automatic fingerprint identification system (AFIS). In recent years, deep learning has emerged as a technology that has achieved great success in many fields, such as image processing and computer vision. In this paper, we make a preliminary attempt at the traditional fingerprint classification problem based on the new deep neural network method. For the four-class problem, choosing only the orientation field as the classification feature, we achieve 91.4% accuracy using stacked sparse autoencoders (SAE) with three hidden layers on the NIST-DB4 database. Two classification probabilities are then used for fuzzy classification, which can effectively enhance the classification accuracy. By adjusting only the probability threshold, we obtain classification accuracies of 96.1% (threshold 0.85), 97.2% (threshold 0.90) and 98.0% (threshold 0.95) with a single layer SAE. Applying the fuzzy method, we obtain higher accuracy.

15:00-17:10, Paper MoPT4.10
Define a Fingerprint Orientation Field Pattern
Zhang, Ning	Institute of Automation, Chinese Acad. of Sciences
Zang, Yali	Inst. of Automation, Chinese Acad. of Sciences
Jia, Xiaofei	Institute of Automation, Chinese Acad. of Sciences
Yang, Xin	Institute of Automation, Chinese Acad. of Sciences
Tian, Jie	Inst. of Automation, Chinese Acad. of Sciences
Keywords: Fingerprint recognition, Forensic biometrics and its applications Abstract: The Orientation Field (OF) is one of the most significant characteristics for distinguishing fingerprint images from non-fingerprint images. An effective definition of the fingerprint OF pattern will not only benefit fingerprint enhancement, but also contribute to latent fingerprint detection and segmentation. The existing fingerprint OF models either require prior knowledge of singular points, or cannot be generalized to all kinds of fingerprint OFs. In this paper, we propose to define fingerprint OF patterns based on low rank decomposition and sparse coding. We then apply the proposed method to fingerprint OF recognition and detection. Experimental results demonstrate the effectiveness of our method.

15:00-17:10, Paper MoPT4.11
Improving Cross Sensor Interoperability for Fingerprint Identification
Lin, Chenhao	The Hong Kong Pol. Univ
Kumar, Ajay	The Hong Kong Pol. Univ
Keywords: Fingerprint recognition, Other Biometric applications Abstract: Improving the accuracy of matching fingerprint images acquired from two different fingerprint sensors is an important research problem with several promising studies in the literature. Most of these studies focus on sensor interoperability using fingerprints acquired from different kinds of contact-based sensors. However, emerging contactless fingerprint technologies have shown their benefits. This paper investigates the fingerprint sensor interoperability problem using fingerprints acquired from contact-based and contactless sensors. We propose a generalized contact-based fingerprint deformation correction model (DCM) to improve the matching accuracy. This model is trained by estimating the deformation between a contact-based fingerprint and the corresponding contactless fingerprint (ground truth). We present a method to estimate contact-based fingerprint impression type and intensity. As a result, minutiae features from contact-based and contactless fingerprints can be better aligned using the proposed model. A database of 1200 2D contactless fingerprints and respective contact-based fingerprints from 200 clients is used for the experiments. The experimental results presented in this paper validate our approach and illustrate promising improvement in performance using the proposed model.

15:00-17:10, Paper MoPT4.12
Local Active Content Fingerprint: Solutions for General Linear Feature Maps
Kostadinov, Dimche	Univ. of Geneva
Voloshynovskiy, Sviatoslav	Univ. of Geneva
Diephuis, Maurits	Univ. of Geneva
Ferdowsi, Sohrab	Univ. of Geneva
Holotyak, Taras	Univ. of Geneva
Keywords: Fingerprint recognition, Signal, image and video processing, Image and video analysis and understanding Abstract: This paper presents solutions to the local patch based Active Content Fingerprint (aCFP) with linear modulation, a general linear feature map and convex constraints on the properties of the local feature descriptor. A direct approximation of the linear feature map is proposed, such that the image distortion is as small as possible and the approximate linear feature map is as close as possible to the original map. Then an explicit regularization of the trade-off between the modulation distortion and the robustness of the local feature is introduced through a novel problem formulation. A computer simulation using local image patches extracted from a publicly available data set is provided, demonstrating the advantages under additive white Gaussian noise (AWGN), lossy JPEG compression and projective geometrical transform distortions.

15:00-17:10, Paper MoPT4.13
A Proposed Pattern Recognition Framework for EEG-Based Blind Watermarking System
Pham, Trung Duy	Univ. of Canberra
Tran, Dat	Univ. of Canberra
Ma, Wanli	Univ. of Canberra
Keywords: Forensic biometrics and its applications, Classification and clustering, Security issues Abstract: Copyright protection for multimedia data owners is of crucial importance, as the duplication of multimedia data has become easy with the advent of the Internet and digital multimedia technology. Current digital watermarking techniques for preserving product ownership are rule-based and do not directly deal with data synchronization; therefore their decoding performance degrades significantly when the watermarked data is transmitted through a real communication channel. This paper proposes a pattern recognition framework to build a new blind watermark scheme for electroencephalography (EEG) data. Embedding a watermark is based on modifying the mean modulation relationship of the approximation coefficients in the wavelet domain. Retrieving this watermark is done effectively using Support Vector Data Description (SVDD) models trained with the correlation between the modified frequency coefficients and the watermark sequence in the wavelet domain. Experimental results show that the proposed scheme provides good imperceptibility and is more robust against various signal processing techniques and common attacks such as random cropping, noise addition, low-pass filtering, and resampling.

15:00-17:10, Paper MoPT4.15
Novel Generative Model for Facial Expressions Based on Statistical Shape Analysis of Landmarks Trajectories
Desrosiers, Paul Audain	Télécom Lille, CRIStAL UMR (CNRS 9189)
Devanne, Maxime	Telecom Lille/LIFL
Daoudi, Mohammed	Télécom Lille/CRIStAL (UMR 9189)
Keywords: Facial expression recognition, Statistical, syntactic and structural pattern recognition, Shape modeling and encoding Abstract: We propose a novel geometric framework for analyzing spontaneous facial expressions, with the specific goal of comparing, matching, and averaging the shapes of landmark trajectories. Here we represent facial expressions by the motion of the landmarks over time. The trajectories are represented by curves. We use elastic shape analysis of these curves to develop a Riemannian framework for analyzing the shapes of these trajectories. In terms of empirical evaluation, our results on two databases, UvA-NEMO and Cohn-Kanade CK+, are very promising. From a theoretical perspective, this framework allows formal statistical inferences, such as generation of facial expressions.

15:00-17:10, Paper MoPT4.16
Shannon Information Based Adaptive Sampling for Action Recognition
Tian, Qing	McGill Univ
Arbel, Tal	Centre for Intelligent Machines, McGill Univ
Clark, James	McGill Univ
Attachments: Supplementary material Keywords: Gesture and Behavior Analysis, Image and video analysis and understanding, Motion, tracking and video analysis Abstract: This paper investigates the effects of sampling on action recognition performance. Currently, dense (regular grid) sampling and uniform random sampling are popular strategies that achieve state-of-the-art performance. However, they are data-blind and pay equal attention to locations of different informativeness. In this paper, a Shannon information based adaptive sampling approach is proposed for action recognition. Results of different sampling approaches are compared on three benchmark datasets: the basic KTH and the challenging HMDB51 and UCF101 datasets. The method is shown to improve recognition accuracy as well as computational efficiency over the current state-of-the-art using less than one percent of the total pixels.

15:00-17:10, Paper MoPT4.17
High Precision Gesture Sensing Via Quantitative Characterization of the Doppler Effect
Ai, Haojun	Wuhan Univ
Men, Yifang	Wuhan Univ
Han, Liangliang	Aerospace System Engineering Shanghai
Li, Zuchao	Wuhan Univ
Mengyun, Liu	Wuhan Univ
Keywords: Gesture and Behavior Analysis, Segmentation, features and descriptors, Audio and acoustic processing and analysis Abstract: This paper presents a high precision gesture recognition system that leverages the Doppler effect of ultrasound to sense in-air hand gestures. The system can precisely identify a wider variety of gestures than other systems without any modification to consumer laptops. The system recognizes quantitatively detailed and complex movements from the signals reflected by a moving body. A Hidden Markov Model is used to construct a library of independent, discrete gestures. The gestures can be mapped to diverse application actions. Our method can distinguish among similar gestures with slight differences by extracting fewer, more effective features. Our proposed system reduces false positives caused by unintended motions and is versatile and adaptable to multiple devices. We implemented a proof-of-concept prototype on a laptop and extensively evaluated the system. Our results show that the system recognizes six gestures with an average accuracy of 98.6% and 18 gestures including similar ones with 95% accuracy. The flexibility and robustness on multiple devices highlight its ability to enable future ubiquitous non-contact gesture-based interaction with computing devices.

15:00-17:10, Paper MoPT4.18
Fast 3D Hand Estimation for Mobile Interactions
Pei, Yuru	Peking Univ
Ma, Gengyu	Usens Inc
Attachments: Supplementary material Keywords: Human body motion and gesture based interaction, 3D shape recovery, 2D/3D object detection and recognition Abstract: The ubiquitous hand gesture plays an important role in natural human machine interaction (HMI). Recently, consumer color and depth cameras have been used to estimate hand shapes and postures for mid-air HMI. Based on the observation that 3D hand contours carry much information about hand postures, we estimate the 3D hand contour from infrared images with limited computational complexity for HMI on mobile devices. A variant of the dynamic programming (vDP) algorithm is proposed to handle the complex self-occlusion in 3D hand estimation, where a set of heuristic rules is introduced to avoid missing fingers. Furthermore, the constraints are used to reduce the search space in the contour alignment. Given 3D hand contours, a set of hand gestures, including touching, swiping, and pinching, can be applied to mid-air interactions. The proposed method is much faster than the traditional depth estimation of the whole hand, and can achieve up to 500 Hz on PC, and 100 Hz on mobile devices.

15:00-17:10, Paper MoPT4.19
HIF3D: Handwriting-Inspired Features for 3D Skeleton-Based Action Recognition
Boulahia, Said Yacine	IRISA/INSA De Rennes
Anquetil, Eric	IRISA/INSA
Kulpa, Richard	INRIA/Univ. De Rennes2
Multon, Franck	INRIA/Univ. De Rennes2
Keywords: Human body motion and gesture based interaction, Gesture and Behavior Analysis, Human Computer Interaction Abstract: Action recognition based on human skeleton structure is nowadays a prosperous research field. This is mainly due to recent advances in capture technologies and skeleton extraction algorithms. In this context, we observed that 3D skeleton-based actions share several properties with handwritten symbols, since they both result from a human performance. We accordingly hypothesize that the action recognition problem can take advantage of the trial and error already carried out on handwritten patterns. Therefore, inspired by one of the most efficient and compact handwriting feature sets, we propose in this paper a skeleton descriptor referred to as Handwriting-Inspired Features (HIF3D). First, data preprocessing is applied to joint trajectories in order to handle the variability among actors' morphologies. Then we extract the HIF3D features from the processed joint locations according to a time partitioning scheme so as to additionally encode the temporal information over the sequence. Finally, we select the Support Vector Machine (SVM) for the classification step. Evaluations conducted on two challenging datasets, namely HDM05 and UTKinect, testify to the soundness of our approach, as the obtained results outperform the state-of-the-art algorithms that rely on skeleton data.

15:00-17:10, Paper MoPT4.20
Locating Human Interactions with Discriminatively Trained Deformable Pose+Motion Parts
van Gemeren, Coert	Utrecht Univ
Poppe, Ronald	Utrecht Univ
Veltkamp, Remco	Utrecht Univ
Keywords: Human body motion and gesture based interaction, Gesture and Behavior Analysis, Image and video analysis and understanding Abstract: We model dyadic (two-person) interactions by discriminatively training a spatio-temporal deformable part model of fine-grained human interactions. All interactions involve at most two persons. Our models are capable of localizing human interactions in unsegmented videos, marking the interactions of interest in space and time. Our contributions are as follows: First, we create a model that localizes human interactions in space and time. Second, our models use multiple pose and motion features per part. Third, we experiment with different ways of training our models discriminatively. When testing on the target class our models achieve a mean average precision score of 0.86. Cross dataset tests show that our models generalize well to different environments.

15:00-17:10, Paper MoPT4.21
Fast Gesture Recognition with Multiple Stream Discrete HMMs on 3D Skeletons
Borghi, Guido	Univ. of Modena and Reggio Emilia
Vezzani, Roberto	Univ. of Modena and Reggio Emilia
Cucchiara, Rita	Univ. Degli Studi Di Modena E Reggio Emilia
Keywords: Human body motion and gesture based interaction, Human Computer Interaction, Gesture and Behavior Analysis Abstract: HMMs are widely used in action and gesture recognition due to their implementation simplicity, low computational requirements, scalability and high parallelism. They achieve good performance even with a limited training set. All these characteristics are hard to find together in other, even more accurate, methods. In this paper, we propose a novel double-stage classification approach, based on Multiple Stream Discrete Hidden Markov Models (MSD-HMM) and 3D skeleton joint data, able to reach high performance while maintaining all the advantages listed above. The approach allows both to quickly classify pre-segmented gestures (offline classification), and to perform temporal segmentation on streams of gestures (online classification) faster than real time. We test our system on three public datasets, MSRAction3D, UTKinect-Action and MSRDailyAction, and on a new dataset, Kinteract Dataset, explicitly created for Human Computer Interaction (HCI). We obtain state-of-the-art performance on all of them.

15:00-17:10, Paper MoPT4.22
Localization of Skin Features on the Hand and Wrist from Small Image Patches
Stearns, Lee	Univ. of Maryland
Oh, Uran	Univ. of Maryland, Coll. Park
Cheng, Bridget J.	Cornell Univ
Findlater, Leah	Univ. of Maryland, Coll. Park
Ross, David	Atlanta VA R&D Center for Vision & Neurocognitive Rehabilitation
Chellappa, Rama	Univ. of Maryland
Froehlich, Jon E.	Univ. of Maryland, Coll. Park
Keywords: Human Computer Interaction, Biometric systems and applications, Texture and color analysis Abstract: Skin-based biometrics rely on the distinctiveness of skin patterns across individuals for identification. In this paper, we investigate whether small image patches of the skin can be localized on a user’s body, determining not “who?” but instead “where?” Applying techniques from biometrics and computer vision, we introduce a hierarchical classifier that estimates a location from the image texture and refines the estimate with keypoint matching and geometric verification. To evaluate our approach, we collected 10,198 close-up images of 17 hand and wrist locations across 30 participants. Within-person algorithmic experiments demonstrate that an individual’s own skin features can be used to localize their skin surface image patches with an F1 score of 96.5%. As secondary analyses, we assess the effects of training set size and between-person classification. We close with a discussion of the strengths and limitations of our approach and evaluation methods as well as implications for future applications using a wearable camera to support touch-based, location-specific taps and gestures on the surface of the skin.

15:00-17:10, Paper MoPT4.23
Improving Classifier Fusion Via Pool Adjacent Violators Normalization
Goswami, Gaurav	Indraprastha Inst. of Information Tech. Delhi
Ratha, Nalini	IBM Res
Singh, Richa	IIIT Delhi
Vatsa, Mayank	IIIT Delhi
Keywords: Multi-biometrics Abstract: Classifier fusion is a well studied problem in which decisions from multiple classifiers are combined at the score, rank, or decision level to obtain better results than a single classifier. Subsequently, various techniques for combining classifiers at each of these levels have been proposed in the literature. Many popular methods entail scaling and normalizing the scores obtained by each classifier to a common numerical range before combining the normalized scores using the sum rule or another classifier. In this research, we explore an alternative method to combine classifiers at the score level. The Pool Adjacent Violators (PAV) algorithm has traditionally been utilized to convert classifier match scores to confidence values that model posterior probabilities for test data. The PAV algorithm and other score normalization techniques have studied the same problem without being aware of each other. In this first ever study to combine the two, we propose the PAV algorithm for classifier fusion on publicly available NIST multi-modal biometrics score data. We observe that it provides several advantages over existing techniques and find that the interpretation learned by the PAV algorithm is more robust than the scaling learned by other popular normalization algorithms such as min-max. Moreover, the PAV algorithm enables the combined score to be interpreted as confidence and is able to further improve the results obtained by other approaches. We also observe that utilizing traditional normalization techniques first for individual classifiers and then normalizing the fused score using PAV offers a performance boost compared to only using the PAV algorithm.

15:00-17:10, Paper MoPT4.24
Spoofing Detection for Embedded Face Recognition System Using a Low Cost Stereo Camera
Tian, Guifen	Csrd, Toshiba
Keywords: Other Biometric applications, 2D/3D object detection and recognition Abstract: Spoofing detection is essential for practical face recognition systems. Based on the fact that a genuine face has special geometric curvatures across its surface, this paper brings forward an ultra-fast yet accurate spoofing detection approach using a low-cost stereo camera. To obtain curvatures, the three dimensional shapes of selected facial landmarks are analyzed by fitting the point cloud around each landmark to a specific partial face surface. Spoofing detection is then performed by evaluating the curvatures of each landmark and integrating them together. Experiments verify that the approach is able to detect spoofed faces in printed photographs, with or without various bending, at FAR equal to 0.00%. Meanwhile, genuine faces have only a small chance of being falsely rejected: FRR is 0.59% for near frontal faces and less than 5% for faces with large varying poses. Detection time is 51 milliseconds when executed on a single processor [1] running at a clock frequency of 266 MHz, which makes the detection very suitable for embedded face recognition systems.

15:00-17:10, Paper MoPT4.25
Automatic Leaf Shape Category Discovery
Olivares, Leonel	Univ. Central
Victorino, Jorge	Univ. Central
Gómez, Francisco	Univ. Nacional De Colombia
Keywords: Pattern Recognition for Bioinformatics, Classification and clustering, Segmentation, features and descriptors Abstract: Categorical description of leaf shapes is of paramount importance in agriculture and plant sciences. Traditionally, these descriptions have been based on categorical systems proposed by domain experts. Despite the importance of these visual descriptive systems, these approaches may be limited by the representation of unknown shapes as expected in exploratory domains. In this work, we propose a novel strategy to automatically discover the shape categories from a leaf dataset by using only the leaf-shape information. The proposed approach maintains high levels of visual interpretability, a major requirement for interpretation of biological data. The method is based on a complex Fourier shape representation, a low-dimensional representation of this information, and an adaptive kernel-based strategy to discover the shape categories. The proposed method was evaluated through the task of discovering shape categories from 6 different plant species for 3 different biological scenarios. Our experiments demonstrate that the proposed method is able to successfully infer the underlying shape categories presented in a leaf dataset.

15:00-17:10, Paper MoPT4.26
Towards Protecting Biometric Templates without Sacrificing Performance
Li, Jing	National Univ. of Singapore, Univ. of Science and Tech
Wong, Yongkang	National Univ. of Singapore
Sim, Terence	National Univ. of Singapore
Keywords: Security issues, Face recognition Abstract: The ideal biometric template protection scheme possesses the properties of irreversibility, revocability, unlinkability, and good performance. These properties protect the security of the biometrics system as well as users’ privacy. Practical systems, however, fall short of this ideal. In this paper, we present a novel protection scheme that achieves this ideal under the circumstance that a subject’s token and his biometric template are not concurrently exposed. Moreover, our scheme can add template protection to any face verifier. We do this by rendering virtual faces, rather than by devising new biometric features, which is the more common approach. Experimental evaluations using two public face recognition systems show that accuracy is not adversely affected with our scheme.

15:00-17:10, Paper MoPT4.27
Face Anti-Spoofing with Multifeature Videolet Aggregation
Ahmad Siddiqui, Talha	IIIT-Delhi
Bharadwaj, Samarth	IBM
Dhamecha, Tejas Indulal	IBM
Agarwal, Akshay	IIIT Delhi
Vatsa, Mayank	IIIT Delhi
Singh, Richa	IIIT Delhi
Ratha, Nalini	IBM Res
Keywords: Security issues, Face recognition, Biometric systems and applications Abstract: Biometric systems can be attacked in several ways, the most common being spoofing of the input sensor. Therefore, anti-spoofing is one of the most essential safeguards against attacks on biometric systems. Face recognition is even more vulnerable, as image capture is non-contact. Several anti-spoofing methods have been proposed in the literature for both contact and non-contact biometric modalities, often using video to study the temporal characteristics of a real vs. spoofed biometric signal. This paper presents a novel multi-feature evidence aggregation method for face spoofing detection. The proposed method fuses evidence from features encoding both texture and motion (liveness) properties in the face and in the surrounding scene regions. The feature extraction algorithms are based on a configuration of local binary patterns and motion estimation using histograms of oriented optical flow. Furthermore, the multi-feature windowed videolet aggregation of these orthogonal features, coupled with support vector machine-based classification, provides robustness to different attacks. We demonstrate the efficacy of the proposed approach by evaluating on three standard public databases: CASIA-FASD, 3DMAD and MSU-MFSD, with equal error rates of 3.14%, 0%, and 0%, respectively.

15:00-17:10, Paper MoPT4.28
Exposing Seam Carving Forgery under Recompression Attacks by Hybrid Large Feature Mining
Liu, Qingzhong	Sam Houston State Univ
Keywords: Security issues, Pattern Recognition for Surveillance and Security Abstract: While seam carving has been widely used in computer vision and multimedia processing, it is also used for image tampering. Although several methods have been proposed to detect seam carving-based forgery, to date, the detection of seam carving forgery under recompression attacks in JPEG images has not been explored. To fill this gap, we propose a hybrid large-scale feature mining-based detection method to distinguish doctored JPEG images from untouched JPEG images under recompression attacks. Over one hundred thousand features from the spatial domain and from the DCT transform domain are extracted. Ensemble learning is used to deal with the high dimensionality and to avoid the overfitting that may occur with some traditional learning classifiers. Our study demonstrates the efficacy of the proposed approach in exposing seam-carving forgery under recompression attacks, especially recompression from a lower quality level or at the same quality level.

15:00-17:10, Paper MoPT4.29
Effective 3D Based Frontalization for Unconstrained Face Recognition
Ferrari, Claudio	Univ. of Florence
Lisanti, Giuseppe	Univ. Degli Studi Di Firenze
Berretti, Stefano	Univ. of Florence
Del Bimbo, Alberto	Univ. of Florence
Keywords: Face recognition, Pattern Recognition for Surveillance and Security Abstract: In this paper, we propose a new and effective frontalization algorithm for frontal rendering of unconstrained face images, and evaluate it on face recognition. Initially, a 3D morphable model (3DMM) is fit to the image, and an interpolating function maps each pixel inside the face region of the image to the 3D model. Thus, we can render a frontal view without introducing artifacts in the final image, thanks to the exact correspondence between each pixel and the 3D coordinates of the model. The 3D model is then back-projected onto the frontalized image, allowing us to localize the image patches from which to extract the feature descriptors, thus enhancing the alignment of the same descriptor across different images. Our solution outperforms other frontalization techniques in terms of face verification. Results comparable to the state of the art on two challenging benchmark datasets are also reported, supporting our claim of effectiveness of the proposed face image representation.

15:00-17:10, Paper MoPT4.30
Radon Transform Inspired Method for Hand Gesture Recognition
Khorsandi, Mohammad Amin	Isfahan Univ. of Tech
Karimi, Nader	Isfahan Univ. of Tech
Soroushmehr, S.M. Reza	Univ. of Michigan
Hajabdollahi, Mohsen	Isfahan Univ. of Tech
Samavi, Shadrokh	McMaster Univ
Ward, Kevin	Univ. of Michigan
Najarian, Kayvan	Univ. of Michigan
Keywords: Gesture and Behavior Analysis, Image based modeling, Statistical, syntactic and structural pattern recognition Abstract: Touchless communication is a new field for commanding electronic devices. This method is especially relevant when hygiene is a concern. Automated hand gesture recognition requires processing of hand images. Many research works have tried to cope with this recognition problem. Complexity and high computational costs are important drawbacks that make real-time execution of these algorithms difficult. In this paper, a new hand gesture recognition method is proposed. To demonstrate the functionality of our method, we show how it can be used to recognize the number of fingers in segmented images. The proposed algorithm can also estimate the angles of the fingers, the direction of the hand, and the positions of the fingers. In this work, we transform an image to intercept-slope coordinates using a proposed Radon transform inspired mapping. Using this mapping, the algorithm becomes invariant to rotation, scale and position. Straight and separated fingers are extracted, and their locations and angles can be determined as well. Simplicity, robustness against rotation, scaling and position, and the absence of complex mathematical calculations are the advantages of our work.

15:00-17:10, Paper MoPT4.31
StereoTag: A Novel Stereogram-Marker-Based Approach for Augmented Reality
Nguyen, Minh	Auckland Univ. of Tech. New Zealand
Yeap, Albert (Wai)	Auckland Univ. of Tech. New Zealand
Keywords: Mixed and Augmented Reality, Stereo and multiple view geometry, 2D/3D object detection and recognition Abstract: Augmented Reality (AR) is an active and exciting topic that aims to create intuitive computer interfaces by blending reality and virtual reality. One challenge of AR is to align virtual data with the environment. Typically, one uses a marker-based approach such as a thick-bordered black and white 2D marker, which allows one to recover the relative pose (location and orientation) of a camera in real time. However, bar-code markers do not contain any intuitive visual meaning, and thus they look uninteresting and uninformative. We propose a new type of marker, referred to as a StereoTag, which embeds a meaningful stereogram image hiding 3D coded/decoded information. In our experiments, StereoTag is found to be relatively robust under various conditions and thus could be widely used in future AR applications.

15:00-17:10, Paper MoPT4.32
Sketch Simplification by Classifying Strokes
Ogawa, Toru	The Univ. of Tokyo
Matsui, Yusuke	NII
Yamasaki, Toshihiko	The Univ. of Tokyo
Aizawa, Kiyoharu	The Univ. of Tokyo
Keywords: Pattern Recognition for Art, Cultural Heritage and Entertainment, Vision for graphics, Signal, image and video processing Abstract: In this paper, we propose a novel approach to automatically creating a clean line drawing from a scribbled sketch. The main problem is determining which strokes of a scribbled sketch should be merged, and our method uses a machine learning approach to solve this problem. We create training data by comparing scribbled sketches with manually drawn line drawings. Then, we verify that our method creates clean line drawings when the training data are used as the input of the merging phase. In addition, our method includes a step to remove incorrect prediction results returned by the trained estimator. We perform tests to show that this step increases the rate of correct results, and that the line drawings created using this step are better than those created without it.

15:00-17:10, Paper MoPT4.33
Over-Atoms Accumulation Orthogonal Matching Pursuit Reconstruction Algorithm for Fish Recognition and Identification
Hsiao, YiHao	National Tsing Hua Univ
Chen, Chaur-Chin	Department of Computer Science, National Tsing Hua Univ
Keywords: Pattern Recognition for Bioinformatics, Classification and clustering, Face recognition Abstract: Fish recognition and identification in an underwater environment are important research topics. In this study, several real-world underwater videos were collected to construct a fish category database for further fish recognition and identification. Recently, compressive sensing, which uses reconstruction algorithms to reconstruct a sparse signal, has been successfully applied to face recognition. Reconstruction algorithms can be roughly categorized into two groups: basis pursuit (BP) and matching pursuit (MP). BP-related methods adopt a convex optimization technique, while MP-related methods utilize greedy search and vector projection ideas. This study reviews the concepts behind these reconstruction algorithms and analyzes their performance. Moreover, an over-atoms accumulation orthogonal matching pursuit (OAOMP) method based on OMP is proposed. OAOMP includes two procedures: picking over atoms, and accumulating the weighting coefficients of each subject to assign as new weights. OAOMP was compared with existing reconstruction algorithms in terms of reconstruction performance and run time. Experiments were conducted on a fish category database using eigenfaces and fisherfaces for feature extraction. The experimental results demonstrated that BP-related methods have better recognition rates, while MP-related methods have shorter run times. Moreover, OAOMP is able to achieve better accuracy than OMP and other MP-related methods.
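
For readers unfamiliar with the greedy search and projection steps mentioned in the abstract above, the following is a minimal sketch of plain orthogonal matching pursuit (OMP) in Python with NumPy. It is not the OAOMP method of the paper; the function name `omp`, its parameters, and the stopping tolerance are assumptions made for illustration only.

```python
import numpy as np

def omp(D, y, n_nonzero, tol=1e-10):
    """Plain OMP sketch: approximate y by D @ x with at most n_nonzero nonzeros in x.

    D is a dictionary whose columns are atoms (hypothetical naming).
    """
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []                      # indices of the atoms selected so far
    coef = np.zeros(0)
    x = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # Greedy search: pick the atom most correlated with the residual.
        correlations = np.abs(D.T @ residual)
        if support:
            correlations[support] = 0.0   # never select the same atom twice
        support.append(int(np.argmax(correlations)))
        # Projection: least-squares fit of y onto the selected atoms.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break
    x[support] = coef
    return x
```

In a sparse-representation classifier, `D` would typically hold training feature vectors as columns, and the recovered coefficients would be grouped by subject to decide the class; that step is omitted here.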

15:00-17:10, Paper MoPT4.34
An Effective Voiceprint Based Identity Authentication System for Mandarin Smartphone Users
Liu, Junhong	Peking Univ
Zou, Yuexian	Peking Univ
Huang, Yichi	Peking Univ
Keywords: Speaker recognition, Biometric systems and applications, Other Biometric applications Abstract: Voiceprint based identity authentication systems (IAS) for smartphone users are in high demand in the mobile internet era. There are some successful application cases for English smartphone users. However, to our knowledge, research outcomes for Mandarin smartphone users are few. Analysis shows that some issues remain that need to be carefully considered: (1) security: vulnerability to replay attacks; (2) user experience: zero tolerance of misreading; (3) channel mismatch: poor performance when a user changes smartphones. Taking these issues into account, this study strives to develop an effective voiceprint based IAS (termed DR-EiSV-IAS) for Mandarin smartphone users. Specifically, a content disorder degree (CDD) module implemented with DNN based digit recognition is introduced to resist replay attacks and enhance tolerance of misreading. In addition, the speaker verification is carefully designed using an enhanced i-vector technique in which the i-vector framework is combined with WCCN to compensate for channel variability. To facilitate this study, we have built a Mandarin corpus, MTDSR2015, which is the first public and free Mandarin database recorded by smartphones for text-dependent speaker recognition research. Extensive experiments have been conducted on both MTDSR2015 and RSR2015 to validate the effectiveness of the proposed DR-EiSV-IAS.

15:00-17:10, Paper MoPT4.35
Persistent Homology-Based Gait Recognition Robust to Upper Body Variations
Lamar-Leon, Javier	Advanced Tech. Application Center
Alonso-Baryolo, Raul	Advanced Tech. Application Center
Garcia, Edel	Advanced Tech. Application Center
Gonzalez-Diaz, Rocio	Univ. of Seville
Keywords: Gait recognition Abstract: Gait recognition is nowadays an important biometric technique for video surveillance tasks, due to the advantage of using it at a distance. However, when the upper body movements are unrelated to the natural dynamics of the gait, caused for example by carrying a bag or wearing a coat, the reported results show low accuracy. With the goal of solving this problem, we apply persistent homology to extract topological features from the lowest quarter of the body silhouettes. To obtain the features, we modify our previous algorithm for gait recognition to improve its efficacy and robustness to variations in the number of simplices of the gait complex. We evaluate our approach using the CASIA-B dataset, obtaining a considerable accuracy improvement of 93.8%, achieving at the same time invariance to upper body movements unrelated to the dynamics of the gait.

15:00-17:10, Paper MoPT4.36
Towards Miss Universe Automatic Prediction: The Evening Gown Competition
Carvajal, Johanna	The Univ. of Queensland
Wiliem, Arnold	The Univ. of Queensland
Sanderson, Conrad	NICTA
Lovell, Brian Carrington	The Univ. of Queensland
Keywords: Image and video analysis and understanding Abstract: Can we predict the winner of Miss Universe after watching how the contestants stride down the catwalk during the evening gown competition? Fashion gurus say they can! In our work, we study this question from the perspective of computer vision. In particular, we want to understand whether existing computer vision approaches can be used to automatically extract the qualities exhibited by the Miss Universe winners during their catwalk. This study can pave the way towards new vision-based applications for the fashion industry. To this end, we propose a novel video dataset, called the Miss Universe dataset, comprising 10 years of the evening gown competition selected from 1996 to 2010. We further propose two ranking-related problems: (1) Miss Universe Listwise Ranking and (2) Miss Universe Pairwise Ranking. In addition, we develop an approach that simultaneously addresses the two proposed problems. To describe the videos, we employ the recently proposed Stacked Fisher Vectors in conjunction with robust local spatio-temporal features. From our evaluation we found that although the addressed problems are extremely challenging, the proposed system is able to rank the winner in the top 3 best predicted scores for 5 out of 10 Miss Universe competitions.

15:00-17:10, Paper MoPT4.37
Landmark Manifold: Revisiting the Riemannian Manifold Approach for Facial Emotion Recognition
Zhao, Kun	The Univ. of Queensland
Yang, Siqi	The Univ. of Queensland
Wiliem, Arnold	The Univ. of Queensland
Lovell, Brian Carrington	The Univ. of Queensland
Keywords: Facial expression recognition, Image and video analysis and understanding Abstract: Automatically recognising facial emotions has drawn increasing attention in computer vision. Facial landmark based methods are among the most widely used approaches to perform this task. However, these approaches do not provide good performance. Thus, researchers usually tend to combine more information, such as textural and audio information, to increase the recognition rate. In this paper we propose a novel method, here called the landmark manifold, that shows it is possible to achieve competitive performance using facial landmark information alone. Through experiments on the well-known marked Cohn-Kanade extended facial emotion dataset (CK+), we show that with accurate facial landmarks, our simple approach is fast to run and can achieve performance competitive with enormously expensive methods.


MoPT5	Poster Session Hall
MoP5	Poster Session

15:00-17:10, Paper MoPT5.1
Shape-Aware Multi-Atlas Segmentation
Alvén, Jennifer	Chalmers Univ. of Tech
Kahl, Fredrik	Lund Univ
Landgren, Matilda	Lund Univ
Larsson, Viktor	Lund Univ
Ulén, Johannes	Lund Univ
Keywords: Medical image and signal analysis Abstract: Despite having no explicit shape model, multi-atlas approaches to image segmentation have proved to be top performers for several diverse datasets and imaging modalities. In this paper, we show how one can directly incorporate shape regularization into the multi-atlas framework. Unlike traditional methods, our proposed approach does not rely on label fusion at the voxel level. Instead, each registered atlas is viewed as an estimate of the position of a shape model. We evaluate and compare our method on two public benchmarks: (i) the VISCERAL Grand Challenge on multi-organ segmentation of whole-body CT images and (ii) the Hammers brain atlas of MR images for segmenting the hippocampus and the amygdala. For this wide spectrum of both easy and hard segmentation tasks, our experimental quantitative results are on par with or better than the state of the art. More importantly, we obtain qualitatively better segmentation boundaries, for instance, preserving fine structures.

15:00-17:10, Paper MoPT5.3
Information Fusion for Cocaine Dependence Recognition Using fMRI
Faria, Fabio Augusto	Federal Univ. of São Paulo
Menocci Cappabianco, Fábio Augusto	Federal Univ. of São Paulo
Li, Chiang-Shan Ray	Yale Univ
Ide, Jaime	Yale Univ
Keywords: Medical image and signal analysis, Classification and clustering, Machine learning and data mining Abstract: Cocaine dependence devastates millions of human lives. Despite a variety of treatments, there is a very high rate of individual relapse to drug use. In the last decade, functional magnetic resonance imaging (fMRI) has proved to be a powerful tool to diagnose and understand different pathologies. This work provides advances in the identification of cocaine dependence and in relapse prediction based on fMRI classification. We improve the traditional methodology in the literature, called multi-voxel pattern analysis (MVPA), which is used for feature extraction and classification. In addition, we propose new features that use specific functional connectivity measures. An extensive evaluation was conducted comparing our methodology with MVPA, as well as several learning methods with distinct feature sets. We identify the neural patterns that lead to improved classification accuracies and evaluate the advantages of employing an information fusion approach through an ensemble of classifiers. Experimental results show an improvement in final accuracy over state-of-the-art methods.

15:00-17:10, Paper MoPT5.5
EEG-Based Classification of 'Wakefulness' and 'Unconsciousness' in Sedation Using Global Spectra Principal Component (withdrawn from program)
Kim, Hwi-Jae	Korea Univ.
Seul-Ki, Yeom	Korea Univ.
Hyun-Jeong, Kim	Seoul National Univ. Dental Hospital
Kwang-Suk, Seo	Seoul National Univ. Dental Hospital
Lee, Seong-Whan	Korea Univ.

15:00-17:10, Paper MoPT5.6
A Bregman Divergence-Based Level Set Evolution for Efficient Medical Image Segmentation
Dai, Shuanglu	Stevens Inst. of Tech
Man, Hong	Stevens Inst. of Tech
Zhan, Shu	Hefei Univ. of Tech
Keywords: Medical image and signal analysis, Segmentation, features and descriptors Abstract: Fluctuations in signed distance measurement often reduce the numerical precision of level set methods (LSMs) in image segmentation. Inspired by the split Bregman method for L1-regularization problems, this paper proposes an efficient energy-based level set framework with Bregman divergence reaction to achieve stable and accurate numerical solutions. In the proposed algorithm, the level set and its signed distance function (SDF) are formulated as a constrained L1-norm optimization problem. Bregman divergence is then introduced as a new energy measurement of the level set function. By adding the reaction term for the divergence, the SDF with L1-norm constraint is then computed under an unconstrained optimization framework. Efficient numerical algorithms such as the fast Fourier transform (FFT) and Newton's method are further adopted within a unified computational framework for solving the sub-minimizations. Extensive experimental results demonstrate that the proposed level set algorithm is able to achieve competitive performance in medical image segmentation.

15:00-17:10, Paper MoPT5.8
EvaToon: A Novel Graph Matching System for Evaluating Cartoon Drawings
Wu, Yirui	Nanjing Univ
Zhou, Xianli	Nanjing Univ
Lu, Tong	State Key Lab. for Software Tech. Nanjing Univ
Guo, Mei	Nanjing Tech. Vocational Coll
Sun, Linbi	Nanjing Tech. Vocational Coll
Keywords: Graphics Recognition, Document Understanding Abstract: Imitation cartoon drawing is an important skill for cartoonists, requiring considerable practice and guidance. In this paper, we propose EvaToon, an imitated drawing evaluation system, which automatically assigns scores and marks improper drawing regions. With our system, cartoonists can practise and obtain guidance by themselves. We have cooperated with several experts in developing this evaluation system. Based on their guidance, we present EvaToon in two stages: cartoon drawing analysis and similarity evaluation. During analysis, we first locate contour pixels with high curvature as interest points and then extract multi-scale features around the interest points to describe shape hierarchically. During evaluation, we first match interest points between the original and the imitated drawing based on feature distance. After matching, we construct a regression tree to map the high-dimensional differences between matched features to scores and marks, based on a large number of manually evaluated training examples. Finally, our system matches an input imitated drawing with the original one and predicts its scores automatically. We demonstrate the accuracy of EvaToon in matching and prediction, and show the shape-description capability of our proposed features, through experiments on a collected dataset of imitated drawings.

15:00-17:10, Paper MoPT5.9
Improving PGF Retrieval Effectiveness with Active Learning
Qu, Jingwei	Peking Univ
Lu, Xiaoqing	Peking Univ
Fu, Songping	Peking Univ
Tang, Zhi	Peking Univ
Keywords: Graphics Recognition, Pattern Recognition for Search, Retrieval and Visualization, Performance Evaluation Abstract: Multimedia is playing a significant and increasing role in education, leading to a large number of electronic documents. Plane geometry figures (PGFs), as important components of these documents, are regarded as very helpful information for most retrieval systems in the field of mathematics education. However, the burdensome work of annotation has become one of the chief obstacles to improving the efficiency of retrieval systems. In this paper, we introduce an active learning-based framework to select candidate instances for training the classifiers in such retrieval systems, which are emerging non-text-based information systems. In addition, an enhanced uncertainty measure and the selection of specific features of PGFs are proposed for our active learning algorithm. Comparative experimental results indicate that the proposed method effectively improves the performance of the PGF retrieval system and reduces the annotation workload.

15:00-17:10, Paper MoPT5.10
A New Database for Online Handwritten Mongolian Word Recognition
Ma, Long-Long	Inst. of Software, Chinese Acad. of Sciences
Keywords: Handwriting Recognition Abstract: A new online handwritten Mongolian word database, MRG-OHMW, is introduced in this paper. This database contains 946 frequently used Mongolian words produced by 300 persons from the Mongolian ethnic minority. These Mongolian words are composed of one to fourteen Mongolian characters, and are selected from a large-scale Mongolian text corpus according to their frequency of usage. The current version of this database was collected using an Anoto pen on paper. The database is further annotated using a Mongolian word-level string alignment strategy. We partition the samples into training and test sets, and evaluate the database using a CNN-based recognizer as a baseline. Experimental results show that achieving higher recognition performance remains a big challenge. To our knowledge, MRG-OHMW is the first publicly available database for online handwritten Mongolian research. It provides a basic database for empirically comparing different algorithms for online handwritten Mongolian word recognition.

15:00-17:10, Paper MoPT5.11
In-Air Handwritten Chinese Character Recognition Using Discriminative Projection Based on Locality-Sensitive Sparse Representation
Qu, Xiwen	Univ. of Chinese Acad. of Sciences
Wang, Weiqiang	Univ. of Chinese Acad. of Sciences
Lv, Ke	Univ. of Chinese Acad. of Sciences
Keywords: Handwriting Recognition, Character and Text Recognition, Human Computer Interaction Abstract: Dimensionality reduction methods have been shown to be effective for handwritten Chinese character recognition (HCCR). In this paper, we propose discriminative projection based on locality-sensitive sparse representation (DPLSR) for in-air handwritten Chinese character recognition (IAHCCR). DPLSR is based on the locality-sensitive sparse representation based classifier (LSRC), which can provide closed-form solutions and maintain the data locality constraint during the sparse coding stage. In contrast to sparse representation classifier steered discriminative projection (SRC-DP), which does not consider the global structure of the data and uses all training samples as dictionary atoms, DPLSR is able to use fewer atoms and spend less training time to achieve better performance. Experiments are conducted on the IAHCC-UCAS2016 dataset built by us; the experimental results demonstrate the effectiveness of the proposed method.

15:00-17:10, Paper MoPT5.12
Novel Character Segmentation Method for Overlapped Chinese Handwriting Recognition Based on LSTM Neural Networks
Su, Tonghua	Harbin Inst. of Tech
Keywords: Handwriting Recognition, Deep learning, Artificial neural networks Abstract: Overlapped handwriting recognition is widely used to input text on smart devices, since it allows characters to be written continuously on a size-restricted screen. Segmenting the stroke sequence into characters is a crucial step before recognition. It is currently formulated as a two-class classification problem that evaluates only the relationship between a pair of adjacent strokes. To exploit long contextual dependencies, this paper recasts the problem as a sequential classification problem. First, each adjacent stroke pair is expressed as a feature vector. Second, an LSTM model is learned to encode the long contextual history information from massive data. Finally, the model is propagated forward to predict the labels once new samples are fed. Experiments are conducted on a public online Chinese handwriting database. The results show that the proposed method outperforms the traditional ones with about 10 percent improvement in terms of both specificity and precision.

15:00-17:10, Paper MoPT5.13
Approaching the Intra-Class Variability in Multi-Script Static Signature Evaluation
Diaz, Moises	Univ. De Las Palmas De Gran Canaria
Ferrer, M.A.	Univ. Las Palmas De Gran Canaria
Sabourin, Robert	École De Tech. Supérieure
Keywords: Handwriting Recognition, Forensic biometrics and its applications, Performance Evaluation Abstract: As an emerging issue, multi-script signature verification is a recent challenge for current Automatic Signature Verification (ASV) systems. Relevant differences are presented in the morphology and lexicon of the signature images written in different scripts, such as used symbols, shape of the signatures, legibility, etc. These peculiarities could reduce the success of ASV systems, especially those which were originally designed for only one kind of script. However, one common feature among scripts in ASV is the fact that the greater the number of signatures that are used for training, the better the expected performance. In this work, we propose a method inspired by observations from the neuromotor equivalence theory to artificially enlarge the signature images used to train a state-of-the-art static signature classifier. Experimental results are obtained by using three static signature datasets derived from completely different scripts: Western, Bengali and Devanagari. Our results suggest that the cognitive-inspired model, which aims to duplicate static signatures, tends toward intra-class variability of signatures written in different scripts; the model's beneficial impact is seen in signature verification tests.

15:00-17:10, Paper MoPT5.14
Writer Identification by Training on One Script but Testing on Another
Adak, Chandranath	Griffith Univ
Chaudhuri, Bidyut Baran	Indian Statistical Inst
Blumenstein, Michael	Univ. of Tech. Sydney
Keywords: Handwriting Recognition, Other applications, Forensic biometrics and its applications Abstract: This paper deals with identifying a writer from his/her offline handwriting. In a multilingual country where a writer can scribe in multiple scripts, writer identification becomes challenging when we have individual handwriting data in one script while we need to verify/identify a writer from handwriting in another script. In this paper such an issue is addressed with two scripts: English and Bengali. Here we model the task as a classification problem, where training data contains only Bengali handwritten samples and testing is performed on English handwritten texts. This work is based on the understanding that a writer has some inherent stroke characteristics that are independent of the script in which (s)he writes. In this work, some implicit structural and statistical features are extracted, and multiple classifiers are employed for writer identification. Many training sessions are run on a database of 100 writers and the performances are analyzed. We have obtained encouraging results on this database, which show the effectiveness of our method.

15:00-17:10, Paper MoPT5.15
Gaze Estimation Using EEG Signals for HCI in Augmented and Virtual Reality Headsets
Fernandez Montenegro, Juan Manuel	Kingston Univ. London
Argyriou, Vasileios	Kingston Univ. London
Keywords: Human Computer Interaction Abstract: Augmented and virtual reality have evolved significantly over the last few years, providing new ways of entertainment and interaction with the environment. Although many systems and solutions are currently available, much is still left unsettled and some technologies are missing from many VR/AR devices, such as foveated rendering and HCI. In this paper, a novel approach for coarse gaze estimation using EEG sensors is proposed, with applications in item selection for HCI or foveated rendering for VR/AR devices. The suggested method requires only a few electroencephalogram sensors that can be easily added to current virtual and augmented reality headsets. A supervised machine learning approach is suggested, utilising novel features based on quaternions that allow gaze estimation. Experiments were performed to evaluate the proposed method, and a new dataset was designed and captured. Finally, the introduced learning framework was compared with other similar techniques, further demonstrating the gain of the proposed descriptors.

15:00-17:10, Paper MoPT5.16
Continuous Low Latency Heart Rate Estimation from Painful Faces in Real Time
Rapczynski, Michal	Otto-Von-Guericke-Univ. Magdeburg
Werner, Philipp	Otto-Von-Guericke-Univ. Magdeburg
Al-Hamadi, Ayoub	IESK, Otto-Von-Guericke-Univ. Magdeburg
Keywords: Human Computer Interaction, Computer-aided detection and diagnosis, Medical image and signal analysis Abstract: Video based heart rate estimation has several advantages compared to the classical method. Current approaches use long time windows to calculate heart rates, which results in high latency and is a big disadvantage for practical use. To overcome this constraint, we propose a low latency approach for continuous frame based heart rate estimation. It is based on a combination of face tracking and skin detection, using short time windows to filter and analyze the extracted PPG signals in real time. In experiments, the presented approach performs with high accuracy (85.2%, with error <3 BPM) using a pain recognition data set including facial expressions and head movement for validation.
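
As a rough illustration of analyzing a short PPG window, the sketch below estimates a heart rate from the dominant spectral peak of one window. It is a generic textbook approach, not the pipeline of the paper above; the function name `estimate_bpm`, the 0.7 to 4.0 Hz search band, and the Hann window are assumptions chosen for illustration.

```python
import numpy as np

def estimate_bpm(ppg, fs, f_lo=0.7, f_hi=4.0):
    """Estimate heart rate (beats per minute) from one window of a PPG signal.

    ppg  : 1-D sequence of PPG samples (hypothetical input, e.g. mean skin colour per frame)
    fs   : sampling rate in Hz (e.g. the video frame rate)
    f_lo, f_hi : assumed plausible heart-rate band in Hz (42 to 240 BPM)
    """
    ppg = np.asarray(ppg, dtype=float)
    ppg = ppg - ppg.mean()                      # remove the DC offset
    windowed = ppg * np.hanning(len(ppg))       # taper to reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(ppg), d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)    # restrict the search to the band
    peak_freq = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_freq
```

The frequency resolution of this estimate is `fs / len(ppg)` Hz, which is why shorter windows trade accuracy for latency.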

15:00-17:10, Paper MoPT5.17
Temporal Dynamics of Tip Fluorescence Predict Cell Growth Behavior in Pollen Tubes
Tambo, Asongu L.	Univ. of California Riverside
Bhanu, Bir	Univ. of California
Keywords: Biological image and signal analysis, Classification and clustering, Image and video analysis and understanding Abstract: In the sexual reproductive life cycle of flowering plants, the growth of the pollen tube plays a vital role. The pollen tube grows towards the ovary of the flower where it delivers male reproductive material. This growth often involves twists and turns as the pollen tube navigates towards the ovary. Current growth models are a collection of mathematical equations to explain observable linear growth behavior in pollen tubes. However, there are few studies on the relationship between the fluorescence signal at the tip of the cell and the growth behavior (straight vs. turning). In this paper, we propose a method of extracting features from the tip fluorescence signal, which are used to distinguish between straight and turning growth behavior. The tip signal is obtained as a ratio of the average membrane-to-cytoplasm fluorescence values over time. A two-stage scheme is used to automatically detect individual growth intervals/cycles from the tip signal and split the experimental video into growth segments. In each growth segment, we extract relevant features. An initial classification uses structure-based features to distinguish between straight and turning growth cycles. The signal-based features are then used to train a Naive Bayes classifier to correct the misclassifications of the initial classification. Our results show that this two-stage process yields good classification results.

15:00-17:10, Paper MoPT5.18
Recognition Based Text Localization from Natural Scene Images
Ray, Anupama	IIT Delhi
Shah, Archit	IIT Delhi
Chaudhury, Santanu	IIT Delhi
Keywords: Other applications, Scene understanding, Machine learning and data mining Abstract: With the rapid increase of multimedia data, textual content in an image has become a very important source of information for applications such as navigation, image search and retrieval, image understanding, captioning, and machine translation. Scene text localization is the first step towards such applications, and most current methods focus on generating a small set of high precision detections rather than obtaining a large set of detections covering all text patches. In this work we propose a novel hybrid framework for text localization which uses character level recognition recursively in a feedback mechanism to refine text patches and reduce false positives. We use the popular MSER algorithm at multiple scales as an initial region proposal algorithm, followed by several filtering stages applied recursively to improve precision as well as maximize recall. We aim at achieving high recall rather than higher precision, since several robust word recognition systems are already available. These word recognition systems are mature enough to produce highly accurate results if provided with the maximum number of regions, rather than with a small set of highly precise text patches that misses several other text regions. The main contribution of this paper is the use of a character recognizer within a novel feedback mechanism to recursively search for text regions in the neighborhood of previously detected text patches. Using 3 publicly available benchmark datasets (ICDAR2011, MSRA TD-500 and OSTD), we demonstrate the efficacy of the proposed framework for text localization.

15:00-17:10, Paper MoPT5.19
Solar Flare Retrieval, Detection and Analysis
Suk, Tomáš	Inst. of Information Theory and Automation, Czech Acad. Of
Simberova, Stanislava	Astronomical Inst. Acad. of Sciences of the Czech Republi
Keywords: Signal Processing for Astronomy, Statistical, syntactic and structural pattern recognition, 2D/3D object detection and recognition Abstract: We propose a methodology for the analysis of active regions on the Sun. It is based on high-order statistical moments of the image histogram, particularly on its skewness. The methodology includes a new technique to select regions with possible flare formation. We track these areas in the temporal video sequences to search for triggers, i.e. turning point(s) at which the pre-flare phase changes to the fully developed event. The frequency analysis in the flare formation areas is based on Fourier analysis, Morlet wavelets, and principal component analysis. Results have been evaluated to obtain the periods of oscillations stimulating the flare origin.
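
For readers unfamiliar with the statistic, skewness is the third standardized moment of a distribution. The following is a minimal sketch of computing it over the pixel intensities of a region; the function name and the two toy intensity lists are illustrative assumptions, not data or code from the paper.

```python
def skewness(values):
    """Third standardized moment: m3 / m2**1.5 (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant region: skewness treated as zero by convention
    return m3 / m2 ** 1.5

# Toy intensity samples: a symmetric region and one with a single bright pixel
quiet_region = [10, 11, 12, 11, 10, 12, 11, 11]
bright_region = [10, 11, 12, 11, 10, 12, 11, 60]

quiet_skew = skewness(quiet_region)
bright_skew = skewness(bright_region)
```

A few very bright pixels pull the histogram tail to the right, so the second region has a clearly positive skewness while the symmetric one has none.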

15:00-17:10, Paper MoPT5.20
Sequential vs. Batch Machine-Learning with Evolutionary Hyperparameter Optimization for Segmenting Aortic Dissection Thrombus
Morariu, Cosmin Adrian	Univ. Duisburg-Essen
Thomas, Malte	Univ. Duisburg-Essen
Dohle, Daniel	Univ. Duisburg-Essen
Tsagakis, Konstantinos	Univ. Duisburg-Essen
Pauli, Josef	Univ. Duisburg-Essen
Keywords: Computer-aided detection and diagnosis, Medical image and signal analysis Abstract: While delineation of aortic aneurysms has been the subject of research in several publications, this represents the first contribution to address segmentation of thrombus in the case of aortic dissections. The segmentation process takes place in multiplanar reformatted slices (MPRs). In 3D CTA data, thrombus hardly differs from surrounding tissue outside the aorta. Segmentation is further complicated by the high variance of adjacent structures along the aorta in the thoracic and abdominal areas. Therefore, we propose a combination of machine learning methods and additional features for the detection of the aortic outer wall, which includes both lumen and thrombus. The optimal path is sought in each MPR in polar space based on the result of a classifier, as well as the filter response of a phase congruency filter and a distance-based component. Hyperparameters for the classifier are inferred by employing evolutionary algorithms.

15:00-17:10, Paper MoPT5.21
Quantifying Radiographic Knee Osteoarthritis Severity Using Deep Convolutional Neural Networks
Antony, Joseph	Insight Centre for Data Analytics, Dublin City Univ
McGuinness, Kevin	Insight Centre for Data Analytics, Dublin City Univ
O'Connor, Noel E	Insight Centre for Data Analytics, Dublin City Univ
Moran, Kieran	Insight Centre for Data Analytics, Dublin City Univ
Keywords: Computer-aided detection and diagnosis, Medical image and signal analysis, Biological image and signal analysis Abstract: This paper proposes a new approach to automatically quantify the severity of knee osteoarthritis (OA) from radiographs using deep convolutional neural networks (CNN). Clinically, knee OA severity is assessed using Kellgren & Lawrence (KL) grades, a five point scale. Previous work on automatically predicting KL grades from radiograph images was based on training shallow classifiers using a variety of hand engineered features. We demonstrate that classification accuracy can be significantly improved using deep convolutional neural network models pre-trained on ImageNet and fine-tuned on knee OA images. Furthermore, we argue that it is more appropriate to assess the accuracy of automatic knee OA severity predictions using a continuous distance-based evaluation metric like mean squared error than it is to use classification accuracy. This leads to the formulation of the prediction of KL grades as a regression problem and further improves accuracy. Results on a dataset of X-ray images and KL grades from the Osteoarthritis Initiative (OAI) show a sizable improvement over the current state-of-the-art.
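
The argument for a distance-based metric can be illustrated with a small sketch. The grades below are made-up values on the 0 to 4 KL scale, chosen only to show that two sets of predictions with identical classification accuracy can differ widely in mean squared error.

```python
def accuracy(true_grades, predicted_grades):
    """Fraction of predictions that match the true grade exactly."""
    hits = sum(t == p for t, p in zip(true_grades, predicted_grades))
    return hits / len(true_grades)

def mean_squared_error(true_grades, predicted_grades):
    """Average squared distance between true and predicted grades."""
    total = sum((t - p) ** 2 for t, p in zip(true_grades, predicted_grades))
    return total / len(true_grades)

truth = [0, 1, 2, 3, 4]
near_miss = [1, 2, 3, 4, 3]  # every prediction is off by exactly one grade
far_miss = [4, 4, 0, 0, 0]   # every prediction is off by two or more grades

# Accuracy cannot tell the two apart; mean squared error can.
acc_near, acc_far = accuracy(truth, near_miss), accuracy(truth, far_miss)
mse_near, mse_far = mean_squared_error(truth, near_miss), mean_squared_error(truth, far_miss)
```

Both prediction sets score zero accuracy, yet the first is clinically much closer to the truth, which the squared-error metric reflects.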


MoBT1	G.Cancun T1.A
MoPMO1	Oral Session

17:10-17:30, Paper MoBT1.1
Regularized Dynamic Boltzmann Machine with Delay Pruning for Unsupervised Learning of Temporal Sequences
Dasgupta, Sakyasingha	IBM Res. - Tokyo
Yoshizumi, Takayuki	IBM
Osogami, Takayuki	IBM Res. - Tokyo
Keywords: Artificial neural networks, Deep learning, Model selection Abstract: We introduce Delay Pruning, a simple yet powerful technique to regularize dynamic Boltzmann machines (DyBM). The recently introduced DyBM provides a particularly structured Boltzmann machine as a generative model of a multi-dimensional time-series. This Boltzmann machine can have infinitely many layers of units but allows exact inference and learning based on its biologically motivated structure. DyBM uses the idea of conduction delays in the form of fixed length first-in first-out (FIFO) queues, with a neuron connected to another via this FIFO queue, and spikes from a pre-synaptic neuron travel along the queue to the post-synaptic neuron with a constant period of delay. Here, we present Delay Pruning as a mechanism to prune the lengths of the FIFO queues (making them zero) by setting some delay lengths to one with a fixed probability, and finally selecting the best performing model with fixed delays. The unique structure of DyBM and its non-sampling based learning rule make the application of previously proposed regularization techniques like Dropout or DropConnect difficult, leading to poor generalization. First, we evaluate the performance of Delay Pruning when DyBM learns a multidimensional temporal sequence generated by a Markov chain. Finally, we show the effectiveness of Delay Pruning in learning high dimensional sequences using the moving MNIST dataset, and compare it with the Dropout and DropConnect methods.
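
The pruning step described above can be sketched as follows. This is only an illustration of "setting some delay lengths to one with a fixed probability"; the dictionary layout, the function name `prune_delays`, and the example delays are assumptions, and the training and model selection stages are not shown.

```python
import random

def prune_delays(delays, prune_prob, rng):
    """Copy `delays`, setting each delay to 1 with probability `prune_prob`."""
    return {
        connection: (1 if rng.random() < prune_prob else delay)
        for connection, delay in delays.items()
    }

# Hypothetical conduction delays for three neuron-to-neuron connections
delays = {("n0", "n1"): 5, ("n1", "n2"): 3, ("n2", "n0"): 7}
rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Draw a few pruned configurations; training and scoring each one, then
# keeping the best-performing configuration, is not shown here.
candidates = [prune_delays(delays, 0.5, rng) for _ in range(4)]
```

Each candidate keeps the network topology intact and differs only in which connections have had their delay reduced to one.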

17:30-17:50, Paper MoBT1.2
Measuring Dependency Via Intrinsic Dimensionality
Romano, Simone	The Univ. of Melbourne
Chelly, Oussama	National Inst. of Informatics
Nguyen, Xuan Vinh	Univ. of Melbourne
Bailey, James	Department of Computing and Information Systems, Univ. of M
Houle, Michael E.	National Inst. of Informatics
Attachments: Supplementary material Keywords: Dimensionality reduction and manifold learning, Machine learning and data mining Abstract: Measuring the amount of dependency among multiple variables is an important task in pattern recognition. In the last few years, many new dependency measures have been developed for the exploration of functional relationships. In this paper, we develop a dependency measure between variables based on an extreme-value theoretic treatment of intrinsic dimensionality. Our measure identifies variables with low intrinsic dimension, that is, those that support embeddings of the data within low-dimensional manifolds. To build a dependency measure on strong foundations, we theoretically prove a connection between information theory and intrinsic dimensionality theory. This also allows us to propose novel estimators of intrinsic dimensionality. Finally, we show that our dependency measure makes it possible to find patterns that cannot be found by other state-of-the-art measures on real and synthetic data.

17:50-18:10, Paper MoBT1.3
Learning Opposites Using Neural Networks
Kalra, Shivam	Univ. of Waterloo
Sriram, Aditya	Univ. of Waterloo
Rahnamayan, Shahryar	Univ. of Ontario Inst. of Tech
Tizhoosh, Hamid Reza	Univ. of Waterloo
Keywords: Artificial neural networks, Machine learning and data mining, Model selection Abstract: Many research works have successfully extended algorithms such as evolutionary algorithms, reinforcement agents and neural networks using “opposition-based learning” (OBL). Two types of the “opposites” have been defined in the literature, namely type-I and type-II. The former are linear in nature and applicable to the variable space, hence easy to calculate. On the other hand, type-II opposites capture the “oppositeness” in the output space. In fact, type-I opposites are considered a special case of type-II opposites where inputs and outputs have a linear relationship. However, in many real-world problems, inputs and outputs do in fact exhibit a nonlinear relationship. Therefore, type-II opposites are expected to be better in capturing the sense of “opposition” in terms of the input-output relation. In the absence of any knowledge about the problem at hand, there seems to be no intuitive way to calculate the type-II opposites. In this paper, we introduce an approach to learn type-II opposites from the given inputs and their outputs using the artificial neural networks (ANNs). We first perform opposition mining on the sample data, and then use the mined data to learn the relationship between input x and its opposite x. We have validated our algorithm using various benchmark functions to compare it against an evolving fuzzy inference approach that has been recently introduced. The results show the better performance of a neural approach to learn the opposites. This will create new possibilities for integrating oppositional schemes within existing algorithms promising a potential increase in convergence speed and/or accuracy.
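
To make the "easy to calculate" remark about type-I opposites concrete, the following sketch assumes the usual definition from the opposition-based learning literature, in which the opposite of a value x in the interval [a, b] is a + b - x. The function name is illustrative; type-II opposites, which the paper learns with a neural network, are not computed here.

```python
def type1_opposite(x, lower, upper):
    """Type-I opposite of x within [lower, upper]: reflect x about the midpoint."""
    if not lower <= x <= upper:
        raise ValueError("x must lie inside [lower, upper]")
    return lower + upper - x

opposite = type1_opposite(2, 0, 10)                 # reflects 2 to the far side of [0, 10]
round_trip = type1_opposite(opposite, 0, 10)        # reflecting twice returns the input
```

Because this mapping depends only on the variable bounds, it ignores the input-output relation entirely, which is the limitation that motivates type-II opposites.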

18:10-18:30, Paper MoBT1.4
A DBN-CRF for Spectral-Spatial Classification of Hyperspectral Data
Zhong, Ping	National Univ. of Defense Tech
Gong, Zhiqiang	National Univ. of Defense Tech
Schönlieb, Carola-Bibiane	Univ. of Cambridge
Keywords: Classification and clustering, Deep learning, Other applications Abstract: This work shows how to improve hyperspectral image classification by using both a deep representation and contextual information. To this end, this work proposes a new Conditional Random Field (CRF) model (named DBN-CRF) with potentials defined over deep features produced by Deep Belief Networks (DBNs). The newly formulated DBN-CRF model takes advantage of the strength of DBNs in learning a good representation and the ability of CRFs to model contextual (spatial) information in both observations and labels. Within a piecewise training framework, an efficient training method is proposed to train the whole DBN-CRF model end-to-end. This means that parameters in the DBN and the CRF can be jointly trained, and thus the proposed method can fully use the strength of both. Moreover, in the proposed training method, the end-to-end training can be implemented with a standard back-propagation algorithm, avoiding the repeated inference usually involved in CRF training, and is thus computationally efficient. Experiments on real-world hyperspectral data show that our method outperforms the most recent approaches in hyperspectral image classification.


MoBT2	G.Cancun T1.B
MoPMO2	Oral Session

17:10-17:30, Paper MoBT2.1
Distributed and Unsupervised Cost-Driven Person Re-Identification
Martinel, Niki	Univ. of Udine
Foresti, Gian Luca	Univ. of Udine
Micheloni, Christian	Univ. Degli Studi Di Udine
Keywords: Motion, tracking and video analysis, Scene understanding, Vision sensors Abstract: The problem of re-identifying persons across single disjoint camera pairs has received great attention from the community. Despite this, when the re-identification process has to be carried out on a large camera network, a different approach has to be considered. In particular, existing approaches have neglected the importance of the network topology (i.e., the structure of the monitored environment) in such a process. To fill this gap, we propose a Distributed and Unsupervised Cost-Driven Person Re-Identification framework (DUPRe) which introduces the following contributions: (i) a camera matching cost to measure the re-identification performance between nodes of the network; (ii) a derivation of the distance vector algorithm which allows learning the network topology and hence prioritizing and limiting the cameras queried for re-identification. Results on two benchmark datasets show that our solution yields significant network-wise re-identification improvements.
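
As background for contribution (ii), the following is a minimal sketch of the textbook distance vector algorithm, in which every node repeatedly updates its cost table from its neighbors' tables. It is not the paper's derivation; the camera names and link costs are hypothetical stand-ins for the camera matching cost.

```python
def distance_vector(links):
    """Compute cheapest path costs between all nodes.

    links: dict mapping (u, v) -> non-negative cost of an undirected link.
    """
    nodes = sorted({n for pair in links for n in pair})
    neighbors = {n: {} for n in nodes}
    for (u, v), cost in links.items():
        neighbors[u][v] = cost
        neighbors[v][u] = cost
    inf = float("inf")
    table = {u: {t: (0.0 if u == t else inf) for t in nodes} for u in nodes}
    changed = True
    while changed:  # repeat until no node can improve any entry
        changed = False
        for u in nodes:
            for nb, cost in neighbors[u].items():
                for t in nodes:
                    candidate = cost + table[nb][t]  # go to nb, then follow nb's table
                    if candidate < table[u][t]:
                        table[u][t] = candidate
                        changed = True
    return table

# Hypothetical matching costs between three cameras
links = {("camA", "camB"): 1.0, ("camB", "camC"): 2.0, ("camA", "camC"): 5.0}
table = distance_vector(links)
```

In this toy network the route from camA to camC through camB is cheaper than the direct link, so a node consulting its table would query camB first.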

17:30-17:50, Paper MoBT2.2
Online RGB-D Tracking Via Detection-Learning-Segmentation
An, Ning	Inst. of Automation, Chinese Acad. of Sciences
Zhao, Xiao-Guang	Inst. of Automation, Chinese Acad. of Sciences
Hou, Zeng-Guang	Inst. of Automation, Chinese Acad. of Sciences
Keywords: Motion, tracking and video analysis, 2D/3D object detection and recognition Abstract: In this paper, we address the problem of online RGB-D tracking where the target object undergoes significant appearance changes. To sufficiently exploit the color and depth cues, we propose a novel RGB-D tracking framework (DLS) that simultaneously builds the target 2D appearance model and 3D distribution model. The framework decomposes the tracking task into detection, learning and segmentation. The detection and segmentation components locate the target collaboratively by using the two target models. An adaptive depth histogram is proposed in the segmentation component to efficiently locate the target in depth frames. The learning component estimates the detection and segmentation errors, updates the target models from the most confident frames by identifying two kinds of distractors: potential failure and occlusion. Extensive experimental results on a large-scale benchmark dataset show that the proposed method performs favourably against state-of-the-art RGB-D trackers in terms of efficiency, accuracy, and robustness.

17:50-18:10, Paper MoBT2.3
Energy-Based Topological Outlier Filtering
Barath, Daniel	MTA SZTAKI
Hajder, Levente	MTA SZTAKI
Keywords: Stereo and multiple view geometry, Classification and clustering Abstract: An intuitive approach is proposed for outlier recognition among 2D point correspondences. The main novelty of the proposed method is the exploitation of feature point topology provided by Delaunay triangulation. The solution is obtained by minimizing an energy derived from neighboring correspondences in order to remove incorrectly paired points. Assuming local, approximately rigid structures, it is able to cope with nonrigid scenes. However, if the epipolar geometry is estimable, the additional information is exploited as well. The proposed method, called Delaunay Filtering, is validated on the publicly available AdelaideRMF dataset and outperforms the state-of-the-art robust model-regression techniques. It is shown that it can be applied to image pairs for which epipolar geometry-based solutions fail.

18:10-18:30, Paper MoBT2.4
Deep Motion Features for Visual Tracking
Glad, Susanna	Linköping Univ
Danelljan, Martin	Linköping Univ
Khan, Fahad Shahbaz	Linköping Univ
Felsberg, Michael	Linköping Univ
Keywords: Motion, tracking and video analysis Abstract: Robust visual tracking is a challenging computer vision problem, with many real-world applications. Most existing state-of-the-art approaches employ hand-crafted appearance features, such as HOG or Color Names. Recently, deep RGB features extracted from convolutional neural networks have been successfully applied for tracking. Despite their success, these features only capture appearance information. On the other hand, motion cues provide discriminative and complementary information that can improve tracking performance. Contrary to visual tracking, deep motion features have been successfully applied for action recognition and video classification tasks. Typically, the motion features are learned by training a CNN on optical flow images extracted from large amounts of labeled videos. This paper presents an investigation of the impact of deep motion features in a tracking-by-detection framework. We further show that hand-crafted, deep RGB, and deep motion features contain complementary information. To the best of our knowledge, we are the first to propose fusing appearance information with deep motion features for visual tracking. Comprehensive experiments clearly suggest that our fusion approach with deep motion features outperforms standard methods relying on appearance information alone.


MoBT3	Maya T2.A
MoPMO3	Oral Session

17:10-17:30, Paper MoBT3.1
Facial Depth Map Enhancement Via Neighbor Embedding
Yang, Shuai	Peking Univ
Song, Sijie	Peking Univ
Guo, Qikun	Peking Univ
Lu, Xiaoqing	Peking Univ
Liu, Jiaying	Peking Univ
Attachments: Supplementary material Keywords: Coding, compression and super-resolution, Enhancement, restoration and filtering Abstract: The simple yet subtle structures of faces make it difficult to capture the fine differences between facial regions in the depth map, especially for consumer devices like Kinect. To address this issue, we present a novel method to super-resolve and recover the facial depth map. The key idea of our approach is to exploit a learning-based method to obtain reliable face priors from high quality facial depth maps to further improve the depth image. Specifically, we utilize the neighbor embedding framework. First, faces are decomposed into components, which are used to train specialized dictionaries and are reconstructed separately. Joint features, i.e. color, depth and position cues, are put forward for robust patch similarity measurement. The neighbor embedding results form high frequency cues of facial depth details and gradients. Finally, an optimization function is defined to combine this high frequency information to yield depth maps that fit the actual face structures better. Experimental results demonstrate the superiority of our method compared to state-of-the-art techniques in recovering both synthetic data and real world data from Kinect.

17:30-17:50, Paper MoBT3.2
Single Image Super-Resolution with Non-Local Balanced Low-Rank Matrix Restoration
You, Xinge	Huazhong Univ. of Science and Tech
Xue, Weiyong	Huazhong Univ. of Science and Tech
Lei, Jiajia	Huazhong Univ. of Science and Tech
Zhang, Peng	Huazhong Univ. of Science & Tech
Cheung, Yiu-ming	Hong Kong Baptist Univ
Tang, YuanYan	Univ. of Macao
Zhou, Naiding	Huazhong Univ. of Science and Tech
Keywords: Coding, compression and super-resolution, Signal, image and video processing, Image based modeling Abstract: Single image super-resolution (SR) has gained popularity as a way to construct a high-resolution (HR) image from a single low-resolution (LR) version. More recently, non-local self similarity (NSS) has attracted enormous interest in the field of SR, and non-local means (NLM)-based methods are the classical NSS-based SR methods. However, NLM-based methods neglect the structure information in the patches and the structural similarity between patches, so they are prone to introducing unexpected details into the resultant HR images. In this paper, we propose a non-local balanced low rank matrix restoration model (NB-LRM) to improve the performance of SR, which overcomes the drawbacks of NLM-based methods and takes full advantage of the NSS prior. The proposed algorithm formulates HR image recovery as a constrained optimization problem. First, to take advantage of the local structure in the patch and the structural similarity between the non-local similar patches, we propose a similarity measure based on both Euclidean distance and Pearson distance, then reconstruct the target patch as a weighted average of the similar patches. Second, to guarantee the structural similarity and linear correlation between the target patch and similar patches, we propose a new low rank regularization term. Third, we introduce an iterative low rank regularization algorithm to solve our model. In addition, this method does not need other image priors and can produce a more robust reconstruction of local image structures. Compared with state-of-the-art SR methods, the proposed NB-LRM method achieves highly competitive PSNR and SSIM results, while demonstrating better edge and texture preservation performance.

17:50-18:10, Paper MoBT3.3
Rate Distortion Optimization Using SSIM for 3D Video Coding
Y, Harshalatha	Indian Inst. of Tech. Kharagpur
Biswas, Prabir Kumar	Indian Inst. of Tech. Kharagpur
Keywords: Coding, compression and super-resolution, Multimedia analysis, indexing and retrieval, Signal, image and video processing Abstract: Coding efficiency can be enhanced through rate-distortion optimization (RDO), which provides a trade-off between bit-rate and distortion. In this paper, we propose Structural SIMilarity (SSIM) based RDO to improve 3D video coding. The SSIM index is a quality metric that gives a better approximation to visual quality. Most of the existing literature on 3D video coding employs sum-of-squared error (SSE) as a measure of distortion, which does not always correlate with visual quality. In order to overcome this gap, SSIM-based RDO is implemented in this paper. The Lagrange multiplier is modified to obtain the optimum rate along with a reduction in distortion, which improves the perceptual quality of the video. The entropy of the macroblock (MB) is also considered in the scaling of the Lagrange multiplier to increase RDO performance. The proposed algorithm is implemented in the 3DV-ATM reference software. Experimental results show an improvement in the perceptual quality of the synthesized sequences with a bitrate reduction of 6-15%.
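
The rate-distortion trade-off can be illustrated with the standard Lagrangian cost J = D + λR. The sketch below assumes distortion is measured as 1 - SSIM, a common choice in SSIM-based RDO that is not confirmed by the abstract; the mode names, SSIM values, bit counts, and λ values are made up for the example.

```python
def rd_cost(ssim, bits, lam):
    """Lagrangian cost J = D + lam * R, with D = 1 - SSIM (assumed distortion)."""
    return (1.0 - ssim) + lam * bits

def best_mode(candidates, lam):
    """Pick the coding mode with the lowest rate-distortion cost."""
    return min(candidates, key=lambda m: rd_cost(m["ssim"], m["bits"], lam))["name"]

# Hypothetical coding modes for one macroblock
modes = [
    {"name": "skip", "ssim": 0.90, "bits": 10},
    {"name": "inter", "ssim": 0.97, "bits": 120},
    {"name": "intra", "ssim": 0.99, "bits": 400},
]

choice_low_lambda = best_mode(modes, 0.00001)  # bits are cheap: quality dominates
choice_mid_lambda = best_mode(modes, 0.0001)
choice_high_lambda = best_mode(modes, 0.001)   # bits are expensive: rate dominates
```

Scaling the Lagrange multiplier shifts the selected mode from the highest-quality option towards the cheapest one, which is the lever the abstract describes adjusting.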

18:10-18:30, Paper MoBT3.4
Accelerated Sparse Optimization for Missing Data Completion
Xu, Zheng	Univ. of Texas at Arlington
Li, Yeqing	Univ. of Texas at Arlington
Huang, Junzhou	Univ. of Texas at Arlington
Keywords: Signal, image and video processing Abstract: In this paper, we propose an algorithm for missing value recovery of visual data such as images or video. These missing values may result from corruption in the acquisition process or from user-specified unexpected outliers. This problem exists in a wide range of applications. We use nuclear norm (NN) regularization to enforce the global consistency of the image, while total variation (TV) regularization is used to encourage local consistency in the image intensity domain. This model can be applied in very challenging scenarios, where only a very small amount of data is available. However, it is very difficult to efficiently solve for these two regularizations simultaneously by convex programming due to the composite structure and non-smoothness of the problem. To this end, we propose an efficient proximal-splitting algorithm for joint NN/TV minimization. The proposed algorithm is theoretically guaranteed to achieve a convergence rate of $\mathcal{O}(1/N)$ for N iterations, which is much faster than the $\mathcal{O}(1/\sqrt{N})$ rate of the black-box first-order method for solving the non-smooth optimization problem. In our experiments, we demonstrate the superior performance of our algorithm on image completion compared with seven state-of-the-art algorithms.
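
To give a feel for the gap between the two convergence rates, the following sketch computes how many iterations each bound requires to guarantee a target error. It assumes error bounds of the form C/N and C/sqrt(N) with the same constant C, which is a simplification; the function name and values are illustrative only.

```python
import math

def iterations_needed(eps, rate, constant=1.0):
    """Smallest N for which the assumed error bound drops to eps or below."""
    if rate == "1/N":
        return math.ceil(constant / eps)         # C / N <= eps
    if rate == "1/sqrt(N)":
        return math.ceil((constant / eps) ** 2)  # C / sqrt(N) <= eps
    raise ValueError("unknown rate")

target = 0.01
fast = iterations_needed(target, "1/N")
slow = iterations_needed(target, "1/sqrt(N)")
```

For a target error of 0.01 the first bound needs on the order of a hundred iterations and the second on the order of ten thousand, so the gap widens quadratically as the target tightens.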


MoBT4	Maya T2.B
MoPMO4	Oral Session

17:10-17:30, Paper MoBT4.1
Generalized Stacking of Layerwise-Trained Deep Convolutional Neural Networks for Document Image Classification
Roy, Saikat	Jadavpur Univ
Das, Arindam	HCL Tech
Bhattacharya, Ujjwal	Indian Statistical Inst
Keywords: Document Understanding Abstract: This article presents our recent study of a lightweight Deep Convolutional Neural Network (DCNN) architecture for document image classification. Here, we concentrate on training a committee of generalized, compact and powerful base DCNNs. A support vector machine (SVM) is used to combine the outputs of the individual DCNNs. The main novelty of the present study is the introduction of supervised layerwise training of the DCNN architecture in document classification tasks for better initialization of the weights of the individual DCNNs. Each DCNN of the committee is trained on a specific part of the document or on the whole document. We also use the principle of generalized stacking for combining the normalized outputs of all the members of the DCNN committee. The proposed document classification strategy has been tested on the well-known Tobacco3482 document image dataset. Results of our experiments show that the proposed strategy, involving a considerably smaller network architecture, can produce document classification accuracies comparable to the state-of-the-art architectures, making it more suitable for use in comparatively low configuration mobile devices.

17:30-17:50, Paper MoBT4.2
Streaming News Image Summarization
Li, Hao	Univ. of Maryland
Peng, Shangfu	Univ. of Maryland
Samet, Hanan	Univ. of Maryland
Keywords: Document Understanding, Multimedia analysis, indexing and retrieval Abstract: Automatic summarization of streaming news images is critical for efficient news browsing. Although image duplicates are redundant for news reading, the number of duplicates of a news image is a good indicator of its importance. We describe the architecture used in a news aggregation system for online streaming news image summarization. Given a sequence of images for a news topic, we first cluster image duplicates based on a two-stage feature matching process, followed by representative image selection inside each cluster. Images with high importance scores are ranked chronologically to generate a timeline summarization. Our timeline summarization is not limited to a fixed size but enables users to zoom in to see more images with more details based on their interests. Experiments on real-world news data demonstrate that our method generates accurate and dynamic timeline summarizations.

17:50-18:10, Paper MoBT4.3
Towards an Automated Estimation of English Skill Via TOEIC Score Based on Reading Analysis
Augereau, Olivier	Osaka Prefecture Univ
Fujiyoshi, Hiroki	Osaka Prefecture Univ
Kise, Koichi	Graduate School of Engineering, Osaka Prefecture Univ
Keywords: Document Understanding, Human Computer Interaction, Gesture and Behavior Analysis Abstract: Automatically estimating the degree of language skill by analyzing eye movements is a promising way to help people from all over the world learn a new language. In this study, we focus on the English skills of non-native speakers. Our aim is to provide an algorithm that can accurately and automatically assess the TOEIC score after the subject has read English texts for a few minutes. As a first step in this direction, we propose an algorithm that can accurately predict this score after the subject has read some English texts and answered comprehension questions about them. We use an eye tracker to record the eye gaze, i.e. the positions the reader is looking at. Then we extract several features to characterize the behavior, and consequently the skill, of the reader. We also add a feature based on the number of correct answers to the questions. Using machine learning based on multivariate regression, the score is estimated user independently. A backward stepwise feature selection is used to select the relevant features and to optimize the estimation. As a main result, the TOEIC score is estimated with a mean absolute error of 21.7 points for 21 subjects after reading and answering the questions of only 3 documents.

18:10-18:30, Paper MoBT4.4
Segmentation of Highly Unstructured Handwritten Documents Using a Neural Network Technique
Radhakrishnan Nair, Rathin	Univ. at Buffalo
Urala K, Bhargava	Univ. at Buffalo
Nwogu, Ifeoma	Univ. at Buffalo, SUNY
Govindaraju, Venu	Univ. at Buffalo
Keywords: Document Understanding, Image and video analysis and understanding, Deep learning Abstract: In recent years there has been a growing interest in digitizing the extensive amounts of books and documents that predate the widespread adoption of digital technologies. Many of these digitizing initiatives deal with huge collections of handwritten documents, for which document image analysis techniques (page segmentation, keyword-spotting, optical character recognition (OCR), etc.) are not yet as mature as for printed text. Thus, there is an imminent need to develop techniques to understand, archive, index and search the manuscripts. The antiquated approach of manually transcribing handwritten collections and then using standard text retrieval techniques can be very expensive for large collections. Moreover, many of the manuscripts in these collections, unlike machine-printed texts, contain unstructured information (cluttered groups of text and graphics) that does not necessarily follow a pre-specified format, making them quite challenging to process automatically. In this paper we present a convolutional neural network (CNN) based implementation that is used to segment pages of handwritten documents into their constituent sections. We showcase a multiscale sliding window based network that is trained to predict the sections of the pages in handwritten manuscripts. The results of the network are post-processed with a novel region growing technique to further improve the segmentation results. The implementation is applied to the Marianne Moore archival collection, a body of handwritten notes and memos by the renowned author Marianne Moore (1887-1972), one of the foremost modernist poets of the early twentieth century. We present our segmentation results both quantitatively and qualitatively.


MoBT5	Maya T2.C
MoPMO5	Oral Session

17:10-17:30, Paper MoBT5.1
Classifying DME vs Normal SD-OCT Volumes: A Review
Massich, Joan	Univ. De Bourgogne
Rastgoo, Mojdeh	Univ. De Bourgogne
Lemaitre, Guillaume	Univ. De Bourgogne
Cheung, Carol	Singapore Eye Res. Inst
Wong, Tien Yin	Singapore Eye Res. Inst
Sidibe, Desire	Univ. De Bourgogne
Meriaudeau, Fabrice	LE2I
Keywords: Medical image and signal analysis, Computer-aided detection and diagnosis, Classification and clustering Abstract: This article reviews the current state of automatic classification methodologies for distinguishing DME from normal subjects based on SD-OCT data. Addressing this classification problem is valuable because early detection and treatment of DME play a major role in preventing adverse effects on the eye, such as blindness. The main contribution of this article is to address the lack of a public dataset and benchmark suited for classifying DME and normal SD-OCT volumes, providing our own implementation of the most relevant methodologies in the literature. Subsequently, 6 different methods were implemented and evaluated using this common benchmark and dataset to produce a reliable comparison.

17:30-17:50, Paper MoBT5.2
Wireless Capsule Endoscopy Video Summarization: A Learning Approach Based on Siamese Neural Network and Support Vector Machine
Chen, Jin	Peking Univ
Zou, Yuexian	Peking Univ
Wang, Yi	Peking Univ
Keywords: Medical image and signal analysis, Computer-aided detection and diagnosis, Image and video analysis and understanding Abstract: Wireless capsule endoscopy video summarization (WCE-VS) is in high demand for eliminating redundant, highly similar frames. Conventional WCE-VS methods extract various hand-crafted features as image representations. Research shows that such features only reflect the low-level characteristics of single frames and are not effective at capturing the semantic similarity between WCE frames. Motivated by the ability of the Siamese neural network (SNN) to map similar image pairs closer together and dissimilar image pairs further apart in the feature space, a novel learning-based WCE-VS method is proposed in this paper. Specifically, given labelled similar and dissimilar pairs of WCE frames, the SNN is trained with a contrastive loss function to extract high-level semantic features. Furthermore, for similarity judgment, to avoid the challenge of manually setting an optimal threshold as in conventional methods, we cast it as a supervised classification problem solved by a linear SVM. Extensive experiments validate the effectiveness and efficiency of our proposed method.

17:50-18:10, Paper MoBT5.3
Radon-Gabor Barcodes for Medical Image Retrieval
Nouredanesh, Mina	Univ. of Waterloo
Tizhoosh, Hamid Reza	Univ. of Waterloo
Banijamali, Ershad	Univ. of Waterloo
Tung, James	Univ. of Waterloo
Keywords: Medical image and signal analysis, Content based image retrieval and data mining, Pattern Recognition for Search, Retrieval and Visualization Abstract: In recent years, with the explosion of digital images on the Web, content-based retrieval has emerged as a significant research area. Shapes, textures, edges and segments may play a key role in describing the content of an image. Radon and Gabor transforms are both powerful techniques that have been widely studied for extracting shape- and texture-based information. The combined Radon-Gabor features may be more robust against scale/rotation variations, presence of noise, and illumination changes. The objective of this paper is to harness the potential of both Gabor and Radon transforms in order to introduce expressive binary features, called barcodes, for image annotation/tagging tasks. We propose two different techniques: Gabor-of-Radon-Image Barcodes (GRIBCs) and Guided-Radon-of-Gabor Barcodes (GRGBCs). For validation, we employ the IRMA x-ray dataset with 193 classes, containing 12,677 training images and 1,733 test images. Total error scores as low as 322 and 330 were achieved for GRGBCs and GRIBCs, respectively. This corresponds to approx. 81% retrieval accuracy for the first hit.

18:10-18:30, Paper MoBT5.4
A Temporal Deep Learning Approach for MR Perfusion Parameter Estimation in Stroke
Ho, King Chung	Univ. of California, Los Angeles
Scalzo, Fabien	UCLA, Geffen School of Medicine
Sarma, Karthik	Univ. of California, Los Angeles
El-Saden, Suzie	Univ. of California, Los Angeles
Arnold, Corey	Univ. of California - Los Angeles
Keywords: Medical image and signal analysis, Deep learning, Artificial neural networks Abstract: Perfusion magnetic resonance (MR) images are often used in the assessment of acute ischemic stroke to distinguish between salvageable tissue and infarcted core. Deconvolution methods such as singular value decomposition have been used to approximate model-based perfusion parameters from these images. However, studies have shown that these existing deconvolution algorithms can introduce distortions that may negatively influence the utility of these parameter maps. There is limited previous work on utilizing machine learning algorithms to estimate perfusion parameters. In this work, we present a novel bi-input convolutional neural network (bi-CNN) to approximate four perfusion parameters without using an explicit deconvolution method. These bi-CNNs produced good approximations for all four parameters, with relative average root-mean-square errors (ARMSEs) ≤ 5% of the maximum values. We further demonstrate the utility of the estimated perfusion maps for quantifying the salvageable tissue volume in stroke, with more than 80% agreement with the ground truth. These results show that deep learning techniques are a promising tool for perfusion parameter estimation without requiring a standard deconvolution process.

Technical Program for Monday December 5, 2016