ICPR16


We1PL	G.Cancun T1
Josien Pluim	Plenary Session

09:00-10:00, Paper We1PL.1
The Truth Is Hard to Make: Validation of Image Registration (I)
Pluim, Josien P.W.	Eindhoven Univ. of Tech
Muenzing, Sascha Eugen Albert	RWTH Aachen Univ
Eppenhof, Koen	Eindhoven Univ. of Tech
Murphy, Keelin	Univ. Coll. Cork
Keywords: Image and video analysis and understanding Abstract: An unsolved problem in medical image analysis is validation of methods. In this paper we will focus on image registration and in particular on nonlinear image registration, which is one of the hardest analysis problems to validate. The paper covers currently used methods of validation, comparative challenges and public datasets, as well as some of our own work in this area.


WeAT1	G.Cancun T1.A
WeAMO1	Oral Session

10:30-10:50, Paper WeAT1.1
Spectral Sparsification in Spectral Clustering
Chakeri, Alireza	Univ. of South Florida
Farhidzadeh, Hamidreza	Univ. of South Florida
Hall, Larry	Univ. of South Florida
Keywords: Classification and clustering, Machine learning and data mining Abstract: Graph spectral clustering algorithms have been shown to be effective in finding clusters and generally outperform traditional clustering algorithms, such as k-means. However, they have scalability issues in both memory usage and computational time. To overcome these limitations, the common approaches sparsify the similarity matrix by zeroing out some of its elements. They generally consider local neighborhood relationships between the data instances, such as the k-nearest neighbor method. Although they eventually work with the Laplacian matrix, there is no evidence about preservation of its spectral properties with approximation guarantees. As a result, in this paper, we adopt the idea of spectral sparsification to sparsify the Laplacian matrix. A spectral sparsification algorithm takes a dense graph G with n vertices and m edges (where m is usually O(n^2)), and returns a new graph H with the same set of vertices and many fewer edges, on the order of O(n log n), that approximately preserves the spectral properties of the input graph. We study the effect of the spectral sparsification method based on sampling by effective resistance on the spectral clustering outputs. Through experiments, we show that the clustering results obtained from sparsified graphs are very similar to the results of the original non-sparsified graphs.
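The sampling step named in the abstract can be illustrated with a minimal sketch of effective-resistance sampling as it is commonly described: edges are drawn with probability proportional to weight times effective resistance, then reweighted. This is not the authors' implementation. The function name, the `num_samples` parameter, and the dense pseudoinverse are assumptions made for illustration; a dense pseudoinverse is only practical for small graphs.

```python
import numpy as np

def sparsify_by_effective_resistance(W, num_samples, seed=0):
    """Sample edges of a weighted undirected graph by effective resistance.

    W is a symmetric (n, n) similarity matrix with zero diagonal.
    Returns a symmetric matrix H whose support is a subset of W's edges.
    Hypothetical helper: names and signature are illustrative only.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    # Graph Laplacian L = D - W and its pseudoinverse (dense, small graphs only).
    L = np.diag(W.sum(axis=1)) - W
    L_pinv = np.linalg.pinv(L)
    # Enumerate each undirected edge once (upper triangle, positive weight).
    iu, ju = np.triu_indices(n, k=1)
    mask = W[iu, ju] > 0
    iu, ju = iu[mask], ju[mask]
    w = W[iu, ju]
    # Effective resistance of edge (u, v): (e_u - e_v)^T L^+ (e_u - e_v).
    reff = L_pinv[iu, iu] + L_pinv[ju, ju] - 2.0 * L_pinv[iu, ju]
    # Sampling probability proportional to w_e * R_e.
    p = w * reff
    p = p / p.sum()
    rng = np.random.default_rng(seed)
    counts = rng.multinomial(num_samples, p)
    # Each draw of edge e contributes weight w_e / (num_samples * p_e).
    new_w = counts * w / (num_samples * p)
    H = np.zeros_like(W)
    H[iu, ju] = new_w
    H[ju, iu] = new_w
    return H
```

Edges that are never drawn get weight zero, so H typically has far fewer edges than W when `num_samples` is much smaller than the number of edges.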

10:50-11:10, Paper WeAT1.2
Hierarchical Learning for Large Multi-Class Network Classification
Liu, Lei	HP Inc
Keywords: Classification and clustering, Machine learning and data mining Abstract: Multi-class learning from network data is an important but challenging problem with many applications, including malware detection in computer networks, user modeling in social networks, and protein function prediction in biological networks. Despite the extensive research on large multi-class learning, there are still numerous issues that have not been sufficiently addressed, such as efficiency of model testing, interpretability of the induced models, as well as the ability to handle imbalanced classes. To overcome these challenges, there has been increasing interest in recent years in developing hierarchical learning methods for large multi-class problems. Unfortunately, none of them were designed for classification of network data. In addition, there are very few studies devoted to classification of heterogeneous networks, where the nodes may have different feature sets. In this paper, we propose a hierarchical tree learning approach for large-scale network classification tasks. Our method can seamlessly integrate both the link structure and node attribute information into a unified learning framework. To the best of our knowledge, this is the first study that automatically constructs a taxonomy structure to predict large multi-class problems for network classification. Empirical results suggest that the approach can effectively capture the relationship between classes and generate class taxonomy structures that resemble those produced by human experts. The approach is also easily parallelizable and has been implemented in a MapReduce framework.

11:10-11:30, Paper WeAT1.3
Semantic Role-Based Representations in Text Classification
Sinoara, Roberta Akemi	Univ. of São Paulo
Rossi, Rafael	Federal Univ. of Mato Grosso Do Sul
Rezende Oliveira, Solange	Univ. of São Paulo
Keywords: Classification and clustering, Machine learning and data mining Abstract: Although good results for automatic text classification can be achieved with the use of bag-of-words representation, this model is not suitable for all classification problems and richer text representations can be required. In this paper, we proposed two text representation models based on semantic role labels and analyzed them in text classification scenarios. We also evaluated the combination of bag-of-words with a semantic representation considering ensemble multi-view strategies. We explored different classification problems for two text collections and pointed out situations that require more than a bag-of-words. The experimental evaluation indicates that the combination of bag-of-words and a text representation based on semantic role labels can improve text classification accuracies.

11:30-11:50, Paper WeAT1.4
Locality in Multi-Label Classification Problems
Norov-Erdene, Batzaya	Hokkaido Univ
Kudo, Mineichi	Hokkaido Univ
Sun, Lu	Hokkaido Univ
Kimura, Keigo	Hokkaido Univ
Keywords: Classification and clustering, Machine learning and data mining Abstract: Lately, multi-label classification (MLC) problems have drawn a lot of attention in a wide range of fields including medical, web, and entertainment. The scale and the diversity of MLC problems are much larger than those of single-label classification problems. In particular, we have to face all possible combinations of labels. To solve MLC problems more efficiently, we focus on three kinds of locality hidden in a given MLC problem. In this paper, we first show how much locality exists in nine datasets, then examine how closely these localities are related to labels, and last propose a method of reducing the problem size using one kind of locality.

11:50-12:10, Paper WeAT1.5
Simultaneous Clustering and Outlier Detection Using Dominant Sets
Mequanint, Eyasu Zemene	Ca' Foscari University of Venice
Tesfaye, Yonatan Tariku	Univ. Di IUAV Venezia
Prati, Andrea	Univ. of Parma
Pelillo, Marcello	Ca' Foscari Univ
Keywords: Classification and clustering, Machine learning and data mining Abstract: We present a unified approach for simultaneous clustering and outlier detection in data. We utilize some properties of a family of quadratic optimization problems related to dominant sets, a well-known graph-theoretic notion of a cluster which generalizes the concept of a maximal clique to edge-weighted graphs. Unlike most (all) of the previous techniques, in our framework the number of clusters arises intuitively and outliers are obliterated automatically. The resulting algorithm discovers both parameters from the data. Experiments on real and large-scale synthetic datasets demonstrate the effectiveness of our approach and the utility of carrying out both clustering and outlier detection in a concurrent manner.

12:10-12:30, Paper WeAT1.6
Lambda Means Clustering: Automatic Parameter Search and Distributed Computing Implementation
Comiter, Marcus	Harvard Univ
Cha, Miriam	Harvard Univ
Kung, H. T.	Harvard Univ
Teerapittayanon, Surat	Harvard Univ
Keywords: Classification and clustering, Machine learning and data mining Abstract: Recent advances in clustering have shown that ensuring a minimum separation between cluster centroids leads to higher quality clusters compared to those found by methods that explicitly set the number of clusters to be found, such as k-means. One such algorithm is DP-means, which sets a distance parameter lambda for the minimum separation. However, without knowing either the true number of clusters or the underlying true distribution, setting lambda itself can be difficult, and poor choices in setting lambda will negatively impact cluster quality. As a general solution for finding lambda, in this paper we present lambda-means, a clustering algorithm capable of deriving an optimal value for lambda automatically. We contribute both a theoretically-motivated cluster-based version of lambda-means, as well as a faster conflict-based version of lambda-means. We demonstrate that lambda-means discovers the true underlying value of lambda asymptotically when run on datasets generated by a Dirichlet Process, and achieves competitive performance on a real world test dataset. Further, we demonstrate that when run on both parallel multicore computers and distributed cluster computers in the cloud, cluster-based lambda-means achieves near perfect speedup, and while being a more efficient algorithm, conflict-based lambda-means achieves speedups only a factor of two away from the maximum-possible.
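For context, the baseline DP-means procedure that the abstract builds on can be sketched as follows. This is only the fixed-lambda baseline as it is commonly described, not the authors' lambda-means or its automatic parameter search, which the abstract does not detail. The function name and the convention of comparing lambda against the squared Euclidean distance are assumptions of this sketch.

```python
import numpy as np

def dp_means(points, lam, max_iter=100):
    """Baseline DP-means with a fixed penalty lam (hypothetical helper).

    A point whose squared distance to every centroid exceeds lam
    opens a new cluster centred on that point.
    """
    points = np.asarray(points, dtype=float)
    centroids = [points.mean(axis=0)]          # start with one global cluster
    labels = np.zeros(len(points), dtype=int)
    for _ in range(max_iter):
        old_labels = labels.copy()
        old_k = len(centroids)
        for i, x in enumerate(points):
            d2 = [np.sum((x - c) ** 2) for c in centroids]
            j = int(np.argmin(d2))
            if d2[j] > lam:                    # too far from every centroid
                centroids.append(x.copy())
                j = len(centroids) - 1
            labels[i] = j
        # Recompute centroids, dropping clusters that ended up empty.
        used = sorted(set(labels.tolist()))
        centroids = [points[labels == j].mean(axis=0) for j in used]
        remap = {j: new for new, j in enumerate(used)}
        labels = np.array([remap[j] for j in labels.tolist()])
        if len(centroids) == old_k and np.array_equal(labels, old_labels):
            break
    return np.array(centroids), labels
```

A small lam yields many clusters and a large lam yields few, which is why the choice of lambda matters as much as the abstract suggests.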


WeAT2	G.Cancun T1.B
WeAMO2	Oral Session

10:30-10:50, Paper WeAT2.1
Geometric Verification Using Semi-2D Constraints for 3D Object Retrieval
Matsuzaki, Kohei	KDDI R&D Lab. Inc
Uchida, Yusuke	KDDI R&D Lab. Inc
Sakazawa, Shigeyuki	KDDI R&D Lab. Inc
Satoh, Shin'ichi	National Inst. of Informatics
Keywords: 2D/3D object detection and recognition, Pattern Recognition for Search, Retrieval and Visualization, Stereo and multiple view geometry Abstract: Geometric verification with epipolar geometry often results in a high score for an incorrect image pair due to ambiguity in its geometric constraints. The ambiguity is caused by a high degree of freedom in the epipolar geometry and a weak constraint from the fitting between a point and a line. In order to mitigate the ambiguity, we propose to filter geometrically inconsistent components, namely correspondences, a sample, a model, and inliers in a RANSAC-based geometric verification. For the filtering, we introduce novel semi-2D constraints whose geometric constraint is weaker than the full-2D constraint, but stronger than the pure-epipolar constraint. Additionally, an advantage of the proposed approach is that it requires only an image pair, with neither additional information nor prior learning. Experiments on a public dataset containing 3D object images show that the proposed approach improves the true positive rate when the false positive rate is low, and greatly reduces computational time for the geometric verification of both a correct image pair and an incorrect image pair.

10:50-11:10, Paper WeAT2.2
Energy Minimization of Discrete Functions with Higher-Order Potentials for Depth Map Generation
Bulatov, Dimitri	Fraunhofer IOSB
Kottler, Benedikt	Fraunhofer IOSB
Rottensteiner, Franz	IPI Hannover
Keywords: Scene understanding, Stereo and multiple view geometry, Image and video analysis and understanding Abstract: Minimization of discrete energy functions considering higher-order potentials is a challenging yet important problem. In this work, a three-step procedure will be presented and exemplified on a general problem related to dense depth map computation from multi-view configurations: achieving a joint reconstruction of structure and semantics with piecewise planarity constraints. The three steps of the procedure are binarization, quadratization, and energy minimization. While the first and the third step are accomplished using procedures based on alpha-expansion and max-flow algorithms, respectively, we propose for the quadratization step a fast and simple module to reformulate the higher-order problem as a quadratic one. This module is based on edge statistics and is particularly useful for regular graphs and for third- or fourth-order potentials.

11:10-11:30, Paper WeAT2.3
3D Reconstruction under Light Ray Distortion from Parametric Focal Cameras
Morinaka, Satoshi	Nagoya Inst. of Tech
Sakaue, Fumihiko	Nagoya Inst. of Tech
Sato, Jun	Nagoya Inst. of Tech
Ishimaru, Kazuhisa	NIPPON SOKEN Inc
Kawasaki, Naoki	NIPPON SOKEN Inc
Keywords: Stereo and multiple view geometry, 3D shape recovery Abstract: In this paper, we propose a new camera model for reconstructing 3D objects under light ray distortion caused by refractive media. The proposed method can reconstruct a 3D scene even if light rays projected into the cameras are refracted by refractive media, such as glasses and raindrops. For this objective, we represent light ray projection of multiple cameras by using a pair of planes shared by the multiple cameras in the scene. By using this model, intrinsic and extrinsic camera parameters as well as the refractive properties of the refractive media can be represented efficiently. By using the newly defined camera model, we propose a method for recovering 3D points and camera parameters with refractive properties simultaneously. The experimental results show the efficiency of the proposed camera model and reconstruction method.

11:30-11:50, Paper WeAT2.4
Epipolar Plane Image Rectification and Flat Surface Detection in Light Field
Zhu, Hao	Northwestern Pol. Univ.
Wang, Qing	Northwestern Pol. Univ.

11:50-12:10, Paper WeAT2.5
Dynamic Photometric Stereo Method Using Multi-Tap CMOS Image Sensor
Yoda, Takuya	Kyushu Univ
Nagahara, Hajime	Kyushu Univ
Taniguchi, Rin-ichiro	Kyushu Univ
Kagawa, Keiichiro	Shizuoka Univ
Yasutomi, Keita	Shizuoka Univ
Kawahito, Shoji	Shizuoka Univ
Attachments: Supplementary material Keywords: Vision sensors, Computational photography, 3D shape recovery Abstract: Photometric stereo enables the estimation of surface normals from images that were captured using different known lighting directions. The classical photometric stereo method requires at least three images to determine the normals of a given scene. This method therefore cannot be applied to a dynamic scene, because it is assumed that the scene should remain static while the required images are captured. We present a dynamic photometric stereo method to estimate the surface normals in a dynamic scene. We use a multi-tap complementary metal-oxide-semiconductor (CMOS) image sensor to capture the input images for the photometric stereo method. The image sensor can divide the electrons from the photodiode of a single pixel into different taps of exposures, and can therefore capture multiple images under different lighting conditions with almost the same timing. We implemented a prototype camera that was synchronized with a lighting system, and subsequently realized photometric stereo of a dynamic scene.
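The classical formulation the abstract refers to (at least three images under known lighting directions) can be sketched as a per-pixel least-squares problem. The Lambertian reflectance model, the absence of shadows, and all names below are assumptions of this sketch; it does not model the multi-tap sensor or the authors' dynamic capture pipeline.

```python
import numpy as np

def estimate_normals(intensities, light_dirs):
    """Classical photometric stereo under a Lambertian assumption.

    intensities: (k, n) array, one row per image, one column per pixel.
    light_dirs:  (k, 3) array of unit lighting directions, k >= 3.
    Returns (normals, albedo) with shapes (n, 3) and (n,).
    Hypothetical helper: names and signature are illustrative only.
    """
    intensities = np.asarray(intensities, dtype=float)
    light_dirs = np.asarray(light_dirs, dtype=float)
    # Lambertian model: I = L @ g, where g = albedo * normal for each pixel.
    g, *_ = np.linalg.lstsq(light_dirs, intensities, rcond=None)  # (3, n)
    albedo = np.linalg.norm(g, axis=0)
    # Guard against division by zero for completely dark pixels.
    normals = g / np.maximum(albedo, 1e-12)
    return normals.T, albedo
```

Because every pixel needs one intensity per lighting direction, the scene must look the same in all k images, which is the static-scene restriction the abstract sets out to remove.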

12:10-12:30, Paper WeAT2.9
Expression Invariant 3D Face Modeling from an RGB-D Video
Kim, Donghyun	Univ. of Southern California
Choi, Jongmoo	Univ. of Southern California
Leksut, Toy	Univ. of Southern California
Medioni, Gerard	Univ. of Southern California
Keywords: 3D shape recovery, Shape modeling and encoding, Reconstruction and camera motion estimation Abstract: We aim to reconstruct an accurate neutral 3D face model from an RGB-D video in the presence of extreme expression changes. Since each depth frame, taken by a low-cost sensor, is noisy, point clouds from multiple frames can be registered and aggregated to build an accurate 3D model. However, direct aggregation of multiple data produces erroneous results in natural interaction (e.g., talking and showing expressions). We propose to analyze facial expression from an RGB frame and neutralize the corresponding 3D point cloud if needed. We first estimate the person's expressions by fitting blend-shape coefficients using 2D facial landmarks for each frame and calculate an expression deformity (expression score). With the estimated expression score, we determine whether an input face is neutral or non-neutral. If the face is non-neutral, we proceed to neutralize the expression of the 3D point cloud in that frame. To neutralize the 3D point cloud of a face, we deform our generic 3D face model by applying the estimated blendshape coefficients, find displacement vectors from the deformed generic face to a neutral generic face, and apply the displacement vectors to the input 3D point cloud. After preprocessing frames in a video, we rank frames based on the expression scores and register the ranked frames into a single 3D model. Our system produces a neutral 3D face model in the presence of extreme expression changes even when neutral faces do not exist in the video.


WeAT3	Maya T2.A
WeAMO3	Oral Session

10:30-10:50, Paper WeAT3.1
Partial Membership Latent Dirichlet Allocation for Image Segmentation
Chen, Chao	Univ. of Missouri
Zare, Alina	Univ. of Missouri
Cobb, James	Naval Surface Warfare Center
Keywords: Segmentation, features and descriptors, Statistical, syntactic and structural pattern recognition, Machine learning and data mining Abstract: Topic models (e.g., pLSA, LDA, SLDA) have been widely used for segmenting imagery. These models are confined to crisp segmentation. Yet, there are many images in which some regions cannot be assigned a crisp label (e.g., transition regions between a foggy sky and the ground or between sand and water at a beach). In these cases, a visual word is best represented with partial memberships across multiple topics. To address this, we present a partial membership latent Dirichlet allocation (PM-LDA) model and associated parameter estimation algorithms. Experimental results on two natural image datasets and one SONAR image dataset show that PM-LDA can produce both crisp and soft semantic image segmentations; a capability existing methods do not have.

10:50-11:10, Paper WeAT3.2
SCALP: Superpixels with Contour Adherence Using Linear Path
Giraud, Rémi	Univ. of Bordeaux
Ta, Vinh-Thong	Univ. Bordeaux
Papadakis, Nicolas	CNRS
Keywords: Segmentation, features and descriptors Abstract: Superpixel decomposition methods are generally used as a pre-processing step to speed up image processing tasks. They group the pixels of an image into homogeneous regions while trying to respect existing contours. For all state-of-the-art superpixel decomposition methods, a trade-off is made between 1) computational time, 2) adherence to image contours and 3) regularity and compactness of the decomposition. In this paper, we propose a fast method to compute Superpixels with Contour Adherence using Linear Path (SCALP) in an iterative clustering framework. The distance computed when trying to associate a pixel to a superpixel during the clustering is enhanced by considering the linear path to the superpixel barycenter. The proposed framework produces regular and compact superpixels that adhere to the image contours. We provide a detailed evaluation of SCALP on the standard Berkeley Segmentation Dataset. The obtained results outperform state-of-the-art methods in terms of standard superpixel and contour detection metrics.

11:10-11:30, Paper WeAT3.3
Learning Local Descriptors by Optimizing the Keypoint-Correspondence Criterion
Markuš, Nenad	Univ. of Zagreb, Faculty of Electrical Engineering and Comp
Pandzic, Igor	Univ. of Zagreb
Ahlberg, Jörgen	Termisk Systemteknik AB
Keywords: Segmentation, features and descriptors, Artificial neural networks, Deep learning Abstract: Current best local descriptors are learned on a large dataset of matching and non-matching keypoint pairs. However, data of this kind is not always available since detailed keypoint correspondences can be hard to establish. On the other hand, we can often obtain labels for pairs of keypoint bags. For example, keypoint bags extracted from two images of the same object under different views form a matching pair, and keypoint bags extracted from images of different objects form a non-matching pair. On average, matching pairs should contain more corresponding keypoints than non-matching pairs. We describe an end-to-end differentiable architecture that enables the learning of local keypoint descriptors from such weakly-labeled data.

11:30-11:50, Paper WeAT3.4
Interactive Region Segmentation for Manga
Ito, Kota	The Univ. of Tokyo
Matsui, Yusuke	NII
Yamasaki, Toshihiko	The Univ. of Tokyo
Aizawa, Kiyoharu	The Univ. of Tokyo
Keywords: Segmentation, features and descriptors, Signal, image and video processing Abstract: Manga (Japanese comics) are popular all over the world, and are created digitally. In this paper, we propose an interactive segmentation method tailored for manga. The proposed method enables annotators to select areas in manga efficiently. Our experimental results showed that the proposed framework works better than Adobe Photoshop CC, which is the most widely used commercial image editing software.

11:50-12:10, Paper WeAT3.5
Unsupervised Learning of Supervoxel Embeddings for Video Segmentation
Khodabandeh, Mehran	Simon Fraser Univ
Srikanth, M	Simon Fraser Univ
Vahdat, Arash	Simon Fraser Univ
Mehrasa, Nazanin	Simon Fraser Univ
M. Pereira, Eduardo	INESCTEC
Satoh, Shin'ichi	National Inst. of Informatics
Mori, Greg	Simon Fraser Univ
Keywords: Segmentation, features and descriptors, Deep learning, Representation and analysis in pixel/voxel images Abstract: We present an algorithm for learning a feature representation for video segmentation. Standard video segmentation algorithms utilize similarity measurements in order to group related pixels. The contribution of our paper is an unsupervised method for learning the feature representation used for this similarity. The feature representation is defined over video supervoxels. An embedding framework learns a feature mapping for supervoxels in an unsupervised fashion such that supervoxels with similar context have similar embeddings. Based on the learned representation, we can merge similar supervoxels into spatio-temporal segments. Experimental results demonstrate the effectiveness of this learned supervoxel embedding on standard benchmark data.

12:10-12:30, Paper WeAT3.7
Precise Hand Segmentation from a Single Depth Image
Li, Minglei	Univ. of Science and Tech. of China
Sun, Lei	Microsoft Res. Asia
Huo, Qiang	Microsoft Res. Asia
Attachments: Supplementary material Keywords: Segmentation, features and descriptors, Human body motion and gesture based interaction, Image and video analysis and understanding Abstract: We propose a new approach to segmenting a hand accurately from a single depth image. Given a depth image, we extract first a rough hand region of interest (RoI) including a hand and a part of an arm. Then, the RoI is partitioned into triangles by using a constrained Delaunay triangulation (CDT) approach from which hand segmentation proposals are generated. Each segmentation proposal is evaluated by a shallow convolutional neural network (CNN) which is trained as a regression function to predict a confidence score for each proposal. Finally, the segmentation proposal with the highest confidence score is selected as our hand segmentation result. To evaluate the effectiveness of our approach, we use a set of real data containing more than 370,000 frames of hand depth images collected from 40 subjects with large variations in pose, orientation and sensing distance. Compared with segmentation results achieved by a random decision forest (RDF) based approach, our approach achieves much higher accuracy.


WeAT4	Maya T2.B
WeAMO4	Oral Session

10:30-10:50, Paper WeAT4.1
Region Based Fusion of 3D and 2D Visual Data for Cultural Heritage Objects
Frohlich, Robert	Univ. of Szeged
Kato, Zoltan	Univ. of Szeged
Tremeau, Alain	Univ. Jean Monnet
Tamas, Levente	Tech. Univ. of Cluj-Napoca
Shabo, Shadi	Univ. Lyon2
Waksman, Sylvie Yona	French National Center for Scientific Res. CNRS UMR 5138, M
Keywords: Pattern Recognition for Art, Cultural Heritage and Entertainment Abstract: A workflow is proposed for Cultural Heritage applications where the fusion of 3D and 2D visual data is needed. Using cheap, standard devices, one can produce a high-quality, color-calibrated 3D model for documentation purposes. The proposed processing workflow combines the authors' novel region-based registration method with an ICP alignment used for refining the results. It works on 3D data that does not necessarily contain intensity information, and on 2D images from a calibrated camera. These can be obtained with any of the relatively cheap devices available on the market. Contrary to typical solutions, our method needs no calibration patterns or markers. The efficiency and robustness of the proposed method have been confirmed on both large-scale synthetic data and real data acquired with various devices.

10:50-11:10, Paper WeAT4.2
Efficient Approximation of Labeling Problems with Applications to Immune Repertoire Analysis
Osmanlioglu, Yusuf	Drexel Univ
Ontanon, Santiago	Drexel Univ
Hershberg, Uri	Drexel Univ
Shokoufandeh, Ali	Drexel Univ
Keywords: Pattern Recognition for Bioinformatics, Classification and clustering Abstract: Labeling problems are finding increasing applications to optimization problems. They usually get realized into linear or quadratic optimization problems, which are inefficient for large graphs. In this paper we propose an efficient primal-dual solution, ML_PD, for a family of labeling problems. We apply this algorithm to the analysis of immune repertoires, and compare it against our baseline approach based on refinement operators. We provide a comparative evaluation both in terms of accuracy and computational efficiency with respect to the baseline model, as well as to quadratic optimization.

11:10-11:30, Paper WeAT4.3
Bag of Embedded Words Learning for Text Retrieval
Passalis, Nikolaos	Aristotle Univ. of Thessaloniki
Tefas, Anastasios	Aristotle Univ. of Thessaloniki
Keywords: Pattern Recognition for Search, Retrieval and Visualization Abstract: Word embedding models are capable of capturing the semantic content of textual words. The process of extracting a set of word embedding vectors from a text document is similar to the feature extraction step of the Bag-of-Features pipeline, which is usually used in computer vision tasks. That gives rise to the Bag-of-Embedded Words (BoEW) model. In this paper, a novel learning technique that optimizes both the word embedding and the codebook of the BoEW model towards text retrieval is proposed. The proposed method adheres to the cluster hypothesis, which states that points in the same cluster are likely to fulfill the same information need, and it is demonstrated, using two text datasets, that it can significantly increase the retrieval precision. Finally, the proposed technique uses smaller representations than the competing representation methods, which reduces both the retrieval time and the storage requirements.

11:30-11:50, Paper WeAT4.4
A Unified Framework for Semantic Matching of Architectural Floorplans
Sharma, Divya	IIT Jodhpur
Chattopadhyay, Chiranjoy	Indian Inst. of Tech. Jodhpur
Harit, Gaurav	Indian Institute of Tech. Rajasthan
Keywords: Pattern Recognition for Search, Retrieval and Visualization, Document Understanding, Graphics Recognition Abstract: An automatic lookup tool, which matches and retrieves similar floorplans from a large repository of digitized architectural floorplans can prove to be of immense help for the architects while designing new projects. In this paper, we have proposed a framework for the matching and retrieval of similar architectural floorplans under the query by example paradigm. We propose a room layout segmentation and adjacent room detection algorithm to represent layouts as an undirected graph. We have also proposed a novel graph spectral embedding feature to uniquely represent the layout of the architectural floorplan. This helps in effective and efficient matching of the room layouts. Room semantics in terms of both the room structures and room decor is used to retrieve similar floorplans from the repository. To match the semantic similarity between a pair of floorplans, we have proposed a two stage matching technique. We have validated the effectiveness of our proposed framework by performing experiments on publicly available floorplan dataset and achieved high retrieval accuracy.

11:50-12:10, Paper WeAT4.5
Person Re-Identification Using CNN Features Learned from Combination of Attributes
Matsukawa, Tetsu	Kyushu Univ
Suzuki, Einoshin	Kyushu Univ
Keywords: Pattern Recognition for Search, Retrieval and Visualization, Pattern Recognition for Surveillance and Security, Soft biometrics Abstract: This paper presents fine-tuned CNN features for person re-identification. Recently, features extracted from top layers of pre-trained Convolutional Neural Network (CNN) on a large annotated dataset, e.g., ImageNet, have been proven to be strong off-the-shelf descriptors for various recognition tasks. However, large disparity among the pre-trained task, i.e., ImageNet classification, and the target task, i.e., person image matching, limits performances of the CNN features for person re-identification. In this paper, we improve the CNN features by conducting a fine-tuning on a pedestrian attribute dataset. In addition to the classification loss for multiple pedestrian attribute labels, we propose new labels by combining different attribute labels and use them for an additional classification loss function. The combination attribute loss forces CNN to distinguish more person specific information, yielding more discriminative features. After extracting features from the learned CNN, we apply conventional metric learning on a target re-identification dataset for further increasing discriminative power. Experimental results on four challenging person re-identification datasets (VIPeR, CUHK, PRID450S and GRID) demonstrate the effectiveness of the proposed features.

12:10-12:30, Paper WeAT4.6
Polygon-Shape-Based Scale and Rotation Invariant Features for Camera-Based Document Image Retrieval
Dang, Quoc Bao	L3i Lab. Univ. of La Rochelle
Coustaty, Mickaël	Univ. of La Rochelle
Luqman, Muhammad Muzzamil	L3i Lab. Univ. of La Rochelle, France
De Cao, Tran	Can Tho Univ
Ogier, Jean-Marc	Univ. De La Rochelle
Keywords: Pattern Recognition for Search, Retrieval and Visualization, Segmentation, features and descriptors Abstract: The problem of real-time camera-based document image retrieval is addressed by computing image features adapted to this acquisition mode, i.e., features that remain highly discriminative under challenging camera capture conditions and are light to compute. In this paper, we propose new extension features to our previously proposed SRIF descriptor. The new descriptor is named PSRIF (Polygon-shape-based Scale and Rotation Invariant Features) and makes SRIF more discriminative under challenging camera capture conditions by using the least number of nearest points around the keypoint. We propose to use angles and edges of the polygon established from nearest points as additional features. To validate our extension features (PSRIF), experiments are carried out on two datasets comprising 400 heterogeneous-content complex linguistic map images (huge size, 9800 x 11768 pixel resolution) and 700 textual document images. The experimental results show that our extension features (PSRIF) improve the performance of SRIF and that PSRIF outperforms the state-of-the-art methods.


WeAT5	Maya T2.C
WeAMO5	Oral Session

10:30-10:50, Paper WeAT5.1
Deep Learning for Magnification Independent Breast Cancer Histopathology Image Classification
Bayramoglu, Neslihan	Univ. of Oulu
Kannala, Juho	Univ. of Oulu
Heikkilä, Janne	Univ. of Oulu
Keywords: Biological image and signal analysis, Computer-aided detection and diagnosis, Medical image and signal analysis Abstract: Microscopic analysis of breast tissues is necessary for a definitive diagnosis of breast cancer, which is the most common cancer among women. Pathology examination requires time-consuming scanning through tissue images under different magnification levels to find clinical assessment clues to produce correct diagnoses. Advances in digital imaging techniques offer assessment of pathology images using computer vision and machine learning methods which could automate some of the tasks in the diagnostic pathology workflow. Such automation could be beneficial to obtain fast and precise quantification, reduce observer variability, and increase objectivity. In this work, we propose to classify breast cancer histopathology images independent of their magnifications using convolutional neural networks (CNNs). We propose two different architectures: a single-task CNN is used to predict malignancy, and a multi-task CNN is used to predict both malignancy and image magnification level simultaneously. Evaluations and comparisons with previous results are carried out on the BreaKHis dataset. Experimental results show that our magnification-independent CNN approach improved the performance of the magnification-specific model. Our results on this limited set of training data are comparable with previous state-of-the-art results obtained by hand-crafted features. However, unlike previous methods, our approach has the potential to directly benefit from additional training data, and such additional data could be captured with the same or different magnification levels than previous data.

10:50-11:10, Paper WeAT5.2
A Computational Approach to Relative Aesthetics
Gattupalli, Vijetha	Arizona State Univ
Chandakkar, Parag Shridhar	Arizona State Univ
Li, Baoxin	Arizona State Univ
Keywords: Deep learning Abstract: Computational visual aesthetics has recently become an active research area. Existing state-of-the-art methods formulate this as a binary classification task where a given image is predicted to be beautiful or not. In many applications such as image retrieval and enhancement, it is more important to rank images based on their aesthetic quality than to categorize them into two classes. Furthermore, in such applications, all images may belong to the same category. Hence determining the aesthetic ranking of the images is more appropriate. To this end, we formulate a novel problem of ranking images with respect to their aesthetic quality. We construct a new dataset of image pairs with relative labels by carefully selecting images from the popular AVA dataset. Unlike in aesthetics classification, there is no single threshold which would determine the ranking order of the images across our entire dataset. We propose a deep neural network based approach that is trained on image pairs by incorporating principles from relative learning. Results show that such a relative training procedure allows our network to rank the images with higher accuracy than a state-of-the-art network trained on the same set of images using binary labels.

11:10-11:30, Paper WeAT5.3
Annotation Order Matters: Recurrent Image Annotator for Arbitrary Length Image Tagging
Jin, Jiren	Univ. of Tokyo
Nakayama, Hideki	Univ. of Tokyo
Keywords: Deep learning, Artificial neural networks, 2D/3D object detection and recognition Abstract: Automatic image annotation has been an important research topic in facilitating large scale image management and retrieval. Existing methods focus on learning image-tag correlation or correlation between tags to improve annotation accuracy. However, most of these methods evaluate their performance using top-k retrieval performance, where k is fixed. Although such a setting is convenient for comparing different methods, it is not the natural way that humans annotate images. The number of annotated tags should depend on image contents. Inspired by the recent progress in machine translation and image captioning, we propose a novel Recurrent Image Annotator (RIA) model that formulates the image annotation task as a sequence generation problem, so that RIA can natively predict the proper number of tags according to image contents. We evaluate the proposed model on various image annotation datasets. In addition to comparing our model with existing methods using the conventional top-k evaluation measures, we also provide our model as a high quality baseline for the arbitrary length image tagging task. Moreover, the results of our experiments show that the order of tags in the training phase has a great impact on the final annotation performance.

11:30-11:50, Paper WeAT5.4
Mutual Information-Based RBM Neural Networks
Peng, Kang-Hao	Univ. of Maryland
Zhang, Heng	Center for Automation Res. Univ. of Maryland, Coll
Keywords: Deep learning, Artificial neural networks, Classification and clustering Abstract: (Deep) neural networks are increasingly being used for various computer vision and pattern recognition tasks due to their strong ability to learn highly discriminative features. However, quantitative analysis of their classification ability and their design philosophies are still nebulous. In this work, we use information theory to analyze concatenated restricted Boltzmann machines (RBMs) and propose a mutual information-based RBM neural network (MI-RBM). We develop a novel pretraining algorithm to maximize the mutual information between RBMs. Extensive experimental results on various classification tasks show the effectiveness of the proposed approach.

11:50-12:10, Paper WeAT5.5
BranchyNet: Fast Inference Via Early Exiting from Deep Neural Networks
Teerapittayanon, Surat	Harvard Univ
McDanel, Bradley	Harvard Univ
Kung, H. T.	Harvard Univ
Keywords: Deep learning, Artificial neural networks, Classification and clustering Abstract: Deep neural networks are state-of-the-art methods for many learning tasks due to their ability to extract increasingly better features at each network layer. However, the improved performance of additional layers in a deep network comes at the cost of added latency and energy usage in feedforward inference. As networks continue to get deeper and larger, these costs become more prohibitive for real-time and energy-sensitive applications. To address this issue, we present BranchyNet, a novel deep network architecture that is augmented with additional side branch classifiers. The architecture allows prediction results for a large portion of test samples to exit the network early via these branches when samples can already be inferred with high confidence. BranchyNet exploits the observation that features learned at an early layer of a network may often be sufficient for the classification of many data points. For more difficult samples, which are expected less frequently, BranchyNet will use further or all network layers to provide the best likelihood of correct prediction. We study the BranchyNet architecture using several well-known networks (LeNet, AlexNet, ResNet) and datasets (MNIST, CIFAR10) and show that it can both improve accuracy and significantly reduce the inference time of the network.
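
The early-exit idea in this abstract can be illustrated with a minimal sketch. It assumes that confidence is measured by the entropy of each branch's softmax output and compared against a per-branch threshold; the abstract itself only says "high confidence", so the entropy criterion, the function names, and the threshold values are illustrative assumptions and not the authors' implementation. For simplicity the logits of all exits are passed in precomputed, whereas a real network would evaluate later layers only when needed.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_exit_predict(branch_logits, thresholds):
    """Return (predicted_class, exit_index).

    branch_logits: one logit vector per exit, ordered from earliest to final.
    thresholds: hypothetical entropy thresholds, one per exit except the last.
    A sample leaves at the first exit whose entropy is below its threshold;
    the final exit always answers.
    """
    last = len(branch_logits) - 1
    for i, logits in enumerate(branch_logits):
        probs = softmax(logits)
        if i == last or entropy(probs) < thresholds[i]:
            return probs.index(max(probs)), i


# An easy sample exits at branch 0; an ambiguous one falls through to the end.
easy = early_exit_predict([[10.0, 0.0, 0.0], [0.0, 5.0, 0.0]], [0.5])
hard = early_exit_predict([[0.1, 0.0, 0.0], [0.0, 5.0, 0.0]], [0.5])
```

Lower thresholds send more samples to the deeper layers, trading inference time for confidence.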

12:10-12:30, Paper WeAT5.6
On the Magnitude of Parameters of RBMs Being Universal Approximators
Gu, Linyan	Sun Yat-Sen Univ
Yang, Lihua	Sun Yat-Sen Univ
Keywords: Deep learning, Artificial neural networks, Machine learning and data mining Abstract: This paper concentrates on the magnitude of the parameters of restricted Boltzmann machines (RBMs) as universal approximators. It is known that when an RBM is used to compute a probability distribution with sufficiently high accuracy, the magnitude of its parameters must tend to infinity unless the probability has a positive lower bound. In this paper, for any given error and probability, we provide a bound such that there exists an RBM with parameters within this bound that computes the probability up to the given error. The bound depends on the error and the input probability.


We2PL	G.Cancun T1
Wolfgang Förstner	Plenary Session

14:00-15:00, Paper We2PL.1
Learning Building Models (I)
Förstner, Wolfgang	Univ. Bonn
Keywords: Image and video analysis and understanding Abstract: Deriving semantic 3D models of man-made environments has not yet reached the maturity that would make human interaction obsolete. Man-made environments play a central role in navigation, city planning, building management systems, disaster management or augmented reality. They are characterised by rich geometric and semantic structures. These cause conceptual problems when learning generic models or when developing automatic acquisition systems. The problems appear to be caused by (1) the incoherence of the models for signal analysis, (2) the type of interplay between discrete and continuous geometric representations, (3) the inefficiency of the interaction between crisp models, such as partonomies and taxonomies, and soft models, mostly having a probabilistic nature, and (4) the vagueness of the notions used in the envisaged application domains. The paper aims to encourage the development and learning of generative models, specifically for man-made objects, to be able to understand, reason about, and explain interpretations.


WePT1	Poster Session Hall
WeP1	Poster Session

15:00-17:10, Paper WePT1.1
Selective Unsupervised Feature Learning with Convolutional Neural Network (S-CNN)
Ghaderi, Aamir	Univ. of Texas at Arlington
Athitsos, Vassilis	Univ. of Texas at Arlington
Keywords: Deep learning, Classification and clustering, Artificial neural networks Abstract: Supervised learning requires huge amounts of labeled data. Labeling thousands or millions of samples is tedious and costly in both time and money. On the other hand, one of the important goals in visual recognition is to create features from unlabeled data. In this paper we propose a new method for training a CNN without the need for labeled instances. This algorithm for unsupervised feature learning is successful when tested on object recognition. We implement a simple algorithm that achieves accuracy similar to more sophisticated ones. It is easy to train and resistant to overfitting. We show results on the popular STL-10, CIFAR-10, and CIFAR-100 datasets, where our accuracy is competitive with other methods. Selective Convolutional Neural Network (S-CNN) is a simple and fast algorithm that introduces a new way of unsupervised feature learning and provides discriminative features which generalize well.

15:00-17:10, Paper WePT1.2
Detection of Intracranial Hypertension Using Deep Learning
Quachtran, Benjamin	Univ. of California, Los Angeles
Hamilton, Robert	Neural Analytics
Scalzo, Fabien	UCLA, Geffen School of Medicine
Keywords: Deep learning, Medical image and signal analysis, Classification and clustering Abstract: Intracranial hypertension, a disorder characterized by elevated pressure in the brain, is typically monitored in neurointensive care and diagnosed only after elevation has occurred. This reaction-based method of treatment leaves patients at higher risk of additional complications in case of misdetection. The detection of intracranial hypertension has been the subject of many recent studies in an attempt to accurately characterize the causes of hypertension, specifically examining waveform morphology. We investigate the use of deep learning, a hierarchical form of machine learning, to model the relationship between hypertension and waveform morphology, giving us the ability to accurately detect the presence of hypertension. Data from 60 patients, showing intracranial pressure levels over a half-hour time span, were used to evaluate the model. We divided each patient’s recording into average normalized beats over 30-second segments, assigning each beat a label of high (i.e. greater than 15 mmHg) or low intracranial pressure. The model was tested on predicting the presence of elevated intracranial pressure. The algorithm was found to be 92.05 ± 2.25% accurate in detecting intracranial hypertension.

15:00-17:10, Paper WePT1.3
Underwater Target Classification in Synthetic Aperture Sonar Imagery Using Deep Convolutional Neural Networks
Williams, David	NATO Science and Tech. Organization
Keywords: Deep learning, Pattern Recognition for Surveillance and Security, Artificial neural networks Abstract: Deep convolutional neural networks are used to perform underwater target classification in synthetic aperture sonar (SAS) imagery. The deep networks are learned using a massive database of real, measured sonar data collected at sea during different expeditions in various geographical locations. A novel training procedure is developed specially for the data from this new sensor modality in order to augment the amount of training data available for learning and to avoid overfitting. The deep networks learned are employed for several binary classification tasks in which different classes of objects in real sonar data are to be discriminated. The proposed deep approach consistently achieves superior performance to a traditional feature-based classifier that we had relied on previously.

15:00-17:10, Paper WePT1.4
Bad Teacher or Unruly Student: Can Deep Learning Say Something in Image Forensics Analysis?
Rota, Paolo	Vienna Univ. of Tech
Sangineto, Enver	Univ. of Trento
Conotter, Valentina	Social IT S.r.l
Pramerdorfer, Christopher	Vienna Univ. of Tech
Keywords: Deep learning, Security issues, Other applications Abstract: The pervasive availability of the Internet, coupled with the development of increasingly powerful technologies, has made digital images the primary source of visual information in today's society. However, their reliability as a true representation of reality cannot be taken for granted, because affordable and powerful graphics editing software can easily alter the original content without leaving any visual trace of the modification, which makes such images potentially dangerous. This motivates the development of technological solutions able to detect media manipulations without prior knowledge or extra information regarding the given image. At the same time, the huge amount of available data has also led to tremendous advances in data-hungry learning models, which in the last few years have proved successful in image classification. In this work we propose a deep learning approach for tampered image classification. To the best of our knowledge, this is the first attempt to use the deep learning paradigm in an image forensic scenario. In particular, we propose a new blind deep learning approach based on Convolutional Neural Networks (CNN) able to learn invisible discriminative artifacts from manipulated images that can be exploited to automatically discriminate between forged and authentic images. The proposed approach not only detects forged images but can also be extended to localize the tampered regions within the image. This method outperforms the state of the art in terms of accuracy on the CASIA TIDE v2.0 dataset. The capability of automatically crafting discriminant features can lead to surprising results, for instance detecting the image compression filters used to create the dataset. This point is also discussed in the paper.

15:00-17:10, Paper WePT1.5
Semi-Supervised Tuning from Temporal Coherence
Maltoni, Davide	Univ. of Bologna
Lomonaco, Vincenzo	Univ. of Bologna
Keywords: Deep learning, Semi-supervised learning and spectral methods, Artificial neural networks Abstract: Recent works demonstrated the usefulness of temporal coherence to regularize supervised training or to learn invariant features with deep architectures. In particular, enforcing a smooth output change while presenting temporally close frames from video sequences has proved to be an effective strategy. In this paper we prove the efficacy of temporal coherence for semi-supervised incremental tuning. We show that a deep architecture, only mildly trained in a supervised manner, can progressively improve its classification accuracy if exposed to video sequences of unlabeled data. The extent to which, in some cases, semi-supervised tuning improves classification accuracy (approaching the supervised one) is somewhat surprising. A number of control experiments pointed out the fundamental role of temporal coherence.

15:00-17:10, Paper WePT1.6
Non-Negative Multiple Matrix Factorization with Euclidean and Kullback-Leibler Mixed Divergences
Kohjima, Masahiro	NTT Corp
Matsubayashi, Tatsushi	NTT Corp
Sawada, Hiroshi	NTT
Keywords: Machine learning and data mining Abstract: In this paper, we tackle the problem of extracting latent structure and patterns from multiple datasets that consist of users' rating scores and activity logs (click, view, visit, ...) in order to understand typical user behavior. Our proposed method is based on non-negative matrix factorization, and factorizes multiple matrices simultaneously while adopting Euclidean distance and generalized KL divergence for the rating matrix and the activity matrix, respectively. We derive an optimization algorithm that offers a theoretical guarantee that it can find a locally optimal solution. Our experiments show that the proposed method outperformed existing methods when measured by mean squared error, which implies that it can extract latent structure and patterns more precisely. We also confirm that the segmentation result produced by the proposed method helps to analyze users' behavior.

15:00-17:10, Paper WePT1.7
On Combining Websensors and DTW Distance for Knn Time Series Forecasting
Marcacini, Ricardo Marcondes	Federal Univ. of Mato Grosso Do Sul
Carnevali, Julio César	Univ. Federal De Mato Grosso Do Sul
Domingos, João Domingos Ferreira Mundim	UFMS
Keywords: Machine learning and data mining Abstract: In the pattern recognition field, different approaches have been proposed to improve time series forecasting models. Among them, k-Nearest-Neighbour (kNN) with DTW (Dynamic Time Warping) distance is one of the most representative methods, due to its effectiveness, simplicity and intuitiveness. The great advantage of the DTW distance is its robustness to distortions in the time axis, achieved by allowing stretching and squeezing (time warping) of the time series, while traditional measures require a linear alignment between data points. However, like other traditional measures, the DTW distance has the limitation of focusing only on historical time series data to predict future values, thereby not considering additional external knowledge of the problem domain. In this paper, we propose an approach called TSFW (Time Series Forecasting with Websensors) that incorporates Websensors into the DTW distance to improve kNN time series forecasting. Websensors are models that represent knowledge extracted from news about the problem domain as well as the temporal evolution of this knowledge. In our proposed TSFW approach, we show that Websensors allow a more robust non-linear alignment of the time series by using similar events (extracted from news) that have occurred in both time series. Thus, distortions in the time axis among the time series can be corrected more accurately compared to the traditional technique that uses only the original values of the time series.
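
For orientation, the kNN-with-DTW baseline that this abstract builds on can be sketched as follows. This is the textbook dynamic-programming DTW with an absolute-difference local cost and a simple sliding-window kNN forecaster; it does not include the Websensor information that TSFW adds, and the function names and the windowing scheme are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic DTW between two numeric sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # A step may stretch a, stretch b, or advance both series.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def knn_forecast(history, query, horizon=1, k=1):
    """Forecast the values following `query` from its k DTW-nearest windows.

    Every window of len(query) in `history` that is followed by at least
    `horizon` values is a candidate; the forecast averages the continuations
    of the k closest candidates.
    """
    w = len(query)
    candidates = []
    for start in range(len(history) - w - horizon + 1):
        window = history[start:start + w]
        future = history[start + w:start + w + horizon]
        candidates.append((dtw_distance(query, window), future))
    if not candidates:
        raise ValueError("history is too short for this query and horizon")
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return [sum(f[h] for _, f in nearest) / len(nearest) for h in range(horizon)]


# Time warping absorbs the repeated value in the second series.
warped = dtw_distance([1, 2, 3], [1, 2, 2, 3])
forecast = knn_forecast([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3], [1, 2, 3])
```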

15:00-17:10, Paper WePT1.8
Exploiting Social and Mobility Patterns for Friendship Prediction in Location-Based Social Networks
Valverde-Rebaza, Jorge Carlos	Univ. of São Paulo
Roche, Mathieu	Cirad, TETIS & LIRMM
Poncelet, Pascal	LIRMM
de Andrade Lopes, Alneu	Univ. of São Paulo
Keywords: Machine learning and data mining, Classification and clustering Abstract: Link prediction is a "hot topic" in network analysis and has been largely used for friendship recommendation in social networks. With the increased use of location-based services, it is possible to improve the accuracy of link prediction methods by using the mobility of users. The majority of link prediction methods focus on the importance of locations for their visitors, disregarding the strength of the relationships existing between these visitors. We therefore propose three new methods for friendship prediction that efficiently combine social and mobility patterns of users in location-based social networks (LBSNs). Experiments conducted on real-world datasets demonstrate that our proposals achieve performance competitive with methods from the literature and, in most cases, outperform them. Moreover, our proposals use fewer computational resources by considerably reducing the number of irrelevant predictions, making the link prediction task more efficient and applicable to real-world applications.

15:00-17:10, Paper WePT1.9
Detecting Contextual Collective Anomalies at a Glance
Prado-Romero, Mario Alfonso	CENATAV
Gago Alonso, Andrés	Advanced Tech. Application Centre (CENATAV)
Keywords: Machine learning and data mining, Classification and clustering Abstract: Many phenomena in our world can be modeled as networks, from neurons in the human brain, computer networks and bank transactions to social interactions. Anomaly detection is an important data mining task consisting in detecting rare objects that deviate from the majority of the data. Contextual collective anomaly detection techniques can be applied to intrusion detection in computer networks, bank fraud detection, or finding people with strange behavior in social networks. In this work, a fast and intuitive algorithm to detect collective contextual anomalies is presented. Furthermore, the importance of selecting algorithms which find meaningful outliers for the application domain specialists is analyzed.

15:00-17:10, Paper WePT1.10
Dual Approximated Nuclear Norm Based Matrix Regression Via Adaptive Line Search Scheme
Luo, Lei	Nanjing Univ. of Science and Tech
Yang, Jian	Nanjing Univ. of Science and Tech
Tu, Qinghua	School of Computer Science and Engineering, Nanjing Univ. Of
Zhang, Yigong	Nanjing Univ. of Science and Tech
Keywords: Machine learning and data mining, Classification and clustering Abstract: Face recognition with partial occlusion is one of the urgent and challenging problems in pattern recognition research. Using the Alternating Direction Method of Multipliers (ADMM), the recently proposed nuclear norm based matrix regression model (NMR) has shown great potential in dealing with structural noise. However, ADMM needs to introduce an auxiliary variable and only exploits the convexity of NMR. Compared with ADMM, gradient based methods are simpler. To make use of these methods, this paper considers the Approximated NMR (ANMR) model. Utilizing the singular value shrinkage operator and the strong convexity of ANMR, the dual problem of ANMR (DANMR) is derived and a crucial result is obtained: the primal optimal solution of ANMR can be expressed as a matrix function of the dual optimal solution. Owing to the differentiability of DANMR, an adaptive line search scheme is developed to solve it. This approach combines the advantages of the accelerated gradient technique and an adaptive parameter updating strategy. Therefore, a convergence rate of O(1/N^2) can be guaranteed. Experimental results show the superiority of the proposed algorithm over some existing methods.

15:00-17:10, Paper WePT1.11
Aggregation Procedure of Gaussian Mixture Models for Additive Features
Ridi, Antonio	Univ. of Applied Sciences Western Switzerland
Gisler, Christophe	Univ. of Fribourg, Switzerland
Hennebert, Jean	Univ. of Applied Sciences Western Switzerland
Keywords: Machine learning and data mining, Classification and clustering Abstract: In this work we provide details on a new and effective approach able to generate Gaussian Mixture Models (GMMs) for the classification of aggregated time series. More specifically, our procedure can be applied to time series that are aggregated together by adding their features. The procedure takes advantage of the additive property of the Gaussians that complies with the additive property of the features. Our goal is to classify aggregated time series, i.e. we aim to identify the classes of the single time series contributing to the total. The standard approach consists in training the models using the combination of several time series coming from different classes. However, this has the drawback of being a very slow operation given the amount of data. The proposed approach, called GMMs aggregation procedure, addresses this problem. It consists of three steps: (i) modeling the independent classes, (ii) generation of the models for the class combinations and (iii) simplification of the generated models. We show the effectiveness of our approach by using time series in the context of electrical appliance consumption, where the time series are aggregated by adding the active and reactive power. Finally, we compare the proposed approach with the standard procedure.
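
The additive property mentioned in this abstract can be made concrete with a small sketch. It assumes one-dimensional features and statistically independent sources, in which case the sum of two Gaussian variables is Gaussian with added means and added variances, and the sum of two GMM-distributed variables is a GMM over all component pairs. The tuple representation and function name are illustrative assumptions; the simplification step (iii) of the described procedure is not shown.

```python
def aggregate_gmms(gmm_a, gmm_b):
    """GMM of the sum of two independent 1-D variables.

    Each GMM is a list of (weight, mean, variance) tuples. Every pair of
    components yields one component of the aggregated model: weights
    multiply, means add, and variances add.
    """
    return [
        (wa * wb, ma + mb, va + vb)
        for wa, ma, va in gmm_a
        for wb, mb, vb in gmm_b
    ]


# Hypothetical appliance models: one with two operating states, one with one.
appliance_a = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)]
appliance_b = [(1.0, 5.0, 2.0)]
combined = aggregate_gmms(appliance_a, appliance_b)
```

The number of components grows multiplicatively with each added source, which is presumably why a simplification step is needed.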

15:00-17:10, Paper WePT1.12
Learning Multi-View Strategies with Boosting for Classification
Peng, Jing	Montclair State Univ.
Aved, Alex	AFRL

15:00-17:10, Paper WePT1.13
Multiview Clustering Based on Robust and Regularized Matrix Approximation
Pu, Jiameng	Computer School, Wuhan Univ
Zhang, Qian	Beijing Samsung Telecom R&D Center
Zhang, Lefei	Department of Computing, the Hong Kong Pol. Univ
Du, Bo	School of Computer, Wuhan Univ. Wuhan 430079, China
You, Jane	The Hong Kong Pol. Univ
Keywords: Machine learning and data mining, Classification and clustering, Dimensionality reduction and manifold learning Abstract: Pattern recognition tasks such as data classification and clustering can usually be represented from the perspective of multiple views or feature spaces. The performance of classification and clustering should improve greatly if we carefully consider the discriminability of the multiple views and explore the complementary information among them. However, multiple features also bring new challenges. In the literature, many existing multiview feature learning methods treat different views equally, and thus cannot optimally exploit their complementary properties. On the other hand, matrix factorization based clustering algorithms usually adopt the conventional ℓ2-norm based squared residue minimization to measure the loss, which is easily influenced by outliers and noise from the multiple sources of input. In this paper, we propose a novel multiview data clustering algorithm based on matrix factorization to relieve the above issues. The basic idea of the proposed Robust and Regularized Matrix Approximation (RRMA) is that the observed data matrix can be approximated at low rank by a cluster centroid matrix and a cluster indicator matrix. The major contributions of our work lie in the introduction of the robust ℓ2,1-norm and ensemble manifold regularization to regularize the matrix factorization and make the model more discriminative for multiview data clustering. We properly adjust the importance of different views by assigning a set of trainable weights to the views. Moreover, we propose an efficient solution with effective updating rules to seek the locally optimal parameters. Encouraging experimental results on numerous public multiview datasets demonstrate the superiority of our model compared to some state-of-the-art methods.
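
The robustness argument for the ℓ2,1-norm can be seen in a small numerical sketch. It assumes the common convention in which the ℓ2,1-norm is the sum of the ℓ2 norms of the rows of the residual matrix (some papers sum over columns instead), and compares it with the squared ℓ2 (Frobenius) loss; the function names are illustrative.

```python
import math


def l21_norm(matrix):
    """Sum of the Euclidean norms of the rows (row convention assumed)."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in matrix)


def squared_frobenius(matrix):
    """Sum of squared entries, i.e. the usual squared-residue loss."""
    return sum(x * x for row in matrix for x in row)


# Residuals of three samples; the last row plays the role of an outlier.
residual = [[3.0, 4.0], [0.0, 0.0], [30.0, 40.0]]
robust_loss = l21_norm(residual)            # 5 + 0 + 50
squared_loss = squared_frobenius(residual)  # 25 + 0 + 2500
```

Under the squared loss the outlier contributes 100 times as much as the first sample, whereas under the ℓ2,1-norm it contributes only 10 times as much.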

15:00-17:10, Paper WePT1.14
Detecting Low-Quality Reference Time Series in Stream Recognition
Dupont, Marc	IRISA
Marteau, Pierre-Francois	Univ. De Bretagne Sud
Ghouaiel, Nehla	IRISA Laboratory, Univ. of South Brittany
Keywords: Machine learning and data mining, Classification and clustering, Gesture and Behavior Analysis Abstract: On-line supervised spotting and classification of subsequences can be performed by comparing some distance between the stream and previously learnt time series. However, learning a few incorrect time series can trigger disproportionately many false alarms. In this paper, we propose a fast technique to prune bad instances away and automatically select appropriate distance thresholds. Our main contribution is to turn the ill-defined spotting problem into a collection of single well-defined binary classification problems, by segmenting the stream and by ranking subsets of instances on those segments very quickly. We further demonstrate our technique's effectiveness on a gesture recognition application.

15:00-17:10, Paper WePT1.15
Instance Selection Using Non-Linear Sparse Modeling
Dornaika, Fadi	Univ. of the Basque Country
Kamal Aldine, Ihab	Univ. OF THE BASQUE COUNTRY
Ruichek, Yassine	Univ. De Tech. De Belfort-Montbeliard
Keywords: Machine learning and data mining, Classification and clustering, Statistical, syntactic and structural pattern recognition Abstract: Sparse Modeling Representative Selection (SMRS) has been recently introduced for selecting the most relevant examples in datasets. SMRS exploits data self-representativeness coding in order to infer a coding matrix with a block sparsity constraint. The relevance scores of samples are then derived from the estimated matrix of coefficients. Since SMRS is based on a linear model for data self-representation, it cannot always provide relevant samples. Besides, most of its selected samples can be found in dense areas of the input space. In this paper, we propose to overcome the shortcomings of the SMRS method that are related to the coding matrix estimation. We introduce two non-linear data self-representativeness coding schemes that are based on Hilbert space and column generation. Experimental evaluation is carried out on summarizing a video movie and on summarizing training image datasets used for classification tasks. These experiments demonstrate that the proposed non-linear methods can outperform state-of-the-art selection methods, including the SMRS method.

15:00-17:10, Paper WePT1.16
Constrained Dominant Sets for Retrieval
Mequanint, Eyasu Zemene	Ca'Foscari University of Venice
Alemu, Leulseged Tesfaye	Ca'Foscari Univ. of Venice
Pelillo, Marcello	Ca' Foscari Univ
Keywords: Machine learning and data mining, Content based image retrieval and data mining Abstract: Learning new global relations based on an initial affinity of the database objects has shown significant improvements in similarity retrieval. The locally constrained diffusion process is one of the recent effective tools for learning the intrinsic manifold structure of given data. Existing methods, which constrain the diffusion process locally, have several problems: they require manual choice of the optimal local neighborhood size, do not allow for intrinsic relations among the neighbors, and fix the initialization vector used to extract dense neighbors, all of which negatively affect the affinity propagation. We propose a new approach that alleviates these issues, based on some properties of a family of quadratic optimization problems related to dominant sets, a well-known graph theoretic notion of a cluster which generalizes the concept of a maximal clique to edge-weighted graphs. In particular, we show that by properly controlling a regularization parameter which determines the structure and the scale of the underlying problem, we are in a position to extract a dominant set cluster which is constrained to contain a user-provided query. Experimental results on standard benchmark datasets show the effectiveness of the proposed approach.

15:00-17:10, Paper WePT1.17
Hyperparameter Tuning for Big Data Using Bayesian Optimisation
Theckel Joy, Tinu	Deakin Univ
Rana, Santu	Deakin Univ
Gupta, Sunil Kumar	Deakin Univ
Venkatesh, Svetha	Deakin Univ
Keywords: Machine learning and data mining, Model selection, Deep learning Abstract: Hyperparameters play a crucial role in the model selection of machine learning algorithms. Tuning these hyperparameters can be exhausting when the data is large. Bayesian optimisation has emerged as an efficient tool for hyperparameter tuning of machine learning algorithms. In this paper, we propose a novel framework for tuning the hyperparameters for big data using Bayesian optimisation. We divide the big data into chunks and generate hyperparameter configurations for the chunks using standard Bayesian optimisation. We utilise the information from the chunks for the hyperparameter tuning for the big data using a transfer learning setting. We evaluate the performance of the proposed method on the task of tuning hyperparameters of two machine learning algorithms. We show that our method achieves the best available hyperparameter configuration in less computational time than the state-of-the-art hyperparameter tuning methods.

15:00-17:10, Paper WePT1.18
Bayesian Regression Selecting Valuable Subset from Mixed Bag Training Data
Katsuki, Takayuki	IBM Res. - Tokyo
Inoue, Masato	Waseda Univ
Attachments: Supplementary material Keywords: Machine learning and data mining, Other applications Abstract: This paper addresses a problem in which we learn a regression model from sets of training data. Each of the sets has only a single label, and only one of the training data in the set reflects the label. This is particularly the case when the label is attached to a group of data, such as time-series data. The label is not attached to a point of the sequence but rather to a particular time window of the sequence. As such, a small part of the time window likely reflects the label, whereas the other, larger part of the time window likely does not reflect it. We design an algorithm for estimating which of the training data in each of the sets corresponds to the label, as well as for training the regression model on the basis of Bayesian modeling and posterior inference with variational Bayes. Our experimental results show that our approach performs better than baseline methods on an artificial dataset and on a real-world dataset.

15:00-17:10, Paper WePT1.19
Quantile Regression of Interval-Valued Data
Fagundes, Roberta	Univ. De Pernambuco
Souza, Renata	Univ. Federal De Pernambuco
Soares, Yanne	Univ. De Pernambuco
Keywords: Machine learning and data mining, Performance Evaluation, Model selection Abstract: Linear regression is a standard statistical method widely used for prediction. It focuses on modeling the mean of the target variable without accounting for all the distributional properties of this variable. In contrast, the quantile regression model facilitates the analysis of the full distributional properties, since it allows different quantiles of the target variable to be modeled. This paper proposes a quantile regression model for interval data. In this model, each interval variable of the input data is represented by its range and center, and a smooth function between two vectors composed of interval variables is defined. In order to test the usefulness of the proposed model, a simulation study is undertaken and an application using an interval data set on the scientific production of institutions from Brazil is performed. The quality of the interval prediction obtained by the proposed model is assessed by the mean magnitude of relative error calculated from test data.
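
As an illustration of the quantities named in this abstract, the following is a minimal sketch, not the authors' model. It assumes the standard pinball (quantile) loss, takes the range of an interval to be its full width (upper minus lower), and uses the usual definition of the mean magnitude of relative error. The function names are hypothetical.

```python
def to_center_range(lower, upper):
    """Represent the interval [lower, upper] by its center and range.

    Assumption: "range" means the full width, upper - lower.
    """
    return (lower + upper) / 2.0, upper - lower


def pinball_loss(y_true, y_pred, tau):
    """Average pinball (quantile) loss at quantile level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def mmre(y_true, y_pred):
    """Mean magnitude of relative error: mean of |y - y_hat| / |y|."""
    return sum(abs(y - p) / abs(y) for y, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    center, width = to_center_range(2.0, 6.0)
    print(center, width)
    print(pinball_loss([1.0, 3.0], [2.0, 2.0], tau=0.5))
    print(mmre([10.0, 20.0], [9.0, 22.0]))
```

In such a setup one could fit one quantile model to the centers and another to the ranges, then rebuild the predicted interval from the two outputs; whether the paper proceeds this way is not stated in the abstract.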

15:00-17:10, Paper WePT1.20
Reinforcement Learning Via Recurrent Convolutional Neural Networks
Shankar, Tanmay	Indian Inst. of Tech. Guwahati
Dwivedy, Santosha Kumar	Indian Inst. of Tech. Guwahati
Guha, Prithwijit	Department of EEE, IIT Guwahati
Keywords: Reinforcement learning and temporal models, Deep learning, Artificial neural networks Abstract: Deep Reinforcement Learning has enabled the learning of policies for complex tasks in partially observable environments, without explicitly learning the underlying model of the tasks. While such model-free methods do achieve considerable performance, they often ignore the structure of the task. We present a more natural representation of the solutions to Reinforcement Learning (RL) problems, within three Recurrent Convolutional Neural Network (RCNN) architectures, to better exploit this inherent structure. The forward passes of each RCNN execute an efficient Value Iteration, propagate beliefs of state in partially observable environments, and choose optimal actions, respectively. Applying back-propagation to these RCNNs allows the system to explicitly learn the Transition Model and Reward Function associated with the underlying MDP, serving as an elegant alternative to classical model-based RL. We evaluate the proposed algorithms in simulation, considering a robot planning problem. We demonstrate the capability of our framework to reduce the cost of re-planning, learn accurate MDP models, and finally replan with learned models to achieve near-optimal policies.
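
For readers unfamiliar with Value Iteration, the following is a minimal sketch of the classical tabular algorithm on a tiny hand-made MDP. It is not the RCNN formulation of the paper; the three-state chain, the reward values, and all function names are assumptions chosen for illustration.

```python
def value_iteration(transitions, rewards, gamma=0.9, tol=1e-8, max_iters=1000):
    """Classical tabular value iteration.

    transitions[s][a] is a list of (probability, next_state) pairs and
    rewards[s][a] is the immediate reward for taking action a in state s.
    """
    n_states = len(transitions)

    def q_values(s, values):
        # One-step lookahead: immediate reward plus discounted expected value.
        return [
            rewards[s][a] + gamma * sum(p * values[s2] for p, s2 in transitions[s][a])
            for a in range(len(transitions[s]))
        ]

    values = [0.0] * n_states
    for _ in range(max_iters):
        new_values = [max(q_values(s, values)) for s in range(n_states)]
        delta = max(abs(new - old) for new, old in zip(new_values, values))
        values = new_values
        if delta < tol:
            break

    # Greedy policy with respect to the converged values (ties pick the first action).
    policy = []
    for s in range(n_states):
        q = q_values(s, values)
        policy.append(q.index(max(q)))
    return values, policy


# Hypothetical 3-state chain: action 0 moves left, action 1 moves right.
# State 2 is absorbing; moving right from state 1 into state 2 pays reward 1.
TRANSITIONS = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 0)], [(1.0, 2)]],
    [[(1.0, 2)], [(1.0, 2)]],
]
REWARDS = [
    [0.0, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],
]

if __name__ == "__main__":
    print(value_iteration(TRANSITIONS, REWARDS))
```

Each sweep applies the same local update to every state, which is the kind of regular structure that makes a convolutional, recurrent implementation plausible on grid-like state spaces.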

15:00-17:10, Paper WePT1.21
A Distance-Based Shape Descriptor Invariant to Similitude and Its Application to Shape Classification
Presles, Benoit	Univ. Coll. London
Debayle, Johan	Ec. Nationale Supérieure Des Mines De Saint-Etienne
Keywords: Representation and analysis in pixel/voxel images, Classification and clustering Abstract: Pattern recognition usually requires describing or representing shapes with some features, called shape descriptors. A shape descriptor generally needs to be invariant to some geometrical transformations (translation, rotation, scaling...). In addition, it has to be robust against slight deformations of the shape or damage to it by noise. In this paper, a novel shape descriptor based on distances and invariant to similitude transformation is proposed. A metric associated with the proposed descriptor is then introduced to measure the dissimilarity between shapes. Performance tests are conducted on the Kimia and MPEG7 image databases to evaluate the quality of the proposed descriptor. More specifically, the proposed method shows a better performance for shape classification in comparison to some methods from the literature.

15:00-17:10, Paper WePT1.22
A Soft-Labeled Self-Training Approach
Mey, Alexander	TU Delft
Loog, Marco	Delft Univ. of Tech. / Univ. of Copenhagen
Keywords: Semi-supervised learning and spectral methods Abstract: Semi-supervised classification methods try to improve a classifier learned in a supervised manner with the help of unlabeled data. In many cases one assumes a certain structure on the data, such as the manifold assumption, the smoothness assumption or the cluster assumption. Self-training is a method that does not need any assumptions on the data itself. The idea is to use the classifier trained in a supervised manner to label the unlabeled points and thereby enlarge the training data. This paper aims to show that a self-training approach with soft-labeling is preferable in many cases in terms of expected loss (risk) minimization. The main idea is to use a soft-labeling to minimize the risk on labeled and unlabeled data together, in which the hard-labeled self-training is an extreme case.
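
The difference between hard and soft labels in self-training can be sketched as follows. This is an illustrative toy, not the authors' risk-minimization procedure: it assumes a one-dimensional, two-class nearest-mean classifier and produces soft labels with a softmax over negative squared distances, both of which are choices made here for brevity. All names are hypothetical.

```python
import math


def soft_posteriors(x, means):
    """Soft labels for x: softmax over negative squared distances to class means."""
    scores = [-(x - m) ** 2 for m in means]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_train(labeled, unlabeled, soft=True, n_iters=10):
    """Self-training of class means; labeled is a list of (x, class_index)."""
    n_classes = 2
    # Supervised start: class means from the labeled points only.
    means = []
    for c in range(n_classes):
        xs = [x for x, y in labeled if y == c]
        means.append(sum(xs) / len(xs))

    for _ in range(n_iters):
        weight_sum = [0.0] * n_classes
        weighted_x = [0.0] * n_classes
        for x, y in labeled:
            weight_sum[y] += 1.0
            weighted_x[y] += x
        for x in unlabeled:
            post = soft_posteriors(x, means)
            if not soft:
                # Hard labeling is the extreme case: all weight on one class.
                best = post.index(max(post))
                post = [1.0 if c == best else 0.0 for c in range(n_classes)]
            for c in range(n_classes):
                weight_sum[c] += post[c]
                weighted_x[c] += post[c] * x
        means = [weighted_x[c] / weight_sum[c] for c in range(n_classes)]
    return means


if __name__ == "__main__":
    labeled = [(0.0, 0), (4.0, 1)]
    unlabeled = [0.5, 3.5]
    print(self_train(labeled, unlabeled, soft=False))
    print(self_train(labeled, unlabeled, soft=True))
```

With well-separated points the two variants nearly coincide; they differ most for unlabeled points near the decision boundary, where a soft label spreads the weight across classes instead of committing to one.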

15:00-17:10, Paper WePT1.23
Deep Sparse-Coded Network (DSN)
Gwon, Youngjune	Harvard Univ
Cha, Miriam	Harvard Univ
Kung, H. T.	Harvard Univ
Keywords: Semi-supervised learning and spectral methods, Classification and clustering, Deep learning Abstract: We present Deep Sparse-coded Network (DSN), a deep architecture based on multilayer sparse coding. It has been considered difficult to learn a useful feature hierarchy by stacking sparse coding layers in a straightforward manner. The primary reason is the modeling assumption for sparse coding that takes in a dense input and yields a sparse output vector. Applying a sparse coding layer on the output of another tends to violate the modeling assumption. We overcome this shortcoming by interlacing nonlinear pooling units. Average- or max-pooled sparse codes are aggregated to form dense input vectors for the next sparse coding layer. Pooling achieves nonlinear activation analogous to neural networks while not introducing diminished gradient flows during the training. We introduce a novel backpropagation algorithm to finetune the proposed DSN beyond the pretraining via greedy layerwise sparse coding and dictionary learning. We build an experimental 4-layer DSN with the L1-regularized LARS and the greedy-L0 OMP, and demonstrate superior performance over a similarly-configured stacked autoencoder (SAE) on CIFAR-10.

15:00-17:10, Paper WePT1.24
Dynamic Adaptive Graph Construction: Application to Graph-Based Multi-Observation Classification
Dornaika, Fadi	Univ. of the Basque Country
Dahbi, Radouan	Univ. of Tech. of Belfort-Montbéliard
Bosaghzadeh, Alireza	Univ. of Basque Country
Ruichek, Yassine	Univ. De Tech. De Belfort-Montbeliard
Keywords: Semi-supervised learning and spectral methods, Classification and clustering, Machine learning and data mining Abstract: Most graph construction techniques assume a transductive setting in which the whole data collection is available at construction time. Addressing graph construction for the inductive setting, in which data arrive sequentially, has received much less attention. Constructing the graph from scratch can be very time consuming. In this paper, we propose an efficient dynamic graph construction method that adds new samples (labeled or unlabeled) to a previously constructed graph. We use a Two Phase Weighted Regularized Least Square (TPWRLS) coding scheme to represent new sample(s) with respect to an existing data set. The representative coefficients are then used to update the graph affinity matrix. The proposed method not only appends the new samples to the graph but also updates the whole graph structure by discovering which nodes are affected by the introduction of new samples and by updating their edge weights. The proposed construction framework is applied to the problem of graph-based label propagation using multiple observations in a semi-supervised scenario. Experiments on three public image databases show that, without any significant loss in the accuracy of the final classification, the proposed dynamic graph construction is more efficient than the batch graph construction.

15:00-17:10, Paper WePT1.25
Model-Based Classification and Novelty Detection for Point Pattern Data
Vo, Ba-Ngu	Curtin Univ
Tran, Nhat-Quang	Curtin Univ
Phung, Dinh	Deakin Univ
Vo, Ba-Tuong	Curtin Univ
Keywords: Semi-supervised learning and spectral methods, Classification and clustering, Machine learning and data mining Abstract: Point patterns are sets or multi-sets of unordered elements that can be found in numerous data sources. However, in data analysis tasks such as classification and novelty detection, appropriate statistical models for point pattern data have not received much attention. This paper proposes the modelling of point pattern data via random finite sets (RFS). In particular, we propose appropriate likelihood functions, and a maximum likelihood estimator for learning a tractable family of RFS models. In novelty detection, we propose novel ranking functions based on RFS models, which substantially improve performance.

15:00-17:10, Paper WePT1.26
A PAC Bound for Joint Matrix Completion Via Partially Collective Matrix Factorization
Lan, Chao	Univ. of Kansas
Li, Xiaoli	Univ. of Kansas
Deng, Yujie	Univ. of Kansas
Amand, Joseph St.	Univ. of Kansas
Huan, Jun	Univ. of Kansas
Keywords: Semi-supervised learning and spectral methods, Transfer learning, Machine learning and data mining Abstract: Collective Matrix Factorization (CMF) is a popular model for the joint matrix completion task, but it is limited by its strong assumption that all matrices share the same low-rank structure. Recently, an alternative model was proposed with a relaxed assumption that matrix low-rank structures are partly shared. We refer to this model as Partially Collective Matrix Factorization (P-CMF). This paper presents a first PAC generalization error bound for joint matrix completion based on the P-CMF model. Our technical contributions are threefold. First, we derive a new PAC bound for single matrix completion, which fundamentally improves the existing PAC bound in multiple aspects. Then, based on it, we derive the first PAC bound for joint matrix completion based on the P-CMF model. This not only justifies the theoretical soundness of P-CMF, but also reveals several insights into it. Finally, we present a model construction criterion for P-CMF based methods, which specifies the degrees of sharing between matrix low-rank structures. We demonstrate the effectiveness of this criterion in simulation.

15:00-17:10, Paper WePT1.27
Shape Classification with a Vertex Clustering Graph Kernel
Bai, Lu	Central Univ. of Finance and Ec
Cui, Lixin	School of Information, Central Univ. of Finance and Ec
Wang, Yue	Central Univ. of Finance and Ec
Bai, Xiao	Beihang Univ
Hancock, Edwin	Univ. of York
Jin, Xin	Central Univ. of Finance and Ec. Beijing, China
Keywords: 2D/3D object detection and recognition, Machine learning and data mining Abstract: Graph kernels are powerful tools for structural analysis in computer vision. Unfortunately, most existing state-of-the-art graph kernels ignore the locational or structural correspondence information between graphs, based on the visual background. This drawback influences the performance of existing kernels for computer vision based classification problems, e.g., classification of shapes, point clouds and digital images. The aim of this paper is to address the problem with existing kernels, by developing a novel vertex clustering graph kernel. We show that this kernel not only overcomes the shortcoming of ignoring correspondence information between isomorphic substructures that arises in most existing graph kernels, but also guarantees the transitivity between the correspondence information. Our kernel can easily outperform state-of-the-art graph kernels in terms of classification accuracy on standard shape based graph datasets.

15:00-17:10, Paper WePT1.28
Ensemble-Based Local Learning for High-Dimensional Data Regression
Raytchev, Bisser	Hiroshima Univ
Katamoto, Yoshinari	Hiroshima Univ
Koujiba, Miku	Hiroshima Univ
Tamaki, Toru	Hiroshima Univ
Kaneda, Kazufumi	Hiroshima Univ
Keywords: Active and ensemble learning, Classification and clustering Abstract: In this paper we propose a new local learning based regression method which utilizes ensemble-learning as a form of regularization to reduce the variance of local estimators. This makes it possible to use local learning methods even with very high-dimensional datasets. The efficacy of the proposed method is illustrated on two publicly available high-dimensional sets in comparison with several global learning methods, and it is shown that the proposed ensemble-based local learning method significantly outperforms the global ones.

15:00-17:10, Paper WePT1.29
Active Learning Using Uncertainty Information
Yang, Yazhou	Delft Univ. of Tech
Loog, Marco	Delft Univ. of Tech. / Univ. of Copenhagen
Keywords: Active and ensemble learning, Machine learning and data mining, Model selection Abstract: Many active learning methods belong to the retraining-based approaches, which select one unlabeled instance, add it to the training set with its possible labels, retrain the classification model, and evaluate the criteria that we base our selection on. However, since the true label of the selected instance is unknown, these methods resort to calculating the average-case or worst-case performance with respect to the unknown label. In this paper, we propose a different method to solve this problem. In particular, our method aims to make use of the uncertainty information to enhance the performance of retraining-based models. We apply our method to two state-of-the-art algorithms and carry out extensive experiments on a wide variety of real-world datasets. The results clearly demonstrate the effectiveness of the proposed method and indicate it can reduce human labeling efforts in many real-life applications.

15:00-17:10, Paper WePT1.30
D-LSM: Deep Liquid State Machine with Unsupervised Recurrent Reservoir Tuning
Wang, Qian	Texas A&M Univ
Li, Peng	Texas A&M Univ
Keywords: Artificial neural networks, Deep learning Abstract: The Liquid State Machine (LSM) is a biologically plausible model of computation for recurrent spiking neural networks, which offers promising solutions to real-world applications in both software and hardware based systems. At the same time, deep feedforward rate-based neural networks such as convolutional neural networks (CNNs) have achieved great success in many computer vision related applications. However, a systematic exploration of deep recurrent spiking neural networks is lacking. We propose a new model of Deep Liquid State Machine (D-LSM), which simultaneously explores the powers of recurrent spiking networks and deep architectures. D-LSM consists of multiple basic LSM processing and pooling stages. Recurrent reservoir networks across different LSM stages act as nonlinear filters capable of extracting spatio-temporal features of increasingly higher levels from the input. We propose to train the D-LSM practically by adopting unsupervised training (e.g. through STDP) for recurrent reservoirs and spike-based supervised rules for the final readout stage. The perspective of realizing D-LSM based hardware processors is also presented.

15:00-17:10, Paper WePT1.31
Generating Commentaries for Tennis Videos
Yan, Fei	Univ. of Surrey
Mikolajczyk, Krystian	Univ. of Surrey
Kittler, Josef	Univ. of Surrey
Keywords: Artificial neural networks, Image and video analysis and understanding, Deep learning Abstract: We present an approach to automatically generating verbal commentaries for tennis games. We introduce a novel application that requires a combination of techniques from computer vision, natural language processing and machine learning. A video sequence is first analysed using state-of-the-art computer vision methods to track the ball, fit the detected edges to the court model, track the players, and recognise their strokes. Based on the recognised visual attributes we formulate the tennis commentary generation problem in the framework of long short-term memory recurrent neural networks as well as structured SVM. In particular, we investigate pre-embedding of descriptive terms and loss function for LSTM. We introduce a new dataset of 633 annotated pairs of tennis videos and corresponding commentary. We perform an automatic as well as human based evaluation, and demonstrate that the proposed pre-embedding and loss function lead to substantially improved accuracy of the generated commentary.

15:00-17:10, Paper WePT1.32
Linear Model Optimizer vs Neural Networks: A Comparison for Improving the Quality and Saving of LED-Lighting Control Systems
Lobato-Rios, Victor	Inst. Nacional De Astrofisica, Optica Y Electronica
Hernandez-Castañon, Viviana del Rocio	Inst. Nacional De Astrofisica, Optica Y Electronica
Carrasco-Ochoa, Jesus Ariel	National Inst. of Astrophysics, Optics and Electronics
Martinez-Trinidad, Francisco	National Inst. of Astrophysics, Optics and Electronics
Keywords: Artificial neural networks, Machine learning and data mining Abstract: Lighting systems represent about 38% of the total energy consumption in office buildings; however, a great amount of this energy is wasted because luminaires keep working at their maximum power even when just a single person is present. In order to improve the performance of LED-Lighting control systems, we propose a linear model that considers the luminaire’s influence on its neighborhood and takes into account visual comfort and energy consumption. The proposed linear model was contrasted against two Neural Network configurations that were trained to find the best dimming levels. Our experiments demonstrate that a linear optimizer applied to our proposed linear model performs better than either of the two tested neural networks.

15:00-17:10, Paper WePT1.33
Point Cloud Labeling Using 3D Convolutional Neural Network
Huang, Jing	Univ. of Southern California
You, Suya	Univ. of Southern California
Attachments: Supplementary material Keywords: Deep learning, Representation and analysis in pixel/voxel images, Scene understanding Abstract: In this paper, we tackle the labeling problem for 3D point clouds. We introduce a 3D point cloud labeling scheme based on a 3D Convolutional Neural Network. Our approach minimizes the prior knowledge of the labeling problem and does not require a segmentation step or hand-crafted features, as most previous approaches did. In particular, we present solutions for handling large data during the training and testing process. Experiments performed on an urban point cloud dataset containing 7 categories of objects show the robustness of our approach.

15:00-17:10, Paper WePT1.35
Evolutionary Data Purification for Social Media Classification
James, Stuart	Univ. of Surrey
Collomosse, John Philip	Univ. of Surrey
Keywords: Image and video analysis and understanding, Classification and clustering, Model selection Abstract: We present a novel algorithm for the semantic labeling of photographs shared via social media. Such imagery is diverse, exhibiting high intra-class variation that demands large training data volumes to learn representative classifiers. Unfortunately, image annotation at scale is noisy, resulting in errors in the training corpus that confound classifier accuracy. We show how evolutionary algorithms may be applied to select a 'purified' subset of the training corpus to optimize classifier performance. We demonstrate our approach over a variety of image descriptors (including deeply learned features) and support vector machines.

15:00-17:10, Paper WePT1.36
Estimates of Classification Complexity for Myoelectric Pattern Recognition
Nilsson, Niclas	Chalmers Univ. of Tech
Ortiz-Catalan, Max	Chalmers Univ. of Tech
Keywords: Machine learning and data mining, Classification and clustering, Dimensionality reduction and manifold learning Abstract: Myoelectric pattern recognition (MPR) can be used for intuitive control of virtual and robotic effectors in clinical applications such as prosthetic limbs and the treatment of phantom limb pain. The conventional approach is to feed classifiers with descriptive electromyographic (EMG) features that represent the aimed movements. The complexity and consequently classification accuracy of MPR is highly affected by the separability of such features. In this study, classification complexity estimating algorithms were investigated as a potential tool to estimate MPR performance. An early prediction of MPR accuracy could inform the user of faulty data acquisition, as well as suggest the repetition or elimination of detrimental movements in the repository of classes. Two such algorithms, Nearest Neighbor Separability (NNS) and Separability Index (SI), were found to be highly correlated with classification accuracy in three commonly used classifiers for MPR (Linear Discriminant Analysis, Multi-Layer Perceptron, and Support Vector Machine). These Classification Complexity Estimating Algorithms (CCEAs) were implemented in the open source software “BioPatRec” and are available freely online. This work deepens the understanding of the complexity of MPR for the prediction of motor volition.

15:00-17:10, Paper WePT1.37
Multiple Instance Dictionary Learning Using Functions of Multiple Instances
Jiao, Changzhe	Univ. of Missouri
Zare, Alina	Univ. of Missouri
Keywords: Machine learning and data mining, Classification and clustering, Image and video analysis and understanding Abstract: Dictionary Learning Functions of Multiple Instances (DL-FUMI) is proposed to address target detection problems with inaccurate training labels. DL-FUMI is a multiple instance dictionary learning method that estimates target atoms that describe distinctive and representative features of the target class and background atoms that account for the shared features found across both target and non-target data points. Experimental results show that the target atoms estimated by DL-FUMI are more discriminative and representative of the target class than comparison methods. DL-FUMI is shown to have improved performance on several detection problems as compared to other multiple instance dictionary learning algorithms.

15:00-17:10, Paper WePT1.38
One-Shot Learning of Temporal Sequences Using a Distance Dependent Chinese Restaurant Process
Orrite, Carlos	Univ. of Zaragoza
Rodríguez, Mario	Univ. of Zaragoza
Medrano, Carlos	Univ. of Zaragoza
Keywords: Machine learning and data mining, Classification and clustering, Model selection Abstract: Activity recognition in videos is a challenging task, especially if only a small number of samples is available for modelling the problem. The task becomes even harder when using generative models such as mixture models or Hidden Markov Models (HMMs), as they demand a lot of samples to determine their parameters. Additionally, these models rely on the appropriate selection of some parameters, for instance the number of hidden states. Therefore, we propose in this paper the creation of a Universal Background Model (UBM) of features, using videos from public datasets, applied to the activity encoding and an unsupervised modelling of the activities with a distance dependent Chinese Restaurant Process (ddCRP), where the number of states (tables in the Chinese Restaurant descriptions) is automatically determined by the process. In order to classify an incoming video sequence we propose to model it as a ddCRP distribution and to apply a nearest neighbour algorithm based on a kernel between distributions. To carry out this process we use a Probability Product Kernel (PPK) algorithm by previously mapping the ddCRP into an HMM with discrete observations. Preliminary experiments on two public data sets, Weizmann and KTH, show that this proposal achieves state-of-the-art results.
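
The idea of nearest neighbour classification with a kernel between distributions can be sketched for the simplest case of two discrete distributions over the same outcomes, where the probability product kernel is the sum over outcomes of p(x)^rho q(x)^rho (rho = 0.5 gives the Bhattacharyya kernel). This is an illustrative sketch only: the paper applies the kernel to HMMs obtained from ddCRP models, which is not reproduced here, and the function names and example labels are hypothetical.

```python
def probability_product_kernel(p, q, rho=0.5):
    """Probability product kernel between two discrete distributions.

    p and q are sequences of probabilities over the same outcomes.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of outcomes")
    return sum((pi ** rho) * (qi ** rho) for pi, qi in zip(p, q))


def nearest_neighbour_label(query, references, rho=0.5):
    """Return the label of the reference distribution most similar to query.

    references is a list of (distribution, label) pairs; a larger kernel
    value means a more similar distribution.
    """
    best_label, best_score = None, float("-inf")
    for dist, label in references:
        score = probability_product_kernel(query, dist, rho)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    references = [([0.9, 0.1], "walk"), ([0.1, 0.9], "run")]
    print(nearest_neighbour_label([0.8, 0.2], references))
```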

15:00-17:10, Paper WePT1.39
Regression-Based Metric Learning
Moutafis, Panagiotis	Univ. of Houston
Leng, Mengjun	Univ. of Houston
Kakadiaris, Ioannis	Univ. of Houston
Attachments: Supplementary material Keywords: Machine learning and data mining, Classification and clustering, Other applications Abstract: Existing distance metric learning methods define an objective function and seek a distance metric (or equivalently a projection) that minimizes it. In this paper, we propose a different approach that illustrates how to formulate distance metric learning as a regression problem. First, the objective function is minimized to learn target representations. Then, a regression method is employed to learn a projection that maps the input to the target representations. This global projection function is the single output of the proposed algorithm. Our contribution is a different perspective on how to train a distance metric learning algorithm. The advantages are: (i) this approach has the potential to simplify the optimization process; and (ii) it allows researchers to leverage the power of existing regression methods and those to be invented. Experimental results on several publicly available datasets illustrate that the proposed framework can learn a distance metric with discriminative properties.


WePT2	Poster Session Hall
WeP2	Poster Session

15:00-17:10, Paper WePT2.1
Attention-Inspired Moving Object Detection in Monocular Dashcam Videos
Yun, Kimin	Seoul National Univ
Lim, Jongin	Seoul National Univ
Yun, Sangdoo	Seoul National Univ
Kim, Soo Wan	Samsung Electronics Co., Ltd
Choi, Jin Young	Automation and System Res. Inst. Seoul National Univ
Keywords: Biologically motivated vision, Low-level vision, Motion, tracking and video analysis Abstract: This paper proposes a moving object detection algorithm for a monocular dashcam mounted on a vehicle. To deal with dynamic changes of the scene from the dashcam, we propose a new scheme inspired by human-attention inclination for change detection. Humans do not build a detailed visual representation and perceive a change of the scene based on the structure of an interesting region. In this perspective, our method focuses on a sky and road region of the scene and builds an abstracted background model, which is updated with a spatially adaptive learning rate according to the center-focused tendency of the human gaze. To improve the robustness of detection, the final detection map is refined by combining the results from twin processes applied to the original image and the median-filtered image, respectively. In experiments, we have found that our method outperforms state-of-the-art methods qualitatively and quantitatively on a realistic dashcam video.

15:00-17:10, Paper WePT2.2
Fast and Accurate Scale Estimation Method for Object Tracking
Rampal, Karan	NEC Corp
Sakurai, Kazuyuki	NEC Corp
Imaoka, Hitoshi	NEC
Keywords: Motion, tracking and video analysis Abstract: Many existing tracking methods do not estimate the object scale (width, height), only the location (x, y). In this paper we present a method which can accurately estimate the object scale given the location. The proposed approach works by cascading two methods such that each method refines the estimate by removing the false scale samples. Our method does not depend on the tracking technique and can be applied with any tracking system. We apply our approach to an existing tracker and compare the performance on benchmark sequences. The proposed method outperforms the existing tracker, while hardly affecting the speed.

15:00-17:10, Paper WePT2.3
Tracking with the Support of Couplers and Historical Models
He, Ke	Beijing Univ. of Posts and Telecommunications
Mo, Borui	Beijing Univ. of Posts and Telecommunications
Li, Ningning	Beijing Univ. of Posts and Telecommunications
Men, Aidong	Beijing Univ. of Posts and Telecommunications

15:00-17:10, Paper WePT2.4
Correcting the Tracker with Memories
He, Ke	Beijing Univ. of Posts and Telecommunications
Mo, Borui	Beijing Univ. of Posts and Telecommunications
Li, Ningning	Beijing Univ. of Posts and Telecommunications
Men, Aidong	Beijing Univ. of Posts and Telecommunications

15:00-17:10, Paper WePT2.5
Visual Tracking Via Sparsity Pattern Learning
Wang, Yuxi	Beijing Inst. of Tech
Li, Zhuwen	National Univ. of Singapore
Cheong, Loong Fah	National Univ. of Singapore
Liu, Yue	Beijing Inst. of Tech
Ling, Haibin	Temple Univ
Keywords: Motion, tracking and video analysis Abstract: Recently, sparse representation has been applied to visual tracking by modeling the target appearance using a sparse approximation over the template set. However, this approach is limited by the high computational cost of the l1-norm minimization involved, which also restricts the number of particle samples that we can have. This paper introduces a basic constraint on the self-representation of the target set. The sparsity pattern in the self-representation allows us to recover the "sparse coefficients" of the candidate samples by some small-scale l2-norm minimization; this results in a fast tracking algorithm. It also leads to a principled dictionary update mechanism which is crucial for good performance. Experiments on a recently released benchmark with 50 challenging video sequences show significant runtime efficiency and tracking accuracy achieved by the proposed algorithm.

15:00-17:10, Paper WePT2.6
An Intensity and Region Guided Narrow Band Level Set Model for Contour Tracking
Das, Somenath	Univ. of Georgia
Bhandarkar, Suchendra	Univ. of Georgia
Chowdhury, Ananda	Jadavpur Univ
Keywords: Motion, tracking and video analysis Abstract: Level set based contour tracking methods have become quite popular in the computer vision community. In this paper, we propose a novel level set based method for tracking dynamic implicit contours that utilizes minimal prior information. Our solution consists of two main steps. In the first step, a simple first order Markov chain model is employed for the coarse localization of a target. In the second step, we evolve level sets within a narrow band to accurately track the target contour. Narrow band curve evolution is guided through color and region based terms in the standard Chan-Vese framework. Comprehensive experimentation on several publicly available tracking datasets clearly demonstrates the advantage of the proposed approach.

15:00-17:10, Paper WePT2.7
Efficient Tracking with Distinctive Target Colors and Silhouette
Xiao, Changlin	The Ohio State Univ
Yilmaz, Alper	Ohio State Univ
Attachments: Supplementary material Keywords: Motion, tracking and video analysis Abstract: Target tracking using color-based appearance models is very popular in visual tracking. However, trackers based only on color are fragile and often drift to the background when it has a similar appearance. In this paper, we propose an efficient way to use distinctive target colors to track the target and eliminate the drift problem. Colors are sampled from the target and its immediate surrounding region, and the color samples coming from the target yield a more distinctive target color. In our approach, we use a short-time and a long-time color histogram to represent the target color. The short-time color histogram is used to calculate the distinctiveness of colors, while the long-time color histogram is used to keep the target color that is consistent over time. In our approach, the target is not marked as a rectangle or other geometric primitive; instead, we track it with its own silhouette. Using the silhouette to mark the target significantly reduces the false positive information during online learning. Also, the color models are updated with a dynamic learning factor which is based on the tracking result. After testing on many tracking sequences and comparison with other state-of-the-art trackers, the proposed tracking algorithm shows comparably better performance with a very high tracking rate.

15:00-17:10, Paper WePT2.8
Adaptive and Compressive Target Tracking Based on Feature Point Matching
Li, Fengjiao	Shandong Acad. of Sciences
Zhang, Yuanyuan	Shandong Acad. of Sciences
Yan, WeiQi	AUT Univ
Klette, Reinhard	Auckland Univ. of Tech
Keywords: Motion, tracking and video analysis, 2D/3D object detection and recognition Abstract: In compressive tracking algorithms, a feature reduction projection matrix is constructed by using compressed sensing theory. Target and non-target objects are discriminated by using a naive Bayesian classifier. Such an algorithm may ensure accuracy of target tracking in real-time. But it is not adaptive for tracking with respect to scales and rotations. In this paper, we propose a novel adaptive algorithm based on feature point matching for tracking objects which appear with various changes. We combine weight-average and improved compressive tracking algorithms together for tracking objects, then calculate the corresponding feature points between two subsequent frames of the same object for obtaining the target changes related to various scales and rotations. Our experimental results show that the improved algorithm effectively improves the accuracy of target tracking and ensures adaptability of the tracking algorithm.

15:00-17:10, Paper WePT2.9
Robust Volleyball Tracking System Using Multi-View Cameras
Takahashi, Masaki	Japan Broadcasting Corp. (NHK)
Ikeya, Kensuke	Japan Broadcasting Corp
Kanou, Masanori	Japan Broadcasting Corp
Ookubo, Hidehiko	Japan Broadcasting Corp
Mishina, Tomoyuki	Japan Broadcasting Corp
Keywords: Motion, tracking and video analysis, 2D/3D object detection and recognition, Vision for graphics Abstract: We have developed a real-time ball tracking system that can be used for volleyball games. Although a number of methods for visual object tracking have been proposed, tracking a fast-moving ball is still a challenging task because of motion blur and occlusion. We thus use a complementary tracking scheme in which the tracking processes for multiple cameras help each other by sharing the 3D position of the ball. The ball on each camera is accurately tracked by predicting its position at the next frame. The 3D ball positions measured by the system can be used for drawing the trajectory CG of a ball and for calculating statistical data related to ball movement. Evaluation results obtained using actual volleyball video sequences showed that the system would be effective for visualizing ball trajectories in live volleyball broadcasts.

15:00-17:10, Paper WePT2.10
Depth-Based 3D Hand Pose Tracking
Quach, Kha Gia	Concordia Univ
Duong, Chi Nhan	Univ. of Science, HCMC
Luu, Khoa	Carnegie Mellon Univ
Bui, Tien D.	Concordia Univ
Keywords: Motion, tracking and video analysis, Deep learning, Human Computer Interaction Abstract: In this paper, we propose two new approaches using the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN) for tracking 3D hand poses. The first approach is a detection based algorithm while the second is a data driven method. Our first contribution is a new tracking-by-detection strategy extending the CNN based single frame detection method to a multiple frame tracking approach by taking into account prediction history using RNN. Our second contribution is the use of RNN to simulate the fitting of a 3D model to the input data. It helps to relax the need for a carefully designed fitting function and optimization algorithm. With such strategies, we show that our tracking frameworks can automatically correct the failed detections made in previous frames due to occlusions. Our proposed method is evaluated on two public hand datasets, i.e. NYU and ICVL, and compared against other recent hand tracking methods. Experimental results show that our approaches achieve state-of-the-art accuracy and efficiency in the challenging problem of 3D hand pose estimation.

15:00-17:10, Paper WePT2.11
Real Time Eye Gaze Tracking with Kinect
Wang, Kang	Rensselaer Pol. Inst
Ji, Qiang	RPI
Keywords: Motion, tracking and video analysis, Gesture and Behavior Analysis, Human Computer Interaction Abstract: Traditional gaze tracking systems rely on explicit infrared lights and high resolution cameras to achieve high performance and robustness. These systems, however, require complex setup and thus are restricted to lab research and hard to apply in practice. In this paper, we propose to perform gaze tracking with a consumer level depth sensor (Kinect). Leveraging Kinect’s capability to obtain 3D coordinates, we propose an efficient model-based gaze tracking system. We first build a unified 3D eye model to relate gaze directions and eye features (pupil center, eyeball center, cornea center) through subject-dependent eye parameters. A personal calibration framework is further proposed to estimate the subject-dependent eye parameters. Finally we can perform real time gaze tracking given the 3D coordinates of eye features from Kinect and the subject-dependent eye parameters from the personal calibration procedure. Experimental results with 6 subjects prove the effectiveness of the proposed 3D eye model and the personal calibration framework. Furthermore, the gaze tracking system is able to work in real time (20 fps) and with low resolution eye images.

15:00-17:10, Paper WePT2.12
Edge-Guided Depth Map Enhancement
Song, Xibin	Shandong Univ
Huang, Haiyang	Shandong Univ
Zhong, Fan	Shandong Univ
Ma, Xin	Shandong Univ
Qin, Xueying	Shandong Univ
Keywords: Vision for graphics, 3D shape recovery Abstract: Low-cost depth sensing devices, such as Microsoft Kinect, can only produce noisy depth maps that are mis-aligned with color images, and even contain many holes. Even though the coupled high quality color images contain rich information which can be exploited to enhance the depth maps, the redundant color edges often introduce incorrect depth edges in the resulting depth map, since color images contain more textures than depth maps. To solve this problem, we propose a novel approach which generates accurate color-consistent depth edges by employing both color and depth images. First, edges of raw depth maps are extracted using an image pyramid strategy. Then, the redundant edges in color images are removed according to the raw depth edges, and accurate color-consistent depth edges are generated by combining raw depth edges with current color edges. Finally, constraints extracted from both raw depth and color images and the generated depth edges are fused in an MRF optimization framework to obtain the enhanced depth map, which is accurately aligned with the coupled color image. As experimentally demonstrated, the proposed method achieves outstanding performance when compared with previous approaches.

15:00-17:10, Paper WePT2.13
Maximum Clique Based RGB-D Visual Odometry
Zhang, Yigong	Nanjing Univ. of Science and Tech
Hou, Zhixing	Nanjing Univ. of Science and Tech
Yang, Jian	Nanjing Univ. of Science and Tech
Kong, Hui	Nanjing Univ. of Science and Tech
Keywords: Vision for robotics Abstract: In this paper, we propose a new feature-point based RGB-D visual odometry approach for estimating the relative camera motion from two consecutive frames. The approach differs from most feature-point based RGB-D visual odometry approaches in two key aspects: (1) we do not directly use point correspondences to compute relative motion; instead, we link each pair of distinct points to form a line segment, then utilize correspondences of the generated line segments to estimate relative motion; (2) considering the measurement noise of the RGB-D camera, we design a threshold technique to control the size of the maximum clique. Several experiments on a real-world dataset show that our method achieves improved accuracy when compared with other recent RGB-D based odometry methods.

15:00-17:10, Paper WePT2.14
Virtual Flattening of Clothing Item Held in the Air
Kita, Yasuyo	National Inst. of Advanced Industrial Science and Technology
Kita, Nobuyuki	National Inst. of Advanced Industrial Science and Technology
Keywords: Vision for robotics, 2D/3D object detection and recognition Abstract: We propose a method of virtually flattening the surface of a clothing item to a two-dimensional plane using the geodesic distance over the surface of the item. If a clothing item is flatly opened on a table, the recognition of the item is much easier than that of the same item having an arbitrary shape. However, it is difficult and troublesome to physically flatten a clothing item from its arbitrary shape automatically. We therefore propose to develop the surface of the clothing item held in the air into a two-dimensional shape using three-dimensional observation data of the surface. To this end, boundary points of the observed clothing region are sampled as start/end nodes for calculating geodesic lines, the lengths of which become two-dimensional distances between the points when the surface is flattened to a plane. To robustly calculate geodesic lines using a three-dimensional point cloud of the surface, we adopt a method that interpolates the depth and normal direction at any point on the surface using the element-free Galerkin method. The shortest path along the surface between two points on the surface is then calculated using these depths and normal directions in the framework of the “zero-length spring analogy”. The two-dimensional coordinates of the points on the plane are obtained by solving simultaneous equations determined by the geodesic distances. We also propose a method of using the flattened view for the classification of clothing type and detection of important parts of clothing. Preliminary experiments using long-sleeve shirts and trousers as clothing items demonstrate the promise of the proposed methods.

15:00-17:10, Paper WePT2.15
On Looking at Faces in an Automobile: Issues, Algorithms and Evaluation on Naturalistic Driving Dataset
Yuen, Kevan	Univ. of California, San Diego
Martin, Sujitha	Univ. of California, San Diego
Trivedi, Mohan	Univ. of California, San Diego
Keywords: Vision for robotics, Deep learning Abstract: Face detection is a vital step in the process of extracting semantic information about the driver's state, such as distraction and fatigue, from pixel values in images looking at the driver. Therefore, in the context of a time- and safety-critical situation like driving, efficient use of time and reliable detection of faces are essential. While challenges like lighting and occlusion are prevalent in the vehicle cockpit and disruptive to both speed and reliability, the automobile cabin is a unique and advantageous environment for face detection. In this study we introduce a deep CNN based face detection method with discrete head pose estimation which addresses key challenges such as lighting conditions, occlusions, and varying viewpoints. One of the vital points in training the CNN based system is the compilation of positive samples via a real-world dataset and synthetic data augmentation useful for in-vehicle settings. Performance evaluation on a publicly available naturalistic driving dataset, the VIVA-Face Dataset, shows promising results compared to baseline methods.

15:00-17:10, Paper WePT2.16
Preparatory Coordination of Head, Eyes and Hands: Experimental Study at Intersections
Martin, Sujitha	Univ. of California, San Diego
Rangesh, Akshay	Univ. of California San Diego
Ohn-Bar, Eshed	Univ. of California, San Diego
Trivedi, Mohan	Univ. of California, San Diego
Keywords: Vision for robotics, Image and video analysis and understanding, Machine learning and data mining Abstract: Drivers use some combination of head, eye and hand movements to perform a varying number of tasks, from driving-related tasks to non-driving secondary tasks. Furthermore, the combinations may vary depending on the task performed. It is important to model and understand these variations in order to build predictive systems, explore driving styles, detect activities, etc. This study, therefore, introduces a framework to model the spatio-temporal movements of head, eyes and hands, given naturalistic driving data looking in at the driver, for any events or tasks of interest. As a use case, we explore the temporal coordination of the modalities on data of drivers executing maneuvers at stop-controlled intersections; the maneuvers executed are go straight, turn left and turn right. In sequentially increasing time windows, by training classifiers which can indicate the discriminative quality of their input variables, the experimental study at intersections shows which type of distinguishable preparatory movements occur, when, and for how long, in the range of a few milliseconds to a few seconds.

15:00-17:10, Paper WePT2.17
A Novel Hybrid Camera System with Depth and Fisheye Cameras
Perez-Yus, Alejandro	Univ. De Zaragoza
Lopez-Nicolas, Gonzalo	Univ. of Zaragoza
Guerrero, Jose J.	Univ. De Zaragoza
Keywords: Vision sensors, Stereo and multiple view geometry, Vision for robotics Abstract: We introduce a novel hybrid camera configuration composed of a fisheye camera attached to an RGB-D system. Current RGB-D sensors provide the 3D information and scale of the scene, but they are limited by a small field of view. In contrast, wide field of view cameras capture a larger portion of the scene, but provide highly distorted images that require specific algorithms. By coupling a fisheye camera to an RGB-D system we take advantage of both types of cameras while overcoming their drawbacks. The system provides a portion of the fisheye image with depth data and we use this seed information to perform scaled operations in the complete image. We also present a calibration procedure of the system to map depth information to the wide angle image. For this purpose, we propose a depth-fisheye calibration algorithm drawing on state-of-the-art camera models and methods. Several experiments test the accuracy of the system with real images.

15:00-17:10, Paper WePT2.18
Plucker Correction Problem: Analysis and Improvements in Efficiency
Cardoso, João	Pol. Inst. of Coimbra
Miraldo, Pedro	Inst. Superior Tecnico, Lisboa
Araujo, Helder	Univ. of Coimbra
Keywords: Vision sensors, Vision for robotics Abstract: A given six dimensional vector represents a 3D straight line in Plücker coordinates if its coordinates satisfy the Klein quadric constraint. In many problems aiming to find the Plücker coordinates of lines, noise in the data and other types of errors lead to 6D vectors that do not correspond to lines, because they violate that constraint. A common procedure to overcome this drawback is to find the Plücker coordinates of the lines that are closest to those vectors. This is known as the Plücker correction problem. In this article we propose a simple, closed-form, and global solution for this problem. When compared with the state-of-the-art method, one can conclude that our algorithm is easier and requires far fewer operations than previous techniques (it does not require Singular Value Decomposition techniques).
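As a hedged illustration of the problem statement only (this is not the authors' algorithm, which the abstract does not spell out): writing the 6D vector as (d, m), the Klein quadric constraint is d·m = 0. Assuming "closest" means Euclidean distance in 6D, one closed-form correction rotates to a = (d + m)/√2 and b = (d − m)/√2, where the constraint becomes |a| = |b|, and rescales both to their mean norm. The function names below are hypothetical.

```python
import math

def klein_residual(line):
    """Return d . m for a 6-vector (d, m); zero means a valid Plücker line."""
    d, m = line[:3], line[3:]
    return sum(di * mi for di, mi in zip(d, m))

def plucker_correct(line):
    """Nearest 6-vector (Euclidean sense) that satisfies d . m = 0."""
    d, m = line[:3], line[3:]
    root2 = math.sqrt(2.0)
    # Orthogonal change of variables: d . m = (|a|^2 - |b|^2) / 2.
    a = [(di + mi) / root2 for di, mi in zip(d, m)]
    b = [(di - mi) / root2 for di, mi in zip(d, m)]
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("degenerate input: d = m or d = -m")
    # Rescale both parts to the mean norm so that |a| = |b|.
    r = 0.5 * (norm_a + norm_b)
    a = [r * x / norm_a for x in a]
    b = [r * x / norm_b for x in b]
    # Rotate back to (d, m).
    d_new = [(x + y) / root2 for x, y in zip(a, b)]
    m_new = [(x - y) / root2 for x, y in zip(a, b)]
    return d_new + m_new
```

This sketch uses only norms and scalings, which is consistent with the abstract's remark that no Singular Value Decomposition is needed, but whether it matches the paper's formulation is an assumption.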

15:00-17:10, Paper WePT2.19
Initialized Iterative Closest Point for Bone Recognition in Ultrasound Volumes
Haddad, Oussama	Univ. De Bretagne Occidentale (UBO)
Leboucher, Julien	Lab. De Traitement De L’information Médicale LATIM INSERM
Troccaz, Jocelyne	Univ. Alpes/ CNRS / TIMC-IMAG UMR 5525, Grenoble, F-3804
Stindel, Eric	Centre Hospitalier Régional Et Univ. Service Orthopédie
Keywords: 2D/3D object detection and recognition, Image guidance and robot guidance of interventions, Vision for robotics Abstract: Ultrasound (US) probes have been used as guiding tools for Computer Assisted Orthopedic Surgeries (CAOS) [1]. Because of the US data uncertainty, the process of recognition (the localization of regions of interest in the image) requires a registration to a more precise, but invasive, imaging modality such as Computed Tomography (CT). Millimetric precision and real-time processing are intraoperative requirements. Iterative Closest Point (ICP) [2] is a simple and non-symmetric rigid registration algorithm that is sensitive to the initial position of the point sets. The aim of this study is to show the contribution of initializing ICP in rigid US-CT registration and to illustrate it on data of a proximal femur. First, an iterative initialization of the model (CT) to the partial view (US) is performed using ICP with annealed filtering. The first obtained local minimum is then used to initialize a refinement step that maps the partial view to the model. One femur phantom was imaged both in a water bath using a calibrated 3D ultrasound probe and by CT. For each of the ten US acquisitions (five in the Anterior neck A, and five in the Posterior neck P), the CT scan is brought by means of fiducials pair-point matching. The initialization step improves ICP successful registrations from (A:25%, P:21%) to (A:76%, P:52%) and the registration takes about 3s on average whilst ICP takes about 1s. [1] A. Mozes. 3D A-Mode Ultrasound Calibration and Registration of the Tibia and Femur for Computer-Assisted Robotic Surgery. PhD thesis, The University of Miami, 2008.
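For readers unfamiliar with ICP, the basic loop the abstract builds on can be sketched as follows. This is a minimal 2D illustration with hypothetical function names, not the initialized or annealed variant studied in the paper, which works on 3D US and CT data. Each iteration matches every point to its nearest model point and then applies the best rigid transform for those matches; because the matches depend on the starting pose, a poor start can lock in wrong correspondences.

```python
import math

def nearest(p, points):
    """Closest point to p in `points` (brute force)."""
    return min(points, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src onto dst."""
    n = len(src)
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n
    cdy = sum(p[1] for p in dst) / n
    s_dot = s_cross = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay = sx - csx, sy - csy
        bx, by = dx - cdx, dy - cdy
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated source centroid to the target centroid.
    tx = cdx - (c * csx - s * csy)
    ty = cdy - (s * csx + c * csy)
    return theta, tx, ty

def icp_2d(scan, model, iterations=20):
    """Align `scan` to `model`; the result depends on the initial pose of `scan`."""
    current = list(scan)
    for _ in range(iterations):
        matched = [nearest(p, model) for p in current]
        theta, tx, ty = best_rigid_2d(current, matched)
        c, s = math.cos(theta), math.sin(theta)
        current = [(c * x - s * y + tx, s * x + c * y + ty) for x, y in current]
    return current
```

With a small initial misalignment the nearest-neighbor matches are already correct and the loop converges almost immediately; with a large one it may settle in a local minimum, which is the sensitivity the abstract's initialization step targets.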

15:00-17:10, Paper WePT2.20
Accurate Depth-Map Refinement by Per-Pixel Plane Fitting for Stereo Vision
Yokozuka, Masashi	National Inst. of Advanced Industrial Science and Tech
Tomita, Kohji	AIST
Matsumoto, Osamu	National Inst. of Advanced Industrial Science and Tech
Banno, Atsuhiko	National Inst. of Advanced Industrial Science and Tech
Keywords: 3D shape recovery, Stereo and multiple view geometry Abstract: This paper discusses refinement of sparse and noisy depth maps for improving stereo measurement. Our method functions as a post filter for stereo measurement to remove outliers and to interpolate depth of invalid pixels. Per-pixel plane fitting is employed in order to estimate normals of an object surface in a depth map. The normals provide information about interpolation of depth and removal of outliers by evaluating direction of surfaces. In the experiments, our method successfully reconstructed a dense and accurate geometry from a noisy and sparse depth map even with several dozen percent outliers and a few percent density from original correct geometry. This result suggests a novel methodology of fast stereo measurement because dense reconstruction can be done without stereo matching for the whole pixels.

15:00-17:10, Paper WePT2.21
Back to Butterworth - a Fourier Basis for 3D Surface Relief Hole Filling within RGB-D Imagery
Atapour-Abarghouei, Amir	Durham Univ
Payen de La Garanderie, Gregoire	Durham Univ
Breckon, Toby	Durham Univ
Keywords: 3D shape recovery, Texture and color analysis, Inpainting and Superimposing Abstract: We address the problem of hole filling in RGB-D (color and depth) images, obtained from either active or stereo based sensing, for the purposes of object removal and missing depth estimation. This is performed independently on the low frequency depth information (surface shape) and the high frequency depth detail (relief) by way of a Fourier space transform and classical Butterworth high/low pass filtering. The high frequency detail is then filled using a texture synthesis method, whilst the low frequency shape information is inpainted using structural inpainting. Here, a classical non-parametric sampling approach is extended, using the concept of query expansion, to perform high frequency depth synthesis with the final output then recombined in Fourier space. In order to improve the overall depth relief (D) and edge detail accuracy, color information (RGB) is also used to constrain the sampling process within high frequency component completion. Experimental results demonstrate the efficacy of the proposed method outperforming prior work for generalized depth filling in the presence of high frequency surface relief detail.
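The frequency split described above can be illustrated on a single row of depth values. The sketch below is an assumption-laden toy, not the authors' pipeline: it uses a naive 1D DFT, the textbook Butterworth low-pass gain 1 / (1 + (f / f0)^(2n)), and takes the high-pass gain as its complement so that the two parts sum back to the input. The names, cutoff and order are hypothetical.

```python
import cmath

def butterworth_lowpass_gain(freq, cutoff, order=2):
    """Butterworth low-pass gain at frequency index `freq`."""
    return 1.0 / (1.0 + (freq / cutoff) ** (2 * order))

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def split_low_high(depth_row, cutoff=2.0, order=2):
    """Split a depth row into low-frequency shape and high-frequency relief."""
    n = len(depth_row)
    spectrum = dft(depth_row)
    low_spec, high_spec = [], []
    for k, coeff in enumerate(spectrum):
        freq = min(k, n - k)  # symmetric index keeps the outputs real
        gain = butterworth_lowpass_gain(freq, cutoff, order)
        low_spec.append(gain * coeff)
        high_spec.append((1.0 - gain) * coeff)
    return idft(low_spec), idft(high_spec)
```

Because the two gains sum to one at every frequency, adding the separately completed low and high components back together (in either domain) reconstructs a full depth signal, which is what makes independent filling of shape and relief possible.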

15:00-17:10, Paper WePT2.22
Event Recognition in Photo Albums Using Probabilistic Graphical Model and Feature Relevance
Siham, Bacha	Univ. of Blida
Allili, Mohand Said	Univ. Du Québec En Outaouais
Benblidia, Nadjia	Univ. Saad Dahlab
Keywords: Image based modeling, Multimedia analysis, indexing and retrieval Abstract: The exponential use of digital cameras has raised a new problem: how to store/retrieve images/albums in very large photo databases that correspond to special events. In this paper, we propose a new probabilistic graphical model (PGM) to recognize events in photo albums stored by users. The PGM combines high-level image features consisting of scenes and objects detected in images. To consider the discriminative power of features, our model integrates the object/scene relevance for more precise prediction of semantic events in photo albums. Experimental results carried out on the

15:00-17:10, Paper WePT2.23
BASS: Boundary-Aware Superpixel Segmentation
Rubio, Antonio	Inst. De Robòtica I Informàtica Industrial (IRI-UPC) - Wide E
Yu, Longlong	Wide Eyes Tech
Simo-Serra, Edgar	Waseda Univ
Moreno-Noguer, Francesc	CSIC-UPC
Keywords: Low-level vision, Classification and clustering Abstract: We propose a new superpixel algorithm based on exploiting the boundary information of an image, as objects in images can generally be described by their boundaries. Our proposed approach initially estimates the boundaries and uses them to place superpixel seeds in the areas in which they are more dense. Afterwards, we minimize an energy function in order to expand the seeds into full superpixels. In addition to standard terms such as color consistency and compactness, we propose using the geodesic distance which concentrates small superpixels in regions of the image with more information, while letting larger superpixels cover more homogeneous regions. By both improving the initialization using the boundaries and coherency of the superpixels with geodesic distances, we are able to maintain the coherency of the image structure with fewer superpixels than other approaches. We show the resulting algorithm to yield smaller Variation of Information metrics in seven different datasets while maintaining Undersegmentation Error values similar to the state-of-the-art methods.

15:00-17:10, Paper WePT2.24
Moving Object Detection for Vehicle Tracking in Wide Area Motion Imagery Using 4D Filtering
Palaniappan, Kannappan	Univ. of Missouri
Poostchi, Mahdieh	Univ. of Missouri-Columbia
Aliakbarpour, Hadi	Univ. of Missouri
Viguier, Raphael	Univ. of Missouri
Fraser, Joshua	Univ. of Missouri-Columbia
Bunyak, Filiz	Univ. of Missouri
Basharat, Arslan	Kitware Inc
Suddarth, Steve	Univ. of Missouri
Blasch, E	Air Force Res. Lab
Rao, Raghuveer	Army Res. Lab
Seetharaman, Guna	AFRL/RITB
Attachments: Supplementary material Keywords: Motion, tracking and video analysis, 2D/3D object detection and recognition, Reconstruction and camera motion estimation Abstract: Most Wide Area Motion Imagery (WAMI) based trackers use motion based cueing for detecting and tracking moving objects. The result is a very high false alarm rate in urban environments with tall structures due to parallax effects. This paper proposes an accurate moving object detection method using a precise orthorectification approach for ground stabilization combined with accurate multiview depth maps for reducing the number of false positives induced by parallax effects by 90 percent. The proposed hybrid moving vehicle detection approach for large scale aerial urban imagery is based on the fusion of a motion detection mask obtained from median-based background subtraction and a tall structure height mask provided by image depth map information. Using the building height map, we are able to improve the object level detection accuracy in terms of F-measure by almost 57 percentage points, from 22.2 percent to 79.2 percent.

15:00-17:10, Paper WePT2.25
A Robust Flash Image Shadow Detection Method and Seamless Recovery of Shadow Regions
Swami, Kunal	Samsung R&D Inst. India - Bangalore
Das, Saikat Kumar	Samsung R&D Inst. India, Bangalore
Khandelwal, Gaurav	Samsung R&D Inst. Bangalore
Vijayvargiya, Ajay	Samsung R&D Institute
Keywords: Occlusion and shadow detection, Image and video analysis and understanding, Computational photography Abstract: In low or moderate ambient light conditions, a flash image is sharp and less noisy compared to a no-flash image; however, it contains unwanted sharp shadows at silhouettes. In our previous work, we proposed a new flash image shadow detection method which utilizes a pair of flash/no-flash images to detect sharp shadows at silhouettes; however, the method often leads to false shadow detection due to its inability to effectively discriminate dark background areas. In this paper, we present an enhanced shadow detection algorithm which shows significant improvement in detection results. A seamless cloning method is used to fill detected shadow regions using no-flash image information. A quantitative as well as qualitative comparison of the proposed shadow detection method on our flash/no-flash image dataset shows that it outperforms current state-of-the-art shadow detection algorithms, both in terms of accuracy and speed. The qualitative results of the proposed shadow recovery technique show that it generates good quality resultant flash images without sharp shadows.

15:00-17:10, Paper WePT2.26
Object Figure-Ground Segmentation Using Zero-Shot Learning
Naha, Shujon	Univ. of Manitoba
Wang, Yang	Univ. of Manitoba
Keywords: Perceptual organization, Transfer learning Abstract: We consider the problem of object figure-ground segmentation when the object categories are not available during training (i.e. zero-shot). During training, we learn standard segmentation models for a handful of object categories (called "source objects") using existing semantic segmentation datasets. During testing, we are given images of objects (called "target objects") that are unseen during training. Our goal is to segment the target objects from the background. Our method learns to transfer the knowledge from the source objects to the target objects. Our experimental results demonstrate the effectiveness of our approach.

15:00-17:10, Paper WePT2.27
Visual Odometry Driven Online Calibration for Monocular LiDAR-Camera Systems
Chien, Hsiang-Jen	Auckland Univ. of Tech
Klette, Reinhard	Auckland Univ. of Tech
Schneider, Nick	Daimler AG
Franke, Uwe	Daimler AG
Keywords: Reconstruction and camera motion estimation, Stereo and multiple view geometry Abstract: Recently LiDAR-camera systems have rapidly emerged in many applications. The integration of laser range-finding technologies into existing vision systems enables a more comprehensive understanding of 3D structure of the environment. The advantage, however, relies on a good geometrical calibration between the LiDAR and image sensors. In this paper we consider visual odometry, an extensively studied discipline in computer vision and robotics, in the context of recently emerging online sensory calibration studies. We point out that, by embedding the online calibration problem into a monocular visual odometry technique, the temporal change of extrinsic parameters can be tracked and compensated effectively.

15:00-17:10, Paper WePT2.28
Using Local Convexities As Anchor Points for 3D Curve Skeletonization
Serino, Luca	CNR
Sanniti di Baja, Gabriella	CNR
Keywords: Representation and analysis in pixel/voxel images, 2D/3D object detection and recognition, Dimensionality reduction and manifold learning Abstract: A new algorithm is introduced to compute the curve skeleton of 3D objects by using the notion of local convexity. The centers of maximal balls detected on the distance transform of the object are filtered to select as anchor points only those located on sharp local convexities of the object’s boundary. Then, the skeleton is obtained by means of topology preserving removal operations. Pruning is finally accomplished to remove from the skeleton scarcely significant peripheral branches.

15:00-17:10, Paper WePT2.29
Reflectance-Aware Optical Flow
Dahlan, Hadi Affendy	Univ. of York
Hancock, Edwin	Univ. of York
Smith, William	Univ. of York
Keywords: Signal, image and video processing, Vision for graphics Abstract: In this paper, we present a reflectance-aware optical flow technique for nonrigidly aligning images captured in different lighting environments. We consider alignment between two specific lighting conditions of particular relevance to object capture using a light stage. The technique uses optical flow combined with three image transformation functions, namely a) the Illumination-Independent image transformation, b) the image colour transformation and c) the specular-invariant projection. We explore two types of photometric image alignment problem, i) aligning images captured with different spherical gradient patterns but the same types of light source and ii) aligning the spherical gradient sequence to an image captured with a different light source. Our results show that a previously proposed method can accurately solve the first problem and we propose a model-based approach to solving the second.


WePT3	Poster Session Hall
WeP3	Poster Session

15:00-17:10, Paper WePT3.1
Spontaneous Speech Emotion Recognition Using Prior Knowledge
Chakraborty, Rupayan	5G6, Yantra Park, Pokhran Road No 2
Pandharipande, Meghna	Innovation Labs-Mumbai, TCS
Kopparapu, Sunil Kumar	Innovation Labs-Mumbai, TCS
Keywords: Audio and acoustic processing and analysis, Affective computing Abstract: Automatic and spontaneous speech emotion recognition is an important part of a human-computer interactive system. However, emotion identification in spontaneous speech is difficult because most often the emotions expressed by the speaker are not necessarily as prominent as in acted speech. In this paper, we propose a spontaneous speech emotion recognition framework that makes use of the associated knowledge. The framework is motivated by the observation that there is significant disagreement amongst human annotators when they annotate spontaneous speech; the disagreement largely reduces when they are provided with additional knowledge related to the conversation. The proposed framework makes use of the contexts (derived from linguistic contents) and the knowledge regarding the time lapse of the spoken utterances in the context of an audio call to reliably recognize the current emotion of the speaker in spontaneous audio conversations. Our experimental results demonstrate that there is a significant improvement in the performance of spontaneous speech emotion recognition using the proposed framework.

15:00-17:10, Paper WePT3.2
Text-Independent Voice Conversion Using Deep Neural Network Based Phonetic Level Features
Zheng, Huadi	Sun Yat-Sen Univ
Cai, Weicheng	Sun Yat-Sen Univ
Zhou, Tianyan	Sun Yat-Sen Univ
Zhang, Shilei	IBM Res
Li, Ming	Sun Yat-Sen Univ
Keywords: Audio and acoustic processing and analysis, Automatic speech and speaker recognition Abstract: This paper presents a phonetically-aware joint density Gaussian mixture model (JD-GMM) framework for voice conversion that no longer requires parallel data from the source speaker at the training stage. Considering that the phonetic level features contain text information which should be preserved in the conversion task, we propose a method that only concatenates phonetic discriminant features and spectral features extracted from the same target speaker’s speech to train a JD-GMM. After the mapping relationship of these two features is trained, we can use phonetic discriminant features from the source speaker to estimate the target speaker's spectral features at the conversion stage. The phonetic discriminant features are extracted using PCA from the output layer of a deep neural network (DNN) in an automatic speech recognition (ASR) system. They can be seen as a low dimensional representation of the senone posteriors. We compare the proposed phonetically-aware method with the conventional JD-GMM method on the Voice Conversion Challenge 2016 training database. The experimental results show that our proposed phonetically-aware feature method can obtain similar performance compared to the conventional JD-GMM in the case of using only target speech as training data.

15:00-17:10, Paper WePT3.3
Wake-Up-Word Spotting Using End-To-End Deep Neural Network System
Zhang, Shilei	IBM Res
Wen, Liu	IBM China Res. Lab
Qin, Yong	IBM Res. - China
Keywords: Audio and acoustic processing and analysis, Automatic speech and speaker recognition, Spoken language processing Abstract: Deep neural networks (DNNs) have tremendously improved the performance of automatic speech recognition (ASR). In addition, end-to-end speech recognition systems can achieve state-of-the-art performance using Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) and the Connectionist Temporal Classification (CTC) method for unsegmented sequence data. In this paper, we therefore propose a lightweight wake-up-word (WUW) spotting system based on an end-to-end DNN architecture, which is intended to provide a good balance between decoding speed, accuracy and model size. The objective is to introduce the CTC framework into the spotting process, and to enhance the system by WUW-oriented model training and refinement steps. We test the performance of the proposed architecture on a conversational telephone dataset; the results show that the computation time can be significantly reduced without a significant decrease in the spotting accuracy.

15:00-17:10, Paper WePT3.4
StableFlow: A Novel Real-Time Method for Digital Video Stabilization
Ahmed, Abdelrahman	Memorial Univ. of Newfoundland
Shehata, Mohamed	Memorial Univ. of Newfoundland
Keywords: Coding, compression and super-resolution, Signal, image and video processing, Enhancement, restoration and filtering Abstract: Digital video stabilization is crucial in many applications such as object detection and tracking. It has been studied for decades, yielding an extensive literature in the field; however, current approaches are either computationally expensive or under-perform in terms of visual quality. In this paper, we present StableFlow, a novel real-time method that was inspired by the mass-spring-damper physical model. In StableFlow, a video frame is modelled as a mass suspended in each direction by a critically damped spring and damper, which can be fine-tuned to adapt to different shaking patterns. The proposed method is tested on video sequences that have different types of shakiness and diverse video contents. The obtained results are then compared to current state-of-the-art stabilization algorithms, including YouTube stabilization, and it is found that the proposed method significantly outperforms other algorithms in terms of visual quality while performing in real time.
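
The mass-spring-damper idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' StableFlow implementation: it only shows, under assumed values, how a critically damped spring (damping coefficient 2*omega for stiffness omega^2) smooths a one-dimensional camera path. The function name smooth_path, the parameters omega and dt, and the semi-implicit Euler integration are all assumptions made for illustration.

```python
def smooth_path(raw, omega=2.0, dt=1.0 / 30.0):
    """Smooth a 1-D camera path with a critically damped spring.

    Hypothetical illustration only. The smoothed position is treated as a
    unit mass pulled toward each raw sample by a spring of stiffness
    omega**2 and slowed by a damper of strength 2*omega, which is the
    critically damped case for a unit mass.
    """
    pos = float(raw[0])  # start at the first observed position
    vel = 0.0            # start at rest
    out = []
    for target in raw:
        # Spring force toward the raw sample minus damping force.
        acc = omega * omega * (target - pos) - 2.0 * omega * vel
        # Semi-implicit Euler: update velocity first, then position.
        vel += acc * dt
        pos += vel * dt
        out.append(pos)
    return out


if __name__ == "__main__":
    # A path that shakes between -1 and +1 on every frame.
    shaky = [(-1.0) ** i for i in range(300)]
    stable = smooth_path(shaky)
    print(max(abs(p) for p in stable[100:]))
```

A larger omega makes the smoothed path follow the raw motion more closely, while a smaller omega suppresses more shake at the cost of lag; this mirrors the abstract's remark that the spring and damper can be tuned to different shaking patterns.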

15:00-17:10, Paper WePT3.4
Unmixing Three Types of Lung Sounds by Convex Optimization
Sakai, Tomoya	Nagasaki Univ
Miyahara, Sueharu	Nagasaki Univ
Kiyasu, Senya	Nagasaki Univ
Keywords: Audio and acoustic processing and analysis, Computer-aided detection and diagnosis, Medical image and signal analysis Abstract: We present a convex optimization technique for unmixing lung sounds to improve computer-aided pulmonary auscultation. An auscultatory sound of a patient with a pulmonary disorder may be composed of continuous and discontinuous adventitious sounds as well as breath sounds. Our technique exploits sparse and low-rank properties of these sounds in the Fourier, wavelet, and time-frequency domains, which can be quantified as convex functions. The optimization algorithm is derived from the alternating direction method of multipliers (ADMM). This approach enables lung sound unmixing without training data for learning the diverse structures of lung sounds in time-frequency domains. We show some experimental examples and discuss further improvements.

15:00-17:10, Paper WePT3.5
Rapid Feature Space MLLR Speaker Adaptation for Deep Neural Network Acoustic Modeling
Zhang, Shilei	IBM Res
Qin, Yong	IBM Res. - China
Keywords: Automatic speech and speaker recognition, Audio and acoustic processing and analysis Abstract: Feature space Maximum Likelihood Linear Regression (FMLLR) speaker adaptation based on bilinear models has shown good performance for GMM-HMMs, especially when the amount of adaptation data is limited. In this paper, we propose using bilinear model features as inputs to deep neural networks (DNNs) for rapid speaker adaptation of acoustic modeling to facilitate utterance-level normalization. The effectiveness of the proposed method is demonstrated with experiments on the Mandarin short message dictation and voice query dataset.

15:00-17:10, Paper WePT3.6
Multilingual Articulatory Features Augmentation Learning
Zhao, Yue	Minzu Univ. of China
Zhao, Rui	Minzu Univ. of China
Wang, Xiaoyang	RPI
Ji, Qiang	RPI
Keywords: Automatic speech and speaker recognition, Coding, compression and super-resolution, Deep learning Abstract: Articulatory features are used as a universal set of speech attributes shared across many different languages. Some multilingual and cross-language speech recognition systems using articulatory features have been shown to improve performance. The existing articulatory features are defined by phoneticians as a set of articulatory descriptions of phones, which represent semantic information explaining how humans produce speech sounds via the interaction of different physiological structures. However, these manually specified attributes capture the articulation information of all languages incompletely and are not distinctive enough for accurate monolingual and multilingual phoneme recognition. In this paper, we address the problem of building a more complete articulatory feature representation using sparse coding methods. We learn latent attributes that sparsely represent more of the speech articulation information shared between the English and Tibetan languages. Models based on the concatenated semantic and latent speech attributes achieved better accuracy than the existing methods in our experiments on English-Tibetan bilingual phone recognition.

15:00-17:10, Paper WePT3.7
Speaker-Aware Multi-Task Learning for Automatic Speech Recognition
Pironkov, Gueorgui	Univ. of Mons
Dupont, Stéphane	Univ. of Mons
Dutoit, Thierry	Univ. of Mons
Keywords: Automatic speech and speaker recognition, Deep learning Abstract: Overfitting is a common issue in automatic speech recognition and has an especially strong impact when the amount of training data is limited. In order to address this problem, this article investigates acoustic modeling through Multi-Task Learning, with two speaker-related auxiliary tasks. Multi-Task Learning is a regularization method which aims at improving the network's generalization ability by training a single model to solve several different but related tasks. In this article, two auxiliary tasks are jointly examined. On the one hand, we consider speaker classification as an auxiliary task by training the acoustic model to recognize the speaker, or find the closest one inside the training set. On the other hand, the acoustic model is also trained to extract i-vectors from the standard acoustic features. I-vectors are widely used in the speaker identification community to characterize a speaker and the speaker's acoustic environment. The core idea of using these auxiliary tasks is to give the network an additional inter-speaker awareness, and thus reduce overfitting. We investigate this Multi-Task Learning setup on the TIMIT database, while the acoustic modeling is performed using a Recurrent Neural Network with Long Short-Term Memory cells.

15:00-17:10, Paper WePT3.8
Application of Pronunciation Knowledge on Phoneme Recognition by LSTM Neural Network
Bo, Zhang	Nankai Univ
Yuqin, Gan	Department of Computer Science and Information Security, Nankai
Yan, Song	Department of Computer Science and Information Security, Nankai
Tang, Benlai	Department of Computer Science and Information Security, Nankai
Keywords: Automatic speech and speaker recognition, Spoken language processing, Artificial neural networks Abstract: When applied to phoneme recognition, the Connectionist Temporal Classification (CTC) objective function allows a neural network to be trained with the phoneme level transcriptions of training utterances. A limitation of the CTC is that it cannot be applied directly to network training with large speech corpora, since those corpora usually only have word level transcriptions. This work extends the CTC such that a novel objective function can be evaluated even if only the word level transcriptions are available. Furthermore, various kinds of pronunciation knowledge are adopted to construct pronunciation networks which can model the pronunciations of connected speech more accurately. When combined with a bidirectional Long Short-Term Memory (LSTM) network, the extended CTC achieves a phoneme error rate of 18.3% on the LibriSpeech corpus. When the pronunciation knowledge is applied, the error rate is reduced by a further 18.6% relative.

15:00-17:10, Paper WePT3.9
Cross-Scenario Clothing Retrieval and Fine-Grained Style Recognition
Li, Zongmin	Univ. of Petroleum
Li, Yante	China Univ. of Petroleum (East China)
Tian, Weiwei	China Univ. of Petroleum(East China)
Pang, Yunping	China Univ. of Petroleum
Liu, Yujie	China Univ. of Petroleum (Huadong)
Keywords: Multimedia analysis, indexing and retrieval, Classification and clustering Abstract: In this paper, we propose a new approach for cross-scenario clothing retrieval and fine-grained clothing style recognition. The query clothing photos captured by cameras or other mobile devices have noisy backgrounds, while the product clothing images used for online shopping are usually presented against a clean background. We tackle this problem in two steps. Firstly, a hierarchical super-pixel merging algorithm based on semantic segmentation is proposed to obtain the intact query clothing item. Secondly, to solve the problem of clothing style recognition in different scenarios, we propose sparse coding based on domain-adaptive dictionary learning to improve the accuracy of the classifier and the adaptability of the dictionary. In this way, we obtain fine-grained attributes of the clothing items and use the attribute matching score to further re-rank the retrieval results. The experimental results show that our method outperforms the state-of-the-art approaches. Furthermore, we build a well labeled clothing dataset, where the images are selected from 1.5 billion product clothing images. Index Terms: clothing retrieval, semantic segmentation, superpixel segmentation, cross-scenario, deep learning, domain-adaptive dictionary learning.

15:00-17:10, Paper WePT3.10
Exploiting Supervised Learning for Finetuning Deep CNNs in Content Based Image Retrieval
Tzelepi, Maria	Department of Informatics, Aristotle Univ. of Thessaloniki
Tefas, Anastasios	Aristotle Univ. of Thessaloniki
Keywords: Multimedia analysis, indexing and retrieval, Deep learning, Artificial neural networks Abstract: In this paper, a novel CNN-based approach in the Content Based Image Retrieval domain that exploits supervised learning is proposed. We employ a deep CNN model to derive feature representations from the activations of the deepest layers, and we refine the weights of the utilized layers in order to produce better image descriptors using information obtained from the available data labels. To this end, we adapt the pretrained model and retrain it on the dataset so that each image representation comes closer in terms of Euclidean distance to its nearest relevant representations and moves away from the irrelevant ones. Experimental results on four publicly available datasets for image retrieval demonstrate the effectiveness of the proposed method in enhancing the retrieval performance, outperforming other CNN-based retrieval techniques on three out of four datasets, as well as traditional hand-crafted approaches.

15:00-17:10, Paper WePT3.11
Automatic Video Description Generation Via LSTM with Joint Two-Stream Encoding
Zhang, Chenyang	The City Coll. of New York
Tian, Ying-li	Ccny, Cuny
Keywords: Multimedia analysis, indexing and retrieval, Image and video analysis and understanding, Deep learning Abstract: In this paper, we propose a novel two-stream framework based on combinational deep neural networks. The framework is mainly composed of two components: one is a parallel two-stream encoding component which learns video encoding from multiple sources using 3D convolutional neural networks, and the other is a long short-term memory (LSTM)-based decoding language model which transfers the input encoded video representations to text descriptions. The merits of our proposed model are: 1) It extracts both temporal and spatial features by exploring the usage of 3D convolutional networks on both raw RGB frames and motion history images. 2) Our model can dynamically tune the weights of different feature channels, since the network is trained end-to-end from learning the combinational encoding of multiple features to the LSTM-based language model. Our model is evaluated on three public video description datasets: one YouTube clips dataset (Microsoft Video Description Corpus) and two large movie description datasets (MPII Corpus and Montreal Video Annotation Dataset), and achieves comparable or better performance than the state-of-the-art approaches in video caption generation.

15:00-17:10, Paper WePT3.12
Multi-Channel Micro-Structure Difference Descriptor for Image Retrieval
Wang, Xuekuan	Tongji Univ
Zhao, Cairong	Tongji Univ
Keywords: Multimedia analysis, indexing and retrieval, Low-level vision, Representation and analysis in pixel/voxel images Abstract: This paper presents a novel image feature representation method, called the multi-channel micro-structure difference descriptor (MCMSDD), for image retrieval. With local feature extraction from a micro-structure and a MAX operator, MCMSDD integrates the advantages of multi-channel local binary encoding and the color difference histogram, fusing color, texture and spatial distribution information. Although it extracts features from the full color image, the dimension of the feature vector is relatively low, and no learning or segmentation is required. To improve the performance of retrieval, a simple re-ranking algorithm is employed. Finally, the proposed MCMSDD is extensively tested on the Corel-2K and Washington datasets, and the experimental results show that the proposed MCMSDD is more effective than the state-of-the-art.

15:00-17:10, Paper WePT3.13
HOOFR: An Enhanced Bio-Inspired Feature Extractor
Nguyen, Dai-Duong	Paris Sud Univ. - Paris Saclay Univ
Elouardi, Abdelhafid	Paris Sud Univ
Aldea, Emanuel	Paris Sud Univ. Paris Saclay Univ
Bouaziz, Samir	Paris Sud Univ
Keywords: Segmentation, features and descriptors, Vision for robotics, Biologically motivated vision Abstract: Feature matching plays an important role in many computer vision applications, such as object recognition, scene reconstruction or image mosaicing. In this paper, we propose an algorithm called Hessian ORB - Overlapped FREAK (HOOFR), which is based on the combination of the ORB detector and the FREAK bio-inspired descriptor. We introduce modifications to the detection and description processes in order to improve HOOFR's reliability, speed and memory footprint. The experiments on a widely used dataset demonstrate the considerable performance of HOOFR compared to SIFT, SURF or ORB in terms of execution time and matching quality, in various matching contexts.

15:00-17:10, Paper WePT3.13
3D Sketch-Based 3D Model Retrieval with Convolutional Neural Network
Yuxiang, Ye	Texas State Univ
Li, Bo	Univ. of Southern Mississippi
Lu, Yijuan	Texas State Univ
Keywords: Multimedia analysis, indexing and retrieval, Pattern Recognition for Search, Retrieval and Visualization, 2D/3D object detection and recognition Abstract: 3D sketch-based 3D model retrieval aims to retrieve similar 3D models using users' hand-drawn 3D sketches as input. Compared with traditional 2D sketch-based retrieval, 3D sketch-based 3D model retrieval is a brand new and challenging research topic. In this paper, we employ an advanced deep learning method and propose a novel 3D sketch-based 3D model retrieval system. Our system has been comprehensively tested on two benchmark datasets and compared with other existing 3D model retrieval algorithms. The experimental results reveal that our approach outperforms other competing state-of-the-art methods and demonstrate the promising potential of our approach for 3D sketch-based applications.

15:00-17:10, Paper WePT3.14
Smart Query Expansion Scheme for CDVS Based on Illumination and Key Features
Lu, Tao	Peking Univ
Zhu, Chuang	Peking Univ
Jia, Huizhu	Peking Univ
Duan, Lingyu	Inst. of Digital Media, School of EE & CS, Peking Univ
Tao, Li	Peking Univ
Song, Jiawen	Peking Univ
Xie, Xiaodong	Peking Univ
Gao, Wen	Peking Univ
Keywords: Multimedia analysis, indexing and retrieval, Pattern Recognition for Search, Retrieval and Visualization, Segmentation, features and descriptors Abstract: Given a query image, retrieving images depicting the same object in a large scale database is becoming an urgent and challenging task. Recently, Compact Description for Visual Search (CDVS) was drafted by the ISO/IEC Moving Pictures Experts Group (MPEG) to support image retrieval applications, and it has been published as an international standard. Unfortunately, in applications with highly variable illumination, perspective and noisy backgrounds, CDVS suffers from an inevitable performance loss. In this paper, we first introduce query expansion to address the performance loss caused by scene complexity in CDVS. Secondly, a query expansion instance selection method based on illumination is proposed, which achieves better performance. Thirdly, we adopt a weighting strategy based on key feature matching scores in basic query expansion to improve retrieval performance. We evaluate our proposed methods on the Oxford (5K images) dataset and a real-world traffic vehicle dataset (12K images), and the results show that the proposed methods boost mean average precision (MAP) by 7% to 10% on the Oxford dataset and 7% to 17% on the vehicle dataset.

15:00-17:10, Paper WePT3.15
Story Segmentation in TV News Broadcast Videos
Kannao, Raghvendra	IIT Guwahati
Guha, Prithwijit	Department of EEE, IIT Guwahati
Keywords: Multimedia analysis, indexing and retrieval, Segmentation, features and descriptors, Image and video analysis and understanding Abstract: Segmentation of TV news broadcasts into semantically meaningful stories is an essential prerequisite for a wide range of video analytics applications. In this work, we introduce a hybrid approach for news story segmentation based on conditional random fields (CRFs). The story boundary detection problem is converted into a shot classification problem by classifying video shots into one of four categories: the start shot, middle shot or end shot of a story, or a single-shot story. To achieve this classification, we introduce two new features: overlay text based semantic similarity and a grid-wise edge orientation histogram. The first feature measures the semantic similarity between video shots by linking them through a set of web news articles. We use overlay text, with its relevance as a weight, to link a set of articles with the video shots. The second feature captures the variations in presentation formats. The CRF model effectively combines these two features to model the news stories. Experimental results on approximately 50 hours of news videos demonstrate the effectiveness of the proposed features. We were able to achieve an F1 score of 81% with our proposed features.

15:00-17:10, Paper WePT3.16
Optimizing Top Precision Performance Measure of Content-Based Image Retrieval by Learning Similarity Function
Liang, Ru-Ze	KAUST
Shi, Lihui	Centerfield Corp
Wang, Haoxiang	Ithaca, NY 14850
Meng, Jiandong	Shandong Medical Coll
Wang, Jim Jing-Yan	Univ. at Buffalo, the State Univ. of New York
Sun, Qingquan	California State Univ. San Bernardino
Gu, Yi	Travelers Canada
Keywords: Multimedia analysis, indexing and retrieval, Semi-supervised learning and spectral methods, Classification and clustering Abstract: In this paper we study the problem of content-based image retrieval. In this problem, the most popular performance measure is the top precision measure, and the most important component of a retrieval system is the similarity function used to compare a query image against a database image. However, up to now, no similarity learning method has been proposed to optimize the top precision measure. To fill this gap, we propose a novel similarity learning method to maximize the top precision measure. We model this problem as a minimization problem whose objective function combines the losses of the relevant images ranked behind the top-ranked irrelevant image and the squared Frobenius norm of the similarity function parameter. This minimization problem is solved as a quadratic programming problem. The experiments over two benchmark data sets show the advantages of the proposed method over other similarity learning methods when the top precision is used as the performance measure.

15:00-17:10, Paper WePT3.19
Video Summarization in a Multi-View Camera Network
Panda, Rameswar	Univ. of California, Riverside
Das, Abir	Univ. of Massachusetts, Lowell
Roy-chowdhury, Amit	Univ. of California, Riverside
Keywords: Image and video analysis and understanding, Multimedia analysis, indexing and retrieval, Signal, image and video processing Abstract: While most existing video summarization approaches aim to extract an informative summary of a single video, we propose a novel framework for summarizing multi-view videos by exploiting both intra- and inter-view content correlations in a joint embedding space. We learn the embedding by minimizing an objective function that has two terms: one due to intra-view correlations and another due to inter-view correlations across the multiple views. The solution can be obtained directly by solving one eigenvalue problem that is linear in the number of multi-view videos. We then employ a sparse representative selection approach over the learned embedding space to summarize the multi-view videos. Experimental results on several benchmark datasets demonstrate that our proposed approach clearly outperforms the state-of-the-art.


WePT4	Poster Session Hall
WeP4	Poster Session

15:00-17:10, Paper WePT4.1
Detection of Duplicate Identities in Streams of Biometric Samples: A Gait-Based Case Study
Ortells, Javier	Univ. Jaume I De Castelló
Mollineda, Ramón A.	Univ. Jaume I
Keywords: Biometric systems and applications Abstract: This paper addresses the problem of determining whether an observed subject has already been seen in a stream of biometric samples. Given a new sample, unlike the common practice of comparing a related match score to a constant threshold, this work introduces a function which takes as input the match score and the position of that sample in the stream, and produces as output a duplicate/non-duplicate decision. The rationale behind the proposal is the expected dependence of the reliability of a match score on the number and quality of the enrolled identities. The position plays the role of a (very basic) context to properly judge a related match score. Experiments were designed on two well-known gait databases under three types of sample arrangement, which recreate scenarios of different complexity levels. Non-parametric statistical tests applied to the detection results showed the superiority of the proposed adaptive approach over threshold-based solutions.

15:00-17:10, Paper WePT4.2
Analyzing Features Learned for Offline Signature Verification Using Deep CNNs
Hafemann, Luiz Gustavo	École De Tech. Supérieure
Sabourin, Robert	École De Tech. Supérieure
Oliveira, Luiz	Federal Univ. of Parana
Keywords: Biometric systems and applications, Deep learning, Other Biometric applications Abstract: Research on Offline Handwritten Signature Verification has explored a large variety of handcrafted feature extractors, ranging from graphology and texture descriptors to interest points. In spite of advancements in the last decades, the performance of such systems is still far from optimal when we test them against skilled forgeries, that is, signature forgeries that target a particular individual. In previous research, we proposed a formulation of the problem to learn features from data (signature images) in a Writer-Independent format, using Deep Convolutional Neural Networks (CNNs), seeking to improve performance on the task. In this research, we further improve the performance of this method, exploring a range of architectures and obtaining a large improvement in state-of-the-art performance on the GPDS dataset, the largest publicly available dataset for the task. On the GPDS-160 dataset, we obtained an Equal Error Rate of 2.74%, compared to 6.97% for the best result published in the literature (which used a combination of multiple classifiers). We also present a visual analysis of the feature space learned by the model, and an analysis of the errors made by the classifier. Our analysis shows that the model is very effective in separating signatures that have a different global appearance, while being particularly vulnerable to forgeries that very closely resemble genuine signatures, even if their line quality is bad, which is the case for slowly-traced forgeries.

15:00-17:10, Paper WePT4.3
Face Spoofing Detection Based on 3D Lighting Environment Analysis of Image Pair
Zhang, Xu	Inst. of Automation Chinese Acad. of Sciences
Hu, Xiyuan	Inst. of Automation, Chinese Acad. of Sciences
Ma, Mingyang	Inst. of Automation, Chinese Acad. of Sciences
Chen, Chen	Chinese Acad. of Sciences
Peng, Silong	Inst. of Automation, Chinese Acad. of Sciences
Keywords: Biometric systems and applications, Face recognition Abstract: In this paper, we present a novel face spoofing detection method based on 3D lighting environment analysis of an image pair collected before and after a change in the lighting environment. Our idea is inspired by the simple observation that the illumination distribution of a spoof face stays stable under the protection of the photo or screen plane, while that of an exposed genuine face changes according to the lighting environment as a natural response of its 3D structure. After estimating two sets of lighting environment coefficients from the client's face image pair with the help of the 3D Morphable Model (3DMM) and the Sphere Harmonic Illumination Model (SHIM), a robust liveness judgement is made by hypothesis tests. Experimental results show the effectiveness of the proposed method against multiple kinds of face attacks, including printed photo, screen photo, and video replay attacks, as well as other advantages: no need for user cooperation, relaxed usage conditions, simple equipment requirements, ease of concealment, and compatibility with face recognition.

15:00-17:10, Paper WePT4.4
Fingerprint Sensor Classification Via Mélange of Handcrafted Features
Agarwal, Akshay	IIIT Delhi
Singh, Richa	IIIT Delhi
Vatsa, Mayank	IIIT Delhi
Keywords: Biometric systems and applications, Forensic biometrics and its applications Abstract: Large scale biometrics projects rely on capturing images/signals from multiple sensors. For example, in India's Aadhaar project, multiple fingerprint sensors of different makes and models are used for data collection. Similarly, in law enforcement applications, different agencies use different fingerprint sensors. These scenarios cause two potential problems: (i) sensor inter-operability and (ii) protecting/recording the chain of evidence. While sensor inter-operability in fingerprints is a well-studied problem, automatically recording the chain of evidence is a relatively less explored research problem. For both problems, one potential approach is to automatically identify sensors based on the input image. This paper presents a novel fingerprint sensor identification algorithm based on multiple features such as Haralick, entropy, statistical and image quality features. The proposed algorithm is evaluated on a large database of 30,000 images with 15 fingerprint sensor classes. The proposed algorithm achieves an accuracy of 96% and requires less than 10 milliseconds per image.

15:00-17:10, Paper WePT4.5
Local Multiple Directional Pattern of Palmprint Image
Fei, Lunke	Harbin Inst. of Tech
Wen, Jie	Shenzhen Graduate School, Harbin Inst. of Tech
Zhang, Zheng	Harbin Inst. of Tech
Yan, Ke	Harbin Inst. of Tech
Zhong, Zuofeng	Harbin Inst. of Tech. Shenzhen Graduate School
Keywords: Biometric systems and applications, Other Biometric applications Abstract: Lines are the most essential and discriminative features of palmprint images, which motivates researchers to propose various line direction based methods for palmprint recognition. Conventional methods usually capture only the single most dominant direction of palmprint images. However, a number of points in palmprint images have two or more dominant directions because of the many crossing lines in palmprint images. In this paper, we propose a local multiple directional pattern (LMDP) to effectively characterize the multiple direction features of palmprint images. LMDP can not only exactly denote the number and positions of dominant directions but also effectively reflect the confidence of each dominant direction. Then, a simple and effective coding scheme is designed to represent the LMDP, and a block-wise LMDP descriptor is used as the feature space of palmprint images in palmprint recognition. Extensive experimental results demonstrate the superiority of the LMDP over conventional powerful descriptors and the state-of-the-art direction based methods in palmprint recognition.

15:00-17:10, Paper WePT4.5
The GIST of Aligning Faces
Yang, Siqi	The Univ. of Queensland
Wiliem, Arnold	The Univ. of Queensland
Lovell, Brian Carrington	The Univ. of Queensland
Keywords: Face recognition, Other applications, Image and video analysis and understanding Abstract: We propose a novel supervised initialization scheme for cascaded face alignment by searching nearest neighbors based on global image descriptors. Unlike existing schemes, which resort to additional large training data sets for learning features, our method does not require additional training steps, which keeps its computational cost low. Moreover, we found that it is sufficient to use a simple low-dimensional global image descriptor that is easy to extract. In particular, in this work we use the GIST features as our global image descriptor. The proposed initialization scheme outperforms existing initialization schemes for face alignment and improves on the state-of-the-art methods on two challenging datasets, 300-W and COFW.

15:00-17:10, Paper WePT4.6
A Geometric-Based Tattoo Retrieval System
Xu, Xingpeng	Nanyang Tech. Univ
Kong, Adams	Nanyang Tech. Univ
Keywords: Forensic biometrics and its applications, Biometric systems and applications, Multi-biometrics Abstract: Various soft biometric traits have been used as hints in forensic investigation. Tattoos, as one of those soft biometric traits, have been used extensively because they are easy for witnesses to remember and describe and appear very often among criminals and victims. Most of the tattoo retrieval systems currently used in police departments are still text-based systems. They depend on labels tagged on the tattoo images in databases. This manual labelling process is very time consuming and subject to loss of some detailed information, such as the accurate location and shape of tattoos. These problems can make tattoo retrieval inefficient and misleading. To address them, a geometric based tattoo retrieval system is developed. It allows witnesses to draw the boundary of a query tattoo and retrieve all the tattoos with a similar shape around the particular location. The system comprises a tattoo detection algorithm, which detects tattoos in full body images; a full body coordinate algorithm, which defines the locations of input tattoo boundaries and of tattoos in databases; and a tattoo shape matching algorithm, which measures the similarity between input boundaries and the boundaries of tattoos in databases. The experimental results on 2188 images show the effectiveness of the proposed system.

15:00-17:10, Paper WePT4.7
Multi-Script Writer Identification Using Dissimilarity
Bertolini, Diego	UFPR
Oliveira, Luiz	Federal Univ. of Parana
Sabourin, Robert	École De Tech. Supérieure
Keywords: Forensic biometrics and its applications, Handwriting Recognition, Other applications Abstract: Multi-script writer identification consists of identifying the writer of a given text written in one script from samples by the same person written in another script. The rationale behind this is that the writing style of an individual remains constant across different scripts. While this hypothesis may hold, recent results on a multi-script writer identification competition show that classical writer-dependent classifiers fail in this task. In this work we investigate the efficacy of a writer-independent classifier based on dissimilarity for multi-script writer identification. The classifiers were trained using two different texture descriptors (LBP and LPQ). Our experiments on 475 writers of the QUWI dataset, which is composed of Arabic and English samples, show that the proposed strategy surpasses the results published in the literature by a large margin, achieving error rates similar to single-script writer identification systems.

15:00-17:10, Paper WePT4.8
Audio-Visual Biometric Recognition Via Joint Sparse Representations
Primorac, Rudi	The Univ. of Western Australia
Togneri, Roberto	The Univ. of Western Australia
Bennamoun, Mohammed	The Univ. of Western Australia
Sohel, Ferdous	Murdoch Univ
Keywords: Multi-biometrics, Face recognition, Speaker recognition Abstract: In this paper we present a novel audio-visual (AV) person identification system based on joint sparse representation. The video features used were vectorized raw pixel values, while i-vectors were used as the audio features. Classification is performed by solving the joint sparsity optimization problem, and fusion is carried out by using the quality (confidence) assigned to each matcher. Our experimental results on the challenging MOBIO database using 100 subjects show that the system based on joint sparse representation outperforms the system based on separate sparse representations for each modality. Furthermore, we show that our newly introduced quality measure improves the system’s performance compared to conventionally used quality measures for sparse representation-based systems.

15:00-17:10, Paper WePT4.9
On Restricting Modalities in Likelihood-Ratio Based Biometric Score Fusion
Murakami, Takao	National Inst. of Advanced Industrial Science and Tech
Kaga, Yosuke	Hitachi, Ltd
Takahashi, Kenta	Hitachi, Ltd
Keywords: Multi-biometrics, Performance Evaluation, Security issues Abstract: Likelihood-ratio based biometric score fusion (LR fusion) has attracted attention since it maximizes accuracy if a log-likelihood ratio (LLR) is accurately estimated. It can also allow a user to select a subset of modalities at the authentication phase by setting LLRs corresponding to missing query samples to 0 (we refer to LR fusion with/without this mode as selective/non-selective LR fusion). However, a recent study proposed a modality selection attack against selective LR fusion, in which an impostor inputs only query samples whose LLRs are larger than 0 (i.e. takes an optimal strategy), and showed that it degrades overall accuracy even if a genuine user also takes this optimal strategy. In this paper, we investigate the impact of the modality selection attack in more detail. Specifically, we study whether the overall accuracy is improved by eliminating "goat" templates, whose LLRs tend to be less than or equal to 0 for genuine users. We investigate, both theoretically and experimentally, whether this restriction of modalities (i.e. elimination of goat templates) increases the KL (Kullback-Leibler) divergence between a genuine score distribution and an impostor's one, which can be compared with password entropy. We first show a negative result that the restriction of modalities hardly increases the KL divergence in selective LR fusion. We then show that it can increase the KL divergence in non-selective LR fusion.
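
Editorial illustration (not taken from the paper): a minimal sketch of selective LR fusion and the modality selection attack as the abstract describes them. It assumes the per-modality LLRs are independent, so the fused score is their sum; the function names and the toy scores are hypothetical.

```python
def fuse_llr(llrs):
    """Fuse per-modality LLRs by summation (independence assumed).

    None marks a missing query sample; as in selective LR fusion,
    its LLR is set to 0 so it does not change the fused score.
    """
    return sum(0.0 if s is None else s for s in llrs)


def modality_selection_attack(llrs):
    """Keep only samples whose LLR is larger than 0; withhold the rest."""
    return [s if (s is not None and s > 0) else None for s in llrs]


# Toy impostor scores for three modalities (invented for illustration).
impostor_llrs = [-2.0, 0.5, -1.5]
all_submitted = fuse_llr(impostor_llrs)
after_attack = fuse_llr(modality_selection_attack(impostor_llrs))
```

With these toy scores the fused impostor score rises once the negative LLRs are withheld, which is the effect the attack exploits.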

15:00-17:10, Paper WePT4.10
Small Scale Single Pulse ECG-Based Authentication Using GLRT That Considers T Wave Shift and Adaptive Template Update with Prior Information
Chun, Se Young	UNIST
Keywords: Other Biometric applications, Biometric systems and applications Abstract: Electrocardiogram (ECG) has been investigated as a promising biometric for the last two decades by exploiting the difference of ECG signals between people. However, it is still challenging to take the ECG signal variation of one person into account. The ECG of one person may vary due to the person’s multiple states (e.g., tension, relaxation, cardio exercise) or anatomical / physiological changes of the heart over a long period of time (e.g., heart disease). It has been shown that these types of ECG signal variations result in low authentication performance. We propose a generalized likelihood ratio test (GLRT) based authentication metric that considers T wave shift. Our proposed GLRT based method does not require knowledge of the heart rate (HR), which usually cannot be obtained when using single pulse ECG. We also propose an adaptive ECG template update scheme based on a penalized maximum likelihood estimator with prior information, namely the previously obtained ECG template. Our proposed methods require neither high computational complexity nor other people’s ECG information, so they can potentially be implemented in small scale devices such as low cost wearable bands with limited access to others’ ECG data. The proposed methods were evaluated with the public ECG-ID database (89 subjects) from PhysioNet, which contains varying HR and acquisitions over multiple days for some subjects. ECG Set S denotes partial data of ECG-ID that contains 2 records per subject that were collected in different sessions of the same day, and ECG Set A denotes another data set including ECG Set S and an additional 2 records per subject for 25 subjects that were collected in different sessions, partially on different days. A classical Euclidean metric yielded 4.7% EER (equal error rate) for ECG Set S and 8.1% EER for ECG Set A. Our proposed GLRT based metric yielded improved EER over Euclidean distance: 3.9% for ECG Set S and 6.5% for ECG Set A. The proposed GLRT metric with adaptive template update achieved 4.8% EER for ECG Set A.
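
Editorial illustration (not taken from the paper): the results above are reported as equal error rates. The sketch below shows one common way to approximate an EER from distance scores, assuming that a smaller distance means a better match and that every observed score is tried as a threshold. The function name and the toy distances are hypothetical and unrelated to the ECG-ID results.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER for distance scores (smaller = better match).

    Each observed score is tried as an acceptance threshold; the EER is
    taken as the mean of FAR and FRR where the two rates are closest.
    """
    best_gap, eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(d > t for d in genuine) / len(genuine)      # genuine rejected
        far = sum(d <= t for d in impostor) / len(impostor)   # impostor accepted
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy distances, invented for illustration only.
genuine_distances = [0.1, 0.2, 0.3, 0.6]
impostor_distances = [0.4, 0.5, 0.7, 0.8]
toy_eer = equal_error_rate(genuine_distances, impostor_distances)
```

A finer threshold grid or interpolation between neighbouring thresholds would give a smoother estimate on larger score sets.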

15:00-17:10, Paper WePT4.11
A Score Calculation Method Using Positional Information of Feature Points for Biometric Authentication
Ito, Koichi	Tohoku Univ
Aoki, Takafumi	Tohoku Univ
Keywords: Other Biometric applications, Biometric systems and applications, Other applications Abstract: A lot of feature-based correspondence matching methods have been proposed in the field of computer vision, image processing and pattern recognition. These methods are also effective for biometric recognition. In general, in the case of feature-based matching methods, the matching score is calculated as a ratio between the number of feature points and corresponding points. These methods need to normalize image deformation by fitting an image transformation model to images according to the correspondence between images. Then, the matching score is calculated from the normalized images so as to take into consideration image deformation. On the other hand, this paper proposes a score calculation method which calculates a matching score from positional information of corresponding point pairs. The proposed method does not need any deformation model defined for each biometric trait to handle image deformation. The combination of the matching scores defined by the number of corresponding points and the positional information improves the performance of biometric recognition algorithms, since these scores play a complementary role in decision. Through a set of experiments using a palmprint image database, we demonstrate that the proposed method exhibits efficient performance for biometric recognition.

15:00-17:10, Paper WePT4.12
Tattoo Detection and Localization Using Region-Based Deep Learning
Sun, Zhaohui	Kitware Inc
Baumes, Jeffrey	Kitware Inc
Tunison, Paul	Kitware, Inc
Turek, Matthew	Kitware, Inc
Hoogs, Anthony	Kitware
Keywords: Soft biometrics, Forensic biometrics and its applications, Graphics Recognition Abstract: Tattoos have been increasingly used as a discriminative soft biometric for people identification, such as criminal and victim identification in forensics investigation and law enforcement. However, automatic detection of tattoo images and accurate localization of the regions of interest are challenged by the large variations in artistic composition, color, shape, texture, location on the body, local geometric shape (e.g. neck and finger), imaging conditions, and image quality. In this paper, we train a tattoo detector from the Tatt-C and PASCAL VOC 2007 image datasets using region-based deep learning. The detector can effectively determine if an image contains tattoos and the locations of tattoo regions. We carry out a comprehensive evaluation of our tattoo image classification and detection localization. The detector improves upon the state-of-the-art algorithms in the Tatt-C challenge, achieving a better detection error trade-off curve. It yields low confidence scores on randomly sampled non-tattoo images from 397 scene categories in the MIT-SUN dataset. In addition, the same detector is also validated on the NTU Tattoo Image Dataset with 10000 images.

15:00-17:10, Paper WePT4.13
Image-Based Gender Estimation from Body and Face across Distances
Gonzalez-Sosa, Ester	Univ. Autónoma De Madrid
Antitza, Dantcheva	INRIA
Vera-Rodriguez, Ruben	Univ. Autonoma De Madrid
Dugelay, Jean-Luc	Eurécom
Bremond, Francois	INRIA (Inst. National De Recherche En Informatique Etautomat
Fierrez, Julian	Univ. Autonoma De Madrid
Keywords: Soft biometrics, Other Biometric applications, Pattern Recognition for Surveillance and Security Abstract: Gender estimation has received increased attention due to its use in a number of pertinent security and commercial applications. Automated gender estimation algorithms are mainly based on extracting representative features from face images. In this work we study gender estimation based on information deduced jointly from face and body, extracted from single-shot images. The approach addresses challenging settings such as low-resolution images, as well as settings in which faces are occluded. Specifically, the face-based features include local binary patterns (LBP) and scale-invariant feature transform (SIFT) features, projected into a PCA space. The features of the novel body-based algorithm proposed in this work include continuous shape information extracted from body silhouettes and texture information retained by HOG descriptors. Support Vector Machines (SVMs) are used for classification for body and face features. We conduct experiments on images extracted from video sequences of the Multi-Biometric Tunnel database, emphasizing three distance settings: close, medium and far, ranging from full body exposure (far setting) to head and shoulders exposure (close setting). The experiments suggest that while face-based gender estimation performs best in the close-distance setting, body-based gender estimation performs best when a large part of the body is visible. Finally we present two score-level fusion schemes of face and body-based features, outperforming the two individual modalities in most cases.

15:00-17:10, Paper WePT4.14
Retrieving Relative Soft Biometrics for Semantic Identification
Martinho-Corbishley, Daniel	Univ. of Southampton
Nixon, Mark	Univ. of Southampton
Carter, John	Univ. of Southampton
Keywords: Soft biometrics, Pattern Recognition for Surveillance and Security, Deep learning Abstract: Automatically describing pedestrians in surveillance footage is crucial to facilitate human accessible solutions for suspect identification. We aim to identify pedestrians based solely on human description, by automatically retrieving semantic attributes from surveillance images, alleviating exhaustive label annotation. This work unites a deep learning solution with relative soft biometric labels, to accurately retrieve more discriminative image attributes. We propose a Semantic Retrieval Convolutional Neural Network to investigate automatic retrieval of three soft biometric modalities, across a number of 'closed-world' and 'open-world' re-identification scenarios. Findings suggest that relative-continuous labels are more accurately predicted than absolute-binary and relative-binary labels, improving semantic identification in every scenario. Furthermore, we demonstrate a top rank-1 improvement of 23.2% and 26.3% over a traditional, baseline retrieval approach, in one-shot and multi-shot re-identification scenarios respectively.

15:00-17:10, Paper WePT4.15
Automated Help System for Novice Older Users from Touchscreen Gestures
Sato, Daisuke	IBM
Morimura, Tetsuro	IBM Res. - Tokyo
Katsuki, Takayuki	IBM Res. - Tokyo
Toyota, Yosuke	KDDI R&D Lab. Inc
Kato, Tsuneo	Doshisha Univ
Takagi, Hironobu	IBM Res. - Tokyo
Keywords: Human Computer Interaction, Machine learning and data mining, Reinforcement learning and temporal models Abstract: Older adults who have never used a smartphone often struggle to get used to smartphone gestures because they lack basic knowledge of or skills with the latest technologies, such as gesture-oriented touchscreens. In this paper, we propose a user modeling method for inferring the problems novice users face with smartphones from their touchscreen gestures. The output of the user model is used by an automated help system that enables them to acquire touchscreen gestures. We apply a feature extraction approach based on frequent pattern mining of gesture sequences to the user modeling. The learned user model detects types of problems in real time and is used for automated help. To optimize the timing and selection of instructions, we use a Bayesian reinforcement learning approach, which balances the exploration-exploitation trade-off. We evaluate the effectiveness of the method by using a prototype assistant system for a map application. The evaluation with older (60+) novice users showed positive results. The performance of the prototype system and the potential for further application are discussed.

15:00-17:10, Paper WePT4.16
Knowing When You Don't: Bag of Visual Words with Reject Option for Automatic Visual Inspection of Bulk Materials
Richter, Matthias	Karlsruhe Inst. of Tech
Längle, Thomas	Fraunhofer Inst. of Optronics, System Tech. and Image
Beyerer, Jürgen	Fraunhofer Inst. of Optronics, System Tech. and Image
Keywords: Industrial image analysis, Other applications, Classification and clustering Abstract: Visual inspection of bulk material is the thorough optical inspection of streams of granular material to assess their quality or to detect defective objects. Examples are found in mining (discovery of ores), recycling (sorting waste from reusable material) and food safety (detection of pathogens). In these applications, it is generally not feasible or even possible to provide an accurate and exhaustive training set of all the materials that can be encountered during the inspection. Instead, classification has to be performed in an open world setting, i.e., with the option to recognize and reject unknown objects. Despite the practical relevance, prior work on this topic is surprisingly sparse. Here, we present a method to augment bag of visual words object descriptors by an additional unknown word that encodes outliers. The method depends on only a few parameters that have a clear interpretation and is suitable for application in the field. We demonstrate the performance of our approach using two real-world datasets and compare it to a related method. The experiments show that our method significantly outperforms classification with a closed world assumption as well as the related method.

15:00-17:10, Paper WePT4.17
GPU-Accelerated Descriptor Extraction Process for 3D Registration in Augmented Reality
Garrett, Timothy	Iowa State Univ
Sheaffer, Jeremy	Department of Computer Science, Iowa State Univ
Radkowski, Rafael	Iowa State Univ
Keywords: Mixed and Augmented Reality, Segmentation, features and descriptors Abstract: Augmented Reality (AR) is a type of human-computer interaction that overlays virtual information on a user's natural visual perception of the environment. Determining where to place these virtual objects so they appear a part of the environment requires real-time tracking. One type of tracking uses point clouds, processed from commodity depth sensors, such as the Microsoft Kinect. In this format, tracking becomes a 3D registration problem which can be quickly solved using local registration methods; however, these techniques require a sufficient overlap between the object-to-track, and the sampled object points from the range camera. 3D descriptors can be used to provide this initial overlap, but require significant computational resources. This work extends previous research towards real-time descriptor computation on large point sets. To do so, the 3D descriptor extraction process is broken down into fundamental algorithmic steps. These steps are optimized by computing them on the GPU using NVIDIA's CUDA framework. Results of an experiment indicate the opportunity for large speed-ups in the most computationally intensive portions of descriptor extraction.

15:00-17:10, Paper WePT4.18
Pollen Recognition Using a Multi-Layer Hierarchical Classifier
Daood, Amar	Florida Inst. of Tech
Ribeiro, Eraldo	Florida Inst. of Tech
Bush, Mark	Florida Inst. of Tech
Keywords: Other applications, 2D/3D object detection and recognition, Segmentation, features and descriptors Abstract: We propose a method to recognize pollen grains using a two-stage classifier. First, texture classification categorizes the pollen grains into sub-groups. Then, a final classification of individual pollen types is done by segmenting the image into multiple layers of regions for each pollen image. The main novelty in our method is threefold: (1) Adopting two successive classification stages. (2) Combining hierarchical clustering and SVM algorithms to merge similar pollen types into sub-groups. (3) Adopting a layering approach prior to performing feature extraction. The combination of these aspects gives excellent results. We evaluated our method using 1,063 light-microscopy images of pollen grains from 30 species. The results show that: (1) the layering technique increases the classification rate by almost 7% over using the same features directly. (2) adopting two classification stages increases the classification rate by 6%. (3) the proposed system outperformed traditional techniques.

15:00-17:10, Paper WePT4.19
Bayesian Neural Networks Based Bootstrap Aggregating for Tropical Cyclone Tracks Prediction in South China Sea (withdrawn from program)
Zhu, Lei	East China Normal Univ.
Jin, Jian	East China Normal Univ.
Cannon, Alex	Univ. of British Columbia
Hsieh, William	Univ. of British Columbia

15:00-17:10, Paper WePT4.20
Topological Weighted Fisher Vectors for Person Re-Identification
Ksibi, Salma	Univ. of Sfax, BP 1173, Sfax, 3038, Tunisia
Mejdoub, Mahmoud	REGIM-Lab: Res. Groups on Intelligent Machines,
Ben Amar, Chokri	Res. Group on Intelligent Machines
Keywords: Pattern Recognition for Surveillance and Security, Signal, image and video processing, Representation and analysis in pixel/voxel images Abstract: Person re-identification is a fundamental and challenging task in Computer Vision that consists of recognizing the same person across multiple potentially non-overlapping cameras. It is difficult because of challenges such as pose, background clutter and occlusion, illumination changes and low resolution. Also, most of the existing approaches rely on brute-force matching between pedestrian local descriptors and consequently suffer from low computational efficiency. To address these issues, we present a new perspective for person re-identification based on a histogram encoding scheme that assigns a global signature to each pedestrian image and thus simplifies the matching process. The main contribution of this paper is the design of an extended weighted version of the traditional Fisher vector (FV) encoding scheme. This is achieved by incorporating the Topological location of the encoded descriptors CN, CHS and 15-d in the encoding process and then combining the obtained Topological weighted histograms in order to form our proposed descriptor. The super Fisher vector representation has improved both the rate and the speedup of the person matching process, while weighting the FV encoding scheme by the Topological weight helped to remove the noisy and busy background clutter surrounding the pedestrians in the images. In addition, the Retinex transform was applied in order to handle the problem of illumination variations. Experimental results on three challenging datasets, the VIPeR dataset, the CUHK03 dataset and the Market-1501 dataset, prove the effectiveness of the proposed method.

15:00-17:10, Paper WePT4.21
Fast Thresholding of High Dimensional Euclidean Distances Using Binary Squaring
Röwekamp, Jan Henrik	Univ. of Hamburg
Keywords: Performance Evaluation, Other applications Abstract: This paper presents a new approximate method to threshold Euclidean distances. The procedure utilizes so-called binary squares, which will be introduced and formally put into relation to conventional square computations. The proposed method can significantly speed up distance matching. The speed gain depends on several parameters, such as the dimension count (the higher the better), the total number of distances and the percentage of distances outside of the desired thresholded area. In the worst case its running time degenerates to the running time of conventional square based computation. The method is essentially presented for integer coordinates, but can, to a certain degree, be applied to floating point coordinates as well. Although it is an approximation method, its results diverge from the correct results only by a small margin in most cases.

15:00-17:10, Paper WePT4.22
YACCLAB - Yet Another Connected Components Labeling Benchmark
Grana, Costantino	Univ. Degli Studi Di Modena E Reggio Emilia
Bolelli, Federico	Univ. Degli Studi Di Modena E Reggio Emilia
Baraldi, Lorenzo	Univ. of Modena and Reggio Emilia
Vezzani, Roberto	Univ. of Modena and Reggio Emilia
Keywords: Performance Evaluation, Signal, image and video processing, Segmentation, features and descriptors Abstract: The problem of labeling the connected components (CCL) of a binary image is well-defined and several proposals have been presented in the past. Since an exact solution to the problem exists and must be provided as output, algorithms mainly differ in their execution speed. In this paper, we propose and describe YACCLAB, Yet Another Connected Components Labeling Benchmark. Together with a rich and varied dataset, YACCLAB contains an open source platform to test new proposals and to compare them with publicly available competitors. Textual and graphical outputs are automatically generated for three kinds of test, which analyze the methods from different perspectives. The fairness of the comparisons is guaranteed by running on the same system and over the same datasets. Examples of usage and the corresponding comparisons among state-of-the-art techniques are reported to confirm the potential of the benchmark.
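
Editorial illustration (not part of YACCLAB and not one of the algorithms it benchmarks): a minimal sketch of the problem being solved, written as a classical two-pass labeling with union-find. It assumes 8-connectivity and an image given as a list of rows of 0/1 values; the function name is hypothetical.

```python
def label_components(img):
    """Two-pass, 8-connected labeling of a binary image (list of 0/1 rows).

    Returns (labels, count): labels has the shape of img, with 0 for
    background and consecutive integers 1..count for the components.
    """
    parent = [0]  # parent[i]: union-find parent of provisional label i

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    h = len(img)
    w = len(img[0]) if h else 0
    labels = [[0] * w for _ in range(h)]

    # First pass: assign provisional labels and record equivalences.
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            roots = []
            # Neighbours already visited in raster order: W, NW, N, NE.
            for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny and 0 <= nx < w and labels[ny][nx]:
                    roots.append(find(labels[ny][nx]))
            if not roots:
                parent.append(len(parent))      # new label is its own root
                labels[y][x] = len(parent) - 1
            else:
                root = min(roots)
                labels[y][x] = root
                for r in roots:
                    parent[r] = root            # merge equivalent labels

    # Second pass: replace provisional labels by consecutive final ones.
    remap = {}
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = remap.setdefault(root, len(remap) + 1)
    return labels, len(remap)


labels, count = label_components([[1, 0, 1],
                                  [1, 1, 1]])
```

Because every correct algorithm must return the same partition of foreground pixels, a sketch like this can serve as a reference output when checking a faster implementation.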

15:00-17:10, Paper WePT4.23
Predicting Privileged Information for Height Estimation
Sarafianos, Nikolaos	Univ. of Houston
Nikou, Christophoros	Univ. of Ioannina
Kakadiaris, Ioannis	Univ. of Houston
Keywords: Soft biometrics, Biometric systems and applications Abstract: In this paper, we propose a novel regression-based method for employing privileged information to estimate the height using human metrology. The actual values of the anthropometric measurements are difficult to estimate accurately using state-of-the-art computer vision algorithms. Hence, we use ratios of anthropometric measurements as features. Since many anthropometric measurements are not available at test time in real-life scenarios, we employ a learning using privileged information (LUPI) framework in a regression setup. Instead of using the LUPI paradigm for regression in its original form (i.e., ε-SVR+), we train regression models that predict the privileged information at test time. The predictions are then used, along with observable features, to perform height estimation. Once the height is estimated, a mapping to classes is performed. We demonstrate that the proposed approach can estimate the height better and faster than the ε-SVR+ algorithm and report results for different genders and quartiles of humans.

15:00-17:10, Paper WePT4.24
Are Facial Attributes Adversarially Robust?
Rozsa, Andras	Univ. of Colorado at Colorado Springs
Gunther, Manuel	Univ. of Colorado at Colorado Springs
Rudd, Ethan	Univ. of Colorado at Colorado Springs
Boult, Terry	U. Colorado at Colorado Springs
Keywords: Soft biometrics, Deep learning, Facial expression recognition Abstract: Facial attributes are emerging soft biometrics that have the potential to reject non-matches, for example, based on mismatching gender. To be usable in stand-alone systems, facial attributes must be extracted from images automatically and reliably. In this paper, we propose a simple yet effective solution for automatic facial attribute extraction by training a deep convolutional neural network (DCNN) for each facial attribute separately, without using any pre-training or dataset augmentation, and we obtain new state-of-the-art facial attribute classification results on the CelebA benchmark. To test the stability of the networks, we generated adversarial images – formed by adding imperceptible non-random perturbations to original inputs which result in classification errors – via a novel fast flipping attribute (FFA) technique. We show that FFA generates more adversarial examples than other related algorithms, and that DCNNs for certain attributes are generally robust to adversarial inputs, while DCNNs for other attributes are not. This result is surprising because no DCNNs tested to date have exhibited robustness to adversarial images without explicit augmentation in the training procedure to account for adversarial examples. Finally, we introduce the concept of natural adversarial samples, i.e., images that are misclassified but can be easily turned into correctly classified images by applying small perturbations. We demonstrate that natural adversarial samples commonly occur, even within the training set, and show that many of these images remain misclassified even with additional training epochs. This phenomenon is surprising because correcting the misclassification, particularly when guided by training data, should require only a small adjustment to the DCNN parameters.


WePT5	Poster Session Hall
WeP5	Poster Session

15:00-17:10, Paper WePT5.1
A Deep Learning Architecture for Protein-Protein Interaction Article Identification
Yadav, Shweta	IIT Patna
Ekbal, Asif	IIT Patna
Saha, Sriparna	IIT Patna
Bhattacharyya, Pushpak	IIT Patna
Keywords: Classification and clustering, Deep learning, Pattern Recognition for Bioinformatics Abstract: In the recent past there has been phenomenal growth in biomedical literature and health care records. Robust text mining techniques are essential in order to properly organize the documents as well as to extract relevant information. Traditional techniques for document classification focus on machine learning algorithms in which the classifier is learned from labeled data and the features that are prominent. In this paper we focus on developing an automated technique for distinguishing biomedical articles that contain protein-protein interaction related information from those that do not. Our proposed approach is based on a deep neural network framework. We investigate the role of the convolutional neural network (CNN) and propose two model variants. We evaluate the proposed approach on the benchmark data sets of the BioCreative-II Interaction Article Subtask (IAS). The effectiveness of our proposed model is evident from the significant performance gains of 2.8% in terms of F-measure and 5% in terms of accuracy over the traditional models.

15:00-17:10, Paper WePT5.2
An Efficient Radiographic Image Retrieval System Using Convolutional Neural Network
Chowdhury, Manish	School of Tech. and Health, KTH Royal Inst. of Tech
Rota Bulo', Samuel	Bruno Kessler Foundation
Moreno, Rodrigo	KTH Royal Inst. of Tech
Kundu, Malay Kumar	Indian Statistical Inst
Smedby, Örjan	Linköping Univ
Keywords: Content based image retrieval and data mining, Medical image and signal analysis, Deep learning Abstract: Content-Based Medical Image Retrieval (CBMIR) is an important research field in the context of medical data management. In this paper we propose a novel CBMIR system for the automatic retrieval of radiographic images. Our approach employs a Convolutional Neural Network (CNN) to obtain high level image representations that enable a coarse retrieval of images that correspond to a query image. The retrieved set of images is refined via a non-parametric estimation of putative classes for the query image, which are used to filter out potential outliers in favour of more relevant images belonging to those classes. The refined set of images is finally re-ranked using the Edge Histogram Descriptor, i.e. a low-level edge-based image descriptor that captures finer similarities between the retrieved set of images and the query image. To improve the computational efficiency of the system, we employ dimensionality reduction via Principal Component Analysis (PCA). Experiments were carried out to evaluate the effectiveness of the proposed system on medical data from the "Image Retrieval in Medical Applications" (IRMA) benchmark database. The obtained results show the effectiveness of the proposed CBMIR system in the field of medical image retrieval.

15:00-17:10, Paper WePT5.3
Simultaneous Food Localization and Recognition
Bolaños, Marc	Univ. De Barcelona
Radeva, Petia	CVC
Keywords: Deep learning, Image and video analysis and understanding, Classification and clustering Abstract: The development of automatic nutrition diaries, which would allow us to keep objective track of everything we eat, could enable a whole new world of possibilities for people concerned about their nutrition patterns. With this purpose, in this paper we propose the first method for simultaneous food localization and recognition. Our method is based on two main steps: first, producing a food activation map on the input image (i.e. a heat map of probabilities) for generating bounding box proposals and, second, recognizing each of the food types or food-related objects present in each bounding box. We demonstrate that our proposal, compared to the most similar current problem, object localization, is able to obtain high precision and reasonable recall levels with only a few bounding boxes. Furthermore, we show that it is applicable to both conventional and egocentric images.

15:00-17:10, Paper WePT5.4
Semi-Supervised Learning of Anatomical Manifolds for Atlas-Based Segmentation of Medical Images
Borga, Magnus	Linköping Univ
Andersson, Thord	Linköping Univ
Dahlqvist Leinhard, Olof	Center for Medical Image Science and Visualization (CMIV), Depar
Keywords: Medical image and signal analysis Abstract: This paper presents a novel method for atlas-based segmentation of medical images. The method uses semi-supervised learning of a graph describing a manifold of anatomical variations of whole-body images, where unlabelled data are used to find a path with small deformations from the labelled atlas to the target image. The method is evaluated on 36 whole-body magnetic resonance images with manually segmented livers as ground truth. Significant improvement (p < 0.001) was obtained compared to direct atlas-based registration.

15:00-17:10, Paper WePT5.5
Barcodes for Medical Image Retrieval Using Autoencoded Radon Transform
Tizhoosh, Hamid Reza	Univ. of Waterloo
Mitcheltree, Christopher	Univ. of Waterloo
Zhu, Shujin	Nanjing Univ. of Science and Tech
Dutta, Shamak	Univ. of Waterloo
Keywords: Medical image and signal analysis, Content based image retrieval and data mining, Pattern Recognition for Search, Retrieval and Visualization Abstract: Using content-based binary codes to tag digital images has emerged as a promising retrieval technology. Recently, Radon barcodes (RBCs) have been introduced as a new binary descriptor for image search. RBCs are generated by binarization of Radon projections and by assembling them into a vector, namely the barcode. A simple local thresholding has been suggested for binarization. In this paper, we put forward the idea of “autoencoded Radon barcodes”. Using images in a training dataset, we autoencode Radon projections to perform binarization on outputs of hidden layers. We employed the mini-batch stochastic gradient descent approach for the training. Each hidden layer of the autoencoder can produce a barcode using a threshold determined based on the range of the logistic function used. The compressing capability of autoencoders apparently reduces the redundancies inherent in Radon projections, leading to more accurate retrieval results. The IRMA dataset with 14,410 x-ray images is used to validate the performance of the proposed method. The experimental results, which include comparisons with RBCs, SURF and BRISK, show that the autoencoded Radon barcode (ARBC) has the capacity to capture important information and to learn richer representations, resulting in lower retrieval errors for image retrieval measured with the accuracy of the first hit only.

15:00-17:10, Paper WePT5.6
VTA Estimation for DBS Planning Based on a Hierarchical K-Nearest Neighbor Approach
De La Pava Panche, Iván	Univ. Tecnológica de Pereira
Alvarez-Meza, Andres Marino	Univ. Nacional de Colombia
Álvarez, Mauricio A.	Univ. Tecnológica de Pereira
Henao Gallo, Oscar Alberto	Univ. Tecnológica de Pereira
Orozco Gutiérrez, Álvaro Ángel	Univ. Tecnológica de Pereira

15:00-17:10, Paper WePT5.7
Multiple Adverse Effects Prediction in Longitudinal Cancer Treatment
Li, Cheng	Deakin Univ
Gupta, Sunil Kumar	Deakin Univ
Rana, Santu	Deakin Univ
Nguyen, Vu	Deakin Univ
Venkatesh, Svetha	Deakin Univ
Ashley, David	Barwon Health Geelong Australia
Livingston, Trish	Deakin Univ
Keywords: Statistical, syntactic and structural pattern recognition, Machine learning and data mining, Classification and clustering Abstract: Adverse effects, such as voice change and fatigue, are prevalent during cancer treatment. These adverse effects are a significant burden on patients, both physically and emotionally. Predicting multiple adverse effects is therefore important for patients and oncologists. In this paper, we formulate the prediction of multiple adverse effects in cancer treatment as a longitudinal multiple-output regression problem. The correlated multiple outputs are first decoupled into uncorrelated ones in a new output space. We then propose a comprehensive framework to capture the empirical loss between the predicted value and the ground truth in the transformed space and the temporal smoothness at neighboring prediction points. Experiments were performed on one synthetic dataset and two real-world datasets including radiotherapy and chemotherapy treatments. Results in terms of root mean square error (RMSE) and R-value show that our proposed approach is promising for the longitudinal multiple-output regression problem.

15:00-17:10, Paper WePT5.30
Segmentation of Defects on Log Surface from Terrestrial Lidar Data
Nguyen, Van-Tho	INRA Lorraine
Kerautret, Bertrand	LORIA, Univ. De Lorraine
Debled-Rennesson, Isabelle	LORIA - Nancy Univ
Colin, Francis	LERFOB, AgroParisTech, INRA, 54000 NANCY France
Piboule, Alexandre	ONF, RDI, 5 Rue Girardet, 54000 Nancy, France
Constant, Thiéry	INRA
Keywords: Biological image and signal analysis, Segmentation, features and descriptors, 2D/3D object detection and recognition Abstract: Segmentation of defects on the tree log surface remains a challenge due to the unclear separation between the foreground and the background and the high variability of the tree surface. Although some early work addresses specific tree species, a generic method robust to various species is missing. We propose a new approach for segmenting defects on the log surface based on tubular object analysis. We first compute the log centerline by surface normal accumulation and then threshold the point cloud by the difference between the distance to the centerline and the reference distance estimated from a patch of neighbors. The performance of the proposed approach was evaluated and compared on ten logs recovered from different species. The results showed that our approach outperformed another method based on cylinder detection and was robust to several tree species. The results can be reproduced and compared on an online demonstration.


WeBT1	G.Cancun T1.A
WePMO1	Oral Session

17:10-17:30, Paper WeBT1.1
Clustering for Point Pattern Data
Tran, Nhat-Quang	Curtin Univ
Vo, Ba-Ngu	Curtin Univ
Phung, Dinh	Deakin Univ
Vo, Ba-Tuong	Curtin Univ
Keywords: Classification and clustering, Machine learning and data mining Abstract: Clustering is one of the most common unsupervised learning tasks in machine learning and data mining. Clustering algorithms have been used in a plethora of applications across several scientific fields. However, there has been limited research in the clustering of point patterns – sets or multi-sets of unordered elements – that are found in numerous applications and data sources. In this paper, we propose two approaches for clustering point patterns. The first is a non-parametric method based on novel distances for sets. The second is a model-based approach, formulated via random finite set theory, and solved by the Expectation–Maximization algorithm. Numerical experiments show that the proposed methods perform well on both simulated and real data.

17:30-17:50, Paper WeBT1.2
Simplifying Gaussian Mixture Model Via Model Similarity
Wan, Yuchai	School of Computer Science, Beijing Inst. of Tech
Liu, Xiabi	School of Computer Science, Beijing Inst. of Tech
Tang, Yuyang	School of Computer Science, Beijing Inst. of Tech
Keywords: Classification and clustering, Machine learning and data mining, Content based image retrieval and data mining Abstract: Mixture models are crucial statistical modeling tools at the heart of many challenging applications in computer vision, pattern recognition, etc. Simplification of mixture models has recently emerged as an important issue in the field of statistical learning. In this paper, we propose a novel Gaussian mixture model simplification approach using only the model’s parameters, avoiding the use of the original data records, which may impose a heavy computational and storage burden. We integrate the inter-model similarity and intra-model independence to introduce a similarity measure between two Gaussian mixture models. An objective function is further designed that aims to keep a balance between model similarity and simplification degree, and a heuristic simulated annealing method is presented to search for the optimal parameter set of the simplified model. The experimental results confirm that our approach is effective and promising.

17:50-18:10, Paper WeBT1.3
Sparse Coding with Unity Range Codes and Label Consistent Discriminative Dictionary Learning
Nilsson, Mikael	Lund Univ
Keywords: Classification and clustering, Machine learning and data mining, Semi-supervised learning and spectral methods Abstract: A novel sparse coding framework with unity range codes and the possibility to produce a discriminative dictionary is presented. The framework is, in contrast to many other works, able to handle unsupervised, supervised and semi-supervised settings. Furthermore, codes are constrained to be in unity range, which is beneficial in many scenarios. The paper presents the framework and solvers used to produce dictionaries and codes. Experiments in image reconstruction and feature learning for classification highlight the benefits of the proposed framework.

18:10-18:30, Paper WeBT1.4
Wind Turbine Fault Prediction Using Soft Label SVM
Zhao, Rui	RPI
Iqbal, Md Ridwan Al	RPI
Bennett, Kristin	RPI
Ji, Qiang	RPI
Keywords: Classification and clustering, Machine learning and data mining, Support vector machines and kernel methods Abstract: In this paper, we address the problem of predicting wind turbine electrical subsystem faults using time series data obtained from multiple sensors on wind turbines. While considering this as a time series classification problem, we face the challenge that there is no explicit label information regarding the temporal location and duration of symptoms of the fault. Besides, significant data variation caused by both external and internal factors makes the identification of change points non-trivial. To address these challenges, we propose a soft label SVM method in which the probability of a fault, instead of a binary label, is used to train the classifier to handle the uncertainty in label information. The probability is determined using temporal information of fault instances. We consider this as a weakly supervised learning problem. To handle large variation within the data, we perform customized normalization on different sensor data based on their physical meanings and relationships. Finally, we evaluate our method on 38 different forced outage instances. Experiments on real SCADA data obtained from wind turbines show promising results, where we can predict the triggering of a fault 18 hours beforehand with an average AUC value of 0.91.


WeBT2	G.Cancun T1.B
WePMO2	Oral Session

17:10-17:30, Paper WeBT2.1
Fast Local Polynomial Regression Approach for Speckle Noise Removal
Sharabati, Walid	Purdue Univ
Xi, Bowei	Purdue Univ
Keywords: Image based modeling, Signal, image and video processing, Image and video analysis and understanding Abstract: In this paper we focus on speckle noise removal. Previously, variational models have been proposed to remove multiplicative speckle noise. In general, variational models require a significant amount of run time to converge, and their tuning parameters must be set properly to achieve optimal noise reduction results. In this paper, we present a local polynomial regression model for speckle noise removal. Our regression model is fast, does not need to be trained on a set of images, does not rely on tuning parameters, and is capable of performing fast speckle noise removal on high resolution images. We have conducted extensive experiments to evaluate our model's performance. Our polynomial regression filter outperformed popular noise removal algorithms.

17:30-17:50, Paper WeBT2.2
Efficient Volumetric Fusion of Airborne and Street-Side Data for Urban Reconstruction
Bodis-Szomoru, Andras	ETH Zurich, Computer Vision Lab
Riemenschneider, Hayko	ETH Zurich
Van Gool, Luc	ETH Zurich and Univ. of Leuven
Keywords: Image based modeling, Stereo and multiple view geometry, Reconstruction and camera motion estimation Abstract: Airborne acquisition and on-road mobile mapping provide complementary 3D information of an urban landscape: the former acquires roof structures, ground, and vegetation at a large scale, but lacks the facade and street-side details, while the latter is incomplete for higher floors and often totally misses out on pedestrian-only areas or undriven districts. In this work, we introduce an approach that efficiently unifies a detailed street-side Structure-from-Motion (SfM) or Multi-View Stereo (MVS) point cloud and a coarser but more complete point cloud from airborne acquisition in a joint surface mesh. We propose a point cloud blending and a volumetric fusion based on ray casting across a 3D tetrahedralization (3DT), extended with data reduction techniques to handle large datasets. To the best of our knowledge, we are the first to adopt a 3DT approach for airborne/street-side data fusion. Our pipeline exploits typical characteristics of airborne and ground data, and produces a seamless, watertight mesh that is both complete and detailed. Experiments on 3D urban data from multiple sources and different data densities show the effectiveness and benefits of our approach.

17:50-18:10, Paper WeBT2.3
Separating Reflection Components in Images under Multispectral and Multidirectional Light Sources
Kobayashi, Naoto	Kyushu Inst. of Tech
Okabe, Takahiro	Kyushu Inst. of Tech
Keywords: Illumination and reflectance modeling, Physics-based vision, Computational photography Abstract: The appearance of an object depends on the color as well as the direction of a light source illuminating the object. The progress of LEDs enables us to capture the images of an object under multispectral and multidirectional light sources. Separating diffuse and specular reflection components in those images is important for preprocessing of various computer vision techniques such as photometric stereo, material editing, and relighting. In this paper, we propose a robust method for separating reflection components in a set of images of an object taken under multispectral and multidirectional light sources. We consider the set of images as the 3D data whose axes are the pixel, the light source color, and the light source direction, and then show the inherent structures of the 3D data: the rank 2 structure derived from the dichromatic reflection model, the rank 3 structure derived from the Lambert model, and the sparseness of specular reflection components. Based on those structures, our proposed method separates reflection components by combining sparse NMF and SVD with missing data. We conducted a number of experiments by using both synthetic and real images, and show that our method works better than some of the state-of-the-art techniques.

18:10-18:30, Paper WeBT2.4
A 2D Shape Structure for Decomposition and Part Similarity
Leonard, Kathryn	CSU Channel Islands
Morin, Geraldine	Univ. of Toulouse
Hahmann, Stefanie	Univ. of Grenoble, Inria, LJK
Axel, Carlier	Univ. De Toulouse
Keywords: Shape modeling and encoding Abstract: This paper presents a multilevel analysis of 2D shapes and uses it to find similarities between the different parts of a shape. Such an analysis is important for many applications such as shape comparison, editing, and compression. Our robust and stable method decomposes a shape into parts, determines a parts hierarchy, and measures similarity between parts based on a salience measure on the medial axis, the Weighted Extended Distance Function, providing a multi-resolution partition of the shape that is stable across scale and articulation. Comparison with an extensive user study on the MPEG-7 database demonstrates that our geometric results are consistent with user perception.


WeBT3	Maya T2.A
WePMO3	Oral Session

17:10-17:30, Paper WeBT3.1
DLSTM Approach to Video Modeling with Hashing for Large-Scale Video Retrieval
Zhuang, Naifan	Univ. of Central Florida
Ye, Jun	Univ. of Central Florida
Hua, Kien	Univ. of Central Florida
Keywords: Multimedia analysis, indexing and retrieval, Content based image retrieval and data mining, Deep learning Abstract: Although Query-by-Example techniques based on Euclidean distance in a multidimensional feature space have proved to be effective for image databases, this approach cannot be effectively applied to video since the number of dimensions would be massive due to the richness and complexity of video data. The above issue has been addressed in two recent solutions, namely Deterministic Quantization (DQ) and Dynamic Temporal Quantization (DTQ). DQ divides the video into equal segments and extracts a visual feature vector for each segment. The bag-of-words feature is then encoded by hashing to facilitate approximate nearest neighbor search using Hamming distance. One weakness of this approach is the deterministic segmentation of video data. DTQ improves on this by using dynamic video segmentation to obtain varied-length video segments. As a result, feature vectors extracted from these video segments can better capture the semantic content of the video. To support very large video databases, it is desirable to minimize the number of segments in order to keep the size of the feature representation as small as possible. We achieve this by using only one video segment (i.e., no video data segmentation is even necessary) with even better retrieval performance. Our scheme models video using differential long short-term memory (DLSTM) recurrent neural networks and obtains a highly compact fixed-size feature representation from the output of the hidden states of the DLSTM. Each of these features is further compressed by hashing it into binary bits via quantization. Experimental results based on two public data sets, UCF101 and MSRActionPairs, indicate that the proposed video modeling technique outperforms DTQ by a significant margin.

17:30-17:50, Paper WeBT3.2
Multi-Paced Dictionary Learning for Cross-Domain Retrieval and Recognition
Xu, Dan	Univ. of Trento
Song, Jingkuan	Univ. of Trento, Multimedia, Multimedia Retrieval, Machine
Alameda-Pineda, Xavier	Univ. of Trento
Ricci, Elisa	Univ. of Perugia
Sebe, Nicu	Univ. of Trento
Keywords: Coding, compression and super-resolution, Machine learning and data mining, Pattern Recognition for Search, Retrieval and Visualization Abstract: Several applications benefit from learning coupled representations able to describe data from multiple sources. For instance, cross-domain dictionary learning methods have proven particularly effective. In this paper we introduce Multi-Paced Dictionary Learning (MPDL) and propose an instantiation of it under the framework of cross-domain dictionary learning. MPDL is inspired by previous works on self-paced learning (SPL), a framework able to enhance the accuracy of conventional learning models by presenting the training data in a meaningful order, i.e. easy samples are provided first. However, most existing SPL methods only consider a single modality, while MPDL is specifically designed to assess the learning pace when data from multiple sources are available. We present the model and propose an efficient algorithm to learn the dictionaries and codes. The approach is validated via experiments on two different tasks, namely cross-media retrieval and sketch-to-photo face recognition, using publicly available datasets.

17:50-18:10, Paper WeBT3.3
Texture Synthesis through Convolutional Neural Networks and Spectrum Constraints
Liu, Gang	Telecom Paristech
Gousseau, Yann	Telecom Paris
Xia, Gui-Song	Wuhan Univ
Keywords: Texture and color analysis, Deep learning Abstract: This paper presents a significant improvement for the synthesis of texture images using convolutional neural networks (CNNs), making use of constraints on the Fourier spectrum of the results. More precisely, texture synthesis is regarded as a constrained optimization problem, with constraints conditioning both the Fourier spectrum and statistical features learned by CNNs. In contrast with existing methods, the presented method inherits from previous CNN approaches the ability to depict local structures and fine scale details, and at the same time yields coherent large scale structures, even in the case of quasi-periodic images. This is done at no extra computational cost. Synthesis experiments on various images show a clear improvement compared to a recent state-of-the-art method relying on CNN constraints only.

18:10-18:30, Paper WeBT3.4
A New Method for Spatiotemporal Textual Saliency Detection in Video
Shan, Susu	Nanjing Univ
Xu, Hailiang	Nanjing Univ
Su, Feng	Nanjing Univ
Keywords: Image and video analysis and understanding, Character and Text Recognition, 2D/3D object detection and recognition Abstract: Detecting salient image regions containing textual patterns in video is valuable to many content-based video applications such as video retrieval, abstraction, classification and analysis. In this paper, we present an effective textual saliency detection method for natural scene videos. We first compute text-like confidence values of local image regions, which capture the basic visual cues of textual components in the video frames, using an efficient cascaded prediction model. Next, we construct patch features depicting the statistical and spatial distribution of confidence values and combine them with general visual features like colors. We then employ a saliency detection model based on random walk with restart on the graph of local video regions, which effectively integrates both the spatial and the temporal saliency maps. The experimental results demonstrate the effectiveness of the proposed method.


WeBT4	Maya T2.B
WePMO4	Oral Session

17:10-17:30, Paper WeBT4.1
Context-Aware Mathematical Expression Recognition: An End-To-End Framework and a Benchmark
He, Wenhao	Chinese Acad. of Science
Luo, Yuxuan	Baidu
Yin, Fei	Inst. of Automation of CAS
Hu, Han	Tsinghua Univ
Han, Junyu	Baidu Online Network Tech. (Beijing) Co., Ltd
Ding, Errui	Inst. of Deep Learning, Baidu Res
Liu, Cheng-Lin	Inst. of Automation, Chinese Acad. of Sciences
Keywords: Character and Text Recognition Abstract: In this paper we propose a novel end-to-end framework for mathematical expression (ME) recognition. The method uses a convolutional neural network (CNN) to perform mathematical symbol detection and recognition simultaneously while incorporating spatial context, and can handle multi-part and touching symbols effectively. To evaluate the performance, we provide a benchmark that contains MEs from both real-life and synthetic data. Images in our dataset undergo multiple variations such as viewpoint, illumination and background. For training, we use purely synthetic data to save human labelling effort. The proposed method achieved 87% accuracy of total correct for clear images and 45% for cluttered ones.

17:30-17:50, Paper WeBT4.2
Structural Feature-Based Event Clustering for Short Text Streams
Sun, Zhengya	Inst. of Automation, Chinese Acad. of Sciences
Han, Jiuqi	Inst. of Automation, Chinese Acad. of Sciences
Hao, Hong-Wei	Univ. of Science and Tech. Beijing
Keywords: Document Understanding, Classification and clustering, Statistical, syntactic and structural pattern recognition Abstract: This paper is concerned with event clustering for short text streams, which aims to divide constantly arriving short texts into several dynamic event-based clusters. A widely adopted approach is based on Vector Space Models (VSMs) such as bag of words. However, these models have limitations: not only are the semantic relationships between words largely ignored, but the term weighting may also adapt poorly to changes in the data stream. To avoid these limitations, in this paper, we propose a new model which captures the compositional structure of textual relations, and jointly utilizes the event content similarity, temporal and location proximity to evaluate the distance between texts. We then leverage the benefits of ClusTree, a parameter-free stream clustering algorithm, to discover micro-clusters and maintain stream summaries. The experimental results on several real-world data sets demonstrate the superiority of the proposed method over the state of the art.

17:50-18:10, Paper WeBT4.3
Layered Ground Truth: Conveying Structural and Statistical Information for Document Image Analysis and Evaluation
Cote, Melissa	Univ. of Victoria
Branzan Albu, Alexandra	Univ. of Victoria
Keywords: Document Understanding, Performance Evaluation Abstract: This paper addresses the problem of semantic overlap across document objects in the context of ground truth representation for document layout analysis. Document object categories often share primitives from a low-level perspective (e.g. regions inside bars in a bar chart resemble background), making it difficult to evaluate document layout segmentation methods based on pixel classification, as most datasets and ground truth models focus on document objects. We propose a novel ground truth model that utilizes structural and statistical pattern recognition concepts. Statistical pixel-based data derived from low-level elemental patterns are layered onto high-level structural object-based data. We also present evaluation metrics that take advantage of the layered ground truth model, allowing a contextual evaluation of pixel classification algorithms. We apply the proposed model to two recent pixel classification approaches, evaluated on business document images that exhibit a challenging mixture of textual, graphical, and pictorial elements through varied layouts. The proposed model yields very detailed, comprehensive, and intuitive information on the strengths and limitations of the evaluated approaches that would be impossible to obtain through other models.

18:10-18:30, Paper WeBT4.4
Joint Training of Conditional Random Fields and Neural Networks for Stroke Classification in Online Handwritten Documents
Ye, Junyu	Inst. of Automation, Chinese Acad. of Sciences
Zhang, Yan-Ming	Inst. of Automation, Chinese Acad. of Sciences
Liu, Cheng-Lin	Inst. of Automation, Chinese Acad. of Sciences
Keywords: Document Understanding, Statistical, syntactic and structural pattern recognition, Artificial neural networks Abstract: The task of text/non-text stroke classification in online handwritten documents is an essential preprocessing step in document analysis. It is also a challenging problem since in many cases local features are not enough to generate high-accuracy results, and contextual information, such as temporal information and spatial information, must be carefully considered. In this paper, we propose a novel method, which jointly trains a combined model of conditional random fields and neural networks, to solve this problem. Both our unary and pairwise potentials are formulated as neural networks. The parameters of conditional random fields and neural networks are learned together during the training process. With far fewer parameters and faster speed, our method achieves impressive performance on the IAMonDo database, a publicly available database of freely handwritten documents.


WeBT5	Maya T2.CPoster Session Hall
WePMO5	Oral Session

17:10-17:30, Paper WeBT5.1
Multiple Instance Learning Convolutional Neural Networks for Object Recognition
Sun, Miao	Univ. of Missouri
Han, Tony	Univ. of Missouri
Keywords: Deep learning, Classification and clustering Abstract: Convolutional Neural Networks (CNNs) have been applied successfully in computer vision, speech recognition, and natural language processing. For object recognition, CNNs might be limited by their strict label requirement and an implicit assumption that images are supposed to be target-object-dominated for optimal solutions. However, the labeling procedure, which requires laying out the locations of target objects, is very tedious, making high-quality large-scale datasets prohibitively expensive. Data augmentation schemes are widely used when deep networks suffer from insufficient training data. All the images produced through data augmentation share the same label, which may be problematic since not all data augmentation methods are label-preserving. In this paper, we propose a weakly supervised CNN framework named Multiple Instance Learning Convolutional Neural Networks (MILCNN) to solve this problem. We apply the MILCNN framework to object recognition and report state-of-the-art performance on three benchmark datasets: CIFAR10, CIFAR100 and the ILSVRC2015 classification dataset.

17:30-17:50, Paper WeBT5.2
Face Hallucination by Deep Traversal Network
Feng, Zhanxiang	Sun Yat-Sen Univ
Lai, Jian-huang	Sun Yat-Sen Univ
Xie, Xiaohua	Sun Yat-Sen Univ
Yang, Dakun	Sun Yat-Sen Univ
Mei, Ling	Sun Yat-Sen Univ
Keywords: Deep learning, Coding, compression and super-resolution Abstract: In this paper, we propose a novel patch-based face hallucination method that consists of two patch-based sparse autoencoder (SAE) networks and a deep fully connected network (namely the traversal network). The SAE networks are used to capture the intrinsic features of low-resolution (LR) images and high-resolution (HR) images in the hidden layers, while the traversal network is used to map features from the LR hidden layer to the HR hidden layer. In the training stage, these three networks are jointly optimized. Compared with previous network-based methods that learn an end-to-end mapping from LR images to HR images, our method learns the mapping between hidden layers, which can better alleviate the over-fitting problem. Experimental results demonstrate that our method is efficient and robust for hallucinating face images from both lab environments and the wild. The proposed method achieves state-of-the-art performance when conducting face hallucination on the CAS-PEAL-R1 database, CMU-PIE database and Casia database.

17:50-18:10, Paper WeBT5.3
Learning a Lightweight Deep Convolutional Network for Joint Age and Gender Recognition
Zhu, Linnan	The Hong Kong Pol. Univ
Wang, Keze	Sun Yat-Sen Univ. Hong Kong Pol. Univ
Lin, Liang	Lotushill.org
Zhang, Lei	The Hong Kong Pol. Univ
Keywords: Deep learning, Image and video analysis and understanding Abstract: This paper proposes a lightweight deep model to recognize age and gender from a face image. Though simple, our network architecture is able to complete the two tasks effectively and efficiently. Moreover, unlike existing methods, we simultaneously perform the age and gender recognition tasks via a joint regression model. Specifically, our model employs a multi-task learning scheme to learn shared features for these two correlated tasks in an end-to-end manner. Extensive experimental results on the recent Adience benchmark demonstrate that our model achieves recognition accuracy competitive with state-of-the-art methods but at much faster speed, i.e., about 10 times faster in the testing phase. Our model can be easily adopted and extended to other facial applications.

18:10-18:30, Paper WeBT5.4
A Temporally Coherent Neural Algorithm for Artistic Style Transfer
Dushkoff, Michael	Rochester Inst. of Tech
McLaughlin, Ryan	Rochester Inst. of Tech
Ptucha, Raymond	Rochester Inst. of Tech
Attachments: Supplementary material Keywords: Deep learning, Motion, tracking and video analysis, Vision for graphics Abstract: Within the fields of visual effects and animation, humans have historically spent painstaking hours mastering the skill of drawing frame-by-frame animations. One such animation technique that has been widely used is called "rotoscoping" and has allowed uniquely stylized animations to capture the motion of real life action sequences. Automating this arduous process would free animators from performing frame by frame stylization to concentrate on artistic contributions. We introduce a new artificial system based on an existing neural style transfer method which creates artistically stylized animations that simultaneously reproduce both the motion of the original videos that they are derived from and the unique style of a given artistic work. This system utilizes a convolutional neural network framework to extract a hierarchy of image features used for generating images that appear visually similar to a given artistic style while at the same time faithfully preserving temporal content. The use of optical flow allows the combination of style and content to be integrated directly with the apparent motion over frames of a video to produce smooth and visually appealing transitions. This implementation demonstrates how biologically-inspired systems such as convolutional neural networks are rapidly approaching human-level behavior in tasks that were once thought impossible. Further, this research provides unique insights into the way that humans who produce artistically stylized animations perceive temporal information.

Technical Program for Wednesday December 7, 2016