|
ThPT1 |
Poster Session Hall |
ThP1 |
Poster Session |
|
14:00-16:10, Paper ThPT1.1 | |
A Deep Multi-Level Network for Saliency Prediction |
Cornia, Marcella | Univ. of Modena and Reggio Emilia |
Baraldi, Lorenzo | Univ. of Modena and Reggio Emilia |
Serra, Giuseppe | Univ. of Modena and Reggio Emilia
Cucchiara, Rita | Univ. of Modena and Reggio Emilia
Keywords: Deep learning
Abstract: This paper presents a novel deep architecture for saliency prediction. Current state-of-the-art models for saliency prediction employ fully convolutional networks that perform a non-linear combination of features extracted from the last convolutional layer to predict saliency maps. We propose an architecture which, instead, combines features extracted at different levels of a Convolutional Neural Network (CNN). Our model is composed of three main blocks: a feature extraction CNN, a feature encoding network that weights low- and high-level feature maps, and a prior learning network. We compare our solution with state-of-the-art saliency models on two public benchmark datasets. Results show that our model outperforms them under all evaluation metrics on the SALICON dataset, which is currently the largest public dataset for saliency prediction, and achieves competitive results on the MIT300 benchmark.
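The fusion idea is easy to sketch. Below is a minimal PyTorch illustration of combining feature maps from several depths into one saliency map; the layer sizes, the 1x1-convolution encoder and the module names are assumptions for illustration, not the authors' exact architecture (which also includes a prior learning network).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelSaliency(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.block2 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.block3 = nn.Sequential(nn.Conv2d(128, 256, 3, padding=1), nn.ReLU())
        self.encode = nn.Conv2d(64 + 128 + 256, 64, 1)   # weights low/high-level maps
        self.predict = nn.Conv2d(64, 1, 1)               # final saliency map

    def forward(self, x):
        f1 = self.block1(x)                # low-level features
        f2 = self.block2(f1)               # mid-level features
        f3 = self.block3(f2)               # high-level features
        size = f3.shape[-2:]
        fused = torch.cat([F.interpolate(f1, size=size),
                           F.interpolate(f2, size=size), f3], dim=1)
        return self.predict(F.relu(self.encode(fused)))

sal = MultiLevelSaliency()(torch.rand(1, 3, 64, 64))
print(sal.shape)   # torch.Size([1, 1, 16, 16])
```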
|
|
14:00-16:10, Paper ThPT1.2 | |
Latent Regression Bayesian Network for Data Representation |
Nie, Siqi | RPI |
Zhao, Yue | Minzu Univ. of China |
Ji, Qiang | RPI |
Keywords: Deep learning
Abstract: Restricted Boltzmann machines (RBMs) are widely used for data representation and feature learning in various machine learning tasks. The undirected structure of an RBM allows inference to be performed efficiently, because the latent variables are independent of each other given the visible variables. However, we believe the correlations among latent variables are crucial for faithful data representation. Driven by this idea, we propose a counterpart of RBMs, namely latent regression Bayesian networks (LRBNs), which has a directed structure. One major difficulty of learning LRBNs is the intractable inference. To address this problem, we propose an inference method based on the conditional pseudo-likelihood that preserves the dependencies among the latent variables. For learning, we propose to employ the hard Expectation Maximization (EM) algorithm, which avoids the intractability of traditional EM by maxing out instead of summing out the latent variables when computing the data likelihood. Qualitative and quantitative evaluations of our model against state-of-the-art models and algorithms on benchmark data sets demonstrate the effectiveness of the proposed algorithm in data representation and reconstruction.
|
|
14:00-16:10, Paper ThPT1.3 | |
Pedestrian and Part Position Detection Using a Regression-Based Multiple Task Deep Convolutional Neural Network |
Yamashita, Takayoshi | Chubu Univ |
Fukui, Hiroshi | Chubu Univ |
Yamauchi, Yuji | Chubu Univ |
Fujiyoshi, Hironobu | Chubu Univ |
Keywords: Deep learning, 2D/3D object detection and recognition
Abstract: In driving support systems, it is necessary not only to detect the position of pedestrians, but also to estimate the distance between a pedestrian and the vehicle. In general approaches using monocular cameras, the upper and lower positions of each pedestrian are detected using a bounding box obtained from a pedestrian detection technique. The distance between the pedestrian and the vehicle is then estimated using these positions and the camera parameters. This conventional framework uses independent pedestrian detection and position detection processes to estimate the distance. In this paper, we propose a method to detect both the pedestrian and the positions of their parts simultaneously using a regression-based deep convolutional neural network (DCNN). This simultaneous detection enables the DCNN to train efficient parameters for the extraction of proper features, because the position information is explicitly tied to the pedestrian region. In a series of experiments, our method is shown to improve the pedestrian detection performance compared with methods based solely on pedestrian detection. The proposed approach also improves the detection accuracy of the head and leg positions compared with methods that consider only position detection. Using the results of position detection and the obtained camera parameters, our method achieves distance estimation to within 5% error.
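A hedged sketch of what such a joint objective can look like: one shared feature trunk feeds a classification head and a part-position regression head, and the regression term is only active for pedestrian samples. The head structure, loss weighting and part count are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskHead(nn.Module):
    def __init__(self, in_features=256, num_parts=2):
        super().__init__()
        self.cls = nn.Linear(in_features, 2)          # pedestrian vs. background
        self.reg = nn.Linear(in_features, num_parts)  # e.g. head and leg y-positions

    def forward(self, feats):
        return self.cls(feats), self.reg(feats)

def multitask_loss(cls_logits, part_pred, labels, part_targets, alpha=1.0):
    cls_loss = F.cross_entropy(cls_logits, labels)
    pos = labels == 1                       # regress parts only on pedestrian samples,
    reg_loss = F.mse_loss(part_pred[pos], part_targets[pos]) if pos.any() else 0.0
    return cls_loss + alpha * reg_loss      # so position learning is tied to the region

head = MultiTaskHead()
logits, parts = head(torch.rand(8, 256))
loss = multitask_loss(logits, parts, torch.randint(0, 2, (8,)), torch.rand(8, 2))
```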
|
|
14:00-16:10, Paper ThPT1.4 | |
Beam Search for Learning a Deep Convolutional Neural Network of 3D Shapes |
Xu, Xu | Oregon State Univ
Todorovic, Sinisa | Oregon State Univ |
Keywords: Deep learning, 2D/3D object detection and recognition
Abstract: This paper addresses one of the basic problems in computer vision, that of recognizing 3D shapes of objects. Recent work typically represents a 3D shape as a set of binary variables corresponding to 3D voxels of a uniform 3D grid centered on the shape, and resorts to deep convolutional neural networks (CNNs) for modeling these binary variables. However, robust learning of CNNs is currently limited by the small datasets of 3D shapes available – an order of magnitude smaller than other common datasets in computer vision. Related work typically deals with the small training datasets using a number of ad hoc, hand-tuned strategies. To address this issue, we formulate CNN learning as a beam search aimed at identifying an optimal CNN architecture – namely, the number of layers, nodes, and their connectivity in the network – as well as estimating parameters of such an optimal CNN. Each state of the beam search corresponds to a candidate CNN. Two types of actions are defined to add new convolutional filters or new convolutional layers to a parent CNN, and thus transition to children states. The utility function of each action is efficiently computed by transferring parameter values of the parent CNN to its children, thereby enabling an efficient beam search. Our experimental evaluation on the 3D ModelNet dataset demonstrates that our model pursuit using the beam search yields a CNN with 3D shape classification performance superior to the state of the art.
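The search procedure itself is generic and can be sketched compactly. In the skeleton below, a state is a candidate network, the two action types grow it, and children inherit the parent's weights so that scoring stays cheap; the toy networks, actions and `toy_score` utility are placeholders, not the authors' code (in the paper, scoring would involve briefly training each child).

```python
import copy
import heapq

def beam_search(initial_net, actions, score_fn, beam_width=3, depth=5):
    """actions: functions that grow a network (add filters / add a layer)."""
    beam = [initial_net]
    for _ in range(depth):
        children = []
        for parent in beam:
            for action in actions:
                child = action(copy.deepcopy(parent))   # child inherits parent state
                children.append(child)
        beam = heapq.nlargest(beam_width, children, key=score_fn)  # prune to beam
    return max(beam, key=score_fn)

# Toy usage: "networks" are dicts, actions widen or deepen them.
widen = lambda net: {**net, "filters": net["filters"] + 16}
deepen = lambda net: {**net, "layers": net["layers"] + 1}
toy_score = lambda net: net["filters"] * 0.01 + net["layers"] * 0.1  # stand-in utility
best = beam_search({"filters": 16, "layers": 2}, [widen, deepen], toy_score)
print(best)
```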
|
|
14:00-16:10, Paper ThPT1.5 | |
Discriminant Auto Encoders for Face Recognition with Expression and Pose Variations |
Pathirage, Chathurdara Sri Nadith | Curtin Univ |
Li, Ling | Curtin Univ. of Tech |
Liu, Wanquan | Curtin Univ. of Tech |
Keywords: Deep learning, 2D/3D object detection and recognition, Biologically motivated vision
Abstract: The key challenge of face recognition is to develop effective feature representations that reduce intra-personal variations while enlarging inter-personal differences. This paper presents a novel non-linear discriminant error criterion which can be used for effective feature learning from raw pixels. Unlike many existing methods which assume the problem to be linear in nature, the proposed method utilizes a novel deep learning (DL) framework which makes no prior assumptions, thus exploiting the full potential of learning a highly non-linear transformation. High-level representations learnt via the proposed model are highly supervised and can help to boost the performance of subsequent classifiers such as LDA. This study clearly shows the value of using a non-linear discriminant error criterion as a tractable objective to guide the learning of useful high-level features in various face-related problems. The extracted features are learnt from local face regions, and the results of experiments performed on three different face image databases demonstrate the superiority and generalizability of our method compared to existing work, as well as the applicability of the concept to many different deep learning models of the same nature.
|
|
14:00-16:10, Paper ThPT1.6 | |
MRCNN: A Stateful Fast R-CNN |
Burlina, Philippe | Johns Hopkins Univ. Applied Physics Lab |
Keywords: Deep learning, 2D/3D object detection and recognition, Machine learning and data mining
Abstract: Deep convolutional neural networks (DCNNs) perform on par with or better than humans for image classification. Hence efforts have now shifted to more challenging tasks such as object detection and classification in images, video or RGBD. Recently developed region CNNs (R-CNN) such as Fast R-CNN [7] address this detection task for images. Instead, this paper is concerned with video and also focuses on resource-limited systems. Newly proposed methods accelerate R-CNN by sharing convolutional layers for proposal generation, location regression and labeling [12][13][19][25]. These approaches, when applied to video, are stateless: they process each image individually. This suggests an alternate route: to make R-CNN stateful and exploit temporal consistency. We extend Fast R-CNN by making it employ recursive Bayesian filtering and perform proposal propagation and reuse. We couple multi-target proposal/detection tracking (MTT) with R-CNN and do detection-to-track association. We call this approach MRCNN, short for MTT + R-CNN. In MRCNN, region proposals -- which are vetted via classification and regression in R-CNNs -- are treated as observations in MTT and propagated using assumed kinematics. Actual proposal generation (e.g. via Selective Search) need only be performed sporadically and/or periodically, and is replaced at all other times by MTT proposal predictions. Preliminary results show that MRCNNs can economize on both proposal and classification computations, and can yield up to a 10- to 30-fold decrease in the number of proposals generated, about one order of magnitude savings in proposal computation time, and nearly one order of magnitude improvement in overall computation time, for comparable localization and classification performance. This method can additionally be beneficial for false alarm abatement.
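The propagation step is the stateful part. A toy stand-in for it, assuming a constant-velocity motion model rather than the paper's full multi-target tracker:

```python
import numpy as np

def predict_proposals(boxes, velocities, dt=1.0):
    """boxes: (N, 4) [x1, y1, x2, y2]; velocities: (N, 2) pixels/frame (vx, vy)."""
    shift = np.tile(velocities * dt, 2)      # apply (vx, vy) to both box corners
    return boxes + shift                     # predicted proposals for the next frame

boxes = np.array([[10., 20., 50., 80.]])
vel = np.array([[2., -1.]])
print(predict_proposals(boxes, vel))         # [[12. 19. 52. 79.]]
```

Between sporadic runs of the actual proposal generator, predictions like these stand in for proposals, which is where the computational savings come from.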
|
|
14:00-16:10, Paper ThPT1.7 | |
MSR-CNN: Applying Motion Salient Region Based Descriptors for Action Recognition |
Tu, Zhigang | Arizona State Univ
Cao, Jun | Intel Corp |
Li, Yikang | Arizona State Univ |
Li, Baoxin | Arizona State Univ |
Keywords: Deep learning, 2D/3D object detection and recognition, Motion, tracking and video analysis
Abstract: In recent years the most popular video-based human action recognition methods rely on extracting feature representations using Convolutional Neural Networks (CNNs) and then using these representations to classify actions. In this work, we propose a fast and accurate video representation derived from the motion-salient region (MSR), which captures the features most useful for action labeling. By improving a well-performing foreground detection technique, the region of interest (ROI) corresponding to actors in the foreground, in both the appearance and the motion field, can be detected under various realistic challenges. Furthermore, we propose a complementary motion saliency measure to select a secondary ROI -- the major moving part of the human. Accordingly, an MSR-based CNN descriptor (MSR-CNN) is formulated to recognize human action, where the descriptor incorporates appearance and motion features along with tracks of the MSR. The computation can be efficiently implemented due to two characteristics: 1) only part of the RGB image and the motion field need to be processed; 2) less data is used as input for the CNN feature extraction. Comparative evaluation on the JHMDB and UCF Sports datasets shows that our method outperforms the state of the art in both accuracy and efficiency.
|
|
14:00-16:10, Paper ThPT1.8 | |
Convolutional Neural Networks for Object Recognition on Mobile Devices: A Case Study |
Tobias Quiroz, Jose Luis | Telecom-Bretagne |
Ducournau, Aurélien | Telecom Bretagne |
Rousseau, François | Inst. Mines Telecom |
Mercier, Grégoire | Telecom Bretagne |
Fablet, Ronan | Telecom Bretagne/LabSTICC |
Keywords: Deep learning, 2D/3D object detection and recognition, Pattern Recognition for Art, Cultural Heritage and Entertainment
Abstract: Deep Learning (DL), especially in the form of Convolutional Neural Networks (CNNs), has become the state of the art for a variety of pattern recognition problems. Advances in technology have allowed the use of high-end General-Purpose Graphics Processing Units (GPGPUs) to accelerate numerical problem solving. These advances are not only in terms of speed but also in terms of network size: nowadays computers are able to drive deeper, wider and more powerful models. State-of-the-art CNNs have achieved human-like performance in several recognition tasks such as handwritten character recognition, face recognition, scene labelling, object detection and image classification, among others. Meanwhile, mobile devices have become powerful enough to handle the computations required for deploying CNN models in near real-time. Here, we investigate the implementation of light-weight CNN schemes on mobile devices for domain-specific object recognition tasks.
|
|
14:00-16:10, Paper ThPT1.9 | |
Deep Feature Extraction in the DCT Domain |
Ghosh, Arthita | Univ. of Maryland Coll. Park |
Chellappa, Rama | Univ. of Maryland |
Keywords: Deep learning, Artificial neural networks, 2D/3D object detection and recognition
Abstract: We explore the effectiveness of deep features extracted by Convolutional Neural Networks (CNNs) in the Discrete Cosine Transform (DCT) domain on various image classification tasks such as pedestrian and face detection, material identification and object recognition. We perform the DCT operation on the feature maps generated by convolutional layers in CNNs. We compare the performance of the same network on the same datasets, with the same hyper-parameters, with and without the DCT step. Our results indicate that a DCT operation incorporated into the network after the convolution + thresholding layer and before pooling can have certain advantages, such as convergence over fewer training epochs and sparser weight matrices that are more conducive to pruning and hashing techniques.
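The placement of the transform is the whole trick, and it is easy to prototype offline. A sketch with SciPy, applying a 2D DCT per channel to post-activation feature maps (in an actual network this would sit as a differentiable layer between the nonlinearity and pooling):

```python
import numpy as np
from scipy.fft import dctn

def dct_feature_maps(feature_maps):
    """feature_maps: (channels, H, W) activations after conv + thresholding."""
    return np.stack([dctn(fm, norm='ortho') for fm in feature_maps])

acts = np.random.rand(8, 16, 16)     # toy post-activation maps
coeffs = dct_feature_maps(acts)      # same shape, now DCT coefficients
print(coeffs.shape)                  # (8, 16, 16)
```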
|
|
14:00-16:10, Paper ThPT1.10 | |
Faster Training of Very Deep Networks Via p-Norm Gates
Pham, Trang | Deakin Univ |
Tran, Truyen | Deakin Univ |
Phung, Dinh | Deakin Univ |
Venkatesh, Svetha | Deakin Univ |
Keywords: Deep learning, Artificial neural networks, Classification and clustering
Abstract: A major contributing factor to the recent advances in deep neural networks is structural units that let sensory information and gradients propagate easily. Gating is one such important structure that acts as a flow control. Gates are pervasive among state-of-the-art recurrent models such as LSTM and GRU, and feedforward models such as Residual Nets and Highway Networks. This enables learning very deep networks with hundreds of layers and helps achieve record-breaking results in vision (e.g., ImageNet with Residual Nets) and NLP (e.g., machine translation with GRU). However, there is little work analysing the role of gating in the learning process. In this paper, we propose a flexible p-norm gating scheme, which allows user-controllable flow and, as a consequence, can improve the learning speed. This scheme subsumes other existing gating schemes, including those in GRU, Highway Networks and Residual Nets, as special cases. Experiments on large sequence and vector datasets demonstrate that the proposed gating scheme helps improve the learning speed significantly without extra overhead.
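One way to write the scheme the title names (the notation here is a hedged reconstruction, not necessarily the paper's exact formulation): standard gating uses a convex pair $(\alpha, 1-\alpha)$, i.e. a gate vector of unit $\ell_1$-norm, whereas a $p$-norm gate constrains the pair to unit $p$-norm,

\[
\mathbf{h}_{\mathrm{out}} \;=\; \alpha \odot T(\mathbf{x}) \;+\; \bigl(1-\alpha^{p}\bigr)^{1/p} \odot \mathbf{x},
\qquad \alpha \in (0,1),\ p \ge 1,
\]

so $p=1$ recovers the GRU/Highway-style convex gate, while larger $p$ makes the two gate factors sum to more than one, letting both the transformed and the untransformed signal (and hence the gradient) pass with greater total magnitude.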
|
|
14:00-16:10, Paper ThPT1.11 | |
Coupled Convolution Layer for Convolutional Neural Network |
Uchida, Kazutaka | Tokyo Inst. of Tech |
Tanaka, Masayuki | Tokyo Inst. of Tech |
Okutomi, Masatoshi | Tokyo Inst. of Tech |
Keywords: Deep learning, Artificial neural networks, Classification and clustering
Abstract: We introduce a coupled convolution layer comprising two parallel convolutions with mutually constrained weights. Inspired by the human retina mechanism, we constrain the convolution weights such that one set of weights is the negative of the other, to mimic the responses of on-center and off-center retinal ganglion cells. Our analysis shows that the retina-like convolution layer, a special case of the coupled convolution layer, can be realized by a normal convolutional layer with a pair of activation functions designated as Biased ON/OFF ReLU. Experimental comparisons demonstrate that the proposed coupled convolution layer performs better without increasing the number of parameters, which reveals two important facts. First, the separation of the positive and negative parts into different channels plays an important role. Second, constraining weights across convolutions can produce better performance than training weights freely. We evaluate its effect by comparison with ReLU, LReLU, and PReLU using the CIFAR-10, CIFAR-100, and PlanktonSet 1.0 datasets.
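Because the second weight set is just the negation of the first, the coupled layer costs no extra parameters; it can be sketched as a single convolution followed by a paired activation. A hedged PyTorch illustration (kernel size and bias handling are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoupledConv(nn.Module):
    """One weight set W; the coupled branch uses -W implicitly via negation."""
    def __init__(self, in_ch, out_ch, bias_on=0.0, bias_off=0.0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)
        self.bias_on, self.bias_off = bias_on, bias_off

    def forward(self, x):
        r = self.conv(x)
        on = F.relu(r + self.bias_on)      # ON channel: response to W
        off = F.relu(-r + self.bias_off)   # OFF channel: response to -W, no new params
        return torch.cat([on, off], dim=1)

out = CoupledConv(3, 16)(torch.rand(1, 3, 32, 32))
print(out.shape)   # torch.Size([1, 32, 32, 32])
```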
|
|
14:00-16:10, Paper ThPT1.12 | |
Finetuning Convolutional Neural Networks for Visual Aesthetics |
Wang, Yeqing | Changzhou Coll. of Information Tech |
Li, Yi | Toyota Res. Inst. / Australian National Univ. / NICTA
Porikli, Fatih | ANU / NICTA
Keywords: Deep learning, Artificial neural networks, Image and video analysis and understanding
Abstract: Inferring the aesthetic quality of images is a challenging computer vision task due to its subjective and conceptual nature. Most image aesthetics evaluation approaches have focused on designing handcrafted features, and only a few have adopted learning of relevant and imperative characteristics in a data-driven manner. In this paper, we propose to fine-tune Convolutional Neural Networks (CNNs) for image aesthetics. Unlike previous deep learning based techniques, we employ pretrained models, namely AlexNet and the 16-layer VGGNet, and calibrate them to estimate visual aesthetic quality. This enables automatically exploiting the information inherent in much larger and more diversified image datasets. We tested our methods on the AVA and CUHKPQ image aesthetics datasets on two different training-testing partitions, and compared the performance using both local and contextual information. Experimental results suggest that our strategy is robust, effective and superior to state-of-the-art approaches.
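The calibration step amounts to swapping the classification head of a pretrained network and retraining. A minimal sketch using the torchvision model zoo (the two-way high/low-quality head and the choice of frozen layers are illustrative assumptions):

```python
import torch.nn as nn
from torchvision import models

model = models.vgg16(weights='IMAGENET1K_V1')   # ImageNet-pretrained backbone
model.classifier[6] = nn.Linear(4096, 2)        # new head: high vs. low aesthetics
for p in model.features[:10].parameters():      # optionally freeze early conv blocks
    p.requires_grad = False
# ...then train with an ordinary cross-entropy loop on the aesthetics labels.
```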
|
|
14:00-16:10, Paper ThPT1.13 | |
Face Detection Based on Deep Convolutional Neural Networks Exploiting Incremental Facial Part Learning |
Triantafyllidou, Danai | Aristotle Univ. of Thessaloniki |
Tefas, Anastasios | Aristotle Univ. of Thessaloniki |
Keywords: Deep learning, Artificial neural networks, Machine learning and data mining
Abstract: Deep learning methods are powerful approaches but often require expensive computations and lead to models of high complexity which need to be trained with large amounts of data. In this paper, we consider the problem of face detection, and we propose a light-weight deep convolutional neural network that achieves a state-of-the-art recall rate on the challenging FDDB dataset. Our model is designed with a view to minimizing both training and run time, and outperforms the convolutional network used by DDFD for the same task. Our model consists of only 113,864 free parameters, whereas the previously proposed CNN for face detection had 60 million parameters. We propose a new training method that gradually increases the difficulty of both negative and positive examples, and that proves to drastically improve training speed and accuracy. Our second approach involves training a separate deep network to detect individual facial features, while creating a model that combines the outputs of the two different networks. Both methods are able to detect faces under severe occlusion and unconstrained pose variation, and cope with the difficulties and large variations of real-world face detection.
|
|
14:00-16:10, Paper ThPT1.14 | |
Learning to Semantically Segment High-Resolution Remote Sensing Images |
Nogueira, Keiller | Univ. Federal De Minas Gerais |
Dalla Mura, Mauro | Fondazione Bruno Kessler |
Chanussot, Jocelyn | Grenoble Inst. of Tech |
Schwartz, William | Federal Univ. of Minas Gerais |
dos Santos, Jefersson Alex | Univ. Federal De Minas Gerais |
Keywords: Deep learning, Artificial neural networks, Other applications
Abstract: Land cover classification is a task that requires methods capable of learning high-level features while dealing with a high volume of data. Overcoming these challenges, Convolutional Networks (ConvNets) can learn specific and adaptable features depending on the data while, at the same time, learning classifiers. In this work, we propose a novel technique to automatically perform pixel-wise land cover classification. To the best of our knowledge, there is no other work in the literature that performs pixel-wise semantic segmentation based on data-driven feature descriptors for high-resolution remote sensing images. The main idea is to exploit the power of ConvNet feature representations to learn how to semantically segment remote sensing images. First, our method learns each label in a pixel-wise manner by taking into account the spatial context of each pixel. In the prediction phase, the probability of a pixel belonging to a class is likewise estimated according to its spatial context and the learned patterns. We conducted a systematic evaluation of the proposed algorithm using two remote sensing datasets with very distinct properties. Our results show that the proposed algorithm provides improvements over traditional and state-of-the-art methods that range from 5% to 15% in terms of accuracy.
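The pixel-plus-context idea can be prototyped in a few lines: label each pixel from the patch centred on it. A toy inference loop (the `net` classifier is a stand-in, not the authors' ConvNet, and a real implementation would batch the patches):

```python
import numpy as np

def classify_pixels(image, net, patch=9):
    """Label every pixel from the patch (spatial context) centred on it."""
    half = patch // 2
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode='reflect')
    H, W = image.shape[:2]
    labels = np.zeros((H, W), dtype=int)
    for y in range(H):
        for x in range(W):
            context = padded[y:y + patch, x:x + patch]
            labels[y, x] = net(context)     # per-pixel class from its context
    return labels

image = np.random.rand(32, 32, 3)
labels = classify_pixels(image, net=lambda c: int(c.mean() > 0.5))  # dummy classifier
print(labels.shape)   # (32, 32)
```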
|
|
14:00-16:10, Paper ThPT1.15 | |
On the Size of Convolutional Neural Networks and Generalization Performance |
Kabkab, Maya | Univ. of Maryland |
Hand, Emily | Univ. of Maryland |
Chellappa, Rama | Univ. of Maryland |
Keywords: Deep learning, Classification and clustering, Artificial neural networks
Abstract: While Convolutional Neural Networks (CNNs) have recently achieved impressive results on many classification tasks, it is still unclear why they perform so well and how to properly design them. In this work, we investigate the effect of the convolutional depth of a CNN on its generalization performance for binary classification problems. We prove a sufficient condition, polynomial in the depth of the CNN, on the training database size to guarantee such performance. We empirically test our theory on the problem of gender classification and explore the effect of varying the CNN depth, as well as the training distribution and set size.
|
|
14:00-16:10, Paper ThPT1.16 | |
Adaptive Hierarchical Classification Networks |
Nooka, Sai | RIT |
Chennupati, Vijaya Naga Jyoth Sumanth | Rochester Inst. of Tech |
Veerabhadra, Naga Karthik Reddy | Rochester Inst. of Tech |
Sah, Shagan | Rochester Inst. of Tech |
Ptucha, Raymond | Rochester Inst. of Tech |
Keywords: Deep learning, Classification and clustering, Artificial neural networks
Abstract: Hierarchical decomposition enables an increased number of classes in a classification problem. Class similarities guide the creation of a family of coarse-to-fine classifiers which solve categorical problems more effectively than a single flat classifier. High accuracies require precise configurations for each classifier in the family. This paper proposes a method to adaptively select the configuration of the hierarchical family of classifiers. Linkage statistics from overall and sub-classification confusion matrices define categorical groupings for an efficient and accurate classification framework. Depending on the number of classes and the complexity of the problem, an adaptive configuration manager chooses between a multi-layer perceptron and a deep convolutional neural network, then selects the complexity of each.
|
|
14:00-16:10, Paper ThPT1.17 | |
An Information Theoretic Feature Selection Framework Based on Integer Programming |
Nie, Siqi | RPI |
Gao, Tian | Rensselaer Pol. Inst |
Ji, Qiang | RPI |
Keywords: Dimensionality reduction and manifold learning
Abstract: We propose a general framework for information theoretic feature selection based on integer programming. Filter feature selection methods usually rely on a greedy forward or backward selection heuristic to find a satisfactory set of features, as the exact search is a combinatorial problem. We formulate the existing filter information theoretic criteria as an integer programming problem, and by varying the objective function we can represent many different existing scoring criteria. The integer programming framework can be solved efficiently by existing solvers. We empirically demonstrate the superior performance of the integer programming formulation over its corresponding greedy criterion.
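To make the formulation concrete, here is a toy mRMR-style selection written as an integer program (relevance minus pairwise redundancy, with the product x_i·x_j linearised through auxiliary binaries). It uses the PuLP modelling library and made-up scores; the paper's actual objective encodes existing information-theoretic criteria, which this sketch only imitates.

```python
import pulp

rel = [0.9, 0.8, 0.75, 0.3]                      # toy I(feature; class) scores
red = {(0, 1): 0.7, (0, 2): 0.1, (0, 3): 0.0,
       (1, 2): 0.2, (1, 3): 0.0, (2, 3): 0.0}    # toy I(feature_i; feature_j)
k = 2

prob = pulp.LpProblem("feature_selection", pulp.LpMaximize)
x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(len(rel))]
y = {(i, j): pulp.LpVariable(f"y_{i}_{j}", cat="Binary") for (i, j) in red}
prob += (pulp.lpSum(rel[i] * x[i] for i in range(len(rel)))
         - pulp.lpSum(red[ij] * y[ij] for ij in red))
for (i, j), yij in y.items():
    prob += yij >= x[i] + x[j] - 1               # y_ij = x_i AND x_j (linearised)
prob += pulp.lpSum(x) == k                       # select exactly k features
prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([i for i in range(len(rel)) if x[i].value() == 1])   # -> [0, 2]
```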
|
|
14:00-16:10, Paper ThPT1.18 | |
Nonlinear Dimensionality Reduction by Curvature Minimization
Yoshiyasu, Yusuke | AIST |
Yoshida, Eiichi | AIST |
Attachments: Supplementary material
Keywords: Dimensionality reduction and manifold learning, 2D/3D object detection and recognition, Shape modeling and encoding
Abstract: In this paper, we introduce a nonlinear dimensionality reduction (NLDR) technique that can construct a low-dimensional embedding efficiently and accurately, with low embedding distortions. The key idea is to divide NLDR into nonlinearity reduction and linear dimensionality reduction, which simplifies the overall NLDR process. Nonlinearity reduction is based on the elastic shell model that measures in-plane stretching and bending energy. With this model, we minimize the curvature of the data, which is the source of nonlinearity, while preserving the original intrinsic properties (i.e., local lengths) as much as possible. We discretize and linearize our nonlinearity reduction model such that it leads to an iterative deformation technique that alternates between two steps in order to flatten a manifold: a curvature minimization step that solves a bi-Laplace system and a local length restoration step that solves a Poisson system. We propose an efficient optimization technique for both steps using a direct solver based on Cholesky decomposition, which exploits the fact that the system matrices stay constant; during iterations, we reuse the factorizations obtained once at the beginning and perform back substitutions only. Since our algorithm relies only on local geometric properties, it can accurately embed data with complicated topology. Experimental results show that our algorithm is faster than most other state-of-the-art algorithms and preserves local areas and angles better than previous approaches.
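The "factor once, back-substitute many times" pattern is worth seeing in code. A sketch with SciPy on a toy bi-Laplace system: note that SciPy's factorized() performs an LU decomposition, whereas the paper's Cholesky solver additionally exploits symmetric positive definiteness, and the right-hand-side update here is a stand-in.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import factorized

n = 100
L = sp.diags([1., -2., 1.], [-1, 0, 1], shape=(n, n)).tocsc()   # toy 1D Laplacian
A = (L @ L).tocsc()              # bi-Laplace system matrix, constant across steps
solve = factorized(A)            # factor once, before iterating

x = np.random.rand(n)
for _ in range(10):              # alternating flattening iterations
    b = np.sin(x)                # stand-in for the updated right-hand side
    x = solve(b)                 # back-substitution only, no refactorization
```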
|
|
14:00-16:10, Paper ThPT1.19 | |
Unsupervised Feature Extraction Using a Learned Graph with Clustering Structure |
Zhuge, Wenzhang | National Univ. of Defense Tech |
Hou, Chenping | National Univ. of Defense Tech |
Nie, Feiping | NWPU |
Yi, Dongyun | National Univ. of Defense Tech |
Keywords: Dimensionality reduction and manifold learning, Classification and clustering, Machine learning and data mining
Abstract: Feature extraction, one kind of dimensionality reduction methodology, has aroused considerable research interest during the last few decades. Traditional graph embedding methods construct a fixed graph from the original data to fulfill the aim of feature extraction. The lack of a graph learning mechanism leaves room for improvement of their performance. In this paper, we propose a novel framework, termed unsupervised feature extraction using a learned graph with clustering structure (LGCS), in which a graph learning mechanism is presented. To be specific, the proposed LGCS learns both a transformation matrix and a structured graph which has k connected components (where k is the number of clusters). To show the effectiveness of the framework, we present a method within our framework combining locality preserving projection (LPP) with the graph learning mechanism, and an iterative algorithm is designed to solve the corresponding optimization problem. Promising experimental results on real-world datasets validate the effectiveness of our proposed algorithm.
|
|
14:00-16:10, Paper ThPT1.20 | |
Simultaneous Visualization of Samples, Features and Multi-Labels |
Kudo, Mineichi | Hokkaido Univ |
Kimura, Keigo | Hokkaido Univ |
Haindl, Michael | Inst. of Information Theory and Automation |
Tenmoto, Hiroshi | Kushiro National Coll. of Tech |
|
|
14:00-16:10, Paper ThPT1.21 | |
Simplex-Based Dimension Estimation of Topological Manifolds |
Tasaki, Hajime | Chuo Univ |
Lenz, Reiner | Linköping Univ |
Chao, Jinhui | Department of Information and System Engineering, Chuo Univ |
Keywords: Dimensionality reduction and manifold learning, Machine learning and data mining
Abstract: Dimension reduction is one of the most important issues in machine learning and computational intelligence. Typical data sets are point clouds in a high dimensional space with a hidden structure to be found in low dimensional submanifolds. Finding this intrinsic manifold structure is very important in the understanding of the data and for reducing computational complexity. In this paper, we propose a novel approach for dimension estimation of topological manifolds based on measures of simplices. We also investigate the effects of resolution changes on dimension estimation in the framework of Morse theory. The result is a method that can be used for data located in simplicial complexes of varying dimensions and with no continuous or differentiable structure. The proposed method is applied to images of handwritten digits with known deforming dimensions, data with a nontrivial topology, and noisy data. We compare the estimates with results obtained by local PCA.
|
|
14:00-16:10, Paper ThPT1.22 | |
Robust Unsupervised Feature Selection by Nonnegative Sparse Subspace Learning
Wei, Zheng | Nanjing Univ. of Science and Tech |
Yan, Hui | Nanjing Univ. of Science and Tech |
Yang, Jian | Nanjing Univ. of Science and Tech |
Yang, Jingyu | Nanjing Univ. of Science and Tech |
Attachments: Supplementary material
Keywords: Dimensionality reduction and manifold learning, Machine learning and data mining
Abstract: Sparse subspace learning has been demonstrated to be effective in data mining and machine learning. In this paper, we cast the unsupervised feature selection scenario as a matrix factorization problem from the view of sparse subspace learning. By minimizing the reconstruction residual, the learned feature weight matrix with the l2,1-norm and the non-negative constraints not only removes the irrelevant features, but also captures the underlying low dimensional structure of the data points. Meanwhile, in order to enhance the model's robustness, we solve our problem with an l1-norm error function which is resistant to outliers and sparse noise. An efficient iterative algorithm is introduced to optimize this non-convex and non-smooth objective function, and a proof of its convergence is given. In particular, differing from conventional non-negative update rules, we design a novel multiplicative update rule to iteratively solve for the feature weight matrix, and we validate its non-negativity. Comparative experiments on various original datasets, with and without malicious pollution, demonstrate the performance superiority of our model.
|
|
14:00-16:10, Paper ThPT1.23 | |
Moment-Based Symmetry Detection for Scene Modeling and Recognition Using RGB-D Images |
Su, Jui-Yuan | Ming Chuan Univ |
Cheng, Shyi-Chyi | National Taiwan Ocean Univ., Taiwan
Hsieh, Jun-Wei | National Taiwan Ocean Univ
Hsu, Tzu-Hao | National Taiwan Ocean Univ |
Keywords: Dimensionality reduction and manifold learning, Representation and analysis in pixel/voxel images, Classification and clustering
Abstract: In this paper we present a novel unsupervised feature representation obtained by extracting salient symmetries in RGB-D images using the proposed moment-based symmetric patch detector. A fast indexing structure is also derived to group local symmetric patches into semantically meaningful symmetric parts. Given an RGB-D image, the hash-based symmetric patch indexing speeds up the search for symmetric patch pairs, which are further grouped into symmetric parts with nearly linear time complexity. In the context of symmetry matching and scene classification, the second part of this work presents a symmetry-based scene modeling, aiming at computing a robust part-based feature set for each image category. To verify the effectiveness of the symmetry detector, based on the pre-learned part-based scene model, a part-based voting scheme is constructed to annotate the scene type of the input RGB-D image. Experimental results show that the proposed approach outperforms the compared methods in terms of detection and recognition accuracy on publicly available datasets.
|
|
14:00-16:10, Paper ThPT1.24 | |
Unsupervised Object Counting without Object Recognition
Katsuki, Takayuki | IBM Res. - Tokyo |
Morimura, Tetsuro | IBM Res. - Tokyo |
Ide, Tsuyoshi | T. J. Watson Res. Center |
Attachments: Supplementary material
Keywords: Machine learning and data mining, Classification and clustering, Pattern Recognition for Surveillance and Security
Abstract: This paper addresses the problem of object counting, which is to estimate the number of objects of interest from an input observation. We formalize the problem as a posterior inference of the count by introducing a particular type of Gaussian mixture for the input observation, whose mixture indexes correspond to the count. Unlike existing approaches in image analysis, which typically perform explicit object detection using labeled training images, our approach does not need any labeled training data. Our idea is to use the stick-breaking process as a constraint to make it possible to interpret the mixture indexes as the count. We apply our method to the problem of counting vehicles in real-world web camera images and demonstrate that the accuracy and robustness of the proposed approach without any labeled training data are comparable to those of supervised alternatives.
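The stick-breaking construction the abstract leans on is compact enough to show directly: it generates mixture weights whose expected mass decays with the component index, which is what lets the mixture index be read as a count. A toy NumPy version, truncated at k components (values illustrative only):

```python
import numpy as np

def stick_breaking(alpha, k, seed=0):
    rng = np.random.default_rng(seed)
    betas = rng.beta(1.0, alpha, size=k)                          # stick break points
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    return betas * remaining                                      # ordered weights

w = stick_breaking(alpha=2.0, k=8)
print(w.round(3), w.sum())   # expected mass decays with index; sum < 1 (truncated)
```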
|
|
14:00-16:10, Paper ThPT1.25 | |
MCNC: Multi-Channel Nonparametric Clustering from Heterogeneous Data |
Nguyen, Thanh-Binh | Deakin Univ |
Nguyen, Vu | Deakin Univ |
Venkatesh, Svetha | Deakin Univ |
Phung, Dinh | Deakin Univ |
Keywords: Machine learning and data mining, Classification and clustering, Statistical, syntactic and structural pattern recognition
Abstract: Bayesian nonparametric (BNP) models have recently become popular due to their flexibility in identifying the unknown number of clusters. However, they have difficulty handling heterogeneous data from multiple sources. Existing BNP methods either treat each of these sources independently – and hence do not benefit from the correlating information between them – or require data sources to be explicitly specified as primary and context channels. In this paper, we present a BNP framework, termed MCNC, which has the ability to (1) discover co-patterns from multiple sources; (2) explore multi-channel data simultaneously and treat them equally; (3) automatically identify a suitable number of patterns from the data; and (4) handle missing data. The key idea is to utilize a richer base measure of a BNP model, namely a product-space. We demonstrate our framework on synthetic and real-world datasets to discover identity–location–time (a.k.a. who–where–when) patterns. The experimental results highlight the effectiveness of our MCNC framework in both cases of complete and missing data.
|
|
14:00-16:10, Paper ThPT1.26 | |
Witness Identification in Multiple Instance Learning Using Random Subspaces |
Carbonneau, Marc-André | École de Tech. Supérieure
Granger, Eric | École de Tech. Supérieure
Gagnon, Ghyslain | École de Tech. Supérieure
Keywords: Classification and clustering, Semi-supervised learning and spectral methods
Abstract: Multiple instance learning (MIL) is a form of weakly-supervised learning where instances are organized in bags. A label is provided for bags, but not for instances. The MIL literature typically focuses on the classification of bags seen as one object, or as a combination of their instances. In both cases, performance is generally measured using labels assigned to entire bags. In this paper, the MIL problem is formulated as a knowledge discovery task for which algorithms seek to discover the witnesses (i.e. identify the positive instances), using the weak supervision provided by bag labels. Some MIL methods are suitable for instance classification, but perform poorly in applications where the witness rate is low, or when the positive class distribution is multimodal. A new method that clusters data projected into random subspaces is proposed to perform witness identification in these adverse settings. The proposed method is assessed on MIL data sets from three application domains, and compared to 7 reference MIL algorithms on the witness identification task. The proposed algorithm consistently ranks among the best methods in all experiments, while all other methods perform unevenly across data sets.
|
|
14:00-16:10, Paper ThPT1.28 | |
Joint K-Means Quantization for Approximate Nearest Neighbor Search |
Ozan, Ezgi Can | Tampere Univ. of Tech |
Kiranyaz, Serkan | Tampere Univ. of Tech |
Gabbouj, Moncef | Tampere Univ. of Tech
Keywords: Machine learning and data mining, Multimedia analysis, indexing and retrieval, Segmentation, features and descriptors
Abstract: Recently, Approximate Nearest Neighbor (ANN) search has become a very popular approach for similarity search on large-scale datasets. In this paper, we propose a novel vector quantization method for ANN, which introduces a joint multi-layer K-Means clustering solution for determining the codebooks. The performance of the proposed method is further improved by a joint encoding scheme. Experimental results verify the success of the proposed algorithm, as it outperforms the state-of-the-art methods.
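The multi-layer structure is essentially residual quantization: each layer's codebook quantizes what the previous layer left unexplained. A greedy scikit-learn sketch of that structure (the paper's contribution is to optimize the layers jointly and to encode jointly, which this simplified version omits):

```python
import numpy as np
from sklearn.cluster import KMeans

def train_residual_codebooks(X, layers=2, k=16, seed=0):
    codebooks, residual = [], X.copy()
    for _ in range(layers):
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(residual)
        codebooks.append(km.cluster_centers_)
        residual = residual - km.cluster_centers_[km.labels_]   # quantization error
    return codebooks

X = np.random.rand(500, 32)
print([cb.shape for cb in train_residual_codebooks(X)])   # [(16, 32), (16, 32)]
```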
|
|
14:00-16:10, Paper ThPT1.29 | |
Semi-Supervised Learning Competence of Classifiers Based on Graph for Dynamic Classifier Selection |
Hou, Cui qin | Fujitsu R&D Center Co. Ltd |
Xia, Yingju | Information Tech. Lab., Fujitsu Res. & Development
Xu, Zhuo ran | Fujitsu R&D Center Co. Ltd |
Sun, Jun | Fujitsu R&D Center Co., LTD |
Keywords: Machine learning and data mining, Statistical, syntactic and structural pattern recognition
Abstract: Classifier competence is critically important for dynamic classifier selection. This study proposes a semi-supervised learning algorithm that learns the competence of classifiers under the proposed graph-based optimization framework. First, it constructs a graph based on the training data and some unlabeled data. Then, it iteratively learns the competence of the classifiers. The learned competence not only reflects the competitiveness of the classifiers, but also varies smoothly over neighboring data. Experimental results on five different datasets show that dynamic classifier selection systems using the learned classifier competence perform better than systems using local accuracy as the classifier competence.
|
|
14:00-16:10, Paper ThPT1.30 | |
Learning Tubes |
Ulm, Michael | Austrian Inst. of Tech |
Braendle, Norbert | Austrian Inst. of Tech |
Keywords: Machine learning and data mining, Statistical, syntactic and structural pattern recognition
Abstract: We present a new method for analyzing data manifolds based on Weyl's tube theorem. The coefficients of the tube polynomial for a manifold provide geometric information such as the volume of the manifold or its Euler characteristic, thus providing bounds on the geometric nature of the manifold. We present an algorithm that estimates the coefficients of the tube polynomial for a given manifold and demonstrate its features on artificial datasets. We apply the algorithm to a real-world traffic dataset to determine the number and properties of clusters. We furthermore demonstrate that our algorithm can be used to determine the image coverage of an object, giving hints on where a manifold is not sufficiently sampled.
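For orientation, Weyl's theorem states that for a compact m-dimensional submanifold M of R^n, the volume of the tube of radius r around M is, for sufficiently small r, a polynomial in r:

\[
V_M(r) \;=\; \sum_{c=0}^{\lfloor m/2 \rfloor} k_{2c}(M)\, r^{\,n-m+2c},
\]

where (up to dimensional constants) k_0(M) is proportional to the volume of M and, for even m, the top coefficient is proportional to the Euler characteristic of M – exactly the invariants the abstract extracts from the estimated coefficients.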
|
|
14:00-16:10, Paper ThPT1.31 | |
Bayesian Nonparametric Multiple Instance Regression |
Subramanian, Saravanan | Deakin Univ |
Rana, Santu | Deakin Univ |
Gupta, Sunil Kumar | Deakin Univ |
Bagavathi Sivakumar, P | Dept. of Computer Science and Engineering, Amrita School of Engineering
Velayutham, Shunmuga | Dept. of Computer Science and Engineering, Amrita School of Engineering
Venkatesh, Svetha | Deakin Univ |
Keywords: Machine learning and data mining, Statistical, syntactic and structural pattern recognition
Abstract: Multiple Instance Regression jointly models a set of instances and its corresponding real-valued output. We present a novel multiple instance regression model that infers the subset of instances in each bag that best describes the bag label, and uses them to learn a predictive model in a unified framework. We assume that instances in each bag are drawn from a mixture distribution and thus naturally form groups, and that instances from one of these groups explain the bag label. The largest cluster is assumed to be correlated with the label. We evaluate this model on crop yield prediction and aerosol depth prediction problems. The predictive accuracy of our model is better than that of the state-of-the-art MIR methods.
|
|
14:00-16:10, Paper ThPT1.33 | |
Bayesian Approach to Learn Bayesian Networks Using Data and Constraints |
Gao, Xiao-guang | Northwestern Pol. Univ |
Yang, Yu | Northwestern Pol. Univ |
Guo, Zhigao | Northwestern Pol. Univ |
Chen, Da-qing | London South Bank Univ |
Keywords: Model selection, Machine learning and data mining
Abstract: One of the essential problems in Bayesian networks (BNs) is parameter learning. When purely data-driven methods fail to work, incorporating supplemental information, such as expert judgments, can improve the learning of BN parameters. In practice, expert judgments are provided and transformed into qualitative parameter constraints. Moreover, prior distributions of BN parameters are also useful information. In this paper we propose a Bayesian approach to learn parameters from small datasets by integrating both parameter constraints and prior distributions. First, the feasible parameter region is derived from the constraints. Then, using the prior distribution, a posterior distribution over the feasible region is developed based on Bayes' theorem. Finally, the parameter estimates are taken as the mean values of the posterior distribution. Learning experiments on standard BNs reveal that the proposed method outperforms most of the existing methods.
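A toy rendering of "mean of the posterior restricted to the feasible region", using rejection sampling on a single conditional-probability entry; the Beta posterior and the constraint threshold are made up for illustration (the paper derives the feasible region from qualitative expert constraints and works with the exact posterior):

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.beta(3 + 4, 2 + 6, size=100_000)  # toy Beta posterior (prior + counts)
feasible = samples[samples > 0.6]               # expert constraint: theta > 0.6
print(feasible.mean())                          # mean of the constrained posterior
```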
|
|
14:00-16:10, Paper ThPT1.34 | |
True-Negative Label Selection for Large-Scale Multi-Label Learning |
Kanehira, Atsushi | Univ. of Tokyo |
Shin, Andrew | The Univ. of Tokyo |
Harada, Tatsuya | The Univ. of Tokyo |
Keywords: Multimedia analysis, indexing and retrieval, Image and video analysis and understanding, Machine learning and data mining
Abstract: In this paper, we focus on training a classifier from large-scale data with incompletely assigned labels. In other words, we treat samples with the following properties: 1. assigned labels are definitely positive, 2. absent labels are not necessarily negative, and 3. samples are allowed to take more than one label. These properties are frequently found in various kinds of computer vision tasks, including image and video classification and retrieval. Many online algorithms for the multi-label task employ label sampling, which selects a label pair that reduces the largest penalty to update the model, thereby avoiding wasted computation. In the setting above, however, there are “false-negative” labels, which are originally positive labels but regarded as negative. Since it is highly likely for label sampling to select these labels as the negative labels in the sampled pair, it may severely degrade classification performance. In order to solve this problem while preserving the convergence property of the online algorithms, we propose a novel label sampling approach, which aims to fetch “true-negative” labels via a false-negativeness measure based on independently trained uni-class classifiers. Experimental results show the effectiveness of our approach.
|
|
14:00-16:10, Paper ThPT1.35 | |
Learning Data-Driven Image Similarity Measure |
Kobayashi, Takumi | National Inst. of Advanced Industrial Science And |
Keywords: Representation and analysis in pixel/voxel images, Statistical, syntactic and structural pattern recognition, Segmentation, features and descriptors
Abstract: Image quality assessment has gained greater interest due to the development of digital imaging and storage. In that field, the structural similarity (SSIM) index has been shown to agree favorably with human perceptual assessment, significantly outperforming mean squared error, i.e., L2 distance. The similarity measure function in SSIM, which compares a target (distorted) image with its reference (original) image, is hand-crafted in a simple form via a top-down approach based on the human visual system. It might, however, lack optimality, as it does not directly consider the relationship between image data and perceptual assessment (scores). In this paper, we propose a method to construct an image similarity measure based on actual data. The proposed method optimizes a similarity measure function by exploiting annotated data in a bottom-up, data-driven manner, while retaining the favorable structural-similarity property of SSIM. The non-linear similarity function is optimized to a global optimum with high generalization power. In addition, the proposed method is simply formulated and thus applicable to the family of SSIM, especially to FSIM, which has been proposed recently and exhibits performance superior to SSIM. Experimental results on image quality assessment demonstrate the effectiveness of the proposed method compared to the other methods.
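For reference, the hand-crafted measure the paper starts from is the standard SSIM comparison over local windows,

\[
\mathrm{SSIM}(x, y) \;=\; \frac{\left(2\mu_x \mu_y + C_1\right)\left(2\sigma_{xy} + C_2\right)}{\left(\mu_x^2 + \mu_y^2 + C_1\right)\left(\sigma_x^2 + \sigma_y^2 + C_2\right)},
\]

where \mu, \sigma^2 and \sigma_{xy} are the local means, variances and covariance of the two images and C_1, C_2 are small stabilizing constants; the proposed method replaces this fixed functional form with one learned from annotated scores.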
|
|
14:00-16:10, Paper ThPT1.36 | |
Information-Theoretic Atomic Representation for Robust Pattern Classification |
Wang, Yulong | Univ. of Macau |
Tang, YuanYan | Univ. of Macau
Li, Luoqing | Hubei Univ |
Wang, Patrick | Northeastern Univ |
Keywords: Classification and clustering, Face recognition, Handwriting Recognition
Abstract: Representation-based classifiers (RCs), including the sparse RC (SRC), have attracted intensive interest in pattern recognition in recent years. In our previous work, we proposed a general framework called the atomic representation-based classifier (ARC), which includes many popular RCs as special cases. Despite this empirical success, ARC and conventional RCs utilize the mean square error (MSE) criterion and assign the same weights to all entries of the test data, including both severely corrupted and clean ones. This makes ARC sensitive to entries with large noise and outliers. In this work, we propose an information-theoretic ARC (ITARC) framework to alleviate this limitation of ARC. Using ITARC as a general platform, we develop three novel representation-based classifiers. Experiments on public real-world datasets demonstrate the efficacy of ITARC for robust pattern recognition.
|
|
14:00-16:10, Paper ThPT1.37 | |
Fully Automatic Image Colorization Based on Convolutional Neural Network |
Varga, Domonkos | Inst. for Computer Science and Control, Hungarian Acad. of Sciences
Sziranyi, Tamas | MTA SZTAKI
Keywords: Artificial neural networks, Deep learning, Texture and color analysis
Abstract: This paper deals with automatic image colorization. This is a very difficult task, since it is an ill-posed problem that usually requires user intervention to achieve high quality. A fully automatic approach is proposed that is able to produce realistic colorization of an input grayscale image. Motivated by the recent success of deep learning techniques in image processing, we propose a feed-forward, two-stage architecture based on a Convolutional Neural Network that predicts the U and V color channels. Unlike most previous work, this paper presents a fully automatic colorization which is able to produce high-quality and realistic colorization even of complex scenes. Comprehensive experiments and qualitative and quantitative evaluations were conducted on images of the SUN database and on other images. We found that Quaternion Structural Similarity (QSSIM) provides, to some degree, a good basis for quantitative evaluation, which is why we chose QSSIM as a quality index for colorization.
|
|
14:00-16:10, Paper ThPT1.38 | |
Integrating Deep Features for Material Recognition |
Zhang, Yan | Tohoku Univ |
Ozay, Mete | Tohoku Univ |
Liu, Xing | Tohoku Univ
Okatani, Takayuki | Tohoku Univ |
Keywords: Deep learning
Abstract: This paper considers the problem of material recognition. Motivated by the observation of close interconnections between material and object recognition, we study how to select and integrate multiple features obtained by different models of Convolutional Neural Networks (CNNs) trained in a transfer learning setting. To be specific, we first compute activations of features using representations on images to select a set of samples which are best represented by the features. Then, we measure the uncertainty of the features by computing the entropy of the class distributions for each sample set. Finally, we compute the contribution of each feature to the representation of classes for feature selection and integration. Experimental results show that the proposed method achieves state-of-the-art performance on two benchmark datasets for material recognition. Additionally, we introduce a new material dataset, named EFMD, which extends the Flickr Material Database (FMD). By employing EFMD for transfer learning, we achieve 84.0±1.8% accuracy on the FMD dataset, which is close to the reported human performance of 84.9%.
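The entropy-based uncertainty measure is a small, self-contained computation. A hedged NumPy sketch (toy labels; in the paper the label sets come from the samples each CNN feature represents best):

```python
import numpy as np

def feature_entropy(best_sample_labels, num_classes):
    """Entropy of the class distribution of a feature's best-represented samples."""
    p = np.bincount(best_sample_labels, minlength=num_classes) / len(best_sample_labels)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

print(feature_entropy(np.array([0, 0, 1, 0]), 3))   # peaked -> low uncertainty
print(feature_entropy(np.array([0, 1, 2, 1]), 3))   # spread -> higher uncertainty
```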
|