Seung-Geon Lee, Jaedeok Kim, Hyun-Joo Jung, Yoonsuck Choe
Estimating the relative importance of each sample in a training set has
important practical and theoretical value, such as in importance sampling or
curriculum learning. This kind of focus on individual samples invokes the
concept of sample-wise learnability: How easy is it to correctly learn each
sample (cf. PAC learnability)? In this paper, we approach the sample-wise
learnability problem within a deep learning context. We propose a measure of
the learnability of a sample with a given deep neural network (DNN) model. The
basic idea is to train the given model on the training set, and for each
sample, aggregate the hits and misses over all training epochs. Our
experiments show that the sample-wise learnability measure collected this way
is highly linearly correlated across different DNN models (ResNet-20, VGG-16,
and MobileNet), suggesting that such a measure can provide deep general
insights on the data's properties. We expect our method to help develop better
curricula for training, and help us better understand the data itself.
Authors' comments: Accepted to AAAI 2019 Student Abstract
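The aggregation of hits and misses described above can be sketched as follows. The abstract does not give the exact formula, so this minimal example assumes the score is simply the fraction of epochs in which a sample was classified correctly; the function names and the input layout are hypothetical.

```python
import numpy as np

def samplewise_learnability(correct_per_epoch):
    """Fraction of epochs in which each sample was classified correctly.

    correct_per_epoch: array-like of shape (n_epochs, n_samples) holding
    1/True for a hit and 0/False for a miss, one row per training epoch.
    Assumption: the score is the mean hit rate over epochs.
    """
    hits = np.asarray(correct_per_epoch, dtype=float)
    return hits.mean(axis=0)

def learnability_correlation(scores_a, scores_b):
    """Pearson correlation between the scores obtained from two models."""
    return float(np.corrcoef(scores_a, scores_b)[0, 1])
```

Scores computed this way for two different models can be compared with `learnability_correlation` to check for the kind of linear relationship the abstract reports.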
Sida Peng, Yuan Liu, Qixing Huang, Hujun Bao, Xiaowei Zhou
This paper addresses the challenge of 6DoF pose estimation from a single RGB
image under severe occlusion or truncation. Many recent works have shown that a
two-stage approach, which first detects keypoints and then solves a
Perspective-n-Point (PnP) problem for pose estimation, achieves remarkable
performance. However, most of these methods only localize a set of sparse
keypoints by regressing their image coordinates or heatmaps, which are
sensitive to occlusion and truncation. Instead, we introduce a Pixel-wise
Voting Network (PVNet) to regress pixel-wise unit vectors pointing to the
keypoints and use these vectors to vote for keypoint locations using RANSAC.
This creates a flexible representation for localizing occluded or truncated
keypoints. Another important feature of this representation is that it provides
uncertainties of keypoint locations that can be further leveraged by the PnP
solver. Experiments show that the proposed approach outperforms the state of
the art on the LINEMOD, Occlusion LINEMOD and YCB-Video datasets by a large
margin, while being efficient for real-time pose estimation. We further create
a Truncation LINEMOD dataset to validate the robustness of our approach against
truncation. The code will be available at https://zju-3dv.github.io/pvnet/.
Authors' comments: The first two authors contributed equally to this paper. Project
page: https://zju-3dv.github.io/pvnet/
Nian Liu, Junwei Han, Ming-Hsuan Yang
In saliency detection, every pixel needs contextual information to make its saliency prediction. Previous models usually incorporate contexts holistically. However, for each pixel, usually only part of its context region is useful and contributes to its prediction, while other parts may act as noise and distractions. In this paper, we propose a novel pixel-wise contextual attention network, i.e., PiCANet, to learn to selectively attend to informative context locations at each pixel. Specifically, PiCANet generates an attention map over the context region of each pixel, where each attention weight corresponds to the relevance of a context location w.r.t. the referred pixel. Then, attentive contextual features can be constructed by selectively incorporating the features of useful context locations with the learned attention. We propose three specific formulations of the PiCANet by embedding the pixel-wise contextual attention mechanism into the pooling and convolution operations, attending to global or local contexts. All three models are fully differentiable and can be integrated with CNNs for joint training. We introduce the proposed PiCANets into a U-Net architecture for salient object detection. Experimental results indicate that the proposed PiCANets can significantly improve the saliency detection performance. The generated global and local attention can learn to incorporate global contrast and smoothness, respectively, which helps localize salient objects more accurately and highlight them more uniformly. Consequently, our saliency model performs favorably against other state-of-the-art methods. Moreover, we validate that PiCANets can also improve semantic segmentation and object detection performance, which further demonstrates their effectiveness and generalization ability.
Martin Mundt, Sagnik Majumder, Tobias Weis, Visvanathan Ramesh
We characterize convolutional neural networks with respect to the relative
amount of features per layer. Using a skew normal distribution as a
parametrized framework, we investigate the common assumption of monotonically
increasing feature-counts with higher layers of architecture designs. Our
evaluation on models with VGG-type layers on the MNIST, Fashion-MNIST and
CIFAR-10 image classification benchmarks provides evidence that motivates
rethinking of our common assumption: architectures that favor larger early
layers seem to yield better accuracy.
Authors' comments: Accepted at the Critiquing and Correcting Trends in Machine Learning
(CRACT) Workshop at the 32nd Conference on Neural Information Processing
Systems (NeurIPS 2018)
Chengyue Gong, Xu Tan, Di He, Tao Qin
Maximum-likelihood estimation (MLE) is widely used in sequence to sequence
tasks for model training. It uniformly treats the generation/prediction of each
target token as multi-class classification, and yields non-smooth prediction
probabilities: in a target sequence, some tokens are predicted with small
probabilities while other tokens are predicted with large probabilities. According to our
empirical study, we find that the non-smoothness of the probabilities results
in low quality of generated sequences. In this paper, we propose a
sentence-wise regularization method which aims to output smooth prediction
probabilities for all the tokens in the target sequence. Our proposed method
can automatically adjust the weights and gradients of each token in one
sentence so that all predictions in a sequence are uniformly good. Experiments on
three neural machine translation tasks and one text summarization task show
that our method outperforms conventional MLE loss on all these tasks and
achieves promising BLEU scores on WMT14 English-German and WMT17
Chinese-English translation tasks.
Authors' comments: AAAI 2019
Alexey Kruglov
Neural network pruning is an important step in the design process of efficient neural networks for edge devices with limited computational power. Pruning is a form of knowledge transfer from the weights of the original network to a smaller target subnetwork. We propose a new method for compute-constrained structured channel-wise pruning of convolutional neural networks. The method iteratively fine-tunes the network, while gradually tapering the computation resources available to the pruned network via a holonomic constraint in the framework of the method of Lagrange multipliers. An explicit and adaptive automatic control over the rate of tapering is provided. The trainable parameters of our pruning method are separate from the weights of the neural network, which allows us to avoid interference with the neural network solver (e.g. avoid the direct dependence of pruning speed on neural network learning rates). Our method takes a 'rigoristic' approach through the direct application of constrained optimization, avoiding the pitfalls of ADMM-based methods, such as their need to define the target amount of resources for each pruning run, and the direct dependence of pruning speed and priority of pruning on the relative scale of weights between layers. For VGG-16 @ ILSVRC-2012, we achieve a reduction of 15.47 -> 3.87 GMAC with only 1% top-1 accuracy reduction (68.4% -> 67.4%). For AlexNet @ ILSVRC-2012, we achieve 0.724 -> 0.411 GMAC with 1% top-1 accuracy reduction (56.8% -> 55.8%).
J. I. Penney, A. W. Blain, D. Wylezalek, N. A. Hatch, C. Lonsdale, A. Kimball, R. J. Assef, J. J. Condon et al.
We have observed the environments of a population of 33 heavily dust
obscured, ultra-luminous, high-redshift galaxies, selected using WISE and NVSS
at $z>$1.3 with the Infra-Red Array Camera on the $Spitzer$ Space Telescope
over $\rm5.12\,'\times5.12\,'$ fields. Colour selections are used to quantify
any potential overdensities of companion galaxies in these fields. We find no
significant excess of galaxies with the standard colour selection for IRAC
colours of $\rm[3.6]-[4.5]>-0.1$ consistent with galaxies at $z>$1.3 across the
whole fields with respect to wide-area $Spitzer$ comparison fields, but there
is a $\rm>2\sigma$ statistical excess within $\rm0.25\,'$ of the central
radio-WISE galaxy. Using a colour selection of $\rm[3.6]-[4.5]>0.4$, 0.5
magnitudes redder than the standard method of selecting galaxies at $z>$1.3, we
find a significant overdensity, in which $\rm76\%$ ($\rm33\%$) of the 33 fields
have a surface density greater than the $\rm3\sigma$ ($\rm5\sigma$) level.
There is a statistical excess of these redder galaxies within $\rm0.5\,'$,
rising to a central peak $\rm\sim2$--4 times the average density. This implies
that these galaxies are statistically linked to the radio-WISE selected galaxy,
indicating similar structures to those traced by red galaxies around radio-loud
AGN.
Authors' comments: 17 pages, 16 figures, 2 tables
Yuefeng Liang, Cho-Jui Hsieh, Thomas C. M. Lee
Extreme multi-label classification aims to learn a classifier that annotates an instance with a relevant subset of labels from an extremely large label set. Many existing solutions embed the label matrix to a low-dimensional linear subspace, or examine the relevance of a test instance to every label via a linear scan. In practice, however, those approaches can be computationally exorbitant. To alleviate this drawback, we propose a Block-wise Partitioning (BP) pretreatment that divides all instances into disjoint clusters, to each of which the most frequently tagged label subset is attached. One multi-label classifier is trained on one pair of instance and label clusters, and the label set of a test instance is predicted by first delivering it to the most appropriate instance cluster. Experiments on benchmark multi-label data sets reveal that BP pretreatment significantly reduces prediction time, and retains almost the same level of prediction accuracy.
Dong Huang, Chang-Dong Wang, Hongxing Peng, Jianhuang Lai, Chee-Keong Kwoh
Ensemble clustering has been a popular research topic in data mining and
machine learning. Despite its significant progress in recent years, there are
still two challenging issues in the current ensemble clustering research.
First, most of the existing algorithms tend to investigate the ensemble
information at the object-level, yet often lack the ability to explore the rich
information at higher levels of granularity. Second, they mostly focus on the
direct connections (e.g., direct intersection or pair-wise co-occurrence) in
the multiple base clusterings, but generally neglect the multi-scale indirect
relationship hidden in them. To address these two issues, this paper presents a
novel ensemble clustering approach based on fast propagation of cluster-wise
similarities via random walks. We first construct a cluster similarity graph
with the base clusters treated as graph nodes and the cluster-wise Jaccard
coefficient exploited to compute the initial edge weights. Upon the constructed
graph, a transition probability matrix is defined, based on which the random
walk process is conducted to propagate the graph structural information.
Specifically, by investigating the propagating trajectories starting from
different nodes, a new cluster-wise similarity matrix can be derived by
considering the trajectory relationship. Then, the newly obtained cluster-wise
similarity matrix is mapped from the cluster-level to the object-level to
achieve an enhanced co-association (ECA) matrix, which is able to
simultaneously capture the object-wise co-occurrence relationship as well as
the multi-scale cluster-wise relationship in ensembles. Finally, two novel
consensus functions are proposed to obtain the consensus clustering result.
Extensive experiments on a variety of real-world datasets have demonstrated the
effectiveness and efficiency of our approach.
Authors' comments: To appear in IEEE Transactions on Systems, Man, and Cybernetics:
Systems. The MATLAB source code of this work is available at:
http://www.researchgate.net/publication/328581758
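The first step described above, building a cluster similarity graph with Jaccard edge weights, can be sketched in a few lines. This is an illustrative sketch, not the authors' MATLAB code: the function names are hypothetical, and the row normalization used for the transition matrix is an assumption, since the abstract does not state how the transition probabilities are defined.

```python
def jaccard(cluster_a, cluster_b):
    """Jaccard coefficient |A & B| / |A | B| between two clusters of object ids."""
    a, b = set(cluster_a), set(cluster_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_similarity_graph(base_clusterings):
    """Treat every cluster of every base clustering as a graph node.

    Returns the list of clusters and a dense matrix of Jaccard edge
    weights (no self-loops).
    """
    clusters = [set(c) for clustering in base_clusterings for c in clustering]
    n = len(clusters)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                weights[i][j] = jaccard(clusters[i], clusters[j])
    return clusters, weights

def transition_matrix(weights):
    """Assumption: row-normalize edge weights to get random-walk probabilities."""
    rows = []
    for row in weights:
        total = sum(row)
        rows.append([w / total if total > 0 else 0.0 for w in row])
    return rows
```

A random walk over this matrix would then propagate similarity between clusters that share no objects directly, which is the indirect relationship the abstract refers to.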
Sharif Rahman
This paper puts forward a new generalized polynomial dimensional
decomposition (PDD), referred to as GPDD, comprising hierarchically ordered
measure-consistent multivariate orthogonal polynomials in dependent random
variables. Unlike the existing PDD, which is valid strictly for independent
random variables, no tensor-product structure is assumed or required. Important
mathematical properties of GPDD are studied by constructing dimension-wise
decomposition of polynomial spaces, deriving statistical properties of random
orthogonal polynomials, demonstrating completeness of orthogonal polynomials
for prescribed assumptions, and proving mean-square convergence to the correct
limit, including when there are infinitely many random variables. The GPDD
approximation proposed should be effective in solving high-dimensional
stochastic problems subject to dependent variables.
Authors' comments: 24 pages, two tables. arXiv admin note: substantial text overlap with
arXiv:1804.01647, arXiv:1804.05676
Qiaoying Huang, Dong Yang, Pengxiang Wu, Hui Qu, Jingru Yi, Dimitris Metaxas
We consider an MRI reconstruction problem with input of k-space data at a
very low undersampled rate. This can practically benefit patients by reducing
MRI scan time, but it is also challenging since the quality of reconstruction
may be compromised. Currently, deep learning based methods dominate MRI
reconstruction over traditional approaches such as Compressed Sensing, but they
rarely show satisfactory performance in the case of low undersampled k-space
data. One explanation is that these methods treat channel-wise features
equally, which results in degraded representation ability of the neural
network. To solve this problem, we propose a new model called MRI Cascaded
Channel-wise Attention Network (MICCAN), highlighted by three components: (i) a
variant of U-net with Channel-wise Attention (UCA) module, (ii) a long skip
connection and (iii) a combined loss. Our model is able to attend to salient
information by filtering irrelevant features and also concentrate on
high-frequency information by forcing low-frequency information to be bypassed to
the final output. We conduct both quantitative evaluation and qualitative
analysis of our method on a cardiac dataset. The experiment shows that our
method achieves very promising results in terms of three common metrics on the
MRI reconstruction with low undersampled k-space data.
Authors' comments: Accepted by the IEEE International Symposium on Biomedical Imaging
(ISBI) 2019. Code is available now
Animesh Koratana, Daniel Kang, Peter Bailis, Matei Zaharia
Knowledge distillation (KD) is a popular method for reducing the computational overhead of deep network inference, in which the output of a teacher model is used to train a smaller, faster student model. Hint training (i.e., FitNets) extends KD by regressing a student model's intermediate representation to a teacher model's intermediate representation. In this work, we introduce bLock-wise Intermediate representation Training (LIT), a novel model compression technique that extends the use of intermediate representations in deep network compression, outperforming KD and hint training. LIT has two key ideas: 1) LIT trains a student of the same width (but shallower depth) as the teacher by directly comparing the intermediate representations, and 2) LIT uses the intermediate representation from the previous block in the teacher model as an input to the current student block during training, avoiding unstable intermediate representations in the student network. We show that LIT provides substantial reductions in network depth without loss in accuracy -- for example, LIT can compress a ResNeXt-110 to a ResNeXt-20 (5.5x) on CIFAR10 and a VDCNN-29 to a VDCNN-9 (3.2x) on Amazon Reviews without loss in accuracy, outperforming KD and hint training in network size for a given accuracy. We also show that applying LIT to identical student/teacher architectures increases the accuracy of the student model above the teacher model, outperforming the recently-proposed Born Again Networks procedure on ResNet, ResNeXt, and VDCNN. Finally, we show that LIT can effectively compress GAN generators, which are not supported in the KD framework because GANs output pixels as opposed to probabilities.
Bartlomiej Waclaw, Justyna Cholewa-Waclaw, Philip Greulich
We propose an extension of the totally asymmetric simple exclusion process
(TASEP) in which particles hopping along a lattice can be blocked by obstacles
that dynamically attach/detach from lattice sites. The model can be thought of as
TASEP with site-wise dynamic disorder. We consider two versions of defect
dynamics: (i) defects can bind to any site, irrespective of particle
occupation, (ii) defects only bind to sites which are not occupied by particles
(particle-obstacle exclusion). In case (i) there is a symmetric, parabolic-like
relationship between the current and particle density, as in the standard
TASEP. Case (ii) leads to a skewed relationship for slow defect dynamics. We
also show that the presence of defects induces particle clustering, despite the
translation invariance of the system. For open boundaries the same three phases
as for the standard TASEP are observed, although the position of phase boundaries
is affected by the presence of obstacles. We develop a simple mean-field theory
that captures the model's quantitative behaviour for periodic and open boundary
conditions and yields good estimates for the current-density relationship, mean
cluster sizes and phase boundaries. Lastly, we discuss an application of the
model to the biological process of gene transcription.
Authors' comments: submitted to J. Phys. A
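The two defect-binding rules can be illustrated with a small simulation on a ring. This is a rough sketch under several assumptions that the abstract does not specify: random-sequential updates, a defect blocking hops into the site it occupies, and per-update probabilities `k_on`, `k_off` and `p_hop`. All names are hypothetical.

```python
import random

def tasep_sweep(particles, defects, p_hop, k_on, k_off, exclusion, rng):
    """One random-sequential sweep of TASEP with dynamic defects on a ring.

    particles, defects: lists of booleans, one entry per lattice site
    (modified in place).
    exclusion=False: case (i), defects may bind to any site.
    exclusion=True:  case (ii), defects bind only to sites without a particle.
    Assumption: a defect at site j blocks hops into site j.
    """
    n = len(particles)
    for _ in range(n):
        i = rng.randrange(n)
        # Defect dynamics at the chosen site.
        if defects[i]:
            if rng.random() < k_off:
                defects[i] = False
        elif rng.random() < k_on and not (exclusion and particles[i]):
            defects[i] = True
        # Particle hop from site i to its right neighbour (periodic boundary).
        j = (i + 1) % n
        if particles[i] and not particles[j] and not defects[j]:
            if rng.random() < p_hop:
                particles[i], particles[j] = False, True
```

Counting successful hops per sweep at different particle densities would give an estimate of the current-density relationship for each of the two cases.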
M. P. B. Gallaugher, C. Biernacki, P. D. McNicholas
In recent years, data dimensionality has increasingly become a concern,
leading to many parameter and dimension reduction techniques being proposed in
the literature. A parameter-wise co-clustering model, for data modelled via
continuous random variables, is presented. The proposed model, although
allowing more flexibility, still maintains the very high degree of parsimony
achieved by traditional co-clustering. A stochastic expectation-maximization
(SEM) algorithm along with a Gibbs sampler is used for parameter estimation and
an integrated complete log-likelihood criterion is used for model selection.
Simulated and real datasets are used for illustration and comparison with
traditional co-clustering.
Authors' comments: Submitted to Pattern Recognition Letters
Ivano Notarnicola, Ying Sun, Gesualdo Scutari, Giuseppe Notarstefano
We study distributed big-data nonconvex optimization in multi-agent networks. We consider the (constrained) minimization of the sum of a smooth (possibly) nonconvex function, i.e., the agents' sum-utility, plus a convex (possibly) nonsmooth regularizer. Our interest is in big-data problems in which there is a large number of variables to optimize. If treated by means of standard distributed optimization algorithms, these large-scale problems may be intractable due to the prohibitive local computation and communication burden at each node. We propose a novel distributed solution method where, at each iteration, agents update in an uncoordinated fashion only one block of the entire decision vector. To deal with the nonconvexity of the cost function, the novel scheme hinges on Successive Convex Approximation (SCA) techniques combined with a novel block-wise perturbed push-sum consensus protocol, which is instrumental in performing local block-averaging operations and tracking gradient averages. Asymptotic convergence to stationary solutions of the nonconvex problem is established. Finally, numerical results show the effectiveness of the proposed algorithm and highlight how the block dimension impacts the communication overhead and practical convergence speed.
Zhao Zhong, Zichen Yang, Boyang Deng, Junjie Yan, Wei Wu, Jing Shao, Cheng-Lin Liu
Convolutional neural networks have achieved remarkable success in computer
vision. However, most usable network architectures are hand-crafted and usually
require expertise and elaborate design. In this paper, we provide a block-wise
network generation pipeline called BlockQNN which automatically builds
high-performance networks using the Q-Learning paradigm with epsilon-greedy
exploration strategy. The optimal network block is constructed by the learning
agent which is trained to choose component layers sequentially. We stack the
block to construct the whole auto-generated network. To accelerate the
generation process, we also propose a distributed asynchronous framework and an
early stop strategy. The block-wise generation brings unique advantages: (1) it
yields state-of-the-art results in comparison to the hand-crafted networks on
image classification; in particular, the best network generated by BlockQNN
achieves a 2.35% top-1 error rate on CIFAR-10. (2) it offers a tremendous reduction
of the search space in designing networks, spending only 3 days with 32 GPUs. A
faster version can yield a comparable result with only 1 GPU in 20 hours. (3)
it has strong generalizability in that the network built on CIFAR also performs
well on the larger-scale dataset. The best network achieves very competitive
accuracy of 82.0% top-1 and 96.0% top-5 on ImageNet.
Authors' comments: 14 pages, 18 figures
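The Q-learning paradigm with epsilon-greedy exploration mentioned above can be sketched generically. This is the textbook tabular form, not BlockQNN's actual implementation: the state and action encoding, the learning rate `alpha` and the discount `gamma` are assumptions, and the function names are hypothetical.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=1.0):
    """Standard tabular Q-learning update.

    q_table maps a state to a list of action values. A next_state missing
    from the table is treated as terminal (future value 0).
    """
    best_next = max(q_table[next_state]) if next_state in q_table else 0.0
    target = reward + gamma * best_next
    q_table[state][action] += alpha * (target - q_table[state][action])
```

In a block-generation setting, a state could describe the layers chosen so far and an action the next component layer, with the reward derived from the validation accuracy of the finished network.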
Mehmet Günel, Erkut Erdem, Aykut Erdem
Developing techniques for editing an outfit image through natural sentences
and accordingly generating new outfits has promising applications for art,
fashion and design. However, it is considered a challenging task
since image manipulation should be carried out only on the relevant parts of
the image while keeping the remaining sections untouched. Moreover, this
manipulation process should generate an image that is as realistic as possible.
In this work, we propose FiLMedGAN, which leverages feature-wise linear
modulation (FiLM) to relate and transform visual features with natural language
representations without using extra spatial information. Our experiments
demonstrate that this approach, when combined with skip connections and total
variation regularization, produces more plausible results than the baseline
work, and has a better localization capability when generating new outfits
consistent with the target description.
Authors' comments: Accepted to ECCV 2018, First Workshop on Computer Vision For Fashion,
Art and Design (extended version)
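Feature-wise linear modulation scales and shifts each feature map with a per-channel pair (gamma, beta) predicted from the conditioning input. The sketch below shows that operation with NumPy. Predicting gamma and beta through two linear projections of a text embedding is an assumption for illustration, as are the names `w_gamma` and `w_beta`; FiLMedGAN's actual layers are not described in the abstract.

```python
import numpy as np

def film(features, gamma, beta):
    """Apply FiLM: gamma_c * x_c + beta_c for every channel c.

    features: array of shape (C, H, W); gamma, beta: arrays of shape (C,).
    """
    return gamma[:, None, None] * features + beta[:, None, None]

def film_from_text(features, text_embedding, w_gamma, w_beta):
    """Assumption: gamma and beta are linear projections of the text embedding.

    text_embedding: shape (D,); w_gamma, w_beta: shape (C, D).
    """
    gamma = w_gamma @ text_embedding
    beta = w_beta @ text_embedding
    return film(features, gamma, beta)
```

Because the same gamma and beta are applied at every spatial position, the modulation uses no extra spatial information.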
Weili Zhang, Naiyu Wang, Charles Nicholson, Mohammad Hadikhan Tehrani
This study introduces a comprehensive stage-wise decision framework to
support resilience planning for roadway networks regarding pre-disaster
mitigation (Stage I), post-disaster emergency response (Stage II) and long-term
recovery (Stage III). Three decision metrics are first defined, each based on a
derivation of the number of independent pathways (IPW) within a roadway system,
to measure the performance of a network in terms of its robustness, redundancy,
and recoverability, respectively. Using the three IPW-based decision metrics, a
stage-wise decision process is then formulated as a stochastic multi-objective
optimization problem, which includes a project ranking mechanism to identify
pre-disaster network retrofit projects in Phase I, a prioritization approach
for temporary repairs to facilitate immediate post-disaster emergency responses
in Phase II, and a methodology for scheduling network-wide repairs during the
long-term recovery of the roadway system in Phase III. Finally, this stage-wise
decision framework is applied to the roadway network of Shelby County, TN, USA
subjected to seismic hazards, to illustrate its implementation in supporting
community network resilience planning.
Authors' comments: submit to Journal
T. S. Choi, Kiwing To, K. Y. Michael Wong
We compare the point-wise and segment-wise descriptions of the traffic
system. Using real data from the Taiwan highway system with a tremendous volume
of segment-wise data, we find that the segment-wise description is much more
informative of the evolution of the system during congestion. Congestion is
characterized by a loopy trajectory in the fundamental diagram. By considering
the area enclosed by the loop, we find that there are two types of congestion
dynamics -- moderate flow and serious congestion. They are different in terms
of whether the area enclosed vanishes. Data extracted from the time delays of
individual vehicles show that the area enclosed is a measure of the economic
loss due to congestion. The use of the loss area in helping to understand
various road characteristics is also explored.
Authors' comments: 13 pages, 12 figures
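The area enclosed by a loop in the fundamental diagram can be estimated from a sequence of (density, flow) measurements. The sketch below uses the shoelace formula, which is an assumption about how such an area might be computed and not necessarily the authors' procedure; the function name is hypothetical.

```python
def enclosed_area(density, flow):
    """Area enclosed by a closed trajectory in the density-flow plane.

    density, flow: equal-length sequences of successive measurements; the
    last point is implicitly connected back to the first. Uses the shoelace
    formula, so parts of a self-intersecting loop traversed in opposite
    directions cancel each other.
    """
    n = len(density)
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n
        twice_area += density[i] * flow[j] - density[j] * flow[i]
    return abs(twice_area) / 2.0
```

A trajectory that retraces its own path encloses zero area, while a trajectory that returns along a different branch encloses a positive area, which corresponds to the distinction drawn in the abstract.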
Manu Goyal, Jiahua Ng, Moi Hoon Yap
Diagnosis of skin lesions is a very challenging task due to high
inter-class similarities and intra-class variations in terms of color, size,
site and appearance among different skin lesions. With the emergence of
computer vision, especially deep learning algorithms, lesion diagnosis is made
possible using these algorithms trained on dermoscopic images. Usually, deep
classification networks are used for the lesion diagnosis to determine
different types of skin lesions. In this work, we used a pixel-wise
classification network to provide lesion diagnosis rather than a classification
network. We propose to use DeeplabV3+ for multi-class lesion diagnosis in
dermoscopic images of Task 3 of ISIC Challenge 2018. We used various
post-processing methods with DeeplabV3+ to determine the lesion diagnosis in
this challenge and submitted the test results.
Authors' comments: 6 pages, 4 figures and 2 tables