Dong Huang, Chang-Dong Wang, Hongxing Peng, Jianhuang Lai, Chee-Keong Kwoh
Ensemble clustering has been a popular research topic in data mining and
machine learning. Despite its significant progress in recent years, there are
still two challenging issues in the current ensemble clustering research.
First, most of the existing algorithms tend to investigate the ensemble
information at the object-level, yet often lack the ability to explore the rich
information at higher levels of granularity. Second, they mostly focus on the
direct connections (e.g., direct intersection or pair-wise co-occurrence) in
the multiple base clusterings, but generally neglect the multi-scale indirect
relationship hidden in them. To address these two issues, this paper presents a
novel ensemble clustering approach based on fast propagation of cluster-wise
similarities via random walks. We first construct a cluster similarity graph
with the base clusters treated as graph nodes and the cluster-wise Jaccard
coefficient exploited to compute the initial edge weights. Upon the constructed
graph, a transition probability matrix is defined, based on which the random
walk process is conducted to propagate the graph structural information.
Specifically, by investigating the propagating trajectories starting from
different nodes, a new cluster-wise similarity matrix can be derived by
considering the trajectory relationship. Then, the newly obtained cluster-wise
similarity matrix is mapped from the cluster-level to the object-level to
achieve an enhanced co-association (ECA) matrix, which is able to
simultaneously capture the object-wise co-occurrence relationship as well as
the multi-scale cluster-wise relationship in ensembles. Finally, two novel
consensus functions are proposed to obtain the consensus clustering result.
Extensive experiments on a variety of real-world datasets have demonstrated the
effectiveness and efficiency of our approach.
Authors' comments: To appear in IEEE Transactions on Systems, Man, and Cybernetics:
Systems. The MATLAB source code of this work is available at:
http://www.researchgate.net/publication/328581758
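
A minimal sketch of the first two steps described in the abstract above: the cluster-wise Jaccard graph and a row-normalized transition probability matrix for the random walk. The function names, the toy clusters, the zero diagonal (no self-loops), and the handling of isolated nodes are assumptions made for illustration. This is not the authors' MATLAB implementation.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient between two clusters given as sets of object ids."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def transition_matrix(clusters):
    """Row-normalize the cluster-wise Jaccard graph into random-walk probabilities."""
    n = len(clusters)
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Symmetric edge weight; the diagonal stays zero (assumed: no self-loops).
        weights[i][j] = weights[j][i] = jaccard(clusters[i], clusters[j])
    probs = []
    for row in weights:
        total = sum(row)
        # An isolated node keeps an all-zero row (an assumption of this sketch).
        probs.append([w / total if total else 0.0 for w in row])
    return probs


# Toy ensemble: two base clusterings of objects 0..5, pooled into one list of clusters.
clusters = [{0, 1, 2}, {3, 4, 5}, {0, 1}, {2, 3}, {4, 5}]
P = transition_matrix(clusters)
```

Clusters from the same base clustering are disjoint, so their Jaccard weight is zero and the walk moves only between clusters of different base clusterings.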
Sharif Rahman
This paper puts forward a new generalized polynomial dimensional
decomposition (PDD), referred to as GPDD, comprising hierarchically ordered
measure-consistent multivariate orthogonal polynomials in dependent random
variables. Unlike the existing PDD, which is valid strictly for independent
random variables, no tensor-product structure is assumed or required. Important
mathematical properties of GPDD are studied by constructing dimension-wise
decomposition of polynomial spaces, deriving statistical properties of random
orthogonal polynomials, demonstrating completeness of orthogonal polynomials
for prescribed assumptions, and proving mean-square convergence to the correct
limit, including when there are infinitely many random variables. The GPDD
approximation proposed should be effective in solving high-dimensional
stochastic problems subject to dependent variables.
Authors' comments: 24 pages, two tables. arXiv admin note: substantial text overlap with
arXiv:1804.01647, arXiv:1804.05676
Qiaoying Huang, Dong Yang, Pengxiang Wu, Hui Qu, Jingru Yi, Dimitris Metaxas
We consider an MRI reconstruction problem with input of k-space data at a
very low undersampled rate. This can practically benefit patients due to reduced
MRI scan time, but it is also challenging since the quality of reconstruction
may be compromised. Currently, deep learning based methods dominate MRI
reconstruction over traditional approaches such as Compressed Sensing, but they
rarely show satisfactory performance in the case of low undersampled k-space
data. One explanation is that these methods treat channel-wise features
equally, which results in degraded representation ability of the neural
network. To solve this problem, we propose a new model called MRI Cascaded
Channel-wise Attention Network (MICCAN), highlighted by three components: (i) a
variant of U-net with Channel-wise Attention (UCA) module, (ii) a long skip
connection and (iii) a combined loss. Our model is able to attend to salient
information by filtering irrelevant features and also concentrate on
high-frequency information by forcing low-frequency information to be bypassed to
the final output. We conduct both quantitative evaluation and qualitative
analysis of our method on a cardiac dataset. The experiment shows that our
method achieves very promising results in terms of three common metrics on the
MRI reconstruction with low undersampled k-space data.
Authors' comments: Accepted by the IEEE International Symposium on Biomedical Imaging
(ISBI) 2019. Code is available now
Animesh Koratana, Daniel Kang, Peter Bailis, Matei Zaharia
Knowledge distillation (KD) is a popular method for reducing the computational overhead of deep network inference, in which the output of a teacher model is used to train a smaller, faster student model. Hint training (i.e., FitNets) extends KD by regressing a student model's intermediate representation to a teacher model's intermediate representation. In this work, we introduce bLock-wise Intermediate representation Training (LIT), a novel model compression technique that extends the use of intermediate representations in deep network compression, outperforming KD and hint training. LIT has two key ideas: 1) LIT trains a student of the same width (but shallower depth) as the teacher by directly comparing the intermediate representations, and 2) LIT uses the intermediate representation from the previous block in the teacher model as an input to the current student block during training, avoiding unstable intermediate representations in the student network. We show that LIT provides substantial reductions in network depth without loss in accuracy -- for example, LIT can compress a ResNeXt-110 to a ResNeXt-20 (5.5x) on CIFAR10 and a VDCNN-29 to a VDCNN-9 (3.2x) on Amazon Reviews without loss in accuracy, outperforming KD and hint training in network size for a given accuracy. We also show that applying LIT to identical student/teacher architectures increases the accuracy of the student model above the teacher model, outperforming the recently-proposed Born Again Networks procedure on ResNet, ResNeXt, and VDCNN. Finally, we show that LIT can effectively compress GAN generators, which are not supported in the KD framework because GANs output pixels as opposed to probabilities.
Bartlomiej Waclaw, Justyna Cholewa-Waclaw, Philip Greulich
We propose an extension of the totally asymmetric simple exclusion process
(TASEP) in which particles hopping along a lattice can be blocked by obstacles
that dynamically attach/detach from lattice sites. The model can be thought of as
TASEP with site-wise dynamic disorder. We consider two versions of defect
dynamics: (i) defects can bind to any site, irrespective of particle
occupation, (ii) defects only bind to sites which are not occupied by particles
(particle-obstacle exclusion). In case (i) there is a symmetric, parabolic-like
relationship between the current and particle density, as in the standard
TASEP. Case (ii) leads to a skewed relationship for slow defect dynamics. We
also show that the presence of defects induces particle clustering, despite the
translation invariance of the system. For open boundaries the same three phases
as for the standard TASEP are observed, although the position of phase boundaries
is affected by the presence of obstacles. We develop a simple mean-field theory
that captures the model's quantitative behaviour for periodic and open boundary
conditions and yields good estimates for the current-density relationship, mean
cluster sizes and phase boundaries. Lastly, we discuss an application of the
model to the biological process of gene transcription.
Authors' comments: submitted to J. Phys. A
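
A minimal simulation sketch of the model described in the abstract above, on a ring (periodic boundaries) with random-sequential updates. Several details are assumptions of this sketch and are not taken from the paper: a defect on the target site is what blocks a hop, `k_on` and `k_off` are treated as per-update probabilities, and the current is reported as hops per attempted update.

```python
import random


def simulate_tasep(n_sites=100, n_particles=50, k_on=0.1, k_off=0.1,
                   exclusion=False, sweeps=200, seed=0):
    """TASEP on a ring with defects that bind and unbind at lattice sites.

    exclusion=False corresponds to case (i): defects bind to any site.
    exclusion=True corresponds to case (ii): defects bind only to empty sites.
    """
    rng = random.Random(seed)
    occupied = [False] * n_sites
    for site in rng.sample(range(n_sites), n_particles):
        occupied[site] = True
    defect = [False] * n_sites
    hops = 0
    attempts = sweeps * n_sites
    for _ in range(attempts):
        # Defect dynamics at a randomly chosen site.
        i = rng.randrange(n_sites)
        if defect[i]:
            if rng.random() < k_off:
                defect[i] = False
        elif not (exclusion and occupied[i]):
            if rng.random() < k_on:
                defect[i] = True
        # Particle dynamics: try to hop one site to the right.
        j = rng.randrange(n_sites)
        target = (j + 1) % n_sites
        if occupied[j] and not occupied[target] and not defect[target]:
            occupied[j], occupied[target] = False, True
            hops += 1
    return hops / attempts, occupied, defect


current, occupied, defect = simulate_tasep(exclusion=True)
```

Sweeping `n_particles` from 0 to `n_sites` and recording `current` gives an empirical current-density curve for either case, within the assumptions listed above.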
M. P. B. Gallaugher, C. Biernacki, P. D. McNicholas
In recent years, data dimensionality has increasingly become a concern,
leading to many parameter and dimension reduction techniques being proposed in
the literature. A parameter-wise co-clustering model, for data modelled via
continuous random variables, is presented. The proposed model, although
allowing more flexibility, still maintains the very high degree of parsimony
achieved by traditional co-clustering. A stochastic expectation-maximization
(SEM) algorithm along with a Gibbs sampler is used for parameter estimation and
an integrated complete log-likelihood criterion is used for model selection.
Simulated and real datasets are used for illustration and comparison with
traditional co-clustering.
Authors' comments: Submitted to Pattern Recognition Letters
Ivano Notarnicola, Ying Sun, Gesualdo Scutari, Giuseppe Notarstefano
We study distributed big-data nonconvex optimization in multi-agent networks. We consider the (constrained) minimization of the sum of a smooth (possibly) nonconvex function, i.e., the agents' sum-utility, plus a convex (possibly) nonsmooth regularizer. Our interest is in big-data problems in which there is a large number of variables to optimize. If treated by means of standard distributed optimization algorithms, these large-scale problems may be intractable due to the prohibitive local computation and communication burden at each node. We propose a novel distributed solution method where, at each iteration, agents update in an uncoordinated fashion only one block of the entire decision vector. To deal with the nonconvexity of the cost function, the novel scheme hinges on Successive Convex Approximation (SCA) techniques combined with a novel block-wise perturbed push-sum consensus protocol, which is instrumental in performing local block-averaging operations and tracking gradient averages. Asymptotic convergence to stationary solutions of the nonconvex problem is established. Finally, numerical results show the effectiveness of the proposed algorithm and highlight how the block dimension affects the communication overhead and practical convergence speed.
Zhao Zhong, Zichen Yang, Boyang Deng, Junjie Yan, Wei Wu, Jing Shao, Cheng-Lin Liu
Convolutional neural networks have achieved remarkable success in computer
vision. However, most usable network architectures are hand-crafted and usually
require expertise and elaborate design. In this paper, we provide a block-wise
network generation pipeline called BlockQNN which automatically builds
high-performance networks using the Q-Learning paradigm with epsilon-greedy
exploration strategy. The optimal network block is constructed by the learning
agent which is trained to choose component layers sequentially. We stack the
block to construct the whole auto-generated network. To accelerate the
generation process, we also propose a distributed asynchronous framework and an
early stop strategy. The block-wise generation brings unique advantages: (1) it
yields state-of-the-art results in comparison to hand-crafted networks on
image classification; in particular, the best network generated by BlockQNN
achieves a 2.35% top-1 error rate on CIFAR-10; (2) it offers a tremendous reduction
of the search space in designing networks, spending only 3 days with 32 GPUs (a
faster version can yield a comparable result with only 1 GPU in 20 hours); (3)
it has strong generalizability in that the network built on CIFAR also performs
well on the larger-scale ImageNet dataset. The best network achieves very competitive
accuracy of 82.0% top-1 and 96.0% top-5 on ImageNet.
Authors' comments: 14 pages, 18 figures
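
The abstract above names Q-learning with epsilon-greedy exploration as the mechanism for choosing component layers sequentially. The following is a generic tabular sketch of those two ingredients, not the BlockQNN implementation. The layer list, the block depth of 3, the hyperparameters, and `toy_reward` (a stand-in for training the stacked network and measuring its accuracy) are all hypothetical.

```python
import random

LAYERS = ["conv3x3", "conv5x5", "maxpool", "identity"]  # hypothetical choices
DEPTH = 3  # hypothetical number of layers per block


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=1.0):
    """One tabular Q-learning step toward reward + gamma * max Q(next_state, .)."""
    best_next = max(q_table[next_state]) if next_state in q_table else 0.0
    target = reward + gamma * best_next
    q_table[state][action] += alpha * (target - q_table[state][action])


def toy_reward(block):
    # Toy stand-in for "train the stacked network and report validation accuracy".
    return sum(1.0 for layer in block if layer.startswith("conv")) / len(block)


rng = random.Random(0)
q_table = {depth: [0.0] * len(LAYERS) for depth in range(DEPTH)}
for episode in range(200):
    steps = []
    for depth in range(DEPTH):
        action = epsilon_greedy(q_table[depth], epsilon=0.2, rng=rng)
        steps.append((depth, action))
    reward = toy_reward([LAYERS[a] for _, a in steps])
    for depth, action in steps:
        # The reward arrives only once the block is complete (last step).
        step_reward = reward if depth == DEPTH - 1 else 0.0
        q_update(q_table, depth, action, step_reward, depth + 1)
```

Here the state is just the position within the block. A fuller state, for example one that also records the previously chosen layers, would be needed to capture dependencies between layers.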
Mehmet Günel, Erkut Erdem, Aykut Erdem
Developing techniques for editing an outfit image through natural sentences
and accordingly generating new outfits has promising applications for art,
fashion and design. However, it is considered a challenging task
since image manipulation should be carried out only on the relevant parts of
the image while keeping the remaining sections untouched. Moreover, this
manipulation process should generate an image that is as realistic as possible.
In this work, we propose FiLMedGAN, which leverages feature-wise linear
modulation (FiLM) to relate and transform visual features with natural language
representations without using extra spatial information. Our experiments
demonstrate that this approach, when combined with skip connections and total
variation regularization, produces more plausible results than the baseline
work, and has a better localization capability when generating new outfits
consistent with the target description.
Authors' comments: Accepted to ECCV 2018, First Workshop on Computer Vision For Fashion,
Art and Design (extended version)
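
Feature-wise linear modulation, as named in the abstract above, scales and shifts each feature channel by its own pair of coefficients. The sketch below shows only that affine step on plain nested lists. In a FiLM model the coefficients `gamma` and `beta` would be predicted from the language representation; here they are passed in directly, which is a simplification of this sketch.

```python
def film(features, gamma, beta):
    """Per-channel affine modulation: out[c][k] = gamma[c] * features[c][k] + beta[c]."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("need one (gamma, beta) pair per feature channel")
    return [[g * x + b for x in channel]
            for channel, g, b in zip(features, gamma, beta)]


# Two channels with two spatial positions each (toy values).
modulated = film([[1.0, 2.0], [3.0, 4.0]], gamma=[2.0, 0.0], beta=[0.5, -1.0])
```

A `gamma` of zero, as in the second channel, suppresses that channel entirely and leaves only the shift.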
Weili Zhang, Naiyu Wang, Charles Nicholson, Mohammad Hadikhan Tehrani
This study introduces a comprehensive stage-wise decision framework to
support resilience planning for roadway networks regarding pre-disaster
mitigation (Stage I), post-disaster emergency response (Stage II) and long-term
recovery (Stage III). Three decision metrics are first defined, each based on a
derivation of the number of independent pathways (IPW) within a roadway system,
to measure the performance of a network in terms of its robustness, redundancy,
and recoverability, respectively. Using the three IPW-based decision metrics, a
stage-wise decision process is then formulated as a stochastic multi-objective
optimization problem, which includes a project ranking mechanism to identify
pre-disaster network retrofit projects in Stage I, a prioritization approach
for temporary repairs to facilitate immediate post-disaster emergency responses
in Stage II, and a methodology for scheduling network-wide repairs during the
long-term recovery of the roadway system in Stage III. Finally, this stage-wise
decision framework is applied to the roadway network of Shelby County, TN, USA
subjected to seismic hazards, to illustrate its implementation in supporting
community network resilience planning.
Authors' comments: submitted to a journal
T. S. Choi, Kiwing To, K. Y. Michael Wong
We compare the point-wise and segment-wise descriptions of the traffic
system. Using real data from the Taiwan highway system with a tremendous volume
of segment-wise data, we find that the segment-wise description is much more
informative of the evolution of the system during congestion. Congestion is
characterized by a loopy trajectory in the fundamental diagram. By considering
the area enclosed by the loop, we find that there are two types of congestion
dynamics -- moderate flow and serious congestion. They are different in terms
of whether the area enclosed vanishes. Data extracted from the time delays of
individual vehicles show that the area enclosed is a measure of the economic
loss due to congestion. The use of the loss area in helping to understand
various road characteristics is also explored.
Authors' comments: 13 pages, 12 figures
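
The area enclosed by a closed trajectory in the fundamental diagram can be computed with the shoelace formula. The sketch below assumes the trajectory is given as an ordered list of (density, flow) points forming a simple, non-self-intersecting loop. The function name and the sample points are illustrative and are not taken from the paper's data.

```python
def loop_area(points):
    """Area enclosed by a closed polygonal trajectory (shoelace formula)."""
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the loop
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


# A trajectory that retraces its own path encloses no area.
retraced = loop_area([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (1.0, 1.0)])
# A trajectory that returns along a different path encloses a finite area.
looped = loop_area([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)])
```

For a self-intersecting trajectory the signed contributions of the sub-loops partly cancel, so such a loop would need to be split before applying this function.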
Manu Goyal, Jiahua Ng, Moi Hoon Yap
Diagnosis of skin lesions is a very challenging task due to high
inter-class similarities and intra-class variations in terms of color, size,
site and appearance among different skin lesions. With the emergence of
computer vision, especially deep learning algorithms, lesion diagnosis is made
possible using these algorithms trained on dermoscopic images. Usually, deep
classification networks are used for the lesion diagnosis to determine
different types of skin lesions. In this work, we use a pixel-wise
classification network to provide lesion diagnosis rather than a classification
network. We propose to use DeeplabV3+ for multi-class lesion diagnosis in
dermoscopic images of Task 3 of ISIC Challenge 2018. We used various
post-processing methods with DeeplabV3+ to determine the lesion diagnosis in
this challenge and submitted the test results.
Authors' comments: 6 pages, 4 figures and 2 tables
Shyam Narayanan
Recently, many streaming algorithms have utilized generalizations of the fact
that the expected maximum distance of any $4$-wise independent random walk on a
line over $n$ steps is $O(\sqrt{n})$. In this paper, we show that $4$-wise
independence is required for all of these algorithms, by constructing a
$3$-wise independent random walk with expected maximum distance
$\Omega(\sqrt{n} \lg n)$ from the origin. We prove that this bound is tight for
the first and second moment, and also extract a surprising matrix inequality
from these results.
Next, we consider a generalization where the steps $X_i$ are $k$-wise
independent random variables with bounded $p$th moments. For general $k, p$, we
determine the (asymptotically) maximum possible $p$th moment of the supremum of
$X_1 + \dots + X_i$ over $1 \le i \le n$. We highlight the case $k = 4, p = 2$:
here, we prove that the second moment of the furthest distance traveled is
$O(\sum X_i^2)$. For this case, we only need the $X_i$'s to have bounded second
moments and do not even need the $X_i$'s to be identically distributed. This
implies an asymptotically stronger statement than Kolmogorov's maximal
inequality that requires only $4$-wise independent random variables, and
generalizes a recent result of Błasiok.
Authors' comments: 26 pages
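
For reference, the classical Kolmogorov maximal inequality and its relation to a second-moment bound can be written out as follows. This sketch assumes mean-zero steps $X_i$ with finite second moments, reads the $O(\sum X_i^2)$ in the abstract above as a bound in terms of $\sum \mathbb{E}[X_i^2]$, and uses an unspecified constant $C$. It is an illustration, not a statement taken from the paper.

```latex
% Classical Kolmogorov maximal inequality (fully independent, mean-zero X_i),
% with partial sums S_i = X_1 + \dots + X_i:
\Pr\Bigl( \max_{1 \le i \le n} |S_i| \ge \lambda \Bigr)
  \le \frac{1}{\lambda^2} \sum_{i=1}^{n} \mathbb{E}[X_i^2] .

% A second-moment bound on the running maximum,
\mathbb{E}\Bigl[ \max_{1 \le i \le n} S_i^2 \Bigr]
  \le C \sum_{i=1}^{n} \mathbb{E}[X_i^2] ,

% recovers the same tail bound up to the constant C via Markov's inequality:
\Pr\Bigl( \max_{1 \le i \le n} |S_i| \ge \lambda \Bigr)
  = \Pr\Bigl( \max_{1 \le i \le n} S_i^2 \ge \lambda^2 \Bigr)
  \le \frac{C}{\lambda^2} \sum_{i=1}^{n} \mathbb{E}[X_i^2] .
```

The second-moment form is the stronger statement because it controls the expectation of the maximum and not only its tail probability.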
Anson Lam, Edward L. Wright, Matthew A. Malkan
While there are numerous criteria for photometrically identifying active
galactic nuclei (AGNs), searches in the optical and UV tend to exclude galaxies
that are highly dust obscured. This is problematic for constraining models of
AGN evolution and estimating the AGN contribution to the cosmic X-ray and IR
backgrounds, as highly obscured objects tend to be underrepresented in
large-scale surveys. To address this, we identify potentially obscured AGNs
using mid-IR colors from the Wide-field Infrared Survey Explorer (WISE)
catalog. This paper presents the results of optical spectroscopy of obscured
AGN candidates using Keck DEIMOS, and their physical properties derived from
these spectra. We find that a $W1-W2>0.8$ color criterion effectively selects
AGNs with a higher median level of $E(B-V)$ extinction compared to the AGNs
found in the SDSS DR7 survey. This optical extinction can be measured using SED
modeling or by using $r-W1$ as a measure of optical to IR flux. We find that
specific, targeted observations are necessary to find the most highly optically
obscured AGNs, and that additional far-IR photometry is necessary to further
constrain the dust properties of these AGNs.
Authors' comments: 20 pages, 25 figures, accepted by MNRAS
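
The $W1-W2>0.8$ color criterion from the abstract above amounts to a simple cut on the difference of two WISE magnitudes. The sketch below applies it to a list of records. The record layout (`name`, `w1`, `w2`) and the sample magnitudes are hypothetical and are not drawn from the WISE catalog.

```python
def select_agn_candidates(sources, threshold=0.8):
    """Keep sources whose W1 - W2 color exceeds the threshold (in magnitudes)."""
    return [s for s in sources if s["w1"] - s["w2"] > threshold]


# Hypothetical magnitudes, for illustration only.
sources = [
    {"name": "src_a", "w1": 15.0, "w2": 14.0},  # W1 - W2 = 1.0, selected
    {"name": "src_b", "w1": 15.0, "w2": 14.5},  # W1 - W2 = 0.5, rejected
]
candidates = select_agn_candidates(sources)
```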
Griffin Lacey, Graham W. Taylor, Shawki Areibi
Low precision weights, activations, and gradients have been proposed as a way
to improve the computational efficiency and memory footprint of deep neural
networks. Recently, low precision networks have even been shown to be more robust to
adversarial attacks. However, typical implementations of low precision DNNs use
uniform precision across all layers of the network. In this work, we explore
whether a heterogeneous allocation of precision across a network leads to
improved performance, and introduce a learning scheme where a DNN
stochastically explores multiple precision configurations through learning.
This permits a network to learn an optimal precision configuration. We show on
convolutional neural networks trained on MNIST and ILSVRC12 that even though
these nets learn a uniform or near-uniform allocation strategy respectively,
stochastic precision leads to a favourable regularization effect improving
generalization.
Authors' comments: UAI 2018
Yingzhou Li, Jianfeng Lu, Zhe Wang
Leading eigenvalue problems for large scale matrices arise in many applications. Coordinate-wise descent methods are considered in this work for such problems based on a reformulation of the leading eigenvalue problem as a non-convex optimization problem. The convergence of several coordinate-wise methods is analyzed and compared. Numerical examples of applications to quantum many-body problems demonstrate the efficiency and provide benchmarks of the proposed coordinate-wise descent methods.
Andreas Bender, Fabian Scheipl
This article introduces the pammtools package, which facilitates data transformation, estimation and interpretation of Piece-wise exponential Additive Mixed Models. A special focus is on time-varying effects and cumulative effects of time-dependent covariates, where multiple past observations of a covariate can cumulatively affect the hazard, possibly weighted by a non-linear function. The package provides functions for convenient simulation and visualization of such effects as well as a robust and versatile function to transform time-to-event data from standard formats to a format suitable for their estimation. The models can be represented as Generalized Additive Mixed Models and estimated using the R package mgcv. Many examples on real and simulated data as well as the respective R code are provided throughout the article.
Badri N. Patro, Vinod K. Kurmi, Sandeep Kumar, Vinay P. Namboodiri
While the problem of obtaining word-level embeddings is very well studied, we
propose a novel method for obtaining sentence-level embeddings. The embeddings
are obtained by a simple method in the context of solving the paraphrase generation
task. If we use a sequential encoder-decoder model for generating paraphrases,
we would like the generated paraphrase to be semantically close to the original
sentence. One way to ensure this is by adding constraints for true paraphrase
embeddings to be close and unrelated paraphrase candidate sentence embeddings
to be far. This is ensured by using a sequential pair-wise discriminator that
shares weights with the encoder that is trained with a suitable loss function.
Our loss function penalizes paraphrase sentence embedding distances that are
too large. This loss is used in combination with a sequential encoder-decoder
network. We also validate our method by evaluating the obtained embeddings on
a sentiment analysis task. The proposed method results in semantic embeddings
and outperforms the state-of-the-art on the paraphrase generation and sentiment
analysis tasks on standard datasets. These results are also shown to be
statistically significant.
Authors' comments: COLING 2018 (accepted)
Xiaoxi He, Zimu Zhou, Lothar Thiele
Future mobile devices are anticipated to perceive, understand and react to
the world on their own by running multiple correlated deep neural networks
on-device. Yet the complexity of these neural networks needs to be trimmed down
both within-model and cross-model to fit in mobile storage and memory. Previous
studies focus on squeezing the redundancy within a single neural network. In
this work, we aim to reduce the redundancy across multiple models. We propose
Multi-Task Zipping (MTZ), a framework to automatically merge correlated,
pre-trained deep neural networks for cross-model compression. Central to MTZ is
a layer-wise neuron sharing and incoming weight updating scheme that induces a
minimal change in the error function. MTZ inherits information from each model
and demands light retraining to re-boost the accuracy of individual tasks.
Evaluations show that MTZ is able to fully merge the hidden layers of two
VGG-16 networks with a 3.18% increase in the test error averaged on ImageNet
and CelebA, or share 39.61% parameters between the two networks with <0.5%
increase in the test errors for both tasks. The number of iterations to retrain
the combined network is at least 17.8 times lower than that of training a
single VGG-16 network. Moreover, experiments show that MTZ is also able to
effectively merge multiple residual networks.
Authors' comments: Published as a conference paper at NeurIPS 2018
Sang-Ha Lee, Soon-Chul Kwon, Jin-Wook Shim, Jeong-Eun Lim, Jisang Yoo
Motion detection algorithms that can be applied to surveillance cameras such
as CCTV (Closed Circuit Television) have been studied extensively. Motion
detection algorithms are mostly based on background subtraction. One main issue
with this technique is that dynamic backgrounds, such as trees shaking in the
wind and flowing rivers, can produce false positives. In this paper, we propose a
method that searches for dynamic background regions by analyzing the video and
removes false positives by re-checking them. The proposed method
was evaluated on the CDnet 2012/2014 datasets obtained from the
changedetection.net site. We also compared its processing speed with that of other
algorithms.
Authors' comments: 8 pages