MaungMaung AprilPyone, Hitoshi Kiya
In this paper, we propose a novel defensive transformation that enables us to
maintain a high classification accuracy under the use of both clean images and
adversarial examples for adversarially robust defense. The proposed
transformation is a block-wise preprocessing technique with a secret key to
input images. We developed three algorithms to realize the proposed
transformation: Pixel Shuffling, Bit Flipping, and FFX Encryption. Experiments
were carried out on the CIFAR-10 and ImageNet datasets by using both black-box
and white-box attacks with various metrics including adaptive ones. The results
show that the proposed defense achieves high accuracy close to that of using
clean images even under adaptive attacks for the first time. In the best-case
scenario, a model trained by using images transformed by FFX Encryption (block
size of 4) yielded an accuracy of 92.30% on clean images and 91.48% under PGD
attack with a noise distance of 8/255, which is close to the non-robust
accuracy (95.45%) for the CIFAR-10 dataset, and it yielded an accuracy of
72.18% on clean images and 71.43% under the same attack, which is also close to
the standard accuracy (73.70%) for the ImageNet dataset. Overall, all three
proposed algorithms are demonstrated to outperform state-of-the-art defenses
including adversarial training whether or not a model is under attack.
Authors' comments: Under review
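The block-wise, key-based Pixel Shuffling idea in the abstract above can be illustrated with a minimal sketch. It assumes a single-channel image stored as nested lists and derives the permutation by seeding Python's `random.Random` with the key; the function names and the key-derivation scheme are hypothetical and are not taken from the paper.

```python
import random


def keyed_permutation(key, n):
    """Derive a fixed permutation of range(n) from a secret key (assumed scheme)."""
    rng = random.Random(key)
    perm = list(range(n))
    rng.shuffle(perm)
    return perm


def shuffle_blocks(image, block, key):
    """Shuffle pixels inside every block x block tile with one keyed permutation.

    image: list of rows (lists of pixel values); height and width must be
    multiples of `block`. The input image is not modified.
    """
    height, width = len(image), len(image[0])
    if height % block or width % block:
        raise ValueError("image size must be a multiple of the block size")
    perm = keyed_permutation(key, block * block)
    out = [row[:] for row in image]
    for by in range(0, height, block):
        for bx in range(0, width, block):
            # Flatten the tile in row-major order, then write it back permuted.
            flat = [image[by + i // block][bx + i % block]
                    for i in range(block * block)]
            for dst, src in enumerate(perm):
                out[by + dst // block][bx + dst % block] = flat[src]
    return out
```

Because the same permutation is applied to every tile, pixels never leave their own block, and anyone holding the key can reproduce the transformation exactly.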
Sam Sattarzadeh, Mahesh Sudhakar, Anthony Lem, Shervin Mehryar, K. N. Plataniotis, Jongseong Jang, Hyunwoo Kim, Yeonjeong Jeong et al.
As an emerging field in Machine Learning, Explainable AI (XAI) has been
offering remarkable performance in interpreting the decisions made by
Convolutional Neural Networks (CNNs). To achieve visual explanations for CNNs,
methods based on class activation mapping and randomized input sampling have
gained great popularity. However, the attribution methods based on these
techniques provide lower resolution and blurry explanation maps that limit
their explanation power. To circumvent this issue, visualization based on
various layers is sought. In this work, we collect visualization maps from
multiple layers of the model based on an attribution-based input sampling
technique and aggregate them to reach a fine-grained and complete explanation.
We also propose a layer selection strategy that applies to the whole family of
CNN-based models, based on which our extraction framework is applied to
visualize the last layers of each convolutional block of the model. Moreover,
we perform an empirical analysis of the efficacy of derived lower-level
information to enhance the represented attributions. Comprehensive experiments
conducted on shallow and deep models trained on natural and industrial
datasets, using both ground-truth and model-truth based evaluation metrics
validate our proposed algorithm by meeting or outperforming the
state-of-the-art methods in terms of explanation ability and visual quality,
demonstrating that our method shows stability regardless of the size of objects
or instances to be explained.
Authors' comments: 9 pages, 9 figures, Accepted at the Thirty-Fifth AAAI Conference on
Artificial Intelligence (AAAI-21)
Xinyue Liang, Alireza M. Javid, Mikael Skoglund, Saikat Chatterjee
We design a low complexity decentralized learning algorithm to train a
recently proposed large neural network in distributed processing nodes
(workers). We assume the communication network between the workers is
synchronized and can be modeled as a doubly-stochastic mixing matrix without
having any master node. In our setup, the training data is distributed among
the workers but is not shared in the training process due to privacy and
security concerns. Using alternating-direction-method-of-multipliers (ADMM)
along with a layerwise convex optimization approach, we propose a decentralized
learning algorithm which enjoys low computational complexity and communication
cost among the workers. We show that it is possible to achieve equivalent
learning performance as if the data is available in a single place. Finally, we
experimentally illustrate the time complexity and convergence behavior of the
algorithm.
Authors' comments: Accepted to The International Joint Conference on Neural Networks
(IJCNN) 2020, to appear
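To make the role of a doubly-stochastic mixing matrix in the abstract above concrete, here is a minimal sketch of plain consensus averaging: each worker repeatedly replaces its value with a weighted average of its neighbours' values. This is not the paper's ADMM-based algorithm; the matrix `W` and the helper names are illustrative assumptions.

```python
def is_doubly_stochastic(W, tol=1e-9):
    """Check that W is non-negative and every row and column sums to 1."""
    n = len(W)
    rows_ok = all(abs(sum(row) - 1.0) < tol for row in W)
    cols_ok = all(abs(sum(W[i][j] for i in range(n)) - 1.0) < tol
                  for j in range(n))
    nonneg = all(w >= 0 for row in W for w in row)
    return rows_ok and cols_ok and nonneg


def mix(values, W):
    """One synchronized communication round: worker i averages with weights W[i]."""
    n = len(values)
    return [sum(W[i][j] * values[j] for j in range(n)) for i in range(n)]


def consensus(values, W, rounds):
    """Run several mixing rounds; no master node is involved."""
    for _ in range(rounds):
        values = mix(values, W)
    return values
```

With a doubly-stochastic `W` on a connected network, the sum of the workers' values is preserved in every round, so the values converge toward the global average without any worker sharing its raw data.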
Xuezhe Ma
In this paper, we introduce Apollo, a quasi-Newton method for nonconvex
stochastic optimization, which dynamically incorporates the curvature of the
loss function by approximating the Hessian via a diagonal matrix. Importantly,
the update and storage of the diagonal approximation of the Hessian are as
efficient as adaptive first-order optimization methods, with linear complexity
for both time and memory. To handle nonconvexity, we replace the Hessian with its
rectified absolute value, which is guaranteed to be positive-definite.
Experiments on three tasks of vision and language show that Apollo achieves
significant improvements over other stochastic optimization methods, including
SGD and variants of Adam, in terms of both convergence speed and generalization
performance. The implementation of the algorithm is available at
https://github.com/XuezheMax/apollo.
Authors' comments: Fixed errors in convergence analysis. 29 pages (plus appendix), 6
figures, 7 tables
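The "rectified absolute value" of a diagonal Hessian approximation mentioned in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the threshold `sigma`, the function names, and the plain update rule are hypothetical, and the sketch omits how the diagonal approximation itself is maintained.

```python
def rectified_abs(diag_hessian, sigma=1.0):
    """Element-wise max(|b|, sigma): every entry becomes strictly positive.

    `sigma` is an assumed positive floor that keeps near-zero curvature
    estimates from producing huge steps.
    """
    return [max(abs(b), sigma) for b in diag_hessian]


def quasi_newton_step(params, grads, diag_hessian, lr=1.0, sigma=1.0):
    """One diagonal quasi-Newton update: p <- p - lr * g / max(|b|, sigma)."""
    d = rectified_abs(diag_hessian, sigma)
    return [p - lr * g / di for p, g, di in zip(params, grads, d)]
```

Since each rectified entry is at least `sigma > 0`, the resulting diagonal matrix is positive-definite, so the step always moves against the gradient in every coordinate. The cost is linear in the number of parameters for both time and memory.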
Qing Guo, Jingyang Sun, Felix Juefei-Xu, Lei Ma, Xiaofei Xie, Wei Feng, Yang Liu
Single-image deraining is rather challenging due to the unknown rain model.
Existing methods often make specific assumptions of the rain model, which can
hardly cover many diverse circumstances in the real world, making them have to
employ complex optimization or progressive refinement. This, however,
significantly affects these methods' efficiency and effectiveness for many
efficiency-critical applications. To fill this gap, in this paper, we regard
the single-image deraining as a general image-enhancing problem and originally
propose a model-free deraining method, i.e., EfficientDeRain, which is able to
process a rainy image within 10 ms (i.e., around 6 ms on average), over 80
times faster than the state-of-the-art method (i.e., RCDNet), while achieving
similar de-rain effects. We first propose the novel pixel-wise dilation
filtering. In particular, a rainy image is filtered with the pixel-wise kernels
estimated from a kernel prediction network, by which suitable multi-scale
kernels for each pixel can be efficiently predicted. Then, to eliminate the gap
between synthetic and real data, we further propose an effective data
augmentation method (i.e., RainMix) that helps to train the network for real
rainy image handling. We perform comprehensive evaluation on both synthetic and
real-world rainy datasets to demonstrate the effectiveness and efficiency of
our method. We release the model and code at
https://github.com/tsingqguo/efficientderain.git.
Authors' comments: 9 pages, 9 figures
Siyuan Lu, Meiqi Wang, Shuang Liang, Jun Lin, Zhongfeng Wang
Designing hardware accelerators for deep neural networks (DNNs) has been much
desired. Nonetheless, most of these existing accelerators are built for either
convolutional neural networks (CNNs) or recurrent neural networks (RNNs).
Recently, the Transformer model is replacing the RNN in the natural language
processing (NLP) area. However, because of intensive matrix computations and
complicated data flow being involved, the hardware design for the Transformer
model has never been reported. In this paper, we propose the first hardware
accelerator for two key components, i.e., the multi-head attention (MHA)
ResBlock and the position-wise feed-forward network (FFN) ResBlock, which are
the two most complex layers in the Transformer. Firstly, an efficient method is
introduced to partition the huge matrices in the Transformer, allowing the two
ResBlocks to share most of the hardware resources. Secondly, the computation
flow is well designed to ensure the high hardware utilization of the systolic
array, which is the biggest module in our design. Thirdly, complicated
nonlinear functions are highly optimized to further reduce the hardware
complexity and also the latency of the entire system. Our design is coded using
hardware description language (HDL) and evaluated on a Xilinx FPGA. Compared
with the implementation on GPU with the same setting, the proposed design
demonstrates a speed-up of 14.6x in the MHA ResBlock, and 3.4x in the FFN
ResBlock, respectively. Therefore, this work lays a good foundation for
building efficient hardware accelerators for multiple Transformer networks.
Authors' comments: 6 pages, 8 figures. This work has been accepted by IEEE SOCC
(System-on-chip Conference) 2020, and presented by Siyuan Lu in SOCC2020. It
also received the Best Paper Award in the Methodology Track in this conference
Metodi P. Yankov, Uiara Celine de Moura, Francesco Da Ros
Cascades of a machine learning-based EDFA gain model trained on a single physical device and a fully differentiable stimulated Raman scattering fiber model are used to predict and optimize the power profile at the output of an experimental multi-span fully-loaded C-band optical communication system.
Li Yang, Zhezhi He, Junshan Zhang, Deliang Fan
Deep Neural Networks (DNNs) can forget the knowledge about earlier tasks when learning new tasks, which is known as \textit{catastrophic forgetting}. While recent continual learning methods are capable of alleviating the catastrophic forgetting problem on toy-sized datasets, some issues remain to be tackled when applying them to real-world problems. Recently, fast mask-based learning methods (e.g., Piggyback \cite{mallya2018piggyback}) have been proposed to address these issues by learning only a binary element-wise mask in a fast manner, while keeping the backbone model fixed. However, the binary mask has limited modeling capacity for new tasks. A more recent work \cite{hung2019compacting} proposes a compress-grow-based method (CPG) to achieve better accuracy for new tasks by partially training the backbone model, but with an order-of-magnitude higher training cost, which makes it infeasible to deploy in popular state-of-the-art edge/mobile learning settings. The primary goal of this work is to simultaneously achieve fast and high-accuracy multi-task adaptation in the continual learning setting. Thus motivated, we propose a new training method called \textit{kernel-wise Soft Mask} (KSM), which learns a kernel-wise hybrid binary and real-value soft mask for each task, while using the same backbone model. Such a soft mask can be viewed as a superposition of a binary mask and a properly scaled real-value tensor, which offers a richer representation capability without low-level kernel support to meet the objective of low hardware overhead. We validate KSM on multiple benchmark datasets against recent state-of-the-art methods (e.g., Piggyback, PackNet, CPG, etc.), and it shows good improvement in both accuracy and training cost.
Laura Giordano, Valentina Gliozzi, Daniele Theseider Dupré
In this paper we describe a concept-wise multi-preference semantics for description
logic which has its root in the preferential approach for modeling defeasible
reasoning in knowledge representation. We argue that this proposal, besides
satisfying some desired properties, such as KLM postulates, and avoiding the
drowning problem, also defines a plausible notion of semantics. We motivate the
plausibility of the concept-wise multi-preference semantics by developing a
logical semantics of self-organising maps, which have been proposed as possible
candidates to explain the psychological mechanisms underlying category
generalisation, in terms of multi-preference interpretations.
Authors' comments: 13 pages
Cong Guo, Bo Yang Hsueh, Jingwen Leng, Yuxian Qiu, Yue Guan, Zehuan Wang, Xiaoying Jia, Xipeng Li et al.
Network pruning can reduce the high computation cost of deep neural network
(DNN) models. However, to maintain their accuracies, sparse models often carry
randomly-distributed weights, leading to irregular computations. Consequently,
sparse models cannot achieve meaningful speedup on commodity hardware (e.g.,
GPU) built for dense matrix computations. As such, prior works usually modify
or design completely new sparsity-optimized architectures for exploiting
sparsity. We propose an algorithm-software co-designed pruning method that
achieves latency speedups on existing dense architectures. Our work builds upon
the insight that the matrix multiplication generally breaks the large matrix
into multiple smaller tiles for parallel execution. We propose a
tiling-friendly "tile-wise" sparsity pattern, which maintains a regular pattern
at the tile level for efficient execution but allows for irregular, arbitrary
pruning at the global scale to maintain the high accuracy. We implement and
evaluate the sparsity pattern on GPU tensor core, achieving a 1.95x speedup
over the dense model.
Authors' comments: 12 pages, ACM/IEEE Proceedings of the International Conference for
High Performance Computing, Networking, Storage and Analysis (SC20)
Qiong Liu
Debris disks around stars are considered components of planetary systems.
Constraining the dust properties of these disks can give crucial information on
the formation and evolution of planetary systems. As an all-sky survey, the
\textit{InfRared Astronomical Satellite} (IRAS) contributed greatly to the
search for debris disks and discovered the first debris disk host star (Vega).
The IRAS-detected debris disk sample published by Rhee \citep{rhe07} contains
146 stars with detailed information on dust properties. However, the dust
properties of 45 of them still cannot be determined due to the limitations
of the IRAS database (they have IRAS detections at 60 $\mu$m only). Therefore,
using more sensitive data from the \textit{Wide-field Infrared Survey Explorer}
(WISE), we can better characterize the sample stars. For the stars with IRAS
detections at 60 $\mu$m only, we refit the excess flux densities and obtain
the dust temperatures and fractional luminosities. For the remaining
stars with multi-band IRAS detections, the dust properties are revised, which
shows that the dust temperatures were previously overestimated in the
high-temperature band. Moreover, we identify 17 stars with excesses at WISE
22 $\mu$m, which have a smaller distribution of distances from Earth and higher
fractional luminosities than the other stars without mid-infrared excess
emission. Among them, 15 stars can be found in previous works.
Authors' comments: 21 pages, 4 figures; 4 Tables, RAA in press
O. V. Maryeva, V. V. Gvaramadze, A. Y. Kniazev, L. N. Berdnikov
We present the results of a study of the Galactic candidate luminous blue
variable Wray 15-906, revealed via detection of its infrared circumstellar
shell (of \approx2 pc in diameter) with the Wide-field Infrared Survey Explorer
and the Herschel Space Observatory. Using the stellar atmosphere code CMFGEN
and the Gaia parallax, we found that Wray 15-906 is a relatively
low-luminosity, log(L/Lsun)\approx5.4, star of temperature of 25\pm2 kK, with a
mass-loss rate of \approx3\times10^{-5} Msun/yr, a wind velocity of 280\pm50
km/s, and a surface helium abundance of 65\pm2 per cent (by mass). In the
framework of single star evolution, the obtained results suggest that Wray
15-906 is a post-red supergiant star with initial mass of \approx26\pm2 Msun
and that before exploding as a supernova it could transform for a short time
into a WN11h star. Our spectroscopic monitoring with the Southern African Large
Telescope (SALT) does not reveal significant changes in the spectrum of Wray
15-906 during the last 8 yr, while the V-band light curve of this star over
years 1999--2019 shows quasi-periodic variability with a period of \approx1700
d and an amplitude of \approx0.1 mag. We estimated the mass of the shell to be
2.9\pm0.5 Msun assuming a gas-to-dust mass ratio of 200. The presence of such
a shell indicates that Wray 15-906 has suffered substantial mass loss in the
recent past. We found that the open star cluster C1128-631 could be the birth
place of Wray 15-906 provided that this star is a rejuvenated product of binary
evolution (a blue straggler).
Authors' comments: 18 pages, 15 figures, accepted to MNRAS
Esa Ollila, Ammar Mian
Huber's criterion can be used for robust joint estimation of regression and
scale parameters in the linear model. Huber's (Huber, 1981) motivation for
introducing the criterion stemmed from non-convexity of the joint maximum
likelihood objective function as well as non-robustness (unbounded influence
function) of the associated ML-estimate of scale. In this paper, we illustrate
how the original algorithm proposed by Huber can be set within the block-wise
minimization majorization framework. In addition, we propose novel
data-adaptive step sizes for both the location and scale, which further
improve the convergence. We then illustrate how Huber's criterion can be used
for sparse learning of an underdetermined linear model using the iterative hard
thresholding approach. We illustrate the usefulness of the algorithms in an
image denoising application and simulation studies.
Authors' comments: To appear in International Workshop on Machine Learning for Signal
Processing (MLSP), 2020
Liangwei Li, Liucheng Sun, Chenwei Weng, Chengfu Huo, Weijun Ren
Online electronic coupons (e-coupons) are becoming a primary tool for e-commerce platforms to attract users to place orders. E-coupons are the digital equivalent of traditional paper coupons, which provide customers with discounts or gifts. One of the fundamental related problems is how to deliver e-coupons at minimal cost while maximizing users' willingness to place an order. We call this problem the coupon allocation problem. This is a non-trivial problem since the number of regular users on a mature e-platform often reaches hundreds of millions and there are often multiple types of e-coupons to allocate. The policy space is extremely large and the online allocation has to satisfy a budget constraint. Besides, one can never observe the responses of one user under different policies, which increases the uncertainty of the policy-making process. Previous work fails to deal with these challenges. In this paper, we decompose the coupon allocation task into two subtasks: the user intent detection task and the allocation task. Accordingly, we propose a two-stage solution: at the first stage (detection stage), we put forward a novel Instantaneous Intent Detection Network (IIDN), which takes the user-coupon features as input and predicts users' real-time intents; at the second stage (allocation stage), we model the allocation problem as a Multiple-Choice Knapsack Problem (MCKP) and provide a computationally efficient allocation method using the intents predicted at the detection stage. We conduct extensive online and offline experiments, and the results show the superiority of our proposed framework, which has brought great profits to the platform and continues to function online.
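The Multiple-Choice Knapsack formulation named in the abstract above can be illustrated with a small textbook dynamic program: every user receives exactly one option (one of which may be "no coupon" at zero cost), and the total cost must stay within the budget. This is a sketch under assumptions, not the paper's allocation method; integer costs, the `(cost, value)` encoding, and the function name are hypothetical.

```python
def solve_mckp(groups, budget):
    """Maximize total value when picking exactly one (cost, value) per group.

    groups: list of option lists, one list per user; costs are non-negative
    integers. Returns the best achievable total value, or -inf if no
    selection fits in the budget.
    """
    neg_inf = float("-inf")
    # best[s] = best value over the groups processed so far with exactly s spent
    best = [neg_inf] * (budget + 1)
    best[0] = 0.0
    for options in groups:
        new = [neg_inf] * (budget + 1)
        for spent, value_so_far in enumerate(best):
            if value_so_far == neg_inf:
                continue
            for cost, value in options:
                total = spent + cost
                if total <= budget and value_so_far + value > new[total]:
                    new[total] = value_so_far + value
        best = new
    return max(best)
```

In a coupon setting, each `value` would be a predicted order probability (or uplift) for a user-coupon pair. The table costs time proportional to the budget times the total number of options, so an exact program like this only suits small instances.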
Wenli Mo, Anthony Gonzalez, Mark Brodwin, Bandon Decker, Peter Eisenhardt, Emily Moravec, S. A. Stanford, Daniel Stern et al.
We present a study of the central radio activity of galaxy clusters at high
redshift. Using a large sample of galaxy clusters at $0.7<z<1.5$ from the
Massive and Distant Clusters of {\it WISE} Survey and the Faint Images of the
Radio Sky at Twenty-Centimeters $1.4$~GHz catalog, we measure the fraction of
clusters containing a radio source within the central $500$~kpc, which we term
the cluster radio-active fraction, and the fraction of cluster galaxies within
the central $500$~kpc exhibiting radio emission. We find tentative
($2.25\sigma$) evidence that the cluster radio-active fraction increases with
cluster richness, while the fraction of cluster galaxies that are
radio-luminous ($L_{1.4~\mathrm{GHz}}\geq10^{25}$~W~Hz$^{-1}$) does not
correlate with richness at a statistically significant level. Compared to that
calculated at $0 < z < 0.6$, the cluster radio-active fraction at $0.7 < z < 1.5$
increases by a factor of $10$. This fraction is also dependent on the radio
luminosity. Clusters at higher redshift are much more likely to host a radio
source of luminosity $L_{1.4~\mathrm{GHz}}\gtrsim10^{26}$~W~Hz$^{-1}$ than are
lower redshift clusters. We compare the fraction of radio-luminous cluster
galaxies to the fraction measured in a field environment. For $0.7<z<1.5$, we
find that both the cluster and field radio-luminous galaxy fractions increase
with stellar mass, regardless of environment, though at fixed stellar mass,
cluster galaxies are roughly $2$ times more likely to be radio-luminous than
field galaxies.
Authors' comments: 12 pages, 6 figures, accepted to ApJ
Haohe Liu, Lei Xie, Jian Wu, Geng Yang
This paper presents a new input format, channel-wise subband input (CWS), for
convolutional neural network (CNN) based music source separation (MSS) models
in the frequency domain. We aim to address the major issues in CNN-based
high-resolution MSS models: high computational cost and weight sharing between
distinctly different bands. Specifically, in this paper, we decompose the input
mixture spectra into several bands and concatenate them channel-wise as the
model input. The proposed approach enables effective weight sharing in each
subband and introduces more flexibility between channels. For comparison
purposes, we perform voice and accompaniment separation (VAS) on models with
different scales, architectures, and CWS settings. Experiments show that the
CWS input is beneficial in many aspects. We evaluate our method on the musdb18hq
test set, focusing on SDR, SIR and SAR metrics. Among all our experiments, CWS
enables models to obtain a 6.9% performance gain on the average metrics. Even
with a smaller number of parameters, less training data, and shorter training
time, our MDenseNet with 8-band CWS input still surpasses the original
MMDenseNet by a large margin. Moreover, CWS also reduces computational cost
and training time to a large extent.
Authors' comments: Accepted in INTERSPEECH 2020
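The channel-wise subband input described in the abstract above can be sketched as a pure reshaping step: the frequency axis of each spectrogram channel is cut into bands, and the bands are stacked as extra channels. The equal-width contiguous split, the band ordering, and the function name are assumptions made for illustration; the paper's exact decomposition may differ.

```python
def channelwise_subband(spec, n_bands):
    """Split each channel's frequency axis into n_bands and stack them as channels.

    spec: nested lists indexed as [channel][freq_bin][frame].
    Returns a list indexed as [channel * n_bands + band][freq_bin][frame],
    where each output channel holds freq_bins // n_bands bins.
    """
    n_freq = len(spec[0])
    if n_freq % n_bands:
        raise ValueError("number of frequency bins must be divisible by n_bands")
    width = n_freq // n_bands
    out = []
    for channel in spec:
        for band in range(n_bands):
            # Band `band` covers frequency bins [band * width, (band + 1) * width).
            out.append(channel[band * width:(band + 1) * width])
    return out
```

After this step a convolution kernel slides over a frequency axis that is `n_bands` times shorter, which reduces the spatial size of every feature map, and each band enters the first layer through its own input channels rather than through the same position-shared filter weights.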
Fotios Logothetis, Ignas Budvytis, Roberto Mecca, Roberto Cipolla
Retrieving accurate 3D reconstructions of objects from the way they reflect light is a very challenging task in computer vision. Despite more than four decades since the definition of the Photometric Stereo problem, most of the literature has had limited success when global illumination effects such as cast shadows, self-reflections and ambient light come into play, especially for specular surfaces. Recent approaches have leveraged the power of deep learning in conjunction with computer graphics to cope with the need for vast amounts of training data to invert the image irradiance equation and retrieve the geometry of the object. However, rendering global illumination effects is a slow process, which can limit the amount of training data that can be generated. In this work we propose a novel pixel-wise training procedure for normal prediction by replacing the training data (observation maps) of globally rendered images with independent per-pixel generated data. We show that global physical effects can be approximated on the observation map domain and that this simplifies and speeds up the data creation procedure. Our network, PX-NET, achieves state-of-the-art performance compared to other pixel-wise methods on synthetic datasets, as well as on the Diligent real dataset in both dense and sparse light settings.
Wasi Uddin Ahmad, Xiao Bai, Soomin Lee, Kai-Wei Chang
Natural language processing techniques have demonstrated promising results in
keyphrase generation. However, one of the major challenges in \emph{neural}
keyphrase generation is processing long documents using deep neural networks.
Generally, documents are truncated before being given as inputs to neural networks.
Consequently, the models may miss essential points conveyed in the target
document. To overcome this limitation, we propose \emph{SEG-Net}, a neural
keyphrase generation model that is composed of two major components, (1) a
selector that selects the salient sentences in a document and (2) an
extractor-generator that jointly extracts and generates keyphrases from the
selected sentences. SEG-Net uses Transformer, a self-attentive architecture, as
the basic building block with a novel \emph{layer-wise} coverage attention to
summarize most of the points discussed in the document. The experimental
results on seven keyphrase generation benchmarks from scientific and web
documents demonstrate that SEG-Net outperforms the state-of-the-art neural
generative methods by a large margin.
Authors' comments: ACL 2021 (camera ready)
Shuai Zhang, Peng Zhang, Xindian Ma, Junqiu Wei, Ningning Wang, Qun Liu
The Transformer has been widely used in many Natural Language Processing (NLP) tasks, and the scaled dot-product attention between tokens is a core module of the Transformer. This attention is a token-wise design and its complexity is quadratic in the sequence length, limiting its application potential for long-sequence tasks. In this paper, we propose a dimension-wise attention mechanism, based on which a novel language modeling approach (namely TensorCoder) can be developed. The dimension-wise attention can reduce the attention complexity from the original $O(N^2d)$ to $O(Nd^2)$, where $N$ is the length of the sequence and $d$ is the dimensionality of each head. We verify TensorCoder on two tasks: masked language modeling and neural machine translation. Compared with the original Transformer, TensorCoder not only greatly reduces the computation of the original model but also obtains improved performance on the masked language modeling task (on the PTB dataset) and comparable performance on machine translation tasks.
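The complexity claim above, $O(N^2d)$ versus $O(Nd^2)$, can be made concrete with a short arithmetic sketch. The functions count only the leading term of each bound and ignore constant factors; the names and the example sizes are illustrative assumptions.

```python
def token_wise_cost(n_tokens, head_dim):
    """Leading-order cost of token-wise attention: N * N * d."""
    return n_tokens * n_tokens * head_dim


def dimension_wise_cost(n_tokens, head_dim):
    """Leading-order cost of dimension-wise attention: N * d * d."""
    return n_tokens * head_dim * head_dim


def cost_ratio(n_tokens, head_dim):
    """Token-wise cost divided by dimension-wise cost, which simplifies to N / d."""
    return token_wise_cost(n_tokens, head_dim) / dimension_wise_cost(n_tokens, head_dim)
```

The ratio reduces to $N/d$, so under these leading-order counts the dimension-wise form is cheaper only when the sequence is longer than the head dimension. For example, with $N = 4096$ and $d = 64$ the ratio is 64.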
Alexandra-Ioana Albu, Alina Enescu, Luigi Malagò
Anomaly detection for Magnetic Resonance Images (MRIs) can be solved with
unsupervised methods by learning the distribution of healthy images and
identifying anomalies as outliers. In the presence of an additional dataset of
unlabelled data that also contains anomalies, the task can be framed as a
semi-supervised task with negative and unlabelled sample points. Recently, in
Albu et al., 2020, we have proposed a slice-wise semi-supervised method for
tumour detection based on the computation of a dissimilarity function in the
latent space of a Variational AutoEncoder, trained on unlabelled data. The
dissimilarity is computed between the encoding of the image and the encoding of
its reconstruction obtained through a different autoencoder trained only on
healthy images. In this paper we present novel and improved results for our
method, obtained by training the Variational AutoEncoders on a subset of the
HCP and BRATS-2018 datasets and testing on the remaining individuals. We show
that by training the models on higher resolution images and by improving the
quality of the reconstructions, we obtain results which are comparable with
different baselines, which employ a single VAE trained on healthy individuals.
As expected, the performance of our method increases with the size of the
threshold used to determine the presence of an anomaly.
Authors' comments: In 2020 KDD Workshop on Applied Data Science for Healthcare, August
24, 2020, San Diego, CA, USA. ACM, New York, NY, USA, 4 pages