Youngmin Ro, Jin Young Choi
Existing fine-tuning methods use a single learning rate for all layers. In
this paper, we first argue that the trends of layer-wise weight variations
under fine-tuning with a single learning rate do not match the well-known
notion that lower-level layers extract general features and higher-level
layers extract specific features. Based on this discussion, we propose an
algorithm that improves fine-tuning performance and reduces network complexity
through layer-wise pruning and auto-tuning of layer-wise learning rates. The
effectiveness of the proposed algorithm is verified by achieving
state-of-the-art performance on image retrieval benchmark datasets (CUB-200,
Cars-196, Stanford Online Products, and In-Shop). Code is available at
https://github.com/youngminPIL/AutoLR.
Authors' comments: Accepted to AAAI 2021
Quanyu Liao, Xin Wang, Bin Kong, Siwei Lyu, Youbing Yin, Qi Song, Xi Wu
Deep neural networks have been demonstrated to be vulnerable to adversarial attacks: subtle perturbations can completely change the classification results. This vulnerability has led to a surge of research in this direction. However, most works are dedicated to attacking anchor-based object detection models. In this work, we present an effective and efficient algorithm to generate adversarial examples to attack anchor-free object detection models based on two approaches. First, we conduct category-wise instead of instance-wise attacks on the object detectors. Second, we leverage high-level semantic information to generate the adversarial examples. Surprisingly, the generated adversarial examples are not only able to effectively attack the targeted anchor-free object detector but can also be transferred to attack other object detectors, even anchor-based detectors such as Faster R-CNN.
Qian Liu, Dongyang Cai, Jie Liu, Nan Ding, Tao Wang
In this report, we describe the iqiyi submission to the ActivityNet 2019
Kinetics-700 challenge. Three models are involved in the model ensemble
stage: TSN, HG-NL and StNet. We propose the hierarchical group-wise non-local
(HG-NL) module for frame-level feature aggregation in video classification.
The standard non-local (NL) module is effective in aggregating frame-level
features for video classification but has low parameter efficiency and high
computational cost. The HG-NL method involves a hierarchical group-wise
structure and generates multiple attention maps to enhance performance. Based
on this hierarchical group-wise structure, the proposed method achieves
competitive accuracy with fewer parameters and a smaller computational cost
than the standard NL. In the ActivityNet 2019 Kinetics-700 challenge, after
model ensembling, we obtain an averaged top-1 and top-5 error of 28.444% on
the test set.
Authors' comments: Tech report
Cyprien Ruffino, Romain Hérault, Eric Laloy, Gilles Gasso
Generative Adversarial Networks (GANs) have proven successful for unsupervised image generation. Several works have extended GANs to image inpainting by conditioning the generation with parts of the image to be reconstructed. Despite their success, these methods have limitations in settings where only a small subset of the image pixels is known beforehand. In this paper we investigate the effectiveness of conditioning GANs when very few pixel values are provided. We propose a modelling framework which results in adding an explicit cost term to the GAN objective function to enforce pixel-wise conditioning. We investigate the influence of this regularization term on the quality of the generated images and the fulfillment of the given pixel constraints. Using the recent PacGAN technique, we ensure that diversity is kept in the generated samples. Experiments on FashionMNIST show that the regularization term effectively controls the trade-off between the quality of the generated images and the conditioning. Experimental evaluation on the CIFAR-10 and CelebA datasets shows that our method achieves accurate results both visually and quantitatively in terms of Fréchet Inception Distance, while still enforcing the pixel conditioning. We also evaluate our method on a texture image generation task using fully-convolutional networks. As a final contribution, we apply the method to a classical geological simulation application.
Jin Jin, Lin Zhang, Ethan Leng, Gregory J. Metzger, Joseph S. Koopmeiners
Multi-parametric magnetic resonance imaging (mpMRI) plays an increasingly
important role in the diagnosis of prostate cancer. Various computer-aided
detection algorithms have been proposed for automated prostate cancer detection
by combining information from various mpMRI data components. However, there
exist other features of mpMRI, including the spatial correlation between voxels
and between-patient heterogeneity in the mpMRI parameters, that have not been
fully explored in the literature but could potentially improve cancer detection
if leveraged appropriately. This paper proposes novel voxel-wise Bayesian
classifiers for prostate cancer that account for the spatial correlation and
between-patient heterogeneity in mpMRI. Modeling the spatial correlation is
challenging due to the extremely high dimensionality of the data, and we consider
three computationally efficient approaches using Nearest Neighbor Gaussian
Process (NNGP), knot-based reduced-rank approximation, and a conditional
autoregressive (CAR) model, respectively. The between-patient heterogeneity is
accounted for by adding a subject-specific random intercept on the mpMRI
parameter model. Simulation results show that properly modeling the spatial
correlation and between-patient heterogeneity improves classification accuracy.
Application to in vivo data illustrates that classification is improved by
spatial modeling using NNGP and reduced-rank approximation but not the CAR
model, while modeling the between-patient heterogeneity does not further
improve our classifier. Among our proposed models, the NNGP-based model is
recommended considering its robust classification accuracy and high
computational efficiency.
Authors' comments: 21 pages, 4 figures
Shuohang Wang, Yunshi Lan, Yi Tay, Jing Jiang, Jingjing Liu
The Transformer has been successfully applied to many natural language processing
tasks. However, for textual sequence matching, simple matching between the
representations of a pair of sequences might bring in unnecessary noise. In this
paper, we propose a new approach to sequence pair matching with Transformer, by
learning head-wise matching representations on multiple levels. Experiments
show that our proposed approach can achieve new state-of-the-art performance on
multiple tasks that rely only on pre-computed sequence-vector-representation,
such as SNLI, MNLI-match, MNLI-mismatch, QQP, and SQuAD-binary.
Authors' comments: AAAI 2020, 8 pages
Imtiaz Ahmed, Xia Ben Hu, Mithun P. Acharya, Yu Ding
Dimensionality reduction is considered an important step for ensuring competitive performance in unsupervised learning tasks such as anomaly detection. Non-negative matrix factorization (NMF) is a popular and widely used method to accomplish this goal. However, NMF has no provision for including neighborhood structure information and, as a result, may fail to provide satisfactory performance in the presence of nonlinear manifold structure. To address that shortcoming, we propose to incorporate neighborhood structural similarity information within the NMF framework by modeling the data through a minimum spanning tree (MST). We label the resulting method the neighborhood structure assisted NMF. We further devise both offline and online algorithmic versions of the proposed method. Empirical comparisons using twenty benchmark datasets as well as an industrial dataset extracted from a hydropower plant demonstrate the superiority of the neighborhood structure assisted NMF and support our claim of merit. A closer comparison of the formulation and properties of the neighborhood structure assisted NMF with those of other recent, enhanced versions of NMF reveals that including the neighborhood structure information using the MST plays a key role in attaining the enhanced performance in anomaly detection.
Madhu Priya, Prabhat K. Jaiswal
Motivated by growing interests in multicomponent metallic alloys and complex
fluids, we study a complex mixture with bidispersity in size and polydispersity
in energy. The energy polydispersity in the system is introduced by considering
random pair-wise interactions between the particles. Extensive molecular
dynamics simulations are performed to compute the potential energy and the
neighborhood identity ordering (NIO) parameter as a function of temperature
for a wide range of parameters, including the size-ratio and concentration of
the two species, by quenching the system from a high-temperature fluid state
to a crystalline state. Our findings demonstrate an enhancement of the
neighborhood identity ordering on addition of particles of different sizes.
Moreover, a comparatively higher increase in the NIO parameter is achieved by
tuning the size-ratio of the particles. We also propose the NIO parameter as a
good marker to differentiate systems (below the liquid-to-solid transition
temperature) having different values of size-ratio and concentration. The
effect of cooling rates on the NIO parameter is also discussed.
Authors' comments: 17 pages, 6 figures
Peng Liu, Ruogu Fang
Examination of retinal fundus images by an ophthalmologist is a major
diagnostic approach for glaucoma. However, it is still difficult to
distinguish the features of the lesion solely through manual observation,
especially in the early phase of glaucoma. In this paper, we present two
deep learning-based
automated algorithms for glaucoma detection and optic disc and cup
segmentation. We utilize the attention mechanism to learn pixel-wise features
for accurate prediction. In particular, we present two convolutional neural
networks that can focus on learning various pixel-wise level features. In
addition, we develop several attention strategies to guide the networks to
learn the important features that have a major impact on prediction accuracy.
We evaluate our methods on the validation dataset, and the proposed solutions
for both tasks achieve impressive results and outperform current
state-of-the-art methods. The code is available at
https://github.com/cswin/RLPA.
Authors' comments: MICCAI 2018 workshop challenge
Juntae Kim, Jaesung Bae, Minsoo Hahn
A state transition model (STM) based on chunk-wise classification is proposed for end-point detection (EPD). In general, EPD is developed using frame-wise voice activity detection (VAD) with an additional STM, in which the state transition is conducted based on the VAD's frame-level decision (speech or non-speech). However, VAD errors frequently occur in noisy environments, even with a state-of-the-art deep neural network based VAD, and these errors cause undesired state transitions of the STM. In this work, to build a robust STM, the state transition is conducted based on chunk-wise classification, as EPD does not need to be conducted at the frame level. A chunk consists of multiple frames, and the classification of a chunk as speech or non-speech is done by aggregating the VAD decisions for multiple frames, so that some undesired VAD errors in a chunk can be smoothed out by other correct VAD decisions. Finally, the model is evaluated with both qualitative and quantitative measures, including phone error rate.
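The chunk-wise aggregation described above can be illustrated with a minimal sketch. The abstract does not state the aggregation rule, so the speech-ratio threshold used here (a majority vote by default) is an assumption, and the names `chunk_decisions`, `chunk_size` and `threshold` are hypothetical.

```python
from typing import List


def chunk_decisions(frame_vad: List[int], chunk_size: int,
                    threshold: float = 0.5) -> List[int]:
    """Aggregate frame-level VAD decisions (1 = speech, 0 = non-speech)
    into one decision per chunk.

    Assumption: a chunk is labelled speech when the fraction of speech
    frames reaches `threshold`; the source abstract does not specify
    the actual rule.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    decisions = []
    for start in range(0, len(frame_vad), chunk_size):
        chunk = frame_vad[start:start + chunk_size]
        speech_ratio = sum(chunk) / len(chunk)
        decisions.append(1 if speech_ratio >= threshold else 0)
    return decisions


# One isolated VAD error in each chunk is smoothed out by its neighbours.
frames = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
print(chunk_decisions(frames, chunk_size=5))  # [1, 0]
```

A state machine driven by these chunk-level labels would change state less often than one driven by the raw frame-level labels, which is the robustness argument made in the abstract.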
Kim Batselier, Andrzej Cichocki, Ngai Wong
In this article two new algorithms are presented that convert a given data tensor train into either a Tucker decomposition with orthogonal matrix factors or a multi-scale entanglement renormalization ansatz (MERA). The Tucker core tensor is never explicitly computed but stored as a tensor train instead, resulting in both computationally and storage efficient algorithms. Both the multilinear Tucker-ranks as well as the MERA-ranks are automatically determined by the algorithm for a given upper bound on the relative approximation error. In addition, an iterative algorithm with low computational complexity based on solving an orthogonal Procrustes problem is proposed for the first time to retrieve optimal rank-lowering disentangler tensors, which are a crucial component in the construction of a low-rank MERA. Numerical experiments demonstrate the effectiveness of the proposed algorithms together with the potential storage benefit of a low-rank MERA over a tensor train.
Caifa Zhou
Fingerprinting-based positioning, one of the promising indoor positioning
solutions, has been broadly explored owing to the pervasiveness of sensor-rich
mobile devices, the prosperity of opportunistically measurable
location-relevant signals and the progress of data-driven algorithms. One
critical challenge is to control and improve the quality of the reference
fingerprint map (RFM), which is built at the offline stage and applied for
online positioning. The key concept concerning the quality control of the RFM
is updating the RFM according to newly measured data. Though various methods
have been proposed for adapting the RFM, they approach the problem by
introducing extra-positioning schemes (e.g. PDR or UGV) and directly adjust
the RFM without distinguishing whether critical changes have occurred. This
paper proposes an extra-positioning-free solution that makes full use of the
redundancy of measurable features. Loosely inspired by random sample
consensus (RANSAC), arbitrarily sampled subsets of features from the online
measurement are used for generating multi-resamples, which are used for
estimating intermediate locations. This resampling mitigates the impact of
changed features on positioning and enables accurate location estimation. The
user's location is robustly computed by identifying the candidate locations
among these intermediate ones using the modified Jaccard index (MJI), and the
feature-wise change belief is calculated according to the world model of the
RFM and the estimated variability of features. To validate the proposed
approach, two levels of experimental analysis have been carried out. On the
simulated dataset, the average change detection accuracy is about 90%.
Meanwhile, the positioning accuracy within 2 m improves by about 20% when
features detected as changed are dropped during positioning, compared to using
all measured features for location estimation. On the long-term collected
dataset, the average change detection accuracy is about 85%.
Authors' comments: 36 pages, 20 figures, 2 tables
Shaohuai Shi, Xiaowen Chu, Bo Li
Distributed synchronous stochastic gradient descent has been widely used to
train deep neural networks (DNNs) on computer clusters. With the increase of
computational power, network communications generally limit the system
scalability. Wait-free backpropagation (WFBP) is a popular solution to overlap
communications with computations during the training process. In this paper, we
observe that many DNNs have a large number of layers with only a small amount
of data to be communicated at each layer in distributed training, which could
make WFBP inefficient. Based on the fact that merging some short communication
tasks into a single one can reduce the overall communication time, we formulate
an optimization problem to minimize the training time in pipelining
communications and computations. We derive an optimal solution that can be
computed efficiently without affecting the training performance. We then apply
the solution to propose a distributed training algorithm named merged-gradient
WFBP (MG-WFBP) and implement it on two platforms, Caffe and PyTorch. Extensive
experiments on three GPU clusters are conducted to verify the effectiveness of
MG-WFBP. We further exploit trace-based simulations of 4 to 2048 GPUs to
explore the potential scaling efficiency of MG-WFBP. Experimental results show
that MG-WFBP achieves much better scaling performance than existing methods.
Authors' comments: Accepted by IEEE TPDS. 15 pages. arXiv admin note: substantial text
overlap with arXiv:1811.11141
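The trade-off behind merging short communication tasks can be sketched with a toy timeline model. This is not the optimal merging algorithm from the paper. It assumes a latency-bandwidth cost of `alpha + beta * size` per message and a single serialized communication channel, and the names `finish_time`, `ready_times` and `groups` are hypothetical.

```python
from typing import List, Sequence


def finish_time(ready_times: Sequence[float], sizes: Sequence[float],
                groups: List[List[int]], alpha: float, beta: float) -> float:
    """Time at which the last gradient message finishes sending.

    ready_times[i]: when layer i's gradient becomes available
                    (layers listed in backpropagation order).
    groups: partition of layer indices; each group is sent as one
            message once all of its gradients are ready.
    Assumed cost of one message: alpha (startup) + beta * total size.
    """
    channel_free = 0.0
    for group in groups:
        start = max(channel_free, max(ready_times[i] for i in group))
        channel_free = start + alpha + beta * sum(sizes[i] for i in group)
    return channel_free


ready = [1.0, 2.0, 3.0, 4.0]   # gradients appear one per time unit
sizes = [1.0, 1.0, 1.0, 1.0]   # small per-layer messages
alpha, beta = 2.0, 0.5         # startup cost dominates here

layer_wise = finish_time(ready, sizes, [[0], [1], [2], [3]], alpha, beta)
pairs = finish_time(ready, sizes, [[0, 1], [2, 3]], alpha, beta)
print(layer_wise, pairs)  # 11.0 8.0
```

In this toy setting, sending each layer separately pays the startup cost four times, while merging in pairs pays it twice and still overlaps with computation. Merging too aggressively would delay the first send, which is why the choice of groups is posed as an optimization problem.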
Guangxing Wang, Wolfgang Polonik
It is well known that the empirical likelihood ratio confidence region
suffers from a finite-sample under-coverage issue, and this severely hampers
its application in statistical inference. The root cause of this under-coverage
is an upper limit imposed by the convex hull of the estimating functions that
is used in the construction of the profile empirical likelihood. For i.i.d.
data, various methods have been proposed to solve this issue by modifying the
convex hull, but it is not clear how well these methods perform when the data
are no longer independent. In this paper, we propose an adjusted blockwise
empirical likelihood that is designed for weakly dependent multivariate data.
We show that our method not only preserves the much celebrated asymptotic
$\chi^2$-distribution, but also improves the finite-sample coverage probability
by removing the upper limit imposed by the convex hull. Further, we show that
our method is also Bartlett correctable and is thus able to achieve high-order
asymptotic coverage accuracy.
Authors' comments: Typos corrected
Preetum Nakkiran
In this expository note we describe a surprising phenomenon in overparameterized linear regression, where the dimension exceeds the number of samples: there is a regime where the test risk of the estimator found by gradient descent increases with additional samples. In other words, more data actually hurts the estimator. This behavior is implicit in a recent line of theoretical works analyzing the "double-descent" phenomenon in linear models. In this note, we isolate and understand this behavior in an extremely simple setting: linear regression with isotropic Gaussian covariates. In particular, this occurs due to an unconventional type of bias-variance tradeoff in the overparameterized regime: the bias decreases with more samples, but the variance increases.
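The bias-variance behaviour described above can be made concrete with a small calculation. The closed forms below are not quoted from the abstract. They are the standard expressions for the minimum-norm interpolating solution (the one gradient descent from zero initialization converges to) under assumed isotropic Gaussian covariates, noise variance `noise_var`, and `n < d - 1`. The function name and default values are hypothetical.

```python
def expected_excess_risk(n: int, d: int, signal_norm_sq: float = 1.0,
                         noise_var: float = 0.1):
    """Expected excess test risk of the min-norm interpolator, split
    into bias and variance, for n samples in dimension d (n < d - 1).

    Assumed model: y = <x, beta> + noise, x ~ N(0, I_d).
      bias     = (1 - n/d) * ||beta||^2       (part of beta outside the
                                               span of the n samples)
      variance = noise_var * n / (d - n - 1)  (expected trace of an
                                               inverse Wishart matrix)
    """
    if not 0 <= n < d - 1:
        raise ValueError("formula requires 0 <= n < d - 1")
    bias = (1.0 - n / d) * signal_norm_sq
    variance = noise_var * n / (d - n - 1)
    return bias, variance, bias + variance


d = 100
for n in (20, 50, 80, 90):
    bias, variance, total = expected_excess_risk(n, d)
    print(f"n={n:3d}  bias={bias:.3f}  variance={variance:.3f}  risk={total:.3f}")
```

Under these assumptions the bias falls linearly in `n` while the variance grows without bound as `n` approaches `d`, so beyond some sample size the total risk rises with more data.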
V. Bonjean, N. Aghanim, M. Douspis, N. Malavasi, H. Tanimura
The role played by large-scale structures in galaxy evolution is not yet
well understood. In this study, we investigate properties of galaxies
in the range 0.1<z<0.3 from a value-added version of the WISExSCOS catalogue
around cosmic filaments detected with DisPerSE. We have fitted a profile of
galaxy over-density around cosmic filaments and found a typical radius of r_m =
7.5+-0.2 Mpc. We have measured an excess of passive galaxies near the
filament's spine, higher than the excess of transitioning and active galaxies.
We have also detected SFR and Mstar gradients pointing towards the filament's
spine. We have investigated this result and found an Mstar gradient for each
type of galaxy: active, transitioning, and passive, and a positive SFR
gradient for passive galaxies. We also link the galaxy properties and the gas
content in the Cosmic Web. To do so, we have investigated the quiescent
fraction fQ profile of galaxies around the cosmic filaments. Based on recent
studies about the effect of the gas and of the Cosmic Web on galaxy properties,
we have modeled fQ with a beta model of gas pressure. The slope obtained here,
beta=0.54+-0.18, is compatible with the scenario of projected isothermal gas in
hydrostatic equilibrium (beta=2/3), and with the profiles of gas fitted in SZ.
Authors' comments: 14 pages, 14 figures, submitted to A&A
Yoshiki Toba, Wei-Hao Wang, Tohru Nagao, Yoshihiro Ueda, Junko Ueda, Chen-Fatt Lim, Yu-Yen Chang, Toshiki Saito et al.
We present far-infrared (FIR) properties of an extremely luminous infrared
galaxy (ELIRG) at $z_{\rm spec}$ = 3.703, WISE J101326.25+611220.1
(WISE1013+6112). This ELIRG is selected as an IR-bright dust-obscured galaxy
(DOG) based on the photometry from the Sloan Digital Sky Survey (SDSS) and
the Wide-field Infrared Survey Explorer (WISE). In order to derive its accurate
IR luminosity, we perform follow-up observations at 89 and 154 $\mu$m using the
High-resolution Airborne Wideband Camera-plus (HAWC+) on board the 2.7-m
Stratospheric Observatory for Infrared Astronomy (SOFIA) telescope. We conduct
spectral energy distribution (SED) fitting with CIGALE using 15 photometric
data points (0.4-1300 $\mu$m). We successfully pin down the FIR SED of
WISE1013+6112, and its IR luminosity is estimated to be $L_{\rm IR}$ = (1.62
$\pm$ 0.08) $\times 10^{14}$ $L_{\odot}$, making it one of the most luminous IR
galaxies in the universe. We determine that the dust temperature of
WISE1013+6112 is $T_{\rm dust}$ = 89 $\pm$ 3 K, which is significantly higher
than that of other populations
such as SMGs and FIR-selected galaxies at similar IR luminosities. The
resultant dust mass is $M_{\rm dust}$ = (2.2 $\pm$ 0.1) $\times 10^{8}$
$M_{\odot}$. This indicates that WISE1013+6112 has a significant active
galactic nucleus (AGN) and star-forming activity behind a large amount of dust.
Authors' comments: 9 pages, 4 figures, and 2 tables, accepted for publication in ApJ
Bo Luo, Qiang Xu
Deep neural networks (DNNs) have been shown to be susceptible to adversarial example attacks. Most existing works achieve this malicious objective by crafting subtle pixel-wise perturbations, which are difficult to launch in the physical world due to inevitable transformations (e.g., different photographic distances and angles). Recently, a few works have studied the generation of physical adversarial examples, but they generally require the details of the model a priori, which is often impractical. In this work, we propose a novel physical adversarial attack for arbitrary black-box DNN models, namely Region-Wise Attack. Specifically, we present how to efficiently search for region-wise perturbations to the inputs and determine their shapes, locations and colors via both top-down and bottom-up techniques. In addition, we introduce two fine-tuning techniques to further improve the robustness of our attack. Experimental results demonstrate the efficacy and robustness of the proposed Region-Wise Attack in the real world.
Chhavi Dhiman, Dinesh Kumar Vishwakarma, Paras Aggarwal
There exists a wide range of intra-class variations of the same actions,
along with inter-class similarity among actions, which makes action
recognition in videos very challenging. In this paper, we present a novel
skeleton-based part-wise Spatiotemporal CNN RIAC Network-based 3D human
action recognition framework to visualise the action dynamics in a part-wise
manner and utilise each part for action recognition by applying a weighted
late fusion mechanism. Part-wise skeleton-based motion dynamics help to
highlight local features of the skeleton, which is achieved by partitioning
the complete skeleton into five parts: Head to Spine, Left Leg, Right Leg,
Left Hand, and Right Hand. The RIAFNet architecture is greatly inspired by the
InceptionV4 architecture, which unified the ResNet and Inception based
Spatio-temporal feature representation concept and achieved the highest top-1
accuracy to date. To extract and learn salient features for action
recognition, attention driven residues are used, which enhance the performance
of residual components for effective 3D skeleton-based Spatio-temporal action
representation. The robustness of the proposed framework is evaluated by
performing extensive experiments on three challenging datasets, UT Kinect
Action 3D, Florence 3D Action, and MSR Daily Action3D, which consistently
demonstrate the superiority of our method.
Authors' comments: 20 pages, 9 figures
Jung Hyun Lee, Jihun Yun, Sung Ju Hwang, Eunho Yang
Network quantization, which aims to reduce the bit-lengths of the network
weights and activations, has emerged as one of the key ingredients to reduce
the size of neural networks for their deployments to resource-limited devices.
In order to overcome the difficulty of transforming continuous activations and
weights to discrete ones, a recent study called Relaxed Quantization (RQ)
[Louizos et al. 2019] successfully employs the popular Gumbel-Softmax, which
allows this transformation with efficient gradient-based optimization. However,
RQ with this Gumbel-Softmax relaxation still suffers from a bias-variance
trade-off depending on the temperature parameter of Gumbel-Softmax. To resolve
the issue, we propose a novel method, Semi-Relaxed Quantization (SRQ), that uses
a multi-class straight-through estimator to effectively reduce the bias and
variance, along with a new regularization technique, DropBits, that replaces
dropout regularization to randomly drop bits instead of neurons to further
reduce the bias of the multi-class straight-through estimator in SRQ. As a
natural extension of DropBits, we further introduce a way of learning
heterogeneous quantization levels to find the proper bit-length for each layer
using DropBits. We experimentally validate our method on various benchmark
datasets and network architectures, and also support the quantized lottery
ticket hypothesis: learning heterogeneous quantization levels outperforms the
case using the same but fixed quantization levels from scratch.
Authors' comments: New submission with another link