Shuohang Wang, Yunshi Lan, Yi Tay, Jing Jiang, Jingjing Liu
Transformer has been successfully applied to many natural language processing
tasks. However, for textual sequence matching, simple matching between the
representation of a pair of sequences might bring in unnecessary noise. In this
paper, we propose a new approach to sequence pair matching with Transformer, by
learning head-wise matching representations on multiple levels. Experiments
show that our proposed approach can achieve new state-of-the-art performance on
multiple tasks that rely only on pre-computed sequence-vector-representation,
such as SNLI, MNLI-match, MNLI-mismatch, QQP, and SQuAD-binary.
Authors' comments: AAAI 2020, 8 pages
Imtiaz Ahmed, Xia Ben Hu, Mithun P. Acharya, Yu Ding
Dimensionality reduction is considered an important step for ensuring competitive performance in unsupervised learning tasks such as anomaly detection. Non-negative matrix factorization (NMF) is a popular and widely used method to accomplish this goal. But NMF has no provision for including neighborhood structure information and, as a result, may fail to provide satisfactory performance in the presence of nonlinear manifold structure. To address that shortcoming, we propose to incorporate the neighborhood structural similarity information within the NMF framework by modeling the data through a minimum spanning tree (MST). We label the resulting method the neighborhood structure assisted NMF. We further devise both offline and online algorithmic versions of the proposed method. Empirical comparisons using twenty benchmark datasets as well as an industrial dataset extracted from a hydropower plant demonstrate the superiority of the neighborhood structure assisted NMF and support our claim of merit. A closer comparison of the formulation and properties of the neighborhood structure assisted NMF with other recent, enhanced versions of NMF reveals that inclusion of the neighborhood structure information using the MST plays a key role in attaining the enhanced performance in anomaly detection.
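As a rough illustration of the neighborhood structure mentioned above, the sketch below builds a minimum spanning tree over a small set of points with Prim's algorithm and turns it into a symmetric adjacency matrix. The function names `minimum_spanning_tree` and `mst_adjacency`, and the use of Euclidean distance, are assumptions made for this example; the abstract does not say how the MST enters the NMF objective, so this covers only the graph construction.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (i, j, distance) edges. Hypothetical helper, not the
    authors' implementation.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known link from the tree to each vertex
    best_from = [-1] * n         # tree vertex that provides that link
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u, best_dist[u]))
        # Relax the links from the newly added vertex.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def mst_adjacency(points):
    """Symmetric 0/1 adjacency matrix of the MST (assumed encoding)."""
    n = len(points)
    adjacency = [[0] * n for _ in range(n)]
    for i, j, _ in minimum_spanning_tree(points):
        adjacency[i][j] = adjacency[j][i] = 1
    return adjacency
```

For example, `minimum_spanning_tree([(0, 0), (1, 0), (2, 0), (10, 0)])` links the points in a chain, and `mst_adjacency` marks only those chain neighbors as similar.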
Madhu Priya, Prabhat K. Jaiswal
Motivated by growing interest in multicomponent metallic alloys and complex
fluids, we study a complex mixture with bidispersity in size and polydispersity
in energy. The energy polydispersity in the system is introduced by considering
random pair-wise interactions between the particles. Extensive molecular
dynamics simulations are performed to compute the potential energy and the
neighborhood identity ordering (NIO) parameter as a function of temperature for
a wide range of parameters, including the size-ratio and concentration of the
two species, by quenching the system from a high-temperature fluid state to a
crystalline state. Our findings demonstrate an enhancement of the neighborhood
identity ordering on addition of particles of different sizes. Moreover, a
comparatively higher increase in the NIO parameter is achieved by tuning the
size-ratio of the particles. We also propose the NIO parameter as a good marker
to differentiate systems (below the liquid-to-solid transition temperature)
having different values of size-ratio and concentration. The effect of cooling
rates on the NIO parameter is also discussed.
Authors' comments: 17 pages, 6 figures
Peng Liu, Ruogu Fang
Observation of retinal fundus images by an ophthalmologist is a major
diagnostic approach for glaucoma. However, it is still difficult to distinguish
the features of the lesion solely through manual observation, especially in the
early phase of glaucoma. In this paper, we present two deep learning-based
automated algorithms for glaucoma detection and optic disc and cup
segmentation. We utilize the attention mechanism to learn pixel-wise features
for accurate prediction. In particular, we present two convolutional neural
networks that can focus on learning various pixel-wise level features. In
addition, we develop several attention strategies to guide the networks to
learn the important features that have a major impact on prediction accuracy.
We evaluate our methods on the validation dataset, and the proposed solutions
for both tasks achieve impressive results and outperform current
state-of-the-art methods. \textit{The code is available at
\url{https://github.com/cswin/RLPA}}.
Authors' comments: MICCAI 2018 workshop challenge
Juntae Kim, Jaesung Bae, Minsoo Hahn
A state transition model (STM) based on chunk-wise classification was proposed for end-point detection (EPD). In general, EPD is developed using frame-wise voice activity detection (VAD) with an additional STM, in which the state transition is conducted based on the VAD's frame-level decision (speech or non-speech). However, VAD errors frequently occur in noisy environments, even with a state-of-the-art deep neural network based VAD, and these errors cause undesired state transitions in the STM. In this work, to build a robust STM, the state transition is conducted based on chunk-wise classification, as EPD does not need to be conducted at the frame level. A chunk consists of multiple frames, and the classification of a chunk as speech or non-speech is done by aggregating the VAD decisions for multiple frames, so that some undesired VAD errors in a chunk can be smoothed by other correct VAD decisions. Finally, the model was evaluated with both qualitative and quantitative measures, including phone error rate.
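The smoothing effect of chunk-wise aggregation can be sketched as follows. This example assumes a simple majority vote over fixed-size, non-overlapping chunks; the abstract only says that frame-level VAD decisions are aggregated, so the vote rule, the chunk layout, and the name `chunk_decisions` are illustrative assumptions rather than the authors' classifier.

```python
def chunk_decisions(frame_decisions, chunk_size):
    """Aggregate frame-level VAD decisions into chunk-level decisions.

    frame_decisions: list of 0/1 values (1 = speech, 0 = non-speech).
    chunk_size: number of frames per chunk (assumed fixed, non-overlapping).
    A chunk is labeled speech only if a strict majority of its frames are speech.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    for start in range(0, len(frame_decisions), chunk_size):
        chunk = frame_decisions[start:start + chunk_size]
        chunks.append(1 if 2 * sum(chunk) > len(chunk) else 0)
    return chunks


# One isolated non-speech frame inside speech, and one isolated speech frame
# inside non-speech, are both smoothed away at the chunk level.
frames = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0]
print(chunk_decisions(frames, 5))  # [1, 0, 0]
```

A state machine driven by the chunk labels then sees fewer spurious transitions than one driven by the raw frame labels.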
Kim Batselier, Andrzej Cichocki, Ngai Wong
In this article two new algorithms are presented that convert a given data tensor train into either a Tucker decomposition with orthogonal matrix factors or a multi-scale entanglement renormalization ansatz (MERA). The Tucker core tensor is never explicitly computed but stored as a tensor train instead, resulting in both computationally and storage efficient algorithms. Both the multilinear Tucker-ranks as well as the MERA-ranks are automatically determined by the algorithm for a given upper bound on the relative approximation error. In addition, an iterative algorithm with low computational complexity based on solving an orthogonal Procrustes problem is proposed for the first time to retrieve optimal rank-lowering disentangler tensors, which are a crucial component in the construction of a low-rank MERA. Numerical experiments demonstrate the effectiveness of the proposed algorithms together with the potential storage benefit of a low-rank MERA over a tensor train.
Caifa Zhou
Fingerprinting-based positioning, one of the promising indoor positioning
solutions, has been broadly explored owing to the pervasiveness of sensor-rich
mobile devices, the prosperity of opportunistically measurable
location-relevant signals and the progress of data-driven algorithms. One
critical challenge is to control and improve the quality of the reference
fingerprint map (RFM), which is built at the offline stage and applied for
online positioning. The key concept concerning the quality control of the RFM is
updating the RFM according to the newly measured data. Though various methods
have been proposed for adapting the RFM, they approach the problem by
introducing extra-positioning schemes (e.g. PDR or UGV) and directly adjust the
RFM without distinguishing whether critical changes have occurred. This paper
aims at proposing an extra-positioning-free solution by making full use of the
redundancy of measurable features. Loosely inspired by random sampling
consensus (RANSAC), arbitrarily sampled subsets of features from the online
measurement are used for generating multi-resamples, which are used for
estimating the intermediate locations. This resampling mitigates the impact of
the changed features on positioning and makes it possible to retrieve an
accurate location estimate. The user's location is robustly computed
by identifying the candidate locations from these intermediate ones using the
modified Jaccard index (MJI), and the feature-wise change belief is calculated
according to the world model of the RFM and the estimated variability of
features. In order to validate our proposed approach, two levels of
experimental analysis have been carried out. On the simulated dataset, the
average change detection accuracy is about 90%. Meanwhile, dropping the
features detected as changed improves the positioning accuracy within 2 m by
about 20% compared with using all measured features for location estimation.
On the long-term collected
dataset, the average change detection accuracy is about 85%.
Authors' comments: 36 pages, 20 figures, 2 tables
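The resampling idea in the abstract above can be illustrated with a toy sketch. Everything here is an assumption made for the example: the two-location reference map `RFM`, the feature names `ap1` to `ap4`, nearest-neighbor matching on squared differences, and a plain majority vote over the intermediate locations. In particular, the vote stands in for the paper's modified Jaccard index, which the abstract does not define.

```python
import random
from collections import Counter

# Hypothetical reference fingerprint map: location -> feature values (e.g. RSS in dBm).
RFM = {
    "A": {"ap1": -50, "ap2": -60, "ap3": -70, "ap4": -80},
    "B": {"ap1": -60, "ap2": -70, "ap3": -80, "ap4": -40},
}

# Online measurement taken at A, where feature ap4 has changed since the survey.
MEASUREMENT = {"ap1": -50, "ap2": -60, "ap3": -70, "ap4": -40}


def nearest_location(measurement, rfm, features):
    """Reference location closest to the measurement on the given features only."""
    def squared_distance(reference):
        return sum((measurement[f] - reference[f]) ** 2 for f in features)
    return min(rfm, key=lambda loc: squared_distance(rfm[loc]))


def resampled_location(measurement, rfm, subset_size, n_resamples, seed=0):
    """Estimate a location from many random feature subsets, then take a vote."""
    rng = random.Random(seed)
    feature_names = sorted(measurement)
    votes = Counter()
    for _ in range(n_resamples):
        subset = rng.sample(feature_names, subset_size)
        votes[nearest_location(measurement, rfm, subset)] += 1
    return votes.most_common(1)[0][0], votes
```

With all four features, the changed `ap4` pulls the estimate to the wrong location `B`, while any subset that happens to exclude `ap4` recovers `A`. That is the redundancy the resampling scheme relies on.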
Shaohuai Shi, Xiaowen Chu, Bo Li
Distributed synchronous stochastic gradient descent has been widely used to
train deep neural networks (DNNs) on computer clusters. With the increase of
computational power, network communications generally limit the system
scalability. Wait-free backpropagation (WFBP) is a popular solution to overlap
communications with computations during the training process. In this paper, we
observe that many DNNs have a large number of layers with only a small amount
of data to be communicated at each layer in distributed training, which could
make WFBP inefficient. Based on the fact that merging some short communication
tasks into a single one can reduce the overall communication time, we formulate
an optimization problem to minimize the training time in pipelining
communications and computations. We derive an optimal solution that can be
solved efficiently without affecting the training performance. We then apply
the solution to propose a distributed training algorithm named merged-gradient
WFBP (MG-WFBP) and implement it on two platforms, Caffe and PyTorch. Extensive
experiments on three GPU clusters are conducted to verify the effectiveness of
MG-WFBP. We further exploit trace-based simulations of 4 to 2048 GPUs to
explore the potential scaling efficiency of MG-WFBP. Experimental results show
that MG-WFBP achieves much better scaling performance than existing methods.
Authors' comments: Accepted by IEEE TPDS. 15 pages. arXiv admin note: substantial text
overlap with arXiv:1811.11141
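The claim that merging short communication tasks reduces total communication time can be checked with a small cost model. The sketch assumes the common linear model in which one message costs a fixed startup time plus a per-unit transfer time; the model, the function names, and the numbers are assumptions for illustration and are not taken from the paper.

```python
def comm_time(size, startup, per_unit):
    """Assumed linear cost model: fixed startup cost plus cost proportional to size."""
    return startup + per_unit * size


def separate_vs_merged(layer_sizes, startup, per_unit):
    """Total time when each layer is sent on its own vs. as one merged message."""
    separate = sum(comm_time(size, startup, per_unit) for size in layer_sizes)
    merged = comm_time(sum(layer_sizes), startup, per_unit)
    return separate, merged


# Four small layers (sizes in KB), 100 us startup, 2 us per KB.
print(separate_vs_merged([10, 20, 30, 40], startup=100, per_unit=2))  # (600, 300)
```

Under this model, merging n messages saves (n - 1) startup costs. The sketch ignores the overlap with computation: a merged message cannot start until its last gradient is ready, which is presumably the trade-off the optimization problem in the abstract balances.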
Guangxing Wang, Wolfgang Polonik
It is well known that the empirical likelihood ratio confidence region
suffers from a finite-sample under-coverage issue, and this severely hampers its
application in statistical inference. The root cause of this under-coverage
is an upper limit imposed by the convex hull of the estimating functions that
is used in the construction of the profile empirical likelihood. For i.i.d.
data, various methods have been proposed to solve this issue by modifying the
convex hull, but it is not clear how well these methods perform when the data
are no longer independent. In this paper, we propose an adjusted blockwise
empirical likelihood that is designed for weakly dependent multivariate data.
We show that our method not only preserves the much celebrated asymptotic
$\chi^2$-distribution, but also improves the finite sample coverage probability
by removing the upper limit imposed by the convex hull. Further, we show that
our method is also Bartlett correctable, and is thus able to achieve high-order
asymptotic coverage accuracy.
Authors' comments: Typos corrected
Preetum Nakkiran
In this expository note we describe a surprising phenomenon in overparameterized linear regression, where the dimension exceeds the number of samples: there is a regime where the test risk of the estimator found by gradient descent increases with additional samples. In other words, more data actually hurts the estimator. This behavior is implicit in a recent line of theoretical works analyzing "double-descent" phenomenon in linear models. In this note, we isolate and understand this behavior in an extremely simple setting: linear regression with isotropic Gaussian covariates. In particular, this occurs due to an unconventional type of bias-variance tradeoff in the overparameterized regime: the bias decreases with more samples, but variance increases.
V. Bonjean, N. Aghanim, M. Douspis, N. Malavasi, H. Tanimura
The role played by large-scale structures in galaxy evolution is not yet well
understood. In this study, we investigate properties of galaxies
in the range 0.1<z<0.3 from a value-added version of the WISExSCOS catalogue
around cosmic filaments detected with DisPerSE. We have fitted a profile of
galaxy over-density around cosmic filaments and found a typical radius of r_m =
7.5 ± 0.2 Mpc. We have measured an excess of passive galaxies near the
filament's spine, higher than the excess of transitioning and active galaxies.
We have also detected SFR and Mstar gradients pointing towards the filament's
spine. We have investigated this result and found an Mstar gradient for each
type of galaxies: active, transitioning, and passive, and a positive SFR
gradient for passive galaxies. We also link the galaxy properties and the gas
content in the Cosmic Web. To do so, we have investigated the quiescent
fraction fQ profile of galaxies around the cosmic filaments. Based on recent
studies about the effect of the gas and of the Cosmic Web on galaxy properties,
we have modeled fQ with a beta model of gas pressure. The slope obtained here,
beta = 0.54 ± 0.18, is compatible with the scenario of projected isothermal gas
in hydrostatic equilibrium (beta = 2/3), and with the profiles of gas fitted in SZ.
Authors' comments: 14 pages, 14 figures, submitted to A&A
Yoshiki Toba, Wei-Hao Wang, Tohru Nagao, Yoshihiro Ueda, Junko Ueda, Chen-Fatt Lim, Yu-Yen Chang, Toshiki Saito et al.
We present far-infrared (FIR) properties of an extremely luminous infrared
galaxy (ELIRG) at $z_{\rm spec}$ = 3.703, WISE J101326.25+611220.1
(WISE1013+6112). This ELIRG is selected as an IR-bright dust-obscured galaxy
(DOG) based on the photometry from the Sloan digital sky survey (SDSS) and
wide-field infrared survey explorer (WISE). In order to derive its accurate IR
luminosity, we perform follow-up observations at 89 and 154 $\mu$m using the
high-resolution airborne wideband camera-plus (HAWC+) on board the 2.7-m
stratospheric observatory for infrared astronomy (SOFIA) telescope. We conduct
spectral energy distribution (SED) fitting with CIGALE using 15 photometric
data points (0.4-1300 $\mu$m). We successfully pin down the FIR SED of
WISE1013+6112, and its IR luminosity is estimated to be $L_{\rm IR}$ = (1.62
$\pm$ 0.08) $\times 10^{14}$ $L_{\odot}$, making it one of the most luminous IR
galaxies in the universe. We determine that the dust temperature of
WISE1013+6112 is $T_{\rm dust}$
= 89 $\pm$ 3 K, which is significantly higher than that of other populations
such as SMGs and FIR-selected galaxies at similar IR luminosities. The
resultant dust mass is $M_{\rm dust}$ = (2.2 $\pm$ 0.1) $\times 10^{8}$
$M_{\odot}$. This indicates that WISE1013+6112 has a significant active
galactic nucleus (AGN) and star-forming activity behind a large amount of dust.
Authors' comments: 9 pages, 4 figures, and 2 tables, accepted for publication in ApJ
Bo Luo, Qiang Xu
Deep neural networks (DNNs) have been shown to be susceptible to adversarial example attacks. Most existing works achieve this malicious objective by crafting subtle pixel-wise perturbations, which are difficult to launch in the physical world due to inevitable transformations (e.g., different photographic distances and angles). Recently, a few research works have addressed generating physical adversarial examples, but they generally require the details of the model a priori, which is often impractical. In this work, we propose a novel physical adversarial attack for arbitrary black-box DNN models, namely Region-Wise Attack. To be specific, we present how to efficiently search for region-wise perturbations to the inputs and determine their shapes, locations and colors via both top-down and bottom-up techniques. In addition, we introduce two fine-tuning techniques to further improve the robustness of our attack. Experimental results demonstrate the efficacy and robustness of the proposed Region-Wise Attack in the real world.
Chhavi Dhiman, Dinesh Kumar Vishwakarma, Paras Aggarwal
A wide range of intra-class variation within the same action, together with
inter-class similarity among actions, makes action recognition in videos very
challenging. In this paper, we present a
novel skeleton-based part-wise Spatiotemporal CNN RIAC Network-based 3D human
action recognition framework to visualise the action dynamics in a part-wise
manner and utilise each part for action recognition by applying a weighted late
fusion mechanism. Part-wise skeleton-based motion dynamics help to highlight
local features of the skeleton; this is done by partitioning the complete
skeleton into five parts: Head to Spine, Left Leg, Right Leg, Left Hand, and
Right Hand. The RIAFNet architecture is greatly inspired by the InceptionV4
architecture, which unified the ResNet- and Inception-based spatio-temporal
feature representation concepts and achieved the highest top-1 accuracy to
date. To extract and learn salient features for action recognition,
attention-driven residues are used, which enhance the performance of residual
components for effective 3D skeleton-based spatio-temporal action
representation. The robustness of the proposed framework is evaluated by
performing extensive experiments on three challenging datasets, UT Kinect
Action 3D, Florence 3D action Dataset, and MSR Daily Action3D, which
consistently demonstrate the superiority of our method.
Authors' comments: 20 pages, 9 figures
Jung Hyun Lee, Jihun Yun, Sung Ju Hwang, Eunho Yang
Network quantization, which aims to reduce the bit-lengths of the network
weights and activations, has emerged as one of the key ingredients to reduce
the size of neural networks for their deployments to resource-limited devices.
To cope with the transformation of continuous activations and weights into
discrete ones, a recent study called Relaxed Quantization (RQ)
[Louizos et al. 2019] successfully employs the popular Gumbel-Softmax, which
allows this transformation with efficient gradient-based optimization. However,
RQ with this Gumbel-Softmax relaxation still suffers from a bias-variance
trade-off depending on the temperature parameter of Gumbel-Softmax. To resolve
the issue, we propose a novel method, Semi-Relaxed Quantization (SRQ), that uses
a multi-class straight-through estimator to effectively reduce the bias and
variance, along with a new regularization technique, DropBits, that replaces
dropout regularization and randomly drops bits instead of neurons to further
reduce the bias of the multi-class straight-through estimator in SRQ. As a
natural extension of DropBits, we further introduce a way of learning
heterogeneous quantization levels to find the proper bit-length for each layer
using DropBits. We experimentally validate our method on various benchmark
datasets and network architectures, and also support the quantized lottery
ticket hypothesis: learning heterogeneous quantization levels outperforms the
case using the same but fixed quantization levels from scratch.
Authors' comments: New submission with another link
Talha Bin Masood, Ingrid Hotz
The analysis of contours of scalar fields plays an important role in visualization. For example, the contour tree and contour statistics can be used as a means for interaction and filtering or as signatures. In the context of tensor field analysis, such methods are also interesting for the analysis of derived scalar invariants. While there are standard algorithms to compute and analyze contours, they are not directly applicable to tensor invariants when using component-wise tensor interpolation. In this chapter we present an accurate derivation of the contour spectrum for invariants with quadratic behavior computed from two-dimensional piece-wise linear tensor fields. For this work, we are mostly motivated by a consistent treatment of the anisotropy field, which plays an important role as a stability measure for tensor field topology. We show that it is possible to derive an analytical expression for the distribution of the invariant values in this setting, which we work out in full detail for the anisotropy as an example. Our derivation is based on a topological sub-division of the mesh into triangles that exhibit a monotonic behavior. This triangulation can also directly be used to compute the accurate contour tree with standard algorithms. We compare the results to a naive approach based on linear interpolation on the original mesh or the subdivision.
Weizhe Liu, Mathieu Salzmann, Pascal Fua
State-of-the-art methods for counting people in crowded scenes rely on deep networks to estimate crowd density. While effective, deep learning approaches are vulnerable to adversarial attacks, which, in a crowd-counting context, can lead to serious security issues. However, attack and defense mechanisms have been virtually unexplored in regression tasks, let alone for crowd density estimation. In this paper, we investigate the effectiveness of existing attack strategies on crowd-counting networks, and introduce a simple yet effective pixel-wise detection mechanism. It builds on the intuition that, when attacking a multitask network, in our case estimating crowd density and scene depth, both outputs will be perturbed, and thus the second one can be used for detection purposes. We will demonstrate that this significantly outperforms heuristic and uncertainty-based strategies.
Fu-Zhao Ou, Yuan-Gen Wang, Jin Li, Guopu Zhu, Sam Kwong
No-reference image quality assessment (NR-IQA) has received increasing attention in the IQA community since a reference image is not always available. Real-world images generally suffer from various types of distortion. Unfortunately, existing NR-IQA methods do not work with all types of distortion. It is a challenging task to develop a universal NR-IQA method that can evaluate all types of distorted images. In this paper, we propose a universal NR-IQA method based on controllable list-wise ranking (CLRIQA). First, to extend the authentically distorted image dataset, we present an imaging-heuristic approach, in which the over-underexposure is formulated as an inverse of the Weber-Fechner law, and a fusion strategy and probabilistic compression are adopted, to generate the degraded real-world images. These degraded images are label-free yet associated with quality ranking information. We then design a controllable list-wise ranking function by limiting the rank range and introducing an adaptive margin to tune the rank interval. Finally, the extended dataset and controllable list-wise ranking function are used to pre-train a CNN. Moreover, in order to obtain an accurate prediction model, we take advantage of the original dataset to further fine-tune the pre-trained network. Experiments on four benchmark datasets (i.e. LIVE, CSIQ, TID2013, and LIVE-C) show that the proposed CLRIQA improves the state of the art by over 9% in terms of overall performance. The code and model are publicly available at https://github.com/GZHU-Image-Lab/CLRIQA.
Hyunsung D. Jun, Roberto J. Assef, Franz E. Bauer, Andrew W. Blain, Tanio Diaz-Santos, Peter R. Eisenhardt, Daniel Stern, Chao-Wei Tsai et al.
We present VLT/XSHOOTER rest-frame UV-optical spectra of 10 Hot Dust-Obscured
Galaxies (Hot DOGs) at $z\sim2$ to investigate AGN diagnostics and to assess
the presence and effect of ionized gas outflows. Most Hot DOGs in this sample
are narrow-line dominated AGN (type 1.8 or higher), and have higher Balmer
decrements than typical type 2 quasars. Almost all (8/9) sources show evidence
for ionized gas outflows in the form of broad and blueshifted [O III] profiles,
and some sources have such profiles in H$\alpha$ (5/7) or [O II] (3/6).
Combined with the literature, these results support additional sources of
obscuration beyond the simple torus invoked by AGN unification models. Outflow
rates derived from the broad [O III] line ($\rm
\gtrsim10^{3}\,M_{\odot}\,yr^{-1}$) are greater than the black hole accretion
and star formation rates, with feedback efficiencies ($\sim0.1-1\%$) consistent
with negative feedback to the host galaxy's star formation in merger-driven
quasar activity scenarios. We find the broad emission lines in luminous,
obscured quasars are often better explained by outflows within the narrow line
region, and caution that black hole mass estimates for such sources in the
literature may have substantial uncertainty. Regardless, we find lower bounds
on the Eddington ratio for Hot DOGs near unity.
Authors' comments: 20 pages, accepted to ApJ, minor corrections (typos and references)
in section 4.4
Shaohuai Shi, Zhenheng Tang, Qiang Wang, Kaiyong Zhao, Xiaowen Chu
To reduce the long training time of large deep neural network (DNN) models,
distributed synchronous stochastic gradient descent (S-SGD) is commonly used on
a cluster of workers. However, the speedup brought by multiple workers is
limited by the communication overhead. Two approaches, namely pipelining and
gradient sparsification, have been separately proposed to alleviate the impact
of communication overheads. Yet, the gradient sparsification methods can only
initiate the communication after the backpropagation, and hence miss the
pipelining opportunity. In this paper, we propose a new distributed
optimization method named LAGS-SGD, which combines S-SGD with a novel
layer-wise adaptive gradient sparsification (LAGS) scheme. In LAGS-SGD, every
worker selects a small set of "significant" gradients from each layer
independently whose size can be adaptive to the communication-to-computation
ratio of that layer. The layer-wise nature of LAGS-SGD opens the opportunity of
overlapping communications with computations, while the adaptive nature of
LAGS-SGD makes it flexible to control the communication time. We prove that
LAGS-SGD has convergence guarantees and it has the same order of convergence
rate as vanilla S-SGD under a weak analytical assumption. Extensive experiments
are conducted to verify the analytical assumption and the convergence
performance of LAGS-SGD. Experimental results on a 16-GPU cluster show that
LAGS-SGD outperforms the original S-SGD and existing sparsified S-SGD without
any obvious loss of model accuracy.
Authors' comments: 8 pages. To appear at ECAI 2020
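A minimal sketch of layer-wise gradient sparsification, assuming that "significant" means largest in magnitude (top-k) and that each layer is given its own density. The abstract says the per-layer set size adapts to the communication-to-computation ratio but does not give the rule, so the `densities` argument and both function names are hypothetical placeholders for illustration.

```python
def top_k_sparsify(gradient, k):
    """Keep the k largest-magnitude entries of one layer's gradient, zero the rest."""
    if k <= 0:
        return [0.0] * len(gradient)
    order = sorted(range(len(gradient)), key=lambda i: abs(gradient[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(gradient)]


def layerwise_sparsify(layer_gradients, densities):
    """Sparsify each layer independently with its own density in (0, 1].

    densities: assumed per-layer fraction of entries to keep; how the paper
    derives it from the communication-to-computation ratio is not shown here.
    """
    sparsified = []
    for gradient, density in zip(layer_gradients, densities):
        k = max(1, int(density * len(gradient)))  # always send at least one entry
        sparsified.append(top_k_sparsify(gradient, k))
    return sparsified


layers = [[0.1, -2.0, 0.3, 1.5], [5.0, -0.2]]
print(layerwise_sparsify(layers, [0.5, 0.5]))
# [[0.0, -2.0, 0.0, 1.5], [5.0, 0.0]]
```

Because each layer is processed on its own, a layer's sparse gradient can be sent as soon as that layer's backward pass finishes, which is what makes overlapping communication with the remaining computation possible.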