Ahmed Rasheed, Muhammad Shahzad Younis, Farooq Ahmad, Junaid Qadir, Muhammad Kashif
Wheat is the main agricultural crop of Pakistan and a staple food
requirement of almost every Pakistani household, making it the main strategic
commodity of the country, whose availability and affordability are the
government's main priority. Wheat availability can be strongly affected by
multiple factors including, but not limited to, production, consumption,
financial crises, inflation, and market volatility. The government ensures food
security through specific policy and monetary arrangements, which maintain
purchasing power for the poor. Such arrangements can be made more effective if a
dynamic analysis is carried out to estimate the future yield based on certain
current factors. Future commodity pricing can be planned by forecasting prices
from current circumstances. This
paper presents a wheat price forecasting methodology, which uses the price,
weather, production, and consumption trends for wheat taken over the
past few years and analyzes them with an advanced neural network
architecture, Long Short-Term Memory (LSTM) networks. The proposed methodology
yielded significantly improved results compared with conventional machine
learning and statistical time series analysis methods.
Authors' comments: 9 pages, submitted to IEEE Access
Antonio Joia Neto, Andre G C Pacheco, Diogo C Luvizon
In this paper, we propose a new Sound Event Classification (SEC) method which
is inspired by recent work on out-of-distribution detection. In our method,
we analyse all the activations of a generic CNN in order to produce feature
representations using Gram Matrices. The similarity metrics are evaluated
considering all possible classes, and the final prediction is defined as the
class that minimizes the deviation with respect to the features seen during
training. The proposed approach can be applied to any CNN and our experimental
evaluation of four different architectures on two datasets demonstrated that
our method consistently improves the baseline models.
Authors' comments: To appear on ICASSP 2021
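The prediction rule described above can be illustrated with a minimal sketch. It assumes that each layer's activations are flattened to one list per channel, that the "features seen during training" are summarized as per-class lower and upper bounds on every Gram entry, and that deviation is the total amount by which an entry leaves those bounds. The function names and the bounds format are hypothetical and are not taken from the paper.

```python
def gram_matrix(feature_map):
    """Channel-by-channel Gram matrix of one layer.

    feature_map: list of C channels, each a flat list of activations.
    """
    n = len(feature_map[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in feature_map]
            for ci in feature_map]


def deviation(gram, lower, upper):
    """Total amount by which Gram entries fall outside [lower, upper].

    The min/max bounds are an assumed summary of training-time statistics.
    """
    total = 0.0
    for g_row, lo_row, hi_row in zip(gram, lower, upper):
        for g, lo, hi in zip(g_row, lo_row, hi_row):
            if g < lo:
                total += lo - g
            elif g > hi:
                total += g - hi
    return total


def predict(feature_map, class_bounds):
    """Return the class whose training-time bounds are violated the least.

    class_bounds: dict mapping class label -> (lower, upper) matrices.
    """
    gram = gram_matrix(feature_map)
    return min(class_bounds, key=lambda c: deviation(gram, *class_bounds[c]))
```

In a full system the deviation would presumably be accumulated over all layers of the CNN; the sketch covers a single layer to keep the idea visible.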
Ken'ichiro Tanaka
We propose a method for generating nodes for kernel quadrature by a
point-wise gradient descent method. For kernel quadrature, most methods for
generating nodes are based on the worst case error of a quadrature formula in a
reproducing kernel Hilbert space corresponding to the kernel. In typical
methods of this kind, a new node is chosen from a candidate set of points in
each step by solving an optimization problem with respect to that node. Although such
sequential methods are appropriate for adaptive quadrature, it is difficult to
apply standard routines for mathematical optimization to the problem. In this
paper, we propose a method that updates a set of points one by one with a
simple gradient descent method. To this end, we provide an upper bound of the
worst case error by using the fundamental solution of the Laplacian on
$\mathbf{R}^{d}$. We observe the good performance of the proposed method by
numerical experiments.
Authors' comments: 21 pages, 12 figures
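For orientation, the worst case error mentioned above has a standard closed form in a reproducing kernel Hilbert space. The notation here is an assumption and does not come from the abstract: $K$ is the kernel, $\mu$ the integration measure, $x_1, \dots, x_N$ the nodes and $w_1, \dots, w_N$ the weights.

$$
e(x_1,\dots,x_N; w)^2
= \iint K(x,y)\, d\mu(x)\, d\mu(y)
- 2 \sum_{i=1}^{N} w_i \int K(x_i, y)\, d\mu(y)
+ \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j K(x_i, x_j).
$$

A point-wise gradient descent step on a differentiable objective $J$ (in the abstract, an upper bound of this error) would then take the form $x_i \leftarrow x_i - \eta \nabla_{x_i} J(x_1,\dots,x_N)$ for one node at a time, where the step size $\eta$ is again a hypothetical symbol.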
Seyedehsara Nayer, Namrata Vaswani
We study the following lesser-known low rank (LR) recovery problem: recover
an $n \times q$ rank-$r$ matrix, $X^* =[x^*_1 , x^*_2, ..., x^*_q]$, with $r
\ll \min(n,q)$, from $m$ independent linear projections of each of its $q$
columns, i.e., from $y_k := A_k x^*_k , k \in [q]$, when $y_k$ is an $m$-length
vector with $m < n$. The matrices $A_k$ are known and mutually independent for
different $k$. We introduce a novel gradient descent (GD) based solution called
AltGD-Min. We show that, if the $A_k$s are i.i.d. with i.i.d. Gaussian entries,
and if the right singular vectors of $X^*$ satisfy the incoherence assumption,
then $\epsilon$-accurate recovery of $X^*$ is possible with order $(n+q) r^2
\log(1/\epsilon)$ total samples and order $ mq nr \log (1/\epsilon)$ time.
Compared with existing work, this is the fastest solution. For $\epsilon <
r^{1/4}$, it also has the best sample complexity. A simple extension of
AltGD-Min also provably solves LR Phase Retrieval, which is a magnitude-only
generalization of the above problem. AltGD-Min factorizes the unknown $X$ as $X
= UB$ where $U$ and $B$ are matrices with $r$ columns and rows respectively. It
alternates between a (projected) GD step for updating $U$, and a minimization
step for updating $B$. Each of its iterations is as fast as that of regular
projected GD because the minimization over $B$ decouples column-wise. At the
same time, we can prove exponential error decay for it, which we are unable to
do for projected GD. Finally, it can also be efficiently federated with a
communication cost of only $nr$ per node, instead of $nq$ for projected GD.
Authors' comments: To appear in IEEE Transactions on Information Theory (T-IT)
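One way to write out the alternation described above, assuming a squared-error objective, a step size $\eta$, and projection onto matrices with orthonormal columns via a QR decomposition (these three choices are assumptions for illustration, with $b_k$ denoting the $k$-th column of $B$):

$$
f(U, B) = \sum_{k=1}^{q} \left\| y_k - A_k U b_k \right\|_2^2
$$

$$
b_k \leftarrow \arg\min_{b \in \mathbf{R}^{r}} \left\| y_k - A_k U b \right\|_2^2 = (A_k U)^{\dagger} y_k, \qquad k \in [q]
$$

$$
\nabla_U f(U, B) = -2 \sum_{k=1}^{q} A_k^{\top} \left( y_k - A_k U b_k \right) b_k^{\top},
\qquad
U \leftarrow \mathrm{QR}\!\left( U - \eta\, \nabla_U f(U, B) \right)
$$

Under this reading, the update of each $b_k$ is an independent $r$-dimensional least-squares problem, which is the column-wise decoupling the abstract refers to, and the gradient with respect to $U$ is an $n \times r$ matrix, consistent with the stated communication cost of $nr$ per node.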
Hanshu Yan, Jingfeng Zhang, Gang Niu, Jiashi Feng, Vincent Y. F. Tan, Masashi Sugiyama
We investigate the adversarial robustness of CNNs from the perspective of channel-wise activations. By comparing \textit{non-robust} (normally trained) and \textit{robustified} (adversarially trained) models, we observe that adversarial training (AT) robustifies CNNs by aligning the channel-wise activations of adversarial data with those of their natural counterparts. However, the channels that are \textit{negatively-relevant} (NR) to predictions are still over-activated when processing adversarial data. We also observe that AT does not result in similar robustness for all classes. For the robust classes, channels with larger activation magnitudes are usually more \textit{positively-relevant} (PR) to predictions, but this alignment does not hold for the non-robust classes. Given these observations, we hypothesize that suppressing NR channels and aligning PR ones with their relevances further enhances the robustness of CNNs under AT. To examine this hypothesis, we introduce a novel mechanism, i.e., \underline{C}hannel-wise \underline{I}mportance-based \underline{F}eature \underline{S}election (CIFS). CIFS manipulates the activations of channels in certain layers by generating non-negative multipliers for these channels based on their relevances to predictions. Extensive experiments on benchmark datasets including CIFAR10 and SVHN clearly verify the hypothesis and the effectiveness of CIFS in robustifying CNNs. \url{https://github.com/HanshuYAN/CIFS}
Zhiqiang Wang, Qingyun She, Junlin Zhang
Click-Through Rate (CTR) estimation has become one of the most fundamental
tasks in many real-world applications and it's important for ranking models to
effectively capture complex high-order features. Shallow feed-forward networks
are widely used in many state-of-the-art DNN models such as FNN, DeepFM and
xDeepFM to implicitly capture high-order feature interactions. However, some
research has shown that additive feature interaction, particularly in feed-forward
neural networks, is inefficient in capturing common feature interactions. To
resolve this problem, we introduce a specific multiplicative operation into the DNN
ranking system by proposing an instance-guided mask, which performs an element-wise
product on both the feature embedding and feed-forward layers, guided by the input
instance. We also turn the feed-forward layer in the DNN model into a mixture of
additive and multiplicative feature interactions by proposing MaskBlock in
this paper. MaskBlock combines layer normalization, the instance-guided mask,
and a feed-forward layer, and it is a basic building block for designing
new ranking models under various configurations. The model consisting of
MaskBlocks is called MaskNet in this paper, and two new MaskNet models are
proposed to show the effectiveness of MaskBlock as a basic building block for
composing high-performance ranking systems. The experimental results on three
real-world datasets demonstrate that our proposed MaskNet models outperform
state-of-the-art models such as DeepFM and xDeepFM significantly, which implies
that MaskBlock is an effective basic building unit for composing new
high-performance ranking systems.
Authors' comments: In Proceedings of DLP-KDD 2021. ACM, Singapore. arXiv admin note: text
overlap with arXiv:2006.12753
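The combination of layer normalization, instance-guided mask and feed-forward layer described above can be sketched as follows. This is an illustrative reading, not the authors' implementation: the two-layer mask generator, the absence of bias terms and learned normalization parameters, and all weight names (`w_agg`, `w_proj`, `w_hidden`) are assumptions.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(weights, x):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def instance_guided_mask(x, w_agg, w_proj):
    """Generate one multiplier per input dimension from the input instance itself."""
    return linear(w_proj, relu(linear(w_agg, x)))


def mask_block(x, w_agg, w_proj, w_hidden):
    """Element-wise mask (multiplicative) followed by a feed-forward layer (additive)."""
    mask = instance_guided_mask(x, w_agg, w_proj)
    masked = [m * v for m, v in zip(mask, layer_norm(x))]
    return relu(layer_norm(linear(w_hidden, masked)))
```

The mask must have the same length as the input it multiplies, so in this sketch `w_proj` needs as many rows as `x` has entries, while the hidden width of the mask generator (the number of rows of `w_agg`) is free.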
Huu-Thiet Nguyen, Chien Chern Cheah, Kar-Ann Toh
Deep learning (DL) has achieved great success in many applications, but it
has been less well analyzed from the theoretical perspective. The unexplainable
success of black-box DL models has raised questions among scientists and
promoted the emergence of the field of explainable artificial intelligence
(XAI). In robotics, it is particularly important to deploy DL algorithms in a
predictable and stable manner as robots are active agents that need to interact
safely with the physical world. This paper presents an analytic deep learning
framework for fully connected neural networks, which can be applied for both
regression problems and classification problems. Examples for regression and
classification problems include online robot control and robot vision. We
present two layer-wise learning algorithms such that the convergence of the
learning systems can be analyzed. Firstly, an inverse layer-wise learning
algorithm for multilayer networks with convergence analysis for each layer is
presented to understand the problems of layer-wise deep learning. Secondly, a
forward progressive learning algorithm where the deep networks are built
progressively by using single hidden layer networks is developed to achieve
better accuracy. It is shown that the progressive learning method can be used
for fine-tuning of weights from a convergence point of view. The effectiveness of
the proposed framework is illustrated based on classical benchmark recognition
tasks using the MNIST and CIFAR-10 datasets and the results show a good balance
between performance and explainability. The proposed method is subsequently
applied for online learning of robot kinematics and experimental results on
kinematic control of a UR5e robot with an unknown model are presented.
Authors' comments: The paper has been published in Automatica
Kanchan Chowdhury, Ankita Sharma, Arun Deepak Chandrasekar
Increasing the batch size of a deep learning model is a challenging task. Although it might help in utilizing the full available system memory during the training phase of a model, it most often results in a significant loss of test accuracy. LARS solved this issue by introducing an adaptive learning rate for each layer of a deep learning model. However, there are doubts about how popular distributed machine learning systems such as SystemML or MLlib will perform with this optimizer. In this work, we apply the LARS optimizer to a deep learning model implemented using SystemML. We perform experiments with various batch sizes and compare the performance of the LARS optimizer with \textit{Stochastic Gradient Descent}. Our experimental results show that the LARS optimizer performs significantly better than Stochastic Gradient Descent for large batch sizes even with the distributed machine learning framework SystemML.
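The per-layer adaptive learning rate can be sketched as below. This is a simplified illustration of the LARS idea rather than the SystemML implementation used in the work: momentum is omitted, each layer is a flat list of weights, and the parameter names (`base_lr`, `trust`, `weight_decay`) are hypothetical.

```python
import math


def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))


def lars_step(layers, grads, base_lr=0.1, trust=0.001, weight_decay=0.0):
    """One LARS-style update without momentum.

    layers, grads: lists of flat weight / gradient lists, one per layer.
    Each layer's step is scaled by ||w|| / (||g|| + weight_decay * ||w||).
    """
    updated = []
    for w, g in zip(layers, grads):
        w_norm, g_norm = l2_norm(w), l2_norm(g)
        denom = g_norm + weight_decay * w_norm
        # Fall back to a plain step when the ratio is undefined.
        local_lr = trust * w_norm / denom if w_norm > 0 and denom > 0 else 1.0
        step = base_lr * local_lr
        updated.append([wi - step * (gi + weight_decay * wi)
                        for wi, gi in zip(w, g)])
    return updated
```

Because the step is normalized by the gradient norm, scaling a layer's gradient by a constant leaves its update unchanged when `weight_decay` is zero, which is the property that makes the rule attractive for large batches.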
Long Chen, Junyu Dong, Huiyu Zhou
Underwater object detection is of great significance for various applications in underwater scenes. However, class imbalance is still an unsolved bottleneck for current underwater object detection algorithms. It leads to large precision discrepancies among different classes: the dominant classes with more training data achieve higher detection precision, while the minority classes with fewer training data achieve much lower detection precision. In this paper, we propose a novel class-wise style augmentation (CWSA) algorithm to generate a class-balanced underwater dataset, Balance18, from the public contest underwater dataset URPC2018. CWSA is a new kind of data augmentation technique which augments the training data for the minority classes by generating various colors, textures and contrasts for the minority classes. Compared with previous data augmentation algorithms such as flipping, cropping and rotation, CWSA is able to generate a class-balanced underwater dataset with diverse color distortions and haze effects.
Shihao Zhao, Xingjun Ma, Yisen Wang, James Bailey, Bo Li, Yu-Gang Jiang
Deep neural networks (DNNs) are increasingly deployed in different applications to achieve state-of-the-art performance. However, they are often applied as a black box with limited understanding of what knowledge the model has learned from the data. In this paper, we focus on image classification and propose a method to visualize and understand the class-wise knowledge (patterns) learned by DNNs under three different settings including natural, backdoor and adversarial. Different to existing visualization methods, our method searches for a single predictive pattern in the pixel space to represent the knowledge learned by the model for each class. Based on the proposed method, we show that DNNs trained on natural (clean) data learn abstract shapes along with some texture, and backdoored models learn a suspicious pattern for the backdoored class. Interestingly, the phenomenon that DNNs can learn a single predictive pattern for each class indicates that DNNs can learn a backdoor even from clean data, and the pattern itself is a backdoor trigger. In the adversarial setting, we show that adversarially trained models tend to learn more simplified shape patterns. Our method can serve as a useful tool to better understand the knowledge learned by DNNs on different datasets under different settings.
Mateus Roder, Leandro A. Passos, Luiz Carlos Felix Ribeiro, Clayton Pereira, João Paulo Papa
With the advent of deep learning, the number of works proposing new methods or improving existing ones has grown exponentially in recent years. In this scenario, "very deep" models emerged, since they were expected to extract more intrinsic and abstract features while supporting better performance. However, such models suffer from the vanishing gradient problem, i.e., backpropagation values become too close to zero in their shallower layers, ultimately causing learning to stagnate. This issue was overcome in the context of convolutional neural networks by creating "shortcut connections" between layers, in a so-called deep residual learning framework. Nonetheless, a very popular deep learning technique called Deep Belief Network still suffers from vanishing gradients when dealing with discriminative tasks. Therefore, this paper proposes the Residual Deep Belief Network, which reinforces information layer by layer to improve feature extraction and knowledge retention, which in turn supports better discriminative performance. Experiments conducted over three public datasets demonstrate its robustness concerning the task of binary image classification.
Dan Xu, Xavier Alameda-Pineda, Wanli Ouyang, Elisa Ricci, Xiaogang Wang, Nicu Sebe
Multi-scale representations deeply learned via convolutional neural networks
have shown tremendous importance for various pixel-level prediction problems.
In this paper we present a novel approach that advances the state of the art on
pixel-level prediction in a fundamental aspect, i.e. structured multi-scale
feature learning and fusion. In contrast to previous works directly
considering multi-scale feature maps obtained from the inner layers of a
primary CNN architecture, and simply fusing the features with weighted
averaging or concatenation, we propose a probabilistic graph attention network
structure based on a novel Attention-Gated Conditional Random Fields (AG-CRFs)
model for learning and fusing multi-scale representations in a principled
manner. In order to further improve the learning capacity of the network
structure, we propose to exploit feature-dependent conditional kernels within
the deep probabilistic framework. Extensive experiments are conducted on four
publicly available datasets (i.e. BSDS500, NYUD-V2, KITTI, and Pascal-Context)
and on three challenging pixel-wise prediction problems involving both discrete
and continuous labels (i.e. monocular depth estimation, object contour
prediction, and semantic segmentation). Quantitative and qualitative results
demonstrate the effectiveness of the proposed latent AG-CRF model and the
overall probabilistic graph attention network with feature conditional kernels
for structured feature learning and pixel-wise prediction.
Authors' comments: Regular paper accepted at TPAMI 2020. arXiv admin note: text overlap
with arXiv:1801.00524
E. R. Ferris, A. W. Blain, R. J. Assef, N. A. Hatch, A. Kimball, M. Kim, A. Sajina, A. Silva et al.
We present near-IR photometry and spectroscopy of 30 extremely luminous radio
and mid-IR selected galaxies. With bolometric luminosities exceeding
$\sim10^{13}$ $\rm{L_{\odot}}$ and redshifts ranging from $z = 0.880-2.853$, we
use VLT instruments X-shooter and ISAAC to investigate this unique population
of galaxies. Broad multi-component emission lines are detected in 18 galaxies
and we measure the near-IR lines $\rm{H\rm{\beta}}$,
$\text{[OIII]}\rm{\lambda}\rm{\lambda}4959,5007$ and $\rm{H\rm{\alpha}}$ in
six, 15 and 13 galaxies respectively, with 10 $\rm{Ly\alpha}$ and five CIV
lines additionally detected in the UVB arm. We use the broad
$\text{[OIII]}\rm{\lambda}5007$ emission lines as a proxy for the bolometric
AGN luminosity, and derive lower limits to supermassive black hole masses of
$10^{7.9}$-$10^{9.4}$ $\text{M}_{\odot}$ with expectations of corresponding
host masses of $10^{10.4}$-$10^{12.0}$ $\text{M}_{\odot}$. We measure
$\rm{\lambda}_{Edd}$ > 1 for eight of these sources at a $2\sigma$
significance. Near-IR photometry and SED fitting are used to compare stellar
masses directly. We detect both Balmer lines in five galaxies and use these to
infer a mean visual extinction of $A_{V}$ = 2.68 mag. Due to non-detections and
uncertainties in our $\rm{H\rm{\beta}}$ emission line measurements, we simulate
a broad $\rm{H\rm{\beta}}$ line of FWHM = 1480 $\rm{kms^{-1}}$ to estimate
extinction for all sources with measured $\rm{H\rm{\alpha}}$ emission. We then
use this to infer a mean $A_{V}=3.62$ mag, demonstrating the highly-obscured
nature of these galaxies, with the consequence of increasing our estimates of
black-hole masses by 0.5 orders of magnitude in the most extreme and
obscured cases.
Authors' comments: Accepted for publication in MNRAS. 14 pages (+8 page appendix), 11
figures and 9 tables
Cheeun Hong, Heewon Kim, Sungyong Baik, Junghun Oh, Kyoung Mu Lee
Quantizing deep convolutional neural networks for image super-resolution
substantially reduces their computational costs. However, existing works either
suffer from a severe performance drop in ultra-low precision of 4 or lower
bit-widths, or require a heavy fine-tuning process to recover the performance.
To our knowledge, this vulnerability to low precision stems from two
statistical properties of feature map values. First, the distribution of feature
map values varies significantly per channel and per input image. Second,
feature maps have outliers that can dominate the quantization error. Based on
these observations, we propose a novel distribution-aware quantization scheme
(DAQ) which facilitates accurate training-free quantization in ultra-low
precision. A simple function of DAQ determines the dynamic range of feature maps
and weights with low computational burden. Furthermore, our method enables
mixed-precision quantization by calculating the relative sensitivity of each
channel, without any training process involved. Nonetheless, quantization-aware
training is also applicable for auxiliary performance gain. Our new method
outperforms recent training-free and even training-based quantization methods
on state-of-the-art image super-resolution networks in ultra-low precision.
Authors' comments: WACV 2022
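The two observations above (per-channel distributions and dominating outliers) suggest deriving each channel's dynamic range from its own statistics. The sketch below is an assumption-laden illustration and not the DAQ function itself: it takes the range to be the channel mean plus or minus `k` standard deviations, clips values outside it, and quantizes uniformly. The function names and the parameter `k` are hypothetical.

```python
import statistics


def quantize_channel(values, bits=4, k=3.0):
    """Uniformly quantize one channel inside mean +/- k * std (assumed rule)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0.0:
        return list(values)  # constant channel: nothing to quantize
    lo, hi = mean - k * std, mean + k * std
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels
    quantized = []
    for v in values:
        clipped = min(max(v, lo), hi)  # outliers are clipped to the range
        quantized.append(lo + round((clipped - lo) / scale) * scale)
    return quantized


def quantize_feature_map(channels, bits=4, k=3.0):
    """Apply the per-channel rule independently to every channel."""
    return [quantize_channel(c, bits, k) for c in channels]
```

Computing the range requires only a mean and a standard deviation per channel at inference time, so no training or calibration pass is involved in this sketch.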
S. Valère Bitseki Penda, Jean-François Delmas
Bifurcating Markov chains (BMC) are Markov chains indexed by a full binary
tree representing the evolution of a trait along a population where each
individual has two children. We provide a central limit theorem for general
additive functionals of BMC, and prove the existence of three regimes. This
corresponds to a competition between the reproducing rate (each individual has
two children) and the ergodicity rate for the evolution of the trait. This is
in contrast with the work of Guyon (2007), where the considered additive
functionals are sums of martingale increments, and only one regime appears. Our
result can be seen as a discrete-time version, but with general trait
evolution, of results in the continuous-time setting of branching particle
systems from Adamczak and Mi\l{}o\'{s} (2015), where the evolution of the trait
is given by an Ornstein-Uhlenbeck process.
Authors' comments: 32
Byeong-Hoo Lee, Byeong-Hee Kwon, Do-Yeun Lee, Ji-Hoon Jeong
Brain-computer interface uses brain signals to control external devices
without actual control behavior. Recently, speech imagery has been studied for
direct communication using language. Speech imagery uses brain signals
generated when the user imagines speech. Unlike motor imagery, speech imagery
still has unknown characteristics. Additionally, electroencephalography has
intricate and non-stationary properties, resulting in insufficient decoding
performance. In addition, it is difficult to exploit spatial features in
speech imagery. In this study, we designed length-wise training that allows the model
to classify a series of a small number of words. In addition, we proposed
a hierarchical convolutional neural network structure and loss function to
make the most of the training strategy. The proposed method showed competitive
performance in speech imagery classification. Hence, we demonstrated that the
length of the word is a clue for improving classification performance.
Authors' comments: Submitted IEEE The 9th International Winter Conference on
Brain-Computer Interface
Ke Yan, Jinzheng Cai, Dakai Jin, Shun Miao, Dazhou Guo, Adam P. Harrison, Youbao Tang, Jing Xiao et al.
Radiological images such as computed tomography (CT) and X-rays render
anatomy with intrinsic structures. Being able to reliably locate the same
anatomical structure across varying images is a fundamental task in medical
image analysis. In principle it is possible to use landmark detection or
semantic segmentation for this task, but to work well these require large
amounts of labeled data for each anatomical structure and sub-structure of
interest. A more universal approach would learn the intrinsic structure from
unlabeled images. We introduce such an approach, called Self-supervised
Anatomical eMbedding (SAM). SAM generates a semantic embedding for each image
pixel that describes its anatomical location or body part. To produce such
embeddings, we propose a pixel-level contrastive learning framework. A
coarse-to-fine strategy ensures both global and local anatomical information
are encoded. Negative sample selection strategies are designed to enhance the
embedding's discriminability. Using SAM, one can label any point of interest on
a template image and then locate the same body part in other images by simple
nearest neighbor searching. We demonstrate the effectiveness of SAM in multiple
tasks with 2D and 3D image modalities. On a chest CT dataset with 19 landmarks,
SAM outperforms widely-used registration algorithms while only taking 0.23
seconds for inference. On two X-ray datasets, SAM, with only one labeled
template image, surpasses supervised methods trained on 50 labeled images. We
also apply SAM on whole-body follow-up lesion matching in CT and obtain an
accuracy of 91%. SAM can also be applied for improving image registration and
initializing CNN weights.
Authors' comments: code:
https://github.com/alibaba-damo-academy/self-supervised-anatomical-embedding-v2;
IEEE Trans on Medical Imaging: https://ieeexplore.ieee.org/document/9760421
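The template-based lookup described above reduces to a nearest neighbor search over per-pixel embeddings. The sketch below assumes cosine similarity as the matching score and a dictionary from pixel coordinates to embedding vectors; both choices, and the function names, are assumptions for illustration and do not describe the released code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))


def locate(template_embedding, query_embeddings):
    """Return the query pixel whose embedding best matches the template point.

    query_embeddings: dict mapping pixel coordinates -> embedding vector.
    """
    return max(
        query_embeddings,
        key=lambda p: cosine_similarity(template_embedding, query_embeddings[p]),
    )
```

For example, with a template embedding `[1.0, 0.0]` and query pixels `(0, 0)`, `(5, 7)` and `(2, 3)` carrying `[0.0, 1.0]`, `[0.9, 0.1]` and `[-1.0, 0.0]`, the lookup returns `(5, 7)`.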
Takuma Doi, Fumio Okura, Toshiki Nagahara, Yasuyuki Matsushita, Yasushi Yagi
This paper proposes a multi-view extension of instance segmentation without
relying on texture or shape descriptor matching. Multi-view instance
segmentation becomes challenging for scenes with repetitive textures and
shapes, e.g., plant leaves, due to the difficulty of multi-view matching using
texture or shape descriptors. To this end, we propose a multi-view region
matching method based on epipolar geometry, which does not rely on any feature
descriptors. We further show that the epipolar region matching can be easily
integrated into instance segmentation and is effective for instance-wise 3D
reconstruction. Experiments demonstrate the improved accuracy of multi-view
instance matching and the 3D reconstruction compared to the baseline methods.
Authors' comments: ACCV2020 Oral
hsan Ullah, Andre Rios, Vaibhav Gala, Susan Mckeever
Trust and credibility in machine learning models is bolstered by the ability
of a model to explain its decisions. While explainability of deep learning
models is a well-known challenge, a further challenge is clarity of the
explanation itself, which must be interpreted by downstream users.
Layer-wise Relevance Propagation (LRP), an established explainability technique
developed for deep models in computer vision, provides intuitive human-readable
heat maps of input images. We present the novel application of LRP for the first
time with structured datasets using a deep neural network (1D-CNN), for Credit
Card Fraud detection and Telecom Customer Churn prediction datasets. We show
how LRP is more effective than traditional explainability concepts of Local
Interpretable Model-agnostic Explanations (LIME) and Shapley Additive
Explanations (SHAP) for explainability. This effectiveness is both local to a
sample level and holistic over the whole testing set. We also discuss the
significant computational time advantage of LRP (1-2s) over LIME (22s) and SHAP
(108s), and thus its potential for real time application scenarios. In
addition, our validation of LRP has highlighted features for enhancing model
performance, thus opening up a new area of research of using XAI as an
approach for feature subset selection.
Authors' comments: 13 pages, 5 figures, 6 tables
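To make the relevance propagation idea concrete, the sketch below redistributes relevance through a single dense layer using the epsilon rule, one common LRP variant. Which rule the paper applies to its 1D-CNN is not stated in the abstract, so the rule, the bias-free layer and the function name are assumptions.

```python
def lrp_dense(activations, weights, relevance_out, eps=1e-9):
    """Propagate relevance from a dense layer's outputs back to its inputs.

    weights[i][j] connects input i to output j; biases are ignored.
    Each output's relevance is shared among inputs in proportion to their
    contribution activations[i] * weights[i][j].
    """
    n_in, n_out = len(activations), len(relevance_out)
    z = [sum(activations[i] * weights[i][j] for i in range(n_in))
         for j in range(n_out)]
    relevance_in = [0.0] * n_in
    for j in range(n_out):
        # The stabilizer keeps the sign of z[j] and avoids division by zero.
        denom = z[j] + (eps if z[j] >= 0 else -eps)
        for i in range(n_in):
            relevance_in[i] += activations[i] * weights[i][j] / denom * relevance_out[j]
    return relevance_in
```

With a small `eps` and no bias, the relevance entering the layer is approximately conserved: the input relevances sum to the output relevances. Applied layer by layer down to the input features, the resulting scores per feature could serve as the ranking used for feature subset selection.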
Dietrich Ryter
SDEs are solved in two steps: (1) for short times by successive approximation
in the integral equation, which leads to non-Gaussian increments when the noise
is multiplicative, (2) by summing up these increments in consecutive short time
intervals. This corresponds to a modified anti-Ito integral. This procedure
removes the need to choose an integration sense, and it also avoids an intrinsic
mismatch between the standard stochastic integrals (with Gaussian increments)
and the Fokker-Planck equations (with non-Gaussian solutions). As a further new
feature, the local diffusion parameters (plus a noise-independent drift) are
sufficient to specify the SDE. This can simplify the modelling. For the FPE it
means that the diffusion matrix alone accounts for the noise (the well-known
and valid anti-Ito FPE involves a noise-induced drift part that cancels with
some other term).
Authors' comments: New access, with non-Gaussian basic path increments derived from
local diffusion