Chen Lin, Zhichao Ouyang, Junqing Zhuang, Jianqiang Chen, Hui Li, Rongxin Wu
Automatic code summarization frees software developers from the heavy burden
of manual commenting and benefits software development and maintenance.
Abstract Syntax Tree (AST), which depicts the source code's syntactic
structure, has been incorporated to guide the generation of code summaries.
However, existing AST based methods suffer from the difficulty of training and
generate inadequate code summaries. In this paper, we present the Block-wise
Abstract Syntax Tree Splitting method (BASTS for short), which fully utilizes
the rich tree-form syntax structure in ASTs, for improving code summarization.
BASTS splits the code of a method based on the blocks in the dominator tree of
the Control Flow Graph, and generates a split AST for each code split. Each
split AST is then modeled by a Tree-LSTM using a pre-training strategy to
capture local non-linear syntax encoding. The learned syntax encoding is
combined with code encoding, and fed into Transformer to generate high-quality
code summaries. Comprehensive experiments on benchmarks have demonstrated that
BASTS significantly outperforms state-of-the-art approaches in terms of various
evaluation metrics. To facilitate reproducibility, our implementation is
available at https://github.com/XMUDM/BASTS.
Authors' comments: Accepted in 29th IEEE/ACM International Conference on Program
Comprehension (ICPC 2021)
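The BASTS abstract above splits code along the blocks of the dominator tree of the Control Flow Graph. As a rough illustration of that underlying notion only, the following sketch computes dominators and immediate dominators with the classic iterative data-flow algorithm. The toy graph, the node names and the function names are assumptions made for this example; this is not the authors' implementation, and how BASTS turns the tree into code splits is not shown.

```python
def dominators(cfg, entry):
    """Return {node: set of nodes dominating it} for a CFG given as
    {node: [successors]}. Uses the iterative data-flow formulation:
    dom(n) = {n} | intersection of dom(p) over all predecessors p."""
    nodes = set(cfg)
    for succs in cfg.values():
        nodes.update(succs)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            if preds[n]:
                new = set.intersection(*(dom[p] for p in preds[n])) | {n}
            else:
                new = {n}  # unreachable node: dominated only by itself
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def immediate_dominators(dom, entry):
    """The immediate dominator of n is its closest strict dominator,
    i.e. the strict dominator with the largest dominator set."""
    idom = {}
    for n, ds in dom.items():
        strict = ds - {n}
        if n == entry or not strict:
            continue
        idom[n] = max(strict, key=lambda d: len(dom[d]))
    return idom


# Hypothetical CFG of a method with a single if/else statement.
cfg = {
    "entry": ["cond"],
    "cond": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": ["exit"],
    "exit": [],
}
dom = dominators(cfg, "entry")
idom = immediate_dominators(dom, "entry")  # edges of the dominator tree
```

In this toy graph both branches and the join point hang directly under the condition node in the dominator tree, which is the kind of block structure a block-wise split could follow.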
Yang Bai, Yuyuan Zeng, Yong Jiang, Shu-Tao Xia, Xingjun Ma, Yisen Wang
The study of adversarial examples and their activation has attracted
significant attention for secure and robust learning with deep neural networks
(DNNs). Different from existing works, in this paper, we highlight two new
characteristics of adversarial examples from the channel-wise activation
perspective: 1) the activation magnitudes of adversarial examples are higher
than those of natural examples; and 2) the channels are activated more uniformly
by adversarial examples than by natural examples. We find that the
state-of-the-art defense, adversarial training, has addressed the first issue of
high activation magnitudes via training on adversarial examples, while the
second issue of uniform activation remains. This motivates us to suppress
redundant activations triggered by adversarial perturbations via a
Channel-wise Activation Suppressing (CAS) strategy. We show that CAS can train
a model that inherently suppresses adversarial activation, and can be easily
applied to existing defense methods to further improve their robustness. Our
work provides a simple but generic training strategy for robustifying the
intermediate layer activation of DNNs.
Authors' comments: ICLR2021 accepted paper
Giancarlo Di Biase, Hermann Blum, Roland Siegwart, Cesar Cadena
The inability of state-of-the-art semantic segmentation methods to detect anomaly instances hinders them from being deployed in safety-critical and complex applications, such as autonomous driving. Recent approaches have focused on either leveraging segmentation uncertainty to identify anomalous areas or re-synthesizing the image from the semantic label map to find dissimilarities with the input image. In this work, we demonstrate that these two methodologies contain complementary information and can be combined to produce robust predictions for anomaly segmentation. We present a pixel-wise anomaly detection framework that uses uncertainty maps to improve over existing re-synthesis methods in finding dissimilarities between the input and generated images. Our approach works as a general framework around already trained segmentation networks, which ensures anomaly detection without compromising segmentation accuracy, while significantly outperforming all similar methods. Top-2 performance across a range of different anomaly datasets shows the robustness of our approach to handling different anomaly instances.
Jinhua Zhu, Lijun Wu, Yingce Xia, Shufang Xie, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
With sequentially stacked self-attention, (optional) encoder-decoder
attention, and feed-forward layers, Transformer has achieved great success in natural
language processing (NLP), and many variants have been proposed. Currently,
almost all these models assume that the layer order is fixed and kept the same
across data samples. We observe that different data samples actually favor
different orders of the layers. Based on this observation, in this work, we
break the assumption of the fixed layer order in the Transformer and introduce
instance-wise layer reordering into the model structure. Our Instance-wise
Ordered Transformer (IOT) can model variant functions by reordered layers,
which enables each sample to select the better one to improve the model
performance under the constraint of almost the same number of parameters. To
achieve this, we introduce a light predictor with negligible parameter and
inference cost to decide the most capable and favorable layer order for any
input sequence. Experiments on 3 tasks (neural machine translation, abstractive
summarization, and code generation) and 9 datasets demonstrate consistent
improvements of our method. We further show that our method can also be applied
to other architectures beyond Transformer. Our code is released on GitHub.
Authors' comments: Accepted at ICLR-2021
Haozhe Liu, Haoqian Wu, Weicheng Xie, Feng Liu, Linlin Shen
The convolutional neural network (CNN) is vulnerable to degraded images with
even very small variations (e.g. corrupted and adversarial samples). One of the
possible reasons is that CNN pays more attention to the most discriminative
regions, but ignores the auxiliary features when learning, leading to the lack
of feature diversity for final judgment. We propose to
dynamically suppress significant activation values of the CNN by group-wise
inhibition, rather than handling them in a fixed or random manner during training. The feature
maps with different activation distribution are then processed separately to
take the feature independence into account. CNN is finally guided to learn
richer discriminative features hierarchically for robust classification
according to the proposed regularization. Our method is comprehensively
evaluated under multiple settings, including classification against
corruptions, adversarial attacks and low data regime. Extensive experimental
results show that the proposed method can achieve significant improvements in
terms of both robustness and generalization performances, when compared with
the state-of-the-art methods. Code is available at
https://github.com/LinusWu/TENET_Training.
Authors' comments: Accepted to ICCV 2021
Shohei Kubota, Hideaki Hayashi, Tomohiro Hayase, Seiichi Uchida
The interpretability of neural networks (NNs) is a challenging but essential
topic for transparency in the decision-making process using machine learning.
One of the reasons for the lack of interpretability is random weight
initialization, where the input is randomly embedded into a different feature
space in each layer. In this paper, we propose an interpretation method for a
deep multilayer perceptron, which is the most general architecture of NNs,
based on identity initialization (namely, initialization using identity
matrices). The proposed method allows us to analyze the contribution of each
neuron to classification and class likelihood in each hidden layer. As a
property of the identity-initialized perceptron, the weight matrices remain
near the identity matrices even after learning. This property enables us to
treat the change of features from the input to each hidden layer as the
contribution to classification. Furthermore, we can separate the output of each
hidden layer into a contribution map that depicts the contribution to
classification and class likelihood, by adding extra dimensions to each layer
according to the number of classes, thereby allowing the calculation of the
recognition accuracy in each layer and thus revealing the roles of independent
layers, such as feature extraction and classification.
Authors' comments: Accepted at ICASSP2021
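The idea in the abstract above by Kubota et al., that an identity-initialized perceptron keeps features comparable across layers, can be illustrated with a minimal sketch. It assumes ReLU activations, no biases and non-negative inputs; all function names are hypothetical. The extra class dimensions and the contribution maps described in the abstract are not reproduced here.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def identity_matrix(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def forward(weights, x):
    """Return the activations of every hidden layer for input x."""
    activations = []
    h = x
    for w in weights:
        h = relu(matvec(w, h))
        activations.append(h)
    return activations


def feature_change(activations, x):
    """Per-neuron change of each hidden layer relative to the input."""
    return [[a - b for a, b in zip(h, x)] for h in activations]


x = [0.2, 0.4, 1.5]
weights = [identity_matrix(3) for _ in range(2)]

# At initialization every hidden layer reproduces the non-negative input.
at_init = feature_change(forward(weights, x), x)

# Stand-in for training: one first-layer weight moves away from the identity.
weights[0][0][1] = 0.5
after_update = feature_change(forward(weights, x), x)
```

Because the weights start at the identity, any non-zero entry in `after_update` can be traced to a specific neuron and layer, which is the property the abstract builds its interpretation on.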
Yang You, Yujing Lou, Ruoxi Shi, Qi Liu, Yu-Wing Tai, Lizhuang Ma, Weiming Wang, Cewu Lu
Point cloud analysis without pose priors is very challenging in real
applications, as the orientations of point clouds are often unknown. In this
paper, we propose a brand new point-set learning framework PRIN, namely,
Point-wise Rotation Invariant Network, focusing on rotation invariant feature
extraction in point cloud analysis. We construct spherical signals by Density
Aware Adaptive Sampling to deal with distorted point distributions in spherical
space. Spherical Voxel Convolution and Point Re-sampling are proposed to
extract rotation invariant features for each point. In addition, we extend PRIN
to a sparse version called SPRIN, which directly operates on sparse point
clouds. Both PRIN and SPRIN can be applied to tasks ranging from object
classification and part segmentation to 3D feature matching and label alignment.
Results show that, on the dataset with randomly rotated point clouds, SPRIN
demonstrates better performance than state-of-the-art methods without any data
augmentation. We also provide thorough theoretical proof and analysis for
point-wise rotation invariance achieved by our methods. Our code is available
on https://github.com/qq456cvb/SPRIN.
Authors' comments: Accepted to IEEE Transactions on Pattern Analysis and Machine
Intelligence
Ginevra Carbone, Guido Sanguinetti, Luca Bortolussi
We consider the problem of the stability of saliency-based explanations of Neural Network predictions under adversarial attacks in a classification task. Saliency interpretations of deterministic Neural Networks are remarkably brittle even when the attacks fail, i.e. for attacks that do not change the classification label. We empirically show that interpretations provided by Bayesian Neural Networks are considerably more stable under adversarial perturbations of the inputs and even under direct attacks to the explanations. By leveraging recent results, we also provide a theoretical explanation of this result in terms of the geometry of the data manifold. Additionally, we discuss the stability of the interpretations of high level representations of the inputs in the internal layers of a Network. Our results demonstrate that Bayesian methods, in addition to being more robust to adversarial attacks, have the potential to provide more stable and interpretable assessments of Neural Network predictions.
Pritam Anand
In this paper, we consider general k-piece-wise linear convex loss
functions in the SVM model for measuring the empirical risk. The resulting
k-Piece-wise Linear loss Support Vector Machine (k-PL-SVM) model is an adaptive
SVM model which can learn a suitable piece-wise linear loss function according
to the nature of the given training set. The k-PL-SVM models are general SVM models,
and existing popular SVM models, such as the C-SVM, LS-SVM and Pin-SVM models, are
their particular cases. We have performed extensive numerical experiments
with k-PL-SVM models for k = 2 and 3 and shown that they improve over
existing SVM models.
Authors' comments: 9 pages
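A convex piece-wise linear function can always be written as the maximum of finitely many affine pieces, which gives a simple way to picture the loss family in the abstract above by Anand. The sketch below uses that generic representation with u = 1 - y f(x); the parametrization and the names `piecewise_linear_loss`, `HINGE` and `pinball` are assumptions for illustration, not the paper's formulation. It shows the hinge loss of C-SVM and the pinball loss of Pin-SVM as two-piece instances.

```python
def piecewise_linear_loss(u, pieces):
    """Convex piece-wise linear loss: the maximum of affine pieces a*u + b.
    `pieces` is a list of (slope, intercept) pairs, `u` is 1 - y*f(x)."""
    return max(a * u + b for a, b in pieces)


# Hinge loss max(0, u), as used in C-SVM: two pieces.
HINGE = [(0.0, 0.0), (1.0, 0.0)]


def pinball(tau):
    """Pinball loss max(u, -tau*u), as used in Pin-SVM: two pieces."""
    return [(1.0, 0.0), (-tau, 0.0)]


hinge_violation = piecewise_linear_loss(0.5, HINGE)      # inside the margin
hinge_correct = piecewise_linear_loss(-2.0, HINGE)       # well classified
pinball_correct = piecewise_linear_loss(-2.0, pinball(0.5))
```

Unlike the hinge loss, the pinball loss also penalizes points that are classified correctly but lie far from the margin; a learned k-piece loss could place its slopes anywhere between such cases.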
Tristan M. Gottschalk, Andreas Maier, Florian Kordon, Björn W. Kreher
Metal implants that are inserted into the patient's body during trauma
interventions cause heavy artifacts in 3D X-ray acquisitions. Metal Artifact
Reduction (MAR) methods, whose first step is always a segmentation of the
present metal objects, try to remove these artifacts. The segmentation
is therefore a crucial task with a strong influence on the MAR's outcome. This study
proposes and evaluates a learning-based patch-wise segmentation network and a
newly proposed Consistency Check as a post-processing step. The combination of
the learned segmentation and Consistency Check reaches a high segmentation
performance with an average IoU score of 0.924 on the test set. Furthermore,
the Consistency Check proves the ability to significantly reduce false positive
segmentations whilst simultaneously ensuring consistent segmentations.
Authors' comments: Accepted for Bildverarbeitung für die Medizin, 07.-09.03.2021
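The IoU (Intersection over Union) score quoted in the abstract above by Gottschalk et al. is the ratio between the overlap and the union of the predicted and reference masks. A minimal sketch for binary masks stored as nested lists follows; the function name and the convention of returning 1.0 for two empty masks are assumptions of this example, and the toy masks are not data from the study.

```python
def iou(pred, target):
    """Intersection over Union of two binary masks (nested lists of 0/1)."""
    intersection = 0
    union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Convention assumed here: two empty masks agree perfectly.
    return intersection / union if union else 1.0


pred = [[1, 1],
        [0, 0]]
target = [[1, 0],
          [1, 0]]
score = iou(pred, target)  # 1 overlapping pixel out of 3 marked pixels
```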
Sertac Arisoy, Nasser M. Nasrabadi, Koray Kayabol
We propose a completely unsupervised pixel-wise anomaly detection method for hyperspectral images. The proposed method consists of three steps called data preparation, reconstruction, and detection. In the data preparation step, we apply a background purification to train the deep network in an unsupervised manner. In the reconstruction step, we propose to use three different deep autoencoding adversarial network (AEAN) models including 1D-AEAN, 2D-AEAN, and 3D-AEAN which are developed for working on spectral, spatial, and joint spectral-spatial domains, respectively. The goal of the AEAN models is to generate synthesized hyperspectral images (HSIs) which are close to real ones. A reconstruction error map (REM) is calculated between the original and the synthesized image pixels. In the detection step, we propose to use a WRX-based detector in which the pixel weights are obtained according to REM. We compare our proposed method with the classical RX, WRX, support vector data description-based (SVDD), collaborative representation-based detector (CRD), adaptive weight deep belief network (AW-DBN) detector and deep autoencoder anomaly detection (DAEAD) method on real hyperspectral datasets. The experimental results show that the proposed approach outperforms other detectors in the benchmark.
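The reconstruction error map (REM) in the abstract above by Arisoy et al. is computed between original and synthesized pixels. One plausible reading is sketched below, assuming the per-pixel Euclidean distance between the two spectra; the choice of norm and the function name are assumptions of this example. How the REM is converted into pixel weights for the WRX-based detector is not specified in the abstract and is not shown.

```python
import math


def reconstruction_error_map(original, reconstructed):
    """Per-pixel Euclidean distance between original and reconstructed
    spectra. Both inputs are H x W x B nested lists (B spectral bands)."""
    return [
        [
            math.sqrt(sum((o - r) ** 2 for o, r in zip(pix_o, pix_r)))
            for pix_o, pix_r in zip(row_o, row_r)
        ]
        for row_o, row_r in zip(original, reconstructed)
    ]


# Toy 1 x 2 image with 2 bands: the second pixel is reconstructed poorly,
# as would be expected for an anomaly that the network never learned.
original = [[[1.0, 2.0], [3.0, 4.0]]]
reconstructed = [[[1.0, 2.0], [0.0, 0.0]]]
rem = reconstruction_error_map(original, reconstructed)
```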
Elizabeth Naluminsa, Edward C. Elson, Thomas H. Jarrett
We present the global scaling relations between the neutral atomic hydrogen
gas, the stellar disk and the star forming disk in a sample of 228 nearby
galaxies that are both spatially and spectrally resolved in HI line emission.
We have used HI data from the Westerbork survey of HI in Irregular and Spiral
galaxies (WHISP) and Mid Infrared (3.4 $\mu m$, 11.6 $\mu m$) data from the
Wide-field Infrared Survey Explorer (WISE) survey, combining two datasets that
are well-suited to such a study in terms of uniformity, resolution and
sensitivity. We utilize the novel method of deriving scaling relations for
quantities enclosed within the stellar disk rather than integrating over the HI
disk and find the global scaling relations to be tighter when defined for
enclosed quantities. We also present new HI intensity maps for the WHISP survey
derived using a robust noise rejection technique along with corresponding
velocity fields.
Authors' comments: 18 pages, 5 tables, 16 Figures. Accepted for publication in the
Monthly Notices of the Royal Astronomical Society. Minor revision
Benoît de Courson, Léo Fitouchi, Jean-Philippe Bouchaud, Michael Benzaquen
The ability to learn from others (social learning) is often deemed a cause of
the human species' success. But if social learning is indeed more efficient (whether
less costly or more accurate) than individual learning, it raises the question
of why anyone would engage in individual information seeking, which is a
necessary condition for social learning's efficacy. We propose an evolutionary
model solving this paradox, provided agents (i) aim not only at information
quality but also vie for audience and prestige, and (ii) do not only value
accuracy but also reward originality -- allowing them to alleviate herding
effects. We find that under some conditions (large enough success rate of
informed agents and intermediate taste for popularity), both social learning's
higher accuracy and the taste for original opinions are evolutionary-stable,
within a mutually beneficial division of labour-like equilibrium. When such
conditions are not met, the system most often converges towards mutually
detrimental equilibria.
Authors' comments: 11 pages, 6 figures, 1 table
Yuval Nirkin, Lior Wolf, Tal Hassner
We present a novel, real-time, semantic segmentation network in which the
encoder both encodes and generates the parameters (weights) of the decoder.
Furthermore, to allow maximal adaptivity, the weights at each decoder block
vary spatially. For this purpose, we design a new type of hypernetwork,
composed of a nested U-Net for drawing higher level context features, a
multi-headed weight generating module which generates the weights of each block
in the decoder immediately before they are consumed, for efficient memory
utilization, and a primary network that is composed of novel dynamic patch-wise
convolutions. Despite the usage of less-conventional blocks, our architecture
obtains real-time performance. In terms of the runtime vs. accuracy trade-off,
we surpass state of the art (SotA) results on popular semantic segmentation
benchmarks: PASCAL VOC 2012 (val. set) and real-time semantic segmentation on
Cityscapes and CamVid. The code is available at https://nirkin.com/hyperseg.
Authors' comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Shachar Gluska, Mark Grobman
Quantization is an essential step in the efficient deployment of deep learning models and as such is an increasingly popular research topic. An important practical aspect that is not addressed in the current literature is how to analyze and fix fail cases where the use of quantization results in excessive degradation. In this paper, we present a simple analytic framework that breaks down overall degradation to its per layer contributions. We analyze many common networks and observe that a layer's contribution is determined by both intrinsic (local) factors - the distribution of the layer's weights and activations - and extrinsic (global) factors having to do with the interaction with the rest of the layers. Layer-wise analysis of existing quantization schemes reveals local fail-cases of existing techniques which are not reflected when inspecting their overall performance. As an example, we consider ResNext26 on which SoTA post-training quantization methods perform poorly. We show that almost all of the degradation stems from a single layer. The same analysis also allows for local fixes - applying a common weight clipping heuristic only to this layer reduces degradation to a minimum while applying the same heuristic globally results in high degradation. More generally, layer-wise analysis allows for a more nuanced examination of how quantization affects the network, enabling the design of better performing schemes.
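The per-layer view described in the abstract above by Gluska and Grobman can be approximated by a simple ablation: quantize one layer at a time and measure how much the output changes. The sketch below does this on a toy network whose layers scale the input element-wise, using symmetric uniform quantization. The toy network, the 4-bit setting, the weights and all function names are assumptions of this example; the authors' analytic framework is not specified in the abstract and may differ.

```python
def quantize(weights, bits=4):
    """Symmetric uniform quantization of a list of floats to `bits` bits."""
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)
    levels = 2 ** (bits - 1) - 1
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


def run(layers, x):
    """Toy network: every layer scales the input element-wise."""
    out = list(x)
    for w in layers:
        out = [o * wi for o, wi in zip(out, w)]
    return out


def per_layer_degradation(layers, x, bits=4):
    """Mean squared output error when only layer i is quantized."""
    reference = run(layers, x)
    result = []
    for i in range(len(layers)):
        mixed = [quantize(w, bits) if j == i else w
                 for j, w in enumerate(layers)]
        out = run(mixed, x)
        error = sum((a - b) ** 2 for a, b in zip(out, reference)) / len(out)
        result.append(error)
    return result


# The second layer contains one outlier weight that stretches its
# quantization grid, so its small weights are represented poorly.
layers = [[0.9, -0.6, 0.3, -0.3],
          [8.0, -0.6, 0.3, -0.3]]
degradation = per_layer_degradation(layers, [1.0, 1.0, 1.0, 1.0])
```

In this toy setting almost all of the error comes from the layer with the outlier weight, which mirrors the kind of single-layer failure the abstract reports for ResNext26.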
Junchen Ye, Leilei Sun, Bowen Du, Yanjie Fu, Hui Xiong
Graph Convolutional Network (GCN) has been widely applied in transportation demand prediction due to its excellent ability to capture non-Euclidean spatial dependence among station-level or regional transportation demands. However, in most of the existing research, the graph convolution was implemented on a heuristically generated adjacency matrix, which could neither reflect the real spatial relationships of stations accurately, nor capture the multi-level spatial dependence of demands adaptively. To cope with the above problems, this paper provides a novel graph convolutional network for transportation demand prediction. Firstly, a novel graph convolution architecture is proposed, which has different adjacency matrices in different layers and all the adjacency matrices are self-learned during the training process. Secondly, a layer-wise coupling mechanism is provided, which associates the upper-level adjacency matrix with the lower-level one. It also reduces the scale of parameters in our model. Lastly, a unitary network is constructed to give the final prediction result by integrating the hidden spatial states with gated recurrent unit, which could capture the multi-level spatial dependence and temporal dynamics simultaneously. Experiments have been conducted on two real-world datasets, NYC Citi Bike and NYC Taxi, and the results demonstrate the superiority of our model over the state-of-the-art ones.
Lorenzo Luciano, Imre Kiss, Peter William Beardshear, Esther Kadosh, A. Ben Hamza
The performance levels of a computing machine running a given workload configuration are crucial for both users and providers of computing resources. Knowing how well a computing machine is running with a given workload configuration is critical to making proper computing resource allocation decisions. In this paper, we introduce a novel framework for deriving computing machine and computing resource performance indicators for a given workload configuration. We propose a workload/machine index score (WISE) framework for computing a fitness score for a workload/machine combination. The WISE score indicates how well a computing machine is running with a specific workload configuration by addressing the issue of whether resources are being stressed or sitting idle wasting precious resources. In addition to encompassing any number of computing resources, the WISE score is determined by considering how far the machine resources are operating from their target levels without maxing out. Experimental results demonstrate the efficacy of the proposed WISE framework on two distinct workload configurations.
Vajira Thambawita, Steven Hicks, Pål Halvorsen, Michael A. Riegler
Segmentation of findings in the gastrointestinal tract is a challenging but important task and an important building block for adequate automatic decision support systems. In this work, we present our solution for the Medico 2020 task, which focused on the problem of colon polyp segmentation. We present our simple but efficient idea of using an augmentation method that uses grids in a pyramid-like manner (large to small) for segmentation. Our results show that the proposed methods work as intended and can also lead to results comparable with those of competing methods.
Xueyi Li, Tianfei Zhou, Jianwu Li, Yi Zhou, Zhaoxiang Zhang
Acquiring sufficient ground-truth supervision to train deep visual models has
been a bottleneck over the years due to the data-hungry nature of deep
learning. This is exacerbated in some structured prediction tasks, such as
semantic segmentation, which requires pixel-level annotations. This work
addresses weakly supervised semantic segmentation (WSSS), with the goal of
bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models
semantic dependencies in a group of images to estimate more reliable pseudo
ground-truths, which can be used for training more accurate segmentation
models. In particular, we devise a graph neural network (GNN) for group-wise
semantic mining, wherein input images are represented as graph nodes, and the
underlying relations between a pair of images are characterized by an efficient
co-attention mechanism. Moreover, in order to prevent the model from paying
excessive attention to common semantics only, we further propose a graph
dropout layer, encouraging the model to learn more accurate and complete object
responses. The whole network is end-to-end trainable by iterative message
passing, which propagates interaction cues over the images to progressively
improve the performance. We conduct experiments on the popular PASCAL VOC 2012
and COCO benchmarks, and our model yields state-of-the-art performance. Our
code is available at: https://github.com/Lixy1997/Group-WSSS.
Authors' comments: Accepted to AAAI 2021. Code: https://github.com/Lixy1997/Group-WSSS
Zhibin Li, Jian Zhang, Yongshun Gong, Yazhou Yao, Qiang Wu
We propose a new method for learning with multi-field categorical data.
Multi-field categorical data are usually collected over many heterogeneous
groups. These groups can be reflected in the categories under a field. The existing
methods try to learn a universal model that fits all data, which is challenging
and inevitably results in learning a complex model. In contrast, we propose a
field-wise learning method leveraging the natural structure of data to learn
simple yet efficient one-to-one field-focused models with appropriate
constraints. In doing this, the models can be fitted to each category and thus
can better capture the underlying differences in data. We present a model that
utilizes linear models with variance and low-rank constraints, to help it
generalize better and reduce the number of parameters. The model is also
interpretable in a field-wise manner. As the dimensionality of multi-field
categorical data can be very high, the models applied to such data are mostly
over-parameterized. Our theoretical analysis can potentially explain the effect
of over-parametrization on the generalization of our model. It also supports
the variance constraints in the learning objective. The experimental results on
two large-scale datasets show the superior performance of our model, the trend
of the generalization error bound, and the interpretability of learning
outcomes. Our code is available at
https://github.com/lzb5600/Field-wise-Learning.
Authors' comments: Accepted at NeurIPS 2020