Yuchen Ma, Songtao Liu, Zeming Li, Jian Sun
We propose a dense object detector with an instance-wise sampling strategy,
named IQDet. Instead of using human prior sampling strategies, we first extract
the regional feature of each ground-truth to estimate the instance-wise quality
distribution. According to a mixture model in spatial dimensions, the
distribution is more noise-robust and adapted to the semantic pattern of each
instance. Based on the distribution, we propose a quality sampling strategy,
which automatically selects training samples in a probabilistic manner and
trains with more high-quality samples. Extensive experiments on MS COCO show
that our method steadily improves the baseline by nearly 2.4 AP without bells
and whistles. Moreover, our best model achieves 51.6 AP, outperforming all
existing state-of-the-art one-stage detectors, and it incurs no additional cost
at inference time.
Authors' comments: Accepted by CVPR 2021
Dimitra Koumoutsou, Eleni Charou, Georgios Siolas, Giorgos Stamou
This paper introduces the Class-wise Principal Component Analysis, a supervised feature extraction method for hyperspectral data. Hyperspectral Imaging (HSI) has appeared in various fields in recent years, including Remote Sensing. Realizing that information extraction tasks for hyperspectral images are burdened by data-specific issues, we identify and address two major problems: the Curse of Dimensionality, which occurs due to the high volume of the data cube, and the class imbalance problem, which is common in hyperspectral datasets. Dimensionality reduction is an essential preprocessing step to complement a hyperspectral image classification task. Therefore, we propose a feature extraction algorithm for dimensionality reduction, based on Principal Component Analysis (PCA). Evaluations are carried out on the Indian Pines dataset to demonstrate that significant improvements are achieved when using the reduced data in a classification task.
Yufei Feng, Binbin Hu, Yu Gong, Fei Sun, Qingwen Liu, Wenwu Ou
Reranking is attracting increasing attention in recommender systems; it
rearranges the input ranking list into the final ranking list to better
meet user demands. Most existing methods greedily rerank candidates through the
rating scores from point-wise or list-wise models. Despite their effectiveness,
these greedy reranking methods are often sub-optimal because they neglect the
mutual influence between each item and its contexts in the final ranking list.
In this work, we propose a new context-wise reranking framework
named Generative Rerank Network (GRN). Specifically, we first design the
evaluator, which applies Bi-LSTM and self-attention mechanism to model the
contextual information in the labeled final ranking list and predict the
interaction probability of each item more precisely. Afterwards, we elaborate
on the generator, equipped with GRU, attention mechanism and pointer network to
select the item from the input ranking list step by step. Finally, we apply
cross-entropy loss to train the evaluator and, subsequently, policy gradient to
optimize the generator under the guidance of the evaluator. Empirical results
show that GRN consistently and significantly outperforms state-of-the-art
point-wise and list-wise methods. Moreover, GRN has achieved a performance
improvement of 5.2% on the PV metric and 6.1% on the IPV metric after its
successful deployment in one popular recommendation scenario of the Taobao
application.
Authors' comments: Better read with arXiv:2102.12057. arXiv admin note: text overlap
with arXiv:2102.12057
Ange Lou, Murray Loew
Real-time semantic segmentation is playing a more important role in computer
vision, due to the growing demand for mobile devices and autonomous driving.
Therefore, it is very important to achieve a good trade-off among performance,
model size and inference speed. In this paper, we propose a Channel-wise
Feature Pyramid (CFP) module to balance those factors. Based on the CFP module,
we build CFPNet for real-time semantic segmentation, which applies a series of
dilated convolution channels to extract effective features. Experiments on
Cityscapes and CamVid datasets show that the proposed CFPNet achieves an
effective combination of those factors. For the Cityscapes test dataset, CFPNet
achieves 70.1% class-wise mIoU with only 0.55 million parameters and 2.5 MB
memory. The inference speed can reach 30 FPS on a single RTX 2080Ti GPU with a
1024x2048-pixel image.
Authors' comments: Accepted by ICIP 2021
Guanglei Yang, Hao Tang, Mingli Ding, Nicu Sebe, Elisa Ricci
While convolutional neural networks have shown a tremendous impact on various
computer vision tasks, they generally demonstrate limitations in explicitly
modeling long-range dependencies due to the intrinsic locality of the
convolution operation. Initially designed for natural language processing
tasks, Transformers have emerged as alternative architectures with innate
global self-attention mechanisms to capture long-range dependencies. In this
paper, we propose TransDepth, an architecture that benefits from both
convolutional neural networks and transformers. To avoid the network losing its
ability to capture local-level details due to the adoption of transformers, we
propose a novel decoder that employs attention mechanisms based on gates.
Notably, this is the first paper that applies transformers to pixel-wise
prediction problems involving continuous labels (i.e., monocular depth
prediction and surface normal estimation). Extensive experiments demonstrate
that the proposed TransDepth achieves state-of-the-art performance on three
challenging datasets. Our code is available at:
https://github.com/ygjwd12345/TransDepth.
Authors' comments: ICCV 2021
HanQin Cai, Keaton Hamm, Longxiu Huang, Deanna Needell
Low rank tensor approximation is a fundamental tool in modern machine learning and data science. In this paper, we study the characterization, perturbation analysis, and an efficient sampling strategy for two primary tensor CUR approximations, namely Chidori and Fiber CUR. We characterize exact tensor CUR decompositions for low multilinear rank tensors. We also present theoretical error bounds of the tensor CUR approximations when (adversarial or Gaussian) noise appears. Moreover, we show that low cost uniform sampling is sufficient for tensor CUR approximations if the tensor has an incoherent structure. Empirical performance evaluations, with both synthetic and real-world datasets, establish the speed advantage of the tensor CUR approximations over other state-of-the-art low multilinear rank tensor approximations.
Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Tuo Zhao
Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from training data at the early training stage. This is not always achievable for low-resource languages where the amount of training data is limited. To address this limitation, we propose a novel token-wise curriculum learning approach that creates sufficient amounts of easy samples. Specifically, the model learns to predict a short sub-sequence from the beginning part of each target sentence at the early stage of training, and then the sub-sequence is gradually expanded as the training progresses. This new curriculum design is inspired by the cumulative effect of translation errors, which makes the later tokens more difficult to predict than the beginning ones. Extensive experiments show that our approach can consistently outperform baselines on 5 language pairs, especially for low-resource languages. Combining our approach with sentence-level methods further improves the performance on high-resource languages.
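The expanding-prefix idea described above can be illustrated with a small sketch. It is not the authors' implementation: the linear growth schedule, the function names `prefix_length` and `loss_mask`, and the `min_len` parameter are all assumptions made for illustration, since the abstract only says that the sub-sequence starts short and is gradually expanded.

```python
def prefix_length(step, total_steps, target_len, min_len=2):
    """Number of leading target tokens trained on at `step`.

    Assumption: the prefix grows linearly from `min_len` to the full
    sentence length; the abstract does not specify the schedule.
    """
    start = min(min_len, target_len)
    step = max(0, min(step, total_steps))  # clamp to the valid range
    return start + (target_len - start) * step // total_steps


def loss_mask(step, total_steps, target_len, min_len=2):
    """1 for tokens inside the current prefix, 0 for tokens left out of the loss."""
    n = prefix_length(step, total_steps, target_len, min_len)
    return [1] * n + [0] * (target_len - n)


if __name__ == "__main__":
    for step in (0, 50, 100):
        print(step, loss_mask(step, total_steps=100, target_len=10))
```

In a training loop, such a mask would typically multiply the per-token loss so that only the current prefix contributes to the gradient.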
Yang Su, Michael Chesser, Yansong Gao, Alanson P. Sample, Damith C. Ranasinghe
Emerging ultra-low-power tiny scale computing devices in Cyber-Physical
Systems and Internet of Things (IoT) run on harvested energy, are
intermittently powered, have limited computational capability, and perform
sensing and actuation functions under the control of a dedicated firmware
operating without the supervisory control of an operating system. Wirelessly
updating or patching the firmware of such devices is inevitable. We consider
the challenging problem of simultaneous and secure firmware updates or patching
for a typical class of such devices -- Computational Radio Frequency
Identification (CRFID) devices. We propose Wisecr, the first secure and
simultaneous wireless code dissemination mechanism for multiple devices that
prevents malicious code injection attacks and intellectual property (IP) theft,
whilst enabling remote attestation of code installation. Importantly, Wisecr is
engineered to comply with existing ISO-compliant communication protocol
standards employed by CRFID devices and systems. We comprehensively evaluate
Wisecr's overhead, demonstrate its implementation over standards-compliant
protocols, analyze its security and implement an end-to-end realization with
popular CRFID devices -- the open-source code is released on GitHub.
Authors' comments: 19 main pages, 6 Appendix. Under review at IEEE TDSC
Chen Lin, Zhichao Ouyang, Junqing Zhuang, Jianqiang Chen, Hui Li, Rongxin Wu
Automatic code summarization frees software developers from the heavy burden
of manual commenting and benefits software development and maintenance.
Abstract Syntax Tree (AST), which depicts the source code's syntactic
structure, has been incorporated to guide the generation of code summaries.
However, existing AST-based methods are difficult to train and
generate inadequate code summaries. In this paper, we present the Block-wise
Abstract Syntax Tree Splitting method (BASTS for short), which fully utilizes
the rich tree-form syntax structure in ASTs, for improving code summarization.
BASTS splits the code of a method based on the blocks in the dominator tree of
the Control Flow Graph, and generates a split AST for each code split. Each
split AST is then modeled by a Tree-LSTM using a pre-training strategy to
capture local non-linear syntax encoding. The learned syntax encoding is
combined with code encoding, and fed into Transformer to generate high-quality
code summaries. Comprehensive experiments on benchmarks have demonstrated that
BASTS significantly outperforms state-of-the-art approaches in terms of various
evaluation metrics. To facilitate reproducibility, our implementation is
available at https://github.com/XMUDM/BASTS.
Authors' comments: Accepted in 29th IEEE/ACM International Conference on Program
Comprehension (ICPC 2021)
Yang Bai, Yuyuan Zeng, Yong Jiang, Shu-Tao Xia, Xingjun Ma, Yisen Wang
The study of adversarial examples and their activation has attracted
significant attention for secure and robust learning with deep neural networks
(DNNs). Different from existing works, in this paper, we highlight two new
characteristics of adversarial examples from the channel-wise activation
perspective: 1) the activation magnitudes of adversarial examples are higher
than those of natural examples; and 2) the channels are activated more uniformly
by adversarial examples than by natural examples. We find that the
state-of-the-art defense, adversarial training, has addressed the first issue of
high activation magnitudes via training on adversarial examples, while the
second issue of uniform activation remains. This motivates us to prevent
redundant channels from being activated by adversarial perturbations via a
Channel-wise Activation Suppressing (CAS) strategy. We show that CAS can train
a model that inherently suppresses adversarial activation, and can be easily
applied to existing defense methods to further improve their robustness. Our
work provides a simple but generic training strategy for robustifying the
intermediate layer activation of DNNs.
Authors' comments: ICLR2021 accepted paper
Giancarlo Di Biase, Hermann Blum, Roland Siegwart, Cesar Cadena
The inability of state-of-the-art semantic segmentation methods to detect anomaly instances hinders them from being deployed in safety-critical and complex applications, such as autonomous driving. Recent approaches have focused on either leveraging segmentation uncertainty to identify anomalous areas or re-synthesizing the image from the semantic label map to find dissimilarities with the input image. In this work, we demonstrate that these two methodologies contain complementary information and can be combined to produce robust predictions for anomaly segmentation. We present a pixel-wise anomaly detection framework that uses uncertainty maps to improve over existing re-synthesis methods in finding dissimilarities between the input and generated images. Our approach works as a general framework around already trained segmentation networks, which ensures anomaly detection without compromising segmentation accuracy, while significantly outperforming all similar methods. Top-2 performance across a range of different anomaly datasets shows the robustness of our approach to handling different anomaly instances.
Jinhua Zhu, Lijun Wu, Yingce Xia, Shufang Xie, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
With sequentially stacked self-attention, (optional) encoder-decoder
attention, and feed-forward layers, the Transformer has achieved great success
in natural language processing (NLP), and many variants have been proposed. Currently,
almost all these models assume that the layer order is fixed and kept the same
across data samples. We observe that different data samples actually favor
different orders of the layers. Based on this observation, in this work, we
break the assumption of the fixed layer order in the Transformer and introduce
instance-wise layer reordering into the model structure. Our Instance-wise
Ordered Transformer (IOT) can model variant functions by reordered layers,
which enables each sample to select the better one to improve the model
performance under the constraint of almost the same number of parameters. To
achieve this, we introduce a light predictor with negligible parameter and
inference cost to decide the most capable and favorable layer order for any
input sequence. Experiments on 3 tasks (neural machine translation, abstractive
summarization, and code generation) and 9 datasets demonstrate consistent
improvements of our method. We further show that our method can also be applied
to other architectures beyond Transformer. Our code is released on GitHub.
Authors' comments: Accepted at ICLR-2021
Haozhe Liu, Haoqian Wu, Weicheng Xie, Feng Liu, Linlin Shen
The convolutional neural network (CNN) is vulnerable to degraded images with
even very small variations (e.g. corrupted and adversarial samples). One of the
possible reasons is that CNN pays more attention to the most discriminative
regions, but ignores the auxiliary features when learning, leading to the lack
of feature diversity for final judgment. In our method, we propose to
dynamically suppress significant activation values of the CNN by group-wise
inhibition, rather than handling them in a fixed or random manner during
training. The feature maps with different activation distributions are then
processed separately to
take the feature independence into account. CNN is finally guided to learn
richer discriminative features hierarchically for robust classification
according to the proposed regularization. Our method is comprehensively
evaluated under multiple settings, including classification against
corruptions, adversarial attacks and low data regime. Extensive experimental
results show that the proposed method can achieve significant improvements in
terms of both robustness and generalization performances, when compared with
the state-of-the-art methods. Code is available at
https://github.com/LinusWu/TENET_Training.
Authors' comments: Accepted to ICCV 2021
Shohei Kubota, Hideaki Hayashi, Tomohiro Hayase, Seiichi Uchida
The interpretability of neural networks (NNs) is a challenging but essential
topic for transparency in the decision-making process using machine learning.
One of the reasons for the lack of interpretability is random weight
initialization, where the input is randomly embedded into a different feature
space in each layer. In this paper, we propose an interpretation method for a
deep multilayer perceptron, which is the most general architecture of NNs,
based on identity initialization (namely, initialization using identity
matrices). The proposed method allows us to analyze the contribution of each
neuron to classification and class likelihood in each hidden layer. As a
property of the identity-initialized perceptron, the weight matrices remain
near the identity matrices even after learning. This property enables us to
treat the change of features from the input to each hidden layer as the
contribution to classification. Furthermore, we can separate the output of each
hidden layer into a contribution map that depicts the contribution to
classification and class likelihood, by adding extra dimensions to each layer
according to the number of classes, thereby allowing the calculation of the
recognition accuracy in each layer and thus revealing the roles of independent
layers, such as feature extraction and classification.
Authors' comments: Accepted at ICASSP2021
Yang You, Yujing Lou, Ruoxi Shi, Qi Liu, Yu-Wing Tai, Lizhuang Ma, Weiming Wang, Cewu Lu
Point cloud analysis without pose priors is very challenging in real
applications, as the orientations of point clouds are often unknown. In this
paper, we propose a brand new point-set learning framework PRIN, namely,
Point-wise Rotation Invariant Network, focusing on rotation invariant feature
extraction in point clouds analysis. We construct spherical signals by Density
Aware Adaptive Sampling to deal with distorted point distributions in spherical
space. Spherical Voxel Convolution and Point Re-sampling are proposed to
extract rotation invariant features for each point. In addition, we extend PRIN
to a sparse version called SPRIN, which directly operates on sparse point
clouds. Both PRIN and SPRIN can be applied to tasks ranging from object
classification, part segmentation, to 3D feature matching and label alignment.
Results show that, on the dataset with randomly rotated point clouds, SPRIN
demonstrates better performance than state-of-the-art methods without any data
augmentation. We also provide thorough theoretical proof and analysis for
point-wise rotation invariance achieved by our methods. Our code is available
at https://github.com/qq456cvb/SPRIN.
Authors' comments: Accepted to IEEE Transactions on Pattern Analysis and Machine
Intelligence
Ginevra Carbone, Guido Sanguinetti, Luca Bortolussi
We consider the problem of the stability of saliency-based explanations of Neural Network predictions under adversarial attacks in a classification task. Saliency interpretations of deterministic Neural Networks are remarkably brittle even when the attacks fail, i.e. for attacks that do not change the classification label. We empirically show that interpretations provided by Bayesian Neural Networks are considerably more stable under adversarial perturbations of the inputs and even under direct attacks to the explanations. By leveraging recent results, we also provide a theoretical explanation of this result in terms of the geometry of the data manifold. Additionally, we discuss the stability of the interpretations of high level representations of the inputs in the internal layers of a Network. Our results demonstrate that Bayesian methods, in addition to being more robust to adversarial attacks, have the potential to provide more stable and interpretable assessments of Neural Network predictions.
Pritam Anand
In this paper, we consider general k-piece-wise linear convex loss
functions in the SVM model for measuring the empirical risk. The resulting
k-Piece-wise Linear loss Support Vector Machine (k-PL-SVM) model is an adaptive
SVM model which can learn a suitable piece-wise linear loss function according
to the nature of the given training set. The k-PL-SVM models are general SVM
models, and existing popular SVM models, like the C-SVM, LS-SVM and Pin-SVM
models, are their particular cases. We have performed extensive numerical
experiments with k-PL-SVM models for k = 2 and 3 and shown that they improve on
existing SVM models.
Authors' comments: 9 pages
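As a rough illustration of what a k-piece-wise linear convex loss looks like, the sketch below uses the standard fact that a convex piecewise-linear function can be written as the maximum of finitely many affine functions. The representation as a list of `(slope, intercept)` pairs, the function names, and the three-piece example are assumptions for illustration and are not taken from the paper. Here `u` stands for `1 - y * f(x)`, so the two-piece cases reproduce the familiar hinge loss and the pinball loss.

```python
def piecewise_linear_loss(u, pieces):
    """Convex piecewise-linear loss: the maximum over affine pieces.

    `pieces` is a list of (slope, intercept) pairs (hypothetical
    representation); `u` is 1 - y * f(x) for a sample with label y.
    """
    return max(slope * u + intercept for slope, intercept in pieces)


# k = 2: hinge loss, max(0, u)
HINGE = [(0.0, 0.0), (1.0, 0.0)]


def pinball(tau):
    """k = 2: pinball loss, max(u, -tau * u)."""
    return [(1.0, 0.0), (-tau, 0.0)]


# k = 3: an arbitrary illustrative example, max(0, u, 2u - 1)
THREE_PIECE = [(0.0, 0.0), (1.0, 0.0), (2.0, -1.0)]

if __name__ == "__main__":
    for u in (-2.0, 0.5, 3.0):
        print(u,
              piecewise_linear_loss(u, HINGE),
              piecewise_linear_loss(u, pinball(0.5)),
              piecewise_linear_loss(u, THREE_PIECE))
```

In an adaptive model of the kind the abstract describes, the slopes and intercepts would be chosen from the data rather than fixed in advance as they are here.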
Tristan M. Gottschalk, Andreas Maier, Florian Kordon, Björn W. Kreher
Metal implants that are inserted into the patient's body during trauma
interventions cause heavy artifacts in 3D X-ray acquisitions. Metal Artifact
Reduction (MAR) methods, whose first step is always a segmentation of the
present metal objects, try to remove these artifacts. The segmentation is
therefore a crucial task that strongly influences the MAR outcome. This study
proposes and evaluates a learning-based patch-wise segmentation network and a
newly proposed Consistency Check as a post-processing step. The combination of
the learned segmentation and Consistency Check reaches a high segmentation
performance with an average IoU score of 0.924 on the test set. Furthermore,
the Consistency Check is shown to significantly reduce false positive
segmentations whilst simultaneously ensuring consistent segmentations.
Authors' comments: Accepted for Bildverarbeitung für die Medizin, 07.-09.03.2021
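For readers unfamiliar with the IoU (intersection over union) score reported above, the following minimal sketch shows how it is commonly computed for a pair of binary segmentation masks. It is a generic definition, not the authors' evaluation code; the flat-list mask representation and the convention of returning 1.0 when both masks are empty are assumptions.

```python
def iou(pred, target):
    """Intersection over union of two binary masks given as flat 0/1 sequences."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 1, 0]
    print(iou(predicted, reference))  # one shared pixel out of three labelled pixels
```

An average IoU over a test set would then be the mean of this value across all evaluated masks.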
Sertac Arisoy, Nasser M. Nasrabadi, Koray Kayabol
We propose a completely unsupervised pixel-wise anomaly detection method for hyperspectral images. The proposed method consists of three steps called data preparation, reconstruction, and detection. In the data preparation step, we apply a background purification to train the deep network in an unsupervised manner. In the reconstruction step, we propose to use three different deep autoencoding adversarial network (AEAN) models including 1D-AEAN, 2D-AEAN, and 3D-AEAN which are developed for working on spectral, spatial, and joint spectral-spatial domains, respectively. The goal of the AEAN models is to generate synthesized hyperspectral images (HSIs) which are close to real ones. A reconstruction error map (REM) is calculated between the original and the synthesized image pixels. In the detection step, we propose to use a WRX-based detector in which the pixel weights are obtained according to REM. We compare our proposed method with the classical RX, WRX, support vector data description-based (SVDD), collaborative representation-based detector (CRD), adaptive weight deep belief network (AW-DBN) detector and deep autoencoder anomaly detection (DAEAD) method on real hyperspectral datasets. The experimental results show that the proposed approach outperforms other detectors in the benchmark.
Elizabeth Naluminsa, Edward C. Elson, Thomas H. Jarrett
We present the global scaling relations between the neutral atomic hydrogen
gas, the stellar disk and the star forming disk in a sample of 228 nearby
galaxies that are both spatially and spectrally resolved in HI line emission.
We have used HI data from the Westerbork survey of HI in Irregular and Spiral
galaxies (WHISP) and Mid Infrared (3.4 $\mu m$, 11.6 $\mu m$) data from the
Wide-field Infrared Survey Explorer (WISE) survey, combining two datasets that
are well-suited to such a study in terms of uniformity, resolution and
sensitivity. We utilize a novel method of deriving scaling relations for
quantities enclosed within the stellar disk rather than integrating over the HI
disk and find the global scaling relations to be tighter when defined for
enclosed quantities. We also present new HI intensity maps for the WHISP survey
derived using a robust noise rejection technique along with corresponding
velocity fields.
Authors' comments: 18 pages, 5 tables, 16 Figures. Accepted for publication in the
Monthly Notices of the Royal Astronomical Society. Minor revision