Chang Song, Riya Ranjan, Hai Li
Neural networks are achieving higher accuracy at the price of higher energy and
computational cost. Quantization can greatly reduce this cost, and quantized
models are more hardware-friendly, with acceptable accuracy loss. On
the other hand, recent research has found that neural networks are vulnerable
to adversarial attacks, and the robustness of a neural network model can only
be improved with defense methods, such as adversarial training. In this work,
we find that adversarially-trained neural networks are more vulnerable to
quantization loss than plain models. To minimize both the adversarial and the
quantization losses simultaneously and to make the quantized model robust, we
propose a layer-wise adversarial-aware quantization method, using the Lipschitz
constant to choose the best quantization parameter settings for a neural
network. We theoretically derive the losses and prove the consistency of our
metric selection. The experimental results show that our method can effectively
and efficiently improve the robustness of quantized adversarially-trained
neural networks.
Authors' comments: arXiv admin note: substantial text overlap with arXiv:2012.14965
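A minimal sketch of Lipschitz-guided, layer-wise bit-width selection, assuming symmetric uniform weight quantization and taking a linear layer's Lipschitz constant to be its spectral norm (largest singular value), estimated by power iteration. The rule in `choose_bits` (the fewest candidate bits whose quantized layer keeps its Lipschitz constant within a hypothetical tolerance `tol` of the full-precision value) is an illustrative assumption, not the authors' selection criterion, and it ignores activations and the adversarial loss.

```python
import math
import random

def quantize(weights, bits):
    """Symmetric uniform quantization of a weight matrix given as a list of rows (bits >= 2)."""
    max_abs = max(abs(w) for row in weights for w in row) or 1.0
    levels = 2 ** (bits - 1) - 1  # e.g. 127 positive levels for 8 bits
    scale = max_abs / levels
    return [[round(w / scale) * scale for w in row] for row in weights]

def spectral_norm(weights, iters=100, seed=0):
    """Largest singular value of W, i.e. the L2 Lipschitz constant of x -> W x."""
    rng = random.Random(seed)
    rows, cols = len(weights), len(weights[0])
    v = [rng.uniform(-1.0, 1.0) for _ in range(cols)]
    for _ in range(iters):
        u = [sum(w * x for w, x in zip(row, v)) for row in weights]                # u = W v
        v = [sum(weights[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # v = W^T u
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in v]
    u = [sum(w * x for w, x in zip(row, v)) for row in weights]
    return math.sqrt(sum(x * x for x in u))

def choose_bits(weights, candidates=(2, 4, 8), tol=0.05):
    """Hypothetical rule: fewest bits whose quantized layer keeps its Lipschitz constant within tol."""
    reference = spectral_norm(weights)
    if reference == 0.0:
        return min(candidates)
    for bits in sorted(candidates):
        drift = abs(spectral_norm(quantize(weights, bits)) - reference) / reference
        if drift <= tol:
            return bits
    return max(candidates)
```

For example, `choose_bits([[1.0, 0.9], [0.9, 1.0]])` returns 4: at 2 bits the entry 0.9 rounds to 1.0, moving the spectral norm from 1.9 to 2.0 (about 5.3%, just over the tolerance).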
Nathan Myhrvold, Pavlo Pinchuk, Jean-Luc Margot
We analyzed 82,548 carefully curated observations of 4420 asteroids with
Wide-field Infrared Survey Explorer (WISE) four-band data to produce estimates
of diameters and infrared emissivities. We also used these diameter values in
conjunction with absolute visual magnitudes to infer estimates of visible-band
geometric albedos. We provide solutions for 131 asteroids not analyzed by the
NEOWISE team and for 1778 asteroids not analyzed with four-band data by the
NEOWISE team. Our process differs from the NEOWISE analysis in that it uses an
accurate solar flux, integrates the flux with actual bandpass responses, obeys
Kirchhoff's law, and does not force emissivity values in all four bands to an
arbitrary value of 0.9. We used a regularized model fitting algorithm that
yields improved fits to the data. Our results more closely match stellar
occultation diameter estimates than the NEOWISE results by a factor of ~2.
Using 24 high-quality stellar occultation results as a benchmark, we found that
the median error of four-infrared-band diameter estimates in a carefully
curated data set is 9.3%. Our results also suggest the presence of a
size-dependent bias in the NEOWISE diameter estimates, which may pollute
estimates of asteroid size distributions and slightly inflate impact hazard
risk calculations. For more than 90% of asteroids in this sample, the primary
source of error on the albedo estimate is the error on absolute visual
magnitude.
Authors' comments: 30 pages, 23 figures, Planetary Science Journal, in press
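The albedo step can be illustrated with the standard relation between diameter $D$ (km), visible geometric albedo $p_V$, and absolute magnitude $H$: $D = 1329\,\mathrm{km}\;p_V^{-1/2}\,10^{-H/5}$. The sketch below inverts it and propagates independent fractional errors, since $\ln p_V = \mathrm{const} - 2\ln D - 0.4\,H\ln 10$. The function names and the 0.3 mag uncertainty in the usage note are illustrative assumptions, not values from the paper.

```python
import math

C_KM = 1329.0  # standard constant (km) in D = C / sqrt(p_V) * 10**(-H/5)

def geometric_albedo(diameter_km, h_mag):
    """Invert D = C / sqrt(p_V) * 10**(-H/5) for the visible geometric albedo p_V."""
    return (C_KM * 10 ** (-h_mag / 5.0) / diameter_km) ** 2

def albedo_relative_error(frac_diameter_err, h_err_mag):
    """Fractional albedo error from independent errors on D and H.

    Returns (total, diameter_term, magnitude_term), where
    diameter_term = 2 * dD/D and magnitude_term = 0.4 * ln(10) * dH.
    """
    d_term = 2.0 * frac_diameter_err
    h_term = 0.4 * math.log(10.0) * h_err_mag
    return math.sqrt(d_term ** 2 + h_term ** 2), d_term, h_term
```

With the 9.3% median diameter error quoted above and an assumed 0.3 mag uncertainty on $H$, the diameter term is about 0.19 while the magnitude term is about 0.28, which shows how the error on $H$ can dominate the albedo error budget.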
Yu Wang, Charu Aggarwal, Tyler Derr
Recent years have witnessed the significant success of applying graph neural networks (GNNs) to learning effective node representations for classification. However, current GNNs are mostly built under balanced data splits, which is inconsistent with many real-world networks where the number of training nodes can be extremely imbalanced among classes. Directly applying current GNNs to imbalanced data would therefore generate coarse representations of nodes in minority classes and ultimately compromise classification performance. This underscores the importance of developing effective GNNs for handling imbalanced graph data. In this work, we propose a novel Distance-wise Prototypical Graph Neural Network (DPGNN), which uses class prototype-driven training to balance the training loss between majority and minority classes and then leverages distance metric learning to differentiate the contributions of different dimensions of representations and fully encode the relative position of each node to each class prototype. Moreover, we design a new imbalanced label propagation mechanism to derive extra supervision from unlabeled nodes and employ self-supervised learning to smooth the representations of adjacent nodes while separating inter-class prototypes. Comprehensive node classification experiments and parameter analysis on multiple networks show that the proposed DPGNN almost always significantly outperforms all other baselines, which demonstrates its effectiveness in imbalanced node classification. The implementation of DPGNN is available at https://github.com/YuWVandy/DPGNN.
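A minimal sketch of prototype-driven, class-balanced training signals, assuming class prototypes are the mean embeddings of labeled nodes and the distance is a per-dimension weighted squared Euclidean distance (`dim_weights` would be learned; here it is passed in). Averaging the loss within each class before averaging across classes is one simple way to keep minority classes from being drowned out; it is an assumption, not necessarily DPGNN's exact loss, and label propagation and the self-supervised terms are omitted.

```python
import math

def class_prototypes(embeddings, labels):
    """Mean embedding of the labeled nodes in each class."""
    sums, counts = {}, {}
    for z, y in zip(embeddings, labels):
        acc = sums.setdefault(y, [0.0] * len(z))
        for i, v in enumerate(z):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def weighted_sq_distance(z, prototype, dim_weights):
    """Squared distance in which each representation dimension has its own weight."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(z, prototype, dim_weights))

def prototype_probs(z, prototypes, dim_weights):
    """Softmax over negative distances from z to every class prototype."""
    classes = sorted(prototypes)
    logits = [-weighted_sq_distance(z, prototypes[c], dim_weights) for c in classes]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return {c: e / total for c, e in zip(classes, exps)}

def balanced_prototype_loss(embeddings, labels, prototypes, dim_weights):
    """Mean NLL per class, then mean over classes, so each class weighs the same."""
    per_class = {}
    for z, y in zip(embeddings, labels):
        p = prototype_probs(z, prototypes, dim_weights)[y]
        per_class.setdefault(y, []).append(-math.log(p))
    return sum(sum(v) / len(v) for v in per_class.values()) / len(per_class)
```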
Guiyun Xiao, Zheng-Jian Bai, Wai-Ki Ching
Nonnegative matrix factorization arises widely in machine learning and data
analysis. In this paper, for a given rank r, we consider the sparse stochastic
matrix factorization (SSMF) problem of decomposing a prescribed
m-by-n stochastic matrix V into a product of an m-by-r stochastic matrix W and
an r-by-n stochastic matrix H, where both W and H are required to be sparse.
With the prescribed sparsity level, we reformulate the SSMF as an unconstrained
nonconvex-nonsmooth minimization problem and introduce a column-wise update
algorithm for solving the minimization problem. We show that our algorithm
converges globally. The main advantage of our algorithm is that the generated
sequence converges to a special critical point of the cost function, which is
nearly a global minimizer over each column vector of the W-factor and is a
global minimizer over the H-factor as a whole if there is no sparsity
requirement on H. Numerical experiments on both synthetic and real data sets
are given to demonstrate the effectiveness of our proposed algorithm.
Authors' comments: 28 pages, 8 figures
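One building block a column-wise scheme with a prescribed sparsity level could use is projecting a column onto the set of k-sparse probability vectors. The sketch below uses a common greedy recipe (keep the k largest entries, then apply the sort-based Euclidean projection onto the probability simplex). It assumes each column of W and H must be a probability vector (the abstract does not state the orientation) and is an illustration, not the update rule derived in the paper.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, value in enumerate(u, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / j
        if value - candidate > 0:  # condition holds for j = 1..rho; keep the last theta
            theta = candidate
    return [max(x - theta, 0.0) for x in v]

def sparse_simplex_projection(v, k):
    """Keep the k largest entries, project them onto the simplex, and zero the rest."""
    top = sorted(range(len(v)), key=lambda i: v[i], reverse=True)[:k]
    projected = project_to_simplex([v[i] for i in top])
    out = [0.0] * len(v)
    for i, value in zip(top, projected):
        out[i] = value
    return out
```

For instance, `sparse_simplex_projection([0.5, 0.3, 0.2, 0.0], 2)` keeps the first two entries and shifts them up equally to about `[0.6, 0.4, 0.0, 0.0]`.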
Sunwoo Lee, Tuo Zhang, Chaoyang He, Salman Avestimehr
In Federated Learning, a common approach for aggregating local models across clients is periodic averaging of the full model parameters. It is known, however, that different layers of a neural network can have different degrees of model discrepancy across clients. The conventional full aggregation scheme ignores this difference and synchronizes all model parameters at once, which consumes network bandwidth inefficiently. Aggregating parameters that are already similar across clients makes little meaningful training progress while still increasing the communication cost. We propose FedLAMA, a layer-wise model aggregation scheme for scalable Federated Learning. FedLAMA adaptively adjusts the aggregation interval in a layer-wise manner, jointly considering the model discrepancy and the communication cost. The layer-wise aggregation method enables fine-grained control of the aggregation interval, relaxing the aggregation frequency without significantly affecting model accuracy. Our empirical study shows that FedLAMA reduces the communication cost by up to 60% for IID data and 70% for non-IID data while achieving accuracy comparable to FedAvg.
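A toy sketch of layer-wise aggregation, assuming client models are dicts mapping layer names to flat parameter lists; a layer is averaged only in rounds that are multiples of its interval. The interval rule in `assign_intervals` (relax layers whose cross-client discrepancy falls below a hypothetical `threshold`) is an assumption: FedLAMA also weighs each layer's communication cost, which this sketch ignores.

```python
def average(vectors):
    """Element-wise mean of equally long parameter lists."""
    return [sum(vals) / len(vals) for vals in zip(*vectors)]

def discrepancy(client_models, name):
    """Mean squared distance of the clients' copies of one layer from their average."""
    avg = average([m[name] for m in client_models])
    return sum(
        sum((a - b) ** 2 for a, b in zip(m[name], avg)) for m in client_models
    ) / len(client_models)

def assign_intervals(client_models, base=1, relaxed=4, threshold=1e-3):
    """Hypothetical rule: layers that already agree across clients sync less often."""
    return {
        name: relaxed if discrepancy(client_models, name) < threshold else base
        for name in client_models[0]
    }

def layerwise_aggregate(client_models, intervals, round_idx):
    """Average each layer only when round_idx is a multiple of its interval."""
    synced = []
    for name, interval in intervals.items():
        if round_idx % interval == 0:
            avg = average([m[name] for m in client_models])
            for m in client_models:
                m[name] = list(avg)
            synced.append(name)
    return synced
```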
HyoJung Han, Seokchan Ahn, Yoonjung Choi, Insoo Chung, Sangha Kim, Kyunghyun Cho
Recent simultaneous machine translation models are often trained on
conventional full-sentence translation corpora, which leads to either excessive
latency or the need to anticipate as-yet-unarrived words when dealing with a
language pair whose word orders differ significantly. This is unlike human
simultaneous interpreters, who produce largely monotonic translations at the
expense of the grammaticality of the sentence being translated. In this paper, we
thus propose an algorithm to reorder and refine the target side of a full
sentence translation corpus, so that the words/phrases between the source and
target sentences are aligned largely monotonically, using word alignment and
non-autoregressive neural machine translation. We then train a widely used
wait-k simultaneous translation model on this reordered-and-refined corpus. The
proposed approach improves BLEU scores, and the resulting translations exhibit
enhanced monotonicity with the source sentences.
Authors' comments: To be published in WMT2021
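The sketch below illustrates two ingredients under simplifying assumptions: the standard wait-k schedule (read k source tokens, then alternate one write per read) and a naive alignment-based reordering that sorts target tokens by the source position they align to. The paper additionally refines the reordered target with non-autoregressive translation to restore fluency, which is omitted; the alignment format (a dict from target index to source index, with unaligned tokens inheriting the previous token's position) is a hypothetical choice.

```python
def wait_k_reads(k, target_step, source_len):
    """Source tokens visible when emitting target token `target_step` (1-indexed) under wait-k."""
    return min(k + target_step - 1, source_len)

def monotonic_reorder(target_tokens, alignment):
    """Reorder target tokens by their aligned source position (stable for ties).

    alignment: dict target_index -> source_index; an unaligned token reuses the
    source position of the preceding target token.
    """
    keys, last = [], 0
    for t in range(len(target_tokens)):
        last = alignment.get(t, last)
        keys.append((last, t))
    order = sorted(range(len(target_tokens)), key=lambda t: keys[t])
    return [target_tokens[t] for t in order]
```

For a German source "ich habe den Apfel gegessen" and target "I have eaten the apple", aligning "eaten" to the final source verb moves it to the end, giving the monotonic but ungrammatical "I have the apple eaten" that a refinement step would then smooth.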
Enyan Dai, Shijie Zhou, Zhimeng Guo, Suhang Wang
Graph Neural Networks (GNNs) have achieved remarkable performance in modeling graphs for various applications. However, most existing GNNs assume that graphs exhibit strong homophily in node labels, i.e., that nodes with similar labels are connected. As a result, they fail to generalize to heterophilic graphs, where linked nodes may have dissimilar labels and attributes. Therefore, in this paper, we investigate a novel framework that performs well on graphs with either homophily or heterophily. More specifically, we propose a label-wise message passing mechanism that avoids the negative effects of aggregating dissimilar node representations and preserves the heterophilic contexts for representation learning. We further propose a bi-level optimization method to automatically select the model for graphs with homophily or heterophily. Theoretical analysis and extensive experiments demonstrate the effectiveness of our proposed framework for node classification on both homophilic and heterophilic graphs.
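A minimal sketch of label-wise aggregation, assuming each node has a (pseudo-)label and neighbor features are averaged separately per label group, so messages from dissimilar neighbors are not blended with those from similar ones. How labels are estimated for unlabeled nodes and how the groups are combined in the actual framework are not shown; concatenation in `combine` is an assumption, and all names are hypothetical.

```python
def label_wise_aggregate(features, edges, labels, num_classes):
    """For every node, average neighbor features separately for each (pseudo-)label.

    Returns node -> list of num_classes vectors (zeros when no neighbor has that label).
    """
    dim = len(next(iter(features.values())))
    neighbors = {v: [] for v in features}
    for u, v in edges:  # undirected graph
        neighbors[u].append(v)
        neighbors[v].append(u)
    out = {}
    for v, nbrs in neighbors.items():
        groups = []
        for c in range(num_classes):
            members = [features[u] for u in nbrs if labels[u] == c]
            if members:
                groups.append([sum(x) / len(members) for x in zip(*members)])
            else:
                groups.append([0.0] * dim)
        out[v] = groups
    return out

def combine(self_feature, groups):
    """One simple combination: the node's own features followed by every per-label mean."""
    out = list(self_feature)
    for g in groups:
        out.extend(g)
    return out
```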
Chenyang Huang, Hao Zhou, Osmar R. Zaïane, Lili Mou, Lei Li
How can we perform efficient inference while retaining high translation quality? Existing neural machine translation models, such as the Transformer, achieve high performance, but they decode words one by one, which is inefficient. Recent non-autoregressive translation models speed up inference, but their quality is still inferior. In this work, we propose DSLP, a highly efficient and high-performance model for machine translation. The key insight is to train a non-autoregressive Transformer with Deep Supervision and to feed additional Layer-wise Predictions. We conducted extensive experiments on four translation tasks (both directions of WMT'14 EN-DE and WMT'16 EN-RO). Results show that our approach consistently improves the BLEU scores compared with the respective base models. Specifically, our best variant outperforms the autoregressive model on three translation tasks while being 14.8 times more efficient in inference.
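A minimal sketch of the two ideas, assuming each decoder layer emits a per-position probability distribution over a small vocabulary. Deep supervision averages the token-level negative log-likelihood over all layers instead of using only the last one. `feed_layer_predictions` shows one hypothetical way to pass a layer's predictions forward (adding the embedding of each position's argmax token to its hidden state); DSLP's actual fusion may differ.

```python
import math

def nll(probs, targets):
    """Mean negative log-likelihood of the reference tokens under one layer's predictions."""
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)

def deep_supervision_loss(layer_probs, targets):
    """Average the token-level loss over every decoder layer, not just the last."""
    return sum(nll(p, targets) for p in layer_probs) / len(layer_probs)

def feed_layer_predictions(hidden, probs, embed):
    """Hypothetical fusion: add the embedding of each position's argmax token to its hidden state."""
    out = []
    for h, p in zip(hidden, probs):
        token = max(range(len(p)), key=p.__getitem__)
        out.append([a + b for a, b in zip(h, embed[token])])
    return out
```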
Hao Su, Jianwei Niu, Xuefeng Liu, Jiahe Cui, Ji Wan
Manga is a popular Japanese comic form that is composed of
black-and-white strokes and is generally displayed as raster images on digital
devices. Typical manga have simple textures, wide lines, and few color
gradients, properties that make them well suited to vectorization and to the
merits of vector graphics, e.g., adaptive resolution and small file sizes. In this paper, we
propose MARVEL (MAnga's Raster to VEctor Learning), a primitive-wise approach
for vectorizing raster manga with Deep Reinforcement Learning (DRL). Unlike
previous learning-based methods that predict vector parameters for an entire
image, MARVEL introduces a new perspective that regards an entire manga as a
collection of basic primitives, namely stroke lines, and designs a DRL model
to decompose the target image into a primitive sequence for accurate
vectorization. To improve vectorization accuracy and reduce file size, we
further propose a stroke accuracy reward to predict accurate stroke lines, and
a pruning mechanism to avoid generating erroneous and repeated strokes.
Extensive subjective and objective experiments show that MARVEL generates
impressive results and achieves state-of-the-art performance. Our code is
open-source at: https://github.com/SwordHolderSH/Mang2Vec.
Authors' comments: The previous version of this paper was titled "Mang2Vec: Vectorization
of raster manga by deep reinforcement learning"
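A toy sketch of stroke pruning, assuming strokes are straight segments rasterized with Bresenham's line algorithm and the target image is a set of ink pixel coordinates. A stroke is kept only if enough of its pixels land on ink (rejecting erroneous strokes) and it covers enough ink not already covered (rejecting repeated strokes). The thresholds and this greedy rule are hypothetical; MARVEL's stroke parameterization, reward, and pruning criterion are defined in the paper.

```python
def rasterize_line(x0, y0, x1, y1):
    """Integer pixels on the segment (x0, y0)-(x1, y1), via Bresenham's algorithm (all octants)."""
    pixels = set()
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        pixels.add((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy
    return pixels

def prune_strokes(strokes, target, min_gain=3, min_precision=0.5):
    """Greedily keep strokes that mostly hit ink and add enough newly covered ink pixels."""
    covered, kept = set(), []
    for stroke in strokes:
        pixels = rasterize_line(*stroke)
        on_target = pixels & target
        precision = len(on_target) / len(pixels)  # fraction of the stroke lying on ink
        gain = len(on_target - covered)           # ink pixels not yet explained
        if precision >= min_precision and gain >= min_gain:
            kept.append(stroke)
            covered |= on_target
    return kept
```

For an L-shaped target, a duplicated horizontal stroke adds no new ink and an off-target stroke mostly misses ink, so both are dropped.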
Jiehua Zhang, Zhuo Su, Yanghe Feng, Xin Lu, Matti Pietikäinen, Li Liu
Binary neural networks (BNNs) constrain weights and activations to +1 or -1,
which limits storage and computational cost and makes them hardware-friendly for
portable devices. Recently, BNNs have achieved remarkable progress and been
adopted in various fields. However, the performance of BNNs is sensitive to the
activation distribution. Existing BNNs use the Sign function with predefined or
learned static thresholds to binarize activations. This limits the
representation capacity of BNNs, since different samples may call for different
thresholds. To address this problem, we propose a dynamic BNN (DyBNN)
incorporating dynamic learnable channel-wise thresholds for the Sign function and
shift parameters for PReLU. The method aggregates global information into a
hyper function and effectively increases the feature expression ability.
The experimental results show that our method is an effective and
straightforward way to reduce information loss and enhance the performance of BNNs.
DyBNN, built on two ReActNet backbones (MobileNetV1 and ResNet18), achieves
71.2% and 67.4% top-1 accuracy on ImageNet, outperforming the baselines by a
large margin (1.8% and 1.5%, respectively).
Authors' comments: 5 pages, 3 figures
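A minimal sketch of a sample-dependent, channel-wise Sign threshold, assuming the hyper function is a single linear layer applied to the vector of globally average-pooled channel activations. DyBNN's actual hyper function and its dynamic PReLU shifts are not reproduced; `dynamic_sign`, `weight`, and `bias` are hypothetical names.

```python
def dynamic_sign(feature_map, weight, bias):
    """Binarize one sample's activations with thresholds computed from that sample.

    feature_map: list of channels, each a flat list of activations.
    weight, bias: parameters of a hypothetical linear hyper function mapping the
    per-channel global averages to one threshold per channel.
    Returns (binarized map, thresholds).
    """
    pooled = [sum(ch) / len(ch) for ch in feature_map]  # global average pooling
    thresholds = [
        sum(w * g for w, g in zip(row, pooled)) + b for row, b in zip(weight, bias)
    ]
    binarized = [
        [1 if x >= t else -1 for x in ch] for ch, t in zip(feature_map, thresholds)
    ]
    return binarized, thresholds
```

With an all-zero `weight` this reduces to a static threshold equal to `bias`; a nonzero `weight` lets the threshold move with each sample's statistics.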
Tianfang Zhu, Yue Guan, Anan Li
Mix-based point cloud augmentation is a popular remedy for the limited availability of large-scale public datasets. However, the mismatch between mixed points and their semantic labels hinders its application to point-wise tasks such as part segmentation. This paper proposes a point cloud augmentation approach, PointManifoldCut (PMC), which replaces points in the neural network's embedding space rather than their Euclidean coordinates. This approach exploits the fact that points at higher levels of the network are already trained to embed their neighborhood relations, so mixing these representations does not confuse the relation between each point and its label. We add a spatial transform module after the PointManifoldCut operation to align the new instances in the embedding space. The effects of different hidden layers and of different point-replacement methods are also discussed in this paper. The experiments show that our proposed approach enhances the performance of point cloud classification and segmentation networks and makes them more robust to attacks and geometric transformations. The code of this paper is available at: https://github.com/fun0515/PointManifoldCut.
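A minimal sketch of embedding-level point replacement, assuming both clouds have already been mapped to per-point hidden embeddings by the same network layer and carry point-wise labels (e.g. part labels). Because each replaced embedding brings its own label, every mixed point keeps a consistent label. The uniform random choice of points and the `ratio` parameter are illustrative assumptions; the paper compares several replacement methods and applies a spatial transform afterwards, which is omitted.

```python
import random

def manifold_point_cut(emb_a, labels_a, emb_b, labels_b, ratio, rng):
    """Replace a random subset of cloud A's point embeddings with points from cloud B.

    Each replaced position takes both the embedding and the point-wise label of the
    chosen B point, so labels stay attached to the representations they describe.
    """
    k = int(round(ratio * len(emb_a)))
    targets = rng.sample(range(len(emb_a)), k)
    sources = rng.sample(range(len(emb_b)), k)
    mixed_emb, mixed_labels = list(emb_a), list(labels_a)
    for t, s in zip(targets, sources):
        mixed_emb[t] = emb_b[s]
        mixed_labels[t] = labels_b[s]
    return mixed_emb, mixed_labels
```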
Wentao Xu, Weiqing Liu, Jiang Bian, Jian Yin, Tie-Yan Liu
Multivariate time series forecasting has attracted increasing attention because of its vital role in many real-world fields, such as finance, traffic, and weather. In recent years, many methods have been proposed for forecasting multivariate time series. Although some previous work considers the interdependencies among different variables at the same timestamp, existing work overlooks the inter-connections between different variables at different timestamps. In this paper, we propose a simple yet efficient instance-wise graph-based framework that exploits the inter-dependencies of different variables at different timestamps for multivariate time series forecasting. The key idea of our framework is to aggregate information from the historical time series of different variables into the current time series that we need to forecast. We conduct experiments on the Traffic, Electricity, and Exchange-Rate multivariate time series datasets. The results show that our proposed model outperforms the state-of-the-art baseline methods.
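A minimal sketch of aggregating across both variables and time, assuming every historical (variable, timestamp) pair has a feature vector and the series being forecast has a query vector. Scaled dot-product attention stands in for the paper's instance-wise graph weighting, which may differ; the variable names in the usage example are illustrative.

```python
import math

def aggregate_history(history, query, temperature=1.0):
    """Attend over every (variable, timestamp) pair, so information flows across both.

    history: dict (variable, timestamp) -> feature vector.
    query: feature vector of the series being forecast.
    Returns (aggregated vector, attention weight per key).
    """
    keys = list(history)
    scores = [sum(q * h for q, h in zip(query, history[k])) / temperature for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = {k: e / total for k, e in zip(keys, exps)}
    out = [0.0] * len(query)
    for k in keys:
        for i, value in enumerate(history[k]):
            out[i] += weights[k] * value
    return out, weights
```

For example, with history `{("traffic", 0): [1.0, 0.0], ("power", 1): [0.0, 1.0]}`, a zero query spreads weight evenly, while a query aligned with the first vector shifts weight toward `("traffic", 0)`.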
Alexander Bors, Qiang Wang
Let $K$ be a finite field of characteristic $p$. We study a certain class of
functions $K\rightarrow K$ that agree with an $\mathbb{F}_p$-affine function
$K\rightarrow K$ on each coset of a given additive subgroup $W$ of $K$; we
call them $W$-coset-wise $\mathbb{F}_p$-affine functions of $K$. We show that
these functions form a permutation group on $K$ with the structure of an
imprimitive wreath product and characterize which of them are complete mappings
of $K$. As a consequence, we provide various new examples of cycle
types of complete mappings of $K$, including that $K$ has a complete mapping
moving all elements of $K$ in one cycle if $p>2$.
Authors' comments: 29 pages
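The definitions can be checked by brute force on small cases. Both the $W$-coset-wise $\mathbb{F}_p$-affine property and the complete-mapping property (a permutation $f$ of $K$ such that $x\mapsto f(x)+x$ is also a permutation) depend only on the additive group of $K$, so the sketch below models $K$ as $\mathbb{F}_p^n$. It assumes, after a change of basis, that $W$ is spanned by the first $m$ coordinate vectors, so the last $n-m$ coordinates identify a coset; the helper names are hypothetical.

```python
from itertools import product

def mat_vec(A, x, p):
    """Multiply matrix A (tuple of rows) by vector x over F_p."""
    return tuple(sum(a * b for a, b in zip(row, x)) % p for row in A)

def coset_wise_affine(pieces, m, p):
    """f on F_p^n with W = span(e_1, ..., e_m), so x[m:] identifies the coset of x.

    pieces maps each coset key to (A, b); on that coset, f(x) = A x + b.
    """
    def f(x):
        A, b = pieces[x[m:]]
        return tuple((y + c) % p for y, c in zip(mat_vec(A, x, p), b))
    return f

def is_complete_mapping(f, n, p):
    """f is a permutation and x -> f(x) + x is a permutation as well."""
    space = list(product(range(p), repeat=n))
    images = {f(x) for x in space}
    sums = {tuple((a + b) % p for a, b in zip(f(x), x)) for x in space}
    return len(images) == len(space) == len(sums)

def cycle_type(f, n, p):
    """Sorted cycle lengths of f; only meaningful when f is a permutation."""
    seen, lengths = set(), []
    for x in product(range(p), repeat=n):
        if x in seen:
            continue
        length, y = 0, x
        while y not in seen:
            seen.add(y)
            y = f(y)
            length += 1
        lengths.append(length)
    return sorted(lengths)
```

With $p=3$, $n=2$, $m=1$, the map that is the identity on two cosets and the translation by $(1,0)\in W$ on the third is a complete mapping with six fixed points and one 3-cycle, whereas $x\mapsto 2x$ is not complete because $2x+x=3x=0$.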
Alperen Kantarcı, Hasan Dertli, Hazım Kemal Ekenel
Face anti-spoofing is essential to prevent attackers from passing facial
verification with a photo, video, mask, or other substitute for an authorized
person's face. Most state-of-the-art presentation attack detection (PAD) systems
suffer from overfitting, where they achieve near-perfect scores on a single
dataset but fail on a different dataset with more realistic data. This problem
drives researchers to develop models that perform well under real-world
conditions. This is an especially challenging problem for frame-based
presentation attack detection systems that use convolutional neural networks
(CNN). To this end, we propose a new PAD approach that combines pixel-wise
binary supervision with a patch-based CNN. We believe that training a CNN with
face patches allows the model to distinguish spoofs without learning background
or dataset-specific traces. We tested the proposed method both on standard
benchmark datasets (Replay-Mobile and OULU-NPU) and on a real-world dataset.
The proposed approach shows its superiority in challenging experimental setups.
Namely, it achieves higher performance on OULU-NPU protocols 3 and 4 and in
inter-dataset real-world experiments.
Authors' comments: Accepted to the 20th International Conference of the Biometrics Special
Interest Group (BIOSIG 2021) as an oral paper
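A minimal sketch of the training signal, assuming grayscale images stored as lists of rows and a CNN (not shown) that outputs a per-pixel liveness probability map for each patch. Pixel-wise binary supervision gives every pixel of a live patch the target 1 and every pixel of a spoof patch the target 0; averaging the predicted maps over patches is one possible image-level score. Patch size, patch count, and the scoring rule are illustrative assumptions.

```python
import math
import random

def random_patches(image, size, count, rng):
    """Crop `count` square patches of side `size` from a 2-D image (list of rows)."""
    h, w = len(image), len(image[0])
    patches = []
    for _ in range(count):
        top = rng.randrange(h - size + 1)
        left = rng.randrange(w - size + 1)
        patches.append([row[left:left + size] for row in image[top:top + size]])
    return patches

def pixel_wise_bce(pred_map, is_live, eps=1e-7):
    """Binary cross-entropy against a target map that is all 1 (live) or all 0 (spoof)."""
    target = 1.0 if is_live else 0.0
    losses = []
    for row in pred_map:
        for p in row:
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            losses.append(-(target * math.log(p) + (1.0 - target) * math.log(1.0 - p)))
    return sum(losses) / len(losses)

def liveness_score(pred_maps):
    """Mean predicted value over all pixels of all patches; higher means more likely live."""
    values = [p for m in pred_maps for row in m for p in row]
    return sum(values) / len(values)
```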
Chun Fan, Jiwei Li, Xiang Ao, Fei Wu, Yuxian Meng, Xiaofei Sun
The proposed pruning strategy offers two advantages over weight-based pruning
techniques: (1) it avoids irregular memory access, since representations and
matrices can be squeezed into smaller but dense counterparts, leading to
greater speedup; (2) as a top-down pruning method, it operates from a more
global perspective based on training signals in the top layer, and prunes each
layer by propagating the effect of global signals through the layers, leading
to better performance at the same sparsity level.
Extensive experiments show that at the same sparsity level, the proposed
strategy offers both greater speedup and higher performance than weight-based
pruning methods (e.g., magnitude pruning, movement pruning).
Authors' comments: To appear at EMNLP2021
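Advantage (1) can be illustrated with a small two-layer example: removing whole hidden units (rows of one weight matrix and the matching columns of the next) leaves smaller matrices that are still dense, yet the network computes the same output as masking those units in the full-size model. The sketch assumes list-of-rows matrices and a given `keep` mask; it does not implement the paper's top-down criterion for choosing which units to prune.

```python
def matvec(W, x):
    """Dense matrix-vector product for W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(x):
    return [max(v, 0.0) for v in x]

def squeeze(W1, W2, keep):
    """Drop pruned hidden units: rows of W1 and the matching columns of W2 (both stay dense)."""
    W1_small = [row for row, k in zip(W1, keep) if k]
    W2_small = [[w for w, k in zip(row, keep) if k] for row in W2]
    return W1_small, W2_small

def masked_forward(W1, W2, keep, x):
    """Reference: full-size computation with the pruned hidden units zeroed out."""
    h = [v if k else 0.0 for v, k in zip(relu(matvec(W1, x)), keep)]
    return matvec(W2, h)
```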
Kishan Wimalawarne, Taiji Suzuki
We investigate adaptive layer-wise graph convolution in deep GCN models. We propose AdaGPR to learn generalized PageRanks at each layer of a GCNII network to induce adaptive convolution. We show that the generalization bound for AdaGPR is bounded by a polynomial of the eigenvalue spectrum of the normalized adjacency matrix, whose order is the number of generalized PageRank coefficients. By analysing the generalization bounds, we show that oversmoothing depends both on convolutions with higher powers of the normalized adjacency matrix and on the depth of the model. We evaluated AdaGPR on node classification using real-world benchmark datasets and show that it provides improved accuracy compared to existing graph convolution networks while demonstrating robustness against oversmoothing. Further, we demonstrate that analysing the coefficients of the layer-wise generalized PageRanks allows us to qualitatively understand the convolution at each layer, enabling model interpretation.
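The generalized PageRank propagation that AdaGPR learns at each layer is commonly written as $\sum_{k=0}^{K}\gamma_k\hat{A}^k X$ with $\hat{A}=D^{-1/2}(A+I)D^{-1/2}$. The sketch assumes this symmetric normalization with self-loops and fixed coefficients `gammas`; in AdaGPR the coefficients are learned per layer inside a GCNII network, which is not reproduced here.

```python
import math

def normalized_adjacency(adj):
    """A_hat = D^{-1/2} (A + I) D^{-1/2} for a dense 0/1 adjacency matrix."""
    n = len(adj)
    a = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    d = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

def generalized_pagerank(adj, X, gammas):
    """Return sum_k gammas[k] * A_hat^k X, accumulating one power of A_hat at a time."""
    A_hat = normalized_adjacency(adj)
    term = [row[:] for row in X]                      # A_hat^0 X
    out = [[gammas[0] * v for v in row] for row in X]
    for g in gammas[1:]:
        term = matmul(A_hat, term)                    # next power of A_hat applied to X
        out = [[o + g * t for o, t in zip(orow, trow)] for orow, trow in zip(out, term)]
    return out
```

Large weights on high powers of $\hat{A}$ push node features toward each other, which is the oversmoothing effect the generalization analysis above relates to.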
Hualian Sheng, Sijia Cai, Yuan Liu, Bing Deng, Jianqiang Huang, Xian-Sheng Hua, Min-Jian Zhao
Though 3D object detection from point clouds has achieved rapid progress in
recent years, the lack of flexible and high-performance proposal refinement
remains a great hurdle for existing state-of-the-art two-stage detectors.
Previous works on refining 3D proposals have relied on human-designed
components such as keypoints sampling, set abstraction and multi-scale feature
fusion to produce powerful 3D object representations. Such methods, however,
have limited ability to capture rich contextual dependencies among points. In
this paper, we leverage the high-quality region proposal network and a
Channel-wise Transformer architecture to constitute our two-stage 3D object
detection framework (CT3D) with minimal hand-crafted design. The proposed CT3D
simultaneously performs proposal-aware embedding and channel-wise context
aggregation for the point features within each proposal. Specifically, CT3D
uses the proposal's keypoints for spatial contextual modelling and learns attention
propagation in the encoding module, mapping the proposal to point embeddings.
Next, a new channel-wise decoding module enriches the query-key interaction via
channel-wise re-weighting to effectively merge multi-level contexts, which
contributes to more accurate object predictions. Extensive experiments
demonstrate that our CT3D method has superior performance and excellent
scalability. Remarkably, CT3D achieves an AP of 81.77% in the moderate car
category on the KITTI test 3D detection benchmark, outperforming state-of-the-art
3D detectors.
Authors' comments: Accepted by ICCV2021
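The difference between standard and channel-wise re-weighting can be sketched as follows: standard attention computes one weight per key, whereas here each channel $c$ computes its own softmax over the keys from the per-channel score $q_c k_{j,c}$. This is one simple form of channel-wise re-weighting chosen for illustration; CT3D's decoder may derive its channel weights differently, and the function name is hypothetical.

```python
import math

def channel_wise_attention(query, keys, values):
    """Per-channel softmax over keys: channel c weights key j by q_c * k_{j,c} / sqrt(d)."""
    num_keys, dim = len(keys), len(query)
    out = []
    for c in range(dim):
        scores = [query[c] * keys[j][c] / math.sqrt(dim) for j in range(num_keys)]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w / total * values[j][c] for j, w in enumerate(weights)))
    return out
```

Because each channel has its own weights, one channel can focus on a single key while another averages over all of them, which a single scalar weight per key cannot express.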
Ansheng You, Chenglin Zhou, Qixuan Zhang, Lan Xu
Adaptive and flexible image editing is a desirable capability of modern generative models. In this work, we present a generative model with an auto-encoder architecture for per-region style manipulation. We apply a code consistency loss to enforce an explicit disentanglement between the content and style latent representations, making the content and style of generated samples consistent with their corresponding content and style references. The model is also constrained by a content alignment loss to ensure that foreground editing does not interfere with background content. As a result, given user-provided masks of regions of interest, our model supports foreground region-wise style transfer. Notably, our model requires no extra annotations such as semantic labels and relies only on self-supervision. Extensive experiments show the effectiveness of the proposed method and the flexibility of the proposed model in various applications, including region-wise style editing, latent space interpolation, and cross-domain style transfer.
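What the content alignment loss asks of the model can be made concrete with a hard composite: inside the user mask the output takes the stylized pixels, outside it the original content, so a background-only reconstruction error is zero. The paper enforces this softly through a learned loss rather than by compositing; the functions below, which assume single-channel images as lists of rows and a binary mask, are illustrative.

```python
def masked_composite(content, stylized, mask):
    """Stylized pixels inside the mask (1), original content elsewhere (0)."""
    return [
        [m * s + (1 - m) * c for c, s, m in zip(c_row, s_row, m_row)]
        for c_row, s_row, m_row in zip(content, stylized, mask)
    ]

def background_alignment_loss(content, generated, mask):
    """Mean squared difference over background pixels only (mask == 0)."""
    diffs = [
        (g - c) ** 2
        for c_row, g_row, m_row in zip(content, generated, mask)
        for c, g, m in zip(c_row, g_row, m_row)
        if m == 0
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0
```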
Heng Wang, Chaoyi Zhang, Jianhui Yu, Yang Song, Siqi Liu, Wojciech Chrzanowski, Weidong Cai
Automatic 3D neuron reconstruction is critical for analysing the morphology
and functionality of neurons in brain circuit activities. However, the
performance of existing tracing algorithms is hindered by low image quality.
Recently, a series of deep learning based segmentation methods have been
proposed to improve the quality of raw 3D optical image stacks by removing
noise and restoring neuronal structures from low-contrast backgrounds. Due to
the variety of neuron morphology and the lack of large neuron datasets, most
current neuron segmentation models rely on introducing complex and
specially-designed submodules to a base architecture with the aim of encoding
better feature representations. Though successful, these submodules add
computational burden during inference. Therefore, rather than modifying the base
network, we shift our focus to the dataset itself. The encoder-decoder backbone
used in most neuron segmentation models attends only to intra-volume voxel points
to learn structural features of neurons but neglects the shared intrinsic
semantic features of voxels belonging to the same category among different
volumes, which is also important for expressive representation learning. Hence,
to better utilise the scarce dataset, we propose to explicitly exploit such
intrinsic features of voxels through a novel voxel-level cross-volume
representation learning paradigm on the basis of an encoder-decoder
segmentation model. Our method introduces no extra cost during inference.
Evaluated on 42 3D neuron images from the BigNeuron project, our proposed method
is shown to improve the learning ability of the original segmentation model
and further enhance the reconstruction performance.
Authors' comments: 10 pages, 3 figures, 3 tables, accepted by MICCAI-MLMI 2021
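One common way to exploit shared class semantics across volumes is a supervised contrastive loss over voxel embeddings sampled from several volumes; the sketch below treats same-class voxels from other volumes as positives. It is a stand-in chosen for illustration, assuming nonzero embeddings, cosine similarity, and a temperature of 0.1, and is not necessarily the paper's exact objective. Since it is only a training loss, it adds nothing at inference time.

```python
import math

def cosine(a, b):
    """Cosine similarity of two nonzero vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def cross_volume_contrastive_loss(embeddings, labels, volume_ids, temperature=0.1):
    """Supervised contrastive loss; positives are same-class voxels from other volumes."""
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [
            j for j in range(n)
            if j != i and labels[j] == labels[i] and volume_ids[j] != volume_ids[i]
        ]
        if not positives:
            continue
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature for j in range(n) if j != i}
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```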
Chi Zhang, Xiaoning Ma, Yu Liu, Le Wang, Yuanqi Su, Yuehu Liu
Fundamental machine learning theory shows that different samples contribute
unequally to both the learning and testing processes. Contemporary studies on DNNs
suggest that such sample differences are rooted in the distribution of intrinsic
pattern information, namely sample regularity. Motivated by recent
discoveries on network memorization and generalization, we propose a pair of
sample regularity measures for both processes with a formulation-consistent
representation. Specifically, cumulative binary training/generalizing loss
(CBTL/CBGL), the cumulative number of correct classifications of the
training/testing sample during the training stage, is proposed to quantify the
stability of the memorization-generalization process, while
forgetting/mal-generalizing events, i.e., the misclassification of previously
learned or generalized samples, are used to represent the uncertainty of
sample regularity with respect to optimization dynamics. Experiments validated
the effectiveness and robustness of the proposed approaches for mini-batch SGD
optimization. Further applications to training/testing sample selection show
that the proposed measures, which share a unified computing procedure, can benefit
both tasks.
Authors' comments: 20 pages, 13 figures, 3 tables
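Both kinds of measure can be read off a per-epoch record of whether a sample was classified correctly. The sketch assumes such a boolean history is logged once per epoch (for a training sample this gives the CBTL count and forgetting events; applied to a test sample's history, the same functions give the CBGL count and mal-generalizing events). Counting correct-to-incorrect transitions follows the usual convention for forgetting events, and the function names are hypothetical.

```python
def cumulative_correct(history):
    """Number of epochs in which the sample was classified correctly."""
    return sum(1 for correct in history if correct)

def forgetting_events(history):
    """Transitions from correct at one epoch to incorrect at the next."""
    return sum(1 for prev, cur in zip(history, history[1:]) if prev and not cur)
```

For the history `[False, True, True, False, True, False]`, the cumulative count is 3 and there are 2 forgetting events.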