Kento Hasegawa, Kazuki Yamashita, Seira Hidano, Kazuhide Fukushima, Kazuo Hashimoto, Nozomu Togawa
In the fourth industrial revolution, securing the supply chain has become an ever-growing concern. One cyber threat to the supply chain is a hardware Trojan (HT), a malicious modification to an IC. HTs are often identified in the hardware manufacturing process, but should be removed earlier, when the design is being specified. Machine learning-based HT detection in gate-level netlists is an efficient approach to identifying HTs at this early stage. However, feature-based modeling has limitations in discovering an appropriate set of HT features. In this paper, we thus propose NHTD-GL, a novel node-wise HT detection method based on graph learning (GL). Given the formal analysis of HT features obtained from domain knowledge, NHTD-GL bridges the gap between graph representation learning and feature-based HT detection. The experimental results demonstrate that NHTD-GL achieves 0.998 detection accuracy and outperforms state-of-the-art node-wise HT detection methods. NHTD-GL extracts HT features without heuristic feature engineering.
Hyunmin Lee, Jaesik Park
In this paper, we introduce a new dataset, named InstaOrder, that can be used
to understand the geometrical relationships of instances in an image. The
dataset consists of 2.9M annotations of geometric orderings for class-labeled
instances in 101K natural scenes. The scenes were annotated by 3,659
crowd-workers regarding (1) occlusion order that identifies occluder/occludee
and (2) depth order that describes ordinal relations that consider relative
distance from the camera. The dataset provides joint annotation of two kinds of
orderings for the same instances, and we discover that the occlusion order and
depth order are complementary. We also introduce a geometric order prediction
network called InstaOrderNet, which is superior to state-of-the-art approaches.
Moreover, we propose a dense depth prediction network called InstaDepthNet that
uses auxiliary geometric order loss to boost the accuracy of the
state-of-the-art depth prediction approach, MiDaS [56].
Authors' comments: Accepted to CVPR 2022. Code is available at
https://github.com/POSTECH-CVLab/InstaOrder
Barnabás Janzer
A family $\mathcal{F}$ of subsets of $\{1,\dots,n\}$ is called $k$-wise
intersecting if any $k$ members of $\mathcal{F}$ have non-empty intersection,
and it is called maximal $k$-wise intersecting if no family strictly containing
$\mathcal{F}$ satisfies this condition. We show that for each $k\geq 2$ there
is a maximal $k$-wise intersecting family of size $O(2^{n/(k-1)})$. Up to a
constant factor, this matches the best known lower bound, and answers an old
question of Erdős and Kleitman, recently studied by Hendrey, Lund,
Tompkins, and Tran.
Authors' comments: 4 pages; added a new section about the non-existence of certain types
of constructions
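The two definitions in the abstract above ($k$-wise intersecting and maximal $k$-wise intersecting) can be checked by brute force for small $n$. The following is a minimal sketch, not the paper's construction; the function names are illustrative. Maximality is tested by trying to add each missing subset of $\{1,\dots,n\}$ one at a time, which suffices because any subfamily of a $k$-wise intersecting family is itself $k$-wise intersecting.

```python
from itertools import combinations


def is_k_wise_intersecting(family, k):
    """True if every choice of k distinct members has a common element."""
    sets = [frozenset(s) for s in family]
    return all(frozenset.intersection(*combo) for combo in combinations(sets, k))


def is_maximal_k_wise_intersecting(family, k, n):
    """Brute-force maximality check over all subsets of {1, ..., n}."""
    sets = {frozenset(s) for s in family}
    if not is_k_wise_intersecting(sets, k):
        return False
    ground = range(1, n + 1)
    for size in range(n + 1):
        for cand in combinations(ground, size):
            c = frozenset(cand)
            # If some missing set can be added, the family is not maximal.
            if c not in sets and is_k_wise_intersecting(sets | {c}, k):
                return False
    return True
```

For example, with $n = 3$ and $k = 2$, the family of all subsets containing the element 1 passes the maximality check, while the pair $\{1,2\}, \{1,3\}$ does not, since $\{1\}$ can still be added. The check enumerates all $2^n$ subsets, so it is only practical for very small $n$.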
Marco Colussi, Stavros Ntalampiras
After constructing a deep neural network for urban sound classification, this work focuses on the sensitive application of assisting drivers suffering from hearing loss. As such, a clear etiology that justifies and interprets model predictions is a strong requirement. To this end, we use two different representations of audio signals, i.e., Mel and constant-Q spectrograms, while the decisions made by the deep neural network are explained via layer-wise relevance propagation. At the same time, frequency content assigned high relevance in both feature sets indicates extremely discriminative information characterizing the present classification task. Overall, we present an explainable AI framework for understanding deep urban sound classification.
Yuanhao Cai, Jing Lin, Xiaowan Hu, Haoqian Wang, Xin Yuan, Yulun Zhang, Radu Timofte, Luc Van Gool
Hyperspectral image (HSI) reconstruction aims to recover the 3D
spatial-spectral signal from a 2D measurement in the coded aperture snapshot
spectral imaging (CASSI) system. The HSI representations are highly similar and
correlated across the spectral dimension. Modeling the inter-spectra
interactions is beneficial for HSI reconstruction. However, existing CNN-based
methods show limitations in capturing spectral-wise similarity and long-range
dependencies. Besides, the HSI information is modulated by a coded aperture
(physical mask) in CASSI. Nonetheless, current algorithms have not fully
explored the guidance effect of the mask for HSI restoration. In this paper, we
propose a novel framework, Mask-guided Spectral-wise Transformer (MST), for HSI
reconstruction. Specifically, we present a Spectral-wise Multi-head
Self-Attention (S-MSA) that treats each spectral feature as a token and
calculates self-attention along the spectral dimension. In addition, we
customize a Mask-guided Mechanism (MM) that directs S-MSA to pay attention to
spatial regions with high-fidelity spectral representations. Extensive
experiments show that our MST significantly outperforms state-of-the-art (SOTA)
methods on simulation and real HSI datasets while requiring dramatically
lower computational and memory costs. Code and pre-trained models are
available at https://github.com/caiyuanhao1998/MST/
Authors' comments: CVPR 2022; The first Transformer-based method for snapshot
compressive imaging
Feng Liu, Zhe Kong, Haozhe Liu, Wentian Zhang, Linlin Shen
Due to the diversity of attack materials, automated fingerprint recognition
systems (AFRSs) are vulnerable to malicious attacks. It is thus important to
propose effective fingerprint presentation attack detection (PAD) methods for
the safety and reliability of AFRSs. However, current PAD methods often exhibit
poor robustness under new attack type settings. This paper thus proposes a
novel channel-wise feature denoising fingerprint PAD (CFD-PAD) method by
handling the redundant noise information ignored in previous studies. The
proposed method learns important features of fingerprint images by weighing the
importance of each channel and identifying discriminative channels and "noise"
channels. Then, the propagation of "noise" channels is suppressed in the
feature map to reduce interference. Specifically, a PA-Adaptation loss is
designed to constrain the feature distribution so that the features of live
fingerprints are more tightly aggregated and those of spoof fingerprints more
dispersed. Experimental results evaluated on the LivDet 2017 dataset showed that
the proposed CFD-PAD can achieve a 2.53% average classification error (ACE) and
a 93.83% true detection rate when the false detection rate equals 1.0%
(TDR@FDR=1%). Also, the proposed method markedly outperforms the best
single-model-based methods in terms of ACE (2.53% vs. 4.56%) and
TDR@FDR=1% (93.83% vs. 73.32%), which demonstrates its effectiveness. Its
results are comparable with those of the state-of-the-art
multiple-model-based methods, yet it still increases TDR@FDR=1% from
91.19% to 93.83%. In addition, the proposed model is simpler, lighter and more
efficient and has achieved a 74.76% reduction in computation time compared with
the state-of-the-art multiple-model-based method. The source code is available
at https://github.com/kongzhecn/cfd-pad.
Authors' comments: 15 pages, 8 figures, Accepted by TIFS
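The two metrics quoted in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: it assumes each sample has a spoof score where higher means "more likely an attack", that ACE is the mean of the live-sample and spoof-sample error rates at a given threshold, and that TDR@FDR is the best spoof detection rate among thresholds that flag at most the allowed fraction of live samples. Function names and the scoring convention are hypothetical.

```python
def average_classification_error(live_scores, spoof_scores, threshold):
    """Mean of the two class-wise error rates; score >= threshold means 'spoof'."""
    live_err = sum(s >= threshold for s in live_scores) / len(live_scores)
    spoof_err = sum(s < threshold for s in spoof_scores) / len(spoof_scores)
    return (live_err + spoof_err) / 2


def tdr_at_fdr(live_scores, spoof_scores, max_fdr=0.01):
    """Best true detection rate over thresholds whose false detection rate <= max_fdr."""
    best = 0.0
    # Candidate thresholds: every observed score, plus one that flags nothing.
    candidates = sorted(set(live_scores) | set(spoof_scores)) + [float("inf")]
    for t in candidates:
        fdr = sum(s >= t for s in live_scores) / len(live_scores)
        if fdr <= max_fdr:
            tdr = sum(s >= t for s in spoof_scores) / len(spoof_scores)
            best = max(best, tdr)
    return best
```

Reporting the detection rate at a fixed, low false detection rate complements ACE: ACE averages both error types at one operating point, while TDR@FDR=1% describes behavior when very few live users may be rejected.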
Matteo Guarrera, Baihong Jin, Tung-Wei Lin, Maria Zuluaga, Yuxin Chen, Alberto Sangiovanni-Vincentelli
We consider the problem of detecting out-of-distribution (OoD) input data when
using deep neural networks, and we propose a simple yet effective way to
improve the robustness of several popular OoD detection methods against label
shift. Our work is motivated by the observation that most existing OoD
detection algorithms consider all training/test data as a whole, regardless of
which class entry each input activates (inter-class differences). Through
extensive experimentation, we have found that such practice leads to a detector
whose performance is sensitive and vulnerable to label shift. To address this
issue, we propose a class-wise thresholding scheme that can apply to most
existing OoD detection algorithms and can maintain similar OoD detection
performance even in the presence of label shift in the test distribution.
Authors' comments: 12 pages, 7 figures, 7 tables
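The class-wise thresholding idea in the abstract above can be illustrated with a small sketch. The abstract does not say how the per-class thresholds are chosen; the version below assumes each threshold is set so that a fixed fraction of in-distribution validation samples predicted as that class is kept. The function names, the `keep_fraction` parameter, and the "higher score means more in-distribution" convention are all assumptions.

```python
from collections import defaultdict


def fit_classwise_thresholds(pred_classes, scores, keep_fraction=0.95):
    """One threshold per predicted class, from in-distribution validation data."""
    by_class = defaultdict(list)
    for c, s in zip(pred_classes, scores):
        by_class[c].append(s)
    thresholds = {}
    for c, vals in by_class.items():
        vals.sort()
        # Number of lowest-scoring validation samples allowed below the threshold.
        drop = int(len(vals) * (1 - keep_fraction))
        thresholds[c] = vals[drop]
    return thresholds


def is_ood(pred_class, score, thresholds):
    """Flag an input as OoD if its score is below its own class's threshold."""
    return score < thresholds[pred_class]
```

Because every class carries its own threshold, the fraction of in-distribution samples accepted within each class does not depend on how often that class appears at test time, which is the intuition behind robustness to label shift. A single global threshold, by contrast, is dominated by whichever classes are most frequent.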
Wangyou Zhang, Zhuo Chen, Naoyuki Kanda, Shujie Liu, Jinyu Li, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao et al.
Multi-talker conversational speech processing has drawn much interest for
various applications such as meeting transcription. Speech separation is often
required to handle overlapped speech that is commonly observed in conversation.
Although the original utterance-level permutation invariant training-based
continuous speech separation approach has proven to be effective in various
conditions, it lacks the ability to leverage the long-span relationship of
utterances and is computationally inefficient due to the highly overlapped
sliding windows. To overcome these drawbacks, we propose a novel training
scheme named Group-PIT, which allows direct training of the speech separation
models on the long-form speech with a low computational cost for label
assignment. Two different speech separation approaches with Group-PIT are
explored, including direct long-span speech separation and short-span speech
separation with long-span tracking. The experiments on the simulated
meeting-style data demonstrate the effectiveness of our proposed approaches,
especially in dealing with a very long speech input.
Authors' comments: 5 pages, 3 figures, 3 tables, submitted to IEEE ICASSP 2022
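For context, the label assignment step of the baseline utterance-level permutation invariant training mentioned in the abstract above can be sketched as below. This shows the baseline only, not Group-PIT, whose details the abstract does not give. The mean squared error loss and the function names are assumptions for illustration.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_assignment(estimates, references):
    """Return (best_loss, best_perm); best_perm[i] is the reference matched to estimate i."""
    n = len(estimates)
    best_loss, best_perm = float("inf"), None
    # Try every pairing of output channels with reference speakers.
    for perm in permutations(range(n)):
        loss = sum(mse(estimates[i], references[perm[i]]) for i in range(n)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

The exhaustive search visits all `n!` permutations for every training segment, which gives a sense of why a low-cost label assignment matters when training on long-form speech.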
Chang Song, Riya Ranjan, Hai Li
Neural networks achieve better accuracy at the price of higher energy and
computational cost. Quantization can greatly reduce this cost, and the
quantized models are more hardware friendly with acceptable accuracy loss. On
the other hand, recent research has found that neural networks are vulnerable
to adversarial attacks, and the robustness of a neural network model can only
be improved with defense methods, such as adversarial training. In this work,
we find that adversarially-trained neural networks are more vulnerable to
quantization loss than plain models. To minimize both the adversarial and the
quantization losses simultaneously and to make the quantized model robust, we
propose a layer-wise adversarial-aware quantization method, using the Lipschitz
constant to choose the best quantization parameter settings for a neural
network. We theoretically derive the losses and prove the consistency of our
metric selection. The experiment results show that our method can effectively
and efficiently improve the robustness of quantized adversarially-trained
neural networks.
Authors' comments: arXiv admin note: substantial text overlap with arXiv:2012.14965
Nathan Myhrvold, Pavlo Pinchuk, Jean-Luc Margot
We analyzed 82,548 carefully curated observations of 4420 asteroids with
Wide-field Infrared Survey Explorer (WISE) four-band data to produce estimates
of diameters and infrared emissivities. We also used these diameter values in
conjunction with absolute visual magnitudes to infer estimates of visible-band
geometric albedos. We provide solutions to 131 asteroids not analyzed by the
NEOWISE team and to 1778 asteroids not analyzed with four-band data by the
NEOWISE team. Our process differs from the NEOWISE analysis in that it uses an
accurate solar flux, integrates the flux with actual bandpass responses, obeys
Kirchhoff's law, and does not force emissivity values in all four bands to an
arbitrary value of 0.9. We used a regularized model fitting algorithm that
yields improved fits to the data. Our results more closely match stellar
occultation diameter estimates than the NEOWISE results by a factor of ~2.
Using 24 high-quality stellar occultation results as a benchmark, we found that
the median error of four-infrared-band diameter estimates in a carefully
curated data set is 9.3%. Our results also suggest the presence of a
size-dependent bias in the NEOWISE diameter estimates, which may pollute
estimates of asteroid size distributions and slightly inflate impact hazard
risk calculations. For more than 90% of asteroids in this sample, the primary
source of error on the albedo estimate is the error on absolute visual
magnitude.
Authors' comments: 30 pages, 23 figures, Planetary Science Journal, in press
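The step from diameter and absolute visual magnitude to geometric albedo can be illustrated with the standard relation $D = (1329\ \mathrm{km}/\sqrt{p_V})\,10^{-H/5}$. The abstract above does not state which formula the authors used, so this is a sketch under that assumption; the function names are illustrative, and the first-order error propagation assumes independent errors in $D$ and $H$.

```python
import math


def geometric_albedo(diameter_km, abs_magnitude_h):
    """Visible geometric albedo p_V from D (km) and H, via D = 1329 / sqrt(p_V) * 10**(-H/5)."""
    return (1329.0 / diameter_km) ** 2 * 10 ** (-0.4 * abs_magnitude_h)


def albedo_relative_error(sigma_d_over_d, sigma_h):
    """First-order relative error on p_V from independent errors in D and H."""
    from_d = 2 * sigma_d_over_d              # p_V scales as D**-2
    from_h = 0.4 * math.log(10) * sigma_h    # p_V scales as 10**(-0.4 H)
    return math.hypot(from_d, from_h)
```

Under this relation, a 9.3% diameter error contributes about 19% to the relative albedo error, while a hypothetical 0.3 mag uncertainty in $H$ would contribute about 28%. The 0.3 mag figure is an assumed value chosen only to show how the magnitude term can dominate.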
Yu Wang, Charu Aggarwal, Tyler Derr
Recent years have witnessed the significant success of applying graph neural networks (GNNs) in learning effective node representations for classification. However, current GNNs are mostly developed under balanced data splits, which is inconsistent with many real-world networks where the number of training nodes can be extremely imbalanced among the classes. Thus, directly applying current GNNs to imbalanced data would generate coarse representations of nodes in minority classes and ultimately compromise the classification performance. This highlights the importance of developing effective GNNs for handling imbalanced graph data. In this work, we propose a novel Distance-wise Prototypical Graph Neural Network (DPGNN), which uses class prototype-driven training to balance the training loss between majority and minority classes and then leverages distance metric learning to differentiate the contributions of different dimensions of representations and fully encode the relative position of each node to each class prototype. Moreover, we design a new imbalanced label propagation mechanism to derive extra supervision from unlabeled nodes and employ self-supervised learning to smooth representations of adjacent nodes while separating inter-class prototypes. Comprehensive node classification experiments and parameter analysis on multiple networks are conducted, and the proposed DPGNN almost always significantly outperforms all other baselines, which demonstrates its effectiveness in imbalanced node classification. The implementation of DPGNN is available at https://github.com/YuWVandy/DPGNN.
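The notion of class prototypes and node-to-prototype distances in the abstract above can be sketched as follows. This is a simplified illustration, not the DPGNN implementation: it assumes a prototype is the mean embedding of a class's labeled nodes and that the relative position of a node is its Euclidean distance to each prototype. The function names are hypothetical.

```python
import math


def class_prototypes(embeddings, labels):
    """Mean embedding of the labeled nodes in each class."""
    sums, counts = {}, {}
    for vec, y in zip(embeddings, labels):
        acc = sums.setdefault(y, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def distance_features(vec, prototypes):
    """Euclidean distance from one node embedding to every class prototype."""
    return {y: math.dist(vec, p) for y, p in prototypes.items()}
```

Each class contributes exactly one prototype regardless of how many labeled nodes it has, so a minority class is represented on equal footing with a majority class when distances are compared.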
Guiyun Xiao, Zheng-Jian Bai, Wai-Ki Ching
Nonnegative matrix factorization arises widely in machine learning and data
analysis. In this paper, for a given factorization of rank r, we consider the
sparse stochastic matrix factorization (SSMF) of decomposing a prescribed
m-by-n stochastic matrix V into a product of an m-by-r stochastic matrix W and
an r-by-n stochastic matrix H, where both W and H are required to be sparse.
With the prescribed sparsity level, we reformulate the SSMF as an unconstrained
nonconvex-nonsmooth minimization problem and introduce a column-wise update
algorithm for solving the minimization problem. We show that our algorithm
converges globally. The main advantage of our algorithm is that the generated
sequence converges to a special critical point of the cost function, which is
nearly a global minimizer over each column vector of the W-factor and is a
global minimizer over the H-factor as a whole if there is no sparsity
requirement on H. Numerical experiments on both synthetic and real data sets
are given to demonstrate the effectiveness of our proposed algorithm.
Authors' comments: 28 pages, 8 figures
Sunwoo Lee, Tuo Zhang, Chaoyang He, Salman Avestimehr
In Federated Learning, a common approach for aggregating local models across clients is periodic averaging of the full model parameters. It is, however, known that different layers of neural networks can have different degrees of model discrepancy across the clients. The conventional full aggregation scheme does not consider such a difference and synchronizes all model parameters at once, resulting in inefficient network bandwidth consumption. Aggregating parameters that are already similar across the clients increases the communication cost without making meaningful training progress. We propose FedLAMA, a layer-wise model aggregation scheme for scalable Federated Learning. FedLAMA adaptively adjusts the aggregation interval in a layer-wise manner, jointly considering the model discrepancy and the communication cost. The layer-wise aggregation method enables fine control of the aggregation interval, relaxing the aggregation frequency without a significant impact on the model accuracy. Our empirical study shows that FedLAMA reduces the communication cost by up to 60% for IID data and 70% for non-IID data while achieving accuracy comparable to FedAvg.
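The difference between full aggregation and layer-wise aggregation intervals can be sketched as below. This is a minimal illustration rather than FedLAMA itself: the abstract does not give the rule that adapts the intervals, so they are supplied by hand here. Models are plain dictionaries mapping a layer name to a flat list of parameters, and all names are hypothetical.

```python
def aggregate_layerwise(client_models, intervals, step):
    """Average, in place, only the layers whose interval divides `step`.

    Returns the number of parameters synchronized, as a proxy for
    communication cost.
    """
    synced = 0
    for layer, interval in intervals.items():
        if step % interval != 0:
            continue  # this layer is not due for aggregation yet
        size = len(client_models[0][layer])
        mean = [
            sum(m[layer][i] for m in client_models) / len(client_models)
            for i in range(size)
        ]
        for m in client_models:
            m[layer] = list(mean)
        synced += size
    return synced
```

Setting every interval to 1 recovers periodic full averaging. Giving a longer interval to layers whose parameters barely differ across clients skips their synchronization on most steps, which is where the bandwidth saving comes from.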
HyoJung Han, Seokchan Ahn, Yoonjung Choi, Insoo Chung, Sangha Kim, Kyunghyun Cho
Recent simultaneous machine translation models are often trained on
conventional full-sentence translation corpora, leading to either excessive
latency or the need to anticipate as-yet-unarrived words when dealing with a
language pair whose word orders significantly differ. This is unlike human
simultaneous interpreters who produce largely monotonic translations at the
expense of the grammaticality of a sentence being translated. In this paper, we
thus propose an algorithm to reorder and refine the target side of a full
sentence translation corpus, so that the words/phrases between the source and
target sentences are aligned largely monotonically, using word alignment and
non-autoregressive neural machine translation. We then train a widely used
wait-k simultaneous translation model on this reordered-and-refined corpus. The
proposed approach improves BLEU scores and resulting translations exhibit
enhanced monotonicity with source sentences.
Authors' comments: To be published in WMT2021
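The wait-k policy mentioned in the abstract above reads k source tokens before writing the first target token and then alternates one read with one write. The sketch below computes that schedule and counts how many target words would have to be produced before their aligned source word has arrived. The one-to-one alignment format (`alignment[t-1]` is the 1-indexed source position aligned to target position t) is an assumption made for illustration.

```python
def wait_k_schedule(k, source_len, target_len):
    """Source tokens read before writing target token t, for t = 1..target_len."""
    return [min(k + t - 1, source_len) for t in range(1, target_len + 1)]


def count_anticipations(k, source_len, alignment):
    """Target positions whose aligned source word has not been read yet."""
    schedule = wait_k_schedule(k, source_len, len(alignment))
    return sum(src_pos > read for src_pos, read in zip(alignment, schedule))
```

With `k = 1` and four source words, a target sentence whose first word aligns to the last source word (`[4, 1, 2, 3]`) forces one anticipation, while a monotonic alignment (`[1, 2, 3, 4]`) forces none. This is the motivation for reordering the target side of the training corpus toward monotonic alignment.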
Enyan Dai, Shijie Zhou, Zhimeng Guo, Suhang Wang
Graph Neural Networks (GNNs) have achieved remarkable performance in modeling graphs for various applications. However, most existing GNNs assume the graphs exhibit strong homophily in node labels, i.e., nodes with similar labels are connected in the graphs. They fail to generalize to heterophilic graphs where linked nodes may have dissimilar labels and attributes. Therefore, in this paper, we investigate a novel framework that performs well on graphs with either homophily or heterophily. More specifically, we propose a label-wise message passing mechanism to avoid the negative effects caused by aggregating dissimilar node representations and preserve the heterophilic contexts for representation learning. We further propose a bi-level optimization method to automatically select the model for graphs with homophily/heterophily. Theoretical analysis and extensive experiments demonstrate the effectiveness of our proposed framework for node classification on both homophilic and heterophilic graphs.
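The idea of label-wise message passing can be sketched in a few lines. This is an illustrative simplification, not the authors' model: it assumes every node already has a label or a predicted pseudo-label, uses mean aggregation, and concatenates one aggregated vector per label. The function name and data layout are hypothetical.

```python
def label_wise_aggregate(features, neighbors, labels, num_classes):
    """For each node, average neighbor features separately per neighbor label.

    Returns one vector per node: the per-label means concatenated in label
    order, with zeros where a node has no neighbor carrying that label.
    """
    dim = len(features[0])
    out = []
    for node in range(len(features)):
        per_label = []
        for c in range(num_classes):
            group = [features[j] for j in neighbors[node] if labels[j] == c]
            if group:
                per_label += [sum(f[d] for f in group) / len(group) for d in range(dim)]
            else:
                per_label += [0.0] * dim
        out.append(per_label)
    return out
```

Keeping one slot per label means that features of dissimilar neighbors are never averaged together, so the aggregated vector still records which labels surround a node. That is the heterophilic context a single pooled mean would blur.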
Chenyang Huang, Hao Zhou, Osmar R. Zaïane, Lili Mou, Lei Li
How do we perform efficient inference while retaining high translation quality? Existing neural machine translation models, such as Transformer, achieve high performance, but they decode words one by one, which is inefficient. Recent non-autoregressive translation models speed up the inference, but their quality is still inferior. In this work, we propose DSLP, a highly efficient and high-performance model for machine translation. The key insight is to train a non-autoregressive Transformer with Deep Supervision and feed additional Layer-wise Predictions. We conducted extensive experiments on four translation tasks (both directions of WMT'14 EN-DE and WMT'16 EN-RO). Results show that our approach consistently improves the BLEU scores compared with respective base models. Specifically, our best variant outperforms the autoregressive model on three translation tasks, while being 14.8 times more efficient in inference.
Hao Su, Jianwei Niu, Xuefeng Liu, Jiahe Cui, Ji Wan
Manga is a fashionable Japanese-style comic form that is composed of
black-and-white strokes and is generally displayed as raster images on digital
devices. Typical mangas have simple textures, wide lines, and few color
gradients, properties that make them well suited to vectorization and to the
merits of vector graphics, e.g., adaptive resolutions and small file sizes. In this paper, we
propose MARVEL (MAnga's Raster to VEctor Learning), a primitive-wise approach
for vectorizing raster mangas by Deep Reinforcement Learning (DRL). Unlike
previous learning-based methods which predict vector parameters for an entire
image, MARVEL introduces a new perspective that regards an entire manga as a
collection of basic primitives (stroke lines) and designs a DRL model
to decompose the target image into a primitive sequence for achieving accurate
vectorization. To improve vectorization accuracies and decrease file sizes, we
further propose a stroke accuracy reward to predict accurate stroke lines, and
a pruning mechanism to avoid generating erroneous and repeated strokes.
Extensive subjective and objective experiments show that our MARVEL can
generate impressive results and reaches the state-of-the-art level. Our code is
open-source at: https://github.com/SwordHolderSH/Mang2Vec.
Authors' comments: The name of the previous version paper was: Mang2Vec: Vectorization
of raster manga by deep reinforcement learning
Jiehua Zhang, Zhuo Su, Yanghe Feng, Xin Lu, Matti Pietikäinen, Li Liu
Binary neural networks (BNNs) constrain weights and activations to +1 or -1
with limited storage and computational cost, which is hardware-friendly for
portable devices. Recently, BNNs have achieved remarkable progress and been
adopted into various fields. However, the performance of BNNs is sensitive to
activation distribution. Existing BNNs use the Sign function with
predefined or learned static thresholds to binarize activations. This process
limits the representation capacity of BNNs, since different samples may be
suited to different thresholds. To address this problem, we propose a dynamic BNN (DyBNN)
incorporating dynamic learnable channel-wise thresholds of Sign function and
shift parameters of PReLU. The method aggregates the global information into
the hyper function and effectively increases the feature expression ability.
The experimental results prove that our method is an effective and
straightforward way to reduce information loss and enhance performance of BNNs.
The DyBNN based on two backbones of ReActNet (MobileNetV1 and ResNet18) achieves
71.2% and 67.4% top-1 accuracy on the ImageNet dataset, outperforming baselines by a
large margin (i.e., 1.8% and 1.5%, respectively).
Authors' comments: 5 pages, 3 figures
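The contrast between a static threshold and a sample-dependent, channel-wise threshold for the Sign function can be illustrated as below. This is a toy sketch, not DyBNN: the abstract does not specify the hyper function, so the per-channel mean of the sample is used here as a stand-in for it. All names are hypothetical.

```python
def binarize_channelwise(activations, thresholds):
    """Sign with one threshold per channel; activations[c] holds channel c's values."""
    return [
        [1 if v >= t else -1 for v in chan]
        for chan, t in zip(activations, thresholds)
    ]


def channel_means(activations):
    """Stand-in for a hyper function: thresholds computed from the sample itself."""
    return [sum(chan) / len(chan) for chan in activations]
```

For a sample with channels `[0.2, 0.8]` and `[5.0, 7.0]`, a static threshold of 0 maps every value to +1, so all variation within each channel is lost. Thresholds derived from the sample give `[-1, 1]` for both channels and preserve that variation.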
Tianfang Zhu, Yue Guan, Anan Li
Mixed-based point cloud augmentation is a popular solution to the problem of limited availability of large-scale public datasets. However, the mismatch between mixed points and their corresponding semantic labels hinders further application to point-wise tasks such as part segmentation. This paper proposes a point cloud augmentation approach, PointManifoldCut (PMC), which replaces the points embedded by the neural network rather than their Euclidean space coordinates. This approach exploits the fact that points at the higher levels of the neural network are already trained to embed their neighbors' relations, so mixing these representations will not confuse the relation between a point and its label. We set up a spatial transform module after the PointManifoldCut operation to align the new instances in the embedded space. The effects of different hidden layers and methods of replacing points are also discussed in this paper. The experiments show that our proposed approach can enhance the performance of point cloud classification as well as segmentation networks, and gives them additional robustness to attacks and geometric transformations. The code of this paper is available at: https://github.com/fun0515/PointManifoldCut.
Wentao Xu, Weiqing Liu, Jiang Bian, Jian Yin, Tie-Yan Liu
Multivariate time series forecasting has attracted more and more attention because of its vital role in different real-world fields, such as finance, traffic, and weather. In recent years, many research efforts have been devoted to forecasting multivariate time series. Although some previous work considers the interdependencies among different variables at the same time stamp, existing work overlooks the interconnections between different variables at different time stamps. In this paper, we propose a simple yet efficient instance-wise graph-based framework to utilize the interdependencies of different variables at different time stamps for multivariate time series forecasting. The key idea of our framework is aggregating information from the historical time series of different variables to the current time series that we need to forecast. We conduct experiments on the Traffic, Electricity, and Exchange-Rate multivariate time series datasets. The results show that our proposed model outperforms the state-of-the-art baseline methods.