Chen Tang, Wenyu Sun, Zhuqing Yuan, Yongpan Liu
To accelerate deep CNN models, this paper proposes a novel spatially adaptive framework that dynamically generates pixel-wise sparsity according to the input image. The sparse scheme is pixel-wise refined and regionally adaptive under a unified importance map, which makes it friendly to hardware implementation. A sparsity control method is further presented to enable online adjustment for applications with different precision/latency requirements. The sparse model is applicable to a wide range of vision tasks. Experimental results show that this method improves computing efficiency for both image classification using ResNet-18 and super resolution using SRResNet. On the image classification task, our method saves 30%-70% of MACs with a slight drop in top-1 and top-5 accuracy. On the super resolution task, our method reduces MACs by more than 90% while causing decreases of only around 0.1 dB in PSNR and 0.01 in SSIM. Hardware validation is also included.
Hao Wang, Jia Zhang, Yingce Xia, Jiang Bian, Chao Zhang, Tie-Yan Liu
Semantic code search, which aims to retrieve code snippets relevant to a given natural language query, has attracted many research efforts with the purpose of accelerating software development. The huge amount of publicly available online code repositories has prompted the employment of deep learning techniques to build state-of-the-art code search models. In particular, these models leverage deep neural networks to embed code and queries into a unified semantic vector space and then use the similarity between the code and query vectors to approximate the semantic correlation between the code and the query. However, most existing studies overlook the code's intrinsic structural logic, which contains a wealth of semantic information, and thus fail to capture intrinsic features of code. In this paper, we propose a new deep learning architecture, COSEA, which leverages convolutional neural networks with layer-wise attention to capture the code's valuable intrinsic structural logic. To further increase the learning efficiency of COSEA, we propose a variant of contrastive loss for training the code search model, where the ground-truth code should be distinguished from the most similar negative sample. We have implemented a prototype of COSEA. Extensive experiments over existing public datasets of Python and SQL have demonstrated that COSEA can achieve significant improvements over state-of-the-art methods on code search tasks.
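The contrastive loss variant described above, in which the ground-truth code is contrasted with the most similar negative sample, can be illustrated with a minimal sketch. It assumes cosine similarity and a hinge formulation; the function names and the `margin` value are hypothetical and are not taken from COSEA.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hardest_negative_loss(query, positive, negatives, margin=0.2):
    """Hinge loss against the single most similar negative code vector.

    `margin` is an illustrative hyperparameter, not a value from the paper.
    """
    pos_sim = cosine(query, positive)
    # Only the negative closest to the query contributes to the loss.
    hardest_sim = max(cosine(query, neg) for neg in negatives)
    return max(0.0, margin - pos_sim + hardest_sim)
```

The loss is zero once the ground-truth code is more similar to the query than the hardest negative by at least the margin, so easy negatives do not contribute to it.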
Fan Mo, Anastasia Borovykh, Mohammad Malekzadeh, Hamed Haddadi, Soteris Demetriou
Training deep neural networks via federated learning allows clients to share,
instead of the original data, only the model trained on their data. Prior work
has demonstrated that in practice a client's private information, unrelated to
the main learning task, can be discovered from the model's gradients, which
compromises the promised privacy protection. However, there is still no formal
approach for quantifying the leakage of private information via the shared
updated model or gradients. In this work, we analyze property inference attacks
and define two metrics based on (i) an adaptation of the empirical
$\mathcal{V}$-information, and (ii) a sensitivity analysis using Jacobian
matrices allowing us to measure changes in the gradients with respect to latent
information. We show the applicability of our proposed metrics in localizing
private latent information in a layer-wise manner and in two settings where (i)
we have or (ii) we do not have knowledge of the attackers' capabilities. We
evaluate the proposed metrics for quantifying information leakage on three
real-world datasets using three benchmark models.
Authors' comments: 9 pages, at ICLR workshop (Distributed and Private Machine Learning)
Alessio Netti, Daniele Tafani, Michael Ott, Martin Schulz
Modern High-Performance Computing (HPC) and data center operators rely more
and more on data analytics techniques to improve the efficiency and reliability
of their operations. They employ models that ingest time-series monitoring
sensor data and transform it into actionable knowledge for system tuning: a
process known as Operational Data Analytics (ODA). However, monitoring data has
a high dimensionality, is hardware-dependent and difficult to interpret. This,
coupled with the strict requirements of ODA, makes most traditional data mining
methods impractical and in turn renders this type of data cumbersome to
process. Most current ODA solutions use ad-hoc processing methods that are not
generic, are sensitive to the sensors' features and are not fit for
visualization.
In this paper we propose a novel method, called Correlation-wise Smoothing
(CS), to extract descriptive signatures from time-series monitoring data in a
generic and lightweight way. Our CS method exploits correlations between data
dimensions to form groups and produces image-like signatures that can be easily
manipulated, visualized and compared. We evaluate the CS method on HPC-ODA, a
collection of datasets that we release with this work, and show that it leads
to the same performance as most state-of-the-art methods while producing
signatures that are up to ten times smaller, doing so up to ten times faster,
and gaining visualizability, portability across systems and clear scaling
properties.
Authors' comments: Accepted for publication at the 35th IEEE International Parallel &
Distributed Processing Symposium (IPDPS 2021)
Tsai-Shien Chen, Man-Yu Lee, Chih-Ting Liu, Shao-Yi Chien
Vehicle re-identification (re-ID) matches images of the same vehicle across
different cameras. It is fundamentally challenging because the dramatically
different appearances caused by different viewpoints can make a framework
fail to match two vehicles of the same identity. Most existing works address the
problem by extracting viewpoint-aware features via a spatial attention mechanism,
which, however, usually suffers from noisy generated attention maps or otherwise
requires expensive keypoint labels to improve the quality. In this work, we
propose the Viewpoint-aware Channel-wise Attention Mechanism (VCAM), which
approaches the attention mechanism from a different angle. Our VCAM enables the
feature learning framework to reweigh, channel-wise, the importance of each feature
map according to the "viewpoint" of the input vehicle. Extensive experiments
validate the effectiveness of the proposed method and show that we perform
favorably against state-of-the-art methods on the public VeRi-776 dataset and
obtain promising results on the 2020 AI City Challenge. We also conduct other
experiments to demonstrate the interpretability of how our VCAM practically
assists the learning framework.
Authors' comments: CVPR Workshop 2020
Yuhki Hatakeyama, Hiroki Sakuma, Yoshinori Konishi, Kohei Suenaga
Image classification based on machine learning is now commonly used.
However, a classification result given by an advanced method, including deep
learning, is often hard to interpret. This problem of interpretability is one
of the major obstacles in deploying a trained model in safety-critical systems.
Several techniques have been proposed to address this problem; one of them is
RISE, which explains a classification result with a heatmap, called a saliency
map, that shows the significance of each pixel. We propose MC-RISE
(Multi-Color RISE), an enhancement of RISE that takes color information
into account in an explanation. Our method not only shows the saliency of each
pixel in a given image as the original RISE does, but also the significance of
the color components of each pixel; a saliency map with color information is
useful especially in domains where color information matters (e.g.,
traffic-sign recognition). We implemented MC-RISE and evaluated it using two
datasets (GTSRB and ImageNet) to demonstrate the effectiveness of our method
in comparison with existing techniques for interpreting image classification
results.
Authors' comments: To appear in ACCV 2020
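The RISE idea that MC-RISE builds on (estimating per-pixel saliency by probing a black-box model with randomly masked inputs) can be sketched as below. This is a simplified illustration, not the authors' implementation: it uses hard nearest-neighbour masks on a coarse grid, whereas the original RISE uses smooth, randomly shifted masks, and the color extension of MC-RISE is not shown. `score_fn`, `grid`, `keep_prob` and `n_masks` are hypothetical names and defaults.

```python
import numpy as np


def rise_saliency(image, score_fn, n_masks=500, grid=4, keep_prob=0.5, seed=0):
    """Monte Carlo saliency: average random masks weighted by the model score."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    cell_h, cell_w = -(-h // grid), -(-w // grid)  # ceiling division
    saliency = np.zeros((h, w))
    for _ in range(n_masks):
        # Each coarse cell is kept with probability keep_prob.
        coarse = (rng.random((grid, grid)) < keep_prob).astype(float)
        # Upsample the coarse grid to image size and crop any overshoot.
        mask = np.kron(coarse, np.ones((cell_h, cell_w)))[:h, :w]
        masked = image * mask if image.ndim == 2 else image * mask[..., None]
        # Pixels that are visible when the score is high accumulate saliency.
        saliency += score_fn(masked) * mask
    return saliency / (n_masks * keep_prob)


# Toy "model" whose score depends only on the top-left 2x2 patch.
toy_image = np.ones((8, 8))
toy_saliency = rise_saliency(toy_image, lambda img: img[:2, :2].mean())
```

In this toy setup the top-left region should receive a visibly higher saliency than the rest of the image, because it is the only region that influences the score.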
Saptarshi Sinha, Hiroki Ohashi, Katsuyuki Nakamura
Class-imbalance is one of the major challenges in real world datasets, where
a few classes (called majority classes) contain many more data samples than
the rest (called minority classes). Learning deep neural networks using such
datasets leads to performances that are typically biased towards the majority
classes. Most of the prior works try to solve class-imbalance by assigning more
weights to the minority classes in various manners (e.g., data re-sampling,
cost-sensitive learning). However, we argue that the amount of available
training data may not always be a good clue to determine the weighting strategy
because some of the minority classes might be sufficiently represented even by
a small number of training data. Overweighting samples of such classes can lead
to a drop in the model's overall performance. We claim that the 'difficulty' of a
class as perceived by the model is more important to determine the weighting.
In this light, we propose a novel loss function named Class-wise
Difficulty-Balanced loss, or CDB loss, which dynamically distributes weights to
each sample according to the difficulty of the class that the sample belongs
to. Note that the assigned weights dynamically change as the 'difficulty' for
the model may change with the learning progress. Extensive experiments are
conducted on both image (artificially induced class-imbalanced MNIST,
long-tailed CIFAR and ImageNet-LT) and video (EGTEA) datasets. The results show
that CDB loss consistently outperforms the recently proposed loss functions on
class-imbalanced datasets irrespective of the data type (i.e., video or image).
Authors' comments: Accepted for ACCV 2020 oral presentation
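The weighting scheme described above can be illustrated with a small sketch. It assumes that the difficulty of a class is measured as one minus its current class-wise accuracy and that weights are sharpened by an exponent `tau`; both the difficulty measure and the names `cdb_weights`, `weighted_cross_entropy` and `tau` are assumptions made for illustration, not details confirmed by the abstract.

```python
import math


def cdb_weights(class_accuracy, tau=1.0):
    """Per-class weights from difficulty = 1 - accuracy, raised to the power tau."""
    return [(1.0 - acc) ** tau for acc in class_accuracy]


def weighted_cross_entropy(probs, labels, weights):
    """Mean cross-entropy in which each sample is scaled by its class weight."""
    total = 0.0
    for sample_probs, label in zip(probs, labels):
        total += weights[label] * -math.log(sample_probs[label])
    return total / len(labels)


# Recomputing the weights from fresh accuracies during training makes them
# follow the model's changing perception of class difficulty.
example_weights = cdb_weights([0.9, 0.5])
example_loss = weighted_cross_entropy([[0.5, 0.5]], [1], example_weights)
```

Under these assumptions a class that the model already classifies well receives a small weight regardless of how few samples it has, which is the behavior the abstract argues for.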
Wolfgang Fuhl, Enkelejda Kasneci
Simple image rotations significantly reduce the accuracy of deep neural networks. Moreover, training with all possible rotations enlarges the data set, which also increases the training duration. In this work, we address trainable rotation invariant convolutions as well as the construction of nets, since fully connected layers can only be rotation invariant with a one-dimensional input. We show that our approach is rotationally invariant for different models and on different public data sets. We also discuss the influence of purely rotational invariant features on accuracy. The rotationally adaptive convolution models presented in this work are more computationally intensive than normal convolution models. Therefore, we also present a depth-wise separable approach with radial convolution. Link to CUDA code: https://atreus.informatik.uni-tuebingen.de/seafile/d/8e2ab8c3fdd444e1a135/
Keyu Tian, Chen Lin, Ming Sun, Luping Zhou, Junjie Yan, Wanli Ouyang
The recent progress on automatically searching augmentation policies has
boosted the performance substantially for various tasks. A key component of
automatic augmentation search is the evaluation process for a particular
augmentation policy, which is utilized to return reward and usually runs
thousands of times. A plain evaluation process, which includes full model
training and validation, would be time-consuming. To achieve efficiency, many
choose to sacrifice evaluation reliability for speed. In this paper, we dive
into the dynamics of augmented training of the model. This inspires us to
design a powerful and efficient proxy task based on the Augmentation-Wise
Weight Sharing (AWS) to form a fast yet accurate evaluation process in an
elegant way. Comprehensive analysis verifies the superiority of this approach
in terms of effectiveness and efficiency. The augmentation policies found by
our method achieve superior accuracies compared with existing auto-augmentation
search methods. On CIFAR-10, we achieve a top-1 error rate of 1.24%, which is
currently the best performing single model without extra training data. On
ImageNet, we get a top-1 error rate of 20.36% for ResNet-50, which is a
3.34% absolute error rate reduction over the baseline augmentation.
Authors' comments: Accepted to NeurIPS 2020 (Poster)
Behzad Ghazanfari, Fatemeh Afghah, Sixian Zhang
This paper proposes the piece-wise matching layer as a novel layer in representation learning methods for electrocardiogram (ECG) classification. Despite the remarkable performance of representation learning methods in the analysis of time series, there are still several challenges associated with these methods, including their complex structures, the lack of generality of solutions, and the need for expert knowledge and large-scale training datasets. We introduce the piece-wise matching layer, which works on two levels to address some of the aforementioned challenges. At the first level, a set of morphological, statistical, and frequency features and comparative forms of them are computed based on each periodic part and its neighbors. At the second level, these features are modified by predefined transformation functions based on a receptive field scenario. Several scenarios of offline processing, incremental processing, fixed sliding receptive field, and event-based triggering receptive field can be implemented based on the choice of length and mechanism of indicating the receptive field. We propose dynamic time warping as a mechanism that indicates a receptive field based on event triggering tactics. To evaluate the performance of this method in time series analysis, we applied the proposed layer to two publicly available datasets from the PhysioNet competitions in 2015 and 2017, where the input data is an ECG signal. We compared the performance of our method against a variety of known tuned methods from expert knowledge, machine learning, deep learning, and combinations of them. The proposed approach improves the state of the art in the two known competitions of 2015 and 2017 by around 4% and 7%, respectively, while it does not rely on advance knowledge of the classes or the possible places of arrhythmia.
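Dynamic time warping, which the abstract names as the mechanism for indicating an event-triggered receptive field, is sketched below in its classic distance form. How the paper derives a receptive field from the alignment is not specified in the abstract, so this block only shows the standard dynamic-programming recurrence; the function name `dtw_distance` is chosen for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because the alignment may stretch or compress either sequence, two beats with the same shape but different durations can have a distance of zero, which is what makes the measure attractive for periodic signals such as ECG.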
Weiwei Hou, Hanna Suominen, Piotr Koniusz, Sabrina Caldwell, Tom Gedeon
Sentence compression is a Natural Language Processing (NLP) task aimed at shortening original sentences while preserving their key information. Its applications can benefit many fields; e.g., one can build tools for language education. However, current methods are largely based on Recurrent Neural Network (RNN) models, which suffer from poor processing speed. To address this issue, in this paper, we propose a token-wise Convolutional Neural Network, a CNN-based model that uses pre-trained Bidirectional Encoder Representations from Transformers (BERT) features for deletion-based sentence compression. We also compare our model with RNN-based models and fine-tuned BERT. Although one of the RNN-based models marginally outperforms the other models given the same input, our CNN-based model was ten times faster than the RNN-based approach.
Hengyi Cai, Hongshen Chen, Yonghao Song, Zhuoye Ding, Yongjun Bao, Weipeng Yan, Xiaofang Zhao
Neural dialogue response generation has gained much popularity in recent years. The Maximum Likelihood Estimation (MLE) objective is widely adopted in existing dialogue model learning. However, models trained with the MLE objective function are plagued by the low-diversity issue in the open-domain conversational setting. Inspired by the observation that humans not only learn from positive signals but also benefit from correcting undesirable behaviors, in this work, we introduce contrastive learning into dialogue generation, where the model explicitly perceives the difference between the well-chosen positive and negative utterances. Specifically, we employ a pretrained baseline model as a reference. During contrastive learning, the target dialogue model is trained to give higher conditional probabilities for the positive samples, and lower conditional probabilities for the negative samples, compared to the reference model. To manage the multi-mapping relations prevalent in human conversation, we augment contrastive dialogue learning with group-wise dual sampling. Extensive experimental results show that the proposed group-wise contrastive learning framework is suited for training a wide range of neural dialogue generation models with very favorable performance over the baseline training approaches.
Xuyang Shen, Jo Plested, Yue Yao, Tom Gedeon
Three-dimensional face reconstruction is one of the popular applications in
computer vision. However, even state-of-the-art models still require frontal
faces as inputs, which restricts their usage scenarios in the wild. A similar
dilemma also arises in face recognition. New research designed to recover the
frontal face from a single side-pose facial image has emerged. The
state-of-the-art in this area is the Face-Transformation generative adversarial
network, which is based on the CycleGAN. This inspired our research, which
explores the performance of two pixel transformation models in frontal
facial synthesis, Pix2Pix and CycleGAN. We conducted experiments with five
different loss functions on Pix2Pix to improve its performance, and then
propose a new network, Pairwise-GAN, for frontal facial synthesis.
Pairwise-GAN uses two parallel U-Nets as the generator and PatchGAN as the
discriminator. The detailed hyper-parameters are also discussed. Based on the
quantitative measurement by face similarity comparison, our results show that
Pix2Pix with L1 loss, gradient difference loss, and identity loss yields a
2.72% improvement in average similarity compared to the default Pix2Pix
model. Additionally, the performance of Pairwise-GAN is 5.4% better than
CycleGAN and 9.1% better than Pix2Pix in average similarity.
Authors' comments: The 27th International Conference on Neural Information
Processing (ICONIP 2020)
Xu Qian, Victor Li, Crews Darren
Second-order information has proven to be very effective in determining the redundancy of neural network weights and activations. A recent paper proposes using Hessian traces of weights and activations for mixed-precision quantization and achieves state-of-the-art results. However, prior works only focus on selecting bits for each layer, while the redundancy of different channels within a layer also differs considerably. This is mainly because the complexity of determining bits for each channel is too high for the original methods. Here, we introduce Channel-wise Hessian Aware trace-Weighted Quantization (CW-HAWQ). CW-HAWQ uses the Hessian trace to determine the relative sensitivity order of different channels of activations and weights. Moreover, CW-HAWQ uses a deep reinforcement learning (DRL) agent based on Deep Deterministic Policy Gradient (DDPG) to find the optimal ratios of different quantization bits and assigns bits to channels according to the Hessian trace order. The number of states in CW-HAWQ is much smaller than in traditional AutoML-based mixed-precision methods, since we only need to search ratios for the quantization bits. Comparing CW-HAWQ with state-of-the-art methods shows that we can achieve better results for multiple networks.
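The final assignment step described above, in which channels are ranked by Hessian trace and bit-widths are handed out according to searched ratios, can be sketched as follows. The sketch assumes that a larger trace means a more sensitive channel that should receive more bits, and it takes the ratios as a hand-written input rather than from a DDPG agent; `assign_bits` and its arguments are hypothetical names.

```python
def assign_bits(traces, bit_ratios):
    """Assign a bit-width to every channel from its Hessian trace rank.

    traces:     one Hessian-trace value per channel (assumed precomputed).
    bit_ratios: maps a bit-width to the fraction of channels that receive it;
                in CW-HAWQ these ratios are what the agent searches for.
    """
    n = len(traces)
    # Most sensitive channels (largest trace) come first.
    order = sorted(range(n), key=lambda c: traces[c], reverse=True)
    levels = sorted(bit_ratios, reverse=True)
    bits = [0] * n
    start = 0
    for k, width in enumerate(levels):
        if k == len(levels) - 1:
            end = n  # the lowest bit-width absorbs any rounding remainder
        else:
            end = min(n, start + round(bit_ratios[width] * n))
        for channel in order[start:end]:
            bits[channel] = width
        start = end
    return bits
```

For example, `assign_bits([0.1, 5.0, 0.5, 2.0], {8: 0.25, 4: 0.5, 2: 0.25})` gives 8 bits to the channel with the largest trace, 4 bits to the next two, and 2 bits to the least sensitive one.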
Han Liu, Caixia Yuan, Xiaojie Wang
A major challenge of multi-label text classification (MLTC) is to
simultaneously exploit possible label differences and label correlations. In
this paper, we tackle this challenge by developing a Label-Wise Pre-Training
(LW-PT) method to get a document representation with label-aware information.
The basic idea is that a multi-label document can be represented as a
combination of multiple label-wise representations, and that correlated labels
always co-occur in the same or similar documents. LW-PT implements this idea by
constructing label-wise document classification tasks and training label-wise
document encoders. Finally, the pre-trained label-wise encoder is fine-tuned
with the downstream MLTC task. Extensive experimental results validate that the
proposed method has significant advantages over the previous state-of-the-art
models and is able to discover reasonable label relationships. The code is
released to facilitate other researchers.
Authors' comments: Accepted to NLPCC 2020
Zhanghan Ke, Di Qiu, Kaican Li, Qiong Yan, Rynson W. H. Lau
We investigate the generalization of semi-supervised learning (SSL) to
diverse pixel-wise tasks. Although SSL methods have achieved impressive results
in image classification, the performances of applying them to pixel-wise tasks
are unsatisfactory due to their need for dense outputs. In addition, existing
pixel-wise SSL approaches are only suitable for certain tasks as they usually
rely on task-specific properties. In this paper, we present a new SSL
framework, named Guided Collaborative Training (GCT), for pixel-wise tasks,
with two main technical contributions. First, GCT addresses the issues caused
by the dense outputs through a novel flaw detector. Second, the modules in GCT
learn from unlabeled data collaboratively through two newly proposed
constraints that are independent of task-specific properties. As a result, GCT
can be applied to a wide range of pixel-wise tasks without structural
adaptation. Our extensive experiments on four challenging vision tasks,
including semantic segmentation, real image denoising, portrait image matting,
and night image enhancement, show that GCT outperforms state-of-the-art SSL
methods by a large margin. Our code is available at:
https://github.com/ZHKKKe/PixelSSL.
Authors' comments: 16th European Conference on Computer Vision (ECCV 2020)
Myoungha Song, Jeongho Lee, Donghwan Kim
6D pose estimation refers to object recognition and estimation of 3D rotation
and 3D translation. The key to estimating 6D pose is extracting enough
features to find the pose in any environment. Previous
methods utilized depth information in the refinement process or were designed
as a heterogeneous architecture for each data space to extract features.
However, these methods are limited in that they cannot extract sufficient
features. Therefore, this paper proposes a Point Attention Module that can
efficiently extract powerful features from RGB-D data. In our module, the
attention map is formed through a Geometric Attention Path (GAP) and a Channel
Attention Path (CAP). GAP is designed to pay attention to important information
in the geometric information, and CAP is designed to pay attention to important
information in the channel information. We show that the attention module
efficiently creates feature representations without significantly increasing
computational complexity. Experimental results show that the proposed method
outperforms the existing methods on the YCB Video and LineMod benchmarks. In
addition, the attention module was applied to the classification task, and it
was confirmed that the performance significantly improved compared to the
existing model.
Authors' comments: 11 pages, 5 figures
Dikshant Sagar, Jatin Garg, Prarthana Kansal, Sejal Bhalla, Rajiv Ratn Shah, Yi Yu
Fashion is an important part of human experience. Events such as interviews,
meetings, marriages, etc. are often based on clothing styles. The rise of the
fashion industry and its effect on social influencing have made outfit
compatibility a need. This necessitates an outfit compatibility model to
aid people in clothing recommendation. However, due to the highly subjective
nature of compatibility, it is necessary to account for personalization. Our
paper devises an attribute-wise interpretable compatibility scheme with
personal preference modelling which captures user-item interaction along with
general item-item interaction. Our work solves the problem of interpretability
in clothing matching by locating the discordant and harmonious attributes
between fashion items. Extensive experimental results on IQON3000, a publicly
available real-world dataset, verify the effectiveness of the proposed model.
Authors' comments: 10 pages, 5 figures, to be published in IEEE BigMM, 2020
Siniša Družeta, Stefan Ivić
Throughout the course of the development of Particle Swarm Optimization,
particle inertia has been established as an important aspect of the method for
researching possible method improvements. As a continuation of our previous
research, we propose a novel generalized technique of inertia weight adaptation
based on each individual particle's fitness improvement, called anakatabatic
inertia. This technique allows for adapting the inertia weight value for each
particle corresponding to the particle's increasing or decreasing fitness, i.e.
conditioned by the particle's ascending (anabatic) or descending (katabatic)
movement. The proposed inertia weight control framework was metaoptimized and
tested on the 30 test functions of the CEC 2014 test suite. The conducted
procedure produced four anakatabatic models, two for each of the PSO methods
used (Standard PSO and TVAC-PSO). The benchmark testing results show that using
the proposed anakatabatic inertia models reliably yields moderate improvements
in accuracy of Standard PSO (final fitness minimum reduced by up to 0.09 orders of
magnitude) and rather strong improvements for TVAC-PSO (final fitness minimum
reduced by up to 0.59 orders of magnitude), mostly without any adverse effects on
the method's performance.
Authors' comments: 6 pages, 5 figures, 2 tables. arXiv admin note: substantial text
overlap with arXiv:1906.02474
Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu
A novel framework for meeting transcription using asynchronous microphones is
proposed in this paper. It consists of audio synchronization, speaker
diarization, utterance-wise speech enhancement using guided source separation,
automatic speech recognition, and duplication reduction. Doing speaker
diarization before speech enhancement enables the system to deal with
overlapped speech without considering sampling frequency mismatch between
microphones. Evaluation on our real meeting datasets showed that our framework
achieved a character error rate (CER) of 28.7% by using 11 distributed
microphones, while a monaural microphone placed at the center of the table had
a CER of 38.2%. We also showed that our framework achieved a CER of 21.8%,
which is only 2.1 percentage points higher than the CER in headset
microphone-based transcription.
Authors' comments: Accepted to INTERSPEECH 2020
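The character error rate reported above is conventionally computed as the character-level edit distance between the recognized text and the reference transcript, divided by the reference length. The sketch below shows that standard definition; whether the authors apply additional text normalization before scoring is not stated in the abstract, and the function names are chosen for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, ref_char in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                          # deletion
                curr[j - 1] + 1,                      # insertion
                prev[j - 1] + (ref_char != hyp_char)  # substitution or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER in percent: edit distance divided by the reference length."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)
```

Because insertions count as errors, the CER can exceed 100% when the hypothesis is much longer than the reference.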