Yasitha Warahena Liyanage, Daphney-Stavroula Zois, Charalampos Chelmis
In a typical supervised machine learning setting, the predictions on all test instances are based on a common subset of features discovered during model training. However, using a different subset of features that is most informative for each test instance individually may improve not only prediction accuracy but also the overall interpretability of the model. At the same time, feature selection methods for classification have been known to be the most effective when many features are irrelevant and/or uncorrelated. In fact, feature selection ignoring correlations between features can lead to poor classification performance. In this work, a Bayesian network is utilized to model feature dependencies. Using the dependency network, a new method is proposed that sequentially selects the best feature to evaluate for each test instance individually, and stops the selection process to make a prediction once it determines that no further improvement can be achieved with respect to classification accuracy. The optimum number of features to acquire and the optimum classification strategy are derived for each test instance. The theoretical properties of the optimum solution are analyzed, and a new algorithm is proposed that takes advantage of these properties to implement a robust and scalable solution for high dimensional settings. The effectiveness, generalizability, and scalability of the proposed method are illustrated on a variety of real-world datasets from diverse application domains.
Lei Shen, Fandong Meng, Jinchao Zhang, Yang Feng, Jie Zhou
Generating some appealing questions in open-domain conversations is an
effective way to improve human-machine interactions and lead the topic to a
broader or deeper direction. To avoid dull or deviated questions, some
researchers tried to utilize the answer, i.e., the "future" information, to guide
question generation. However, they separate a post-question-answer (PQA) triple
into two parts: post-question (PQ) and question-answer (QA) pairs, which may
hurt the overall coherence. Besides, the QA relationship is modeled as a
one-to-one mapping that is not reasonable in open-domain conversations. To
tackle these problems, we propose a generative triple-wise model with
hierarchical variations for open-domain conversational question generation
(CQG). Latent variables in three hierarchies are used to represent the shared
background of a triple and one-to-many semantic mappings in both PQ and QA
pairs. Experimental results on a large-scale CQG dataset show that our method
significantly improves the quality of questions in terms of fluency, coherence
and diversity over competitive baselines.
Authors' comments: To appear at ACL 2021 main conference (long paper)
Qi Tian, Kun Kuang, Kelu Jiang, Fei Wu, Yisen Wang
Adversarial training is one of the most effective approaches to improve model robustness against adversarial examples. However, previous works mainly focus on the overall robustness of the model, and the in-depth analysis on the role of each class involved in adversarial training is still missing. In this paper, we propose to analyze the class-wise robustness in adversarial training. First, we provide a detailed diagnosis of adversarial training on six benchmark datasets, i.e., MNIST, CIFAR-10, CIFAR-100, SVHN, STL-10 and ImageNet. Surprisingly, we find that there are remarkable robustness discrepancies among classes, leading to unbalanced/unfair class-wise robustness in the robust models. Furthermore, we investigate the relations between classes and find that the unbalanced class-wise robustness is fairly consistent among different attack and defense methods. Moreover, we observe that the stronger attack methods in adversarial learning achieve performance improvement mainly from a more successful attack on the vulnerable classes (i.e., classes with less robustness). Inspired by these interesting findings, we design a simple but effective attack method based on the traditional PGD attack, named Temperature-PGD attack, which proposes to enlarge the robustness disparity among classes with a temperature factor on the confidence distribution of each image. Experiments demonstrate that our method can achieve a higher attack rate than the PGD attack. Furthermore, from the defense perspective, we also make some modifications in the training and inference phase to improve the robustness of the most vulnerable class, so as to mitigate the large difference in class-wise robustness. We believe our work can contribute to a more comprehensive understanding of adversarial training as well as rethinking the class-wise properties in robust models.
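As a rough illustration of the "temperature factor on the confidence distribution" mentioned in the abstract above, the following sketch applies a temperature to a softmax over class logits. The abstract does not say how Temperature-PGD uses this inside the attack objective, so this shows only the generic temperature-scaling step; the helper name `softmax_with_temperature` and the example logits are assumptions made for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature.

    Hypothetical helper for illustration only, not the authors' code.
    A temperature below 1 sharpens the distribution; above 1 flattens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed example logits for a three-class problem.
logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, temperature=0.5)
flat = softmax_with_temperature(logits, temperature=5.0)
```

With a low temperature, the gap between the most and least confident classes widens, which is one plausible reading of how a temperature could emphasize disparities among classes.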
Alessandro Casa, Andrea Cappozzo, Michael Fop
Finite Gaussian mixture models provide a powerful and widely employed
probabilistic approach for clustering multivariate continuous data. However,
the practical usefulness of these models is jeopardized in high-dimensional
spaces, where they tend to be over-parameterized. As a consequence, different
solutions have been proposed, often relying on matrix decompositions or
variable selection strategies. Recently, a methodological link between Gaussian
graphical models and finite mixtures has been established, paving the way for
penalized model-based clustering in the presence of large precision matrices.
Notwithstanding, current methodologies implicitly assume similar levels of
sparsity across the classes, not accounting for different degrees of
association between the variables across groups. We overcome this limitation by
deriving group-wise penalty factors, which automatically enforce under- or
over-connectivity in the estimated graphs. The approach is entirely data-driven
and does not require additional hyper-parameter specification. Analyses on
synthetic and real data showcase the validity of our proposal.
Authors' comments: 41 pages, 11 figures
Qingyun Wang, Semih Yavuz, Victoria Lin, Heng Ji, Nazneen Rajani
Graph-to-text generation has benefited from pre-trained language models
(PLMs) in achieving better performance than structured graph encoders. However,
they fail to fully utilize the structure information of the input graph. In
this paper, we aim to further improve the performance of the pre-trained
language model by proposing a structured graph-to-text model with a two-step
fine-tuning mechanism which first fine-tunes the model on Wikipedia before
adapting to the graph-to-text generation. In addition to using the traditional
token and position embeddings to encode the knowledge graph (KG), we propose a
novel tree-level embedding method to capture the inter-dependency structures of
the input graph. This new approach has significantly improved the performance
of all text generation metrics for the English WebNLG 2017 dataset.
Authors' comments: 10 pages, Accepted by Proceedings of ACL-IJCNLP 2021 Student Research
Workshop, Code and Resources at
https://github.com/EagleW/Stage-wise-Fine-tuning
Polychronis Charitidis, Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Ioannis Kompatsiaris
In this work, we present a deep learning-based approach for image tampering
localization fusion. This approach is designed to combine the outcomes of
multiple image forensics algorithms and provides a fused tampering localization
map, which requires no expert knowledge and is easier to interpret by end
users. Our fusion framework includes a set of five individual tampering
localization methods for splicing localization on JPEG images. The proposed
deep learning fusion model is an adapted architecture, initially proposed for
the image restoration task, that performs multiple operations in parallel,
weighted by an attention mechanism to enable the selection of proper operations
depending on the input signals. This weighting process can be very beneficial
for cases where the input signal is very diverse, as in our case where the
output signals of multiple image forensics algorithms are combined. Evaluation
in three publicly available forensics datasets demonstrates that the
performance of the proposed approach is competitive, outperforming the
individual forensics techniques as well as another recently proposed fusion
framework in the majority of cases.
Authors' comments: 6 pages, 3 figures, CBMI
Ivan Lazarevich, Alexander Kozlov, Nikita Malinin
We present a post-training weight pruning method for deep neural networks that achieves accuracy levels tolerable for the production setting and that is sufficiently fast to be run on commodity hardware such as desktop CPUs or edge devices. We propose a data-free extension of the approach for computer vision models based on automatically-generated synthetic fractal images. We obtain state-of-the-art results for data-free neural network pruning, with ~1.5% top@1 accuracy drop for a ResNet50 on ImageNet at 50% sparsity rate. When using real data, we are able to get a ResNet50 model on ImageNet with 65% sparsity rate in 8-bit precision in a post-training setting with a ~1% top@1 accuracy drop. We release the code as a part of the OpenVINO(TM) Post-Training Optimization tool.
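To make the notion of a "sparsity rate" in the abstract above concrete, here is a minimal sketch of unstructured magnitude pruning on a flat list of weights. The abstract does not state which pruning criterion the method uses, so magnitude pruning is swapped in here purely as a common stand-in; the function name `magnitude_prune` and the example weights are assumptions.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    Illustrative stand-in only: the pruning criterion of the paper is not
    given in the abstract, so plain magnitude pruning is assumed here.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Assumed toy weights; a 50% sparsity rate zeroes half of them.
weights = [0.8, -0.05, 0.3, -0.9, 0.01, 0.4, -0.2, 0.6]
pruned = magnitude_prune(weights, sparsity=0.5)
```

In a post-training setting, a step like this would be applied to an already trained model without further gradient updates on the original training set.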
Pranoy Panda, Sai Srinivas Kancheti, Vineeth N Balasubramanian
We formulate a causal extension to the recently introduced paradigm of
instance-wise feature selection to explain black-box visual classifiers. Our
method selects a subset of input features that has the greatest causal effect
on the model's output. We quantify the causal influence of a subset of features
by the Relative Entropy Distance measure. Under certain assumptions this is
equivalent to the conditional mutual information between the selected subset
and the output variable. The resulting causal selections are sparser and cover
salient objects in the scene. We show the efficacy of our approach on multiple
vision datasets by measuring the post-hoc accuracy and Average Causal Effect of
selected features on the model's output.
Authors' comments: 6 pages, 5 figures. Accepted at the Causality in Vision workshop,
CVPR 2021
Usman Sajid, Michael Chow, Jin Zhang, Taejoon Kim, Guanghui Wang
The paper proposes a new text recognition network for scene-text images. Many state-of-the-art methods employ the attention mechanism either in the text encoder or decoder for the text alignment. Although the encoder-based attention yields promising results, these schemes inherit noticeable limitations. They perform the feature extraction (FE) and visual attention (VA) sequentially, which bounds the attention mechanism to rely only on the FE final single-scale output. Moreover, the utilization of the attention process is limited by only applying it directly to the single scale feature-maps. To address these issues, we propose a new multi-scale and encoder-based attention network for text recognition that performs the multi-scale FE and VA in parallel. The multi-scale channels also undergo regular fusion with each other to develop the coordinated knowledge together. Quantitative evaluation and robustness analysis on the standard benchmarks demonstrate that the proposed network outperforms the state-of-the-art in most cases.
Weizhe Liu, David Ferstl, Samuel Schulter, Lukas Zebedin, Pascal Fua, Christian Leistner
We introduce a novel approach to unsupervised and semi-supervised domain adaptation for semantic segmentation. Unlike many earlier methods that rely on adversarial learning for feature alignment, we leverage contrastive learning to bridge the domain gap by aligning the features of structurally similar label patches across domains. As a result, the networks are easier to train and deliver better performance. Our approach consistently outperforms state-of-the-art unsupervised and semi-supervised methods on two challenging domain adaptive segmentation tasks, particularly with a small number of target domain annotations. It can also be naturally extended to weakly-supervised domain adaptation, where only a minor drop in accuracy can save up to 75% of annotation cost.
Daniel Geng, Max Hamilton, Andrew Owens
Image prediction methods often struggle on tasks that require changing the
positions of objects, such as video prediction, producing blurry images that
average over the many positions that objects might occupy. In this paper, we
propose a simple change to existing image similarity metrics that makes them
more robust to positional errors: we match the images using optical flow, then
measure the visual similarity of corresponding pixels. This change leads to
crisper and more perceptually accurate predictions, and does not require
modifications to the image prediction network. We apply our method to a variety
of video prediction tasks, where it obtains strong performance with simple
network architectures, and to the closely related task of video interpolation.
Code and results are available at our webpage:
https://dangeng.github.io/CorrWiseLosses
Authors' comments: CVPR 2022 Camera Ready
Peter A. Monkewitz
The scaling of different features of stream-wise normal stress profiles
$\langle uu\rangle^+(y^+)$ in turbulent wall-bounded flows, in particular in
truly parallel flows, such as channel and pipe flows, is the subject of a long
running debate. Particular points of contention are the scaling of the "inner"
and "outer" peaks of $\langle uu\rangle^+$ at $y^+\approxeq 15$ and $y^+
=\mathcal{O}(10^3)$, respectively, their infinite Reynolds number limit, and
the rate of logarithmic decay in the outer part of the flow. Inspired by the
landmark paper of Chen and Sreenivasan (2021), two terms of the inner
asymptotic expansion of $\langle uu\rangle^+$ in the small parameter
$Re_\tau^{-1/4}$ are extracted for the first time from a set of direct
numerical simulations (DNS) of channel flow. This inner expansion is completed
by a matching outer expansion, which not only fits the same set of channel DNS
within 1.5\% of the peak stress, but also provides a good match of laboratory
data in pipes and the near-wall part of boundary layers, up to the highest
$Re_\tau$'s of order $10^5$. The salient features of the new composite
expansion are first, an inner $\langle uu\rangle^+$ peak, which saturates at
11.3 and decreases as $Re_\tau^{-1/4}$, followed by a short "wall loglaw" with
a slope that becomes positive for $Re_\tau \gtrapprox 20'000$, leading up to an
outer peak, and an outer logarithmic overlap with a negative slope continuously
going to zero for $Re_\tau \to\infty$.
Authors' comments: 10 pages, 4 figures
Yuchen Ma, Songtao Liu, Zeming Li, Jian Sun
We propose a dense object detector with an instance-wise sampling strategy,
named IQDet. Instead of using human prior sampling strategies, we first extract
the regional feature of each ground-truth to estimate the instance-wise quality
distribution. According to a mixture model in spatial dimensions, the
distribution is more noise-robust and adapted to the semantic pattern of each
instance. Based on the distribution, we propose a quality sampling strategy,
which automatically selects training samples in a probabilistic manner and
trains with more high-quality samples. Extensive experiments on MS COCO show
that our method steadily improves the baseline by nearly 2.4 AP without bells and
whistles. Moreover, our best model achieves 51.6 AP, outperforming all existing
state-of-the-art one-stage detectors and it is completely cost-free in
inference time.
Authors' comments: Accepted by CVPR 2021
Dimitra Koumoutsou, Eleni Charou, Georgios Siolas, Giorgos Stamou
This paper introduces the Class-wise Principal Component Analysis, a supervised feature extraction method for hyperspectral data. Hyperspectral Imaging (HSI) has appeared in various fields in recent years, including Remote Sensing. Realizing that information extraction tasks for hyperspectral images are burdened by data-specific issues, we identify and address two major problems. These are the curse of dimensionality, which arises from the high volume of the data cube, and the class imbalance problem, which is common in hyperspectral datasets. Dimensionality reduction is an essential preprocessing step to complement a hyperspectral image classification task. Therefore, we propose a feature extraction algorithm for dimensionality reduction, based on Principal Component Analysis (PCA). Evaluations are carried out on the Indian Pines dataset to demonstrate that significant improvements are achieved when using the reduced data in a classification task.
Yufei Feng, Binbin Hu, Yu Gong, Fei Sun, Qingwen Liu, Wenwu Ou
Reranking, which rearranges the input ranking list into the final ranking list
to better meet user demands, is attracting increasing attention in recommender
systems. Most existing methods greedily rerank candidates through the
rating scores from point-wise or list-wise models. Despite their effectiveness,
neglecting the mutual influence between each item and its contexts in the final
ranking list often makes the greedy strategy based reranking methods
sub-optimal. In this work, we propose a new context-wise reranking framework
named Generative Rerank Network (GRN). Specifically, we first design the
evaluator, which applies Bi-LSTM and self-attention mechanism to model the
contextual information in the labeled final ranking list and predict the
interaction probability of each item more precisely. Afterwards, we elaborate
on the generator, equipped with GRU, attention mechanism and pointer network to
select the item from the input ranking list step by step. Finally, we apply
cross-entropy loss to train the evaluator and, subsequently, policy gradient to
optimize the generator under the guidance of the evaluator. Empirical results
show that GRN consistently and significantly outperforms state-of-the-art
point-wise and list-wise methods. Moreover, GRN has achieved a performance
improvement of 5.2% on PV and 6.1% on IPV metric after the successful
deployment in one popular recommendation scenario of Taobao application.
Authors' comments: Better read with arXiv:2102.12057. arXiv admin note: text overlap
with arXiv:2102.12057
Ange Lou, Murray Loew
Real-time semantic segmentation is playing a more important role in computer
vision, due to the growing demand for mobile devices and autonomous driving.
Therefore, it is very important to achieve a good trade-off among performance,
model size and inference speed. In this paper, we propose a Channel-wise
Feature Pyramid (CFP) module to balance those factors. Based on the CFP module,
we build CFPNet for real-time semantic segmentation, which applies a series of
dilated convolution channels to extract effective features. Experiments on
Cityscapes and CamVid datasets show that the proposed CFPNet achieves an
effective combination of those factors. For the Cityscapes test dataset, CFPNet
achieves 70.1% class-wise mIoU with only 0.55 million parameters and 2.5 MB
memory. The inference speed can reach 30 FPS on a single RTX 2080Ti GPU with a
1024x2048-pixel image.
Authors' comments: Accepted by ICIP 2021
Guanglei Yang, Hao Tang, Mingli Ding, Nicu Sebe, Elisa Ricci
While convolutional neural networks have shown a tremendous impact on various
computer vision tasks, they generally demonstrate limitations in explicitly
modeling long-range dependencies due to the intrinsic locality of the
convolution operation. Initially designed for natural language processing
tasks, Transformers have emerged as alternative architectures with innate
global self-attention mechanisms to capture long-range dependencies. In this
paper, we propose TransDepth, an architecture that benefits from both
convolutional neural networks and transformers. To avoid the network losing its
ability to capture local-level details due to the adoption of transformers, we
propose a novel decoder that employs attention mechanisms based on gates.
Notably, this is the first paper that applies transformers to pixel-wise
prediction problems involving continuous labels (i.e., monocular depth
prediction and surface normal estimation). Extensive experiments demonstrate
that the proposed TransDepth achieves state-of-the-art performance on three
challenging datasets. Our code is available at:
https://github.com/ygjwd12345/TransDepth.
Authors' comments: ICCV 2021
HanQin Cai, Keaton Hamm, Longxiu Huang, Deanna Needell
Low rank tensor approximation is a fundamental tool in modern machine learning and data science. In this paper, we study the characterization, perturbation analysis, and an efficient sampling strategy for two primary tensor CUR approximations, namely Chidori and Fiber CUR. We characterize exact tensor CUR decompositions for low multilinear rank tensors. We also present theoretical error bounds of the tensor CUR approximations when (adversarial or Gaussian) noise appears. Moreover, we show that low cost uniform sampling is sufficient for tensor CUR approximations if the tensor has an incoherent structure. Empirical performance evaluations, with both synthetic and real-world datasets, establish the speed advantage of the tensor CUR approximations over other state-of-the-art low multilinear rank tensor approximations.
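For readers unfamiliar with CUR-type approximations, the sketch below shows the simplest matrix (two-mode) analogue, restricted to rank 1: a rank-1 matrix is recovered exactly from one selected column, one selected row, and their nonzero intersection entry. This is not the Chidori or Fiber CUR tensor construction studied in the paper; the function name `rank1_cur` and the toy data are assumptions for illustration.

```python
def rank1_cur(A, i, j):
    """Rebuild a rank-1 matrix from column j, row i, and the entry A[i][j].

    Matrix analogue of a CUR decomposition A = C U^+ R in the rank-1 case,
    where U is the 1x1 intersection and its pseudoinverse is 1 / A[i][j].
    Illustrative sketch only, not the tensor algorithms of the paper.
    """
    u = A[i][j]
    if u == 0:
        raise ValueError("intersection entry must be nonzero")
    C = [row[j] for row in A]  # selected column
    R = list(A[i])             # selected row
    return [[c * r / u for r in R] for c in C]


# Assumed toy data: an exactly rank-1 matrix built as an outer product.
x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
A = [[xi * yj for yj in y] for xi in x]
A_hat = rank1_cur(A, i=0, j=0)
```

The point of the example is that only a few sampled rows and columns are needed when the data is exactly low rank, which is the property that makes sampling-based approximations cheap.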
Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Tuo Zhao
Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from training data at the early training stage. This is not always achievable for low-resource languages where the amount of training data is limited. To address this limitation, we propose a novel token-wise curriculum learning approach that creates sufficient amounts of easy samples. Specifically, the model learns to predict a short sub-sequence from the beginning part of each target sentence at the early stage of training, and then the sub-sequence is gradually expanded as the training progresses. Such a new curriculum design is inspired by the cumulative effect of translation errors, which makes the later tokens more difficult to predict than the beginning ones. Extensive experiments show that our approach can consistently outperform baselines on 5 language pairs, especially for low-resource languages. Combining our approach with sentence-level methods further improves the performance on high-resource languages.
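The prefix-expansion idea in the abstract above can be sketched as follows: early in training only a short prefix of each target sentence is kept as the prediction target, and the prefix grows until the full sentence is used. The abstract does not specify the growth schedule, so a linear schedule is assumed here; the names `prefix_fraction`, `truncate_target`, and `min_frac` are hypothetical.

```python
def prefix_fraction(step, total_curriculum_steps, min_frac=0.2):
    """Fraction of each target sentence to predict at a given training step.

    A linear schedule from `min_frac` to 1.0 is an assumption made for this
    sketch; the abstract only says the sub-sequence is gradually expanded.
    """
    if total_curriculum_steps <= 0 or step >= total_curriculum_steps:
        return 1.0
    progress = max(step / total_curriculum_steps, 0.0)
    return min_frac + (1.0 - min_frac) * progress


def truncate_target(tokens, step, total_curriculum_steps, min_frac=0.2):
    """Keep only the beginning part of the target sentence, at least one token."""
    frac = prefix_fraction(step, total_curriculum_steps, min_frac)
    keep = max(1, int(frac * len(tokens)))
    return tokens[:keep]


# Assumed toy target sentence of ten tokens.
target = ["t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9"]
early = truncate_target(target, step=0, total_curriculum_steps=100)
middle = truncate_target(target, step=50, total_curriculum_steps=100)
late = truncate_target(target, step=100, total_curriculum_steps=100)
```

Because every sentence contributes a short, comparatively easy prefix, this kind of schedule does not depend on finding enough "easy" sentences in a small corpus.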
Yang Su, Michael Chesser, Yansong Gao, Alanson P. Sample, Damith C. Ranasinghe
Emerging ultra-low-power tiny scale computing devices in Cyber-Physical
Systems run on harvested energy, are
intermittently powered, have limited computational capability, and perform
sensing and actuation functions under the control of a dedicated firmware
operating without the supervisory control of an operating system. Wirelessly
updating or patching the firmware of such devices is inevitable. We consider
the challenging problem of simultaneous and secure firmware updates or patching
for a typical class of such devices -- Computational Radio Frequency
Identification (CRFID) devices. We propose Wisecr, the first secure and
simultaneous wireless code dissemination mechanism to multiple devices that
prevents malicious code injection attacks and intellectual property (IP) theft,
whilst enabling remote attestation of code installation. Importantly, Wisecr is
engineered to comply with existing ISO compliant communication protocol
standards employed by CRFID devices and systems. We comprehensively evaluate
Wisecr's overhead, demonstrate its implementation over standards-compliant
protocols, analyze its security and implement an end-to-end realization with
popular CRFID devices -- the open-source code is released on GitHub.
Authors' comments: 19 main pages, 6 Appendix. Under review at IEEE TDSC