Cédric Rommel, Thomas Moreau, Joseph Paillard, Alexandre Gramfort
Data augmentation is a key element of deep learning pipelines, as it informs the network during training about transformations of the input data that keep the label unchanged. However, manually finding adequate augmentation methods and parameters for a given pipeline quickly becomes cumbersome. In particular, while intuition can guide this decision for images, the design and choice of augmentation policies remain unclear for more complex types of data, such as neuroscience signals. Moreover, class-dependent augmentation strategies have been surprisingly unexplored in the literature, although the idea is quite intuitive: changing the color of a car image does not change the object class to be predicted, but doing the same to the picture of an orange does. This paper investigates gradient-based automatic data augmentation algorithms amenable to class-wise policies with exponentially larger search spaces. Motivated by supervised learning applications using EEG signals, for which good augmentation policies are mostly unknown, we propose a new differentiable relaxation of the problem. In the class-agnostic setting, results show that our new relaxation leads to optimal performance with faster training than competing gradient-based methods, while also outperforming gradient-free methods in the class-wise setting. This work also proposes novel differentiable augmentation operations relevant for sleep stage classification.
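To make the idea of a class-wise policy concrete, here is a minimal sketch, not the authors' algorithm. It applies a different augmentation depending on the label and counts how the number of single-operation policies grows with the number of classes. The operations (`sign_flip`, `time_reverse`), the class names, and the policy format are hypothetical choices for illustration only. The gradient-based search described in the abstract is not implemented.

```python
import random

# Hypothetical signal-level operations, chosen only for illustration.
def identity(signal):
    return list(signal)

def sign_flip(signal):
    return [-v for v in signal]

def time_reverse(signal):
    return list(reversed(signal))

OPERATIONS = {
    "identity": identity,
    "sign_flip": sign_flip,
    "time_reverse": time_reverse,
}

def augment(signal, label, policy, rng):
    """Apply the operation assigned to `label` with its probability.

    `policy` maps a class label to (operation name, probability).
    Labels missing from the policy are left unchanged.
    """
    op_name, prob = policy.get(label, ("identity", 0.0))
    if rng.random() < prob:
        return OPERATIONS[op_name](signal)
    return list(signal)

def n_single_op_policies(n_operations, n_classes, class_wise):
    """Count policies that pick one operation (per class if class_wise)."""
    return n_operations ** n_classes if class_wise else n_operations

# Hypothetical class-wise policy with made-up class names.
policy = {"class_a": ("sign_flip", 1.0), "class_b": ("time_reverse", 1.0)}
rng = random.Random(0)
print(augment([1.0, 2.0, 3.0], "class_a", policy, rng))
print(n_single_op_policies(3, 5, class_wise=False),
      n_single_op_policies(3, 5, class_wise=True))
```

With one operation chosen per class, the number of candidate policies is the number of operations raised to the number of classes. This is one simple way to read the "exponentially larger search spaces" mentioned above.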
J. Davy Kirkpatrick, Federico Marocco, Dan Caselden, Aaron M. Meisner, Jacqueline K. Faherty, Adam C. Schneider, Marc J. Kuchner, S. L. Casewell et al.
Continued follow-up of WISEA J153429.75-104303.3, announced in Meisner et al.
(2020), has proven it to have an unusual set of properties. New imaging data
from Keck/MOSFIRE and HST/WFC3 show that this object is one of the few faint
proper motion sources known with J-ch2 > 8 mag, indicating a very cold
temperature consistent with the latest known Y dwarfs. Despite this, it has
W1-W2 and ch1-ch2 colors ~1.6 mag bluer than a typical Y dwarf. A new
trigonometric parallax measurement from a combination of WISE, Spitzer, and HST
astrometry confirms a nearby distance of $16.3^{+1.4}_{-1.2}$ pc and a large
transverse velocity of $207.4{\pm}15.9$ km/s. The absolute J, W2, and ch2
magnitudes are in line with the coldest known Y dwarfs, despite the highly
discrepant W1-W2 and ch1-ch2 colors. We explore possible reasons for the unique
traits of this object and conclude that it is most likely an old, metal-poor
brown dwarf and possibly the first Y subdwarf. Given that the object has an HST
F110W magnitude of 24.7 mag, broad-band spectroscopy and photometry from JWST
are the best options for testing this hypothesis.
Authors' comments: 8 pages, 4 figures, accepted for publication in The Astrophysical
Journal Letters
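As a rough consistency check on the astrometry quoted in this abstract, the sketch below converts the distance into a parallax, inverts the standard relation v_t = 4.74 * mu * d (mu in arcsec/yr, d in pc, v_t in km/s) to get the implied total proper motion, and applies the distance modulus to the F110W magnitude. The function names are illustrative. The derived numbers come from this arithmetic only and are not quoted from the paper.

```python
import math

# 1 au/yr expressed in km/s (standard conversion factor, about 4.74).
KMS_PER_AU_PER_YR = 4.74047

def parallax_mas(distance_pc):
    """Parallax in milliarcseconds for a distance in parsecs."""
    return 1000.0 / distance_pc

def proper_motion_arcsec_per_yr(v_tan_kms, distance_pc):
    """Invert v_tan = 4.74 * mu * d to obtain the total proper motion mu."""
    return v_tan_kms / (KMS_PER_AU_PER_YR * distance_pc)

def absolute_magnitude(apparent_mag, distance_pc):
    """Distance modulus: M = m - 5 * log10(d / 10 pc)."""
    return apparent_mag - 5.0 * math.log10(distance_pc / 10.0)

# Central values from the abstract: d = 16.3 pc, v_tan = 207.4 km/s, F110W = 24.7 mag.
distance = 16.3
print(f"parallax      ~ {parallax_mas(distance):.1f} mas")
print(f"proper motion ~ {proper_motion_arcsec_per_yr(207.4, distance):.2f} arcsec/yr")
print(f"M_F110W       ~ {absolute_magnitude(24.7, distance):.2f} mag")
```

The sketch uses central values only and does not propagate the asymmetric distance uncertainty or the velocity error.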
An-phi Nguyen, Maria Rodriguez Martinez
Interpretability has become a necessary feature for machine learning models deployed in critical scenarios, e.g., the legal system and healthcare. In these situations, algorithmic decisions may have (potentially negative) long-lasting effects on the end-user affected by the decision. In many cases, the representational power of deep learning models is not needed, and simple, interpretable models (e.g., linear models) should therefore be preferred. However, in high-dimensional and/or complex domains (e.g., computer vision), the universal approximation capabilities of neural networks are required. Inspired by linear models and the Kolmogorov-Arnold representation theorem, we propose a novel class of structurally constrained neural networks, which we call FLANs (Feature-wise Latent Additive Networks). Crucially, FLANs process each input feature separately, computing for each of them a representation in a common latent space. These feature-wise latent representations are then simply summed, and the aggregated representation is used for prediction. These constraints (which are at the core of the interpretability of linear models) allow a user to estimate the effect of each individual feature independently of the others, enhancing interpretability. In a set of experiments across different domains, we show that, without excessively compromising test performance, the structural constraints proposed in FLANs indeed facilitate the interpretability of deep learning models. We quantitatively compare the interpretability of FLANs to post-hoc methods using recently introduced metrics, discussing the advantages of natively interpretable models over a post-hoc analysis.
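The structure described above (a separate map per feature into a shared latent space, a sum, then a prediction) can be sketched in a few lines. This is a toy illustration under several assumptions, not the authors' architecture. The per-feature maps are single `tanh` layers with random weights, and the prediction head is linear, which makes the per-feature contributions exactly additive. The class and method names are hypothetical.

```python
import math
import random

def make_feature_net(latent_dim, rng):
    """Return a tiny per-feature map from a scalar to a latent vector."""
    w = [rng.uniform(-1.0, 1.0) for _ in range(latent_dim)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(latent_dim)]
    return lambda x: [math.tanh(wi * x + bi) for wi, bi in zip(w, b)]

class AdditiveLatentModel:
    """Toy feature-wise additive model with a linear head (an assumption)."""

    def __init__(self, n_features, latent_dim, seed=0):
        rng = random.Random(seed)
        # One independent network per input feature.
        self.feature_nets = [make_feature_net(latent_dim, rng)
                             for _ in range(n_features)]
        self.head = [rng.uniform(-1.0, 1.0) for _ in range(latent_dim)]

    def latent_contributions(self, x):
        """Latent vector produced by each feature, computed independently."""
        return [net(xi) for net, xi in zip(self.feature_nets, x)]

    def predict(self, x):
        """Sum the feature-wise latent vectors, then apply the head."""
        summed = [sum(col) for col in zip(*self.latent_contributions(x))]
        return sum(h * z for h, z in zip(self.head, summed))

    def feature_scores(self, x):
        """Per-feature share of the prediction (valid for a linear head)."""
        return [sum(h * z for h, z in zip(self.head, latent))
                for latent in self.latent_contributions(x)]

model = AdditiveLatentModel(n_features=3, latent_dim=4, seed=1)
x = [0.2, -1.0, 0.7]
print(model.predict(x), model.feature_scores(x))
```

Because each feature is processed in isolation, changing one input leaves the latent contributions of the others untouched. That property is what allows the effect of a feature to be read off independently. With a nonlinear head the scores would no longer sum exactly to the prediction.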
Yifan Wu, Min Zeng, Ying Yu, Min Li
Automatic International Classification of Diseases (ICD) coding is defined as a multi-label text classification problem, which is difficult because the number of labels is very large and the distribution of labels is unbalanced. The label-wise attention mechanism is widely used in automatic ICD coding because it can assign weights to every word in full Electronic Medical Records (EMR) for different ICD codes. However, the label-wise attention mechanism is computationally redundant and costly. In this paper, we propose a pseudo label-wise attention mechanism to tackle the problem. Instead of computing different attention modes for different ICD codes, the pseudo label-wise attention mechanism automatically merges similar ICD codes and computes only one attention mode for the similar ICD codes, which greatly reduces the number of attention modes and improves prediction accuracy. In addition, we apply a more convenient and effective way to obtain the ICD vectors, and thus our model can predict new ICD codes by calculating the similarities between EMR vectors and ICD vectors. Extensive experiments show the superior performance of our model. On the public MIMIC-III dataset and the private Xiangya dataset, our model achieves micro-F1 scores of 0.583 and 0.806, respectively, outperforming other competing models. Furthermore, we verify the ability of our model to predict new ICD codes. The case study shows how pseudo label-wise attention works and demonstrates the effectiveness of the pseudo label-wise attention mechanism.
Zhong Ji, Kexin Chen, Haoran Wang
Image-text matching plays a central role in bridging the semantic gap between
vision and language. The key point to achieve precise visual-semantic alignment
lies in capturing the fine-grained cross-modal correspondence between image and
text. Most previous methods rely on single-step reasoning to discover the
visual-semantic interactions, and thus lack the ability to exploit
multi-level information to locate the hierarchical fine-grained relevance.
In contrast, in this work we propose a step-wise hierarchical
alignment network (SHAN) that decomposes image-text matching into a multi-step
cross-modal reasoning process. Specifically, we first achieve local-to-local
alignment at the fragment level, followed by global-to-local and
global-to-global alignment at the context level, in sequence. This progressive
alignment strategy supplies our model with more complementary and sufficient
semantic clues to understand the hierarchical correlations between image and
text. The experimental results on two benchmark datasets demonstrate the
superiority of our proposed method.
Authors' comments: Accepted by IJCAI 2021
Zefan Li, Chenxi Liu, Alan Yuille, Bingbing Ni, Wenjun Zhang, Wen Gao
Unsupervised learning methods have recently shown their competitiveness
against supervised training. Typically, these methods use a single objective to
train the entire network. But one distinct advantage of unsupervised over
supervised learning is that the former possesses more variety and freedom in
designing the objective. In this work, we explore new dimensions of
unsupervised learning by proposing the Progressive Stage-wise Learning (PSL)
framework. For a given unsupervised task, we design multilevel tasks and define
different learning stages for the deep network. Early learning stages are
forced to focus on low-level tasks while late stages are guided to extract
deeper information through harder tasks. We discover that by progressive
stage-wise learning, unsupervised feature representation can be effectively
enhanced. Our extensive experiments show that PSL consistently improves results
for the leading unsupervised learning methods.
Authors' comments: Accepted by the IEEE Conference on Computer Vision and Pattern
Recognition, 2021
Jianqiang Huang, Ke Hu, Qingtao Tang, Mingjian Chen, Yi Qi, Jia Cheng, Jun Lei
Click-through rate (CTR) prediction plays an important role in online
advertising and recommender systems. In practice, the training of CTR models
depends on click data, which is intrinsically biased towards higher positions
since higher positions have a higher CTR by nature. Existing methods such as actual
position training with fixed position inference and inverse propensity weighted
training with no position inference alleviate the bias problem to some extent.
However, the different treatment of position information between training and
inference will inevitably lead to inconsistency and sub-optimal online
performance. Meanwhile, the basic assumption of these methods, i.e., the click
probability is the product of examination probability and relevance
probability, is oversimplified and insufficient to model the rich interaction
between position and other information. In this paper, we propose a Deep
Position-wise Interaction Network (DPIN) to efficiently combine all candidate
items and positions for estimating CTR at each position, achieving consistency
between offline and online as well as modeling the deep non-linear interaction
among position, user, context and item under the limit of serving performance.
Following our new treatment of position bias in CTR prediction, we propose
a new evaluation metric named PAUC (position-wise AUC) that is suitable for
measuring the ranking quality at a given position. Through extensive
experiments on a real-world dataset, we show empirically that our method is
both effective and efficient in solving the position bias problem. We have also
deployed our method in production and observed statistically significant
improvement over a highly optimized baseline in a rigorous A/B test.
Authors' comments: Accepted by SIGIR 2021
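The baseline assumption that this abstract criticizes, namely that click probability factorizes into examination probability times relevance probability, can be written down directly together with the inverse-propensity-weighted estimate built on it. The sketch below illustrates that baseline assumption only and is not DPIN. The examination probabilities and the click log are made-up values, and the function names are hypothetical.

```python
def click_probability(p_exam, p_rel):
    """Examination hypothesis: P(click) = P(examined) * P(relevant)."""
    return p_exam * p_rel

def naive_ctr(clicks):
    """Plain click-through rate, which mixes relevance with position effects."""
    return sum(clicks) / len(clicks)

def ipw_relevance(clicks, positions, p_exam_by_pos):
    """Inverse-propensity-weighted relevance estimate.

    Each click is divided by the examination probability of the position
    where the item was shown, which removes the position effect if the
    factorization above actually holds.
    """
    weighted = [c / p_exam_by_pos[pos] for c, pos in zip(clicks, positions)]
    return sum(weighted) / len(weighted)

# Made-up log for one item: four impressions at position 1, four at position 2.
p_exam_by_pos = {1: 1.0, 2: 0.5}   # assumed examination probabilities
positions = [1, 1, 1, 1, 2, 2, 2, 2]
clicks = [1, 0, 1, 0, 1, 0, 0, 0]

print(naive_ctr(clicks))
print(ipw_relevance(clicks, positions, p_exam_by_pos))
```

On this toy log the naive CTR understates the relevance because half of the impressions were at a position that is examined less often. The weighted estimate corrects for that only to the extent that the factorization is accurate, which is the limitation the abstract points out.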
Zichuan Lin, Jing Huang, Bowen Zhou, Xiaodong He, Tengyu Ma
Recent work (Takanobu et al., 2020) proposed the system-wise evaluation on
dialog systems and found that improvement on individual components (e.g., NLU,
policy) in prior work may not necessarily bring benefit to pipeline systems in
system-wise evaluation. To improve the system-wise performance, in this paper,
we propose new joint system-wise optimization techniques for the pipeline
dialog system. First, we propose a new data augmentation approach which
automates the labeling process for NLU training. Second, we propose a novel
stochastic policy parameterization with Poisson distribution that enables
better exploration and offers a principled way to compute policy gradient.
Third, we propose a reward bonus to help policy explore successful dialogs. Our
approaches outperform the competitive pipeline systems from Takanobu et al.
(2020) by large margins of 12% success rate in automatic system-wise evaluation
and 16% success rate in human evaluation on the standard multi-domain
benchmark dataset MultiWOZ 2.1, and also outperform the recent state-of-the-art
end-to-end trained model from DSTC9.
Authors' comments: 13 pages
Yasitha Warahena Liyanage, Daphney-Stavroula Zois, Charalampos Chelmis
In a typical supervised machine learning setting, the predictions on all test instances are based on a common subset of features discovered during model training. However, using a different subset of features that is most informative for each test instance individually may improve not only prediction accuracy but also the overall interpretability of the model. At the same time, feature selection methods for classification are known to be most effective when many features are irrelevant and/or uncorrelated. In fact, feature selection that ignores correlations between features can lead to poor classification performance. In this work, a Bayesian network is utilized to model feature dependencies. Using the dependency network, a new method is proposed that sequentially selects the best feature to evaluate for each test instance individually, and stops the selection process to make a prediction once it determines that no further improvement can be achieved with respect to classification accuracy. The optimum number of features to acquire and the optimum classification strategy are derived for each test instance. The theoretical properties of the optimum solution are analyzed, and a new algorithm is proposed that takes advantage of these properties to implement a robust and scalable solution for high-dimensional settings. The effectiveness, generalizability, and scalability of the proposed method are illustrated on a variety of real-world datasets from diverse application domains.
Lei Shen, Fandong Meng, Jinchao Zhang, Yang Feng, Jie Zhou
Generating some appealing questions in open-domain conversations is an
effective way to improve human-machine interactions and lead the topic to a
broader or deeper direction. To avoid dull or deviated questions, some
researchers have tried to utilize the answer, the "future" information, to guide
question generation. However, they separate a post-question-answer (PQA) triple
into two parts: post-question (PQ) and question-answer (QA) pairs, which may
hurt the overall coherence. Besides, the QA relationship is modeled as a
one-to-one mapping that is not reasonable in open-domain conversations. To
tackle these problems, we propose a generative triple-wise model with
hierarchical variations for open-domain conversational question generation
(CQG). Latent variables in three hierarchies are used to represent the shared
background of a triple and one-to-many semantic mappings in both PQ and QA
pairs. Experimental results on a large-scale CQG dataset show that our method
significantly improves the quality of questions in terms of fluency, coherence
and diversity over competitive baselines.
Authors' comments: To appear at ACL 2021 main conference (long paper)
Qi Tian, Kun Kuang, Kelu Jiang, Fei Wu, Yisen Wang
Adversarial training is one of the most effective approaches to improve model robustness against adversarial examples. However, previous works mainly focus on the overall robustness of the model, and an in-depth analysis of the role of each class involved in adversarial training is still missing. In this paper, we propose to analyze the class-wise robustness in adversarial training. First, we provide a detailed diagnosis of adversarial training on six benchmark datasets, i.e., MNIST, CIFAR-10, CIFAR-100, SVHN, STL-10 and ImageNet. Surprisingly, we find that there are remarkable robustness discrepancies among classes, leading to unbalanced/unfair class-wise robustness in the robust models. Furthermore, we investigate the relations between classes and find that the unbalanced class-wise robustness is fairly consistent among different attack and defense methods. Moreover, we observe that the stronger attack methods in adversarial learning achieve performance improvement mainly from a more successful attack on the vulnerable classes (i.e., classes with less robustness). Inspired by these findings, we design a simple but effective attack method based on the traditional PGD attack, named Temperature-PGD attack, which enlarges the robustness disparity among classes with a temperature factor on the confidence distribution of each image. Experiments demonstrate that our method can achieve a higher attack rate than the PGD attack. Furthermore, from the defense perspective, we also make some modifications in the training and inference phases to improve the robustness of the most vulnerable class, so as to mitigate the large difference in class-wise robustness. We believe our work can contribute to a more comprehensive understanding of adversarial training as well as a rethinking of the class-wise properties in robust models.
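The abstract does not say how the temperature factor enters the PGD objective, so the sketch below shows only the generic building block: a softmax with temperature applied to a vector of logits. The function name and the example logits are illustrative assumptions, and nothing here reproduces the Temperature-PGD attack itself.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature, computed in a numerically stable way.

    A temperature below 1 sharpens the confidence distribution and a
    temperature above 1 flattens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    shift = max(scaled)  # subtract the maximum to avoid overflow in exp
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up logits for a three-class example
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Rescaling the confidence distribution this way changes how strongly the top class dominates, which is the kind of knob a temperature factor provides.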
Alessandro Casa, Andrea Cappozzo, Michael Fop
Finite Gaussian mixture models provide a powerful and widely employed
probabilistic approach for clustering multivariate continuous data. However,
the practical usefulness of these models is jeopardized in high-dimensional
spaces, where they tend to be over-parameterized. As a consequence, different
solutions have been proposed, often relying on matrix decompositions or
variable selection strategies. Recently, a methodological link between Gaussian
graphical models and finite mixtures has been established, paving the way for
penalized model-based clustering in the presence of large precision matrices.
Notwithstanding, current methodologies implicitly assume similar levels of
sparsity across the classes, not accounting for different degrees of
association between the variables across groups. We overcome this limitation by
deriving group-wise penalty factors, which automatically enforce under- or
over-connectivity in the estimated graphs. The approach is entirely data-driven
and does not require additional hyper-parameter specification. Analyses on
synthetic and real data showcase the validity of our proposal.
Authors' comments: 41 pages, 11 figures
Qingyun Wang, Semih Yavuz, Victoria Lin, Heng Ji, Nazneen Rajani
Graph-to-text generation has benefited from pre-trained language models
(PLMs) in achieving better performance than structured graph encoders. However,
PLMs fail to fully utilize the structural information of the input graph. In
this paper, we aim to further improve the performance of the pre-trained
language model by proposing a structured graph-to-text model with a two-step
fine-tuning mechanism which first fine-tunes the model on Wikipedia before
adapting to the graph-to-text generation. In addition to using the traditional
token and position embeddings to encode the knowledge graph (KG), we propose a
novel tree-level embedding method to capture the inter-dependency structures of
the input graph. This new approach significantly improves performance
on all text generation metrics for the English WebNLG 2017 dataset.
Authors' comments: 10 pages, Accepted by Proceedings of ACL-IJCNLP 2021 Student Research
Workshop, Code and Resources at
https://github.com/EagleW/Stage-wise-Fine-tuning
Polychronis Charitidis, Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Ioannis Kompatsiaris
In this work, we present a deep learning-based approach for image tampering
localization fusion. This approach is designed to combine the outcomes of
multiple image forensics algorithms and provides a fused tampering localization
map, which requires no expert knowledge and is easier to interpret by end
users. Our fusion framework includes a set of five individual tampering
localization methods for splicing localization on JPEG images. The proposed
deep learning fusion model is an adapted architecture, initially proposed for
the image restoration task, that performs multiple operations in parallel,
weighted by an attention mechanism to enable the selection of proper operations
depending on the input signals. This weighting process can be very beneficial
for cases where the input signal is very diverse, as in our case where the
output signals of multiple image forensics algorithms are combined. Evaluation
in three publicly available forensics datasets demonstrates that the
performance of the proposed approach is competitive, outperforming the
individual forensics techniques as well as another recently proposed fusion
framework in the majority of cases.
Authors' comments: 6 pages, 3 figures, CBMI
Ivan Lazarevich, Alexander Kozlov, Nikita Malinin
We present a post-training weight pruning method for deep neural networks that achieves accuracy levels tolerable for the production setting and that is sufficiently fast to be run on commodity hardware such as desktop CPUs or edge devices. We propose a data-free extension of the approach for computer vision models based on automatically-generated synthetic fractal images. We obtain state-of-the-art results for data-free neural network pruning, with ~1.5% top@1 accuracy drop for a ResNet50 on ImageNet at 50% sparsity rate. When using real data, we are able to get a ResNet50 model on ImageNet with 65% sparsity rate in 8-bit precision in a post-training setting with a ~1% top@1 accuracy drop. We release the code as a part of the OpenVINO(TM) Post-Training Optimization tool.
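To make "sparsity rate" concrete, the sketch below zeroes out a given fraction of a weight list after training, using plain magnitude pruning. The abstract does not state the pruning criterion, so magnitude pruning is swapped in here as a common baseline and is not presented as the authors' method. The function name and the weights are made up.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    Magnitude-based selection is an assumption for illustration; other
    criteria could be used to pick which weights to remove.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

def sparsity_of(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)

layer = [0.9, -0.1, 0.05, -0.7, 0.3, 0.02]  # made-up weights
pruned = magnitude_prune(layer, sparsity=0.5)
print(pruned, sparsity_of(pruned))
```

No retraining happens after the weights are removed, which is what distinguishes a post-training setting from pruning during fine-tuning.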
Pranoy Panda, Sai Srinivas Kancheti, Vineeth N Balasubramanian
We formulate a causal extension to the recently introduced paradigm of
instance-wise feature selection to explain black-box visual classifiers. Our
method selects a subset of input features that has the greatest causal effect
on the model's output. We quantify the causal influence of a subset of features
by the Relative Entropy Distance measure. Under certain assumptions this is
equivalent to the conditional mutual information between the selected subset
and the output variable. The resulting causal selections are sparser and cover
salient objects in the scene. We show the efficacy of our approach on multiple
vision datasets by measuring the post-hoc accuracy and Average Causal Effect of
selected features on the model's output.
Authors' comments: 6 pages, 5 figures. Accepted at the Causality in Vision workshop,
CVPR 2021
Usman Sajid, Michael Chow, Jin Zhang, Taejoon Kim, Guanghui Wang
The paper proposes a new text recognition network for scene-text images. Many state-of-the-art methods employ the attention mechanism either in the text encoder or decoder for the text alignment. Although encoder-based attention yields promising results, these schemes have noticeable limitations. They perform the feature extraction (FE) and visual attention (VA) sequentially, which restricts the attention mechanism to rely only on the final single-scale output of the FE. Moreover, the utilization of the attention process is limited by applying it directly only to the single-scale feature maps. To address these issues, we propose a new multi-scale and encoder-based attention network for text recognition that performs the multi-scale FE and VA in parallel. The multi-scale channels also undergo regular fusion with each other to develop coordinated knowledge together. Quantitative evaluation and robustness analysis on the standard benchmarks demonstrate that the proposed network outperforms the state of the art in most cases.
Weizhe Liu, David Ferstl, Samuel Schulter, Lukas Zebedin, Pascal Fua, Christian Leistner
We introduce a novel approach to unsupervised and semi-supervised domain adaptation for semantic segmentation. Unlike many earlier methods that rely on adversarial learning for feature alignment, we leverage contrastive learning to bridge the domain gap by aligning the features of structurally similar label patches across domains. As a result, the networks are easier to train and deliver better performance. Our approach consistently outperforms state-of-the-art unsupervised and semi-supervised methods on two challenging domain adaptive segmentation tasks, particularly with a small number of target domain annotations. It can also be naturally extended to weakly-supervised domain adaptation, where only a minor drop in accuracy can save up to 75% of annotation cost.
Daniel Geng, Max Hamilton, Andrew Owens
Image prediction methods often struggle on tasks that require changing the
positions of objects, such as video prediction, producing blurry images that
average over the many positions that objects might occupy. In this paper, we
propose a simple change to existing image similarity metrics that makes them
more robust to positional errors: we match the images using optical flow, then
measure the visual similarity of corresponding pixels. This change leads to
crisper and more perceptually accurate predictions, and does not require
modifications to the image prediction network. We apply our method to a variety
of video prediction tasks, where it obtains strong performance with simple
network architectures, and to the closely related task of video interpolation.
Code and results are available at our webpage:
https://dangeng.github.io/CorrWiseLosses
Authors' comments: CVPR 2022 Camera Ready
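The change described in this abstract (match the two images with optical flow, then compare corresponding pixels) can be illustrated on a toy grayscale example. The sketch below makes several simplifying assumptions that are not from the paper: the flow is given rather than estimated, it is integer-valued, the prediction is warped toward the target with nearest-pixel lookup and border clamping, and the base metric is L1. All function names are hypothetical.

```python
def warp_by_flow(image, flow):
    """Backward warp: output[y][x] = image[y + dy][x + dx], clamped at borders.

    `flow[y][x]` is an integer (dy, dx) offset for each output pixel.
    """
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dy, dx = flow[y][x]
            sy = min(max(y + dy, 0), h - 1)
            sx = min(max(x + dx, 0), w - 1)
            row.append(image[sy][sx])
        out.append(row)
    return out

def l1_distance(a, b):
    """Mean absolute difference between two images of equal size."""
    h, w = len(a), len(a[0])
    return sum(abs(a[y][x] - b[y][x]) for y in range(h) for x in range(w)) / (h * w)

def correspondence_wise_l1(pred, target, flow):
    """Align the prediction to the target with the flow, then take L1."""
    return l1_distance(warp_by_flow(pred, flow), target)

# The predicted blob sits one pixel to the right of where the target has it.
target = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
pred = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
flow = [[(0, 1)] * 4 for _ in range(3)]  # every target pixel looks one step right

print(l1_distance(pred, target))
print(correspondence_wise_l1(pred, target, flow))
```

The plain L1 distance penalizes the one-pixel shift twice (a miss and a false hit), while the aligned comparison sees matching content. This is the positional robustness the abstract refers to. A practical version would estimate the flow and use differentiable sampling so that gradients reach the prediction network.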
Peter A. Monkewitz
The scaling of different features of stream-wise normal stress profiles
$\langle uu\rangle^+(y^+)$ in turbulent wall-bounded flows, in particular in
truly parallel flows, such as channel and pipe flows, is the subject of a
long-running debate. Particular points of contention are the scaling of the "inner"
and "outer" peaks of $\langle uu\rangle^+$ at $y^+\approxeq 15$ and $y^+
=\mathcal{O}(10^3)$, respectively, their infinite Reynolds number limit, and
the rate of logarithmic decay in the outer part of the flow. Inspired by the
landmark paper of Chen and Sreenivasan (2021), two terms of the inner
asymptotic expansion of $\langle uu\rangle^+$ in the small parameter
$Re_\tau^{-1/4}$ are extracted for the first time from a set of direct
numerical simulations (DNS) of channel flow. This inner expansion is completed
by a matching outer expansion, which not only fits the same set of channel DNS
within 1.5\% of the peak stress, but also provides a good match of laboratory
data in pipes and the near-wall part of boundary layers, up to the highest
$Re_\tau$'s of order $10^5$. The salient features of the new composite
expansion are first, an inner $\langle uu\rangle^+$ peak, which saturates at
11.3 and decreases as $Re_\tau^{-1/4}$, followed by a short "wall loglaw" with
a slope that becomes positive for $Re_\tau \gtrapprox 20'000$, leading up to an
outer peak, and an outer logarithmic overlap with a negative slope continuously
going to zero for $Re_\tau \to\infty$.
Authors' comments: 10 pages, 4 figures