Mehmet Ozgur Turkoglu, Alexander Becker, Hüseyin Anil Gündüz, Mina Rezaei, Bernd Bischl, Rodrigo Caye Daudt, Stefano D'Aronco, Jan Dirk Wegner et al.
The ability to estimate epistemic uncertainty is often crucial when deploying
machine learning in the real world, but modern methods often produce
overconfident, uncalibrated uncertainty predictions. A common approach to
quantify epistemic uncertainty, usable across a wide class of prediction
models, is to train a model ensemble. In a naive implementation, the ensemble
approach has high computational cost and high memory demand. This challenges in
particular modern deep learning, where even a single deep network is already
demanding in terms of compute and memory, and has given rise to a number of
attempts to emulate the model ensemble without actually instantiating separate
ensemble members. We introduce FiLM-Ensemble, a deep, implicit ensemble method
based on the concept of Feature-wise Linear Modulation (FiLM). That technique
was originally developed for multi-task learning, with the aim of decoupling
different tasks. We show that the idea can be extended to uncertainty
quantification: by modulating the network activations of a single deep network
with FiLM, one obtains a model ensemble with high diversity, and consequently
well-calibrated estimates of epistemic uncertainty, with low computational
overhead in comparison. Empirically, FiLM-Ensemble outperforms other implicit
ensemble methods, and it comes very close to the upper bound of an explicit
ensemble of networks (sometimes even beating it), at a fraction of the memory
cost.
Authors' comments: accepted at NeurIPS 2022
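The modulation idea described above can be illustrated with a minimal sketch. It assumes one shared linear layer and one (gamma, beta) pair per ensemble member; the layer sizes, the random initialization, and the averaging of member outputs are illustrative assumptions, not details taken from the paper.

```python
import random

def film(features, gamma, beta):
    """Feature-wise linear modulation: scale and shift each feature."""
    return [g * x + b for x, g, b in zip(features, gamma, beta)]

def shared_layer(x, weights):
    """A linear layer whose weights are shared by every ensemble member."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def ensemble_forward(x, weights, film_params):
    """Apply the shared layer once, then each member's FiLM parameters and a ReLU."""
    hidden = shared_layer(x, weights)
    outputs = []
    for gamma, beta in film_params:
        modulated = film(hidden, gamma, beta)
        outputs.append([max(0.0, h) for h in modulated])
    return outputs

# Hypothetical sizes and random parameters, for illustration only.
rng = random.Random(0)
num_members, in_dim, hidden_dim = 4, 3, 5
weights = [[rng.gauss(0, 1) for _ in range(in_dim)] for _ in range(hidden_dim)]
film_params = [
    (
        [rng.gauss(1, 0.5) for _ in range(hidden_dim)],  # gamma (scale)
        [rng.gauss(0, 0.5) for _ in range(hidden_dim)],  # beta (shift)
    )
    for _ in range(num_members)
]

member_outputs = ensemble_forward([0.5, -1.0, 2.0], weights, film_params)
# Averaging member outputs is one simple way to combine them (an assumption here).
mean_output = [sum(col) / num_members for col in zip(*member_outputs)]
```

Only the per-member scale and shift vectors differ between members, which is why the memory cost grows far more slowly than with separately instantiated networks.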
Sebastian Krieter, Thomas Thüm, Sandro Schulze, Sebastian Ruland, Malte Lochau, Gunter Saake, Thomas Leich
Sampling techniques, such as t-wise interaction sampling, are used to enable
efficient testing for configurable systems. This is achieved by generating a
small yet representative sample of configurations for a system, which
circumvents testing the entire solution space. However, by design, most recent
approaches for t-wise interaction sampling only consider combinations of
configuration options from a configurable system's variability model and do not
take into account their mapping onto the solution space, thus potentially
leaving critical implementation artifacts untested. Tartler et al. address this
problem by considering presence conditions of implementation artifacts rather
than pure configuration options, but do not consider the possible interactions
between these artifacts. In this paper, we introduce t-wise presence condition
coverage, which extends the approach of Tartler et al. by using presence
conditions extracted from the code as basis to cover t-wise interactions. This
ensures that all t-wise interactions of implementation artifacts are included
in the sample and that the chance of detecting combinations of faulty
configuration options is increased. We evaluate our approach in terms of
testing efficiency and testing effectiveness by comparing the approach to
existing t-wise interaction sampling techniques. We show that t-wise presence
condition sampling is able to produce mostly smaller samples compared to t-wise
interaction sampling, while guaranteeing a t-wise presence condition coverage
of 100%.
Authors' comments: 28 pages
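To make the notion of t-wise coverage concrete, the following sketch measures what fraction of all t-wise interactions among plain Boolean options a given sample covers. It is a simplification under stated assumptions: it ignores variability-model constraints (every combination is treated as valid) and works on configuration options rather than on presence conditions extracted from code. The function names and the toy sample are hypothetical.

```python
from itertools import combinations, product

def t_wise_interactions(options, t):
    """Every True/False assignment to every t-subset of the options."""
    result = set()
    for subset in combinations(sorted(options), t):
        for values in product([True, False], repeat=t):
            result.add(tuple(zip(subset, values)))
    return result

def covered_interactions(sample, options, t):
    """Interactions that occur in at least one configuration of the sample."""
    covered = set()
    for config in sample:  # a configuration is the set of selected options
        for subset in combinations(sorted(options), t):
            covered.add(tuple((o, o in config) for o in subset))
    return covered

def t_wise_coverage(sample, options, t):
    """Fraction of all t-wise interactions covered by the sample."""
    total = t_wise_interactions(options, t)
    return len(covered_interactions(sample, options, t) & total) / len(total)

options = {"A", "B", "C"}
sample = [{"A", "B"}, {"C"}, {"A", "C"}, {"B"}]
pairwise = t_wise_coverage(sample, options, 2)  # 10 of the 12 pairs are covered
```

A presence-condition variant would replace the option-level tuples with assignments to the Boolean expressions guarding each implementation artifact; that step is not shown here.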
Bowen Li, Jianfeng Lu, Ziang Yu
This work aims to numerically construct exactly commuting matrices close to
given almost commuting ones, which is equivalent to the joint approximate
diagonalization problem. We first prove that almost commuting matrices
generically have approximate common eigenvectors that are almost orthogonal to
each other. Based on this key observation, we propose a fast and robust
vector-wise joint diagonalization (VJD) algorithm, which constructs the
orthogonal similarity transform by sequentially finding these approximate
common eigenvectors. In doing so, we consider sub-optimization problems over
the unit sphere, for which we present a Riemannian quasi-Newton method with
rigorous convergence analysis. We also discuss the numerical stability of the
proposed VJD algorithm. Numerical examples with applications in independent
component analysis are provided to reveal the relation with Huaxin Lin's
theorem and to demonstrate that our method compares favorably with the
state-of-the-art Jacobi-type joint diagonalization algorithm.
Authors' comments: revised
Jiantao Wu, Shentong Mo
Self-supervised pre-training for images without labels has recently achieved promising performance in image classification. The success of transformer-based methods, ViT and MAE, draws the community's attention to the design of backbone architecture and self-supervised task. In this work, we show that current masked image encoding models learn the underlying relationship between all objects in the whole scene, instead of a single object representation. As a result, those methods incur substantial compute time for self-supervised pre-training. To solve this issue, we introduce a novel object selection and division strategy to drop non-object patches for learning object-wise representations by selective reconstruction with interested region masks. We refer to this method as ObjMAE. Extensive experiments on four commonly-used datasets demonstrate the effectiveness of our model in reducing the compute cost by 72% while achieving competitive performance. Furthermore, we investigate the inter-object and intra-object relationship and find that the latter is crucial for self-supervised pre-training.
Samarth Tiwari, Michelle Yeo, Zeta Avarikioti, Iosif Salem, Krzysztof Pietrzak, Stefan Schmid
Payment channel networks (PCNs) are one of the most prominent solutions to the limited transaction throughput of blockchains. Nevertheless, PCNs themselves suffer from a throughput limitation due to the capital constraints of their channels. A similar dependence on high capital is also found in inter-bank payment settlements, where the so-called netting technique is used to mitigate liquidity demands. In this work, we alleviate this limitation by introducing the notion of transaction aggregation: instead of executing transactions sequentially through a PCN, we enable senders to aggregate multiple transactions and execute them simultaneously to benefit from several amounts that may "cancel out". Two direct advantages of our proposal are the decrease in intermediary fees paid by senders as well as the obfuscation of the transaction data from the intermediaries. We formulate the transaction aggregation as a computational problem, a generalization of the Bank Clearing Problem. We present a generic framework for the transaction aggregation execution, and thereafter we propose Wiser as an implementation of this framework in a specific hub-based setting. To overcome the NP-hardness of the transaction aggregation problem, in Wiser we propose a fixed-parameter linear algorithm for a special case of transaction aggregation as well as the Bank Clearing Problem. Wiser can also be seen as a modern variant of the Hawala money transfer system, as well as a decentralized implementation of the overseas remittance service of Wise.
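The "cancel out" effect behind netting can be shown with simple arithmetic. The sketch below computes each party's net position from a batch of transactions and compares gross volume with the volume left after netting. It illustrates the netting idea only; it is not the Wiser protocol or the paper's algorithm, and the parties and amounts are made up.

```python
from collections import defaultdict

def net_positions(transactions):
    """Net balance change per party for a batch of (sender, receiver, amount)."""
    net = defaultdict(int)
    for sender, receiver, amount in transactions:
        net[sender] -= amount
        net[receiver] += amount
    return dict(net)

# Hypothetical batch of three transactions.
txs = [("alice", "bob", 5), ("bob", "carol", 5), ("carol", "alice", 3)]

positions = net_positions(txs)
gross_volume = sum(amount for _, _, amount in txs)
# After netting, only the positive net positions still have to be funded.
net_volume = sum(v for v in positions.values() if v > 0)
```

In this toy batch the gross volume is 13 units, while the netted positions only require moving 2 units, which is the kind of saving in liquidity that aggregation aims for.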
Chae Eun Lee, Hyelim Park, Yeong-Gil Shin, Minyoung Chung
Semi-supervised learning for medical image segmentation is an important area of research for alleviating the huge cost associated with the construction of reliable large-scale annotations in the medical domain. Recent semi-supervised approaches have demonstrated promising results by employing consistency regularization, pseudo-labeling techniques, and adversarial learning. These methods primarily attempt to learn the distribution of labeled and unlabeled data by enforcing consistency in the predictions or embedding context. However, previous approaches have focused only on local discrepancy minimization or context relations across single classes. In this paper, we introduce a novel adversarial learning-based semi-supervised segmentation method that effectively embeds both local and global features from multiple hidden layers and learns context relations between multiple classes. Our voxel-wise adversarial learning method utilizes a voxel-wise feature discriminator, which considers multilayer voxel-wise features (involving both local and global features) as an input by embedding class-specific voxel-wise feature distribution. Furthermore, we improve our previous representation learning method by overcoming information loss and learning stability problems, which enables rich representations of labeled data. Our method outperforms current best-performing state-of-the-art semi-supervised learning approaches on the image segmentation of the left atrium (single class) and multiorgan datasets (multiclass). Moreover, our visual interpretation of the feature space demonstrates that our proposed method enables a well-distributed and separated feature space from both labeled and unlabeled data, which improves the overall prediction results.
Hrayr Harutyunyan, Greg Ver Steeg, Aram Galstyan
Some of the tightest information-theoretic generalization bounds depend on
the average information between the learned hypothesis and a single training
example. However, these sample-wise bounds were derived only for expected
generalization gap. We show that even for expected squared generalization gap
no such sample-wise information-theoretic bounds exist. The same is true for
PAC-Bayes and single-draw bounds. Remarkably, PAC-Bayes, single-draw and
expected squared generalization gap bounds that depend on information in pairs
of examples exist.
Authors' comments: 2022 IEEE Information Theory Workshop
Xiaosong Ma, Jie Zhang, Song Guo, Wenchao Xu
Personalized Federated Learning (pFL) can not only capture the common priors from a broad range of distributed data, but also support customized models for heterogeneous clients. Research over the past few years has applied weighted aggregation to produce personalized models, where the weights are determined by calibrating the distance of the entire model parameters or loss values, but has yet to consider the layer-level impacts on the aggregation process, leading to lagged model convergence and inadequate personalization over non-IID datasets. In this paper, we propose a novel pFL training framework dubbed Layer-wised Personalized Federated learning (pFedLA) that can discern the importance of each layer from different clients, and thus is able to optimize the personalized model aggregation for clients with heterogeneous data. Specifically, we employ a dedicated hypernetwork per client on the server side, which is trained to identify the mutual contribution factors at layer granularity. Meanwhile, a parameterized mechanism is introduced to update the layer-wised aggregation weights to progressively exploit the inter-user similarity and realize accurate model personalization. Extensive experiments are conducted over different models and learning tasks, and we show that the proposed methods achieve significantly higher performance than state-of-the-art pFL methods.
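Aggregation at layer granularity can be sketched as a separate weighted average per layer. In the example below the per-layer client weights are fixed by hand, whereas the abstract describes them as produced by a hypernetwork; the layer names, parameter values, and weights are hypothetical.

```python
def aggregate_layerwise(client_models, layer_weights):
    """Weighted average of client parameters, with separate weights per layer.

    client_models: list of dicts mapping layer name -> list of floats.
    layer_weights: dict mapping layer name -> one weight per client.
    """
    aggregated = {}
    for layer, weights in layer_weights.items():
        params = [model[layer] for model in client_models]
        aggregated[layer] = [
            sum(w * p[i] for w, p in zip(weights, params))
            for i in range(len(params[0]))
        ]
    return aggregated

# Two hypothetical clients with a shared feature layer and a personal head.
client_a = {"conv": [1.0, 3.0], "head": [10.0]}
client_b = {"conv": [3.0, 5.0], "head": [20.0]}

# Personalized model for client A: share "conv" equally, keep A's own "head".
weights_for_a = {"conv": [0.5, 0.5], "head": [1.0, 0.0]}
personalized_a = aggregate_layerwise([client_a, client_b], weights_for_a)
```

A single scalar weight per client would force the same mixing ratio on every layer; separate weights let a client borrow generic layers from others while keeping task-specific layers its own.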
Hui Liu, Yibiao Huang, Xuejun Liu, Lei Deng
Accurate and efficient prediction of the molecular properties of drugs is one of the fundamental problems in drug research and development. Recent advancements in representation learning have been shown to greatly improve the performance of molecular property prediction. However, due to limited labeled data, supervised learning-based molecular representation algorithms can only search limited chemical space, which results in poor generalizability. In this work, we proposed a self-supervised representation learning framework for large-scale unlabeled molecules. We developed a novel molecular graph augmentation strategy, referred to as attention-wise graph mask, to generate challenging positive samples for contrastive learning. We adopted the graph attention network (GAT) as the molecular graph encoder, and leveraged the learned attention scores as masking guidance to generate molecular augmentation graphs. By minimizing the contrastive loss between the original graph and the masked graph, our model can capture important molecular structure and higher-order semantic information. Extensive experiments showed that our attention-wise graph mask contrastive learning exhibits state-of-the-art performance in a couple of downstream molecular property prediction tasks.
Leibo Liu, Oscar Perez-Concha, Anthony Nguyen, Vicki Bennett, Louisa Jorm
International Classification of Diseases (ICD) coding plays an important role in systematically classifying morbidity and mortality data. In this study, we propose a hierarchical label-wise attention Transformer model (HiLAT) for the explainable prediction of ICD codes from clinical documents. HiLAT first fine-tunes a pretrained Transformer model to represent the tokens of clinical documents. We subsequently employ a two-level hierarchical label-wise attention mechanism that creates label-specific document representations. These representations are in turn used by a feed-forward neural network to predict whether a specific ICD code is assigned to the input clinical document of interest. We evaluate HiLAT using hospital discharge summaries and their corresponding ICD-9 codes from the MIMIC-III database. To investigate the performance of different types of Transformer models, we develop ClinicalplusXLNet, which conducts continual pretraining from XLNet-Base using all the MIMIC-III clinical notes. The experimental results show that the F1 scores of the HiLAT+ClinicalplusXLNet outperform the previous state-of-the-art models for the top-50 most frequent ICD-9 codes from MIMIC-III. Visualisations of attention weights present a potential explainability tool for checking the face validity of ICD code predictions.
Yuanhao Cai, Jing Lin, Zudi Lin, Haoqian Wang, Yulun Zhang, Hanspeter Pfister, Radu Timofte, Luc Van Gool
Existing leading methods for spectral reconstruction (SR) focus on designing
deeper or wider convolutional neural networks (CNNs) to learn the end-to-end
mapping from the RGB image to its hyperspectral image (HSI). These CNN-based
methods achieve impressive restoration performance while showing limitations in
capturing the long-range dependencies and self-similarity prior. To cope with
this problem, we propose a novel Transformer-based method, Multi-stage
Spectral-wise Transformer (MST++), for efficient spectral reconstruction. In
particular, we employ Spectral-wise Multi-head Self-attention (S-MSA), which is
based on the spatially sparse yet spectrally self-similar nature of HSIs, to
compose the basic unit, the Spectral-wise Attention Block (SAB). SABs then build up a
Single-stage Spectral-wise Transformer (SST) that exploits a U-shaped structure
to extract multi-resolution contextual information. Finally, our MST++,
cascaded by several SSTs, progressively improves the reconstruction quality
from coarse to fine. Comprehensive experiments show that our MST++
significantly outperforms other state-of-the-art methods. In the NTIRE 2022
Spectral Reconstruction Challenge, our approach won first place. Code and
pre-trained models are publicly available at
https://github.com/caiyuanhao1998/MST-plus-plus.
Authors' comments: Winner of NTIRE 2022 Challenge on Spectral Reconstruction from RGB;
The First Transformer-based Method for Spectral Reconstruction
Furkan Kınlı, Barış Özcan, Furkan Kıraç
Image-level corruptions and perturbations degrade the performance of CNNs on
different downstream vision tasks. Social media filters are one of the most
common sources of various corruptions and perturbations for real-world visual
analysis applications. The negative effects of these distractive factors can be
alleviated by recovering the original images with their pure style for the
inference of the downstream vision tasks. Assuming these filters substantially
inject a piece of additional style information to the social media images, we
can formulate the problem of recovering the original versions as a reverse
style transfer problem. We introduce Contrastive Instagram Filter Removal
Network (CIFR), which enhances this idea for Instagram filter removal by
employing a novel multi-layer patch-wise contrastive style learning mechanism.
Experiments show our proposed strategy produces better qualitative and
quantitative results than the previous studies. Moreover, we present the
results of our additional experiments for proposed architecture within
different settings. Finally, we present the inference outputs and quantitative
comparison of filtered and recovered images on localization and segmentation
tasks to support the main motivation for this problem.
Authors' comments: Accepted to NTIRE: New Trends in Image Restoration and Enhancement
workshop and challenges at CVPR 2022
Amir Valizadeh
The human nervous system utilizes synaptic plasticity to solve optimization
problems. Previous studies have tried to add the plasticity factor to the
training process of artificial neural networks, but most of those models
require complex external control over the network or complex novel rules. In
this manuscript, a novel nature-inspired optimization algorithm is introduced
that imitates biological neural plasticity. Furthermore, the model is tested on
three datasets and the results are compared with gradient descent optimization.
Authors' comments: 12 pages, 4 figures
Zhengqi Gao, Sucheng Ren, Zihui Xue, Siting Li, Hang Zhao
Multimodal fusion has emerged as an appealing technique to improve model performance on many tasks. Nevertheless, the robustness of such fusion methods is rarely addressed in the present literature. In this paper, we propose a training-free robust late-fusion method by exploiting a conditional independence assumption and Jacobian regularization. Our key idea is to minimize the Frobenius norm of a Jacobian matrix, where the resulting optimization problem is relaxed to a tractable Sylvester equation. Furthermore, we provide a theoretical error bound of our method and some insights about the function of the extra modality. Several numerical experiments on AV-MNIST, RAVDESS, and VGGsound demonstrate the efficacy of our method under both adversarial attacks and random corruptions.
Jan Friedrich, Daniela Moreno, Michael Sinhuber, Matthias Waechter, Joachim Peinke
Accurate models of turbulent wind fields have become increasingly important
in the atmospheric sciences, e.g., for the determination of spatiotemporal
correlations in wind parks, the estimation of individual loads on turbine rotor
and blades, or for the modeling of particle-turbulence interaction in
atmospheric clouds or pollutant distributions in urban settings. Due to the
prohibitive task of resolving the fields across a broad range of scales, one
oftentimes has to resort to stochastic wind field models that fulfill specific,
empirically observed, properties. Here, we present a new model for the
generation of synthetic wind fields that can be understood as an extension of
the well-known Mann model for inflow turbulence in the wind energy sciences.
Whereas such Gaussian random field models solely control second-order
statistics (i.e., velocity correlation tensors or kinetic energy spectra), we
explicitly show that our extended model emulates the effects of higher-order
statistics as well. Most importantly, the empirically observed phenomenon of
small-scale intermittency, which can be regarded as one of the key features of
atmospheric turbulent flows, is reproduced with high accuracy and at
considerably low computational cost. Our method is based on a recently
developed multipoint statistical description of turbulent velocity fields [J.
Friedrich et al., J. Phys. Complex. 2 045006 (2021)] and consists of a
superposition of multivariate Gaussian statistics with fluctuating covariances.
We demonstrate exemplarily how such "superstatistical" wind fields can be
constrained on a certain number of point-wise measurement data from a
meteorological mast array.
Authors' comments: 12 pages, 5 figures
Norihide Tokushige
Let $\mathcal G$ be a family of subsets of an $n$-element set. The family
$\mathcal G$ is called non-trivial $3$-wise intersecting if the intersection of
any three subsets in $\mathcal G$ is non-empty, but the intersection of all
subsets is empty. For a real number $p\in(0,1)$ we define the measure of the
family by the sum of $p^{|G|}(1-p)^{n-|G|}$ over all $G\in\mathcal G$. We
determine the maximum measure of non-trivial $3$-wise intersecting families. We
also discuss the uniqueness and stability of the corresponding optimal
structure. These results are obtained by solving linear programming problems.
Authors' comments: 30 pages
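The two definitions above translate directly into code. The sketch below computes the measure of a family and checks the non-trivial 3-wise intersecting property on a small example. It assumes the common convention that the three subsets need not be distinct; the example family (all subsets of size at least 3 of a 4-element set) is chosen for illustration and is not claimed to be the extremal structure found in the paper.

```python
from itertools import combinations, combinations_with_replacement

def measure(family, n, p):
    """Sum of p^|G| (1-p)^(n-|G|) over all members G of the family."""
    return sum(p ** len(g) * (1 - p) ** (n - len(g)) for g in family)

def is_nontrivial_3wise_intersecting(family):
    """Any three members intersect, but no element lies in all members."""
    sets = [frozenset(g) for g in family]
    if not sets:
        return False
    # Assumption: the three members may coincide (triples with repetition).
    every_triple_meets = all(
        a & b & c for a, b, c in combinations_with_replacement(sets, 3)
    )
    common = frozenset.intersection(*sets)
    return every_triple_meets and not common

# Example: all subsets of {1, 2, 3, 4} with at least three elements.
n = 4
family = [set(c) for k in (3, 4) for c in combinations(range(1, n + 1), k)]

nontrivial = is_nontrivial_3wise_intersecting(family)
mu = measure(family, n, 0.5)  # 4 * p^3 * (1 - p) + p^4 at p = 0.5
```

Each member of this family misses at most one element, so three members together miss at most three of the four elements and must share one, while the four 3-element subsets have no common element.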
Haozhe Wang, Zhiyang Liu, Lei Zhou, Huan Yin, Marcelo H Ang
Vision-based grasp estimation is an essential part of robotic manipulation
tasks in the real world. Existing planar grasp estimation algorithms have been
demonstrated to work well in relatively simple scenes. But when it comes to
complex scenes, such as cluttered scenes with messy backgrounds and moving
objects, the algorithms from previous works are prone to generate inaccurate
and unstable grasping contact points. In this work, we first study the existing
planar grasp estimation algorithms and analyze the related challenges in
complex scenes. Secondly, we design a Pixel-wise Efficient Grasp Generation
Network (PEGG-Net) to tackle the problem of grasping in complex scenes.
PEGG-Net can achieve improved state-of-the-art performance on the Cornell
dataset (98.9%) and second-best performance on the Jacquard dataset (93.8%),
outperforming other existing algorithms without the introduction of complex
structures. Thirdly, PEGG-Net could operate in a closed-loop manner for added
robustness in dynamic environments using position-based visual servoing (PBVS).
Finally, we conduct real-world experiments on static, dynamic, and cluttered
objects in different complex scenes. The results show that our proposed network
achieves a high success rate in grasping irregular objects, household objects,
and workshop tools. To benefit the community, our trained model and
supplementary materials are available at https://github.com/HZWang96/PEGG-Net.
Authors' comments: An updated version of the paper. Fixed typos and added new content
Ryoma Kobayashi, Yusuke Mukuta, Tatsuya Harada
Learning from Label Proportions (LLP) is a weakly supervised learning method
that aims to perform instance classification from training data consisting of
pairs of bags containing multiple instances and the class label proportions
within the bags. Previous studies on multiclass LLP can be divided into two
categories according to the learning task: per-instance label classification
and per-bag label proportion estimation. However, these methods often result
in high-variance estimates of the risk when applied to complex models, or lack
statistical learning theory arguments. To address this issue, we propose new
learning methods based on statistical learning theory for both per-instance and
per-bag policies. We demonstrate that the proposed methods are respectively
risk-consistent and classifier-consistent in an instance-wise manner, and
analyze the estimation error bounds. Additionally, we present a heuristic
approximation method that utilizes an existing method for regressing label
proportions to reduce the computational complexity of the proposed methods.
Through benchmark experiments, we demonstrated the effectiveness of the
proposed methods.
Authors' comments: 21 pages, 5 figures
Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari
In this paper, we propose a method to generate personalized filled pauses
(FPs) with group-wise prediction models. Compared with fluent text generation,
disfluent text generation has not been widely explored. To generate more
human-like texts, we addressed disfluent text generation. The usage of
disfluency, such as FPs, rephrases, and word fragments, differs from speaker to
speaker, and thus, the generation of personalized FPs is required. However, it
is difficult to predict them because of the sparsity of position and the
frequency difference between more and less frequently used FPs. Moreover, it is
sometimes difficult to adapt FP prediction models to each speaker because of
the large variation of the tendency within each speaker. To address these
issues, we propose a method to build group-dependent prediction models by
grouping speakers on the basis of their tendency to use FPs. This method does
not require a large amount of data and time to train each speaker model. We
further introduce a loss function and a word embedding model suitable for FP
prediction. Our experimental results demonstrate that group-dependent models
can predict FPs with higher scores than a non-personalized one and the
introduced loss function and word embedding model improve the prediction
performance.
Authors' comments: Accepted to LREC 2022
Chen Tang, Kai Ouyang, Zhi Wang, Yifei Zhu, Yaowei Wang, Wen Ji, Wenwu Zhu
The exponentially large discrete search space in mixed-precision quantization
(MPQ) makes it hard to determine the optimal bit-width for each layer. Previous
works usually resort to iterative search methods on the training set, which
consume hundreds or even thousands of GPU-hours. In this study, we reveal that
some unique learnable parameters in quantization, namely the scale factors in
the quantizer, can serve as importance indicators of a layer, reflecting the
contribution of that layer to the final accuracy at certain bit-widths. These
importance indicators naturally perceive the numerical transformation during
quantization-aware training, which can precisely provide quantization
sensitivity metrics of layers. However, a deep network always contains hundreds
of such indicators, and training them one by one would lead to an excessive
time cost. To overcome this issue, we propose a joint training scheme that can
obtain all indicators at once. It considerably speeds up the indicator
training process by parallelizing the original sequential training processes.
With these learned importance indicators, we formulate the MPQ search problem
as a one-time integer linear programming (ILP) problem. That avoids the
iterative search and significantly reduces search time without limiting the
bit-width search space. For example, MPQ search on ResNet18 with our indicators
takes only 0.06 s, which improves time efficiency exponentially compared to
iterative search methods. Also, extensive experiments show our approach can
achieve SOTA accuracy on ImageNet for far-ranging models with various
constraints (e.g., BitOps, compression rate). Code is available at
https://github.com/1hunters/LIMPQ.
Authors' comments: Published at ECCV 2022, code is available at
https://github.com/1hunters/LIMPQ