Winfried Lötzsch, Max Reimann, Martin Büssemeyer, Amir Semmo, Jürgen Döllner, Matthias Trapp
Image-based artistic rendering can synthesize a variety of expressive styles
using algorithmic image filtering. In contrast to deep learning-based methods,
these heuristics-based filtering techniques can operate on high-resolution
images, are interpretable, and can be parameterized according to various design
aspects. However, adapting or extending these techniques to produce new styles
is often a tedious and error-prone task that requires expert knowledge. We
propose a new paradigm to alleviate this problem: implementing algorithmic
image filtering techniques as differentiable operations that can learn
parametrizations aligned to certain reference styles. To this end, we present
WISE, an example-based image-processing system that can handle a multitude of
stylization techniques, such as watercolor, oil or cartoon stylization, within
a common framework. By training parameter prediction networks for global and
local filter parameterizations, we can simultaneously adapt effects to
reference styles and image content, e.g., to enhance facial features. Our
method can be optimized in a style-transfer framework or learned in a
generative-adversarial setting for image-to-image translation. We demonstrate
that jointly training an XDoG filter and a CNN for postprocessing can achieve
results comparable to those of a state-of-the-art GAN-based method.
Authors' comments: Accepted to ECCV
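The XDoG filter named in the WISE abstract above is a classic, hand-designed operator. As a hedged illustration of what it computes (not WISE's differentiable implementation), the sketch below implements the sharpened difference-of-Gaussians form of Winnemöller et al., $S = (1+p)\,G_\sigma - p\,G_{k\sigma}$, followed by the soft threshold $T(u) = 1$ if $u \ge \varepsilon$ and $T(u) = 1 + \tanh(\varphi(u - \varepsilon))$ otherwise. It assumes a grayscale NumPy array with values in [0, 1]. The function names and all default parameter values are illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at three standard deviations."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of a 2-D array with edge-replicating padding."""
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    padded = np.pad(img, r, mode="edge")

    def conv(v):
        return np.convolve(v, kernel, mode="valid")

    rows = np.apply_along_axis(conv, 1, padded)  # blur along x: shape (H + 2r, W)
    return np.apply_along_axis(conv, 0, rows)    # blur along y: shape (H, W)


def xdog(img, sigma=1.0, k=1.6, p=20.0, eps=0.5, phi=10.0):
    """Sharpened DoG S = (1 + p) G_sigma - p G_{k sigma}, then a soft threshold.

    Pixels with S >= eps become white (1.0); the rest fall off smoothly
    through 1 + tanh(phi * (S - eps)). Defaults are illustrative assumptions.
    """
    s = (1 + p) * gaussian_blur(img, sigma) - p * gaussian_blur(img, k * sigma)
    return np.where(s >= eps, 1.0, 1.0 + np.tanh(phi * (s - eps)))
```

With these defaults, flat regions brighter than $\varepsilon$ saturate to white, darker flat regions stay dark, and the sharpening term produces a dark band along the dark side of an intensity step.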
Ankit Mishra, Sarika Jalan
Localization behaviours of Laplacian eigenvectors of complex networks provide
insight into various dynamical phenomena on the corresponding complex systems.
We numerically investigate the role of hyperedges in driving eigenvector
localization of hypergraph Laplacians. By defining a single parameter $\gamma$,
which measures the relative strengths of pairwise and higher-order
interactions, we analyze the impact of interactions on localization properties.
For $\gamma < 1$, pairwise links have no impact on eigenvector localization,
while the higher-order interactions instigate localization of the eigenvectors
corresponding to larger eigenvalues. For $\gamma > 1$, pairwise interactions
cause localization of the eigenvectors corresponding to small eigenvalues,
whereas higher-order interactions, despite being much less dominant than the
pairwise links, keep driving localization of the eigenvectors corresponding to
larger eigenvalues. The results should help in understanding dynamical
phenomena such as diffusion and random walks on a range of real-world complex
systems with higher-order interactions.
Authors' comments: 8 pages, 7 figures
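The abstract above does not spell out its localization measure or how $\gamma$ enters the Laplacian. A common way to quantify eigenvector localization is the inverse participation ratio (IPR), $\sum_i v_i^4$ for a unit-norm eigenvector $v$, which is about $1/n$ for a delocalized vector and 1 for a vector concentrated on a single node. The sketch below computes it for a hypothetical mixed Laplacian $\gamma L_{\text{pair}} + L_{\text{hyper}}$ in which each hyperedge is expanded into a weighted clique; both this combination rule and the clique expansion are assumptions, not the authors' definitions, and the function names are hypothetical.

```python
import numpy as np


def laplacian(adjacency: np.ndarray) -> np.ndarray:
    """Combinatorial Laplacian L = D - A of a weighted adjacency matrix."""
    return np.diag(adjacency.sum(axis=1)) - adjacency


def mixed_laplacian(n, edges, hyperedges, gamma):
    """Hypothetical mixed Laplacian gamma * L_pair + L_hyper (an assumption).

    Pairwise links enter as unit-weight edges; each hyperedge is expanded into a
    clique whose node pairs are weighted by the number of hyperedges they share.
    """
    a_pair = np.zeros((n, n))
    for i, j in edges:
        a_pair[i, j] = a_pair[j, i] = 1.0
    a_hyper = np.zeros((n, n))
    for he in hyperedges:
        for i in he:
            for j in he:
                if i != j:
                    a_hyper[i, j] += 1.0
    return gamma * laplacian(a_pair) + laplacian(a_hyper)


def inverse_participation_ratio(eigvecs: np.ndarray) -> np.ndarray:
    """IPR of each unit-norm column: about 1/n if delocalized, 1 if on one node."""
    return np.sum(eigvecs ** 4, axis=0)


# Example: a 4-node ring plus one 3-node hyperedge.
lap = mixed_laplacian(4, [(0, 1), (1, 2), (2, 3), (3, 0)], [(0, 1, 2)], gamma=2.0)
eigvals, eigvecs = np.linalg.eigh(lap)  # ascending eigenvalues
ipr = inverse_participation_ratio(eigvecs)
```

Plotting `ipr` against `eigvals` while sweeping `gamma` is one way to look for the kind of eigenvalue-dependent localization the abstract describes.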
Li Shen, Yongpeng Wu, Derrick Wing Kwan Ng, Wenjun Zhang, Xiang-Gen Xia
In this letter, we propose a symbol-wise puncturing scheme to support hybrid
automatic repeat request (HARQ)-integrated probabilistic amplitude shaping
(PAS). To prevent the probability distribution distortion caused by
traditional sequential puncturing and to realize the promised gain of PAS, we
perform symbol-wise puncturing on the label sequence of the shaped modulation
symbols. Our simulation results indicate that the proposed puncturing scheme
achieves a stable shaping gain of at least 0.6 dB across the signal-to-noise
ratio range compared with the uniform case under the same throughput, while the
gain of sequential puncturing drops rapidly with retransmissions. Moreover, in
each transmission, the proposed scheme reduces the distribution distortion,
achieving a gain of over 1.2 dB at a block error rate (BLER) of $10^{-3}$. In
contrast, with sequential puncturing the distribution is severely distorted and
the BLER performance in retransmissions is even worse than that of the uniform
case.
Authors' comments: 5 pages. This paper has been accepted by IEEE Wireless Communications
Letters
Tim Alderson, Benjamin Morine
A combinatorial problem concerning the maximum size of the Hamming weight
set of an $[n,k]_q$ linear code was recently introduced. Codes attaining the
established upper bound are the Maximum Weight Spectrum (MWS) codes. Those
$[n,k]_q$ codes with the same weight set as $\mathbb{F}_q^n$ are called Full
Weight Spectrum (FWS) codes. FWS codes are necessarily "short", whereas MWS
codes are necessarily "long". For fixed $k,q$ the values of $n$ for which
an $[n,k]_q$-FWS code exists are completely determined, but the determination
of the minimum length $M(\mathcal{H},k,q)$ of an $[n,k]_q$-MWS code remains an
open problem. The current work broadens the discussion, first to general
coordinate-wise weight functions and then specifically to the Lee weight and a
Manhattan-like weight. In the general case, we provide bounds on $n$ for which
an FWS code exists and bounds on $n$ for which an MWS code exists. When
specializing to the Lee or the Manhattan setting, we are able to completely
determine the parameters of FWS codes. As in the Hamming case, we are able to
provide an upper bound on $M(\mathcal{L},k,q)$ (the minimum length of Lee MWS
codes) and pose the determination of $M(\mathcal{L},k,q)$ as an open problem.
On the other hand, with respect to the Manhattan weight, we completely
determine the parameters of MWS codes.
Authors' comments: 17 pages
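To make the weight-set notions in the abstract above concrete, the brute-force sketch below enumerates all $q^k$ codewords of a small linear code over $\mathbb{Z}_q$ ($q$ prime, so $\mathbb{Z}_q = \mathbb{F}_q$) and compares its weight set with that of the whole ambient space, which is the FWS condition. It supports the Hamming weight and the Lee weight, where a symbol $x$ has Lee weight $\min(x, q - x)$. The function names are hypothetical, and exhaustive enumeration is only practical for tiny parameters.

```python
from itertools import product


def codewords(generator, q):
    """All codewords spanned by the rows of `generator` over Z_q (q prime)."""
    n = len(generator[0])
    for message in product(range(q), repeat=len(generator)):
        yield tuple(
            sum(m * row[j] for m, row in zip(message, generator)) % q
            for j in range(n)
        )


def hamming_weight(word, q):
    """Number of nonzero coordinates (q is unused, kept for a common signature)."""
    return sum(1 for x in word if x != 0)


def lee_weight(word, q):
    """Sum over coordinates of min(x, q - x)."""
    return sum(min(x, q - x) for x in word)


def weight_set(generator, q, weight):
    return {weight(c, q) for c in codewords(generator, q)}


def is_full_weight_spectrum(generator, q, weight):
    """FWS: the code realizes every weight that the ambient space Z_q^n realizes."""
    n = len(generator[0])
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    return weight_set(generator, q, weight) == weight_set(identity, q, weight)
```

For example, the binary $[3,2]$ code with generator rows $(1,0,0)$ and $(0,1,1)$ has codewords of every Hamming weight 0 to 3 and is therefore FWS, whereas the even-weight code generated by $(1,0,1)$ and $(0,1,1)$ only realizes weights 0 and 2.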
Vy Vo, Van Nguyen, Trung Le, Quan Hung Tran, Gholamreza Haffari, Seyit Camtepe, Dinh Phung
Interpretable machine learning offers insights into the factors that drive a particular prediction of a black-box system. A large number of interpretation methods focus on identifying explanatory input features, and they generally fall into two main categories: attribution and selection. A popular attribution-based approach is to exploit local neighborhoods to learn instance-specific explainers in an additive manner. This process is therefore inefficient and susceptible to poorly conditioned samples. Meanwhile, many selection-based methods directly optimize local feature distributions in an instance-wise training framework and can therefore leverage global information from other inputs. However, they can only interpret single-class predictions, and many suffer from inconsistency across different settings due to their strict reliance on a predefined number of selected features. This work exploits the strengths of both approaches and proposes a framework for learning local explanations simultaneously for multiple target classes. Our model explainer significantly outperforms additive and instance-wise counterparts on faithfulness, with more compact and comprehensible explanations. We also demonstrate the capacity to select stable and important features through extensive experiments on various data sets and black-box model architectures.
R. Sekhar Chivukula, Elizabeth H. Simmons, Xing Wang
In this work we demonstrate that the mixed gravitational and scalar sectors
of the five-dimensional Goldberger-Wise (GW) model, in which the size of a
warped extra dimension is dynamically determined, have a "hidden" dual $N=2$
supersymmetric structure. This symmetry structure, a generalization of one
found in the unstabilized Randall-Sundrum model, is a result of the
spontaneously broken five-dimensional diffeomorphism invariance of the
underlying gravitational theory. The supersymmetries relate the properties of
the spin-1 and spin-0 modes "eaten" by the massive spin-2 Kaluza-Klein states
of the theory to the mode functions of the spin-2 modes. Because the symmetries
relate the couplings and masses of the massive spin-2 states to those of the
tower of physical spin-0 states of the GW model, they enable us to analytically
prove the sum rule relations which ensure the tree-level scattering amplitudes
of the massive spin-2 states will grow no faster than ${\cal O}(s)$. The
analysis given here also explains the unconventional forms of the spin-0 mode
equation, boundary condition(s), and normalization found in the GW model.
Authors' comments: 32 pages
Dugang Liu, Pengxiang Cheng, Hong Zhu, Xing Tang, Yanyu Chen, Xiaoting Wang, Weike Pan, Zhong Ming et al.
Tabular data is one of the most common data storage formats behind many
real-world web applications such as retail, banking, and e-commerce. The
success of these web applications largely depends on the ability of the
employed machine learning model to accurately distinguish influential features
from all the predetermined features in tabular data. Intuitively, in practical
business scenarios, different instances should correspond to different sets of
influential features, and the set of influential features of the same instance
may vary in different scenarios. However, most existing methods focus on global
feature selection, assuming that all instances have the same set of influential
features, and the few methods that consider instance-wise feature selection
ignore the variability of influential features in different scenarios. In this paper,
we first introduce a new perspective based on the influence function for
instance-wise feature selection, and give some corresponding theoretical
insights, the core of which is to use the influence function as an indicator to
measure the importance of an instance-wise feature. We then propose a new
solution for discovering instance-wise influential features in tabular data
(DIWIFT), where a self-attention network is used as a feature selection model
and the value of the corresponding influence function is used as an
optimization objective to guide the model. Because the computation of the
influence function does not depend on a specific architecture and can also take
into account the data distribution in different scenarios, DIWIFT offers better
flexibility and robustness. Finally, we
conduct extensive experiments on both synthetic and real-world datasets to
validate the effectiveness of our DIWIFT.
Authors' comments: Accepted by TheWebConf 2023 Research Tracks
Tengyu Xu, Yue Wang, Shaofeng Zou, Yingbin Liang
The remarkable success of reinforcement learning (RL) heavily relies on
observing the reward of every visited state-action pair. In many real-world
applications, however, an agent can observe only a score that represents the
quality of the whole trajectory, which is referred to as the trajectory-wise
reward. In such a situation, it is difficult for standard RL methods to make
good use of the trajectory-wise reward, and large bias and variance errors can
be incurred in policy evaluation. In this work, we propose a novel
offline RL algorithm, called Pessimistic vAlue iteRaTion with rEward
Decomposition (PARTED), which decomposes the trajectory return into per-step
proxy rewards via least-squares-based reward redistribution, and then performs
pessimistic value iteration based on the learned proxy reward. To ensure the
value functions constructed by PARTED are always pessimistic with respect to
the optimal ones, we design a new penalty term to offset the uncertainty of the
proxy reward. For general episodic MDPs with large state spaces, we show that
PARTED with overparameterized neural network function approximation achieves an
$\tilde{\mathcal{O}}(D_{\text{eff}}H^2/\sqrt{N})$ suboptimality, where $H$ is
the episode length, $N$ is the total number of samples, and $D_{\text{eff}}$
is the effective dimension of the neural tangent kernel matrix. To further
illustrate the result, we show that PARTED achieves an
$\tilde{\mathcal{O}}(dH^3/\sqrt{N})$ suboptimality with linear MDPs, where $d$
is the feature dimension; this matches the neural network result when
$D_{\text{eff}}=dH$, since then $D_{\text{eff}}H^2 = dH^3$. To the best of our
knowledge, PARTED is the first offline RL algorithm that is provably efficient
in general MDPs with trajectory-wise rewards.
Authors' comments: Submitted for IEEE Transactions on Information Theory
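The least-squares reward redistribution step in the PARTED abstract above can be illustrated with a linear proxy reward $r(s,a) = \phi(s,a)^\top\theta$: choose $\theta$ so that, for every trajectory, the summed per-step proxy rewards match the observed trajectory-wise return. The sketch below assumes known feature vectors and adds a small ridge term for numerical stability; it omits PARTED's pessimistic penalty and value iteration entirely, and the function name is hypothetical.

```python
import numpy as np


def redistribute_returns(features, returns, ridge=1e-3):
    """Least-squares reward redistribution with a linear proxy reward (a sketch).

    features: (num_trajectories, horizon, d) array of phi(s_t, a_t)
    returns:  (num_trajectories,) array of observed trajectory-wise returns
    Solves min_theta sum_j (R_j - sum_t phi_jt @ theta)^2 + ridge * ||theta||^2.
    """
    summed = features.sum(axis=1)                        # (num_trajectories, d)
    gram = summed.T @ summed + ridge * np.eye(summed.shape[1])
    theta = np.linalg.solve(gram, summed.T @ returns)
    proxy_rewards = features @ theta                     # (num_trajectories, horizon)
    return theta, proxy_rewards
```

Because only the sum of the features along each trajectory enters the fit, the per-step proxy rewards are identified through $\theta$ rather than observed directly; an offline RL method would then plan against `proxy_rewards` instead of the missing per-step rewards.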
Xu Zhang, Guodong Li, Catherine C. Liu, Jianhua Guo
High-dimensional, higher-order tensor data are gaining prominence in a variety of fields, including but not limited to computer vision and network analysis. Tensor factor models, induced from noisy versions of tensor decompositions or factorizations, are natural and potent instruments for studying a collection of tensor-variate objects that may be dependent or independent. However, the development of statistical inferential theory for estimating the various low-rank structures that customarily play the role of signals in tensor factor models is still at an early stage. In this paper, we attempt to "decode" the estimation of a higher-order tensor factor model by leveraging tensor matricization. Specifically, we recast it into mode-wise traditional high-dimensional vector/fiber factor models, enabling the deployment of conventional principal component analysis (PCA) for estimation. Using the Tucker tensor factor model (TuTFaM), which is induced from the noisy version of the widely used Tucker decomposition, as an illustration, we observe that estimators of the signal components are essentially mode-wise PCA techniques, and that projection and iteration enhance the signal-to-noise ratio to various extents. We establish the inferential theory of the proposed estimators, conduct extensive simulation experiments, and illustrate how the proposed estimators work in tensor reconstruction and clustering for independent video and dependent economic datasets, respectively.
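The recasting of tensor factor estimation as mode-wise PCA, described above, can be sketched in a few lines: unfold each observed tensor along mode $k$, average the resulting $p_k \times p_k$ second-moment matrices, and keep the leading eigenvectors as the mode-$k$ loading estimate. This corresponds only to a plain initial estimator with assumed known ranks; the projection and iteration refinements mentioned in the abstract are not included, and the function names are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-`mode` matricization: rows index `mode`, columns the remaining axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def modewise_pca(samples, ranks):
    """Estimate one loading matrix per mode by PCA on mode-wise unfoldings.

    samples: (T, p_1, ..., p_K) array of observed tensors
    ranks:   (r_1, ..., r_K) number of factors kept in each mode
    """
    loadings = []
    for mode, rank in enumerate(ranks):
        cov = sum(unfold(x, mode) @ unfold(x, mode).T for x in samples) / len(samples)
        _, eigvecs = np.linalg.eigh(cov)              # ascending eigenvalues
        loadings.append(eigvecs[:, ::-1][:, :rank])   # leading `rank` eigenvectors
    return loadings
```

Loadings are only identified up to rotation, so a fair way to check an estimate is to compare projection matrices $\hat{A}\hat{A}^\top$ and $AA^\top$ rather than the matrices themselves.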
Yuezihan Jiang, Hao Yang, Junyang Lin, Hanyu Zhao, An Yang, Chang Zhou, Hongxia Yang, Zhi Yang et al.
Prompt learning has recently gained great popularity in bridging the gap between pretraining tasks and various downstream tasks. It freezes pretrained language models (PLMs) and tunes only a few task-related parameters (prompts) for downstream tasks, greatly reducing the cost of tuning giant models. The key enabler of this is the idea of querying PLMs with task-specific knowledge implicated in prompts. This paper reveals a major limitation of existing methods: using the same prompts indiscriminately for all input data in a task ignores the intrinsic knowledge of the input data, resulting in suboptimal performance. We introduce Instance-wise Prompt Tuning (IPT), the first prompt learning paradigm that injects knowledge from the input data instances into the prompts, thereby providing PLMs with richer and more concrete context information. We devise a series of strategies to produce instance-wise prompts, addressing various concerns such as model quality and cost-efficiency. Across multiple tasks and resource settings, IPT significantly outperforms task-based prompt learning methods and achieves performance comparable to conventional fine-tuning with only 0.5%–1.5% of tuned parameters.
Sándor Kunsági-Máté, Róbert Beck, István Szapudi, István Csabai
Three-dimensional wide-field galaxy surveys are fundamental for cosmological studies. At higher redshifts (z > 1.0), where galaxies are too faint, quasars still trace the large-scale structure of the Universe. Since available telescope time limits spectroscopic surveys, photometric methods are an efficient way to estimate redshifts for many quasars. Machine learning methods have recently become increasingly successful for quasar photometric redshifts; however, they hinge on the distribution of the training set, so a rigorous estimation of reliability is critical. We extracted optical and infrared photometric data from the cross-matched catalogue of the WISE All-Sky and PS1 3$\pi$ DR2 sky surveys. We trained an XGBoost regressor and an artificial neural network on the relation between color indices and spectroscopic redshift. We approximated the effective training set coverage with the K nearest neighbors algorithm. We estimated reliable photometric redshifts of 2,879,298 quasars that overlap with the training set in feature space. We validated the derived redshifts with an independent, clustering-based redshift estimation technique. The final catalog is publicly available.
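The abstract above approximates training-set coverage with a K nearest neighbors algorithm but does not give the exact rule. One plausible version, sketched here purely as an assumption, flags a query point as covered when its distance to its $k$-th nearest training point does not exceed a high quantile of the same distance measured among the training points themselves. The value of $k$, the quantile, the Euclidean metric on raw features, and the function names are all hypothetical choices.

```python
import numpy as np


def kth_neighbor_distance(reference, query, k, exclude_self=False):
    """Euclidean distance from each query point to its k-th nearest reference point."""
    dists = np.linalg.norm(query[:, None, :] - reference[None, :, :], axis=-1)
    dists.sort(axis=1)
    # When query is the reference set itself, column 0 is each point's zero self-distance.
    return dists[:, k] if exclude_self else dists[:, k - 1]


def in_coverage(train, query, k=10, quantile=0.95):
    """Flag query points whose k-NN distance is typical of the training set itself."""
    train_dists = kth_neighbor_distance(train, train, k, exclude_self=True)
    threshold = np.quantile(train_dists, quantile)
    return kth_neighbor_distance(train, query, k) <= threshold
```

The dense pairwise distance matrix keeps the sketch short but scales quadratically; a catalog of millions of objects would call for a spatial index such as a k-d tree.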
Yifan Chen, Tianning Xu, Dilek Hakkani-Tur, Di Jin, Yun Yang, Ruoqing Zhu
Multiple sampling-based methods have been developed for approximating and
accelerating node embedding aggregation in graph convolutional network (GCN)
training. Among them, a layer-wise approach recursively performs importance
sampling to select neighbors jointly for existing nodes in each layer. This
paper revisits the approach from a matrix approximation perspective, and
identifies two issues in the existing layer-wise sampling methods: suboptimal
sampling probabilities and estimation biases induced by sampling without
replacement. To address these issues, we accordingly propose two remedies: a
new principle for constructing sampling probabilities and an efficient
debiasing algorithm. The improvements are demonstrated by extensive analyses of
estimation variance and experiments on common benchmarks. Code and algorithm
implementations are publicly available at
https://github.com/ychen-stat-ml/GCN-layer-wise-sampling .
Authors' comments: Published at TMLR. Code is available at
https://github.com/ychen-stat-ml/GCN-layer-wise-sampling
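The matrix-approximation view of layer-wise sampling mentioned above treats aggregation as approximating a product such as $\hat{A}H$ by sampling a subset of its columns and rows. The sketch below shows the classic with-replacement importance-sampling estimator, with probabilities proportional to $\lVert a_{:,i}\rVert \, \lVert b_{i,:}\rVert$ and reweighting that makes it unbiased. It is background only: it is not the paper's new sampling principle, and it sidesteps the without-replacement bias that the paper's debiasing algorithm addresses. The function name is hypothetical.

```python
import numpy as np


def sampled_matmul(a, b, num_samples, rng):
    """Unbiased Monte Carlo estimate of a @ b from sampled column/row pairs.

    Index i is drawn with replacement with probability proportional to
    ||a[:, i]|| * ||b[i, :]||; each sampled outer product is reweighted by
    1 / (num_samples * p_i), so the expectation of the estimate equals a @ b.
    """
    scores = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=1)
    probs = scores / scores.sum()
    idx = rng.choice(a.shape[1], size=num_samples, replace=True, p=probs)
    weights = 1.0 / (num_samples * probs[idx])
    return (a[:, idx] * weights) @ b[idx, :]
```

In a GCN layer, `a` would play the role of the normalized adjacency restricted to the current nodes and `b` that of the previous layer's embeddings times the weight matrix; `rng` is a `numpy.random.Generator`.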
Mehmet Ozgur Turkoglu, Alexander Becker, Hüseyin Anil Gündüz, Mina Rezaei, Bernd Bischl, Rodrigo Caye Daudt, Stefano D'Aronco, Jan Dirk Wegner et al.
The ability to estimate epistemic uncertainty is often crucial when deploying
machine learning in the real world, but modern methods often produce
overconfident, uncalibrated uncertainty predictions. A common approach to
quantify epistemic uncertainty, usable across a wide class of prediction
models, is to train a model ensemble. In a naive implementation, the ensemble
approach has high computational cost and high memory demand. This is a
particular challenge for modern deep learning, where even a single deep network
is already demanding in terms of compute and memory, and it has given rise to
a number of attempts to emulate the model ensemble without actually
instantiating separate
ensemble members. We introduce FiLM-Ensemble, a deep, implicit ensemble method
based on the concept of Feature-wise Linear Modulation (FiLM). That technique
was originally developed for multi-task learning, with the aim of decoupling
different tasks. We show that the idea can be extended to uncertainty
quantification: by modulating the network activations of a single deep network
with FiLM, one obtains a model ensemble with high diversity, and consequently
well-calibrated estimates of epistemic uncertainty, with low computational
overhead in comparison. Empirically, FiLM-Ensemble outperforms other implicit
ensemble methods, and it comes very close to the upper bound of an explicit
ensemble of networks (sometimes even beating it), at a fraction of the memory
cost.
Authors' comments: accepted at NeurIPS 2022
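The FiLM-Ensemble idea described above can be pictured with a tiny NumPy MLP in which all members share the weights and differ only in a per-member, per-feature affine modulation $\gamma_m \odot h + \beta_m$ of a hidden layer; the spread of the member outputs then serves as an uncertainty signal. This is a forward-pass sketch under assumed shapes and a hypothetical random initialization of $\gamma$ and $\beta$; training, as well as the paper's actual initialization and placement of FiLM layers, is not shown.

```python
import numpy as np


class FiLMEnsemble:
    """Tiny MLP with shared weights and per-member FiLM parameters (a sketch)."""

    def __init__(self, in_dim, hidden, out_dim, members, rng):
        self.w1 = rng.normal(scale=1 / np.sqrt(in_dim), size=(in_dim, hidden))
        self.w2 = rng.normal(scale=1 / np.sqrt(hidden), size=(hidden, out_dim))
        # Hypothetical initialization: members differ only through gamma and beta.
        self.gamma = 1.0 + 0.5 * rng.normal(size=(members, hidden))
        self.beta = 0.5 * rng.normal(size=(members, hidden))

    def __call__(self, x):
        h = x @ self.w1                                               # shared: (batch, hidden)
        h = self.gamma[:, None, :] * h[None] + self.beta[:, None, :]  # FiLM: (members, batch, hidden)
        h = np.maximum(h, 0.0)                                        # ReLU
        return h @ self.w2                                            # (members, batch, out_dim)

    def predict(self, x):
        """Ensemble mean and member disagreement (variance) per output."""
        outs = self(x)
        return outs.mean(axis=0), outs.var(axis=0)
```

The memory overhead over a single network is just the `members x hidden` arrays `gamma` and `beta`, which is the source of the savings relative to an explicit ensemble of separate networks.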
Sebastian Krieter, Thomas Thüm, Sandro Schulze, Sebastian Ruland, Malte Lochau, Gunter Saake, Thomas Leich
Sampling techniques, such as t-wise interaction sampling, are used to enable
efficient testing for configurable systems. This is achieved by generating a
small yet representative sample of configurations for a system, which
circumvents testing the entire solution space. However, by design, most recent
approaches for t-wise interaction sampling only consider combinations of
configuration options from a configurable system's variability model and do not
take into account their mapping onto the solution space, thus potentially
leaving critical implementation artifacts untested. Tartler et al. address this
problem by considering presence conditions of implementation artifacts rather
than pure configuration options, but do not consider the possible interactions
between these artifacts. In this paper, we introduce t-wise presence condition
coverage, which extends the approach of Tartler et al. by using presence
conditions extracted from the code as a basis to cover t-wise interactions. This
ensures that all t-wise interactions of implementation artifacts are included
in the sample and that the chance of detecting combinations of faulty
configuration options is increased. We evaluate our approach in terms of
testing efficiency and testing effectiveness by comparing the approach to
existing t-wise interaction sampling techniques. We show that t-wise presence
condition sampling produces smaller samples than t-wise interaction sampling in
most cases, while guaranteeing a t-wise presence condition coverage of 100%.
Authors' comments: 28 pages
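The t-wise presence condition coverage introduced above can be computed by brute force on a toy system. In the sketch below, presence conditions are Python predicates over a configuration (standing in for conditions extracted from preprocessor blocks), a t-wise interaction is a choice of t conditions together with their truth values, and coverage is the fraction of interactions realizable by some valid configuration that the sample actually exercises. This counting rule, the toy feature model, and all names are assumptions for illustration.

```python
from itertools import combinations, product


def realised_interactions(configs, conditions, t):
    """Set of t-wise combinations of (condition index, truth value) seen in `configs`."""
    seen = set()
    for cfg in configs:
        values = [pc(cfg) for pc in conditions]
        for idx in combinations(range(len(conditions)), t):
            seen.add(tuple((i, values[i]) for i in idx))
    return seen


def presence_condition_coverage(sample, valid_configs, conditions, t=2):
    """Fraction of feasible t-wise presence-condition combinations covered by `sample`."""
    feasible = realised_interactions(valid_configs, conditions, t)
    covered = realised_interactions(sample, conditions, t) & feasible
    return len(covered) / len(feasible)


# Toy system: options A, B, C with the constraint "C implies A".
valid = [dict(zip("ABC", bits)) for bits in product([False, True], repeat=3)
         if bits[0] or not bits[2]]
conditions = [
    lambda c: c["A"],             # e.g. code under #ifdef A
    lambda c: c["B"] and c["C"],  # e.g. code under #if defined(B) && defined(C)
    lambda c: not c["B"],         # e.g. code under #ifndef B
]
```

For this toy model there are 10 feasible pairwise interactions; the two-configuration sample {A, B, C all enabled} and {all disabled} covers 6 of them, giving a coverage of 0.6.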
Bowen Li, Jianfeng Lu, Ziang Yu
This work aims to numerically construct exactly commuting matrices close to
given almost commuting ones, which is equivalent to the joint approximate
diagonalization problem. We first prove that almost commuting matrices
generically have approximate common eigenvectors that are almost orthogonal to
each other. Based on this key observation, we propose a fast and robust
vector-wise joint diagonalization (VJD) algorithm, which constructs the
orthogonal similarity transform by sequentially finding these approximate
common eigenvectors. In doing so, we consider sub-optimization problems over
the unit sphere, for which we present a Riemannian quasi-Newton method with
rigorous convergence analysis. We also discuss the numerical stability of the
proposed VJD algorithm. Numerical examples with applications in independent
component analysis are provided to reveal the relation with Huaxin Lin's
theorem and to demonstrate that our method compares favorably with the
state-of-the-art Jacobi-type joint diagonalization algorithm.
Authors' comments: revised
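The joint diagonalization task described above can be made concrete with a deliberately naive baseline, which is not the VJD algorithm: diagonalize one fixed linear combination of the (almost commuting, symmetric) matrices, use its eigenvectors as the common orthogonal transform, and measure how much off-diagonal mass remains in each transformed matrix. The coefficients are a user choice, the function names are hypothetical, and this heuristic becomes unreliable when the combination has nearly repeated eigenvalues.

```python
import numpy as np


def off_diagonal_norm(m):
    """Frobenius norm of the off-diagonal part of a square matrix."""
    return np.linalg.norm(m - np.diag(np.diag(m)))


def combination_joint_diag(mats, coeffs):
    """Baseline: eigenvectors of a fixed linear combination of symmetric matrices."""
    combo = sum(c * m for c, m in zip(coeffs, mats))
    combo = (combo + combo.T) / 2  # symmetrize against round-off
    _, q = np.linalg.eigh(combo)
    return q
```

A transform `q` is good when `off_diagonal_norm(q.T @ m @ q)` is small for every input matrix `m`; for exactly commuting matrices with a generic combination, this baseline recovers the shared eigenbasis.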
Jiantao Wu, Shentong Mo
Self-supervised pre-training for images without labels has recently achieved promising performance in image classification. The success of transformer-based methods, ViT and MAE, has drawn the community's attention to the design of backbone architectures and self-supervised tasks. In this work, we show that current masked image encoding models learn the underlying relationships between all objects in the whole scene, instead of a single object representation. As a result, these methods require a lot of compute time for self-supervised pre-training. To solve this issue, we introduce a novel object selection and division strategy that drops non-object patches, learning object-wise representations through selective reconstruction with region-of-interest masks. We refer to this method as ObjMAE. Extensive experiments on four commonly used datasets demonstrate the effectiveness of our model in reducing the compute cost by 72% while achieving competitive performance. Furthermore, we investigate the inter-object and intra-object relationships and find that the latter is crucial for self-supervised pre-training.
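Dropping non-object patches, as described in the ObjMAE abstract above, requires deciding which patches count as object patches. A minimal sketch, assuming a binary region-of-interest mask, ViT-style non-overlapping square patches indexed in row-major order, and a hypothetical coverage threshold, keeps the indices of patches whose masked fraction reaches that threshold. ObjMAE's actual object selection and division strategy may differ, and the function name is hypothetical.

```python
import numpy as np


def object_patch_indices(mask, patch=16, min_fraction=0.5):
    """Row-major indices of patches whose region-of-interest coverage >= min_fraction."""
    gh, gw = mask.shape[0] // patch, mask.shape[1] // patch
    grid = mask[:gh * patch, :gw * patch].astype(float)
    # (gh, patch, gw, patch): average each patch's pixels to get its mask coverage.
    coverage = grid.reshape(gh, patch, gw, patch).mean(axis=(1, 3))
    return np.flatnonzero(coverage >= min_fraction)
```

Only the returned patches would then be passed to the encoder and reconstruction loss, which is where the compute savings of dropping background patches would come from.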
Samarth Tiwari, Michelle Yeo, Zeta Avarikioti, Iosif Salem, Krzysztof Pietrzak, Stefan Schmid
Payment channel networks (PCNs) are one of the most prominent solutions to the limited transaction throughput of blockchains. Nevertheless, PCNs themselves suffer from a throughput limitation due to the capital constraints of their channels. A similar dependence on high capital is also found in inter-bank payment settlements, where the so-called netting technique is used to mitigate liquidity demands. In this work, we alleviate this limitation by introducing the notion of transaction aggregation: instead of executing transactions sequentially through a PCN, we enable senders to aggregate multiple transactions and execute them simultaneously, benefiting from amounts that may "cancel out". Two direct advantages of our proposal are the decrease in intermediary fees paid by senders and the obfuscation of transaction data from the intermediaries. We formulate transaction aggregation as a computational problem, a generalization of the Bank Clearing Problem. We present a generic framework for executing transaction aggregation, and we then propose Wiser as an implementation of this framework in a specific hub-based setting. To overcome the NP-hardness of the transaction aggregation problem, in Wiser we propose a fixed-parameter linear algorithm for a special case of transaction aggregation as well as the Bank Clearing Problem. Wiser can also be seen as a modern variant of the Hawala money transfer system, as well as a decentralized implementation of the overseas remittance service of Wise.
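The netting effect that transaction aggregation exploits, where amounts "cancel out", can be shown with plain multilateral netting: sum each party's incoming and outgoing amounts and settle only the net positions. The sketch below does this and adds a hypothetical hub-based settlement in which net payers pay the hub and the hub pays net receivers. It ignores channel capacities, fees, privacy, and the NP-hard selection problem the abstract mentions, so it is not Wiser's algorithm; the function names and the `"HUB"` label are hypothetical.

```python
from collections import defaultdict


def net_positions(transactions):
    """Multilateral netting: net amount each party receives (+) or owes (-)."""
    net = defaultdict(int)
    for sender, receiver, amount in transactions:
        net[sender] -= amount
        net[receiver] += amount
    return {party: value for party, value in net.items() if value != 0}


def settle_through_hub(transactions, hub="HUB"):
    """Hypothetical hub settlement: net payers pay the hub, the hub pays net receivers."""
    transfers = []
    for party, value in sorted(net_positions(transactions).items()):
        if value < 0:
            transfers.append((party, hub, -value))
        else:
            transfers.append((hub, party, value))
    return transfers
```

For a cycle A → B → C → A of equal amounts every position nets to zero, so no transfer is needed at all; for A paying B 10 and B paying A 4, one net transfer of 6 replaces 14 units of gross flow.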
Chae Eun Lee, Hyelim Park, Yeong-Gil Shin, Minyoung Chung
Semi-supervised learning for medical image segmentation is an important area of research for alleviating the huge cost associated with the construction of reliable large-scale annotations in the medical domain. Recent semi-supervised approaches have demonstrated promising results by employing consistency regularization, pseudo-labeling techniques, and adversarial learning. These methods primarily attempt to learn the distribution of labeled and unlabeled data by enforcing consistency in the predictions or embedding context. However, previous approaches have focused only on local discrepancy minimization or context relations across single classes. In this paper, we introduce a novel adversarial learning-based semi-supervised segmentation method that effectively embeds both local and global features from multiple hidden layers and learns context relations between multiple classes. Our voxel-wise adversarial learning method utilizes a voxel-wise feature discriminator, which takes multilayer voxel-wise features (involving both local and global features) as input by embedding class-specific voxel-wise feature distributions. Furthermore, we improve our previous representation learning method by overcoming information loss and learning stability problems, which enables rich representations of labeled data. Our method outperforms current state-of-the-art semi-supervised learning approaches on the image segmentation of the left atrium (single class) and multiorgan datasets (multiclass). Moreover, our visual interpretation of the feature space demonstrates that our proposed method yields a well-distributed and well-separated feature space for both labeled and unlabeled data, which improves the overall prediction results.
Hrayr Harutyunyan, Greg Ver Steeg, Aram Galstyan
Some of the tightest information-theoretic generalization bounds depend on
the average information between the learned hypothesis and a single training
example. However, these sample-wise bounds were derived only for the expected
generalization gap. We show that even for the expected squared generalization
gap no such sample-wise information-theoretic bounds exist. The same is true for
PAC-Bayes and single-draw bounds. Remarkably, PAC-Bayes, single-draw and
expected squared generalization gap bounds that depend on information in pairs
of examples exist.
Authors' comments: 2022 IEEE Information Theory Workshop
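For context, a well-known example of the sample-wise bounds discussed in the abstract above is the individual-sample mutual information bound of Bu, Zou and Veeravalli. With a training sample $S = (Z_1, \dots, Z_n)$ of i.i.d. examples, learned hypothesis $W$, population risk $L_\mu$, empirical risk $L_S$, and a loss $\ell(w, Z)$ that is $\sigma$-sub-Gaussian under the data distribution for every $w$, it states:

```latex
\left| \mathbb{E}\big[ L_\mu(W) - L_S(W) \big] \right|
  \le \frac{1}{n} \sum_{i=1}^{n} \sqrt{2\sigma^2 \, I(W; Z_i)}
```

The negative results in the abstract concern whether an analogous per-example form exists for the expected squared gap and for PAC-Bayes and single-draw bounds.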
Xiaosong Ma, Jie Zhang, Song Guo, Wenchao Xu
Personalized Federated Learning (pFL) can not only capture common priors from a broad range of distributed data but also support customized models for heterogeneous clients. Research over the past few years has applied weighted aggregation to produce personalized models, where the weights are determined by calibrating the distance between entire model parameters or loss values; such methods have yet to consider layer-level impacts on the aggregation process, leading to slower model convergence and inadequate personalization over non-IID datasets. In this paper, we propose a novel pFL training framework dubbed Layer-wised Personalized Federated learning (pFedLA) that can discern the importance of each layer from different clients, and is thus able to optimize the personalized model aggregation for clients with heterogeneous data. Specifically, we employ a dedicated hypernetwork per client on the server side, which is trained to identify the mutual contribution factors at layer granularity. Meanwhile, a parameterized mechanism is introduced to update the layer-wise aggregation weights to progressively exploit inter-user similarity and realize accurate model personalization. Extensive experiments are conducted over different models and learning tasks, and we show that the proposed methods achieve significantly higher performance than state-of-the-art pFL methods.
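The layer-wise aggregation described in the pFedLA abstract above can be sketched as follows: for one target client, each layer of its personalized model is a weighted average of the corresponding layer across all clients, with a separate weight vector per layer. In pFedLA these weights come from a per-client hypernetwork on the server; here they are passed in directly as unnormalized scores and normalized with a softmax, which is an assumption, as are the dictionary-of-arrays model format and the function names.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax of a 1-D array."""
    z = np.exp(scores - scores.max())
    return z / z.sum()


def layerwise_aggregate(client_models, layer_scores):
    """Build one client's personalized model, layer by layer.

    client_models: list (one entry per client) of dicts layer_name -> parameter array
    layer_scores:  dict layer_name -> (num_clients,) unnormalized scores; in pFedLA
                   such weights come from a hypernetwork, here they are given directly.
    """
    personalised = {}
    for layer, scores in layer_scores.items():
        weights = softmax(np.asarray(scores, dtype=float))
        personalised[layer] = sum(w * model[layer] for w, model in zip(weights, client_models))
    return personalised
```

Giving each layer its own weight vector lets, for example, early feature-extraction layers be shared broadly while later layers lean on the client's own parameters, which is the kind of layer-level distinction the abstract argues whole-model weighting misses.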