Xiaojun Xu, Linyi Li, Bo Li
Recent studies show that training deep neural networks (DNNs) with Lipschitz
constraints can enhance adversarial robustness and other model
properties such as stability. In this paper, we propose a layer-wise orthogonal
training method (LOT) to effectively train 1-Lipschitz convolution layers via
parametrizing an orthogonal matrix with an unconstrained matrix. We then
efficiently compute the inverse square root of a convolution kernel by
transforming the input domain to the Fourier frequency domain. On the other
hand, as existing works show that semi-supervised training helps improve
empirical robustness, we aim to bridge the gap and prove that semi-supervised
learning also improves the certified robustness of Lipschitz-bounded models. We
conduct comprehensive evaluations for LOT under different settings. We show
that LOT significantly outperforms baselines regarding deterministic $\ell_2$
certified robustness, and scales to deeper neural networks. Under the
supervised scenario, we improve the state-of-the-art certified robustness for
all architectures (e.g. from 59.04% to 63.50% on CIFAR-10 and from 32.57% to
34.59% on CIFAR-100 at radius $\rho = 36/255$ for 40-layer networks). With
semi-supervised learning over unlabelled data, we improve the
state-of-the-art certified robustness on CIFAR-10 at $\rho = 108/255$ from 36.04%
to 42.39%. In addition, LOT consistently outperforms baselines on different
model architectures with only 1/3 of the evaluation time.
Authors' comments: NeurIPS 2022
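To make the Fourier-domain step concrete, here is a minimal sketch under simplifying assumptions: a single-channel, one-dimensional circular convolution, whereas LOT handles multi-channel 2-D kernels. In this special case the matrix $V^{*}V$ at each frequency is the scalar $|\hat v_k|^2$, so $W = V(V^{*}V)^{-1/2}$ reduces to dividing each Fourier coefficient by its magnitude. All helper names are hypothetical, and the naive $O(n^2)$ DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), used to stay dependency-free."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft()."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def orthogonalize_kernel(v):
    """Map an unconstrained length-n kernel v to an orthogonal circular-convolution kernel.

    Per frequency k, V^* V is the scalar |V_k|^2, so V (V^* V)^{-1/2} = V_k / |V_k|.
    Assumes no Fourier coefficient of v is exactly zero.
    """
    spec = dft(v)
    unit = [c / abs(c) for c in spec]
    # A real kernel has a Hermitian-symmetric spectrum, which unit-normalization
    # preserves, so the inverse transform is real up to rounding error.
    return [c.real for c in idft(unit)]


def circular_conv(w, x):
    """y[i] = sum_j w[j] * x[(i - j) mod n]."""
    n = len(x)
    return [sum(w[j] * x[(i - j) % n] for j in range(n)) for i in range(n)]


def l2_norm(x):
    return math.sqrt(sum(t * t for t in x))


v = [2.0, 0.5, -0.3, 0.1]   # unconstrained parameters
x = [1.0, -2.0, 0.5, 3.0]   # arbitrary input signal
w = orthogonalize_kernel(v)
y = circular_conv(w, x)
print(l2_norm(x), l2_norm(y))  # equal up to rounding: the layer preserves norms
```

Because every frequency is scaled by a unit-modulus factor, the resulting convolution preserves the $\ell_2$ norm of any input and is therefore 1-Lipschitz.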
Dong-Hee Paek, Kevin Tirta Wijaya, Seung-Hyun Kong
Lane detection is one of the most important functions for autonomous driving.
In recent years, deep learning-based lane detection networks with RGB camera
images have shown promising performance. However, camera-based methods are
inherently vulnerable to adverse lighting conditions such as poor or dazzling
lighting. Unlike cameras, LiDAR sensors are robust to lighting conditions. In
this work, we propose a novel two-stage LiDAR lane detection network with a
row-wise detection approach. The first-stage network produces lane proposals
through a global feature correlator backbone and a row-wise detection head.
Meanwhile, the second-stage network refines the feature map of the first-stage
network via an attention-based mechanism between the local features around the
lane proposals, and outputs a set of new lane proposals. Experimental results
on the K-Lane dataset show that the proposed network advances the
state-of-the-art in terms of F1-score with 30% fewer GFLOPs. In addition, the
second-stage network is found to be especially robust to lane occlusions, thus
demonstrating the robustness of the proposed network for driving in crowded
environments.
Authors' comments: Accepted at 2022 IEEE Conference on Intelligent Transportation
Systems (ITSC)
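A row-wise detection head treats each row of the (image or bird's-eye-view) grid as a classification over column positions. The sketch below shows one plausible way to decode such an output into lane points; the score layout, the `threshold` parameter, and the function name are illustrative assumptions rather than the paper's exact head.

```python
def decode_row_wise(scores, threshold=0.5):
    """Decode a row-wise lane prediction into (row, column) points.

    scores[r][c] is the (assumed) probability that the lane crosses row r at column c.
    For each row we keep the best column if its score reaches `threshold`;
    rows without a confident column contribute no lane point.
    """
    points = []
    for r, row in enumerate(scores):
        best_col = max(range(len(row)), key=lambda c: row[c])
        if row[best_col] >= threshold:
            points.append((r, best_col))
    return points


scores = [
    [0.1, 0.8, 0.1, 0.0],
    [0.2, 0.3, 0.4, 0.1],   # no column is confident enough
    [0.0, 0.1, 0.2, 0.7],
]
print(decode_row_wise(scores))  # [(0, 1), (2, 3)]
```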
Tran Van Sang, Mhd Irvan, Rie Shigetomi Yamaguchi, Toshiyuki Nakata
Natural Gradient Descent (NGD) is a second-order neural network training method that preconditions the gradient descent with the inverse of the Fisher Information Matrix (FIM). Although NGD provides an efficient preconditioner, it is impractical because inverting the FIM is computationally expensive. This paper proposes a new NGD variant named Component-Wise Natural Gradient Descent (CW-NGD). CW-NGD consists of two steps. Similar to several existing works, the first step treats the FIM as a block-diagonal matrix whose diagonal blocks correspond to the FIM of each layer's weights. In the second step, unique to CW-NGD, we analyze the layer's structure and further decompose the layer's FIM into smaller segments whose derivatives are approximately independent. As a result, each layer's FIM is approximated in a block-diagonal form that is trivial to invert. The segment decomposition strategy varies with the layer structure; specifically, we analyze dense and convolutional layers and design a decomposition strategy for each. In an experiment training a network containing these two types of layers, we empirically show that CW-NGD requires fewer iterations to converge than state-of-the-art first-order and second-order methods.
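Once the FIM is approximated as block-diagonal, the natural-gradient direction $F^{-1}g$ splits into independent solves $F_b^{-1}g_b$, one per block, which is why the approximation makes inversion cheap. The sketch below illustrates only that step with small dense blocks; the `damping` term and the function names are assumptions for illustration, not CW-NGD's actual segment decomposition.

```python
def solve(a, b):
    """Solve a @ x = b for a small dense matrix via Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def block_diag_natural_gradient(blocks, grad, damping=1e-3):
    """Precondition `grad` with a block-diagonal FIM approximation.

    blocks: list of square matrices F_b (e.g. one per layer or per segment).
    grad:   flat gradient whose consecutive slices line up with the blocks.
    Each block is solved independently: (F_b + damping * I) x_b = g_b.
    """
    out, offset = [], 0
    for f in blocks:
        size = len(f)
        damped = [[f[i][j] + (damping if i == j else 0.0) for j in range(size)]
                  for i in range(size)]
        out.extend(solve(damped, grad[offset:offset + size]))
        offset += size
    return out


blocks = [[[2.0, 0.5], [0.5, 1.0]], [[4.0]]]
grad = [1.0, 2.0, 8.0]
print(block_diag_natural_gradient(blocks, grad, damping=0.0))  # [0.0, 2.0, 2.0]
```

The cost is dominated by the largest block rather than by the full parameter count, which is the motivation for decomposing each layer's FIM further.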
Avraham Chapman, Lingqiao Liu
It is well-known that a deep neural network has a strong fitting capability
and can easily achieve a low training error even with randomly assigned class
labels. When the number of training samples is small, or the class labels are
noisy, networks tend to memorize patterns specific to individual instances to
minimize the training error. This leads to the issue of overfitting and poor
generalisation performance. This paper explores a remedy by suppressing the
network's tendency to rely on instance-specific patterns for empirical error
minimisation. The proposed method is based on an adversarial training
framework. It suppresses features that can be utilized to identify individual
instances among samples within each class. This leads to classifiers only using
features that are both discriminative across classes and common within each
class. We call our method Adversarial Suppression of Identity Features (ASIF),
and demonstrate the usefulness of this technique in boosting generalisation
accuracy when faced with small datasets or noisy labels. Our source code is
available.
Authors' comments: DICTA 2022
Katelinh Jones, Yuya Jeremy Ong, Yi Zhou, Nathalie Baracaldo
Federated Learning (FL) is a paradigm for jointly training machine learning
models in a decentralized manner, which allows parties to communicate
with an aggregator to create and train a model without exposing the underlying
raw data distribution of the local parties involved in the training process.
Most research in FL has focused on Neural Network-based approaches;
however, Tree-Based methods, such as XGBoost, have been underexplored in
Federated Learning due to the challenges in overcoming the iterative and
additive characteristics of the algorithm. Decision tree-based models, in
particular XGBoost, can handle non-IID data, which is significant for
algorithms used in Federated Learning frameworks, since the underlying data
are decentralized and are at risk of being non-IID by nature. In this paper,
we investigate how Federated XGBoost is affected by non-IID distributions by
performing experiments on various sample size-based data skew scenarios. We
conduct a set of extensive experiments across multiple datasets and different
data skew partitions. Our experimental results demonstrate that, across the
various partition ratios, model performance stayed consistent and was close to
or equal to that of models trained in a centralized manner.
Authors' comments: 9 Pages, 1 figure, 3 tables
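A sample size-based skew scenario can be simulated by splitting one dataset among parties according to uneven ratios. The helper below is a hypothetical sketch of such a partitioner (the ratios and function name are assumptions, not the paper's exact protocol); it only decides which sample indices each party receives, and the resulting shards would then be passed to a federated XGBoost implementation.

```python
import random


def partition_by_ratio(n_samples, ratios, seed=0):
    """Split sample indices 0..n_samples-1 among parties with sizes proportional to `ratios`.

    Any remainder left by rounding down is given to the last party so that every
    sample is assigned to exactly one party.
    """
    total = sum(ratios)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    sizes = [int(n_samples * r / total) for r in ratios]
    sizes[-1] += n_samples - sum(sizes)
    shards, start = [], 0
    for size in sizes:
        shards.append(sorted(indices[start:start + size]))
        start += size
    return shards


# e.g. a 10% / 30% / 60% skew across three parties
shards = partition_by_ratio(1000, [1, 3, 6])
print([len(s) for s in shards])  # [100, 300, 600]
```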
Hao-Wei Chen, Ting-Hsuan Liao, Hsuan-Kung Yang, Chun-Yi Lee
This paper introduces pixel-wise prediction based visual odometry (PWVO), a dense prediction task that estimates translation and rotation values for every pixel in its input observations. PWVO employs uncertainty estimation to identify noisy regions in the input observations, and adopts a selection mechanism that integrates the pixel-wise predictions based on the estimated uncertainty maps to derive the final translation and rotation. To train PWVO comprehensively, we further develop a data generation workflow for producing synthetic training data. The experimental results show that PWVO delivers favorable results. In addition, our analyses validate the effectiveness of the designs adopted in PWVO, and demonstrate that the uncertainty maps estimated by PWVO are capable of capturing the noise in its input observations.
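One simple way to realize an uncertainty-based selection mechanism is to keep only the pixels with the lowest predicted uncertainty and average their motion estimates. The sketch below assumes that rule (top-$k$ lowest uncertainty, then a plain mean) and uses hypothetical names; PWVO's actual selection mechanism may differ.

```python
def select_and_aggregate(predictions, uncertainties, k):
    """Aggregate pixel-wise pose predictions using the k least-uncertain pixels.

    predictions:   list of per-pixel vectors, e.g. (tx, ty, tz, rx, ry, rz).
    uncertainties: list of per-pixel scalar uncertainties (lower = more reliable).
    Returns the element-wise mean of the selected predictions.
    """
    order = sorted(range(len(predictions)), key=lambda i: uncertainties[i])
    chosen = order[:k]
    dim = len(predictions[0])
    return [sum(predictions[i][d] for i in chosen) / len(chosen) for d in range(dim)]


preds = [[1.0, 0.0], [1.2, 0.1], [5.0, 3.0], [0.8, -0.1]]  # pixel 2 is corrupted by noise
unc = [0.1, 0.2, 0.9, 0.15]
print(select_and_aggregate(preds, unc, k=3))  # the noisy pixel is excluded
```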
Faranak Tohidi, Manoranjan Paul, Anwaar Ulhaq
With the fast growth of immersive video sequences, achieving seamless and high-quality compressed 3D content is even more critical. MPEG recently developed a video-based point cloud compression (V-PCC) standard for dynamic point cloud coding. However, point clouds reconstructed with V-PCC suffer from various artifacts, including data lost during pre-processing before existing video coding techniques, e.g., High-Efficiency Video Coding (HEVC), are applied. Patch generation and self-occluded points in the 3D-to-2D projection are the main causes of missing data in V-PCC. This paper proposes a new method that introduces overlapping slicing as an alternative to patch generation to decrease both the number of patches generated and the amount of data lost. In the proposed method, the entire point cloud is cross-sectioned into variable-sized slices based on the number of self-occluded points so that data loss is minimized during patch generation and projection. To this end, a variable number of layers is considered, partially overlapped to retain the self-occluded points. An added advantage of the proposed method is that it reduces the bit requirement and encodes geometric data using the slicing base position. The experimental results show that the proposed method is much more flexible than the standard V-PCC method, improves rate-distortion performance, and significantly decreases data loss compared to the standard V-PCC method.
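The core idea of overlapping slicing can be illustrated by cutting the cloud along one axis into slabs of variable thickness that share a margin, so points near a boundary are kept in both neighboring slices. In this hypothetical sketch the slice boundaries and the `overlap` margin are given as inputs; the paper instead chooses slice sizes from the number of self-occluded points, which is not reproduced here.

```python
def overlapping_slices(points, boundaries, overlap, axis=2):
    """Assign 3-D points to slices along `axis`, with neighboring slices overlapping.

    boundaries: sorted cut positions [b0, b1, ..., bm] defining m slices [b_i, b_{i+1}).
    overlap:    each slice is widened by this margin on both sides, so a point close to a
                boundary can appear in two slices.
    """
    slices = [[] for _ in range(len(boundaries) - 1)]
    for p in points:
        coord = p[axis]
        for i in range(len(boundaries) - 1):
            lo, hi = boundaries[i] - overlap, boundaries[i + 1] + overlap
            if lo <= coord < hi:
                slices[i].append(p)
    return slices


pts = [(0, 0, 0.5), (0, 0, 1.95), (0, 0, 2.05), (0, 0, 4.0)]
result = overlapping_slices(pts, boundaries=[0.0, 2.0, 5.0], overlap=0.1)
print([len(s) for s in result])  # [3, 3]: the two points near z = 2 appear in both slices
```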
Yunpeng Bai, Chao Dong, Cairong Wang
We study how to represent a video with implicit neural representations
(INRs). Classical INR methods generally use MLPs to map input coordinates
to output pixels, while some recent works instead reconstruct the whole image
directly with CNNs. However, we argue that both the above pixel-wise and
image-wise strategies are poorly suited to video data. Instead, we propose a
patch-wise solution, PS-NeRV, which represents videos as a function of patches
and the corresponding patch coordinates. It naturally inherits the advantages of
image-wise methods, and achieves excellent reconstruction performance with fast
decoding speed. The whole method includes conventional modules, like positional
embedding, MLPs and CNNs, while also introducing AdaIN to enhance intermediate
features. These simple yet essential changes help the network easily fit
high-frequency details. Extensive experiments have demonstrated its
effectiveness in several video-related tasks, such as video compression and
video inpainting.
Authors' comments: 9 pages, 11 figures
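AdaIN re-normalizes a feature map so that its per-channel mean and standard deviation match externally supplied values. The following sketch shows the standard AdaIN operation on a single channel stored as a flat list; how PS-NeRV produces the target statistics (passed in here as `target_mean` and `target_std`) is not specified in the abstract, so those inputs are placeholders.

```python
import math


def adain(channel, target_mean, target_std, eps=1e-5):
    """Adaptive instance normalization for one feature channel.

    Normalizes `channel` to zero mean and (approximately) unit variance, then rescales
    it to `target_std` and shifts it to `target_mean`.
    """
    n = len(channel)
    mean = sum(channel) / n
    var = sum((v - mean) ** 2 for v in channel) / n
    std = math.sqrt(var + eps)
    return [target_std * (v - mean) / std + target_mean for v in channel]


out = adain([1.0, 2.0, 3.0, 4.0], target_mean=10.0, target_std=2.0)
print(out)
```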
Winfried Lötzsch, Max Reimann, Martin Büssemeyer, Amir Semmo, Jürgen Döllner, Matthias Trapp
Image-based artistic rendering can synthesize a variety of expressive styles
using algorithmic image filtering. In contrast to deep learning-based methods,
these heuristics-based filtering techniques can operate on high-resolution
images, are interpretable, and can be parameterized according to various design
aspects. However, adapting or extending these techniques to produce new styles
is often a tedious and error-prone task that requires expert knowledge. We
propose a new paradigm to alleviate this problem: implementing algorithmic
image filtering techniques as differentiable operations that can learn
parametrizations aligned to certain reference styles. To this end, we present
WISE, an example-based image-processing system that can handle a multitude of
stylization techniques, such as watercolor, oil or cartoon stylization, within
a common framework. By training parameter prediction networks for global and
local filter parameterizations, we can simultaneously adapt effects to
reference styles and image content, e.g., to enhance facial features. Our
method can be optimized in a style-transfer framework or learned in a
generative-adversarial setting for image-to-image translation. We demonstrate
that jointly training an XDoG filter and a CNN for postprocessing can achieve
comparable results to a state-of-the-art GAN-based method.
Authors' comments: Accepted to ECCV
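For reference, the XDoG filter mentioned above is commonly defined as a difference of two Gaussian blurs, $D = G_{\sigma} - \tau\,G_{k\sigma}$, followed by a soft threshold $T(u) = 1$ if $u \ge \varepsilon$ and $T(u) = 1 + \tanh(\varphi\,(u - \varepsilon))$ otherwise. The sketch below applies this to a 1-D grayscale signal. Each step is (piecewise) differentiable in $\sigma, k, \tau, \varepsilon, \varphi$, which is what makes tuning such parameters by gradient descent possible; the default values are arbitrary assumptions, and this is not WISE's implementation.

```python
import math


def gaussian_blur_1d(signal, sigma):
    """Blur a 1-D signal with a normalized Gaussian kernel (edges clamped)."""
    radius = max(1, math.ceil(3 * sigma))
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in offsets]
    total = sum(weights)
    n = len(signal)
    out = []
    for x in range(n):
        acc = 0.0
        for i, w in zip(offsets, weights):
            acc += w * signal[min(max(x + i, 0), n - 1)]
        out.append(acc / total)
    return out


def xdog_1d(signal, sigma=1.0, k=1.6, tau=0.98, eps=0.01, phi=10.0):
    """Extended difference-of-Gaussians with a soft (tanh) threshold.

    Output lies in [0, 1]; values below 1 form the darkened, stylized response.
    """
    g1 = gaussian_blur_1d(signal, sigma)
    g2 = gaussian_blur_1d(signal, k * sigma)
    out = []
    for a, b in zip(g1, g2):
        u = a - tau * b
        out.append(1.0 if u >= eps else 1.0 + math.tanh(phi * (u - eps)))
    return out


step = [0.9] * 10 + [0.1] * 10   # bright-to-dark edge
print([round(v, 2) for v in xdog_1d(step)])
```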
Ankit Mishra, Sarika Jalan
Localization behaviours of Laplacian eigenvectors of complex networks provide
insight into various dynamical phenomena on the corresponding complex
systems. We numerically investigate the role of hyperedges in driving eigenvector
localization of hypergraph Laplacians. By defining a single parameter \gamma
which measures the relative strengths of pair-wise and higher-order
interactions, we analyze the impact of interactions on localization properties.
For \gamma < 1, pair-wise links have no impact on eigenvector
localization, while the higher-order interactions instigate localization of the
eigenvectors corresponding to larger eigenvalues. For \gamma > 1, pair-wise
interactions cause localization of eigenvectors corresponding to small
eigenvalues, whereas higher-order interactions, despite being far outweighed by
the pair-wise links, keep driving localization of the eigenvectors
corresponding to larger eigenvalues. The results will be useful for
understanding dynamical phenomena such as diffusion and random walks on a range
of real-world complex systems having higher-order interactions.
Authors' comments: 8 pages, 7 figures
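Eigenvector localization is commonly quantified with the inverse participation ratio, $\mathrm{IPR}(v) = \sum_i v_i^4 / (\sum_i v_i^2)^2$, which equals $1/N$ for a vector spread evenly over $N$ nodes and $1$ for a vector concentrated on a single node. The abstract does not name its measure, so treat this as an assumed, standard choice; the snippet only computes the quantity for a given eigenvector.

```python
def inverse_participation_ratio(v):
    """IPR of a vector: sum(v_i^4) / (sum(v_i^2))^2 (independent of normalization)."""
    sq = [x * x for x in v]
    return sum(s * s for s in sq) / sum(sq) ** 2


n = 100
delocalized = [1.0] * n   # spread evenly over all nodes
localized = [0.0] * n
localized[7] = 1.0        # concentrated on a single node
print(inverse_participation_ratio(delocalized))  # 0.01
print(inverse_participation_ratio(localized))    # 1.0
```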
Li Shen, Yongpeng Wu, Derrick Wing Kwan Ng, Wenjun Zhang, Xiang-Gen Xia
In this letter, we propose a symbol-wise puncturing scheme to support hybrid
automatic repeat request (HARQ) integrated with probabilistic amplitude shaping
(PAS). To prevent the probability distribution distortion caused by
traditional sequential puncturing and to realize the promised gain of PAS, we
perform symbol-wise puncturing on the label sequence of the shaped modulation
symbols. Our simulation results indicate that the proposed puncturing scheme
achieves a stable shaping gain of at least 0.6 dB across signal-to-noise ratios
compared with the uniform case under the same throughput, while the gain of
sequential puncturing drops rapidly with retransmissions. Moreover, in each
transmission, the proposed scheme reduces the distribution distortion,
achieving over 1.2 dB gain at a block error rate (BLER) of ${10^{-3}}$. In
contrast, with sequential puncturing, the distribution is severely distorted and
the BLER performance is even worse than that of the uniform case in
retransmissions.
Authors' comments: 5 pages. This paper has been accepted by IEEE Wireless Communications
Letters
Tim Alderson, Benjamin Morine
A combinatorial problem concerning the maximum size of the (Hamming) weight
set of an $[n,k]_q$ linear code was recently introduced. Codes attaining the
established upper bound are the Maximum Weight Spectrum (MWS) codes. Those
$[n,k]_q $ codes with the same weight set as $ \mathbb{F}_q^n $ are called Full
Weight Spectrum (FWS) codes. FWS codes are necessarily ``short", whereas MWS
codes are necessarily ``long". For fixed $ k,q $ the values of $ n $ for which
an $ [n,k]_q $-FWS code exists are completely determined, but the determination
of the minimum length $ M(\mathcal{H},k,q) $ of an $ [n,k]_q $-MWS code remains an open
problem. The current work broadens the discussion first to general coordinate-wise
weight functions, and then specifically to the Lee weight and a Manhattan-like
weight. In the general case we provide bounds on $ n $ for which an FWS code
exists, and bounds on $ n $ for which an MWS code exists. When specializing to
the Lee or to the Manhattan setting we are able to completely determine the
parameters of FWS codes. As in the Hamming case, we are able to provide an
upper bound on $ M(\mathcal{L},k,q) $ (the minimum length of Lee MWS codes),
and pose the determination of $ M(\mathcal{L},k,q) $ as an open problem. On the
other hand, with respect to the Manhattan weight we completely determine the
parameters of MWS codes.
Authors' comments: 17 pages
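For small parameters, the weight set of a linear code can simply be enumerated from a generator matrix, which makes the FWS and MWS definitions easy to experiment with. The sketch below assumes $q$ is prime (so $\mathbb{F}_q$ is the integers mod $q$ and the Lee weight of $a$ is $\min(a, q - a)$); the function names are illustrative.

```python
from itertools import product


def codewords(generator, q):
    """All codewords of the linear code spanned by the rows of `generator` over Z_q (q prime)."""
    k, n = len(generator), len(generator[0])
    for coeffs in product(range(q), repeat=k):
        yield tuple(sum(c * row[j] for c, row in zip(coeffs, generator)) % q
                    for j in range(n))


def hamming_weight(word, q):
    # q is unused here; it is kept so all weight functions share one signature.
    return sum(1 for a in word if a != 0)


def lee_weight(word, q):
    return sum(min(a, q - a) for a in word)


def weight_set(generator, q, weight):
    return {weight(w, q) for w in codewords(generator, q)}


# A [3, 2]_2 code whose Hamming weight set equals that of F_2^3, i.e. {0, 1, 2, 3}.
g = [[1, 0, 0],
     [0, 1, 1]]
print(weight_set(g, 2, hamming_weight))  # {0, 1, 2, 3}

# Lee weights of a [2, 1]_5 code spanned by (1, 2): every nonzero codeword has Lee weight 3.
print(weight_set([[1, 2]], 5, lee_weight))  # {0, 3}
```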
Vy Vo, Van Nguyen, Trung Le, Quan Hung Tran, Gholamreza Haffari, Seyit Camtepe, Dinh Phung
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system. A large number of interpreting methods focus on identifying explanatory input features, which generally fall into two main categories: attribution and selection. A popular attribution-based approach is to exploit local neighborhoods for learning instance-specific explainers in an additive manner. Because an explainer must be learned for each instance, this process is inefficient and susceptible to poorly-conditioned samples. Meanwhile, many selection-based methods directly optimize local feature distributions in an instance-wise training framework, and can thereby leverage global information from other inputs. However, they can only interpret single-class predictions, and many suffer from inconsistency across different settings due to a strict reliance on a pre-defined number of selected features. This work exploits the strengths of both approaches and proposes a framework for learning local explanations simultaneously for multiple target classes. Our model explainer significantly outperforms additive and instance-wise counterparts on faithfulness, with more compact and comprehensible explanations. We also demonstrate the capacity to select stable and important features through extensive experiments on various data sets and black-box model architectures.
R. Sekhar Chivukula, Elizabeth H. Simmons, Xing Wang
In this work we demonstrate that the mixed gravitational and scalar sectors
of the five-dimensional Goldberger-Wise (GW) model, in which the size of a
warped extra dimension is dynamically determined, have a "hidden" dual $N=2$
supersymmetric structure. This symmetry structure, a generalization of one
found in the unstabilized Randall-Sundrum model, is a result of the
spontaneously broken five-dimensional diffeomorphism invariance of the
underlying gravitational theory. The supersymmetries relate the properties of
the spin-1 and spin-0 modes "eaten" by the massive spin-2 Kaluza-Klein states
of the theory to the mode functions of the spin-2 modes. Because the symmetries
relate the couplings and masses of the massive spin-2 states to those of the
tower of physical spin-0 states of the GW model, they enable us to analytically
prove the sum rule relations which ensure the tree-level scattering amplitudes
of the massive spin-2 states will grow no faster than ${\cal O}(s)$. The
analysis given here also explains the unconventional forms of the spin-0 mode
equation, boundary condition(s), and normalization found in the GW model.
Authors' comments: 32 pages
Dugang Liu, Pengxiang Cheng, Hong Zhu, Xing Tang, Yanyu Chen, Xiaoting Wang, Weike Pan, Zhong Ming et al.
Tabular data is one of the most common data storage formats behind many
real-world web applications such as retail, banking, and e-commerce. The
success of these web applications largely depends on the ability of the
employed machine learning model to accurately distinguish influential features
from all the predetermined features in tabular data. Intuitively, in practical
business scenarios, different instances should correspond to different sets of
influential features, and the set of influential features of the same instance
may vary in different scenarios. However, most existing methods focus on global
feature selection, assuming that all instances have the same set of influential
features, and the few methods that consider instance-wise feature selection ignore
the variability of influential features in different scenarios. In this paper,
we first introduce a new perspective based on the influence function for
instance-wise feature selection, and give some corresponding theoretical
insights, the core of which is to use the influence function as an indicator to
measure the importance of an instance-wise feature. We then propose a new
solution for discovering instance-wise influential features in tabular data
(DIWIFT), where a self-attention network is used as a feature selection model
and the value of the corresponding influence function is used as an
optimization objective to guide the model. Benefiting from the advantage of the
influence function, i.e., its computation does not depend on a specific
architecture and can also take into account the data distribution in different
scenarios, our DIWIFT has better flexibility and robustness. Finally, we
conduct extensive experiments on both synthetic and real-world datasets to
validate the effectiveness of our DIWIFT.
Authors' comments: Accepted by TheWebConf 2023 Research Tracks
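As background for the influence-function perspective, the classic influence of up-weighting a training example $z$ on a model's parameters is $-H^{-1}\nabla_\theta L(z, \hat\theta)$, where $H$ is the Hessian of the training objective. The sketch below computes it in closed form for a one-parameter least-squares model and checks it against a finite-difference re-fit. It illustrates the generic tool only, under that toy-model assumption; DIWIFT's instance-wise, feature-level formulation is not reproduced here.

```python
def fit(xs, ys, weights):
    """Weighted least squares for y ~ theta * x: minimizes sum_i w_i * 0.5 * (theta * x_i - y_i)^2."""
    num = sum(w * x * y for x, y, w in zip(xs, ys, weights))
    den = sum(w * x * x for x, w in zip(xs, weights))
    return num / den


def influence_on_params(xs, ys, j):
    """d theta / d eps when training example j is up-weighted by eps: -H^{-1} * grad L_j."""
    theta = fit(xs, ys, [1.0] * len(xs))
    hessian = sum(x * x for x in xs)            # second derivative of the training objective
    grad_j = xs[j] * (theta * xs[j] - ys[j])    # gradient of example j's loss at theta
    return -grad_j / hessian


xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]
j = 2
analytic = influence_on_params(xs, ys, j)

# Finite-difference check: re-fit with example j up-weighted by a small eps.
eps = 1e-6
weights = [1.0] * len(xs)
weights[j] += eps
numeric = (fit(xs, ys, weights) - fit(xs, ys, [1.0] * len(xs))) / eps
print(analytic, numeric)  # the two values agree to several decimal places
```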
Tengyu Xu, Yue Wang, Shaofeng Zou, Yingbin Liang
The remarkable success of reinforcement learning (RL) heavily relies on
observing the reward of every visited state-action pair. In many real-world
applications, however, an agent can observe only a score that represents the
quality of the whole trajectory, which is referred to as the {\em
trajectory-wise reward}. In such a situation, it is difficult for standard RL
methods to make good use of the trajectory-wise reward, and large bias and variance
errors can be incurred in policy evaluation. In this work, we propose a novel
offline RL algorithm, called Pessimistic vAlue iteRaTion with rEward
Decomposition (PARTED), which decomposes the trajectory return into per-step
proxy rewards via least-squares-based reward redistribution, and then performs
pessimistic value iteration based on the learned proxy reward. To ensure the
value functions constructed by PARTED are always pessimistic with respect to
the optimal ones, we design a new penalty term to offset the uncertainty of the
proxy reward. For general episodic MDPs with large state space, we show that
PARTED with overparameterized neural network function approximation achieves an
$\tilde{\mathcal{O}}(D_{\text{eff}}H^2/\sqrt{N})$ suboptimality, where $H$ is
the length of episode, $N$ is the total number of samples, and $D_{\text{eff}}$
is the effective dimension of the neural tangent kernel matrix. To further
illustrate the result, we show that PARTED achieves an
$\tilde{\mathcal{O}}(dH^3/\sqrt{N})$ suboptimality with linear MDPs, where $d$
is the feature dimension, which matches the neural network function
approximation result when $D_{\text{eff}}=dH$. To the best of our knowledge, PARTED
is the first offline RL algorithm that is provably efficient in general MDPs
with trajectory-wise reward.
Authors' comments: Submitted for IEEE Transactions on Information Theory
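The least-squares reward redistribution step can be illustrated in the tabular case: if each trajectory's return is modeled as the sum of unknown per-(state, action) rewards, the proxy rewards solve an ordinary least-squares problem over visit counts. The sketch below assumes one-hot (tabular) features and noise-free returns, and omits PARTED's pessimism penalty and value iteration; all names are hypothetical.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a @ x = b."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def redistribute_rewards(trajectories, returns, num_pairs):
    """Least-squares proxy rewards r minimizing sum_tau (R_tau - sum_t r[pair_t])^2.

    trajectories: lists of (state, action) pair indices in 0..num_pairs-1.
    returns:      the trajectory-wise reward observed for each trajectory.
    """
    # Row tau of the design matrix counts how often each pair occurs in trajectory tau.
    design = []
    for traj in trajectories:
        counts = [0.0] * num_pairs
        for pair in traj:
            counts[pair] += 1.0
        design.append(counts)
    # Normal equations: (A^T A) r = A^T R.
    ata = [[sum(row[i] * row[j] for row in design) for j in range(num_pairs)]
           for i in range(num_pairs)]
    atr = [sum(row[i] * ret for row, ret in zip(design, returns)) for i in range(num_pairs)]
    return solve(ata, atr)


true_r = [1.0, -0.5, 2.0]
trajs = [[0, 1], [1, 2], [0, 2], [0, 0, 1]]
rets = [sum(true_r[p] for p in t) for t in trajs]  # only trajectory totals are observed
print(redistribute_rewards(trajs, rets, 3))  # recovers approximately [1.0, -0.5, 2.0]
```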
Xu Zhang, Guodong Li, Catherine C. Liu, Jianhua Guo
High-dimensional, higher-order tensor data are gaining prominence in a variety of fields, including but not limited to computer vision and network analysis. Tensor factor models, induced from noisy versions of tensor decompositions or factorizations, are natural and potent instruments for studying a collection of tensor-variate objects that may be dependent or independent. However, statistical inferential theory for estimating the various low-rank structures, which customarily play the role of signals in tensor factor models, is still at an early stage of development. In this paper, we attempt to ``decode" the estimation of a higher-order tensor factor model by leveraging tensor matricization. Specifically, we recast it into mode-wise traditional high-dimensional vector/fiber factor models, enabling the deployment of conventional principal components analysis (PCA) for estimation. Using the Tucker tensor factor model (TuTFaM), which is induced from the noisy version of the widely-used Tucker decomposition, as an example, we show that estimation of the signal components essentially amounts to mode-wise PCA, and that projection and iteration enhance the signal-to-noise ratio to varying extents. We establish the inferential theory of the proposed estimators, conduct rich simulation experiments, and illustrate how the proposed estimators work in tensor reconstruction and clustering for independent video and dependent economic datasets, respectively.
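Matricization is what turns the tensor problem into ordinary PCA: the mode-$n$ unfolding arranges the mode-$n$ fibers of a tensor as the columns of a matrix, and the leading eigenvectors of that matrix times its transpose estimate the mode-$n$ loading space. The sketch below does this for a 3-way tensor stored as nested lists, using power iteration for the leading eigenvector. It is a toy illustration of plain mode-wise PCA under our own conventions, not the paper's projected or iterative estimators.

```python
import math


def unfold(tensor, mode):
    """Mode-`mode` unfolding of a 3-way tensor given as nested lists tensor[i0][i1][i2].

    Row index = index along `mode`; the two remaining indices are flattened into the
    column index, with the lower-numbered remaining mode varying fastest.
    """
    dims = (len(tensor), len(tensor[0]), len(tensor[0][0]))
    others = [m for m in range(3) if m != mode]
    mat = [[0.0] * (dims[others[0]] * dims[others[1]]) for _ in range(dims[mode])]
    for i0 in range(dims[0]):
        for i1 in range(dims[1]):
            for i2 in range(dims[2]):
                idx = (i0, i1, i2)
                col = idx[others[0]] + dims[others[0]] * idx[others[1]]
                mat[idx[mode]][col] = tensor[i0][i1][i2]
    return mat


def leading_eigvec(mat, iters=100):
    """Leading eigenvector of mat @ mat^T by power iteration (mat is rows x cols)."""
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in mat] for r1 in mat]
    v = [1.0] * len(gram)
    for _ in range(iters):
        w = [sum(g * x for g, x in zip(row, v)) for row in gram]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Rank-one tensor (outer product of a, b, c): mode-0 PCA should recover a / ||a||.
a, b, c = [1.0, 2.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]
X = [[[ai * bj * ck for ck in c] for bj in b] for ai in a]
print(leading_eigvec(unfold(X, 0)))  # approximately [0.447, 0.894]
```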
Yuezihan Jiang, Hao Yang, Junyang Lin, Hanyu Zhao, An Yang, Chang Zhou, Hongxia Yang, Zhi Yang et al.
Prompt Learning has recently gained great popularity in bridging the gap between pretraining tasks and various downstream tasks. It freezes Pretrained Language Models (PLMs) and only tunes a few task-related parameters (prompts) for downstream tasks, greatly reducing the cost of tuning giant models. The key enabler of this is the idea of querying PLMs with task-specific knowledge encoded in prompts. This paper reveals a major limitation of existing methods: using the same prompts indiscriminately for all input data in a task ignores the intrinsic knowledge in the input data, resulting in sub-optimal performance. We introduce Instance-wise Prompt Tuning (IPT), the first prompt learning paradigm that injects knowledge from the input data instances into the prompts, thereby providing PLMs with richer and more concrete context information. We devise a series of strategies to produce instance-wise prompts, addressing various concerns like model quality and cost-efficiency. Across multiple tasks and resource settings, IPT significantly outperforms task-based prompt learning methods, and achieves performance comparable to conventional finetuning with only 0.5% - 1.5% of tuned parameters.
Sándor Kunsági-Máté, Róbert Beck, István Szapudi, István Csabai
Three-dimensional wide-field galaxy surveys are fundamental for cosmological studies. At higher redshifts (z > 1.0), where galaxies are too faint, quasars still trace the large-scale structure of the Universe. Since available telescope time limits spectroscopic surveys, photometric methods are an efficient way to estimate redshifts for many quasars. Machine learning methods have recently become increasingly successful at estimating quasar photometric redshifts; however, they depend on the distribution of the training set, so a rigorous estimation of reliability is critical. We extracted optical and infrared photometric data from the cross-matched catalogue of the WISE All-Sky and PS1 3$\pi$ DR2 sky surveys. We trained an XGBoost regressor and an artificial neural network on the relation between color indices and spectroscopic redshift. We approximated the effective training set coverage with the K nearest neighbors algorithm. We estimated reliable photometric redshifts of 2,879,298 quasars which overlap with the training set in feature space. We validated the derived redshifts with an independent, clustering-based redshift estimation technique. The final catalog is publicly available.
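One way to approximate training-set coverage with $K$ nearest neighbors is to flag a target as covered when its mean distance to the $K$ closest training points in color space is no larger than a threshold. The sketch below assumes that criterion, Euclidean distance, and a hypothetical `threshold`; the paper's actual coverage rule and its calibration are not described in the abstract.

```python
import math


def knn_mean_distance(point, training, k):
    """Mean Euclidean distance from `point` to its k nearest training points."""
    dists = sorted(math.dist(point, t) for t in training)
    return sum(dists[:k]) / k


def is_covered(point, training, k=3, threshold=0.5):
    """Treat `point` as inside the training coverage if its k-NN mean distance <= threshold."""
    return knn_mean_distance(point, training, k) <= threshold


train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (5.0, 5.0)]  # color-index vectors
print(is_covered((0.15, 0.15), train))  # True: surrounded by training examples
print(is_covered((3.0, 3.0), train))    # False: far from most of the training set
```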
Yifan Chen, Tianning Xu, Dilek Hakkani-Tur, Di Jin, Yun Yang, Ruoqing Zhu
Multiple sampling-based methods have been developed for approximating and
accelerating node embedding aggregation in graph convolutional network (GCN)
training. Among them, a layer-wise approach recursively performs importance
sampling to select neighbors jointly for existing nodes in each layer. This
paper revisits the approach from a matrix approximation perspective, and
identifies two issues in the existing layer-wise sampling methods: suboptimal
sampling probabilities and estimation biases induced by sampling without
replacement. To address these issues, we accordingly propose two remedies: a
new principle for constructing sampling probabilities and an efficient
debiasing algorithm. The improvements are demonstrated by extensive analyses of
estimation variance and experiments on common benchmarks. Code and algorithm
implementations are publicly available at
https://github.com/ychen-stat-ml/GCN-layer-wise-sampling .
Authors' comments: Published at TMLR. Code is available at
https://github.com/ychen-stat-ml/GCN-layer-wise-sampling
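The matrix approximation view can be made concrete with the classic sampling estimator for a product $AB$: draw column indices $i$ with probabilities $p_i$ and average $A_{:,i}B_{i,:}/p_i$. The estimator is unbiased when sampling with replacement, and $p_i \propto \lVert A_{:,i}\rVert\,\lVert B_{i,:}\rVert$ is the well-known choice that minimizes its expected squared Frobenius error. The sketch below is that textbook construction, not the new sampling principle or debiasing algorithm proposed in the paper; it also checks unbiasedness by computing the exact expectation of a single draw.

```python
import math
import random


def sampling_probs(a, b):
    """p_i proportional to ||A[:, i]|| * ||B[i, :]||."""
    scores = [math.sqrt(sum(row[i] ** 2 for row in a)) * math.sqrt(sum(x ** 2 for x in b[i]))
              for i in range(len(b))]
    total = sum(scores)
    return [s / total for s in scores]


def outer_term(a, b, i, scale):
    """scale * A[:, i] B[i, :] as a dense matrix."""
    return [[scale * a[r][i] * b[i][c] for c in range(len(b[0]))] for r in range(len(a))]


def sampled_product(a, b, s, rng):
    """Unbiased estimate of A @ B from s column/row pairs drawn with replacement."""
    p = sampling_probs(a, b)
    est = [[0.0] * len(b[0]) for _ in range(len(a))]
    for i in rng.choices(range(len(p)), weights=p, k=s):
        term = outer_term(a, b, i, 1.0 / (s * p[i]))
        est = [[e + t for e, t in zip(er, tr)] for er, tr in zip(est, term)]
    return est


A = [[1.0, 2.0, 0.5], [0.0, 1.0, 3.0]]
B = [[1.0, 0.0], [2.0, 1.0], [0.5, 4.0]]
print(sampled_product(A, B, s=2, rng=random.Random(0)))  # a random estimate of A @ B

# Exact expectation of a single draw: sum_i p_i * A[:, i] B[i, :] / p_i = A @ B.
p = sampling_probs(A, B)
expected = [[sum(p[i] * outer_term(A, B, i, 1.0 / p[i])[r][c] for i in range(3))
             for c in range(2)] for r in range(2)]
print(expected)  # matches A @ B = [[5.25, 4.0], [3.5, 13.0]] up to rounding
```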