Jiuyu Liu, Yi Ma, Rahim Tafazolli
Numerous low-complexity iterative algorithms have been proposed to offer the
performance of linear multiple-input multiple-output (MIMO) detectors bypassing
the channel matrix inverse. These algorithms exhibit fast convergence in
well-conditioned MIMO channels. However, in the emerging MIMO paradigm
utilizing extremely large aperture arrays (ELAA), the wireless channel may
become ill-conditioned because of spatial non-stationarity, which results in a
considerably slower convergence rate for these algorithms. In this paper, we
propose a novel ELAA-MIMO detection scheme that leverages user-wise singular
value decomposition (UW-SVD) to accelerate the convergence of these iterative
algorithms. By applying UW-SVD, the MIMO signal model can be converted into an
equivalent form featuring a better-conditioned transfer function. Then,
existing iterative algorithms can be utilized to recover the transmitted signal
from the converted signal model with accelerated convergence towards
zero-forcing performance. Our simulation results indicate that proposed UW-SVD
scheme can significantly accelerate the convergence of the iterative algorithms
in spatially non-stationary ELAA channels. Moreover, the computational
complexity of the UW-SVD is comparatively minor in relation to the inherent
complexity of the iterative algorithms.
Authors' comments: Legend correction to Fig. 2(b)
Paul Primus and, Gerhard Widmer
Varying conditions between the data seen at training and at application time
remain a major challenge for machine learning. We study this problem in the
context of Acoustic Scene Classification (ASC) with mismatching recording
devices. Previous works successfully employed frequency-wise normalization of
inputs and hidden layer activations in convolutional neural networks to reduce
the recording device discrepancy. The main objective of this work was to adopt
frequency-wise normalization for Audio Spectrogram Transformers (ASTs), which
have recently become the dominant model architecture in ASC. To this end, we
first investigate how recording device characteristics are encoded in the
hidden layer activations of ASTs. We find that recording device information is
initially encoded in the frequency dimension; however, after the first
self-attention block, it is largely transformed into the token dimension. Based
on this observation, we conjecture that suppressing recording device
characteristics in the input spectrogram is the most effective. We propose a
frequency-centering operation for spectrograms that improves the ASC
performance on unseen recording devices on average by up to 18.2 percentage
points.
Authors' comments: EUSIPCO 2023
Lan Anh Thi Nguy, Bach Nguyen Gia, Thanh Tu Thi Nguyen, Kamioka Eiji, Tan Xuan Phan
Eye blinking detection in the wild plays an essential role in deception detection, driving fatigue detection, etc. Despite the fact that numerous attempts have already been made, the majority of them have encountered difficulties, such as the derived eye images having different resolutions as the distance between the face and the camera changes; or the requirement of a lightweight detection model to obtain a short inference time in order to perform in real-time. In this research, two problems are addressed: how the eye blinking detection model can learn efficiently from different resolutions of eye pictures in diverse conditions; and how to reduce the size of the detection model for faster inference time. We propose to utilize upsampling and downsampling the input eye images to the same resolution as one potential solution for the first problem, then find out which interpolation method can result in the highest performance of the detection model. For the second problem, although a recent spatiotemporal convolutional neural network used for eye blinking detection has a strong capacity to extract both spatial and temporal characteristics, it remains having a high number of network parameters, leading to high inference time. Therefore, using Depth-wise Separable Convolution rather than conventional convolution layers inside each branch is considered in this paper as a feasible solution.
Dong Xing, Pengjie Gu, Qian Zheng, Xinrun Wang, Shanqi Liu, Longtao Zheng, Bo An, Gang Pan
Ad hoc teamwork requires an agent to cooperate with unknown teammates without
prior coordination. Many works propose to abstract teammate instances into
high-level representation of types and then pre-train the best response for
each type. However, most of them do not consider the distribution of teammate
instances within a type. This could expose the agent to the hidden risk of
\emph{type confounding}. In the worst case, the best response for an abstract
teammate type could be the worst response for all specific instances of that
type. This work addresses the issue from the lens of causal inference. We first
theoretically demonstrate that this phenomenon is due to the spurious
correlation brought by uncontrolled teammate distribution. Then, we propose our
solution, CTCAT, which disentangles such correlation through an instance-wise
teammate feedback rectification. This operation reweights the interaction of
teammate instances within a shared type to reduce the influence of type
confounding. The effect of CTCAT is evaluated in multiple domains, including
classic ad hoc teamwork tasks and real-world scenarios. Results show that CTCAT
is robust to the influence of type confounding, a practical issue that directly
hazards the robustness of our trained agents but was unnoticed in previous
works.
Authors' comments: Accepted by ICML 2023
Claudia Chinea Hammecher, Karin van Garderen, Marion Smits, Pieter Wesseling, Bart Westerman, Pim French, Mathilde Kouwenhoven, Roel Verhaak et al.
Glioma growth may be quantified with longitudinal image registration.
However, the large mass-effects and tissue changes across images pose an added
challenge. Here, we propose a longitudinal, learning-based, and groupwise
registration method for the accurate and unbiased registration of glioma MRI.
We evaluate on a dataset from the Glioma Longitudinal AnalySiS consortium and
compare it to classical registration methods. We achieve comparable Dice
coefficients, with more detailed registrations, while significantly reducing
the runtime to under a minute. The proposed methods may serve as an alternative
to classical toolboxes, to provide further insight into glioma growth.
Authors' comments: Digital poster presented at the annual meeting of the International
Society for Magnetic Resonance in Medicine (ISMRM) 2023. A 6 minute video
about this work is available for browsing by the conference website (Program
number: 4361)
Xiaoyun Li, Ping Li
Minwise hashing (MinHash) is a standard algorithm widely used in the industry, for large-scale search and learning applications with the binary (0/1) Jaccard similarity. One common use of MinHash is for processing massive n-gram text representations so that practitioners do not have to materialize the original data (which would be prohibitive). Another popular use of MinHash is for building hash tables to enable sub-linear time approximate near neighbor (ANN) search. MinHash has also been used as a tool for building large-scale machine learning systems. The standard implementation of MinHash requires applying $K$ random permutations. In comparison, the method of one permutation hashing (OPH), is an efficient alternative of MinHash which splits the data vectors into $K$ bins and generates hash values within each bin. OPH is substantially more efficient and also more convenient to use. In this paper, we combine the differential privacy (DP) with OPH (as well as MinHash), to propose the DP-OPH framework with three variants: DP-OPH-fix, DP-OPH-re and DP-OPH-rand, depending on which densification strategy is adopted to deal with empty bins in OPH. A detailed roadmap to the algorithm design is presented along with the privacy analysis. An analytical comparison of our proposed DP-OPH methods with the DP minwise hashing (DP-MH) is provided to justify the advantage of DP-OPH. Experiments on similarity search confirm the merits of DP-OPH, and guide the choice of the proper variant in different practical scenarios. Our technique is also extended to bin-wise consistent weighted sampling (BCWS) to develop a new DP algorithm called DP-BCWS for non-binary data. Experiments on classification tasks demonstrate that DP-BCWS is able to achieve excellent utility at around $\epsilon = 5\sim 10$, where $\epsilon$ is the standard parameter in the language of $(\epsilon, \delta)$-DP.
Donghao Li, Ruiquan Huang, Cong Shen, Jing Yang
This paper investigates conservative exploration in reinforcement learning
where the performance of the learning agent is guaranteed to be above a certain
threshold throughout the learning process. It focuses on the tabular episodic
Markov Decision Process (MDP) setting that has finite states and actions. With
the knowledge of an existing safe baseline policy, an algorithm termed as
StepMix is proposed to balance the exploitation and exploration while ensuring
that the conservative constraint is never violated in each episode with high
probability. StepMix features a unique design of a mixture policy that
adaptively and smoothly interpolates between the baseline policy and the
optimistic policy. Theoretical analysis shows that StepMix achieves
near-optimal regret order as in the constraint-free setting, indicating that
obeying the stringent episode-wise conservative constraint does not compromise
the learning performance. Besides, a randomization-based EpsMix algorithm is
also proposed and shown to achieve the same performance as StepMix. The
algorithm design and theoretical analysis are further extended to the setting
where the baseline policy is not given a priori but must be learned from an
offline dataset, and it is proved that similar conservative guarantee and
regret can be achieved if the offline dataset is sufficiently large. Experiment
results corroborate the theoretical analysis and demonstrate the effectiveness
of the proposed conservative exploration strategies.
Authors' comments: Accepted by ICML2023
Yan Sun, Li Shen, Dacheng Tao
Federated learning (FL) is a distributed paradigm that coordinates massive
local clients to collaboratively train a global model via stage-wise local
training processes on the heterogeneous dataset. Previous works have implicitly
studied that FL suffers from the ``client-drift'' problem, which is caused by
the inconsistent optimum across local clients. However, till now it still lacks
solid theoretical analysis to explain the impact of this local inconsistency.
To alleviate the negative impact of the ``client drift'' and explore its
substance in FL, in this paper, we first design an efficient FL algorithm
\textit{FedInit}, which allows employing the personalized relaxed
initialization state at the beginning of each local training stage.
Specifically, \textit{FedInit} initializes the local state by moving away from
the current global state towards the reverse direction of the latest local
state. This relaxed initialization helps to revise the local divergence and
enhance the local consistency level. Moreover, to further understand how
inconsistency disrupts performance in FL, we introduce the excess risk analysis
and study the divergence term to investigate the test error of the proposed
\textit{FedInit} method. Our studies show that optimization error is not
sensitive to this local inconsistency, while it mainly affects the
generalization error bound in \textit{FedInit}. Extensive experiments are
conducted to validate this conclusion. Our proposed \textit{FedInit} could
achieve state-of-the-art~(SOTA) results compared to several advanced benchmarks
without any additional costs. Meanwhile, stage-wise relaxed initialization
could also be incorporated into the current advanced algorithms to achieve
higher performance in the FL paradigm.
Authors' comments: 32 pages
Jiachen Yao, Chang Su, Zhongkai Hao, Songming Liu, Hang Su, Jun Zhu
Physics-informed Neural Networks (PINNs) have recently achieved remarkable progress in solving Partial Differential Equations (PDEs) in various fields by minimizing a weighted sum of PDE loss and boundary loss. However, there are several critical challenges in the training of PINNs, including the lack of theoretical frameworks and the imbalance between PDE loss and boundary loss. In this paper, we present an analysis of second-order non-homogeneous PDEs, which are classified into three categories and applicable to various common problems. We also characterize the connections between the training loss and actual error, guaranteeing convergence under mild conditions. The theoretical analysis inspires us to further propose MultiAdam, a scale-invariant optimizer that leverages gradient momentum to parameter-wisely balance the loss terms. Extensive experiment results on multiple problems from different physical domains demonstrate that our MultiAdam solver can improve the predictive accuracy by 1-2 orders of magnitude compared with strong baselines.
Christine W. Bang, Vanessa Didelez
Equivalence classes of DAGs (represented by CPDAGs) may be too large to
provide useful causal information. Here, we address incorporating tiered
background knowledge yielding restricted equivalence classes represented by
'tiered MPDAGs'. Tiered knowledge leads to considerable gains in
informativeness and computational efficiency: We show that construction of
tiered MPDAGs only requires application of Meek's 1st rule, and that tiered
MPDAGs (unlike general MPDAGs) are chain graphs with chordal components. This
entails simplifications e.g. of determining valid adjustment sets for causal
effect estimation. Further, we characterise when one tiered ordering is more
informative than another, providing insights into useful aspects of background
knowledge.
Authors' comments: Accepted for the 39th Conference on Uncertainty in Artificial
Intelligence (UAI 2023)
Bernardo Carvalho, Elias Rego
We discuss different regularities on stable/unstable holonomies of
cw-hyperbolic homeomorphisms and prove that if a cw-hyperbolic homeomorphism
has continuous joint stable/unstable holonomies, then it has a dense set of
periodic points in its non-wandering set. For that, we prove that the
hyperbolic cw-metric (introduced in [9]) can be adapted to be self-similar (as
in [6]) and, in this case, continuous joint stable/unstable holonomies are
pseudo-isometric. We also prove transitivity of cw-hyperbolic homeomorphisms
assuming that the stable/unstable holonomies are isometric. In the case the
ambient space is a surface, we prove that a cw$_F$-hyperbolic homeomorphism has
continuous joint stable/unstable holonomies when every bi-asymptotic sector is
regular.
Authors' comments: 27 pages, 9 figures
Jung Hyun Lee, Jeonghoon Kim, Se Jung Kwon, Dongsoo Lee
Post-training quantization (PTQ) has been gaining popularity for the
deployment of deep neural networks on resource-limited devices since unlike
quantization-aware training, neither a full training dataset nor end-to-end
training is required at all. As PTQ schemes based on reconstructing each layer
or block output turn out to be effective to enhance quantized model
performance, recent works have developed algorithms to devise and learn a new
weight-rounding scheme so as to better reconstruct each layer or block output.
In this work, we propose a simple yet effective new weight-rounding mechanism
for PTQ, coined \emph{FlexRound}, based on element-wise division instead of
typical element-wise addition such that FlexRound enables jointly learning a
common quantization grid size as well as a different scale for each pre-trained
weight. Thanks to the reciprocal rule of derivatives induced by element-wise
division, FlexRound is inherently able to exploit pre-trained weights when
updating their corresponding scales, and thus, flexibly quantize pre-trained
weights depending on their magnitudes. We empirically validate the efficacy of
FlexRound on a wide range of models and tasks. To the best of our knowledge,
our work is the first to carry out comprehensive experiments on not only image
classification and natural language understanding but also natural language
generation. Moreover, we demonstrate, for the first time, that large language
models can be efficiently quantized, with only a negligible impact on
performance compared to half-precision baselines, achieved by reconstructing
the output in a block-by-block manner. Our code is available at
\url{https://github.com/onliwad101/FlexRound_LRQ}.
Authors' comments: Accepted to ICML 2023
Kangjun Liu, Ke Chen, Lihua Guo, Yaowei Wang, Kui Jia
Mixup style data augmentation algorithms have been widely adopted in various tasks as implicit network regularization on representation learning to improve model generalization, which can be achieved by a linear interpolation of labeled samples in input or feature space as well as target space. Inspired by good robustness of alternative dropout strategies against over-fitting on limited patterns of training samples, this paper introduces a novel concept of ShuffleMix -- Shuffle of Mixed hidden features, which can be interpreted as a kind of dropout operation in feature space. Specifically, our ShuffleMix method favors a simple linear shuffle of randomly selected feature channels for feature mixup in-between training samples to leverage semantic interpolated supervision signals, which can be extended to a generalized shuffle operation via additionally combining linear interpolations of intra-channel features. Compared to its direct competitor of feature augmentation -- the Manifold Mixup, the proposed ShuffleMix can gain superior generalization, owing to imposing more flexible and smooth constraints on generating samples and achieving regularization effects of channel-wise feature dropout. Experimental results on several public benchmarking datasets of single-label and multi-label visual classification tasks can confirm the effectiveness of our method on consistently improving representations over the state-of-the-art mixup augmentation.
Ana Carolina da Cruz, Camila P. E. de Souza
Change-point models deal with ordered data sequences. Their primary goal is
to infer the locations where an aspect of the data sequence changes. In this
paper, we propose and implement a nonparametric Bayesian model for clustering
observations based on their constant-wise change-point profiles via Gibbs
sampler. Our model incorporates a Dirichlet Process on the constant-wise
change-point structures to cluster observations while simultaneously performing
multiple change-point estimation. Additionally, our approach controls the
number of clusters in the model, not requiring the specification of the number
of clusters a priori. Satisfactory clustering and estimation results were
obtained when evaluating our method under various simulated scenarios and on a
real dataset from single-cell genomic sequencing. Our proposed methodology is
implemented as an R package called BayesCPclust and is available from the
Comprehensive R Archive Network at
https://CRAN.R-project.org/package=BayesCPclust.
Authors' comments: 30 pages, 12 figures
Minwoo Jung, Sangwoo Jung, Ayoung Kim
In recent years, multiple Light Detection and Ranging (LiDAR) systems have
grown in popularity due to their enhanced accuracy and stability from the
increased field of view (FOV). However, integrating multiple LiDARs can be
challenging, attributable to temporal and spatial discrepancies. Common
practice is to transform points among sensors while requiring strict time
synchronization or approximating transformation among sensor frames. Unlike
existing methods, we elaborate the inter-sensor transformation using
continuous-time (CT) inertial measurement unit (IMU) modeling and derive
associated ambiguity as a point-wise uncertainty. This uncertainty, modeled by
combining the state covariance with the acquisition time and point range,
allows us to alleviate the strict time synchronization and to overcome FOV
difference. The proposed method has been validated on both public and our
datasets and is compatible with various LiDAR manufacturers and scanning
patterns. We open-source the code for public access at
https://github.com/minwoo0611/MA-LIO.
Authors' comments: 9 pages, 10 figures
Achraf Bahamou, Donald Goldfarb
We propose a new per-layer adaptive step-size procedure for stochastic
first-order optimization methods for minimizing empirical loss functions in
deep learning, eliminating the need for the user to tune the learning rate
(LR). The proposed approach exploits the layer-wise stochastic curvature
information contained in the diagonal blocks of the Hessian in deep neural
networks (DNNs) to compute adaptive step-sizes (i.e., LRs) for each layer. The
method has memory requirements that are comparable to those of first-order
methods, while its per-iteration time complexity is only increased by an amount
that is roughly equivalent to an additional gradient computation. Numerical
experiments show that SGD with momentum and AdamW combined with the proposed
per-layer step-sizes are able to choose effective LR schedules and outperform
fine-tuned LR versions of these methods as well as popular first-order and
second-order algorithms for training DNNs on Autoencoder, Convolutional Neural
Network (CNN) and Graph Convolutional Network (GCN) models. Finally, it is
proved that an idealized version of SGD with the layer-wise step sizes
converges linearly when using full-batch gradients.
Authors' comments: requires revision
Na Dong, Yongqiang Zhang, Mingli Ding, Gim Hee Lee
Real-world data tends to follow a long-tailed distribution, where the class
imbalance results in dominance of the head classes during training. In this
paper, we propose a frustratingly simple but effective step-wise learning
framework to gradually enhance the capability of the model in detecting all
categories of long-tailed datasets. Specifically, we build smooth-tail data
where the long-tailed distribution of categories decays smoothly to correct the
bias towards head classes. We pre-train a model on the whole long-tailed data
to preserve discriminability between all categories. We then fine-tune the
class-agnostic modules of the pre-trained model on the head class dominant
replay data to get a head class expert model with improved decision boundaries
from all categories. Finally, we train a unified model on the tail class
dominant replay data while transferring knowledge from the head class expert
model to ensure accurate detection of all categories. Extensive experiments on
long-tailed datasets LVIS v0.5 and LVIS v1.0 demonstrate the superior
performance of our method, where we can improve the AP with ResNet-50 backbone
from 27.0% to 30.3% AP, and especially for the rare categories from 15.5% to
24.9% AP. Our best model using ResNet-101 backbone can achieve 30.7% AP, which
suppresses all existing detectors using the same backbone.
Authors' comments: 10 pages, 5 figures
Peize Li, Ruining Deng, Yuankai Huo
Tissue examination and quantification in a 3D context on serial section whole
slide images (WSIs) were laborintensive and time-consuming tasks. Our previous
study proposed a novel registration-based method (Map3D) to automatically align
WSIs to the same physical space, reducing the human efforts of screening serial
sections from WSIs. However, the registration performance of our Map3D method
was only evaluated on single-stain WSIs with large-scale kidney tissue samples.
In this paper, we provide a Docker for an end-to-end 3D slide-wise registration
pipeline on needle biopsy serial sections in a multi-stain paradigm. The
contribution of this study is three-fold: (1) We release a containerized Docker
for an end-to-end multi-stain WSI registration. (2) We prove that the Map3D
pipeline is capable of sectional registration from multi-stain WSI. (3) We
verify that the Map3D pipeline can also be applied to needle biopsy tissue
samples. The source code and the Docker have been made publicly available at
https://github.com/hrlblab/Map3D.
Authors' comments: 6 pages, 4 figures
Ming Hu, Zhihao Yue, Xiaofei Xie, Cheng Chen, Yihao Huang, Xian Wei, Xiang Lian, Yang Liu et al.
Although Federated Learning (FL) enables global model training across clients
without compromising their raw data, due to the unevenly distributed data among
clients, existing Federated Averaging (FedAvg)-based methods suffer from the
problem of low inference performance. Specifically, different data
distributions among clients lead to various optimization directions of local
models. Aggregating local models usually results in a low-generalized global
model, which performs worse on most of the clients. To address the above issue,
inspired by the observation from a geometric perspective that a
well-generalized solution is located in a flat area rather than a sharp area,
we propose a novel and heuristic FL paradigm named FedMR (Federated Model
Recombination). The goal of FedMR is to guide the recombined models to be
trained towards a flat area. Unlike conventional FedAvg-based methods, in
FedMR, the cloud server recombines collected local models by shuffling each
layer of them to generate multiple recombined models for local training on
clients rather than an aggregated global model. Since the area of the flat area
is larger than the sharp area, when local models are located in different
areas, recombined models have a higher probability of locating in a flat area.
When all recombined models are located in the same flat area, they are
optimized towards the same direction. We theoretically analyze the convergence
of model recombination. Experimental results show that, compared with
state-of-the-art FL methods, FedMR can significantly improve the inference
accuracy without exposing the privacy of each client.
Authors' comments: arXiv admin note: substantial text overlap with arXiv:2208.07677
Guiyu Zhao, Bo Qiu, A-Li Luo, Xiaoyu Guo, Lin Yao, Kun Wang, Yuanbo Liu
The Wide-field Infrared Survey Explorer (WISE) has detected hundreds of millions of sources over the entire sky. However, classifying them reliably is a great challenge due to degeneracies in WISE multicolor space and low detection levels in its two longest-wavelength bandpasses. In this paper, the deep learning classification network, IICnet (Infrared Image Classification network), is designed to classify sources from WISE images to achieve a more accurate classification goal. IICnet shows good ability on the feature extraction of the WISE sources. Experiments demonstrates that the classification results of IICnet are superior to some other methods; it has obtained 96.2% accuracy for galaxies, 97.9% accuracy for quasars, and 96.4% accuracy for stars, and the Area Under Curve (AUC) of the IICnet classifier can reach more than 99%. In addition, the superiority of IICnet in processing infrared images has been demonstrated in the comparisons with VGG16, GoogleNet, ResNet34, MobileNet, EfficientNetV2, and RepVGG-fewer parameters and faster inference. The above proves that IICnet is an effective method to classify infrared sources.