Ruoyang Liu, Chenhan Wei, Yixiong Yang, Wenxun Wang, Huazhong Yang, Yongpan Liu
Data quantization is an effective method to accelerate neural network
training and reduce power consumption. However, it is challenging to perform
low-bit quantized training: the conventional equal-precision quantization will
lead to either high accuracy loss or limited bit-width reduction, while
existing mixed-precision methods offer high compression potential but fail to
perform accurate and efficient bit-width assignment. In this work, we propose
DYNASTY, a block-wise dynamic-precision neural network training framework.
DYNASTY provides accurate data sensitivity information through fast online
analytics, and maintains stable training convergence with an adaptive bit-width
map generator. Network training experiments on the CIFAR-100 and ImageNet datasets
are carried out, and compared to an 8-bit quantization baseline, DYNASTY brings up
to $5.1\times$ speedup and $4.7\times$ energy consumption reduction with no
accuracy drop and negligible hardware overhead.
Authors' comments: 7 pages, to be published in 28th Asia and South Pacific Design
Automation Conference (ASP-DAC 2023)
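To make the idea of block-wise precision concrete, the following is a minimal sketch of symmetric uniform quantization in which each block of values gets its own bit-width. It is not DYNASTY's method: the abstract does not describe the sensitivity analysis or the bit-width map generator, so the bit-width map here is simply passed in by hand, and the names `quantize_block`, `quantize_blockwise`, and `bit_map` are hypothetical.

```python
def quantize_block(block, bits):
    """Symmetric uniform quantization of one block to a signed `bits`-bit grid."""
    if bits < 2:
        raise ValueError("bits must be at least 2 for a signed grid")
    qmax = 2 ** (bits - 1) - 1          # largest representable integer level
    peak = max(abs(x) for x in block)
    if peak == 0.0:
        return [0.0 for _ in block]     # an all-zero block needs no scaling
    scale = peak / qmax                 # one scale factor shared by the block
    # Round to the nearest level, then map back to the real-valued range.
    return [round(x / scale) * scale for x in block]


def quantize_blockwise(values, block_size, bit_map):
    """Quantize consecutive blocks of `values`; block i uses bit_map[i] bits.

    `bit_map` is supplied by the caller here (an assumption for illustration).
    """
    out = []
    for i, start in enumerate(range(0, len(values), block_size)):
        out.extend(quantize_block(values[start:start + block_size], bit_map[i]))
    return out
```

With this scheme the rounding error in a block is at most half a quantization step, so a block given fewer bits trades accuracy for a smaller representation.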
Mi Qian, Yao Ge, Miaowen Wen, Fei Ji
As a promising technique for high-mobility wireless communications, orthogonal time frequency space (OTFS) has been proved to enjoy excellent advantages with respect to traditional orthogonal frequency division multiplexing (OFDM). However, a challenging problem is to design efficient systems to further improve the performance. In this paper, we propose a novel block-wise index modulation (IM) scheme for OTFS systems, named Doppler-IM with OTFS (DoIM-OTFS), where a block of Doppler resource bins is activated simultaneously. For practical implementation, we develop a low-complexity customized message passing (CMP) algorithm for our proposed DoIM-OTFS scheme. Simulation results demonstrate that our proposed DoIM-OTFS system outperforms the traditional OTFS system without IM. The proposed CMP algorithm can achieve the desired performance and robustness to imperfect channel state information (CSI).
Vardhan Dongre, Abhinav Thimma Reddy, Nikhitha Reddeddy
DeepFake Audio, unlike DeepFake images and videos, has been relatively less
explored from a detection perspective, and the solutions which exist for
synthetic speech classification either use complex networks or do not generalize
to different varieties of synthetic speech obtained using different generative
and optimization-based methods. Through this work, we propose a channel-wise
recalibration of features using attention feature fusion for synthetic speech
detection and compare its performance against different detection methods
including End2End models and ResNet-based models on synthetic speech generated
using Text to Speech and Vocoder systems like WaveNet, WaveRNN, Tacotron, and
WaveGlow. We also experiment with Squeeze Excitation (SE) blocks in our ResNet
models and find that the combination achieves better performance. In
addition to the analysis, we also demonstrate that the combination of Linear
frequency cepstral coefficients (LFCC) and Mel Frequency cepstral coefficients
(MFCC) using the attentional feature fusion technique creates better input
feature representations which can help even simpler models generalize well on
synthetic speech classification tasks. Our models (ResNet-based, using feature
fusion) were trained on the Fake or Real (FoR) dataset and achieved 95% test
accuracy on the FoR data, and an average of 90% accuracy on samples we
generated using different generative models after adapting this framework.
Authors' comments: 7 pages, 8 figures, 4 tables
Pouria Mehrabi, Hamid D. Taghirad
In both terrestrial and extraterrestrial environments, a precise and informative model of the ground and the surface ahead is crucial for navigation and obstacle avoidance. The ground surface is not always flat, and it may be sloped, bumpy and rough, especially in off-road terrestrial scenes. In bumpy and rough scenes the functional relationship of the surface-related features may vary in different areas of the ground, as the structure of the ground surface may vary suddenly and, further, the measured point cloud of the ground is not smooth. Thus, the ground-related features must be obtained based on local estimates or even point estimates. To tackle this problem, a segment-wise GP-based ground segmentation method with local smoothness estimation is proposed. This method is an extension of our previous method, in which a realistic measurement of the length-scale values was provided for the covariance kernel in each line-segment to give a precise estimation of the ground for sloped terrains. In this extension, the value of the length-scale is estimated locally for each data point, which makes it much more precise for rough scenes while remaining computationally inexpensive and more robust to under-segmentation, sparsity and under-represent-ability. The segment-wise task is performed to estimate a partial continuous model of the ground for each radial range segment. Simulation results show the effectiveness of the proposed method in giving a continuous and precise estimation of the ground surface in rough and bumpy scenes while being fast enough for real-world applications.
Chenghao Yang, Xuezhe Ma
Fine-tuning over large pretrained language models (PLMs) has established many
state-of-the-art results. Despite its superior performance, such fine-tuning
can be unstable, resulting in significant variance in performance and potential
risks for practical applications. Previous works have attributed such
instability to the catastrophic forgetting problem in the top layers of PLMs,
which suggests that iteratively fine-tuning layers in a top-down manner is a
promising solution. In this paper, we first point out that this method does not
always work out due to the different convergence speeds of different
layers/modules. Inspired by this observation, we propose a simple
component-wise gradient norm clipping method to adjust the convergence speed
for different components. Experiment results demonstrate that our method
achieves consistent improvements in terms of generalization performance,
convergence speed, and training stability. The codebase can be found at
https://github.com/yangalan123/FineTuningStability.
Authors' comments: EMNLP 2022 Camera Ready
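As an illustration of what component-wise gradient norm clipping could look like, the sketch below clips the L2 norm of each named component's gradient independently, instead of clipping one global norm over all parameters. This is a simplified reading of the abstract, not the authors' implementation (see their codebase for that); the function name `clip_by_component`, the plain-list gradient format, and the single shared threshold `max_norm` are assumptions.

```python
import math


def clip_by_component(grads, max_norm):
    """Clip each component's gradient to L2 norm <= max_norm, independently.

    grads: dict mapping a component name (hypothetical, e.g. "attention")
           to a flat list of gradient values for that component.
    """
    clipped = {}
    for name, g in grads.items():
        norm = math.sqrt(sum(x * x for x in g))
        # Scale down only when the component's own norm exceeds the threshold.
        factor = min(1.0, max_norm / norm) if norm > 0.0 else 1.0
        clipped[name] = [x * factor for x in g]
    return clipped
```

Because each component is rescaled against its own norm, a component with very large gradients no longer determines how much every other component is scaled, which is one way to even out convergence speeds.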
Penghui Fu, Zhiqiang Tan
For multivariate nonparametric regression, doubly penalized ANOVA modeling (DPAM) has recently been proposed, using hierarchical total variations (HTVs) and empirical norms as penalties on the component functions such as main effects and multi-way interactions in a functional ANOVA decomposition of the underlying regression function. The two penalties play complementary roles: the HTV penalty promotes sparsity in the selection of basis functions within each component function, whereas the empirical-norm penalty promotes sparsity in the selection of component functions. We adopt backfitting or block minimization for training DPAM, and develop two suitable primal-dual algorithms, including both batch and stochastic versions, for updating each component function in single-block optimization. Existing applications of primal-dual algorithms are intractable in our setting with both HTV and empirical-norm penalties. Through extensive numerical experiments, we demonstrate the validity and advantage of our stochastic primal-dual algorithms, compared with their batch versions and a previous active-set algorithm, in large-scale scenarios.
Suchetana Sadhukhan, Poulomi Sadhukhan
This paper, for the first time, focuses on the sector-wise analysis of a
stock market through multifractal analysis. We consider the Bombay Stock
Exchange, India, and identify two investment time scales: short ($<200$ days)
and long ($>200$ days). We infer that long-term investment
will be more profitable. For the long time scale, sectors can be separated into two
categories based on the Hurst exponent values; one corresponds to stable
sectors with small fluctuations, and the other with dominance of large
fluctuations leading to possible downturns in those sectors.
Authors' comments: 15 pages, 3 figures, 2 tables
Miquel Martí i Rabadán, Alessandro Pieropan, Hossein Azizpour, Atsuto Maki
We propose Dense FixMatch, a simple method for online semi-supervised learning of dense and structured prediction tasks combining pseudo-labeling and consistency regularization via strong data augmentation. We enable the application of FixMatch in semi-supervised learning problems beyond image classification by adding a matching operation on the pseudo-labels. This allows us to still use the full strength of data augmentation pipelines, including geometric transformations. We evaluate it on semi-supervised semantic segmentation on Cityscapes and Pascal VOC with different percentages of labeled data and ablate design choices and hyper-parameters. Dense FixMatch significantly improves results compared to supervised learning using only labeled data, approaching its performance with 1/4 of the labeled samples.
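The combination of confidence-thresholded pseudo-labels with a matching operation can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the matching operation, so the sketch assumes the only geometric difference between the weak and strong views is a horizontal flip, and the names `dense_pseudo_labels`, `threshold`, and `flip` are hypothetical.

```python
def dense_pseudo_labels(weak_probs, threshold, flip):
    """Build per-pixel pseudo-labels aligned with the strongly augmented view.

    weak_probs: H x W x C nested lists of class probabilities predicted on
                the weakly augmented view.
    threshold:  minimum confidence for a pixel to receive a pseudo-label.
    flip:       True if the strong view was horizontally flipped (assumed to
                be the only geometric transform in this sketch).
    Returns an H x W nested list of labels; -1 marks pixels to ignore.
    """
    labels = []
    for row in weak_probs:
        out_row = []
        for probs in row:
            best = max(range(len(probs)), key=probs.__getitem__)
            out_row.append(best if probs[best] >= threshold else -1)
        # Matching step: apply the same geometric transform to the labels.
        labels.append(out_row[::-1] if flip else out_row)
    return labels
```

The key point is that the pseudo-labels are transformed in the same way as the strongly augmented input, so each label still refers to the pixel it was predicted for.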
Marco Landt-Hayen, Peer Kröger, Martin Claus, Willi Rath
Artificial neural networks (ANNs) are known to be powerful methods for many
hard problems (e.g. image classification, speech recognition or time series
prediction). However, these models tend to produce black-box results and are
often difficult to interpret. Layer-wise relevance propagation (LRP) is a
widely used technique to understand how ANN models come to their conclusion and
to understand what a model has learned. Here, we focus on Echo State Networks
(ESNs) as a certain type of recurrent neural networks, also known as reservoir
computing. ESNs are easy to train and only require a small number of trainable
parameters, but are still black-box models. We show how LRP can be applied to
ESNs in order to open the black box. We also show how ESNs can be used not only
for time series prediction but also for image classification: our ESN model
serves as a detector for El Niño Southern Oscillation (ENSO) from sea surface
temperature anomalies. ENSO is a well-known problem and has been
extensively discussed before. Here, we use this simple problem to
demonstrate how LRP can significantly enhance the explainability of ESNs.
Authors' comments: Shortened title, corrected author affiliation, added citation
reference: Accepted at 3rd International Conference on Machine Learning
Techniques (MLTEC 2022), Zurich, Switzerland
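For readers unfamiliar with LRP, the sketch below shows the widely used epsilon rule for a single bias-free linear layer: relevance arriving at each output is redistributed to the inputs in proportion to their contribution to that output. It does not show how the paper applies LRP to an ESN (the abstract gives no detail on that); the function name `lrp_epsilon` and the nested-list weight layout are assumptions.

```python
def lrp_epsilon(activations, weights, relevance_out, eps=1e-9):
    """Redistribute relevance through one linear layer with the epsilon rule.

    activations:   list of input activations a_i.
    weights:       weights[i][j] connects input i to output j (no bias term).
    relevance_out: relevance R_j assigned to each output j.
    Returns the relevance R_i assigned to each input i.
    """
    n_in, n_out = len(activations), len(relevance_out)
    # Pre-activations z_j = sum_i a_i * w_ij.
    z = [sum(activations[i] * weights[i][j] for i in range(n_in))
         for j in range(n_out)]
    relevance_in = [0.0] * n_in
    for j in range(n_out):
        # A small stabilizer with the sign of z_j avoids division by zero.
        stabilizer = eps if z[j] >= 0.0 else -eps
        for i in range(n_in):
            share = activations[i] * weights[i][j] / (z[j] + stabilizer)
            relevance_in[i] += share * relevance_out[j]
    return relevance_in
```

For small `eps` the total relevance is approximately conserved from one layer to the next, which is what lets the final scores be read as a decomposition of the model output over the inputs.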
Themos Stafylakis, Ladislav Mosner, Sofoklis Kakouros, Oldrich Plchot, Lukas Burget, Jan Cernocky
Self-supervised learning of speech representations from large amounts of
unlabeled data has enabled state-of-the-art results in several speech
processing tasks. Aggregating these speech representations across time is
typically approached by using descriptive statistics, and in particular, using
the first- and second-order statistics of representation coefficients. In this
paper, we examine an alternative way of extracting speaker and emotion
information from self-supervised trained models, based on the correlations
between the coefficients of the representations - correlation pooling. We show
improvements over mean pooling and further gains when the pooling methods are
combined via fusion. The code is available at
github.com/Lamomal/s3prl_correlation.
Authors' comments: Accepted at IEEE-SLT 2022
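A minimal sketch of the idea behind correlation pooling: instead of summarizing frame-level features by their per-dimension mean, compute the Pearson correlation between every pair of feature dimensions across time and use the upper triangle of that matrix as the utterance-level vector. The paper's exact normalization and any dimensionality reduction are not given in the abstract, so this is an assumption-laden illustration; `correlation_pooling` is a hypothetical name.

```python
import math


def correlation_pooling(frames):
    """Pool a (time x dim) list of frame vectors into pairwise correlations.

    Returns the strict upper triangle of the Pearson correlation matrix
    between feature dimensions, flattened row by row.
    """
    n, d = len(frames), len(frames[0])
    means = [sum(f[j] for f in frames) / n for j in range(d)]
    centered = [[f[j] - means[j] for j in range(d)] for f in frames]
    # Root sum of squares per dimension (the 1/n factors cancel in the ratio).
    stds = [math.sqrt(sum(c[j] ** 2 for c in centered)) for j in range(d)]
    pooled = []
    for i in range(d):
        for j in range(i + 1, d):
            cov = sum(c[i] * c[j] for c in centered)
            denom = stds[i] * stds[j]
            # A constant dimension has undefined correlation; use 0.0 here.
            pooled.append(cov / denom if denom > 0.0 else 0.0)
    return pooled
```

For `d` feature dimensions the pooled vector has `d * (d - 1) / 2` entries, so in practice the frame-level features would likely need to be reduced in dimension first.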
Zhiyuan Zhang, Qi Su, Xu Sun
Despite the potential of federated learning, it is known to be vulnerable to
backdoor attacks. Many robust federated aggregation methods are proposed to
reduce the potential backdoor risk. However, they are mainly validated in the
CV field. In this paper, we find that NLP backdoors are harder to defend against
than CV backdoors, and we provide a theoretical analysis showing that the
malicious update detection error probabilities are determined by the relative backdoor
strengths. NLP attacks tend to have small relative backdoor strengths, which
may result in the failure of robust federated aggregation methods for NLP
attacks. Inspired by the theoretical results, we can choose some dimensions
with higher backdoor strengths to settle this issue. We propose a novel
federated aggregation algorithm, Dim-Krum, for NLP tasks, and experimental
results validate its effectiveness.
Authors' comments: Accepted by Findings of EMNLP 2022
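For context, the sketch below implements the standard Krum selection rule that the name Dim-Krum builds on: each client update is scored by the sum of squared distances to its closest neighbours, and the update with the lowest score is selected. The optional `dims` argument, which restricts the distance to a chosen subset of dimensions, is only a guess at the spirit of the abstract; how Dim-Krum actually chooses those dimensions is not stated there, so this should not be read as the authors' algorithm.

```python
def krum_select(updates, num_malicious, dims=None):
    """Return the index of the update chosen by the Krum rule.

    updates:       list of client updates, each a flat list of floats.
    num_malicious: assumed upper bound f on the number of malicious clients.
    dims:          optional list of dimension indices used for distances
                   (hypothetical parameter; None means all dimensions).
    """
    n = len(updates)
    k = n - num_malicious - 2          # neighbours counted in each score
    if k < 1:
        raise ValueError("need more than num_malicious + 2 updates")
    idx = range(len(updates[0])) if dims is None else dims

    def sqdist(a, b):
        return sum((a[t] - b[t]) ** 2 for t in idx)

    scores = []
    for i in range(n):
        dists = sorted(sqdist(updates[i], updates[j])
                       for j in range(n) if j != i)
        scores.append(sum(dists[:k]))  # sum over the k closest neighbours
    return min(range(n), key=scores.__getitem__)
```

An update that lies far from the majority receives a large score and is never selected, which is why a backdoor whose footprint is small relative to the full update can slip through when all dimensions are weighted equally.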
Alexandre Martin
We show that two-dimensional Artin groups satisfy a strengthening of the Tits
alternative: their subgroups either contain a non-abelian free group or are
virtually free abelian of rank at most $2$.
When in addition the associated Coxeter group is hyperbolic, we answer in the
affirmative a question of Wise on the subgroups generated by large powers of
two elements: given any two elements $a, b$ of a two-dimensional Artin group of
hyperbolic type, there exists an integer $n\geq 1$ such that $a^n$ and $b^n$
either commute or generate a non-abelian free subgroup.
Authors' comments: 24 pages, 7 figures. Final version accepted for publication
Alex J. Chan, Mihaela van der Schaar
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data - instead given access to a set of expert models and their predictions alongside some limited information about the dataset used to train them. In scenarios from finance to the medical sciences, and even consumer practice, stakeholders have developed models on private data they either cannot, or do not want to, share. Given the value and legislation surrounding personal information, it is not surprising that only the models, and not the data, will be released - the pertinent question becoming: how best to use these models? Previous work has focused on global model selection or ensembling, with the result of a single final model across the feature space. Machine learning models perform notoriously poorly on data outside their training domain however, and so we argue that when ensembling models the weightings for individual instances must reflect their respective domains - in other words models that are more likely to have seen information on that instance should have more attention paid to them. We introduce a method for such an instance-wise ensembling of models, including a novel representation learning step for handling sparse high-dimensional domains. Finally, we demonstrate the need and generalisability of our method on classical machine learning tasks as well as highlighting a real world use case in the pharmacological setting of vancomycin precision dosing.
Huiyang Shao, Qianqian Xu, Zhiyong Yang, Shilong Bao, Qingming Huang
The Partial Area Under the ROC Curve (PAUC), typically including One-way
Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC), measures the average
performance of a binary classifier within a specific false positive rate and/or
true positive rate interval, which is a widely adopted measure when decision
constraints must be considered. Consequently, PAUC optimization has naturally
attracted increasing attention in the machine learning community within the
last few years. Nonetheless, most of the existing methods could only optimize
PAUC approximately, leading to inevitable biases that are not controllable.
Fortunately, a recent work presents an unbiased formulation of the PAUC
optimization problem via distributional robust optimization. However, it is
based on the pair-wise formulation of AUC, which suffers from the limited
scalability w.r.t. sample size and a slow convergence rate, especially for
TPAUC. To address this issue, we present a simpler reformulation of the problem
in an asymptotically unbiased and instance-wise manner. For both OPAUC and
TPAUC, we come to a nonconvex strongly concave minimax regularized problem of
instance-wise functions. On top of this, we employ an efficient solver that enjoys a
linear per-iteration computational complexity w.r.t. the sample size and a
time complexity of $O(\epsilon^{-1/3})$ to reach an $\epsilon$-stationary point.
Furthermore, we find that the minimax reformulation also facilitates the
theoretical analysis of generalization error as a byproduct. Compared with the
existing results, we present new error bounds that are much easier to prove and
could deal with hypotheses with real-valued outputs. Finally, extensive
experiments on several benchmark datasets demonstrate the effectiveness of our
method.
Authors' comments: NeurIPS 2022
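To illustrate the metric itself, the sketch below computes an empirical one-way partial AUC over the false positive rate range $[0, \beta]$ in its pair-wise form: only the highest-scoring (hardest) negatives are compared against the positives. It is an evaluation sketch, not the paper's instance-wise minimax reformulation or its solver; the function name and the choice to keep `floor(beta * n_neg)` negatives (at least one) are assumptions.

```python
import math


def one_way_partial_auc(pos_scores, neg_scores, beta):
    """Empirical OPAUC for FPR in [0, beta], normalized to lie in [0, 1].

    Only the top floor(beta * n_neg) negatives by score are used, since
    those are the negatives that fall inside the FPR range.
    """
    k = max(1, math.floor(beta * len(neg_scores)))
    hardest = sorted(neg_scores, reverse=True)[:k]
    correct = 0.0
    for p in pos_scores:
        for q in hardest:
            if p > q:
                correct += 1.0      # positive ranked above the negative
            elif p == q:
                correct += 0.5      # ties count as half
    return correct / (len(pos_scores) * k)
```

The double loop over positive and negative samples is the pair-wise cost that limits scalability with sample size, which is the issue the abstract says the instance-wise formulation is meant to address.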
Misbah Shafi, Rakesh Kumar Jha, Sanjeev Jain
The advancement in wireless communication technologies is becoming more
demanding and pervasive. One of the fundamental factors limiting the
efficiency of the network is its set of security challenges. The communication
network is vulnerable to security attacks such as spoofing attacks and signal
strength attacks. Intrusion detection signifies a central approach to ensuring
the security of the communication network. In this paper, an Intrusion
Detection System based on the framework of graph theory is proposed. A
Layerwise Graph Theory-Based Intrusion Detection System (LGTBIDS) algorithm is
designed to detect the attacked node. The algorithm performs the layer-wise
analysis to extract the vulnerable nodes and ultimately the attacked node(s).
For each layer, every node is scanned for the possibility of susceptible
node(s). The strategy of the IDS is based on the analysis of energy efficiency
and secrecy rate. The nodes with the energy efficiency and secrecy rate beyond
the range of upper and lower thresholds are detected as the nodes under attack.
Further, detected node(s) are transmitted with a random sequence of bits
followed by the process of re-authentication. The obtained results validate the
improved performance, low computation time, and low complexity. Finally, the
proposed approach is compared with the conventional solution of intrusion
detection.
Authors' comments: in IEEE Transactions on Network and Service Management, 2022
Valeri V. Makarov, Nathan J. Secrest
Making use of strong correlations between closely separated multiple or
double sources and photometric and astrometric metadata in Gaia EDR3, we
generate a catalog of candidate double and multiply imaged lensed quasars and
AGNs, comprising 3140 systems. It includes two partially overlapping parts, a
sample of distant (redshifts mostly greater than 1) sources with perturbed
data, and systems resolved into separate components by Gaia at separations less
than $2\arcsec$. For the first part, which is roughly one third of the
published catalog, we synthesized 0.617 million redshifts by multiple machine
learning prediction and classification methods, using independent photometric
and astrometric data from Gaia EDR3 and WISE with accurate spectroscopic
redshifts from SDSS as a training set. Using these synthetic redshifts, we
estimate a rate of 4.9\% of interlopers with spectroscopic redshift below 1 in
this part of the catalog. Unresolved candidate double and dual AGNs and quasars
are selected as sources with marginally high BP/RP excess factor
(phot_bp_rp_excess_factor), which is sensitive to source extent, limiting our
search to high-redshift quasars. For the second part of the catalog, additional
filters on measured parallax and near-neighbor statistics are applied to
diminish the propagation of remaining stellar contaminants. The estimated rate
of positives (double or multiple sources) is 98\%, and the estimated rate of
dual (physically related quasars) is greater than 54\%. A few dozen
serendipitously found objects of interest are discussed in more detail,
including known and new lensed images, planetary nebulae and young infrared
stars of peculiar morphology, and quasars with catastrophic redshift errors in
SDSS.
Authors' comments: Accepted in ApJS
Skander Karkar, Ibrahim Ayed, Emmanuel de Bézenac, Patrick Gallinari
End-to-end backpropagation has a few shortcomings: it requires loading the
entire model during training, which can be impossible in constrained settings,
and suffers from three locking problems (forward locking, update locking and
backward locking), which prohibit training the layers in parallel. Solving
layer-wise optimization problems can address these problems and has been used
in on-device training of neural networks. We develop a layer-wise training
method, particularly well-adapted to ResNets, inspired by the minimizing
movement scheme for gradient flows in distribution space. The method amounts to
a kinetic energy regularization of each block that makes the blocks optimal
transport maps and endows them with regularity. It works by alleviating the
stagnation problem observed in layer-wise training, whereby greedily-trained
early layers overfit and deeper layers stop increasing test accuracy after a
certain depth. We show on classification tasks that the test accuracy of
block-wise trained ResNets is improved when using our method, whether the
blocks are trained sequentially or in parallel.
Authors' comments: 1st International Workshop on Practical Deep Learning in the Wild at
AAAI 2022
Chen Liang, Simiao Zuo, Qingru Zhang, Pengcheng He, Weizhu Chen, Tuo Zhao
Layer-wise distillation is a powerful tool to compress large models (i.e.,
teacher models) into small ones (i.e., student models). The student distills
knowledge from the teacher by mimicking the hidden representations of the
teacher at every intermediate layer. However, layer-wise distillation is
difficult. Since the student has a smaller model capacity than the teacher, it
is often under-fitted. Furthermore, the hidden representations of the teacher
contain redundant information that the student does not necessarily need for
the target task's learning. To address these challenges, we propose a novel
Task-aware layEr-wise Distillation (TED). TED designs task-aware filters to
align the hidden representations of the student and the teacher at each layer.
The filters select the knowledge that is useful for the target task from the
hidden representations. As such, TED reduces the knowledge gap between the two
models and helps the student to fit better on the target task. We evaluate TED
in two scenarios: continual pre-training and fine-tuning. TED demonstrates
significant and consistent improvements over existing distillation methods in
both scenarios. Code is available at
https://github.com/cliang1453/task-aware-distillation.
Authors' comments: Proceedings of ICML 2023
Maaike M. Galama, Hao Wu, Andreas Krämer, Mohsen Sadeghi, Frank Noé
The dynamics of molecules are governed by rare event transitions between long-lived (metastable) states. To explore these transitions efficiently, many enhanced sampling protocols have been introduced that involve using simulations with biases or changed temperatures. Two established statistically optimal estimators for obtaining unbiased equilibrium properties from such simulations are the multistate Bennett Acceptance Ratio (MBAR) and the transition-based reweighting analysis method (TRAM). Both MBAR and TRAM are solved iteratively and can suffer from long convergence times. Here we introduce stochastic approximators (SA) for both estimators, resulting in SAMBAR and SATRAM, which are shown to converge faster than their deterministic counterparts, without significant accuracy loss. Both methods are demonstrated on different molecular systems.
Muhammad ElNokrashy, Badr AlKhamissi, Mona Diab
Language Models pretrained on large textual data have been shown to encode
different types of knowledge simultaneously. Traditionally, only the features
from the last layer are used when adapting to new tasks or data. We put forward
that, when using or finetuning deep pretrained models, intermediate layer
features that may be relevant to the downstream task are buried too deep to be
used efficiently in terms of needed samples or steps. To test this, we propose
a new layer fusion method: Depth-Wise Attention (DWAtt), to help re-surface
signals from non-final layers. We compare DWAtt to a basic concatenation-based
layer fusion method (Concat), and compare both to a deeper model baseline --
all kept within a similar parameter budget. Our findings show that DWAtt and
Concat are more step- and sample-efficient than the baseline, especially in the
few-shot setting. DWAtt outperforms Concat on larger data sizes. On CoNLL-03
NER, layer fusion shows 3.68-9.73% F1 gain at different few-shot sizes. The
layer fusion models presented significantly outperform the baseline in various
training scenarios with different data sizes, architectures, and training
constraints.
Authors' comments: 7 pages, 7 figures