Shezheng Song, Hao Xu, Jun Ma, Shasha Li, Long Peng, Qian Wan, Xiaodong Liu, Jie Yu
Large Language Models (LLMs) exhibit strong general language capabilities.
However, fine-tuning these models on domain-specific tasks often leads to
catastrophic forgetting, where the model overwrites or loses essential
knowledge acquired during pretraining. This phenomenon significantly limits the
broader applicability of LLMs. To address this challenge, we propose a novel
approach to compute the element-wise importance of model parameters crucial for
preserving general knowledge during fine-tuning. Our method uses a
dual-objective optimization strategy: (1) a regularization loss based on
element-wise parameter importance, which constrains updates to parameters
crucial for general knowledge; and (2) a cross-entropy loss to adapt to
domain-specific tasks. Additionally, we introduce layer-wise coefficients to
account for the varying contributions of different layers, dynamically
balancing the dual-objective optimization. Extensive experiments on scientific,
medical, and physical tasks using GPT-J and LLaMA-3 demonstrate that our
approach mitigates catastrophic forgetting while enhancing model adaptability.
Compared to previous methods, our solution is approximately 20 times faster and
requires only 10-15% of the storage, highlighting its practical efficiency. The
code will be released.
Authors' comments: Work in progress
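The dual objective above can be sketched as a loss function. This is a minimal sketch that assumes an EWC-style quadratic penalty: cross-entropy plus, for each layer l, a coefficient lambda_l times sum_i Omega_i (theta_i - theta0_i)^2. Here Omega holds the element-wise importances and theta0 the pretrained weights. The abstract does not say how Omega or the layer-wise coefficients are computed. The function names, dict layout, and layer names are hypothetical.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy over a batch with integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def importance_penalty(params, pretrained, importance, layer_coef):
    """sum_l lambda_l * sum_i Omega_i * (theta_i - theta0_i)^2 over all layers.

    All arguments are dicts keyed by (hypothetical) layer names."""
    total = 0.0
    for name, theta in params.items():
        diff = theta - pretrained[name]
        total += layer_coef[name] * np.sum(importance[name] * diff ** 2)
    return total

def dual_objective_loss(logits, labels, params, pretrained, importance, layer_coef):
    """Task loss for the new domain plus the importance-weighted anchor."""
    return cross_entropy(logits, labels) + importance_penalty(
        params, pretrained, importance, layer_coef)
```

Parameters with large Omega_i, in layers with large lambda_l, become expensive to move. Unimportant ones remain free to adapt to the domain task.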
Weiye Zhao, Feihan Li, Yifan Sun, Yujie Wang, Rui Chen, Tianhao Wei, Changliu Liu
Enforcing state-wise safety constraints is critical for the application of
reinforcement learning (RL) in real-world problems, such as autonomous driving
and robot manipulation. However, existing safe RL methods either enforce
state-wise constraints only in expectation or enforce hard state-wise
constraints under strong assumptions. The former does not rule out safety
violations, while the latter is impractical. Our insight is that although it is
intractable to guarantee hard state-wise constraints in a model-free setting,
we can enforce state-wise safety with high probability without strong
assumptions. To accomplish this goal, we propose Absolute State-wise Constrained
Policy Optimization (ASCPO), a novel general-purpose policy search algorithm
that guarantees high-probability state-wise constraint satisfaction for
stochastic systems. We demonstrate the effectiveness of our approach by
training neural network policies for a wide range of robot locomotion tasks, where
the agent must adhere to various state-wise safety constraints. Our results
show that ASCPO significantly outperforms existing methods in handling
state-wise constraints across challenging continuous control tasks,
highlighting its potential for real-world applications.
Authors' comments: submission to Journal of Machine Learning Research
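The gap between the two kinds of guarantees mentioned above is easy to see on sampled costs. This is a minimal illustration, not ASCPO itself. It assumes a state-wise cost must stay below a limit, and the toy costs are invented. A constraint that holds in expectation can still allow frequent violations. A high-probability constraint bounds P(cost > limit) by delta.

```python
import numpy as np

def expectation_satisfied(costs, limit):
    """Constraint in expectation: the mean state-wise cost must not exceed the limit."""
    return float(np.mean(costs)) <= limit

def violation_probability(costs, limit):
    """Empirical probability that a single state's cost exceeds the limit."""
    return float(np.mean(np.asarray(costs) > limit))

def high_probability_satisfied(costs, limit, delta):
    """Require P(cost > limit) <= delta, i.e. satisfaction with prob >= 1 - delta."""
    return violation_probability(costs, limit) <= delta
```

For example, costs of 0 in nine states and 0.5 in one state average 0.05. That satisfies a limit of 0.1 in expectation, even though 10% of states violate it.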
Tobias Pett, Sebastian Krieter, Thomas Thüm, Ina Schaefer
Ensuring the functional safety of highly configurable systems often requires testing representative subsets of all possible configurations to reduce testing effort and save resources. The ratio of covered t-wise feature interactions (i.e., t-wise feature interaction coverage) is a common criterion for determining whether a subset of configurations is representative and capable of finding faults. Existing t-wise sampling algorithms uniformly cover t-wise feature interactions for all features, resulting in lengthy execution times and large sample sizes, particularly when large t-wise feature interactions are considered (i.e., high values of t). In this paper, we introduce MulTiWise, a novel approach to t-wise feature interaction sampling that questions the necessity of uniform coverage across all t-wise feature interactions. Our approach distinguishes between subsets of critical and non-critical features, considering higher t-values for subsets of critical features when generating a t-wise feature interaction sample. We evaluate our approach using subject systems from real-world applications, including BusyBox, Soletta, Fiasco, and uClibc. Our results show that sacrificing uniform t-wise feature interaction coverage across all features reduces the time needed to generate a sample and the resulting sample size. Hence, MulTiWise sampling offers an alternative to existing approaches when knowledge about feature criticality is available.
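The coverage criterion above can be computed directly for Boolean features. This is a minimal sketch that assumes no feature-model constraints, so every combination of t features and t values counts as a valid interaction. Real sampling tools exclude interactions that the feature model forbids. The function name is hypothetical.

```python
from itertools import combinations
from math import comb

def t_wise_coverage(sample, features, t):
    """Fraction of t-wise interactions covered by a sample of configurations.

    An interaction is a choice of t features plus a True/False value for each;
    a configuration (dict feature -> bool) covers it if it matches all t values.
    Assumes no feature-model constraints, so all comb(n, t) * 2**t are valid."""
    covered = set()
    for config in sample:
        for combo in combinations(features, t):
            covered.add(tuple((f, config[f]) for f in combo))
    total = comb(len(features), t) * 2 ** t
    return len(covered) / total
```

In the spirit of prioritizing critical features, one could require full coverage from `t_wise_coverage(sample, critical_features, 3)` and only pairwise coverage over all features. Here `critical_features` is a hypothetical input.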
Dragos Ristache, Fabian Spaeh, Charalampos E. Tsourakakis
In 1907, Sir Francis Galton independently asked 787 villagers to estimate the weight of an ox. Although none of them guessed the exact weight, the average estimate was remarkably accurate. This phenomenon is known as the wisdom of crowds. In a clever experiment, Asch employed actors to demonstrate the human tendency to conform to others' opinions. The question we ask is: what would Sir Francis Galton have observed if Asch had interfered by employing actors? Would the crowd become even wiser or not? The problem becomes intriguing when considering the interconnectedness of the villagers, which is the central theme of this work. We examine a scenario where $n$ agents are interconnected and influence each other. The average of their opinions provides an estimator of some unknown quantity, and the quality of this estimator can be measured by its mean squared error (MSE). How can one improve or reduce the quality of the original estimator in terms of the MSE by utilizing Asch's strategy of hiring a few stooges? We present a new formulation of this problem, assuming that nodes adjust their opinions according to the Friedkin-Johnsen opinion dynamics. We show that selecting $k$ stooges to maximize or minimize the MSE is NP-hard. We also show that our formulation is closely related to maximizing or minimizing polarization, and we prove that these problems are NP-hard as well. We propose an efficient greedy heuristic that scales to large networks and test our algorithm on synthetic and real-world datasets. Although the MSE and polarization objectives differ, we find in practice that maximizing polarization often yields solutions that are nearly optimal for minimizing the wisdom of crowds in terms of MSE. Our analysis of real-world data reveals that even a small number of stooges can significantly influence the conversation on the war in Ukraine, resulting in a relative increase in the MSE of 207.80% (maximization) or a decrease of 50.62% (minimization).
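The Friedkin-Johnsen dynamics mentioned above have a closed-form equilibrium on an undirected graph: z = (I + L)^{-1} s. Here L = D - A is the graph Laplacian and s the vector of innate opinions. The sketch below assumes a hypothetical stooge model in which each stooge's innate opinion is replaced by a chosen value. The paper's exact formulation may differ.

```python
import numpy as np

def fj_equilibrium(adjacency, innate):
    """Friedkin-Johnsen equilibrium z = (I + L)^{-1} s for an undirected graph,
    where L = D - A is the graph Laplacian and s holds the innate opinions."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.solve(np.eye(len(A)) + L, np.asarray(innate, dtype=float))

def crowd_squared_error(adjacency, innate, truth, stooges=None):
    """Squared error of the averaged equilibrium opinion against the truth.

    `stooges` (hypothetical model): dict node -> opinion that replaces that
    node's innate opinion before the dynamics run."""
    s = np.array(innate, dtype=float)
    for node, opinion in (stooges or {}).items():
        s[node] = opinion
    z = fj_equilibrium(adjacency, s)
    return (z.mean() - truth) ** 2

def polarization(z):
    """Sum of squared deviations of equilibrium opinions from their mean."""
    return float(np.sum((z - z.mean()) ** 2))
```

On an undirected graph, 1^T (I + L) = 1^T. So in this simplified model the equilibrium average equals the innate average, and any change in the estimator comes from the stooges' injected opinions. Polarization, by contrast, depends on the network structure.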
Joaquín Hernández-Yévenes, Neil Nagar, Vicente Arratia, Thomas H. Jarrett
Supermassive Black Holes (SMBHs) are commonly found at the centers of massive
galaxies. Estimating their masses ($M_\text{BH}$) is crucial for understanding
galaxy-SMBH co-evolution. We present WISE2MBH, an efficient algorithm that uses
cataloged Wide-field Infrared Survey Explorer (WISE) magnitudes to estimate
total stellar mass ($M_*$) and scales this to bulge mass ($M_\text{Bulge}$) and
then to $M_\text{BH}$, estimating the morphological type ($T_\text{Type}$) and bulge
fraction ($B/T$) in the process. WISE2MBH uses scaling relations either taken
from the literature or developed in this work, providing a streamlined approach to
derive these parameters. It also distinguishes QSOs from galaxies and estimates
the galaxy $T_\text{Type}$ using WISE colors with a relation trained with
galaxies from the 2MASS Redshift Survey. WISE2MBH performs well up to
$z\sim0.5$ thanks to K-corrections in magnitudes and colors. WISE2MBH
$M_\text{BH}$ estimates agree very well with those of a selected sample of
local galaxies with $M_\text{BH}$ measurements or reliable estimates: a
Spearman correlation coefficient of $\sim$0.8 and an RMSE of $\sim$0.63 were obtained. When
applied to the ETHER sample at $z\leq0.5$, WISE2MBH provides $\sim$1.9 million
$M_\text{BH}$ estimates (78.5\% new) and $\sim$100 thousand upper limits. The
derived local black hole mass function (BHMF) is in good agreement with
existing literature BHMFs. Galaxy demographic projects, including target
selection for the Event Horizon Telescope, can benefit from WISE2MBH for
up-to-date galaxy parameters and $M_\text{BH}$ estimates. The WISE2MBH
algorithm is publicly available on GitHub.
Authors' comments: 21 pages, 13 main + 4 appendix figures, accepted for publication in
MNRAS
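The agreement statistics quoted above (Spearman correlation and RMSE) can be sketched in pure Python. This assumes, as is usual for black hole masses, that both statistics are computed on log10(M_BH) values for galaxies with reference masses. The function names are hypothetical, and ties receive averaged ranks.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def rmse(pred, ref):
    """Root-mean-square error, e.g. in dex when inputs are log10 masses."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))
```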
Zelin He, Ying Sun, Jingyuan Liu, Runze Li
We consider the transfer learning problem in the high-dimensional linear regression setting, where the feature dimension is larger than the sample size. To learn transferable information, which may vary across features or source samples, we propose an adaptive transfer learning method that can detect and aggregate the feature-wise (F-AdaTrans) or sample-wise (S-AdaTrans) transferable structures. We achieve this by employing a fused penalty, coupled with weights that adapt to the transferable structure. To choose the weights, we propose a theoretically informed, data-driven procedure, enabling F-AdaTrans to selectively fuse the transferable signals with the target while filtering out non-transferable signals, and enabling S-AdaTrans to obtain the optimal combination of information transferred from each source sample. We show that, with appropriately chosen weights, F-AdaTrans achieves a convergence rate close to that of an oracle estimator with a known transferable structure, and S-AdaTrans recovers existing near-minimax optimal rates as a special case. The effectiveness of the proposed method is validated on both simulated and real data, demonstrating favorable performance compared to existing methods.
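The feature-wise fusion idea above can be sketched with a simplified single-source objective: (1/2n)||y - X beta||^2 + lam * sum_j w_j |beta_j - b_j|. Here b is an estimate from the source data. Substituting delta = beta - b turns this into a weighted lasso on delta, solved below by proximal gradient descent (ISTA). This is an assumed illustration. The paper's exact penalty and its data-driven weight selection are not reproduced, and the weights here are set by hand.

```python
import numpy as np

def soft_threshold(v, thresh):
    """Proximal operator of a weighted L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)

def fused_transfer(X, y, b_source, weights, lam, n_iter=500):
    """Minimize (1/2n)||y - X beta||^2 + lam * sum_j w_j |beta_j - b_j| by ISTA.

    A large w_j fuses beta_j onto the source coefficient b_j; a small w_j lets
    the target data move it (treated as non-transferable)."""
    n = X.shape[0]
    step = n / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the smooth part
    delta = np.zeros(X.shape[1])           # delta = beta - b_source
    r0 = y - X @ b_source
    for _ in range(n_iter):
        grad = -X.T @ (r0 - X @ delta) / n
        delta = soft_threshold(delta - step * grad, step * lam * weights)
    return b_source + delta
```

With lam = 0 this reduces to least squares on the target data. With a very large lam every coefficient stays at its source value.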
Yi Qin, Xiaomeng Li
Unsupervised deformable image registration is one of the most challenging tasks
in medical imaging. Obtaining a high-quality deformation field while preserving
deformation topology remains difficult despite a series of deep-learning-based
solutions. Meanwhile, the diffusion model's latent feature space shows
potential in modeling the deformation semantics. To fully exploit the diffusion
model's ability to guide the registration task, we present two modules:
Feature-wise Diffusion-Guided Module (FDG) and Score-wise Diffusion-Guided
Module (SDG). Specifically, FDG uses the diffusion model's multi-scale semantic
features to guide the generation of the deformation field. SDG uses the
diffusion score to guide the optimization process for preserving deformation
topology with barely any additional computation. Experimental results on a 3D
medical cardiac image registration task validate our model's ability to
effectively provide refined deformation fields with preserved topology. Code is
available at: https://github.com/xmed-lab/FSDiffReg.git.
Authors' comments: Accepted as a conference paper at Medical Image Computing and
Computer-Assisted Intervention (MICCAI) conference 2023
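Topology preservation, as discussed above, is commonly checked with the Jacobian determinant of the deformation phi(x) = x + u(x). Voxels with a non-positive determinant indicate folding. This sketch shows that standard evaluation metric, not the SDG module itself. It assumes a dense 3D displacement field stored with the displacement component first. The function names are hypothetical.

```python
import numpy as np

def jacobian_determinant(disp):
    """Per-voxel determinant of the Jacobian of phi(x) = x + u(x).

    disp: array of shape (3, D, H, W); disp[i] is the displacement along axis i.
    Each spatial axis needs at least 2 voxels for np.gradient."""
    grads = [np.gradient(disp[i]) for i in range(3)]        # grads[i][j] = du_i/dx_j
    jac = np.stack([np.stack(grads[i], axis=-1) for i in range(3)], axis=-2)
    jac = jac + np.eye(3)                                    # identity from the x term
    return np.linalg.det(jac)

def folding_fraction(disp):
    """Share of voxels where the map folds (non-positive Jacobian determinant)."""
    return float(np.mean(jacobian_determinant(disp) <= 0))
```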
Emmanouil Karystinaios, Gerhard Widmer
Roman Numeral analysis is an important task: identifying chords and their
functional context in pieces of tonal music. This paper presents a new approach
to automatic Roman Numeral analysis in symbolic music. While existing
techniques rely on an intermediate lossy representation of the score, we
propose a new method based on Graph Neural Networks (GNNs) that enables the
direct description and processing of each individual note in the score. The
proposed architecture can leverage note-wise features and interdependencies
between notes while yielding an onset-wise representation by virtue of our novel
edge contraction algorithm. Our results demonstrate that our model, ChordGNN,
outperforms existing state-of-the-art models, achieving higher accuracy in Roman
Numeral analysis on the reference datasets. In addition, we investigate variants
of our model using the proposed techniques, such as NADE, and post-processing of the chord
predictions. The full source code for this work is available at
https://github.com/manoskary/chordgnn
Authors' comments: In Proceedings of the 24th Conference of the International Society
for Music Information Retrieval (ISMIR 2023), Milan, Italy
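The move from a note-wise to an onset-wise representation, described above, can be loosely illustrated by mean-pooling the features of notes that share an onset. This is a much-simplified stand-in and not the paper's edge contraction algorithm, which operates on the graph and its learned features. The function name and data layout are hypothetical.

```python
from collections import defaultdict

def pool_by_onset(onsets, features):
    """Average the feature vectors of notes that start at the same onset.

    Returns (sorted onsets, one mean feature vector per onset)."""
    groups = defaultdict(list)
    for onset, feat in zip(onsets, features):
        groups[onset].append(feat)
    ordered = sorted(groups)
    pooled = [[sum(col) / len(col) for col in zip(*groups[o])] for o in ordered]
    return ordered, pooled
```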
YaChen Yan, Liubo Li
Learning feature interactions is key to success in large-scale click-through rate (CTR) prediction and recommendation. In practice, handcrafted feature engineering usually requires exhaustive searching. To reduce the high cost of human effort in feature engineering, researchers have proposed several deep neural network (DNN)-based approaches that learn feature interactions in an end-to-end fashion. However, existing methods either do not learn vector-wise and bit-wise interactions simultaneously or fail to combine them in a controllable manner. In this paper, we propose a new model, xDeepInt, based on a novel network architecture called the polynomial interaction network (PIN), which learns higher-order vector-wise interactions recursively. By integrating a subspace-crossing mechanism, we enable xDeepInt to balance the mixture of vector-wise and bit-wise feature interactions at a bounded order. Based on this network architecture, we customize a combined optimization strategy to conduct feature selection and interaction selection. We implement the proposed model and evaluate its performance on three real-world datasets. Our experimental results demonstrate the effectiveness of xDeepInt over state-of-the-art models. We open-source the TensorFlow implementation of xDeepInt: https://github.com/yanyachen/xDeepInt.
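Recursive higher-order vector-wise interactions, as described above, can be sketched with the following update, which is assumed here for illustration: X_l = X_{l-1} * (W_l X_0) + X_{l-1}. Each W_l mixes whole field embedding vectors, and each Hadamard product raises the polynomial order by one. The exact PIN formulation, the subspace-crossing mechanism, and the bit-wise components are not reproduced. The function name is hypothetical.

```python
import numpy as np

def polynomial_interaction(x0, weights):
    """Recursive vector-wise interaction under the assumed update
        X_l = X_{l-1} * (W_l @ X_0) + X_{l-1}
    x0: (m fields, d embedding dims); each W_l: (m, m) mixes whole field vectors.
    The residual term keeps all lower-order interactions."""
    x = x0
    for w in weights:
        x = x * (w @ x0) + x
    return x
```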
He Jia, Hong-Ming Zhu, Ue-Li Pen
The angular momentum of galaxies (galaxy spin) contains rich information
about the initial condition of the Universe, yet it is challenging to
efficiently measure the spin direction for the enormous number of galaxies
that are being mapped by the ongoing and forthcoming cosmological surveys. We
present a machine learning based classifier for the Z-wise vs S-wise spirals,
which can help to break the degeneracy in the galaxy spin direction
measurement. The proposed Chirality Equivariant Residual Network (CE-ResNet) is
manifestly equivariant under a reflection of the input image, which guarantees
that there is no inherent asymmetry between the Z-wise and S-wise probability
estimators. We train the model with Sloan Digital Sky Survey (SDSS) images,
with the training labels given by the Galaxy Zoo 1 (GZ1) project. A combination
of data augmentation tricks is used during training, making the model more
robust when applied to other surveys. We find a $\sim\!30\%$ increase in the number of both
types of spirals when Dark Energy Spectroscopic Instrument (DESI) images are
used for classification, due to the better imaging quality of DESI. We verify
that the $\sim\!7\sigma$ difference between the numbers of Z-wise and S-wise
spirals is due to human bias, since the discrepancy drops to $<\!1.8\sigma$
with our CE-ResNet classification results. We discuss the potential systematics
that are relevant to future cosmological applications.
Authors' comments: 13+4 pages, 11 figures, 2 tables, accepted by ApJ
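The reflection equivariance described above can be illustrated with a simple symmetrization wrapper. Take any scalar scoring function f. Use f(image) as the Z-wise logit and f(mirrored image) as the S-wise logit. Mirroring the input then swaps the two probabilities exactly. This sketch assumes a 2D grayscale image and a horizontal flip. CE-ResNet builds equivariance into the network itself, which may differ from this wrapper. The function name is hypothetical.

```python
import numpy as np

def chiral_probs(score_fn, image):
    """Return (p_Z, p_S) for a 2D image, equivariant under horizontal reflection.

    The Z logit is score_fn(image) and the S logit is score_fn(mirrored image),
    so mirroring the input exactly swaps the two probabilities."""
    z_logit = score_fn(image)
    s_logit = score_fn(image[:, ::-1])
    m = max(z_logit, s_logit)                       # stable softmax over two logits
    ez, es = np.exp(z_logit - m), np.exp(s_logit - m)
    return ez / (ez + es), es / (ez + es)
```

Because the swap holds for any f, neither class can be favored by an inherent asymmetry of the model.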
Jintao Guo, Lei Qi, Yinghuan Shi, Yang Gao
Domain generalization (DG) aims to learn a generic model from multiple
observed source domains that generalizes well to arbitrary unseen target
domains without further training. The major challenge in DG is that the model
inevitably faces a severe overfitting issue due to the domain gap between
source and target domains. To mitigate this problem, some dropout-based methods
have been proposed to resist overfitting by discarding part of the
representation of the intermediate layers. However, we observe that most of
these methods only conduct the dropout operation in some specific layers,
leading to an insufficient regularization effect on the model. We argue that
applying dropout at multiple layers can produce stronger regularization
effects, which could alleviate the overfitting problem on source domains more
adequately than previous layer-specific dropout methods. In this paper, we
develop a novel layer-wise and channel-wise dropout for DG, which randomly
selects one layer and then randomly selects its channels to conduct dropout.
In particular, the proposed method can generate a variety of data variants to
better deal with the overfitting issue. We also provide a theoretical analysis
of our dropout method and prove that it can effectively reduce the
generalization error bound. In addition, we adopt a progressive scheme that
increases the dropout ratio as training progresses, gradually raising the
difficulty of training to enhance the model's robustness. Extensive
experiments on three standard benchmark datasets have demonstrated that our
method outperforms several state-of-the-art DG methods. Our code is available
at https://github.com/lingeringlight/PLACEdropout.
Authors' comments: Accepted by ACM TOMM 2023. The code is available at
https://github.com/lingeringlight/PLACEdropout
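The layer-then-channel selection and progressive ratio described above can be sketched on a list of per-layer feature maps. The linear ratio schedule and the inverted-dropout rescaling of kept channels are assumptions for illustration. The official repository may schedule and scale differently, and the function names are hypothetical.

```python
import numpy as np

def progressive_ratio(step, total_steps, max_ratio):
    """Dropout ratio growing linearly from 0 to max_ratio over training (assumed)."""
    return max_ratio * min(step / total_steps, 1.0)

def layer_channel_dropout(features, ratio, rng):
    """Pick one layer at random, then zero a random subset of its channels.

    features: list of arrays shaped (channels, H, W), one per layer.
    Kept channels are rescaled so the expected activation is unchanged."""
    out = [np.array(f, dtype=float) for f in features]
    layer = int(rng.integers(len(out)))
    n_ch = out[layer].shape[0]
    n_drop = int(round(ratio * n_ch))
    dropped = rng.choice(n_ch, size=n_drop, replace=False)
    frac = n_drop / n_ch
    out[layer] *= 1.0 / (1.0 - frac) if frac < 1.0 else 0.0
    out[layer][dropped] = 0.0
    return out, layer, dropped
```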
Jiaqi Li, Haoran Li, Yaran Chen, Zixiang Ding, Nannan Li, Mingjun Ma, Zicheng Duan, Dongbing Zhao
An increasing number of model pruning methods have been proposed to
resolve the conflict between the computing power that deep learning models
require and the limited capacity of resource-constrained devices. However, most
traditional rule-based network pruning methods cannot reach a sufficient
compression ratio with low accuracy loss, and they are time-consuming and
laborious. In this paper, we propose Automatic Block-wise and Channel-wise
Network Pruning (ABCP) to jointly search the block-wise and channel-wise
pruning action with deep reinforcement learning. A joint sampling algorithm is
proposed to simultaneously generate the pruning choice of each residual block
and the channel pruning ratio of each convolutional layer from the discrete and
continuous search spaces, respectively. Finally, the best pruning action, taking
both the accuracy and the complexity of the model into account, is obtained.
Compared with traditional rule-based pruning methods, this pipeline saves
human labor and achieves a higher compression ratio with lower accuracy loss.
Tested on the mobile robot detection dataset, the pruned YOLOv3 model saves
99.5% of FLOPs, reduces parameters by 99.5%, and achieves a 37.3 times speedup
with only 2.8% mAP loss. The results of the transfer task on the sim2real
detection dataset also show that our pruned model is much more robust.
Authors' comments: 12 pages, 9 figures, submitted to Journal of IEEE Transactions on
Cybernetics
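The joint sampling step above draws discrete and continuous actions together. A minimal sketch, assuming a hypothetical policy parameterization: the discrete part is a Bernoulli keep/prune decision per residual block with probability sigmoid(logit). The continuous part is a per-layer channel pruning ratio drawn from a Gaussian and clipped. The paper's actual policy outputs, distributions, and reward are not specified in the abstract.

```python
import numpy as np

def sample_joint_action(block_logits, ratio_mean, ratio_std, rng, max_ratio=0.95):
    """Sample one joint pruning action from hypothetical policy outputs.

    Discrete: keep (1) or prune (0) each residual block, Bernoulli(sigmoid(logit)).
    Continuous: per-layer channel pruning ratio ~ N(mean, std), clipped to
    [0, max_ratio] so no layer loses all of its channels."""
    keep_prob = 1.0 / (1.0 + np.exp(-np.asarray(block_logits, dtype=float)))
    keep_block = (rng.random(keep_prob.shape) < keep_prob).astype(int)
    ratios = np.clip(rng.normal(ratio_mean, ratio_std), 0.0, max_ratio)
    return keep_block, ratios
```

An RL agent would then prune the network accordingly and receive a reward that trades off accuracy against complexity. The form of that reward is not given here.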
Jiahui Li, Kun Kuang, Lin Li, Long Chen, Songyang Zhang, Jian Shao, Jun Xiao
Deep neural networks have demonstrated remarkable performance in many data-driven and prediction-oriented applications, sometimes even outperforming humans. However, their most significant drawback is a lack of interpretability, which makes them less attractive in many real-world applications. In applications that involve moral questions or uncertain environmental factors, such as criminal judgment, financial analysis, and medical diagnosis, it is essential to mine the evidence for the model's prediction (i.e., to interpret model knowledge) in order to convince humans. Thus, investigating how to interpret model knowledge is of paramount importance for both academic research and real-world applications.
Giovanni Saraceno, Fatemah Alqallaf, Claudio Agostinelli
The Seemingly Unrelated Regressions (SUR) model is a widely used estimation
procedure in econometrics, insurance and finance, where the regression model
very often contains more than one equation. The unknown parameters, namely the
regression coefficients and the covariances among the error terms, are estimated
using algorithms based on Generalized Least Squares or Maximum Likelihood, and
the method as a whole is very sensitive to outliers. To overcome this problem,
M-estimators and S-estimators have been proposed in the literature together with
fast algorithms. However, these procedures can only cope with row-wise outliers
in the error terms, and their performance becomes very poor in the presence of
cell-wise outliers and as the number of equations increases. A new robust
approach is proposed that performs well under both contamination types and is
fast to compute. Illustrations based on Monte Carlo simulations and a real data
example are provided.
Authors' comments: 18 pages, 6 figures and 3 Tables
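A short calculation shows why cell-wise outliers hurt row-wise estimators more as the number of equations grows. Assume, for illustration, that each of the m error cells in a row is contaminated independently with probability epsilon. The probability that a row contains at least one outlier is then:

```latex
P(\text{row contaminated}) = 1 - (1 - \varepsilon)^{m},
\qquad \varepsilon = 0.05,\ m = 10:\quad 1 - 0.95^{10} \approx 0.40 .
```

With 10 equations, a 5% cell-wise contamination rate already touches about 40% of rows. For m of 14 or more the share exceeds one half. That is beyond what high-breakdown row-wise procedures such as S-estimators can tolerate.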
Feng Huang, Weisong Wen, Jiachen Zhang, Li-Ta Hsu
Robust and precise localization is essential for autonomous systems with
navigation requirements. Light detection and ranging (LiDAR) odometry has been
extensively studied over the past decades to achieve this goal. Satisfactory
accuracy can be achieved in scenarios with abundant environmental features
using existing LiDAR odometry (LO) algorithms. Unfortunately, the performance
of LiDAR odometry is significantly degraded in urban canyons with numerous
dynamic objects and complex environmental structures. Meanwhile, it is still
not clear from the existing literature which LO algorithms perform well in such
challenging environments. To fill this gap, this paper evaluates an array of
popular and extensively studied LO pipelines using the datasets collected in
urban canyons of Hong Kong. We present the results in terms of their
positioning accuracy and computational efficiency. We identify three major
factors dominating the performance of LO in urban canyons: the ego-vehicle
dynamics, moving objects, and the degree of urbanization. According to our
experimental results, point-wise LO methods achieve better accuracy in urban
canyons, while feature-wise methods achieve cost-efficiency and satisfactory
positioning accuracy.
Authors' comments: 15 pages, 14 figures
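At the core of point-wise LO, as discussed above, is rigid alignment of matched points. The sketch below shows one point-to-point alignment step with known correspondences, using the Kabsch/SVD solution. Real LO pipelines add nearest-neighbor matching, iteration (ICP), and outlier rejection. This is not any specific pipeline from the evaluation, and the function name is hypothetical.

```python
import numpy as np

def rigid_align(source, target):
    """Least-squares rotation R and translation t with R @ p + t ~ q for
    corresponding rows p of `source` and q of `target` (Kabsch / SVD)."""
    src_c, tgt_c = source.mean(axis=0), target.mean(axis=0)
    H = (source - src_c).T @ (target - tgt_c)      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ src_c
    return R, t
```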
Adam C. Schneider, Adam J. Burgasser, Roman Gerasimov, Federico Marocco, Jonathan Gagne, Sam Goodman, Paul Beaulieu, William Pendrill et al.
We present the discoveries of WISEA J041451.67-585456.7 and WISEA
J181006.18-101000.5, two low-temperature (1200$-$1400 K), high proper motion
T-type subdwarfs. Both objects were discovered via their high proper motion
($>$0.5 arcsec yr$^{-1}$); WISEA J181006.18-101000.5 as part of the NEOWISE
proper motion survey and WISEA J041451.67-585456.7 as part of the citizen
science project Backyard Worlds: Planet 9. We have confirmed both as brown
dwarfs with follow-up near-infrared spectroscopy. Their spectra and
near-infrared colors are unique amongst known brown dwarfs, with some colors
consistent with L-type brown dwarfs and other colors resembling those of the
latest-type T dwarfs. While no forward model consistently reproduces the
features seen in their near-infrared spectra, the closest matches suggest very
low metallicities ([Fe/H] $\leq -1$), making these objects likely the first
examples of extreme subdwarfs of the T spectral class (esdT). WISEA
J041451.67-585456.7 and WISEA J181006.18-101000.5 are found to be part of a
small population of objects that occupy the "substellar transition zone," and
have the lowest masses and effective temperatures of all objects in this group.
Authors' comments: Accepted for publication in the Astrophysical Journal
Erik Dennihy, Jay Farihi, Nicola Pietro Gentile Fusillo, John H. Debes
Stars with excess infrared radiation from circumstellar dust are invaluable
for studies of exoplanetary systems, informing our understanding of processes
of planet formation and destruction alike. All-sky photometric surveys have
made the identification of dusty infrared excess candidates trivial; however,
samples that rely on data from WISE are plagued by source confusion, leading
to high false positive rates. Techniques to limit its contribution to
WISE-selected samples have been developed, and their effectiveness is even more
important as we near the end-of-life of Spitzer, the only facility capable of
confirming the excess. Here, we present a Spitzer follow-up of a sample of 22
WISE-selected infrared excess candidates near the faint end of the WISE
detection limits. Eight of the 22 excesses are deemed the result of source
confusion, with the remaining candidates all confirmed by the Spitzer data. We
consider the efficacy of ground-based near-infrared imaging and astrometric
filtering of samples to limit confusion among the sample. We find that both
techniques are worthwhile for vetting candidates, but fail to identify all of
the confused excesses, indicating that they cannot be used to confirm
WISE-selected infrared excess candidates, but only to rule them out. This
result confirms the expectation that WISE-selected infrared excess samples will
always suffer from appreciable levels of contamination, and that care should be
taken in their interpretation regardless of the filters applied.
Authors' comments: 13 pages, 4 Figures; Accepted for publication in ApJ
Haotian Ma, Hao Zhang, Fan Zhou, Yinqing Zhang, Quanshi Zhang
This paper presents a method to explain how the information of each input variable is gradually discarded during forward propagation in a deep neural network (DNN), which provides new perspectives for explaining DNNs. We define two types of entropy-based metrics, i.e., (1) the discarding of pixel-wise information used in forward propagation, and (2) the uncertainty of the input reconstruction, to measure the input information contained in a specific layer from two perspectives. Unlike previous attribution metrics, the proposed metrics ensure fair comparisons between different layers of different DNNs. We can use these metrics to analyze the efficiency of information processing in DNNs, which is strongly connected to the performance of DNNs. We analyze information discarding in a pixel-wise manner, which differs from the information bottleneck theory, which measures feature information w.r.t. the sample distribution. Experiments demonstrate the effectiveness of our metrics in analyzing classic DNNs and explaining existing deep-learning techniques.
Anton Starostin, Viktor Valtsifer, Zahava Barkay, Irina Legchenkova, Viktor Danchuk, Edward Bormashenko
Water condensation was studied on silanized (superhydrophobic) and
fluorinated (superoleophobic) micro-rough aluminum surfaces of the same
topography. Condensation on superhydrophobic surfaces occurred via a film-wise
mechanism, whereas on superoleophobic surfaces it was drop-wise. The difference
in the pathways of condensation was attributed to the different energy barriers
separating the Cassie and Wenzel wetting states on the investigated surfaces.
The higher barriers inherent to superoleophobic surfaces promoted drop-wise
condensation. Three-stage growth kinetics of droplets condensed on
superoleophobic surfaces are reported and discussed.
Authors' comments: 20 pages, 6 figures
Xuefei Zhe, Shifeng Chen, Hong Yan
Deep supervised hashing has emerged as an influential solution to large-scale semantic image retrieval problems in computer vision. In light of recent progress, convolutional neural network-based hashing methods typically rely on pair-wise or triplet labels to conduct similarity-preserving learning. However, complex semantic concepts of visual content are hard to capture with similar/dissimilar labels, which limits retrieval performance. In general, pair-wise and triplet losses not only incur expensive training costs but also fail to extract sufficient semantic information. In this regard, we propose a novel deep supervised hashing model to learn more compact, class-level similarity-preserving binary codes. Our model is motivated by deep metric learning: it directly takes semantic labels as supervised information in training and generates the corresponding discriminative hashing codes. Specifically, we propose a novel cubic constraint loss function based on the Gaussian distribution, which preserves semantic variations while penalizing the overlap between different classes in the embedding space. To address the discrete optimization problem introduced by binary codes, a two-step optimization strategy is proposed to provide efficient training and avoid the vanishing gradient problem. Extensive experiments on four large-scale benchmark databases show that our model achieves state-of-the-art retrieval performance. Moreover, when training samples are limited, our method surpasses other supervised deep hashing methods by non-negligible margins.
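Once a hashing network is trained, retrieval with the binary codes discussed above typically works as follows. Quantize outputs with the sign function, then rank database items by Hamming distance. For K-bit codes in {-1, +1}, the distance is (K - <b_q, b_i>) / 2. This generic sketch does not reproduce the paper's cubic constraint loss or its two-step optimization. The function names are hypothetical.

```python
import numpy as np

def to_binary_codes(real_outputs):
    """Quantize network outputs to {-1, +1} codes with the sign (0 maps to +1)."""
    return np.where(np.asarray(real_outputs) >= 0, 1, -1)

def hamming_distances(query, database):
    """Hamming distance between a {-1,+1} query code and each database code.

    For K-bit codes, d_H = (K - <b_q, b_i>) / 2."""
    k = query.shape[-1]
    return (k - database @ query) // 2

def retrieve(query, database, top_k):
    """Indices of the top_k database items closest in Hamming distance."""
    d = hamming_distances(query, database)
    return np.argsort(d, kind="stable")[:top_k]
```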