Yifan Li, Zhen Tan, Kai Shu, Zongsheng Cao, Yu Kong, Huan Liu
Graph Neural Networks (GNNs) have emerged as a powerful tool for representation learning on graphs, but they often suffer from overfitting and label noise issues, especially when the data is scarce or imbalanced. Different from the paradigm of previous methods that rely on single-node confidence, in this paper, we introduce a novel Class-wise Selection for Graph Neural Networks, dubbed CSGNN, which employs a neighbor-aggregated latent space to adaptively select reliable nodes across different classes. Specifically, 1) to tackle the class imbalance issue, we introduce a dynamic class-wise selection mechanism, leveraging the clustering technique to identify clean nodes based on the neighbor-aggregated confidences. In this way, our approach can avoid the pitfalls of biased sampling which is common with global threshold techniques. 2) To alleviate the problem of noisy labels, built on the concept of the memorization effect, CSGNN prioritizes learning from clean nodes before noisy ones, thereby iteratively enhancing model performance while mitigating label noise. Through extensive experiments, we demonstrate that CSGNN outperforms state-of-the-art methods in terms of both effectiveness and robustness.
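The contrast between a class-wise selection and a global threshold can be illustrated with a small sketch. It is not the CSGNN implementation: the abstract does not specify the clustering method or the confidence definition, so the neighbor averaging, the 1-D two-means split, and all function names below are assumptions made for illustration.

```python
def aggregate_confidence(conf, neighbors):
    """Average each node's confidence with its neighbors' (assumed aggregation rule)."""
    return [
        (conf[i] + sum(conf[j] for j in neighbors[i])) / (1 + len(neighbors[i]))
        for i in range(len(conf))
    ]


def two_means_threshold(values, iters=20):
    """Split 1-D values into two clusters and return the midpoint of the centroids."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high:  # all values identical: nothing to split
            break
        lo, hi = sum(low) / len(low), sum(high) / len(high)
    return (lo + hi) / 2


def select_clean_nodes(labels, conf, neighbors):
    """Pick the high-confidence cluster separately inside every class."""
    agg = aggregate_confidence(conf, neighbors)
    clean = []
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        thr = two_means_threshold([agg[i] for i in idx])
        clean.extend(i for i in idx if agg[i] >= thr)
    return sorted(clean)
```

In a toy graph where one class has systematically lower confidences, a single global cut-off (say 0.7) would discard that class entirely, whereas the per-class split still keeps its most reliable nodes.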
Ping Li, Chenhan Zhang, Zheng Yang, Xianghua Xu, Mingli Song
Video prediction yields future frames by employing the historical frames and has exhibited great potential in many applications, e.g., meteorological prediction and autonomous driving. Previous works often decode the ultimate high-level semantic features to future frames without texture details, which deteriorates the prediction quality. Motivated by this, we develop a Pair-wise Layer Attention (PLA) module to enhance the layer-wise semantic dependency of the feature maps derived from the U-shape structure in the Translator, by coupling low-level visual cues and high-level features. Hence, the texture details of predicted frames are enriched. Moreover, most existing methods capture the spatiotemporal dynamics with the Translator, but fail to sufficiently utilize the spatial features of the Encoder. This inspires us to design a Spatial Masking (SM) module to mask partial encoding features during pretraining, which increases the visibility of the remaining feature pixels to the Decoder. To this end, we present a Pair-wise Layer Attention with Spatial Masking (PLA-SM) framework for video prediction to capture the spatiotemporal dynamics, which reflect the motion trend. Extensive experiments and rigorous ablation studies on five benchmarks demonstrate the advantages of the proposed approach. The code is available at GitHub.
Yunqiao Yang, Long-Kai Huang, Ying Wei
A multitude of prevalent pre-trained models mark a major milestone in the development of artificial intelligence, while fine-tuning has been a common practice that enables pre-trained models to figure prominently in a wide array of target datasets. Our empirical results reveal that off-the-shelf fine-tuning techniques are far from adequate to mitigate negative transfer caused by two types of underperforming features in a pre-trained model, namely rare features and spuriously correlated features. Rooted in structural causal models of predictions after fine-tuning, we propose a Concept-wise fine-tuning (Concept-Tuning) approach which refines feature representations at the level of patches, with each patch encoding a concept. Concept-Tuning minimizes the negative impacts of rare features and spuriously correlated features by (1) maximizing the mutual information between examples in the same category with regard to a slice of rare features (a patch) and (2) applying front-door adjustment via attention neural networks in channels and feature slices (patches). The proposed Concept-Tuning consistently and significantly (by up to 4.76%) improves over prior state-of-the-art fine-tuning methods on eleven datasets, diverse pre-training strategies (supervised and self-supervised ones), various network architectures, and sample sizes in a target dataset.
Giuseppe Guarino, Matteo Ciotola, Gemine Vivone, Giuseppe Scarpa
Hyperspectral pansharpening has received growing interest over the last few years, as testified by a large number of research papers and challenges. It consists of a pixel-level fusion between a lower-resolution hyperspectral datacube and a higher-resolution single-band image, the panchromatic image, with the goal of providing a hyperspectral datacube at panchromatic resolution. Thanks to their powerful representational capabilities, deep learning models have succeeded in providing unprecedented results on many general-purpose image processing tasks. However, when moving to domain-specific problems, as in this case, the advantages with respect to traditional model-based approaches are much less clear-cut due to several contextual reasons. Scarcity of training data, lack of ground truth, and data shape variability are some of the factors that limit the generalization capacity of state-of-the-art deep learning networks for hyperspectral pansharpening. To cope with these limitations, in this work we propose a new deep learning method which inherits a simple single-band unsupervised pansharpening model nested in a sequential band-wise adaptive scheme, where each band is pansharpened by refining the model tuned on the preceding one. By doing so, a simple model is propagated along the wavelength dimension, adaptively and flexibly, with no need for a fixed number of spectral bands and no need for large, expensive, labeled training datasets. The proposed method achieves very good results on our datasets, outperforming both traditional and deep learning reference methods. The implementation of the proposed method can be found at https://github.com/giu-guarino/R-PNN
Junyoung Park, Jin Kim, Hyeongjun Kwon, Ilhoon Yoon, Kwanghoon Sohn
Given the inevitability of domain shifts during inference in real-world
applications, test-time adaptation (TTA) is essential for model adaptation
after deployment. However, the real-world scenario of continuously changing
target distributions presents challenges including catastrophic forgetting and
error accumulation. Existing TTA methods for non-stationary domain shifts,
while effective, incur excessive computational load, making them impractical
for on-device settings. In this paper, we introduce a layer-wise auto-weighting
algorithm for continual and gradual TTA that autonomously identifies layers for
preservation or concentrated adaptation. By leveraging the Fisher Information
Matrix (FIM), we first design the learning weight to selectively focus on
layers associated with log-likelihood changes while preserving unrelated ones.
Then, we further propose an exponential min-max scaler to make certain layers
nearly frozen while mitigating outliers. This minimizes forgetting and error
accumulation, leading to efficient adaptation to non-stationary target
distribution. Experiments on CIFAR-10C, CIFAR-100C, and ImageNet-C show our
method outperforms conventional continual and gradual TTA approaches while
significantly reducing computational load, highlighting the importance of
FIM-based learning weight in adapting to continuously or gradually shifting
target domains.
Authors' comments: WACV 2024
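As a rough illustration of layer-wise weighting, the sketch below scores each layer with a diagonal empirical Fisher proxy (the sum of squared log-likelihood gradients) and then rescales the scores so that low-scoring layers receive a learning weight close to zero. The exact formulas are not given in the abstract: the scoring rule, the power-based scaler, the exponent `tau`, and the layer names are all assumptions, not the authors' definitions.

```python
def layer_scores(layer_grads):
    """Per-layer Fisher proxy: sum of squared gradient entries (assumed scoring rule)."""
    return {name: sum(g * g for g in grads) for name, grads in layer_grads.items()}


def scaled_learning_weights(scores, tau=2.0):
    """Min-max normalize the scores, then sharpen them with an exponent tau > 1.

    Layers near the minimum score get a weight near 0 (almost frozen);
    the layer with the maximum score gets weight 1.
    """
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # all layers equally informative: adapt them uniformly
        return {name: 1.0 for name in scores}
    return {name: ((s - lo) / (hi - lo)) ** tau for name, s in scores.items()}


# Hypothetical gradients for three layers of a small model.
grads = {"conv1": [1.0, 0.0], "conv2": [1.0, 1.0], "fc": [2.0, 1.0]}
weights = scaled_learning_weights(layer_scores(grads))
# Each weight would multiply the base learning rate of its layer.
```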
Zhewen Yu, Christos-Savvas Bouganis
With the great success of Deep Neural Networks (DNN), the design of efficient
hardware accelerators has triggered wide interest in the research community.
Existing research explores two architectural strategies: sequential layer
execution and layer-wise pipelining. While the former supports a wider range of
models, the latter is favoured for its enhanced customization and efficiency. A
challenge for the layer-wise pipelining architecture is its substantial demand
for the on-chip memory for weights storage, impeding the deployment of
large-scale networks on resource-constrained devices. This paper introduces
AutoWS, a pioneering memory management methodology that exploits both on-chip
and off-chip memory to optimize weight storage within a layer-wise pipelining
architecture, taking advantage of its static schedule. Through a comprehensive
investigation on both the hardware design and the Design Space Exploration, our
methodology is fully automated and enables the deployment of large-scale DNN
models on resource-constrained devices, which was not possible in existing
works that target layer-wise pipelining architectures. AutoWS is open-source:
https://github.com/Yu-Zhewen/AutoWS
Authors' comments: accepted by DATE2024
Siran Dai, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, Qingming Huang
The Area Under the ROC Curve (AUC) is a widely employed metric in long-tailed classification scenarios. Nevertheless, most existing methods primarily assume that training and testing examples are drawn i.i.d. from the same distribution, which is often unachievable in practice. Distributionally Robust Optimization (DRO) enhances model performance by optimizing it for the local worst-case scenario, but directly integrating AUC optimization with DRO results in an intractable optimization problem. Methodologically, to tackle this challenge, we propose an instance-wise surrogate loss of Distributionally Robust AUC (DRAUC) and build our optimization framework on top of it. Moreover, we highlight that conventional DRAUC may induce label bias, and hence introduce distribution-aware DRAUC as a more suitable metric for robust AUC learning. Theoretically, we affirm that the generalization gap between the training loss and testing error diminishes if the training set is sufficiently large. Empirically, experiments on corrupted benchmark datasets demonstrate the effectiveness of our proposed method. Code is available at: https://github.com/EldercatSAM/DRAUC.
Simon Sinong Zhan, Yixuan Wang, Qingyuan Wu, Ruochen Jiao, Chao Huang, Qi Zhu
Reinforcement Learning (RL) in the context of safe exploration has long
grappled with the challenges of the delicate balance between maximizing rewards
and minimizing safety violations, the complexities arising from contact-rich or
non-smooth environments, and high-dimensional pixel observations. Furthermore,
incorporating state-wise safety constraints in the exploration and learning
process, where the agent is prohibited from accessing unsafe regions without
prior knowledge, adds an additional layer of complexity. In this paper, we
propose a novel pixel-observation safe RL algorithm that efficiently encodes
state-wise safety constraints with unknown hazard regions through the
introduction of a latent barrier function learning mechanism. As a joint
learning framework, our approach first involves constructing a latent dynamics
model with low-dimensional latent spaces derived from pixel observations.
Subsequently, we build and learn a latent barrier function on top of the latent
dynamics and conduct policy optimization simultaneously, thereby improving both
safety and the total expected return. Experimental evaluations on the
safety-gym benchmark suite demonstrate that our proposed method significantly
reduces safety violations throughout the training process and demonstrates
faster safety convergence compared to existing methods while achieving
competitive results in reward return.
Authors' comments: 10 pages, 6 figures
Zhimin Li, Shusen Liu, Kailkhura Bhavya, Timo Bremer, Valerio Pascucci
Neural networks have achieved remarkable success in many scientific fields. However, the interpretability of neural network models is still a major bottleneck to deploying such techniques in daily life. The challenge stems from the non-linear behavior of the neural network, which raises the critical question of how a model uses input features to make a decision. The classical approach to this challenge is feature attribution, which assigns an importance score to each input feature and reveals its importance to the current prediction. However, current feature attribution approaches often indicate the importance of each input feature without detailing how the features are actually processed by the model internally. These attribution approaches often raise the concern of whether they highlight the correct features for a model prediction. For a neural network model, the non-linear behavior is often caused by the non-linear activation units of the model. However, the computation behind a single prediction is locally linear, because one prediction has only one activation pattern. Based on this observation, we propose an instance-wise linearization approach that reformulates the forward computation process of a neural network prediction. This approach reformulates the different layers of a convolutional neural network as linear matrix multiplications. Aggregating all layers' computation, the complex convolutional neural network operations behind a prediction can be described as a single linear matrix multiplication $F(x) = W \cdot x + b$. This equation not only provides a feature attribution map that highlights the importance of the input features but also tells exactly how each input feature contributes to a prediction. Furthermore, we discuss the application of this technique in both supervised classification and unsupervised neural network learning for parametric t-SNE dimension reduction.
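The locally linear view can be checked on a toy example. The sketch below uses a two-layer fully connected ReLU network instead of a convolutional one (an assumption made to keep the example short; a convolution can be unrolled into a matrix in the same way). Fixing the activation pattern of one input collapses the network into a single pair $(W, b)$ that reproduces the prediction for that input exactly. All function names and the toy weights are illustrative.

```python
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def forward(x, W1, b1, W2, b2):
    """Ordinary forward pass: linear -> ReLU -> linear."""
    pre = [p + q for p, q in zip(matvec(W1, x), b1)]
    hidden = [max(0.0, z) for z in pre]
    return [p + q for p, q in zip(matvec(W2, hidden), b2)]


def linearize(x, W1, b1, W2, b2):
    """Return (W, b) with forward(x) == W x + b for the activation pattern of x."""
    pre = [p + q for p, q in zip(matvec(W1, x), b1)]
    mask = [1.0 if z > 0 else 0.0 for z in pre]  # activation pattern of this input
    masked_W1 = [[m * w for w in row] for m, row in zip(mask, W1)]
    masked_b1 = [m * b for m, b in zip(mask, b1)]
    W = matmul(W2, masked_W1)
    b = [p + q for p, q in zip(matvec(W2, masked_b1), b2)]
    return W, b


W1 = [[1.0, -1.0], [0.5, 2.0], [-1.0, -1.0]]
b1 = [0.0, -1.0, 0.5]
W2 = [[1.0, 2.0, -1.0], [0.0, 1.0, 1.0]]
b2 = [0.1, -0.2]
x = [1.0, 0.5]

W, b = linearize(x, W1, b1, W2, b2)
# Contribution of input feature i to output k is W[k][i] * x[i].
contributions = [[w * xi for w, xi in zip(row, x)] for row in W]
```

The pair `(W, b)` is only valid for inputs that share the activation pattern of `x`; a different input generally yields a different linear map.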
Zijie Pan, Jiachen Lu, Xiatian Zhu, Li Zhang
High-resolution 3D object generation remains a challenging task primarily due
to the limited availability of comprehensive annotated training data. Recent
advancements have aimed to overcome this constraint by harnessing image
generative models, pretrained on extensive curated web datasets, using
knowledge transfer techniques like Score Distillation Sampling (SDS).
Efficiently addressing the requirements of high-resolution rendering often
necessitates the adoption of latent representation-based models, such as the
Latent Diffusion Model (LDM). In this framework, a significant challenge
arises: To compute gradients for individual image pixels, it is necessary to
backpropagate gradients from the designated latent space through the frozen
components of the image model, such as the VAE encoder used within LDM.
However, this gradient propagation pathway has never been optimized, remaining
uncontrolled during training. We find that the unregulated gradients adversely
affect the 3D model's capacity to acquire texture-related information from
the image generative model, leading to poor quality appearance synthesis. To
address this overarching challenge, we propose an innovative operation termed
Pixel-wise Gradient Clipping (PGC) designed for seamless integration into
existing 3D generative models, thereby enhancing their synthesis quality.
Specifically, we control the magnitude of stochastic gradients by clipping the
pixel-wise gradients efficiently, while preserving crucial texture-related
gradient directions. Despite this simplicity and minimal extra cost, extensive
experiments demonstrate the efficacy of our PGC in enhancing the performance of
existing 3D generative models for high-resolution object rendering.
Authors' comments: Accepted at ICLR 2024. Project page:
https://fudan-zvg.github.io/PGC-3D
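A minimal sketch of the clipping idea, assuming a norm-based variant: each pixel's gradient vector (one entry per channel) is rescaled so that its L2 norm does not exceed a threshold, which bounds the magnitude while leaving the direction unchanged. The threshold value and the nested-list image layout are assumptions for illustration; the project page above documents the authors' actual operation.

```python
import math


def pixelwise_clip(grad, max_norm):
    """Clip the per-pixel gradient norm of an image gradient.

    grad is a list of rows; each row is a list of pixels; each pixel is a
    list of per-channel gradient values.
    """
    clipped = []
    for row in grad:
        new_row = []
        for pixel in row:
            norm = math.sqrt(sum(g * g for g in pixel))
            # Scale down only pixels whose gradient norm exceeds the threshold.
            scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
            new_row.append([g * scale for g in pixel])
        clipped.append(new_row)
    return clipped
```

Unlike clipping the norm of the whole gradient tensor, this treats every pixel independently, so a few pixels with very large gradients do not shrink the update for all the others.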
Siqi Kou, Lei Gan, Dequan Wang, Chongxuan Li, Zhijie Deng
Diffusion models have impressive image generation capability, but low-quality generations still exist, and their identification remains challenging due to the lack of a proper sample-wise metric. To address this, we propose BayesDiff, a pixel-wise uncertainty estimator for generations from diffusion models based on Bayesian inference. In particular, we derive a novel uncertainty iteration principle to characterize the uncertainty dynamics in diffusion, and leverage the last-layer Laplace approximation for efficient Bayesian inference. The estimated pixel-wise uncertainty can not only be aggregated into a sample-wise metric to filter out low-fidelity images but also aid in augmenting successful generations and rectifying artifacts in failed generations in text-to-image tasks. Extensive experiments demonstrate the efficacy of BayesDiff and its promise for practical applications.
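The filtering step can be sketched as follows, assuming the pixel-wise uncertainty maps are already available. Summing the map into one score per sample and keeping the lowest-scoring fraction are assumptions for illustration; the abstract does not state which aggregation BayesDiff uses.

```python
def sample_score(pixel_uncertainty):
    """Aggregate a 2-D map of pixel-wise uncertainties into one number (assumed: sum)."""
    return sum(sum(row) for row in pixel_uncertainty)


def filter_generations(uncertainty_maps, keep_fraction):
    """Return the indices of the generations with the lowest aggregated uncertainty."""
    scores = [sample_score(u) for u in uncertainty_maps]
    n_keep = max(1, int(len(scores) * keep_fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(ranked[:n_keep])
```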
Catherine Xinrui Yu, Jiaqi Gu, Zhaomeng Chen, Zihuai He
Testing multiple hypotheses of conditional independence with provable error
rate control is a fundamental problem with various applications. To infer
conditional independence with family-wise error rate (FWER) control when only
summary statistics of marginal dependence are accessible, we adopt
GhostKnockoff to directly generate knockoff copies of summary statistics and
propose a new filter to select features conditionally dependent on the response
with provable FWER control. In addition, we develop a computationally efficient
algorithm to greatly reduce the computational cost of knockoff copies
generation without sacrificing power and FWER control. Experiments on simulated
data and a real dataset of Alzheimer's disease genetics demonstrate the
advantage of the proposed method over existing alternatives in both statistical
power and computational efficiency.
Authors' comments: 35 pages
Stefan Stojanovic, Yassir Jedra, Alexandre Proutiere
We study matrix estimation problems arising in reinforcement learning (RL)
with low-rank structure. In low-rank bandits, the matrix to be recovered
specifies the expected arm rewards, and for low-rank Markov Decision Processes
(MDPs), it may for example characterize the transition kernel of the MDP. In
both cases, each entry of the matrix carries important information, and we seek
estimation methods with low entry-wise error. Importantly, these methods
further need to accommodate inherent correlations in the available data
(e.g. for MDPs, the data consists of system trajectories). We investigate the
performance of simple spectral-based matrix estimation approaches: we show that
they efficiently recover the singular subspaces of the matrix and exhibit
nearly-minimal entry-wise error. These new results on low-rank matrix
estimation make it possible to devise reinforcement learning algorithms that
fully exploit the underlying low-rank structure. We provide two examples of
such algorithms: a regret minimization algorithm for low-rank bandit problems,
and a best policy identification algorithm for reward-free RL in low-rank MDPs.
Both algorithms yield state-of-the-art performance guarantees.
Authors' comments: To appear in NeurIPS 2023
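A simple spectral estimator can be sketched for the rank-one case: compute the leading singular triplet of the observed matrix and use it as the estimate. The sketch uses power iteration and a rank of one purely to stay dependency-free; it illustrates the general idea of truncating to the leading singular subspace and carries none of the paper's guarantees. Function names are illustrative.

```python
import math


def top_singular_triplet(M, iters=100):
    """Leading singular value and vectors of M via power iteration."""
    n_rows, n_cols = len(M), len(M[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    sigma, u = 0.0, [0.0] * n_rows
    for _ in range(iters):
        u = [sum(M[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        v = [sum(M[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / sigma for x in v]
    return sigma, u, v


def rank_one_estimate(M):
    """Rank-one spectral estimate: sigma * u * v^T."""
    sigma, u, v = top_singular_triplet(M)
    return [[sigma * ui * vj for vj in v] for ui in u]


def max_entrywise_error(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))
```

The quantity returned by `max_entrywise_error` is the entry-wise error the abstract refers to: the largest deviation over all entries, which is stricter than an average (Frobenius-type) error.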
Guanqi Chen, Guanbin Li
Cardiac function assessment aims at predicting left ventricular ejection
fraction (LVEF) given an echocardiogram video, which requires models to focus
on the changes in the left ventricle during the cardiac cycle. How to assess
cardiac function accurately and automatically from an echocardiogram video is a
valuable topic in intelligent assisted healthcare. Existing video-based methods
do not pay much attention to the left ventricular region, nor the left
ventricular changes caused by motion. In this work, we propose a
semi-supervised auxiliary learning paradigm with a left ventricular
segmentation task, which contributes to the representation learning for the
left ventricular region. To better model the importance of motion information,
we introduce a temporal channel-wise attention (TCA) module to excite those
channels used to describe motion. Furthermore, we reform the TCA module with
semantic perception by taking the segmentation map of the left ventricle as
input to focus on the motion patterns of the left ventricle. Finally, to reduce
the difficulty of direct LVEF regression, we utilize an anchor-based
classification and regression method to predict LVEF. Our approach achieves
state-of-the-art performance on the Stanford dataset with an improvement of
0.22 MAE, 0.26 RMSE, and 1.9% $R^2$.
Authors' comments: Accepted by ISBI 2022 (oral)
Alexandre Mascarenhas, Yuri Lavinas, Claus Aranha
Dynamic Optimization Problems (DOPs) are characterized by changes in the fitness landscape that can occur at any time and are common in real-world applications. The main issues to be considered include detecting the change in the fitness landscape and reacting accordingly. Over the years, several evolutionary algorithms have been proposed to take into account this characteristic during the optimization process. However, the number of available tools or open source codebases for these approaches is limited, making reproducibility and extensive experimentation difficult. To solve this, we developed a component-oriented framework for DOPs called Adjustable Components for Dynamic Problems (AbCD), inspired by similar works in the Multiobjective static domain. Using this framework, we investigate components that were proposed in several popular DOP algorithms. Our experiments show that the performance of these components depends on the problem and the selected components used in a configuration, which differs from the results reported in the literature. Using irace, we demonstrate how this framework can automatically generate DOP algorithm configurations that take into account the characteristics of the problem to be solved. Our results highlight existing problems in the DOP field that need to be addressed in the future development of algorithms and components.
Li Li, You Qin, Wei Ji, Yuxiao Zhou, Roger Zimmermann
Panoptic Scene Graph Generation (PSG) involves the detection of objects and
the prediction of their corresponding relationships (predicates). However, the
presence of biased predicate annotations poses a significant challenge for PSG
models, as it hinders their ability to establish a clear decision boundary
among different predicates. This issue substantially impedes the practical
utility and real-world applicability of PSG models. To address the intrinsic
bias above, we propose a novel framework to infer potentially biased
annotations by measuring the predicate prediction risks within each
subject-object pair (domain), and adaptively transfer the biased annotations to
consistent ones by learning invariant predicate representation embeddings.
Experiments show that our method significantly improves the performance of
benchmark models, achieving a new state-of-the-art performance, and shows great
generalization and effectiveness on the PSG dataset.
Authors' comments: arXiv admin note: text overlap with arXiv:2307.15567
Yuyuan Li, Chaochao Chen, Xiaolin Zheng, Yizhao Zhang, Zhongxuan Han, Dan Meng, Jun Wang
With the growing privacy concerns in recommender systems, recommendation unlearning, i.e., forgetting the impact of specific learned targets, is getting increasing attention. Existing studies predominantly use training data, i.e., model inputs, as the unlearning target. However, we find that attackers can extract private information, e.g., gender, race, and age, from a trained model even if it has not been explicitly encountered during training. We name this unseen information attribute and treat it as the unlearning target. To protect the sensitive attributes of users, Attribute Unlearning (AU) aims to degrade attacking performance and make target attributes indistinguishable. In this paper, we focus on a strict but practical setting of AU, namely Post-Training Attribute Unlearning (PoT-AU), where unlearning can only be performed after the training of the recommendation model is completed. To address the PoT-AU problem in recommender systems, we design a two-component loss function that consists of i) a distinguishability loss, which makes attribute labels indistinguishable to attackers, and ii) a regularization loss, which prevents drastic changes in the model that would harm recommendation performance. Specifically, we investigate two types of distinguishability measurements, i.e., user-to-user and distribution-to-distribution. We use the stochastic gradient descent algorithm to optimize our proposed loss. Extensive experiments on three real-world datasets demonstrate the effectiveness of our proposed methods.
Chen Liang, Jiahui Yu, Ming-Hsuan Yang, Matthew Brown, Yin Cui, Tuo Zhao, Boqing Gong, Tianyi Zhou
Pre-trained multimodal foundation models have demonstrated remarkable generalizability but pose challenges for deployment due to their large sizes. One effective approach to reducing their sizes is layerwise distillation, wherein small student models are trained to match the hidden representations of large teacher models at each layer. Motivated by our observation that certain architecture components, referred to as modules, contribute more significantly to the student's performance than others, we propose to track the contributions of individual modules by recording the loss decrement after distilling each module and to distill modules with greater contributions more frequently. Such an approach can be naturally formulated as a multi-armed bandit (MAB) problem, where modules and loss decrements are considered as arms and rewards, respectively. We then develop a modified-Thompson sampling algorithm named OPTIMA to address the nonstationarity of module contributions resulting from model updating. Specifically, we leverage the observed contributions in recent history to estimate the changing contribution of each module and select modules based on these estimations to maximize the cumulative contribution. We evaluate the effectiveness of OPTIMA through distillation experiments on various multimodal understanding and image captioning tasks, using the CoCa-Large model (Yu et al., 2022) as the teacher model.
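The bandit formulation can be illustrated with a generic discounted Gaussian Thompson sampler, in which old rewards are down-weighted so that the estimates follow recent history. This is a stand-in for the idea, not the OPTIMA algorithm itself: the discount factor, the Gaussian posterior with a fixed scale of 0.5, and the synthetic `drifting_reward` function are all assumptions.

```python
import math
import random


def discounted_thompson(reward_fn, n_modules, rounds, discount=0.9, seed=0):
    """Select one module per round; return the sequence of chosen module indices."""
    rng = random.Random(seed)
    weight = [0.0] * n_modules  # discounted number of times each module was chosen
    total = [0.0] * n_modules   # discounted sum of observed rewards
    choices = []
    for t in range(rounds):
        samples = []
        for k in range(n_modules):
            mean = total[k] / weight[k] if weight[k] > 0 else 0.0
            std = 0.5 / math.sqrt(weight[k] + 1.0)  # wider when evidence is stale
            samples.append(rng.gauss(mean, std))
        chosen = max(range(n_modules), key=lambda k: samples[k])
        reward = reward_fn(chosen, t, rng)  # e.g. loss decrement after distilling it
        # Forget old evidence a little, then add the new observation.
        weight = [w * discount for w in weight]
        total = [s * discount for s in total]
        weight[chosen] += 1.0
        total[chosen] += reward
        choices.append(chosen)
    return choices


def drifting_reward(module, t, rng):
    """Synthetic rewards: module 0 is most useful early, module 1 after round 200."""
    base = [1.0, 0.2] if t < 200 else [0.2, 1.0]
    return base[module] + rng.gauss(0.0, 0.05)
```

Because evidence for an unselected module decays, its sampling distribution widens again over time, which lets the sampler rediscover a module whose contribution has increased.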
Xiang Liu, Liangxi Liu, Feiyang Ye, Yunheng Shen, Xia Li, Linshan Jiang, Jialin Li
Efficiently aggregating trained neural networks from local clients into a
global model on a server is a widely researched topic in federated learning.
Recently, motivated by diminishing privacy concerns, mitigating potential
attacks, and reducing communication overhead, one-shot federated learning
(i.e., limiting client-server communication to a single round) has gained
popularity among researchers. However, one-shot aggregation performance
is sensitive to non-identical training data distributions, which
exhibit high statistical heterogeneity in some real-world scenarios. To
address this issue, we propose a novel one-shot aggregation method with
layer-wise posterior aggregation, named FedLPA. FedLPA aggregates local models
to obtain a more accurate global model without requiring extra auxiliary
datasets or exposing any private label information, e.g., label distributions.
To effectively capture the statistics maintained in the biased local datasets
in the practical non-IID scenario, we efficiently infer the posteriors of each
layer in each local model using layer-wise Laplace approximation and aggregate
them to train the global parameters. Extensive experimental results demonstrate
that FedLPA significantly improves learning performance over state-of-the-art
methods across several metrics.
Authors' comments: 39 pages, 4 figures
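Combining per-client Gaussian posteriors for one layer can be sketched with the standard product-of-Gaussians rule: precisions add, and the combined mean is the precision-weighted average of the client means. The diagonal precision and the function name are simplifying assumptions; the abstract does not describe FedLPA's exact approximation or how the aggregated posterior is then used to train the global parameters.

```python
def aggregate_layer_posteriors(means, precisions):
    """Combine client posteriors N(mean_k, diag(1 / precision_k)) for one layer.

    means and precisions are lists with one flat parameter vector per client.
    Returns the combined mean and the combined (diagonal) precision.
    """
    n_params = len(means[0])
    total_precision = [sum(p[i] for p in precisions) for i in range(n_params)]
    combined_mean = [
        sum(p[i] * m[i] for m, p in zip(means, precisions)) / total_precision[i]
        for i in range(n_params)
    ]
    return combined_mean, total_precision
```

Compared with plain parameter averaging, each client dominates the coordinates it is confident about, which matters when the local datasets are biased in different ways.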
Shengkun Tang, Yaqing Wang, Caiwen Ding, Yi Liang, Yao Li, Dongkuan Xu
Diffusion models achieve great success in generating diverse and high-fidelity images, yet their widespread application, especially in real-time scenarios, is hampered by their inherently slow generation speed. The slow generation stems from the necessity of multi-step network inference. While certain predictions benefit from the full computation of the model in each sampling iteration, not every iteration requires the same amount of computation, potentially leading to inefficient computation. Unlike typical adaptive computation challenges that deal with single-step generation problems, diffusion processes with multi-step generation need to dynamically adjust their computational resource allocation based on the ongoing assessment of each step's importance to the final image output, presenting a unique set of challenges. In this work, we propose AdaDiff, an adaptive framework that dynamically allocates computation resources in each sampling step to improve the generation efficiency of diffusion models. To assess the effects of changes in computational effort on image quality, we present a timestep-aware uncertainty estimation module (UEM). Integrated at each intermediate layer, the UEM evaluates the predictive uncertainty. This uncertainty measurement serves as an indicator for determining whether to terminate the inference process. Additionally, we introduce an uncertainty-aware layer-wise loss aimed at bridging the performance gap between full models and their adaptive counterparts.
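The early-termination rule can be sketched as a loop over layers that stops as soon as the estimated uncertainty falls below a threshold. The callables standing in for the layers and for the uncertainty estimation module, the threshold, and the function name are hypothetical; a real model would also need an output head applied to the features at the exit point.

```python
def adaptive_step(h, t, layers, uncertainty_heads, threshold):
    """Run one sampling step, exiting early when the uncertainty is low enough.

    h: input features, t: diffusion timestep (passed to the uncertainty heads),
    layers / uncertainty_heads: one callable per layer.
    Returns the features at the exit point and the number of layers executed.
    """
    for depth, (layer, head) in enumerate(zip(layers, uncertainty_heads), start=1):
        h = layer(h)
        if head(h, t) < threshold:
            return h, depth
    return h, len(layers)


# Toy stand-ins: every layer adds 1, and uncertainty shrinks as features grow.
toy_layers = [lambda h: h + 1.0] * 4
toy_heads = [lambda h, t: 1.0 / h] * 4
features, used = adaptive_step(0.0, 10, toy_layers, toy_heads, threshold=0.4)
```

With these toy callables the third layer is the first whose uncertainty (1/3) drops below 0.4, so the step exits there and skips the fourth layer.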