Jonggyu Jang, Hyeonsu Lyu, Hyun Jong Yang
Model inversion (MI) attacks aim to infer or reconstruct the training dataset
through reverse-engineering from the target model's weights. Recently,
significant advancements in generative models have enabled MI attacks to
overcome challenges in producing photo-realistic replicas of the training
dataset, a technique known as generative MI. The generative MI primarily
focuses on identifying latent vectors that correspond to specific target
labels, leveraging a generative model trained with an auxiliary dataset.
However, an important aspect is often overlooked: the MI attacks fail if the
pre-trained generative model lacks the coverage to create an image
corresponding to the target label, especially when there is a significant
difference between the target and auxiliary datasets. To address this gap, we
propose the Patch-MI method, inspired by a jigsaw puzzle, which offers a novel
probabilistic interpretation of MI attacks. Even with a dissimilar auxiliary
dataset, our method effectively creates images that closely mimic the
distribution of image patches in the target dataset by patch-based
reconstruction. Moreover, we numerically demonstrate that the Patch-MI improves
Top 1 attack accuracy by 5\%p compared to existing methods.
Authors' comments: 12 pages
Ziyi Yin, Rafael Orozco, Mathias Louboutin, Felix J. Herrmann
We introduce a probabilistic technique for full-waveform inversion, employing variational inference and conditional normalizing flows to quantify uncertainty in migration-velocity models and its impact on imaging. Our approach integrates generative artificial intelligence with physics-informed common-image gathers, reducing reliance on accurate initial velocity models. The case studies considered demonstrate its efficacy in producing realizations of migration-velocity models conditioned on the data. These models are used to quantify amplitude and positioning effects during subsequent imaging.
Robert Mnatsakanov, Rafik Aramyan, Farhad Jafari
The problem of recovering a moment-determinate multivariate function $f$ via its moment sequence is studied. Under mild conditions on $f$, the point-wise and $L_1$-rates of convergence for the proposed constructions are established. The cases where $f$ is the indicator function of a set or represents a discrete probability mass function are also investigated. Calculations of the approximants and simulation studies are conducted to graphically illustrate the behavior of the approximations in several simple examples. Analytical and simulated errors of the proposed approximations are recorded in Tables 1-3.
Bringfried Stecklum
The Wide-field Infrared Survey Explorer (WISE, Wright et al. 2010) and its
follow-up Near-Earth Object (NEO) mission (NEOWISE, Mainzer et al. 2011) scan
the mid-infrared sky twice a year. The spatial and temporal coverage of the
resulting database is of utmost importance for variability studies, in
particular of young stellar objects (YSOs) which have red $W1{-}W2$ colors.
During such an effort, I noticed subarcsecond position offsets between
subsequent visits. The offsets do not appear for targets with small $W1{-}W2$
colors, which points to a chromatic origin in the optics, caused by the
spacecraft pointing alternating ``forward'' and ``backward'' from one visit to
another. It amounts to 0\farcs1 for targets with $W1{-}W2\approx2$.
Consideration of this chromatic offset will improve astrometry. This is of
particular importance for NEOs that are generally red.
Authors' comments: 3 pages, 1 figure, submitted to RNAAS
Huan Chen, Wangcai Zhao, Tingfa Xu, Shiyun Zhou, Peifu Liu, Jianan Li
Coded Aperture Snapshot Spectral Imaging (CASSI) reconstruction aims to
recover the 3D spatial-spectral signal from a 2D measurement. Existing methods
for reconstructing a Hyperspectral Image (HSI) typically involve learning
mappings from a 2D compressed image to a predetermined set of discrete spectral
bands. However, this approach overlooks the inherent continuity of the spectral
information. In this study, we propose an innovative method called
Spectral-wise Implicit Neural Representation (SINR) as a pioneering step toward
addressing this limitation. SINR introduces a continuous spectral amplification
process for HSI reconstruction, enabling spectral super-resolution with
customizable magnification factors. To achieve this, we leverage the concept of
implicit neural representation. Specifically, our approach introduces a
spectral-wise attention mechanism that treats individual channels as distinct
tokens, thereby capturing global spectral dependencies. Additionally, our
approach incorporates two components, namely a Fourier coordinate encoder and a
spectral scale factor module. The Fourier coordinate encoder enhances the
SINR's ability to emphasize high-frequency components, while the spectral scale
factor module guides the SINR to adapt to the variable number of spectral
channels. Notably, the SINR framework enhances the flexibility of CASSI
reconstruction by accommodating an unlimited number of spectral bands in the
desired output. Extensive experiments demonstrate that our SINR outperforms
baseline methods. By enabling continuous reconstruction within the CASSI
framework, we take the initial stride toward integrating implicit neural
representation into the field.
Authors' comments: Accepted by IEEE Transactions on Circuits and Systems for Video
Technology, has been published
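To make the idea of a Fourier coordinate encoder more concrete, the following is a minimal sketch of a generic sinusoidal encoding of a normalized spectral coordinate. The helper names `fourier_encode` and `band_coordinates`, the number of frequencies, and the octave-spaced frequency schedule are assumptions chosen for illustration; the abstract does not specify the encoder actually used in SINR.

```python
import math
from typing import List


def fourier_encode(coord: float, num_freqs: int = 4) -> List[float]:
    """Map a scalar coordinate in [0, 1] to sin/cos features.

    Hypothetical helper: the frequencies 2**k * pi are a common choice for
    coordinate encodings, not necessarily the schedule used by SINR.
    """
    features = []
    for k in range(num_freqs):
        angle = (2.0 ** k) * math.pi * coord
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features


def band_coordinates(num_bands: int) -> List[float]:
    """Normalized centers of `num_bands` equal-width spectral bins on [0, 1]."""
    return [(i + 0.5) / num_bands for i in range(num_bands)]


# The same encoder can be queried at any spectral resolution, which is what
# makes a continuous representation compatible with a variable band count.
coarse = [fourier_encode(c) for c in band_coordinates(28)]
fine = [fourier_encode(c) for c in band_coordinates(56)]
```

Because the encoder takes a continuous coordinate rather than a band index, doubling the number of requested bands only changes the query points, not the encoder itself.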
Yefan Zhou, Tianyu Pang, Keqin Liu, Charles H. Martin, Michael W. Mahoney, Yaoqing Yang
Regularization in modern machine learning is crucial, and it can take various
forms in algorithmic design: training set, model family, error function,
regularization terms, and optimizations. In particular, the learning rate,
which can be interpreted as a temperature-like parameter within the statistical
mechanics of learning, plays a crucial role in neural network training. Indeed,
many widely adopted training strategies basically just define the decay of the
learning rate over time. This process can be interpreted as decreasing a
temperature, using either a global learning rate (for the entire model) or a
learning rate that varies for each parameter. This paper proposes TempBalance,
a straightforward yet effective layer-wise learning rate method. TempBalance is
based on Heavy-Tailed Self-Regularization (HT-SR) Theory, an approach which
characterizes the implicit self-regularization of different layers in trained
models. We demonstrate the efficacy of using HT-SR-motivated metrics to guide
the scheduling and balancing of temperature across all network layers during
model training, resulting in improved performance during testing. We implement
TempBalance on CIFAR10, CIFAR100, SVHN, and TinyImageNet datasets using
ResNets, VGGs, and WideResNets with various depths and widths. Our results show
that TempBalance significantly outperforms ordinary SGD and carefully-tuned
spectral norm regularization. We also show that TempBalance outperforms a
number of state-of-the-art optimizers and learning rate schedulers.
Authors' comments: NeurIPS 2023 Spotlight, first two authors contributed equally
Shpresim Sadiku, Moritz Wagner, Sebastian Pokutta
Sparse adversarial attacks fool deep neural networks (DNNs) through minimal pixel perturbations, often regularized by the $\ell_0$ norm. Recent efforts have replaced this norm with a structural sparsity regularizer, such as the nuclear group norm, to craft group-wise sparse adversarial attacks. The resulting perturbations are thus explainable and hold significant practical relevance, shedding light on an even greater vulnerability of DNNs. However, crafting such attacks poses an optimization challenge, as it involves computing norms for groups of pixels within a non-convex objective. We address this by presenting a two-phase algorithm that generates group-wise sparse attacks within semantically meaningful areas of an image. Initially, we optimize a quasinorm adversarial loss using the $1/2$-quasinorm proximal operator tailored for non-convex programming. Subsequently, the algorithm transitions to a projected Nesterov's accelerated gradient descent with $2$-norm regularization applied to perturbation magnitudes. Rigorous evaluations on CIFAR-10 and ImageNet datasets demonstrate a remarkable increase in group-wise sparsity, e.g., $50.9\%$ on CIFAR-10 and $38.4\%$ on ImageNet (average case, targeted attack). This performance improvement is accompanied by significantly faster computation times, improved explainability, and a $100\%$ attack success rate.
Yixuan Luo, Mengye Ren, Sai Qian Zhang
Like masked language modeling (MLM) in natural language processing, masked image modeling (MIM) aims to extract valuable insights from image patches to enhance the feature extraction capabilities of the underlying deep neural network (DNN). Compared with other training paradigms such as supervised learning and unsupervised contrastive learning, MIM pretraining typically demands significant computational resources in order to manage large training data batches (e.g., 4096). The significant memory and computation requirements pose a considerable challenge to its broad adoption. To mitigate this, we introduce a novel learning framework, termed \textit{Block-Wise Masked Image Modeling} (BIM). This framework involves decomposing the MIM tasks into several sub-tasks with independent computation patterns, resulting in block-wise back-propagation operations instead of the traditional end-to-end approach. Our proposed BIM maintains superior performance compared to conventional MIM while greatly reducing peak memory consumption. Moreover, BIM naturally enables the concurrent training of numerous DNN backbones of varying depths. This leads to the creation of multiple trained DNN backbones, each tailored to different hardware platforms with distinct computing capabilities. This approach significantly reduces computational costs in comparison with training each DNN backbone individually. Our framework offers a promising solution for resource-constrained training of MIM.
Yifan Li, Zhen Tan, Kai Shu, Zongsheng Cao, Yu Kong, Huan Liu
Graph Neural Networks (GNNs) have emerged as a powerful tool for representation learning on graphs, but they often suffer from overfitting and label noise issues, especially when the data is scarce or imbalanced. Different from the paradigm of previous methods that rely on single-node confidence, in this paper, we introduce a novel Class-wise Selection for Graph Neural Networks, dubbed CSGNN, which employs a neighbor-aggregated latent space to adaptively select reliable nodes across different classes. Specifically, 1) to tackle the class imbalance issue, we introduce a dynamic class-wise selection mechanism, leveraging the clustering technique to identify clean nodes based on the neighbor-aggregated confidences. In this way, our approach can avoid the pitfalls of biased sampling which is common with global threshold techniques. 2) To alleviate the problem of noisy labels, built on the concept of the memorization effect, CSGNN prioritizes learning from clean nodes before noisy ones, thereby iteratively enhancing model performance while mitigating label noise. Through extensive experiments, we demonstrate that CSGNN outperforms state-of-the-art methods in terms of both effectiveness and robustness.
Ping Li, Chenhan Zhang, Zheng Yang, Xianghua Xu, Mingli Song
Video prediction yields future frames from historical frames and has shown great potential in many applications, e.g., meteorological prediction and autonomous driving. Previous works often decode the ultimate high-level semantic features to future frames without texture details, which deteriorates the prediction quality. Motivated by this, we develop a Pair-wise Layer Attention (PLA) module to enhance the layer-wise semantic dependency of the feature maps derived from the U-shape structure in Translator, by coupling low-level visual cues and high-level features. Hence, the texture details of predicted frames are enriched. Moreover, most existing methods capture the spatiotemporal dynamics by Translator, but fail to sufficiently utilize the spatial features of Encoder. This inspires us to design a Spatial Masking (SM) module to mask partial encoding features during pretraining, which increases the visibility of the remaining feature pixels to Decoder. To this end, we present a Pair-wise Layer Attention with Spatial Masking (PLA-SM) framework for video prediction to capture the spatiotemporal dynamics, which reflect the motion trend. Extensive experiments and rigorous ablation studies on five benchmarks demonstrate the advantages of the proposed approach. The code is available at GitHub.
Yunqiao Yang, Long-Kai Huang, Ying Wei
A multitude of prevalent pre-trained models mark a major milestone in the development of artificial intelligence, while fine-tuning has been a common practice that enables pre-trained models to figure prominently in a wide array of target datasets. Our empirical results reveal that off-the-shelf fine-tuning techniques are far from adequate to mitigate negative transfer caused by two types of underperforming features in a pre-trained model: rare features and spuriously correlated features. Rooted in structural causal models of predictions after fine-tuning, we propose a Concept-wise fine-tuning (Concept-Tuning) approach which refines feature representations at the level of patches, with each patch encoding a concept. Concept-Tuning minimizes the negative impacts of rare features and spuriously correlated features by (1) maximizing the mutual information between examples in the same category with regard to a slice of rare features (a patch) and (2) applying front-door adjustment via attention neural networks in channels and feature slices (patches). The proposed Concept-Tuning consistently and significantly (by up to 4.76%) improves on prior state-of-the-art fine-tuning methods on eleven datasets, diverse pre-training strategies (supervised and self-supervised ones), various network architectures, and sample sizes in a target dataset.
Giuseppe Guarino, Matteo Ciotola, Gemine Vivone, Giuseppe Scarpa
Hyperspectral pansharpening has received growing interest over the last few years, as testified by a large number of research papers and challenges. It consists of a pixel-level fusion between a lower-resolution hyperspectral datacube and a higher-resolution single-band image, the panchromatic image, with the goal of providing a hyperspectral datacube at panchromatic resolution. Thanks to their powerful representational capabilities, deep learning models have succeeded in providing unprecedented results on many general-purpose image processing tasks. However, when moving to domain-specific problems, as in this case, the advantages with respect to traditional model-based approaches are much less clear-cut due to several contextual reasons. Scarcity of training data, lack of ground truth, and data shape variability are some of the factors that limit the generalization capacity of state-of-the-art deep learning networks for hyperspectral pansharpening. To cope with these limitations, in this work we propose a new deep learning method which inherits a simple single-band unsupervised pansharpening model nested in a sequential band-wise adaptive scheme, where each band is pansharpened by refining the model tuned on the preceding one. By doing so, a simple model is propagated along the wavelength dimension, adaptively and flexibly, with no need for a fixed number of spectral bands and no need for large, expensive, labeled training datasets. The proposed method achieves very good results on our datasets, outperforming both traditional and deep learning reference methods. The implementation of the proposed method can be found at https://github.com/giu-guarino/R-PNN
Junyoung Park, Jin Kim, Hyeongjun Kwon, Ilhoon Yoon, Kwanghoon Sohn
Given the inevitability of domain shifts during inference in real-world
applications, test-time adaptation (TTA) is essential for model adaptation
after deployment. However, the real-world scenario of continuously changing
target distributions presents challenges including catastrophic forgetting and
error accumulation. Existing TTA methods for non-stationary domain shifts,
while effective, incur excessive computational load, making them impractical
for on-device settings. In this paper, we introduce a layer-wise auto-weighting
algorithm for continual and gradual TTA that autonomously identifies layers for
preservation or concentrated adaptation. By leveraging the Fisher Information
Matrix (FIM), we first design the learning weight to selectively focus on
layers associated with log-likelihood changes while preserving unrelated ones.
Then, we further propose an exponential min-max scaler to make certain layers
nearly frozen while mitigating outliers. This minimizes forgetting and error
accumulation, leading to efficient adaptation to non-stationary target
distribution. Experiments on CIFAR-10C, CIFAR-100C, and ImageNet-C show our
method outperforms conventional continual and gradual TTA approaches while
significantly reducing computational load, highlighting the importance of
FIM-based learning weight in adapting to continuously or gradually shifting
target domains.
Authors' comments: WACV 2024
Zhewen Yu, Christos-Savvas Bouganis
With the great success of Deep Neural Networks (DNN), the design of efficient
hardware accelerators has triggered wide interest in the research community.
Existing research explores two architectural strategies: sequential layer
execution and layer-wise pipelining. While the former supports a wider range of
models, the latter is favoured for its enhanced customization and efficiency. A
challenge for the layer-wise pipelining architecture is its substantial demand
for the on-chip memory for weights storage, impeding the deployment of
large-scale networks on resource-constrained devices. This paper introduces
AutoWS, a pioneering memory management methodology that exploits both on-chip
and off-chip memory to optimize weight storage within a layer-wise pipelining
architecture, taking advantage of its static schedule. Through a comprehensive
investigation on both the hardware design and the Design Space Exploration, our
methodology is fully automated and enables the deployment of large-scale DNN
models on resource-constrained devices, which was not possible in existing
works that target layer-wise pipelining architectures. AutoWS is open-source:
https://github.com/Yu-Zhewen/AutoWS
Authors' comments: accepted by DATE2024
Siran Dai, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, Qingming Huang
The Area Under the ROC Curve (AUC) is a widely employed metric in long-tailed classification scenarios. Nevertheless, most existing methods primarily assume that training and testing examples are drawn i.i.d. from the same distribution, which is often unachievable in practice. Distributionally Robust Optimization (DRO) enhances model performance by optimizing it for the local worst-case scenario, but directly integrating AUC optimization with DRO results in an intractable optimization problem. To tackle this challenge, methodologically, we propose an instance-wise surrogate loss of Distributionally Robust AUC (DRAUC) and build our optimization framework on top of it. Moreover, we highlight that conventional DRAUC may induce label bias, hence introducing distribution-aware DRAUC as a more suitable metric for robust AUC learning. Theoretically, we affirm that the generalization gap between the training loss and testing error diminishes if the training set is sufficiently large. Empirically, experiments on corrupted benchmark datasets demonstrate the effectiveness of our proposed method. Code is available at: https://github.com/EldercatSAM/DRAUC.
Simon Sinong Zhan, Yixuan Wang, Qingyuan Wu, Ruochen Jiao, Chao Huang, Qi Zhu
Reinforcement Learning (RL) in the context of safe exploration has long
grappled with the challenges of the delicate balance between maximizing rewards
and minimizing safety violations, the complexities arising from contact-rich or
non-smooth environments, and high-dimensional pixel observations. Furthermore,
incorporating state-wise safety constraints in the exploration and learning
process, where the agent is prohibited from accessing unsafe regions without
prior knowledge, adds an additional layer of complexity. In this paper, we
propose a novel pixel-observation safe RL algorithm that efficiently encodes
state-wise safety constraints with unknown hazard regions through the
introduction of a latent barrier function learning mechanism. As a joint
learning framework, our approach first involves constructing a latent dynamics
model with low-dimensional latent spaces derived from pixel observations.
Subsequently, we build and learn a latent barrier function on top of the latent
dynamics and conduct policy optimization simultaneously, thereby improving both
safety and the total expected return. Experimental evaluations on the
safety-gym benchmark suite demonstrate that our proposed method significantly
reduces safety violations throughout the training process and demonstrates
faster safety convergence compared to existing methods while achieving
competitive results in reward return.
Authors' comments: 10 pages, 6 figures
Zhimin Li, Shusen Liu, Kailkhura Bhavya, Timo Bremer, Valerio Pascucci
Neural networks have achieved remarkable success in many scientific fields. However, the interpretability of neural network models remains a major bottleneck to deploying such techniques in daily life. The challenge stems from the non-linear behavior of neural networks, which raises the critical question of how a model uses input features to make a decision. The classical approach to this challenge is feature attribution, which assigns an importance score to each input feature and reveals its importance to the current prediction. However, current feature attribution approaches often indicate the importance of each input feature without detailing how the features are actually processed by the model internally. These approaches therefore raise the concern of whether they highlight the correct features for a model prediction. In a neural network model, non-linear behavior is often caused by the model's non-linear activation units. However, the computation behind a single prediction is locally linear, because one prediction has only one activation pattern. Based on this observation, we propose an instance-wise linearization approach that reformulates the forward computation of a neural network prediction. This approach reformulates the different layers of a convolutional neural network as linear matrix multiplications. Aggregating the computation of all layers, the complex operations of a convolutional neural network prediction can be described as a linear matrix multiplication $F(x) = W \cdot x + b$. This equation not only provides a feature attribution map that highlights the importance of the input features but also tells exactly how each input feature contributes to a prediction. Furthermore, we discuss the application of this technique to both supervised classification and unsupervised neural network learning (parametric t-SNE dimension reduction).
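The local-linearity observation can be illustrated on a toy network. The sketch below uses a two-layer fully connected ReLU network rather than a convolutional one, and all names (`forward`, `linearize`, the weights `W1`, `b1`, `W2`, `b2`) are hypothetical; it is an assumption-laden illustration of collapsing one prediction into $F(x) = W \cdot x + b$, not the authors' implementation.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def forward(x: Vector, W1: Matrix, b1: Vector, W2: Matrix, b2: Vector) -> Vector:
    """Ordinary forward pass: linear layer, ReLU, linear layer."""
    h = [max(0.0, z + b) for z, b in zip(matvec(W1, x), b1)]
    return [y + b for y, b in zip(matvec(W2, h), b2)]


def linearize(x: Vector, W1: Matrix, b1: Vector, W2: Matrix, b2: Vector) -> Tuple[Matrix, Vector]:
    """Return (W, b) with forward(x) == W x + b for the activation pattern of x."""
    pre = [z + b for z, b in zip(matvec(W1, x), b1)]
    mask = [1.0 if p > 0 else 0.0 for p in pre]  # activation pattern of this input
    # For a fixed pattern, ReLU acts as a diagonal 0/1 matrix D: h = D (W1 x + b1).
    DW1 = [[m * w for w in row] for m, row in zip(mask, W1)]
    Db1 = [m * b for m, b in zip(mask, b1)]
    W = matmul(W2, DW1)                                   # W = W2 D W1
    b = [u + v for u, v in zip(matvec(W2, Db1), b2)]      # b = W2 D b1 + b2
    return W, b


W1 = [[1.0, -1.0], [0.5, 2.0], [-1.0, 0.0]]
b1 = [0.0, -1.0, 0.5]
W2 = [[1.0, 2.0, -1.0]]
b2 = [0.25]
x = [2.0, 1.0]

W, b = linearize(x, W1, b1, W2, b2)
prediction = forward(x, W1, b1, W2, b2)
# Per-feature contribution to the single output: W[0][i] * x[i].
contributions = [w * v for w, v in zip(W[0], x)]
```

In this toy setting the contributions plus the bias `b[0]` sum to the network output, which is the sense in which the linearized form gives an exact attribution for that one input. A different input with a different activation pattern would yield a different `W` and `b`.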
Zijie Pan, Jiachen Lu, Xiatian Zhu, Li Zhang
High-resolution 3D object generation remains a challenging task primarily due
to the limited availability of comprehensive annotated training data. Recent
advancements have aimed to overcome this constraint by harnessing image
generative models, pretrained on extensive curated web datasets, using
knowledge transfer techniques like Score Distillation Sampling (SDS).
Efficiently addressing the requirements of high-resolution rendering often
necessitates the adoption of latent representation-based models, such as the
Latent Diffusion Model (LDM). In this framework, a significant challenge
arises: To compute gradients for individual image pixels, it is necessary to
backpropagate gradients from the designated latent space through the frozen
components of the image model, such as the VAE encoder used within LDM.
However, this gradient propagation pathway has never been optimized, remaining
uncontrolled during training. We find that the unregulated gradients adversely
affect the 3D model's capacity in acquiring texture-related information from
the image generative model, leading to poor quality appearance synthesis. To
address this overarching challenge, we propose an innovative operation termed
Pixel-wise Gradient Clipping (PGC) designed for seamless integration into
existing 3D generative models, thereby enhancing their synthesis quality.
Specifically, we control the magnitude of stochastic gradients by clipping the
pixel-wise gradients efficiently, while preserving crucial texture-related
gradient directions. Despite this simplicity and minimal extra cost, extensive
experiments demonstrate the efficacy of our PGC in enhancing the performance of
existing 3D generative models for high-resolution object rendering.
Authors' comments: Accepted at ICLR 2024. Project page:
https://fudan-zvg.github.io/PGC-3D
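As a rough illustration of clipping gradients per pixel while keeping their direction, the sketch below rescales each pixel's gradient vector (taken across channels) so that its Euclidean norm does not exceed a threshold. The function name `clip_pixel_gradients`, the threshold `max_norm`, and the choice of norm-based rescaling are assumptions made for this example; the abstract does not give the exact PGC formulation.

```python
import math
from typing import List

Pixel = List[float]        # gradient of one pixel across channels, e.g. [dR, dG, dB]
Image = List[List[Pixel]]  # H x W x C nested lists


def clip_pixel_gradients(grad: Image, max_norm: float) -> Image:
    """Rescale each pixel's gradient so its L2 norm is at most `max_norm`.

    Rescaling (rather than clamping each channel independently) keeps the
    direction of the per-pixel gradient unchanged.
    """
    clipped = []
    for row in grad:
        new_row = []
        for g in row:
            norm = math.sqrt(sum(c * c for c in g))
            scale = 1.0 if norm <= max_norm else max_norm / norm
            new_row.append([c * scale for c in g])
        clipped.append(new_row)
    return clipped


# One row with two pixels: the first has a large gradient, the second a small one.
grad = [[[3.0, 4.0, 0.0], [0.1, 0.0, 0.0]]]
clipped = clip_pixel_gradients(grad, max_norm=1.0)
```

Only the first pixel is rescaled here; the second already lies within the threshold and passes through unchanged.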
Siqi Kou, Lei Gan, Dequan Wang, Chongxuan Li, Zhijie Deng
Diffusion models have impressive image generation capability, but low-quality generations still exist, and their identification remains challenging due to the lack of a proper sample-wise metric. To address this, we propose BayesDiff, a pixel-wise uncertainty estimator for generations from diffusion models based on Bayesian inference. In particular, we derive a novel uncertainty iteration principle to characterize the uncertainty dynamics in diffusion, and leverage the last-layer Laplace approximation for efficient Bayesian inference. The estimated pixel-wise uncertainty can not only be aggregated into a sample-wise metric to filter out low-fidelity images but also aid in augmenting successful generations and rectifying artifacts in failed generations in text-to-image tasks. Extensive experiments demonstrate the efficacy of BayesDiff and its promise for practical applications.
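The step from pixel-wise uncertainty to a sample-wise filter can be sketched as follows. This example assumes the uncertainty maps are already available as nested lists, aggregates them by a plain sum, and keeps a fixed number of samples; the names `sample_score` and `keep_most_certain`, the sum aggregation, and the fixed `keep` count are illustrative assumptions rather than details stated in the abstract.

```python
from typing import List

UncertaintyMap = List[List[float]]  # H x W pixel-wise uncertainty estimates


def sample_score(umap: UncertaintyMap) -> float:
    """Aggregate a pixel-wise map into one scalar (here: a plain sum)."""
    return sum(sum(row) for row in umap)


def keep_most_certain(umaps: List[UncertaintyMap], keep: int) -> List[int]:
    """Indices of the `keep` samples with the lowest aggregated uncertainty."""
    order = sorted(range(len(umaps)), key=lambda i: sample_score(umaps[i]))
    return sorted(order[:keep])


# Three toy 2x2 uncertainty maps; the second is clearly the least certain.
umaps = [
    [[0.1, 0.2], [0.1, 0.1]],
    [[0.9, 0.8], [0.7, 0.9]],
    [[0.2, 0.2], [0.3, 0.1]],
]
kept = keep_most_certain(umaps, keep=2)
```

With these toy maps the second sample has the largest aggregated score and is the one filtered out.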
Catherine Xinrui Yu, Jiaqi Gu, Zhaomeng Chen, Zihuai He
Testing multiple hypotheses of conditional independence with provable error
rate control is a fundamental problem with various applications. To infer
conditional independence with family-wise error rate (FWER) control when only
summary statistics of marginal dependence are accessible, we adopt
GhostKnockoff to directly generate knockoff copies of summary statistics and
propose a new filter to select features conditionally dependent on the response
with provable FWER control. In addition, we develop a computationally efficient
algorithm to greatly reduce the computational cost of knockoff copies
generation without sacrificing power and FWER control. Experiments on simulated
data and a real dataset of Alzheimer's disease genetics demonstrate the
advantage of the proposed method over existing alternatives in both statistical
power and computational efficiency.
Authors' comments: 35 pages