Stefan Stojanovic, Yassir Jedra, Alexandre Proutiere
We study matrix estimation problems arising in reinforcement learning (RL)
with low-rank structure. In low-rank bandits, the matrix to be recovered
specifies the expected arm rewards, and for low-rank Markov Decision Processes
(MDPs), it may for example characterize the transition kernel of the MDP. In
both cases, each entry of the matrix carries important information, and we seek
estimation methods with low entry-wise error. Importantly, these methods
further need to accommodate inherent correlations in the available data
(e.g. for MDPs, the data consists of system trajectories). We investigate the
performance of simple spectral-based matrix estimation approaches: we show that
they efficiently recover the singular subspaces of the matrix and exhibit
nearly-minimal entry-wise error. These new results on low-rank matrix
estimation make it possible to devise reinforcement learning algorithms that
fully exploit the underlying low-rank structure. We provide two examples of
such algorithms: a regret minimization algorithm for low-rank bandit problems,
and a best policy identification algorithm for reward-free RL in low-rank MDPs.
Both algorithms yield state-of-the-art performance guarantees.
Authors' comments: To appear in NeurIPS 2023
Guanqi Chen, Guanbin Li
Cardiac function assessment aims at predicting left ventricular ejection
fraction (LVEF) given an echocardiogram video, which requires models to focus
on the changes in the left ventricle during the cardiac cycle. How to assess
cardiac function accurately and automatically from an echocardiogram video is a
valuable topic in intelligent assisted healthcare. Existing video-based methods
pay little attention to the left ventricular region or to the left
ventricular changes caused by motion. In this work, we propose a
semi-supervised auxiliary learning paradigm with a left ventricular
segmentation task, which contributes to the representation learning for the
left ventricular region. To better model the importance of motion information,
we introduce a temporal channel-wise attention (TCA) module to excite those
channels used to describe motion. Furthermore, we reform the TCA module with
semantic perception by taking the segmentation map of the left ventricle as
input to focus on the motion patterns of the left ventricle. Finally, to reduce
the difficulty of direct LVEF regression, we utilize an anchor-based
classification and regression method to predict LVEF. Our approach achieves
state-of-the-art performance on the Stanford dataset with an improvement of
0.22 MAE, 0.26 RMSE, and 1.9% $R^2$.
Authors' comments: Accepted by ISBI 2022 (oral)
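The anchor-based classification and regression step mentioned in the abstract above can be illustrated roughly as follows. This is a minimal sketch, not the authors' implementation: the evenly spaced anchors, the decoding rule (take the top-scoring anchor and add its predicted offset), and the names `make_anchors` and `decode_lvef` are all assumptions made for illustration, and the scores and offsets are made-up numbers standing in for model outputs.

```python
def make_anchors(low, high, count):
    """Evenly spaced anchor values covering [low, high] (an assumed layout)."""
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]


def decode_lvef(anchors, scores, offsets):
    """Pick the highest-scoring anchor and refine it with that anchor's offset."""
    best = max(range(len(anchors)), key=lambda i: scores[i])
    return anchors[best] + offsets[best]


# Hypothetical model outputs for one video: one score and one offset per anchor.
anchors = make_anchors(10.0, 90.0, 9)  # 10, 20, ..., 90 (percent ejection fraction)
scores = [0.0, 0.1, 0.2, 0.1, 2.5, 0.3, 0.0, 0.0, 0.0]
offsets = [0.0, 0.0, 0.0, 0.0, 3.5, 0.0, 0.0, 0.0, 0.0]
prediction = decode_lvef(anchors, scores, offsets)
```

Splitting the prediction this way means the regression head only has to learn a small residual around a coarse bin, which is one plausible reading of "reducing the difficulty of direct LVEF regression".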
Alexandre Mascarenhas, Yuri Lavinas, Claus Aranha
Dynamic Optimization Problems (DOPs) are characterized by changes in the fitness landscape that can occur at any time and are common in real-world applications. The main issues to be considered include detecting the change in the fitness landscape and reacting accordingly. Over the years, several evolutionary algorithms have been proposed to take this characteristic into account during the optimization process. However, the number of available tools or open source codebases for these approaches is limited, making reproducibility and extensive experimentation difficult. To address this, we developed a component-oriented framework for DOPs called Adjustable Components for Dynamic Problems (AbCD), inspired by similar works in the Multiobjective static domain. Using this framework, we investigate components that were proposed in several popular DOP algorithms. Our experiments show that the performance of these components depends on the problem and the selected components used in a configuration, which differs from the results reported in the literature. Using irace, we demonstrate how this framework can automatically generate DOP algorithm configurations that take into account the characteristics of the problem to be solved. Our results highlight existing problems in the DOP field that need to be addressed in the future development of algorithms and components.
Li Li, You Qin, Wei Ji, Yuxiao Zhou, Roger Zimmermann
Panoptic Scene Graph Generation (PSG) involves the detection of objects and
the prediction of their corresponding relationships (predicates). However, the
presence of biased predicate annotations poses a significant challenge for PSG
models, as it hinders their ability to establish a clear decision boundary
among different predicates. This issue substantially impedes the practical
utility and real-world applicability of PSG models. To address the intrinsic
bias above, we propose a novel framework to infer potentially biased
annotations by measuring the predicate prediction risks within each
subject-object pair (domain), and adaptively transfer the biased annotations to
consistent ones by learning invariant predicate representation embeddings.
Experiments show that our method significantly improves the performance of
benchmark models, achieving a new state-of-the-art performance, and shows great
generalization and effectiveness on the PSG dataset.
Authors' comments: arXiv admin note: text overlap with arXiv:2307.15567
Yuyuan Li, Chaochao Chen, Xiaolin Zheng, Yizhao Zhang, Zhongxuan Han, Dan Meng, Jun Wang
With the growing privacy concerns in recommender systems, recommendation unlearning, i.e., forgetting the impact of specific learned targets, is getting increasing attention. Existing studies predominantly use training data, i.e., model inputs, as the unlearning target. However, we find that attackers can extract private information, e.g., gender, race, and age, from a trained model even if this information was never explicitly encountered during training. We call this unseen information an attribute and treat it as the unlearning target. To protect users' sensitive attributes, Attribute Unlearning (AU) aims to degrade attacking performance and make target attributes indistinguishable. In this paper, we focus on a strict but practical setting of AU, namely Post-Training Attribute Unlearning (PoT-AU), where unlearning can only be performed after the training of the recommendation model is completed. To address the PoT-AU problem in recommender systems, we design a two-component loss function that consists of i) distinguishability loss: making attribute labels indistinguishable to attackers, and ii) regularization loss: preventing drastic changes in the model that result in a negative impact on recommendation performance. Specifically, we investigate two types of distinguishability measurements, i.e., user-to-user and distribution-to-distribution. We use the stochastic gradient descent algorithm to optimize our proposed loss. Extensive experiments on three real-world datasets demonstrate the effectiveness of our proposed methods.
Chen Liang, Jiahui Yu, Ming-Hsuan Yang, Matthew Brown, Yin Cui, Tuo Zhao, Boqing Gong, Tianyi Zhou
Pre-trained multimodal foundation models have demonstrated remarkable generalizability but pose challenges for deployment due to their large sizes. One effective approach to reducing their sizes is layerwise distillation, wherein small student models are trained to match the hidden representations of large teacher models at each layer. Motivated by our observation that certain architecture components, referred to as modules, contribute more significantly to the student's performance than others, we propose to track the contributions of individual modules by recording the loss decrement after distilling each module, and to distill modules with greater contributions more frequently. Such an approach can be naturally formulated as a multi-armed bandit (MAB) problem, where modules and loss decrements are considered as arms and rewards, respectively. We then develop a modified-Thompson sampling algorithm named OPTIMA to address the nonstationarity of module contributions resulting from model updating. Specifically, we leverage the observed contributions in recent history to estimate the changing contribution of each module and select modules based on these estimations to maximize the cumulative contribution. We evaluate the effectiveness of OPTIMA through distillation experiments on various multimodal understanding and image captioning tasks, using the CoCa-Large model (Yu et al., 2022) as the teacher model.
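The idea of using only recent history to cope with nonstationary rewards can be sketched with a generic sliding-window Gaussian Thompson sampler. This is not OPTIMA itself, whose exact update rule the abstract does not give. The class name `WindowedThompson`, the window length, the `1/sqrt(n)` posterior width, and the toy reward function are all assumptions chosen for illustration; in the abstract's setting an arm would be a module and a reward would be a loss decrement.

```python
import random
from collections import deque


class WindowedThompson:
    """Gaussian Thompson sampling that only remembers the last `window` rewards per arm."""

    def __init__(self, n_arms, window=10, seed=0):
        self.history = [deque(maxlen=window) for _ in range(n_arms)]
        self.rng = random.Random(seed)

    def select(self):
        samples = []
        for arm, rewards in enumerate(self.history):
            if not rewards:
                return arm  # try every arm once before sampling
            mean = sum(rewards) / len(rewards)
            std = 1.0 / len(rewards) ** 0.5  # assumed posterior width
            samples.append(self.rng.gauss(mean, std))
        return max(range(len(samples)), key=lambda i: samples[i])

    def update(self, arm, reward):
        self.history[arm].append(reward)  # old rewards fall out of the window


def reward(arm, step):
    """Toy nonstationary rewards: arm 0 is best early, arm 1 is best late."""
    best = 0 if step < 200 else 1
    return 5.0 if arm == best else 0.0


bandit = WindowedThompson(n_arms=2, window=10, seed=0)
picks = []
for step in range(400):
    arm = bandit.select()
    bandit.update(arm, reward(arm, step))
    picks.append(arm)
late_share = picks[300:].count(1) / 100  # fraction of late pulls on arm 1
```

Because stale rewards drop out of each arm's window, the sampler's estimate of the formerly best arm decays after the switch, and exploration of the other arm resumes.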
Xiang Liu, Liangxi Liu, Feiyang Ye, Yunheng Shen, Xia Li, Linshan Jiang, Jialin Li
Efficiently aggregating trained neural networks from local clients into a
global model on a server is a widely researched topic in federated learning.
Recently, motivated by the goals of reducing privacy concerns, mitigating potential
attacks, and lowering communication overhead, one-shot federated learning
(i.e., limiting client-server communication to a single round) has gained
popularity among researchers. However, one-shot aggregation performance
is sensitive to non-identical training data distributions, which
exhibit high statistical heterogeneity in some real-world scenarios. To
address this issue, we propose a novel one-shot aggregation method with
layer-wise posterior aggregation, named FedLPA. FedLPA aggregates local models
to obtain a more accurate global model without requiring extra auxiliary
datasets or exposing any private label information, e.g., label distributions.
To effectively capture the statistics maintained in the biased local datasets
in the practical non-IID scenario, we efficiently infer the posteriors of each
layer in each local model using layer-wise Laplace approximation and aggregate
them to train the global parameters. Extensive experimental results demonstrate
that FedLPA significantly improves learning performance over state-of-the-art
methods across several metrics.
Authors' comments: 39 pages, 4 figures
Shengkun Tang, Yaqing Wang, Caiwen Ding, Yi Liang, Yao Li, Dongkuan Xu
Diffusion models achieve great success in generating diverse and high-fidelity images, yet their widespread application, especially in real-time scenarios, is hampered by their inherently slow generation speed. The slow generation stems from the necessity of multi-step network inference. While certain predictions benefit from the full computation of the model in each sampling iteration, not every iteration requires the same amount of computation, potentially leading to inefficient computation. Unlike typical adaptive computation challenges that deal with single-step generation problems, diffusion processes with a multi-step generation need to dynamically adjust their computational resource allocation based on the ongoing assessment of each step's importance to the final image output, presenting a unique set of challenges. In this work, we propose AdaDiff, an adaptive framework that dynamically allocates computation resources in each sampling step to improve the generation efficiency of diffusion models. To assess the effects of changes in computational effort on image quality, we present a timestep-aware uncertainty estimation module (UEM). Integrated at each intermediate layer, the UEM evaluates the predictive uncertainty. This uncertainty measurement serves as an indicator for determining whether to terminate the inference process. Additionally, we introduce an uncertainty-aware layer-wise loss aimed at bridging the performance gap between full models and their adaptive counterparts.
Utkarsh Singhal, Carlos Esteves, Ameesh Makadia, Stella X. Yu
Computer vision research has long aimed to build systems that are robust to
spatial transformations found in natural data. Traditionally, this is done
using data augmentation or hard-coding invariances into the architecture.
However, too much or too little invariance can hurt, and the correct amount is
unknown a priori and dependent on the instance. Ideally, the appropriate
invariance would be learned from data and inferred at test-time.
We treat invariance as a prediction problem. Given any image, we use a
normalizing flow to predict a distribution over transformations and average the
predictions over them. Since this distribution only depends on the instance, we
can align instances before classifying them and generalize invariance across
classes. The same distribution can also be used to adapt to out-of-distribution
poses. This normalizing flow is trained end-to-end and can learn a much larger
range of transformations than Augerino and InstaAug. When used as data
augmentation, our method shows accuracy and robustness gains on CIFAR-10,
CIFAR-10-LT, and TinyImageNet.
Authors' comments: Accepted to ICCV 2023
Ankit Pratap Singh, Namrata Vaswani
This work considers two related learning problems in a federated, attack-prone
setting: federated principal components analysis (PCA) and federated low rank
column-wise sensing (LRCS). The node attacks are assumed to be Byzantine, which
means that the attackers are omniscient and can collude. We introduce a novel,
provably Byzantine-resilient, communication-efficient, and sample-efficient
algorithm, called Subspace-Median, that solves the PCA problem and is a key
part of the solution for the LRCS problem. We also study the most natural
Byzantine-resilient solution for federated PCA, a geometric median based
modification of the federated power method, and explain why it is not useful.
Our second main contribution is a complete alternating gradient descent (GD)
and minimization (altGDmin) algorithm for Byzantine-resilient horizontally
federated LRCS, together with sample and communication complexity guarantees for it.
Extensive simulation experiments are used to corroborate our theoretical
guarantees. The ideas that we develop for LRCS are easily extendable to other
LR recovery problems as well.
Authors' comments: 36 pages
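The geometric median that the abstract above refers to can be approximated with Weiszfeld's iteration. The sketch below shows only that aggregation primitive on toy vectors. It is not the Subspace-Median algorithm or the geometric-median-based power method from the paper, and the function name, iteration count, and sample data are assumptions made for illustration.

```python
import math


def geometric_median(points, iterations=200, eps=1e-12):
    """Approximate the point minimizing the sum of Euclidean distances (Weiszfeld)."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    estimate = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(iterations):
        # Each point is weighted by the inverse of its distance to the estimate.
        weights = [1.0 / max(math.dist(p, estimate), eps) for p in points]
        total = sum(weights)
        estimate = [
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ]
    return estimate


# Four honest nodes report similar vectors; one Byzantine node reports garbage.
honest = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.05)]
byzantine = [(1000.0, -1000.0)]
reports = honest + byzantine

median = geometric_median(reports)
mean = [sum(p[d] for p in reports) / len(reports) for d in range(2)]
```

A single arbitrary report drags the mean far from the honest cluster, while the geometric median stays close to it, which is why median-type aggregation is a natural first candidate in Byzantine-resilient settings.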
Kiana D. McFadden, Amy K. Mainzer, Joseph R. Masiero, James M. Bauer, Roc M. Cutri, Dar Dahlen, Frank J. Masci, Jana Pittichová et al.
Probing small main-belt asteroids provides insight into their formation and
evolution through multiple dynamical and collisional processes. These asteroids
also overlap in size with the potentially hazardous near-Earth object
population and supply the majority of these objects. The Lucy mission will
provide an opportunity for study of a small main-belt asteroid, (152830)
Dinkinesh. The spacecraft will perform a flyby of this object on November 1,
2023, in preparation for its mission to the Jupiter Trojan asteroids. We
employed aperture photometry on stacked frames of Dinkinesh obtained by the
Wide-field Infrared Survey Explorer and performed thermal modeling on a
detection at 12 $\mu$m to compute diameter and albedo values. Through this
method, we determined Dinkinesh has an effective spherical diameter of
$0.76^{+0.11}_{-0.21}$ km and a visual geometric albedo of
$0.27^{+0.25}_{-0.06}$ at the 16th and 84th percentiles. This albedo is
consistent with typical stony (S-type) asteroids.
Authors' comments: Submitted to Astrophysical Journal Letters
Hanjiang Hu, Zuxin Liu, Linyi Li, Jiacheng Zhu, Ding Zhao
In recent years, computer vision has made remarkable advancements in
autonomous driving and robotics. However, it has been observed that deep
learning-based visual perception models lack robustness when faced with camera
motion perturbations. The current certification process for assessing
robustness is costly and time-consuming due to the extensive number of image
projections required for Monte Carlo sampling in the 3D camera motion space. To
address these challenges, we present a novel, efficient, and practical
framework for certifying the robustness of 3D-2D projective transformations
against camera motion perturbations. Our approach leverages a smoothing
distribution over the 2D pixel space instead of in the 3D physical space,
eliminating the need for costly camera motion sampling and significantly
enhancing the efficiency of robustness certifications. With the pixel-wise
smoothed classifier, we are able to fully upper bound the projection errors
using a technique of uniform partitioning in camera motion space. Additionally,
we extend our certification framework to a more general scenario where only a
single-frame point cloud is required in the projection oracle. This is achieved
by deriving Lipschitz-based approximated partition intervals. Through extensive
experimentation, we validate the trade-off between effectiveness and efficiency
enabled by our proposed method. Remarkably, our approach achieves approximately
80% certified accuracy while utilizing only 30% of the projected image frames.
Authors' comments: 32 pages, 5 figures, 13 tables
Shiheng Zhang, Jiahao Zhang, Jie Shen, Guang Lin
We present a novel optimization algorithm, element-wise relaxed scalar
auxiliary variable (E-RSAV), that satisfies an unconditional energy dissipation
law and exhibits improved alignment between the modified and the original
energy. Our algorithm features rigorous proofs of linear convergence in the
convex setting. Furthermore, we present a simple accelerated algorithm that
improves the linear convergence rate to super-linear in the univariate case. We
also propose an adaptive version of E-RSAV with Steffensen step size. We
validate the robustness and fast convergence of our algorithm through ample
numerical experiments.
Authors' comments: 25 pages, 7 figures
Oscar Pina, Verónica Vilaplana
End-to-end training of graph neural networks (GNNs) on large graphs presents several memory and computational challenges, and limits the application to shallow architectures as depth exponentially increases the memory and space complexities. In this manuscript, we propose Layer-wise Regularized Graph Infomax, an algorithm to train GNNs layer by layer in a self-supervised manner. We decouple the feature propagation and feature transformation carried out by GNNs to learn node representations in order to derive a loss function based on the prediction of future inputs. We evaluate the algorithm on large inductive graphs and show performance similar to other end-to-end methods with substantially increased efficiency, which enables the training of more sophisticated models on a single device. We also show that our algorithm avoids the oversmoothing of the representations, another common challenge of deep GNNs.
Caroline Brosse, Oscar Defrain, Kazuhiro Kurita, Vincent Limouzy, Takeaki Uno, Kunihiro Wasa
Enumeration problems are often encountered as key subroutines in the exact
computation of graph parameters such as chromatic number, treewidth, or
treedepth. In the case of treedepth computation, the enumeration of
inclusion-wise minimal separators plays a crucial role. However, and quite
surprisingly, the complexity status of this problem has not been settled since
it was posed as an open direction by Kloks and Kratsch in 1998. Recently,
at the PACE 2020 competition dedicated to treedepth computation, solvers
circumvented this by listing all minimal $a$-$b$ separators and filtering
out those that are not inclusion-wise minimal, at the cost of efficiency.
Naturally, having an efficient algorithm for listing inclusion-wise minimal
separators would drastically improve such practical algorithms. In this note,
however, we show that no efficient algorithm is to be expected from an
output-sensitive perspective, namely, we prove that there is no
output-polynomial time algorithm for inclusion-wise minimal separators
enumeration unless P = NP.
Authors' comments: 11 pages, 3 figures
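The "list, then filter" workaround described in the abstract above can be made concrete on a tiny graph. The sketch below is a brute-force illustration only, exponential in the number of vertices, and is not one of the PACE 2020 solvers. It relies on the standard characterization that a vertex set S is a minimal a-b separator for some pair a, b exactly when G - S has at least two components whose neighborhoods cover all of S. The example graph and all function names are assumptions made for illustration.

```python
from itertools import combinations


def components(adj, removed):
    """Connected components of the graph after deleting the vertices in `removed`."""
    seen, comps = set(removed), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.add(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def is_minimal_separator(adj, S):
    """True if G - S has at least two components adjacent to every vertex of S."""
    full = 0
    for comp in components(adj, S):
        neighbours = set()
        for u in comp:
            neighbours |= adj[u]
        if S <= neighbours:
            full += 1
    return full >= 2


def minimal_separators(adj):
    """All minimal a-b separators over all pairs a, b (brute force over subsets)."""
    verts = sorted(adj)
    return [
        frozenset(S)
        for k in range(1, len(verts) - 1)
        for S in combinations(verts, k)
        if is_minimal_separator(adj, frozenset(S))
    ]


def inclusion_minimal(separators):
    """Keep only separators that contain no other separator as a proper subset."""
    return [S for S in separators if not any(T < S for T in separators)]


# Toy graph: a and b are both joined to x and y, and c hangs off x.
adj = {
    "a": {"x", "y"},
    "b": {"x", "y"},
    "c": {"x"},
    "x": {"a", "b", "c"},
    "y": {"a", "b"},
}
all_seps = minimal_separators(adj)
kept = inclusion_minimal(all_seps)
```

Here `{x, y}` is a minimal a-b separator but contains the minimal separator `{x}`, so the filter discards it. The filtering step is what costs efficiency when many listed separators are not inclusion-wise minimal.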
Hasan Saribas, Cagri Yesil, Serdarcan Dilbaz, Halit Orenbas
With the increasing complexity and scale of click-through rate (CTR) prediction tasks in online advertising and recommendation systems, accurately estimating the importance of features has become a critical aspect of developing effective models. In this paper, we propose an attention-based approach that leverages max and mean pooling operations, along with a bit-wise attention mechanism, to enhance feature importance estimation in CTR prediction. Traditionally, pooling operations such as max and mean pooling have been widely used to extract relevant information from features. However, these operations can lead to information loss and hinder the accurate determination of feature importance. To address this challenge, we propose a novel attention architecture that utilizes a bit-based attention structure that emphasizes the relationships between all bits in features, together with maximum and mean pooling. By considering the fine-grained interactions at the bit level, our method aims to capture intricate patterns and dependencies that might be overlooked by traditional pooling operations. To examine the effectiveness of the proposed method, experiments have been conducted on three public datasets. The experiments demonstrated that the proposed method significantly improves the performance of the base models to achieve state-of-the-art results.
Shoukang Hu, Fangzhou Hong, Tao Hu, Liang Pan, Haiyi Mei, Weiye Xiao, Lei Yang, Ziwei Liu
3D human generation from 2D images has achieved remarkable progress through
the synergistic utilization of neural rendering and generative models. Existing
3D human generative models mainly generate a clothed 3D human as an
undetectable 3D model in a single pass, while rarely considering the layer-wise
nature of a clothed human body, which often consists of the human body and
various clothes such as underwear, outerwear, trousers, shoes, etc. In this
work, we propose HumanLiff, the first layer-wise 3D human generative model with
a unified diffusion process. Specifically, HumanLiff first generates
minimal-clothed humans, represented by tri-plane features, in a canonical
space, and then progressively generates clothes in a layer-wise manner. In this
way, 3D human generation is formulated as a sequence of
diffusion-based 3D conditional generation. To reconstruct more fine-grained 3D
humans with tri-plane representation, we propose a tri-plane shift operation
that splits each tri-plane into three sub-planes and shifts these sub-planes to
enable feature grid subdivision. To further enhance the controllability of 3D
generation with 3D layered conditions, HumanLiff hierarchically fuses tri-plane
features and 3D layered conditions to facilitate the 3D diffusion model
learning. Extensive experiments on two layer-wise 3D human datasets, SynBody
(synthetic) and TightCap (real-world), validate that HumanLiff significantly
outperforms state-of-the-art methods in layer-wise 3D human generation. Our
code will be available at https://skhu101.github.io/HumanLiff.
Authors' comments: Project page: https://skhu101.github.io/HumanLiff/
Jian-Zhou Zhu
Component-wise dimensionally reduced flows (CWDRFs) are characterized by the uniform (over space and time) vanishing of some component(s) in the velocity gradient tensor, and they may be present in various situations with different conditions. A more universal method for specifying and computing barotropic CWDRFs associated with the Navier-Stokes equation is designed for situations besides that in a (cyclic) box. The method is \textit{local} in the sense that global relations involving volume integration are not used, and the enthalpy gradient is used as the primitive variable and computed directly. Such a local method is more useful for, say, testing the physical relevance of CWDRFs, including the real Schur flows proposed recently, or finding their practically meaningful realizations. The local and global methods are shown to be equivalent for CWDRFs in (cyclic) boxes.
Hitoshi Kiya, Ryota Iijima, Teru Nagamori
This article presents block-wise image encryption for the vision transformer
and its applications. Perceptual image encryption for deep learning enables us
not only to protect the visual information of plain images but also to embed
unique features controlled with a key into images and models. However, when
using conventional perceptual encryption methods, the performance of models is
degraded due to the influence of encryption. In this paper, we focus on
block-wise encryption for the vision transformer, and we introduce three
applications: privacy-preserving image classification, access control, and the
combined use of federated learning and encrypted images. Our scheme can have
the same performance as models without any encryption, and it does not require
any network modification. It also allows us to easily update the secret key. In
experiments, the effectiveness of the scheme is demonstrated in terms of
performance degradation and access control on the CIFAR-10 and CIFAR-100
datasets.
Authors' comments: 7 figures, 3 tables. arXiv admin note: substantial text overlap with
arXiv:2207.05366
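One simple form of block-wise encryption is key-controlled pixel shuffling applied inside each block. The abstract above does not say which transformations the scheme uses, so the sketch below is an assumed example of the general idea rather than the authors' method. The function names are hypothetical, the image is a plain list of rows, and the image dimensions are assumed to be multiples of the block size.

```python
import random


def block_permutation(block_size, key):
    """Derive a fixed pixel order for one block from the secret key."""
    order = list(range(block_size * block_size))
    random.Random(key).shuffle(order)
    return order


def shuffle_blocks(image, block_size, key, inverse=False):
    """Apply the same key-derived pixel permutation inside every block."""
    order = block_permutation(block_size, key)
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            for dst, src in enumerate(order):
                if inverse:
                    dst, src = src, dst  # undo the permutation
                out[top + dst // block_size][left + dst % block_size] = (
                    image[top + src // block_size][left + src % block_size]
                )
    return out


# 4x4 toy "image" with distinct pixel values, encrypted in 2x2 blocks.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
encrypted = shuffle_blocks(image, block_size=2, key=42)
decrypted = shuffle_blocks(encrypted, block_size=2, key=42, inverse=True)
```

If the block size matches the patch size of a vision transformer, every patch is permuted in the same way, which is one reason block-wise schemes pair naturally with that architecture.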
Hui Kang, Sheng Liu, Huaxi Huang, Tongliang Liu
In real-world datasets, noisy labels are pervasive. The challenge of learning with noisy labels (LNL) is to train a classifier that discerns the actual classes from given instances. For this, the model must identify features indicative of the authentic labels. While research indicates that genuine label information is embedded in the learned features of even inaccurately labeled data, it's often intertwined with noise, complicating its direct application. Addressing this, we introduce channel-wise contrastive learning (CWCL). This method distinguishes authentic label information from noise by undertaking contrastive learning across diverse channels. Unlike conventional instance-wise contrastive learning (IWCL), CWCL tends to yield more nuanced and resilient features aligned with the authentic labels. Our strategy is twofold: firstly, using CWCL to extract pertinent features to identify cleanly labeled samples, and secondly, progressively fine-tuning using these samples. Evaluations on several benchmark datasets validate our method's superiority over existing approaches.