Daniel Siegismund, Mario Wieser, Stephan Heyse, Stephan Steigele
Deep Neural Networks (DNNs) have shown remarkable success in various computer vision tasks. However, their black-box nature makes their decisions difficult to interpret, creating an unmet need for methods that explain those decisions and ultimately forming a barrier to their wide acceptance, especially in biomedical applications. This work introduces a novel method, Pixel-wise Channel Isolation Mixing (PCIM), to calculate pixel attribution maps that highlight the image parts most crucial for a classification decision, without the need to extract internal network states or gradients. Unlike existing methods, PCIM treats each pixel as a distinct input channel and trains a blending layer to mix these pixels, reflecting specific classifications. This approach generates a pixel attribution map for each image while remaining agnostic to the choice of the underlying classification network. Benchmark testing on three application-relevant, diverse High Content Imaging datasets shows state-of-the-art performance, particularly for model fidelity and localization ability in both fluorescence and bright-field High Content Imaging. PCIM is a unique and effective method for creating pixel-level attribution maps from arbitrary DNNs, enabling interpretability and trust.
Gustavo P. C. P. da Luz, Gabriel Massuyoshi Sato, Luis Fernando Gomez Gonzalez, Juliana Freitag Borin
The increasing urbanization and the growing number of vehicles in cities have
underscored the need for efficient parking management systems. Traditional
smart parking solutions often rely on sensors or cameras for occupancy
detection, each with its limitations. Recent advancements in deep learning have
introduced new YOLO models (YOLOv8, YOLOv9, YOLOv10, and YOLOv11), but these
models have not been extensively evaluated in the context of smart parking
systems, particularly when combined with Region of Interest (ROI) selection for
object detection. Existing methods still rely on fixed polygonal ROI selections
or simple pixel-based modifications, which limit flexibility and precision.
This work introduces a novel approach that integrates Internet of Things, Edge
Computing, and Deep Learning concepts, by using the latest YOLO models for
vehicle detection. Exploring both edge and cloud computing, we found that
inference times on edge devices ranged from 1 to 92 seconds, depending on
the hardware and model version. Additionally, a new pixel-wise post-processing
ROI selection method is proposed for accurately identifying regions of interest
to count vehicles in parking lot images. The proposed system achieved 99.68%
balanced accuracy on a custom dataset of 3,484 images, offering a
cost-effective smart parking solution that ensures precise vehicle detection
while preserving data privacy.
Authors' comments: Submitted to Elsevier Internet of Things, 22 pages, 11 figures, 6
tables
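The abstract does not say how the pixel-wise ROI step decides whether a detection belongs to the monitored area. A minimal sketch, assuming each detection is an axis-aligned box and the ROI is a binary pixel mask the size of the image, counts a vehicle when the mask pixel under the box center is set. The function name and the center-point rule are illustrative assumptions, not the authors' method.

```python
import math
from typing import List, Sequence, Tuple

# (x_min, y_min, x_max, y_max) in pixel coordinates, as a detector would return.
Box = Tuple[float, float, float, float]


def count_vehicles_in_roi(boxes: Sequence[Box], roi_mask: List[List[int]]) -> int:
    """Count detections whose box center falls on a non-zero ROI pixel.

    Hypothetical post-processing rule: the mask is indexed as roi_mask[row][col],
    and centers that fall outside the image are ignored.
    """
    height = len(roi_mask)
    width = len(roi_mask[0]) if height else 0
    count = 0
    for x_min, y_min, x_max, y_max in boxes:
        col = math.floor((x_min + x_max) / 2)
        row = math.floor((y_min + y_max) / 2)
        if 0 <= row < height and 0 <= col < width and roi_mask[row][col]:
            count += 1
    return count


# 6x6 mask in which only the left half (columns 0-2) is a monitored parking area.
mask = [[1, 1, 1, 0, 0, 0] for _ in range(6)]
detections = [(0, 0, 2, 2), (3, 3, 5, 5), (1, 3, 2, 5)]
print(count_vehicles_in_roi(detections, mask))  # 2
```

A center-point test is cheap and tolerates boxes that straddle the ROI boundary; an overlap-fraction rule would be a natural alternative if partial overlaps matter.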
Kunyang Han, Yibo Hu, Mengxue Qu, Hailin Shi, Yao Zhao, Yunchao Wei
Advances in CLIP and large multimodal models (LMMs) have enabled open-vocabulary and free-text segmentation, yet existing models still require predefined category prompts, which limits free-form category self-generation. Most segmentation LMMs also remain confined to sparse predictions, restricting their applicability in open-set environments. In contrast, we propose ROSE, a Revolutionary Open-set dense SEgmentation LMM, which enables dense mask prediction and open-category generation through patch-wise perception. Our method treats each image patch as an independent region-of-interest candidate, enabling the model to predict both dense and sparse masks simultaneously. Additionally, a newly designed instruction-response paradigm takes full advantage of the generation and generalization capabilities of LMMs, achieving category prediction free of closed-set constraints or predefined categories. To further enhance mask detail and category precision, we introduce a conversation-based refinement paradigm that integrates the prediction result from the previous step with a textual prompt for revision. Extensive experiments demonstrate that ROSE achieves competitive performance across various segmentation tasks in a unified framework. Code will be released.
Zhiming Xu, Suorong Yang, Baile Xu, Jian Zhao, Furao Shen
Class-incremental learning (CIL) aims to acquire new classes while conserving
historical knowledge incrementally. Although existing pre-trained model
(PTM)-based methods perform excellently in CIL, it is better to fine-tune them
on downstream incremental tasks containing many patterns unknown to PTMs.
However, using task streams for fine-tuning can lead to catastrophic forgetting
that erases the knowledge in PTMs. This paper proposes the Dual Prototype
network for Task-wise Adaption (DPTA) of PTM-based CIL. For each incremental
learning task, a task-wise adapter module is built to fine-tune the PTM, where
the center-adapt loss forces the representation to be more centrally clustered
and class separable. The dual prototype network improves the prediction process
by enabling test-time adapter selection, where the raw prototypes deduce
several possible task indexes of test samples to select suitable adapter
modules for PTM, and the augmented prototypes that could separate highly
correlated classes are utilized to determine the final result. Experiments on
several benchmark datasets demonstrate the state-of-the-art performance of
DPTA. The code will be open-sourced after the paper is published.
Authors' comments: 9 pages, 6 figures, 2 tables
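The abstract describes a two-stage use of prototypes at test time: raw prototypes shortlist candidate tasks, and augmented prototypes decide the final class. The sketch below is a simplified, hypothetical rendering of that flow. Distances are Euclidean, `adapters` are plain functions standing in for the task-wise adapter modules, and both prototype sets are simply given; none of these choices come from the paper.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def candidate_tasks(feature: Vector, raw_prototypes: Dict[str, Vector],
                    class_to_task: Dict[str, int], k: int) -> List[int]:
    """Task ids of the k classes whose raw prototypes lie closest to `feature`."""
    nearest = sorted(raw_prototypes, key=lambda c: distance(feature, raw_prototypes[c]))[:k]
    tasks: List[int] = []
    for c in nearest:
        if class_to_task[c] not in tasks:
            tasks.append(class_to_task[c])
    return tasks


def predict(x: Vector, backbone: Callable[[Vector], Vector],
            adapters: Dict[int, Callable[[Vector], Vector]],
            raw_prototypes: Dict[str, Vector], aug_prototypes: Dict[str, Vector],
            class_to_task: Dict[str, int], k: int = 2) -> Optional[str]:
    """Two-stage nearest-prototype prediction (hypothetical simplification)."""
    best_class, best_dist = None, math.inf
    for task in candidate_tasks(backbone(x), raw_prototypes, class_to_task, k):
        z = adapters[task](x)  # features from the task-specific adapter
        for c, proto in aug_prototypes.items():
            if class_to_task[c] != task:
                continue  # only compare against classes of the selected task
            d = distance(z, proto)
            if d < best_dist:
                best_class, best_dist = c, d
    return best_class


class_to_task = {"cat": 0, "dog": 0, "car": 1, "truck": 1}
raw = {"cat": [0, 0], "dog": [0, 1], "car": [5, 5], "truck": [5, 6]}
aug = {"cat": [0, 0], "dog": [0, 2], "car": [5, 5], "truck": [5, 6]}
adapters = {0: lambda v: [2 * t for t in v], 1: lambda v: list(v)}
print(predict([0.1, 0.9], lambda v: v, adapters, raw, aug, class_to_task))  # dog
```

Restricting the second stage to the shortlisted tasks is what lets a task-specific adapter be chosen without knowing the task id at test time.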
Wei Lin, Qingyu Song, Hong Xu
Tuning effective step sizes is crucial for the stability and efficiency of optimization algorithms. While adaptive coordinate-wise step-size tuning methods have been explored for first-order methods, second-order methods still lack efficient techniques. Current approaches, including hypergradient descent and cutting plane methods, offer limited improvements or encounter difficulties in second-order contexts. To address these challenges, we introduce a novel Learning-to-Optimize (L2O) model within the Broyden-Fletcher-Goldfarb-Shanno (BFGS) framework, which leverages neural networks to predict optimal coordinate-wise step sizes. Our model rests on a theoretical foundation that establishes conditions for the stability and convergence of these step sizes. Extensive experiments demonstrate that our approach achieves substantial improvements over traditional backtracking line search and hypergradient descent-based methods, offering up to 7$\times$ faster and more stable performance across diverse optimization tasks.
Tao Song, Yicheng Wu, Minhao Hu, Xiangde Luo, Linda Wei, Guotai Wang, Yi Guo, Feng Xu et al.
Multimodal MR image synthesis aims to generate missing modality images by effectively fusing and mapping from a subset of available MRI modalities. Most existing methods adopt an image-to-image translation paradigm, treating multiple modalities as input channels. However, these approaches often yield sub-optimal results due to the inherent difficulty in achieving precise feature- or semantic-level alignment across modalities. To address these challenges, we propose an Adaptive Group-wise Interaction Network (AGI-Net) that explicitly models both inter-modality and intra-modality relationships for multimodal MR image synthesis. Specifically, feature channels are first partitioned into predefined groups, after which an adaptive rolling mechanism is applied to conventional convolutional kernels to better capture feature and semantic correspondences between different modalities. In parallel, a cross-group attention module is introduced to enable effective feature fusion across groups, thereby enhancing the network's representational capacity. We validate the proposed AGI-Net on the publicly available IXI and BraTS2023 datasets. Experimental results demonstrate that AGI-Net achieves state-of-the-art performance in multimodal MR image synthesis tasks, confirming the effectiveness of its modality-aware interaction design. We release the relevant code at: https://github.com/zunzhumu/Adaptive-Group-wise-Interaction-Network-for-Multimodal-MRI-Synthesis.git.
Nuria Fonseca-Bonilla, Luis Cerdán, Alberto Noriega-Crespo, Amaya Moro-Martín
While WISE is the largest, best quality infrared all-sky survey to date, a
smaller coverage mission, Spitzer, was designed to have better sensitivity and
spatial resolution at similar wavelengths. Confusion and contamination in WISE
data result in discrepancies between the two surveys. We present a novel approach to work
with WISE measurements with the goal of maintaining both its high coverage and
vast amount of data while taking full advantage of the higher sensitivity and
spatial resolution of Spitzer. We have applied machine learning (ML) techniques
to a complete WISE data sample of open cluster members, using a training set of
paired data from high-quality Spitzer Enhanced Imaging Products (SEIP), MIPS
and IRAC, and allWISE catalogs, W1 (3.4 μm) to W4 (22 μm) bands. We
have tested several ML regression models with the aim of predicting
mid-infrared fluxes at MIPS1 (24 μm) and IRAC4 (8 μm) bands from WISE
fluxes and quality flags. In addition, to improve the prediction quality, we
have implemented feature selection techniques to remove irrelevant WISE
variables. We have notably enhanced WISE detection capabilities, mostly at
the lowest magnitudes, which previously showed the largest discrepancies with
Spitzer. In our particular case, extremely randomized trees proved to be the
best algorithm to predict mid-infrared fluxes from WISE variables. We have
tested our results on the SEDs of members of IC 348. We show discrepancies in
the measurements of Spitzer and WISE and demonstrate the good concordance of
our predicted fluxes with the real ones. ML is a fast and powerful tool that
can be used to find hidden relationships between datasets, such as those that
exist between WISE and Spitzer fluxes. We believe this approach could be
employed for other samples from the allWISE catalog with SEIP positional
counterparts, and in other astrophysical studies with analogous discrepancies.
Authors' comments: 13 pages, 10 figures
Yasaman Saadati, M. Hadi Amini
Federated Learning (FL) is a decentralized learning approach that protects sensitive information by sharing local model parameters rather than clients' raw datasets. While this privacy-preserving method is widely employed across various applications, it still requires significant development and optimization. Automated Machine Learning (AutoML) has been adopted to reduce the need for manual adjustments. Previous studies have explored the integration of AutoML with different FL algorithms to evaluate their effectiveness in enhancing FL settings. However, Automated FL (Auto-FL) faces additional challenges due to the involvement of a large cohort of clients and global training rounds between clients and the server, rendering the tuning process time-consuming and nearly impossible on resource-constrained edge devices (e.g., IoT devices). This paper investigates the deployment and integration of two lightweight Hyper-Parameter Optimization (HPO) tools, Ray Tune and Optuna, within FL settings. A step-wise feedback mechanism has also been designed to accelerate the hyper-parameter tuning process and coordinate AutoML toolkits with the FL server. To this end, both local and global feedback mechanisms are integrated to limit the search space and expedite the HPO process. Further, a novel client selection technique is introduced to mitigate the straggler effect in Auto-FL. The selected hyper-parameter tuning tools are evaluated using two benchmark datasets, FEMNIST and CIFAR10. Finally, the paper discusses the essential properties of successful HPO tools, the integration mechanism with the FL pipeline, and the challenges posed by the distributed and heterogeneous nature of FL environments.
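The abstract states that feedback is used to limit the search space but does not give the rule. One way to picture it, under loudly hypothetical assumptions, is a log-scale learning-rate range that is re-centered on the best candidate after each round and shrunk by a fixed factor. The grid search, the shrink factor, and the toy loss below are illustrative and are not the paper's mechanism or Ray Tune/Optuna behavior.

```python
import math
from typing import Callable, Tuple


def narrow_search(objective: Callable[[float], float], low: float, high: float,
                  rounds: int = 4, points: int = 5,
                  shrink: float = 0.5) -> Tuple[float, float, float]:
    """Hypothetical step-wise narrowing of a log-scale hyper-parameter range.

    Each round evaluates `points` log-spaced candidates (standing in for the
    trials run during one federated round), then re-centers the range on the
    best candidate and multiplies its log-width by `shrink`.
    """
    best = low
    for _ in range(rounds):
        lo, hi = math.log10(low), math.log10(high)
        grid = [10 ** (lo + i * (hi - lo) / (points - 1)) for i in range(points)]
        best = min(grid, key=objective)  # feedback: lowest validation loss wins
        half_width = (hi - lo) * shrink / 2
        center = math.log10(best)
        low, high = 10 ** (center - half_width), 10 ** (center + half_width)
    return best, low, high


# Toy validation loss whose minimum sits at a learning rate of 1e-2.
def toy_loss(lr: float) -> float:
    return (math.log10(lr) + 2) ** 2


best_lr, low, high = narrow_search(toy_loss, 1e-5, 1.0)
print(f"best lr ~ {best_lr:.4g}, final range [{low:.3g}, {high:.3g}]")
```

Each round costs a fixed number of evaluations, so the total tuning budget is `rounds * points` regardless of how wide the initial range was.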
Ying Yang, De Cheng, Chaowei Fang, Yubiao Wang, Changzhe Jiao, Lechao Cheng, Nannan Wang
Unsupervised out-of-distribution (OOD) detection aims to identify
out-of-domain data by learning only from unlabeled In-Distribution (ID)
training samples, which is crucial for developing a safe real-world machine
learning system. Current reconstruction-based methods provide a good
alternative approach by measuring the reconstruction error between the input
and its corresponding generative counterpart in the pixel/feature space.
However, such generative methods face a key dilemma: improving the
reconstruction power of the generative model while keeping a compact
representation of the ID data. To address this issue, we propose the
diffusion-based layer-wise semantic reconstruction approach for unsupervised
OOD detection. The innovation of our approach is that we leverage the diffusion
model's intrinsic data reconstruction ability to distinguish ID samples from
OOD samples in the latent feature space. Moreover, to set up a comprehensive
and discriminative feature representation, we devise a multi-layer semantic
feature extraction strategy. By distorting the extracted features with Gaussian
noise and applying the diffusion model for feature reconstruction, we separate
ID and OOD samples according to their reconstruction errors. Extensive
experimental results on multiple benchmarks built upon
various datasets demonstrate that our method achieves state-of-the-art
performance in terms of detection accuracy and speed. Code is available at
<https://github.com/xbyym/DLSR>.
Authors' comments: 26 pages, 23 figures, published at NeurIPS 2024
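The scoring idea in the abstract (noise the extracted features, reconstruct them, and flag large reconstruction errors as OOD) can be illustrated without a diffusion model. In the sketch below, a hand-written projection onto a toy "ID manifold" replaces the diffusion reconstructor, and the threshold is arbitrary; both are assumptions made only to show how the score separates the two cases.

```python
import random
from typing import Callable, List, Sequence


def reconstruction_score(features: Sequence[float],
                         reconstruct: Callable[[List[float]], List[float]],
                         noise_std: float, rng: random.Random) -> float:
    """Mean squared error between clean features and the reconstruction of a noised copy."""
    noisy = [f + rng.gauss(0.0, noise_std) for f in features]
    recon = reconstruct(noisy)
    return sum((f - r) ** 2 for f, r in zip(features, recon)) / len(features)


# Stand-in for a model fitted on ID data: ID features lie on the line x == y,
# so "reconstruction" projects onto that line.
def project_to_id_line(v: List[float]) -> List[float]:
    m = sum(v) / len(v)
    return [m] * len(v)


rng = random.Random(0)
id_score = reconstruction_score([1.0, 1.0], project_to_id_line, 0.05, rng)
ood_score = reconstruction_score([1.0, -1.0], project_to_id_line, 0.05, rng)
threshold = 0.5  # hypothetical; in practice chosen on held-out ID data
print(id_score > threshold, ood_score > threshold)  # False True
```

For the multi-layer variant described in the abstract, one simple option would be to compute such a score per layer and sum or average them; how the paper combines layers is not stated here.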
Sucheng Ren, Yaodong Yu, Nataniel Ruiz, Feng Wang, Alan Yuille, Cihang Xie
Recent work in computer vision, VAR, proposes a new autoregressive paradigm for image generation. Diverging from vanilla next-token prediction, VAR structurally reformulates image generation as coarse-to-fine next-scale prediction. In this paper, we show that this scale-wise autoregressive framework can be effectively decoupled into \textit{intra-scale modeling}, which captures local spatial dependencies within each scale, and \textit{inter-scale modeling}, which models cross-scale relationships progressively from coarse to fine scales. This decoupled structure allows VAR to be rebuilt in a more computationally efficient manner. Specifically, for intra-scale modeling, which is crucial for generating high-fidelity images, we retain the original bidirectional self-attention design to ensure comprehensive modeling; for inter-scale modeling, which semantically connects different scales but is computationally intensive, we apply linear-complexity mechanisms like Mamba to substantially reduce computational overhead. We term this new framework M-VAR. Extensive experiments demonstrate that our method outperforms existing models in both image quality and generation speed. For example, our 1.5B model, with fewer parameters and faster inference, outperforms the largest model, VAR-d30-2B. Moreover, our largest model, M-VAR-d32, achieves 1.78 FID on ImageNet 256$\times$256 and outperforms the prior-art autoregressive models LlamaGen/VAR by 0.4/0.19 and the popular diffusion models LDM/DiT by 1.82/0.49, respectively. Code is available at \url{https://github.com/OliverRensu/MVAR}.
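A quick count shows why restricting self-attention to each scale is cheaper. The sketch assumes a VAR-style schedule of token-map side lengths (1, 2, 3, 4, 5, 6, 8, 10, 13, 16) for 256×256 images; treat that schedule as an assumption. It counts query-key pairs only, ignoring heads, layers, and VAR's block-causal mask, so the "full" figure is an upper bound.

```python
from typing import Sequence, Tuple

# Assumed side lengths of the token maps at each scale (a VAR-style schedule
# for 256x256 images); the real configuration may differ.
SCALES = (1, 2, 3, 4, 5, 6, 8, 10, 13, 16)


def attention_pairs(scales: Sequence[int]) -> Tuple[int, int, int]:
    """Return (total tokens, query-key pairs for one attention over all
    tokens, pairs when attention is restricted to each scale)."""
    tokens = [s * s for s in scales]
    total = sum(tokens)
    full = total ** 2
    intra = sum(n * n for n in tokens)
    return total, full, intra


print(attention_pairs(SCALES))  # (680, 462400, 110468)
```

Under these assumptions, intra-scale attention touches roughly a quarter of the pairs of a single attention over all 680 tokens, while a linear-complexity inter-scale mechanism grows only with the 680 tokens themselves.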
Huali Xu, Yongxiang Liu, Li Liu, Shuaifeng Zhi, Shuzhou Sun, Tianpeng Liu, MingMing Cheng
Existing cross-domain few-shot learning (CDFSL) methods, which develop
source-domain training strategies to enhance model transferability, face
challenges with large-scale pre-trained models (LMs) due to inaccessible source
data and training strategies. Moreover, fine-tuning LMs for CDFSL demands
substantial computational resources, limiting practicality. This paper
addresses the source-free CDFSL (SF-CDFSL) problem, tackling few-shot learning
(FSL) in the target domain using only pre-trained models and a few target
samples without source data or strategies. To overcome the challenge of
inaccessible source data, this paper introduces Step-wise Distribution
Alignment Guided Style Prompt Tuning (StepSPT), which implicitly narrows domain
gaps through prediction distribution optimization. StepSPT proposes a style
prompt to align target samples with the desired distribution and adopts a
dual-phase optimization process. In the external process, a step-wise
distribution alignment strategy factorizes prediction distribution optimization
into a multi-step alignment problem to tune the style prompt. In the internal
process, the classifier is updated using standard cross-entropy loss.
Evaluations on five datasets demonstrate that StepSPT outperforms existing
prompt tuning-based methods and state-of-the-art approaches. Ablation studies
further verify its
effectiveness. Code will be made publicly available at
\url{https://github.com/xuhuali-mxj/StepSPT}.
Authors' comments: 15 pages, 12 figures, 7 tables
Hao Tang, Junhao Lu, Guoheng Huang, Ming Li, Xuhang Chen, Guo Zhong, Zhengguang Tan, Zinuo Li
In Few-Shot Learning (FSL), traditional metric-based approaches often rely on global metrics to compute similarity. However, in natural scenes, the spatial arrangement of key instances is often inconsistent across images. This spatial misalignment can result in mismatched semantic pixels, leading to inaccurate similarity measurements. To address this issue, we propose a novel method called the Layer-Wise Features Metric of Semantic-Pixel Matching (LWFM-SPM) to make finer comparisons. Our method enhances model performance through two key modules: (1) the Layer-Wise Embedding (LWE) Module, which refines the cross-correlation of image pairs to generate well-focused feature maps for each layer; and (2) the Semantic-Pixel Matching (SPM) Module, which aligns critical pixels based on semantic embeddings using an assignment algorithm. We conducted extensive experiments to evaluate our method on four widely used few-shot classification benchmarks: miniImageNet, tieredImageNet, CUB-200-2011, and CIFAR-FS. The results indicate that LWFM-SPM achieves competitive performance across these benchmarks. Our code will be publicly available at https://github.com/Halo2Tang/Code-for-LWFM-SPM.
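The misalignment problem and the assignment-based fix can be seen on a two-pixel toy example. The sketch compares a positional metric (pixel i against pixel i) with the best one-to-one matching found by brute force over permutations. Cosine similarity and brute-force search are illustrative choices, not the paper's SPM module; for realistic pixel counts, `scipy.optimize.linear_sum_assignment` (with `maximize=True`) solves the same assignment problem in polynomial time.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def positional_similarity(query: List[Vector], support: List[Vector]) -> float:
    """Compare pixel i with pixel i, as a purely positional metric would."""
    return sum(cosine(q, s) for q, s in zip(query, support)) / len(query)


def matched_similarity(query: List[Vector],
                       support: List[Vector]) -> Tuple[float, Tuple[int, ...]]:
    """Best one-to-one pixel assignment by brute force (fine for tiny n only)."""
    n = len(query)
    sim = [[cosine(q, s) for s in support] for q in query]
    best = max(itertools.permutations(range(n)),
               key=lambda p: sum(sim[i][p[i]] for i in range(n)))
    return sum(sim[i][best[i]] for i in range(n)) / n, best


query = [[1.0, 0.0], [0.0, 1.0]]    # object parts at pixels 0 and 1
support = [[0.0, 1.0], [1.0, 0.0]]  # same parts, spatially swapped
print(positional_similarity(query, support))  # 0.0
print(matched_similarity(query, support))     # (1.0, (1, 0))
```

The positional metric rates the pair as dissimilar purely because the parts moved, whereas the matched score recovers full similarity.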
Ioannis Caragiannis, Nick Gravin, Zhile Jiang
The problem of identifying the satisfiability threshold of random $3$-SAT
formulas has received a lot of attention during the last decades and has
inspired the study of other threshold phenomena in random combinatorial
structures. The classical assumption in this line of research is that, for a
given set of $n$ Boolean variables, each clause is drawn uniformly at random
among all sets of three literals from these variables, independently of other
clauses. Here, we keep the uniform distribution of each clause, but deviate
significantly from the independence assumption and consider richer families of
probability distributions. For integer parameters $n$, $m$, and $k$, we denote
by $\DistFamily_k(n,m)$ the family of probability distributions that produce
formulas with $m$ clauses, each selected uniformly at random from all sets of
three literals from the $n$ variables, so that the clauses are $k$-wise
independent. Our aim is to make general statements about the satisfiability or
unsatisfiability of formulas produced by distributions in $\DistFamily_k(n,m)$
for different values of the parameters $n$, $m$, and $k$.
Authors' comments: 26 pages, 1 figure
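For reference, the classical model that the abstract departs from can be simulated directly. The sketch draws each clause over three distinct variables with independent uniform signs, draws all $m$ clauses fully independently (which is trivially $k$-wise independent for every $k \le m$), and checks satisfiability by brute force. It does not construct any of the richer distributions in $\DistFamily_k(n,m)$ studied in the paper.

```python
import itertools
import random
from typing import List, Tuple

Clause = Tuple[int, int, int]  # literal v means x_v, -v means NOT x_v (v is 1-based)


def random_clause(n: int, rng: random.Random) -> Clause:
    """Three distinct variables with independent uniform signs."""
    a, b, c = rng.sample(range(1, n + 1), 3)
    signs = [1 if rng.random() < 0.5 else -1 for _ in range(3)]
    return (signs[0] * a, signs[1] * b, signs[2] * c)


def random_formula(n: int, m: int, rng: random.Random) -> List[Clause]:
    """Classical model: m clauses drawn fully independently."""
    return [random_clause(n, rng) for _ in range(m)]


def is_satisfiable(formula: List[Clause], n: int) -> bool:
    """Brute force over all 2**n assignments; only for small n."""
    for bits in itertools.product((False, True), repeat=n):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in formula):
            return True
    return False


rng = random.Random(1)
n, m = 10, 43  # clause density 4.3, near the asymptotic threshold estimate (~4.27)
trials = 10
sat = sum(is_satisfiable(random_formula(n, m, rng), n) for _ in range(trials))
print(f"{sat}/{trials} formulas satisfiable")
```

At such small $n$ the transition is heavily smoothed by finite-size effects, so the printed fraction is only a rough illustration of the threshold phenomenon.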
Zhirui Deng, Zhicheng Dou, Yutao Zhu, Ji-Rong Wen, Ruibin Xiong, Mang Wang, Weipeng Chen
The outstanding capabilities of large language models (LLMs) render them a crucial component in various autonomous agent systems. While traditional methods depend on the inherent knowledge of LLMs without fine-tuning, more recent approaches have shifted toward the reinforcement learning strategy to further enhance agents' ability to solve complex interactive tasks with environments and tools. However, previous approaches are constrained by the sparse reward issue, where existing datasets solely provide a final scalar reward for each multi-step reasoning chain, potentially leading to ineffectiveness and inefficiency in policy learning. In this paper, we introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process. Inheriting the spirit of novice-to-expert theory, we first compare the actions of the expert and the agent to automatically generate intermediate rewards for fine-grained optimization. Additionally, we propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment. Further theoretical analysis demonstrates that the action distribution of the agent can converge toward the expert action distribution over multiple training cycles. Experimental results across various datasets indicate that StepAgent outperforms existing baseline methods.
Xiaoqing Chen, Siyang Li, Yunlu Tu, Ziwei Wang, Dongrui Wu
Objective: An electroencephalogram (EEG)-based brain-computer interface (BCI) is a direct communication pathway between the human brain and a computer. Most research so far has focused on making BCIs more accurate, while much less attention has been paid to their ethics. Aside from task-specific information, EEG signals also contain rich private information, e.g., user identity, emotion, and disorders, which should be protected. Approach: We show for the first time that adding user-wise perturbations can make identity information in EEG unlearnable. We propose four types of user-wise privacy-preserving perturbations: random noise, synthetic noise, error minimization noise, and error maximization noise. After the proposed perturbations are added to EEG training data, the user identity information in the data becomes unlearnable, while the BCI task information remains unaffected. Main results: Experiments on six EEG datasets using three neural network classifiers and various traditional machine learning models demonstrated the robustness and practicability of the proposed perturbations. Significance: Our research shows the feasibility of hiding user identity information in EEG data without impacting the primary BCI task information.
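Of the four perturbation types, the random-noise variant is the simplest to picture: every trial from the same user receives the same fixed pattern. The sketch below assumes each trial is a flattened feature vector and that the noise is uniform with a chosen amplitude; both are hypothetical choices, and the abstract does not give the paper's exact construction. One common intuition from the unlearnable-examples literature is that a pattern constant within each identity class can act as a shortcut feature.

```python
import random
from typing import Dict, Hashable, List, Sequence


def user_wise_random_noise(user_ids: Sequence[Hashable], n_features: int,
                           amplitude: float, seed: int = 0) -> Dict[Hashable, List[float]]:
    """Draw one fixed uniform-noise pattern per user (hypothetical sketch)."""
    rng = random.Random(seed)
    noise: Dict[Hashable, List[float]] = {}
    for user in user_ids:
        if user not in noise:  # the first occurrence fixes that user's pattern
            noise[user] = [rng.uniform(-amplitude, amplitude) for _ in range(n_features)]
    return noise


def perturb(samples: Sequence[Sequence[float]], user_ids: Sequence[Hashable],
            noise: Dict[Hashable, List[float]]) -> List[List[float]]:
    """Add each sample's user pattern; BCI task labels are left untouched."""
    return [[x + d for x, d in zip(sample, noise[user])]
            for sample, user in zip(samples, user_ids)]


samples = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.5, 0.5, 0.5]]
users = ["s01", "s01", "s02"]
noise = user_wise_random_noise(users, n_features=3, amplitude=0.1)
perturbed = perturb(samples, users, noise)
```

Because only the inputs change and the task labels are kept, a downstream BCI classifier is still trained on the original task targets.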
Chengting Yu, Fengzhao Zhang, Ruizhe Chen, Aili Wang, Zuozhu Liu, Shurun Tan, Er-Ping Li
Knowledge Distillation (KD), a learning paradigm in which a larger teacher network guides a smaller student network, transfers dark knowledge from the teacher to the student via logits or intermediate features, with the aim of producing a well-performing lightweight model. Notably, many subsequent feature-based KD methods outperformed the earliest logit-based KD method, and successive work produced numerous state-of-the-art distillation methods. Nevertheless, recent work has uncovered the potential of the logit-based method, bringing the simple logit-based form of KD back into the limelight. Features or logits? Each partially implements KD from an entirely distinct perspective; therefore, choosing between logits and features is not straightforward. This paper provides a unified perspective of feature alignment to better understand their fundamental distinction. Inheriting the design philosophy and insights of feature-based and logit-based methods, we introduce a block-wise logit distillation framework that applies implicit logit-based feature alignment by gradually replacing the teacher's blocks, producing intermediate stepping-stone models that bridge the gap between the student and the teacher. Our method obtains comparable or superior results to state-of-the-art distillation methods. This paper demonstrates the great potential of combining logits and features, and we hope it will inspire future research to revisit KD from a higher vantage point.
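The logit-based form of KD that the abstract refers to is usually written as a temperature-softened KL divergence between teacher and student distributions, scaled by $T^2$. The sketch shows that standard loss only; how the block-wise framework wires it to its intermediate stepping-stone models is not shown, and the temperature value is an arbitrary example.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            temperature: float = 4.0) -> float:
    """T^2-scaled KL(teacher || student) on temperature-softened distributions."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl


teacher = [3.0, 1.0, 0.2]
print(kd_loss(teacher, teacher))              # 0.0
print(kd_loss([1.0, 2.0, 0.5], teacher) > 0)  # True
```

The $T^2$ factor keeps gradient magnitudes roughly comparable across temperatures when this term is mixed with an ordinary cross-entropy loss.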
Nikita Guseynov, Nana Liu
Efficiently uploading data into quantum states is essential for many quantum
algorithms to achieve advantage across various applications. In this paper, we
address this challenge by proposing a method to upload a polynomial function
$f(x)$ on the interval $x \in (a, b)$ into a pure quantum state of $n$
qubits, whose amplitudes are given by a discretized $f(x)$. The
preparation cost has $\mathcal{O}(n\log n)$ scaling in the number of qubits $n$
and linear scaling with the degree of the polynomial $Q$. This efficiency
allows the preparation of states whose amplitudes correspond to high-degree
polynomials, enabling the approximation of almost any continuous function. We
introduce an explicit algorithm for uploading such functions using four real
polynomials that meet specific parity and boundedness conditions. We also
generalize this approach to piecewise polynomial functions, with the algorithm
scaling linearly with the number of piecewise parts. Our method yields an
efficient quantum circuit implementation, and we present detailed gate counting
and resource analysis.
Authors' comments: 17 pages, 9 figures, 2 tables
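It helps to see the target state that such a circuit should prepare. The sketch computes it classically: sample the polynomial on $2^n$ points of $(a, b)$ and normalize. The midpoint grid is an assumption about the discretization, and the code does not reproduce the paper's $\mathcal{O}(n\log n)$ circuit construction or its four parity-constrained polynomials.

```python
import math
from typing import List, Sequence


def polynomial_amplitudes(coeffs: Sequence[float], a: float, b: float,
                          n_qubits: int) -> List[float]:
    """Normalized samples of f(x) = sum_k coeffs[k] * x**k on 2**n_qubits points.

    Midpoints of 2**n equal cells are used so every sample lies strictly
    inside the open interval (a, b); the discretization is an assumption.
    """
    size = 2 ** n_qubits
    xs = [a + (b - a) * (j + 0.5) / size for j in range(size)]
    values = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0.0:
        raise ValueError("f vanishes on every grid point; no valid state")
    return [v / norm for v in values]


amps = polynomial_amplitudes([0.0, 1.0], -1.0, 1.0, n_qubits=2)  # f(x) = x
print([round(v, 4) for v in amps])  # [-0.6708, -0.2236, 0.2236, 0.6708]
```

Computing this vector classically costs $2^n$ evaluations, which is exactly the exponential work an efficient state-preparation circuit avoids; it is useful mainly as a reference for checking small circuits.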
Wenhan Chang, Tianqing Zhu, Yufeng Wu, Wanlei Zhou
In the rapid advancement of artificial intelligence, privacy protection has
become crucial, giving rise to machine unlearning. Machine unlearning is a
technique that removes specific data influences from trained models without the
need for extensive retraining. However, it faces several key challenges,
including accurately implementing unlearning, ensuring privacy protection
during the unlearning process, and achieving effective unlearning without
significantly compromising model performance. This paper presents a novel
approach to machine unlearning by employing Layer-wise Relevance Analysis and
Neuronal Path Perturbation. We address three primary challenges: the lack of
detailed unlearning principles, privacy guarantees in zero-shot unlearning
scenarios, and the balance between unlearning effectiveness and model utility.
Our method balances machine unlearning performance and model utility by
identifying and perturbing highly relevant neurons, thereby achieving effective
unlearning. By using data not present in the original training set during the
unlearning process, we satisfy the zero-shot unlearning scenario and ensure
robust privacy protection. Experimental results demonstrate that our approach
effectively removes targeted data from the target unlearning model while
maintaining the model's utility, offering a practical solution for
privacy-preserving machine learning.
Authors' comments: 17 pages, 5 figures
Stefan Stojanovic, Yassir Jedra, Alexandre Proutiere
We consider the problem of learning an $\varepsilon$-optimal policy in
controlled dynamical systems with low-rank latent structure. For this problem,
we present LoRa-PI (Low-Rank Policy Iteration), a model-free learning algorithm
alternating between policy improvement and policy evaluation steps. In the
latter, the algorithm estimates the low-rank matrix corresponding to the
(state, action) value function of the current policy using the following
two-phase procedure. The entries of the matrix are first sampled uniformly at
random to estimate, via a spectral method, the leverage scores of its rows and
columns. These scores are then used to extract a few important rows and columns
whose entries are further sampled. The algorithm exploits these new samples to
complete the matrix estimation using a CUR-like method. For this leveraged
matrix estimation procedure, we establish entry-wise guarantees that,
remarkably, do not depend on the coherence of the matrix but only on its
spikiness. These guarantees imply that LoRa-PI learns an $\varepsilon$-optimal
policy using $\widetilde{O}({S+A\over \mathrm{poly}(1-\gamma)\varepsilon^2})$
samples where $S$ (resp. $A$) denotes the number of states (resp. actions) and
$\gamma$ the discount factor. Our algorithm achieves this order-optimal (in
$S$, $A$ and $\varepsilon$) sample complexity under milder conditions than
those assumed in previously proposed approaches.
Authors' comments: Accepted for presentation at the Conference on Neural Information
Processing Systems (NeurIPS) 2024
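The final step of the policy-evaluation procedure completes a low-rank matrix from a few fully observed rows and columns. A noise-free, rank-2 toy shows the CUR-style identity $M = C U^{-1} R$, which holds exactly when the core $U$ (the intersection of the chosen rows and columns) has the same rank as $M$. Here the rows and columns are picked by hand instead of via leverage scores, entries are exact rather than sampled, and the 2×2 inverse limits the helper to rank 2; all of these are simplifying assumptions.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def inverse_2x2(u: Matrix) -> Matrix:
    (a, b), (c, d) = u
    det = a * d - b * c
    if det == 0:
        raise ValueError("selected core submatrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def cur_reconstruct(m: Matrix, rows: Sequence[int], cols: Sequence[int]) -> Matrix:
    """M ~ C U^{-1} R from two selected rows and columns (rank-2 toy)."""
    c = [[m[i][j] for j in cols] for i in range(len(m))]  # selected columns
    r = [list(m[i]) for i in rows]                         # selected rows
    u = [[m[i][j] for j in cols] for i in rows]            # their intersection
    return matmul(matmul(c, inverse_2x2(u)), r)


# Rank-2 "value matrix": rows are states, columns are actions.
M = [[1, 2, 0, 1],
     [1, 0, 3, 1],
     [2, 2, 3, 2],
     [3, 4, 3, 3]]
print(cur_reconstruct(M, rows=[0, 1], cols=[0, 1]) == M)  # True
```

With noisy samples the core must be chosen carefully, which is where the leverage-score step described in the abstract comes in.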
Peizhuang Cong, Qizhi Chen, Haochen Zhao, Tong Yang
The advanced capabilities of Large Language Models (LLMs) have inspired the development of various interactive web services and applications, such as ChatGPT, which offer query inference services to users. Unlike inference with traditional DNN models, LLM inference requires a different number of forward-computation iterations for different queries, which creates efficiency challenges for existing run-to-completion batch-wise inference. Hence, some methods refine batch-wise inference to the iteration level by duplicating all nonlinear layers of the LLM. However, this approach not only increases resource usage but also introduces idle computations to the batch due to the prefilling of newly added queries. Therefore, we propose BATON, an efficient batch-wise LLM inference scheme that dynamically adjusts the processing batch and can achieve near-zero idle computations without incurring additional resource consumption. To do so, BATON 1) shapes the vectors involved in the inference of the newly inserted query and the processing batch to align their dimensions, and generates a new attention mask based on this vector shaping to ensure inference correctness, which enables query insertion without consuming additional resources; 2) embeds the prefilled Keys and Values of the new query into the KV_Cache of the processing batch by leveraging the prefilling and decoding separation mechanism, eliminating the idle computations that the new query's prefilling process would otherwise introduce to the batch. Experimental results show that, compared to the state-of-the-art solution Orca, BATON improves query processing by up to 1.75 times.
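The abstract does not specify tensor layouts, but the general idea of inserting a separately prefilled query into a running batch can be illustrated with one common convention: left-pad every query's KV cache to the batch width and mark padded slots with 0 in the attention mask. The helpers below are hypothetical and are not BATON's vector-shaping procedure.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def left_pad(cache: Sequence[T], width: int, pad: T) -> List[T]:
    """Left-pad one query's cached entries to the batch width."""
    return [pad] * (width - len(cache)) + list(cache)


def attention_mask(kv_lengths: Sequence[int]) -> List[List[int]]:
    """1 = real cached position, 0 = padding (left-padded to the longest entry)."""
    width = max(kv_lengths)
    return [[0] * (width - n) + [1] * n for n in kv_lengths]


# Two queries mid-decoding with 5 and 3 cached tokens; a new query arrives
# whose prompt was prefilled separately to 2 cached tokens.
lengths = [5, 3]
lengths.append(2)  # insert the new query into the running batch
for row in attention_mask(lengths):
    print(row)
# [1, 1, 1, 1, 1]
# [0, 0, 1, 1, 1]
# [0, 0, 0, 1, 1]
print(left_pad(["k1", "k2"], 5, None))  # [None, None, None, 'k1', 'k2']
```

After each decoding step every length grows by one, and the mask is rebuilt over the new width, so the inserted query joins the batch without re-running the prefill inside it.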