Andreas Papageorgiou, Paulo Vitor Itaborai, Kostas Blekos, Karl Jansen
This paper introduces quantum circuit methodologies for pointwise multiplication and convolution of complex functions, conceptualized as "processing through encoding". Leveraging known techniques, we describe an approach where multiple complex functions are encoded onto auxiliary qubits. Applying the proposed scheme for two functions $f$ and $g$, their pointwise product $f(x)g(x)$ is shown to naturally form as the coefficients of part of the resulting quantum state. Adhering to the convolution theorem, we then demonstrate how the convolution $f*g$ can be constructed. Similarly to related work, this involves the encoding of the Fourier coefficients $\mathcal{F}[f]$ and $\mathcal{F}[g]$, which facilitates their pointwise multiplication, followed by the inverse Quantum Fourier Transform. We discuss the simulation of these techniques, their integration into an extended \verb|quantumaudio| package for audio signal processing, and present initial experimental validations. This work offers a promising avenue for quantum signal processing, with potential applications in areas such as quantum-enhanced audio manipulation and synthesis.
Authors' comments: Presented at ISQCMC '25: 3rd International Symposium on Quantum Computing and Musical Creativity
Kesheng Chen, Wenjian Luo, Zhenqian Zhu, Yamin Hu, Yiya Xi
Constructing a Pareto set is pivotal for navigating the capability-efficiency trade-offs in Large Language Models (LLMs); however, existing merging techniques remain inadequate for this task. Coarse-grained, model-level methods yield only a sparse set of suboptimal solutions, while fine-grained, layer-wise approaches suffer from the "curse of dimensionality," rendering the search space computationally intractable. To resolve this dichotomy, we propose BAMBO (Bayesian Adaptive Multi-objective Block-wise Optimization), a novel framework that automatically constructs the LLM Pareto set. BAMBO renders the search tractable by introducing a Hybrid Optimal Block Partitioning strategy. Formulated as a 1D clustering problem, this strategy leverages a dynamic programming approach to optimally balance intra-block homogeneity and inter-block information distribution, thereby dramatically reducing dimensionality without sacrificing critical granularity. The entire process is automated within an evolutionary loop driven by the q-Expected Hypervolume Improvement (qEHVI) acquisition function. Experiments demonstrate that BAMBO discovers a superior and more comprehensive Pareto frontier than baselines, enabling agile model selection tailored to diverse operational constraints. Code is available at: https://github.com/xin8coder/BAMBO.
Kevin Lee, Pablo Millan Arias
Layer-wise Relevance Propagation (LRP) provides principled attribution for neural networks through conservation properties and foundations in Deep Taylor Decomposition. However, existing implementations operate at the module level, requiring architecture-specific propagation rules and modifications. These limit the generality of target model and sustainability of implementations as architectures evolve. We introduce DynamicLRP, a model-agnostic LRP framework operating at the tensor operation level. By decomposing attribution to individual operations within computation graphs and introducing a novel mechanism for deferred activation resolution, named the Promise System, our approach achieves true architecture agnosticity while maintaining LRP's theoretical guarantees. This design operates independently of backpropagation machinery, enabling operation on arbitrary computation graphs without model modification and side-by-side execution with gradient backpropagation. Being based on computation graphs, this method is theoretically extensible to other deep learning libraries that support auto-differentiation. We demonstrate faithfulness matching or exceeding specialized implementations (1.77 vs 1.69 ABPC on VGG, equivalent performance on ViT, 93.70\% and 95.06\% top-1 attribution accuracy for explaining RoBERTa-large and Flan-T5-large answers on SQuADv2, respectively) while maintaining practical efficiency on models with hundreds of millions of parameters. We achieved 99.92\% node coverage across 31,465 computation graph nodes from 15 diverse architectures, including state-space models (Mamba), audio transformers (Whisper), and multimodal systems (DePlot) without any model-specific code with rules for 47 fundamental operations implemented. Our operation-level decomposition and Promise System establish a sustainable, extensible foundation for LRP across evolving architectures.
Authors' comments: Work in progress, (12 pages manuscript, 6 figures, 6 tables, 3 pages references, 14 pages appendix)
Yebo Wu, Jingguang Li, Zhijiang Guo, Li Li
Federated fine-tuning offers a promising solution for adapting Large Language Models (LLMs) to downstream tasks while safeguarding data privacy. However, its high computational and communication demands hinder its deployment on resource-constrained devices. In this paper, we propose SmartFed, a resource-efficient federated fine-tuning framework. SmartFed intelligently reuses knowledge embedded in existing LoRA modules, eliminating the need for expensive training from scratch when adapting LLMs to new tasks. To effectively exploit this knowledge and ensure scalability, we introduce the Mixture of Rank-Wise Experts (MoRE). MoRE decomposes LoRA modules into fine-grained rank-level experts. These experts are selectively activated and combined based on input semantics and resource budgets. Moreover, to optimize resource utilization, we present the Elastic Expert Quota Allocation (EEQA). EEQA adaptively allocates expert capacity across parameter matrices based on their contribution to model performance, focusing computing resources on the critical experts. Extensive evaluations across multiple benchmarks demonstrate that SmartFed significantly outperforms existing methods in model performance and training efficiency.
Wenna Lai, Haoran Xie, Guandong Xu, Qing Li, S. Joe Qin
Aspect sentiment quad prediction (ASQP) is inherently challenging to predict a structured quadruple with four core sentiment elements, including aspect term (a), aspect category (c), opinion term (o), and sentiment polarity (s). Prior methods relying on marker-based prediction struggle with modeling the intricate relationships among elements and experience sharp performance declines when predicting higher-order elements (e.g., c and s) under standard supervised fine-tuning. To address these limitations, we employ reasoning-based generation to output both the quadruple and a natural language rationale under element prefixes within a unified template, encouraging explicit relational reasoning and interpretability. To further enhance element-wise alignment, we introduce a listwise preference optimization framework for improving structural validity and relational coherence. Specifically, we generate element-wise confusable candidates via syntactic and semantic proximity, then train the model with listwise objectives to prefer the gold candidates over closely competing alternatives. Extensive experiments on four benchmark datasets demonstrate that our framework effectively improves quadruple prediction accuracy and explanation consistency.
Authors' comments: 11 pages, 7 figures, and 6 tables
Amirreza Zamani, Ayfer Özgür, Mikael Skoglund
In this paper, we study an information-theoretic problem of designing a fair representation under a bounded point-wise statistical (demographic) parity constraint. More specifically, an agent uses some useful data (database) $X$ to solve a task $T$. Since both $X$ and $T$ are correlated with some latent sensitive attribute or secret $S$, the agent designs a representation $Y$ that satisfies a bounded point-wise statistical parity, that is, such that for all realizations of the representation $y\in\cal Y$, we have $χ^2(P_{S|y};P_S)\leq ε$. In contrast to our previous work, here we use the point-wise measure instead of a bounded mutual information, and we assume that the agent has no direct access to $S$ and $T$; hence, the Markov chains $S - X - Y$ and $T - X - Y$ hold. In this work, we design $Y$ that maximizes the mutual information $I(Y;T)$ about the task while satisfying a bounded compression rate constraint, that is, ensuring that $I(Y;X) \leq r$. Finally, $Y$ satisfies the point-wise bounded statistical parity constraint $χ^2(P_{S|y};P_S)\leq ε$. When $ε$ is small, concepts from information geometry allow us to locally approximate the KL-divergence and mutual information. To design the representation $Y$, we utilize this approximation and show that the main complex fairness design problem can be rewritten as a quadratic optimization problem that has simple closed-form solution under certain constraints. For the cases where the closed-form solution is not obtained we obtain lower bounds with low computational complexity. Here, we provide simple fairness designs with low complexity which are based on finding the maximum singular value and singular vector of a matrix. Finally, in a numerical example we compare our obtained results with the optimal solution.
Kyeongha Rho, Hyeongkeun Lee, Jae Won Cho, Joon Son Chung
In this paper, we propose Mixture of Layer-Wise Tokens (MoLT), a parameter- and memory-efficient adaptation framework for audio-visual learning. The key idea of MoLT is to replace conventional, computationally heavy sequential adaptation at every transformer layer with a parallel, lightweight scheme that extracts and fuses layer-wise tokens only from the late layers. We adopt two types of adapters to distill modality-specific information and cross-modal interaction into compact latent tokens in a layer-wise manner. A token fusion module then dynamically fuses these layer-wise tokens by taking into account their relative significance. To prevent the redundancy of latent tokens, we apply an orthogonality regularization between latent tokens during training. Through the systematic analysis of the position of adaptation in the pre-trained transformers, we extract latent tokens only from the late layers of the transformers. This strategic adaptation approach avoids error propagation from the volatile early-layer features, thereby maximizing the adaptation performance while maintaining parameter and memory efficiency. Through extensive experiments, we demonstrate that MoLT outperforms existing methods on diverse audio-visual benchmarks, including Audio-Visual Question Answering, Audio-Visual Segmentation, and Audio-Visual Event Localization.
Authors' comments: 10 pages, 5 figures
Naifu Zhang, Wei Tao, Xi Xiao, Qianpu Sun, Yuxin Zheng, Wentao Mo, Peiqiang Wang, Nan Zhang
In recent years, Vision-Language-Action (VLA) models in embodied intelligence have developed rapidly. However, existing adversarial attack methods require costly end-to-end training and often generate noticeable perturbation patches. To address these limitations, we propose ADVLA, a framework that directly applies adversarial perturbations on features projected from the visual encoder into the textual feature space. ADVLA efficiently disrupts downstream action predictions under low-amplitude constraints, and attention guidance allows the perturbations to be both focused and sparse. We introduce three strategies that enhance sensitivity, enforce sparsity, and concentrate perturbations. Experiments demonstrate that under an $L_{\infty}=4/255$ constraint, ADVLA combined with Top-K masking modifies less than 10% of the patches while achieving an attack success rate of nearly 100%. The perturbations are concentrated on critical regions, remain almost imperceptible in the overall image, and a single-step iteration takes only about 0.06 seconds, significantly outperforming conventional patch-based attacks. In summary, ADVLA effectively weakens downstream action predictions of VLA models under low-amplitude and locally sparse conditions, avoiding the high training costs and conspicuous perturbations of traditional patch attacks, and demonstrates unique effectiveness and practical value for attacking VLA feature spaces.
Jungyeon Koh, Hyeonho Noh, Hyun Jong Yang
In this letter, we propose a group-wise semantic splitting multiple access framework for multi-user semantic communication in downlink scenarios. The framework begins by applying a balanced clustering mechanism that groups users based on the similarity of their semantic characteristics, enabling the extraction of group-level common features and user-specific private features. The base station then transmits the common features via multicast and the private features via unicast, effectively leveraging both shared and user-dependent semantic information. To further enhance semantic separability and reconstruction fidelity, we design a composite loss function that integrates a reconstruction loss with a repulsion loss, improving both the accuracy of semantic recovery and the distinctiveness of common embeddings in the latent space. Simulation results demonstrate that the proposed method achieves up to 3.26% performance improvement over conventional schemes across various channel conditions, validating its robustness and semantic efficiency for next-generation wireless networks.
Huiyun Tang, Long Feng, Yang Li, Feifei Wang
Vertical Federated Learning (VFL) often suffers from client-wise missingness, where entire feature blocks from some clients are unobserved, and conventional approaches are vulnerable to privacy leakage. We propose a Gaussian copulabased framework for VFL data privatization under missingness constraints, which requires no prior specification of downstream analysis tasks and imposes no restriction on the number of analyses. To privately estimate copula parameters, we introduce a debiased randomized response mechanism for correlation matrix estimation from perturbed ranks, together with a nonparametric privatized marginal estimation that yields consistent CDFs even under MAR. The proposed methods comprise VCDS for MCAR data, EVCDS for MAR data, and IEVCDS, which iteratively refines copula parameters to mitigate MAR-induced bias. Notably, EVCDS and IEVCDS also apply under MCAR, and the framework accommodates mixed data types, including discrete variables. Theoretically, we introduce the notion of Vertical Distributed Attribute Differential Privacy (VDADP), tailored to the VFL setting, establish corresponding privacy and utility guarantees, and investigate the utility of privatized data for GLM coefficient estimation and variable selection. We further establish asymptotic properties including estimation and variable selection consistency for VFL-GLMs. Extensive simulations and a real-data application demonstrate the effectiveness of the proposed framework.
Chengyue Huang, Mellon M. Zhang, Robert Azarcon, Glen Chou, Zsolt Kira
Vision-Language-Action (VLA) models inherit strong priors from pretrained Vision-Language Models (VLMs), but naive fine-tuning often disrupts these representations and harms generalization. Existing fixes -- freezing modules or applying uniform regularization -- either overconstrain adaptation or ignore the differing roles of VLA components. We present MAPS (Module-Wise Proximity Scheduling), the first robust fine-tuning framework for VLAs. Through systematic analysis, we uncover an empirical order in which proximity constraints should be relaxed to balance stability and flexibility. MAPS linearly schedules this relaxation, enabling visual encoders to stay close to their pretrained priors while action-oriented language layers adapt more freely. MAPS introduces no additional parameters or data, and can be seamlessly integrated into existing VLAs. Across MiniVLA-VQ, MiniVLA-OFT, OpenVLA-OFT, and challenging benchmarks such as SimplerEnv, CALVIN, LIBERO, as well as real-world evaluations on the Franka Emika Panda platform, MAPS consistently boosts both in-distribution and out-of-distribution performance (up to +30%). Our findings highlight empirically guided proximity to pretrained VLMs as a simple yet powerful principle for preserving broad generalization in VLM-to-VLA transfer.
Cuong Pham, Hoang Anh Dung, Cuong C. Nguyen, Trung Le, Gustavo Carneiro, Thanh-Toan Do
Large language models (LLMs) have significantly advanced natural language processing, but their massive parameter counts create substantial computational and memory challenges during deployment. Post-training quantization (PTQ) has emerged as a promising approach to mitigate these challenges with minimal overhead. While existing PTQ methods can effectively quantize LLMs, they experience substantial accuracy loss at extremely low bit-widths, primarily due to high-impact parameters that significantly influence quantization performance. Several approaches address these issues by identifying and retaining the high-impact parameters in FP16 format. However, they apply fixed ratios of high-impact parameters across all layers, overlooking layer-wise sensitivity variations. In this paper, we propose a quadratic optimization framework that determines layer-specific ratios of high-impact parameters while considering inter-layer dependencies. We quantize high-impact parameters to moderate bit-widths, which often result in negligible performance degradation in quantized LLMs, while the remaining parameters can be quantized to extremely low bit-widths. Under the same resource-constrained budget, this allows for preserving more high-impact parameters than methods that keep selecting a few in FP16 format. Additionally, the proposed framework allows us to leverage an advanced quantization method that often requires extensive learnable parameters solely for high-impact parameters, while applying a computationally efficient method to the rest. Our approach achieves an effective balance between computational efficiency and model accuracy while maintaining high performance compared to state-of-the-art methods.
Luc Bouteille, Alexander Jaus, Jens Kleesiek, Rainer Stiefelhagen, Lukas Heine
Traditional loss functions in medical image segmentation, such as Dice, often under-segment small lesions because their small relative volume contributes negligibly to the overall loss. To address this, instance-wise loss functions and metrics have been proposed to evaluate segmentation quality on a per-lesion basis. We introduce CC-DiceCE, a loss function based on the CC-Metrics framework, and compare it with the existing blob loss. Both are benchmarked against a DiceCE baseline within the nnU-Net framework, which provides a robust and standardized setup. We find that CC-DiceCE loss increases detection (recall) with minimal to no degradation in segmentation performance, albeit at the cost of slightly more false positives. Furthermore, our multi-dataset study shows that CC-DiceCE generally outperforms blob loss.
Authors' comments: 5 pages, 2 figures, 2 tables
Edoardo Colombo, Vasil Dimitrov, Dario Martelli, Alberto Zaffaroni
In this paper, using equivariant localization for foliations, we compute the on-shell action of a general class of supersymmetric solutions of five-dimensional gauged supergravity with vector multiplets. Unlike previous literature, we also allow for a non-trivial topology for the space-time as well as for the gauge fields. In practice, we achieve this by covering the spacetime manifold with patches and localizing in each patch. We derive a general formula for the on-shell action that depends on topological data only and can be used without a detailed knowledge of the solution. Our final result is relevant for the physically interesting examples of multi-center black holes, black rings and black lenses, topological solitons and Euclidean black saddles. We also show how to connect the topological data with the thermodynamic data: electrostatic potential, the magnetic fluxes and the possible flat connections of the solutions. Our formula for the on-shell action is derived in the context of gauged supergravity, but it is straightforward to take the ungauged limit. Thus, we reproduce known results for a large class of explicit asymptotically flat supersymmetric solutions, and we provide predictions for AdS solutions still to be found.
Authors' comments: 88 pages, 2 figures
Ali Abu-Nada, Aryan Iliat, Russell Ceballo
Amplitude damping fundamentally limits qubit lifetimes by irreversibly leaking energy and information into the environment. Standard Wiseman--Milburn feedback offers only modest improvement because it acts on a single measured quadrature and its corrective drive is degraded by loop delay. We introduce a compact hybrid upgrade with two components: (i) a coherently coupled \emph{ancilla} qubit that receives the homodyne current and feeds back \emph{quantum-coherently} on the system, recovering information from \emph{both} field quadratures and intentionally engineered to decay much faster than the system; and (ii) a lightweight supervised predictor that forecasts the near-future homodyne current, phase-aligning the correction to overcome hardware latency. A Lindblad treatment yields closed-form effective decay rates: the ancilla suppresses the emission channel by a cooperativity factor, while the predictor further suppresses the residual decay in proportion to forecast quality. Using IBM-scale parameters (baseline \(T_1 = 50~μ\mathrm{s}\)), numerical simulations surpass the W--M limit, achieving \(\sim 3\!-\!4\times\) longer \(T_1\) together with improved population retention and integrated energy. The method is modular and hardware-compatible: ancilla coupling and supervised prediction can be added to existing W--M loops to convert leaked information into a precise, time-advanced corrective drive. We also include a detailed, student-friendly derivation of the effective rates for both ancilla-assisted and prediction-enhanced feedback, making the impact of each design element analytically transparent.
Authors' comments: 13 pages, 8 figures. Intended for submission to IEEE QCNC 2026. Comments welcome
Joseph Onuegbu, Dafne Guetta, Yael Hillman, Volker Perdelwitz, Massimo Della Valle
We present a novel approach for characterizing nova candidates by exploiting the infrared capabilities of the Wide-field Infrared Survey Explorer (WISE) catalog. We developed a pipeline to identify novae based on well-defined infrared criteria, and leveraging this pipeline, we successfully identified 41 optically confirmed novae in the WISE catalog. In particular, we focus on the color difference between the optical V band and the WISE 3.4 microns W1 band as a diagnostic. We compared their infrared light curves with their optical counterparts. We identified a strong correlation from which we proposed a color difference model that can be used for further identification and characterization of novae. Our analysis validates the mass-loss timescale theory, which predicts that systems with lower accretion rates accumulate larger envelopes and produce more massive ejecta. We also confirm models' prediction that the early color evolution of novae is governed by ejecta expansion and cooling. From our sample statistics, we infer a Galactic nova rate of approximately 40 to 50 novae per year, consistent with modern and infrared-corrected estimates. The resultant model from this work paves the way for future large-scale investigations of nova candidates.
Farhana Amin, Sabiha Afroz, Kanchon Gharami, Mona Moghadampanah, Dimitrios S. Nikolopoulos
Diffusion models produce high quality images but inference is costly due to many denoising steps and heavy matrix operations. We present DiffPro, a post-training, hardware-faithful framework that works with the exact integer kernels used in deployment and jointly tunes timesteps and per-layer precision in Diffusion Transformers (DiTs) to reduce latency and memory without any training. DiffPro combines three parts: a manifold-aware sensitivity metric to allocate weight bits, dynamic activation quantization to stabilize activations across timesteps, and a budgeted timestep selector guided by teacher-student drift. In experiments DiffPro achieves up to 6.25x model compression, fifty percent fewer timesteps, and 2.8x faster inference with Delta FID <= 10 on standard benchmarks, demonstrating practical efficiency gains. DiffPro unifies step reduction and precision planning into a single budgeted deployable plan for real-time energy-aware diffusion inference.
Jiahao Wang, Bokang Fu, Yu Zhu, Yuli Liu
LLM-based agents are emerging as a promising paradigm for simulating user behavior to enhance recommender systems. However, their effectiveness is often limited by existing studies that focus on modeling user ratings for individual items. This point-wise approach leads to prevalent issues such as inaccurate user preference comprehension and rigid item-semantic representations. To address these limitations, we propose the novel Set-wise Reflective Learning Framework (SRLF). Our framework operationalizes a closed-loop "assess-validate-reflect" cycle that harnesses the powerful in-context learning capabilities of LLMs. SRLF departs from conventional point-wise assessment by formulating a holistic judgment on an entire set of items. It accomplishes this by comprehensively analyzing both the intricate interrelationships among items within the set and their collective alignment with the user's preference profile. This method of set-level contextual understanding allows our model to capture complex relational patterns essential to user behavior, making it significantly more adept for sequential recommendation. Extensive experiments validate our approach, confirming that this set-wise perspective is crucial for achieving state-of-the-art performance in sequential recommendation tasks.
Jawad Ibn Ahad, Muhammad Rafsan Kabir, Robin Krambroeckers, Sifat Momen, Nabeel Mohammed, Shafin Rahman
Natural Language Processing (NLP) has transformed the financial industry, enabling advancements in areas such as textual analysis, risk management, and forecasting. Large language models (LLMs) like BloombergGPT and FinMA have set new benchmarks across various financial NLP tasks, including sentiment analysis, stock movement prediction, and credit risk assessment. Furthermore, FinMA-ES, a bilingual financial LLM, has also demonstrated strong performance using the FLARE and FLARE-ES benchmarks. However, the high computational demands of these models limit the accessibility of many organizations. To address this, we propose Layer-wise Adaptive Ensemble Tuning (LAET), a novel strategy that selectively fine-tunes the most effective layers of pre-trained LLMs by analyzing hidden state representations while freezing less critical layers. LAET significantly reduces computational overhead while enhancing task-specific performance. Our approach shows strong results in financial NLP tasks, outperforming existing benchmarks and state-of-the-art LLMs such as GPT-4, even with smaller LLMs ($\sim$3B parameters). This work bridges cutting-edge financial NLP research and real-world deployment with efficient and scalable models for financial applications.
Ci Lin, Tet Yeap, Iluju Kiringa, Biwei Zhang
Deep neural networks are known to be vulnerable to adversarial perturbations, which are small and carefully crafted inputs that lead to incorrect predictions. In this paper, we propose DeepDefense, a novel defense framework that applies Gradient-Feature Alignment (GFA) regularization across multiple layers to suppress adversarial vulnerability. By aligning input gradients with internal feature representations, DeepDefense promotes a smoother loss landscape in tangential directions, thereby reducing the model's sensitivity to adversarial noise.
We provide theoretical insights into how adversarial perturbation can be decomposed into radial and tangential components and demonstrate that alignment suppresses loss variation in tangential directions, where most attacks are effective. Empirically, our method achieves significant improvements in robustness across both gradient-based and optimization-based attacks. For example, on CIFAR-10, CNN models trained with DeepDefense outperform standard adversarial training by up to 15.2% under APGD attacks and 24.7% under FGSM attacks. Against optimization-based attacks such as DeepFool and EADEN, DeepDefense requires 20 to 30 times higher perturbation magnitudes to cause misclassification, indicating stronger decision boundaries and a flatter loss landscape. Our approach is architecture-agnostic, simple to implement, and highly effective, offering a promising direction for improving the adversarial robustness of deep learning models.
Authors' comments: no available