Longlong Li, Cunquan Qu, Guanghui Wang
Conventional Graph Neural Networks (GNNs) aggregate neighbor embeddings as holistic vectors, lacking the ability to identify fine-grained, direction-specific feature relevance. We propose MSH-GNN (Multi-Scale Harmonic Graph Neural Network), a novel architecture that performs feature-wise adaptive message passing through node-specific harmonic projections. For each node, MSH-GNN dynamically projects neighbor features onto frequency-sensitive directions determined by the target node's own representation. These projections are further modulated using learnable sinusoidal encodings at multiple frequencies, enabling the model to capture both smooth and oscillatory structural patterns across scales. A frequency-aware attention pooling mechanism is introduced to emphasize spectrally and structurally salient nodes during readout. Theoretically, we prove that MSH-GNN approximates shift-invariant kernels and matches the expressive power of the 1-Weisfeiler-Lehman (1-WL) test. Empirically, MSH-GNN consistently outperforms state-of-the-art models on a wide range of graph and node classification tasks. Furthermore, in challenging classification settings involving joint variations in graph topology and spectral frequency, MSH-GNN excels at capturing structural asymmetries and high-frequency modulations, enabling more accurate graph discrimination.
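As a rough illustration of the harmonic projection idea in the MSH-GNN abstract above, the sketch below projects a neighbor feature vector onto a direction derived from the target node, then expands the scalar projection with sine and cosine terms at several frequencies. The function names, the fixed frequency list, and the use of the normalized target feature as the projection direction are assumptions made for illustration. The paper's learnable projections and encodings are not reproduced here.

```python
import math

def encode(projection, frequencies=(1.0, 2.0, 4.0)):
    """Expand a scalar with sin/cos terms at each frequency (illustrative)."""
    encoded = []
    for f in frequencies:
        encoded.append(math.sin(f * projection))
        encoded.append(math.cos(f * projection))
    return encoded

def harmonic_features(target, neighbor, frequencies=(1.0, 2.0, 4.0)):
    """Project `neighbor` onto the unit direction of `target`, then encode it.

    Assumption: the direction is simply the normalized target feature; a
    trained model would use a learned, node-specific projection instead.
    """
    norm = math.sqrt(sum(t * t for t in target)) or 1.0
    direction = [t / norm for t in target]
    projection = sum(d * x for d, x in zip(direction, neighbor))
    return projection, encode(projection, frequencies)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

proj, enc = harmonic_features([3.0, 4.0], [1.0, 2.0])
```

The inner product of two such encodings depends only on the difference between the two projections, since sin(fa)sin(fb) + cos(fa)cos(fb) = cos(f(a - b)). This is the usual link between sinusoidal features and shift-invariant kernels.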
Ziliang Wang, Xuhui Zheng, Kang An, Cijun Ouyang, Jialu Cai, Yuhang Wang, Yichao Wu
Efficient multi-hop reasoning requires Large Language Model (LLM)-based
agents to acquire high-value external knowledge iteratively. Previous work has
explored reinforcement learning (RL) to train LLMs to perform search-based
document retrieval, achieving notable improvements in QA performance, but
these methods underperform on complex, multi-hop QA because their rewards are
sparse and come from a global signal only. To address this gap, we introduce
StepSearch, a framework for search LLMs that is trained with a step-wise
proximal policy optimization method. It uses richer and more detailed
intermediate search rewards and token-level process supervision based on
information gain and redundancy penalties to better guide each search step. We
constructed a fine-grained question-answering dataset containing
sub-question-level search trajectories, built from open-source datasets through
a data pipeline. On standard multi-hop QA benchmarks, StepSearch
significantly outperforms global-reward baselines, achieving 11.2% and 4.2%
absolute improvements for 3B and 7B models over various search-with-RL
baselines using only 19k training examples, demonstrating the effectiveness of
fine-grained, stepwise supervision in optimizing deep search LLMs. Our code
will be released at https://github.com/Zillwang/StepSearch.
Authors' comments: 20 pages, 6 figures
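To make the StepSearch abstract's notion of a step-level reward concrete, the sketch below scores a single search step as information gain minus a redundancy penalty. The specific definitions are assumptions chosen for illustration: gain is measured as newly covered gold documents, redundancy as the share of retrieved documents already seen, and `redundancy_weight` is a hypothetical parameter. None of these are the paper's exact formulation.

```python
def step_reward(retrieved, seen, gold, redundancy_weight=0.5):
    """Reward for one search step (hypothetical definition).

    retrieved: document ids returned by this step
    seen:      document ids returned by earlier steps
    gold:      document ids that support the answer
    """
    retrieved, seen, gold = set(retrieved), set(seen), set(gold)
    if not retrieved or not gold:
        return 0.0
    covered_before = len(seen & gold) / len(gold)
    covered_after = len((seen | retrieved) & gold) / len(gold)
    info_gain = covered_after - covered_before
    # Share of this step's results that earlier steps had already fetched.
    redundancy = len(retrieved & seen) / len(retrieved)
    return info_gain - redundancy_weight * redundancy

gold = ["d1", "d2"]
first = step_reward(["d1", "d9"], [], gold)             # new gold doc found
second = step_reward(["d1", "d3"], ["d1", "d9"], gold)  # repeats d1, no gain
```

Under these assumptions the first step earns a positive reward and the second a negative one, which is the kind of per-step signal a global end-of-episode reward cannot provide.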
Guoming Li, Jian Yang, Yifan Chen
Filtering-based graph neural networks (GNNs) constitute a distinct class of
GNNs that employ graph filters to handle graph-structured data, achieving
notable success in various graph-related tasks. Conventional methods adopt a
graph-wise filtering paradigm, imposing a uniform filter across all nodes, yet
recent findings suggest that this rigid paradigm struggles with heterophilic
graphs. To overcome this, recent works have introduced node-wise filtering,
which assigns distinct filters to individual nodes, offering enhanced
adaptability. However, a fundamental gap remains: a comprehensive framework
unifying these two strategies is still absent, limiting theoretical insights
into the filtering paradigms. Moreover, through the lens of the Contextual
Stochastic Block Model, we reveal that a synthesis of graph-wise and node-wise
filtering provides a sufficient solution for classification on graphs
exhibiting both homophily and heterophily, suggesting the risk of excessive
parameterization and potential overfitting with node-wise filtering. To address
the limitations, this paper introduces Coarsening-guided Partition-wise
Filtering (CPF). CPF innovates by performing filtering on node partitions. The
method begins with structure-aware partition-wise filtering, which filters node
partitions obtained via graph coarsening algorithms, and then performs
feature-aware partition-wise filtering, refining node embeddings via filtering
on clusters produced by $k$-means clustering over features. In-depth analysis
is conducted for each phase of CPF, showing its superiority over other
paradigms. Finally, benchmark node classification experiments, along with a
real-world graph anomaly detection application, validate CPF's efficacy and
practical utility.
Authors' comments: Accepted at the 31st ACM SIGKDD Conference on Knowledge Discovery and
Data Mining, KDD 2025 February Cycle
Jan Hurt, Stefan Thurner, Peter Klimek
Dynamic input-output models are standard tools for understanding inter-industry dependencies and how economies respond to shocks like disasters and pandemics. However, traditional approaches often assume fixed prices, limiting their ability to capture realistic economic behavior. Here, we introduce an adaptive extension to dynamic input-output recovery models where producers respond to shocks through simultaneous price and quantity adjustments. Our framework preserves the economic constraints of the Leontief input-output model while converging towards equilibrium configurations based on sector-specific behavioral parameters. When applied to input-output data, the model allows us to compute behavioral metrics indicating whether specific sectors predominantly favor price or quantity adjustments. Using the World Input-Output Database, we identify strong, consistent regional and sector-specific behavioral patterns. These findings provide insights into how different regions employ distinct strategies to manage shocks, thereby influencing economic resilience and recovery dynamics.
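For readers unfamiliar with the Leontief constraint mentioned in the abstract above, the sketch below solves the static balance x = A x + d for a two-sector economy by fixed-point iteration. The coefficient matrix and final demand are made-up numbers, and the code shows only the classical fixed-price model, not the authors' adaptive price and quantity dynamics.

```python
def leontief_output(A, d, iterations=200):
    """Gross output x satisfying x = A x + d, found by repeated substitution.

    A[i][j] is the input from sector i needed per unit of output of sector j.
    The iteration converges when the spectral radius of A is below 1, which
    holds for the illustrative matrix used here.
    """
    n = len(d)
    x = list(d)
    for _ in range(iterations):
        x = [d[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x

A = [[0.2, 0.3],
     [0.4, 0.1]]       # hypothetical technical coefficients
d = [10.0, 20.0]       # hypothetical final demand
x = leontief_output(A, d)   # approximately [25.0, 33.33]
```

A shock to final demand `d` propagates to every sector's gross output through `A`, which is the inter-industry dependency that dynamic recovery models build on.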
Alexandre Moreira, Patricia Silva, Miguel Heleno, Andre Marcato
Long-duration energy storage (LDES) assets can be fundamental resources for next-generation power systems. However, LDES technologies are still immature and their future technology costs remain highly uncertain. In this context, we perform an extensive study to estimate the maximum LDES technology costs (which we define as viability costs) under which LDES systems would be economically viable in each state of the contiguous U.S. according to their characteristics. Our results indicate that only 4 states (out of 48) would be able to remove firm conventional generation supported by LDES systems without increasing their total system costs under the current US-DOE cost target of 1,100 US$/kW for multi-day LDES. In addition, we find that states with the highest LDES viability costs have in general low participation of thermal generation, a high share of wind generation, and higher thermal-related fixed operation and maintenance (FO&M) costs.
Sofia Casarin, Sergio Escalera, Oswald Lanz
Training-free Neural Architecture Search (NAS) efficiently identifies
high-performing neural networks using zero-cost (ZC) proxies. Unlike multi-shot
and one-shot NAS approaches, ZC-NAS is both (i) time-efficient, eliminating the
need for model training, and (ii) interpretable, with proxy designs often
theoretically grounded. Despite rapid developments in the field, current SOTA
ZC proxies are typically constrained to well-established convolutional search
spaces. With the rise of Large Language Models shaping the future of deep
learning, this work extends ZC proxy applicability to Vision Transformers
(ViTs). We present a new benchmark using the Autoformer search space evaluated
on 6 distinct tasks and propose Layer-Sample Wise Activation with Gradients
information (L-SWAG), a novel, generalizable metric that characterizes both
convolutional and transformer architectures across 14 tasks. Additionally,
previous works highlighted how different proxies contain complementary
information, motivating the need for an ML model to identify useful
combinations. To further enhance ZC-NAS, we therefore introduce LIBRA-NAS (Low
Information gain and Bias Re-Alignment), a method that strategically combines
proxies to best represent a specific benchmark. Integrated into the NAS search,
LIBRA-NAS outperforms evolution and gradient-based NAS techniques by
identifying an architecture with a 17.0% test error on ImageNet1k in just 0.1
GPU days.
Authors' comments: accepted at CVPR 2025
Austin Braniff, Yuhe Tian
This work formally introduces Y-wise Affine Neural Networks (YANNs), a fully explainable network architecture that continuously and efficiently represents piecewise affine functions with polytopic subdomains. Following from the proofs, it is shown that the development of YANNs requires no training to achieve the functionally equivalent representation. YANNs thus maintain all mathematical properties of the original formulations. Multi-parametric model predictive control, which theoretically computes optimal control laws as a piecewise affine function of states, outputs, setpoints, and disturbances, is used as an application showcase of YANNs. With the exact representation of multi-parametric control laws, YANNs retain essential control-theoretic guarantees such as recursive feasibility and stability. This sets YANNs apart from existing works that apply neural networks to approximate optimal control laws instead of exactly representing them. By optimizing the inference speed of the networks, YANNs can be evaluated substantially faster in real time than traditional piecewise affine function calculations. Numerical case studies are presented to demonstrate the algorithmic scalability with respect to the input/output dimensions and the number of subdomains. YANNs represent a significant advancement in control as the first neural network-based controller that inherently ensures both feasibility and stability. Future applications can leverage them as an efficient and interpretable starting point for data-driven modeling/control.
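The "traditional piecewise affine function calculation" that the YANN abstract above uses as a point of comparison can be sketched as a sequential search over polytopic regions followed by an affine evaluation. The region encoding (H, k, F, g) and the saturating one-dimensional control law below are illustrative assumptions. This is the conventional lookup, not the YANN construction itself.

```python
def evaluate_pwa(regions, x, tol=1e-9):
    """Evaluate a piecewise affine function by sequential region search.

    Each region is (H, k, F, g): the polytope {x : H x <= k} on which the
    function equals F x + g. Returns the value on the first region
    that contains x.
    """
    for H, k, F, g in regions:
        inside = all(
            sum(h * xi for h, xi in zip(row, x)) <= ki + tol
            for row, ki in zip(H, k)
        )
        if inside:
            return [sum(f * xi for f, xi in zip(row, x)) + gi
                    for row, gi in zip(F, g)]
    raise ValueError("x lies outside every region")

# Hypothetical saturated law: u = -1 for x <= -1, u = x on [-1, 1], u = 1 for x >= 1.
regions = [
    ([[1.0]], [-1.0], [[0.0]], [-1.0]),
    ([[-1.0], [1.0]], [1.0, 1.0], [[1.0]], [0.0]),
    ([[-1.0]], [-1.0], [[0.0]], [1.0]),
]
u = evaluate_pwa(regions, [0.3])
```

The cost of this lookup grows with the number of regions, which is why the evaluation speed of an exact network representation matters when there are many subdomains.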
Moritz Vandenhirtz, Julia E. Vogt
Understanding the decision-making process of machine learning models provides
valuable insights into the task, the data, and the reasons behind a model's
failures. In this work, we propose a method that performs inherently
interpretable predictions through the instance-wise sparsification of input
images. To align the sparsification with human perception, we learn the masking
in the space of semantically meaningful pixel regions rather than at the
pixel level. Additionally, we introduce an explicit way to dynamically
determine the required level of sparsity for each instance. We show empirically
on semi-synthetic and natural image datasets that our inherently interpretable
classifier produces more meaningful, human-understandable predictions than
state-of-the-art benchmarks.
Authors' comments: International Conference on Machine Learning
Ingrid Pelisoli, T. R. Marsh, G. Tovmassian, L. A. Amaral, Amornrat Aungwerojwit, M. J. Green, R. P. Ashley, David A. H. Buckley et al.
After its discovery in 2016, the white dwarf binary AR Scorpii (AR Sco)
remained for several years the only white dwarf system to show pulsed radio
emission associated with a fast-spinning white dwarf. The evolutionary origin
and the emission mechanism for AR Sco are not completely understood, with
different models proposed. Testing and improving these models requires
observational input. Here we report the results of a targeted search for other
binary white dwarf pulsars like AR Sco. Using data from Gaia and WISE, we
identified 56 candidate systems with similar properties to AR Sco, of which 26
were previously uncharacterised. These were subject to spectroscopic and
photometric follow-up observations. Aside from one new binary white dwarf
pulsar found, J191213.72-441045.1, which was reported in a separate work, we
find no other systems whose characteristics are akin to AR Sco. The newly
characterised systems are primarily young stellar objects (with 10 found) or
cataclysmic variables (7 identifications), with the remainder being either
blended or non-variable on short timescales.
Authors' comments: 17 pages, 21 figures. Accepted for publication in MNRAS
Shiwei Guo, Ziang Chen, Yupeng Ma, Yunfei Han, Yi Wang
The Transformer model has shown strong performance in multivariate time series forecasting by leveraging channel-wise self-attention. However, this approach lacks temporal constraints when computing temporal features and does not utilize cumulative historical series effectively. To address these limitations, we propose the Structured Channel-wise Transformer with Cumulative Historical state (SCFormer). SCFormer introduces temporal constraints to all linear transformations, including the query, key, and value matrices, as well as the fully connected layers within the Transformer. Additionally, SCFormer employs High-order Polynomial Projection Operators (HiPPO) to deal with cumulative historical time series, allowing the model to incorporate information beyond the look-back window during prediction. Extensive experiments on multiple real-world datasets demonstrate that SCFormer significantly outperforms mainstream baselines, highlighting its effectiveness in enhancing time series forecasting. The code is publicly available at https://github.com/ShiweiGuo1995/SCFormer
Avalpreet Singh Brar, Rong Su, Christos G. Cassandras, Gioele Zardini
Traditional rebalancing methods in ride-hailing systems direct idle drivers to fixed destinations, overlooking the fact that ride allocations frequently occur while cruising. This destination-centric view fails to exploit the path-dependent nature of modern platforms, where real-time matching depends on the entire trajectory rather than a static endpoint. We propose the Wise Goose Chase (WGC) algorithm, an event-triggered, driver-specific path planning framework that anticipates future matching opportunities by forecasting spatio-temporal supply and demand dynamics. WGC uses a system of Retarded Functional Differential Equations (RFDEs) to model the evolution of idle driver density and passenger queues at the road-segment level, incorporating both en-route matching and competition among drivers. Upon request, WGC computes personalized cruising paths that minimize each driver's expected time to allocation. Monte Carlo simulations on synthetic urban networks show that WGC consistently outperforms baseline strategies, highlighting the advantage of predictive, context-aware rebalancing in dynamic mobility systems.
Yu-Hsiang Lan, Eric K. Oermann
There has been a recent surge of interest in time series modeling using the Transformer architecture. However, forecasting multivariate time series with Transformer presents a unique challenge as it requires modeling both temporal (cross-time) and variate (cross-variate) dependencies. While Transformer-based models have gained popularity for their flexibility in capturing both sequential and cross-variate relationships, it is unclear how to best integrate these two sources of information in the context of the Transformer architecture while optimizing for both performance and efficiency. We re-purpose the Transformer architecture to effectively model both cross-time and cross-variate dependencies. Our approach begins by embedding each variate independently into a variate-wise representation that captures its cross-time dynamics, and then models cross-variate dependencies through attention mechanisms on these learned embeddings. Gating operations in both cross-time and cross-variate modeling phases regulate information flow, allowing the model to focus on the most relevant features for accurate predictions. Our method achieves state-of-the-art performance across 13 real-world datasets and can be seamlessly integrated into other Transformer-based and LLM-based forecasters, delivering performance improvements up to 20.7\% over original models. Code is available at this repository: https://github.com/nyuolab/Gateformer.
Raffaele Di Santo, Dikran Dikranjan, Anna Giordano Bruno, Hans Weber
According to Cartan, given an ideal $\mathcal I$ of $\mathbb N$, a sequence $(x_n)_{n\in\mathbb N}$ in the circle group $\mathbb T$ is said to {\em $\mathcal I$-converge} to a point $x\in \mathbb T$ if $\{n\in \mathbb N: x_n \not \in U\}\in \mathcal I$ for every neighborhood $U$ of $x$ in $\mathbb T$. For a sequence $\mathbf u=(u_n)_{n\in\mathbb N}$ in $\mathbb Z$, let $$t_{\mathbf u}^\mathcal I(\mathbb T) :=\{x\in \mathbb T: u_nx \ \text{$\mathcal I$-converges to}\ 0 \}.$$ This set is a Borel (hence, Polishable) subgroup of $\mathbb T$ with many nice properties, largely studied in the case when $\mathcal I = \mathcal F in$ is the ideal of all finite subsets of $\mathbb N$ (so $\mathcal F in$-convergence coincides with the usual one) for its remarkable connection to topological algebra, descriptive set theory and harmonic analysis. We give a complete element-wise description of $t_{\mathbf u}^\mathcal I(\mathbb T)$ when $u_n\mid u_{n+1}$ for every $n\in\mathbb N$ and under suitable hypotheses on $\mathcal I$. In the special case when $\mathcal I =\mathcal F in$, we obtain an alternative proof of a simplified version of a known result.
Pengxiang Li, Zhi Gao, Bofei Zhang, Yapeng Mi, Xiaojian Ma, Chenrui Shi, Tao Yuan, Yuwei Wu et al.
Multimodal agents, which integrate a controller (e.g., a vision language
model) with external tools, have demonstrated remarkable capabilities in
tackling complex multimodal tasks. Existing approaches for training these
agents, both supervised fine-tuning and reinforcement learning, depend on
extensive human-annotated task-answer pairs and tool trajectories. However, for
complex multimodal tasks, such annotations are prohibitively expensive or
impractical to obtain. In this paper, we propose an iterative tool usage
exploration method for multimodal agents without any pre-collected data, namely
SPORT, via step-wise preference optimization to refine the trajectories of tool
usage. Our method enables multimodal agents to autonomously discover effective
tool usage strategies through self-exploration and optimization, eliminating
the bottleneck of human annotation. SPORT has four iterative components: task
synthesis, step sampling, step verification, and preference tuning. We first
synthesize multimodal tasks using language models. Then, we introduce a novel
trajectory exploration scheme, where step sampling and step verification are
executed alternately to solve synthesized tasks. In step sampling, the agent
tries different tools and obtains corresponding results. In step verification,
we employ a verifier to provide AI feedback to construct step-wise preference
data. The data is subsequently used to update the controller for tool usage
through preference tuning, producing a SPORT agent. By interacting with real
environments, the SPORT agent gradually evolves into a more refined and capable
system. Evaluation on the GTA and GAIA benchmarks shows that the SPORT agent
achieves 6.41% and 3.64% improvements, underscoring the generalization and
effectiveness introduced by our method. The project page is
https://SPORT-Agents.github.io.
Authors' comments: 24 pages
Changjun Li, Runqing Jiang, Zhuo Song, Pengpeng Yu, Ye Zhang, Yulan Guo
Post-training quantization (PTQ) has evolved as a prominent solution for compressing complex models, which requires only a small calibration dataset and avoids end-to-end retraining. However, most existing PTQ methods employ block-wise reconstruction, which neglects cross-block dependency and exhibits a notable accuracy drop in low-bit cases. To address these limitations, this paper presents a novel PTQ method, dubbed Pack-PTQ. First, we design a Hessian-guided adaptive packing mechanism to partition blocks into non-overlapping packs, which serve as the base unit for reconstruction, thereby preserving the cross-block dependency and enabling accurate estimation of quantization parameters. Second, based on the pack configuration, we propose a mixed-precision quantization approach to assign varied bit-widths to packs according to their distinct sensitivities, thereby further enhancing performance. Extensive experiments on 2D image and 3D point cloud classification tasks, using various network architectures, demonstrate the superiority of our method over the state-of-the-art PTQ methods.
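The motivation for assigning different bit-widths to different packs can be seen in a minimal uniform quantizer: the same weights incur a much larger reconstruction error at 2 bits than at 8 bits. The min-max quantizer and the toy weight list below are generic illustrations and assumptions. They are not Pack-PTQ's Hessian-guided packing or its reconstruction procedure.

```python
def quantize(values, bits):
    """Uniform min-max quantization of a list of floats to 2**bits levels."""
    lo, hi = min(values), max(values)
    steps = 2 ** bits - 1
    scale = (hi - lo) / steps if hi > lo else 1.0
    # Snap each value to the nearest grid point, then map back to float.
    return [lo + round((v - lo) / scale) * scale for v in values]

def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

weights = [-1.0, -0.4, 0.1, 0.35, 0.9]   # hypothetical weights of one pack
error_2bit = mse(weights, quantize(weights, 2))
error_8bit = mse(weights, quantize(weights, 8))
```

A mixed-precision scheme would spend the higher bit-width on packs where this kind of error hurts the output most and the lower bit-width elsewhere.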
Ke Hong, Lufang Chen, Zhong Wang, Xiuhong Li, Qiuli Mao, Jianping Ma, Chao Xiong, Guanyu Wu et al.
Existing large language model (LLM) serving systems fall into two categories:
1) a unified system where prefill phase and decode phase are co-located on the
same GPU, sharing the unified computational resource and storage, and 2) a
disaggregated system where the two phases are disaggregated to different GPUs.
The design of the disaggregated system addresses the latency interference and
sophisticated scheduling issues in the unified system but leads to storage
challenges including 1) replicated weights for both phases that prevent
flexible deployment, 2) KV cache transfer overhead between the two phases, 3)
storage imbalance that causes substantial wasted space of the GPU capacity, and
4) suboptimal resource adjustment arising from the difficulties in migrating KV
cache. Such storage inefficiency delivers poor serving performance under high
request rates.
In this paper, we identify that the advantage of the disaggregated system
lies in the disaggregated computation, i.e., partitioning the computational
resource to enable the asynchronous computation of two phases. Thus, we propose
a novel LLM serving system, semi-PD, characterized by disaggregated computation
and unified storage. In semi-PD, we introduce a computation resource controller
to achieve disaggregated computation at the streaming multi-processor (SM)
level, and a unified memory manager to manage the asynchronous memory access
from both phases. semi-PD has a low-overhead resource adjustment mechanism
between the two phases, and a service-level objective (SLO) aware dynamic
partitioning algorithm to optimize the SLO attainment. Compared to
state-of-the-art systems, semi-PD maintains lower latency at higher request
rates, reducing the average end-to-end latency per request by 1.27-2.58x on
DeepSeek series models, and serves 1.55-1.72x more requests adhering to latency
constraints on Llama series models.
Authors' comments: 18 pages, 16 figures
Brian K. S. Isaac-Medina, Toby P. Breckon
Deep neural networks have demonstrated great generalization capabilities for
tasks whose training and test sets are drawn from the same distribution.
Nevertheless, out-of-distribution (OOD) detection remains a challenging task
that has received significant attention in recent years. Specifically, OOD
detection refers to the detection of instances that do not belong to the
training distribution, while still having good performance on the
in-distribution task (e.g., classification or object detection). Recent work
has focused on generating synthetic outliers and using them to train an outlier
detector, generally achieving better OOD detection than traditional OOD
methods. In this regard, outliers can be generated in either feature or pixel
space. Feature-space methods have shown strong performance on both the
classification and object detection tasks, at the cost that the training
outliers cannot be visualized, making further analysis of
OOD failure modes challenging. On the other hand, pixel-space outlier
generation techniques enabled by diffusion models have been used for image
classification, providing improved OOD detection performance and outlier
visualization, although their adaptation to the object detection task is as yet
unexplored. We therefore introduce Dream-Box, a method that provides a link to
object-wise outlier generation in the pixel space for OOD detection.
Specifically, we use diffusion models to generate object-wise outliers that are
used to train an object detector for an in-distribution task and OOD detection.
Our method achieves comparable performance to previous traditional methods
while being the first technique to provide concrete visualization of generated
OOD objects.
Authors' comments: 9 pages, 6 figures, 2 tables, LatinX in AI CVPR 2025 Workshop
Dasol Jeong, Donggoo Kang, Jiwon Park, Hyebean Lee, Joonki Paik
We propose a diffusion-based framework for zero-shot image editing that unifies text-guided and reference-guided approaches without requiring fine-tuning. Our method leverages diffusion inversion and timestep-specific null-text embeddings to preserve the structural integrity of the source image. By introducing a stage-wise latent injection strategy (shape injection in early steps and attribute injection in later steps), we enable precise, fine-grained modifications while maintaining global consistency. Cross-attention with reference latents facilitates semantic alignment between the source and reference. Extensive experiments across expression transfer, texture transformation, and style infusion demonstrate state-of-the-art performance, confirming the method's scalability and adaptability to diverse image editing scenarios.
Hanyu Zhang, Zhen Xing, Wenxuan Yang, Chenxi Ma, Weimin Tan, Bo Yan
As transfer learning models and datasets grow larger, efficient adaptation
and storage optimization have become critical needs. Coreset selection
addresses these challenges by identifying and retaining the most informative
samples, constructing a compact subset for target domain training. However,
current methods primarily rely on instance-level difficulty assessments,
overlooking crucial category-level characteristics and consequently
under-representing minority classes. To overcome this limitation, we propose
Non-Uniform Class-Wise Coreset Selection (NUCS), a novel framework that
integrates both class-level and instance-level criteria. NUCS automatically
allocates data selection budgets for each class based on intrinsic category
difficulty and adaptively selects samples within optimal difficulty ranges. By
explicitly incorporating category-specific insights, our approach achieves a
more balanced and representative coreset, addressing key shortcomings of prior
methods. Comprehensive theoretical analysis validates the rationale behind
adaptive budget allocation and sample selection, while extensive experiments
across 14 diverse datasets and model architectures demonstrate NUCS's
consistent improvements over state-of-the-art methods, achieving superior
accuracy and computational efficiency. Notably, on CIFAR100 and Food101, NUCS
matches full-data training accuracy while retaining just 30% of samples and
reducing computation time by 60%. Our work highlights the importance of
characterizing category difficulty in coreset selection, offering a robust and
data-efficient solution for transfer learning.
Authors' comments: 11 pages
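The idea of non-uniform, class-wise budgets in the NUCS abstract above can be illustrated with a simple proportional rule: each class receives a share of the total selection budget that grows with its size and its difficulty score. The difficulty scores, the proportional rule, and the rounding-down behavior are all assumptions made for this sketch and are not the criterion used by NUCS.

```python
def allocate_budgets(class_sizes, difficulty, total_budget):
    """Split `total_budget` samples across classes (hypothetical rule).

    Each class gets a share proportional to size * difficulty, rounded down
    and capped at the class size, so a few samples may remain unassigned.
    """
    weights = {c: class_sizes[c] * difficulty[c] for c in class_sizes}
    total_weight = sum(weights.values())
    return {
        c: min(class_sizes[c], int(total_budget * w / total_weight))
        for c, w in weights.items()
    }

class_sizes = {"cat": 100, "dog": 100, "owl": 20}   # hypothetical dataset
difficulty = {"cat": 0.2, "dog": 0.5, "owl": 0.9}   # hypothetical scores
budgets = allocate_budgets(class_sizes, difficulty, total_budget=60)
```

With these made-up numbers the small but difficult class keeps more than half of its samples, whereas a uniform per-class retention rate would keep the same fraction of every class.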
Thanh-Dung Le, Vu Nguyen Ha, Ti Ti Nguyen, Geoffrey Eappen, Prabhu Thiruvasagam, Hong-fu Chou, Duc-Dung Tran, Hung Nguyen-Kha et al.
This study introduces ResNet-GLUSE, a lightweight ResNet variant enhanced
with Gated Linear Unit-enhanced Squeeze-and-Excitation (GLUSE), an adaptive
channel-wise attention mechanism. By integrating dynamic gating into the
traditional SE framework, GLUSE improves feature recalibration while
maintaining computational efficiency. Experiments on the EuroSAT and PatternNet
datasets confirm its effectiveness, exceeding \textbf{94\% and 98\%
accuracy}, respectively. While \textbf{MobileViT achieves 99\% accuracy},
ResNet-GLUSE offers \textbf{33x fewer parameters, 27x fewer FLOPs, 33x smaller
model size (MB), $\approx$6x lower power consumption (W), and $\approx$3x
faster inference time (s)}, making it significantly more efficient for onboard
satellite deployment. Furthermore, due to its simplicity, ResNet-GLUSE can be
easily mimicked for \textbf{neuromorphic computing}, enabling ultra-low power
inference at just \textbf{852.30 mW} on Akida Brainchip. This balance between
high accuracy and ultra-low resource consumption establishes ResNet-GLUSE as a
practical solution for real-time Earth Observation (EO) tasks. Reproducible
code is available in our shared repository.
Authors' comments: Under review. arXiv admin note: text overlap with arXiv:2411.00209