Shehroz S. Khan, Ali Abedi, Charlene H. Chu
Interpreting large volumes of high-dimensional, unlabeled data in a manner
that is comprehensible to humans remains a significant challenge across various
domains. In unsupervised healthcare data analysis, interpreting clustered data
can offer meaningful insights into patients' health outcomes, which hold direct
implications for healthcare providers. This paper addresses the problem of
interpreting clustered sensor data collected from older adult patients
recovering from lower-limb fractures in the community. A total of 560 days of
multimodal sensor data, including acceleration, step count, ambient motion, GPS
location, heart rate, and sleep, alongside clinical scores, were remotely
collected from patients at home. Clustering was first carried out separately
for each data modality to assess the impact of feature sets extracted from each
modality on patients' recovery trajectories. Then, using context-aware
prompting, a large language model was employed to infer meaningful cluster
labels for the clusters derived from each modality. The quality of these
clusters and their corresponding labels was validated through rigorous
statistical testing and visualization against clinical scores collected
alongside the multimodal sensor data. The results demonstrated the statistical
significance of most modality-specific cluster labels generated by the large
language model with respect to clinical scores, confirming the efficacy of the
proposed method for interpreting sensor data in an unsupervised manner. This
unsupervised data analysis approach, relying solely on sensor data, enables
clinicians to identify at-risk patients and take timely measures to improve
health outcomes.
Authors' comments: 15 pages, 2 figures, 3 tables
Jung Hyun Lee, Seungjae Shin, Vinnam Kim, Jaeseong You, An Chen
As the rapid scaling of large language models (LLMs) poses significant
challenges for deployment on resource-constrained devices, there is growing
interest in extremely low-bit quantization, such as 2-bit. Although prior works
have shown that 2-bit large models are Pareto-optimal over their 4-bit smaller
counterparts in both accuracy and latency, these advancements have been limited
to pre-trained LLMs and have not yet been extended to instruction-tuned models.
To bridge this gap, we propose Unified Progressive Quantization (UPQ), a novel
progressive quantization framework (FP16$\rightarrow$INT4$\rightarrow$INT2)
that unifies block-wise post-training quantization (PTQ) with
distillation-based quantization-aware training (Distill-QAT) for INT2
instruction-tuned LLM quantization. UPQ first quantizes FP16 instruction-tuned
models to INT4 using block-wise PTQ to significantly reduce the quantization
error introduced by subsequent INT2 quantization. Next, UPQ applies Distill-QAT
to enable INT2 instruction-tuned LLMs to generate responses consistent with
their original FP16 counterparts by minimizing the generalized Jensen-Shannon
divergence (JSD) between the two. To the best of our knowledge, we are the
first to demonstrate that UPQ can quantize open-source instruction-tuned LLMs
to INT2 without relying on proprietary post-training data, while achieving
state-of-the-art performance on MMLU and IFEval, two of the most
representative benchmarks for evaluating instruction-tuned LLMs.
Authors' comments: Preprint
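The distillation objective mentioned above can be illustrated with a minimal sketch. It assumes one common definition of the generalized Jensen-Shannon divergence, JSD_beta(P, Q) = beta * KL(P || M) + (1 - beta) * KL(Q || M) with M = beta * P + (1 - beta) * Q; the abstract does not state which variant UPQ uses, and the probability vectors below are hypothetical.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def generalized_jsd(p, q, beta=0.5):
    """Generalized JSD with mixture weight beta in (0, 1) (assumed definition)."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    # Mixture distribution M = beta * P + (1 - beta) * Q
    m = [beta * pi + (1.0 - beta) * qi for pi, qi in zip(p, q)]
    return beta * kl_divergence(p, m) + (1.0 - beta) * kl_divergence(q, m)

teacher = [0.7, 0.2, 0.1]  # hypothetical FP16 next-token probabilities
student = [0.5, 0.3, 0.2]  # hypothetical INT2 next-token probabilities
loss = generalized_jsd(teacher, student, beta=0.5)
```

With beta = 0.5 this reduces to the standard symmetric JSD, which is bounded above by log 2.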
Xiao Chen, Sihang Zhou, Ke Liang, Xiaoyu Sun, Xinwang Liu
Chain-of-thought (CoT) distillation allows a large language model (LLM) to guide a small language model (SLM) in reasoning tasks. Existing methods train the SLM to learn the long rationale in one iteration, resulting in two issues: 1) Long rationales lead to a large token-level batch size during training, making the gradients of core reasoning tokens (i.e., tokens that directly affect the correctness of subsequent reasoning) over-smoothed, as they make up only a tiny fraction of the rationale. As a result, the SLM converges to sharp minima where it fails to grasp the reasoning logic. 2) The response is slow, as the SLM must generate a long rationale before reaching the answer. Therefore, we propose chunk-wise training (CWT), which uses a heuristic search to divide the rationale into internally semantically coherent chunks and focuses the SLM on learning from only one chunk per iteration. In this way, CWT naturally isolates non-reasoning chunks that do not involve core reasoning tokens (e.g., summary and transitional chunks) from the SLM's learning of reasoning chunks, increasing the fraction of core reasoning tokens in the corresponding iteration. Based on CWT, skip-thinking training (STT) is proposed. STT makes the SLM automatically skip intermediate non-reasoning chunks to reach the answer, improving reasoning speed while maintaining accuracy. We validate our approach on a variety of SLMs and multiple reasoning tasks.
Weiqi Wang, Limeng Cui, Xin Liu, Sreyashi Nag, Wenju Xu, Chen Luo, Sheikh Muhammad Sarwar, Yang Li et al.
Goal-oriented script planning, or the ability to devise coherent sequences of
actions toward specific goals, is commonly employed by humans to plan for
typical activities. In e-commerce, customers increasingly seek LLM-based
assistants to generate scripts and recommend products at each step, thereby
facilitating convenient and efficient shopping experiences. However, this
capability remains underexplored due to several challenges, including the
inability of LLMs to simultaneously conduct script planning and product
retrieval, difficulties in matching products caused by semantic discrepancies
between planned actions and search queries, and a lack of methods and benchmark
data for evaluation. In this paper, we step forward by formally defining the
task of E-commerce Script Planning (EcomScript) as three sequential subtasks.
We propose a novel framework that enables the scalable generation of
product-enriched scripts by associating products with each step based on the
semantic similarity between the actions and their purchase intentions. By
applying our framework to real-world e-commerce data, we construct the very
first large-scale EcomScript dataset, EcomScriptBench, which includes 605,229
scripts sourced from 2.4 million products. Human annotations are then conducted
to provide gold labels for a sampled subset, forming an evaluation benchmark.
Extensive experiments reveal that current (L)LMs face significant challenges
with EcomScript tasks, even after fine-tuning, while injecting product purchase
intentions improves their performance.
Authors' comments: ACL2025
Ya Li, Bin Zhou, Bo Hu
In speaker verification, traditional models often emphasize modeling long-term contextual features to capture global speaker characteristics. However, this approach can neglect fine-grained voiceprint information, which contains highly discriminative features essential for robust speaker embeddings. This paper introduces a novel model architecture, termed MGFF-TDNN, based on multi-granularity feature fusion. The MGFF-TDNN leverages a two-dimensional depth-wise separable convolution module, enhanced with local feature modeling, as a front-end feature extractor to effectively capture time-frequency domain features. To achieve comprehensive multi-granularity feature fusion, we propose the M-TDNN structure, which integrates global contextual modeling with fine-grained feature extraction by combining time-delay neural networks and phoneme-level feature pooling. Experiments on the VoxCeleb dataset demonstrate that the MGFF-TDNN achieves outstanding performance in speaker verification while remaining efficient in terms of parameters and computational resources.
Jong Chul Lee, Joon Hyeop Lee, Hyunjin Jeong, Mina Pak, Sree Oh
We study star formation rate (SFR) indicators and dust attenuation of 74
nearby star-forming galaxies on kiloparsec scales, based on GALEX
far-ultraviolet (FUV) and WISE mid-infrared (MIR) images with CALIFA optical
integral field spectroscopic data. We obtain hybrid SFR indicators by combining
the observed FUV and MIR luminosities and calibrate them using the
dust-corrected H$\alpha$ luminosity as a reference SFR. The simple linear
combination appears to follow the reference SFR well, but the calibration
residual shows a significant dependence on the specific SFR (sSFR), which can
be removed by employing the combination coefficient or conversion offset that
varies with the sSFR. In the plane of gas versus stellar attenuation, the
median trend line's slope ($\approx$ stellar-to-gas attenuation ratio) changes
from 0.44 to 1.0 with increasing attenuation. The differential attenuation,
defined as the deviation of stellar attenuation from the median trend line, is
strongly correlated with the SFR surface density and sSFR, compatible with the
two-component dust model. The differential attenuation seems to be affected by
both local and global factors.
Authors' comments: 18 pages, 13 figures, To appear in ApJ
Ming Li, Yanhong Li, Ziyue Li, Tianyi Zhou
As the post-training of large language models (LLMs) advances from instruction-following to complex reasoning tasks, understanding how different data affect finetuning dynamics remains largely unexplored. In this paper, we present a spectral analysis of layer-wise gradients induced by low/high-quality instruction and reasoning data for LLM post-training. Our analysis reveals that widely-studied metrics for data evaluation, e.g., IFD, InsTag, Difficulty, and Reward, can be explained and unified by spectral properties computed from gradients' singular value decomposition (SVD). Specifically, higher-quality data are usually associated with lower nuclear norms and higher effective ranks. Notably, effective rank exhibits better robustness and resolution than nuclear norm in capturing subtle quality differences. For example, reasoning data achieves substantially higher effective ranks than instruction data, implying richer gradient structures on more complex tasks. Our experiments also highlight that models within the same family share similar gradient patterns regardless of their sizes, whereas different model families diverge significantly. Providing a unified view on the effects of data quality across instruction and reasoning data, this work illuminates the interplay between data quality and training stability, shedding novel insights into developing better data exploration strategies for post-training.
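The two spectral quantities named above can be sketched from a list of singular values. This sketch assumes the entropy-based definition of effective rank (the exponential of the Shannon entropy of the normalized singular values); the abstract does not spell out the exact definition, and the singular values below are hypothetical rather than taken from any real gradient.

```python
import math

def nuclear_norm(singular_values):
    """Nuclear norm: the sum of the singular values."""
    return sum(singular_values)

def effective_rank(singular_values):
    """Entropy-based effective rank (assumed definition)."""
    total = sum(singular_values)
    if total <= 0.0:
        raise ValueError("at least one singular value must be positive")
    # Normalize the spectrum into a probability distribution, skipping zeros
    probs = [s / total for s in singular_values if s > 0.0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)

flat_spectrum = [1.0, 1.0, 1.0, 1.0]      # hypothetical: energy spread evenly
peaked_spectrum = [3.7, 0.1, 0.1, 0.1]    # hypothetical: one dominant direction
```

Both example spectra have the same nuclear norm, yet the flat one has a much higher effective rank, which shows how the two measures can separate.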
Yuanhong A, Guoyu Zhang, Yongcheng Zeng, Bo Zhang
In this study, we establish a unified framework for the high-dimensional matrix completion problem under flexible nonignorable missing mechanisms. Although the matrix completion problem has attracted much attention over the years, very few works consider nonignorable missing mechanisms. To address this problem, we derive a row- and column-wise matrix U-statistics type loss function, with the nuclear norm for regularization. A singular value proximal gradient algorithm is developed to solve the proposed optimization problem. We prove a non-asymptotic upper bound on the Frobenius norm of the estimation error and show the performance of our method through numerical simulations and real data analysis.
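For orientation, the standard form of a proximal gradient step with nuclear-norm regularization is sketched below. The symbols $\mathcal{L}$ (a smooth loss), $\lambda$ (regularization weight), and $\eta$ (step size) are assumed notation; the abstract does not give the authors' exact loss or update.

```latex
% Proximal gradient iteration for  min_X  L(X) + \lambda \|X\|_*
X^{(k+1)} = \operatorname{prox}_{\eta\lambda\|\cdot\|_*}
            \bigl( X^{(k)} - \eta \nabla \mathcal{L}(X^{(k)}) \bigr),
\qquad
% Singular value soft-thresholding, with Y = U \operatorname{diag}(\sigma_i) V^\top
\operatorname{prox}_{\tau\|\cdot\|_*}(Y)
  = U \operatorname{diag}\bigl( \max(\sigma_i - \tau,\, 0) \bigr) V^\top .
```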
Stephen Meisenbacher, Chaeeun Joy Lee, Florian Matthes
The task of $\textit{Differentially Private Text Rewriting}$ is a class of
text privatization techniques in which (sensitive) input textual documents are
$\textit{rewritten}$ under Differential Privacy (DP) guarantees. The motivation
behind such methods is to hide both explicit and implicit identifiers that
could be contained in text, while still retaining the semantic meaning of the
original text, thus preserving utility. Recent years have seen an uptick in
research output in this field, offering a diverse array of word-, sentence-,
and document-level DP rewriting methods. Common to these methods is the
selection of a privacy budget (i.e., the $\varepsilon$ parameter), which
governs the degree to which a text is privatized. One major limitation of
previous works, stemming directly from the unique structure of language itself,
is the lack of consideration of $\textit{where}$ the privacy budget should be
allocated, as not all aspects of language, and therefore text, are equally
sensitive or personal. In this work, we are the first to address this
shortcoming, asking the question of how a given privacy budget can be
intelligently and sensibly distributed amongst a target document. We construct
and evaluate a toolkit of linguistics- and NLP-based methods used to allocate a
privacy budget to constituent tokens in a text document. In a series of privacy
and utility experiments, we empirically demonstrate that given the same privacy
budget, intelligent distribution leads to higher privacy levels and more
positive trade-offs than a naive distribution of $\varepsilon$. Our work
highlights the intricacies of text privatization with DP, and furthermore, it
calls for further work on finding more efficient ways to maximize the
privatization benefits offered by DP in text rewriting.
Authors' comments: 14 pages, 1 figure, 6 tables. Accepted to CODASPY 2025
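The idea of distributing a fixed privacy budget over tokens can be sketched as a proportional split. The function name, the weights, and the example sentence are hypothetical; how weights should be derived from linguistic or NLP signals is the subject of the paper and is not reproduced here.

```python
def allocate_budget(total_epsilon, weights):
    """Split total_epsilon across tokens in proportion to non-negative weights."""
    if total_epsilon <= 0:
        raise ValueError("total_epsilon must be positive")
    total_weight = sum(weights)
    if any(w < 0 for w in weights) or total_weight <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    return [total_epsilon * w / total_weight for w in weights]

tokens = ["Alice", "visited", "the", "clinic"]            # hypothetical document
uniform = allocate_budget(8.0, [1.0] * len(tokens))       # naive, even split
weighted = allocate_budget(8.0, [1.0, 2.0, 4.0, 1.0])     # hypothetical weights
```

Because a smaller epsilon implies stronger privatization, tokens given lower weights in this sketch would be perturbed more, while the total budget stays the same in both allocations.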
Shu Yang, Chengting Yu, Lei Liu, Hanzhi Ma, Aili Wang, Erping Li
Spiking Neural Networks (SNNs) have garnered considerable attention as a potential alternative to Artificial Neural Networks (ANNs). Recent studies have highlighted SNNs' potential on large-scale datasets. For SNN training, two main approaches exist: direct training and ANN-to-SNN (ANN2SNN) conversion. To fully leverage existing ANN models in guiding SNN learning, either direct ANN-to-SNN conversion or ANN-SNN distillation training can be employed. In this paper, we propose an ANN-SNN distillation framework from the ANN-to-SNN perspective, designed with a block-wise replacement strategy for ANN-guided learning. By generating intermediate hybrid models that progressively align SNN feature spaces to those of ANN through rate-based features, our framework naturally incorporates rate-based backpropagation as a training method. Our approach achieves results comparable to or better than state-of-the-art SNN distillation methods, showing both training and learning efficiency.
Jingyi Zhang, Jiaxing Huang, Huanjin Yao, Shunyu Liu, Xikun Zhang, Shijian Lu, Dacheng Tao
Recent studies generally enhance MLLMs' reasoning capabilities via supervised fine-tuning on high-quality chain-of-thought reasoning data, which often leads models to merely imitate successful reasoning paths without understanding what the wrong reasoning paths are. In this work, we aim to enhance the MLLMs' reasoning ability beyond passively imitating positive reasoning paths. To this end, we design Step-wise Group Relative Policy Optimization (StepGRPO), a new online reinforcement learning framework that enables MLLMs to self-improve reasoning ability via simple, effective and dense step-wise rewarding. Specifically, StepGRPO introduces two novel rule-based reasoning rewards: Step-wise Reasoning Accuracy Reward (StepRAR) and Step-wise Reasoning Validity Reward (StepRVR). StepRAR rewards reasoning paths that contain necessary intermediate reasoning steps via a soft key-step matching technique, while StepRVR rewards reasoning paths that follow a well-structured and logically consistent reasoning process through a reasoning completeness and logic evaluation strategy. With the proposed StepGRPO, we introduce R1-VL, a series of MLLMs with outstanding capabilities in step-by-step reasoning. Extensive experiments over 8 benchmarks demonstrate the superiority of our methods.
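The group-relative part of the objective can be sketched as follows: each sampled reasoning path's reward is normalized against the other paths sampled for the same question. The reward values are hypothetical, and the use of the population standard deviation and a small stabilizer `eps` are assumptions; the abstract does not specify how StepRAR and StepRVR are combined into a single reward.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and standard deviation of its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the sampled group
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical total rewards for four reasoning paths sampled for one question
group_rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(group_rewards)
```

Paths that score above the group mean receive positive advantages and the rest receive negative ones; if every path earns the same reward, all advantages are zero.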
Yuqi Wu, Guangya Wan, Jingjing Li, Shengming Zhao, Lingfeng Ma, Tianyi Ye, Ion Pop, Yanbo Zhang et al.
Translating state-of-the-art NLP into practice often stalls at the "last
mile" owing to insufficient contextualization of the target domain's knowledge,
processes, and evaluation. Psychiatric differential diagnosis exemplifies this
challenge: accurate assessments depend on nuanced clinical knowledge, a
delicate cognitive-affective interview process, and downstream outcomes that
extend far beyond benchmark accuracy. We present WiseMind, a systematic
interdisciplinary contextualization framework that delivers both instrumental
(diagnostic precision) and humanistic (empathy) gains. WiseMind comprises three
components: (i) structured knowledge-guided proactive reasoning, which embeds
DSM-5 criteria in a knowledge graph to steer questioning; (ii) a
theory-informed dual-agent architecture that coordinates a "reasonable-mind"
reasoning agent and an "emotional-mind" empathy agent, inspired by Dialectical
Behavior Therapy; and (iii) a multi-faceted evaluation strategy covering
simulated patients, user studies, clinician review, and ethical assessment.
Tested on depression, anxiety, and bipolar disorder, WiseMind attains up to
84.2% diagnostic accuracy, which is comparable to human experts, while
outperforming single-agent baselines in perceived empathy and trustworthiness.
These results show that deep contextualization, across knowledge, process, and
evaluation layers, can transform benchmark-driven NLP into clinically meaningful
impact.
Authors' comments: 27 pages, 13 figures
Jian Song, Boxuan Zheng, Xiangfei Yang, Donglin Wang
Due to the similar characteristics of event-based visual data and point clouds, recent studies have treated event data as event clouds and learned from them using point cloud analysis. Additionally, some works approach point clouds from the perspective of event vision, employing Spiking Neural Networks (SNNs) due to their asynchronous nature. However, these contributions are often domain-specific, making it difficult to extend their applicability to other intersecting fields. Moreover, while SNN-based visual tasks have seen significant growth, the conventional timestep-wise iterative activation strategy limits real-world applications because it requires many timesteps, resulting in significant delays and increased computational costs. Although some innovative methods achieve good performance with few timesteps (<10), few have fundamentally restructured the update strategy of spiking neurons to completely overcome the limitations of timesteps. In response to these concerns, we propose a novel and general activation strategy for spiking neurons called Activation-wise Membrane Potential Propagation (AMP2). This approach extends the concept of timesteps from a manually crafted parameter within the activation function to any existing network structure. In experiments on common point cloud tasks (classification, object segmentation, and scene segmentation) and event cloud tasks (action recognition), we found that AMP2 stabilizes SNN training, maintains competitive performance, and reduces latency compared to the traditional timestep-wise activation paradigm.
Xuelin Shen, Yitong Wang, Silin Zheng, Kang Xiao, Wenhan Yang, Xu Wang
In the context of Omni-Directional Image (ODI) Super-Resolution (SR), a
unique challenge arises from the non-uniform oversampling characteristics
caused by EquiRectangular Projection (ERP). Considerable efforts in designing
complex spherical convolutions or polyhedron reprojection offer significant
performance improvements but at the expense of cumbersome processing procedures
and slower inference speeds. Under these circumstances, this paper proposes a
new ODI-SR model characterized by its capacity to perform Fast and
Arbitrary-scale ODI-SR processes, denoted as FAOR. The key innovation lies in
adapting the implicit image function from the planar image domain to the ERP
image domain by incorporating spherical geometric priors at both the latent
representation and image reconstruction stages, in a low-overhead manner.
Specifically, at the latent representation stage, we adopt a pair of pixel-wise
and semantic-wise sphere-to-planar distortion maps to perform affine
transformations on the latent representation, thereby incorporating it with
spherical properties. Moreover, during the image reconstruction stage, we
introduce a geodesic-based resampling strategy, aligning the implicit image
function with spherical geometrics without introducing additional parameters.
As a result, the proposed FAOR outperforms the state-of-the-art ODI-SR models
with a much faster inference speed. Extensive experimental results and ablation
studies have demonstrated the effectiveness of our design.
Authors' comments: 9 pages, 4 figures, AAAI 2025
Xing Li, Zeyu Xing, Yiming Li, Linping Qu, Hui-Ling Zhen, Wulong Liu, Yiwu Yao, Sinno Jialin Pan et al.
KV cache quantization can improve Large Language Model (LLM) inference
throughput and latency in long-context and large-batch-size scenarios while
preserving LLM effectiveness. However, current methods have three unsolved
issues: overlooking layer-wise sensitivity to KV cache quantization, high
overhead of online fine-grained decision-making, and low flexibility to
different LLMs and constraints. Therefore, we thoroughly analyze the inherent
correlation of layer-wise transformer attention patterns to KV cache
quantization errors and study why key cache is more important than value cache
for quantization error reduction. We further propose a simple yet effective
framework KVTuner to adaptively search for the optimal hardware-friendly
layer-wise KV quantization precision pairs for coarse-grained KV cache with
multi-objective optimization and directly utilize the offline searched
configurations during online inference. To reduce the computational cost of
offline calibration, we utilize the intra-layer KV precision pair pruning and
inter-layer clustering to reduce the search space. Experimental results show
that we can achieve nearly lossless 3.25-bit mixed precision KV cache
quantization for LLMs like Llama-3.1-8B-Instruct and 4.0-bit for sensitive
models like Qwen2.5-7B-Instruct on mathematical reasoning tasks. The maximum
inference throughput can be improved by 38.3% compared with KV8 quantization
over various context lengths. Our code and searched configurations are
available at https://github.com/cmd2001/KVTuner.
Authors' comments: 36 pages. Code: https://github.com/cmd2001/KVTuner
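A fractional figure such as 3.25 bits can arise when layers use different key/value precision pairs. The sketch below assumes the reported bit-width is a plain average over the key and value precisions of all layers, which the abstract does not confirm; the per-layer configuration is hypothetical and not one of the searched KVTuner configurations.

```python
def average_kv_bits(layer_pairs):
    """Mean bit-width over (key_bits, value_bits) pairs, one pair per layer."""
    if not layer_pairs:
        raise ValueError("layer_pairs must not be empty")
    return sum(k + v for k, v in layer_pairs) / (2 * len(layer_pairs))

# Hypothetical 4-layer configuration: keys get at least as many bits as values
config = [(8, 4), (4, 2), (4, 2), (2, 2)]
equivalent_bits = average_kv_bits(config)
```

Giving keys more bits than values in the sensitive layers mirrors the abstract's observation that the key cache matters more for quantization error.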
Ka Ho Lai, Lok Ming Lui
Surface parametrization plays a crucial role in various fields, with applications in computer graphics, medical imaging, scientific computing, and computational engineering. Most surface parametrization approaches operate on triangular meshes. By contrast, the theory and methods of point cloud surface parametrization are less studied, despite their rising significance. In this work, we compute surface parametrizations through an optimization approach using neural networks, with novel loss functions that use no extrinsic information, together with theoretical analyses. Based on the theory, we develop an optimization algorithm to improve the parametrization quality. Using our methods, general open surfaces can be parametrized either in a free-boundary manner or with arbitrary domain constraints. Landmark matching can also be enforced under our framework. Numerical experiments are conducted and presented, along with applications including surface reconstruction and boundary detection.
Fumio Hiroshima, Oliver Matte
We study the renormalized Nelson model for a scalar matter particle in a
continuous confining potential interacting with a possibly massless quantized
radiation field. When the radiation field is massless we impose a mild infrared
regularization ensuring that the Nelson Hamiltonian has a non-degenerate ground
state in all considered cases. Employing Feynman-Kac representations, we derive
lower bounds on the point-wise spatial decay of the partial Fock space norms of
ground state eigenvectors. Here the exponential rate function governing the
decay is given by the Agmon distance familiar from the analysis of
Schrödinger operators. For a large class of confining potentials, our lower
bounds on the decay of ground state eigenvectors match asymptotically with the
upper bounds implied by previous work of the present authors.
Authors' comments: 16 pages
Hui Rao, Yan-Li Xu, Yuan Zhang
The doubling measure was introduced by Beurling and Ahlfors in 1956 and has since become a basic concept in analysis on metric spaces. In this paper, for a measure which is not doubling, we introduce a notion of point-wise doubling index, and calculate the point-wise doubling indices of uniform Bernoulli measures on Bedford-McMullen carpets. As an application, we show that, except for a small class of Bedford-McMullen carpets, if two Bedford-McMullen carpets are bi-Lipschitz equivalent, then they have the same fiber sequence up to a permutation.
Selcuk Sözeri, Nihad Abuawwad, Amal Aldarawsheh, Samir Lounis
We conduct a comprehensive density functional theory (DFT) study to explore the intricate magnetic properties of a frustrated Mn monolayer on the Ag(111) surface. Spin-polarized scanning tunneling microscopy demonstrates that a Néel magnetic state characterizes such an interface, which contradicts systematic ab-initio predictions made in the last two decades indicating that the ground state is a collinear row-wise antiferromagnetic (RW-AFM) state. Here, we employ the all-electron full-potential Korringa-Kohn-Rostoker Green function (KKR) method and find that the ground state is a chiral magnetic Néel state, with magnetic moments rotating in the surface plane following a unique sense of rotation, as dictated by the underlying in-plane magnetic anisotropy and Dzyaloshinskii-Moriya interaction. Once disordered magnetic states are allowed, as described within the disordered local moment (DLM) approach, we reveal the possibility of stabilizing a RW-AFM state. We conjecture that at low temperatures, the chiral Néel state prevails, while at higher temperatures, the magnetic exchange interactions are modified by magnetic disorder, which can then induce a transition towards a RW-AFM state. Our work addresses a long-term experimental-theoretical controversy and provides significant insights into the magnetic interactions and stability of Mn films on noble metal substrates, contributing to the broader understanding of the different magnetic facets of frustrated magnetism in thin films.
Davor Vukadin, Petar Afrić, Marin Šilić, Goran Delač
Recent advances in deep neural network performance have led to the development
of new state-of-the-art approaches in numerous areas. However, the black-box
nature of neural networks often prohibits their use in areas where model
explainability and model transparency are crucial. Over the years, researchers
have proposed many algorithms to aid neural network understanding and provide
additional information to the human expert. One of the most popular methods
is Layer-Wise Relevance Propagation (LRP). This method assigns local
relevance based on the pixel-wise decomposition of nonlinear classifiers. With
the rise of attribution method research, there has emerged a pressing need to
assess and evaluate their performance. Numerous metrics have been proposed,
each assessing an individual property of attribution methods such as
faithfulness, robustness or localization. Unfortunately, no single metric is
deemed optimal for every case, and researchers often use several metrics to
test the quality of the attribution maps. In this work, we address the
shortcomings of the current LRP formulations and introduce a novel method for
determining the relevance of input neurons through layer-wise relevance
propagation. Furthermore, we apply this approach to the recently developed
Vision Transformer architecture and evaluate its performance against existing
methods on two image classification datasets, namely ImageNet and PascalVOC.
Our results clearly demonstrate the advantage of our proposed method.
Furthermore, we discuss the insufficiencies of current evaluation metrics for
attribution-based explainability and propose a new evaluation metric that
combines the notions of faithfulness, robustness and contrastiveness. We
utilize this new metric to evaluate the performance of various
attribution-based methods. Our code is available at:
https://github.com/davor10105/relative-absolute-magnitude-propagation
Authors' comments: 30 pages, 16 figures, 13 tables, ACM Transactions on Intelligent
Systems and Technology
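As background for the entry above, the widely used LRP-epsilon rule for a single bias-free linear layer can be sketched in a few lines. This is the standard baseline rule, not the new propagation method proposed in the paper; the layer weights and activations are hypothetical.

```python
def lrp_epsilon(activations, weights, relevance_out, eps=1e-6):
    """Redistribute output relevance to inputs with the LRP-epsilon rule.

    weights[i][j] connects input i to output j; the layer has no bias.
    """
    n_in, n_out = len(activations), len(relevance_out)
    # Pre-activations z_j = sum_i a_i * w_ij
    z = [sum(activations[i] * weights[i][j] for i in range(n_in))
         for j in range(n_out)]
    relevance_in = [0.0] * n_in
    for j in range(n_out):
        # Stabilizer keeps the denominator away from zero, matching the sign of z_j
        denom = z[j] + (eps if z[j] >= 0 else -eps)
        for i in range(n_in):
            contribution = activations[i] * weights[i][j]
            relevance_in[i] += contribution / denom * relevance_out[j]
    return relevance_in

a = [1.0, 2.0]                      # hypothetical input activations
w = [[0.5, -1.0], [0.25, 1.0]]      # hypothetical weights
r_out = [1.0, 1.0]                  # relevance arriving at the two outputs
r_in = lrp_epsilon(a, w, r_out)
```

For small `eps` and no bias, the total relevance is approximately conserved between the output and input sides, which is the property layer-wise propagation rules are designed around.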