T. A. Stockmans, F. Snik, J. M. Smit, J. H. H. Rietjens, M. Esposito, C. van Dijk, C. U. Keller
Modern detector manufacturing allows spectral and polarimetric filters to be
directly integrated on top of separate detector pixels. This enables the
creation of CubeSat-sized spectro-polarimetric instruments that are not much
larger than the detector and a lens. Redundancy inherent to the observed scene
offers the opportunity for sparse sampling in the form of not scanning all
filters at every location. However, when there are fewer pushbroom steps than
filters, data are missing in the resulting data cube. The missing, largely
redundant data can be filled in with interpolation methods, often called
demosaicers. The choice of filters and their precise layout influences the
performance of the instrument after the demosaicing process. In these
proceedings we describe a part of a design toolbox for both the filter layout
and the optimum parameters for the reconstruction to a full
spectro-polarimetric data cube. The design tool is based on training a (neural)
network and jointly updating the values of the filters and demosaicer. We
optimized a filter layout by training on spectro-polarimetric remote
observations of the Earth acquired by SPEX airborne. This optimized filter
layout could reconstruct a validation scene from five overlapping snapshots
(pushbroom steps), which would take 109 pushbroom steps when measuring with a
classical layout and no reconstruction.
Authors' comments: 5 pages, 3 figures, conference proceedings
Jihun Park, Jongmin Gim, Kyoungmin Lee, Minseok Oh, Minwoo Choi, Jaeyeul Kim, Woo Chool Park, Sunghoon Im
We present a training-free style-aligned image generation method that
leverages a scale-wise autoregressive model. While large-scale text-to-image
(T2I) models, particularly diffusion-based methods, have demonstrated
impressive generation quality, they often suffer from style misalignment across
generated image sets and slow inference speeds, limiting their practical
usability. To address these issues, we propose three key components: initial
feature replacement to ensure consistent background appearance, pivotal feature
interpolation to align object placement, and dynamic style injection, which
reinforces style consistency using a schedule function. Unlike previous methods
requiring fine-tuning or additional training, our approach maintains fast
inference while preserving individual content details. Extensive experiments
show that our method achieves generation quality comparable to competing
approaches, significantly improves style alignment, and delivers inference
speeds over six times faster than the fastest model.
Authors' comments: 17 pages, 15 figures
Ivan Ilin, Peter Richtarik
This paper presents Thanos, a novel weight-pruning algorithm designed to
reduce the memory footprint and enhance the computational efficiency of large
language models (LLMs) by removing redundant weights while maintaining
accuracy. Thanos introduces a block-wise pruning strategy with adaptive masks
that dynamically adjust to weight importance, enabling flexible sparsity
patterns and structured formats, such as $n:m$ sparsity, optimized for hardware
acceleration. Experimental evaluations demonstrate that Thanos achieves
state-of-the-art performance in structured pruning and outperforms existing
methods in unstructured pruning. By providing an efficient and adaptable
approach to model compression, Thanos offers a practical solution for deploying
large models in resource-constrained environments.
Authors' comments: 8 pages, 3 Figures, 3 Tables, 2 Algorithms, paper comes with Appendix
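To make the $n:m$ sparsity format mentioned in the Thanos abstract concrete, here is a minimal sketch of an $n:m$ mask built by plain magnitude pruning. It is not the Thanos algorithm; the function name `nm_prune` and the convention that at most $n$ weights stay non-zero in each group of $m$ consecutive weights are assumptions made for illustration.

```python
def nm_prune(weights, n=2, m=4):
    """Keep the n largest-magnitude entries in each consecutive group of m; zero the rest."""
    if len(weights) % m != 0:
        raise ValueError("length must be a multiple of m")
    pruned = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Indices of the n largest magnitudes within this group.
        order = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(order[:n])
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


# Hypothetical weight row; each group of 4 keeps its 2 largest magnitudes.
row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
print(nm_prune(row))
```

Because every group has the same number of non-zeros, the pattern is regular enough for hardware that supports structured sparsity, which is the motivation the abstract gives for such formats.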
Laura M Fernández-Pardo, Jorge Rodríguez-López
We present a version of the Krasnosel'skii fixed point theorem for operators acting on Cartesian products of normed linear spaces, under cone-compression and cone-expansion conditions of norm type. Our approach, based on fixed point index theory in cones, guarantees the existence of a coexistence fixed point, that is, one with nontrivial components. As an application, we prove the existence of periodic solutions with strictly positive components for a system of second-order differential equations. In particular, we address cases involving singular nonlinearities and hybrid terms, characterized by sublinear behavior in one component and superlinear behavior in the other.
Shivesh Prakash, Viki Kumar Prasad, Hans-Arno Jacobsen
We introduce MHNpath, a machine learning-driven retrosynthetic tool designed for computer-aided synthesis planning. Leveraging modern Hopfield networks and novel comparative metrics, MHNpath efficiently prioritizes reaction templates, improving the scalability and accuracy of retrosynthetic predictions. The tool incorporates a tunable scoring system that allows users to prioritize pathways based on cost, reaction temperature, and toxicity, thereby facilitating the design of greener and cost-effective reaction routes. We demonstrate its effectiveness through case studies involving complex molecules from ChemByDesign, showcasing its ability to predict novel synthetic and enzymatic pathways. Furthermore, we benchmark MHNpath against existing frameworks, replicating experimentally validated "gold-standard" pathways from PaRoutes. Our case studies reveal that the tool can generate shorter, cheaper, moderate-temperature routes employing green solvents, as exemplified by compounds such as dronabinol, arformoterol, and lupinine.
Reza Esfandiarpoor, George Zerveas, Ruochen Zhang, Macton Mgonzo, Carsten Eickhoff, Stephen H. Bach
Recent advancements in large language models (LLMs) have allowed the
augmentation of information retrieval (IR) pipelines with synthetic data in
various ways. Yet, the main training paradigm remains: contrastive learning
with binary relevance labels and the InfoNCE loss, where one positive document
is compared against one or more negatives. This objective treats all documents
that are not explicitly annotated as relevant on an equally negative footing,
regardless of their actual degree of relevance, thus (a) missing subtle nuances
that are useful for ranking and (b) being susceptible to annotation noise. To
overcome this limitation, in this work we forgo real training documents and
annotations altogether and use open-source LLMs to directly generate synthetic
documents that answer real user queries according to several different levels
of relevance. This fully synthetic ranking context of graduated relevance,
together with an appropriate list-wise loss (Wasserstein distance), enables us
to train dense retrievers in a way that better captures the ranking task.
Experiments on various IR datasets show that our proposed approach outperforms
conventional training with InfoNCE by a large margin. Without using any real
documents for training, our dense retriever significantly outperforms the same
retriever trained through self-supervision. More importantly, it matches the
performance of the same retriever trained on real, labeled training documents
of the same dataset, while being more robust to distribution shift and clearly
outperforming it when evaluated zero-shot on the BEIR dataset collection.
Authors' comments: Code: https://github.com/BatsResearch/sycl
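The InfoNCE objective that the preceding abstract contrasts against can be written down in a few lines. The sketch below is a generic single-query InfoNCE loss, not the authors' code; the scores and the `temperature` parameter are hypothetical values chosen for illustration.

```python
import math


def info_nce(pos_score, neg_scores, temperature=1.0):
    """InfoNCE loss for one query: negative log softmax probability of the positive."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_denominator - logits[0]


# Hypothetical query-document similarity scores.
loss_easy = info_nce(pos_score=5.0, neg_scores=[0.0, 0.0])
loss_hard = info_nce(pos_score=5.0, neg_scores=[4.5, 0.0])
print(loss_easy, loss_hard)
```

The loss sees only one positive and a set of negatives: a partially relevant document that scores 4.5 is pushed away exactly like an irrelevant one, which is the binary-label limitation the abstract describes.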
Yuyuan Li, Junjie Fang, Chaochao Chen, Xiaolin Zheng, Yizhao Zhang, Zhongxuan Han
In this paper, we reproduce the experimental results presented in our previous work titled "Making Users Indistinguishable: Attribute-wise Unlearning in Recommender Systems," which was published in the proceedings of the 31st ACM International Conference on Multimedia. This paper aims to validate the effectiveness of our proposed method and help others reproduce our experimental results. We provide detailed descriptions of our preprocessed datasets, source code structure, configuration file settings, experimental environment, and reproduced experimental results.
Yuta Tomokiyo, Keita Nishimoto, Kimitaka Asatani, Ichiro Sakata
Researchers are no longer limited to producing knowledge; in today's complex world, they also address societal challenges by engaging in policymaking. Although involvement in policymaking has expanded, direct empirical evidence of its career benefits remains underexplored. Prior survey-based studies suggest potential advantages, such as broader professional networks and enhanced opportunities, yet raise concerns about insufficient institutional support. Here, we examine the 2021 WHO global air quality guideline, a science-based regulatory guideline, as a case study. To evaluate the impact of guideline development on research outcomes, we match guideline researchers with a control group of peers sharing similar research topics and prior performance. Our analysis reveals that guideline researchers attain higher future citation counts in both academic and policy domains. New collaborations formed during development yield publications with higher citation impact and disruptive index. Moreover, about half the guideline's references are derived from guideline researchers' papers, highlighting their central role in shaping the evidence base. These results provide empirical support for the career benefits of policy engagement. Our findings indicate that engaging in international guideline development offers tangible career incentives for researchers, and that institutions can enhance research impact and promote innovative scientific progress by actively supporting their researchers' participation in such initiatives.
Zhuo-Yang Song, Zeyu Li, Qing-Hong Cao, Ming-xing Luo, Hua Xing Zhu
The geometric evolution of token representations in large language models
(LLMs) presents a fundamental paradox: while human language inherently
organizes semantic information in low-dimensional spaces ($\sim 10^1$
dimensions), modern LLMs employ high-dimensional embeddings ($\sim 10^3$
dimensions) processed through Transformer architectures. To resolve this
paradox, we develop a geometric framework that tracks token dynamics across
Transformer layers. Through
layer-wise analysis of intrinsic dimensions across multiple architectures, we
reveal an expansion-contraction pattern where tokens diffuse to a "working
space" and then progressively project onto lower-dimensional submanifolds. Our
finding implies a negative correlation between the working space dimension and
parameter-sensitive performance of the LLMs, and indicates that effective
models tend to compress tokens into approximately 10-dimensional submanifolds,
closely resembling human semantic spaces. This work not only advances LLM
interpretability by reframing Transformer layers as projectors that mediate
between high-dimensional computation and low-dimensional semantics, but also
provides practical tools for model diagnostics that do not rely on
task-specific evaluations.
Authors' comments: 17 pages, 9 figures, 2 tables
Jerry Jun-Yan Zhang, Nicolas Lodieu, Eduardo L. Martín, Pascal Tremblin, María Rosa Zapatero Osorio, Víctor J. S. Béjar, Nikola Vitas, Bartosz Gauza et al.
WISEA J181006.18-101000.5 (WISE1810) is the nearest metal-poor ultracool
dwarf to the Sun. It has a low effective temperature and has been classified as
an extreme early-T subdwarf. However, methane, the characteristic molecule of
the spectral class T, was not seen in the previous low-resolution spectrum.
Using the 10.4-m Gran Telescopio Canarias, we collected a high-quality JHK-band
intermediate-resolution (R~5000) spectrum of WISE1810, in which methane is
clearly detected at 17+/-6 ppm, while carbon monoxide is absent. Based on a
custom-computed ATMO2020++ model, we estimated an effective temperature of
1000+/-100 K, a high surface gravity of log g = 5.5+/-0.5 dex, and a carbon
abundance [C/H]=-1.5+/-0.2 dex, inferring [Fe/H]=-1.7+/-0.2 dex. Potassium is
not seen in our data, and the upper limits on the pseudo-equivalent widths of
J-band atomic lines are at least 25 to 60 times smaller than those measured
from solar-metallicity early-T counterparts. We measured a heliocentric radial
velocity of -83+/-13 km/s, suggesting that WISE1810 is more likely a thick disk
member.
Authors' comments: 7 pages, 2 figures in text; 5 figures in appendices. Accepted in ApJL
Inpyo Hong, Youngwan Jo, Hyojeong Lee, Sunghyun Ahn, Sanghyun Park
Zero-shot quantization (ZSQ) enables neural network compression without original training data, making it a promising solution for restricted data access scenarios. To compensate for the lack of data, recent ZSQ methods typically rely on synthetic inputs generated from the full-precision model. However, these synthetic inputs often lead to activation distortion, especially under low-bit settings. Existing methods struggle to mitigate this issue because they scale activations coarsely. To address this issue, we propose GranQ, a novel activation quantization framework that efficiently applies per-channel scaling through vectorized computation. In contrast to conventional channel-wise methods, which apply vectorization only to the quantization step, GranQ improves efficiency by vectorizing the scaling operation. This design allows GranQ to maintain fine-grained quantization granularity with minimal computational overhead, even in low-bit environments. Extensive experiments under quantization-aware training (QAT) settings demonstrate that GranQ consistently outperforms state-of-the-art ZSQ methods across CIFAR and ImageNet. In particular, our method achieves up to 5.45% higher accuracy in the 3-bit setting on CIFAR-100 and even surpasses the full-precision baseline on CIFAR-10. Furthermore, GranQ achieves significant speedup in quantization latency over conventional per-channel methods, demonstrating improved efficiency. With these findings, we anticipate that GranQ will inspire future research beyond conventional ZSQ approaches centered on data generation and model fine-tuning.
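Why coarse activation scaling hurts at low bit widths can be seen in a toy comparison of per-tensor and per-channel min-max quantization. This is a generic illustration in plain Python loops, not GranQ's vectorized implementation; the helper names and the activation values are hypothetical.

```python
def fake_quantize(values, lo, hi, bits):
    """Uniformly quantize values to 2**bits levels on [lo, hi], then dequantize."""
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [round((v - lo) / scale) * scale + lo for v in values]


def per_tensor(channels, bits=3):
    """One shared range for all channels (coarse scaling)."""
    flat = [v for ch in channels for v in ch]
    lo, hi = min(flat), max(flat)
    return [fake_quantize(ch, lo, hi, bits) for ch in channels]


def per_channel(channels, bits=3):
    """A separate range for each channel (fine-grained scaling)."""
    return [fake_quantize(ch, min(ch), max(ch), bits) for ch in channels]


def max_abs_error(original, quantized):
    return max(abs(a - b) for ro, rq in zip(original, quantized) for a, b in zip(ro, rq))


# Hypothetical activations: channel 0 is tiny, channel 1 is large.
acts = [[0.00, 0.01, 0.02, 0.03], [0.0, 10.0, 20.0, 30.0]]
err_tensor = max_abs_error(acts[:1], per_tensor(acts)[:1])
err_channel = max_abs_error(acts[:1], per_channel(acts)[:1])
print(err_tensor, err_channel)
```

With a shared range, the large channel dictates the step size and the small channel collapses onto a single level; with per-channel ranges, its error stays within half a quantization step.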
Joshua Näf, Keith Moffat, Jaap Eising, Florian Dörfler
This paper proposes Select-Data-driven Predictive Control (Select-DPC), a new output-feedback method for controlling nonlinear systems for which data are available but an explicit model is not. At each timestep, Select-DPC employs only the most relevant data to implicitly linearize the dynamics in "trajectory space". Then, taking user-defined output constraints into account, it makes control decisions using a convex optimization. This optimal control is applied in a receding-horizon manner. As the online data-selection is the core of Select-DPC, we propose and verify both norm-based and manifold-embedding-based selection methods. We evaluate Select-DPC on three benchmark nonlinear system simulators -- rocket-landing, a robotic arm and cart-pole inverted pendulum swing-up -- comparing it with standard Data-enabled Predictive Control (DeePC) and Time-Windowed DeePC methods, and find that Select-DPC outperforms both methods.
Youhui Zuo, Sibo Wei, Chen Zhang, Zhuorui Liu, Wenpeng Lu, Dawei Song
With the advancements in long-context inference capabilities of large language models (LLMs), the KV cache has become one of the foundational components. However, its substantial GPU memory consumption makes KV cache compression a key technique for enabling efficient LLM inference in industrial scenarios. While recent studies have focused on optimizing the memory occupied by the KV cache, they overlook two critical factors: preserving semantic coherence and considering task-specific characteristics during compression. To address these limitations, we propose a novel task-adaptive KV cache window selection method, WindowKV. WindowKV dynamically selects local semantic windows consisting of consecutive tokens, according to task-specific characteristics, ensuring the retained KV cache captures continuous, essential context. Additionally, we introduce an intra-group layer KV cache indices sharing strategy to reduce computational overhead, achieving a balance between performance and efficiency. We rigorously evaluate WindowKV on the LongBench benchmark, and the results demonstrate that it maintains a performance comparable to full KV cache retention while using only 12% of the original KV cache, significantly reducing memory requirements. Furthermore, our method achieves state-of-the-art results in the Needle-in-a-Haystack evaluation, highlighting its effectiveness and robustness.
Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Syed Zawad, Holger Boche
Fine-tuning large language models (LLMs) on downstream tasks can inadvertently erode their safety alignment, even for benign fine-tuning datasets. We address this challenge by proposing SafeMERGE, a post-fine-tuning framework that preserves safety while maintaining task utility. It achieves this by selectively merging fine-tuned and safety-aligned model layers only when those deviate from safe behavior, measured by a cosine similarity criterion. We evaluate SafeMERGE against other fine-tuning- and post-fine-tuning-stage approaches for Llama-2-7B-Chat and Qwen-2-7B-Instruct models on GSM8K and PubMedQA tasks while exploring different merging strategies. We find that SafeMERGE consistently reduces harmful outputs compared to other baselines without significantly sacrificing performance, sometimes even enhancing it. The results suggest that our selective, subspace-guided, and per-layer merging method provides an effective safeguard against the inadvertent loss of safety in fine-tuned LLMs while outperforming simpler post-fine-tuning-stage defenses.
Mingyang Song, Mao Zheng, Zheng Li, Wenjie Yang, Xuan Luo, Yue Pan, Feng Zhang
Improving training efficiency continues to be one of the primary challenges
in large-scale Reinforcement Learning (RL). In this paper, we investigate how
context length and the complexity of training data influence the RL scaling
training process of R1-distilled small reasoning models, e.g.,
DeepSeek-R1-Distill-Qwen-1.5B. Our experimental results reveal that: (1) simply
controlling the context length and curating the training data based on the
input prompt length can effectively improve the training efficiency of scaling
RL, achieving better performance with more concise CoT; (2) properly scaling
the context length helps mitigate entropy collapse; and (3) choosing an optimal
context length can improve the efficiency of model training and incentivize the
model's chain-of-thought reasoning capabilities. Inspired by these insights, we
propose FastCuRL, a curriculum RL framework with stage-wise context scaling to
achieve efficient training and concise CoT reasoning. Experimental results
demonstrate that FastCuRL-1.5B-V3 significantly outperforms state-of-the-art
reasoning models on five competition-level benchmarks and achieves 49.6%
accuracy on AIME 2024. Furthermore, FastCuRL-1.5B-Preview surpasses
DeepScaleR-1.5B-Preview on five benchmarks while only using a single node with
8 GPUs and a total of 50% of training steps.
Authors' comments: Ongoing Work
Julian Ziegler, Patrick Frenzel, Mirco Fuchs
This work concerns itself with the task of reconstructing all edges of an
arbitrary 3D wire-frame model projected to an image plane. We explore a
bottom-up part-wise procedure undertaken by an RL agent to segment and
reconstruct these 2D multipart objects. The environment's state is represented
as a four-colour image, where different colours correspond to background, a
target edge, a reconstruction line, and the overlap of both. At each step, the
agent can transform the reconstruction line within a four-dimensional action
space or terminate the episode using a specific termination action. To
investigate the impact of reward function formulations, we tested episodic and
incremental rewards, as well as combined approaches. Empirical results
demonstrated that the latter yielded the most effective training performance.
To further enhance efficiency and stability, we introduce curriculum learning
strategies. First, an action-based curriculum was implemented, where the agent
was initially restricted to a reduced action space, being able to only perform
three of the five possible actions, before progressing to the full action
space. Second, we test a task-based curriculum, where the agent first solves a
simplified version of the problem before being presented with the full, more
complex task. This second approach produced promising results, as the agent not
only successfully transitioned from learning the simplified task to mastering
the full task, but also improved its performance significantly in doing so. This study
demonstrates the potential of an iterative RL wire-frame reconstruction in two
dimensions. By combining optimized reward function formulations with curriculum
learning strategies, we achieved significant improvements in training success.
The proposed methodology provides an effective framework for solving similar
tasks and represents a promising direction for future research in the field.
Authors' comments: Accepted to RLDM 2025
Jingjing Zhao, Qingyi Huang, Kaiquan Cai, Quan Zhou, Xidong Mu, Yuanwei Liu
A point-to-point movable element (ME) enabled reconfigurable intelligent surface (ME-RIS) communication system is investigated, where each element position can be flexibly adjusted to create favorable channel conditions. For maximizing the communication rate, an efficient ME position optimization approach is proposed. Specifically, by characterizing the cascaded channel power gain in an element-wise manner, the position of each ME is iteratively updated by invoking the successive convex approximation method. Numerical results unveil that 1) the proposed element-wise ME position optimization algorithm outperforms the gradient descent algorithm; and 2) the ME-RIS significantly improves the communication rate compared to the conventional RIS with fixed-position elements.
Bouarfa Mahi Quantiota
We introduce the Structured Knowledge Accumulation (SKA) framework, which
reinterprets entropy as a dynamic, layer-wise measure of knowledge alignment in
neural networks. Instead of relying on traditional gradient-based optimization,
SKA defines entropy in terms of knowledge vectors and their influence on
decision probabilities across multiple layers. This formulation naturally leads
to the emergence of activation functions such as the sigmoid as a consequence
of entropy minimization. Unlike conventional backpropagation, SKA allows each
layer to optimize independently by aligning its knowledge representation with
changes in decision probabilities. As a result, total network entropy decreases
in a hierarchical manner, allowing knowledge structures to evolve
progressively. This approach provides a scalable, biologically plausible
alternative to gradient-based learning, bridging information theory and
artificial intelligence while offering promising applications in
resource-constrained and parallel computing environments.
Authors' comments: 16 pages, 6 figures
Sunwoo Lee
Sharpness-aware minimization (SAM) is known to improve the generalization
performance of neural networks. However, it is not widely used in real-world
applications yet due to its expensive model perturbation cost. A few variants
of SAM have been proposed to tackle such an issue, but they commonly do not
alleviate the cost noticeably. In this paper, we propose a lightweight
layer-wise gradient norm penalizing method that tackles the expensive
computational cost of SAM while maintaining its superior generalization
performance. Our study empirically shows that the gradient norm of the whole
model can be effectively suppressed by penalizing the gradient norm of only a
few critical layers. We also theoretically show that such a partial model
perturbation does not harm the convergence rate of SAM, allowing it to be
safely adopted in real-world applications. To demonstrate the efficacy of the
proposed method, we perform extensive experiments comparing the proposed method
to mini-batch SGD and the conventional SAM using representative computer vision
and language modeling benchmarks.
Authors' comments: Published in KDD 2024
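The "expensive model perturbation cost" of SAM mentioned in the preceding abstract comes from its two gradient evaluations per step. The sketch below shows the standard full-model SAM update on a toy quadratic; it is not the layer-wise method the abstract proposes, and the objective, learning rate `lr`, and radius `rho` are hypothetical choices.

```python
import math


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One standard SAM update: ascend to a nearby worst-case point, then descend."""
    g = grad_fn(w)                                   # first gradient evaluation
    norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0
    # Perturb the weights by rho along the normalized gradient direction.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad_fn(w_adv)                           # second gradient evaluation
    # Apply the perturbed gradient to the original weights.
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def grad(w):
    """Gradient of the toy objective f(w) = 0.5 * (w[0]**2 + 10 * w[1]**2)."""
    return [w[0], 10.0 * w[1]]


w = [1.0, 0.0]
print(sam_step(w, grad))
```

Each step calls `grad_fn` twice, roughly doubling the cost of a plain gradient step; restricting the perturbation to a few layers, as the abstract describes, targets exactly this overhead.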
Yujia Wu, Wei Lan, Long Feng, Chih-Ling Tsai
The stochastic block model (SBM) has been widely used to analyze network data. Various goodness-of-fit tests have been proposed to assess the adequacy of model structures. To the best of our knowledge, however, none of the existing approaches are applicable to sparse networks in which the connection probability of any two communities is of order $\log n/n$ and the number of communities is divergent. To fill this gap, we propose a novel goodness-of-fit test for the stochastic block model. The key idea is to construct statistics by sampling the maximum entry-deviations of the adjacency matrix, so that the negative impacts of network sparsity are alleviated by the sampling process. We demonstrate theoretically that the proposed test statistic converges to the Type-I extreme value distribution under the null hypothesis regardless of the network structure. Accordingly, it can be applied to both dense and sparse networks. In addition, we obtain the asymptotic power against alternatives. Moreover, we introduce a bootstrap-corrected test statistic to improve the finite sample performance, recommend an augmented test statistic to increase the power, and extend the proposed test to the degree-corrected SBM. Simulation studies and two empirical examples with both dense and sparse networks indicate that the proposed method performs well.
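For readers unfamiliar with the model under test, the following is a minimal sketch of sampling an undirected SBM adjacency matrix in the sparse regime the abstract refers to. It illustrates the null model only, not the proposed test statistic; the function name `sample_sbm`, the community sizes, and the constants multiplying $\log n/n$ are assumptions.

```python
import math
import random


def sample_sbm(labels, probs, seed=0):
    """Sample a symmetric 0/1 adjacency matrix with no self-loops from an SBM."""
    rng = random.Random(seed)
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Edge probability depends only on the two community labels.
            if rng.random() < probs[labels[i]][labels[j]]:
                adj[i][j] = adj[j][i] = 1
    return adj


# Hypothetical sparse setting: connection probabilities of order log(n)/n.
n = 200
labels = [i % 2 for i in range(n)]
base = math.log(n) / n
probs = [[3 * base, base], [base, 3 * base]]
adj = sample_sbm(labels, probs)
print(sum(map(sum, adj)) / n)  # average degree
```

In this regime the expected degree grows only logarithmically with $n$, so most entries of the adjacency matrix are zero, which is the setting the abstract says existing tests do not cover.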