Yuibi Gomi, Akira Sato, Waleed Madany, Kenichi Okada, Satoshi Adachi, Masatoshi Itoh, Masanori Hashimoto
We developed a 55 nm CMOS SRAM chip that scans all data every 125 ns and outputs timestamped soft error data via an SPI interface through a FIFO. The proposed system, consisting of the developed chip and particle detectors, enables event-wise soft error measurement and precise identification of SBUs and MCUs, thus resolving misclassifications such as Pseudo- and Distant MCUs that conventional methods cannot distinguish. An 80-MeV proton irradiation experiment at RASiS, Tohoku University verified the system operation. Timestamps between the SRAM chip and the particle detectors were successfully synchronized, accounting for PLL disturbances caused by radiation. Event building was achieved by determining a reset offset with sub-ns resolution, and spatial synchronization was maintained within several tens of micrometers.
Javier Muñoz-Haro, Ruben Tolosana, Ruben Vera-Rodriguez, Aythami Morales, Julian Fierrez
In an increasingly digitalized world, verifying the authenticity of ID documents has become a critical challenge for real-life applications such as digital banking, crypto-exchanges, and renting. This study focuses on fake ID detection and addresses several limitations in the field. In particular, no publicly available data from real ID documents exists, and most studies rely on proprietary in-house databases that are not available due to privacy reasons. To shed some light on this critical challenge, which makes it difficult to advance in the field, we explore a trade-off between privacy (i.e., amount of sensitive data available) and performance, proposing a novel patch-wise approach for privacy-preserving fake ID detection. Our proposed approach explores how privacy can be enhanced through: i) two levels of anonymization for an ID document (i.e., fully- and pseudo-anonymized), and ii) different patch size configurations, varying the amount of sensitive data visible in the patch image. Also, state-of-the-art methods such as Vision Transformers and Foundation Models are considered in the analysis. The experimental framework shows that, on an unseen database (DLC-2021), our proposal achieves 13.91% and 0% EERs at patch and ID document level, respectively, showing a good generalization to other databases. In addition to this exploration, another key contribution of our study is the release of the first publicly available database that contains 48,400 patches from both real and fake ID documents, along with the experimental framework and models, which will be available in our GitHub.
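
The equal error rate (EER) quoted above is the operating point at which the false acceptance and false rejection rates coincide. The following is a minimal sketch of how it can be approximated from detector scores; the function name, the convention that higher scores mean "more likely fake", and the toy scores are assumptions for illustration and are not taken from the paper.

```python
def equal_error_rate(genuine_scores, fake_scores):
    """Approximate the EER, assuming higher scores mean 'more likely fake'."""
    thresholds = sorted(set(genuine_scores) | set(fake_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False acceptance: a fake scored below the threshold passes as real.
        far = sum(s < t for s in fake_scores) / len(fake_scores)
        # False rejection: a real document at or above the threshold is flagged.
        frr = sum(s >= t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (hypothetical): one real document and one fake are confusable.
genuine = [0.1, 0.4, 0.6, 0.2]
fake = [0.3, 0.5, 0.8, 0.9]
print(equal_error_rate(genuine, fake))
```

With perfectly separated score sets the function returns 0, which corresponds to the document-level figure reported in the abstract.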
G. Charbel N. Kindji, Elisa Fromont, Lina Maria Rojas-Barahona, Tanguy Urvoy
The growing power of generative models raises major concerns about the authenticity of published content. To address this problem, several synthetic content detection methods have been proposed for uniformly structured media such as image or text. However, little work has been done on the detection of synthetic tabular data, despite its importance in industry and government. This form of data is complex to handle due to the diversity of its structures: the number and types of the columns may vary wildly from one table to another. We tackle the tough problem of detecting synthetic tabular data "in the wild", i.e. when the model is deployed on table structures it has never seen before. We introduce a novel datum-wise transformer architecture and show that it outperforms existing models. Furthermore, we investigate the application of domain adaptation techniques to enhance the effectiveness of our model, thereby providing a more robust data-forgery detection solution.
Samy-Melwan Vilhes, Gilles Gasso, Mokhtar Z Alaya
Time series anomaly detection (TSAD) focuses on identifying whether observations in streaming data deviate significantly from normal patterns. With the prevalence of connected devices, anomaly detection on time series has become paramount, as it enables real-time monitoring and early detection of irregular behaviors across various application domains. In this work, we introduce PatchTrAD, a Patch-based Transformer model for time series anomaly detection. Our approach leverages a Transformer encoder along with the use of patches under a reconstruction-based framework for anomaly detection. Empirical evaluations on multiple benchmark datasets show that PatchTrAD is on par, in terms of detection performance, with state-of-the-art deep learning models for anomaly detection while being time efficient during inference.
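
The two ingredients named above, patching and reconstruction-based scoring, can be illustrated with a short sketch. This is not the PatchTrAD model: the Transformer encoder is replaced by a trivial stand-in (`mean_reconstruction`), and all function names, the patch length, and the stride are assumptions chosen for illustration.

```python
def make_patches(window, patch_len, stride):
    """Split a 1-D window into (possibly overlapping) patches of equal length."""
    return [window[i:i + patch_len]
            for i in range(0, len(window) - patch_len + 1, stride)]


def reconstruction_score(patch, reconstructed):
    """Mean squared error between a patch and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(patch, reconstructed)) / len(patch)


def mean_reconstruction(patch):
    """Stand-in for a trained model: reconstruct every point as the patch mean."""
    m = sum(patch) / len(patch)
    return [m] * len(patch)


# A flat signal with one spike; the patch containing the spike scores higher.
window = [1.0, 1.0, 1.0, 1.0, 1.0, 9.0, 1.0, 1.0]
patches = make_patches(window, patch_len=4, stride=4)
scores = [reconstruction_score(p, mean_reconstruction(p)) for p in patches]
print(scores)
```

In a reconstruction-based detector, a window is flagged as anomalous when its score exceeds a threshold calibrated on normal data.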
Rita Sevastjanova, Robin Gerling, Thilo Spinner, Mennatallah El-Assady
Large language models (LLMs) represent words through contextual word embeddings encoding different language properties like semantics and syntax. Understanding these properties is crucial, especially for researchers investigating language model capabilities, employing embeddings for tasks related to text similarity, or evaluating the reasons behind token importance as measured through attribution methods. Applications for embedding exploration frequently involve dimensionality reduction techniques, which reduce high-dimensional vectors to two dimensions used as coordinates in a scatterplot. This data transformation step introduces uncertainty that can be propagated to the visual representation and influence users' interpretation of the data. To communicate such uncertainties, we present LayerFlow - a visual analytics workspace that displays embeddings in an interlinked projection design and communicates the transformation, representation, and interpretation uncertainty. In particular, to hint at potential data distortions and uncertainties, the workspace includes several visual components, such as convex hulls showing 2D and HD clusters, data point pairwise distances, cluster summaries, and projection quality metrics. We show the usability of the presented workspace through replication and expert case studies that highlight the need to communicate uncertainty through multiple visual components and different data perspectives.
T. A. Stockmans, F. Snik, J. M. Smit, J. H. H. Rietjens, M. Esposito, C. van Dijk, C. U. Keller
Modern detector manufacturing allows spectral and polarimetric filters to be
directly integrated on top of separate detector pixels. This enables the
creation of CubeSat-sized spectro-polarimetric instruments that are not much
larger than the detector and a lens. The redundancy inherent to the observed scene
offers the opportunity for sparse sampling in the form of not scanning all
filters at every location. However, when there are fewer pushbroom steps than
filters, data are missing in the resulting data cube. The missing, largely
redundant data can be filled in with interpolation methods, often called
demosaicers. The choice of filters and their precise layout influences the
performance of the instrument after the demosaicing process. In these
proceedings we describe a part of a design toolbox for both the filter layout
and the optimum parameters for the reconstruction to a full
spectro-polarimetric data cube. The design tool is based on training a (neural)
network and jointly updating the values of the filters and demosaicer. We
optimized a filter layout by training on spectro-polarimetric remote
observations of the Earth acquired by SPEX airborne. This optimised filter
layout could reconstruct a validation scene from five overlapping snapshots
(pushbroom steps), which would take 109 pushbroom steps when measuring with a
classical layout and no reconstruction.
Authors' comments: 5 pages, 3 figures, conference proceedings
Jihun Park, Jongmin Gim, Kyoungmin Lee, Minseok Oh, Minwoo Choi, Jaeyeul Kim, Woo Chool Park, Sunghoon Im
We present a training-free style-aligned image generation method that
leverages a scale-wise autoregressive model. While large-scale text-to-image
(T2I) models, particularly diffusion-based methods, have demonstrated
impressive generation quality, they often suffer from style misalignment across
generated image sets and slow inference speeds, limiting their practical
usability. To address these issues, we propose three key components: initial
feature replacement to ensure consistent background appearance, pivotal feature
interpolation to align object placement, and dynamic style injection, which
reinforces style consistency using a schedule function. Unlike previous methods
requiring fine-tuning or additional training, our approach maintains fast
inference while preserving individual content details. Extensive experiments
show that our method achieves generation quality comparable to competing
approaches, significantly improves style alignment, and delivers inference
speeds over six times faster than the fastest model.
Authors' comments: 17 pages, 15 figures
Ivan Ilin, Peter Richtarik
This paper presents Thanos, a novel weight-pruning algorithm designed to
reduce the memory footprint and enhance the computational efficiency of large
language models (LLMs) by removing redundant weights while maintaining
accuracy. Thanos introduces a block-wise pruning strategy with adaptive masks
that dynamically adjust to weight importance, enabling flexible sparsity
patterns and structured formats, such as $n:m$ sparsity, optimized for hardware
acceleration. Experimental evaluations demonstrate that Thanos achieves
state-of-the-art performance in structured pruning and outperforms existing
methods in unstructured pruning. By providing an efficient and adaptable
approach to model compression, Thanos offers a practical solution for deploying
large models in resource-constrained environments.
Authors' comments: 8 pages, 3 Figures, 3 Tables, 2 Algorithms, paper comes with Appendix
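
To make the $n:m$ format mentioned above concrete, here is a minimal sketch that builds an $n:m$ sparse row by plain magnitude pruning. It is not the Thanos algorithm, which uses block-wise pruning with adaptive masks; the function name is hypothetical, and the sketch assumes the convention that at most n weights in each group of m are nonzero (some papers count the zeros instead).

```python
def nm_prune(row, n, m):
    """Keep the n largest-magnitude weights in every group of m; zero the rest."""
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    pruned = []
    for start in range(0, len(row), m):
        group = row[start:start + m]
        # Indices of the n largest |w| within this group.
        order = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(order[:n])
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


weights = [0.1, -0.9, 0.5, 0.2, 3.0, -0.1, 0.0, -2.0]
print(nm_prune(weights, n=2, m=4))
```

Because every group of m consecutive weights has the same number of nonzeros, the pattern is regular enough for hardware to exploit, which is the property the abstract refers to.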
Laura M Fernández-Pardo, Jorge Rodríguez-López
We present a version of the Krasnosel'skii fixed point theorem for operators acting on Cartesian products of normed linear spaces, under cone-compression and cone-expansion conditions of norm type. Our approach, based on the fixed point index theory in cones, guarantees the existence of a coexistence fixed point, that is, one with nontrivial components. As an application, we prove the existence of periodic solutions with strictly positive components for a system of second-order differential equations. In particular, we address cases involving singular nonlinearities and hybrid terms, characterized by sublinear behavior in one component and superlinear behavior in the other.
Shivesh Prakash, Viki Kumar Prasad, Hans-Arno Jacobsen
We introduce MHNpath, a machine learning-driven retrosynthetic tool designed for computer-aided synthesis planning. Leveraging modern Hopfield networks and novel comparative metrics, MHNpath efficiently prioritizes reaction templates, improving the scalability and accuracy of retrosynthetic predictions. The tool incorporates a tunable scoring system that allows users to prioritize pathways based on cost, reaction temperature, and toxicity, thereby facilitating the design of greener and cost-effective reaction routes. We demonstrate its effectiveness through case studies involving complex molecules from ChemByDesign, showcasing its ability to predict novel synthetic and enzymatic pathways. Furthermore, we benchmark MHNpath against existing frameworks, replicating experimentally validated "gold-standard" pathways from PaRoutes. Our case studies reveal that the tool can generate shorter, cheaper, moderate-temperature routes employing green solvents, as exemplified by compounds such as dronabinol, arformoterol, and lupinine.
Reza Esfandiarpoor, George Zerveas, Ruochen Zhang, Macton Mgonzo, Carsten Eickhoff, Stephen H. Bach
Recent advancements in large language models (LLMs) have allowed the
augmentation of information retrieval (IR) pipelines with synthetic data in
various ways. Yet, the main training paradigm remains: contrastive learning
with binary relevance labels and the InfoNCE loss, where one positive document
is compared against one or more negatives. This objective treats all documents
that are not explicitly annotated as relevant on an equally negative footing,
regardless of their actual degree of relevance, thus (a) missing subtle nuances
that are useful for ranking and (b) being susceptible to annotation noise. To
overcome this limitation, in this work we forgo real training documents and
annotations altogether and use open-source LLMs to directly generate synthetic
documents that answer real user queries according to several different levels
of relevance. This fully synthetic ranking context of graduated relevance,
together with an appropriate list-wise loss (Wasserstein distance), enables us
to train dense retrievers in a way that better captures the ranking task.
Experiments on various IR datasets show that our proposed approach outperforms
conventional training with InfoNCE by a large margin. Without using any real
documents for training, our dense retriever significantly outperforms the same
retriever trained through self-supervision. More importantly, it matches the
performance of the same retriever trained on real, labeled training documents
of the same dataset, while being more robust to distribution shift and clearly
outperforming it when evaluated zero-shot on the BEIR dataset collection.
Authors' comments: Code: https://github.com/BatsResearch/sycl
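
For reference, the InfoNCE objective that the abstract uses as its baseline can be sketched for a single query as follows. The function name, the scalar similarity scores, and the temperature default are assumptions for illustration; the sketch does not reproduce the paper's list-wise Wasserstein loss.

```python
import math


def info_nce(pos_score, neg_scores, temperature=1.0):
    """InfoNCE for one query: negative log-softmax of the positive document."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


# Every non-positive document enters the loss as a plain negative, whether it
# is partially relevant or completely off-topic; only its score matters.
print(info_nce(pos_score=5.0, neg_scores=[0.0, 0.0]))
print(info_nce(pos_score=1.0, neg_scores=[0.0, 0.0]))
```

The binary structure is visible in the signature: there is one positive slot and an undifferentiated list of negatives, so graded relevance labels have nowhere to go.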
Yuyuan Li, Junjie Fang, Chaochao Chen, Xiaolin Zheng, Yizhao Zhang, Zhongxuan Han
In this paper, we reproduce the experimental results presented in our previous work titled "Making Users Indistinguishable: Attribute-wise Unlearning in Recommender Systems," which was published in the proceedings of the 31st ACM International Conference on Multimedia. This paper aims to validate the effectiveness of our proposed method and help others reproduce our experimental results. We provide detailed descriptions of our preprocessed datasets, source code structure, configuration file settings, experimental environment, and reproduced experimental results.
Yuta Tomokiyo, Keita Nishimoto, Kimitaka Asatani, Ichiro Sakata
Researchers are no longer limited to producing knowledge; in today's complex world, they also address societal challenges by engaging in policymaking. Although involvement in policymaking has expanded, direct empirical evidence of its career benefits remains underexplored. Prior survey-based studies suggest potential advantages, such as broader professional networks and enhanced opportunities, yet raise concerns about insufficient institutional support. Here, we examine the 2021 WHO global air quality guideline, a science-based regulatory guideline, as a case study. To evaluate the impact of guideline development on research outcomes, we match guideline researchers with a control group of peers sharing similar research topics and prior performance. Our analysis reveals that guideline researchers attain higher future citation counts in both academic and policy domains. New collaborations formed during development yield publications with higher citation impact and a higher disruptive index. Moreover, about half the guideline's references are derived from guideline researchers' papers, highlighting their central role in shaping the evidence base. These results provide empirical support for the career benefits of policy engagement. Our findings indicate that engaging in international guideline development offers tangible career incentives for researchers, and that institutions can enhance research impact and promote innovative scientific progress by actively supporting their researchers' participation in such initiatives.
Zhuo-Yang Song, Zeyu Li, Qing-Hong Cao, Ming-xing Luo, Hua Xing Zhu
The geometric evolution of token representations in large language models
(LLMs) presents a fundamental paradox: while human language inherently
organizes semantic information in low-dimensional spaces ($\sim 10^1$
dimensions), modern LLMs employ high-dimensional embeddings ($\sim 10^3$
dimensions) processed through Transformer architectures. To resolve this
paradox, we develop a geometric framework that tracks token dynamics across
Transformer layers. Through
layer-wise analysis of intrinsic dimensions across multiple architectures, we
reveal an expansion-contraction pattern where tokens diffuse to a "working
space" and then progressively project onto lower-dimensional submanifolds. Our
finding implies a negative correlation between the working space dimension and
parameter-sensitive performance of the LLMs, and indicates that effective
models tend to compress tokens into approximately 10-dimensional submanifolds,
closely resembling human semantic spaces. This work not only advances LLM
interpretability by reframing Transformer layers as projectors that mediate
between high-dimensional computation and low-dimensional semantics, but also
provides practical tools for model diagnostics that do not rely on
task-specific evaluations.
Authors' comments: 17 pages, 9 figures, 2 tables
Jerry Jun-Yan Zhang, Nicolas Lodieu, Eduardo L. Martín, Pascal Tremblin, María Rosa Zapatero Osorio, Víctor J. S. Béjar, Nikola Vitas, Bartosz Gauza et al.
WISEA J181006.18-101000.5 (WISE1810) is the nearest metal-poor ultracool
dwarf to the Sun. It has a low effective temperature and has been classified as
an extreme early-T subdwarf. However, methane, the characteristic molecule of the
spectral class T, was not seen in the previous low-resolution spectrum. Using
the 10.4-m Gran Telescopio Canarias, we collected a high-quality JHK-band
intermediate-resolution R~5000 spectrum of WISE1810, in which methane at
17+/-6 ppm is clearly detected, while carbon monoxide is absent. Based on a
custom-computed ATMO2020++ model, we estimated an effective temperature of 1000+/-100
K, a high surface gravity of log g = 5.5+/-0.5 dex, and a carbon abundance
[C/H]=-1.5+/-0.2 dex, from which we infer [Fe/H]=-1.7+/-0.2 dex. Potassium is not seen in
our data, and the upper limits on the pseudo-equivalent widths of J-band atomic
lines are at least 25 to 60 times smaller than the values measured from
solar-metallicity early-T counterparts. We measured a heliocentric radial
velocity of -83+/-13 km/s, implying that WISE1810 is more likely a thick disk
member.
Authors' comments: 7 pages, 2 figures in text; 5 figures in appendices. Accepted in ApJL
Inpyo Hong, Youngwan Jo, Hyojeong Lee, Sunghyun Ahn, Sanghyun Park
Zero-shot quantization (ZSQ) enables neural network compression without original training data, making it a promising solution for restricted data access scenarios. To compensate for the lack of data, recent ZSQ methods typically rely on synthetic inputs generated from the full-precision model. However, these synthetic inputs often lead to activation distortion, especially under low-bit settings, and existing methods struggle to mitigate this distortion because their activation scaling is coarse. To address this issue, we propose GranQ, a novel activation quantization framework that efficiently applies per-channel scaling through vectorized computation. In contrast to conventional channel-wise methods, which apply vectorization only to the quantization step, GranQ improves efficiency by vectorizing the scaling operation. This design allows GranQ to maintain fine-grained quantization granularity with minimal computational overhead, even in low-bit environments. Extensive experiments under quantization-aware training (QAT) settings demonstrate that GranQ consistently outperforms state-of-the-art ZSQ methods across CIFAR and ImageNet. In particular, our method achieves up to 5.45% higher accuracy in the 3-bit setting on CIFAR-100 and even surpasses the full-precision baseline on CIFAR-10. Furthermore, GranQ achieves significant speedup in quantization latency over conventional per-channel methods, demonstrating improved efficiency. With these findings, we anticipate that GranQ will inspire future research beyond conventional ZSQ approaches centered on data generation and model fine-tuning.
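
The difference between coarse (per-tensor) and per-channel activation scaling can be seen in a small sketch. It illustrates only the granularity argument, not GranQ's vectorized implementation; the symmetric uniform quantizer, the function names, and the toy activations are assumptions for illustration.

```python
def quantize(values, scale, bits):
    """Symmetric uniform quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def per_tensor(channels, bits):
    """One scale shared by all channels (coarse granularity)."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for ch in channels for v in ch) / qmax or 1.0
    return [quantize(ch, scale, bits) for ch in channels]


def per_channel(channels, bits):
    """One scale per channel (fine granularity)."""
    qmax = 2 ** (bits - 1) - 1
    return [quantize(ch, max(abs(v) for v in ch) / qmax or 1.0, bits)
            for ch in channels]


def mse(original, restored):
    """Mean squared error over all entries of two nested lists."""
    pairs = [(a, b) for ch_o, ch_r in zip(original, restored)
             for a, b in zip(ch_o, ch_r)]
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


# Two channels with very different ranges, quantized to 3 bits.
acts = [[0.1, -0.2, 0.3], [10.0, -20.0, 30.0]]
print(mse(acts, per_tensor(acts, 3)), mse(acts, per_channel(acts, 3)))
```

With a shared scale, the small-range channel collapses to zero, which is the kind of low-bit distortion the abstract attributes to coarse scaling.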
Joshua Näf, Keith Moffat, Jaap Eising, Florian Dörfler
This paper proposes Select-Data-driven Predictive Control (Select-DPC), a new output-feedback method for controlling nonlinear systems for which data are available but an explicit model is not. At each timestep, Select-DPC employs only the most relevant data to implicitly linearize the dynamics in "trajectory space". Then, taking user-defined output constraints into account, it makes control decisions using a convex optimization. This optimal control is applied in a receding-horizon manner. As the online data-selection is the core of Select-DPC, we propose and verify both norm-based and manifold-embedding-based selection methods. We evaluate Select-DPC on three benchmark nonlinear system simulators -- rocket landing, a robotic arm, and cart-pole inverted pendulum swing-up -- comparing it with standard Data-enabled Predictive Control (DeePC) and Time-Windowed DeePC methods, and find that Select-DPC outperforms both methods.
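
One plausible reading of norm-based data selection is a nearest-neighbor lookup: at each timestep, keep the stored trajectory segments closest to the most recent measurements. The sketch below assumes a Euclidean norm and flat lists of samples; the function name and data layout are hypothetical and are not taken from the paper.

```python
import math


def select_nearest(trajectories, query, k):
    """Return the k stored trajectories closest to the query in Euclidean norm."""
    def dist(traj):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(traj, query)))
    return sorted(trajectories, key=dist)[:k]


# Stored trajectory segments and the most recent measured segment.
library = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
recent = [1.0, 1.0]
print(select_nearest(library, recent, k=2))
```

Restricting the data to a neighborhood of the current operating point is what allows a linear, data-driven predictor to approximate nonlinear dynamics locally.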
Youhui Zuo, Sibo Wei, Chen Zhang, Zhuorui Liu, Wenpeng Lu, Dawei Song
With the advancements in long-context inference capabilities of large language models (LLMs), the KV cache has become one of the foundational components. However, its substantial GPU memory consumption makes KV cache compression a key technique for enabling efficient LLM inference in industrial scenarios. While recent studies have focused on optimizing the memory occupied by the KV cache, they overlook two critical factors: preserving semantic coherence and considering task-specific characteristics during compression. To address these limitations, we propose a novel task-adaptive KV cache window selection method, WindowKV. WindowKV dynamically selects local semantic windows consisting of consecutive tokens, according to task-specific characteristics, ensuring the retained KV cache captures continuous, essential context. Additionally, we introduce an intra-group layer KV cache indices sharing strategy to reduce computational overhead, achieving a balance between performance and efficiency. We rigorously evaluate WindowKV on the LongBench benchmark, and the results demonstrate that it maintains a performance comparable to full KV cache retention while using only 12% of the original KV cache, significantly reducing memory requirements. Furthermore, our method also achieves state-of-the-art results in the Needle-in-a-Haystack evaluation, highlighting its effectiveness and robustness.
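
The idea of retaining windows of consecutive tokens, rather than isolated high-scoring tokens, can be sketched as follows. This is an illustrative simplification, not the WindowKV method: the fixed non-overlapping windows, the sum-of-scores ranking, and the names `select_windows`, `window_size`, and `budget` are all assumptions, and the task-adaptive and layer-sharing components are omitted.

```python
def select_windows(token_scores, window_size, budget):
    """Keep the highest-scoring non-overlapping windows of consecutive tokens."""
    windows = []
    for start in range(0, len(token_scores), window_size):
        end = min(start + window_size, len(token_scores))
        windows.append((sum(token_scores[start:end]), start, end))
    windows.sort(reverse=True)  # best window score first
    kept, used = [], 0
    for _, start, end in windows:
        # Add whole windows only, as long as the token budget allows it.
        if used + (end - start) <= budget:
            kept.extend(range(start, end))
            used += end - start
    return sorted(kept)


# Hypothetical per-token importance scores (e.g. aggregated attention).
scores = [0.1, 0.1, 0.9, 0.8, 0.0, 0.1, 0.7, 0.6]
print(select_windows(scores, window_size=2, budget=4))
```

Selecting whole windows keeps neighboring tokens together in the retained cache, which is the semantic-coherence property the abstract emphasizes.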
Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Syed Zawad, Holger Boche
Fine-tuning large language models (LLMs) on downstream tasks can inadvertently erode their safety alignment, even for benign fine-tuning datasets. We address this challenge by proposing SafeMERGE, a post-fine-tuning framework that preserves safety while maintaining task utility. It achieves this by selectively merging fine-tuned and safety-aligned model layers only when those deviate from safe behavior, measured by a cosine similarity criterion. We evaluate SafeMERGE against other fine-tuning- and post-fine-tuning-stage approaches for Llama-2-7B-Chat and Qwen-2-7B-Instruct models on GSM8K and PubMedQA tasks while exploring different merging strategies. We find that SafeMERGE consistently reduces harmful outputs compared to other baselines without significantly sacrificing performance, sometimes even enhancing it. The results suggest that our selective, subspace-guided, and per-layer merging method provides an effective safeguard against the inadvertent loss of safety in fine-tuned LLMs while outperforming simpler post-fine-tuning-stage defenses.
Mingyang Song, Mao Zheng, Zheng Li, Wenjie Yang, Xuan Luo, Yue Pan, Feng Zhang
Improving training efficiency continues to be one of the primary challenges
in large-scale Reinforcement Learning (RL). In this paper, we investigate how
context length and the complexity of training data influence the RL scaling
training process of R1-distilled small reasoning models, e.g.,
DeepSeek-R1-Distill-Qwen-1.5B. Our experimental results reveal that: (1) simply
controlling the context length and curating the training data based on the
input prompt length can effectively improve the training efficiency of scaling
RL, achieving better performance with more concise CoT; (2) properly scaling
the context length helps mitigate entropy collapse; and (3) choosing an optimal
context length can improve the efficiency of model training and incentivize the
model's chain-of-thought reasoning capabilities. Inspired by these insights, we
propose FastCuRL, a curriculum RL framework with stage-wise context scaling to
achieve efficient training and concise CoT reasoning. Experimental results
demonstrate that FastCuRL-1.5B-V3 significantly outperforms state-of-the-art
reasoning models on five competition-level benchmarks and achieves 49.6%
accuracy on AIME 2024. Furthermore, FastCuRL-1.5B-Preview surpasses
DeepScaleR-1.5B-Preview on five benchmarks while only using a single node with
8 GPUs and a total of 50% of training steps.
Authors' comments: Ongoing Work