Yingxin Li, Ye Li, Yuan Meng, Xinzhu Ma, Zihan Geng, Shutao Xia, Zhi Wang
As large language models (LLMs) continue to advance, the demand for higher quality and faster processing of long contexts across various applications is growing. KV cache is widely adopted as it stores previously generated key and value tokens, effectively reducing redundant computations during inference. However, as memory overhead becomes a significant concern, efficient compression of KV cache has gained increasing attention. Most existing methods perform compression from two perspectives: identifying important tokens and designing compression strategies. However, these approaches often produce biased distributions of important tokens due to the influence of accumulated attention scores or positional encoding. Furthermore, they overlook the sparsity and redundancy across different heads, which leads to difficulties in preserving the most effective information at the head level. To this end, we propose EMS to overcome these limitations, while achieving better KV cache compression under extreme compression ratios. Specifically, we introduce a Global-Local score that combines accumulated attention scores from both global and local KV tokens to better identify the token importance. For the compression strategy, we design an adaptive and unified Evict-then-Merge framework that accounts for the sparsity and redundancy of KV tokens across different heads. Additionally, we implement the head-wise parallel compression through a zero-class mechanism to enhance efficiency. Extensive experiments demonstrate our SOTA performance even under extreme compression ratios. EMS consistently achieves the lowest perplexity, improves scores by over 1.28 points across four LLMs on LongBench under a 256 cache budget, and preserves 95% retrieval accuracy with a cache budget less than 2% of the context length in the Needle-in-a-Haystack task.
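The abstract names two ingredients: a score that mixes global and local accumulated attention, and a step that evicts low-score tokens and then merges them into retained ones. The sketch below illustrates both for a single head under loudly hypothetical choices. The mixing weight `alpha`, the local `window`, assigning each evicted token to the kept token with the most similar original key (cosine similarity), and plain averaging as the merge rule are all assumptions, not EMS's actual formulas. The zero-class head-parallel mechanism is not modeled.

```python
import math

def global_local_scores(attn, window, alpha=0.5):
    """Hypothetical Global-Local score for one head.

    attn[q][k] is the attention weight of query q on key k. The global term
    accumulates attention over all queries; the local term only over the
    last `window` queries. `alpha` mixes the two (illustrative choice).
    """
    n_keys = len(attn[0])
    global_s = [sum(row[k] for row in attn) for k in range(n_keys)]
    local_s = [sum(row[k] for row in attn[-window:]) for k in range(n_keys)]
    return [alpha * g + (1 - alpha) * l for g, l in zip(global_s, local_s)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def evict_then_merge(keys, values, scores, budget):
    """Keep the `budget` highest-scoring tokens, then merge every evicted
    token into the kept token whose original key is most similar, using
    running averages. Returns kept indices and merged keys/values."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept, evicted = sorted(order[:budget]), order[budget:]
    merged_k = {i: list(keys[i]) for i in kept}
    merged_v = {i: list(values[i]) for i in kept}
    counts = {i: 1 for i in kept}
    for e in evicted:
        target = max(kept, key=lambda i: cosine(keys[e], keys[i]))
        counts[target] += 1
        c = counts[target]
        # Running mean: m_new = m + (x - m) / c
        merged_k[target] = [m + (x - m) / c for m, x in zip(merged_k[target], keys[e])]
        merged_v[target] = [m + (x - m) / c for m, x in zip(merged_v[target], values[e])]
    return kept, [merged_k[i] for i in kept], [merged_v[i] for i in kept]

# Toy example: 3 queries, 4 cached tokens, cache budget of 2 tokens.
attn = [[0.7, 0.1, 0.1, 0.1], [0.4, 0.3, 0.2, 0.1], [0.5, 0.1, 0.1, 0.3]]
keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
values = [[1.0], [2.0], [3.0], [4.0]]
kept, merged_keys, merged_values = evict_then_merge(
    keys, values, global_local_scores(attn, window=1), budget=2)
# kept == [0, 3]: token 1 is merged into token 0 and token 2 into token 3
```

Merging rather than simply dropping evicted tokens keeps some of their information in the compressed cache, which is the motivation the abstract gives for an evict-then-merge design.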
Yang Gao, Enci Wang, Qing-Hua Tan, Timothy A. Davis, Fu-Heng Liang, Xue-Jian Jiang, Ning Gai, Qian Jiao et al.
We present the analysis of a comprehensive sample of 352 early-type galaxies
using public data, to investigate the correlations between CO luminosities and
mid-infrared luminosities observed by \textit{Wide-field Infrared Survey
Explorer} (\textit{WISE}). We find strong correlations between both CO (1-0)
and CO (2-1) luminosities and 12 \micron\ luminosity, with a correlation
coefficient greater than 0.9 and an intrinsic scatter smaller than 0.1 dex. The
consistent slopes observed for the relationships of CO (1-0) and CO (2-1)
suggest that the line ratio R21 lacks correlation with mid-infrared emission in
early-type galaxies, which is significantly different from star-forming
galaxies. Moreover, the slopes of $L_{\rm CO (1-0)}$--$L_{\mbox{12\micron}}$
and $L_{\rm CO (2-1)}$--$L_{\mbox{12\micron}}$ relations in early-type galaxies
are steeper than those observed in star-forming galaxies. Given that the
deviations from these relations show no correlation with color, morphology, or
sSFR, their correlation with the molecular gas mass surface density could be
eliminated by correcting for possible 12 \micron\ emission from old stars or by
adopting a systematically different $\alpha_{\rm CO}$. The latter is, on
average, equivalent to adding a constant CO brightness density of
$2.8^{+0.8}_{-0.6}~\mathrm{K~km~s^{-1}}$ and
$4.4^{+2.2}_{-1.4}~\mathrm{K~km~s^{-1}}$ for CO (1-0)
and CO (2-1), respectively. These explorations will serve as useful tools for
estimating the molecular gas content in gas-poor galaxies and understanding
associated quenching processes.
Authors' comments: 20 pages, 6 figures, accepted for publication in ApJ
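Correlation coefficients and scatter of this kind come from fitting a straight line in log-log space. A minimal sketch, assuming ordinary least squares on log10 luminosities and placeholder data. The RMS residual it returns is the total scatter, not the intrinsic scatter quoted above, which requires a fit that models measurement errors; the paper's actual fitting method is not reproduced here.

```python
import math

def loglog_fit(x_lum, y_lum):
    """Fit log10(y) = slope * log10(x) + intercept by ordinary least squares.

    Returns (slope, intercept, pearson_r, rms_dex), where rms_dex is the
    root-mean-square residual in dex (total, not intrinsic, scatter).
    """
    lx = [math.log10(v) for v in x_lum]
    ly = [math.log10(v) for v in y_lum]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    rms = math.sqrt(sum((b - slope * a - intercept) ** 2 for a, b in zip(lx, ly)) / n)
    return slope, intercept, r, rms

# Placeholder luminosities lying exactly on a power law with slope 1.2.
l12 = [1e8, 1e9, 1e10]
lco = [10 ** (1.2 * math.log10(v) - 2.0) for v in l12]
slope, intercept, r, rms = loglog_fit(l12, lco)
```

Comparing the fitted slope for CO (1-0) and CO (2-1) against the same 12 \micron\ luminosities is what underlies the statement about the line ratio R21 above.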
Sanjay Mishra, Chander Mohan Bishnoi
Cardinal functions provide valuable insight into the topological properties of spaces, helping to analyze and compare spaces in terms of their covering, convergence and separation properties. This paper focuses on investigating cardinal functions like network weight, Lindelöf degree, tightness, weak covering, pseudocharacter, and $i$-weight, for the spaces $Q_{P}(X)$ and $Q_{P}(X,Y)$ of quasi-continuous functions under the topology of point-wise convergence. In addition to these, we also investigate properties of restriction and induced maps associated with the spaces $Q_{P}(X)$ and $Q_{P}(X,Y)$.
Boyao Zhou, Shunyuan Zheng, Hanzhang Tu, Ruizhi Shao, Boning Liu, Shengping Zhang, Liqiang Nie, Yebin Liu
Differentiable rendering techniques have recently shown promising results for
free-viewpoint video synthesis of characters. However, such methods, either
Gaussian Splatting or neural implicit rendering, typically necessitate
per-subject optimization which does not meet the requirement of real-time
rendering in an interactive application. We propose a generalizable Gaussian
Splatting approach for high-resolution image rendering under a sparse-view
camera setting. To this end, we introduce Gaussian parameter maps defined on
the source views and directly regress Gaussian properties for instant novel
view synthesis without any fine-tuning or optimization. We train our Gaussian
parameter regression module on human-only data or human-scene data, jointly
with a depth estimation module to lift 2D parameter maps to 3D space. The
proposed framework is fully differentiable with both depth and rendering
supervision or with only rendering supervision. We further introduce a
regularization term and an epipolar attention mechanism to preserve geometry
consistency between two source views, especially when neglecting depth
supervision. Experiments on several datasets demonstrate that our method
outperforms state-of-the-art methods while achieving superior rendering
speed.
Authors' comments: Journal extension of CVPR 2024, Project
page: https://yaourtb.github.io/GPS-Gaussian+
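Lifting 2D parameter maps to 3D with an estimated depth map amounts to back-projecting each pixel. A minimal sketch under a standard pinhole model; the intrinsics (`fx`, `fy`, `cx`, `cy`), the z-depth convention, and the camera-to-world pose are assumptions about the setup rather than details given in the abstract.

```python
def unproject_pixel(u, v, depth, fx, fy, cx, cy, rot=None, trans=None):
    """Back-project pixel (u, v) with z-depth `depth` through a pinhole camera.

    Returns the point in camera coordinates, or in world coordinates when a
    camera-to-world rotation `rot` (3x3) and translation `trans` are given:
    x_world = rot @ x_cam + trans.
    """
    p_cam = [(u - cx) * depth / fx, (v - cy) * depth / fy, depth]
    if rot is None:
        return p_cam
    trans = trans if trans is not None else [0.0, 0.0, 0.0]
    return [sum(rot[i][j] * p_cam[j] for j in range(3)) + trans[i] for i in range(3)]

def unproject_map(depth_map, fx, fy, cx, cy):
    """Lift an H x W depth map to per-pixel 3D positions (camera frame),
    e.g. as centers for pixel-aligned Gaussians."""
    return [[unproject_pixel(u, v, d, fx, fy, cx, cy) for u, d in enumerate(row)]
            for v, row in enumerate(depth_map)]

# A pixel 100 px right of the principal point at 2 m depth, f = 500 px.
p = unproject_pixel(420, 240, 2.0, 500.0, 500.0, 320.0, 240.0)
```

Because this mapping is differentiable in `depth`, rendering losses can propagate back into the depth estimation module, which is consistent with the abstract's note that training can use rendering supervision alone.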
Huaqin Zhao, Jiaxi Li, Yi Pan, Shizhe Liang, Xiaofeng Yang, Wei Liu, Xiang Li, Fei Dou et al.
Fine-tuning large language models (LLMs) poses significant memory challenges, as the back-propagation process demands extensive resources, especially with growing model sizes. Recent work, MeZO, addresses this issue using a zeroth-order (ZO) optimization method, which reduces memory consumption to that of inference. However, MeZO converges slowly due to varying curvatures across model parameters. To overcome this limitation, we introduce HELENE, a novel scalable and memory-efficient optimizer that integrates annealed A-GNB gradients with a diagonal Hessian estimation and layer-wise clipping, serving as a second-order pre-conditioner. This combination allows for faster and more stable convergence. Our theoretical analysis demonstrates that HELENE improves convergence rates, particularly for models with heterogeneous layer dimensions, by reducing the dependency on the total parameter space dimension. Instead, the method scales with the largest layer dimension, making it highly suitable for modern LLM architectures. Experimental results on RoBERTa-large and OPT-1.3B across multiple tasks show that HELENE achieves up to a 20x speedup compared to MeZO, with average accuracy improvements of 1.5%. Furthermore, HELENE remains compatible with both full parameter tuning and parameter-efficient fine-tuning (PEFT), outperforming several state-of-the-art optimizers. The code will be released after review.
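MeZO-style zeroth-order training estimates gradients from two forward passes that share one random perturbation (SPSA), so no back-propagation memory is needed. The sketch shows that estimator together with a layer-wise clipping step. It deliberately omits HELENE's annealed A-GNB diagonal-Hessian preconditioner, uses Rademacher rather than Gaussian perturbations for simplicity, and clips the per-layer update norm; that clipping target and the `layer_slices` grouping are illustrative assumptions, not necessarily what HELENE clips.

```python
import math
import random

def spsa_grad(loss_fn, theta, eps, rng):
    """Two-point SPSA estimate: [(L(theta + eps*z) - L(theta - eps*z)) / (2*eps)] * z.
    Uses Rademacher z here (MeZO samples Gaussian z)."""
    z = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + eps * zi for t, zi in zip(theta, z)]
    minus = [t - eps * zi for t, zi in zip(theta, z)]
    scale = (loss_fn(plus) - loss_fn(minus)) / (2 * eps)
    return [scale * zi for zi in z]

def clip_layerwise(update, layer_slices, max_norm):
    """Rescale each layer's slice of the update so its L2 norm <= max_norm."""
    out = list(update)
    for start, stop in layer_slices:
        norm = math.sqrt(sum(u * u for u in out[start:stop]))
        if norm > max_norm:
            factor = max_norm / norm
            for i in range(start, stop):
                out[i] *= factor
    return out

def zo_step(loss_fn, theta, layer_slices, lr=0.05, eps=1e-3, max_norm=1.0, rng=random):
    """One forward-only update: SPSA gradient, layer-wise clipping, SGD step."""
    g = spsa_grad(loss_fn, theta, eps, rng)
    update = clip_layerwise([lr * gi for gi in g], layer_slices, max_norm)
    return [t - u for t, u in zip(theta, update)]

# Toy objective with two "layers" of two parameters each.
def loss(theta):
    return sum(t * t for t in theta)

rng = random.Random(0)
theta = [1.0, 1.0, 1.0, 1.0]
for _ in range(200):
    theta = zo_step(loss, theta, layer_slices=[(0, 2), (2, 4)], rng=rng)
```

A single step uses the same curvature-agnostic learning rate in every direction; this is the weakness the abstract attributes to MeZO and the reason HELENE adds a diagonal second-order preconditioner.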
Ivica Kopriva, Dario Sitnik, Laura-Isabelle Dion-Bertrand, Marija Milković Periša, Mirko Hadžija, Marijana Popović Hadžija
Hyperspectral imaging (HSI) holds significant potential for transforming the
field of computational pathology. However, there is currently a shortage of
pixel-wise annotated HSI data necessary for training deep learning (DL) models.
Additionally, the number of HSI-based research studies remains limited, and in
many cases, the advantages of HSI over traditional RGB imaging have not been
conclusively demonstrated, particularly for specimens collected
intraoperatively. To address these challenges, we present a database consisting
of 27 HSIs of hematoxylin-eosin stained frozen sections, collected from 14
patients with colon adenocarcinoma metastasized to the liver. It is intended to
validate pixel-wise classification for intraoperative tumor resection. The HSIs
were acquired in the spectral range of 450 to 800 nm, with a resolution of 1
nm, resulting in images of 1384x1035 pixels. Pixel-wise annotations were
performed by three pathologists. To overcome challenges such as experimental
variability and the lack of annotated data, we combined label-propagation-based
semi-supervised learning (SSL) with spectral-spatial features extracted by the
multiscale principle of relevant information (MPRI) method and the tensor singular
spectrum analysis method. Using only 1% of labeled pixels per class, the
SSL-MPRI method achieved a micro balanced accuracy (BACC) of 0.9313 and a micro
F1-score of 0.9235 on the HSI dataset. The performance on corresponding RGB
images was lower, with a micro BACC of 0.8809 and a micro F1-score of 0.8688.
These improvements are statistically significant. The SSL-MPRI approach
outperformed six DL architectures trained with 63% of labeled pixels. Data and
code are available at: https://github.com/ikopriva/ColonCancerHSI.
Authors' comments: 12 pages, 5 figures, 5 tables
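Label-propagation SSL spreads the labels of the few annotated pixels over a similarity graph of feature vectors. A minimal sketch of the classic iterative scheme F <- alpha * S F + (1 - alpha) * Y with S = D^{-1/2} W D^{-1/2} on a Gaussian-kernel graph. In the paper the inputs would be MPRI or tensor singular spectrum analysis features; the dense graph, `sigma`, `alpha`, and the iteration count are assumptions, and a real HSI would need a sparse nearest-neighbor graph instead.

```python
import math

def label_propagation(features, labels, n_classes, sigma=1.0, alpha=0.9, n_iter=50):
    """Iterate F <- alpha * S F + (1 - alpha) * Y on a dense Gaussian graph.

    features: list of feature vectors (one per pixel);
    labels: class index for labeled pixels, None for unlabeled ones.
    Returns the predicted class of every pixel. Assumes every node has at
    least one neighbor with non-negligible affinity.
    """
    n = len(features)
    # Gaussian-kernel affinities with a zero diagonal.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    deg = [sum(row) for row in w]
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # One-hot seed matrix Y (all-zero rows for unlabeled pixels).
    y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)] for i in range(n)]
    f = [row[:] for row in y]
    for _ in range(n_iter):
        f = [[alpha * sum(s[i][k] * f[k][c] for k in range(n)) + (1 - alpha) * y[i][c]
              for c in range(n_classes)]
             for i in range(n)]
    return [max(range(n_classes), key=lambda c: f[i][c]) for i in range(n)]

# Two well-separated clusters, one labeled pixel in each.
feats = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labs = [0, None, None, 1, None, None]
pred = label_propagation(feats, labs, n_classes=2)
```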
Rogelio Albarracín, M. Zoccali, J. Olivares Carvajal, Á. Rojas-Arriagada, J. H. Minniti, M. Catelan, M. De Leo, F. Gran et al.
The structure and kinematics of the Milky Way disk are largely inferred from
the solar vicinity. To gain a comprehensive understanding, it is essential to
find reliable tracers in less-explored regions like the bulge and the far side
of the disk. Mira variables, which are well-studied and bright standard
candles, offer an excellent opportunity to trace intermediate and old
populations in these complex regions. We aim to isolate a clean sample of Miras
in the Vista Variables in the Vía Láctea survey using Gaussian process
algorithms. This sample will be used to study intermediate and old age
populations in the Galactic bulge and far disk. Near- and mid-infrared
time-series photometry was processed using Gaussian process algorithms to
identify Mira variables and model their light curves. We calibrated selection
criteria with a visually inspected sample to create a high-purity sample of
Miras, integrating multi-band photometry and kinematic data from proper
motions. We present a catalog of 3602 Mira variables. By analyzing photometry,
we classify them as having O-rich or C-rich surface chemistry and derive
total-to-selective extinction ratios of $A_{K_{s}}/E(J - K_{s}) = 0.471 \pm
0.01$ and $A_{K_{s}}/E(H - K_{s}) = 1.320 \pm 0.020$. Using the Mira period-age
relation, we find evidence supporting the inside-out formation of the Milky Way
disk. The distribution of proper motions and distances aligns with the Galactic
rotation curve and disk kinematics. We extend the rotation curve up to R$_{\rm
GC} \sim 17 \ \rm{kpc}$ and find no strong evidence of the nuclear stellar disk
in our Mira sample. This study constitutes the largest catalog of variable
stars on the far side of the Galactic disk to date.
Authors' comments: 20 pages, 19 figures, Accepted in A&A
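The two ratios convert a near-infrared color excess into Ks-band extinction, A_Ks = 0.471 E(J - Ks) = 1.320 E(H - Ks), where E(color) is the observed minus the intrinsic color. A small worked sketch using only the central values; the intrinsic Mira color must come from elsewhere (for example, a period-color relation), and the magnitudes below are hypothetical placeholders, not catalog values.

```python
# Total-to-selective extinction ratios quoted in the abstract (central values).
A_KS_PER_E_JKS = 0.471  # A_Ks / E(J - Ks)
A_KS_PER_E_HKS = 1.320  # A_Ks / E(H - Ks)

def ks_extinction(observed_color, intrinsic_color, ratio):
    """A_Ks = ratio * E(color), where E(color) = observed - intrinsic."""
    return ratio * (observed_color - intrinsic_color)

def dereddened_ks(ks_mag, observed_color, intrinsic_color, ratio):
    """Extinction-corrected magnitude Ks_0 = Ks - A_Ks."""
    return ks_mag - ks_extinction(observed_color, intrinsic_color, ratio)

# Hypothetical Mira: Ks = 10.0, observed J - Ks = 3.2, assumed intrinsic J - Ks = 1.2.
a_ks = ks_extinction(3.2, 1.2, A_KS_PER_E_JKS)        # 0.471 * 2.0, about 0.942 mag
ks0 = dereddened_ks(10.0, 3.2, 1.2, A_KS_PER_E_JKS)   # about 9.058 mag
# The same A_Ks corresponds to a color excess E(H - Ks) of about 0.714 mag.
e_hks = a_ks / A_KS_PER_E_HKS
```

The dereddened Ks magnitude is what a period-luminosity relation would then turn into a distance.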
Bowei Du, Zhixuan Liao, Yanan Zhang, Zhi Cai, Jiaxin Chen, Di Huang
Developing accurate and efficient detectors for drone imagery is challenging due to the inherent complexity of aerial scenes. While some existing methods achieve high accuracy by using larger models, their computational cost is prohibitive for drones. Recently, Knowledge Distillation (KD) has shown promising potential for maintaining satisfactory accuracy while significantly compressing models in general object detection. Considering these advantages, this paper presents the first attempt to adapt KD to object detection on drone imagery and addresses two intrinsic issues: (1) a low foreground-background ratio and (2) small instances and complex backgrounds, which lead to inadequate training and thus insufficient distillation. To this end, we propose a task-wise Lightweight Mutual Lifting (Light-ML) module with a Centerness-based Instance-aware Distillation (CID) strategy. The Light-ML module mutually harmonizes the classification and localization branches by channel shuffling and convolution, integrating teacher supervision across different tasks during back-propagation and thus facilitating the training of the student model. The CID strategy extracts valuable regions surrounding instances through the centerness of proposals, enhancing distillation efficacy. Experiments on the VisDrone, UAVDT, and COCO benchmarks demonstrate that the proposed approach improves the accuracy of existing state-of-the-art KD methods with comparable computational requirements. Code will be available upon acceptance.
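Two of the building blocks named above can be sketched concretely under assumptions. Channel shuffling is shown in the ShuffleNet form: view the channels as groups x channels-per-group, transpose, and flatten. For centerness we assume the FCOS definition, sqrt(min(l, r)/max(l, r) * min(t, b)/max(t, b)), with l, t, r, b the distances from a location to the box sides. How CID applies centerness to proposals, and how Light-ML combines shuffling with convolution, are not reproduced.

```python
import math

def channel_shuffle(channels, groups):
    """Interleave channels across groups: view as (groups, C // groups),
    transpose, flatten. `channels` is a list of per-channel feature maps."""
    c = len(channels)
    if c % groups:
        raise ValueError("number of channels must be divisible by groups")
    per_group = c // groups
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]

def centerness(x, y, box):
    """FCOS-style centerness of location (x, y) w.r.t. box (x1, y1, x2, y2):
    1.0 at the box center, decreasing towards 0 at the borders, 0 outside."""
    x1, y1, x2, y2 = box
    l, t, r, b = x - x1, y - y1, x2 - x, y2 - y
    if min(l, t, r, b) <= 0:
        return 0.0
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))

shuffled = channel_shuffle(list(range(6)), groups=2)  # [0, 3, 1, 4, 2, 5]
```

After the shuffle, each group holds channels that came from every original group, which is what lets information from the classification and localization features mix in a following grouped convolution.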
Nicolas Chauvaux, Adrian Kneip, Christoph Posch, Kofi Makinwa, Charlotte Frenkel
Compute-in-memory (CIM) accelerators for spiking neural networks (SNNs) are
promising solutions to enable $\mu$s-level inference latency and ultra-low
energy in edge vision applications. Yet, their current lack of flexibility at
both the circuit and system levels prevents their deployment in a wide range of
real-life scenarios. In this work, we propose a novel digital CIM macro that
supports arbitrary operand resolution and shape, with a unified CIM storage for
weights and membrane potentials. These circuit-level techniques enable a hybrid
weight- and output-stationary dataflow at the system level to maximize operand
reuse, thereby minimizing costly on- and off-chip data movements during the SNN
execution. Measurement results of a fabricated FlexSpIM prototype in 40-nm CMOS
demonstrate a 2$\times$ increase in bit-normalized energy efficiency compared
to prior fixed-precision digital CIM-SNNs, while providing resolution
reconfiguration with bitwise granularity. Our approach can save up to 90%
energy in large-scale systems, while reaching a state-of-the-art classification
accuracy of 95.8% on the IBM DVS gesture dataset.
Authors' comments: 5 pages, 7 figures, submitted to IEEE ISCAS 2025
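Reconfigurable precision with bitwise granularity is commonly obtained in digital compute-in-memory by processing weights bit-serially: each weight bit plane is accumulated and shifted, so changing the bit width only changes the number of cycles. A minimal sketch of this idea driving an integrate-and-fire neuron with two's-complement weights; the bit-serial order, the reset-by-subtraction rule, and the threshold are generic assumptions, not the FlexSpIM macro's actual circuit or dataflow.

```python
def to_bits(value, n_bits):
    """Two's-complement bits (LSB first) of a signed integer."""
    if not -(1 << (n_bits - 1)) <= value < (1 << (n_bits - 1)):
        raise ValueError("value does not fit in n_bits")
    return [(value >> i) & 1 for i in range(n_bits)]

def bit_serial_dot(spikes, weights, n_bits):
    """Sum of weights over active binary spikes, one weight bit plane per step."""
    active = [to_bits(w, n_bits) for s, w in zip(spikes, weights) if s]
    acc = 0
    for b in range(n_bits):
        plane = sum(bits[b] for bits in active)  # popcount of bit plane b
        # In two's complement the MSB plane has weight -2^(n_bits - 1).
        acc += -(plane << b) if b == n_bits - 1 else plane << b
    return acc

def if_neuron_step(v_mem, spikes, weights, n_bits, threshold):
    """Integrate-and-fire update with reset by subtraction; returns (v_mem, spike)."""
    v_mem += bit_serial_dot(spikes, weights, n_bits)
    if v_mem >= threshold:
        return v_mem - threshold, 1
    return v_mem, 0

spikes = [1, 0, 1, 1]
weights = [3, -5, -3, 2]
v, out = if_neuron_step(0, spikes, weights, n_bits=4, threshold=2)
```

The same weights give the same result at 4 or 8 bits; only the loop length changes, which is the property that makes per-bit resolution reconfiguration cheap.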
Jeongjae Lee, Songnam Hong
We study the channel estimation problem for a reconfigurable intelligent
surface (RIS)-assisted millimeter-wave (mmWave) multi-user multiple-input
multiple-output (MU-MIMO) system. In particular, it is assumed that the channel
between a RIS and a base station (BS) exhibits a near-field line-of-sight (LoS)
channel, which is the dominant signal path in mmWave communication systems. Due
to the high rank and non-sparsity of the RIS-BS channel matrix in our
system, state-of-the-art (SOTA) methods, which are designed for
far-field or near-field non-LoS (NLoS) channels, cannot provide satisfactory
estimation performance. We propose, for the first time, an efficient near-field
LoS/NLoS channel estimation method for RIS-assisted MU-MIMO systems by means of
a piece-wise low-rank approximation. Specifically, an effective channel (to be
estimated) is partitioned into piece-wise effective channels containing
low-rank structures, which are then estimated via collaborative low-rank
approximation. The proposed method is named PW-CLRA. Via simulations, we verify
the effectiveness of the proposed PW-CLRA.
Authors' comments: Submitted to the IEEE Transactions on Wireless Communications, 12
pages, 8 figures
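The piece-wise idea is that a matrix which is high-rank as a whole can still be well approximated by low-rank pieces. A minimal real-valued sketch that splits a matrix into contiguous column blocks and replaces each block by its best rank-1 approximation, computed by power iteration. The column partition, the rank, and real arithmetic are assumptions; PW-CLRA's collaborative estimation from pilot measurements of complex-valued channels is not reproduced.

```python
import math

def rank1_approx(block, n_iter=100):
    """Best rank-1 approximation (A v) v^T via power iteration on A^T A
    (assumes the top singular value is separated from the next one)."""
    rows, cols = len(block), len(block[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(n_iter):
        av = [sum(block[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(block[i][j] * av[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [[0.0] * cols for _ in range(rows)]
        v = [x / norm for x in w]
    av = [sum(block[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return [[av[i] * v[j] for j in range(cols)] for i in range(rows)]

def piecewise_rank1(matrix, block_cols):
    """Approximate each contiguous block of `block_cols` columns by rank 1
    and reassemble the full matrix."""
    cols = len(matrix[0])
    out = [[0.0] * cols for _ in matrix]
    for start in range(0, cols, block_cols):
        stop = min(start + block_cols, cols)
        approx = rank1_approx([row[start:stop] for row in matrix])
        for i, row in enumerate(approx):
            out[i][start:stop] = row
    return out

# Rank 2 overall, but each 2-column block is exactly rank 1.
mat = [[1.0, 1.0, 2.0, 1.0],
       [2.0, 2.0, 0.0, 0.0],
       [3.0, 3.0, -2.0, -1.0]]
approx = piecewise_rank1(mat, block_cols=2)
```

A single global rank-1 approximation could not reproduce this matrix, while the two rank-1 pieces do so exactly; that gap is the motivation for partitioning a high-rank near-field LoS channel.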
Kai Yao, Penglei Gao, Lichun Li, Yuan Zhao, Xiaofeng Wang, Wei Wang, Jianke Zhu
Parameter-Efficient Fine-Tuning (PEFT) methods have gained significant
popularity for adapting pre-trained Large Language Models (LLMs) to downstream
tasks, primarily due to their potential to significantly reduce memory and
computational overheads. However, a common limitation in most PEFT approaches
is their application of a uniform architectural design across all layers. This
uniformity involves identical trainable modules and ignores the varying
importance of each layer, leading to sub-optimal fine-tuning results. To
overcome the above limitation and obtain better performance, we develop a novel
approach, Importance-aware Sparse Tuning (IST), to fully utilize the inherent
sparsity and select the most important subset of full layers with effective
layer-wise importance scoring. The proposed IST is a versatile and
plug-and-play technique compatible with various PEFT methods that operate on a
per-layer basis. By leveraging the estimated importance scores, IST dynamically
updates these selected layers in PEFT modules, leading to reduced memory
demands. We further provide theoretical proof of convergence and empirical
evidence of superior performance to demonstrate the advantages of IST over
uniform updating strategies. Extensive experiments on a range of LLMs, PEFTs,
and downstream tasks substantiate the effectiveness of our proposed method,
showcasing IST's capacity to enhance existing layer-based PEFT methods. Our
code is available at https://github.com/Kaiseem/IST.
Authors' comments: EMNLP 2024
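A minimal sketch of importance-aware sparse layer selection: each layer keeps a running importance score, only the top-k layers' PEFT modules are updated at a step, and the scores of the updated layers are refreshed from a per-layer signal. The deterministic top-k choice, the EMA rule, and the choice of signal (for example, the norm of an adapter's gradient) are assumptions for illustration; IST's actual scoring is defined in the paper.

```python
def select_layers(scores, k):
    """Indices (ascending) of the k layers with the highest importance."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)

def update_scores(scores, selected, signal, momentum=0.9):
    """EMA update for the layers that were just trained; other layers keep
    their previous score. `signal[i]` is a hypothetical per-layer importance
    measurement for layer i."""
    new_scores = list(scores)
    for i in selected:
        new_scores[i] = momentum * new_scores[i] + (1 - momentum) * signal[i]
    return new_scores

# Decide which adapters receive gradients this step (e.g. to toggle requires_grad).
scores = [0.1, 0.5, 0.3, 0.9]
active = select_layers(scores, k=2)
trainable = [i in active for i in range(len(scores))]
```

Because unselected layers keep stale scores, a practical scheme would presumably need some exploration (for example, sampling layers in proportion to their scores) so that a low-scored layer can recover.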
Sotaro Fushimi, Yuto Watanabe, Kazunori Sakurama
This study addresses the design of distributed controllers for discrete-time systems using linear matrix inequalities (LMIs). Sparsity constraints on the control gains of distributed controllers introduce conservatism through the convexification performed in existing methods such as the extended LMI method. To mitigate this conservatism, we introduce a novel LMI formulation for this problem, utilizing the clique-wise decomposition method from our previous work on continuous-time systems. By reformulating the sparsity constraint on the gain matrix within cliques, this method achieves a broader solution set. Numerical examples further confirm the analytically established superiority of our method.
Shuai Chen, Fanman Meng, Chenhao Wu, Haoran Wei, Runtong Zhang, Qingbo Wu, Linfeng Xu, Hongliang Li
Few-Shot Segmentation (FSS) aims to segment novel classes using only a few
annotated images. Despite considerable progress under pixel-wise support
annotation, current FSS methods still face three issues: the inflexibility of
backbone upgrade without re-training, the inability to uniformly handle various
types of annotations (e.g., scribble, bounding box, mask, and text), and the
difficulty in accommodating different annotation quantities. To address these
issues simultaneously, we propose DiffUp, a novel framework that conceptualizes
the FSS task as a conditional generative problem using a diffusion process. For
the first issue, we introduce a backbone-agnostic feature transformation module
that converts different segmentation cues into unified coarse priors,
facilitating seamless backbone upgrade without re-training. For the second
issue, due to the varying granularity of transformed priors from diverse
annotation types (scribble, bounding box, mask, and text), we conceptualize
these multi-granular transformed priors as analogous to noisy intermediates at
different steps of a diffusion model. This is implemented via a
self-conditioned modulation block coupled with a dual-level quality modulation
branch. For the third issue, we incorporate an uncertainty-aware information
fusion module to harmonize the variability across zero-shot, one-shot, and
many-shot scenarios. Evaluated through rigorous benchmarks, DiffUp
significantly outperforms existing FSS models in terms of flexibility and
accuracy.
Authors' comments: 9 figures
Andrew Jeong
This letter presents KGpose, a novel end-to-end framework for 6D pose estimation of multiple objects. Our approach combines a keypoint-based method with learnable pose regression through a 'keypoint-graph', a graph representation of the keypoints. KGpose first estimates 3D keypoints for each object using an attentional multi-modal feature fusion of RGB and point cloud features. These keypoints are estimated from each point of the point cloud and converted into a graph representation. The network directly regresses 6D pose parameters for each point through a sequence of keypoint-graph embedding and local graph embedding, both designed with graph convolutions, followed by rotation and translation heads. The final pose for each object is selected from the point-wise candidate predictions. The method achieves competitive results on the benchmark dataset, demonstrating the effectiveness of our model. KGpose enables multi-object pose estimation without requiring an extra localization step, offering a unified and efficient solution for understanding geometric contexts in complex scenes for robotic applications.
Haoyu Wang, Tianci Liu, Tuo Zhao, Jing Gao
Pre-trained language models, trained on large-scale corpora, demonstrate strong generalizability across various NLP tasks. Fine-tuning these models for specific tasks typically involves updating all parameters, which is resource-intensive. Parameter-efficient fine-tuning (PEFT) methods, such as the popular LoRA family, introduce low-rank matrices to learn only a few parameters efficiently. However, during inference, the product of these matrices updates all pre-trained parameters, complicating tasks like knowledge editing that require selective updates. We propose a novel PEFT method, row and column-wise sparse low-rank adaptation (RoseLoRA), to address this challenge. RoseLoRA identifies and updates only the most important parameters for a specific task, maintaining efficiency while preserving other model knowledge. By adding a sparsity constraint on the product of low-rank matrices and converting it to row and column-wise sparsity, we ensure efficient and precise model updates. Our theoretical analysis guarantees a lower bound on the sparsity with respect to the matrix product. Extensive experiments on five benchmarks across twenty datasets demonstrate that RoseLoRA outperforms baselines in both general fine-tuning and knowledge editing tasks.
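With the LoRA update ΔW = BA (B of size d x r, A of size r x k), sparsifying each row of B and each column of A makes many entries of ΔW zero, because ΔW[i][j] can be nonzero only if row i of B and column j of A share a nonzero index. A minimal sketch using magnitude-based top-s pruning; the magnitude criterion, the per-row and per-column budget `s`, and the assignment of row sparsity to B and column sparsity to A are assumptions, not RoseLoRA's exact importance-based procedure.

```python
def keep_top_s(vector, s):
    """Zero all but the s largest-magnitude entries of a vector."""
    keep = set(sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)[:s])
    return [x if i in keep else 0.0 for i, x in enumerate(vector)]

def sparsify_factors(b, a, s):
    """Top-s per row of B (d x r) and top-s per column of A (r x k)."""
    b_sparse = [keep_top_s(row, s) for row in b]
    r, k = len(a), len(a[0])
    a_cols = [keep_top_s([a[t][j] for t in range(r)], s) for j in range(k)]
    a_sparse = [[a_cols[j][t] for j in range(k)] for t in range(r)]
    return b_sparse, a_sparse

def matmul(x, y):
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

b = [[1.0, 0.2], [0.1, 2.0]]   # d = 2, r = 2
a = [[3.0, 0.1], [0.2, 4.0]]   # r = 2, k = 2
b_s, a_s = sparsify_factors(b, a, s=1)
delta_w = matmul(b_s, a_s)     # only entries with overlapping supports survive
zero_fraction = sum(v == 0.0 for row in delta_w for v in row) / (len(delta_w) * len(delta_w[0]))
```

The dense product BA would update all four entries here; with one nonzero per row of B and per column of A, only two entries of ΔW change, which is the kind of selective update knowledge editing needs.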
Zhigao Cai, Xing-Ming Zhao
Automatic segmentation of the fetal brain remains challenging because of variations in fetal health and development, motion artifacts, and variability across gestational ages, particularly since existing methods rely on high-quality datasets of healthy fetuses. In this work, we propose a novel cascade network called CasUNext to enhance the accuracy and generalization of fetal brain MRI segmentation. CasUNext incorporates depth-wise separable convolution, attention mechanisms, and a two-step cascade architecture for efficient high-precision segmentation. The first network localizes the fetal brain region, while the second network focuses on detailed segmentation. We evaluate CasUNext on 150 fetal MRI scans acquired between 20 and 36 weeks of gestation on two scanners (Philips and Siemens), including axial, coronal, and sagittal views, and further validate it on a dataset of 50 abnormal fetuses. Results demonstrate that CasUNext achieves improved segmentation performance compared to U-Nets and other state-of-the-art approaches. It obtains an average Dice coefficient of 96.1% and a mean intersection over union of 95.9% across diverse scenarios. CasUNext shows promising capabilities for handling the challenges of multi-view fetal MRI and abnormal cases, which could facilitate various quantitative analyses and extend to multi-site data.
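The two reported metrics compare a predicted mask P with a reference mask T: Dice = 2|P ∩ T| / (|P| + |T|) and IoU = |P ∩ T| / |P ∪ T|. A minimal sketch on flattened binary masks; treating two empty masks as perfect agreement is a convention assumed here. For a single mask the two metrics are linked by Dice = 2 IoU / (1 + IoU).

```python
def dice_and_iou(pred, target):
    """Dice and IoU of two binary masks given as flat sequences of 0/1."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    p_sum = sum(1 for p in pred if p)
    t_sum = sum(1 for t in target if t)
    union = p_sum + t_sum - inter
    if union == 0:
        return 1.0, 1.0  # both masks empty: treated as perfect agreement
    return 2 * inter / (p_sum + t_sum), inter / union

# 3 predicted voxels, 3 reference voxels, 2 in common.
dice, iou = dice_and_iou([1, 1, 1, 0], [0, 1, 1, 1])
```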
Yanshu Wang, Wenyang He, Tong Yang
Large Language Models (LLMs) have significantly advanced natural language processing tasks such as machine translation, text generation, and sentiment analysis. However, their large size, often consisting of billions of parameters, poses challenges for storage, computation, and deployment, particularly in resource-constrained environments like mobile devices and edge computing platforms. Effective compression and quantization techniques are crucial for addressing these issues, reducing memory footprint and computational requirements without significantly compromising performance. Traditional methods that uniformly map parameters to compressed spaces fail to account for the uneven distribution of parameters, leading to substantial accuracy loss. In this work, we propose Athena, a novel algorithm for efficient block-wise post-training quantization of LLMs. Athena leverages second-order matrix derivative information to guide the quantization process using the curvature information of the loss landscape. By grouping parameters by columns or rows and iteratively optimizing the quantization process, Athena updates the model parameters and Hessian matrix to achieve significant compression while maintaining high accuracy. This makes Athena a practical solution for deploying LLMs in various settings.
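Curvature-guided post-training quantization is often explained with the Optimal Brain Surgeon update used by GPTQ: after rounding one weight, the still-unquantized weights in the same row are shifted to compensate, using the inverse of the layer Hessian H = 2XX^T of the squared reconstruction error. A two-weight worked sketch with a hypothetical Hessian and grid step; Athena's column or row grouping and its iterative Hessian updates are not reproduced.

```python
def quantize(x, step):
    """Round to the nearest point of a uniform grid (Python rounds halves to even)."""
    return step * round(x / step)

def obs_quantize_first(w, h, step):
    """Quantize w[0] and compensate w[1] with the OBS update
    delta = -(w0 - q0) / Hinv[0][0] * Hinv[:, 0] (2 x 2 case)."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    hinv = [[h[1][1] / det, -h[0][1] / det], [-h[1][0] / det, h[0][0] / det]]
    q0 = quantize(w[0], step)
    err = (w[0] - q0) / hinv[0][0]
    return [q0, w[1] - err * hinv[1][0]]

def quad_loss(delta, h):
    """Second-order proxy for the output error: delta^T H delta."""
    return sum(delta[i] * h[i][j] * delta[j] for i in range(2) for j in range(2))

w = [0.37, 0.52]
h = [[2.0, 0.8], [0.8, 1.0]]   # hypothetical Hessian of the layer loss
qw = obs_quantize_first(w, h, step=0.25)
rtn_loss = quad_loss([qw[0] - w[0], 0.0], h)          # round w[0] only
obs_loss = quad_loss([qw[0] - w[0], qw[1] - w[1]], h)  # round w[0], compensate w[1]
```

For a fixed rounded w[0], the compensated w[1] minimizes the quadratic proxy, so `obs_loss` never exceeds `rtn_loss`; a full pass would then quantize w[1] as well.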
Zong-Wei Hong, Yen-Yang Hung, Chu-Song Chen
In this work, we introduce a novel method for calculating the 6DoF pose of an
object using a single RGB-D image. Unlike existing methods that either directly
predict objects' poses or rely on sparse keypoints for pose recovery, our
approach addresses this challenging task using dense correspondence, i.e., we
regress the object coordinates for each visible pixel. Our method leverages
existing object detection methods. We incorporate a re-projection mechanism to
adjust the camera's intrinsic matrix to accommodate cropping in RGB-D images.
Moreover, we transform the 3D object coordinates into a residual
representation, which can effectively reduce the output space and yield
superior performance. We conducted extensive experiments to validate the
efficacy of our approach for 6D pose estimation. Our approach outperforms most
previous methods, especially in occlusion scenarios, and demonstrates notable
improvements over state-of-the-art methods. Our code is available at
https://github.com/AI-Application-and-Integration-Lab/RDPN6D.
Authors' comments: Accepted by CVPR Workshop DLGC, 2024
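Cropping (and resizing) an RGB-D image changes its effective intrinsics: the crop offset shifts the principal point, and the resize factor scales both the focal lengths and the principal point. A minimal sketch of this standard adjustment for a zero-skew pinhole camera with continuous pixel coordinates; whether RDPN6D's re-projection mechanism uses exactly this form (for example, its half-pixel convention) is an assumption.

```python
def crop_resize_intrinsics(k, x0, y0, crop_w, crop_h, out_w, out_h):
    """Intrinsics after cropping the window starting at pixel (x0, y0) of size
    crop_w x crop_h and resizing it to out_w x out_h.
    k = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] (zero skew assumed)."""
    sx, sy = out_w / crop_w, out_h / crop_h
    fx, fy, cx, cy = k[0][0], k[1][1], k[0][2], k[1][2]
    return [[fx * sx, 0.0, (cx - x0) * sx],
            [0.0, fy * sy, (cy - y0) * sy],
            [0.0, 0.0, 1.0]]

def project(k, point):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates."""
    x, y, z = point
    return k[0][0] * x / z + k[0][2], k[1][1] * y / z + k[1][2]

k_full = [[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]]
k_crop = crop_resize_intrinsics(k_full, 300, 180, 128, 128, 256, 256)
u, v = project(k_full, (0.1, -0.05, 1.0))      # pixel in the full image
u_c, v_c = project(k_crop, (0.1, -0.05, 1.0))  # same point in the resized crop
```

The check is that projecting with the adjusted matrix lands on the crop-and-resize of the original pixel, u_c = (u - x0) * sx and v_c = (v - y0) * sy, so dense coordinate predictions on the crop stay geometrically consistent with the depth map.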
Johan Öfverstedt, Elin Lundström, Göran Bergström, Joel Kullberg, Håkan Ahlström
The study of associations between an individual's age and imaging and
non-imaging data is an active research area that attempts to aid understanding
of the effects and patterns of aging. In this work we have conducted a
supervoxel-wise association study between both volumetric and tissue density
features in coronary computed tomography angiograms and the chronological age
of a subject, to understand the localized changes in morphology and tissue
density with age. To enable a supervoxel-wise study of volume and tissue
density, we developed a novel method based on image segmentation, inter-subject
image registration, and robust supervoxel-based correlation analysis, to
achieve a statistical association study between the images and age. We evaluate
the registration methodology in terms of the Dice coefficient for the heart
chambers and myocardium, and the inverse consistency of the transformations,
showing that the method works well in most cases with high overlap and inverse
consistency. In a sex-stratified study conducted on a subset of $n=1388$ images
from the SCAPIS study, the supervoxel-wise analysis was able to find localized
associations with age outside of the commonly segmented and analyzed
sub-regions, and several substantial differences between the sexes in the
association of age and volume.
Authors' comments: 35 pages
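At its core, a supervoxel-wise association study computes, for every supervoxel, a correlation between a per-subject feature (for example, a local volume measure or mean tissue density) and age. A minimal sketch using Spearman rank correlation as one robust option; that the paper uses Spearman specifically is an assumption, and the significance testing with multiple-comparison correction that such a study would need is omitted.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a constant feature
    return sxy / math.sqrt(sxx * syy)

def supervoxel_age_association(features, ages):
    """features[s][i]: value of supervoxel s for subject i.
    Returns one Spearman correlation with age per supervoxel."""
    age_ranks = ranks(ages)
    return [pearson(ranks(f), age_ranks) for f in features]

rho = supervoxel_age_association([[1, 2, 3, 4], [4, 3, 2, 1], [1, 1, 2, 2]], [40, 50, 60, 70])
```

In a sex-stratified design like the one above, this would simply be run separately on the male and female subsets.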
Tao Feng, Lizhen Qu, Zhuang Li, Haolan Zhan, Yuncheng Hua, Gholamreza Haffari
Machine learning models have made incredible progress, but they still struggle when applied to examples from unseen domains. This study focuses on a specific problem of domain generalization, where a model is trained on one source domain and tested on multiple target domains that are unseen during training. We propose IMO: Invariant features Masks for Out-of-Distribution text classification, to achieve OOD generalization by learning invariant features. During training, IMO learns sparse mask layers that remove features irrelevant to prediction, so that the remaining features are invariant. Additionally, IMO has a token-level attention module that focuses on tokens useful for prediction. Our comprehensive experiments show that IMO substantially outperforms strong baselines across various evaluation metrics and settings.
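A minimal sketch of a learnable sparse feature mask in the spirit described above: each feature dimension is multiplied by a gate sigmoid(logit), an L1 penalty on the gates pushes irrelevant dimensions towards zero during training, and at inference the gates are binarized. The sigmoid parameterization, the penalty weight, and the 0.5 threshold are assumptions; IMO's actual mask layers and its token-level attention module are not reproduced.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def apply_mask(features, logits, hard=False, threshold=0.5):
    """Multiply each feature by its gate; with hard=True, gates below
    `threshold` become 0 and the rest 1 (inference-time mask)."""
    gates = [sigmoid(z) for z in logits]
    if hard:
        gates = [1.0 if g >= threshold else 0.0 for g in gates]
    return [f * g for f, g in zip(features, gates)]

def sparsity_penalty(logits, weight=1e-3):
    """L1 penalty on the (non-negative) gates, added to the task loss."""
    return weight * sum(sigmoid(z) for z in logits)

# The second feature's gate has been driven low, so it is dropped at inference.
masked = apply_mask([2.0, 3.0, -1.0], [4.0, -4.0, 0.0], hard=True)
```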