benty-fields - Search paper

314. Layer-wise LoRA fine-tuning: a similarity metric approach

Keith Ando Ogawa, Bruno Lopes Yamamoto, Lucas Lauton de Alcantara, Lucas Pellicer, Rosimeire Pereira Costa, Edson Bollis, Anna Helena Reali Costa, Artur Jordao

Show abstract | Show figures | Show BibTeX | Show discussion | View PDF | 2602.05988v1

Pre-training Large Language Models (LLMs) on web-scale datasets has become fundamental for advancing general-purpose AI. Enhancing their predictive performance on downstream tasks, in contrast, typically involves adapting their knowledge through fine-tuning. Parameter-efficient fine-tuning techniques, such as Low-Rank Adaptation (LoRA), aim to reduce the computational cost of this process by freezing the pre-trained model and updating a smaller number of parameters. Compared to full fine-tuning, these methods achieve over 99% reduction in trainable parameter count, depending on the configuration. Unfortunately, such a reduction may prove insufficient as LLMs continue to grow in scale. In this work, we address this problem by systematically selecting only a few layers to fine-tune using LoRA or its variants. We argue that not all layers contribute equally to the model adaptation. Leveraging this, we identify the most relevant layers to fine-tune by measuring their contribution to changes in internal representations. Our method is orthogonal to and readily compatible with existing low-rank adaptation techniques. We reduce the trainable parameters in LoRA-based techniques by up to 50%, while maintaining the predictive performance across different models and tasks. Specifically, on encoder-only architectures, this reduction in trainable parameters leads to a negligible predictive performance drop on the GLUE benchmark. On decoder-only architectures, we achieve a small drop or even improvements in the predictive performance on mathematical problem-solving capabilities and coding tasks. Finally, this effectiveness extends to multimodal models, for which we also observe competitive results relative to fine-tuning with LoRA modules in all layers. Code is available at: https://github.com/c2d-usp/Layer-wise-LoRA-with-CKA
Authors' comments: Code is available at https://github.com/c2d-usp/Layer-wise-LoRA-with-CKA
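
The abstract describes ranking layers by how much they change the internal representations, and the repository name points to CKA (Centered Kernel Alignment) as the similarity metric. The sketch below is only an illustration of that idea, not the authors' implementation: it assumes linear CKA, assumes a layer's relevance can be scored as 1 minus the CKA between its input and output activations, and uses the hypothetical names `linear_cka`, `select_layers`, and `keep_fraction`.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two (n_samples, n_features) activation matrices."""
    # Center each feature so the similarity ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def select_layers(hidden_states, keep_fraction=0.5):
    """Pick the layers whose output differs most from their input.

    hidden_states: list of L + 1 arrays, where entry i is the input to
    layer i and entry i + 1 is its output (an assumed layout).
    Returns the sorted indices of the selected layers and all scores.
    """
    # Assumption: score = 1 - CKA(input, output); higher means more change.
    scores = [
        1.0 - linear_cka(hidden_states[i], hidden_states[i + 1])
        for i in range(len(hidden_states) - 1)
    ]
    k = max(1, int(round(keep_fraction * len(scores))))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k]), scores
```

Under these assumptions, LoRA modules would be attached only to the returned layer indices, with `keep_fraction=0.5` mirroring the up-to-50% reduction in trainable parameters mentioned in the abstract.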

315. Step-Wise Refusal Dynamics in Autoregressive and Diffusion Language Models

Eliron Rahimi, Elad Hirshel, Rom Himelstein, Amit LeVi, Avi Mendelson, Chaim Baskin

Show abstract | Show figures | Show BibTeX | Show discussion | View PDF | 2602.02600v1

316. LoPRo: Enhancing Low-Rank Quantization via Permuted Block-Wise Rotation

Hongyaoxing Gu, Lijuan Hu, Liye Yu, Haowei Li, Fangfang Liu

Show abstract | Show figures | Show BibTeX | Show discussion | View PDF | 2601.19675v1

317. Closing the Modality Gap Aligns Group-Wise Semantics

Eleonora Grassucci, Giordano Cicchetti, Emanuele Frasca, Aurelio Uncini, Danilo Comminiello

Show abstract | Show figures | Show BibTeX | Show discussion | View PDF | 2601.18525v1

318. CP Loss: Channel-wise Perceptual Loss for Time Series Forecasting

Yaohua Zha, Chunlin Fan, Peiyuan Liu, Yong Jiang, Tao Dai, Hai Wu, Shu-Tao Xia

Show abstract | Show figures | Show BibTeX | Show discussion | View PDF | 2601.18829v1

319. Context-Aware Semantic Segmentation via Stage-Wise Attention

Antoine Carreaud, Elias Naha, Arthur Chansel, Nina Lahellec, Jan Skaloud, Adrien Gressin

Show abstract | Show figures | Show BibTeX | Show discussion | View PDF | 2601.11310v1

320. SoLA-Vision: Fine-grained Layer-wise Linear Softmax Hybrid Attention

Ruibang Li, Guan Luo, Yiwei Zhang, Jin Gao, Bing Li, Weiming Hu

Show abstract | Show figures | Show BibTeX | Show discussion | View PDF | 2601.11164v1

Benty-search

301. Cluster-Wise Spatio-Temporal Masking for Efficient Video-Language Pretraining

Show abstract | Show figures | Show BibTeX | Show discussion 0 | View PDF | 2603.22953v1

302. Implicit Turn-Wise Policy Optimization for Proactive User-LLM Interaction

Show abstract | Show figures | Show BibTeX | Show discussion 0 | View PDF | 2603.23550v1

303. BATQuant: Outlier-resilient MXFP4 Quantization via Learnable Block-wise Optimization

Show abstract | Show figures | Show BibTeX | Show discussion 0 | View PDF | 2603.16590v1

304. $C^1$-generic continuum-wise expansive surface diffeomorphisms

Show abstract | Show figures | Show BibTeX | Show discussion 0 | View PDF | 2603.12909v1

305. Modeling Stage-wise Evolution of User Interests for News Recommendation

Show abstract | Show figures | Show BibTeX | Show discussion 0 | View PDF | 2603.10471v1

306. SRNeRV: A Scale-wise Recursive Framework for Neural Video Representation

Show abstract | Show figures | Show BibTeX | Show discussion 0 | View PDF | 2603.08227v1

307. SLICE: Speech Enhancement via Layer-wise Injection of Conditioning Embeddings

Show abstract | Show figures | Show BibTeX | Show discussion 0 | View PDF | 2603.05302v1

308. Diff-ES: Stage-wise Structural Diffusion Pruning via Evolutionary Search

Show abstract | Show figures | Show BibTeX | Show discussion 0 | View PDF | 2603.05105v1

309. PonderLM-3: Adaptive Token-Wise Pondering with Differentiable Masking

Show abstract | Show figures | Show BibTeX | Show discussion 0 | View PDF | 2603.02023v1

310. AdaPonderLM: Gated Pondering Language Models with Token-Wise Adaptive Depth

Show abstract | Show figures | Show BibTeX | Show discussion 0 | View PDF | 2603.01914v1

311. Beyond performance-wise Contribution Evaluation in Federated Learning

Show abstract | Show figures | Show BibTeX | Show discussion 0 | View PDF | 2602.22470v1

312. Unsupervised Layer-Wise Dynamic Test Time Adaptation for LLMs

Show abstract | Show figures | Show BibTeX | Show discussion 0 | View PDF | 2602.09719v1

313. A prediction interval for the population-wise error rate

Show abstract | Show figures | Show BibTeX | Show discussion 0 | View PDF | 2602.06828v1

314. Layer-wise LoRA fine-tuning: a similarity metric approach

Show abstract | Show figures | Show BibTeX | Show discussion 0 | View PDF | 2602.05988v1

315. Step-Wise Refusal Dynamics in Autoregressive and Diffusion Language Models

Show abstract | Show figures | Show BibTeX | Show discussion 0 | View PDF | 2602.02600v1

316. LoPRo: Enhancing Low-Rank Quantization via Permuted Block-Wise Rotation

Show abstract | Show figures | Show BibTeX | Show discussion 0 | View PDF | 2601.19675v1

317. Closing the Modality Gap Aligns Group-Wise Semantics

Show abstract | Show figures | Show BibTeX | Show discussion 0 | View PDF | 2601.18525v1

318. CP Loss: Channel-wise Perceptual Loss for Time Series Forecasting

Show abstract | Show figures | Show BibTeX | Show discussion 0 | View PDF | 2601.18829v1

319. Context-Aware Semantic Segmentation via Stage-Wise Attention

Show abstract | Show figures | Show BibTeX | Show discussion 0 | View PDF | 2601.11310v1

320. SoLA-Vision: Fine-grained Layer-wise Linear Softmax Hybrid Attention

Show abstract | Show figures | Show BibTeX | Show discussion 0 | View PDF | 2601.11164v1
