benty-fields - Search paper

Zero-shot proxies, also known as training-free metrics, are widely adopted to reduce the computational overhead of neural network evaluation in scenarios such as Neural Architecture Search (NAS), as they do not require any training. Existing zero-shot metrics have several limitations, including weak correlation with true performance and poor generalisation across different networks or downstream tasks. For example, most of these metrics apply only to either convolutional neural networks (CNNs) or Transformers, but not both. To address these limitations, we propose Sample-Wise Activation Patterns (SWAP) and its derivative, SWAP-Score, a novel and highly effective zero-shot metric. SWAP-Score is broadly applicable across both architecture families and task domains, demonstrating strong predictive performance in the majority of tasks. This metric measures the expressivity of neural networks over a mini-batch of samples and shows a high correlation with the networks' ground-truth performance. For both CNNs and Transformers, SWAP-Score outperforms existing zero-shot metrics across computer vision and natural language processing tasks. For instance, Spearman's correlation coefficient between SWAP-Score and CIFAR-10 validation accuracy is 0.93 for DARTS CNNs, and 0.71 for FlexiBERT Transformers on GLUE tasks. Moreover, SWAP-Score is label-independent and hence can be applied at the pre-training stage of language models to estimate their performance on downstream tasks. When applied to NAS, the resulting method, SWAP-NAS, achieves competitive performance using only approximately 6 and 9 minutes of GPU time on CIFAR-10 and ImageNet, respectively. Our code is available at: https://github.com/pym1024/SWAP_Universal
Authors' comments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence. arXiv admin note: text overlap with arXiv:2403.04161
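
The abstract does not spell out how the score is computed, so the following is only a rough sketch under stated assumptions, not the authors' implementation. It assumes a SWAP-like score counts the distinct binary ReLU on/off patterns that hidden units produce across a mini-batch (one pattern per unit, indexed by sample), evaluated on an untrained network. The names `relu_patterns`, `swap_like_score`, `random_mlp` and `spearman` are hypothetical, and the tiny MLP stands in for the CNNs and Transformers mentioned above. The `spearman` helper illustrates the kind of rank correlation quoted in the abstract and assumes no tied values.

```python
import random


def relu_patterns(weights, batch):
    """For each sample, return the 0/1 on/off flags of every hidden ReLU unit."""
    per_sample = []
    for x in batch:
        flags, h = [], x
        for layer in weights:
            pre = [sum(w * v for w, v in zip(row, h)) for row in layer]
            flags.extend(1 if p > 0 else 0 for p in pre)
            h = [max(p, 0.0) for p in pre]  # ReLU
        per_sample.append(flags)
    return per_sample


def swap_like_score(weights, batch):
    """Assumed definition: number of distinct per-unit patterns across the batch."""
    per_sample = relu_patterns(weights, batch)
    per_unit = zip(*per_sample)  # transpose: one tuple of flags per hidden unit
    return len(set(per_unit))


def random_mlp(sizes, rng):
    """Untrained MLP weights with Gaussian initialisation (no biases)."""
    return [[[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])]


def spearman(xs, ys):
    """Spearman's rank correlation, valid only when there are no ties."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


if __name__ == "__main__":
    rng = random.Random(0)
    batch = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(16)]
    for hidden in (4, 16, 64):
        net = random_mlp([4, hidden, hidden], rng)
        print(hidden, swap_like_score(net, batch))
```

Under this assumed definition no labels and no training steps are involved, which is consistent with the label-independence claimed in the abstract. To evaluate such a proxy, one would compute `spearman` between the scores of a set of candidate architectures and their measured accuracies.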

289. Block-wise Codeword Embedding for Reliable Multi-bit Text Watermarking

Joeun Kim, HoEun Kim, Dongsup Jin, Young-Sik Kim

arXiv:2605.00348v1

290. Solidarity of Spectral Gaps for Component-Wise Markov Chains

Youngwoo Kwon, Galin Jones, Qian Qin

arXiv:2604.23229v1

291. Geometric Layer-wise Approximation Rates for Deep Networks

Shijun Zhang, Zuowei Shen, Yuesheng Xu

arXiv:2604.20219v1

292. Mitigating Multimodal Hallucination via Phase-wise Self-reward

Yu Zhang, Chuyang Sun, Kehai Chen, Xuefeng Bai, Yang Xiang, Min Zhang

arXiv:2604.17982v1

293. Concept-wise Attention for Fine-grained Concept Bottleneck Models

Minghong Zhong, Guoshuai Zou, Kanghao Chen, Dexia Chen, Ruixuan Wang

arXiv:2604.15748v1

294. GroupDPO: Memory efficient Group-wise Direct Preference Optimization

Jixuan Leng, Si Si, Hsiang-Fu Yu, Vinod Raman, Inderjit S. Dhillon

arXiv:2604.15602v1

295. SpecBound: Adaptive Bounded Self-Speculation with Layer-wise Confidence Calibration

Zhuofan Wen, Yang Feng

arXiv:2604.12247v1

296. A Layer-wise Analysis of Supervised Fine-Tuning

Qinghua Zhao, Xueling Gong, Xinyu Chen, Zhongfeng Kang, Xinlu Li

arXiv:2604.11838v1

297. Bayesian Semiparametric Multivariate Density Regression with Coordinate-Wise Predictor Selection

Giovanni Toto, Peter Müller, Abhra Sarkar

arXiv:2604.08470v1

298. Weight Group-wise Post-Training Quantization for Medical Foundation Model

Yineng Chen, Peng Huang, Aozhong Zhang, Hui Guo, Penghang Yin, Shu Hu, Shao Lin, Xin Li et al.

arXiv:2604.07674v1

299. VecAttention: Vector-wise Sparse Attention for Accelerating Long Context Inference

Anmin Liu, Ruixuan Yang, Huiqiang Jiang, Bin Lin, Minmin Sun, Yong Li, Chen Zhang, Tao Xie

arXiv:2603.29494v1

300. AuthorMix: Modular Authorship Style Transfer via Layer-wise Adapter Mixing

Sarubi Thillainathan, Ji-Ung Lee, Michael Sullivan, Alexander Koller

arXiv:2603.23069v1

281. Fast Tensorization of Neural Networks via Slice-wise Feature Distillation

arXiv:2605.19842v1

282. Geometry-Lite: Interpretable Safety Probing via Layer-Wise Margin Geometry

arXiv:2605.20241v1

283. Robust Audio Tagging under Class-wise Supervision Unreliability

arXiv:2605.17512v1

284. Model of Simplicial Complexes with dimension-wise preferential attachment

arXiv:2605.17004v1

285. QuBridge: Layer-wise Fidelity Decomposition in Quantum Computation Pipeline

arXiv:2605.11529v1

286. UniRank: Unified List-wise Reranking via Confidence-Ordered Denoising

arXiv:2605.10527v1

287. OrScale: Orthogonalised Optimization with Layer-Wise Trust-Ratio Scaling

arXiv:2605.07815v1

288. Zero-Shot Neural Network Evaluation with Sample-Wise Activation Patterns

arXiv:2605.07378v1
