benty-fields - Search paper

Analyzing the similarity of internal representations has been an important technique for understanding the behavior of deep neural networks. Most existing methods for analyzing the similarity between high-dimensional representations, such as those based on Centered Kernel Alignment (CKA), rely on statistical properties of the representations for a set of data points. In this paper, we focus on transformer models and study the similarity of representations between the hidden layers of individual transformers. In this context, we show that a simple sample-wise cosine similarity metric is capable of capturing the similarity and aligns with the more complicated CKA. Our experimental results on common transformers reveal that representations across layers are positively correlated, with similarity increasing as layers get closer. We provide a theoretical justification for this phenomenon under the geodesic curve assumption for the learned transformer. We then show that an increase in representation similarity implies an increase in predicted probability when directly applying the last-layer classifier to any hidden layer representation. We then propose an aligned training method to improve the effectiveness of shallow layers by enhancing the similarity between internal representations, with trained models that enjoy the following properties: (1) more early saturation events, (2) layer-wise accuracies that increase monotonically and reveal the minimal depth needed for the given task, (3) when serving as multi-exit models, they achieve on-par performance with standard multi-exit architectures, which consist of additional classifiers designed for early exiting in shallow layers. To our knowledge, our work is the first to show that one common classifier is sufficient for multi-exit models. We conduct experiments on both vision and NLP tasks to demonstrate the performance of the proposed aligned training.
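The sample-wise cosine similarity mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code: the function names `cosine` and `layerwise_similarity`, the nested-list layout `hidden[layer][sample]`, and the toy data are all assumptions made for illustration. The idea shown is only that each sample's representation at one layer is compared with the same sample's representation at another layer, and the per-sample values are averaged.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def layerwise_similarity(
    hidden: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Mean sample-wise cosine similarity between every pair of layers.

    hidden[l][i] is the (hypothetical) representation of sample i at layer l.
    All layers are assumed to share the same width and the same samples.
    """
    n_layers = len(hidden)
    n_samples = len(hidden[0])
    sim = [[0.0] * n_layers for _ in range(n_layers)]
    for a in range(n_layers):
        for b in range(n_layers):
            total = sum(
                cosine(hidden[a][i], hidden[b][i]) for i in range(n_samples)
            )
            sim[a][b] = total / n_samples
    return sim


# Toy data (made up): 3 layers, 2 samples, 2-dimensional representations.
hidden = [
    [[1.0, 0.0], [0.0, 1.0]],   # layer 0
    [[1.0, 1.0], [-1.0, 1.0]],  # layer 1
    [[0.0, 1.0], [-1.0, 0.0]],  # layer 2
]
sim = layerwise_similarity(hidden)
for row in sim:
    print([round(x, 3) for x in row])
```

In this toy example adjacent layers come out more similar than layers that are two steps apart, which mirrors the qualitative trend the abstract reports; the numbers themselves carry no meaning beyond the made-up data.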

475. Effective Edge-wise Representation Learning in Edge-Attributed Bipartite Graphs

Hewen Wang, Renchi Yang, Xiaokui Xiao

arXiv:2406.13369v1

Graph representation learning (GRL) encodes graph elements into informative vector representations, which can be used in downstream tasks for analyzing graph-structured data and has seen extensive applications in various domains. However, the majority of extant studies on GRL are geared towards generating node representations, which cannot be readily employed to perform edge-based analytics tasks in edge-attributed bipartite graphs (EABGs) that pervade the real world, e.g., spam review detection in customer-product reviews and identifying fraudulent transactions in user-merchant networks. Compared to node-wise GRL, learning edge representations (ERL) on such graphs is challenging due to the need to incorporate the structure and attribute semantics from the perspective of edges while considering the separate influence of two heterogeneous node sets U and V in bipartite graphs. To our knowledge, despite its importance, limited research has been devoted to this frontier, and existing workarounds all suffer from sub-par results. Motivated by this, this paper designs EAGLE, an effective ERL method for EABGs. Building on an in-depth and rigorous theoretical analysis, we propose the factorized feature propagation (FFP) scheme for edge representations, which adequately incorporates long-range dependencies of edges/features without incurring tremendous computation overheads. We further refine FFP into a dual-view FFP by taking into account the influences from nodes in U and V separately in ERL. Extensive experiments on 5 real datasets showcase the effectiveness of the proposed EAGLE models in semi-supervised edge classification tasks. In particular, EAGLE can attain a considerable gain of up to 38.11% in AP and 1.86% in AUC when compared to the best baselines.
Authors' comments: 11 pages. Full version of the research paper accepted to KDD 2024

476. Personalized Federated Knowledge Graph Embedding with Client-Wise Relation Graph

Xiaoxiong Zhang, Zhiwei Zeng, Xin Zhou, Dusit Niyato, Zhiqi Shen

arXiv:2406.11943v1

477. Few-Shot Recognition via Stage-Wise Augmented Finetuning

Tian Liu, Huixin Zhang, Shubham Parashar, Shu Kong

arXiv:2406.11148v1

478. Horizon-wise Learning Paradigm Promotes Gene Splicing Identification

Qi-Jie Li, Qian Sun, Shao-Qun Zhang

arXiv:2406.11900v1

479. STAR: Scale-wise Text-conditioned AutoRegressive image generation

Xiaoxiao Ma, Mohan Zhou, Tao Liang, Yalong Bai, Tiejun Zhao, Biye Li, Huaian Chen, Yi Jin

arXiv:2406.10797v4

480. Convex-area-wise Linear Regression and Algorithms for Data Analysis

Bohan Lyu, Jianzhong Li

arXiv:2406.05817v1

Benty-search

461. Customizing Language Models with Instance-wise LoRA for Sequential Recommendation

arXiv:2408.10159v3

462. Penny-Wise and Pound-Foolish in Deepfake Detection

arXiv:2408.08412v1

463. Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators

arXiv:2408.05710v1

464. Differentially Private Block-wise Gradient Shuffle for Deep Learning

arXiv:2407.21347v1

465. Certain Properties of Indices-dependent Element-wise Transformed Matrices

arXiv:2409.09033v1

466. Straightforward Layer-wise Pruning for More Efficient Visual Adaptation

arXiv:2407.14330v1

467. PolyFormer: Scalable Node-wise Filters via Polynomial Graph Transformer

arXiv:2407.14459v1

468. Instance-wise Uncertainty for Class Imbalance in Semantic Segmentation

arXiv:2407.12609v1

469. Layer-Wise Relevance Propagation with Conservation Property for ResNet

arXiv:2407.09115v1

470. OutlierTune: Efficient Channel-Wise Quantization for Large Language Models

arXiv:2406.18832v1

471. CharED: Character-wise Ensemble Decoding for Large Language Models

arXiv:2407.11009v1

472. Conformal time series decomposition with component-wise exchangeability

arXiv:2406.16766v1

473. RIS-aided MIMO Beamforming: Piece-Wise Near-field Channel Model

arXiv:2406.14939v1

474. Tracing Representation Progression: Analyzing and Enhancing Layer-Wise Similarity

arXiv:2406.14479v3