Weijian Chen, Zai Yang, Zhiqiang Wei, Derrick Wing Kwan Ng, Michail Matthaiou
This paper proposes a joint active and passive beamforming design for
reconfigurable intelligent surface (RIS)-aided wireless communication systems,
adopting a piece-wise near-field channel model. While a traditional near-field
channel model, applied without any approximations, offers higher modeling
accuracy than a far-field model, it renders the system design more sensitive to
channel estimation errors (CEEs). As a remedy, we propose to adopt a piece-wise
near-field channel model that leverages the advantages of the near-field
approach while enhancing its robustness against CEEs. Our study analyzes the
impact of different channel models, including the traditional near-field, the
proposed piece-wise near-field and far-field channel models, on the
interference distribution caused by CEEs and model mismatches. Subsequently, by
treating the interference as noise, we formulate a joint active and passive
beamforming design problem to maximize the spectral efficiency (SE). The
formulated problem is then recast as a mean squared error (MSE) minimization
problem and a suboptimal algorithm is developed to iteratively update the
active and passive beamforming strategies. Simulation results demonstrate that
adopting the piece-wise near-field channel model leads to an improved SE
compared to both the near-field and far-field models in the presence of CEEs.
Furthermore, the proposed piece-wise near-field model achieves a good trade-off
between modeling accuracy and the system's degrees of freedom (DoF).
Authors' comments: 28 pages
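The difference between the near-field and far-field models mentioned above can be illustrated with textbook geometry. The sketch below is not the paper's piece-wise model; it only compares the exact (spherical-wave) path length with the planar-wave approximation for a single array element. The function names, the element offset, and the source ranges are illustrative assumptions.

```python
import math

def exact_distance(r, theta, x):
    """Spherical-wave (near-field) path length from a source at range r and
    angle theta (radians, measured from broadside) to an element at offset x
    along the array axis."""
    return math.hypot(r * math.sin(theta) - x, r * math.cos(theta))

def far_field_distance(r, theta, x):
    """Planar-wave (far-field) approximation of the same path length."""
    return r - x * math.sin(theta)

def mismatch(r, theta, x):
    """Absolute path-length error introduced by the far-field approximation."""
    return abs(exact_distance(r, theta, x) - far_field_distance(r, theta, x))

# Illustrative values: one element 0.5 m from the array centre, source at 30 degrees.
element_offset = 0.5
for source_range in (2.0, 20.0, 200.0):
    print(source_range, mismatch(source_range, math.radians(30), element_offset))
```

The error shrinks as the source moves away from the array, which is why the far-field model loses accuracy mainly at short range.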
Jiachen Jiang, Jinxin Zhou, Zhihui Zhu
Analyzing the similarity of internal representations has been an important technique for understanding the behavior of deep neural networks. Most existing methods for analyzing the similarity between representations of high dimensions, such as those based on Centered Kernel Alignment (CKA), rely on statistical properties of the representations for a set of data points. In this paper, we focus on transformer models and study the similarity of representations between the hidden layers of individual transformers. In this context, we show that a simple sample-wise cosine similarity metric is capable of capturing the similarity and aligns with the more complicated CKA. Our experimental results on common transformers reveal that representations across layers are positively correlated, with similarity increasing as layers get closer. We provide a theoretical justification for this phenomenon under the geodesic curve assumption for the learned transformer. We then show that an increase in representation similarity implies an increase in predicted probability when directly applying the last-layer classifier to any hidden layer representation. We then propose an aligned training method to improve the effectiveness of shallow layers by enhancing the similarity between internal representations, with trained models that enjoy the following properties: (1) more early saturation events, (2) layer-wise accuracies that increase monotonically and reveal the minimal depth needed for the given task, (3) when serving as multi-exit models, on-par performance with standard multi-exit architectures, which consist of additional classifiers designed for early exiting in shallow layers. To our knowledge, our work is the first to show that one common classifier is sufficient for multi-exit models. We conduct experiments on both vision and NLP tasks to demonstrate the performance of the proposed aligned training.
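A minimal sketch of a sample-wise cosine similarity between layers, assuming that each layer's representation of one sample has already been reduced to a flat vector (for example a pooled token embedding). The function names and the toy vectors are hypothetical and not taken from the paper.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def layer_similarity_matrix(hidden_states):
    """hidden_states[l] is the representation of ONE sample at layer l.
    Returns the matrix of pairwise cosine similarities between layers."""
    n = len(hidden_states)
    return [[cosine_similarity(hidden_states[i], hidden_states[j])
             for j in range(n)] for i in range(n)]

# Made-up hidden states of a single sample across three layers.
states = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
sim = layer_similarity_matrix(states)
```

Unlike CKA, this needs only one sample at a time; averaging the matrix over many samples would give a dataset-level summary.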
Hewen Wang, Renchi Yang, Xiaokui Xiao
Graph representation learning (GRL) aims to encode graph elements into
informative vector representations, which can be used in downstream tasks for
analyzing graph-structured data and has seen extensive applications in various
domains. However, the majority of extant studies on GRL are geared towards
generating node representations, which cannot be readily employed to perform
edge-based analytics tasks in edge-attributed bipartite graphs (EABGs) that
pervade the real world, e.g., spam review detection in customer-product reviews
and identifying fraudulent transactions in user-merchant networks. Compared to
node-wise GRL, learning edge representations (ERL) on such graphs is
challenging due to the need to incorporate the structure and attribute
semantics from the perspective of edges while considering the separate
influence of two heterogeneous node sets U and V in bipartite graphs. To our
knowledge, despite its importance, limited research has been devoted to this
frontier, and existing workarounds all suffer from sub-par results.
Motivated by this, this paper designs EAGLE, an effective ERL method for
EABGs. Building on an in-depth and rigorous theoretical analysis, we propose
the factorized feature propagation (FFP) scheme for edge representations with
adequate incorporation of long-range dependencies of edges/features without
incurring tremendous computation overheads. We further extend FFP into a
dual-view FFP by taking into account the influences of nodes in U and V
separately in ERL. Extensive experiments on 5 real datasets showcase the
effectiveness of the proposed EAGLE models in semi-supervised edge
classification tasks. In particular, EAGLE can attain a considerable gain of up
to 38.11% in AP and 1.86% in AUC when compared to the best baselines.
Authors' comments: 11 pages. Full version of the research paper accepted to KDD 2024
Xiaoxiong Zhang, Zhiwei Zeng, Xin Zhou, Dusit Niyato, Zhiqi Shen
Federated Knowledge Graph Embedding (FKGE) has recently garnered considerable interest due to its capacity to extract expressive representations from distributed knowledge graphs, while concurrently safeguarding the privacy of individual clients. Existing FKGE methods typically harness the arithmetic mean of entity embeddings from all clients as the global supplementary knowledge, and learn a replica of global consensus entity embeddings for each client. However, these methods usually neglect the inherent semantic disparities among distinct clients. This oversight not only results in the globally shared complementary knowledge being inundated with too much noise when tailored to a specific client, but also instigates a discrepancy between local and global optimization objectives. Consequently, the quality of the learned embeddings is compromised. To address this, we propose Personalized Federated knowledge graph Embedding with client-wise relation Graph (PFedEG), a novel approach that employs a client-wise relation graph to learn personalized embeddings by discerning the semantic relevance of embeddings from other clients. Specifically, PFedEG learns personalized supplementary knowledge for each client by aggregating entity embeddings from its neighboring clients based on their "affinity" on the client-wise relation graph. Each client then conducts personalized embedding learning based on its local triples and personalized supplementary knowledge. We conduct extensive experiments on four benchmark datasets to evaluate our method against state-of-the-art models, and the results demonstrate the superiority of our method.
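The contrast between a plain arithmetic mean and an affinity-based combination can be sketched as below. This is an illustration only: the abstract does not say how affinities are computed or normalized, so dividing by the sum of affinities is an assumption, and the function name and numbers are hypothetical.

```python
def personalized_knowledge(neighbor_embeddings, affinities):
    """Affinity-weighted average of one entity's embeddings taken from
    neighboring clients. Normalizing by the affinity sum is an assumption."""
    if not affinities or len(neighbor_embeddings) != len(affinities):
        raise ValueError("need one affinity per neighbor embedding")
    total = sum(affinities)
    if total <= 0.0:
        raise ValueError("affinities must sum to a positive value")
    dim = len(neighbor_embeddings[0])
    combined = [0.0] * dim
    for embedding, affinity in zip(neighbor_embeddings, affinities):
        for k in range(dim):
            combined[k] += (affinity / total) * embedding[k]
    return combined

# Two neighboring clients hold different embeddings for the same entity.
embeddings = [[1.0, 0.0], [0.0, 1.0]]
weighted = personalized_knowledge(embeddings, [3.0, 1.0])   # favours the first client
uniform = personalized_knowledge(embeddings, [1.0, 1.0])    # reduces to the arithmetic mean
```

With equal affinities the result is exactly the arithmetic mean used by the existing methods described above.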
Tian Liu, Huixin Zhang, Shubham Parashar, Shu Kong
Few-shot recognition aims to train a classification model with only a few labeled examples of pre-defined concepts, where annotation can be costly in a downstream task. In another related research area, zero-shot recognition, which assumes no access to any downstream-task data, has been greatly advanced by using pretrained Vision-Language Models (VLMs). In this area, retrieval-augmented learning (RAL) effectively boosts zero-shot accuracy by retrieving and learning from external data relevant to downstream concepts. Motivated by these advancements, our work explores RAL for few-shot recognition. Although applying RAL to few-shot recognition seems straightforward, it has been under-explored in the literature (till now!), and we present novel challenges and opportunities for doing so. First, perhaps surprisingly, simply finetuning the VLM on a large amount of retrieved data barely surpasses state-of-the-art zero-shot methods due to the imbalanced distribution of retrieved data and its domain gaps compared to few-shot annotated data. Second, finetuning a VLM on few-shot examples alone significantly outperforms prior methods, and finetuning on the mix of retrieved and few-shot data yields even better results. Third, to mitigate the imbalanced distribution and domain gap issue, we propose the Stage-Wise Augmented fineTuning (SWAT) method, which involves end-to-end finetuning on mixed data for the first stage and retraining the classifier solely on the few-shot data in the second stage. Extensive experiments show that SWAT achieves the best performance on standard benchmark datasets, resoundingly outperforming prior works by ~10% in accuracy. Code is available at https://github.com/tian1327/SWAT.
Qi-Jie Li, Qian Sun, Shao-Qun Zhang
Identifying gene splicing is a core task at the intersection of artificial intelligence and bioinformatics. Past decades have witnessed great efforts on this problem, such as the bio-plausible splicing pattern AT-CG and the famous SpliceAI. In this paper, we propose a novel framework for the task of gene splicing identification, named Horizon-wise Gene Splicing Identification (H-GSI). The proposed H-GSI follows the horizon-wise identification paradigm and comprises four components: the pre-processing procedure transforming string data into tensors, the sliding window technique handling long sequences, the SeqLab model, and the predictor. In contrast to existing studies that process gene information with a truncated fixed-length sequence, H-GSI employs a horizon-wise identification paradigm in which all positions in a sequence are predicted with only one forward computation, improving accuracy and efficiency. The experiments conducted on the real-world Human dataset show that our proposed H-GSI outperforms SpliceAI and achieves the best accuracy of 97.20%. The source code is available.
Xiaoxiao Ma, Mohan Zhou, Tao Liang, Yalong Bai, Tiejun Zhao, Biye Li, Huaian Chen, Yi Jin
We introduce STAR, a text-to-image model that employs a scale-wise
auto-regressive paradigm. Unlike VAR, which is constrained to class-conditioned
synthesis for images up to 256$\times$256, STAR enables text-driven image
generation up to 1024$\times$1024 through three key designs. First, we
introduce a pre-trained text encoder to extract and adopt representations for
textual constraints, enhancing details and generalizability. Second, given the
inherent structural correlation across different scales, we leverage 2D Rotary
Positional Encoding (RoPE) and tweak it into a normalized version, ensuring
consistent interpretation of relative positions across token maps and
stabilizing the training process. Third, we observe that simultaneously
sampling all tokens within a single scale can disrupt inter-token
relationships, leading to structural instability, particularly in
high-resolution generation. To address this, we propose a novel stable sampling
method that incorporates causal relationships into the sampling process,
ensuring both rich details and stable structures. Compared to previous
diffusion models and auto-regressive models, STAR surpasses existing benchmarks
in fidelity, text-image consistency, and aesthetic quality, requiring just
2.21s for 1024$\times$1024 images on A100. This highlights the potential of
auto-regressive methods in high-quality image synthesis, offering new
directions for text-to-image generation.
Authors' comments: 16 pages
Bohan Lyu, Jianzhong Li
This paper introduces a new type of regression methodology named Convex-Area-Wise Linear Regression (CALR), which separates given datasets by disjoint convex areas and fits different linear regression models for different areas. This regression model is highly interpretable, and it is able to interpolate any given dataset, even when the underlying relationship between explanatory and response variables is non-linear and discontinuous. To solve the CALR problem, three accurate algorithms are proposed under different assumptions. The analysis of the correctness and time complexity of the algorithms is given, indicating that the problem can be solved in $o(n^2)$ time accurately when the input datasets have some special features. Besides, this paper introduces an equivalent mixed integer programming problem of CALR which can be approximately solved using existing optimization solvers.
Samuel Deng, Daniel Hsu, Jingwen Liu
We study the problem of online multi-group learning, a learning model in which an online learner must simultaneously achieve small prediction regret on a large collection of (possibly overlapping) subsequences corresponding to a family of groups. Groups are subsets of the context space, and in fairness applications, they may correspond to subpopulations defined by expressive functions of demographic attributes. In contrast to previous work on this learning model, we consider scenarios in which the family of groups is too large to explicitly enumerate, and hence we seek algorithms that only access groups via an optimization oracle. In this paper, we design such oracle-efficient algorithms with sublinear regret under a variety of settings, including: (i) the i.i.d. setting, (ii) the adversarial setting with smoothed context distributions, and (iii) the adversarial transductive setting.
Feilong Jiang, Xiaonan Hou, Min Xia
As a promising framework for solving partial differential equations (PDEs), physics-informed neural networks (PINNs) have received widespread attention from industrial and scientific fields. However, a lack of expressive ability and initialization pathology issues are found to prevent the application of PINNs to complex PDEs. In this work, we propose Element-wise Multiplication Based Physics-informed Neural Networks (EM-PINNs) to resolve these issues. The element-wise multiplication operation is adopted to transform features into high-dimensional, non-linear spaces, which effectively enhances the expressive capability of PINNs. Benefiting from the element-wise multiplication operation, EM-PINNs can eliminate the initialization pathologies of PINNs. The proposed structure is verified on various benchmarks. The results show that EM-PINNs have strong expressive ability.
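To see why an element-wise product adds non-linearity, consider the sketch below, which multiplies two affine maps of the same input. The layer form (W1 x + b1) * (W2 x + b2) is an assumed illustration and not necessarily the EM-PINN architecture; all names and weights are hypothetical.

```python
def affine(weights, bias, x):
    """Dense affine map: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def elementwise_product_layer(w1, b1, w2, b2, x):
    """Multiply two affine branches element by element.
    Each output is a quadratic function of the input, even without
    any activation function."""
    left = affine(w1, b1, x)
    right = affine(w2, b2, x)
    return [l * r for l, r in zip(left, right)]

# One input feature, two hidden units (illustrative weights).
w1, b1 = [[1.0], [2.0]], [0.0, 0.0]
w2, b2 = [[1.0], [1.0]], [0.0, 1.0]
out = elementwise_product_layer(w1, b1, w2, b2, [3.0])
```

Here the first unit computes x * x and the second 2x * (x + 1), so stacking such layers raises the polynomial degree of the features quickly.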
Felix Zahner, Soumyajyoti Haldar, Roland Wiesendanger, Stefan Heinze, Kirsten von Bergmann, André Kubetzka
Diffusion on surfaces is a fundamental process in surface science, governing
nanostructure and film growth, molecular self-assembly, and chemical reactions.
Atom motion on non-magnetic surfaces has been studied extensively both
theoretically and by real-space imaging techniques. For magnetic surfaces,
density functional theory (DFT) calculations have predicted strong effects of
the magnetic state on adatom diffusion, but to date no corresponding
experimental data exist. Here, we investigate Co and Rh atoms on a hexagonal
magnetic layer, using scanning tunneling microscopy (STM) and DFT calculations.
Experimentally, we "kick" atoms by local voltage pulses and thereby initiate
strictly one-dimensional motion which is dictated by the row-wise
antiferromagnetic (AFM) state. Our calculations show that the one-dimensional
motion of Co and Rh atoms results from conserving the Co spin direction during
movement and avoiding high induced Rh spin moments, respectively. These
findings demonstrate that magnetism can be a means to control adatom mobility.
Authors' comments: 5 main figures, 3 extended figures
Andrew Parry, Sean MacAvaney, Debasis Ganguly
Large Language Models (LLMs) have significantly impacted many facets of
natural language processing and information retrieval. Unlike previous
encoder-based approaches, the enlarged context window of these generative
models allows for ranking multiple documents at once, commonly called list-wise
ranking. However, there are still limits to the number of documents that can be
ranked in a single inference of the model, leading to the broad adoption of a
sliding window approach to identify the k most relevant items in a ranked list.
We argue that the sliding window approach is not well-suited for list-wise
re-ranking because it (1) cannot be parallelized in its current form, (2) leads
to redundant computational steps repeatedly re-scoring the best set of
documents as it works its way up the initial ranking, and (3) prioritizes the
lowest-ranked documents for scoring rather than the highest-ranked documents by
taking a bottom-up approach. Motivated by these shortcomings and an initial
study that shows list-wise rankers are biased towards relevant documents at the
start of their context window, we propose a novel algorithm that partitions a
ranking to depth k and processes documents top-down. Unlike sliding window
approaches, our algorithm is inherently parallelizable due to the use of a
pivot element, which can be compared to documents down to an arbitrary depth
concurrently. In doing so, we reduce the number of expected inference calls by
around 33% when ranking at depth 100 while matching the performance of prior
approaches across multiple strong re-rankers.
Authors' comments: 16 pages, 3 figures, 2 tables
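The sliding window baseline criticized above can be sketched as follows. This shows only the bottom-up baseline, not the proposed pivot-based algorithm. `rank_window` is a hypothetical stand-in for one list-wise LLM inference call, and the window and stride values are illustrative.

```python
def sliding_window_rerank(docs, rank_window, window=4, stride=2):
    """Bottom-up sliding-window re-ranking.
    Returns the final ranking and the number of rank_window calls."""
    if not 0 < stride <= window:
        raise ValueError("stride must satisfy 0 < stride <= window")
    ranking = list(docs)
    calls = 0
    end = len(ranking)
    while True:
        start = max(0, end - window)
        # Each call depends on the output of the previous one,
        # so the calls cannot run in parallel.
        ranking[start:end] = rank_window(ranking[start:end])
        calls += 1
        if start == 0:
            break
        end -= stride
    return ranking, calls

# Toy relevance scores standing in for the model's judgement.
relevance = {name: score for score, name in enumerate("abcdefgh", start=1)}

def toy_ranker(items):
    return sorted(items, key=lambda name: relevance[name], reverse=True)

final_ranking, n_calls = sliding_window_rerank(list("abcdefgh"), toy_ranker)
```

In this toy run the two most relevant documents start at the bottom and are re-scored in every window on their way up, which is the redundancy noted in point (2).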
Peng Li, Yuan Liu, Xiaoxiao Long, Feihu Zhang, Cheng Lin, Mengfei Li, Xingqun Qi, Shanghang Zhang et al.
In this paper, we introduce Era3D, a novel multiview diffusion method that
generates high-resolution multiview images from a single-view image. Despite
significant advancements in multiview generation, existing methods still suffer
from camera prior mismatch, inefficacy, and low resolution, resulting in
poor-quality multiview images. Specifically, these methods assume that the
input images should comply with a predefined camera type, e.g. a perspective
camera with a fixed focal length, leading to distorted shapes when the
assumption fails. Moreover, the full-image or dense multiview attention they
employ leads to an exponential explosion of computational complexity as image
resolution increases, resulting in prohibitively expensive training costs. To
bridge the gap between assumption and reality, Era3D first proposes a
diffusion-based camera prediction module to estimate the focal length and
elevation of the input image, which allows our method to generate images
without shape distortions. Furthermore, a simple but efficient attention layer,
named row-wise attention, is used to enforce epipolar priors in the multiview
diffusion, facilitating efficient cross-view information fusion. Consequently,
compared with state-of-the-art methods, Era3D generates high-quality multiview
images at resolutions up to 512x512 while reducing computational complexity by
12x. Comprehensive experiments demonstrate that Era3D can reconstruct
high-quality and detailed 3D meshes from diverse single-view input images,
significantly outperforming baseline multiview diffusion methods. Project page:
https://penghtyx.github.io/Era3D/.
Authors' comments: NeurIPS 2024
Yufei Gu
Double descent presents a counter-intuitive aspect within the machine
learning domain, and researchers have observed its manifestation in various
models and tasks. While some theoretical explanations have been proposed for
this phenomenon in specific contexts, an accepted theory for its underlying
mechanism in deep learning has yet to be established. In this study, we
revisit the phenomenon of double descent and discuss the conditions of its
occurrence. This paper introduces the concept of class-activation matrices and
a methodology for estimating the effective complexity of functions, with which we
show that over-parameterized models exhibit more distinct and simpler class
patterns in hidden activations than under-parameterized ones. We further
look into the interpolation of noisy labelled data among clean
representations and demonstrate overfitting w.r.t. expressive capacity. By
comprehensively analysing hypotheses and presenting corresponding empirical
evidence that either validates or contradicts these hypotheses, we aim to
provide fresh insights into the phenomenon of double descent and benign
over-parameterization and facilitate future explorations. The source code is available at
https://github.com/Yufei-Gu-451/sparse-generalization.git.
Authors' comments: arXiv admin note: text overlap with arXiv:2310.13572
Yang Yang, Nan Jiang, Yi Xu, De-Chuan Zhan
Open-set Semi-supervised Learning (OSSL) considers a realistic setting in which unlabeled data may come from classes unseen in the labeled set, i.e., out-of-distribution (OOD) data, which could cause performance degradation in conventional SSL models. To handle this issue, in addition to the traditional in-distribution (ID) classifier, some existing OSSL approaches employ an extra OOD detection module to avoid the potential negative impact of the OOD data. Nevertheless, these approaches typically employ the entire set of open-set data during their training process, which may contain data unfriendly to the OSSL task that can negatively influence the model performance. This inspires us to develop a robust open-set data selection strategy for OSSL. Through a theoretical understanding from the perspective of learning theory, we propose Wise Open-set Semi-supervised Learning (WiseOpen), a generic OSSL framework that selectively leverages the open-set data for training the model. By applying a gradient-variance-based selection mechanism, WiseOpen exploits a friendly subset instead of the whole open-set dataset to enhance the model's capability of ID classification. Moreover, to reduce the computational expense, we also propose two practical variants of WiseOpen by adopting low-frequency update and loss-based selection respectively. Extensive experiments demonstrate the effectiveness of WiseOpen in comparison with the state-of-the-art.
Nick Nikzad, Yongsheng Gao, Jun Zhou
In recent years, convolutional neural networks (CNNs) with channel-wise feature refining mechanisms have brought noticeable benefits to modelling channel dependencies. However, current attention paradigms fail to infer an optimal channel descriptor capable of simultaneously exploiting statistical and spatial relationships among feature maps. In this paper, to overcome this shortcoming, we present a novel channel-wise spatially autocorrelated (CSA) attention mechanism. Inspired by geographical analysis, the proposed CSA exploits the spatial relationships between channels of feature maps to produce an effective channel descriptor. To the best of our knowledge, this is the first time that the concept of geographical spatial analysis is utilized in deep CNNs. The proposed CSA imposes negligible learning parameters and light computational overhead on the deep model, making it a powerful yet efficient attention module of choice. We validate the effectiveness of the proposed CSA networks (CSA-Nets) through extensive experiments and analysis on the ImageNet and MS COCO benchmark datasets for image classification, object detection, and instance segmentation. The experimental results demonstrate that CSA-Nets consistently achieve competitive performance and better generalization than several state-of-the-art attention-based CNNs over different benchmark tasks and datasets.
Dieter Verbruggen, Sofie Pollin, Hazem Sallouha
Deep learning (DL) techniques are increasingly pervasive across various domains, including wireless communication, where they extract insights from raw radio signals. However, the computational demands of DL pose significant challenges, particularly in distributed wireless networks like Cell-free networks, where deploying DL models on edge devices becomes hard due to heightened computational loads. These computational loads escalate with larger input sizes, often correlating with improved model performance. To mitigate this challenge, Early Exiting (EE) techniques have been introduced in DL, primarily targeting the depth of the model. This approach enables models to exit during inference based on specified criteria, leveraging entropy measures at intermediate exits. Doing so makes less complex samples exit early, reducing computational load and inference time. In our contribution, we propose a novel width-wise exiting strategy for Convolutional Neural Network (CNN)-based architectures. By selectively adjusting the input size, we aim to regulate computational demands effectively. Our approach aims to decrease the average computational load during inference while maintaining performance levels comparable to conventional models. We specifically investigate Modulation Classification, a well-established application of DL in wireless communication. Our experimental results show substantial reductions in computational load, with an average decrease of 28%, and particularly notable reductions of 65% in high-SNR scenarios. Through this work, we present a practical solution for reducing computational demands in deep learning applications, particularly within the domain of wireless communication.
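The entropy criterion used by conventional depth-wise early exiting, as described above, can be sketched as follows. This illustrates the generic criterion only, not the proposed width-wise strategy. The threshold and the probability vectors are made-up values.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def early_exit(exit_probs, threshold):
    """exit_probs[i] holds the class probabilities produced at exit i.
    Returns (exit_index, predicted_class) for the first exit whose entropy
    is below the threshold, falling back to the final exit."""
    if not exit_probs:
        raise ValueError("at least one exit is required")
    chosen = len(exit_probs) - 1
    for index, probs in enumerate(exit_probs):
        if entropy(probs) < threshold:
            chosen = index
            break
    probs = exit_probs[chosen]
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return chosen, predicted

# An "easy" sample is confident at the first exit, a "hard" one is not.
easy = early_exit([[0.99, 0.01], [0.999, 0.001]], threshold=0.5)
hard = early_exit([[0.5, 0.5], [0.9, 0.1]], threshold=0.5)
```

Low entropy means the intermediate classifier is already confident, so the remaining computation can be skipped for that sample.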
Qianqian Qi, David J. Hessen, Aike N. Vonk, Peter G. M. van der Heijden
Correspondence analysis (CA) is a popular technique to visualize the relationship between two categorical variables. CA uses the data from a two-way contingency table and is affected by the presence of outliers. The supplementary points method is a popular method to handle outliers. Its disadvantage is that the information from entire rows or columns is removed. However, outliers can be caused by cells only. In this paper, a reconstitution algorithm is introduced to cope with such cells. This algorithm can reduce the contribution of cells in CA instead of deleting entire rows or columns. Thus the remaining information in the row and column involved can be used in the analysis. The reconstitution algorithm is compared with two alternative methods for handling outliers, the supplementary points method and MacroPCA. It is shown that the proposed strategy works well.
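How a single cell can dominate a correspondence analysis can be seen from the standardized residuals that CA decomposes. The sketch below uses the standard definitions (cell proportion minus the product of row and column masses, divided by the square root of that product) and sums their squares to get the total inertia. It does not implement the reconstitution algorithm, and the example tables are made up.

```python
import math

def standardized_residuals(table):
    """Standardized residuals of a two-way contingency table."""
    n = float(sum(sum(row) for row in table))
    row_mass = [sum(row) / n for row in table]
    col_mass = [sum(row[j] for row in table) / n for j in range(len(table[0]))]
    if min(row_mass) <= 0.0 or min(col_mass) <= 0.0:
        raise ValueError("every row and column needs a positive total")
    return [[(table[i][j] / n - row_mass[i] * col_mass[j])
             / math.sqrt(row_mass[i] * col_mass[j])
             for j in range(len(col_mass))]
            for i in range(len(row_mass))]

def total_inertia(table):
    """Sum of squared standardized residuals (chi-square statistic / n)."""
    return sum(s * s for row in standardized_residuals(table) for s in row)

associated = [[10, 20], [20, 10]]
independent = [[10, 20], [20, 40]]
inertia_associated = total_inertia(associated)
inertia_independent = total_inertia(independent)
```

Each squared residual is that cell's contribution to the inertia, so an outlying cell shows up as one unusually large term.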
Junting Dong, Qi Fang, Zehuan Huang, Xudong Xu, Jingbo Wang, Sida Peng, Bo Dai
This paper addresses the task of 3D clothed human generation from textual descriptions. Previous works usually encode the human body and clothes as a holistic model and generate the whole model in a single-stage optimization, which makes them struggle with clothing editing and lose fine-grained control over the whole generation process. To solve this, we propose a layer-wise clothed human representation combined with a progressive optimization strategy, which produces clothing-disentangled 3D human models while providing control capacity for the generation process. The basic idea is to progressively generate a minimal-clothed human body and layer-wise clothes. During clothing generation, a novel stratified compositional rendering method is proposed to fuse multi-layer human models, and a new loss function is utilized to help decouple the clothing model from the human body. The proposed method achieves high-quality disentanglement, which thereby provides an effective way for 3D garment generation. Extensive experiments demonstrate that our approach achieves state-of-the-art 3D clothed human generation while also supporting cloth editing applications such as virtual try-on. Project page: http://jtdong.com/tela_layer/
Yi Hu, Hanchi Ren, Chen Hu, Jingjing Deng, Xianghua Xie
Federated learning (FL) is a powerful Machine Learning (ML) paradigm that
enables distributed clients to collaboratively learn a shared global model
while keeping the data on the original device, thereby preserving privacy. A
central challenge in FL is the effective aggregation of local model weights
from disparate and potentially unbalanced participating clients. Existing
methods often treat each client indiscriminately, applying a single proportion
to the entire local model. However, it is empirically advantageous for each
weight to be assigned a specific proportion. This paper introduces an
innovative Element-Wise Weights Aggregation Method for Federated Learning
(EWWA-FL) aimed at optimizing learning performance and accelerating convergence
speed. Unlike traditional FL approaches, EWWA-FL aggregates local weights to
the global model at the level of individual elements, thereby allowing each
participating client to make element-wise contributions to the learning
process. By taking into account the unique dataset characteristics of each
client, EWWA-FL enhances the robustness of the global model to different
datasets while also achieving rapid convergence. The method is flexible enough
to employ various weighting strategies. Through comprehensive experiments, we
demonstrate the advanced capabilities of EWWA-FL, showing significant
improvements in both accuracy and convergence speed across a range of backbones
and benchmarks.
Authors' comments: 2023 IEEE International Conference on Data Mining Workshops (ICDMW)
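The difference between one proportion per client and one proportion per weight element can be sketched as below. The abstract does not specify how EWWA-FL derives its proportions, so they are taken as given inputs here, and normalizing them per element across clients is an assumption. Function names and numbers are hypothetical.

```python
def aggregate_elementwise(client_weights, client_proportions):
    """Combine flattened client weight vectors using a separate proportion
    for every element. Proportions are normalized per element (assumption)."""
    if not client_weights or len(client_weights) != len(client_proportions):
        raise ValueError("need one proportion vector per client")
    n_elements = len(client_weights[0])
    aggregated = []
    for k in range(n_elements):
        total = sum(p[k] for p in client_proportions)
        if total <= 0.0:
            raise ValueError("proportions for each element must sum to a positive value")
        weighted_sum = sum(w[k] * p[k]
                           for w, p in zip(client_weights, client_proportions))
        aggregated.append(weighted_sum / total)
    return aggregated

# Two clients, two weight elements each (illustrative values).
weights = [[1.0, 10.0], [3.0, 20.0]]
per_element = aggregate_elementwise(weights, [[1.0, 3.0], [1.0, 1.0]])
per_client = aggregate_elementwise(weights, [[1.0, 1.0], [1.0, 1.0]])
```

Giving every element of a client the same proportion recovers the conventional scheme that applies a single proportion to the entire local model.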