Samuel Deng, Daniel Hsu, Jingwen Liu
We study the problem of online multi-group learning, a learning model in which an online learner must simultaneously achieve small prediction regret on a large collection of (possibly overlapping) subsequences corresponding to a family of groups. Groups are subsets of the context space, and in fairness applications, they may correspond to subpopulations defined by expressive functions of demographic attributes. In contrast to previous work on this learning model, we consider scenarios in which the family of groups is too large to explicitly enumerate, and hence we seek algorithms that only access groups via an optimization oracle. In this paper, we design such oracle-efficient algorithms with sublinear regret under a variety of settings, including: (i) the i.i.d. setting, (ii) the adversarial setting with smoothed context distributions, and (iii) the adversarial transductive setting.
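The multi-group objective described above can be illustrated with a small sketch. It computes, for each group, the learner's regret on the subsequence of rounds whose context falls in that group. The function name `group_regrets`, the finite hypothesis list, and the explicit enumeration of groups are assumptions made for illustration only; the paper targets families of groups too large to enumerate and accesses them through an optimization oracle instead.

```python
from typing import Callable, Dict, List, Sequence


def group_regrets(
    contexts: Sequence,
    labels: Sequence,
    predictions: Sequence,
    groups: Dict[str, Callable[[object], bool]],
    hypotheses: List[Callable[[object], object]],
    loss: Callable[[object, object], float],
) -> Dict[str, float]:
    """Regret of the learner on each group's subsequence (illustrative only)."""
    regrets = {}
    for name, in_group in groups.items():
        # Rounds whose context belongs to this (possibly overlapping) group.
        rounds = [t for t, x in enumerate(contexts) if in_group(x)]
        learner_loss = sum(loss(predictions[t], labels[t]) for t in rounds)
        # Best fixed hypothesis in hindsight, restricted to this subsequence.
        best_loss = min(
            sum(loss(h(contexts[t]), labels[t]) for t in rounds)
            for h in hypotheses
        )
        regrets[name] = learner_loss - best_loss
    return regrets
```

A learner succeeds in this model when every entry of the returned dictionary grows sublinearly in the number of rounds, not just the entry for the group containing all contexts.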
Feilong Jiang, Xiaonan Hou, Min Xia
As a promising framework for solving partial differential equations (PDEs), physics-informed neural networks (PINNs) have received widespread attention from industrial and scientific fields. However, lack of expressive ability and initialization pathology issues are found to prevent the application of PINNs in complex PDEs. In this work, we propose Element-wise Multiplication Based Physics-informed Neural Networks (EM-PINNs) to resolve these issues. The element-wise multiplication operation is adopted to transform features into high-dimensional, non-linear spaces, which effectively enhances the expressive capability of PINNs. Benefiting from the element-wise multiplication operation, EM-PINNs can eliminate the initialization pathologies of PINNs. The proposed structure is verified on various benchmarks. The results show that EM-PINNs have strong expressive ability.
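As a rough illustration of how element-wise multiplication introduces non-linear feature interactions, the sketch below multiplies two linear projections of the same input entry by entry. The abstract does not specify the EM-PINN architecture, so the block structure, the `tanh` activation on one branch, and the names `linear` and `em_block` are all assumptions.

```python
import math


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def em_block(x, w1, b1, w2, b2):
    """Hypothetical multiplicative block: tanh(W1 x + b1) * (W2 x + b2), entry-wise."""
    u = [math.tanh(v) for v in linear(w1, b1, x)]
    v = linear(w2, b2, x)
    # The product of the two branches contains cross terms of the inputs,
    # which a single linear layer followed by tanh cannot produce.
    return [ui * vi for ui, vi in zip(u, v)]
```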
Felix Zahner, Soumyajyoti Haldar, Roland Wiesendanger, Stefan Heinze, Kirsten von Bergmann, André Kubetzka
Diffusion on surfaces is a fundamental process in surface science, governing
nanostructure and film growth, molecular self-assembly, and chemical reactions.
Atom motion on non-magnetic surfaces has been studied extensively both
theoretically and by real-space imaging techniques. For magnetic surfaces,
density functional theory (DFT) calculations have predicted strong effects of
the magnetic state on adatom diffusion, but to date no corresponding
experimental data exist. Here, we investigate Co and Rh atoms on a hexagonal
magnetic layer, using scanning tunneling microscopy (STM) and DFT calculations.
Experimentally, we "kick" atoms by local voltage pulses and thereby initiate
strictly one-dimensional motion which is dictated by the row-wise
antiferromagnetic (AFM) state. Our calculations show that the one-dimensional
motion of Co and Rh atoms results from conserving the Co spin direction during
movement and avoiding high induced Rh spin moments, respectively. These
findings demonstrate that magnetism can be a means to control adatom mobility.
Authors' comments: 5 main figures, 3 extended figures
Andrew Parry, Sean MacAvaney, Debasis Ganguly
Large Language Models (LLMs) have significantly impacted many facets of
natural language processing and information retrieval. Unlike previous
encoder-based approaches, the enlarged context window of these generative
models allows for ranking multiple documents at once, commonly called list-wise
ranking. However, there are still limits to the number of documents that can be
ranked in a single inference of the model, leading to the broad adoption of a
sliding window approach to identify the k most relevant items in a ranked list.
We argue that the sliding window approach is not well-suited for list-wise
re-ranking because it (1) cannot be parallelized in its current form, (2) leads
to redundant computational steps repeatedly re-scoring the best set of
documents as it works its way up the initial ranking, and (3) prioritizes the
lowest-ranked documents for scoring rather than the highest-ranked documents by
taking a bottom-up approach. Motivated by these shortcomings and an initial
study that shows list-wise rankers are biased towards relevant documents at the
start of their context window, we propose a novel algorithm that partitions a
ranking to depth k and processes documents top-down. Unlike sliding window
approaches, our algorithm is inherently parallelizable due to the use of a
pivot element, which can be compared to documents down to an arbitrary depth
concurrently. In doing so, we reduce the number of expected inference calls by
around 33% when ranking at depth 100 while matching the performance of prior
approaches across multiple strong re-rankers.
Authors' comments: 16 pages, 3 figures, 2 tables
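The sliding window baseline criticized above can be sketched as a list of window spans visited from the bottom of the ranking upward. The helper name `sliding_windows` and the window and stride values used in the usage note are assumptions, not figures taken from the abstract.

```python
def sliding_windows(depth, window, stride):
    """Return (start, end) spans in the order a bottom-up sliding window visits them."""
    spans = []
    end = depth
    while True:
        start = max(0, end - window)
        spans.append((start, end))
        if start == 0:
            break
        # Each window overlaps the previous one, so its content depends on the
        # previous window's output and the calls cannot run concurrently.
        end -= stride
    return spans
```

For example, `sliding_windows(100, 20, 10)` yields nine spans, starting with `(80, 100)` and ending with `(0, 20)`, so the top-ranked documents are scored last.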
Peng Li, Yuan Liu, Xiaoxiao Long, Feihu Zhang, Cheng Lin, Mengfei Li, Xingqun Qi, Shanghang Zhang et al.
In this paper, we introduce Era3D, a novel multiview diffusion method that
generates high-resolution multiview images from a single-view image. Despite
significant advancements in multiview generation, existing methods still suffer
from camera prior mismatch, inefficacy, and low resolution, resulting in
poor-quality multiview images. Specifically, these methods assume that the
input images should comply with a predefined camera type, e.g. a perspective
camera with a fixed focal length, leading to distorted shapes when the
assumption fails. Moreover, the full-image or dense multiview attention they
employ leads to an exponential explosion of computational complexity as image
resolution increases, resulting in prohibitively expensive training costs. To
bridge the gap between assumption and reality, Era3D first proposes a
diffusion-based camera prediction module to estimate the focal length and
elevation of the input image, which allows our method to generate images
without shape distortions. Furthermore, a simple but efficient attention layer,
named row-wise attention, is used to enforce epipolar priors in the multiview
diffusion, facilitating efficient cross-view information fusion. Consequently,
compared with state-of-the-art methods, Era3D generates high-quality multiview
images at resolutions up to 512*512 while reducing computational complexity by
12x. Comprehensive experiments demonstrate that Era3D can reconstruct
high-quality and detailed 3D meshes from diverse single-view input images,
significantly outperforming baseline multiview diffusion methods. Project page:
https://penghtyx.github.io/Era3D/.
Authors' comments: NeurIPS2024
Yufei Gu
Double descent presents a counter-intuitive aspect within the machine
learning domain, and researchers have observed its manifestation in various
models and tasks. While some theoretical explanations have been proposed for
this phenomenon in specific contexts, an accepted theory of its underlying
mechanism in deep learning has yet to be established. In this study, we
revisit the phenomenon of double descent and discuss the conditions of its
occurrence. This paper introduces the concept of class-activation matrices and
a methodology for estimating the effective complexity of functions, with which we
show that over-parameterized models exhibit more distinct and simpler class
patterns in hidden activations compared to under-parameterized ones. We further
look into the interpolation of noisy labelled data among clean
representations and demonstrate overfitting w.r.t. expressive capacity. By
comprehensively analysing hypotheses and presenting corresponding empirical
evidence that either validates or contradicts these hypotheses, we aim to
provide fresh insights into the phenomenon of double descent and benign
over-parameterization and facilitate future explorations. The source code is available at
https://github.com/Yufei-Gu-451/sparse-generalization.git.
Authors' comments: arXiv admin note: text overlap with arXiv:2310.13572
Yang Yang, Nan Jiang, Yi Xu, De-Chuan Zhan
Open-set Semi-supervised Learning (OSSL) considers a realistic setting in which unlabeled data may come from classes unseen in the labeled set, i.e., out-of-distribution (OOD) data, which could cause performance degradation in conventional SSL models. To handle this issue, in addition to the traditional in-distribution (ID) classifier, some existing OSSL approaches employ an extra OOD detection module to avoid the potential negative impact of the OOD data. Nevertheless, these approaches typically employ the entire set of open-set data during their training process, which may contain data unfriendly to the OSSL task that can negatively influence the model performance. This inspires us to develop a robust open-set data selection strategy for OSSL. Through a theoretical understanding from the perspective of learning theory, we propose Wise Open-set Semi-supervised Learning (WiseOpen), a generic OSSL framework that selectively leverages the open-set data for training the model. By applying a gradient-variance-based selection mechanism, WiseOpen exploits a friendly subset instead of the whole open-set dataset to enhance the model's capability of ID classification. Moreover, to reduce the computational expense, we also propose two practical variants of WiseOpen by adopting low-frequency update and loss-based selection respectively. Extensive experiments demonstrate the effectiveness of WiseOpen in comparison with the state-of-the-art.
Nick Nikzad, Yongsheng Gao, Jun Zhou
In recent years, convolutional neural networks (CNNs) with channel-wise feature refining mechanisms have brought noticeable benefits to modelling channel dependencies. However, current attention paradigms fail to infer an optimal channel descriptor capable of simultaneously exploiting statistical and spatial relationships among feature maps. In this paper, to overcome this shortcoming, we present a novel channel-wise spatially autocorrelated (CSA) attention mechanism. Inspired by geographical analysis, the proposed CSA exploits the spatial relationships between channels of feature maps to produce an effective channel descriptor. To the best of our knowledge, this is the first time that the concept of geographical spatial analysis is utilized in deep CNNs. The proposed CSA imposes negligible learning parameters and light computational overhead to the deep model, making it a powerful yet efficient attention module of choice. We validate the effectiveness of the proposed CSA networks (CSA-Nets) through extensive experiments and analysis on the ImageNet and MS COCO benchmark datasets for image classification, object detection, and instance segmentation. The experimental results demonstrate that CSA-Nets consistently achieve competitive performance and better generalization than several state-of-the-art attention-based CNNs over different benchmark tasks and datasets.
Dieter Verbruggen, Sofie Pollin, Hazem Sallouha
Deep learning (DL) techniques are increasingly pervasive across various domains, including wireless communication, where they extract insights from raw radio signals. However, the computational demands of DL pose significant challenges, particularly in distributed wireless networks like Cell-free networks, where deploying DL models on edge devices becomes hard due to heightened computational loads. These computational loads escalate with larger input sizes, often correlating with improved model performance. To mitigate this challenge, Early Exiting (EE) techniques have been introduced in DL, primarily targeting the depth of the model. This approach enables models to exit during inference based on specified criteria, leveraging entropy measures at intermediate exits. Doing so makes less complex samples exit early, reducing computational load and inference time. In our contribution, we propose a novel width-wise exiting strategy for Convolutional Neural Network (CNN)-based architectures. By selectively adjusting the input size, we aim to regulate computational demands effectively. Our approach aims to decrease the average computational load during inference while maintaining performance levels comparable to conventional models. We specifically investigate Modulation Classification, a well-established application of DL in wireless communication. Our experimental results show substantial reductions in computational load, with an average decrease of 28%, and particularly notable reductions of 65% in high-SNR scenarios. Through this work, we present a practical solution for reducing computational demands in deep learning applications, particularly within the domain of wireless communication.
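The entropy-based exit criterion mentioned above can be sketched as follows: evaluate the exits in order and stop at the first one whose predictive entropy falls below a threshold. The function names, the threshold value, and the fallback to the last exit are assumptions; in the width-wise setting the exits would presumably be ordered by increasing input size rather than by depth.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def first_confident_exit(exit_logits, threshold):
    """Return (exit index, predicted class) for the first exit with low entropy."""
    for i, logits in enumerate(exit_logits):
        if entropy(softmax(logits)) <= threshold:
            return i, logits.index(max(logits))
    # No exit was confident enough: fall back to the final (most expensive) exit.
    last = exit_logits[-1]
    return len(exit_logits) - 1, last.index(max(last))
```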
Qianqian Qi, David J. Hessen, Aike N. Vonk, Peter G. M. van der Heijden
Correspondence analysis (CA) is a popular technique to visualize the relationship between two categorical variables. CA uses the data from a two-way contingency table and is affected by the presence of outliers. The supplementary points method is a popular method to handle outliers. Its disadvantage is that the information from entire rows or columns is removed. However, outliers can be caused by cells only. In this paper, a reconstitution algorithm is introduced to cope with such cells. This algorithm can reduce the contribution of cells in CA instead of deleting entire rows or columns. Thus the remaining information in the row and column involved can be used in the analysis. The reconstitution algorithm is compared with two alternative methods for handling outliers, the supplementary points method and MacroPCA. It is shown that the proposed strategy works well.
Junting Dong, Qi Fang, Zehuan Huang, Xudong Xu, Jingbo Wang, Sida Peng, Bo Dai
This paper addresses the task of 3D clothed human generation from textual descriptions. Previous works usually encode the human body and clothes as a holistic model and generate the whole model in a single-stage optimization, which makes them struggle with clothing editing and lose fine-grained control over the whole generation process. To solve this, we propose a layer-wise clothed human representation combined with a progressive optimization strategy, which produces clothing-disentangled 3D human models while providing control capacity for the generation process. The basic idea is progressively generating a minimal-clothed human body and layer-wise clothes. During clothing generation, a novel stratified compositional rendering method is proposed to fuse multi-layer human models, and a new loss function is utilized to help decouple the clothing model from the human body. The proposed method achieves high-quality disentanglement, which thereby provides an effective way for 3D garment generation. Extensive experiments demonstrate that our approach achieves state-of-the-art 3D clothed human generation while also supporting cloth editing applications such as virtual try-on. Project page: http://jtdong.com/tela_layer/
Yi Hu, Hanchi Ren, Chen Hu, Jingjing Deng, Xianghua Xie
Federated learning (FL) is a powerful Machine Learning (ML) paradigm that
enables distributed clients to collaboratively learn a shared global model
while keeping the data on the original device, thereby preserving privacy. A
central challenge in FL is the effective aggregation of local model weights
from disparate and potentially unbalanced participating clients. Existing
methods often treat each client indiscriminately, applying a single proportion
to the entire local model. However, it is empirically advantageous for each
weight to be assigned a specific proportion. This paper introduces an
innovative Element-Wise Weights Aggregation Method for Federated Learning
(EWWA-FL) aimed at optimizing learning performance and accelerating convergence
speed. Unlike traditional FL approaches, EWWA-FL aggregates local weights to
the global model at the level of individual elements, thereby allowing each
participating client to make element-wise contributions to the learning
process. By taking into account the unique dataset characteristics of each
client, EWWA-FL enhances the robustness of the global model to different
datasets while also achieving rapid convergence. The method is flexible enough
to employ various weighting strategies. Through comprehensive experiments, we
demonstrate the advanced capabilities of EWWA-FL, showing significant
improvements in both accuracy and convergence speed across a range of backbones
and benchmarks.
Authors' comments: 2023 IEEE International Conference on Data Mining Workshops (ICDMW)
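The contrast between a single per-client proportion and element-wise proportions can be sketched as below. How EWWA-FL derives the proportions is not described in the abstract, so they are taken here as given inputs, and the name `aggregate_elementwise` and the flat-list weight layout are assumptions.

```python
def aggregate_elementwise(client_weights, client_proportions):
    """Weighted average of client models with a separate proportion per element.

    client_weights[k][j]     -- j-th parameter of client k's local model
    client_proportions[k][j] -- non-negative proportion of client k for parameter j
    """
    n_params = len(client_weights[0])
    global_weights = []
    for j in range(n_params):
        # Normalize per element so the proportions for parameter j sum to one.
        total = sum(p[j] for p in client_proportions)
        weighted = sum(p[j] * w[j] for w, p in zip(client_weights, client_proportions))
        global_weights.append(weighted / total)
    return global_weights
```

Setting every row of `client_proportions` to a constant recovers the conventional scheme in which one proportion is applied to the entire local model.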
Pierre Lelièvre, Chien-Chung Chen
Attribution methods are primarily designed to study the distribution of input
component contributions to individual model predictions. However, some research
applications require a summary of attribution patterns across the entire
dataset to facilitate the interpretability of the scrutinized models. In this
paper, we present a new method called Integrated Gradient Correlation (IGC)
that relates dataset-wise attributions to a model prediction score and enables
region-specific analysis by a direct summation over associated components. We
demonstrate our method on scalar predictions with the study of image feature
representation in the brain from fMRI neural signals and the estimation of
neural population receptive fields (NSD dataset), as well as on categorical
predictions with the investigation of handwritten digit recognition (MNIST
dataset). The resulting IGC attributions show selective patterns, revealing
underlying model strategies coherent with their respective objectives.
Authors' comments: 12 pages, 8 figures, source code at
https://github.com/plelievre/int_grad_corr.git
Wencheng Zhu, Xin Zhou, Pengfei Zhu, Yu Wang, Qinghua Hu
In this paper, we propose a simple yet effective contrastive knowledge distillation framework that achieves sample-wise logit alignment while preserving semantic consistency. Conventional knowledge distillation approaches exhibit over-reliance on feature similarity per sample, which risks overfitting, and contrastive approaches focus on inter-class discrimination at the expense of intra-sample semantic relationships. Our approach transfers "dark knowledge" through teacher-student contrastive alignment at the sample level. Specifically, our method first enforces intra-sample alignment by directly minimizing teacher-student logit discrepancies within individual samples. Then, we utilize inter-sample contrasts to preserve semantic dissimilarities across samples. By redefining positive pairs as aligned teacher-student logits from identical samples and negative pairs as cross-sample logit combinations, we reformulate these dual constraints into an InfoNCE loss framework, reducing computational complexity to below quadratic in the number of samples while eliminating dependencies on temperature parameters and large batch sizes. We conduct comprehensive experiments across three benchmark datasets, including the CIFAR-100, ImageNet-1K, and MS COCO datasets, and experimental results clearly confirm the effectiveness of the proposed method on image classification, object detection, and instance segmentation tasks.
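The positive and negative pairing described above can be illustrated with a generic InfoNCE loss over teacher and student logits, where the positive pair for sample i is (student i, teacher i) and the negatives are the cross-sample combinations. This is the standard textbook form with a dot-product similarity and a temperature `tau`, both of which are assumptions; the paper's reformulation, which reportedly removes the temperature dependence and lowers the cost, is not reproduced here.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(student_logits, teacher_logits, tau=1.0):
    """Mean InfoNCE loss; row i of each list belongs to the same input sample."""
    n = len(student_logits)
    total = 0.0
    for i in range(n):
        # Similarity of student i to every teacher output in the batch.
        scores = [dot(student_logits[i], teacher_logits[j]) / tau for j in range(n)]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(scores)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
        # The positive pair is the matching index i.
        total += log_denominator - scores[i]
    return total / n
```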
Marco Berrettini, Christian Hennig, Cinzia Viroli
Quantile-based classifiers can classify high-dimensional observations by minimising a discrepancy of an observation to a class based on suitable quantiles of the within-class distributions, corresponding to a unique percentage for all variables. The present work extends these classifiers by introducing a way to determine potentially different optimal percentages for different variables. Furthermore, a variable-wise scale parameter is introduced. A simple greedy algorithm to estimate the parameters is proposed. Their consistency in a nonparametric setting is proved. Experiments using artificially generated and real data confirm the potential of the quantile-based classifier with variable-wise parameters.
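A minimal sketch of a quantile-based classifier with a separate percentage per variable is given below. It assumes the discrepancy is the usual quantile check loss summed over variables and uses a simple order-statistic estimate of each within-class quantile; the variable-wise scale parameter and the greedy estimation of the percentages are omitted, and all function names are hypothetical.

```python
import math


def empirical_quantile(values, theta):
    """Order-statistic estimate of the theta-quantile (no interpolation)."""
    s = sorted(values)
    k = max(0, min(len(s) - 1, math.ceil(theta * len(s)) - 1))
    return s[k]


def check_loss(u, theta):
    """Quantile check loss: theta * u for u >= 0, (theta - 1) * u for u < 0."""
    return u * (theta - (1.0 if u < 0 else 0.0))


def classify(x, train_by_class, thetas):
    """Assign x to the class with the smallest summed quantile discrepancy."""
    best_label, best_score = None, math.inf
    for label, rows in train_by_class.items():
        score = 0.0
        for j, theta in enumerate(thetas):
            # thetas[j] is the percentage used for variable j.
            q = empirical_quantile([row[j] for row in rows], theta)
            score += check_loss(x[j] - q, theta)
        if score < best_score:
            best_label, best_score = label, score
    return best_label
```

Passing the same value in every position of `thetas` corresponds to the original classifier with a single percentage for all variables.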
Khoi Do, Duong Nguyen, Nguyen H. Tran, Viet Dung Nguyen
Beyond class frequency, we recognize the impact of class-wise relationships among various class-specific predictions and the imbalance in label masks on long-tailed segmentation learning. To address these challenges, we propose an innovative Pixel-wise Adaptive Training (PAT) technique tailored for long-tailed segmentation. PAT has two key features: 1) class-wise gradient magnitude homogenization, and 2) pixel-wise class-specific loss adaptation (PCLA). First, the class-wise gradient magnitude homogenization helps alleviate the imbalance among label masks by ensuring equal consideration of the class-wise impact on model updates. Second, PCLA tackles the detrimental impact of both rare classes within the long-tailed distribution and inaccurate predictions from previous training stages by encouraging learning classes with low prediction confidence and guarding against forgetting classes with high confidence. This combined approach fosters robust learning while preventing the model from forgetting previously learned knowledge. PAT exhibits significant performance improvements, surpassing the current state-of-the-art by 2.2% on the NYU dataset. Moreover, it enhances overall pixel-wise accuracy by 2.85% and intersection over union value by 2.07%, with a particularly notable decrease of 0.39% in detecting rare classes compared to Balance Logits Variation, as demonstrated on the three popular datasets, i.e., OxfordPetIII, CityScape, and NYU.
Francis Duey, James Schombert, Stacy McGaugh, Federico Lelli
We present WISE W1 photometry of the SPARC (Spitzer Photometry and Accurate
Rotation Curves) sample. The baseline of near-IR fluxes is established for use
by stellar mass models, a key component of the baryonic Tully-Fisher relation
and other kinematic galaxy scaling relations. We focus this paper on
determination of the characteristics of the W1 fluxes compared to IRAC 3.6
fluxes, internal accuracy limitations from photometric techniques, external
accuracy by comparison to other work in the literature and the range of W1 to
IRAC 3.6 colors. We outline the behavior of SDSS g, W1 and IRAC 3.6 colors with
respect to underlying SED features. We also note a previously unknown
correlation between WISE colors and the central surface brightness, probably
related to the low metallicity of low surface brightness dwarfs.
Authors' comments: Accepted to AJ, 19 pages, 10 figures
Tianyu Huang, Liangzu Peng, René Vidal, Yun-Hui Liu
Given an input set of $3$D point pairs, the goal of outlier-robust $3$D
registration is to compute some rotation and translation that align as many
point pairs as possible. This is an important problem in computer vision, for
which many highly accurate approaches have been recently proposed. Despite
their impressive performance, these approaches lack scalability, often
exceeding the $16$GB of memory of a standard laptop when handling roughly
$30,000$ point pairs. In this paper, we propose a $3$D registration approach
that can process more than ten million ($10^7$) point pairs with over $99\%$
random outliers. Moreover, our method is efficient, entails low memory costs,
and maintains high accuracy at the same time. We call our method TEAR, as it
involves minimizing an outlier-robust loss that computes Truncated Entry-wise
Absolute Residuals. To minimize this loss, we decompose the original
$6$-dimensional problem into two subproblems of dimensions $3$ and $2$,
respectively, solved in succession to global optimality via a customized
branch-and-bound method. While branch-and-bound is often slow and unscalable,
this does not apply to TEAR as we propose novel bounding functions that are
tight and computationally efficient. Experiments on various datasets are
conducted to validate the scalability and efficiency of our method.
Authors' comments: 24 pages, 12 figures. Accepted to CVPR 2024
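To make the idea of a truncated, entry-wise absolute residual concrete, the sketch below evaluates one plausible reading of such a loss for a candidate rotation and translation: the L1 norm of each pair's residual, capped at a threshold so that outliers contribute at most a constant. The exact TEAR loss is not given in the abstract, so this form, the threshold `c`, and the function names are assumptions, and the branch-and-bound solver is not shown.

```python
def apply_rigid(R, t, p):
    """Apply rotation matrix R (3x3 nested lists) and translation t to point p."""
    return [sum(R[r][k] * p[k] for k in range(3)) + t[r] for r in range(3)]


def truncated_l1_loss(R, t, pairs, c):
    """Sum over point pairs of min(L1 residual, c); hypothetical reading of the loss."""
    total = 0.0
    for p, q in pairs:
        moved = apply_rigid(R, t, p)
        # Entry-wise absolute residuals, summed over the three coordinates.
        residual = sum(abs(m - qk) for m, qk in zip(moved, q))
        # Truncation: a gross outlier costs c no matter how far off it is.
        total += min(residual, c)
    return total
```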
Mei Qiu, Wei Lin, Lauren Ann Christopher, Stanley Chien, Yaobin Chen, Shu Hu
In the US, thousands of Pan, Tilt, and Zoom (PTZ) traffic cameras monitor highway conditions. There is great interest in using these highway cameras to gather valuable road traffic data to support traffic analysis and decision-making for highway safety and efficient traffic management. However, there are too many cameras for a few human traffic operators to effectively monitor, so a fully automated solution is desired. This paper introduces a novel system that learns the locations of highway lanes and traffic directions from these camera feeds automatically. It collects real-time, lane-specific traffic data continuously, even adjusting for changes in camera angle or zoom. This facilitates efficient traffic analysis, decision-making, and improved highway safety.
Ningyi Liao, Zihao Yu, Siqiang Luo
Graph Neural Networks (GNNs) have shown promising performance in various graph learning tasks, but at the cost of resource-intensive computations. The primary overhead of a GNN update stems from graph propagation and weight transformation, both involving operations on graph-scale matrices. Previous studies attempt to reduce the computational budget by leveraging graph-level or network-level sparsification techniques, resulting in a downsized graph or weights. In this work, we propose Unifews, which unifies the two operations in an entry-wise manner considering individual matrix elements, and conducts joint edge-weight sparsification to enhance learning efficiency. The entry-wise design of Unifews enables adaptive compression across GNN layers with progressively increased sparsity, and is applicable to a variety of architectural designs with on-the-fly operation simplification. Theoretically, we establish a novel framework to characterize sparsified GNN learning in view of a graph optimization process, and prove that Unifews effectively approximates the learning objective with bounded error and reduced computational load. We conduct extensive experiments to evaluate the performance of our method in diverse settings. Unifews is advantageous in jointly removing more than 90% of edges and weight entries with comparable or better accuracy than baseline models. The sparsification offers remarkable efficiency improvements including 10-20x matrix operation reduction and up to 100x acceleration in graph propagation time for the largest graph at the billion-edge scale.
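Entry-wise sparsification, in its simplest form, decides entry by entry whether a matrix element is kept. The sketch below uses a plain magnitude threshold, which is an assumption made for illustration; the abstract does not state the criterion Unifews applies, and the name `sparsify_entries` is hypothetical. The same routine could be applied to an adjacency (propagation) matrix or to a weight matrix.

```python
def sparsify_entries(matrix, threshold):
    """Zero out entries with magnitude below threshold; return (matrix, sparsity)."""
    kept = 0
    pruned = []
    for row in matrix:
        new_row = []
        for value in row:
            if abs(value) >= threshold:
                new_row.append(value)
                kept += 1
            else:
                # Dropped entry: the corresponding multiply-add can be skipped.
                new_row.append(0.0)
        pruned.append(new_row)
    total = sum(len(row) for row in matrix)
    return pruned, 1.0 - kept / total
```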