Xinjian Zhao, Liang Zhang, Yang Liu, Ruocheng Guo, Xiangyu Zhao
Graph contrastive learning (GCL) has emerged as a pivotal technique in the domain of graph representation learning. A crucial aspect of effective GCL is the caliber of generated positive and negative samples, which is intrinsically dictated by their resemblance to the original data. Nevertheless, precise control over similarity during sample generation presents a formidable challenge, often impeding the effective discovery of representative graph patterns. To address this challenge, we propose an innovative framework: Adversarial Curriculum Graph Contrastive Learning (ACGCL), which capitalizes on the merits of pair-wise augmentation to engender graph-level positive and negative samples with controllable similarity, alongside subgraph contrastive learning to discern effective graph patterns therein. Within the ACGCL framework, we have devised a novel adversarial curriculum training methodology that facilitates progressive learning by sequentially increasing the difficulty of distinguishing the generated samples. Notably, this approach transcends the prevalent sparsity issue inherent in conventional curriculum learning strategies by adaptively concentrating on more challenging training data. Finally, a comprehensive assessment of ACGCL is conducted through extensive experiments on six well-known benchmark datasets, wherein ACGCL conspicuously surpasses a set of state-of-the-art baselines.
Bram Vanherle, Vittorio Pippi, Silvia Cascianelli, Nick Michiels, Frank Van Reeth, Rita Cucchiara
Styled Handwritten Text Generation (HTG) has received significant attention in recent years, propelled by the success of learning-based solutions employing GANs, Transformers, and, preliminarily, Diffusion Models. Despite this surge in interest, there remains a critical yet understudied aspect - the impact of the input, both visual and textual, on the HTG model training and its subsequent influence on performance. This study delves deeper into a cutting-edge Styled-HTG approach, proposing strategies for input preparation and training regularization that allow the model to achieve better performance and generalize better. These aspects are validated through extensive analysis on several different settings and datasets. Moreover, in this work, we go beyond performance optimization and address a significant hurdle in HTG research - the lack of a standardized evaluation protocol. In particular, we propose a standardization of the evaluation protocol for HTG and conduct a comprehensive benchmarking of existing approaches. By doing so, we aim to establish a foundation for fair and meaningful comparisons between HTG strategies, fostering progress in the field.
Artem Chernikov, Henry Towsner
We investigate various forms of (model-theoretic) stability for hypergraphs
and their corresponding strengthenings of the hypergraph regularity lemma with
respect to partitions of vertices. On the one hand, we provide a complete
classification of the various possibilities in the ternary case. On the other
hand, we provide an example of a family of slice-wise stable 3-hypergraphs so
that for no partition of the vertices, any triple of parts has density close to
0 or 1. In particular, this addresses some questions and conjectures of Terry
and Wolf. We work in the general measure theoretic context of graded
probability spaces, so all our results apply both to measures in ultraproducts
of finite graphs, leading to the aforementioned combinatorial applications, and
to commuting definable Keisler measures, leading to applications in model
theory.
Authors' comments: 67 pages
Xiaofeng Liu, Nadya Shusharina, Helen A Shih, C. -C. Jay Kuo, Georges El Fakhri, Jonghye Woo
In this work, we aim to predict the survival time (ST) of glioblastoma (GBM)
patients undergoing different treatments based on preoperative magnetic
resonance (MR) scans. Personalized and precise treatment planning can then be
achieved by comparing the predicted ST under different treatments. It is well
established that both the current status of the patient (as represented by the
MR scans) and the choice of treatment are causes of ST. Previous MR-based
glioblastoma ST studies have focused only on directly mapping MR scans to ST
and have not accounted for the underlying causal relationship between
treatments and ST. To address this limitation, we propose a
treatment-conditioned regression model for glioblastoma ST that incorporates
treatment information in addition to MR scans. Our approach allows us to
effectively utilize the data from all of the treatments in a unified manner,
rather than having to train separate models for each of the treatments.
Furthermore, treatment can be effectively injected into each convolutional
layer through the adaptive instance normalization we employ. We evaluate our
framework on the BraTS20 ST prediction task. Three treatment options are
considered: Gross Total Resection (GTR), Subtotal Resection (STR), and no
resection. The evaluation results demonstrate the effectiveness of injecting
the treatment for estimating GBM survival.
Authors' comments: SPIE Medical Imaging 2024: Computer-Aided Diagnosis
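The abstract injects the treatment into each convolutional layer through adaptive instance normalization (AdaIN). Below is a minimal pure-Python sketch of that idea. It assumes the treatment is one-hot encoded and mapped to per-channel scale and shift parameters by a linear layer. `treatment_conditioned_adain`, `W_GAMMA`, and `W_BETA` are hypothetical names and values, not the authors' code.

```python
import math

# Hypothetical ordering of the three treatment options named in the abstract.
TREATMENTS = ["GTR", "STR", "no resection"]


def one_hot(treatment):
    """Encode a treatment name as a one-hot vector over TREATMENTS."""
    return [1.0 if t == treatment else 0.0 for t in TREATMENTS]


def linear(vec, weight):
    """Apply a bias-free linear map; weight has one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def adain(features, gamma, beta, eps=1e-5):
    """Normalize each channel of one instance, then apply per-channel scale/shift."""
    out = []
    for channel, g, b in zip(features, gamma, beta):
        mean = sum(channel) / len(channel)
        var = sum((x - mean) ** 2 for x in channel) / len(channel)
        out.append([g * (x - mean) / math.sqrt(var + eps) + b for x in channel])
    return out


def treatment_conditioned_adain(features, treatment, w_gamma, w_beta):
    """Predict (gamma, beta) from the treatment code and inject them via AdaIN."""
    code = one_hot(treatment)
    return adain(features, linear(code, w_gamma), linear(code, w_beta))


# Toy example: 2 channels x 4 spatial positions; weights are made up for illustration.
feats = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 1.5, 1.5]]
W_GAMMA = [[1.0, 0.5, 2.0], [1.5, 1.0, 0.2]]
W_BETA = [[0.0, 0.1, -0.3], [0.2, 0.0, 0.4]]
out = treatment_conditioned_adain(feats, "STR", W_GAMMA, W_BETA)
```

After conditioning, each channel's mean equals the shift predicted for the chosen treatment. This is how a single network can serve all three treatments without a separate model for each.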
Reduan Achtibat, Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer, Aakriti Jain, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek
Large Language Models are prone to biased predictions and hallucinations, underlining the paramount importance of understanding their model-internal reasoning process. However, achieving faithful attributions for the entirety of a black-box transformer model while maintaining computational efficiency is an unsolved challenge. By extending the Layer-wise Relevance Propagation attribution method to handle attention layers, we address these challenges effectively. While partial solutions exist, our method is the first to faithfully and holistically attribute not only input but also latent representations of transformer models with computational efficiency similar to a single backward pass. Through extensive evaluations against existing methods on Llama 2, Flan-T5 and the Vision Transformer architecture, we demonstrate that our proposed approach surpasses alternative methods in terms of faithfulness and enables the understanding of latent representations, opening up the door for concept-based explanations. We provide an open-source implementation on GitHub https://github.com/rachtibat/LRP-for-Transformers.
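For context, LRP propagates relevance backwards layer by layer under a conservation principle. The sketch below covers only the standard LRP-ε rule for a single linear layer. That rule is the textbook building block, not the attention-layer extension this abstract introduces. The function name and toy numbers are illustrative.

```python
def lrp_epsilon(a, W, b, relevance_out, eps=1e-6):
    """Redistribute output relevance of a linear layer z = a @ W + b onto its inputs.

    a: input activations (length n); W[i][j]: weight from input i to output j;
    b: biases (length m); relevance_out: relevance of each output (length m).
    """
    n, m = len(a), len(b)
    z = [sum(a[i] * W[i][j] for i in range(n)) + b[j] for j in range(m)]
    # The epsilon stabilizer takes the sign of z and keeps the division away from zero.
    s = [relevance_out[j] / (z[j] + (eps if z[j] >= 0 else -eps)) for j in range(m)]
    return [a[i] * sum(W[i][j] * s[j] for j in range(m)) for i in range(n)]


a = [1.0, 2.0]
W = [[0.5, -1.0], [1.0, 0.25]]
b = [0.0, 0.0]
R_out = [1.0, 0.5]
R_in = lrp_epsilon(a, W, b, R_out)
```

With zero bias and a small ε, the input relevances sum approximately to the total output relevance. Otherwise, nonzero biases and a larger ε absorb part of it.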
Karim Helwani, Masahito Togami, Paris Smaragdis, Michael M. Goodwin
While neural network approaches have made significant strides in resolving classical signal processing problems, hybrid approaches that draw insight from both signal processing and neural networks often produce more complete solutions. In this paper, we present a hybrid classical digital signal processing/deep neural network (DSP/DNN) approach to source separation (SS), highlighting the theoretical link between variational autoencoders and classical approaches to SS. We propose a system that transforms the single-channel under-determined SS task into an equivalent multichannel over-determined SS problem in a properly designed latent space. The separation task in the latent space is treated as finding a variational block-wise disentangled representation of the mixture. We show empirically that the design choices and the variational formulation of the task, motivated by classical signal processing theory, lead to robustness to unseen out-of-distribution data and a reduced risk of overfitting. To address the resulting permutation issue, we explicitly incorporate a novel differentiable permutation loss function and augment the model with a memory mechanism to keep track of the statistics of the individual sources.
Xunkai Li, Jingyuan Ma, Zhengyu Wu, Daohan Su, Wentao Zhang, Rong-Hua Li, Guoren Wang
Scalable graph neural networks (GNNs) have emerged as a promising technique,
which exhibits superior predictive performance and high running efficiency
across numerous large-scale graph-based web applications. However, (i) most
scalable GNNs tend to treat all nodes in graphs with the same propagation
rules, neglecting their topological uniqueness; (ii) existing node-wise
propagation optimization strategies are insufficient on web-scale graphs with
intricate topology, where a full portrayal of nodes' local properties is
required. Intuitively, different nodes in web-scale graphs possess distinct
topological roles, so propagating them indiscriminately or neglecting their
local contexts may compromise the quality of node representations. Such
intricate topology does not arise in small-scale scenarios. To address these
issues, we propose Adaptive Topology-aware Propagation (ATP), which reduces
potential high-bias propagation and extracts the structural patterns of each
node in a scalable manner to improve running efficiency and predictive
performance. Notably, ATP is designed as a plug-and-play node-wise propagation
optimization strategy that can be executed offline, independently of the graph
learning process, offering a new perspective. Therefore, this approach can be
seamlessly integrated into most scalable GNNs while remaining orthogonal to
existing node-wise propagation optimization strategies. Extensive experiments
on 12 datasets, including the most representative large-scale ogbn-papers100M,
have demonstrated the effectiveness of ATP. Specifically, ATP has proven to be
efficient in improving the performance of prevalent scalable GNNs for
semi-supervised node classification while addressing redundant computational
costs.
Authors' comments: Accepted by WWW 2024
Dongxia Wu, Tsuyoshi Idé, Aurélie Lozano, Georgios Kollias, Jiří Navrátil, Naoki Abe, Yi-An Ma, Rose Yu
We address the problem of learning Granger causality from asynchronous, interdependent, multi-type event sequences. In particular, we are interested in discovering instance-level causal structures in an unsupervised manner. Instance-level causality identifies causal relationships among individual events, providing more fine-grained information for decision-making. Existing work in the literature either requires strong assumptions, such as linearity in the intensity function, or heuristically defined model parameters that do not necessarily meet the requirements of Granger causality. We propose Instance-wise Self-Attentive Hawkes Processes (ISAHP), a novel deep learning framework that can directly infer the Granger causality at the event instance level. ISAHP is the first neural point process model that meets the requirements of Granger causality. It leverages the self-attention mechanism of the transformer to align with the principles of Granger causality. We empirically demonstrate that ISAHP is capable of discovering complex instance-level causal structures that cannot be handled by classical models. We also show that ISAHP achieves state-of-the-art performance in proxy tasks involving type-level causal discovery and instance-level event type prediction.
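For comparison, consider the classical multivariate Hawkes process with an exponential kernel. Its intensity is linear in past events, which is exactly the kind of strong assumption the abstract mentions. The intensity splits additively into one term per past event, which gives a naive instance-level attribution. The sketch below illustrates that classical baseline, not ISAHP. All parameter values are made up.

```python
import math


def intensity_contributions(t, history, target_type, mu, alpha, beta):
    """Split a multivariate exponential-kernel Hawkes intensity into per-event terms.

    history: list of (time, event_type); mu[k]: baseline rate of type k;
    alpha[k][k2]: excitation of type k by a past event of type k2; beta: decay rate.
    Returns (baseline, contributions), one contribution per event in history.
    """
    contributions = []
    for t_i, k_i in history:
        if t_i < t:
            contributions.append(alpha[target_type][k_i] * beta * math.exp(-beta * (t - t_i)))
        else:
            contributions.append(0.0)  # events at or after t cannot excite time t
    return mu[target_type], contributions


def intensity(t, history, target_type, mu, alpha, beta):
    """Total conditional intensity = baseline + sum of per-event contributions."""
    base, contributions = intensity_contributions(t, history, target_type, mu, alpha, beta)
    return base + sum(contributions)


history = [(0.0, 0), (1.0, 1), (2.5, 0)]
MU = [0.1, 0.2]
ALPHA = [[0.5, 0.0], [0.3, 0.4]]
BETA = 1.0
base, contribs = intensity_contributions(3.0, history, 1, MU, ALPHA, BETA)
```

Here the most recent event (index 2) contributes most to the type-1 intensity at t = 3. Because `ALPHA[0][1] = 0`, type-1 events never excite type 0, which is the type-level analogue of Granger non-causality in this classical model.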
Giorgio Saracco, Giorgio Stefani
We study Cheeger and $p$-eigenvalue partition problems depending on a given
evaluation function $\Phi$ for $p\in[1,\infty)$. We prove existence and
regularity of minima, relations among the problems, convergence, and stability
with respect to $p$ and to $\Phi$.
Authors' comments: 33 pages
Suneung Kim, Woo-Jeoung Nam, Seong-Whan Lee
Recently, appearance-based gaze estimation has been attracting attention in computer vision, and remarkable improvements have been achieved using various deep learning techniques. Despite such progress, most methods aim to infer gaze vectors from images directly, which causes overfitting to person-specific appearance factors. In this paper, we address these challenges and propose a novel framework: Stochastic subject-wise Adversarial gaZE learning (SAZE), which trains a network to generalize over the appearance of subjects. We design a Face generalization Network (Fgen-Net) using a face-to-gaze encoder, a face identity classifier, and a proposed adversarial loss. The proposed loss generalizes face appearance factors so that the identity classifier infers a uniform probability distribution. In addition, the Fgen-Net is trained by a learning mechanism that optimizes the network by reselecting a subset of subjects at every training step to avoid overfitting. Our experimental results verify the robustness of the method in that it yields state-of-the-art performance, achieving angular errors of 3.89° and 4.42° on the MPIIGaze and EyeDiap datasets, respectively. Furthermore, we demonstrate the positive generalization effect by conducting further experiments using face images in different styles produced by a generative model.
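The abstract says the adversarial loss drives the identity classifier toward a uniform probability distribution, but it does not give the exact form. One common choice, assumed here, is the KL divergence from the uniform distribution to the classifier's softmax output. The encoder would minimize it so that subjects become indistinguishable. The function names below are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def uniform_kl_loss(identity_logits):
    """KL(U || p) between the uniform distribution U and the identity classifier's p.

    Zero exactly when p is uniform, i.e. the classifier cannot tell subjects apart.
    """
    p = softmax(identity_logits)
    k = len(p)
    return sum((1.0 / k) * math.log((1.0 / k) / p_i) for p_i in p)


loss_uniform = uniform_kl_loss([0.0, 0.0, 0.0])   # subjects indistinguishable
loss_peaked = uniform_kl_loss([2.0, 0.0, -1.0])   # identity still recognizable
```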
Ye Lin Tun, Chu Myaet Thwal, Le Quang Huy, Minh N. H. Nguyen, Choong Seon Hong
Many recent studies integrate federated learning (FL) with self-supervised learning (SSL) to take advantage of raw training data distributed across edge devices. However, edge devices often struggle with high computation and communication costs imposed by SSL and FL algorithms. To tackle this hindrance, we propose LW-FedSSL, a layer-wise federated self-supervised learning approach that allows edge devices to incrementally train one layer of the model at a time. LW-FedSSL comprises server-side calibration and representation alignment mechanisms to maintain comparable performance with end-to-end FedSSL while significantly lowering clients' resource requirements. The server-side calibration mechanism takes advantage of the resource-rich server in an FL environment to assist in global model training. Meanwhile, the representation alignment mechanism encourages closeness between representations of FL local models and those of the global model. Our experiments show that LW-FedSSL has a $3.3 \times$ lower memory requirement and a $3.2 \times$ cheaper communication cost than its end-to-end counterpart. We also explore a progressive training strategy called Prog-FedSSL that outperforms end-to-end training with a similar memory requirement and a $1.8 \times$ cheaper communication cost.
Nicolás Ayobi, Santiago Rodríguez, Alejandra Pérez, Isabela Hernández, Nicolás Aparicio, Eugénie Dessevres, Sebastián Peña, Jessica Santander et al.
This paper presents the Holistic and Multi-Granular Surgical Scene
Understanding of Prostatectomies (GraSP) dataset, a curated benchmark that
models surgical scene understanding as a hierarchy of complementary tasks with
varying levels of granularity. Our approach encompasses long-term tasks, such
as surgical phase and step recognition, and short-term tasks, including
surgical instrument segmentation and atomic visual action detection. To
exploit our proposed benchmark, we introduce the Transformers for Actions,
Phases, Steps, and Instrument Segmentation (TAPIS) model, a general
architecture that combines a global video feature extractor with localized
region proposals from an instrument segmentation model to tackle the
multi-granularity of our benchmark. Through extensive experiments on our own
and alternative benchmarks, we demonstrate TAPIS's versatility and
state-of-the-art performance across different tasks. This work represents a
foundational step forward in Endoscopic Vision, offering a novel framework for
future research towards holistic surgical scene understanding.
Authors' comments: Preprint submitted to Medical Image Analysis. Official extension of
previous MICCAI 2022
(https://link.springer.com/chapter/10.1007/978-3-031-16449-1_42) and ISBI
2023 (https://ieeexplore.ieee.org/document/10230819) orals. Data and codes
are available at https://github.com/BCV-Uniandes/GraSP
Marco Pacini, Xiaowen Dong, Bruno Lepri, Gabriele Santin
Equivariant neural networks have shown improved performance, expressiveness
and sample complexity on symmetrical domains. But for some specific symmetries,
representations, and choice of coordinates, the most common point-wise
activations, such as ReLU, are not equivariant, hence they cannot be employed
in the design of equivariant neural networks. The theorem we present in this
paper describes all possible combinations of finite-dimensional
representations, choice of coordinates and point-wise activations to obtain an
exactly equivariant layer, generalizing and strengthening existing
characterizations. Notable cases of practical relevance are discussed as
corollaries. Indeed, we prove that rotation-equivariant networks can only be
invariant, as it happens for any network which is equivariant with respect to
connected compact groups. Then, we discuss implications of our findings when
applied to important instances of exactly equivariant networks. First, we
completely characterize permutation equivariant networks such as Invariant
Graph Networks with point-wise nonlinearities and their geometric counterparts,
highlighting a plethora of models whose expressive power and performance are
still unknown. Second, we show that feature spaces of disentangled steerable
convolutional neural networks are trivial representations.
Authors' comments: Accepted at the 12th International Conference on Learning
Representations (ICLR 2024)
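The obstruction described above can be checked numerically. A point-wise activation commutes with a permutation of coordinates, but not with a rotation acting on the same coordinates. The sketch below tests whether ReLU(Mv) equals M ReLU(v) for a 3x3 cyclic permutation and for a 45-degree rotation of R^2 in standard coordinates. The helper names are illustrative.

```python
import math


def relu(v):
    """Point-wise ReLU."""
    return [max(0.0, x) for x in v]


def matvec(M, v):
    """Dense matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def commutes_with_relu(M, v, tol=1e-12):
    """Check whether ReLU(M v) equals M ReLU(v) for this particular v."""
    lhs = relu(matvec(M, v))
    rhs = matvec(M, relu(v))
    return all(abs(x - y) <= tol for x, y in zip(lhs, rhs))


# Cyclic permutation of 3 coordinates (a permutation representation).
PERM = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
# Rotation by 45 degrees in the standard coordinates of R^2.
theta = math.pi / 4
ROT = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]

perm_ok = commutes_with_relu(PERM, [1.0, -2.0, 3.0])
rot_ok = commutes_with_relu(ROT, [1.0, -1.0])
```

A single counterexample such as `rot_ok` being false shows that ReLU is not equivariant to this rotation. Permutations, in contrast, commute with ReLU for every input because they only reorder coordinates.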
Guiming Cao, Kaize Shi, Hong Fu, Huaiwen Zhang, Guandong Xu
Pre-trained Vision-Language (V-L) models have set the benchmark for
generalization to downstream tasks. Existing research has explored many
characteristics of V-L models, including their sensitivity to text input and
the challenge of tuning across multi-modal prompts. Building on V-L models
such as CLIP, recent approaches deploy learnable prompts instead of
hand-crafted prompts to boost generalization performance and address these
challenges. Inspired by layer-wise training, which is widely used in image
fusion, we note that sequentially adapting the different modality branches of
CLIP efficiently improves generalization. To address the multi-modal prompting
challenge, we propose Token-wise Adaptive for Multi-modal Prompt Learning
(APLe), which tunes the prompts of both modalities, vision and language, as
tokens in a sequential manner. APLe addresses these challenges in V-L models
by promoting prompt learning across both modalities and achieves
generalization performance competitive with the state of the art. Notably,
APLe shows robustness and favourable performance in prompt-length
experiments, with a clear advantage when adopting V-L models.
Authors' comments: 7 pages, 3 figures
Jonathan Fischer, Martin Schulze, Paul Rosenthal, Lars Linsen
When employing Direct Volume Rendering (DVR) for visualizing volumetric scalar fields, classification is generally performed on a piecewise constant or piecewise linear approximation of the field on a viewing ray. Smoothed Particle Hydrodynamics (SPH) data sets define volumetric scalar fields as the sum of individual particle contributions, at highly varying spatial resolution. We present an approach for approximating SPH scalar fields along viewing rays with piecewise polynomial functions of higher order. This is done by approximating each particle contribution individually and then efficiently summing the results, thus generating a higher-order representation of the field with a resolution adapting to the data resolution in the volume.
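For reference, the field being approximated is the usual SPH sum f(x) = Σ_j (m_j / ρ_j) A_j W(|x − x_j|, h_j). The sketch below evaluates it point-wise along a ray. That is the sampling baseline, not the authors' per-particle piecewise-polynomial method. Two choices are assumptions: Monaghan's cubic spline kernel and the tuple layout of `particles`.

```python
import math


def cubic_spline_kernel(r, h):
    """Monaghan's M4 cubic spline kernel in 3D with support radius 2h."""
    q = r / h
    sigma = 1.0 / (math.pi * h ** 3)
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q ** 2 + 0.75 * q ** 3)
    if q < 2.0:
        return sigma * 0.25 * (2.0 - q) ** 3
    return 0.0


def sph_field(x, particles):
    """Sum of particle contributions; each particle is (position, mass, density, value, h)."""
    total = 0.0
    for pos, mass, density, value, h in particles:
        r = math.dist(tuple(x), tuple(pos))
        total += mass / density * value * cubic_spline_kernel(r, h)
    return total


def sample_ray(origin, direction, t_values, particles):
    """Point-sample the field at origin + t * direction for each t."""
    return [
        sph_field([o + t * d for o, d in zip(origin, direction)], particles)
        for t in t_values
    ]


# One hypothetical particle at the origin with smoothing length 0.5.
particles = [((0.0, 0.0, 0.0), 1.0, 1.0, 2.0, 0.5)]
samples = sample_ray((-2.0, 0.0, 0.0), (1.0, 0.0, 0.0), [0.0, 2.0, 4.0], particles)
```

Along a ray, a particle at perpendicular distance d gives r(t) = sqrt(d² + (t − t0)²). The odd powers of q in the kernel are therefore not polynomial in t. This is one reason each particle contribution must be approximated before the results are summed.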
Firas Laakom, Yuheng Bu, Moncef Gabbouj
Existing generalization theories of supervised learning typically take a
holistic approach and provide bounds for the expected generalization over the
whole data distribution, which implicitly assumes that the model generalizes
similarly for all the classes. In practice, however, there are significant
variations in generalization performance among different classes, which cannot
be captured by the existing generalization bounds. In this work, we tackle this
problem by theoretically studying the class-generalization error, which
quantifies the generalization performance of each individual class. We derive a
novel information-theoretic bound for class-generalization error using the KL
divergence, and we further obtain several tighter bounds using the conditional
mutual information (CMI), which are significantly easier to estimate in
practice. We empirically validate our proposed bounds in different neural
networks and show that they accurately capture the complex class-generalization
error behavior. Moreover, we show that the theoretical tools developed in this
paper can be applied in several applications beyond this context.
Authors' comments: 26 pages
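The class-generalization error compares, for each class, the expected loss on unseen data with the empirical loss on training data. The sketch below computes the empirical per-class gap from per-sample losses. That gap is the quantity the bounds aim to capture; the sketch does not implement the KL or CMI bounds themselves. The labels and loss values are hypothetical.

```python
from collections import defaultdict


def per_class_mean(records):
    """Average per-sample loss for each label; records: list of (label, loss)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for label, loss in records:
        sums[label] += loss
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def class_generalization_gaps(train_records, test_records):
    """Empirical class-wise gap: mean test loss minus mean training loss per class."""
    train_mean = per_class_mean(train_records)
    test_mean = per_class_mean(test_records)
    return {y: test_mean[y] - train_mean[y] for y in train_mean if y in test_mean}


# Hypothetical per-sample losses for a two-class problem.
train = [("cat", 0.1), ("cat", 0.3), ("dog", 0.05), ("dog", 0.15)]
test = [("cat", 0.9), ("cat", 0.7), ("dog", 0.2), ("dog", 0.2)]
gaps = class_generalization_gaps(train, test)
```

The two classes generalize very differently here. A single bound over the whole distribution would average this difference away.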
Claire Greenwell, Poshak Gandhi, Daniel Stern, George Lansbury, Vincenzo Mainieri, Peter Boorman, Yoshiki Toba
The growth of active galactic nuclei (AGN) occurs under some form of
obscuration in a large fraction of the population. The difficulty in
constraining this population leads to high uncertainties in cosmic X-ray
background and galaxy evolution models. Using an SDSS-WISE cross-match, we
target infrared luminous AGN ($W1-W2$ > 0.8, and monochromatic rest-frame
luminosity above $\lambda L_{\lambda}$(12$\mu$m) $\approx$ 3 $\times$ 10$^{44}$
erg s$^{-1}$), but with passive galaxy-like optical spectra (Optically
Quiescent Quasars; OQQs). We find 47 objects that show no significant [O
III]$\lambda$5007 emission, a typically strong AGN optical emission line. As a
comparison sample, we examine SDSS-selected Type 2 quasars (QSO2s), which show
a significant [O III]$\lambda$5007 line by definition. We find a 1:16 ratio of
OQQs compared to QSO2s, suggesting that the OQQ duty cycle is likely much
shorter than that of QSO2s (though selection biases are not fully quantified).
We consider observed properties in comparison with other galaxy types, and
examine them for consistency with theories on their intrinsic nature: chiefly
(a) a high covering factor for surrounding obscuring matter, preventing the
detection of high-ionisation emission lines - 'cocooned AGN'; or (b) ionised
gas being absent on the kpc scales of the Narrow Line Region (NLR), perhaps due
to a 'switching on' or 'young' AGN. OQQs do not obviously fit the standard
paradigm for merger-driven AGN and host galaxy evolution, implying we may be
missing part of the flow of AGN evolution.
Authors' comments: Accepted for publication in MNRAS (20 pages, 18 figures)
Andreas Papachristodoulou, Christos Kyrkou, Stelios Timotheou, Theocharis Theocharides
The Forward-Forward (FF) Algorithm has been recently proposed to alleviate
the issues of backpropagation (BP) commonly used to train deep neural networks.
However, its current formulation exhibits limitations such as the generation of
negative data, slower convergence, and inadequate performance on complex tasks.
In this paper, we take the main ideas of FF and improve them by leveraging
channel-wise competitive learning in the context of convolutional neural
networks for image classification tasks. A layer-wise loss function is
introduced that promotes competitive learning and eliminates the need for
negative data construction. To enhance both the learning of compositional
features and feature space partitioning, a channel-wise feature separator and
extractor block is proposed that complements the competitive learning process.
Our method outperforms recent FF-based models on image classification tasks,
achieving testing errors of 0.58%, 7.69%, 21.89%, and 48.77% on MNIST,
Fashion-MNIST, CIFAR-10 and CIFAR-100 respectively. Our approach bridges the
performance gap between FF learning and BP methods, indicating the potential of
our proposed approach to learn useful representations in a layer-wise modular
fashion, enabling more efficient and flexible learning.
Authors' comments: To be published in AAAI 2024, 11 pages, 7 figures
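The abstract does not spell out the layer-wise loss. One plausible reading, assumed here, works in three steps:

1. Split a layer's channels into one group per class.
2. Score each group by its mean squared activation (the FF "goodness").
3. Train with softmax cross-entropy over the group scores.

Under this reading, only the true label is needed and no negative data are generated. Names and numbers are illustrative.

```python
import math


def channel_goodness(channel_acts, num_classes):
    """Score each class by the mean squared activation of its channel group.

    channel_acts: list of C channels, each a flattened spatial map; C must be
    divisible by num_classes, and group k holds channels [k*C/K, (k+1)*C/K).
    """
    if len(channel_acts) % num_classes:
        raise ValueError("number of channels must be divisible by num_classes")
    per_group = len(channel_acts) // num_classes
    goodness = []
    for k in range(num_classes):
        group = channel_acts[k * per_group:(k + 1) * per_group]
        values = [x * x for channel in group for x in channel]
        goodness.append(sum(values) / len(values))
    return goodness


def competitive_loss(goodness, label):
    """Softmax cross-entropy over class goodness scores (log-sum-exp for stability)."""
    m = max(goodness)
    log_z = m + math.log(sum(math.exp(g - m) for g in goodness))
    return log_z - goodness[label]


def predict(goodness):
    """Predicted class = channel group with the highest goodness."""
    return max(range(len(goodness)), key=lambda k: goodness[k])


# Toy layer output: 4 channels x 2 positions, 2 classes (2 channels per class).
acts = [[1.0, 2.0], [0.5, 1.5], [0.1, 0.0], [0.2, 0.1]]
g = channel_goodness(acts, num_classes=2)
```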
Gwladys Kelodjou, Laurence Rozé, Véronique Masson, Luis Galárraga, Romaric Gaudel, Maurice Tchuente, Alexandre Termier
Machine learning techniques, such as deep learning and ensemble methods, are
widely used in various domains due to their ability to handle complex
real-world tasks. However, their black-box nature has raised multiple concerns
about the fairness, trustworthiness, and transparency of computer-assisted
decision-making. This has led to the emergence of local post-hoc explainability
methods, which offer explanations for individual decisions made by black-box
algorithms. Among these methods, Kernel SHAP is widely used due to its
model-agnostic nature and its well-founded theoretical framework. Despite these
strengths, Kernel SHAP suffers from high instability: different executions of
the method with the same inputs can lead to significantly different
explanations, which diminishes the utility of post-hoc explainability. The
contribution of this paper is two-fold. On the one hand, we show that Kernel
SHAP's instability is caused by its stochastic neighbor selection procedure,
which we adapt to achieve full stability without compromising explanation
fidelity. On the other hand, we show that by restricting neighbor
generation to perturbations of size 1 -- which we call the coalitions of Layer
1 -- we obtain a novel feature-attribution method that is fully stable,
efficient to compute, and still meaningful.
Authors' comments: To appear in AAAI-24
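Kernel SHAP weights each coalition z' by the Shapley kernel π(z') = (M − 1) / (C(M, |z'|) · |z'| · (M − |z'|)). When the sampling budget does not allow full enumeration, it selects coalitions at random, and that randomness is the instability the abstract refers to. The sketch below computes these weights and enumerates coalitions deterministically. Treating "Layer 1" as the coalitions of size 1 and M − 1, the two sizes with the largest kernel weight, is our assumption. The function names are hypothetical.

```python
from itertools import combinations
from math import comb


def shapley_kernel_weight(num_features, coalition_size):
    """Shapley kernel weight for a coalition with coalition_size features present.

    The empty and full coalitions get infinite weight and are enforced as constraints,
    so they are rejected here.
    """
    m, s = num_features, coalition_size
    if not 0 < s < m:
        raise ValueError("coalition size must be strictly between 0 and num_features")
    return (m - 1) / (comb(m, s) * s * (m - s))


def coalitions_of_size(num_features, size):
    """Enumerate every coalition of a given size in a fixed order (no sampling)."""
    return [frozenset(c) for c in combinations(range(num_features), size)]


def layer_one_coalitions(num_features):
    """Hypothetical 'Layer 1': all coalitions of size 1 and of size M - 1."""
    if num_features < 2:
        raise ValueError("need at least two features")
    layer = coalitions_of_size(num_features, 1)
    if num_features - 1 != 1:  # avoid listing size-1 coalitions twice when M = 2
        layer += coalitions_of_size(num_features, num_features - 1)
    return layer


weights = [shapley_kernel_weight(5, s) for s in range(1, 5)]
layer_one = layer_one_coalitions(5)
```

The enumeration has no random component, so repeated runs on the same input select identical neighbors. This is the stability property the abstract targets.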
Chanyong Jung, Gihyun Kwon, Jong Chul Ye
Recently, patch-wise contrastive learning has been drawing attention for image
translation by exploring the semantic correspondence between the input and
output images. To further explore the patch-wise topology for high-level
semantic understanding, here we exploit a graph neural network to capture
topology-aware features. Specifically, we construct the graph based on the
patch-wise similarity from a pretrained encoder, whose adjacency matrix is
shared to enhance the consistency of patch-wise relations between the input and
the output. Then, we obtain node features from the graph neural network and
enhance the correspondence between the nodes by increasing mutual information
using the contrastive loss. To capture the hierarchical semantic structure, we
further propose graph pooling. Experimental results show state-of-the-art
performance for image translation thanks to the semantic encoding by the
constructed graphs.
Authors' comments: AAAI 2024
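The abstract builds a graph from patch-wise similarity and shares its adjacency matrix between the input and output images. The sketch below makes that concrete under two stated assumptions: a top-k cosine-similarity graph, and a single weight-free mean-aggregation step standing in for the graph neural network. The patch features are hypothetical encoder outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def knn_adjacency(patch_feats, k):
    """Connect each patch to its k most similar patches, then symmetrize."""
    n = len(patch_feats)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = sorted(
            ((cosine(patch_feats[i], patch_feats[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for _, j in sims[:k]:
            adj[i][j] = adj[j][i] = 1.0
    return adj


def propagate(adj, feats):
    """One weight-free GNN step: average each node with its neighbours (self-loop included)."""
    out = []
    for i, row in enumerate(adj):
        neighbours = [j for j, a in enumerate(row) if a > 0] + [i]
        out.append([sum(feats[j][d] for j in neighbours) / len(neighbours)
                    for d in range(len(feats[i]))])
    return out


# Hypothetical encoder features for 4 patches of the input and translated output images.
input_patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
output_patches = [[0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.2, 0.8]]
adj = knn_adjacency(input_patches, k=1)    # built once from the input image
node_in = propagate(adj, input_patches)
node_out = propagate(adj, output_patches)  # same adjacency reused for the output
```

In a contrastive setup, node features at corresponding positions (`node_in[i]`, `node_out[i]`) would plausibly serve as positive pairs and other nodes as negatives.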