Hui Liu, Yibiao Huang, Xuejun Liu, Lei Deng
Accurate and efficient prediction of the molecular properties of drugs is one of the fundamental problems in drug research and development. Recent advances in representation learning have been shown to greatly improve the performance of molecular property prediction. However, due to limited labeled data, supervised learning-based molecular representation algorithms can only search a limited chemical space, which results in poor generalizability. In this work, we propose a self-supervised representation learning framework for large-scale unlabeled molecules. We develop a novel molecular graph augmentation strategy, referred to as attention-wise graph mask, to generate challenging positive samples for contrastive learning. We adopt the graph attention network (GAT) as the molecular graph encoder and leverage the learned attention scores as masking guidance to generate molecular augmentation graphs. By minimizing the contrastive loss between the original graph and the masked graph, our model can capture important molecular structure and higher-order semantic information. Extensive experiments show that our attention-wise graph mask contrastive learning exhibits state-of-the-art performance in a couple of downstream molecular property prediction tasks.
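The masking step described above can be illustrated with a minimal sketch. It assumes that each node already has a scalar attention score and that the highest-scoring nodes are the ones masked; the abstract does not fix either detail, and the names `attention_guided_mask`, `apply_mask`, and `mask_ratio` are hypothetical.

```python
def attention_guided_mask(node_scores, mask_ratio=0.25):
    """Return the sorted indices of the nodes to mask.

    ASSUMPTION: nodes with the highest attention scores are masked,
    which should give a harder positive sample than random masking.
    """
    n_mask = max(1, int(len(node_scores) * mask_ratio))
    ranked = sorted(range(len(node_scores)),
                    key=lambda i: node_scores[i], reverse=True)
    return sorted(ranked[:n_mask])


def apply_mask(node_features, masked_nodes, mask_value=0.0):
    """Replace the feature vector of every masked node with a constant."""
    masked = set(masked_nodes)
    return [
        [mask_value] * len(feat) if i in masked else list(feat)
        for i, feat in enumerate(node_features)
    ]


# Toy molecule with four atoms and two features per atom.
scores = [0.10, 0.70, 0.05, 0.15]
features = [[1, 2], [3, 4], [5, 6], [7, 8]]
masked_nodes = attention_guided_mask(scores)
augmented = apply_mask(features, masked_nodes)
```

The original and augmented graphs would then be encoded and pulled together by a contrastive loss, which is omitted here.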
Leibo Liu, Oscar Perez-Concha, Anthony Nguyen, Vicki Bennett, Louisa Jorm
International Classification of Diseases (ICD) coding plays an important role in systematically classifying morbidity and mortality data. In this study, we propose a hierarchical label-wise attention Transformer model (HiLAT) for the explainable prediction of ICD codes from clinical documents. HiLAT first fine-tunes a pretrained Transformer model to represent the tokens of clinical documents. We subsequently employ a two-level hierarchical label-wise attention mechanism that creates label-specific document representations. These representations are in turn used by a feed-forward neural network to predict whether a specific ICD code is assigned to the input clinical document of interest. We evaluate HiLAT using hospital discharge summaries and their corresponding ICD-9 codes from the MIMIC-III database. To investigate the performance of different types of Transformer models, we develop ClinicalplusXLNet, which conducts continual pretraining from XLNet-Base using all the MIMIC-III clinical notes. The experimental results show that the F1 scores of HiLAT+ClinicalplusXLNet exceed those of the previous state-of-the-art models for the top-50 most frequent ICD-9 codes from MIMIC-III. Visualisations of attention weights present a potential explainability tool for checking the face validity of ICD code predictions.
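A single level of label-wise attention can be sketched as follows. This is a simplified illustration, not HiLAT itself: it assumes one learned query vector per label and dot-product scoring, and it omits the second (chunk-level) attention layer. The function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def label_wise_attention(token_vecs, label_queries):
    """Build one document representation per label.

    token_vecs:    list of token embeddings (n_tokens x dim)
    label_queries: list of label query vectors (n_labels x dim), ASSUMED learned
    Returns (representations, attention_weights).
    """
    reps, weights = [], []
    for query in label_queries:
        scores = [sum(q * h for q, h in zip(query, tok)) for tok in token_vecs]
        attn = softmax(scores)
        rep = [
            sum(a * tok[k] for a, tok in zip(attn, token_vecs))
            for k in range(len(query))
        ]
        reps.append(rep)
        weights.append(attn)
    return reps, weights


tokens = [[1.0, 0.0], [0.0, 1.0]]
queries = [[10.0, 0.0], [0.0, 10.0]]
reps, weights = label_wise_attention(tokens, queries)
```

The per-label attention weights are what would be visualised when inspecting which tokens drove a given code prediction.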
Yuanhao Cai, Jing Lin, Zudi Lin, Haoqian Wang, Yulun Zhang, Hanspeter Pfister, Radu Timofte, Luc Van Gool
Existing leading methods for spectral reconstruction (SR) focus on designing
deeper or wider convolutional neural networks (CNNs) to learn the end-to-end
mapping from the RGB image to its hyperspectral image (HSI). These CNN-based
methods achieve impressive restoration performance while showing limitations in
capturing the long-range dependencies and self-similarity prior. To cope with
this problem, we propose a novel Transformer-based method, Multi-stage
Spectral-wise Transformer (MST++), for efficient spectral reconstruction. In
particular, we employ Spectral-wise Multi-head Self-attention (S-MSA), which is
based on the spatially sparse yet spectrally self-similar nature of HSIs, to
compose the basic unit, the Spectral-wise Attention Block (SAB). SABs then build up
Single-stage Spectral-wise Transformer (SST) that exploits a U-shaped structure
to extract multi-resolution contextual information. Finally, our MST++,
cascaded by several SSTs, progressively improves the reconstruction quality
from coarse to fine. Comprehensive experiments show that our MST++
significantly outperforms other state-of-the-art methods. In the NTIRE 2022
Spectral Reconstruction Challenge, our approach won first place. Code and
pre-trained models are publicly available at
https://github.com/caiyuanhao1998/MST-plus-plus.
Authors' comments: Winner of NTIRE 2022 Challenge on Spectral Reconstruction from RGB;
The First Transformer-based Method for Spectral Reconstruction
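The idea of attending across spectral channels rather than spatial positions can be sketched in a few lines. This is a simplified, single-head illustration that assumes identity query/key/value projections and no learned scaling; it is not the released MST++ code, and `spectral_wise_attention` is a hypothetical name.

```python
import math


def spectral_wise_attention(pixels):
    """Self-attention in which each spectral channel is one token.

    pixels: list of pixels, each a list of C channel values (shape HW x C).
    The attention map is C x C, so its size does not grow with image area.
    """
    n_pix, n_ch = len(pixels), len(pixels[0])
    # Re-arrange to C tokens, each of length HW.
    channels = [[px[c] for px in pixels] for c in range(n_ch)]
    out_channels = []
    for query in channels:
        scores = [sum(q * k for q, k in zip(query, key)) for key in channels]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        norm = sum(exps)
        weights = [e / norm for e in exps]
        # Weighted mix of all channels, evaluated at every pixel.
        out_channels.append([
            sum(w * channels[j][p] for j, w in enumerate(weights))
            for p in range(n_pix)
        ])
    # Back to the HW x C layout.
    return [[out_channels[c][p] for c in range(n_ch)] for p in range(n_pix)]


# Two pixels, three channels; the first two channels are identical.
out = spectral_wise_attention([[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
```

Because the tokens are channels, the cost of the attention map depends on the number of spectral bands rather than on the number of pixels.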
Furkan Kınlı, Barış Özcan, Furkan Kıraç
Image-level corruptions and perturbations degrade the performance of CNNs on
different downstream vision tasks. Social media filters are one of the most
common sources of various corruptions and perturbations for real-world visual
analysis applications. The negative effects of these distractive factors can be
alleviated by recovering the original images with their pure style for the
inference of the downstream vision tasks. Assuming these filters substantially
inject a piece of additional style information to the social media images, we
can formulate the problem of recovering the original versions as a reverse
style transfer problem. We introduce Contrastive Instagram Filter Removal
Network (CIFR), which enhances this idea for Instagram filter removal by
employing a novel multi-layer patch-wise contrastive style learning mechanism.
Experiments show our proposed strategy produces better qualitative and
quantitative results than the previous studies. Moreover, we present the
results of additional experiments with the proposed architecture under
different settings. Finally, we present the inference outputs and quantitative
comparison of filtered and recovered images on localization and segmentation
tasks to support the main motivation for this problem.
Authors' comments: Accepted to NTIRE: New Trends in Image Restoration and Enhancement
workshop and challenges at CVPR 2022
Amir Valizadeh
The human nervous system utilizes synaptic plasticity to solve optimization
problems. Previous studies have tried to add the plasticity factor to the
training process of artificial neural networks, but most of those models
require complex external control over the network or complex novel rules. In
this manuscript, a novel nature-inspired optimization algorithm is introduced
that imitates biological neural plasticity. Furthermore, the model is tested on
three datasets and the results are compared with gradient descent optimization.
Authors' comments: 12 pages, 4 figures
Zhengqi Gao, Sucheng Ren, Zihui Xue, Siting Li, Hang Zhao
Multimodal fusion has emerged as an appealing technique for improving model performance on many tasks. Nevertheless, the robustness of such fusion methods is rarely addressed in the present literature. In this paper, we propose a training-free robust late-fusion method by exploiting a conditional independence assumption and Jacobian regularization. Our key idea is to minimize the Frobenius norm of a Jacobian matrix, where the resulting optimization problem is relaxed to a tractable Sylvester equation. Furthermore, we provide a theoretical error bound for our method and some insights into the function of the extra modality. Several numerical experiments on AV-MNIST, RAVDESS, and VGGsound demonstrate the efficacy of our method under both adversarial attacks and random corruptions.
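For reference, the two standard objects named above can be written as follows. The symbols $f$, $x$, $A$, $B$, $C$, and $X$ are generic placeholders rather than the paper's notation, and the specific matrices the authors obtain are not reproduced here.

```latex
\[
\|J\|_F = \Bigl(\sum_{i,j} J_{ij}^2\Bigr)^{1/2},
\qquad J_{ij} = \frac{\partial f_i}{\partial x_j},
\]
\[
AX + XB = C .
\]
```

The first line is the Frobenius norm of the Jacobian of a map $f$ with respect to its input $x$; the second is the general form of a Sylvester equation, which is linear in the unknown matrix $X$ and can be solved with standard numerical routines.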
Jan Friedrich, Daniela Moreno, Michael Sinhuber, Matthias Waechter, Joachim Peinke
Accurate models of turbulent wind fields have become increasingly important
in the atmospheric sciences, e.g., for the determination of spatiotemporal
correlations in wind parks, the estimation of individual loads on turbine rotor
and blades, or for the modeling of particle-turbulence interaction in
atmospheric clouds or pollutant distributions in urban settings. Due to the
prohibitive task of resolving the fields across a broad range of scales, one
oftentimes has to resort to stochastic wind field models that fulfill specific,
empirically observed, properties. Here, we present a new model for the
generation of synthetic wind fields that can be understood as an extension of
the well-known Mann model for inflow turbulence in the wind energy sciences.
Whereas such Gaussian random field models solely control second-order
statistics (i.e., velocity correlation tensors or kinetic energy spectra), we
explicitly show that our extended model emulates the effects of higher-order
statistics as well. Most importantly, the empirically observed phenomenon of
small-scale intermittency, which can be regarded as one of the key features of
atmospheric turbulent flows, is reproduced with high accuracy and at
considerably low computational cost. Our method is based on a recently
developed multipoint statistical description of turbulent velocity fields [J.
Friedrich et al., J. Phys. Complex. 2 045006 (2021)] and consists of a
superposition of multivariate Gaussian statistics with fluctuating covariances.
We demonstrate exemplarily how such "superstatistical" wind fields can be
constrained on a certain number of point-wise measurement data from a
meteorological mast array.
Authors' comments: 12 pages, 5 figures
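The effect of superposing Gaussian statistics with fluctuating covariances can be seen in a scalar toy example. The sketch below assumes a log-normally distributed variance and independent samples; it is not the authors' wind field model, and `log_sigma` is a hypothetical parameter controlling how strongly the variance fluctuates.

```python
import math
import random


def superstatistical_samples(n, log_sigma=0.7, seed=0):
    """Draw n samples, each Gaussian with its own randomly drawn variance.

    ASSUMPTION: the variance is log-normal. Setting log_sigma=0 recovers
    an ordinary Gaussian with unit variance.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        variance = math.exp(rng.gauss(0.0, log_sigma))
        samples.append(rng.gauss(0.0, math.sqrt(variance)))
    return samples


def kurtosis(xs):
    """Fourth standardized moment; equals 3 for a Gaussian."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 * m2)


gaussian_k = kurtosis(superstatistical_samples(20000, log_sigma=0.0))
mixed_k = kurtosis(superstatistical_samples(20000, log_sigma=0.7))
```

The mixed samples have heavier tails than the Gaussian ones, which is the kind of non-Gaussian behaviour that second-order models cannot reproduce on their own.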
Norihide Tokushige
Let $\mathcal G$ be a family of subsets of an $n$-element set. The family
$\mathcal G$ is called non-trivial $3$-wise intersecting if the intersection of
any three subsets in $\mathcal G$ is non-empty, but the intersection of all
subsets is empty. For a real number $p\in(0,1)$ we define the measure of the
family by the sum of $p^{|G|}(1-p)^{n-|G|}$ over all $G\in\mathcal G$. We
determine the maximum measure of non-trivial $3$-wise intersecting families. We
also discuss the uniqueness and stability of the corresponding optimal
structure. These results are obtained by solving linear programming problems.
Authors' comments: 30 pages
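The definitions above are easy to check by brute force on a small ground set. The sketch below takes the family of all subsets of $\{1,2,3,4\}$ with at least three elements purely as an example (it is not claimed here to be the optimal family) and treats "any three subsets" as allowing repeats; the function names are hypothetical.

```python
from itertools import combinations, combinations_with_replacement


def is_nontrivial_3wise_intersecting(family):
    """True if every three members (repeats allowed) share an element
    while no single element lies in all members."""
    sets = [frozenset(s) for s in family]
    for a, b, c in combinations_with_replacement(sets, 3):
        if not (a & b & c):
            return False
    common = frozenset.intersection(*sets)
    return len(common) == 0


def p_measure(family, n, p):
    """Sum of p^|G| (1-p)^(n-|G|) over all G in the family."""
    return sum(p ** len(g) * (1 - p) ** (n - len(g)) for g in family)


ground = [1, 2, 3, 4]
family = [set(c) for k in (3, 4) for c in combinations(ground, k)]
measure = p_measure(family, n=4, p=0.5)  # 4 * p^3 * (1 - p) + p^4
```

For this family the measure is $4p^3(1-p) + p^4$, which equals $5/16$ at $p = 1/2$.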
Haozhe Wang, Zhiyang Liu, Lei Zhou, Huan Yin, Marcelo H Ang
Vision-based grasp estimation is an essential part of robotic manipulation
tasks in the real world. Existing planar grasp estimation algorithms have been
demonstrated to work well in relatively simple scenes. However, in complex
scenes, such as cluttered scenes with messy backgrounds and moving objects,
the algorithms from previous works are prone to generating inaccurate
and unstable grasping contact points. In this work, we first study the existing
planar grasp estimation algorithms and analyze the related challenges in
complex scenes. Secondly, we design a Pixel-wise Efficient Grasp Generation
Network (PEGG-Net) to tackle the problem of grasping in complex scenes.
PEGG-Net achieves improved state-of-the-art performance on the Cornell
dataset (98.9%) and second-best performance on the Jacquard dataset (93.8%),
outperforming other existing algorithms without the introduction of complex
structures. Thirdly, PEGG-Net can operate in a closed-loop manner for added
robustness in dynamic environments using position-based visual servoing (PBVS).
Finally, we conduct real-world experiments on static, dynamic, and cluttered
objects in different complex scenes. The results show that our proposed network
achieves a high success rate in grasping irregular objects, household objects,
and workshop tools. To benefit the community, our trained model and
supplementary materials are available at https://github.com/HZWang96/PEGG-Net.
Authors' comments: An updated version of the paper. Fixed typos and added new content
Ryoma Kobayashi, Yusuke Mukuta, Tatsuya Harada
Learning from Label Proportions (LLP) is a weakly supervised learning method
that aims to perform instance classification from training data consisting of
pairs of bags containing multiple instances and the class label proportions
within the bags. Previous studies on multiclass LLP can be divided into two
categories according to the learning task: per-instance label classification
and per-bag label proportion estimation. However, these methods often result
in high-variance estimates of the risk when applied to complex models, or lack
statistical learning theory arguments. To address this issue, we propose new
learning methods based on statistical learning theory for both per-instance and
per-bag policies. We demonstrate that the proposed methods are respectively
risk-consistent and classifier-consistent in an instance-wise manner, and
analyze the estimation error bounds. Additionally, we present a heuristic
approximation method that utilizes an existing method for regressing label
proportions to reduce the computational complexity of the proposed methods.
Through benchmark experiments, we demonstrate the effectiveness of the
proposed methods.
Authors' comments: 21 pages, 5 figures
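To make the per-bag setting concrete, the sketch below computes a generic bag-level proportion loss: the cross-entropy between the given label proportions and the average of the instance-level predicted probabilities. It illustrates the kind of per-bag objective used in earlier work, not the methods proposed in this paper, and `bag_proportion_loss` is a hypothetical name.

```python
import math


def bag_proportion_loss(instance_probs, true_proportions, eps=1e-12):
    """Cross-entropy between a bag's label proportions and the mean prediction.

    instance_probs:   per-instance class probabilities (n_instances x n_classes)
    true_proportions: class proportions given for the bag (length n_classes)
    """
    n = len(instance_probs)
    n_classes = len(true_proportions)
    mean_probs = [
        sum(probs[c] for probs in instance_probs) / n for c in range(n_classes)
    ]
    return -sum(
        t * math.log(q + eps) for t, q in zip(true_proportions, mean_probs)
    )


# A bag of two instances whose labels are known only as a 50/50 split.
matching = bag_proportion_loss([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
mismatched = bag_proportion_loss([[1.0, 0.0], [1.0, 0.0]], [0.5, 0.5])
```

Predictions whose average matches the bag proportions give a lower loss than predictions that put every instance in the same class.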
Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari
In this paper, we propose a method to generate personalized filled pauses
(FPs) with group-wise prediction models. Compared with fluent text generation,
disfluent text generation has not been widely explored. To generate more
human-like texts, we address disfluent text generation. The usage of
disfluency, such as FPs, rephrases, and word fragments, differs from speaker to
speaker, and thus, the generation of personalized FPs is required. However, it
is difficult to predict them because of the sparsity of position and the
frequency difference between more and less frequently used FPs. Moreover, it is
sometimes difficult to adapt FP prediction models to each speaker because of
the large variation of the tendency within each speaker. To address these
issues, we propose a method to build group-dependent prediction models by
grouping speakers on the basis of their tendency to use FPs. This method does
not require a large amount of data and time to train each speaker model. We
further introduce a loss function and a word embedding model suitable for FP
prediction. Our experimental results demonstrate that group-dependent models
can predict FPs with higher scores than a non-personalized one and the
introduced loss function and word embedding model improve the prediction
performance.
Authors' comments: Accepted to LREC 2022
Chen Tang, Kai Ouyang, Zhi Wang, Yifei Zhu, Yaowei Wang, Wen Ji, Wenwu Zhu
The exponentially large discrete search space in mixed-precision quantization
(MPQ) makes it hard to determine the optimal bit-width for each layer. Previous
works usually resort to iterative search methods on the training set, which
consume hundreds or even thousands of GPU-hours. In this study, we reveal that
some unique learnable parameters in quantization, namely the scale factors in
the quantizer, can serve as importance indicators of a layer, reflecting the
contribution of that layer to the final accuracy at certain bit-widths. These
importance indicators naturally perceive the numerical transformation during
quantization-aware training, which can precisely provide quantization
sensitivity metrics of layers. However, a deep network always contains hundreds
of such indicators, and training them one by one would lead to an excessive
time cost. To overcome this issue, we propose a joint training scheme that can
obtain all indicators at once. It considerably speeds up the indicator
training process by parallelizing the original sequential training processes.
With these learned importance indicators, we formulate the MPQ search problem
as a one-time integer linear programming (ILP) problem. That avoids the
iterative search and significantly reduces search time without limiting the
bit-width search space. For example, MPQ search on ResNet18 with our indicators
takes only 0.06 s, which improves time efficiency exponentially compared to
iterative search methods. Also, extensive experiments show our approach can
achieve SOTA accuracy on ImageNet for far-ranging models with various
constraints (e.g., BitOps, compression rate). Code is available at
https://github.com/1hunters/LIMPQ.
Authors' comments: Published at ECCV 2022; code is available at
https://github.com/1hunters/LIMPQ
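The one-shot search can be illustrated on a toy problem. The sketch below assumes each layer has a per-bit-width sensitivity penalty (standing in for the learned importance indicators) and a per-bit-width cost, and it minimizes the total penalty under a cost budget. Exhaustive enumeration is used here as a stand-in for an ILP solver, and all names and numbers are hypothetical.

```python
from itertools import product


def search_bit_widths(penalty, cost, budget):
    """Pick one bit-width per layer, minimizing total penalty within budget.

    penalty[l][b]: ASSUMED sensitivity of layer l at bit-width b (lower is better)
    cost[l][b]:    resource cost of layer l at bit-width b (e.g. a BitOps proxy)
    Returns (assignment, total_penalty), or (None, inf) if nothing is feasible.
    """
    options = [sorted(layer) for layer in penalty]
    best, best_penalty = None, float("inf")
    for assignment in product(*options):
        total_cost = sum(cost[l][b] for l, b in enumerate(assignment))
        if total_cost > budget:
            continue  # violates the resource constraint
        total_penalty = sum(penalty[l][b] for l, b in enumerate(assignment))
        if total_penalty < best_penalty:
            best, best_penalty = assignment, total_penalty
    return best, best_penalty


penalty = [{2: 5.0, 4: 1.0, 8: 0.2}, {2: 1.0, 4: 0.5, 8: 0.1}]
cost = [{2: 2, 4: 4, 8: 8}, {2: 2, 4: 4, 8: 8}]
best, best_penalty = search_bit_widths(penalty, cost, budget=10)
```

In this toy instance the sensitive first layer keeps 8 bits while the second layer drops to 2 bits. Enumeration is exponential in the number of layers, which is why a real network would need an ILP solver for the same objective.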
Teng Zhang, Sabyasachi Chatterjee
The main result of this article is an elementwise error bound for the Fused Lasso estimator for any general convex loss function $\rho$. We then focus on the special cases in which $\rho$ is either the square loss function (for mean regression) or the quantile loss function (for quantile regression), for which we derive new pointwise error bounds. Even though error bounds for the usual Fused Lasso estimator and its quantile version have been studied before, our bound appears to be new. This is because all previous works bound a global loss function such as the sum of squared errors, or a sum of Huber losses in the case of quantile regression in Padilla and Chatterjee (2021). Clearly, elementwise bounds are stronger than global loss error bounds, as they reveal how the loss behaves locally at each point. Our elementwise error bound also has a clean and explicit dependence on the tuning parameter $\lambda$, which informs the user of a good choice of $\lambda$. In addition, our bound is nonasymptotic with explicit constants and is able to recover almost all the known results for Fused Lasso (both mean and quantile regression), with additional improvements in some cases.
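For orientation, the Fused Lasso estimator with a general loss is commonly written as below. The symbols $y$, $\theta$, $n$, and $\tau$ and the exact normalization are assumptions taken from the standard formulation; the paper's own conventions may differ by constants.

```latex
\[
\hat{\theta} \in \operatorname*{arg\,min}_{\theta \in \mathbb{R}^n}
\sum_{i=1}^{n} \rho(y_i - \theta_i)
+ \lambda \sum_{i=1}^{n-1} \lvert \theta_{i+1} - \theta_i \rvert ,
\]
\[
\rho(x) = x^2 \ \text{(mean regression)}, \qquad
\rho_\tau(x) = x\,\bigl(\tau - \mathbf{1}\{x < 0\}\bigr)
\ \text{(quantile regression at level } \tau\text{)}.
\]
```

An elementwise bound controls the error of $\hat{\theta}_i$ separately at each index $i$, whereas a global bound controls only an aggregate over all $i$.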
Julien M. Hendrickx, Balázs Gerencsér
We consider arbitrary trajectories subject to a coordinate-wise energy
decrease: the sign of the derivative of each entry is never the same as that of
the corresponding entry of the gradient of some energy function. We show that
this simple condition guarantees convergence to a point, to the minimum of the
energy function, or to a set where its Hessian has very specific properties.
This extends and strengthens recent results that were restricted to convex
quadratic energy functions. We demonstrate the application of our result by
using it to prove the convergence of a class of multi-agent systems subject to
multiple uncertainties.
Authors' comments: 9 pages, 1 figure
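The coordinate-wise condition can be stated as a one-line check: for every entry, the product of the velocity and the corresponding gradient entry is never positive. The sketch below pairs that check with a toy simulation on an assumed quadratic energy, where each coordinate follows its own time-varying rate; the energy, the rates, and the function names are illustrative choices, not taken from the paper.

```python
def satisfies_coordinatewise_decrease(velocity, gradient):
    """True if no velocity entry has the same sign as its gradient entry."""
    return all(v * g <= 0 for v, g in zip(velocity, gradient))


def energy(x):
    # ASSUMED example energy: a convex quadratic with coupled coordinates.
    return 0.5 * (x[0] ** 2 + x[0] * x[1] + x[1] ** 2)


def energy_gradient(x):
    return [x[0] + 0.5 * x[1], x[1] + 0.5 * x[0]]


def simulate(x0, dt=0.01, steps=1000):
    """Euler integration of a trajectory that is not a plain gradient flow."""
    x = list(x0)
    for t in range(steps):
        g = energy_gradient(x)
        rates = [1.0 + (t % 3), 0.5]  # arbitrary nonnegative, time-varying
        v = [-r * gi for r, gi in zip(rates, g)]
        assert satisfies_coordinatewise_decrease(v, g)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


start = [1.0, -2.0]
end = simulate(start)
```

The trajectory is not a gradient flow, since the rates differ across coordinates and over time, yet the energy at `end` is lower than at `start`.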
Yuqing Lan, Yao Duan, Yifei Shi, Hui Huang, Kai Xu
Context has proven to be one of the most important factors in object layout
reasoning for 3D scene understanding. Existing deep contextual models either
learn holistic features for context encoding or rely on pre-defined scene
templates for context modeling. We argue that scene understanding benefits from
object relation reasoning, which is capable of mitigating the ambiguity of 3D
object detections and thus helps locate and classify the 3D objects more
accurately and robustly. To achieve this, we propose a novel 3D relation module
(3DRM) which reasons about object relations at pair-wise levels. The 3DRM
predicts the semantic and spatial relationships between objects and extracts
the object-wise relation features. We demonstrate the effects of 3DRM by
plugging it into proposal-based and voting-based 3D object detection pipelines,
respectively. Extensive evaluations show the effectiveness and generalization
of 3DRM on 3D object detection. Our source code is available at
https://github.com/lanlan96/3DRM.
Authors' comments: 13 pages, 8 figures
Romeo Valentin, Claudio Ferrari, Jérémy Scheurer, Andisheh Amrollahi, Chris Wendler, Max B. Paulus
We present our submission for the configuration task of the Machine Learning
for Combinatorial Optimization (ML4CO) NeurIPS 2021 competition. The
configuration task is to predict a good configuration of the open-source solver
SCIP to solve a mixed integer linear program (MILP) efficiently. We pose this
task as a supervised learning problem: First, we compile a large dataset of the
solver performance for various configurations and all provided MILP instances.
Second, we use this data to train a graph neural network that learns to predict
a good configuration for a specific instance. The submission was tested on the
three problem benchmarks of the competition and improved solver performance
over the default by 12%, 35%, and 8% across the hidden test instances. We
ranked 3rd out of 15 on the global leaderboard and won the student leaderboard.
We make our code publicly available at
https://github.com/RomeoV/ml4co-competition.
Authors' comments: 5 pages, 3 figures
Jin Gyu Lee, Cyrus Mostajeran, Graham Van Goffrier
We study a node-wise monotone barrier coupling law, motivated by the synaptic
coupling of neural central pattern generators. It is illustrated that this
coupling imitates the desirable properties of neural central pattern
generators. In particular, the coupling law 1) allows us to assign multiple
central patterns on the circle and 2) allows for rapid switching between
different patterns via simple `kicks'. In the end, we achieve full control by
partitioning the state space by utilizing a barrier effect and assigning a
unique steady-state behavior to each element of the resulting partition. We
analyze the global behavior and study the viability of the design.
Authors' comments: 25 pages, 8 figures
Mingxing Li, Shenglong Zhou, Chang Chen, Yueyi Zhang, Dong Liu, Zhiwei Xiong
Accurate retinal vessel segmentation is challenging because of the complex
texture of retinal vessels and low imaging contrast. Previous methods generally
refine segmentation results by cascading multiple deep networks, which is
time-consuming and inefficient. In this paper, we propose two novel methods to
address these challenges. First, we devise a light-weight module, named
multi-scale residual similarity gathering (MRSG), to generate pixel-wise
adaptive filters (PA-Filters). Unlike cascading multiple deep networks,
a single PA-Filter layer suffices to improve the segmentation results. Second, we
introduce a response cue erasing (RCE) strategy to enhance the segmentation
accuracy. Experimental results on the DRIVE, CHASE_DB1, and STARE datasets
demonstrate that our proposed method outperforms state-of-the-art methods while
maintaining a compact structure. Code is available at
https://github.com/Limingxing00/Retinal-Vessel-Segmentation-ISBI20222.
Authors' comments: Accepted by ISBI 2022
Chieh Wu, Aria Masoomi, Arthur Gretton, Jennifer Dy
There is currently a debate within the neuroscience community over the
likelihood of the brain performing backpropagation (BP). To better mimic the
brain, training a network \textit{one layer at a time} with only a "single
forward pass" has been proposed as an alternative to bypass BP; we refer to
these networks as "layer-wise" networks. We continue the work on layer-wise
networks by answering two outstanding questions. First, $\textit{do they have a
closed-form solution?}$ Second, $\textit{how do we know when to stop adding
more layers?}$ This work proves that the Kernel Mean Embedding is the
closed-form weight that achieves the network global optimum while driving these
networks to converge towards a highly desirable kernel for classification; we
call it the $\textit{Neural Indicator Kernel}$.
Authors' comments: Since this version is similar to an older version, I should have
updated the older version instead of creating a new version. I will now
retract this version, and update a previous version to this
Yuandong Tian
We show that Contrastive Learning (CL) under a broad family of loss functions
(including InfoNCE) has a unified formulation of coordinate-wise optimization
on the network parameter $\boldsymbol{\theta}$ and pairwise importance
$\alpha$, where the \emph{max player} $\boldsymbol{\theta}$ learns
representation for contrastiveness, and the \emph{min player} $\alpha$ puts
more weight on pairs of distinct samples that share similar representations.
The resulting formulation, called $\alpha$-CL, not only unifies various
existing contrastive losses, which differ in how the sample-pair importance
$\alpha$ is constructed, but can also be extrapolated to give novel
contrastive losses beyond popular ones, opening a new avenue of contrastive
loss design. These novel losses yield performance comparable to (or better
than) classic InfoNCE on CIFAR10, STL-10 and CIFAR-100. Furthermore, we also
analyze the max player in detail: we prove that, with fixed $\alpha$, the max
player is equivalent to Principal Component Analysis (PCA) for deep linear
networks, and almost all local minima are global and rank-1, recovering optimal
PCA solutions. Finally, we extend our analysis of the max player to 2-layer ReLU
networks, showing that its fixed points can have higher ranks.
Authors' comments: Add code links
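As a point of reference, the classic InfoNCE loss mentioned above can be computed for a single anchor as follows. This is the standard baseline, not the $\alpha$-CL formulation, and it assumes cosine similarity and a temperature of 0.5; the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE loss for one anchor, one positive, and a list of negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

The loss is small when the anchor is closer to its positive than to the negatives and large in the opposite case. Here every negative pair enters the denominator with equal weight, whereas the formulation above assigns pairwise importances $\alpha$.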