Teng Zhang, Sabyasachi Chatterjee
The main result of this article is that we obtain an elementwise error bound for the Fused Lasso estimator for any general convex loss function $\rho$. We then focus on the special cases when either $\rho$ is the square loss function (for mean regression) or is the quantile loss function (for quantile regression) for which we derive new pointwise error bounds. Even though error bounds for the usual Fused Lasso estimator and its quantile version have been studied before; our bound appears to be new. This is because all previous works bound a global loss function like the sum of squared error, or a sum of Huber losses in the case of quantile regression in Padilla and Chatterjee (2021). Clearly, element wise bounds are stronger than global loss error bounds as it reveals how the loss behaves locally at each point. Our element wise error bound also has a clean and explicit dependence on the tuning parameter $\lambda$ which informs the user of a good choice of $\lambda$. In addition, our bound is nonasymptotic with explicit constants and is able to recover almost all the known results for Fused Lasso (both mean and quantile regression) with additional improvements in some cases.
Julien M. Hendrickx, Balázs Gerencsér
We consider arbitrary trajectories subject to a coordinate-wise energy
decrease: the sign of the derivative of each entry is never the same as that of
the corresponding entry of the gradient of some energy function. We show that
this simple condition guarantees convergence to a point, to the minimum of the
energy functions, or to a set where its Hessian has very specific properties.
This extends and strengthens recent results that were restricted to convex
quadratic energy functions. We demonstrate the application of our result by
using it to prove the convergence of a class of multi-agent systems subject to
multiple uncertainties.
Authors' comments: 9 pages, 1 figure
Yuqing Lan, Yao Duan, Yifei Shi, Hui Huang, Kai Xu
Context has proven to be one of the most important factors in object layout
reasoning for 3D scene understanding. Existing deep contextual models either
learn holistic features for context encoding or rely on pre-defined scene
templates for context modeling. We argue that scene understanding benefits from
object relation reasoning, which is capable of mitigating the ambiguity of 3D
object detections and thus helps locate and classify the 3D objects more
accurately and robustly. To achieve this, we propose a novel 3D relation module
(3DRM) which reasons about object relations at pair-wise levels. The 3DRM
predicts the semantic and spatial relationships between objects and extracts
the object-wise relation features. We demonstrate the effects of 3DRM by
plugging it into proposal-based and voting-based 3D object detection pipelines,
respectively. Extensive evaluations show the effectiveness and generalization
of 3DRM on 3D object detection. Our source code is available at
https://github.com/lanlan96/3DRM.
Authors' comments: 13 pages, 8 figures
Romeo Valentin, Claudio Ferrari, Jérémy Scheurer, Andisheh Amrollahi, Chris Wendler, Max B. Paulus
We present our submission for the configuration task of the Machine Learning
for Combinatorial Optimization (ML4CO) NeurIPS 2021 competition. The
configuration task is to predict a good configuration of the open-source solver
SCIP to solve a mixed integer linear program (MILP) efficiently. We pose this
task as a supervised learning problem: First, we compile a large dataset of the
solver performance for various configurations and all provided MILP instances.
Second, we use this data to train a graph neural network that learns to predict
a good configuration for a specific instance. The submission was tested on the
three problem benchmarks of the competition and improved solver performance
over the default by 12% and 35% and 8% across the hidden test instances. We
ranked 3rd out of 15 on the global leaderboard and won the student leaderboard.
We make our code publicly available at
\url{https://github.com/RomeoV/ml4co-competition} .
Authors' comments: 5 pages, 3 figures
Jin Gyu Lee, Cyrus Mostajeran, Graham Van Goffrier
We study a node-wise monotone barrier coupling law, motivated by the synaptic
coupling of neural central pattern generators. It is illustrated that this
coupling imitates the desirable properties of neural central pattern
generators. In particular, the coupling law 1) allows us to assign multiple
central patterns on the circle and 2) allows for rapid switching between
different patterns via simple `kicks'. In the end, we achieve full control by
partitioning the state space by utilizing a barrier effect and assigning a
unique steady-state behavior to each element of the resulting partition. We
analyze the global behavior and study the viability of the design.
Authors' comments: 25 pages, 8 figures
Mingxing Li, Shenglong Zhou, Chang Chen, Yueyi Zhang, Dong Liu, Zhiwei Xiong
Accurate retinal vessel segmentation is challenging because of the complex
texture of retinal vessels and low imaging contrast. Previous methods generally
refine segmentation results by cascading multiple deep networks, which are
time-consuming and inefficient. In this paper, we propose two novel methods to
address these challenges. First, we devise a light-weight module, named
multi-scale residual similarity gathering (MRSG), to generate pixel-wise
adaptive filters (PA-Filters). Different from cascading multiple deep networks,
only one PA-Filter layer can improve the segmentation results. Second, we
introduce a response cue erasing (RCE) strategy to enhance the segmentation
accuracy. Experimental results on the DRIVE, CHASE_DB1, and STARE datasets
demonstrate that our proposed method outperforms state-of-the-art methods while
maintaining a compact structure. Code is available at
https://github.com/Limingxing00/Retinal-Vessel-Segmentation-ISBI20222.
Authors' comments: Accepted by ISBI 2022
Chieh Wu, Aria Masoomi, Arthur Gretton, Jennifer Dy
There is currently a debate within the neuroscience community over the
likelihood of the brain performing backpropagation (BP). To better mimic the
brain, training a network \textit{one layer at a time} with only a "single
forward pass" has been proposed as an alternative to bypass BP; we refer to
these networks as "layer-wise" networks. We continue the work on layer-wise
networks by answering two outstanding questions. First, $\textit{do they have a
closed-form solution?}$ Second, $\textit{how do we know when to stop adding
more layers?}$ This work proves that the Kernel Mean Embedding is the
closed-form weight that achieves the network global optimum while driving these
networks to converge towards a highly desirable kernel for classification; we
call it the $\textit{Neural Indicator Kernel}$.
Authors' comments: Since this version is similar to an older version, I should have
updated the older version instead of creating a new version. I will now
retract this version, and update a previous version to this
Yuandong Tian
We show that Contrastive Learning (CL) under a broad family of loss functions
(including InfoNCE) has a unified formulation of coordinate-wise optimization
on the network parameter $\boldsymbol{\theta}$ and pairwise importance
$\alpha$, where the \emph{max player} $\boldsymbol{\theta}$ learns
representation for contrastiveness, and the \emph{min player} $\alpha$ puts
more weights on pairs of distinct samples that share similar representations.
The resulting formulation, called $\alpha$-CL, unifies not only various
existing contrastive losses, which differ by how sample-pair importance
$\alpha$ is constructed, but also is able to extrapolate to give novel
contrastive losses beyond popular ones, opening a new avenue of contrastive
loss design. These novel losses yield comparable (or better) performance on
CIFAR10, STL-10 and CIFAR-100 than classic InfoNCE. Furthermore, we also
analyze the max player in detail: we prove that with fixed $\alpha$, max player
is equivalent to Principal Component Analysis (PCA) for deep linear network,
and almost all local minima are global and rank-1, recovering optimal PCA
solutions. Finally, we extend our analysis on max player to 2-layer ReLU
networks, showing that its fixed points can have higher ranks.
Authors' comments: Add code links
Pallavi Patil, Mark Whittle, Kristina Nyland, Carol Lonsdale, Mark Lacy, Amy E. Kimball, Colin Lonsdale, Wendy Peters et al.
We present radio spectra spanning $0.1 - 10$ GHz for the sample of heavily
obscured luminous quasars with extremely red mid-infrared-optical colors and
compact radio emission. The spectra are constructed from targeted 10 GHz
observations and archival radio survey data, which together yield $6-11$ flux
density measurements for each object. Our suite of Python tools for modeling
the radio spectra is publicly available on Github. Our primary result is that
most (61%) of the sample have peaked or curved radio spectra and many (36%)
could be classified as Gigahertz Peaked Spectrum (GPS) sources. This indicates
compact emission regions likely arising from recently triggered radio jets.
Assuming synchrotron self-absorption (SSA) generates the peaks, we infer
compact source sizes ($3 - 100$ pc) with strong magnetic fields ($6 - 100$ mG)
and young ages ($30 - 10^4$ years). Conversely, free-free absorption (FFA)
could also create peaks due to the high column densities associated with the
deeply embedded nature of the sample. However, we find no correlations between
the existence or frequency of the peaks and any parameters of the MIR emission.
The high-frequency spectral indices are steep ($\alpha \approx -1$) and
correlate, weakly, with the ratio of MIR photon energy density to magnetic
energy density, suggesting that the spectral steepening could arise from
inverse Compton scattering off the intense MIR photon field. This study
provides a foundation for combining multi-frequency and mixed-resolution radio
survey data for understanding the impact of young radio jets on the ISM and
star formation rates of their host galaxies.
Authors' comments: 48 pages, 17 figures, published in Astrophysical Journal
Chen Lin, Zheyang Li, Bo Peng, Haoji Hu, Wenming Tan, Ye Ren, Shiliang Pu
This paper introduces a post-training quantization~(PTQ) method achieving
highly efficient Convolutional Neural Network~ (CNN) quantization with high
performance. Previous PTQ methods usually reduce compression error via
performing layer-by-layer parameters calibration. However, with lower
representational ability of extremely compressed parameters (e.g., the
bit-width goes less than 4), it is hard to eliminate all the layer-wise errors.
This work addresses this issue via proposing a unit-wise feature reconstruction
algorithm based on an observation of second order Taylor series expansion of
the unit-wise error. It indicates that leveraging the interaction between
adjacent layers' parameters could compensate layer-wise errors better. In this
paper, we define several adjacent layers as a Basic-Unit, and present a
unit-wise post-training algorithm which can minimize quantization error. This
method achieves near-original accuracy on ImageNet and COCO when quantizing
FP32 models to INT4 and INT3.
Authors' comments: Accepted by BMVC 2021
Surajit Borkotokey, Sujata Goala, Niharika Kakoty, Parishmita Boruah
We introduce the component-wise egalitarian Myerson value for network games. This new value being a convex combination of the Myerson value and the component-wise equal division rule is a player-based allocation rule. In network games under the cooperative framework, the Myerson value is an extreme example of marginalism, while the equal division rule signifies egalitarianism. In the proposed component-wise egalitarian Myerson value, a convexity parameter combines these two attributes and determines the degree of solidarity to the players. Here, by solidarity, we mean the mutual support or compensation among the players in a network. We provide three axiomatic characterizations of the value. Further, we propose an implementation mechanism for the component-wise egalitarian Myerson value under subgame perfect Nash equilibrium.
Théo Dessertaine, Jean-Philippe Bouchaud
We consider a simple model for multidimensional cone-wise linear dynamics
around cusp-like equilibria. We assume that the local linear evolution is
either $\mathbf{v}^\prime=\mathbb{A}\mathbf{v}$ or $\mathbb{B}\mathbf{v}$ (with
$\mathbb{A}$, $\mathbb{B}$ independently drawn a rotationally invariant
ensemble of $N \times N$ matrices) depending on the sign of the first component
of $\mathbf{v}$. We establish strong connections with the random diffusion
persistence problem. When $N \to \infty$, we find that the Lyapounov exponent
is non self-averaging, i.e. one can observe apparent stability and apparent
instability for the same system, depending on time and initial conditions.
Finite $N$ effects are also discussed, and lead to cone trapping phenomena.
Authors' comments: 5 pages, 4 figures
Hao Sun, Taiyi Wang
Although it is well known that exploration plays a key role in Reinforcement Learning (RL), prevailing exploration strategies for continuous control tasks in RL are mainly based on naive isotropic Gaussian noise regardless of the causality relationship between action space and the task and consider all dimensions of actions equally important. In this work, we propose to conduct interventions on the primal action space to discover the causal relationship between the action space and the task reward. We propose the method of State-Wise Action Refined (SWAR), which addresses the issue of action space redundancy and promote causality discovery in RL. We formulate causality discovery in RL tasks as a state-dependent action space selection problem and propose two practical algorithms as solutions. The first approach, TD-SWAR, detects task-related actions during temporal difference learning, while the second approach, Dyn-SWAR, reveals important actions through dynamic model prediction. Empirically, both methods provide approaches to understand the decisions made by RL agents and improve learning efficiency in action-redundant tasks.
Hunmin Lee, Yueyang Liu, Donghyun Kim, Yingshu Li
Non-IID dataset and heterogeneous environment of the local clients are regarded as a major issue in Federated Learning (FL), causing a downturn in the convergence without achieving satisfactory performance. In this paper, we propose a novel Label-wise clustering algorithm that guarantees the trainability among geographically dispersed heterogeneous local clients, by selecting only the local models trained with a dataset that approximates into uniformly distributed class labels, which is likely to obtain faster minimization of the loss and increment the accuracy among the FL network. Through conducting experiments on the suggested six common non-IID scenarios, we empirically show that the vanilla FL aggregation model is incapable of gaining robust convergence generating biased pre-trained local models and drifting the local weights to mislead the trainability in the worst case. Moreover, we quantitatively estimate the expected performance of the local models before training, which offers a global server to select the optimal clients, saving additional computational costs. Ultimately, in order to gain resolution of the non-convergence in such non-IID situations, we design clustering algorithms based on local input class labels, accommodating the diversity and assorting clients that could lead the overall system to attain the swift convergence as global training continues. Our paper shows that proposed Label-wise clustering demonstrates prompt and robust convergence compared to other FL algorithms when local training datasets are non-IID or coexist with IID through multiple experiments.
Lanning Wei, Huan Zhao, Zhiqiang He
In recent years, Graph Neural Networks (GNNs) have shown superior performance on diverse applications on real-world datasets. To improve the model capacity and alleviate the over-smoothing problem, several methods proposed to incorporate the intermediate layers by layer-wise connections. However, due to the highly diverse graph types, the performance of existing methods vary on diverse graphs, leading to a need for data-specific layer-wise connection methods. To address this problem, we propose a novel framework LLC (Learn Layer-wise Connections) based on neural architecture search (NAS) to learn adaptive connections among intermediate layers in GNNs. LLC contains one novel search space which consists of 3 types of blocks and learnable connections, and one differentiable search algorithm to enable the efficient search process. Extensive experiments on five real-world datasets are conducted, and the results show that the searched layer-wise connections can not only improve the performance but also alleviate the over-smoothing problem.
Norihide Tokushige
Let $\mathcal G$ be a family of subsets of an $n$-element set. The family $\mathcal G$ is called $3$-wise $t$-intersecting if the intersection of any three subsets in $\mathcal G$ is of size at least $t$. For a real number $p\in(0,1)$ we define the measure of the family by the sum of $p^{|G|}(1-p)^{n-|G|}$ over all $G\in\mathcal G$. We prove that if $t\geq 15$ and $p\leq 2/(\sqrt{4t+9}-1)$ then $p^t$ is the maximum measure of $3$-wise $t$-intersecting families, and the bound for $p$ is sharp. We also present the corresponding stability result for shifted families.
Srimanta Mandal, Kuldeep Purohit, A. N. Rajagopalan
In practice, images can contain different amounts of noise for different color channels, which is not acknowledged by existing super-resolution approaches. In this paper, we propose to super-resolve noisy color images by considering the color channels jointly. Noise statistics are blindly estimated from the input low-resolution image and are used to assign different weights to different color channels in the data cost. Implicit low-rank structure of visual data is enforced via nuclear norm minimization in association with adaptive weights, which is added as a regularization term to the cost. Additionally, multi-scale details of the image are added to the model through another regularization term that involves projection onto PCA basis, which is constructed using similar patches extracted across different scales of the input image. The results demonstrate the super-resolving capability of the approach in real scenarios.
Vijay Lingam, Chanakya Ekbote, Manan Sharma, Rahul Ragesh, Arun Iyer, Sundararajan Sellamanickam
Graph Neural Networks (GNNs) exploit signals from node features and the input
graph topology to improve node classification task performance. However, these
models tend to perform poorly on heterophilic graphs, where connected nodes
have different labels. Recently proposed GNNs work across graphs having varying
levels of homophily. Among these, models relying on polynomial graph filters
have shown promise. We observe that solutions to these polynomial graph filter
models are also solutions to an overdetermined system of equations. It suggests
that in some instances, the model needs to learn a reasonably high order
polynomial. On investigation, we find the proposed models ineffective at
learning such polynomials due to their designs. To mitigate this issue, we
perform an eigendecomposition of the graph and propose to learn multiple
adaptive polynomial filters acting on different subsets of the spectrum. We
theoretically and empirically show that our proposed model learns a better
filter, thereby improving classification accuracy. We study various aspects of
our proposed model including, dependency on the number of eigencomponents
utilized, latent polynomial filters learned, and performance of the individual
polynomials on the node classification task. We further show that our model is
scalable by evaluating over large graphs. Our model achieves performance gains
of up to 5% over the state-of-the-art models and outperforms existing
polynomial filter-based approaches in general.
Authors' comments: 28 pages, 9 figures, Under Review
Zhiguo Huang, Xiaowei Chen, Bojuan Wang
Numerous works have proven that existing neighbor-averaging Graph Neural
Networks cannot efficiently catch structure features, and many works show that
injecting structure, distance, position or spatial features can significantly
improve performance of GNNs, however, injecting overall structure and distance
into GNNs is an intuitive but remaining untouched idea. In this work, we shed
light on the direction. We first extracting hop-wise structure information and
compute distance distributional information, gathering with node's intrinsic
features, embedding them into same vector space and then adding them up. The
derived embedding vectors are then fed into GATs(like GAT, AGDN) and then
Correct and Smooth, experiments show that the DHSEGATs achieve competitive
result. The code is available at https://github.com/hzg0601/DHSEGATs.
Authors' comments: 11 pages; 1 figures;
Tal Shaharabany, Lior Wolf
The leading segmentation methods represent the output map as a pixel grid. We study an alternative representation in which the object edges are modeled, per image patch, as a polygon with $k$ vertices that is coupled with per-patch label probabilities. The vertices are optimized by employing a differentiable neural renderer to create a raster image. The delineated region is then compared with the ground truth segmentation. Our method obtains multiple state-of-the-art results: 76.26\% mIoU on the Cityscapes validation, 90.92\% IoU on the Vaihingen building segmentation benchmark, 66.82\% IoU for the MoNU microscopy dataset, and 90.91\% for the bird benchmark CUB. Our code for training and reproducing these results is attached as supplementary.