Benoît de Courson, Léo Fitouchi, Jean-Philippe Bouchaud, Michael Benzaquen
The ability to learn from others (social learning) is often deemed a cause of
the human species' success. But if social learning is indeed more efficient (whether
less costly or more accurate) than individual learning, it raises the question
of why anyone would engage in individual information seeking, which is a
necessary condition for social learning's efficacy. We propose an evolutionary
model solving this paradox, provided agents (i) aim not only at information
quality but also vie for audience and prestige, and (ii) do not only value
accuracy but also reward originality -- allowing them to alleviate herding
effects. We find that under some conditions (large enough success rate of
informed agents and intermediate taste for popularity), both social learning's
higher accuracy and the taste for original opinions are evolutionary-stable,
within a mutually beneficial division of labour-like equilibrium. When such
conditions are not met, the system most often converges towards mutually
detrimental equilibria.
Authors' comments: 11 pages, 6 figures, 1 table
Yuval Nirkin, Lior Wolf, Tal Hassner
We present a novel, real-time, semantic segmentation network in which the
encoder both encodes and generates the parameters (weights) of the decoder.
Furthermore, to allow maximal adaptivity, the weights at each decoder block
vary spatially. For this purpose, we design a new type of hypernetwork,
composed of a nested U-Net for drawing higher level context features, a
multi-headed weight generating module which generates the weights of each block
in the decoder immediately before they are consumed, for efficient memory
utilization, and a primary network that is composed of novel dynamic patch-wise
convolutions. Despite the usage of less-conventional blocks, our architecture
obtains real-time performance. In terms of the runtime vs. accuracy trade-off,
we surpass state of the art (SotA) results on popular semantic segmentation
benchmarks: PASCAL VOC 2012 (val. set) and real-time semantic segmentation on
Cityscapes, and CamVid. The code is available: https://nirkin.com/hyperseg.
Authors' comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Shachar Gluska, Mark Grobman
Quantization is an essential step in the efficient deployment of deep learning models and as such is an increasingly popular research topic. An important practical aspect that is not addressed in the current literature is how to analyze and fix failure cases where the use of quantization results in excessive degradation. In this paper, we present a simple analytic framework that breaks down overall degradation into its per-layer contributions. We analyze many common networks and observe that a layer's contribution is determined by both intrinsic (local) factors - the distribution of the layer's weights and activations - and extrinsic (global) factors having to do with the interaction with the rest of the layers. Layer-wise analysis of existing quantization schemes reveals local failure cases of existing techniques which are not reflected when inspecting their overall performance. As an example, we consider ResNext26 on which SoTA post-training quantization methods perform poorly. We show that almost all of the degradation stems from a single layer. The same analysis also allows for local fixes - applying a common weight clipping heuristic only to this layer reduces degradation to a minimum while applying the same heuristic globally results in high degradation. More generally, layer-wise analysis allows for a more nuanced examination of how quantization affects the network, enabling the design of better performing schemes.
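As a rough illustration of the layer-wise view described in the abstract above, the sketch below quantizes one layer at a time in a toy ReLU network and records the resulting output error. This is not the paper's analytic framework: the toy network, the symmetric uniform quantizer, the mean-squared-error measure, and all function names (`quantize_uniform`, `forward`, `per_layer_degradation`) are assumptions made for illustration.

```python
import numpy as np

def quantize_uniform(w, bits=4):
    """Symmetric uniform fake-quantization of a weight tensor (assumed scheme)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / levels
    if scale == 0.0:
        # All-zero tensor: nothing to quantize.
        return w.copy()
    return np.round(w / scale) * scale

def forward(x, layers):
    """Toy MLP: matrix multiply with ReLU between layers, no ReLU on the output."""
    for i, w in enumerate(layers):
        x = x @ w
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

def per_layer_degradation(x, layers, bits=4):
    """Quantize one layer at a time and return the output MSE for each layer."""
    reference = forward(x, layers)
    errors = []
    for i in range(len(layers)):
        perturbed = list(layers)
        perturbed[i] = quantize_uniform(layers[i], bits)
        out = forward(x, perturbed)
        errors.append(float(np.mean((out - reference) ** 2)))
    return errors

rng = np.random.default_rng(0)
layers = [rng.normal(size=(8, 16)), rng.normal(size=(16, 16)), rng.normal(size=(16, 4))]
x = rng.normal(size=(32, 8))
errors = per_layer_degradation(x, layers, bits=4)
```

Sorting `errors` points to the layer whose quantization hurts the output most, which is where a local fix such as weight clipping would be tried first.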
Junchen Ye, Leilei Sun, Bowen Du, Yanjie Fu, Hui Xiong
The Graph Convolutional Network (GCN) has been widely applied in transportation demand prediction due to its excellent ability to capture non-Euclidean spatial dependence among station-level or regional transportation demands. However, in most of the existing research, the graph convolution was implemented on a heuristically generated adjacency matrix, which could neither reflect the real spatial relationships of stations accurately, nor capture the multi-level spatial dependence of demands adaptively. To cope with the above problems, this paper provides a novel graph convolutional network for transportation demand prediction. Firstly, a novel graph convolution architecture is proposed, which has different adjacency matrices in different layers and all the adjacency matrices are self-learned during the training process. Secondly, a layer-wise coupling mechanism is provided, which associates the upper-level adjacency matrix with the lower-level one. It also reduces the scale of parameters in our model. Lastly, a unitary network is constructed to give the final prediction result by integrating the hidden spatial states with a gated recurrent unit, which could capture the multi-level spatial dependence and temporal dynamics simultaneously. Experiments have been conducted on two real-world datasets, NYC Citi Bike and NYC Taxi, and the results demonstrate the superiority of our model over the state-of-the-art ones.
Lorenzo Luciano, Imre Kiss, Peter William Beardshear, Esther Kadosh, A. Ben Hamza
The performance levels of a computing machine running a given workload configuration are crucial for both users and providers of computing resources. Knowing how well a computing machine is running with a given workload configuration is critical to making proper computing resource allocation decisions. In this paper, we introduce a novel framework for deriving computing machine and computing resource performance indicators for a given workload configuration. We propose a workload/machine index score (WISE) framework for computing a fitness score for a workload/machine combination. The WISE score indicates how well a computing machine is running with a specific workload configuration by addressing the issue of whether resources are being stressed or sitting idle wasting precious resources. In addition to encompassing any number of computing resources, the WISE score is determined by considering how far from target levels the machine resources are operating at without maxing out. Experimental results demonstrate the efficacy of the proposed WISE framework on two distinct workload configurations.
Vajira Thambawita, Steven Hicks, Pål Halvorsen, Michael A. Riegler
Segmentation of findings in the gastrointestinal tract is a challenging but important task and a key building block for sufficient automatic decision support systems. In this work, we present our solution for the Medico 2020 task, which focused on the problem of colon polyp segmentation. We present our simple but efficient idea of using an augmentation method that uses grids in a pyramid-like manner (large to small) for segmentation. Our results show that the proposed methods work as intended and can also lead to comparable results when competing with other methods.
Xueyi Li, Tianfei Zhou, Jianwu Li, Yi Zhou, Zhaoxiang Zhang
Acquiring sufficient ground-truth supervision to train deep visual models has
been a bottleneck over the years due to the data-hungry nature of deep
learning. This is exacerbated in some structured prediction tasks, such as
semantic segmentation, which requires pixel-level annotations. This work
addresses weakly supervised semantic segmentation (WSSS), with the goal of
bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models
semantic dependencies in a group of images to estimate more reliable pseudo
ground-truths, which can be used for training more accurate segmentation
models. In particular, we devise a graph neural network (GNN) for group-wise
semantic mining, wherein input images are represented as graph nodes, and the
underlying relations between a pair of images are characterized by an efficient
co-attention mechanism. Moreover, in order to prevent the model from paying
excessive attention to common semantics only, we further propose a graph
dropout layer, encouraging the model to learn more accurate and complete object
responses. The whole network is end-to-end trainable by iterative message
passing, which propagates interaction cues over the images to progressively
improve the performance. We conduct experiments on the popular PASCAL VOC 2012
and COCO benchmarks, and our model yields state-of-the-art performance. Our
code is available at: https://github.com/Lixy1997/Group-WSSS.
Authors' comments: Accepted to AAAI 2021. Code: https://github.com/Lixy1997/Group-WSSS
Zhibin Li, Jian Zhang, Yongshun Gong, Yazhou Yao, Qiang Wu
We propose a new method for learning with multi-field categorical data.
Multi-field categorical data are usually collected over many heterogeneous
groups. These groups can be reflected in the categories under a field. The existing
methods try to learn a universal model that fits all data, which is challenging
and inevitably results in learning a complex model. In contrast, we propose a
field-wise learning method leveraging the natural structure of data to learn
simple yet efficient one-to-one field-focused models with appropriate
constraints. In doing this, the models can be fitted to each category and thus
can better capture the underlying differences in data. We present a model that
utilizes linear models with variance and low-rank constraints, to help it
generalize better and reduce the number of parameters. The model is also
interpretable in a field-wise manner. As the dimensionality of multi-field
categorical data can be very high, the models applied to such data are mostly
over-parameterized. Our theoretical analysis can potentially explain the effect
of over-parameterization on the generalization of our model. It also supports
the variance constraints in the learning objective. The experiment results on
two large-scale datasets show the superior performance of our model, the trend
of the generalization error bound, and the interpretability of learning
outcomes. Our code is available at
https://github.com/lzb5600/Field-wise-Learning.
Authors' comments: Accepted at NeurIPS 2020
Jeffrey Fong, Siwei Chen, Kaiqi Chen
Training neural networks with large batches is of fundamental significance to deep learning. Large batch training remarkably reduces the amount of training time but has difficulties in maintaining accuracy. Recent works have put forward optimization methods such as LARS and LAMB to tackle this issue through adaptive layer-wise optimization using trust ratios. Though prevailing, such methods are observed to still suffer from unstable and extreme trust ratios, which degrade performance. In this paper, we propose a new variant of LAMB, called LAMBC, which employs trust ratio clipping to stabilize the ratio's magnitude and prevent extreme values. We conducted experiments on image classification tasks such as ImageNet and CIFAR-10 and our empirical results demonstrate promising improvements across different batch sizes.
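The idea of trust ratio clipping can be sketched as follows. This is a minimal illustration rather than the LAMBC implementation: the clipping bounds `lower` and `upper`, the fallback value of 1.0 for zero norms, and the function names are assumptions, and `update` stands in for whatever Adam-style update direction the optimizer has already computed for one layer.

```python
import numpy as np

def clipped_trust_ratio(weights, update, lower=0.0, upper=1.0):
    """Layer-wise trust ratio ||w|| / ||u||, clipped to [lower, upper].

    The bounds are hypothetical defaults chosen for illustration.
    """
    w_norm = np.linalg.norm(weights)
    u_norm = np.linalg.norm(update)
    if w_norm == 0.0 or u_norm == 0.0:
        # Assumed convention: use a neutral ratio when either norm vanishes.
        ratio = 1.0
    else:
        ratio = w_norm / u_norm
    return float(np.clip(ratio, lower, upper))

def apply_layer_update(weights, update, lr, lower=0.0, upper=1.0):
    """One layer-wise step: scale the update by the clipped trust ratio."""
    ratio = clipped_trust_ratio(weights, update, lower, upper)
    return weights - lr * ratio * update

w = np.array([3.0, 4.0])   # norm 5.0
u = np.array([0.0, 0.5])   # norm 0.5, so the raw trust ratio is 10.0
unclipped = clipped_trust_ratio(w, u, upper=20.0)
clipped = clipped_trust_ratio(w, u, upper=1.0)
new_w = apply_layer_update(w, u, lr=0.1, upper=1.0)
```

Here a small update norm produces a raw ratio of 10.0, and the upper bound caps it so that a single layer cannot take a disproportionately large step.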
Hao Zheng, Yulei Qin, Yun Gu, Fangfang Xie, Jie Yang, Jiayuan Sun, Guang-zhong Yang
Automated airway segmentation is a prerequisite for pre-operative diagnosis and intra-operative navigation for pulmonary intervention. Due to the small size and scattered spatial distribution of peripheral bronchi, this is hampered by severe class imbalance between foreground and background regions, which makes it challenging for CNN-based methods to parse distal small airways. In this paper, we demonstrate that this problem arises from gradient erosion and dilation of the neighborhood voxels. During back-propagation, if the ratio of the foreground gradient to background gradient is small while the class imbalance is local, the foreground gradients can be eroded by their neighborhoods. This process cumulatively increases the noise information included in the gradient flow from top layers to the bottom ones, limiting the learning of small structures in CNNs. To alleviate this problem, we use group supervision and the corresponding WingsNet to provide complementary gradient flows to enhance the training of shallow layers. To further address the intra-class imbalance between large and small airways, we design a General Union loss function which obviates the impact of airway size by distance-based weights and adaptively tunes the gradient ratio based on the learning process. Extensive experiments on public datasets demonstrate that the proposed method can predict the airway structures with higher accuracy and better morphological completeness than the baselines.
Zitong Yu, Xiaobai Li, Jingang Shi, Zhaoqiang Xia, Guoying Zhao
Face anti-spoofing (FAS) plays a vital role in securing face recognition
systems from presentation attacks (PAs). As more and more realistic PAs
with novel types spring up, it is necessary to develop robust algorithms for
detecting unknown attacks even in unseen scenarios. However, deep models
supervised by traditional binary loss (e.g., `0' for bonafide vs. `1' for PAs)
are weak in describing intrinsic and discriminative spoofing patterns.
Recently, pixel-wise supervision has been proposed for the FAS task, intending
to provide more fine-grained pixel/patch-level cues. In this paper, we first
give a comprehensive review and analysis of the existing pixel-wise
supervision methods for FAS. Then we propose a novel pyramid supervision, which
guides deep models to learn both local details and global semantics from
multi-scale spatial context. Extensive experiments are performed on five FAS
benchmark datasets to show that, without bells and whistles, the proposed
pyramid supervision could not only improve the performance beyond existing
pixel-wise supervision frameworks, but also enhance the model's
interpretability (i.e., locating the patch-level positions of PAs more
reasonably). Furthermore, elaborate studies are conducted for exploring the
efficacy of different architecture configurations with two kinds of pixel-wise
supervisions (binary mask and depth map supervisions), which provides
inspiring insights for future architecture/supervision design.
Authors' comments: submitted to IEEE Transactions on Biometrics, Behavior and Identity
Science
Nicolas Nadisic, Jeremy E Cohen, Arnaud Vandaele, Nicolas Gillis
Nonnegative least squares problems with multiple right-hand sides (MNNLS)
arise in models that rely on additive linear combinations. In particular, they
are at the core of most nonnegative matrix factorization algorithms and have
many applications. The nonnegativity constraint is known to naturally favor
sparsity, that is, solutions with few non-zero entries. However, it is often
useful to further enhance this sparsity, as it improves the interpretability of
the results and helps reduce noise, which leads to the sparse MNNLS problem.
In this paper, as opposed to most previous works that enforce sparsity column-
or row-wise, we first introduce a novel formulation for sparse MNNLS, with a
matrix-wise sparsity constraint. Then, we present a two-step algorithm to
tackle this problem. The first step divides sparse MNNLS into subproblems, one
per column of the original problem. It then uses different algorithms to
produce, either exactly or approximately, a Pareto front for each subproblem,
that is, to produce a set of solutions representing different tradeoffs between
reconstruction error and sparsity. The second step selects solutions among
these Pareto fronts in order to build a sparsity-constrained matrix that
minimizes the reconstruction error. We perform experiments on facial and
hyperspectral images, and we show that our proposed two-step approach provides
more accurate results than state-of-the-art sparse coding heuristics applied
both column-wise and globally.
Authors' comments: 25 pages + 18 pages supplementary material. This is the new version
of a work originally called "A Homotopy-based Algorithm for Sparse Multiple
Right-hand Sides Nonnegative Least Squares". Although the central concept is
the same, the paper has been almost completely rewritten
Jonas Spethmann, Martin Grünebohm, Roland Wiesendanger, Kirsten von Bergmann, André Kubetzka
We investigate magnetic domain walls in a single fcc Mn layer on Re(0001)
employing spin-polarized STM, atom manipulation, and spin dynamics simulations.
The low symmetry of the row-wise antiferromagnetic (1Q) state leads to a new
type of domain wall which connects rotational 1Q domains by a transient 2Q
state with characteristic 90$^\circ$ angles between neighboring magnetic
moments. The domain wall properties depend on their orientation and their width
of about 2 nm essentially results from a balance of Heisenberg and higher-order
exchange interactions. Atom manipulation allows domain wall imaging with atomic
spin-resolution, as well as domain wall positioning, and we demonstrate that
the force to move an atom is anisotropic on the 1Q domain.
Authors' comments: 6 pages, 4 figures
Hao Li, Xiaopeng Zhang, Hongkai Xiong
Contrastive learning based on instance discrimination trains a model to
discriminate different transformations of the anchor sample from other samples,
which does not consider the semantic similarity among samples. This paper
proposes a new kind of contrastive learning method, named CLIM, which uses
positives from other samples in the dataset. This is achieved by searching
local similar samples of the anchor, and selecting samples that are closer to
the corresponding cluster center, which we denote as center-wise local image
selection. The selected samples are instantiated via a data mixture strategy,
which acts as a smoothing regularization. As a result, CLIM encourages both
local similarity and global aggregation in a robust way, which we find is
beneficial for feature representation. Besides, we introduce
\emph{multi-resolution} augmentation, which enables the representation to be
scale invariant. We reach 75.5% top-1 accuracy with linear evaluation over
ResNet-50, and 59.3% top-1 accuracy when fine-tuned with only 1% labels.
Authors' comments: Accepted by BMVC2021
Qiang Wang, Changliang Li, Yue Zhang, Tong Xiao, Jingbo Zhu
Traditional neural machine translation is limited to the topmost encoder
layer's context representation and cannot directly perceive the lower encoder
layers. Existing solutions usually rely on the adjustment of network
architecture, making the calculation more complicated or introducing additional
structural restrictions. In this work, we propose layer-wise multi-view
learning to solve this problem, circumventing the necessity to change the model
structure. We regard each encoder layer's off-the-shelf output, a by-product in
layer-by-layer encoding, as the redundant view for the input sentence. In this
way, in addition to the topmost encoder layer (referred to as the primary
view), we also incorporate an intermediate encoder layer as the auxiliary view.
We feed the two views to a partially shared decoder to maintain independent
prediction. Consistency regularization based on KL divergence is used to
encourage the two views to learn from each other. Extensive experimental
results on five translation tasks show that our approach yields stable
improvements over multiple strong baselines. As another bonus, our method is
agnostic to network architectures and can maintain the same inference speed as
the original model.
Authors' comments: COLING 2020
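The consistency regularization between the primary and auxiliary views mentioned in the abstract above can be sketched as a KL term between the two predicted distributions. This is an illustrative sketch only: the symmetric form of the loss, the use of raw logits as inputs, and the function names are assumptions, not details confirmed by the abstract.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) per row; eps guards against log(0)."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def consistency_loss(primary_logits, auxiliary_logits):
    """Symmetric KL between the two views' predictions (assumed form)."""
    p = softmax(primary_logits)
    q = softmax(auxiliary_logits)
    return float(np.mean(kl_divergence(p, q) + kl_divergence(q, p)))

# Two target positions, vocabulary of three tokens.
primary = np.array([[2.0, 0.5, 0.1], [0.3, 1.2, 0.0]])
auxiliary = np.array([[1.5, 0.7, 0.2], [0.1, 1.0, 0.4]])
same = consistency_loss(primary, primary)
different = consistency_loss(primary, auxiliary)
```

In training, a term like this would be added to the usual translation losses of the two views with some weighting factor, so that each view is pulled toward the other's predictions.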
Ruizhe Li, Xiao Li, Guanyi Chen, Chenghua Lin
The Variational Autoencoder (VAE) is a popular and powerful model applied to
text modelling to generate diverse sentences. However, an issue known as
posterior collapse (or KL loss vanishing) happens when the VAE is used in text
modelling, where the approximate posterior collapses to the prior, and the
model will totally ignore the latent variables and be degraded to a plain
language model during text generation. Such an issue is particularly prevalent
when RNN-based VAE models are employed for text modelling. In this paper, we
propose a simple, generic architecture called Timestep-Wise Regularisation VAE
(TWR-VAE), which can effectively avoid posterior collapse and can be applied to
any RNN-based VAE models. The effectiveness and versatility of our model are
demonstrated in different tasks, including language modelling and dialogue
response generation.
Authors' comments: Accepted by COLING 2020, final camera ready version
Yiwen Liao, Raphaël Latty, Bin Yang
Feature selection is generally used as one of the most important
preprocessing techniques in machine learning, as it helps to reduce the
dimensionality of data and assists researchers and practitioners in
understanding data. Thus, by utilizing feature selection, better performance
and reduced computational consumption, memory complexity and even data amount
can be expected. Although there exist approaches leveraging the power of deep
neural networks to carry out feature selection, many of them often suffer from
sensitive hyperparameters. This paper proposes a feature mask module
(FM-module) for feature selection based on a novel batch-wise attenuation and
feature mask normalization. The proposed method is almost free from
hyperparameters and can be easily integrated into common neural networks as an
embedded feature selection method. Experiments on popular image, text and
speech datasets have shown that our approach is easy to use and has superior
performance in comparison with other state-of-the-art deep-learning-based
feature selection methods.
Authors' comments: accepted by IJCNN2021
Trung Trinh, Samuel Kaski, Markus Heinonen
We introduce implicit Bayesian neural networks, a simple and scalable
approach for uncertainty representation in deep learning. The standard Bayesian
approach to deep learning requires the impractical inference of the posterior
distribution over millions of parameters. Instead, we propose to induce a
distribution that captures the uncertainty over neural networks by augmenting
each layer's inputs with latent variables. We present appropriate input
distributions and demonstrate state-of-the-art performance in terms of
calibration, robustness and uncertainty characterisation over large-scale,
multi-million parameter image classification tasks.
Authors' comments: 8 pages
Marc Abeille, Louis Faury, Clément Calauzènes
Logistic Bandits have recently attracted substantial attention, by providing
an uncluttered yet challenging framework for understanding the impact of
non-linearity in parametrized bandits. It was shown by Faury et al. (2020) that
the learning-theoretic difficulties of Logistic Bandits can be embodied by a
(sometimes prohibitively) large problem-dependent constant $\kappa$,
characterizing the magnitude of the reward's non-linearity. In this paper we
introduce a novel algorithm for which we provide a refined analysis. This
allows for a better characterization of the effect of non-linearity and yields
improved problem-dependent guarantees. In the most favorable cases this leads to a
regret upper-bound scaling as $\tilde{\mathcal{O}}(d\sqrt{T/\kappa})$, which
dramatically improves over the $\tilde{\mathcal{O}}(d\sqrt{T}+\kappa)$
state-of-the-art guarantees. We prove that this rate is minimax-optimal by
deriving a $\Omega(d\sqrt{T/\kappa})$ problem-dependent lower-bound. Our
analysis identifies two regimes (permanent and transitory) of the regret, which
ultimately reconciles Faury et al. (2020) with the Bayesian approach of
Dong et al. (2019). In contrast to previous works, we find that in the
permanent regime non-linearity can dramatically ease the
exploration-exploitation trade-off. While it also impacts the length of the
transitory phase in a problem-dependent fashion, we show that this impact is
mild in most reasonable configurations.
Authors' comments: 40 pages. AISTATS 2021, oral
Sean McBane, Youngsoo Choi
Lattice-type structures can provide a combination of stiffness with light
weight that is desirable in a variety of applications. Design optimization of
these structures must rely on approximations of the governing physics to render
solution of a mathematical model feasible. In this paper, we propose a topology
optimization (TO) formulation that approximates the governing physics using
component-wise reduced order modeling, which can reduce solution time by
multiple orders of magnitude over a full-order finite element model while
providing a relative error in the solution of less than one percent. In
addition, the offline training data set from such component-wise models is
reusable, allowing its application to many design problems for only the cost of
a single offline training phase, and the component-wise method is nearly
embarrassingly parallel. We also show how the parameterization chosen in our
optimization allows a simplification of the component-wise reduced order model
(CWROM) not noted in previous literature, for further speedup of the
optimization process. The sensitivity of the compliance with respect to the
particular parameterization is derived solely at the component level. In
numerical examples, we demonstrate a 1000x speedup over a full-order FEM model
with relative error of less than one percent and show minimum compliance
designs for two different cantilever beam examples, one smaller and one larger.
Finally, error bounds for displacement field, compliance, and compliance
sensitivity of the CWROM are derived.
Authors' comments: 27 pages, 11 figures