Xiang Gao, Wei Hu, Guo-Jun Qi
Recent advances in Graph Convolutional Neural Networks (GCNNs) have shown their efficiency for non-Euclidean data on graphs, but these models often require a large amount of labeled data, which is costly to obtain. It is thus critical in practice to learn graph feature representations in an unsupervised manner. To this end, we propose a novel unsupervised learning of Graph Transformation Equivariant Representations (GraphTER), aiming to capture intrinsic patterns of graph structure under both global and local transformations. Specifically, we sample different groups of nodes from a graph and then transform them node-wise, either isotropically or anisotropically. Then, we self-train a representation encoder to capture the graph structures by reconstructing these node-wise transformations from the feature representations of the original and transformed graphs. In experiments, we apply the learned GraphTER to graphs of 3D point cloud data, and results on point cloud segmentation/classification show that GraphTER significantly outperforms state-of-the-art unsupervised approaches and comes much closer to the upper bound set by fully supervised counterparts. The code is available at: https://github.com/gyshgx868/graph-ter.
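The node sampling and node-wise transformation step described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the released GraphTER code: each sampled node receives its own random translation; "isotropic" means the same noise scale on every axis and "anisotropic" a different scale per axis; "local" sampling takes a seed node and its nearest neighbours, while global sampling draws nodes uniformly. The function name `sample_and_transform` and its parameters are hypothetical.

```python
import numpy as np

def sample_and_transform(points, ratio=0.25, local=False, isotropic=True,
                         scale=0.1, rng=None):
    """Perturb a sampled subset of nodes (here: 3D points) node-wise.

    Illustrative assumptions: every sampled node gets its own random translation;
    'isotropic' uses the same std `scale` on every axis, 'anisotropic' draws a
    different std per axis. 'local' samples a seed node and its nearest
    neighbours; otherwise nodes are drawn uniformly from the whole graph.
    Returns the transformed points and the per-node offsets, i.e. the target
    that a transformation decoder would try to regress.
    """
    rng = np.random.default_rng() if rng is None else rng
    points = np.asarray(points, dtype=float)
    n, dim = points.shape
    num_sampled = int(round(ratio * n))
    if local:
        seed = rng.integers(n)
        dists = np.linalg.norm(points - points[seed], axis=1)
        idx = np.argsort(dists)[:num_sampled]          # seed plus its nearest neighbours
    else:
        idx = rng.choice(n, size=num_sampled, replace=False)
    stds = np.full(dim, scale) if isotropic else scale * rng.uniform(0.5, 1.5, size=dim)
    offsets = np.zeros_like(points)
    offsets[idx] = rng.normal(size=(num_sampled, dim)) * stds
    return points + offsets, offsets
```

In a GraphTER-style setup, an encoder would see both the original and the transformed graphs (e.g. kNN graphs built on each point set) and a decoder would be trained to predict `offsets`; that training loop is omitted here.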
Rafael Díaz Hernández Rojas, Giorgio Parisi, Federico Ricci-Tersenghi
Jamming is a phenomenon shared by a wide variety of systems, such as granular
materials, foams, and glasses in their high density regime. This has motivated
the development of a theoretical framework capable of explaining many of their
static critical properties with a unified approach. However, the dynamics
occurring in the vicinity of the jamming point has received little attention,
and the problem of connecting it with the local structure of the
configuration remains unexplored. Here we address this issue by constructing
physically well-defined structural variables using the information contained in
the network of contacts of jammed configurations, and then showing that such
variables yield a resilient statistical description of the particle-wise
dynamics near this critical point. Our results are based on extensive numerical
simulations of systems of spherical particles that allow us to statistically
characterize the trajectories of individual particles in terms of their first
two moments. We first demonstrate that, besides displaying a broad distribution
of mobilities, particles may also have preferential directions of motion. Next,
we associate each of these features with a structural variable computed
solely in terms of the contact vectors at jamming, obtaining notably high
statistical correlations. The robustness of our approach is confirmed by
testing two types of dynamical protocols, namely Molecular Dynamics and Monte
Carlo, with different types of interaction. We also provide evidence that the
dynamical regime we study here is dominated by anharmonic effects and therefore
cannot be properly described in terms of vibrational modes. Finally, we show
that correlations decay slowly and in an interaction-independent fashion,
suggesting a universal rate of information loss.
Authors' comments: Same as published version; better figures placement
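The characterization of individual trajectories by their first two moments, with a mobility and a preferential direction of motion, can be illustrated with a short sketch. The specific definitions here are our own illustrative choices, not the authors': mobility is taken as the trace of the positional covariance, the preferred direction as its leading eigenvector, and an anisotropy index as the share of variance along that direction. The function name is hypothetical.

```python
import numpy as np

def trajectory_moments(traj):
    """Summarize one particle's trajectory by its first two moments.

    traj: array of shape (T, d), d >= 2, positions of one particle over time.
    Returns the mean position (first moment), the covariance matrix (second
    central moment), a scalar mobility proxy (trace of the covariance), the
    dominant direction of motion (leading eigenvector) and an anisotropy index
    (1/d for isotropic motion, close to 1 for motion along a single axis).
    """
    traj = np.asarray(traj, dtype=float)
    mean = traj.mean(axis=0)
    cov = np.cov(traj, rowvar=False)
    mobility = float(np.trace(cov))
    evals, evecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    direction = evecs[:, -1]
    anisotropy = float(evals[-1] / evals.sum())
    return mean, cov, mobility, direction, anisotropy
```

Structural variables built from the contact network would then be correlated, particle by particle, with quantities such as `mobility` and `direction`.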
Mengzhuo Guo, Zhongzhi Xu, Qingpeng Zhang, Xiuwu Liao, Jiapeng Liu
Ordinal regression predicts labels that exhibit a natural ordering, which is important to many managerial problems such as credit scoring and clinical diagnosis. In these problems, the ability to explain how the attributes affect the prediction is critical to users. However, most, if not all, existing ordinal regression models reduce such explanations to constant coefficients for the main and interaction effects of individual attributes. Such explanations cannot characterize the contributions of attributes at different value scales. To address this challenge, we propose a new explainable ordinal regression model, namely, the Explainable Ordinal Factorization Model (XOFM). XOFM uses piecewise linear functions to approximate the actual contributions of individual attributes and their interactions. Moreover, XOFM introduces a novel ordinal transformation process that assigns each object probabilities of belonging to multiple relevant classes, instead of fixing boundaries to differentiate classes. XOFM is based on Factorization Machines to handle the potential sparsity problem that results from discretizing the attribute scales. Comprehensive experiments with benchmark datasets and baseline models demonstrate that the proposed XOFM exhibits superior explainability and achieves state-of-the-art prediction accuracy.
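The two building blocks named above, piecewise linear attribute contributions and Factorization Machine interactions, can be sketched in isolation. This is not XOFM's exact formulation: how the discretized attributes are encoded and fed into the FM, and the ordinal transformation into class probabilities, are not reproduced. The pairwise term uses the standard O(nk) Factorization Machine identity; function names are hypothetical.

```python
import numpy as np

def piecewise_linear(x, breakpoints, values):
    """Contribution of one attribute as a piecewise linear function.

    `breakpoints` (increasing) discretize the attribute scale; `values` are the
    contributions at those points (learned in a real model). Between breakpoints
    the contribution is linearly interpolated; outside, it is clamped to the end values.
    """
    return np.interp(x, breakpoints, values)

def fm_pairwise(x, V):
    """Second-order Factorization Machine term sum_{i<j} <V[i], V[j]> x_i x_j.

    Uses the O(n k) identity 0.5 * sum_f [(sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2].
    x: (n,) feature vector, V: (n, k) latent factors.
    """
    xv = x @ V
    return 0.5 * float(np.sum(xv ** 2 - (x ** 2) @ (V ** 2)))
```

Factorizing the interaction weights through low-rank `V` is what lets such a model estimate interactions between sparse, discretized features that rarely co-occur.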
Yoshiki Toba, Satoshi Yamada, Yoshihiro Ueda, Claudio Ricci, Yuichi Terashima, Tohru Nagao, Wei-Hao Wang, Atsushi Tanimoto et al.
We report the discovery of a Compton-thick (CT) dust-obscured galaxy (DOG) at
$z$ = 0.89, WISE J082501.48+300257.2 (WISE0825+3002), observed by the Nuclear
Spectroscopic Telescope Array (NuSTAR). X-ray analysis with the XCLUMPY model
revealed that the hard X-ray luminosity in the rest-frame 2-10 keV band of
WISE0825+3002 is $L_{\rm X}$ (2-10 keV) = $4.2^{+2.8}_{-1.6} \times 10^{44}$
erg s$^{-1}$ while its hydrogen column density is $N_{\rm H}$ =
$1.0^{+0.8}_{-0.4} \times 10^{24}$ cm$^{-2}$, indicating that WISE0825+3002 is
a mildly CT active galactic nucleus (AGN). We performed spectral energy
distribution (SED) fitting with CIGALE to derive its stellar mass, star
formation rate, and infrared luminosity. The Eddington ratio estimated from the
stellar mass and the integrated best-fit SED of the AGN component is
$\lambda_{\rm Edd}$ = 0.70, which suggests that WISE0825+3002 harbors an
actively growing black hole behind a large amount of gas and dust. We found
that the relation between the X-ray to 6 $\mu$m luminosity ratio and the
Eddington ratio follows an empirical relation for AGNs reported by Toba et al.
(2019a).
Authors' comments: 10 pages, 7 figures, and 2 tables, accepted for publication in ApJ
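The Eddington ratio quoted above compares the AGN bolometric luminosity with the Eddington luminosity, $L_{\rm Edd} \simeq 1.26 \times 10^{38}\,(M_{\rm BH}/M_\odot)$ erg s$^{-1}$. A minimal sketch, assuming the bolometric luminosity (e.g. from integrating the AGN component of the SED) and a black-hole mass (e.g. from a stellar-mass scaling relation, which we do not reproduce here) are already available; function names are hypothetical.

```python
import math

L_EDD_PER_MSUN = 1.26e38  # erg/s per solar mass (Eddington limit for ionized hydrogen)

def eddington_ratio(l_bol_erg_s, m_bh_msun):
    """lambda_Edd = L_bol / L_Edd, with L_Edd = 1.26e38 (M_BH / M_sun) erg/s."""
    return l_bol_erg_s / (L_EDD_PER_MSUN * m_bh_msun)

def log_lx_over_l6um(l_x_erg_s, l_6um_erg_s):
    """log10 of the rest-frame 2-10 keV X-ray to 6 micron luminosity ratio."""
    return math.log10(l_x_erg_s / l_6um_erg_s)
```

Plotting `log_lx_over_l6um` against `eddington_ratio` for a sample is the kind of comparison with the Toba et al. (2019a) relation described in the abstract.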
Yisheng He, Wei Sun, Haibin Huang, Jianran Liu, Haoqiang Fan, Jian Sun
In this work, we present a novel data-driven method for robust 6DoF object
pose estimation from a single RGBD image. Unlike previous methods that directly
regress pose parameters, we tackle this challenging task with a
keypoint-based approach. Specifically, we propose a deep Hough voting network
to detect 3D keypoints of objects and then estimate the 6D pose parameters
via least-squares fitting. Our method is a natural extension of
2D-keypoint approaches that work successfully for RGB-based 6DoF estimation. It
allows us to fully exploit the geometric constraints of rigid objects with the
extra depth information and is easy for a network to learn and optimize.
Extensive experiments were conducted to demonstrate the effectiveness of
3D-keypoint detection in the 6D pose estimation task. Experimental results also
show that our method outperforms the state-of-the-art methods by large margins on
several benchmarks. Code and video are available at
https://github.com/ethnhe/PVN3D.git.
Authors' comments: Accepted to Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, 2020. (CVPR 2020)
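Once 3D keypoints have been detected, the least-squares pose fitting step can be written with the standard SVD-based (Kabsch/Umeyama, without scale) solution. The sketch below assumes known keypoint correspondences between the object model and the camera frame; the Hough voting network that produces the keypoints is not shown, and the function name is hypothetical.

```python
import numpy as np

def fit_rigid_pose(model_kps, detected_kps):
    """Least-squares R, t minimizing sum_i ||R @ m_i + t - d_i||^2.

    model_kps:    (K, 3) keypoints in the object frame.
    detected_kps: (K, 3) corresponding keypoints detected in the camera frame.
    """
    mu_m = model_kps.mean(axis=0)
    mu_d = detected_kps.mean(axis=0)
    H = (model_kps - mu_m).T @ (detected_kps - mu_d)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Reflection correction so that det(R) = +1 (a proper rotation).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_m
    return R, t
```

Because the fit is closed-form, the network only has to learn keypoint detection; the rigid-body constraint is enforced exactly by the solver.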
Anis Elgabli, Jihong Park, Sabbir Ahmed, Mehdi Bennis
This article proposes a communication-efficient decentralized deep learning
algorithm, coined layer-wise federated group ADMM (L-FGADMM). To minimize an
empirical risk, every worker in L-FGADMM periodically communicates with two
neighbors, with the communication period adjusted separately for each layer of
its deep neural network. A constrained optimization problem for this setting is
formulated and solved using the stochastic version of GADMM proposed in our
prior work. Numerical evaluations show that by exchanging the largest layer
less frequently, L-FGADMM can significantly reduce the communication cost
without compromising the convergence speed. Surprisingly, despite less
exchanged information and decentralized operations, intermittently skipping the
largest layer consensus in L-FGADMM creates a regularizing effect, thereby
achieving test accuracy as high as that of federated learning (FL), a baseline
method that reaches consensus on all layers with the aid of a central entity.
Authors' comments: 6 pages; 4 figures; presented at IEEE WCNC'2020
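The communication pattern described above (a chain of workers, each talking to two neighbors, with a separate exchange period per layer) can be illustrated by counting transmitted parameters. This sketch only models the schedule and its cost, assuming every worker sends each scheduled layer to each neighbor once per iteration; the ADMM primal and dual updates are not shown, and all names are hypothetical.

```python
def chain_neighbors(worker, num_workers):
    """GADMM-style chain topology: each worker talks to its left and right neighbor."""
    return [j for j in (worker - 1, worker + 1) if 0 <= j < num_workers]

def layers_to_send(iteration, periods):
    """Indices of layers exchanged at this iteration; periods[l] = p means every p iterations."""
    return [l for l, p in enumerate(periods) if iteration % p == 0]

def total_comm_cost(num_iters, layer_sizes, periods, num_workers):
    """Total number of parameters sent over all worker-to-neighbor links."""
    links = sum(len(chain_neighbors(w, num_workers)) for w in range(num_workers))
    per_iter = [sum(layer_sizes[l] for l in layers_to_send(k, periods))
                for k in range(num_iters)]
    return links * sum(per_iter)
```

For example, with hypothetical layer sizes `[100, 10_000, 50]` on 3 workers over 10 iterations, raising the period of the 10,000-parameter layer from 1 to 5 cuts the count from 406,000 to 86,000 parameters, which is the kind of saving the abstract attributes to exchanging the largest layer less often.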
Haoming Jiang, Chen Liang, Chong Wang, Tuo Zhao
Many multi-domain neural machine translation (NMT) models achieve knowledge transfer by forcing a single encoder to learn shared embeddings across domains. However, this design lacks adaptation to individual domains. To overcome this limitation, we propose a novel multi-domain NMT model using individual modules for each domain, on which we apply word-level, adaptive and layer-wise domain mixing. We first observe that words in a sentence are often related to multiple domains. Hence, we assume each word has a domain proportion, which indicates its domain preference. Then word representations are obtained by mixing their embeddings in individual domains based on their domain proportions. We show this can be achieved by carefully designing multi-head dot-product attention modules for different domains, and eventually taking weighted averages of their parameters using word-level, layer-wise domain proportions. Through this, we can achieve effective domain knowledge sharing and capture fine-grained domain-specific knowledge as well. Our experiments show that our proposed model outperforms existing ones in several NMT tasks.
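The word-level mixing of domain-specific parameters can be sketched for a single linear projection (such as a query, key or value projection inside attention). Assumptions for illustration: each word's domain proportion comes from a hypothetical linear gate `W_gate` followed by a softmax, and each word then uses its own proportion-weighted average of the per-domain matrices. For a linear map this equals averaging the per-domain outputs, which the test checks.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def domain_mixed_projection(H, W_domains, W_gate):
    """Word-level domain mixing of one linear projection (a sketch).

    H:         (n_words, d_in) word representations at this layer.
    W_domains: (D, d_in, d_out) one projection matrix per domain.
    W_gate:    (d_in, D) hypothetical gate producing each word's domain proportion.
    """
    P = softmax(H @ W_gate)                              # (n_words, D), rows sum to 1
    W_mixed = np.einsum("nd,dio->nio", P, W_domains)      # per-word averaged parameters
    return np.einsum("ni,nio->no", H, W_mixed), P
```

Because the proportions are computed per word and per layer, the same word can lean toward different domains at different depths of the network.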
Ahmed Ben Saad, Youssef Tamaazousti, Josselin Kherroubi, Alexis He
We tackle the problem of texture inpainting, where the input images are textures with missing values, along with masks that indicate the zones that should be generated. Much work has been done on image inpainting with the aim of achieving global and local consistency, but these methods still suffer from limitations when dealing with textures. In fact, the local information in the image to be completed needs to be used in order to achieve local continuity and visually realistic texture inpainting. For this, we propose a new segmentor discriminator that performs a patch-wise real/fake classification and is supervised by the input masks. During training, it learns to locate the generated (fake) regions and thus backpropagates a consistent signal to the generator. We tested our approach on the publicly available DTD dataset and showed that it achieves state-of-the-art performance and handles local consistency better than existing methods.
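A mask-supervised "segmentor" discriminator can be sketched as per-pixel (or per-patch) binary classification whose target is the inpainting mask itself. This is a minimal sketch under our assumptions: the discriminator outputs one logit per location, the mask is 1 where content was generated, and the generator's adversarial term asks the generated region to be classified as real. The paper's exact losses may differ; function names are hypothetical.

```python
import numpy as np

def bce_with_logits(logits, targets):
    """Numerically stable element-wise binary cross-entropy on raw logits."""
    return np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))

def segmentor_d_loss(logits_on_inpainted, mask):
    """Discriminator loss: segment the generated region.

    logits_on_inpainted: (H, W) per-location logits for the completed image.
    mask: (H, W) with 1 where content was generated (fake), 0 where it is original.
    """
    return float(bce_with_logits(logits_on_inpainted, mask).mean())

def segmentor_g_loss(logits_on_inpainted, mask):
    """Generator loss: make the generated region look real (target 0), averaged inside the mask."""
    per_pixel = bce_with_logits(logits_on_inpainted, np.zeros_like(mask))
    return float((per_pixel * mask).sum() / max(mask.sum(), 1.0))
```

Supervising the discriminator with the mask gives the generator a spatially resolved signal: it learns which generated pixels still look fake, not just that the image as a whole does.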
Dino Ienco, Roberto Interdonato, Raffaele Gaetano
Recurrent Neural Networks (RNNs) can be seriously impacted by the initial assignment of parameters, which may result in poor generalization performance on new, unseen data. To tackle this crucial issue in the context of RNN-based classification, we propose a new supervised layer-wise pretraining strategy to initialize network parameters. The proposed approach leverages a data-aware strategy that sets up a taxonomy of classification problems automatically derived from the model behavior. To the best of our knowledge, despite the great interest in RNN-based classification, this is the first data-aware strategy dealing with the initialization of such models. The proposed strategy has been tested on four benchmarks coming from two different domains, i.e., Speech Recognition and Remote Sensing. Results underline the significance of our approach and point out that data-aware strategies positively support the initialization of Recurrent Neural Network-based classification models.
Zhirui Chen, Jianheng Li, Wei-Shi Zheng
The scalability problem caused by the difficulty of annotating Person Re-identification (Re-ID) datasets has become a crucial bottleneck in the development of Re-ID. To address this problem, many unsupervised Re-ID methods have recently been proposed. Nevertheless, most of these models require transfer from another auxiliary fully supervised dataset, which is still expensive to obtain. In this work, we propose a Re-ID model based on Weakly Supervised Tracklets (WST) data from various camera views, which can be inexpensively acquired by combining the fragmented tracklets of the same person in the same camera view over a period of time. We formulate our weakly supervised tracklets Re-ID model with a novel method, named deep feature-wise mutual learning (DFML), which consists of Mutual Learning on Feature Extractors (MLFE) and Mutual Learning on Feature Classifiers (MLFC). MLFE leverages two feature extractors that learn from each other to extract more robust and discriminative features. MLFC, in turn, adapts discriminative features from various camera views to each classifier. Extensive experiments demonstrate the superiority of our proposed DFML over state-of-the-art unsupervised models and even some supervised models on three Re-ID benchmark datasets.
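The idea of two networks learning from each other can be illustrated with the generic deep mutual learning objective: each peer minimizes its own cross-entropy plus a KL term pulling its predictions toward the other peer's. This is the standard, generic formulation, not DFML's feature-level MLFE/MLFC losses, which are not specified here; function names are hypothetical.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)          # shift for numerical stability
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def mutual_learning_losses(logits_a, logits_b, labels):
    """Losses for two peer classifiers a and b trained together.

    Each peer minimizes its cross-entropy plus KL(p_other || p_self); in training,
    the other peer's prediction is treated as a fixed target for this term.
    logits_*: (N, C) class scores; labels: (N,) integer identities.
    """
    la, lb = log_softmax(logits_a), log_softmax(logits_b)
    pa, pb = np.exp(la), np.exp(lb)
    idx = np.arange(len(labels))
    ce_a, ce_b = -la[idx, labels].mean(), -lb[idx, labels].mean()
    kl_b_to_a = (pb * (lb - la)).sum(axis=1).mean()   # KL(p_b || p_a)
    kl_a_to_b = (pa * (la - lb)).sum(axis=1).mean()   # KL(p_a || p_b)
    return ce_a + kl_b_to_a, ce_b + kl_a_to_b
```

When the two peers agree, the KL terms vanish and each loss reduces to plain cross-entropy; disagreement adds a penalty that transfers knowledge between them.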
Adam K. Leroy, Karin M. Sandstrom, Dustin Lang, Alexia Lewis, Samir Salim, Erica A. Behrens, Jérémy Chastenet, I-Da Chiang et al.
We present an atlas of ultraviolet and infrared images of ~15,750 local (d <
50 Mpc) galaxies, as observed by NASA's WISE and GALEX missions. These maps
have matched resolution (FWHM 7.5'' and 15''), matched astrometry, and a common
procedure for background removal. We demonstrate that they agree well with
resolved intensity measurements and integrated photometry from previous
surveys. This atlas represents the first part of a program (the z=0
Multi-wavelength Galaxy Synthesis) to create a large, uniform database of
resolved measurements of gas and dust in nearby galaxies. The images and
associated catalogs are publicly available at the NASA/IPAC Infrared Science
Archive. This atlas allows us to estimate local and integrated star formation
rates (SFRs) and stellar masses (M$_\star$) across the local galaxy population
in a uniform way. In the appendix, we use the population synthesis fits of
Salim et al. (2016, 2018) to calibrate integrated M$_\star$ and SFR estimators
based on GALEX and WISE. Because they leverage an SDSS-based training set of
>100,000 galaxies, these calibrations have high precision and allow us to
rigorously compare local galaxies to Sloan Digital Sky Survey results. We
provide these SFR and M$_\star$ estimates for all galaxies in our sample and
show that our results yield a "main sequence" of star forming galaxies
comparable to previous work. We also show the distribution of intensities from
resolved galaxies in NUV-to-WISE1 vs. WISE1-to-WISE3 space, which captures much
of the key physics accessed by these bands.
Authors' comments: 46 pages, 27 figures, published in ApJS
(https://ui.adsabs.harvard.edu/abs/2019ApJS..244...24L/abstract ). See that
version for full resolution figures and machine readable tables. Go download
data for your favorite nearby galaxy here:
https://irsa.ipac.caltech.edu/data/WISE/z0MGS/overview.html . The appendix
presents detailed analysis of translations to physical quantities
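SFR estimators of the kind calibrated in the appendix typically combine band luminosities linearly. The sketch below converts a flux density in Jy to $\nu L_\nu$ at a given wavelength and distance, and forms a weighted sum $\mathrm{SFR} = \sum_b C_b\,(\nu L_\nu)_b$. The band keys are hypothetical and the coefficients $C_b$ must be supplied from a calibration; we deliberately do not assume the paper's values.

```python
import math

C_CM_S = 2.99792458e10      # speed of light [cm/s]
MPC_CM = 3.0857e24          # 1 Mpc [cm]
JY_CGS = 1e-23              # 1 Jy [erg s^-1 cm^-2 Hz^-1]

def nu_l_nu(flux_jy, wavelength_um, distance_mpc):
    """nu * L_nu [erg/s] from a flux density in Jy at a given wavelength and distance."""
    nu = C_CM_S / (wavelength_um * 1e-4)          # 1 micron = 1e-4 cm
    l_nu = 4.0 * math.pi * (distance_mpc * MPC_CM) ** 2 * flux_jy * JY_CGS
    return nu * l_nu

def hybrid_sfr(nu_l_nu_by_band, coefficients):
    """SFR = sum_b C_b * (nu L_nu)_b; `coefficients` must come from a calibration."""
    return sum(coefficients[b] * nu_l_nu_by_band[b] for b in coefficients)
```

Keeping the coefficients as an explicit input makes it easy to swap between single-band and hybrid (e.g. UV plus mid-infrared) estimators.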
J. Chae, S. -N. Hong
We propose a novel greedy algorithm for the support recovery of a sparse signal from a small number of noisy measurements. In the proposed method, a new support index is identified in each iteration based on bit-wise maximum a posteriori (B-MAP) detection. This is optimal in the sense of detecting one of the remaining support indices, provided that all the indices detected in previous iterations are correct. Despite its optimality, computing the maximization metric (i.e., the a posteriori probability of each remaining support index) is expensive, because it requires marginalizing over a high-dimensional sparse vector. We address this problem by presenting a good proxy for the maximization metric (named the B-MAP proxy), which is accurate enough to find the maximizing index even though it is not an exact probability. Moreover, it is easily evaluated using only vector correlations, as in orthogonal matching pursuit (OMP), but with completely different proxy matrices. We demonstrate that the proposed B-MAP detection provides a significant gain over existing methods such as OMP and MAP-OMP, with the same complexity. Subsequently, we construct advanced greedy algorithms based on the B-MAP proxy by leveraging the ideas of compressive sampling matching pursuit (CoSaMP) and subspace pursuit (SP). Via simulations, we show that the proposed method also outperforms OMP and MAP-OMP within the frameworks of these advanced greedy algorithms.
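For reference, the OMP baseline that B-MAP is compared against can be sketched in a few lines. This is plain OMP, not the proposed B-MAP detector: the correlation score `np.abs(A.T @ residual)` is exactly the step where B-MAP would substitute its own proxy metric, whose proxy matrices are not given in the abstract.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily pick k support indices of x in y = A x + noise."""
    residual = y.copy()
    support = []
    for _ in range(k):
        # Correlation metric; B-MAP would replace this score with its proxy.
        scores = np.abs(A.T @ residual)
        scores[support] = -np.inf                 # never reselect a chosen index
        support.append(int(np.argmax(scores)))
        # Re-fit all selected coefficients by least squares (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return sorted(support), x_hat
```

Since each iteration commits to one index, an early wrong pick cannot be undone; CoSaMP and SP relax this by adding and pruning several candidates per iteration, which is the framework the abstract extends with the B-MAP proxy.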
Mina Basirat, Peter M. Roth
Deep neural networks have paved the way for significant improvements in
visual categorization in recent years. However, even though the tasks are
highly varying, differing in complexity and difficulty, existing solutions
mostly build on the same architectural decisions. This also applies to the
selection of activation functions (AFs), where most approaches build on
Rectified Linear Units (ReLUs). In this paper, however, we show that the choice
of a proper AF has a significant impact on the classification accuracy, in
particular, if fine, subtle details are of relevance. Therefore, we propose to
model the degree of absence and presence of features via the AF using
piecewise linear functions, which we refer to as L*ReLU. In this way, we can
ensure the required properties, while still inheriting the benefits in terms of
computational efficiency from ReLUs. We demonstrate our approach for the task
of Fine-grained Visual Categorization (FGVC), running experiments on seven
different benchmark datasets. The results not only demonstrate superior
accuracy but also show that different AFs are selected for tasks with
different characteristics.
Authors' comments: Accepted: Winter Conference on Applications of Computer Vision (WACV)
2020
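Piecewise linear activation functions of the kind discussed above keep ReLU's cheap evaluation while letting the negative part carry information. The sketch shows the simplest two-piece family, identity for non-negative inputs and a slope `neg_slope` for negative ones. How L*ReLU chooses its slope for a given task is not described in the abstract, so `neg_slope` here is a hypothetical free parameter, not the paper's rule.

```python
import numpy as np

def piecewise_linear_relu(x, neg_slope):
    """Two-piece linear activation: x for x >= 0, neg_slope * x for x < 0.

    neg_slope = 0 recovers ReLU; a small positive value gives a Leaky ReLU.
    """
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, x, neg_slope * x)
```

Evaluating it costs one comparison and one multiply per element, the same order as ReLU itself.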
T. H. Jarrett, M. E. Cluver, M. J. I. Brown, D. A. Dale, C. W. Tsai, F. Masci
We present mid-infrared photometry and measured global properties of the 100
largest galaxies in the sky, including the Magellanic Clouds, Local Group
galaxies M31 and M33, the Fornax and Virgo Galaxy Cluster giants, and many of
the most spectacular Messier objects (e.g., M51 and M83). This is the first
release of a larger catalog of extended sources as imaged in the mid-infrared,
called the WISE Extended Source Catalogue (WXSC). In this study we measure
their global attributes, including integrated flux, surface brightness and
radial distribution. The largest of the large are the LMC, SMC and the
Andromeda Galaxy, which are also the brightest mid-infrared galaxies in the
sky. We interrogate the large galaxies using WISE colors, which serve as
proxies for four general types of galaxies: bulge-dominated spheroidals,
intermediate semi-quiescent disks, star-forming spirals, and AGN-dominated. The
colors reveal a tight "sequence" that spans 5 magnitudes in W2-W3 color,
ranging from early to late types and from low to high star-forming activity; we fit
the functional form ${\rm (W1-W2)} = 0.015 \times {\rm e}^{{\rm (W2-W3)}/1.38} - 0.08$.
Departures from this sequence may reveal
nuclear, starburst, and merging events. Physical properties and luminosity
attributes are computed, notably the diameter, aggregate stellar mass and the
dust-obscured star formation activity. We introduce the 'pinwheel' diagram
which depicts physical properties with respect to the median value observed for
WISE galaxies in the local universe. Utilized with the WXSC, this diagram will
delineate between different kinds of galaxies, identifying those with similar
star formation and structural properties. Finally, we present the mid-infrared
photometry of the 25 brightest globular clusters in the sky, including Omega
Centauri, 47 Tucanae and a number of famed night-sky targets (e.g. M 13).
(Abridged)
Authors' comments: 45 pages, 25 figures, 6 tables. Accepted for publication in ApJS.
High quality graphics, tables and ancillary material are available at the
following URL: https://vislab.idia.ac.za/research
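The fitted color sequence can be evaluated directly, and a galaxy's offset from it computed as observed minus predicted (W1-W2). This is a minimal sketch using only the coefficients quoted above; any threshold for flagging departures (nuclear, starburst or merging activity) is left to the user, and the function names are hypothetical.

```python
import numpy as np

def wise_sequence_w1w2(w2w3):
    """(W1-W2) predicted by the fitted WISE color sequence for a given (W2-W3), in mag."""
    return 0.015 * np.exp(np.asarray(w2w3, dtype=float) / 1.38) - 0.08

def sequence_offset(w1w2, w2w3):
    """Observed minus predicted (W1-W2), in mag; large |offset| marks a departure."""
    return np.asarray(w1w2, dtype=float) - wise_sequence_w1w2(w2w3)
```

Both functions accept scalars or arrays, so a whole catalogue of colors can be processed at once.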
Yihui He, Jianing Qian, Jianren Wang, Cindy X. Le, Congrui Hetang, Qi Lyu, Wenping Wang, Tianwei Yue
Very deep convolutional neural networks (CNNs) have been firmly established as the primary methods for many computer vision tasks. However, most state-of-the-art CNNs are large, which results in high inference latency. Recently, depth-wise separable convolution has been proposed for image recognition tasks on computationally limited platforms such as robots and self-driving cars. Although it is much faster than its counterpart, regular convolution, it sacrifices accuracy. In this paper, we propose a novel decomposition approach based on SVD, namely depth-wise decomposition, for expanding regular convolutions into depth-wise separable convolutions while maintaining high accuracy. We show that our approach can be further generalized to the multi-channel and multi-layer cases, based on Generalized Singular Value Decomposition (GSVD) [59]. We conduct thorough experiments with the latest ShuffleNet V2 model [47] on both a randomly synthesized dataset and a large-scale image recognition dataset, ImageNet [10]. Our approach outperforms channel decomposition [73] on all datasets. More importantly, our approach improves the Top-1 accuracy of ShuffleNet V2 by ~2%.
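The core SVD idea can be sketched for a single layer: a depth-wise separable pair (a depth-wise kernel `D[c]` per input channel followed by 1x1 pointwise weights `P[o, c]`) is equivalent to a regular convolution with kernel `W[o, c] = P[o, c] * D[c]`, so the best such approximation per input channel is a rank-1 SVD fit. This is our reading of the basic case only; the paper's GSVD generalization to multi-channel and multi-layer settings is not shown, and the function names are hypothetical.

```python
import numpy as np

def depthwise_decompose(W):
    """Approximate a regular conv kernel by a depth-wise + pointwise pair.

    W: (C_out, C_in, kh, kw) regular convolution weights.
    Returns depth-wise kernels D (C_in, kh, kw) and pointwise weights P (C_out, C_in)
    such that W[o, c] ~= P[o, c] * D[c], using the best rank-1 fit per input channel.
    """
    c_out, c_in, kh, kw = W.shape
    D = np.empty((c_in, kh, kw))
    P = np.empty((c_out, c_in))
    for c in range(c_in):
        M = W[:, c].reshape(c_out, kh * kw)     # rows: output channels, cols: spatial taps
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        P[:, c] = s[0] * U[:, 0]
        D[c] = Vt[0].reshape(kh, kw)
    return D, P

def reconstruct(D, P):
    """Regular-conv kernel equivalent to depth-wise D followed by pointwise P."""
    return np.einsum("oc,chw->ochw", P, D)
```

If a layer happens to be exactly separable, the decomposition is lossless; otherwise the reconstruction error equals the energy in the discarded singular values of each channel's matrix.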
Batiste Le Bars, Pierre Humbert, Argyris Kalogeratos, Nicolas Vayatis
This work focuses on the estimation of multiple change-points in a
time-varying Ising model that evolves in a piecewise constant manner. The aim is to
identify both the moments at which significant changes occur in the Ising
model, as well as the underlying graph structures. For this purpose, we propose
to estimate the neighborhood of each node by maximizing a penalized version of
its conditional log-likelihood. The objective of the penalization is twofold:
it imposes sparsity in the learned graphs and, thanks to a fused-type penalty,
it also forces them to evolve in a piecewise constant manner. Under few
assumptions, we provide two change-point consistency theorems. These are the
first such results for detecting an unknown number of change points in
time-varying Ising models. Finally, experimental results on several synthetic datasets and a
real-world dataset demonstrate the performance of our method.
Authors' comments: 18 pages (9 pages for Appendix), 4 figures, 2 tables
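The node-wise penalized objective can be sketched for one node $i$ of a zero-field Ising model, where $P(x_i \mid x_{\setminus i}) = 1/(1 + e^{-2 x_i \theta_t^\top x_{\setminus i}})$. Assumptions for illustration: an L1 penalty for sparsity plus a fused penalty given by the L2 norm of consecutive parameter differences (one common fused-type choice; the paper's exact penalty and its optimizer are not reproduced). Change points are the times where the estimated parameters jump. Function names are hypothetical.

```python
import numpy as np

def node_objective(theta, X, i, lam_sparse, lam_fused):
    """Penalized negative conditional log-likelihood for node i (a sketch).

    X:     (T, p) spins in {-1, +1}, one observation per time step.
    theta: (T, p-1) time-varying neighbourhood weights of node i.
    Loss term: -log P(x_i | x_-i) = log(1 + exp(-2 x_i <theta_t, x_-i>)).
    """
    x_i = X[:, i]
    x_rest = np.delete(X, i, axis=1)
    margins = 2.0 * x_i * np.sum(theta * x_rest, axis=1)
    nll = np.sum(np.logaddexp(0.0, -margins))
    sparse = lam_sparse * np.abs(theta).sum()                                   # sparse graphs
    fused = lam_fused * np.linalg.norm(np.diff(theta, axis=0), axis=1).sum()   # piecewise constant
    return float(nll + sparse + fused)

def change_points(theta, tol=1e-8):
    """Time indices t where theta_t differs from theta_{t-1}."""
    jumps = np.linalg.norm(np.diff(theta, axis=0), axis=1)
    return [int(t) + 1 for t in np.flatnonzero(jumps > tol)]
```

Because the fused term charges for every jump, minimizing this objective tends to produce estimates that stay constant between a small number of change points, which are then read off with `change_points`.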
Maike Lorena Stern, Martin Schellenberger
Efficient quality control is indispensable in the manufacturing of
light-emitting diodes (LEDs). Because defective LED chips may be traced back to
different causes, a time- and cost-intensive electrical and optical contact
measurement is employed. Fast photoluminescence measurements, on the other
hand, are commonly used to detect wafer separation damage but also hold the
potential to enable an efficient detection of all kinds of defective LED chips.
On a photoluminescence image, every pixel corresponds to an LED chip's
brightness after photoexcitation, revealing performance information. But due to
unevenly distributed brightness values and varying defect patterns,
photoluminescence images are not yet employed for a comprehensive defect
detection. In this work, we show that fully convolutional networks can be used
for chip-wise defect detection, trained on a small dataset of
photoluminescence images. Pixel-wise labels allow us to classify each and every
chip as defective or not. Being measurement-based, labels are easy to procure
and our experiments show that existing discrepancies between training images
and labels do not hinder network training. Using a weighted loss, we
were able to compensate for our highly imbalanced classes. Due to the
consistent use of skip connections and residual shortcuts, our network is able
to predict a variety of structures, from extensive defect clusters down to single
defective LED chips.
Authors' comments: 14 pages, 12 figures
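A weighted pixel-wise loss for imbalanced classes can be sketched as cross-entropy with per-class weights. Inverse class frequency is one common weighting choice and is our assumption here; the abstract does not specify the scheme. With these weights, each class contributes the same total weight to the loss. Function names are hypothetical.

```python
import numpy as np

def inverse_frequency_weights(labels, num_classes):
    """Class weights proportional to 1 / class frequency, scaled to sum to num_classes."""
    counts = np.bincount(labels.ravel(), minlength=num_classes).astype(float)
    w = np.where(counts > 0, 1.0 / np.maximum(counts, 1.0), 0.0)   # absent classes get 0
    return w * num_classes / w.sum()

def weighted_pixel_ce(log_probs, labels, weights):
    """Weighted cross-entropy averaged over pixels.

    log_probs: (H, W, C) per-pixel log class probabilities.
    labels:    (H, W) integer class per pixel (e.g. 0 = good chip, 1 = defective).
    """
    picked = np.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]
    w = weights[labels]
    return float(-(w * picked).sum() / w.sum())
```

Without such weighting, a network can reach high pixel accuracy by predicting "not defective" everywhere, which is useless for defect detection.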
Yice Cao, Yan Wu, Peng Zhang, Wenkai Liang, Ming Li
Although complex-valued (CV) neural networks have shown better classification
results compared to their real-valued (RV) counterparts for polarimetric
synthetic aperture radar (PolSAR) classification, the extension of pixel-level
RV networks to the complex domain has not yet been thoroughly examined. This paper
presents a novel complex-valued deep fully convolutional neural network
(CV-FCN) designed for PolSAR image classification. Specifically, CV-FCN uses
PolSAR CV data that includes the phase information and utilizes the deep FCN
architecture that performs pixel-level labeling. It integrates the feature
extraction module and the classification module in a unified framework.
To account for the particularities of PolSAR data, a dedicated complex-valued
weight initialization scheme is defined to initialize CV-FCN. It considers the
distribution of polarization data to conduct CV-FCN training from scratch in an
efficient and fast manner. CV-FCN employs a complex
downsampling-then-upsampling scheme to extract dense features. To enrich
discriminative information, multi-level CV features that retain more
polarization information are extracted via the complex downsampling scheme.
Then, a complex upsampling scheme is proposed to predict dense CV labeling. It
employs complex max-unpooling layers to capture more spatial
information for better robustness to speckle noise. In addition, to achieve
faster convergence and obtain more precise classification results, a novel
average cross-entropy loss function is derived for CV-FCN optimization.
Experiments on real PolSAR datasets demonstrate that CV-FCN achieves better
classification performance than other state-of-the-art methods.
Authors' comments: 17 pages, 12 figures, first submission on May 20th, 2019
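The complex downsampling-then-upsampling scheme relies on pooling layers that remember where each maximum came from, so that unpooling can restore spatial detail. A 1D sketch follows; selecting the maximum by modulus $|z|$ is our assumption for how "max" is defined on complex values, and the 2D case used for images is analogous. Function names are hypothetical.

```python
import numpy as np

def complex_max_pool1d(z, size=2):
    """Max-pool a complex sequence by modulus; keep the argmax positions for unpooling."""
    z = np.asarray(z, dtype=complex)
    n = len(z) // size * size                     # drop an incomplete final window
    blocks = z[:n].reshape(-1, size)
    arg = np.abs(blocks).argmax(axis=1)           # position of max |z| in each window
    pooled = blocks[np.arange(len(blocks)), arg]  # keeps the full complex value (with phase)
    positions = np.arange(len(blocks)) * size + arg
    return pooled, positions

def complex_max_unpool1d(pooled, positions, length):
    """Place each pooled complex value back at its recorded position; zeros elsewhere."""
    out = np.zeros(length, dtype=complex)
    out[positions] = pooled
    return out
```

Keeping the complex value rather than only its modulus preserves the phase information that motivates working in the complex domain.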
Sawyer Birnbaum, Volodymyr Kuleshov, Zayd Enam, Pang Wei Koh, Stefano Ermon
Learning representations that accurately capture long-range dependencies in
sequential inputs -- including text, audio, and genomic data -- is a key
problem in deep learning. Feed-forward convolutional models capture only
feature interactions within finite receptive fields, while recurrent
architectures can be slow and difficult to train due to vanishing gradients.
Here, we propose Temporal Feature-Wise Linear Modulation (TFiLM) -- a novel
architectural component inspired by adaptive batch normalization and its
extensions -- that uses a recurrent neural network to alter the activations of
a convolutional model. This approach expands the receptive field of
convolutional sequence models with minimal computational overhead. Empirically,
we find that TFiLM significantly improves the learning speed and accuracy of
feed-forward neural networks on a range of generative and discriminative
learning tasks, including text classification and audio super-resolution.
Authors' comments: Presented at NeurIPS 2019
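The TFiLM mechanism described above can be sketched as: split the convolutional activations into blocks along time, summarize each block, run a recurrent network over the block summaries, and use its outputs as per-channel scale and shift for the corresponding block. Assumptions for this sketch: max-pooling as the block summary, and a vanilla tanh RNN instead of an LSTM to keep it short; the weight matrices are hypothetical untrained parameters.

```python
import numpy as np

def tfilm(x, block_len, W_h, W_x, W_out):
    """Temporal Feature-wise Linear Modulation (sketch).

    x: (T, C) activations of a conv layer; T must be a multiple of block_len.
    W_h: (H, H) recurrent weights, W_x: (C, H) input weights,
    W_out: (H, 2C) map from hidden state to per-channel (gamma, beta).
    """
    T, C = x.shape
    if T % block_len:
        raise ValueError("T must be a multiple of block_len")
    B = T // block_len
    blocks = x.reshape(B, block_len, C)
    summaries = blocks.max(axis=1)                 # (B, C) max-pooled block summaries
    h = np.zeros(W_h.shape[0])
    out = np.empty_like(blocks)
    for b in range(B):
        h = np.tanh(h @ W_h + summaries[b] @ W_x)  # RNN step over blocks
        gamma, beta = np.split(h @ W_out, 2)       # per-channel modulation parameters
        out[b] = gamma * blocks[b] + beta          # same affine map for every step in block b
    return out.reshape(T, C)
```

The RNN runs over `T / block_len` summaries rather than all `T` steps, which is why the extra cost stays small while the modulation can depend on the whole sequence seen so far.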
Betty van Aken, Benjamin Winter, Alexander Löser, Felix A. Gers
Bidirectional Encoder Representations from Transformers (BERT) achieves
state-of-the-art results in a variety of Natural Language Processing tasks.
However, understanding of its internal functioning is still insufficient and
unsatisfactory. In order to better understand BERT and other Transformer-based
models, we present a layer-wise analysis of BERT's hidden states. Unlike
previous research, which mainly focuses on explaining Transformer models by
their attention weights, we argue that hidden states contain equally valuable
information. Specifically, our analysis focuses on models fine-tuned on the
task of Question Answering (QA) as an example of a complex downstream task. We
inspect how QA models transform token vectors in order to find the correct
answer. To this end, we apply a set of general and QA-specific probing tasks
that reveal the information stored in each representation layer. Our
qualitative analysis of hidden state visualizations provides additional
insights into BERT's reasoning process. Our results show that the
transformations within BERT go through phases that are related to traditional
pipeline tasks. The system can therefore implicitly incorporate task-specific
information into its token representations. Furthermore, our analysis reveals
that fine-tuning has little impact on the models' semantic abilities and that
prediction errors can be recognized in the vector representations of even early
layers.
Authors' comments: Accepted at CIKM 2019
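A layer-wise probing analysis fits a simple classifier on each layer's hidden states and compares how well a property can be read out at each depth. The sketch below uses a closed-form ridge-regression probe with binary labels as a stand-in; the paper's probing tasks and classifiers are not reproduced, the hidden states would come from a fine-tuned BERT model, and for brevity it reports training accuracy where a real analysis would use a held-out split.

```python
import numpy as np

def linear_probe_accuracy(hidden_by_layer, labels, ridge=1e-3):
    """Fit a ridge-regression probe per layer and return its (training) accuracy.

    hidden_by_layer: list of (N, d) arrays, one per layer (e.g. token hidden states).
    labels:          (N,) binary labels in {0, 1} for one probing task.
    """
    labels = np.asarray(labels)
    y = 2.0 * labels - 1.0                              # targets in {-1, +1}
    accs = []
    for H in hidden_by_layer:
        Xb = np.hstack([H, np.ones((len(H), 1))])       # add a bias column
        w = np.linalg.solve(Xb.T @ Xb + ridge * np.eye(Xb.shape[1]), Xb.T @ y)
        accs.append(float(np.mean((Xb @ w > 0) == (labels == 1))))
    return accs
```

Plotting the returned accuracies against layer index gives the kind of per-layer profile used to argue that different layers specialize in different pipeline-like subtasks.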