Batiste Le Bars, Pierre Humbert, Argyris Kalogeratos, Nicolas Vayatis
This work focuses on the estimation of multiple change-points in a
time-varying Ising model that evolves in a piece-wise constant manner. The aim
is to identify both the moments at which significant changes occur in the
Ising model and the underlying graph structures. For this purpose, we propose
to estimate the neighborhood of each node by maximizing a penalized version of
its conditional log-likelihood. The objective of the penalization is twofold:
it imposes sparsity in the learned graphs and, thanks to a fused-type penalty,
it also forces them to evolve in a piece-wise constant manner. Under few
assumptions, we provide two change-point consistency theorems, the first in the
context of detecting an unknown number of change-points in time-varying Ising
models. Finally, experimental results on several synthetic datasets and a
real-world dataset demonstrate the performance of our method.
Authors' comments: 18 pages (9 pages for Appendix), 4 figures, 2 tables
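The node-wise penalized objective described above can be sketched as follows. This is a minimal illustration, not the authors' estimator: it assumes spins in {-1, +1}, one observation per time step, an l1 sparsity term and an l2 fused term on consecutive parameter vectors, and the function names are hypothetical. Change-points are read off as the time steps where the estimated neighborhood parameters jump.

```python
import numpy as np


def penalized_objective(theta, X, a, lam_sparse, lam_fused):
    """Penalized conditional log-likelihood of node `a` (to be maximized).

    theta: (T, p) array; theta[t, b] is the (hypothetical) weight of edge (a, b)
           at time t. Column a is ignored.
    X:     (T, p) array of spins in {-1, +1}, one observation per time step.
    """
    _, p = X.shape
    mask = np.arange(p) != a                      # neighbors of a exclude a itself
    th = theta[:, mask]
    # Local field on node a at each time: eta_t = sum_b theta_t[b] * x_t[b]
    eta = np.sum(th * X[:, mask], axis=1)
    # For +/-1 spins: log P(x_a | rest) = -log(1 + exp(-2 * x_a * eta))
    loglik = -np.sum(np.logaddexp(0.0, -2.0 * X[:, a] * eta))
    # l1 term encourages sparse neighborhoods
    sparsity = lam_sparse * np.sum(np.abs(th))
    # Fused term: l2 norm of consecutive differences, encouraging piece-wise constancy
    fused = lam_fused * np.sum(np.linalg.norm(np.diff(th, axis=0), axis=1))
    return float(loglik - sparsity - fused)


def change_points(theta, a, tol=1e-8):
    """Time indices at which the estimated neighborhood of node a changes."""
    p = theta.shape[1]
    th = theta[:, np.arange(p) != a]
    jumps = np.linalg.norm(np.diff(th, axis=0), axis=1)
    return [int(t) + 1 for t in np.flatnonzero(jumps > tol)]
```

In practice the parameters would come from maximizing this objective for every node (for example with a proximal method), and the per-node change-points would then be combined. Neither step is shown here.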
Maike Lorena Stern, Martin Schellenberger
Efficient quality control is indispensable in the manufacturing of
light-emitting diodes (LEDs). Because defective LED chips may be traced back to
different causes, a time- and cost-intensive electrical and optical contact
measurement is employed. Fast photoluminescence measurements, on the other
hand, are commonly used to detect wafer separation damages but also hold the
potential to enable an efficient detection of all kinds of defective LED chips.
On a photoluminescence image, every pixel corresponds to an LED chip's
brightness after photoexcitation, revealing performance information. But due to
unevenly distributed brightness values and varying defect patterns,
photoluminescence images are not yet employed for comprehensive defect
detection. In this work, we show that fully convolutional networks can be used
for chip-wise defect detection, trained on a small dataset of
photoluminescence images. Pixel-wise labels allow us to classify each and every
chip as defective or not. Being measurement-based, labels are easy to procure
and our experiments show that existing discrepancies between training images
and labels do not hinder network training. Using a weighted loss, we were able
to compensate for our highly imbalanced classes. Due to the
consistent use of skip connections and residual shortcuts, our network is able
to predict a variety of structures, from extensive defect clusters up to single
defective LED chips.
Authors' comments: 14 pages, 12 figures
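One common way to implement a weighted loss for highly imbalanced pixel classes is to weight each pixel's binary cross-entropy by the inverse frequency of its class (defective vs. non-defective chip). The abstract does not state which weighting the authors used, so this inverse-frequency scheme and the function names are assumptions for illustration.

```python
import numpy as np


def inverse_frequency_weights(labels):
    """Per-class weights so that both classes contribute equally to the total loss.

    Assumed scheme: w_c = n / (2 * n_c), where n_c is the pixel count of class c.
    """
    n = labels.size
    n_pos = np.count_nonzero(labels)
    n_neg = n - n_pos
    w_pos = n / (2.0 * max(n_pos, 1))
    w_neg = n / (2.0 * max(n_neg, 1))
    return w_neg, w_pos


def weighted_bce(probs, labels, w_neg, w_pos, eps=1e-7):
    """Pixel-wise weighted binary cross-entropy, averaged over all pixels."""
    p = np.clip(probs, eps, 1.0 - eps)            # avoid log(0)
    weights = np.where(labels == 1, w_pos, w_neg)
    losses = -(labels * np.log(p) + (1 - labels) * np.log(1 - p))
    return float(np.mean(weights * losses))
```

With one defective pixel out of 100, the defective pixel receives weight 50 and each non-defective pixel about 0.505, so the two classes carry the same total weight.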
Yice Cao, Yan Wu, Peng Zhang, Wenkai Liang, Ming Li
Although complex-valued (CV) neural networks have shown better classification
results compared to their real-valued (RV) counterparts for polarimetric
synthetic aperture radar (PolSAR) classification, the extension of pixel-level
RV networks to the complex domain has not yet been thoroughly examined. This paper
presents a novel complex-valued deep fully convolutional neural network
(CV-FCN) designed for PolSAR image classification. Specifically, CV-FCN uses
PolSAR CV data that includes the phase information and utilizes the deep FCN
architecture that performs pixel-level labeling. It integrates the feature
extraction module and the classification module in a unified framework.
Technically, to account for the particularities of PolSAR data, a dedicated complex-valued
weight initialization scheme is defined to initialize CV-FCN. It considers the
distribution of polarization data to conduct CV-FCN training from scratch in an
efficient and fast manner. CV-FCN employs a complex
downsampling-then-upsampling scheme to extract dense features. To enrich
discriminative information, multi-level CV features that retain more
polarization information are extracted via the complex downsampling scheme.
Then, a complex upsampling scheme is proposed to predict dense CV labeling. It
employs complex max-unpooling layers to capture more spatial
information for better robustness to speckle noise. In addition, to achieve
faster convergence and obtain more precise classification results, a novel
average cross-entropy loss function is derived for CV-FCN optimization.
Experiments on real PolSAR datasets demonstrate that CV-FCN achieves better
classification performance than other state-of-the-art methods.
Authors' comments: 17 pages, 12 figures, first submission on May 20th, 2019
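Complex-valued layers such as those in CV-FCN are commonly built from real-valued operations by expanding the complex product into its real and imaginary parts. The sketch below shows this for a dense layer; a complex convolution follows the same pattern with convolutions in place of matrix products. It is an illustration of the general technique, not the authors' implementation, and the function name is hypothetical.

```python
import numpy as np


def complex_linear(x_re, x_im, w_re, w_im):
    """Complex matrix-vector product W x computed with real-valued arithmetic only.

    (W_re + i W_im)(x_re + i x_im) = (W_re x_re - W_im x_im) + i (W_re x_im + W_im x_re)
    """
    out_re = w_re @ x_re - w_im @ x_im
    out_im = w_re @ x_im + w_im @ x_re
    return out_re, out_im
```

Keeping the real and imaginary parts as separate real tensors is what lets phase information (which a real-valued network would discard if it used only amplitudes) flow through every layer.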
Sawyer Birnbaum, Volodymyr Kuleshov, Zayd Enam, Pang Wei Koh, Stefano Ermon
Learning representations that accurately capture long-range dependencies in
sequential inputs -- including text, audio, and genomic data -- is a key
problem in deep learning. Feed-forward convolutional models capture only
feature interactions within finite receptive fields while recurrent
architectures can be slow and difficult to train due to vanishing gradients.
Here, we propose Temporal Feature-Wise Linear Modulation (TFiLM) -- a novel
architectural component inspired by adaptive batch normalization and its
extensions -- that uses a recurrent neural network to alter the activations of
a convolutional model. This approach expands the receptive field of
convolutional sequence models with minimal computational overhead. Empirically,
we find that TFiLM significantly improves the learning speed and accuracy of
feed-forward neural networks on a range of generative and discriminative
learning tasks, including text classification and audio super-resolution.
Authors' comments: Presented at NeurIPS 2019
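A rough sketch of the modulation mechanism, under assumptions not stated in the abstract: activations are split into fixed-length blocks along time, each block is max-pooled, and a recurrent network run over the pooled block summaries emits a per-channel scale (gamma) and shift (beta) for each block. A plain Elman RNN stands in for the recurrent network, and all names and parameter shapes are hypothetical.

```python
import numpy as np


def tfilm(x, block_len, rnn, out):
    """Temporal FiLM-style modulation of a (T, C) activation sequence.

    x:   (T, C) activations from a convolutional layer; T % block_len == 0.
    rnn: dict with 'W_x' (H, C), 'W_h' (H, H), 'b' (H,) for a simple Elman RNN
         (a stand-in; the original architecture may use a different recurrent cell).
    out: dict with 'W_g' (C, H), 'b_g' (C,), 'W_b' (C, H), 'b_b' (C,) mapping each
         hidden state to per-channel scale (gamma) and shift (beta).
    """
    T, C = x.shape
    blocks = x.reshape(T // block_len, block_len, C)
    pooled = blocks.max(axis=1)                   # (num_blocks, C): max-pool each block
    h = np.zeros(rnn["b"].shape[0])
    y = np.empty_like(blocks)
    for k, p in enumerate(pooled):                # the RNN steps over blocks, not time steps
        h = np.tanh(rnn["W_x"] @ p + rnn["W_h"] @ h + rnn["b"])
        gamma = out["W_g"] @ h + out["b_g"]
        beta = out["W_b"] @ h + out["b_b"]
        y[k] = gamma * blocks[k] + beta           # one affine transform for the whole block
    return y.reshape(T, C)
```

Because the RNN runs over T / block_len summaries instead of T time steps, it adds long-range context to the convolutional features at a modest cost, which matches the "minimal computational overhead" motivation.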
Betty van Aken, Benjamin Winter, Alexander Löser, Felix A. Gers
Bidirectional Encoder Representations from Transformers (BERT) models reach
state-of-the-art results in a variety of Natural Language Processing tasks.
However, understanding of their internal functioning is still insufficient and
unsatisfactory. In order to better understand BERT and other Transformer-based
models, we present a layer-wise analysis of BERT's hidden states. Unlike
previous research, which mainly focuses on explaining Transformer models by
their attention weights, we argue that hidden states contain equally valuable
information. Specifically, our analysis focuses on models fine-tuned on the
task of Question Answering (QA) as an example of a complex downstream task. We
inspect how QA models transform token vectors in order to find the correct
answer. To this end, we apply a set of general and QA-specific probing tasks
that reveal the information stored in each representation layer. Our
qualitative analysis of hidden state visualizations provides additional
insights into BERT's reasoning process. Our results show that the
transformations within BERT go through phases that are related to traditional
pipeline tasks. The system can therefore implicitly incorporate task-specific
information into its token representations. Furthermore, our analysis reveals
that fine-tuning has little impact on the models' semantic abilities and that
prediction errors can be recognized in the vector representations of even early
layers.
Authors' comments: Accepted at CIKM 2019
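Layer-wise probing can be sketched with a simple linear probe fitted separately on each layer's hidden states. This sketch assumes the hidden states are already extracted as NumPy arrays and, for brevity, uses a ridge-regression classifier scored on its own training data. The paper's actual probing tasks and classifiers are not reproduced, a real study would evaluate on held-out tokens, and the function names are hypothetical.

```python
import numpy as np


def ridge_probe_accuracy(H, y, n_classes, lam=1e-3):
    """Fit a linear (ridge-regression) probe from features H (n, d) to labels y
    and return its accuracy on the same data."""
    Y = np.eye(n_classes)[y]                      # one-hot targets, (n, k)
    d = H.shape[1]
    W = np.linalg.solve(H.T @ H + lam * np.eye(d), H.T @ Y)
    pred = np.argmax(H @ W, axis=1)
    return float(np.mean(pred == y))


def layerwise_probe(hidden_states, y, n_classes):
    """hidden_states: list of (n_tokens, d) arrays, one per layer."""
    return [ridge_probe_accuracy(H, y, n_classes) for H in hidden_states]
```

Plotting the resulting accuracies against layer index is one way to see in which layers a given kind of information becomes linearly recoverable.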
Haibo Yang, Xin Zhang, Minghong Fang, Jia Liu
In this work, we consider the resilience of distributed algorithms based on stochastic gradient descent (SGD) in distributed learning with potentially Byzantine attackers, who could send arbitrary information to the parameter server to disrupt the training process. Toward this end, we propose a new Lipschitz-inspired coordinate-wise median approach (LICM-SGD) to mitigate Byzantine attacks. We show that our LICM-SGD algorithm can resist up to half of the workers being Byzantine attackers, while still converging almost surely to a stationary region in non-convex settings. Also, our LICM-SGD method does not require any information about the number of attackers and the Lipschitz constant, which makes it attractive for practical implementations. Moreover, our LICM-SGD method enjoys the optimal $O(md)$ computational time-complexity in the sense that the time-complexity is the same as that of the standard SGD under no attacks. We conduct extensive experiments to show that our LICM-SGD algorithm consistently outperforms existing methods in training multi-class logistic regression and convolutional neural networks with MNIST and CIFAR-10 datasets. In our experiments, LICM-SGD also achieves a much faster running time thanks to its low computational time-complexity.
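The coordinate-wise median aggregation that LICM-SGD builds on can be sketched as follows. This shows only the plain coordinate-wise median over the gradients sent by the workers; the Lipschitz-inspired selection rule that distinguishes LICM-SGD is not described in the abstract and is not implemented here. Function names are hypothetical.

```python
import numpy as np


def coordinate_wise_median(grads):
    """Aggregate worker gradients of shape (m, d) by the median of each coordinate."""
    return np.median(np.asarray(grads, dtype=float), axis=0)


def sgd_step(params, grads, lr):
    """One parameter-server update using the robust aggregate instead of the mean."""
    return params - lr * coordinate_wise_median(grads)
```

With three honest workers sending [1, 1] and two attackers sending [100, -100], the median of each coordinate is still 1, so a minority of attackers cannot drag the update arbitrarily far.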
Nathan McDannold, P. Jason White, Rees Cosgrove
This work explored an element-wise approach to model transcranial MRI-guided focused ultrasound (TcMRgFUS) thermal ablation, a noninvasive approach to neurosurgery. Each element of the phased-array transducer was simulated individually and could be simultaneously loaded into computer memory, allowing for rapid calculation of the pressure field for different phase offsets used for beam steering and aberration correction. We simulated the pressure distribution for 431 sonications in 32 patients, applied the phase and magnitude values used during treatment, and estimated the resulting temperature rise. We systematically varied the relationship between CT-derived skull density and the acoustic attenuation and sound speed to obtain the best agreement between the predictions and MR temperature imaging (MRTI). The optimization was validated with simulations of 396 sonications from 40 additional treatments. After optimization, the predicted and measured heating agreed well ($R^2$ = 0.74 for patients 1-32 and 0.71 for patients 33-72). The dimensions and obliquity of the heating in the simulated temperature maps correlated well with the MRTI ($R^2$ = 0.62 and 0.74, respectively), but the measured heating was more spatially diffuse. The energy needed to achieve ablation varied by an order of magnitude (3.3-36.1 kJ). While this element-wise approach requires more computation time up front, it can be performed in parallel. It allows for rapid calculation of the three-dimensional heating at the focus for different phase and magnitude values on the array. We also show how this approach can be used to optimize the relationship between CT-derived skull density and acoustic properties. While the relationships found here need further validation in a larger patient population, these results demonstrate the promise of this approach to model TcMRgFUS.
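The element-wise idea (precomputing each element's field once and then combining the fields for any set of phases and magnitudes) can be sketched as a linear superposition. This assumes linear acoustics, so that element fields add, and a precomputed complex pressure field per element on a set of sample points. The thermal model and the skull-property optimization are not shown, and all names are hypothetical. Phase conjugation at a target point is included as one simple way to choose steering phases.

```python
import numpy as np


def total_field(element_fields, amplitudes, phases):
    """Superpose precomputed per-element complex pressure fields.

    element_fields: (n_elements, n_points) complex pressure of each element driven
                    alone with unit amplitude and zero phase.
    amplitudes, phases: (n_elements,) drive magnitude and phase offset per element.
    """
    drive = amplitudes * np.exp(1j * phases)      # complex drive of each element
    return drive @ element_fields                 # (n_points,) complex pressure


def focusing_phases(element_fields, target_index):
    """Phase-conjugate each element so all contributions arrive in phase at the target."""
    return -np.angle(element_fields[:, target_index])
```

Only the cheap weighted sum has to be recomputed when the phases or magnitudes change, which is what makes evaluating many sonications fast once the per-element simulations exist.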
Byeongmoon Ji, Hyemin Jung, Jihyeun Yoon, Kyungyul Kim, Younghak Shin
The prediction reliability of neural networks is important in many applications. Specifically, in safety-critical domains, such as cancer prediction or autonomous driving, reliable confidence in a model's predictions is critical for interpreting the results. Modern deep neural networks have achieved significant improvements in performance on many different image classification tasks. However, these networks tend to be poorly calibrated in terms of output confidence. Temperature scaling is an efficient post-processing calibration scheme that yields well-calibrated results. In this study, we leverage the concept of temperature scaling to build a sophisticated bin-wise scaling method. Furthermore, we augment the validation samples to refine the scaling. The proposed methods consistently improve calibration performance across various datasets and deep convolutional neural network models.
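Standard temperature scaling, which the proposed bin-wise method builds on, can be sketched as follows: a single temperature T divides the logits and is chosen to minimize the negative log-likelihood on validation data. The grid search (instead of gradient-based fitting) and the equal-width-bin expected calibration error (ECE) are simplifying assumptions. The paper's bin-wise scaling and validation augmentation are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)          # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def nll(logits, labels, T):
    p = softmax(logits, T)
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12)))


def fit_temperature(logits, labels, grid=np.linspace(0.25, 5.0, 96)):
    """Pick the single temperature T that minimizes validation NLL (grid search)."""
    losses = [nll(logits, labels, T) for T in grid]
    return float(grid[int(np.argmin(losses))])


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean |accuracy - confidence| over equal-width confidence bins."""
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = abs(np.mean(pred[in_bin] == labels[in_bin]) - np.mean(conf[in_bin]))
            ece += in_bin.mean() * gap
    return float(ece)
```

T > 1 softens overconfident predictions and T < 1 sharpens underconfident ones; because dividing by T never changes the argmax, accuracy is unaffected. A bin-wise variant would, roughly speaking, allow different scaling per confidence bin instead of one global T.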
Peter R. M. Eisenhardt, Federico Marocco, John W. Fowler, Aaron M. Meisner, J. Davy Kirkpatrick, Nelson Garcia, Thomas H. Jarrett, Renata Koontz et al.
CatWISE is a program to catalog sources selected from combined ${\it WISE}$
and ${\it NEOWISE}$ all-sky survey data at 3.4 and 4.6 $\mu$m (W1 and W2). The
CatWISE Preliminary Catalog consists of 900,849,014 sources measured in data
collected from 2010 to 2016. This dataset represents four times as many
exposures and spans over ten times as large a time baseline as that used for
the AllWISE Catalog. CatWISE adapts AllWISE software to measure the sources in
coadded images created from six-month subsets of these data, each representing
one coverage of the inertial sky, or epoch. The catalog includes the measured
motion of sources in 8 epochs over the 6.5-year span of the data. From
comparison to ${\it Spitzer}$, the SNR=5 limits in magnitudes in the Vega
system are W1=17.67 and W2=16.47, compared to W1=16.96 and W2=16.02 for
AllWISE. From comparison to ${\it Gaia}$, CatWISE positions have typical
accuracies of 50 mas for stars at W1=10 mag and 275 mas for stars at W1=15.5
mag. Proper motions have typical accuracies of 10 mas yr$^{-1}$ and 30 mas
yr$^{-1}$ for stars with these brightnesses, an order of magnitude better than
from AllWISE. The catalog is available in the WISE/NEOWISE Enhanced and
Contributed Products area of the NASA/IPAC Infrared Science Archive.
Authors' comments: 53 pages, 20 figures, 5 tables. Accepted by ApJS
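Because magnitudes are logarithmic, the deeper SNR=5 limits quoted above can be converted into limiting-flux ratios with the standard relation between a magnitude difference and a flux ratio:

```latex
\begin{align}
\frac{F_{\rm lim}^{\rm CatWISE}}{F_{\rm lim}^{\rm AllWISE}}
  &= 10^{-0.4\,(m^{\rm CatWISE} - m^{\rm AllWISE})}, \\
\text{W1:}\quad 10^{-0.4\,(17.67 - 16.96)} &= 10^{-0.284} \approx 0.52, \\
\text{W2:}\quad 10^{-0.4\,(16.47 - 16.02)} &= 10^{-0.180} \approx 0.66.
\end{align}
```

That is, at SNR=5 CatWISE reaches sources roughly 1.9 times fainter than AllWISE in W1 and roughly 1.5 times fainter in W2 (a back-of-the-envelope reading of the quoted limits, not a figure stated in the catalog description).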
Kiru Park, Timothy Patten, Markus Vincze
Estimating the 6D pose of objects using only RGB images remains challenging
because of problems such as occlusion and symmetries. It is also difficult to
construct 3D models with precise texture without expert knowledge or
specialized scanning devices. To address these problems, we propose a novel
pose estimation method, Pix2Pose, that predicts the 3D coordinates of each
object pixel without textured models. An auto-encoder architecture is designed
to estimate the 3D coordinates and expected errors per pixel. These pixel-wise
predictions are then used in multiple stages to form 2D-3D correspondences to
directly compute poses using the PnP algorithm with RANSAC iterations. Our
method is robust to occlusion by leveraging recent achievements in generative
adversarial training to precisely recover occluded parts. Furthermore, a novel
loss function, the transformer loss, is proposed to handle symmetric objects by
guiding predictions to the closest symmetric pose. Evaluations on three
different benchmark datasets containing symmetric and occluded objects show our
method outperforms the state of the art using only RGB images.
Authors' comments: Accepted at ICCV 2019 (Oral)
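One way to write a symmetry-aware coordinate loss in the spirit of the transformer loss is to take, over the object's symmetry rotations, the smallest mean L1 error between the predicted per-pixel 3D coordinates and the rotated ground truth. This is a sketch under that assumption; the exact form, weighting and masking used in Pix2Pose may differ, and the function name is hypothetical.

```python
import numpy as np


def transformer_loss(pred, gt, sym_rotations):
    """Symmetry-aware per-pixel coordinate loss.

    pred, gt:      (n_pixels, 3) predicted / ground-truth object coordinates.
    sym_rotations: list of (3, 3) rotation matrices that map the object onto
                   itself (should include the identity).
    Returns the smallest mean L1 error over all symmetric versions of gt.
    """
    # gt @ R.T applies R to every ground-truth coordinate (row vector)
    losses = [np.mean(np.abs(pred - gt @ R.T)) for R in sym_rotations]
    return float(min(losses))
```

For an object symmetric under a 180-degree rotation about z, a prediction that matches the rotated ground truth incurs zero loss, so the network is not penalized for picking either of two indistinguishable poses.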
Jiacheng Chen, Chen Liu, Jiaye Wu, Yasutaka Furukawa
This paper proposes a new approach for automated floorplan reconstruction
from RGBD scans, a major milestone in indoor mapping research. The approach,
dubbed Floor-SP, formulates a novel optimization problem, where room-wise
coordinate descent sequentially solves dynamic programming to optimize the
floorplan graph structure. The objective function consists of data terms guided
by deep neural networks, consistency terms encouraging adjacent rooms to share
corners and walls, and a model complexity term. The approach does not require
corner/edge detection with thresholds, unlike most other methods. We have
evaluated our system on production-quality RGBD scans of 527 apartments or
houses, including many units with non-Manhattan structures. Qualitative and
quantitative evaluations demonstrate a significant performance boost over the
current state of the art. Please refer to our project website
http://jcchen.me/floor-sp/ for code and data.
Authors' comments: 10 pages, 9 figures, accepted to ICCV 2019
Wenbo Gong, Sebastian Tschiatschek, Richard Turner, Sebastian Nowozin, José Miguel Hernández-Lobato, Cheng Zhang
In this paper we introduce the ice-start problem, i.e., the challenge of deploying machine learning models when little or no training data is initially available and acquiring each feature element of the data incurs a cost. This setting is representative of real-world machine learning applications. For instance, in the health-care domain, when training an AI system to predict patient metrics from lab tests, obtaining every single measurement comes with a high cost. Active learning, where only the label is associated with a cost, does not apply to such problems, because performing all possible lab tests to acquire a new training datum would be costly, as well as unnecessary due to redundancy. We propose Icebreaker, a principled framework to approach the ice-start problem. Icebreaker uses a fully Bayesian Deep Latent Gaussian Model (BELGAM) with a novel inference method. Our proposed method combines recent advances in amortized inference and stochastic gradient MCMC to enable fast and accurate posterior inference. By utilizing BELGAM's ability to fully quantify model uncertainty, we also propose two information acquisition functions for imputation and active prediction problems. We demonstrate that BELGAM performs significantly better than previous VAE (variational autoencoder) based models when the dataset is small, using both machine learning benchmarks and real-world recommender system and health-care applications. Moreover, based on BELGAM, Icebreaker further improves performance and demonstrates the ability to use a minimal amount of training data to obtain the highest test-time performance.
Chang-You Tai, Meng-Ru Wu, Yun-Wei Chu, Shao-Yu Chu
Recently, researchers have used knowledge graphs (KGs) as side information in recommender systems to address the cold-start and sparsity issues and to improve recommendation performance. Existing KG-aware recommendation models use the features of neighboring entities and structural information to update the embedding of the currently located entity. Although this rich information benefits the downstream task, the cost of exploring the entire graph is massive and impractical. To reduce the computational cost while maintaining the feature-extraction pattern, KG-aware recommendation models usually use a fixed-size, random set of neighbors rather than the complete information in the KG. Nonetheless, these approaches have two critical issues. First, fixed-size and randomly selected neighbors restrict the view of the graph. Second, as the order of the graph features increases, the growing parameter dimensionality of the model may make training hard to converge. To address these limitations, we propose GraphSW, a strategy based on a stage-wise training framework that accesses only a subset of the entities in the KG at each stage. In each subsequent stage, the embeddings learned in the previous stages are provided to the network, so the model can learn the information from the KG gradually. We apply stage-wise training to two state-of-the-art (SOTA) recommendation models, RippleNet and Knowledge Graph Convolutional Networks (KGCN). Moreover, we evaluate the performance on six real-world datasets: Last.FM 2011, Book-Crossing, movie, LFM-1b 2015, Amazon-book, and Yelp 2018. Our experimental results show that the proposed strategy helps both models collect more information from the KG and improves their performance. Furthermore, we observe that GraphSW helps KGCN converge effectively with high-order graph features.
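The fixed-size random neighbor sampling that the abstract identifies as a limitation can be illustrated as below. This assumes an adjacency-list representation and sampling with replacement for low-degree entities (a common convention, assumed here); GraphSW's stage-wise schedule itself is not shown, and the function name is hypothetical.

```python
import random


def sample_fixed_neighbors(adj, entity, k, rng):
    """Return exactly k neighbors of `entity`, sampled uniformly at random.

    adj: dict mapping each entity to a list of neighboring entities.
    Sampling is without replacement when the entity has at least k neighbors and
    with replacement otherwise, so every entity gets a fixed-size receptive field
    regardless of its degree.
    """
    neighbors = adj[entity]
    if len(neighbors) >= k:
        return rng.sample(neighbors, k)
    return [rng.choice(neighbors) for _ in range(k)]
```

Whatever k is chosen, a high-degree entity only ever "sees" k of its neighbors per pass, which is the restricted view of the graph that the abstract describes.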
Ching-An Cheng, Xinyan Yan, Byron Boots
Policy gradient methods have demonstrated success in reinforcement learning tasks that have high-dimensional continuous state and action spaces. However, policy gradient methods are also notoriously sample-inefficient. This can be attributed, at least in part, to the high variance in estimating the gradient of the task objective with Monte Carlo methods. Previous research has endeavored to contend with this problem by studying control variates (CVs) that can reduce the variance of estimates without introducing bias, including the early use of baselines, state-dependent CVs, and the more recent state-action-dependent CVs. In this work, we analyze the properties and drawbacks of previous CV techniques and, surprisingly, we find that these works have overlooked the important fact that Monte Carlo gradient estimates are generated by trajectories of states and actions. We show that ignoring the correlation across the trajectories can result in suboptimal variance reduction, and we propose a simple fix: a class of "trajectory-wise" CVs that can further drive down the variance. We show that constructing trajectory-wise CVs can be done recursively and requires only learning state-action value functions like the previous CVs for policy gradient. We further prove that the proposed trajectory-wise CVs are optimal for variance reduction under reasonable assumptions.
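The simplest control variate mentioned above, a constant baseline, can be illustrated on a toy problem. Because the score function has zero mean, subtracting a baseline leaves the score-function (REINFORCE) gradient estimate unbiased while it can sharply reduce its variance. The Gaussian toy objective f(x) = x^2 + 10 and the function name are assumptions for illustration; the trajectory-wise CVs proposed in the paper are not implemented.

```python
import numpy as np


def score_function_grads(mu, n, baseline, rng):
    """Per-sample estimates of d/dmu E_{x ~ N(mu, 1)}[f(x)] with f(x) = x**2 + 10.

    Each estimate is (f(x) - baseline) * d/dmu log N(x; mu, 1).
    """
    x = rng.normal(mu, 1.0, size=n)
    f = x**2 + 10.0
    score = x - mu                                # d/dmu log N(x; mu, 1)
    return (f - baseline) * score
```

For mu = 0 the true gradient is 0. Analytically, the per-sample variance is 175 without a baseline and 10 with the baseline b = 11 (equal to E[f]), and sample estimates from the sketch should land close to these values.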
Brian Kenji Iwana, Ryohei Kuroki, Seiichi Uchida
Convolutional Neural Networks (CNNs) have become the state of the art in the field
of image classification. However, not everything is understood about their
inner representations. This paper tackles the interpretability and
explainability of the predictions of CNNs for multi-class classification
problems. Specifically, we propose a novel visualization method of pixel-wise
input attribution called Softmax-Gradient Layer-wise Relevance Propagation
(SGLRP). The proposed model is a class-discriminative extension of Deep Taylor
Decomposition (DTD) using the gradient of softmax to back propagate the
relevance of the output probability to the input image. Through qualitative and
quantitative analysis, we demonstrate that SGLRP can successfully localize and
attribute the regions on input images which contribute to a target object's
classification. We show that the proposed method excels at discriminating the
target object's class from the other possible objects in the images. We confirm
that SGLRP performs better than existing Layer-wise Relevance Propagation (LRP)
based methods and can help in the understanding of the decision process of
CNNs.
Authors' comments: Published at ICCV 2019 Workshops
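Assuming, as the method's name suggests, that SGLRP initializes the relevance at the output layer with the gradient of the target class's softmax probability with respect to the logits, that starting vector can be computed as follows. The subsequent layer-by-layer propagation with Deep Taylor Decomposition rules is not shown, and the function name is hypothetical.

```python
import numpy as np


def sglrp_initial_relevance(logits, target):
    """Gradient of the target's softmax probability y_t with respect to the logits z.

    dy_t/dz_n = y_t * (1 - y_t) if n == t, and -y_t * y_n otherwise.
    """
    z = logits - np.max(logits)                   # numerical stability
    y = np.exp(z) / np.sum(np.exp(z))
    rel = -y[target] * y                          # competing classes: negative relevance
    rel[target] = y[target] * (1.0 - y[target])   # target class: positive relevance
    return rel
```

The entries sum to zero: the target class receives positive relevance and every competing class receives negative relevance in proportion to its probability, which is what allows the resulting maps to separate the target class from the others.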
Mizuho Uchiyama, Kohei Ichikawa
We systematically investigate the mid-infrared (MIR; $\lambda>3 ~\mu$m) time
variability of uniformly selected $\sim800$ massive young stellar objects
(MYSOs) from the Red MSX Source (RMS) survey. Out of the 806 sources, we obtain
reliable 9-year-long MIR magnitude variability data of 331 sources at the
3.4~$\mu$m (W1) and 4.6~$\mu$m (W2) bands by cross-matching the MYSO positions
with ALLWISE and NEOWISE catalogs. After applying the variability selections
using ALLWISE data, we identify 5 MIR-variable candidates. The light curves
fall into various classes, with periodic, plateau-like, and dipper features. Of
the two W1 versus W1$-$W2 color-magnitude diagrams obtained, one shows "bluer
when brighter and redder when fainter" trends in variability, suggesting changes
in extinction or accretion rate. Finally, our results show that
G335.9960$-$00.8532 (hereafter, G335) has a periodic light curve, with a
$\approx 690$-day cycle. Spectral energy distribution model fitting results indicate
that G335 is a relatively evolved MYSO; thus, we may be witnessing the very
early stages of a hyper- or ultra-compact HII region, a key source for
understanding MYSO evolution.
Authors' comments: 12 pages, 5 figures, 3 tables. Accepted by ApJ
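The abstract does not say how the period of G335 was measured. As one hedged illustration, a period search can fit a constant plus a sinusoid at each trial period by least squares and keep the period with the smallest residual, a close relative of the Lomb-Scargle periodogram. The synthetic light curve, the period grid and the function name below are assumptions.

```python
import numpy as np


def best_period(t, y, periods):
    """Trial period whose least-squares fit y ~ c0 + c1 sin(wt) + c2 cos(wt)
    leaves the smallest residual sum of squares."""
    rss = []
    for P in periods:
        w = 2.0 * np.pi / P
        A = np.column_stack([np.ones_like(t), np.sin(w * t), np.cos(w * t)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss.append(np.sum((y - A @ coef) ** 2))
    return periods[int(np.argmin(rss))]
```

Fitting both sine and cosine terms absorbs the unknown phase linearly, so each trial period needs only one small linear solve; irregular sampling (as in multi-epoch WISE/NEOWISE data) poses no problem for this formulation.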
S. K. Leggett, Trent J. Dupuy, Caroline V. Morley, Mark S. Marley, William M. J. Best, Michael C. Liu, D. Apai, S. L. Casewell et al.
Half of the energy emitted by late-T- and Y-type brown dwarfs emerges at
$3.5 < \lambda < 5.5~\mu$m. We present new L' ($3.43 < \lambda < 4.11~\mu$m) photometry
obtained at the Gemini North telescope for nine late-T and Y dwarfs, and
synthesize L' from spectra for an additional two dwarfs. The targets include
two binary systems which were imaged at a resolution of 0.25". One of these,
WISEP J045853.90+643452.6AB, shows significant motion, and we present an
astrometric analysis of the binary using Hubble Space Telescope, Keck Adaptive
Optics, and Gemini images. We compare $\lambda \sim 4~\mu$m observations to models, and
find that the model fluxes are too low for brown dwarfs cooler than ~700 K. The
discrepancy increases with decreasing temperature, and is a factor of ~2 at
$T_{\rm eff} = 500$ K and ~4 at $T_{\rm eff} = 400$ K. Warming the upper layers of a model atmosphere
generates a spectrum closer to what is observed. The thermal structure of cool
brown dwarf atmospheres above the radiative-convective boundary may not be
adequately modelled using pure radiative equilibrium; instead heat may be
introduced by thermochemical instabilities (previously suggested for the L- to
T-type transition) or by breaking gravity waves (previously suggested for the
solar system giant planets). One-dimensional models may not capture these
atmospheres, which likely have both horizontal and vertical
pressure/temperature variations.
Authors' comments: Accepted for publication in The Astrophysical Journal, July 17 2019.
This revision includes changes to the Appendix only. Additional new W1
photometry is given and Figure 11 is updated; Figure 13 contained erroneous
data and is corrected; Figure 14 is new and shows the new relationships
derived for transforming from ground-based L or M magnitudes to Spitzer [3.6]
and [4.5].
Yanyuet Man, Xiangyun Ding, Xingcheng Yao, Han Bao
Throughout the world, breast cancer is one of the leading causes of female
death. Recently, deep learning methods have been developed to automatically grade
breast cancer of histological slides. However, the performance of existing deep
learning models is limited due to the lack of large annotated biomedical
datasets. One promising way to relieve the annotating burden is to leverage the
unannotated datasets to enhance the trained model. In this paper, we first
apply an active learning method to breast cancer grading and propose a
semi-supervised framework based on expectation maximization (EM) model. The
proposed EM approach is based on the collaborative filtering among the
annotated and unannotated datasets. The collaborative filtering method
effectively extracts useful and credible datasets from the unannotated images.
Results of pixel-wise prediction of whole-slide images (WSI) demonstrate that
the proposed method not only outperforms state-of-the-art methods, but also
significantly reduces the annotation cost by over 70%.
Authors' comments: The author list and contents of this paper are not complete. The other
authors request that this paper be withdrawn.
Maen Alzubi, Szilvester Kovács
Fuzzy Rule Interpolation (FRI) reasoning methods have been introduced to address sparse fuzzy rule bases and reduce complexity. The first FRI method was the "Linear Interpolation" proposed by Koczy and Hirota (KH). In addition, several conditions and criteria have been suggested to unify the common requirements that FRI methods have to satisfy. One of the most important conditions requires that the conclusion fuzzy set preserve Piece-Wise Linearity (PWL) whenever all antecedents and consequents of the fuzzy rules are PWL sets at the $\alpha$-cut levels. The KH FRI is one of the FRI methods that cannot satisfy this condition. Therefore, the goal of this paper is to investigate the equations and notations related to the PWL property, in order to highlight the problematic properties of the KH FRI method and to test its efficiency with respect to the PWL condition. In addition, this paper focuses on constructing benchmark examples that serve as a baseline for testing other FRI methods in situations where the KH FRI does not satisfy the linearity condition.
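The KH rule can be sketched for the simplest case of two rules, triangular fuzzy sets, and an observation lying between the two antecedents. For each alpha-cut, each endpoint of the conclusion is a distance-weighted average of the consequents' endpoints, with distances measured on the same endpoint of the antecedents. This is an illustrative sketch with hypothetical names, not the paper's benchmark code.

```python
def kh_endpoint(a1, a2, a_obs, b1, b2):
    """KH linear interpolation of one alpha-cut endpoint (lower or upper).

    With d1 = |a_obs - a1| and d2 = |a2 - a_obs| measured on the same endpoint,
    the conclusion endpoint is (b1 / d1 + b2 / d2) / (1 / d1 + 1 / d2),
    written here in the equivalent form (d2 * b1 + d1 * b2) / (d1 + d2).
    """
    d1, d2 = abs(a_obs - a1), abs(a2 - a_obs)
    return (d2 * b1 + d1 * b2) / (d1 + d2)


def triangular_cut(a, b, c, alpha):
    """Alpha-cut (lower, upper) of a triangular fuzzy set with support [a, c] and core b."""
    return a + alpha * (b - a), c - alpha * (c - b)


def kh_interpolate(A1, A2, A_obs, B1, B2, alpha):
    """KH conclusion alpha-cut for rules A1 -> B1, A2 -> B2 and observation A_obs.

    Each argument A1, A2, A_obs, B1, B2 is a triangle given as (a, b, c).
    """
    cuts = [triangular_cut(*S, alpha) for S in (A1, A2, A_obs, B1, B2)]
    (a1l, a1u), (a2l, a2u), (aol, aou), (b1l, b1u), (b2l, b2u) = cuts
    return kh_endpoint(a1l, a2l, aol, b1l, b2l), kh_endpoint(a1u, a2u, aou, b1u, b2u)
```

In a symmetric example such as A1 = (0, 1, 2), A2 = (6, 7, 8), A* = (3, 4, 5), B1 = (0, 1, 2), B2 = (10, 11, 12), d1 + d2 does not depend on alpha, so the conclusion is again triangular (support [5, 7], core 6). In general d1 and d2 vary with alpha, which makes each endpoint a ratio of polynomials in alpha; this is why the conclusion need not preserve piece-wise linearity.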
Mike Raeini
Human beings have been generating data for a very long time. We ask the
following common-sense and wise questions (WizQuestions):
1. Why do we refer to some pieces of data more often than to other pieces?
2. What makes those commonly referred-to pieces of data so unique and different?
3. What are the characteristics of data that sometimes make the data so unique
and different?
In this article, we introduce a novel approach (model) that helps us answer
these questions from data science and network science perspectives. WizWordily
speaking, our proposed approach enables us to model the data (as a network),
measure the quality of data, and study the network of data deeply and
thoroughly.
Authors' comments: 16 pages, 3 figures