Haibo Yang, Xin Zhang, Minghong Fang, Jia Liu
In this work, we consider the resilience of distributed algorithms based on stochastic gradient descent (SGD) in distributed learning with potentially Byzantine attackers, who could send arbitrary information to the parameter server to disrupt the training process. Toward this end, we propose a new Lipschitz-inspired coordinate-wise median approach (LICM-SGD) to mitigate Byzantine attacks. We show that our LICM-SGD algorithm can resist up to half of the workers being Byzantine attackers, while still converging almost surely to a stationary region in non-convex settings. Also, our LICM-SGD method does not require any information about the number of attackers or the Lipschitz constant, which makes it attractive for practical implementations. Moreover, our LICM-SGD method enjoys the optimal $O(md)$ computational time-complexity in the sense that the time-complexity is the same as that of the standard SGD under no attacks. We conduct extensive experiments to show that our LICM-SGD algorithm consistently outperforms existing methods in training multi-class logistic regression and convolutional neural networks with MNIST and CIFAR-10 datasets. In our experiments, LICM-SGD also achieves a much faster running time thanks to its low computational time-complexity.
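As a rough illustration of the coordinate-wise median idea that the abstract builds on, the sketch below aggregates worker gradients by taking the median of each coordinate. It is not the LICM-SGD algorithm itself: the Lipschitz-inspired selection step is not described in the abstract and is omitted, and the function name and toy gradients are assumptions made for this example. The sort-based median used here costs O(m log m) per coordinate, so it also does not reproduce the $O(md)$ complexity claimed above.

```python
from statistics import median

def coordinate_wise_median(gradients):
    """Aggregate worker gradients by taking the median of each coordinate."""
    if not gradients:
        raise ValueError("need at least one gradient")
    dim = len(gradients[0])
    if any(len(g) != dim for g in gradients):
        raise ValueError("all gradients must have the same dimension")
    # For coordinate j, collect the j-th entry from every worker and take the median.
    return [median(g[j] for g in gradients) for j in range(dim)]

# Hypothetical example: three honest workers and two Byzantine workers.
honest = [[1.0, -2.0], [1.2, -1.8], [0.9, -2.1]]
byzantine = [[1e6, 1e6], [-1e6, 5e5]]
aggregate = coordinate_wise_median(honest + byzantine)
```

With a minority of attackers, each coordinate of the aggregate stays within the range spanned by the honest values, which is the property a plain average lacks.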
Nathan McDannold, P. Jason White, Rees Cosgrove
This work explored an element-wise approach to model transcranial MRI-guided focused ultrasound (TcMRgFUS) thermal ablation, a noninvasive approach to neurosurgery. Each element of the phased array transducer was simulated individually and could be simultaneously loaded into computer memory, allowing for rapid calculation of the pressure field for different phase offsets used for beam steering and aberration correction. We simulated the pressure distribution for 431 sonications in 32 patients, applied the phase and magnitude values used during treatment, and estimated the resulting temperature rise. We systematically varied the relationship between CT-derived skull density and the acoustic attenuation and sound speed to obtain the best agreement between the predictions and MR temperature imaging (MRTI). The optimization was validated with simulations of 396 sonications from 40 additional treatments. After optimization, the predicted and measured heating agreed well (R2: 0.74 patients 1-32; 0.71 patients 33-72). The dimensions and obliquity of the heating in the simulated temperature maps correlated well with the MRTI (R2: 0.62, 0.74 respectively), but the measured heating was more spatially diffuse. The energy needed to achieve ablation varied by an order of magnitude (3.3-36.1 kJ). While this element-wise approach requires more computation time up front, it can be performed in parallel. It allows for rapid calculation of the three-dimensional heating at the focus for different phase and magnitude values on the array. We also show how this approach can be used to optimize the relationship between CT-derived skull density and acoustic properties. While the relationships found here need further validation in a larger patient population, these results demonstrate the promise of this approach to model TcMRgFUS.
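The benefit of simulating each transducer element separately can be sketched as a weighted sum: assuming linear acoustics, the total complex pressure at each point is the sum of the stored per-element fields, each multiplied by that element's drive amplitude and phase. The function name, the two-element array, and the field values below are hypothetical and only illustrate the superposition step, not the authors' simulation code.

```python
import cmath

def combine_elements(element_fields, amplitudes, phases):
    """Sum precomputed per-element complex pressure fields with drive weights."""
    n_points = len(element_fields[0])
    total = [0j] * n_points
    for field, amp, phase in zip(element_fields, amplitudes, phases):
        # Complex drive weight for this element: amplitude times exp(i * phase).
        weight = amp * cmath.exp(1j * phase)
        for k in range(n_points):
            total[k] += weight * field[k]
    return total

# Two hypothetical elements evaluated at two field points.
fields = [[1 + 0j, 0 + 1j], [1 + 0j, 0 - 1j]]
in_phase = combine_elements(fields, [1.0, 1.0], [0.0, 0.0])
steered = combine_elements(fields, [1.0, 1.0], [0.0, cmath.pi])
```

Because the per-element fields are computed once, trying a new set of phases only repeats the cheap weighted sum. In this toy case the pressure maximum moves from the first point to the second when the second element's phase is shifted by pi.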
Byeongmoon Ji, Hyemin Jung, Jihyeun Yoon, Kyungyul Kim, Younghak Shin
The prediction reliability of neural networks is important in many applications. Specifically, in safety-critical domains, such as cancer prediction or autonomous driving, a reliable confidence estimate for a model's prediction is critical for the interpretation of the results. Modern deep neural networks have achieved a significant improvement in performance for many different image classification tasks. However, these networks tend to be poorly calibrated in terms of output confidence. Temperature scaling is an efficient post-processing-based calibration scheme that obtains well-calibrated results. In this study, we leverage the concept of temperature scaling to build a more sophisticated bin-wise scaling. Furthermore, we augment the validation samples for more elaborate scaling. The proposed methods consistently improve calibration performance with various datasets and deep convolutional neural network models.
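For readers unfamiliar with the baseline, standard temperature scaling divides the logits by a single scalar T fitted on validation data. The sketch below fits T by a simple grid search over the validation negative log-likelihood. It shows only this single-temperature baseline, not the bin-wise variant or the augmentation proposed above, and the validation logits, labels, and candidate grid are made up for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (T > 1 softens the output)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def mean_nll(logit_rows, labels, temperature):
    """Average negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logit_rows, labels, candidates):
    """Pick the candidate temperature with the lowest validation NLL."""
    return min(candidates, key=lambda t: mean_nll(logit_rows, labels, t))

# Hypothetical validation set: very confident logits, correct 3 times out of 4.
val_logits = [[4.0, 0.0]] * 4
val_labels = [0, 0, 0, 1]
grid = [0.5 * k for k in range(1, 11)]  # 0.5, 1.0, ..., 5.0
best_t = fit_temperature(val_logits, val_labels, grid)
raw_conf = max(softmax(val_logits[0]))
calibrated_conf = max(softmax(val_logits[0], best_t))
```

Dividing by a positive temperature never changes the predicted class, only the confidence. Here the fitted temperature is above 1, which pulls the confidence down toward the 75% accuracy of the toy validation set.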
Peter R. M. Eisenhardt, Federico Marocco, John W. Fowler, Aaron M. Meisner, J. Davy Kirkpatrick, Nelson Garcia, Thomas H. Jarrett, Renata Koontz et al.
CatWISE is a program to catalog sources selected from combined ${\it WISE}$
and ${\it NEOWISE}$ all-sky survey data at 3.4 and 4.6 $\mu$m (W1 and W2). The
CatWISE Preliminary Catalog consists of 900,849,014 sources measured in data
collected from 2010 to 2016. This dataset represents four times as many
exposures and spans over ten times as large a time baseline as that used for
the AllWISE Catalog. CatWISE adapts AllWISE software to measure the sources in
coadded images created from six-month subsets of these data, each representing
one coverage of the inertial sky, or epoch. The catalog includes the measured
motion of sources in 8 epochs over the 6.5 year span of the data. From
comparison to ${\it Spitzer}$, the SNR=5 limits in magnitudes in the Vega
system are W1=17.67 and W2=16.47, compared to W1=16.96 and W2=16.02 for
AllWISE. From comparison to ${\it Gaia}$, CatWISE positions have typical
accuracies of 50 mas for stars at W1=10 mag and 275 mas for stars at W1=15.5
mag. Proper motions have typical accuracies of 10 mas yr$^{-1}$ and 30 mas
yr$^{-1}$ for stars with these brightnesses, an order of magnitude better than
from AllWISE. The catalog is available in the WISE/NEOWISE Enhanced and
Contributed Products area of the NASA/IPAC Infrared Science Archive.
Authors' comments: 53 pages, 20 figures, 5 tables. Accepted by ApJS
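The depth gain over AllWISE quoted above can be expressed as a flux ratio with the standard magnitude relation F1/F2 = 10^(0.4 (m2 - m1)). The short sketch below applies it to the SNR=5 limits given in the abstract. The function name is an assumption of this example, and the numbers are only the ones quoted above.

```python
def flux_ratio(mag_faint, mag_bright):
    """Flux ratio F_bright / F_faint implied by a magnitude difference."""
    return 10 ** (0.4 * (mag_faint - mag_bright))

# SNR=5 limits quoted in the abstract (Vega magnitudes): CatWISE versus AllWISE.
w1_gain = flux_ratio(17.67, 16.96)
w2_gain = flux_ratio(16.47, 16.02)
```

Under this relation, the CatWISE limits correspond to sources roughly 1.9 times fainter in W1 flux and roughly 1.5 times fainter in W2 flux than the AllWISE limits.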
Kiru Park, Timothy Patten, Markus Vincze
Estimating the 6D pose of objects using only RGB images remains challenging
because of problems such as occlusion and symmetries. It is also difficult to
construct 3D models with precise texture without expert knowledge or
specialized scanning devices. To address these problems, we propose a novel
pose estimation method, Pix2Pose, that predicts the 3D coordinates of each
object pixel without textured models. An auto-encoder architecture is designed
to estimate the 3D coordinates and expected errors per pixel. These pixel-wise
predictions are then used in multiple stages to form 2D-3D correspondences to
directly compute poses with the PnP algorithm with RANSAC iterations. Our
method is robust to occlusion by leveraging recent achievements in generative
adversarial training to precisely recover occluded parts. Furthermore, a novel
loss function, the transformer loss, is proposed to handle symmetric objects by
guiding predictions to the closest symmetric pose. Evaluations on three
different benchmark datasets containing symmetric and occluded objects show our
method outperforms the state of the art using only RGB images.
Authors' comments: Accepted at ICCV 2019 (Oral)
Jiacheng Chen, Chen Liu, Jiaye Wu, Yasutaka Furukawa
This paper proposes a new approach for automated floorplan reconstruction
from RGBD scans, a major milestone in indoor mapping research. The approach,
dubbed Floor-SP, formulates a novel optimization problem, where room-wise
coordinate descent sequentially solves dynamic programming to optimize the
floorplan graph structure. The objective function consists of data terms guided
by deep neural networks, consistency terms encouraging adjacent rooms to share
corners and walls, and the model complexity term. The approach does not require
corner/edge detection with thresholds, unlike most other methods. We have
evaluated our system on production-quality RGBD scans of 527 apartments or
houses, including many units with non-Manhattan structures. Qualitative and
quantitative evaluations demonstrate a significant performance boost over the
current state-of-the-art. Please refer to our project website
http://jcchen.me/floor-sp/ for code and data.
Authors' comments: 10 pages, 9 figures, accepted to ICCV 2019
Wenbo Gong, Sebastian Tschiatschek, Richard Turner, Sebastian Nowozin, José Miguel Hernández-Lobato, Cheng Zhang
In this paper we introduce the ice-start problem, i.e., the challenge of deploying machine learning models when little or no training data is initially available and acquiring each feature element of the data incurs a cost. This setting is representative of real-world machine learning applications. For instance, in the health-care domain, when training an AI system for predicting patient metrics from lab tests, obtaining every single measurement comes with a high cost. Active learning, where only the label is associated with a cost, does not apply to this problem, because performing all possible lab tests to acquire a new training datum would be costly, as well as unnecessary due to redundancy. We propose Icebreaker, a principled framework to approach the ice-start problem. Icebreaker uses a full Bayesian Deep Latent Gaussian Model (BELGAM) with a novel inference method. Our proposed method combines recent advances in amortized inference and stochastic gradient MCMC to enable fast and accurate posterior inference. By utilizing BELGAM's ability to fully quantify model uncertainty, we also propose two information acquisition functions for imputation and active prediction problems. We demonstrate that BELGAM performs significantly better than previous VAE (variational autoencoder) based models when the data set size is small, using both machine learning benchmarks and real-world recommender systems and health-care applications. Moreover, based on BELGAM, Icebreaker further improves performance and demonstrates the ability to use a minimal amount of training data to obtain the highest test-time performance.
Chang-You Tai, Meng-Ru Wu, Yun-Wei Chu, Shao-Yu Chu
Recently, researchers have utilized Knowledge Graphs (KGs) as side information in recommendation systems to address cold-start and sparsity issues and to improve recommendation performance. Existing KG-aware recommendation models use the features of neighboring entities and structural information to update the embedding of the current entity. Although this rich information is beneficial to the downstream task, the cost of exploring the entire graph is massive and impractical. To reduce the computational cost while maintaining the pattern of feature extraction, KG-aware recommendation models usually use a fixed-size, random set of neighbors rather than the complete information in the KG. Nonetheless, there are two critical issues with these approaches. First, fixed-size and randomly selected neighbors restrict the view of the graph. Second, as the order of graph features increases, the growth in the parameter dimensionality of the model may make the training process hard to converge. To address these limitations, we propose GraphSW, a strategy based on a stage-wise training framework that accesses only a subset of the entities in the KG in each stage. The embedding learned in earlier stages is provided to the network in the next stage, so the model can gradually learn the information from the KG. We apply stage-wise training to two SOTA recommendation models, RippleNet and Knowledge Graph Convolutional Networks (KGCN). Moreover, we evaluate the performance on six real-world datasets: Last.FM 2011, Book-Crossing, movie, LFM-1b 2015, Amazon-book and Yelp 2018. The results of our experiments show that the proposed strategy helps both models collect more information from the KG and improves performance. Furthermore, we observe that GraphSW can help KGCN converge effectively with high-order graph features.
Ching-An Cheng, Xinyan Yan, Byron Boots
Policy gradient methods have demonstrated success in reinforcement learning tasks that have high-dimensional continuous state and action spaces. However, policy gradient methods are also notoriously sample inefficient. This can be attributed, at least in part, to the high variance in estimating the gradient of the task objective with Monte Carlo methods. Previous research has endeavored to contend with this problem by studying control variates (CVs) that can reduce the variance of estimates without introducing bias, including the early use of baselines, state dependent CVs, and the more recent state-action dependent CVs. In this work, we analyze the properties and drawbacks of previous CV techniques and, surprisingly, we find that these works have overlooked an important fact that Monte Carlo gradient estimates are generated by trajectories of states and actions. We show that ignoring the correlation across the trajectories can result in suboptimal variance reduction, and we propose a simple fix: a class of "trajectory-wise" CVs, that can further drive down the variance. We show that constructing trajectory-wise CVs can be done recursively and requires only learning state-action value functions like the previous CVs for policy gradient. We further prove that the proposed trajectory-wise CVs are optimal for variance reduction under reasonable assumptions.
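The variance-reduction mechanism behind control variates can be seen in a scalar Monte Carlo toy problem: to estimate E[exp(U)] for U uniform on [0, 1], one subtracts a multiple of (U - 0.5), whose mean is known to be zero. This sketch is not the trajectory-wise CV proposed above, and the function name, sample size, and seed are assumptions of the example. Estimating the coefficient from the same samples, as done here for brevity, introduces a small bias that an unbiased CV construction would avoid.

```python
import math
import random
from statistics import mean, variance

def control_variate_samples(ys, xs, x_mean):
    """Return y - c * (x - E[x]) with c set to the sample regression coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    coeff = cov / variance(xs)
    return [y - coeff * (x - x_mean) for x, y in zip(xs, ys)]

rng = random.Random(0)
xs = [rng.random() for _ in range(2000)]   # control variate samples, E[x] = 0.5
ys = [math.exp(x) for x in xs]             # target samples, E[y] = e - 1
adjusted = control_variate_samples(ys, xs, 0.5)
plain_var = variance(ys)
cv_var = variance(adjusted)
```

The adjusted samples keep approximately the same mean as the raw samples, but their sample variance is smaller because the part of `ys` that is linearly correlated with `xs` has been removed.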
Brian Kenji Iwana, Ryohei Kuroki, Seiichi Uchida
Convolutional Neural Networks (CNN) have become state-of-the-art in the field
of image classification. However, not everything is understood about their
inner representations. This paper tackles the interpretability and
explainability of the predictions of CNNs for multi-class classification
problems. Specifically, we propose a novel visualization method of pixel-wise
input attribution called Softmax-Gradient Layer-wise Relevance Propagation
(SGLRP). The proposed model is a class-discriminative extension of Deep Taylor
Decomposition (DTD) using the gradient of softmax to back propagate the
relevance of the output probability to the input image. Through qualitative and
quantitative analysis, we demonstrate that SGLRP can successfully localize and
attribute the regions on input images which contribute to a target object's
classification. We show that the proposed method excels at discriminating the
target object's class from the other possible objects in the images. We confirm
that SGLRP performs better than existing Layer-wise Relevance Propagation (LRP)
based methods and can help in the understanding of the decision process of
CNNs.
Authors' comments: Published at ICCV 2019 Workshops
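The abstract above says SGLRP back-propagates relevance using the gradient of softmax. As a minimal sketch of that starting signal only, the code below computes the derivative of the target-class probability with respect to each logit, d y_t / d z_i = y_t (delta_it - y_i). How SGLRP then propagates this signal through the layers is not reproduced here, and the function names and logits are assumptions of the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_gradient(logits, target):
    """Derivative of the target-class probability with respect to every logit."""
    y = softmax(logits)
    return [y[target] * ((1.0 if i == target else 0.0) - y[i])
            for i in range(len(y))]

# Hypothetical three-class logits with class 0 as the target.
grads = softmax_gradient([2.0, 1.0, 0.1], target=0)
```

The target entry is positive and all other entries are negative, so evidence for competing classes counts against the target, which is one way to read the class-discriminative behavior described above.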
Mizuho Uchiyama, Kohei Ichikawa
We systematically investigate the mid-infrared (MIR; $\lambda>3 ~\mu$m) time
variability of uniformly selected $\sim800$ massive young stellar objects
(MYSOs) from the Red MSX Source (RMS) survey. Out of the 806 sources, we obtain
reliable 9-year-long MIR magnitude variability data of 331 sources at the
3.4~$\mu$m (W1) and 4.6~$\mu$m (W2) bands by cross-matching the MYSO positions
with ALLWISE and NEOWISE catalogs. After applying the variability selections
using ALLWISE data, we identify 5 MIR-variable candidates. The light curves
fall into various classes, showing periodic, plateau-like, and dipper features. Of
the two color-magnitude diagrams of W1 and W1$-$W2 obtained, one shows "bluer
when brighter and redder when fainter" trends in variability, suggesting a change
in extinction or accretion rate. Finally, our results show that
G335.9960$-$00.8532 (hereafter, G335) has a periodic light curve, with a
$\approx 690$-day cycle. Spectral energy density model fitting results indicate
that G335 is a relatively evolved MYSO; thus, we may be witnessing the very
early stages of a hyper- or ultra-compact HII region, a key source for
understanding MYSO evolution.
Authors' comments: 12 pages, 5 figures, 3 tables. Accepted by ApJ
S. K. Leggett, Trent J. Dupuy, Caroline V. Morley, Mark S. Marley, William M. J. Best, Michael C. Liu, D. Apai, S. L. Casewell et al.
Half of the energy emitted by late-T- and Y-type brown dwarfs emerges at 3.5
< lambda um < 5.5. We present new L' (3.43 < lambda um < 4.11) photometry
obtained at the Gemini North telescope for nine late-T and Y dwarfs, and
synthesize L' from spectra for an additional two dwarfs. The targets include
two binary systems which were imaged at a resolution of 0.25". One of these,
WISEP J045853.90+643452.6AB, shows significant motion, and we present an
astrometric analysis of the binary using Hubble Space Telescope, Keck Adaptive
Optics, and Gemini images. We compare lambda ~4um observations to models, and
find that the model fluxes are too low for brown dwarfs cooler than ~700K. The
discrepancy increases with decreasing temperature, and is a factor of ~2 at
T_eff=500K and ~4 at T_eff=400K. Warming the upper layers of a model atmosphere
generates a spectrum closer to what is observed. The thermal structure of cool
brown dwarf atmospheres above the radiative-convective boundary may not be
adequately modelled using pure radiative equilibrium; instead heat may be
introduced by thermochemical instabilities (previously suggested for the L- to
T-type transition) or by breaking gravity waves (previously suggested for the
solar system giant planets). One-dimensional models may not capture these
atmospheres, which likely have both horizontal and vertical
pressure/temperature variations.
Authors' comments: Accepted for publication in The Astrophysical Journal, July 17 2019.
This revision includes changes to the Appendix only. Additional new W1
photometry is given and Figure 11 is updated; Figure 13 contained erroneous
data and is corrected; Figure 14 is new and shows the new relationships
derived for transforming from ground-based L or M magnitudes to Spitzer [3.6]
and [4.5]
Yanyuet Man, Xiangyun Ding, Xingcheng Yao, Han Bao
Throughout the world, breast cancer is one of the leading causes of female
death. Recently, deep learning methods have been developed to automatically grade
breast cancer in histological slides. However, the performance of existing deep
learning models is limited due to the lack of large annotated biomedical
datasets. One promising way to relieve the annotating burden is to leverage the
unannotated datasets to enhance the trained model. In this paper, we first
apply an active learning method to breast cancer grading, and propose a
semi-supervised framework based on an expectation maximization (EM) model. The
proposed EM approach is based on the collaborative filtering among the
annotated and unannotated datasets. The collaborative filtering method
effectively extracts useful and credible datasets from the unannotated images.
Results of pixel-wise prediction of whole-slide images (WSI) demonstrate that
the proposed method not only outperforms state-of-the-art methods, but also
significantly reduces the annotation cost by over 70%.
Authors' comments: The author list and contents of this paper is not complete. Other
authors request to withdraw this paper
Maen Alzubi, Szilvester Kovács
Fuzzy Rule Interpolation (FRI) reasoning methods have been introduced to address sparse fuzzy rule bases and reduce complexity. The first FRI method was the "Linear Interpolation" proposed by Koczy and Hirota (KH). Several conditions and criteria have since been suggested to unify the common requirements that FRI methods have to satisfy. One of the most important conditions is that the fuzzy set of the conclusion must preserve Piece-Wise Linearity (PWL) if all antecedents and consequents of the fuzzy rules are PWL sets at the $\alpha$-cut levels. The KH FRI is one of the FRI methods that cannot satisfy this condition. Therefore, the goal of this paper is to investigate the equations and notations related to the PWL property, in order to highlight the problematic properties of the KH FRI method and to prove its efficiency with respect to the PWL condition. In addition, this paper focuses on constructing benchmark examples to serve as a baseline for testing other FRI methods in situations where the KH FRI does not satisfy the linearity condition.
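For orientation, the KH linear interpolation is commonly stated per $\alpha$-cut: each endpoint of the conclusion is an average of the corresponding consequent endpoints, weighted by the inverse distances between the observation and the two antecedents. The sketch below implements that formula for a single input dimension and one $\alpha$ level. The interval values and the function name are hypothetical, and the paper's own notation and benchmark examples are not reproduced.

```python
def kh_interpolate_cut(a1, a2, b1, b2, obs):
    """KH interpolation of one alpha-cut; every argument is a (lower, upper) pair.

    a1 -> b1 and a2 -> b2 are the two flanking rules, obs is the observation.
    """
    result = []
    for k in (0, 1):  # 0: lower endpoints, 1: upper endpoints
        d1 = abs(obs[k] - a1[k])
        d2 = abs(a2[k] - obs[k])
        if d1 == 0:       # observation endpoint coincides with the first antecedent
            result.append(b1[k])
        elif d2 == 0:     # observation endpoint coincides with the second antecedent
            result.append(b2[k])
        else:
            w1, w2 = 1.0 / d1, 1.0 / d2
            result.append((w1 * b1[k] + w2 * b2[k]) / (w1 + w2))
    return tuple(result)

# Hypothetical alpha-cuts of two rules and an observation lying between them.
conclusion = kh_interpolate_cut(a1=(0.0, 2.0), a2=(10.0, 12.0),
                                b1=(0.0, 1.0), b2=(10.0, 11.0),
                                obs=(4.0, 7.0))
```

Evaluating this function at several $\alpha$ levels and checking whether the resulting endpoints fall on straight lines between the breakpoint levels is one way to probe the PWL condition discussed above.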
Mike Raeini
Human beings have been generating data for a very long time. We ask the
following common-sense and wise questions (WizQuestions):
1. Why do we refer to some pieces of data more often than to other pieces?
2. What makes those commonly-referred pieces of data so unique and different?
3. What are the characteristics of data that sometimes make the data so unique
and different?
In this article, we introduce a novel approach (model) that helps us answer
these questions from data science and network science perspectives. WizWordily
speaking, our proposed approach enables us to model the data (as a network),
measure the quality of data, and study the network of data deeply and
thoroughly.
Authors' comments: 16 pages, 3 figures
Jie Zhang, Jonathan P. Newman, Xiao Wang, Chetan Singh Thakur, John Rattray, Ralph Etienne-Cummings, Matthew A. Wilson
We demonstrated a CMOS imaging system that adapts each pixel's exposure and
sampling rate to capture high dynamic range (HDR) videos. The system consists of
a custom designed image sensor with pixel-wise exposure configurability and a
real-time pixel exposure controller. These parts operate in a closed-loop to
sample, detect and optimize each pixel's exposure and sampling rate to minimize
local underexposure, overexposure and motion blurring. Exposure
control is implemented using all-integrated electronics without external
optical modulation. This reduces overall system size and power consumption.
The image sensor is implemented using a standard 130nm CMOS process while the
exposure controller is implemented on a computer. We performed experiments
under complex lighting and motion conditions to test the performance of the system,
and demonstrate the benefit of pixel-wise adaptive imaging on the performance
of computer vision tasks such as segmentation, motion estimation and object
recognition.
Authors' comments: 9 pages, 8 figures
Krzysztof Debicki, Lanpeng Ji, Tomasz Rolski
We consider a two-dimensional ruin problem where the surplus process of
business lines is modelled by a two-dimensional correlated Brownian motion with
drift. We study the ruin function $P(u)$ for the component-wise ruin (that is,
both business lines are ruined in an infinite-time horizon), where $u$ is the
same initial capital for each line. We measure the goodness of the business by
analysing the adjustment coefficient, that is the limit of $-\ln P(u)/u$ as $u$
tends to infinity, which depends essentially on the correlation $\rho$ of the
two surplus processes. In order to work out the adjustment coefficient we solve
a two-layer optimization problem.
Authors' comments: 20
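To make the adjustment coefficient concrete, the classical one-dimensional case is a useful reference point: for a single surplus process $u + \mu t + \sigma W(t)$ with $\mu > 0$, the infinite-horizon ruin probability is $P(u) = \exp(-2\mu u/\sigma^2)$, so $-\ln P(u)/u$ equals $2\mu/\sigma^2$ for every $u$. The sketch below evaluates this one-dimensional formula only. The two-dimensional component-wise result of the paper is not reproduced, and the function names and parameter values are assumptions of the example.

```python
import math

def ruin_probability_1d(u, drift, sigma):
    """Infinite-horizon ruin probability of u + drift * t + sigma * W(t)."""
    if drift <= 0:
        return 1.0  # without positive drift, ruin is certain
    return math.exp(-2.0 * drift * u / sigma ** 2)

def empirical_adjustment(u, drift, sigma):
    """Evaluate -ln P(u) / u, which tends to the adjustment coefficient."""
    return -math.log(ruin_probability_1d(u, drift, sigma)) / u

# Hypothetical parameters: drift 0.5 and volatility 1.0 give a coefficient of 1.0.
coefficient = empirical_adjustment(10.0, drift=0.5, sigma=1.0)
```

In one dimension the ratio is constant in $u$. In the correlated two-dimensional setting described above it is only a limit, and it depends on $\rho$.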
Yutai Hou, Zhihan Zhou, Yijia Liu, Ning Wang, Wanxiang Che, Han Liu, Ting Liu
While few-shot classification has been widely explored with similarity-based methods, few-shot sequence labeling poses a unique challenge as it also calls for modeling the label dependencies. To consider both item similarity and label dependency, we propose to leverage conditional random fields (CRFs) in few-shot sequence labeling. Our model calculates emission scores with similarity-based methods and obtains transition scores with a specially designed transfer mechanism. When applying CRFs in few-shot scenarios, the discrepancy of label sets among different domains makes it hard to use the label dependencies learned in prior domains. To tackle this, we introduce a dependency transfer mechanism that transfers abstract label transition patterns. In addition, similarity methods rely on high-quality sample representations, which is challenging for sequence labeling because the sense of a word differs when measuring its similarity to words in different sentences. To remedy this, we take advantage of recent contextual embedding techniques and further propose a pair-wise embedder. It provides additional certainty about word sense by embedding query and support sentences in pairs. Experimental results on slot tagging and named entity recognition show that our model significantly outperforms the strongest few-shot learning baseline by 11.76 (21.2%) and 12.18 (97.7%) F1 scores respectively in the one-shot setting.
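The way a CRF combines emission and transition scores can be sketched with standard Viterbi decoding, which finds the label sequence with the highest total score. The code below is a generic decoder, not the authors' model: the similarity-based emission scorer and the dependency transfer mechanism are replaced by hand-written score tables, and the label set (0 = O, 1 = B, 2 = I) is an assumption of the example.

```python
def viterbi(emissions, transitions):
    """Best label path given emissions[t][j] and transitions[i][j] (i -> j)."""
    n_labels = len(emissions[0])
    scores = list(emissions[0])          # best score of a path ending in each label
    backpointers = []
    for emit in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(n_labels):
            # Best previous label i for arriving at label j.
            best_i = max(range(n_labels),
                         key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(range(n_labels), key=lambda j: scores[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical scores for three tokens; the O -> I transition is heavily penalized.
emissions = [[1.0, 0.5, 0.0], [0.2, 0.1, 1.0], [1.0, 0.0, 0.3]]
transitions = [[0.0, 0.0, -10.0], [0.0, -1.0, 1.0], [0.0, 0.0, 0.5]]
path = viterbi(emissions, transitions)
```

Picking the best label per token from the emissions alone would give O, I, O, which contains the penalized O -> I transition. With the transition scores included, the decoder returns B, I, O instead.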
Vojtech Mrazek, Zdenek Vasicek, Lukas Sekanina, Muhammad Abdullah Hanif, Muhammad Shafique
State-of-the-art approaches employ approximate computing to reduce the
energy consumption of DNN hardware. Approximate DNNs then require extensive
retraining afterwards to recover from the accuracy loss caused by the use of
approximate operations. However, retraining of complex DNNs does not scale
well. In this paper, we demonstrate that efficient approximations can be
introduced into the computational path of DNN accelerators while retraining can
completely be avoided. ALWANN provides highly optimized implementations of DNNs
for custom low-power accelerators in which the number of computing units is
lower than the number of DNN layers. First, a fully trained DNN is converted to
operate with 8-bit weights and 8-bit multipliers in convolutional layers. A
suitable approximate multiplier is then selected for each computing element
from a library of approximate multipliers in such a way that (i) one
approximate multiplier serves several layers, and (ii) the overall
classification error and energy consumption are minimized. The optimizations
including the multiplier selection problem are solved by means of a
multiobjective optimization NSGA-II algorithm. In order to completely avoid the
computationally expensive retraining of DNN, which is usually employed to
improve the classification accuracy, we propose a simple weight updating scheme
that compensates for the inaccuracy introduced by employing approximate
multipliers. The proposed approach is evaluated for two architectures of DNN
accelerators with approximate multipliers from the open-source "EvoApprox"
library. We report that the proposed approach saves 30% of energy needed for
multiplication in convolutional layers of ResNet-50 while the accuracy is
degraded by only 0.6%. The proposed technique and approximate layers are
available as an open-source extension of TensorFlow at
https://github.com/ehw-fit/tf-approximate.
Authors' comments: Accepted for 2019 IEEE/ACM International Conference On Computer-Aided
Design (ICCAD'19)
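The effect of swapping an exact multiplier for an approximate one can be sketched on a single dot product of already-quantized 8-bit integers. The multiplier below simply zeroes the low bits of each operand's magnitude. It is a hypothetical stand-in chosen for brevity, not a circuit from the EvoApprox library, and the weight-updating scheme described above is not reproduced.

```python
def exact_multiply(a, b):
    """Reference exact integer multiplication."""
    return a * b

def truncated_multiply(a, b, dropped_bits=2):
    """Hypothetical approximate multiplier: zero the low bits of each magnitude."""
    mask = ~((1 << dropped_bits) - 1)
    sign = -1 if (a < 0) != (b < 0) else 1
    return sign * ((abs(a) & mask) * (abs(b) & mask))

def dot(weights, activations, multiply):
    """Dot product that uses the supplied multiplier for every product."""
    return sum(multiply(w, x) for w, x in zip(weights, activations))

# Hypothetical 8-bit weights and activations of one output neuron.
weights = [50, -25, 80]
activations = [12, 7, 100]
exact = dot(weights, activations, exact_multiply)
approx = dot(weights, activations, truncated_multiply)
relative_error = abs(approx - exact) / abs(exact)
```

Passing the multiplier as a parameter mirrors the idea of choosing a different approximate multiplier per computing element. In this toy case the approximate result differs from the exact one by under one percent.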
Zi-Kan Geng, Yue Jiang, Tianhong Wang, Hui-Wen Zheng, Guo-Li Wang
The Isgur-Wise function vastly reduces the weak-decay form factors of hadrons
containing one heavy quark. In this paper, we extract the Isgur-Wise functions
from the instantaneous Bethe-Salpeter method, and give the numerical results
for the $B_c$ decays to charmonium where the final states include $1S$, $1P$,
$2S$ and $2P$. The overlapping integral of the wave functions for the initial
and final states is the Isgur-Wise function, as in heavy quark effective
theory. For an accurate calculation, describing the form factors requires
introducing further relativistic corrections to the Isgur-Wise function, which
are overlapping integrals involving the relative momentum between the quark and
antiquark. The relativistic corrections to the Isgur-Wise function provide larger
contributions, especially when excited states are involved, and are therefore
necessary.
Authors' comments: 33 pages, 66 figures