Shaahin Angizi, Deliang Fan
With von Neumann computing architectures struggling to handle today's
computationally and memory-intensive big-data analytics tasks,
Processing-in-Memory (PIM) platforms are attracting growing interest. In
particular, processing-in-DRAM architectures have achieved remarkable success by
dramatically reducing data-transfer energy and latency. However, the
performance of such systems unavoidably diminishes for more
complex applications that require bulk bit-wise X(N)OR or addition operations,
despite utilizing maximum internal DRAM bandwidth and in-memory parallelism. In
this paper, we develop the DRIM platform, which harnesses DRAM as computational
memory and transforms it into a fundamental processing unit. DRIM uses the
analog operation of DRAM sub-arrays and elevates it to implement bit-wise
X(N)OR operations between operands stored on the same bit-line, based on a new
dual-row activation mechanism with a modest change to peripheral circuits such
as sense amplifiers. The simulation results show that DRIM achieves on average 71x
and 8.4x higher throughput for bulk bit-wise X(N)OR-based operations
compared with CPU and GPU, respectively. In addition, DRIM outperforms recent
processing-in-DRAM platforms with up to 3.7x better performance.
Authors' comments: 7 pages, 9 Figures
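As a rough illustration of the operations DRIM targets, the sketch below models DRAM rows as Python integers (one bit per bit-line) and computes the logical result of a row-wide XNOR, plus a bit-serial addition over a transposed (vertical) data layout built from XOR and a majority-based carry. It emulates only the logic, not DRIM's analog dual-row activation or sense-amplifier changes; ROW_WIDTH, the layout helpers and the XOR/majority decomposition of addition are assumptions made for illustration.

```python
ROW_WIDTH = 8                    # hypothetical number of bit-lines per row
MASK = (1 << ROW_WIDTH) - 1      # keeps results inside one row


def row_xnor(row_a: int, row_b: int) -> int:
    """Bit-wise XNOR of two rows; every bit-line is processed in parallel."""
    return ~(row_a ^ row_b) & MASK


def to_vertical(values: list[int], bits: int) -> list[int]:
    """Transpose operands into bit-plane rows: row k holds bit k of every operand."""
    return [sum(((v >> k) & 1) << col for col, v in enumerate(values))
            for k in range(bits)]


def from_vertical(rows: list[int]) -> list[int]:
    """Inverse of to_vertical: read one operand back from each bit-line."""
    return [sum(((row >> col) & 1) << k for k, row in enumerate(rows))
            for col in range(ROW_WIDTH)]


def bit_serial_add(a_rows: list[int], b_rows: list[int]) -> list[int]:
    """Add ROW_WIDTH operand pairs at once, one bit-plane (LSB first) per step."""
    carry, sum_rows = 0, []
    for a, b in zip(a_rows, b_rows):
        sum_rows.append(a ^ b ^ carry)               # sum bit = XOR of three rows
        carry = (a & b) | (a & carry) | (b & carry)  # carry = majority of three rows
    sum_rows.append(carry)                           # final carry-out plane
    return sum_rows


a = [0, 1, 2, 3, 4, 5, 6, 15]
b = [15, 14, 13, 12, 11, 10, 9, 15]
sums = from_vertical(bit_serial_add(to_vertical(a, 4), to_vertical(b, 4)))
print(sums)  # [15, 15, 15, 15, 15, 15, 15, 30]
```

Each loop iteration operates on whole rows, so the number of steps grows with the operand bit-width rather than with the number of operands, which is the kind of parallelism in-DRAM schemes exploit.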
Jon Hoffman
Neural networks accomplish amazing things, but they suffer from computational
and memory bottlenecks that restrict their usage. Nowhere is this more evident
than in the mobile space, where specialized hardware is being created just
to satisfy the demand for neural networks. Previous studies have shown that
neural networks have vastly more connections than they actually need to do
their work. This thesis develops a method that can compress networks to less
than 10% of their original memory and less than 25% of their original
computational cost, without loss of accuracy and without creating sparse
networks that require special code to run.
Authors' comments: Thesis for Masters degree
Yuning Chai
Recent advances in single-frame object detection and segmentation techniques
have motivated a wide range of works to extend these methods to process video
streams. In this paper, we explore the idea of hard attention aimed at
latency-sensitive applications. Instead of reasoning about every frame
separately, our method selects and only processes a small sub-window of the
frame. Our technique then makes predictions for the full frame based on the
sub-windows from previous frames and the update from the current sub-window.
The latency reduction by this hard attention mechanism comes at the cost of
degraded accuracy. We make two contributions to address this. First, we propose
a specialized memory cell that recovers lost context when processing
sub-windows. Second, we adopt a Q-learning-based policy training strategy
that enables our approach to intelligently select the sub-windows such that the
staleness in the memory hurts the performance the least. Our experiments
suggest that our approach reduces the latency by a factor of approximately four
without significantly sacrificing accuracy on the ImageNet VID video object
detection dataset and the DAVIS video object segmentation dataset. We further
demonstrate that we can reinvest the saved computation into other parts of the
network, thus achieving an accuracy increase at a computational cost comparable
to that of the original system and beating other recently proposed
state-of-the-art methods in the low-latency range.
Authors' comments: ICCV 2019 Camera Ready + Supplementary
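The hard-attention idea can be sketched with a toy full-frame memory: only the selected sub-window is recomputed each frame, the rest of the prediction is served from memory, and staleness counters record how old each cell is. The grid and window sizes and the helper names are hypothetical, the memory is a plain cache rather than the learned memory cell, and the selection rule (refresh the stalest window) is a simple heuristic standing in for the Q-learning policy described above.

```python
from dataclasses import dataclass, field

GRID = 4  # hypothetical: frame split into GRID x GRID cells
WIN = 2   # hypothetical: a sub-window covers WIN x WIN cells


def windows():
    """Top-left corners of the non-overlapping candidate sub-windows."""
    return [(r, c) for r in range(0, GRID, WIN) for c in range(0, GRID, WIN)]


@dataclass
class FrameMemory:
    """Plain cache of per-cell predictions (a stand-in for the learned memory cell)."""
    features: list = field(default_factory=lambda: [[None] * GRID for _ in range(GRID)])
    staleness: list = field(default_factory=lambda: [[0] * GRID for _ in range(GRID)])

    def update(self, frame, corner):
        """Recompute only the selected sub-window; every other cell ages by one frame."""
        r0, c0 = corner
        for r in range(GRID):
            for c in range(GRID):
                if r0 <= r < r0 + WIN and c0 <= c < c0 + WIN:
                    self.features[r][c] = frame[r][c]  # stand-in for running the network
                    self.staleness[r][c] = 0
                else:
                    self.staleness[r][c] += 1

    def full_frame_prediction(self):
        """Full-frame output assembled from fresh and stale cells."""
        return [row[:] for row in self.features]


def stalest_window(memory):
    """Heuristic policy (not Q-learning): refresh the window with the most total staleness."""
    def total(corner):
        r0, c0 = corner
        return sum(memory.staleness[r][c]
                   for r in range(r0, r0 + WIN) for c in range(c0, c0 + WIN))
    return max(windows(), key=total)


mem = FrameMemory()
order = []
for t in range(4):
    frame = [[t * 100 + r * GRID + c for c in range(GRID)] for r in range(GRID)]
    corner = stalest_window(mem)
    order.append(corner)
    mem.update(frame, corner)
print(order)  # [(0, 0), (0, 2), (2, 0), (2, 2)]
```

With this heuristic the windows are refreshed round-robin; a learned policy could instead favor regions where staleness is most harmful.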
Ashu Sharma, Sanjay K. Sahay
Among fast-growing smart devices, Android is the most popular OS, and because of
their attractive features, mobility, and ease of use, these devices hold sensitive
information such as personal data, browsing history, shopping history,
financial details, etc. Therefore, any security gap in these devices means that
the information stored on or accessed by them is at high risk of being
breached by malware. Such malware is continuously growing and is also
used for military espionage and for disrupting industry, power grids, etc. To
detect malware, traditional signature-matching techniques are widely
used. However, such strategies cannot detect advanced malicious Android
apps because malware developers use several obfuscation techniques.
Hence, researchers are continuously addressing the security issues of
Android-based smart devices. In this paper, using the Drebin benchmark
malware dataset, we experimentally demonstrate how to improve detection
accuracy by analyzing the apps after grouping the collected data based on
permissions, achieving an overall average accuracy of 97.15%. Our results
outperform the accuracy obtained without grouping the data (79.27%, 2017), as well
as Arp et al. (94%, 2014), Annamalai et al. (84.29%, 2016), Bahman Rashidi et al. (82%,
2017), and Ali Feizollah et al. (95.5%, 2017). The analysis also shows that,
among the groups, Microphone group apps are detected with the lowest accuracy,
while Calendar group apps are detected with the highest accuracy, and that
80-100 features should be used for the best performance.
Authors' comments: 9 pages, 20 Figures
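The grouping step can be sketched as follows: apps are partitioned by permission group and a separate classifier is evaluated within each group, giving per-group accuracies like those reported above. The abstract does not specify the classifier or feature encoding, so the 1-nearest-neighbor rule on Jaccard similarity of feature sets, the tuple layout and the toy permission names are placeholders, not the authors' setup.

```python
from collections import defaultdict


def group_by_permission(apps):
    """apps: iterable of (permission_group, feature_set, label) tuples."""
    groups = defaultdict(list)
    for group, features, label in apps:
        groups[group].append((features, label))
    return groups


def jaccard(a, b):
    """Similarity of two feature sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def predict_1nn(train, features):
    """Placeholder classifier: label of the most similar training app."""
    return max(train, key=lambda item: jaccard(item[0], features))[1]


def per_group_accuracy(train_apps, test_apps):
    """Train and evaluate separately inside each permission group."""
    train_groups = group_by_permission(train_apps)
    results = {}
    for group, samples in group_by_permission(test_apps).items():
        train = train_groups.get(group)
        if not train:
            continue  # no training apps in this group
        correct = sum(predict_1nn(train, f) == y for f, y in samples)
        results[group] = correct / len(samples)
    return results


# Toy, hypothetical data: feature sets use shortened permission names.
train_apps = [
    ("SMS", {"SEND_SMS", "READ_CONTACTS"}, "malware"),
    ("SMS", {"SEND_SMS"}, "benign"),
    ("Camera", {"CAMERA", "INTERNET"}, "benign"),
    ("Camera", {"CAMERA", "INTERNET", "READ_SMS"}, "malware"),
]
test_apps = [
    ("SMS", {"SEND_SMS", "READ_CONTACTS", "INTERNET"}, "malware"),
    ("Camera", {"CAMERA"}, "benign"),
]
print(per_group_accuracy(train_apps, test_apps))  # {'SMS': 1.0, 'Camera': 1.0}
```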
João Lita da Silva
The main purpose of this paper is to obtain strong laws of large numbers for
arrays or weighted sums of random variables under a scenario of dependence.
Namely, for triangular arrays $\{X_{n,k}, \, 1 \leqslant k \leqslant n, \, n
\geqslant 1 \}$ of row-wise extended negatively dependent random variables
weakly mean dominated by a random variable $X \in \mathscr{L}_{1}$ and
sequences $\{b_{n} \}$ of positive constants, conditions are given to ensure
$\sum_{k=1}^{n} \left(X_{n,k} - \mathbb{E} \, X_{n,k} \right)/b_{n}
\overset{\textnormal{a.s.}}{\longrightarrow} 0$. Our statements also allow us
to improve recent results about complete convergence.
Authors' comments: 15 pages
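The convergence statement can be illustrated (not proved) by simulation. Independent variables are a special case of extended negative dependence, so the sketch below uses a toy triangular array with independent entries X_{n,k} = U_{n,k} + k/n, U_{n,k} ~ Uniform(-1, 1), which are bounded by 2 and hence dominated by the integrable constant 2; the choice b_n = n and all names are assumptions for illustration.

```python
import random


def centered_row_average(n: int, rng: random.Random) -> float:
    """Return sum_{k<=n} (X_{n,k} - E X_{n,k}) / b_n for one row, with b_n = n.

    Toy array (an assumption): X_{n,k} = U_{n,k} + k/n with U ~ Uniform(-1, 1),
    independent within the row, so E X_{n,k} = k/n and |X_{n,k}| <= 2.
    """
    total = 0.0
    for k in range(1, n + 1):
        mean = k / n
        x = rng.uniform(-1.0, 1.0) + mean
        total += x - mean
    return total / n


rng = random.Random(2019)
for n in (10, 100, 1_000, 10_000, 100_000):
    # The normalized centered row sums shrink toward 0 as n grows.
    print(n, round(centered_row_average(n, rng), 5))
```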
Raffaele D'Abrusco, Nuria Alvarez Crespo, Francesco Massaro, Riccardo Campana, Vahram Chavushyan, Marco Landoni, Fabio La Franca, Nicola Masetti et al.
We present two catalogs of radio-loud candidate blazars whose WISE
mid-infrared colors are selected to be consistent with the colors of confirmed
gamma-ray emitting blazars. The first catalog is the improved and expanded
release of the WIBRaLS catalog presented by D'Abrusco et al. (2014): it
includes sources detected in all four WISE filters, spatially cross-matched
with a radio source in one of three radio surveys, and classified as radio-loud
based on their q22 spectral parameter. WIBRaLS2 includes 9541 sources classified as BL Lacs,
FSRQs or mixed candidates based on their WISE colors. The second catalog,
called KDEBLLACS, based on a new selection technique, contains 5579 candidate
BL Lacs extracted from the population of WISE sources detected in the first
three WISE passbands ([3.4], [4.6] and [12]) only, whose mid-infrared colors
are similar to those of confirmed gamma-ray BL Lacs. KDEBLLACS members are
also required to have a radio counterpart and to be radio-loud based on the
parameter q12, defined similarly to the q22 used for WIBRaLS2. We describe the
properties of these catalogs and compare them with the largest samples of
confirmed and candidate blazars in the literature. We crossmatch the two new
catalogs with the most recent catalogs of gamma-ray sources detected by the Fermi
LAT instrument. Since spectroscopic observations of candidate blazars from the
first WIBRaLS catalog within the uncertainty regions of gamma-ray unassociated
sources confirmed that ~90% of these candidates are blazars, we anticipate that
these new catalogs will again play an important role in identifying the sources
of the gamma-ray sky.
Authors' comments: 20 pages, 7 figures. Accepted for publication in The Astrophysical Journal Supplement Series
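The two selection ingredients described above can be sketched together: a kernel density estimate over the WISE color-color plane of confirmed gamma-ray BL Lacs, and a radio-loudness cut. Here q is assumed to be log10 of the infrared-to-radio flux ratio (so radio-loud sources have small q), the colors are taken as ([3.4]-[4.6], [4.6]-[12]), and the bandwidth, density threshold and q cut are hypothetical values, not those used to build KDEBLLACS or WIBRaLS2.

```python
import math


def q_parameter(ir_flux_mjy: float, radio_flux_mjy: float) -> float:
    """Radio-loudness parameter, assumed here to be log10(IR flux / radio flux)."""
    return math.log10(ir_flux_mjy / radio_flux_mjy)


def gaussian_kde_2d(points, bandwidth):
    """Isotropic Gaussian kernel density estimate over 2D color-color points."""
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        return norm * sum(
            math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * bandwidth ** 2))
            for px, py in points)
    return density


def select_candidates(sources, reference_colors, bandwidth, density_min, q_max):
    """Keep sources in the dense region of the reference BL Lac colors that are
    also radio-loud (q below q_max). All thresholds are hypothetical."""
    density = gaussian_kde_2d(reference_colors, bandwidth)
    return [s["name"] for s in sources
            if density(*s["colors"]) >= density_min
            and q_parameter(s["ir_mjy"], s["radio_mjy"]) < q_max]


# Hypothetical reference colors ([3.4]-[4.6], [4.6]-[12]) and test sources.
reference = [(0.9, 2.2), (1.0, 2.3), (1.1, 2.4)]
sources = [
    {"name": "A", "colors": (1.0, 2.3), "ir_mjy": 5.0, "radio_mjy": 100.0},
    {"name": "B", "colors": (0.2, 0.5), "ir_mjy": 5.0, "radio_mjy": 100.0},
    {"name": "C", "colors": (1.0, 2.3), "ir_mjy": 50.0, "radio_mjy": 1.0},
]
print(select_candidates(sources, reference, 0.2, 1.0, 0.0))  # ['A']
```

Source B fails the color-density cut and source C fails the radio-loudness cut.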
Kanav Vats, Helmut Neher, Alexander Wong, David A. Clausi, John Zelek
In this paper, we present a novel approach called KPTransfer for improving modeling performance for keypoint detection deep neural networks via domain transfer between different keypoint subsets. This approach is motivated by the notion that rich contextual knowledge can be transferred between different keypoint subsets representing separate domains. In particular, the proposed method takes into account various keypoint subsets/domains by sequentially adding and removing keypoints. Contextual knowledge is transferred between two separate domains via domain transfer. Experiments to demonstrate the efficacy of the proposed KPTransfer approach were performed for the task of human pose estimation on the MPII dataset, with comparisons against random initialization and frozen weight extraction configurations. Experimental results demonstrate the efficacy of performing domain transfer between two different joint subsets, resulting in a PCKh improvement of up to 1.1 over random initialization on joints such as wrists and knees in certain joint splits, with an overall PCKh improvement of 0.5. Domain transfer from a different set of joints not only improves accuracy but also results in faster convergence because of mutual co-adaptations of weights resulting from the contextual knowledge of the pose from a different set of joints.
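The improvements above are quoted in PCKh, the MPII keypoint metric. A minimal sketch of per-joint PCKh, assuming the usual PCKh@0.5 criterion (a joint counts as correct if it lies within half the head segment length of the ground truth); it ignores MPII's handling of invisible joints, and the data layout is an assumption.

```python
import math


def pckh(pred, gt, head_sizes, alpha=0.5):
    """Per-joint PCKh in percent.

    pred, gt: one dict per image mapping joint name -> (x, y) in pixels.
    head_sizes: head segment length (pixels) for each image.
    A prediction is correct if it lies within alpha * head size of the ground truth.
    """
    hits, counts = {}, {}
    for p_img, g_img, head in zip(pred, gt, head_sizes):
        for joint, (gx, gy) in g_img.items():
            px, py = p_img[joint]
            counts[joint] = counts.get(joint, 0) + 1
            if math.hypot(px - gx, py - gy) <= alpha * head:
                hits[joint] = hits.get(joint, 0) + 1
    return {j: 100.0 * hits.get(j, 0) / n for j, n in counts.items()}


gt = [{"wrist": (10, 10), "knee": (50, 50)}, {"wrist": (20, 20), "knee": (60, 60)}]
pred = [{"wrist": (13, 14), "knee": (50, 65)}, {"wrist": (20, 28), "knee": (60, 63)}]
print(pckh(pred, gt, head_sizes=[20, 20]))  # {'wrist': 100.0, 'knee': 50.0}
```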
Lin Xu, Han Sun, Yuai Liu
Deep metric learning is essential for visual recognition. The widely used
pair-wise (or triplet) based loss objectives cannot make full use of semantic
information in training samples or give enough attention to hard samples
during optimization. Thus, they often suffer from a slow convergence rate and
inferior performance. In this paper, we show how to learn an importance-driven
distance metric via optimal transport programming from batches of samples. It
can automatically emphasize hard examples and lead to significant improvements
in convergence. We propose a new batch-wise optimal transport loss and incorporate
it into an end-to-end deep metric learning framework. We use it to learn the
distance metric and deep feature representation jointly for recognition.
Empirical results on visual retrieval and classification tasks with six
benchmark datasets, i.e., MNIST, CIFAR10, SHREC13, SHREC14, ModelNet10, and
ModelNet40, demonstrate the superiority of the proposed method. It can
accelerate the convergence rate significantly while achieving a
state-of-the-art recognition performance. For example, in 3D shape recognition
experiments, we show that our method can achieve better recognition performance
within only 5 epochs than what can be obtained by mainstream 3D shape
recognition approaches after 200 epochs.
Authors' comments: 10 pages, 4 figures Accepted by CVPR2019
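One way to read importance-driven weighting via optimal transport is sketched below: pairwise contrastive losses within a batch are weighted by an entropy-regularized transport plan (Sinkhorn iterations) whose ground cost is low where the pair loss is high, so hard pairs receive more mass. This is an assumed formulation for illustration; the exact ground metric and loss in the paper may differ, and the function names, margin and regularization values are hypothetical.

```python
import math


def sinkhorn(cost, reg=0.1, iters=200):
    """Entropy-regularized optimal transport plan with uniform marginals."""
    n, m = len(cost), len(cost[0])
    K = [[math.exp(-c / reg) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [(1.0 / n) / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def pair_losses(emb, labels, margin=1.0):
    """Contrastive loss per pair: pull positives together, push negatives past the margin."""
    n = len(emb)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(emb[i], emb[j])
            out[i][j] = d ** 2 if labels[i] == labels[j] else max(0.0, margin - d) ** 2
    return out


def ot_weighted_loss(emb, labels, margin=1.0, reg=0.1):
    """Weight each pair by an OT plan whose cost is low where the pair loss is high."""
    losses = pair_losses(emb, labels, margin)
    top = max(max(row) for row in losses)
    cost = [[top - loss for loss in row] for row in losses]  # hard pairs are cheap
    plan = sinkhorn(cost, reg)
    return sum(p * loss for prow, lrow in zip(plan, losses) for p, loss in zip(prow, lrow))


emb = [(0.0, 0.0), (0.5, 0.0), (0.2, 0.0), (1.0, 0.0)]  # toy 2D embeddings
labels = [0, 0, 1, 1]
uniform = sum(map(sum, pair_losses(emb, labels))) / 16  # every pair weighted equally
print(uniform, ot_weighted_loss(emb, labels))
```

Compared with uniform averaging, the transport plan concentrates weight on the hard positive and hard negative pairs, which is the effect the abstract attributes to its loss.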
Pedro Hespanhol, Rien Quirynen
Nonlinear model predictive control (NMPC) generally requires the solution of a non-convex optimization problem at each sampling instant under strict timing constraints, based on a set of differential equations that can often be stiff and/or that may include implicit algebraic equations. This paper provides a local convergence analysis for the recently proposed adjoint-based sequential quadratic programming (SQP) algorithm, which is based on a block-structured variant of the two-sided rank-one (TR1) quasi-Newton update formula to efficiently compute Jacobian matrix approximations in a sparsity-preserving fashion. A particularly efficient algorithm implementation is proposed for the case in which an implicit integration scheme is used to discretize the optimal control problem; in that case, matrix factorization and matrix-matrix operations can be avoided entirely. The convergence analysis results as well as the computational performance of the proposed optimization algorithm are illustrated for two simulation case studies of nonlinear MPC.
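For reference, a dense (unstructured) form of the two-sided rank-one update can be sketched as follows, as we understand it: given a primal step s with y = J s and an adjoint direction sigma with mu^T = sigma^T J, the update A_new = A + (y - A s)(mu^T - sigma^T A) / (sigma^T (y - A s)) satisfies both A_new s = y and sigma^T A_new = mu^T. The block-structured, sparsity-preserving variant and the SQP loop analyzed in the paper are not reproduced, and the helper names are ours.

```python
def matvec(A, x):
    """A @ x for a dense matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def vecmat(s, A):
    """s^T @ A for a dense matrix stored as a list of rows."""
    return [sum(s[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]


def tr1_update(A, s, y, sigma, mu, tol=1e-12):
    """Dense TR1 sketch: enforce A_new s = y and sigma^T A_new = mu^T."""
    r = [yi - ai for yi, ai in zip(y, matvec(A, s))]        # direct secant residual
    w = [mi - ti for mi, ti in zip(mu, vecmat(sigma, A))]   # adjoint secant residual
    denom = sum(si * ri for si, ri in zip(sigma, r))        # sigma^T (y - A s)
    if abs(denom) < tol:
        return [row[:] for row in A]                        # skip an ill-defined update
    return [[A[i][j] + r[i] * w[j] / denom for j in range(len(A[0]))]
            for i in range(len(A))]


J = [[2.0, 1.0], [0.0, 3.0]]     # "true" Jacobian at the new iterate (toy values)
A = [[1.0, 0.0], [0.0, 1.0]]     # current approximation
s, sigma = [1.0, 2.0], [1.0, 1.0]
y, mu = matvec(J, s), vecmat(sigma, J)   # forward and adjoint derivative information
A_new = tr1_update(A, s, y, sigma, mu)
print(matvec(A_new, s), vecmat(sigma, A_new))  # matches y and mu up to rounding
```

Both secant conditions hold exactly only when sigma^T y = mu^T s, which is the case here because y and mu come from the same Jacobian.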
Moritz Böhle, Fabian Eitel, Martin Weygandt, Kerstin Ritter
Deep neural networks have led to state-of-the-art results in many medical imaging tasks including Alzheimer's disease (AD) detection based on structural magnetic resonance imaging (MRI) data. However, the network decisions are often perceived as being highly non-transparent, making it difficult to apply these algorithms in clinical routine. In this study, we propose using layer-wise relevance propagation (LRP) to visualize convolutional neural network decisions for AD based on MRI data. Similarly to other visualization methods, LRP produces a heatmap in the input space indicating the importance/relevance of each voxel contributing to the final classification outcome. In contrast to susceptibility maps produced by guided backpropagation ("Which change in voxels would change the outcome most?"), the LRP method is able to directly highlight positive contributions to the network classification in the input space. In particular, we show that (1) the LRP method is very specific for individuals ("Why does this person have AD?") with high inter-patient variability, (2) there is very little relevance for AD in healthy controls and (3) areas that exhibit a lot of relevance correlate well with what is known from literature. To quantify the latter, we compute size-corrected metrics of the summed relevance per brain area, e.g., relevance density or relevance gain. Although these metrics produce very individual "fingerprints" of relevance patterns for AD patients, a lot of importance is put on areas in the temporal lobe including the hippocampus. After discussing several limitations such as sensitivity toward the underlying model and computation parameters, we conclude that LRP might have a high potential to assist clinicians in explaining neural network decisions for diagnosing AD (and potentially other diseases) based on structural MRI data.
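A minimal sketch of relevance propagation for a small dense ReLU network, using the LRP-epsilon rule (one common LRP variant; the rule and parameters used in the study may differ). Relevance starts at the target logit and is redistributed layer by layer in proportion to each input's contribution x_i w_ij to the pre-activation z_j, so with zero biases the input relevances approximately sum to the logit. The network weights and helper names are hypothetical.

```python
def dense(W, b, x):
    """z = W x + b for W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def lrp_epsilon(W, x, z, R_out, eps=1e-6):
    """LRP-epsilon: R_in[i] = sum_j x[i] W[j][i] / (z[j] + eps*sign(z[j])) * R_out[j]."""
    R_in = [0.0] * len(x)
    for j, row in enumerate(W):
        stab = z[j] + (eps if z[j] >= 0 else -eps)  # stabilizer avoids division by ~0
        for i, xi in enumerate(x):
            R_in[i] += xi * row[i] / stab * R_out[j]
    return R_in


def explain(W1, b1, W2, b2, x, target):
    """Relevance of each input for the chosen output of a dense-ReLU-dense network."""
    z1 = dense(W1, b1, x)
    a1 = [max(0.0, v) for v in z1]  # ReLU: relevance is passed through unchanged
    z2 = dense(W2, b2, a1)
    R2 = [z2[k] if k == target else 0.0 for k in range(len(z2))]  # start at target logit
    R1 = lrp_epsilon(W2, a1, z2, R2)
    return lrp_epsilon(W1, x, z1, R1)


W1 = [[1.0, -0.5], [0.5, 1.0]]   # hypothetical weights
W2 = [[1.0, 1.0], [-1.0, 0.5]]
R0 = explain(W1, [0.0, 0.0], W2, [0.0, 0.0], [2.0, 1.0], target=0)
print(R0, sum(R0))  # approximately [3.0, 0.5] and 3.5 (the target logit)
```

In the imaging setting the same redistribution runs through convolutional layers, producing one relevance value per voxel.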
Xiaoshui Huang, Lixin Fan, Qiang Wu, Jian Zhang, Chun Yuan
Many types of 3D acquisition sensors have emerged in recent years, and point
clouds have been widely used in many areas. Accurate and fast registration of
cross-source 3D point clouds from different sensors is an emerging research
problem in computer vision. This problem is extremely challenging because
cross-source point clouds contain a mixture of variations, such as differences
in density, partial overlap, large noise and outliers, and viewpoint changes. In this
paper, an algorithm is proposed to align cross-source point clouds with both
high accuracy and high efficiency. There are two main contributions: first,
two components, the weak region affinity and pixel-wise refinement, are
proposed to maintain the global and local information of 3D point clouds. Then,
these two components are integrated into an iterative tensor-based registration
algorithm to solve the cross-source point cloud registration problem. We
conduct experiments on a synthetic cross-source benchmark dataset and real
cross-source datasets. Compared with six state-of-the-art methods, the
proposed method obtains both higher efficiency and higher accuracy.
Authors' comments: ICME 2019
Tamal Batabyal, Barry Condron, Scott T. Acton
The shape and connectivity of a neuron determine its function. Modern imaging
methods have proven successful at extracting such information. However, in
order to analyze this type of data, neuronal morphology needs to be encoded
with a graph-theoretic representation. This encoding enables the use of high-throughput
informatics methods to extract and infer brain function. The application of
graph-theoretic methods to neuronal morphological representation comes with
certain difficulties. Here we report a novel, effective method to accomplish
this task.
The morphology of a neuron, which consists of its overall size, global shape,
local branch patterns, and cell-specific biophysical properties, can vary
significantly with the cell's identity, location, as well as developmental and
physiological state. Various algorithms have been developed to design customized
shape-based statistical and graph-related features for quantitative analysis of
neuromorphology, followed by classification of neuron cell types using these
features. Unlike classical feature-extraction-based methods applied to imaged or
3D-reconstructed neurons, we propose a model based on the rooted path
decomposition from the soma to the dendrites of a neuron and extract
morphological features on each path. We hypothesize that measuring the distance
between two neurons can be realized by minimizing the cost of continuously
morphing the set of all rooted paths of one neuron to another. To validate this
claim, we first establish the correspondence of paths between two neurons using
a modified Munkres algorithm. Next, an elastic deformation framework that
employs the square root velocity function is established to perform the
continuous morphing, which, in addition, provides an effective visualization
tool. We experimentally show the efficacy of NeuroPath2Path (NeuroP2P) compared
with the state of the art.
Authors' comments: Submitted to Neuroinformatics
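The two building blocks named above can be sketched in simplified form: the square root velocity function q(t) = f'(t) / sqrt(|f'(t)|) of a discretized rooted path, an L2 distance between SRVFs, and a minimum-cost one-to-one matching of paths. The sketch assumes both neurons have the same number of paths, each sampled at the same number of points, omits the elastic reparametrization step, and uses brute force over permutations in place of the modified Munkres algorithm (which solves the assignment problem in polynomial time). All names are hypothetical.

```python
import math
from itertools import permutations


def srvf(path):
    """Discrete SRVF q = f' / sqrt(|f'|) of a 3D polyline sampled uniformly on [0, 1]."""
    n = len(path) - 1
    q = []
    for p, p_next in zip(path, path[1:]):
        vel = [(b - a) * n for a, b in zip(p, p_next)]  # finite-difference derivative
        speed = math.sqrt(sum(v * v for v in vel))
        q.append([v / math.sqrt(speed) if speed > 0 else 0.0 for v in vel])
    return q


def srvf_distance(path_a, path_b):
    """L2 distance between SRVFs (no reparametrization or rotation alignment)."""
    qa, qb = srvf(path_a), srvf(path_b)
    dt = 1.0 / len(qa)
    sq = sum(sum((x - y) ** 2 for x, y in zip(u, v)) for u, v in zip(qa, qb))
    return math.sqrt(sq * dt)


def match_paths(paths_a, paths_b):
    """Minimum-cost one-to-one matching by brute force (Munkres does this efficiently)."""
    best = min(permutations(range(len(paths_b))),
               key=lambda perm: sum(srvf_distance(paths_a[i], paths_b[j])
                                    for i, j in enumerate(perm)))
    return list(best)


straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
bent = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
up = [(0, 0, 0), (0, 0, 1), (0, 0, 2)]
print(match_paths([straight, bent, up], [up, straight, bent]))  # [1, 2, 0]
```

Summing the matched distances gives one possible neuron-to-neuron dissimilarity under these simplifications.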
Zhixin Wang, Kui Jia
In this work, we propose a novel method termed Frustum ConvNet
(F-ConvNet) for amodal 3D object detection from point clouds. Given 2D region
proposals in an RGB image, our method first generates a sequence of frustums
for each region proposal, and uses the obtained frustums to group local points.
F-ConvNet aggregates point-wise features into frustum-level feature vectors and
arranges these feature vectors into a feature map for use by its subsequent
fully convolutional network (FCN) component, which spatially fuses
frustum-level features and supports an end-to-end and continuous estimation of
oriented boxes in the 3D space. We also propose component variants of
F-ConvNet, including an FCN variant that extracts multi-resolution frustum
features, and a refined use of F-ConvNet over a reduced 3D space. Careful
ablation studies verify the efficacy of these component variants. F-ConvNet
assumes no prior knowledge of the working 3D environment and is thus
dataset-agnostic. We present experiments on both the indoor SUN-RGBD and
outdoor KITTI datasets. F-ConvNet outperforms all existing methods on SUN-RGBD,
and at the time of submission it outperforms all published works on the KITTI
benchmark. Code has been made available at
https://github.com/zhixinwang/frustum-convnet
Authors' comments: IROS 2019
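The frustum-generation step can be sketched roughly as follows, under simplifying assumptions: a pinhole camera with hypothetical intrinsics, points already in camera coordinates, and frustum slices taken along camera depth z with a given stride and thickness (the paper's exact slicing geometry may differ). Max pooling of raw coordinates stands in for the learned point-wise features that F-ConvNet aggregates per frustum.

```python
def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point (x right, y down, z forward)."""
    x, y, z = point
    return fx * x / z + cx, fy * y / z + cy


def frustum_groups(points, box, d_min, d_max, stride, height, intrinsics):
    """Group the points inside a 2D box into depth slices centred every `stride`.

    box = (u_min, v_min, u_max, v_max) in pixels; each slice is `height` thick,
    so slices overlap when height > stride.
    """
    u0, v0, u1, v1 = box
    in_box = []
    for p in points:
        if p[2] <= 0:
            continue  # behind the camera
        u, v = project(p, *intrinsics)
        if u0 <= u <= u1 and v0 <= v <= v1:
            in_box.append(p)
    count = int(round((d_max - d_min) / stride)) + 1
    centers = [d_min + k * stride for k in range(count)]
    return [[p for p in in_box if abs(p[2] - c) <= height / 2] for c in centers]


def frustum_features(groups):
    """Placeholder aggregation: per-axis max over each slice (zeros if empty)."""
    return [[max(p[k] for p in g) for k in range(3)] if g else [0.0, 0.0, 0.0]
            for g in groups]


intrinsics = (100.0, 100.0, 50.0, 50.0)  # hypothetical fx, fy, cx, cy
points = [(0, 0, 1.0), (0.1, 0, 1.5), (0, 0, 2.0), (5, 0, 1.0), (0, 0, -1.0)]
groups = frustum_groups(points, (40, 40, 60, 60), 1.0, 2.0, 0.5, 1.0, intrinsics)
print([len(g) for g in groups])  # [2, 3, 2]
```

Stacking the per-slice feature vectors in depth order yields the kind of frustum-level feature map that the FCN component then fuses.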
Julien M. Hendrickx, Balazs Gerencser, Baris Fidan
We consider trajectories where the sign of the derivative of each entry is
opposite to that of the corresponding entry in the gradient of an energy
function. We show that this condition guarantees convergence when the energy
function is quadratic and positive definite and partly extend that result to
some classes of positive semi-definite quadratic functions including those
defined using a graph Laplacian. We show how this condition allows us to establish
the convergence of a platoon application in which it arises naturally, due to
deadzones in the control laws designed to avoid instabilities caused by
inconsistent measurements of the same distance by different agents.
Authors' comments: 6 pages, 2 figures, double columns
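A small numerical illustration of the setting, under assumptions of our own: a positive definite quadratic energy E(x) = x^T Q x / 2 and a forward-Euler trajectory dx_i/dt = -d_i(t) g_i, where g = Q x and the gains d_i(t) vary in [0.5, 1.5]. Every entry of the derivative then has the sign opposite to the corresponding gradient entry, as in the condition above, although this is only one instance of it (no deadzones, no semi-definite case).

```python
import math


def energy(Q, x):
    """E(x) = 0.5 * x^T Q x."""
    n = len(x)
    return 0.5 * sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))


def gradient(Q, x):
    """grad E(x) = Q x for symmetric Q."""
    return [sum(q * xj for q, xj in zip(row, x)) for row in Q]


def simulate(Q, x0, h=0.01, steps=2000):
    """Euler steps with per-entry, time-varying positive gains (not plain gradient descent)."""
    x = list(x0)
    for k in range(steps):
        g = gradient(Q, x)
        gains = [1.0 + 0.5 * math.sin(0.1 * k + i) for i in range(len(x))]  # in [0.5, 1.5]
        x = [xi - h * d * gi for xi, d, gi in zip(x, gains, g)]  # sign(dx_i) = -sign(g_i)
    return x


Q = [[2.0, 0.5], [0.5, 1.0]]  # symmetric positive definite (toy values)
x0 = [1.0, -2.0]
xT = simulate(Q, x0)
print(energy(Q, x0), energy(Q, xT))  # the energy decays toward 0
```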
Shengfan Wang, Xin Jiang, Jie Zhao, Xiaoman Wang, Weiguo Zhou, Yunhui Liu, Fellow IEEE
This paper presents an efficient neural network model to generate robotic
grasps from high-resolution images. The proposed model uses a fully convolutional
neural network to generate robotic grasps for each pixel of 400 $\times$ 400
high-resolution RGB-D images. It first down-samples the images to obtain features
and then up-samples those features to the original input size while
combining local and global features from different feature maps. Compared with
other regression or classification methods for detecting robotic grasps, our
method is closer to segmentation methods, which solve the problem
pixel-wise. We use the Cornell Grasp Dataset to train and evaluate the
model, obtaining accuracies of about 94.42% (image-wise split) and 91.02%
(object-wise split) with a fast prediction time of about 8 ms. We also demonstrate
that, without training on a multiple-object dataset, our model can directly output
robotic grasp candidates for different objects thanks to its pixel-wise
implementation.
Authors' comments: Submitted to ROBIO 2019
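Pixel-wise grasp networks are typically decoded by picking the pixel with the highest predicted grasp quality. The sketch below assumes a common output representation (a quality map, the grasp angle encoded as cos 2θ and sin 2θ, and a width map); the abstract does not state which maps this model outputs, so the map names and layout are hypothetical.

```python
import math


def best_grasp(quality, cos2, sin2, width):
    """Pick the highest-quality pixel and decode its grasp.

    All inputs are H x W nested lists. The angle is recovered from the
    (cos 2θ, sin 2θ) encoding, which is symmetric under θ -> θ + π.
    """
    rows, cols = len(quality), len(quality[0])
    r, c = max(((r, c) for r in range(rows) for c in range(cols)),
               key=lambda rc: quality[rc[0]][rc[1]])
    angle = 0.5 * math.atan2(sin2[r][c], cos2[r][c])
    return {"row": r, "col": c, "angle": angle,
            "width": width[r][c], "quality": quality[r][c]}


theta = math.pi / 4
quality = [[0.1, 0.9], [0.3, 0.2]]          # toy 2 x 2 output maps
cos2 = [[1.0, math.cos(2 * theta)], [1.0, 1.0]]
sin2 = [[0.0, math.sin(2 * theta)], [0.0, 0.0]]
width = [[10.0, 30.0], [20.0, 15.0]]
print(best_grasp(quality, cos2, sin2, width))
```

Encoding 2θ rather than θ avoids a discontinuity, since a parallel-jaw grasp rotated by π is the same grasp.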
Ashadullah Shawon, Syed Tauhid Zuhori, Firoz Mahmud, Md. Jamil-Ur Rahman
A web browser should not only be for browsing web pages but should also help users
find their target websites and recommend similar websites based on
their behavior. In this paper, we propose two methods to make a web
browser more intelligent: link prediction, which works while the user types in the
address bar, and recommendation of websites according to several categories. Our
proposed link prediction system is in fact a frecency prediction, computed
from the first visit, last visit, and URL count. The recommendation
system is the more challenging part, as it needs to classify web URLs according
to their names without visiting the web pages. We therefore use an existing model
for URL classification. Because the only existing approach gives unsatisfactory
results and low accuracy, we add hyperparameter optimization to it, which
finds the best parameters for the existing URL classification model and gives
better accuracy. In this paper, we propose a category-wise recommendation
system using the frecency value and the total visits of each URL category.
Authors' comments: preprint
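The abstract does not give the frecency formula, so the sketch below uses a hypothetical score that combines the three inputs it names (first visit, last visit and URL count): a visit count plus a visit rate, discounted by an exponential recency factor with an assumed one-week half-life. Address-bar prediction then ranks the URLs matching the typed prefix. This is neither the authors' formula nor any browser's actual algorithm.

```python
DAY = 86400.0  # seconds per day


def frecency(count, first_visit, last_visit, now, half_life_days=7.0):
    """Hypothetical frecency: (visits + visits per day of use) * recency decay."""
    age_days = max(0.0, (now - last_visit) / DAY)
    recency = 0.5 ** (age_days / half_life_days)            # halves every half-life
    span_days = max(1.0, (last_visit - first_visit) / DAY)  # period the URL has been used
    return (count + count / span_days) * recency


def predict_links(history, typed_prefix, now, k=3):
    """Rank history entries that start with the typed prefix, best first."""
    scored = [(frecency(h["count"], h["first"], h["last"], now), url)
              for url, h in history.items() if url.startswith(typed_prefix)]
    return [url for _, url in sorted(scored, reverse=True)[:k]]


now = 100 * DAY
history = {  # hypothetical visit records, timestamps in seconds
    "github.com": {"count": 50, "first": 0.0, "last": 99 * DAY},
    "gitlab.com": {"count": 5, "first": 90 * DAY, "last": 99.5 * DAY},
    "google.com": {"count": 200, "first": 0.0, "last": 10 * DAY},
}
print(predict_links(history, "git", now))  # ['github.com', 'gitlab.com']
```

Despite its high visit count, google.com ranks last for the prefix "g" because its last visit was long ago.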
Zeyu Cui, Zekun Li, Shu Wu, Xiaoyu Zhang, Liang Wang
With the rapid development of the fashion market, customers' demand for
fashion recommendation is rising. In this paper, we aim to
investigate a practical problem of fashion recommendation by answering the
question "which item should we select to match with the given fashion items and
form a compatible outfit". The key to this problem is to estimate the outfit
compatibility. Previous works which focus on the compatibility of two items or
represent an outfit as a sequence fail to make full use of the complex
relations among items in an outfit. To remedy this, we propose to represent an
outfit as a graph. In particular, we construct a Fashion Graph, where each node
represents a category and each edge represents the interaction between two
categories. Accordingly, each outfit can be represented as a subgraph by
putting items into their corresponding category nodes. To infer the outfit
compatibility from such a graph, we propose Node-wise Graph Neural Networks
(NGNN) which can better model node interactions and learn better node
representations. In NGNN, the node interaction on each edge is different and
is determined by parameters associated with the two connected nodes. An attention
mechanism is used to calculate the outfit compatibility score from the learned
node representations. NGNN can model outfit compatibility not only from a single
visual or textual modality but also from multiple modalities. We conduct
experiments on two tasks: (1) Fill-in-the-blank: suggesting an item that
matches the existing components of an outfit; (2) Compatibility prediction:
predicting the compatibility scores of given outfits. Experimental results
demonstrate the superiority of our proposed method over others.
Authors' comments: 11 pages, accepted by the 2019 World Wide Web Conference (WWW-2019)
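A simplified sketch of node-wise propagation and attention scoring on an outfit subgraph: the message on edge (i, j) passes through an output matrix owned by node i and an input matrix owned by node j, so each edge interaction depends on the parameters of both connected nodes, as described above. The tanh residual update (in place of a recurrent update), the sigmoid-gated score, and all names and shapes are assumptions, not the exact NGNN formulation.

```python
import math


def matvec(M, v):
    """M @ v for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def ngnn_step(h, edges, W_out, W_in):
    """One propagation step; the message on (i, j) is W_in[j] @ (W_out[i] @ h[i])."""
    new_h = {}
    for j in h:
        incoming = [matvec(W_in[j], matvec(W_out[i], h[i])) for i, dst in edges if dst == j]
        agg = [sum(vals) for vals in zip(*incoming)] if incoming else [0.0] * len(h[j])
        new_h[j] = [math.tanh(a + b) for a, b in zip(h[j], agg)]  # simple residual update
    return new_h


def compatibility(h, att_w, score_w):
    """Outfit score: sum over nodes of sigmoid(att_w . h_i) * tanh(score_w . h_i)."""
    total = 0.0
    for vec in h.values():
        att = 1.0 / (1.0 + math.exp(-sum(a * x for a, x in zip(att_w, vec))))
        total += att * math.tanh(sum(s * x for s, x in zip(score_w, vec)))
    return total


I = [[1.0, 0.0], [0.0, 1.0]]  # identity matrices as hypothetical per-node parameters
h = {"top": [0.5, 0.0], "bottom": [0.0, 0.5], "shoes": [0.2, 0.2]}
edges = [("top", "bottom"), ("bottom", "top"), ("bottom", "shoes"), ("shoes", "bottom")]
new = ngnn_step(h, edges, {n: I for n in h}, {n: I for n in h})
score = compatibility(new, [1.0, 0.0], [0.0, 1.0])
print(new["top"], score)
```

In a trained model, each category node would own distinct matrices, so a top-bottom edge and a bottom-shoes edge would transform messages differently.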
Jaeyoung Lee, Aravind Balakrishnan, Ashish Gaurav, Krzysztof Czarnecki, Sean Sedwards
Machine learning can provide efficient solutions to the complex problems encountered in autonomous driving, but ensuring their safety remains a challenge. A number of authors have attempted to address this issue, but there are few publicly available tools to adequately explore the trade-offs between functionality, scalability, and safety. We thus present WiseMove, a software framework to investigate safe deep reinforcement learning in the context of motion planning for autonomous driving. WiseMove adopts a modular learning architecture that suits our current research questions and can be adapted to new technologies and new questions. We present the details of WiseMove, demonstrate its use on a common traffic scenario, and describe how we use it in our ongoing safe learning research.
Markus Suchi, Timothy Patten, David Fischinger, Markus Vincze
Developing robot perception systems for recognizing objects in the real world
requires computer vision algorithms to be carefully scrutinized with respect to
the expected operating domain. This demands large quantities of ground truth
data to rigorously evaluate the performance of algorithms. This paper presents
the EasyLabel tool for easily acquiring high quality ground truth annotation of
objects at the pixel-level in densely cluttered scenes. In a semi-automatic
process, complex scenes are incrementally built and EasyLabel exploits depth
change to extract precise object masks at each step. We use this tool to
generate the Object Cluttered Indoor Dataset (OCID) that captures diverse
settings of objects, background, context, sensor-to-scene distance, viewpoint
angle and lighting conditions. OCID is used to perform a systematic comparison
of existing object segmentation methods. The baseline comparison supports the
need for pixel- and object-wise annotation to progress robot vision towards
realistic applications. This insight reveals the usefulness of EasyLabel and
OCID to better understand the challenges that robots face in the real world.
Authors' comments: 7 pages, 8 figures, ICRA2019, Draft
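The depth-change idea can be sketched for a scene built one object at a time: when an object is added, the surfaces it covers move closer to the camera, so pixels whose depth decreases by more than a threshold form the new object's mask. The threshold, the treatment of missing depth (0) and the occlusion rule (later objects overwrite earlier labels) are assumptions; EasyLabel's actual processing may be more elaborate.

```python
def new_object_mask(depth_before, depth_after, min_change=0.01):
    """True where depth decreased by more than min_change (metres).

    Pixels with missing depth (0) in either frame are ignored.
    """
    return [[b > 0 and a > 0 and b - a > min_change
             for b, a in zip(row_b, row_a)]
            for row_b, row_a in zip(depth_before, depth_after)]


def label_sequence(depth_frames, min_change=0.01):
    """Label map: object k is the depth change between frames k-1 and k (0 = background)."""
    labels = [[0] * len(depth_frames[0][0]) for _ in depth_frames[0]]
    for k in range(1, len(depth_frames)):
        mask = new_object_mask(depth_frames[k - 1], depth_frames[k], min_change)
        for r, row in enumerate(mask):
            for c, changed in enumerate(row):
                if changed:
                    labels[r][c] = k  # later objects occlude earlier ones
    return labels


frames = [
    [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],  # empty scene
    [[0.8, 1.0, 1.0], [0.8, 1.0, 1.0]],  # object 1 added on the left
    [[0.7, 0.9, 1.0], [0.8, 1.0, 1.0]],  # object 2 added, partly on top of object 1
]
print(label_sequence(frames))  # [[2, 2, 0], [1, 0, 0]]
```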
Stavros Akras, Lizette Guzman-Ramirez, Marcelo L. Leal-Ferreira, Gerardo Ramos-Larios
We present a new census of Galactic and extragalactic symbiotic stars
(SySts). This compilation contains 323 known and 87 candidate SySts. Of the
confirmed SySts, 257 are Galactic and 66 extragalactic. The spectral energy
distributions (SEDs) of 348 sources have been constructed using 2MASS and
AllWISE data. Regarding the Galactic SySts, 74% are S-types, 13% D and 3.5%
D$^{\prime}$. S-types show an SED peak between 0.8 and 1.7 $\mu$m, whereas
D-types show a peak at longer wavelengths, between 2 and 4 $\mu$m.
D$^{\prime}$-types, on the other hand, display a nearly flat profile. Gaia
distances and effective temperatures are also presented. According to their
Gaia distances, S-types are found to be members of both the thin and thick Galactic
disk populations, while S+IR- and D-types are mainly thin-disk sources. Gaia
temperatures show a reasonable agreement with the temperatures derived from
SEDs within their uncertainties. A new census of the OVI $\lambda$6830
Raman-scattered line in SySts is also presented. From a sample of 298 SySts
with available optical spectra, 55% are found to emit the line. No significant
preference is found among the different types. Reports of the OVI
$\lambda$6830 Raman-scattered line in non-SySts are also discussed, as well as
the correlation between the Raman-scattered OVI line and X-ray emission. We
conclude that the presence of the OVI Raman-scattered line still provides a
strong criterion for identifying a source as a SySt.
Authors' comments: 6 pages, 13 figures, 7 tables
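A crude sketch of the S/D distinction by SED peak: 2MASS and AllWISE magnitudes are converted to relative nu*F_nu and the wavelength of the maximum is compared with the ranges quoted above. The zero-point flux densities and effective wavelengths are approximate values, the thresholds simply reuse the quoted ranges, and flat D$^{\prime}$-type profiles are not handled; this is an illustration, not the classification procedure of the paper.

```python
import math

# Approximate effective wavelengths (micron) and Vega zero-point flux densities (Jy).
BANDS = {
    "J": (1.235, 1594.0), "H": (1.662, 1024.0), "Ks": (2.159, 666.7),
    "W1": (3.4, 309.540), "W2": (4.6, 171.787), "W3": (12.0, 31.674), "W4": (22.0, 8.363),
}


def nu_f_nu(mags):
    """Vega magnitudes -> relative nu*F_nu, which is proportional to F_nu / lambda."""
    return {b: BANDS[b][1] * 10 ** (-0.4 * m) / BANDS[b][0] for b, m in mags.items()}


def sed_peak_type(mags):
    """Crude S/D label from the wavelength of the nu*F_nu maximum."""
    sed = nu_f_nu(mags)
    lam = BANDS[max(sed, key=sed.get)][0]
    if 0.8 <= lam <= 1.7:
        return "S", lam
    if 2.0 <= lam <= 4.0:
        return "D", lam
    return "other", lam


# Build toy magnitudes from a chosen nu*F_nu shape that peaks in the H band.
target = {"J": 1.0, "H": 3.0, "Ks": 2.0, "W1": 1.0, "W2": 0.5, "W3": 0.2, "W4": 0.1}
mags = {b: -2.5 * math.log10(t * BANDS[b][0] / BANDS[b][1]) for b, t in target.items()}
print(sed_peak_type(mags))  # ('S', 1.662)
```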