Jie Zhang, Jonathan P. Newman, Xiao Wang, Chetan Singh Thakur, John Rattray, Ralph Etienne-Cummings, Matthew A. Wilson
We demonstrated a CMOS imaging system that adapts each pixel's exposure and
sampling rate to capture high dynamic range (HDR) videos. The system consists of
a custom-designed image sensor with pixel-wise exposure configurability and a
real-time pixel exposure controller. These parts operate in a closed loop to
sample, detect and optimize each pixel's exposure and sampling rate to minimize
underexposure, overexposure and motion blur in each local region. Exposure
control is implemented using all-integrated electronics without external
optical modulation. This reduces overall system size and power consumption.
The image sensor is implemented using a standard 130nm CMOS process while the
exposure controller is implemented on a computer. We performed experiments
under complex lighting and motion conditions to test the performance of the system,
and demonstrated the benefit of pixel-wise adaptive imaging on the performance
of computer vision tasks such as segmentation, motion estimation and object
recognition.
Authors' comments: 9 pages, 8 figures
Krzysztof Debicki, Lanpeng Ji, Tomasz Rolski
We consider a two-dimensional ruin problem where the surplus process of
business lines is modelled by a two-dimensional correlated Brownian motion with
drift. We study the ruin function $P(u)$ for the component-wise ruin (that is,
both business lines are ruined in an infinite-time horizon), where $u$ is the
same initial capital for each line. We measure the goodness of the business by
analysing the adjustment coefficient, that is, the limit of $-\ln P(u)/u$ as $u$
tends to infinity, which depends essentially on the correlation $\rho$ of the
two surplus processes. In order to work out the adjustment coefficient we solve
a two-layer optimization problem.
Authors' comments: 20
Yutai Hou, Zhihan Zhou, Yijia Liu, Ning Wang, Wanxiang Che, Han Liu, Ting Liu
While few-shot classification has been widely explored with similarity-based methods, few-shot sequence labeling poses a unique challenge because it also requires modeling label dependencies. To consider both item similarity and label dependency, we propose to leverage conditional random fields (CRFs) in few-shot sequence labeling. Our model calculates the emission score with similarity-based methods and obtains the transition score with a specially designed transfer mechanism. When applying CRFs in few-shot scenarios, the discrepancy of label sets among different domains makes it hard to use the label dependencies learned in prior domains. To tackle this, we introduce a dependency transfer mechanism that transfers abstract label transition patterns. In addition, similarity-based methods rely on high-quality sample representations, which are challenging to obtain for sequence labeling because the sense of a word differs when measuring its similarity to words in different sentences. To remedy this, we take advantage of recent contextual embedding techniques and further propose a pair-wise embedder, which provides additional certainty about word sense by embedding the query and support sentences as a pair. Experimental results on slot tagging and named entity recognition show that our model significantly outperforms the strongest few-shot learning baseline by 11.76 (21.2%) and 12.18 (97.7%) F1 scores, respectively, in the one-shot setting.
Vojtech Mrazek, Zdenek Vasicek, Lukas Sekanina, Muhammad Abdullah Hanif, Muhammad Shafique
State-of-the-art approaches employ approximate computing to reduce the
energy consumption of DNN hardware. Approximate DNNs then require extensive
retraining to recover from the accuracy loss caused by the use of
approximate operations. However, retraining of complex DNNs does not scale
well. In this paper, we demonstrate that efficient approximations can be
introduced into the computational path of DNN accelerators while retraining can
completely be avoided. ALWANN provides highly optimized implementations of DNNs
for custom low-power accelerators in which the number of computing units is
lower than the number of DNN layers. First, a fully trained DNN is converted to
operate with 8-bit weights and 8-bit multipliers in convolutional layers. A
suitable approximate multiplier is then selected for each computing element
from a library of approximate multipliers in such a way that (i) one
approximate multiplier serves several layers, and (ii) the overall
classification error and energy consumption are minimized. The optimizations,
including the multiplier selection problem, are solved by means of the
multiobjective optimization algorithm NSGA-II. In order to completely avoid the
computationally expensive retraining of the DNN, which is usually employed to
improve the classification accuracy, we propose a simple weight updating scheme
that compensates for the inaccuracy introduced by employing approximate
multipliers. The proposed approach is evaluated for two architectures of DNN
accelerators with approximate multipliers from the open-source "EvoApprox"
library. We report that the proposed approach saves 30% of energy needed for
multiplication in convolutional layers of ResNet-50 while the accuracy is
degraded by only 0.6%. The proposed technique and approximate layers are
available as an open-source extension of TensorFlow at
https://github.com/ehw-fit/tf-approximate.
Authors' comments: Accepted for 2019 IEEE/ACM International Conference On Computer-Aided
Design (ICCAD'19)
Zi-Kan Geng, Yue Jiang, Tianhong Wang, Hui-Wen Zheng, Guo-Li Wang
The Isgur-Wise function vastly reduces the weak-decay form factors of hadrons
containing one heavy quark. In this paper, we extract the Isgur-Wise functions
from the instantaneous Bethe-Salpeter method, and give the numerical results
for the $B_c$ decays to charmonium where the final states include $1S$, $1P$,
$2S$ and $2P$. The overlapping integral of the wave functions for the initial
and final states is the Isgur-Wise function, as in heavy quark effective
theory. For accurate calculations, describing the form factors requires
additional relativistic corrections, which are overlapping integrals that
involve the relative momentum between the quark and antiquark. These
relativistic corrections to the Isgur-Wise function provide larger
contributions, especially when excited states are involved, and therefore
need to be included.
Authors' comments: 33 pages, 66 figures
Insu Han, Haim Avron, Jinwoo Shin
This paper studies how to sketch element-wise functions of low-rank matrices. Formally, given a low-rank matrix A = [Aij] and a scalar non-linear function f, we aim to find an approximate low-rank representation of the (possibly high-rank) matrix [f(Aij)]. To this end, we propose an efficient sketching-based algorithm whose complexity is significantly lower than the number of entries of A, i.e., it runs without accessing all entries of [f(Aij)] explicitly. The main idea underlying our method is to combine a polynomial approximation of f with the existing tensor sketch scheme for approximating monomials of entries of A. To balance the errors of the two approximation components in an optimal manner, we propose a novel regression formula to find polynomial coefficients given A and f. In particular, we utilize a coreset-based regression with a rigorous approximation guarantee. Finally, we demonstrate the applicability and superiority of the proposed scheme on various machine learning tasks.
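The monomial step behind this idea can be illustrated exactly, without any sketching. The following is a minimal NumPy sketch, assuming a factorization A = U V^T; the names `U`, `V` and the helper `rowwise_kron` are illustrative and do not come from the paper. It checks that the element-wise k-th power of A equals a product of row-wise Kronecker features, so [Aij^k] has rank at most r^k. A tensor sketch would approximate these r^k-dimensional features in a lower dimension, which this example does not attempt.

```python
import numpy as np

def rowwise_kron(X, k):
    """Row-wise k-fold Kronecker power: row i becomes kron(x_i, ..., x_i), k times."""
    out = X
    for _ in range(k - 1):
        # Outer product of each row of `out` with the matching row of X, flattened.
        out = np.einsum("ia,ib->iab", out, X).reshape(X.shape[0], -1)
    return out

rng = np.random.default_rng(0)
n, r, k = 10, 2, 3               # illustrative sizes (assumptions)
U = rng.standard_normal((n, r))
V = rng.standard_normal((n, r))
A = U @ V.T                      # low-rank matrix, rank <= r

# (A_ij)^k = <kron^k(u_i), kron^k(v_j)>, so the element-wise power has rank <= r**k.
A_pow = rowwise_kron(U, k) @ rowwise_kron(V, k).T
max_err = np.max(np.abs(A_pow - A**k))
```

Combining such monomial features with polynomial coefficients for f gives a low-rank surrogate for [f(Aij)]; how the coefficients are chosen is the subject of the regression step described above.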
Boris Ginsburg, Patrice Castonguay, Oleksii Hrinchuk, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Huyen Nguyen et al.
We propose NovoGrad, an adaptive stochastic gradient descent method with
layer-wise gradient normalization and decoupled weight decay. In our
experiments on neural networks for image classification, speech recognition,
machine translation, and language modeling, it performs on par with or better than
well-tuned SGD with momentum and Adam or AdamW. Additionally, NovoGrad (1) is
robust to the choice of learning rate and weight initialization, (2) works well
in a large batch setting, and (3) has half the memory footprint of
Adam.
Authors' comments: Preprint, under review
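As a rough illustration of layer-wise gradient normalization with decoupled weight decay, the following NumPy sketch keeps one scalar second-moment estimate per layer and one momentum array per layer. It is a simplified reading of the idea, not the authors' reference implementation; the function name `novograd_step` and all hyperparameter values are assumptions chosen for the example.

```python
import numpy as np

def novograd_step(weights, grads, m, v, lr=0.01, beta1=0.95, beta2=0.98,
                  weight_decay=0.001, eps=1e-8):
    """One update over per-layer arrays. m: per-layer arrays, v: per-layer scalars (or None)."""
    for l, (w, g) in enumerate(zip(weights, grads)):
        g_norm_sq = float(np.sum(g * g))
        # Layer-wise second moment: a single scalar per layer, not one value per weight.
        v[l] = g_norm_sq if v[l] is None else beta2 * v[l] + (1 - beta2) * g_norm_sq
        # Normalize the gradient by the layer statistic, then add decoupled weight decay.
        update = g / (np.sqrt(v[l]) + eps) + weight_decay * w
        m[l] = beta1 * m[l] + update
        weights[l] = w - lr * m[l]
    return weights, m, v

# Usage on a toy quadratic loss 0.5 * ||w||^2, whose gradient is w itself.
weights = [np.array([3.0, 4.0])]
m = [np.zeros(2)]
v = [None]
for _ in range(10):
    grads = [weights[0].copy()]
    weights, m, v = novograd_step(weights, grads, m, v)
```

Because `v` holds one scalar per layer, the optimizer state is dominated by the momentum arrays alone, which is consistent with the memory claim in point (3).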
Xiang Li, Xiaolin Hu, Jian Yang
Convolutional Neural Networks (CNNs) generate the feature representation
of complex objects by collecting hierarchical and different parts of semantic
sub-features. These sub-features can usually be distributed in grouped form in
the feature vector of each layer, representing various semantic entities.
However, the activation of these sub-features is often spatially affected by
similar patterns and noisy backgrounds, resulting in erroneous localization and
identification. We propose a Spatial Group-wise Enhance (SGE) module that can
adjust the importance of each sub-feature by generating an attention factor for
each spatial location in each semantic group, so that every individual group
can autonomously enhance its learnt expression and suppress possible noise. The
attention factors are only guided by the similarities between the global and
local feature descriptors inside each group, so the design of the SGE module is
extremely lightweight with \emph{almost no extra parameters and calculations}.
Despite being trained with only category-level supervision, the SGE component is
extremely effective in highlighting multiple active areas with various
high-order semantics (such as the dog's eyes, nose, etc.). When integrated with
popular CNN backbones, SGE can significantly boost the performance of image
recognition tasks. Specifically, based on ResNet50 backbones, SGE achieves
1.2\% Top-1 accuracy improvement on the ImageNet benchmark and 1.0$\sim$2.0\%
AP gain on the COCO benchmark across a wide range of detectors
(Faster/Mask/Cascade RCNN and RetinaNet). Code and pretrained models are
available at https://github.com/implus/PytorchInsight.
Authors' comments: Code available at: https://github.com/implus/PytorchInsight
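The attention computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the released code: the similarity is taken as a dot product between each local descriptor and the group's globally averaged descriptor, and the spatial normalization plus the per-group scale `gamma` and shift `beta` are assumptions about how the attention factor is calibrated. The function name `spatial_groupwise_enhance` is hypothetical.

```python
import numpy as np

def spatial_groupwise_enhance(x, groups, gamma, beta, eps=1e-5):
    """x: (C, H, W) feature map; gamma, beta: (groups,) per-group scale and shift."""
    c, h, w = x.shape
    xg = x.reshape(groups, c // groups, h * w)      # (G, C/G, HW)
    g = xg.mean(axis=2, keepdims=True)              # global descriptor of each group
    sim = (xg * g).sum(axis=1)                      # (G, HW): local . global similarity
    # Normalize similarities over spatial positions within each group (assumption).
    sim = (sim - sim.mean(axis=1, keepdims=True)) / (sim.std(axis=1, keepdims=True) + eps)
    # Sigmoid gate in (0, 1), one factor per spatial location per group.
    att = 1.0 / (1.0 + np.exp(-(gamma[:, None] * sim + beta[:, None])))
    return (xg * att[:, None, :]).reshape(c, h, w)

x = np.random.default_rng(0).standard_normal((8, 4, 4))
out = spatial_groupwise_enhance(x, groups=2, gamma=np.ones(2), beta=np.zeros(2))
```

In this sketch the only learnable quantities would be `gamma` and `beta`, two values per group, which matches the "almost no extra parameters" description.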
Karsten Roth, Tomasz Konopczyński, Jürgen Hesser
At present, lesion segmentation is still performed manually (or semi-automatically) by medical experts. To facilitate this process, we contribute a fully-automatic lesion segmentation pipeline. This work proposes a method developed as part of the LiTS (Liver Tumor Segmentation Challenge) competition for ISBI 17 and MICCAI 17, which compares methods for automatic segmentation of liver lesions in CT scans. By utilizing cascaded, densely connected 2D U-Nets and a Tversky-coefficient based loss function, our framework achieves very good shape extractions with high detection sensitivity, with competitive scores at time of publication. In addition, adjusting hyperparameters in our Tversky loss allows tuning the network towards higher sensitivity or robustness.
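To make the sensitivity trade-off concrete, here is a minimal NumPy sketch of a Tversky-coefficient based loss. It assumes the common convention in which `alpha` weights false positives and `beta` weights false negatives; the parameter names and default values are illustrative, and the loss used in the pipeline may differ in detail.

```python
import numpy as np

def tversky_loss(pred, target, alpha=0.5, beta=0.5, eps=1e-7):
    """pred: predicted foreground probabilities; target: binary ground-truth mask."""
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    tp = np.sum(pred * target)            # soft true positives
    fp = np.sum(pred * (1 - target))      # soft false positives
    fn = np.sum((1 - pred) * target)      # soft false negatives
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)

# With beta > alpha, a missed lesion pixel costs more than a spurious one.
missed = tversky_loss([1, 0, 0, 0], [1, 1, 0, 0], alpha=0.3, beta=0.7)
spurious = tversky_loss([1, 1, 0, 0], [1, 0, 0, 0], alpha=0.3, beta=0.7)
```

Setting `alpha = beta = 0.5` recovers the Dice loss, while raising `beta` pushes the network towards higher sensitivity.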
Kai Su, Dongdong Yu, Zhenqi Xu, Xin Geng, Changhu Wang
Multi-person pose estimation is an important but challenging problem in
computer vision. Although current approaches have achieved significant progress
by fusing the multi-scale feature maps, they pay little attention to enhancing
the channel-wise and spatial information of the feature maps. In this paper, we
propose two novel modules to enhance this information for
multi-person pose estimation. First, a Channel Shuffle Module (CSM) is proposed
to apply the channel shuffle operation to feature maps at different
levels, promoting cross-channel information communication among the pyramid
feature maps. Second, a Spatial, Channel-wise Attention Residual Bottleneck
(SCARB) is designed to boost the original residual unit with an attention
mechanism, adaptively highlighting the information of the feature maps in both
the spatial and channel-wise context. The effectiveness of our proposed modules
is evaluated on the COCO keypoint benchmark, and experimental results show that
our approach achieves state-of-the-art results.
Authors' comments: Accepted by CVPR 2019
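The basic channel shuffle operation that the CSM builds on can be sketched as a reshape, a transpose, and a second reshape. The example below shows only this generic operation on a single feature map; how the module combines feature maps from different pyramid levels is not covered, and the function name is illustrative.

```python
import numpy as np

def channel_shuffle(x, groups):
    """x: (C, H, W). Interleave channels so each output block mixes all groups."""
    c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    # (G, C/G, H, W) -> (C/G, G, H, W) -> (C, H, W)
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

# Tag each channel with its index to see the new ordering.
x = np.arange(6, dtype=float).reshape(6, 1, 1)
order = channel_shuffle(x, groups=2).ravel().tolist()
```

After the shuffle, neighbouring channels come from different groups, so a subsequent grouped convolution sees information from every group.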
Seungyul Han, Youngchul Sung
In importance sampling (IS)-based reinforcement learning algorithms such as
Proximal Policy Optimization (PPO), IS weights are typically clipped to avoid
large variance in learning. However, policy update from clipped statistics
induces large bias in tasks with high action dimensions, and bias from clipping
makes it difficult to reuse old samples with large IS weights. In this paper,
we consider PPO, a representative on-policy algorithm, and propose its
improvement by dimension-wise IS weight clipping which separately clips the IS
weight of each action dimension to avoid large bias and adaptively controls the
IS weight to bound policy update from the current policy. This new technique
enables efficient learning for high action-dimensional tasks and the reuse of old
samples, as in off-policy learning, to increase sample efficiency.
Numerical results show that the proposed new algorithm outperforms PPO and
other RL algorithms in various OpenAI Gym tasks.
Authors' comments: Accepted to the 36th International Conference on Machine Learning
(ICML), 2019
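The difference between clipping one joint IS weight and clipping per action dimension can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' objective: the policy is assumed to be a Gaussian that factorizes over action dimensions, the surrogate is a PPO-style clipped term averaged over dimensions, and the adaptive control of the IS weight mentioned above is omitted. All function and parameter names are hypothetical.

```python
import numpy as np

def gaussian_logp_per_dim(a, mean, std):
    """Log-density of each action dimension under an independent Gaussian."""
    return -0.5 * ((a - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2 * np.pi)

def dimwise_clipped_objective(a, adv, new_mean, new_std, old_mean, old_std, eps=0.2):
    """a: (N, D) actions; adv: (N,) advantages. Returns a scalar surrogate objective."""
    log_ratio = (gaussian_logp_per_dim(a, new_mean, new_std)
                 - gaussian_logp_per_dim(a, old_mean, old_std))
    ratio = np.exp(log_ratio)                   # (N, D): one IS weight per dimension
    clipped = np.clip(ratio, 1 - eps, 1 + eps)  # each dimension is clipped separately
    adv = adv[:, None]
    return np.minimum(ratio * adv, clipped * adv).mean()

rng = np.random.default_rng(0)
a = rng.standard_normal((5, 3))
adv = np.abs(rng.standard_normal(5))
unchanged = dimwise_clipped_objective(a, adv, 0.0, 1.0, 0.0, 1.0)
```

With a joint IS weight, the product over many dimensions can leave the clipping range even when every per-dimension ratio is close to one, which is the source of bias the abstract attributes to high action dimensions.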
Jimuyang Zhang, Sanping Zhou, Jinjun Wang, Dong Huang
The main challenge of Multiple Object Tracking (MOT) is the efficiency in
associating an indefinite number of objects between video frames. Standard motion
estimators used in tracking, e.g., Long Short Term Memory (LSTM), only deal
with a single object, while Re-IDentification (Re-ID) based approaches
exhaustively compare object appearances. Both approaches are computationally
costly when they are scaled to a large number of objects, making
real-time MOT very difficult. To address these problems, we propose a highly
efficient Deep Neural Network (DNN) that simultaneously models association
among an indefinite number of objects. The inference computation of the DNN does
not increase with the number of objects. Our approach, Frame-wise Motion and
Appearance (FMA), computes the Frame-wise Motion Fields (FMF) between two
frames, which leads to very fast and reliable matching among a large number of
object bounding boxes. As auxiliary information used to fix uncertain
matches, Frame-wise Appearance Features (FAF) are learned in parallel with
FMFs. Extensive experiments on the MOT17 benchmark show that our method
achieves real-time MOT with results competitive with state-of-the-art
approaches.
Authors' comments: 13 pages, 4 figures, 4 tables
Jie Xing, Zheren Li, Biyuan Wang, Yuji Qi, Bingbin Yu, Farhad G. Zanjani, Aiwen Zheng, Remco Duits et al.
Breast cancer is the most common invasive cancer with the highest cancer occurrence in females. Handheld ultrasound is one of the most efficient ways to identify and diagnose breast cancer. The area and shape information of a lesion is very helpful for clinicians to make diagnostic decisions. In this study, we propose a new deep-learning scheme, the semi-pixel-wise cycle generative adversarial net (SPCGAN), for segmenting the lesion in 2D ultrasound. The method takes advantage of a fully convolutional neural network (FCN) and a generative adversarial net to segment a lesion by using prior knowledge. We compared the proposed method to a fully connected neural network and the level set segmentation method on a test dataset consisting of 32 malignant lesions and 109 benign lesions. Our proposed method achieved a Dice similarity coefficient (DSC) of 0.92, while FCN and the level set method achieved 0.90 and 0.79, respectively. In particular, for malignant lesions, our method significantly increases the DSC from the 0.90 of the fully connected neural network to 0.93 (p$<$0.001). The results show that our SPCGAN can obtain robust segmentation results. Compared to FCN, the SPCGAN framework is particularly effective when sufficient training samples are not available. Our proposed method may be used to relieve the radiologists' annotation burden.
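For reference, the Dice similarity coefficient used to report these results can be computed from two binary masks as in the minimal NumPy sketch below. The function name is illustrative, and returning 1.0 when both masks are empty is a convention chosen for this example rather than something stated in the abstract.

```python
import numpy as np

def dice_coefficient(pred_mask, true_mask):
    """DSC = 2 |A and B| / (|A| + |B|) for two binary masks of the same shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    denom = pred.sum() + true.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return float(2.0 * np.logical_and(pred, true).sum() / denom)

# One overlapping pixel, two predicted pixels, one true pixel: 2*1 / (2+1).
example = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```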
Xingyuan Zhang, Fuhai Zhang
3D Hand pose estimation from a single depth image is an essential topic in
computer vision and human-computer interaction. Although the rise of deep
learning methods has greatly improved accuracy, the problem remains hard to solve
due to the complex structure of the human hand. Existing deep learning
methods either lose spatial information about the hand structure or lack direct
supervision of joint coordinates. In this paper, we propose a novel Pixel-wise
Regression method, which uses a spatial-form representation (SFR) and a
differentiable decoder (DD) to solve these two problems. To use our method, we
build a model in which we design a particular SFR and its corresponding DD, which
divide the 3D joint coordinates into two parts, plane coordinates and depth
coordinates, and use two modules named Plane Regression (PR) and Depth
Regression (DR) to deal with them respectively. We conduct an ablation
experiment to show that the proposed method achieves better results than
former methods. We also explore how different training
strategies influence the learned SFRs and results. Experiments on three
public datasets demonstrate that our model is comparable with existing
state-of-the-art models, and on one of them our model reduces the mean 3D joint
error by 25%.
Authors' comments: Update LaTeX version. Code coming soon
Zaizheng Li, Qidi Zhang
We prove a Hopf's lemma in the point-wise sense. The essential technique is
to prove $(-\Delta)^s_p u(x)$ is uniformly bounded in the unit ball
$B_1\subset\mathbb{R}^n$, where $u(x)=(1-|x|^2)^s_{+}$. We also study the
global Hölder continuity of bounded positive solutions for $(-\Delta)^s_p
u(x)=f(x,u).$
Authors' comments: 33 pages
Chunfeng Song, Yan Huang, Wanli Ouyang, Liang Wang
Semantic segmentation has achieved huge progress by adopting deep Fully
Convolutional Networks (FCN). However, the performance of FCN-based models
relies heavily on the amount of pixel-level annotations, which are expensive and
time-consuming to obtain. To address this problem, a good option is to learn to
segment with weak supervision from bounding boxes. How to make full use of the
class-level and region-level supervision from bounding boxes is the critical
challenge for this weakly supervised learning task. In this paper, we first
introduce a box-driven class-wise masking model (BCM) to remove irrelevant
regions of each class. Moreover, based on the pixel-level segment proposal
generated from the bounding box supervision, we calculate the mean
filling rate of each class to serve as an important prior cue, and then propose
a filling rate guided adaptive loss (FR-Loss) to help the model ignore the
wrongly labeled pixels in proposals. Unlike previous methods that directly train
models with fixed individual segment proposals, our method can adjust the
model learning with global statistical information. Thus it can help reduce the
negative impact of wrongly labeled proposals. We evaluate the proposed
method on the challenging PASCAL VOC 2012 benchmark and compare it with other
methods. Extensive experimental results show that the proposed method is
effective and achieves state-of-the-art results.
Authors' comments: Accepted by CVPR 2019
Shonosuke Sugasawa, Genya Kobayashi, Yuki Kawakubo
Estimating income distributions plays an important role in the measurement of
inequality and poverty over space. The existing literature on income
distributions predominantly focuses on estimating an income distribution for a
country or a region separately and the simultaneous estimation of multiple
income distributions has not been discussed in spite of its practical
importance. In this work, we develop an effective method for the simultaneous
estimation and inference for area-wise spatial income distributions taking
account of geographical information from grouped data. Based on the multinomial
likelihood function for grouped data, we propose a spatial state-space model
for area-wise parameters of parametric income distributions. We provide an
efficient Bayesian approach to estimation and inference for area-wise latent
parameters, which enables us to compute area-wise summary measures of income
distributions such as mean incomes and Gini indices, not only for sampled areas
but also for areas without any samples thanks to the latent spatial state-space
structure. The proposed method is demonstrated using the Japanese
municipality-wise grouped income data. The simulation studies show the
superiority of the proposed method over a crude conventional approach which
estimates the income distributions separately.
Authors' comments: 25 pages
Thomas Uriot
In this paper, we propose an extension to an existing algorithm
(instance-MIR) which tackles the multiple instance regression (MIR) problem,
also known as distribution regression. The MIR setting arises when the data is
a collection of bags, where each bag consists of several instances which
share a single real-valued label. The goal of a MIR
algorithm is to find a mapping from the instances of an unseen bag to its
target value. The instance-MIR algorithm treats all the instances separately
and maps each instance to a label. The final bag label is then taken as the
mean or the median of the predictions for that given bag. While it is
conceptually simple, taking a single statistic to summarize the distribution of
the labels in each bag is a limitation. In spite of this performance
bottleneck, the instance-MIR algorithm has been shown to be competitive when
compared to the current state-of-the-art methods. We address the aforementioned
issue by computing the kernel mean embeddings of the distributions of the
predicted labels, for each bag, and learn a regressor from these embeddings to
the bag label. We test our algorithm (instance-kme-MIR) on five real-world
datasets and obtain better results than the baseline instance-MIR across all
the datasets, while achieving state-of-the-art results on two of the datasets.
Authors' comments: KDD 2019, FEED Workshop
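The contrast between summarizing a bag by a single statistic and by a kernel mean embedding can be sketched as follows. This is a minimal NumPy illustration, assuming a Gaussian kernel evaluated at a fixed set of landmark points; the landmark construction, the bandwidth, and both function names are assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def aggregate_bag(instance_preds, how="mean"):
    """Baseline aggregation: collapse a bag's instance predictions to one number."""
    preds = np.asarray(instance_preds, dtype=float)
    return float(np.mean(preds) if how == "mean" else np.median(preds))

def kme_features(instance_preds, landmarks, bandwidth=1.0):
    """Empirical kernel mean embedding of a bag's predictions, evaluated at landmarks."""
    p = np.asarray(instance_preds, dtype=float)[:, None]
    z = np.asarray(landmarks, dtype=float)[None, :]
    return np.exp(-((p - z) ** 2) / (2 * bandwidth ** 2)).mean(axis=0)

# Two bags with the same mean prediction but different distributions.
bag_a, bag_b = [0.0, 0.0, 4.0, 4.0], [2.0, 2.0, 2.0, 2.0]
landmarks = [0.0, 2.0, 4.0]
feat_a = kme_features(bag_a, landmarks)
feat_b = kme_features(bag_b, landmarks)
```

The mean cannot distinguish the two bags, whereas their embeddings differ, so a regressor trained on the embeddings has access to distributional information that the single statistic discards.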
Dongjun Lee
Most deep learning approaches for text-to-SQL generation are limited to the
WikiSQL dataset, which only supports very simple queries over a single table.
We focus on the Spider dataset, a complex and cross-domain text-to-SQL task,
which includes complex queries over multiple tables. In this paper, we propose
a SQL clause-wise decoding neural architecture with a self-attention based
database schema encoder to address the Spider task. Each of the clause-specific
decoders consists of a set of sub-modules, which is defined by the syntax of
each clause. Additionally, our model works recursively to support nested
queries. When evaluated on the Spider dataset, our approach achieves 4.6\% and
9.8\% accuracy gains on the test and dev sets, respectively. In addition, we
show that our model is significantly more effective at predicting complex and
nested queries than previous work.
Authors' comments: EMNLP 2019
Yaoyao Liu, Bernt Schiele, Qianru Sun
Few-shot learning aims to train efficient predictive models with a few examples. The lack of training data leads to poor models that make high-variance or low-confidence predictions. In this paper, we propose to meta-learn the ensemble of epoch-wise empirical Bayes models (E3BM) to achieve robust predictions. "Epoch-wise" means that each training epoch has a Bayes model whose parameters are specifically learned and deployed. "Empirical" means that the hyperparameters, e.g., those used for learning and ensembling the epoch-wise models, are generated by hyperprior learners conditional on task-specific data. We introduce four kinds of hyperprior learners by considering inductive vs. transductive, and epoch-dependent vs. epoch-independent, in the paradigm of meta-learning. We conduct extensive experiments for five-class few-shot tasks on three challenging benchmarks: miniImageNet, tieredImageNet, and FC100, and achieve top performance using the epoch-dependent transductive hyperprior learner, which captures the richest information. Our ablation study shows that both "epoch-wise ensemble" and "empirical" encourage high efficiency and robustness in the model performance.