Xing Tao, Yuexiang Li, Wenhui Zhou, Kai Ma, Yefeng Zheng
Deep learning relies heavily on the quantity of annotated data. However, the
annotations for 3D volumetric medical data require experienced physicians to
spend hours or even days on investigation. Self-supervised learning is a
potential solution that relaxes the strong requirement for training data by
deeply exploiting raw data information. In this paper, we propose a novel
self-supervised learning framework for volumetric medical images. Specifically,
we propose a context restoration task, i.e., Rubik's cube++, to pre-train 3D
neural networks. Different from the existing context-restoration-based
approaches, we adopt a volume-wise transformation for context permutation,
which encourages the network to better exploit the inherent 3D anatomical
information of organs. Compared to the strategy of training from scratch,
fine-tuning from the Rubik's cube++ pre-trained weights can achieve better
performance in various tasks such as pancreas segmentation and brain tissue
segmentation. The experimental results show that our self-supervised learning
method can significantly improve the accuracy of 3D deep learning networks on
volumetric medical datasets without the use of extra data.
Authors' comments: Accepted by MICCAI 2020
Mahyar Nemati, Morteza Soltani, Jie Ding, Jinho Choi
Ambient backscatter communication (AmBC) over
orthogonal-frequency-division-multiplexing (OFDM) signals has recently been
proposed as an appealing technique for low power Internet-of-Things (IoT)
applications. The special spectrum structure of OFDM signals provides a range
of flexibility in terms of bit-error-rate (BER) performance, data rate, and
power consumption. In this paper, we study subcarrier-wise backscatter
communication over ambient OFDM signals. This new AmBC scheme exploits the
special spectrum structure of OFDM to transmit data over its squeezed
orthogonal subcarriers. We propose a basis transmission scheme and its two
modifications to support a higher data rate with superior BER performance
compared to existing methods. The basis scheme can transmit one bit per
subcarrier using on-off keying (OOK) modulation in the frequency domain. In the
first modification, an interleaved subcarrier block transmission model is employed
to improve the BER performance of the system in frequency-selective channels.
It results in a trade-off between the size of the blocks and data rate. Thus,
in the second modification, interleaved index modulation (IM) is employed to
mitigate the data rate reduction of the former modification. It also
stabilizes and controls the power of the signal, which reduces interference
for a legacy receiver. Analytical and numerical evaluations demonstrate the
performance of the proposed method in terms of BER, data rate, and
interference.
Authors' comments: Under review for IEEE TVT 2020
Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan
Achieving state-of-the-art performance on natural language understanding tasks typically relies on fine-tuning a fresh model for every task. Consequently, this approach leads to a higher overall parameter cost, along with higher technical maintenance for serving multiple models. Learning a single multi-task model that is able to do well for all the tasks has been a challenging and yet attractive proposition. In this paper, we propose \textsc{HyperGrid}, a new approach for highly effective multi-task learning. The proposed approach is based on a decomposable hypernetwork that learns grid-wise projections that help to specialize regions in weight matrices for different tasks. In order to construct the proposed hypernetwork, our method learns the interactions and composition between a global (task-agnostic) state and a local task-specific state. We apply our proposed \textsc{HyperGrid} on the current state-of-the-art T5 model, demonstrating strong performance across the GLUE and SuperGLUE benchmarks when using only a single multi-task model. Our method helps bridge the gap between fine-tuning and multi-task learning approaches.
C. W. Xiao, S. Rahmani, H. Hassanabadi
We investigate the decay properties of some beauty and charm mesons with a phenomenological potential model. First, we consider the nonrelativistic Hamiltonian of the mesonic system with Coulomb plus exponential terms and study the wave function and the energy of the system using the variational approach. Thereby, we compute the masses, the decay constants, the leptonic branching fractions of heavy-light mesons and the mixing mass parameter $\Delta {m_{{B_q}}}$. We study the radiative leptonic decay widths of ${D_s} \to \gamma \ell \bar \nu $, ${D^ - } \to \gamma \ell \bar \nu $ and the semileptonic decay widths of ${\bar B_{(s)}} \to {D_{(s)}}\ell \bar \nu $, ${\bar B_{(s)}} \to D_{(s)}^*\ell \bar \nu $. Using Isgur-Wise functions, we calculate the branching ratios of $B \to {D^{(*)}}\pi $ and two-body nonleptonic decay of $D \to K\pi $. Our results are consistent with other theoretical models and the experimental results.
M. Holler, J. -P. Lenain, M. de Naurois, R. Rauth, D. A. Sanchez
We introduce a new simulation and analysis paradigm for Imaging Atmospheric
Cherenkov Telescope (IACT) arrays, simulating the actual observation conditions
as well as individual telescope configuration for each observation unit.
Compared to existing frameworks, where simulations are usually generated using
pre-defined settings, this run-wise simulation approach implies more realistic
simulations and hence reduced systematic uncertainties. The computational
effort of this dedicated simulation concept is notably independent of the
number of different observation configurations and scales only linearly with
observation time. This is a large advantage for increasingly
complex current and future IACT arrays, where the size of the phase space makes
it computationally infeasible to generate simulations that meet the
requirements regarding systematics using the classical simulation scheme.
Authors' comments: 13 pages, 7 figures, 2 tables. Accepted for publication in
Astroparticle Physics
Sumit Goel, Wade Hann-Caruthers
We consider the facility location problem in two dimensions. In particular,
we consider a setting where agents have Euclidean preferences, defined by their
ideal points, for a facility to be located in $\mathbb{R}^2$. We show that for
the $p$-norm ($p \geq 1$) objective, the coordinate-wise median mechanism (CM)
has the lowest worst-case approximation ratio in the class of deterministic,
anonymous, and strategyproof mechanisms. For the minisum objective and an odd
number of agents $n$, we show that CM has a worst-case approximation ratio (AR)
of $\sqrt{2}\frac{\sqrt{n^2+1}}{n+1}$. For the $p$-norm social cost objective
($p\geq 2$), we find that the AR for CM is bounded above by
$2^{\frac{3}{2}-\frac{2}{p}}$. We conjecture that the AR of CM actually equals
the lower bound $2^{1-\frac{1}{p}}$ (as is the case for $p=2$ and $p=\infty$)
for any $p\geq 2$.
Authors' comments: 25 pages, SAGT 2022
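The coordinate-wise median mechanism and the minisum bound quoted in the abstract above can be illustrated with a minimal Python sketch. The function names and the three example ideal points are assumptions made for illustration only and do not come from the paper; the bound function simply evaluates the stated expression $\sqrt{2}\frac{\sqrt{n^2+1}}{n+1}$ for an odd number of agents.

```python
import math
from statistics import median


def coordinate_wise_median(points):
    """Place the facility at the median x and the median y of the ideal points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (median(xs), median(ys))


def minisum_cost(location, points):
    """Minisum objective: total Euclidean distance from the facility to all agents."""
    return sum(math.dist(location, p) for p in points)


def cm_minisum_ratio_bound(n):
    """Evaluate sqrt(2) * sqrt(n^2 + 1) / (n + 1), the ratio stated for odd n."""
    if n < 1 or n % 2 == 0:
        raise ValueError("the stated bound is for an odd number of agents")
    return math.sqrt(2) * math.sqrt(n * n + 1) / (n + 1)


# Hypothetical ideal points for three agents.
ideal_points = [(0.0, 0.0), (4.0, 1.0), (1.0, 5.0)]
facility = coordinate_wise_median(ideal_points)
print(facility, minisum_cost(facility, ideal_points))
print(cm_minisum_ratio_bound(3))
```

Because each coordinate is chosen independently, the returned location need not coincide with any agent's ideal point. The stated ratio increases with $n$ and approaches $\sqrt{2}$ from below.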
Jin Jin, Lin Zhang, Ethan Leng, Gregory J. Metzger, Joseph S. Koopmeiners
While current research has shown the importance of multi-parametric MRI
(mpMRI) in diagnosing prostate cancer (PCa), further investigation is needed
into how to incorporate the specific structures of the mpMRI data, such as the
regional heterogeneity and between-voxel correlation within a subject. This
paper proposes a machine learning-based method for improved voxel-wise PCa
classification by taking into account the unique structures of the data. We
propose a multi-resolution modeling approach to account for regional
heterogeneity, where base learners trained locally at multiple resolutions are
combined using the super learner, and account for between-voxel correlation by
efficient spatial Gaussian kernel smoothing. The method is flexible in that the
super learner framework allows implementation of any classifier as the base
learner, and can be easily extended to classifying cancer into more
sub-categories. We describe a detailed classification algorithm for the binary
PCa status, as well as the ordinal clinical significance of PCa for which a
weighted likelihood approach is implemented to enhance the detection of the
less prevalent cancer categories. We illustrate the advantages of the proposed
approach over conventional modeling and machine learning approaches through
simulations and application to in vivo data.
Authors' comments: 28 pages, 4 figures, 5 tables
Pierre-Francois Marteau
In this paper, we propose DiFF-RF, an ensemble approach composed of random
partitioning binary trees to detect point-wise and collective (as well as
contextual) anomalies. Thanks to a distance-based paradigm used at the leaves
of the trees, this semi-supervised approach solves a drawback that has been
identified in the isolation forest (IF) algorithm. Moreover, taking into
account the frequencies of visits in the leaves of the random trees
significantly improves the performance of DiFF-RF in the presence
of collective anomalies. DiFF-RF is fairly easy to train, and excellent
performance can be obtained by using a simple semi-supervised procedure to
set up the extra hyper-parameter that is introduced. We first evaluate DiFF-RF
on a synthetic data set to i) verify that the limitation of the IF algorithm is
overcome, ii) demonstrate how collective anomalies are actually detected and
iii) analyze the effect of the meta-parameters it involves. We then assess the
DiFF-RF algorithm on a large set of datasets from the UCI repository, as well
as two benchmarks related to intrusion detection applications. Our experiments
show that DiFF-RF not only almost systematically outperforms the IF algorithm but
also challenges the one-class SVM baseline and a deep learning variational
auto-encoder architecture. Furthermore, our experiments show that DiFF-RF can
work well in the presence of small-scale learning data, which is, by contrast,
difficult for deep neural architectures. Finally, DiFF-RF is computationally
efficient and can be easily parallelized on multi-core architectures.
Authors' comments: arXiv admin note: text overlap with arXiv:1705.03800
Emily Moravec, Anthony Gonzalez, Simon Dicker, Stacey Alberts, Mark Brodwin, Tracy Clarke, Thomas Connor, Bandon Decker et al.
We present a multi-wavelength investigation of the radio galaxy population in
the galaxy cluster MOO J1506+5137 at $z$=1.09$\pm$0.03, which in previous work
we identified as having multiple complex radio sources. The combined dataset
used in this work includes data from the Low-Frequency Array Two-metre Sky
Survey (LoTSS), NSF's Karl G. Jansky Very Large Array (VLA), the Robert C. Byrd
Green Bank Telescope (GBT), the Spitzer Space Telescope, and the Dark Energy
Camera Legacy Survey (DECaLS). We find that there are five radio sources which
are all located within 500 kpc ($\sim$1$^{\prime}$) of the cluster center and
have radio luminosities $P_{\mathrm{1.4GHz}}$ > 1.6$\times$10$^{24}$ W
Hz$^{-1}$. The typical host galaxies are among the highest stellar mass
galaxies in the cluster. The exceptional radio activity among the massive
galaxy population appears to be linked to the dynamical state of the cluster.
The galaxy distribution suggests an ongoing merger, with a subgroup found to
the northwest of the main cluster. Further, two of the five sources are
classified as bent-tail sources with one being a potential wide-angle tail
(WAT)/hybrid morphology radio source (HyMoRS) indicating a dynamic environment.
The cluster also lies in a region of the mass-richness plane occupied by other
merging clusters in the Massive and Distant Clusters of WISE Survey (MaDCoWS).
The data suggest that during the merger phase radio activity can be
dramatically enhanced, which would contribute to the observed trend of
increased radio activity in clusters with increasing redshift.
Authors' comments: 17 pages and 8 figures. Accepted for publication in ApJ
Jñani Crawford, Eshed Margalit, Kalanit Grill-Spector, Sonia Poltoratski
The increased use of convolutional neural networks for face recognition in
science, governance, and broader society has created an acute need for methods
that can show how these 'black box' decisions are made. To be interpretable and
useful to humans, such a method should convey a model's learned classification
strategy in a way that is robust to random initializations or spurious
correlations in input data. To this end, we applied the decompositional
pixel-wise attribution method of layer-wise relevance propagation (LRP) to
resolve the decisions of several classes of VGG-16 models trained for face
recognition. We then quantified how these relevance measures vary with and
generalize across key model parameters, such as the pretraining dataset
(ImageNet or VGGFace), the finetuning task (gender or identity classification),
and random initializations of model weights. Using relevance-based image
masking, we find that relevance maps for face classification prove generally
stable across random initializations, and can generalize across finetuning
tasks. However, there is markedly less generalization across pretraining
datasets, indicating that ImageNet- and VGGFace-trained models sample face
information differently even as they achieve comparably high classification
performance. Fine-grained analyses of relevance maps across models revealed
asymmetries in generalization that point to specific benefits of choice
parameters, and suggest that it may be possible to find an underlying set of
important face image pixels that drive decisions across convolutional neural
networks and tasks. Finally, we evaluated model decision weighting against
human measures of similarity, providing a novel framework for interpreting face
recognition decisions across human and machine.
Authors' comments: 10 pages, 7 figures
Itay Hubara, Yury Nahshan, Yair Hanani, Ron Banner, Daniel Soudry
Lately, post-training quantization methods have gained considerable attention, as they are simple to use and require only a small unlabeled calibration set. This small dataset cannot be used to fine-tune the model without significant over-fitting. Instead, these methods only use the calibration set to set the activations' dynamic ranges. However, such methods have always resulted in significant accuracy degradation when used below 8 bits (except on small datasets). Here we aim to break the 8-bit barrier. To this end, we minimize the quantization errors of each layer separately by optimizing its parameters over the calibration set. We empirically demonstrate that this approach is: (1) much less susceptible to over-fitting than the standard fine-tuning approaches, and can be used even on a very small calibration set; and (2) more powerful than previous methods, which only set the activations' dynamic ranges. Furthermore, we demonstrate how to optimally allocate the bit-widths for each layer, while constraining accuracy degradation or model compression, by proposing a novel integer programming formulation. Finally, we suggest model global statistics tuning to correct biases introduced during quantization. Together, these methods yield state-of-the-art results for both vision and text models. For instance, on ResNet50, we obtain less than 1\% accuracy degradation with 4-bit weights and activations in all layers but the smallest two. We open-sourced our code.
Bart Bogaerts, Emilio Gamba, Tias Guns
We explore the problem of step-wise explaining how to solve constraint satisfaction problems, with a use case on logic grid puzzles. More specifically, we study the problem of explaining the inference steps that one can take during propagation, in a way that is easy to interpret for a person. Thereby, we aim to give the constraint solver explainable agency, which can help build trust in the solver by enabling users to understand and even learn from the explanations. The main challenge is that of finding a sequence of simple explanations, where each explanation should aim to be as cognitively easy as possible for a human to verify and understand. This contrasts with the arbitrary combination of facts and constraints that the solver may use when propagating. We propose the use of a cost function to quantify how simple an individual explanation of an inference step is, and identify the explanation-production problem of finding the best sequence of explanations of a CSP. Our approach is agnostic of the underlying constraint propagation mechanisms, and can provide explanations even for inference steps resulting from combinations of constraints. In case multiple constraints are involved, we also develop a mechanism that breaks up the most difficult steps and thus gives the user the ability to zoom in on specific parts of the explanation. Our proposed algorithm iteratively constructs the explanation sequence by using an optimistic estimate of the cost function to guide the search for the best explanation at each step. Our experiments on logic grid puzzles show the feasibility of the approach in terms of the quality of the individual explanations and the resulting explanation sequences obtained.
Qian Lou, Song Bian, Lei Jiang
The Hybrid Privacy-Preserving Neural Network (HPPNN), which implements linear layers by Homomorphic Encryption (HE) and nonlinear layers by Garbled Circuit (GC), is one of the most promising secure solutions to emerging Machine Learning as a Service (MLaaS). Unfortunately, an HPPNN suffers from long inference latency, e.g., $\sim100$ seconds per image, which makes MLaaS unsatisfactory. Because HE-based linear layers of an HPPNN account for $93\%$ of the inference latency, it is critical to select a set of HE parameters that minimizes the computational overhead of linear layers. Prior HPPNNs over-pessimistically select huge HE parameters to maintain large noise budgets, since they use the same set of HE parameters for an entire network and ignore the error tolerance capability of a network. In this paper, for fast and accurate secure neural network inference, we propose an automated layer-wise parameter selector, AutoPrivacy, that leverages deep reinforcement learning to automatically determine a set of HE parameters for each linear layer in an HPPNN. The learning-based HE parameter selection policy outperforms the conventional rule-based HE parameter selection policy. Compared to prior HPPNNs, AutoPrivacy-optimized HPPNNs reduce inference latency by $53\%\sim70\%$ with negligible loss of accuracy.
Fenglin Liu, Xuancheng Ren, Guangxiang Zhao, Chenyu You, Xuewei Ma, Xian Wu, Xu Sun
In sequence-to-sequence learning, e.g., natural language generation, the decoder relies on the attention mechanism to efficiently extract information from the encoder. While it is common practice to draw information from only the last encoder layer, recent work has proposed to use representations from different encoder layers for diversified levels of information. Nonetheless, the decoder still obtains only a single view of the source sequences, which might lead to insufficient training of the encoder layer stack due to the hierarchy bypassing problem. In this work, we propose layer-wise multi-view decoding, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers are supplemented for a stereoscopic view of the source sequences. Systematic experiments and analyses show that we successfully address the hierarchy bypassing problem, require almost negligible parameter increase, and substantially improve the performance of sequence-to-sequence learning with deep representations on five diverse tasks, i.e., machine translation, abstractive summarization, image captioning, video captioning, medical report generation, and paraphrase generation. In particular, our approach achieves new state-of-the-art results on ten benchmark datasets, including a low-resource machine translation dataset and two low-resource medical report generation datasets.
Ryuichi Takanobu, Qi Zhu, Jinchao Li, Baolin Peng, Jianfeng Gao, Minlie Huang
There is a growing interest in developing goal-oriented dialog systems which
serve users in accomplishing complex tasks through multi-turn conversations.
Although many methods are devised to evaluate and improve the performance of
individual dialog components, there is a lack of comprehensive empirical study
on how different components contribute to the overall performance of a dialog
system. In this paper, we perform a system-wise evaluation and present an
empirical analysis on different types of dialog systems which are composed of
different modules in different settings. Our results show that (1) a pipeline
dialog system trained using fine-grained supervision signals at different
component levels often obtains better performance than the systems that use
joint or end-to-end models trained on coarse-grained labels, (2)
component-wise, single-turn evaluation results are not always consistent with
the overall performance of a dialog system, and (3) despite the discrepancy
between simulators and human users, simulated evaluation is still a valid
alternative to the costly human evaluation especially in the early stage of
development.
Authors' comments: SIGDIAL 2020 long paper
Kenneth W. Shum, Hanxu Hou
A novel implementation of a special class of Galois rings, in which the
multiplication can be realized by a cyclic convolution, is applied to the
construction of network codes. The primitive operations involved are byte-wise
shifts and integer additions modulo a power of 2. Both of them can be executed
efficiently in microprocessors. An illustration of how to apply this idea to
array codes is given at the end of the paper.
Authors' comments: Accepted for presentation in ISIT2020
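To illustrate how a ring multiplication can reduce to shifts and additions modulo a power of 2, the sketch below multiplies two elements of $\mathbb{Z}_{2^k}[x]/(x^n-1)$ by cyclic convolution. This is a generic textbook construction shown under assumptions: the quotient ring, the modulus 256, the list representation and the function names are illustrative choices, not the specific Galois ring implementation of the paper.

```python
def cyclic_shift(vec, j):
    """Multiply by x**j modulo x**n - 1: rotate the coefficient list by j places."""
    n = len(vec)
    j %= n
    if j == 0:
        return list(vec)
    return vec[-j:] + vec[:-j]


def cyclic_convolve(a, b, modulus=256):
    """Multiply a and b in Z_modulus[x]/(x**n - 1) via shifted copies of a.

    When every coefficient of b is 0 or 1, the scalar product below only
    selects or skips a shifted copy, so the work is shifts and additions.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("operands must have the same length")
    result = [0] * n
    for j, bj in enumerate(b):
        shifted = cyclic_shift(a, j)
        for i in range(n):
            result[i] = (result[i] + bj * shifted[i]) % modulus
    return result


# (1 + 2x + 3x^2) * (1 + x) modulo x^3 - 1, coefficients modulo 256.
print(cyclic_convolve([1, 2, 3], [1, 1, 0]))
```

With the modulus set to 256, each coefficient fits in one byte and the reduction is the natural wrap-around of 8-bit unsigned addition.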
C. A. Theissen, D. C. Bardalez Gagliuffi, J. K. Faherty, J. Gagne, A. J. Burgasser
We present a parallax solution for WISE J135501.90-825838.9, a spectral
binary with spectral types L7+T7.5 and a candidate AB Doradus member. Using
$WISE$ astrometry, we obtain a distance of $d = 16.7\pm5.3$ pc. This
preliminary parallax solution provides further evidence that WISE
J135501.90-825838.9 is a member of AB Doradus (130-200 Myr) and, when combined
with evolutionary models, predicts masses of 11 $M_\mathrm{Jup}$ and 9
$M_\mathrm{Jup}$ for the two components.
Authors' comments: Submitted to RNAAS
Yin Tang, Qi Teng, Lei Zhang, Fuhong Min, Jun He
Recently, convolutional neural networks (CNNs) have set the latest
state of the art on various human activity recognition (HAR) datasets. However,
deep CNNs often require more computing resources, which limits their
applications in embedded HAR. Although many successful methods have been
proposed to reduce memory and FLOPs of CNNs, they often involve special network
architectures designed for visual tasks, which are not suitable for deep HAR
tasks with time series sensor signals, due to the remarkable discrepancy.
Therefore, it is necessary to develop lightweight deep models to perform HAR.
As the filter is the basic unit in constructing CNNs, it is worth investigating
whether re-designing smaller filters is applicable for deep HAR. In this paper,
inspired by this idea, we propose a lightweight CNN using Lego filters for HAR.
A set of lower-dimensional filters is used as Lego bricks to be stacked into
conventional filters, which does not rely on any special network structure. A
local loss function is used to train the model. To our knowledge, this is the
first paper that proposes a lightweight CNN for HAR in the ubiquitous and
wearable computing arena. The experimental results on five public HAR datasets
(the UCI-HAR dataset, OPPORTUNITY dataset, UNIMIB-SHAR dataset, PAMAP2 dataset,
and WISDM dataset), collected from either smartphones or multiple sensor nodes,
indicate that our novel Lego CNN with local loss can greatly reduce memory and
computation cost compared with a CNN, while achieving higher accuracy. That is to say, the
proposed model is smaller, faster and more accurate. Finally, we evaluate the
actual performance on an Android smartphone.
Authors' comments: 11 pages, 11 figures
Alessandro Ilic Mezza, Emanuël A. P. Habets, Meinard Müller, Augusto Sarti
The performance of machine learning algorithms is known to be negatively
affected by possible mismatches between training (source) and test (target)
data distributions. In fact, this problem emerges whenever an acoustic scene
classification system which has been trained on data recorded by a given device
is applied to samples acquired under different acoustic conditions or captured
by mismatched recording devices. To address this issue, we propose an
unsupervised domain adaptation method that consists of aligning the first- and
second-order sample statistics of each frequency band of target-domain acoustic
scenes to the ones of the source-domain training dataset. This model-agnostic
approach is devised to adapt audio samples from unseen devices before they are
fed to a pre-trained classifier, thus avoiding any further learning phase.
Using the DCASE 2018 Task 1-B development dataset, we show that the proposed
method outperforms the state-of-the-art unsupervised methods found in the
literature in terms of both source- and target-domain classification accuracy.
Authors' comments: 5 pages, 1 figure, 3 tables, submitted to EUSIPCO 2020
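The band-wise alignment of first- and second-order statistics described in the abstract above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: features are taken to be nested lists indexed as clip, frequency band, time frame; the second-order statistic is taken to be the per-band standard deviation; and the function names, the `eps` guard and the toy values are hypothetical.

```python
from statistics import fmean, pstdev


def band_stats(clips):
    """Return a (mean, std) pair per frequency band, pooled over clips and frames."""
    n_bands = len(clips[0])
    stats = []
    for b in range(n_bands):
        values = [v for clip in clips for v in clip[b]]
        stats.append((fmean(values), pstdev(values)))
    return stats


def align_bands(target_clips, source_stats, eps=1e-8):
    """Standardize each target band, then rescale it to the source band statistics."""
    target_stats = band_stats(target_clips)
    aligned = []
    for clip in target_clips:
        new_clip = []
        for b, frames in enumerate(clip):
            t_mean, t_std = target_stats[b]
            s_mean, s_std = source_stats[b]
            new_clip.append(
                [(v - t_mean) / (t_std + eps) * s_std + s_mean for v in frames]
            )
        aligned.append(new_clip)
    return aligned


# Toy data: 2 clips, 2 frequency bands, 2 time frames each (values are made up).
source = [[[0.0, 2.0], [10.0, 14.0]], [[4.0, 6.0], [18.0, 22.0]]]
target = [[[1.0, 1.5], [-1.0, 0.0]], [[2.0, 2.5], [1.0, 2.0]]]

source_stats = band_stats(source)
aligned = align_bands(target, source_stats)
print(band_stats(aligned))
```

Only the input features are transformed, so the pre-trained classifier is left unchanged, consistent with the model-agnostic, no-further-learning setting the abstract describes.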
Tengteng Zhang, Yiqin Yu, Jing Mei, Zefang Tang, Xiang Zhang, Shaochun Li
The PICO framework (Population, Intervention, Comparison, and Outcome) is
usually used to formulate evidence in the medical domain. The major task of
PICO extraction is to extract sentences from medical literature and classify
them into each class. However, in most circumstances, there is more than
one piece of evidence in an extracted sentence even if it has been categorized
into a certain class. To address this problem, we propose a step-wise disease
Named Entity Recognition (DNER) extraction and PICO identification method. With
our method, sentences in the paper title and abstract are first classified into
different classes of PICO, and medical entities are then identified and
classified into P and O. Different kinds of deep learning frameworks are used,
and experimental results show that our method achieves high performance and
fine-grained extraction results compared with conventional PICO extraction
work.
Authors' comments: 9 pages, 3 figures