Haohe Liu, Lei Xie, Jian Wu, Geng Yang
This paper presents a new input format, channel-wise subband input (CWS), for
convolutional neural networks (CNN) based music source separation (MSS) models
in the frequency domain. We aim to address the major issues in CNN-based
high-resolution MSS models: high computational cost and weight sharing between
distinctly different bands. Specifically, in this paper, we decompose the input
mixture spectra into several bands and concatenate them channel-wise as the
model input. The proposed approach enables effective weight sharing in each
subband and introduces more flexibility between channels. For comparison
purposes, we perform voice and accompaniment separation (VAS) on models with
different scales, architectures, and CWS settings. Experiments show that the
CWS input is beneficial in many aspects. We evaluate our method on the musdb18hq
test set, focusing on the SDR, SIR and SAR metrics. Among all our experiments, CWS
enables models to obtain a 6.9% performance gain on the average metrics. Even
with fewer parameters, less training data, and shorter training
time, our MDenseNet with 8-band CWS input still surpasses the original
MMDenseNet by a large margin. Moreover, CWS also reduces computational cost
and training time to a large extent.
Authors' comments: Accepted in INTERSPEECH 2020
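The channel-wise subband idea described in the abstract above can be sketched as follows. This is a minimal illustration only: it assumes the decomposition is a plain slicing of frequency bins into equal-width bands, which the abstract does not specify, and the function names `channelwise_subbands` and `merge_subbands` are hypothetical.

```python
def channelwise_subbands(spec, n_bands):
    """Split a [freq][time] spectrogram into n_bands equal-width frequency
    bands and stack them as channels: result is [band][freq_in_band][time].

    Equal-width slicing is an assumption made for this sketch.
    """
    n_freq = len(spec)
    if n_freq % n_bands != 0:
        raise ValueError("number of frequency bins must be divisible by n_bands")
    band_size = n_freq // n_bands
    return [spec[b * band_size:(b + 1) * band_size] for b in range(n_bands)]


def merge_subbands(bands):
    """Inverse of channelwise_subbands: concatenate bands along frequency."""
    return [row for band in bands for row in band]


# Toy spectrogram: 8 frequency bins x 3 time frames (values are illustrative).
spec = [[float(10 * f + t) for t in range(3)] for f in range(8)]
cws = channelwise_subbands(spec, n_bands=4)

# Channels, frequency bins per band, time frames.
print(len(cws), len(cws[0]), len(cws[0][0]))
```

With this layout, a convolution kernel applied to the stacked input sees only one band's worth of frequency bins per channel, so the spatial extent processed by each layer shrinks by the number of bands.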
Fotios Logothetis, Ignas Budvytis, Roberto Mecca, Roberto Cipolla
Retrieving accurate 3D reconstructions of objects from the way they reflect light is a very challenging task in computer vision. More than four decades after the definition of the Photometric Stereo problem, most of the literature has had limited success when global illumination effects such as cast shadows, self-reflections and ambient light come into play, especially for specular surfaces. Recent approaches have leveraged the power of deep learning in conjunction with computer graphics to cope with the need for the vast amount of training data required to invert the image irradiance equation and retrieve the geometry of the object. However, rendering global illumination effects is a slow process, which can limit the amount of training data that can be generated. In this work we propose a novel pixel-wise training procedure for normal prediction by replacing the training data (observation maps) of globally rendered images with independent per-pixel generated data. We show that global physical effects can be approximated on the observation map domain, and that this simplifies and speeds up the data creation procedure. Our network, PX-NET, achieves state-of-the-art performance compared to other pixel-wise methods on synthetic datasets, as well as on the Diligent real dataset in both dense and sparse light settings.
Wasi Uddin Ahmad, Xiao Bai, Soomin Lee, Kai-Wei Chang
Natural language processing techniques have demonstrated promising results in
keyphrase generation. However, one of the major challenges in neural
keyphrase generation is processing long documents using deep neural networks.
Generally, documents are truncated before being given as inputs to neural networks.
Consequently, the models may miss essential points conveyed in the target
document. To overcome this limitation, we propose SEG-Net, a neural
keyphrase generation model that is composed of two major components: (1) a
selector that selects the salient sentences in a document and (2) an
extractor-generator that jointly extracts and generates keyphrases from the
selected sentences. SEG-Net uses Transformer, a self-attentive architecture, as
the basic building block with a novel layer-wise coverage attention to
summarize most of the points discussed in the document. The experimental
results on seven keyphrase generation benchmarks from scientific and web
documents demonstrate that SEG-Net outperforms the state-of-the-art neural
generative methods by a large margin.
Authors' comments: ACL 2021 (camera ready)
Shuai Zhang, Peng Zhang, Xindian Ma, Junqiu Wei, Ningning Wang, Qun Liu
Transformer has been widely used in many Natural Language Processing (NLP) tasks, and the scaled dot-product attention between tokens is a core module of Transformer. This attention is a token-wise design and its complexity is quadratic in the sequence length, limiting its application potential for long-sequence tasks. In this paper, we propose a dimension-wise attention mechanism, based on which a novel language modeling approach (namely TensorCoder) can be developed. The dimension-wise attention can reduce the attention complexity from the original $O(N^2d)$ to $O(Nd^2)$, where $N$ is the length of the sequence and $d$ is the dimensionality of each head. We verify TensorCoder on two tasks: masked language modeling and neural machine translation. Compared with the original Transformer, TensorCoder not only greatly reduces the computation of the original model but also obtains improved performance on the masked language modeling task (on the PTB dataset) and comparable performance on machine translation tasks.
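To make the $O(N^2d)$ versus $O(Nd^2)$ gap concrete, the sketch below counts scalar multiplications for two ways of evaluating an attention-like product without the softmax. It is not the TensorCoder mechanism, which the abstract does not spell out. It only illustrates, via matrix associativity, why forming a $d \times d$ intermediate is cheaper than forming an $N \times N$ one when $N > d$. All names and toy values are assumptions.

```python
def matmul(a, b):
    """Naive matrix product; also returns the number of scalar multiplications."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
           for i in range(n)]
    return out, n * k * m


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


N, d = 6, 2  # sequence length and head dimension (toy sizes)
Q = [[float((i + j) % 3) for j in range(d)] for i in range(N)]
K = [[float((2 * i + j) % 5) for j in range(d)] for i in range(N)]
V = [[float((i * j) % 4) for j in range(d)] for i in range(N)]

# Token-wise order: (Q K^T) V builds an N x N score matrix first.
scores, c1 = matmul(Q, transpose(K))
token_wise, c2 = matmul(scores, V)

# Dimension-wise order: Q (K^T V) builds a d x d matrix first.
kv, c3 = matmul(transpose(K), V)
dim_wise, c4 = matmul(Q, kv)

print(c1 + c2, c3 + c4)  # 144 48, i.e. 2*N*N*d versus 2*N*d*d
```

The two orders give the same result here only because the softmax is omitted; with a softmax over the score matrix the reordering is no longer exact.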
Alexandra-Ioana Albu, Alina Enescu, Luigi Malagò
Anomaly detection for Magnetic Resonance Images (MRIs) can be solved with
unsupervised methods by learning the distribution of healthy images and
identifying anomalies as outliers. In the presence of an additional dataset of
unlabelled data that also contains anomalies, the task can be framed as a
semi-supervised task with negative and unlabelled sample points. Recently, in
Albu et al., 2020, we have proposed a slice-wise semi-supervised method for
tumour detection based on the computation of a dissimilarity function in the
latent space of a Variational AutoEncoder, trained on unlabelled data. The
dissimilarity is computed between the encoding of the image and the encoding of
its reconstruction obtained through a different autoencoder trained only on
healthy images. In this paper we present novel and improved results for our
method, obtained by training the Variational AutoEncoders on a subset of the
HCP and BRATS-2018 datasets and testing on the remaining individuals. We show
that by training the models on higher resolution images and by improving the
quality of the reconstructions, we obtain results which are comparable with
different baselines, which employ a single VAE trained on healthy individuals.
As expected, the performance of our method increases with the size of the
threshold used to determine the presence of an anomaly.
Authors' comments: In 2020 KDD Workshop on Applied Data Science for Healthcare, August
24, 2020, San Diego, CA, USA. ACM, New York, NY, USA, 4 pages
Xing Tao, Yuexiang Li, Wenhui Zhou, Kai Ma, Yefeng Zheng
Deep learning highly relies on the quantity of annotated data. However, the
annotations for 3D volumetric medical data require experienced physicians to
spend hours or even days for investigation. Self-supervised learning is a
potential solution to relax the strong requirement for training data by
deeply exploiting raw data information. In this paper, we propose a novel
self-supervised learning framework for volumetric medical images. Specifically,
we propose a context restoration task, i.e., Rubik's cube++, to pre-train 3D
neural networks. Different from the existing context-restoration-based
approaches, we adopt a volume-wise transformation for context permutation,
which encourages the network to better exploit the inherent 3D anatomical
information of organs. Compared to the strategy of training from scratch,
fine-tuning from the Rubik's cube++ pre-trained weights can achieve better
performance in various tasks such as pancreas segmentation and brain tissue
segmentation. The experimental results show that our self-supervised learning
method can significantly improve the accuracy of 3D deep learning networks on
volumetric medical datasets without the use of extra data.
Authors' comments: Accepted by MICCAI 2020
Mahyar Nemati, Morteza Soltani, Jie Ding, Jinho Choi
Ambient backscatter communication (AmBC) over
orthogonal-frequency-division-multiplexing (OFDM) signals has recently been
proposed as an appealing technique for low power Internet-of-Things (IoT)
applications. The special spectrum structure of OFDM signals provides a range
of flexibility in terms of bit-error-rate (BER) performance, data rate, and
power consumption. In this paper, we study subcarrier-wise backscatter
communication over ambient OFDM signals. This new AmBC scheme exploits the
special spectrum structure of OFDM to transmit data over its squeezed
orthogonal subcarriers. We propose a basis transmission scheme and its two
modifications to support a higher data rate with superior BER performance
compared to existing methods. The basis scheme can transmit one bit per
subcarrier using on-off keying (OOK) modulation in the frequency domain. In the
first modification, an interleaved subcarrier block transmission model is employed
to improve the BER performance of the system in frequency-selective channels.
It results in a trade-off between the size of the blocks and data rate. Thus,
in the second modification, interleaved index modulation (IM) is employed to
mitigate the data rate reduction of the former modification. It also
stabilizes and controls the power of the signal to result in interference
reduction for a legacy receiver. Analytical and numerical evaluations
demonstrate the performance of the proposed method in terms of BER, data rate,
and interference.
Authors' comments: Under review for IEEE TVT 2020
Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan
Achieving state-of-the-art performance on natural language understanding tasks typically relies on fine-tuning a fresh model for every task. Consequently, this approach leads to a higher overall parameter cost, along with higher technical maintenance for serving multiple models. Learning a single multi-task model that is able to do well for all the tasks has been a challenging and yet attractive proposition. In this paper, we propose HyperGrid, a new approach for highly effective multi-task learning. The proposed approach is based on a decomposable hypernetwork that learns grid-wise projections that help to specialize regions in weight matrices for different tasks. In order to construct the proposed hypernetwork, our method learns the interactions and composition between a global (task-agnostic) state and a local task-specific state. We apply our proposed HyperGrid on the current state-of-the-art T5 model, demonstrating strong performance across the GLUE and SuperGLUE benchmarks when using only a single multi-task model. Our method helps bridge the gap between fine-tuning and multi-task learning approaches.
C. W. Xiao, S. Rahmani, H. Hassanabadi
We investigate the decay properties of some beauty and charm mesons with a phenomenological potential model. First, we consider the nonrelativistic Hamiltonian of the mesonic system with Coulomb plus exponential terms and study the wave function and the energy of the system using the variational approach. Thereby, we compute the masses, the decay constants, the leptonic branching fractions of heavy-light mesons and the mixing mass parameter $\Delta {m_{{B_q}}}$. We study the radiative leptonic decay widths of ${D_s} \to \gamma \ell \bar \nu $, ${D^ - } \to \gamma \ell \bar \nu $ and the semileptonic decay widths of ${\bar B_{(s)}} \to {D_{(s)}}\ell \bar \nu $, ${\bar B_{(s)}} \to D_{(s)}^*\ell \bar \nu $. Using Isgur-Wise functions, we calculate the branching ratios of $B \to {D^{(*)}}\pi $ and two-body nonleptonic decay of $D \to K\pi $. Our results are consistent with other theoretical models and the experimental results.
M. Holler, J. -P. Lenain, M. de Naurois, R. Rauth, D. A. Sanchez
We introduce a new simulation and analysis paradigm for Imaging Atmospheric
Cherenkov Telescope (IACT) arrays, simulating the actual observation conditions
as well as individual telescope configuration for each observation unit.
Compared to existing frameworks, where simulations are usually generated using
pre-defined settings, this run-wise simulation approach implies more realistic
simulations and hence reduced systematic uncertainties. The computational
effort of this dedicated simulation concept is notably independent of the
number of different observation configurations and scales only linearly with
observation time. This is a large advantage for increasingly
complex current and future IACT arrays, where the size of the phase space makes
it computationally infeasible to generate simulations that meet the
requirements regarding systematics using the classical simulation scheme.
Authors' comments: 13 pages, 7 figures, 2 tables. Accepted for publication in
Astroparticle Physics
Sumit Goel, Wade Hann-Caruthers
We consider the facility location problem in two dimensions. In particular,
we consider a setting where agents have Euclidean preferences, defined by their
ideal points, for a facility to be located in $\mathbb{R}^2$. We show that for
the $p$-norm ($p \geq 1$) objective, the coordinate-wise median mechanism (CM)
has the lowest worst-case approximation ratio in the class of deterministic,
anonymous, and strategyproof mechanisms. For the minisum objective and an odd
number of agents $n$, we show that CM has a worst-case approximation ratio (AR)
of $\sqrt{2}\frac{\sqrt{n^2+1}}{n+1}$. For the $p$-norm social cost objective
($p\geq 2$), we find that the AR for CM is bounded above by
$2^{\frac{3}{2}-\frac{2}{p}}$. We conjecture that the AR of CM actually equals
the lower bound $2^{1-\frac{1}{p}}$ (as is the case for $p=2$ and $p=\infty$)
for any $p\geq 2$.
Authors' comments: 25 pages, SAGT 2022
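As a small illustration of the mechanism discussed in the abstract above, the sketch below computes the coordinate-wise median of agents' ideal points in $\mathbb{R}^2$, the minisum (sum of Euclidean distances) cost of a location, and the worst-case approximation ratio stated for odd $n$. The function names and the sample points are assumptions chosen for the example.

```python
import math
from statistics import median


def coordinatewise_median(points):
    """Locate the facility at the median x and median y of the ideal points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (median(xs), median(ys))


def minisum_cost(facility, points):
    """Sum of Euclidean distances from the facility to all ideal points."""
    return sum(math.dist(facility, p) for p in points)


def cm_minisum_ratio_bound(n):
    """Worst-case approximation ratio sqrt(2) * sqrt(n^2 + 1) / (n + 1),
    as stated in the abstract for the minisum objective and odd n."""
    if n % 2 == 0:
        raise ValueError("the stated ratio applies to an odd number of agents")
    return math.sqrt(2) * math.sqrt(n * n + 1) / (n + 1)


# Three agents with illustrative ideal points.
ideal_points = [(0.0, 0.0), (4.0, 1.0), (1.0, 5.0)]
facility = coordinatewise_median(ideal_points)

print(facility)                       # (1.0, 1.0)
print(minisum_cost(facility, ideal_points))
print(cm_minisum_ratio_bound(3))      # equals sqrt(20) / 4
```

Since $\sqrt{n^2+1} < n+1$ for every $n \geq 1$, the stated ratio stays strictly below $\sqrt{2}$ and approaches it as $n$ grows.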
Jin Jin, Lin Zhang, Ethan Leng, Gregory J. Metzger, Joseph S. Koopmeiners
While current research has shown the importance of Multi-parametric MRI
(mpMRI) in diagnosing prostate cancer (PCa), further investigation is needed
into how to incorporate the specific structures of the mpMRI data, such as the
regional heterogeneity and between-voxel correlation within a subject. This
paper proposes a machine learning-based method for improved voxel-wise PCa
classification by taking into account the unique structures of the data. We
propose a multi-resolution modeling approach to account for regional
heterogeneity, where base learners trained locally at multiple resolutions are
combined using the super learner, and account for between-voxel correlation by
efficient spatial Gaussian kernel smoothing. The method is flexible in that the
super learner framework allows implementation of any classifier as the base
learner, and can be easily extended to classifying cancer into more
sub-categories. We describe a detailed classification algorithm for the binary
PCa status, as well as the ordinal clinical significance of PCa for which a
weighted likelihood approach is implemented to enhance the detection of the
less prevalent cancer categories. We illustrate the advantages of the proposed
approach over conventional modeling and machine learning approaches through
simulations and application to in vivo data.
Authors' comments: 28 pages, 4 figures, 5 tables
Pierre-Francois Marteau
In this paper, we propose DiFF-RF, an ensemble approach composed of random
partitioning binary trees to detect point-wise and collective (as well as
contextual) anomalies. Thanks to a distance-based paradigm used at the leaves
of the trees, this semi-supervised approach solves a drawback that has been
identified in the isolation forest (IF) algorithm. Moreover, taking into
account the frequencies of visits in the leaves of the random trees
significantly improves the performance of DiFF-RF when considering the presence
of collective anomalies. DiFF-RF is fairly easy to train, and excellent
performance can be obtained by using a simple semi-supervised procedure to
set up the extra hyper-parameter that is introduced. We first evaluate DiFF-RF
on a synthetic data set to i) verify that the limitation of the IF algorithm is
overcome, ii) demonstrate how collective anomalies are actually detected and
iii) analyze the effect of the meta-parameters it involves. We then assess the
DiFF-RF algorithm on a large set of datasets from the UCI repository, as well
as two benchmarks related to intrusion detection applications. Our experiments
show that DiFF-RF almost systematically outperforms the IF algorithm, but also
challenges the one-class SVM baseline and a deep learning variational
auto-encoder architecture. Furthermore, our experience shows that DiFF-RF can
work well in the presence of small-scale learning data, which is, by contrast,
difficult for deep neural architectures. Finally, DiFF-RF is computationally
efficient and can be easily parallelized on multi-core architectures.
Authors' comments: arXiv admin note: text overlap with arXiv:1705.03800
Emily Moravec, Anthony Gonzalez, Simon Dicker, Stacey Alberts, Mark Brodwin, Tracy Clarke, Thomas Connor, Bandon Decker et al.
We present a multi-wavelength investigation of the radio galaxy population in
the galaxy cluster MOO J1506+5137 at $z$=1.09$\pm$0.03, which in previous work
we identified as having multiple complex radio sources. The combined dataset
used in this work includes data from the Low-Frequency Array Two-metre Sky
Survey (LoTSS), NSF's Karl G. Jansky Very Large Array (VLA), the Robert C. Byrd
Green Bank Telescope (GBT), the Spitzer Space Telescope, and the Dark Energy
Camera Legacy Survey (DECaLS). We find that there are five radio sources which
are all located within 500 kpc ($\sim$1$^{\prime}$) of the cluster center and
have radio luminosities $P_{\mathrm{1.4GHz}}$ > 1.6$\times$10$^{24}$ W
Hz$^{-1}$. The typical host galaxies are among the highest stellar mass
galaxies in the cluster. The exceptional radio activity among the massive
galaxy population appears to be linked to the dynamical state of the cluster.
The galaxy distribution suggests an ongoing merger, with a subgroup found to
the northwest of the main cluster. Further, two of the five sources are
classified as bent-tail sources with one being a potential wide-angle tail
(WAT)/hybrid morphology radio source (HyMoRS) indicating a dynamic environment.
The cluster also lies in a region of the mass-richness plane occupied by other
merging clusters in the Massive and Distant Clusters of WISE Survey (MaDCoWS).
The data suggest that during the merger phase radio activity can be
dramatically enhanced, which would contribute to the observed trend of
increased radio activity in clusters with increasing redshift.
Authors' comments: 17 pages and 8 figures. Accepted in ApJ for publication
Jñani Crawford, Eshed Margalit, Kalanit Grill-Spector, Sonia Poltoratski
The increased use of convolutional neural networks for face recognition in
science, governance, and broader society has created an acute need for methods
that can show how these 'black box' decisions are made. To be interpretable and
useful to humans, such a method should convey a model's learned classification
strategy in a way that is robust to random initializations or spurious
correlations in input data. To this end, we applied the decompositional
pixel-wise attribution method of layer-wise relevance propagation (LRP) to
resolve the decisions of several classes of VGG-16 models trained for face
recognition. We then quantified how these relevance measures vary with and
generalize across key model parameters, such as the pretraining dataset
(ImageNet or VGGFace), the finetuning task (gender or identity classification),
and random initializations of model weights. Using relevance-based image
masking, we find that relevance maps for face classification prove generally
stable across random initializations, and can generalize across finetuning
tasks. However, there is markedly less generalization across pretraining
datasets, indicating that ImageNet- and VGGFace-trained models sample face
information differently even as they achieve comparably high classification
performance. Fine-grained analyses of relevance maps across models revealed
asymmetries in generalization that point to specific benefits of choice
parameters, and suggest that it may be possible to find an underlying set of
important face image pixels that drive decisions across convolutional neural
networks and tasks. Finally, we evaluated model decision weighting against
human measures of similarity, providing a novel framework for interpreting face
recognition decisions across human and machine.
Authors' comments: 10 pages, 7 figures
Itay Hubara, Yury Nahshan, Yair Hanani, Ron Banner, Daniel Soudry
Lately, post-training quantization methods have gained considerable attention, as they are simple to use and require only a small unlabeled calibration set. This small dataset cannot be used to fine-tune the model without significant over-fitting. Instead, these methods only use the calibration set to set the activations' dynamic ranges. However, such methods have always resulted in significant accuracy degradation when used below 8 bits (except on small datasets). Here we aim to break the 8-bit barrier. To this end, we minimize the quantization errors of each layer separately by optimizing its parameters over the calibration set. We empirically demonstrate that this approach is: (1) much less susceptible to over-fitting than the standard fine-tuning approaches, and can be used even on a very small calibration set; and (2) more powerful than previous methods, which only set the activations' dynamic ranges. Furthermore, we demonstrate how to optimally allocate the bit-widths for each layer, while constraining accuracy degradation or model compression, by proposing a novel integer programming formulation. Finally, we suggest model global statistics tuning to correct biases introduced during quantization. Together, these methods yield state-of-the-art results for both vision and text models. For instance, on ResNet50, we obtain less than 1% accuracy degradation with 4-bit weights and activations in all layers but the smallest two. We open-sourced our code.
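The per-layer idea in the abstract above, minimizing each layer's quantization error over a small calibration set, can be illustrated with a deliberately simplified sketch. The authors optimize the layer's parameters; here a grid search over a single quantization step size for one linear unit stands in for that optimization, which is an assumption made for brevity. All names, weights, and calibration inputs are hypothetical.

```python
def quantize(values, step, n_bits):
    """Uniform symmetric quantization: round to signed n_bits integers, then rescale."""
    qmax = 2 ** (n_bits - 1) - 1
    return [step * max(-qmax - 1, min(qmax, round(v / step))) for v in values]


def layer_output(weights, x):
    """Output of a single linear unit (dot product of weights and input)."""
    return sum(w * xi for w, xi in zip(weights, x))


def output_mse(weights, q_weights, calib):
    """Mean squared error between full-precision and quantized layer outputs."""
    errors = [(layer_output(weights, x) - layer_output(q_weights, x)) ** 2
              for x in calib]
    return sum(errors) / len(errors)


def best_step(weights, calib, n_bits, n_candidates=50):
    """Grid-search the step size that minimizes output error on the calibration set."""
    qmax = 2 ** (n_bits - 1) - 1
    max_step = max(abs(w) for w in weights) / qmax  # step that covers the full range
    candidates = [max_step * i / n_candidates for i in range(1, n_candidates)]
    candidates.append(max_step)
    return min(candidates,
               key=lambda s: output_mse(weights, quantize(weights, s, n_bits), calib))


# Illustrative weights and a tiny calibration set (assumed values).
weights = [0.9, -0.05, 0.3, -0.6]
calib = [[1.0, 0.5, -0.2, 0.1], [0.3, -1.0, 0.8, 0.4], [-0.7, 0.2, 0.6, -0.9]]

n_bits = 4
naive_step = max(abs(w) for w in weights) / (2 ** (n_bits - 1) - 1)
step = best_step(weights, calib, n_bits)

mse_naive = output_mse(weights, quantize(weights, naive_step, n_bits), calib)
mse_best = output_mse(weights, quantize(weights, step, n_bits), calib)
print(mse_best <= mse_naive)  # True: the naive step is one of the candidates
```

The naive step only sets a dynamic range, whereas the search uses the calibration inputs to measure the error at the layer output, which is the distinction the abstract draws between the two families of methods.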
Bart Bogaerts, Emilio Gamba, Tias Guns
We explore the problem of step-wise explaining how to solve constraint satisfaction problems, with a use case on logic grid puzzles. More specifically, we study the problem of explaining the inference steps that one can take during propagation, in a way that is easy to interpret for a person. Thereby, we aim to give the constraint solver explainable agency, which can help in building trust in the solver by being able to understand and even learn from the explanations. The main challenge is that of finding a sequence of simple explanations, where each explanation should aim to be as cognitively easy as possible for a human to verify and understand. This contrasts with the arbitrary combination of facts and constraints that the solver may use when propagating. We propose the use of a cost function to quantify how simple an individual explanation of an inference step is, and identify the explanation-production problem of finding the best sequence of explanations of a CSP. Our approach is agnostic to the underlying constraint propagation mechanisms, and can provide explanations even for inference steps resulting from combinations of constraints. In case multiple constraints are involved, we also develop a mechanism that allows breaking the most difficult steps up and thus gives the user the ability to zoom in on specific parts of the explanation. Our proposed algorithm iteratively constructs the explanation sequence by using an optimistic estimate of the cost function to guide the search for the best explanation at each step. Our experiments on logic grid puzzles show the feasibility of the approach in terms of the quality of the individual explanations and the resulting explanation sequences obtained.
Qian Lou, Song Bian, Lei Jiang
Hybrid Privacy-Preserving Neural Network (HPPNN), which implements linear layers by Homomorphic Encryption (HE) and nonlinear layers by Garbled Circuit (GC), is one of the most promising secure solutions to emerging Machine Learning as a Service (MLaaS). Unfortunately, an HPPNN suffers from long inference latency, e.g., $\sim100$ seconds per image, which makes MLaaS unsatisfactory. Because the HE-based linear layers of an HPPNN account for $93\%$ of the inference latency, it is critical to select a set of HE parameters that minimizes the computational overhead of linear layers. Prior HPPNNs over-pessimistically select huge HE parameters to maintain large noise budgets, since they use the same set of HE parameters for an entire network and ignore the error tolerance capability of a network. In this paper, for fast and accurate secure neural network inference, we propose an automated layer-wise parameter selector, AutoPrivacy, that leverages deep reinforcement learning to automatically determine a set of HE parameters for each linear layer in an HPPNN. The learning-based HE parameter selection policy outperforms the conventional rule-based HE parameter selection policy. Compared to prior HPPNNs, AutoPrivacy-optimized HPPNNs reduce inference latency by $53\%\sim70\%$ with negligible loss of accuracy.
Fenglin Liu, Xuancheng Ren, Guangxiang Zhao, Chenyu You, Xuewei Ma, Xian Wu, Xu Sun
In sequence-to-sequence learning, e.g., natural language generation, the decoder relies on the attention mechanism to efficiently extract information from the encoder. While it is common practice to draw information from only the last encoder layer, recent work has proposed to use representations from different encoder layers for diversified levels of information. Nonetheless, the decoder still obtains only a single view of the source sequences, which might lead to insufficient training of the encoder layer stack due to the hierarchy bypassing problem. In this work, we propose layer-wise multi-view decoding, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers are supplemented for a stereoscopic view of the source sequences. Systematic experiments and analyses show that we successfully address the hierarchy bypassing problem, require almost negligible parameter increase, and substantially improve the performance of sequence-to-sequence learning with deep representations on five diverse tasks, i.e., machine translation, abstractive summarization, image captioning, video captioning, medical report generation, and paraphrase generation. In particular, our approach achieves new state-of-the-art results on ten benchmark datasets, including a low-resource machine translation dataset and two low-resource medical report generation datasets.
Ryuichi Takanobu, Qi Zhu, Jinchao Li, Baolin Peng, Jianfeng Gao, Minlie Huang
There is a growing interest in developing goal-oriented dialog systems which
serve users in accomplishing complex tasks through multi-turn conversations.
Although many methods are devised to evaluate and improve the performance of
individual dialog components, there is a lack of comprehensive empirical studies
on how different components contribute to the overall performance of a dialog
system. In this paper, we perform a system-wise evaluation and present an
empirical analysis on different types of dialog systems which are composed of
different modules in different settings. Our results show that (1) a pipeline
dialog system trained using fine-grained supervision signals at different
component levels often obtains better performance than the systems that use
joint or end-to-end models trained on coarse-grained labels, (2)
component-wise, single-turn evaluation results are not always consistent with
the overall performance of a dialog system, and (3) despite the discrepancy
between simulators and human users, simulated evaluation is still a valid
alternative to the costly human evaluation especially in the early stage of
development.
Authors' comments: SIGDIAL 2020 long paper