Tuan Phung-Duc
The retrial phenomenon arises naturally in various systems such as call centers,
cellular networks and random access protocols in local area networks. This
paper gives a comprehensive survey of the theory and applications of retrial
queues in these systems. We review the state of the art in theoretical
research, including exact solutions, stability, asymptotic analyses and
multidimensional models. We present an overview of retrial models arising from
real-world applications. Some open problems and promising research directions
are also discussed.
Authors' comments: 31 pages. In: Stochastic Operations Research in Business and Industry
(eds. by Tadashi Dohi, Katsunori Ano and Shoji Kasahara)
Chenze Shao, Yang Feng, Jinchao Zhang, Fandong Meng, Xilin Chen, Jie Zhou
The Non-Autoregressive Transformer (NAT) aims to accelerate the Transformer model
by discarding the autoregressive mechanism and generating target words
independently, an approach that fails to exploit target sequential information.
Over-translation and under-translation errors often occur for the above reason,
especially in the long sentence translation scenario. In this paper, we propose
two approaches to retrieve the target sequential information for NAT to enhance
its translation ability while preserving the fast-decoding property. Firstly,
we propose a sequence-level training method based on a novel reinforcement
algorithm for NAT (Reinforce-NAT) to reduce the variance and stabilize the
training procedure. Secondly, we propose an innovative Transformer decoder
named FS-decoder to fuse the target sequential information into the top layer
of the decoder. Experimental results on three translation tasks show that the
Reinforce-NAT surpasses the baseline NAT system by a significant margin on BLEU
without decelerating the decoding speed and the FS-decoder achieves comparable
translation performance to the autoregressive Transformer with considerable
speedup.
Authors' comments: 12 pages, 4 figures, ACL 2019 long paper
Kelly L. Wiggers, Alceu S. Britto, Laurent Heutte, Alessandro L. Koerich, Luiz S. Oliveira
This paper presents a novel approach for image retrieval and pattern spotting
in document image collections. Manual feature engineering is avoided by
learning a similarity-based representation using a Siamese Neural Network
trained on a previously prepared subset of image pairs from the ImageNet
dataset. The learned representation is used to provide the similarity-based
feature maps used to find relevant image candidates in the data collection
given an image query. A robust experimental protocol based on the public
Tobacco800 document image collection shows that the proposed method compares
favorably against state-of-the-art document image retrieval methods, reaching
0.94 and 0.83 of mean average precision (mAP) for retrieval and pattern
spotting (IoU=0.7), respectively. In addition, we evaluate the proposed
method with feature maps of different sizes, showing the impact of
reducing the number of features on retrieval performance and processing
time.
Authors' comments: Accepted for IJCNN 2019
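The IoU=0.7 criterion quoted in the abstract above can be illustrated with a short sketch. It assumes axis-aligned boxes given as (x1, y1, x2, y2) corner tuples; the function names and the box format are illustrative assumptions and are not taken from the paper.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_spotted(predicted, truth, threshold=0.7):
    """Count a detection as correct when its IoU reaches the threshold."""
    return box_iou(predicted, truth) >= threshold
```

Under this criterion a predicted region only counts as a hit when it overlaps the annotated region closely, which is why pattern spotting is scored separately from plain retrieval.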
Fatemeh Kazemi, Esmaeil Karimi, Anoosheh Heidarzadeh, Alex Sprintson
In this paper, we study the multi-server setting of the \emph{Private
Information Retrieval with Coded Side Information (PIR-CSI)} problem. In this
problem, there are $K$ messages replicated across $N$ servers, and there is a
user who wishes to download one message from the servers without revealing any
information to any server about the identity of the requested message. The user
has side information in the form of a linear combination of a subset of $M$
messages in the database. The parameter $M$ is known to all servers in advance,
whereas the indices and the coefficients of the messages in the user's side
information are unknown to any server \emph{a priori}.
We focus on a class of PIR-CSI schemes, referred to as \emph{server-symmetric
schemes}, in which the queries/answers to/from different servers are symmetric
in structure. We define the \emph{rate} of a PIR-CSI scheme as its minimum
download rate among all problem instances, and define the
\emph{server-symmetric capacity} of the PIR-CSI problem as the supremum of
rates over all server-symmetric PIR-CSI schemes. Our main results are as
follows: (i) when the side information is not a function of the user's
requested message, the capacity is given by
${(1+{1}/{N}+\dots+{1}/{N^{\left\lceil \frac{K}{M+1}\right\rceil -1}})^{-1}}$
for any ${1\leq M\leq K-1}$; and (ii) when the side information is a function
of the user's requested message, the capacity is equal to $1$ for $M=2$ and
$M=K$, and it is equal to ${N}/{(N+1)}$ for any ${3 \leq M \leq K-1}$. The
converse proofs rely on new information-theoretic arguments, and the
achievability schemes are inspired by our recently proposed scheme for
single-server PIR-CSI as well as the Sun-Jafar scheme for multi-server PIR.
Authors' comments: 16 pages; A short version of this work was presented at the 16th
Canadian Workshop on Information Theory (CWIT'19), Hamilton, Ontario, Canada,
June 2019
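The two capacity expressions stated in the abstract above can be evaluated numerically with a small sketch. The function name and the boolean flag are hypothetical; the formulas are transcribed directly from results (i) and (ii), and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def server_symmetric_capacity(K, M, N, side_info_depends_on_demand):
    """Server-symmetric PIR-CSI capacity as stated in results (i) and (ii)."""
    if side_info_depends_on_demand:
        # Result (ii): side information is a function of the requested message.
        if M == 2 or M == K:
            return Fraction(1)
        if 3 <= M <= K - 1:
            return Fraction(N, N + 1)
        raise ValueError("result (ii) is stated only for 2 <= M <= K")
    # Result (i): side information is not a function of the requested message.
    if not 1 <= M <= K - 1:
        raise ValueError("result (i) is stated only for 1 <= M <= K-1")
    num_terms = -(-K // (M + 1))  # integer ceiling of K / (M + 1)
    return 1 / sum(Fraction(1, N ** i) for i in range(num_terms))
```

For example, with K = 4 messages, M = 1 and N = 2 servers, result (i) gives (1 + 1/2)^{-1} = 2/3.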
Mohamad Ali-Dib, Kristen Menou, Alan P. Jackson, Chenchong Zhu, Noah Hammond
Crater ellipticity determination is a complex and time-consuming task that so
far has evaded successful automation. We train a state-of-the-art computer
vision algorithm to identify craters in Lunar digital elevation maps and
retrieve their sizes and 2D shapes. The computational backbone of the model is
MaskRCNN, an "instance segmentation" general framework that detects craters in
an image while simultaneously producing a mask for each crater that traces its
outer rim. Our post-processing pipeline then finds the closest fitting ellipse
to these masks, allowing us to retrieve the crater ellipticities. Our model is
able to correctly identify 87\% of known craters in the longitude range we hid
from the network during training and validation (test set), while predicting
thousands of additional craters not present in our training data. Manual
validation of a subset of these "new" craters indicates that a majority of them
are real, which we take as an indicator of the strength of our model in
learning to identify craters, despite incomplete training data. The crater
size, ellipticity, and depth distributions predicted by our model are
consistent with human-generated results. The model allows us to perform a
large-scale search for differences in crater diameter and shape distributions between
the lunar highlands and maria, and we exclude any such differences with high
statistical significance. The predicted test set catalogue and trained model
are available here: https://github.com/malidib/Craters_MaskRCNN/.
Authors' comments: 59 pages, 13 figures, Accepted for publication in Icarus
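The step from a crater mask to an ellipticity value can be sketched as follows. This is not the authors' post-processing pipeline: it swaps in a simple second-moment estimate (the covariance of the mask's pixel coordinates) in place of a closest-fitting ellipse, and it assumes ellipticity means the ratio of major to minor axis. The function name and the nested-list mask format are hypothetical.

```python
import math


def ellipse_from_mask(mask):
    """Estimate semi-axes (a, b) and ellipticity a/b of a filled binary mask.

    Uses the fact that a filled ellipse with semi-axes a and b has
    coordinate variances a^2/4 and b^2/4 along its principal axes.
    """
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    n = len(pts)
    if n < 3:
        raise ValueError("mask has too few pixels")
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / n
    syy = sum((y - my) ** 2 for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts) / n
    # Eigenvalues of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_major, lam_minor = half_trace + root, half_trace - root
    if lam_minor <= 0:
        raise ValueError("degenerate mask (all pixels on a line)")
    a, b = 2 * math.sqrt(lam_major), 2 * math.sqrt(lam_minor)
    return a, b, a / b
```

A moment-based estimate is cheap and orientation-independent, but it is sensitive to ragged mask edges, so a rim-based fit may behave differently on real masks.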
Daya Guo, Duyu Tang, Nan Duan, Ming Zhou, Jian Yin
In this paper, we present an approach to incorporate retrieved datapoints as
supporting evidence for context-dependent semantic parsing, such as generating
source code conditioned on the class environment. Our approach naturally
combines a retrieval model and a meta-learner, where the former learns to find
similar datapoints from the training data, and the latter considers retrieved
datapoints as a pseudo task for fast adaptation. Specifically, our retriever is
a context-aware encoder-decoder model with a latent variable which takes
context environment into consideration, and our meta-learner learns to utilize
retrieved datapoints in a model-agnostic meta-learning paradigm for fast
adaptation. We conduct experiments on the CONCODE and CSQA datasets, where the
context refers to the class environment in Java code and the conversational history,
respectively. We use a sequence-to-action model as the base semantic parser,
which achieves state-of-the-art accuracy on both datasets. Results show
that both the context-aware retriever and the meta-learning strategy improve
accuracy, and our approach performs better than retrieve-and-edit baselines.
Authors' comments: Accepted by ACL 2019
Lianli Gao, Xiaosu Zhu, Jingkuan Song, Zhou Zhao, Heng Tao Shen
Product Quantization (PQ) has long been a mainstream approach for generating an exponentially large codebook at very low memory/time cost. Despite its success, PQ is still tricky for the decomposition of high-dimensional vector space, and retraining the model is usually unavoidable when the code length changes. In this work, we propose a deep progressive quantization (DPQ) model, as an alternative to PQ, for large-scale image retrieval. DPQ learns the quantization codes sequentially and approximates the original feature space progressively. Therefore, we can train the quantization codes with different code lengths simultaneously. Specifically, we first utilize the label information for guiding the learning of visual features, and then apply several quantization blocks to progressively approach the visual features. Each quantization block is designed to be a layer of a convolutional neural network, and the whole framework can be trained in an end-to-end manner. Experimental results on the benchmark datasets show that our model significantly outperforms the state of the art for image retrieval. Our model is trained once for different code lengths and therefore requires less computation time. An additional ablation study demonstrates the effect of each component of our proposed model. Our code is released at https://github.com/cfm-uestc/DPQ.
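The idea of learning codes sequentially and approximating the feature progressively can be illustrated with classic residual quantization. This is a sketch under strong assumptions: the codebooks are fixed and hand-written rather than learned end-to-end as in DPQ, and the function names are hypothetical.

```python
def nearest(codebook, v):
    """Index of the codeword closest to v in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)),
    )


def progressive_encode(x, codebooks):
    """Quantize x block by block; each block encodes the remaining residual.

    Returns the list of codes and the approximation after each block, so a
    prefix of the codes is itself a valid shorter code.
    """
    residual = list(x)
    approx = [0.0] * len(x)
    codes, approximations = [], []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        word = codebook[idx]
        approx = [a + w for a, w in zip(approx, word)]
        residual = [r - w for r, w in zip(residual, word)]
        codes.append(idx)
        approximations.append(tuple(approx))
    return codes, approximations
```

Because every prefix of the code sequence yields its own approximation, one set of blocks serves several code lengths, which is the property the abstract relies on when it says the model is trained once for different code lengths.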
Yair Feldman, Ran El-Yaniv
This paper is concerned with the task of multi-hop open-domain Question
Answering (QA). This task is particularly challenging since it requires the
simultaneous performance of textual reasoning and efficient searching. We
present a method for retrieving multiple supporting paragraphs, nested amidst a
large knowledge base, which contain the necessary evidence to answer a given
question. Our method iteratively retrieves supporting paragraphs by forming a
joint vector representation of both a question and a paragraph. The retrieval
is performed by considering contextualized sentence-level representations of
the paragraphs in the knowledge source. Our method achieves state-of-the-art
performance over two well-known datasets, SQuAD-Open and HotpotQA, which serve
as our single- and multi-hop open-domain QA benchmarks, respectively.
Authors' comments: ACL 2019
Jianfeng Cai, Yuling Jiao, Xiliang Lu, Juntao You
Sparse phase retrieval plays an important role in many fields of applied science and thus attracts considerable attention. In this paper, we propose a \underline{sto}chastic alte\underline{r}nating \underline{m}inimizing method for \underline{sp}arse ph\underline{a}se \underline{r}etrieval (\textit{StormSpar}), which empirically is able to recover $n$-dimensional $s$-sparse signals from only $O(s \log n)$ measurements without the good initial value required by many existing methods. In \textit{StormSpar}, the hard-thresholding pursuit (HTP) algorithm is employed to solve the sparsity-constrained least-squares sub-problems. The main competitive feature of \textit{StormSpar} is that it converges globally from a random initialization while requiring a number of samples of optimal order. Extensive numerical experiments are given to validate the proposed algorithm.
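The sparsity constraint handled by HTP rests on a hard-thresholding operator that keeps the $s$ largest-magnitude entries of a vector and zeroes the rest. The sketch below shows only that operator, not the full HTP iteration or StormSpar itself; the function name is hypothetical.

```python
def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x and set the others to zero."""
    if s < 0:
        raise ValueError("s must be non-negative")
    # Indices of the s entries with the largest absolute value.
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:s])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]
```

In an HTP-style method this operator selects a candidate support, after which a least-squares problem is solved restricted to that support.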
Yale Song, Mohammad Soleymani
Visual-semantic embedding aims to find a shared latent space where related
visual and textual instances are close to each other. Most current methods
learn injective embedding functions that map an instance to a single point in
the shared space. Unfortunately, injective embedding cannot effectively handle
polysemous instances with multiple possible meanings; at best, it would find an
average representation of different meanings. This hinders its use in
real-world scenarios where individual instances and their cross-modal
associations are often ambiguous. In this work, we introduce Polysemous
Instance Embedding Networks (PIE-Nets) that compute multiple and diverse
representations of an instance by combining global context with locally-guided
features via multi-head self-attention and residual learning. To learn
visual-semantic embedding, we tie up two PIE-Nets and optimize them jointly in
the multiple instance learning framework. Most existing work on cross-modal
retrieval focuses on image-text data. Here, we also tackle a more challenging
case of video-text retrieval. To facilitate further research in video-text
retrieval, we release a new dataset of 50K video-sentence pairs collected from
social media, dubbed MRW (my reaction when). We demonstrate our approach on
both image-text and video-text retrieval scenarios using MS-COCO, TGIF, and our
new MRW dataset.
Authors' comments: CVPR 2019. Includes supplementary material. Results on
TGIF and MRW have been updated
Kyle Swanson, Lili Yu, Christopher Fox, Jeremy Wohlwend, Tao Lei
Response suggestion is an important task for building human-computer conversation systems. Recent approaches to conversation modeling have introduced new model architectures with impressive results, but relatively little attention has been paid to whether these models would be practical in a production setting. In this paper, we describe the unique challenges of building a production retrieval-based conversation system, which selects outputs from a whitelist of candidate responses. To address these challenges, we propose a dual encoder architecture which performs rapid inference and scales well with the size of the whitelist. We also introduce and compare two methods for generating whitelists, and we carry out a comprehensive analysis of the model and whitelists. Experimental results on a large, proprietary help desk chat dataset, including both offline metrics and a human evaluation, indicate production-quality performance and illustrate key lessons about conversation modeling in practice.
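The reason a dual encoder scales with whitelist size can be sketched briefly: the response embeddings are computed once offline, so serving a request reduces to one context encoding plus dot products. The vectors below are toy values and the function names are hypothetical; no encoder from the paper is reproduced.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k_responses(context_vec, whitelist, k=3):
    """Rank whitelist responses by dot-product score against the context.

    whitelist is a list of (response_text, precomputed_embedding) pairs;
    the embeddings are assumed to come from the response encoder offline.
    """
    ranked = sorted(whitelist, key=lambda item: dot(context_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

A usage example with three canned responses:

```python
whitelist = [
    ("Try restarting the app.", (0.9, 0.1)),
    ("Your refund is on its way.", (0.1, 0.9)),
    ("Thanks for contacting us!", (0.5, 0.5)),
]
print(top_k_responses((1.0, 0.0), whitelist, k=2))
```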
Zhaoqun Li, Cheng Xu, Biao Leng
Learning discriminative shape representations is a crucial issue for
large-scale 3D shape retrieval. In this paper, we propose the Collaborative
Inner Product Loss (CIP Loss) to obtain an ideal shape embedding that is
discriminative among different categories and clustered within the same class.
Using a simple inner product operation, CIP loss explicitly enforces the
features of the same class to be clustered in a linear subspace, while
inter-class subspaces are constrained to be at least orthogonal. Compared to
previous metric loss functions, CIP loss provides a clearer geometric
interpretation of the embedding than a Euclidean margin and, unlike a cosine
margin, is easy to implement without a normalization operation. Moreover,
our proposed loss term can combine with other commonly used loss functions and
can be easily plugged into existing off-the-shelf architectures. Extensive
experiments conducted on the two public 3D object retrieval datasets, ModelNet
and ShapeNetCore 55, demonstrate the effectiveness of our proposal, and our
method has achieved state-of-the-art results on both datasets.
Authors' comments: Accepted by IJCAI2019
Kenton Lee, Ming-Wei Chang, Kristina Toutanova
Recent work on open domain question answering (QA) assumes strong supervision
of the supporting evidence and/or assumes a blackbox information retrieval (IR)
system to retrieve evidence candidates. We argue that both are suboptimal,
since gold evidence is not always available, and QA is fundamentally different
from IR. We show for the first time that it is possible to jointly learn the
retriever and reader from question-answer string pairs and without any IR
system. In this setting, evidence retrieval from all of Wikipedia is treated as
a latent variable. Since this is impractical to learn from scratch, we
pre-train the retriever with an Inverse Cloze Task. We evaluate on open
versions of five QA datasets. On datasets where the questioner already knows
the answer, a traditional IR system such as BM25 is sufficient. On datasets
where a user is genuinely seeking an answer, we show that learned retrieval is
crucial, outperforming BM25 by up to 19 points in exact match.
Authors' comments: Accepted to ACL 2019
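The Inverse Cloze Task named in the abstract above is commonly described as follows: a sentence is removed from a passage and treated as a pseudo-question, and the remaining sentences serve as its pseudo-evidence. The sketch below builds one such training pair under that description; the function name is hypothetical and details of the actual pre-training setup are not reproduced.

```python
def make_ict_example(sentences, index):
    """Build one Inverse Cloze Task pair from a passage split into sentences.

    The sentence at `index` becomes the pseudo-question and the rest of
    the passage becomes the pseudo-evidence the retriever should find.
    """
    if not 0 <= index < len(sentences):
        raise IndexError("index out of range")
    query = sentences[index]
    context = sentences[:index] + sentences[index + 1:]
    return query, " ".join(context)
```

Because the pairs come from unlabeled text, this gives the retriever a training signal before any question-answer pairs are used.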
Boris N. Oreshkin, Negar Rostamzadeh, Pedro O. Pinheiro, Christopher Pal
We address the problem of learning fine-grained cross-modal representations. We propose an instance-based deep metric learning approach in joint visual and textual space. The key novelty of this paper is that it shows that using per-image semantic supervision leads to substantial improvement in zero-shot performance over using class-only supervision. On top of that, we provide a probabilistic justification for a metric rescaling approach that solves a very common problem in the generalized zero-shot learning setting, i.e., classifying test images from unseen classes as one of the classes seen during training. We evaluate our approach on two fine-grained zero-shot learning datasets: CUB and FLOWERS. We find that on the generalized zero-shot classification task CLAREL consistently outperforms the existing approaches on both datasets.
M. F. Kasim, A. F. A. Bott, P. Tzeferacos, D. Q. Lamb, G. Gregori, S. M. Vinko
Proton radiography is a technique in high energy density science to diagnose magnetic and/or electric fields in a plasma by firing a proton beam and detecting its modulated intensity profile on a screen. Current approaches to retrieve the integrated field from the modulated intensity profile require the unmodulated beam intensity profile before the interaction, which is rarely available experimentally due to shot-to-shot variability. In this paper, we present a statistical method to retrieve the integrated field without needing to know the exact source profile. We apply our method to experimental data, showing the robustness of our approach. Our proposed technique allows not only for the retrieval of the path-integrated fields, but also of the statistical properties of the fields.
Daniele Bonadiman, Anjishnu Kumar, Arpit Mittal
The goal of a Question Paraphrase Retrieval (QPR) system is to retrieve equivalent questions that result in the same answer as the original question. Such a system can be used to understand and answer rare and noisy reformulations of common questions by mapping them to a set of canonical forms. This has large-scale applications for community Question Answering (cQA) and open-domain spoken language question answering systems. In this paper we describe a new QPR system implemented as a Neural Information Retrieval (NIR) system consisting of a neural network sentence encoder and an approximate k-Nearest Neighbour index for efficient vector retrieval. We also describe our mechanism to generate an annotated dataset for question paraphrase retrieval experiments automatically from question-answer logs via distant supervision. We show that the standard loss function in NIR, triplet loss, does not perform well with noisy labels. We propose smoothed deep metric loss (SDML) and with our experiments on two QPR datasets we show that it significantly outperforms triplet loss in the noisy label setting.
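For reference, the standard triplet loss that the abstract uses as its baseline can be written in a few lines. This sketch assumes Euclidean distance and a margin of 1.0, both of which are illustrative choices; it does not implement the proposed SDML loss.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss pushing the positive closer to the anchor than the negative."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

The loss depends entirely on which example is labeled positive and which negative, so a mislabeled pair pulls the embedding in the wrong direction, which is consistent with the sensitivity to noisy labels reported in the abstract.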
Devraj Mandal, Pramod Rao, Soma Biswas
Cross-modal data matching refers to retrieval of data from one modality, when
given a query from another modality. In general, supervised algorithms achieve
better retrieval performance than their unsupervised counterparts, as
they can learn more representative features by leveraging the available label
information. However, this comes at the cost of requiring a huge amount of
labeled examples, which may not always be available. In this work, we propose a
novel framework in a semi-supervised setting, which can predict the labels of
the unlabeled data using complementary information from different modalities.
The proposed framework can be used as an add-on with any baseline cross-modal
algorithm to give a significant performance improvement, even with limited
labeled data. Finally, we analyze the challenging scenario where the unlabeled
examples can even come from classes not in the training data and evaluate the
performance of our algorithm in this setting. Extensive evaluation using
several baseline algorithms across three different datasets shows the
effectiveness of our label prediction framework.
Authors' comments: 12 pages, 3 tables, 2 figures, 1 algorithm flowchart
Elad Amrani, Rami Ben-Ari, Tal Hakim, Alex Bronstein
Learning an object detector or retrieval model requires a large data set with
manual annotations. Such data sets are expensive and time-consuming to create
and therefore difficult to obtain on a large scale. In this work, we propose to
exploit the natural correlation between narrations and the visual presence of
objects in video to learn an object detector and retrieval model without any manual
labeling involved. We pose the problem as weakly supervised learning with noisy
labels, and propose a novel object detection paradigm under these constraints.
We handle the background rejection by using contrastive samples and confront
the high level of label noise with a new clustering score. Our evaluation is
based on a set of 11 manually annotated objects in over 5000 frames. We show a
comparison to a weakly supervised approach as a baseline and provide a strongly
labeled upper bound.
Authors' comments: ICCV 2019 Workshop on Multi-modal Video Analysis and Moments in Time
Challenge
Adam D. Cobb, Michael D. Himes, Frank Soboczenski, Simone Zorzan, Molly D. O'Beirne, Atılım Güneş Baydin, Yarin Gal, Shawn D. Domagal-Goldman et al.
Machine learning is now used in many areas of astrophysics, from detecting exoplanets in Kepler transit signals to removing telescope systematics. Recent work demonstrated the potential of using machine learning algorithms for atmospheric retrieval by implementing a random forest to perform retrievals in seconds that are consistent with the traditional, computationally-expensive nested-sampling retrieval method. We expand upon their approach by presenting a new machine learning model, \texttt{plan-net}, based on an ensemble of Bayesian neural networks that yields more accurate inferences than the random forest for the same data set of synthetic transmission spectra. We demonstrate that an ensemble provides greater accuracy and more robust uncertainties than a single model. In addition to being the first to use Bayesian neural networks for atmospheric retrieval, we also introduce a new loss function for Bayesian neural networks that learns correlations between the model outputs. Importantly, we show that designing machine learning models to explicitly incorporate domain-specific knowledge both improves performance and provides additional insight by inferring the covariance of the retrieved atmospheric parameters. We apply \texttt{plan-net} to the Hubble Space Telescope Wide Field Camera 3 transmission spectrum for WASP-12b and retrieve an isothermal temperature and water abundance consistent with the literature. We highlight that our method is flexible and can be expanded to higher-resolution spectra and a larger number of atmospheric parameters.
Hasan Al-Marzouqi, Yuting Hu, Ghassan AlRegib
Image retrieval is an important problem in the area of multimedia processing. This paper presents two new curvelet-based algorithms for texture retrieval which are suitable for use in constrained-memory devices. The developed algorithms are tested on three publicly available texture datasets: CUReT, Mondial-Marmi, and STex-fabric. Our experiments confirm the effectiveness of the proposed system. Furthermore, we propose a weighted version of the retrieval algorithm, which is shown to achieve promising results in the classification of seismic activities.