Daniel Gillick, Sayali Kulkarni, Larry Lansing, Alessandro Presta, Jason Baldridge, Eugene Ie, Diego Garcia-Olano
We show that it is feasible to perform entity linking by training a dual
encoder (two-tower) model that encodes mentions and entities in the same dense
vector space, where candidate entities are retrieved by approximate nearest
neighbor search. Unlike prior work, this setup does not rely on an alias table
followed by a re-ranker, and is thus the first fully learned entity retrieval
model. We show that our dual encoder, trained using only anchor-text links in
Wikipedia, outperforms discrete alias table and BM25 baselines, and is
competitive with the best comparable results on the standard TACKBP-2010
dataset. In addition, it can retrieve candidates extremely fast, and
generalizes well to a new dataset derived from Wikinews. On the modeling side,
we demonstrate the dramatic value of an unsupervised negative mining algorithm
for this task.
Authors' comments: CoNLL 2019
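The retrieval step described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three-dimensional vectors are hand-picked toy values rather than outputs of trained encoders, the entity names are hypothetical, and exact brute-force search stands in for the approximate nearest neighbor search used in the paper.

```python
# Toy illustration of dual-encoder retrieval: mentions and entities live in
# the same vector space, and candidates are ranked by dot product.
# Assumption: vectors are hand-picked; a real system would use learned encoders.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def retrieve(mention_vec, entity_vecs, k):
    """Return the k entity names whose vectors score highest against the mention.

    Exact search over all entities; an approximate nearest neighbor index
    would replace this loop at scale.
    """
    ranked = sorted(entity_vecs,
                    key=lambda name: dot(mention_vec, entity_vecs[name]),
                    reverse=True)
    return ranked[:k]

# Hypothetical entity embeddings.
entity_vecs = {
    "Paris_(city)": [0.9, 0.1, 0.0],
    "Paris_Hilton": [0.1, 0.9, 0.1],
    "Paris,_Texas": [0.6, 0.0, 0.8],
}
# Hypothetical embedding of the mention "Paris" in a context about France.
mention_vec = [1.0, 0.2, 0.1]

top2 = retrieve(mention_vec, entity_vecs, k=2)
```

Because both towers map into one space, no alias table is consulted: the candidate set is whatever the nearest neighbor search returns.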
Q. Luo, H. Wang
Phase retrieval (PR) is an inverse problem about recovering a signal from
phaseless linear measurements. This problem can be effectively solved by
minimizing a nonconvex amplitude-based loss function. However, this loss
function is non-smooth. To address the non-smoothness, a series of methods have
been proposed that add truncating, reweighting and smoothing operations to
adjust the gradient or the loss function, and these achieve better performance.
However, these operations introduce extra rules and parameters that need to be
carefully designed. Unlike previous works, we present a smooth amplitude flow
method (SAF) which minimizes a novel loss function, without additionally
modifying the gradient or the loss function during gradient descent. Such a
new heuristic can be regarded as a smooth version of the original non-smooth
amplitude-based loss function. We prove that, with high probability, SAF
converges geometrically to a global optimum via the gradient algorithm with an
elaborate initialization stage. Substantial numerical tests
empirically illustrate that the proposed heuristic is significantly superior to
the original amplitude-based loss function and SAF also outperforms other
state-of-the-art methods in terms of the recovery rate and the converging
speed. In particular, it is numerically shown that SAF can stably recover the
original signal when the number of measurements is smaller than the
information-theoretic limit for both the real and the complex Gaussian models.
Authors' comments: 18 pages, 6 figures, two references added
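For orientation, the sketch below evaluates the original non-smooth amplitude-based loss in its commonly used real-valued form, (1/2m) * sum_i (|<a_i, z>| - psi_i)^2. It is not the SAF loss, which the abstract does not specify. The measurement vectors and signal are made-up toy values, and the function name is hypothetical.

```python
# Amplitude-based loss for real-valued phase retrieval (toy illustration).
# Assumption: this is the standard non-smooth loss, not the smoothed SAF loss.

def amplitude_loss(z, measurement_vectors, amplitudes):
    """Return (1 / 2m) * sum_i (|<a_i, z>| - psi_i)^2."""
    m = len(measurement_vectors)
    total = 0.0
    for a, psi in zip(measurement_vectors, amplitudes):
        inner = sum(ai * zi for ai, zi in zip(a, z))
        # abs() is what makes the loss non-smooth wherever <a_i, z> = 0.
        total += (abs(inner) - psi) ** 2
    return total / (2 * m)

# Made-up measurement vectors and ground-truth signal.
A = [[1.0, 2.0], [0.5, -1.0], [-2.0, 1.0], [3.0, 0.0]]
x = [1.0, -2.0]
# Phaseless measurements: only the magnitudes of the inner products are kept.
psi = [abs(sum(ai * xi for ai, xi in zip(a, x))) for a in A]

loss_at_x = amplitude_loss(x, A, psi)
loss_at_minus_x = amplitude_loss([-xi for xi in x], A, psi)
loss_elsewhere = amplitude_loss([1.0, 2.0], A, psi)
```

Both `x` and `-x` attain zero loss, which reflects the global sign ambiguity inherent in phaseless measurements: recovery is only possible up to that sign.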
Mohammad Rezaei, Ali Ahmadi, Navid Naderi
This paper presents a new method for extracting low-level image features, namely the mix histogram (MH), for content-based image retrieval. Since color and edge orientation are important visual cues that help the human visual system perceive and discriminate between images, the method extracts and integrates color and edge orientation information to measure the similarity between images. Traditional color histograms focus only on the global distribution of color in the image and therefore fail to capture other visual features. MH attempts to overcome this problem by extracting edge orientations as well as color features. Its distinguishing characteristic is that it takes both color and edge orientation information into account in an effective manner. Experimental results show that it outperforms many existing methods that were originally developed for image retrieval.
Hanfei Yan
We derive a set of ptychography phase-retrieval iterative engines based on proximal algorithms originally developed in convex optimization theory, and discuss their connections with existing ones. The use of proximal operators creates a simple framework that allows us to incorporate the effect of noise through a maximum-likelihood principle. We focus on three particular algorithms, namely proximal minimization, the alternating direction method of multipliers and accelerated proximal gradient, and benchmark their performance with numerical simulations and experimental x-ray data. Among them, accelerated proximal gradient shows superior performance in terms of both accuracy and convergence rate for a noisy dataset.
Danny Merkx, Stefan L. Frank, Mirjam Ernestus
Humans learn language by interaction with their environment and listening to
other humans. It should also be possible for computational models to learn
language directly from speech but so far most approaches require text. We
improve on existing neural network approaches to create visually grounded
embeddings for spoken utterances. Using a combination of a multi-layer GRU,
importance sampling, cyclic learning rates, ensembling and vectorial
self-attention, we obtain a remarkable increase in image-caption
retrieval performance over previous work. Furthermore, we investigate which
layers in the model learn to recognise words in the input. We find that deeper
network layers are better at encoding word presence, although the final layer
has slightly lower performance. This shows that our visually grounded sentence
encoder learns to recognise words from the input even though it is not
explicitly trained for word recognition.
Authors' comments: Submitted to InterSpeech 2019
Karim Banawan, Batuhan Arasli, Sennur Ulukus
We consider the problem of private information retrieval from $N$
\emph{storage-constrained} databases. In this problem, a user wishes to
retrieve a single message out of $M$ messages (of size $L$) without revealing
any information about the identity of the message to individual databases. Each
database stores $\mu ML$ symbols, i.e., a $\mu$ fraction of the entire library,
where $\frac{1}{N} \leq \mu \leq 1$. Our goal is to characterize the optimal
tradeoff curve for the storage cost (captured by $\mu$) and the normalized
download cost ($D/L$). We show that the download cost can be reduced by
employing a hybrid storage scheme that combines \emph{MDS coding} ideas with
\emph{uncoded partial replication} ideas. When there is no coding, our scheme
reduces to the Attia-Kumar-Tandon storage scheme, which was initially introduced by
Maddah-Ali-Niesen in the context of the caching problem, and when there is no
uncoded partial replication, our scheme reduces to the Banawan-Ulukus storage
scheme; in general, our scheme outperforms both.
Authors' comments: ITW 2019
Zhenghao Liu, Chenyan Xiong, Maosong Sun, Zhiyuan Liu
This paper explores the effectiveness of entity embeddings in ad-hoc entity
retrieval, introducing distributed representations of entities into entity
retrieval. A knowledge graph contains a large amount of knowledge and models
semantic relations between entities with a well-formed structural
representation. Entity embeddings learn rich semantic information from the
knowledge graph and represent entities in a low-dimensional space, which
provides an opportunity to establish interactions between query-related
entities and candidate entities for entity retrieval. Our experiments
demonstrate the effectiveness of the entity-embedding-based model, which
achieves more than 5\% improvement over the previous state-of-the-art
learning-to-rank-based entity retrieval model. Our further analysis reveals
that the entity semantic match feature is effective, especially in scenarios
that need more semantic understanding.
Authors' comments: 12 pages, 2 figures
Furkan Kınlı, Barış Özcan, Furkan Kıraç
In this study, we investigate in-shop clothing retrieval performance of
densely-connected Capsule Networks with dynamic routing. To achieve this, we
propose a Triplet-based design of the Capsule Network architecture with two different
feature extraction methods. In our design, Stacked-convolutional (SC) and
Residual-connected (RC) blocks are used to form the input of capsule layers.
Experimental results show that both of our designs outperform all variants of
the baseline study, namely FashionNet, without relying on the landmark
information. Moreover, when compared to the SOTA architectures on clothing
retrieval, our proposed Triplet Capsule Networks achieve comparable recall
rates with only half of the parameters used in the SOTA architectures.
Authors' comments: Accepted to the International Conference on Computer Vision, ICCV
2019, Workshop on Computer Vision for Fashion, Art and Design
Weitsung Lin, Tinghsuan Chao, Jianmin Wu, Tianhuang Su
As emojis are widely used in social media, people not only use an emoji to
express their emotions or mention things but also extend its usage to represent
complicated emotions, concepts or activities by combining multiple emojis. In
this work, we study how an emoji combination, a consecutive emoji sequence, is
used like a new language. We propose a novel algorithm called Retrieval
Strategy to predict which emoji combination follows, given a short text as
context. Our algorithm treats emoji combinations as phrases in a language,
ranking sets of emoji combinations like retrieving words from a dictionary. We
show that our algorithm substantially improves the F1 score from 0.141 to 0.204
on the emoji combination prediction task.
Authors' comments: 4 pages, 2 figures, published in anlp.jp 2019
Donghuo Zeng
Cross-modal retrieval uses a query in one modality to obtain
relevant data in another modality. The challenging issue of cross-modal
retrieval lies in bridging the heterogeneous gap for similarity computation,
which has been broadly discussed in image-text, audio-text, and video-text
cross-modal multimedia data mining and retrieval. However, the gap in temporal
structures of different data modalities is not well addressed due to the lack
of alignment between temporal cross-modal structures. Our research
focuses on learning the correlation between different modalities for the task
of cross-modal retrieval. We have proposed an architecture, Supervised-Deep
Canonical Correlation Analysis (S-DCCA), for cross-modal retrieval. In this
forum paper, we discuss how to exploit triplet neural networks (TNN) to
enhance the correlation learning for cross-modal retrieval. The experimental
results show that the proposed TNN-based supervised correlation learning
architecture achieves the best result when the data representation is extracted
by supervised learning.
Authors' comments: 3 pages, 1 figure, Submitted to ICDM2019 Ph.D. Forum session
Stanislav Morozov, Artem Babenko
In many machine learning applications, the most relevant items for a particular query should be efficiently extracted, while the relevance function is based on a highly nonlinear model, e.g., DNNs or GBDTs. Due to the high computational complexity of such models, exhaustive search is infeasible even for medium-scale problems. To address this issue, we introduce Relevance Proximity Graphs (RPG): an efficient non-exhaustive approach that provides a high-quality approximate solution for maximal relevance retrieval. Namely, we extend the recent similarity graphs framework to the setting where no similarity measure is defined on item pairs, which is a common practical use case. By design, our approach directly maximizes off-the-shelf relevance functions and does not require any proxy auxiliary models. Via extensive experiments, we show that the developed method provides excellent retrieval accuracy while requiring only a few model computations, outperforming indirect models. We open-source our implementation as well as two large-scale datasets to support further research on relevance retrieval.
Rodrigo Nogueira
A goal shared by artificial intelligence and information retrieval is to create an oracle, that is, a machine that can answer our questions, no matter how difficult they are. A more limited, but still instrumental, version of this oracle is a question-answering system, in which an open-ended question is given to the machine, and an answer is produced based on the knowledge it has access to. Such systems already exist and are increasingly capable of answering complicated questions. This progress can be partially attributed to the recent success of machine learning and to the efficient methods for storing and retrieving information, most notably through web search engines. One can imagine that this general-purpose question-answering system can be built as a billion-parameter neural network trained end-to-end with a large number of pairs of questions and answers. We argue, however, that although this approach has been very successful for tasks such as machine translation, storing the world's knowledge as parameters of a learning machine can be very hard. A more efficient way is to train an artificial agent on how to use an external retrieval system to collect relevant information. This agent can leverage the effort that has been put into designing and running efficient storage and retrieval systems by learning how to best utilize them to accomplish a task. ...
Felix Hamann, Nadja Kurz, Adrian Ulges
In retrieval applications, binary hashes are known to offer significant
improvements in terms of both memory and speed. We investigate the compression
of sentence embeddings using a neural encoder-decoder architecture, which is
trained by minimizing reconstruction error. Instead of employing the original
real-valued embeddings, we use latent representations in Hamming space produced
by the encoder for similarity calculations.
In quantitative experiments on several benchmarks for semantic similarity
tasks, we show that our compressed Hamming embeddings yield performance
comparable to uncompressed embeddings (Sent2Vec, InferSent, Glove-BoW), at
compression ratios of up to 256:1. We further demonstrate that our model
strongly decorrelates input features, and that the compressor generalizes well
when pre-trained on Wikipedia sentences. We publish the source code on GitHub,
together with all experimental results.
Authors' comments: 4 Pages, 9 Figures, 1 Table
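The similarity calculation in Hamming space mentioned in this abstract can be sketched as follows. One assumption is made loudly: binary codes are obtained here by thresholding each coordinate at zero, which is only a stand-in for the learned neural encoder of the paper. The function names and vectors are hypothetical.

```python
# Similarity in Hamming space on packed binary codes (toy illustration).
# Assumption: sign thresholding replaces the paper's learned encoder.

def binarize(vec):
    """Pack the sign pattern of a real vector into an int (bit i set if vec[i] > 0)."""
    code = 0
    for i, value in enumerate(vec):
        if value > 0:
            code |= 1 << i
    return code

def hamming_distance(code_a, code_b):
    """Number of bit positions where the two codes differ (XOR, then popcount)."""
    return bin(code_a ^ code_b).count("1")

def hamming_similarity(code_a, code_b, n_bits):
    """Fraction of matching bits, in [0, 1]."""
    return 1.0 - hamming_distance(code_a, code_b) / n_bits

# Made-up 4-dimensional "sentence embeddings".
query = binarize([0.3, -1.2, 0.8, 0.1])
doc_near = binarize([0.5, -0.7, 0.9, -0.2])
doc_far = binarize([-0.4, 0.6, -0.1, 0.3])

sim_near = hamming_similarity(query, doc_near, n_bits=4)
sim_far = hamming_similarity(query, doc_far, n_bits=4)
```

Each comparison reduces to one XOR and a bit count on an integer, which is the source of the memory and speed advantages of binary hashes noted at the start of the abstract.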
Rahaf Aljundi, Lucas Caccia, Eugene Belilovsky, Massimo Caccia, Min Lin, Laurent Charlin, Tinne Tuytelaars
Continual learning, the setting where a learning agent is faced with a never-ending stream of data, continues to be a great challenge for modern machine learning systems. In particular, the online or "single-pass through the data" setting has gained attention recently as a natural setting that is difficult to tackle. Methods based on replay, either generative or from a stored memory, have been shown to be effective approaches for continual learning, matching or exceeding the state of the art in a number of standard benchmarks. These approaches typically rely on randomly selecting samples from the replay memory or from a generative model, which is suboptimal. In this work, we consider a controlled sampling of memories for replay. We retrieve the samples which are most interfered, i.e. whose prediction will be most negatively impacted by the foreseen parameter update. We show a formulation for this sampling criterion in both the generative replay and the experience replay setting, producing consistent gains in performance and greatly reduced forgetting. We release an implementation of our method at https://github.com/optimass/Maximally_Interfered_Retrieval.
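The sampling criterion for the experience replay setting can be illustrated with a small sketch: take a virtual gradient step on the incoming batch, then pick the stored samples whose loss would increase the most. This is a toy reading of the criterion, not the released implementation; the linear model, squared loss, learning rate, data, and function names are all assumptions chosen to keep the example short.

```python
# Toy sketch of retrieving maximally interfered memory samples.
# Assumptions: linear model, squared loss, one plain gradient step.

def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def loss(w, x, y):
    return 0.5 * (predict(w, x) - y) ** 2

def virtual_update(w, batch, lr):
    """Return the weights after one gradient step on the incoming batch."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = predict(w, x) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj / len(batch)
    return [wj - lr * gj for wj, gj in zip(w, grad)]

def most_interfered(w, batch, memory, k, lr=0.5):
    """Indices of the k memory samples whose loss rises most under the virtual step."""
    w_virtual = virtual_update(w, batch, lr)
    increase = [loss(w_virtual, x, y) - loss(w, x, y) for x, y in memory]
    order = sorted(range(len(memory)), key=lambda i: increase[i], reverse=True)
    return order[:k]

w = [1.0, 0.0]
incoming = [([0.0, 1.0], 2.0)]
memory = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0), ([1.0, 2.0], 1.0)]
replay_indices = most_interfered(w, incoming, memory, k=2)
```

The current weights are never modified while scoring; the virtual step is used only to estimate which memories the real update would harm, and those are the ones replayed alongside the incoming batch.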
Darío Garigliotti, Dyaa Albakour, Miguel Martinez, Krisztian Balog
Monitoring entities in media streams often relies on rich entity
representations, like structured information available in a knowledge base
(KB). For long-tail entities, such monitoring is highly challenging, due to
their limited, if not entirely missing, representation in the reference KB. In
this paper, we address the problem of retrieving textual contexts for
monitoring long-tail entities. We propose an unsupervised method to overcome
the limited representation of long-tail entities by leveraging established
entities and their contexts as support information. Evaluation on a
purpose-built test collection shows the suitability of our approach and its
robustness for out-of-KB entities.
Authors' comments: Proceedings of the 2019 ACM International Conference on Theory of
Information Retrieval (ICTIR' 19)
Hossein S. Aghamiry, Ali Gholami, Stéphane Operto
The extended formulation of Full Waveform Inversion (FWI), called Wavefield Reconstruction Inversion (WRI), offers potential benefits of decreasing the nonlinearity of the inverse problem by replacing the explicit inverse of the ill-conditioned wave-equation operator of classical FWI (the oscillating Green functions) with a suitably defined data-driven regularized inverse. This regularization relaxes the wave-equation constraint to reconstruct wavefields that match the data, hence mitigating the risk of cycle skipping. The subsurface model parameters are then updated in a direction that reduces these constraint violations. However, in the case of a rough initial model, the phase errors in the reconstructed wavefields may trap the waveform inversion in a local minimum leading to inaccurate subsurface models. In this paper, in order to avoid matching such incorrect phase information during the early WRI iterations, we design a new cost function based upon phase retrieval, that is, a process which seeks to reconstruct a signal from the amplitude of linear measurements. This new formulation, called Wavefield Inversion with Phase Retrieval (WIPR), further improves the robustness of the parameter estimation subproblem by a suitable phase correction. We implement the resulting WIPR problem with an alternating-direction approach, which combines the Majorization-Minimization (MM) algorithm to linearise the phase-retrieval term and a variable splitting technique based upon the alternating direction method of multipliers (ADMM). This new workflow equipped with Tikhonov-total variation (TT) regularization, which is the combination of second-order Tikhonov and total variation regularizations and bound constraints, successfully reconstructs the 2004 BP salt model from a sparse fixed-spread acquisition using a 3~Hz starting frequency and a homogeneous initial velocity model.
Mehdi Amara, Christine Opagiste, Rose-Marie Galera
The reported temperature variations of CeB6's magnetic entropy are
inconsistent with the fourfold degeneracy of the crystal field ground state.
This old question is here addressed through new specific heat measurements and
an improved description, in the cage context, of both the phonon and crystal
field contributions to the specific heat. The antiferromagnetic transition is
characterized as first-order and its latent heat determined. From the phonon
dispersion for a cage compound, the lattice specific heat contribution is
derived from the LaB6 data. Once corrected for the first-order transition and
lattice contributions, the magnetic entropy displays the characteristic plateau
of the quadruplet crystal field ground state, but at temperatures in excess of
30 K. Below 30 K, as the ordering temperature is approached, the magnetic
entropy is substantially reduced. This anomalous temperature dependence is
consistent with a crystal field ground state split by the rare-earth movement,
a phenomenon specific to rare-earth cage compounds.
Authors' comments: 11 double column pages, 9 figures, latex for PRB
Yinfei Yang, Daniel Cer, Amin Ahmad, Mandy Guo, Jax Law, Noah Constant, Gustavo Hernandez Abrego, Steve Yuan et al.
We introduce two pre-trained retrieval-focused multilingual sentence encoding
models, respectively based on the Transformer and CNN model architectures. The
models embed text from 16 languages into a single semantic space using a
multi-task trained dual-encoder that learns tied representations using
translation-based bridge tasks (Chidambaram et al., 2018). The models provide
performance that is competitive with the state-of-the-art on: semantic
retrieval (SR), translation pair bitext retrieval (BR) and retrieval question
answering (ReQA). On English transfer learning tasks, our sentence-level
embeddings approach, and in some cases exceed, the performance of monolingual,
English only, sentence embedding models. Our models are made available for
download on TensorFlow Hub.
Authors' comments: 6 pages, 6 tables, 2 listings, and 1 figure
Cristian Rusu
In this note, we discuss the shift retrieval problems, both classical and
compressed, and provide connections between them using circulant matrices. We
review the properties of circulant matrices necessary for our calculations and
then show how shifts can be recovered from a single measurement.
Authors' comments: arXiv admin note: substantial text overlap with arXiv:1812.01115
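As a concrete picture of the classical shift retrieval problem, the sketch below recovers a circular shift by maximizing the circular cross-correlation between a signal and its shifted copy. This is a standard approach shown under stated assumptions, not necessarily the procedure of the note: the signal is noise-free, it is not periodic, and the function names are hypothetical. The link to circulant matrices is that the full vector of correlation values equals the product of a circulant matrix built from `x` with `y`.

```python
# Classical shift retrieval via circular cross-correlation (toy illustration).
# Assumptions: noise-free observation, non-periodic signal.

def circular_shift(x, s):
    """Return y with y[j] = x[(j - s) mod n], i.e. x rotated right by s."""
    n = len(x)
    return [x[(j - s) % n] for j in range(n)]

def recover_shift(x, y):
    """Return the shift k that maximizes sum_i x[i] * y[(i + k) mod n]."""
    n = len(x)

    def correlation(k):
        return sum(x[i] * y[(i + k) % n] for i in range(n))

    return max(range(n), key=correlation)

x = [1.0, 3.0, -2.0, 0.5, 4.0, -1.0]
y = circular_shift(x, 2)
estimated_shift = recover_shift(x, y)
```

At the true shift the correlation equals the squared norm of `x`, which by the Cauchy-Schwarz inequality is the largest possible value. This direct version costs O(n^2) operations; since circulant matrices are diagonalized by the discrete Fourier transform, the same correlations can be computed in O(n log n) with an FFT.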
Felix Krahmer, Dominik Stöger
Phase retrieval refers to the problem of reconstructing an unknown vector
$x_0 \in \mathbb{C}^n$ or $x_0 \in \mathbb{R}^n $ from $m$ measurements of the
form $y_i = \big\vert \langle \xi^{\left(i\right)}, x_0 \rangle \big\vert^2 $,
where $ \left\{ \xi^{\left(i\right)} \right\}^m_{i=1} \subset \mathbb{C}^n $
are known measurement vectors. While Gaussian measurements allow for recovery
of arbitrary signals provided the number of measurements scales at least
linearly in the number of dimensions, it has been shown that ambiguities may
arise for certain other classes of measurements $ \left\{ \xi^{\left(i\right)}
\right\}^{m}_{i=1}$ such as Bernoulli measurements or Fourier measurements. In
this paper, we will prove that even when a subgaussian vector $
\xi^{\left(i\right)} \in \mathbb{C}^n $ does not fulfill a small-ball
probability assumption, the PhaseLift method is still able to reconstruct a
large class of signals $x_0 \in \mathbb{R}^n$ from the measurements. This
extends recent work by Krahmer and Liu from the real-valued to the
complex-valued case. However, our proof strategy is quite different and we
expect some of the new proof ideas to be useful in several other measurement
scenarios as well. We then extend our results to $x_0 \in \mathbb{C}^n $ up to an
additional assumption which, as we show, is necessary.
Authors' comments: 25 pages