Grace Hui Yang
This article presents a summary graph to show the relationships between Information Retrieval (IR) and other related disciplines. The figure tells the key differences between them and the conditions under which one would transition into another.
Bing Gao, Haixia Liu, Yang Wang
Generally, phase retrieval problem can be viewed as the reconstruction of a
function/signal from only the magnitude of the linear measurements. These
measurements can be, for example, the Fourier transform of the density
function. Computationally the phase retrieval problem is very challenging. Many
algorithms for phase retrieval are based on i.i.d. Gaussian random
measurements. However, Gaussian random measurements remain one of the very few
classes of measurements. In this paper, we develop an efficient phase retrieval
algorithm for sub-gaussian random frames. We provide a general condition for
measurements and develop a modified spectral initialization. In the algorithm,
we first obtain a good approximation of the solution through the
initialization, and from there we useWirtinger Flow to solve for the solution.
We prove that the algorithm converges to the global minimizer linearly.
Authors' comments: 20 pages, 2 figures
Tatiana Latychevskaia
This paper provides a tutorial of iterative phase retrieval algorithms based on the Gerchberg-Saxton (GS) algorithm applied in digital holography. In addition, a novel GS-based algorithm that allows reconstruction of 3D samples is demonstrated. The GS-based algorithms recover complex-valued wavefront by wavefront back-and forth propagation between two planes with constraints superimposed in these two planes. Iterative phase retrieval allows quantitatively correct and twin-image-free reconstructions of object amplitude and phase distributions from its in-line hologram. The present work derives the quantitative criteria on how many holograms are required to reconstruct a complex-valued object distribution, be it a 2D or 3D sample. It is shown that for a sample that can be approximated as a 2D sample, a single-shot in-line hologram is sufficient to reconstruct the absorption and phase distributions of the sample. Previously, the GS-based algorithms have been successfully employed to reconstruct samples that are limited to a 2D plane. However, realistic physical objects always have some finite thickness and therefore are 3D rather than 2D objects. This study demonstrates that 3D samples, including 3D phase objects, can be reconstructed from two or more holograms. It is shown that in principle, two holograms are sufficient to recover the entire wavefront diffracted by a 3D sample distribution. In this method, the reconstruction is performed by applying iterative phase retrieval between the planes where intensity was measured. The recovered complex-valued wavefront is then propagated back to the sample planes, thus reconstructing the 3D distribution of the sample. This method can be applied for 3D samples such as 3D distribution of particles, thick biological samples, and other 3D phase objects. Examples of reconstructions of 3D objects, including phase objects, are provided.
Peng Shi, Jimmy Lin
Recent work has shown the surprising ability of multi-lingual BERT to serve as a zero-shot cross-lingual transfer model for a number of language processing tasks. We combine this finding with a similarly-recently proposal on sentence-level relevance modeling for document retrieval to demonstrate the ability of multi-lingual BERT to transfer models of relevance across languages. Experiments on test collections in five different languages from diverse language families (Chinese, Arabic, French, Hindi, and Bengali) show that models trained with English data improve ranking quality, without any special processing, both for (non-English) mono-lingual retrieval as well as cross-lingual retrieval.
Amir Vakili Tahami, Azadeh Shakery
Work on retrieval-based chatbots, like most sequence pair matching tasks, can
be divided into Cross-encoders that perform word matching over the pair, and
Bi-encoders that encode the pair separately. The latter has better performance,
however since candidate responses cannot be encoded offline, it is also much
slower. Lately, multi-layer transformer architectures pre-trained as language
models have been used to great effect on a variety of natural language
processing and information retrieval tasks. Recent work has shown that these
language models can be used in text-matching scenarios to create Bi-encoders
that perform almost as well as Cross-encoders while having a much faster
inference speed. In this paper, we expand upon this work by developing a
sequence matching architecture that %takes into account contexts in the
training dataset at inference time. utilizes the entire training set as a
makeshift knowledge-base during inference. We perform detailed experiments
demonstrating that this architecture can be used to further improve Bi-encoders
performance while still maintaining a relatively high inference speed.
Authors' comments: 8 pages, 1 figure, 3 tables
Shishi Qiao, Ruiping Wang, Shiguang Shan, Xilin Chen
Retrieving videos of a particular person with face image as a query via
hashing technique has many important applications. While face images are
typically represented as vectors in Euclidean space, characterizing face videos
with some robust set modeling techniques (e.g. covariance matrices as exploited
in this study, which reside on Riemannian manifold), has recently shown
appealing advantages. This hence results in a thorny heterogeneous spaces
matching problem. Moreover, hashing with handcrafted features as done in many
existing works is clearly inadequate to achieve desirable performance for this
task. To address such problems, we present an end-to-end Deep Heterogeneous
Hashing (DHH) method that integrates three stages including image feature
learning, video modeling, and heterogeneous hashing in a single framework, to
learn unified binary codes for both face images and videos. To tackle the key
challenge of hashing on the manifold, a well-studied Riemannian kernel mapping
is employed to project data (i.e. covariance matrices) into Euclidean space and
thus enables to embed the two heterogeneous representations into a common
Hamming space, where both intra-space discriminability and inter-space
compatibility are considered. To perform network optimization, the gradient of
the kernel mapping is innovatively derived via structured matrix
backpropagation in a theoretically principled way. Experiments on three
challenging datasets show that our method achieves quite competitive
performance compared with existing hashing methods.
Authors' comments: 14 pages, 17 figures, 4 tables, accepted by IEEE Transactions on
Image Processing (TIP) 2019
Chun-Kit Lai, Friedrich Littmann, Eric Weber
We consider the problem of conjugate phase retrieval in Paley-Wiener space
$PW_{\pi}$. The goal of conjugate phase retrieval is to recover a signal $f$
from the magnitudes of linear measurements up to unknown phase factor and
unknown conjugate, meaning $f(t)$ and $\overline{f(t)}$ are not necessarily
distinguishable from the available data. We show that conjugate phase retrieval
can be accomplished in $PW_{\pi}$ by sampling only on the real line by using
structured convolutions. We also show that conjugate phase retrieval can be
accomplished in $PW_{\pi}$ by sampling both $f$ and $f^{\prime}$ only on the
real line. Moreover, we demonstrate experimentally that the Gerchberg-Saxton
method of alternating projections can accomplish the reconstruction from
vectors that do conjugate phase retrieval in finite dimensional spaces.
Finally, we show that generically, conjugate phase retrieval can be
accomplished by sampling at three times the Nyquist rate, whereas phase
retrieval requires sampling at four times the Nyquist rate.
Authors' comments: 5 color figures
Ji Li, Hongkai Zhao
Phase retrieval with prior information can be cast as a nonsmooth and nonconvex optimization problem. We solve the problem by graph projection splitting (GPS), where the two proximity subproblems and the graph projection step can be solved efficiently. With slight modification, we also propose a robust graph projection splitting (RGPS) method to stabilize the iteration for noisy measurements. Contrary to intuition, RGPS outperforms GPS with fewer iterations to locate a satisfying solution even for noiseless case. Based on the connection between GPS and Douglas-Rachford iteration, under mild conditions on the sampling vectors, we analyze the fixed point sets and provide the local convergence of GPS and RGPS applied to noiseless phase retrieval without prior information. For noisy case, we provide the error bound of the reconstruction. Compared to other existing methods, thanks for the splitting approach, GPS and RGPS can efficiently solve phase retrieval with prior information regularization for general sampling vectors which are not necessarily isometric. For Gaussian phase retrieval, compared to existing gradient flow approaches, numerical results show that GPS and RGPS are much less sensitive to the initialization. Thus they markedly improve the phase transition in noiseless case and reconstruction in the presence of noise respectively. GPS shows sharpest phase transition among existing methods including RGPS, while it needs more iterations than RGPS when the number of measurement is large enough. RGPS outperforms GPS in terms of stability for noisy measurements. When applying RGPS to more general non-Gaussian measurements with prior information, such as support, sparsity and TV minimization, RGPS either outperforms state-of-the-art solvers or can be combined with state-of-the-art solvers to improve their reconstruction quality.
Deepanwita Datta
Studying human behaviour through lifelogging has seen an increase in attention from researchers over the past decade. The opportunities that lifelogging offers are based on the fact that a lifelog, as a "black box" of our lives, offers rich contextual information, which has been an Achilles heel of information discovery. While lifelog data has been put to use in various contexts, its application to indoor environment scenario remains unexplored. In this proposal, I plan to design a method that enables us to capture and record indoor lifelog data of a person's life in order to facilitate healthcare systems, emergency response, item tracking etc. To this end, we aim to build an Indoor Information Retrieval system that can be queried with natural language queries over lifelog data. Judicious use of the lifelog data for the indoor application may enable us to solve very fundamental but non-avoidable problems of our daily life. Analysis of lifelog data coupled with Information Retrieval is not only a promising research topic, but the possibility of its indoor application especially for healthcare, lost-item tracking would be an innovative research idea to the best of our knowledge.
Amir Soleimani, Christof Monz, Marcel Worring
Motivated by the promising performance of pre-trained language models, we investigate BERT in an evidence retrieval and claim verification pipeline for the FEVER fact extraction and verification challenge. To this end, we propose to use two BERT models, one for retrieving potential evidence sentences supporting or rejecting claims, and another for verifying claims based on the predicted evidence sets. To train the BERT retrieval system, we use pointwise and pairwise loss functions, and examine the effect of hard negative mining. A second BERT model is trained to classify the samples as supported, refuted, and not enough information. Our system achieves a new state of the art recall of 87.1 for retrieving top five sentences out of the FEVER documents consisting of 50K Wikipedia pages, and scores second in the official leaderboard with the FEVER score of 69.7.
Raunak Kumar, Paul Liu, Moses Charikar, Austin R. Benson
Pattern counting in graphs is a fundamental primitive for many network analysis tasks, and a number of methods have been developed for scaling subgraph counting to large graphs. Many real-world networks carry a natural notion of strength of connection between nodes, which are often modeled by a weighted graph, but existing scalable graph algorithms for pattern mining are designed for unweighted graphs. Here, we develop a suite of deterministic and random sampling algorithms that enable the fast discovery of the 3-cliques (triangles) with the largest weight in a graph, where weight is measured by a generalized mean of a triangle's edges. For example, one of our proposed algorithms can find the top-1000 weighted triangles of a weighted graph with billions of edges in thirty seconds on a commodity server, which is orders of magnitude faster than existing "fast" enumeration schemes. Our methods thus open the door towards scalable pattern mining in weighted graphs.
Ana Valeria Gonzalez, Isabelle Augenstein, Anders Søgaard
Most research on dialogue has focused either on dialogue generation for openended chit chat or on state tracking for goal-directed dialogue. In this work, we explore a hybrid approach to goal-oriented dialogue generation that combines retrieval from past history with a hierarchical, neural encoder-decoder architecture. We evaluate this approach in the customer support domain using the Multiwoz dataset (Budzianowski et al., 2018). We show that adding this retrieval step to a hierarchical, neural encoder-decoder architecture leads to significant improvements, including responses that are rated more appropriate and fluent by human evaluators. Finally, we compare our retrieval-based model to various semantically conditioned models explicitly using past dialog act information, and find that our proposed model is competitive with the current state of the art (Chen et al., 2019), while not requiring explicit labels about past machine acts.
Daniel Gillick, Sayali Kulkarni, Larry Lansing, Alessandro Presta, Jason Baldridge, Eugene Ie, Diego Garcia-Olano
We show that it is feasible to perform entity linking by training a dual
encoder (two-tower) model that encodes mentions and entities in the same dense
vector space, where candidate entities are retrieved by approximate nearest
neighbor search. Unlike prior work, this setup does not rely on an alias table
followed by a re-ranker, and is thus the first fully learned entity retrieval
model. We show that our dual encoder, trained using only anchor-text links in
Wikipedia, outperforms discrete alias table and BM25 baselines, and is
competitive with the best comparable results on the standard TACKBP-2010
dataset. In addition, it can retrieve candidates extremely fast, and
generalizes well to a new dataset derived from Wikinews. On the modeling side,
we demonstrate the dramatic value of an unsupervised negative mining algorithm
for this task.
Authors' comments: CoNLL 2019
Q. Luo, H. Wang
Phase retrieval (PR) is an inverse problem about recovering a signal from
phaseless linear measurements. This problem can be effectively solved by
minimizing a nonconvex amplitude-based loss function. However, this loss
function is non-smooth. To address the non-smoothness, a series of methods have
been proposed by adding truncating, reweighting and smoothing operations to
adjust the gradient or the loss function and achieved better performance. But
these operations bring about extra rules and parameters that need to be
carefully designed. Unlike previous works, we present a smooth amplitude flow
method (SAF) which minimizes a novel loss function, without additionally
modifying the gradient or the loss function during gradient descending. Such a
new heuristic can be regarded as a smooth version of the original non-smooth
amplitude-based loss function. We prove that SAF can converge geometrically to
a global optimal point via the gradient algorithm with an elaborate
initialization stage with a high probability. Substantial numerical tests
empirically illustrate that the proposed heuristic is significantly superior to
the original amplitude-based loss function and SAF also outperforms other
state-of-the-art methods in terms of the recovery rate and the converging
speed. Specially, it is numerically shown that SAF can stably recover the
original signal when number of measurements is smaller than the
information-theoretic limit for both the real and the complex Gaussian models.
Authors' comments: 18 pages, 6 figures, two referrences added
Mohammad Rezaei, Ali Ahmadi, Navid Naderi
This paper presents a new method to extract image low-level features, namely mix histogram (MH), for content-based image retrieval. Since color and edge orientation features are important visual information which help the human visual system percept and discriminate different images, this method extracts and integrates color and edge orientation information in order to measure similarity between different images. Traditional color histograms merely focus on the global distribution of color in the image and therefore fail to extract other visual features. The MH is attempting to overcome this problem by extracting edge orientations as well as color feature. The unique characteristic of the MH is that it takes into consideration both color and edge orientation information in an effective manner. Experimental results show that it outperforms many existing methods which were originally developed for image retrieval purposes.
Hanfei Yan
We derive a set of ptychography phase-retrieval iterative engines based on proximal algorithms originally developed in convex optimization theory, and discuss their connections with existing ones. The use of proximal operator creates a simple frame work that allows us to incorporate the effect of noise from a maximum-likelihood principle. We focus on three particular algorithms, namely proximal minimization, alternating direction method of multiplier and accelerated proximal gradient, and benckmark their performance with numerical simulations and experimental x-ray data. Among them, accelerated proximal gradient shows superior performance in terms of both accuracy and convergence rate for a noisy dataset.
Danny Merkx, Stefan L. Frank, Mirjam Ernestus
Humans learn language by interaction with their environment and listening to
other humans. It should also be possible for computational models to learn
language directly from speech but so far most approaches require text. We
improve on existing neural network approaches to create visually grounded
embeddings for spoken utterances. Using a combination of a multi-layer GRU,
importance sampling, cyclic learning rates, ensembling and vectorial
self-attention our results show a remarkable increase in image-caption
retrieval performance over previous work. Furthermore, we investigate which
layers in the model learn to recognise words in the input. We find that deeper
network layers are better at encoding word presence, although the final layer
has slightly lower performance. This shows that our visually grounded sentence
encoder learns to recognise words from the input even though it is not
explicitly trained for word recognition.
Authors' comments: Submitted to InterSpeech 2019
Karim Banawan, Batuhan Arasli, Sennur Ulukus
We consider the problem of private information retrieval from $N$
\emph{storage-constrained} databases. In this problem, a user wishes to
retrieve a single message out of $M$ messages (of size $L$) without revealing
any information about the identity of the message to individual databases. Each
database stores $\mu ML$ symbols, i.e., a $\mu$ fraction of the entire library,
where $\frac{1}{N} \leq \mu \leq 1$. Our goal is to characterize the optimal
tradeoff curve for the storage cost (captured by $\mu$) and the normalized
download cost ($D/L$). We show that the download cost can be reduced by
employing a hybrid storage scheme that combines \emph{MDS coding} ideas with
\emph{uncoded partial replication} ideas. When there is no coding, our scheme
reduces to Attia-Kumar-Tandon storage scheme, which was initially introduced by
Maddah-Ali-Niesen in the context of the caching problem, and when there is no
uncoded partial replication, our scheme reduces to Banawan-Ulukus storage
scheme; in general, our scheme outperforms both.
Authors' comments: ITW 2019
Zhenghao Liu, Chenyan Xiong, Maosong Sun, Zhiyuan Liu
This paper explores entity embedding effectiveness in ad-hoc entity
retrieval, which introduces distributed representation of entities into entity
retrieval. The knowledge graph contains lots of knowledge and models entity
semantic relations with the well-formed structural representation. Entity
embedding learns lots of semantic information from the knowledge graph and
represents entities with a low-dimensional representation, which provides an
opportunity to establish interactions between query related entities and
candidate entities for entity retrieval. Our experiments demonstrate the
effectiveness of entity embedding based model, which achieves more than 5\%
improvement than the previous state-of-the-art learning to rank based entity
retrieval model. Our further analysis reveals that the entity semantic match
feature effective, especially for the scenario which needs more semantic
understanding.
Authors' comments: 12 pages, 2 figures
Furkan Kınlı, Barış Özcan, Furkan Kıraç
In this study, we investigate in-shop clothing retrieval performance of
densely-connected Capsule Networks with dynamic routing. To achieve this, we
propose Triplet-based design of Capsule Network architecture with two different
feature extraction methods. In our design, Stacked-convolutional (SC) and
Residual-connected (RC) blocks are used to form the input of capsule layers.
Experimental results show that both of our designs outperform all variants of
the baseline study, namely FashionNet, without relying on the landmark
information. Moreover, when compared to the SOTA architectures on clothing
retrieval, our proposed Triplet Capsule Networks achieve comparable recall
rates only with half of parameters used in the SOTA architectures.
Authors' comments: Accepted to the International Conference on Computer Vision, ICCV
2019, Workshop on Computer Vision for Fashion, Art and Design