Liu Yang, Junjie Hu, Minghui Qiu, Chen Qu, Jianfeng Gao, W. Bruce Croft, Xiaodong Liu, Yelong Shen et al.
Intelligent personal assistant systems that are able to have multi-turn
conversations with human users are becoming increasingly popular. Most previous
research has been focused on using either retrieval-based or generation-based
methods to develop such systems. Retrieval-based methods have the advantage of
returning fluent and informative responses with great diversity. However, the
performance of the methods is limited by the size of the response repository.
On the other hand, generation-based methods can produce highly coherent
responses on any topics. But the generated responses are often generic and not
informative due to the lack of grounding knowledge. In this paper, we propose a
hybrid neural conversation model that combines the merits of both response
retrieval and generation methods. Experimental results on Twitter and
Foursquare data show that the proposed model outperforms both retrieval-based
methods and generation-based methods (including a recently proposed
knowledge-grounded neural conversation model) under both automatic evaluation
metrics and human evaluation. We hope that the findings in this study provide
new insights on how to integrate text retrieval and text generation models for
building conversation systems.
Authors' comments: Accepted as a Full Paper in CIKM 2019. 10 pages
Siddhant Arora, Andrew Yates
We consider algorithm selection in the context of ad-hoc information
retrieval. Given a query and a pair of retrieval methods, we propose a
meta-learner that predicts how to combine the methods' relevance scores into an
overall relevance score. Inspired by neural models' different properties with
regard to IR axioms, these predictions are based on features that quantify
axiom-related properties of the query and its top ranked documents. We conduct
an evaluation on TREC Web Track data and find that the meta-learner often
significantly improves over the individual methods. Finally, we conduct feature
and query weight analyses to investigate the meta-learner's behavior.
Authors' comments: Algorithm Selection and Meta-Learning in Information Retrieval
(AMIR'19) workshop at ECIR'19
Zhan Yang, Osolo Ian Raymond, WuQing Sun, Jun Long
Due to its fast retrieval and storage efficiency capabilities, hashing has
been widely used in nearest neighbor retrieval tasks. By using deep learning
based techniques, hashing can outperform non-learning based hashing technique
in many applications. However, we argue that the current deep learning based
hashing methods ignore some critical problems (e.g., the learned hash codes are
not discriminative due to the hashing methods being unable to discover rich
semantic information and the training strategy having difficulty optimizing the
discrete binary codes). In this paper, we propose a novel image hashing method,
termed as \textbf{\underline{A}}symmetric \textbf{\underline{D}}eep
\textbf{\underline{S}}emantic \textbf{\underline{Q}}uantization
(\textbf{ADSQ}). \textbf{ADSQ} is implemented using three stream frameworks,
which consist of one \emph{LabelNet} and two \emph{ImgNets}. The
\emph{LabelNet} leverages the power of three fully-connected layers, which are
used to capture rich semantic information between image pairs. For the two
\emph{ImgNets}, they each adopt the same convolutional neural network
structure, but with different weights (i.e., asymmetric convolutional neural
networks). The two \emph{ImgNets} are used to generate discriminative compact
hash codes. Specifically, the function of the \emph{LabelNet} is to capture
rich semantic information that is used to guide the two \emph{ImgNets} in
minimizing the gap between the real-continuous features and the discrete binary
codes. Furthermore, \textbf{ADSQ} can utilize the most critical semantic
information to guide the feature learning process and consider the consistency
of the common semantic space and Hamming space. Experimental results on three
benchmarks (i.e., CIFAR-10, NUS-WIDE, and ImageNet) demonstrate that the
proposed \textbf{ADSQ} can outperforms current state-of-the-art methods.
Authors' comments: Accepted to IEEE ACCESS. arXiv admin note: text overlap with
arXiv:1812.01404
Philipp Grohs, Martin Rathmair
In recent work [P. Grohs and M. Rathmair. Stable Gabor Phase Retrieval and Spectral Clustering. Communications on Pure and Applied Mathematics (2018)] the instabilities of the Gabor phase retrieval problem, i.e., the problem of reconstructing a function $f$ from its spectrogram $|\mathcal{G}f|$, where $$ \mathcal{G}f(x,y)=\int_{\mathbb{R}^d} f(t) e^{-\pi|t-x|^2} e^{-2\pi i t\cdot y} dt, \quad x,y\in \mathbb{R}^d, $$ have been completely classified in terms of the disconnectedness of the spectrogram. These findings, however, were crucially restricted to the onedimensional case ($d=1$) and therefore not relevant for many practical applications. In the present paper we not only generalize the aforementioned results to the multivariate case but also significantly improve on them. Our new results have comprehensive implications in various applications such as ptychography, a highly popular method in coherent diffraction imaging.
David A. Barmherzig, Ju Sun, T. J. Lane, Po-Nan Li, Emmanuel J. Candès
A general mathematical framework and recovery algorithm is presented for the
holographic phase retrieval problem. In this problem, which arises in
holographic coherent diffraction imaging, a "reference" portion of the signal
to be recovered via phase retrieval is a priori known from experimental design.
A generic formula is also derived for the expected recovery error when the
measurement data is corrupted by Poisson shot noise. This facilitates an
optimization perspective towards reference design and analysis. We employ this
optimization perspective towards quantifying the performance of various
reference choices.
Authors' comments: 27 pages, 10 figures
Karim Banawan, Sennur Ulukus
We consider the problem of private information retrieval (PIR) of a single message out of $K$ messages from $N$ non-colluding and non-replicated databases. Different from the majority of the existing literature, which considers the case of replicated databases where all databases store the same content in the form of all $K$ messages, here, we consider the case of non-replicated databases under a special non-replication structure where each database stores $M$ out of $K$ messages and each message is stored across $R$ different databases. This generates an $R$-regular graph structure for the storage system where the vertices of the graph are the messages and the edges are the databases. We derive a general upper bound for $M=2$ that depends on the graph structure. We then specialize the problem to storage systems described by two special types of graph structures: cyclic graphs and \emph{fully-connected graphs}. We prove that the PIR capacity for the case of cyclic graphs is $\frac{2}{K+1}$, and the PIR capacity for the case of fully-connected graphs is $\min\{\frac{2}{K},\frac{1}{2}\}$. To that end, we propose novel achievable schemes for both graph structures that are capacity-achieving. The central insight in both schemes is to introduce dependency in the queries submitted to databases that do not contain the desired message, such that the requests can be compressed. In both cases, the results show severe degradation in PIR capacity due to non-replication.
Jie Li, Rongrong Ji, Hong Liu, Xiaopeng Hong, Yue Gao, Qi Tian
Universal adversarial perturbations (UAPs), a.k.a. input-agnostic
perturbations, has been proved to exist and be able to fool cutting-edge deep
learning models on most of the data samples. Existing UAP methods mainly focus
on attacking image classification models. Nevertheless, little attention has
been paid to attacking image retrieval systems. In this paper, we make the
first attempt in attacking image retrieval systems. Concretely, image retrieval
attack is to make the retrieval system return irrelevant images to the query at
the top ranking list. It plays an important role to corrupt the neighbourhood
relationships among features in image retrieval attack. To this end, we propose
a novel method to generate retrieval-against UAP to break the neighbourhood
relationships of image features via degrading the corresponding ranking metric.
To expand the attack method to scenarios with varying input sizes or
untouchable network parameters, a multi-scale random resizing scheme and a
ranking distillation strategy are proposed. We evaluate the proposed method on
four widely-used image retrieval datasets, and report a significant performance
drop in terms of different metrics, such as mAP and mP@10. Finally, we test our
attack methods on the real-world visual search engine, i.e., Google Images,
which demonstrates the practical potentials of our methods.
Authors' comments: ICCV 2019
Daniel Gillick, Alessandro Presta, Gaurav Singh Tomar
Most text-based information retrieval (IR) systems index objects by words or phrases. These discrete systems have been augmented by models that use embeddings to measure similarity in continuous space. But continuous-space models are typically used just to re-rank the top candidates. We consider the problem of end-to-end continuous retrieval, where standard approximate nearest neighbor (ANN) search replaces the usual discrete inverted index, and rely entirely on distances between learned embeddings. By training simple models specifically for retrieval, with an appropriate model architecture, we improve on a discrete baseline by 8% and 26% (MAP) on two similar-question retrieval tasks. We also discuss the problem of evaluation for retrieval systems, and show how to modify existing pairwise similarity datasets for this purpose.
Frank Soboczenski, Michael D. Himes, Molly D. O'Beirne, Simone Zorzan, Atilim Gunes Baydin, Adam D. Cobb, Yarin Gal, Daniel Angerhausen et al.
Over the past decade, the study of extrasolar planets has evolved rapidly
from plain detection and identification to comprehensive categorization and
characterization of exoplanet systems and their atmospheres. Atmospheric
retrieval, the inverse modeling technique used to determine an exoplanetary
atmosphere's temperature structure and composition from an observed spectrum,
is both time-consuming and compute-intensive, requiring complex algorithms that
compare thousands to millions of atmospheric models to the observational data
to find the most probable values and associated uncertainties for each model
parameter. For rocky, terrestrial planets, the retrieved atmospheric
composition can give insight into the surface fluxes of gaseous species
necessary to maintain the stability of that atmosphere, which may in turn
provide insight into the geological and/or biological processes active on the
planet. These atmospheres contain many molecules, some of them biosignatures,
spectral fingerprints indicative of biological activity, which will become
observable with the next generation of telescopes. Runtimes of traditional
retrieval models scale with the number of model parameters, so as more
molecular species are considered, runtimes can become prohibitively long.
Recent advances in machine learning (ML) and computer vision offer new ways to
reduce the time to perform a retrieval by orders of magnitude, given a
sufficient data set to train with. Here we present an ML-based retrieval
framework called Intelligent exoplaNet Atmospheric RetrievAl (INARA) that
consists of a Bayesian deep learning model for retrieval and a data set of
3,000,000 synthetic rocky exoplanetary spectra generated using the NASA
Planetary Spectrum Generator. Our work represents the first ML retrieval model
for rocky, terrestrial exoplanets and the first synthetic data set of
terrestrial spectra generated at this scale.
Authors' comments: Third workshop on Bayesian Deep Learning (NeurIPS 2018), Montreal,
Canada
Julien Lavauzelle, Razane Tajeddine, Ragnar Freij-Hollanti, Camilla Hollanti
A private information retrieval (PIR) scheme allows a user to retrieve a file from a database without revealing any information on the file being requested. As of now, PIR schemes have been proposed for several kinds of storage systems, including replicated and MDS-coded data. In this paper, the problem of constructing a PIR scheme on regenerating codes is considered. A regenerating code is a storage code whose codewords are distributed among $n$ nodes, enabling efficient storage of files, as well as low-bandwidth retrieval of files and repair of nodes. In this work, a PIR scheme on regenerating codes is constructed, using the product-matrix (PM) framework of Rashmi, Shah and Kumar. Both the minimum-bandwidth (MBR) and minimum-storage (MSR) settings are considered, and the structure given by the PM framework is used in order to reduce the download communication complexity of our schemes.
Zhongdao Wang, Liang Zheng, Shengjin Wang
Feature fusion is a commonly used strategy in image retrieval tasks, which aggregates the matching responses of multiple visual features. Feasible sets of features can be either descriptors (SIFT, HSV) for an entire image or the same descriptor for different local parts (face, body). Ideally, the to-be-fused heterogeneous features are pre-assumed to be discriminative and complementary to each other. However, the effectiveness of different features varies dramatically according to different queries. That is to say, for some queries, a feature may be neither discriminative nor complementary to existing ones, while for other queries, the feature suffices. As a result, it is important to estimate the effectiveness of features in a query-adaptive manner. To this end, this article proposes a new late fusion scheme at the score level. We base our method on the observation that the sorted score curves contain patterns that describe their effectiveness. For example, an "L"-shaped curve indicates that the feature is discriminative while a gradually descending curve suggests a bad feature. As such, this paper introduces a query-adaptive late fusion pipeline. In the hand-crafted version, it can be an unsupervised approach to tasks like particular object retrieval. In the learning version, it can also be applied to supervised tasks like person recognition and pedestrian retrieval, based on a trainable neural module. Extensive experiments are conducted on two object retrieval datasets and one person recognition dataset. We show that our method is able to highlight the good features and suppress the bad ones, is resilient to distractor features, and achieves very competitive retrieval accuracy compared with the state of the art. In an additional person re-identification dataset, the application scope and limitation of the proposed method are studied.
Alvaro Henrique Chaim Correia, Jorge Luiz Moreira Silva, Thiago de Castro Martins, Fabio Gagliardi Cozman
Recurrent neural networks are now the state-of-the-art in natural language
processing because they can build rich contextual representations and process
texts of arbitrary length. However, recent developments on attention mechanisms
have equipped feedforward networks with similar capabilities, hence enabling
faster computations due to the increase in the number of operations that can be
parallelized. We explore this new type of architecture in the domain of
question-answering and propose a novel approach that we call Fully Attention
Based Information Retriever (FABIR). We show that FABIR achieves competitive
results in the Stanford Question Answering Dataset (SQuAD) while having fewer
parameters and being faster at both learning and inference than rival methods.
Authors' comments: Accepted for presentation at the International Joint Conference on
Neural Networks (IJCNN) 2018
Maxim Pisov, Gleb Makarchuk, Valery Kostjuchenko, Alexandra Dalechina, Andrey Golanov, Mikhail Belyaev
Classification-based image retrieval systems are built by training convolutional neural networks (CNNs) on a relevant classification problem and using the distance in the resulting feature space as a similarity metric. However, in practical applications, it is often desirable to have representations which take into account several aspects of the data (e.g., brain tumor type and its localization). In our work, we extend the classification-based approach with multitask learning: we train a CNN on brain MRI scans with heterogeneous labels and implement a corresponding tumor image retrieval system. We validate our approach on brain tumor data which contains information about tumor types, shapes and localization. We show that our method allows us to build representations that contain more relevant information about tumors than single-task classification-based approaches.
Meng Huang, Ming-Jun Lai, Abraham Varghese, Zhiqiang Xu
In this paper, we develop a new computational approach which is based on
minimizing the difference of two convex functionals (DC) to solve a broader
class of phase retrieval problems. The approach splits a standard nonlinear
least squares minimizing function associated with the phase retrieval problem
into the difference of two convex functions and then solves a sequence of
convex minimization sub-problems. For each subproblem, the Nesterov's
accelerated gradient descent algorithm or the Barzilai-Borwein (BB) algorithm
is used. In the setting of sparse phase retrieval, a standard $\ell_1$ norm
term is added into the minimization mentioned above. The subproblem is
approximated by a proximal gradient method which is solved by the
shrinkage-threshold technique directly without iterations. In addition, a
modified Attouch-Peypouquet technique is used to accelerate the iterative
computation. These lead to more effective algorithms than the Wirtinger flow
(WF) algorithm and the Gauss-Newton (GN) algorithm and etc.. A convergence
analysis of both DC based algorithms shows that the iterative solutions is
convergent linearly to a critical point and will be closer to a global
minimizer than the given initial starting point. Our study is a deterministic
analysis while the study for the Wirtinger flow (WF) algorithm and its
variants, the Gauss-Newton (GN) algorithm, the trust region algorithm is based
on the probability analysis. In particular, the DC based algorithms are able to
retrieve solutions using a number $m$ of measurements which is about twice of
the number $n$ of entries in the solution with high frequency of successes.
When $m\approx n$, the $\ell_1$ DC based algorithm is able to retrieve sparse
signals.
Authors' comments: 28 pages
Razane Tajeddine, Antonia Wachter-Zeh, Camilla Hollanti
In this paper, the problem of providing privacy to users requesting data over a network from a distributed storage system (DSS) is considered. The DSS, which is considered as the multi-terminal destination of the network from the user's perspective, is encoded by a maximum rank distance (MRD) code to store the data on these multiple servers. A private information retrieval (PIR) scheme ensures that a user can request a file without revealing any information on which file is being requested to any of the servers. In this paper, a novel PIR scheme is proposed, allowing the user to recover a file from a storage system with low communication cost, while allowing some servers in the system to collude in the quest of revealing the identity of the requested file. The network is modeled as a random linear network, i.e., all nodes of the network forward random (unknown) linear combinations of incoming packets. Both error-free and erroneous random linear networks are considered.
David Semedo, João Magalhães
Multimedia information have strong temporal correlations that shape the way
modalities co-occur over time. In this paper we study the dynamic nature of
multimedia and social-media information, where the temporal dimension emerges
as a strong source of evidence for learning the temporal correlations across
visual and textual modalities. So far, cross-media retrieval models, explored
the correlations between different modalities (e.g. text and image) to learn a
common subspace, in which semantically similar instances lie in the same
neighbourhood. Building on such knowledge, we propose a novel temporal
cross-media neural architecture, that departs from standard cross-media
methods, by explicitly accounting for the temporal dimension through temporal
subspace learning. The model is softly-constrained with temporal and
inter-modality constraints that guide the new subspace learning task by
favouring temporal correlations between semantically similar and temporally
close instances. Experiments on three distinct datasets show that accounting
for time turns out to be important for cross-media retrieval. Namely, the
proposed method outperforms a set of baselines on the task of temporal
cross-media retrieval, demonstrating its effectiveness for performing temporal
subspace learning.
Authors' comments: To appear in ACM MM 2018
Jianfeng Dong, Xirong Li, Chaoxi Xu, Shouling Ji, Yuan He, Gang Yang, Xun Wang
This paper attacks the challenging problem of zero-example video retrieval.
In such a retrieval paradigm, an end user searches for unlabeled videos by
ad-hoc queries described in natural language text with no visual example
provided. Given videos as sequences of frames and queries as sequences of
words, an effective sequence-to-sequence cross-modal matching is required. The
majority of existing methods are concept based, extracting relevant concepts
from queries and videos and accordingly establishing associations between the
two modalities. In contrast, this paper takes a concept-free approach,
proposing a dual deep encoding network that encodes videos and queries into
powerful dense representations of their own. Dual encoding is conceptually
simple, practically effective and end-to-end. As experiments on three
benchmarks, i.e. MSR-VTT, TRECVID 2016 and 2017 Ad-hoc Video Search show, the
proposed solution establishes a new state-of-the-art for zero-example video
retrieval.
Authors' comments: Accepted by CVPR 2019. Code and data are available at
https://github.com/danieljf24/dual_encoding
Elham Ashoori, Terry Rudolph
There have been suggestions within the Information Retrieval (IR) community that quantum mechanics (QM) can be used to help formalise the foundations of IR. The invoked connection to QM is mathematical rather than physical. The proposed ideas are concerned with information which is encoded, processed and accessed in classical computers. However, some of the suggestions have been thoroughly muddled with questions about applying techniques of quantum information theory in IR, and it is often unclear whether or not the suggestion is to perform actual quantum information processing on the information. This paper is an attempt to provide some conceptual clarity on the emerging issues.
Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Ioannis Patras, Ioannis Kompatsiaris
This paper introduces the problem of Fine-grained Incident Video Retrieval (FIVR). Given a query video, the objective is to retrieve all associated videos, considering several types of associations that range from duplicate videos to videos from the same incident. FIVR offers a single framework that contains several retrieval tasks as special cases. To address the benchmarking needs of all such tasks, we construct and present a large-scale annotated video dataset, which we call FIVR-200K, and it comprises 225,960 videos. To create the dataset, we devise a process for the collection of YouTube videos based on major news events from recent years crawled from Wikipedia and deploy a retrieval pipeline for the automatic selection of query videos based on their estimated suitability as benchmarks. We also devise a protocol for the annotation of the dataset with respect to the four types of video associations defined by FIVR. Finally, we report the results of an experimental study on the dataset comparing five state-of-the-art methods developed based on a variety of visual descriptors, highlighting the challenges of the current problem.
Alexander Barnett, Charles L. Epstein, Leslie Greengard, Jeremy Magland
One of the most powerful approaches to imaging at the nanometer or
subnanometer length scale is coherent diffraction imaging using X-ray sources.
For amorphous (non-crystalline) samples, the raw data can be interpreted as the
modulus of the continuous Fourier transform of the unknown object. Making use
of prior information about the sample (such as its support), a natural goal is
to recover the phase through computational means, after which the unknown
object can be visualized at high resolution. While many algorithms have been
proposed for this phase retrieval problem, careful analysis of its
well-posedness has received relatively little attention. In this paper, we show
that the problem is, in general, not well-posed and describe some of the
underlying issues that are responsible for the ill-posedness. We then show how
this analysis can be used to develop experimental protocols that lead to better
conditioned inverse problems.
Authors' comments: This is a substantial revision of version 1