Kaiwen Cai, Chris Xiaoxuan Lu, Xingyu Zhao, Xiaowei Huang
Most image retrieval research prioritizes improving predictive performance, often overlooking situations where the reliability of predictions is equally important. The gap between model performance and reliability requirements highlights the need for a systematic approach to analyze and address the risks associated with image retrieval. Uncertainty quantification technique can be applied to mitigate this issue by assessing uncertainty for retrieval sets, but it provides only a heuristic estimate of uncertainty rather than a guarantee. To address these limitations, we present Risk Controlled Image Retrieval (RCIR), which generates retrieval sets with coverage guarantee, i.e., retrieval sets that are guaranteed to contain the true nearest neighbors with a predefined probability. RCIR can be easily integrated with existing uncertainty-aware image retrieval systems, agnostic to data distribution and model selection. To the best of our knowledge, this is the first work that provides coverage guarantees to image retrieval. The validity and efficiency of RCIR are demonstrated on four real-world datasets: CAR-196, CUB-200, Pittsburgh, and ChestX-Det.
Sajani Vithana, Sennur Ulukus
We introduce the problem of deceptive information retrieval (DIR), in which a user wishes to download a required file out of multiple independent files stored in a system of databases while \emph{deceiving} the databases by making the databases' predictions on the user-required file index incorrect with high probability. Conceptually, DIR is an extension of private information retrieval (PIR). In PIR, a user downloads a required file without revealing its index to any of the databases. The metric of deception is defined as the probability of error of databases' prediction on the user-required file, minus the corresponding probability of error in PIR. The problem is defined on time-sensitive data that keeps updating from time to time. In the proposed scheme, the user deceives the databases by sending \emph{real} queries to download the required file at the time of the requirement and \emph{dummy} queries at multiple distinct future time instances to manipulate the probabilities of sending each query for each file requirement, using which the databases' make the predictions on the user-required file index. The proposed DIR scheme is based on a capacity achieving probabilistic PIR scheme, and achieves rates lower than the PIR capacity due to the additional downloads made to deceive the databases. When the required level of deception is zero, the proposed scheme achieves the PIR capacity.
Maksim Zhdanov, Ivan Karpukhin
The concepts of overfitting and generalization are vital for evaluating
machine learning models. In this work, we show that the popular Recall@K metric
depends on the number of classes in the dataset, which limits its ability to
estimate generalization. To fix this issue, we propose a new metric, which
measures retrieval performance, and, unlike Recall@K, estimates generalization.
We apply the proposed metric to popular image retrieval methods and provide new
insights about deep metric learning generalization.
Authors' comments: 4 pages, 3 figures, 2 tables
Zhengbao Jiang, Frank F. Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan et al.
Despite the remarkable ability of large language models (LMs) to comprehend
and generate language, they have a tendency to hallucinate and create factually
inaccurate output. Augmenting LMs by retrieving information from external
knowledge resources is one promising solution. Most existing retrieval
augmented LMs employ a retrieve-and-generate setup that only retrieves
information once based on the input. This is limiting, however, in more general
scenarios involving generation of long texts, where continually gathering
information throughout generation is essential. In this work, we provide a
generalized view of active retrieval augmented generation, methods that
actively decide when and what to retrieve across the course of the generation.
We propose Forward-Looking Active REtrieval augmented generation (FLARE), a
generic method which iteratively uses a prediction of the upcoming sentence to
anticipate future content, which is then utilized as a query to retrieve
relevant documents to regenerate the sentence if it contains low-confidence
tokens. We test FLARE along with baselines comprehensively over 4 long-form
knowledge-intensive generation tasks/datasets. FLARE achieves superior or
competitive performance on all tasks, demonstrating the effectiveness of our
method. Code and datasets are available at https://github.com/jzbjyb/FLARE.
Authors' comments: EMNLP 2023
Zeyu Wang, Yu Wu
Retrieving target information based on input query is of fundamental importance in many real-world applications. In practice, it is not uncommon for the initial search to fail, where additional feedback information is needed to guide the searching process. In this work, we study a setting where the feedback is provided through users clicking liked and disliked searching results. We believe this form of feedback is of great practical interests for its convenience and efficiency. To facilitate future work in this direction, we construct a new benchmark termed click-feedback retrieval based on a large-scale dataset in fashion domain. We demonstrate that incorporating click-feedback can drastically improve the retrieval performance, which validates the value of the proposed setting. We also introduce several methods to utilize click-feedback during training, and show that click-feedback-guided training can significantly enhance the retrieval quality. We hope further exploration in this direction can bring new insights on building more efficient and user-friendly search engines.
Chenyang Lyu, Manh-Duy Nguyen, Van-Tu Ninh, Liting Zhou, Cathal Gurrin, Jennifer Foster
Recent years have witnessed an increasing amount of dialogue/conversation on the web especially on social media. That inspires the development of dialogue-based retrieval, in which retrieving videos based on dialogue is of increasing interest for recommendation systems. Different from other video retrieval tasks, dialogue-to-video retrieval uses structured queries in the form of user-generated dialogue as the search descriptor. We present a novel dialogue-to-video retrieval system, incorporating structured conversational information. Experiments conducted on the AVSD dataset show that our proposed approach using plain-text queries improves over the previous counterpart model by 15.8% on R@1. Furthermore, our approach using dialogue as a query, improves retrieval performance by 4.2%, 6.2%, 8.6% on R@1, R@5 and R@10 and outperforms the state-of-the-art model by 0.7%, 3.6% and 6.0% on R@1, R@5 and R@10 respectively.
Rita Ramos, Desmond Elliott, Bruno Martins
Inspired by retrieval-augmented language generation and pretrained Vision and Language (V&L) encoders, we present a new approach to image captioning that generates sentences given the input image and a set of captions retrieved from a datastore, as opposed to the image alone. The encoder in our model jointly processes the image and retrieved captions using a pretrained V&L BERT, while the decoder attends to the multimodal encoder representations, benefiting from the extra textual evidence from the retrieved captions. Experimental results on the COCO dataset show that image captioning can be effectively formulated from this new perspective. Our model, named EXTRA, benefits from using captions retrieved from the training dataset, and it can also benefit from using an external dataset without the need for retraining. Ablation studies show that retrieving a sufficient number of captions (e.g., k=5) can improve captioning quality. Our work contributes towards using pretrained V&L encoders for generative tasks, instead of standard classification tasks.
Quang Cao, Hong Yen Tran, Son Hoang Dau, Xun Yi, Emanuele Viterbo, Chen Feng, Yu-Chih Huang, Jingge Zhu et al.
A private information retrieval (PIR) scheme allows a client to retrieve a
data item $x_i$ among $n$ items $x_1,x_2,\ldots,x_n$ from $k$ servers, without
revealing what $i$ is even when $t < k$ servers collude and try to learn $i$.
Such a PIR scheme is said to be $t$-private. A PIR scheme is $v$-verifiable if
the client can verify the correctness of the retrieved $x_i$ even when $v \leq
k$ servers collude and try to fool the client by sending manipulated data. Most
of the previous works in the literature on PIR assumed that $v < k$, leaving
the case of all-colluding servers open. We propose a generic construction that
combines a linear map commitment (LMC) and an arbitrary linear PIR scheme to
produce a $k$-verifiable PIR scheme, termed a committed PIR scheme. Such a
scheme guarantees that even in the worst scenario, when all servers are under
the control of an attacker, although the privacy is unavoidably lost, the
client won't be fooled into accepting an incorrect $x_i$. We demonstrate the
practicality of our proposal by implementing the committed PIR schemes based on
the Lai-Malavolta LMC and three well-known PIR schemes using the GMP library
and blst, the current fastest C library for elliptic curve pairings.
Authors' comments: Accepted at ESORICS 2023
Tamir Bendory, Dan Edidin, Ivan Gonzalez
We consider the finite alphabet phase retrieval problem: recovering a signal whose entries lie in a small alphabet of possible values from its Fourier magnitudes. This problem arises in the celebrated technology of X-ray crystallography to determine the atomic structure of biological molecules. Our main result states that for generic values of the alphabet, two signals have the same Fourier magnitudes if and only if several partitions have the same difference sets. Thus, the finite alphabet phase retrieval problem reduces to the combinatorial problem of determining a signal from those difference sets. Notably, this result holds true when one of the letters of the alphabet is zero, namely, for sparse signals with finite alphabet, which is the situation in X-ray crystallography.
David M. Paganin, Daniele Pelliccia, Kaye S. Morgan
Unresolved spatially-random microstructure, in an illuminated sample, can
lead to position-dependent blur when an image of that sample is formed. For a
small propagation distance, between the exit surface of the sample and the
entrance surface of a position-sensitive detector, the paraxial approximation
implies that the blurring influence of the sample may be modeled using an
anomalous-diffusion field. This diffusion field may have a scalar or tensor
character, depending on whether the random microstructure has an
autocorrelation function that is rotationally isotropic or anisotropic,
respectively. Partial differential equations are written down and then solved,
in a closed-form manner, for several variants of the inverse problem of
diffusion-field retrieval given suitable intensity images. Both
uniform-illumination and structured-illumination schemes are considered. Links
are made, between the recovered diffusion field and certain statistical
properties of the unresolved microstructure. The developed theory -- which may
be viewed as a crudely parallel form of small-angle scattering under the
Guinier approximation -- is applicable to a range of paraxial radiation and
matter fields, such as visible light, x rays, neutrons, and electrons.
Authors' comments: Numerous very minor corrections and clarifications throughout,
compared to the previous version
P. G. Casazza, F. Akrami
We will give several surprising equivalences and consequences of weak phase
retrieval. These results give a complete understanding of the difference
between weak phase retrieval and phase retrieval. We also answer two
longstanding open problems on weak phase retrieval: (1) We show that the
families of weak phase retrievable frames $\{x_{i}\}_{i=1}^{m}$ in
$\mathbb{R}^n$ are not dense in the family of $m$-element sets of vectors in
$\mathbb{R}^n$ for all $m\ge 2n-2$; (2) We show that any frame
$\{x_i\}_{i=1}^{2n-2}$ containing one or more canonical basis vectors in
$\mathbb{R}^n$ cannot do weak phase retrieval. We provide numerous examples to
show that the obtained results are best possible.
Authors' comments: arXiv admin note: substantial text overlap with arXiv:2110.06868
Vara Bhavya Sri Malli
Flexible task planning is still a significant challenge for robots. The
inability of robots to creatively adapt their task plans to new or unforeseen
challenges is largely attributable to their limited understanding of their
activities and the environment. Cooking, for example, requires a person to
occasionally take risks that a robot would find extremely dangerous. We may
obtain manipulation sequences by employing knowledge that is drawn from
numerous video sources thanks to knowledge retrieval through graph search.
Authors' comments: arXiv admin note: text overlap with arXiv:1902.01537 by other
authors. text overlap with arXiv:2211.02992
Jianfeng Dong, Xianke Chen, Minsong Zhang, Xun Yang, Shujie Chen, Xirong Li, Xun Wang
Current methods for text-to-video retrieval (T2VR) are trained and tested on
video-captioning oriented datasets such as MSVD, MSR-VTT and VATEX. A key
property of these datasets is that videos are assumed to be temporally
pre-trimmed with short duration, whilst the provided captions well describe the
gist of the video content. Consequently, for a given paired video and caption,
the video is supposed to be fully relevant to the caption. In reality, however,
as queries are not known a priori, pre-trimmed video clips may not contain
sufficient content to fully meet the query. This suggests a gap between the
literature and the real world. To fill the gap, we propose in this paper a
novel T2VR subtask termed Partially Relevant Video Retrieval (PRVR). An
untrimmed video is considered to be partially relevant w.r.t. a given textual
query if it contains a moment relevant to the query. PRVR aims to retrieve such
partially relevant videos from a large collection of untrimmed videos. PRVR
differs from single video moment retrieval and video corpus moment retrieval,
as the latter two are to retrieve moments rather than untrimmed videos. We
formulate PRVR as a multiple instance learning (MIL) problem, where a video is
simultaneously viewed as a bag of video clips and a bag of video frames. Clips
and frames represent video content at different time scales. We propose a
Multi-Scale Similarity Learning (MS-SL) network that jointly learns clip-scale
and frame-scale similarities for PRVR. Extensive experiments on three datasets
(TVR, ActivityNet Captions, and Charades-STA) demonstrate the viability of the
proposed method. We also show that our method can be used for improving video
corpus moment retrieval.
Authors' comments: Accepted by ACM MM 2022. The paper's homepage is
http://danieljf24.github.io/prvr
Michael Glass, Gaetano Rossiello, Md Faisal Mahbub Chowdhury, Ankita Rajaram Naik, Pengshan Cai, Alfio Gliozzo
As demonstrated by GPT-3 and T5, transformers grow in capability as parameter
spaces become larger and larger. However, for tasks that require a large amount
of knowledge, non-parametric memory allows models to grow dramatically with a
sub-linear increase in computational cost and GPU memory requirements. Recent
models such as RAG and REALM have introduced retrieval into conditional
generation. These models incorporate neural initial retrieval from a corpus of
passages. We build on this line of research, proposing Re2G, which combines
both neural initial retrieval and reranking into a BART-based
sequence-to-sequence generation. Our reranking approach also permits merging
retrieval results from sources with incomparable scores, enabling an ensemble
of BM25 and neural initial retrieval. To train our system end-to-end, we
introduce a novel variation of knowledge distillation to train the initial
retrieval, reranker, and generation using only ground truth on the target
sequence output. We find large gains in four diverse tasks: zero-shot slot
filling, question answering, fact-checking, and dialog, with relative gains of
9% to 34% over the previous state-of-the-art on the KILT leaderboard. We make
our code available as open source at
https://github.com/IBM/kgi-slot-filling/tree/re2g.
Authors' comments: Accepted at NAACL 2022
Sarah A. Obead, Jörg Kliewer
We formulate a new variant of the private information retrieval (PIR) problem
where the user is pliable, i.e., interested in any message from a desired
subset of the available dataset, denoted as pliable private information
retrieval (PPIR). We consider a setup where a dataset consisting of $f$
messages is replicated in $n$ noncolluding databases and classified into
$\Gamma$ classes. For this setup, the user wishes to retrieve any $\lambda\geq
1$ messages from multiple desired classes, i.e., $\eta\geq 1$, while revealing
no information about the identity of the desired classes to the databases. We
term this problem multi-message PPIR (M-PPIR) and introduce the single-message
PPIR (PPIR) problem as an elementary special case of M-PPIR. We first derive
converse bounds on the M-PPIR rate, which is defined as the ratio of the
desired amount of information and the total amount of downloaded information,
followed by the corresponding achievable schemes. As a result, we show that the
PPIR capacity, i.e., the maximum achievable PPIR rate, for $n$ noncolluding
databases matches the capacity of PIR with $n$ databases and $\Gamma$ messages.
Thus, enabling flexibility, i.e., pliability, where privacy is only guaranteed
for classes, but not for messages as in classical PIR, allows to trade-off
privacy versus download rate. A similar insight is shown to hold for the
general case of M-PPIR.
Authors' comments: 23 pages, 3 figures, 3 tables, submitted for possible publication
Mengchu Xu, Dekuan Dong, Jian Wang
In recent years, phase retrieval has received much attention in statistics,
applied mathematics and optical engineering. In this paper, we propose an
efficient algorithm, termed Subspace Phase Retrieval (SPR), which can
accurately recover an $n$-dimensional $k$-sparse complex-valued signal $\x$
given its $\Omega(k^2\log n)$ magnitude-only Gaussian samples if the minimum
nonzero entry of $\x$ satisfies $|x_{\min}| = \Omega(\|\x\|/\sqrt{k})$.
Furthermore, if the energy sum of the most significant $\sqrt{k}$ elements in
$\x$ is comparable to $\|\x\|^2$, the SPR algorithm can exactly recover $\x$
with $\Omega(k \log n)$ magnitude-only samples, which attains the
information-theoretic sampling complexity for sparse phase retrieval. Numerical
Experiments demonstrate that the proposed algorithm achieves the
state-of-the-art reconstruction performance compared to existing ones.
Authors' comments: To appear in IEEE Transactions on Information Theory, 2024, 33 pages,
10 figures
Hamed Zamani, Fernando Diaz, Mostafa Dehghani, Donald Metzler, Michael Bendersky
Although information access systems have long supported people in
accomplishing a wide range of tasks, we propose broadening the scope of users
of information access systems to include task-driven machines, such as machine
learning models. In this way, the core principles of indexing, representation,
retrieval, and ranking can be applied and extended to substantially improve
model generalization, scalability, robustness, and interpretability. We
describe a generic retrieval-enhanced machine learning (REML) framework, which
includes a number of existing models as special cases. REML challenges
information retrieval conventions, presenting opportunities for novel advances
in core areas, including optimization. The REML research agenda lays a
foundation for a new style of information access research and paves a path
towards advancing machine learning and artificial intelligence.
Authors' comments: To appear in proceedings of ACM SIGIR 2022
Hyunji Lee, Sohee Yang, Hanseok Oh, Minjoon Seo
A common practice for text retrieval is to use an encoder to map the
documents and the query to a common vector space and perform a nearest neighbor
search (NNS); multi-hop retrieval also often adopts the same paradigm, usually
with a modification of iteratively reformulating the query vector so that it
can retrieve different documents at each hop. However, such a bi-encoder
approach has limitations in multi-hop settings; (1) the reformulated query gets
longer as the number of hops increases, which further tightens the embedding
bottleneck of the query vector, and (2) it is prone to error propagation. In
this paper, we focus on alleviating these limitations in multi-hop settings by
formulating the problem in a fully generative way. We propose an
encoder-decoder model that performs multi-hop retrieval by simply generating
the entire text sequences of the retrieval targets, which means the query and
the documents interact in the language model's parametric space rather than L2
or inner product space as in the bi-encoder approach. Our approach, Generative
Multi-hop Retrieval(GMR), consistently achieves comparable or higher
performance than bi-encoder models in five datasets while demonstrating
superior GPU memory and storage footprint.
Authors' comments: published at EMNLP 2022
Heqi Zheng, Xiao Zhang, Zewen Chi, Heyan Huang, Tan Yan, Tian Lan, Wei Wei, Xian-Ling Mao
Cross-lingual retrieval aims to retrieve relevant text across languages. Current methods typically achieve cross-lingual retrieval by learning language-agnostic text representations in word or sentence level. However, how to learn phrase representations for cross-lingual phrase retrieval is still an open problem. In this paper, we propose XPR, a cross-lingual phrase retriever that extracts phrase representations from unlabeled example sentences. Moreover, we create a large-scale cross-lingual phrase retrieval dataset, which contains 65K bilingual phrase pairs and 4.2M example sentences in 8 English-centric language pairs. Experimental results show that XPR outperforms state-of-the-art baselines which utilize word-level or sentence-level representations. XPR also shows impressive zero-shot transferability that enables the model to perform retrieval in an unseen language pair during training. Our dataset, code, and trained models are publicly available at www.github.com/cwszz/XPR/.
Anirudh Goyal, Abram L. Friesen, Andrea Banino, Theophane Weber, Nan Rosemary Ke, Adria Puigdomenech Badia, Arthur Guez, Mehdi Mirza et al.
Most deep reinforcement learning (RL) algorithms distill experience into parametric behavior policies or value functions via gradient updates. While effective, this approach has several disadvantages: (1) it is computationally expensive, (2) it can take many updates to integrate experiences into the parametric model, (3) experiences that are not fully integrated do not appropriately influence the agent's behavior, and (4) behavior is limited by the capacity of the model. In this paper we explore an alternative paradigm in which we train a network to map a dataset of past experiences to optimal behavior. Specifically, we augment an RL agent with a retrieval process (parameterized as a neural network) that has direct access to a dataset of experiences. This dataset can come from the agent's past experiences, expert demonstrations, or any other relevant source. The retrieval process is trained to retrieve information from the dataset that may be useful in the current context, to help the agent achieve its goal faster and more efficiently. he proposed method facilitates learning agents that at test-time can condition their behavior on the entire dataset and not only the current state, or current trajectory. We integrate our method into two different RL agents: an offline DQN agent and an online R2D2 agent. In offline multi-task problems, we show that the retrieval-augmented DQN agent avoids task interference and learns faster than the baseline DQN agent. On Atari, we show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores. We run extensive ablations to measure the contributions of the components of our proposed method.