Zhihong Shao, Yeyun Gong, Yelong Shen, Minlie Huang, Nan Duan, Weizhu Chen
Large language models are powerful text processors and reasoners, but are
still subject to limitations including outdated knowledge and hallucinations,
which necessitates connecting them to the world. Retrieval-augmented large
language models have attracted extensive attention for grounding model generation
on external knowledge. However, retrievers struggle to capture relevance,
especially for queries with complex information needs. Recent work has proposed
to improve relevance modeling by having large language models actively involved
in retrieval, i.e., to improve retrieval with generation. In this paper, we
show that strong performance can be achieved by a method we call Iter-RetGen,
which synergizes retrieval and generation in an iterative manner. A model
output shows what might be needed to finish a task, and thus provides an
informative context for retrieving more relevant knowledge which in turn helps
generate a better output in the next iteration. Compared with recent work which
interleaves retrieval with generation when producing an output, Iter-RetGen
processes all retrieved knowledge as a whole and largely preserves the
flexibility in generation without structural constraints. We evaluate
Iter-RetGen on multi-hop question answering, fact verification, and commonsense
reasoning, and show that it can flexibly leverage parametric knowledge and
non-parametric knowledge, and is superior to or competitive with
state-of-the-art retrieval-augmented baselines while incurring less retrieval and
generation overhead. We can further improve performance via
generation-augmented retrieval adaptation.
Authors' comments: Preprint
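A minimal sketch of the iterative retrieval-generation loop described above, assuming hypothetical `retrieve(query, k)` and `generate(question, docs)` callables (in practice a retriever and a prompted LLM). Forming the next retrieval query by concatenating the question with the previous output is an assumption about how the output "provides an informative context"; the paper's exact query construction and prompts may differ.

```python
from typing import Callable, List

# Hypothetical interfaces: retrieve(query, k) -> docs, generate(question, docs) -> text.
Retriever = Callable[[str, int], List[str]]
Generator = Callable[[str, List[str]], str]


def iter_retgen(question: str, retrieve: Retriever, generate: Generator,
                iterations: int = 2, k: int = 5) -> str:
    """Alternate retrieval and generation; each output informs the next query."""
    output = ""
    for _ in range(iterations):
        # The previous output hints at what is still needed, so it is
        # appended to the question to form the next retrieval query.
        query = f"{question} {output}".strip()
        docs = retrieve(query, k)
        # All retrieved documents are used together for one full output,
        # rather than interleaving retrieval inside a single generation.
        output = generate(question, docs)
    return output
```

With a stub retriever that records its queries, the first query is the bare question and every later query carries the previous iteration's output.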
Daniel Campos, ChengXiang Zhai
Vector-based retrieval systems have become a common staple for academic and industrial search applications because they provide a simple and scalable way of extending search to leverage contextual representations of documents and queries. As these vector-based systems rely on contextual language models, their use commonly requires GPUs, which can be expensive and difficult to manage. Given recent advances in introducing sparsity into language models for improved inference efficiency, in this paper, we study how sparse language models can be used for dense retrieval to improve inference efficiency. Using the popular retrieval library Tevatron and the MSMARCO, NQ, and TriviaQA datasets, we find that sparse language models can be used as direct replacements with little to no drop in accuracy and up to 4.3x faster inference.
Gustavo Penha, Claudia Hauff
A number of learned sparse and dense retrieval approaches have recently been
proposed and proven effective in tasks such as passage retrieval and document
retrieval. In this paper, we conduct a replicability study to analyze whether the lessons
learned generalize to the retrieval of responses for dialogues, an important
task for the increasingly popular field of conversational search. Unlike
passage and document retrieval where documents are usually longer than queries,
in response ranking for dialogues the queries (dialogue contexts) are often
longer than the documents (responses). Additionally, dialogues have a
particular structure, i.e., multiple utterances by different users. With these
differences in mind, we evaluate how generalizable the following major
findings from previous works are: (F1) query expansion outperforms a
no-expansion baseline; (F2) document expansion outperforms a no-expansion
baseline; (F3) zero-shot dense retrieval underperforms sparse baselines; (F4)
dense retrieval outperforms sparse baselines; (F5) hard negative sampling is
better than random sampling for training dense models. Our experiments -- based
on three different information-seeking dialogue datasets -- reveal that four
out of five findings (F2-F5) generalize to our domain.
Authors' comments: Accepted for publication in the European Conference on Information
Retrieval (ECIR'23). arXiv admin note: substantial text overlap with
arXiv:2204.10558
Parishad BehnamGhader, Santiago Miret, Siva Reddy
Augmenting pretrained language models with retrievers has shown promise in
effectively solving common NLP problems, such as language modeling and question
answering. In this paper, we evaluate the strengths and weaknesses of popular
retriever-augmented language models, namely kNN-LM, REALM, DPR + FiD,
Contriever + ATLAS, and Contriever + Flan-T5, in reasoning over retrieved
statements across different tasks. Our findings indicate that the simple
similarity metric employed by retrievers is insufficient for retrieving all the
necessary statements for reasoning. Additionally, the language models do not
exhibit strong reasoning even when provided with only the required statements.
Furthermore, when combined with imperfect retrievers, the performance of the
language models becomes even worse, e.g., Flan-T5's performance drops by 28.6%
when retrieving 5 statements using Contriever. While larger language models
improve performance, there is still substantial room for enhancement. Our
further analysis indicates that multihop retrieve-and-read is promising for
large language models like GPT-3.5, but does not generalize to other language
models like Flan-T5-xxl.
Authors' comments: Accepted in EMNLP2023 Findings
Zhengbao Jiang, Luyu Gao, Jun Araki, Haibo Ding, Zhiruo Wang, Jamie Callan, Graham Neubig
Systems for knowledge-intensive tasks such as open-domain question answering
(QA) usually consist of two stages: efficient retrieval of relevant documents
from a large corpus and detailed reading of the selected documents to generate
answers. Retrievers and readers are usually modeled separately, which
necessitates a cumbersome implementation and is hard to train and adapt in an
end-to-end fashion. In this paper, we revisit this design and eschew the
separate architecture and training in favor of a single Transformer that
performs Retrieval as Attention (ReAtt), and end-to-end training solely based
on supervision from the end QA task. We demonstrate for the first time that a
single model trained end-to-end can achieve both competitive retrieval and QA
performance, matching or slightly outperforming state-of-the-art separately
trained retrievers and readers. Moreover, end-to-end adaptation significantly
boosts its performance on out-of-domain datasets in both supervised and
unsupervised settings, making our model a simple and adaptable solution for
knowledge-intensive tasks. Code and models are available at
https://github.com/jzbjyb/ReAtt.
Authors' comments: EMNLP 2022
Dingkun Long, Yanzhao Zhang, Guangwei Xu, Pengjun Xie
Pre-trained language models (PTMs) have been shown to yield powerful text
representations for the dense passage retrieval task. Masked Language Modeling
(MLM) is a major sub-task of the pre-training process. However, we found that
the conventional random masking strategy tends to select a large number of
tokens that have limited effect on the passage retrieval task (e.g., stop-words
and punctuation). Noting that term importance weights can provide valuable
information for passage retrieval, we propose an alternative retrieval-oriented
masking (dubbed ROM) strategy in which more important tokens have a higher
probability of being masked out, to capture this straightforward yet
essential information and facilitate the language model pre-training process.
Notably, the proposed token masking method does not change the architecture
or learning objective of the original PTM. Our experiments verify that the
proposed ROM enables term importance information to help language model
pre-training, thus achieving better performance on multiple passage retrieval
benchmarks.
Authors' comments: Search LM part of the "AliceMind SLM + HLAR" method in MS MARCO
Passage Ranking Leaderboard Submission
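A minimal sketch of importance-weighted masking in the spirit of ROM. The abstract does not say which term importance weights are used, so IDF (log(N / df)) is assumed here purely for illustration; `idf_weights` and `rom_mask` are hypothetical helpers, not the authors' code. Positions are sampled without replacement with probability proportional to their weight, so zero-weight tokens (terms occurring in every document, such as typical stop-words) are never chosen.

```python
import math
import random
from typing import Dict, List, Optional, Sequence


def idf_weights(corpus: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Assumed importance weight: IDF = log(N / df)."""
    n_docs = len(corpus)
    df: Dict[str, int] = {}
    for doc in corpus:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    return {term: math.log(n_docs / count) for term, count in df.items()}


def rom_mask(tokens: Sequence[str], importance: Dict[str, float],
             mask_ratio: float = 0.15, mask_token: str = "[MASK]",
             rng: Optional[random.Random] = None) -> List[str]:
    """Mask tokens with probability proportional to their importance weight."""
    if rng is None:
        rng = random.Random()
    # Terms missing from the weight table are treated as unimportant.
    weights = [importance.get(t, 0.0) for t in tokens]
    candidates = [i for i, w in enumerate(weights) if w > 0]
    n_mask = min(len(candidates), max(1, round(mask_ratio * len(tokens))))
    chosen = set()
    for _ in range(n_mask):
        # Weighted sampling without replacement: draw one position, remove it.
        pick = rng.choices(range(len(candidates)),
                           weights=[weights[i] for i in candidates], k=1)[0]
        chosen.add(candidates.pop(pick))
    return [mask_token if i in chosen else t for i, t in enumerate(tokens)]
```

Because only the choice of masked positions changes, the MLM objective and model architecture stay as they are, which matches the abstract's point that ROM leaves the original PTM untouched.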
Yury Zemlyanskiy, Michiel de Jong, Joshua Ainslie, Panupong Pasupat, Peter Shaw, Linlu Qiu, Sumit Sanghai, Fei Sha
A common recent approach to semantic parsing augments sequence-to-sequence
models by retrieving and appending a set of training samples, called exemplars.
The effectiveness of this recipe is limited by the ability to retrieve
informative exemplars that help produce the correct parse, which is especially
challenging in low-resource settings. Existing retrieval is commonly based on
similarity of query and exemplar inputs. We propose GandR, a retrieval
procedure that retrieves exemplars for which outputs are also similar.
GandR first generates a preliminary prediction with input-based retrieval. Then,
it retrieves exemplars with outputs similar to the preliminary prediction, which
are used to generate a final prediction. GandR sets the state of the art on
multiple low-resource semantic parsing tasks.
Authors' comments: To appear in the proceedings of COLING 2022
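A minimal sketch of the two-step GandR procedure, assuming a hypothetical `generate(query, exemplars)` parser and word-overlap (Jaccard) similarity as a stand-in for whatever similarity the paper uses. Because the abstract says exemplars are chosen for which outputs are "also" similar, the second step here mixes input and output similarity with a hypothetical weight `alpha`.

```python
from typing import Callable, List, Sequence, Tuple

Exemplar = Tuple[str, str]  # (input utterance, output parse)


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a stand-in for a learned or BM25 similarity."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def gandr(query: str, pool: Sequence[Exemplar],
          generate: Callable[[str, List[Exemplar]], str],
          k: int = 2, alpha: float = 0.5) -> str:
    # Step 1: input-based retrieval, then a preliminary prediction.
    first = sorted(pool, key=lambda ex: jaccard(query, ex[0]), reverse=True)[:k]
    preliminary = generate(query, first)

    # Step 2: rescore exemplars by input similarity and by how similar their
    # outputs are to the preliminary prediction (alpha is hypothetical).
    def score(ex: Exemplar) -> float:
        return alpha * jaccard(query, ex[0]) + (1 - alpha) * jaccard(preliminary, ex[1])

    second = sorted(pool, key=score, reverse=True)[:k]
    return generate(query, second)
```

Setting `alpha=0.0` retrieves purely by output similarity in the second step, which makes the effect of the preliminary prediction easy to inspect.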
Zhenghao Liu, Chenyan Xiong, Yuanhuiyi Lv, Zhiyuan Liu, Ge Yu
This paper presents Universal Vision-Language Dense Retrieval (UniVL-DR),
which builds a unified model for multi-modal retrieval. UniVL-DR encodes
queries and multi-modality resources in an embedding space for searching
candidates from different modalities. To learn a unified embedding space for
multi-modal retrieval, UniVL-DR proposes two techniques: 1) a universal embedding
optimization strategy, which contrastively optimizes the embedding space using
modality-balanced hard negatives; 2) an image verbalization method, which
bridges the modality gap between images and texts in the raw data space.
UniVL-DR achieves state-of-the-art performance on the multi-modal open-domain
question answering benchmark, WebQA, and outperforms all retrieval models on the
two subtasks, text-text retrieval and text-image retrieval. It demonstrates that
universal multi-modal search is feasible: a unified model can replace the
divide-and-conquer pipeline and also benefits single/cross modality tasks. All
source code of this work is available at
https://github.com/OpenMatch/UniVL-DR.
Authors' comments: Accepted by ICLR 2023
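A minimal sketch of one way to mine modality-balanced hard negatives: take the top-scoring non-relevant candidates from each modality in equal numbers. The `(doc_id, modality, score)` candidate format and the `per_modality` parameter are assumptions for illustration; UniVL-DR's actual sampling details may differ. The presumed motivation is that unbalanced mining would let the negatives, and hence the learned space, drift toward whichever modality the current retriever favors.

```python
from typing import Dict, List, Sequence, Set, Tuple

Candidate = Tuple[str, str, float]  # (doc_id, modality, retriever score)


def modality_balanced_negatives(candidates: Sequence[Candidate], positives: Set[str],
                                per_modality: int) -> List[str]:
    """Pick the top-scoring non-relevant candidates, the same number per modality."""
    by_modality: Dict[str, List[Candidate]] = {}
    for cand in candidates:
        if cand[0] not in positives:
            by_modality.setdefault(cand[1], []).append(cand)
    negatives: List[str] = []
    for modality in sorted(by_modality):
        ranked = sorted(by_modality[modality], key=lambda c: c[2], reverse=True)
        negatives.extend(doc_id for doc_id, _, _ in ranked[:per_modality])
    return negatives
```

The returned IDs would then serve as negatives in a contrastive loss alongside the positive.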
Sebastian Hofstätter, Nick Craswell, Bhaskar Mitra, Hamed Zamani, Allan Hanbury
Recently, several dense retrieval (DR) models have demonstrated performance competitive with the term-based retrieval that is ubiquitous in search systems. In contrast to term-based matching, DR projects queries and documents into a dense vector space and retrieves results via (approximate) nearest neighbor search. Deploying a new system, such as DR, inevitably involves tradeoffs in aspects of its performance. Established retrieval systems running at scale are usually well understood in terms of effectiveness and costs, such as query latency, indexing throughput, or storage requirements. In this work, we propose a framework with a set of criteria that go beyond simple effectiveness measures to thoroughly compare two retrieval systems with the explicit goal of assessing the readiness of one system to replace the other. This includes careful tradeoff considerations between effectiveness and various cost factors. Furthermore, we describe guardrail criteria, since even a system that is better on average may have systematic failures on a minority of queries. The guardrails check for failures on certain query characteristics and novel failure types that are only possible in dense retrieval systems. We demonstrate our decision framework on a Web ranking scenario. In that scenario, state-of-the-art DR models have surprisingly strong results, not only on average performance but also in passing an extensive set of guardrail tests, showing robustness on different query characteristics, lexical matching, generalization, and number of regressions. It is impossible to predict whether DR will become ubiquitous in the future, but one way for this to happen is through repeated application of decision processes such as the one presented here.
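Since a system that is better on average can still fail on some queries, one simple guardrail is to count per-query regressions next to the mean. A minimal sketch, assuming per-query metric values (e.g. reciprocal rank) keyed by query ID and a hypothetical tolerance `margin`; the framework's full guardrail set covers much more than this.

```python
from typing import Dict, Tuple


def compare_systems(baseline: Dict[str, float], candidate: Dict[str, float],
                    margin: float = 0.0) -> Tuple[float, float, int]:
    """Return (baseline mean, candidate mean, number of per-query regressions)."""
    queries = sorted(baseline.keys() & candidate.keys())
    if not queries:
        raise ValueError("no queries evaluated by both systems")
    base_mean = sum(baseline[q] for q in queries) / len(queries)
    cand_mean = sum(candidate[q] for q in queries) / len(queries)
    # A regression: the candidate scores worse than the baseline by more than `margin`.
    regressions = sum(1 for q in queries if candidate[q] < baseline[q] - margin)
    return base_mean, cand_mean, regressions
```

A candidate can then be required to beat the baseline on average and also keep the regression count under an agreed threshold.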
Dwaipayan Roy, Zeljko Carevic, Philipp Mayr
In this paper, we investigate the retrievability of datasets and publications
in a real-life Digital Library (DL). The measure of retrievability was
originally developed to quantify the influence that a retrieval system has on
the access to information. Retrievability can also enable DL engineers to
evaluate their search engine to determine the ease with which the content in
the collection can be accessed. Following this methodology, in our study, we
propose a system-oriented approach for studying dataset and publication
retrieval. A distinctive feature of this paper is its focus on measuring the
accessibility biases of various types of DL items and its inclusion of a metric of
usefulness. Among other metrics, we use Lorenz curves and Gini coefficients to
visualize the differences between the two retrievable document types (specifically
datasets and publications). Empirical results reported in the paper show clear
differences in the retrievability scores of documents of
different types.
Authors' comments: ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2022 research
paper
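A minimal sketch of the measures mentioned above, assuming the common cumulative formulation of retrievability, r(d) = number of queries that rank d within the top c results (every query weighted equally), and the Gini coefficient G = Σ_i (2i − n − 1) x_(i) / (n Σ x) over ascending-sorted scores. The input format (one ranked list of document IDs per query) is an assumption. G = 0 means every document is equally retrievable; larger values mean retrievability is concentrated on fewer documents.

```python
from typing import Dict, Iterable, Sequence


def retrievability(rankings: Iterable[Sequence[str]], docs: Sequence[str],
                   cutoff: int = 10) -> Dict[str, int]:
    """r(d): number of queries that rank d within the top `cutoff` results."""
    r = {d: 0 for d in docs}
    for ranking in rankings:
        for d in ranking[:cutoff]:
            if d in r:
                r[d] += 1
    return r


def gini(values: Sequence[float]) -> float:
    """Gini coefficient: 0 = perfectly even, (n-1)/n = all mass on one item."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)
```

To compare item types, one would call `gini` separately on the scores of datasets and of publications.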
Wei Zhong, Jheng-Hong Yang, Yuqing Xie, Jimmy Lin
With the recent success of dense retrieval methods based on bi-encoders, studies have applied this approach to various interesting downstream retrieval tasks with good efficiency and in-domain effectiveness. Recently, we have also seen the presence of dense retrieval models in Math Information Retrieval (MIR) tasks, but the most effective systems remain classic retrieval methods that consider hand-crafted structure features. In this work, we try to combine the best of both worlds: a well-defined structure search method for effective formula search and efficient bi-encoder dense retrieval models to capture contextual similarities. Specifically, we have evaluated two representative bi-encoder models for token-level and passage-level dense retrieval on recent MIR tasks. Our results show that bi-encoder models are highly complementary to existing structure search methods, and we are able to advance the state-of-the-art on MIR datasets.
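The abstract does not state how structure search and dense retrieval are combined. A common way to exploit complementary runs is linear score interpolation after min-max normalization, sketched below as an assumption; `fuse` and `alpha` are hypothetical names.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one run's scores to [0, 1] so the two runs are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def fuse(structure: Dict[str, float], dense: Dict[str, float],
         alpha: float = 0.5) -> List[str]:
    """Rank documents by alpha * structure + (1 - alpha) * dense (normalized)."""
    s, d = min_max(structure), min_max(dense)
    docs = s.keys() | d.keys()
    # A document missing from one run gets 0 for that run after normalization.
    fused = {doc: alpha * s.get(doc, 0.0) + (1 - alpha) * d.get(doc, 0.0) for doc in docs}
    return sorted(fused, key=lambda doc: (-fused[doc], doc))
```

`alpha` would normally be tuned on held-out queries; ties are broken by document ID only to keep the output deterministic.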
Jianjin Zhang, Zheng Liu, Weihao Han, Shitao Xiao, Ruicheng Zheng, Yingxia Shao, Hao Sun, Hanqing Zhu et al.
Embedding based retrieval (EBR) is a fundamental building block in many web applications. However, EBR in sponsored search is distinguished from other generic scenarios and technically challenging due to the need to serve multiple retrieval purposes: firstly, it has to retrieve high-relevance ads, which precisely serve the user's search intent; secondly, it needs to retrieve high-CTR ads so as to maximize the overall user clicks. In this paper, we present a novel representation learning framework, Uni-Retriever, developed for Bing Search, which unifies two training modes, knowledge distillation and contrastive learning, to realize both required objectives. On one hand, the capability of making high-relevance retrieval is established by distilling knowledge from the "relevance teacher model". On the other hand, the capability of making high-CTR retrieval is optimized by learning to discriminate the user's clicked ads from the entire corpus. The two training modes are jointly performed as a multi-objective learning process, such that the ads of high relevance and CTR can be favored by the generated embeddings. Besides the learning strategy, we also describe our EBR serving pipeline, built upon a substantially optimized DiskANN, where massive-scale EBR can be performed with competitive time and memory efficiency and with high quality. We conduct comprehensive offline and online experiments to evaluate the proposed techniques, whose findings may provide useful insights for the future development of EBR systems. Uni-Retriever has been mainstreamed as the major retrieval path in Bing's production thanks to notable improvements in representation and EBR serving quality.
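A minimal sketch of a multi-objective loss that combines the two training modes: a KL-divergence distillation term that pulls the student's relevance distribution over candidate ads toward the teacher's, and an InfoNCE-style contrastive term that pushes the clicked ad above the other ads (in practice an in-batch approximation of "the entire corpus"). The function names, temperature, and weight `alpha` are hypothetical; Uni-Retriever's exact losses may differ.

```python
import math
from typing import List, Sequence


def log_softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_z for x in xs]


def distillation_loss(student: Sequence[float], teacher: Sequence[float],
                      temperature: float = 1.0) -> float:
    """KL(teacher || student) over the same list of candidate ads."""
    log_p_t = log_softmax([t / temperature for t in teacher])
    log_p_s = log_softmax([s / temperature for s in student])
    return sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))


def contrastive_loss(scores: Sequence[float], clicked: int) -> float:
    """InfoNCE: the clicked ad should outscore the other ads."""
    return -log_softmax(scores)[clicked]


def multi_objective_loss(student_rel: Sequence[float], teacher_rel: Sequence[float],
                         click_scores: Sequence[float], clicked: int,
                         alpha: float = 0.5) -> float:
    # alpha is a hypothetical trade-off between the relevance and CTR objectives.
    return (alpha * distillation_loss(student_rel, teacher_rel)
            + (1 - alpha) * contrastive_loss(click_scores, clicked))
```

Both terms are computed from the same query and ad embeddings, so optimizing their weighted sum yields one embedding space that favors ads with both high relevance and high CTR.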
Matthias Wellershoff
We characterise all pairs of finite order entire functions whose magnitudes
agree on two arbitrary lines in the complex plane by means of the Hadamard
factorisation theorem. Building on this, we also characterise all pairs of
second order entire functions whose magnitudes agree on infinitely many
equidistant parallel lines. Furthermore, we show that the magnitude of an
entire function on three parallel lines, whose distances are rationally
independent, uniquely determines the function up to global phase, and that
there exists a first order entire function whose magnitude on the lines
$\mathbb{R} + \tfrac{\mathrm{i}}{n} \mathbb{Z}$ does not uniquely determine it
up to global phase, for all positive integers $n$. Our results have direct
implications for Gabor phase retrieval.
Authors' comments: 29 pages, 2 figures; minor change in acknowledgements
Sophia Althammer, Sebastian Hofstätter, Mete Sertkan, Suzan Verberne, Allan Hanbury
Dense passage retrieval (DPR) models show great effectiveness gains in first-stage
retrieval for the web domain. However, the web domain offers large amounts of
training data and a query-to-passage or query-to-document retrieval task. In this
paper, we investigate dense document-to-document retrieval with limited labelled
target data for training, in particular legal case retrieval. In order to use DPR
models for
document-to-document retrieval, we propose a Paragraph Aggregation Retrieval
Model (PARM) which liberates DPR models from their limited input length. PARM
retrieves documents on the paragraph-level: for each query paragraph, relevant
documents are retrieved based on their paragraphs. Then the relevant results
per query paragraph are aggregated into one ranked list for the whole query
document. For the aggregation we propose vector-based aggregation with
reciprocal rank fusion (VRRF) weighting, which combines the advantages of
rank-based aggregation and topical aggregation based on the dense embeddings.
Experimental results show that VRRF outperforms rank-based aggregation
strategies for dense document-to-document retrieval with PARM. We compare PARM
to document-level retrieval and demonstrate higher retrieval effectiveness of
PARM for lexical and dense first-stage retrieval on two different legal case
retrieval collections. We investigate how to train the dense retrieval model
for PARM on limited target data with labels on the paragraph or the
document level. In addition, we analyze the differences between the retrieved
results of lexical and dense retrieval with PARM.
Authors' comments: Accepted at ECIR 2022
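A minimal sketch of the rank-based baseline that VRRF is compared against: each query paragraph yields a ranking of documents, and the rankings are fused with reciprocal rank fusion, score(d) = Σ 1 / (k + rank), with k = 60 as the commonly used default. VRRF additionally aggregates dense embeddings, which is not reproduced here; `rrf_aggregate` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_aggregate(paragraph_rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse per-query-paragraph document rankings into one list via RRF."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in paragraph_rankings:
        seen = set()
        for doc_id in ranking:
            # A document can be hit through several of its paragraphs;
            # only its best (first) position in this ranking counts.
            if doc_id in seen:
                continue
            seen.add(doc_id)
            scores[doc_id] += 1.0 / (k + len(seen))
    return sorted(scores, key=lambda d: (-scores[d], d))
```

A document retrieved for many query paragraphs accumulates score even if it is never ranked first, which suits long query documents that touch several topics.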
Konstantin Avrachenkov, Evsey Morozov, Ruslana Nekrasova
We establish a stability criterion for a two-class retrial system with Poisson inputs, general class-dependent service times and class-dependent constant retrial rates. We also characterise an interesting phenomenon of partial stability when one orbit is tight but the other orbit goes to infinity in probability. All theoretical results are illustrated by numerical experiments.
Xilun Chen, Kushal Lakhotia, Barlas Oğuz, Anchit Gupta, Patrick Lewis, Stan Peshterliev, Yashar Mehdad, Sonal Gupta et al.
Despite their recent popularity and well-known advantages, dense retrievers still lag behind sparse methods such as BM25 in their ability to reliably match salient phrases and rare entities in the query and to generalize to out-of-domain data. It has been argued that this is an inherent limitation of dense models. We rebut this claim by introducing the Salient Phrase Aware Retriever (SPAR), a dense retriever with the lexical matching capacity of a sparse model. We show that a dense Lexical Model Λ can be trained to imitate a sparse one, and SPAR is built by augmenting a standard dense retriever with Λ. Empirically, SPAR shows superior performance on a range of tasks including five question answering datasets, MS MARCO passage retrieval, as well as the EntityQuestions and BEIR benchmarks for out-of-domain evaluation, exceeding the performance of state-of-the-art dense and sparse retrievers. The code and models of SPAR are available at: https://github.com/facebookresearch/dpr-scale/tree/main/spar
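One way to augment a dense retriever with Λ while keeping a single maximum inner product index is to concatenate the two embeddings, scaling the Λ part of the query by a weight μ. The dot product of the concatenated vectors then equals the dense score plus μ times the lexical score. This is a sketch under that assumption; the helper names and `mu` are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    return sum(x * y for x, y in zip(a, b))


def spar_query(q_dense: Sequence[float], q_lambda: Sequence[float], mu: float) -> List[float]:
    # The weight is applied on the query side only, so the passage
    # index need not be re-encoded when mu is re-tuned.
    return list(q_dense) + [mu * x for x in q_lambda]


def spar_passage(p_dense: Sequence[float], p_lambda: Sequence[float]) -> List[float]:
    return list(p_dense) + list(p_lambda)
```

For example, `dot(spar_query(qd, ql, mu), spar_passage(pd, pl))` equals `dot(qd, pd) + mu * dot(ql, pl)` up to floating-point rounding, so existing nearest-neighbor tooling can serve the combined model unchanged.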
Sophia Althammer, Arian Askari, Suzan Verberne, Allan Hanbury
In this paper, we present our approaches for the case law retrieval and the
legal case entailment task in the Competition on Legal Information
Extraction/Entailment (COLIEE) 2021. As first-stage retrieval methods combined
with neural re-ranking methods using contextualized language models like BERT
have achieved great performance improvements for information retrieval in the web
and news domains, we evaluate these methods for the legal domain. A distinct
characteristic of legal case retrieval is that the query case and case
description in the corpus tend to be long documents and therefore exceed the
input length of BERT. We address this challenge by combining lexical and dense
retrieval methods on the paragraph-level of the cases for the first stage
retrieval. Here we demonstrate that retrieval on the paragraph level
outperforms retrieval on the document level. Furthermore, the experiments
suggest that dense retrieval methods outperform lexical retrieval. For
re-ranking we address the problem of long documents by summarizing the cases
and fine-tuning a BERT-based re-ranker with the summaries. Overall, our best
results were obtained with a combination of BM25 and dense passage retrieval
using domain-specific embeddings.
Authors' comments: Published in COLIEE 2021
Hao Cheng, Ping Wang, Chun Qi
Multimedia videos are important data carriers, and their drastically increasing
number often brings many duplicate and near-duplicate videos into the top results
of search. Near-duplicate video retrieval (NDVR) can cluster and filter out the
redundant contents. In this paper, the proposed NDVR approach extracts the
frame-level video representation based on convolutional neural network (CNN)
features from fully-connected layer and aggregated intermediate convolutional
layers. Unsupervised metric learning is used for similarity measurement and
feature matching. An efficient re-ranking algorithm combined with k-nearest
neighborhood fuses the retrieval results from two levels of features and
further improves the retrieval performance. Extensive experiments on the widely
used CC_WEB_VIDEO dataset show that the proposed approach outperforms the
state of the art.
Authors' comments: This paper is submitted to ICIP 2019
Gregor Geigle, Jonas Pfeiffer, Nils Reimers, Ivan Vulić, Iryna Gurevych
Current state-of-the-art approaches to cross-modal retrieval process text and
visual input jointly, relying on Transformer-based architectures with
cross-attention mechanisms that attend over all words and objects in an image.
While offering unmatched retrieval performance, such models 1) are typically
pretrained from scratch and thus less scalable, and 2) suffer from huge retrieval
latency and inefficiency issues, which make them impractical in realistic
applications. To address these crucial gaps and achieve cross-modal retrieval
that is both effective and efficient, we propose a novel fine-tuning framework that turns any
pretrained text-image multi-modal model into an efficient retrieval model. The
framework is based on a cooperative retrieve-and-rerank approach which
combines: 1) twin networks (i.e., a bi-encoder) to separately encode all items
of a corpus, enabling efficient initial retrieval, and 2) a cross-encoder
component for a more nuanced (i.e., smarter) ranking of the retrieved small set
of items. We also propose to jointly fine-tune the two components with shared
weights, yielding a more parameter-efficient model. Our experiments on a series
of standard cross-modal retrieval benchmarks in monolingual, multilingual, and
zero-shot setups demonstrate improved accuracy and huge efficiency benefits
over the state-of-the-art cross-encoders.
Authors' comments: TACL 2022
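A minimal sketch of the cooperative retrieve-and-rerank pattern: a cheap bi-encoder dot product over precomputed item embeddings shortlists `k` items, and only those are passed to the expensive cross-encoder. `cross_score` is a hypothetical callable that would run the cross-encoder on the (query, item) pair; here it takes the item index for simplicity.

```python
import heapq
from typing import Callable, List, Sequence


def retrieve_and_rerank(query_vec: Sequence[float],
                        corpus_vecs: Sequence[Sequence[float]],
                        cross_score: Callable[[int], float],
                        k: int = 10) -> List[int]:
    # Stage 1: bi-encoder scoring against embeddings encoded once, offline.
    bi_scores = [sum(q * c for q, c in zip(query_vec, vec)) for vec in corpus_vecs]
    shortlist = heapq.nlargest(k, range(len(corpus_vecs)), key=bi_scores.__getitem__)
    # Stage 2: the cross-encoder scores only the k shortlisted items.
    return sorted(shortlist, key=cross_score, reverse=True)
```

The cross-encoder cost therefore grows with `k` rather than with the corpus size, which is where the efficiency gain over scoring every pair with a cross-encoder comes from.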
Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma
Ranking has always been one of the top concerns in information retrieval research. For decades, lexical matching signals have dominated the ad-hoc retrieval process, but they also have inherent defects, such as the vocabulary mismatch problem. Recently, the Dense Retrieval (DR) technique has been proposed to alleviate these limitations by capturing the deep semantic relationship between queries and documents. The training of most existing Dense Retrieval models relies on sampling negative instances from the corpus to optimize a pairwise loss function. Through investigation, we find that this kind of training strategy is biased and fails to optimize full retrieval performance effectively and efficiently. To solve this problem, we propose a Learning To Retrieve (LTRe) training technique. LTRe constructs the document index beforehand. At each training iteration, it performs full retrieval without negative sampling and then updates the query representation model parameters. Through this process, it teaches the DR model how to retrieve relevant documents from the entire corpus instead of how to rerank a potentially biased sample of documents. Experiments in both passage retrieval and document retrieval tasks show that: 1) in terms of effectiveness, LTRe significantly outperforms all competitive sparse and dense baselines. It even outperforms the BM25-BERT cascade system under reasonable latency constraints. 2) in terms of training efficiency, compared with the previous state-of-the-art DR method, LTRe provides more than 170x speed-up in the training process. Training with a compressed index further saves computing resources with minor performance loss.
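A minimal sketch of one LTRe-style training step, assuming a linear query encoder `w` (a hypothetical stand-in for the query model), a fixed precomputed document index, and a softmax cross-entropy over the top-n retrieved documents plus the positive. LTRe's actual loss may differ; the point illustrated is that negatives come from full retrieval over the whole index, and only the query side is updated.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def ltre_step(w: Matrix, query_features: Sequence[float],
              doc_index: Sequence[Sequence[float]], positive: int,
              top_n: int = 2, lr: float = 0.1) -> float:
    """One update of the query encoder; returns the loss before the update."""
    q = matvec(w, query_features)
    # Full retrieval over the entire fixed index: no negative sampling.
    scores = [dot(q, d) for d in doc_index]
    retrieved = sorted(range(len(doc_index)), key=lambda i: -scores[i])[:top_n]
    candidates = retrieved if positive in retrieved else retrieved + [positive]
    m = max(scores[i] for i in candidates)
    exps = {i: math.exp(scores[i] - m) for i in candidates}
    z = sum(exps.values())
    loss = -(scores[positive] - m - math.log(z))  # -log softmax of the positive
    # dLoss/dq = sum_j p_j * d_j - d_positive
    grad_q = [sum(exps[i] / z * doc_index[i][k] for i in candidates)
              - doc_index[positive][k] for k in range(len(q))]
    # Only the query encoder changes; document embeddings stay in the index.
    for r in range(len(w)):
        for c in range(len(query_features)):
            w[r][c] -= lr * grad_q[r] * query_features[c]
    return loss
```

Because the document embeddings never change, the index only has to be built once, which is the source of the training speed-up the abstract reports.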