Huayang Li, Yixuan Su, Deng Cai, Yan Wang, Lemao Liu
Recently, retrieval-augmented text generation has attracted increasing attention
from the computational linguistics community. Compared with conventional
generation models, retrieval-augmented text generation has remarkable
advantages and particularly has achieved state-of-the-art performance in many
NLP tasks. This paper surveys retrieval-augmented text
generation. It first highlights the generic paradigm of retrieval-augmented
generation, and then it reviews notable approaches according to different tasks
including dialogue response generation, machine translation, and other
generation tasks. Finally, it points out some important directions on top of
recent methods to facilitate future research.
Authors' comments: all authors contributed equally
Philippe Weinzaepfel, Thomas Lucas, Diane Larlus, Yannis Kalantidis
Methods that combine local and global features have recently shown excellent
performance on multiple challenging deep image retrieval benchmarks, but their
use of local features raises at least two issues. First, these local features
simply boil down to the localized map activations of a neural network, and
hence can be extremely redundant. Second, they are typically trained with a
global loss that only acts on top of an aggregation of local features; by
contrast, testing is based on local feature matching, which creates a
discrepancy between training and testing. In this paper, we propose a novel
architecture for deep image retrieval, based solely on mid-level features that
we call Super-features. These Super-features are constructed by an iterative
attention module and constitute an ordered set in which each element focuses on
a localized and discriminant image pattern. For training, they require only
image labels. A contrastive loss operates directly at the level of
Super-features and focuses on those that match across images. A second
complementary loss encourages diversity. Experiments on common landmark
retrieval benchmarks validate that Super-features substantially outperform
state-of-the-art methods when using the same number of features, and only
require a significantly smaller memory footprint to match their performance.
Code and models are available at: https://github.com/naver/FIRe.
Authors' comments: ICLR 2022
Samuel Pinilla, Kumar Vijay Mishra, Brian M. Sadler, Henry Arguello
The ability of a radar to discriminate in both range and Doppler velocity is
completely characterized by the ambiguity function (AF) of its transmit
waveform. Mathematically, it is obtained by correlating the waveform with its
Doppler-shifted and delayed replicas. We consider the inverse problem of
designing a radar transmit waveform that satisfies the specified AF magnitude.
This process can be viewed as a signal reconstruction with some variation of
phase retrieval methods. We provide a trust-region algorithm that minimizes a
smoothed non-convex least-squares objective function to iteratively recover the
underlying signal-of-interest for either time- or band-limited support. The
method first approximates the signal using an iterative spectral algorithm and
then refines the attained initialization based upon a sequence of gradient
iterations. Our theoretical analysis shows that unique signal reconstruction is
possible using signal samples no more than thrice the number of signal
frequencies or time samples. Numerical experiments demonstrate that our method
recovers both time- and band-limited signals from even sparsely and randomly
sampled AFs with mean-square-error of $1\times 10^{-6}$ and $9\times 10^{-2}$
for the full noiseless samples and sparse noisy samples, respectively.
Authors' comments: 18 pages, 12 figures, 1 table
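The definition of the ambiguity function given in this abstract (correlating the waveform with its Doppler-shifted and delayed replicas) can be sketched for a short discrete waveform. This is only a minimal illustration and not the authors' trust-region algorithm; the cyclic delay, the sign convention of the Doppler phase, the function name `ambiguity`, and the toy waveform are all assumptions made for the example.

```python
import cmath

def ambiguity(x, delay, doppler):
    """Discrete cyclic ambiguity function of x at one (delay, Doppler) bin.

    A(delay, doppler) = sum_n x[n] * conj(x[(n - delay) mod N])
                               * exp(-2j * pi * doppler * n / N)
    """
    n_samples = len(x)
    total = 0j
    for n in range(n_samples):
        delayed = x[(n - delay) % n_samples]              # delayed replica
        phase = cmath.exp(-2j * cmath.pi * doppler * n / n_samples)  # Doppler shift
        total += x[n] * delayed.conjugate() * phase
    return total

# Hypothetical 4-sample unimodular waveform (illustrative values only).
waveform = [1 + 0j, 1j, -1 + 0j, 1j]
energy = sum(abs(v) ** 2 for v in waveform)

# Only these magnitudes would be available to the inverse (phase retrieval) problem.
af_magnitude = [[abs(ambiguity(waveform, d, f)) for f in range(4)]
                for d in range(4)]
```

At zero delay and zero Doppler the function equals the waveform energy, which is also an upper bound on every other entry of `af_magnitude`.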
Bing Gao
We consider the problem of recovering a signal from the magnitudes of affine
measurements, which is also known as affine phase retrieval. In this
paper, we formulate affine phase retrieval as an optimization problem and
develop a second-order algorithm based on Newton's method to solve it. Besides
being convertible into a phase retrieval problem, affine phase retrieval
has unique advantages in its solution. For example, the linear information
in the observation makes it possible to solve this problem with second-order
algorithms under complex measurements. Another advantage is that our algorithm
doesn't have any special requirements for the initial point, while an
appropriate initial value is essential for most non-convex phase retrieval
algorithms. Starting from zero, our algorithm generates iterates by
Newton's method, and we prove that the algorithm can converge quadratically to
the true signal without any ambiguity for both Gaussian measurements and CDP
measurements. In addition, we also use some numerical simulations to verify the
conclusions and to show the effectiveness of the algorithm.
Authors' comments: 15 pages, 2 figures
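One way to see why recovery "without any ambiguity" is plausible for affine measurements is a small real-valued toy example: plain magnitude measurements cannot distinguish a signal from its negation, whereas adding known offsets breaks that symmetry. The sketch below is not the Newton-type algorithm from the abstract; the function name `magnitudes` and all numerical values are hypothetical.

```python
def magnitudes(vectors, x, offsets):
    """Return |<a_j, x> + b_j| for each measurement vector a_j and offset b_j."""
    return [abs(sum(a_i * x_i for a_i, x_i in zip(a, x)) + b)
            for a, b in zip(vectors, offsets)]

# Hypothetical toy data: three real measurement vectors and offsets.
A = [[1.0, 2.0], [0.5, -1.0], [3.0, 1.0]]
b = [1.0, -2.0, 0.5]
x = [1.0, -0.5]
neg_x = [-v for v in x]

# Classical phase retrieval (no offsets): x and -x give identical data.
plain = magnitudes(A, x, [0.0] * 3)
plain_neg = magnitudes(A, neg_x, [0.0] * 3)

# Affine phase retrieval: the offsets b make the two signals distinguishable.
affine = magnitudes(A, x, b)
affine_neg = magnitudes(A, neg_x, b)
```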
Jianfeng Gao, Chenyan Xiong, Paul Bennett, Nick Craswell
A conversational information retrieval (CIR) system is an information
retrieval (IR) system with a conversational interface which allows users to
interact with the system to seek information via multi-turn conversations of
natural language, in spoken or written form. Recent progress in deep learning
has brought tremendous improvements in natural language processing (NLP) and
conversational AI, leading to a plethora of commercial conversational services
that allow naturally spoken and typed interaction, increasing the need for more
human-centric interactions in IR. As a result, we have witnessed a resurgent
interest in developing modern CIR systems in both research communities and
industry. This book surveys recent advances in CIR, focusing on neural
approaches that have been developed in the last few years. This book is based
on the authors' tutorial at SIGIR'2020 (Gao et al., 2020b), with IR and NLP
communities as the primary target audience. However, readers with other
backgrounds, such as machine learning and human-computer interaction, will also
find it an accessible introduction to CIR. We hope that this book will prove a
valuable resource for students, researchers, and software developers. This
manuscript is a working draft. Comments are welcome.
Authors' comments: Book Draft
Tatiana Latychevskaia
Modern microscopy techniques are developing towards high-resolution imaging, and tremendous progress has been made in past decades; however, the imaging of individual biological macromolecules at atomic resolution using short-wavelength radiation such as electrons or X-rays has not yet been achieved. The construction of free-electron lasers in many countries around the world arises from the desire to develop new imaging techniques by employing coherent radiation to image individual macromolecules. This work deals with coherent imaging and related phase retrieval techniques, with an emphasis on their application in the imaging of individual biological macromolecules.
Sebastian Hofstätter, Sophia Althammer, Mete Sertkan, Allan Hanbury
We present strong Transformer-based re-ranking and dense retrieval baselines
for the recently released TripClick health ad-hoc retrieval collection. We
improve the originally too noisy training data with a simple negative
sampling policy. We achieve large gains over BM25 in the re-ranking task of
TripClick, which were not achieved with the original baselines. Furthermore, we
study the impact of different domain-specific pre-trained models on TripClick.
Finally, we show that dense retrieval outperforms BM25 by considerable margins,
even with simple training procedures.
Authors' comments: Accepted at ECIR 2022
Hai Su, Meiyin Han, Junle Liang, Jun Liang, Songsen Yu
Compared with traditional hashing methods, deep hashing methods generate hash codes with rich semantic information and greatly improve performance in image retrieval. However, current deep hashing methods remain unsatisfactory at predicting the similarity of hard examples. Two main factors affect the ability to learn from hard examples: weak extraction of key features and a shortage of hard examples. In this paper, we propose a novel end-to-end model that extracts key features from hard examples and obtains hash codes with accurate semantic information. In addition, we redesign a hard pair-wise loss function to assess the degree of hardness and update the penalty weights of examples, which effectively alleviates the shortage of hard examples. Experimental results on CIFAR-10 and NUS-WIDE demonstrate that our model outperforms mainstream hashing-based image retrieval methods.
Simion-Vlad Bogolin, Ioana Croitoru, Hailin Jin, Yang Liu, Samuel Albanie
Profiting from large-scale training datasets, advances in neural architecture
design and efficient inference, joint embeddings have become the dominant
approach for tackling cross-modal retrieval. In this work we first show that,
despite their effectiveness, state-of-the-art joint embeddings suffer
significantly from the longstanding "hubness problem" in which a small number
of gallery embeddings form the nearest neighbours of many queries. Drawing
inspiration from the NLP literature, we formulate a simple but effective
framework called Querybank Normalisation (QB-Norm) that re-normalises query
similarities to account for hubs in the embedding space. QB-Norm improves
retrieval performance without requiring retraining. Unlike prior
work, we show that QB-Norm works effectively without concurrent access to any
test set queries. Within the QB-Norm framework, we also propose a novel
similarity normalisation method, the Dynamic Inverted Softmax, that is
significantly more robust than existing approaches. We showcase QB-Norm across
a range of cross-modal retrieval models and benchmarks where it consistently
enhances strong baselines beyond the state of the art. Code is available at
https://vladbogo.github.io/QB-Norm/.
Authors' comments: Accepted at CVPR 2022
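The idea of re-normalising query similarities with a querybank to account for hubs can be sketched with a plain inverted softmax. This is a simplified illustration under stated assumptions, not the Dynamic Inverted Softmax proposed in the paper, whose details the abstract does not give; the function name, the temperature `beta`, and the toy similarity values are hypothetical.

```python
import math

def inverted_softmax(query_sims, querybank_sims, beta=20.0):
    """Re-normalise one query's similarities to every gallery item.

    query_sims[j]        : similarity of the test query to gallery item j
    querybank_sims[i][j] : similarity of querybank query i to gallery item j
    beta                 : assumed inverse temperature (illustrative value)
    """
    normalised = []
    for j, sim in enumerate(query_sims):
        # A gallery item that is close to many querybank queries (a hub)
        # gets a large denominator and is therefore down-weighted.
        hub_mass = sum(math.exp(beta * row[j]) for row in querybank_sims)
        normalised.append(math.exp(beta * sim) / hub_mass)
    return normalised

# Hypothetical similarities: gallery item 0 is a hub for the querybank.
querybank = [[0.9, 0.2, 0.1],
             [0.9, 0.1, 0.3],
             [0.9, 0.3, 0.2]]
query = [0.80, 0.75, 0.10]

raw_top = max(range(3), key=lambda j: query[j])
normalised = inverted_softmax(query, querybank)
qb_top = max(range(3), key=lambda j: normalised[j])
```

In this toy case the raw similarities rank the hub (item 0) first, while the normalised scores rank item 1 first. No retraining of the embedding model is involved, only a rescaling of similarities at query time.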
Young Kyun Jang, Geonmo Gu, Byungsoo Ko, Isaac Kang, Nam Ik Cho
In hash-based image retrieval systems, degraded or transformed inputs usually
generate different codes from the original, deteriorating the retrieval
accuracy. To mitigate this issue, data augmentation can be applied during
training. However, even if augmented samples of an image are similar in real
feature space, the quantization can scatter them far away in Hamming space.
This results in representation discrepancies that can impede training and
degrade performance. In this work, we propose a novel self-distilled hashing
scheme to minimize the discrepancy while exploiting the potential of augmented
data. By transferring the hash knowledge of the weakly-transformed samples to
the strong ones, we make the hash code insensitive to various transformations.
We also introduce hash proxy-based similarity learning and binary cross
entropy-based quantization loss to provide high-quality hash codes. Ultimately,
we construct a deep hashing framework that not only improves the existing deep
hashing approaches, but also achieves state-of-the-art retrieval results.
Extensive experiments confirm the effectiveness of our work.
Authors' comments: ECCV 2022
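The discrepancy described in this abstract, where samples that are close in real feature space end up far apart in Hamming space after quantization, can be reproduced with a toy example. The sketch assumes simple sign-based quantization, which the abstract does not specify, and the feature values are hypothetical; it does not implement the self-distilled hashing scheme itself.

```python
def sign_hash(features):
    """Quantize a real-valued feature vector to a binary code (assumed sign rule)."""
    return [1 if v >= 0 else 0 for v in features]

def hamming(code_a, code_b):
    """Number of positions at which two binary codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))

def euclidean(u, v):
    """Euclidean distance between two real-valued vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

# Hypothetical features of an image and an augmented view, both near zero.
original = [0.02, -0.01, 0.03, -0.02]
augmented = [-0.02, 0.01, -0.03, 0.02]

real_distance = euclidean(original, augmented)           # small
hamming_distance = hamming(sign_hash(original), sign_hash(augmented))  # maximal
```

The two vectors are less than 0.1 apart in feature space, yet every bit of their codes differs.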
Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, Edouard Grave
Recently, information retrieval has seen the emergence of dense retrievers, using neural networks, as an alternative to classical sparse methods based on term-frequency. These models have obtained state-of-the-art results on datasets and tasks where large training sets are available. However, they do not transfer well to new applications with no training data, and are outperformed by unsupervised term-frequency methods such as BM25. In this work, we explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings. On the BEIR benchmark our unsupervised model outperforms BM25 on 11 out of 15 datasets for the Recall@100. When used as pre-training before fine-tuning, either on a few thousand in-domain examples or on the large MS MARCO dataset, our contrastive model leads to improvements on the BEIR benchmark. Finally, we evaluate our approach for multi-lingual retrieval, where training data is even scarcer than for English, and show that our approach leads to strong unsupervised performance. Our model also exhibits strong cross-lingual transfer when fine-tuned on supervised English data only and evaluated on low-resource languages such as Swahili. We show that our unsupervised models can perform cross-lingual retrieval between different scripts, such as retrieving English documents from Arabic queries, which would not be possible with term matching methods.
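The contrastive objective used to train dense retrievers of this kind is commonly written as an InfoNCE loss over one positive and several negative passages. The sketch below assumes that generic form; the temperature value, the function names, and the toy embeddings are hypothetical, and the abstract does not say how positives and negatives are constructed.

```python
import math

def dot(u, v):
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))

def info_nce(query, positive, negatives, temperature=0.05):
    """InfoNCE loss for one query with one positive and several negatives."""
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over all candidates.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(value - max_logit) for value in logits))
    # Negative log-probability assigned to the positive passage.
    return log_denominator - logits[0]

# Hypothetical 2-dimensional embeddings.
q = [1.0, 0.0]
aligned = [0.9, 0.1]
unrelated = [[0.0, 1.0], [-1.0, 0.0]]

good_loss = info_nce(q, aligned, unrelated)                    # positive matches query
bad_loss = info_nce(q, [0.0, 1.0], [aligned, [-1.0, 0.0]])     # positive does not match
```

The loss is near zero when the positive is the closest candidate and grows when a negative is closer to the query than the positive.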
Jianmo Ni, Chen Qu, Jing Lu, Zhuyun Dai, Gustavo Hernández Ábrego, Ji Ma, Vincent Y. Zhao, Yi Luan et al.
It has been shown that dual encoders trained on one domain often fail to generalize to other domains for retrieval tasks. One widespread belief is that the bottleneck layer of a dual encoder, where the final score is simply a dot-product between a query vector and a passage vector, is too limited to make dual encoders an effective retrieval model for out-of-domain generalization. In this paper, we challenge this belief by scaling up the size of the dual encoder model while keeping the bottleneck embedding size fixed. With multi-stage training, surprisingly, scaling up the model size brings significant improvement on a variety of retrieval tasks, especially for out-of-domain generalization. Experimental results show that our dual encoders, Generalizable T5-based dense Retrievers (GTR), significantly outperform existing sparse and dense retrievers on the BEIR dataset (Thakur et al., 2021). Most surprisingly, our ablation study finds that GTR is very data efficient, as it only needs 10% of MS MARCO supervised data to achieve the best out-of-domain performance. All the GTR models are released at https://tfhub.dev/google/collections/gtr/1.
Ori Ram, Gal Shachaf, Omer Levy, Jonathan Berant, Amir Globerson
Dense retrievers for open-domain question answering (ODQA) have been shown to
achieve impressive performance by training on large datasets of
question-passage pairs. In this work we ask whether this dependence on labeled
data can be reduced via unsupervised pretraining that is geared towards ODQA.
We show this is in fact possible, via a novel pretraining scheme designed for
retrieval. Our "recurring span retrieval" approach uses recurring spans across
passages in a document to create pseudo examples for contrastive learning. Our
pretraining scheme directly controls for term overlap across pseudo queries and
relevant passages, thus allowing it to model both lexical and semantic relations
between them. The resulting model, named Spider, performs surprisingly well
without any labeled training examples on a wide range of ODQA datasets.
Specifically, it significantly outperforms all other pretrained baselines in a
zero-shot setting, and is competitive with BM25, a strong sparse baseline.
Moreover, a hybrid retriever over Spider and BM25 improves over both, and is
often competitive with DPR models, which are trained on tens of thousands of
examples. Last, notable gains are observed when using Spider as an
initialization for supervised training.
Authors' comments: NAACL 2022
Hui Wu, Min Wang, Wengang Zhou, Yang Hu, Houqiang Li
In image retrieval, deep local features learned in a data-driven manner have
been shown to improve retrieval performance effectively. To realize
efficient retrieval on large image databases, some approaches quantize deep
local features with a large codebook and match images with an aggregated match
kernel. However, these approaches have non-trivial complexity and a large
memory footprint, which limits their capability to jointly perform feature
learning and aggregation. To generate compact global representations while
maintaining regional matching capability, we propose a unified framework to
jointly learn local feature representation and aggregation. In our framework,
we first extract deep local features using CNNs. Then, we design a tokenizer
module to aggregate them into a few visual tokens, each corresponding to a
specific visual pattern. This helps to remove background noise, and capture
more discriminative regions in the image. Next, a refinement block is
introduced to enhance the visual tokens with self-attention and
cross-attention. Finally, different visual tokens are concatenated to generate
a compact global representation. The whole framework is trained end-to-end with
image-level labels. Extensive experiments are conducted to evaluate our
approach, which outperforms the state-of-the-art methods on the Revisited
Oxford and Paris datasets.
Authors' comments: Our code is available at https://github.com/MCC-WH/Token
Zelu Deng, Yujie Zhong, Sheng Guo, Weilin Huang
This work aims at improving instance retrieval with self-supervision. We find
that fine-tuning using the recently developed self-supervised learning (SSL)
methods, such as SimCLR and MoCo, fails to improve the performance of instance
retrieval. In this work, we identify that the learnt representations for
instance retrieval should be invariant to large variations in viewpoint and
background, whereas self-augmented positives applied by the current SSL
methods cannot provide strong enough signals for learning robust
instance-level representations. To overcome this problem, we propose InsCLR, a
new SSL method that builds on the instance-level contrast, to learn
the intra-class invariance by dynamically mining meaningful pseudo positive
samples from both mini-batches and a memory bank during training. Extensive
experiments demonstrate that InsCLR achieves performance similar to or even
better than state-of-the-art SSL methods on instance retrieval. Code is available
at https://github.com/zeludeng/insclr.
Authors' comments: Accepted by AAAI 2022
Zhenting Luan, Zhenyu Ming, Yuchi Wu, Wei Han, Xiang Chen, Bo Bai, Liping Zhang
Harmonic retrieval (HR) has a wide range of applications in scenarios where signals are modelled as a summation of sinusoids. Past works have developed a number of approaches to recover the original signals. Most of them rely on classical singular value decomposition, which is vulnerable to unexpected outliers. In this paper, we present new decomposition algorithms of third-order complex-valued tensors with $L_1$-principal component analysis ($L_1$-PCA) of complex data and apply them to a novel random access HR model in the presence of outliers. We also develop a novel subcarrier recovery method for the proposed model. Simulations are designed to compare our proposed method with some existing tensor-based algorithms for HR. The results demonstrate the outlier-insensitivity of the proposed method.
Shalev Lifshitz, Abtin Riasatian, H. R. Tizhoosh
Recent advances in digital pathology have led to the need for Histopathology Image Retrieval (HIR) systems that search through databases of biopsy images to find similar cases to a given query image. These HIR systems allow pathologists to effortlessly and efficiently access thousands of previously diagnosed cases in order to exploit the knowledge in the corresponding pathology reports. Since HIR systems may have to deal with millions of gigapixel images, the extraction of compact and expressive image features must be available to allow for efficient and accurate retrieval. In this paper, we propose the application of Gram barcodes as image features for HIR systems. Unlike most feature generation schemes, Gram barcodes are based on high-order statistics that describe tissue texture by summarizing the correlations between different feature maps in layers of convolutional neural networks. We run HIR experiments on three public datasets using a pre-trained VGG19 network for Gram barcode generation and showcase highly competitive results.
Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Jiafeng Guo, Min Zhang, Shaoping Ma
Dense Retrieval (DR) reaches state-of-the-art results in first-stage retrieval, but little is known about the mechanisms that contribute to its success. Therefore, in this work, we conduct an interpretation study of recently proposed DR models. Specifically, we first discretize the embeddings output by the document and query encoders. Based on the discrete representations, we analyze the attribution of input tokens. Both qualitative and quantitative experiments are carried out on public test collections. Results suggest that DR models pay attention to different aspects of input and extract various high-level topic representations. Therefore, we can regard the representations learned by DR models as a mixture of high-level topics.
Yixing Fan, Xiaohui Xie, Yinqiong Cai, Jia Chen, Xinyu Ma, Xiangsheng Li, Ruqing Zhang, Jiafeng Guo
The core of information retrieval (IR) is to identify relevant information from large-scale resources and return it as a ranked list to respond to the user's information need. In recent years, the resurgence of deep learning has greatly advanced this field and led to a hot topic named NeuIR (i.e., neural information retrieval), especially the paradigm of pre-training methods (PTMs). Owing to sophisticated pre-training objectives and huge model size, pre-trained models can learn universal language representations from massive textual data, which are beneficial to the ranking task of IR. Recently, a large number of works dedicated to applying PTMs in IR have been introduced to improve retrieval performance. Considering the rapid progress of this direction, this survey aims to provide a systematic review of pre-training methods in IR. To be specific, we present an overview of PTMs applied in different components of an IR system, including the retrieval component, the re-ranking component, and other components. In addition, we also introduce PTMs specifically designed for IR, and summarize available datasets as well as benchmark leaderboards. Moreover, we discuss some open challenges and highlight several promising directions, with the hope of inspiring and facilitating more work on these topics in future research.
Kamilla Nazirkhanova, Joachim Neu, David Tse
The ability to verifiably retrieve transaction or state data stored off-chain is crucial to blockchain scaling techniques such as rollups or sharding. We formalize the problem and design a storage- and communication-efficient protocol using linear erasure-correcting codes and homomorphic vector commitments. Motivated by application requirements for rollups, our solution Semi-AVID-PR departs from earlier Verifiable Information Dispersal schemes in that we do not require comprehensive termination properties. Compared to Data Availability Oracles, under no circumstance do we fall back to returning empty blocks. Distributing a file of 22 MB among 256 storage nodes, up to 85 of which may be adversarial, requires in total ~70 MB of communication and storage, and ~41 seconds of single-thread runtime (<3 seconds on 16 threads) on an AMD Opteron 6378 processor when using the BLS12-381 curve. Our solution requires no modification to on-chain contracts of Validium rollups such as StarkWare's StarkEx. Additionally, it provides privacy of the dispersed data against honest-but-curious storage nodes. Finally, we discuss an application of our Semi-AVID-PR scheme to data availability verification schemes based on random sampling.
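The role of a linear erasure-correcting code in dispersing a file among storage nodes can be sketched with a small Reed-Solomon-style code over a prime field. This is a generic illustration and not the Semi-AVID-PR protocol: the field size, the function names, and the toy data are assumptions, and the homomorphic vector commitments that make retrieval verifiable are omitted entirely.

```python
P = 257  # small prime field, chosen only for illustration

def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the given (x_i, y_i) points, with arithmetic mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem).
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def encode(data, n):
    """Systematic encoding: the chunk for node x is the data polynomial at x."""
    base = list(enumerate(data))  # data[i] is the polynomial's value at x = i
    return [(x, lagrange_eval(base, x)) for x in range(n)]

def decode(chunks, k):
    """Recover the k data symbols from any k distinct chunks."""
    return [lagrange_eval(chunks[:k], x) for x in range(k)]

# Hypothetical file of k = 3 symbols dispersed among n = 7 nodes.
data = [72, 105, 33]
chunks = encode(data, 7)
survivors = [chunks[6], chunks[4], chunks[1]]  # any 3 of the 7 chunks suffice
recovered = decode(survivors, 3)
```

Because the encoding is linear, any 3 of the 7 chunks determine the data, so up to 4 nodes may be unavailable in this toy setting.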