Yukang Gan, Yixiao Ge, Chang Zhou, Shupeng Su, Zhouchuan Xu, Xuyuan Xu, Quanchao Hui, Xiang Chen et al.
Large-scale embedding-based retrieval (EBR) is the cornerstone of search-related industrial applications. Given a user query, an EBR system aims to identify relevant information from a large corpus of documents that may be tens or hundreds of billions in size. Storage and computation become expensive and inefficient with massive document collections and highly concurrent queries, making it difficult to scale up further. To tackle this challenge, we propose a binary embedding-based retrieval (BEBR) engine equipped with a recurrent binarization algorithm that enables customized bits per dimension. Specifically, we compress the full-precision query and document embeddings, generally formulated as float vectors, into a composition of multiple binary vectors using a lightweight transformation model with residual multilayer perceptron (MLP) blocks. We can therefore tailor the number of bits for different applications to trade off accuracy loss and cost savings. Importantly, we enable task-agnostic efficient training of the binarization model using a new embedding-to-embedding strategy. We also exploit the compatible training of binary embeddings so that the BEBR engine can support indexing among multiple embedding versions within a unified system. To further realize efficient search, we propose Symmetric Distance Calculation (SDC) to achieve lower response time than Hamming codes. We have successfully deployed BEBR in Tencent products, including Sogou, Tencent Video, QQ World, etc. The binarization algorithm can be seamlessly generalized to various tasks with multiple modalities. Extensive experiments on offline benchmarks and online A/B tests demonstrate the efficiency and effectiveness of our method, saving 30%~50% of index costs with almost no loss of accuracy at the system level.
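The idea of representing a float vector as a composition of several binary vectors can be illustrated with a minimal sketch. This is not the BEBR model itself: the learned residual MLP blocks are replaced here by a simple greedy residual scheme, and the per-level scale (the mean absolute residual) and the function names are assumptions made only for illustration.

```python
def residual_binarize(vec, num_levels):
    """Greedily encode vec as num_levels sign vectors, each with one scale.

    Assumption: the scale of each level is the mean absolute residual,
    which is the least-squares optimal scale for a sign code.
    """
    residual = list(vec)
    codes, scales = [], []
    for _ in range(num_levels):
        scale = sum(abs(x) for x in residual) / len(residual)
        code = [1 if x >= 0 else -1 for x in residual]
        # Subtract what this level explains; the next level encodes the rest.
        residual = [x - scale * c for x, c in zip(residual, code)]
        codes.append(code)
        scales.append(scale)
    return codes, scales


def reconstruct(codes, scales):
    """Sum the scaled binary vectors back into a float vector."""
    out = [0.0] * len(codes[0])
    for code, scale in zip(codes, scales):
        for i, c in enumerate(code):
            out[i] += scale * c
    return out


def squared_error(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

Adding levels spends more bits per dimension and can only reduce the reconstruction error in this scheme, which mirrors the accuracy-versus-cost trade-off described in the abstract.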
Jinkuan Zhu, Hao Huang, Qiao Deng, Xiyao Li
The fashion image retrieval task aims to find clothing items in a gallery that are relevant to a query image. Previous methods focus on designing different distance-based loss functions, pulling relevant pairs close together and pushing irrelevant images apart. However, these methods ignore fine-grained features (e.g. neckband, cuff) of clothing images. In this paper, we propose a novel fashion image retrieval method leveraging both global and fine-grained features, dubbed Multi-Granular Alignment (MGA). Specifically, we design a Fine-Granular Aggregator (FGA) to capture and aggregate detailed patterns. Then we propose Attention-based Token Alignment (ATA) to align image features at the multi-granular level in a coarse-to-fine manner. To demonstrate the effectiveness of our proposed method, we conduct experiments on two sub-tasks (In-Shop & Consumer2Shop) of the public fashion dataset DeepFashion. The experimental results show that our MGA outperforms the state-of-the-art methods by 1.8% and 0.6% in the two sub-tasks on the R@1 metric, respectively.
Igor O. Zavadskyi
The variable-length Reverse Multi-Delimiter (RMD) codes are known to
represent sequences of unbounded and unordered integers. When applied to data
compression, they combine a good compression ratio with fast decoding. In this
paper, we investigate another property of RMD codes: the ability to directly
access codewords in the encoded bitstream. We present a method that allows us
to extract and decode a codeword from an RMD bitstream in almost constant time
with a tiny space overhead, and we conduct experiments on its application to
natural language text compression.
Authors' comments: 18 pages, 5 figures, 2 algorithms, 1 table
Yanwen Fang, Yuxi Cai, Jintai Chen, Jingyu Zhao, Guangjian Tian, Guodong Li
More and more evidence has shown that strengthening layer interactions can
enhance the representation power of a deep neural network, while self-attention
excels at learning interdependencies by retrieving query-activated information.
Motivated by this, we devise a cross-layer attention mechanism, called
multi-head recurrent layer attention (MRLA), that sends a query representation
of the current layer to all previous layers to retrieve query-related
information from different levels of receptive fields. A lightweight version
of MRLA is also proposed to reduce the quadratic computation cost. The proposed
layer attention mechanism can enrich the representation power of many
state-of-the-art vision networks, including CNNs and vision transformers. Its
effectiveness has been extensively evaluated in image classification, object
detection and instance segmentation tasks, where improvements can be
consistently observed. For example, our MRLA can improve Top-1 accuracy on
ResNet-50 by 1.6%, while only introducing 0.16M parameters and 0.07B FLOPs.
Surprisingly, it can boost performance by a large margin of 3-4% box AP
and mask AP in dense prediction tasks. Our code is available at
https://github.com/joyfang1106/MRLA.
Authors' comments: Published as a conference paper at ICLR 2023
Peter Vouras, Kumar Vijay Mishra, Alexandra Artusio-Glimpse
Rydberg-aided atomic electrometry using alkali-metal atoms is gaining increased research interest for detecting external electric fields. However, the inability of Rydberg probes to detect phase is a serious impediment to their realistic deployment. In this paper, we derive a novel phase retrieval algorithm for use in phased array or synthetic aperture applications where only measurements of electric field intensity are possible at each spatial sample. These array configurations arise if a Rydberg atom probe is used in place of an antenna. We employ three-stage alternating projections to solve the resulting optimization problem. Our numerical experiments demonstrate the effectiveness of the proposed algorithm in terms of beamformed array output.
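For readers unfamiliar with alternating projections in phase retrieval, the following minimal sketch shows the classic two-set error-reduction iteration. It is a swapped-in textbook technique, not the three-stage algorithm of the paper: the Fourier measurement model, the known-support constraint, and all function names are assumptions chosen to keep the example self-contained.

```python
import cmath
import math


def dft(x, inverse=False):
    """Unitary discrete Fourier transform of a list of complex numbers."""
    n = len(x)
    sign = 1 if inverse else -1
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                    for t in range(n))
        for k in range(n)
    ]


def magnitude_error(x, measured):
    """Distance between the Fourier magnitudes of x and the measurements."""
    return math.sqrt(sum((abs(g) - m) ** 2 for g, m in zip(dft(x), measured)))


def error_reduction(measured, support, x0, num_iters):
    """Alternate between the magnitude constraint and the support constraint."""
    x = [v if keep else 0j for v, keep in zip(x0, support)]
    errors = [magnitude_error(x, measured)]
    for _ in range(num_iters):
        spectrum = dft(x)
        # Projection 1: keep the current phase, impose the measured magnitude.
        spectrum = [m * (g / abs(g)) if abs(g) > 0 else complex(m)
                    for g, m in zip(spectrum, measured)]
        x = dft(spectrum, inverse=True)
        # Projection 2: zero every sample outside the assumed support.
        x = [v if keep else 0j for v, keep in zip(x, support)]
        errors.append(magnitude_error(x, measured))
    return x, errors
```

The recorded magnitude error does not increase from one iteration to the next, although the iteration may stagnate at a point that does not match the true signal.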
Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton-Brown, Yoav Shoham
Retrieval-Augmented Language Modeling (RALM) methods, which condition a
language model (LM) on relevant documents from a grounding corpus during
generation, were shown to significantly improve language modeling performance.
In addition, they can mitigate the problem of factually inaccurate text
generation and provide a natural source attribution mechanism. Existing RALM
approaches focus on modifying the LM architecture in order to facilitate the
incorporation of external information, significantly complicating deployment.
This paper considers a simple alternative, which we dub In-Context RALM:
leaving the LM architecture unchanged and prepending grounding documents to the
input, without any further training of the LM. We show that In-Context RALM
that builds on off-the-shelf general purpose retrievers provides surprisingly
large LM gains across model sizes and diverse corpora. We also demonstrate that
the document retrieval and ranking mechanism can be specialized to the RALM
setting to further boost performance. We conclude that In-Context RALM has
considerable potential to increase the prevalence of LM grounding, particularly
in settings where a pretrained LM must be used without modification or even via
API access.
Authors' comments: Accepted for publication in Transactions of the Association for
Computational Linguistics (TACL). pre-MIT Press publication version
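The core mechanism described above, prepending retrieved grounding documents to the LM input without changing the model, can be sketched as follows. The term-overlap retriever is a toy stand-in for an off-the-shelf retriever, and the function names and the blank-line separator are assumptions for illustration, not details from the paper.

```python
def tokenize(text):
    """Lowercase whitespace tokenization (a deliberately crude assumption)."""
    return text.lower().split()


def retrieve(query, corpus, top_k=1):
    """Rank documents by the number of distinct terms shared with the query."""
    query_terms = set(tokenize(query))
    ranked = sorted(
        corpus,
        key=lambda doc: len(query_terms & set(tokenize(doc))),
        reverse=True,
    )
    return ranked[:top_k]


def build_in_context_input(prefix, corpus, top_k=1):
    """Prepend the retrieved documents to the prefix the LM should continue."""
    documents = retrieve(prefix, corpus, top_k)
    return "\n\n".join(documents + [prefix])
```

The resulting string would be passed unchanged to any language model, including one reachable only through an API, which is the deployment setting the abstract highlights.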
Sushuang Ma, Yuichi Ito, Ahmed Faris Al-Refaie, Quentin Changeat, Billy Edwards, Giovanna Tinetti
In this paper, we present YunMa, an exoplanet cloud simulation and retrieval
package, which enables the study of cloud microphysics and radiative properties
in exoplanetary atmospheres. YunMa simulates the vertical distribution and
sizes of cloud particles and their corresponding scattering signature in
transit spectra. We validated YunMa against results from the literature. When
coupled to the TauREx 3 platform, an open Bayesian framework for spectral
retrievals, YunMa enables the retrieval of the cloud properties and parameters
from transit spectra of exoplanets. The sedimentation efficiency
($f_{\mathrm{sed}}$), which controls the cloud microphysics, is set as a free
parameter in retrievals. We assess the retrieval performances of YunMa through
28 instances of a K2-18 b-like atmosphere with different fractions of H$_2$/He
and N$_2$, and assuming water clouds. Our results show a substantial
improvement in retrieval performances when using YunMa instead of a simple
opaque cloud model and highlight the need to include cloud radiative transfer
and microphysics to interpret the next-generation data for exoplanet
atmospheres. This work also inspires instrumental development for future
flagships by demonstrating retrieval performances with different data quality.
Authors' comments: 24 pages, 12 figures, accepted in ApJ
Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih
We introduce REPLUG, a retrieval-augmented language modeling framework that treats the language model (LM) as a black box and augments it with a tuneable retrieval model. Unlike prior retrieval-augmented LMs that train language models with special cross attention mechanisms to encode the retrieved text, REPLUG simply prepends retrieved documents to the input for the frozen black-box LM. This simple design can be easily applied to any existing retrieval and language models. Furthermore, we show that the LM can be used to supervise the retrieval model, which can then find documents that help the LM make better predictions. Our experiments demonstrate that REPLUG with the tuned retriever significantly improves the performance of GPT-3 (175B) on language modeling by 6.3%, as well as the performance of Codex on five-shot MMLU by 5.1%.
Kailash A. Hambarde, Hugo Proenca
In this paper, we provide a detailed overview of the models used for information retrieval in the first and second stages of the typical processing chain. We discuss the current state-of-the-art models, including term-based, semantic retrieval, and neural methods. Additionally, we delve into the key topics related to the learning process of these models. In this way, the survey offers a comprehensive understanding of the field and is of interest to researchers and practitioners entering or working in the information retrieval domain.
Bo Fang, Wenhao Wu, Chang Liu, Yu Zhou, Yuxin Song, Weiping Wang, Xiangbo Shu, Xiangyang Ji et al.
With the explosive growth of web videos and emerging large-scale
vision-language pre-training models, e.g., CLIP, retrieving videos of interest
with text instructions has attracted increasing attention. A common practice is
to transfer text-video pairs to the same embedding space and craft cross-modal
interactions with certain entities in specific granularities for semantic
correspondence. Unfortunately, the intrinsic uncertainties of optimal entity
combinations in appropriate granularities for cross-modal queries are
understudied, which is especially critical for modalities with hierarchical
semantics such as video and text. In this paper, we propose an
Uncertainty-Adaptive Text-Video Retrieval approach, termed UATVR, which models
each look-up as a distribution matching procedure. Concretely, we add
learnable tokens in the encoders to adaptively aggregate
multi-grained semantics for flexible high-level reasoning. In the refined
embedding space, we represent text-video pairs as probabilistic distributions
where prototypes are sampled for matching evaluation. Comprehensive experiments
on four benchmarks justify the superiority of our UATVR, which achieves new
state-of-the-art results on MSR-VTT (50.8%), VATEX (64.5%), MSVD (49.7%), and
DiDeMo (45.8%). The code is available at https://github.com/bofang98/UATVR.
Authors' comments: To appear at ICCV2023
Malavika Vasist, François Rozet, Olivier Absil, Paul Mollière, Evert Nasedkin, Gilles Louppe
Retrieving the physical parameters from spectroscopic observations of
exoplanets is key to understanding their atmospheric properties. Exoplanetary
atmospheric retrievals are usually based on approximate Bayesian inference and
rely on sampling-based approaches to compute parameter posterior distributions.
Accurate or repeated retrievals, however, can result in very long computation
times due to the sequential nature of sampling-based algorithms. We aim to
amortize exoplanetary atmospheric retrieval using neural posterior estimation
(NPE), a simulation-based inference algorithm based on variational inference
and normalizing flows. In this way, we aim (i) to strongly reduce inference
time, (ii) to scale inference to complex simulation models with many nuisance
parameters or intractable likelihood functions, and (iii) to enable the
statistical validation of the inference results. We evaluate NPE on
petitRADTRANS, a radiative transfer model for exoplanet spectra, including the
effects of scattering and clouds. We train a neural autoregressive flow to quickly
estimate posteriors and compare against retrievals computed with MultiNest. NPE
produces accurate posterior approximations while reducing inference time down
to a few seconds. We demonstrate the computational faithfulness of our
posterior approximations using inference diagnostics including posterior
predictive checks and coverage, taking advantage of the quasi-instantaneous
inference time of NPE. Our analysis confirms the reliability of the approximate
posteriors produced by NPE. The accuracy and reliability of the inference
results produced by NPE establish it as a promising approach for atmospheric
retrievals. Amortization of the posterior inference makes repeated inference on
several observations computationally inexpensive since it does not require
on-the-fly simulations, making the retrieval efficient, scalable, and testable.
Authors' comments: The paper has been submitted to A&A after a final revision
Fahimeh Arabyani-Neyshaburi, Ali Akbar Arefijamaal, Rajab Ali Kamyabi-Gol
Recovering a signal up to a unimodular constant from the magnitudes of linear measurements has been popular and well studied in recent years. However, numerous unsolved problems regarding phase retrieval still exist. Given a phase retrieval frame, can the family of phase retrieval dual frames be classified? Is such a family dense in the set of dual frames? Can we present equivalent conditions for a family of vectors to do weak phase retrieval in the complex Hilbert space case? What is the connection between phase, weak phase and norm retrieval? In this context, we aim to address these open problems concerning phase retrieval dual frames and weak phase retrieval frames, and in particular investigate equivalent conditions for identifying these features. We provide some characterizations of alternate dual frames of a phase retrieval frame which yield phase retrieval in finite-dimensional Hilbert spaces. Moreover, for some classes of frames, we show that the family of phase retrieval dual frames is open and dense in the set of dual frames. Then, we study the weak phase retrieval problem. Among other things, we obtain some equivalent conditions on a family of vectors to do phase retrieval in terms of weak phase retrieval.
Nam Le Hai, Thomas Gerald, Thibault Formal, Jian-Yun Nie, Benjamin Piwowarski, Laure Soulier
Conversational search is a difficult task as it aims at retrieving documents
based not only on the current user query but also on the full conversation
history. Most of the previous methods have focused on a multi-stage ranking
approach relying on query reformulation, a critical intermediate step that
might lead to a sub-optimal retrieval. Other approaches have tried to use a
fully neural IR first-stage, but are either zero-shot or rely on full
learning-to-rank based on a dataset with pseudo-labels. In this work,
leveraging the CANARD dataset, we propose an innovative lightweight learning
technique to train a first-stage ranker based on SPLADE. By relying on SPLADE
sparse representations, we show that, when combined with a second-stage ranker
based on T5Mono, the results are competitive on the TREC CAsT 2020 and 2021
tracks.
Authors' comments: Accepted at ECIR 2023
Wedad Alharbi, Daniel Freeman, Dorsa Ghoreishi, Claire Lois, Shanea Sebastian
A frame $(x_j)_{j\in J}$ for a Hilbert space $H$ is said to do phase
retrieval if for all distinct vectors $x,y\in H$ the magnitude of the frame
coefficients $(|\langle x, x_j\rangle|)_{j\in J}$ and $(|\langle y,
x_j\rangle|)_{j\in J}$ distinguish $x$ from $y$ (up to a unimodular scalar). A
frame which does phase retrieval is said to do $C$-stable phase retrieval if
the recovery of any vector $x\in H$ from the magnitude of the frame
coefficients is $C$-Lipschitz. It is known that if a frame does stable phase
retrieval then any sufficiently small perturbation of the frame vectors will do
stable phase retrieval, though with a slightly worse stability constant. We
provide new quantitative bounds on how the stability constant for phase
retrieval is affected by a small perturbation of the frame vectors. These
bounds are significant in that they are independent of the dimension of the
Hilbert space and the number of vectors in the frame.
Authors' comments: 14 pages
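The $C$-Lipschitz condition described above is commonly written as the following inequality. This is stated as an assumption about the intended formalization, using the $\ell_2$ norm on the coefficient side, rather than as a quotation from the paper:

$$
\min_{|\lambda|=1} \|x - \lambda y\|_H \;\le\; C \Big( \sum_{j\in J} \big|\, |\langle x, x_j\rangle| - |\langle y, x_j\rangle| \,\big|^2 \Big)^{1/2} \quad \text{for all } x, y \in H.
$$

The left-hand side is the distance between $x$ and $y$ up to a unimodular scalar, and the right-hand side is $C$ times the distance between their magnitude measurements, so a smaller $C$ means a more stable recovery.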
Yucheng Zhou, Tao Shen, Xiubo Geng, Chongyang Tao, Guodong Long, Can Xu, Daxin Jiang
Long document retrieval aims to fetch query-relevant documents from a
large-scale collection, where knowledge distillation has become the de facto
approach to improving a retriever by mimicking a heterogeneous yet powerful
cross-encoder.
However, in contrast to passages or sentences, retrieval on long documents
suffers from the scope hypothesis that a long document may cover multiple
topics. This maximizes their structure heterogeneity and poses a
granular-mismatch issue, leading to an inferior distillation efficacy. In this
work, we propose a new learning framework, fine-grained distillation (FGD), for
long-document retrievers. While preserving the conventional dense retrieval
paradigm, it first produces globally consistent representations across
different fine granularities and then applies multi-granular aligned distillation
only during training. In experiments, we evaluate our framework on two
long-document retrieval benchmarks, which show state-of-the-art performance.
Authors' comments: 13 pages, 5 figures, 5 tables
Dong Li, Yelong Shen, Ruoming Jin, Yi Mao, Kuan Wang, Weizhu Chen
Pre-trained language models have achieved promising success in code retrieval tasks, where a natural language documentation query is given to find the most relevant existing code snippet. However, existing models focus only on optimizing the documentation-code pairs by embedding them into latent space, without the association of external knowledge. In this paper, we propose a generation-augmented query expansion framework. Inspired by the human retrieval process of sketching an answer before searching, we use a powerful code generation model to benefit the code retrieval task. Specifically, we demonstrate that rather than merely retrieving the target code snippet according to the documentation query, it is helpful to augment the documentation query with its generation counterpart, i.e., code snippets generated by the code generation model. To the best of our knowledge, this is the first attempt to leverage a code generation model to enhance the code retrieval task. We achieve new state-of-the-art results on the CodeSearchNet benchmark and surpass the baselines significantly.
Jing Lu, Keith Hall, Ji Ma, Jianmo Ni
We present Hybrid Infused Reranking for Passages Retrieval (HYRR), a framework for training rerankers based on a hybrid of BM25 and neural retrieval models. Retrievers based on hybrid models have been shown to outperform both BM25 and neural models alone. Our approach exploits this improved performance when training a reranker, leading to a robust reranking model. The reranker, a cross-attention neural model, is shown to be robust to different first-stage retrieval systems, achieving better performance than rerankers simply trained upon the first-stage retrievers in the multi-stage systems. We present evaluations on a supervised passage retrieval task using MS MARCO and zero-shot retrieval tasks using BEIR. The empirical results show strong performance on both evaluations.
David Uthus, Jianmo Ni
Evaluating automatically generated text summaries is a challenging task. While there have been many interesting approaches, they still fall short of human evaluations. We present RISE, a new approach for evaluating summaries by leveraging techniques from information retrieval. RISE is first trained as a retrieval task using a dual-encoder retrieval setup, and can then be used for evaluating a generated summary given an input document, without gold reference summaries. RISE is especially well suited when working on new datasets where one may not have reference summaries available for evaluation. We conduct comprehensive experiments on the SummEval benchmark (Fabbri et al., 2021) and the results show that RISE has higher correlation with human evaluations compared to many past approaches to summarization evaluation. Furthermore, RISE also demonstrates data-efficiency and generalizability across languages.
Riya Gupta, C. V. Jawahar
Extracting the relevant information out of a large number of documents is a
challenging and tedious task. The quality of results generated by the
traditionally available full-text search engine and text-based image retrieval
systems is not optimal. Information retrieval (IR) tasks become more
challenging with nontraditional language scripts, as in the case of Indic
scripts. The authors have developed an OCR (Optical Character Recognition) Search
Engine to build an Information Retrieval & Extraction (IRE) system that
replicates the current state-of-the-art methods using IRE and Natural
Language Processing (NLP) techniques. Here we present a study of the
methods used for performing search and retrieval tasks. The details of this
system, along with the statistics of the dataset (source: National Digital
Library of India or NDLI), are also presented. Additionally, ideas to
further explore and add value to research in IRE are also discussed.
Authors' comments: 6 pages including references, 5 figures, and 1 table. For project
page see
https://cvit.iiit.ac.in/research/projects/cvit-projects/retrieval-from-large-document-image-collections
Yookoon Park, Mahmoud Azab, Bo Xiong, Seungwhan Moon, Florian Metze, Gourab Kundu, Kirmani Ahmed
Cross-modal contrastive learning has led the recent advances in multimodal
retrieval with its simplicity and effectiveness. In this work, however, we
reveal that cross-modal contrastive learning suffers from incorrect
normalization of the sum retrieval probabilities of each text or video
instance. Specifically, we show that many test instances are either over- or
under-represented during retrieval, significantly hurting the retrieval
performance. To address this problem, we propose Normalized Contrastive
Learning (NCL) which utilizes the Sinkhorn-Knopp algorithm to compute the
instance-wise biases that properly normalize the sum retrieval probabilities of
each instance so that every text and video instance is fairly represented
during cross-modal retrieval. Empirical study shows that NCL brings consistent
and significant gains in text-video retrieval on different model architectures,
with new state-of-the-art multimodal retrieval metrics on the ActivityNet,
MSVD, and MSR-VTT datasets without any architecture engineering.
Authors' comments: Published in EMNLP 2022
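The normalization step described above can be illustrated with a minimal Sinkhorn-Knopp sketch on a square text-video similarity matrix. This is not the NCL training procedure: the exponential kernel, the fixed iteration count, and the function name are assumptions, and the multiplicative scales computed here correspond to the instance-wise biases only in the sense that a bias is the logarithm of a scale.

```python
import math


def sinkhorn_knopp(similarity, num_iters=200):
    """Rescale exp(similarity) so that every row and column sums to one.

    Returns the matrix P[i][j] = row_scale[i] * exp(s[i][j]) * col_scale[j].
    Assumes a square matrix, so both marginals can equal one.
    """
    n = len(similarity)
    kernel = [[math.exp(s) for s in row] for row in similarity]
    row_scale = [1.0] * n
    col_scale = [1.0] * n
    for _ in range(num_iters):
        # Fix the row sums given the current column scales.
        row_scale = [
            1.0 / sum(kernel[i][j] * col_scale[j] for j in range(n))
            for i in range(n)
        ]
        # Fix the column sums given the updated row scales.
        col_scale = [
            1.0 / sum(kernel[i][j] * row_scale[i] for i in range(n))
            for j in range(n)
        ]
    return [
        [row_scale[i] * kernel[i][j] * col_scale[j] for j in range(n)]
        for i in range(n)
    ]
```

After normalization no text or video instance receives more or less total retrieval probability than any other, which is the fairness property the abstract refers to.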