Guoshan Liu, Hailong Yin, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, Yu-Gang Jiang
Given the potential applications of generating recipes from food images, this
area has garnered significant attention from researchers in recent years.
Existing work on recipe generation primarily uses a two-stage training
method, first generating ingredients and then obtaining instructions from both
the image and ingredients. Large Multi-modal Models (LMMs), which have achieved
notable success across a variety of vision and language tasks, shed light on
generating both ingredients and instructions directly from images.
Nevertheless, LMMs still face the common issue of hallucinations during recipe
generation, leading to suboptimal performance. To tackle this, we propose a
retrieval augmented large multimodal model for recipe generation. We first
introduce Stochastic Diversified Retrieval Augmentation (SDRA) to retrieve
recipes semantically related to the image from an existing datastore as a
supplement, integrating them into the prompt to add diverse and rich context to
the input image. Additionally, a Self-Consistency Ensemble Voting mechanism is
proposed to select the most confidently predicted recipe as the final
output. It measures the consistency among generated recipe candidates, each of
which uses a different set of retrieved recipes as context for generation. Extensive
experiments validate the effectiveness of our proposed method, which
demonstrates state-of-the-art (SOTA) performance in recipe generation tasks on
the Recipe1M dataset.
Authors' comments: Accepted at the IEEE/CVF Winter Conference on Applications of Computer
Vision (WACV) 2025
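The voting step can be pictured as choosing the candidate that agrees most with its siblings. This is a minimal illustration under stated assumptions, not the authors' implementation: each candidate is assumed to have been generated with a different retrieved-recipe context, and word-set Jaccard similarity is a hypothetical stand-in for the paper's consistency measure.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity (assumed stand-in for the paper's consistency measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def self_consistency_vote(candidates: List[str]) -> int:
    """Index of the candidate whose mean similarity to all other candidates is highest."""
    if len(candidates) < 2:
        return 0
    best_idx, best_score = 0, float("-inf")
    for i, cand in enumerate(candidates):
        others = [jaccard(cand, o) for j, o in enumerate(candidates) if j != i]
        score = sum(others) / len(others)
        if score > best_score:  # ties keep the earlier candidate
            best_idx, best_score = i, score
    return best_idx


# Each candidate is assumed to come from a prompt with a different retrieved-recipe context.
candidates = [
    "boil pasta add tomato sauce and basil",
    "boil pasta add tomato sauce and cheese",
    "bake chocolate cake with flour and sugar",
]
print(candidates[self_consistency_vote(candidates)])
```

Any pairwise similarity (for example, an embedding cosine) could be swapped into `jaccard` without changing the voting logic; the outlier recipe is rejected because it agrees with nobody.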
Millicent Li, Tongfei Chen, Benjamin Van Durme, Patrick Xia
Document retrieval for tasks such as search and retrieval-augmented
generation typically involves datasets that are unstructured: free-form text
without explicit internal structure in each document. However, documents can
have a structured form, consisting of fields such as an article title, message
body, or HTML header. To address this gap, we introduce Multi-Field Adaptive
Retrieval (MFAR), a flexible framework that accommodates any number and any
type of document indices on structured data. Our framework consists of two main
steps: (1) the decomposition of an existing document into fields, each indexed
independently through dense and lexical methods, and (2) learning a model which
adaptively predicts the importance of a field by conditioning on the query,
allowing on-the-fly weighting of the most likely field(s). We find that
our approach allows for the optimized use of dense versus lexical
representations across field types, significantly improves document ranking
over a number of existing retrievers, and achieves state-of-the-art performance
for multi-field structured data.
Authors' comments: ICLR 2025, Spotlight
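The adaptive weighting in step (2) amounts to a query-conditioned mixture of per-field scores. In this sketch the key names, the per-key weight vectors, and the dot-product-plus-softmax weighting are all hypothetical; the abstract only states that field importance is predicted from the query and that fields are indexed with dense and lexical methods.

```python
import math
from typing import Dict, List


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mfar_score(query_vec: List[float], field_scores: Dict[str, float],
               weight_vecs: Dict[str, List[float]]) -> float:
    """Query-conditioned weighted sum of per-(field, index) scores for one document.

    Keys such as "title/dense" or "body/lexical" name one field indexed one way;
    weight_vecs holds a hypothetical learned vector per key, and the logit for a
    key is its dot product with the query embedding.
    """
    keys = sorted(field_scores)
    logits = [sum(w * q for w, q in zip(weight_vecs[k], query_vec)) for k in keys]
    return sum(w * field_scores[k] for w, k in zip(softmax(logits), keys))


weight_vecs = {"title/dense": [2.0, 0.0], "body/lexical": [0.0, 2.0]}
doc = {"title/dense": 0.9, "body/lexical": 0.1}
title_query = mfar_score([1.0, 0.0], doc, weight_vecs)  # ≈ 0.80
body_query = mfar_score([0.0, 1.0], doc, weight_vecs)   # ≈ 0.20
```

With these toy numbers the same document scores high for a title-oriented query and low for a body-oriented one, which is the on-the-fly field weighting the abstract describes.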
Mohamed Nomeir, Pasan Dissanayake, Shreya Meel, Sanghamitra Dutta, Sennur Ulukus
Transparency and explainability are two extremely important aspects to be considered when employing black-box machine learning models in high-stakes applications. Providing counterfactual explanations is one way of catering to this requirement. However, this also poses a threat to the privacy of both the institution that is providing the explanation and the user who is requesting it. In this work, we propose multiple schemes inspired by private information retrieval (PIR) techniques which ensure the \emph{user's privacy} when retrieving counterfactual explanations. We present a scheme which retrieves the \emph{exact} nearest neighbor counterfactual explanation from a database of accepted points while achieving perfect (information-theoretic) privacy for the user. While the scheme achieves perfect privacy for the user, some leakage about the database is inevitable, which we quantify using a mutual information based metric. Furthermore, we propose strategies to reduce this leakage and achieve a higher degree of database privacy. We extend these schemes to incorporate the user's preferences on transforming their attributes, so that a more actionable explanation can be received. Since our schemes rely on finite field arithmetic, we empirically validate them on real datasets to understand the trade-off between accuracy and finite field size.
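To make information-theoretic user privacy concrete, here is the classic two-server XOR construction of private information retrieval (Chor et al.). It is a different, simpler technique than the authors' finite-field schemes and performs no nearest-neighbor search; it assumes the database of integer records is replicated on two servers that do not collude.

```python
import secrets
from typing import List


def server_answer(db: List[int], query: List[int]) -> int:
    """A server XORs together every record whose query bit is 1."""
    answer = 0
    for record, bit in zip(db, query):
        if bit:
            answer ^= record
    return answer


def pir_retrieve(db: List[int], index: int) -> int:
    """Fetch db[index] from two non-colluding replicas of db."""
    # Server 1 gets a uniformly random subset of positions.
    q1 = [secrets.randbits(1) for _ in db]
    # Server 2 gets the same subset with the target position toggled;
    # on its own this vector is also uniformly random.
    q2 = list(q1)
    q2[index] ^= 1
    # The two subsets differ only at `index`, so XOR-ing the answers leaves db[index].
    return server_answer(db, q1) ^ server_answer(db, q2)


db = [0b1011, 0b0110, 0b1110, 0b0001]  # toy records
print(pir_retrieve(db, 2))
```

Each query alone is a uniformly random bit vector, so neither server learns `index`. The user, however, also sees the XOR of a random subset of records in each answer, a small example of the database-side leakage the abstract refers to.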
Michael Itzhaki
This paper introduces a novel method for compressing palindromic structures in strings, establishing upper and lower bounds for their efficient representation. We uncover a new relationship between the Lempel-Ziv parsing of palindromes and the alphabet size, and leverage this insight to propose a compression algorithm that improves the space efficiency of the Manacher array. Additionally, we present a data structure capable of storing all maximal palindromes (the Manacher array) in sublinear space with near-optimal access time. Our approach reduces the memory overhead of storing palindromes, offering a new avenue for optimizing compression algorithms in text processing applications.
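For reference, the Manacher array itself can be computed in linear time with Manacher's algorithm. The sketch below builds the standard uncompressed array (one integer per center of the separator-padded string); the paper's compressed, sublinear-space representation is not reproduced here.

```python
from typing import List


def manacher(s: str) -> List[int]:
    """Manacher array of s.

    Entry i is the length of the maximal palindrome centred at position i of
    t = "#" + "#".join(s) + "#", measured in characters of s (odd-length
    centres sit on letters, even-length centres on separators).
    """
    t = "#" + "#".join(s) + "#"
    n = len(t)
    p = [0] * n
    center = right = 0  # rightmost palindrome found so far spans t[2*center-right : right+1]
    for i in range(n):
        if i < right:
            # Reuse the mirrored radius, capped at the known right boundary.
            p[i] = min(right - i, p[2 * center - i])
        # Extend the palindrome around i while the characters keep matching.
        while i - p[i] - 1 >= 0 and i + p[i] + 1 < n and t[i - p[i] - 1] == t[i + p[i] + 1]:
            p[i] += 1
        if i + p[i] > right:
            center, right = i, i + p[i]
    return p


print(manacher("aba"))
```

Stored naively this takes $2n+1$ integers; the abstract's contribution is to store the same information in sublinear space.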
Weinan Zhang, Junwei Liao, Ning Li, Kounianhua Du, Jianghao Lin
Since the 1970s, information retrieval (IR) has been defined as the
process of acquiring relevant information items from a pre-defined corpus to
satisfy user information needs. Traditional IR systems, while effective in
domains like web search, are constrained by their reliance on static,
pre-defined information items. To this end, this paper introduces agentic
information retrieval (Agentic IR), a transformative next-generation paradigm
for IR driven by large language models (LLMs) and AI agents. The central shift
in agentic IR is the evolving definition of ``information'' from static,
pre-defined information items to dynamic, context-dependent information states.
Information state refers to the particular information context that the user is
in within a dynamic environment, encompassing not only the acquired
information items but also real-time user preferences, contextual factors, and
decision-making processes. In this way, traditional information retrieval,
focused on acquiring relevant information items based on user queries, can be
naturally extended to reaching a target information state given a user
instruction, which defines agentic information retrieval. We
systematically discuss agentic IR from various aspects, namely task formulation,
architecture, evaluation, case studies, as well as challenges and future
prospects. We believe that the concept of agentic IR introduced in this paper
not only broadens the scope of information retrieval research but also lays the
foundation for a more adaptive, interactive, and intelligent next-generation IR
paradigm.
Authors' comments: 11 pages, perspective paper
Zhong Zheng, Lingzhou Xue
The phase retrieval problem in the presence of noise aims to recover the
signal vector of interest from a set of quadratic measurements with infrequent
but arbitrary corruptions, and it plays an important role in many scientific
applications. However, the essential geometric structure of nonconvex
robust phase retrieval based on the $\ell_1$-loss remains largely unknown,
including whether spurious local solutions exist even in the ideal noiseless
setting, and its intrinsic nonsmooth nature also impairs the efficiency of optimization
algorithms. This paper introduces the smoothed robust phase retrieval (SRPR)
based on a family of convolution-type smoothed loss functions. Theoretically,
we prove that the SRPR enjoys a benign geometric structure with high
probability: (1) under the noiseless situation, the SRPR has no spurious local
solutions, and the target signals are global solutions, and (2) under the
infrequent but arbitrary corruptions, we characterize the stationary points of
the SRPR and prove its benign landscape, which is the first landscape analysis
of phase retrieval with corruption in the literature. Moreover, we prove the
local linear convergence rate of gradient descent for solving the SRPR under
the noiseless situation. Experiments on both simulated datasets and image
recovery are provided to demonstrate the numerical performance of the SRPR.
Authors' comments: 32 pages, 8 figures
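One member of the smoothed loss family has a closed form. Assuming a Gaussian kernel with bandwidth $h$ (our choice; the abstract only says convolution-type smoothing), $\ell_h(u) = \mathbb{E}|u + hZ| = u\,\mathrm{erf}(u/(h\sqrt{2})) + 2h\,\varphi(u/h)$ with $Z \sim N(0,1)$ and $\varphi$ the standard normal density, and $\ell_h'(u) = \mathrm{erf}(u/(h\sqrt{2}))$. The sketch runs plain gradient descent on the mean smoothed loss of the residuals $(a_i^\top x)^2 - y_i$ for real-valued signals; the initialization near the truth, the step size, and the bandwidth are illustrative assumptions.

```python
import math
import random
from typing import List

Vec = List[float]


def smoothed_abs(u: float, h: float) -> float:
    """Gaussian-kernel smoothing of |u|: E|u + h*Z| with Z ~ N(0, 1)."""
    pdf = math.exp(-0.5 * (u / h) ** 2) / math.sqrt(2.0 * math.pi)
    return u * math.erf(u / (h * math.sqrt(2.0))) + 2.0 * h * pdf


def smoothed_abs_grad(u: float, h: float) -> float:
    """Derivative of smoothed_abs: 2*Phi(u/h) - 1 = erf(u / (h*sqrt(2)))."""
    return math.erf(u / (h * math.sqrt(2.0)))


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def srpr_loss(x: Vec, A: List[Vec], y: Vec, h: float) -> float:
    """Mean smoothed l1 loss on the residuals (a_i . x)^2 - y_i."""
    return sum(smoothed_abs(dot(a, x) ** 2 - yi, h) for a, yi in zip(A, y)) / len(y)


def srpr_gd(x0: Vec, A: List[Vec], y: Vec, h: float = 1.0,
            step: float = 0.02, iters: int = 400) -> Vec:
    """Plain gradient descent on srpr_loss from a (local) starting point x0."""
    x = list(x0)
    for _ in range(iters):
        grad = [0.0] * len(x)
        for a, yi in zip(A, y):
            ax = dot(a, x)
            # Chain rule: d/dx l_h((a.x)^2 - y) = l_h'(residual) * 2 (a.x) * a
            coef = 2.0 * ax * smoothed_abs_grad(ax * ax - yi, h) / len(y)
            for k, ak in enumerate(a):
                grad[k] += coef * ak
        x = [xk - step * gk for xk, gk in zip(x, grad)]
    return x


rng = random.Random(0)
x_true = [1.0, -0.5, 0.8]
A = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(60)]
y = [dot(a, x_true) ** 2 for a in A]  # noiseless quadratic measurements
x0 = [v + 0.1 for v in x_true]  # assumed local initialisation near the signal
x_hat = srpr_gd(x0, A, y)
```

Recovery is only possible up to a global sign, since $x$ and $-x$ produce identical measurements.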
Bailu Ding, Jiaqi Zhai
Retrieval plays a fundamental role in recommendation systems, search, and
natural language processing (NLP) by efficiently finding relevant items from a
large corpus given a query. Dot products have been widely used as the
similarity function in such tasks, enabled by Maximum Inner Product Search
(MIPS) algorithms for efficient retrieval. However, state-of-the-art retrieval
algorithms have migrated to learned similarities. These advanced approaches
encompass multiple query embeddings, complex neural networks, direct item ID
decoding via beam search, and hybrid solutions. Unfortunately, we lack
efficient solutions for retrieval in these state-of-the-art setups. Our work
addresses this gap by investigating efficient retrieval techniques with
expressive learned similarity functions. We establish Mixture-of-Logits (MoL)
as a universal approximator of similarity functions, demonstrate that MoL's
expressiveness can be realized empirically to achieve superior performance on
diverse retrieval scenarios, and propose techniques to retrieve the approximate
top-k results using MoL with tight error bounds. Through extensive
experimentation, we show that MoL, enhanced by our proposed mutual
information-based load balancing loss, sets new state-of-the-art results across
heterogeneous scenarios, including sequential retrieval models in
recommendation systems and finetuning language models for question answering;
and our approximate top-$k$ algorithms outperform baselines by up to 66x in
latency while achieving a recall rate above 0.99 relative to exact algorithms.
Authors' comments: To appear in WWW 2025. Our code and model checkpoints are available
at https://github.com/bailuding/rails
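A Mixture-of-Logits score combines several component inner products through gating weights that sum to one. In this minimal sketch the gate is a softmax over the component logits themselves, a simplifying assumption (a trained MoL would typically learn its gating weights), and top-k is exact brute force, the baseline that the approximate algorithms described above are meant to accelerate.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Embeds = Sequence[Sequence[float]]  # P component embeddings of equal dimension


def mol_similarity(q_embs: Embeds, x_embs: Embeds) -> float:
    """Gated sum of P component logits <f_p(q), g_p(x)> with gates summing to 1.

    Assumption: gates are a softmax over the component logits themselves.
    """
    logits = [sum(a * b for a, b in zip(fq, gx)) for fq, gx in zip(q_embs, x_embs)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return sum(e / total * z for e, z in zip(exps, logits))


def exact_top_k(q_embs: Embeds, items: List[Embeds], k: int) -> List[Tuple[float, int]]:
    """Brute-force top-k (score, item index) pairs under the MoL similarity."""
    return heapq.nlargest(k, ((mol_similarity(q_embs, x), i) for i, x in enumerate(items)))


query = [[1.0, 0.0], [0.0, 1.0]]  # P = 2 components, dimension 2
items = [
    [[1.0, 0.0], [0.0, 0.0]],  # component logits (1, 0)
    [[0.0, 0.0], [0.0, 2.0]],  # component logits (0, 2)
    [[0.1, 0.0], [0.0, 0.1]],  # component logits (0.1, 0.1)
]
print([i for _, i in exact_top_k(query, items, k=2)])  # [1, 0]
```

With a single component the gate is 1 and the score reduces to an ordinary dot product, the MIPS setting the abstract starts from.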
Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke
Beyond effectiveness, the robustness of an information retrieval (IR) system
is increasingly attracting attention. When deployed, a critical technology such
as IR should not only deliver strong performance on average but also have the
ability to handle a variety of exceptional situations. In recent years,
research into the robustness of IR has seen significant growth, with numerous
researchers offering extensive analyses and proposing myriad strategies to
address robustness challenges. In this tutorial, we first provide background
information covering the basics and a taxonomy of robustness in IR. Then, we
examine adversarial robustness and out-of-distribution (OOD) robustness within
IR-specific contexts, extensively reviewing recent progress in methods to
enhance robustness. The tutorial concludes with a discussion on the robustness
of IR in the context of large language models (LLMs), highlighting ongoing
challenges and promising directions for future research. This tutorial aims to
draw broader attention to robustness issues in IR, facilitate an
understanding of the relevant literature, and lower the barrier to entry for
interested researchers and practitioners.
Authors' comments: Accepted as a SIGIR 2024 tutorial
Yichen Zhu, Zhicai Ou, Xiaofeng Mou, Jian Tang
Embodied agents operating in complex and uncertain environments face
considerable challenges. While some advanced agents handle complex manipulation
tasks with proficiency, their success often hinges on extensive training data
to develop their capabilities. In contrast, humans typically rely on recalling
past experiences and analogous situations to solve new problems. Aiming to
emulate this human approach in robotics, we introduce the Retrieval-Augmented
Embodied Agent (RAEA). This innovative system equips robots with a form of
shared memory, significantly enhancing their performance. Our approach
integrates a policy retriever, allowing robots to access relevant strategies
from an external policy memory bank based on multi-modal inputs. Additionally,
a policy generator is employed to assimilate these strategies into the learning
process, enabling robots to formulate effective responses to tasks. Extensive
testing of RAEA in both simulated and real-world scenarios demonstrates its
superior performance over traditional methods, representing a major leap
forward in robotic technology.
Authors' comments: CVPR2024
Leif Azzopardi, Vishwa Vinay
This paper introduces the concept of accessibility from the field of transportation planning and adopts it within the context of Information Retrieval (IR). An analogy is drawn between the fields, which motivates the development of document accessibility measures for IR systems. Considering the accessibility of documents within a collection given an IR system provides a different perspective on the analysis and evaluation of such systems, which could be used to inform the design, tuning, and management of current and future IR systems.
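Transportation planning often uses a cumulative-opportunities measure: count the destinations reachable within a cost budget. A direct IR analogue, sketched below, treats rank as the cost and counts, with optional query weights, how many queries retrieve a document within a rank cutoff. The exact measures in the paper may differ; the cutoff, the default query weight of 1, and the function name are assumptions.

```python
from typing import Dict, List


def cumulative_accessibility(collection: List[str],
                             rankings: Dict[str, List[str]],
                             query_weights: Dict[str, float],
                             cutoff: int) -> Dict[str, float]:
    """A(d) = sum over queries q of o_q * [rank of d for q <= cutoff].

    rankings maps each query to the system's ranked document ids; queries
    missing from query_weights get weight o_q = 1.
    """
    access = {doc: 0.0 for doc in collection}  # never-retrieved documents score 0
    for query, ranked in rankings.items():
        for doc in ranked[:cutoff]:  # documents "reachable" within the rank budget
            access[doc] = access.get(doc, 0.0) + query_weights.get(query, 1.0)
    return access


rankings = {"q1": ["d1", "d2", "d3"], "q2": ["d2", "d3", "d1"]}
scores = cumulative_accessibility(["d1", "d2", "d3", "d4"], rankings, {"q2": 2.0}, cutoff=1)
print(scores)
```

Documents such as `d3` and `d4` that no query surfaces within the cutoff end up with zero accessibility, which is the kind of system-induced bias this perspective is meant to expose.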
Marwah Alaofi, Negar Arabzadeh, Charles L. A. Clarke, Mark Sanderson
In this chapter, we consider generative information retrieval evaluation from
two distinct but interrelated perspectives. First, large language models (LLMs)
themselves are rapidly becoming tools for evaluation, with current research
indicating that LLMs may be superior to crowd workers and other paid
assessors on basic relevance judgement tasks. We review past and ongoing
related research, including speculation on the future of shared task
initiatives, such as TREC, and a discussion on the continuing need for human
assessments. Second, we consider the evaluation of emerging LLM-based
generative information retrieval (GenIR) systems, including retrieval augmented
generation (RAG) systems. We consider approaches that focus both on the
end-to-end evaluation of GenIR systems and on the evaluation of a retrieval
component as an element in a RAG system. Going forward, we expect the
evaluation of GenIR systems to be at least partially based on LLM-based
assessment, creating an apparent circularity, with a system seemingly
evaluating its own output. We resolve this apparent circularity in two ways: 1)
by viewing LLM-based assessment as a form of "slow search", where a slower IR
system is used for evaluation and training of a faster production IR system;
and 2) by recognizing a continuing need to ground evaluation in human
assessment, even if the characteristics of that human assessment must change.
Authors' comments: This chapter is part of the book Information Access in the Era of
Generative AI, co-edited by Chirag Shah and Ryen White
Bowen Cao, Deng Cai, Leyang Cui, Xuxin Cheng, Wei Bi, Yuexian Zou, Shuming Shi
Standard language models generate text by selecting tokens from a fixed,
finite, and standalone vocabulary. We introduce a novel method that selects
context-aware phrases from a collection of supporting documents. One of the
most significant challenges for this paradigm shift is determining the training
oracles, because a string of text can be segmented in various ways and each
segment can be retrieved from numerous possible documents. To address this, we
propose to initialize the training oracles using linguistic heuristics and,
more importantly, bootstrap the oracles through iterative self-reinforcement.
Extensive experiments show that our model not only outperforms standard
language models on a variety of knowledge-intensive tasks but also demonstrates
improved generation quality in open-ended text generation. For instance,
compared to the standard language model counterpart, our model raises the
accuracy from 23.47% to 36.27% on OpenbookQA, and improves the MAUVE score from
42.61% to 81.58% in open-ended text generation. Remarkably, our model also
achieves the best performance and the lowest latency among several
retrieval-augmented baselines. In conclusion, we assert that retrieval is more
accurate generation and hope that our work will encourage further research on
this new paradigm shift.
Authors' comments: ICLR 2024
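The segmentation ambiguity behind the oracle problem can be seen with a simple rule: scan the target left to right and copy the longest span that appears verbatim in some supporting document, falling back to a single vocabulary token. This greedy longest-match rule is a hypothetical stand-in, not the authors' linguistic heuristics or their self-reinforcement procedure.

```python
from typing import List, Optional, Tuple

Segment = Tuple[List[str], Optional[int]]  # (tokens, source document index or None)


def contains(doc: List[str], span: List[str]) -> bool:
    n = len(span)
    return any(doc[j:j + n] == span for j in range(len(doc) - n + 1))


def greedy_phrase_oracle(target: List[str], docs: List[List[str]],
                         max_len: int = 8) -> List[Segment]:
    """Left-to-right longest-match segmentation of target into copyable phrases.

    Phrases have at least two tokens; when none matches, the next token is
    emitted on its own with source None (i.e. generated from the vocabulary).
    """
    segments: List[Segment] = []
    i = 0
    while i < len(target):
        choice: Segment = ([target[i]], None)
        for length in range(min(max_len, len(target) - i), 1, -1):
            span = target[i:i + length]
            source = next((d for d, doc in enumerate(docs) if contains(doc, span)), None)
            if source is not None:
                choice = (span, source)
                break
        segments.append(choice)
        i += len(choice[0])
    return segments


docs = [["the", "eiffel", "tower", "is", "in", "paris"],
        ["paris", "is", "the", "capital", "of", "france"]]
target = ["paris", "is", "the", "capital", "city"]
print(greedy_phrase_oracle(target, docs))
# [(['paris', 'is', 'the', 'capital'], 1), (['city'], None)]
```

Another rule could stop the first phrase at "paris is" and copy "the capital" separately; both segmentations reproduce the target, which is exactly the ambiguity the training oracle has to resolve.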
Yongqi Li, Zhen Zhang, Wenjie Wang, Liqiang Nie, Wenjie Li, Tat-Seng Chua
Generative retrieval is a promising new paradigm in text retrieval that generates identifier strings of relevant passages as the retrieval target. This paradigm leverages powerful generative language models, distinct from traditional sparse or dense retrieval methods. In this work, we identify a viable direction to further enhance generative retrieval via distillation and propose a feasible framework, named DGR. DGR utilizes sophisticated ranking models, such as the cross-encoder, in a teacher role to supply a passage rank list, which captures the varying relevance degrees of passages instead of binary hard labels; subsequently, DGR employs a specially designed distilled RankNet loss to optimize the generative retrieval model, considering the passage rank order provided by the teacher model as labels. This framework only requires an additional distillation step to enhance current generative retrieval systems and does not add any burden to the inference stage. We conduct experiments on four public datasets, and the results indicate that DGR achieves state-of-the-art performance among the generative retrieval methods. Additionally, DGR demonstrates exceptional robustness and generalizability with various teacher models and distillation losses.
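The teacher-to-student transfer can be sketched with a standard pairwise RankNet loss in which the teacher's rank list supplies the pair labels. The abstract does not spell out how DGR's distilled variant modifies RankNet, so this is plain RankNet; the student scores would come from the generative retriever (for instance, passage-identifier likelihoods), which is an assumption here.

```python
import math
from typing import Dict, List


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def ranknet_distill_loss(student: Dict[str, float], teacher_ranking: List[str]) -> float:
    """Mean RankNet loss over all pairs ordered by the teacher.

    For each pair where the teacher ranks passage p above passage q, the student
    pays log(1 + exp(-(s_p - s_q))), which shrinks as s_p grows past s_q.
    """
    total, pairs = 0.0, 0
    for i, p in enumerate(teacher_ranking):
        for q in teacher_ranking[i + 1:]:
            total += softplus(-(student[p] - student[q]))
            pairs += 1
    return total / pairs if pairs else 0.0


student = {"p1": 2.0, "p2": 1.0, "p3": -0.5}
teacher = ["p1", "p2", "p3"]  # e.g. a cross-encoder's rank list
print(ranknet_distill_loss(student, teacher), ranknet_distill_loss(student, teacher[::-1]))
```

Because every ordered pair contributes, the student is pushed toward the teacher's graded ordering rather than a single binary relevance label.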
Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, Zhen-Hua Ling
Large language models (LLMs) inevitably exhibit hallucinations since the
accuracy of generated texts cannot be secured solely by the parametric
knowledge they encapsulate. Although retrieval-augmented generation (RAG) is a
practicable complement to LLMs, it relies heavily on the relevance of retrieved
documents, raising concerns about how the model behaves if retrieval goes
wrong. To this end, we propose the Corrective Retrieval Augmented Generation
(CRAG) to improve the robustness of generation. Specifically, a lightweight
retrieval evaluator is designed to assess the overall quality of retrieved
documents for a query, returning a confidence degree based on which different
knowledge retrieval actions can be triggered. Since retrieval from static and
limited corpora can only return sub-optimal documents, large-scale web searches
are utilized as an extension for augmenting the retrieval results. Besides, a
decompose-then-recompose algorithm is designed for retrieved documents to
selectively focus on key information and filter out irrelevant information in
them. CRAG is plug-and-play and can be seamlessly coupled with various
RAG-based approaches. Experiments on four datasets covering short- and
long-form generation tasks show that CRAG can significantly improve the
performance of RAG-based approaches.
Authors' comments: Update results, add more analysis, and fix typos
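The control flow of the evaluator-triggered actions can be sketched as follows. The three action labels, the two thresholds, the sentence-level splitting in `refine`, and the toy scorer and web-search stub are all assumptions for illustration; in CRAG the confidence would come from the trained lightweight retrieval evaluator.

```python
from typing import Callable, List, Tuple

UPPER, LOWER = 0.6, 0.2  # hypothetical confidence thresholds


def refine(docs: List[str], strip_score: Callable[[str], float], keep: float = 0.5) -> List[str]:
    """Decompose docs into sentence-level strips, then recompose only the relevant ones."""
    strips = [s.strip() for doc in docs for s in doc.split(".") if s.strip()]
    return [s for s in strips if strip_score(s) >= keep]


def corrective_rag_context(docs: List[str], confidence: float,
                           strip_score: Callable[[str], float],
                           web_search: Callable[[], List[str]]) -> Tuple[str, List[str]]:
    """Choose a retrieval action from the evaluator's confidence and build the context."""
    if confidence > UPPER:  # retrieval looks correct: keep and refine it
        return "correct", refine(docs, strip_score)
    if confidence < LOWER:  # retrieval looks wrong: replace it with web results
        return "incorrect", refine(web_search(), strip_score)
    # In between: combine both knowledge sources.
    return "ambiguous", refine(docs, strip_score) + refine(web_search(), strip_score)


def strip_score(strip: str) -> float:
    """Toy relevance scorer standing in for the evaluator."""
    return 1.0 if "paris" in strip.lower() else 0.0


def web_search() -> List[str]:
    """Stub standing in for a web search call."""
    return ["Paris is the capital of France. It rains often."]


docs = ["The Eiffel Tower is in Paris. It opened in 1889."]
print(corrective_rag_context(docs, 0.9, strip_score, web_search))
```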
Sebastian Bruch
Vectors are universal mathematical objects that can represent text, images, speech, or a mix of these data modalities. This holds regardless of whether data is represented by hand-crafted features or learnt embeddings. Collect a large enough quantity of such vectors and the question of retrieval becomes urgently relevant: finding the vectors that are most similar to a query vector. This monograph is concerned with that question and covers fundamental concepts along with advanced data structures and algorithms for vector retrieval. In doing so, it recaps this fascinating topic and lowers barriers to entry into this rich area of research.
Anton Shapkin, Denis Litvinov, Yaroslav Zharov, Egor Bogomolov, Timur Galimzyanov, Timofey Bryksin
Current state-of-the-art large language models are effective in generating
high-quality text and encapsulating a broad spectrum of world knowledge. These
models, however, often hallucinate and lack locally relevant factual data.
Retrieval-augmented approaches were introduced to overcome these problems and
provide more accurate responses. Typically, the retrieved information is simply
appended to the main request, so the amount of retrieved context is limited by the model's context window.
We propose Dynamic Retrieval-Augmented Generation (DRAG), a novel approach
based on entity-augmented generation, which injects compressed
embeddings of the retrieved entities into the generative model. The proposed
pipeline was developed for code-generation tasks, yet can be transferred to
some domains of natural language processing. To train the model, we collect and
publish a new project-level code generation dataset. We use it for the
evaluation along with publicly available datasets. Our approach achieves
several targets: (1) lifting the length limitations of the context window,
saving on the prompt size; (2) allowing huge expansion of the number of
retrieval entities available for the context; (3) alleviating the problem of
misspelling or failing to find relevant entity names. This allows the model to
beat all baselines (except GPT-3.5) by a strong margin.
Authors' comments: 10 pages
Evert Nasedkin, Paul Mollière, Doriann Blain
petitRADTRANS (pRT) is a fast radiative transfer code used for computing
emission and transmission spectra of exoplanet atmospheres, combining a FORTRAN
back end with a Python-based user interface. It is widely used in the exoplanet
community with 222 references in the literature to date, and has been
benchmarked against numerous similar tools. The spectra calculated with pRT can
be used as a forward model for fitting spectroscopic data using Monte Carlo
techniques, commonly referred to as an atmospheric retrieval. The new retrieval
module combines fast forward modelling with nested sampling codes, allowing for
atmospheric retrievals on a large range of different types of exoplanet data.
Thus it is now possible to use pRT to easily and quickly infer the atmospheric
properties of exoplanets in both transmission and thermal emission.
Authors' comments: 5 pages, 1 figure, published in the Journal of Open Source Software
Roman Jacome, Kumar Vijay Mishra, Brian M. Sadler, Henry Arguello
Signal processing over hypercomplex numbers arises in many optical imaging
applications. In particular, spectral image or color stereo data are often
processed using octonion algebra. Recently, phase recovery for eight-band multispectral
images has gained salience, wherein it is desired to recover the eight
bands from the phaseless measurements. In this paper, we tackle this hitherto
unaddressed hypercomplex variant of the popular phase retrieval (PR) problem.
We propose octonion Wirtinger flow (OWF) to recover an octonion signal from its
intensity-only observation. However, contrary to the complex-valued Wirtinger
flow, the non-associative nature of octonion algebra and the consequent lack of
octonion derivatives make the extension to OWF non-trivial. We resolve this
using the pseudo-real-matrix representation of octonions to compute the
derivatives in each OWF update. We demonstrate that our approach recovers the
octonion signal up to a right-octonion phase factor. Numerical experiments
validate OWF-based PR with high accuracy under both noiseless and noisy
measurements.
Authors' comments: 13 pages, 3 figures
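The non-associativity that complicates OWF is easy to observe numerically. This sketch builds octonions by the Cayley-Dickson doubling formula $(a,b)(c,d) = (ac - \bar{d}b,\ da + b\bar{c})$ and forms the real $8 \times 8$ left-multiplication matrix $L_a$ with $L_a x = ax$. That matrix is one real-matrix representation; whether it coincides with the paper's pseudo-real-matrix representation is an assumption the abstract does not confirm.

```python
import math
from typing import List

Num = List[float]  # length 1, 2, 4 or 8: real, complex, quaternion, octonion


def conj(x: Num) -> Num:
    """Conjugate: keep the real part, negate every imaginary component."""
    return [x[0]] + [-v for v in x[1:]]


def mul(x: Num, y: Num) -> Num:
    """Cayley-Dickson product (a, b)(c, d) = (ac - conj(d) b, d a + b conj(c))."""
    if len(x) == 1:
        return [x[0] * y[0]]
    h = len(x) // 2
    a, b, c, d = x[:h], x[h:], y[:h], y[h:]
    first = [p - q for p, q in zip(mul(a, c), mul(conj(d), b))]
    second = [p + q for p, q in zip(mul(d, a), mul(b, conj(c)))]
    return first + second


def norm(x: Num) -> float:
    return math.sqrt(sum(v * v for v in x))


def basis(n: int) -> List[Num]:
    return [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]


def left_mult_matrix(a: Num) -> List[List[float]]:
    """Real matrix L_a with L_a @ x == a * x; column j is a times the j-th basis unit."""
    n = len(a)
    cols = [mul(a, e) for e in basis(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def is_associative(n: int) -> bool:
    """Check (pq)r == p(qr) on all triples of basis units of dimension n."""
    units = basis(n)
    return all(mul(mul(p, q), r) == mul(p, mul(q, r))
               for p in units for q in units for r in units)


print(is_associative(4), is_associative(8))  # True False
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0.5, -1.0, 2.0, 0.0, 3.0, -2.0, 1.0, 1.0]
print(norm(mul(x, y)), norm(x) * norm(y))  # equal up to rounding
```

Because $a(bx) \neq (ab)x$ in general, $L_a L_b$ need not equal $L_{ab}$, which is why derivative rules carried over from the complex or quaternion case cannot be applied directly.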
Hrishikesh Kulkarni, Sean MacAvaney, Nazli Goharian, Ophir Frieder
Retrieval approaches that score documents based on learned dense vectors
(i.e., dense retrieval) rather than lexical signals (i.e., conventional
retrieval) are increasingly popular. Their ability to identify related
documents that do not necessarily contain the same terms as those appearing in
the user's query (thereby improving recall) is one of their key advantages.
However, to actually achieve these gains, dense retrieval approaches typically
require an exhaustive search over the document collection, making them
considerably more expensive at query-time than conventional lexical approaches.
Several techniques aim to reduce this computational overhead by approximating
the results of a full dense retriever. Although these approaches reasonably
approximate the top results, they suffer in terms of recall -- one of the key
advantages of dense retrieval. We introduce 'LADR' (Lexically-Accelerated Dense
Retrieval), a simple-yet-effective approach that improves the efficiency of
existing dense retrieval models without compromising on retrieval
effectiveness. LADR uses lexical retrieval techniques to seed a dense retrieval
exploration that uses a document proximity graph. We explore two variants of
LADR: a proactive approach that expands the search space to the neighbors of
all seed documents, and an adaptive approach that selectively searches the
documents with the highest estimated relevance in an iterative fashion. Through
extensive experiments across a variety of dense retrieval models, we find that
LADR establishes a new dense retrieval effectiveness-efficiency Pareto frontier
among approximate k nearest neighbor techniques. Further, we find that when
tuned to take around 8ms per query in retrieval latency on our hardware, LADR
consistently achieves both precision and recall that are on par with an
exhaustive search on standard benchmarks.
Authors' comments: SIGIR 2023
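Both LADR variants can be sketched over a precomputed document proximity graph. In this illustration the lexical seeds (e.g., BM25 top results) are taken as given, the graph is a hypothetical adjacency dict, `dense_score` stands in for the dense model's query-document similarity, and the round limit is an assumption; scores are recomputed rather than cached for brevity.

```python
import heapq
from typing import Callable, Dict, List, Set

Graph = Dict[str, List[str]]


def ladr_proactive(seeds: List[str], graph: Graph,
                   dense_score: Callable[[str], float], k: int) -> List[str]:
    """Score the lexical seeds and all of their graph neighbours; return the dense top-k."""
    candidates: Set[str] = set(seeds)
    for doc in seeds:
        candidates.update(graph.get(doc, []))
    return heapq.nlargest(k, candidates, key=dense_score)


def ladr_adaptive(seeds: List[str], graph: Graph,
                  dense_score: Callable[[str], float], k: int,
                  max_rounds: int = 10) -> List[str]:
    """Repeatedly expand neighbours of the current top-k until it stops changing."""
    scored: Set[str] = set(seeds)
    expanded: Set[str] = set()
    top = heapq.nlargest(k, scored, key=dense_score)
    for _ in range(max_rounds):
        frontier = [doc for doc in top if doc not in expanded]
        if not frontier:  # every current top-k document has been explored
            break
        for doc in frontier:
            expanded.add(doc)
            scored.update(graph.get(doc, []))
        top = heapq.nlargest(k, scored, key=dense_score)
    return top


scores = {"d1": 0.1, "d2": 0.2, "d3": 0.3, "d4": 0.9, "d5": 0.05}
graph = {"d1": ["d2", "d5"], "d2": ["d1", "d3"], "d3": ["d2", "d4"],
         "d4": ["d3"], "d5": ["d1"]}
print(ladr_proactive(["d1"], graph, lambda d: scores[d], k=2))  # ['d2', 'd1']
print(ladr_adaptive(["d1"], graph, lambda d: scores[d], k=2))   # ['d4', 'd3']
```

On this toy chain the proactive variant stops one hop from the seed, while the adaptive variant keeps walking toward higher-scoring neighbours and reaches `d4`, illustrating why iterative expansion helps recall.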
Xiaohuan Pei, Yanxi Li, Minjing Dong, Chang Xu
With the increasing number of new neural architecture designs and the large body of
existing neural architectures, it becomes difficult for researchers to
situate their contributions relative to existing neural architectures or to
establish connections between their designs and other relevant ones. To
discover similar neural architectures in an efficient and automatic manner, we
define a new problem, Neural Architecture Retrieval, which retrieves a set of
existing neural architectures with designs similar to the query neural
architecture. Existing graph pre-training strategies cannot handle the
computational graphs of neural architectures because of their size and motifs.
To address these issues, we propose to divide the graph into motifs, which are
then used to rebuild the macro graph, and we introduce
multi-level contrastive learning to achieve accurate graph representation
learning. Extensive evaluations on both human-designed and synthesized neural
architectures demonstrate the superiority of our algorithm. A dataset
containing 12k real-world network architectures, along with their
embeddings, is built for neural architecture retrieval.
Authors' comments: ICLR 2024
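Contrastive representation learning of this kind is commonly trained with an InfoNCE objective. The sketch below shows InfoNCE on cosine similarities plus a hypothetical weighting that averages a motif-level and a graph-level term; the temperature, the weight `alpha`, and how positives are formed for architectures are assumptions, not details given in the abstract.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]
Triple = Tuple[Vec, Vec, List[Vec]]  # (anchor, positive, negatives)


def cosine(a: Vec, b: Vec) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def info_nce(anchor: Vec, positive: Vec, negatives: List[Vec], tau: float = 0.1) -> float:
    """-log softmax probability of the positive among {positive} + negatives."""
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


def multi_level_loss(motif_triples: Sequence[Triple], graph_triple: Triple,
                     alpha: float = 0.5) -> float:
    """Hypothetical mix of motif-level and graph-level contrastive terms."""
    motif = sum(info_nce(*t) for t in motif_triples) / len(motif_triples)
    return alpha * motif + (1.0 - alpha) * info_nce(*graph_triple)


good = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])  # positive aligned with anchor
bad = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])   # negative aligned instead
print(good, bad)
```

The loss is near zero when the anchor embedding sits closer to its positive than to the negatives and grows when that order is violated, which is what drives similar architectures toward nearby embeddings.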