Antonios Minas Krasakis, Andrew Yates, Evangelos Kanoulas
This study aims to develop models that generate corpus-informed clarifying questions for web search, in a way that ensures the questions align with the available information in the retrieval corpus. We demonstrate the effectiveness of Retrieval Augmented Language Models (RAG) in this process, emphasising their ability to (i) jointly model the user query and retrieval corpus to pinpoint the uncertainty and ask for clarifications end-to-end and (ii) model more evidence documents, which can be used towards increasing the breadth of the questions asked. However, we observe that in current datasets search intents are largely unsupported by the corpus, which is problematic both for training and evaluation. This causes question generation models to "hallucinate", i.e., suggest intents that are not in the corpus, which can have detrimental effects on performance. To address this, we propose dataset augmentation methods that align the ground truth clarifications with the retrieval corpus. Additionally, we explore techniques to enhance the relevance of the evidence pool during inference, but find that identifying ground truth intents within the corpus remains challenging. Our analysis suggests that this challenge is partly due to the bias of current datasets towards clarification taxonomies and calls for data that can support generating corpus-informed clarifications.
Kartik Garg, Sai Shubodh Puligilla, Shishir Kolathaya, Madhava Krishna, Sourav Garg
Accurately recognizing a revisited place is crucial for embodied agents to
localize and navigate. This requires visual representations to be distinct,
despite strong variations in camera viewpoint and scene appearance. Existing
visual place recognition pipelines encode the "whole" image and search for
matches. This poses a fundamental challenge in matching two images of the same
place captured from different camera viewpoints: "the similarity of what
overlaps can be dominated by the dissimilarity of what does not overlap". We
address this by encoding and searching for "image segments" instead of the
whole images. We propose to use open-set image segmentation to decompose an
image into "meaningful" entities (i.e., things and stuff). This enables us to
create a novel image representation as a collection of multiple overlapping
subgraphs connecting a segment with its neighboring segments, dubbed
SuperSegment. Furthermore, to efficiently encode these SuperSegments into
compact vector representations, we propose a novel factorized representation of
feature aggregation. We show that retrieving these partial representations
leads to significantly higher recognition recall than the typical whole image
based retrieval. Our segments-based approach, dubbed SegVLAD, sets a new
state-of-the-art in place recognition on a diverse selection of benchmark
datasets, while being applicable to both generic and task-specialized image
encoders. Finally, we demonstrate the potential of our method to "revisit
anything" by evaluating our method on an object instance retrieval task, which
bridges the two disparate areas of research: visual place recognition and
object-goal navigation, through their common aim of recognizing goal objects
specific to a place. Source code: https://github.com/AnyLoc/Revisit-Anything.
Authors' comments: Presented at ECCV 2024; Includes supplementary; 29 pages; 8 figures
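The abstract above describes a SuperSegment as a subgraph connecting a segment with its neighboring segments. A minimal sketch of that grouping step follows, assuming a precomputed segment adjacency map; the function name `super_segments`, the `order` parameter, and the toy adjacency are hypothetical and not taken from the SegVLAD code.

```python
from collections import deque


def super_segments(adjacency, order=1):
    """For each segment, collect the segments reachable within `order` hops.

    `adjacency` maps a segment id to the ids of its neighboring segments.
    This is an illustrative grouping only, not the authors' implementation.
    """
    groups = {}
    for seed in adjacency:
        seen = {seed}
        frontier = deque([(seed, 0)])
        while frontier:
            node, depth = frontier.popleft()
            if depth == order:
                continue  # do not expand beyond the requested neighborhood
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append((neighbor, depth + 1))
        groups[seed] = frozenset(seen)
    return groups


# Hypothetical adjacency for four segments arranged in a chain 0-1-2-3.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
groups = super_segments(adjacency, order=1)
```

The groups overlap by construction (segment 1 appears in the groups seeded at 0, 1, and 2), which mirrors the "multiple overlapping subgraphs" wording of the abstract.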
Zahra Sepasdar, Sushant Gautam, Cise Midoglu, Michael A. Riegler, Pål Halvorsen
Extracting meaningful insights from large and complex datasets poses significant challenges, particularly in ensuring the accuracy and relevance of retrieved information. Traditional data retrieval methods such as sequential search and index-based retrieval often fail when handling intricate and interconnected data structures, resulting in incomplete or misleading outputs. To overcome these limitations, we introduce Structured-GraphRAG, a versatile framework designed to enhance information retrieval across structured datasets in natural language queries. Structured-GraphRAG utilizes multiple knowledge graphs, which represent data in a structured format and capture complex relationships between entities, enabling a more nuanced and comprehensive retrieval of information. This graph-based approach reduces the risk of errors in language model outputs by grounding responses in a structured format, thereby enhancing the reliability of results. We demonstrate the effectiveness of Structured-GraphRAG by comparing its performance with that of a recently published method using traditional retrieval-augmented generation. Our findings show that Structured-GraphRAG significantly improves query processing efficiency and reduces response times. While our case study focuses on soccer data, the framework's design is broadly applicable, offering a powerful tool for data analysis and enhancing language model applications across various structured domains.
Ameeta Agrawal, Andy Dang, Sina Bagheri Nezhad, Rhitabrat Pokharel, Russell Scheinberg
Recent large language models (LLMs) demonstrate impressive capabilities in
handling long contexts, some exhibiting near-perfect recall on synthetic
retrieval tasks. However, these evaluations have mainly focused on English text
and involved a single target sentence within lengthy contexts. Our work
investigates how LLM performance generalizes to multilingual settings with
multiple hidden target sentences. We create a new dataset -- mLongRR -- to
comprehensively evaluate several multilingual long-context LLMs on retrieval
and reasoning tasks across five languages: English, Vietnamese, Indonesian,
Swahili, and Somali. These languages share the Latin script but belong to
distinct language families and resource levels. Our analysis reveals a
significant performance gap between languages. The best-performing models, such
as Gemini-1.5 and GPT-4o, range from around 96% accuracy in English to around 36%
in Somali with a single target sentence. However, this accuracy drops to 40% in
English and 0% in Somali when dealing with three target sentences. Our findings
highlight the challenges long-context LLMs face when processing longer
contexts, an increase in the number of target sentences, or languages of lower
resource levels.
Authors' comments: To appear at MRL 2024
Yoonsang Lee, Minsoo Kim, Seung-won Hwang
This paper studies how to adapt information retrieval to unseen tasks. Existing work generates synthetic queries from domain-specific documents to jointly train the retriever. However, the conventional query generator assumes the query is a question, thus failing to accommodate general search intents. A more lenient approach incorporates task-adaptive elements, such as few-shot learning with a 137B LLM. In this paper, we challenge the trend of equating query and question, and instead conceptualize the query generation task as a "compilation" of high-level intent into a task-adaptive query. Specifically, we propose EGG, a query generator that better adapts to the wide range of search intents expressed in the BeIR benchmark. Our method outperforms baselines and existing models on four tasks with underexplored intents, while utilizing a query generator 47 times smaller than the previous state-of-the-art. Our findings reveal that instructing the LM with explicit search intent is a key aspect of modeling an effective query generator.
Xinyu Gao, Yun Xiong, Deze Wang, Zhenhan Guan, Zejian Shi, Haofen Wang, Shanshan Li
Retrieval-augmented code generation utilizes Large Language Models as the
generator and significantly expands their code generation capabilities by
providing relevant code, documentation, and more via the retriever. The current
approach suffers from two primary limitations: 1) information redundancy. The
indiscriminate inclusion of redundant information can result in resource
wastage and may misguide generators, affecting their effectiveness and
efficiency. 2) preference gap. Due to different optimization objectives, the
retriever strives to procure code with higher ground truth similarity, yet this
effort does not substantially benefit the generator. The retriever and the
generator may prefer different golden code, and this gap in preference results
in a suboptimal design. Additionally, differences in parameterization knowledge
acquired during pre-training result in varying preferences among different
generators.
To address these limitations, in this paper, we propose RRG (Retrieve,
Refactor, Generate), a novel framework for effective and efficient code
generation. This framework introduces a code refactorer module between the
retriever and the generator to bridge them. The refactoring process transforms
the raw retrieved code into a more concise, efficient, and model-friendly
version. It eliminates redundant information and noise, reducing the input
length. Consequently, the generator receives higher-quality context, enabling
it to produce more accurate results with lower inference costs. We conducted
comprehensive experiments on multiple datasets. In the experiments, we
confirmed the existence of a preference gap between the retriever and the
generator, and RRG effectively bridges this gap. Specifically, RRG achieved
significant performance improvements, with increases of up to 28% on EM, 13% on
BLEU, and 6.8% on CodeBLEU.
Authors' comments: ASE 2024
Lu Chen, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Xueqi Cheng
Retrieval-augmented generation (RAG) has emerged as a popular solution to mitigate the hallucination issues of large language models. However, existing studies on RAG seldom address the issue of predictive uncertainty, i.e., how likely it is that a RAG model's prediction is incorrect, resulting in uncontrollable risks in real-world applications. In this work, we emphasize the importance of risk control, ensuring that RAG models proactively refuse to answer questions with low confidence. Our research identifies two critical latent factors affecting RAG's confidence in its predictions: the quality of the retrieved results and the manner in which these results are utilized. To guide RAG models in assessing their own confidence based on these two latent factors, we develop a counterfactual prompting framework that induces the models to alter these factors and analyzes the effect on their answers. We also introduce a benchmarking procedure to collect answers with the option to abstain, facilitating a series of experiments. For evaluation, we introduce several risk-related metrics and the experimental results demonstrate the effectiveness of our approach. Our code and benchmark dataset are available at https://github.com/ict-bigdatalab/RC-RAG.
Brendan Hogan Rappazzo, Yingheng Wang, Aaron Ferber, Carla Gomes
The ability to form, retrieve, and reason about memories in response to
stimuli serves as the cornerstone for general intelligence - shaping entities
capable of learning, adaptation, and intuitive insight. Large Language Models
(LLMs) have proven their ability, given the proper memories or context, to
reason and respond meaningfully to stimuli. However, they are still unable to
optimally encode, store, and retrieve memories - the ability to do this would
unlock their full ability to operate as AI agents, and to specialize to niche
domains. To remedy this, one promising area of research is Retrieval Augmented
Generation (RAG), which aims to augment LLMs by providing them with rich
in-context examples and information. In question-answering (QA) applications,
RAG methods embed the text of interest in chunks, and retrieve the most
relevant chunks for a prompt using text embeddings. Motivated by human memory
encoding and retrieval, we aim to improve over standard RAG methods by
generating and encoding higher-level information and tagging the chunks by
their utility to answer questions. We introduce Graphical Eigen Memories For
Retrieval Augmented Generation (GEM-RAG). GEM-RAG works by tagging each chunk
of text in a given text corpus with LLM-generated "utility" questions,
connecting chunks in a graph based on the similarity of both their text and
utility questions, and then using the eigendecomposition of the memory graph to
build higher level summary nodes that capture the main themes of the text. We
evaluate GEM-RAG, using both UnifiedQA and GPT-3.5 Turbo as the LLMs, with
SBERT, and OpenAI's text encoders on two standard QA tasks, showing that
GEM-RAG outperforms other state-of-the-art RAG methods on these tasks. We also
discuss the implications of having a robust RAG system and future directions.
Authors' comments: 8 pages
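The abstract above mentions using the eigendecomposition of the memory graph to find the main themes of a text. As a rough illustration of that idea only, the sketch below extracts the principal eigenvector of a small chunk-similarity matrix by power iteration and treats the most heavily weighted chunks as candidates for one summary node. The similarity values, the function names, and the top-k selection rule are assumptions, not details from the GEM-RAG paper.

```python
import math


def principal_eigenpair(matrix, iterations=200):
    """Power iteration for the dominant eigenpair of a symmetric, non-negative matrix."""
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the eigenvalue for the normalized vector.
    applied = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v * a for v, a in zip(vec, applied))
    return eigenvalue, vec


def theme_members(matrix, top_k):
    """Return the indices of the chunks with the largest principal-eigenvector weight."""
    _, vec = principal_eigenpair(matrix)
    ranked = sorted(range(len(vec)), key=lambda i: vec[i], reverse=True)
    return ranked[:top_k]


# Hypothetical similarities: chunks 0-2 share a theme, chunk 3 is an outlier.
similarity = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.1],
    [0.8, 0.7, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
]
eigenvalue, vector = principal_eigenpair(similarity)
members = theme_members(similarity, top_k=3)
```

In a fuller pipeline the selected chunks would presumably be passed to an LLM to write the summary node; that step is omitted here.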
Jordi Bayarri-Planas, Ashwin Kumar Gururajan, Dario Garcia-Gasulla
This study leverages optimized context retrieval to enhance open-source Large
Language Models (LLMs) for cost-effective, high-performance healthcare AI. We
demonstrate that this approach achieves state-of-the-art accuracy on medical
question answering at a fraction of the cost of proprietary models,
significantly improving the cost-accuracy Pareto frontier on the MedQA
benchmark. Key contributions include: (1) OpenMedQA, a novel benchmark
revealing a performance gap in open-ended medical QA compared to
multiple-choice formats; (2) a practical, reproducible pipeline for context
retrieval optimization; and (3) open-source resources (Prompt Engine,
CoT/ToT/Thinking databases) to empower healthcare AI development. By advancing
retrieval techniques and QA evaluation, we enable more affordable and reliable
LLM solutions for healthcare.
Authors' comments: 14 pages, 3 figures, 5 tables, Accepted for publication at the 21st
International Conference on Artificial Intelligence Applications and
Innovations (AIAI 2025)
Morris Florek, David Tschirschwitz, Björn Barz, Volker Rodehorst
Current image retrieval systems often face domain specificity and generalization issues. This study aims to overcome these limitations by developing a computationally efficient training framework for a universal feature extractor that provides strong semantic image representations across various domains. To this end, we curated a multi-domain training dataset, called M4D-35k, which allows for resource-efficient training. Additionally, we conduct an extensive evaluation and comparison of various state-of-the-art visual-semantic foundation models and margin-based metric learning loss functions regarding their suitability for efficient universal feature extraction. Despite constrained computational resources, we achieve near state-of-the-art results on the Google Universal Image Embedding Challenge, with an mMP@5 of 0.721. This places our method second on the leaderboard, just 0.7 percentage points behind the best performing method. However, our model has 32% fewer overall parameters and 289 times fewer trainable parameters. Compared to methods with similar computational requirements, we outperform the previous state of the art by 3.3 percentage points. We release our code and M4D-35k training set annotations at https://github.com/morrisfl/UniFEx.
Jiatao Li, Xinyu Hu, Xiaojun Wan
Retrieval-Augmented Generation (RAG) has greatly improved large language
models (LLMs) by enabling them to generate accurate, contextually grounded
responses through the integration of external information. However,
conventional RAG approaches, which prioritize top-ranked documents based solely
on query-context relevance, often introduce redundancy and conflicting
information. This issue is particularly evident in unsupervised retrieval
settings, where there are no mechanisms to effectively mitigate these problems,
leading to suboptimal context selection. To address this, we propose Selection
using Matrices for Augmented Retrieval (SMART) in question answering tasks, a
fully unsupervised and training-free framework designed to optimize context
selection in RAG. SMART leverages Determinantal Point Processes (DPPs) to
simultaneously model relevance, diversity and conflict, ensuring the selection
of potentially high-quality contexts. Experimental results across multiple
datasets demonstrate that SMART significantly enhances QA performance and
surpasses previous unsupervised context selection methods, showing a promising
strategy for RAG.
Authors' comments: Under Review
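The abstract above says SMART uses Determinantal Point Processes to trade off relevance against redundancy when selecting contexts. A minimal sketch of greedy DPP selection follows, assuming the common quality-times-similarity kernel L[i][j] = r_i * S[i][j] * r_j. The greedy search, the function names, and the toy scores are assumptions, and the conflict term the abstract mentions is not modeled here.

```python
def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def greedy_dpp(relevance, similarity, k):
    """Greedily pick up to k items that maximize the determinant of the kernel submatrix."""
    n = len(relevance)
    kernel = [
        [relevance[i] * similarity[i][j] * relevance[j] for j in range(n)]
        for i in range(n)
    ]
    selected = []
    for _ in range(min(k, n)):
        best, best_det = None, 0.0
        for cand in range(n):
            if cand in selected:
                continue
            idx = selected + [cand]
            sub = [[kernel[i][j] for j in idx] for i in idx]
            det = determinant(sub)
            if det > best_det:
                best, best_det = cand, det
        if best is None:
            break  # every remaining candidate is redundant with the selection
        selected.append(best)
    return selected


# Hypothetical scores: contexts 0 and 1 are near-duplicates, context 2 is distinct.
relevance = [0.9, 0.85, 0.6]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
chosen = greedy_dpp(relevance, similarity, k=2)
```

With these toy values the second pick is the less relevant but distinct context 2 rather than the near-duplicate context 1, which is the redundancy-avoiding behavior a pure top-k relevance ranking lacks.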
Dongwon Jung, Qin Liu, Tenghao Huang, Ben Zhou, Muhao Chen
Retrieval Augmented Generation (RAG) improves large language models (LMs) by incorporating non-parametric knowledge through evidence retrieval from external sources. However, it often struggles to filter out inconsistent and irrelevant information that can distract the LM from its tasks. While compressing the retrieved evidence with a compression model aims to address this issue, the compressed evidence may still be unfamiliar to the target model used for the downstream task, so the target model may fail to utilize the evidence effectively. We propose FaviComp (Familiarity-aware Evidence Compression), a novel training-free evidence compression technique that makes retrieved evidence more familiar to the target model, while seamlessly integrating parametric knowledge from the model. Specifically, FaviComp proactively lowers the perplexity of the compressed evidence with regard to the target model by combining token probabilities from both the compression model and the target model to generate context that is more familiar to the target model. This approach balances the integration of parametric and non-parametric knowledge, which is especially helpful in complex tasks where the retrieved evidence set may not contain all the necessary information. Experimental results demonstrate that FaviComp consistently outperforms existing baselines in multiple open-domain QA datasets, achieving high compression rates and showcasing the effective integration of both parametric and non-parametric knowledge.
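The abstract above describes combining token probabilities from the compression model and the target model at decoding time. One simple way to picture this is a weighted sum of log-probabilities for a single decoding step, sketched below. The weight `alpha`, the probability floor, the function name, and the toy distributions are assumptions for illustration; the exact combination rule used by FaviComp is not given in the abstract.

```python
import math


def combine_next_token(comp_probs, target_probs, alpha=0.5, floor=1e-9):
    """Pick the next token from a weighted sum of two models' log-probabilities.

    alpha weights the compression model; (1 - alpha) weights the target model.
    Tokens missing from one distribution get a small floor probability.
    """
    vocab = set(comp_probs) | set(target_probs)
    scores = {}
    for token in vocab:
        lp_comp = math.log(max(comp_probs.get(token, 0.0), floor))
        lp_target = math.log(max(target_probs.get(token, 0.0), floor))
        scores[token] = alpha * lp_comp + (1.0 - alpha) * lp_target
    return max(scores, key=scores.get), scores


# Hypothetical next-token distributions for one decoding step.
comp_probs = {"automobile": 0.6, "car": 0.4}
target_probs = {"automobile": 0.1, "car": 0.9}

blended_token, _ = combine_next_token(comp_probs, target_probs, alpha=0.5)
comp_only_token, _ = combine_next_token(comp_probs, target_probs, alpha=1.0)
```

Here the compression model alone would emit "automobile", while the blended score prefers "car", the wording the target model finds more likely and therefore lower in perplexity.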
Nitin Aravind Birur, Tanay Baswa, Divyanshu Kumar, Jatan Loya, Sahil Agarwal, Prashanth Harshangi
Large language models (LLMs) exhibit remarkable capabilities but often produce inaccurate responses, as they rely solely on their embedded knowledge. Retrieval-Augmented Generation (RAG) enhances LLMs by incorporating an external information retrieval system, supplying additional context along with the query to mitigate inaccuracies for a particular context. However, accuracy issues still remain, as the model may rely on irrelevant documents or extrapolate incorrectly from its training knowledge. To assess and improve the performance of both the retrieval system and the LLM in a RAG framework, we propose VERA (Validation and Enhancement for Retrieval Augmented systems), a system designed to: 1) Evaluate and enhance the retrieved context before response generation, and 2) Evaluate and refine the LLM-generated response to ensure precision and minimize errors. VERA employs an evaluator-cum-enhancer LLM that first checks if external retrieval is necessary, evaluates the relevance and redundancy of the retrieved context, and refines it to eliminate non-essential information. Post-response generation, VERA splits the response into atomic statements, assesses their relevance to the query, and ensures adherence to the context. Our experiments demonstrate VERA's remarkable efficacy not only in improving the performance of smaller open-source models, but also in larger state-of-the-art models. These enhancements underscore VERA's potential to produce accurate and relevant responses, advancing the state-of-the-art in retrieval-augmented language modeling. VERA's robust methodology, combining multiple evaluation and refinement steps, effectively mitigates hallucinations and improves retrieval and response processes, making it a valuable tool for applications demanding high accuracy and reliability in information generation.
Xincheng Liao, Junwen Duan, Yixi Huang, Jianxin Wang
Unified information extraction (UIE) aims to complete all information
extraction tasks using a single model or framework. While previous work has
primarily focused on instruction-tuning large language models (LLMs) with
constructed datasets, these methods require significant computational resources
and struggle to generalize to unseen tasks. To address these limitations, we
propose RUIE (Retrieval-based Unified Information Extraction), a framework that
leverages in-context learning to enable rapid generalization while reducing
computational costs. The key challenge in RUIE is selecting the most beneficial
demonstrations for LLMs to effectively handle diverse IE tasks. To achieve
this, we integrate LLM preferences for ranking candidate demonstrations and
design a keyword-enhanced reward model to capture fine-grained relationships
between queries and demonstrations. We then train a bi-encoder retriever for
UIE through contrastive learning and knowledge distillation. To the best of our
knowledge, RUIE is the first trainable retrieval framework for UIE.
Experimental results on 8 held-out datasets demonstrate RUIE's effectiveness in
generalizing to unseen tasks, with average F1-score improvements of 19.22 and
3.13 compared to instruction-tuning methods and other retrievers, respectively.
Further analysis confirms RUIE's adaptability to LLMs of varying sizes and the
importance of its key components.
Authors' comments: 14 pages, 3 figures
Sai Shashank Kalakonda, Shubh Maheshwari, Ravi Kiran Sarvadevabhatla
We introduce MoRAG, a novel multi-part fusion based retrieval-augmented generation strategy for text-based human motion generation. The method enhances motion diffusion models by leveraging additional knowledge obtained through an improved motion retrieval process. By effectively prompting large language models (LLMs), we address spelling errors and rephrasing issues in motion retrieval. Our approach utilizes a multi-part retrieval strategy to improve the generalizability of motion retrieval across the language space. We create diverse samples through the spatial composition of the retrieved motions. Furthermore, by utilizing low-level, part-specific motion information, we can construct motion samples for unseen text descriptions. Our experiments demonstrate that our framework can serve as a plug-and-play module, improving the performance of motion diffusion models. Code, pretrained models and sample videos are available at: https://motion-rag.github.io/
Yukang Lin, Bingchen Zhong, Shuoran Jiang, Joanna Siebert, Qingcai Chen
Large language models (LLMs) have exhibited remarkable few-shot learning capabilities and unified the paradigm of NLP tasks through the in-context learning (ICL) technique. Despite the success of ICL, the quality of the exemplar demonstrations can significantly influence the LLM's performance. Existing exemplar selection methods mainly focus on the semantic similarity between queries and candidate exemplars. On the other hand, the logical connections between reasoning steps can also help depict the problem-solving process. In this paper, we propose a novel method named Reasoning Graph-enhanced Exemplar Retrieval (RGER). RGER first queries an LLM to generate an initial response, then expresses the intermediate problem-solving steps as a graph structure. After that, it employs a graph kernel to select exemplars with semantic and structural similarity. Extensive experiments demonstrate that the structural relationship helps align queries and candidate exemplars. The efficacy of RGER on math and logic reasoning tasks showcases its superiority over state-of-the-art retrieval-based approaches. Our code is released at https://github.com/Yukang-Lin/RGER.
Orion Weller, Benjamin Van Durme, Dawn Lawrie, Ashwin Paranjape, Yuhao Zhang, Jack Hessel
Instruction-tuned language models (LMs) are able to respond to imperative commands, providing a more natural user interface compared to their base counterparts. In this work, we present Promptriever, the first retrieval model able to be prompted like an LM. To train Promptriever, we curate and release a new instance-level instruction training set from MS MARCO, spanning nearly 500k instances. Promptriever not only achieves strong performance on standard retrieval tasks, but also follows instructions. We observe: (1) large gains (reaching SoTA) on following detailed relevance instructions (+14.3 p-MRR / +3.1 nDCG on FollowIR), (2) significantly increased robustness to lexical choices/phrasing in the query+instruction (+12.9 Robustness@10 on InstructIR), and (3) the ability to perform hyperparameter search via prompting to reliably improve retrieval performance (+1.4 average increase on BEIR). Promptriever demonstrates that retrieval models can be controlled with prompts on a per-query basis, setting the stage for future work aligning LM prompting techniques with information retrieval.
Yanan Jian, Fuxun Yu, Qi Zhang, William Levine, Brandon Dubbs, Nikolaos Karianakis
This paper presents a novel way of adapting any off-the-shelf object
detection model to a novel domain online, without retraining the detector model.
Inspired by how humans quickly learn knowledge of a new subject (e.g.,
memorization), we allow the detector to look up similar object concepts from
memory during test time. This is achieved through a retrieval augmented
classification (RAC) module together with a memory bank that can be flexibly
updated with new domain knowledge. We experimented with various off-the-shelf
open-set and closed-set detectors. With only a tiny memory bank (e.g.,
10 images per category) and being training-free, our online learning method
could significantly outperform baselines in adapting a detector to novel
domains.
Authors' comments: Accepted at ECCV 2024, Human-Inspired Computer Vision (HCV) workshop
Yujia Zhou, Yan Liu, Xiaoxi Li, Jiajie Jin, Hongjin Qian, Zheng Liu, Chaozhuo Li, Zhicheng Dou et al.
Retrieval-Augmented Generation (RAG) has quickly grown into a pivotal paradigm in the development of Large Language Models (LLMs). While much of the current research in this field focuses on performance optimization, particularly in terms of accuracy and efficiency, the trustworthiness of RAG systems remains underexplored. On the positive side, RAG systems promise to enhance LLMs by providing them with useful and up-to-date knowledge from vast external databases, thereby mitigating the long-standing problem of hallucination. On the negative side, RAG systems risk generating undesirable content if the retrieved information is either inappropriate or poorly utilized. To address these concerns, we propose a unified framework that assesses the trustworthiness of RAG systems across six key dimensions: factuality, robustness, fairness, transparency, accountability, and privacy. Within this framework, we thoroughly review the existing literature on each dimension. Additionally, we create an evaluation benchmark covering the six dimensions and conduct comprehensive evaluations of a variety of proprietary and open-source models. Finally, we identify potential challenges for future research based on our investigation results. Through this work, we aim to lay a structured foundation for future investigations and provide practical insights for enhancing the trustworthiness of RAG systems in real-world applications.
Yifei Xin, Xuxin Cheng, Zhihong Zhu, Xusheng Yang, Yuexian Zou
Existing audio-text retrieval (ATR) methods are essentially discriminative
models that aim to maximize the conditional likelihood, represented as
p(candidates|query). Nevertheless, this methodology fails to consider the
intrinsic data distribution p(query), leading to difficulties in discerning
out-of-distribution data. In this work, we attempt to tackle this constraint
through a generative perspective and model the relationship between audio and
text as their joint probability p(candidates,query). To this end, we present a
diffusion-based ATR framework (DiffATR), which models ATR as an iterative
procedure that progressively generates the joint distribution from noise.
Throughout its training phase, DiffATR is optimized from both generative and
discriminative viewpoints: the generator is refined through a generation loss,
while the feature extractor benefits from a contrastive loss, thus combining
the merits of both methodologies. Experiments on the AudioCaps and Clotho
datasets show superior performance, verifying the effectiveness of our approach.
Notably, without any alterations, our DiffATR consistently exhibits strong
performance in out-of-domain retrieval settings.
Authors' comments: Accepted by Interspeech 2024
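The three quantities named in the abstract above are linked by the product rule. Writing c for the candidates and q for the query (shorthand introduced here, not the paper's notation):

```latex
p(c, q) = p(c \mid q)\, p(q)
```

A purely discriminative model fits only the factor p(c | q), so it carries no information about how typical the query itself is. Modeling the joint p(c, q) retains the p(q) factor, which is the quantity the abstract identifies as missing when discerning out-of-distribution data.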