Philippe Jaming, Rolando Perez Iii
The aim of this paper is to get a deeper understanding of the spaces of variable bandwidth introduced by Gröchenig and Klotz (What is variable bandwidth? Comm. Pure Appl. Math., 70 (2017), 2039-2083). In particular, we show that when the variation of the bandwidth is modeled by a step function with a finite number of jumps, the sign retrieval principle applies.
Yanhong Li, David Yunis, David McAllester, Jiawei Zhou
There has recently been considerable interest in incorporating information
retrieval into large language models (LLMs). Retrieval from a dynamically
expanding external corpus of text allows a model to incorporate current events
and can be viewed as a form of episodic memory. Here we demonstrate that
pre-processing the external corpus into semi-structured "atomic facts" makes
retrieval more efficient. More specifically, we demonstrate that our particular
form of atomic facts improves performance on various question answering tasks
when the amount of retrieved text is limited. Limiting the amount of retrieval
reduces the size of the context and improves inference efficiency.
Authors' comments: NAACL 2025 Main Conference
Jiali Cheng, Hadi Amiri
This study finds that existing information retrieval (IR) models show
significant biases based on the linguistic complexity of input queries,
performing well on linguistically simpler (or more complex) queries while
underperforming on linguistically more complex (or simpler) queries. To address
this issue, we propose EqualizeIR, a framework to mitigate linguistic biases in
IR models. EqualizeIR uses a linguistically biased weak learner to capture
linguistic biases in IR datasets and then trains a robust model by regularizing
and refining its predictions using the biased weak learner. This approach
effectively prevents the robust model from overfitting to specific linguistic
patterns in data. We propose four approaches for developing
linguistically-biased models. Extensive experiments on several datasets show
that our method reduces performance disparities across linguistically simple
and complex queries, while improving overall retrieval performance.
Authors' comments: NAACL 2025
Ahmed H. Salamah, Pierre McWhannel, Nicole Yan
Information retrieval systems have traditionally relied on exact term match methods such as BM25 for first-stage retrieval. However, recent advancements in neural network-based techniques have introduced a new method called dense retrieval. This approach uses a dual-encoder to create contextual embeddings that can be indexed and clustered efficiently at run-time, resulting in improved retrieval performance in Open-domain Question Answering systems. In this paper, we apply the dense retrieval technique to conversational search by conducting experiments on the CAsT benchmark dataset. We also propose an end-to-end conversational search system called GPT2QR+DPR, which incorporates various query reformulation strategies to improve retrieval accuracy. Our findings indicate that dense retrieval outperforms BM25 even without extensive fine-tuning. Our work contributes to the growing body of research on neural-based retrieval methods in conversational search, and highlights the potential of dense retrieval in improving retrieval accuracy in conversational search systems.
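As an illustration of the exact term match baseline named in the abstract above, the following is a minimal sketch of BM25 scoring. It assumes pre-tokenized documents, the commonly used defaults `k1 = 1.2` and `b = 0.75`, and a smoothed IDF of the form log(1 + (N - df + 0.5) / (df + 0.5)). The function name `bm25_scores` is hypothetical and is not taken from the paper.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """Score every document against the query with a BM25 variant (illustrative)."""
    if not docs_tokens:
        return []
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue  # exact term match: unseen terms contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores
```

A document that shares no term with the query scores zero, which is the limitation that dense retrieval with contextual embeddings is meant to address.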
Haoyu Huang, Yongfeng Huang, Junjie Yang, Zhenyu Pan, Yongqiang Chen, Kaili Ma, Hongzhi Chen, James Cheng
Graph-based Retrieval-Augmented Generation (RAG) methods have significantly enhanced the performance of large language models (LLMs) in domain-specific tasks. However, existing RAG methods do not adequately utilize the naturally inherent hierarchical knowledge in human cognition, which limits the capabilities of RAG systems. In this paper, we introduce a new RAG approach, called HiRAG, which utilizes hierarchical knowledge to enhance the semantic understanding and structure capturing capabilities of RAG systems in the indexing and retrieval processes. Our extensive experiments demonstrate that HiRAG achieves significant performance improvements over the state-of-the-art baseline methods. The code of our proposed method is available at https://github.com/hhy-huang/HiRAG.
Zecheng Zhao, Zhi Chen, Zi Huang, Shazia Sadiq, Tong Chen
Text-to-Video Retrieval (TVR) aims to match videos with corresponding textual queries, yet the continual influx of new video content poses a significant challenge for maintaining system performance over time. In this work, we introduce the first benchmark for Continual Text-to-Video Retrieval (CTVR) to overcome these limitations. Our analysis reveals that current TVR methods based on pre-trained models struggle to retain plasticity when adapting to new tasks, while existing continual learning approaches experience catastrophic forgetting, resulting in semantic misalignment between historical queries and stored video features. To address these challenges, we propose StableFusion, a novel CTVR framework comprising two main components: the Frame Fusion Adapter (FFA), which captures temporal dynamics in video content while preserving model flexibility, and the Task-Aware Mixture-of-Experts (TAME), which maintains consistent semantic alignment between queries across tasks and the stored video features. Comprehensive evaluations on two benchmark datasets under various task settings demonstrate that StableFusion outperforms existing continual learning and TVR methods, achieving superior retrieval performance with minimal degradation on earlier tasks in the context of continuous video streams. Our code is available at: https://github.com/JasonCodeMaker/CTVR
Qi Xu, Annie Qu
In the era of big data, large-scale, multi-modal datasets are increasingly ubiquitous, offering unprecedented opportunities for predictive modeling and scientific discovery. However, these datasets often exhibit complex heterogeneity, such as covariate shift, posterior drift, and missing modalities, that can hinder the accuracy of existing prediction algorithms. To address these challenges, we propose a novel Representation Retrieval ($R^2$) framework, which integrates a representation learning module (the representer) with a sparsity-induced machine learning model (the learner). Moreover, we introduce the notion of "integrativeness" for representers, characterized by the effective data sources used in learning representers, and propose a Selective Integration Penalty (SIP) to explicitly improve the property. Theoretically, we demonstrate that the $R^2$ framework relaxes the conventional full-sharing assumption in multi-task learning, allowing for partially shared structures, and that SIP can improve the convergence rate of the excess risk bound. Extensive simulation studies validate the empirical performance of our framework, and applications to two real-world datasets further confirm its superiority over existing approaches.
Yang Nan, Huichi Zhou, Xiaodan Xing, Giorgos Papanastasiou, Lei Zhu, Zhifan Gao, Alejandro F Fangi, Guang Yang
As artificial intelligence and digital medicine increasingly permeate healthcare systems, robust governance frameworks are essential to ensure ethical, secure, and effective implementation. In this context, medical image retrieval becomes a critical component of clinical data management, playing a vital role in decision-making and safeguarding patient information. Existing methods usually learn hash functions using bottleneck features, which fail to produce representative hash codes from blended embeddings. Although contrastive hashing has shown superior performance, current approaches often treat image retrieval as a classification task, using category labels to create positive/negative pairs. Moreover, many methods fail to address the out-of-distribution (OOD) issue when models encounter external OOD queries or adversarial attacks. In this work, we propose a novel method to consolidate knowledge of hierarchical features and optimisation functions. We formulate the knowledge consolidation by introducing Depth-aware Representation Fusion (DaRF) and Structure-aware Contrastive Hashing (SCH). DaRF adaptively integrates shallow and deep representations into blended features, and SCH incorporates image fingerprints to enhance the adaptability of positive/negative pairings. These blended features further facilitate OOD detection and content-based recommendation, contributing to a secure AI-driven healthcare environment. Moreover, we present a content-guided ranking to improve the robustness and reproducibility of retrieval results. Our comprehensive assessments demonstrate that the proposed method could effectively recognise OOD samples and significantly outperform existing approaches in medical image retrieval (p<0.05). In particular, our method achieves a 5.6-38.9% improvement in mean Average Precision on the anatomical radiology dataset.
Juseon-Do, Jaesung Hwang, Jingun Kwon, Hidetaka Kamigaito, Manabu Okumura
This study investigates retrieval-augmented summarization by specifically
examining the impact of exemplar summary lengths under length constraints, not
covered by previous work. We propose a Diverse Length-aware Maximal Marginal
Relevance (DL-MMR) algorithm to better control summary lengths. This algorithm
combines the query relevance with diverse target lengths in retrieval-augmented
summarization. Unlike previous methods that necessitate exhaustive
exemplar-exemplar relevance comparisons using MMR, DL-MMR considers the exemplar target
length as well and avoids comparing exemplars to each other, thereby reducing
computational cost and conserving memory during the construction of an exemplar
pool. Experimental results showed the effectiveness of DL-MMR, which considers
length diversity, compared to the original MMR algorithm. DL-MMR additionally
reduced memory use by a factor of 781,513 and computational cost by a factor
of 500,092, while maintaining the same level of informativeness.
Authors' comments: 12 pages, accepted to NAACL 2025 Findings
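For context, here is a minimal sketch of the original MMR selection that the abstract above uses as its baseline. It is not DL-MMR. It assumes exemplars and the query are given as dense vectors compared by cosine similarity, and the names `mmr_select` and `lam` are illustrative. The `redundancy` term is the exemplar-to-exemplar comparison that the abstract says DL-MMR avoids.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mmr_select(query_vec, cand_vecs, k, lam=0.5):
    """Greedily pick k candidate indices by standard Maximal Marginal Relevance."""
    selected = []
    remaining = list(range(len(cand_vecs)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, cand_vecs[i])
            # Pairwise comparison against every already-selected candidate.
            redundancy = max(
                (cosine(cand_vecs[i], cand_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam = 1.0` the selection reduces to ranking by query relevance alone. Lower values penalize candidates that resemble those already chosen.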
Xuan Lu, Sifan Liu, Bochao Yin, Yongqi Li, Xinghao Chen, Hui Su, Yaohui Jin, Wenjun Zeng et al.
In this paper, we introduce MultiConIR, the first benchmark designed to evaluate retrieval models in multi-condition scenarios. Unlike existing datasets that primarily focus on single-condition queries from search engines, MultiConIR captures real-world complexity by incorporating five diverse domains: books, movies, people, medical cases, and legal documents. We propose three tasks to systematically assess retrieval and reranking models on multi-condition robustness, monotonic relevance ranking, and query format sensitivity. Our findings reveal that existing retrieval and reranking models struggle with multi-condition retrieval, with rerankers suffering severe performance degradation as query complexity increases. We further investigate the performance gap between retrieval and reranking models, exploring potential reasons for these discrepancies, and analyze the impact of different pooling strategies on condition placement sensitivity. Finally, we highlight the strengths of GritLM and Nv-Embed, which demonstrate enhanced adaptability to multi-condition queries, offering insights for future retrieval models. The code and datasets are available at https://github.com/EIT-NLP/MultiConIR.
Jianhui Wang, Zhifei Yang, Yangfan He, Huixiong Zhang, Yuxuan Chen, Jingwei Huang
Accurate material retrieval is critical for creating realistic 3D assets. Existing methods rely on datasets that capture shape-invariant and lighting-varied representations of materials, which are scarce and face challenges due to limited diversity and inadequate real-world generalization. Most current approaches adopt traditional image search techniques. They fall short in capturing the unique properties of material spaces, leading to suboptimal performance in retrieval tasks. Addressing these challenges, we introduce MaRI, a framework designed to bridge the feature space gap between synthetic and real-world materials. MaRI constructs a shared embedding space that harmonizes visual and material attributes through a contrastive learning strategy by jointly training an image and a material encoder, bringing similar materials and images closer while separating dissimilar pairs within the feature space. To support this, we construct a comprehensive dataset comprising high-quality synthetic materials rendered with controlled shape variations and diverse lighting conditions, along with real-world materials processed and standardized using material transfer techniques. Extensive experiments demonstrate the superior performance, accuracy, and generalization capabilities of MaRI across diverse and complex material retrieval tasks, outperforming existing methods.
Justus-Jonas Erker, Nils Reimers, Iryna Gurevych
Decomposition-based multi-hop retrieval methods rely on many autoregressive
steps to break down complex queries, which breaks end-to-end differentiability
and is computationally expensive. Decomposition-free methods tackle this, but
current decomposition-free approaches struggle with longer multi-hop problems
and generalization to out-of-distribution data. To address these challenges, we
introduce GRITHopper-7B, a novel multi-hop dense retrieval model that achieves
state-of-the-art performance on both in-distribution and out-of-distribution
benchmarks. GRITHopper combines generative and representational instruction
tuning by integrating causal language modeling with dense retrieval training.
Through controlled studies, we find that incorporating additional context after
the retrieval process, referred to as post-retrieval language modeling,
enhances dense retrieval performance. By including elements such as final
answers during training, the model learns to better contextualize and retrieve
relevant information. GRITHopper-7B offers a robust, scalable, and
generalizable solution for multi-hop dense retrieval, and we release it to the
community for future research and applications requiring multi-hop reasoning
and retrieval capabilities.
Authors' comments: Under Review at ACL Rolling Review (ARR)
Meng Zheng, Jiajin Zhang, Benjamin Planche, Zhongpai Gao, Terrence Chen, Ziyan Wu
Image-Text Retrieval (ITR) finds broad applications in healthcare, aiding
clinicians and radiologists by automatically retrieving relevant patient cases
in the database given the query image and/or report, for more efficient
clinical diagnosis and treatment, especially for rare diseases. However,
conventional ITR systems typically only rely on global image or text
representations for measuring patient image/report similarities, which overlook
local distinctiveness across patient cases. This often results in suboptimal
retrieval performance. In this paper, we propose an Anatomical
Location-Conditioned Image-Text Retrieval (ALC-ITR) framework, which, given a
query image and the associated suspicious anatomical region(s), aims to
retrieve similar patient cases exhibiting the same disease or symptoms in the
same anatomical region. To perform location-conditioned multimodal retrieval,
we learn a medical Relevance-Region-Aligned Vision Language (RRA-VL) model with
semantic global-level and region-/word-level alignment to produce
generalizable, well-aligned multi-modal representations. Additionally, we
perform location-conditioned contrastive learning to further utilize cross-pair
region-level contrastiveness for improved multi-modal retrieval. We show that
our proposed RRA-VL achieves state-of-the-art localization performance in
phrase-grounding tasks, and satisfactory multi-modal retrieval performance with or
without location conditioning. Finally, we thoroughly investigate the
generalizability and explainability of our proposed ALC-ITR system in providing
explanations and preliminary diagnosis reports given retrieved patient cases
(conditioned on anatomical regions), with proper off-the-shelf LLM prompts.
Authors' comments: 16 pages, 10 figures
Gabriele Bizzarri, Miranda Parisi, Mylenne Manrique, Ilaria Gianani, Andrea Chiuri, Matteo Rosati, Vittorio Giovannetti, Matteo G. A. Paris et al.
The description of complex systems requires a progressively larger number of parameters. However, in practice it often happens that a small subset of parameters suffices to describe the dynamics of the system itself in terms of stiff combinations, while the remaining sloppy combinations provide no information on the system. While this effect can reduce model complexity, it can also limit the estimation precision when the stiff and sloppy combinations are unknown to the experimenter, and one is forced to estimate the potentially sloppy model parameters. We explored how such sloppy behavior can be controlled and counteracted via quantum weak measurements in the estimation of two sequential phases. We showed that introducing a weak measurement of variable strength between the two phases allows switching from a fully sloppy setup to a fully determined one where both phases can be estimated with quantum-limited precision. Our work provides an important insight into sloppiness detection in quantum systems, with promising applications in quantum metrology and imaging, as well as in quantum security and quantum monitoring.
Quan Mai, Susan Gauch, Douglas Adams
We present Boolean-aware attention (BoolAttn), a novel attention mechanism that dynamically adjusts token focus based on Boolean operators (e.g., and, or, not). Our model employs specialized Boolean experts, each tailored to amplify or suppress attention for operator-specific contexts. A predefined gating mechanism activates the corresponding experts based on the detected Boolean type. Experiments on Boolean retrieval datasets demonstrate that integrating BoolAttn with BERT greatly enhances the model's capability to process Boolean queries.
Abdelrahman Abdallah, Jamshid Mozafari, Bhawna Piryani, Mohammed Ali, Adam Jatowt
Knowledge-intensive tasks, particularly open-domain question answering
(ODQA), document reranking, and retrieval-augmented language modeling, require
a balance between retrieval accuracy and generative flexibility. Traditional
retrieval models, such as BM25 and Dense Passage Retrieval (DPR), efficiently
retrieve from large corpora but often lack semantic depth. Generative models
like GPT-4o provide richer contextual understanding but face challenges in
maintaining factual consistency. In this work, we conduct a systematic
evaluation of retrieval-based, generation-based, and hybrid models, with a
primary focus on their performance in ODQA and related retrieval-augmented
tasks. Our results show that dense retrievers, particularly DPR, achieve strong
performance in ODQA with a top-1 accuracy of 50.17% on NQ, while hybrid models
improve nDCG@10 scores on BEIR from 43.42 (BM25) to 52.59, demonstrating their
strength in document reranking. Additionally, we analyze language modeling
tasks using WikiText-103, showing that retrieval-based approaches like BM25
achieve lower perplexity compared to generative and hybrid methods,
highlighting their utility in retrieval-augmented generation. By providing
detailed comparisons and practical insights into the conditions where each
approach excels, we aim to facilitate future optimizations in retrieval,
reranking, and generative models for ODQA and related knowledge-intensive
applications.
Authors' comments: work in progress
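The nDCG@10 metric reported in the abstract above can be sketched as follows. This is a minimal illustration, assuming linear gain (some evaluations use 2^rel - 1 instead) and assuming the ideal ranking is formed from the same list of graded relevances that is passed in. The function names are illustrative. The result lies in [0, 1], whereas the abstract reports scores on a 0-100 scale.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Here `relevances` lists the graded relevance of each retrieved document in ranked order, so a ranking that already places the most relevant documents first yields 1.0.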
Rachid Guerraoui, Anne-Marie Kermarrec, Diana Petrescu, Rafael Pires, Mathis Randl, Martijn de Vos
Large language models (LLMs) have demonstrated remarkable capabilities across
various domains but remain susceptible to hallucinations and inconsistencies,
limiting their reliability. Retrieval-augmented generation (RAG) mitigates
these issues by grounding model responses in external knowledge sources.
Existing RAG workflows often leverage a single vector database, which is
impractical in the common setting where information is distributed across
multiple repositories. We introduce RAGRoute, a novel mechanism for federated
RAG search. RAGRoute dynamically selects relevant data sources at query time
using a lightweight neural network classifier. By not querying every data
source, this approach significantly reduces query overhead, improves retrieval
efficiency, and minimizes the retrieval of irrelevant information. We evaluate
RAGRoute using the MIRAGE and MMLU benchmarks and demonstrate its effectiveness
in retrieving relevant documents while reducing the number of queries. RAGRoute
reduces the total number of queries by up to 77.5% and communication volume by
up to 76.2%.
Authors' comments: To appear in the proceedings of EuroMLSys'25
Laura Perez-Beltrachini, Mirella Lapata
Retrieval augmented Question Answering (QA) helps QA models overcome knowledge gaps by incorporating retrieved evidence, typically a set of passages, alongside the question at test time. Previous studies show that this approach improves QA performance and reduces hallucinations, without, however, assessing whether the retrieved passages are indeed useful at answering correctly. In this work, we propose to quantify the uncertainty of a QA model via estimating the utility of the passages it is provided with. We train a lightweight neural model to predict passage utility for a target QA model and show that while simple information theoretic metrics can predict answer correctness up to a certain extent, our approach efficiently approximates or outperforms more expensive sampling-based methods. Code and data are available at https://github.com/lauhaide/ragu.
Gabriele Berton, Carlo Masone
Retrieving images from the same location as a given query is an important
component of multiple computer vision tasks, like Visual Place Recognition,
Landmark Retrieval, Visual Localization, 3D reconstruction, and SLAM. However,
existing solutions are built to specifically work for one of these tasks, and
are known to fail when the requirements slightly change or when they meet
out-of-distribution data. In this paper we combine a variety of existing
methods, training techniques, and datasets to train a retrieval model, called
MegaLoc, that is performant on multiple tasks. We find that MegaLoc (1)
achieves state of the art on a large number of Visual Place Recognition
datasets, (2) achieves impressive results on common Landmark Retrieval
datasets, and (3) sets a new state of the art for Visual Localization on the
LaMAR datasets, where we changed only the retrieval method in the existing
localization pipeline. The code for MegaLoc is available at
https://github.com/gmberton/MegaLoc
Authors' comments: Tech Report
Milan Gritta, Huiyin Xue, Gerasimos Lampouras
Speculative decoding (SD) accelerates Large Language Model (LLM) generation
by using an efficient draft model to propose the next few tokens, which are
verified by the LLM in a single forward call, reducing latency while preserving
its outputs. We focus on retrieval-based SD where the draft model retrieves the
next tokens from a non-parametric datastore. Sparse retrieval (REST), which
operates on the surface form of strings, is currently the dominant paradigm due
to its simplicity and scalability. However, its effectiveness is limited due to
the usage of short contexts and exact string matching. Instead, we introduce
Dense Retrieval for Speculative Decoding (DReSD), a novel framework that uses
approximate nearest neighbour search with contextualised token embeddings to
retrieve the most semantically relevant token sequences for SD. Extensive
experiments show that DReSD achieves (on average) 87% higher acceptance rates,
65% longer accepted token sequences, and 19% faster generation speeds compared to sparse
retrieval (REST).
Authors' comments: Under Review
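The draft-then-verify loop described in the abstract above can be sketched as a toy example. It illustrates the sparse, exact-match style of retrieval-based drafting with greedy verification, not DReSD itself. The names `retrieve_draft`, `verify`, and `target_next` are hypothetical, and the repeated `target_next` calls stand in for what would be a single batched forward pass of the LLM.

```python
def retrieve_draft(context, datastore, max_suffix=3, draft_len=4):
    """Find the longest context suffix in the datastore and return the tokens after it."""
    for n in range(min(max_suffix, len(context)), 0, -1):
        suffix = context[-n:]
        # Stop early enough that at least one token follows the match.
        for i in range(len(datastore) - n):
            if datastore[i:i + n] == suffix:
                return datastore[i + n:i + n + draft_len]
    return []


def verify(context, draft, target_next):
    """Accept the longest draft prefix the target model agrees with, plus one target token."""
    accepted = []
    for tok in draft:
        if target_next(context + accepted) != tok:
            break  # first disagreement: discard the rest of the draft
        accepted.append(tok)
    # The target model always contributes the next token, so output is preserved.
    accepted.append(target_next(context + accepted))
    return accepted


# Toy usage: the "target model" deterministically continues a fixed sentence.
target_seq = ["the", "cat", "sat", "on", "a", "mat"]


def target_next(ctx):
    return target_seq[len(ctx)] if len(ctx) < len(target_seq) else "<eos>"


datastore = ["the", "cat", "sat", "on", "the", "mat"]
draft = retrieve_draft(["the"], datastore)
new_tokens = verify(["the"], draft, target_next)
```

The exact string matching in `retrieve_draft` is the limitation the abstract attributes to sparse retrieval. A dense variant would replace it with a nearest-neighbour lookup over token embeddings.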