Huifeng Lin, Gang Su, Jintao Liang, You Wu, Rui Zhao, Ziyue Li
Retrieval-Augmented Generation (RAG) based on Large Language Models (LLMs) is
a powerful solution to understand and query the industry's closed-source
documents. However, basic RAG often struggles with complex QA tasks in legal
and regulatory domains, particularly when dealing with numerous government
documents. The top-$k$ strategy frequently misses golden chunks, leading to
incomplete or inaccurate answers. To address these retrieval bottlenecks, we
explore two strategies to improve evidence coverage and answer quality. The
first is a One-SHOT retrieval method that adaptively selects chunks based on a
token budget, allowing as much relevant content as possible to be included
within the model's context window. Additionally, we design modules to further
filter and refine the chunks. The second is an iterative retrieval strategy
built on a Reasoning Agentic RAG framework, where a reasoning LLM dynamically
issues search queries, evaluates retrieved results, and progressively refines
the context over multiple turns. We identify query drift and retrieval laziness
issues and further design two modules to tackle them. Through extensive
experiments on a dataset of government documents, we aim to offer practical
insights and guidance for real-world applications in legal and regulatory
domains.
Authors' comments: Under review at EMNLP 2025
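The token-budget selection described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_chunks_by_budget`, the greedy skip-if-too-large policy, and the whitespace word count standing in for a real tokenizer are all assumptions made for illustration.

```python
def select_chunks_by_budget(ranked_chunks, token_budget, count_tokens=None):
    """Greedily keep top-ranked chunks while the running token total fits the budget.

    ranked_chunks is assumed to be sorted by descending relevance.
    """
    if count_tokens is None:
        # Assumption: a whitespace word count stands in for a real tokenizer.
        count_tokens = lambda text: len(text.split())

    selected, used = [], 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost > token_budget:
            # This chunk does not fit; a shorter, lower-ranked chunk still might.
            continue
        selected.append(chunk)
        used += cost
    return selected, used
```

Unlike a fixed top-k cut, the number of chunks returned here varies with chunk length, so short chunks allow more evidence to fit in the same context window.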
Or Shachar, Uri Katz, Yoav Goldberg, Oren Glickman
We present NER Retriever, a zero-shot retrieval framework for ad-hoc Named
Entity Retrieval, a variant of Named Entity Recognition (NER), where the types
of interest are not provided in advance, and a user-defined type description is
used to retrieve documents mentioning entities of that type. Instead of relying
on fixed schemas or fine-tuned models, our method builds on internal
representations of large language models (LLMs) to embed both entity mentions
and user-provided open-ended type descriptions into a shared semantic space. We
show that internal representations, specifically the value vectors from
mid-layer transformer blocks, encode fine-grained type information more
effectively than commonly used top-layer embeddings. To refine these
representations, we train a lightweight contrastive projection network that
aligns type-compatible entities while separating unrelated types. The resulting
entity embeddings are compact, type-aware, and well-suited for nearest-neighbor
search. Evaluated on three benchmarks, NER Retriever significantly outperforms
both lexical and dense sentence-level retrieval baselines. Our findings provide
empirical support for representation selection within LLMs and demonstrate a
practical solution for scalable, schema-free entity retrieval. The NER
Retriever Codebase is publicly available at
https://github.com/ShacharOr100/ner_retriever
Authors' comments: Findings of EMNLP 2025
Baiqiang Wang, Qian Lou, Mengxin Zheng, Dongfang Zhao
Retrieval-Augmented Generation (RAG) has become a foundational component of modern AI systems, yet it introduces significant privacy risks by exposing user queries to service providers. To address this, we introduce PIR-RAG, a practical system for privacy-preserving RAG. PIR-RAG employs a novel architecture that uses coarse-grained semantic clustering to prune the search space, combined with a fast, lattice-based Private Information Retrieval (PIR) protocol. This design allows for the efficient retrieval of entire document clusters, uniquely optimizing for the end-to-end RAG workflow where full document content is required. Our comprehensive evaluation against strong baseline architectures, including graph-based PIR and Tiptoe-style private scoring, demonstrates PIR-RAG's scalability and its superior performance in terms of "RAG-Ready Latency", the true end-to-end time required to securely fetch content for an LLM. Our work establishes PIR-RAG as a viable and highly efficient solution for privacy in large-scale AI systems.
Xuejun Chang, Zaiqiao Meng, Debasis Ganguly
Retrievability of a document is a collection-based statistic that measures
its expected (reciprocal) rank of being retrieved within a specific rank
cut-off. A collection with uniformly distributed retrievability scores across
documents is an indicator of fair document exposure. While retrievability
scores have been used to quantify the fairness of exposure for a collection, in
our work, we use the distribution of retrievability scores to measure the
exposure bias of retrieval models. We hypothesise that an uneven distribution
of retrievability scores across the entire collection may not accurately
reflect exposure bias but rather indicate variations in topical relevance. As a
solution, we propose a topic-focused localised retrievability measure,
\textit{T-Retrievability} (topic-retrievability), which first computes
retrievability scores over multiple groups of topically-related documents, and
then aggregates these localised values to obtain the collection-level
statistics. Our analysis using this proposed T-Retrievability measure uncovers
new insights into the exposure characteristics of various neural ranking
models. The findings suggest that this localised measure provides a more
nuanced understanding of exposure fairness, offering a more reliable approach
for assessing document accessibility in IR systems.
Authors' comments: Accepted by Proceedings of the 34th ACM International Conference on
Information and Knowledge Management (CIKM 2025), November 10-14, 2025,
Seoul, Republic of Korea
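A rough sketch of the localised computation described in the abstract above: retrievability is accumulated as reciprocal rank within a cut-off, inequality is measured per topic group, and the per-group values are then aggregated. The use of the Gini coefficient as the inequality summary, the mean as the aggregation, and all function names are assumptions for illustration; the abstract does not specify them.

```python
from collections import defaultdict


def retrievability(rankings, cutoff):
    """Sum reciprocal ranks per document over all queries, within the rank cut-off.

    rankings maps a query id to its ranked list of document ids.
    """
    scores = defaultdict(float)
    for ranked in rankings.values():
        for rank, doc in enumerate(ranked[:cutoff], start=1):
            scores[doc] += 1.0 / rank
    return dict(scores)


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


def localised_gini(rankings, topic_groups, cutoff):
    """Mean of per-topic Gini values (assumed aggregation, not from the paper).

    topic_groups maps a topic id to the document ids in that topic.
    Documents never retrieved count with a score of zero.
    """
    scores = retrievability(rankings, cutoff)
    per_topic = [
        gini([scores.get(doc, 0.0) for doc in docs])
        for docs in topic_groups.values()
    ]
    return sum(per_topic) / len(per_topic) if per_topic else 0.0
```

Computing inequality inside each topic group keeps a topically popular document from being compared against documents that no query in the set would ever target.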
Yanbo Dai, Zhenlan Ji, Zongjie Li, Kuan Li, Shuai Wang
Retrieval-Augmented Generation (RAG) has become a standard approach for improving the reliability of large language models (LLMs). Prior work demonstrates the vulnerability of RAG systems by misleading them into generating attacker-chosen outputs through poisoning the knowledge base. However, this paper uncovers that such attacks could be mitigated by the strong \textit{self-correction ability (SCA)} of modern LLMs, which can reject false context once properly configured. This SCA poses a significant challenge for attackers aiming to manipulate RAG systems. In contrast to previous poisoning methods, which primarily target the knowledge base, we introduce \textsc{DisarmRAG}, a new poisoning paradigm that compromises the retriever itself to suppress the SCA and enforce attacker-chosen outputs. This compromise enables the attacker to straightforwardly embed anti-SCA instructions into the context provided to the generator, thereby bypassing the SCA. To this end, we present a contrastive-learning-based model editing technique that performs localized and stealthy edits, ensuring the retriever returns a malicious instruction only for specific victim queries while preserving benign retrieval behavior. To further strengthen the attack, we design an iterative co-optimization framework that automatically discovers robust instructions capable of bypassing prompt-based defenses. We extensively evaluate DisarmRAG across six LLMs and three QA benchmarks. Our results show near-perfect retrieval of malicious instructions, which successfully suppress SCA and achieve attack success rates exceeding 90\% under diverse defensive prompts. Moreover, the edited retriever remains stealthy under several detection methods, highlighting the urgent need for retriever-centric defenses.
Jonghyun Song, Youngjune Lee, Gyu-Hwung Cho, Ilhyeon Song, Saehun Kim, Yohan Jo
Vision-Language Pretrained (VLP) models have achieved impressive performance
on multimodal tasks, including text-image retrieval, based on dense
representations. Meanwhile, Learned Sparse Retrieval (LSR) has gained traction
in text-only settings due to its interpretability and efficiency with fast
term-based lookup via inverted indexes. Inspired by these advantages, recent
work has extended LSR to the multimodal domain. However, these methods often
rely on computationally expensive contrastive pre-training, or distillation
from a frozen dense model, which limits the potential for mutual enhancement.
To address these limitations, we propose a simple yet effective framework that
enables bi-directional learning between dense and sparse representations
through Self-Knowledge Distillation. This bi-directional learning is achieved
using an integrated similarity score, a weighted sum of dense and sparse
similarities, which serves as a shared teacher signal for both representations.
To ensure efficiency, we fine-tune the final layer of the dense encoder and the
sparse projection head, enabling easy adaptation of any existing VLP model.
Experiments on MSCOCO and Flickr30k demonstrate that our sparse retriever not
only outperforms existing sparse baselines, but also achieves performance
comparable to, or even surpassing, its dense counterparts, while retaining the
benefits of sparse models.
Authors' comments: accepted to CIKM 2025 short research paper track
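The shared teacher signal in the abstract above can be sketched for a single query as follows. This is only an illustration under stated assumptions: the convex weighting `weight` and `1 - weight`, the softmax over candidates, and the KL-divergence loss are choices made here, not details confirmed by the abstract, and all names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one query's candidate similarities."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same candidates."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def self_distillation_loss(dense_sims, sparse_sims, weight=0.5):
    """Distil the integrated score into both the dense and the sparse student.

    Assumption: the integrated score is weight * dense + (1 - weight) * sparse.
    """
    integrated = [
        weight * d + (1 - weight) * s for d, s in zip(dense_sims, sparse_sims)
    ]
    teacher = softmax(integrated)
    dense_loss = kl_divergence(teacher, softmax(dense_sims))
    sparse_loss = kl_divergence(teacher, softmax(sparse_sims))
    return dense_loss + sparse_loss
```

Because the teacher is built from both similarity types, each representation is pulled toward what the other already ranks well, which is the sense in which the learning is bi-directional.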
Weijia Liu, Jiuxin Cao, Bo Miao, Zhiheng Fu, Xuelin Zhu, Jiawei Ge, Bo Liu, Mehwish Nasim et al.
Current text-driven Video Moment Retrieval (VMR) methods encode all video
clips, including irrelevant ones, disrupting multimodal alignment and hindering
optimization. To this end, we propose a denoise-then-retrieve paradigm that
explicitly filters text-irrelevant clips from videos and then retrieves the
target moment using purified multimodal representations. Following this
paradigm, we introduce the Denoise-then-Retrieve Network (DRNet), comprising
Text-Conditioned Denoising (TCD) and Text-Reconstruction Feedback (TRF)
modules. TCD integrates cross-attention and structured state space blocks to
dynamically identify noisy clips and produce a noise mask to purify multimodal
video representations. TRF further distills a single query embedding from
purified video representations and aligns it with the text embedding, serving
as auxiliary supervision for denoising during training. Finally, we perform
conditional retrieval using text embeddings on purified video representations
for accurate VMR. Experiments on Charades-STA and QVHighlights demonstrate that
our approach surpasses state-of-the-art methods on all metrics. Furthermore,
our denoise-then-retrieve paradigm is adaptable and can be seamlessly
integrated into advanced VMR models to boost performance.
Authors' comments: Accepted by IJCAI 2025
Jaehyun Kwak, Ramahdani Muhammad Izaaz Inhar, Se-Young Yun, Sung-Ju Lee
Composed Image Retrieval (CIR) retrieves relevant images based on a reference
image and accompanying text describing desired modifications. However, existing
CIR methods only focus on retrieving the target image and disregard the
relevance of other images. This limitation arises because most methods
employing contrastive learning, which treats the target image as positive and
all other images in the batch as negatives, can inadvertently include false
negatives. This may result in retrieving irrelevant images, reducing user
satisfaction even when the target image is retrieved. To address this issue, we
propose Query-Relevant Retrieval through Hard Negative Sampling (QuRe), which
optimizes a reward model objective to reduce false negatives. Additionally, we
introduce a hard negative sampling strategy that selects images positioned
between two steep drops in relevance scores following the target image, to
effectively filter false negatives. In order to evaluate CIR models on their
alignment with human satisfaction, we create Human-Preference FashionIQ
(HP-FashionIQ), a new dataset that explicitly captures user preferences beyond
target retrieval. Extensive experiments demonstrate that QuRe achieves
state-of-the-art performance on FashionIQ and CIRR datasets while exhibiting
the strongest alignment with human preferences on the HP-FashionIQ dataset. The
source code is available at https://github.com/jackwaky/QuRe.
Authors' comments: Accepted to ICML 2025
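The hard negative sampling idea in the abstract above can be sketched as follows. The abstract does not define how a "steep drop" is detected, so this sketch assumes the two largest consecutive score gaps after the target; the function name and that criterion are illustrative assumptions rather than the QuRe implementation.

```python
def hard_negatives_between_drops(scores, target_index):
    """Return indices of candidates lying between the two steepest score drops.

    scores: relevance scores sorted in descending order.
    target_index: position of the target image in that ranking.
    Assumption: "steep drops" are the two largest consecutive gaps
    at or after the target position.
    """
    drops = [
        (scores[i] - scores[i + 1], i)
        for i in range(target_index, len(scores) - 1)
    ]
    if len(drops) < 2:
        return []
    top_two = sorted(drops, reverse=True)[:2]
    first, second = sorted(index for _, index in top_two)
    # Candidates after the first drop and up to the second drop.
    return list(range(first + 1, second + 1))
```

Candidates before the first drop score close to the target and may be false negatives, while candidates after the second drop are likely easy negatives; the band in between is what this sketch keeps.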
Ting-Wen Ko, Jyun-Yu Jiang, Pu-Jen Cheng
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external documents at inference time, enabling up-to-date knowledge access without costly retraining. However, conventional RAG methods retrieve passages independently, often leading to redundant, noisy, or insufficiently diverse context, which is particularly problematic in noisy corpora and for multi-hop questions. To address this, we propose Adaptive Passage Combination Retrieval (AdaPCR), a novel framework for open-domain question answering with black-box LMs. AdaPCR explicitly models dependencies between passages by considering passage combinations as units for retrieval and reranking. It consists of a context-aware query reformulation using concatenated passages, and a reranking step trained with a predictive objective aligned with downstream answer likelihood. Crucially, AdaPCR adaptively selects the number of retrieved passages without additional stopping modules. Experiments across several QA benchmarks show that AdaPCR outperforms baselines, particularly in multi-hop reasoning, demonstrating the effectiveness of modeling inter-passage dependencies for improved retrieval.
Antoine Nzeyimana, Andre Niyongabo Rubungo
The recent mainstream adoption of large language model (LLM) technology is enabling novel applications in the form of chatbots and virtual assistants across many domains. With the aim of grounding LLMs in trusted domains and avoiding the problem of hallucinations, retrieval-augmented generation (RAG) has emerged as a viable solution. In order to deploy sustainable RAG systems in low-resource settings, achieving high retrieval accuracy is not only a usability requirement but also a cost-saving strategy. Through empirical evaluations on a Kinyarwanda-language dataset, we find that the most limiting factors in achieving high retrieval accuracy are limited language coverage and inadequate sub-word tokenization in pre-trained language models. We propose a new retriever model, KinyaColBERT, which integrates two key concepts: late word-level interactions between queries and documents, and a morphology-based tokenization coupled with two-tier transformer encoding. This methodology results in lexically grounded contextual embeddings that are both fine-grained and self-contained. Our evaluation results indicate that KinyaColBERT outperforms strong baselines and leading commercial text embedding APIs on a Kinyarwanda agricultural retrieval benchmark. By adopting this retrieval strategy, we believe that practitioners in other low-resource settings can not only achieve reliable RAG systems but also deploy solutions that are more cost-effective.
Li-Cheng Shen, Jih-Kang Hsieh, Wei-Hua Li, Chu-Song Chen
Text-to-image retrieval (TIR) aims to find relevant images based on a textual
query, but existing approaches are primarily based on whole-image captions and
lack interpretability. Meanwhile, referring expression segmentation (RES)
enables precise object localization based on natural language descriptions but
is computationally expensive when applied across large image collections. To
bridge this gap, we introduce Mask-aware TIR (MaTIR), a new task that unifies
TIR and RES, requiring both efficient image search and accurate object
segmentation. To address this task, we propose a two-stage framework,
comprising a first stage for segmentation-aware image retrieval and a second
stage for reranking and object grounding with a multimodal large language model
(MLLM). First, we leverage SAM 2 to generate object masks and Alpha-CLIP to
extract region-level embeddings offline, enabling effective and scalable
online retrieval. Second, the MLLM is used to refine retrieval rankings and
generate bounding boxes, which are matched to segmentation masks. We evaluate
our approach on COCO and D$^3$ datasets, demonstrating significant improvements
in both retrieval accuracy and segmentation quality over previous methods.
Authors' comments: ICMR 2025
Martin Böckling, Heiko Paulheim, Andreea Iana
Large Language Models (LLMs) have showcased impressive reasoning abilities,
but often suffer from hallucinations or outdated knowledge. Knowledge Graph
(KG)-based Retrieval-Augmented Generation (RAG) remedies these shortcomings by
grounding LLM responses in structured external information from a knowledge
base. However, many KG-based RAG approaches struggle with (i) aligning KG and
textual representations, (ii) balancing retrieval accuracy and efficiency, and
(iii) adapting to dynamically updated KGs. In this work, we introduce
Walk&Retrieve, a simple yet effective KG-based framework that leverages
walk-based graph traversal and knowledge verbalization for corpus generation
for zero-shot RAG. Built around efficient KG walks, our method does not require
fine-tuning on domain-specific data, enabling seamless adaptation to KG
updates, reducing computational overhead, and allowing integration with any
off-the-shelf backbone LLM. Despite its simplicity, Walk&Retrieve performs
competitively, often outperforming existing RAG systems in response accuracy
and hallucination reduction. Moreover, it demonstrates lower query latency and
robust scalability to large KGs, highlighting the potential of lightweight
retrieval strategies as strong baselines for future RAG research.
Authors' comments: Accepted at the Information Retrieval's Role in RAG Systems (IR-RAG
2025) in conjunction with SIGIR 2025
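Walk-based traversal followed by verbalization, as described in the abstract above, can be sketched on a toy graph. The depth-limited exhaustive traversal, the template-based verbalization, and the names `enumerate_walks` and `verbalize` are assumptions for illustration; the abstract does not state which walk strategy or verbalizer Walk&Retrieve uses.

```python
def enumerate_walks(graph, start, max_hops):
    """Enumerate all simple walks of 1..max_hops edges starting from one entity.

    graph maps an entity to a list of (relation, neighbor) pairs.
    Each walk is a list of (head, relation, tail) triples.
    """
    walks = []
    stack = [(start, [])]
    while stack:
        node, path = stack.pop()
        if path:
            walks.append(path)
        if len(path) == max_hops:
            continue
        visited = {start} | {tail for _, _, tail in path}
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:  # avoid cycles
                stack.append((neighbor, path + [(node, relation, neighbor)]))
    return walks


def verbalize(walk):
    """Turn a walk into plain text with a simple template (an assumption)."""
    sentences = [
        f"{head} {relation.replace('_', ' ')} {tail}"
        for head, relation, tail in walk
    ]
    return ". ".join(sentences) + "."
```

The verbalized walks form a text corpus that any off-the-shelf retriever can index, and after a KG update only the walks touching changed entities would need to be regenerated.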
Jinyu Guo, Xunlei Chen, Qiyang Xia, Zhaokun Wang, Jie Ou, Libo Qin, Shunyu Yao, Wenhong Tian
Retrieval-Augmented Generation (RAG) encounters efficiency challenges when
scaling to massive knowledge bases while preserving contextual relevance. We
propose Hash-RAG, a framework that integrates deep hashing techniques with
systematic optimizations to address these limitations. Our queries directly
learn binary hash codes from knowledgebase code, eliminating intermediate
feature extraction steps, and significantly reducing storage and computational
overhead. Building upon this hash-based efficient retrieval framework, we
establish the foundation for fine-grained chunking. Consequently, we design a
Prompt-Guided Chunk-to-Context (PGCC) module that leverages retrieved
hash-indexed propositions and their original document segments through prompt
engineering to enhance the LLM's contextual awareness. Experimental evaluations
on NQ, TriviaQA, and HotpotQA datasets demonstrate that our approach achieves a
90% reduction in retrieval time compared to conventional methods while
maintaining considerable recall performance. Additionally, the proposed system
outperforms retrieval/non-retrieval baselines by 1.4-4.3% in EM scores.
Authors' comments: Accepted at Findings of ACL 2025
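The storage and speed benefit of binary hash codes in the Hash-RAG abstract above comes from comparing codes by Hamming distance. The sketch below shows only that retrieval step, with the codes taken as given; how Hash-RAG learns the codes is not shown, and the function names are hypothetical.

```python
def to_code(bits):
    """Pack a sequence of 0/1 values into a single Python integer."""
    code = 0
    for bit in bits:
        code = (code << 1) | (1 if bit else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def top_k_by_hamming(query_code, doc_codes, k):
    """Return the k document ids whose codes are closest to the query code.

    doc_codes maps a document id to its packed binary code.
    Ties are broken by document id so the result is deterministic.
    """
    ranked = sorted(
        doc_codes.items(),
        key=lambda item: (hamming(query_code, item[1]), item[0]),
    )
    return [doc_id for doc_id, _ in ranked[:k]]
```

Each comparison is one XOR and one bit count, which is much cheaper than a floating-point dot product over a dense embedding of comparable length.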
Jie Ou, Jinyu Guo, Shuaihong Jiang, Zhaokun Wang, Libo Qin, Shunyu Yao, Wenhong Tian
Retrieval-augmented generation (RAG) has emerged as a pivotal method for
expanding the knowledge of large language models. To handle complex queries
more effectively, researchers developed Adaptive-RAG (A-RAG) to enhance the
generated quality through multiple interactions with external knowledge bases.
Despite its effectiveness, A-RAG exacerbates the pre-existing efficiency
challenges inherent in RAG, which are attributable to its reliance on multiple
iterations of generation. Existing A-RAG approaches process all retrieved
contents from scratch. However, they ignore the situation where there is a
significant overlap in the content of the retrieval results across rounds. The
overlapping content is redundantly represented, which leads to a large
proportion of repeated computations, thus affecting the overall efficiency. To
address this issue, this paper introduces a model-agnostic approach that can be
generally applied to A-RAG methods, which is dedicated to reducing the
redundant representation process caused by the overlapping of retrieval
results. Specifically, we use cache access and parallel generation to speed up
the prefilling and decoding stages respectively. We also propose an
instruction-driven module that guides the model to attend more effectively to
each part of the content, in a way better suited to LLMs. Experiments show
that our approach achieves average speedups of 2.79x and 2.33x for prefilling
and decoding respectively while maintaining equal generation quality.
Authors' comments: Accepted at Findings of ACL 2025
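The redundancy described in the abstract above arises when the same retrieved document is re-encoded in every retrieval round. The sketch below shows the caching idea in its simplest form, keyed by document id. It is a simplification under stated assumptions: the class name and the `encode` callback are hypothetical, and a real LLM key-value cache also depends on token positions and the preceding context, which this sketch ignores.

```python
class RepresentationCache:
    """Reuse per-document representations across retrieval rounds."""

    def __init__(self, encode):
        # encode: a callable that turns document text into its representation.
        self._encode = encode
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, doc_id, text):
        """Return the cached representation, encoding only on first sight."""
        if doc_id in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[doc_id] = self._encode(text)
        return self._store[doc_id]
```

If two consecutive rounds return overlapping documents, only the documents not seen before trigger a call to `encode`, so the saved work grows with the overlap between rounds.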
Andrei-Laurentiu Bornea, Fadhel Ayed, Antonio De Domenico, Nicola Piovesan, Tareq Si Salem, Ali Maatouk
Artificial intelligence will be one of the key pillars of the next generation
of mobile networks (6G), as it is expected to provide novel added-value
services and improve network performance. In this context, large language
models have the potential to revolutionize the telecom landscape through intent
comprehension, intelligent knowledge retrieval, coding proficiency, and
cross-domain orchestration capabilities. This paper presents Telco-oRAG, an
open-source Retrieval-Augmented Generation (RAG) framework optimized for
answering technical questions in the telecommunications domain, with a
particular focus on 3GPP standards. Telco-oRAG introduces a hybrid retrieval
strategy that combines 3GPP domain-specific retrieval with web search,
supported by glossary-enhanced query refinement and a neural router for
memory-efficient retrieval. Our results show that Telco-oRAG improves the
accuracy in answering 3GPP-related questions by up to 17.6% and achieves a
10.6% improvement in lexicon queries compared to baselines. Furthermore,
Telco-oRAG reduces memory usage by 45% through targeted retrieval of relevant
3GPP series compared to baseline RAG, and enables open-source LLMs to reach
GPT-4-level accuracy on telecom benchmarks.
Authors' comments: 12 pages, 10 figures, 4 tables
Zongyuan Li, Pengfei Li, Runnan Qi, Yanan Ni, Lumin Jiang, Hui Wu, Xuebo Zhang, Kuihua Huang et al.
The lack of domain-specific data in the pre-training of Large Language Models (LLMs) severely limits LLM-based decision systems in specialized applications, while post-training a model for these scenarios requires significant computational resources. In this paper, we present Retrial-Augmented Learning (RAL), a reward-free self-supervised learning framework for LLMs that operates without model training. By developing Retrieval-Augmented Generation (RAG) into a module for organizing intermediate data, we realize a three-stage autonomous knowledge generation process: proposing a hypothesis, validating the hypothesis, and generating the knowledge. The method is evaluated in the LLM-PySC2 environment, a representative decision-making platform that combines sufficient complexity with domain-specific knowledge requirements. Experiments demonstrate that the proposed method effectively reduces hallucination by generating and utilizing validated knowledge, and increases decision-making performance at an extremely low cost. Meanwhile, the approach exhibits potential in out-of-distribution (OOD) tasks, robustness, and transferability, making it a cost-friendly but effective solution for decision-making problems and autonomous knowledge generation.
Aarush Sinha
Training effective dense retrieval models often relies on hard negative (HN) examples mined from the document corpus via methods like BM25 or cross-encoders (CE), processes that can be computationally demanding and require full corpus access. This paper introduces a different approach, an end-to-end pipeline where a Large Language Model (LLM) first generates a query from a passage, and then generates a hard negative example using \emph{only} that query text. This corpus-free negative generation contrasts with standard mining techniques. We evaluated this \textsc{LLM Query $\rightarrow$ LLM HN} approach against traditional \textsc{LLM Query $\rightarrow$ BM25 HN} and \textsc{LLM Query $\rightarrow$ CE HN} pipelines using E5-Base and GTE-Base models on several BEIR benchmark datasets. Our results show the proposed all-LLM pipeline achieves performance identical to both the BM25 and the computationally intensive CE baselines across nDCG@10, Precision@10, and Recall@100 metrics. This demonstrates that our corpus-free negative generation method matches the effectiveness of complex, corpus-dependent mining techniques, offering a potentially simpler and more efficient pathway for training high-performance retrievers without sacrificing results. We make the dataset including the queries and the hard-negatives for all three methods publicly available at https://huggingface.co/collections/chungimungi/arxiv-hard-negatives-68027bbc601ff6cc8eb1f449.
Quanyu Long, Jianda Chen, Zhengyuan Liu, Nancy F. Chen, Wenya Wang, Sinno Jialin Pan
Large Language Models (LLMs) have demonstrated remarkable capabilities across
numerous tasks, yet they often rely on external context to handle complex
tasks. While retrieval-augmented frameworks traditionally focus on selecting
top-ranked documents in a single pass, many real-world scenarios demand
compositional retrieval, where multiple sources must be combined in a
coordinated manner. In this work, we propose a tri-encoder sequential retriever
that models this process as a Markov Decision Process (MDP), decomposing the
probability of retrieving a set of elements into a sequence of conditional
probabilities and allowing each retrieval step to be conditioned on previously
selected examples. We train the retriever in two stages: first, we efficiently
construct supervised sequential data for initial policy training; we then
refine the policy to align with the LLM's preferences using a reward grounded
in the structural correspondence of generated programs. Experimental results
show that our method consistently and significantly outperforms baselines,
underscoring the importance of explicitly modeling inter-example dependencies.
These findings highlight the potential of compositional retrieval for tasks
requiring multiple pieces of evidence or examples.
Authors' comments: 19 pages, 8 figures
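The decomposition mentioned in the abstract above is the chain rule of probability applied to an ordered sequence of retrieved elements. The notation here is assumed for illustration and is not taken from the paper: $x$ is the query, $z_i$ is the element selected at step $i$, and $k$ is the number of retrieval steps.

```latex
P(z_1, \dots, z_k \mid x) \;=\; \prod_{i=1}^{k} P\bigl(z_i \mid x,\, z_1, \dots, z_{i-1}\bigr)
```

Each factor is one retrieval step conditioned on the query and on everything selected so far, which is what lets the process be treated as a sequence of decisions in an MDP.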
Abraham Itzhak Weinberg
Image retrieval remains a challenging task due to the complex interaction between human visual perception, memory, and computational processes. Current image search engines often struggle to efficiently retrieve images based on natural language descriptions, as they rely on time-consuming preprocessing, tagging, and machine learning pipelines. This paper introduces the Human-Oriented Retrieval Search Engine for Images (HORSE), a novel approach that leverages neuro-symbolic indexing to improve image retrieval by focusing on human-oriented indexing. By integrating cognitive science insights with advanced computational techniques, HORSE enhances the retrieval process, making it more aligned with how humans perceive, store, and recall visual information. The neuro-symbolic framework combines the strengths of neural networks and symbolic reasoning, mitigating their individual limitations. The proposed system optimizes image retrieval, offering a more intuitive and efficient solution for users. We discuss the design and implementation of HORSE, highlight its potential applications in fields such as design error detection and knowledge management, and suggest future directions for research to further refine the system's metrics and capabilities.
Ming Pang, Chunyuan Yuan, Xiaoyu He, Zheng Fang, Donghao Xie, Fanyi Qu, Xue Jiang, Changping Peng et al.
Traditional sparse and dense retrieval methods struggle to leverage general
world knowledge and often fail to capture the nuanced features of queries and
products. With the advent of large language models (LLMs), industrial search
systems have started to employ LLMs to generate identifiers for product
retrieval. Commonly used identifiers include (1) static/semantic IDs and (2)
product term sets. The first approach requires creating a product ID system
from scratch, missing out on the world knowledge embedded within LLMs. While
the second approach leverages this general knowledge, the significant
difference in word distribution between queries and products means that
product-based identifiers often do not align well with user search queries,
leading to missed product recalls. Furthermore, when queries contain numerous
attributes, these algorithms generate a large number of identifiers, making it
difficult to assess their quality, which results in low overall recall
efficiency.
To address these challenges, this paper introduces a novel e-commerce
retrieval paradigm: the Generative Retrieval and Alignment Model (GRAM). GRAM
employs joint training on text information from both queries and products to
generate shared text identifier codes, effectively bridging the gap between
queries and products. This approach not only enhances the connection between
queries and products but also improves inference efficiency. The model uses a
co-alignment strategy to generate codes optimized for maximizing retrieval
efficiency. Additionally, it introduces a query-product scoring mechanism to
compare product values across different codes, further boosting retrieval
efficiency. Extensive offline and online A/B testing demonstrates that GRAM
significantly outperforms traditional models and the latest generative
retrieval models, confirming its effectiveness and practicality.
Authors' comments: Accepted by WWW2025