Caleb G. Abbott, Justin R. Crepp, Brian Sands
Multi-plane phase retrieval sensors, such as the curvature and
nonlinear curvature wavefront sensors (WFS), contain tip/tilt information
embedded in their signals. We have built a nonlinear curvature WFS to study
different wavefront reconstruction methods and test the ability to extract
tip/tilt information. Using reliable and fast centroiding algorithms, combined
with knowledge of the measured z-distance to each measurement plane, we
demonstrate that image jitter may be sensed and compensated for using a fast
steering mirror and the WFS alone, i.e. without the need for peripheral
components such as quad-cells or access to a separate scientific imaging
channel. This approach, which is both precise and accurate, corroborates
previous numerical simulations and is expected to improve the overall
reconstruction accuracy of multi-plane phase retrieval sensors including higher
order spatial modes.
Authors' comments: 10 pages, 8 figures, SPIE conference paper
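The centroid-plus-distance idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes small-angle geometry, in which a pure tilt displaces the image centroid in a plane at distance z by roughly z times the tilt angle. The function names, the pixel pitch parameter, and the reference centroid are hypothetical.

```python
def centroid(image):
    """Intensity-weighted centroid (row, col) of a 2-D list of pixel values."""
    total = 0.0
    row_sum = 0.0
    col_sum = 0.0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            total += value
            row_sum += r * value
            col_sum += c * value
    if total == 0:
        raise ValueError("image contains no flux")
    return row_sum / total, col_sum / total


def estimate_tilt(image, reference_centroid, pixel_pitch_m, z_m):
    """Small-angle tilt estimate (radians) along the row and column axes.

    Assumption (not from the paper): centroid displacement = z_m * tilt.
    """
    r, c = centroid(image)
    dr = (r - reference_centroid[0]) * pixel_pitch_m  # displacement in metres
    dc = (c - reference_centroid[1]) * pixel_pitch_m
    return dr / z_m, dc / z_m
```

With several measurement planes, one such estimate per plane could be averaged before being sent to the steering mirror; how the planes are actually combined is not stated in the abstract.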
Yao Ding, Yuqing Wu, Ziyang Ding
With the acceleration of technological innovation, efficient retrieval and classification of patent literature have become essential for intellectual property management and enterprise R&D. Traditional keyword and rule-based retrieval methods often fail to address complex query intents or capture semantic associations across technical domains, resulting in incomplete and low-relevance results. This study presents an automated patent retrieval framework integrating Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) technology. The system comprises three components: 1) a preprocessing module for patent data standardization, 2) a high-efficiency vector retrieval engine leveraging LLM-generated embeddings, and 3) a RAG-enhanced query module that combines external document retrieval with context-aware response generation. Evaluations were conducted on the Google Patents dataset (2006-2024), containing millions of global patent records with metadata such as filing date, domain, and status. The proposed gpt-3.5-turbo-0125-RAG configuration achieved 80.5% semantic matching accuracy and 92.1% recall, surpassing baseline LLM methods by 28 percentage points. The framework also demonstrated strong generalization in cross-domain classification and semantic clustering tasks. These results validate the effectiveness of LLM-RAG integration for intelligent patent retrieval, providing a foundation for next-generation AI-driven intellectual property analysis platforms.
Bongsu Kim
In dense retrieval, effective training hinges on selecting high quality hard
negatives while avoiding false negatives. Recent methods apply heuristics based
on positive document scores to identify hard negatives, improving both
performance and interpretability. However, these global, example agnostic
strategies often miss instance specific false negatives. To address this, we
propose a learnable adapter module that monitors Bi-Encoder representations to
estimate the likelihood that a hard negative is actually a false negative. This
probability is modeled dynamically and contextually, enabling fine-grained,
query specific judgments. The predicted scores are used in two downstream
components: (1) resampling, where negatives are reweighted during training, and
(2) reranking, where top-k retrieved documents are reordered at inference.
Empirical results on standard benchmarks show that our adapter-enhanced
framework consistently outperforms strong Bi-Encoder baselines, underscoring
the benefit of explicit false negative modeling in dense retrieval.
Authors' comments: 8 pages, 4 figures, submitted to AAAI 2026
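One way the reweighting step could look is sketched below. This is an illustration under stated assumptions, not the paper's method: it uses an InfoNCE-style contrastive loss and down-weights each negative by one minus its estimated false-negative probability. The function name and the `1 - p` weighting are hypothetical choices.

```python
import math


def reweighted_contrastive_loss(pos_score, neg_scores, false_neg_probs):
    """Contrastive loss in which likely false negatives contribute less.

    pos_score: similarity between the query and its positive document.
    neg_scores: similarities between the query and each hard negative.
    false_neg_probs: estimated probability that each negative is really
        a positive (assumed here to come from an adapter-like module).
    """
    if len(neg_scores) != len(false_neg_probs):
        raise ValueError("one probability is required per negative")
    # Assumed weighting: a negative with probability p keeps weight 1 - p.
    weights = [1.0 - p for p in false_neg_probs]
    pos_term = math.exp(pos_score)
    neg_term = sum(w * math.exp(s) for w, s in zip(weights, neg_scores))
    return -math.log(pos_term / (pos_term + neg_term))
```

When every negative is judged a certain false negative, the loss vanishes, so such examples no longer push the query away from documents that may in fact be relevant.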
Antoine Chaffin, Raphaël Sourty
Neural ranking has become a cornerstone of modern information retrieval.
While single vector search remains the dominant paradigm, it suffers from the
shortcoming of compressing all the information into a single vector. This
compression leads to notable performance degradation in out-of-domain,
long-context, and reasoning-intensive retrieval tasks. Multi-vector approaches
pioneered by ColBERT aim to address these limitations by preserving individual
token embeddings and computing similarity via the MaxSim operator. This
architecture has demonstrated superior empirical advantages, including enhanced
out-of-domain generalization, long-context handling, and performance in complex
retrieval scenarios. Despite these compelling empirical results and clear
theoretical advantages, the practical adoption and public availability of late
interaction models remain low compared to their single-vector counterparts,
primarily due to a lack of accessible and modular tools for training and
experimenting with such models. To bridge this gap, we introduce PyLate, a
streamlined library built on top of Sentence Transformers to support
multi-vector architectures natively, inheriting its efficient training,
advanced logging, and automated model card generation while requiring minimal
code changes to code templates users are already familiar with. By offering
multi-vector-specific features such as efficient indexes, PyLate aims to
accelerate research and real-world application of late interaction models,
thereby unlocking their full potential in modern IR systems. Finally, PyLate
has already enabled the development of state-of-the-art models, including
GTE-ModernColBERT and Reason-ModernColBERT, demonstrating its practical utility
for both research and production environments.
Authors' comments: 5 pages
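The MaxSim operator mentioned above can be sketched in a few lines. This is a plain-Python illustration of the scoring rule only, not PyLate's API: each query token embedding is matched to its most similar document token embedding, and the per-token maxima are summed. The function names are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_tokens, doc_tokens):
    """Late-interaction score between token-embedding lists.

    For every query token, take the maximum dot product over all
    document tokens, then sum these maxima over the query tokens.
    """
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Two query tokens scored against two document tokens.
score = maxsim([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.5, 0.5]])
```

Because the document keeps one vector per token, no single vector has to summarize the whole text, which is the compression issue the abstract attributes to single-vector search.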
Pengcheng Wang, Sheng Li, Takahiro Shinozaki
In this paper, we propose RAG-Boost (ST-ShinozakiLab Task I system), which enhances the baseline LLM-based ASR system of the MLC-SLM Challenge (task I) with a retrieval-augmented generation (RAG) module on the fly. Each partial ASR hypothesis queries a vector store of audio-text pairs and domain terms, and the retrieved results are fused with the live ASR hypotheses to fix recognition errors. The fused hypotheses are passed to the LLM, yielding improved responses.
Authors' comments: accepted at Interspeech2025 MLC-SLM Challenge workshop (task I system description)
Jiawei Li, Chengye Yang, Yaochen Zhang, Weilin Sun, Lei Meng, Xiangxu Meng
The goal of construction site risk and hazard identification is to enhance
safety management through automation. Existing research based on large language
models falls into two categories: image-text matching for collaborative
reasoning, which struggles with complex hazard features, and instruction
fine-tuning or dialogue guidance using professional datasets, which suffers
from high training costs and poor generalization. To address this, we propose a
hazard identification method using similar case retrieval enhancement. By
integrating external knowledge and retrieved case contexts via prompt
fine-tuning, we mitigate misjudgments caused by limited domain knowledge and
weak feature associations. Our method includes three modules: retrieval
library, image similarity retrieval, and large model retrieval enhancement,
enabling efficient recognition without training. Experiments on real
construction data show significant improvements. For instance, GLM-4V's
recognition accuracy increased to 50%, a 35.49% boost. The method enhances
accuracy, context understanding, and stability, offering new theoretical and
technical support for hazard detection.
Authors' comments: in Chinese language
Kennedy Edemacu, Vinay M. Shashidhar, Micheal Tuape, Dan Abudu, Beakcheol Jang, Jong Wook Kim
Retrieval-Augmented Generation (RAG) has emerged as a powerful approach to boost the capabilities of large language models (LLMs) by incorporating external, up-to-date knowledge sources. However, this introduces a potential vulnerability to knowledge poisoning attacks, where attackers can compromise the knowledge source to mislead the generation model. One such attack is PoisonedRAG, in which injected adversarial texts steer the model to generate an attacker-chosen response to a target question. In this work, we propose novel defense methods, FilterRAG and ML-FilterRAG, to mitigate the PoisonedRAG attack. First, we propose a new property that differentiates adversarial texts from clean texts in the knowledge data source. Next, we employ this property to filter adversarial texts out from clean ones in the design of our proposed approaches. Evaluation of these methods on benchmark datasets demonstrates their effectiveness, with performance close to that of the original RAG systems.
Authors' comments: Preprint for Submission
Rubin Wei, Jiaqi Cao, Jiarui Wang, Jushi Kai, Qipeng Guo, Bowen Zhou, Zhouhan Lin
While modern decoder-only LLMs achieve superior performance across various domains, hallucinations have risen to be a common problem in their generated text, hindering their application in knowledge-intensive tasks. Retriever-augmented generation (RAG) offers a solution, but the non-parametric nature of the retriever hinders its deep interaction with the LLM. In this work, we propose to decouple memorization from the LLM decoder using a pretrained, differentiable external memory. The external memory is an MLP pretrained by imitating the behavior of a retriever on the entire pretraining dataset. Our resulting architecture, which comprises a transformer decoder and an external MLP memory pretrained on language modeling and retriever imitation respectively, demonstrates strong perplexity and performance on downstream tasks. Experiments show our architecture exhibits steeper power-law scaling with model size, achieving 17.5% and 24.1% improvement on WikiText-103 and Web datasets compared to decoder-only models while benefiting from added training without overfitting. We demonstrate superior performance on three hallucination benchmarks and nine memory-intensive tasks. Additionally, our approach delivers $80\times$ speedup over $k$NN-LM (500M tokens) and $1.3\times$ faster inference than decoder-only models. Unlike $k$NN-LM, which impairs reasoning, our MLP memory improves StrategyQA performance. We will open-source our code and models in the future.
Yi Jiang, Sendong Zhao, Jianbo Li, Haochun Wang, Lizhe Zhang, Yan Liu, Bin Qin
Retrieval-Augmented Generation (RAG) has emerged as a promising framework for
enhancing the capabilities of Large Language Models (LLMs), especially in
knowledge-intensive tasks. Despite its advantages, current RAG methods often
struggle to *fully exploit knowledge during generation*. In particular, the
synergy between the model's internal parametric knowledge and external
retrieved knowledge remains limited. Retrieved contents may sometimes mislead
generation, while certain generated content can guide the model toward more
accurate outputs. In this work, we propose Collaborative Chain-of-Agents, a
framework designed to explicitly enhance the synergy between parametric and
retrieved knowledge. Specifically, we first introduce CoCoA-zero, a multi-agent
RAG framework that first performs conditional knowledge induction and then
reasons answers. Building on this, we develop CoCoA, a long-chain training
strategy that synthesizes extended multi-agent reasoning trajectories from
CoCoA-zero to fine-tune the LLM. This strategy enhances the model's capability
to explicitly integrate and jointly leverage parametric and retrieved
knowledge. Experimental results show that CoCoA-zero and CoCoA achieve superior
performance on open-domain and multi-hop QA tasks.
Authors' comments: code available at https://github.com/liunian-Jay/CoCoA
Sateesh Kumar, Shivin Dass, Georgios Pavlakos, Roberto Martín-Martín
In this work, we study the problem of data retrieval for few-shot imitation
learning: selecting data from a large dataset to train a performant policy for
a specific task, given only a few target demonstrations. Prior methods retrieve
data using a single-feature distance heuristic, assuming that the best
demonstrations are those that most closely resemble the target examples in
visual, semantic, or motion space. However, this approach captures only a
subset of the relevant information and can introduce detrimental
demonstrations, e.g., retrieving data from unrelated tasks due to similar scene
layouts, or selecting similar motions from tasks with divergent goals. We
present COLLAGE, a method for COLLective data AGgrEgation in few-shot imitation
learning that uses an adaptive late fusion mechanism to guide the selection of
relevant demonstrations based on a task-specific combination of multiple cues.
COLLAGE follows a simple, flexible, and efficient recipe: it assigns weights to
subsets of the dataset that are pre-selected using a single feature (e.g.,
appearance, shape, or language similarity), based on how well a policy trained
on each subset predicts actions in the target demonstrations. These weights are
then used to perform importance sampling during policy training, sampling data
more densely or sparsely according to estimated relevance. COLLAGE is general
and feature-agnostic, allowing it to combine any number of subsets selected by
any retrieval heuristic, and to identify which subsets provide the greatest
benefit for the target task. In extensive experiments, COLLAGE outperforms
state-of-the-art retrieval and multi-task learning approaches by 5.1% in
simulation across 10 tasks, and by 16.6% in the real world across 6 tasks,
where we perform retrieval from the large-scale DROID dataset. More information
at https://robin-lab.cs.utexas.edu/COLLAGE .
Authors' comments: Accepted at the Conference on Robot Learning (CoRL), 2025. Project
page: https://robin-lab.cs.utexas.edu/COLLAGE
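The weighting-then-sampling recipe described above can be sketched as follows. This is an illustration under assumptions, not the COLLAGE implementation: the abstract does not say how prediction quality is turned into weights, so a softmax over negative losses is used here as a stand-in. The function names, the temperature parameter, and the subset labels are hypothetical.

```python
import math
import random


def subset_weights(losses, temperature=1.0):
    """Turn per-subset action-prediction losses into normalized weights.

    losses: dict mapping subset name -> loss of a policy trained on that
        subset, measured on the target demonstrations (lower is better).
    Assumed mapping: softmax over negative losses.
    """
    best = min(losses.values())  # subtracted for numerical stability
    raw = {name: math.exp(-(loss - best) / temperature)
           for name, loss in losses.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


def sample_batch(subsets, weights, batch_size, rng):
    """Importance-sample demonstrations: pick a subset by weight, then an item."""
    names = list(subsets)
    probs = [weights[name] for name in names]
    batch = []
    for _ in range(batch_size):
        name = rng.choices(names, weights=probs, k=1)[0]
        batch.append(rng.choice(subsets[name]))
    return batch
```

Subsets whose policies predict the target actions well are drawn more often, while poorly matching subsets are still sampled occasionally rather than discarded outright.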
Yiqun Chen, Erhan Zhang, Lingyong Yan, Shuaiqiang Wang, Jizhou Huang, Dawei Yin, Jiaxin Mao
In question-answering (QA) systems, Retrieval-Augmented Generation (RAG) has become pivotal in enhancing response accuracy and reducing hallucination issues. The architecture of RAG systems varies significantly, encompassing single-round RAG, iterative RAG, and reasoning RAG, each tailored to address different types of queries. Due to the varying complexity of real-world queries, a fixed RAG pipeline often struggles to balance performance and cost efficiency across different queries. To address this challenge, we propose an adaptive RAG framework called MAO-ARAG, which leverages multi-agent orchestration. Our adaptive RAG is conceived as a multi-turn framework. Specifically, we define multiple executor agents, representing typical RAG modules such as query reformulation agents, document selection agents, and generation agents. A planner agent intelligently selects and integrates the appropriate agents from these executors into a suitable workflow tailored for each query, striving for high-quality answers while maintaining reasonable costs. During each turn, the planner agent is trained using reinforcement learning, guided by an outcome-based reward (F1 score) and a cost-based penalty, continuously improving answer quality while keeping costs within a reasonable range. Experiments conducted on multiple QA datasets demonstrate that our approach, which dynamically plans workflows for each query, not only achieves high answer quality but also maintains both cost and latency within acceptable limits. The code of MAO-ARAG is available at https://github.com/chenyiqun/Agentic-RAG.
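A reward of the kind described, an F1 score minus a cost penalty, can be sketched as below. This is a minimal illustration, not the MAO-ARAG code: the token-level F1 follows the common QA convention, while the linear penalty, the `cost_weight` value, and the unit of `cost` are assumptions.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-level F1 between a predicted and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def planner_reward(prediction, reference, cost, cost_weight=0.1):
    """Outcome reward minus an assumed linear penalty on workflow cost."""
    return token_f1(prediction, reference) - cost_weight * cost
```

Raising `cost_weight` would push a planner trained on this signal toward shorter workflows, at the price of lower answer quality on hard queries.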
Sebastian Wind, Jeta Sopa, Daniel Truhn, Mahshad Lotfinia, Tri-Thien Nguyen, Keno Bressem, Lisa Adams, Mirabela Rusu et al.
Clinical decision-making in radiology increasingly benefits from artificial intelligence (AI), particularly through large language models (LLMs). However, traditional retrieval-augmented generation (RAG) systems for radiology question answering (QA) typically rely on single-step retrieval, limiting their ability to handle complex clinical reasoning tasks. Here we propose an agentic RAG framework enabling LLMs to autonomously decompose radiology questions, iteratively retrieve targeted clinical evidence from Radiopaedia, and dynamically synthesize evidence-based responses. We evaluated 24 LLMs spanning diverse architectures, parameter scales (0.5B to >670B), and training paradigms (general-purpose, reasoning-optimized, clinically fine-tuned), using 104 expert-curated radiology questions from previously established RSNA-RadioQA and ExtendedQA datasets. Agentic retrieval significantly improved mean diagnostic accuracy over zero-shot prompting (73% vs. 64%; P<0.001) and conventional online RAG (73% vs. 68%; P<0.001). The greatest gains occurred in mid-sized models (e.g., Mistral Large improved from 72% to 81%) and small-scale models (e.g., Qwen 2.5-7B improved from 55% to 71%), while very large models (>200B parameters) demonstrated minimal changes (<2% improvement). Additionally, agentic retrieval reduced hallucinations (mean 9.4%) and retrieved clinically relevant context in 46% of cases, substantially aiding factual grounding. Even clinically fine-tuned models exhibited meaningful improvements (e.g., MedGemma-27B improved from 71% to 81%), indicating complementary roles of retrieval and fine-tuning. These results highlight the potential of agentic frameworks to enhance factuality and diagnostic accuracy in radiology QA, particularly among mid-sized LLMs, warranting future studies to validate their clinical utility.
Wenchao Gu, Zongyi Lyu, Yanlin Wang, Hongyu Zhang, Cuiyun Gao, Michael R. Lyu
Code retrieval aims to provide users with desired code snippets based on users' natural language queries. With the development of deep learning technologies, adopting pre-trained models for this task has become mainstream. Considering the retrieval efficiency, most of the previous approaches adopt a dual-encoder for this task, which encodes the description and code snippet into representation vectors, respectively. However, the model structure of the dual-encoder tends to limit the model's performance, since it lacks the interaction between the code snippet and description at the bottom layer of the model during training. To improve the model's effectiveness while preserving its efficiency, we propose a framework, which adopts Self-AdaPtive Model Distillation for Efficient CodE Retrieval, named SPENCER. SPENCER first adopts the dual-encoder to narrow the search space and then adopts the cross-encoder to improve accuracy. To improve the efficiency of SPENCER, we propose a novel model distillation technique, which can greatly reduce the inference time of the dual-encoder while maintaining the overall performance. We also propose a teaching assistant selection strategy for our model distillation, which can adaptively select the suitable teaching assistant models for different pre-trained models during the model distillation to ensure the model performance. Extensive experiments demonstrate that the combination of dual-encoder and cross-encoder improves overall performance compared to solely dual-encoder-based models for code retrieval. Besides, our model distillation technique retains over 98% of the overall performance while reducing the inference time of the dual-encoder by 70%.
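The two-stage pattern described above, a dual-encoder shortlist followed by cross-encoder reranking, can be sketched as follows. This is an illustration of the general pattern only, not SPENCER itself: the embeddings are assumed to be precomputed, and `cross_score` is a hypothetical stand-in for a cross-encoder that scores a query and snippet jointly.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_then_rerank(query, query_vec, snippet_vecs, cross_score, k):
    """Shortlist with cheap vector similarity, then rerank the shortlist.

    snippet_vecs: dict mapping snippet text -> precomputed embedding.
    cross_score: callable (query, snippet) -> relevance score; stands in
        for the more expensive cross-encoder.
    """
    # Stage 1: the dual-encoder narrows the search space to k candidates.
    shortlist = sorted(
        snippet_vecs,
        key=lambda snippet: dot(query_vec, snippet_vecs[snippet]),
        reverse=True,
    )[:k]
    # Stage 2: the cross-encoder is only run on the shortlist.
    return sorted(shortlist, key=lambda s: cross_score(query, s), reverse=True)
```

The expensive scorer runs k times per query instead of once per snippet in the corpus, which is why the first stage's speed dominates overall latency.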
Moumita Asad, Rafed Muhammad Yasir, Armin Geramirad, Sam Malek
Information Retrieval-based Bug Localization aims to identify buggy source files for a given bug report. While existing approaches -- ranging from vector space models to deep learning models -- have shown potential in this domain, their effectiveness is often limited by the vocabulary mismatch between bug reports and source code. To address this issue, we propose a novel Large Language Model (LLM) based bug localization approach, called GenLoc. Given a bug report, GenLoc leverages an LLM equipped with code-exploration functions to iteratively analyze the code base and identify potential buggy files. To gather better context, GenLoc may optionally retrieve semantically relevant files using vector embeddings. GenLoc has been evaluated on over 9,000 real-world bug reports from six large-scale Java projects. Experimental results show that GenLoc outperforms five state-of-the-art bug localization techniques across multiple metrics, achieving an average improvement of more than 60% in Accuracy@1.
Salah Eddine Bekhouche, Azeddine Benlamoudi, Yazid Bounab, Fadi Dornaika, Abdenour Hadid
Arabic poses a particular challenge for natural language processing (NLP) and information retrieval (IR) due to its complex morphology, optional diacritics and the coexistence of Modern Standard Arabic (MSA) and various dialects. Despite the growing global significance of Arabic, it is still underrepresented in NLP research and benchmark resources. In this paper, we present an enhanced Dense Passage Retrieval (DPR) framework developed specifically for Arabic. At the core of our approach is a novel Attentive Relevance Scoring (ARS) that replaces standard interaction mechanisms with an adaptive scoring function that more effectively models the semantic relevance between questions and passages. Our method integrates pre-trained Arabic language models and architectural refinements to improve retrieval performance and significantly increase ranking accuracy when answering Arabic questions. The code is made publicly available at https://github.com/Bekhouche/APR.
Daeyong Kwon, SeungHeon Doh, Juhan Nam
Recent advancements in large language models (LLMs) have demonstrated
remarkable capabilities across diverse domains. While they exhibit strong
zero-shot performance on various tasks, LLMs' effectiveness in music-related
applications remains limited due to the relatively small proportion of
music-specific knowledge in their training data. To address this limitation, we
propose MusT-RAG, a comprehensive framework based on Retrieval Augmented
Generation (RAG) to adapt general-purpose LLMs for text-only music question
answering (MQA) tasks. RAG is a technique that provides external knowledge to
LLMs by retrieving relevant context information when generating answers to
questions. To optimize RAG for the music domain, we (1) propose MusWikiDB, a
music-specialized vector database for the retrieval stage, and (2) utilize
context information during both inference and fine-tuning processes to
effectively transform general-purpose LLMs into music-specific models. Our
experiment demonstrates that MusT-RAG significantly outperforms traditional
fine-tuning approaches in enhancing LLMs' music domain adaptation capabilities,
showing consistent improvements across both in-domain and out-of-domain MQA
benchmarks. Additionally, our MusWikiDB proves substantially more effective
than general Wikipedia corpora, delivering superior performance and
computational efficiency.
Authors' comments: 8 pages, 2 figures
Roxana Petcu, Samarth Bhargav, Maarten de Rijke, Evangelos Kanoulas
Understanding and solving complex reasoning tasks is vital for addressing the information needs of a user. Although dense neural models learn contextualised embeddings, they still underperform on queries containing negation. To understand this phenomenon, we study negation in both traditional neural information retrieval and LLM-based models. We (1) introduce a taxonomy of negation that derives from philosophical, linguistic, and logical definitions; (2) generate two benchmark datasets that can be used to evaluate the performance of neural information retrieval models and to fine-tune models for a more robust performance on negation; and (3) propose a logic-based classification mechanism that can be used to analyze the performance of retrieval models on existing datasets. Our taxonomy produces a balanced data distribution over negation types, providing a better training setup that leads to faster convergence on the NevIR dataset. Moreover, we propose a classification schema that reveals the coverage of negation types in existing datasets, offering insights into the factors that might affect the generalization of fine-tuned models on negation.
Aryan Raj, Astitva Veer Garg, Anitha D
Retrieval-Augmented Language Models (RALMs) face significant challenges in
reducing factual errors, particularly in document relevance evaluation and
knowledge integration. We introduce a framework for structured relevance
assessment that enhances RALM robustness through improved document evaluation,
balanced intrinsic and external knowledge integration, and effective handling
of unanswerable queries. Our approach employs a multi-dimensional scoring
system that considers both semantic matching and source reliability, utilizing
embedding-based relevance scoring and synthetic training data with
mixed-quality documents. We implement specialized benchmarking on niche topics,
a knowledge integration mechanism, and an "unknown" response protocol for
queries with insufficient knowledge coverage. Preliminary evaluations
demonstrate significant reductions in hallucination rates and improved
transparency in reasoning processes. Our framework advances the development of
more reliable question-answering systems capable of operating effectively in
dynamic environments with variable data quality. While challenges persist in
accurately distinguishing credible information and balancing system latency
with thoroughness, this work represents a meaningful step toward enhancing RALM
reliability.
Authors' comments: International Conference on ICT for Sustainable Development (ICT4SD)
George Ibrahim, Rita Ramos, Yova Kementchedjhieva
Multilingual vision-language models have made significant strides in image
captioning, yet they still lag behind their English counterparts due to limited
multilingual training data and costly large-scale model parameterization.
Retrieval-augmented generation (RAG) offers a promising alternative by
conditioning caption generation on retrieved examples in the target language,
reducing the need for extensive multilingual training. However, multilingual
RAG captioning models often depend on retrieved captions translated from
English, which can introduce mismatches and linguistic biases relative to the
source language. We introduce CONCAP, a multilingual image captioning model
that integrates retrieved captions with image-specific concepts, enhancing the
contextualization of the input image and grounding the captioning process
across different languages. Experiments on the XM3600 dataset indicate that
CONCAP enables strong performance on low- and mid-resource languages, with
highly reduced data requirements. Our findings highlight the effectiveness of
concept-aware retrieval augmentation in bridging multilingual performance gaps.
Authors' comments: Published as a conference paper at COLM 2025
Shreya Meel, Sennur Ulukus
We introduce the problem of symmetric private information retrieval (SPIR) on replicated databases modeled by a simple graph. In this model, each vertex corresponds to a server, and a message is replicated on two servers if and only if there is an edge between them. We consider the setting where the server-side common randomness necessary to accomplish SPIR is also replicated at the servers according to the graph, and we call this message-specific common randomness. In this setting, we establish a lower bound on the SPIR capacity, i.e., the maximum download rate, for general graphs, by proposing an achievable SPIR scheme. Next, we prove that, for any SPIR scheme to be feasible, the minimum size of message-specific randomness should be equal to the size of a message. Finally, by providing matching upper bounds, we derive the exact SPIR capacity for the class of path and regular graphs.