B. Ghosh, H. Harikumar, S. Rana
Nearest-neighbour retrieval is central to classification and explainable-AI pipelines, but current practice relies on hand-tuning feature layers and distance metrics. We propose Targeted Manifold Manipulation-Nearest Neighbour (TMM-NN), which reconceptualises retrieval by assessing how readily each sample can be nudged into a designated region of the feature manifold; neighbourhoods are defined by a sample's responsiveness to a targeted perturbation rather than absolute geometric distance. TMM-NN implements this through a lightweight, query-specific trigger patch. The patch is added to the query image, and the network is weakly ``backdoored'' so that any input with the patch is steered toward a dummy class. Images similar to the query need only a slight shift and are classified as the dummy class with high probability, while dissimilar ones are less affected. By ranking candidates by this confidence, TMM-NN retrieves the most semantically related neighbours. Robustness analysis and benchmark experiments confirm this trigger-based ranking outperforms traditional metrics under noise and across diverse tasks.
Nathan Scales, Nathanael Schärli, Olivier Bousquet
Despite the popularity of retrieval-augmented generation (RAG) as a solution for grounded QA in both academia and industry, current RAG methods struggle with questions where the necessary information is distributed across many documents or where retrieval needs to be combined with complex reasoning. Recently, the LOFT study has shown that this limitation also applies to approaches based on long-context language models, with the QUEST benchmark exhibiting particularly large headroom. In this paper, we provide an in-depth analysis of the factors contributing to the poor performance on QUEST-LOFT, publish updated numbers based on a thorough human evaluation, and demonstrate that RAG can be optimized to significantly outperform long-context approaches when combined with a structured output format containing reasoning and evidence, optionally followed by answer re-verification.
Muzakkiruddin Ahmed Mohammed, John R. Talburt, Leon Claasssens, Adriaan Marais
Industrial part specification extraction from unstructured text remains a persistent challenge in manufacturing, procurement, and maintenance, where manual processing is both time-consuming and error-prone. This paper introduces a retrieval-augmented multi-LLM ensemble framework that orchestrates nine state-of-the-art Large Language Models (LLMs) within a structured three-phase pipeline. RAGsemble addresses key limitations of single-model systems by combining the complementary strengths of model families including Gemini (2.0, 2.5, 1.5), OpenAI (GPT-4o, o4-mini), Mistral Large, and Gemma (1B, 4B, 3n-e4b), while grounding outputs in factual data using FAISS-based semantic retrieval. The system architecture consists of three stages: (1) parallel extraction by diverse LLMs, (2) targeted research augmentation leveraging high-performing models, and (3) intelligent synthesis with conflict resolution and confidence-aware scoring. RAG integration provides real-time access to structured part databases, enabling the system to validate, refine, and enrich outputs through similarity-based reference retrieval. Experimental results using real industrial datasets demonstrate significant gains in extraction accuracy, technical completeness, and structured output quality compared to leading single-LLM baselines. Key contributions include a scalable ensemble architecture for industrial domains, seamless RAG integration throughout the pipeline, comprehensive quality assessment mechanisms, and a production-ready solution suitable for deployment in knowledge-intensive manufacturing environments.
Authors' comments: The 17th International Conference on Knowledge and Systems Engineering
Mile Stankovic
Chunking quality determines RAG system performance. Current methods partition documents individually, but complex queries need information scattered across multiple sources: the knowledge fragmentation problem. We introduce Cross-Document Topic-Aligned (CDTA) chunking, which reconstructs knowledge at the corpus level. It first identifies topics across documents, maps segments to each topic, and synthesizes them into unified chunks. On HotpotQA multi-hop reasoning, our method reached 0.93 faithfulness versus 0.83 for contextual retrieval and 0.78 for semantic chunking, a 12% improvement over current industry best practice (p < 0.05). On UAE Legal texts, it reached 0.94 faithfulness with 0.93 citation accuracy. At k = 3, it maintains 0.91 faithfulness while semantic methods drop to 0.68, with a single CDTA chunk containing information requiring multiple traditional fragments. Indexing costs are higher, but synthesis produces information-dense chunks that reduce query-time retrieval needs. For high-query-volume applications with distributed knowledge, cross-document synthesis improves measurably over within-document optimization.
Fei Yu, Quan Deng, Shengeng Tang, Yuehua Li, Lechao Cheng
Understanding 3D scenes in open-world settings poses fundamental challenges
for vision and robotics, particularly due to the limitations of
closed-vocabulary supervision and static annotations. To address this, we
propose a unified framework for Open-World 3D Scene Graph Generation with
Retrieval-Augmented Reasoning, which enables generalizable and interactive 3D
scene understanding. Our method integrates Vision-Language Models (VLMs) with
retrieval-based reasoning to support multimodal exploration and language-guided
interaction. The framework comprises two key components: (1) a dynamic scene
graph generation module that detects objects and infers semantic relationships
without fixed label sets, and (2) a retrieval-augmented reasoning pipeline that
encodes scene graphs into a vector database to support text/image-conditioned
queries. We evaluate our method on 3DSSG and Replica benchmarks across four
tasks-scene question answering, visual grounding, instance retrieval, and task
planning-demonstrating robust generalization and superior performance in
diverse environments. Our results highlight the effectiveness of combining
open-vocabulary perception with retrieval-based reasoning for scalable 3D scene
understanding.
Authors' comments: Accepted by AAAI 2026
Dhananjay Ashok, Suraj Nair, Mutasem Al-Darabsah, Choon Hui Teo, Tarun Agarwal, Jonathan May
Zero-shot dense retrieval is a challenging setting where a document corpus is
provided without relevant queries, necessitating a reliance on pretrained dense
retrievers (DRs). However, since these DRs are not trained on the target
corpus, they struggle to represent semantic differences between similar
documents. To address this failing, we introduce a training-free representation
sharpening framework that augments a document's representation with information
that helps differentiate it from similar documents in the corpus. On over
twenty datasets spanning multiple languages, the representation sharpening
framework proves consistently superior to traditional retrieval, setting a new
state-of-the-art on the BRIGHT benchmark. We show that representation
sharpening is compatible with prior approaches to zero-shot dense retrieval and
consistently improves their performance. Finally, we address the
performance-cost tradeoff presented by our framework and devise an
indexing-time approximation that preserves the majority of our performance
gains over traditional retrieval, yet suffers no additional inference-time
cost.
Authors' comments: 15 pages, 4 figures
Chao Zhang, Yuhao Wang, Derong Xu, Haoxin Zhang, Yuanjie Lyu, Yuhao Chen, Shuochen Liu, Tong Xu et al.
Retrieval-Augmented Generation (RAG) utilizes external knowledge to augment
Large Language Models' (LLMs) reliability. For flexibility, agentic RAG employs
autonomous, multi-round retrieval and reasoning to resolve queries. Although
recent agentic RAG has improved via reinforcement learning, they often incur
substantial token overhead from search and reasoning processes. This trade-off
prioritizes accuracy over efficiency. To address this issue, this work proposes
TeaRAG, a token-efficient agentic RAG framework capable of compressing both
retrieval content and reasoning steps. 1) First, the retrieved content is
compressed by augmenting chunk-based semantic retrieval with a graph retrieval
using concise triplets. A knowledge association graph is then built from
semantic similarity and co-occurrence. Finally, Personalized PageRank is
leveraged to highlight key knowledge within this graph, reducing the number of
tokens per retrieval. 2) Besides, to reduce reasoning steps, Iterative
Process-aware Direct Preference Optimization (IP-DPO) is proposed.
Specifically, our reward function evaluates the knowledge sufficiency by a
knowledge matching mechanism, while penalizing excessive reasoning steps. This
design can produce high-quality preference-pair datasets, supporting iterative
DPO to improve reasoning conciseness. Across six datasets, TeaRAG improves the
average Exact Match by 4% and 2% while reducing output tokens by 61% and 59% on
Llama3-8B-Instruct and Qwen2.5-14B-Instruct, respectively. Code is available at
https://github.com/Applied-Machine-Learning-Lab/TeaRAG.
Authors' comments: 32 pages
Pablo A. Peña R., James S. Jenkins
We present \texttt{EMPEROR}, an open-source Python framework designed for
efficient exoplanet detection and characterisation with radial velocities (RV).
\texttt{EMPEROR} integrates Dynamic Nested Sampling (DNS) and Adaptive Parallel
Tempering (APT) Markov Chain Monte Carlo (MCMC), supporting multiple noise
models such as Gaussian Processes (GPs) and Moving Averages (MA). The framework
enables systematic model comparison using statistical metrics, including
Bayesian evidence ($\ln{\mathcal{Z}}$) and Bayesian Information Criterion
(BIC), while providing automated, publish-ready visualisations.
\texttt{EMPEROR} is evaluated across three distinct systems to assess its
capabilities in different detection scenarios. Sampling performance, model
selection, and the search for Earth-mass planets are evaluated in data for 51
Pegasi, HD 55693 and Barnard's Star (GJ 699). For 51 Pegasi, APT achieves an
effective sampling increase over DNS by a factor 3.76, while retrieving tighter
parameter estimates. For HD 55693 the stellar rotation
$P_{\text{rot}}=29.72^{+0.01}_{-0.02}$ and magnetic cycle
$P_{\text{mag}}=2557.0^{+70.1}_{-36.7}$ are recovered, while demonstrating the
sensitivity of $\ln{\mathcal{Z}}$ to prior selection. For Barnard's star,
several noise models are compared, and the confirmed planet parameters are
successfully retrieved with all of them. The best model shows a period of
3.1536$\pm$0.0003~d, minimum mass of 0.38$\pm$0.03 M$_{\rm{\oplus}}$, and
semi-major axis of 0.02315$\pm$0.00039~AU. Purely statistical inference might
be insufficient on its own for robust exoplanet detection. Effective
methodologies must integrate domain knowledge, heuristic criteria, and
multi-faceted model comparisons. The versatility of \texttt{EMPEROR} in
handling diverse noise structures, its systematic model selection, and its
improved performance make it a valuable tool for RV exoplanetary studies.
Authors' comments: Accepted for publication in A&A Sect. 15. Numerical methods and
codes. The official acceptance date is 19/10/2025
Grigory Kovalev, Natalia Loukachevitch, Mikhail Tikhomirov, Olga Babina, Pavel Mamaev
In this paper, we present a novel series of Russian information retrieval datasets constructed from the "Did you know..." section of Russian Wikipedia. Our datasets support a range of retrieval tasks, including fact-checking, retrieval-augmented generation, and full-document retrieval, by leveraging interesting facts and their referenced Wikipedia articles annotated at the sentence level with graded relevance. We describe the methodology for dataset creation that enables the expansion of existing Russian Information Retrieval (IR) resources. Through extensive experiments, we extend the RusBEIR research by comparing lexical retrieval models, such as BM25, with state-of-the-art neural architectures fine-tuned for Russian, as well as multilingual models. Results of our experiments show that lexical methods tend to outperform neural models on full-document retrieval, while neural approaches better capture lexical semantics in shorter texts, such as in fact-checking or fine-grained retrieval. Using our newly created datasets, we also analyze the impact of document length on retrieval performance and demonstrate that combining retrieval with neural reranking consistently improves results. Our contribution expands the resources available for Russian information retrieval research and highlights the importance of accurate evaluation of retrieval models to achieve optimal performance. All datasets are publicly available at HuggingFace. To facilitate reproducibility and future research, we also release the full implementation on GitHub.
Ha Young Kim, Jun Li, Ana Beatriz Solana, Carolin M. Pirkl, Benedikt Wiestler, Julia A. Schnabel, Cosmin I. Bercea
Rare diseases represent the long tail of medical imaging, where AI models
often fail due to the scarcity of representative training data. In clinical
workflows, radiologists frequently consult case reports and literature when
confronted with unfamiliar findings. Following this line of reasoning, we
introduce RADAR, Retrieval Augmented Diagnostic Reasoning Agents, an agentic
system for rare disease detection in brain MRI. Our approach uses AI agents
with access to external medical knowledge by embedding both case reports and
literature using sentence transformers and indexing them with FAISS to enable
efficient similarity search. The agent retrieves clinically relevant evidence
to guide diagnostic decision making on unseen diseases, without the need of
additional training. Designed as a model-agnostic reasoning module, RADAR can
be seamlessly integrated with diverse large language models, consistently
improving their rare pathology recognition and interpretability. On the NOVA
dataset comprising 280 distinct rare diseases, RADAR achieves up to a 10.2%
performance gain, with the strongest improvements observed for open source
models such as DeepSeek. Beyond accuracy, the retrieved examples provide
interpretable, literature grounded explanations, highlighting
retrieval-augmented reasoning as a powerful paradigm for low-prevalence
conditions in medical imaging.
Authors' comments: Submitted on behalf of the PREDICTOM consortium
Yuanning Cui, Zequn Sun, Wei Hu, Zhangjie Fu
Large language models (LLMs) excel at reasoning but struggle with knowledge-intensive questions due to limited context and parametric knowledge. However, existing methods that rely on finetuned LLMs or GNN retrievers are limited by dataset-specific tuning and scalability on large or unseen graphs. We propose the LLM-KGFR collaborative framework, where an LLM works with a structured retriever, the Knowledge Graph Foundation Retriever (KGFR). KGFR encodes relations using LLM-generated descriptions and initializes entities based on their roles in the question, enabling zero-shot generalization to unseen KGs. To handle large graphs efficiently, it employs Asymmetric Progressive Propagation (APP)- a stepwise expansion that selectively limits high-degree nodes while retaining informative paths. Through node-, edge-, and path-level interfaces, the LLM iteratively requests candidate answers, supporting facts, and reasoning paths, forming a controllable reasoning loop. Experiments demonstrate that LLM-KGFR achieves strong performance while maintaining scalability and generalization, providing a practical solution for KG-augmented reasoning.
Manh Nguyen, Sunil Gupta, Dai Do, Hung Le
Hallucination mitigation remains a persistent challenge for large language models (LLMs), even as model scales grow. Existing approaches often rely on external knowledge sources, such as structured databases or knowledge graphs, accessed through prompting or retrieval. However, prompt-based grounding is fragile and domain-sensitive, while symbolic knowledge integration incurs heavy retrieval and formatting costs. Motivated by knowledge graphs, we introduce Graph-Retrieved Adaptive Decoding (GRAD), a decoding-time method that grounds generation in corpus-derived evidence without retraining. GRAD constructs a sparse token transition graph by accumulating next-token logits across a small retrieved corpus in a single forward pass. During decoding, graph-retrieved logits are max-normalized and adaptively fused with model logits to favor high-evidence continuations while preserving fluency. Across three models and a range of question-answering benchmarks spanning intrinsic, extrinsic hallucination, and factuality tasks, GRAD consistently surpasses baselines, achieving up to 9.7$\%$ higher intrinsic accuracy, 8.6$\%$ lower hallucination rates, and 6.9$\%$ greater correctness compared to greedy decoding, while attaining the highest truth--informativeness product score among all methods. GRAD offers a lightweight, plug-and-play alternative to contrastive decoding and knowledge graph augmentation, demonstrating that statistical evidence from corpus-level token transitions can effectively steer generation toward more truthful and verifiable outputs.
Yinsicheng Jiang, Yeqi Huang, Liang Cheng, Cheng Deng, Xuan Sun, Luo Mai
Retrieval-augmented generation (RAG) enhances large language models (LLMs) with retrieved context but often suffers from downgraded prefill performance as modern applications demand longer and more complex inputs. Existing caching techniques either preserve accuracy with low cache reuse or improve reuse at the cost of degraded reasoning quality. We present RAGBoost, an efficient RAG system that achieves high cache reuse without sacrificing accuracy through accuracy-preserving context reuse. RAGBoost detects overlapping retrieved items across concurrent sessions and multi-turn interactions, using efficient context indexing, ordering, and de-duplication to maximize reuse, while lightweight contextual hints maintain reasoning fidelity. It integrates seamlessly with existing LLM inference engines and improves their prefill performance by 1.5-3X over state-of-the-art methods, while preserving or even enhancing reasoning accuracy across diverse RAG and agentic AI workloads. Our code is released at: https://github.com/Edinburgh-AgenticAI/RAGBoost.
Hung-Ting Chen, Xiang Liu, Shauli Ravfogel, Eunsol Choi
Most text retrievers generate \emph{one} query vector to retrieve relevant documents. Yet, the conditional distribution of relevant documents for the query may be multimodal, e.g., representing different interpretations of the query. We first quantify the limitations of existing retrievers. All retrievers we evaluate struggle more as the distance between target document embeddings grows. To address this limitation, we develop a new retriever architecture, \emph{A}utoregressive \emph{M}ulti-\emph{E}mbedding \emph{R}etriever (AMER). Our model autoregressively generates multiple query vectors, and all the predicted query vectors are used to retrieve documents from the corpus. We show that on the synthetic vectorized data, the proposed method could capture multiple target distributions perfectly, showing 4x better performance than single embedding model. We also fine-tune our model on real-world multi-answer retrieval datasets and evaluate in-domain. AMER presents 4 and 21\% relative gains over single-embedding baselines on two datasets we evaluate on. Furthermore, we consistently observe larger gains on the subset of dataset where the embeddings of the target documents are less similar to each other. We demonstrate the potential of using a multi-query vector retriever and open up a new direction for future work.
Artur Iasenovets, Fei Tang, Huihui Zhu, Ping Wang, Lei Liu
Permissioned blockchains ensure integrity and auditability of shared data but
expose query parameters to peers during read operations, creating privacy risks
for organizations querying sensitive records. This paper proposes a Private
Information Retrieval (PIR) mechanism to enable private reads from Hyperledger
Fabric's world state, allowing endorsing peers to process encrypted queries
without learning which record is accessed. We implement and benchmark a
PIR-enabled chaincode that performs ciphertext-plaintext (ct-pt) homomorphic
multiplication directly within evaluate transactions, preserving Fabric's
endorsement and audit semantics. The prototype achieves an average end-to-end
latency of 113 ms and a peer-side execution time below 42 ms, with
approximately 2 MB of peer network traffic per private read in development
mode--reducible by half under in-process deployment. Storage profiling across
three channel configurations shows near-linear growth: block size increases
from 77 kilobytes to 294 kilobytes and world-state from 112 kilobytes to 332
kilobytes as the ring dimension scales from 8,192 to 32,768 coefficients.
Parameter analysis further indicates that ring size and record length jointly
constrain packing capacity, supporting up to 512 records of 64 bytes each under
the largest configuration. These results confirm the practicality of PIR-based
private reads in Fabric for smaller, sensitive datasets and highlight future
directions to optimize performance and scalability.
Authors' comments: This work has been submitted to IEEE for possible publication
Rajan Das Gupta, Md Kishor Morol, Nafiz Fahad, Md Tanzib Hosain, Sumaya Binte Zilani Choya, Md Jakir Hossen
As the global burden of Alzheimer's disease (AD) continues to grow, early and
accurate detection has become increasingly critical, especially in regions with
limited access to advanced diagnostic tools. We propose BRAINS (Biomedical
Retrieval-Augmented Intelligence for Neurodegeneration Screening) to address
this challenge. This novel system harnesses the powerful reasoning capabilities
of Large Language Models (LLMs) for Alzheimer's detection and monitoring.
BRAINS features a dual-module architecture: a cognitive diagnostic module and a
case-retrieval module. The Diagnostic Module utilizes LLMs fine-tuned on
cognitive and neuroimaging datasets -- including MMSE, CDR scores, and brain
volume metrics -- to perform structured assessments of Alzheimer's risk.
Meanwhile, the Case Retrieval Module encodes patient profiles into latent
representations and retrieves similar cases from a curated knowledge base.
These auxiliary cases are fused with the input profile via a Case Fusion Layer
to enhance contextual understanding. The combined representation is then
processed with clinical prompts for inference. Evaluations on real-world
datasets demonstrate BRAINS effectiveness in classifying disease severity and
identifying early signs of cognitive decline. This system not only shows strong
potential as an assistive tool for scalable, explainable, and early-stage
Alzheimer's disease detection, but also offers hope for future applications in
the field.
Authors' comments: Accepted for publication in ICMLA 2025
Borong Zhang, Junjing Deng, Yi Jiang, Zichao Wendy Di
We present eMAGPIE (extended Multilevel-Adaptive-Guided Ptychographic Iterative Engine), a stochastic multigrid method for blind ptychographic phase retrieval that jointly recovers the object and the probe. We recast the task as the iterative minimization of a quadratic surrogate that majorizes the exit-wave misfit. From this surrogate, we derive closed-form updates, combined in a geometric-mean, phase-aligned joint step, yielding a simultaneous update of the object and probe with guaranteed descent of the sampled surrogate. This formulation naturally admits a multigrid acceleration that speeds up convergence. In experiments, eMAGPIE attains lower data misfit and phase error at comparable compute budgets and produces smoother, artifact-reduced phase reconstructions.
Sagar Dutta, Vipul Arora
This work presents a supervised deep hashing method for retrieving similar audio events. The proposed method, named AudioNet, is a deep-learning-based system for efficient hashing and retrieval of similar audio events using an audio example as a query. AudioNet achieves high retrieval performance on multiple standard datasets by generating binary hash codes for similar audio events, setting new benchmarks in the field, and highlighting its efficacy and effectiveness compare to other hashing methods. Through comprehensive experiments on standard datasets, our research represents a pioneering effort in evaluating the retrieval performance of similar audio events. A novel loss function is proposed which incorporates weighted contrastive and weighted pairwise loss along with hashcode balancing to improve the efficiency of audio event retrieval. The method adopts discrete gradient propagation, which allows gradients to be propagated through discrete variables during backpropagation. This enables the network to optimize the discrete hash codes using standard gradient-based optimization algorithms, which are typically used for continuous variables. The proposed method showcases promising retrieval performance, as evidenced by the experimental results, even when dealing with imbalanced datasets. The systematic analysis conducted in this study further supports the significant benefits of the proposed method in retrieval performance across multiple datasets. The findings presented in this work establish a baseline for future studies on the efficient retrieval of similar audio events using deep audio embeddings.
Pavan Kumar Perepu
Mathematical expressions (MEs) have complex two-dimensional structures in which symbols can be present at any nested depth like superscripts, subscripts, above, below etc. As MEs are represented using LaTeX format, several text retrieval methods based on string matching, vector space models etc., have also been applied for ME retrieval problem in the literature. As these methods are based on syntactic similarity, recently deep learning approaches based on embedding have been used for semantic similarity. In our present work, we have focused on the retrieval of mathematical expressions using deep learning approaches. In our approach, semantic features are extracted from the MEs using a deep recurrent neural network (DRNN) and these features have been used for matching and retrieval. We have trained the network for a classification task which determines the complexity of an ME. ME complexity has been quantified in terms of its nested depth. Based on the nested depth, we have considered three complexity classes of MEs: Simple, Medium and Complex. After training the network, outputs just before the the final fully connected layer are extracted for all the MEs. These outputs form the semantic features of MEs and are stored in a database. For a given ME query, its semantic features are computed using the trained DRNN and matched against the semantic feature database. Matching is performed based on the standard euclidean distance and top 'k' nearest matches are retrieved, where 'k' is a user-defined parameter. Our approach has been illustrated on a database of 829 MEs.
Min Fang, Zhihui Fu, Qibin Zhao, Jun Wang
Speculative decoding (SD) has emerged as an effective technique to accelerate large language model (LLM) inference without compromising output quality. However, the achievable speedup largely depends on the effectiveness of the drafting model. While model-based methods like EAGLE-2 are accurate but costly, retrieval-enhanced methods like SAM-Decoding rely on heuristic switching strategies that often trigger unnecessary retrievals. To address this, we propose ReSpec (\textbf{Re}trieval-enhanced \textbf{Spe}culative Decoding), a novel framework that transforms heuristic drafter switching into adaptive decision-making. ReSpec features three core innovations: 1) An \textbf{entropy-guided adaptive trigger} quantifies contextual predictability to initiate retrieval only when uncertainty is low, avoiding costly low-quality speculations. 2) A \textbf{feedback-driven candidate selection} leverages historical feedback to organize multiple high-quality candidates for parallel verification, maximizing retrieval utility. 3) A source-aware \textbf{relaxed verification strategy} applies strict checks to model-generated drafts while using a relaxed verification for retrieved drafts, achieving a better balance between accuracy and efficiency. Extensive experiments on Spec-Bench demonstrate that ReSpec achieves state-of-the-art acceleration,outperforming EAGLE-2 and SAM-Decoding by over $33\%$ and $25\%$, respectively, while maintaining output quality.