Elias Lumer, Matt Melich, Olivia Zino, Elena Kim, Sara Dieter, Pradeep Honaganahalli Basavaraju, Vamse Kumar Subbiah, James A. Burke et al.
Recent advancements in Retrieval-Augmented Generation (RAG) have enabled Large Language Models to answer financial questions using external knowledge bases of U.S. SEC filings, earnings reports, and regulatory documents. However, existing work lacks a systematic comparison of vector-based and non-vector RAG architectures for financial documents, and the empirical impact of advanced RAG techniques on retrieval accuracy, answer quality, latency, and cost remains unclear. We present the first systematic evaluation comparing vector-based agentic RAG using hybrid search and metadata filtering against hierarchical node-based systems that traverse document structure without embeddings. We evaluate two enhancement techniques applied to the vector-based architecture: (i) cross-encoder reranking for retrieval precision, and (ii) small-to-big chunk retrieval for context completeness. Across 1,200 SEC 10-K, 10-Q, and 8-K filings on a 150-question benchmark, we measure retrieval metrics (MRR, Recall@5), answer quality through LLM-as-a-judge pairwise comparisons, latency, and preprocessing costs. Vector-based agentic RAG achieves a 68% win rate over hierarchical node-based systems with comparable latency (5.2 compared to 5.98 seconds). Cross-encoder reranking achieves a 59% absolute improvement at optimal parameters (10, 5) for MRR@5. Small-to-big retrieval achieves a 65% win rate over baseline chunking with only 0.2 seconds of additional latency. Our findings reveal that applying advanced RAG techniques to financial Q&A systems improves retrieval accuracy and answer quality, with cost-performance tradeoffs that must be considered in production.
Authors' comments: 8 pages, 2 figures
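The retrieval metrics named in the abstract above (MRR and Recall@5) can be illustrated with a minimal sketch. The chunk identifiers and relevance sets below are made-up toy data, not the paper's benchmark, and the helper names are hypothetical.

```python
from typing import Sequence, Set


def reciprocal_rank_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int) -> float:
    """Return 1/rank of the first relevant item within the top k, or 0.0 if none."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int) -> float:
    """Return the fraction of relevant items that appear within the top k."""
    if not relevant:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant)
    return hits / len(relevant)


# Toy data (hypothetical): one (ranking, relevant set) pair per question.
runs = [
    (["c3", "c1", "c7", "c2", "c9"], {"c1", "c2"}),
    (["c4", "c5", "c6", "c8", "c0"], {"c0"}),
]

# Average the per-question scores to get the benchmark-level metrics.
mrr_at_5 = sum(reciprocal_rank_at_k(r, rel, 5) for r, rel in runs) / len(runs)
mean_recall_at_5 = sum(recall_at_k(r, rel, 5) for r, rel in runs) / len(runs)
```

In this toy run the first relevant chunk appears at rank 2 for the first question and rank 5 for the second, so the reciprocal ranks are 1/2 and 1/5.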
Runwei Guan, Ka Lok Man, Feifan Chen, Shanliang Yao, Rongsheng Hu, Xiaohui Zhu, Jeremy Smith, Eng Gee Lim et al.
Natural language (NL) based vehicle retrieval is a task aiming to retrieve a vehicle that is most consistent with a given NL query from among all candidate vehicles. Because NL queries can be easily obtained, such a task is promising for building an interactive intelligent traffic system (ITS). Current solutions mainly focus on extracting both text and image features and mapping them to the same latent space to compare the similarity. However, existing methods usually use dependency analysis or semantic role-labelling techniques to find keywords related to vehicle attributes. These techniques may require a lot of pre-processing and post-processing work, and can extract the wrong keyword when the NL query is complex. To tackle these problems and simplify the process, we borrow the idea from named entity recognition (NER) and construct FindVehicle, a NER dataset in the traffic domain. It has 42.3k labelled NL descriptions of vehicle tracks, containing information such as the location, orientation, type and colour of the vehicle. FindVehicle also adopts both overlapping entities and fine-grained entities to meet further requirements. To verify its effectiveness, we propose a baseline NL-based vehicle retrieval model called VehicleFinder. Our experiment shows that by using text encoders pre-trained on FindVehicle, VehicleFinder achieves 87.7% precision and 89.4% recall when retrieving a target vehicle by text command on our homemade dataset based on UA-DETRAC. The time cost of VehicleFinder is 279.35 ms on one ARM v8.2 CPU and 93.72 ms on one RTX A4000 GPU, which is much faster than the Transformer-based system. The dataset is open-sourced at https://github.com/GuanRunwei/FindVehicle, and the implementation can be found at https://github.com/GuanRunwei/VehicleFinder-CTIM.
A. P. Konijnenberg, W. M. J. Coene, H. P. Urbach
Recently, efforts have been made to improve ptychography phase retrieval algorithms so that they are more robust against noise. Often the algorithm is adapted by changing the cost functional that needs to be minimized. In particular, it has been suggested that the cost functional should be obtained using a maximum-likelihood approach that takes the noise statistics into account. Here, we consider the different choices of cost functional, and how they affect the reconstruction results. We find that seemingly the only consistently reliable way to improve reconstruction results in the presence of noise is to reduce the step size of the update function. In addition, a noise-robust ptychographic reconstruction method has been proposed that relies on adapting the intensity constraints
Baptiste Lavie, João M. Mendonça, Christoph Mordasini, Matej Malik, Mickaël Bonnefoy, Brice-Olivier Demory, Maria Oreshenko, Simon L. Grimm et al.
We present an open-source retrieval code named HELIOS-Retrieval (hereafter
HELIOS-R), designed to obtain chemical abundances and temperature-pressure
profiles from inverting the measured spectra of exoplanetary atmospheres. In
the current implementation, we use an exact solution of the radiative transfer
equation, in the pure absorption limit, in our forward model, which allows us
to analytically integrate over all of the outgoing rays (instead of performing
Gaussian quadrature). Two chemistry models are considered: unconstrained
chemistry (where the mixing ratios are treated as free parameters) and
equilibrium chemistry (enforced via analytical formulae, where only the
elemental abundances are free parameters). The nested sampling algorithm allows
us to formally implement Occam's Razor based on a comparison of the Bayesian
evidence between models. We perform a retrieval analysis on the measured
spectra of the HR 8799b, c, d and e directly imaged exoplanets. Chemical
equilibrium is disfavored by the Bayesian evidence for HR 8799b, c and d. We
find superstellar C/O, C/H and O/H values for the outer HR 8799b and c
exoplanets, while the inner HR 8799d and e exoplanets have substellar C/O,
substellar C/H and superstellar O/H values. If these retrieved properties are
representative of the bulk compositions of the exoplanets, then they are
inconsistent with formation via gravitational instability (without late-time
accretion) and consistent with a core accretion scenario in which late-time
accretion of ices occurred differently for the inner and outer exoplanets. For
HR 8799e, we find that spectroscopy in the K band is crucial for constraining
C/O and C/H. HELIOS-R is publicly available as part of the Exoclimes Simulation
Platform (ESP; www.exoclime.org).
Authors' comments: 27 pages, 21 figures, 3 tables, published in AJ
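The evidence-based model comparison mentioned in the abstract above can be written out in standard form. The symbols below are generic textbook notation and are an assumption here, not necessarily the notation used by HELIOS-R: $D$ is the measured spectrum, $\theta$ the free parameters of model $M_i$, $\mathcal{L}$ the likelihood and $\pi$ the prior.

```latex
\mathcal{Z}_i = \int \mathcal{L}(D \mid \theta, M_i)\, \pi(\theta \mid M_i)\, \mathrm{d}\theta,
\qquad
\ln B_{12} = \ln \mathcal{Z}_1 - \ln \mathcal{Z}_2 .
```

A positive $\ln B_{12}$ favors model $M_1$ over $M_2$. Because the evidence averages the likelihood over the whole prior volume, a model with extra free parameters is penalized unless those parameters improve the fit enough to compensate, which is the sense in which the comparison implements Occam's Razor.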
Dimitris Stripelis, Patrick Foley, Mohammad Naseri, William Lindskog-Münzing, Chong Shen Ng, Daniel Janes Beutel, Nicholas D. Lane
RAG typically assumes centralized access to documents, which breaks down when knowledge is distributed across private data silos. We propose a secure Federated RAG system built using Flower that performs local silo retrieval, while server-side aggregation and text generation run inside an attested, confidential compute environment, enabling confidential remote LLM inference even in the presence of honest-but-curious or compromised servers. We also propose a cascading inference approach that incorporates a non-confidential third-party model (e.g., Amazon Nova) as auxiliary context without weakening confidentiality.
Authors' comments: 6 pages, 1 figure, 2 tables
Kumar Vijay Mishra, Henry Arguello, Brian M. Sadler
Hypercomplex signal processing (HSP) offers powerful tools for analyzing and processing multidimensional signals by explicitly exploiting inter-dimensional correlations through Clifford algebra. In recent years, hypercomplex formulations of the phase retrieval (PR) problem, wherein a complex-valued signal is recovered from intensity-only measurements, have attracted growing interest. Hypercomplex phase retrieval (HPR) naturally arises in a range of optical imaging and computational sensing applications, where signals are often modeled using quaternion- or octonion-valued representations. Similar to classical PR, HPR problems may involve measurements obtained via complex, hypercomplex, Fourier, or other structured sensing operators. These formulations open new avenues for the development of advanced HSP-based algorithms and theoretical frameworks. This chapter surveys emerging methodologies and applications of HPR, with particular emphasis on optical imaging systems.
Authors' comments: 21 pages, 4 figures, 2 tables. arXiv admin note: substantial text overlap with arXiv:2310.17660
Andrew Parry, Debasis Ganguly, Sean MacAvaney
Information retrieval evaluation often suffers from fragmented practices -- varying dataset subsets, aggregation methods, and pipeline configurations -- that undermine reproducibility and comparability, especially for foundation embedding models requiring robust out-of-domain performance. We introduce SuiteEval, a unified framework that offers automatic end-to-end evaluation, dynamic indexing that reuses on-disk indices to minimise disk usage, and built-in support for major benchmarks (BEIR, LoTTE, MS MARCO, NanoBEIR, and BRIGHT). Users only need to supply a pipeline generator. SuiteEval handles data loading, indexing, ranking, metric computation, and result aggregation. New benchmark suites can be added in a single line. SuiteEval reduces boilerplate and standardises evaluations to facilitate reproducible IR research, as a broader benchmark set is increasingly required.
Authors' comments: 5 pages, 3 figures, 2 tables, Accepted as a Demonstration to ECIR 2026
Yichen Tang, Weihang Su, Yiqun Liu, Qingyao Ai
Integrating external tools enables Large Language Models (LLMs) to interact with real-world environments and solve complex tasks. Given the growing scale of available tools, effective tool retrieval is essential to mitigate constraints of LLMs' context windows and ensure computational efficiency. Existing approaches typically treat tool retrieval as a traditional ad-hoc retrieval task, matching user queries against the entire raw tool documentation. In this paper, we identify three fundamental challenges that limit the effectiveness of this paradigm: (i) the incompleteness and structural inconsistency of tool documentation; (ii) the significant semantic and granular mismatch between user queries and technical tool documents; and, most importantly, (iii) the multi-aspect nature of tool utility, which involves distinct dimensions, such as functionality, input constraints, and output formats, that vary in format and importance. To address these challenges, we introduce Multi-Field Tool Retrieval, a framework designed to align user intent with tool representations through fine-grained, multi-field modeling. Experimental results show that our framework achieves SOTA performance on five datasets and a mixed benchmark, exhibiting superior generalizability and robustness.
Authors' comments: 12 pages, 4 figures
Yuping Lin, Zitao Li, Yue Xing, Pengfei He, Yingqian Cui, Yaliang Li, Bolin Ding, Jingren Zhou et al.
Recent studies have identified "retrieval heads" in Large Language Models (LLMs) responsible for extracting information from input contexts. However, prior works largely rely on static statistics aggregated across datasets, identifying heads that perform retrieval on average. This perspective overlooks the fine-grained temporal dynamics of autoregressive generation. In this paper, we investigate retrieval heads from a dynamic perspective. Through extensive analysis, we establish three core claims: (1) Dynamism: Retrieval heads vary dynamically across timesteps; (2) Irreplaceability: Dynamic retrieval heads are specific at each timestep and cannot be effectively replaced by static retrieval heads; and (3) Correlation: The model's hidden state encodes a predictive signal for future retrieval head patterns, indicating an internal planning mechanism. We validate these findings on the Needle-in-a-Haystack task and a multi-hop QA task, and quantify the differences in the utility of dynamic and static retrieval heads in a Dynamic Retrieval-Augmented Generation framework. Our study provides new insights into the internal mechanisms of LLMs.
Nilesh Gupta, Wei-Cheng Chang, Ngot Bui, Cho-Jui Hsieh, Inderjit S. Dhillon
Modern IR systems are increasingly tasked with answering complex, multi-faceted queries that require deep reasoning rather than simple keyword or semantic matching. While LLM-based IR has shown great promise, the prevailing retrieve-then-rerank paradigm inherits the limitations of embedding-based retrieval; parametric generative approaches are difficult to update with new information; and long-context methods that place the entire corpus in context are computationally infeasible for large document collections. To address these challenges, we introduce LATTICE, a hierarchical retrieval framework that enables an LLM to reason over and navigate large corpora with logarithmic search complexity by imposing a semantic tree structure on the corpus. Our approach consists of two stages: (1) an offline phase that organizes the corpus into a semantic hierarchy via either a bottom-up agglomerative strategy or a top-down divisive strategy using multi-level summaries and (2) an online traversal phase where a search LLM navigates this tree. A central challenge in such LLM-guided search is that the model's relevance judgments are noisy, context-dependent, and unaware of the hierarchy, making cross-branch and cross-level comparisons difficult. To overcome this, we propose a traversal algorithm that estimates calibrated latent relevance scores from local LLM outputs and aggregates them into a global path relevance metric. Our training-free framework achieves state-of-the-art zero-shot performance on the reasoning-intensive BRIGHT benchmark, demonstrating up to 9% improvement in Recall@100 and 5% in nDCG@10 over the next best zero-shot baseline. Furthermore, compared to the fine-tuned SOTA method DIVER-v2, LATTICE attains comparable results on BRIGHT subsets that use a static corpus for evaluation.
Chenhao Xu, Longxiang Gao, Yuan Miao, Xi Zheng
As large language models (LLMs) become increasingly adopted on edge devices, Retrieval-Augmented Generation (RAG) is gaining prominence as a solution to address factual deficiencies and hallucinations by integrating external knowledge. However, centralized RAG architectures face significant challenges in data privacy and scalability. For instance, smart healthcare services often rely on collecting sensitive patient data and building a centralized knowledge base to provide better diagnosis and treatment advice, while privacy concerns significantly impede this process. Besides, maintaining a comprehensive and continuously updated knowledge base is costly, particularly in response to regional epidemics and rapidly mutating viruses. To address these challenges, this paper introduces Distributed Retrieval-Augmented Generation (DRAG), a novel framework that improves data privacy by eliminating the need for a centralized knowledge base and restoring data control to owners. DRAG incorporates a Topic-Aware Random Walk (TARW) algorithm that leverages LLMs to extract query topics and facilitate targeted peer discovery within a peer-to-peer network, enabling efficient knowledge retrieval in decentralized environments. Extensive experiments across three diverse datasets and LLMs demonstrate that DRAG with TARW achieves near-centralized RAG performance by using half as many messages as flooding. The code is available at https://github.com/xuchenhao001/DRAG.
Gianluca Monaci, Rafael S. Rezende, Romain Deffayet, Gabriela Csurka, Guillaume Bono, Hervé Déjean, Stéphane Clinchant, Christian Wolf
Methods for navigation based on large-scale learning typically treat each episode as a new problem, where the agent is spawned with a clean memory in an unknown environment. While these generalization capabilities to an unknown environment are extremely important, we claim that, in a realistic setting, an agent should have the capacity to exploit information collected during earlier robot operations. We address this by introducing a new retrieval-augmented agent, trained with RL, capable of querying a database collected from previous episodes in the same environment and learning how to integrate this additional context information. We introduce a unique agent architecture for the general navigation task, evaluated on ObjectNav, ImageNav and Instance-ImageNav. Our retrieval and context encoding methods are data-driven and heavily employ vision foundation models (FM) for both semantic and geometric understanding. We propose new benchmarks for these settings and we show that retrieval allows zero-shot transfer across tasks and environments while significantly improving performance.
Chaofan Li, Zheng Liu, Jianlyv Chen, Defu Lian, Yingxia Shao
While retrieval techniques are widely used in practice, they still face significant challenges in cross-domain scenarios. Recently, generation-augmented methods have emerged as a promising solution to this problem. These methods enhance raw queries by incorporating additional information from an LLM-based generator, facilitating more direct retrieval of relevant documents. However, existing methods struggle with highly specialized situations that require extensive domain expertise. To address this problem, we present Reinforced-IR, a novel approach that jointly adapts a pre-trained retriever and generator for precise cross-domain retrieval. A key innovation of Reinforced-IR is its Self-Boosting framework, which enables retriever and generator to learn from each other's feedback. Specifically, the generator is reinforced to generate query augmentations that enhance the retriever's performance, while the retriever is trained to better discriminate the relevant documents identified by the generator. This iterative process allows the end-to-end retrieval performance to be progressively optimized using an unlabeled corpus from the target domain. In our experiment, Reinforced-IR outperforms existing domain adaptation methods by a large margin, leading to substantial improvements in retrieval quality across a wide range of application scenarios.
Bhaskar Mitra
Our world today is facing a confluence of several mutually reinforcing crises each of which intersects with concerns of social justice and emancipation. This paper is a provocation for the role of computer-mediated information access in our emancipatory struggles. We define emancipatory information retrieval as the study and development of information access methods that challenge various forms of human oppression, and situate its activities within broader collective emancipatory praxis. The term "emancipatory" here signifies the moral concerns of universal humanization of all peoples and the elimination of oppression to create the conditions under which we can collectively flourish. To develop an emancipatory research agenda for information retrieval (IR), in this paper we speculate about the practices that the community can adopt, enumerate some of the projects that the field should undertake, and discuss provocations to spark new ideas and directions for research. We challenge the field of IR research to embrace humanistic values and commit to universal emancipation and social justice. We also invite scholars from fields such as human-computer interaction, information sciences, media studies, design, social sciences, humanities, democratic theory, and critical theory, as well as legal and policy experts, civil rights and social justice activists, and artists to join us in realizing this transformation. In this process, we must both imagine post-oppressive worlds, and reimagine the role of IR in those worlds and in the journey that leads us there.
Weihang Su, Yichen Tang, Qingyao Ai, Junxi Yan, Changyue Wang, Hongning Wang, Ziyi Ye, Yujia Zhou et al.
Retrieval-augmented generation (RAG) techniques have emerged as a promising solution to enhance the reliability of large language models (LLMs) by addressing issues like hallucinations, outdated knowledge, and domain adaptation. In particular, existing RAG methods append relevant documents retrieved from external corpora or databases to the input of LLMs to guide their generation process, which we refer to as the in-context knowledge injection method. While this approach is simple and often effective, it has inherent limitations. Firstly, increasing the context length and number of relevant documents can lead to higher computational overhead and degraded performance, especially in complex reasoning tasks. More importantly, in-context knowledge injection operates primarily at the input level, but LLMs store their internal knowledge in their parameters. This gap fundamentally limits the capacity of in-context methods. To this end, we introduce Parametric retrieval-augmented generation (Parametric RAG), a new RAG paradigm that integrates external knowledge directly into the parameters of feed-forward networks (FFN) of an LLM through document parameterization. This approach not only saves online computational costs by eliminating the need to inject multiple documents into the LLMs' input context, but also deepens the integration of external knowledge into the parametric knowledge space of the LLM. Experimental results demonstrate that Parametric RAG substantially enhances both the effectiveness and efficiency of knowledge augmentation in LLMs. Also, it can be combined with in-context RAG methods to achieve even better performance. We have open-sourced all the code, data, and models in the following anonymized GitHub link: https://github.com/oneal2000/PRAG
Haoyu Liu, Shaohan Huang, Jianfeng Liu, Yuefeng Zhan, Hao Sun, Weiwei Deng, Feng Sun, Furu Wei et al.
Document retrieval techniques are essential for developing large-scale
information systems. The common approach involves using a bi-encoder to compute
the semantic similarity between a query and documents. However, the scalar
similarity often fails to reflect enough information, hindering the
interpretation of retrieval results. In addition, this process primarily
focuses on global semantics, overlooking the finer-grained semantic
relationships between the query and the document's content. In this paper, we
introduce a novel method, Generation Augmented Retrieval (GeAR), which not
only improves the global
document-query similarity through contrastive learning, but also integrates
well-designed fusion and decoding modules. This enables GeAR to generate
relevant context within the documents based on a given query, facilitating
learning to retrieve local fine-grained information. Furthermore, when used as
a retriever, GeAR does not incur any additional computational cost over
bi-encoders. GeAR exhibits competitive retrieval performance across diverse
scenarios and tasks. Moreover, qualitative analysis and the results generated
by GeAR provide novel insights into the interpretation of retrieval results.
The code, data, and models will be released at
https://github.com/microsoft/LMOps.
Authors' comments: In ACL 2025
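The bi-encoder scoring that the abstract above starts from reduces each query-document pair to one scalar. A minimal sketch of that baseline follows, assuming cosine similarity as the scoring function and hand-written two-dimensional vectors in place of real encoder outputs; it does not show GeAR's fusion or decoding modules.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(
    query_vec: Sequence[float], doc_vecs: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Score every document against the query and sort by descending score."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings; a real system would obtain these from an encoder.
query_vec = [1.0, 0.0]
doc_vecs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [1.0, 1.0]}
ranking = rank_documents(query_vec, doc_vecs)
```

Each document ends up with a single number, which says nothing about which part of the document matched the query. That is the interpretability gap the abstract describes.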
Haolin Wang, Ming Liu, Zifei Yan, Chao Zhou, Longan Xiao, Wangmeng Zuo
When embedding objects (foreground) into images (background), considering the
influence of photography conditions like illumination, it is usually necessary
to perform image harmonization to make the foreground object coordinate with
the background image in terms of brightness, color, etc. Although existing
image harmonization methods have made continuous efforts toward visually
pleasing results, they are still plagued by two main issues. Firstly, the image
harmonization becomes highly ill-posed when there are no contents similar to
the foreground object in the background, making the harmonization results
unreliable. Secondly, even when similar contents are available, the
harmonization process is often interfered with by irrelevant areas, mainly
attributed to an insufficient understanding of image contents and inaccurate
attention. As a remedy, we present a retrieval-augmented image harmonization
(Raiha) framework, which seeks proper reference images to reduce the
ill-posedness and restricts the attention to better utilize the useful
information. Specifically, an efficient retrieval method is designed to find
reference images that contain objects similar to the foreground while the
illumination is consistent with the background. For training the Raiha
framework to effectively utilize the reference information, a data augmentation
strategy is delicately designed by leveraging existing non-reference image
harmonization datasets. Besides, the image content priors are introduced to
ensure reasonable attention. With the presented Raiha framework, the image
harmonization performance is greatly boosted under both non-reference and
retrieval-augmented settings. The source code and pre-trained models will be
publicly available.
Authors' comments: 8 pages
Derrick Quinn, Mohammad Nouri, Neel Patel, John Salihu, Alireza Salemi, Sukhan Lee, Hamed Zamani, Mohammad Alian
An evolving solution to address hallucination and enhance accuracy in large language models (LLMs) is Retrieval-Augmented Generation (RAG), which involves augmenting LLMs with information retrieved from an external knowledge source, such as the web. This paper profiles several RAG execution pipelines and demystifies the complex interplay between their retrieval and generation phases. We demonstrate that while exact retrieval schemes are expensive, they can reduce inference time compared to approximate retrieval variants because an exact retrieval model can send a smaller but more accurate list of documents to the generative model while maintaining the same end-to-end accuracy. This observation motivates the acceleration of the exact nearest neighbor search for RAG. In this work, we design Intelligent Knowledge Store (IKS), a type-2 CXL device that implements a scale-out near-memory acceleration architecture with a novel cache-coherent interface between the host CPU and near-memory accelerators. IKS offers 13.4-27.9x faster exact nearest neighbor search over a 512GB vector database compared with executing the search on Intel Sapphire Rapids CPUs. This higher search performance translates to 1.7-26.3x lower end-to-end inference time for representative RAG applications. IKS is inherently a memory expander; its internal DRAM can be disaggregated and used for other applications running on the server to prevent DRAM, which is the most expensive component in today's servers, from being stranded.
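The exact nearest neighbor search that the abstract above sets out to accelerate is, in software, a full scan of the vector database. A minimal sketch follows, assuming inner-product similarity (the abstract does not name the metric) and a toy in-memory corpus; it illustrates the operation only, not the IKS hardware design.

```python
import heapq
from typing import List, Sequence, Tuple


def exact_top_k(
    query: Sequence[float], corpus: Sequence[Sequence[float]], k: int
) -> List[Tuple[float, int]]:
    """Scan every vector and return the k best (score, index) pairs, best first."""

    def dot(u: Sequence[float], v: Sequence[float]) -> float:
        return sum(a * b for a, b in zip(u, v))

    # No index structure and no approximation: every corpus vector is scored.
    scored = ((dot(query, vec), idx) for idx, vec in enumerate(corpus))
    return heapq.nlargest(k, scored)


# Hypothetical toy corpus of four 2-dimensional embeddings.
corpus = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5], [0.9, 0.1]]
query = [1.0, 0.0]
top = exact_top_k(query, corpus, k=2)
```

The cost grows linearly with corpus size and vector dimension, which is why a full scan over hundreds of gigabytes of embeddings is dominated by memory bandwidth.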
Alessandro Magnani, Feng Liu, Suthee Chaidaroon, Sachin Yadav, Praveen Reddy Suram, Ajit Puthenputhussery, Sijie Chen, Min Xie et al.
In product search, the retrieval of candidate products before re-ranking is
more critical and challenging than in other search settings such as web
search, especially for tail queries, which have a complex and specific
search intent. In this paper,
we present a hybrid system for e-commerce search deployed at Walmart that
combines traditional inverted index and embedding-based neural retrieval to
better answer user tail queries. Our system significantly improved the
relevance of the search engine, measured by both offline and online
evaluations. The improvements were achieved through a combination of different
approaches. We present a new technique to train the neural model at scale and
describe how the system was deployed in production with little impact on
response time. We highlight multiple learnings and practical tricks that were
used in the deployment of this system.
Authors' comments: 9 pages, 2 figures, 10 tables, KDD 2022
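The abstract above does not say how the inverted-index and embedding-based result lists are combined. One simple possibility, shown purely as an assumption, is to interleave the two lists and drop duplicates until a candidate budget is reached, leaving final ordering to the re-ranker. The function name and product IDs are hypothetical.

```python
from itertools import zip_longest
from typing import List, Sequence


def merge_candidates(lexical: Sequence[str], neural: Sequence[str], limit: int) -> List[str]:
    """Interleave two ranked candidate lists, skipping duplicates, up to `limit` items."""
    merged: List[str] = []
    seen = set()
    # zip_longest pads the shorter list with None so neither list is cut short.
    for pair in zip_longest(lexical, neural):
        for product_id in pair:
            if product_id is None or product_id in seen:
                continue
            seen.add(product_id)
            merged.append(product_id)
            if len(merged) == limit:
                return merged
    return merged


# Hypothetical results for one tail query from each retriever.
lexical_hits = ["p1", "p2", "p3"]
neural_hits = ["p2", "p4", "p5", "p6"]
candidates = merge_candidates(lexical_hits, neural_hits, limit=5)
```

Interleaving gives both retrievers equal say in the candidate set, so products found only by the neural retriever still reach the re-ranker when the inverted index returns few matches.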
Hao Wang, Minghui Liao, Zhouyi Xie, Wenyu Liu, Xiang Bai
The task of partial scene text retrieval involves localizing and searching
for text instances that are the same or similar to a given query text from an
image gallery. However, existing methods can only handle text-line instances,
leaving the problem of searching for partial patches within these text-line
instances unsolved due to a lack of patch annotations in the training data. To
address this issue, we propose a network that can simultaneously retrieve both
text-line instances and their partial patches. Our method embeds the two types
of data (query text and scene text instances) into a shared feature space and
measures their cross-modal similarities. To handle partial patches, our
method adopts a Multiple Instance Learning (MIL) approach to learn
their similarities with query text, without requiring extra annotations.
However, constructing bags, which is a standard step of conventional MIL
approaches, can introduce numerous noisy samples for training, and lower
inference speed. To address this issue, we propose a Ranking MIL (RankMIL)
approach to adaptively filter those noisy samples. Additionally, we present a
Dynamic Partial Match Algorithm (DPMA) that can directly search for the target
partial patch from a text-line instance during the inference stage, without
requiring bags. This greatly improves the search efficiency and the performance
of retrieving partial patches. The source code and dataset are available at
https://github.com/lanfeng4659/PSTR.
Authors' comments: Accepted by TPAMI