Chandana Cheerla
Organizations increasingly rely on proprietary enterprise data, including HR records, structured reports, and tabular documents, for critical decision-making. While Large Language Models (LLMs) have strong generative capabilities, they are limited by static pretraining, short context windows, and challenges in processing heterogeneous data formats. Conventional Retrieval-Augmented Generation (RAG) frameworks address some of these gaps but often struggle with structured and semi-structured data. This work proposes an advanced RAG framework that combines hybrid retrieval strategies using dense embeddings (all-mpnet-base-v2) and BM25, enhanced by metadata-aware filtering with SpaCy NER and cross-encoder reranking. The framework applies semantic chunking to maintain textual coherence and retains tabular data structures to preserve row-column integrity. Quantized indexing optimizes retrieval efficiency, while human-in-the-loop feedback and conversation memory improve adaptability. Experiments on enterprise datasets show notable improvements: Precision@5 increased by 15 percent (90 versus 75), Recall@5 by 13 percent (87 versus 74), and Mean Reciprocal Rank by 16 percent (0.85 versus 0.69). Qualitative evaluations show higher scores in Faithfulness (4.6 versus 3.0), Completeness (4.2 versus 2.5), and Relevance (4.5 versus 3.2) on a 5-point Likert scale. These results demonstrate the framework's effectiveness in delivering accurate, comprehensive, and contextually relevant responses for enterprise tasks. Future work includes extending to multimodal data and integrating agent-based retrieval. The source code will be released at https://github.com/CheerlaChandana/Enterprise-Chatbot
Sara Ghasvarianjahromi, Yauhen Yakimenka, Jörg Kliewer
This paper introduces and analyzes a search and retrieval model that adopts key semantic communication principles from retrieval-augmented generation. We specifically present an information-theoretic analysis of a remote document retrieval system operating over a symbol erasure channel. The proposed model encodes the feature vector of a query, derived from term-frequency weights of a language corpus by using a repetition code with an adaptive rate dependent on the contextual importance of the terms. At the decoder, we select between two documents based on the contextual closeness of the recovered query. By leveraging a jointly Gaussian approximation for both the true and reconstructed similarity scores, we derive an explicit expression for the retrieval error probability, i.e., the probability under which the less similar document is selected. Numerical simulations on synthetic and real-world data (Google NQ) confirm the validity of the analysis. They further demonstrate that assigning greater redundancy to critical features effectively reduces the error rate, highlighting the effectiveness of semantic-aware feature encoding in error-prone communication settings.
Satoshi Yoshida, Jisho Miyazaki, Mio Murao
Storage and retrieval refer to the task of encoding a quantum channel
$\Lambda$ into a quantum state, known as the program state, such that the
channel can later be retrieved. This task is closely related to quantum channel
estimation, where multiple queries to $\Lambda$ are used to prepare a quantum
state $\phi_\Lambda$ that encodes information about the channel. The channel
can then be retrieved by measuring $\phi_\Lambda$, following a
measure-and-prepare strategy. In this work, we analyze the asymptotic
performance of the estimation-based strategy for storage and retrieval of
isometry channels. We show that the optimal fidelity for isometry estimation is
given by $F = 1-{d(D-d)\over n} + O(n^{-2})$, where $d$ and $D$ denote the
input and output dimensions of the isometry, and $n$ is the number of queries.
This result indicates that, unlike in the case of unitary channels, the
estimation-based strategy is suboptimal for the storage and retrieval of
isometry channels, which requires $n = \Theta(\epsilon^{-1})$ to achieve the
diamond-norm error $\epsilon$. To address this limitation, we propose a more
efficient protocol based on port-based teleportation, which stores the isometry
channel in a program state using only $n = \Theta(1/\sqrt{\epsilon})$ queries.
As an application, we extend our approach to general quantum channels,
achieving improved program cost compared to prior results by Gschwedtner,
Bluhm, and Winter [Quantum $\textbf{5}$, 488 (2021)].
Authors' comments: 5+28 pages, 2 figures
Yue Meng, Cheng Guo, Xiaohui Hu, Honghu Deng, Yi Cao, Tong Liu, Bo Zheng
User behavior sequence modeling, which captures user interest from rich historical interactions, is pivotal for industrial recommendation systems. Despite breakthroughs in ranking-stage models capable of leveraging ultra-long behavior sequences with length scaling up to thousands, existing retrieval models remain constrained to sequences of hundreds of behaviors due to two main challenges. One is strict latency budget imposed by real-time service over large-scale candidate pool. The other is the absence of target-aware mechanisms and cross-interaction architectures, which prevent utilizing ranking-like techniques to simplify long sequence modeling. To address these limitations, we propose a new framework named User Long-term Multi-Interest Retrieval Model(ULIM), which enables thousand-scale behavior modeling in retrieval stages. ULIM includes two novel components: 1)Category-Aware Hierarchical Dual-Interest Learning partitions long behavior sequences into multiple category-aware subsequences representing multi-interest and jointly optimizes long-term and short-term interests within specific interest cluster. 2)Pointer-Enhanced Cascaded Category-to-Item Retrieval introduces Pointer-Generator Interest Network(PGIN) for next-category prediction, followed by next-item retrieval upon the top-K predicted categories. Comprehensive experiments on Taobao dataset show that ULIM achieves substantial improvement over state-of-the-art methods, and brings 5.54% clicks, 11.01% orders and 4.03% GMV lift for Taobaomiaosha, a notable mini-app of Taobao.
Hai Toan Nguyen, Tien Dat Nguyen, Viet Ha Nguyen
Retrieval-Augmented Generation (RAG) systems commonly use chunking strategies for retrieval, which enhance large language models (LLMs) by enabling them to access external knowledge, ensuring that the retrieved information is up-to-date and domain-specific. However, traditional methods often fail to create chunks that capture sufficient semantic meaning, as they do not account for the underlying textual structure. This paper proposes a novel framework that enhances RAG by integrating hierarchical text segmentation and clustering to generate more meaningful and semantically coherent chunks. During inference, the framework retrieves information by leveraging both segment-level and cluster-level vector representations, thereby increasing the likelihood of retrieving more precise and contextually relevant information. Evaluations on the NarrativeQA, QuALITY, and QASPER datasets indicate that the proposed method achieved improved results compared to traditional chunking techniques.
Mehmet Onurcan Kaya, Figen S. Oktem
We propose a novel framework for phase retrieval that leverages Langevin dynamics to enable efficient posterior sampling, yielding reconstructions that explicitly balance distortion and perceptual quality. Unlike conventional approaches that prioritize pixel-wise accuracy, our method navigates the perception-distortion tradeoff through a principled combination of stochastic sampling, learned denoising, and model-based updates. The framework comprises three variants of increasing complexity, integrating theoretically grounded Langevin inference, adaptive noise schedule learning, parallel reconstruction sampling, and warm-start initialization from classical solvers. Extensive experiments demonstrate that our method achieves state-of-the-art performance across multiple benchmarks, both in terms of fidelity and perceptual quality.
Isaac Shi, Zeyuan Li, Fan Liu, Wenli Wang, Lewei He, Yang Yang, Tianyu Shi
We present eSapiens, an AI-as-a-Service (AIaaS) platform engineered around a business-oriented trifecta: proprietary data, operational workflows, and any major agnostic Large Language Model (LLM). eSapiens gives businesses full control over their AI assets, keeping everything in-house for AI knowledge retention and data security. eSapiens AI Agents (Sapiens) empower your team by providing valuable insights and automating repetitive tasks, enabling them to focus on high-impact work and drive better business outcomes. The system integrates structured document ingestion, hybrid vector retrieval, and no-code orchestration via LangChain, and supports top LLMs including OpenAI, Claude, Gemini, and DeepSeek. A key component is the THOR Agent, which handles structured SQL-style queries and generates actionable insights over enterprise databases. To evaluate the system, we conduct two experiments. First, a retrieval benchmark on legal corpora reveals that a chunk size of 512 tokens yields the highest retrieval precision (Top-3 accuracy: 91.3%). Second, a generation quality test using TRACe metrics across five LLMs shows that eSapiens delivers more context-consistent outputs with up to a 23% improvement in factual alignment. These results demonstrate the effectiveness of eSapiens in enabling trustworthy, auditable AI workflows for high-stakes domains like legal and finance.
Bangcheng Sun, Yazhe Chen, Jilin Yang, Xiaodong Li, Hui Li
Explainable Recommender System (ExRec) provides transparency to the recommendation process, increasing users' trust and boosting the operation of online services. With the rise of large language models (LLMs), whose extensive world knowledge and nuanced language understanding enable the generation of human-like, contextually grounded explanations, LLM-powered ExRec has gained great momentum. However, existing LLM-based ExRec models suffer from profile deviation and high retrieval overhead, hindering their deployment. To address these issues, we propose Retrieval-Augmented Recommendation Explanation Generation with Hierarchical Aggregation (REXHA). Specifically, we design a hierarchical aggregation based profiling module that comprehensively considers user and item review information, hierarchically summarizing and constructing holistic profiles. Furthermore, we introduce an efficient retrieval module using two types of pseudo-document queries to retrieve relevant reviews to enhance the generation of recommendation explanations, effectively reducing retrieval latency and improving the recall of relevant reviews. Extensive experiments demonstrate that our method outperforms existing approaches by up to 12.6% w.r.t. the explanation quality while achieving high retrieval efficiency.
Kevin Dela Rosa
We introduce ModaRoute, an LLM-based intelligent routing system that dynamically selects optimal modalities for multimodal video retrieval. While dense text captions can achieve 75.9% Recall@5, they require expensive offline processing and miss critical visual information present in 34% of clips with scene text not captured by ASR. By analyzing query intent and predicting information needs, ModaRoute reduces computational overhead by 41% while achieving 60.9% Recall@5. Our approach uses GPT-4.1 to route queries across ASR (speech), OCR (text), and visual indices, averaging 1.78 modalities per query versus exhaustive 3.0 modality search. Evaluation on 1.8M video clips demonstrates that intelligent routing provides a practical solution for scaling multimodal retrieval systems, reducing infrastructure costs while maintaining competitive effectiveness for real-world deployment.
Authors' comments: Accepted to ICCV 2025 Multimodal Representation and Retrieval Workshop
Jack McKechnie, Graham McDonald, Craig Macdonald
The evaluation of Information Retrieval (IR) systems typically uses query-document pairs with corresponding human-labelled relevance assessments (qrels). These qrels are used to determine if one system is better than another based on average retrieval performance. Acquiring large volumes of human relevance assessments is expensive. Therefore, more efficient relevance assessment approaches have been proposed, necessitating comparisons between qrels to ascertain their efficacy. Discriminative power, i.e. the ability to correctly identify significant differences between systems, is important for drawing accurate conclusions on the robustness of qrels. Previous work has measured the proportion of pairs of systems that are identified as significantly different and has quantified Type I statistical errors. Type I errors lead to incorrect conclusions due to false positive significance tests. We argue that also identifying Type II errors (false negatives) is important as they lead science in the wrong direction. We quantify Type II errors and propose that balanced classification metrics, such as balanced accuracy, can be used to portray the discriminative power of qrels. We perform experiments using qrels generated using alternative relevance assessment methods to investigate measuring hypothesis testing errors in IR evaluation. We find that additional insights into the discriminative power of qrels can be gained by quantifying Type II errors, and that balanced classification metrics can be used to give an overall summary of discriminative power in one, easily comparable, number.
Anant Gupta, Rajarshi Bhowmik, Geoffrey Gunow
Tracking the strategic focus of companies through topics in their earnings
calls is a key task in financial analysis. However, as industries evolve,
traditional topic modeling techniques struggle to dynamically capture emerging
topics and their relationships. In this work, we propose an LLM-agent driven
approach to discover and retrieve emerging topics from quarterly earnings
calls. We propose an LLM-agent to extract topics from documents, structure them
into a hierarchical ontology, and establish relationships between new and
existing topics through a topic ontology. We demonstrate the use of extracted
topics to infer company-level insights and emerging trends over time. We
evaluate our approach by measuring ontology coherence, topic evolution
accuracy, and its ability to surface emerging financial trends.
Authors' comments: The 2nd Workshop on Financial Information Retrieval in the Era of
Generative AI, The 48th International ACM SIGIR Conference on Research and
Development in Information Retrieval July 13-17, 2025 | Padua, Italy
Abhinav Java, Srivathsan Koundinyan, Nagarajan Natarajan, Amit Sharma
We consider the problem of answering complex questions, given access to a
large unstructured document corpus. The de facto approach to solving the
problem is to leverage language models that (iteratively) retrieve and reason
through the retrieved documents, until the model has sufficient information to
generate an answer. Attempts at improving this approach focus on
retrieval-augmented generation (RAG) metrics such as accuracy and recall and
can be categorized into two types: (a) fine-tuning on large question answering
(QA) datasets augmented with chain-of-thought traces, and (b) leveraging
RL-based fine-tuning techniques that rely on question-document relevance
signals. However, efficiency in the number of retrieval searches is an equally
important metric, which has received less attention. In this work, we show
that: (1) Large-scale fine-tuning is not needed to improve RAG metrics,
contrary to popular claims in recent literature. Specifically, a standard ReAct
pipeline with improved prompts can outperform state-of-the-art methods on
benchmarks such as HotPotQA. (2) Supervised and RL-based fine-tuning can help
RAG from the perspective of frugality, i.e., the latency due to number of
searches at inference time. For example, we show that we can achieve
competitive RAG metrics at nearly half the cost (in terms of number of
searches) on popular RAG benchmarks, using the same base model, and at a small
training cost (1000 examples).
Authors' comments: Accepted at ICML Workshop: Efficient Systems for Foundation Models
Dahyun Lee, Yongrae Jo, Haeju Park, Moontae Lee
Retrieval in Retrieval-Augmented Generation(RAG) must ensure that retrieved
passages are not only individually relevant but also collectively form a
comprehensive set. Existing approaches primarily rerank top-k passages based on
their individual relevance, often failing to meet the information needs of
complex queries in multi-hop question answering. In this work, we propose a
set-wise passage selection approach and introduce SETR, which explicitly
identifies the information requirements of a query through Chain-of-Thought
reasoning and selects an optimal set of passages that collectively satisfy
those requirements. Experiments on multi-hop RAG benchmarks show that SETR
outperforms both proprietary LLM-based rerankers and open-source baselines in
terms of answer correctness and retrieval quality, providing an effective and
efficient alternative to traditional rerankers in RAG systems. The code is
available at https://github.com/LGAI-Research/SetR
Authors' comments: Accepted to ACL 2025 Oral
SeungYoon Han, Taeho Hwang, Sukmin Cho, Soyeong Jeong, Hoyun Song, Huije Lee, Jong C. Park
The rapid expansion of digital information and knowledge across structured and unstructured sources has heightened the importance of Information Retrieval (IR). While dense retrieval methods have substantially improved semantic matching for general queries, they consistently underperform on queries with explicit temporal constraints--often those containing numerical expressions and time specifiers such as ``in 2015.'' Existing approaches to Temporal Information Retrieval (TIR) improve temporal reasoning but often suffer from catastrophic forgetting, leading to reduced performance on non-temporal queries. To address this, we propose Time-Specifier Model Merging (TSM), a novel method that enhances temporal retrieval while preserving accuracy on non-temporal queries. TSM trains specialized retrievers for individual time specifiers and merges them in to a unified model, enabling precise handling of temporal constraints without compromising non-temporal retrieval. Extensive experiments on both temporal and non-temporal datasets demonstrate that TSM significantly improves performance on temporally constrained queries while maintaining strong results on non-temporal queries, consistently outperforming other baseline methods. Our code is available at https://github.com/seungyoonee/TSM .
Tianzhe Zhao, Jiaoyan Chen, Yanchi Ru, Haiping Zhu, Nan Hu, Jun Liu, Qika Lin
Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by retrieving external data to mitigate hallucinations and outdated knowledge issues. Benefiting from the strong ability in facilitating diverse data sources and supporting faithful reasoning, knowledge graphs (KGs) have been increasingly adopted in RAG systems, giving rise to KG-based RAG (KG-RAG) methods. Though RAG systems are widely applied in various applications, recent studies have also revealed its vulnerabilities to data poisoning attacks, where malicious information injected into external knowledge sources can mislead the system into producing incorrect or harmful responses. However, these studies focus exclusively on RAG systems using unstructured textual data sources, leaving the security risks of KG-RAG largely unexplored, despite the fact that KGs present unique vulnerabilities due to their structured and editable nature. In this work, we conduct the first systematic investigation of the security issue of KG-RAG methods through data poisoning attacks. To this end, we introduce a practical, stealthy attack setting that aligns with real-world implementation. We propose an attack strategy that first identifies adversarial target answers and then inserts perturbation triples to complete misleading inference chains in the KG, increasing the likelihood that KG-RAG methods retrieve and rely on these perturbations during generation. Through extensive experiments on two benchmarks and four recent KG-RAG methods, our attack strategy demonstrates strong effectiveness in degrading KG-RAG performance, even with minimal KG perturbations. In-depth analyses are also conducted to understand the safety threats within the internal stages of KG-RAG systems and to explore the robustness of LLMs against adversarial knowledge.
Authors' comments: 13 pages, 6 figures
Alessandro Umbrico, Guido Bologna, Luca Coraci, Francesca Fracasso, Silvia Gola, Gabriella Cortellessa
The lack of transparency of data-driven Artificial Intelligence techniques limits their interpretability and acceptance into healthcare decision-making processes. We propose an attribution-based approach to improve the interpretability of Explainable AI-based predictions in the specific context of arm lymphedema's risk assessment after lymph nodal radiotherapy in breast cancer. The proposed method performs a statistical analysis of the attributes in the rule-based prediction model using standard metrics from Information Retrieval techniques. This analysis computes the relevance of each attribute to the prediction and provides users with interpretable information about the impact of risk factors. The results of a user study that compared the output generated by the proposed approach with the raw output of the Explainable AI model suggested higher levels of interpretability and usefulness in the context of predicting lymphedema risk.
Zeyuan Meng, Zixuan Yi, Iadh Ounis
Large Language Models (LLMs) have shown strong potential in recommender systems due to their contextual learning and generalisation capabilities. Existing LLM-based recommendation approaches typically formulate the recommendation task using specialised prompts designed to leverage their contextual abilities, and aligning their outputs closely with human preferences to yield an improved recommendation performance. However, the use of LLMs for recommendation tasks is limited by the absence of domain-specific knowledge. This lack of relevant relational knowledge about the items to be recommended in the LLM's pre-training corpus can lead to inaccuracies or hallucinations, resulting in incorrect or misleading recommendations. Moreover, directly using information from the knowledge graph introduces redundant and noisy information, which can affect the LLM's reasoning process or exceed its input context length, thereby reducing the performance of LLM-based recommendations. To address the lack of domain-specific knowledge, we propose a novel model called Knowledge-Enhanced Retrieval-Augmented Generation for Recommendation (KERAG_R). Specifically, we leverage a graph retrieval-augmented generation (GraphRAG) component to integrate additional information from a knowledge graph (KG) into instructions, enabling the LLM to collaboratively exploit recommendation signals from both text-based user interactions and the knowledge graph to better estimate the users' preferences in a recommendation context. In particular, we perform graph RAG by pre-training a graph attention network (GAT) to select the most relevant triple for the target users for the used LLM, thereby enhancing the LLM while reducing redundant and noisy information. Our extensive experiments on three public datasets show that our proposed KERAG_R model significantly outperforms ten existing state-of-the-art recommendation methods.
YiHan Jiao, ZheHao Tan, Dan Yang, DuoLin Sun, Jie Feng, Jian Wang, Peng Wei
Retrieval-augmented generation (RAG) has become a fundamental paradigm for addressing the challenges faced by large language models in handling real-time information and domain-specific problems. Traditional RAG systems primarily rely on the in-context learning (ICL) capabilities of the large language model itself. Still, in-depth research on the specific capabilities needed by the RAG generation model is lacking, leading to challenges with inconsistent document quality and retrieval system imperfections. Even the limited studies that fine-tune RAG generative models often \textit{lack a granular focus on RAG task} or \textit{a deeper utilization of chain-of-thought processes}. To address this, we propose that RAG models should possess three progressively hierarchical abilities (1) Filtering: the ability to select relevant information; (2) Combination: the ability to combine semantic information across paragraphs; and (3) RAG-specific reasoning: the ability to further process external knowledge using internal knowledge. Thus, we introduce our new RAG instruction fine-tuning method, Hierarchical-Thought Instruction-Tuning Retrieval-Augmented Generation (HIRAG) incorporates a "think before answering" strategy. This method enhances the model's open-book examination capability by utilizing multi-level progressive chain-of-thought. Experiments show that the HIRAG training strategy significantly improves the model's performance on datasets such as RGB, PopQA, MuSiQue, HotpotQA, and PubmedQA.
Yiqiao Jin, Kartik Sharma, Vineeth Rakesh, Yingtong Dou, Menghai Pan, Mahashweta Das, Srijan Kumar
Retrieval-augmented Generation (RAG) extends large language models (LLMs)
with external knowledge but faces key challenges: restricted effective context
length and redundancy in retrieved documents. Pure compression-based approaches
reduce input size but often discard fine-grained details essential for factual
accuracy. We propose SARA, a unified RAG framework that balances local
precision and global knowledge coverage under tight context budgets. SARA
combines natural-language text snippets with semantic compression vectors to
jointly enhance context efficiency and answer correctness. It represents
contexts at two complementary levels: 1) fine-grained natural-language spans
that preserve critical entities and numerical values, and 2) compact,
interpretable vectors that summarize high-level semantics. An iterative
evidence-selection module employs the compression vectors for dynamic reranking
of contexts. Across 9 datasets and 5 open-source LLMs spanning 3 model families
(Mistral, Llama, and Gemma), SARA consistently improves answer relevance
(+17.71), answer correctness (+13.72), and semantic similarity (+15.53),
demonstrating the importance of integrating textual and compressed
representations for robust, context-efficient RAG.
Authors' comments: 20 pages
Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Xuemeng Song, Liqiang Nie
Composed Image Retrieval (CIR) represents a novel retrieval paradigm that is capable of expressing users' intricate retrieval requirements flexibly. It enables the user to give a multimodal query, comprising a reference image and a modification text, and subsequently retrieve the target image. Notwithstanding the considerable advances made by prevailing methodologies, CIR remains in its nascent stages due to two limitations: 1) inhomogeneity between dominant and noisy portions in visual data is ignored, leading to query feature degradation, and 2) the priority of textual data in the image modification process is overlooked, which leads to a visual focus bias. To address these two limitations, this work presents a focus mapping-based feature extractor, which consists of two modules: dominant portion segmentation and dual focus mapping. It is designed to identify significant dominant portions in images and guide the extraction of visual and textual data features, thereby reducing the impact of noise interference. Subsequently, we propose a textually guided focus revision module, which can utilize the modification requirements implied in the text to perform adaptive focus revision on the reference image, thereby enhancing the perception of the modification focus on the composed features. The aforementioned modules collectively constitute the segmentatiOn-based Focus shiFt reviSion nETwork (\mbox{OFFSET}), and comprehensive experiments on four benchmark datasets substantiate the superiority of our proposed method. The codes and data are available on https://zivchen-ty.github.io/OFFSET.github.io/