Saraga S., Anagha M. S., Dincy R. Arikkat, Rafidha Rehiman K. A., Serena Nicolazzo, Antonino Nocera, Vinod P
The widespread use of Android applications has made them a prime target for cyberattacks, significantly increasing the risk of malware that threatens user privacy, security, and device functionality. Effective malware detection is thus critical, with static analysis, dynamic analysis, and Machine Learning being widely used approaches. In this work, we focus on a Machine Learning-based method utilizing static features. We first compiled a dataset of benign and malicious APKs and performed static analysis to extract features such as code structure, permissions, and manifest file content, without executing the apps. Instead of relying solely on raw static features, our system uses an LLM to generate high-level functional descriptions of APKs. To mitigate hallucinations, which are a known vulnerability of LLMs, we integrated Retrieval-Augmented Generation (RAG), enabling the LLM to ground its output in relevant context. Using carefully designed prompts, we guide the LLM to produce coherent function summaries, which are then analyzed using a transformer-based model, improving detection accuracy over conventional feature-based methods for malware detection.
Wali Mohammad Abdullah, Md. Morshedul Islam, Devraj Parmar, Happy Hasmukhbhai Patel, Sindhuja Prabhakaran, Baidya Saha
Large Language Models (LLMs) like GPT-3.5-Turbo are increasingly used to assist software development, yet they often produce incomplete code or incorrect imports, especially when lacking access to external or project-specific documentation. We introduce RAILS (Retrieval-Augmented Intelligence for Learning Software Development), a framework that augments LLM prompts with semantically retrieved context from curated Java resources using FAISS and OpenAI embeddings. RAILS incorporates an iterative validation loop guided by compiler feedback to refine suggestions. We evaluated RAILS on 78 real-world Java import error cases spanning standard libraries, GUI APIs, external tools, and custom utilities. Despite using the same LLM, RAILS outperforms baseline prompting by preserving intent, avoiding hallucinations, and surfacing correct imports even when libraries are unavailable locally. Future work will integrate symbolic filtering via PostgreSQL and extend support to other languages and IDEs.
Wali Mohammad Abdullah, Azmain Kabir
We present P4OMP, a retrieval-augmented framework for transforming serial C/C++ code into OpenMP-annotated parallel code using large language models (LLMs). To our knowledge, this is the first system to apply retrieval-based prompting for OpenMP pragma correctness without model fine-tuning or compiler instrumentation. P4OMP leverages Retrieval-Augmented Generation (RAG) with structured instructional knowledge from OpenMP tutorials to improve the reliability of prompt-driven code generation. By grounding generation in the retrieved context, P4OMP improves syntactic correctness compared to baseline prompting with GPT-3.5-Turbo. We evaluate P4OMP against a baseline, GPT-3.5-Turbo without retrieval, on a comprehensive benchmark of 108 real-world C++ programs drawn from Stack Overflow, PolyBench, and NAS benchmark suites. P4OMP achieves 100% compilation success on all parallelizable cases, while the baseline fails to compile in 20 out of 108 cases. Six cases that rely on non-random-access iterators or thread-unsafe constructs are excluded due to fundamental OpenMP limitations. A detailed analysis demonstrates how P4OMP consistently avoids scoping errors, syntactic misuse, and invalid directive combinations that commonly affect baseline-generated code. We further demonstrate strong runtime scaling across seven compute-intensive benchmarks on an HPC cluster. P4OMP offers a robust, modular pipeline that significantly improves the reliability and applicability of LLM-generated OpenMP code.
Kevin Duh, Eugene Yang, Orion Weller, Andrew Yates, Dawn Lawrie
The HLTCOE LiveRAG submission utilized the GPT-researcher framework for
researching the context of the question, filtering the returned results, and
generating the final answer. The retrieval system was a ColBERT bi-encoder
architecture, which represents a passage with many dense tokens. Retrieval used
a local, compressed index of the FineWeb10-BT collection created with PLAID-X,
using a model fine-tuned for multilingual retrieval. Query generation from
context was done with Qwen2.5-7B-Instruct, while filtering was accomplished
with m2-bert-80M-8k-retrieval. Up to nine passages were used as context to
generate an answer using Falcon3-10B. This system placed 5th in the LiveRAG
automatic evaluation for correctness with a score of 1.07.
Authors' comments: 5 pages, 1 figure
Xinghe Cheng, Zihan Zhang, Jiapu Wang, Liangda Fang, Chaobo He, Quanlong Guan, Shirui Pan, Weiqi Luo
Learning path recommendation seeks to provide learners with a structured sequence of learning items (e.g., knowledge concepts or exercises) to optimize their learning efficiency. Despite significant efforts in this area, most existing methods primarily rely on prerequisite relationships, which present two major limitations: 1) Many educational datasets do not explicitly provide prerequisite relationships between knowledge concepts, hindering the application of current learning path recommendation methods. 2) Relying on prerequisite relationships as the sole knowledge structure can impede learning progress and negatively impact student outcomes. To address these challenges, we propose a novel approach, Discrimination Learning Enhances Learning Path Recommendation (DLELP), which enhances learning path recommendations by incorporating both prerequisite and similarity relationships between knowledge concepts. Specifically, we introduce a knowledge concept structure graph generation module that adaptively constructs knowledge concept structure graphs for different educational datasets, significantly improving the generalizability of learning path recommendation methods. We then propose a Discrimination Learning-driven Reinforcement Learning (DLRL) framework, which mitigates the issue of blocked learning paths, further enhancing the efficacy of learning path recommendations. Finally, we conduct extensive experiments on three benchmark datasets, demonstrating that our method not only achieves state-of-the-art performance but also provides interpretable reasoning for the recommended learning paths.
Haiyang Peng, Deren Han, Meng Huang
This paper investigates the stability of phase retrieval by analyzing the condition number of the nonlinear map $\Psi_{\boldsymbol{A}}(\boldsymbol{x}) = \bigl(\lvert \langle {\boldsymbol{a}}_j, \boldsymbol{x} \rangle \rvert^2 \bigr)_{1 \le j \le m}$, where $\boldsymbol{a}_j \in \mathbb{H}^n$ are known sensing vectors with $\mathbb{H} \in \{\mathbb{R}, \mathbb{C}\}$. For each $p \ge 1$, we define the condition number $\beta_{\Psi_{\boldsymbol{A}}}^{\ell_p}$ as the ratio of optimal upper and lower Lipschitz constants of $\Psi_{\boldsymbol{A}}$ measured in the $\ell_p$ norm, with respect to the metric $\mathrm {dist}_\mathbb{H}\left(\boldsymbol{x}, \boldsymbol{y}\right) = \|\boldsymbol{x} \boldsymbol{x}^\ast - \boldsymbol{y} \boldsymbol{y}^\ast\|_*$. We establish universal lower bounds on $\beta_{\Psi_{\boldsymbol{A}}}^{\ell_p}$ for any sensing matrix $\boldsymbol{A} \in \mathbb{H}^{m \times d}$, proving that $\beta_{\Psi_{\boldsymbol{A}}}^{\ell_1} \ge \pi/2$ and $\beta_{\Psi_{\boldsymbol{A}}}^{\ell_2} \ge \sqrt{3}$ in the real case $(\mathbb{H} = \mathbb{R})$, and $\beta_{\Psi_{\boldsymbol{A}}}^{\ell_p} \ge 2$ for $p=1,2$ in the complex case $(\mathbb{H} = \mathbb{C})$. These bounds are shown to be asymptotically tight: both a deterministic harmonic frame $\boldsymbol{E}_m \in \mathbb{R}^{m \times 2}$ and Gaussian random matrices $\boldsymbol{A} \in \mathbb{H}^{m \times d}$ asymptotically attain them. Notably, the harmonic frame $\boldsymbol{E}_m \in \mathbb{R}^{m \times 2}$ achieves the optimal lower bound $\sqrt{3}$ for all $m \ge 3$ when $p=2$, thus serving as an optimal sensing matrix within $\boldsymbol{A} \in \mathbb{R}^{m \times 2}$. Our results provide the first explicit uniform lower bounds on $\beta_{\Psi_{\boldsymbol{A}}}^{\ell_p}$ and offer insights into the fundamental stability limits of phase retrieval.
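As a rough numerical illustration of the quantities in the preceding abstract, the sketch below evaluates the map $\Psi_{\boldsymbol{A}}$ in the real case and estimates the ratio of Lipschitz constants by random sampling. It is not the authors' method. The sensing vectors, the sampling scheme, and all function names are assumptions made for illustration. It also relies on a standard identity that the abstract does not state: for real vectors, the nuclear norm of $\boldsymbol{x}\boldsymbol{x}^\top - \boldsymbol{y}\boldsymbol{y}^\top$ equals $\|\boldsymbol{x}-\boldsymbol{y}\|_2 \, \|\boldsymbol{x}+\boldsymbol{y}\|_2$.

```python
import random


def measure(A, x):
    """Phaseless measurements |<a_j, x>|^2 for real sensing vectors a_j (rows of A)."""
    return [sum(a * v for a, v in zip(row, x)) ** 2 for row in A]


def norm(v, p=2):
    """l_p norm of a list of floats."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def dist_real(x, y):
    """Nuclear norm of x x^T - y y^T for real vectors.

    With u = x - y and v = x + y, the matrix equals (u v^T + v u^T) / 2, whose
    two nonzero eigenvalues are (<u, v> +/- |u||v|) / 2, so the nuclear norm
    is |u|_2 * |v|_2.
    """
    diff = [a - b for a, b in zip(x, y)]
    summ = [a + b for a, b in zip(x, y)]
    return norm(diff) * norm(summ)


def lipschitz_ratio(A, x, y, p=2):
    """||Psi_A(x) - Psi_A(y)||_p / dist(x, y) for a single pair (x, y)."""
    gap = [a - b for a, b in zip(measure(A, x), measure(A, y))]
    return norm(gap, p) / dist_real(x, y)


def empirical_condition_number(A, n, trials=2000, p=2, seed=0):
    """Monte Carlo estimate (hypothetical helper): max ratio / min ratio over
    random Gaussian pairs. Sampling can only underestimate the true value."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ratios.append(lipschitz_ratio(A, x, y, p))
    return max(ratios) / min(ratios)
```

Because the sampled maximum and minimum sit inside the true supremum and infimum, the estimate never exceeds the true condition number. It is a sanity check, not a proof of any bound.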
Reza Yousefi Maragheh, Pratheek Vadla, Priyank Gupta, Kai Zhao, Aysenur Inan, Kehui Yao, Jianpeng Xu, Praveen Kanumala et al.
Retrieval-Augmented Generation (RAG) has shown promise in enhancing recommendation systems by incorporating external context into large language model prompts. However, existing RAG-based approaches often rely on static retrieval heuristics and fail to capture nuanced user preferences in dynamic recommendation scenarios. In this work, we introduce ARAG, an Agentic Retrieval-Augmented Generation framework for Personalized Recommendation, which integrates a multi-agent collaboration mechanism into the RAG pipeline. To better understand the long-term and session behavior of the user, ARAG leverages four specialized LLM-based agents: a User Understanding Agent that summarizes user preferences from long-term and session contexts, a Natural Language Inference (NLI) Agent that evaluates semantic alignment between candidate items retrieved by RAG and inferred intent, a Context Summary Agent that summarizes the findings of the NLI Agent, and an Item Ranker Agent that generates a ranked list of recommendations based on contextual fit. We evaluate ARAG across three datasets. Experimental results demonstrate that ARAG significantly outperforms standard RAG and recency-based baselines, achieving up to 42.1% improvement in NDCG@5 and 35.5% in Hit@5. We also conduct an ablation study to analyze the effect of different components of ARAG. Our findings highlight the effectiveness of integrating agentic reasoning into retrieval-augmented recommendation and provide new directions for LLM-based personalization.
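For readers unfamiliar with the two metrics reported for ARAG, the sketch below computes Hit@5 and NDCG@5 under the common binary-relevance definitions. The abstract does not spell out the exact formulation used in the paper, so the definitions, function names, and item identifiers here are assumptions.

```python
import math


def hit_at_k(ranked_items, relevant, k=5):
    """1.0 if at least one relevant item appears in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked_items[:k]) else 0.0


def ndcg_at_k(ranked_items, relevant, k=5):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    # Position 0 gets discount 1 / log2(2) = 1, position 1 gets 1 / log2(3), etc.
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, item in enumerate(ranked_items[:k])
        if item in relevant
    )
    # The ideal ranking places every relevant item (up to k of them) first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Both functions score a single user's ranked list. A dataset-level figure would be the mean over users.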
Guanting Dong, Xiaoxi Li, Yuyao Zhang, Mengjie Deng
Real-world live retrieval-augmented generation (RAG) systems face significant
challenges when processing user queries that are often noisy or ambiguous and
contain multiple intents. While RAG enhances large language models (LLMs) with
external knowledge, current systems typically struggle with such complex
inputs, as they are often trained or evaluated on cleaner data. This paper
introduces Omni-RAG, a novel framework designed to improve the robustness and
effectiveness of RAG systems in live, open-domain settings. Omni-RAG employs
LLM-assisted query understanding to preprocess user inputs through three key
modules: (1) Deep Query Understanding and Decomposition, which utilizes LLMs
with tailored prompts to denoise queries (e.g., correcting spelling errors) and
decompose multi-intent queries into structured sub-queries; (2) Intent-Aware
Knowledge Retrieval, which performs retrieval for each sub-query from a corpus
(i.e., FineWeb using OpenSearch) and aggregates the results; and (3) Reranking
and Generation, where a reranker (i.e., BGE) refines document selection before
a final response is generated by an LLM (i.e., Falcon-10B) using a
chain-of-thought prompt. Omni-RAG aims to bridge the gap between current RAG
capabilities and the demands of real-world applications, such as those
highlighted by the SIGIR 2025 LiveRAG Challenge, by robustly handling complex
and noisy queries.
Authors' comments: Accepted at SIGIR 2025 LiveRAG Workshop (Oral Presentation)
Dinh-Khoi Vo, Van-Loc Nguyen, Minh-Triet Tran, Trung-Nghia Le
Retrieving 3D objects in complex indoor environments using only a masked 2D image and a natural language description presents significant challenges. The ROOMELSA challenge limits access to full 3D scene context, complicating reasoning about object appearance, geometry, and semantics. These challenges are intensified by distorted viewpoints, textureless masked regions, ambiguous language prompts, and noisy segmentation masks. To address this, we propose SAMURAI: Shape-Aware Multimodal Retrieval for 3D Object Identification. SAMURAI integrates CLIP-based semantic matching with shape-guided re-ranking derived from binary silhouettes of masked regions, alongside a robust majority voting strategy. A dedicated preprocessing pipeline enhances mask quality by extracting the largest connected component and removing background noise. Our hybrid retrieval framework leverages both language and shape cues, achieving competitive performance on the ROOMELSA private test set. These results highlight the importance of combining shape priors with language understanding for robust open-world 3D object retrieval.
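The mask-cleaning step mentioned in the SAMURAI abstract (keeping the largest connected component) can be sketched in a few lines. This is a minimal illustration and not the authors' pipeline. The mask representation (a list of lists of 0/1), the choice of 4-connectivity, and the function name are assumptions.

```python
from collections import deque


def largest_connected_component(mask):
    """Return a copy of a binary mask that keeps only the largest
    4-connected foreground region (ties go to the first region found)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search collects one connected region.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    cleaned = [[0] * cols for _ in range(rows)]
    for y, x in best:
        cleaned[y][x] = 1
    return cleaned
```

Small isolated blobs, such as segmentation noise away from the object, are dropped, leaving a single silhouette for any later shape comparison.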
Fangyuan Zhang, Zhengjun Huang, Yingli Zhou, Qintian Guo, Zhixun Li, Wensheng Luo, Di Jiang, Yixiang Fang et al.
Graph-based Retrieval-Augmented Generation (Graph-RAG) enhances large
language models (LLMs) by structuring retrieval over an external corpus.
However, existing approaches typically assume a static corpus, requiring
expensive full-graph reconstruction whenever new documents arrive, limiting
their scalability in dynamic, evolving environments. To address these
limitations, we introduce EraRAG, a novel multi-layered Graph-RAG framework
that supports efficient and scalable dynamic updates. Our method leverages
hyperplane-based Locality-Sensitive Hashing (LSH) to partition and organize the
original corpus into hierarchical graph structures, enabling efficient and
localized insertions of new data without disrupting the existing topology. The
design eliminates the need for retraining or costly recomputation while
preserving high retrieval accuracy and low latency. Experiments on large-scale
benchmarks demonstrate that EraRAG achieves up to an order-of-magnitude
reduction in update time and token consumption compared to existing Graph-RAG
systems, while providing superior accuracy. This work offers a
practical path forward for RAG systems that must operate over continually
growing corpora, bridging the gap between retrieval efficiency and
adaptability. Our code and data are available at
https://github.com/EverM0re/EraRAG-Official.
Authors' comments: Under review
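To make the hyperplane-based LSH idea in the EraRAG abstract concrete, the sketch below hashes embedding vectors into buckets by the sign of their projection onto random hyperplanes and inserts a new document by touching only its own bucket. It is a generic illustration of the technique under stated assumptions, not EraRAG's implementation: the hierarchical graph layers are omitted, and the function names, dimensions, and document identifiers are hypothetical.

```python
import random


def make_hyperplanes(dim, num_planes, seed=0):
    """Draw random Gaussian normal vectors, one per hyperplane."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]


def lsh_key(vector, hyperplanes):
    """One bit per hyperplane: '1' if the vector lies on its non-negative side."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


def insert_document(buckets, doc_id, vector, hyperplanes):
    """Add a new document to its bucket; all other buckets stay untouched."""
    key = lsh_key(vector, hyperplanes)
    buckets.setdefault(key, []).append(doc_id)
    return key
```

Because the hyperplanes are fixed once, a newly arriving document maps to a bucket without rehashing existing ones, which is what makes localized insertion possible.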
Zhigong Zhou, Ning Ding, Xiaochuan Fan, Yue Shang, Yiming Qiu, Jingwei Zhuo, Zhiwei Ge, Songlin Wang et al.
Semantic retrieval, which retrieves semantically matched items given a
textual query, has been an essential component to enhance system effectiveness
in e-commerce search. In this paper, we study the multimodal retrieval problem,
where the visual information (e.g., image) of an item is leveraged as a supplement
to textual information to enrich item representation and further improve
retrieval performance. Though learning from cross-modality data has been
studied extensively in tasks such as visual question answering or media
summarization, multimodal retrieval remains a non-trivial and unsolved problem
especially in the asymmetric scenario where the query is unimodal while the
item is multimodal. In this paper, we propose a novel model named SMAR, which
stands for Semantic-enhanced Modality-Asymmetric Retrieval, to tackle the
problem of modality fusion and alignment in this kind of asymmetric scenario.
Extensive experimental results on an industrial dataset show that the proposed
model outperforms baseline models significantly in retrieval accuracy. We have
open-sourced our industrial dataset for the sake of reproducibility and future
research.
Authors' comments: Published in SIGIR 2023
Jia-Huei Ju, Suzan Verberne, Maarten de Rijke, Andrew Yates
Retrieval-augmented generation (RAG) enhances large language models by incorporating context retrieved from external knowledge sources. While the effectiveness of the retrieval module is typically evaluated with relevance-based ranking metrics, such metrics may be insufficient to reflect the retrieval's impact on the final RAG result, especially in long-form generation scenarios. We argue that providing a comprehensive retrieval-augmented context is important for long-form RAG tasks like report generation and propose metrics for assessing the context independent of generation. We introduce CRUX, a \textbf{C}ontrolled \textbf{R}etrieval-a\textbf{U}gmented conte\textbf{X}t evaluation framework designed to directly assess retrieval-augmented contexts. This framework uses human-written summaries to control the information scope of knowledge, enabling us to measure how well the context covers information essential for long-form generation. CRUX uses question-based evaluation to assess RAG's retrieval in a fine-grained manner. Empirical results show that CRUX offers more reflective and diagnostic evaluation. Our findings also reveal substantial room for improvement in current retrieval methods, pointing to promising directions for advancing RAG's retrieval. Our data and code are publicly available to support and advance future research on retrieval.
Michael Günther, Saba Sturua, Mohammad Kalim Akram, Isabelle Mohr, Andrei Ungureanu, Sedigheh Eslami, Scott Martens, Bo Wang et al.
We introduce jina-embeddings-v4, a 3.8 billion parameter multimodal embedding
model that unifies text and image representations through a novel architecture
supporting both single-vector and multi-vector embeddings in the late
interaction style. The model incorporates task-specific Low-Rank Adaptation
(LoRA) adapters to optimize performance across diverse retrieval scenarios,
including query-based information retrieval, cross-modal semantic similarity,
and programming code search. Comprehensive evaluations demonstrate that
jina-embeddings-v4 achieves state-of-the-art performance on both single-modal
and cross-modal retrieval tasks, with particular strength in processing
visually rich content such as tables, charts, diagrams, and mixed-media
formats. To facilitate evaluation of this capability, we also introduce
Jina-VDR, a novel benchmark specifically designed for visually rich image
retrieval.
Authors' comments: 22 pages, 1-10 main, 14-22 experimental results, benchmark tables
Mihailo Stojnic
We study theoretical limits of \emph{descending} phase retrieval algorithms. Utilizing \emph{Random duality theory} (RDT) we develop a generic program that allows statistical characterization of various algorithmic performance metrics. Through these we identify the concepts of \emph{parametric manifold} and its \emph{funneling points} as key mathematical objects that govern the underlying algorithms' behavior. An isomorphism between single funneling point manifolds and global convergence of descending algorithms is established. The structure and shape of the parametric manifold as well as its dependence on the sample complexity are studied through both plain and lifted RDT. Emergence of a phase transition is observed. Namely, as sample complexity increases, the parametric manifold transitions from a multi to a single funneling point structure. This in turn corresponds to a transition from the scenarios where descending algorithms generically fail to the scenarios where they succeed in solving phase retrieval. We also develop and implement a practical algorithmic variant that in a hybrid alternating fashion combines a barrier and a plain gradient descent. Even though the theoretical results are obtained for infinite dimensional scenarios (and consequently non-jittery parametric manifolds), we observe a strong agreement between theoretical and simulated phase transition predictions for fairly small dimensions on the order of a few hundred.
Ines Besrour, Jingbo He, Tobias Schreieder, Michael Färber
We present RAGentA, a multi-agent retrieval-augmented generation (RAG)
framework for attributed question answering (QA). With the goal of trustworthy
answer generation, RAGentA focuses on optimizing answer correctness, defined by
coverage and relevance to the question and faithfulness, which measures the
extent to which answers are grounded in retrieved documents. RAGentA uses a
multi-agent architecture that iteratively filters retrieved documents,
generates attributed answers with in-line citations, and verifies completeness
through dynamic refinement. Central to the framework is a hybrid retrieval
strategy that combines sparse and dense methods, improving Recall@20 by 12.5%
compared to the best single retrieval model, resulting in more correct and
well-supported answers. Evaluated on a synthetic QA dataset derived from the
FineWeb index, RAGentA outperforms standard RAG baselines, achieving gains of
1.09% in correctness and 10.72% in faithfulness. These results demonstrate the
effectiveness of the multi-agent architecture and hybrid retrieval in advancing
trustworthy QA.
Authors' comments: Accepted at SIGIR 2025
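The RAGentA abstract does not say how the sparse and dense rankings are combined. As one common option, the sketch below uses reciprocal rank fusion (RRF), a technique swapped in here purely for illustration and not confirmed as the authors' choice. The constant `k=60`, the function name, and the document identifiers are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60, top_n=20):
    """Fuse several ranked lists of document ids.

    Each list contributes 1 / (k + rank) per document, with rank starting
    at 1. Documents are returned by descending fused score, ties broken
    by id so the output is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:top_n]


# Hypothetical usage: one list from a sparse retriever, one from a dense one.
sparse_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
```

RRF uses only rank positions, so it needs no calibration between the score scales of the two retrievers.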
Xin Jiang, Meiqi Cao, Hao Tang, Fei Shen, Zechao Li
Fine-Grained Image Retrieval (FGIR) faces challenges in learning discriminative visual representations to retrieve images with similar fine-grained features. Current leading FGIR solutions typically follow two regimes: enforce pairwise similarity constraints in the semantic embedding space, or incorporate a localization sub-network to fine-tune the entire model. However, these two regimes tend to overfit the training data while forgetting the knowledge gained from large-scale pre-training, thus reducing their generalization ability. In this paper, we propose a Dual-Vision Adaptation (DVA) approach for FGIR, which guides the frozen pre-trained model to perform FGIR through collaborative sample and feature adaptation. Specifically, we design Object-Perceptual Adaptation, which modifies input samples to help the pre-trained model perceive critical objects and elements within objects that are helpful for category prediction. Meanwhile, we propose In-Context Adaptation, which introduces a small set of parameters for feature adaptation without modifying the pre-trained parameters. This makes the FGIR task using these adjusted features closer to the task solved during the pre-training. Additionally, to balance retrieval efficiency and performance, we propose Discrimination Perception Transfer to transfer the discriminative knowledge in the object-perceptual adaptation to the image encoder using the knowledge distillation mechanism. Extensive experiments show that DVA has fewer learnable parameters and performs well on three in-distribution and three out-of-distribution fine-grained datasets.
Le Vu Anh, Nguyen Viet Anh, Mehmet Dik, Luong Van Nghia
Retrieval-augmented generation (RAG) has become a common strategy for
updating large language model (LLM) responses with current, external
information. However, models may still rely on memorized training data, bypass
the retrieved evidence, and produce contaminated outputs. We introduce
Retrieval-Path Contamination Scoring (RePCS), a diagnostic method that detects
such behavior without requiring model access or retraining. RePCS compares two
inference paths: (i) a parametric path using only the query, and (ii) a
retrieval-augmented path using both the query and retrieved context, by
computing the Kullback-Leibler (KL) divergence between their output
distributions. A low divergence suggests that the retrieved context had minimal
impact, indicating potential memorization. This procedure is model-agnostic,
requires no gradient or internal state access, and adds only a single
additional forward pass. We further derive PAC-style guarantees that link the
KL threshold to user-defined false positive and false negative rates. On the
Prompt-WNQA benchmark, RePCS achieves a ROC-AUC of 0.918. This result
outperforms the strongest prior method by 6.5 percentage points while keeping
latency overhead below 4.7% on an NVIDIA T4 GPU. RePCS offers a lightweight,
black-box safeguard to verify whether a RAG system meaningfully leverages
retrieval, making it especially valuable in safety-critical applications.
Authors' comments: 11 pages, 7 figures, 5 tables
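The core comparison in the RePCS abstract, a KL divergence between the output distributions of the two inference paths, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes both distributions are available as probability lists over the same support. The direction of the divergence, the threshold value, and the function names are not given in the abstract and are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:
            # eps guards against division by zero when q assigns no mass.
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


def flag_possible_memorization(parametric_dist, retrieval_dist, threshold):
    """True when retrieval barely changed the output distribution.

    Assumption: the divergence is taken from the retrieval-augmented path to
    the parametric path, and `threshold` is a user-chosen value.
    """
    return kl_divergence(retrieval_dist, parametric_dist) < threshold
```

A low divergence means the retrieved context had little effect on the output, which is the signal the abstract associates with potential memorization.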
Abu Hanif Muhammad Syarubany, Chang Dong Yoo
Enterprise deployments of large language models (LLMs) must handle continuously changing document collections with sub-second latency and predictable GPU cost, requirements that classical Retrieval-Augmented Generation (RAG) pipelines only partially satisfy. We present PentaRAG, a five-layer module that routes each query through two instant caches (fixed key-value and semantic), a memory-recall mode that exploits the LLM's own weights, an adaptive session memory, and a conventional retrieval-augmentation layer. Implemented with Mistral-8B, Milvus and vLLM, the system can answer most repeated or semantically similar questions from low-latency caches while retaining full retrieval for novel queries. On the TriviaQA domain, LoRA fine-tuning combined with the memory-recall layer raises answer similarity by approximately 8% and factual correctness by approximately 16% over the base model. Under a nine-session runtime simulation, cache warming reduces mean latency from several seconds to well below one second and shifts traffic toward the fast paths. Resource-efficiency tests show that PentaRAG cuts average GPU time to 0.248 seconds per query, roughly half that of a naive RAG baseline, and sustains an aggregate throughput of approximately 100,000 queries per second on our setup. These results demonstrate that a layered routing strategy can deliver freshness, speed, and efficiency simultaneously in production-grade RAG systems.
Authors' comments: Annual Conference of The Institute of Electronics and Information Engineers
Yanzhen Zou, Xianlin Zhao, Xinglu Pan, Bing Xie
Issue reports have been recognized to contain rich information for
retrieval-augmented code comment generation. However, how to minimize
hallucinations in the generated comments remains a significant challenge. In
this paper, we propose IsComment, an issue-based LLM retrieval and verification
approach for generating a method's design rationale, usage directives, and so on
as supplementary code comments. We first identify five main types of code
supplementary information that issue reports can provide through
code-comment-issue analysis. Next, we retrieve issue sentences containing these
types of supplementary information and generate candidate code comments. To
reduce hallucinations, we filter out those candidate comments that are
irrelevant to the code or unverifiable by the issue report, making the code
comment generation results more reliable. Our experiments indicate that
compared with LLMs, IsComment increases the coverage of manual supplementary
comments from 33.6% to 72.2% for ChatGPT, from 35.8% to 88.4% for GPT-4o, and
from 35.0% to 86.2% for DeepSeek-V3. Compared with existing work, IsComment can
generate richer and more useful supplementary code comments for program
understanding, which is quantitatively evaluated through the MESIA metric on
both methods with and without manual code comments.
Authors' comments: 12 pages, 8 figures
Dong Xu, Zhangfan Yang, Ka-chun Wong, Zexuan Zhu, Jiangqiang Li, Junkai Ji
Breakthroughs in high-accuracy protein structure prediction, such as
AlphaFold, have established receptor-based molecule design as a critical driver
for rapid early-phase drug discovery. However, most approaches still struggle
to balance pocket-specific geometric fit with strict valence and synthetic
constraints. To resolve this trade-off, a Retrieval-Enhanced Aligned Diffusion
model, termed READ, is introduced, which is the first to merge molecular
Retrieval-Augmented Generation with an SE(3)-equivariant diffusion model.
Specifically, a contrastively pre-trained encoder aligns atom-level
representations during training, then retrieves graph embeddings of
pocket-matched scaffolds to guide each reverse-diffusion step at inference.
This single mechanism can inject real-world chemical priors exactly where
needed, producing valid, diverse, and shape-complementary ligands. Experimental
results demonstrate that READ can achieve very competitive performance in
CBGBench, surpassing state-of-the-art generative models and even native
ligands. This suggests that retrieval and diffusion can be co-optimized for faster,
more reliable structure-based drug design.
Authors' comments: 13 pages, 5 figures