Liyang He, Zhenya Huang, Cheng Yang, Rui Li, Zheng Zhang, Kai Zhang, Zhi Li, Qi Liu et al.
With the rapid growth of textual content on the Internet, efficient large-scale semantic text retrieval has garnered increasing attention from both academia and industry. Text hashing, which projects original texts into compact binary hash codes, is a crucial method for this task. By using binary codes, the semantic similarity computation for text pairs is significantly accelerated via fast Hamming distance calculations, and storage costs are greatly reduced. With the advancement of deep learning, deep text hashing has demonstrated significant advantages over traditional, data-independent hashing techniques. By leveraging deep neural networks, these methods can learn compact and semantically rich binary representations directly from data, overcoming the performance limitations of earlier approaches. This survey investigates current deep text hashing methods by categorizing them based on their core components: semantic extraction, hash code quality preservation, and other key technologies. We then present a detailed evaluation schema with results on several popular datasets, followed by a discussion of practical applications and open-source tools for implementation. Finally, we conclude by discussing key challenges and future research directions, including the integration of deep text hashing with large language models to further advance the field. The project for this survey can be accessed at https://github.com/hly1998/DeepTextHashing.
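As a minimal sketch of why binary codes make similarity search cheap, the snippet below packs bit vectors into Python integers and ranks stored codes by Hamming distance (XOR followed by a bit count). The helper names `pack_bits`, `hamming`, and `search` are illustrative assumptions, not taken from the survey or its repository, and the hash codes here are hand-written rather than learned.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into an int (first bit is most significant)."""
    code = 0
    for b in bits:
        code = (code << 1) | (1 if b else 0)
    return code


def hamming(a, b):
    """Hamming distance between two packed codes: count the differing bits."""
    return bin(a ^ b).count("1")


def search(query_code, codes, k=2):
    """Return indices of the k stored codes closest to the query (ties by index)."""
    ranked = sorted(range(len(codes)), key=lambda i: (hamming(query_code, codes[i]), i))
    return ranked[:k]


# Hypothetical 8-bit hash codes for three documents and one query.
doc_codes = [
    pack_bits([1, 0, 1, 1, 0, 0, 1, 0]),
    pack_bits([1, 0, 1, 0, 0, 0, 1, 0]),
    pack_bits([0, 1, 0, 0, 1, 1, 0, 1]),
]
query = pack_bits([1, 0, 1, 1, 0, 0, 1, 1])
nearest = search(query, doc_codes, k=2)
```

Each comparison is one XOR plus a bit count, which is the source of the speed advantage over floating-point similarity on dense embeddings.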
Zhipeng Liao, Kunming Shao, Jiangnan Yu, Liang Zhao, Tim Kwang-Ting Cheng, Chi-Ying Tsui, Jie Yang, Mohamad Sawan
With powerful and integrative large language models (LLMs), medical AI agents
have demonstrated unique advantages in providing personalized medical
consultations, continuous health monitoring, and precise treatment plans.
Retrieval-Augmented Generation (RAG) integrates personal medical documents into
LLMs through an external retrievable database, avoiding the costly retraining or
fine-tuning otherwise needed to deploy customized agents. While deploying medical
agents on edge devices ensures privacy protection, RAG implementations incur
substantial memory access and energy consumption during the retrieval stage.
This paper presents a hierarchical retrieval architecture for edge RAG,
leveraging a two-stage retrieval scheme that combines approximate retrieval for
candidate set generation, followed by high-precision retrieval on pre-selected
document embeddings. The proposed architecture significantly reduces energy
consumption and external memory access while maintaining retrieval accuracy.
Simulation results show that, under TSMC 28nm technology, the proposed
hierarchical retrieval architecture reduces the overall memory access by
nearly 50% and the computation by 75% compared to pure INT8 retrieval, and the
total energy consumption for 1 MB data retrieval is 177.76 µJ/query.
Authors' comments: Accepted by BioCAS2025
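The two-stage scheme described above can be sketched in software as follows. This is only an illustration under stated assumptions: the coarse stage here compares sign bits of the embeddings, and the precise stage uses full integer dot products on the surviving candidates. The abstract does not specify the approximate scoring used in the hardware, so `sign_bits`, `coarse_score`, and `two_stage_retrieve` are hypothetical names and choices.

```python
def sign_bits(vec):
    """1-bit approximation of an embedding: 1 for non-negative entries, else 0."""
    return [1 if x >= 0 else 0 for x in vec]


def coarse_score(q_bits, d_bits):
    """Cheap similarity: number of positions where the sign bits agree."""
    return sum(1 for a, b in zip(q_bits, d_bits) if a == b)


def dot(a, b):
    """Full-precision (integer) dot product used by the second stage."""
    return sum(x * y for x, y in zip(a, b))


def two_stage_retrieve(query, docs, n_candidates, top_k):
    """Stage 1 keeps n_candidates by coarse score; stage 2 reranks them exactly."""
    q_bits = sign_bits(query)
    coarse = sorted(
        range(len(docs)),
        key=lambda i: (-coarse_score(q_bits, sign_bits(docs[i])), i),
    )
    candidates = coarse[:n_candidates]
    fine = sorted(candidates, key=lambda i: (-dot(query, docs[i]), i))
    return fine[:top_k]


# Hypothetical INT8-style embeddings.
docs = [[10, -3, 5, 1], [9, -2, 4, -1], [-8, 3, -5, 2], [1, -1, 1, 1]]
query = [8, -2, 6, 1]
best = two_stage_retrieve(query, docs, n_candidates=3, top_k=2)
```

Only the candidates that survive the coarse pass are read at full precision, which is where the savings in memory access and computation would come from in such a design.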
Arnabh Borah, Md Tanvirul Alam, Nidhi Rastogi
Security applications are increasingly relying on large language models (LLMs) for cyber threat detection; however, their opaque reasoning often limits trust, particularly in decisions that require domain-specific cybersecurity knowledge. Because security threats evolve rapidly, LLMs must not only recall historical incidents but also adapt to emerging vulnerabilities and attack patterns. Retrieval-Augmented Generation (RAG) has demonstrated effectiveness in general LLM applications, but its potential for cybersecurity remains underexplored. In this work, we introduce a RAG-based framework designed to contextualize cybersecurity data and enhance LLM accuracy in knowledge retention and temporal reasoning. Using external datasets and the Llama-3-8B-Instruct model, we evaluate baseline RAG and an optimized hybrid retrieval approach, and conduct a comparative analysis across multiple performance metrics. Our findings highlight the promise of hybrid retrieval in strengthening the adaptability and reliability of LLMs for cybersecurity tasks.
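The abstract does not say how its hybrid retrieval combines retrievers. One common way to merge a sparse (keyword) ranking with a dense (embedding) ranking is reciprocal rank fusion, sketched below as an assumption rather than as the authors' method. The function name, the constant `k=60`, and the document identifiers are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in
    (rank starts at 1); documents are returned by descending fused score.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings from a keyword retriever and an embedding retriever.
sparse_ranking = ["cve-a", "cve-b", "cve-c"]
dense_ranking = ["cve-b", "cve-c", "cve-a"]
fused = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
```

Rank-based fusion avoids having to calibrate the raw scores of the two retrievers against each other, which is why it is a frequent baseline for hybrid setups.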
Junya Shiraishi, Shashi Raj Pandey, Israel Leyva-Mayorga, Petar Popovski
The use of Dynamic Random Access Memory (DRAM) for storing Machine Learning (ML) models plays a critical role in accelerating ML inference tasks in the next generation of communication systems. However, periodic DRAM refresh results in wasteful energy consumption during standby periods, which is significant for resource-constrained Internet of Things (IoT) devices. To solve this problem, this work advocates two novel approaches: 1) wireless memory activation and 2) wireless memory approximation. These enable wireless devices to efficiently manage the available memory by considering the timing aspects and relevance of ML model usage, thereby reducing the overall energy consumption. Numerical results show that our proposed scheme achieves lower energy consumption than the always-on approach while satisfying the retrieval accuracy constraint.
Xinhang Li, Qing Guo, Junyu Chen, Zheng Guo, Shengzhe Xu, Lei Li, Lin Zhang
With increasing urban traffic complexity, Traffic Signal Control (TSC) is essential for optimizing traffic flow and improving road safety. Large Language Models (LLMs) emerge as promising approaches for TSC. However, they are prone to hallucinations in emergencies, leading to unreliable decisions that may cause substantial delays for emergency vehicles. Moreover, diverse intersection types present substantial challenges for traffic state encoding and cross-intersection training, limiting generalization across heterogeneous intersections. Therefore, this paper proposes Retrieval Augmented Generation (RAG)-enhanced distributed LLM agents with Emergency response for Generalizable TSC (REG-TSC). Firstly, this paper presents an emergency-aware reasoning framework, which dynamically adjusts reasoning depth based on the emergency scenario and is equipped with a novel Reviewer-based Emergency RAG (RERAG) to distill specific knowledge and guidance from historical cases, enhancing the reliability and rationality of agents' emergency decisions. Secondly, this paper designs a type-agnostic traffic representation and proposes a Reward-guided Reinforced Refinement (R3) for heterogeneous intersections. R3 adaptively samples training experience from diverse intersections with environment feedback-based priority and fine-tunes LLM agents with a designed reward-weighted likelihood loss, guiding REG-TSC toward high-reward policies across heterogeneous intersections. On three real-world road networks with 17 to 177 heterogeneous intersections, extensive experiments show that REG-TSC reduces travel time by 42.00%, queue length by 62.31%, and emergency vehicle waiting time by 83.16%, outperforming other state-of-the-art methods.
Qi Luo, Xiaonan Li, Tingshuo Fan, Xinchi Chen, Xipeng Qiu
Retrieval-augmented generation (RAG) has emerged as a leading approach to reducing hallucinations in large language models (LLMs). Current RAG evaluation benchmarks primarily focus on what we call local RAG: retrieving relevant chunks from a small subset of documents to answer queries that require only localized understanding within specific text chunks. However, many real-world applications require a fundamentally different capability -- global RAG -- which involves aggregating and analyzing information across entire document collections to derive corpus-level insights (for example, "What are the top 10 most cited papers in 2023?"). In this paper, we introduce GlobalQA -- the first benchmark specifically designed to evaluate global RAG capabilities, covering four core task types: counting, extremum queries, sorting, and top-k extraction. Through systematic evaluation across different models and baselines, we find that existing RAG methods perform poorly on global tasks, with the strongest baseline achieving an F1 score of only 1.51. To address these challenges, we propose GlobalRAG, a multi-tool collaborative framework that preserves structural coherence through chunk-level retrieval, incorporates LLM-driven intelligent filters to eliminate noisy documents, and integrates aggregation modules for precise symbolic computation. On the Qwen2.5-14B model, GlobalRAG achieves 6.63 F1 compared to the strongest baseline's 1.51 F1, validating the effectiveness of our method.
Yanran Tang, Ruihong Qiu, Xue Li, Zi Huang
Legal case retrieval (LCR) is a cornerstone of real-world legal decision making, as it enables practitioners to identify precedents for a given query case. Existing approaches mainly rely on traditional lexical models and pretrained language models to encode the texts of legal cases. Yet there is rich information in the relations among different legal entities as well as in the crucial reasoning process that uncovers how legal facts and legal issues can lead to judicial decisions. Such a relational reasoning process reflects the distinctive characteristics of each case that can distinguish one from another, mirroring the real-world judicial process. Naturally, incorporating such information into the case embedding could further enhance the accuracy of case retrieval. In this paper, a novel ReaKase-8B framework is proposed to leverage extracted legal facts, legal issues, legal relation triplets and legal reasoning for effective legal case retrieval. ReaKase-8B designs an in-context legal case representation learning paradigm with a fine-tuned large language model. Extensive experiments on two benchmark datasets from COLIEE 2022 and COLIEE 2023 demonstrate that our knowledge and reasoning augmented embeddings substantially improve retrieval performance over baseline models, highlighting the potential of integrating legal reasoning into legal case retrieval systems. The code has been released at https://github.com/yanran-tang/ReaKase-8B.
Ishfaq Aziz, Mohamad Alipour
Estimating subsurface dielectric properties is essential for applications
ranging from environmental surveys of soils to nondestructive evaluation of
concrete in infrastructure. Conventional wave inversion methods typically
assume few discrete homogeneous layers and require dense measurements or strong
prior knowledge of material boundaries, limiting scalability and accuracy in
realistic settings where properties vary continuously. We present a
physics-informed machine learning framework that reconstructs subsurface permittivity
as a fully neural, continuous function of depth, trained to satisfy both
measurement data and Maxwell's equations. We validate the framework with both
simulations and custom-built radar experiments on multilayered natural
materials. Results show close agreement with in-situ permittivity measurements
(R^2 = 0.93), with sensitivity to even subtle variations (Δε_r = 2).
Parametric analysis reveals that accurate profiles can be recovered with as few
as three strategically placed sensors in two-layer systems. This approach
reframes subsurface inversion from boundary-driven to continuous property
estimation, enabling accurate characterization of smooth permittivity
variations and advancing electromagnetic imaging using low-cost radar systems.
Authors' comments: 22 pages, 9 main text figures + 2 supplementary figures
Wenyan Xu, Dawei Xiang, Tianqi Ding, Weihai Lu
Misinformation and disinformation demand fact checking that goes beyond
simple evidence-based reasoning. Existing benchmarks fall short: they are
largely single modality (text-only), span short time horizons, use shallow
evidence, cover domains unevenly, and often omit full articles -- obscuring
models' real-world capability. We present MMM-Fact, a large-scale benchmark of
125,449 fact-checked statements (1995--2025) across multiple domains, each
paired with the full fact-check article and multimodal evidence (text, images,
videos, tables) from four fact-checking sites and one news outlet. To reflect
verification effort, each statement is tagged with a retrieval-difficulty tier
-- Basic (1--5 sources), Intermediate (6--10), and Advanced (>10) -- supporting
fairness-aware evaluation for multi-step, cross-modal reasoning. The dataset
adopts a three-class veracity scheme (true/false/not enough information) and
enables tasks in veracity prediction, explainable fact-checking, complex
evidence aggregation, and longitudinal analysis. Baselines with mainstream LLMs
show MMM-Fact is markedly harder than prior resources, with performance
degrading as evidence complexity rises. MMM-Fact offers a realistic, scalable
benchmark for transparent, reliable, multimodal fact-checking.
Authors' comments: Dataset link: https://huggingface.co/datasets/Wenyan0110/MMM-Fact
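The retrieval-difficulty tiers above map directly onto a small lookup. The sketch below follows the thresholds stated in the abstract (Basic 1 to 5 sources, Intermediate 6 to 10, Advanced more than 10). The function name and the handling of statements with no sources are assumptions, since the abstract does not cover that case.

```python
def difficulty_tier(num_sources):
    """Map the number of evidence sources to a retrieval-difficulty tier."""
    if num_sources < 1:
        # Assumption: the abstract defines no tier for zero sources.
        raise ValueError("a tier is only defined for at least one source")
    if num_sources <= 5:
        return "Basic"
    if num_sources <= 10:
        return "Intermediate"
    return "Advanced"


# Tiers for a few hypothetical source counts.
tiers = [difficulty_tier(n) for n in (1, 5, 6, 10, 11)]
```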
Mengzhou Sun, Sendong Zhao, Jianyu Chen, Bin Qin
Evidence-based medicine (EBM) research has always been of paramount importance. It is important to find appropriate medical theoretical support for the needs of physicians or patients to reduce the occurrence of medical accidents. This process is often carried out by humans querying relevant literature databases, which lacks objectivity and efficiency. Therefore, researchers utilize retrieval-augmented generation (RAG) to search for evidence and generate responses automatically. However, current RAG methods struggle to handle complex queries in real-world clinical scenarios. For example, when queries lack certain information or use imprecise language, the model may retrieve irrelevant evidence and generate unhelpful answers. To address this issue, we present PICOs-RAG to expand user queries into a better format. Our method can expand and normalize the queries into professional ones and use the PICO format, a search strategy tool used in EBM, to extract the most important information used for retrieval. This approach significantly enhances retrieval efficiency and relevance, resulting in up to an 8.8% improvement compared to the baseline evaluated by our method. PICOs-RAG thereby helps large language models serve as helpful and reliable medical assistants in EBM.
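PICO conventionally stands for Population (or Patient), Intervention, Comparison, and Outcome. As a rough illustration of what a PICO-structured retrieval query might look like once the fields have been extracted, the sketch below holds the four fields in a dataclass and joins the non-empty ones into a Boolean search string. The class name, the field values, and the `AND` joining are assumptions made for illustration; the abstract does not describe how PICOs-RAG builds its queries, and the LLM-based extraction step is not shown.

```python
from dataclasses import dataclass


@dataclass
class PicoQuery:
    """Hypothetical container for the four PICO elements of a clinical question."""

    population: str
    intervention: str
    comparison: str = ""
    outcome: str = ""

    def to_search_string(self):
        """Join the non-empty fields into a simple Boolean query."""
        parts = [self.population, self.intervention, self.comparison, self.outcome]
        return " AND ".join(f"({p})" for p in parts if p)


# Example with an omitted comparison field.
example = PicoQuery(
    population="adults with type 2 diabetes",
    intervention="metformin",
    outcome="HbA1c reduction",
)
search_string = example.to_search_string()
```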
Binbin Li, Guimiao Yang, Zisen Qi, Haiping Wang, Yu Ding
Recent lightweight retrieval-augmented image caption models often utilize retrieved data solely as text prompts, thereby creating a semantic gap by leaving the original visual features unenhanced, particularly for object details or complex scenes. To address this limitation, we propose DualCap, a novel approach that enriches the visual representation by generating a visual prompt from retrieved similar images. Our model employs a dual retrieval mechanism, using standard image-to-text retrieval for text prompts and a novel image-to-image retrieval to source visually analogous scenes. Specifically, salient keywords and phrases are derived from the captions of visually similar scenes to capture key objects and similar details. These textual features are then encoded and integrated with the original image features through a lightweight, trainable feature fusion network. Extensive experiments demonstrate that our method achieves competitive performance while requiring fewer trainable parameters compared to previous visual-prompting captioning approaches.
Wilhelmina Maryann Joseph, Beate Stelzer, Salvatore Orlando, Moritz Klawin
Context. Stellar coronae are unresolved in X-rays, so inferences about their structure rely on spectral analysis. The "Sun-as-an-X-ray-star" (SaXS) approach uses the Sun as a spatially resolved template to interpret stellar spectra, but previous SaXS implementations were indirect and computationally heavy. Aims. We present a new SaXS implementation that converts solar emission measure distributions (EMDs) of distinct coronal region types into XSPEC spectral components and test whether broad-band X-ray spectra alone can recover their filling factors. Methods. We built XSPEC multi-temperature spectral models for four solar region types (background/quiet corona, active regions, cores, and flares) by using EMDs derived from Yohkoh/SXT data and translating each EMD bin into an isothermal apec component. These models were fit (using PyXspec) to two one-hour DAXSS spectra representative of quiescent (2022-06-29) and flaring (2022-04-25) states. Best-fit normalizations were converted into projected areas and filling factors and compared with near-coincident Hinode/XRT full-disk images. Results. Using the Yohkoh/SXT EMDs, the quiescent Sun spectrum is dominated by active region emission (filling factor ~22%), with the background corona poorly constrained. The flaring Sun spectrum is best described by a combination of active regions, cores, and flares with filling factors of ~47.5%, ~4.1%, and ~0.062%, respectively. The dominant components match spatial features seen in Hinode/XRT images. Limitations include the DAXSS low-energy cutoff (~0.7 keV) and the small, non-uniform Yohkoh EMD sample. Conclusions. Our SaXS implementation enables direct retrieval of coronal filling factors from broad-band X-ray spectra and provides a physically motivated alternative to ad hoc few-temperature fits, suitable for stellar X-ray analyses.
Anooshka Bajaj, Deven Mahesh Mistry, Sahaj Singh Maini, Yash Aggarwal, Zoran Tiganj
In-context learning is governed by both temporal and semantic relationships, shaping how Large Language Models (LLMs) retrieve contextual information. Analogous to human episodic memory, where the retrieval of specific events is enabled by separating events that happened at different times, this work probes the ability of various pretrained LLMs, including transformer and state-space models, to differentiate and retrieve temporally separated events. Specifically, we prompted models with sequences containing multiple presentations of the same token, which reappears at the sequence end. By fixing the positions of these repeated tokens and permuting all others, we removed semantic confounds and isolated temporal effects on next-token prediction. Across diverse sequences, models consistently placed the highest probabilities on tokens following a repeated token, but with a notable bias for those nearest the beginning or end of the input. An ablation experiment linked this phenomenon in transformers to induction heads. Extending the analysis to unique semantic contexts with partial overlap further demonstrated that memories embedded in the middle of a prompt are retrieved less reliably. Despite architectural differences, state-space and transformer models showed comparable temporal biases. Our findings deepen the understanding of temporal biases in in-context learning and offer an illustration of how these biases can enable temporal separation and episodic retrieval.
Juyeon Kim, Geon Lee, Dongwon Choi, Taeuk Kim, Kijung Shin
Retrieval over visually rich documents is essential for tasks such as legal discovery, scientific search, and enterprise knowledge management. Existing approaches fall into two paradigms: single-vector retrieval, which is efficient but coarse, and multi-vector retrieval, which is accurate but computationally expensive. To address this trade-off, we propose HEAVEN, a two-stage hybrid-vector framework. In the first stage, HEAVEN efficiently retrieves candidate pages using a single-vector method over Visually-Summarized Pages (VS-Pages), which assemble representative visual layouts from multiple pages. In the second stage, it reranks candidates with a multi-vector method while filtering query tokens by linguistic importance to reduce redundant computations. To evaluate retrieval systems under realistic conditions, we also introduce ViMDOC, the first benchmark for visually rich, multi-document, and long-document retrieval. Across four benchmarks, HEAVEN attains 99.87% of the Recall@1 performance of multi-vector models on average while reducing per-query computation by 99.82%, achieving both efficiency and accuracy. Our code and datasets are available at: https://github.com/juyeonnn/HEAVEN
Tianhong Gao, Jundong Shen, Bei Shi, Jiapeng Wang, Ying Ju, Junfeng Yao, Jiao Ran, Yong Zhang et al.
Intelligent customer service (ICS) systems via retrieval-augmented generation (RAG) have been widely adopted in Web-based domains such as social platforms and e-commerce, achieving remarkable improvements in automation and efficiency. However, notable limitations remain: these systems are prone to hallucinations and often generate rigid, mechanical responses, which can introduce business risks and undermine user experience, especially in Web-based customer service interactions in RAG scenarios. In this paper, we introduce OlaMind, a human-like and hallucination-safe customer service framework for retrieval-augmented dialogue. Specifically, it first leverages a Learn-to-Think stage to learn the reasoning processes and response strategies from human experts, and then employs a Learn-to-Respond stage to perform cold-start supervised fine-tuning (SFT) combined with reinforcement learning (RL) for basic-to-hard self-refinement. Our method significantly enhances human-likeness and naturalness while effectively mitigating hallucinations and critical business risks. We have conducted large-scale online A/B experiments in an industry-level social customer service setting, and extensive experimental results show that OlaMind achieves significant cumulative relative improvements of +28.92%/+18.42% in intelligent resolution rate and -6.08%/-7.12% in human takeover rate in community-support/livestream-interaction scenarios, respectively, which highlights its consistent effectiveness across diverse real-world applications. The code and data will be publicly available.
Yue Feng, Jinwei Hu, Qijia Lu, Jiawei Niu, Li Tan, Shuo Yuan, Ziyi Yan, Yizhen Jia et al.
We propose the Multi-modal Untrimmed Video Retrieval task, along with a new
benchmark (MUVR) to advance video retrieval for long-video platforms. MUVR aims
to retrieve untrimmed videos containing relevant segments using multi-modal
queries. It has the following features: 1) Practical retrieval paradigm: MUVR
supports video-centric multi-modal queries, expressing fine-grained retrieval
needs through long text descriptions, video tag prompts, and mask prompts. It
adopts a one-to-many retrieval paradigm and focuses on untrimmed videos,
tailored for long-video platform applications. 2) Multi-level visual
correspondence: To cover common video categories (e.g., news, travel, dance)
and precisely define retrieval matching criteria, we construct multi-level
visual correspondence based on core video content (e.g., news events, travel
locations, dance moves) which users are interested in and want to retrieve. It
covers six levels: copy, event, scene, instance, action, and others. 3)
Comprehensive evaluation criteria: We develop 3 versions of MUVR (i.e., Base,
Filter, QA). MUVR-Base/Filter evaluates retrieval models, while MUVR-QA
assesses MLLMs in a question-answering format. We also propose a Reranking
Score to evaluate the reranking ability of MLLMs. MUVR consists of 53K
untrimmed videos from the video platform Bilibili, with 1,050 multi-modal
queries and 84K matches. Extensive evaluations of 3 state-of-the-art video
retrieval models, 6 image-based VLMs, and 10 MLLMs are conducted. MUVR reveals
the limitations of retrieval methods in processing untrimmed videos and
multi-modal queries, as well as MLLMs in multi-video understanding and
reranking. Our code and benchmark are available at
https://github.com/debby-0527/MUVR.
Authors' comments: Accepted to NeurIPS 2025 D&B Track
Hanyu Zhu, Lance Fiondella, Jiawei Yuan, Kai Zeng, Long Jiao
Retrieval-Augmented Generation (RAG) empowers Large Language Models (LLMs) to dynamically integrate external knowledge during inference, improving their factual accuracy and adaptability. However, adversaries can inject poisoned external knowledge to override the model's internal memory. While existing attacks iteratively manipulate retrieval content or prompt structure of RAG, they largely ignore the model's internal representation dynamics and neuron-level sensitivities. The underlying mechanism of RAG poisoning has not been fully studied and the effect of knowledge conflict with strong parametric knowledge in RAG is not considered. In this work, we propose NeuroGenPoisoning, a novel attack framework that generates adversarial external knowledge in RAG guided by LLM internal neuron attribution and genetic optimization. Our method first identifies a set of Poison-Responsive Neurons whose activation strongly correlates with contextual poisoning knowledge. We then employ a genetic algorithm to evolve adversarial passages that maximally activate these neurons. Crucially, our framework enables massive-scale generation of effective poisoned RAG knowledge by identifying and reusing promising but initially unsuccessful external knowledge variants via observed attribution signals. At the same time, Poison-Responsive Neurons guided poisoning can effectively resolve knowledge conflict. Experimental results across models and datasets demonstrate that our method consistently achieves a high Population Overwrite Success Rate (POSR) of over 90% while preserving fluency. Empirical evidence shows that our method effectively resolves knowledge conflict.
Yanlin Song, Ben Liu, Víctor Gutiérrez-Basulto, Zhiwei Hu, Qianqian Xie, Min Peng, Sophia Ananiadou, Jeff Z. Pan
Knowledge Graph Question Answering (KGQA) aims to answer natural language questions by reasoning over structured knowledge graphs (KGs). While large language models (LLMs) have advanced KGQA through their strong reasoning capabilities, existing methods continue to struggle to fully exploit both the rich knowledge encoded in KGs and the reasoning capabilities of LLMs, particularly in complex scenarios. They often assume complete KG coverage and lack mechanisms to judge when external information is needed, and their reasoning remains locally myopic, failing to maintain coherent multi-step planning, leading to reasoning failures even when relevant knowledge exists. We propose Graph-RFT, a novel two-stage reinforcement fine-tuning KGQA framework with a 'plan-KGsearch-and-Websearch-during-think' paradigm that enables LLMs to perform autonomous planning and adaptive retrieval scheduling across KG and web sources under incomplete knowledge conditions. Graph-RFT introduces a chain-of-thought fine-tuning method with a customized plan-retrieval dataset that activates structured reasoning and resolves the GRPO cold-start problem. It then introduces a novel plan-retrieval guided reinforcement learning process that integrates explicit planning and retrieval actions with a multi-reward design, enabling coverage-aware retrieval scheduling. It employs a Cartesian-inspired planning module to decompose complex questions into ordered subquestions, and logical expression to guide tool invocation for globally consistent multi-step reasoning. This reasoning retrieval process is optimized with a multi-reward design combining outcome and retrieval-specific signals, enabling the model to learn when and how to combine KG and web retrieval effectively.
Timur Galimzyanov, Olga Kolomyttseva, Egor Bogomolov
We study retrieval design for code-focused generation tasks under realistic compute budgets. Using two complementary tasks from Long Code Arena -- code completion and bug localization -- we systematically compare retrieval configurations across various context window sizes along three axes: (i) chunking strategy, (ii) similarity scoring, and (iii) splitting granularity. (1) For PL-PL, sparse BM25 with word-level splitting is the most effective and practical, significantly outperforming dense alternatives while being an order of magnitude faster. (2) For NL-PL, proprietary dense encoders (Voyager-3 family) consistently beat sparse retrievers, but at 100x higher latency. (3) Optimal chunk size scales with available context: 32-64 line chunks work best at small budgets, and whole-file retrieval becomes competitive at 16000 tokens. (4) Simple line-based chunking matches syntax-aware splitting across budgets. (5) Retrieval latency varies by up to 200x across configurations; BPE-based splitting is needlessly slow, and BM25 + word splitting offers the best quality-latency trade-off. Thus, we provide evidence-based recommendations for implementing effective code-oriented RAG systems based on task requirements, model constraints, and computational efficiency.
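To make "BM25 with word-level splitting" concrete, here is a minimal, dependency-free sketch of Okapi BM25 scoring over code chunks, where the tokenizer simply extracts runs of word characters. The parameter values `k1=1.5` and `b=0.75`, the regex tokenizer, and the example chunks are assumptions made for illustration and are not taken from the paper's setup.

```python
import math
import re
from collections import Counter


def split_words(text):
    """Word-level splitting: lowercase runs of letters, digits, and underscores."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, chunks, k1=1.5, b=0.75):
    """Score every chunk against the query with Okapi BM25."""
    tokenized = [split_words(c) for c in chunks]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n if n else 0.0
    # Document frequency: number of chunks containing each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    query_terms = split_words(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical code chunks and a query made of identifiers.
chunks = [
    "def parse_config(path): return load(path)",
    "class Parser: pass",
    "def save_config(cfg, path): dump(cfg, path)",
]
scores = bm25_scores("parse_config path", chunks)
best_chunk = max(range(len(chunks)), key=lambda i: scores[i])
```

Because scoring needs only token counts, no encoder forward pass is involved, which is consistent with the latency gap between sparse and dense retrieval reported in the abstract.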
Zhouwei Zhai, Mengxiang Chen, Haoyun Xia, Jin Li, Renquan Zhou, Min Yang
The retrieval-ranking paradigm has long dominated e-commerce search, but its reliance on query-item matching fundamentally misaligns with the multi-stage cognitive decision processes of platform users. This misalignment introduces critical limitations: semantic gaps in complex queries, high decision costs due to cross-platform information foraging, and the absence of professional shopping guidance. To address these issues, we propose a Multi-Agent Cognitive Decision Framework (MACDF), which shifts the paradigm from passive retrieval to proactive decision support. Extensive offline evaluations demonstrate MACDF's significant improvements in recommendation accuracy and user satisfaction, particularly for complex queries involving negation, multi-constraint, or reasoning demands. Online A/B testing on the JD search platform confirms its practical efficacy. This work highlights the transformative potential of multi-agent cognitive systems in redefining e-commerce search.