Baoyi Zeng, Marc-Antoine Martinod, Denis Defrère
Accurate null depth retrieval is critical in nulling interferometry. However, achieving accurate null depth calibration is challenging due to various noise sources, instrumental imperfections, and the complexity of real observational environments. These challenges necessitate advanced calibration techniques that can efficiently handle such uncertainties while maintaining a high accuracy. This paper aims to incorporate machine-learning techniques with a Bayesian inference to improve the accuracy and efficiency of null depth retrieval in nulling interferometry. Specifically, it explores the use of neural posterior estimation (NPE) to develop models that overcome the computational limitations of conventional methods, such as numerical self-calibration (NSC), providing a more robust solution for accurate null depth calibration. An NPE-based model was developed, with a simulator that incorporates real data to better represent specific conditions. The model was tested on both synthetic and observational data from the LBTI nuller for evaluation. The NPE model successfully demonstrated improved efficiency, achieving results comparable to current methods in use. It achieved a null depth retrieval accuracy down to a few $10^{-4}$ on real observational data, matching the performance of conventional approaches while offering significant computational advantages, reducing the data retrieval time to one-quarter of the time required by self-calibration methods. The NPE model presents a practical and scalable solution for null depth calibration in nulling interferometry, offering substantial improvements in efficiency over existing methods with a better precision and application to other interferometric techniques.
Authors' comments: 9 pages, 7 Figures. accepted for publication in A&A
Shuyi Liu, Yuming Shang, Xi Zhang
Retrieval-Augmented Generation (RAG) has emerged as a powerful framework for enhancing the capabilities of Large Language Models (LLMs) by integrating retrieval-based methods with generative models. As external knowledge repositories continue to expand and the parametric knowledge within models becomes outdated, a critical challenge for RAG systems is resolving conflicts between retrieved external information and LLMs' internal knowledge, which can significantly compromise the accuracy and reliability of generated content. However, existing approaches to conflict resolution typically operate at the token or semantic level, often leading to fragmented and partial understanding of factual discrepancies between LLMs' knowledge and context, particularly in knowledge-intensive tasks. To address this limitation, we propose TruthfulRAG, the first framework that leverages Knowledge Graphs (KGs) to resolve factual-level knowledge conflicts in RAG systems. Specifically, TruthfulRAG constructs KGs by systematically extracting triples from retrieved content, utilizes query-based graph retrieval to identify relevant knowledge, and employs entropy-based filtering mechanisms to precisely locate conflicting elements and mitigate factual inconsistencies, thereby enabling LLMs to generate faithful and accurate responses. Extensive experiments reveal that TruthfulRAG outperforms existing methods, effectively alleviating knowledge conflicts and improving the robustness and trustworthiness of RAG systems.
Authors' comments: 12 pages, 3 figures, accepted at AAAI 2026
Qinfeng Li, Miao Pan, Ke Xiong, Ge Su, Zhiqiang Shen, Yan Liu, Bing Sun, Hao Peng et al.
Retrieval-Augmented Generation (RAG) systems deployed over proprietary knowledge bases face growing threats from reconstruction attacks that aggregate model responses to replicate knowledge bases. Such attacks exploit both intra-class and inter-class paths, progressively extracting fine-grained knowledge within topics and diffusing it across semantically related ones, thereby enabling comprehensive extraction of the original knowledge base. However, existing defenses target only one path, leaving the other unprotected. We conduct a systematic exploration to assess the impact of protecting each path independently and find that joint protection is essential for effective defense. Based on this, we propose RAGFort, a structure-aware dual-module defense combining "contrastive reindexing" for inter-class isolation and "constrained cascade generation" for intra-class protection. Experiments across security, performance, and robustness confirm that RAGFort significantly reduces reconstruction success while preserving answer quality, offering comprehensive defense against knowledge base extraction attacks.
Authors' comments: Accepted by AAAI 2026 Conference
Bo Li, Zhenghua Xu, Rui Xie
Multilingual Retrieval-Augmented Generation (RAG) enables large language models (LLMs) to perform knowledge-intensive tasks in multilingual settings by leveraging retrieved documents as external evidence. However, when the retrieved evidence differs in language from the user query and in-context exemplars, the model often exhibits language drift by generating responses in an unintended language. This phenomenon is especially pronounced during reasoning-intensive decoding, such as Chain-of-Thought (CoT) generation, where intermediate steps introduce further language instability. In this paper, we systematically study output language drift in multilingual RAG across multiple datasets, languages, and LLM backbones. Our controlled experiments reveal that the drift results not from comprehension failure but from decoder-level collapse, where dominant token distributions and high-frequency English patterns dominate the intended generation language. We further observe that English serves as a semantic attractor under cross-lingual conditions, emerging as both the strongest interference source and the most frequent fallback language.
To mitigate this, we propose Soft Constrained Decoding (SCD), a lightweight, training-free decoding strategy that gently steers generation toward the target language by penalizing non-target-language tokens. SCD is model-agnostic and can be applied to any generation algorithm without modifying the architecture or requiring additional data. Experiments across three multilingual datasets and multiple typologically diverse languages show that SCD consistently improves language alignment and task performance, providing an effective and generalizable solution in multilingual RAG.
Authors' comments: AAAI'26, Oral Paper
Chunlei Shi, Han Xu, Yinghao Li, Yi-Lin Wei, Yongchao Feng, Yecheng Zhang, Dan Niu
Satellite-based radar retrieval methods are widely employed to fill coverage gaps in ground-based radar systems, especially in remote areas affected by terrain blockage and limited detection range. Existing methods predominantly rely on overly simplistic spatial-domain architectures constructed from a single data source, limiting their ability to accurately capture complex precipitation patterns and sharply defined meteorological boundaries. To address these limitations, we propose WaveC2R, a novel wavelet-driven coarse-to-refined framework for radar retrieval. WaveC2R integrates complementary multi-source data and leverages frequency-domain decomposition to separately model low-frequency components for capturing precipitation patterns and high-frequency components for delineating sharply defined meteorological boundaries. Specifically, WaveC2R consists of two stages (i)Intensity-Boundary Decoupled Learning, which leverages wavelet decomposition and frequency-specific loss functions to separately optimize low-frequency intensity and high-frequency boundaries; and (ii)Detail-Enhanced Diffusion Refinement, which employs frequency-aware conditional priors and multi-source data to progressively enhance fine-scale precipitation structures while preserving coarse-scale meteorological consistency. Experimental results on the publicly available SEVIR dataset demonstrate that WaveC2R achieves state-of-the-art performance in satellite-based radar retrieval, particularly excelling at preserving high-intensity precipitation features and sharply defined meteorological boundaries.
Authors' comments: AAAI2026 Project's webpage at this URL:https://spring-lovely.github.io/WaveC2R/
Jeongho Min, Dongyoung Kim, Jaehyup Lee
Cross-view image retrieval, particularly street-to-satellite matching, is a critical task for applications such as autonomous navigation, urban planning, and localization in GPS-denied environments. However, existing approaches often require supervised training on curated datasets and rely on panoramic or UAV-based images, which limits real-world deployment. In this paper, we present a simple yet effective cross-view image retrieval framework that leverages a pretrained vision encoder and a large language model (LLM), requiring no additional training. Given a monocular street-view image, our method extracts geographic cues through web-based image search and LLM-based location inference, generates a satellite query via geocoding API, and retrieves matching tiles using a pretrained vision encoder (e.g., DINOv2) with PCA-based whitening feature refinement. Despite using no ground-truth supervision or finetuning, our proposed method outperforms prior learning-based approaches on the benchmark dataset under zero-shot settings. Moreover, our pipeline enables automatic construction of semantically aligned street-to-satellite datasets, which is offering a scalable and cost-efficient alternative to manual annotation. All source codes will be made publicly available at https://jeonghomin.github.io/street2orbit.github.io/.
Authors' comments: Accepted to WACV 2026, 10pages, 4 figures
Wenda Wei, Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Lixin Su, Shuaiqiang Wang, Dawei Yin, Maarten de Rijke et al.
Retrieval-augmented generation (RAG) has proven to be effective in mitigating hallucinations in large language models, yet its effectiveness remains limited in complex, multi-step reasoning scenarios.Recent efforts have incorporated search-based interactions into RAG, enabling iterative reasoning with real-time retrieval. Most approaches rely on outcome-based supervision, offering no explicit guidance for intermediate steps. This often leads to reward hacking and degraded response quality. We propose Bi-RAR, a novel retrieval-augmented reasoning framework that evaluates each intermediate step jointly in both forward and backward directions. To assess the information completeness of each step, we introduce a bidirectional information distance grounded in Kolmogorov complexity, approximated via language model generation probabilities. This quantification measures both how far the current reasoning is from the answer and how well it addresses the question. To optimize reasoning under these bidirectional signals, we adopt a multi-objective reinforcement learning framework with a cascading reward structure that emphasizes early trajectory alignment. Empirical results on seven question answering benchmarks demonstrate that Bi-RAR surpasses previous methods and enables efficient interaction and reasoning with the search engine during training and inference.
Abhimanyu Borthakur, Jack Eden Hirschman, Sergio Carbajo
Ultrashort laser pulses enable attosecond-scale measurements and drive breakthroughs across science and technology, but their routine use hinges on reliable pulse characterization. Frequency-Resolved Optical Gating (FROG) is a leading solution, forming a spectrogram by scanning the delay between two pulse replicas and recording the nonlinear signal spectrum. In online settings, however, dense delay-frequency scans are costly or impractical-especially for long pulses, wavelength regimes with limited spectrometer coverage (e.g., UV), or hardware with coarse resolution, yielding severely undersampled FROG traces. Existing reconstruction methods struggle in this regime-iterative algorithms are computationally heavy, convolutional networks blur fine structure, and sequence models are unstable when inputs are discontinuous or sparse. We present a generative diffusion framework tailored to recover ultrafast pulse intensity and phase from incomplete FROG measurements. Our model infers missing spectro-temporal content with high fidelity, enabling accurate retrieval from aggressively downsampled inputs. On a simulated benchmark of FROG-pulse pairs, the diffusion approach surpasses strong CNN and Seq2Seq baselines in accuracy and stability while remaining efficient enough for near real-time deployment.
Authors' comments: Presented at: Optica Nonlinear Optics Topical Meeting 2025, Honolulu, Hawaii Optica Frontiers and Optics and Laser Science 2025, Denver, Colorado Winner: 2025 Emil Wolf Outstanding Student Paper Competition
Hadi Ghaemi, Lauren Snyder, Markus Stocker
Digital libraries for research, such as the ACM Digital Library or Semantic Scholar, do not enable the machine-supported, efficient reuse of scientific knowledge (e.g., in synthesis research). This is because these libraries are based on document-centric models with narrative text knowledge expressions that require manual or semi-automated knowledge extraction, structuring, and organization. We present ORKG reborn, an emerging digital library that supports finding, accessing, and reusing accurate, fine-grained, and reproducible machine-readable expressions of scientific knowledge that relate scientific statements and their supporting evidence in terms of data and code. The rich expressions of scientific knowledge are published as reborn (born-reusable) articles and provide novel possibilities for scientific knowledge retrieval, for instance by statistical methods, software packages, variables, or data matching specific constraints. We describe the proposed system and demonstrate its practical viability and potential for information retrieval in contrast to state-of-the-art digital libraries and document-centric scholarly communication using several published articles in research fields ranging from computer science to soil science. Our work underscores the enormous potential of scientific knowledge databases and a viable approach to their construction.
Seung Hwan Cho, Yujin Yang, Danik Baeck, Minjoo Kim, Young-Min Kim, Heejung Lee, Sangjin Park
Recommender systems (RS) are currently being studied to mitigate limitations during cold-start conditions by leveraging modality information or introducing Agent concepts based on the exceptional reasoning capabilities of Large Language Models (LLMs). Meanwhile, food and beverage recommender systems have traditionally used knowledge graph and ontology concepts due to the domain's unique data attributes and relationship characteristics. On this background, we propose MARC, a multimodal and multi-task cocktail recommender system based on Agentic Retrieval-Augmented Generation (RAG) utilizing graph database under cold-start conditions. The proposed system generates high-quality, contextually appropriate answers through two core processes: a task recognition router and a reflection process. The graph database was constructed by processing cocktail data from Kaggle, and its effectiveness was evaluated using 200 manually crafted questions. The evaluation used both LLM-as-a-judge and human evaluation to demonstrate that answers generated via the graph database outperformed those from a simple vector database in terms of quality. The code is available at https://github.com/diddbwls/cocktail_rec_agentrag
Authors' comments: 13 pages, 2 figures, Accepted at RDGENAI at CIKM 2025 workshop
Chuanfu Xiao, Jiaxin Zeng
Tensors, especially higher-order tensors, are typically represented in low-rank formats to preserve the main information of the high-dimensional data while saving memory space. In practice, only a small fraction elements in high-dimensional data are of interest, such as the $k$ largest or smallest elements. Thus, retrieving the $k$ largest/smallest elements from a low-rank tensor is a fundamental and important task in a wide variety of applications. In this paper, we first model the top-$k$ elements retrieval problem to a continuous constrained optimization problem. To address the equivalent optimization problem, we develop a block-alternating iterative algorithm that decomposes the original problem into a sequence of small-scale subproblems. Leveraging the separable summation structure of the objective function, a heuristic algorithm is proposed to solve these subproblems in an alternating manner. Numerical experiments with tensors from synthetic and real-world applications demonstrate that the proposed algorithm outperforms existing methods in terms of accuracy and stability.
Jinesh Patel, Arpit Malhotra, Ajay Pande, Prateek Caire
Retrieval-Augmented Generation (RAG) based chatbots are not only useful for information retrieval through questionanswering but also for making complex decisions based on injected private data.we present a survey on how much search time can be saved when retrieving complex information within an organization called "X Systems"(a stealth mode company) by using a RAG-based chatbot compared to traditional search methods. We compare the information retrieval time using standard search techniques versus the RAG-based chatbot for the same queries. Our results conclude that RAG-based chatbots not only save time in information retrieval but also optimize the search process effectively. This survey was conducted with a sample of 105 employees across departments, average time spending on information retrieval per query was taken as metric. Comparison shows us, there are average 80-95% improvement on search when use RAG based chatbot than using standard search.
Shreyas Rajesh, Pavan Holur, Chenda Duan, David Chong, Vwani Roychowdhury
Large Language Models (LLMs) face fundamental challenges in long-context reasoning: many documents exceed their finite context windows, while performance on texts that do fit degrades with sequence length, necessitating their augmentation with external memory frameworks. Current solutions, which have evolved from retrieval using semantic embeddings to more sophisticated structured knowledge graphs representations for improved sense-making and associativity, are tailored for fact-based retrieval and fail to build the space-time-anchored narrative representations required for tracking entities through episodic events. To bridge this gap, we propose the \textbf{Generative Semantic Workspace} (GSW), a neuro-inspired generative memory framework that builds structured, interpretable representations of evolving situations, enabling LLMs to reason over evolving roles, actions, and spatiotemporal contexts. Our framework comprises an \textit{Operator}, which maps incoming observations to intermediate semantic structures, and a \textit{Reconciler}, which integrates these into a persistent workspace that enforces temporal, spatial, and logical coherence. On the Episodic Memory Benchmark (EpBench) \cite{huet_episodic_2025} comprising corpora ranging from 100k to 1M tokens in length, GSW outperforms existing RAG based baselines by up to \textbf{20\%}. Furthermore, GSW is highly efficient, reducing query-time context tokens by \textbf{51\%} compared to the next most token-efficient baseline, reducing inference time costs considerably. More broadly, GSW offers a concrete blueprint for endowing LLMs with human-like episodic memory, paving the way for more capable agents that can reason over long horizons.
Authors' comments: AAAI 2026 Oral
Supriti Vijay, Aman Priyanshu, Anu Vellore, Baturay Saglam, Amin Karbasi
Effective information retrieval requires reasoning over partial evidence and refining strategies as information emerges. Yet current approaches fall short: neural retrievers lack reasoning capabilities, large language models (LLMs) provide semantic depth but at prohibitive cost, and query rewriting or decomposition limits improvement to static transformations. As a result, existing methods fail to capture the iterative dynamics of exploration, feedback, and revision that complex user queries demand. We introduce Orion, a training framework that enables compact models (350M-1.2B parameters) to perform iterative retrieval through learned search strategies. Orion combines: (1) synthetic trajectory generation and supervised fine-tuning to encourage diverse exploration patterns in models, (2) reinforcement learning (RL) that rewards effective query refinement and backtracking behaviors, and (3) inference-time beam search algorithms that exploit the self-reflection capabilities learned during RL. Despite using only 3% of the training data available, our 1.2B model achieves 77.6% success on SciFact (vs. 72.6% for prior retrievers), 25.2% on BRIGHT (vs. 22.1%), 63.2% on NFCorpus (vs. 57.8%), and remains competitive on FEVER, HotpotQA, and MSMarco. It outperforms retrievers up to 200-400x larger on five of six benchmarks. These findings suggest that retrieval performance can emerge from learned strategies, not just model scale, when models are trained to search, reflect, and revise.
Authors' comments: 37 images, 7 figures, and 15 tables
Yining Lu, Wenyi Tang, Max Johnson, Taeho Jung, Meng Jiang
Existing retrieval-augmented generation (RAG) systems typically use a centralized architecture, causing a high cost of data collection, integration, and management, as well as privacy concerns. There is a great need for a decentralized RAG system that enables foundation models to utilize information directly from data owners who maintain full control over their sources. However, decentralization brings a challenge: the numerous independent data sources vary significantly in reliability, which can diminish retrieval accuracy and response quality. To address this, our decentralized RAG system has a novel reliability scoring mechanism that dynamically evaluates each source based on the quality of responses it contributes to generate and prioritizes high-quality sources during retrieval. To ensure transparency and trust, the scoring process is securely managed through blockchain-based smart contracts, creating verifiable and tamper-proof reliability records without relying on a central authority. We evaluate our decentralized system with two Llama models (3B and 8B) in two simulated environments where six data sources have different levels of reliability. Our system achieves a +10.7\% performance improvement over its centralized counterpart in the real world-like unreliable data environments. Notably, it approaches the upper-bound performance of centralized systems under ideally reliable data environments. The decentralized infrastructure enables secure and trustworthy scoring management, achieving approximately 56\% marginal cost savings through batched update operations. Our code and system are open-sourced at github.com/yining610/Reliable-dRAG.
Xinran Li, Xiujuan Xu, Jiaqi Qiao, Yu Liu
Emotion Recognition in Conversation (ERC) is a crucial task for understanding
human emotions and enabling natural human-computer interaction. Although Large
Language Models (LLMs) have recently shown great potential in this field, their
ability to capture the intrinsic connections between explicit and implicit
emotions remains limited. We propose a novel ERC training framework, PRC-Emo,
which integrates Prompt engineering, demonstration Retrieval, and Curriculum
learning, with the goal of exploring whether LLMs can effectively perceive
emotions in conversational contexts. Specifically, we design emotion-sensitive
prompt templates based on both explicit and implicit emotional cues to better
guide the model in understanding the speaker's psychological states. We
construct the first dedicated demonstration retrieval repository for ERC, which
includes training samples from widely used datasets, as well as high-quality
dialogue examples generated by LLMs and manually verified. Moreover, we
introduce a curriculum learning strategy into the LoRA fine-tuning process,
incorporating weighted emotional shifts between same-speaker and
different-speaker utterances to assign difficulty levels to dialogue samples,
which are then organized in an easy-to-hard training sequence. Experimental
results on two benchmark datasets-- IEMOCAP and MELD --show that our method
achieves new state-of-the-art (SOTA) performance, demonstrating the
effectiveness and generalizability of our approach in improving LLM-based
emotional understanding.
Authors' comments: Accepted at AAAI 2026
Hyunjae Kim, Jiwoong Sohn, Aidan Gilson, Nicholas Cochran-Caggiano, Serina Applebaum, Heeju Jin, Seihee Park, Yujin Park et al.
Large language models (LLMs) are transforming the landscape of medicine, yet
two fundamental challenges persist: keeping up with rapidly evolving medical
knowledge and providing verifiable, evidence-grounded reasoning.
Retrieval-augmented generation (RAG) has been widely adopted to address these
limitations by supplementing model outputs with retrieved evidence. However,
whether RAG reliably achieves these goals remains unclear. Here, we present the
most comprehensive expert evaluation of RAG in medicine to date. Eighteen
medical experts contributed a total of 80,502 annotations, assessing 800 model
outputs generated by GPT-4o and Llama-3.1-8B across 200 real-world patient and
USMLE-style queries. We systematically decomposed the RAG pipeline into three
components: (i) evidence retrieval (relevance of retrieved passages), (ii)
evidence selection (accuracy of evidence usage), and (iii) response generation
(factuality and completeness of outputs). Contrary to expectation, standard RAG
often degraded performance: only 22% of top-16 passages were relevant, evidence
selection remained weak (precision 41-43%, recall 27-49%), and factuality and
completeness dropped by up to 6% and 5%, respectively, compared with non-RAG
variants. Retrieval and evidence selection remain key failure points for the
model, contributing to the overall performance drop. We further show that
simple yet effective strategies, including evidence filtering and query
reformulation, substantially mitigate these issues, improving performance on
MedMCQA and MedXpertQA by up to 12% and 8.2%, respectively. These findings call
for re-examining RAG's role in medicine and highlight the importance of
stage-aware evaluation and deliberate system design for reliable medical LLM
applications.
Authors' comments: 34 pages, 6 figures
Adrian F. Chlebowski, Lukasz A. Sterczewski, Jaroslaw Sotor
Semiconductor lasers merge coherent light emission with photodetection and, owing to third-order nonlinearities in their active region, function as sensitive room temperature two-photon absorption (TPA) detectors. Here, we leverage these capabilities offered by a commercially available InGaAsP semiconductor laser diode with an integrated InGaAs monitor photodiode to measure the duration of femtosecond pulses at telecom wavelengths. The nonlinear and linear signals obtained from both detectors enable the retrieval of pulse duration with femtosecond accuracy without moving parts. This capability is demonstrated down to 2 $\mu$W average optical power for a 250 MHz repetition rate laser.
Jian Zhang, Junyi Guo, Junyi Yuan, Huanda Lu, Yanlin Zhou, Fangyu Wu, Qiufeng Wang, Dongming Lu
Cross-modal retrieval is essential for interpreting cultural heritage data, but its effectiveness is often limited by incomplete or inconsistent textual descriptions, caused by historical data loss and the high cost of expert annotation. While large language models (LLMs) offer a promising solution by enriching textual descriptions, their outputs frequently suffer from hallucinations or miss visually grounded details. To address these challenges, we propose $C^3$, a data augmentation framework that enhances cross-modal retrieval performance by improving the completeness and consistency of LLM-generated descriptions. $C^3$ introduces a completeness evaluation module to assess semantic coverage using both visual cues and language-model outputs. Furthermore, to mitigate factual inconsistencies, we formulate a Markov Decision Process to supervise Chain-of-Thought reasoning, guiding consistency evaluation through adaptive query control. Experiments on the cultural heritage datasets CulTi and TimeTravel, as well as on general benchmarks MSCOCO and Flickr30K, demonstrate that $C^3$ achieves state-of-the-art performance in both fine-tuned and zero-shot settings.
Yiwen Zhang, Keyan Ding, Yihang Wu, Xiang Zhuang, Yi Yang, Qiang Zhang, Huajun Chen
Retrieving molecular structures from tandem mass spectra is a crucial step in
rapid compound identification. Existing retrieval methods, such as traditional
mass spectral library matching, suffer from limited spectral library coverage,
while recent cross-modal representation learning frameworks often encounter
modality misalignment, resulting in suboptimal retrieval accuracy and
generalization. To address these limitations, we propose GLMR, a Generative
Language Model-based Retrieval framework that mitigates the cross-modal
misalignment through a two-stage process. In the pre-retrieval stage, a
contrastive learning-based model identifies top candidate molecules as
contextual priors for the input mass spectrum. In the generative retrieval
stage, these candidate molecules are integrated with the input mass spectrum to
guide a generative model in producing refined molecular structures, which are
then used to re-rank the candidates based on molecular similarity. Experiments
on both MassSpecGym and the proposed MassRET-20k dataset demonstrate that GLMR
significantly outperforms existing methods, achieving over 40% improvement in
top-1 accuracy and exhibiting strong generalizability.
Authors' comments: Accepted by AAAI 2026