Abhimanyu Borthakur, Jack Eden Hirschman, Sergio Carbajo
Ultrashort laser pulses enable attosecond-scale measurements and drive breakthroughs across science and technology, but their routine use hinges on reliable pulse characterization. Frequency-Resolved Optical Gating (FROG) is a leading solution, forming a spectrogram by scanning the delay between two pulse replicas and recording the nonlinear signal spectrum. In online settings, however, dense delay-frequency scans are costly or impractical-especially for long pulses, wavelength regimes with limited spectrometer coverage (e.g., UV), or hardware with coarse resolution, yielding severely undersampled FROG traces. Existing reconstruction methods struggle in this regime-iterative algorithms are computationally heavy, convolutional networks blur fine structure, and sequence models are unstable when inputs are discontinuous or sparse. We present a generative diffusion framework tailored to recover ultrafast pulse intensity and phase from incomplete FROG measurements. Our model infers missing spectro-temporal content with high fidelity, enabling accurate retrieval from aggressively downsampled inputs. On a simulated benchmark of FROG-pulse pairs, the diffusion approach surpasses strong CNN and Seq2Seq baselines in accuracy and stability while remaining efficient enough for near real-time deployment.
Authors' comments: Presented at: Optica Nonlinear Optics Topical Meeting 2025, Honolulu, Hawaii Optica Frontiers and Optics and Laser Science 2025, Denver, Colorado Winner: 2025 Emil Wolf Outstanding Student Paper Competition
Hadi Ghaemi, Lauren Snyder, Markus Stocker
Digital libraries for research, such as the ACM Digital Library or Semantic Scholar, do not enable the machine-supported, efficient reuse of scientific knowledge (e.g., in synthesis research). This is because these libraries are based on document-centric models with narrative text knowledge expressions that require manual or semi-automated knowledge extraction, structuring, and organization. We present ORKG reborn, an emerging digital library that supports finding, accessing, and reusing accurate, fine-grained, and reproducible machine-readable expressions of scientific knowledge that relate scientific statements and their supporting evidence in terms of data and code. The rich expressions of scientific knowledge are published as reborn (born-reusable) articles and provide novel possibilities for scientific knowledge retrieval, for instance by statistical methods, software packages, variables, or data matching specific constraints. We describe the proposed system and demonstrate its practical viability and potential for information retrieval in contrast to state-of-the-art digital libraries and document-centric scholarly communication using several published articles in research fields ranging from computer science to soil science. Our work underscores the enormous potential of scientific knowledge databases and a viable approach to their construction.
Seung Hwan Cho, Yujin Yang, Danik Baeck, Minjoo Kim, Young-Min Kim, Heejung Lee, Sangjin Park
Recommender systems (RS) are currently being studied to mitigate limitations during cold-start conditions by leveraging modality information or introducing Agent concepts based on the exceptional reasoning capabilities of Large Language Models (LLMs). Meanwhile, food and beverage recommender systems have traditionally used knowledge graph and ontology concepts due to the domain's unique data attributes and relationship characteristics. On this background, we propose MARC, a multimodal and multi-task cocktail recommender system based on Agentic Retrieval-Augmented Generation (RAG) utilizing graph database under cold-start conditions. The proposed system generates high-quality, contextually appropriate answers through two core processes: a task recognition router and a reflection process. The graph database was constructed by processing cocktail data from Kaggle, and its effectiveness was evaluated using 200 manually crafted questions. The evaluation used both LLM-as-a-judge and human evaluation to demonstrate that answers generated via the graph database outperformed those from a simple vector database in terms of quality. The code is available at https://github.com/diddbwls/cocktail_rec_agentrag
Authors' comments: 13 pages, 2 figures, Accepted at RDGENAI at CIKM 2025 workshop
Chuanfu Xiao, Jiaxin Zeng
Tensors, especially higher-order tensors, are typically represented in low-rank formats to preserve the main information of the high-dimensional data while saving memory space. In practice, only a small fraction elements in high-dimensional data are of interest, such as the $k$ largest or smallest elements. Thus, retrieving the $k$ largest/smallest elements from a low-rank tensor is a fundamental and important task in a wide variety of applications. In this paper, we first model the top-$k$ elements retrieval problem to a continuous constrained optimization problem. To address the equivalent optimization problem, we develop a block-alternating iterative algorithm that decomposes the original problem into a sequence of small-scale subproblems. Leveraging the separable summation structure of the objective function, a heuristic algorithm is proposed to solve these subproblems in an alternating manner. Numerical experiments with tensors from synthetic and real-world applications demonstrate that the proposed algorithm outperforms existing methods in terms of accuracy and stability.
Jinesh Patel, Arpit Malhotra, Ajay Pande, Prateek Caire
Retrieval-Augmented Generation (RAG) based chatbots are not only useful for information retrieval through questionanswering but also for making complex decisions based on injected private data.we present a survey on how much search time can be saved when retrieving complex information within an organization called "X Systems"(a stealth mode company) by using a RAG-based chatbot compared to traditional search methods. We compare the information retrieval time using standard search techniques versus the RAG-based chatbot for the same queries. Our results conclude that RAG-based chatbots not only save time in information retrieval but also optimize the search process effectively. This survey was conducted with a sample of 105 employees across departments, average time spending on information retrieval per query was taken as metric. Comparison shows us, there are average 80-95% improvement on search when use RAG based chatbot than using standard search.
Shreyas Rajesh, Pavan Holur, Chenda Duan, David Chong, Vwani Roychowdhury
Large Language Models (LLMs) face fundamental challenges in long-context reasoning: many documents exceed their finite context windows, while performance on texts that do fit degrades with sequence length, necessitating their augmentation with external memory frameworks. Current solutions, which have evolved from retrieval using semantic embeddings to more sophisticated structured knowledge graphs representations for improved sense-making and associativity, are tailored for fact-based retrieval and fail to build the space-time-anchored narrative representations required for tracking entities through episodic events. To bridge this gap, we propose the \textbf{Generative Semantic Workspace} (GSW), a neuro-inspired generative memory framework that builds structured, interpretable representations of evolving situations, enabling LLMs to reason over evolving roles, actions, and spatiotemporal contexts. Our framework comprises an \textit{Operator}, which maps incoming observations to intermediate semantic structures, and a \textit{Reconciler}, which integrates these into a persistent workspace that enforces temporal, spatial, and logical coherence. On the Episodic Memory Benchmark (EpBench) \cite{huet_episodic_2025} comprising corpora ranging from 100k to 1M tokens in length, GSW outperforms existing RAG based baselines by up to \textbf{20\%}. Furthermore, GSW is highly efficient, reducing query-time context tokens by \textbf{51\%} compared to the next most token-efficient baseline, reducing inference time costs considerably. More broadly, GSW offers a concrete blueprint for endowing LLMs with human-like episodic memory, paving the way for more capable agents that can reason over long horizons.
Authors' comments: AAAI 2026 Oral
Supriti Vijay, Aman Priyanshu, Anu Vellore, Baturay Saglam, Amin Karbasi
Effective information retrieval requires reasoning over partial evidence and refining strategies as information emerges. Yet current approaches fall short: neural retrievers lack reasoning capabilities, large language models (LLMs) provide semantic depth but at prohibitive cost, and query rewriting or decomposition limits improvement to static transformations. As a result, existing methods fail to capture the iterative dynamics of exploration, feedback, and revision that complex user queries demand. We introduce Orion, a training framework that enables compact models (350M-1.2B parameters) to perform iterative retrieval through learned search strategies. Orion combines: (1) synthetic trajectory generation and supervised fine-tuning to encourage diverse exploration patterns in models, (2) reinforcement learning (RL) that rewards effective query refinement and backtracking behaviors, and (3) inference-time beam search algorithms that exploit the self-reflection capabilities learned during RL. Despite using only 3% of the training data available, our 1.2B model achieves 77.6% success on SciFact (vs. 72.6% for prior retrievers), 25.2% on BRIGHT (vs. 22.1%), 63.2% on NFCorpus (vs. 57.8%), and remains competitive on FEVER, HotpotQA, and MSMarco. It outperforms retrievers up to 200-400x larger on five of six benchmarks. These findings suggest that retrieval performance can emerge from learned strategies, not just model scale, when models are trained to search, reflect, and revise.
Authors' comments: 37 images, 7 figures, and 15 tables
Yining Lu, Wenyi Tang, Max Johnson, Taeho Jung, Meng Jiang
Existing retrieval-augmented generation (RAG) systems typically use a centralized architecture, causing a high cost of data collection, integration, and management, as well as privacy concerns. There is a great need for a decentralized RAG system that enables foundation models to utilize information directly from data owners who maintain full control over their sources. However, decentralization brings a challenge: the numerous independent data sources vary significantly in reliability, which can diminish retrieval accuracy and response quality. To address this, our decentralized RAG system has a novel reliability scoring mechanism that dynamically evaluates each source based on the quality of responses it contributes to generate and prioritizes high-quality sources during retrieval. To ensure transparency and trust, the scoring process is securely managed through blockchain-based smart contracts, creating verifiable and tamper-proof reliability records without relying on a central authority. We evaluate our decentralized system with two Llama models (3B and 8B) in two simulated environments where six data sources have different levels of reliability. Our system achieves a +10.7\% performance improvement over its centralized counterpart in the real world-like unreliable data environments. Notably, it approaches the upper-bound performance of centralized systems under ideally reliable data environments. The decentralized infrastructure enables secure and trustworthy scoring management, achieving approximately 56\% marginal cost savings through batched update operations. Our code and system are open-sourced at github.com/yining610/Reliable-dRAG.
Xinran Li, Xiujuan Xu, Jiaqi Qiao, Yu Liu
Emotion Recognition in Conversation (ERC) is a crucial task for understanding
human emotions and enabling natural human-computer interaction. Although Large
Language Models (LLMs) have recently shown great potential in this field, their
ability to capture the intrinsic connections between explicit and implicit
emotions remains limited. We propose a novel ERC training framework, PRC-Emo,
which integrates Prompt engineering, demonstration Retrieval, and Curriculum
learning, with the goal of exploring whether LLMs can effectively perceive
emotions in conversational contexts. Specifically, we design emotion-sensitive
prompt templates based on both explicit and implicit emotional cues to better
guide the model in understanding the speaker's psychological states. We
construct the first dedicated demonstration retrieval repository for ERC, which
includes training samples from widely used datasets, as well as high-quality
dialogue examples generated by LLMs and manually verified. Moreover, we
introduce a curriculum learning strategy into the LoRA fine-tuning process,
incorporating weighted emotional shifts between same-speaker and
different-speaker utterances to assign difficulty levels to dialogue samples,
which are then organized in an easy-to-hard training sequence. Experimental
results on two benchmark datasets-- IEMOCAP and MELD --show that our method
achieves new state-of-the-art (SOTA) performance, demonstrating the
effectiveness and generalizability of our approach in improving LLM-based
emotional understanding.
Authors' comments: Accepted at AAAI 2026
Hyunjae Kim, Jiwoong Sohn, Aidan Gilson, Nicholas Cochran-Caggiano, Serina Applebaum, Heeju Jin, Seihee Park, Yujin Park et al.
Large language models (LLMs) are transforming the landscape of medicine, yet
two fundamental challenges persist: keeping up with rapidly evolving medical
knowledge and providing verifiable, evidence-grounded reasoning.
Retrieval-augmented generation (RAG) has been widely adopted to address these
limitations by supplementing model outputs with retrieved evidence. However,
whether RAG reliably achieves these goals remains unclear. Here, we present the
most comprehensive expert evaluation of RAG in medicine to date. Eighteen
medical experts contributed a total of 80,502 annotations, assessing 800 model
outputs generated by GPT-4o and Llama-3.1-8B across 200 real-world patient and
USMLE-style queries. We systematically decomposed the RAG pipeline into three
components: (i) evidence retrieval (relevance of retrieved passages), (ii)
evidence selection (accuracy of evidence usage), and (iii) response generation
(factuality and completeness of outputs). Contrary to expectation, standard RAG
often degraded performance: only 22% of top-16 passages were relevant, evidence
selection remained weak (precision 41-43%, recall 27-49%), and factuality and
completeness dropped by up to 6% and 5%, respectively, compared with non-RAG
variants. Retrieval and evidence selection remain key failure points for the
model, contributing to the overall performance drop. We further show that
simple yet effective strategies, including evidence filtering and query
reformulation, substantially mitigate these issues, improving performance on
MedMCQA and MedXpertQA by up to 12% and 8.2%, respectively. These findings call
for re-examining RAG's role in medicine and highlight the importance of
stage-aware evaluation and deliberate system design for reliable medical LLM
applications.
Authors' comments: 34 pages, 6 figures
Adrian F. Chlebowski, Lukasz A. Sterczewski, Jaroslaw Sotor
Semiconductor lasers merge coherent light emission with photodetection and, owing to third-order nonlinearities in their active region, function as sensitive room temperature two-photon absorption (TPA) detectors. Here, we leverage these capabilities offered by a commercially available InGaAsP semiconductor laser diode with an integrated InGaAs monitor photodiode to measure the duration of femtosecond pulses at telecom wavelengths. The nonlinear and linear signals obtained from both detectors enable the retrieval of pulse duration with femtosecond accuracy without moving parts. This capability is demonstrated down to 2 $\mu$W average optical power for a 250 MHz repetition rate laser.
Jian Zhang, Junyi Guo, Junyi Yuan, Huanda Lu, Yanlin Zhou, Fangyu Wu, Qiufeng Wang, Dongming Lu
Cross-modal retrieval is essential for interpreting cultural heritage data, but its effectiveness is often limited by incomplete or inconsistent textual descriptions, caused by historical data loss and the high cost of expert annotation. While large language models (LLMs) offer a promising solution by enriching textual descriptions, their outputs frequently suffer from hallucinations or miss visually grounded details. To address these challenges, we propose $C^3$, a data augmentation framework that enhances cross-modal retrieval performance by improving the completeness and consistency of LLM-generated descriptions. $C^3$ introduces a completeness evaluation module to assess semantic coverage using both visual cues and language-model outputs. Furthermore, to mitigate factual inconsistencies, we formulate a Markov Decision Process to supervise Chain-of-Thought reasoning, guiding consistency evaluation through adaptive query control. Experiments on the cultural heritage datasets CulTi and TimeTravel, as well as on general benchmarks MSCOCO and Flickr30K, demonstrate that $C^3$ achieves state-of-the-art performance in both fine-tuned and zero-shot settings.
Yiwen Zhang, Keyan Ding, Yihang Wu, Xiang Zhuang, Yi Yang, Qiang Zhang, Huajun Chen
Retrieving molecular structures from tandem mass spectra is a crucial step in
rapid compound identification. Existing retrieval methods, such as traditional
mass spectral library matching, suffer from limited spectral library coverage,
while recent cross-modal representation learning frameworks often encounter
modality misalignment, resulting in suboptimal retrieval accuracy and
generalization. To address these limitations, we propose GLMR, a Generative
Language Model-based Retrieval framework that mitigates the cross-modal
misalignment through a two-stage process. In the pre-retrieval stage, a
contrastive learning-based model identifies top candidate molecules as
contextual priors for the input mass spectrum. In the generative retrieval
stage, these candidate molecules are integrated with the input mass spectrum to
guide a generative model in producing refined molecular structures, which are
then used to re-rank the candidates based on molecular similarity. Experiments
on both MassSpecGym and the proposed MassRET-20k dataset demonstrate that GLMR
significantly outperforms existing methods, achieving over 40% improvement in
top-1 accuracy and exhibiting strong generalizability.
Authors' comments: Accepted by AAAI 2026
Shahram Najam Syed, Yatharth Ahuja, Arthur Jakobsson, Jeff Ichnowski
Vision-Language-Action models such as OpenVLA show impressive zero-shot
generalization across robotic manipulation tasks but often fail to adapt
efficiently to new deployment environments. In many real-world applications,
consistent high performance on a limited set of tasks is more important than
broad generalization. We propose ExpReS-VLA, a method for specializing
pre-trained VLA models through experience replay and retrieval while preventing
catastrophic forgetting. ExpReS-VLA stores compact feature representations from
the frozen vision backbone instead of raw image-action pairs, reducing memory
usage by approximately 97 percent. During deployment, relevant past experiences
are retrieved using cosine similarity and used to guide adaptation, while
prioritized experience replay emphasizes successful trajectories. We also
introduce Thresholded Hybrid Contrastive Loss, which enables learning from both
successful and failed attempts. On the LIBERO simulation benchmark, ExpReS-VLA
improves success rates from 82.6 to 93.1 percent on spatial reasoning tasks and
from 61 to 72.3 percent on long-horizon tasks. On physical robot experiments
with five manipulation tasks, it reaches 98 percent success on both seen and
unseen settings, compared to 84.7 and 32 percent for naive fine-tuning.
Adaptation takes 31 seconds using 12 demonstrations on a single RTX 5090 GPU,
making the approach practical for real robot deployment.
Authors' comments: 10 pages, 5 figures, submitted to ICRA 2026. Equal contribution by
first two authors
Atsushi Miki, Toshiyasu Matsushima
Private information retrieval (PIR) is a mechanism for efficiently
downloading messages while keeping the index of the desired message secret from
the servers. PIR schemes have been extended to various scenarios with
adversarial servers: PIR schemes where some servers are unresponsive or return
noisy responses are called robust PIR and Byzantine PIR, respectively; PIR
schemes where some servers collude to reveal the index are called colluding
PIR. The information-theoretic upper bound on the download efficiency of these
PIR schemes has been proved in previous studies. However, systematic ways to
construct PIR schemes that achieve the upper bound are not known. In order to
construct a capacity-achieving PIR schemes systematically, it is necessary to
clarify the conditions that the queries should satisfy. This paper proves the
necessary and sufficient conditions for capacity-achieving PIR schemes.
Authors' comments: 17 pages
Guihang Hong, Tao Ouyang, Kongyange Zhao, Zhi Zhou, Xu Chen
Motivated by the imperative for real-time responsiveness and data privacy
preservation, large language models (LLMs) are increasingly deployed on
resource-constrained edge devices to enable localized inference. To improve
output quality, retrieval-augmented generation (RAG) is an efficient technique
that seamlessly integrates local data into LLMs. However, existing edge
computing paradigms primarily focus on single-node optimization, neglecting
opportunities to holistically exploit distributed data and heterogeneous
resources through cross-node collaboration. To bridge this gap, we propose
CoEdge-RAG, a hierarchical scheduling framework for retrieval-augmented LLMs in
collaborative edge computing. In general, privacy constraints preclude accurate
a priori acquisition of heterogeneous data distributions across edge nodes,
directly impeding RAG performance optimization. Thus, we first design an online
query identification mechanism using proximal policy optimization (PPO), which
autonomously infers query semantics and establishes cross-domain knowledge
associations in an online manner. Second, we devise a dynamic inter-node
scheduling strategy that balances workloads across heterogeneous edge nodes by
synergizing historical performance analytics with real-time resource
thresholds. Third, we develop an intra-node scheduler based on online convex
optimization, adaptively allocating query processing ratios and memory
resources to optimize the latency-quality trade-off under fluctuating assigned
loads. Comprehensive evaluations across diverse QA benchmarks demonstrate that
our proposed method significantly boosts the performance of collaborative
retrieval-augmented LLMs, achieving performance gains of 4.23\% to 91.39\% over
baseline methods across all tasks.
Authors' comments: Accepted by RTSS 2025 (Real-Time Systems Symposium, 2025)
Rui Yang, Matthew Yu Heng Wong, Huitao Li, Xin Li, Wentao Zhu, Jingchi Liao, Kunyu Yu, Jonathan Chong Kai Liew et al.
The rapid growth of medical knowledge and increasing complexity of clinical practice pose challenges. In this context, large language models (LLMs) have demonstrated value; however, inherent limitations remain. Retrieval-augmented generation (RAG) technologies show potential to enhance their clinical applicability. This study reviewed RAG applications in medicine. We found that research primarily relied on publicly available data, with limited application in private data. For retrieval, approaches commonly relied on English-centric embedding models, while LLMs were mostly generic, with limited use of medical-specific LLMs. For evaluation, automated metrics evaluated generation quality and task performance, whereas human evaluation focused on accuracy, completeness, relevance, and fluency, with insufficient attention to bias and safety. RAG applications were concentrated on question answering, report generation, text summarization, and information extraction. Overall, medical RAG remains at an early stage, requiring advances in clinical validation, cross-linguistic adaptation, and support for low-resource settings to enable trustworthy and responsible global use.
Janet Jenq, Hongda Shen
Multimodal product retrieval systems in e-commerce platforms rely on effectively combining visual and textual signals to improve search relevance and user experience. However, vision-language models such as CLIP are vulnerable to typographic attacks, where misleading or irrelevant text embedded in images skews model predictions. In this work, we propose a novel method that reverses the logic of typographic attacks by rendering relevant textual content (e.g., titles, descriptions) directly onto product images to perform vision-text compression, thereby strengthening image-text alignment and boosting multimodal product retrieval performance. We evaluate our method on three vertical-specific e-commerce datasets (sneakers, handbags, and trading cards) using six state-of-the-art vision foundation models. Our experiments demonstrate consistent improvements in unimodal and multimodal retrieval accuracy across categories and model families. Our findings suggest that visually rendering product metadata is a simple yet effective enhancement for zero-shot multimodal retrieval in e-commerce applications.
Mohammed Hilel, Yannis Karmim, Jean De Bodinat, Reda Sarehane, Antoine Gillon
Digital Adoption Platforms (DAPs) have become essential tools for helping employees navigate complex enterprise software such as CRM, ERP, or HRMS systems. Companies like LemonLearning have shown how digital guidance can reduce training costs and accelerate onboarding. However, building and maintaining these interactive guides still requires extensive manual effort. Leveraging Large Language Models as virtual assistants is an appealing alternative, yet without a structured understanding of the target software, LLMs often hallucinate and produce unreliable answers. Moreover, most production-grade LLMs are black-box APIs, making fine-tuning impractical due to the lack of access to model weights. In this work, we introduce a Graph-based Retrieval-Augmented Generation framework that automatically converts enterprise web applications into state-action knowledge graphs, enabling LLMs to generate grounded and context-aware assistance. The framework was co-developed with the AI enterprise RAKAM, in collaboration with Lemon Learning. We detail the engineering pipeline that extracts and structures software interfaces, the design of the graph-based retrieval process, and the integration of our approach into production DAP workflows. Finally, we discuss scalability, robustness, and deployment lessons learned from industrial use cases.
Yichen Zhu, Feifei Feng
Robots operating in complex and uncertain environments face considerable
challenges. Advanced robotic systems often rely on extensive datasets to learn
manipulation tasks. In contrast, when humans are faced with unfamiliar tasks,
such as assembling a chair, a common approach is to learn by watching video
demonstrations. In this paper, we propose a novel method for learning robot
policies by Retrieving-from-Video (RfV), using analogies from human
demonstrations to address manipulation tasks. Our system constructs a video
bank comprising recordings of humans performing diverse daily tasks. To enrich
the knowledge from these videos, we extract mid-level information, such as
object affordance masks and hand motion trajectories, which serve as additional
inputs to enhance the robot model's learning and generalization capabilities.
We further feature a dual-component system: a video retriever that taps into an
external video bank to fetch task-relevant video based on task specification,
and a policy generator that integrates this retrieved knowledge into the
learning cycle. This approach enables robots to craft adaptive responses to
various scenarios and generalize to tasks beyond those in the training data.
Through rigorous testing in multiple simulated and real-world settings, our
system demonstrates a marked improvement in performance over conventional
robotic systems, showcasing a significant breakthrough in the field of
robotics.
Authors' comments: Accepted by IROS 2025