Rong Cheng, Jinyi Liu, Yan Zheng, Fei Ni, Jiazhen Du, Hangyu Mao, Fuzheng Zhang, Bo Wang et al.
Multi-Hop Question Answering (MHQA) tasks permeate real-world applications, posing challenges in orchestrating multi-step reasoning across diverse knowledge domains. While existing approaches have been improved with iterative retrieval, they still struggle to identify and organize dynamic knowledge. To address this, we propose DualRAG, a synergistic dual-process framework that seamlessly integrates reasoning and retrieval. DualRAG operates through two tightly coupled processes: Reasoning-augmented Querying (RaQ) and progressive Knowledge Aggregation (pKA). They work in concert: as RaQ navigates the reasoning path and generates targeted queries, pKA ensures that newly acquired knowledge is systematically integrated to support coherent reasoning. This creates a virtuous cycle of knowledge enrichment and reasoning refinement. Through targeted fine-tuning, DualRAG preserves its sophisticated reasoning and retrieval capabilities even in smaller-scale models, demonstrating its versatility and core advantages across different scales. Extensive experiments demonstrate that this dual-process approach substantially improves answer accuracy and coherence, approaching, and in some cases surpassing, the performance achieved with oracle knowledge access. These results establish DualRAG as a robust and efficient solution for complex multi-hop reasoning tasks.
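The alternation between query generation and knowledge aggregation described above can be pictured as a simple loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: `generate_queries`, `retrieve`, the toy corpus, and the stopping rule are all hypothetical stand-ins for the LLM-driven RaQ step, a real retriever, and the pKA step.

```python
from typing import Callable, Dict, List, Set


def dual_process_loop(
    question: str,
    generate_queries: Callable[[str, List[str]], List[str]],
    retrieve: Callable[[str], List[str]],
    max_steps: int = 4,
) -> List[str]:
    """Alternate between issuing queries and aggregating new knowledge."""
    knowledge: List[str] = []
    seen: Set[str] = set()
    for _ in range(max_steps):
        # Querying step: decide what is still missing given current knowledge.
        queries = generate_queries(question, knowledge)
        if not queries:  # nothing more is needed, so stop early
            break
        # Aggregation step: add only documents that have not been seen yet.
        for query in queries:
            for doc in retrieve(query):
                if doc not in seen:
                    seen.add(doc)
                    knowledge.append(doc)
    return knowledge


# Hypothetical toy corpus and query generator, for illustration only.
TOY_CORPUS: Dict[str, List[str]] = {
    "director of Film X": ["Film X was directed by Person A."],
    "birthplace of Person A": ["Person A was born in City B."],
}


def toy_retrieve(query: str) -> List[str]:
    return TOY_CORPUS.get(query, [])


def toy_generate_queries(question: str, knowledge: List[str]) -> List[str]:
    if not knowledge:
        return ["director of Film X"]
    if len(knowledge) == 1:
        return ["birthplace of Person A"]
    return []


facts = dual_process_loop(
    "Where was the director of Film X born?", toy_generate_queries, toy_retrieve
)
```

In this toy run the second query can only be formed after the first fact has been aggregated, which is the dependency between hops that the loop is meant to illustrate.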
Chuer Chen, Yuqi Liu, Danqing Shi, Shixiong Cao, Nan Cao
A data story typically integrates data facts from multiple perspectives and stances to construct a comprehensive and objective narrative. However, retrieving these facts takes time and challenges the creator's analytical skills. In this work, we introduce DataScout, an interactive system that automatically performs reasoning and stance-based data fact retrieval to augment the user's statement. In particular, DataScout leverages an LLM-based agent to construct a retrieval tree, enabling users and the agent to collaboratively control its expansion. The interface visualizes the retrieval tree as a mind map that helps users intuitively steer the retrieval direction and effectively engage in reasoning and analysis. We evaluate the proposed system through case studies and in-depth expert interviews. Our evaluation demonstrates that DataScout can effectively retrieve multifaceted data facts from different stances, helping users verify their statements and enhance the credibility of their stories.
Youngjune Lee, Haeyu Jeong, Changgeon Lim, Jeong Choi, Hongjun Lim, Hangon Kim, Jiyoon Kwon, Saehun Kim
Online community platforms require dynamic personalized retrieval and
recommendation that can continuously adapt to evolving user interests and new
documents. However, optimizing models to handle such changes in real-time
remains a major challenge in large-scale industrial settings. To address this,
we propose the Interest-aware Representation and Alignment (IRA) framework, an
efficient and scalable approach that dynamically adapts to new interactions
through a cumulative structure. IRA leverages two key mechanisms: (1) Interest
Units that capture diverse user interests as contextual texts, while
reinforcing or fading over time through cumulative updates, and (2) a retrieval
process that measures the relevance between Interest Units and documents based
solely on semantic relationships, eliminating dependence on click signals to
mitigate temporal biases. By integrating cumulative Interest Unit updates with
the retrieval process, IRA continuously adapts to evolving user preferences,
ensuring robust and fine-grained personalization without being constrained by
past training distributions. We validate the effectiveness of IRA through
extensive experiments on real-world datasets, including its deployment in the
Home Section of NAVER's CAFE, South Korea's leading community platform.
Authors' comments: Accepted to SIGIR 2025 Industry Track. First two authors contributed
equally
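The two mechanisms described above (interest weights that reinforce or fade through cumulative updates, and relevance computed from semantic similarity alone) can be sketched as follows. This is a hedged illustration, not the IRA implementation: the decay and boost constants, the use of fixed embedding vectors, and the names `update_weights` and `score_document` are assumptions introduced here.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def update_weights(
    weights: Dict[str, float],
    touched: List[str],
    decay: float = 0.5,   # assumed fading factor per update
    boost: float = 1.0,   # assumed reinforcement per new interaction
) -> Dict[str, float]:
    """Fade every interest, then reinforce the ones just interacted with."""
    updated = {name: w * decay for name, w in weights.items()}
    for name in touched:
        updated[name] = updated.get(name, 0.0) + boost
    return updated


def score_document(
    doc_vec: Vector, unit_vecs: Dict[str, Vector], weights: Dict[str, float]
) -> float:
    """Weighted semantic relevance; no click signal enters the score."""
    return sum(
        weights.get(name, 0.0) * cosine(vec, doc_vec)
        for name, vec in unit_vecs.items()
    )


# Hypothetical two-dimensional embeddings, for illustration only.
unit_vecs = {"hiking": [1.0, 0.0], "cooking": [0.0, 1.0]}
weights: Dict[str, float] = {}
for interaction in (["hiking"], ["cooking"], ["cooking"]):
    weights = update_weights(weights, interaction)

recipe_score = score_document([0.0, 1.0], unit_vecs, weights)
trail_score = score_document([1.0, 0.0], unit_vecs, weights)
```

After the three updates the older "hiking" interest has faded while "cooking" has been reinforced, so the recipe document outranks the trail document without any retraining.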
Yabing Wang, Zhuotao Tian, Qingpei Guo, Zheng Qin, Sanping Zhou, Ming Yang, Le Wang
Composed Image Retrieval (CIR) is a challenging multimodal task that retrieves a target image based on a reference image and accompanying modification text. Due to the high cost of annotating CIR triplet datasets, zero-shot (ZS) CIR has gained traction as a promising alternative. Existing studies mainly focus on projection-based methods, which map an image to a single pseudo-word token. However, these methods face three critical challenges: (1) insufficient pseudo-word token representation capacity, (2) discrepancies between training and inference phases, and (3) reliance on large-scale synthetic data. To address these issues, we propose a two-stage framework where the training is accomplished from mapping to composing. In the first stage, we enhance image-to-pseudo-word token learning by introducing a visual semantic injection module and a soft text alignment objective, enabling the token to capture richer and fine-grained image information. In the second stage, we optimize the text encoder using a small amount of synthetic triplet data, enabling it to effectively extract compositional semantics by combining pseudo-word tokens with modification text for accurate target image retrieval. The strong visual-to-pseudo mapping established in the first stage provides a solid foundation for the second stage, making our approach compatible with both high- and low-quality synthetic data, and capable of achieving significant performance gains with only a small amount of synthetic data. Extensive experiments were conducted on three public datasets, achieving superior performance compared to existing approaches.
Won-Kwang Park
In this study, we investigated the application of the direct sampling method
(DSM) to identify small dielectric objects in a limited-aperture inverse
scattering problem. Unlike previous studies, we consider the bistatic
measurement configuration corresponding to the transmitter location and design
indicator functions for both a single source and multiple sources, and we
convert the unknown measurement data to a fixed nonzero constant. To explain
the applicability and limitation of object detection, we demonstrate that the
indicator functions can be expressed by an infinite series of Bessel functions,
the material properties of the objects, the bistatic angle, and the converted
constant. Based on the theoretical results, we explain how the imaging
performance of the DSM is influenced by the bistatic angle and the converted
constant. In addition, the results of our analyses demonstrate that a smaller
bistatic angle enhances the imaging accuracy and that optimal selection of the
converted constant is crucial to realize reliable object detection. The results
of the numerical simulations obtained using a two-dimensional Fresnel dataset
validate the theoretical findings and illustrate the effectiveness and
limitations of the designed indicator functions for small objects.
Authors' comments: 18 pages, 14 figures
Bang An, Shiyue Zhang, Mark Dredze
Efforts to ensure the safety of large language models (LLMs) include safety
fine-tuning, evaluation, and red teaming. However, despite the widespread use
of the Retrieval-Augmented Generation (RAG) framework, AI safety work focuses
on standard LLMs, which means we know little about how RAG use cases change a
model's safety profile. We conduct a detailed comparative analysis of RAG and
non-RAG frameworks with eleven LLMs. We find that RAG can make models less safe
and change their safety profile. We explore the causes of this change and find
that even combinations of safe models with safe documents can cause unsafe
generations. In addition, we evaluate some existing red teaming methods for RAG
settings and show that they are less effective than when used for non-RAG
settings. Our work highlights the need for safety research and red-teaming
methods specifically tailored for RAG LLMs.
Authors' comments: NAACL 2025
Shiyin Tan, Jaeeon Park, Dongyuan Li, Renhe Jiang, Manabu Okumura
In the field of multi-document summarization (MDS), transformer-based models have demonstrated remarkable success, yet they suffer from an input length limitation. Current methods apply truncation after the retrieval process to fit the context length; however, they depend heavily on manually crafted queries, which are impractical to create for each document set in MDS. Additionally, these methods retrieve information at a coarse granularity, leading to the inclusion of irrelevant content. To address these issues, we propose a novel retrieval-based framework that integrates query selection and document ranking and shortening into a unified process. Our approach identifies the most salient elementary discourse units (EDUs) from input documents and utilizes them as latent queries. These queries guide the document ranking by calculating relevance scores. Instead of traditional truncation, our approach filters out irrelevant EDUs to fit the context length, ensuring that only critical information is preserved for summarization. We evaluate our framework on multiple MDS datasets, demonstrating consistent improvements in ROUGE metrics while confirming its scalability and flexibility across diverse model architectures. Additionally, we validate its effectiveness through an in-depth analysis, emphasizing its ability to dynamically select appropriate queries and accurately rank documents based on their relevance scores. These results demonstrate that our framework effectively addresses context-length constraints, establishing it as a robust and reliable solution for MDS.
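To make the contrast with truncation concrete, the sketch below drops low-relevance EDUs until the input fits a length budget, instead of cutting off the tail. It is an illustration under assumptions, not the paper's method: relevance scores are taken as given, length is approximated by whitespace tokens, and the greedy selection in `filter_edus` is a hypothetical choice.

```python
from typing import List, Tuple


def filter_edus(edus: List[Tuple[str, float]], budget: int) -> List[str]:
    """Keep the highest-scoring EDUs that fit the budget, in original order.

    edus:   (text, relevance_score) pairs in document order.
    budget: maximum number of whitespace-separated tokens to keep.
    """
    # Visit EDUs from most to least relevant.
    by_relevance = sorted(range(len(edus)), key=lambda i: edus[i][1], reverse=True)
    kept = set()
    used = 0
    for i in by_relevance:
        cost = len(edus[i][0].split())  # assumed token count
        if used + cost <= budget:
            kept.add(i)
            used += cost
    # Restore document order so the shortened text stays readable.
    return [edus[i][0] for i in sorted(kept)]


# Hypothetical EDUs and scores, for illustration only.
example = [
    ("The storm hit the coast on Monday", 0.9),
    ("according to a spokesperson", 0.1),
    ("and thousands lost power", 0.8),
    ("which was widely reported", 0.2),
]
shortened = filter_edus(example, budget=11)
```

Plain truncation at 11 tokens would have kept the low-value attribution clause and lost the power-outage clause; filtering by relevance keeps both salient units.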
Luankang Zhang, Kenan Song, Yi Quan Lee, Wei Guo, Hao Wang, Yawen Li, Huifeng Guo, Yong Liu et al.
In recommendation systems, the traditional multi-stage paradigm, which
includes retrieval and ranking, often suffers from information loss between
stages, which diminishes performance. Recent advances in generative models,
inspired by natural language processing, suggest the potential for unifying
these stages to mitigate such loss. This paper presents the Unified Generative
Recommendation Framework (UniGRF), a novel approach that integrates retrieval
and ranking into a single generative model. By treating both stages as sequence
generation tasks, UniGRF enables sufficient information sharing without
additional computational costs, while remaining model-agnostic. To enhance
inter-stage collaboration, UniGRF introduces a ranking-driven enhancer module
that leverages the precision of the ranking stage to refine retrieval
processes, creating an enhancement loop. In addition, a gradient-guided adaptive
weighter is incorporated to dynamically balance the optimization of retrieval
and ranking, ensuring synchronized performance improvements. Extensive
experiments demonstrate that UniGRF significantly outperforms existing models
on benchmark datasets, confirming its effectiveness in facilitating information
transfer. Ablation studies and further experiments reveal that UniGRF not only
promotes efficient collaboration between stages but also achieves synchronized
optimization. UniGRF provides an effective, scalable, and compatible framework
for generative recommendation systems.
Authors' comments: This paper has been accepted at SIGIR 2025
Bo Lin, Shangwen Wang, Yihao Qin, Liqian Chen, Xiaoguang Mao
Retrieval-Augmented Code Generation (RACG) leverages external knowledge to enhance Large Language Models (LLMs) in code synthesis, improving the functional correctness of the generated code. However, existing RACG systems largely overlook security, leading to substantial risks. In particular, malicious code injected into knowledge bases can mislead LLMs into generating insecure outputs, which poses a critical threat in modern software development. To address this, we propose a security-hardening framework for RACG systems, CodeGuarder, that shifts the paradigm from retrieving only functional code examples to incorporating both functional code and security knowledge. Our framework constructs a security knowledge base from real-world vulnerability databases, including secure code samples and root cause annotations. For each code generation query, a retriever decomposes the query into fine-grained sub-tasks and fetches relevant security knowledge. To prioritize critical security guidance, we introduce a re-ranking and filtering mechanism by leveraging the LLMs' susceptibility to different vulnerability types. This filtered security knowledge is seamlessly integrated into the generation prompt. Our evaluation shows CodeGuarder significantly improves code security rates across various LLMs, achieving average improvements of 20.12% in standard RACG, and 31.53% and 21.91% under two distinct poisoning scenarios without compromising functional correctness. Furthermore, CodeGuarder demonstrates strong generalization, enhancing security even when the targeted language's security knowledge is lacking. This work presents CodeGuarder as a pivotal advancement towards building secure and trustworthy RACG systems.
Juhyeok Lee, Samuel W. Song, Min Gee Cho, Georgios Varnavides, Stephanie M. Ribet, Colin Ophus, Mary C. Scott, Michael L. Whittaker
Electron cryo-tomography (cryo-ET) enables 3D imaging of complex,
radiation-sensitive structures with molecular detail. However, image contrast
from the interference of scattered electrons is nonlinear with atomic density,
and multiple scattering further complicates interpretation. These effects
degrade resolution, particularly in conventional reconstruction algorithms,
which assume linearity. Particle averaging can reduce such issues but is
unsuitable for heterogeneous or dynamic samples ubiquitous in biology,
chemistry, and materials sciences. Here, we develop a phase retrieval-based
cryo-ET method, PhaseT3M. We experimentally demonstrate its application to a ~7
nm Co3O4 nanoparticle on ~30 nm carbon substrate, achieving a maximum
resolution of 1.6 Å, surpassing conventional limits using standard cryo-TEM
equipment. PhaseT3M uses a multislice model for multiple scattering and
Bayesian optimization for alignment and computational aberration correction,
with a positivity constraint to recover 'missing wedge' information. Applied
directly to biological particles, it enhances resolution and reduces artifacts,
establishing a new standard for routine 3D imaging of complex,
radiation-sensitive materials.
Authors' comments: 26 pages, 5 figures, 8 supplementary figures
Dezheng Han, Yibin Jia, Ruxiao Chen, Wenjie Han, Shuaishuai Guo, Jianbo Wang
To enable precise and fully automated cell type annotation with large language models (LLMs), we developed a graph-structured feature marker database to retrieve entities linked to differential genes for cell reconstruction. We further designed a multi-task workflow to optimize the annotation process. Compared to general-purpose LLMs, our method improves human evaluation scores by up to 0.21 and semantic similarity by 6.1% across 11 tissue types, while more closely aligning with the cognitive logic of manual annotation.
William R. Keely, Otto Lamminpää, Steffen Mauceri, Sean M. R. Crowell, Christopher W. O'Dell, Gregory R. McGarragh
Satellite-based estimates of greenhouse gas (GHG) properties from
observations of reflected solar spectra are integral for understanding and
monitoring complex terrestrial systems and their impact on the carbon cycle due
to their near global coverage. Known as retrieval, making GHG concentration
estimations from these observations is a non-linear Bayesian inverse problem,
which is operationally solved using a computationally expensive algorithm
called Optimal Estimation (OE), providing a Gaussian approximation to a
non-Gaussian posterior. This leads to issues in solver algorithm convergence,
and to unrealistically confident uncertainty estimates for the retrieved
quantities. Upcoming satellite missions will provide orders of magnitude more
data than the current constellation of GHG observers. Development of fast and
accurate retrieval algorithms with robust uncertainty quantification is
critical. Doing so stands to provide substantial climate impact by moving
towards the goal of near-continuous real-time global monitoring of carbon
sources and sinks, which is essential for policy making. To achieve this goal,
we propose a diffusion-based approach to flexibly retrieve a Gaussian or
non-Gaussian posterior, for NASA's Orbiting Carbon Observatory-2 spectrometer,
while providing a substantial computational speed-up over the current
operational state-of-the-art.
Authors' comments: Published as a workshop paper in "Tackling Climate Change with
Machine Learning", ICLR 2025 Workshop on Tackling Climate Change with Machine
Learning. https://www.climatechange.ai/papers/iclr2025/12
Chanyeol Choi, Jihoon Kwon, Jaeseon Ha, Hojun Choi, Chaewoon Kim, Yongjae Lee, Jy-yong Sohn, Alejandro Lopez-Lira
In the fast-paced financial domain, accurate and up-to-date information is
critical to addressing ever-evolving market conditions. Retrieving this
information correctly is essential in financial Question-Answering (QA), since
many language models struggle with factual accuracy in this domain. We present
FinDER, an expert-generated dataset tailored for Retrieval-Augmented Generation
(RAG) in finance. Unlike existing QA datasets that provide predefined contexts
and rely on relatively clear and straightforward queries, FinDER focuses on
annotating search-relevant evidence by domain experts, offering 5,703
query-evidence-answer triplets derived from real-world financial inquiries.
These queries frequently include abbreviations, acronyms, and concise
expressions, capturing the brevity and ambiguity common in the realistic search
behavior of professionals. By challenging models to retrieve relevant
information from large corpora rather than relying on readily determined
contexts, FinDER offers a more realistic benchmark for evaluating RAG systems.
We further present a comprehensive evaluation of multiple state-of-the-art
retrieval models and Large Language Models, showcasing challenges derived from
a realistic benchmark to drive future research on truthful and precise RAG in
the financial domain.
Authors' comments: 10 pages, 3 figures, ICLR 2025 Workshop Advances in Financial AI
Aoran Gan, Hao Yu, Kai Zhang, Qi Liu, Wenyu Yan, Zhenya Huang, Shiwei Tong, Guoping Hu
Recent advancements in Retrieval-Augmented Generation (RAG) have
revolutionized natural language processing by integrating Large Language Models
(LLMs) with external information retrieval, enabling accurate, up-to-date, and
verifiable text generation across diverse applications. However, evaluating RAG
systems presents unique challenges due to their hybrid architecture that
combines retrieval and generation components, as well as their dependence on
dynamic knowledge sources in the LLM era. In response, this paper provides a
comprehensive survey of RAG evaluation methods and frameworks, systematically
reviewing traditional and emerging evaluation approaches for system
performance, factual accuracy, safety, and computational efficiency in the LLM
era. We also compile and categorize the RAG-specific datasets and evaluation
frameworks, conducting a meta-analysis of evaluation practices in high-impact
RAG research. To the best of our knowledge, this work represents the most
comprehensive survey for RAG evaluation, bridging traditional and LLM-driven
methods, and serves as a critical resource for advancing RAG development.
Authors' comments: 18 pages, 5 figures
Jiaqi Wei, Hao Zhou, Xiang Zhang, Di Zhang, Zijie Qiu, Wei Wei, Jinzhe Li, Wanli Ouyang et al.
Retrieval-augmented generation (RAG) has emerged as a foundational paradigm for knowledge-grounded text generation. However, existing RAG pipelines often fail to ensure that the reasoning trajectories align with the evidential constraints imposed by retrieved content. In this paper, we reframe RAG as a problem of retrieval-aware reasoning and identify a core challenge: reasoning misalignment, the mismatch between a model's reasoning trajectory and the retrieved evidence. To address this challenge, we propose AlignRAG, a novel test-time framework that mitigates reasoning misalignment through iterative Critique-Driven Alignment (CDA) steps. In contrast to prior approaches that rely on static training or post-hoc selection, AlignRAG actively refines reasoning trajectories during inference by enforcing fine-grained alignment with evidence. Our framework introduces a new paradigm for retrieval-aware reasoning by: (1) constructing context-rich training corpora; (2) generating contrastive critiques from preference-aware reasoning trajectories; (3) training a dedicated Critic Language Model (CLM) to identify reasoning misalignments; and (4) applying CDA steps to optimize reasoning trajectories iteratively. Empirical results demonstrate that AlignRAG consistently outperforms all baselines and can be integrated as a plug-and-play module into existing RAG pipelines without further changes. By reconceptualizing RAG as a structured reasoning trajectory and establishing the test-time framework for correcting reasoning misalignments in RAG, AlignRAG provides practical advancements for retrieval-aware generation.
Junchen Fu, Xuri Ge, Xin Xin, Haitao Yu, Yue Feng, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M. Jose
Multimodal representation learning has garnered significant attention in the
AI community, largely due to the success of large pre-trained multimodal
foundation models like LLaMA, GPT, Mistral, and CLIP. These models have
achieved remarkable performance across various tasks of multimodal information
retrieval (MIR), including web search, cross-modal retrieval, and recommender
systems. However, due to their enormous parameter sizes, significant
efficiency challenges emerge across training, deployment, and inference stages
when adapting these models' representation for IR tasks. These challenges
present substantial obstacles to the practical adaptation of foundation models
for representation learning in information retrieval tasks.
To address these pressing issues, we propose organizing the first EReL@MIR
workshop at the Web Conference 2025, inviting participants to explore novel
solutions, emerging problems, challenges, efficiency evaluation metrics and
benchmarks. This workshop aims to provide a platform for both academic and
industry researchers to engage in discussions, share insights, and foster
collaboration toward achieving efficient and effective representation learning
for multimodal information retrieval in the era of large foundation models.
Authors' comments: WWW2025 Workshop Summary
Feifei Niu, Rongqi Pan, Lionel C. Briand, Hanyang Hu, Krishna Koravadi
In automotive software development, as well as other domains, traceability between stakeholder requirements and system requirements is crucial to ensure consistency, correctness, and regulatory compliance. However, erroneous or missing traceability relationships often arise due to improper propagation of requirement changes or human errors in requirement mapping, leading to inconsistencies and increased maintenance costs. Existing approaches do not address traceability between stakeholder and system requirements, rely on open-source data -- as opposed to automotive (or any industry) data -- and do not address the validation of manual links established by engineers. Additionally, automotive requirements often exhibit variations in the way they are expressed, posing challenges for supervised models requiring training. The recent advancements in large language models (LLMs) provide new opportunities to address these challenges. In this paper, we introduce TVR, a requirement Traceability Validation and Recovery approach primarily targeting automotive systems, leveraging LLMs enhanced with retrieval-augmented generation (RAG). TVR is designed to validate existing traceability links and recover missing ones with high accuracy. We empirically evaluate TVR on automotive requirements, achieving 98.87% accuracy in traceability validation and 85.50% correctness in traceability recovery. Additionally, TVR demonstrates strong robustness, achieving 97.13% accuracy when handling unseen requirement variations. The results highlight the practical effectiveness of RAG-based LLM approaches in industrial settings, offering a promising solution for improving requirements traceability in complex automotive systems.
Yijun Liu
Interpreting neural activity through meaningful latent representations
remains a complex and evolving challenge at the intersection of neuroscience
and artificial intelligence. We investigate the potential of multimodal
foundation models to align invasive brain recordings with natural language. We
present SSENSE, a contrastive learning framework that projects single-subject
stereo-electroencephalography (sEEG) signals into the sentence embedding space
of a frozen CLIP model, enabling sentence-level retrieval directly from brain
activity. SSENSE trains a neural encoder on spectral representations of sEEG
using InfoNCE loss, without fine-tuning the text encoder. We evaluate our
method on time-aligned sEEG and spoken transcripts from a naturalistic
movie-watching dataset. Despite limited data, SSENSE achieves promising
results, demonstrating that general-purpose language representations can serve
as effective priors for neural decoding.
Authors' comments: Accepted for poster presentation at the CVPR 2025 Workshop on
Multimodal Foundation Models (MMFM3)
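For readers unfamiliar with the InfoNCE objective mentioned above, the sketch below computes it for a batch of paired embeddings in one direction (brain to text). It is a generic illustration, not the SSENSE code: the temperature value, the use of cosine similarity, the one-directional form, and the function name `info_nce` are assumptions, and the text embeddings are treated as fixed inputs to mirror the frozen text encoder.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(brain: List[Vector], text: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss where brain[i] and text[i] form the positive pair.

    Every other text embedding in the batch acts as a negative for brain[i].
    Vectors are assumed to be non-zero.
    """
    b = [_normalize(v) for v in brain]
    t = [_normalize(v) for v in text]
    total = 0.0
    for i, b_vec in enumerate(b):
        logits = [sum(x * y for x, y in zip(b_vec, t_vec)) / temperature for t_vec in t]
        # Log-sum-exp with the max subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(b)


# Hypothetical embeddings: matched pairs versus deliberately swapped pairs.
pairs = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(pairs, pairs)
swapped_loss = info_nce(pairs, list(reversed(pairs)))
```

The loss is near zero when each brain embedding is closest to its own sentence and large when the pairs are swapped, which is the signal that pulls the trainable encoder toward the fixed text space.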
Christopher Ick, Gordon Wichern, Yoshiki Masuyama, François G. Germain, Jonathan Le Roux
This report details MERL's system for room impulse response (RIR) estimation
submitted to the Generative Data Augmentation Workshop at ICASSP 2025 for
Augmenting RIR Data (Task 1) and Improving Speaker Distance Estimation (Task
2). We first pre-train a neural acoustic field conditioned by room geometry on
an external large-scale dataset in which pairs of RIRs and the geometries are
provided. The neural acoustic field is then adapted to each target room by
using the enrollment data, where we leverage either the provided room
geometries or geometries retrieved from the external dataset, depending on
availability. Lastly, we predict the RIRs for each pair of source and receiver
locations specified by Task 1, and use these RIRs to train the speaker distance
estimation model in Task 2.
Authors' comments: Presented at ICASSP 2025 GenDA Workshop
Yong-En Tian, Yu-Chien Tang, Kuang-Da Wang, An-Zi Yen, Wen-Chih Peng
Tailoring structured financial reports from companies' earnings releases is
crucial for understanding financial performance and has been widely adopted in
real-world analytics. However, existing summarization methods often generate
broad, high-level summaries, which may lack the precision and detail required
for financial reports that typically focus on specific, structured sections.
While Large Language Models (LLMs) hold promise, generating reports adhering to
predefined multi-section templates remains challenging. This paper investigates
two LLM-based approaches popular in industry for generating templated financial
reports: an agentic information retrieval (IR) framework and a decomposed IR
approach, namely AgenticIR and DecomposedIR. AgenticIR utilizes
collaborative agents prompted with the full template. In contrast, the
DecomposedIR approach applies a prompt chaining workflow to break down the
template and reframe each section as a query answered by the LLM using the
earnings release. To quantitatively assess the generated reports, we evaluated
both methods in two scenarios: one using a financial dataset without direct
human references, and another with a weather-domain dataset featuring
expert-written reports. Experimental results show that while AgenticIR may
excel in orchestrating tasks and generating concise reports through agent
collaboration, DecomposedIR statistically significantly outperforms the AgenticIR
approach in providing broader and more detailed coverage in both scenarios,
offering reflection on the utilization of the agentic framework in real-world
applications.
Authors' comments: 5 pages; 3 figures. Accepted by SIGIR 2025 short paper track. Code
available at
https://github.com/bryant-nn/Template-Based-Financial-Report-Generation
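The decomposed workflow described above (break the template into sections, reframe each section as a query, answer it from the source document) can be sketched as a short chain. This is an illustrative sketch only and is not taken from the linked repository: `split_template`, `section_to_query`, `decomposed_report`, the one-section-per-line template format, and the stub `echo_llm` are hypothetical.

```python
from typing import Callable, Dict, List


def split_template(template: str) -> List[str]:
    """Assume one section title per non-empty line of the template."""
    return [line.strip() for line in template.splitlines() if line.strip()]


def section_to_query(section: str, source_text: str) -> str:
    """Reframe a template section as a self-contained question over the source."""
    return (
        f"Using only the document below, write the '{section}' section "
        f"of the report.\n\n{source_text}"
    )


def decomposed_report(
    template: str, source_text: str, ask_llm: Callable[[str], str]
) -> Dict[str, str]:
    """Answer each section independently and assemble the report."""
    report: Dict[str, str] = {}
    for section in split_template(template):
        report[section] = ask_llm(section_to_query(section, source_text))
    return report


# Stub model that records prompts; a real LLM call would go here.
calls: List[str] = []


def echo_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"answer {len(calls)}"


report = decomposed_report("Revenue\nOutlook\n", "Q1 revenue rose 5%.", echo_llm)
```

Because each section is queried separately, every prompt carries the full source text but only one section's instructions, which is one plausible reason for the broader coverage reported for the decomposed approach.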