Guimin Hu, Hasti Seifi
Large language models (LLMs) have garnered significant attention in recent years due to their impressive performance. While considerable research has evaluated these models from various perspectives, the extent to which LLMs can perform implicit and explicit emotion retrieval remains largely unexplored. To address this gap, this study investigates LLMs' emotion retrieval capabilities in commonsense. Through extensive experiments involving multiple models, we systematically evaluate the ability of LLMs on emotion retrieval. Specifically, we propose a supervised contrastive probing method to verify LLMs' performance for implicit and explicit emotion retrieval, as well as the diversity of the emotional events they retrieve. The results offer valuable insights into the strengths and limitations of LLMs in handling emotion retrieval.
Spectroscopic observations of exoplanet atmospheres can reveal the chemical
composition, temperature, cloud properties, and (potentially) the habitability
of these distant worlds. The inference of such properties is generally enabled
by Bayesian atmospheric retrieval algorithms. However, until recently, many
retrieval codes have not been publicly available. Here, we describe the open
source release of the POSEIDON exoplanet radiative transfer and retrieval code.
POSEIDON is a Python package for the 1D, 2D, or 3D modelling and analysis of
exoplanet spectra, which is frequently used to interpret Hubble and JWST
observations of exoplanet atmospheres. We provide extensive tutorials on both
forward modelling and retrievals in POSEIDON's online documentation, which we
hope will serve as a helpful resource for the exoplanet atmosphere community.
Authors' comments: 6 pages, 1 figure, published in JOSS in 2023. Tutorials available at
https://poseidon-retrievals.readthedocs.io/en/latest/
Ryuma Nakahata, Shehtab Zaman, Mingyuan Zhang, Fake Lu, Kenneth Chiu
Ptychography is a computational method of microscopy that recovers
high-resolution transmission images of samples from a series of diffraction
patterns. While conventional phase retrieval algorithms can iteratively recover
the images, they require oversampled diffraction patterns, incur significant
computational costs, and struggle to recover the absolute phase of the sample's
transmission function. Deep learning algorithms for ptychography are a
promising approach to resolving the limitations of iterative algorithms. We
present PtychoFormer, a hierarchical transformer-based model for data-driven
single-shot ptychographic phase retrieval. PtychoFormer processes subsets of
diffraction patterns, generating local inferences that are seamlessly stitched
together to produce a high-quality reconstruction. Our model exhibits tolerance
to sparsely scanned diffraction patterns and achieves up to 3600 times faster
imaging speed than the extended ptychographic iterative engine (ePIE). We also
propose the extended-PtychoFormer (ePF), a hybrid approach that combines the
benefits of PtychoFormer with the ePIE. ePF minimizes global phase shifts and
significantly enhances reconstruction quality, achieving state-of-the-art phase
retrieval in ptychography.
Authors' comments: 20 pages, 12 figures
Reza Fayyazi, Stella Hoyos Trueba, Michael Zuzak, Shanchieh Jay Yang
In cybersecurity, security analysts face the challenge of mitigating newly discovered vulnerabilities in real time, with over 300,000 Common Vulnerabilities and Exposures (CVEs) identified since 1999. The sheer volume of known vulnerabilities complicates the detection of patterns for unknown threats. While LLMs can assist, they often hallucinate and lack alignment with recent threats. Over 25,000 vulnerabilities have been identified so far in 2024, all introduced after the training data cutoff of popular LLMs (e.g., GPT-4). This raises a major challenge for leveraging LLMs in cybersecurity, where accuracy and up-to-date information are paramount. In this work, we aim to improve the adaptation of LLMs in vulnerability analysis by mimicking how analysts perform such tasks. We propose ProveRAG, an LLM-powered system designed to assist in rapidly analyzing CVEs with automated retrieval augmentation of web data while self-evaluating its responses with verifiable evidence. ProveRAG incorporates a self-critique mechanism to help alleviate omission and hallucination common in the output of LLMs applied in cybersecurity applications. The system cross-references data from verifiable sources (NVD and CWE), giving analysts confidence in the actionable insights provided. Our results indicate that ProveRAG excels in delivering verifiable evidence to the user with over 99% and 97% accuracy in exploitation and mitigation strategies, respectively. This system outperforms direct prompting and chunking retrieval in vulnerability analysis by overcoming temporal and context-window limitations. ProveRAG guides analysts to secure their systems more effectively while documenting the process for future audits.
Krishna Sayana, Raghavendra Vasudeva, Yuri Vasilevski, Kun Su, Liam Hebert, James Pine, Hubert Pham, Ambarish Jash et al.
Recent advances in the generation and reasoning capabilities of Large Language Models (LLMs) present an opportunity to develop truly conversational recommendation systems. However, effectively integrating recommender system knowledge into LLMs for natural language generation that is tailored towards recommendation tasks remains a challenge. This paper addresses this challenge by making two key contributions. First, we introduce a new dataset (REGEN) for natural language generation tasks in conversational recommendations. REGEN (Reviews Enhanced with GEnerative Narratives) extends the Amazon Product Reviews dataset with rich user narratives, including personalized explanations of product preferences, product endorsements for recommended items, and summaries of user purchase history. REGEN is made publicly available to facilitate further research. Furthermore, we establish benchmarks using well-known generative metrics, and perform an automated evaluation of the new dataset using a rater LLM. Second, the paper introduces a fusion architecture (CF model with an LLM) which serves as a baseline for REGEN and, to the best of our knowledge, represents the first attempt to analyze the capabilities of LLMs in understanding recommender signals and generating rich narratives. We demonstrate that LLMs can effectively learn from simple fusion architectures utilizing interaction-based CF embeddings, and this can be further enhanced using the metadata and personalization data associated with items. Our experiments show that combining CF and content embeddings leads to improvements of 4-12% in key language metrics compared to using either type of embedding individually. We also provide an analysis to interpret how CF and content embeddings contribute to this new generative task.
Tomas André, Emiliano De Santis, Nicusor Timneanu, Carl Caleman
Single Particle Imaging techniques at X-ray lasers have made significant
strides, yet the challenge of determining the orientation of freely rotating
molecules during delivery remains. In this study, we propose a novel method to
partially retrieve the relative orientation of proteins exposed to ultrafast
X-ray pulses by analyzing the fragmentation patterns resulting from Coulomb
explosions. We simulate these explosions for 45 proteins in the size range of
100 to 4000 atoms using a hybrid Monte Carlo/Molecular Dynamics approach and
capture the resulting ion ejection patterns with virtual detectors. Our goal is
to exploit information from the explosion to infer orientations of proteins at
the time of X-ray exposure. Our results demonstrate that partial orientation
information can be extracted, particularly for larger proteins. Our findings
can be integrated into existing reconstruction algorithms such as
Expand-Maximize-Compress, to improve their efficiency and reduce the need for
high-quality diffraction patterns. This method offers a promising avenue for
enhancing Single Particle Imaging by leveraging measurable data from the
Coulomb explosion to provide valuable insights about orientation.
Authors' comments: 11 pages, 4 figures, one column double spacing
Changmao Li, Jeffrey Flanigan
Large Language Models (LLMs) exhibit impressive results across a wide range of natural language processing (NLP) tasks, yet they can often produce factually incorrect outputs. This paper introduces a simple but effective low-latency post-correction method, Retrieval Augmented Correction (RAC), aimed at enhancing the factual performance of LLMs without requiring additional fine-tuning. Our method is general and can be used with any instruction-tuned LLM, and has greatly reduced latency compared to prior approaches. RAC decomposes the LLM's output into atomic facts and applies a fine-grained verification and correction process with retrieved content to verify and correct the LLM-generated output. Our extensive experiments show that RAC yields up to 30% improvements over state-of-the-art baselines across two popular factuality evaluation datasets, validating its efficacy and robustness both with and without the integration of Retrieval-Augmented Generation (RAG) across different LLMs. Our code is at https://github.com/jlab-nlp/Retrieval-Augmented-Correction
Haobin Li, Peng Hu, Qianjun Zhang, Xi Peng, Xiting Liu, Mouxing Yang
The success of most existing cross-modal retrieval methods heavily relies on
the assumption that the given queries follow the same distribution of the
source domain. However, such an assumption is easily violated in real-world
scenarios due to the complexity and diversity of queries, thus leading to the
query shift problem. Specifically, query shift refers to the online query
stream originating from a domain that follows a different distribution from
the source one. In this paper, we observe that query shift would not only
diminish the uniformity (namely, within-modality scatter) of the query modality
but also amplify the gap between query and gallery modalities. Based on the
observations, we propose a novel method dubbed Test-time adaptation for
Cross-modal Retrieval (TCR). In brief, TCR employs a novel module to refine the
query predictions (namely, retrieval results of the query) and a joint
objective to prevent query shift from disturbing the common space, thus
achieving online adaptation for the cross-modal retrieval models with query
shift. Extensive experiments demonstrate the effectiveness of the proposed TCR
against query shift. The code will be released upon acceptance.
Authors' comments: 22 pages, 8 figures
Xinze Li, Hanbin Wang, Zhenghao Liu, Shi Yu, Shuo Wang, Yukun Yan, Yukai Fu, Yu Gu et al.
Pretrained language models have shown strong effectiveness in code-related tasks, such as code retrieval, code generation, code summarization, and code completion tasks. In this paper, we propose COde assistaNt viA retrieval-augmeNted language model (CONAN), which aims to build a code assistant by mimicking the knowledge-seeking behaviors of humans during coding. Specifically, it consists of a code structure aware retriever (CONAN-R) and a dual-view code representation-based retrieval-augmented generation model (CONAN-G). CONAN-R pretrains CodeT5 using Code-Documentation Alignment and Masked Entity Prediction tasks to make language models code structure-aware and learn effective representations for code snippets and documentation. Then CONAN-G designs a dual-view code representation mechanism for implementing a retrieval-augmented code generation model. CONAN-G regards the code documentation descriptions as prompts, which help language models better understand the code semantics. Our experiments show that CONAN achieves convincing performance on different code generation tasks and significantly outperforms previous retrieval augmented code generation models. Our further analyses show that CONAN learns tailored representations for both code snippets and documentation by aligning code-documentation data pairs and capturing structural semantics by masking and predicting entities in the code data. Additionally, the retrieved code snippets and documentation provide necessary information from both program language and natural language to assist the code generation process. CONAN can also be used as an assistant for Large Language Models (LLMs), providing LLMs with external knowledge in shorter code document lengths to improve their effectiveness on various code tasks. It shows the ability of CONAN to extract necessary information and help filter out the noise from retrieved code documents.
Hao Chen, Lei Zhu, Xinghui Zhu
Deep hashing, due to its low cost and efficient retrieval advantages, is widely valued in cross-modal retrieval. However, existing cross-modal hashing methods either explore the relationships between data points, which inevitably leads to intra-class dispersion, or explore the relationships between data points and categories while ignoring the preservation of inter-class structural relationships, resulting in the generation of suboptimal hash codes. To maintain both intra-class aggregation and inter-class structural relationships, this paper proposes the DCGH method. Specifically, we use proxy loss as the mainstay to maintain intra-class aggregation of data, combined with pairwise loss to maintain inter-class structural relationships, and on this basis, further propose a variance constraint to address the semantic bias issue caused by the combination. A large number of comparative experiments on three benchmark datasets show that the DCGH method has comparable or even better performance compared to existing cross-modal retrieval methods. The code for the implementation of our DCGH framework is available at https://github.com/donnotnormal/DCGH.
Vlassis Fotis, Ioannis Romanelis, Georgios Mylonas, Athanasios Kalogeras, Konstantinos Moustakas
In this paper we study the problem of shape part retrieval in the point cloud domain. Shape retrieval methods in the literature rely on the presence of an existing query object, but what if the part we are looking for is not available? We present Part Retrieval Pipeline (PReP), a pipeline that creatively utilizes metric learning techniques along with a trained classification model to measure the suitability of potential replacement parts from a database, as part of an application scenario targeting circular economy. Through an innovative training procedure with increasing difficulty, it is able to learn to recognize suitable parts relying only on shape context. Thanks to its low parameter size and computational requirements, it can be used to sort through a warehouse of potentially tens of thousands of spare parts in just a few seconds. We also establish an alternative baseline approach to compare against, and extensively document the unique challenges associated with this task, as well as identify the design choices to solve them.
Hansa Meghwani
Ranking consistently emerges as a primary focus in information retrieval
research. Retrieval and ranking models serve as the foundation for numerous
applications, including web search, open domain QA, enterprise domain QA, and
text-based recommender systems. Typically, these models undergo training on
triplets consisting of binary relevance assignments, comprising one positive
and one negative passage. However, their utilization involves a context where a
significantly more nuanced understanding of relevance is necessary, especially
when re-ranking a large pool of potentially relevant passages. Although
collecting positive examples through user feedback like impressions or clicks
is straightforward, identifying suitable negative pairs from a vast pool of
possibly millions or even billions of documents poses a greater challenge.
Generating a substantial number of negative pairs is often necessary to
maintain the high quality of the model. Several approaches have been suggested
in the literature to tackle the issue of selecting suitable negative pairs from an
extensive corpus. This study focuses on explaining the crucial role of hard
negatives in the training process of cross-encoder models, specifically aiming
to explain the performance gains observed with hard negative sampling compared
to random sampling. We have developed a robust hard negative mining technique
for efficient training of cross-encoder re-rank models on an enterprise dataset
which has domain specific context. We provide a novel perspective to enhance
retrieval models, ultimately influencing the performance of advanced LLM
systems like Retrieval-Augmented Generation (RAG) and Reasoning and Action
Agents (ReAct). The proposed approach demonstrates that learning both
similarity and dissimilarity simultaneously with cross-encoders improves
performance of retrieval systems.
Authors' comments: Master's thesis
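The contrast between hard negative sampling and random sampling described above can be illustrated with a small sketch. This is not the thesis's mining technique, which the abstract does not specify. It assumes a common variant in which the hard negatives are the non-positive passages that score highest against the query under cosine similarity. The function names, passage ids, and two-dimensional vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mine_hard_negatives(query_vec, passages, positive_ids, k=2):
    """Return the ids of the k non-positive passages most similar to the query.

    Assumption: "hard" means "high first-stage similarity but not labeled
    relevant"; real pipelines often add filtering for false negatives.
    """
    scored = [
        (cosine(query_vec, vec), pid)
        for pid, vec in passages.items()
        if pid not in positive_ids
    ]
    scored.sort(reverse=True)  # most similar first
    return [pid for _, pid in scored[:k]]


# Toy corpus: one labeled positive and three unlabeled passages.
passages = {
    "p_pos": [1.0, 0.0],
    "p_near": [0.9, 0.1],
    "p_mid": [0.5, 0.5],
    "p_far": [0.0, 1.0],
}
hard = mine_hard_negatives([1.0, 0.0], passages, {"p_pos"}, k=2)
print(hard)
```

Random sampling would pick any of the three unlabeled passages with equal probability, whereas this selection always prefers the near misses, which are the examples a cross-encoder finds hardest to separate from the positive.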
Jianfa Chen, Emily Shen, Trupti Bavalatti, Xiaowen Lin, Yongkai Wang, Shuming Hu, Harihar Subramanyam, Ksheeraj Sai Vepuri et al.
Robust content moderation classifiers are essential for the safety of
Generative AI systems. In this task, differences between safe and unsafe inputs
are often extremely subtle, making it difficult for classifiers (and indeed,
even humans) to properly distinguish violating vs. benign samples without
context or explanation. Scaling risk discovery and mitigation through
continuous model fine-tuning is also slow, challenging and costly, preventing
developers from being able to respond quickly and effectively to emergent
harms. We propose a Classification approach employing Retrieval-Augmented
Generation (Class-RAG). Class-RAG extends the capability of its base LLM
through access to a retrieval library which can be dynamically updated to
enable semantic hotfixing for immediate, flexible risk mitigation. Compared to
model fine-tuning, Class-RAG demonstrates flexibility and transparency in
decision-making, outperforms on classification and is more robust against
adversarial attack, as evidenced by empirical studies. Our findings also
suggest that Class-RAG performance scales with retrieval library size,
indicating that increasing the library size is a viable and low-cost approach
to improve content moderation.
Authors' comments: 11 pages, submitted to ACL
Zhe Zhang, Xingyu Liu, Yuanzhang Lin, Xiang Gao, Hailong Sun, Yuan Yuan
Automated unit test generation has been widely studied, with Large Language Models (LLMs) recently showing significant potential. However, in the context of unit test generation, these tools prioritize high code coverage, often at the expense of practical usability, correctness, and maintainability. In response, we propose Property-Based Retrieval Augmentation, a novel mechanism that extends LLM-based Retrieval-Augmented Generation (RAG) beyond basic vector, text similarity, and graph-based methods. Our approach considers task-specific context and introduces a tailored property retrieval mechanism. Specifically, in the unit test generation task, we account for the unique structure of unit tests by dividing the test generation process into Given, When, and Then phases. When generating tests for a focal method, we not only retrieve general context for the code under test but also consider task-specific context such as pre-existing tests of other methods, which can provide valuable insights for any of the Given, When, and Then phases. This forms property relationships between the focal method and other methods, thereby expanding the scope of retrieval beyond traditional RAG. We implement this approach in a tool called APT, which sequentially performs preprocessing, property retrieval, and unit test generation, using an iterative strategy where newly generated tests guide the creation of subsequent ones. We evaluated APT on 12 open-source projects with 1515 methods, and the results demonstrate that APT consistently outperforms existing tools in terms of correctness, completeness, and maintainability of the generated tests. Moreover, we introduce a novel code-context-aware retrieval mechanism for LLMs beyond general context, offering valuable insights and potential applications for other code-related tasks.
Natalia Accomazzo, Daniel Carando, Rocio Nores, Victoria Paternostro, Sebastian Velazquez
We study the short-time Fourier transform phase retrieval problem in locally compact abelian groups. Using probabilistic methods, we show that for a large class of groups $G$ and compact subsets $K\subseteq G$ there exists a window function and a uniformly separated set in $G\times \widehat{G}$ allowing phase retrieval in $L^2(K)$.
Zefang Liu, Yinzhu Quan
Retrieving temporal event sequences from textual descriptions is crucial for applications such as analyzing e-commerce behavior, monitoring social media activities, and tracking criminal incidents. To advance this task, we introduce TESRBench, a comprehensive benchmark for temporal event sequence retrieval (TESR) from textual descriptions. TESRBench includes diverse real-world datasets with synthesized and reviewed textual descriptions, providing a strong foundation for evaluating retrieval performance and addressing challenges in this domain. Building on this benchmark, we propose TPP-Embedding, a novel model for embedding and retrieving event sequences. The model leverages the TPP-LLM framework, integrating large language models (LLMs) with temporal point processes (TPPs) to encode both event texts and times. By pooling representations and applying a contrastive loss, it unifies temporal dynamics and event semantics in a shared embedding space, aligning sequence-level embeddings of event sequences and their descriptions. TPP-Embedding demonstrates superior performance over baseline models across TESRBench datasets, establishing it as a powerful solution for the temporal event sequence retrieval task.
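The alignment of sequence-level embeddings with description embeddings through a contrastive loss can be sketched as follows. The abstract only says "a contrastive loss", so the InfoNCE form, the temperature value, and the assumption of unit-normalized embeddings used here are illustrative choices rather than the paper's confirmed objective. The function name and toy vectors are hypothetical.

```python
import math


def info_nce(seq_embs, desc_embs, temperature=0.1):
    """Mean InfoNCE loss where seq_embs[i] should match desc_embs[i].

    Assumes both lists hold unit-normalized vectors, so the dot product
    acts as cosine similarity. All other descriptions in the batch serve
    as negatives for sequence i.
    """
    losses = []
    for i, seq in enumerate(seq_embs):
        logits = [
            sum(a * b for a, b in zip(seq, desc)) / temperature
            for desc in desc_embs
        ]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denom - logits[i])  # -log softmax of the true pair
    return sum(losses) / len(losses)


seqs = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(seqs, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = info_nce(seqs, [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss, shuffled_loss)
```

The loss is near zero when each sequence embedding is closest to its own description and grows large when the pairing is shuffled, which is the behavior a shared embedding space for retrieval relies on.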
Zhuohan Xie, Rui Xing, Yuxia Wang, Jiahui Geng, Hasan Iqbal, Dhruv Sahnan, Iryna Gurevych, Preslav Nakov
Fact-checking long-form text is challenging, and it is therefore common
practice to break it down into multiple atomic claims. The typical approach to
fact-checking these atomic claims involves retrieving a fixed number of pieces
of evidence, followed by a verification step. However, this method is usually
not cost-effective, as it underutilizes the verification model's internal
knowledge of the claim and fails to replicate the iterative reasoning process
in human search strategies. To address these limitations, we propose FIRE, a
novel agent-based framework that integrates evidence retrieval and claim
verification in an iterative manner. Specifically, FIRE employs a unified
mechanism to decide whether to provide a final answer or generate a subsequent
search query, based on its confidence in the current judgment. We compare FIRE
with other strong fact-checking frameworks and find that it achieves slightly
better performance while reducing large language model (LLM) costs by an
average of 7.6 times and search costs by 16.5 times. These results indicate
that FIRE holds promise for application in large-scale fact-checking
operations. Our code is available at https://github.com/mbzuai-nlp/fire.git.
Authors' comments: 4 figures, 8 tables, accepted to Findings of NAACL
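The iterative decision described above, answering when confident and otherwise issuing another search query, can be sketched as a simple loop. This is not the FIRE implementation. The `judge` and `search` callables, the return tuple, the confidence threshold, and the step limit are all hypothetical stand-ins for an LLM call and a search API.

```python
def iterative_fact_check(claim, judge, search, max_steps=3, threshold=0.8):
    """Alternate between judging a claim and retrieving more evidence.

    judge(claim, evidence) -> (verdict, confidence, next_query) is an
    assumed interface: one call both assesses the claim and proposes the
    next search query. Returns the verdict and the number of searches used.
    """
    evidence = []
    for _ in range(max_steps):
        verdict, confidence, next_query = judge(claim, evidence)
        if confidence >= threshold or next_query is None:
            return verdict, len(evidence)  # confident enough: stop early
        evidence.append(search(next_query))
    # Search budget exhausted: return the best available judgment.
    verdict, _, _ = judge(claim, evidence)
    return verdict, len(evidence)


def toy_judge(claim, evidence):
    """Stand-in for an LLM: confidence rises with each piece of evidence."""
    confidence = 0.5 + 0.2 * len(evidence)
    verdict = "supported" if evidence else "unknown"
    return verdict, confidence, f"{claim} evidence {len(evidence) + 1}"


def toy_search(query):
    """Stand-in for a web search call."""
    return f"snippet for: {query}"


result = iterative_fact_check(
    "Water boils at 100 C at sea level", toy_judge, toy_search
)
print(result)
```

Because the loop stops as soon as the confidence threshold is met, claims the model can already judge need no searches at all, which is the source of the cost savings the abstract reports.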
Xinze Li, Sen Mei, Zhenghao Liu, Yukun Yan, Shuo Wang, Shi Yu, Zheni Zeng, Hao Chen et al.
Retrieval-Augmented Generation (RAG) has proven its effectiveness in mitigating hallucinations in Large Language Models (LLMs) by retrieving knowledge from external resources. To adapt LLMs for RAG systems, current approaches use instruction tuning to optimize LLMs, improving their ability to utilize retrieved knowledge. This supervised fine-tuning (SFT) approach focuses on equipping LLMs to handle diverse RAG tasks using different instructions. However, it trains RAG modules to overfit training signals and overlooks the varying data preferences among agents within the RAG system. In this paper, we propose a Differentiable Data Rewards (DDR) method, which trains RAG systems end-to-end by aligning data preferences between different RAG modules. DDR works by collecting the rewards to optimize each agent in the RAG system with the rollout method, which prompts agents to sample some potential responses as perturbations, evaluates the impact of these perturbations on the whole RAG system, and subsequently optimizes the agent to produce outputs that improve the performance of the RAG system. Our experiments on various knowledge-intensive tasks demonstrate that DDR significantly outperforms the SFT method, particularly for LLMs with smaller-scale parameters that depend more on the retrieved knowledge. Additionally, DDR exhibits a stronger capability to align the data preference between RAG modules. The DDR method makes the generation module more effective in extracting key information from documents and mitigating conflicts between parametric memory and external knowledge. All code is available at https://github.com/OpenMatch/RAG-DDR.
Hao Kang, Tevin Wang, Chenyan Xiong
Dense embeddings deliver strong retrieval performance but often lack interpretability and controllability. This paper introduces a novel approach using sparse autoencoders (SAE) to interpret and control dense embeddings via the learned latent sparse features. Our key contribution is the development of a retrieval-oriented contrastive loss, which ensures the sparse latent features remain effective for retrieval tasks and thus meaningful to interpret. Experimental results demonstrate that both the learned latent sparse features and their reconstructed embeddings retain nearly the same retrieval accuracy as the original dense vectors, affirming their faithfulness. Our further examination of the sparse latent space reveals interesting features underlying the dense embeddings and we can control the retrieval behaviors via manipulating the latent sparse features, for example, prioritizing documents from specific perspectives in the retrieval results.
Haoran Hao, Jiaming Han, Changsheng Li, Yu-Feng Li, Xiangyu Yue
The development of large language models (LLMs) has significantly enhanced
the capabilities of multimodal LLMs (MLLMs) as general assistants. However, a
lack of user-specific knowledge still restricts their application in people's
daily lives. In this paper, we introduce the Retrieval Augmented Personalization
(RAP) framework for MLLMs' personalization. Starting from a general MLLM, we
turn it into a personalized assistant in three steps. (a) Remember: We design a
key-value database to store user-related information, e.g., user's name, avatar
and other attributes. (b) Retrieve: When the user initiates a conversation, RAP
will retrieve relevant information from the database using a multimodal
retriever. (c) Generate: The input query and retrieved concepts' information
are fed into MLLMs to generate personalized, knowledge-augmented responses.
Unlike previous methods, RAP allows real-time concept editing via updating the
external database. To further improve generation quality and alignment with
user-specific information, we design a pipeline for data collection and create
a specialized dataset for personalized training of MLLMs. Based on the dataset,
we train a series of MLLMs as personalized multimodal assistants. By
pretraining on a large-scale dataset, RAP-MLLMs can generalize to infinite visual
concepts without additional finetuning. Our models demonstrate outstanding
flexibility and generation quality across a variety of tasks, such as
personalized image captioning, question answering and visual recognition. The
code, data and models are available at https://hoar012.github.io/RAP-Project/.
Authors' comments: Accepted by CVPR 2025. Code: https://github.com/Hoar012/RAP-MLLM
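The Remember, Retrieve, and Generate steps above can be sketched with a small in-memory store. This is not the RAP codebase. The `ConceptStore` class, its method names, the two-dimensional key vectors, and the prompt template are hypothetical, and a real system would use a multimodal retriever and pass the prompt to an MLLM.

```python
import math


def _cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ConceptStore:
    """Key-value store: concept name -> (key embedding, user-provided info)."""

    def __init__(self):
        self._records = {}

    def remember(self, name, key_vec, info):
        # Writing to an existing name overwrites it, so a concept can be
        # edited without retraining any model.
        self._records[name] = (key_vec, info)

    def retrieve(self, query_vec, top_k=1):
        ranked = sorted(
            self._records.items(),
            key=lambda item: _cosine(query_vec, item[1][0]),
            reverse=True,
        )
        return [(name, info) for name, (_, info) in ranked[:top_k]]


def build_prompt(query, retrieved):
    """Prepend retrieved concept info to the user query (assumed template)."""
    context = "\n".join(f"<{name}>: {info}" for name, info in retrieved)
    return f"{context}\nUser: {query}"


store = ConceptStore()
store.remember("my_dog", [1.0, 0.0], "A corgi named Bo.")
store.remember("my_mug", [0.0, 1.0], "A blue ceramic mug.")

first = store.retrieve([0.9, 0.1])
prompt = build_prompt("What is in this photo?", first)

# Edit the concept in place, then retrieve again with the same query.
store.remember("my_dog", [1.0, 0.0], "A corgi named Bo who wears a red collar.")
second = store.retrieve([0.9, 0.1])
print(prompt)
```

The second retrieval returns the edited description immediately, which mirrors the abstract's point that concepts can be updated through the external database rather than through finetuning.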