Benchi Zhao, Mingrui Jing, Lei Zhang, Xuanqiang Zhao, Kun Wang, Xin Wang
Accurately estimating high-order moments of quantum states is an elementary
precondition for many crucial tasks in quantum computing, such as entanglement
spectroscopy, entropy estimation, spectrum estimation and predicting non-linear
features from quantum states. But in reality, inevitable quantum noise prevents
us from accessing the desired value. In this paper, we address this issue by
systematically analyzing the feasibility and efficiency of extracting
high-order moments from noisy states. We first show that there exists a quantum
protocol capable of accomplishing this task if and only if the underlying noise
channel is invertible. We then establish a method for deriving protocols that
attain optimal sample complexity using quantum operations and classical
post-processing only. Our protocols, in contrast to conventional ones, incur
lower overheads and avoid sampling different quantum operations due to a novel
technique called observable shift, making the protocols strong candidates for
practical usage on current quantum devices. The proposed method also indicates
the power of entangled protocols in retrieving high-order information, whereas
in the existing methods, entanglement does not help. Our work contributes to a
deeper understanding of how quantum noise could affect high-order information
extraction and provides guidance on how to tackle it.
Authors' comments: 23 pages, 6 figures
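The invertibility condition can be made concrete with a standard textbook noise model. The sketch below is a classical density-matrix calculation, not the paper's protocol. It assumes single-parameter depolarizing noise D_p and computes moments exactly instead of estimating them from samples. It shows that Tr(rho^2) of the ideal state can be recovered from the noisy state by classical post-processing whenever p < 1. At p = 1 the channel is not invertible, and the information is lost. The function names are illustrative.

```python
import numpy as np


def depolarize(rho: np.ndarray, p: float) -> np.ndarray:
    """Depolarizing channel D_p(rho) = (1 - p) * rho + p * I / d."""
    d = rho.shape[0]
    return (1 - p) * rho + p * np.eye(d) / d


def second_moment(rho: np.ndarray) -> float:
    """Second-order moment Tr(rho^2), i.e. the purity."""
    return float(np.real(np.trace(rho @ rho)))


def undo_depolarizing(noisy_moment: float, p: float, d: int) -> float:
    """Recover Tr(rho^2) from Tr(D_p(rho)^2); only possible while D_p is invertible (p < 1)."""
    if not 0 <= p < 1:
        raise ValueError("D_p is not invertible at p = 1: every state maps to I/d")
    # Tr(D_p(rho)^2) = (1 - p)^2 Tr(rho^2) + p (2 - p) / d, solved for Tr(rho^2)
    return (noisy_moment - p * (2 - p) / d) / (1 - p) ** 2


# A random two-qubit (d = 4) mixed state
rng = np.random.default_rng(0)
g = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = g @ g.conj().T
rho /= np.trace(rho).real

p = 0.3
noisy = second_moment(depolarize(rho, p))
recovered = undo_depolarizing(noisy, p, d=4)
```

Here the correction is a fixed affine map, so it only rescales the estimate. With finite samples, however, the 1/(1 - p)^2 factor also amplifies statistical error as p approaches 1.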
Weifeng Sun, Hongyan Li, Meng Yan, Yan Lei, Hongyu Zhang
Unit testing validates the correctness of the unit under test and has become an essential activity in the software development process. A unit test consists of a test prefix that drives the unit under test into a particular state, and a test oracle (e.g., an assertion) that specifies the expected behavior in that state. To reduce the manual effort of unit testing, Yu et al. proposed an integrated approach (integration for short) that combines information retrieval (IR) with a deep learning-based approach to generate assertions for a unit test. Despite its promise, there is still a knowledge gap as to why and where integration works or fails. In this paper, we present an in-depth analysis of the effectiveness of integration. Our analysis shows that: 1) the overall performance of integration is mainly due to its success in retrieving assertions; 2) integration struggles to understand the semantic differences between the retrieved focal-test (a focal-test consists of a test prefix and a unit under test) and the input focal-test; 3) integration is limited to specific types of edit operations and cannot handle token addition or deletion. To improve the effectiveness of assertion generation, this paper proposes a novel retrieve-and-edit approach named EditAS. Specifically, EditAS first retrieves a similar focal-test from a pre-defined corpus and treats its assertion as a prototype. Then, EditAS reuses the information in the prototype and edits it automatically. EditAS is more generalizable than integration. We conduct experiments on two large-scale datasets, and the results demonstrate that EditAS outperforms the state-of-the-art approaches, with average improvements of 10.00%-87.48% in accuracy and 3.30%-42.65% in BLEU score, respectively.
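The retrieval half of a retrieve-and-edit pipeline can be sketched as follows. This is a hypothetical illustration, not EditAS's implementation. It assumes token-level Jaccard similarity as the IR measure, because the abstract does not say which measure is used. The function names and the toy Java corpus are also made up. The learned edit step that adapts the prototype is not shown.

```python
import re


def tokenize(code: str) -> set[str]:
    """Split source code into a set of identifier, number and punctuation tokens."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+|\S", code))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b| of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def retrieve_prototype(query: str, corpus: list[tuple[str, str]]) -> tuple[str, float]:
    """Return the assertion paired with the most similar focal-test, plus its score."""
    query_tokens = tokenize(query)
    best_assertion, best_score = "", -1.0
    for focal_test, assertion in corpus:
        score = jaccard(query_tokens, tokenize(focal_test))
        if score > best_score:
            best_assertion, best_score = assertion, score
    return best_assertion, best_score


corpus = [
    ("int size = list.size();", "assertEquals(0, size);"),
    ("String s = obj.toString();", "assertNotNull(s);"),
]
prototype, score = retrieve_prototype("int size = queue.size();", corpus)
# prototype is the first assertion; an edit step would then adapt it to the query
```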
Ziyang Wang, Yi-Lin Sung, Feng Cheng, Gedas Bertasius, Mohit Bansal
The canonical approach to video-text retrieval leverages a coarse-grained or
fine-grained alignment between visual and textual information. However,
retrieving the correct video according to the text query is often challenging
as it requires the ability to reason about both high-level (scene) and
low-level (object) visual clues and how they relate to the text query. To this
end, we propose a Unified Coarse-to-fine Alignment model, dubbed UCoFiA.
Specifically, our model captures the cross-modal similarity information at
different granularity levels. To alleviate the effect of irrelevant visual
clues, we also apply an Interactive Similarity Aggregation module (ISA) to
consider the importance of different visual features while aggregating the
cross-modal similarity to obtain a similarity score for each granularity.
Finally, we apply the Sinkhorn-Knopp algorithm to normalize the similarities of
each level before summing them, alleviating over- and under-representation
issues at different levels. By jointly considering the cross-modal similarity of
different granularities, UCoFiA allows the effective unification of multi-grained
alignments. Empirically, UCoFiA outperforms previous state-of-the-art
CLIP-based methods on multiple video-text retrieval benchmarks, achieving 2.4%,
1.4% and 1.3% improvements in text-to-video retrieval R@1 on MSR-VTT,
Activity-Net, and DiDeMo, respectively. Our code is publicly available at
https://github.com/Ziyang412/UCoFiA.
Authors' comments: ICCV 2023
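The abstract states that UCoFiA normalizes each level's similarities with the Sinkhorn-Knopp algorithm before summing them. Below is a minimal NumPy sketch of that step. It relies on assumptions the abstract does not give: similarities are exponentiated with a hypothetical temperature, the batch is square (equal numbers of texts and videos), and a fixed number of iterations is used. The ISA aggregation module is not shown.

```python
import numpy as np


def sinkhorn_knopp(sim: np.ndarray, temperature: float = 0.1, n_iters: int = 50) -> np.ndarray:
    """Rescale a square text-by-video similarity matrix to be approximately doubly stochastic."""
    # Exponentiate to obtain a strictly positive matrix; subtracting the max only
    # improves numerical stability, since the row/column scaling removes constants.
    k = np.exp((sim - sim.max()) / temperature)
    for _ in range(n_iters):
        k = k / k.sum(axis=1, keepdims=True)  # every text's row sums to 1
        k = k / k.sum(axis=0, keepdims=True)  # every video's column sums to 1
    return k


rng = np.random.default_rng(0)
n = 4  # texts and videos in the batch
# Hypothetical similarity matrices at three granularity levels
levels = [rng.uniform(size=(n, n)) for _ in range(3)]
combined = sum(sinkhorn_knopp(s, temperature=1.0) for s in levels)
best_video_per_text = combined.argmax(axis=1)
```

Balancing rows and columns keeps a single level, or a single video that scores highly against every query, from dominating the sum. This is one plausible reading of the over- and under-representation issue the abstract mentions.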
Yating Liu, Yaowei Li, Zimo Liu, Wenming Yang, Yaowei Wang, Qingmin Liao
Text-based Person Retrieval (TPR) aims to retrieve the target person images
given a textual query. The primary challenge lies in bridging the substantial
gap between vision and language modalities, especially given the scarcity of
large-scale datasets. In this paper, we introduce a CLIP-based
Synergistic Knowledge Transfer (CSKT) approach for TPR. Specifically, to
exploit CLIP's knowledge on the input side, we first propose a Bidirectional
Prompts Transferring (BPT) module constructed from text-to-image and
image-to-text bidirectional prompts and coupling projections. Secondly, Dual
Adapters Transferring (DAT) is designed to transfer knowledge on the output side of
Multi-Head Attention (MHA) in vision and language. This synergistic two-way
collaborative mechanism promotes the early-stage feature fusion and efficiently
exploits the existing knowledge of CLIP. CSKT outperforms the state-of-the-art
approaches across three benchmark datasets while training only 7.4% of the
model's parameters, demonstrating its remarkable efficiency,
effectiveness and generalization.
Authors' comments: ICASSP 2024 (accepted). Minor typo revisions compared to version 1 on
arXiv
Ekaterina Khramtsova, Shengyao Zhuang, Mahsa Baktashmotlagh, Xi Wang, Guido Zuccon
We propose the new problem of choosing which dense retrieval model to use when searching on a new collection for which no labels are available, i.e. in a zero-shot setting. Many dense retrieval models are readily available. Each model, however, is characterized by widely differing search effectiveness -- not just on the test portion of the datasets on which the dense representations have been learned but, importantly, also across different datasets whose data was not used to learn the dense representations. This is because dense retrievers typically require training on a large amount of labeled data to achieve satisfactory search effectiveness in a specific dataset or domain. Moreover, effectiveness gains obtained by dense retrievers on datasets for which they are able to observe labels during training do not necessarily generalise to datasets that have not been observed during training. This is, however, a hard problem: through empirical experimentation we show that methods inspired by recent work in unsupervised performance evaluation in the presence of domain shift in computer vision and machine learning are not effective for choosing high-performing dense retrievers in our setup. Reliable methods for selecting dense retrieval models in zero-shot settings, without collecting labels for evaluation, would help streamline the widespread adoption of dense retrieval. This is therefore an important new problem that we believe the information retrieval community should consider. Implementations of the methods, along with raw result files and analysis scripts, are publicly available at https://www.github.com/anonymized.
Himanshu Thakur, Soumitri Chattopadhyay
The ability to retrieve a photo by mere free-hand sketching highlights the
immense potential of fine-grained sketch-based image retrieval (FG-SBIR).
However, its rapid practical adoption, as well as scalability, is limited by
the expense of acquiring faithful sketches for easily available photo
counterparts. A solution to this problem is Active Learning, which could
minimise the need for labeled sketches while maximising performance. Despite
extensive studies in the field, there exists no work that utilises it for
reducing sketching effort in FG-SBIR tasks. To this end, we propose a novel
active learning sampling technique that drastically minimises the need for
drawing photo sketches. Our proposed approach tackles the trade-off between
uncertainty and diversity by utilising the relationship between existing
photo-sketch pairs and a photo that does not yet have a sketch, and augmenting
this relation with its intermediate representations. Since our approach relies only
on the underlying data distribution, it is agnostic to the modelling approach
and hence is applicable to other cross-modal instance-level retrieval tasks as
well. With experimentation over two publicly available fine-grained SBIR
datasets ChairV2 and ShoeV2, we validate our approach and reveal its
superiority over adapted baselines.
Authors' comments: Accepted at BMVC 2023
Kaiyi Luo, Xulong Zhang, Jianzong Wang, Huaxiong Li, Ning Cheng, Jing Xiao
Cross-modal retrieval (CMR) has been extensively applied in various domains,
such as multimedia search engines and recommendation systems. Most existing CMR
methods focus on image-to-text retrieval, whereas audio-to-text retrieval, a
less explored domain, has posed a great challenge due to the difficulty of
uncovering discriminative features from audio clips and texts. Existing studies
are restricted in the following two ways: 1) Most researchers utilize
contrastive learning to construct a common subspace where similarities among
data can be measured. However, they consider only cross-modal transformation,
neglecting the intra-modal separability. Besides, the temperature parameter is
not adaptively adjusted along with semantic guidance, which degrades the
performance. 2) These methods do not take latent representation reconstruction
into account, which is essential for semantic alignment. This paper introduces
a novel audio-text oriented CMR approach, termed Contrastive Latent Space
Reconstruction Learning (CLSR). CLSR improves contrastive representation
learning by taking intra-modal separability into account and adopting an
adaptive temperature control strategy. Moreover, the latent representation
reconstruction modules are embedded into the CMR framework, which improves
modal interaction. Experiments in comparison with some state-of-the-art methods
on two audio-text datasets have validated the superiority of CLSR.
Authors' comments: Accepted by The 35th IEEE International Conference on Tools with
Artificial Intelligence. (ICTAI 2023)
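For context, below is the standard symmetric (CLIP-style) InfoNCE loss with a fixed temperature, the kind of contrastive objective the abstract builds on. CLSR's own contributions are not shown: the intra-modal separability term, the adaptive temperature control strategy, and the latent reconstruction modules. In this sketch, `temperature` is a plain argument that an adaptive strategy would adjust during training. The embeddings are random stand-ins.

```python
import numpy as np


def log_softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def symmetric_info_nce(audio: np.ndarray, text: np.ndarray, temperature: float) -> float:
    """Contrastive loss where row i of `audio` and row i of `text` form the positive pair."""
    # L2-normalise so that dot products are cosine similarities
    a = audio / np.linalg.norm(audio, axis=1, keepdims=True)
    t = text / np.linalg.norm(text, axis=1, keepdims=True)
    logits = (a @ t.T) / temperature
    idx = np.arange(len(a))
    audio_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    text_to_audio = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float((audio_to_text + text_to_audio) / 2)


rng = np.random.default_rng(0)
audio = rng.normal(size=(8, 16))
text_matched = audio + 0.01 * rng.normal(size=(8, 16))
text_shuffled = np.roll(text_matched, 1, axis=0)
loss_matched = symmetric_info_nce(audio, text_matched, temperature=0.1)
loss_shuffled = symmetric_info_nce(audio, text_shuffled, temperature=0.1)
```

Correctly paired batches give a lower loss than mismatched ones. A smaller temperature sharpens the softmax and penalizes hard negatives more strongly.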
Kejun Lin, Zhixiang Wang, Zheng Wang, Yinqiang Zheng, Shin'ichi Satoh
Person re-identification (re-ID) requires densely distributed cameras. In
practice, the person of interest may not be captured by cameras and, therefore,
needs to be retrieved using subjective information (e.g., sketches from
witnesses). Previous research refers to this sketch-based setting as sketch
re-identification (Sketch re-ID) and focuses on eliminating the domain gap.
However, subjectivity is another significant challenge. We model and
investigate it by proposing a new dataset with multi-witness descriptions. It
has two key features. 1) Large-scale. It contains over 4,763 sketches and
32,668 photos, making it the largest Sketch re-ID dataset. 2) Multi-perspective
and multi-style. Our dataset offers multiple sketches for each identity.
Witnesses' subjective cognition provides multiple perspectives on the same
individual, while different artists' drawing styles provide variation in sketch
styles. We further propose two novel designs to alleviate the challenge of
subjectivity. 1) Fusing subjectivity. We propose a non-local (NL) fusion module
that gathers sketches from different witnesses for the same identity. 2)
Introducing objectivity. An AttrAlign module utilizes attributes as an implicit
mask to align cross-domain features. To push forward the advance of Sketch
re-ID, we set three benchmarks (large-scale, multi-style, cross-style).
Extensive experiments demonstrate our leading performance in these benchmarks.
The dataset and code are publicly available at:
https://github.com/Lin-Kayla/subjectivity-sketch-reid
Authors' comments: ACM Multimedia 2023
Tal Peer, Simon Welker, Johannes Kolhoff, Timo Gerkmann
Several recent contributions in the field of iterative STFT phase retrieval
have demonstrated that the performance of the classical Griffin-Lim method can
be considerably improved upon. By using the same projection operators as
Griffin-Lim, but combining them in innovative ways, these approaches achieve
better results in terms of both reconstruction quality and required number of
iterations, while retaining a similar computational complexity per iteration.
However, like Griffin-Lim, these algorithms operate in an offline manner and
thus require an entire spectrogram as input, which is an unrealistic
requirement for many real-world speech communication applications. We propose
to extend RTISI -- an existing online (frame-by-frame) variant of the
Griffin-Lim algorithm -- into a flexible framework that enables straightforward
online implementation of any algorithm based on iterative projections. We
further employ this framework to implement online variants of the fast
Griffin-Lim algorithm, the accelerated Griffin-Lim algorithm, and two
algorithms from the optics domain. Evaluation results on speech signals show
that, similarly to the offline case, these algorithms can achieve a
considerable performance gain compared to RTISI.
Authors' comments: Submitted to ICASSP 24
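One of the algorithms the abstract extends is the fast Griffin-Lim algorithm (FGLA). FGLA alternates the same two projections as Griffin-Lim and adds a momentum-style extrapolation step. For orientation, below is a minimal offline sketch using SciPy's `stft`/`istft`. The frame length, alpha = 0.99, and the iteration count are assumptions. The sketch needs the whole spectrogram up front, which is exactly the limitation the paper addresses. The online, frame-by-frame RTISI-based framework is not shown.

```python
import numpy as np
from scipy.signal import istft, stft

NPERSEG = 256  # assumed frame length; SciPy's defaults give a Hann window, 50% overlap


def analysis(x: np.ndarray) -> np.ndarray:
    """Complex STFT of a real signal."""
    return stft(x, nperseg=NPERSEG)[2]


def synthesis(spec: np.ndarray, n_samples: int) -> np.ndarray:
    """Inverse STFT, trimmed to the original signal length (istft may return padding)."""
    return istft(spec, nperseg=NPERSEG)[1][:n_samples]


def fast_griffin_lim(target_mag: np.ndarray, n_samples: int,
                     n_iters: int = 100, alpha: float = 0.99) -> np.ndarray:
    """Offline fast Griffin-Lim: find a signal whose STFT magnitude approximates target_mag."""
    c_prev = target_mag.astype(complex)  # zero-phase initialisation
    t = c_prev
    for _ in range(n_iters):
        # Projection onto spectrograms with the target magnitude (keep the phase)
        with_mag = target_mag * np.exp(1j * np.angle(t))
        # Projection onto consistent spectrograms (STFT of an actual signal)
        c = analysis(synthesis(with_mag, n_samples))
        # Acceleration step: extrapolate along the last update
        t = c + alpha * (c - c_prev)
        c_prev = c
    return synthesis(target_mag * np.exp(1j * np.angle(c_prev)), n_samples)


def spectral_convergence(x: np.ndarray, target_mag: np.ndarray) -> float:
    """Relative magnitude error || |STFT(x)| - target || / || target ||."""
    return float(np.linalg.norm(np.abs(analysis(x)) - target_mag) / np.linalg.norm(target_mag))


fs = 8000
n = 4000
time = np.arange(n) / fs
signal = np.sin(2 * np.pi * 440 * time) + 0.5 * np.sin(2 * np.pi * 1000 * time)
target = np.abs(analysis(signal))
estimate = fast_griffin_lim(target, n)
zero_phase = synthesis(target.astype(complex), n)
```

Setting `alpha=0` turns the loop into plain Griffin-Lim, because each iteration then reuses the phase of the last consistent spectrogram directly.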
Chao-Wei Huang, Chen-Yu Hsu, Tsu-Yuan Hsu, Chen-An Li, Yun-Nung Chen
Conversational search provides a natural interface for information retrieval
(IR). Recent approaches have demonstrated promising results in applying dense
retrieval to conversational IR. However, training dense retrievers requires
large amounts of in-domain paired data. This hinders the development of
conversational dense retrievers, as abundant in-domain conversations are
expensive to collect. In this paper, we propose CONVERSER, a framework for
training conversational dense retrievers with at most 6 examples of in-domain
dialogues. Specifically, we utilize the in-context learning capability of large
language models to generate conversational queries given a passage in the
retrieval corpus. Experimental results on conversational retrieval benchmarks
OR-QuAC and TREC CAsT 19 show that the proposed CONVERSER achieves comparable
performance to fully-supervised models, demonstrating the effectiveness of our
proposed framework in few-shot conversational dense retrieval. All source code
and generated datasets are available at https://github.com/MiuLab/CONVERSER
Authors' comments: Accepted to SIGDIAL 2023
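The data-generation idea (an LLM writes conversational queries for corpus passages, conditioned on at most 6 in-domain examples) can be illustrated with a prompt builder. The template and function below are hypothetical. The abstract gives neither CONVERSER's actual prompt format nor how dialogue history is represented, and the LLM call itself is omitted. Each generated query, paired with its source passage, would then serve as a pseudo training pair for the dense retriever.

```python
def build_few_shot_prompt(examples: list[tuple[str, str]], passage: str,
                          max_examples: int = 6) -> str:
    """Build an in-context prompt asking an LLM for a conversational query about `passage`."""
    if len(examples) > max_examples:
        raise ValueError(f"expected at most {max_examples} in-domain examples")
    blocks = [
        f"Passage: {ex_passage}\nConversational query: {ex_query}"
        for ex_passage, ex_query in examples
    ]
    # The target passage goes last, with the query left for the LLM to complete
    blocks.append(f"Passage: {passage}\nConversational query:")
    return "\n\n".join(blocks)


examples = [
    ("The Eiffel Tower was completed in 1889.", "when was it finished?"),
]
prompt = build_few_shot_prompt(examples, "Mount Fuji is the highest mountain in Japan.")
```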
Guoyuan An, Woo Jae Kim, Saelyne Yang, Rong Li, Yuchi Huo, Sung-Eui Yoon
This paper introduces the first two pixel retrieval benchmarks. Pixel retrieval is segmented instance retrieval. Just as semantic segmentation extends classification to the pixel level, pixel retrieval extends image retrieval and indicates which pixels are related to the query object. In addition to retrieving images for the given query, it helps users quickly identify the query object in true positive images and exclude false positive images by marking the correlated pixels. Our user study results show that pixel-level annotation can significantly improve the user experience. Compared with semantic and instance segmentation, pixel retrieval requires a fine-grained recognition capability for variable-granularity targets. To this end, we propose pixel retrieval benchmarks named PROxford and PRParis, which are based on the widely used image retrieval datasets ROxford and RParis. Three professional annotators label 5,942 images with two rounds of double-checking and refinement. Furthermore, we conduct extensive experiments and analysis on SOTA methods in image search, image matching, detection, segmentation, and dense matching using our pixel retrieval benchmarks. Results show that the pixel retrieval task is challenging for these approaches and distinct from existing problems, suggesting that further research can advance content-based pixel retrieval and thus the user search experience. The datasets can be downloaded from https://github.com/anguoyuan/Pixel_retrieval-Segmented_instance_retrieval.
Georgios Varnavides, Stephanie M. Ribet, Steven E. Zeltmann, Yue Yu, Benjamin H. Savitzky, Vinayak P. Dravid, Mary C. Scott, Colin Ophus
Scanning transmission electron microscopy (STEM) has been extensively used
for imaging complex materials down to atomic resolution. The most commonly
employed STEM imaging modality, annular dark field, produces
easily-interpretable contrast, but is dose-inefficient and produces little to
no contrast for light elements and weakly-scattering samples. An alternative is
to use phase contrast STEM imaging, enabled by high speed detectors able to
record full images of a diffracted STEM probe over a grid of scan positions.
Phase contrast imaging in STEM is highly dose-efficient, able to measure the
structure of beam-sensitive materials and even biological samples. Here, we
comprehensively describe the theoretical background, algorithmic implementation
details, and perform both simulated and experimental tests for three iterative
phase retrieval STEM methods: focused-probe differential phase contrast,
defocused-probe parallax imaging, and a generalized ptychographic gradient
descent method implemented in two and three dimensions. We discuss the
strengths and weaknesses of each of these approaches using a consistent
framework to allow for easier comparison. This presentation of STEM phase
retrieval methods will make these methods more approachable, reproducible and
more readily adoptable for many classes of samples.
Authors' comments: 25 pages, 11 figures, 1 table
Rongsheng Li, Yangning Li, Yinghui Li, Chaiyut Luoyiching, Hai-Tao Zheng, Nannan Zhou, Hanjing Su
Meta-learning has achieved promising performance in low-resource text
classification which aims to identify target classes with knowledge transferred
from source classes with sets of small tasks named episodes. However, due to
the limited training data in the meta-learning scenario and the inherent
properties of parameterized neural networks, poor generalization performance
has become a pressing problem that needs to be addressed. To deal with this
issue, we propose a meta-learning based method called Retrieval-Augmented Meta
Learning (RAML). It not only uses parameterization for inference but also
retrieves non-parametric knowledge from an external corpus to make inferences,
which greatly alleviates the problem of poor generalization performance caused
by the lack of diverse training data in meta-learning. This method differs from
previous models that solely rely on parameters, as it explicitly emphasizes the
importance of non-parametric knowledge, aiming to strike a balance between
parameterized neural networks and non-parametric knowledge. The model is
required to determine which knowledge to access and utilize during inference.
Additionally, our multi-view passages fusion network module can effectively and
efficiently integrate the retrieved information into the low-resource
classification task. Extensive experiments demonstrate that RAML
significantly outperforms current SOTA low-resource text classification models.
Authors' comments: Under Review
Rima Hazra, Debanjan Saha, Amruit Sahoo, Somnath Banerjee, Animesh Mukherjee
Community Question Answering (CQA) in different domains is growing at a large
scale because of the availability of several platforms and the large amount of
information shared among users. With the rapid growth of such online platforms, a
massive amount of archived data makes it difficult for moderators to retrieve
possible duplicates for a new question and identify and confirm existing
question pairs as duplicates at the right time. This problem is even more
critical in CQAs corresponding to large software systems like askubuntu, where
moderators need domain expertise to recognize a question as a duplicate. Note that
the prime challenge in such CQA platforms is that the moderators are themselves
experts and are therefore usually extremely busy with their time being
extraordinarily expensive. To facilitate the task of the moderators, in this
work, we have tackled two significant issues for the askubuntu CQA platform:
(1) retrieval of duplicate questions given a new question and (2) duplicate
question confirmation time prediction. In the first task, we focus on
retrieving duplicate questions from a question pool for a particular newly
posted question. In the second task, we solve a regression problem to rank a
pair of questions that could potentially take a long time to get confirmed as
duplicates. For duplicate question retrieval, we propose a Siamese neural
network based approach by exploiting both text and network-based features,
which outperforms several state-of-the-art baseline techniques. Our method
outperforms DupPredictor and DUPE by 5% and 7% respectively. For duplicate
confirmation time prediction, we use both standard machine learning
models and neural networks along with the text and graph-based features. We
obtain Spearman's rank correlation of 0.20 and 0.213 (statistically
significant) for text and graph based features respectively.
Authors' comments: Full paper accepted at ASONAM 2023: The 2023 IEEE/ACM International
Conference on Advances in Social Networks Analysis and Mining
Hiba Ahsan, Denis Jered McInerney, Jisoo Kim, Christopher Potter, Geoffrey Young, Silvio Amir, Byron C. Wallace
Unstructured Electronic Health Record (EHR) data often contains critical information, complementary to imaging data, that would inform radiologists' diagnoses. However, time constraints and the large volume of notes frequently associated with individual patients render manual perusal of such data to identify relevant evidence infeasible in practice. Modern Large Language Models (LLMs) provide a flexible means of interacting with unstructured EHR data and may offer a mechanism to efficiently retrieve and summarize unstructured evidence relevant to a given query. In this work, we propose and evaluate using an LLM (Flan-T5 XXL) for this purpose. Specifically, in a zero-shot setting we task the LLM with inferring whether a patient has, or is at risk of, a particular condition; if so, we prompt the model to summarize the supporting evidence. Enlisting radiologists for manual evaluation, we find that this LLM-based approach provides outputs consistently preferred over a standard information retrieval baseline, but we also highlight the key outstanding challenge: LLMs are prone to hallucinating evidence. However, we provide results indicating that model confidence in outputs might indicate when LLMs are hallucinating, potentially providing a means to address this.
Jinyuan Wang, Hai Zhao, Zhong Wang, Zeyang Zhu, Jinhao Xie, Yong Yu, Yongjian Fei, Yue Huang et al.
In recent years, great advances in pre-trained language models (PLMs) have sparked considerable research interest and achieved promising performance in dense passage retrieval, which aims to retrieve relevant passages from a massive corpus for given questions. However, most existing datasets mainly benchmark models with factoid queries of general commonsense, while specialised fields such as finance and economics remain unexplored due to the lack of large-scale, high-quality datasets with expert annotations. In this work, we propose a new task, policy retrieval, by introducing the Chinese Stock Policy Retrieval Dataset (CSPRD), which provides 700+ prospectus passages labeled by experienced experts with relevant articles from 10k+ entries in our collected Chinese policy corpus. Experiments on lexical, embedding and fine-tuned bi-encoder models show the effectiveness of our proposed CSPRD yet also suggest ample room for improvement. Our best-performing baseline achieves 56.1% MRR@10, 28.5% NDCG@10, 37.5% Recall@10 and 80.6% Precision@10 on the dev set.
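The abstract reports four ranking metrics at cutoff 10 but does not define them. The sketch below uses common definitions, under stated assumptions: binary relevance, Precision@k divided by k, and per-query scores that would be averaged over all dev-set queries. The article IDs are hypothetical.

```python
import math


def mrr_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """Reciprocal rank of the first relevant item within the top k, else 0."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def precision_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the top-k slots filled by relevant items."""
    return len(set(ranked[:k]) & relevant) / k


def recall_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of all relevant items that appear in the top k."""
    return len(set(ranked[:k]) & relevant) / len(relevant) if relevant else 0.0


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """Binary-relevance nDCG: DCG of the ranking divided by the DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(ranked[:k], start=1) if doc in relevant)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(relevant), k) + 1))
    return dcg / idcg if idcg > 0 else 0.0


# One query: articles ranked by a retriever, and the expert-labelled relevant set
ranked = ["art_7", "art_2", "art_9", "art_4"]
relevant = {"art_2", "art_4", "art_5"}
```

Under these definitions, Precision@10 can exceed Recall@10 only when queries have more than 10 relevant articles on average. The dataset's exact conventions may differ.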
Yanran Tang, Ruihong Qiu, Xue Li
Legal case retrieval plays an important role in helping legal practitioners effectively retrieve relevant cases given a query case. Most existing neural legal case retrieval models directly encode the whole legal text of a case to generate a case representation, which is then used to conduct a nearest neighbour search for retrieval. Although these straightforward methods have improved retrieval accuracy over conventional statistical methods, this paper identifies two significant challenges: (1) Legal feature alignment: using the whole case text as input generally incorporates redundant and noisy information because, from the legal perspective, the determining factor for relevant cases is the alignment of key legal features rather than whole-text matching; (2) Legal context preservation: since existing text encoding models usually have an input length limit shorter than a case, the whole case text needs to be truncated or divided into paragraphs, which leads to the loss of the global context of legal information. In this paper, a novel legal case retrieval framework, PromptCase, is proposed to tackle these challenges. Firstly, legal facts and legal issues are identified and formally defined as the key features facilitating legal case retrieval, based on a thorough study of the definition of relevant cases from a legal perspective. Secondly, with these legal features determined, a prompt-based encoding scheme is designed to conduct effective encoding with language models. Extensive zero-shot experiments on two benchmark datasets in legal case retrieval demonstrate the superior retrieval effectiveness of the proposed PromptCase. The code has been released at https://github.com/yanran-tang/PromptCase.
Mayank Gite
Breached data is confidential or sensitive information that has been accessed, stolen, or exposed without authorization. Breaches typically occur when malicious actors or unauthorized users penetrate secure systems or networks, resulting in compromised personally identifiable information (PII), protected or personal health information (PHI), payment card industry (PCI) information, or other sensitive data. Data breaches are often the result of malicious activities such as hacking, phishing, insider threats, malware, or physical theft. The misuse of breached data can lead to identity theft, fraud, spam, or blackmail. Organizations that experience data breaches may face legal and financial consequences, reputational damage, and harm to their customers or users. Breached records are commonly sold on the dark web or made available on various public forums. To counteract these malicious activities, it is possible to collect breached databases and use them to mitigate potential harm. These databases can be quite large, reaching 150 GB or more. Typically, breached data is stored in the CSV (Comma Separated Values) format because its simplicity and lightweight nature reduce storage requirements. Analyzing and traversing large breached databases demands substantial computational power. This research explores techniques to optimize database traversal speed without the need to rent expensive cloud machines or virtual private servers (VPS). This optimization will enable individual security researchers to analyze and process large databases on their personal computers while significantly reducing costs.
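The abstract does not name the specific traversal techniques it studies. As a baseline for comparison, one common approach keeps memory use flat on a personal computer by streaming the CSV one row at a time instead of loading the whole file. The sketch below assumes this approach and uses Python's standard `csv` module. The file name and the `email` column are hypothetical.

```python
import csv
import io
from typing import Iterable, Iterator


def search_csv(lines: Iterable[str], column: str, needle: str) -> Iterator[dict[str, str]]:
    """Yield rows whose `column` contains `needle`, parsing one row at a time."""
    # csv.DictReader pulls rows lazily, so memory use does not grow with file size
    for row in csv.DictReader(lines):
        value = row.get(column)
        if value is not None and needle in value:
            yield row


# In practice `lines` would be an open file, e.g.
#   with open("dump.csv", newline="", encoding="utf-8", errors="replace") as f:
#       for row in search_csv(f, "email", "@example.com"): ...
sample = io.StringIO("email,name\nalice@example.com,Alice\nbob@other.org,Bob\n")
matches = list(search_csv(sample, "email", "@example.com"))
```

Streaming bounds memory but not run time. For repeated lookups on a 150 GB file, building an index or splitting the file for parallel scans would be the natural next step.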
Jian-Feng Cai, Yu Long, Ruixue Wen, Jiaxi Ying
We study the sparse phase retrieval problem, which seeks to recover a sparse signal from a limited set of magnitude-only measurements. In contrast to prevalent sparse phase retrieval algorithms that primarily use first-order methods, we propose an innovative second-order algorithm that employs a Newton-type method with hard thresholding. This algorithm overcomes the linear convergence limitations of first-order methods while preserving their hallmark per-iteration computational efficiency. We provide theoretical guarantees that our algorithm converges to the $s$-sparse ground truth signal $\mathbf{x}^{\natural} \in \mathbb{R}^n$ (up to a global sign) at a quadratic convergence rate after at most $O(\log (\Vert\mathbf{x}^{\natural} \Vert /x_{\min}^{\natural}))$ iterations, using $\Omega(s^2\log n)$ Gaussian random samples. Numerical experiments show that our algorithm achieves a significantly faster convergence rate than state-of-the-art methods.
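The sketch below covers only two ingredients fixed by the problem statement, not the paper's Newton-type algorithm. The first is the magnitude-only Gaussian measurement model, assumed here to take the squared form $y_i = (\mathbf{a}_i^\top \mathbf{x})^2$. It explains why $\mathbf{x}^{\natural}$ is recoverable only up to a global sign. The second is the hard-thresholding operator $H_s$ that enforces $s$-sparsity. The Newton step, support estimation, and initialisation are not shown, and the problem sizes are arbitrary.

```python
import numpy as np


def hard_threshold(x: np.ndarray, s: int) -> np.ndarray:
    """H_s(x): keep the s largest-magnitude entries of x and set the rest to zero."""
    if s >= x.size:
        return x.copy()
    out = np.zeros_like(x)
    if s <= 0:
        return out
    keep = np.argpartition(np.abs(x), -s)[-s:]  # indices of the s largest |x_i|
    out[keep] = x[keep]
    return out


def magnitude_measurements(a: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Phaseless measurements y_i = (a_i^T x)^2 for real Gaussian rows a_i."""
    return (a @ x) ** 2


rng = np.random.default_rng(0)
n, s, m = 50, 3, 200
x_true = np.zeros(n)
x_true[[4, 17, 33]] = [1.5, -2.0, 0.7]  # an s-sparse ground truth
a = rng.normal(size=(m, n))              # Gaussian measurement vectors
y = magnitude_measurements(a, x_true)
```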
Jiawei Chen, Hongyu Lin, Xianpei Han, Le Sun
Retrieval-Augmented Generation (RAG) is a promising approach for mitigating
the hallucination of large language models (LLMs). However, existing research
lacks rigorous evaluation of the impact of retrieval-augmented generation on
different large language models, which makes it challenging to identify the
potential bottlenecks in the capabilities of RAG for different LLMs. In this
paper, we systematically investigate the impact of Retrieval-Augmented
Generation on large language models. We analyze the performance of different
large language models in 4 fundamental abilities required for RAG, including
noise robustness, negative rejection, information integration, and
counterfactual robustness. To this end, we establish Retrieval-Augmented
Generation Benchmark (RGB), a new corpus for RAG evaluation in both English and
Chinese. RGB divides the instances within the benchmark into 4 separate
testbeds based on the aforementioned fundamental abilities required to resolve
the case. Then we evaluate 6 representative LLMs on RGB to diagnose the
challenges of current LLMs when applying RAG. Evaluation reveals that while
LLMs exhibit a certain degree of noise robustness, they still struggle
significantly in terms of negative rejection, information integration, and
dealing with false information. The aforementioned assessment outcomes indicate
that there is still a considerable journey ahead to effectively apply RAG to
LLMs.
Authors' comments: Accepted to AAAI 2024