Arian Askari, Zihui Yang, Zhaochun Ren, Suzan Verberne
The task of answer retrieval in the legal domain aims to help users to seek
relevant legal advice from massive amounts of professional responses. Two main
challenges hinder applying existing answer retrieval approaches from other
domains to the legal domain: (1) a huge knowledge gap between lawyers and
non-professionals; and (2) a mix of informal and formal content on legal QA
websites. To tackle these challenges, we propose CE_FS, a novel cross-encoder
(CE) re-ranker based on fine-grained structured inputs. CE_FS uses
additional structured information in the CQA data to improve the effectiveness
of cross-encoder re-rankers. Furthermore, we propose LegalQA: a real-world
benchmark dataset for evaluating answer retrieval in the legal domain.
Experiments conducted on LegalQA show that our proposed method significantly
outperforms strong cross-encoder re-rankers fine-tuned on MS MARCO. Our novel
finding is that structurally adding the tags of each question, alongside the
question title and description, to the input of cross-encoder re-rankers
boosts the rankers' effectiveness. While we study our proposed method in the
legal domain, we believe that our method can be applied in similar applications
in other domains.
Authors' comments: accepted at ECIR 2024
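To illustrate the idea of feeding title, description, and tags to a cross-encoder as separate structured fields, here is a minimal sketch. The field markers ([TITLE], [DESC], [TAGS]) and the function name are hypothetical; the abstract does not give the actual CE_FS input template.

```python
def build_structured_input(title, description, tags, answer):
    """Return a (question_text, answer_text) pair for a cross-encoder.

    The field markers are hypothetical placeholders, not the
    template used by CE_FS.
    """
    parts = [f"[TITLE] {title.strip()}"]
    if description.strip():
        parts.append(f"[DESC] {description.strip()}")
    if tags:
        parts.append("[TAGS] " + ", ".join(t.strip() for t in tags))
    return " ".join(parts), answer.strip()
```

The resulting pair would then be tokenized and scored by whichever cross-encoder is in use.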
Nikhilesh Bhatnagar, Ashok Urlana, Vandan Mujadia, Pruthwik Mishra, Dipti Misra Sharma
Cross-lingual summarization involves the summarization of text written in one
language to a different one. There is a body of research addressing
cross-lingual summarization from English to other European languages. In this
work, we aim to perform cross-lingual summarization from English to Hindi. We
propose that pairing up the coverage of newsworthy events in textual and video
formats can be helpful for data acquisition for cross-lingual
summarization. We analyze the data and propose methods to match articles to
video descriptions that serve as document and summary pairs. We also outline
filtering methods over reasonable thresholds to ensure the correctness of the
summaries. Further, we make available 28,583 monolingual and cross-lingual
article-summary pairs at https://github.com/tingc9/Cross-Sum-News-Aligned. We also
build and analyze multiple baselines on the collected data and report error
analysis.
Authors' comments: 6 pages, 6 tables, 2 figures, conference: ICON 2023
Raviteja Anantha, Bortik Bandyopadhyay, Anirudh Kashi, Sayantan Mahinder, Andrew W Hill, Srinivas Chappidi
Large language models (LLMs) are increasingly employed for complex multi-step
planning tasks, where the tool retrieval (TR) step is crucial for achieving
successful outcomes. Two prevalent approaches for TR are single-step retrieval,
which utilizes the complete query, and sequential retrieval using task
decomposition (TD), where a full query is segmented into discrete atomic
subtasks. While single-step retrieval lacks the flexibility to handle
"inter-tool dependency," the TD approach necessitates maintaining "subtask-tool
atomicity alignment," as the toolbox can evolve dynamically. To address these
limitations, we introduce the Progressive Tool retrieval to Improve Planning
(ProTIP) framework. ProTIP is a lightweight, contrastive learning-based
framework that implicitly performs TD without the explicit requirement of
subtask labels, while simultaneously maintaining subtask-tool atomicity. On the
ToolBench dataset, ProTIP outperforms the ChatGPT task decomposition-based
approach by a remarkable margin, achieving a 24% improvement in Recall@K=10 for
TR and a 41% enhancement in tool accuracy for plan generation.
Authors' comments: preprint version
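For reference, the Recall@K metric reported in the abstract above can be computed as follows. This is a generic sketch of the standard definition, not the authors' evaluation code; the function name and arguments are illustrative.

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of relevant items that appear in the top-k retrieved items."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = len(relevant & set(retrieved[:k]))
    return hits / len(relevant)
```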
Hangfei Lin, Li Miao, Amir Ziai
Few-shot image classification is the task of classifying unseen images into one of N mutually exclusive classes, using only a small number of training examples for each class. The limited availability of these examples (denoted as K) presents a significant challenge to classification accuracy in some cases. To address this, we have developed a method for augmenting the set of K with an additional set of A retrieved images. We call this system Retrieval-Augmented Few-shot Image Classification (RAFIC). Through a series of experiments, we demonstrate that RAFIC markedly improves performance of few-shot image classification across two challenging datasets. RAFIC consists of two main components: (a) a retrieval component which uses CLIP, LAION-5B, and faiss to efficiently retrieve images similar to the supplied images, and (b) retrieval meta-learning, which learns to judiciously utilize the retrieved images. Code and data are available at github.com/amirziai/rafic.
Xuechen Liu, Xin Wang, Erica Cooper, Xiaoxiao Miao, Junichi Yamagishi
In this study, we introduce a novel cross-modal retrieval task involving
speaker descriptions and their corresponding audio samples. Utilizing
pre-trained speaker and text encoders, we present a simple learning framework
based on contrastive learning. Additionally, we explore the impact of
incorporating speaker labels into the training process. Our findings establish
the effectiveness of linking speaker and text information for the task in both
English and Japanese, across diverse data configurations. Additional
visual analysis unveils potential nuanced associations between speaker
clustering and retrieval performance.
Authors' comments: Submitted to IEEE Signal Processing Letters
Raviteja Anantha, Tharun Bethi, Danil Vodianik, Srinivas Chappidi
Large language models (LLMs) have the remarkable ability to solve new tasks
with just a few examples, but they need access to the right tools. Retrieval
Augmented Generation (RAG) addresses this problem by retrieving a list of
relevant tools for a given task. However, RAG's tool retrieval step requires
all the required information to be explicitly present in the query. This is a
limitation, as semantic search, the widely adopted tool retrieval method, can
fail when the query is incomplete or lacks context. To address this limitation,
we propose Context Tuning for RAG, which employs a smart context retrieval
system to fetch relevant information that improves both tool retrieval and plan
generation. Our lightweight context retrieval model uses numerical,
categorical, and habitual usage signals to retrieve and rank context items. Our
empirical results demonstrate that context tuning significantly enhances
semantic search, achieving a 3.5-fold and 1.5-fold improvement in Recall@K for
context retrieval and tool retrieval tasks respectively, and resulting in an
11.6% increase in LLM-based planner accuracy. Additionally, we show that our
proposed lightweight model using Reciprocal Rank Fusion (RRF) with LambdaMART
outperforms GPT-4 based retrieval. Moreover, we observe that context augmentation
at plan generation, even after tool retrieval, reduces hallucination.
Authors' comments: preprint version
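The Reciprocal Rank Fusion (RRF) step mentioned in the abstract above can be sketched as below. This covers only the standard RRF formula, score(d) = sum over rankings of 1 / (k + rank), with the commonly used constant k = 60 as an assumption; it does not reproduce the LambdaMART component or the signals the authors fuse.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists into one list ordered by RRF score.

    rankings: list of ranked lists of item ids (best first).
    k: smoothing constant; 60 is a common default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Sort by descending score; break ties by item id for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))
```

Items that rank well in several lists rise to the top even if no single list puts them first.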
Susav Shrestha, Narasimha Reddy, Zongwang Li
Recent advances in large language models have demonstrated remarkable
effectiveness in information retrieval (IR) tasks. While many neural IR systems
encode queries and documents into single-vector representations, multi-vector
models elevate the retrieval quality by producing multi-vector representations
and facilitating similarity searches at the granularity of individual tokens.
However, these models increase the memory and storage requirements of
retrieval indices by an order of magnitude, which makes multi-vector IR models
increasingly difficult to scale. We introduce Embedding from Storage
Pipelined Network (ESPN) where we offload the entire re-ranking embedding
tables to SSDs and reduce the memory requirements by 5-16x. We design a
software prefetcher with hit rates exceeding 90%, improving SSD-based retrieval
by up to 6.4x, and demonstrate that we can maintain near-memory levels of query
latency even for large query batch sizes.
Authors' comments: 10 pages, 10 figures
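To make "similarity searches at the granularity of individual tokens" concrete, the sketch below shows one common form of multi-vector scoring, the ColBERT-style MaxSim operator. The abstract does not name the scoring function used with ESPN, so treating it as MaxSim is an assumption.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best-matching document token similarity.

    query_vecs, doc_vecs: lists of per-token embedding vectors.
    """
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)
```

Because every document token keeps its own vector, the index grows with the number of tokens rather than the number of documents, which is the storage cost the abstract refers to.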
Yujie Qian, Zhening Li, Zhengkai Tu, Connor W. Coley, Regina Barzilay
This paper focuses on using natural language descriptions to enhance
predictive models in the chemistry field. Conventionally, chemoinformatics
models are trained with extensive structured data manually extracted from the
literature. In this paper, we introduce TextReact, a novel method that directly
augments predictive chemistry with texts retrieved from the literature.
TextReact retrieves text descriptions relevant for a given chemical reaction,
and then aligns them with the molecular representation of the reaction. This
alignment is enhanced via an auxiliary masked LM objective incorporated in the
predictor training. We empirically validate the framework on two chemistry
tasks: reaction condition recommendation and one-step retrosynthesis. By
leveraging text retrieval, TextReact significantly outperforms state-of-the-art
chemoinformatics models trained solely on molecular data.
Authors' comments: EMNLP 2023
Hao Li, Curise Jia, Peng Jin, Zesen Cheng, Kehan Li, Jialu Sui, Chang Liu, Li Yuan
Image Retrieval aims to retrieve corresponding images based on a given query.
In application scenarios, users intend to express their retrieval intent
through various query styles. However, current retrieval tasks predominantly
focus on text-query retrieval exploration, leading to limited retrieval query
options and potential ambiguity or bias in user intention. In this paper, we
propose the Style-Diversified Query-Based Image Retrieval task, which enables
retrieval based on various query styles. To facilitate the novel setting, we
propose the first Diverse-Style Retrieval dataset, encompassing diverse query
styles including text, sketch, low-resolution, and art. We also propose a
lightweight style-diversified retrieval framework. For various query style
inputs, we apply the Gram Matrix to extract the query's textural features and
cluster them into a style space with style-specific bases. Then we employ the
style-init prompt tuning module to enable the visual encoder to comprehend the
texture and style information of the query. Experiments demonstrate that our
model, employing the style-init prompt tuning strategy, outperforms existing
retrieval models on the style-diversified retrieval task. Moreover,
our model can retrieve with combined query styles (sketch+text, art+text, etc.)
simultaneously. The auxiliary information from other queries enhances
the retrieval performance within the respective query.
Authors' comments: 16 pages, 7 figures
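The Gram matrix used above to capture textural features can be sketched as follows. This is the generic, unnormalized definition (channel-by-channel inner products of a flattened feature map); the feature extractor and any normalization used by the authors are not stated in the abstract.

```python
def gram_matrix(features):
    """Compute G[i][j] = sum_n features[i][n] * features[j][n].

    features: list of C channels, each a flat list of N activations.
    Returns a C x C matrix as nested lists.
    """
    return [
        [sum(a * b for a, b in zip(fi, fj)) for fj in features]
        for fi in features
    ]
```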
Kai Liu, Deguang Han
A phase retrievable quantum channel refers to a quantum channel $\Phi: B(H_A)\to B(H_B)$ for which there is a positive operator valued measure (POVM) $\{F_{j}\}$ in $B(H_{B})$ such that $\{\Phi^*(F_j)\}$ is a phase retrievable operator valued frame. In this paper we examine the phase retrievable quantum channels in terms of their Kraus representations. For quantum channels $\Phi$ of Choi's rank-$2$, we obtain a necessary and sufficient condition under which they are phase retrievable. For the general case, we present several necessary and/or sufficient conditions. In particular, a necessary and sufficient condition is obtained in terms of the relevant matrix-valued joint spectrum of the Kraus operators. We also examine, by examples, the problem of constructing quantum channels such that there exists a minimal number of rank-one observables $\{F_{j}\}$ such that $\{\Phi^*(F_j)\}$ does phase retrieval for $H_A$. Conversely, for a given set of rank-one observables $\{F_{j}\}_{j=1}^{N}$, we present a sufficient condition under which, for every $1\leq r\leq N$ given, a phase retrievable quantum channel $\Phi$ of Choi's rank-$r$ can be explicitly constructed.
Chakradhar Reddy Nallu
This paper develops several algorithms that generate a task tree plan for a given goal node (recipe). The knowledge representation of the dishes is called FOON. It contains the different objects and the relations between them with respect to the motion node. The graphical representation of FOON is built by noting the change in the state of an object with respect to the human manipulators. We explore how the FOON is created for different recipes by the robots. Task planning has difficulty with unknown problems, as its knowledge is limited to the FOON. To obtain the task tree plan for a given recipe, the robot retrieves the information of different functional units from FOON through a knowledge retrieval process. The generated subgraphs then allow the robot to cook the required dish by following the sequence of instructions.
Shitong Sun, Jindong Gu, Shaogang Gong
Text-image composed retrieval aims to retrieve the target image through the
composed query, which is specified in the form of an image plus some text that
describes desired modifications to the input image. It has recently attracted
attention due to its ability to leverage both information-rich images and
concise language to precisely express the requirements for target images.
However, the robustness of these approaches against real-world corruptions or
further text understanding has never been studied. In this paper, we perform
the first robustness study and establish three new diversified benchmarks for
systematic analysis of text-image composed retrieval against natural
corruptions in both vision and text and further probe textual understanding.
For natural corruption analysis, we introduce two new large-scale benchmark
datasets, CIRR-C and FashionIQ-C for testing in open domain and fashion domain
respectively, both of which apply 15 visual corruptions and 7 textual
corruptions. For textual understanding analysis, we introduce a new diagnostic
dataset CIRR-D by expanding the original raw data with synthetic data, which
contains modified text to better probe textual understanding ability including
numerical variation, attribute variation, object removal, background variation,
and fine-grained evaluation. The code and benchmark datasets are available at
https://github.com/SunTongtongtong/Benchmark-Robustness-Text-Image-Compose-Retrieval.
Authors' comments: Accepted by R0-FoMo: Workshop on Robustness of Few-shot and Zero-shot
Learning in Foundation Models at NeurIPS 2023
Farnaz Khun Jush, Tuan Truong, Steffen Vogler, Matthias Lenga
A wide range of imaging techniques and data formats available for medical
images make accurate retrieval from image databases challenging.
Efficient retrieval systems are crucial in advancing medical research,
enabling large-scale studies and innovative diagnostic tools. Thus, addressing
the challenges of medical image retrieval is essential for the continued
enhancement of healthcare and research.
In this study, we evaluated the feasibility of employing four
state-of-the-art pretrained models for medical image retrieval at modality,
body region, and organ levels and compared the results of two similarity
indexing approaches. Since the employed networks take 2D images, we analyzed
the impacts of weighting and sampling strategies to incorporate 3D information
during retrieval of 3D volumes. We showed that medical image retrieval is
feasible using pretrained networks without any additional training or
fine-tuning steps. Using pretrained embeddings, we achieved a recall of 1 for
various tasks at modality, body region, and organ level.
Authors' comments: 8 pages, 3 figures, 4 tables
Tong Wu, Yulei Qin, Enwei Zhang, Zihan Xu, Yuting Gao, Ke Li, Xing Sun
Retrieval augmentation has become an effective solution to empower large language models (LLMs) with external and verified knowledge sources from the database, which overcomes the limitations and hallucinations of LLMs in handling up-to-date and domain-specific information. However, existing embedding models for text retrieval usually have three non-negligible limitations. First, the number and diversity of samples in a batch are too restricted to supervise the modeling of textual nuances at scale. Second, the high proportion of noise is detrimental to the semantic correctness and consistency of embeddings. Third, treating easy and difficult samples equally causes sub-optimal convergence of embeddings with poorer generalization. In this paper, we propose PEG, progressively learned embeddings for robust text retrieval. Specifically, we increase the number of in-batch negative samples during training to 80,000, and for each query, we extract five hard negatives. Concurrently, we incorporate a progressive learning mechanism, enabling the model to dynamically modulate its attention to the samples throughout the entire training process. Additionally, PEG is trained on more than 100 million data samples, encompassing a wide range of domains (e.g., finance, medicine, and tourism) and covering various tasks (e.g., question-answering, machine reading comprehension, and similarity matching). Extensive experiments conducted on C-MTEB and DuReader demonstrate that PEG surpasses state-of-the-art embeddings in retrieving true positives, highlighting its significant potential for applications in LLMs. Our model is publicly available at https://huggingface.co/TownsWu/PEG.
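As background for the in-batch negatives mentioned in the PEG abstract above, the sketch below computes a standard InfoNCE-style contrastive loss over a batch similarity matrix, where every other passage in the batch acts as a negative. The temperature value is an assumption, and the sketch does not include the hard negatives or the progressive weighting described by the authors.

```python
import math


def in_batch_contrastive_loss(sim, temperature=0.05):
    """Mean cross-entropy loss with in-batch negatives.

    sim[i][j] is the similarity between query i and passage j;
    the diagonal entries are the positive pairs.
    """
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(sim)
```

A larger batch adds more columns to each row, so each query is contrasted against more negatives per step.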
Hansi Zeng, Chen Luo, Bowen Jin, Sheikh Muhammad Sarwar, Tianxin Wei, Hamed Zamani
Recent research has shown that transformer networks can be used as differentiable search indexes by representing each document as a sequence of document ID tokens. These generative retrieval models cast the retrieval problem as a document ID generation problem for each given query. Despite their elegant design, existing generative retrieval models only perform well on artificially-constructed and small-scale collections. This has led to serious skepticism in the research community on their real-world impact. This paper represents an important milestone in generative retrieval research by showing, for the first time, that generative retrieval models can be trained to perform effectively on large-scale standard retrieval benchmarks. To do so, we propose RIPOR, an optimization framework for generative retrieval that can be adopted by any encoder-decoder architecture. RIPOR is designed based on two often-overlooked fundamental design considerations in generative retrieval. First, given the sequential decoding nature of document ID generation, assigning accurate relevance scores to documents based on the whole document ID sequence is not sufficient. To address this issue, RIPOR introduces a novel prefix-oriented ranking optimization algorithm. Second, initial document IDs should be constructed based on relevance associations between queries and documents, instead of the syntactic and semantic information in the documents. RIPOR addresses this issue using a relevance-based document ID construction approach that quantizes relevance-based representations learned for documents. Evaluation on MS MARCO and TREC Deep Learning Track reveals that RIPOR surpasses state-of-the-art generative retrieval models by a large margin (e.g., 30.5% MRR improvements on MS MARCO Dev Set), and performs on par with popular dense retrieval models.
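The MRR figure quoted in the RIPOR abstract above refers to mean reciprocal rank. A generic sketch of the metric follows; the cutoff k = 10 is an assumption (MRR@10 is the usual choice for the MS MARCO Dev Set), and this is not the authors' evaluation script.

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets, k=10):
    """Average of 1/rank of the first relevant document within the top k.

    ranked_lists: one ranked list of doc ids per query (best first).
    relevant_sets: one set of relevant doc ids per query.
    Queries with no relevant document in the top k contribute 0.
    """
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)
```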
Sedrick Keh, Justin T. Chiu, Daniel Fried
When a model is trying to gather information in an interactive setting, it benefits from asking informative questions. However, in the case of a grounded multi-turn image identification task, previous studies have been constrained to polar yes/no questions, limiting how much information the model can gain in a single turn. We present an approach that formulates more informative, open-ended questions. In doing so, we discover that off-the-shelf visual question answering (VQA) models often make presupposition errors, which standard information gain question selection methods fail to account for. To address this issue, we propose a method that can incorporate presupposition handling into both question selection and belief updates. Specifically, we use a two-stage process, where the model first filters out images which are irrelevant to a given question, then updates its beliefs about which image the user intends. Through self-play and human evaluations, we show that our method is successful in asking informative open-ended questions, increasing accuracy over the past state-of-the-art by 14%, while resulting in 48% more efficient games in human evaluations.
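The "standard information gain question selection" that the abstract above contrasts with can be sketched as follows: the model keeps a belief over candidate images and prefers the question whose answer is expected to reduce the entropy of that belief the most. This is a generic sketch; the presupposition filtering stage proposed by the authors is not included, and the data layout is an assumption.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def bayes_update(belief, likelihood):
    """Posterior over images given P(answer | image) for each image."""
    unnormalized = [b * l for b, l in zip(belief, likelihood)]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


def expected_information_gain(belief, answer_likelihoods):
    """answer_likelihoods[a][i] = P(answer a | image i) for one question."""
    gain = entropy(belief)
    for likelihood in answer_likelihoods.values():
        p_answer = sum(b * l for b, l in zip(belief, likelihood))
        if p_answer > 0:
            gain -= p_answer * entropy(bayes_update(belief, likelihood))
    return gain
```

A yes/no question can yield at most one bit per turn, whereas an open-ended question with more distinct answers can yield more, which is the motivation stated in the abstract.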
Xiaonan Li, Changtai Zhu, Linyang Li, Zhangyue Yin, Tianxiang Sun, Xipeng Qiu
Verifiable generation aims to let the large language model (LLM) generate
text with supporting documents, which enables the user to flexibly verify the
answer and makes the LLM's output more reliable. Retrieval plays a crucial role
in verifiable generation. Specifically, the retrieved documents not only
supplement knowledge to help the LLM generate correct answers, but also serve
as supporting evidence for the user to verify the LLM's output. However, the
widely used retrievers become the bottleneck of the entire pipeline and limit
the overall performance. Their capabilities are usually inferior to those of LLMs since
they often have far fewer parameters than the large language model and have
not been demonstrated to scale well to the size of LLMs. If the retriever does
not correctly find the supporting documents, the LLM cannot generate the
correct and verifiable answer, which overshadows the LLM's remarkable
abilities. To address these limitations, we propose LLatrieval (Large Language
Model Verified Retrieval), where the LLM updates the retrieval result until it
verifies that the retrieved documents can sufficiently support answering the
question. Thus, the LLM can iteratively provide feedback to retrieval and
help the retrieval result fully support verifiable generation.
Experiments show that LLatrieval significantly outperforms extensive baselines
and achieves state-of-the-art results.
Authors' comments: Accepted by NAACL 2024 (Main Conference)
Lukas Gienapp, Harrisen Scells, Niklas Deckers, Janek Bevendorff, Shuai Wang, Johannes Kiesel, Shahbaz Syed, Maik Fröbe et al.
Recent advances in large language models have enabled the development of
viable generative information retrieval systems. A generative retrieval system
returns a grounded generated text in response to an information need instead of
the traditional document ranking. Quantifying the utility of these types of
responses is essential for evaluating generative retrieval systems. As the
established evaluation methodology for ranking-based ad hoc retrieval may seem
unsuitable for generative retrieval, new approaches for reliable, repeatable,
and reproducible experimentation are required. In this paper, we survey the
relevant information retrieval and natural language processing literature,
identify search tasks and system architectures in generative retrieval, develop
a corresponding user model, and study its operationalization. This theoretical
analysis provides a foundation and new insights for the evaluation of
generative ad hoc retrieval systems.
Authors' comments: 14 pages, 5 figures, 1 table
Hang Zhang, Yeyun Gong, Xingwei He, Dayiheng Liu, Daya Guo, Jiancheng Lv, Jian Guo
Most dense retrieval models contain an implicit assumption: the training
query-document pairs are exactly matched. Since it is expensive to annotate the
corpus manually, training pairs in real-world applications are usually
collected automatically, which inevitably introduces mismatched-pair noise. In
this paper, we explore an interesting and challenging problem in dense
retrieval: how to train an effective model with mismatched-pair noise. To solve
this problem, we propose a novel approach called Noisy Pair Corrector (NPC),
which consists of a detection module and a correction module. The detection
module estimates noise pairs by calculating the perplexity between annotated
positive and easy negative documents. The correction module utilizes an
exponential moving average (EMA) model to provide a soft supervised signal,
aiding in mitigating the effects of noise. We conduct experiments on the
text-retrieval benchmarks Natural Questions and TriviaQA and the code-search
benchmarks StaQC and SO-DS. Experimental results show that NPC achieves excellent
performance in handling both synthetic and realistic noise.
Authors' comments: Findings of EMNLP 2023
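The exponential moving average (EMA) model mentioned in the NPC abstract above is typically maintained by blending the current model's parameters into a slowly moving copy after each training step. The sketch below shows that update on a flat list of parameters; the decay value is an assumption, and how NPC turns the EMA model's outputs into a soft supervision signal is not shown.

```python
def ema_update(ema_params, model_params, decay=0.999):
    """Return new EMA parameters: decay * ema + (1 - decay) * model."""
    return [
        decay * e + (1.0 - decay) * p
        for e, p in zip(ema_params, model_params)
    ]
```

With a decay close to 1, the EMA copy changes slowly, so its predictions are less affected by any single noisy training pair.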
Sunkyung Lee, Minjin Choi, Jongwuk Lee
Generative retrieval sheds light on a new paradigm of document retrieval,
aiming to directly generate the identifier of a relevant document for a query.
While it takes advantage of bypassing the construction of auxiliary index
structures, existing studies face two significant challenges: (i) the
discrepancy between the knowledge of pre-trained language models and
identifiers and (ii) the gap between training and inference that poses
difficulty in learning to rank. To overcome these challenges, we propose a
novel generative retrieval method, namely Generative retrieval via LExical
iNdex learning (GLEN). For training, GLEN effectively exploits a dynamic
lexical identifier using a two-phase index learning strategy, enabling it to
learn meaningful lexical identifiers and relevance signals between queries and
documents. For inference, GLEN utilizes collision-free inference, using
identifier weights to rank documents without additional overhead. Experimental
results show that GLEN achieves state-of-the-art or competitive performance
against existing generative retrieval methods on various benchmark datasets,
e.g., NQ320k, MS MARCO, and BEIR. The code is available at
https://github.com/skleee/GLEN.
Authors' comments: In Proceedings of the 2023 Conference on Empirical Methods in Natural
Language Processing (EMNLP 2023) main conference. 12 pages, 2 figures, 8
tables