Qinkai Yu, Mingyu Jin, Dong Shu, Chong Zhang, Lizhou Fan, Wenyue Hua, Suiyuan Zhu, Yanda Meng et al.
Recent advancements in artificial intelligence (AI), especially large
language models (LLMs), have significantly advanced healthcare applications and
demonstrated potential in intelligent medical treatment. However, there are
conspicuous challenges such as vast data volumes and inconsistent symptom
characterization standards, preventing full integration of healthcare AI
systems with individual patients' needs. To promote professional and
personalized healthcare, we propose an innovative framework, Health-LLM, which
combines large-scale feature extraction and medical knowledge trade-off
scoring. Compared to traditional health management applications, our system has
three main advantages: (1) It integrates health reports and medical knowledge
into a large model so that relevant questions can be posed to the large
language model for disease prediction; (2) It leverages a retrieval-augmented
generation (RAG) mechanism to enhance feature extraction; (3) It incorporates a
semi-automated feature updating framework that can merge and delete features to
improve the accuracy of disease prediction. We experiment on a large number of
health reports to assess the effectiveness of the Health-LLM system. The
results indicate
that the proposed system surpasses the existing ones and has the potential to
significantly advance disease prediction and personalized health management.
Authors' comments: Accepted by ACL 2025 NLP4PosImpact Workshop
Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, Christopher D. Manning
Retrieval-augmented language models can better adapt to changes in world state and incorporate long-tail knowledge. However, most existing methods retrieve only short contiguous chunks from a retrieval corpus, limiting holistic understanding of the overall document context. We introduce the novel approach of recursively embedding, clustering, and summarizing chunks of text, constructing a tree with differing levels of summarization from the bottom up. At inference time, our RAPTOR model retrieves from this tree, integrating information across lengthy documents at different levels of abstraction. Controlled experiments show that retrieval with recursive summaries offers significant improvements over traditional retrieval-augmented LMs on several tasks. On question-answering tasks that involve complex, multi-step reasoning, we show state-of-the-art results; for example, by coupling RAPTOR retrieval with the use of GPT-4, we can improve the best performance on the QuALITY benchmark by 20% in absolute accuracy.
Zackary Rackauckas
Infineon has identified a need for engineers, account managers, and customers
to rapidly obtain product information. This problem is traditionally addressed
with retrieval-augmented generation (RAG) chatbots, but in this study, I
evaluated the use of the newly popularized RAG-Fusion method. RAG-Fusion
combines RAG and reciprocal rank fusion (RRF) by generating multiple queries,
reranking them with reciprocal scores and fusing the documents and scores.
Through manually evaluating answers on accuracy, relevance, and
comprehensiveness, I found that RAG-Fusion was able to provide accurate and
comprehensive answers due to the generated queries contextualizing the original
query from various perspectives. However, some answers strayed off topic when
the generated queries were insufficiently relevant to the original query. This
research marks significant progress in artificial intelligence (AI) and natural
language processing (NLP) applications and demonstrates transformations in a
global and multi-industry context.
Authors' comments: 8 pages, 2 figures
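The reciprocal rank fusion step named in the RAG-Fusion abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `reciprocal_rank_fusion`, the document IDs, and the smoothing constant `k = 60` (a value commonly used with RRF) are all assumptions, and the query-generation and answer-generation steps are omitted.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1. `k` is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical retrieval results, one ranked list per generated query.
rankings = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]
fused = reciprocal_rank_fusion(rankings)
fused_ids = [doc_id for doc_id, _ in fused]
```

A document that ranks well across several generated queries (here `doc_b`) rises to the top, while one that appears only once near the bottom of a single list (`doc_d`) falls to the end.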
Mohamed Nomeir, Alptug Aytekin, Sennur Ulukus
We consider the problems arising from the presence of Byzantine servers in a quantum private information retrieval (QPIR) setting. This is the first work to precisely define what the capabilities of Byzantine servers could be in a QPIR context. We show that quantum Byzantine servers have more capabilities than their classical counterparts due to the possibilities created by the quantum encoding procedure. We focus on quantum Byzantine servers that can apply any reversible operations on their individual qudits. In this case, the Byzantine servers can generate any error, i.e., this covers \emph{all} possible single qudit operations that can be done by the Byzantine servers on their qudits. We design a scheme that is resilient to these kinds of manipulations. We show that the scheme designed achieves superdense coding gain in all cases, i.e., $R_Q= \max \left\{0,\min\left\{1,2\left(1-\frac{X+T+2B}{N}\right)\right\}\right\}$.
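The rate expression above can be evaluated numerically with a small helper. This is only a sketch of the stated formula: the function name `qpir_rate` is hypothetical, and since the abstract does not define the symbols, the code simply treats $N$, $X$, $T$, and $B$ as non-negative integers with $N > 0$.

```python
from fractions import Fraction


def qpir_rate(n, x, t, b):
    """Evaluate R_Q = max{0, min{1, 2 * (1 - (X + T + 2B) / N)}} exactly."""
    raw = 2 * (1 - Fraction(x + t + 2 * b, n))
    return max(Fraction(0), min(Fraction(1), raw))


# Illustrative parameter choices (assumptions, not values from the paper).
rate_capped = qpir_rate(10, 1, 1, 1)   # raw value 6/5 is capped at 1
rate_partial = qpir_rate(10, 2, 2, 2)  # 2 * (1 - 8/10) = 2/5
rate_zero = qpir_rate(4, 2, 2, 1)      # raw value is negative, so the rate is 0
```

The factor of 2 is what the abstract calls the superdense coding gain; the outer `min` and `max` keep the result between 0 and 1.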
Hugo Thimonier, Fabrice Popineau, Arpad Rimmel, Bich-Liên Doan
Deep learning for tabular data has garnered increasing attention in recent
years, yet employing deep models for structured data remains challenging. While
these models excel with unstructured data, their efficacy with structured data
has been limited. Recent research has introduced retrieval-augmented models to
address this gap, demonstrating promising results in supervised tasks such as
classification and regression. In this work, we investigate using
retrieval-augmented models for anomaly detection on tabular data. We propose a
reconstruction-based approach in which a transformer model learns to
reconstruct masked features of \textit{normal} samples. We test the
effectiveness of KNN-based and attention-based modules to select relevant
samples to help in the reconstruction process of the target sample. Our
experiments on a benchmark of 31 tabular datasets reveal that augmenting this
reconstruction-based anomaly detection (AD) method with sample-sample
dependencies via retrieval modules significantly boosts performance. The
present work supports the idea that retrieval modules are useful for augmenting any
deep AD method to enhance anomaly detection on tabular data.
Authors' comments: Accepted at CIKM 2024
Shuxun Wang, Yunfei Lei, Ziqi Zhang, Wei Liu, Haowei Liu, Li Yang, Wenjuan Li, Bing Li et al.
With the rise of the 'Metaverse' and 'Web3.0', the NFT (Non-Fungible Token) has
emerged as a pivotal kind of digital asset, garnering significant attention. By
the end of November 2023, more than 1.4 billion NFT tokens had been minted
across various blockchain platforms. To effectively locate a satisfactory NFT
token, conducting searches within the extensive array of NFT data is essential.
The challenge in NFT retrieval is heightened due to the high degree of
similarity among different NFT tokens, in terms of regional and semantic
aspects. Achieving accurate and efficient retrieval within the large-scale,
highly similar NFT data presents a formidable challenge for both the academic
and industrial communities. In this paper, we introduce a dataset named
'NFT Top1000 Visual Text Dataset' (henceforth, NFT1000), containing 7.56 million
image-text pairs collected from the 1000 most famous PFP NFT collections
by sales volume on the Ethereum blockchain. Based on the dataset, we test the
CLIP (Contrastive Language-Image Pretraining) models as a baseline.
Additionally, we propose the Comprehensive Variance Index (CVI), a robust
metric designed to assess the similarity and retrieval difficulty of
visual-text pair data.
Authors' comments: 6 pages, 7 figures
Or Elimelech, Ori Shmuel, Asaf Cohen
In the above article \cite{shmuel2021private}, the authors introduced a PIR scheme for the Additive White Gaussian Noise (AWGN) Multiple Access Channel (MAC), both with and without fading. The authors utilized the additive nature of the channel and leveraged the linear properties and structure of lattice codes to retrieve the desired message without the servers acquiring any knowledge about the retrieved message's index. Theorems 3 and 4 in \cite{shmuel2021private} contain an error arising from the incorrect usage of the modulo operator. Moreover, the proofs assume a one-to-one mapping function, $\phi(\cdot)$, between a message $W_j\in\mathbb{F}_p^L$ and the elements of $\mathcal{C}$, mistakenly suggesting that the user possesses all the required information in advance. To deal with that, we define $\phi(\cdot)$ as a one-to-one mapping function between a vector of $l$ information bits and a lattice point $\lambda\in\mathcal{C}$. Herein, we present the corrected versions of these theorems.
Ayush Dubey, Shiv Ram Dubey, Satish Kumar Singh, Wei-Ta Chu
Unsupervised image retrieval aims to learn the important visual characteristics without any given labels, in order to retrieve similar images for a given query image. Convolutional Neural Network (CNN)-based approaches have been extensively exploited with self-supervised contrastive learning for image hashing. However, the existing approaches suffer from the lack of effective utilization of global features by CNNs and from the bias created by false negative pairs in contrastive learning. In this paper, we propose the TransClippedCLR model, which encodes the global context of an image using a Transformer while retaining local context through patch-based processing, generates hash codes through product quantization, and avoids potential false negative pairs through clipped contrastive learning. The proposed model achieves superior performance for unsupervised image retrieval on benchmark datasets, including CIFAR10, NUS-Wide and Flickr25K, compared to recent state-of-the-art deep models. The results using the proposed clipped contrastive learning are greatly improved on all datasets compared to the same backbone network with vanilla contrastive learning.
Yixuan Tang, Yi Yang
Retrieval-augmented generation (RAG) augments large language models (LLM) by
retrieving relevant knowledge, showing promising potential in mitigating LLM
hallucinations and enhancing response quality, thereby facilitating the greater
adoption of LLMs in practice. However, we find that existing RAG systems are
inadequate in answering multi-hop queries, which require retrieving and
reasoning over multiple pieces of supporting evidence. Furthermore, to our
knowledge, no existing RAG benchmarking dataset focuses on multi-hop queries.
In this paper, we develop a novel dataset, MultiHop-RAG, which consists of a
knowledge base, a large collection of multi-hop queries, their ground-truth
answers, and the associated supporting evidence. We detail the procedure of
building the dataset, utilizing an English news article dataset as the
underlying RAG knowledge base. We demonstrate the benchmarking utility of
MultiHop-RAG in two experiments. The first experiment compares different
embedding models for retrieving evidence for multi-hop queries. In the second
experiment, we examine the capabilities of various state-of-the-art LLMs,
including GPT-4, PaLM, and Llama2-70B, in reasoning and answering multi-hop
queries given the evidence. Both experiments reveal that existing RAG methods
perform unsatisfactorily in retrieving and answering multi-hop queries. We hope
MultiHop-RAG will be a valuable resource for the community in developing
effective RAG systems, thereby facilitating greater adoption of LLMs in
practice. The MultiHop-RAG dataset and the implemented RAG system are publicly available at
https://github.com/yixuantt/MultiHop-RAG/.
Authors' comments: Link: https://github.com/yixuantt/MultiHop-RAG/
Florin Cuconasu, Giovanni Trappolini, Federico Siciliano, Simone Filice, Cesare Campagnano, Yoelle Maarek, Nicola Tonellotto, Fabrizio Silvestri
Retrieval-Augmented Generation (RAG) systems represent a significant advancement over traditional Large Language Models (LLMs). RAG systems enhance their generation ability by incorporating external data retrieved through an Information Retrieval (IR) phase, overcoming the limitations of standard LLMs, which are restricted to their pre-trained knowledge and limited context window. Most research in this area has predominantly concentrated on the generative aspect of LLMs within RAG systems. Our study fills this gap by thoroughly and critically analyzing the influence of IR components on RAG systems. This paper analyzes which characteristics a retriever should possess for an effective RAG's prompt formulation, focusing on the type of documents that should be retrieved. We evaluate various elements, such as the relevance of the documents to the prompt, their position, and the number included in the context. Our findings reveal, among other insights, that including irrelevant documents can unexpectedly enhance performance by more than 30% in accuracy, contradicting our initial assumption of diminished quality. These results underscore the need for developing specialized strategies to integrate retrieval with language generation models, thereby laying the groundwork for future research in this field.
Zhihao Zhang, Alan Zhu, Lijie Yang, Yihua Xu, Lanting Li, Phitchaya Mangpo Phothilimthana, Zhihao Jia
Retrieval-augmented language models (RaLM) have demonstrated the potential to
solve knowledge-intensive natural language processing (NLP) tasks by combining
a non-parametric knowledge base with a parametric language model. Compared with
fine-tuning a fully parametric model, RaLM offers low-cost adaptation to
the latest data and better source attribution mechanisms. Among various RaLM
approaches, iterative RaLM delivers a better generation quality due to a more
frequent interaction between the retriever and the language model. Despite the
benefits, iterative RaLM usually encounters high overheads due to the frequent
retrieval step. To this end, we propose RaLMSpec, a speculation-inspired
framework that provides generic speed-up over iterative RaLM while preserving
the same model outputs through speculative retrieval and batched verification.
By further incorporating prefetching, an optimal speculation stride scheduler, and
asynchronous verification, RaLMSpec can automatically exploit the acceleration
potential to the fullest. For naive iterative RaLM serving, extensive
evaluations over three language models on four downstream QA datasets
demonstrate that RaLMSpec can achieve a speed-up ratio of 1.75-2.39x,
1.04-1.39x, and 1.31-1.77x when the retriever is an exact dense retriever,
approximate dense retriever, and sparse retriever respectively compared with
the baseline. For KNN-LM serving, RaLMSpec can achieve a speed-up ratio up to
7.59x and 2.45x when the retriever is an exact dense retriever and approximate
dense retriever, respectively, compared with the baseline.
Authors' comments: Preprint
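The idea of speculative retrieval with batched verification described in the RaLMSpec abstract above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the word-overlap retriever, the stand-in `lm_step` "language model", the query construction, and all function names are hypothetical, and prefetching, the stride scheduler, and asynchronous verification are left out. The sketch only shows why rolling back at the first mis-speculated step keeps the output identical to naive iterative serving.

```python
def score(query, doc):
    """Toy relevance: number of words shared by query and document."""
    return len(set(query.split()) & set(doc.split()))


def retrieve(query, docs):
    """Return the best-scoring document (the first one wins ties)."""
    return max(docs, key=lambda d: score(query, d))


def lm_step(tokens, doc):
    """Toy stand-in for a language model: pick one word of the retrieved doc."""
    words = doc.split()
    return words[len(tokens) % len(words)]


def make_query(tokens):
    """Use the last two tokens as the retrieval query."""
    return " ".join(tokens[-2:])


def iterative_baseline(prompt, corpus, steps):
    """Naive iterative serving: one corpus retrieval before every token."""
    tokens = prompt.split()
    for _ in range(steps):
        tokens.append(lm_step(tokens, retrieve(make_query(tokens), corpus)))
    return tokens


def speculative_serving(prompt, corpus, steps, stride=3):
    """Speculate from a local cache, then verify a whole stride in one batch."""
    tokens = prompt.split()
    target_len = len(tokens) + steps
    cache = [retrieve(make_query(tokens), corpus)]  # seed with one true retrieval
    corpus_calls = 1
    while len(tokens) < target_len:
        checkpoint = len(tokens)
        queries, guesses = [], []
        for _ in range(min(stride, target_len - len(tokens))):
            query = make_query(tokens)
            guess = retrieve(query, cache)  # cheap speculative retrieval
            queries.append(query)
            guesses.append(guess)
            tokens.append(lm_step(tokens, guess))
        # Batched verification: the stride's queries are checked together,
        # counted here as a single call to the full corpus.
        truths = [retrieve(q, corpus) for q in queries]
        corpus_calls += 1
        for truth in truths:
            if truth not in cache:
                cache.append(truth)
        for i, (guess, truth) in enumerate(zip(guesses, truths)):
            if guess != truth:
                # Roll back to the first mis-speculated step and redo it
                # with the verified document.
                del tokens[checkpoint + i:]
                tokens.append(lm_step(tokens, truth))
                break
    return tokens, corpus_calls


corpus = ["the cat sat on the mat", "dogs chase the cat", "the mat is red"]
spec_tokens, spec_calls = speculative_serving("the cat", corpus, steps=8)
base_tokens = iterative_baseline("the cat", corpus, steps=8)
```

Every token kept after verification was produced with the same document the baseline would have retrieved, so `spec_tokens` and `base_tokens` agree; any saving comes from issuing fewer, batched calls to the full corpus.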
Eduardo Vicente-López, Luis M. de Campos, Juan M. Fernández-Luna, Juan F. Huete
Personalization generally improves the performance of queries, but in a few cases it may also harm it. If we are able to predict those situations and disable personalization for them, the overall performance will be higher and users will be more satisfied with personalized systems. We use some state-of-the-art pre-retrieval query performance predictors and propose some others that include user profile information for this purpose. We study the correlations among these predictors and the difference between the personalized and the original queries. We also use classification and regression techniques to improve the results and finally reach a bit more than one third of the maximum ideal performance. We think this is a good starting point within this research line, which certainly needs more effort and improvements.
Anoushka Gade, Jorjeta Jetcheva
The web serves as a global repository of knowledge, used by billions of people to search for information. Ensuring that users receive the most relevant and up-to-date information, especially in the presence of multiple versions of web content from different time points, remains a critical challenge for information retrieval. This challenge has recently been compounded by the increased use of question answering tools trained on Wikipedia or web content and powered by large language models (LLMs), which have been found to make up information (or hallucinate) and have also been shown to struggle with the temporal dimensions of information. Even Retriever Augmented Language Models (RALMs), which incorporate a document database to reduce LLM hallucination, are unable to handle temporal queries correctly. This leads to instances where RALMs respond to queries such as "Who won the Wimbledon Championship?" by retrieving document passages related to Wimbledon but without the ability to differentiate between them based on how recent they are. In this paper, we propose and evaluate TempRALM, a temporally-aware Retriever Augmented Language Model (RALM) with few-shot learning extensions, which takes into account both semantically and temporally relevant documents relative to a given query, rather than relying on semantic similarity alone. We show that our approach results in up to 74% improvement in performance over the baseline RALM model, without requiring model pre-training, recalculating or replacing the RALM document index, or adding other computationally intensive elements.
Dezhao Luo, Shaogang Gong, Jiabo Huang, Hailin Jin, Yang Liu
Video moment retrieval (VMR) aims to locate the most likely video moment(s)
corresponding to a text query in untrimmed videos. Training of existing methods
is limited by the lack of diverse and generalisable VMR datasets, hindering
their ability to generalise moment-text associations to queries containing
novel semantic concepts (unseen both visually and textually in a training
source domain). For model generalisation to novel semantics, existing methods
rely heavily on the assumption of access to both video and text sentence pairs
from a target domain in addition to the source domain pair-wise training data.
This is neither practical nor scalable. In this work, we introduce a more
generalisable approach by assuming only text sentences describing new semantics
are available in model training without having seen any videos from a target
domain. To that end, we propose a Fine-grained Video Editing framework, termed
FVE, that explores generative video diffusion to facilitate fine-grained video
editing from the seen source concepts to the unseen target sentences consisting
of new concepts. This enables generative hypotheses of unseen video moments
corresponding to the novel concepts in the target domain. This fine-grained
generative video diffusion retains the original video structure and subject
specifics from the source domain while introducing semantic distinctions of
unseen novel vocabularies in the target domain. A critical challenge is how to
enable this generative fine-grained diffusion process to be meaningful in
optimising VMR, more than just synthesising visually pleasing videos. We solve
this problem by introducing a hybrid selection mechanism that integrates three
quantitative metrics to selectively incorporate synthetic video moments (novel
video hypotheses) as enlarged additions to the original source training data,
whilst minimising potential ...
Authors' comments: AAAI-25
Demiao Lin
With the rapid development of Large Language Models (LLMs),
Retrieval-Augmented Generation (RAG) has become a predominant method in the
field of professional knowledge-based question answering. Presently, major
foundation model companies have opened up Embedding and Chat API interfaces,
and frameworks like LangChain have already integrated the RAG process. It
appears that the key models and steps in RAG have been resolved, leading to the
question: are professional knowledge QA systems now approaching perfection?
This article finds that current primary methods depend on the premise of
accessing high-quality text corpora. However, since professional documents are
mainly stored in PDFs, the low accuracy of PDF parsing significantly impacts
the effectiveness of professional knowledge-based QA. We conducted an empirical
RAG experiment across hundreds of questions from the corresponding real-world
professional documents. The results show that ChatDOC, a RAG system equipped
with a panoptic and pinpoint PDF parser, retrieves more accurate and complete
segments, and thus produces better answers. Empirical experiments show that
ChatDOC is superior to the baseline on nearly 47% of questions, ties on 38% of
cases, and falls short on only 15% of cases. This suggests that we may
revolutionize RAG with enhanced PDF structure recognition.
Authors' comments: 18 pages, 16 figures
Animesh Basak Chowdhury, Marco Romanelli, Benjamin Tan, Ramesh Karri, Siddharth Garg
Logic synthesis, a pivotal stage in chip design, entails optimizing chip
specifications encoded in hardware description languages like Verilog into
highly efficient implementations using Boolean logic gates. The process
involves a sequential application of logic minimization heuristics (a "synthesis
recipe"), with their arrangement significantly impacting crucial metrics such
as area and delay. Addressing the challenge posed by the broad spectrum of
design complexities - from variations of past designs (e.g., adders and
multipliers) to entirely novel configurations (e.g., innovative processor
instructions) - requires a nuanced "synthesis recipe" guided by human expertise
and intuition. This study conducts a thorough examination of learning and
search techniques for logic synthesis, unearthing a surprising revelation:
pre-trained agents, when confronted with entirely novel designs, may veer off
course, detrimentally affecting the search trajectory. We present ABC-RL, a
meticulously tuned $\alpha$ parameter that adeptly adjusts recommendations from
pre-trained agents during the search process. Computed based on similarity
scores through nearest neighbor retrieval from the training dataset, ABC-RL
yields superior synthesis recipes tailored for a wide array of hardware
designs. Our findings showcase substantial enhancements in the
Quality-of-result (QoR) of synthesized circuits, boasting improvements of up to
24.8% compared to state-of-the-art techniques. Furthermore, ABC-RL achieves a
runtime reduction of up to 9x (iso-QoR) compared to current
state-of-the-art methodologies.
Authors' comments: Accepted in ICLR 2024
Jiarui Qin, Weiwen Liu, Ruiming Tang, Weinan Zhang, Yong Yu
A vast amount of user behavior data is constantly accumulating on today's
large recommendation platforms, recording users' various interests and tastes.
Preserving knowledge from the old data while new data continually arrives is a
vital problem for recommender systems. Existing approaches generally seek to
save the knowledge implicitly in the model parameters. However, such a
parameter-centric approach lacks scalability and flexibility -- the capacity is
hard to scale, and the knowledge is inflexible to utilize. Hence, in this work,
we propose a framework that turns massive user behavior data into retrievable
knowledge (D2K). It is a data-centric approach that is model-agnostic and easy
to scale up. Rather than storing only unary knowledge such as user-side
or item-side information, D2K stores ternary knowledge for
recommendation, which is determined by the complete recommendation factors --
user, item, and context. The knowledge retrieved by target samples can be
directly used to enhance the performance of any recommendation algorithm.
Specifically, we introduce a Transformer-based knowledge encoder to transform
the old data into knowledge with the user-item-context cross features. A
personalized knowledge adaptation unit is devised to effectively exploit the
information from the knowledge base by adapting the retrieved knowledge to the
target samples. Extensive experiments on two public datasets show that D2K
significantly outperforms existing baselines and is compatible with a major
collection of recommendation algorithms.
Authors' comments: 12 pages, 7 figures
Ruichen Zhang, Hongyang Du, Yinqiu Liu, Dusit Niyato, Jiawen Kang, Sumei Sun, Xuemin Shen, H. Vincent Poor
With the advance of artificial intelligence (AI), the emergence of Google
Gemini and OpenAI Q* marks the direction towards artificial general
intelligence (AGI). To implement AGI, the concept of interactive AI (IAI) has
been introduced, which can interactively understand and respond not only to
human user input but also to dynamic system and network conditions. In this
article, we explore an integration and enhancement of IAI in networking. We
first comprehensively review recent developments and future perspectives of AI
and then introduce the technology and components of IAI. We then explore the
integration of IAI into the next-generation networks, focusing on how implicit
and explicit interactions can enhance network functionality, improve user
experience, and promote efficient network management. Subsequently, we propose
an IAI-enabled network management and optimization framework, which consists of
environment, perception, action, and brain units. We also design the pluggable
large language model (LLM) module and retrieval augmented generation (RAG)
module to build the knowledge base and contextual memory for decision-making in
the brain unit. We demonstrate the effectiveness of the framework through case
studies. Finally, we discuss potential research directions for IAI-based
networks.
Authors' comments: 10 pages, 4 figures
Xiangpeng Yang, Linchao Zhu, Xiaohan Wang, Yi Yang
Text-video retrieval is a critical multi-modal task to find the most relevant
video for a text query. Although pretrained models like CLIP have demonstrated
impressive potential in this area, the rising cost of fully finetuning these
models due to increasing model size continues to pose a problem. To address
this challenge, prompt tuning has emerged as an alternative. However, existing
works still face two problems when adapting pretrained image-text models to
downstream video-text tasks: (1) The visual encoder can only encode
frame-level features and fails to extract global-level general video
information. (2) Equipping the visual and text encoders with separate prompts
fails to mitigate the visual-text modality gap. To this end, we propose DGL, a
cross-modal Dynamic prompt tuning method with Global-Local video attention. In
contrast to previous prompt tuning methods, we employ the shared latent space
to generate local-level text and frame prompts that encourage inter-modal
interaction. Furthermore, we propose modeling video in a global-local attention
mechanism to capture global video information from the perspective of prompt
tuning. Extensive experiments reveal that when only 0.67% of the parameters are
tuned, our cross-modal prompt tuning strategy DGL outperforms or is comparable
to full fine-tuning methods on MSR-VTT, VATEX, LSMDC, and ActivityNet datasets.
Code will be available at https://github.com/knightyxp/DGL
Authors' comments: AAAI2024, Code will be available at https://github.com/knightyxp/DGL
Peiwen Yuan, Xinglin Wang, Shaoxiong Feng, Boyuan Pan, Yiwei Li, Heda Wang, Xupeng Miao, Kan Li
Generative Retrieval (GR), autoregressively decoding relevant document
identifiers given a query, has been shown to perform well under the setting of
small-scale corpora. By memorizing the document corpus with model parameters,
GR implicitly achieves deep interaction between query and document. However,
such a memorizing mechanism faces three drawbacks: (1) Poor memory accuracy for
fine-grained features of documents; (2) Memory confusion gets worse as the
corpus size increases; (3) Huge memory update costs for new documents. To
alleviate these problems, we propose the Generative Dense Retrieval (GDR)
paradigm. Specifically, GDR first uses the limited memory volume to achieve
inter-cluster matching from query to relevant document clusters.
The memorization-free matching mechanism from Dense Retrieval (DR) is then introduced
to conduct fine-grained intra-cluster matching from clusters to relevant
documents. The coarse-to-fine process maximizes the advantages of GR's deep
interaction and DR's scalability. In addition, we design a cluster identifier
construction strategy to facilitate corpus memory and a cluster-adaptive
negative sampling strategy to enhance the intra-cluster mapping ability.
Empirical results show that GDR obtains an average improvement of 3.0 in R@100
on the NQ dataset under multiple settings and has better scalability.
Authors' comments: EACL 2024 main
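The coarse-to-fine flow described in the GDR abstract above (match the query to clusters first, then match within the selected clusters) can be sketched as follows. This is a simplified illustration, not the authors' method: GDR's generative decoding of cluster identifiers is replaced here by a plain dot product against cluster centroids, and the function names, vectors, and cluster labels are all hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def coarse_to_fine_retrieve(query_vec, clusters, top_clusters=1, top_docs=2):
    """Two-stage retrieval: pick clusters first, then rank documents inside them."""
    # Stage 1 (inter-cluster): rank clusters by centroid similarity.
    # Assumed stand-in for the generative cluster-identifier decoding.
    ranked = sorted(
        clusters,
        key=lambda cid: dot(query_vec, clusters[cid]["centroid"]),
        reverse=True,
    )
    candidates = []
    for cid in ranked[:top_clusters]:
        # Stage 2 (intra-cluster): dense matching against the cluster's documents.
        for doc_id, vec in clusters[cid]["docs"].items():
            candidates.append((dot(query_vec, vec), doc_id))
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in candidates[:top_docs]]


# Hypothetical two-dimensional embeddings for two clusters.
clusters = {
    "sports": {"centroid": [1.0, 0.0], "docs": {"d1": [0.9, 0.1], "d2": [0.7, 0.3]}},
    "finance": {"centroid": [0.0, 1.0], "docs": {"d3": [0.1, 0.9], "d4": [0.2, 0.8]}},
}
sports_hits = coarse_to_fine_retrieve([0.8, 0.2], clusters)
finance_hits = coarse_to_fine_retrieve([0.1, 0.9], clusters)
```

Only the documents of the selected clusters are scored in the second stage, which is the part of the design that keeps fine-grained matching cheap as the corpus grows.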