Yang Xiong, Ruichen Zhang, Yinqiu Liu, Dusit Niyato, Zehui Xiong, Ying-Chang Liang, Shiwen Mao
The rapid development of next-generation networking technologies underscores
their transformative role in revolutionizing modern communication systems,
enabling faster, more reliable, and highly interconnected solutions. However,
such development has also brought challenges to network optimization. Thanks
to the emergence of Large Language Models (LLMs) in recent years, tools
including Retrieval Augmented Generation (RAG) have been developed and applied
in various fields including networking, and have shown their effectiveness.
Taking one step further, integrating knowledge graphs into RAG
frameworks has improved the performance of RAG in networking applications
such as Intent-Driven Networks (IDNs) and spectrum knowledge maps by providing
more contextually relevant responses through more accurate retrieval of related
network information. This paper introduces a RAG framework that integrates
knowledge graphs into its database and explores the framework's application in
networking. We begin by exploring RAG's applications in networking and the
limitations of conventional RAG and present the advantages that knowledge
graphs' structured knowledge representation brings to the retrieval and
generation processes. Next, we propose a detailed GraphRAG-based framework for
networking, including a step-by-step tutorial on its construction. Our
evaluation through a case study on channel gain prediction demonstrates
GraphRAG's enhanced capability in generating accurate, contextually rich
responses, surpassing traditional RAG models. Finally, we discuss key future
directions for applying knowledge-graphs-empowered RAG frameworks in
networking, including robust updates, mitigation of hallucination, and enhanced
security measures for networking applications.
Authors' comments: 9 pages, 4 figures
M. Hamza Mughal, Rishabh Dabral, Merel C. J. Scholman, Vera Demberg, Christian Theobalt
Non-verbal communication often comprises semantically rich gestures that
help convey the meaning of an utterance. Producing such semantic co-speech
gestures has been a major challenge for the existing neural systems that can
generate rhythmic beat gestures, but struggle to produce semantically
meaningful gestures. Therefore, we present RAG-Gesture, a diffusion-based
gesture generation approach that leverages Retrieval Augmented Generation (RAG)
to produce natural-looking and semantically rich gestures. Our neuro-explicit
gesture generation approach is designed to produce semantic gestures grounded
in interpretable linguistic knowledge. We achieve this by using explicit domain
knowledge to retrieve exemplar motions from a database of co-speech gestures.
We then inject the retrieved semantic exemplar gestures into our
diffusion-based gesture generation pipeline using DDIM inversion and retrieval
guidance at inference time, without any need for training. Further, we
propose a control paradigm for guidance that allows users to modulate the
amount of influence each retrieval insertion has over the generated sequence.
Our comparative evaluations demonstrate the validity of our approach against
recent gesture generation approaches. The reader is urged to explore the
results on our project page.
Authors' comments: Preprint. Project page:
https://vcai.mpi-inf.mpg.de/projects/RAG-Gesture/
Jebish Purbey, Drishti Sharma, Siddhant Gupta, Khawaja Murad, Siddartha Pullakhandam, Ram Mohan Rao Kadiyala
This paper presents the system description of our entry for the COLING 2025
RegNLP RIRAG (Regulatory Information Retrieval and Answer Generation)
challenge, focusing on leveraging advanced information retrieval and answer
generation techniques in regulatory domains. We experimented with a combination
of embedding models, including Stella, BGE, CDE, and Mpnet, and leveraged
fine-tuning and reranking for retrieving relevant documents in top ranks. We
utilized a novel approach, LeSeR, which achieved competitive results with a
recall@10 of 0.8201 and map@10 of 0.6655 for retrievals. This work highlights
the transformative potential of natural language processing techniques in
regulatory applications, offering insights into their capabilities for
implementing a retrieval augmented generation system while identifying areas
for future improvement in robustness and domain adaptation.
Authors' comments: 5 pages, Accepted to RegNLP @ COLING 2025
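The recall@10 and map@10 figures quoted above are standard ranked-retrieval metrics. The following is a minimal sketch of how such metrics are commonly computed, not the challenge's official scorer: the function names are illustrative, ranked lists are assumed to contain no duplicate document IDs, and the average-precision normalizer `min(len(relevant), k)` is one common convention among several.

```python
from typing import Callable, List, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Average of precision values taken at each rank where a relevant doc appears.

    Normalizing by min(len(relevant), k) is an assumption; other scorers
    divide by len(relevant) or by the number of hits.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(len(relevant), k)


def mean_over_queries(
    metric: Callable[[Sequence[str], Set[str], int], float],
    runs: List[Tuple[Sequence[str], Set[str]]],
    k: int = 10,
) -> float:
    """Average a per-query metric over (ranked list, relevant set) pairs."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in runs) / len(runs)
```

Calling `mean_over_queries(average_precision_at_k, runs)` gives a MAP@10-style score over a list of queries.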
Aniruddha Salve, Saba Attar, Mahesh Deshmukh, Sayali Shivpuje, Arnab Mitra Utsab
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by
incorporating external, domain-specific data into the generative process. While
LLMs are highly capable, they often rely on static, pre-trained datasets,
limiting their ability to integrate dynamic or private data. Traditional RAG
systems typically use a single-agent architecture to handle query generation,
data retrieval, and response synthesis. However, this approach becomes
inefficient when dealing with diverse data sources, such as relational
databases, document stores, and graph databases, often leading to performance
bottlenecks and reduced accuracy. This paper proposes a multi-agent RAG system
to address these limitations. Specialized agents, each optimized for a specific
data source, handle query generation for relational, NoSQL, and document-based
systems. These agents collaborate within a modular framework, with query
execution delegated to an environment designed for compatibility across various
database types. This distributed approach enhances query efficiency, reduces
token overhead, and improves response accuracy by ensuring that each agent
focuses on its specialized task. The proposed system is scalable and adaptable,
making it ideal for generative AI workflows that require integration with
diverse, dynamic, or private data sources. By leveraging specialized agents and
a modular execution environment, the system provides an efficient and robust
solution for handling complex, heterogeneous data environments in generative AI
applications.
Authors' comments: 16 pages, 3 figures. This preprint introduces a multi-agent framework
for Retrieval-Augmented Generation (RAG), enhancing Large Language Models
(LLMs) for efficient integration of diverse data sources. Relevant for
researchers in AI, ML, generative AI, and database systems
Kaustubh D. Dhole, Kai Shu, Eugene Agichtein
Computational argumentation, which involves generating answers or summaries for controversial topics like abortion bans and vaccination, has become increasingly important in today's polarized environment. Sophisticated LLM capabilities offer the potential to provide nuanced, evidence-based answers to such questions through Retrieval-Augmented Argumentation (RAArg), leveraging real-world evidence for high-quality, grounded arguments. However, evaluating RAArg remains challenging, as human evaluation is costly and difficult for complex, lengthy answers on complicated topics. At the same time, re-using existing argumentation datasets is no longer sufficient, as they lack long, complex arguments and realistic evidence from potentially misleading sources, limiting holistic evaluation of retrieval effectiveness and argument quality. To address these gaps, we investigate automated evaluation methods using multiple fine-grained LLM judges, providing better and more interpretable assessments than traditional single-score metrics and even previously reported human crowdsourcing. To validate the proposed techniques, we introduce ConQRet, a new benchmark featuring long and complex human-authored arguments on debated topics, grounded in real-world websites, allowing an exhaustive evaluation across retrieval effectiveness, argument quality, and groundedness. We validate our LLM Judges on a prior dataset and the new ConQRet benchmark. Our proposed LLM Judges and the ConQRet benchmark can enable rapid progress in computational argumentation and can be naturally extended to other complex retrieval-augmented generation tasks.
Manish Bhattarai, Minh Vu, Javier E. Santos, Ismael Boureima, Daniel O'Malley
We introduce a novel method to enhance cross-language code translation from Fortran to C++ by integrating task-specific embedding alignment into a Retrieval-Augmented Generation (RAG) framework. Unlike conventional retrieval approaches that utilize generic embeddings agnostic to the downstream task, our strategy aligns the retrieval model directly with the objective of maximizing translation quality, as quantified by the CodeBLEU metric. This alignment ensures that the embeddings are semantically and syntactically meaningful for the specific code translation task. Our methodology involves constructing a dataset of 25,000 Fortran code snippets sourced from the Stack-V2 dataset and generating their corresponding C++ translations using the LLaMA 3.1-8B language model. We compute pairwise CodeBLEU scores between the generated translations and ground truth examples to capture fine-grained similarities. These scores serve as supervision signals in a contrastive learning framework, where we optimize the embedding model to retrieve Fortran-C++ pairs that are most beneficial for improving the language model's translation performance. By integrating these CodeBLEU-optimized embeddings into the RAG framework, our approach significantly enhances both retrieval accuracy and code generation quality over methods employing generic embeddings. On the HPC Fortran2C++ dataset, our method elevates the average CodeBLEU score from 0.64 to 0.73, achieving a 14% relative improvement. On the Numerical Recipes dataset, we observe an increase from 0.52 to 0.60, marking a 15% relative improvement. Importantly, these gains are realized without any fine-tuning of the language model, underscoring the efficiency and practicality of our approach.
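The relative improvements quoted above follow from the reported CodeBLEU averages as (new - old) / old. A small check, where the function name is illustrative and the inputs are the rounded scores given in the abstract:

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, as a fraction of the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline


# Rounded CodeBLEU averages as reported in the abstract.
hpc_gain = relative_improvement(0.64, 0.73)        # about 0.14
recipes_gain = relative_improvement(0.52, 0.60)    # about 0.15
```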
Ying Jin, Zhuoran Zhou, Haoquan Fang, Jenq-Neng Hwang
Medical image understanding requires meticulous examination of fine visual details, with particular regions requiring additional attention. While radiologists build such expertise over years of experience, it is challenging for AI models to learn where to look with limited amounts of training data. This limitation results in unsatisfying robustness in medical image understanding. To address this issue, we propose Diffusion-based Feature Augmentation (DAug), a portable method that improves a perception model's performance with a generative model's output. Specifically, we extend a radiology image to multiple channels, with the additional channels being the heatmaps of regions where diseases tend to develop. A diffusion-based image-to-image translation model was used to generate such heatmaps conditioned on selected disease classes. Our method is motivated by the fact that generative models learn the distribution of normal and abnormal images, and such knowledge is complementary to image understanding tasks. In addition, we propose the Image-Text-Class Hybrid Contrastive learning to utilize both text and class labels. With two novel approaches combined, our method surpasses baseline models without changing the model architecture, and achieves state-of-the-art performance on both medical image retrieval and classification tasks.
Xuchan Bao, Judith Yue Li, Zhong Yi Wan, Kun Su, Timo Denk, Joonseok Lee, Dima Kuzmin, Fei Sha
Modern music retrieval systems often rely on fixed representations of user
preferences, limiting their ability to capture users' diverse and uncertain
retrieval needs. To address this limitation, we introduce Diff4Steer, a novel
generative retrieval framework that employs lightweight diffusion models to
synthesize diverse seed embeddings from user queries that represent potential
directions for music exploration. Unlike deterministic methods that map a user
query to a single point in embedding space, Diff4Steer provides a statistical
prior on the target modality (audio) for retrieval, effectively capturing the
uncertainty and multi-faceted nature of user preferences. Furthermore,
Diff4Steer can be steered by image or text inputs, enabling more flexible and
controllable music discovery combined with nearest neighbor search. Our
framework outperforms deterministic regression methods and an LLM-based generative
retrieval baseline in terms of retrieval and ranking metrics, demonstrating its
effectiveness in capturing user preferences, leading to more diverse and
relevant recommendations. Listening examples are available at
tinyurl.com/diff4steer.
Authors' comments: NeurIPS 2024 Creative AI Track
Manish Bhattarai, Ryan Barron, Maksim Eren, Minh Vu, Vesselin Grantcharov, Ismael Boureima, Valentin Stanev, Cynthia Matuszek et al.
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating external document retrieval to provide domain-specific or up-to-date knowledge. The effectiveness of RAG depends on the relevance of retrieved documents, which is influenced by the semantic alignment of embeddings with the domain's specialized content. Although full fine-tuning can align language models to specific domains, it is computationally intensive and demands substantial data. This paper introduces Hierarchical Embedding Alignment Loss (HEAL), a novel method that leverages hierarchical fuzzy clustering with matrix factorization within contrastive learning to efficiently align LLM embeddings with domain-specific content. HEAL computes level/depth-wise contrastive losses and incorporates hierarchical penalties to align embeddings with the underlying relationships in label hierarchies. This approach enhances retrieval relevance and document classification, effectively reducing hallucinations in LLM outputs. In our experiments, we benchmark and evaluate HEAL across diverse domains, including Healthcare, Material Science, Cyber-security, and Applied Maths.
Fnu Neha, Deepshikha Bhati, Deepak Kumar Shukla, Angela Guercio, Ben Ward
The rapid development of Artificial Intelligence (AI) has led to the creation of powerful text generation models, such as large language models (LLMs), which are widely used for diverse applications. However, concerns surrounding AI-generated content, including issues of originality, bias, misinformation, and accountability, have become increasingly prominent. This paper offers a comprehensive overview of AI text generators (AITGs), focusing on their evolution, capabilities, and ethical implications. This paper also introduces Retrieval-Augmented Generation (RAG), a recent approach that improves the contextual relevance and accuracy of text generation by integrating dynamic information retrieval. RAG addresses key limitations of traditional models, including their reliance on static knowledge and potential inaccuracies in handling real-world data. Additionally, the paper reviews detection tools that help differentiate AI-generated text from human-written content and discusses the ethical challenges these technologies pose. The paper explores future directions for improving detection accuracy, supporting ethical AI development, and increasing accessibility. The paper contributes to a more responsible and reliable use of AI in content creation through these discussions.
Yang Chen, Cheng Cheng
Conjugate phase retrieval considers the recovery of a function, up to a unimodular constant and conjugation, from its phaseless measurements. In this paper, we explore the conjugate phase retrieval in a shift-invariant space generated by a Gaussian function. First, we show that the modulus function in the Gaussian shift-invariant space can be determined from the phaseless Hermite samples taken on a discrete sampling set. We then show that a function in the shift-invariant space generated by a Gaussian can be uniquely determined, up to a unimodular constant and conjugation, from its phaseless Hermite samples on a discrete set. For the functions with finite coefficient sequences, we provide an explicit reconstruction procedure.
Kaustubh Sridhar, Souradeep Dutta, Dinesh Jayaraman, Insup Lee
Building generalist agents that can rapidly adapt to new environments is a
key challenge for deploying AI in the digital and real worlds. Is scaling
current agent architectures the most effective way to build generalist agents?
We propose a novel approach to pre-train relatively small policies on
relatively small datasets and adapt them to unseen environments via in-context
learning, without any finetuning. Our key idea is that retrieval offers a
powerful bias for fast adaptation. Indeed, we demonstrate that even a simple
retrieval-based 1-nearest neighbor agent offers a surprisingly strong baseline
for today's state-of-the-art generalist agents. From this starting point, we
construct a semi-parametric agent, REGENT, that trains a transformer-based
policy on sequences of queries and retrieved neighbors. REGENT can generalize
to unseen robotics and game-playing environments via retrieval augmentation and
in-context learning, achieving this with up to 3x fewer parameters and up to an
order-of-magnitude fewer pre-training datapoints, significantly outperforming
today's state-of-the-art generalist agents. Website:
https://kaustubhsridhar.github.io/regent-research
Authors' comments: ICLR 2025 Oral, NeurIPS 2024 Workshops on Adaptive Foundation Models
(AFM) and Open World Agents (OWA), 30 pages
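The 1-nearest neighbor baseline mentioned above can be illustrated with a short sketch. The abstract does not give the retrieval details, so this is an assumption-laden toy: states are flat numeric vectors, distance is Euclidean, and the names `Demo` and `nearest_neighbor_action` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A demonstration pairs an observed state with the action taken in it (assumed format).
Demo = Tuple[Sequence[float], str]


def nearest_neighbor_action(state: Sequence[float], demos: List[Demo]) -> str:
    """Return the action of the demonstration whose state is closest to `state`.

    Euclidean distance is an assumption; any other state metric could be used.
    """
    if not demos:
        raise ValueError("at least one demonstration is required")
    _, action = min(demos, key=lambda demo: math.dist(state, demo[0]))
    return action
```

Such an agent has no trained parameters: adapting it to a new environment only requires swapping in demonstrations from that environment.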
Cristian-George Crăciun, Răzvan-Alexandru Smădu, Dumitru-Clementin Cercel, Mihaela-Claudia Cercel
Pre-trained Language Models (PLMs) have shown remarkable performance in
recent years, setting a new paradigm for NLP research and industry. The legal
domain has received some attention from the NLP community partly due to its
textual nature. Some tasks from this domain are represented by
question-answering (QA) tasks. This work explores the legal domain
Multiple-Choice QA (MCQA) for a low-resource language. The contribution of this
work is multi-fold. We first introduce JuRO, the first openly available
Romanian legal MCQA dataset, comprising three different examinations and a
total of 10,836 questions. Along with this dataset, we introduce CROL,
an organized corpus of laws that has a total of 93 distinct documents with
their modifications from 763 time spans, that we leveraged in this work for
Information Retrieval (IR) techniques. Moreover, we are the first to propose
Law-RoG, a Knowledge Graph (KG) for the Romanian language, and this KG is
derived from the aforementioned corpus. Lastly, we propose a novel approach for
MCQA, Graph Retrieval Augmented by Facts (GRAF), which achieves competitive
results with generally accepted SOTA methods and even exceeds them in most
settings.
Authors' comments: Accepted to ACL 2025 Findings
Leah Bar, Boaz Lerner, Nir Darshan, Rami Ben-Ari
Active Learning (AL) is a user-interactive approach aimed at reducing
annotation costs by selecting the most crucial examples to label. Although AL
has been extensively studied for image classification tasks, the specific
scenario of interactive image retrieval has received relatively little
attention. This scenario presents unique characteristics, including an open-set
and class-imbalanced binary classification, starting with very few labeled
samples. We introduce a novel batch-mode Active Learning framework named GAL
(Greedy Active Learning) that better copes with this application. It
incorporates a new acquisition function for sample selection that measures the
impact of each unlabeled sample on the classifier. We further embed this
strategy in a greedy selection approach, better exploiting the samples within
each batch. We evaluate our framework with both linear (SVM) and non-linear
MLP/Gaussian Process classifiers. For the Gaussian Process case, we show a
theoretical guarantee on the greedy approximation. Finally, we assess our
performance for the interactive content-based image retrieval task on several
benchmarks and demonstrate its superiority over existing approaches and common
baselines. Code is available at https://github.com/barleah/GreedyAL.
Authors' comments: Accepted to Transactions on Machine Learning Research (TMLR)
James Allan, Eunsol Choi, Daniel P. Lopresti, Hamed Zamani
In the fast-evolving field of information retrieval (IR), the integration of generative AI technologies such as large language models (LLMs) is transforming how users search for and interact with information. Recognizing this paradigm shift at the intersection of IR and generative AI (IR-GenAI), a visioning workshop supported by the Computing Community Consortium (CCC) was held in July 2024 to discuss the future of IR in the age of generative AI. This workshop convened 44 experts in information retrieval, natural language processing, human-computer interaction, and artificial intelligence from academia, industry, and government to explore how generative AI can enhance IR and vice versa, and to identify the major challenges and opportunities in this rapidly advancing field. This report summarizes the discussions as potentially important research topics and provides a list of recommendations for academics, industry practitioners, institutions, evaluation campaigns, and funding agencies.
Roman Colman, Minh Vu, Manish Bhattarai, Martin Ma, Hari Viswanathan, Daniel O'Malley, Javier E. Santos
For decades, corporations and governments have relied on scanned documents to
record vast amounts of information. However, extracting this information is a
slow and tedious process due to the overwhelming number of documents. The rise
of vision language models presents a way to efficiently and accurately extract
the information out of these documents. The current automated workflow often
requires a two-step approach involving the extraction of information using
optical character recognition software, and subsequent usage of large language
models for processing this information. Unfortunately, these methods encounter
significant challenges when dealing with noisy scanned documents. The high
information density of such documents often necessitates using computationally
expensive language models to effectively reduce noise.
In this study, we propose PatchFinder, an algorithm that builds upon Vision
Language Models (VLMs) to address the information extraction task. First, we
devise a confidence-based score, called Patch Confidence, based on the Maximum
Softmax Probability of the VLMs' output to measure the model's confidence in
its predictions. Then, PatchFinder utilizes that score to determine a suitable
patch size, partition the input document into overlapping patches of that size,
and generate confidence-based predictions for the target information. Our
experimental results show that PatchFinder can leverage Phi-3v, a 4.2 billion
parameter vision language model, to achieve an accuracy of 94% on our dataset
of 190 noisy scanned documents, surpassing the performance of ChatGPT-4o by
18.5 percentage points.
Authors' comments: This paper has been accepted to IEEE/CVF Winter Conference on
Applications of Computer Vision (WACV) 2025
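The two ingredients described above, a confidence score built on Maximum Softmax Probability and a partition of the page into overlapping patches, can be sketched as follows. This is not the PatchFinder implementation: averaging per-token probabilities into a single score, the box format `(left, top, right, bottom)`, and all function names are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def max_softmax_probability(logits: Sequence[float]) -> float:
    """Largest softmax probability for one output token (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    return max(exps) / sum(exps)


def patch_confidence(token_logits: Sequence[Sequence[float]]) -> float:
    """Mean per-token MSP over a generated answer (the averaging is an assumption)."""
    return sum(max_softmax_probability(l) for l in token_logits) / len(token_logits)


def _starts(length: int, patch: int, stride: int) -> List[int]:
    """Patch start offsets along one axis, always covering the far edge."""
    last = max(length - patch, 0)
    starts = list(range(0, last + 1, stride))
    if starts[-1] != last:
        starts.append(last)
    return starts


def overlapping_patches(
    width: int, height: int, patch: int, overlap: int
) -> List[Tuple[int, int, int, int]]:
    """Boxes (left, top, right, bottom) of size `patch` sharing `overlap` pixels."""
    stride = patch - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the patch size")
    return [
        (left, top, min(left + patch, width), min(top + patch, height))
        for top in _starts(height, patch, stride)
        for left in _starts(width, patch, stride)
    ]
```

One way to use these pieces is to run the model on each box, score each prediction with `patch_confidence`, and keep the highest-scoring answer; how the patch size itself is chosen is not covered by this sketch.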
Huang Xie, Khazar Khorrami, Okko Räsänen, Tuomas Virtanen
This paper proposes to use similarities of audio captions for estimating audio-caption relevances to be used for training text-based audio retrieval systems. Current audio-caption datasets (e.g., Clotho) contain audio samples paired with annotated captions, but lack relevance information about audio samples and captions beyond the annotated ones. Besides, mainstream approaches (e.g., CLAP) usually treat the annotated pairs as positives and consider all other audio-caption combinations as negatives, assuming a binary relevance between audio samples and captions. To infer the relevance between audio samples and arbitrary captions, we propose a method that computes non-binary audio-caption relevance scores based on the textual similarities of audio captions. We measure textual similarities of audio captions by calculating the cosine similarity of their Sentence-BERT embeddings and then transform these similarities into audio-caption relevance scores using a logistic function, thereby linking audio samples through their annotated captions to all other captions in the dataset. To integrate the computed relevances into training, we employ a listwise ranking objective, where relevance scores are converted into probabilities of ranking audio samples for a given textual query. We show the effectiveness of the proposed method by demonstrating improvements in text-based audio retrieval compared to methods that use binary audio-caption relevances for training.
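The pipeline described above (cosine similarity of caption embeddings, a logistic transform into relevance scores, then a listwise ranking objective) can be sketched in a few lines. This is an illustrative reading under stated assumptions: the embeddings are plain lists standing in for Sentence-BERT vectors, the logistic `slope` and `midpoint` values are hypothetical, and the sum-normalized target distribution with a cross-entropy loss is one common listwise formulation rather than the paper's confirmed objective.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / norm


def logistic_relevance(similarity: float, slope: float = 10.0, midpoint: float = 0.5) -> float:
    """Map a similarity to a relevance in (0, 1); slope and midpoint are hypothetical."""
    return 1.0 / (1.0 + math.exp(-slope * (similarity - midpoint)))


def target_distribution(relevances: Sequence[float]) -> List[float]:
    """Normalize relevance scores into ranking probabilities for one text query."""
    total = sum(relevances)
    return [r / total for r in relevances]


def listwise_loss(model_scores: Sequence[float], relevances: Sequence[float]) -> float:
    """Cross-entropy between the target distribution and softmax(model_scores)."""
    peak = max(model_scores)
    log_z = peak + math.log(sum(math.exp(s - peak) for s in model_scores))
    target = target_distribution(relevances)
    return -sum(t * (s - log_z) for t, s in zip(target, model_scores))
```

With binary relevance, the target distribution puts all of its mass on the annotated pair; graded scores spread some of that mass onto audio clips whose captions are textually similar to the query.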
Xiaqiang Tang, Qiang Gao, Jian Li, Nan Du, Qi Li, Sihong Xie
Retrieval Augmented Generation (RAG) has proven to be highly effective in
boosting the generative performance of language models in knowledge-intensive
tasks. However, existing RAG frameworks either indiscriminately perform
retrieval or rely on rigid single-class classifiers to select retrieval
methods, leading to inefficiencies and suboptimal performance across queries of
varying complexity. To address these challenges, we propose a reinforcement
learning-based framework that dynamically selects the most suitable retrieval
strategy based on query complexity. Our approach leverages a
multi-armed bandit algorithm, which treats each retrieval method as a distinct
"arm" and adapts the selection process by balancing exploration and
exploitation. Additionally, we introduce a dynamic reward function that
balances accuracy and efficiency, penalizing methods that require more
retrieval steps, even if they lead to a correct result. Our method achieves new
state-of-the-art results on multiple single-hop and multi-hop datasets while
reducing retrieval costs. Our code is available at
https://github.com/FUTUREEEEEE/MBA.
Authors' comments: COLING 2025
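The bandit idea described above, with each retrieval method as an arm and a reward that penalizes extra retrieval steps, can be sketched with a plain epsilon-greedy bandit. This is a deliberate simplification and not the released code: it ignores the query when choosing an arm, whereas the paper selects strategies based on query complexity, and the class name, arm labels, and `step_penalty` value are hypothetical.

```python
import random
from typing import Dict, Sequence


class EpsilonGreedyBandit:
    """Non-contextual epsilon-greedy bandit over retrieval strategies (a sketch)."""

    def __init__(self, arms: Sequence[str], epsilon: float = 0.1, seed: int = 0) -> None:
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts: Dict[str, int] = {arm: 0 for arm in self.arms}
        self.values: Dict[str, float] = {arm: 0.0 for arm in self.arms}
        self._rng = random.Random(seed)

    def select(self) -> str:
        """Explore a random arm with probability epsilon, else exploit the best one."""
        if self._rng.random() < self.epsilon:
            return self._rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[arm])

    def update(self, arm: str, reward: float) -> None:
        """Update the running mean reward of the arm that was just played."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def step_penalized_reward(correct: bool, steps: int, step_penalty: float = 0.1) -> float:
    """Reward correctness but subtract a cost per retrieval step (penalty is hypothetical)."""
    return (1.0 if correct else 0.0) - step_penalty * steps
```

Under this reward, a strategy that answers correctly in one retrieval step scores higher than one that needs three steps for the same answer, which is the accuracy-efficiency trade-off the abstract describes.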
Tarun Suresh, Revanth Gangi Reddy, Yifei Xu, Zach Nussbaum, Andriy Mulyar, Brandon Duderstadt, Heng Ji
Effective code retrieval plays a crucial role in advancing code generation,
bug fixing, and software maintenance, particularly as software systems increase
in complexity. While current code embedding models have demonstrated promise in
retrieving code snippets for small-scale, well-defined tasks, they often
underperform in more demanding real-world applications such as bug localization
within GitHub repositories. We hypothesize that a key issue is their reliance
on noisy and inconsistent datasets for training, which impedes their ability to
generalize to more complex retrieval scenarios. To address these limitations,
we introduce CoRNStack, a large-scale, high-quality contrastive training
dataset for code that spans multiple programming languages. This dataset is
curated using consistency filtering to eliminate noisy positives and is further
enriched with mined hard negatives, thereby facilitating more effective
learning. We demonstrate that contrastive training of embedding models using
CoRNStack leads to state-of-the-art performance across a variety of code
retrieval tasks. Furthermore, the dataset can be leveraged for training code
reranking models, a largely underexplored area compared to text reranking. Our
finetuned code reranking model significantly improves the ranking quality over
the retrieved results. Finally, by employing our code retriever and reranker
together, we demonstrate significant improvements in function localization for
GitHub issues, an important component of real-world software development.
Authors' comments: Published as a conference paper at ICLR 2025. First and second author
had equal contribution
Abdelrahman Abdallah, Jamshid Mozafari, Bhawna Piryani, Mohammed M. Abdelgwad, Adam Jatowt
This paper presents DynRank, a novel framework for enhancing passage
retrieval in open-domain question-answering systems through dynamic zero-shot
question classification. Traditional approaches rely on static prompts and
pre-defined templates, which may limit model adaptability across different
questions and contexts. In contrast, DynRank introduces a dynamic prompting
mechanism, leveraging a pre-trained question classification model that
categorizes questions into fine-grained types. Based on these classifications,
contextually relevant prompts are generated, enabling more effective passage
retrieval. We integrate DynRank into existing retrieval frameworks and conduct
extensive experiments on multiple QA benchmark datasets.
Authors' comments: Accepted at COLING 2025