Dun Yuan, Hao Zhou, Di Wu, Xue Liu, Hao Chen, Yan Xin, Jianzhong Zhang
Large language models (LLMs) have made significant progress in
general-purpose natural language processing tasks. However, LLMs are still
facing challenges when applied to domain-specific areas like
telecommunications, which demands specialized expertise and adaptability to
evolving standards. This paper presents a novel framework that combines
knowledge graph (KG) and retrieval-augmented generation (RAG) techniques to
enhance LLM performance in the telecom domain. The framework leverages a KG to
capture structured, domain-specific information about network protocols,
standards, and other telecom-related entities, comprehensively representing
their relationships. By integrating KG with RAG, LLMs can dynamically access
and utilize the most relevant and up-to-date knowledge during response
generation. This hybrid approach bridges the gap between structured knowledge
representation and the generative capabilities of LLMs, significantly enhancing
accuracy, adaptability, and domain-specific comprehension. Our results
demonstrate the effectiveness of the KG-RAG framework in addressing complex
technical queries with precision. The proposed KG-RAG model attained an
accuracy of 88% for question answering tasks on a frequently used
telecom-specific dataset, compared to 82% for the RAG-only and 48% for the
LLM-only approaches.
Authors' comments: This work has been accepted to ICC 2025 IEEE International Conference
on Communications. Copyright 2025 IEEE
Mario Alberto Vallejo Reyes
This study analyzes representative Mexican folk vocal melodies using MIDI
feature extraction, examining ambitus, pitch-class entropy, and interval
distribution. It also explores the relationship between these features and song
popularity, as measured by Spotify plays. The study employs MATLAB and the MIDI
Toolbox for extracting musical features and performing statistical analysis.
The findings reveal a significant variation in ambitus, with values ranging
from 8 to 27 semitones, indicating a diverse compositional style and vocal
demand across the genre. The analysis of pitch-class entropy showcases a broad
spectrum of melodic complexity, with Armando Manzanero's 'Somos Novios'
displaying the highest entropy, suggesting varied and complex melodic
structures, while traditional pieces like 'La Bamba' exhibit lower entropy,
indicating simpler, more repetitive patterns. The interval distribution
predominantly features prime intervals (P1) and major and minor seconds (M2, m2),
pointing to a compositional preference for close, contiguous intervals that
contribute to the melodies' accessibility and appeal. Statistical analysis does
not establish a significant correlation between the ambitus or entropy and the
number of Spotify plays.
Authors' comments: 10 pages, 5 figures, 2 tables
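The two melodic features named in the abstract above can be illustrated with a minimal sketch. The study itself uses MATLAB and the MIDI Toolbox; the Python below is only an assumed equivalent. It takes ambitus as the semitone span between the lowest and highest MIDI note, and pitch-class entropy as the unnormalized Shannon entropy (in bits) of the distribution of pitches modulo 12. The toolbox may normalize or weight notes differently, and the example melody is hypothetical.

```python
import math
from collections import Counter


def ambitus(pitches):
    """Span in semitones between the lowest and highest MIDI note numbers."""
    return max(pitches) - min(pitches)


def pitch_class_entropy(pitches):
    """Shannon entropy (bits) of the pitch-class distribution (MIDI pitch mod 12).

    Assumption: every note counts once; durations are ignored.
    """
    counts = Counter(p % 12 for p in pitches)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical melody as MIDI note numbers (not taken from the study's corpus).
melody = [60, 62, 64, 65, 67, 65, 64, 62, 60, 72]
print(ambitus(melody), round(pitch_class_entropy(melody), 3))
```

A melody that keeps returning to a few pitch classes yields a low entropy, while one that spreads evenly over many pitch classes approaches the maximum of log2(12) bits.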
Emmanouil Georgios Lionis, Jia-Huei Ju
Document retrieval is one of the most challenging tasks in Information
Retrieval. It requires handling longer contexts, often resulting in higher
query latency and increased computational overhead. Recently, Learned Sparse
Retrieval (LSR) has emerged as a promising approach to address these
challenges. Some have proposed adapting the LSR approach to longer documents by
aggregating segmented documents using different post-hoc methods, including
n-grams and proximity scores, adjusting representations, and learning to
ensemble all signals. In this study, we aim to reproduce and examine the
mechanisms of adapting LSR for long documents. Our reproducibility experiments
confirmed the importance of specific segments, with the first segment
consistently dominating document retrieval performance. Furthermore, we
re-evaluate recently proposed methods -- ExactSDM and SoftSDM -- across varying
document lengths, from short (up to 2 segments) to longer (3+ segments). We
also designed multiple analyses to probe the reproduced methods and shed light
on the impact of global information on adapting LSR to longer contexts. The
complete code and implementation for this project are available at:
https://github.com/lionisakis/Reproducibilitiy-lsr-long.
Authors' comments: This is a preprint of our paper accepted at ECIR 2025
Rahul Agarwal, Amit Jaspal, Saurabh Gupta, Omkar Vichare
Recommender systems operate in closed feedback loops, where user interactions
reinforce popularity bias, leading to over-recommendation of already popular
items while under-exposing niche or novel content. Existing bias mitigation
methods, such as Inverse Propensity Scoring (IPS) and Off-Policy Correction
(OPC), primarily operate at the ranking stage or during training, lacking
explicit real-time control over exposure dynamics. In this work, we introduce
an exposure-aware retrieval scoring approach, which explicitly models item
exposure probability and adjusts retrieval-stage ranking at inference time.
Unlike prior work, this method decouples exposure effects from engagement
likelihood, enabling controlled trade-offs between fairness and engagement in
large-scale recommendation platforms. We validate our approach through online
A/B experiments in a real-world video recommendation system, demonstrating a
25% increase in uniquely retrieved items and a 40% reduction in the dominance
of over-popular content, all while maintaining overall user engagement levels.
Our results establish a scalable, deployable solution for mitigating popularity
bias at the retrieval stage, offering a new paradigm for bias-aware
personalization.
Authors' comments: 2 pages. UMAP '25: 33rd ACM Conference on User Modeling, Adaptation
and Personalization, New York City, USA, June 2025
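As an illustration of the exposure-aware retrieval scoring described in the abstract above, here is a minimal sketch. The abstract does not give the scoring formula, so the log-penalty form, the `alpha` trade-off parameter, and the field names `engagement` and `exposure` are all assumptions made for this example. The sketch only shows the idea of keeping the engagement estimate and the exposure estimate separate and combining them at inference time.

```python
import math


def exposure_adjusted_score(engagement_score, exposure_prob, alpha=0.5, eps=1e-6):
    """Assumed form: penalize items in proportion to log exposure probability.

    alpha = 0 recovers the pure engagement ranking; larger alpha favors
    under-exposed items. This is an illustrative choice, not the paper's formula.
    """
    return engagement_score - alpha * math.log(exposure_prob + eps)


def retrieve(candidates, k, alpha=0.5):
    """Return the ids of the top-k candidates by exposure-adjusted score."""
    ranked = sorted(
        candidates,
        key=lambda c: exposure_adjusted_score(c["engagement"], c["exposure"], alpha),
        reverse=True,
    )
    return [c["id"] for c in ranked[:k]]


# Hypothetical candidates: one over-exposed item and one niche item.
candidates = [
    {"id": "popular", "engagement": 2.0, "exposure": 0.9},
    {"id": "niche", "engagement": 1.8, "exposure": 0.05},
]
print(retrieve(candidates, k=1, alpha=0.0))
print(retrieve(candidates, k=1, alpha=0.5))
```

Because the exposure term is applied only at scoring time, the trade-off can be tuned through `alpha` without retraining the engagement model, which is the kind of inference-time control the abstract describes.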
Yuelyu Ji, Rui Meng, Zhuochun Li, Daqing He
Multi-hop question answering (QA) requires models to retrieve and reason over multiple pieces of evidence. While Retrieval-Augmented Generation (RAG) has made progress in this area, existing methods often suffer from two key limitations: (1) fixed or overly frequent retrieval steps, and (2) ineffective use of previously retrieved knowledge. We propose MIND (Memory-Informed and INteractive Dynamic RAG), a framework that addresses these challenges through: (i) prompt-based entity extraction to identify reasoning-relevant elements, (ii) dynamic retrieval triggering based on token-level entropy and attention signals, and (iii) memory-aware filtering, which stores high-confidence facts across reasoning steps to enable consistent multi-hop generation.
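The entropy-based part of the dynamic retrieval trigger mentioned in the abstract above can be sketched as follows. This is a simplified illustration, not the MIND implementation: it ignores the attention signals, and the threshold value and function names are assumptions. Retrieval fires when any generated token's next-token distribution has entropy above the threshold, that is, when the model appears uncertain.

```python
import math


def token_entropy(probs):
    """Shannon entropy (nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def should_retrieve(step_distributions, threshold=1.0):
    """Trigger retrieval if any token in the step was generated with high entropy.

    The threshold is a hypothetical tuning parameter.
    """
    return any(token_entropy(p) > threshold for p in step_distributions)


# Hypothetical distributions over a 4-token vocabulary.
confident_step = [[0.97, 0.01, 0.01, 0.01]]
uncertain_step = [[0.25, 0.25, 0.25, 0.25]]
print(should_retrieve(confident_step), should_retrieve(uncertain_step))
```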
Sangam Lee, Ryang Heo, SeongKu Kang, Dongha Lee
Existing dense retrieval models struggle with reasoning-intensive retrieval
tasks as they fail to capture implicit relevance that requires reasoning beyond
surface-level semantic information. To address these challenges, we propose
Scenario-Profiled Indexing with Knowledge Expansion (SPIKE), a dense retrieval
framework that explicitly indexes implicit relevance by decomposing documents
into scenario-based retrieval units. SPIKE organizes documents into scenarios,
which encapsulate the reasoning process necessary to uncover implicit
relationships between hypothetical information needs and document content.
SPIKE constructs a scenario-augmented dataset using a powerful teacher large
language model (LLM), then distills these reasoning capabilities into a
smaller, efficient scenario generator. During inference, SPIKE incorporates
scenario-level relevance alongside document-level relevance, enabling
reasoning-aware retrieval. Extensive experiments demonstrate that SPIKE
consistently enhances retrieval performance across various query types and
dense retrievers. It also enhances the retrieval experience for users through
scenarios and offers valuable contextual information for LLMs in
retrieval-augmented generation (RAG).
Authors' comments: 9 pages
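To make the inference step of the SPIKE abstract above concrete, the sketch below combines a document-level similarity with the best scenario-level similarity. The abstract does not specify how the two signals are fused, so the linear interpolation, the `weight` parameter, and the use of the maximum over scenarios are assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def combined_score(query_vec, doc_vec, scenario_vecs, weight=0.5):
    """Interpolate document-level and best scenario-level relevance.

    Assumed fusion rule: (1 - weight) * doc + weight * max(scenarios).
    """
    doc_score = cosine(query_vec, doc_vec)
    scenario_score = max((cosine(query_vec, s) for s in scenario_vecs), default=0.0)
    return (1 - weight) * doc_score + weight * scenario_score


# Hypothetical embeddings: the document looks unrelated on the surface,
# but one of its scenarios matches the query.
print(combined_score([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]]))
```

In this toy case the document-level score is zero, so the document is surfaced only through its matching scenario, which mirrors the "implicit relevance" the abstract targets.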
Yichun Feng, Jiawei Wang, Ruikun He, Lu Zhou, Yixue Li
Knowledge graphs and large language models (LLMs) are key tools for biomedical knowledge integration and reasoning, facilitating structured organization of scientific articles and discovery of complex semantic relationships. However, current methods face challenges: knowledge graph construction is limited by complex terminology, data heterogeneity, and rapid knowledge evolution, while LLMs show limitations in retrieval and reasoning, making it difficult to uncover cross-document associations and reasoning pathways. To address these issues, we propose a pipeline that uses LLMs to construct a biomedical knowledge graph (BioStrataKG) from large-scale articles and builds a cross-document question-answering dataset (BioCDQA) to evaluate latent knowledge retrieval and multi-hop reasoning. We then introduce Integrated and Progressive Retrieval-Augmented Reasoning (IP-RAR) to enhance retrieval accuracy and knowledge reasoning. IP-RAR maximizes information recall through Integrated Reasoning-based Retrieval and refines knowledge via Progressive Reasoning-based Generation, using self-reflection to achieve deep thinking and precise contextual understanding. Experiments show that IP-RAR improves document retrieval F1 score by 20% and answer generation accuracy by 25% over existing methods. This framework helps doctors efficiently integrate treatment evidence for personalized medication plans and enables researchers to analyze advancements and research gaps, accelerating scientific discovery and decision-making.
Xueqing Liu, Jiangrui Zheng, Guanqun Yang, Siyan Wen, Qiushi Liu
In recent years, the rapid increase in security vulnerabilities has caused major challenges in managing them. One critical task in vulnerability management is tracing the patches that fix a vulnerability. By accurately tracing the patching commits, security stakeholders can precisely identify affected software components, determine vulnerable and fixed versions, assess the severity, etc., which facilitates rapid deployment of mitigations. However, previous work has shown that patch information is often missing in vulnerability databases, including both the National Vulnerability Database (NVD) and the GitHub Advisory Database, which increases the risk of delayed mitigation, incorrect vulnerability assessment, and potential exploits. Although existing work has proposed several approaches for patch tracing, they suffer from two major challenges: (1) the lack of scalability to the full-repository level, and (2) the lack of study on how to model the semantic similarity between the CVE and the full diff code. Upon identifying this gap, we propose SITPatchTracer, a scalable full-repo full-context retrieval system for security vulnerability patch tracing. SITPatchTracer leverages ElasticSearch, learning-to-rank, and a hierarchical embedding approach based on GritLM, a top-ranked LLM for text embedding with unlimited context length and fast inference speed. The evaluation of SITPatchTracer shows that it achieves a high recall on both evaluated datasets. SITPatchTracer's recall outperforms not only several existing works (PatchFinder, PatchScout, VFCFinder), but also Voyage, the SOTA commercial code embedding API, by 13% and 28%.
Yi-Ting Shen, Sungmin Eum, Doheon Lee, Rohit Shete, Chiao-Yi Wang, Heesung Kwon, Shuvra S. Bhattacharyya
Composed pose retrieval (CPR) enables users to search for human poses by specifying a reference pose and a transition description, but progress in this field is hindered by the scarcity and inconsistency of annotated pose transitions. Existing CPR datasets rely on costly human annotations or heuristic-based rule generation, both of which limit scalability and diversity. In this work, we introduce AutoComPose, the first framework that leverages multimodal large language models (MLLMs) to automatically generate rich and structured pose transition descriptions. Our method enhances annotation quality by structuring transitions into fine-grained body part movements and introducing mirrored/swapped variations, while a cyclic consistency constraint ensures logical coherence between forward and reverse transitions. To advance CPR research, we construct and release two dedicated benchmarks, AIST-CPR and PoseFixCPR, supplementing prior datasets with enhanced attributes. Extensive experiments demonstrate that training retrieval models with AutoComPose yields superior performance over human-annotated and heuristic-based methods, significantly reducing annotation costs while improving retrieval quality. Our work pioneers the automatic annotation of pose transitions, establishing a scalable foundation for future CPR research.
Bruno Coelho, Shujaat Mirza, Yuyuan Cui, Christina Pöpper, Damon McCoy
Fact-checking is a potentially useful application of Large Language Models (LLMs) to combat the growing dissemination of disinformation. However, the performance of LLMs varies across geographic regions. In this paper, we evaluate the factual accuracy of open and private models across a diverse set of regions and scenarios. Using a dataset containing 600 fact-checked statements balanced across six global regions, we examine three experimental setups of fact-checking a statement: (1) when just the statement is available, (2) when an LLM-based agent with Wikipedia access is utilized, and (3) as a best-case scenario, when a Retrieval-Augmented Generation (RAG) system provided with the official fact check is employed. Our findings reveal that regardless of the scenario and LLM used, including GPT-4, Claude Sonnet, and LLaMA, models perform substantially better on statements from the Global North than on those from the Global South. Furthermore, this gap is broadened for the more realistic case of a Wikipedia agent-based system, highlighting that overly general knowledge bases have a limited ability to address region-specific nuances. These results underscore the urgent need for better dataset balancing and robust retrieval strategies to enhance LLM fact-checking capabilities, particularly in geographically diverse contexts.
Andreas Chari, Sean MacAvaney, Iadh Ounis
Globalisation and colonisation have led the vast majority of the world to use
only a fraction of languages, such as English and French, to communicate,
excluding many others. This has severely affected the survivability of many
now-deemed vulnerable or endangered languages, such as Occitan and Sicilian.
These languages often share some characteristics, such as elements of their
grammar and lexicon, with other high-resource languages, e.g. French or
Italian. They can be clustered into groups of language varieties with various
degrees of mutual intelligibility. Current search systems are not usually
trained on many of these low-resource varieties, leading search users to
express their needs in a high-resource language instead. This problem is
further complicated when most information content is expressed in a
high-resource language, inhibiting even more retrieval in low-resource
languages. We show that current search systems are not robust across language
varieties, severely affecting retrieval effectiveness. Therefore, it would be
desirable for these systems to leverage the capabilities of neural models to
bridge the differences between these varieties. This can allow users to express
their needs in their low-resource variety and retrieve the most relevant
documents in a high-resource one. To address this, we propose fine-tuning
neural rankers on pairs of language varieties, thereby exposing them to their
linguistic similarities. We find that this approach improves the performance of
the varieties upon which the models were directly trained, thereby regularising
these models to generalise and perform better even on unseen language variety
pairs. We also explore whether this approach can transfer across language
families and observe mixed results that open doors for future research.
Authors' comments: 12 Pages, 5 Figures, 2 Tables, Full Paper accepted at IR4GOOD track
in ECIR 2025
Magdalena Kaiser, Gerhard Weikum
Conversational Question Answering (ConvQA) involves multiple subtasks: i) to
understand incomplete questions in their context, ii) to retrieve relevant
information, and iii) to generate answers. This work presents PRAISE, a
pipeline-based approach for ConvQA that trains LLM adapters for each of the
three subtasks. As labeled training data for individual subtasks is unavailable
in practice, PRAISE learns from its own generations using the final answering
performance as a feedback signal without human intervention and treats
intermediate information, like relevant evidence, as weakly labeled data. We
apply Direct Preference Optimization by contrasting successful and unsuccessful
samples for each subtask. In our experiments, we show the effectiveness of this
training paradigm: PRAISE shows improvements per subtask and achieves new
state-of-the-art performance on a popular ConvQA benchmark, gaining 15.5
percentage points in precision over baselines.
Authors' comments: WWW 2025 Short Paper, 5 pages
Min Cao, ZiYin Zeng, YuXin Lu, Mang Ye, Dong Yi, Jinqiao Wang
Data plays a pivotal role in Text-Based Person Retrieval (TBPR) research.
The mainstream research paradigm necessitates real-world person images with manual
textual annotations for training models, posing privacy-sensitive and
labor-intensive issues. Several pioneering efforts explore synthetic data for
TBPR but still rely on real data, keeping the aforementioned issues and also
resulting in a diversity-deficient issue in synthetic datasets, thus impacting
TBPR performance. Moreover, these works tend to explore synthetic data for TBPR
through limited perspectives, leading to an exploration-restricted issue. In this
paper, we conduct an empirical study to explore the potential of synthetic data
for TBPR, highlighting three key aspects. (1) We propose an inter-class image
generation pipeline, in which an automatic prompt construction strategy is
introduced to guide generative Artificial Intelligence (AI) models in
generating various inter-class images without reliance on original data. (2) We
develop an intra-class image augmentation pipeline, in which the generative AI
models are applied to further edit the images for obtaining various intra-class
images. (3) Building upon the proposed pipelines and an automatic text
generation pipeline, we explore the effectiveness of synthetic data in diverse
scenarios through extensive experiments. Additionally, we experimentally
investigate various noise-robust learning strategies to mitigate the inherent
noise in synthetic data. We will release the code, along with the synthetic
large-scale dataset generated by our pipelines, which are expected to advance
practical TBPR research.
Authors' comments: 20 pages, 13 figures
Cheng Wang, Yiwei Wang, Yujun Cai, Bryan Hooi
Retrieval-augmented generation (RAG) systems enhance large language models by
incorporating external knowledge, addressing issues like outdated internal
knowledge and hallucination. However, their reliance on external knowledge
bases makes them vulnerable to corpus poisoning attacks, where adversarial
passages can be injected to manipulate retrieval results. Existing methods for
crafting such passages, such as random token replacement or training inversion
models, are often slow and computationally expensive, requiring either access
to the retriever's gradients or large computational resources. To address these
limitations, we propose Dynamic Importance-Guided Genetic Algorithm (DIGA), an
efficient black-box method that leverages two key properties of retrievers:
insensitivity to token order and bias towards influential tokens. By focusing
on these characteristics, DIGA dynamically adjusts its genetic operations to
generate effective adversarial passages with significantly reduced time and
memory usage. Our experimental evaluation shows that DIGA achieves superior
efficiency and scalability compared to existing methods, while maintaining
comparable or better attack success rates across multiple datasets.
Authors' comments: Accepted to NAACL 2025 Main Track
Jincheng Yan, Yun Wang, Xiaoyan Luo, Yu-Wing Tai
Person re-identification (ReID) plays a critical role in applications like security surveillance and criminal investigations by matching individuals across large image galleries captured by non-overlapping cameras. Traditional ReID methods rely on unimodal inputs, typically images, but face limitations due to challenges like occlusions, lighting changes, and pose variations. While advancements in image-based and text-based ReID systems have been made, the integration of both modalities has remained under-explored. This paper presents FusionSegReID, a multimodal model that combines both image and text inputs for enhanced ReID performance. By leveraging the complementary strengths of these modalities, our model improves matching accuracy and robustness, particularly in complex, real-world scenarios where one modality may struggle. Our experiments show significant improvements in Top-1 accuracy and mean Average Precision (mAP) for ReID, as well as better segmentation results in challenging scenarios like occlusion and low-quality images. Ablation studies further confirm that multimodal fusion and segmentation modules contribute to enhanced re-identification and mask accuracy. The results show that FusionSegReID outperforms traditional unimodal models, offering a more robust and flexible solution for real-world person ReID tasks.
Zixu Li, Zhiheng Fu, Yupeng Hu, Zhiwei Chen, Haokun Wen, Liqiang Nie
Composed Image Retrieval (CIR) facilitates image retrieval through a multimodal query consisting of a reference image and modification text. The reference image defines the retrieval context, while the modification text specifies desired alterations. However, existing CIR datasets predominantly employ coarse-grained modification text (CoarseMT), which inadequately captures fine-grained retrieval intents. This limitation introduces two key challenges: (1) ignoring detailed differences leads to imprecise positive samples, and (2) greater ambiguity arises when retrieving visually similar images. These issues degrade retrieval accuracy, necessitating manual result filtering or repeated queries. To address these limitations, we develop a robust fine-grained CIR data annotation pipeline that minimizes imprecise positive samples and enhances CIR systems' ability to discern modification intents accurately. Using this pipeline, we refine the FashionIQ and CIRR datasets to create two fine-grained CIR datasets: Fine-FashionIQ and Fine-CIRR. Furthermore, we introduce FineCIR, the first CIR framework explicitly designed to parse the modification text. FineCIR effectively captures fine-grained modification semantics and aligns them with ambiguous visual entities, enhancing retrieval precision. Extensive experiments demonstrate that FineCIR consistently outperforms state-of-the-art CIR baselines on both fine-grained and traditional CIR benchmark datasets. Our FineCIR code and fine-grained CIR datasets are available at https://github.com/SDU-L/FineCIR.git.
Reza Qorbani, Gianluca Villani, Theodoros Panagiotakopoulos, Marc Botet Colomer, Linus Härenstam-Nielsen, Mattia Segu, Pier Luigi Dovesi, Jussi Karlgren et al.
Open-vocabulary semantic segmentation models associate vision and text to
label pixels from an undefined set of classes using textual queries, providing
versatile performance on novel datasets. However, large shifts between training
and test domains degrade their performance, requiring fine-tuning for effective
real-world applications. We introduce Semantic Library Adaptation (SemLA), a
novel framework for training-free, test-time domain adaptation. SemLA leverages
a library of LoRA-based adapters indexed with CLIP embeddings, dynamically
merging the most relevant adapters based on proximity to the target domain in
the embedding space. This approach constructs an ad-hoc model tailored to each
specific input without additional training. Our method scales efficiently,
enhances explainability by tracking adapter contributions, and inherently
protects data privacy, making it ideal for sensitive applications.
Comprehensive experiments on a 20-domain benchmark built over 10 standard
datasets demonstrate SemLA's superior adaptability and performance across
diverse settings, establishing a new standard in domain adaptation for
open-vocabulary semantic segmentation.
Authors' comments: CVPR 2025. Project page: https://thegoodailab.org/semla Code:
https://github.com/rezaqorbani/SemLA
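The adapter selection and merging step described in the SemLA abstract above can be sketched in a few lines. This is an assumed simplification, not the released implementation: adapters are represented as flat parameter lists, proximity is Euclidean distance between embeddings, and the merge is a softmax-weighted average over the `k` nearest adapters. The names `merge_adapters`, `k`, and `temperature`, and the toy library entries, are hypothetical.

```python
import math


def merge_adapters(query_emb, library, k=2, temperature=1.0):
    """Merge the k adapters whose index embeddings are closest to the query.

    library: list of (name, index_embedding, adapter_params) tuples, where
    adapter_params is a flat list of floats standing in for LoRA weights.
    Returns (merged_params, weights); weights records each adapter's share.
    """
    scored = [(math.dist(query_emb, emb), name, params) for name, emb, params in library]
    nearest = sorted(scored, key=lambda t: t[0])[:k]

    # Softmax over negative distances: closer adapters get larger weights.
    logits = [-d / temperature for d, _, _ in nearest]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    weights = {name: e / total for (_, name, _), e in zip(nearest, exps)}

    dim = len(nearest[0][2])
    merged = [
        sum(weights[name] * params[i] for _, name, params in nearest)
        for i in range(dim)
    ]
    return merged, weights


# Hypothetical library of three domain adapters indexed by 2-D embeddings.
library = [
    ("fog", [0.0, 0.0], [1.0, 1.0]),
    ("night", [10.0, 0.0], [3.0, 3.0]),
    ("snow", [0.0, 10.0], [5.0, 5.0]),
]
merged, weights = merge_adapters([5.0, 0.0], library, k=2)
print(merged, weights)
```

Returning the per-adapter weights alongside the merged parameters reflects the explainability point in the abstract: the contribution of each adapter to a given input can be inspected directly.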
Zhicheng Lee, Shulin Cao, Jinxin Liu, Jiajie Zhang, Weichuan Liu, Xiaoyin Che, Lei Hou, Juanzi Li
Large Reasoning Models (LRMs) exhibit remarkable reasoning abilities but rely primarily on parametric knowledge, limiting factual accuracy. While recent works equip reinforcement learning (RL)-based LRMs with retrieval capabilities, they suffer from overthinking and lack robustness in reasoning, reducing their effectiveness in question answering (QA) tasks. To address this, we propose ReaRAG, a factuality-enhanced reasoning model that explores diverse queries without excessive iterations. Our solution includes a novel data construction framework with an upper bound on the reasoning chain length. Specifically, we first leverage an LRM to generate deliberate thinking, then select an action from a predefined action space (Search and Finish). For the Search action, a query is executed against the RAG engine, and the result is returned as an observation to guide later reasoning steps. This process iterates until a Finish action is chosen. Benefiting from ReaRAG's strong reasoning capabilities, our approach outperforms existing baselines on multi-hop QA. Further analysis highlights its strong reflective ability to recognize errors and refine its reasoning trajectory. Our study enhances LRMs' factuality while effectively integrating robust reasoning for Retrieval-Augmented Generation (RAG).
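The thought, action, observation loop described in the ReaRAG abstract above can be sketched as a bounded iteration. The control flow follows the abstract (Search queries the RAG engine, Finish ends the loop, and the chain length has an upper bound), but everything else is an assumption: `policy` stands in for the reasoning model, `rag_engine` for the retrieval backend, and the toy functions below are hypothetical stubs.

```python
def run_reasoning_loop(question, policy, rag_engine, max_steps=5):
    """Iterate thought -> action -> observation until Finish or max_steps.

    policy(question, history) returns (thought, action, argument), where
    action is "search" (argument is a query) or "finish" (argument is the answer).
    Returns (answer, history); answer is None if the step bound is reached.
    """
    history = []
    for _ in range(max_steps):
        thought, action, argument = policy(question, history)
        if action == "finish":
            return argument, history
        observation = rag_engine(argument)
        history.append((thought, argument, observation))
    return None, history


def toy_rag(query):
    """Stub retrieval backend backed by a tiny in-memory fact table."""
    facts = {"capital of France": "Paris"}
    return facts.get(query, "no result")


def toy_policy(question, history):
    """Stub reasoning model: search once, then finish with the observation."""
    if not history:
        return ("I need to look this up.", "search", "capital of France")
    return ("The observation answers the question.", "finish", history[-1][2])


answer, trace = run_reasoning_loop("What is the capital of France?", toy_policy, toy_rag)
print(answer, len(trace))
```

The `max_steps` bound plays the role of the upper limit on reasoning chain length: a policy that never chooses Finish still terminates, which is one simple way to guard against the overthinking the abstract mentions.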
Yunhai Hu, Yilun Zhao, Chen Zhao, Arman Cohan
We introduce MCTS-RAG, a novel approach that enhances the reasoning capabilities of small language models on knowledge-intensive tasks by leveraging retrieval-augmented generation (RAG) to provide relevant context and Monte Carlo Tree Search (MCTS) to refine reasoning paths. MCTS-RAG dynamically integrates retrieval and reasoning through an iterative decision-making process. Unlike standard RAG methods, which typically retrieve information independently from reasoning and thus integrate knowledge suboptimally, or conventional MCTS reasoning, which depends solely on internal model knowledge without external facts, MCTS-RAG combines structured reasoning with adaptive retrieval. This integrated approach enhances decision-making, reduces hallucinations, and ensures improved factual accuracy and response consistency. The experimental results on multiple reasoning and knowledge-intensive datasets (i.e., ComplexWebQA, GPQA, and FoolMeTwice) show that our method enables small-scale LMs to achieve performance comparable to frontier LLMs like GPT-4o by effectively scaling inference-time compute, setting a new standard for reasoning in small-scale models.
Aishwarya Parab, Prakhar Pradhan, Yogesh Simmhan, Arnab K. Paul
The increasing availability of data from diverse sources, including trusted entities such as governments, as well as untrusted crowd-sourced contributors, demands a secure and trustworthy environment for storage and retrieval. Blockchain, as a distributed and immutable ledger, offers a promising solution to address these challenges. This short paper studies the feasibility of a blockchain-based framework for secure data storage and retrieval across trusted and untrusted sources, focusing on provenance, storage mechanisms, and smart contract security. Through initial experiments using Hyperledger Fabric (HLF), we evaluate the storage efficiency, scalability, and feasibility of the proposed approach. This study serves as a motivation for future research to develop a comprehensive blockchain-based storage and retrieval framework.