Moazzam Umer Gondal, Hamad Ul Qudous, Daniya Siddiqui, Asma Ahmad Farhan
This paper introduces the retrieval-augmented framework for automatic fashion caption and hashtag generation, combining multi-garment detection, attribute reasoning, and Large Language Model (LLM) prompting. The system aims to produce visually grounded, descriptive, and stylistically interesting text for fashion imagery, overcoming the limitations of end-to-end captioners that have problems with attribute fidelity and domain generalization. The pipeline combines a YOLO-based detector for multi-garment localization, k-means clustering for dominant color extraction, and a CLIP-FAISS retrieval module for fabric and gender attribute inference based on a structured product index. These attributes, together with retrieved style examples, create a factual evidence pack that is used to guide an LLM to generate human-like captions and contextually rich hashtags. A fine-tuned BLIP model is used as a supervised baseline model for comparison. Experimental results show that the YOLO detector is able to obtain a mean Average Precision (mAP@0.5) of 0.71 for nine categories of garments. The RAG-LLM pipeline generates expressive attribute-aligned captions and achieves mean attribute coverage of 0.80 with full coverage at the 50% threshold in hashtag generation, whereas BLIP gives higher lexical overlap and lower generalization. The retrieval-augmented approach exhibits better factual grounding, less hallucination, and great potential for scalable deployment in various clothing domains. These results demonstrate the use of retrieval-augmented generation as an effective and interpretable paradigm for automated and visually grounded fashion content generation.
Authors' comments: Submitted to Expert Systems with Applications
Tarun Prajapati
Modern Retrieval-Augmented Generation (RAG) systems struggle with a fundamental architectural tension: vector indices are optimized for query latency but poorly handle continuous knowledge updates, while data lakes excel at versioning but introduce query latency penalties. We introduce LiveVectorLake, a dual-tier temporal knowledge base architecture that enables real-time semantic search on current knowledge while maintaining complete version history for compliance, auditability, and point-in-time retrieval. The system introduces three core architectural contributions: (1) Content-addressable chunk-level synchronization using SHA-256 hashing for deterministic change detection without external state tracking; (2) Dual-tier storage separating hot-tier vector indices (Milvus with HNSW) from cold-tier columnar versioning (Delta Lake with Parquet), optimizing query latency and storage cost independently; (3) Temporal query routing enabling point-in-time knowledge retrieval via delta-versioning with ACID consistency across tiers. Evaluation on a 100-document corpus versioned across five time points demonstrates: (i) 10-15% re-processing of content during updates compared to 100% for full re-indexing; (ii) sub-100ms retrieval latency on current knowledge; (iii) sub-2s latency for temporal queries across version history; and (iv) storage cost optimization through hot/cold tier separation (only current chunks in expensive vector indices). The approach enables production RAG deployments requiring simultaneous optimization for query performance, update efficiency, and regulatory compliance. Code and resources: [https://github.com/praj-tarun/LiveVectorLake]
Authors' comments: 7 pages, 1 figure. Preprint; work in progress
Guillaume Braun, Bruno Loureiro, Ha Quang Minh, Masaaki Imaizumi
Scaling laws describe how learning performance improves with data, compute, or training time, and have become a central theme in modern deep learning. We study this phenomenon in a canonical nonlinear model: phase retrieval with anisotropic Gaussian inputs whose covariance spectrum follows a power law. Unlike the isotropic case, where dynamics collapse to a two-dimensional system, anisotropy yields a qualitatively new regime in which an infinite hierarchy of coupled equations governs the evolution of the summary statistics. We develop a tractable reduction that reveals a three-phase trajectory: (i) fast escape from low alignment, (ii) slow convergence of the summary statistics, and (iii) spectral-tail learning in low-variance directions. From this decomposition, we derive explicit scaling laws for the mean-squared error, showing how spectral decay dictates convergence times and error curves. Experiments confirm the predicted phases and exponents. These results provide the first rigorous characterization of scaling laws in nonlinear regression with anisotropic data, highlighting how anisotropy reshapes learning dynamics.
Muhammad Bilal, Zafar Qazi, Marco Canini
The rise of Generative AI Search is fundamentally transforming how users and intelligent systems interact with the Internet. LLMs increasingly act as intermediaries between humans and web information. Yet the web remains optimized for human browsing rather than AI-driven semantic retrieval, resulting in wasted network bandwidth, lower information quality, and unnecessary complexity for developers. We introduce the concept of an AI-Native Internet, a web architecture in which servers expose semantically relevant information chunks rather than full documents, supported by a Web-native semantic resolver that allows AI applications to discover relevant information sources before retrieving fine-grained chunks. Through motivational experiments, we quantify the inefficiencies of current HTML-based retrieval, and outline architectural directions and open challenges for evolving today's document-centric web into an AI-oriented substrate that better supports semantic access to web content.
Joseph Oladokun
Large Language Model agents often retrieve context from knowledge bases that lack structural consistency with the agent's current reasoning state, leading to incoherent reasoning chains. We introduce Path-Constrained Retrieval (PCR), a retrieval method that combines structural graph constraints with semantic search to ensure retrieved information maintains logical relationships within a knowledge graph. PCR restricts the search space to nodes reachable from an anchor node, preventing retrieval of structurally disconnected information that may lead to inconsistent reasoning. We evaluate PCR on PathRAG-6, a benchmark spanning six domains with 180 nodes and 360 edges. Our results show that PCR achieves full structural consistency compared to 24-32 percent in baseline methods, while maintaining strong relevance scores. On the technology domain, PCR obtains full relevance at rank 10 with full structural consistency, significantly outperforming vector search and hybrid retrieval. PCR reduces the average graph distance of retrieved context by 78 percent compared to baselines, demonstrating retrieval of more structurally consistent information. These findings suggest that path-constrained retrieval is an effective approach for improving the reliability and coherence of LLM agent reasoning systems.
Authors' comments: 10 pages
Svitlana Volkova, Peter Bautista, Avinash Hiriyanna, Gabriel Ganberg, Isabel Erickson, Zachary Klinefelter, Nick Abele, Hsien-Te Kao et al.
The exponential growth of scientific knowledge has created significant barriers to cross-disciplinary knowledge discovery, synthesis and research collaboration. In response to this challenge, we present BioSage, a novel compound AI architecture that integrates LLMs with RAG, orchestrated specialized agents and tools to enable discoveries across AI, data science, biomedical, and biosecurity domains. Our system features several specialized agents including the retrieval agent with query planning and response synthesis that enable knowledge retrieval across domains with citation-backed responses, cross-disciplinary translation agents that align specialized terminology and methodologies, and reasoning agents that synthesize domain-specific insights with transparency, traceability and usability. We demonstrate the effectiveness of our BioSage system through a rigorous evaluation on scientific benchmarks (LitQA2, GPQA, WMDP, HLE-Bio) and introduce a new cross-modal benchmark for biology and AI, showing that our BioSage agents outperform vanilla and RAG approaches by 13\%-21\% powered by Llama 3.1. 70B and GPT-4o models. We perform causal investigations into compound AI system behavior and report significant performance improvements by adding RAG and agents over the vanilla models. Unlike other systems, our solution is driven by user-centric design principles and orchestrates specialized user-agent interaction workflows supporting scientific activities including but not limited to summarization, research debate and brainstorming. Our ongoing work focuses on multimodal retrieval and reasoning over charts, tables, and structured scientific data, along with developing comprehensive multimodal benchmarks for cross-disciplinary discovery. Our compound AI solution demonstrates significant potential for accelerating scientific advancement by reducing barriers between traditionally siloed domains.
Ruoxin Zhang, Zhizhao Wen, Chao Wang, Chenchen Tang, Puyang Xu, Yifan Jiang
With the rapid evolution of large language models, retrieval enhanced generation technology has been widely used due to its ability to integrate external knowledge to improve output accuracy. However, the performance of the system is highly dependent on the quality of the retrieval module. If the retrieval results have low relevance to user needs or contain noisy information, it will directly lead to distortion of the generated content. In response to the performance bottleneck of existing models in processing tabular features, this paper proposes an XGBoost machine learning regression model based on feature engineering and particle swarm optimization. Correlation analysis shows that answer_quality is positively correlated with doc_delevance by 0.66, indicating that document relevance has a significant positive effect on answer quality, and improving document relevance may enhance answer quality; The strong negative correlations between semantic similarity, redundancy, and diversity were -0.89 and -0.88, respectively, indicating a trade- off between semantic similarity, redundancy, and diversity. In other words, as the former two increased, diversity significantly decreased. The experimental results comparing decision trees, AdaBoost, etc. show that the VMD PSO BiLSTM model is superior in all evaluation indicators, with significantly lower MSE, RMSE, MAE, and MAPE compared to the comparison model. The R2 value is higher, indicating that its prediction accuracy, stability, and data interpretation ability are more outstanding. This achievement provides an effective path for optimizing the retrieval quality and improving the generation effect of RAG system, and has important value in promoting the implementation and application of related technologies.
Faheem Nizar, Elias Lumer, Anmol Gulati, Pradeep Honaganahalli Basavaraju, Vamse Kumar Subbiah
Recent advances in Large Language Model Multi-Agent Systems enable scalable orchestration and retrieval of specialized, parallelized subagents, each equipped with hundreds or thousands of Model Context Protocol (MCP) servers and tools. However, existing agent, MCP, and retrieval methods typically match queries against a single agent description, obscuring fine-grained tool capabilities of each agent, resulting in suboptimal agent selection. We introduce Agent-as-a-Graph retrieval, a knowledge graph retrieval augmented generation approach that represents both tools and their parent agents as nodes and edges in a knowledge graph. During retrieval, i) relevant agents and tool nodes are first retrieved through vector search, ii) we apply a type-specific weighted reciprocal rank fusion (wRRF) for reranking tools and agents, and iii) parent agents are traversed in the knowledge graph for the final set of agents. We evaluate Agent-as-a-Graph on the LiveMCPBenchmark, achieving 14.9% and 14.6% improvements in Recall@5 and nDCG@5 over prior state-of-the-art retrievers, and 2.4% improvements in wRRF optimizations.
Soroush Javdan, Pragash Krishnamoorthy, Olga Baysal
The rapid evolution of the telecommunication industry necessitates efficient troubleshooting processes to maintain network reliability, software maintainability, and service quality. Trouble Reports (TRs), which document issues in Ericsson's production system, play a critical role in facilitating the timely resolution of software faults. However, the complexity and volume of TR data, along with the presence of diverse criteria that reflect different aspects of each fault, present challenges for retrieval systems. Building on prior work at Ericsson, which utilized a two-stage workflow, comprising Initial Retrieval (IR) and Re-Ranking (RR) stages, this study investigates different TR observation criteria and their impact on the performance of retrieval models. We propose \textbf{CREST} (\textbf{C}riteria-specific \textbf{R}etrieval via \textbf{E}nsemble of \textbf{S}pecialized \textbf{T}R models), a criterion-driven retrieval approach that leverages specialized models for different TR fields to improve both effectiveness and interpretability, thereby enabling quicker fault resolution and supporting software maintenance. CREST utilizes specialized models trained on specific TR criteria and aggregates their outputs to capture diverse and complementary signals. This approach leads to enhanced retrieval accuracy, better calibration of predicted scores, and improved interpretability by providing relevance scores for each criterion, helping users understand why specific TRs were retrieved. Using a subset of Ericsson's internal TRs, this research demonstrates that criterion-specific models significantly outperform a single model approach across key evaluation metrics. This highlights the importance of all targeted criteria used in this study for optimizing the performance of retrieval systems.
Zhijie Chen, Xiang Chen, Ziming Li, Jiacheng Xue, Chaoyang Gao
Context: Software Vulnerability Assessment (SVA) plays a vital role in evaluating and ranking vulnerabilities in software systems to ensure their security and reliability. Objective: Although Large Language Models (LLMs) have recently shown remarkable potential in SVA, they still face two major limitations. First, most LLMs are trained on general-purpose corpora and thus lack domain-specific knowledge essential for effective SVA. Second, they tend to rely on shallow pattern matching instead of deep contextual reasoning, making it challenging to fully comprehend complex code semantics and their security implications. Method: To alleviate these limitations, we propose a novel framework ReVul-CoT that integrates Retrieval-Augmented Generation (RAG) with Chain-of-Thought (COT) prompting. In ReVul-CoT, the RAG module dynamically retrieves contextually relevant information from a constructed local knowledge base that consolidates vulnerability data from authoritative sources (such as NVD and CWE), along with corresponding code snippets and descriptive information. Building on DeepSeek-V3.1, CoT prompting guides the LLM to perform step-by-step reasoning over exploitability, impact scope, and related factors Results: We evaluate ReVul-CoT on a dataset of 12,070 vulnerabilities. Experimental results show that ReVul-CoT outperforms state-of-the-art SVA baselines by 16.50%-42.26% in terms of MCC, and outperforms the best baseline by 10.43%, 15.86%, and 16.50% in Accuracy, F1-score, and MCC, respectively. Our ablation studies further validate the contributions of considering dynamic retrieval, knowledge integration, and CoT-based reasoning. Conclusion: Our results demonstrate that combining RAG with CoT prompting significantly enhances LLM-based SVA and points out promising directions for future research.
Qijun You, Lingshuo Meng, Fangrui Quan, Wei Cao
We present a novel axial ptychographic coherent diffractive imaging (AP-CDI) technique designed to overcome the critical throughput bottleneck of conventional methods. By replacing the 2D raster scan with a simple 1D axial scan, our approach reduces the number of required diffraction patterns by approximately an order of magnitude while maintaining high-fidelity reconstruction. We have experimentally validated this concept, successfully performing simultaneous spectroscopic imaging of a sample and quantitative wavefront characterization of the illumination, thereby accurately quantifying the chromatic aberration of a broadband field. This capability establishes AP-CDI as a highly efficient and versatile tool for real-time, multi-modal imaging, with immediate potential in ultrafast science, material characterization, and live-cell bio-imaging.
Authors' comments: 10 pages, 7 figures
Misaal Khan, Mayank Vatsa, Kuldeep Singh, Richa Singh
Child malnutrition remains a global crisis, yet existing screening methods are laborious and poorly scalable, hindering early intervention. In this work, we present NutriScreener, a retrieval-augmented, multi-pose graph attention network that combines CLIP-based visual embeddings, class-boosted knowledge retrieval, and context awareness to enable robust malnutrition detection and anthropometric prediction from children's images, simultaneously addressing generalizability and class imbalance. In a clinical study, doctors rated it 4.3/5 for accuracy and 4.6/5 for efficiency, confirming its deployment readiness in low-resource settings. Trained and tested on 2,141 children from AnthroVision and additionally evaluated on diverse cross-continent populations, including ARAN and an in-house collected CampusPose dataset, it achieves 0.79 recall, 0.82 AUC, and significantly lower anthropometric RMSEs, demonstrating reliable measurement in unconstrained pediatric settings. Cross-dataset results show up to 25% recall gain and up to 3.5 cm RMSE reduction using demographically matched knowledge bases. NutriScreener offers a scalable and accurate solution for early malnutrition detection in low-resource environments.
Authors' comments: Accepted in AAAI 2026 Special Track on AI for Social Impact
Özay Ezerceli, Mahmoud El Hussieni, Selva Taş, Reyhan Bayraktar, Fatma Betül Terzioğlu, Yusuf Çelebi, Yağız Asker
Neural information retrieval systems excel in high-resource languages but remain underexplored for morphologically rich, lower-resource languages such as Turkish. Dense bi-encoders currently dominate Turkish IR, yet late-interaction models -- which retain token-level representations for fine-grained matching -- have not been systematically evaluated. We introduce TurkColBERT, the first comprehensive benchmark comparing dense encoders and late-interaction models for Turkish retrieval. Our two-stage adaptation pipeline fine-tunes English and multilingual encoders on Turkish NLI/STS tasks, then converts them into ColBERT-style retrievers using PyLate trained on MS MARCO-TR. We evaluate 10 models across five Turkish BEIR datasets covering scientific, financial, and argumentative domains. Results show strong parameter efficiency: the 1.0M-parameter colbert-hash-nano-tr is 600$\times$ smaller than the 600M turkish-e5-large dense encoder while preserving over 71\% of its average mAP. Late-interaction models that are 3--5$\times$ smaller than dense encoders significantly outperform them; ColmmBERT-base-TR yields up to +13.8\% mAP on domain-specific tasks. For production-readiness, we compare indexing algorithms: MUVERA+Rerank is 3.33$\times$ faster than PLAID and offers +1.7\% relative mAP gain. This enables low-latency retrieval, with ColmmBERT-base-TR achieving 0.54 ms query times under MUVERA. We release all checkpoints, configs, and evaluation scripts. Limitations include reliance on moderately sized datasets ($\leq$50K documents) and translated benchmarks, which may not fully reflect real-world Turkish retrieval conditions; larger-scale MUVERA evaluations remain necessary.
Yang Xu, Zuliang Yang, Kai Ming Ting
Trajectory similarity retrieval is an important part of spatiotemporal data mining, however, existing methods have the following limitations: traditional metrics are computationally expensive, while learning-based methods suffer from substantial training costs and potential instability. This paper addresses these problems by proposing \textbf{Geo}metric \textbf{P}rototype \textbf{T}rajectory \textbf{H}ashing (GeoPTH), a novel, lightweight, and non-learning framework for efficient category-based trajectory retrieval. GeoPTH constructs data-dependent hash functions by using representative trajectory prototypes, i.e., small point sets preserving geometric characteristics, as anchors. The hashing process is efficient, which involves mapping a new trajectory to its closest prototype via a robust, \textit{Hausdorff} metric. Extensive experiments show that GeoPTH's retrieval accuracy is highly competitive with both traditional metrics and state-of-the-art learning methods, and it significantly outperforms binary codes generated through simple binarization of the learned embeddings. Critically, GeoPTH consistently outperforms all competitors in terms of efficiency. Our work demonstrates that a lightweight, prototype-centric approach offers a practical and powerful alternative, achieving an exceptional retrieval performance and computational efficiency.
Qingao Dong, Mengfei Wang, Hengzhi Zhang, Zhichao Li, Yuan Yuan, Mu Li, Xiang Gao, Hailong Sun et al.
Large language model (LLM) agents have recently shown strong performance on repository-level issue resolution, but existing systems are almost exclusively designed for Python and rely heavily on lexical retrieval and shallow code navigation. These approaches transfer poorly to C++ projects, where overloaded identifiers, nested namespaces, template instantiations, and deep control-flow structures make context retrieval and fault localization substantially more difficult. As a result, state-of-the-art Python-oriented agents show a drastic performance drop on the C++ subset of MultiSWE-bench. We introduce INFCODE-C++, the first C++-aware autonomous system for end-to-end issue resolution. The system combines two complementary retrieval mechanisms -- semantic code-intent retrieval and deterministic AST-structured querying -- to construct accurate, language-aware context for repair.These components enable precise localization and robust patch synthesis in large, statically typed C++ repositories. Evaluated on the \texttt{MultiSWE-bench-CPP} benchmark, INFCODE-C++ achieves a resolution rate of 25.58\%, outperforming the strongest prior agent by 10.85 percentage points and more than doubling the performance of MSWE-agent. Ablation and behavioral studies further demonstrate the critical role of semantic retrieval, structural analysis, and accurate reproduction in C++ issue resolution. INFCODE-C++ highlights the need for language-aware reasoning in multi-language software agents and establishes a foundation for future research on scalable, LLM-driven repair for complex, statically typed ecosystems.
Shuang Guan, Kasso A. Okoudjou
We investigate the uniqueness of short-time Fourier transform phase retrieval problems in $L^2(\mathbb{R})$. In particular, for underlying window functions whose Fourier transform decay faster than any exponential function, we derive sufficient conditions on discrete sampling sets for unique phase retrieval from the spectrogram. This result generalizes previous uniqueness guarantees on sampling sets for Gaussian windows.
Ao Xie, Jiahui Chen, Quanzhi Zhu, Xiaoze Jiang, Zhiheng Qin, Enyun Yu, Han Li
Dense retrieval has become a foundational paradigm in modern search systems, especially on short-video platforms. However, most industrial systems adopt a self-reinforcing training pipeline that relies on historically exposed user interactions for supervision. This paradigm inevitably leads to a filter bubble effect, where potentially relevant but previously unseen content is excluded from the training signal, biasing the model toward narrow and conservative retrieval. In this paper, we present CroPS (Cross-Perspective Positive Samples), a novel retrieval data engine designed to alleviate this problem by introducing diverse and semantically meaningful positive examples from multiple perspectives. CroPS enhances training with positive signals derived from user query reformulation behavior (query-level), engagement data in recommendation streams (system-level), and world knowledge synthesized by large language models (knowledge-level). To effectively utilize these heterogeneous signals, we introduce a Hierarchical Label Assignment (HLA) strategy and a corresponding H-InfoNCE loss that together enable fine-grained, relevance-aware optimization. Extensive experiments conducted on Kuaishou Search, a large-scale commercial short-video search platform, demonstrate that CroPS significantly outperforms strong baselines both offline and in live A/B tests, achieving superior retrieval performance and reducing query reformulation rates. CroPS is now fully deployed in Kuaishou Search, serving hundreds of millions of users daily.
Authors' comments: AAAI-2026, Oral
Qing Wang, Chong-Wah Ngo, Ee-Peng Lim
This paper addresses the challenges of learning representations for recipes and food images in the cross-modal retrieval problem. As the relationship between a recipe and its cooked dish is cause-and-effect, treating a recipe as a text source describing the visual appearance of a dish for learning representation, as the existing approaches, will create bias misleading image-and-recipe similarity judgment. Specifically, a food image may not equally capture every detail in a recipe, due to factors such as the cooking process, dish presentation, and image-capturing conditions. The current representation learning tends to capture dominant visual-text alignment while overlooking subtle variations that determine retrieval relevance. In this paper, we model such bias in cross-modal representation learning using causal theory. The causal view of this problem suggests ingredients as one of the confounder sources and a simple backdoor adjustment can alleviate the bias. By causal intervention, we reformulate the conventional model for food-to-recipe retrieval with an additional term to remove the potential bias in similarity judgment. Based on this theory-informed formulation, we empirically prove the oracle performance of retrieval on the Recipe1M dataset to be MedR=1 across the testing data sizes of 1K, 10K, and even 50K. We also propose a plug-and-play neural module, which is essentially a multi-label ingredient classifier for debiasing. New state-of-the-art search performances are reported on the Recipe1M dataset.
Dawn Lawrie, James Mayfield, Eugene Yang, Andrew Yates, Sean MacAvaney, Ronak Pradeep, Scott Miller, Paul McNamee et al.
To measure advances in retrieval, test collections with relevance judgments that can faithfully distinguish systems are required. This paper presents NeuCLIRBench, an evaluation collection for cross-language and multilingual retrieval. The collection consists of documents written natively in Chinese, Persian, and Russian, as well as those same documents machine translated into English. The collection supports several retrieval scenarios including: monolingual retrieval in English, Chinese, Persian, or Russian; cross-language retrieval with English as the query language and one of the other three languages as the document language; and multilingual retrieval, again with English as the query language and relevant documents in all three languages. NeuCLIRBench combines the TREC NeuCLIR track topics of 2022, 2023, and 2024. The 250,128 judgments across approximately 150 queries for the monolingual and cross-language tasks and 100 queries for multilingual retrieval provide strong statistical discriminatory power to distinguish retrieval approaches. A fusion baseline of strong neural retrieval systems is included with the collection so that developers of reranking algorithms are no longer reliant on BM25 as their first-stage retriever. NeuCLIRBench is publicly available.
Authors' comments: 14 pages, 1 figure
Hang Ding, Yilun Zhao, Tiansheng Hu, Manasi Patwardhan, Arman Cohan
The accelerating growth of scientific publications has intensified the need for scalable, trustworthy systems to synthesize knowledge across diverse literature. While recent retrieval-augmented generation (RAG) methods have improved access to scientific information, they often overlook citation graph structure, adapt poorly to complex queries, and yield fragmented, hard-to-verify syntheses. We introduce SciRAG, an open-source framework for scientific literature exploration that addresses these gaps through three key innovations: (1) adaptive retrieval that flexibly alternates between sequential and parallel evidence gathering; (2) citation-aware symbolic reasoning that leverages citation graphs to organize and filter supporting documents; and (3) outline-guided synthesis that plans, critiques, and refines answers to ensure coherence and transparent attribution. Extensive experiments across multiple benchmarks such as QASA and ScholarQA demonstrate that SciRAG outperforms prior systems in factual accuracy and synthesis quality, establishing a new foundation for reliable, large-scale scientific knowledge aggregation.