benty-fields - Search paper

2761. NanoVDR: Distilling a 2B Vision-Language Retriever into a 70M Text-Only Encoder for Visual Document Retrieval

Zhuchenyang Liu, Yao Zhang, Yu Xiao

arXiv:2603.12824v1

2762. Can Small Language Models Use What They Retrieve? An Empirical Study of Retrieval Utilization Across Model Scale

Sanchit Pandey

arXiv:2603.11513v1

Retrieval-augmented generation (RAG) is widely deployed to improve factual accuracy in language models, yet it remains unclear whether smaller models (7B parameters or fewer) can effectively utilize retrieved information. To investigate this question, we evaluate five model sizes from 360M to 8B across three architecture families (SmolLM2, Qwen2.5, and Llama 3.1) under four retrieval conditions: no retrieval, BM25, dense retrieval using E5-large-v2, and oracle retrieval, where the retrieved passage is guaranteed to contain the answer. We introduce a parametric knowledge split that separates questions a model can already answer from those that require external knowledge, which allows us to isolate utilization failure from retrieval-quality failure. We find three main results. First, even with oracle retrieval, models of 7B parameters or fewer fail to extract the correct answer 85 to 100 percent of the time on questions they cannot answer alone, which indicates a fundamental utilization bottleneck. Second, adding retrieval context destroys 42 to 100 percent of answers the model previously knew, suggesting a distraction effect driven by the presence of context rather than its quality. Third, an error analysis of 2,588 oracle failures shows that the dominant failure mode is irrelevant generation, where the model ignores the provided context entirely. These patterns hold across multiple prompt templates and retrieval methods. The results indicate that for models below 7B parameters the main limitation of RAG is context utilization rather than retrieval quality, and that deploying RAG at this scale can lead to a net negative trade-off under standard evaluation conditions.
Authors' comments: 10 pages, 5 figures, planning to submit to ARR March 2026. Code and evaluation data: https://anonymous.4open.science/r/rag-utilization-study-C67F . Earlier draft preprint available on Zenodo: https://zenodo.org/records/18870116 (note: this arXiv submission is an updated draft)
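The parametric knowledge split described in this abstract, and the two headline rates it enables (utilization failure on questions the model cannot answer alone, and lost answers on questions it already knew), can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' released code: each question is assumed to carry two hypothetical boolean flags (correct without retrieval, correct with a passage in context), and the names `QuestionResult`, `parametric_split`, and `failure_rate` are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class QuestionResult:
    # Hypothetical per-question record (not from the paper's code):
    # correctness without retrieval and with a passage in context.
    correct_no_retrieval: bool
    correct_with_context: bool


def parametric_split(
    results: List[QuestionResult],
) -> Tuple[List[QuestionResult], List[QuestionResult]]:
    """Split questions by the no-retrieval run: 'known' ones the model
    answers alone, 'unknown' ones that need external knowledge."""
    known = [r for r in results if r.correct_no_retrieval]
    unknown = [r for r in results if not r.correct_no_retrieval]
    return known, unknown


def failure_rate(subset: List[QuestionResult]) -> float:
    """Fraction of a subset answered wrongly once context is added."""
    if not subset:
        return 0.0
    return sum(not r.correct_with_context for r in subset) / len(subset)


# Toy data (invented): two 'known' and three 'unknown' questions.
results = [
    QuestionResult(True, True),
    QuestionResult(True, False),   # known answer lost after adding context
    QuestionResult(False, False),  # unknown, context not used
    QuestionResult(False, True),   # unknown, context used successfully
    QuestionResult(False, False),
]
known, unknown = parametric_split(results)
utilization_failure = failure_rate(unknown)  # rate on 'unknown' questions
distraction = failure_rate(known)            # rate on 'known' questions
print(distraction)  # 0.5
```

Splitting on the no-retrieval run is what lets the two rates be read separately: with oracle passages the answer is guaranteed to be in context, so errors on the 'unknown' subset point to utilization rather than retrieval quality, while errors on the 'known' subset measure answers lost to the added context.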

2763. Higress-RAG: A Holistic Optimization Framework for Enterprise Retrieval-Augmented Generation via Dual Hybrid Retrieval, Adaptive Routing, and CRAG

Weixi Lin

arXiv:2602.23374v1

2764. Comparative Analysis of Neural Retriever-Reranker Pipelines for Retrieval-Augmented Generation over Knowledge Graphs in E-commerce Applications

Teri Rumble, Zbyněk Gazdík, Javad Zarrin, Jagdeep Ahluwalia

arXiv:2602.22219v1

2765. Comparison of Text-Based and Image-Based Retrieval in Multimodal Retrieval Augmented Generation Large Language Model Systems

Elias Lumer, Alex Cardenas, Matt Melich, Myles Mason, Sara Dieter, Vamse Kumar Subbiah, Pradeep Honaganahalli Basavaraju, Roberto Hernandez

arXiv:2511.16654v1

2766. Knobs and dials of retrieving JWST transmission spectra. II. Impacts of pipeline-level differences on retrieval posteriors

Simon Schleich, Sudeshna Boro Saikia, Quentin Changeat, Manuel Güdel, Aiko Voigt, Ingo Waldmann

arXiv:2511.05652v1

Since the launch of JWST, observations of exoplanetary atmospheres have seen a revolution in data quality. Given that atmospheric parameter inferences depend heavily on the underlying data, a re-evaluation of current methodologies is warranted to assess the reliability of these results. We investigate the impact of variations in input spectra on atmospheric retrievals for the hot Jupiter WASP-39 b using JWST transit data. Specifically, we analyse the reliability of parameter estimations from random perturbations of the underlying spectrum and their sensitivity to three transmission spectra derived from the same observational data. Using the NIRSpec PRISM observation from a single transit of WASP-39 b, we perform retrievals with the TauREx framework. As a baseline, we use a spectrum derived with the Eureka! data reduction pipeline. To evaluate retrieval reliability, we analyse posterior distributions under deviations from this spectrum. We simulate random noise by performing retrievals on scattered instances of this spectrum and compare them with retrievals based on existing spectra reduced from the same raw observation. Our analysis identifies three types of posterior distributions: (1) stable, Gaussian distributions for species constrained across the entire spectrum (e.g., H2O, CO2); (2) uniform posteriors with upper bounds for weakly constrained species (e.g., CO, CH4); and (3) unstable, heavy-tailed posteriors for species constrained by minor spectral features (e.g., SO2, C2H2). We find that other parameters, such as the planetary radius and p-T profile, are stable under spectral perturbations. Posterior distributions differ for retrievals on independently reduced transmission spectra from the same raw data, complicating interpretation, particularly for skewed distributions. Based on this, we advocate careful assessment and selection of credible interval sizes to reflect this variability.
Authors' comments: 20 pages, 12 figures. Accepted for publication in Astronomy and Astrophysics

2767. What's the Best Way to Retrieve Slides? A Comparative Study of Multimodal, Caption-Based, and Hybrid Retrieval Techniques

Petros Stylianos Giouroukis, Dimitris Dimitriadis, Dimitrios Papadopoulos, Zhenwen Shao, Grigorios Tsoumakas

arXiv:2509.15211v1

2768. DS@GT at CheckThat! 2025: Exploring Retrieval and Reranking Pipelines for Scientific Claim Source Retrieval on Social Media Discourse

Jeanette Schofield, Shuyu Tian, Hoang Thanh Thanh Truong, Maximilian Heil

arXiv:2507.06563v1

2769. Weak-to-Strong GraphRAG: Aligning Weak Retrievers with Large Language Models for Graph-based Retrieval Augmented Generation

Deyu Zou, Yongqiang Chen, Mufei Li, Siqi Miao, Chenxi Liu, Bo Han, James Cheng, Pan Li

arXiv:2506.22518v1

2770. QUST_NLP at SemEval-2025 Task 7: A Three-Stage Retrieval Framework for Monolingual and Crosslingual Fact-Checked Claim Retrieval

Youzheng Liu, Jiyan Liu, Xiaoman Xu, Taihang Wang, Yimin Wang, Ye Jiang

arXiv:2506.17272v1

2771. Deep Retrieval at CheckThat! 2025: Identifying Scientific Papers from Implicit Social Media Mentions via Hybrid Retrieval and Re-Ranking

Pascal J. Sager, Ashwini Kamaraj, Benjamin F. Grewe, Thilo Stadelmann

arXiv:2505.23250v1

2772. SkewRoute: Training-Free LLM Routing for Knowledge Graph Retrieval-Augmented Generation via Score Skewness of Retrieved Context

Hairu Wang, Yuan Feng, Yukun Cao, Xike Xie, S Kevin Zhou

arXiv:2505.23841v1

2773. A Generative-AI-Driven Claim Retrieval System Capable of Detecting and Retrieving Claims from Social Media Platforms in Multiple Languages

Ivan Vykopal, Martin Hyben, Robert Moro, Michal Gregor, Jakub Simko

arXiv:2504.20668v1

2774. Chats-Grid: An Iterative Retrieval Q&A Optimization Scheme Leveraging Large Model and Retrieval Enhancement Generation in smart grid

Yunfeng Li, Jiqun Zhang, Guofu Liao, Xue Shi, Junhong Liu

arXiv:2502.15583v1

2775. Open-Source Retrieval Augmented Generation Framework for Retrieving Accurate Medication Insights from Formularies for African Healthcare Workers

Axum AI: J. Owoyemi, S. Abubakar, A. Owoyemi, T. O. Togunwa, F. C. Madubuko, S. Oyatoye et al.

arXiv:2502.15722v1

2776. Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval

Yuanmin Tang, Xiaoting Qin, Jue Zhang, Jing Yu, Gaopeng Gou, Gang Xiong, Qingwei Ling, Saravan Rajmohan et al.

arXiv:2412.11077v2

2777. IRSC: A Zero-shot Evaluation Benchmark for Information Retrieval through Semantic Comprehension in Retrieval-Augmented Generation Scenarios

Hai Lin, Shaoxiong Zhan, Junyou Su, Haitao Zheng, Hui Wang

arXiv:2409.15763v2

2778. Enhancing Retrieval and Managing Retrieval: A Four-Module Synergy for Improved Quality and Efficiency in RAG Systems

Yunxiao Shi, Xing Zi, Zijing Shi, Haimin Zhang, Qiang Wu, Min Xu

arXiv:2407.10670v1

2779. Context-augmented Retrieval: A Novel Framework for Fast Information Retrieval based Response Generation using Large Language Model

Sai Ganesh, Anupam Purwar, Gautam B

arXiv:2406.16383v1

2780. Npix2Cpix: A GAN-based Image-to-Image Translation Network with Retrieval-Classification Integration for Watermark Retrieval from Historical Document Images

Utsab Saha, Sawradip Saha, Shaikh Anowarul Fattah, Mohammad Saquib

IEEE Access, 12, 95857-95870 (2024)

arXiv:2406.03556v2
