benty-fields - Search paper

8781. SelfRACG: Enabling LLMs to Self-Express and Retrieve for Code Generation

Qian Dong, Jia Chen, Qingyao Ai, Hongning Wang, Haitao Li, Yi Wu, Yao Hu, Yiqun Liu et al.

arXiv:2507.19033v1

8782. A Systematic Review of Key Retrieval-Augmented Generation (RAG) Systems: Progress, Gaps, and Future Directions

Agada Joseph Oche, Ademola Glory Folashade, Tirthankar Ghosal, Arpan Biswas

arXiv:2507.18910v1

8783. DR.EHR: Dense Retrieval for Electronic Health Record with Knowledge Injection and Synthetic Data

Zhengyun Zhao, Huaiyuan Ying, Yue Zhong, Sheng Yu

arXiv:2507.18583v1

8784. Transform Before You Query: A Privacy-Preserving Approach for Vector Retrieval with Embedding Space Alignment

Ruiqi He, Zekun Fei, Jiaqi Li, Xinyuan Zhu, Biao Yi, Siyi Lv, Weijie Liu, Zheli Liu

arXiv:2507.18518v1

8785. A Deep Dive into Retrieval-Augmented Generation for Code Completion: Experience on WeChat

Zezhou Yang, Ting Peng, Cuiyun Gao, Chaozheng Wang, Hailiang Huang, Yuetang Deng

arXiv:2507.18515v1

Code completion, a crucial task in software engineering that enhances developer productivity, has seen substantial improvements with the rapid advancement of large language models (LLMs). In recent years, retrieval-augmented generation (RAG) has emerged as a promising method to enhance the code completion capabilities of LLMs by leveraging relevant context from codebases without requiring model retraining. While existing studies have demonstrated the effectiveness of RAG on public repositories and benchmarks, the potential distribution shift between open-source and closed-source codebases presents unique challenges that remain unexplored. To address this gap, we conduct an empirical study to investigate the performance of widely used RAG methods for code completion in the industrial-scale codebase of WeChat, one of the largest proprietary software systems. Specifically, we extensively explore two main types of RAG methods, namely identifier-based RAG and similarity-based RAG, across 26 open-source LLMs ranging from 0.5B to 671B parameters. For a more comprehensive analysis, we employ different retrieval techniques for similarity-based RAG, including lexical and semantic retrieval. Based on 1,669 internal repositories, we report several key findings: (1) both RAG methods demonstrate effectiveness in closed-source repositories, with similarity-based RAG showing superior performance, (2) the effectiveness of similarity-based RAG improves with more advanced retrieval techniques, where BM25 (lexical retrieval) and GTE-Qwen (semantic retrieval) achieve superior performance, and (3) the combination of lexical and semantic retrieval techniques yields optimal results, demonstrating complementary strengths. Furthermore, we conduct a developer survey to validate the practical utility of RAG methods in real-world development environments.
Authors' comments: Accepted in ICSME 25 Industry Track


8786. TDR: Task-Decoupled Retrieval with Fine-Grained LLM Feedback for In-Context Learning

Yifu Chen, Bingchen Huang, Zhiling Wang, Yuanchao Du, Junfeng Luo, Lei Shen, Zhineng Chen

arXiv:2507.18340v1

8787. VERIRAG: Healthcare Claim Verification via Statistical Audit in Retrieval-Augmented Generation

Shubham Mohole, Hongjun Choi, Shusen Liu, Christine Klymko, Shashank Kushwaha, Derek Shi, Wesam Sakla, Sainyam Galhotra et al.

arXiv:2507.17948v1

8788. Content-based 3D Image Retrieval and a ColBERT-inspired Re-ranking for Tumor Flagging and Staging

Farnaz Khun Jush, Steffen Vogler, Matthias Lenga

arXiv:2507.17412v1

8789. EndoFinder: Online Lesion Retrieval for Explainable Colorectal Polyp Diagnosis Leveraging Latent Scene Representations

Ruijie Yang, Yan Zhu, Peiyao Fu, Yizhe Zhang, Zhihua Wang, Quanlin Li, Pinghong Zhou, Xian Yang et al.

arXiv:2507.17323v1

8790. PRGB Benchmark: A Robust Placeholder-Assisted Algorithm for Benchmarking Retrieval-Augmented Generation

Zhehao Tan, Yihan Jiao, Dan Yang, Lei Liu, Jie Feng, Duolin Sun, Yue Shen, Jian Wang et al.

arXiv:2507.22927v1

8791. Never Come Up Empty: Adaptive HyDE Retrieval for Improving LLM Developer Support

Fangjian Lei, Mariam El Mezouar, Shayan Noei, Ying Zou

arXiv:2507.16754v1

8792. mRAKL: Multilingual Retrieval-Augmented Knowledge Graph Construction for Low-Resourced Languages

Hellina Hailu Nigatu, Min Li, Maartje ter Hoeve, Saloni Potdar, Sarah Chasins

arXiv:2507.16011v1

8793. Convergence Analysis of Reshaped Wirtinger Flow with Random Initialization for Phase Retrieval

Linbin Li, Haiyang Peng, Yong Xia, Meng Huang

arXiv:2507.15684v1

8794. Learning to Extract Rational Evidence via Reinforcement Learning for Retrieval-Augmented Generation

Xinping Zhao, Shouzheng Huang, Yan Zhong, Xinshuo Hu, Baotian Hu, Min Zhang

arXiv:2507.15586v1

8795. PhishIntentionLLM: Uncovering Phishing Website Intentions through Multi-Agent Retrieval-Augmented Generation

Wenhao Li, Selvakumar Manickam, Yung-wey Chong, Shankar Karuppayah

arXiv:2507.15419v1

8796. SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search

Xiaofeng Shi, Yuduo Li, Qian Kou, Longbin Yu, Jinxin Xie, Hua Zhou

arXiv:2507.15245v1

8797. DeRAG: Black-box Adversarial Attacks on Multiple Retrieval-Augmented Generation Applications via Prompt Injection

Jerry Wang, Fang Yu

arXiv:2507.15042v1

8798. FullRecall: A Semantic Search-Based Ranking Approach for Maximizing Recall in Patent Retrieval

Amna Ali, Liyanage C. De Silva, Pg Emeroylariffion Abas

arXiv:2507.14946v1

8799. U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs

Xiaojie Li, Chu Li, Shi-Zhe Chen, Xi Chen

arXiv:2507.14902v1

8800. Optimizing Legal Document Retrieval in Vietnamese with Semi-Hard Negative Mining

Van-Hoang Le, Duc-Vu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

arXiv:2507.14619v1
