benty-fields - Search paper

Pretrained language models have proven effective on code-related tasks such as code retrieval, code generation, code summarization, and code completion. In this paper, we propose COde assistaNt viA retrieval-augmeNted language model (CONAN), which builds a code assistant by mimicking the knowledge-seeking behavior of humans during coding. It consists of a code-structure-aware retriever (CONAN-R) and a retrieval-augmented generation model based on dual-view code representation (CONAN-G). CONAN-R pretrains CodeT5 with Code-Documentation Alignment and Masked Entity Prediction tasks, making the language model aware of code structure and enabling it to learn effective representations of both code snippets and documentation. CONAN-G then uses a dual-view code representation mechanism to implement retrieval-augmented code generation, treating code documentation descriptions as prompts that help the language model better understand code semantics. Our experiments show that CONAN achieves convincing performance on different code generation tasks and significantly outperforms previous retrieval-augmented code generation models. Further analyses show that CONAN learns tailored representations for both code snippets and documentation by aligning code-documentation pairs, and that it captures structural semantics by masking and predicting entities in code. In addition, the retrieved code snippets and documentation supply the necessary information, in both programming language and natural language, to assist the code generation process. CONAN can also serve as an assistant to Large Language Models (LLMs), supplying them with external knowledge in the form of shorter code documents to improve their effectiveness on various code tasks. This demonstrates CONAN's ability to extract the necessary information and filter out noise from retrieved code documents.
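The retrieve-then-generate flow described above can be illustrated with a minimal sketch. It assumes precomputed embeddings (in CONAN these would come from the pretrained CONAN-R retriever; here they are toy vectors), ranks code-documentation pairs by cosine similarity, and builds one generator input per retrieved pair with the documentation placed before the code as a prompt. The names `CodeDoc`, `cosine`, `retrieve`, and `build_dual_view_inputs` and the input template are hypothetical; this is a simplification, not CONAN-G's actual dual-view representation mechanism.

```python
import math
from dataclasses import dataclass


@dataclass
class CodeDoc:
    """Hypothetical corpus entry pairing a code snippet with its documentation."""
    code: str
    doc: str
    embedding: list[float]  # assumed to come from a trained retriever; toy values here


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_emb: list[float], corpus: list[CodeDoc], k: int) -> list[CodeDoc]:
    """Return the k entries whose embeddings are most similar to the query embedding."""
    ranked = sorted(corpus, key=lambda item: cosine(query_emb, item.embedding), reverse=True)
    return ranked[:k]


def build_dual_view_inputs(query: str, retrieved: list[CodeDoc]) -> list[str]:
    """One generator input per retrieved pair: documentation first (as a prompt), then code.

    The template is an illustrative assumption, not CONAN-G's real input format.
    """
    return [f"query: {query} documentation: {item.doc} code: {item.code}" for item in retrieved]


if __name__ == "__main__":
    corpus = [
        CodeDoc("def add(a, b): return a + b", "Add two numbers.", [1.0, 0.0]),
        CodeDoc("def read(path): return open(path).read()", "Read a file.", [0.0, 1.0]),
    ]
    top = retrieve([0.9, 0.1], corpus, k=1)
    for text in build_dual_view_inputs("sum two integers", top):
        print(text)
```

Keeping one input per retrieved pair lets each item's natural-language view and code view be seen together; how CONAN-G actually encodes and fuses these views is not specified in the abstract.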

5732. Deep Class-guided Hashing for Multi-label Cross-modal Retrieval

Hao Chen, Lei Zhu, Xinghui Zhu

arXiv: 2410.15387v1

5733. PReP: Efficient context-based shape retrieval for missing parts

Vlassis Fotis, Ioannis Romanelis, Georgios Mylonas, Athanasios Kalogeras, Konstantinos Moustakas

arXiv: 2410.14245v1

5734. Enhancing Retrieval Performance: An Ensemble Approach For Hard Negative Mining

Hansa Meghwani

arXiv: 2411.02404v1

Ranking consistently emerges as a primary focus of information retrieval research. Retrieval and ranking models underpin numerous applications, including web search, open-domain QA, enterprise-domain QA, and text-based recommender systems. These models are typically trained on triplets with binary relevance labels, each consisting of a query, one positive passage, and one negative passage. In practice, however, they are used in settings that demand a far more nuanced understanding of relevance, especially when re-ranking a large pool of potentially relevant passages. Although positive examples are straightforward to collect from user feedback such as impressions or clicks, identifying suitable negatives from a pool of possibly millions or even billions of documents poses a greater challenge. A substantial number of negative pairs is often needed to keep model quality high, and several approaches have been proposed in the literature for selecting suitable negatives from a large corpus. This study examines the crucial role of hard negatives in training cross-encoder models, specifically aiming to explain the performance gains of hard negative sampling over random sampling. We develop a robust hard negative mining technique for efficiently training cross-encoder re-ranking models on an enterprise dataset with domain-specific context. We provide a novel perspective on enhancing retrieval models, ultimately influencing the performance of advanced LLM systems such as Retrieval-Augmented Generation (RAG) and Reasoning and Action Agents (ReAct). The proposed approach demonstrates that learning similarity and dissimilarity simultaneously with cross-encoders improves the performance of retrieval systems.
Authors' comments: Master's thesis
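The contrast between hard and random negative sampling can be sketched as follows. This is a minimal illustration, not the thesis's technique: it assumes several first-stage retrievers have already produced ranked document IDs for a query, fuses them with reciprocal rank fusion (RRF, a swapped-in ensembling method; k = 60 is the commonly used default), and takes the top-ranked non-positive documents as hard negatives. All function names are hypothetical. The resulting (query, positive, negative) triplets match the training format described in the abstract.

```python
import random

# Hypothetical helpers for illustration; not the thesis's actual implementation.


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists with reciprocal rank fusion (an assumed ensembling choice).

    score(d) = sum over rankers of 1 / (k + rank of d), with ranks starting at 1.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def mine_hard_negatives(rankings: list[list[str]], positives: set[str], n: int) -> list[str]:
    """Hard negatives: the top-n fused results that are not labeled positive."""
    return [d for d in rrf_fuse(rankings) if d not in positives][:n]


def random_negatives(corpus_ids: list[str], positives: set[str], n: int, seed: int = 0) -> list[str]:
    """Baseline: n negatives drawn uniformly (without replacement) from non-positives."""
    rng = random.Random(seed)
    pool = [d for d in corpus_ids if d not in positives]
    return rng.sample(pool, n)


def build_triplets(query: str, positives: list[str], negatives: list[str]) -> list[tuple[str, str, str]]:
    """(query, positive, negative) triplets for cross-encoder training."""
    return [(query, p, neg) for p in positives for neg in negatives]


if __name__ == "__main__":
    # Toy rankings from two hypothetical retrievers (e.g. a lexical and a dense model).
    rankings = [["d1", "d3", "d2", "d5"], ["d3", "d1", "d4", "d2"]]
    positives = {"d1"}
    hard = mine_hard_negatives(rankings, positives, n=2)
    print(hard)  # ['d3', 'd2']
    print(build_triplets("q", sorted(positives), hard))
```

Hard negatives are documents the retrievers themselves rank highly despite being irrelevant, so a cross-encoder trained on them must learn finer distinctions than it would from uniformly sampled negatives.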

5735. Class-RAG: Real-Time Content Moderation with Retrieval Augmented Generation

Jianfa Chen, Emily Shen, Trupti Bavalatti, Xiaowen Lin, Yongkai Wang, Shuming Hu, Harihar Subramanyam, Ksheeraj Sai Vepuri et al.

arXiv: 2410.14881v2

5736. LLM-based Unit Test Generation via Property Retrieval

Zhe Zhang, Xingyu Liu, Yuanzhang Lin, Xiang Gao, Hailong Sun, Yuan Yuan

arXiv: 2410.13542v1

5737. Phase retrieval from short-time Fourier transform in LCA groups

Natalia Accomazzo, Daniel Carando, Rocio Nores, Victoria Paternostro, Sebastian Velazquez

arXiv: 2410.13309v1

5738. Retrieval of Temporal Event Sequences from Textual Descriptions

Zefang Liu, Yinzhu Quan

arXiv: 2410.14043v2

5739. FIRE: Fact-checking with Iterative Retrieval and Verification

Zhuohan Xie, Rui Xing, Yuxia Wang, Jiahui Geng, Hasan Iqbal, Dhruv Sahnan, Iryna Gurevych, Preslav Nakov

arXiv: 2411.00784v2

5740. RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards

Xinze Li, Sen Mei, Zhenghao Liu, Yukun Yan, Shuo Wang, Shi Yu, Zheni Zeng, Hao Chen et al.

arXiv: 2410.13509v2

Benty-search

5721. Retrieval-Augmented Diffusion Models for Time Series Forecasting

arXiv: 2410.18712v1

5722. Understanding Ranking LLMs: A Mechanistic Analysis for Information Retrieval

arXiv: 2410.18527v2

5723. Retrieving Implicit and Explicit Emotional Events Using Large Language Models

arXiv: 2410.19128v2

5724. POSEIDON: A Multidimensional Atmospheric Retrieval Code for Exoplanet Spectra

arXiv: 2410.18181v1

5725. PtychoFormer: A Transformer-based Model for Ptychographic Phase Retrieval

arXiv: 2410.17377v1

5726. ProveRAG: Provenance-Driven Vulnerability Analysis with Automated Retrieval-Augmented LLMs

arXiv: 2410.17406v2

5727. Beyond Retrieval: Generating Narratives in Conversational Recommender Systems

arXiv: 2410.16780v2

5728. Partial Orientation Retrieval of Proteins From Coulomb Explosions

arXiv: 2410.15965v1

5729. RAC: Efficient LLM Factuality Correction with Retrieval Augmentation

arXiv: 2410.15667v1

5730. Test-time Adaptation for Cross-modal Retrieval with Query Shift

arXiv: 2410.15624v1

5731. Building A Coding Assistant via the Retrieval-Augmented Language Model

arXiv: 2410.16229v2
