benty-fields - Search paper

Recently, generative retrieval has emerged as a promising alternative to traditional retrieval paradigms. It assigns each document a unique identifier, known as a DocID, and employs a generative model to directly generate the relevant DocID for the input query. A common choice of DocID is one or several natural language sequences, e.g., the title or n-grams, so that the pre-trained knowledge of the generative model can be utilized. However, a sequence is generated token by token, and at each decoding step only the most likely candidates are kept while the rest are pruned; thus, retrieval fails if any token within the relevant DocID is falsely pruned. What's worse, during decoding the model can only perceive the preceding tokens in a DocID while being blind to the subsequent ones, and is hence prone to making such errors. To address this problem, we present a novel framework for generative retrieval, dubbed Term-Set Generation (TSGen). Instead of sequences, we use a set of terms as the DocID; these terms are selected automatically to concisely summarize the document's semantics and distinguish it from other documents. On top of the term-set DocID, we propose a permutation-invariant decoding algorithm, with which the term set can be generated in any permutation yet will always lead to the corresponding document. Remarkably, TSGen perceives all valid terms rather than only the preceding ones at each decoding step. Given the constant decoding space, it can make more reliable decisions thanks to this broader perspective. TSGen is also resilient to errors: the relevant DocID will not be pruned as long as the decoded term belongs to it. Lastly, we design an iterative optimization procedure to incentivize the model to generate the relevant term set in its favorable permutation. We conduct extensive experiments on popular benchmarks, which validate the effectiveness, generalizability, scalability, and efficiency of TSGen.
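The key property described above, that a term-set DocID can be generated in any order and still resolve to exactly one document, can be illustrated with a toy constraint check. This is a minimal sketch under stated assumptions, not TSGen's actual decoding algorithm: the corpus, the hand-picked term sets, and the helper names (`DOCIDS`, `matching_docs`, `valid_next_terms`, `decode`) are all hypothetical, and the generative model's scoring, beam pruning, and iterative optimization are omitted. The sketch only enforces which terms remain valid at each step.

```python
from __future__ import annotations

from itertools import permutations

# Hypothetical toy corpus: each document's DocID is a small set of terms.
# TSGen selects these terms automatically; here they are hand-picked.
DOCIDS: dict[str, frozenset[str]] = {
    "d1": frozenset({"generative", "retrieval", "docid"}),
    "d2": frozenset({"dense", "retrieval", "embedding"}),
    "d3": frozenset({"generative", "model", "decoding"}),
}


def matching_docs(decoded: set[str]) -> set[str]:
    """Documents whose term-set DocID contains every term decoded so far."""
    return {doc for doc, terms in DOCIDS.items() if decoded <= terms}


def valid_next_terms(decoded: set[str]) -> set[str]:
    """Terms that keep at least one document reachable.

    Validity depends only on set membership, not on the position a term
    would occupy, so the same terms are allowed in any generation order.
    """
    return {
        term
        for doc in matching_docs(decoded)
        for term in DOCIDS[doc]
        if term not in decoded
    }


def decode(order: list[str]) -> str | None:
    """Feed terms in the given order under the set constraint.

    Returns the document whose DocID equals the decoded set, or None if a
    term is invalid at its step or the set is still incomplete.
    """
    decoded: set[str] = set()
    for term in order:
        if term not in valid_next_terms(decoded):
            # Term is already decoded or belongs to no still-reachable DocID.
            return None
        decoded.add(term)
    exact = [doc for doc in matching_docs(decoded) if DOCIDS[doc] == decoded]
    return exact[0] if len(exact) == 1 else None


# Every permutation of d1's terms leads back to d1.
print({decode(list(p)) for p in permutations(DOCIDS["d1"])})  # {'d1'}
# After one shared term, terms from both d1 and d2 remain valid.
print(sorted(valid_next_terms({"retrieval"})))  # ['dense', 'docid', 'embedding', 'generative']
```

The second print shows that after decoding a term shared by two DocIDs, the candidates still span both documents rather than a single prefix branch, which mirrors the "all valid terms" perspective described in the abstract. A practical system would replace the linear scan in `matching_docs` with an inverted index and rank the valid terms with the model's scores.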

3868. Retrieval-augmented Multi-label Text Classification

Ilias Chalkidis, Yova Kementchedjhieva

arXiv:2305.13058v1

3869. Soft Prompt Decoding for Multilingual Dense Retrieval

Zhiqi Huang, Hansi Zeng, Hamed Zamani, James Allan

Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (2023)

arXiv:2305.09025v1

3870. Multilingual Previously Fact-Checked Claim Retrieval

Matúš Pikuliak, Ivan Srba, Robert Moro, Timo Hromadka, Timotej Smolen, Martin Melisek, Ivan Vykopal, Jakub Simko et al.

arXiv:2305.07991v2

3871. NevIR: Negation in Neural Information Retrieval

Orion Weller, Dawn Lawrie, Benjamin Van Durme

arXiv:2305.07614v2

3872. Evaluating Embedding APIs for Information Retrieval

Ehsan Kamalloo, Xinyu Zhang, Odunayo Ogundepo, Nandan Thakur, David Alfonso-Hermelo, Mehdi Rezagholizadeh, Jimmy Lin

arXiv:2305.06300v2

3873. Unsupervised Dense Retrieval Training with Web Anchors

Yiqing Xie, Xiao Liu, Chenyan Xiong

Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (2023)

arXiv:2305.05834v1

3874. Towards Writer Retrieval for Historical Datasets

Marco Peer, Florian Kleber, Robert Sablatnig

arXiv:2305.05358v2

3875. Unified Demonstration Retriever for In-Context Learning

Xiaonan Li, Kai Lv, Hang Yan, Tianyang Lin, Wei Zhu, Yuan Ni, Guotong Xie, Xiaoling Wang et al.

arXiv:2305.04320v2

3876. Expository Text Generation: Imitate, Retrieve, Paraphrase

Nishant Balepur, Jie Huang, Kevin Chen-Chuan Chang

arXiv:2305.03276v2

3877. Understanding Differential Search Index for Text Retrieval

Xiaoyang Chen, Yanjiang Liu, Ben He, Le Sun, Yingfei Sun

arXiv:2305.02073v2

3878. Synthetic Cross-language Information Retrieval Training Data

James Mayfield, Eugene Yang, Dawn Lawrie, Samuel Barham, Orion Weller, Marc Mason, Suraj Nair, Scott Miller

arXiv:2305.00331v1

3879. Multivariate Representation Learning for Information Retrieval

Hamed Zamani, Michael Bendersky

arXiv:2304.14522v1

3880. STIR: Siamese Transformer for Image Retrieval Postprocessing

Aleksei Shabanov, Aleksei Tarasov, Sergey Nikolenko

arXiv:2304.13393v2

Benty-search

3861. Chatting Makes Perfect: Chat-based Image Retrieval

arXiv:2305.20062v2

3862. Adapting Learned Sparse Retrieval for Long Documents

arXiv:2305.18494v1

3863. Lexical Retrieval Hypothesis in Multimodal Context

arXiv:2305.17663v1

3864. Referral Augmentation for Zero-Shot Information Retrieval

arXiv:2305.15098v1

3865. Privacy Implications of Retrieval-Based Language Models

arXiv:2305.14888v1

3866. Dr.ICL: Demonstration-Retrieved In-context Learning

arXiv:2305.14128v1

3867. Generative Retrieval via Term Set Generation

arXiv:2305.13859v3
