benty-fields - Search paper

5361. Effective Inference-Free Retrieval for Learned Sparse Representations

Franco Maria Nardini, Thong Nguyen, Cosimo Rulli, Rossano Venturini, Andrew Yates

Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2936-2940 (2025)

arXiv:2505.01452v1

5362. A Multi-Granularity Retrieval Framework for Visually-Rich Documents

Mingjun Xu, Zehui Wang, Hengxing Cai, Renxin Zhong

arXiv:2505.01457v2

5363. PaRT: Enhancing Proactive Social Chatbots with Personalized Real-Time Retrieval

Zihan Niu, Zheyong Xie, Shaosheng Cao, Chonggang Lu, Zheyu Ye, Tong Xu, Zuozhu Liu, Yan Gao et al.

arXiv:2504.20624v1

5364. ARCS: Agentic Retrieval-Augmented Code Synthesis with Iterative Refinement

Manish Bhattarai, Miguel Cordova, Javier Santos, Dan O'Malley

arXiv:2504.20434v1

5365. Reconstructing Context: Evaluating Advanced Chunking Strategies for Retrieval-Augmented Generation

Carlo Merola, Jaspinder Singh

arXiv:2504.19754v1

5366. Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation

Qianren Mao, Qili Zhang, Hanwen Hao, Zhentao Han, Runhua Xu, Weifeng Jiang, Qi Hu, Zhijun Chen et al.

arXiv:2504.19101v1

5367. MTCSC: Retrieval-Augmented Iterative Refinement for Chinese Spelling Correction

Junhong Liang, Yu Zhou

arXiv:2504.18938v1

5368. Speaker Retrieval in the Wild: Challenges, Effectiveness and Robustness

Erfan Loweimi, Mengjie Qian, Kate Knill, Mark Gales

arXiv:2504.18950v1

5369. PropRAG: Guiding Retrieval with Beam Search over Proposition Paths

Jingjin Wang

arXiv:2504.18070v1

5370. Replication and Exploration of Generative Retrieval over Dynamic Corpora

Zhen Zhang, Xinyu Ma, Weiwei Sun, Pengjie Ren, Zhumin Chen, Shuaiqiang Wang, Dawei Yin, Maarten de Rijke et al.

arXiv:2504.17519v1

Generative retrieval (GR) has emerged as a promising paradigm in information retrieval (IR). However, most existing GR models are developed and evaluated using a static document collection, and their performance on dynamic corpora, where document collections evolve continuously, is rarely studied. In this paper, we first reproduce and systematically evaluate various representative GR approaches over dynamic corpora. Through extensive experiments, we reveal that existing GR models with text-based docids show superior generalization to unseen documents. We observe that the more fine-grained the docid design in the GR model, the better its performance over dynamic corpora, surpassing BM25 and even being comparable to dense retrieval methods. While GR models with numeric-based docids show high efficiency, their performance drops significantly over dynamic corpora. Furthermore, our experiments find that the underperformance of numeric-based docids is partly due to their excessive tendency toward the initial document set, which likely results from overfitting on the training set. We then conduct an in-depth analysis of the best-performing GR methods. We identify three critical advantages of text-based docids in dynamic corpora: (i) semantic alignment with language models' pretrained knowledge, (ii) fine-grained docid design, and (iii) high lexical diversity. Building on these insights, we finally propose a novel multi-docid design that leverages both the efficiency of numeric-based docids and the effectiveness of text-based docids, achieving improved performance on dynamic corpora without requiring additional retraining. Our work offers empirical evidence for advancing GR methods over dynamic corpora and paves the way for developing more generalized yet efficient GR models in real-world search engines.
Authors' comments: Accepted at SIGIR 2025 (Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval)


5371. Unsupervised Corpus Poisoning Attacks in Continuous Space for Dense Retrieval

Yongkang Li, Panagiotis Eustratiadis, Simon Lupart, Evangelos Kanoulas

arXiv:2504.17884v1

This paper concerns corpus poisoning attacks in dense information retrieval, where an adversary attempts to compromise the ranking performance of a search algorithm by injecting a small number of maliciously generated documents into the corpus. Our work addresses two limitations in the current literature. First, attacks that perform adversarial gradient-based word substitution search do so in the discrete lexical space, while retrieval itself happens in the continuous embedding space. We thus propose an optimization method that operates in the embedding space directly. Specifically, we train a perturbation model with the objective of maintaining the geometric distance between the original and adversarial document embeddings, while also maximizing the token-level dissimilarity between the original and adversarial documents. Second, it is common for related work to have a strong assumption that the adversary has prior knowledge about the queries. In this paper, we focus on a more challenging variant of the problem where the adversary assumes no prior knowledge about the query distribution (hence, unsupervised). Our core contribution is an adversarial corpus attack that is fast and effective. We present comprehensive experimental results on both in- and out-of-domain datasets, focusing on two related tasks: a top-1 attack and a corpus poisoning attack. We consider attacks under both a white-box and a black-box setting. Notably, our method can generate successful adversarial examples in under two minutes per target document; four times faster compared to the fastest gradient-based word substitution methods in the literature with the same hardware. Furthermore, our adversarial generation method generates text that is more likely to occur under the distribution of natural text (low perplexity), and is therefore more difficult to detect.
Authors' comments: This paper has been accepted as a full paper at SIGIR 2025 and will be presented orally


5372. MIRAGE: A Metric-Intensive Benchmark for Retrieval-Augmented Generation Evaluation

Chanhee Park, Hyeonseok Moon, Chanjun Park, Heuiseok Lim

arXiv:2504.17137v1

5373. Dynamic Superblock Pruning for Fast Learned Sparse Retrieval

Parker Carlson, Wentai Xie, Shanxiu He, Tao Yang

arXiv:2504.17045v1

5374. Rethinking Vision Transformer for Large-Scale Fine-Grained Image Retrieval

Xin Jiang, Hao Tang, Yonghua Pan, Zechao Li

arXiv:2504.16691v1

Large-scale fine-grained image retrieval (FGIR) aims to retrieve images belonging to the same subcategory as a given query by capturing subtle differences in a large-scale setting. Recently, Vision Transformers (ViT) have been employed in FGIR due to their powerful self-attention mechanism for modeling long-range dependencies. However, most Transformer-based methods focus primarily on leveraging self-attention to distinguish fine-grained details, while overlooking the high computational complexity and redundant dependencies inherent to these models, limiting their scalability and effectiveness in large-scale FGIR. In this paper, we propose an Efficient and Effective ViT-based framework, termed EET, which integrates a token pruning module with a discriminative transfer strategy to address these limitations. Specifically, we introduce a content-based token pruning scheme to enhance the efficiency of the vanilla ViT, progressively removing background or low-discriminative tokens at different stages by exploiting feature responses and the self-attention mechanism. To ensure the resulting efficient ViT retains strong discriminative power, we further present a discriminative transfer strategy comprising both discriminative knowledge transfer and discriminative region guidance. Using a distillation paradigm, these components transfer knowledge from a larger "teacher" ViT to a more efficient "student" model, guiding the latter to focus on subtle yet crucial regions in a cost-free manner. Extensive experiments on two widely used fine-grained datasets and four large-scale fine-grained datasets demonstrate the effectiveness of our method. Specifically, EET reduces the inference latency of ViT-Small by 42.7% and boosts the retrieval performance of 16-bit hash codes by 5.15% on the challenging NABirds dataset.
Authors' comments: Accepted by IEEE TMM


5375. CLIRudit: Cross-Lingual Information Retrieval of Scientific Documents

Francisco Valentini, Diego Kozlowski, Vincent Larivière

arXiv:2504.16264v1

5376. ORION Grounded in Context: Retrieval-Based Method for Hallucination Detection

Assaf Gerner, Netta Madvil, Nadav Barak, Alex Zaikman, Jonatan Liberman, Liron Hamra, Rotem Brazilay, Shay Tsadok et al.

arXiv:2504.15771v2

5377. POLYRAG: Integrating Polyviews into Retrieval-Augmented Generation for Medical Applications

Chunjing Gan, Dan Yang, Binbin Hu, Ziqi Liu, Yue Shen, Zhiqiang Zhang, Jian Wang, Jun Zhou

arXiv:2504.14917v1

5378. Exploring $\ell_0$ Sparsification for Inference-free Sparse Retrievers

Xinjie Shen, Zhichao Geng, Yang Yang

arXiv:2504.14839v1

5379. Incremental Attractor Neural Network Modelling of the Lifespan Retrieval Curve

Patrícia Pereira, Anders Lansner, Pawel Herman

arXiv:2504.14528v1

5380. LegalRAG: A Hybrid RAG System for Multilingual Legal Information Retrieval

Muhammad Rafsan Kabir, Rafeed Mohammad Sultan, Fuad Rahman, Mohammad Ruhul Amin, Sifat Momen, Nabeel Mohammed, Shafin Rahman

arXiv:2504.16121v1
