benty-fields - Search paper

8001. ImageRAGTurbo: Towards One-step Text-to-Image Generation with Retrieval-Augmented Diffusion Models

Peijie Qiu, Hariharan Ramshankar, Arnau Ramisa, René Vidal, Amit Kumar K C, Vamsi Salaka, Rahul Bhagat

arXiv:2602.12640v1

8002. CAPTS: Channel-Aware, Preference-Aligned Trigger Selection for Multi-Channel Item-to-Item Retrieval

Xiaoyou Zhou, Yuqi Liu, Zhao Liu, Xiao Lv, Bo Chen, Ruiming Tang, Guorui Zhou

arXiv:2602.12564v1

8003. Visual RAG Toolkit: Scaling Multi-Vector Visual Retrieval with Training-Free Pooling and Multi-Stage Search

Ara Yeroyan

arXiv:2602.12510v1

Multi-vector visual retrievers (e.g., ColPali-style late interaction models) deliver strong accuracy, but scale poorly because each page yields thousands of vectors, making indexing and search increasingly expensive. We present Visual RAG Toolkit, a practical system for scaling visual multi-vector retrieval with training-free, model-aware pooling and multi-stage retrieval. Motivated by Matryoshka Embeddings, our method performs static spatial pooling - including a lightweight sliding-window averaging variant - over patch embeddings to produce compact tile-level and global representations for fast candidate generation, followed by exact MaxSim reranking using full multi-vector embeddings. Our design yields a quadratic reduction in vector-to-vector comparisons by reducing stored vectors per page from thousands to dozens, notably without requiring post-training, adapters, or distillation. Across experiments with interaction-style models such as ColPali and ColSmol-500M, we observe that, over the limited ViDoRe v2 benchmark corpus, 2-stage retrieval typically preserves NDCG and Recall@5/10 with minimal degradation while substantially improving throughput (approximately 4x QPS), with sensitivity mainly at very large k. The toolkit additionally provides robust preprocessing - high-resolution PDF-to-image conversion, optional margin/empty-region cropping, and token hygiene (indexing only visual tokens) - and a reproducible evaluation pipeline, enabling rapid exploration of two-stage, three-stage, and cascaded retrieval variants. By emphasizing efficiency at common cutoffs (e.g., k <= 10), the toolkit lowers hardware barriers and makes state-of-the-art visual retrieval more accessible in practice.
Authors' comments: 4 pages, 3 figures. Submitted to SIGIR 2026 Demonstrations Track. Project website: https://github.com/Ara-Yeroyan/visual-rag-toolkit
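
The pooling-then-rerank idea in the abstract can be illustrated with a minimal sketch. This is not the toolkit's API: the function names (`pool_tiles`, `maxsim`, `two_stage_search`), the row-major patch grid, the non-overlapping mean pooling, and the use of MaxSim over pooled vectors in the first stage are all assumptions made for illustration, using only the Python standard library.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Late-interaction score: each query vector takes its best-matching
    page vector, and the per-query maxima are summed."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def pool_tiles(patches: List[Vector], grid_h: int, grid_w: int, tile: int) -> List[Vector]:
    """Mean-pool a row-major grid_h x grid_w grid of patch embeddings into
    non-overlapping tile x tile blocks (hypothetical stand-in for the
    paper's static spatial pooling)."""
    dim = len(patches[0])
    pooled = []
    for r0 in range(0, grid_h, tile):
        for c0 in range(0, grid_w, tile):
            members = [
                patches[r * grid_w + c]
                for r in range(r0, min(r0 + tile, grid_h))
                for c in range(c0, min(c0 + tile, grid_w))
            ]
            pooled.append([sum(v[d] for v in members) / len(members) for d in range(dim)])
    return pooled


def two_stage_search(query_vecs, pages, pooled_pages, prefetch: int, top_k: int) -> List[int]:
    """Stage 1: rank all pages by a cheap score over pooled vectors and keep
    `prefetch` candidates. Stage 2: rerank the candidates with exact MaxSim
    over the full patch embeddings and return the top_k page indices."""
    coarse = sorted(
        range(len(pages)),
        key=lambda i: maxsim(query_vecs, pooled_pages[i]),
        reverse=True,
    )[:prefetch]
    reranked = sorted(coarse, key=lambda i: maxsim(query_vecs, pages[i]), reverse=True)
    return reranked[:top_k]


# Toy data: four pages, each a 2x2 grid of 2-dimensional patch embeddings.
pages = [
    [[1.0, 0.0]] * 4,                                      # page 0
    [[0.0, 1.0]] * 4,                                      # page 1
    [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]],      # page 2
    [[0.0, 0.0]] * 4,                                      # page 3
]
pooled_pages = [pool_tiles(p, grid_h=2, grid_w=2, tile=2) for p in pages]
query = [[1.0, 0.0], [0.0, 1.0]]

print(two_stage_search(query, pages, pooled_pages, prefetch=3, top_k=2))
```

In this toy setup, page 2 ranks last among the stage-1 candidates because averaging dilutes its two informative patches, yet exact MaxSim moves it to the top in stage 2. If `prefetch` were set too low, page 2 would never reach the reranker, which is one plausible reading of why candidate-set size matters in a cascade like this.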

8004. BLUEPRINT Rebuilding a Legacy: Multimodal Retrieval for Complex Engineering Drawings and Documents

Ethan Seefried, Ran Eldegaway, Sanjay Das, Nathaniel Blanchard, Tirthankar Ghosal

arXiv:2602.13345v1

8005. Vision Transformer for Multi-Domain Phase Retrieval in Coherent Diffraction Imaging

Jialun Liu, David Yang, Ian Robinson

arXiv:2602.12255v1

8006. IncompeBench: A Permissively Licensed, Fine-Grained Benchmark for Music Information Retrieval

Benjamin Clavié, Atoof Shakir, Jonah Turner, Sean Lee, Aamir Shakir, Makoto P. Kato

arXiv:2602.11941v1

8007. AlphaPROBE: Alpha Mining via Principled Retrieval and On-graph biased evolution

Taian Guo, Haiyang Shen, Junyu Luo, Binqi Chen, Hongjun Ding, Jinsheng Huang, Luchen Liu, Yun Ma et al.

arXiv:2602.11917v1

8008. RI-Mamba: Rotation-Invariant Mamba for Robust Text-to-Shape Retrieval

Khanh Nguyen, Dasith de Silva Edirimuni, Ghulam Mubashar Hassan, Ajmal Mian

arXiv:2602.11673v1

8009. TEGRA: Text Encoding With Graph and Retrieval Augmentation for Misinformation Detection

Géraud Faye, Wassila Ouerdane, Guillaume Gadek, Céline Hudelot

arXiv:2602.11106v1

8010. DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories

Chenlong Deng, Mengjie Deng, Junjie Wu, Dun Zeng, Teng Wang, Qingsong Xie, Jiadeng Huang, Shengjie Ma et al.

arXiv:2602.10809v1

8011. TRACE: Timely Retrieval and Alignment for Cybersecurity Knowledge Graph Construction and Expansion

Zijing Xu, Ziwei Ning, Tiancheng Hu, Jianwei Zhuge, Yangyang Wang, Jiahao Cao, Mingwei Xu

arXiv:2602.11211v1

8012. GeoGR: A Generative Retrieval Framework for Spatio-Temporal Aware POI Recommendation

Fangye Wang, Haowen Lin, Yifang Yuan, Siyuan Wang, Xiaojiang Zhou, Song Yang, Pengjie Wang

arXiv:2602.10411v1

8013. MEVER: Multi-Modal and Explainable Claim Verification with Graph-based Evidence Retrieval

Delvin Ce Zhang, Suhan Cui, Zhelin Chu, Xianren Zhang, Dongwon Lee

arXiv:2602.10023v1

8014. AmharicIR+Instr: A Two-Dataset Resource for Neural Retrieval and Instruction Tuning

Tilahun Yeshambel, Moncef Garouani, Josiane Mothe

arXiv:2602.09914v1

8015. ARK: A Dual-Axis Multimodal Retrieval Benchmark along Reasoning and Knowledge

Yijie Lin, Guofeng Ding, Haochen Zhou, Haobin Li, Mouxing Yang, Xi Peng

arXiv:2602.09839v1

8016. LEMUR: A Corpus for Robust Fine-Tuning of Multilingual Law Embedding Models for Retrieval

Narges Baba Ahmadi, Jan Strich, Martin Semmann, Chris Biemann

arXiv:2602.09570v1

8017. IntPro: A Proxy Agent for Context-Aware Intent Understanding via Retrieval-conditioned Inference

Guanming Liu, Meng Wu, Peng Zhang, Yu Zhang, Yubo Shu, Xianliang Huang, Kainan Tu, Ning Gu et al.

arXiv:2603.03325v1

8018. The Wisdom of Many Queries: Complexity-Diversity Principle for Dense Retriever Training

Xincan Feng, Noriki Nishida, Yusuke Sakai, Yuji Matsumoto

arXiv:2602.09448v1

8019. STaR: Scalable Task-Conditioned Retrieval for Long-Horizon Multimodal Robot Memory

Mingfeng Yuan, Hao Zhang, Mahan Mohammadi, Runhao Li, Jinjun Shan, Steven L. Waslander

arXiv:2602.09255v1

8020. VERA: Identifying and Leveraging Visual Evidence Retrieval Heads in Long-Context Understanding

Rongcan Pei, Huan Li, Fang Guo, Qi Zhu

arXiv:2602.10146v1
