benty-fields - Search paper

Multimodal retrieval is the task of aggregating information from queries across heterogeneous modalities to retrieve desired targets. State-of-the-art multimodal retrieval models can understand complex queries, yet they are typically limited to two modalities: text and vision. This limitation impedes the development of universal retrieval systems capable of comprehending queries that combine more than two modalities. To advance toward this goal, we present OmniRet, the first retrieval model capable of handling complex, composed queries spanning three key modalities: text, vision, and audio. Our OmniRet model addresses two critical challenges for universal retrieval: computational efficiency and representation fidelity. First, feeding massive token sequences from modality-specific encoders to Large Language Models (LLMs) is computationally inefficient. We therefore introduce an attention-based resampling mechanism to generate compact, fixed-size representations from these sequences. Second, compressing rich omni-modal data into a single embedding vector inevitably causes information loss and discards fine-grained details. We propose Attention Sliced Wasserstein Pooling to preserve these fine-grained details, leading to improved omni-modal representations. OmniRet is trained on an aggregation of approximately 6 million query-target pairs spanning 30 datasets. We benchmark our model on 13 retrieval tasks and an MMEBv2 subset. Our model demonstrates significant improvements on composed-query, audio, and video retrieval tasks, while performing on par with state-of-the-art models on the others. Furthermore, we curate a new Audio-Centric Multimodal Benchmark (ACM). This benchmark introduces two critical, previously missing tasks, composed audio retrieval and audio-visual retrieval, to evaluate a model's omni-modal embedding capacity more comprehensively.
Authors' comments: CVPR 2026. Project link: https://github.com/hmchuong/omniret
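
To make the resampling idea in the abstract above concrete, here is a minimal pure-Python sketch of attention-based resampling in general. It is not OmniRet's implementation: the function names, the identity projections, and the hand-picked latent vectors are all assumptions for illustration, and in a trained model the latents and projection matrices would be learned. The point it shows is that a fixed set of latent queries cross-attends to a token sequence of any length and always yields the same number of output vectors.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def resample(tokens, latents):
    """Cross-attend each latent query to every token.

    Returns len(latents) vectors, regardless of how many tokens come in.
    Keys and values are the raw tokens (identity projections, an assumption).
    """
    dim = len(latents[0])
    out = []
    for q in latents:
        # Scaled dot-product score between this latent and each token.
        scores = [sum(a * b for a, b in zip(q, t)) / math.sqrt(dim) for t in tokens]
        weights = softmax(scores)
        # Attention-weighted average of the tokens.
        out.append([sum(w * t[j] for w, t in zip(weights, tokens)) for j in range(dim)])
    return out


latents = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical; learned in practice
short_seq = [[0.2, 0.1], [0.9, 0.4], [0.3, 0.8]]
long_seq = [[math.sin(i), math.cos(i)] for i in range(50)]

# Both sequences are compressed to the same fixed size.
print(len(resample(short_seq, latents)), len(resample(long_seq, latents)))  # 2 2
```

Because each output is a convex combination of the input tokens, the cost of whatever consumes the output depends on the number of latents rather than on the original sequence length.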

4725. Inference-Time Safety For Code LLMs Via Retrieval-Augmented Revision

Manisha Mukherjee, Vincent J. Hellendoorn

arXiv:2603.01494v1

4726. PhotoBench: Beyond Visual Matching Towards Personalized Intent-Driven Photo Retrieval

Tianyi Xu, Rong Shan, Junjie Wu, Jiadeng Huang, Teng Wang, Jiachen Zhu, Wenteng Chen, Minxin Tu et al.

arXiv:2603.01493v1

4727. ROSER: Few-Shot Robotic Sequence Retrieval for Scalable Robot Learning

Zillur Rahman, Eddison Pham, Alejandro Daniel Noel, Cristian Meo

arXiv:2603.01474v1

4728. LaSER: Internalizing Explicit Reasoning into Latent Space for Dense Retrieval

Jiajie Jin, Yanzhao Zhang, Mingxin Li, Dingkun Long, Pengjun Xie, Yutao Zhu, Zhicheng Dou

arXiv:2603.01425v1

4729. CURE: A Multimodal Benchmark for Clinical Understanding and Retrieval Evaluation

Yannian Gu, Zhongzhen Huang, Linjie Mu, Xizhuo Zhang, Shaoting Zhang, Xiaofan Zhang

arXiv:2603.19274v1

4730. Multi-Sourced, Multi-Agent Evidence Retrieval for Fact-Checking

Shuzhi Gong, Richard O. Sinnott, Jianzhong Qi, Cecile Paris, Preslav Nakov, Zhuohan Xie

arXiv:2603.00267v1

4731. UniFAR: A Unified Facet-Aware Retrieval Framework for Scientific Documents

Zheng Dou, Zhao Zhang, Deqing Wang, Yikun Ban, Fuzhen Zhuang

arXiv:2602.23766v1

4732. GetBatch: Distributed Multi-Object Retrieval for ML Data Loading

Alex Aizman, Abhishek Gaikwad, Piotr Żelasko

arXiv:2602.22434v1

4733. HubScan: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems

Idan Habler, Vineeth Sai Narajala, Stav Koren, Amy Chang, Tiffany Saade

arXiv:2602.22427v1

Retrieval-Augmented Generation (RAG) systems are essential to contemporary AI applications, allowing large language models to obtain external knowledge via vector similarity search. Nevertheless, these systems encounter a significant security flaw: hubness, in which certain items (hubs) appear in the top-k retrieval results for a disproportionately high number of varied queries. These hubs can be exploited to introduce harmful content, alter search rankings, bypass content filtering, and decrease system performance. We introduce hubscan, an open-source security scanner that evaluates vector indices and embeddings to identify hubs in RAG systems. Hubscan presents a multi-detector architecture that integrates: (1) robust statistical hubness detection utilizing median/MAD-based z-scores, (2) cluster spread analysis to assess cross-cluster retrieval patterns, (3) stability testing under query perturbations, and (4) domain-aware and modality-aware detection for category-specific and cross-modal attacks. Our solution accommodates several vector databases (FAISS, Pinecone, Qdrant, Weaviate) and offers versatile retrieval techniques, including vector similarity, hybrid search, and lexical matching with reranking capabilities. We evaluate hubscan on Food-101, MS-COCO, and FiQA adversarial hubness benchmarks constructed using state-of-the-art gradient-optimized and centroid-based hub generation methods. hubscan achieves 90% recall at a 0.2% alert budget and 100% recall at 0.4%, with adversarial hubs ranking above the 99.8th percentile. Domain-scoped scanning recovers 100% of targeted attacks that evade global detection. Production validation on 1M real web documents from MS MARCO demonstrates significant score separation between clean documents and adversarial content. Our work provides a practical, extensible framework for detecting hubness threats in production RAG systems.
Authors' comments: 11 pages, 5 figures, 2 tables, Github: https://github.com/cisco-ai-defense/adversarial-hubness-detector
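
The first detector named in the abstract above, median/MAD-based z-scores, can be sketched in a few lines of standard-library Python. This is a generic illustration of the statistic, not hubscan's code or API: the function names, the toy result lists, the 3.5 cutoff (a common rule of thumb for modified z-scores), and the fallback used when the MAD is zero are all assumptions.

```python
from collections import Counter
from statistics import median


def hit_counts(results, n_docs):
    """Count how often each document id appears across all top-k result lists."""
    counts = Counter(doc_id for top_k in results for doc_id in top_k)
    return [counts[i] for i in range(n_docs)]


def robust_z_scores(values):
    """Z-scores based on the median and the median absolute deviation (MAD)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    # Falling back to 1.0 when the MAD is zero is an assumption of this sketch.
    scale = 1.4826 * mad if mad > 0 else 1.0
    return [(v - med) / scale for v in values]


def flag_hubs(results, n_docs, threshold=3.5):
    """Return ids of documents retrieved far more often than is typical."""
    z = robust_z_scores(hit_counts(results, n_docs))
    return [i for i, score in enumerate(z) if score > threshold]


# Toy top-2 results for 12 queries over 8 documents; document 7 shows up in 10.
results = [
    [7, 0], [7, 1], [7, 1], [7, 2], [7, 2], [7, 2],
    [7, 3], [7, 3], [7, 4], [7, 6], [5, 6], [5, 6],
]
print(flag_hubs(results, n_docs=8))  # [7]
```

Using the median and MAD instead of the mean and standard deviation keeps the baseline from being dragged upward by the very outliers the detector is trying to find.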

4734. RETLLM: Training and Data-Free MLLMs for Multimodal Information Retrieval

Dawei Su, Dongsheng Wang

arXiv:2602.22278v1

4735. Seeing Through Words: Controlling Visual Retrieval Quality with Language Models

Jianglin Lu, Simon Jenni, Kushal Kafle, Jing Shi, Handong Zhao, Yun Fu

arXiv:2602.21175v1

4736. DynaRAG: Bridging Static and Dynamic Knowledge in Retrieval-Augmented Generation

Penghao Liang, Mengwei Yuan, Jianan Liu, Jing Yang, Xianyou Li, Weiran Yan, Yichao Wu

arXiv:2603.18012v1

4737. ERA: Evidence-based Reliability Alignment for Honest Retrieval-Augmented Generation

Sunguk Shin, Meeyoung Cha, Byung-Jun Lee, Sungwon Park

arXiv:2604.20854v1

4738. How Retrieved Context Shapes Internal Representations in RAG

Samuel Yeh, Sharon Li

arXiv:2602.20091v1

4739. Exo Skryer: A JAX-accelerated sub-stellar atmospheric retrieval framework

Elspeth K. H. Lee

arXiv:2602.19687v1

4740. Evaluating the Impact of Data Anonymization on Image Retrieval

Marvin Chen, Manuel Eberhardinger, Johannes Maucher

arXiv:2602.19641v1

4721. Diagnosing Retrieval vs. Utilization Bottlenecks in LLM Agent Memory

arXiv:2603.02473v1

4722. Beyond Caption-Based Queries for Video Moment Retrieval

arXiv:2603.02363v1

4723. RealRoute: Dynamic Query Routing System via Retrieve-then-Verify Paradigm

arXiv:2604.20860v1

4724. OmniRet: Efficient and High-Fidelity Omni Modality Retrieval

arXiv:2603.02098v1
