benty-fields - Search paper

State-of-the-art retrieval models typically address a straightforward search scenario, in which retrieval tasks are fixed (e.g., finding a passage to answer a specific question) and only a single modality is supported for both queries and retrieved results. This paper introduces techniques for advancing information retrieval with multimodal large language models (MLLMs), enabling a broader search scenario, termed universal multimodal retrieval, where multiple modalities and diverse retrieval tasks are accommodated. To this end, we first study fine-tuning an MLLM as a bi-encoder retriever on 10 datasets with 16 retrieval tasks. Our empirical results show that the fine-tuned MLLM retriever is capable of understanding challenging queries, composed of both text and image, but it underperforms compared to a smaller CLIP retriever in cross-modal retrieval tasks due to the modality bias exhibited by MLLMs. To address the issue, we propose modality-aware hard negative mining to mitigate the modality bias exhibited by MLLM retrievers. Second, we propose continuously fine-tuning the universal multimodal retriever to enhance its text retrieval capability while preserving multimodal retrieval capability. As a result, our model, MM-Embed, achieves state-of-the-art performance on the multimodal retrieval benchmark M-BEIR, which spans multiple domains and tasks, while also surpassing the state-of-the-art text retrieval model, NV-Embed-v1, on the MTEB retrieval benchmark. We also explore prompting the off-the-shelf MLLMs as zero-shot rerankers to refine the ranking of the candidates from the multimodal retriever. We find that, through prompt-and-reranking, MLLMs can further improve multimodal retrieval when the user queries (e.g., text-image composed queries) are more complex and challenging to understand. These findings also pave the way for advancing universal multimodal retrieval in the future.
Authors' comments: Accepted at ICLR 2025. We release the model weights at: https://huggingface.co/nvidia/MM-Embed

5694. INQUIRE: A Natural World Text-to-Image Retrieval Benchmark

Edward Vendrow, Omiros Pantazis, Alexander Shepard, Gabriel Brostow, Kate E. Jones, Oisin Mac Aodha, Sara Beery, Grant Van Horn

arXiv:2411.02537v2

5695. Data Extraction Attacks in Retrieval-Augmented Generation via Backdoors

Yuefeng Peng, Junda Wang, Hong Yu, Amir Houmansadr

arXiv:2411.01705v2

5696. Enhancing Question Answering Precision with Optimized Vector Retrieval and Instructions

Lixiao Yang, Mengyang Xu, Weimao Ke

arXiv:2411.01039v1

5697. Retrieval-enriched zero-shot image classification in low-resource domains

Nicola Dall'Asen, Yiming Wang, Enrico Fini, Elisa Ricci

arXiv:2411.00988v1

5698. Optimizing Contextual Speech Recognition Using Vector Quantization for Efficient Retrieval

Nikolaos Flemotomos, Roger Hsiao, Pawel Swietojanski, Dogan Can, Xiaodan Zhuang

arXiv:2411.00664v1

5699. MIRFLEX: Music Information Retrieval Feature Library for Extraction

Anuradha Chopra, Abhinaba Roy, Dorien Herremans

arXiv:2411.00469v1

5700. Responsible Retrieval Augmented Generation for Climate Decision Making from Documents

Matyas Juhasz, Kalyan Dutia, Henry Franks, Conor Delahunty, Patrick Fawbert Mills, Harrison Pim

arXiv:2410.23902v1

Benty-search

5681. Invar-RAG: Invariant LLM-aligned Retrieval for Better Generation

arXiv:2411.07021v2

5682. Veri-Car: Towards Open-world Vehicle Information Retrieval

arXiv:2411.06864v3

5683. Exploring Knowledge Boundaries in Large Language Models for Retrieval Judgment

arXiv:2411.06207v1

5684. Why These Documents? Explainable Generative Retrieval with Hierarchical Category Paths

arXiv:2411.05572v1

5685. Assessing the Answerability of Queries in Retrieval-Augmented Code Generation

arXiv:2411.05547v1

5686. IntellBot: Retrieval Augmented LLM Chatbot for Cyber Threat Knowledge Delivery

arXiv:2411.05442v1

5687. Sufficient Context: A New Lens on Retrieval Augmented Generation Systems

arXiv:2411.06037v3

5688. Deploying Large Language Models With Retrieval Augmented Generation

arXiv:2411.11895v1

5689. Towards Competitive Search Relevance For Inference-Free Learned Sparse Retrievers

arXiv:2411.04403v1

5690. Reproducible Hybrid Time-Travel Retrieval in Evolving Corpora

arXiv:2411.04051v1

5691. Input-Driven Dynamics for Robust Memory Retrieval in Hopfield Networks

arXiv:2411.05849v1

5692. PersianRAG: A Retrieval-Augmented Generation System for Persian Language

arXiv:2411.02832v1

5693. MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs

arXiv:2411.02571v2