B. V. Patel, B. B. Meshram
Content-based video retrieval is an approach for facilitating the searching and browsing of large image collections over the World Wide Web. In this approach, video analysis is conducted on low-level visual properties extracted from video frames. We believed that in order to create an effective video retrieval system, visual perception must be taken into account. We conjectured that a technique which employs multiple features for indexing and retrieval would be more effective in the discrimination and search tasks of videos. In order to validate this claim, content-based indexing and retrieval systems were implemented using color histograms, various texture features and other approaches. Videos were stored in an Oracle 9i database, and a user study measured the correctness of the responses.
Boris Alexeev, Afonso S. Bandeira, Matthew Fickus, Dustin G. Mixon
In many areas of imaging science, it is difficult to measure the phase of linear measurements. As such, one often wishes to reconstruct a signal from intensity measurements, that is, perform phase retrieval. In this paper, we provide a novel measurement design which is inspired by interferometry and exploits certain properties of expander graphs. We also give an efficient phase retrieval procedure, and use recent results in spectral graph theory to produce a stable performance guarantee which rivals the guarantee for PhaseLift in [Candes et al. 2011]. We use numerical simulations to illustrate the performance of our phase retrieval procedure, and we compare reconstruction error and runtime with a common alternating-projections-type procedure.
Jason Weston, Chong Wang, Ron Weiss, Adam Berenzweig
Retrieval tasks typically require a ranking of items given a query.
Collaborative filtering tasks, on the other hand, learn to model users'
preferences over items. In this paper we study the joint problem of
recommending items to a user with respect to a given query, which is a
surprisingly common task. This setup differs from the standard collaborative
filtering one in that we are given a query x user x item tensor for training
instead of the more traditional user x item matrix. Compared to document
retrieval we do have a query, but we may or may not have content features (we
will consider both cases) and we can also take account of the user's profile.
We introduce a factorized model for this new task that optimizes the top-ranked
items returned for the given query and user. We report empirical results where
it outperforms several baselines.
Authors' comments: ICML2012
Jean Bertoin, Marc Yor
We show that if $(X_s, s\geq 0)$ is a right-continuous process, $Y_t=\int_0^t X_s\,\mathrm{d}s$ its integral process and $\tau = (\tau_{\ell}, \ell \geq 0)$ a subordinator, then the time-changed process $(Y_{\tau_{\ell}}, \ell\geq 0)$ allows one to retrieve the information about $(X_{\tau_{\ell}}, \ell\geq 0)$ when $\tau$ is stable, but not when $\tau$ is a gamma subordinator. This question has been motivated by a striking identity in law involving the Bessel clock taken at an independent inverse Gaussian variable.
Kostadin Koroutchev, Jian Shen, Elka Koroutcheva, Manuel Cebrian
In this work, we suggest a parameterized statistical model (the gamma
distribution) for the frequency of word occurrences in long strings of English
text and use this model to build a corresponding thermodynamic picture by
constructing the partition function. We then use our partition function to
compute thermodynamic quantities such as the free energy and the specific heat.
In this approach, the parameters of the word frequency model vary from word to
word so that each word has a different corresponding thermodynamics and we
suggest that differences in the specific heat reflect differences in how the
words are used in language, differentiating keywords from common and function
words. Finally, we apply our thermodynamic picture to the problem of retrieval
of texts based on keywords and suggest some advantages over traditional
information retrieval methods.
Authors' comments: 12 pages, 7 figures
Pere Constans
An approximate textual retrieval algorithm for searching sources with high levels of defects is presented. It considers splitting the words in a query into two overlapping segments and subsequently building composite regular expressions from interlacing subsets of the segments. This procedure reduces the probability of missed occurrences due to source defects while also diminishing the retrieval of irrelevant, non-contextual occurrences.
Michael J. Kurtz, Guenther Eichhorn, Alberto Accomazzi, Carolyn Grant, Edwin Henneken, Stephen S. Murray
Since it was first announced at ADASS 2, the Smithsonian/NASA Astrophysics
System Abstract Service (ADS) has played a central role in the information
seeking behavior of astronomers. Central to the ability of the ADS to act as a
search and discovery tool is its role as a metadata aggregator. Over the past 13
years the ADS has introduced many new techniques to facilitate information
retrieval, broadly defined. We discuss some of these developments, with
particular attention to how the ADS might interact with the virtual
observatory, and to the new myADS-arXiv customized open access virtual journal.
The ADS is at http://ads.harvard.edu
Authors' comments: Invited talk, to appear in ADASS XV Proceedings
Simon Popelier, Matthieu X. B. Sarazin, Maximilien Bohm, Mathieu Gierski, Hanna Mergui, Matthieu Ospici, Adrien Bernhardt
The Sales Comparison Approach (SCA) is one of the most popular approaches to real estate appraisal. Used as a reference in real estate expertise and as one of the major types of Automatic Valuation Models (AVM), it has recently gained popularity within machine learning methods. The performance of models able to use data represented as sets and graphs made it possible to adapt this methodology efficiently, yielding substantial results. SCA relies on taking past transactions (comparables) as references, selected according to their similarity with the target property's sale. In this study, we focus on the selection of these comparables for real estate appraisal. We demonstrate that the selection of comparables used in many state-of-the-art algorithms can be significantly improved by learning a selection policy instead of imposing it. Our method relies on a hybrid vector-geographical retrieval module capable of adapting to different datasets and optimized jointly with an estimation module. We further show that the use of carefully selected comparables makes it possible to build models that require fewer comparables and fewer parameters with performance close to state-of-the-art models. All our evaluations are made on five datasets which span areas in the United States, Brazil, and France.
Authors' comments: Accepted at NFMCP 2024 workshop (New Frontiers in Mining Complex Patterns), held in conjunction with ECML 2024
Tingting Tang, James Flemings, Yongqin Wang, Murali Annavaram
Retrieval-augmented generation (RAG) is a widely used framework for reducing hallucinations in large language models (LLMs) on domain-specific tasks by retrieving relevant documents from a database to support accurate responses. However, when the database contains sensitive corpora, such as medical records or legal documents, RAG poses serious privacy risks by potentially exposing private information through its outputs. Prior work has demonstrated that one can practically craft adversarial prompts that force an LLM to regurgitate the augmented contexts. A promising direction is to integrate differential privacy (DP), a privacy notion that offers strong formal guarantees, into RAG systems. However, naively applying DP mechanisms to existing systems often leads to significant utility degradation. For RAG systems in particular, DP can reduce the usefulness of the augmented contexts, leading to an increased risk of hallucination from the LLMs. Motivated by these challenges, we present DP-KSA, a novel privacy-preserving RAG algorithm that integrates DP using the propose-test-release paradigm. DP-KSA follows from a key observation that most question-answering (QA) queries can be sufficiently answered with a few keywords. Hence, DP-KSA first obtains an ensemble of relevant contexts, each of which will be used to generate a response from an LLM. We utilize these responses to obtain the most frequent keywords in a differentially private manner. Lastly, the keywords are augmented into the prompt for the final output. This approach effectively compresses the semantic space while preserving both utility and privacy. We formally show that DP-KSA provides DP guarantees on the generated output with respect to the RAG database. We evaluate DP-KSA on two QA benchmarks using three instruction-tuned LLMs, and our empirical results demonstrate that DP-KSA achieves a strong privacy-utility tradeoff.
Elias Jääsaari, Ville Hyvönen, Teemu Roos
Multi-vector representations generated by late interaction models, such as ColBERT, enable superior retrieval quality compared to single-vector representations in information retrieval applications. In multi-vector retrieval systems, both queries and documents are encoded using one embedding for each token, and similarity between queries and documents is measured by the MaxSim similarity measure. However, the improved recall of multi-vector retrieval comes at the expense of significantly increased latency. This necessitates designing efficient approximate nearest neighbor search (ANNS) algorithms for multi-vector search. In this work, we introduce LEMUR, a simple-yet-efficient framework for multi-vector similarity search. LEMUR consists of two consecutive problem reductions: We first formulate multi-vector similarity search as a supervised learning problem that can be solved using a one-hidden-layer neural network. Second, we reduce inference under this model to single-vector similarity search in its latent space, which enables the use of existing single-vector ANNS methods for speeding up retrieval. In addition to performance evaluation on ColBERTv2 embeddings, we evaluate LEMUR on embeddings generated by modern multi-vector text models and multi-vector visual document retrieval models. LEMUR is an order of magnitude faster than earlier multi-vector similarity search methods.
Authors' comments: 17 pages
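As a point of reference for the MaxSim measure mentioned above, the following is a minimal sketch of exact (brute-force) MaxSim scoring in plain Python. It is not the LEMUR method itself, which the abstract describes as a learned reduction to single-vector search. The function names and the toy two-dimensional token embeddings are assumptions made up for illustration.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_vecs, doc_vecs):
    """MaxSim: for each query token embedding, take the best-matching
    document token embedding, then sum those maxima over the query."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy token embeddings (hypothetical values, one vector per token).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.5, 0.5]]
doc_b = [[0.0, 1.0], [0.0, 0.5]]

# Rank documents by exact MaxSim, highest score first.
ranked = sorted(
    [("doc_a", maxsim(query, doc_a)), ("doc_b", maxsim(query, doc_b))],
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked)
```

Exact scoring of this kind touches every query-token and document-token pair, which is the latency cost that approximate multi-vector search methods aim to avoid.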
Dominik Stammbach, Kylie Zhang, Patty Liu, Nimra Nadeem, Lucia Zheng, Peter Henderson
AI tools are increasingly suggested as solutions to assist public agencies with heavy workloads. In public defense, where a constitutional right to counsel meets the complexities of law, overwhelming caseloads and constrained resources, practitioners face especially taxing conditions. Yet, there is little evidence of how AI could meaningfully support defenders' day-to-day work. In partnership with the New Jersey Office of the Public Defender, we develop the NJ BriefBank, a retrieval tool which surfaces relevant appellate briefs to streamline legal research and writing. We show that existing legal retrieval benchmarks fail to transfer to public defense search; however, adding domain knowledge improves retrieval quality. This includes query expansion with legal reasoning, domain-specific data and curated synthetic examples. To facilitate further research, we provide a taxonomy of realistic defender search queries and release a manually annotated public defense retrieval dataset. Together, our work offers starting points towards building practical, reliable retrieval AI tools for public defense, and towards more realistic legal retrieval benchmarks.
Max McKinnon
The ability of large language models (LLMs) to recall and retrieve
information from long contexts is critical for many real-world applications.
Prior work (Liu et al., 2023) reported that LLMs suffer significant drops in
retrieval accuracy for facts placed in the middle of large contexts, an effect
known as "Lost in the Middle" (LITM). We find the model Gemini 2.5 Flash can
answer needle-in-a-haystack questions with great accuracy regardless of
document position including when the document is nearly at the input context
limit. Our results suggest that the "Lost in the Middle" effect is not present
for simple factoid Q&A in Gemini 2.5 Flash, indicating substantial
improvements in long-context retrieval.
Authors' comments: 3 pages, 0 figures
Weijian Jian, Yajun Zhang, Dawei Liang, Chunyu Xie, Yixiao He, Dawei Leng, Yuhui Yin
The rapid advancement of Multimodal Large Language Models (MLLMs) has extended CLIP-based frameworks to produce powerful, universal embeddings for retrieval tasks. However, existing methods primarily focus on natural images, offering limited support for other crucial visual modalities such as videos and visual documents. To bridge this gap, we introduce RzenEmbed, a unified framework to learn embeddings across a diverse set of modalities, including text, images, videos, and visual documents. We employ a novel two-stage training strategy to learn discriminative representations. The first stage focuses on foundational text and multimodal retrieval. In the second stage, we introduce an improved InfoNCE loss, incorporating two key enhancements. Firstly, a hardness-weighted mechanism guides the model to prioritize challenging samples by assigning them higher weights within each batch. Secondly, we implement an approach to mitigate the impact of false negatives and alleviate data noise. This strategy not only enhances the model's discriminative power but also improves its instruction-following capabilities. We further boost performance with a learnable temperature parameter and model souping. RzenEmbed sets a new state of the art on the MMEB benchmark. It not only achieves the best overall score but also outperforms all prior work on the challenging video and visual document retrieval tasks. Our models are available at https://huggingface.co/qihoo360/RzenEmbed.
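For readers unfamiliar with the loss being modified, here is a minimal sketch of the standard InfoNCE loss for a single query, in plain Python. The abstract does not specify the hardness weighting or the false-negative handling, so neither is reproduced here. The function name `info_nce`, its arguments, and the default temperature are illustrative assumptions, not RzenEmbed's implementation.

```python
import math


def info_nce(sim_pos, sim_negs, temperature=0.05):
    """Standard InfoNCE for one query: negative log-softmax of the positive
    similarity against the positive plus all negatives.
    (Baseline form only; no hardness weights, no false-negative filtering.)"""
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


# Hypothetical cosine similarities: one positive, two in-batch negatives.
easy = info_nce(0.9, [0.1, 0.2])
hard = info_nce(0.3, [0.1, 0.2])
print(easy, hard)
```

With the negatives held fixed, a lower positive similarity gives a larger loss, so `hard` exceeds `easy`. A hardness-weighted variant would additionally rescale the contribution of individual samples, in a way the abstract leaves unspecified.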
Bill Psomas, George Retsinas, Nikos Efthymiadis, Panagiotis Filntisis, Yannis Avrithis, Petros Maragos, Ondrej Chum, Giorgos Tolias
The progress of composed image retrieval (CIR), a popular research direction
in image retrieval, where a combined visual and textual query is used, is held
back by the absence of high-quality training and evaluation data. We introduce
a new evaluation dataset, i-CIR, which, unlike existing datasets, focuses on an
instance-level class definition. The goal is to retrieve images that contain
the same particular object as the visual query, presented under a variety of
modifications defined by textual queries. Its design and curation process keep
the dataset compact to facilitate future research, while maintaining its
challenge (comparable to retrieval among more than 40M random
distractors) through a semi-automated selection of hard negatives.
To overcome the challenge of obtaining clean, diverse, and suitable training
data, we leverage pre-trained vision-and-language models (VLMs) in a
training-free approach called BASIC. The method separately estimates
query-image-to-image and query-text-to-image similarities, performing late
fusion to upweight images that satisfy both queries, while down-weighting those
that exhibit high similarity with only one of the two. Each individual
similarity is further improved by a set of components that are simple and
intuitive. BASIC sets a new state of the art not only on i-CIR but also on existing CIR
datasets that follow a semantic-level class definition. Project page:
https://vrg.fel.cvut.cz/icir/.
Authors' comments: NeurIPS 2025
Ruibo Hou, Shiyu Teng, Jiaqing Liu, Shurong Chai, Yinhao Li, Lanfen Lin, Yen-Wei Chen
Multimodal deep learning has shown promise in depression detection by
integrating text, audio, and video signals. Recent work leverages sentiment
analysis to enhance emotional understanding, yet suffers from high
computational cost, domain mismatch, and static knowledge limitations. To
address these issues, we propose a novel Retrieval-Augmented Generation (RAG)
framework. Given a depression-related text, our method retrieves semantically
relevant emotional content from a sentiment dataset and uses a Large Language
Model (LLM) to generate an Emotion Prompt as an auxiliary modality. This prompt
enriches emotional representation and improves interpretability. Experiments on
the AVEC 2019 dataset show our approach achieves state-of-the-art performance
with CCC of 0.593 and MAE of 3.95, surpassing previous transfer learning and
multi-task learning baselines.
Authors' comments: Accepted in IEEE EMBC 2025
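The two metrics reported above, the concordance correlation coefficient (CCC) and the mean absolute error (MAE), can be computed as in the sketch below. This is a plain-Python illustration that assumes the population (1/n) convention for variance and covariance. The score lists are made-up numbers, not AVEC 2019 data, and the paper's exact evaluation script may differ.

```python
def mae(y_true, y_pred):
    """Mean absolute error between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def ccc(y_true, y_pred):
    """Concordance correlation coefficient:
    2*cov / (var_true + var_pred + (mean_true - mean_pred)^2),
    using population (1/n) moments."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


# Hypothetical severity scores: predictions are perfectly correlated with
# the targets but offset by a constant of 1.
targets = [1.0, 2.0, 3.0, 4.0]
predictions = [2.0, 3.0, 4.0, 5.0]
print(ccc(targets, predictions), mae(targets, predictions))
```

Unlike Pearson correlation, CCC penalizes the constant offset in this example, which is why it is often preferred for regression-style severity estimates.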
Yu Xia, Zhiqiang Xu
Compressed sensing has demonstrated that a general signal $\boldsymbol{x} \in
\mathbb{F}^n$ ($\mathbb{F}\in \{\mathbb{R},\mathbb{C}\}$) can be estimated from
few linear measurements with an error proportional to the best $k$-term
approximation error, a property known as instance optimality. In this paper, we
investigate instance optimality in the context of phaseless measurements using
the $\ell_p$-minimization decoder, where $p \in (0, 1]$, for both real and
complex cases. More specifically, we prove that $(2,1)$ and $(1,1)$-instance
optimality of order $k$ can be achieved with $m =O(k \log(n/k))$ phaseless
measurements, paralleling results from linear measurements. These results imply
that one can stably recover approximately $k$-sparse signals from $m = O(k
\log(n/k))$ phaseless measurements. Our approach leverages the phaseless
bi-Lipschitz condition. Additionally, we present a non-uniform version of
$(2,2)$-instance optimality result in probability applicable to any fixed
vector $\boldsymbol{x} \in \mathbb{F}^n$. These findings reveal striking
parallels between compressive phase retrieval and classical compressed sensing,
enhancing our understanding of both phase retrieval and instance optimality.
Authors' comments: 18 pages
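For orientation, the notion of $(q,p)$-instance optimality of order $k$ can be written out as below. This follows the usual compressed-sensing convention adapted to measurements without phase. The symbols $A$ (measurement matrix), $\Delta$ (decoder), $C$ (constant) and $\sigma_k$ are notation assumed here for illustration, and the paper's exact normalization may differ.

```latex
% Best k-term approximation error of x in the l_p (quasi-)norm:
\sigma_k(\boldsymbol{x})_p
  = \min_{\|\boldsymbol{z}\|_0 \le k} \|\boldsymbol{x} - \boldsymbol{z}\|_p .

% (q,p)-instance optimality of order k for phaseless measurements |Ax|,
% with the error measured up to a global unimodular factor c
% (c = \pm 1 when F = R):
\min_{c \in \mathbb{F},\ |c| = 1}
  \bigl\| \Delta(|A\boldsymbol{x}|) - c\,\boldsymbol{x} \bigr\|_q
  \le C\, k^{1/q - 1/p}\, \sigma_k(\boldsymbol{x})_p
  \qquad \text{for all } \boldsymbol{x} \in \mathbb{F}^n .
```

Under this convention the $(2,1)$ case carries the factor $k^{-1/2}$, while the $(1,1)$ and $(2,2)$ cases carry no power of $k$.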
Ori Nizan, Oren Shrout, Ayellet Tal
A concept may reflect either a concrete or abstract idea. Given an input image, this paper seeks to retrieve other images that share its central concepts, capturing aspects of the underlying narrative. This goes beyond conventional retrieval or clustering methods, which emphasize visual or semantic similarity. We formally define the problem, outline key requirements, and introduce appropriate evaluation metrics. We propose a novel approach grounded in two key observations: (1) While each neighbor in the embedding space typically shares at least one concept with the query, not all neighbors necessarily share the same concept with one another. (2) Modeling this neighborhood with a bimodal Gaussian distribution uncovers meaningful structure that facilitates concept identification. Qualitative, quantitative, and human evaluations confirm the effectiveness of our approach. See the package on PyPI: https://pypi.org/project/coret/
Anant Gupta, Karthik Singaravadivelan, Zekun Wang
Neural document retrieval often treats a corpus as a flat cloud of vectors scored at a single granularity, leaving corpus structure underused and explanations opaque. We use Cobweb--a hierarchy-aware framework--to organize sentence embeddings into a prototype tree and rank documents via coarse-to-fine traversal. Internal nodes act as concept prototypes, providing multi-granular relevance signals and a transparent rationale through retrieval paths. We instantiate two inference approaches: a generalized best-first search and a lightweight path-sum ranker. We evaluate our approaches on MS MARCO and QQP with encoder (e.g., BERT/T5) and decoder (GPT-2) representations. Our results show that our retrieval approaches match the dot product search on strong encoder embeddings while remaining robust when kNN degrades: with GPT-2 vectors, dot product performance collapses whereas our approaches still retrieve relevant results. Overall, our experiments suggest that Cobweb provides competitive effectiveness, improved robustness to embedding quality, scalability, and interpretable retrieval via hierarchical prototypes.
Authors' comments: 20 pages, 7 tables, 4 figures
Gennian Ge, Hao Wang, Zixiang Xu, Yijun Zhang
The problem of PIR in graph-based replication systems has received
significant attention in recent years. A systematic study was conducted by
Sadeh, Gu, and Tamo, where each file is replicated across two servers and the
storage topology is modeled by a graph. The PIR capacity of a graph $G$,
denoted by $\mathcal{C}(G)$, is defined as the supremum of retrieval rates
achievable by schemes that preserve user privacy, with the rate measured as the
ratio between the file size and the total number of bits downloaded. This paper
makes the following key contributions.
(1) The complete graph $K_N$ has emerged as a central benchmark in the study
of PIR over graphs. The asymptotic gap between the upper and lower bounds for
$\mathcal{C}(K_N)$ was previously 2 and was only recently reduced to $5/3$. We
shrink this gap to $1.0444$, bringing it close to resolution. More precisely,
(i) Sadeh, Gu, and Tamo proved that $\mathcal{C}(K_N)\le 2/(N+1)$ and
conjectured this bound to be tight. We refute this conjecture by establishing
the strictly stronger bound $\mathcal{C}(K_N) \le \frac{1.3922}{N}.$ We also
improve the upper bound for the balanced complete bipartite graph
$\mathcal{C}(K_{N/2,N/2})$. (ii) The first lower bound on $\mathcal{C}(K_N)$
was $(1+o(1))/N$, which was recently sharpened to $(6/5+o(1))/N$. We provide
explicit, systematic constructions that further improve this bound, proving
$\mathcal{C}(K_N)\ge(4/3-o(1))/N,$ which in particular implies $\mathcal{C}(G)
\ge (4/3-o(1))/|G|$ for every graph $G$.
(2) We establish a conceptual bridge between deterministic and probabilistic
PIR schemes on graphs. This connection has significant implications for
reducing the required subpacketization in practical implementations and is of
independent interest. We also design a general probabilistic PIR scheme that
performs particularly well on sparse graphs.
Authors' comments: 72 pages
Kayla Farivar
Information retrieval systems have progressed notably from lexical techniques such as BM25 and TF-IDF to modern semantic retrievers. This survey provides a brief overview of the BM25 baseline, then discusses the architecture of modern state-of-the-art semantic retrievers. Advancing from BERT, we introduce dense bi-encoders (DPR), late-interaction models (ColBERT), and neural sparse retrieval (SPLADE). Finally, we examine MonoT5, a cross-encoder model. We conclude with common evaluation tactics, pressing challenges, and propositions for future directions.
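To make the lexical baseline concrete, the following is a minimal sketch of Okapi BM25 scoring over pre-tokenized documents. It uses the commonly seen defaults k1 = 1.2 and b = 0.75 and the non-negative IDF variant ln(1 + (N - n + 0.5) / (n + 0.5)). These choices, the function name `bm25_scores`, and the toy corpus are assumptions for illustration and are not taken from the survey.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (hypothetical), already lowercased and split into tokens.
docs = [
    ["dense", "retrieval", "with", "bi", "encoders"],
    ["sparse", "lexical", "retrieval"],
    ["image", "classification"],
]
scores = bm25_scores(["dense", "retrieval"], docs)
print(scores)
```

The first document matches both query terms and scores highest, the second matches only the more common term, and the third matches nothing and scores zero. Semantic retrievers such as those surveyed replace this exact term matching with learned representations.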