Giuseppe Amato, Fabrizio Falchi, Lucia Vadicamo
Content-Based Image Retrieval based on local features is computationally expensive because of the complexity of both extraction and matching of local feature. On one hand, the cost for extracting, representing, and comparing local visual descriptors has been dramatically reduced by recently proposed binary local features. On the other hand, aggregation techniques provide a meaningful summarization of all the extracted feature of an image into a single descriptor, allowing us to speed up and scale up the image search. Only a few works have recently mixed together these two research directions, defining aggregation methods for binary local features, in order to leverage on the advantage of both approaches. In this paper, we report an extensive comparison among state-of-the-art aggregation methods applied to binary features. Then, we mathematically formalize the application of Fisher Kernels to Bernoulli Mixture Models. Finally, we investigate the combination of the aggregated binary features with the emerging Convolutional Neural Network (CNN) features. Our results show that aggregation methods on binary features are effective and represent a worthwhile alternative to the direct matching. Moreover, the combination of the CNN with the Fisher Vector (FV) built upon binary features allowed us to obtain a relative improvement over the CNN results that is in line with that recently obtained using the combination of the CNN with the FV built upon SIFTs. The advantage of using the FV built upon binary features is that the extraction process of binary features is about two order of magnitude faster than SIFTs.
Tamir Bendory, Yonina C. Eldar, Nicolas Boumal
The problem of recovering a one-dimensional signal from its Fourier transform magnitude, called Fourier phase retrieval, is ill-posed in most cases. We consider the closely-related problem of recovering a signal from its phaseless short-time Fourier transform (STFT) measurements. This problem arises naturally in several applications, such as ultra-short laser pulse characterization and ptychography. The redundancy offered by the STFT enables unique recovery under mild conditions. We show that in some cases the unique solution can be obtained by the principal eigenvector of a matrix, constructed as the solution of a simple least-squares problem. When these conditions are not met, we suggest using the principal eigenvector of this matrix to initialize non-convex local optimization algorithms and propose two such methods. The first is based on minimizing the empirical risk loss function, while the second maximizes a quadratic function on the manifold of phases. We prove that under appropriate conditions, the proposed initialization is close to the underlying signal. We then analyze the geometry of the empirical risk loss function and show numerically that both gradient algorithms converge to the underlying signal even with small redundancy in the measurements. In addition, the algorithms are robust to noise.
Kaiye Wang, Qiyue Yin, Wei Wang, Shu Wu, Liang Wang
In recent years, cross-modal retrieval has drawn much attention due to the
rapid growth of multimodal data. It takes one type of data as the query to
retrieve relevant data of another type. For example, a user can use a text to
retrieve relevant pictures or videos. Since the query and its retrieved results
can be of different modalities, how to measure the content similarity between
different modalities of data remains a challenge. Various methods have been
proposed to deal with such a problem. In this paper, we first review a number
of representative methods for cross-modal retrieval and classify them into two
main groups: 1) real-valued representation learning, and 2) binary
representation learning. Real-valued representation learning methods aim to
learn real-valued common representations for different modalities of data. To
speed up the cross-modal retrieval, a number of binary representation learning
methods are proposed to map different modalities of data into a common Hamming
space. Then, we introduce several multimodal datasets in the community, and
show the experimental results on two commonly used multimodal datasets. The
comparison reveals the characteristic of different kinds of cross-modal
retrieval methods, which is expected to benefit both practical applications and
future research. Finally, we discuss open problems and future research
directions.
Authors' comments: 20 pages, 11 figures, 9 tables
Leonardo A. Duarte, Otávio A. B. Penatti, Jurandy Almeida
In this paper, we present the Bag-of-Attributes (BoA) model for video representation aiming at video event retrieval. The BoA model is based on a semantic feature space for representing videos, resulting in high-level video feature vectors. For creating a semantic space, i.e., the attribute space, we can train a classifier using a labeled image dataset, obtaining a classification model that can be understood as a high-level codebook. This model is used to map low-level frame vectors into high-level vectors (e.g., classifier probability scores). Then, we apply pooling operations to the frame vectors to create the final bag of attributes for the video. In the BoA representation, each dimension corresponds to one category (or attribute) of the semantic space. Other interesting properties are: compactness, flexibility regarding the classifier, and ability to encode multiple semantic concepts in a single video representation. Our experiments considered the semantic space created by state-of-the-art convolutional neural networks pre-trained on 1000 object categories of ImageNet. Such deep neural networks were used to classify each video frame and then different coding strategies were used to encode the probability distribution from the softmax layer into a frame vector. Next, different pooling strategies were used to combine frame vectors in the BoA representation for a video. Results using BoA were comparable or superior to the baselines in the task of video event retrieval using the EVVE dataset, with the advantage of providing a much more compact representation.
Lorenz Kuhn, Carsten Eickhoff
In this paper, we reflect on ways to improve the quality of bio-medical information retrieval by drawing implicit negative feedback from negated information in noisy natural language search queries. We begin by studying the extent to which negations occur in clinical texts and quantify their detrimental effect on retrieval performance. Subsequently, we present a number of query reformulation and ranking approaches that remedy these shortcomings by resolving natural language negations. Our experimental results are based on data collected in the course of the TREC Clinical Decision Support Track and show consistent improvements compared to state-of-the-art methods. Using our novel algorithms, we are able to reduce the negative impact of negations on early precision by up to 65%.
Gaipeng Kong, Le Dong, Wenpu Dong, Liang Zheng, Qi Tian
This paper addresses the problem of large-scale image retrieval. We propose a two-layer fusion method which takes advantage of global and local cues and ranks database images from coarse to fine (C2F). Departing from the previous methods fusing multiple image descriptors simultaneously, C2F is featured by a layered procedure composed by filtering and refining. In particular, C2F consists of three components. 1) Distractor filtering. With holistic representations, noise images are filtered out from the database, so the number of candidate images to be used for comparison with the query can be greatly reduced. 2) Adaptive weighting. For a certain query, the similarity of candidate images can be estimated by holistic similarity scores in complementary to the local ones. 3) Candidate refining. Accurate retrieval is conducted via local features, combining the pre-computed adaptive weights. Experiments are presented on two benchmarks, \emph{i.e.,} Holidays and Ukbench datasets. We show that our method outperforms recent fusion methods in terms of storage consumption and computation complexity, and that the accuracy is competitive to the state-of-the-arts.
Hua Sun, Syed A. Jafar
Private information retrieval (PIR) is the problem of retrieving as efficiently as possible, one out of $K$ messages from $N$ non-communicating replicated databases (each holds all $K$ messages) while keeping the identity of the desired message index a secret from each individual database. Symmetric PIR (SPIR) is a generalization of PIR to include the requirement that beyond the desired message, the user learns nothing about the other $K-1$ messages. The information theoretic capacity of SPIR (equivalently, the reciprocal of minimum download cost) is the maximum number of bits of desired information that can be privately retrieved per bit of downloaded information. We show that the capacity of SPIR is $1-1/N$ regardless of the number of messages $K$, if the databases have access to common randomness (not available to the user) that is independent of the messages, in the amount that is at least $1/(N-1)$ bits per desired message bit, and zero otherwise. Extensions to the capacity region of SPIR and the capacity of finite length SPIR are provided.
Nawal Ould-Amer, Philippe Mulhem, Mathias Gery
This paper presents preliminary works on using Word Embedding (word2vec) for query expansion in the context of Personalized Information Retrieval. Traditionally, word embeddings are learned on a general corpus, like Wikipedia. In this work we try to personalize the word embeddings learning, by achieving the learning on the user's profile. The word embeddings are then in the same context than the user interests. Our proposal is evaluated on the CLEF Social Book Search 2016 collection. The results obtained show that some efforts should be made in the way to apply Word Embedding in the context of Personalized Information Retrieval.
Petri Luukkonen, Markus Koskela, Patrik Floréen
We describe a method for proactive information retrieval targeted at
retrieving relevant information during a writing task. In our method, the
current task and the needs of the user are estimated, and the potential next
steps are unobtrusively predicted based on the user's past actions. We focus on
the task of writing, in which the user is coalescing previously collected
information into a text. Our proactive system automatically recommends the user
relevant background information. The proposed system incorporates text input
prediction using a long short-term memory (LSTM) network. We present
simulations, which show that the system is able to reach higher precision
values in an exploratory search setting compared to both a baseline and a
comparison system.
Authors' comments: Neu-IR '16 SIGIR Workshop on Neural Information Retrieval, July 21,
2016, Pisa, Italy
B. Piwowarski
Most Information Retrieval models compute the relevance score of a document for a given query by summing term weights specific to a document or a query. Heuristic approaches, like TF-IDF, or probabilistic models, like BM25, are used to specify how a term weight is computed. In this paper, we propose to leverage learning-to-rank principles to learn how to compute a term weight for a given document based on the term occurrence pattern.
Ritesh Kolte, Ayfer Özgür
In the phase retrieval problem, an unknown vector is to be recovered given quadratic measurements. This problem has received considerable attention in recent times. In this paper, we present an algorithm to solve a nonconvex formulation of the phase retrieval problem, that we call $\textit{Incremental Truncated Wirtinger Flow}$. Given random Gaussian sensing vectors, we prove that it converges linearly to the solution, with an optimal sample complexity. We also provide stability guarantees of the algorithm under noisy measurements. Performance and comparisons with existing algorithms are illustrated via numerical experiments on simulated and real data, with both random and structured sensing vectors.
Travis Gagie, Aleksi Hartikainen, Kalle Karhu, Juha Kärkkäinen, Gonzalo Navarro, Simon J. Puglisi, Jouni Sirén
Most of the fastest-growing string collections today are repetitive, that is,
most of the constituent documents are similar to many others. As these
collections keep growing, a key approach to handling them is to exploit their
repetitiveness, which can reduce their space usage by orders of magnitude. We
study the problem of indexing repetitive string collections in order to perform
efficient document retrieval operations on them. Document retrieval problems
are routinely solved by search engines on large natural language collections,
but the techniques are less developed on generic string collections. The case
of repetitive string collections is even less understood, and there are very
few existing solutions. We develop two novel ideas, {\em interleaved LCPs} and
{\em precomputed document lists}, that yield highly compressed indexes solving
the problem of document listing (find all the documents where a string
appears), top-$k$ document retrieval (find the $k$ documents where a string
appears most often), and document counting (count the number of documents where
a string appears). We also show that a classical data structure supporting the
latter query becomes highly compressible on repetitive data. Finally, we show
how the tools we developed can be combined to solve ranked conjunctive and
disjunctive multi-term queries under the simple tf-idf model of relevance. We
thoroughly evaluate the resulting techniques in various real-life
repetitiveness scenarios, and recommend the best choices for each case.
Authors' comments: This research has received funding from the European Union's Horizon
2020 research and innovation programme under the Marie Sk{\l}odowska-Curie
Actions H2020-MSCA-RISE-2015 BIRDS GA No. 690941. Accepted to the Information
Retrieval Journal
Dani Kogan, Yonina C. Eldar, Dan Oron
The recovery of a signal from the magnitude of its Fourier transform, also known as phase retrieval, is of fundamental importance in many scientific fields. It is well known that due to the loss of Fourier phase the problem in 1D is ill-posed. Without further constraints, there is no unique solution to the problem. In contrast, uniqueness up to trivial ambiguities very often exists in higher dimensions, with mild constraints on the input. In this paper we focus on the 2D phase retrieval problem and provide insight into this uniqueness property by exploring the connection between the 2D and 1D formulations. In particular, we show that 2D phase retrieval can be cast as a 1D problem with additional constraints, which limit the solution space. We then prove that only one additional constraint is sufficient to reduce the many feasible solutions in the 1D setting to a unique solution for almost all signals. These results allow to obtain an analytical approach (with combinatorial complexity) to solve the 2D phase retrieval problem when it is unique.
Mina Nouredanesh, Hamid R. Tizhoosh, Ershad Banijamali
In recent years, advances in medical imaging have led to the emergence of
massive databases, containing images from a diverse range of modalities. This
has significantly heightened the need for automated annotation of the images on
one side, and fast and memory-efficient content-based image retrieval systems
on the other side. Binary descriptors have recently gained more attention as a
potential vehicle to achieve these goals. One of the recently introduced binary
descriptors for tagging of medical images are Radon barcodes (RBCs) that are
driven from Radon transform via local thresholding. Gabor transform is also a
powerful transform to extract texture-based information. Gabor features have
exhibited robustness against rotation, scale, and also photometric
disturbances, such as illumination changes and image noise in many
applications. This paper introduces Gabor Barcodes (GBCs), as a novel framework
for the image annotation. To find the most discriminative GBC for a given query
image, the effects of employing Gabor filters with different parameters, i.e.,
different sets of scales and orientations, are investigated, resulting in
different barcode lengths and retrieval performances. The proposed method has
been evaluated on the IRMA dataset with 193 classes comprising of 12,677 x-ray
images for indexing, and 1,733 x-rays images for testing. A total error score
as low as $351$ ($\approx 80\%$ accuracy for the first hit) was achieved.
Authors' comments: To appear in proceedings of The 2016 IEEE International Conference on
Image Processing (ICIP 2016), Sep 25-28, 2016, Phoenix, Arizona, USA
Felix Krahmer, Yi-Kai Liu
In the context of the phase retrieval problem, it is known that certain
natural classes of measurements, such as Fourier measurements and random
Bernoulli measurements, do not lead to the unique reconstruction of all
possible signals, even in combination with certain practically feasible random
masks. To avoid this difficulty, the analysis is often restricted to
measurement ensembles (or masks) that satisfy a small-ball probability
condition, in order to ensure that the reconstruction is unique.
This paper shows a complementary result: for random Bernoulli measurements,
there is still a large class of signals that can be reconstructed uniquely,
namely those signals that are "non-peaky." In fact, this result is much more
general: it holds for random measurements sampled from any subgaussian
distribution D, without any small-ball conditions. This is demonstrated in two
ways: first, a proof of stability and uniqueness, and second, a uniform
recovery guarantee for the PhaseLift algorithm. In all of these cases, the
number of measurements m approaches the information-theoretic lower bound.
Finally, for random Bernoulli measurements with erasures, it is shown that
PhaseLift achieves uniform recovery of all signals (including peaky ones).
Authors' comments: 15 pages; v3: to appear in IEEE Trans. Info. Theory; v2: minor
revisions and clarifications; presented in part at the 2015 SampTA
Conference, see http://doi.org/10.1109/SAMPTA.2015.7148923 and
http://doi.org/10.1109/SAMPTA.2015.7148966
Raphael R. Toledo, George Danezis, Ian Goldberg
Private Information Retrieval (PIR), despite being well studied, is computationally costly and arduous to scale. We explore lower-cost relaxations of information-theoretic PIR, based on dummy queries, sparse vectors, and compositions with an anonymity system. We prove the security of each scheme using a flexible differentially private definition for private queries that can capture notions of imperfect privacy. We show that basic schemes are weak, but some of them can be made arbitrarily safe by composing them with large anonymity systems.
Qingqun Ning, Jianke Zhu, Zhiyuan Zhong, Steven C. H. Hoi, Chun Chen
Fast Approximate Nearest Neighbor (ANN) search technique for high-dimensional
feature indexing and retrieval is the crux of large-scale image retrieval. A
recent promising technique is Product Quantization, which attempts to index
high-dimensional image features by decomposing the feature space into a
Cartesian product of low dimensional subspaces and quantizing each of them
separately. Despite the promising results reported, their quantization approach
follows the typical hard assignment of traditional quantization methods, which
may result in large quantization errors and thus inferior search performance.
Unlike the existing approaches, in this paper, we propose a novel approach
called Sparse Product Quantization (SPQ) to encoding the high-dimensional
feature vectors into sparse representation. We optimize the sparse
representations of the feature vectors by minimizing their quantization errors,
making the resulting representation is essentially close to the original data
in practice. Experiments show that the proposed SPQ technique is not only able
to compress data, but also an effective encoding technique. We obtain
state-of-the-art results for ANN search on four public image datasets and the
promising results of content-based image retrieval further validate the
efficacy of our proposed method.
Authors' comments: 12 pages
Tanaka Kanji
Change detection, i.e., anomaly detection from local maps built by a mobile
robot at multiple different times, is a challenging problem to solve in
practice. Most previous work either cannot be applied to scenarios where the
size of the map collection is large, or simply assumed that the robot
self-location is globally known. In this paper, we tackle the problem of
simultaneous self-localization and change detection, by reformulating the
problem as a map retrieval problem, and propose a local map descriptor with a
compressed bag-of-words (BoW) structure as a scalable solution. We make the
following contributions. (1) To enable a direct comparison of the spatial
layout of visual features between different local maps, the origin of the local
map coordinate (termed "viewpoint") is planned by scene parsing and determined
by our "viewpoint planner" to be invariant against small variations in
self-location and changes, aiming at providing similar viewpoints for similar
scenes (i.e., the relevant map pair). (2) We extend the BoW model to enable the
use of not only the appearance (e.g., polestar) but also the spatial layout
(e.g., spatial pyramid) of visual features with respect to the planned
viewpoint. The key observation is that the planned viewpoint (i.e., the origin
of local map coordinate) acts as a pseudo viewpoint that is usually required by
spatial BoW (e.g., SPM) and also by anomaly detection (e.g., NN-d, LOF). (3)
Experimental results on a challenging "loop-closing" scenario show that the
proposed method outperforms previous BoW methods in self-localization, and
furthermore, that the use of both appearance and pose information in change
detection produces better results than the use of either information alone.
Authors' comments: 8 pages, 7 figures, Draft of a paper submitted to an International
Conference
Hua Sun, Syed A. Jafar
In the private information retrieval (PIR) problem a user wishes to retrieve, as efficiently as possible, one out of $K$ messages from $N$ non-communicating databases (each holds all $K$ messages) while revealing nothing about the identity of the desired message index to any individual database. The information theoretic capacity of PIR is the maximum number of bits of desired information that can be privately retrieved per bit of downloaded information. For $K$ messages and $N$ databases, we show that the PIR capacity is $(1+1/N+1/N^2+\cdots+1/N^{K-1})^{-1}$. A remarkable feature of the capacity achieving scheme is that if we eliminate any subset of messages (by setting the message symbols to zero), the resulting scheme also achieves the PIR capacity for the remaining subset of messages.
Ju Sun, Qing Qu, John Wright
Can we recover a complex signal from its Fourier magnitudes? More generally,
given a set of $m$ measurements, $y_k = |\mathbf a_k^* \mathbf x|$ for $k = 1,
\dots, m$, is it possible to recover $\mathbf x \in \mathbb{C}^n$ (i.e.,
length-$n$ complex vector)? This **generalized phase retrieval** (GPR) problem
is a fundamental task in various disciplines, and has been the subject of much
recent investigation. Natural nonconvex heuristics often work remarkably well
for GPR in practice, but lack clear theoretical explanations. In this paper, we
take a step towards bridging this gap. We prove that when the measurement
vectors $\mathbf a_k$'s are generic (i.i.d. complex Gaussian) and the number of
measurements is large enough ($m \ge C n \log^3 n$), with high probability, a
natural least-squares formulation for GPR has the following benign geometric
structure: (1) there are no spurious local minimizers, and all global
minimizers are equal to the target signal $\mathbf x$, up to a global phase;
and (2) the objective function has a negative curvature around each saddle
point. This structure allows a number of iterative optimization methods to
efficiently find a global minimizer, without special initialization. To
corroborate the claim, we describe and analyze a second-order trust-region
algorithm.
Authors' comments: 61 pages, 5 figures. A short version can be found here
http://sunju.org/docs/PR_G4_16.pdf . Revised according to reviewers' feedback