H. R. Tizhoosh, Shujin Zhu, Hanson Lo, Varun Chaudhari, Tahmid Mehdi
Content-based medical image retrieval can support diagnostic decisions by
clinical experts. Examining similar images may provide clues to the expert to
remove uncertainties in his/her final diagnosis. Beyond conventional feature
descriptors, binary features in different ways have been recently proposed to
encode the image content. A recent proposal is "Radon barcodes" that employ
binarized Radon projections to tag/annotate medical images with content-based
binary vectors, called barcodes. In this paper, MinMax Radon barcodes are
introduced which are superior to "local thresholding" scheme suggested in the
literature. Using IRMA dataset with 14,410 x-ray images from 193 different
classes, the advantage of using MinMax Radon barcodes over \emph{thresholded}
Radon barcodes are demonstrated. The retrieval error for direct search drops by
more than 15\%. As well, SURF, as a well-established non-binary approach, and
BRISK, as a recent binary method are examined to compare their results with
MinMax Radon barcodes when retrieving images from IRMA dataset. The results
demonstrate that MinMax Radon barcodes are faster and more accurate when
applied on IRMA images.
Authors' comments: To appear in proceedings of the 12th International Symposium on
Visual Computing, December 12-14, 2016, Las Vegas, Nevada, USA
Yiwei Zhang, Xin Wang, Hengjia Wei, Gennian Ge
Given a database, the private information retrieval (PIR) protocol allows a user to make queries to several servers and retrieve a certain item of the database via the feedbacks, without revealing the privacy of the specific item to any single server. Classical models of PIR protocols require that each server stores a whole copy of the database. Recently new PIR models are proposed with coding techniques arising from distributed storage system. In these new models each server only stores a fraction $1/s$ of the whole database, where $s>1$ is a given rational number. PIR array codes are recently proposed by Fazeli, Vardy and Yaakobi to characterize the new models. Consider a PIR array code with $m$ servers and the $k$-PIR property (which indicates that these $m$ servers may emulate any efficient $k$-PIR protocol). The central problem is to design PIR array codes with optimal rate $k/m$. Our contribution to this problem is three-fold. First, for the case $1<s\le 2$, although PIR array codes with optimal rate have been constructed recently by Blackburn and Etzion, the number of servers in their construction is impractically large. We determine the minimum number of servers admitting the existence of a PIR array code with optimal rate for a certain range of parameters. Second, for the case $s>2$, we derive a new upper bound on the rate of a PIR array code. Finally, for the case $s>2$, we analyze a new construction by Blackburn and Etzion and show that its rate is better than all the other existing constructions.
Mina Nouredanesh, H. R. Tizhoosh, Ershad Banijamali, James Tung
In recent years, with the explosion of digital images on the Web,
content-based retrieval has emerged as a significant research area. Shapes,
textures, edges and segments may play a key role in describing the content of
an image. Radon and Gabor transforms are both powerful techniques that have
been widely studied to extract shape-texture-based information. The combined
Radon-Gabor features may be more robust against scale/rotation variations,
presence of noise, and illumination changes. The objective of this paper is to
harness the potentials of both Gabor and Radon transforms in order to introduce
expressive binary features, called barcodes, for image annotation/tagging
tasks. We propose two different techniques: Gabor-of-Radon-Image Barcodes
(GRIBCs), and Guided-Radon-of-Gabor Barcodes (GRGBCs). For validation, we
employ the IRMA x-ray dataset with 193 classes, containing 12,677 training
images and 1,733 test images. A total error score as low as 322 and 330 were
achieved for GRGBCs and GRIBCs, respectively. This corresponds to $\approx
81\%$ retrieval accuracy for the first hit.
Authors' comments: To appear in proceedings of the 23rd International Conference on
Pattern Recognition (ICPR 2016), Cancun, Mexico, December 2016
Tianyu Qiu, Daniel P. Palomar
In the undersampled phase retrieval problem, the goal is to recover an $N$-dimensional complex signal $\mathbf{x}$ from only $M<N$ noisy intensity measurements without phase information. This problem has drawn a lot of attention to reduce the number of required measurements since a recent theory established that $M\approx4N$ intensity measurements are necessary and sufficient to recover a generic signal $\mathbf{x}$. In this paper, we propose to exploit the sparsity in the original signal and develop low-complexity algorithms with superior performance based on the majorization-minimization (MM) framework. The proposed algorithms are preferred to existing benchmark methods since at each iteration a simple surrogate problem is solved with a closed-form solution that monotonically decreases the original objective function. Experimental results validate that our algorithms outperform existing up-to-date methods in terms of recovery probability and accuracy, under the same settings.
Rima Alaifari, Ingrid Daubechies, Philipp Grohs, Rujie Yin
The problem of phase retrieval is to determine a signal $f\in \mathcal{H}$, with $\mathcal{H}$ a Hilbert space, from intensity measurements $|F(\omega)|$, where $F(\omega):=\langle f , \varphi_\omega\rangle$ are measurements of $f$ with respect to a measurement system $(\varphi_\omega)_{\omega\in \Omega}\subset \mathcal{H}$. Although phase retrieval is always stable in the finite dimensional setting whenever it is possible (i.e. injectivity implies stability for the inverse problem), the situation is drastically different if $\mathcal{H}$ is infinite-dimensional: in that case phase retrieval is never uniformly stable [8, 4]; moreover the stability deteriorates severely in the dimension of the problem [8]. On the other hand, all empirically observed instabilities are of a certain type: they occur whenever the function $|F|$ of intensity measurements is concentrated on disjoint sets $D_j\subset \Omega$, i.e., when $F= \sum_{j=1}^k F_j$ where each $F_j$ is concentrated on $D_j$ (and $k \geq 2$). Motivated by these considerations we propose a new paradigm for stable phase retrieval by considering the problem of reconstructing $F$ up to a phase factor that is not global, but that can be different for each of the subsets $D_j$, i.e., recovering $F$ up to the equivalence $$ F \sim \sum_{j=1}^k e^{i \alpha_j} F_j.$$ We present concrete applications (for example in audio processing) where this new notion of stability is natural and meaningful and show that in this setting stable phase retrieval can actually be achieved, for instance if the measurement system is a Gabor frame or a frame of Cauchy wavelets.
Christophe Van Gysel, Maarten de Rijke, Marcel Worring
We introduce an unsupervised discriminative model for the task of retrieving
experts in online document collections. We exclusively employ textual evidence
and avoid explicit feature engineering by learning distributed word
representations in an unsupervised way. We compare our model to
state-of-the-art unsupervised statistical vector space and probabilistic
generative approaches. Our proposed log-linear model achieves the retrieval
performance levels of state-of-the-art document-centric methods with the low
inference cost of so-called profile-centric approaches. It yields a
statistically significant improved ranking over vector space and generative
models in most cases, matching the performance of supervised methods on various
benchmarks. That is, by using solely text we can do as well as methods that
work with external evidence and/or relevance feedback. A contrastive analysis
of rankings produced by discriminative and generative approaches shows that
they have complementary strengths due to the ability of the unsupervised
discriminative model to perform semantic matching.
Authors' comments: WWW2016, Proceedings of the 25th International Conference on World
Wide Web. 2016
Zhangjie Cao, Mingsheng Long, Qiang Yang
Hashing has been widely applied to large-scale multimedia retrieval due to the storage and retrieval efficiency. Cross-modal hashing enables efficient retrieval from database of one modality in response to a query of another modality. Existing work on cross-modal hashing assumes heterogeneous relationship across modalities for hash function learning. In this paper, we relax the strong assumption by only requiring such heterogeneous relationship in an auxiliary dataset different from the query/database domain. We craft a hybrid deep architecture to simultaneously learn the cross-modal correlation from the auxiliary dataset, and align the dataset distributions between the auxiliary dataset and the query/database domain, which generates transitive hash codes for heterogeneous multimedia retrieval. Extensive experiments exhibit that the proposed approach yields state of the art multimedia retrieval performance on public datasets, i.e. NUS-WIDE, ImageNet-YahooQA.
Tomoya Murase, Kanji Tanaka
Change detection, or anomaly detection, from street-view images acquired by
an autonomous robot at multiple different times, is a major problem in robotic
mapping and autonomous driving. Formulation as an image comparison task, which
operates on a given pair of query and reference images is common to many
existing approaches to this problem. Unfortunately, providing relevant
reference images is not straightforward. In this paper, we propose a novel
formulation for change detection, termed compressive change retrieval, which
can operate on a query image and similar reference images retrieved from the
web. Compared to previous formulations, there are two sources of difficulty.
First, the retrieved reference images may frequently contain non-relevant
reference images, because even state-of-the-art place-recognition techniques
suffer from retrieval noise. Second, image comparison needs to be conducted in
a compressed domain to minimize the storage cost of large collections of
street-view images. To address the above issues, we also present a practical
change detection algorithm that uses compressed bag-of-words (BoW) image
representation as a scalable solution. The results of experiments conducted on
a practical change detection task, "moving object detection (MOD)," using the
publicly available Malaga dataset validate the effectiveness of the proposed
approach.
Authors' comments: 6 pages, 6 figures, Draft of a paper submitted to an International
Conference
Giuseppe Amato, Fabrizio Falchi, Lucia Vadicamo
Content-Based Image Retrieval based on local features is computationally expensive because of the complexity of both extraction and matching of local feature. On one hand, the cost for extracting, representing, and comparing local visual descriptors has been dramatically reduced by recently proposed binary local features. On the other hand, aggregation techniques provide a meaningful summarization of all the extracted feature of an image into a single descriptor, allowing us to speed up and scale up the image search. Only a few works have recently mixed together these two research directions, defining aggregation methods for binary local features, in order to leverage on the advantage of both approaches. In this paper, we report an extensive comparison among state-of-the-art aggregation methods applied to binary features. Then, we mathematically formalize the application of Fisher Kernels to Bernoulli Mixture Models. Finally, we investigate the combination of the aggregated binary features with the emerging Convolutional Neural Network (CNN) features. Our results show that aggregation methods on binary features are effective and represent a worthwhile alternative to the direct matching. Moreover, the combination of the CNN with the Fisher Vector (FV) built upon binary features allowed us to obtain a relative improvement over the CNN results that is in line with that recently obtained using the combination of the CNN with the FV built upon SIFTs. The advantage of using the FV built upon binary features is that the extraction process of binary features is about two order of magnitude faster than SIFTs.
Tamir Bendory, Yonina C. Eldar, Nicolas Boumal
The problem of recovering a one-dimensional signal from its Fourier transform magnitude, called Fourier phase retrieval, is ill-posed in most cases. We consider the closely-related problem of recovering a signal from its phaseless short-time Fourier transform (STFT) measurements. This problem arises naturally in several applications, such as ultra-short laser pulse characterization and ptychography. The redundancy offered by the STFT enables unique recovery under mild conditions. We show that in some cases the unique solution can be obtained by the principal eigenvector of a matrix, constructed as the solution of a simple least-squares problem. When these conditions are not met, we suggest using the principal eigenvector of this matrix to initialize non-convex local optimization algorithms and propose two such methods. The first is based on minimizing the empirical risk loss function, while the second maximizes a quadratic function on the manifold of phases. We prove that under appropriate conditions, the proposed initialization is close to the underlying signal. We then analyze the geometry of the empirical risk loss function and show numerically that both gradient algorithms converge to the underlying signal even with small redundancy in the measurements. In addition, the algorithms are robust to noise.
Kaiye Wang, Qiyue Yin, Wei Wang, Shu Wu, Liang Wang
In recent years, cross-modal retrieval has drawn much attention due to the
rapid growth of multimodal data. It takes one type of data as the query to
retrieve relevant data of another type. For example, a user can use a text to
retrieve relevant pictures or videos. Since the query and its retrieved results
can be of different modalities, how to measure the content similarity between
different modalities of data remains a challenge. Various methods have been
proposed to deal with such a problem. In this paper, we first review a number
of representative methods for cross-modal retrieval and classify them into two
main groups: 1) real-valued representation learning, and 2) binary
representation learning. Real-valued representation learning methods aim to
learn real-valued common representations for different modalities of data. To
speed up the cross-modal retrieval, a number of binary representation learning
methods are proposed to map different modalities of data into a common Hamming
space. Then, we introduce several multimodal datasets in the community, and
show the experimental results on two commonly used multimodal datasets. The
comparison reveals the characteristic of different kinds of cross-modal
retrieval methods, which is expected to benefit both practical applications and
future research. Finally, we discuss open problems and future research
directions.
Authors' comments: 20 pages, 11 figures, 9 tables
Leonardo A. Duarte, Otávio A. B. Penatti, Jurandy Almeida
In this paper, we present the Bag-of-Attributes (BoA) model for video representation aiming at video event retrieval. The BoA model is based on a semantic feature space for representing videos, resulting in high-level video feature vectors. For creating a semantic space, i.e., the attribute space, we can train a classifier using a labeled image dataset, obtaining a classification model that can be understood as a high-level codebook. This model is used to map low-level frame vectors into high-level vectors (e.g., classifier probability scores). Then, we apply pooling operations to the frame vectors to create the final bag of attributes for the video. In the BoA representation, each dimension corresponds to one category (or attribute) of the semantic space. Other interesting properties are: compactness, flexibility regarding the classifier, and ability to encode multiple semantic concepts in a single video representation. Our experiments considered the semantic space created by state-of-the-art convolutional neural networks pre-trained on 1000 object categories of ImageNet. Such deep neural networks were used to classify each video frame and then different coding strategies were used to encode the probability distribution from the softmax layer into a frame vector. Next, different pooling strategies were used to combine frame vectors in the BoA representation for a video. Results using BoA were comparable or superior to the baselines in the task of video event retrieval using the EVVE dataset, with the advantage of providing a much more compact representation.
Lorenz Kuhn, Carsten Eickhoff
In this paper, we reflect on ways to improve the quality of bio-medical information retrieval by drawing implicit negative feedback from negated information in noisy natural language search queries. We begin by studying the extent to which negations occur in clinical texts and quantify their detrimental effect on retrieval performance. Subsequently, we present a number of query reformulation and ranking approaches that remedy these shortcomings by resolving natural language negations. Our experimental results are based on data collected in the course of the TREC Clinical Decision Support Track and show consistent improvements compared to state-of-the-art methods. Using our novel algorithms, we are able to reduce the negative impact of negations on early precision by up to 65%.
Gaipeng Kong, Le Dong, Wenpu Dong, Liang Zheng, Qi Tian
This paper addresses the problem of large-scale image retrieval. We propose a two-layer fusion method which takes advantage of global and local cues and ranks database images from coarse to fine (C2F). Departing from the previous methods fusing multiple image descriptors simultaneously, C2F is featured by a layered procedure composed by filtering and refining. In particular, C2F consists of three components. 1) Distractor filtering. With holistic representations, noise images are filtered out from the database, so the number of candidate images to be used for comparison with the query can be greatly reduced. 2) Adaptive weighting. For a certain query, the similarity of candidate images can be estimated by holistic similarity scores in complementary to the local ones. 3) Candidate refining. Accurate retrieval is conducted via local features, combining the pre-computed adaptive weights. Experiments are presented on two benchmarks, \emph{i.e.,} Holidays and Ukbench datasets. We show that our method outperforms recent fusion methods in terms of storage consumption and computation complexity, and that the accuracy is competitive to the state-of-the-arts.
Hua Sun, Syed A. Jafar
Private information retrieval (PIR) is the problem of retrieving as efficiently as possible, one out of $K$ messages from $N$ non-communicating replicated databases (each holds all $K$ messages) while keeping the identity of the desired message index a secret from each individual database. Symmetric PIR (SPIR) is a generalization of PIR to include the requirement that beyond the desired message, the user learns nothing about the other $K-1$ messages. The information theoretic capacity of SPIR (equivalently, the reciprocal of minimum download cost) is the maximum number of bits of desired information that can be privately retrieved per bit of downloaded information. We show that the capacity of SPIR is $1-1/N$ regardless of the number of messages $K$, if the databases have access to common randomness (not available to the user) that is independent of the messages, in the amount that is at least $1/(N-1)$ bits per desired message bit, and zero otherwise. Extensions to the capacity region of SPIR and the capacity of finite length SPIR are provided.
Nawal Ould-Amer, Philippe Mulhem, Mathias Gery
This paper presents preliminary works on using Word Embedding (word2vec) for query expansion in the context of Personalized Information Retrieval. Traditionally, word embeddings are learned on a general corpus, like Wikipedia. In this work we try to personalize the word embeddings learning, by achieving the learning on the user's profile. The word embeddings are then in the same context than the user interests. Our proposal is evaluated on the CLEF Social Book Search 2016 collection. The results obtained show that some efforts should be made in the way to apply Word Embedding in the context of Personalized Information Retrieval.
Petri Luukkonen, Markus Koskela, Patrik Floréen
We describe a method for proactive information retrieval targeted at
retrieving relevant information during a writing task. In our method, the
current task and the needs of the user are estimated, and the potential next
steps are unobtrusively predicted based on the user's past actions. We focus on
the task of writing, in which the user is coalescing previously collected
information into a text. Our proactive system automatically recommends the user
relevant background information. The proposed system incorporates text input
prediction using a long short-term memory (LSTM) network. We present
simulations, which show that the system is able to reach higher precision
values in an exploratory search setting compared to both a baseline and a
comparison system.
Authors' comments: Neu-IR '16 SIGIR Workshop on Neural Information Retrieval, July 21,
2016, Pisa, Italy
B. Piwowarski
Most Information Retrieval models compute the relevance score of a document for a given query by summing term weights specific to a document or a query. Heuristic approaches, like TF-IDF, or probabilistic models, like BM25, are used to specify how a term weight is computed. In this paper, we propose to leverage learning-to-rank principles to learn how to compute a term weight for a given document based on the term occurrence pattern.
Ritesh Kolte, Ayfer Özgür
In the phase retrieval problem, an unknown vector is to be recovered given quadratic measurements. This problem has received considerable attention in recent times. In this paper, we present an algorithm to solve a nonconvex formulation of the phase retrieval problem, that we call $\textit{Incremental Truncated Wirtinger Flow}$. Given random Gaussian sensing vectors, we prove that it converges linearly to the solution, with an optimal sample complexity. We also provide stability guarantees of the algorithm under noisy measurements. Performance and comparisons with existing algorithms are illustrated via numerical experiments on simulated and real data, with both random and structured sensing vectors.
Travis Gagie, Aleksi Hartikainen, Kalle Karhu, Juha Kärkkäinen, Gonzalo Navarro, Simon J. Puglisi, Jouni Sirén
Most of the fastest-growing string collections today are repetitive, that is,
most of the constituent documents are similar to many others. As these
collections keep growing, a key approach to handling them is to exploit their
repetitiveness, which can reduce their space usage by orders of magnitude. We
study the problem of indexing repetitive string collections in order to perform
efficient document retrieval operations on them. Document retrieval problems
are routinely solved by search engines on large natural language collections,
but the techniques are less developed on generic string collections. The case
of repetitive string collections is even less understood, and there are very
few existing solutions. We develop two novel ideas, {\em interleaved LCPs} and
{\em precomputed document lists}, that yield highly compressed indexes solving
the problem of document listing (find all the documents where a string
appears), top-$k$ document retrieval (find the $k$ documents where a string
appears most often), and document counting (count the number of documents where
a string appears). We also show that a classical data structure supporting the
latter query becomes highly compressible on repetitive data. Finally, we show
how the tools we developed can be combined to solve ranked conjunctive and
disjunctive multi-term queries under the simple tf-idf model of relevance. We
thoroughly evaluate the resulting techniques in various real-life
repetitiveness scenarios, and recommend the best choices for each case.
Authors' comments: This research has received funding from the European Union's Horizon
2020 research and innovation programme under the Marie Sk{\l}odowska-Curie
Actions H2020-MSCA-RISE-2015 BIRDS GA No. 690941. Accepted to the Information
Retrieval Journal