Shanshan Huang, Yichao Xiong, Ya Zhang, Jia Wang
Hashing has played a pivotal role in large-scale image retrieval. With the development of Convolutional Neural Network (CNN), hashing learning has shown great promise. But existing methods are mostly tuned for classification, which are not optimized for retrieval tasks, especially for instance-level retrieval. In this study, we propose a novel hashing method for large-scale image retrieval. Considering the difficulty in obtaining labeled datasets for image retrieval task in large scale, we propose a novel CNN-based unsupervised hashing method, namely Unsupervised Triplet Hashing (UTH). The unsupervised hashing network is designed under the following three principles: 1) more discriminative representations for image retrieval; 2) minimum quantization loss between the original real-valued feature descriptors and the learned hash codes; 3) maximum information entropy for the learned hash codes. Extensive experiments on CIFAR-10, MNIST and In-shop datasets have shown that UTH outperforms several state-of-the-art unsupervised hashing methods in terms of retrieval accuracy.
Edouard Pauwels, Amir Beck, Yonina C. Eldar, Shoham Sabach
Alternating minimization, or Fienup methods, have a long history in phase retrieval. We provide new insights related to the empirical and theoretical analysis of these algorithms when used with Fourier measurements and combined with convex priors. In particular, we show that Fienup methods can be viewed as performing alternating minimization on a regularized nonconvex least-squares problem with respect to amplitude measurements. We then prove that under mild additional structural assumptions on the prior (semi-algebraicity), the sequence of signal estimates has a smooth convergent behaviour towards a critical point of the nonconvex regularized least-squares objective. Finally, we propose an extension to Fienup techniques, based on a projected gradient descent interpretation and acceleration using inertial terms. We demonstrate experimentally that this modification combined with an $\ell_1$ prior constitutes a competitive approach for sparse phase retrieval.
Y Qian, E Vazquez, B Sengupta
Comparing images to recommend items from an image-inventory is a subject of
continued interest. Added with the scalability of deep-learning architectures
the once `manual' job of hand-crafting features have been largely alleviated,
and images can be compared according to features generated from a deep
convolutional neural network. In this paper, we compare distance metrics (and
divergences) to rank features generated from a neural network, for
content-based image retrieval. Specifically, after modelling individual images
using approximations of mixture models or sparse covariance estimators, we
resort to their information-theoretic and Riemann geometric comparisons. We
show that using approximations of mixture models enable us to compute a
distance measure based on the Wasserstein metric that requires less effort than
other computationally intensive optimal transport plans; finally, an affine
invariant metric is used to compare the optimal transport metric to its Riemann
geometric counterpart -- we conclude that although expensive, retrieval metric
based on Wasserstein geometry is more suitable than information theoretic
comparison of images. In short, we combine GPU scalability in learning deep
feature vectors with statistically efficient metrics that we foresee being
utilised in a commercial setting.
Authors' comments: 5th ICDM Workshop on High Dimensional Data Mining (HDM 2017)
Eduardo X. Miqueles, Nathaly L. Archilha, Marcelo R. Dos Anjos, Harry Westfahl, Elias S. Helou
It was recently shown that the phase retrieval imaging of a sample can be modeled as a simple convolution process. Sometimes, such a convolution depends on physical parameters of the sample which are difficult to estimate a priori. In this case, a blind choice for those parameters usually lead to wrong results, e.g., in posterior image segmentation processing. In this manuscript, we propose a simple connection between phase-retrieval algorithms and optimization strategies, which lead us to ways of numerically determining the physical parameters
Xue Jiang, H. C. So, X. Liu
An outlier-resistance phase retrieval algorithm based on alternating direction method of multipliers (ADMM) is devised in this letter. Instead of the widely used least squares criterion that is only optimal for Gaussian noise environment, we adopt the least absolute deviation criterion to enhance the robustness against outliers. Considering both intensity- and amplitude-based observation models, the framework of ADMM is developed to solve the resulting non-differentiable optimization problems. It is demonstrated that the core subproblem of ADMM is the proximity operator of the L1-norm, which can be computed efficiently by soft-thresholding in each iteration. Simulation results are provided to validate the accuracy and efficiency of the proposed approach compared to the existing schemes.
Ziyang Yuan, Hongxia Wang
Phase retrieval(PR) problem is a kind of ill-condition inverse problem which
is arising in various of applications. Based on the Wirtinger flow(WF) method,
a reweighted Wirtinger flow(RWF) method is proposed to deal with PR problem.
RWF finds the global optimum by solving a series of sub-PR problems with
changing weights. Theoretical analyses illustrate that the RWF has a geometric
convergence from a deliberate initialization when the weights are bounded by 1
and $\frac{10}{9}$. Numerical testing shows RWF has a lower sampling complexity
compared with WF. As an essentially adaptive truncated Wirtinger flow(TWF)
method, RWF performs better than TWF especially when the ratio between sampling
number $m$ and length of signal $n$ is small.
Authors' comments: 21 pages, 11 figures
Sara Botello-Andrade, Peter G. Casazza, Dorsa Ghoreishi, Shani Jose, Janet C. Tremain
Phase retrieval and phaseless reconstruction for Hilbert space frames is a very active area of research. Recently, it was shown that these concepts are equivalent. In this paper, we make a detailed study of a weakening of these concepts to weak phase retrieval and weak phaseless reconstruction. We will give several necessary and/or sufficient conditions for frames to have these weak properties. We will prove three surprising results: (1) Weak phaseless reconstruction is equivalent to phaseless reconstruction. I.e. It never was "weak"; (2) Weak phase retrieval is not equivalent to weak phaseless reconstruction; (3) Weak phase retrieval requires at least $2m-2$ vectors in an m-dimensional Hilbert space. We also gives several examples illustrating the relationship between these concepts.
Paul Hand, Vladislav Voroninski
We consider the problem of phase retrieval from corrupted magnitude observations. In particular we show that a fixed $x_0 \in \mathbb{R}^n$ can be recovered exactly from corrupted magnitude measurements $|\langle a_i, x_0 \rangle | + \eta_i, \quad i =1,2\ldots m$ with high probability for $m = O(n)$, where $a_i \in \mathbb{R}^n$ are i.i.d standard Gaussian and $\eta \in \mathbb{R}^m$ has fixed sparse support and is otherwise arbitrary, by using a version of the PhaseMax algorithm augmented with slack variables subject to a penalty. This linear programming formulation, which we call RobustPhaseMax, operates in the natural parameter space, and our proofs rely on a direct analysis of the optimality conditions using concentration inequalities.
Gang Wang, Liang Zhang, Georgios B. Giannakis, Mehmet Akcakaya, Jie Chen
This paper develops a novel algorithm, termed \emph{SPARse Truncated
Amplitude flow} (SPARTA), to reconstruct a sparse signal from a small number of
magnitude-only measurements. It deals with what is also known as sparse phase
retrieval (PR), which is \emph{NP-hard} in general and emerges in many science
and engineering applications. Upon formulating sparse PR as an amplitude-based
nonconvex optimization task, SPARTA works iteratively in two stages: In stage
one, the support of the underlying sparse signal is recovered using an
analytically well-justified rule, and subsequently, a sparse
orthogonality-promoting initialization is obtained via power iterations
restricted on the support; and, in the second stage, the initialization is
successively refined by means of hard thresholding based gradient-type
iterations. SPARTA is a simple yet effective, scalable, and fast sparse PR
solver. On the theoretical side, for any $n$-dimensional $k$-sparse ($k\ll n$)
signal $\bm{x}$ with minimum (in modulus) nonzero entries on the order of
$(1/\sqrt{k})\|\bm{x}\|_2$, SPARTA recovers the signal exactly (up to a global
unimodular constant) from about $k^2\log n$ random Gaussian measurements with
high probability. Furthermore, SPARTA incurs computational complexity on the
order of $k^2n\log n$ with total runtime proportional to the time required to
read the data, which improves upon the state-of-the-art by at least a factor of
$k$. Finally, SPARTA is robust against additive noise of bounded support.
Extensive numerical tests corroborate markedly improved recovery performance
and speedups of SPARTA relative to existing alternatives.
Authors' comments: 23 pages; 5 figures
Ye Zhang, Md Mustafizur Rahman, Alex Braylan, Brandon Dang, Heng-Lu Chang, Henna Kim, Quinten McNamara, Aaron Angert et al.
A recent "third wave" of Neural Network (NN) approaches now delivers
state-of-the-art performance in many machine learning tasks, spanning speech
recognition, computer vision, and natural language processing. Because these
modern NNs often comprise multiple interconnected layers, this new NN research
is often referred to as deep learning. Stemming from this tide of NN work, a
number of researchers have recently begun to investigate NN approaches to
Information Retrieval (IR). While deep NNs have yet to achieve the same level
of success in IR as seen in other areas, the recent surge of interest and work
in NNs for IR suggest that this state of affairs may be quickly changing. In
this work, we survey the current landscape of Neural IR research, paying
special attention to the use of learned representations of queries and
documents (i.e., neural embeddings). We highlight the successes of neural IR
thus far, catalog obstacles to its wider adoption, and suggest potentially
promising directions for future research.
Authors' comments: 44 pages
Tom Goldstein, Christoph Studer
We consider the recovery of a (real- or complex-valued) signal from magnitude-only measurements, known as phase retrieval. We formulate phase retrieval as a convex optimization problem, which we call PhaseMax. Unlike other convex methods that use semidefinite relaxation and lift the phase retrieval problem to a higher dimension, PhaseMax is a "non-lifting" relaxation that operates in the original signal dimension. We show that the dual problem to PhaseMax is Basis Pursuit, which implies that phase retrieval can be performed using algorithms initially designed for sparse signal recovery. We develop sharp lower bounds on the success probability of PhaseMax for a broad range of random measurement ensembles, and we analyze the impact of measurement noise on the solution accuracy. We use numerical results to demonstrate the accuracy of our recovery guarantees, and we showcase the efficacy and limits of PhaseMax in practice.
Tongfei Chen, Benjamin Van Durme
We propose a framework for discriminative Information Retrieval (IR) atop linguistic features, trained to improve the recall of tasks such as answer candidate passage retrieval, the initial step in text-based Question Answering (QA). We formalize this as an instance of linear feature-based IR (Metzler and Croft, 2007), illustrating how a variety of knowledge discovery tasks are captured under this approach, leading to a 44% improvement in recall for candidate triage for QA.
H. R. Tizhoosh, Shujin Zhu, Hanson Lo, Varun Chaudhari, Tahmid Mehdi
Content-based medical image retrieval can support diagnostic decisions by
clinical experts. Examining similar images may provide clues to the expert to
remove uncertainties in his/her final diagnosis. Beyond conventional feature
descriptors, binary features in different ways have been recently proposed to
encode the image content. A recent proposal is "Radon barcodes" that employ
binarized Radon projections to tag/annotate medical images with content-based
binary vectors, called barcodes. In this paper, MinMax Radon barcodes are
introduced which are superior to "local thresholding" scheme suggested in the
literature. Using IRMA dataset with 14,410 x-ray images from 193 different
classes, the advantage of using MinMax Radon barcodes over \emph{thresholded}
Radon barcodes are demonstrated. The retrieval error for direct search drops by
more than 15\%. As well, SURF, as a well-established non-binary approach, and
BRISK, as a recent binary method are examined to compare their results with
MinMax Radon barcodes when retrieving images from IRMA dataset. The results
demonstrate that MinMax Radon barcodes are faster and more accurate when
applied on IRMA images.
Authors' comments: To appear in proceedings of the 12th International Symposium on
Visual Computing, December 12-14, 2016, Las Vegas, Nevada, USA
Yiwei Zhang, Xin Wang, Hengjia Wei, Gennian Ge
Given a database, the private information retrieval (PIR) protocol allows a user to make queries to several servers and retrieve a certain item of the database via the feedbacks, without revealing the privacy of the specific item to any single server. Classical models of PIR protocols require that each server stores a whole copy of the database. Recently new PIR models are proposed with coding techniques arising from distributed storage system. In these new models each server only stores a fraction $1/s$ of the whole database, where $s>1$ is a given rational number. PIR array codes are recently proposed by Fazeli, Vardy and Yaakobi to characterize the new models. Consider a PIR array code with $m$ servers and the $k$-PIR property (which indicates that these $m$ servers may emulate any efficient $k$-PIR protocol). The central problem is to design PIR array codes with optimal rate $k/m$. Our contribution to this problem is three-fold. First, for the case $1<s\le 2$, although PIR array codes with optimal rate have been constructed recently by Blackburn and Etzion, the number of servers in their construction is impractically large. We determine the minimum number of servers admitting the existence of a PIR array code with optimal rate for a certain range of parameters. Second, for the case $s>2$, we derive a new upper bound on the rate of a PIR array code. Finally, for the case $s>2$, we analyze a new construction by Blackburn and Etzion and show that its rate is better than all the other existing constructions.
Mina Nouredanesh, H. R. Tizhoosh, Ershad Banijamali, James Tung
In recent years, with the explosion of digital images on the Web,
content-based retrieval has emerged as a significant research area. Shapes,
textures, edges and segments may play a key role in describing the content of
an image. Radon and Gabor transforms are both powerful techniques that have
been widely studied to extract shape-texture-based information. The combined
Radon-Gabor features may be more robust against scale/rotation variations,
presence of noise, and illumination changes. The objective of this paper is to
harness the potentials of both Gabor and Radon transforms in order to introduce
expressive binary features, called barcodes, for image annotation/tagging
tasks. We propose two different techniques: Gabor-of-Radon-Image Barcodes
(GRIBCs), and Guided-Radon-of-Gabor Barcodes (GRGBCs). For validation, we
employ the IRMA x-ray dataset with 193 classes, containing 12,677 training
images and 1,733 test images. A total error score as low as 322 and 330 were
achieved for GRGBCs and GRIBCs, respectively. This corresponds to $\approx
81\%$ retrieval accuracy for the first hit.
Authors' comments: To appear in proceedings of the 23rd International Conference on
Pattern Recognition (ICPR 2016), Cancun, Mexico, December 2016
Tianyu Qiu, Daniel P. Palomar
In the undersampled phase retrieval problem, the goal is to recover an $N$-dimensional complex signal $\mathbf{x}$ from only $M<N$ noisy intensity measurements without phase information. This problem has drawn a lot of attention to reduce the number of required measurements since a recent theory established that $M\approx4N$ intensity measurements are necessary and sufficient to recover a generic signal $\mathbf{x}$. In this paper, we propose to exploit the sparsity in the original signal and develop low-complexity algorithms with superior performance based on the majorization-minimization (MM) framework. The proposed algorithms are preferred to existing benchmark methods since at each iteration a simple surrogate problem is solved with a closed-form solution that monotonically decreases the original objective function. Experimental results validate that our algorithms outperform existing up-to-date methods in terms of recovery probability and accuracy, under the same settings.
Rima Alaifari, Ingrid Daubechies, Philipp Grohs, Rujie Yin
The problem of phase retrieval is to determine a signal $f\in \mathcal{H}$, with $\mathcal{H}$ a Hilbert space, from intensity measurements $|F(\omega)|$, where $F(\omega):=\langle f , \varphi_\omega\rangle$ are measurements of $f$ with respect to a measurement system $(\varphi_\omega)_{\omega\in \Omega}\subset \mathcal{H}$. Although phase retrieval is always stable in the finite dimensional setting whenever it is possible (i.e. injectivity implies stability for the inverse problem), the situation is drastically different if $\mathcal{H}$ is infinite-dimensional: in that case phase retrieval is never uniformly stable [8, 4]; moreover the stability deteriorates severely in the dimension of the problem [8]. On the other hand, all empirically observed instabilities are of a certain type: they occur whenever the function $|F|$ of intensity measurements is concentrated on disjoint sets $D_j\subset \Omega$, i.e., when $F= \sum_{j=1}^k F_j$ where each $F_j$ is concentrated on $D_j$ (and $k \geq 2$). Motivated by these considerations we propose a new paradigm for stable phase retrieval by considering the problem of reconstructing $F$ up to a phase factor that is not global, but that can be different for each of the subsets $D_j$, i.e., recovering $F$ up to the equivalence $$ F \sim \sum_{j=1}^k e^{i \alpha_j} F_j.$$ We present concrete applications (for example in audio processing) where this new notion of stability is natural and meaningful and show that in this setting stable phase retrieval can actually be achieved, for instance if the measurement system is a Gabor frame or a frame of Cauchy wavelets.
Christophe Van Gysel, Maarten de Rijke, Marcel Worring
We introduce an unsupervised discriminative model for the task of retrieving
experts in online document collections. We exclusively employ textual evidence
and avoid explicit feature engineering by learning distributed word
representations in an unsupervised way. We compare our model to
state-of-the-art unsupervised statistical vector space and probabilistic
generative approaches. Our proposed log-linear model achieves the retrieval
performance levels of state-of-the-art document-centric methods with the low
inference cost of so-called profile-centric approaches. It yields a
statistically significant improved ranking over vector space and generative
models in most cases, matching the performance of supervised methods on various
benchmarks. That is, by using solely text we can do as well as methods that
work with external evidence and/or relevance feedback. A contrastive analysis
of rankings produced by discriminative and generative approaches shows that
they have complementary strengths due to the ability of the unsupervised
discriminative model to perform semantic matching.
Authors' comments: WWW2016, Proceedings of the 25th International Conference on World
Wide Web. 2016
Zhangjie Cao, Mingsheng Long, Qiang Yang
Hashing has been widely applied to large-scale multimedia retrieval due to the storage and retrieval efficiency. Cross-modal hashing enables efficient retrieval from database of one modality in response to a query of another modality. Existing work on cross-modal hashing assumes heterogeneous relationship across modalities for hash function learning. In this paper, we relax the strong assumption by only requiring such heterogeneous relationship in an auxiliary dataset different from the query/database domain. We craft a hybrid deep architecture to simultaneously learn the cross-modal correlation from the auxiliary dataset, and align the dataset distributions between the auxiliary dataset and the query/database domain, which generates transitive hash codes for heterogeneous multimedia retrieval. Extensive experiments exhibit that the proposed approach yields state of the art multimedia retrieval performance on public datasets, i.e. NUS-WIDE, ImageNet-YahooQA.
Tomoya Murase, Kanji Tanaka
Change detection, or anomaly detection, from street-view images acquired by
an autonomous robot at multiple different times, is a major problem in robotic
mapping and autonomous driving. Formulation as an image comparison task, which
operates on a given pair of query and reference images is common to many
existing approaches to this problem. Unfortunately, providing relevant
reference images is not straightforward. In this paper, we propose a novel
formulation for change detection, termed compressive change retrieval, which
can operate on a query image and similar reference images retrieved from the
web. Compared to previous formulations, there are two sources of difficulty.
First, the retrieved reference images may frequently contain non-relevant
reference images, because even state-of-the-art place-recognition techniques
suffer from retrieval noise. Second, image comparison needs to be conducted in
a compressed domain to minimize the storage cost of large collections of
street-view images. To address the above issues, we also present a practical
change detection algorithm that uses compressed bag-of-words (BoW) image
representation as a scalable solution. The results of experiments conducted on
a practical change detection task, "moving object detection (MOD)," using the
publicly available Malaga dataset validate the effectiveness of the proposed
approach.
Authors' comments: 6 pages, 6 figures, Draft of a paper submitted to an International
Conference