Bing Gao
This article presents new results concerning the recovery of a signal from
magnitude only measurements where the signal is not sparse in an orthonormal
basis but in a redundant dictionary. To solve this phaseless problem, we
analyze the $ \ell_1 $-analysis model. Firstly we investigate the noiseless
case with presenting a null space property of the measurement matrix under
which the $ \ell_1 $-analysis model provide an exact recovery. Secondly we
introduce a new property (S-DRIP) of the measurement matrix. By solving the $
\ell_1 $-analysis model, we prove that this property can guarantee a stable
recovery of real signals that are nearly sparse in highly overcomplete
dictionaries.
Authors' comments: 16 pages
Dong Yin, Kangwook Lee, Ramtin Pedarsani, Kannan Ramchandran
In this paper, we tackle the compressive phase retrieval problem in the presence of noise. The noisy compressive phase retrieval problem is to recover a $K$-sparse complex signal $s \in \mathbb{C}^n$, from a set of $m$ noisy quadratic measurements: $ y_i=| a_i^H s |^2+w_i$, where $a_i^H\in\mathbb{C}^n$ is the $i$th row of the measurement matrix $A\in\mathbb{C}^{m\times n}$, and $w_i$ is the additive noise to the $i$th measurement. We consider the regime where $K=\beta n^\delta$, with constants $\beta>0$ and $\delta\in(0,1)$. We use the architecture of PhaseCode algorithm, and robustify it using two schemes: the almost-linear scheme and the sublinear scheme. We prove that with high probability, the almost-linear scheme recovers $s$ with sample complexity $\Theta(K \log(n))$ and computational complexity $\Theta(n \log(n))$, and the sublinear scheme recovers $s$ with sample complexity $\Theta(K\log^3(n))$ and computational complexity $\Theta(K\log^3(n))$. To the best of our knowledge, this is the first scheme that achieves sublinear computational complexity for compressive phase retrieval problem. Finally, we provide simulation results that support our theoretical contributions.
Yang Wang, Zhiqiang Xu
In this paper, we develop a framework of generalized phase retrieval in which
one aims to reconstruct a vector ${\mathbf x}$ in ${\mathbb R}^d$ or ${\mathbb
C}^d$ through quadratic samples ${\mathbf x}^*A_1{\mathbf x}, \dots, {\mathbf
x}^*A_N{\mathbf x}$. The generalized phase retrieval includes as special cases
the standard phase retrieval as well as the phase retrieval by orthogonal
projections. We first explore the connections among generalized phase
retrieval, low-rank matrix recovery and nonsingular bilinear form. Motivated by
the connections, we present results on the minimal measurement number needed
for recovering a matrix that lies in a set $W\in {\mathbb C}^{d\times d}$.
Applying the results to phase retrieval, we show that generic $d \times d$
matrices $A_1,\ldots, A_N$ have the phase retrieval property if $N\geq 2d-1$ in
the real case and $N \geq 4d-4$ in the complex case for very general classes of
$A_1,\ldots,A_N$, e.g. matrices with prescribed ranks or orthogonal
projections. Our method also leads to a novel proof for the classical
Stiefel-Hopf condition on nonsingular bilinear form. We also give lower bounds
on the minimal measurement number required for generalized phase retrieval. For
several classes of dimensions $d$ we obtain the precise values of the minimal
measurement number. Our work unifies and enhances results from the standard
phase retrieval, phase retrieval by projections and low-rank matrix recovery.
Authors' comments: 31 pages, add Theorem 5.2, 5.3
Lan Li, Cheng Cheng, Deguang Han, Qiyu Sun, Guangming Shi
In this paper, we introduce two symmetric directed graphs depending on supports of signals and windows, and we show that the connectivity of those graphs provides either necessary and sufficient conditions to phase retrieval of a signal from magnitude measurements of its multiple-window short-time Fourier transform. Also we propose an algebraic reconstruction algorithm, and provide an error estimate to our algorithm when magnitude measurements are corrupted by deterministic/random noises.
Hua Sun, Syed A. Jafar
Private information retrieval (PIR) is the problem of retrieving as efficiently as possible, one out of $K$ messages from $N$ non-communicating replicated databases (each holds all $K$ messages) while keeping the identity of the desired message index a secret from each individual database. The information theoretic capacity of PIR (equivalently, the reciprocal of minimum download cost) is the maximum number of bits of desired information that can be privately retrieved per bit of downloaded information. $T$-private PIR is a generalization of PIR to include the requirement that even if any $T$ of the $N$ databases collude, the identity of the retrieved message remains completely unknown to them. Robust PIR is another generalization that refers to the scenario where we have $M \geq N$ databases, out of which any $M - N$ may fail to respond. For $K$ messages and $M\geq N$ databases out of which at least some $N$ must respond, we show that the capacity of $T$-private and Robust PIR is $\left(1+T/N+T^2/N^2+\cdots+T^{K-1}/N^{K-1}\right)^{-1}$. The result includes as special cases the capacity of PIR without robustness ($M=N$) or $T$-privacy constraints ($T=1$).
Yu Wu, Wei Wu, Zhoujun Li, Ming Zhou
We consider incorporating topic information into message-response matching to
boost responses with rich content in retrieval-based chatbots. To this end, we
propose a topic-aware convolutional neural tensor network (TACNTN). In TACNTN,
matching between a message and a response is not only conducted between a
message vector and a response vector generated by convolutional neural
networks, but also leverages extra topic information encoded in two topic
vectors. The two topic vectors are linear combinations of topic words of the
message and the response respectively, where the topic words are obtained from
a pre-trained LDA model and their weights are determined by themselves as well
as the message vector and the response vector. The message vector, the response
vector, and the two topic vectors are fed to neural tensors to calculate a
matching score. Empirical study on a public data set and a human annotated data
set shows that TACNTN can significantly outperform state-of-the-art methods for
message-response matching.
Authors' comments: under reviewed of AAAI 2017
Andre Araujo, Jason Chaves, Haricharan Lakshman, Roland Angst, Bernd Girod
We consider the problem of using image queries to retrieve videos from a database. Our focus is on large-scale applications, where it is infeasible to index each database video frame independently. Our main contribution is a framework based on Bloom filters, which can be used to index long video segments, enabling efficient image-to-video comparisons. Using this framework, we investigate several retrieval architectures, by considering different types of aggregation and different functions to encode visual information -- these play a crucial role in achieving high performance. Extensive experiments show that the proposed technique improves mean average precision by 24% on a public dataset, while being 4X faster, compared to the previous state-of-the-art.
Xiu-Shen Wei, Jian-Hao Luo, Jianxin Wu, Zhi-Hua Zhou
Deep convolutional neural network models pre-trained for the ImageNet
classification task have been successfully adopted to tasks in other domains,
such as texture description and object proposal generation, but these tasks
require annotations for images in the new domain. In this paper, we focus on a
novel and challenging task in the pure unsupervised setting: fine-grained image
retrieval. Even with image labels, fine-grained images are difficult to
classify, let alone the unsupervised retrieval task. We propose the Selective
Convolutional Descriptor Aggregation (SCDA) method. SCDA firstly localizes the
main object in fine-grained images, a step that discards the noisy background
and keeps useful deep descriptors. The selected descriptors are then aggregated
and dimensionality reduced into a short feature vector using the best practices
we found. SCDA is unsupervised, using no image label or bounding box
annotation. Experiments on six fine-grained datasets confirm the effectiveness
of SCDA for fine-grained image retrieval. Besides, visualization of the SCDA
features shows that they correspond to visual attributes (even subtle ones),
which might explain SCDA's high mean average precision in fine-grained
retrieval. Moreover, on general image retrieval datasets, SCDA achieves
comparable retrieval results with state-of-the-art general image retrieval
approaches.
Authors' comments: IEEE Transactions on Image Processing (TIP), 2017, 26(6): 2868-2881
Shujin Zhu, H. R. Tizhoosh
For more than two decades, research has been performed on content-based image
retrieval (CBIR). By combining Radon projections and the support vector
machines (SVM), a content-based medical image retrieval method is presented in
this work. The proposed approach employs the normalized Radon projections with
corresponding image category labels to build an SVM classifier, and the Radon
barcode database which encodes every image in a binary format is also generated
simultaneously to tag all images. To retrieve similar images when a query image
is given, Radon projections and the barcode of the query image are generated.
Subsequently, the k-nearest neighbor search method is applied to find the
images with minimum Hamming distance of the Radon barcode within the same class
predicted by the trained SVM classifier that uses Radon features. The
performance of the proposed method is validated by using the IRMA 2009 dataset
with 14,410 x-ray images in 57 categories. The results demonstrate that our
method has the capacity to retrieve similar responses for the correctly
identified query image and even for those mistakenly classified by SVM. The
approach further is very fast and has low memory requirement.
Authors' comments: To appear in proceedings of The 2016 IEEE International Joint
Conference on Neural Networks (IJCNN 2016), July 24-29, 2016, Vancouver,
Canada
Robert Beinert
The one-dimensional phase retrieval problem consists in the recovery of a complex-valued signal from its Fourier intensity. Due to the well-known ambiguousness of this problem, the determination of the original signal within the extensive solution set is challenging and can only be done under suitable a priori assumption or additional information about the unknown signal. Depending on the application, one has sometimes access to further interference measurements between the unknown signal and a reference signal. Beginning with the reconstruction in the discrete-time setting, we show that each signal can be uniquely recovered from its Fourier intensity and two further interference measurements between the unknown signal and a modulation of the signal itself. Afterwards, we consider the continuous-time problem, where we obtain an equivalent result. Moreover, the unique recovery of a continuous-time signal can also be ensured by using interference measurements with a known or an unknown reference which is unrelated to the unknown signal.
Albert Gordo, Jon Almazan, Jerome Revaud, Diane Larlus
We propose a novel approach for instance-level image retrieval. It produces a
global and compact fixed-length representation for each image by aggregating
many region-wise descriptors. In contrast to previous works employing
pre-trained deep networks as a black box to produce features, our method
leverages a deep architecture trained for the specific task of image retrieval.
Our contribution is twofold: (i) we leverage a ranking framework to learn
convolution and projection weights that are used to build the region features;
and (ii) we employ a region proposal network to learn which regions should be
pooled to form the final global descriptor. We show that using clean training
data is key to the success of our approach. To that aim, we use a large scale
but noisy landmark dataset and develop an automatic cleaning approach. The
proposed architecture produces a global image representation in a single
forward pass. Our approach significantly outperforms previous approaches based
on global descriptors on standard datasets. It even surpasses most prior works
based on costly local descriptor indexing and spatial verification. Additional
material is available at www.xrce.xerox.com/Deep-Image-Retrieval.
Authors' comments: ECCV 2016 version + additional results
Zhanning Gao, Gang Hua, Dongqing Zhang, Jianru Xue, Nanning Zheng
Event retrieval and recognition in a large corpus of videos necessitates a
holistic fixed-size visual representation at the video clip level that is
comprehensive, compact, and yet discriminative. It shall comprehensively
aggregate information across relevant video frames, while suppress redundant
information, leading to a compact representation that can effectively
differentiate among different visual events. In search for such a
representation, we propose to build a spatially consistent counting grid model
to aggregate together deep features extracted from different video frames. The
spatial consistency of the counting grid model is achieved by introducing a
prior model estimated from a large corpus of video data. The counting grid
model produces an intermediate tensor representation for each video, which
automatically identifies and removes the feature redundancy across the
different frames. The tensor representation is subsequently reduced to a
fixed-size vector representation by averaging over the counting grid. When
compared to existing methods on both event retrieval and event classification
benchmarks, we achieve significantly better accuracy with much more compact
representation.
Authors' comments: This paper has been withdrawn by the author because this work will be
part of another object which will be released soon
Dorota Glowacka, Yee Whye Teh, John Shawe-Taylor
A content-based image retrieval system based on multinomial relevance feedback is proposed. The system relies on an interactive search paradigm where at each round a user is presented with k images and selects the one closest to their ideal target. Two approaches, one based on the Dirichlet distribution and one based the Beta distribution, are used to model the problem motivating an algorithm that trades exploration and exploitation in presenting the images in each round. Experimental results show that the new approach compares favourably with previous work.
Kejun Huang, Yonina C. Eldar, Nicholas D. Sidiropoulos
This paper considers phase retrieval from the magnitude of 1D over-sampled Fourier measurements, a classical problem that has challenged researchers in various fields of science and engineering. We show that an optimal vector in a least-squares sense can be found by solving a convex problem, thus establishing a hidden convexity in Fourier phase retrieval. We also show that the standard semidefinite relaxation approach yields the optimal cost function value (albeit not necessarily an optimal solution) in this case. A method is then derived to retrieve an optimal minimum phase solution in polynomial time. Using these results, a new measuring technique is proposed which guarantees uniqueness of the solution, along with an efficient algorithm that can solve large-scale Fourier phase retrieval problems with uniqueness and optimality guarantees.
Matthieu Vergne
In order to find experts, different approaches build rankings of people,
assuming that they are ranked by level of expertise, and use typical
Information Retrieval (IR) measures to evaluate their effectiveness. However,
we figured out that expert rankings (i) tend to be partially ordered, (ii)
incomplete, and (iii) consequently provide more an order rather than absolute
ranks, which is not what usual IR measures exploit. To improve this state of
the art, we propose to revise the formalism used in IR to design proper
measures for comparing expert rankings. In this report, we investigate a first
step by providing mitigation procedures for the three issues, and we analyse IR
measures with the help of these procedures to identify interesting revisions
and remaining limitations. From this analysis, we see that most of the measures
can be exploited for this more generic context because of our mitigation
procedures. Moreover, measures based on precision and recall, usually unable to
consider the order of the ranked items, are of first interest if we represent a
ranking as a set of ordered pairs. Cumulative measures, on the other hand, are
specifically designed for considering the order but suffer from a higher
complexity, motivating the use of precision/recall measures with the right
representation.
Authors' comments: Technical report, 16 pages
Olivier Morère, Jie Lin, Antoine Veillard, Vijay Chandrasekhar, Tomaso Poggio
The goal of this work is the computation of very compact binary hashes for
image instance retrieval. Our approach has two novel contributions. The first
one is Nested Invariance Pooling (NIP), a method inspired from i-theory, a
mathematical theory for computing group invariant transformations with
feed-forward neural networks. NIP is able to produce compact and
well-performing descriptors with visual representations extracted from
convolutional neural networks. We specifically incorporate scale, translation
and rotation invariances but the scheme can be extended to any arbitrary sets
of transformations. We also show that using moments of increasing order
throughout nesting is important. The NIP descriptors are then hashed to the
target code size (32-256 bits) with a Restricted Boltzmann Machine with a novel
batch-level regularization scheme specifically designed for the purpose of
hashing (RBMH). A thorough empirical evaluation with state-of-the-art shows
that the results obtained both with the NIP descriptors and the NIP+RBMH hashes
are consistently outstanding across a wide range of datasets.
Authors' comments: Image Instance Retrieval, CNN, Invariant Representation, Hashing,
Unsupervised Learning, Regularization. arXiv admin note: text overlap with
arXiv:1601.02093
Changde Du, Ali Luo, Haifeng Yang, Wen Hou, Yanxin Guo
One of important aims of astronomical data mining is to systematically search
for specific rare objects in a massive spectral dataset, given a small fraction
of identified samples with the same type. Most existing methods are mainly
based on binary classification, which usually suffer from uncompleteness when
the known samples are too few. While, rank-based methods would provide good
solutions for such case. After investigating several algorithms, a method
combining bipartite ranking model with bootstrap aggregating techniques was
developed in this paper. The method was applied in searching for carbon stars
in the spectral data of Sloan Digital Sky Survey (SDSS) DR10, and compared with
several other popular methods used in data mining. Experimental results
validate that the proposed method is not only the most effective but also less
time consuming among these competitors automatically searching for rare spectra
in a large but unlabelled dataset.128
Authors' comments: 25 pages, 9 figures
Huishuai Zhang, Yuejie Chi, Yingbin Liang
This paper investigates the phase retrieval problem, which aims to recover a
signal from the magnitudes of its linear measurements. We develop statistically
and computationally efficient algorithms for the situation when the
measurements are corrupted by sparse outliers that can take arbitrary values.
We propose a novel approach to robustify the gradient descent algorithm by
using the sample median as a guide for pruning spurious samples in
initialization and local search. Adopting the Poisson loss and the reshaped
quadratic loss respectively, we obtain two algorithms termed median-TWF and
median-RWF, both of which provably recover the signal from a near-optimal
number of measurements when the measurement vectors are composed of i.i.d.
Gaussian entries, up to a logarithmic factor, even when a constant fraction of
the measurements are adversarially corrupted. We further show that both
algorithms are stable in the presence of additional dense bounded noise. Our
analysis is accomplished by developing non-trivial concentration results of
median-related quantities, which may be of independent interest. We provide
numerical experiments to demonstrate the effectiveness of our approach.
Authors' comments: journal version under review
Hanjiang Lai, Pan Yan, Xiangbo Shu, Yunchao Wei, Shuicheng Yan
Similarity-preserving hashing is a commonly used method for nearest neighbour
search in large-scale image retrieval. For image retrieval, deep-networks-based
hashing methods are appealing since they can simultaneously learn effective
image representations and compact hash codes. This paper focuses on
deep-networks-based hashing for multi-label images, each of which may contain
objects of multiple categories. In most existing hashing methods, each image is
represented by one piece of hash code, which is referred to as semantic
hashing. This setting may be suboptimal for multi-label image retrieval. To
solve this problem, we propose a deep architecture that learns
\textbf{instance-aware} image representations for multi-label image data, which
are organized in multiple groups, with each group containing the features for
one category. The instance-aware representations not only bring advantages to
semantic hashing, but also can be used in category-aware hashing, in which an
image is represented by multiple pieces of hash codes and each piece of code
corresponds to a category. Extensive evaluations conducted on several benchmark
datasets demonstrate that, for both semantic hashing and category-aware
hashing, the proposed method shows substantial improvement over the
state-of-the-art supervised and unsupervised hashing methods.
Authors' comments: has been accepted as a regular paper in the IEEE Transactions on
Image Processing, 2016
Mattis Paulin, Julien Mairal, Matthijs Douze, Zaid Harchaoui, Florent Perronnin, Cordelia Schmid
Convolutional neural networks (CNNs) have recently received a lot of attention due to their ability to model local stationary structures in natural images in a multi-scale fashion, when learning all model parameters with supervision. While excellent performance was achieved for image classification when large amounts of labeled visual data are available, their success for un-supervised tasks such as image retrieval has been moderate so far. Our paper focuses on this latter setting and explores several methods for learning patch descriptors without supervision with application to matching and instance-level retrieval. To that effect, we propose a new family of convolutional descriptors for patch representation , based on the recently introduced convolutional kernel networks. We show that our descriptor, named Patch-CKN, performs better than SIFT as well as other convolutional networks learned by artificially introducing supervision and is significantly faster to train. To demonstrate its effectiveness, we perform an extensive evaluation on standard benchmarks for patch and image retrieval where we obtain state-of-the-art results. We also introduce a new dataset called RomePatches, which allows to simultaneously study descriptor performance for patch and image retrieval.