Robert Beinert
The one-dimensional phase retrieval problem consists of recovering a complex-valued signal from its Fourier intensity. Due to the well-known ambiguity of this problem, determining the original signal within the extensive solution set is challenging and can only be done under suitable a priori assumptions or with additional information about the unknown signal. Depending on the application, one sometimes has access to further interference measurements between the unknown signal and a reference signal. Beginning with the reconstruction in the discrete-time setting, we show that each signal can be uniquely recovered from its Fourier intensity and two further interference measurements between the unknown signal and a modulation of the signal itself. Afterwards, we consider the continuous-time problem, where we obtain an equivalent result. Moreover, the unique recovery of a continuous-time signal can also be ensured by using interference measurements with a known or an unknown reference which is unrelated to the unknown signal.
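The ambiguity mentioned above is easy to observe numerically. The sketch below assumes NumPy and uses a zero-padded DFT of length M >= 2N - 1 as a stand-in for the Fourier intensity. It shows that a global phase rotation, a time shift and the conjugate reflection of a signal all produce exactly the same Fourier intensity. These are only the trivial ambiguities (in 1D the solution set is generally much larger), and the code is not the recovery procedure from the paper.

```python
import numpy as np

def fourier_intensity(x, m):
    """|X(omega_k)|^2 on an M-point grid, omega_k = 2*pi*k/M (zero-padded DFT)."""
    return np.abs(np.fft.fft(x, m)) ** 2

rng = np.random.default_rng(0)
n = 8
m = 4 * n  # any M >= 2N - 1 samples the full Fourier intensity
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

candidates = {
    "global phase": np.exp(1j * 0.7) * x,
    "time shift": np.concatenate([np.zeros(3), x]),  # support moved by 3 samples
    "conjugate reflection": np.conj(x[::-1]),
}

ref = fourier_intensity(x, m)
for name, y in candidates.items():
    same = np.allclose(fourier_intensity(y, m), ref)
    print(f"{name}: same Fourier intensity = {same}")
```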
Albert Gordo, Jon Almazan, Jerome Revaud, Diane Larlus
We propose a novel approach for instance-level image retrieval. It produces a
global and compact fixed-length representation for each image by aggregating
many region-wise descriptors. In contrast to previous works employing
pre-trained deep networks as a black box to produce features, our method
leverages a deep architecture trained for the specific task of image retrieval.
Our contribution is twofold: (i) we leverage a ranking framework to learn
convolution and projection weights that are used to build the region features;
and (ii) we employ a region proposal network to learn which regions should be
pooled to form the final global descriptor. We show that using clean training
data is key to the success of our approach. To that end, we use a large-scale
but noisy landmark dataset and develop an automatic cleaning approach. The
proposed architecture produces a global image representation in a single
forward pass. Our approach significantly outperforms previous approaches based
on global descriptors on standard datasets. It even surpasses most prior works
based on costly local descriptor indexing and spatial verification. Additional
material is available at www.xrce.xerox.com/Deep-Image-Retrieval.
Authors' comments: ECCV 2016 version + additional results
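The aggregation of region-wise descriptors into one compact vector can be pictured with a minimal NumPy sketch, assuming an R-MAC-style pipeline: each region of a convolutional feature map is max-pooled, L2-normalized, passed through a shift and a projection (random stand-ins here for the learned weights the abstract mentions), re-normalized, summed, and normalized once more. The region boxes are hard-coded; in the paper they come from a region proposal network. All names and sizes are illustrative assumptions.

```python
import numpy as np

def l2n(v, eps=1e-12):
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def global_descriptor(fmap, regions, w, b):
    """fmap: (C, H, W) conv features; regions: list of (y0, x0, y1, x1) boxes
    in feature-map coordinates; w: (D, C) projection; b: (C,) shift."""
    region_feats = []
    for y0, x0, y1, x1 in regions:
        r = fmap[:, y0:y1, x0:x1].max(axis=(1, 2))  # max-pool the region -> (C,)
        r = l2n(r)
        r = l2n(w @ (r - b))                         # shift + projection, renormalize
        region_feats.append(r)
    return l2n(np.sum(region_feats, axis=0))         # sum-aggregate, final L2 norm

rng = np.random.default_rng(0)
C, H, W, D = 64, 14, 14, 32
fmap = np.maximum(rng.standard_normal((C, H, W)), 0)  # ReLU-like activations
regions = [(0, 0, 14, 14), (0, 0, 7, 7), (7, 7, 14, 14), (3, 3, 11, 11)]
w = rng.standard_normal((D, C)) / np.sqrt(C)  # hypothetical learned projection
b = np.zeros(C)                               # hypothetical learned shift
g = global_descriptor(fmap, regions, w, b)
print(g.shape)  # (32,)
```

Because the output is a single fixed-length unit vector, two images can be compared with one dot product, which is what makes a single forward pass sufficient at query time.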
Zhanning Gao, Gang Hua, Dongqing Zhang, Jianru Xue, Nanning Zheng
Event retrieval and recognition in a large corpus of videos necessitates a
holistic fixed-size visual representation at the video clip level that is
comprehensive, compact, and yet discriminative. It should comprehensively
aggregate information across relevant video frames while suppressing redundant
information, leading to a compact representation that can effectively
differentiate among different visual events. In search of such a
representation, we propose to build a spatially consistent counting grid model
to aggregate together deep features extracted from different video frames. The
spatial consistency of the counting grid model is achieved by introducing a
prior model estimated from a large corpus of video data. The counting grid
model produces an intermediate tensor representation for each video, which
automatically identifies and removes the feature redundancy across the
different frames. The tensor representation is subsequently reduced to a
fixed-size vector representation by averaging over the counting grid. When
compared to existing methods on both event retrieval and event classification
benchmarks, we achieve significantly better accuracy with a much more compact
representation.
Authors' comments: This paper has been withdrawn by the author because this work will be
part of another project which will be released soon
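The final reduction step (a tensor over the counting grid averaged into a fixed-size vector) can be illustrated with a simplified sketch. It assumes each frame feature has already been hard-assigned to a grid cell; the actual counting grid model uses a learned probabilistic mapping with a spatial prior, which is not implemented here. Frames landing in the same cell are averaged, so near-duplicate frames do not get extra weight, and averaging over the occupied cells (an assumption) yields a D-dimensional vector regardless of clip length. Function names are hypothetical.

```python
import numpy as np

def grid_tensor(frame_feats, cell_ids, grid_shape):
    """Average the features of frames assigned to the same cell.
    frame_feats: (T, D); cell_ids: (T, 2) integer grid coordinates
    (hypothetical hard assignment). Returns the (Gx, Gy, D) tensor and a mask."""
    gx, gy = grid_shape
    d = frame_feats.shape[1]
    sums = np.zeros((gx, gy, d))
    counts = np.zeros((gx, gy))
    for f, (i, j) in zip(frame_feats, cell_ids):
        sums[i, j] += f
        counts[i, j] += 1
    occupied = counts > 0
    sums[occupied] /= counts[occupied][:, None]
    return sums, occupied

def video_vector(tensor, occupied):
    # Fixed-size D-dimensional vector: mean over occupied grid cells
    return tensor[occupied].mean(axis=0)

rng = np.random.default_rng(0)
feats = rng.standard_normal((5, 4))
feats[1] = feats[0]  # a redundant (duplicate) frame
cells = np.array([[0, 0], [0, 0], [1, 2], [2, 1], [1, 2]])
t, occ = grid_tensor(feats, cells, (3, 3))
v = video_vector(t, occ)
print(v.shape)  # (4,)
```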
Dorota Glowacka, Yee Whye Teh, John Shawe-Taylor
A content-based image retrieval system based on multinomial relevance feedback is proposed. The system relies on an interactive search paradigm in which, at each round, a user is presented with k images and selects the one closest to their ideal target. Two approaches, one based on the Dirichlet distribution and one based on the Beta distribution, are used to model the problem, motivating an algorithm that trades off exploration and exploitation in presenting the images in each round. Experimental results show that the new approach compares favourably with previous work.
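The interaction loop can be sketched as follows. The abstract does not give the update rules, so everything here is a hypothetical illustration, not the authors' algorithm: Dirichlet pseudo-counts over the image collection, a posterior draw to pick the k images to display (one simple way to trade exploration against exploitation), a simulated user who picks the displayed image closest to the target, and a Gaussian similarity kernel that spreads each selection over similar images. The Beta-distribution variant is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images, dim, k = 200, 5, 4
feats = rng.standard_normal((n_images, dim))  # hypothetical image features
target = 17                                   # index of the user's ideal image

alpha = np.ones(n_images)                     # Dirichlet pseudo-counts (uniform prior)

def similarity(i):
    # Gaussian kernel between image i and all images (hypothetical choice)
    d2 = np.sum((feats - feats[i]) ** 2, axis=1)
    return np.exp(-d2 / 2.0)

def choose_display(alpha, k):
    # Draw from the Dirichlet posterior: images with more mass are likely
    # to be shown, but uncertain ones can still appear (exploration).
    p = rng.dirichlet(alpha)
    return np.argsort(p)[-k:]

def simulated_user(shown):
    # The user selects the displayed image closest to the target
    d = np.linalg.norm(feats[shown] - feats[target], axis=1)
    return shown[np.argmin(d)]

for _ in range(30):
    shown = choose_display(alpha, k)
    picked = simulated_user(shown)
    alpha += similarity(picked)               # selection added as soft counts
```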
Kejun Huang, Yonina C. Eldar, Nicholas D. Sidiropoulos
This paper considers phase retrieval from the magnitude of 1D over-sampled Fourier measurements, a classical problem that has challenged researchers in various fields of science and engineering. We show that an optimal vector in a least-squares sense can be found by solving a convex problem, thus establishing a hidden convexity in Fourier phase retrieval. We also show that the standard semidefinite relaxation approach yields the optimal cost function value (albeit not necessarily an optimal solution) in this case. A method is then derived to retrieve an optimal minimum phase solution in polynomial time. Using these results, a new measuring technique is proposed which guarantees uniqueness of the solution, along with an efficient algorithm that can solve large-scale Fourier phase retrieval problems with uniqueness and optimality guarantees.
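The reason 1D Fourier phase retrieval can be reasoned about through the autocorrelation is that, for M >= 2N - 1, the squared magnitude of the M-point DFT of x is exactly the DFT of the aperiodic autocorrelation of x. The NumPy check below verifies this identity on random data; it does not implement the convex program or the minimum-phase recovery from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
m = 2 * n - 1  # smallest oversampling that avoids circular wrap-around
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Measurements: |X_k|^2 on the M-point grid
meas = np.abs(np.fft.fft(x, m)) ** 2

# Aperiodic autocorrelation r[l] = sum_n x[n+l] * conj(x[n]), l = -(N-1)..(N-1)
r = np.correlate(x, x, mode="full")

# The inverse DFT of the measurements returns r with lags stored circularly
r_from_meas = np.fft.ifft(meas)
r_circ = np.roll(r, -(n - 1))  # move lag 0 to index 0
print(np.allclose(r_from_meas, r_circ))  # True
```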
Matthieu Vergne
In order to find experts, different approaches build rankings of people,
assuming that they are ranked by level of expertise, and use typical
Information Retrieval (IR) measures to evaluate their effectiveness. However,
we found that expert rankings (i) tend to be partially ordered, (ii) are
incomplete, and (iii) consequently provide an order rather than absolute
ranks, which is not what usual IR measures exploit. To improve this state of
the art, we propose to revise the formalism used in IR to design proper
measures for comparing expert rankings. In this report, we take a first
step by providing mitigation procedures for the three issues, and we analyse IR
measures with the help of these procedures to identify interesting revisions
and remaining limitations. From this analysis, we see that most of the measures
can be exploited for this more generic context because of our mitigation
procedures. Moreover, measures based on precision and recall, usually unable to
consider the order of the ranked items, are of primary interest if we represent a
ranking as a set of ordered pairs. Cumulative measures, on the other hand, are
specifically designed for considering the order but suffer from a higher
complexity, motivating the use of precision/recall measures with the right
representation.
Authors' comments: Technical report, 16 pages
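Representing a ranking as a set of ordered pairs, as suggested above, lets plain precision and recall take order into account. The sketch below is a minimal illustration under simple assumptions: a partial ranking is a list of tiers (items in the same tier are tied and produce no pair), and items missing from one ranking simply cannot match. The report's own mitigation procedures may differ; the function names are hypothetical.

```python
from itertools import combinations

def ordered_pairs(tiers):
    """Turn a (possibly partial) ranking into the set of pairs (a, b) meaning
    'a is ranked above b'. tiers: list of sets; tied items give no pair."""
    pairs = set()
    for hi, lo in combinations(range(len(tiers)), 2):
        for a in tiers[hi]:
            for b in tiers[lo]:
                pairs.add((a, b))
    return pairs

def pair_precision_recall(retrieved, reference):
    p, r = ordered_pairs(retrieved), ordered_pairs(reference)
    hits = len(p & r)
    precision = hits / len(p) if p else 0.0
    recall = hits / len(r) if r else 0.0
    return precision, recall

# Reference: alice > {bob, carol} > dave; the system output omits dave (incomplete)
reference = [{"alice"}, {"bob", "carol"}, {"dave"}]
system = [{"bob"}, {"alice"}, {"carol"}]
print(pair_precision_recall(system, reference))  # (0.3333333333333333, 0.2)
```

Only the pair (alice, carol) is shared: 1 of the 3 system pairs and 1 of the 5 reference pairs.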
Olivier Morère, Jie Lin, Antoine Veillard, Vijay Chandrasekhar, Tomaso Poggio
The goal of this work is the computation of very compact binary hashes for
image instance retrieval. Our approach has two novel contributions. The first
one is Nested Invariance Pooling (NIP), a method inspired by i-theory, a
mathematical theory for computing group invariant transformations with
feed-forward neural networks. NIP is able to produce compact and
well-performing descriptors with visual representations extracted from
convolutional neural networks. We specifically incorporate scale, translation
and rotation invariances, but the scheme can be extended to arbitrary sets
of transformations. We also show that using moments of increasing order
throughout nesting is important. The NIP descriptors are then hashed to the
target code size (32-256 bits) with a Restricted Boltzmann Machine with a novel
batch-level regularization scheme specifically designed for the purpose of
hashing (RBMH). A thorough empirical evaluation against the state of the art shows
that the results obtained both with the NIP descriptors and the NIP+RBMH hashes
are consistently outstanding across a wide range of datasets.
Authors' comments: Image Instance Retrieval, CNN, Invariant Representation, Hashing,
Unsupervised Learning, Regularization. arXiv admin note: text overlap with
arXiv:1601.02093
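Nested pooling with moments of increasing order can be sketched as below, assuming CNN features have already been extracted for every combination of rotation, scale and translation of the input (random numbers here). The sketch pools translations with the 1st moment (mean), scales with the 2nd (root mean square) and rotations with the infinite order (max); this assignment of moments to transformation groups is an assumption, not taken from the paper, and the RBMH hashing stage is not shown.

```python
import numpy as np

def moment_pool(x, axis, order):
    """Pool along `axis` with the moment of the given order:
    1 -> mean of |x| (plain mean for non-negative features),
    2 -> root mean square, np.inf -> max."""
    if order == np.inf:
        return x.max(axis=axis)
    return np.mean(np.abs(x) ** order, axis=axis) ** (1.0 / order)

def nested_invariance_pooling(feats):
    """feats: (n_rot, n_scale, n_trans, D) features of transformed inputs."""
    f = moment_pool(feats, axis=2, order=1)   # translations: mean
    f = moment_pool(f, axis=1, order=2)       # scales: 2nd moment
    f = moment_pool(f, axis=0, order=np.inf)  # rotations: max
    return f                                  # (D,)

rng = np.random.default_rng(0)
feats = np.maximum(rng.standard_normal((4, 3, 5, 16)), 0)  # ReLU-like features
desc = nested_invariance_pooling(feats)
```

Since each stage is symmetric in its pooled axis, reordering the sampled transformations leaves the descriptor unchanged.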
Changde Du, Ali Luo, Haifeng Yang, Wen Hou, Yanxin Guo
One of the important aims of astronomical data mining is to systematically search
for specific rare objects in a massive spectral dataset, given a small fraction
of identified samples of the same type. Most existing methods are based
on binary classification, which usually suffers from incompleteness when
the known samples are too few. Rank-based methods, by contrast, can provide good
solutions in such cases. After investigating several algorithms, a method
combining bipartite ranking model with bootstrap aggregating techniques was
developed in this paper. The method was applied in searching for carbon stars
in the spectral data of Sloan Digital Sky Survey (SDSS) DR10, and compared with
several other popular methods used in data mining. Experimental results
validate that, among these competitors, the proposed method is not only the most
effective but also the least time-consuming for automatically searching for rare
spectra in a large but unlabelled dataset.
Authors' comments: 25 pages, 9 figures
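Combining a bipartite ranker with bootstrap aggregating can be sketched as follows, under the assumption that unlabelled spectra are treated as tentative negatives (rare objects are, by definition, a tiny fraction of them). Each bag resamples the known positives and a subset of the unlabelled pool, fits a linear scoring function, and the scores are averaged. The scorer used here (difference of class means) and the synthetic 3-feature data are hypothetical stand-ins; the abstract does not specify the ranking model.

```python
import numpy as np

def fit_linear_ranker(pos, neg):
    # Hypothetical scorer: direction separating the two class means
    return pos.mean(axis=0) - neg.mean(axis=0)

def bagged_scores(X_pos, X_unl, n_bags=25, n_neg=None, seed=0):
    rng = np.random.default_rng(seed)
    n_neg = n_neg or len(X_pos) * 5
    scores = np.zeros(len(X_unl))
    for _ in range(n_bags):
        pos = X_pos[rng.integers(0, len(X_pos), len(X_pos))]  # bootstrap positives
        neg = X_unl[rng.integers(0, len(X_unl), n_neg)]       # unlabelled as negatives
        w = fit_linear_ranker(pos, neg)
        scores += X_unl @ w
    return scores / n_bags

rng = np.random.default_rng(42)
rare = rng.normal(2.0, 1.0, size=(10, 3))         # few identified rare objects
background = rng.normal(0.0, 1.0, size=(500, 3))  # large unlabelled pool
hidden_rare = rng.normal(2.0, 1.0, size=(5, 3))   # rare objects hidden in the pool
X_unl = np.vstack([background, hidden_rare])
ranking = np.argsort(-bagged_scores(rare, X_unl))  # most rare-like first
```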
Huishuai Zhang, Yuejie Chi, Yingbin Liang
This paper investigates the phase retrieval problem, which aims to recover a
signal from the magnitudes of its linear measurements. We develop statistically
and computationally efficient algorithms for the situation when the
measurements are corrupted by sparse outliers that can take arbitrary values.
We propose a novel approach to robustify the gradient descent algorithm by
using the sample median as a guide for pruning spurious samples in
initialization and local search. Adopting the Poisson loss and the reshaped
quadratic loss respectively, we obtain two algorithms termed median-TWF and
median-RWF, both of which provably recover the signal from a near-optimal
number of measurements when the measurement vectors are composed of i.i.d.
Gaussian entries, up to a logarithmic factor, even when a constant fraction of
the measurements are adversarially corrupted. We further show that both
algorithms are stable in the presence of additional dense bounded noise. Our
analysis is accomplished by developing non-trivial concentration results of
median-related quantities, which may be of independent interest. We provide
numerical experiments to demonstrate the effectiveness of our approach.
Authors' comments: journal version under review
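The median-guided pruning can be illustrated for real-valued signals with the reshaped quadratic loss (1/2m) Σ_i (|a_iᵀz| - y_i)², whose generalized gradient is (1/m) Σ_i (a_iᵀz - y_i sign(a_iᵀz)) a_i. In this sketch, samples whose residual exceeds alpha times the median residual are dropped from the gradient. The threshold alpha, the step size, the good initialization and the real-valued setting are assumptions for illustration; the paper's exact truncation rules, initialization and constants may differ.

```python
import numpy as np

def median_rwf_step(z, A, y, alpha=3.0, step=0.8):
    """One gradient step on the reshaped quadratic loss, keeping only samples
    whose residual is at most alpha * median residual (real-valued sketch)."""
    Az = A @ z
    resid = np.abs(np.abs(Az) - y)
    keep = resid <= alpha * np.median(resid)  # median-guided pruning
    m = A.shape[0]
    grad = A[keep].T @ (Az[keep] - y[keep] * np.sign(Az[keep])) / m
    return z - step * grad, keep

rng = np.random.default_rng(0)
n, m = 20, 200
x = rng.standard_normal(n)
A = rng.standard_normal((m, n))
y = np.abs(A @ x)
bad = rng.choice(m, size=20, replace=False)
y[bad] += rng.uniform(10, 50, size=20)  # sparse, large outliers

z = x + 0.1 * rng.standard_normal(n)    # assume a good initialization is available
for _ in range(200):
    z, keep = median_rwf_step(z, A, y)
err = min(np.linalg.norm(z - x), np.linalg.norm(z + x)) / np.linalg.norm(x)
print(f"relative error (up to global sign): {err:.2e}")
```

At z = x the clean residuals are zero, so the median is zero, every corrupted sample is pruned and the gradient vanishes: the true signal is a fixed point despite the outliers.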
Hanjiang Lai, Pan Yan, Xiangbo Shu, Yunchao Wei, Shuicheng Yan
Similarity-preserving hashing is a commonly used method for nearest neighbour
search in large-scale image retrieval. For image retrieval, deep-networks-based
hashing methods are appealing since they can simultaneously learn effective
image representations and compact hash codes. This paper focuses on
deep-networks-based hashing for multi-label images, each of which may contain
objects of multiple categories. In most existing hashing methods, each image is
represented by one piece of hash code, which is referred to as semantic
hashing. This setting may be suboptimal for multi-label image retrieval. To
solve this problem, we propose a deep architecture that learns
instance-aware image representations for multi-label image data, which
are organized in multiple groups, with each group containing the features for
one category. The instance-aware representations not only bring advantages to
semantic hashing, but can also be used in category-aware hashing, in which an
image is represented by multiple pieces of hash codes and each piece of code
corresponds to a category. Extensive evaluations conducted on several benchmark
datasets demonstrate that, for both semantic hashing and category-aware
hashing, the proposed method shows substantial improvement over the
state-of-the-art supervised and unsupervised hashing methods.
Authors' comments: has been accepted as a regular paper in the IEEE Transactions on
Image Processing, 2016
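Category-aware hashing can be sketched as follows, assuming the network already outputs one real-valued feature group per category (random numbers here). Each group is binarized with a sign threshold into one piece of code per category, and retrieval with respect to a given category compares only that piece by Hamming distance. Shapes, names and the sign quantizer are illustrative assumptions.

```python
import numpy as np

def category_codes(groups):
    """groups: (n_categories, bits) real-valued instance-aware features.
    Returns one {0, 1} code piece per category."""
    return (groups > 0).astype(np.uint8)

def hamming(a, b):
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
n_cat, bits = 3, 12
query = rng.standard_normal((n_cat, bits))
db = [rng.standard_normal((n_cat, bits)) for _ in range(5)]

# An image sharing the query's category-0 content but differing in category 1
twin = query.copy()
twin[1] *= -1
db.append(twin)

cat = 0  # retrieve images similar to the query with respect to category 0 only
q = category_codes(query)[cat]
dists = [hamming(q, category_codes(g)[cat]) for g in db]
```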
Mattis Paulin, Julien Mairal, Matthijs Douze, Zaid Harchaoui, Florent Perronnin, Cordelia Schmid
Convolutional neural networks (CNNs) have recently received a lot of attention due to their ability to model local stationary structures in natural images in a multi-scale fashion, when learning all model parameters with supervision. While excellent performance was achieved for image classification when large amounts of labeled visual data are available, their success for unsupervised tasks such as image retrieval has been moderate so far. Our paper focuses on this latter setting and explores several methods for learning patch descriptors without supervision, with application to matching and instance-level retrieval. To that effect, we propose a new family of convolutional descriptors for patch representation, based on the recently introduced convolutional kernel networks. We show that our descriptor, named Patch-CKN, performs better than SIFT as well as other convolutional networks learned by artificially introducing supervision, and is significantly faster to train. To demonstrate its effectiveness, we perform an extensive evaluation on standard benchmarks for patch and image retrieval, where we obtain state-of-the-art results. We also introduce a new dataset called RomePatches, which allows one to simultaneously study descriptor performance for patch and image retrieval.
Shangwen Li, Sanjay Purushotham, Chen Chen, Yuzhuo Ren, C. -C. Jay Kuo
Textual data such as tags and sentence descriptions are combined with visual cues to reduce the semantic gap for image retrieval applications in today's Multimodal Image Retrieval (MIR) systems. However, all tags are treated as equally important in these systems, which may result in misalignment between visual and textual modalities during MIR training. This, in turn, degrades retrieval performance at query time. To address this issue, we investigate the problem of tag importance prediction, where the goal is to automatically predict the tag importance and use it in image retrieval. To achieve this, we first propose a method to measure the relative importance of object and scene tags from image sentence descriptions. Using this as the ground truth, we present a tag importance prediction model to jointly exploit visual, semantic and context cues. The Structural Support Vector Machine (SSVM) formulation is adopted to ensure efficient training of the prediction model. Then, Canonical Correlation Analysis (CCA) is employed to learn the relation between the image visual feature and tag importance to obtain robust retrieval performance. Experimental results on three real-world datasets show a significant performance improvement of the proposed MIR with Tag Importance Prediction (MIR/TIP) system over other MIR systems.
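The CCA step relates visual features to tag-importance vectors. Below is a minimal NumPy implementation of regularized CCA, where the canonical correlations are the singular values of Cxx^(-1/2) Cxy Cyy^(-1/2). The ridge term `reg` is an assumption added for numerical stability, and the data are synthetic: the "tag importance" is a noisy linear function of two visual dimensions, so two canonical correlations should be close to 1.

```python
import numpy as np

def inv_sqrt(c):
    # Inverse matrix square root of a symmetric positive definite matrix
    vals, vecs = np.linalg.eigh(c)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def cca(X, Y, reg=1e-6):
    """Canonical correlations and projection matrices for X (n, p), Y (n, q)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    cxy = X.T @ Y / (n - 1)
    wx, wy = inv_sqrt(cxx), inv_sqrt(cyy)
    u, s, vt = np.linalg.svd(wx @ cxy @ wy, full_matrices=False)
    return s, wx @ u, wy @ vt.T

rng = np.random.default_rng(0)
visual = rng.standard_normal((300, 6))                    # image visual features
importance = visual[:, :2] @ rng.standard_normal((2, 3))  # synthetic tag importance
importance += 0.01 * rng.standard_normal(importance.shape)
corrs, proj_v, proj_t = cca(visual, importance)
```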
Rahul Radhakrishnan Iyer, Sanjeel Parekh, Vikas Mohandoss, Anush Ramsurat, Bhiksha Raj, Rita Singh
Existing video indexing and retrieval methods on popular web-based multimedia
sharing websites are based on user-provided sparse tagging. This paper proposes
a very specific way of searching for video clips, based on the content of the
video. We present our work on Content-based Video Indexing and Retrieval using
the Correspondence-Latent Dirichlet Allocation (corr-LDA) probabilistic
framework. This model provides auto-annotation of videos in a
database with textual descriptors, and brings the added benefit of utilizing
the semantic relations between the content of the video and text. We use the
concept-level matching provided by corr-LDA to build correspondences between
text and multimedia, with the objective of retrieving content with increased
accuracy. In our experiments, we employ only the audio components of the
individual recordings and compare our results with an SVM-based approach.
Authors' comments: 8 Pages, Updated References, Added Figures
Yubao Sun, Renlong Hang, Qingshan Liu, Fuping Zhu, Hucheng Pei
In this paper, we propose a novel data-driven regression model for aerosol
optical depth (AOD) retrieval. First, we adopt a low rank representation (LRR)
model to learn a powerful representation of the spectral response. Then, graph
regularization is incorporated into the LRR model to capture the local
structure information and the nonlinear property of the remote-sensing data.
Since rich satellite retrieval results are easy to acquire, we use them
as a baseline to construct the graph. Finally, the learned feature
representation is fed into a support vector machine (SVM) to retrieve AOD.
Experiments are conducted on two widely used data sets acquired by different
sensors, and the experimental results show that the proposed method can achieve
superior performance compared to the physical models and other state-of-the-art
empirical models.
Authors' comments: 16 pages, 6 figures
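Graph regularization of the kind described can be sketched as follows, assuming a k-nearest-neighbour graph on the baseline satellite AOD values with heat-kernel weights (both choices are assumptions; the abstract only says the baseline is used to construct the graph). The penalty tr(Z L Zᵀ), with L = D - W the graph Laplacian and one representation per column of Z, equals ½ Σ_ij W_ij ||z_i - z_j||², so samples with similar baseline AOD are pushed toward similar representations. The LRR and SVM stages are not shown.

```python
import numpy as np

def knn_graph(baseline, k=3, sigma=0.1):
    """Symmetric kNN adjacency on 1-D baseline AOD values, heat-kernel weights."""
    d = np.abs(baseline[:, None] - baseline[None, :])
    n = len(baseline)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d[i])[1:k + 1]  # skip the sample itself at position 0
        W[i, nbrs] = np.exp(-d[i, nbrs] ** 2 / sigma ** 2)
    return np.maximum(W, W.T)             # symmetrize

def graph_penalty(Z, W):
    L = np.diag(W.sum(axis=1)) - W        # graph Laplacian
    return np.trace(Z @ L @ Z.T)

rng = np.random.default_rng(0)
baseline_aod = rng.uniform(0.05, 1.0, size=40)  # hypothetical satellite AOD values
Z = rng.standard_normal((8, 40))                # representations, one column per sample
W = knn_graph(baseline_aod)
penalty = graph_penalty(Z, W)
```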
Yue Cao, Mingsheng Long, Jianmin Wang, Philip S. Yu
Hashing is widely applied to approximate nearest neighbor search for
large-scale multimodal retrieval with storage and computation efficiency.
Cross-modal hashing improves the quality of hash coding by exploiting semantic
correlations across different modalities. Existing cross-modal hashing methods
first transform data into low-dimensional feature vectors, and then generate
binary codes by another separate quantization step. However, suboptimal hash
codes may be generated since the quantization error is not explicitly minimized
and the feature representation is not jointly optimized with the binary codes.
This paper presents a Correlation Hashing Network (CHN) approach to cross-modal
hashing, which jointly learns good data representations tailored to hash coding
and formally controls the quantization error. The proposed CHN is a hybrid deep
architecture that consists of a convolutional neural network for learning good
image representations, a multilayer perceptron for learning good text
representations, two hashing layers for generating compact binary codes, and a
structured max-margin loss that integrates all components to enable
learning similarity-preserving and high-quality hash codes. Extensive empirical
study shows that CHN yields state-of-the-art cross-modal retrieval performance
on standard benchmarks.
Authors' comments: 7 pages
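The quantization error and the link between binary codes and Hamming distance can be illustrated with a short sketch. It assumes tanh hashing layers whose continuous outputs h are binarized with sign; the quantization error is measured as ||b - h||², and for ±1 codes of length K the Hamming distance equals (K - ⟨b_i, b_j⟩)/2. Random inputs stand in for the image and text networks; this is not the CHN architecture or its max-margin loss.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 16                                        # code length in bits
h_img = np.tanh(rng.standard_normal((4, K)))  # continuous image codes (assumed tanh)
h_txt = np.tanh(rng.standard_normal((4, K)))  # continuous text codes

b_img, b_txt = np.sign(h_img), np.sign(h_txt)  # binary codes in {-1, +1}
quant_err = np.sum((b_img - h_img) ** 2) + np.sum((b_txt - h_txt) ** 2)

# Cross-modal Hamming distances from inner products: d = (K - <b_i, b_j>) / 2
ham = (K - b_img @ b_txt.T) / 2
```

Writing the distance as an inner product is what lets a similarity loss on continuous codes act as a surrogate for Hamming ranking.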
Boshra Rajaei, Sylvain Gigan, Florent Krzakala, Laurent Daudet
This paper addresses fundamental scaling issues that hinder phase retrieval (PR) in high dimensions. We show that, if the measurement matrix can be put into a generalized block-diagonal form, a large PR problem can be solved on separate blocks, at the cost of a few extra global measurements to merge the partial results. We illustrate this principle using two distinct PR methods, and discuss different design trade-offs. Experimental results indicate that this block-based PR framework can reduce computational cost and memory requirements by several orders of magnitude.
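The block principle can be checked directly: with a block-diagonal measurement matrix A = diag(A_1, ..., A_B), the intensities |Ax| split into the independent sub-problems |A_k x_k|, and each block is only determined up to its own global phase, which is why a few extra global measurements are needed to merge the partial results. A minimal NumPy sketch with random complex Gaussian blocks (an assumption, not the paper's measurement design):

```python
import numpy as np

rng = np.random.default_rng(0)
block_sizes, m_per_block = [4, 6, 5], 20

def cgauss(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

blocks = [cgauss(m_per_block, s) for s in block_sizes]
x_parts = [cgauss(s) for s in block_sizes]

# Assemble the full block-diagonal matrix and signal
A = np.zeros((m_per_block * len(blocks), sum(block_sizes)), dtype=complex)
r = c = 0
for Ak in blocks:
    A[r:r + Ak.shape[0], c:c + Ak.shape[1]] = Ak
    r, c = r + Ak.shape[0], c + Ak.shape[1]
x = np.concatenate(x_parts)

# The large problem splits into independent per-block problems
y_full = np.abs(A @ x)
y_blocks = np.concatenate([np.abs(Ak @ xk) for Ak, xk in zip(blocks, x_parts)])

# Rotating each block by its own phase leaves every block measurement unchanged...
phases = np.exp(1j * rng.uniform(0, 2 * np.pi, len(blocks)))
x_rot = np.concatenate([p * xk for p, xk in zip(phases, x_parts)])
# ...whereas a global measurement vector mixing all blocks can distinguish them
g = cgauss(x.size)
```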
Paolo Napoletano
In this paper we present an extensive evaluation of visual descriptors for the content-based retrieval of remote sensing (RS) images. The evaluation includes global hand-crafted, local hand-crafted, and Convolutional Neural Network (CNN) features coupled with four different Content-Based Image Retrieval schemes. We conducted all the experiments on two publicly available datasets: the 21-class UC Merced Land Use/Land Cover (LandUse) dataset and the 19-class High-resolution Satellite Scene dataset (SceneSat). The content of RS images can be quite heterogeneous, ranging from images containing fine-grained textures to coarse-grained ones or to images containing objects. It is therefore not obvious which descriptor should be employed in this domain to describe images with such variability. Results demonstrate that CNN-based features perform better than both global and local hand-crafted features, whatever retrieval scheme is adopted. Features extracted from SatResNet-50, a residual CNN suitably fine-tuned on the RS domain, show much better performance than those of a residual CNN pre-trained on multimedia scene and object images. Features extracted from NetVLAD, a CNN that considers both CNN and local features, work better than other CNN solutions on images that contain fine-grained textures and objects.
Anna Podlesnaya, Sergey Podlesnyy
We share the implementation details and testing results for a video retrieval
system based exclusively on features extracted by convolutional neural
networks. We show that deep learned features might serve as a universal signature
for the semantic content of video, useful in many search and retrieval tasks. We
further show that a graph-based storage structure for the video index allows
efficient retrieval of content with complicated spatial and temporal search
queries.
Authors' comments: 7 pages, 5 figures
Grzegorz Wilk, Zbigniew Włodarczyk
Multiplicity distributions $P(N)$ measured in multiparticle production
processes are most frequently described by the Negative Binomial Distribution
(NBD). However, with increasing collision energy some systematic discrepancies
become more and more apparent. They are usually attributed to the possible
multi-source structure of the production process and described using a
multi-NBD form of the multiplicity distribution. We investigate the possibility
of keeping a single NBD but with its parameters depending on the multiplicity
$N$. This is done by modifying the widely known clan model of particle
production leading to the NBD form of $P(N)$. This is then confronted with the
approach based on the so-called cascade-stochastic formalism, which relies on
different types of recurrence relations defining $P(N)$. We demonstrate that a
combination of both approaches allows the retrieval of additional valuable
information from the multiplicity distributions, namely the oscillatory
behavior of the counting statistics apparently visible in the high energy data.
Authors' comments: A few cosmetic changes. Version accepted for publication by J. Phys. G
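For reference, the NBD with mean m and shape parameter k is P(N) = Γ(N+k)/(Γ(N+1)Γ(k)) p^N (1-p)^k with p = m/(m+k). It satisfies the recurrence (N+1) P(N+1) = g(N) P(N) with g(N) = p(k+N), which is linear in N. In the clan model, the mean number of clans is k ln(1 + m/k) and the mean clan size is m divided by that number. The short Python sketch below checks these relations numerically; the parameter values are arbitrary and no fit to data is attempted.

```python
import math

def nbd_pmf(N, mean, k):
    """Negative binomial P(N) with mean `mean` and shape k (log-space for stability)."""
    p = mean / (mean + k)
    log_p = (math.lgamma(N + k) - math.lgamma(N + 1) - math.lgamma(k)
             + N * math.log(p) + k * math.log(1 - p))
    return math.exp(log_p)

mean, k = 20.0, 3.5
P = [nbd_pmf(N, mean, k) for N in range(400)]

# Recurrence (N+1) P(N+1) = g(N) P(N) with g(N) = p (k + N), linear in N
p = mean / (mean + k)
g = [(N + 1) * P[N + 1] / P[N] for N in range(50)]

# Clan model: mean number of clans and mean clan size
n_clans = k * math.log(1 + mean / k)
clan_size = mean / n_clans
```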
Mahyuddin K. M. Nasution, Shahrul Azman Mohd. Noah, Saidah Saad
Social networks have become one of the themes of government issues, mainly
in dealing with chaos. The use of the web is steadily gaining ground in these
issues. However, most web documents are unstructured and lack
semantics. In this paper we propose an Information Retrieval driven method for
dealing with the heterogeneity of features on the web. The proposed solution is to
compare several approaches that have shown the capacity to extract social relations:
strength relations and relations based on an online academic database.
Authors' comments: 11 pages, 1 figure