Emmanuel Candes, Xiaodong Li, Mahdi Soltanolkotabi
This paper considers the question of recovering the phase of an object from intensity-only measurements, a problem which naturally appears in X-ray crystallography and related disciplines. We study a physically realistic setup where one can modulate the signal of interest and then collect the intensity of its diffraction pattern, each modulation thereby producing a sort of coded diffraction pattern. We show that PhaseLift, a recent convex programming technique, recovers the phase information exactly from a number of random modulations, which is polylogarithmic in the number of unknowns. Numerical experiments with noiseless and noisy data complement our theoretical analysis and illustrate our approach.
Kirsty Kitto, Peter Bruza, Liane Gabora
As computers approach the physical limits of information storable in memory,
new methods will be needed to further improve information storage and
retrieval. We propose a quantum inspired vector based approach, which offers a
contextually dependent mapping from the subsymbolic to the symbolic
representations of information. If implemented computationally, this approach
would provide exceptionally high density of information storage, without the
traditionally required physical increase in storage capacity. The approach is
inspired by the structure of human memory and incorporates elements of
Gardenfors' Conceptual Space approach and Humphreys et al.'s matrix model of
memory.
Authors' comments: 8 pages; 2 figures
Mehmet Akçakaya, Vahid Tarokh
We consider the problem of sparse phase retrieval, where a $k$-sparse signal
${\bf x} \in {\mathbb R}^n \textrm{ (or } {\mathbb C}^n\textrm{)}$ is measured
as ${\bf y} = |{\bf Ax}|,$ where ${\bf A} \in {\mathbb R}^{m \times n} \textrm{
(or } {\mathbb C}^{m \times n}\textrm{ respectively)}$ is a measurement matrix
and $|\cdot|$ is the element-wise absolute value. For a real signal and a real
measurement matrix ${\bf A}$, we show that $m = 2k$ measurements are necessary
and sufficient to recover ${\bf x}$ uniquely. For complex signal ${\bf x} \in
{\mathbb C}^n$ and ${\bf A} \in {\mathbb C}^{m \times n}$, we show that $m =
4k-2$ phaseless measurements are sufficient to recover ${\bf x}$. It is known
that the multiplying constant $4$ in $m = 4k-2$ cannot be improved.
Authors' comments: 6 pages
Yoshihisa Udagawa
Duplicated code has a negative impact on the quality of software systems and
should be detected at least. In this paper, we discuss an approach that
improves source code retrieval using the structural information about the
programs. We developed a lexical parser to extract control statements and
method identifiers from Java programs. We propose a similarity measure that is
defined by the ratio of the number of sequentially full matching statements to
the number of sequentially partial matching ones. The similarity measure is
considered to be an extension of a set based similarity index, e.g.,
Sorensen-Dice index. Our key contribution of this research is the development
of a similarity retrieval algorithm that derives meaningful search conditions
from a given sequence, and then performs retrieval using all of the derived
conditions. Experiments show that our retrieval model outperforms the other
retrieval models up to 90.9% in the number of retrieved methods.
Authors' comments: 16 pages, 6 figures, 3 tables
Juri Ranieri, Amina Chebira, Yue M. Lu, Martin Vetterli
In a variety of fields, in particular those involving imaging and optics, we
often measure signals whose phase is missing or has been irremediably
distorted. Phase retrieval attempts the recovery of the phase information of a
signal from the magnitude of its Fourier transform to enable the reconstruction
of the original signal. A fundamental question then is: "Under which conditions
can we uniquely recover the signal of interest from its measured magnitudes?"
In this paper, we assume the measured signal to be sparse. This is a natural
assumption in many applications, such as X-ray crystallography, speckle imaging
and blind channel estimation. In this work, we derive a sufficient condition
for the uniqueness of the solution of the phase retrieval (PR) problem for both
discrete and continuous domains, and for one and multi-dimensional domains.
More precisely, we show that there is a strong connection between PR and the
turnpike problem, a classic combinatorial problem. We also prove that the
existence of collisions in the autocorrelation function of the signal may
preclude the uniqueness of the solution of PR. Then, assuming the absence of
collisions, we prove that the solution is almost surely unique on 1-dimensional
domains. Finally, we extend this result to multi-dimensional signals by solving
a set of 1-dimensional problems. We show that the solution of the
multi-dimensional problem is unique when the autocorrelation function has no
collisions, significantly improving upon a previously known result.
Authors' comments: submitted to IEEE TIT
Afonso S. Bandeira, Dustin G. Mixon
In many areas of imaging science, it is difficult to measure the phase of linear measurements. As such, one often wishes to reconstruct a signal from intensity measurements, that is, perform phase retrieval. In several applications the signal in question is believed to be sparse. In this paper, we use ideas from the recently developed polarization method for phase retrieval and provide an algorithm that is guaranteed to recover a sparse signal from a number of phaseless linear measurements that scales linearly with the sparsity of the signal (up to logarithmic factors). This is particularly remarkable since it is known that a certain popular class of convex methods is not able to perform recovery unless the number of measurements scales with the square of the sparsity of the signal. This is a shorter version of a more complete publication that will appear elsewhere.
Matthew Fickus, Dustin G. Mixon, Aaron A. Nelson, Yang Wang
In many applications, signals are measured according to a linear process, but
the phases of these measurements are often unreliable or not available. To
reconstruct the signal, one must perform a process known as phase retrieval.
This paper focuses on completely determining signals with as few intensity
measurements as possible, and on efficient phase retrieval algorithms from such
measurements. For the case of complex M-dimensional signals, we construct a
measurement ensemble of size 4M-4 which yields injective intensity
measurements; this is conjectured to be the smallest such ensemble. For the
case of real signals, we devise a theory of "almost" injective intensity
measurements, and we characterize such ensembles. Later, we show that phase
retrieval from M+1 almost injective intensity measurements is NP-hard,
indicating that computationally efficient phase retrieval must come at the
price of measurement redundancy.
Authors' comments: 18 pages, 1 figure
John G. DeVore, Joseph A. Kristl, Saul Rappaport
The aureoles around stars caused by thin cirrus limit nighttime measurement
opportunities for ground-based astronomy but can provide information on
high-altitude ice crystals for climate research. In this paper we attempt to
demonstrate quantitatively how this works. Aureole profiles can be followed out
to ~0.2 degrees from stars and ~0.5 degrees from Jupiter. Interpretation of
diffracted starlight is similar to that for sunlight, but emphasizes larger
particles. Stellar diffraction profiles are very distinctive, typically being
approximately flat out to a critical angle followed by gradually steepening
power-law falloff with slope less steep than -3. Using the relationship between
the phase function for diffraction and the average Fourier transform of the
projected area of complex ice crystals we show that defining particle size in
terms of average projected area normal to the propagation direction of the
starlight leads to a simple, analytic approximation representing large-particle
diffraction that is nearly independent of crystal habit. A similar analytic
approximation for the diffraction aureole allows it to be separated from the
point spread function and the sky background. Multiple scattering is
deconvolved using the Hankel transform leading to the diffraction phase
function. Application of constrained numerical inversion to the phase function
then yields a solution for the particle size distribution in the range between
~50 microns and ~400 microns. Stellar aureole measurements can provide one of
the very few, as well as least expensive, methods for retrieving cirrus
microphysical properties from ground-based observations.
Authors' comments: 18 pages, 21 figures, Journal of Geophysical Research, in press
Urmila Shrawankar, Anjali Mahajan
Along with the rapid development of information technology, the amount of
information generated at a given time far exceeds human's ability to organize,
search, and manipulate without the help of automatic systems. Now a days so
many tools and techniques are available for storage and retrieval of
information. User uses interface to interact with these techniques, mostly text
user interface (TUI) or graphical user interface (GUI). Here, I am trying to
introduce a new interface i.e. speech for information retrieval. The goal of
this project is to develop a speech interface that can search and read the
required information from the database effectively, efficiently and more
friendly. This tool will be highly useful to blind people, they will able to
demand the information to the computer by giving voice command/s (keyword)
through microphone and listen the required information using speaker or
headphones.
Authors' comments: Pages: 06 Figures: 01; Conference Proceedings International
Conference on Digital Libraires (ICDL 06), 2006, TERI, India
D. García Yárnoz, J. P. Sánchez, C. R. McInnes
Asteroids and comets are of strategic importance for science in an effort to understand the formation, evolution and composition of the Solar System. Near-Earth Objects (NEOs) are of particular interest because of their accessibility from Earth, but also because of their speculated wealth of material resources. The exploitation of these resources has long been discussed as a means to lower the cost of future space endeavours. In this paper, we consider the currently known NEO population and define a family of so-called Easily Retrievable Objects (EROs), objects that can be transported from accessible heliocentric orbits into the Earth's neighbourhood at affordable costs. The asteroid retrieval transfers are sought from the continuum of low energy transfers enabled by the dynamics of invariant manifolds; specifically, the retrieval transfers target planar, vertical Lyapunov and halo orbit families associated with the collinear equilibrium points of the Sun-Earth Circular Restricted Three Body problem. The judicious use of these dynamical features provides the best opportunity to find extremely low energy Earth transfers for asteroid material. A catalogue of asteroid retrieval candidates is then presented. Despite the highly incomplete census of very small asteroids, the ERO catalogue can already be populated with 12 different objects retrievable with less than 500 m/s of {\Delta}v. Moreover, the approach proposed represents a robust search and ranking methodology for future retrieval candidates that can be automatically applied to the growing survey of NEOs.
Diederik Aerts, Jan Broekaert, Sandro Sozzo, Tomas Veloz
In recent years, quantum-based methods have promisingly integrated the
traditional procedures in information retrieval (IR) and natural language
processing (NLP). Inspired by our research on the identification and
application of quantum structures in cognition, more specifically our work on
the representation of concepts and their combinations, we put forward a
'quantum meaning based' framework for structured query retrieval in text
corpora and standardized testing corpora. This scheme for IR rests on
considering as basic notions, (i) 'entities of meaning', e.g., concepts and
their combinations and (ii) traces of such entities of meaning, which is how
documents are considered in this approach. The meaning content of these
'entities of meaning' is reconstructed by solving an 'inverse problem' in the
quantum formalism, consisting of reconstructing the full states of the entities
of meaning from their collapsed states identified as traces in relevant
documents. The advantages with respect to traditional approaches, such as
Latent Semantic Analysis (LSA), are discussed by means of concrete examples.
Authors' comments: 11 pages
Marc Sloan, Jun Wang
Many Information Retrieval (IR) models make use of offline statistical
techniques to score documents for ranking over a single period, rather than use
an online, dynamic system that is responsive to users over time. In this paper,
we explicitly formulate a general Multi Period Information Retrieval problem,
where we consider retrieval as a stochastic yet controllable process. The
ranking action during the process continuously controls the retrieval system's
dynamics, and an optimal ranking policy is found in order to maximise the
overall users' satisfaction over the multiple periods as much as possible. Our
derivations show interesting properties about how the posterior probability of
the documents relevancy evolves from users feedbacks through clicks, and
provides a plug-in framework for incorporating different click models. Based on
the Multi-Armed Bandit theory, we propose a simple implementation of our
framework using a dynamic ranking rule that takes rank bias and exploration of
documents into account. We use TREC data to learn a suitable exploration
parameter for our model, and then analyse its performance and a number of
variants using a search log data set; the experiments suggest an ability to
explore document relevance dynamically over time using user feedback in a way
that can handle rank bias.
Authors' comments: 8 pages, 3 tables, published at the Workshop on Web Search Click Data
2013
Kishore Jaganathan, Samet Oymak, Babak Hassibi
We consider the problem of recovering signals from their power spectral
density. This is a classical problem referred to in literature as the phase
retrieval problem, and is of paramount importance in many fields of applied
sciences. In general, additional prior information about the signal is required
to guarantee unique recovery as the mapping from signals to power spectral
density is not one-to-one. In this paper, we assume that the underlying signals
are sparse. Recently, semidefinite programming (SDP) based approaches were
explored by various researchers. Simulations of these algorithms strongly
suggest that signals upto $o(\sqrt{n})$ sparsity can be recovered by this
technique. In this work, we develop a tractable algorithm based on reweighted
$l_1$-minimization that recovers a sparse signal from its power spectral
density for significantly higher sparsities, which is unprecedented.
We discuss the square-root bottleneck of the existing convex algorithms and
show that a $k$-sparse signal can be efficiently recovered using $O(k^2logn)$
phaseless Fourier measurements. We also show that a $k$-sparse signal can be
recovered using only $O(k log n)$ phaseless measurements if we are allowed to
design the measurement matrices.
Authors' comments: 6 pages, 2 figures
Fatiha Boubekeur, Wassila Azzoug
Traditional information retrieval systems rely on keywords to index documents
and queries. In such systems, documents are retrieved based on the number of
shared keywords with the query. This lexical-focused retrieval leads to
inaccurate and incomplete results when different keywords are used to describe
the documents and queries. Semantic-focused retrieval approaches attempt to
overcome this problem by relying on concepts rather than on keywords to
indexing and retrieval. The goal is to retrieve documents that are semantically
relevant to a given user query. This paper addresses this issue by proposing a
solution at the indexing level. More precisely, we propose a novel approach for
semantic indexing based on concepts identified from a linguistic resource. In
particular, our approach relies on the joint use of WordNet and WordNetDomains
lexical databases for concept identification. Furthermore, we propose a
semantic-based concept weighting scheme that relies on a novel definition of
concept centrality. The resulting system is evaluated on the TIME test
collection. Experimental results show the effectiveness of our proposition over
traditional IR approaches.
Authors' comments: 18 pages, 5 tables, 3 figures
Bernhard G. Bodmann, Nathaniel Hammen
We investigate the recovery of vectors from magnitudes of frame coefficients
when the frames have a low redundancy, meaning a small number of frame vectors
compared to the dimension of the Hilbert space. We first show that for vectors
in d dimensions, 4d-4 suitably chosen frame vectors are sufficient to uniquely
determine each signal, up to an overall unimodular constant, from the
magnitudes of its frame coefficients. Then we discuss the effect of noise and
show that 8d-4 frame vectors provide a stable recovery if part of the frame
coefficients is bounded away from zero. In this regime, perturbing the
magnitudes of the frame coefficients by noise that is sufficiently small
results in a recovery error that is at most proportional to the noise level.
Authors' comments: 12 pages AMSLaTeX, 1 figure
Yoav Shechtman, Amir Beck, Yonina C. Eldar
We consider the problem of phase retrieval, namely, recovery of a signal from
the magnitude of its Fourier transform, or of any other linear transform. Due
to the loss of the Fourier phase information, this problem is ill-posed.
Therefore, prior information on the signal is needed in order to enable its
recovery. In this work we consider the case in which the signal is known to be
sparse, i.e., it consists of a small number of nonzero elements in an
appropriate basis. We propose a fast local search method for recovering a
sparse signal from measurements of its Fourier transform (or other linear
transform) magnitude which we refer to as GESPAR: GrEedy Sparse PhAse
Retrieval. Our algorithm does not require matrix lifting, unlike previous
approaches, and therefore is potentially suitable for large scale problems such
as images. Simulation results indicate that GESPAR is fast and more accurate
than existing techniques in a variety of settings.
Authors' comments: Generalized to non-Fourier measurements, added 2D simulations, and a
theorem for convergence to stationary point
Andrey Nikolaev
Spinlocks are widely used in database engines for processes synchronization.
KGX mutexes is new retrial spinlocks appeared in contemporary Oracle versions
for submicrosecond synchronization. The mutex contention is frequently observed
in highly concurrent OLTP environments.
This work explores how Oracle mutexes operate, spin, and sleep. It develops
predictive mathematical model and discusses parameters and statistics related
to mutex performance tuning, as well as results of contention experiments.
Authors' comments: Proceedings of International Conference on Informatics MEDIAS2012.
Cyprus, Limassol, May 7--14, 2012. ISBN 978-5-88835-023-2. 12 pages, 15
figures
Roberto Konow, Gonzalo Navarro
An optimal index solving top-k document retrieval [Navarro and Nekrich,
SODA12] takes O(m + k) time for a pattern of length m, but its space is at
least 80n bytes for a collection of n symbols. We reduce it to 1.5n to 3n
bytes, with O(m+(k+log log n) log log n) time, on typical texts. The index is
up to 25 times faster than the best previous compressed solutions, and requires
at most 5% more space in practice (and in some cases as little as one half).
Apart from replacing classical by compressed data structures, our main idea is
to replace suffix tree sampling by frequency thresholding to achieve
compression.
Authors' comments: 10 pages
Yonina C. Eldar, Shahar Mendelson
We consider stability and uniqueness in real phase retrieval problems over general input sets. Specifically, we assume the data consists of noisy quadratic measurements of an unknown input x in R^n that lies in a general set T and study conditions under which x can be stably recovered from the measurements. In the noise-free setting we derive a general expression on the number of measurements needed to ensure that a unique solution can be found in a stable way, that depends on the set T through a natural complexity parameter. This parameter can be computed explicitly for many sets T of interest. For example, for k-sparse inputs we show that O(k\log(n/k)) measurements are needed, and when x can be any vector in R^n, O(n) measurements suffice. In the noisy case, we show that if one can find a value for which the empirical risk is bounded by a given, computable constant (that depends on the set T), then the error with respect to the true input is bounded above by an another, closely related complexity parameter of the set. By choosing an appropriate number N of measurements, this bound can be made arbitrarily small, and it decays at a rate faster than N^{-1/2+\delta} for any \delta>0. In particular, for k-sparse vectors stable recovery is possible from O(k\log(n/k)\log k) noisy measurements, and when x can be any vector in R^n, O(n \log n) noisy measurements suffice. We also show that the complexity parameter for the quadratic problem is the same as the one used for analyzing stability in linear measurements under very general conditions. Thus, no substantial price has to be paid in terms of stability if there is no knowledge of the phase.
Yi Wang
This paper gives a summary of the content-based Image Retrieval and Content-based Audio Retrieval, which are two parts of the Content-based Retrieval. Content-based Retrieval is the retrieval based on the features of the content. Generally, it is a way to extract features of the media data and find other data with the similar features from the database automatically. Content-based Retrieval can not only work on discrete media like texts, but also can be used on continuous media, such as video and audio.