Winfried Gödert
Starting from an unsolved problem of information retrieval this paper presents an ontology-based model for indexing and retrieval. The model combines the methods and experiences of cognitive-to-interpret indexing languages with the strengths and possibilities of formal knowledge representation. The core component of the model uses inferences along the paths of typed relations between the entities of a knowledge representation for enabling the determination of hit quantities in the context of retrieval processes. The entities are arranged in aspect-oriented facets to ensure a consistent hierarchical structure. The possible consequences for indexing and retrieval are discussed.
Rahul R Upadhyay
In this paper a novel approach for matrix manipulation and indexing is
proposed .Here the elements in a row of matrix are designated by numeric value
called permutation index followed by the elements of the row being randomised.
This is done for all the rows of the matrix and in the end the set of
permutation indices are put in the parent matrix and random locations depending
on a pre decided scheme called passkey. This passkey is used to put back the
elements of all the rows back in the correct sequence. This approach finds
application in data encapsulation and hiding.
Authors' comments: 6 pages, 4 figures, 5 tables
Aldo Conca, Dan Edidin, Milena Hering, Cynthia Vinzant
A complex frame is a collection of vectors that span $\mathbb{C}^M$ and
define measurements, called intensity measurements, on vectors in
$\mathbb{C}^M$. In purely mathematical terms, the problem of phase retrieval is
to recover a complex vector from its intensity measurements, namely the modulus
of its inner product with these frame vectors. We show that any vector is
uniquely determined (up to a global phase factor) from $4M-4$ generic
measurements. To prove this, we identify the set of frames defining
non-injective measurements with the projection of a real variety and bound its
dimension.
Authors' comments: 11 pages
Peter B. Lerner
The widely considered assertion is that the unitarity of quantum mechanical evolution assures the preservation of information. It is even promoted in popular literature as an established fact. (Susskind, 2008) Yet, a simple chain of reasoning demonstrates that: 1) almost any evolutionary operator can be well approximated by a degenerate (finite-rank) operator and 2) one needs an eternity to retrieve information exactly from a nonstationary quantum state and to distinguish between arbitrary unitary operator and its finite-dimensional approximations.
Sk Sazim, Chiranjeevi Vanarasa, Indranil Chakrabarty, Kannan Srinathan
In extant quantum secret sharing protocols, once the secret is shared in a
quantum network (\textsc{qnet}) it can not be retrieved back, even if the
dealer wishes that her secret no longer be available in the network. For
instance, if the dealer is part of two \textsc{qnet}s, say $\mathcal{Q}_1$ and
$\mathcal{Q}_2$ and subsequently finds that $\mathcal{Q}_2$ is more reliable
than $\mathcal{Q}_1$, the dealer may wish to transfer all her secrets from
$\mathcal{Q}_1$ to $\mathcal{Q}_2$. In this work we address this problem by
designing a protocol that enables the source/dealer to bring back the
information shared in the network, if desired. Unlike classical revocation,
no-cloning-theorem automatically ensures that the secret is no longer shared in
the network.
The implications of our results are multi-fold. One interesting implication
of our technique is the possibility of routing qubits in asynchronous
\textsc{qnets}. By asynchrony we mean that the requisite data/resources are
intermittently available (but not necessarily simultaneously) in the
\textsc{qnet}. For example, we show that a source $S$ can send quantum
information to a destination $R$ even though (a) $S$ and $R$ share no quantum
resource, (b) $R$'s identity is {\em unknown}\/ to $S$ initially, (c) $S$
herself can be $R$ at a later date and/or in a different location to bequeath
her information and (d) the path chosen for routing the secret may hit a
dead-end due to resource constraints. Another implication of our technique is
the possibility of using {\em insecure}\/ resources. For instance, it may
safely store its private information with a neighboring organization without
revealing data to the host and losing control over retrieving the data.
Putting the two implications together, namely routing and secure storage, it
is possible to envision applications like quantum mail (qmail) as an outsourced
service.
Authors' comments: 5 Pages including Appendix, 3 Figures and 3 Tables
K. Tikhonov, K. Samburskaya, T. Golubeva, Yu. Golubev
In this article the ability to record, store, and read out the quantum
properties of light is studied. The discussion is based on high-speed and
adiabatic models of quantum memory in lambda-configuration and in the limit of
strong resonance. We show that in this case the equality between efficiency and
squeezing ratio, predicted by the simple beamsplitter model, is broken. The
requirement of the maximum squeezing in the output pulse should not be
accompanied by the requirement of maximum efficiency of memory, as in the
beamsplitter model. We have demonstrated a high output pulse squeezing, when
the efficiency reached only about 50%. Comprehension of this "paradox" is
achieved on the basis of mode analysis. The memories eigenmodes, which have an
impact on the memory process, are found numerically. Also, the spectral
analysis of modes was performed to match the spectral width of the input signal
to the capacities of the memories.
Authors' comments: 19 pages, 8 figures, RevTeX4. Submitted to Phys. Rev. A
Abdul Kadir, Lukito Edi Nugroho, Adhi Susanto, Paulus Insap Santosa
One of important components in an image retrieval system is selecting a
distance measure to compute rank between two objects. In this paper, several
distance measures were researched to implement a foliage plant retrieval
system. Sixty kinds of foliage plants with various leaf color and shape were
used to test the performance of 7 different kinds of distance measures: city
block distance, Euclidean distance, Canberra distance, Bray-Curtis distance, x2
statistics, Jensen Shannon divergence and Kullback Leibler divergence. The
results show that city block and Euclidean distance measures gave the best
performance among the others.
Authors' comments: 14 pages, International Journal of Signal Processing, Image
Processing and Pattern Recognition Vol. 5, No. 2, June, 2012
Kishore Jaganathan, Samet Oymak, Babak Hassibi
The problem of signal recovery from its Fourier transform magnitude is of paramount importance in various fields of engineering and has been around for over 100 years. Due to the absence of phase information, some form of additional information is required in order to be able to uniquely identify the signal of interest. In this work, we focus our attention on discrete-time sparse signals (of length $n$). We first show that, if the DFT dimension is greater than or equal to $2n$, almost all signals with {\em aperiodic} support can be uniquely identified by their Fourier transform magnitude (up to time-shift, conjugate-flip and global phase). Then, we develop an efficient Two-stage Sparse Phase Retrieval algorithm (TSPR), which involves: (i) identifying the support, i.e., the locations of the non-zero components, of the signal using a combinatorial algorithm (ii) identifying the signal values in the support using a convex algorithm. We show that TSPR can {\em provably} recover most $O(n^{1/2-\eps})$-sparse signals (up to a time-shift, conjugate-flip and global phase). We also show that, for most $O(n^{1/4-\eps})$-sparse signals, the recovery is {\em robust} in the presence of measurement noise. Numerical experiments complement our theoretical analysis and verify the effectiveness of TSPR.
Yuxin Chen, Yonina C. Eldar, Andrea J. Goldsmith
We explore a fundamental problem of super-resolving a signal of interest from
a few measurements of its low-pass magnitudes. We propose a 2-stage tractable
algorithm that, in the absence of noise, admits perfect super-resolution of an
$r$-sparse signal from $2r^2-2r+2$ low-pass magnitude measurements. The spike
locations of the signal can assume any value over a continuous disk, without
increasing the required sample size. The proposed algorithm first employs a
conventional super-resolution algorithm (e.g. the matrix pencil approach) to
recover unlabeled sets of signal correlation coefficients, and then applies a
simple sorting algorithm to disentangle and retrieve the true parameters in a
deterministic manner. Our approach can be adapted to multi-dimensional spike
models and random Fourier sampling by replacing its first step with other
harmonic retrieval algorithms.
Authors' comments: accepted to IEEE ICASSP 2014
Philipp Mayr, Peter Mutschke
Bibliometric techniques are not yet widely used to enhance retrieval
processes in digital libraries, although they offer value-added effects for
users. In this paper we will explore how statistical modelling of scholarship,
such as Bradfordizing or network analysis of coauthorship network, can improve
retrieval services for specific communities, as well as for large, cross-domain
large collections. This paper aims to raise awareness of the missing link
between information retrieval (IR) and bibliometrics / scientometrics and to
create a common ground for the incorporation of bibliometric-enhanced services
into retrieval at the digital library interface.
Authors' comments: 4 pages, IEEE BigData 2013, Workshop on Scholarly Big Data:
Challenges and Ideas
Sohan Seth, Niko Välimäki, Samuel Kaski, Antti Honkela
Over the recent years, the field of whole metagenome shotgun sequencing has
witnessed significant growth due to the high-throughput sequencing technologies
that allow sequencing genomic samples cheaper, faster, and with better coverage
than before. This technical advancement has initiated the trend of sequencing
multiple samples in different conditions or environments to explore the
similarities and dissimilarities of the microbial communities. Examples include
the human microbiome project and various studies of the human intestinal tract.
With the availability of ever larger databases of such measurements, finding
samples similar to a given query sample is becoming a central operation. In
this paper, we develop a content-based exploration and retrieval method for
whole metagenome sequencing samples. We apply a distributed string mining
framework to efficiently extract all informative sequence $k$-mers from a pool
of metagenomic samples and use them to measure the dissimilarity between two
samples. We evaluate the performance of the proposed approach on two human gut
metagenome data sets as well as human microbiome project metagenomic samples.
We observe significant enrichment for diseased gut samples in results of
queries with another diseased sample and very high accuracy in discriminating
between different body sites even though the method is unsupervised. A software
implementation of the DSM framework is available at
https://github.com/HIITMetagenomics/dsm-framework
Authors' comments: 16 pages; additional results
Henrik Ohlsson, Yonina C. Eldar
The phase retrieval problem has a long history and is an important problem in many areas of optics. Theoretical understanding of phase retrieval is still limited and fundamental questions such as uniqueness and stability of the recovered solution are not yet fully understood. This paper provides several additions to the theoretical understanding of sparse phase retrieval. In particular we show that if the measurement ensemble can be chosen freely, as few as 4k-1 phaseless measurements suffice to guarantee uniqueness of a k-sparse M-dimensional real solution. We also prove that 2(k^2-k+1) Fourier magnitude measurements are sufficient under rather general conditions.
Spyros Hadjichristodoulou, Donald E. Porter, David S. Warren
In this paper we investigate XSB-Prolog as a static analysis engine for data
represented by medium-sized graphs. We use XSB-Prolog to automatically identify
function dependencies in the Linux Kernel---queries that are difficult to
implement efficiently in a commodity database and that developers often have to
identify manually. This project illustrates that Prolog systems are ideal for
building tools for use in other disciplines that require sophisticated
inferences, because Prolog is both declarative and can efficiently implement
complex problem specifications through tabling and indexing.
Authors' comments: Part of WLPE 2013 proceedings (arXiv:1308.2055)
Ali Wali, Adel M. Alimi
In this paper, we present an overview of a multimodal system to indexing and
searching video sequence by the content that has been developed within the
REGIMVid project. A large part of our system has been developed as part of
TRECVideo evaluation. The MAVSIR platform provides High-level feature
extraction from audio-visual content and concept/event-based video retrieval.
We illustrate the architecture of the system as well as provide an overview of
the descriptors supported to date. Then we demonstrate the usefulness of the
toolbox in the context of feature extraction, concepts/events learning and
retrieval in large collections of video surveillance dataset. The results are
encouraging as we are able to get good results on several event categories,
while for all events we have gained valuable insights and experience.
Authors' comments: 7 pages
Jae-Min Lee, Kevin Heng, Patrick G. J. Irwin
Directly-imaged exoplanets are unexplored laboratories for the application of
the spectral and temperature retrieval method, where the chemistry and
composition of their atmospheres are inferred from inverse modeling of the
available data. As a pilot study, we focus on the extrasolar gas giant HR 8799b
for which more than 50 data points are available. We upgrade our non-linear
optimal estimation retrieval method to include a phenomenological model of
clouds that requires the cloud optical depth and monodisperse particle size to
be specified. Previous studies have focused on forward models with assumed
values of the exoplanetary properties; there is no consensus on the best-fit
values of the radius, mass, surface gravity and effective temperature of HR
8799b. We show that cloudfree models produce reasonable fits to the data if the
atmosphere is of super-solar metallicity and non-solar elemental abundances.
Intermediately cloudy models with moderate values of the cloud optical depth
and micron-sized particles provide an equally reasonable fit to the data and
require a lower mean molecular weight. We report our best-fit values for the
radius, mass, surface gravity and effective temperature of HR 8799b. The mean
molecular weight is about 3.8, while the carbon-to-oxygen ratio is about unity
due to the prevalence of carbon monoxide. Our study emphasizes the need for
robust claims about the nature of an exoplanetary atmosphere to be based on
analyses involving both photometry and spectroscopy and inferred from beyond a
few photometric data points, such as are typically reported for hot Jupiters.
Authors' comments: Accepted by ApJ. 19 pages, 19 figures, 2 tables. Several improvements
made ("radius ratio problem" removed, resolution of contour plots doubled,
text expanded), major conclusions unchanged
Mohammed Alaeddine Abderrahim, Mohammed El Amine Abderrahim, Mohammed Amine Chikh
In the context of arabic Information Retrieval Systems (IRS) guided by arabic
ontology and to enable those systems to better respond to user requirements,
this paper aims to representing documents and queries by the best concepts
extracted from Arabic Wordnet. Identified concepts belonging to Arabic WordNet
synsets are extracted from documents and queries, and those having a single
sense are expanded. The expanded query is then used by the IRS to retrieve the
relevant documents searched. Our experiments are based primarily on a medium
size corpus of arabic text. The results obtained shown us that there are a
global improvement in the performance of the arabic IRS.
Authors' comments: 6 pages,2 figures,7 tables
Bo Li, Henry Johan
3D model retrieval techniques can be classified as histogram-based,
view-based and graph-based approaches. We propose a hybrid shape descriptor
which combines the global and local radial distance features by utilizing the
histogram-based and view-based approaches respectively. We define an
area-weighted global radial distance with respect to the center of the bounding
sphere of the model and encode its distribution into a 2D histogram as the
global radial distance shape descriptor. We then uniformly divide the bounding
cube of a 3D model into a set of small cubes and define their centers as local
centers. Then, we compute the local radial distance of a point based on the
nearest local center. By sparsely sampling a set of views and encoding the
local radial distance feature on the rendered views by color coding, we extract
the local radial distance shape descriptor. Based on these two shape
descriptors, we develop a hybrid radial distance shape descriptor for 3D model
retrieval. Experiment results show that our hybrid shape descriptor outperforms
several typical histogram-based and view-based approaches.
Authors' comments: 6
L. Bouchet, P. -R Amestoy, A. Buttari, F. -H. Rouet, M. Chauvin
The INTEGRAL/SPI, X-gamma-ray spectrometer (20 keV - 8 MeV) is an instrument
for which recovering source intensity variations is not straightforward and can
constitute a difficulty for data analysis. In most cases, determining the
source intensity changes between exposures is largely based on a priori
information. We propose techniques that help to overcome the difficulty related
to source intensity variations, which make this step more rational. In
addition, the constructed "synthetic" light curves should permit us to obtain a
sky model that describes the data better and optimizes the source
signal-to-noise ratios. For this purpose, the time intensity variation of each
source was modeled as a combination of piecewise segments of time during which
a given source exhibits a constant intensity. To optimize the signal-to-noise
ratios, the number of segments was minimized. We present a first method that
takes advantage of previous time series that can be obtained from another
instrument on-board the INTEGRAL observatory. A data segmentation algorithm was
then used to synthesize the time series into segments. The second method no
longer needs external light curves, but solely SPI raw data. For this, we
developed a specific algorithm that involves the SPI transfer function. The
time segmentation algorithms that were developed solve a difficulty inherent to
the SPI instrument, which is the intensity variations of sources between
exposures, and it allows us to obtain more information about the sources'
behavior.
Authors' comments: 23 pages, 15 figures. Accepted for publication in A&A
Yuan Sun
If the phase retrieval problem can be solved by a method similar to that of
solving a system of linear equations under the context of FFT, the time
complexity of computer based phase retrieval algorithm would be reduced. Here I
present such a method which is recursive but highly non-linear in nature, based
on a close look at the Fourier spectrum of the square of the function norm. In
a one dimensional problem it takes $O(N^2)$ steps of calculation to recover the
phases of an N component complex vector. This method could work in 1, 2 or even
higher dimensional finite Fourier analysis without changes in the behavior of
time complexity. For one dimensional problem the performance of an algorithm
based on this method is shown, where the limitations are discussed too,
especially when subject to random noises which contains significant high
frequency components.
Authors' comments: 4 pages, 4 figures
Ämin Baumeler, Anne Broadbent
In Private Information Retrieval (PIR), a client queries an n-bit database in
order to retrieve an entry of her choice, while maintaining privacy of her
query value. Chor, Goldreich, Kushilevitz, and Sudan showed that, in the
information-theoretical setting, a linear amount of communication is required
for classical PIR protocols (and thus that the trivial protocol is optimal).
This linear lower bound was shown by Nayak to hold also in the quantum setting.
Here, we extend Nayak's result by considering approximate privacy, and
requiring security only against "specious" adversaries, which are, in analogy
to classical honest-but-curious adversaries, the weakest reasonable quantum
adversaries. We show that, even in this weakened scenario, Quantum Private
Information Retrieval (QPIR) requires n qubits of communication. From this
follows that Le Gall's recent QPIR protocol with sublinear communication
complexity is not information-theoretically private, against the weakest
reasonable cryptographic adversary.
Authors' comments: 8 pages, minor typos fixed, affiliation of A. B. changed,
acknowledgments extended