Philipp Mayr, Andrea Scharnhorst
This special issue brings together eight papers from experts of communities
which often have been perceived as different once: bibliometrics,
scientometrics and informetrics on the one side and information retrieval on
the other. The idea of this special issue started at the workshop "Combining
Bibliometrics and Information Retrieval" held at the 14th International
Conference of Scientometrics and Informetrics, Vienna, July 14-19, 2013. Our
motivation as guest editors started from the observation that main discourses
in both fields are different, that communities are only partly overlapping and
from the belief that a knowledge transfer would be profitable for both sides.
Authors' comments: 8 pages, 1 figure, editorial for a special issue to appear in
Scientometrics
Çağkan Yapar, Volker Pohl, Holger Boche
This paper considers the problem of recovering a $k$-sparse, $N$-dimensional
complex signal from Fourier magnitude measurements. It proposes a Fourier
optics setup such that signal recovery up to a global phase factor is possible
with very high probability whenever $M \gtrsim 4k\log_2(N/k)$ random Fourier
intensity measurements are available. The proposed algorithm is comprised of
two stages: An algebraic phase retrieval stage and a compressive sensing step
subsequent to it. Simulation results are provided to demonstrate the
applicability of the algorithm for noiseless and noisy scenarios.
Authors' comments: 8 pages, 4 figures, submitted to ICASSP 2015 on Oct 6th, 2014
Terence H. Chan, Siu-Wai Ho, Hirosuke Yamamoto
Private information retrieval scheme for coded data storage is considered in
this paper. We focus on the case where the size of each data record is large
and hence only the download cost (but not the upload cost for transmitting
retrieval queries) is of interest. We prove that the tradeoff between storage
cost and retrieval/download cost depends on the number of data records in the
system. We also propose a fairly general class of linear storage codes and
retrieval schemes and derive conditions under which our retrieval schemes are
error-free and private. Tradeoffs between the storage cost and retrieval costs
are also obtained. Finally, we consider special cases when the underlying
storage code is based on an MDS code. Using our proposed method, we show that a
randomly generated retrieval scheme is indeed very likely to be private and
error-free.
Authors' comments: submitted to IEEE Journal of Selected Topics in Signal Processing
Mark Iwen, Aditya Viswanathan, Yang Wang
In this short note we propose a simple two-stage sparse phase retrieval strategy that uses a near-optimal number of measurements, and is both computationally efficient and robust to measurement noise. In addition, the proposed strategy is fairly general, allowing for a large number of new measurement constructions and recovery algorithms to be designed with minimal effort.
Xirong Li
Due to the subjective nature of social tagging, measuring the relevance of social tags with respect to the visual content is crucial for retrieving the increasing amounts of social-networked images. Witnessing the limit of a single measurement of tag relevance, we introduce in this paper tag relevance fusion as an extension to methods for tag relevance estimation. We present a systematic study, covering tag relevance fusion in early and late stages, and in supervised and unsupervised settings. Experiments on a large present-day benchmark set show that tag relevance fusion leads to better image retrieval. Moreover, unsupervised tag relevance fusion is found to be practically as effective as supervised tag relevance fusion, but without the need of any training efforts. This finding suggests the potential of tag relevance fusion for real-world deployment.
Djallel Bouneffouf
Context-Based Information Retrieval is recently modelled as an exploration/
exploitation trade-off (exr/exp) problem, where the system has to choose
between maximizing its expected rewards dealing with its current knowledge
(exploitation) and learning more about the unknown user's preferences to
improve its knowledge (exploration). This problem has been addressed by the
reinforcement learning community but they do not consider the risk level of the
current user's situation, where it may be dangerous to explore the
non-top-ranked documents the user may not desire in his/her current situation
if the risk level is high. We introduce in this paper an algorithm named
CBIR-R-greedy that considers the risk level of the user's situation to
adaptively balance between exr and exp.
Authors' comments: arXiv admin note: substantial text overlap with arXiv:1408.2195
Dilip K. Limbu, Andy M. Connor, Russel Pears, Stephen G. MacDonell
Contextual retrieval is a critical technique for today's search engines in terms of facilitating queries and returning relevant information. This paper reports on the development and evaluation of a system designed to tackle some of the challenges associated with contextual information retrieval from the World Wide Web (WWW). The developed system has been designed with a view to capturing both implicit and explicit user data which is used to develop a personal contextual profile. Such profiles can be shared across multiple users to create a shared contextual knowledge base. These are used to refine search queries and improve both the search results for a user as well as their search experience. An empirical study has been undertaken to evaluate the system against a number of hypotheses. In this paper, results related to one are presented that support the claim that users can find information more readily using the contextual search system.
Simon Gog, Matthias Petri
We engineer a self-index based retrieval system capable of rank-safe
evaluation of top-k queries. The framework generalizes the GREEDY approach of
Culpepper et al. (ESA 2010) to handle multi-term queries, including over
phrases. We propose two techniques which significantly reduce the ranking time
for a wide range of popular Information Retrieval (IR) relevance measures, such
as TFxIDF and BM25. First, we reorder elements in the document array according
to document weight. Second, we introduce the repetition array, which
generalizes Sadakane's (JDA 2007) document frequency structure to document
subsets. Combining document and repetition array, we achieve attractive
functionality-space trade-offs. We provide an extensive evaluation of our
system on terabyte-sized IR collections.
Authors' comments: 14 pages, 9 figures
Shasha Bu, Yu-Jin Zhang
A novel representation of images for image retrieval is introduced in this
paper, by using a new type of feature with remarkable discriminative power.
Despite the multi-scale nature of objects, most existing models perform feature
extraction on a fixed scale, which will inevitably degrade the performance of
the whole system. Motivated by this, we introduce a hierarchical sparse coding
architecture for image retrieval to explore multi-scale cues. Sparse codes
extracted on lower layers are transmitted to higher layers recursively. With
this mechanism, cues from different scales are fused. Experiments on the
Holidays dataset show that the proposed method achieves an excellent retrieval
performance with a small code length.
Authors' comments: 5 pages, 6 figures, conference
Peter Mutschke, Andrea Scharnhorst, Christophe Guéret, Philipp Mayr, Preben Hansen, Aida Slavic
Information systems usually show as a particular point of failure the
vagueness between user search terms and the knowledge orders of the information
space in question. Some kind of guided searching therefore becomes more and
more important in order to precisely discover information without knowing the
right search terms. Knowledge maps of digital library collections are promising
navigation tools through knowledge spaces but still far away from being
applicable for searching digital libraries. However, there is no continuous
knowledge exchange between the "map makers" on the one hand and the Information
Retrieval (IR) specialists on the other hand. Thus, there is also a lack of
models that properly combine insights of the two strands. The proposed workshop
aims at bringing together these two communities: experts in IR reflecting on
visual enhanced search interfaces and experts in knowledge mapping reflecting
on visualizations of the content of a collection that might also present a
context for a search term in a visual manner. The intention of the workshop is
to raise awareness of the potential of interactive knowledge maps for
information seeking purposes and to create a common ground for experiments
aiming at the incorporation of knowledge maps into IR models at the level of
the user interface.
Authors' comments: 6 pages, accepted workshop proposal for Digital Libraries 2014
Kutlu Emre Yılmaz, Ahmet Arslan, Ozgur Yilmazel
We used Lemur Toolkit, an open source toolkit designed for Information
Retrieval (IR) research, for our automated indexing and retrieval experiments
on a TREC-like test collection for Turkish. We study and compare three
retrieval models Lemur supports, especially Language modeling approach to IR,
combined with language specific preprocessing techniques. Our experiments show
that all retrieval models benefits from language specific preprocessing in
terms of retrieval quality. Also Language Modeling approach is the best
performing retrieval model when language specific preprocessing applied.
Authors' comments: 3 pages
Stéphane Mallat, Irène Waldspurger
We consider the phase retrieval problem in which one tries to reconstruct a
function from the modulus of its wavelet transform. We study the unicity and
stability of the reconstruction. In the case where the wavelets are Cauchy
wavelets, we prove that the modulus of the wavelet transform uniquely
determines the function up to a global phase. We show that the reconstruction
operator is continuous but not uniformly continuous. We describe how to
construct pairs of functions which are far away in $L^2$-norm but whose wavelet
transforms are very close, in modulus. The principle is to modulate the wavelet
transform of a fixed initial function by a phase which varies slowly in both
time and frequency. This construction seems to cover all the instabilities that
we observe in practice; we give a partial formal justification to this fact.
Finally, we describe an exact reconstruction algorithm and use it to
numerically confirm our analysis of the stability question.
Authors' comments: Acknowledgments updated
Friedrich Philipp
We prove by means of elementary methods that phase retrieval of complex
polynomials p of degree less than N is possible with 4N-4 phaseless Fourier
measurements of p and p'. In addition, we provide an associated algorithm and
prove that it recovers p up to global phase.
Authors' comments: 10 pages
Radu Balan, Dongmian Zou
In this note we prove that reconstruction from magnitudes of frame
coefficients (the so called "phase retrieval problem") can be performed using
Lipschitz continuous maps. Specifically we show that when the nonlinear
analysis map $\alpha:{\mathcal H}\rightarrow\mathbb{R}^m$ is injective, with
$(\alpha(x))_k=|<x,f_k>|^2$, where $\{f_1,\ldots,f_m\}$ is a frame for the
Hilbert space ${\mathcal H}$, then there exists a left inverse map
$\omega:\mathbb{R}^m\rightarrow {\mathcal H}$ that is Lipschitz continuous.
Additionally we obtain the Lipschitz constant of this inverse map in terms of
the lower Lipschitz constant of $\alpha$. Surprisingly the increase in
Lipschitz constant is independent of the space dimension or frame redundancy.
Authors' comments: 12 pages, 1 figure
Yoav Shechtman, Yonina C. Eldar, Oren Cohen, Henry N. Chapman, Jianwei Miao, Mordechai Segev
This review article provides a contemporary overview of phase retrieval in optical imaging, linking the relevant optical physics to the information processing methods and algorithms. Its purpose is to describe the current state of the art in this area, identify challenges, and suggest vision and areas where signal processing methods can have a large impact on optical imaging and on the world of imaging at large, with applications in a variety of fields ranging from biology and chemistry to physics and engineering.
Fabien Lauer, Henrik Ohlsson
This paper deals with sparse phase retrieval, i.e., the problem of estimating a vector from quadratic measurements under the assumption that few components are nonzero. In particular, we consider the problem of finding the sparsest vector consistent with the measurements and reformulate it as a group-sparse optimization problem with linear constraints. Then, we analyze the convex relaxation of the latter based on the minimization of a block l1-norm and show various exact recovery and stability results in the real and complex cases. Invariance to circular shifts and reflections are also discussed for real vectors measured via complex matrices.
Ko-Jen Hsiao, Jeff Calder, Alfred O. Hero III
Most content-based image retrieval systems consider either one single query, or multiple queries that include the same object or represent the same semantic information. In this paper we consider the content-based image retrieval problem for multiple query images corresponding to different image semantics. We propose a novel multiple-query information retrieval algorithm that combines the Pareto front method (PFM) with efficient manifold ranking (EMR). We show that our proposed algorithm outperforms state of the art multiple-query retrieval algorithms on real-world image databases. We attribute this performance improvement to concavity properties of the Pareto fronts, and prove a theoretical result that characterizes the asymptotic concavity of the fronts.
Benjamin Roth
This work compares concept models for cross-language retrieval: First, we
adapt probabilistic Latent Semantic Analysis (pLSA) for multilingual documents.
Experiments with different weighting schemes show that a weighting method
favoring documents of similar length in both language sides gives best results.
Considering that both monolingual and multilingual Latent Dirichlet Allocation
(LDA) behave alike when applied for such documents, we use a training corpus
built on Wikipedia where all documents are length-normalized and obtain
improvements over previously reported scores for LDA. Another focus of our work
is on model combination. For this end we include Explicit Semantic Analysis
(ESA) in the experiments. We observe that ESA is not competitive with LDA in a
query based retrieval task on CLEF 2000 data. The combination of machine
translation with concept models increased performance by 21.1% map in
comparison to machine translation alone. Machine translation relies on parallel
corpora, which may not be available for many language pairs. We further explore
how much cross-lingual information can be carried over by a specific
information source in Wikipedia, namely linked text. The best results are
obtained using a language modeling approach, entirely without information from
parallel corpora. The need for smoothing raises interesting questions on
soundness and efficiency. Link models capture only a certain kind of
information and suggest weighting schemes to emphasize particular words. For a
combined model, another interesting question is therefore how to integrate
different weighting schemes. Using a very simple combination scheme, we obtain
results that compare favorably to previously reported results on the CLEF 2000
dataset.
Authors' comments: 74 pages; MSc thesis at Saarland University
Avinash N Bhute, B. B. Meshram
In this paper, we present the efficient content based image retrieval systems
which employ the color, texture and shape information of images to facilitate
the retrieval process. For efficient feature extraction, we extract the color,
texture and shape feature of images automatically using edge detection which is
widely used in signal processing and image compression. For facilitated the
speedy retrieval we are implements the antipole-tree algorithm for indexing the
images.
Authors' comments: 12 pages
Volker Pohl, Fanny Yang, Holger Boche
The paper considers the phase retrieval problem in N-dimensional complex
vector spaces. It provides two sets of deterministic measurement vectors which
guarantee signal recovery for all signals, excluding only a specific subspace
and a union of subspaces, respectively. A stable analytic reconstruction
procedure of low complexity is given. Additionally it is proven that signal
recovery from these measurements can be solved exactly via a semidefinite
program. A practical implementation with 4 deterministic diffraction patterns
is provided and some numerical experiments with noisy measurements complement
the analytic approach.
Authors' comments: Preprint accepted for publication in Sampling Theory in Signal and
Image Processing -- Special issue on SampTa 2013