Madhu Reddy, Bernard J. Jansen
One of the key components of designing usable and useful collaborative
information retrieval systems is to understand the needs of the users of these
systems. Our research team has been exploring collaborative information
behavior in a variety of organizational settings. Our research goals have been
two-fold: first, to develop a conceptual understanding of collaborative
information behavior and, second, to gather requirements for the design of
collaborative information retrieval systems. In this paper, we present a brief
overview of our fieldwork in three different organizational settings, discuss
our methodology for collecting data on collaborative information behavior, and
highlight some lessons that we are learning about potential users of
collaborative information retrieval systems in these domains.
Authors' comments: Presented at 1st Intl Workshop on Collaborative Information Seeking,
2008 (arXiv:0908.0583)
Samuel L. Braunstein, Stefano Pirandola, Karol Życzkowski
We show that, in order to preserve the equivalence principle until late times
in unitarily evaporating black holes, the thermodynamic entropy of a black hole
must be primarily entropy of entanglement across the event horizon. For such
black holes, we show that the information entering a black hole becomes encoded
in correlations within a tripartite quantum state, the quantum analogue of a
one-time pad, and is only decoded into the outgoing radiation very late in the
evaporation. This behavior generically describes the unitary evaporation of
highly entangled black holes and requires no specially designed evolution. Our
work suggests the existence of a matter-field sum rule for any fundamental
theory.
Authors' comments: Change of title to reflect information return. The physics of
"energetic curtains" remains unchanged
B. Piwowarski, M. Lalmas
Even the best information retrieval model cannot always identify the most
useful answers to a user query. This is in particular the case with web search
systems, where it is known that users tend to minimise their effort to access
relevant information. It is, however, believed that the interaction between
users and a retrieval system, such as a web search engine, can be exploited to
provide better answers to users. Interactive Information Retrieval (IR)
systems, in which users access information through a series of interactions
with the search system, are concerned with building models for IR, where
interaction plays a central role. There are many possible interactions between
a user and a search system, ranging from query (re)formulation to relevance
feedback. However, capturing them within a single framework is difficult and
previously proposed approaches have mostly focused on relevance feedback. In
this paper, we propose a general framework for interactive IR that is able to
capture the full interaction process in a principled way. Our approach relies
upon a generalisation of the probability framework of quantum physics, whose
strong geometric component can be a key towards a successful interactive IR
model.
Authors' comments: 14 pages, 1 figure
Mikhail Basilyan
In this paper we present a novel method for retrieving information in
languages other than that of the query. We use this technique in combination
with existing traditional Cross Language Information Retrieval (CLIR)
techniques to improve their results. This method has a number of advantages
over traditional techniques that rely on machine translation to translate the
query and then search the target document space using the translated query.
This method is not limited to the availability of a machine translation
algorithm for the desired language and uses already existing sources of readily
available translated information on the internet as a "middle-man" approach. In
this paper we use Wikipedia; however, any similar multilingual, cross
referenced body of documents can be used. For evaluation and comparison
purposes we also implemented the traditional machine translation approach and
the Wikipedia approach separately.
Authors' comments: 9 pages
Daniel Ballester
The ability to accumulate and retrieve entanglement in the fields of two
remote cavities with pairs of two-level atoms is discussed. It is shown that
this transfer and retrieval can indeed be ideal with a resonant interaction.
The case of initial non-maximally entangled atomic pairs is also considered.
This leads to the possibility of concentrating entanglement into a single pair
at the retrieval stage. A teleportation protocol based on the same setup is
presented. This makes possible teleportation with built-in entanglement
concentration.
Authors' comments: 8 pages, 5 figs, 2 tables. To appear in PRA
Paolo Bolettieri, Andrea Esuli, Fabrizio Falchi, Claudio Lucchese, Raffaele Perego, Tommaso Piccioli, Fausto Rabitti
The scalability, as well as the effectiveness, of the different Content-based
Image Retrieval (CBIR) approaches proposed in the literature is today an important
research issue. Given the wealth of images on the Web, CBIR systems must in
fact leap towards Web-scale datasets. In this paper, we report on our
experience in building a test collection of 100 million images, with the
corresponding descriptive features, to be used for experimenting with new scalable
techniques for similarity searching, and comparing their results. In the
context of the SAPIR (Search on Audio-visual content using Peer-to-peer
Information Retrieval) European project, we had to test our distributed
similarity searching technology on a realistic data set. Therefore, since no
large-scale collection was available for research purposes, we had to tackle
the non-trivial process of image crawling and descriptive feature extraction
(we used five MPEG-7 features) using the European EGEE computer GRID. The
result of this effort is CoPhIR, the first CBIR test collection of such scale.
CoPhIR is now open to the research community for experiments and comparisons,
and access to the collection has already been granted to more than 50 research
groups worldwide.
Authors' comments: 15 pages
George Parfionov, Romàn Zapatrin
We develop a macro-model of the information retrieval process using Game Theory as a mathematical theory of conflicts. We represent the Information Retrieval process as a game between two abstract players. The first player is the `intellectual crowd' of search engine users; the second is a community of information retrieval systems. In order to apply Game Theory, we treat search log data as Nash equilibrium strategies and solve the inverse problem of finding appropriate payoff functions. For that, we suggest a particular model, which we call the Alpha model. Within this model, we suggest a method, called shifting, which makes it possible to partially control the behavior of users en masse. This Note is addressed to researchers in both game theory (providing a new class of real-life problems) and information retrieval, for whom we present new techniques to control the IR environment.
Jie Luo, Mario A. Nascimento
The typical content-based image retrieval problem is to find images within a
database that are similar to a given query image. This paper presents a
solution to a different problem, namely that of content-based sub-image
retrieval (CBsIR), i.e., finding images in a database that contain another image.
Note that this is different from finding a region in a (segmented) image that
is similar to another image region given as a query. We present a technique for
CBsIR that exploits relevance feedback, i.e., the user's input on intermediary
results, in order to improve retrieval efficiency. Upon modeling images as a
set of overlapping and recursive tiles, we use a tile re-weighting scheme that
assigns penalties to each tile of the database images and updates the tile
penalties for all relevant images retrieved at each iteration using both the
relevant and irrelevant images identified by the user. Each tile is modeled by
means of its color content using a compact but very efficient method which can,
indirectly, capture some notion of texture as well, despite the fact that only
color information is maintained. Performance evaluation on a largely
heterogeneous dataset of over 10,000 images shows that the system can achieve a
stable average recall value of 70% within the top 20 retrieved (and presented)
images after only 5 iterations, with each such iteration taking about 2 seconds
on an off-the-shelf desktop computer.
Authors' comments: A preliminary version of this paper appeared in the Proceedings of
the 1st ACM International Workshop on Multimedia Databases, p. 63-69. 2003
Antonio Galves, Charlotte Galves, Jesús E. García, Nancy L. Garcia, Florencia Leonardi
The starting point of this article is the question "How to retrieve
fingerprints of rhythm in written texts?" We address this problem in the case
of Brazilian and European Portuguese. These two dialects of Modern Portuguese
share the same lexicon and most of the sentences they produce are superficially
identical. Yet they are conjectured, on linguistic grounds, to implement
different rhythms. We show that this linguistic question can be formulated as a
problem of model selection in the class of variable length Markov chains. To
carry out this approach, we compare texts from European and Brazilian
Portuguese. These texts are first encoded according to some basic rhythmic
features of the sentences which can be automatically retrieved. This is an
entirely new approach from the linguistic point of view. Our statistical
contribution is the introduction of the smallest maximizer criterion, which is a
constant-free procedure for model selection. As a by-product, this provides a
solution for the problem of optimal choice of the penalty constant when using
the BIC to select a variable length Markov chain. Besides proving the
consistency of the smallest maximizer criterion when the sample size diverges,
we also carry out a simulation study comparing our approach with both the standard
BIC selection and the Peres-Shields order estimation. Applied to the linguistic
sample constituted for our case study, the smallest maximizer criterion assigns
different context-tree models to the two dialects of Portuguese. The features
of the selected models are compatible with current conjectures discussed in the
linguistic literature.
Authors' comments: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/11-AOAS511 by the
Institute of Mathematical Statistics (http://www.imstat.org)
Jose Torres, Luis Paulo Reis
The Visual Object Information Retrieval (VOIR) system described in this paper
implements an image retrieval approach that combines two layers, the conceptual
and the visual layer. It uses terms from a textual thesaurus to represent the
conceptual information and also works with image regions, the visual
information. The terms are related to the image regions through a weighted
association enabling the execution of concept-level queries. VOIR uses
region-based relevance feedback to improve the quality of the results in each
query session and to discover new associations between text and image. This
paper describes a user-centred and task-oriented comparative evaluation of VOIR
which was undertaken considering three distinct versions of VOIR: a full-fledged
version; one supporting relevance feedback only at image level; and a third
version not supporting relevance feedback at all. The evaluation showed the
usefulness of region-based relevance feedback in the context of the VOIR
prototype.
Authors' comments: 15 Pages, 20 References
Philipp Mayr, Vivien Petras
The German Federal Ministry for Education and Research funded a major
terminology mapping initiative, which concluded in 2007. The task of
this terminology mapping initiative was to organize, create and manage
'cross-concordances' between controlled vocabularies (thesauri, classification
systems, subject heading lists) centred around the social sciences but quickly
extending to other subject areas. 64 crosswalks with more than 500,000
relations were established. In the final phase of the project, a major
evaluation effort to test and measure the effectiveness of the vocabulary
mappings in an information system environment was conducted. The paper reports
on the cross-concordance work and evaluation results.
Authors' comments: 19 pages, 4 figures, 11 tables, IFLA conference 2008
Lagogiannis George, Lorentzos Nikos, Sioutas Spyros, Theodoridis Evaggelos
The paper is concerned with the time-efficient processing of spatiotemporal
predicates, i.e. spatial predicates associated with an exact temporal
constraint. A set of such predicates forms a buffer query or a Spatio-temporal
Pattern (STP) Query with time. In the more general case of an STP query, the
temporal dimension is introduced via the relative order of the spatial
predicates (STP queries with order). Therefore, the efficient processing of a
spatiotemporal predicate is crucial for the efficient implementation of more
complex queries of practical interest. We propose an extension of a known
approach, suitable for processing spatial predicates, which has been used for
the efficient manipulation of STP queries with order. The extended method is
supported by efficient indexing structures. We also provide experimental
results that show the efficiency of the technique.
Authors' comments: 6 pages, 7 figures, submitted to Sigmod Record
Martin Dietzfelbinger, Rasmus Pagh
The retrieval problem is the problem of associating data with keys in a set. Formally, the data structure must store a function f: U -> {0,1}^r that has specified values on the elements of a given set S, a subset of U with |S| = n, but may have any value on elements outside S. Minimal perfect hashing makes it possible to avoid storing the set S, but this induces a space overhead of Theta(n) bits in addition to the nr bits needed for function values. In this paper we show how to eliminate this overhead. Moreover, we show that for any k, query time O(k) can be achieved using space that is within a factor 1+e^{-k} of optimal, asymptotically for large n. If we allow logarithmic evaluation time, the additive overhead can be reduced to O(log log n) bits with high probability. The expected time to construct the data structure is O(n). A main technical ingredient is the use of existing tight bounds on the probability that almost square random matrices with rows of low weight have full row rank. In addition to direct constructions, we point out a close connection between retrieval structures and hash tables where keys are stored in an array and some kind of probing scheme is used. Further, we propose a general reduction that transfers the results on retrieval into analogous results on approximate membership, a problem traditionally addressed using Bloom filters. Again, we show how to eliminate the space overhead present in previously known methods, and get arbitrarily close to the lower bound. The evaluation procedures of our data structures are extremely simple (similar to a Bloom filter). For the results stated above we assume free access to fully random hash functions. However, we show how to justify this assumption using extra space o(n) to simulate full randomness on a RAM.
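To make the retrieval problem concrete, the following is a minimal sketch of one well-known way to store f without storing S: each key is hashed to k cells of an array of r-bit values, f(x) is the XOR of those cells, and the array is filled by solving a linear system over GF(2). This is an illustration under stated assumptions, not the construction of the paper: the SHA-256 based hash, the function names, the choice k = 3, and the space slack of 1.25 are all hypothetical, and the sketch does not achieve the space bounds quoted in the abstract.

```python
import hashlib


def _positions(key, m, k, seed):
    """Hypothetical hash: k distinct cell indices in range(m) for a key."""
    out = []
    counter = 0
    while len(out) < k:
        digest = hashlib.sha256(f"{seed}:{counter}:{key}".encode()).digest()
        idx = int.from_bytes(digest[:8], "big") % m
        if idx not in out:
            out.append(idx)
        counter += 1
    return out


def build(table, r, k=3, slack=1.25, max_tries=50):
    """Build (cells, m, k, seed) so that query() returns table[key] for every key.

    table maps keys to r-bit integers. The slack factor is an assumption of
    this sketch; it wastes space compared with the bounds in the abstract.
    """
    n = len(table)
    m = max(k, int(slack * n) + 1)
    for seed in range(max_tries):
        pivots = {}  # highest set column -> (row bitmask, right-hand side)
        solvable = True
        for key, val in table.items():
            if not 0 <= val < (1 << r):
                raise ValueError("value does not fit in r bits")
            mask = 0
            for p in _positions(key, m, k, seed):
                mask |= 1 << p
            # Gaussian elimination over GF(2): reduce the row by existing pivots.
            while mask:
                col = mask.bit_length() - 1
                if col not in pivots:
                    pivots[col] = (mask, val)
                    break
                pmask, pval = pivots[col]
                mask ^= pmask
                val ^= pval
            else:
                # The row is linearly dependent: retry with a new seed.
                solvable = False
                break
        if not solvable:
            continue
        # Back substitution in increasing column order; free cells stay 0.
        cells = [0] * m
        for col in sorted(pivots):
            mask, val = pivots[col]
            rest = mask ^ (1 << col)
            while rest:
                low = rest & -rest
                val ^= cells[low.bit_length() - 1]
                rest ^= low
            cells[col] = val
        return cells, m, k, seed
    raise RuntimeError("no solvable system found; increase slack or max_tries")


def query(struct, key):
    """Evaluate f(key) as the XOR of k cells; keys outside S get arbitrary values."""
    cells, m, k, seed = struct
    out = 0
    for p in _positions(key, m, k, seed):
        out ^= cells[p]
    return out


table = {f"key{i}": (i * 37) % 256 for i in range(200)}
structure = build(table, r=8)
print(all(query(structure, key) == val for key, val in table.items()))
```

Note that the structure never stores the keys themselves, which is why a query for a key outside S returns some value rather than an error.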
Sabu M. Thampi, K. Chandra Sekaran
At present, the de facto standard for providing content on the Internet is the World Wide Web. A technology that is now emerging on the Web is Content-Based Image Retrieval (CBIR). CBIR applies methods and algorithms from computer science to analyse and index images based on their visual content. Mobile agents push the flexibility of distributed systems to their limits, since not only the computations but also the code that performs them are dynamically distributed. The current commercial applet-based methodologies for accessing image database systems offer limited flexibility, scalability and robustness. In this paper the authors propose a new framework for content-based WWW distributed image retrieval based on Java-based mobile agents. The implementation of the framework shows that its performance is comparable to, and in some cases outperforms, the current approach.
Radhakrishnan Nagarajan, Anand Nagarajan, Mariofanna Milanova
Developing fast and efficient algorithms for retrieving objects that match a
given user query is an area of active research. The present study investigates
the retrieval of time series objects from a phoneme database that match a given
user pattern or query. The proposed method maps the one-dimensional time series
retrieval into a sequence retrieval problem by partitioning the
multi-dimensional phase-space using k-means clustering. The problem of whole
sequence as well as subsequence matching is considered. Robustness of the
proposed technique is investigated on phoneme time series corrupted with
additive white Gaussian noise. The shortcoming of classical power-spectral
techniques for time series retrieval is also discussed.
Authors' comments: 6 Pages, 5 Figures
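The mapping described in the abstract above, from a one-dimensional time series to a symbol sequence via a partition of phase space, can be sketched roughly as follows. This is only an illustration under assumptions: the delay-embedding parameters (dim, lag), the plain Lloyd-style k-means, the exact-match subsequence search, and all function names are hypothetical choices, since the abstract does not specify them.

```python
import math
import random


def delay_embed(series, dim=3, lag=1):
    """Map a 1-D series to phase-space points (x[t], x[t+lag], ..., x[t+(dim-1)*lag])."""
    span = (dim - 1) * lag
    return [tuple(series[t + j * lag] for j in range(dim))
            for t in range(len(series) - span)]


def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: _sqdist(point, centers[i]))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd iterations; returns k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        new_centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def symbolize(series, centers, dim=3, lag=1):
    """Replace each phase-space point by the index of its cluster."""
    return [nearest(p, centers) for p in delay_embed(series, dim, lag)]


def find_subsequence(haystack, needle):
    """Start indices where the symbol sequence needle occurs exactly in haystack."""
    n = len(needle)
    return [i for i in range(len(haystack) - n + 1) if haystack[i:i + n] == needle]


# Toy stand-in for a phoneme recording: a sum of two sinusoids.
series = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(300)]
centers = kmeans(delay_embed(series), k=8)
symbols = symbolize(series, centers)
query = symbolize(series[40:80], centers)
print(40 in find_subsequence(symbols, query))
```

Because the query here is cut from the stored series itself, exact matching suffices; a noisy query, as studied in the abstract, would need a tolerant sequence comparison instead.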
Eric Larose, Arnaud Derode, Dominique Clorennec, Ludovic Margerin, Michel Campillo
When averaged over sources or disorder, cross-correlation of diffuse fields yields the Green's function between two passive sensors. This technique is applied to elastic ultrasonic waves in an open scattering slab mimicking seismic waves in the Earth's crust. It appears that the Rayleigh wave reconstruction depends on the scattering properties of the elastic slab. Special attention is paid to the specific role of bulk-to-Rayleigh-wave coupling, which may result in unexpected phenomena like a persistent time-asymmetry in the diffuse regime.
Jiangfeng Zhou, Thomas Koschny, Maria Kafesaki, Costas M. Soukoulis
We study the dependence of the retrieval parameters, such as the electric
permittivity, the magnetic permeability and the index of refraction, $n$, on
the size of the unit cell of a periodic metamaterial. The convergence of the
retrieved parameters with the number of unit cells is also examined. We have
concentrated our studies on the so-called fishnet structure, which is the most
promising design to obtain negative $n$ at optical wavelengths. We find that as
the size of the unit cell decreases, the magnitude of the retrieved effective
parameters increases. The convergence of the effective parameters of the
fishnet as the number of the unit cells increases is demonstrated but found to
be slower than for regular split-ring resonator and wire structures. This is
due to a much stronger coupling between the different unit cells in the fishnet
structure.
Authors' comments: Journal-ref and DOI added
Hanène Maghrebi, Amos David
The exponential growth of multimedia information and the development of various communication media have generated new problems at various levels, including the rate of information flow, storage, and management. The difficulty is no longer the existence of information but rather access to it. When designing a multimedia information retrieval system, it is appropriate to bear in mind the potential users and their information needs. We assume that a multimedia information representation that explicitly takes into account the users' needs and use cases could improve the system's capacity to adapt to its end users. We also believe that the responses of a multimedia information system would be more relevant to the users' needs if the types of results to be obtained from the system were identified before its design and development. We propose integrating the users' information needs. More precisely, integrating the usage contexts of the resulting information into an information system (during creation and feedback) should make the results more pertinent to the users' needs. The first section of this study is dedicated to traditional multimedia information systems and specifically to their approaches to representing multimedia information. Given the dynamism of users, these approaches do not permit the explicit integration of the users' information needs. In this paper, we present our proposals based on an economic intelligence approach. This approach emphasizes the importance of starting any process of information retrieval with the user's information need.
K. Munir, M. Odeh, R. McClatchey, S. Khan, I. Habib
Information retrieval from distributed heterogeneous data sources remains a
challenging issue. As the number of data sources increases, more intelligent
retrieval techniques, focusing on information content and semantics, are
required. Currently ontologies are being widely used for managing semantic
knowledge, especially in the field of bioinformatics. In this paper we describe
an ontology assisted system that allows users to query distributed
heterogeneous data sources by hiding details like location, information
structure, access pattern and semantic structure of the data. Our goal is to
provide an integrated view on biomedical information sources for the
Health-e-Child project with the aim of overcoming the lack of sufficient
semantic-based reformulation techniques for querying distributed data sources.
In particular, this paper examines the problem of query reformulation across
biomedical data sources, based on merged ontologies and the underlying
heterogeneous descriptions of the respective data sources.
Authors' comments: 6 pages, 3 figures. Presented at the 4th International Workshop on
Frontiers of Information Technology -- FIT 2006. Islamabad, Pakistan December
2006
Filippo Geraci, Marco Pellegrini
Modern text retrieval systems often provide a similarity search utility that
allows the user to efficiently find a fixed number k of documents in the data
set that are most similar to a given query (here a query is either a simple
sequence of keywords or the identifier of a full document found in previous
searches that is considered of interest). We consider the case of a textual
database made of semi-structured documents. Each field, in turn, is modelled
with a specific vector space. The problem is more complex when we also allow
each such vector space to have an associated user-defined dynamic weight that
influences its contribution to the overall dynamic aggregated and weighted
similarity. This dynamic problem has been tackled in a recent paper by
Singitham et al. in VLDB 2004. Their proposed solution, which we take as
baseline, is a variant of the cluster-pruning technique that has the potential
for scaling to very large corpora of documents, and is far more efficient than
the naive exhaustive search. We devise an alternative way of embedding weights
in the data structure, coupled with a non-trivial application of a clustering
algorithm based on the furthest point first heuristic for the metric k-center
problem. The validity of our approach is demonstrated experimentally by showing
significant performance improvements over the scheme proposed in Singitham et
al. in VLDB 2004. We significantly improve the tradeoffs between query time and
output quality with respect to the baseline method of Singitham et al. in
VLDB 2004, and also with respect to a novel method by Chierichetti et al. to
appear in ACM PODS 2007. We also speed up the pre-processing time by a factor
of at least thirty.
Authors' comments: Submitted to Spire 2007
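The furthest point first heuristic for the metric k-center problem mentioned in the abstract above, combined with a basic cluster-pruning search, can be sketched as follows. This is a generic illustration, not the authors' data structure: it uses plain Euclidean points rather than weighted per-field vector spaces, and the function names and the `probes` parameter are hypothetical.

```python
import math


def furthest_point_first(points, k, dist=math.dist):
    """Greedy k-center: repeatedly promote the point furthest from all centers.

    Returns (centers, assign), where centers holds indices into points and
    assign[i] is the position in centers of the center closest to points[i].
    """
    centers = [0]  # arbitrary first center
    d = [dist(points[0], p) for p in points]  # distance to the closest center
    assign = [0] * len(points)
    while len(centers) < min(k, len(points)):
        nxt = max(range(len(points)), key=d.__getitem__)
        centers.append(nxt)
        for i, p in enumerate(points):
            dn = dist(points[nxt], p)
            if dn < d[i]:
                d[i] = dn
                assign[i] = len(centers) - 1
    return centers, assign


def pruned_search(query, points, centers, assign, top=1, probes=1, dist=math.dist):
    """Cluster pruning: scan only the clusters whose centers are nearest the query."""
    ranked = sorted(range(len(centers)),
                    key=lambda c: dist(query, points[centers[c]]))
    chosen = set(ranked[:probes])
    candidates = [i for i, a in enumerate(assign) if a in chosen]
    return sorted(candidates, key=lambda i: dist(query, points[i]))[:top]


points = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 8)]
centers, assign = furthest_point_first(points, k=3)
print(centers, assign)
print(pruned_search((9, 0.2), points, centers, assign))
```

Raising `probes` trades query time for output quality, which is the kind of tradeoff the abstract evaluates; the result is approximate because the true nearest neighbour may sit in a cluster that was not scanned.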