Antti Ukkonen
We consider the evaluation of approximate top-k queries from relations with a-priori unknown values. Such relations can arise for example in the context of expensive predicates, or cloud-based data sources. The task is to find an approximate top-k set that is close to the exact one while keeping the total processing cost low. The cost of a query is the sum of the costs of the entries that are read from the hidden relation. A novel aspect of this work is that we consider prior information about the values in the hidden matrix. We propose an algorithm that uses regression models at query time to assess whether a row of the matrix can enter the top-k set given that only a subset of its values are known. The regression models are trained with existing data that follows the same distribution as the relation subjected to the query. To evaluate the algorithm and to compare it with a method proposed previously in the literature, we conduct experiments using data from a context-sensitive Wikipedia search engine. The results indicate that the proposed method outperforms the baseline algorithms in terms of the cost while maintaining a high accuracy of the returned results.
Alberto Costa, Fabio Roda
In this paper we present a method for reformulating the Recommender Systems problem as an Information Retrieval one. In our tests we have a dataset of users who give ratings for some movies; we hide some values from the dataset, and we try to predict them again using its remaining portion (the so-called "leave-n-out approach"). In order to use an Information Retrieval algorithm, we reformulate this Recommender Systems problem in this way: a user corresponds to a document, a movie corresponds to a term, the active user (whose rating we want to predict) plays the role of the query, and the ratings are used as weights, in place of the weighting scheme of the original IR algorithm. The output is the ranked list of the documents ("users") relevant for the query ("active user"). We use the ratings of these users, weighted according to the rank, to predict the rating of the active user. We carry out the comparison by means of a typical metric, namely the accuracy of the predictions returned by the algorithm, and we compare this to the real ratings from users. In our first tests, we use two different Information Retrieval algorithms: LSPR, a recently proposed model based on the Discrete Fourier Transform, and a simple vector space model.
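The mapping described in this abstract (user = document, movie = term, rating = term weight, active user = query) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain cosine-similarity vector space model, and the reciprocal-rank weighting, the `top_n` cutoff, and all function names are hypothetical choices made only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors (dict: movie -> rating)."""
    shared = set(u) & set(v)
    dot = sum(u[m] * v[m] for m in shared)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_rating(ratings, active_user, movie, top_n=2):
    """Predict the active user's rating for `movie` from the top-ranked similar users.

    `ratings` maps user -> {movie -> rating}. The active user acts as the query;
    every other user who rated `movie` acts as a candidate document.
    """
    # The query is the active user's vector with the target rating hidden.
    query = {m: r for m, r in ratings[active_user].items() if m != movie}
    candidates = [
        (cosine(query, ratings[u]), u)
        for u in ratings
        if u != active_user and movie in ratings[u]
    ]
    ranked = sorted(candidates, reverse=True)[:top_n]
    if not ranked:
        return None
    # Assumption: weight each neighbour by reciprocal rank (1, 1/2, 1/3, ...).
    weights = [1.0 / (i + 1) for i in range(len(ranked))]
    weighted_sum = sum(w * ratings[u][movie] for w, (_, u) in zip(weights, ranked))
    return weighted_sum / sum(weights)


ratings = {
    "alice": {"m1": 5, "m2": 4, "m3": 1},
    "bob": {"m1": 5, "m2": 5, "m3": 2},
    "carol": {"m1": 1, "m2": 2, "m3": 5},
    "dave": {"m1": 4, "m2": 5},
}
prediction = predict_rating(ratings, "dave", "m3", top_n=2)
```

In this toy data the two users most similar to "dave" are "alice" and "bob", so the prediction is pulled toward their low ratings of "m3" rather than toward the high rating given by "carol".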
Md. Saiful Islam, Md. Haider Ali
Due to the rapid development of the World Wide Web (WWW) and imaging technology,
more and more images are available on the Internet and stored in databases.
Searching for related images using a query image is becoming tedious and
difficult. Most of the images on the web are compressed by methods based on the
discrete cosine transform (DCT), including Joint Photographic Experts
Group (JPEG) and H.261. This paper presents an efficient content-based image
indexing technique for searching similar images using discrete cosine transform
features. Experimental results demonstrate its superiority over existing
techniques.
Authors' comments: 9 pages, 4 figures, 4 tables
Simin Feng
We apply the equivalent theory to orthorhombic anisotropic materials and
provide a general unit-cell design criterion for achieving a length-independent
retrieval of the effective material parameters from a single layer of unit
cells. We introduce a graphical retrieval method and phase unwrapping
techniques. The graphical method utilizes the linear regression technique. Our
method can reduce the uncertainty of experimental measurements and the
ambiguity of phase unwrapping. Moreover, the graphical method can
simultaneously determine the bulk values of the six effective material
parameters, permittivity and permeability tensors, from a single layer of unit
cells.
Authors' comments: Accepted for publication in Optics Express
Marek Karpinski, Yakov Nekrich
In this paper we describe a new efficient (in fact optimal) data structure for the {\em top-$K$ color problem}. Each element of an array $A$ is assigned a color $c$ with priority $p(c)$. For a query range $[a,b]$ and a value $K$, we have to report $K$ colors with the highest priorities among all colors that occur in $A[a..b]$, sorted in reverse order by their priorities. We show that such queries can be answered in $O(K)$ time using an $O(N\log \sigma)$-bit data structure, where $N$ is the number of elements in the array and $\sigma$ is the number of colors. Thus our data structure is asymptotically optimal with respect to the worst-case query time and space. As an immediate application of our results, we obtain optimal time solutions for several document retrieval problems. The method of the paper could also be of independent interest.
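To make the query in this abstract concrete, here is a naive baseline that only illustrates what a top-$K$ color query returns. It is not the paper's data structure: it scans the whole range on every query instead of answering in $O(K)$ time. The function name, the 0-based inclusive indexing, and the reading of "reverse order" as highest priority first are assumptions.

```python
def top_k_colors_naive(colors, priority, a, b, k):
    """Report up to k distinct colors occurring in colors[a..b] (inclusive),
    ordered from highest to lowest priority.

    Naive baseline: cost grows with the range length, unlike the O(K)
    query time claimed for the data structure in the abstract.
    """
    distinct = set(colors[a:b + 1])
    return sorted(distinct, key=lambda c: priority[c], reverse=True)[:k]


A = ["red", "blue", "red", "green", "blue", "yellow"]
p = {"red": 3, "blue": 1, "green": 4, "yellow": 2}
result = top_k_colors_naive(A, p, 1, 4, 2)
```

Here the range `A[1..4]` contains the colors blue, red and green, so the two highest-priority colors reported are green and red.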
Sonal Chawla, R. K. Singla
In today's world, designing adaptable course material requires new technical
knowledge, which creates a need for a uniform protocol for organizing
resources with an emphasis on quality and learning. This can be achieved by
bundling the resources in a known and prescribed fashion called Learning
Objects (LOs). Learning Objects are composed of two aspects, namely "Learning"
and "Object". The Learning aspect of Learning Objects refers to education. Since
education is a process, the primary aim of Learning Objects tends to be
facilitating the acquisition, assessment and conversion of content into Learning
Objects while fostering the assimilation of these Learning Objects into
learning modules and instruction. The Object part of Learning Objects relates
to the digital electronic format of the resources, that is, it deals
with the physical resource that forms the Learning Objects. The objects in LOs
are analogous to objects used in object-oriented modeling (OOM). The analogy
helps visualize how LOs will be packaged, processed and transported across the
digital library as well as utilized in course building. OOM concepts such as
encapsulation, classification, polymorphism, inheritance and reuse can be
borrowed to describe the operations on LOs in the digital library. The
aim of this paper is threefold: first, to discuss the background of this
research and the concept of Learning Objects; second, to provide a framework
for an adaptive mechanism for the retrieval of Learning Objects; and third, to
highlight the benefits that the proposed framework will bring.
Authors' comments: Submitted to Journal of Telecommunications, see
http://sites.google.com/site/journaloftelecommunications/volume-2-issue-2-may-2010
Uday Pratap Singh, Sanjeev Jain, Gulfishan Firdose Ahmed
Digital image data is rapidly expanding in quantity and heterogeneity.
Traditional information retrieval techniques do not meet users'
demands, so there is a need to develop an efficient system for content-based
image retrieval. Content-based image retrieval means retrieving images from a
database on the basis of visual features of the image, such as color and texture.
In our proposed method, features are extracted after applying Phong shading to
the input image. Phong shading flattens out the dull surfaces of the image. The
features are extracted using color, texture and edge density methods. The
extracted feature values are used to find the similarity between the input query
image and the database images, which is measured by the Euclidean distance
formula. The experimental results show that the proposed approach gives better
retrieval results with Phong shading.
Authors' comments: IEEE Publication format, International Journal of Computer Science
and Information Security, IJCSIS, Vol. 8 No. 1, April 2010, USA. ISSN
1947-5500, http://sites.google.com/site/ijcsis/
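The similarity step mentioned in the abstract above (Euclidean distance between the feature vector of the query image and those of the database images) might look like the following minimal sketch. The feature values, image identifiers and function names are invented for illustration; the abstract does not specify how the color, texture and edge density features are encoded.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def rank_by_distance(query_features, database):
    """Return image ids ordered from most to least similar (smallest distance first).

    `database` maps image id -> feature vector (hypothetical layout).
    """
    return sorted(database, key=lambda image_id: euclidean(query_features, database[image_id]))


# Hypothetical 3-component feature vectors (e.g. color, texture, edge density).
database = {
    "img_a": [0.0, 0.0, 0.0],
    "img_b": [1.0, 1.0, 1.0],
    "img_c": [0.2, 0.1, 0.0],
}
ranking = rank_by_distance([0.1, 0.1, 0.0], database)
```

A smaller distance means a closer match, so the first id in `ranking` is the retrieved image most similar to the query.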
Abderrahim El Qadi, Driss Aboutajedine, Yassine Ennouary
In this paper we describe a mechanism to improve Information Retrieval (IR)
on the web. The method is based on Formal Concept Analysis (FCA), which
establishes semantic relations during querying and allows the answers provided
by a search engine to be reorganized in the shape of a lattice of concepts. We
propose an incremental IR algorithm based on the Galois lattice. This
algorithm allows a formal clustering of the data sources, and the results
it returns are ordered by relevance. Relevance control is
exploited in the clustering. We improve the results by using an ontology in the
field of image processing and by reformulating the user queries, which makes it
possible to return more relevant documents.
Authors' comments: Pages IEEE format, International Journal of Computer Science and
Information Security, IJCSIS, Vol. 7 No. 2, February 2010, USA. ISSN
1947-5500, http://sites.google.com/site/ijcsis/
Kathrin Knautz, Simone Soubusta, Wolfgang G. Stock
The paper presents our design of a next-generation information retrieval system based on tag co-occurrences and subsequent clustering. We help users get access to digital data through information visualization in the form of tag clusters. Current problems like the absence of interactivity and semantics between tags or the difficulty of adding additional search arguments are solved. In the evaluation, based upon SERVQUAL and IT systems quality indicators, we found that tag clusters are perceived as more useful than tag clouds, are much more trustworthy, and are more enjoyable to use.
Patricio Galeas, Ralph Kretschmer, Bernd Freisleben
In addition to the frequency of terms in a document collection, the
distribution of terms plays an important role in determining the relevance of
documents. In this paper, a new approach for representing term positions in
documents is presented. The approach allows an efficient evaluation of
term-positional information at query evaluation time. Three applications are
investigated: a function-based ranking optimization representing a user-defined
document region, a query expansion technique based on overlapping the term
distributions in the top-ranked documents, and cluster analysis of terms in
documents. Experimental results demonstrate the effectiveness of the proposed
approach.
Authors' comments: 12 pages, submitted to proceedings of ECIR-2010
James Schombert
The first step in a science project is the acquisition and understanding of
the relevant data. This paper outlines the results of a project to design and
test network tools specifically oriented at retrieving astronomical data. The
tools range from simple data transfer methods to more complex browser-emulating
scripts. When integrated with a defined sample or catalog, these scripts
provide seamless techniques to retrieve and store data of varying types.
Examples are given on how these tools can be used to leapfrog from website to
website to acquire multi-wavelength datasets. This project demonstrates the
capability to use multiple data websites, in conjunction, to perform the type
of calculations once reserved for on-site datasets.
Authors' comments: 10 pages, no figures, software at
http://abyss.uoregon.edu/~js/network
Daniel Sonntag, Romàn R. Zapatrin
We present a method to geometrize massive data sets from search engine query logs. For this purpose, a macrodynamic-like quantitative model of the Information Retrieval (IR) process is developed, whose paradigm is inspired by basic constructions of Einstein's general relativity theory in which all IR objects are uniformly placed in a common Room. The Room has a structure similar to Einsteinian spacetime, namely that of a smooth manifold. Documents and queries are treated as matter objects and sources of material fields. Relevance, the central notion of IR, becomes a dynamical issue controlled by both gravitation (or, more precisely, as the motion in a curved spacetime) and forces originating from the interactions of matter fields. The spatio-temporal description ascribes dynamics to any document or query, thus providing a uniform description for documents of both initially static and dynamical nature. Within the IR context, the techniques presented are based on two ideas. The first is the placement of all objects participating in IR into a common continuous space. The second idea is the `objectivization' of the IR process; instead of expressing users' wishes, we consider the overall IR as an objective physical process, representing the IR process in terms of motion in a given external-fields configuration. Various semantic environments are treated as various IR universes.
Kazuhito Honda, Daisuke Akamatsu, Manabu Arikawa, Yoshihiko Yokoi, Keiichirou Akiba, Satoshi Nagatsuka, Takahito Tanimura, Akira Furusawa et al.
Storage and retrieval of a squeezed vacuum was successfully demonstrated
using electromagnetically induced transparency. 930 ns of the squeezed vacuum
pulse was incident on the laser-cooled 87Rb atoms with an intense control light
in a coherent state. When the squeezed vacuum pulse was slowed and spatially
compressed in the cold atoms, the control light was switched off. After 3 μs of
storage, the control light was switched on again and the squeezed vacuum was
retrieved, as was confirmed using the time-domain homodyne method.
Authors' comments: 4 pages, 4 figures, to appear in Physical Review Letters
Julien Barre
We address the problem of retrieving information from a noisy version of the
``knowledge networks'' introduced by Maslov and Zhang. We map this problem onto
a disordered statistical mechanics model, which opens the door to many
analytical and numerical approaches. We give the replica symmetric solution,
compare with numerical simulations, and finally discuss an application to real
data from the United States Senate.
Authors' comments: 10 pages, 4 figures. Writing of the last section improved; version
accepted in JSTAT
Gennadiy Averkov, Gabriele Bianchi
The covariogram g_K(x) of a convex body K \subseteq E^d is the function which associates to each x \in E^d the volume of the intersection of K with K+x. Matheron asked whether g_K determines K, up to translations and reflections in a point. Positive answers to Matheron's question have been obtained for large classes of planar convex bodies, while for d\geq 3 there are both positive and negative results. One of the purposes of this paper is to sharpen some of the known results on Matheron's conjecture indicating how much of the covariogram information is needed to get the uniqueness of determination. We indicate some subsets of the support of the covariogram, with arbitrarily small Lebesgue measure, such that the covariogram, restricted to those subsets, identifies certain geometric properties of the body. These results are more precise in the planar case, but some of them, both positive and negative ones, are proved for bodies of any dimension. Moreover some results regard most convex bodies, in the Baire category sense. Another purpose is to extend the class of convex bodies for which Matheron's conjecture is confirmed by including all planar convex bodies possessing two non-degenerate boundary arcs being reflections of each other.
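As a worked illustration of the definition in this abstract (not taken from the paper), consider the simplest planar case, an axis-parallel rectangle with side lengths $a$ and $b$, for which the covariogram can be written in closed form:

```latex
\[
  K = [0,a] \times [0,b] \subset E^2, \qquad x = (x_1, x_2) \in E^2,
\]
\[
  g_K(x) = \operatorname{vol}\bigl(K \cap (K + x)\bigr)
         = \max(0,\, a - |x_1|) \cdot \max(0,\, b - |x_2|).
\]
```

In this example $g_K(0) = ab$ is the area of $K$, the support of $g_K$ is $[-a,a] \times [-b,b]$, and $g_K(x) = g_K(-x)$. Translating $K$ or reflecting it in a point leaves $g_K$ unchanged, which is why Matheron's question asks for determination only up to translations and reflections.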
S. Marchesini
Iterative algorithms with feedback are amongst the most powerful and
versatile optimization methods for phase retrieval. Among these, the hybrid
input-output algorithm has demonstrated practical solutions to giga-element
nonlinear phase retrieval problems, escaping local minima and producing images
at resolutions beyond the capabilities of lens-based optical methods. Here, the
input-output iteration is improved by a lower dimensional subspace saddle-point
optimization.
Authors' comments: 8 pages, 4 figures, revtex
Vyacheslav M. Abramov
Interest in retrial queueing systems is due to their application to
telephone systems. The paper studies multiserver retrial queueing systems with
$n$ servers. The arrival process is a quite general point process. An arriving
customer occupies one of the free servers. If all servers are busy upon arrival,
the customer waits for service in orbit and, after a random time, retries
again and again to occupy a server. The orbit has only one waiting space, and
an arriving customer who finds all servers busy and the waiting space occupied
is lost from the system. Time intervals between possible retrials are assumed to
have an arbitrary distribution (the retrial scheme is explained exactly in the
paper). The paper provides an analysis of this system. Specifically, the paper
studies the optimal number of servers needed to decrease the loss proportion to
a given value. The representation obtained for the loss proportion enables us to
solve the problem numerically. The algorithm for the numerical solution includes
effective simulation, which meets the challenge of the rare-events problem in
simulation.
Authors' comments: 21 pages, double spaced. I added additional details in the introduction
J. M. Borrero, S. Tomczyk, A. Norton, T. Darnell, J. Schou, P. Scherrer, R. Bush, Y. Lui
The Helioseismic and Magnetic Imager (HMI), on board the Solar Dynamics
Observatory (SDO), will begin data acquisition in 2008. It will provide the
first full disk, high temporal cadence observations of the full Stokes vector
with a 0.5 arc sec pixel size. This will allow for a continuous monitoring of
the Solar magnetic field vector. HMI data will advance our understanding of the
small and large-scale magnetic field evolution, its relation to the solar and
global dynamic processes, coronal field extrapolations, flux emergence,
magnetic helicity and the nature of the polar magnetic fields. We summarize
HMI's expected operation modes, focusing on the polarization cross-talk induced
by the solar oscillations and how this affects the magnetic field vector
determinations.
Authors' comments: 4 pages, 2 figures. To appear in the proceedings of the Solar
Polarization Workshop IV
Atsushi Fujii, Katunobu Itou, Tomoyosi Akiba, Tetsuya Ishikawa
We are developing a cross-media information retrieval system, in which users
can view specific segments of lecture videos by submitting text queries. To
produce a text index, the audio track is extracted from a lecture video and a
transcription is generated by automatic speech recognition. In this paper, to
improve the quality of our retrieval system, we extensively investigate the
effects of adapting acoustic and language models on speech recognition. We
use an MLLR-based method to adapt an acoustic model. To obtain a corpus for
language model adaptation, we use the textbook for a target lecture to search a
Web collection for the pages associated with the lecture topic. We show the
effectiveness of our method by means of experiments.
Authors' comments: 4 pages, Proceedings of the 8th International Conference on Spoken
Language Processing (to appear)
S. Marchesini
Iterative projection algorithms for phase retrieval are tested on two simple toy models. The results provide useful insights into the behavior of these algorithms.