R. K. Roul, S. K. Sahay
A search engine returns thousands of web pages for a single user query, most
of which are not relevant. In this context, effective information retrieval
from the expanding web is a challenging task, in particular if the query is
ambiguous. The major question that arises is how to get the relevant pages for
an ambiguous query. We propose an approach for obtaining effective results for
an ambiguous query by forming community vectors based on the association
concept of data mining, using the vector space model and the freedictionary. We
develop clusters by computing the similarity between community vectors and
document vectors formed from the web pages extracted by the search engine. We
use the Gensim package to implement the algorithm because of its simplicity
and robust nature. Analysis shows that our approach is an effective way to
form clusters for an ambiguous query.
Authors' comments: 11 Pages, 1 figure
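The clustering step described above can be illustrated with a minimal sketch. It assumes raw term-frequency vectors and cosine similarity, and it uses plain Python rather than Gensim; the community vectors, the query "jaguar" and the helper names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tf_vector(text):
    """Build a raw term-frequency vector from whitespace-separated text."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def assign_clusters(community_vectors, documents):
    """Assign each document to the community vector it is most similar to."""
    clusters = {name: [] for name in community_vectors}
    for doc in documents:
        doc_vec = tf_vector(doc)
        best = max(
            community_vectors,
            key=lambda name: cosine(community_vectors[name], doc_vec),
        )
        clusters[best].append(doc)
    return clusters


# Hypothetical community vectors for the ambiguous query "jaguar".
communities = {
    "animal": tf_vector("jaguar cat wildlife jungle predator"),
    "car": tf_vector("jaguar car engine luxury vehicle"),
}
pages = [
    "the jaguar is a big cat living in the jungle",
    "the new jaguar car has a powerful engine",
]
clusters = assign_clusters(communities, pages)
```

Each page ends up in the cluster of the sense whose vocabulary it shares most, which is the intuition behind grouping results for an ambiguous query.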
Youssef Bassil
The Big Bang of the Internet in the early 90's dramatically increased the
number of images being distributed and shared over the web. As a result, image
information retrieval systems were developed to index and retrieve image files
spread over the Internet. Most of these systems are keyword-based: they search
for images based on their textual metadata, and thus they are imprecise, as
describing an image in a human language is inherently vague. In addition,
there are content-based image retrieval systems, which search for images based
on their visual information. However, content-based systems are still immature
and not very effective, as they suffer from low retrieval recall and precision
rates.
This paper proposes a new hybrid image information retrieval model for indexing
and retrieving web images published in HTML documents. The distinguishing mark
of the proposed model is that it is based on both graphical content and textual
metadata. The graphical content is denoted by color features and color
histogram of the image; while textual metadata are denoted by the terms that
surround the image in the HTML document, more particularly, the terms that
appear in the tags p, h1, and h2, in addition to the terms that appear in the
image's alt attribute, filename, and class-label. Moreover, this paper presents
a new term weighting scheme called VTF-IDF, short for Variable Term
Frequency-Inverse Document Frequency, which, unlike traditional schemes,
exploits the HTML tag structure and assigns an extra bonus weight to terms
that appear within particular HTML tags that are correlated with the
semantics of the image. Experiments conducted to evaluate the proposed IR model
showed a high retrieval precision rate that outpaced other current models.
Authors' comments: LACSC - Lebanese Association for Computational Sciences,
http://www.lacsc.org/; International Journal of Computer Science & Emerging
Technologies (IJCSET), Vol. 3, No. 1, February 2012
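The idea of a tag-aware term weight can be sketched as follows. This is only an illustration of the general principle stated in the abstract: the bonus values in TAG_BONUS, the additive form of the variable term frequency, and the logarithmic IDF are all assumptions, not the formula defined in the paper.

```python
import math

# Hypothetical bonus weights per HTML source; the abstract gives no values.
TAG_BONUS = {"alt": 3.0, "h1": 2.0, "h2": 1.5, "p": 0.0}


def variable_tf(occurrences):
    """Sum 1 plus the tag bonus for every (term, tag) occurrence."""
    tf = {}
    for term, tag in occurrences:
        tf[term] = tf.get(term, 0.0) + 1.0 + TAG_BONUS.get(tag, 0.0)
    return tf


def vtf_idf(docs):
    """Weight each term by its tag-adjusted frequency times log(N / df)."""
    n = len(docs)
    tfs = [variable_tf(d) for d in docs]
    df = {}
    for tf in tfs:
        for term in tf:
            df[term] = df.get(term, 0) + 1
    return [{t: w * math.log(n / df[t]) for t, w in tf.items()} for tf in tfs]


# Each document is the list of (term, tag) pairs found around one image.
docs = [
    [("sunset", "alt"), ("beach", "p"), ("photo", "p")],
    [("beach", "h1"), ("photo", "p")],
    [("city", "h1"), ("photo", "p")],
]
weights = vtf_idf(docs)
```

In this toy collection the term "beach" receives a larger weight in the second document, where it appears in an h1 tag, than in the first, where it appears in a plain paragraph.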
Eliyahu Osherovich, Michael Zibulevsky, Irad Yavneh
We present a new method for real- and complex-valued image reconstruction from two intensity measurements made in the Fourier plane: the Fourier magnitude of the unknown image, and the intensity of the interference pattern arising from superimposition of the original signal with a reference beam. This approach can provide significant advantages in digital holography since it poses less stringent requirements on the reference beam. In particular, it does not require spatial separation between the sought signal and the reference beam. Moreover, the reference beam need not be known precisely, and in fact, may contain severe errors, without leading to a deterioration in the reconstruction quality. Numerical simulations are presented to demonstrate the speed and quality of reconstruction.
Muhammad Fahad Khan, Saira Beg
This paper presents a way of transferring stereo images using SMS over a GSM
network. Generally, a stereo image is composed of two stereoscopic images
arranged in such a way that they give a three-dimensional effect when viewed.
GSM has two messaging services that can transfer images, sounds, etc.: MMS
(Multimedia Messaging Service) and EMS (Extended Messaging Service). EMS can
send predefined sounds, animations and images, but has the limitation that it
is not widely supported. MMS can send much larger content than EMS but needs
3G and other network capabilities in order to send large data of up to 1000
bytes. Other limitations are portability, content adaptation, etc. Our major
aim in this paper is to provide an alternative way of sending stereo images
over SMS, which is more widely supported than EMS. We develop an application
using the J2ME platform.
Authors' comments: 3 pages, 3 figures, Journal
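One way to carry binary image data over plain text messages is to encode it as text and split it into numbered parts. The sketch below is an assumption about how such a transfer could work, not the scheme used in the paper (which targets J2ME): the Base64 encoding, the 160-character limit and the "NNN/NNN:" header format are all hypothetical choices.

```python
import base64

SMS_LIMIT = 160   # assumed character limit of a single text message
HEADER_LEN = 8    # fixed-width "NNN/NNN:" header (hypothetical format)


def to_sms_parts(data, limit=SMS_LIMIT):
    """Encode bytes as Base64 text and split it into numbered message parts."""
    text = base64.b64encode(data).decode("ascii")
    body_len = limit - HEADER_LEN
    chunks = [text[i:i + body_len] for i in range(0, len(text), body_len)] or [""]
    total = len(chunks)
    return [f"{i + 1:03d}/{total:03d}:{chunk}" for i, chunk in enumerate(chunks)]


def from_sms_parts(parts):
    """Reorder the parts by sequence number and decode the original bytes."""
    ordered = sorted(parts, key=lambda p: int(p[:3]))
    text = "".join(p[HEADER_LEN:] for p in ordered)
    return base64.b64decode(text)


# Stand-in for the bytes of a small stereo image pair.
image_bytes = bytes(range(256)) * 2
parts = to_sms_parts(image_bytes)
restored = from_sms_parts(list(reversed(parts)))
```

The sequence header lets the receiver rebuild the image even if the messages arrive out of order.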
Awny Sayed
The continuous growth in XML information repositories has been matched by
increasing efforts in the development of XML retrieval systems, in large part
aiming at supporting content-oriented XML retrieval. These systems exploit the
available structural information, as marked up in XML documents, in order to
return document components (the so-called XML elements) instead of complete
documents in response to the user query. In this paper, we provide an
overview of the different XML information retrieval systems and classify them
according to their storage and query evaluation strategies.
Authors' comments: 10 pages, 25 references
Abdelghni Lakehal, Omar El Beqqali
Recent technological progress in the acquisition, modeling and processing of
3D data has led to the proliferation of a large number of 3D object databases.
Consequently, techniques for content-based 3D retrieval have become
necessary. In this paper, we introduce a new method for 3D object recognition
and retrieval using a set of binary images called CLI (characteristic level
images). We propose a 3D indexing and search approach based on the similarity
between characteristic level images, using Hu moments for their indexing. To
measure the similarity between 3D objects, we compute the Hausdorff distance
between their descriptor vectors. The performance of this new approach is
evaluated on a set of 3D objects from a well-known database, the NTU (National
Taiwan University) database.
Authors' comments: 10 pages, 5 figures, publication paper
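The Hausdorff distance used for comparing descriptors can be sketched in a few lines. The example assumes that each 3D object is described by a set of feature vectors (for instance one vector of Hu moments per characteristic level image); the short two-dimensional vectors and the object names below are toy values, not data from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def directed_hausdorff(A, B):
    """Largest distance from a vector in A to its nearest vector in B."""
    return max(min(euclidean(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """Symmetric Hausdorff distance between two sets of vectors."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


def rank_by_similarity(query, database):
    """Return object names ordered from most to least similar to the query."""
    return sorted(database, key=lambda name: hausdorff(query, database[name]))


# Toy descriptor sets standing in for per-image moment vectors.
database = {
    "table": [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)],
    "chair": [(0.0, 0.0), (1.0, 0.0)],
}
query = [(0.0, 0.0), (1.0, 0.0)]
ranking = rank_by_similarity(query, database)
```

A smaller Hausdorff distance means that every descriptor of one object has a close counterpart in the other, so the closest objects are returned first.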
Fidelia Ibekwe-Sanjuan, Fernandez Silvia, Sanjuan Eric, Charton Eric
We present a methodology combining surface NLP and Machine Learning
techniques for ranking abstracts and generating summaries based on annotated
corpora. The corpora were annotated with meta-semantic tags indicating the
category of information a sentence is bearing (objective, findings, newthing,
hypothesis, conclusion, future work, related work). The annotated corpus is fed
into an automatic summarizer for query-oriented abstract ranking and
multi-abstract summarization. To adapt the summarizer to these two tasks, two novel
weighting functions were devised in order to take into account the distribution
of the tags in the corpus. Results, although still preliminary, are encouraging
us to pursue this line of work and find better ways of building IR systems that
can take into account semantic annotations in a corpus.
Authors' comments: ECIR'08 Workshop on: Exploiting Semantic Annotations for Information
Retrieval, Glasgow : United Kingdom (2008)
Yin Ye, Yunshan Cao, Xin-Qi Li, Shmuel Gurvitz
We found that, in contrast with the common premise, a measurement on the
environment of an open quantum system can reduce its decoherence rate. We
demonstrate this by studying an example of an indirect measurement of a qubit,
where the information on its state is hidden in the environment. This
information is extracted by a distant device coupled with the environment. We
also show that the reduction of decoherence generated by this device is
accompanied by a diminution of the environmental noise in the vicinity of the
qubit. An interpretation of these results in terms of quantum interference on
large scales is presented.
Authors' comments: 9 pages, 8 figures, additional explanations added, Phys. Rev. B, in
press
E. Di Sciascio, F. M. Donini, M. Mongiello
We propose a structured approach to the problem of retrieval of images by content and present a description logic that has been devised for the semantic indexing and retrieval of images containing complex objects. As other approaches do, we start from low-level features extracted with image analysis to detect and characterize regions in an image. However, in contrast with feature-based approaches, we provide a syntax to describe segmented regions as basic objects and complex objects as compositions of basic ones. Then we introduce a companion extensional semantics for defining reasoning services, such as retrieval, classification, and subsumption. These services can be used for both exact and approximate matching, using similarity measures. Using our logical approach as a formal specification, we implemented a complete client-server image retrieval system, which allows a user to pose both queries by sketch and queries by example. A set of experiments has been carried out on a testbed of images to assess the retrieval capabilities of the system in comparison with expert users' rankings. Results are presented adopting a well-established measure of quality borrowed from textual information retrieval.
C. Yang, J. Qian, A. Schirotzek, F. Maia, S. Marchesini
Ptychography promises diffraction-limited resolution without the need for
high-resolution lenses. To achieve high resolution, one has to solve the phase
problem for many partially overlapping frames. Here we review some of the
existing methods for solving the ptychographic phase retrieval problem from a
numerical analysis point of view, and propose alternative methods based on
numerical optimization.
Authors' comments: 32 pages, 15 figures
Ye Ji
This report presents the results and details of a content-based image retrieval project using the Top-surf descriptor. The experimental results are preliminary; however, they show the capability of deducing objects from parts of the objects or from similar objects. This paper uses a dataset consisting of 1200 images, of which 800 images are equally divided into 8 categories, namely airplane, beach, motorbike, forest, elephants, horses, bus and building, while the other 400 images are randomly picked from the Internet. The best results are achieved for the building category.
Heiko Hellweg, Jürgen Krause, Thomas Mandl, Jutta Marx, Matthias N. O. Müller, Peter Mutschke, Robert Strötgen
The first step in handling semantic heterogeneity should be an attempt to
enrich the semantic information about documents, i.e. to fill the gaps in
the documents' metadata automatically. Section 2 describes a set of cascading
deductive and heuristic extraction rules, which were developed in the project
CARMEN for the domain of Social Sciences. The mapping between different
terminologies can be done by using intellectual, statistical and/or neural
network transfer modules. Intellectual transfers use cross-concordances between
different classification schemes or thesauri. Section 3 describes the creation,
storage and handling of such transfers.
Authors' comments: Technical Report (Arbeitsbericht) GESIS - Leibniz Institute for the
Social Sciences
Philipp Mayr, Philipp Schaer, Peter Mutschke
This paper is about a better understanding of the structure and dynamics of
science and the use of these insights to compensate for the typical problems
that arise in metadata-driven Digital Libraries. Three science model driven
retrieval services are presented: co-word analysis based query expansion,
re-ranking via Bradfordizing and author centrality. The services are evaluated
with relevance assessments from which two important implications emerge: (1)
precision values of the retrieval service are the same or better than the
tf-idf retrieval baseline and (2) each service retrieved a disjoint set of
documents. The different services each favor quite different - but still relevant -
documents than pure term-frequency based rankings. The proposed models and
derived retrieval services therefore open up new viewpoints on the scientific
knowledge space and provide an alternative framework to structure scholarly
information systems.
Authors' comments: 8 pages, 4 figures, Cologne Conference on Interoperability and
Semantics in Knowledge Organization
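Re-ranking via Bradfordizing can be illustrated with a small sketch. It assumes the common description of the technique, in which documents from the journals that contribute the most hits to a result set (the core journals) are moved to the top; the tie-breaking rule, the document identifiers and the journal names are hypothetical.

```python
from collections import Counter


def bradfordize(results):
    """Re-rank (doc_id, journal) hits so that core-journal documents come first.

    Journals are ordered by how many hits they contribute to the result set.
    The sort is stable, so ties keep the original ranking order.
    """
    counts = Counter(journal for _, journal in results)
    ranked = sorted(results, key=lambda hit: -counts[hit[1]])
    return [doc_id for doc_id, _ in ranked]


# Hypothetical result list in its original (e.g. tf-idf) order.
hits = [
    ("d1", "J. Small"),
    ("d2", "Core Journal"),
    ("d3", "Core Journal"),
    ("d4", "J. Mid"),
    ("d5", "Core Journal"),
    ("d6", "J. Mid"),
]
reranked = bradfordize(hits)
```

Here the three documents from the most productive journal rise to the top, which shows how such a service can favor different documents than a pure term-frequency ranking.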
Shiro Ikeda, Hidetoshi Kono
In this paper, we propose the SPR (sparse phase retrieval) method, which is a
new phase retrieval method for coherent x-ray diffraction imaging (CXDI).
Conventional phase retrieval methods effectively solve the problem for high
signal-to-noise ratio measurements, but would not be sufficient for single
biomolecular imaging, which is expected to be realized with femtosecond x-ray
free electron laser pulses. The SPR method is based on Bayesian statistics.
It does not need the object boundary constraint that is required by the
commonly used hybrid input-output (HIO) method; instead, a prior distribution
is defined with an exponential distribution and used for the estimation.
Simulation results demonstrate that the proposed method reconstructs the
electron density under noisy conditions even when some central pixels are
masked.
Authors' comments: 13 pages, 13 figures, submitted for a journal
Ranjeet Devarakonda, Giri Palanisamy
In recent years, there has been significant advancement in the areas of scientific data management and retrieval techniques, especially in terms of standards and protocols for archiving data. The Oak Ridge National Laboratory Distributed Data Archive Center for biogeochemical dynamics is making efforts to build advanced toolsets for these purposes. Mercury is a web-based metadata harvesting, data discovery and access system, built for researchers to search for, share and obtain biogeochemical data. Originally developed for a single National Aeronautics and Space Administration (NASA) project, Mercury is now used by more than fourteen different projects across three US federal agencies. Mercury provides various capabilities including metadata management, indexing, searching, data sharing, and software reusability.
Samir AbdelRahman, Basma Hassan, Reem Bahgat
The email retrieval task has recently attracted much attention as a way to help the user retrieve the email(s) related to a submitted query. To our knowledge, existing email retrieval ranking approaches sort the retrieved emails based on heuristic rules, which are either search clues or predefined user criteria rooted in email fields. Unfortunately, the user usually does not know which rule yields the best ranking for his query. This paper presents a new email retrieval ranking approach to tackle this problem. It ranks the retrieved emails based on a scoring function that depends on crucial email fields, namely subject, content, and sender. The paper also proposes an architecture that allows every user in a network/group of users to identify, if permissible, the most important network senders who are interested in his submitted query words. The experimental evaluation on the Enron corpus shows that our approach outperforms known email retrieval ranking approaches.
Authors' comments: 20 pages
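A field-based scoring function of the kind described above might look like the following sketch. The abstract names the fields (subject, content, sender) but not the formula, so the linear combination, the weights in WEIGHTS and the term-overlap measure are assumptions made purely for illustration.

```python
# Hypothetical field weights; the abstract does not give actual values.
WEIGHTS = {"subject": 0.5, "content": 0.3, "sender": 0.2}


def term_overlap(query_terms, text):
    """Fraction of query terms that occur in the given field text."""
    if not query_terms:
        return 0.0
    words = set(text.lower().split())
    return sum(1 for t in query_terms if t in words) / len(query_terms)


def score(query, email):
    """Weighted sum of the query overlap with each email field."""
    terms = query.lower().split()
    return sum(
        weight * term_overlap(terms, email[field])
        for field, weight in WEIGHTS.items()
    )


def rank(query, emails):
    """Return email ids ordered from highest to lowest score."""
    return sorted(emails, key=lambda eid: score(query, emails[eid]), reverse=True)


emails = {
    "e1": {"subject": "budget meeting", "content": "please review the budget", "sender": "alice"},
    "e2": {"subject": "lunch", "content": "budget talk over lunch", "sender": "bob"},
}
ranking = rank("budget", emails)
```

With these assumed weights, an email that matches the query in its subject line outranks one that matches only in its body.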
Ranjeet Devarakonda, Giri Palanisamy, Jim Green
Storing data is easy, but finding and using data is not. It is desirable that data be stored in a structured format that can be preserved and retrieved in the future. Creating metadata for the data is one way of creating structured data formats. Metadata can provide multidisciplinary data access and will foster more robust scientific discoveries. In recent years, there has been significant advancement in the areas of scientific data management and retrieval techniques, particularly in terms of standards and protocols for archiving data and metadata. New search technologies are being implemented around these protocols, which make searching easy, fast and yet robust. Scientific data is generally rich, not easy to understand, and spread across different places. In order to integrate these pieces, a data archive and associated metadata are generated. This data should be stored in a format that is locatable, retrievable and understandable; more importantly, it should be in a form, such as XML, that will continue to be accessible as technology changes.
Philipp Schaer, Philipp Mayr, Peter Mutschke
This paper is a short description of an information retrieval system enhanced
by three model driven retrieval services: (1) co-word analysis based query
expansion, re-ranking via (2) Bradfordizing and (3) author centrality. The
different services each favor quite different - but still relevant - documents
than pure term-frequency based rankings. The services can be interactively
combined with each other to allow an iterative retrieval refinement.
Authors' comments: 2 pages, 1 figure, ASIST 2010 conference, Pittsburgh, PA, USA
Veit Elser, Stefan Eisebitt
Previous criteria for the feasibility of reconstructing phase information
from intensity measurements, both in x-ray crystallography and more recently in
coherent x-ray imaging, have been based on the Maxwell constraint counting
principle. We propose a new criterion, based on Shannon's mutual information,
that is better suited for noisy data or contrast that has strong priors not
well modeled by continuous variables. A natural application is magnetic domain
imaging, where the criterion for uniqueness in the reconstruction takes the
form that the number of photons, per pixel of contrast in the image, exceeds a
certain minimum. Detailed studies of a simple model show that the uniqueness
transition is of the type exhibited by spin glasses.
Authors' comments: 19 pages, 8 figures