Adeyinka K. Akanbi, Olusanya Y. Agunbiade, Sadiq Kuti, Olumuyiwa J. Dehinbo
Much of the information on the web is geographically referenced. Discovering and
retrieving this geographic information to satisfy various users' needs across
both open and distributed Spatial Data Infrastructures (SDI) poses significant
research challenges. These challenges are mostly caused by semantic heterogeneity
in users' queries and the lack of semantic referencing of Geographic Information
(GI) metadata. To address them, this paper discusses an ontology-based,
semantically enhanced model, which explicitly represents GI metadata and
provides linked RDF instances of each entity. The system focuses on semantic
search, ontology, and efficient spatial information retrieval. In particular,
it is an integrated model that uses domain-specific information extraction to improve
the searching and retrieval of ranked spatial search results.
Authors' comments: 7 pages, 9 figures
Xi Zheng, Akanksha Bansal, Matthew Lease
We present the Bullseye system for scholarly search. Given a collection of research papers, Bullseye: 1) identifies relevant passages using any off-the-shelf algorithm; 2) automatically detects document structure and restricts retrieved passages to user-specified sections; and 3) highlights those passages for each PDF document retrieved. We evaluate Bullseye with regard to three aspects: system effectiveness, user effectiveness, and user effort. In a system-blind evaluation, users were asked to compare passage retrieval using Bullseye vs. a baseline which ignores document structure, in regard to four types of graded assessments. Results show modest improvement in system effectiveness while both user effectiveness and user effort show substantial improvement. Users also report very strong demand for passage highlighting in scholarly search across both systems considered.
Felicia Florentina Giza, Cristina Elena Turcu, Ovidiu Andrei Schipor
This paper presents an architecture for an information retrieval system that
uses the advantages offered by mobile agents to collect information from
different sources and bring the results to the calling user. Mobile agent
technology will be used to determine the traceability of a product and also
to search for information about a specific entity.
Authors' comments: 6 pages, 2 figures, in Romanian
Philip Schniter, Sundeep Rangan
In phase retrieval, the goal is to recover a signal $\mathbf{x}\in\mathbb{C}^N$ from the magnitudes of linear measurements $\mathbf{Ax}\in\mathbb{C}^M$. While recent theory has established that $M\approx 4N$ intensity measurements are necessary and sufficient to recover generic $\mathbf{x}$, there is great interest in reducing the number of measurements through the exploitation of sparse $\mathbf{x}$, which is known as compressive phase retrieval. In this work, we detail a novel, probabilistic approach to compressive phase retrieval based on the generalized approximate message passing (GAMP) algorithm. We then present a numerical study of the proposed PR-GAMP algorithm, demonstrating its excellent phase-transition behavior, robustness to noise, and runtime. Our experiments suggest that approximately $M\geq 2K\log_2(N/K)$ intensity measurements suffice to recover $K$-sparse Bernoulli-Gaussian signals for $\mathbf{A}$ with i.i.d. Gaussian entries and $K\ll N$. Meanwhile, when recovering a 6k-sparse 65k-pixel grayscale image from 32k randomly masked and blurred Fourier intensity measurements at 30~dB measurement SNR, PR-GAMP achieved an output SNR of no less than 28~dB in all of 100 random trials, with a median runtime of only 7.3 seconds. Compared to the recently proposed CPRL, sparse-Fienup, and GESPAR algorithms, our experiments suggest that PR-GAMP has a superior phase transition and orders-of-magnitude faster runtimes as the sparsity and problem dimensions increase.
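As a quick arithmetic illustration of the empirical rule quoted above, the following sketch evaluates $2K\log_2(N/K)$ next to the $M\approx 4N$ figure for generic signals. The function names and the example sizes ($N=1024$, $K=16$) are illustrative assumptions and do not come from the paper; the rule itself is reported there only for i.i.d. Gaussian $\mathbf{A}$ and $K\ll N$.

```python
import math


def sparse_measurement_estimate(n: int, k: int) -> float:
    """Empirical rule from the abstract: M >= 2 K log2(N / K), for K << N."""
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    return 2 * k * math.log2(n / k)


def generic_measurement_estimate(n: int) -> int:
    """Approximate count quoted for generic (non-sparse) signals: M ~ 4N."""
    return 4 * n


# Hypothetical problem size, chosen only so the logarithm is exact.
n, k = 1024, 16
print(sparse_measurement_estimate(n, k))  # 192.0
print(generic_measurement_estimate(n))    # 4096
```

Under these assumed sizes, the sparse rule asks for roughly 5% of the measurements that the generic bound does, which is the kind of saving that motivates compressive phase retrieval.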
Joanna K. Barstow, Neil E. Bowles, Suzanne Aigrain, Leigh N. Fletcher, Patrick G. J. Irwin, Ryan Varley, Enzo Pascale
We demonstrate the effectiveness of the Exoplanet Characterisation
Observatory mission concept for constraining the atmospheric properties of hot
and warm gas giants and super Earths. Synthetic primary and secondary transit
spectra for a range of planets are passed through EChOSim (Waldmann & Pascale
2014) to obtain the expected level of noise for different observational
scenarios; these are then used as inputs for the NEMESIS atmospheric retrieval
code and the retrieved atmospheric properties (temperature structure,
composition and cloud properties) compared with the known input values,
following the method of Barstow et al. (2013a). To correctly retrieve the
temperature structure and composition of the atmosphere to within $2\sigma$,
we find that we require: a single transit or eclipse of a hot Jupiter orbiting
a sun-like (G2) star at 35 pc to constrain the terminator and dayside
atmospheres; 20 transits or eclipses of a warm Jupiter orbiting a similar star;
10 transits/eclipses of a hot Neptune orbiting an M dwarf at 6 pc; and 30
transits or eclipses of a GJ1214b-like planet.
Authors' comments: 13 pages, 15 figures, 1 table. Accepted by Experimental Astronomy.
The final publication will shortly be available at Springer via
http://dx.doi.org/10.1007/s10686-014-9397-y
Thangaraj M, Gayathri V
With the rapid growth of digital libraries each year, the problems
of indexing and searching a digital library are increasing at a high rate. When
researchers search for earlier versions, only a few recent versions in
the back volumes can be retrieved quickly. It cannot be predicted which earlier
versions, within a specific range, researchers will require. To support
researchers, who may access any version at any time, we propose a VTAG
technique for indexing. Our experiments indicate that the proposed retrieval
technique, VTAG, retrieves any version in considerably less
time than the existing method.
Authors' comments: 6 pages, 7 figures, Published with International Journal of Computer
& Organization Trends (IJCOT)
Ardeshir Mohammad Ebtehaj, Rafael Luis Bras, Efi Foufoula-Georgiou
This paper introduces a new Bayesian approach to the inverse problem of passive microwave rainfall retrieval. The proposed methodology relies on a regularization technique and makes use of two joint dictionaries of coincidental rainfall profiles and their corresponding upwelling spectral radiative fluxes. A sequential detection-estimation strategy is adopted, which basically assumes that similar rainfall intensity values and their spectral radiances live close to some sufficiently smooth manifolds with analogous local geometry. The detection step employs a nearest neighborhood classification rule, while the estimation scheme is equipped with a constrained shrinkage estimator to ensure stability of retrieval and some physical consistency. The algorithm is examined using coincidental observations of the active precipitation radar (PR) and passive microwave imager (TMI) on board the Tropical Rainfall Measuring Mission (TRMM) satellite. We present promising results of instantaneous rainfall retrieval for some tropical storms and mesoscale convective systems over ocean, land, and coastal zones. We provide evidence that the algorithm is capable of properly capturing different storm morphologies including high intensity rain-cells and trailing light rainfall, especially over land and coastal areas. The algorithm is also validated at an annual scale for calendar year 2013 versus the standard (version 7) radar (2A25) and radiometer (2A12) rainfall products of the TRMM satellite.
Namita Mittal, Basant Agarwal, Ajay Gupta, Hemant Madhur
Developments in the ICT industry over the past few decades have enabled
quick and easy access to the information available on the internet. However,
digital literacy is a prerequisite for its use. The main purpose of this
paper is to provide an interface that lets digitally illiterate users, especially
farmers, retrieve information efficiently and effectively through the Internet.
In addition, it aims to enable farmers to identify the disease in their crop,
along with its cause and symptoms, instantly using digital image processing and
pattern recognition, without waiting for an expert to visit the farm and identify the
disease.
Authors' comments: Iconic Interface, Image Processing, Pattern Recognition, Data Mining,
Information Retrieval
Anders Friberg, Erwin Schoonderwaldt, Anton Hedblad, Marco Fabiani, Anders Elowsson
In this study, the notion of perceptual features is introduced for describing
general music properties based on human perception. This is an attempt at
rethinking the concept of features, in order to understand the underlying human
perception mechanisms. Instead of using concepts from music theory such as
tones, pitches, and chords, a set of nine features describing overall
properties of the music was selected. They were chosen from qualitative
measures used in psychology studies and motivated by an ecological approach.
The selected perceptual features were rated in two listening experiments using
two different data sets. They were modeled both from symbolic (MIDI) and audio
data using different sets of computational features. Ratings of emotional
expression were predicted using the perceptual features. The results indicate
that (1) at least some of the perceptual features are reliable estimates; (2)
emotion ratings could be predicted by a small combination of perceptual
features with an explained variance up to 90%; (3) the perceptual features
could only to a limited extent be modeled using existing audio features. The
results also clearly indicated that a small number of dedicated features were
superior to a 'brute force' model using a large number of general audio
features.
Authors' comments: submitted to the Journal of the Acoustical Society of America January
9, 2014
Gagandeep Singh, Vishal Jain
A large amount of data is present on the web. It contains a huge number of web pages, and finding suitable information among them is a very cumbersome task. There is a need to organize data in a formal manner so that users can easily access and use it. To retrieve information from documents, many Information Retrieval (IR) techniques are available. Current IR techniques are not advanced enough to exploit semantic knowledge within documents and give precise results. IR technology is a major factor responsible for handling annotations in Semantic Web (SW) languages, and the present paper discusses knowledge representation languages used for retrieving information.
Qifeng Qiao, Peter A. Beling
We propose a multiple instance learning approach to content-based retrieval of classroom video for the purpose of supporting human assessment of the learning environment. The key element of our approach is a mapping between the semantic concepts of the assessment system and features of the video that can be measured using techniques from the fields of computer vision and speech analysis. We report on a formative experiment in content-based video retrieval involving trained experts in the Classroom Assessment Scoring System, a widely used framework for assessment and improvement of learning environments. The results of this experiment suggest that our approach has potential application to productivity enhancement in assessment and to broader retrieval tasks.
Liang Zheng, Shengjin Wang, Wengang Zhou, Qi Tian
The Bag-of-Words (BoW) representation is widely used in recent
state-of-the-art image retrieval work. Typically, multiple vocabularies are
generated to correct quantization artifacts and improve recall. However, this
routine is corrupted by vocabulary correlation, i.e., overlapping among
different vocabularies. Vocabulary correlation leads to an over-counting of the
indexed features in the overlapped area, or the intersection set, thus
compromising the retrieval accuracy. In order to address the correlation
problem while preserving the benefit of high recall, this paper proposes a Bayes
merging approach to down-weight the indexed features in the intersection set.
Through explicitly modeling the correlation problem in a probabilistic view, a
joint similarity on both image- and feature-level is estimated for the indexed
features in the intersection set.
We evaluate our method through extensive experiments on three benchmark
datasets. Albeit simple, Bayes merging can be well applied in various merging
tasks, and consistently improves the baselines on multi-vocabulary merging.
Moreover, Bayes merging is efficient in terms of both time and memory cost, and
yields competitive performance compared with the state-of-the-art methods.
Authors' comments: 8 pages, 7 figures, 6 tables, accepted to CVPR 2014
David Gross, Felix Krahmer, Richard Kueng
In this work we analyze the problem of phase retrieval from Fourier
measurements with random diffraction patterns. To this end, we consider the
recently introduced PhaseLift algorithm, which expresses the problem in the
language of convex optimization. We provide recovery guarantees which require
O(log^2 d) different diffraction patterns, thus improving on recent results by
Candes et al. [arXiv:1310.3240], which require O(log^4 d) different patterns.
Authors' comments: 28 pages, in press, Applied and Computational Harmonic Analysis
(2015)
Sohan Seth, John Shawe-Taylor, Samuel Kaski
We study the task of retrieving relevant experiments given a query experiment. By experiment, we mean a collection of measurements from a set of `covariates' and the associated `outcomes'. While similar experiments can be retrieved by comparing available `annotations', this approach ignores the valuable information available in the measurements themselves. To incorporate this information in the retrieval task, we suggest employing a retrieval metric that utilizes probabilistic models learned from the measurements. We argue that such a metric is a sensible measure of similarity between two experiments since it permits inclusion of experiment-specific prior knowledge. However, accurate models are often not analytical, and one must resort to storing posterior samples, which demands considerable resources. Therefore, we study strategies to select informative posterior samples to reduce the computational load while maintaining the retrieval performance. We demonstrate the efficacy of our approach on simulated data, with simple linear regression as the models, and on real-world datasets.
Lev B Levitin, Tommaso Toffoli
We consider the physical limitations imposed on the information content of an
image by the wave and quantum nature of light, when the image is obtained by
illuminating a reflecting or transmitting planar object by natural---i.e.,
fully thermalized---light, or by observation of an object emitting incoherent
(thermal) radiation. The discreteness of the degrees of freedom and the
statistical properties of thermal radiation are taken into account. We derive
the maximum amount of information that can be retrieved from the object. This
amount is always finite and is proportional to the area of the object, the
solid angle under which the entrance pupil of the receiver is seen from the
object, and the time of observation.
An explicit expression for the information in the case where the information
recorded by the receiver obeys Planck's spectral distribution is obtained. The
amount of information per photon of recorded radiation is a universal numerical
constant, independent of the parameters of observation.
Authors' comments: 3 pages
Liang Zheng, Shengjin Wang, Ziqiong Liu, Qi Tian
In Bag-of-Words (BoW) based image retrieval, the SIFT visual word has a low
discriminative power, so false positive matches occur frequently. Apart from
the information loss during quantization, another cause is that the SIFT
feature only describes the local gradient distribution. To address this
problem, this paper proposes a coupled Multi-Index (c-MI) framework to perform
feature fusion at indexing level. Basically, complementary features are coupled
into a multi-dimensional inverted index. Each dimension of c-MI corresponds to
one kind of feature, and the retrieval process votes for images similar in both
SIFT and other feature spaces. Specifically, we exploit the fusion of local
color feature into c-MI. While the precision of visual match is greatly
enhanced, we adopt Multiple Assignment to improve recall. The joint cooperation
of SIFT and color features significantly reduces the impact of false positive
matches.
Extensive experiments on several benchmark datasets demonstrate that c-MI
improves the retrieval accuracy significantly, while consuming only half of the
query time compared to the baseline. Importantly, we show that c-MI is
complementary to many prior techniques. Combining these methods, we
obtain an mAP of 85.8% and an N-S score of 3.85 on the Holidays and Ukbench
datasets, respectively, which compare favorably with the state of the art.
Authors' comments: 8 pages, 7 figures, 6 tables. Accepted to CVPR 2014
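The coupled multi-index idea described above, where each index dimension corresponds to one kind of feature, can be sketched as a toy two-dimensional inverted index. Everything in this sketch is an assumption made for illustration: the class and method names, the integer word IDs, the uniform one-vote-per-match scoring, and the choice to apply Multiple Assignment on the color side only. The paper's actual quantizers and weighting are not reproduced here.

```python
from collections import defaultdict


class CoupledMultiIndex:
    """Toy 2-D inverted index keyed by (sift_word, color_word)."""

    def __init__(self):
        # (sift_word, color_word) -> list of image ids indexed in that cell
        self._entries = defaultdict(list)

    def add(self, image_id, sift_word, color_word):
        """Index one local feature of an image under its coupled key."""
        self._entries[(sift_word, color_word)].append(image_id)

    def query(self, features):
        """Vote for images that match in BOTH feature spaces.

        `features` is an iterable of (sift_word, [color_word, ...]); several
        color words per feature mimic Multiple Assignment to recover recall.
        """
        votes = defaultdict(int)
        for sift_word, color_words in features:
            for color_word in color_words:
                for image_id in self._entries.get((sift_word, color_word), ()):
                    votes[image_id] += 1
        # Highest vote count first; ties broken by image id for determinism.
        return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


index = CoupledMultiIndex()
index.add("img_a", 7, 2)
index.add("img_a", 9, 4)
index.add("img_b", 7, 5)  # same SIFT word as the query, different color word
ranked = index.query([(7, [2, 3]), (9, [4])])
print(ranked)  # [('img_a', 2)]
```

In this toy run, `img_b` shares SIFT word 7 with the query but receives no vote because its color word differs, which is the false-positive filtering effect the abstract attributes to coupling the two features.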
Pokkuluri Kiran Sree, Inampudi Ramesh Babu
Clustering has been widely applied to Information Retrieval (IR) on the
grounds of its potential improved effectiveness over inverted file search.
Clustering is a mostly unsupervised procedure and the majority of the
clustering algorithms depend on certain assumptions in order to define the
subgroups present in a data set. A clustering quality measure is a function
that, given a data set and its partition into clusters, returns a non-negative
real number representing the quality of that clustering. Moreover, clustering
algorithms may behave differently depending on the features of the data set and
their input parameter values. Therefore, in most applications the resulting
clustering scheme requires some sort of evaluation as regards its validity. The
quality of clustering can be enhanced by using a Cellular Automata Classifier
for information retrieval. In this study we take the view that if cellular
automata with clustering is applied to search results (query-specific
clustering), then it has the potential to increase the retrieval effectiveness
compared both to that of static clustering and of conventional inverted file
search. We conducted a number of experiments using ten document collections and
eight hierarchic clustering methods. Our results show that the effectiveness of
query-specific clustering with cellular automata is indeed higher and suggest
that there is scope for its application to IR.
Authors' comments: Journal of Computer Science 4 (2): 167-171, 2008, ISSN 1549-3636, 2008
Science Publications
Avinash N Bhute, B B Meshram
Due to the extensive use of information technology and the recent
developments in multimedia systems, the amount of multimedia data available to
users has increased exponentially. Video is an example of multimedia data as it
contains several kinds of data such as text, image, meta-data, visual and
audio. Content-based video retrieval is an approach for facilitating the
searching and browsing of large multimedia collections over the WWW. In order to
create an effective video retrieval system, visual perception must be taken
into account. We conjectured that a technique which employs multiple features
for indexing and retrieval would be more effective in the discrimination and
search tasks of videos. In order to validate this, content-based indexing and
retrieval systems were implemented using color histogram, texture features
(GLCM), edge density, and motion.
Authors' comments: 20 pages, 12 Figures. arXiv admin note: substantial text overlap with
arXiv:1211.4683
Izuddin Zainalabidin, Izyan Izzati A Halim, Faizal A Fadzil
The technological transformation and automation of digital content delivery
have revolutionized the media industry. The advertising landscape is gradually
shifting from traditional media forms to emerging Internet advertising.
In this paper, the types of Internet advertising discussed are
contextual and sponsored search ads. These types of advertising share the
central challenge of finding the best match between a given context and a
suitable advertisement through a principled method. Furthermore, there are
four main players in the Internet advertising ecosystem: users,
advertisers, ad exchanges, and publishers. To counter the
central challenge, the paper addresses two objectives: how to select
the contextual ads that best match a web page's content, and how to ensure
that there is a valuable connection between the web page and the contextual
ads. All methods, discussions, conclusions, and future recommendations are
presented in their respective sections. To demonstrate the mechanism for
matching contextual ads to web pages, web pages and an ad-matching
system are developed as a prototype.
Authors' comments: 14 pages, 3 figures, 1 table. arXiv admin note: text overlap with
arXiv:1206.1754 by other authors
Yonatan Vaizman, Brian McFee, Gert Lanckriet
Digital music has become prolific on the web in recent decades. Automated
recommendation systems are essential for users to discover music they love and
for artists to reach an appropriate audience. When manual annotations and user
preference data are lacking (e.g. for new artists) these systems must rely on
\emph{content based} methods. Besides powerful machine learning tools for
classification and retrieval, a key component for successful recommendation is
the \emph{audio content representation}.
Good representations should capture informative musical patterns in the audio
signal of songs. These representations should be concise, to enable efficient
(low storage, easy indexing, fast search) management of huge music
repositories, and should also be easy and fast to compute, to enable real-time
interaction with a user supplying new songs to the system.
Before designing new audio features, we explore the usage of traditional
local features, while adding a stage of encoding with a pre-computed
\emph{codebook} and a stage of pooling to get compact vectorial
representations. We experiment with different encoding methods, namely
\emph{the LASSO}, \emph{vector quantization (VQ)} and \emph{cosine similarity
(CS)}. We evaluate the representations' quality in two music information
retrieval applications: query-by-tag and query-by-example. Our results show
that concise representations can be used for successful performance in both
applications. We recommend using top-$\tau$ VQ encoding, which consistently
performs well in both applications, and requires much less computation time
than the LASSO.
Authors' comments: Journal paper. Submitted to IEEE transactions on Audio, Speech and
Language Processing. Submitted on Dec 18th, 2013
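The encode-then-pool pipeline described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the abstract names top-$\tau$ VQ and a pooling stage but does not give their exact definitions, so the binary activation of the $\tau$ nearest codewords, the squared Euclidean distance, and mean pooling are all illustrative choices, as are the function names and the tiny hand-made codebook.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_tau_vq(feature, codebook, tau):
    """Binary code with ones at the tau codewords nearest to `feature`.

    The binary activation is an assumption made for this sketch.
    """
    order = sorted(
        range(len(codebook)),
        key=lambda j: squared_distance(feature, codebook[j]),
    )
    code = [0.0] * len(codebook)
    for j in order[:tau]:
        code[j] = 1.0
    return code


def mean_pool(codes):
    """Average the per-frame codes into one fixed-length vector per song."""
    n = len(codes)
    return [sum(column) / n for column in zip(*codes)]


# Hypothetical 2-D codebook and two local feature frames of one song.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
frames = [[0.1, 0.0], [0.1, 0.9]]
song_vector = mean_pool([top_tau_vq(f, codebook, tau=2) for f in frames])
print(song_vector)  # [1.0, 0.5, 0.5, 0.0]
```

The pooled vector has one entry per codeword regardless of how many frames the song contains, which is what makes such representations compact and easy to index.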