Olivier Morère, Antoine Veillard, Jie Lin, Julie Petta, Vijay Chandrasekhar, Tomaso Poggio
Most image instance retrieval pipelines are based on the comparison of vectors known as global image descriptors between a query image and the database images. Due to their success in large-scale image classification, representations extracted from Convolutional Neural Networks (CNN) are quickly gaining ground on Fisher Vectors (FVs) as state-of-the-art global descriptors for image instance retrieval. While CNN-based descriptors are generally noted for good retrieval performance at lower bitrates, they nevertheless present a number of drawbacks, including a lack of robustness to common object transformations such as rotations compared with their interest-point-based FV counterparts. In this paper, we propose a method for computing invariant global descriptors from CNNs. Our method implements a recently proposed mathematical theory for invariance in a sensory cortex modeled as a feedforward neural network. The resulting global descriptors can be made invariant to multiple arbitrary transformation groups while retaining good discriminativeness. Based on a thorough empirical evaluation using several publicly available datasets, we show that our method significantly and consistently improves retrieval results every time a new type of invariance is incorporated. We also show that our method, which has few parameters, is not prone to overfitting: improvements generalize well across datasets with different properties with regard to invariances. Finally, we show that our descriptors compare favourably to other state-of-the-art compact descriptors in similar bit ranges, exceeding the highest retrieval results reported in the literature on some datasets. A dedicated dimensionality reduction step (quantization or hashing) may further improve the competitiveness of the descriptors.
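To make the invariance idea concrete, here is a minimal Python sketch in the spirit of pooling over transformations, not the authors' implementation. It assumes a hypothetical `cnn_features` stand-in for a real CNN and uses only the finite group of 90-degree rotations: features are computed for every rotated copy of the image (the group orbit) and then pooled with the mean and the max, so any rotation of the input yields the same descriptor.

```python
from statistics import mean


def rot90(img):
    """Rotate a square image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def cnn_features(img):
    """Hypothetical stand-in for a CNN feature extractor.

    Responses depend on pixel position, so they are NOT rotation invariant.
    """
    n = len(img)
    feats = []
    for k in range(1, 4):
        s = sum(img[r][c] * ((r + 1) ** k - (c + 1)) for r in range(n) for c in range(n))
        feats.append(max(0.0, s))  # ReLU-like non-linearity
    return feats


def orbit(img):
    """All four images obtained by rotating `img` in steps of 90 degrees."""
    views = [img]
    for _ in range(3):
        views.append(rot90(views[-1]))
    return views


def invariant_descriptor(img):
    """Pool each feature over the rotation orbit with the mean and the max."""
    feats = [cnn_features(v) for v in orbit(img)]
    dims = range(len(feats[0]))
    avg = [mean(f[d] for f in feats) for d in dims]
    mx = [max(f[d] for f in feats) for d in dims]
    return avg + mx
```

For a finite group like this one the pooled descriptor is exactly invariant, because rotating the input only permutes its orbit. For continuous transformations (arbitrary angles, scales) one would sample the group, and invariance then holds only approximately.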
Mengchen Liu, Shixia Liu, Xizhou Zhu, Qinying Liao, Furu Wei, Shimei Pan
Although there has been a great deal of interest in analyzing customer opinions and breaking news in microblogs, progress has been hampered by the lack of an effective mechanism to discover and retrieve data of interest from microblogs. To address this problem, we have developed an uncertainty-aware visual analytics approach to retrieve salient posts, users, and hashtags. We extend an existing ranking technique to compute a multifaceted retrieval result: the mutual reinforcement rank of a graph node, the uncertainty of each rank, and the propagation of uncertainty among different graph nodes. To illustrate the three facets, we have also designed a composite visualization with three visual components: a graph visualization, an uncertainty glyph, and a flow map. The graph visualization with glyphs, the flow map, and the uncertainty analysis together enable analysts to effectively find the most uncertain results and interactively refine them. We have applied our approach to several Twitter datasets. Qualitative evaluation and two real-world case studies demonstrate the promise of our approach for retrieving high-quality microblog data.
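The mutual-reinforcement idea can be illustrated with a HITS-style iteration on a bipartite user-post graph: a post scores highly when high-scoring users interact with it, and vice versa. This is a minimal sketch under that assumption; the edge semantics, the L2 normalisation and the fixed iteration count are illustrative choices, and neither the authors' extended ranking nor its uncertainty propagation is reproduced.

```python
import math


def _l2_normalise(scores):
    norm = math.sqrt(sum(x * x for x in scores)) or 1.0
    return [x / norm for x in scores]


def mutual_reinforcement(edges, n_users, n_posts, iters=50):
    """HITS-style mutual reinforcement on a bipartite user-post graph.

    edges: (user, post) index pairs, e.g. "user posted or retweeted post" (hypothetical).
    Returns (user_scores, post_scores), each L2-normalised.
    """
    users = [1.0] * n_users
    posts = [1.0] * n_posts
    for _ in range(iters):
        # A post is important if important users interact with it ...
        new_posts = [0.0] * n_posts
        for u, p in edges:
            new_posts[p] += users[u]
        # ... and a user is important if they interact with important posts.
        new_users = [0.0] * n_users
        for u, p in edges:
            new_users[u] += new_posts[p]
        posts = _l2_normalise(new_posts)
        users = _l2_normalise(new_users)
    return users, posts
```

Extending the same loop to a third node type (hashtags) only adds another pair of propagation steps.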
Giorgos Tolias, Ronan Sicre, Hervé Jégou
Recently, image representations built upon Convolutional Neural Networks (CNN) have been shown to provide effective descriptors for image search, outperforming pre-CNN features as short-vector representations. Yet such models are not compatible with geometry-aware re-ranking methods and are still outperformed, on some particular object retrieval benchmarks, by traditional image search systems relying on precise descriptor matching, geometric re-ranking, or query expansion. This work revisits both retrieval stages, namely initial search and re-ranking, by employing the same primitive information derived from the CNN. We build compact feature vectors that encode several image regions without the need to feed multiple inputs to the network. Furthermore, we extend integral images to handle max-pooling on convolutional layer activations, allowing us to efficiently localize matching objects. The resulting bounding box is finally used for image re-ranking. As a result, this paper significantly improves the existing CNN-based recognition pipeline: we report for the first time results competing with traditional methods on the challenging Oxford5k and Paris6k datasets.
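The integral-image extension can be sketched as follows, assuming (as a labeled assumption) that region max-pooling is approximated by a generalized mean, (Σ x^p)^(1/p), which tends to the true maximum of non-negative activations as p grows. Once an integral image of x^p is built per channel, this approximation is available for any box in constant time. The hypothetical `regional_descriptor` helper then L2-normalises one vector per region and sums them, in the style of regional max-pooled descriptors; the PCA-whitening step such pipelines usually apply is omitted.

```python
import math


def integral_image(a):
    """ii[r][c] holds the sum of a[i][j] for all i < r and j < c (zero-padded)."""
    h, w = len(a), len(a[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        for c in range(w):
            ii[r + 1][c + 1] = a[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c]
    return ii


def box_sum(ii, r0, c0, r1, c1):
    """Sum over rows r0..r1-1 and columns c0..c1-1 in constant time."""
    return ii[r1][c1] - ii[r0][c1] - ii[r1][c0] + ii[r0][c0]


def approx_region_max(ii_pow, box, p):
    """(sum of x**p over the box) ** (1/p): tends to the box max for x >= 0 as p grows."""
    return max(box_sum(ii_pow, *box), 0.0) ** (1.0 / p)


def _l2n(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def regional_descriptor(channels, regions, p=10.0):
    """channels: list of HxW non-negative activation maps (one per feature channel).
    regions: list of (r0, c0, r1, c1) boxes. Returns one L2-normalised vector."""
    iis = [integral_image([[x ** p for x in row] for row in ch]) for ch in channels]
    total = [0.0] * len(channels)
    for box in regions:
        v = _l2n([approx_region_max(ii, box, p) for ii in iis])
        total = [t + x for t, x in zip(total, v)]
    return _l2n(total)
```

The approximation always over-estimates the true maximum slightly, since the sum of x^p is at least max(x)^p; a larger p tightens it at the cost of numerical range.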
Dirk Lewandowski
Purpose - To test major Web search engines on their performance on navigational queries, i.e. searches for homepages. Design/methodology/approach - 100 real user queries are posed to six search engines (Google, Yahoo, MSN, Ask, Seekport, and Exalead). Users described the desired pages, and the positions of these pages in the results lists are recorded. The measures success N and mean reciprocal rank are calculated. Findings - Performance of the major search engines Google, Yahoo, and MSN is best, with around 90 percent of queries answered correctly. Ask and Exalead perform worse but receive good scores as well. Research limitations/implications - All queries were in German, and the German-language interfaces of the search engines were used. Therefore, the results are only valid for German queries. Practical implications - When designing a search engine to compete with the major search engines, attention should be paid to performance on navigational queries, as users' quality ratings of search engines can easily be influenced by this performance. Originality/value - This study systematically compares the major search engines on navigational queries and compares the findings with studies on the retrieval effectiveness of the engines on informational queries. Paper type - research paper
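Both measures are easy to compute from the recorded result positions. A minimal sketch, assuming the position of the desired page is stored per query as a 1-based rank, or `None` when it was not found; the ranks used below are made up for illustration and are not the study's data.

```python
def success_at_n(ranks, n):
    """Share of queries whose desired page appears at rank <= n (None = not found)."""
    return sum(1 for r in ranks if r is not None and r <= n) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries; a query without a hit contributes 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)
```

For the hypothetical ranks `[1, 1, 3, None, 2]`, success at 1 is 0.4, success at 5 is 0.8, and the mean reciprocal rank is (1 + 1 + 1/3 + 0 + 1/2) / 5, about 0.567.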
Dirk Lewandowski
Purpose: To compare five major Web search engines (Google, Yahoo, MSN,
Ask.com, and Seekport) for their retrieval effectiveness, taking into account
not only the results but also the results descriptions.
Design/Methodology/Approach: The study uses real-life queries. Results are
made anonymous and are randomised. Results are judged by the persons posing the
original queries.
Findings: The two major search engines, Google and Yahoo, perform best, and
there are no significant differences between them. Google delivers
significantly more relevant result descriptions than any other search engine.
This could be one reason for users perceiving this engine as superior.
Research Limitations: The study is based on a user model in which the user takes
into account a certain number of results rather systematically. This may not be
the case in real life.
Practical Implications: Search engines should focus on relevant
descriptions. Searchers are advised to use other search engines in addition to
Google.
Originality/Value: This is the first major study to compare results and
descriptions systematically, and it proposes new retrieval measures that take
results descriptions into account.
Authors' comments: Research paper, World Wide Web, search engines, retrieval
effectiveness, results descriptions, retrieval measures
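Judging results and their descriptions separately yields two relevance lists per query. The sketch below computes precision for each list and cross-tabulates where they disagree (for example, a relevant-looking description that leads to an irrelevant page). It is only an illustration with hypothetical judgements, not the new retrieval measures the paper proposes.

```python
def precision_at_k(judgements, k):
    """Share of relevant items among the top k of a ranked list of True/False judgements."""
    top = judgements[:k]
    return sum(top) / len(top) if top else 0.0


def cross_tab(result_rel, desc_rel):
    """Count how description relevance and result relevance (dis)agree, position by position."""
    counts = {"both": 0, "description_only": 0, "result_only": 0, "neither": 0}
    for res, desc in zip(result_rel, desc_rel):
        if res and desc:
            counts["both"] += 1
        elif desc:
            counts["description_only"] += 1
        elif res:
            counts["result_only"] += 1
        else:
            counts["neither"] += 1
    return counts
```

The `description_only` cell is the interesting one for users: it counts results whose snippets promise more than the page delivers.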
Dirk Lewandowski
This chapter presents a theoretical framework for evaluating next-generation search engines. We focus on search engines whose results presentation is enriched with additional information and does not merely present the usual list of "10 blue links", that is, ten links to results, each accompanied by a short description. While Web search is used as an example here, the framework can easily be applied to search engines in any other area. The framework not only addresses the results presentation but also takes into account an extension of the general design of retrieval effectiveness tests. The chapter examines the ways in which this design might influence the results of such studies and how a reliable test is best designed.
Aiwen Jiang, Hanxi Li, Yi Li, Mingwen Wang
The heterogeneity gap between modalities has emerged as one of the critical issues in modern AI problems. Unlike traditional uni-modal cases, where raw features are extracted and directly compared, the heterogeneous nature of cross-modal tasks requires the intrinsic semantic representations to be compared in a unified framework. This paper studies the learning of different representations that can be retrieved across different modality contents. A novel approach for mining cross-modal representations is proposed by incorporating explicit linear semantic projection in Hilbert space. The insight is that the discriminative structures of different modality data can be linearly represented in appropriate high-dimensional Hilbert spaces, where linear operations can be used to approximate nonlinear decisions in the original spaces. As a result, an efficient linear semantic down-mapping is jointly learned for multimodal data, leading to a common space where they can be compared. The mechanism of "feature up-lifting and down-projecting" works seamlessly as a whole and accomplishes cross-modal retrieval tasks very well. The proposed method, named shared discriminative semantic representation learning (SDSRL), is tested on two public multimodal datasets for both within- and inter-modal retrieval. The experiments demonstrate that it outperforms several state-of-the-art methods in most scenarios.
Imran Sheikh, Irina Illina, Dominique Fohr, Georges Linarès
Many Proper Names (PNs) are Out-Of-Vocabulary (OOV) words for speech
recognition systems used to process diachronic audio data. To help recover
the PNs missed by the system, relevant OOV PNs can be retrieved out of the many
OOVs by exploiting the semantic context of the spoken content. In this paper, we
propose two neural network models targeted to retrieve OOV PNs relevant to an
audio document: (a) Document level Continuous Bag of Words (D-CBOW), (b)
Document level Continuous Bag of Weighted Words (D-CBOW2). Both models
take document words as input and learn with an objective to maximise the
retrieval of co-occurring OOV PNs. With the D-CBOW2 model we propose a new
approach in which the input embedding layer is augmented with a context anchor
layer. This layer learns to assign importance to input words and has the
ability to capture (task-specific) keywords in a bag-of-words neural network
model. With experiments on French broadcast news videos we show that these two
models outperform the baseline methods based on raw embeddings from LDA,
Skip-gram and Paragraph Vectors. Combining the D-CBOW and D-CBOW2 models gives
faster convergence during training.
Authors' comments: Updated references, added appendix discussing more results; added
more discussion, replaced simple phone search results with KWS results; added
KWS results for both training phases, probably last update
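The difference between the two models lies in how input word embeddings are combined. The forward pass below is a minimal sketch under stated assumptions: D-CBOW is approximated by a plain average of word embeddings, and the D-CBOW2 context anchor layer by per-word importance scores passed through a softmax to weight that average. The embeddings, anchor scores and proper-name vocabulary (`PN_sport`, `PN_politics`) are hypothetical and untrained; training would maximise the probability of the OOV PNs that co-occur with each document.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def doc_vector(words, emb, anchor=None):
    """Combine input word embeddings into one document vector.

    anchor=None: plain average (D-CBOW-like). Otherwise `anchor` maps each word to an
    importance score; softmax-normalised scores weight the average (our reading of
    the context anchor layer, not the authors' exact formulation).
    """
    if anchor is None:
        weights = [1.0 / len(words)] * len(words)
    else:
        weights = softmax([anchor[w] for w in words])
    dim = len(emb[words[0]])
    return [sum(wt * emb[w][d] for wt, w in zip(weights, words)) for d in range(dim)]


def pn_probabilities(doc_vec, pn_out):
    """Softmax over candidate OOV proper names, scored by dot product with output vectors."""
    names = list(pn_out)
    scores = [sum(a * b for a, b in zip(doc_vec, pn_out[n])) for n in names]
    return dict(zip(names, softmax(scores)))
```

With uniform anchor scores the weighted version reduces to the plain average; a high score on a single key-word pulls the document vector, and therefore the retrieved PNs, towards that word.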
Laura Mersini-Houghton
The retrieval of black hole information was recently presented in two
interesting proposals at the 'Hawking Radiation' conference: a revised version
by G. 't Hooft of a proposal he initially suggested 20 years ago, and a new
proposal by S. Hawking. Both proposals address the problem of black hole
information loss at the classical level and derive an expression for the
scattering matrix. The former uses the gravitational back reaction of incoming
particles, which imprints their information on the outgoing modes. The latter
uses the supertranslation symmetry of horizons to relate the outgoing wave
packets to their incoming partners through a phase delay. The difficulty in both
proposals is that the entropy obtained from them appears to be infinite.
By including quantum effects in Hawking's and 't Hooft's proposals, I
show that a subtlety arising from the inescapable measurement process, the
Quantum Zeno Effect, not only tames the divergences but actually recovers the
correct Bekenstein-Hawking entropy law of black holes, $1/4$ of the horizon area.
Authors' comments: 9 pgs., 1 figure
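For reference, the Bekenstein-Hawking law states that a black hole's entropy is one quarter of its horizon area in Planck units, S = k_B A / (4 l_P^2) with l_P^2 = G ħ / c^3. The short sketch below evaluates this textbook formula for a non-rotating (Schwarzschild) black hole; it illustrates the target result only and is not the paper's derivation via the Quantum Zeno Effect. Constants are rounded SI values.

```python
import math

G = 6.67430e-11          # gravitational constant, m^3 kg^-1 s^-2
HBAR = 1.054571817e-34   # reduced Planck constant, J s
C = 2.99792458e8         # speed of light, m s^-1
M_SUN = 1.989e30         # solar mass, kg


def schwarzschild_area(mass_kg):
    """Horizon area 4*pi*r_s^2 with r_s = 2GM/c^2."""
    r_s = 2 * G * mass_kg / C**2
    return 4 * math.pi * r_s**2


def bekenstein_hawking_entropy(mass_kg):
    """Entropy in units of k_B: S / k_B = A / (4 l_P^2), with l_P^2 = G*hbar/c^3."""
    planck_area = G * HBAR / C**3
    return schwarzschild_area(mass_kg) / (4 * planck_area)
```

For one solar mass this gives roughly 10^77, and doubling the mass quadruples the entropy because the horizon area grows with the square of the mass.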
S. R. Kane, S. D. Domagal-Goldman, J. R. Herman, T. D. Robinson, A. R. Stine
The field of exoplanets has rapidly expanded from an exclusive focus on
exoplanet detection to include exoplanet characterization. A key step towards
this characterization will be retrieval of planetary albedos and rotation rates
from highly undersampled imaging data. The Deep Space Climate Observatory
(DSCOVR) provides a unique opportunity to test such retrieval methods using
high-cadence data of the sunlit surface of the Earth. There are two NASA
instruments on board DSCOVR that can be used to achieve this task: the Earth
Polychromatic Imaging Camera (EPIC) and the National Institute of Standards and
Technology Advanced Radiometer (NISTAR). Here we
briefly describe the properties of these instruments and the exoplanetary
science that can be explored with their data products. These are described
within the context of future NASA direct imaging missions for exoplanets.
Authors' comments: 3 pages, 2 figures; to appear in the proceedings of the Comparative
Climates of Terrestrial Planets II conference
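As a toy illustration of retrieving a rotation rate from sparse, disk-integrated photometry, the sketch below scans trial periods with a classical periodogram on a synthetic light curve. Everything here is a hypothetical assumption: the 24-hour period, the 2.2-hour sampling interval, the signal shape, and the choice of a plain periodogram rather than whatever retrieval method a mission team would use.

```python
import math


def periodogram_power(times, values, period):
    """Classical (Schuster) periodogram power of mean-subtracted data at one trial period."""
    mu = sum(values) / len(values)
    w = 2 * math.pi / period
    c = sum((v - mu) * math.cos(w * t) for t, v in zip(times, values))
    s = sum((v - mu) * math.sin(w * t) for t, v in zip(times, values))
    return (c * c + s * s) / len(values)


def best_period(times, values, p_min, p_max, step):
    """Trial period (same units as `times`) with the highest periodogram power."""
    n_steps = int(round((p_max - p_min) / step))
    trials = [p_min + i * step for i in range(n_steps + 1)]
    return max(trials, key=lambda p: periodogram_power(times, values, p))


# Synthetic, hypothetical light curve: 24 h rotation, one sample every 2.2 h for ~10 days.
times = [2.2 * i for i in range(110)]
flux = [1.0 + 0.3 * math.sin(2 * math.pi * t / 24) + 0.1 * math.sin(4 * math.pi * t / 24 + 0.5)
        for t in times]
estimate = best_period(times, flux, 15.0, 40.0, 0.05)
```

Restricting the trial range to 15-40 hours keeps the search away from the 12-hour harmonic in the synthetic signal; with real data, aliasing from the sampling cadence would need the same care.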
Jie Lin, Olivier Morère, Julie Petta, Vijay Chandrasekhar, Antoine Veillard
A typical image retrieval pipeline starts with the comparison of global descriptors from a large database to find a short list of candidate matches. A good image descriptor is key to the retrieval pipeline and should reconcile two contradictory requirements: providing recall rates as high as possible and being as compact as possible for fast matching. Following the recent successes of Deep Convolutional Neural Networks (DCNN) for large-scale image classification, descriptors extracted from DCNNs are increasingly used in place of traditional hand-crafted descriptors such as Fisher Vectors (FV), with better retrieval performance. Nevertheless, the dimensionality of a typical DCNN descriptor, extracted either from the visual feature pyramid or from the fully-connected layers, remains quite high at several thousand scalar values. In this paper, we propose Unsupervised Triplet Hashing (UTH), a fully unsupervised method to compute extremely compact binary hashes (in the 32-256 bit range) from high-dimensional global descriptors. UTH consists of two successive deep learning steps. First, Stacked Restricted Boltzmann Machines (SRBM), a type of unsupervised deep neural network, are used to learn binary embedding functions able to bring the descriptor size down to the desired bitrate. SRBMs are typically able to ensure a very high compression rate at the expense of losing some desirable metric properties of the original DCNN descriptor space. Then, triplet networks, a rank learning scheme based on weight-sharing networks, are used to fine-tune the binary embedding functions to retain as much as possible of the useful metric properties of the original space. A thorough empirical evaluation conducted on multiple publicly available datasets using DCNN descriptors shows that our method significantly outperforms state-of-the-art unsupervised schemes in the target bit range.
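The triplet fine-tuning step rests on a ranking objective: an anchor descriptor should end up closer to a matching (positive) descriptor than to a non-matching (negative) one by some margin. Below is a minimal sketch of such a hinge-style triplet loss computed on tanh-relaxed codes, together with the sign binarisation and Hamming distance used at search time. The margin, the tanh relaxation and the inputs are assumptions; no SRBM, network or gradient update is shown.

```python
import math


def binarize(code):
    """Map a real-valued embedding to a {0, 1} hash by sign thresholding."""
    return [1 if x > 0 else 0 for x in code]


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return sum(x != y for x, y in zip(a, b))


def relaxed_triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss max(0, d(a, p) - d(a, n) + margin) on tanh-relaxed codes."""
    a, p, n = ([math.tanh(x) for x in v] for v in (anchor, positive, negative))
    d_ap = sum((x - y) ** 2 for x, y in zip(a, p))
    d_an = sum((x - y) ** 2 for x, y in zip(a, n))
    return max(0.0, d_ap - d_an + margin)
```

The tanh relaxation keeps the loss differentiable during training, while the sign threshold produces the compact binary hashes compared by Hamming distance at query time.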
Adrian Groza, Lidia Corde
Our aim is to extract information about literary characters in unstructured
texts. We employ natural language processing and reasoning on domain
ontologies. The first task is to identify the main characters and the parts of
the story where these characters are described or act. We illustrate the system
in a scenario in the folktale domain. The system relies on a folktale ontology
that we have developed based on Propp's model of folktale morphology.
Authors' comments: IEEE 11th International Conference on Intelligent Computer
Communication and Processing (ICCP2015), Cluj-Napoca, Romania, 3-5 September
2015
Li-Hao Yeh, Jonathan Dong, Jingshan Zhong, Lei Tian, Michael Chen, Gongguo Tang, Mahdi Soltanolkotabi, Laura Waller
Fourier ptychography is a new computational microscopy technique that provides gigapixel-scale intensity and phase images with both wide field-of-view and high resolution. By capturing a stack of low-resolution images under different illumination angles, a nonlinear inverse algorithm can be used to computationally reconstruct the high-resolution complex field. Here, we compare and classify multiple proposed inverse algorithms in terms of experimental robustness. We find that the main sources of error are noise, aberrations and mis-calibration (i.e. model mis-match). Using simulations and experiments, we demonstrate that the choice of cost function plays a critical role, with amplitude-based cost functions performing better than intensity-based ones. The reason for this is that Fourier ptychography datasets consist of images from both brightfield and darkfield illumination, representing a large range of measured intensities. Both noise (e.g. Poisson noise) and model mis-match errors are shown to scale with intensity. Hence, algorithms that use an appropriate cost function will be more tolerant to both noise and model mis-match. Given these insights, we propose a global Newton's method algorithm which is robust and computationally efficient. Finally, we discuss the impact of procedures for algorithmic correction of aberrations and mis-calibration.
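The intensity-scaling argument can be checked numerically. The sketch defines the two cost functions and compares a bright (brightfield) pixel with a dark (darkfield) one, each measured about one Poisson standard deviation (√I) too high. The intensities and fields are hypothetical, and the global Newton's method itself is not shown.

```python
import math


def intensity_cost(measured, predicted_field):
    """Sum over pixels of (I_measured - |psi|^2)^2."""
    return sum((i - abs(f) ** 2) ** 2 for i, f in zip(measured, predicted_field))


def amplitude_cost(measured, predicted_field):
    """Sum over pixels of (sqrt(I_measured) - |psi|)^2."""
    return sum((math.sqrt(i) - abs(f)) ** 2 for i, f in zip(measured, predicted_field))


# Hypothetical pixels: true fields 100 (brightfield) and 2j (darkfield), i.e. intensities
# 10000 and 4, each measured about one Poisson standard deviation (sqrt(I)) too high.
true_field = [100 + 0j, 2j]
measured = [10000.0 + 100.0, 4.0 + 2.0]
for name, cost in (("intensity", intensity_cost), ("amplitude", amplitude_cost)):
    bright, dark = (cost([m], [f]) for m, f in zip(measured, true_field))
    print(f"{name}: bright {bright:.3f}, dark {dark:.3f}")
```

With these numbers the intensity-based cost is almost entirely driven by the bright pixel (10000 versus 4), whereas the amplitude-based cost weights both pixels comparably (about 0.25 versus 0.20), since the square root roughly stabilises the variance of Poisson noise.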
A. D. Koulouklidis, V. Yu. Fedorov, S. Tzortzakis
Most transmission and detection channels fail to faithfully support broadband
wave packets because of physical limitations, like chromatic dispersion and
absorption. We explore the case of lossy detection of ultrashort THz pulses
using the widespread electro-optic detection scheme. We demonstrate that one
can fully recover the original THz pulse shape, duration and amplitude, using a
simple experimental procedure and a reconstruction algorithm which encodes the
physical properties of the detection system.
Authors' comments: 8 pages, 5 figures
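One generic way to undo a known, lossy detector response is regularised (Wiener-style) deconvolution in the frequency domain. The sketch below is only an illustration under assumptions: a hypothetical three-tap response stands in for the electro-optic detection system, convolution is circular, and a naive DFT keeps the code dependency-free. The paper's own algorithm, which encodes the physical properties of the detection system, is not reproduced here.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse includes the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def circular_convolve(x, h):
    """Simulated lossy detection: the input pulse circularly convolved with the response."""
    n = len(x)
    return [sum(x[j] * h[(i - j) % n] for j in range(n)) for i in range(n)]


def wiener_deconvolve(measured, response, eps=1e-9):
    """Estimate the input via X = Y * conj(H) / (|H|^2 + eps); eps tames frequencies where H is weak."""
    Y = dft(measured)
    H = dft(response)
    X = [y * h.conjugate() / (abs(h) ** 2 + eps) for y, h in zip(Y, H)]
    return [v.real for v in dft(X, inverse=True)]
```

With noisy measurements `eps` would be raised towards the noise-to-signal power ratio, trading some fidelity at weakly detected frequencies for stability.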
Philipp Mayr, Ingo Frommholz, Guillaume Cabanac
The BIR workshop brings together experts in Bibliometrics and Information
Retrieval. While sometimes perceived as rather loosely related, these research
areas share various interests and face similar challenges. Our motivation as
organizers of the BIR workshop stemmed from a twofold observation. First, both
communities only partly overlap, albeit sharing various interests. Second, it
will be profitable for both sides to tackle some of the emerging problems that
scholars face today when they have to identify relevant and high quality
literature in the fast-growing number of electronic publications available
worldwide. Bibliometric techniques are not yet used widely to enhance retrieval
processes in digital libraries, although they offer value-added effects for
users. Information professionals working in libraries and archives, however,
are increasingly confronted with applying bibliometric techniques in their
services. The first BIR workshop in 2014 set the research agenda by introducing
each group to the other, illustrating state-of-the-art methods, reporting on
current research problems, and brainstorming about common interests. The second
workshop in 2015 further elaborated these themes. This third BIR workshop aims
to foster a common ground for the incorporation of bibliometric-enhanced
services into scholarly search engine interfaces. In particular, we will address
specific communities, as well as studies on large, cross-domain collections
like Mendeley and ResearchGate. This third BIR workshop explicitly addresses
both scholarly and industrial researchers.
Authors' comments: 4 pages, 38th European Conference on IR Research, ECIR 2016, Padova,
Italy. arXiv admin note: substantial text overlap with arXiv:1501.02646
Mihaela Mitici, Jasper Goseling, Maurits de Graaf, Richard J. Boucherie
We consider the problem of retrieving a reliable estimate of an attribute
monitored by a wireless sensor network, where the sensors harvest energy from
the environment independently, at random. Each sensor stores the harvested
energy in batteries of limited capacity. Moreover, provided they have
sufficient energy, the sensors broadcast their measurements in a decentralized
fashion. Clients arrive at the sensor network according to a Poisson process
and are interested in retrieving a fixed number of sensor measurements, based
on which a reliable estimate is computed. We show that the time until an
arbitrary sensor broadcasts has a phase-type distribution. Based on this result
and the theory of order statistics of phase-type distributions, we determine
the probability distribution of the time needed for a client to retrieve a
reliable estimate of an attribute monitored by the sensor network. We also
provide closed-form expressions for the retrieval time of a reliable estimate
when the capacity of the sensor battery or the rate at which energy is
harvested is asymptotically large. In addition, we analyze numerically the
retrieval time of a reliable estimate for various sizes of the sensor network,
maximum capacity of the sensor batteries and rate at which energy is harvested.
These results show that the energy harvesting rate and the broadcasting rate
are the main parameters that influence the retrieval time of a reliable
estimate, while deploying sensors with large batteries does not significantly
reduce the retrieval time.
Authors' comments: 14 pages, 3 figures
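The order-statistics argument can be illustrated in its simplest special case. Assume, purely for illustration, that each sensor's time until broadcast is exponential with rate `rate` (the simplest phase-type distribution, ignoring battery state and harvesting dynamics). A client needing k measurements then waits for the k-th smallest of n broadcast times, whose mean has a well-known closed form; the Monte Carlo function checks it.

```python
import random


def expected_kth_exponential(n, k, rate):
    """Mean of the k-th smallest of n i.i.d. Exp(rate) times: sum_{i<k} 1 / ((n - i) * rate)."""
    return sum(1.0 / ((n - i) * rate) for i in range(k))


def simulate_retrieval_time(n, k, rate, trials=20000, seed=1):
    """Monte Carlo estimate of the mean time until k of n sensors have broadcast."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        times = sorted(rng.expovariate(rate) for _ in range(n))
        total += times[k - 1]
    return total / trials
```

The closed form shows why more sensors help: each term 1/((n - i) * rate) shrinks as n grows. The general phase-type case in the paper replaces these exponentials with distributions that depend on battery capacity and harvesting rate.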
Peng Zhang, Qian Yu, Yuexian Hou, Dawei Song, Jingfei Li, Bin Hu
Recently, a Distribution Separation Method (DSM) was proposed for relevance feedback in information retrieval, which aims to approximate the true relevance distribution by separating a seed irrelevance distribution from the mixture distribution. While DSM achieved promising empirical performance, its theoretical properties still need further study, as does its comparison with other related retrieval models. In this article, we first generalize DSM's theoretical properties by proving that its minimum correlation assumption is equivalent to the maximum (original and symmetrized) KL-divergence assumption. Second, we analytically show that the EM algorithm in a well-known Mixture Model is essentially a distribution separation process and can be simplified using the linear separation algorithm in DSM. Some empirical results are also presented to support our theoretical analysis.
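The linear separation step can be sketched by assuming the mixture (feedback) term distribution is a convex combination of the relevance and irrelevance distributions, m = λ r + (1 - λ) s, and solving for r. In this minimal sketch λ is given, whereas DSM relies on its minimum correlation assumption to determine it; negative values are clipped before renormalising, and the term distributions are hypothetical.

```python
def separate(mixture, irrelevant, lam):
    """Solve mixture = lam * relevant + (1 - lam) * irrelevant for `relevant`.

    Negative estimates are clipped to zero and the result is renormalised.
    """
    raw = {t: (mixture[t] - (1 - lam) * irrelevant.get(t, 0.0)) / lam for t in mixture}
    clipped = {t: max(p, 0.0) for t, p in raw.items()}
    z = sum(clipped.values())
    if z == 0:
        raise ValueError("no probability mass left after separation")
    return {t: p / z for t, p in clipped.items()}
```

When λ is over-estimated the subtraction removes too little irrelevance mass; when it is under-estimated, some terms go negative and are clipped, which is why the choice of λ matters.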
Thierry Pinheiro Moreira, Mauricio Lisboa Perez, Rafael de Oliveira Werneck, Eduardo Valle
A pet that goes missing is among many people's worst fears: a moment of
distraction is enough for a dog or a cat to wander off from home. Some measures
help match lost animals to their owners, but automated visual recognition is
one that - although convenient, highly available, and low-cost - is
surprisingly overlooked. In this paper, we inaugurate that promising avenue by
pursuing face recognition for dogs. We contrast four ready-to-use human facial
recognizers (EigenFaces, FisherFaces, LBPH, and a Sparse method) to two
original solutions based upon convolutional neural networks: BARK (inspired by
architecture-optimized networks employed for human facial recognition) and WOOF
(based upon off-the-shelf OverFeat features). Human facial recognizers perform
poorly for dogs (up to 60.5% accuracy), showing that dog facial recognition is
not a trivial extension of human facial recognition. The convolutional network
solutions work much better, with BARK attaining up to 81.1% accuracy, and WOOF,
89.4%. The tests were conducted on two datasets: Flickr-dog, with 42 dogs of
two breeds (pugs and huskies); and Snoopybook, with 18 mongrel dogs.
Authors' comments: 17 pages, 8 figures, 1 table, Multimedia Tools and Applications
Khan Muhammad, Irfan Mehmood, Mi Young Lee, Su Mi Ji, Sung Wook Baik
Image classification is an active research field in which large amounts of
image data are classified into various classes based on their visual contents.
Researchers have presented various techniques based on low-level features for
classifying images into different categories. However, efficient and effective
classification and retrieval remain challenging problems due to the complex
nature of visual contents. In addition, traditional information retrieval
techniques are vulnerable to security risks, making it easy for attackers to
retrieve personal visual contents such as patient records and law enforcement
agency databases. Therefore, we propose a novel ontology-based framework
using image steganography for secure image classification and information
retrieval. The proposed framework uses domain-specific ontology for mapping the
low-level image features to high-level concepts of ontologies which
consequently results in efficient classification. Furthermore, the proposed
method utilizes image steganography for hiding the image semantics as a secret
message inside them, making the information retrieval process secure from third
parties. The proposed framework minimizes the computational complexity of
traditional techniques, increasing its suitability for secure and real-time
visual contents retrieval from personalized image databases. Experimental
results confirm the efficiency, effectiveness, and security of the proposed
framework as compared with other state-of-the-art systems.
Authors' comments: A short paper of 11 pages for secure visual contents retrieval. The
original version can be accessed at this link:
http://www.kingpc.or.kr/inc_html/index.html
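Hiding image semantics inside the image itself can be illustrated with least-significant-bit (LSB) embedding, one of the simplest steganographic techniques. This is a minimal sketch, not the framework's actual scheme (which the abstract does not detail): pixels are a flat list of 8-bit grey levels, the message is a hypothetical semantic label, and a 16-bit length prefix marks where it ends.

```python
def embed(pixels, message):
    """Hide a UTF-8 message in the least significant bits of 8-bit pixel values.

    A 16-bit big-endian length prefix tells the extractor how many bytes to read.
    """
    payload = message.encode("utf-8")
    data = len(payload).to_bytes(2, "big") + payload
    bits = [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for this message")
    stego = list(pixels)
    for idx, bit in enumerate(bits):
        stego[idx] = (stego[idx] & ~1) | bit  # overwrite the lowest bit only
    return stego


def extract(pixels):
    """Recover the message written by `embed`."""
    def read_bytes(start_bit, count):
        out = bytearray()
        for b in range(count):
            val = 0
            for i in range(8):
                val = (val << 1) | (pixels[start_bit + 8 * b + i] & 1)
            out.append(val)
        return bytes(out)

    length = int.from_bytes(read_bytes(0, 2), "big")
    return read_bytes(16, length).decode("utf-8")
```

Plain LSB embedding changes each pixel by at most one grey level, which is visually imperceptible, but on its own it is not robust to statistical steganalysis; practical schemes usually add encryption or key-dependent pixel selection.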
Benjamin Piwowarski, Sylvain Lamprier, Nicolas Despres
Information Retrieval (IR) models need to deal with two difficult issues: vocabulary mismatch and term dependencies. Vocabulary mismatch corresponds to the difficulty of retrieving relevant documents that do not contain the exact query terms but semantically related terms. Term dependencies refer to the need to consider the relationships between the words of the query when estimating the relevance of a document. A multitude of solutions has been proposed to solve each of these two problems, but no principled model solves both. In parallel, in the last few years, language models based on neural networks have been used to cope with complex natural language processing tasks such as emotion and paraphrase detection. Although they are well able to cope with both term dependencies and vocabulary mismatch, thanks to the distributed representations of words they are based upon, such models could not be used readily in IR, where the estimation of one language model per document (or query) is required. This is both computationally infeasible and prone to over-fitting. Based on recent work that proposed to learn a generic language model that can be modified through a set of document-specific parameters, we explore the use of new neural network models that are adapted to ad-hoc IR tasks. Within the language model IR framework, we propose and study the use of a generic language model as well as a document-specific language model. Both can be used as a smoothing component, but the latter is more adapted to the document at hand and has the potential of being used as a full document language model. We experiment with such models and analyze their results on the TREC-1 to 8 datasets.
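Within the language model IR framework, documents are typically ranked by query likelihood, and a generic language model can serve as the smoothing component. The sketch below shows linear (Jelinek-Mercer) interpolation between a document's maximum-likelihood estimate and a generic model. As a hypothetical stand-in for the neural generic language model it uses an add-one-smoothed unigram model of the collection, so it captures neither term dependencies nor the document-specific parameters discussed above; the weight `lam=0.7` is likewise an arbitrary assumption.

```python
import math
from collections import Counter


def make_unigram_lm(collection):
    """Add-one-smoothed unigram model of the whole collection (hypothetical stand-in
    for a neural generic language model); one extra slot is reserved for unseen words."""
    counts = Counter(tok for doc in collection for tok in doc)
    total, vocab = sum(counts.values()), len(counts) + 1
    return lambda w: (counts[w] + 1) / (total + vocab)


def query_log_likelihood(query, doc_tokens, generic_lm, lam=0.7):
    """log P(q|d) with P(w|d) = lam * P_ml(w|d) + (1 - lam) * P_generic(w)."""
    counts = Counter(doc_tokens)
    n = len(doc_tokens)
    return sum(math.log(lam * counts[w] / n + (1 - lam) * generic_lm(w)) for w in query)
```

Swapping `generic_lm` for a context-aware neural model leaves the ranking function unchanged, which is what makes the generic model attractive as a drop-in smoothing component.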