Jinghui Chen, Lingxiao Wang, Xiao Zhang, Quanquan Gu
We consider the robust phase retrieval problem of recovering an unknown
signal from magnitude-only measurements, where the measurements can be
contaminated by both sparse arbitrary corruption and bounded random noise. We
propose a new nonconvex algorithm for robust phase retrieval, namely Robust
Wirtinger Flow, which jointly estimates the unknown signal and the sparse
corruption. We show that our proposed algorithm is guaranteed to converge
linearly to the unknown true signal up to a minimax optimal statistical
precision in such a challenging setting. Compared with existing robust phase
retrieval methods, we achieve an optimal sample complexity of $O(n)$ in both
noisy and noise-free settings. Thorough experiments on both synthetic and real
datasets corroborate our theory.
Authors' comments: 29 pages, 5 figures, 2 tables
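The joint estimation of signal and sparse corruption can be illustrated with a minimal sketch. It rests on several assumptions that are ours, not the authors':

- signals and measurements are real-valued, with y_i = (a_i·x)^2 + s_i and a k-sparse corruption s;
- the bounded noise term is ignored;
- spectral initialization is skipped (Wirtinger Flow normally starts from a spectral estimate), so the caller passes a starting point z0.

The sketch alternates between two steps. First, a hard-thresholding estimate of s, which is the exact minimizer over k-sparse s for a fixed z. Second, one gradient step on z for the quartic least-squares loss. All function names and the step size are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def objective(A, y, z, s):
    # f(z, s) = (1 / 4m) * sum_i ((a_i . z)^2 + s_i - y_i)^2
    total = 0.0
    for a, yi, si in zip(A, y, s):
        az = dot(a, z)
        total += (az * az + si - yi) ** 2
    return total / (4 * len(A))


def sparse_estimate(A, y, z, k):
    # For fixed z, the best k-sparse s keeps the k largest-magnitude
    # residuals y_i - (a_i . z)^2 and zeros the rest (hard thresholding).
    r = []
    for a, yi in zip(A, y):
        az = dot(a, z)
        r.append(yi - az * az)
    keep = set(sorted(range(len(r)), key=lambda i: abs(r[i]), reverse=True)[:k])
    return [r[i] if i in keep else 0.0 for i in range(len(r))]


def wf_step(A, y, z, s, step):
    # grad_z f = (1/m) * sum_i ((a_i . z)^2 + s_i - y_i) (a_i . z) a_i
    m = len(A)
    grad = [0.0] * len(z)
    for a, yi, si in zip(A, y, s):
        az = dot(a, z)
        c = (az * az + si - yi) * az / m
        for j, aj in enumerate(a):
            grad[j] += c * aj
    return [zj - step * gj for zj, gj in zip(z, grad)]


def robust_wf(A, y, z0, k, iters=200, step=0.05):
    # Alternate: re-estimate the sparse corruption, then one gradient step on z.
    z = list(z0)
    for _ in range(iters):
        s = sparse_estimate(A, y, z, k)
        z = wf_step(A, y, z, s, step)
    return z, sparse_estimate(A, y, z, k)


if __name__ == "__main__":
    rng = random.Random(0)
    n, m, k = 3, 60, 3
    x = [rng.gauss(0, 1) for _ in range(n)]
    nx = math.sqrt(dot(x, x))
    x = [xi / nx for xi in x]  # unit-norm ground truth
    A = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(m)]
    y = [dot(a, x) ** 2 for a in A]
    for i in range(k):
        y[i] += 50.0  # sparse arbitrary corruption
    z, s = robust_wf(A, y, [xi + 0.1 for xi in x], k)
    # The signal is only identifiable up to a global sign.
    err = min(math.dist(z, x), math.dist(z, [-xi for xi in x]))
    print("distance to truth up to sign:", err)
```

Because both steps can only lower f when the step size is small enough, the objective is non-increasing along the iterations. This is a property of the sketch, not the paper's convergence guarantee.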
Giorgos Tolias, Ondřej Chum
We propose a novel concept of asymmetric feature maps (AFM), which allows one to
evaluate multiple kernels between a query and database entries without
increasing the memory requirements. To demonstrate the advantages of the AFM
method, we derive a short vector image representation that, due to asymmetric
feature maps, supports efficient scale and translation invariant sketch-based
image retrieval. Unlike most short-code based retrieval systems, the
proposed method provides the query localization in the retrieved image. The
efficiency of the search is boosted by approximating a 2D translation search,
performed via a trigonometric polynomial of scores, with 1D projections. The
projections are a special case of AFM. An order of magnitude speed-up is
achieved compared to traditional trigonometric polynomials. The results are
further boosted by an image-based average query expansion, significantly
exceeding the state of the art on standard benchmarks.
Authors' comments: CVPR 2017
Christina Lioma, Roi Blanco
Automatic language processing tools typically assign to terms so-called weights corresponding to the contribution of terms to information content. Traditionally, term weights are computed from lexical statistics, e.g., term frequencies. We propose a new type of term weight that is computed from part of speech (POS) n-gram statistics. The proposed POS-based term weight represents how informative a term is in general, based on the POS contexts in which it generally occurs in language. We suggest five different computations of POS-based term weights by extending existing statistical approximations of term information measures. We apply these POS-based term weights to information retrieval by integrating them into the model that matches documents to queries. Experiments with two TREC collections and 300 queries, using TF-IDF and BM25 as baselines, show that integrating our POS-based term weights into retrieval always leads to gains (up to +33.7% over the baseline). Additional experiments with a different retrieval model as baseline (Language Model with Dirichlet prior smoothing) and our best-performing POS-based term weight show consistent retrieval gains across the whole smoothing range of the baseline.
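The abstract does not say exactly how the POS-based weights enter the matching model. A minimal sketch, assuming they simply scale each query term's contribution to a standard BM25 score:

- `pos_weight` is a hypothetical dictionary of precomputed term weights;
- terms missing from it default to 1.0, which reproduces plain BM25;
- the IDF is the common non-negative variant with k1 and b at typical defaults.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75, pos_weight=None):
    """Score each tokenised document against a tokenised query with BM25.

    pos_weight: hypothetical dict term -> POS-based weight; each term's BM25
    contribution is multiplied by it (missing terms default to 1.0).
    """
    pos_weight = pos_weight or {}
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter()
    for d in docs:
        df.update(set(d))  # document frequency: each term counted once per document
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            tf_part = tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
            score += pos_weight.get(t, 1.0) * idf * tf_part
        scores.append(score)
    return scores
```

For example, `bm25_scores(["cat"], docs, pos_weight={"cat": 2.0})` doubles every document's score for that single-term query. With multi-term queries, the weights instead shift the balance between terms.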
Yusuke Watanabe, Bhuwan Dhingra, Ruslan Salakhutdinov
Open domain Question Answering (QA) systems must interact with external knowledge sources, such as web pages, to find relevant information. Information sources like Wikipedia, however, are not well structured and are difficult to utilize in comparison with Knowledge Bases (KBs). In this work we present a two-step approach to question answering from unstructured text, consisting of a retrieval step and a comprehension step. For comprehension, we present an RNN-based attention model with a novel mixture mechanism for selecting answers from either retrieved articles or a fixed vocabulary. For retrieval we introduce a hand-crafted model and a neural model for ranking relevant articles. We achieve state-of-the-art performance on the WikiMovies dataset, reducing the error by 40%. Our experimental results further demonstrate the importance of each of the introduced components.
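The idea of selecting answers "from either retrieved articles or a fixed vocabulary" can be pictured as a gated mixture of two distributions, in the spirit of copy/pointer mechanisms. The following is our assumption, not the authors' exact formulation:

- a gate g weights a distribution over a fixed answer vocabulary;
- 1 - g weights the attention mass that the model places on article tokens;
- attention on repeated tokens is summed.

The gate value and the input distributions are hypothetical inputs that an RNN would normally produce.

```python
def mixture_answer_dist(p_vocab, attention, article_tokens, gate):
    """Combine a vocabulary distribution with attention over article tokens.

    p_vocab: dict answer -> probability (sums to 1).
    attention: weights over article_tokens (sums to 1).
    gate: probability of answering from the vocabulary (hypothetical scalar).
    """
    dist = {w: gate * p for w, p in p_vocab.items()}
    for tok, a in zip(article_tokens, attention):
        # Attention on repeated occurrences of the same token accumulates.
        dist[tok] = dist.get(tok, 0.0) + (1.0 - gate) * a
    return dist
```

The predicted answer is the key with the largest mixed probability, for example `max(dist, key=dist.get)`.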
Adnan Qayyum, Syed Muhammad Anwar, Muhammad Awais, Muhammad Majid
With the widespread use of digital imaging data in hospitals, the size of
medical image repositories is increasing rapidly. This causes difficulty in
managing and querying these large databases, leading to the need for
content-based medical image retrieval (CBMIR) systems. A major challenge in
CBMIR systems is the semantic gap that exists between the low-level visual
information captured by imaging devices and the high-level semantic information
perceived by humans. The efficacy of such systems depends crucially on feature
representations that can fully characterize the high-level information. In
this paper, we propose a deep learning framework for a CBMIR system that uses a
deep Convolutional Neural Network (CNN) trained for classification of medical
images. An intermodal dataset that contains twenty-four classes and five
modalities is used to train the network. The learned features and the
classification results are used to retrieve medical images. For retrieval, the
best results are achieved when class-based predictions are used. We achieve an
average classification accuracy of 99.77% and a mean average precision of
0.69 for the retrieval task. The proposed method is best suited to
retrieve multimodal medical images for different body organs.
Authors' comments: Submitted to Neurocomputing
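For readers unfamiliar with the retrieval metric quoted above, here is a minimal sketch of mean average precision under one common definition. For each query, precision@k is averaged over the ranks k at which a relevant image appears, and the per-query scores are then averaged. The abstract does not say whether the paper normalises by the relevant items retrieved or by all relevant items in the database. This sketch assumes the former.

```python
def average_precision(relevant_flags):
    """relevant_flags: ranked list of booleans/0-1 (True = relevant result)."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevant_flags, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of per-query average precision over a list of ranked result lists."""
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

For the ranking [1, 0, 1, 0], relevant items sit at ranks 1 and 3. The average precision is therefore (1 + 2/3) / 2 = 5/6.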
Federico Venturi, Marco Campanini, Gian Carlo Gazzadi, Roberto Balboni, Stefano Frabboni, Robert W. Boyd, Rafal E. Dunin-Borkowski, Ebrahim Karimi et al.
In both light optics and electron optics, the amplitude of a wave scattered by an object is an observable that is usually recorded in the form of an intensity distribution in a real space image or a diffraction image. In contrast, retrieval of the phase of a scattered wave is a well-known challenge, which is usually approached by interferometric or numerical methods. In electron microscopy, as a result of constraints in the lens setup, it is particularly difficult to retrieve the phase of a diffraction image. Here, we use a defocused beam generated by a nanofabricated hologram to form a reference wave that can be interfered with a diffracted beam. This setup provides an extended interference region with the sample wavefunction in the Fraunhofer plane. As a case study, we retrieve the phase of an electron vortex beam. Beyond this specific example, the approach can be used to retrieve the wavefronts of diffracted beams from a wide range of samples.
Gagandeep Singh Narula, Vishal Jain
A typical IR system that delivers and stores information is affected by the problem of matching between the user query and the available content on the web. The use of an ontology represents the extracted terms in the form of a network graph consisting of nodes, edges, index terms, etc. The above-mentioned IR approaches provide relevance, thus satisfying the user's query. The paper also emphasizes analyzing multimedia documents and performs calculations for extracted terms using different statistical formulas. The proposed model reduces the semantic gap and satisfies user needs efficiently.
Li Liu, Fumin Shen, Yuming Shen, Xianglong Liu, Ling Shao
Free-hand sketch-based image retrieval (SBIR) is a specific cross-view
retrieval task, in which queries are abstract and ambiguous sketches while the
retrieval database is composed of natural images. Work in this area mainly
focuses on extracting representative and shared features for sketches and
natural images. However, such features can neither cope well with the geometric
distortion between sketches and images nor be feasible for large-scale SBIR due
to the heavy continuous-valued distance computation. In this paper, we speed up
SBIR by introducing a novel binary coding method, named Deep Sketch
Hashing (DSH), where a semi-heterogeneous deep architecture is proposed and
incorporated into an end-to-end binary coding framework. Specifically, three
convolutional neural networks are utilized to encode free-hand sketches,
natural images and, especially, the auxiliary sketch-tokens which are adopted
as bridges to mitigate the sketch-image geometric distortion. The learned DSH
codes can effectively capture the cross-view similarities as well as the
intrinsic semantic correlations between different categories. To the best of
our knowledge, DSH is the first hashing work specifically designed for
category-level SBIR with an end-to-end deep architecture. The proposed DSH is
comprehensively evaluated on two large-scale datasets, TU-Berlin Extension
and Sketchy, and the experiments consistently show DSH's superior SBIR
accuracies over several state-of-the-art methods, while achieving significantly
reduced retrieval time and memory footprint.
Authors' comments: This paper will appear as a spotlight paper in CVPR2017
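The speed-up from binary codes comes from replacing continuous distances with Hamming distances, which reduce to an XOR and a bit count. A minimal sketch, assuming codes are obtained by taking the sign of a real-valued network output. In DSH the codes are produced by the learned semi-heterogeneous architecture; the feature vectors used here are hypothetical stand-ins.

```python
def to_code(features):
    """Binarise a real-valued feature vector by sign: bit i is set iff features[i] > 0."""
    code = 0
    for i, f in enumerate(features):
        if f > 0:
            code |= 1 << i
    return code


def hamming_rank(query_code, db_codes):
    """Return database indices sorted by Hamming distance to the query (ties by index)."""
    return sorted(range(len(db_codes)),
                  key=lambda i: (bin(query_code ^ db_codes[i]).count("1"), i))
```

Storing one integer per image instead of a float vector is also what reduces the memory footprint.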
Sailesh Conjeti, Magdalini Paschali, Amin Katouzian, Nassir Navab
In this paper, for the first time, we introduce a multiple instance (MI) deep
hashing technique for learning discriminative hash codes with weak bag-level
supervision suited for large-scale retrieval. We learn such hash codes by
aggregating deeply learnt hierarchical representations across bag members
through a dedicated MI pool layer. For better trainability and retrieval
quality, we propose a two-pronged approach that includes robust optimization
and training with an auxiliary single instance hashing arm which is
down-regulated gradually. We pose retrieval for tumor assessment as an MI
problem because tumors often coexist with benign masses and could exhibit
complementary signatures when scanned from different anatomical views.
Experimental validations on benchmark mammography and histology datasets
demonstrate improved retrieval performance over the state-of-the-art methods.
Authors' comments: 10 pages, 7 figures, under review at MICCAI 2017
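A multiple-instance pool layer turns a variable-size bag of instance representations into a single bag-level vector before hashing. A minimal sketch, assuming element-wise max pooling (the abstract does not specify the pooling operator) followed by sign binarisation into ±1 codes. The instance vectors are hypothetical stand-ins for the learned hierarchical representations.

```python
def mi_pool(bag):
    """Element-wise max over the instance vectors in a bag (order-invariant)."""
    return [max(col) for col in zip(*bag)]


def hash_code(v):
    """Sign binarisation to a +/-1 code."""
    return [1 if x > 0 else -1 for x in v]


def hamming(u, v):
    """Number of positions where two +/-1 codes differ."""
    return sum(a != b for a, b in zip(u, v))
```

Max pooling lets one strongly activated instance, such as a malignant region seen from one view, dominate the bag code. Whether this matches the paper's MI pool layer is an assumption.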
Philippe Jaming, Salvador Pérez-Esteva
In this paper we consider the phase retrieval problem for Herglotz functions, that is, solutions of the Helmholtz equation $\Delta u+\lambda^2u=0$ on domains $\Omega\subset\mathbb{R}^d$, $d\geq2$. In dimension $d=2$, if $u,v$ are two such solutions then $|u|=|v|$ implies that either $u=cv$ or $u=c\bar v$ for some $c\in\mathbb{C}$ with $|c|=1$. In dimension $d\geq3$, the same conclusion holds under some restriction on $u$ and $v$: either they are real-valued, or they are zonal functions, or they have non-vanishing mean.
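A short worked example shows why the conclusion must allow the conjugate alternative $u=c\bar v$. It assumes $\lambda>0$ is real and uses a plane wave, which solves the Helmholtz equation on any domain; the example is ours, not taken from the paper:

$$u(x)=e^{i\lambda\,x\cdot\xi},\quad \xi\in\mathbb{R}^d,\ |\xi|=1 \;\Longrightarrow\; \Delta u=-\lambda^2|\xi|^2u=-\lambda^2u,$$
$$v=\bar u \;\Longrightarrow\; \Delta v+\lambda^2v=\overline{\Delta u+\lambda^2u}=0,\qquad |u|=|v|=1.$$

Here $u/v=e^{2i\lambda\,x\cdot\xi}$ is not constant, so $u\neq cv$ for every unimodular $c$. At the same time $u=1\cdot\bar v$, so only the second alternative holds.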
Zakaria Laskar, Juho Kannala
Current models of image representation based on Convolutional Neural
Networks (CNN) have shown tremendous performance in image retrieval. Such
models are inspired by the information flow along the visual pathway in the
human visual cortex. We propose that in the field of particular object
retrieval, the process of extracting CNN representations from query images with
a given region of interest (ROI) can also be modelled by taking inspiration
from human vision. In particular, we show that making the CNN pay attention
to the ROI while extracting the query image representation leads to significant
improvement over the baseline methods on the challenging Oxford5k and Paris6k
datasets. Furthermore, we propose an extension to a recently introduced
encoding method for CNN representations, regional maximum activations of
convolutions (R-MAC). The proposed extension weights the regional
representations using a novel saliency measure prior to aggregation. This leads
to further improvement in retrieval accuracy.
Authors' comments: 14 pages, Extended version of a manuscript submitted to SCIA 2017
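R-MAC aggregation with per-region weights can be sketched as follows:

1. max-pool each channel inside a set of regions;
2. L2-normalise each regional vector;
3. multiply it by a weight;
4. sum the weighted vectors and L2-normalise the result.

This sketch carries several simplifying assumptions:

- the region layout (the whole map plus an equal grid) stands in for R-MAC's overlapping multi-scale squares;
- the PCA-whitening step of standard R-MAC is omitted;
- `weight_fn` is a hypothetical hook where a saliency score would go, since the abstract does not define the measure.

```python
import math


def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def regions(h, w, grid=2):
    # Whole map plus a grid x grid partition (simplified region layout).
    # Assumes h, w >= grid so that every region is non-empty.
    boxes = [(0, h, 0, w)]
    for i in range(grid):
        for j in range(grid):
            boxes.append((i * h // grid, (i + 1) * h // grid,
                          j * w // grid, (j + 1) * w // grid))
    return boxes


def rmac(fmap, weight_fn=None, grid=2):
    """fmap: C x H x W nested lists of activations.

    weight_fn(box, region_vector) -> float is a hypothetical per-region weight
    (e.g. a saliency score); None gives plain, unweighted aggregation.
    """
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    agg = [0.0] * c
    for box in regions(h, w, grid):
        y0, y1, x0, x1 = box
        # Max-pool each channel inside the region, then L2-normalise.
        v = l2_normalize([max(fmap[k][yy][xx] for yy in range(y0, y1) for xx in range(x0, x1))
                          for k in range(c)])
        weight = 1.0 if weight_fn is None else weight_fn(box, v)
        agg = [a + weight * b for a, b in zip(agg, v)]
    return l2_normalize(agg)
```

Setting every weight except the whole-image region to zero recovers a MAC-style global max-pooled descriptor. This makes the role of the regional weights easy to inspect.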
Itzik Malkiel, Achiya Nagler, Michael Mrejen, Uri Arieli, Lior Wolf, Haim Suchowski
Our visual perception of our surroundings is ultimately limited by the diffraction limit, which stipulates that optical information smaller than roughly half the illumination wavelength is not retrievable. Over the past decades, many breakthroughs have led to unprecedented imaging capabilities beyond the diffraction limit, with applications in biology and nanotechnology. In this context, nano-photonics has revolutionized the field of optics in recent years by enabling the manipulation of light-matter interaction with subwavelength structures. However, despite the many advances in this field, its impact and penetration in our daily life have been hindered by a convoluted and iterative process, cycling through modeling, nanofabrication and nano-characterization. The fundamental reason is that not only is predicting the optical response very time-consuming, requiring the solution of Maxwell's equations with dedicated numerical packages, but, more significantly, the inverse problem, i.e. designing a nanostructure with an on-demand optical response, is currently a prohibitive task even with the most advanced numerical tools due to the high non-linearity of the problem. Here, we harness the power of Deep Learning, a new path in modern machine learning, and show its ability to predict the geometry of nanostructures based solely on their far-field response. This approach also directly addresses the currently inaccessible inverse problem, breaking ground for on-demand design of optical response with applications such as sensing, imaging and plasmon-mediated cancer thermotherapy.
Eng-Jon Ong, Sameed Husain, Miroslaw Bober
This paper addresses the problem of large-scale image retrieval, with the aim of accurately ranking the similarity of a large number of images to a given query image. To achieve this, we propose a novel Siamese network. This network consists of two computational strands, each comprising a CNN component followed by a Fisher vector component. The CNN component produces dense, deep convolutional descriptors that are then aggregated by the Fisher Vector method. Crucially, we propose to simultaneously learn both the CNN filter weights and Fisher Vector model parameters. This allows us to account for the evolving distribution of deep descriptors over the course of the learning process. We show that the proposed approach gives significant improvements over the state-of-the-art methods on the Oxford and Paris image retrieval datasets. Additionally, we provide a baseline performance measure for both these datasets with the inclusion of 1 million distractors.
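The Fisher Vector component aggregates dense descriptors using gradients with respect to the parameters of a Gaussian mixture model. A minimal forward-pass sketch, under these assumptions:

- the GMM is diagonal, and its weights, means and variances are fixed inputs (in the paper they are learned jointly with the CNN);
- only the mean-gradient part is computed;
- the usual power and L2 normalisation follow.

Descriptor values and GMM parameters are hypothetical.

```python
import math


def fisher_vector_means(descs, weights, means, variances):
    """Mean-gradient part of the (improved) Fisher vector for a diagonal GMM.

    descs: N descriptors of dimension D; weights: K mixture weights;
    means, variances: K x D lists. Returns a power- and L2-normalised
    vector of length K * D.
    """
    K, D = len(weights), len(means[0])
    fv = [[0.0] * D for _ in range(K)]
    for x in descs:
        # Log of w_k * N(x; mu_k, diag(var_k)) for each component.
        logp = []
        for k in range(K):
            lp = math.log(weights[k])
            for d in range(D):
                lp -= 0.5 * (math.log(2 * math.pi * variances[k][d])
                             + (x[d] - means[k][d]) ** 2 / variances[k][d])
            logp.append(lp)
        mx = max(logp)
        post = [math.exp(l - mx) for l in logp]  # stable soft assignment
        z = sum(post)
        for k in range(K):
            g = post[k] / z
            for d in range(D):
                fv[k][d] += g * (x[d] - means[k][d]) / math.sqrt(variances[k][d])
    n = len(descs)
    flat = [fv[k][d] / (n * math.sqrt(weights[k])) for k in range(K) for d in range(D)]
    flat = [math.copysign(math.sqrt(abs(v)), v) for v in flat]  # power normalisation
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat
```

Because every operation here is differentiable, the GMM parameters can in principle be updated by back-propagation together with the CNN filters. The joint learning in the abstract relies on this property.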
Robert Beinert, Gerlind Plonka
In this paper, we show that sparse signals f representable as a linear combination of a finite number N of spikes at arbitrary real locations or as a finite linear combination of B-splines of order m with arbitrary real knots can be almost surely recovered from O(N^2) Fourier intensity measurements up to trivial ambiguities. The constructive proof consists of two steps, where in the first step the Prony method is applied to recover all parameters of the autocorrelation function and in the second step the parameters of f are derived. Moreover, we present an algorithm to evaluate f from its Fourier intensities and illustrate it with several numerical examples.
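A small numerical check shows the starting point of the first step. Take a spike train f = Σ c_j δ(· − t_j), with the assumed Fourier convention e^{-iωt}. Its Fourier intensity |f̂(ω)|^2 is the Fourier transform of the autocorrelation. That autocorrelation is itself a spike train, supported on the differences t_j − t_k, which gives at most N(N−1)+1 distinct locations. Prony-type methods need roughly twice as many samples as spikes, so this is one way to see where O(N^2) measurements come from. The argument is our reading, not the authors' proof.

```python
import cmath


def fourier(coeffs, locs, w):
    """Fourier transform of sum_j c_j delta(t - t_j) at frequency w."""
    return sum(c * cmath.exp(-1j * w * t) for c, t in zip(coeffs, locs))


def autocorrelation(coeffs, locs):
    """Spikes of the autocorrelation: location t_j - t_k, weight c_j * conj(c_k)."""
    acf = {}
    for cj, tj in zip(coeffs, locs):
        for ck, tk in zip(coeffs, locs):
            tau = tj - tk
            acf[tau] = acf.get(tau, 0) + cj * ck.conjugate()
    return acf
```

Evaluating `abs(fourier(coeffs, locs, w)) ** 2` reproduces the transform of `autocorrelation(coeffs, locs)` at each frequency. This means only autocorrelation parameters, not the spike locations themselves, are directly visible in the intensity data.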
Osman Tursun, Cemal Aker, Sinan Kalkan
Trademark retrieval (TR) has become an important yet challenging problem due to an ever-increasing trend in trademark applications and infringement incidents. There have been many promising attempts at the TR problem, which, however, proved impractical because they were evaluated on limited and mostly trivial datasets. In this paper, we provide a large-scale dataset with benchmark queries with which different TR approaches can be evaluated systematically. Moreover, we provide a baseline on this benchmark using the widely used methods applied to TR in the literature. Furthermore, we identify and correct two important issues in TR approaches that were not addressed before: reversal of contrast and the presence of irrelevant text in trademarks, both of which severely affect TR methods. Lastly, we apply deep learning, namely several popular Convolutional Neural Network models, to the TR problem. To the best of the authors' knowledge, this is the first attempt to do so.
Vijay Chandrasekhar, Jie Lin, Qianli Liao, Olivier Morère, Antoine Veillard, Lingyu Duan, Tomaso Poggio
Image instance retrieval is the problem of retrieving images from a database
that contain the same object as a query image. Convolutional Neural Network
(CNN) based descriptors are becoming the dominant approach for generating
global image descriptors for the instance retrieval problem. One major drawback
of CNN-based global descriptors is that uncompressed deep neural network
models require hundreds of megabytes of storage making them inconvenient to
deploy in mobile applications or in custom hardware. In this work, we study the
problem of neural network model compression focusing on the image instance
retrieval task. We study quantization, coding, pruning and weight sharing
techniques for reducing model size for the instance retrieval problem. We
provide extensive experimental results on the trade-off between retrieval
performance and model size for different types of networks on several data sets
providing the most comprehensive study on this topic. We compress models to the
order of a few MBs: two orders of magnitude smaller than the uncompressed
models while achieving negligible loss in retrieval performance.
Authors' comments: 10 pages, accepted by DCC 2017
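Of the techniques listed, uniform scalar quantization is the simplest to illustrate. A minimal sketch, assuming per-tensor min/max scaling: each float32 weight is mapped to an integer code with `bits` bits, and only the codes plus two floats (offset and scale) are stored. At 8 bits this alone gives about a 4x size reduction over 32-bit floats. Reaching two orders of magnitude, as reported, would require combining it with the pruning, coding and weight sharing that the abstract mentions. The function names are hypothetical.

```python
def quantize_uniform(weights, bits=8):
    """Map float weights to integer codes in [0, 2**bits - 1] with a shared offset and scale."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Reconstruct approximate weights; each error is at most scale / 2."""
    return [lo + c * scale for c in codes]
```

The per-weight reconstruction error is bounded by half a quantization step. This is why retrieval accuracy can survive moderate bit widths.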
Juan Echeverría, Shi Zhou
It is known that many Twitter users are bots, which are accounts controlled
and sometimes created by computers. Twitter bots can send spam tweets,
manipulate public opinion and be used for online fraud. Here we report the
discovery, retrieval, and analysis of the 'Star Wars' botnet on Twitter, which
consists of more than 350,000 bots tweeting random quotations exclusively from
Star Wars novels.
The botnet contains a single type of bot, showing exactly the same properties
throughout the botnet. It is unusually large, many times larger than other
available datasets. It provides a valuable source of ground truth for research
on Twitter bots. We analysed and revealed rich details on how the botnet was
designed and created. As of this writing, the Star Wars bots are still alive on
Twitter. They have survived since their creation in 2013, despite the
increasing efforts in recent years to detect and remove Twitter bots. We also
reflect on the 'unconventional' way in which we discovered the Star Wars bots,
and discuss the current problems and future challenges of Twitter bot
detection.
Authors' comments: Accepted for publication at ASONAM 2017
Ben Burningham, Mark S. Marley, Michael R. Line, Roxana Lupu, Channon Visscher, Caroline V. Morley, Didier Saumon, Richard Freedman
We present the first results from applying the spectral inversion technique
in the cloudy L dwarf regime. Our new framework provides a flexible approach to
modelling cloud opacity which can be built incrementally as the data requires,
and improves upon previous retrieval experiments in the brown dwarf regime by
allowing for scattering in two-stream radiative transfer. Our first application
of the tool to two mid-L dwarfs is able to reproduce their near-infrared
spectra far more closely than grid models. Our retrieved thermal, chemical, and
cloud profiles allow us to estimate $T_{\rm eff} = 1796^{+23}_{-25}$ K and
$\log g = 5.21^{+0.05}_{-0.08}$ for 2MASS J05002100+0330501 and for 2MASSW
J2224438-015852 we find $T_{\rm eff} = 1723^{+18}_{-19}$ K and $\log g =
5.31^{+0.04}_{-0.08}$, in close agreement with previous empirical estimates.
Our best model for both objects includes an optically thick cloud deck which
passes $\tau_{cloud} \geq 1$ (looking down) at a pressure of around 5 bar. The
temperature at this pressure is too high for silicate species to condense, and
we argue that corundum and/or iron clouds are responsible for this cloud
opacity. Our retrieved profiles are cooler at depth, and warmer at altitude
than the forward grid models that we compare, and we argue that some form of
heating mechanism may be at work in the upper atmospheres of these L dwarfs. We
also identify anomalously high CO abundance in both targets, which does not
correlate with the warmth of our upper atmospheres or our choice of cloud
model, and find similarly anomalous alkali abundance for one of our targets.
These anomalies may reflect unrecognised shortcomings in our retrieval model,
or inaccuracies in our gas phase opacities.
Authors' comments: Accepted for publication in MNRAS. Changes in review: additional
validation against T dwarf test case, posteriors for cloud free retrievals
and contribution functions added to Appendix. Also, entire investigation
re-run with new UCL H2O opacities, with minimal impact on results
Spencer Cappallo, Thomas Mensink, Cees G. M. Snoek
Retrieval of live, user-broadcast video streams is an under-addressed and
increasingly relevant challenge. The on-line nature of the problem requires
temporal evaluation and the unforeseeable scope of potential queries motivates
an approach which can accommodate arbitrary search queries. To account for the
breadth of possible queries, we adopt a no-example approach to query retrieval,
which uses a query's semantic relatedness to pre-trained concept classifiers.
To adapt to shifting video content, we propose memory pooling and memory
welling methods that favor recent information over long past content. We
identify two stream retrieval tasks, instantaneous retrieval at any particular
time and continuous retrieval over a prolonged duration, and propose means for
evaluating them. Three large-scale video datasets are adapted to the challenge
of stream retrieval. We report results for our search methods on the new stream
retrieval tasks, as well as demonstrate their efficacy in a traditional,
non-streaming video task.
Authors' comments: Presented at BMVC 2016, British Machine Vision Conference, 2016
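The abstract does not define memory pooling or memory welling. The sketch below is an assumption-laden illustration of the general idea of favouring recent content over long-past content, not the authors' formulation. Each concept keeps a memory that decays geometrically every frame and is refreshed by a max with the current classifier score. A no-example query is then scored as a relatedness-weighted sum over concepts. `decay` and the relatedness vector are hypothetical inputs; in the paper, relatedness comes from the query's semantic relatedness to the concept classifiers.

```python
def decayed_max_pool(frames, decay=0.9):
    """frames: per-frame lists of concept classifier scores.

    Returns the memory state after each frame: old evidence fades by `decay`
    per step, and a fresh, stronger detection replaces it immediately.
    """
    mem = [0.0] * len(frames[0])
    history = []
    for scores in frames:
        mem = [max(decay * m, s) for m, s in zip(mem, scores)]
        history.append(list(mem))
    return history


def query_score(memory, relatedness):
    """No-example retrieval score: relatedness-weighted sum of concept memories."""
    return sum(r * m for r, m in zip(relatedness, memory))
```

Instantaneous retrieval would rank streams by `query_score` on the latest memory state. Continuous retrieval would track that score over a prolonged window.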
Hyeonwoo Noh, Andre Araujo, Jack Sim, Tobias Weyand, Bohyung Han
We propose an attentive local feature descriptor suitable for large-scale
image retrieval, referred to as DELF (DEep Local Feature). The new feature is
based on convolutional neural networks, which are trained only with image-level
annotations on a landmark image dataset. To identify semantically useful local
features for image retrieval, we also propose an attention mechanism for
keypoint selection, which shares most network layers with the descriptor. This
framework can be used for image retrieval as a drop-in replacement for other
keypoint detectors and descriptors, enabling more accurate feature matching and
geometric verification. Our system produces reliable confidence scores to
reject false positives---in particular, it is robust against queries that have
no correct match in the database. To evaluate the proposed descriptor, we
introduce a new large-scale dataset, referred to as Google-Landmarks dataset,
which involves challenges in both database and query such as background
clutter, partial occlusion, multiple landmarks, objects in variable scales,
etc. We show that DELF outperforms the state-of-the-art global and local
descriptors in the large-scale setting by significant margins. Code and dataset
can be found at the project webpage:
https://github.com/tensorflow/models/tree/master/research/delf .
Authors' comments: ICCV 2017. Code and dataset available:
https://github.com/tensorflow/models/tree/master/research/delf
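Attention-based keypoint selection can be sketched as keeping the k local descriptors with the highest attention scores, then matching them to a database image with a nearest-neighbour distance threshold. The descriptors, scores, k and threshold are hypothetical. The geometric verification step mentioned in the abstract is omitted, so the match count is only a rough proxy for the final score. The code follows the general idea, not the released DELF implementation.

```python
import math


def select_keypoints(descriptors, attention, k):
    """Keep the k descriptors with the highest attention scores."""
    order = sorted(range(len(attention)), key=lambda i: attention[i], reverse=True)
    return [descriptors[i] for i in order[:k]]


def count_matches(query, db, max_dist):
    """Count query descriptors whose nearest database descriptor is within max_dist."""
    if not db:
        return 0
    matches = 0
    for q in query:
        if min(math.dist(q, d) for d in db) <= max_dist:
            matches += 1
    return matches
```

Discarding low-attention features before matching is what lets a query with no correct match in the database produce few matches. This in turn supports a low confidence score.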