Zhonghao Wang, Yujun Gu, Ya Zhang, Jun Zhou, Xiao Gu
Clothing retrieval is a challenging problem in computer vision. With the
advance of Convolutional Neural Networks (CNNs), the accuracy of clothing
retrieval has been significantly improved. FashionNet[1], a recent study,
proposes to employ a set of artificial features in the form of landmarks for
clothing retrieval, which are shown to be helpful for retrieval. However, the
landmark detection module is trained with strong supervision which requires
considerable efforts to obtain. In this paper, we propose a self-learning
Visual Attention Model (VAM) to extract attention maps from clothing images.
The VAM is further connected to a global network to form an end-to-end network
structure through an Impdrop connection, which randomly applies dropout to the feature maps
with probabilities given by the attention map. Extensive experiments on
several widely used benchmark clothing retrieval data sets have demonstrated
the promise of the proposed method. We also show that compared to the trivial
Product connection, the Impdrop connection makes the network structure more
robust when training sets of limited size are used.
Authors' comments: 4 pages, to be presented at IEEE VCIP 2017
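The difference between the Product and Impdrop connections can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes the attention value at each spatial location is used as a keep probability and that the same random mask is shared across channels, and the array sizes are hypothetical.

```python
import numpy as np

def product_connection(features, attention):
    """Deterministic baseline: scale every channel by the attention map."""
    return features * attention[None, :, :]

def impdrop_connection(features, attention, rng):
    """Stochastic variant (assumed reading of the abstract): keep each spatial
    location with probability attention[h, w] and zero it otherwise."""
    keep = rng.random(attention.shape) < attention  # Bernoulli mask, shape (H, W)
    return features * keep[None, :, :]

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4, 4))  # (channels, H, W), hypothetical sizes
attention = rng.random((4, 4))         # attention values in [0, 1)
attention[0, 0] = 0.0                  # a location that is always dropped

out_product = product_connection(features, attention)
out_impdrop = impdrop_connection(features, attention, rng)
```

Under this reading, the Product connection always attenuates low-attention locations, while Impdrop either passes a location through unchanged or removes it for that training step.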
Andras Tüzkö, Christian Herrmann, Daniel Manger, Jürgen Beyerer
Current logo retrieval research focuses on closed set scenarios. We argue
that the logo domain is too large for this strategy and requires an open set
approach. To foster research in this direction, a large-scale logo dataset,
called Logos in the Wild, is collected and released to the public. A typical
open set logo retrieval application is, for example, assessing the
effectiveness of advertisement in sports event broadcasts. Given a query sample
in the form of a logo image, the task is to find all further occurrences of this
logo in a set of images or videos. Currently, common logo retrieval approaches
are unsuitable for this task because of their closed world assumption. Thus, an
open set logo retrieval method is proposed in this work which allows searching
for previously unseen logos by a single query sample. A two stage concept with
separate logo detection and comparison is proposed where both modules are based
on task specific CNNs. If trained with the Logos in the Wild data, significant
performance improvements are observed, especially compared with
state-of-the-art closed set approaches.
Authors' comments: accepted at VISAPP 2018
Siddharth Gandhi, Nikku Madhusudhan
Thermal emission spectra of exoplanets provide constraints on the chemical
compositions, pressure-temperature (P-T) profiles, and energy transport in
exoplanetary atmospheres. Accurate inferences of these properties rely on the
robustness of the atmospheric retrieval methods employed. While extant
retrieval codes have provided significant constraints on molecular abundances
and temperature profiles in several exoplanetary atmospheres, the constraints
on their deviations from thermal and chemical equilibria have yet to be fully
explored. Our present work is a step in this direction. We report HyDRA, a
disequilibrium retrieval framework for thermal emission spectra of exoplanetary
atmospheres. The retrieval code uses the standard architecture of a parametric
atmospheric model coupled with Bayesian statistical inference using the Nested
Sampling algorithm. For a given dataset, the retrieved compositions and P-T
profiles are used in tandem with the GENESIS self-consistent atmospheric model
to constrain layer-by-layer deviations from chemical and radiative-convective
equilibrium in the observable atmosphere. We demonstrate HyDRA on the Hot
Jupiter WASP-43b with a high-precision emission spectrum. We retrieve an H2O
mixing ratio of log(H2O) = -3.54^{+0.82}_{-0.52}, consistent with previous
studies. We detect H2O and a combined CO/CO2 at 8-sigma significance. We find
the dayside P-T profile to be consistent with radiative-convective equilibrium
within the 1-sigma limits and with low day-night redistribution, consistent
with previous studies. The derived compositions are also consistent with
thermochemical equilibrium for the corresponding distribution of P-T profiles.
In the era of high precision and high resolution emission spectroscopy, HyDRA
provides a path to retrieve disequilibrium phenomena in exoplanetary
atmospheres.
Authors' comments: 20 pages, 13 figures, Accepted for publication in MNRAS
H. R. Tizhoosh, G. J. Czarnota
Marking tumors and organs is a challenging task suffering from both inter-
and intra-observer variability. The literature quantifies observer variability
by generating consensus among multiple experts when they mark the same image.
Automatically building consensus contours to establish quality assurance for
image segmentation is presently absent from clinical practice. As
\emph{big data} becomes more and more available, techniques to access a large
number of existing segments from multiple experts become possible. Fast
algorithms are, hence, required to facilitate the search for similar cases. The
present work puts forward a potential framework that, when tested with small datasets
(both synthetic and real images), reliably finds similar
images. In this paper, the idea of content-based barcodes is used to retrieve
similar cases in order to build consensus contours in medical image
segmentation. This approach may be regarded as an extension of the conventional
atlas-based segmentation that generally works with rather small atlases due to
required computational expenses. The fast segment-retrieval process via
barcodes makes it possible to create and use large atlases, something that
directly contributes to the quality of the consensus building. Because the
accuracy of experts' contours must be measured, we first used 500 synthetic
prostate images with their gold markers and delineations by 20 simulated users.
The fast barcode-guided computed consensus delivered an average error of
$8\%\!\pm\!5\%$ compared against the gold standard segments. Furthermore, we
used magnetic resonance images of prostates from 15 patients delineated by 5
oncologists and selected the best delineations to serve as the gold-standard
segments. The proposed barcode atlas achieved a Jaccard overlap of
$87\%\!\pm\!9\%$ with the contours of the gold-standard segments.
Authors' comments: Images used in this paper are available to the public:
http://kimia.uwaterloo.ca/
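The Jaccard overlap reported above is the ratio of intersection to union of two segmentation masks. A minimal sketch follows; the toy masks are hypothetical and are not taken from the paper's data.

```python
import numpy as np

def jaccard(mask_a, mask_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return np.logical_and(a, b).sum() / union

# Toy example: a 6x6 gold-standard square and a consensus contour shifted by one row.
gold = np.zeros((10, 10), dtype=bool)
gold[2:8, 2:8] = True
consensus = np.zeros((10, 10), dtype=bool)
consensus[3:9, 2:8] = True

score = jaccard(gold, consensus)  # intersection 30 pixels, union 42 pixels
```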
Shuang Li, Peter Mathews
We propose a novel image retrieval framework for visual saliency detection using information about salient objects contained within bounding box annotations for similar images. For each test image, we train a customized SVM from similar example images to predict the saliency values of its object proposals and generate an external saliency map (ES) by aggregating the regional scores. To overcome limitations caused by the size of the training dataset, we also propose an internal optimization module which computes an internal saliency map (IS) by measuring the low-level contrast information of the test image. The two maps, ES and IS, have complementary properties so we take a weighted combination to further improve the detection performance. Experimental results on several challenging datasets demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods.
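The weighted combination of the two maps can be sketched as follows. The abstract does not state the weight or any normalization, so the min-max rescaling and the `weight` parameter here are assumptions made only for illustration.

```python
import numpy as np

def normalize(m):
    """Min-max rescale a map to [0, 1]; a constant map becomes all zeros."""
    m = np.asarray(m, dtype=float)
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)

def combine_saliency(es, is_map, weight=0.5):
    """Convex combination of external (ES) and internal (IS) saliency maps."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * normalize(es) + (1.0 - weight) * normalize(is_map)

# Hypothetical 2x2 maps.
es = np.array([[0.0, 2.0], [4.0, 8.0]])
is_map = np.array([[1.0, 1.0], [0.0, 0.5]])
combined = combine_saliency(es, is_map, weight=0.75)
```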
Julien Lavauzelle
Private information retrieval (PIR) protocols allow a user to retrieve entries of a database without revealing the index of the desired item. Information-theoretical privacy can be achieved by the use of several servers and specific retrieval algorithms. Most known PIR protocols focus on decreasing the number of bits exchanged between the client and the server(s) during the retrieval process. On the other hand, Fazeli et al. introduced so-called PIR codes in order to reduce the storage overhead on the servers. However, only a few works address the issue of the computational complexity of the servers. In this paper, we show that a specific encoding of the database provides PIR protocols with reasonable communication complexity, low storage overhead and optimal computational complexity for the servers. This encoding is based on incidence matrices of transversal designs, from which a natural and efficient recovering algorithm is derived. We also present instances of our construction, making use of finite geometries and orthogonal arrays, and we finally give a generalisation of our main construction for resisting collusions of servers.
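To make the multi-server setting concrete, here is a sketch of the classic two-server XOR scheme for a database of bits. It is not the transversal-design construction of this paper; it only illustrates how two non-colluding servers can answer without learning the index, and why server computation matters: each server touches about half of the database per query.

```python
import secrets

def make_queries(n, index):
    """Return two bit-vector queries that differ only at the desired index.
    Each query on its own is uniformly random, so it reveals nothing about index."""
    q1 = [secrets.randbits(1) for _ in range(n)]
    q2 = q1.copy()
    q2[index] ^= 1
    return q1, q2

def server_answer(database, query):
    """XOR of the database bits selected by the query (linear work in n)."""
    answer = 0
    for bit, selected in zip(database, query):
        if selected:
            answer ^= bit
    return answer

def recover(answer1, answer2):
    """All bits other than the desired one cancel out in the XOR."""
    return answer1 ^ answer2

database = [1, 0, 1, 1, 0, 0, 1, 0]
recovered = []
for i in range(len(database)):
    q1, q2 = make_queries(len(database), i)
    recovered.append(recover(server_answer(database, q1),
                             server_answer(database, q2)))
```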
Shiv Ram Dubey
Local descriptors have been the backbone of most computer vision
problems. Most existing local descriptors are computed over the raw
input images. In order to increase the discriminative power of the local
descriptors, some researchers converted the raw image into multiple images with
the help of some high and low pass frequency filters, then the local
descriptors are computed over each filtered image and finally concatenated into
a single descriptor. In doing so, these approaches do not utilize the
inter-frequency relationship, so the discriminative power of the descriptor
improves less than it could. In this paper, this problem is
solved by utilizing the decoder concept of multi-channel decoded local binary
pattern over the multi-frequency patterns. A frequency decoded local binary
pattern (FDLBP) is proposed with two decoders. Each decoder works with one low
frequency pattern and two high frequency patterns. Finally, the descriptors
from both decoders are concatenated to form the single descriptor. The face
retrieval experiments are conducted over four benchmarks and challenging
databases such as PaSC, LFW, PubFig, and ESSEX. The experimental results
confirm the superiority of the FDLBP descriptor as compared to the
state-of-the-art descriptors such as LBP, SOBEL_LBP, BoF_LBP, SVD_S_LBP, mdLBP,
etc.
Authors' comments: Accepted in Multimedia Tools and Applications, Springer
Christina Lioma
Building machines that can understand text like humans is an AI-complete
problem. A great deal of research has already gone into this, with astounding
results, allowing everyday people to converse with their telephones, or have
their reading materials analysed and classified by computers. A prerequisite
for processing text semantics, common to the above examples, is having some
computational representation of text as an abstract object. Operations on this
representation practically correspond to making semantic inferences, and by
extension simulating understanding text. The complexity and granularity of
semantic processing that can be realised is constrained by the mathematical and
computational robustness, expressiveness, and rigour of the tools used.
This dissertation contributes a series of such tools, diverse in their
mathematical formulation, but common in their application to model semantic
inferences when machines process text. These tools are principally expressed in
nine distinct models that capture aspects of semantic dependence in highly
interpretable and non-complex ways. This dissertation further reflects on
present and future problems with the current research paradigm in this area,
and makes recommendations on how to overcome them.
The amalgamation of the body of work presented in this dissertation advances
the complexity and granularity of semantic inferences that can be made
automatically by machines.
Authors' comments: This document is a doktordisputats - a dissertation within the Danish
academic system required to obtain the degree of \textit{Doctor Scientiarum},
in form and function equivalent to the French and German Habilitation and the
Higher Doctorate of the Commonwealth
Yi Li, Vasileios Nakos
In the compressive phase retrieval problem, or phaseless compressed sensing,
or compressed sensing from intensity only measurements, the goal is to
reconstruct a sparse or approximately $k$-sparse vector $x \in \mathbb{R}^n$
given access to $y= |\Phi x|$, where $|v|$ denotes the vector obtained from
taking the absolute value of $v\in\mathbb{R}^n$ coordinate-wise. In this paper
we present sublinear-time algorithms for different variants of the compressive
phase retrieval problem which are akin to the variants considered for the
classical compressive sensing problem in theoretical computer science. Our
algorithms use purely combinatorial techniques and a near-optimal number of
measurements.
Authors' comments: The ell_2/ell_2 algorithm was substituted by a modification of the
ell_infty/ell_2 algorithm which strictly subsumes it
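The measurement model $y = |\Phi x|$ can be illustrated with a small NumPy sketch. The dimensions and the Gaussian measurement matrix are hypothetical choices for the example, not the combinatorial constructions used in the paper. The sketch also shows why recovery is only possible up to a global sign: $x$ and $-x$ produce identical measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 50, 30, 3  # signal length, number of measurements, sparsity (hypothetical)

# Build a k-sparse signal with random support.
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.normal(size=k)

# Phaseless (intensity-only) measurements: the sign of each entry of Phi @ x is lost.
Phi = rng.normal(size=(m, n))
y = np.abs(Phi @ x)
y_flipped = np.abs(Phi @ (-x))  # same measurements for the sign-flipped signal
```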
Xin Ji, Wei Wang, Meihui Zhang, Yang Yang
With the proliferation of e-commerce websites and the ubiquitousness of smart
phones, cross-domain image retrieval using images taken by smart phones as
queries to search products on e-commerce websites is emerging as a popular
application. One challenge of this task is to locate the attention of both the
query and database images. In particular, database images, e.g. of fashion
products, on e-commerce websites are typically displayed with other
accessories, and the images taken by users contain noisy background and large
variations in orientation and lighting. Consequently, their attention is
difficult to locate. In this paper, we exploit the rich tag information
available on the e-commerce websites to locate the attention of database
images. For query images, we use each candidate image in the database as the
context to locate the query attention. Novel deep convolutional neural network
architectures, namely TagYNet and CtxYNet, are proposed to learn the attention
weights and then extract effective representations of the images. Experimental
results on public datasets confirm that our approaches significantly
improve on existing methods in terms of retrieval accuracy and
efficiency.
Authors' comments: 8 pages with an extra reference page
Swanand Kadhe, Brenden Garcia, Anoosheh Heidarzadeh, Salim El Rouayheb, Alex Sprintson
We study the problem of Private Information Retrieval (PIR) in the presence
of prior side information. The problem setup includes a database of $K$
independent messages possibly replicated on several servers, and a user that
needs to retrieve one of these messages. In addition, the user has some prior
side information in the form of a subset of $M$ messages, not containing the
desired message and unknown to the servers. This problem is motivated by
practical settings in which the user can obtain side information
opportunistically from other users or has previously downloaded some messages
using classical PIR schemes. The objective of the user is to retrieve the
required message without revealing its identity while minimizing the amount of
data downloaded from the servers.
We focus on achieving information-theoretic privacy in two scenarios: (i) the
user wants to protect jointly its demand and side information; (ii) the user
wants to protect only the information about its demand, but not the side
information. To highlight the role of side information, we focus first on the
case of a single server (single database). In the first scenario, we prove that
the minimum download cost is $K-M$ messages, and in the second scenario it is
$\lceil \frac{K}{M+1}\rceil$ messages, which should be compared to $K$
messages, the minimum download cost in the case of no side information. Then,
we extend some of our results to the case of the database replicated on
multiple servers. Our proof techniques relate PIR with side information to the
index coding problem. We leverage this connection to prove converse results, as
well as to design achievability schemes.
Authors' comments: Shorter version of the paper is accepted in Allerton Conference 2017
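The single-server download costs stated above can be compared with a short worked example. The function and key names are hypothetical; only the three formulas ($K$, $K-M$, and $\lceil K/(M+1) \rceil$) come from the abstract.

```python
def download_costs(K, M):
    """Minimum single-server download cost, in messages, for K messages
    and M side-information messages (formulas as stated in the abstract)."""
    if not 0 <= M < K:
        raise ValueError("need 0 <= M < K")
    return {
        "no_side_information": K,
        "protect_demand_and_side_information": K - M,
        "protect_demand_only": -(-K // (M + 1)),  # integer ceiling of K / (M + 1)
    }

costs = download_costs(K=10, M=4)
```

For $K = 10$ and $M = 4$, protecting both the demand and the side information costs 6 messages, while protecting only the demand costs $\lceil 10/5 \rceil = 2$ messages, compared with 10 messages without side information.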
Shuo Zhang, Krisztian Balog
We address the task of ranking objects (such as people, blogs, or verticals)
that, unlike documents, do not have direct term-based representations. To be
able to match them against keyword queries, evidence needs to be amassed from
documents that are associated with the given object. We present two design
patterns, i.e., general reusable retrieval strategies, which are able to
encompass most existing approaches from the past. One strategy combines
evidence on the term level (early fusion), while the other does it on the
document level (late fusion). We demonstrate the generality of these patterns
by applying them to three different object retrieval tasks: expert finding,
blog distillation, and vertical ranking.
Authors' comments: Proceedings of the 39th European conference on Advances in
Information Retrieval (ECIR '17), 2017
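The two design patterns can be contrasted with a toy sketch. The documents, the object-document associations, and the scoring function (a sum of log-scaled term frequencies) are all placeholders chosen for illustration; the paper's actual retrieval models are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical toy collection and object-document associations.
docs = {
    "d1": "neural ranking models for search",
    "d2": "search engine evaluation",
    "d3": "protein folding simulation",
}
associations = {"alice": ["d1", "d2"], "bob": ["d3"]}

def score(term_counts, query):
    """Placeholder scoring function: sum of log(1 + tf) over query terms."""
    return sum(math.log1p(term_counts[t]) for t in query.split())

def early_fusion(query):
    """Term level: merge associated documents into one pseudo-document, then score it."""
    scores = {}
    for obj, doc_ids in associations.items():
        pseudo_doc = Counter()
        for d in doc_ids:
            pseudo_doc.update(docs[d].split())
        scores[obj] = score(pseudo_doc, query)
    return scores

def late_fusion(query):
    """Document level: score each document, then aggregate scores per object."""
    doc_scores = {d: score(Counter(text.split()), query) for d, text in docs.items()}
    return {obj: sum(doc_scores[d] for d in doc_ids)
            for obj, doc_ids in associations.items()}

early = early_fusion("search ranking")
late = late_fusion("search ranking")
```

Both strategies rank "alice" above "bob" here, but the scores differ because the non-linear scoring function is applied before aggregation in one case and after it in the other.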
Jingkuan Song
The most striking successes in image retrieval using deep hashing have mostly
involved discriminative models, which require labels. In this paper, we use
binary generative adversarial networks (BGAN) to embed images to binary codes
in an unsupervised way. By restricting the input noise variable of generative
adversarial networks (GAN) to be binary and conditioned on the features of each
input image, BGAN can simultaneously learn a binary representation per image,
and generate an image plausibly similar to the original one. In the proposed
framework, we address two main problems: 1) how to directly generate binary
codes without relaxation? 2) how to equip the binary representation with the
ability of accurate image retrieval? We resolve these problems by proposing a new
sign-activation strategy and a loss function steering the learning process,
which consists of new models for adversarial loss, a content loss, and a
neighborhood structure loss. Experimental results on standard datasets
(CIFAR-10, NUSWIDE, and Flickr) demonstrate that our BGAN significantly
outperforms existing hashing methods by up to 107\% in terms of mAP (see Table
tab.res.map.comp). Our anonymous code is available at:
https://github.com/htconquer/BGAN.
Authors' comments: arXiv admin note: text overlap with arXiv:1702.00758 by other authors
Christophe Van Gysel, Maarten de Rijke, Evangelos Kanoulas
We propose the Neural Vector Space Model (NVSM), a method that learns
representations of documents in an unsupervised manner for news article
retrieval. In the NVSM paradigm, we learn low-dimensional representations of
words and documents from scratch using gradient descent and rank documents
according to their similarity with query representations that are composed from
word representations. We show that NVSM performs better at document ranking
than existing latent semantic vector space methods. The addition of NVSM to a
mixture of lexical language models and a state-of-the-art baseline vector space
model yields a statistically significant increase in retrieval effectiveness.
Consequently, NVSM adds a complementary relevance signal. Next to semantic
matching, we find that NVSM performs well in cases where lexical matching is
needed.
NVSM learns a notion of term specificity directly from the document
collection without feature engineering. We also show that NVSM learns
regularities related to Luhn significance. Finally, we give advice on how to
deploy NVSM in situations where model selection (e.g., cross-validation) is
infeasible. We find that an unsupervised ensemble of multiple models trained
with different hyperparameter values performs better than a single
cross-validated model. Therefore, NVSM can safely be used for ranking documents
without supervised relevance judgments.
Authors' comments: TOIS 2018
Razane Tajeddine, Salim El Rouayheb
We consider the problem of designing PIR schemes on coded data when certain nodes are unresponsive. We provide the construction of $\nu$-robust PIR schemes that can tolerate up to $\nu$ unresponsive nodes. These schemes are adaptive and universally optimal in the sense of achieving (asymptotically) optimal download cost for any number of unresponsive nodes up to $\nu$.
Pedro Saleiro, Natasa Milic-Frayling, Eduarda Mendes Rodrigues, Carlos Soares
We address the task of entity-relationship (E-R) retrieval, i.e., given a
query characterizing types of two or more entities and relationships between
them, retrieve the relevant tuples of related entities. Answering E-R queries
requires gathering and joining evidence from multiple unstructured documents.
In this work, we consider entities and relationships of any type, i.e.,
characterized by context terms instead of pre-defined types or relationships.
We propose a novel IR-centric approach for E-R retrieval, that builds on the
basic early fusion design pattern for object retrieval, to provide extensible
entity-relationship representations, suitable for complex, multi-relationship
queries. We performed experiments with Wikipedia articles as entity
representations combined with relationships extracted from ClueWeb-09-B with
FACC1 entity linking. We obtained promising results using 3 different query
collections comprising 469 E-R queries.
Authors' comments: KG4IR (SIGIR workshop)
Hantao Yao, Shiliang Zhang, Yongdong Zhang, Jintao Li, Qi Tian
Fine-Grained Visual Categorization (FGVC) has achieved significant progress
recently. However, the number of fine-grained species could be huge and
dynamically increasing in real scenarios, making it difficult to recognize
unseen objects under the current FGVC framework. This raises an open issue to
perform large-scale fine-grained identification without a complete training
set. Aiming to conquer this issue, we propose a retrieval task named One-Shot
Fine-Grained Instance Retrieval (OSFGIR). "One-Shot" denotes the ability to
identify unseen objects through a fine-grained retrieval task assisted by
an incomplete auxiliary training set. This paper first presents a detailed
description of the OSFGIR task and our collected OSFGIR-378K dataset. Next, we
propose the Convolutional and Normalization Networks (CN-Nets) learned on the
auxiliary dataset to generate a concise and discriminative representation.
Finally, we present a coarse-to-fine retrieval framework consisting of three
components, i.e., coarse retrieval, fine-grained retrieval, and query
expansion, respectively. The framework progressively retrieves images with
similar semantics, and performs fine-grained identification. Experiments show
our OSFGIR framework achieves significantly better accuracy and efficiency than
existing FGVC and image retrieval methods, and thus could be a better solution for
large-scale fine-grained object identification.
Authors' comments: Accepted by MM2017, 9 pages, 7 figures
Tuan Hoang, Thanh-Toan Do, Dang-Khoa Le Tan, Ngai-Man Cheung
Convolutional Neural Networks (CNNs) are a very powerful approach for extracting
discriminative local descriptors for effective image search. Recent work adopts
fine-tuning strategies to further improve the discriminative power of the
descriptors. Taking a different approach, in this paper, we propose a novel
framework to achieve competitive retrieval performance. Firstly, we propose
various masking schemes, namely SIFT-mask, SUM-mask, and MAX-mask, to select a
representative subset of local convolutional features and remove a large number
of redundant features. We demonstrate that this can effectively address the
burstiness issue and improve retrieval accuracy. Secondly, we propose to employ
recent embedding and aggregating methods to further enhance feature
discriminability. Extensive experiments demonstrate that our proposed framework
achieves state-of-the-art retrieval accuracy.
Authors' comments: Accepted to ACM MM 2017
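The abstract does not define the masking schemes, so the following sketch shows only one plausible reading of SUM-mask and MAX-mask, stated here as an assumption: SUM-mask keeps spatial locations whose channel-summed activation exceeds the median, and MAX-mask keeps locations where at least one channel attains its spatial maximum. SIFT-mask is omitted because it requires a keypoint detector.

```python
import numpy as np

def sum_mask(fmap):
    """Assumed SUM-mask: keep locations whose sum over channels exceeds the median."""
    s = fmap.sum(axis=0)  # (H, W)
    return s > np.median(s)

def max_mask(fmap):
    """Assumed MAX-mask: keep locations where some channel has its spatial maximum."""
    C, H, W = fmap.shape
    flat_argmax = fmap.reshape(C, -1).argmax(axis=1)  # one location per channel
    mask = np.zeros(H * W, dtype=bool)
    mask[flat_argmax] = True
    return mask.reshape(H, W)

def select_features(fmap, mask):
    """Return the selected local descriptors as rows, shape (num_selected, C)."""
    return fmap[:, mask].T

rng = np.random.default_rng(0)
fmap = np.maximum(rng.normal(size=(16, 6, 6)), 0.0)  # hypothetical post-ReLU feature maps

selected_sum = select_features(fmap, sum_mask(fmap))
selected_max = select_features(fmap, max_mask(fmap))
```

Either mask reduces the 36 candidate locations to a smaller subset of descriptors before embedding and aggregation.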
Yan Shuo Tan, Roman Vershynin
We consider the problem of phase retrieval, i.e. that of solving systems of
quadratic equations. A simple variant of the randomized Kaczmarz method was
recently proposed for phase retrieval, and it was shown numerically to have a
computational edge over state-of-the-art Wirtinger flow methods. In this paper,
we provide the first theoretical guarantee for the convergence of the
randomized Kaczmarz method for phase retrieval. We show that it is sufficient
to have as many Gaussian measurements as the dimension, up to a constant
factor. Along the way, we introduce a sufficient condition on measurement sets
for which the randomized Kaczmarz method is guaranteed to work. We show that
Gaussian sampling vectors satisfy this property with high probability; this is
proved using a chaining argument coupled with bounds on VC dimension and metric
entropy.
Authors' comments: Revised after comments from referees
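A minimal sketch of a randomized Kaczmarz iteration for real-valued phase retrieval is given below. It follows the commonly described update (project onto the hyperplane of a randomly chosen equation, using the sign of the current iterate's inner product), with uniform row sampling as an assumption. The warm start built from the ground truth is purely an illustrative device; in practice an initialization procedure would be needed.

```python
import numpy as np

def kaczmarz_phase_retrieval(A, b, x0, num_iters, rng):
    """Randomized Kaczmarz for b = |A x| (real case), rows sampled uniformly."""
    x = x0.copy()
    row_norms_sq = np.sum(A * A, axis=1)
    m = A.shape[0]
    for _ in range(num_iters):
        r = rng.integers(m)
        inner = A[r] @ x
        sign = 1.0 if inner >= 0 else -1.0  # guess the missing sign from the iterate
        x += (sign * b[r] - inner) / row_norms_sq[r] * A[r]
    return x

def distance_up_to_sign(x, y):
    """Phase retrieval identifies x only up to a global sign."""
    return min(np.linalg.norm(x - y), np.linalg.norm(x + y))

rng = np.random.default_rng(0)
n, m = 20, 200                    # dimension and number of Gaussian measurements
x_true = rng.normal(size=n)
A = rng.normal(size=(m, n))
b = np.abs(A @ x_true)

x0 = x_true + 0.1 * rng.normal(size=n)  # illustrative warm start near the solution
x_hat = kaczmarz_phase_retrieval(A, b, x0, num_iters=5000, rng=rng)

err_init = distance_up_to_sign(x0, x_true)
err_final = distance_up_to_sign(x_hat, x_true)
```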
Philipp Grohs, Martin Rathmair
We consider the problem of reconstructing a signal $f$ from its spectrogram, i.e., the magnitudes $|V_\varphi f|$ of its Gabor transform $$V_\varphi f (x,y):=\int_{\mathbb{R}}f(t)e^{-\pi (t-x)^2}e^{-2\pi i y t}dt, \quad x,y\in \mathbb{R}.$$ Such problems occur in a wide range of applications, from optical imaging of nanoscale structures to audio processing and classification. While it is well-known that the solution of the above Gabor phase retrieval problem is unique up to natural identifications, the stability of the reconstruction has remained wide open. The present paper discovers a deep and surprising connection between phase retrieval, spectral clustering and spectral geometry. We show that the stability of the Gabor phase reconstruction is bounded by the reciprocal of the Cheeger constant of the flat metric on $\mathbb{R}^2$, conformally multiplied with $|V_\varphi f|$. The Cheeger constant, in turn, plays a prominent role in the field of spectral clustering, and it precisely quantifies the `disconnectedness' of the measurements $V_\varphi f$. It has long been known that a disconnected support of the measurements results in an instability -- our result for the first time provides a converse in the sense that there are no other sources of instabilities. Due to the fundamental importance of Gabor phase retrieval in coherent diffraction imaging, we also provide a new understanding of the stability properties of these imaging techniques: Contrary to most classical problems in imaging science whose regularization requires the promotion of smoothness or sparsity, the correct regularization of the phase retrieval problem promotes the `connectedness' of the measurements in terms of bounding the Cheeger constant from below. Our work thus, for the first time, opens the door to the development of efficient regularization strategies.