Dayong Tian
In social networks, heterogeneous multimedia data are correlated with each other, such as videos and their corresponding tags on YouTube and image-text pairs on Facebook. Nearest neighbor retrieval across multiple modalities on large data sets has become a hot yet challenging problem. Hashing is expected to be an efficient solution, since it represents data as binary codes. Because bit-wise XOR operations can be computed quickly, the retrieval time is greatly reduced. Few existing multimodal hashing methods consider the correlation among hashing bits. This correlation has a negative impact on hashing codes: as the hashing code length increases, the improvement in retrieval performance slows down. In this paper, we propose a minimum correlation regularization (MCR) for multimodal hashing. First, the sigmoid function is used to embed the data matrices. Then, the MCR is applied to the output of the sigmoid function. As the output of the sigmoid function approximates a binary code matrix, the proposed MCR can efficiently decorrelate the hashing codes. Experiments show that the superiority of the proposed method becomes greater as the code length increases.
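To illustrate why binary codes make retrieval cheap, the following minimal sketch ranks database items by Hamming distance, computed with a bit-wise XOR followed by a bit count. It does not implement MCR itself; the 8-bit codes and the function names are hypothetical and chosen only for illustration.

```python
def hamming_distance(code_a, code_b):
    """Number of differing bits between two binary codes stored as integers."""
    return bin(code_a ^ code_b).count("1")


def nearest_neighbors(query, database, k):
    """Indices of the k database codes closest to the query in Hamming distance."""
    order = sorted(range(len(database)),
                   key=lambda i: hamming_distance(query, database[i]))
    return order[:k]


# Hypothetical 8-bit codes. In a cross-modal setting the query code could come
# from text and the database codes from images, provided both modalities are
# hashed into the same code space.
database = [0b10110010, 0b10110011, 0b01001101, 0b11110000]
query = 0b10110110
print(nearest_neighbors(query, database, k=2))
```

Each comparison touches only one machine word per code here, which is the efficiency argument the abstract makes for hashing.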
Shuo Zhang, Krisztian Balog
We introduce and address the problem of ad hoc table retrieval: answering a
keyword query with a ranked list of tables. This task is not only interesting
on its own account, but is also being used as a core component in many other
table-based information access scenarios, such as table completion or table
mining. The main novel contribution of this work is a method for performing
semantic matching between queries and tables. Specifically, we (i) represent
queries and tables in multiple semantic spaces (both discrete sparse and
continuous dense vector representations) and (ii) introduce various similarity
measures for matching those semantic representations. We consider all possible
combinations of semantic representations and similarity measures and use these
as features in a supervised learning model. Using a purpose-built test
collection based on Wikipedia tables, we demonstrate significant and
substantial improvements over a state-of-the-art baseline.
Authors' comments: The Web Conference 2018 (WWW'18)
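As a rough illustration of semantic matching in a continuous dense space, the sketch below scores a query against a table by the cosine similarity of their term-embedding centroids. This is only one plausible similarity measure and is an assumption, not necessarily one used by the authors; the three-dimensional embeddings and the table term lists are made up for the example.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy embeddings (not taken from any trained model).
embeddings = {
    "city": [0.8, 0.2, 0.1], "population": [0.9, 0.1, 0.0],
    "country": [0.7, 0.3, 0.0], "player": [0.1, 0.0, 0.8],
    "goals": [0.0, 0.1, 0.9],
}


def match_score(query_terms, table_terms):
    """One feature value: similarity of the query and table centroids."""
    q = centroid([embeddings[t] for t in query_terms])
    t = centroid([embeddings[t] for t in table_terms])
    return cosine(q, t)


score_a = match_score(["city", "population"], ["country", "population"])
score_b = match_score(["city", "population"], ["player", "goals"])
print(score_a, score_b)
```

In a supervised ranking setup, a score like this would be one feature among many, one per combination of representation and similarity measure.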
Vladimir Katkovnik, Karen Egiazarian
The phase retrieval from multi-frequency intensity (power) observations is
considered. The object to be reconstructed is complex-valued. A novel algorithm
is presented that accomplishes both the object phase (absolute phase) retrieval
and denoising for Poissonian and Gaussian measurements. The algorithm is
derived from the maximum likelihood formulation with Block Matching 3D (BM3D)
sparsity priors. These priors result in two filtering steps: one in the complex
domain for complex-valued multi-frequency object images and another in the
real domain for the object phase. The algorithm is iterative with alternating
projections between the object and measurement variables. The simulation
experiments are performed for Fourier transform image formation and random phase
modulations of the object, so that the observations are random object diffraction
patterns. The results demonstrate the success of the algorithm for
reconstruction of complex phase objects with high accuracy,
even for very noisy data.
Authors' comments: 5 pages, 3 figures
Tanya Piplani, David Bamman
Most of the internet today is composed of digital media that includes videos
and images. With pixels becoming the currency in which most transactions happen
on the internet, it is becoming increasingly important to have a way of
browsing through this ocean of information with relative ease. YouTube has 400
hours of video uploaded every minute and many millions of images are browsed on
Instagram, Facebook, etc. Inspired by recent advances in the field of deep
learning and the success that it has gained on various problems like image
captioning, machine translation, word2vec, skip-thoughts, etc., we present
DeepSeek, a natural language processing based deep learning model that allows
users to enter a description of the kind of images that they want to search,
and in response the system retrieves all the images that semantically and
contextually relate to the query. Two approaches are described in the following
sections.
Authors' comments: arXiv admin note: text overlap with arXiv:1706.06064 by other authors
Zhuoran Yang, Lin F. Yang, Ethan X. Fang, Tuo Zhao, Zhaoran Wang, Matey Neykov
Existing nonconvex statistical optimization theory and methods crucially rely
on the correct specification of the underlying "true" statistical models. To
address this issue, we take a first step towards taming model misspecification
by studying the high-dimensional sparse phase retrieval problem with
misspecified link functions. In particular, we propose a simple variant of the
thresholded Wirtinger flow algorithm that, given a proper initialization,
linearly converges to an estimator with optimal statistical accuracy for a
broad family of unknown link functions. We further provide extensive numerical
experiments to support our theoretical findings.
Authors' comments: 56 pages
Zhangjie Cao, Mingsheng Long, Chao Huang, Jianmin Wang
Hashing is widely applied to large-scale image retrieval due to the storage and retrieval efficiency. Existing work on deep hashing assumes that the database in the target domain is identically distributed with the training set in the source domain. This paper relaxes this assumption to a transfer retrieval setting, which allows the database and the training set to come from different but relevant domains. However, the transfer retrieval setting will introduce two technical difficulties: first, the hash model trained on the source domain cannot work well on the target domain due to the large distribution gap; second, the domain gap makes it difficult to concentrate the database points to be within a small Hamming ball. As a consequence, transfer retrieval performance within Hamming Radius 2 degrades significantly in existing hashing methods. This paper presents Transfer Adversarial Hashing (TAH), a new hybrid deep architecture that incorporates a pairwise $t$-distribution cross-entropy loss to learn concentrated hash codes and an adversarial network to align the data distributions between the source and target domains. TAH can generate compact transfer hash codes for efficient image retrieval on both source and target domains. Comprehensive experiments validate that TAH yields state of the art Hamming space retrieval performance on standard datasets.
Liang Zhang, Gang Wang, Georgios B. Giannakis, Jie Chen
The problem of reconstructing a sparse signal vector from magnitude-only measurements (a.k.a. compressive phase retrieval) emerges naturally in diverse applications, but it is NP-hard in general. Building on recent advances in nonconvex optimization, this paper puts forth a new algorithm for compressive phase retrieval, termed compressive reweighted amplitude flow (CRAF). Specifically, CRAF operates in two stages. The first stage seeks a sparse initial guess via a new spectral procedure. In the second stage, CRAF implements a few hard-thresholding-based iterations using reweighted gradients. When there are sufficient measurements, CRAF provably recovers the underlying signal vector exactly with high probability under suitable conditions. Moreover, its sample complexity coincides with that of the state-of-the-art procedures. Finally, substantial simulated tests showcase the remarkable performance of the new spectral initialization, as well as improved exact recovery relative to competing alternatives.
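The second stage mentioned above relies on hard thresholding. A minimal sketch of that operator, together with a generic thresholded gradient step, is given below. The reweighting of the gradients and the spectral initialization are not reproduced; the function names, the step size, and the way the gradient is supplied are assumptions made for illustration.

```python
def hard_threshold(x, k):
    """Keep the k entries of x with largest magnitude and zero out the rest."""
    if k <= 0:
        return [0.0] * len(x)
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


def thresholded_step(x, gradient, step_size, k):
    """One generic iteration: gradient step followed by projection onto k-sparse vectors."""
    moved = [xi - step_size * gi for xi, gi in zip(x, gradient)]
    return hard_threshold(moved, k)


print(hard_threshold([0.1, -3.0, 0.5, 2.0], 2))
```

Here `gradient` stands in for whatever (reweighted) gradient the algorithm computes; only the sparsity-enforcing projection is shown.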
Qing Qu, Yuqian Zhang, Yonina C. Eldar, John Wright
We study the convolutional phase retrieval problem, of recovering an unknown
signal $\mathbf x \in \mathbb C^n $ from $m$ measurements consisting of the
magnitude of its cyclic convolution with a given kernel $\mathbf a \in \mathbb
C^m $. This model is motivated by applications such as channel estimation,
optics, and underwater acoustic communication, where the signal of interest is
acted on by a given channel/filter, and phase information is difficult or
impossible to acquire. We show that when $\mathbf a$ is random and the number
of observations $m$ is sufficiently large, with high probability $\mathbf x$
can be efficiently recovered up to a global phase shift using a combination of
spectral initialization and generalized gradient descent. The main challenge is
coping with dependencies in the measurement operator. We overcome this
challenge by using ideas from decoupling theory, suprema of chaos processes and
the restricted isometry property of random circulant matrices, and recent
analysis of alternating minimization methods.
Authors' comments: 64 pages, 9 figures, appeared in NeurIPS 2017. Accepted at IEEE
Transactions on Information Theory. This is the final (minor) update: fixed
typos and grammar issues
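The measurement model described in this abstract can be sketched directly: each observation is the magnitude of one entry of the cyclic convolution of the kernel with the signal. The sketch assumes that the signal is zero-padded to the kernel length m before convolving; the function names and toy values are hypothetical. It also demonstrates why recovery is only possible up to a global phase shift.

```python
import cmath


def cyclic_convolution(a, x):
    """Cyclic convolution of kernel a (length m) with x zero-padded to length m."""
    m = len(a)
    if len(x) > m:
        raise ValueError("signal must not be longer than the kernel")
    padded = list(x) + [0j] * (m - len(x))
    return [sum(a[(k - j) % m] * padded[j] for j in range(m)) for k in range(m)]


def magnitude_measurements(a, x):
    """Phaseless observations: entry-wise magnitudes of the cyclic convolution."""
    return [abs(v) for v in cyclic_convolution(a, x)]


# Hypothetical kernel and signal.
a = [1 + 1j, 0.5j, -1 + 0j, 2 + 0j]
x = [1 + 0j, 1j]
x_rotated = [cmath.exp(0.7j) * v for v in x]  # same signal, global phase shift

y = magnitude_measurements(a, x)
y_rotated = magnitude_measurements(a, x_rotated)
print(max(abs(p - q) for p, q in zip(y, y_rotated)))  # numerically negligible
```

The direct double loop costs O(m^2); an FFT-based implementation would be the usual choice for large m.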
Aniruddha Tammewar, Monik Pamecha, Chirag Jain, Apurva Nagvenkar, Krupal Modi
In this paper, we present a hybrid model that combines a neural
conversational model and a rule-based graph dialogue system that assists users
in scheduling reminders through a chat conversation. The graph-based system has
high precision and provides grammatically accurate responses but has low
recall. The neural conversation model can cater to a variety of requests, as it
generates the responses word by word as opposed to using canned responses. The
hybrid system shows significant improvements over the existing rule-based
baseline system and caters to complex queries with a domain-restricted
neural model. Restricting the conversation topic and combining a graph-based
retrieval system with a neural generative model make the final system robust
enough for a real-world application.
Authors' comments: DEEPDIAL-18, AAAI-2018
Binanda Sengupta, Sushmita Ruj
Cloud servers offer data outsourcing facilities to their clients. A client
outsources her data without keeping any copy at her end. Therefore, she needs a
guarantee that her data are not modified by the server, which may be malicious.
Data auditing is performed on the outsourced data to resolve this issue.
Moreover, the client may want all her data to be stored untampered. In this
chapter, we describe proofs of retrievability (POR) that convince the client
about the integrity of all her data.
Authors' comments: A version has been published as a book chapter in Guide to Security
Assurance for Cloud Computing (Springer International Publishing Switzerland
2015)
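The auditing idea in this chapter abstract can be illustrated with a simplified spot-check based on message authentication codes. This is a sketch under stated assumptions, not one of the POR constructions the chapter describes: full POR schemes typically add erasure coding so that the data remain retrievable, whereas this example only detects tampering in the challenged blocks. The key, block contents, and function names are hypothetical.

```python
import hashlib
import hmac
import random


def tag_block(key, index, block):
    """MAC over the block index and content, so blocks cannot be swapped."""
    message = index.to_bytes(8, "big") + block
    return hmac.new(key, message, hashlib.sha256).digest()


def outsource(key, blocks):
    """What the client uploads: every block together with its tag."""
    return [(block, tag_block(key, i, block)) for i, block in enumerate(blocks)]


def audit(key, server_storage, challenge):
    """Client-side check of the blocks returned for the challenged indices."""
    for i in challenge:
        block, tag = server_storage[i]
        if not hmac.compare_digest(tag, tag_block(key, i, block)):
            return False
    return True


key = b"hypothetical-client-key"
blocks = [b"block-0", b"block-1", b"block-2", b"block-3"]
storage = outsource(key, blocks)

challenge = random.sample(range(len(blocks)), 2)  # random spot-check
print(audit(key, storage, challenge))
```

The client stores only the key; checking a random subset of blocks per audit keeps the cost low while making large-scale tampering likely to be caught.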
Daniel Heestermans Svendsen, Luca Martino, Manuel Campos-Taberner, Francisco Javier García-Haro, Gustau Camps-Valls
Solving inverse problems is central to geosciences and remote sensing.
Radiative transfer models (RTMs) represent mathematically the physical laws
which govern the phenomena in remote sensing applications (forward models). The
numerical inversion of the RTM equations is a challenging and computationally
demanding problem, and for this reason, often the application of a nonlinear
statistical regression is preferred. In general, regression models predict the
biophysical parameter of interest from the corresponding received radiance.
However, this approach does not employ the physical information encoded in the
RTMs. An alternative strategy, which attempts to include the physical
knowledge, consists in learning a regression model trained using data simulated
by an RTM code. In this work, we introduce a nonlinear nonparametric regression
model which combines the benefits of the two aforementioned approaches. The
inversion is performed taking into account jointly both real observations and
RTM-simulated data. The proposed Joint Gaussian Process (JGP) provides a solid
framework for exploiting the regularities between the two types of data. The
JGP automatically detects the relative quality of the simulated and real data,
and combines them accordingly. This occurs by learning an additional
hyper-parameter w.r.t. a standard GP model, and fitting parameters through
maximizing the pseudo-likelihood of the real observations. The resulting scheme
is both simple and robust, i.e., capable of adapting to different scenarios.
The advantages of the JGP method compared to benchmark strategies are shown
considering RTM-simulated and real observations in different experiments.
Specifically, we consider leaf area index (LAI) retrieval from Landsat data
combined with simulated data generated by the PROSAIL model.
Authors' comments: 21 pages single column, Accepted for publication in IEEE Transactions
on Geoscience and Remote Sensing
Damek Davis, Dmitriy Drusvyatskiy, Courtney Paquette
We consider a popular nonsmooth formulation of the real phase retrieval
problem. We show that under standard statistical assumptions, a simple
subgradient method converges linearly when initialized within a constant
relative distance of an optimal solution. Seeking to understand the
distribution of the stationary points of the problem, we complete the paper by
proving that as the number of Gaussian measurements increases, the stationary
points converge to a codimension two set, at a controlled rate. Experiments on
image recovery problems illustrate the developed algorithm and theory.
Authors' comments: 42 Pages, 15 figures
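A minimal sketch of a subgradient method on a nonsmooth real phase retrieval objective follows. It assumes the objective f(x) = (1/m) * sum_i |<a_i, x>^2 - b_i| and a Polyak step size with known minimum value 0 (noiseless data); neither the exact objective nor the step-size rule is stated in the abstract, so both are assumptions here. All names and the toy problem sizes are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def objective(x, A, b):
    """Assumed nonsmooth loss: mean absolute residual of squared measurements."""
    total = 0.0
    for a, bi in zip(A, b):
        s = dot(a, x)
        total += abs(s * s - bi)
    return total / len(A)


def subgradient(x, A, b):
    """One subgradient of the assumed loss at x."""
    g = [0.0] * len(x)
    for a, bi in zip(A, b):
        s = dot(a, x)
        r = s * s - bi
        sign = (r > 0) - (r < 0)
        coef = 2.0 * s * sign / len(A)
        for d in range(len(x)):
            g[d] += coef * a[d]
    return g


def polyak_subgradient(x0, A, b, iterations=200):
    """Subgradient iterations with Polyak step (f(x) - 0) / ||g||^2."""
    x = list(x0)
    for _ in range(iterations):
        f = objective(x, A, b)
        g = subgradient(x, A, b)
        g_norm_sq = sum(v * v for v in g)
        if f == 0.0 or g_norm_sq == 0.0:
            break
        step = f / g_norm_sq
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# Toy noiseless instance with Gaussian measurement vectors.
random.seed(0)
n, m = 5, 40
x_true = [random.gauss(0, 1) for _ in range(n)]
A = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
b = [dot(a, x_true) * dot(a, x_true) for a in A]
x0 = [xi + 0.1 * random.gauss(0, 1) for xi in x_true]  # start near the solution
x_hat = polyak_subgradient(x0, A, b)
print(objective(x0, A, b), objective(x_hat, A, b))
```

Because the measurements are invariant to a sign flip, any estimate should be compared against both x_true and its negation.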
Zhonghao Wang, Yujun Gu, Ya Zhang, Jun Zhou, Xiao Gu
Clothing retrieval is a challenging problem in computer vision. With the
advance of Convolutional Neural Networks (CNNs), the accuracy of clothing
retrieval has been significantly improved. FashionNet[1], a recent study,
proposes to employ a set of artificial features in the form of landmarks for
clothing retrieval, which are shown to be helpful for retrieval. However, the
landmark detection module is trained with strong supervision, which requires
considerable effort to obtain. In this paper, we propose a self-learning
Visual Attention Model (VAM) to extract attention maps from clothing images.
The VAM is further connected to a global network to form an end-to-end network
structure through an Impdrop connection, which randomly applies dropout to the feature maps
with the probabilities given by the attention map. Extensive experiments on
several widely used benchmark clothing retrieval data sets have demonstrated
the promise of the proposed method. We also show that compared to the trivial
Product connection, the Impdrop connection makes the network structure more
robust when training sets of limited size are used.
Authors' comments: 4 pages, to be presented at IEEE VCIP 2017
Andras Tüzkö, Christian Herrmann, Daniel Manger, Jürgen Beyerer
Current logo retrieval research focuses on closed set scenarios. We argue
that the logo domain is too large for this strategy and requires an open set
approach. To foster research in this direction, a large-scale logo dataset,
called Logos in the Wild, is collected and released to the public. A typical
open set logo retrieval application is, for example, assessing the
effectiveness of advertisements in sports event broadcasts. Given a query sample
in the form of a logo image, the task is to find all further occurrences of this
logo in a set of images or videos. Currently, common logo retrieval approaches
are unsuitable for this task because of their closed world assumption. Thus, an
open set logo retrieval method is proposed in this work which allows searching
for previously unseen logos using a single query sample. A two-stage concept with
separate logo detection and comparison is proposed, where both modules are based
on task-specific CNNs. When trained with the Logos in the Wild data, significant
performance improvements are observed, especially compared with
state-of-the-art closed set approaches.
Authors' comments: accepted at VISAPP 2018
Siddharth Gandhi, Nikku Madhusudhan
Thermal emission spectra of exoplanets provide constraints on the chemical
compositions, pressure-temperature (P-T) profiles, and energy transport in
exoplanetary atmospheres. Accurate inferences of these properties rely on the
robustness of the atmospheric retrieval methods employed. While extant
retrieval codes have provided significant constraints on molecular abundances
and temperature profiles in several exoplanetary atmospheres, the constraints
on their deviations from thermal and chemical equilibria have yet to be fully
explored. Our present work is a step in this direction. We report HyDRA, a
disequilibrium retrieval framework for thermal emission spectra of exoplanetary
atmospheres. The retrieval code uses the standard architecture of a parametric
atmospheric model coupled with Bayesian statistical inference using the Nested
Sampling algorithm. For a given dataset, the retrieved compositions and P-T
profiles are used in tandem with the GENESIS self-consistent atmospheric model
to constrain layer-by-layer deviations from chemical and radiative-convective
equilibrium in the observable atmosphere. We demonstrate HyDRA on the Hot
Jupiter WASP-43b with a high-precision emission spectrum. We retrieve an H2O
mixing ratio of log(H2O) = -3.54^{+0.82}_{-0.52}, consistent with previous
studies. We detect H2O and a combined CO/CO2 at 8-sigma significance. We find
the dayside P-T profile to be consistent with radiative-convective equilibrium
within the 1-sigma limits and with low day-night redistribution, consistent
with previous studies. The derived compositions are also consistent with
thermochemical equilibrium for the corresponding distribution of P-T profiles.
In the era of high precision and high resolution emission spectroscopy, HyDRA
provides a path to retrieve disequilibrium phenomena in exoplanetary
atmospheres.
Authors' comments: 20 pages, 13 figures, Accepted for publication in MNRAS
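For readers who prefer linear mixing ratios, the retrieved value quoted above can be converted as in the short sketch below. It assumes that the logarithm is base 10 and that the quoted uncertainties are bounds on the logarithm; both are conventions assumed here rather than stated in the abstract.

```python
# Values quoted in the abstract: log(H2O) = -3.54 (+0.82 / -0.52).
log_h2o, err_up, err_down = -3.54, 0.82, 0.52

median = 10 ** log_h2o               # central mixing ratio
lower = 10 ** (log_h2o - err_down)   # lower bound of the quoted interval
upper = 10 ** (log_h2o + err_up)     # upper bound of the quoted interval

print(f"H2O mixing ratio: {median:.2e} (range {lower:.2e} to {upper:.2e})")
```

The asymmetric errors in log space translate into a range spanning more than an order of magnitude in linear space.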
H. R. Tizhoosh, G. J. Czarnota
Marking tumors and organs is a challenging task suffering from both inter-
and intra-observer variability. The literature quantifies observer variability
by generating consensus among multiple experts when they mark the same image.
Automatically building consensus contours to establish quality assurance for
image segmentation is presently absent from clinical practice. As
\emph{big data} becomes more and more available, techniques to access a large
number of existing segments of multiple experts become possible. Fast
algorithms are, hence, required to facilitate the search for similar cases. The
present work puts forward a potential framework that, when tested with small datasets
(both synthetic and real images), displays reliability in finding similar
images. In this paper, the idea of content-based barcodes is used to retrieve
similar cases in order to build consensus contours in medical image
segmentation. This approach may be regarded as an extension of the conventional
atlas-based segmentation that generally works with rather small atlases due to
required computational expenses. The fast segment-retrieval process via
barcodes makes it possible to create and use large atlases, something that
directly contributes to the quality of the consensus building. Because the
accuracy of experts' contours must be measured, we first used 500 synthetic
prostate images with their gold markers and delineations by 20 simulated users.
The fast barcode-guided computed consensus delivered an average error of
$8\%\!\pm\!5\%$ compared against the gold standard segments. Furthermore, we
used magnetic resonance images of prostates from 15 patients delineated by 5
oncologists and selected the best delineations to serve as the gold-standard
segments. The proposed barcode atlas achieved a Jaccard overlap of
$87\%\!\pm\!9\%$ with the contours of the gold-standard segments.
Authors' comments: Images used in this paper are available to the public:
http://kimia.uwaterloo.ca/
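The Jaccard overlap reported above is the ratio of the intersection to the union of two segmented regions. A minimal sketch for binary masks follows; the 3x3 masks are hypothetical and serve only to show the computation.

```python
def jaccard(mask_a, mask_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two binary masks given as nested lists."""
    a = {(r, c) for r, row in enumerate(mask_a) for c, v in enumerate(row) if v}
    b = {(r, c) for r, row in enumerate(mask_b) for c, v in enumerate(row) if v}
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return len(a & b) / len(union)


# Hypothetical gold-standard segment and computed consensus contour.
gold = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
consensus = [[0, 1, 1],
             [0, 1, 0],
             [0, 1, 0]]
print(jaccard(gold, consensus))
```

In this toy case three pixels are shared and five are covered by at least one mask, so the overlap is 3/5.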
Shuang Li, Peter Mathews
We propose a novel image retrieval framework for visual saliency detection using information about salient objects contained within bounding box annotations for similar images. For each test image, we train a customized SVM from similar example images to predict the saliency values of its object proposals and generate an external saliency map (ES) by aggregating the regional scores. To overcome limitations caused by the size of the training dataset, we also propose an internal optimization module which computes an internal saliency map (IS) by measuring the low-level contrast information of the test image. The two maps, ES and IS, have complementary properties so we take a weighted combination to further improve the detection performance. Experimental results on several challenging datasets demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods.
Julien Lavauzelle
Private information retrieval (PIR) protocols allow a user to retrieve entries of a database without revealing the index of the desired item. Information-theoretic privacy can be achieved by the use of several servers and specific retrieval algorithms. Most known PIR protocols focus on decreasing the number of bits exchanged between the client and the server(s) during the retrieval process. On the other hand, Fazeli et al. introduced so-called PIR codes in order to reduce the storage overhead on the servers. However, only a few works address the issue of the computational complexity of the servers. In this paper, we show that a specific encoding of the database provides PIR protocols with reasonable communication complexity, low storage overhead and optimal computational complexity for the servers. This encoding is based on incidence matrices of transversal designs, from which a natural and efficient recovering algorithm is derived. We also present instances of our construction, making use of finite geometries and orthogonal arrays, and we finally give a generalisation of our main construction for resisting collusions of servers.
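To make the multi-server idea concrete, here is a minimal sketch of the classic two-server XOR-based PIR scheme with a replicated database. It is not the transversal-design construction of this paper; it only illustrates how a user can recover one bit without either (non-colluding) server learning the index. Function names and the toy database are hypothetical.

```python
import secrets


def make_queries(n, index):
    """Random subset for server 1; the same subset with `index` toggled for server 2."""
    subset_1 = {j for j in range(n) if secrets.randbits(1)}
    subset_2 = subset_1 ^ {index}  # symmetric difference flips membership of index
    return subset_1, subset_2


def server_answer(database, subset):
    """Each server returns the XOR of the requested database bits."""
    answer = 0
    for j in subset:
        answer ^= database[j]
    return answer


def retrieve(database, index):
    """XOR of both answers cancels everything except the desired bit."""
    q1, q2 = make_queries(len(database), index)
    return server_answer(database, q1) ^ server_answer(database, q2)


database = [1, 0, 1, 1, 0, 0, 1, 0]
print([retrieve(database, i) for i in range(len(database))])
```

Each query on its own is a uniformly random subset, so a single server gains no information about the index. The price is that each server XORs about half the database per query, which is the computational cost that the paper aims to reduce.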
Shiv Ram Dubey
Local descriptors have been the backbone of most computer vision
problems. Most of the existing local descriptors are generated over the raw
input images. In order to increase the discriminative power of the local
descriptors, some researchers converted the raw image into multiple images with
the help of high- and low-pass frequency filters; the local
descriptors are then computed over each filtered image and finally concatenated into
a single descriptor. However, these approaches do not utilize the inter-frequency
relationship, which limits the improvement in the discriminative
power of the descriptor that could be achieved. In this paper, this problem is
solved by utilizing the decoder concept of multi-channel decoded local binary
pattern over the multi-frequency patterns. A frequency decoded local binary
pattern (FDLBP) is proposed with two decoders. Each decoder works with one low
frequency pattern and two high frequency patterns. Finally, the descriptors
from both decoders are concatenated to form a single descriptor. The face
retrieval experiments are conducted over four challenging benchmark
databases, namely PaSC, LFW, PubFig, and ESSEX. The experimental results
confirm the superiority of the FDLBP descriptor as compared to the
state-of-the-art descriptors such as LBP, SOBEL_LBP, BoF_LBP, SVD_S_LBP, mdLBP,
etc.
Authors' comments: Accepted in Multimedia Tools and Applications, Springer
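For context, the basic local binary pattern (LBP) on which such descriptors build can be sketched as follows. This is the plain 3x3 LBP, not the proposed FDLBP with its frequency filtering and decoders; the neighbor ordering and the `>=` comparison are common conventions assumed here, and the toy image is hypothetical.

```python
# Clockwise neighbor offsets starting at the top-left pixel (a convention).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """8-bit LBP code of the interior pixel (r, c)."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels (the descriptor)."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


image = [[5, 9, 1],
         [4, 6, 7],
         [2, 3, 8]]
print(lbp_code(image, 1, 1))
```

Descriptors computed on several filtered versions of an image could be compared by concatenating such histograms, which is the baseline strategy the abstract contrasts with.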
Christina Lioma
Building machines that can understand text like humans is an AI-complete
problem. A great deal of research has already gone into this, with astounding
results, allowing everyday people to converse with their telephones, or have
their reading materials analysed and classified by computers. A prerequisite
for processing text semantics, common to the above examples, is having some
computational representation of text as an abstract object. Operations on this
representation practically correspond to making semantic inferences and, by
extension, to simulating an understanding of text. The complexity and granularity of
semantic processing that can be realised is constrained by the mathematical and
computational robustness, expressiveness, and rigour of the tools used.
This dissertation contributes a series of such tools, diverse in their
mathematical formulation, but common in their application to model semantic
inferences when machines process text. These tools are principally expressed in
nine distinct models that capture aspects of semantic dependence in highly
interpretable and non-complex ways. This dissertation further reflects on
present and future problems with the current research paradigm in this area,
and makes recommendations on how to overcome them.
The amalgamation of the body of work presented in this dissertation advances
the complexity and granularity of semantic inferences that can be made
automatically by machines.
Authors' comments: This document is a doktordisputats - a dissertation within the Danish
academic system required to obtain the degree of \textit{Doctor Scientiarum},
in form and function equivalent to the French and German Habilitation and the
Higher Doctorate of the Commonwealth