Yi Li, Vasileios Nakos
In the compressive phase retrieval problem, or phaseless compressed sensing,
or compressed sensing from intensity only measurements, the goal is to
reconstruct a sparse or approximately $k$-sparse vector $x \in \mathbb{R}^n$
given access to $y= |\Phi x|$, where $|v|$ denotes the vector obtained from
taking the absolute value of $v\in\mathbb{R}^n$ coordinate-wise. In this paper
we present sublinear-time algorithms for different variants of the compressive
phase retrieval problem which are akin to the variants considered for the
classical compressive sensing problem in theoretical computer science. Our
algorithms use pure combinatorial techniques and near-optimal number of
measurements.
Authors' comments: The ell_2/ell_2 algorithm was substituted by a modification of the
ell_infty/ell_2 algorithm which strictly subsumes it
Xin Ji, Wei Wang, Meihui Zhang, Yang Yang
With the proliferation of e-commerce websites and the ubiquitousness of smart
phones, cross-domain image retrieval using images taken by smart phones as
queries to search products on e-commerce websites is emerging as a popular
application. One challenge of this task is to locate the attention of both the
query and database images. In particular, database images, e.g. of fashion
products, on e-commerce websites are typically displayed with other
accessories, and the images taken by users contain noisy background and large
variations in orientation and lighting. Consequently, their attention is
difficult to locate. In this paper, we exploit the rich tag information
available on the e-commerce websites to locate the attention of database
images. For query images, we use each candidate image in the database as the
context to locate the query attention. Novel deep convolutional neural network
architectures, namely TagYNet and CtxYNet, are proposed to learn the attention
weights and then extract effective representations of the images. Experimental
results on public datasets confirm that our approaches have significant
improvement over the existing methods in terms of the retrieval accuracy and
efficiency.
Authors' comments: 8 pages with an extra reference page
Swanand Kadhe, Brenden Garcia, Anoosheh Heidarzadeh, Salim El Rouayheb, Alex Sprintson
We study the problem of Private Information Retrieval (PIR) in the presence
of prior side information. The problem setup includes a database of $K$
independent messages possibly replicated on several servers, and a user that
needs to retrieve one of these messages. In addition, the user has some prior
side information in the form of a subset of $M$ messages, not containing the
desired message and unknown to the servers. This problem is motivated by
practical settings in which the user can obtain side information
opportunistically from other users or has previously downloaded some messages
using classical PIR schemes. The objective of the user is to retrieve the
required message without revealing its identity while minimizing the amount of
data downloaded from the servers.
We focus on achieving information-theoretic privacy in two scenarios: (i) the
user wants to protect jointly its demand and side information; (ii) the user
wants to protect only the information about its demand, but not the side
information. To highlight the role of side information, we focus first on the
case of a single server (single database). In the first scenario, we prove that
the minimum download cost is $K-M$ messages, and in the second scenario it is
$\lceil \frac{K}{M+1}\rceil$ messages, which should be compared to $K$
messages, the minimum download cost in the case of no side information. Then,
we extend some of our results to the case of the database replicated on
multiple servers. Our proof techniques relate PIR with side information to the
index coding problem. We leverage this connection to prove converse results, as
well as to design achievability schemes.
Authors' comments: Shorter version of the paper is accepted in Allerton Conference 2017
Shuo Zhang, Krisztian Balog
We address the task of ranking objects (such as people, blogs, or verticals)
that, unlike documents, do not have direct term-based representations. To be
able to match them against keyword queries, evidence needs to be amassed from
documents that are associated with the given object. We present two design
patterns, i.e., general reusable retrieval strategies, which are able to
encompass most existing approaches from the past. One strategy combines
evidence on the term level (early fusion), while the other does it on the
document level (late fusion). We demonstrate the generality of these patterns
by applying them to three different object retrieval tasks: expert finding,
blog distillation, and vertical ranking.
Authors' comments: Proceedings of the 39th European conference on Advances in
Information Retrieval (ECIR '17), 2017
Jingkuan Song
The most striking successes in image retrieval using deep hashing have mostly
involved discriminative models, which require labels. In this paper, we use
binary generative adversarial networks (BGAN) to embed images to binary codes
in an unsupervised way. By restricting the input noise variable of generative
adversarial networks (GAN) to be binary and conditioned on the features of each
input image, BGAN can simultaneously learn a binary representation per image,
and generate an image plausibly similar to the original one. In the proposed
framework, we address two main problems: 1) how to directly generate binary
codes without relaxation? 2) how to equip the binary representation with the
ability of accurate image retrieval? We resolve these problems by proposing new
sign-activation strategy and a loss function steering the learning process,
which consists of new models for adversarial loss, a content loss, and a
neighborhood structure loss. Experimental results on standard datasets
(CIFAR-10, NUSWIDE, and Flickr) demonstrate that our BGAN significantly
outperforms existing hashing methods by up to 107\% in terms of~mAP (See Table
tab.res.map.comp) Our anonymous code is available at:
https://github.com/htconquer/BGAN.
Authors' comments: arXiv admin note: text overlap with arXiv:1702.00758 by other authors
Christophe Van Gysel, Maarten de Rijke, Evangelos Kanoulas
We propose the Neural Vector Space Model (NVSM), a method that learns
representations of documents in an unsupervised manner for news article
retrieval. In the NVSM paradigm, we learn low-dimensional representations of
words and documents from scratch using gradient descent and rank documents
according to their similarity with query representations that are composed from
word representations. We show that NVSM performs better at document ranking
than existing latent semantic vector space methods. The addition of NVSM to a
mixture of lexical language models and a state-of-the-art baseline vector space
model yields a statistically significant increase in retrieval effectiveness.
Consequently, NVSM adds a complementary relevance signal. Next to semantic
matching, we find that NVSM performs well in cases where lexical matching is
needed.
NVSM learns a notion of term specificity directly from the document
collection without feature engineering. We also show that NVSM learns
regularities related to Luhn significance. Finally, we give advice on how to
deploy NVSM in situations where model selection (e.g., cross-validation) is
infeasible. We find that an unsupervised ensemble of multiple models trained
with different hyperparameter values performs better than a single
cross-validated model. Therefore, NVSM can safely be used for ranking documents
without supervised relevance judgments.
Authors' comments: TOIS 2018
Razane Tajeddine, Salim El Rouayheb
We consider the problem of designing PIR scheme on coded data when certain nodes are unresponsive. We provide the construction of $\nu$-robust PIR schemes that can tolerate up to $\nu$ unresponsive nodes. These schemes are adaptive and universally optimal in the sense of achieving (asymptotically) optimal download cost for any number of unresponsive nodes up to $\nu$.
Pedro Saleiro, Natasa Milic-Frayling, Eduarda Mendes Rodrigues, Carlos Soares
We address the task of entity-relationship (E-R) retrieval, i.e, given a
query characterizing types of two or more entities and relationships between
them, retrieve the relevant tuples of related entities. Answering E-R queries
requires gathering and joining evidence from multiple unstructured documents.
In this work, we consider entity and relationships of any type, i.e,
characterized by context terms instead of pre-defined types or relationships.
We propose a novel IR-centric approach for E-R retrieval, that builds on the
basic early fusion design pattern for object retrieval, to provide extensible
entity-relationship representations, suitable for complex, multi-relationships
queries. We performed experiments with Wikipedia articles as entity
representations combined with relationships extracted from ClueWeb-09-B with
FACC1 entity linking. We obtained promising results using 3 different query
collections comprising 469 E-R queries.
Authors' comments: KG4IR (SIGIR workshop)
Hantao Yao, Shiliang Zhang, Yongdong Zhang, Jintao Li, Qi Tian
Fine-Grained Visual Categorization (FGVC) has achieved significant progress
recently. However, the number of fine-grained species could be huge and
dynamically increasing in real scenarios, making it difficult to recognize
unseen objects under the current FGVC framework. This raises an open issue to
perform large-scale fine-grained identification without a complete training
set. Aiming to conquer this issue, we propose a retrieval task named One-Shot
Fine-Grained Instance Retrieval (OSFGIR). "One-Shot" denotes the ability of
identifying unseen objects through a fine-grained retrieval task assisted with
an incomplete auxiliary training set. This paper first presents the detailed
description to OSFGIR task and our collected OSFGIR-378K dataset. Next, we
propose the Convolutional and Normalization Networks (CN-Nets) learned on the
auxiliary dataset to generate a concise and discriminative representation.
Finally, we present a coarse-to-fine retrieval framework consisting of three
components, i.e., coarse retrieval, fine-grained retrieval, and query
expansion, respectively. The framework progressively retrieves images with
similar semantics, and performs fine-grained identification. Experiments show
our OSFGIR framework achieves significantly better accuracy and efficiency than
existing FGVC and image retrieval methods, thus could be a better solution for
large-scale fine-grained object identification.
Authors' comments: Accepted by MM2017, 9 pages, 7 figures
Tuan Hoang, Thanh-Toan Do, Dang-Khoa Le Tan, Ngai-Man Cheung
Convolutional Neural Network (CNN) is a very powerful approach to extract
discriminative local descriptors for effective image search. Recent work adopts
fine-tuned strategies to further improve the discriminative power of the
descriptors. Taking a different approach, in this paper, we propose a novel
framework to achieve competitive retrieval performance. Firstly, we propose
various masking schemes, namely SIFT-mask, SUM-mask, and MAX-mask, to select a
representative subset of local convolutional features and remove a large number
of redundant features. We demonstrate that this can effectively address the
burstiness issue and improve retrieval accuracy. Secondly, we propose to employ
recent embedding and aggregating methods to further enhance feature
discriminability. Extensive experiments demonstrate that our proposed framework
achieves state-of-the-art retrieval accuracy.
Authors' comments: Accepted to ACM MM 2017
Yan Shuo Tan, Roman Vershynin
We consider the problem of phase retrieval, i.e. that of solving systems of
quadratic equations. A simple variant of the randomized Kaczmarz method was
recently proposed for phase retrieval, and it was shown numerically to have a
computational edge over state-of-the-art Wirtinger flow methods. In this paper,
we provide the first theoretical guarantee for the convergence of the
randomized Kaczmarz method for phase retrieval. We show that it is sufficient
to have as many Gaussian measurements as the dimension, up to a constant
factor. Along the way, we introduce a sufficient condition on measurement sets
for which the randomized Kaczmarz method is guaranteed to work. We show that
Gaussian sampling vectors satisfy this property with high probability; this is
proved using a chaining argument coupled with bounds on VC dimension and metric
entropy.
Authors' comments: Revised after comments from referees
Philipp Grohs, Martin Rathmair
We consider the problem of reconstructing a signal $f$ from its spectrogram, i.e., the magnitudes $|V_\varphi f|$ of its Gabor transform $$V_\varphi f (x,y):=\int_{\mathbb{R}}f(t)e^{-\pi (t-x)^2}e^{-2\pi \i y t}dt, \quad x,y\in \mathbb{R}.$$ Such problems occur in a wide range of applications, from optical imaging of nanoscale structures to audio processing and classification. While it is well-known that the solution of the above Gabor phase retrieval problem is unique up to natural identifications, the stability of the reconstruction has remained wide open. The present paper discovers a deep and surprising connection between phase retrieval, spectral clustering and spectral geometry. We show that the stability of the Gabor phase reconstruction is bounded by the reciprocal of the Cheeger constant of the flat metric on $\mathbb{R}^2$, conformally multiplied with $|V_\varphi f|$. The Cheeger constant, in turn, plays a prominent role in the field of spectral clustering, and it precisely quantifies the `disconnectedness' of the measurements $V_\varphi f$. It has long been known that a disconnected support of the measurements results in an instability -- our result for the first time provides a converse in the sense that there are no other sources of instabilities. Due to the fundamental importance of Gabor phase retrieval in coherent diffraction imaging, we also provide a new understanding of the stability properties of these imaging techniques: Contrary to most classical problems in imaging science whose regularization requires the promotion of smoothness or sparsity, the correct regularization of the phase retrieval problem promotes the `connectedness' of the measurements in terms of bounding the Cheeger constant from below. Our work thus, for the first time, opens the door to the development of efficient regularization strategies.
Wen-Jun Zeng, H. C. So
Phase retrieval aims at recovering a complex-valued signal from magnitude-only measurements, which attracts much attention since it has numerous applications in many disciplines. However, phase recovery involves solving a system of quadratic equations, indicating that it is a challenging nonconvex optimization problem. To tackle phase retrieval in an effective and efficient manner, we apply coordinate descent (CD) such that a single unknown is solved at each iteration while all other variables are kept fixed. As a result, only minimization of a univariate quartic polynomial is needed which is easily achieved by finding the closed-form roots of a cubic equation. Three computationally simple algorithms referred to as cyclic, randomized and greedy CDs, based on different updating rules, are devised. It is proved that the three CDs globally converge to a stationary point of the nonconvex problem, and specifically, the randomized CD locally converges to the global minimum and attains exact recovery at a geometric rate with high probability if the sample size is large enough. The cyclic and randomized CDs are also modified via minimization of the $\ell_1$-regularized quartic polynomial for phase retrieval of sparse signals. Furthermore, a novel application of the three CDs, namely, blind equalization in digital communications, is proposed. It is demonstrated that the CD methodology is superior to the state-of-the-art techniques in terms of computational efficiency and/or recovery performance.
Zhao Yan, Duyu Tang, Nan Duan, Junwei Bao, Yuanhua Lv, Ming Zhou, Zhoujun Li
Understanding the connections between unstructured text and semi-structured table is an important yet neglected problem in natural language processing. In this work, we focus on content-based table retrieval. Given a query, the task is to find the most relevant table from a collection of tables. Further progress towards improving this area requires powerful models of semantic matching and richer training and evaluation resources. To remedy this, we present a ranking based approach, and implement both carefully designed features and neural network architectures to measure the relevance between a query and the content of a table. Furthermore, we release an open-domain dataset that includes 21,113 web queries for 273,816 tables. We conduct comprehensive experiments on both real world and synthetic datasets. Results verify the effectiveness of our approach and present the challenges for this task.
Tamir Bendory, Robert Beinert, Yonina C. Eldar
The problem of recovering a signal from its phaseless Fourier transform measurements, called Fourier phase retrieval, arises in many applications in engineering and science. Fourier phase retrieval poses fundamental theoretical and algorithmic challenges. In general, there is no unique mapping between a one-dimensional signal and its Fourier magnitude and therefore the problem is ill-posed. Additionally, while almost all multidimensional signals are uniquely mapped to their Fourier magnitude, the performance of existing algorithms is generally not well-understood. In this chapter we survey methods to guarantee uniqueness in Fourier phase retrieval. We then present different algorithmic approaches to retrieve the signal in practice. We conclude by outlining some of the main open questions in this field.
Tao Zhou, Muhao Chen, Jie Yu, Demetri Terzopoulos
Following the recent progress in image classification and captioning using
deep learning, we develop a novel natural language person retrieval system
based on an attention mechanism. More specifically, given the description of a
person, the goal is to localize the person in an image. To this end, we first
construct a benchmark dataset for natural language person retrieval. To do so,
we generate bounding boxes for persons in a public image dataset from the
segmentation masks, which are then annotated with descriptions and attributes
using the Amazon Mechanical Turk. We then adopt a region proposal network in
Faster R-CNN as a candidate region generator. The cropped images based on the
region proposals as well as the whole images with attention weights are fed
into Convolutional Neural Networks for visual feature extraction, while the
natural language expression and attributes are input to Bidirectional Long
Short- Term Memory (BLSTM) models for text feature extraction. The visual and
text features are integrated to score region proposals, and the one with the
highest score is retrieved as the output of our system. The experimental
results show significant improvement over the state-of-the-art method for
generic object retrieval and this line of research promises to benefit search
in surveillance video footage.
Authors' comments: CVPR 2017 Workshop (vision meets cognition)
Bálint Zoltán Daróczy
In this thesis we examined several multimodal feature extraction and learning
methods for retrieval and classification purposes. We reread briefly some
theoretical results of learning in Section 2 and reviewed several generative
and discriminative models in Section 3 while we described the similarity kernel
in Section 4. We examined different aspects of the multimodal image retrieval
and classification in Section 5 and suggested methods for identifying quality
assessments of Web documents in Section 6. In our last problem we proposed
similarity kernel for time-series based classification. The experiments were
carried over publicly available datasets and source codes for the most
essential parts are either open source or released. Since the used similarity
graphs (Section 4.2) are greatly constrained for computational purposes, we
would like to continue work with more complex, evolving and capable graphs and
apply for different problems such as capturing the rapid change in the
distribution (e.g. session based recommendation) or complex graphs of the
literature work. The similarity kernel with the proper metrics reaches and in
many cases improves over the state-of-the-art. Hence we may conclude generative
models based on instance similarities with multiple modes is a generally
applicable model for classification and regression tasks ranging over various
domains, including but not limited to the ones presented in this thesis. More
generally, the Fisher kernel is not only unique in many ways but one of the
most powerful kernel functions. Therefore we may exploit the Fisher kernel in
the future over widely used generative models, such as Boltzmann Machines
[Hinton et al., 1984], a particular subset, the Restricted Boltzmann Machines
and Deep Belief Networks [Hinton et al., 2006]), Latent Dirichlet Allocation
[Blei et al., 2003] or Hidden Markov Models [Baum and Petrie, 1966] to name a
few.
Authors' comments: doctoral thesis, 2016
Ziyang Yuan, Qi Wang, Hongxia Wang
Phase retrieval(PR) problem is a kind of ill-condition inverse problem which can be found in various of applications. Utilizing the sparse priority, an algorithm called SWF(Sparse Wirtinger Flow) is proposed in this paper to deal with sparse PR problem based on the Wirtinger flow method. SWF firstly recovers the support of the signal and then updates the evaluation by hard thresholding method with an elaborate initialization. Theoretical analyses show that SWF has a geometric convergence for any $k$ sparse $n$ length signal with the sampling complexity $\mathcal{O}(k^2\mathrm{log}n)$. To get $\varepsilon$ accuracy, the computational complexity of SWF is $\mathcal{O}(k^3n\mathrm{log}n\mathrm{log}\frac{1}{\varepsilon})$. Numerical tests also demonstrate that SWF performs better than state-of-the-art methods especially when we have no priori knowledge about sparsity $k$. Moreover, SWF is also robust to the noise
Zhifang Zhang, Jingke Xu
Suppose there are $N$ distributed databases each storing a full set of $M$
independent files. A user wants to retrieve $r$ out of the $M$ files without
revealing the identity of the $r$ files. When $r=1$ it is the classic problem
of private information retrieval (PIR). In this paper we study the problem of
private multi-file retrieval (PMFR) which covers the case of general $r$. We
first prove an upper bound on the capacity of PMFR schemes which indicates the
minimum possible download size per unit of retrieved files. Then we design a
general PMFR scheme which happens to attain the upper bound when
$r\geq\frac{M}{2}$, thus achieving the optimal communication cost. As $r$ goes
down we show the trivial approach of executing $r$ independent PIR instances
achieves the near optimal communication cost. Comparing with the
capacity-achieving PIR schemes, our PMFR scheme reduces the number of
subpackages needed for each file from $N^M$ to $N^2$, which implies a great
reduction of implementation complexity.
Authors' comments: This paper has been covered by another paper
Besnik Fetahu, Ujwal Gadiraju, Stefan Dietze
The increasing amount of data on the Web, in particular of Linked Data, has led to a diverse landscape of datasets, which make entity retrieval a challenging task. Explicit cross-dataset links, for instance to indicate co-references or related entities can significantly improve entity retrieval. However, only a small fraction of entities are interlinked through explicit statements. In this paper, we propose a two-fold entity retrieval approach. In a first, offline preprocessing step, we cluster entities based on the \emph{x--means} and \emph{spectral} clustering algorithms. In the second step, we propose an optimized retrieval model which takes advantage of our precomputed clusters. For a given set of entities retrieved by the BM25F retrieval approach and a given user query, we further expand the result set with relevant entities by considering features of the queries, entities and the precomputed clusters. Finally, we re-rank the expanded result set with respect to the relevance to the query. We perform a thorough experimental evaluation on the Billions Triple Challenge (BTC12) dataset. The proposed approach shows significant improvements compared to the baseline and state of the art approaches.