Hiteshwar Kumar Azad, Akshay Deepak
With the ever increasing size of the web, relevant information extraction on
the Internet with a query formed by a few keywords has become a big challenge.
Query Expansion (QE) plays a crucial role in improving searches on the
Internet. Here, the user's initial query is reformulated by adding additional
meaningful terms with similar significance. QE -- as part of information
retrieval (IR) -- has long attracted researchers' attention. It has become very
influential in the field of personalized social document, question answering,
cross-language IR, information filtering and multimedia IR. Research in QE has
gained further prominence because of IR dedicated conferences such as TREC
(Text Information Retrieval Conference) and CLEF (Conference and Labs of the
Evaluation Forum). This paper surveys QE techniques in IR from 1960 to 2017
with respect to core techniques, data sources used, weighting and ranking
methodologies, user participation and applications -- bringing out similarities
and differences.
Authors' comments: 43 pages, 5 figures, 10 tables
Sercan O. Arik, Joseph M. Kahn
Phase retrieval has important applications in optical imaging, communications and sensing. Lifting the dimensionality of the problem allows phase retrieval to be approximated as a convex optimization problem in a higher-dimensional space. Convex optimization-based phase retrieval has been shown to yield high accuracy, yet its low-complexity implementation has not been explored. In this paper, we study three fundamental approaches for its low-complexity implementation: the projected gradient method, the Nesterov accelerated gradient method, and the alternating direction method of multipliers (ADMM) method. We derive the corresponding estimation algorithms and evaluate their complexities. We compare their performance in the application area of direct-detection mode-division multiplexing. We demonstrate that they yield negligible estimation penalties (less than 0.2 dB for transmitter processing and less than 0.6 dB for receiver equalization) while yielding low computational cost, as their implementation complexities all scale quadratically in the number of unknown parameters. Among the three methods, ADMM achieves convergence after the smallest number of iterations.
Gaurav Manek, Jie Lin, Vijay Chandrasekhar, Lingyu Duan, Sateesh Giduthuri, Xiaoli Li, Tomaso Poggio
In this work, we focus on the problem of image instance retrieval with deep
descriptors extracted from pruned Convolutional Neural Networks (CNN). The
objective is to heavily prune convolutional edges while maintaining retrieval
performance. To this end, we introduce both data-independent and data-dependent
heuristics to prune convolutional edges, and evaluate their performance across
various compression rates with different deep descriptors over several
benchmark datasets. Further, we present an end-to-end framework to fine-tune
the pruned network, with a triplet loss function specially designed for the
retrieval task. We show that the combination of heuristic pruning and
fine-tuning offers 5x compression rate without considerable loss in retrieval
performance.
Authors' comments: 5 pages
Tomasz Jurczyk, Jinho D. Choi
This paper challenges a cross-genre document retrieval task, where the queries are in formal writing and the target documents are in conversational writing. In this task, a query, is a sentence extracted from either a summary or a plot of an episode in a TV show, and the target document consists of transcripts from the corresponding episode. To establish a strong baseline, we employ the current state-of-the-art search engine to perform document retrieval on the dataset collected for this work. We then introduce a structure reranking approach to improve the initial ranking by utilizing syntactic and semantic structures generated by NLP tools. Our evaluation shows an improvement of more than 4% when the structure reranking is applied, which is very promising.
Hsuan-Yin Lin, Eirik Rosnes
It was recently shown by Fazeli et al. that the storage overhead of a
traditional $t$-server private information retrieval (PIR) protocol can be
significantly reduced using the concept of a $t$-server PIR code. In this work,
we show that a family of $t$-server PIR codes (with increasing dimensions and
blocklengths) can be constructed from an existing $t$-server PIR code through
lengthening by a single information symbol and code extension by at most
$\bigl\lceil t/2\bigr\rceil$ code symbols. Furthermore, by extending a code
construction notion from Steiner systems by Fazeli et al., we obtain a specific
family of $t$-server PIR codes. Based on a code construction technique that
lengthens and extends a $t$-server PIR code simultaneously, a basic algorithm
to find good (i.e., small blocklength) $t$-server PIR codes is proposed. For
the special case of $t=5$, we find provably optimal PIR codes for code
dimensions $k\leq 6$, while for all $7\leq k\leq 32$ we find codes of smaller
blocklength than the best known codes from the literature. Furthermore, in the
case of $t = 8$, we also find better codes for $k = 5, 6, 11, 12$. Numerical
results show that most of the best found $5$-server PIR codes can be
constructed from the proposed family of codes connected to Steiner systems.
Authors' comments: The shorter version of this paper will appear in the proceedings of
2018 International Zurich Seminar on Information and Communication
Gregor Wiedemann, Andreas Niekler
This paper presents a procedure to retrieve subsets of relevant documents
from large text collections for Content Analysis, e.g. in social sciences.
Document retrieval for this purpose needs to take account of the fact that
analysts often cannot describe their research objective with a small set of key
terms, especially when dealing with theoretical or rather abstract research
interests. Instead, it is much easier to define a set of paradigmatic documents
which reflect topics of interest as well as targeted manner of speech. Thus, in
contrast to classic information retrieval tasks we employ manually compiled
collections of reference documents to compose large queries of several hundred
key terms, called dictionaries. We extract dictionaries via Topic Models and
also use co-occurrence data from reference collections. Evaluations show that
the procedure improves retrieval results for this purpose compared to
alternative methods of key term extraction as well as neglecting co-occurrence
data.
Authors' comments: https://hal.archives-ouvertes.fr/hal-01005879; Proceedings of
Terminology and Knowledge Engineering 2014 (TKE'14), Berlin
Eleni Triantafillou, Richard Zemel, Raquel Urtasun
Few-shot learning refers to understanding new concepts from only a few examples. We propose an information retrieval-inspired approach for this problem that is motivated by the increased importance of maximally leveraging all the available information in this low-data regime. We define a training objective that aims to extract as much information as possible from each training batch by effectively optimizing over all relative orderings of the batch points simultaneously. In particular, we view each batch point as a `query' that ranks the remaining ones based on its predicted relevance to them and we define a model within the framework of structured prediction to optimize mean Average Precision over these rankings. Our method achieves impressive results on the standard few-shot classification benchmarks while is also capable of few-shot retrieval.
Qiwen Wang, Mikael Skoglund
The problem of symmetric private information retrieval (SPIR) from replicated databases with colluding servers and adversaries is studied. Specifically, the database comprises $K$ files, which are replicatively stored among $N$ servers. A user wants to retrieve one file from the database by communicating with the $N$ servers, without revealing the identity of the desired file to any server. Furthermore, the user shall learn nothing about the other $K-1$ files. Any $T$ out of $N$ servers may collude, that is, they may communicate their interactions with the user to guess the identity of the requested file. An adversary in the system can tap in on or even try to corrupt the communication. Three types of adversaries are considered: a Byzantine adversary who can overwrite the transmission of any $B$ servers to the user; a passive eavesdropper who can tap in on the incoming and outgoing transmissions of any $E$ servers; and a combination of both -- an adversary who can tap in on a set of any $E$ nodes, and overwrite the transmission of a set of any $B$ nodes. The problems of SPIR with colluding servers and the three types of adversaries are named T-BSPIR, T-ESPIR and T-BESPIR respectively. The capacity of the problem is defined as the maximum number of information bits of the desired file retrieved per downloaded bit. We show that the information-theoretical capacity of T-BSPIR equals $1-\frac{2B+T}{N}$, if the servers share common randomness (unavailable at the user) with amount at least $\frac{2B+T}{N-2B-T}$ times the file size. Otherwise, the capacity equals zero. The capacity of T-ESPIR is proved to equal $1-\frac{\max(T,E)}{N}$, with common randomness at least $\frac{\max(T,E)}{N-\max(T,E)}$ times the file size. Finally, the capacity of T-BESPIR is proved to be $1-\frac{2B+\max(T,E)}{N}$, with common randomness at least $\frac{2B+\max(T,E)}{N-2B-\max(T,E)}$ times the file size.
Oliver Gnilke, Marcus Greferath, Camilla Hollanti, Guillermo Nuñez Ponasso, Padraig Ó Catháin, Eric Swartz
In a User-Private Information Retrieval (UPIR) scheme, a set of users
collaborate to retrieve files from a database without revealing to observers
which participant in the scheme requested the file. Protocols have been
proposed based on pairwise balanced designs and symmetric designs. Wepropose a
new class of UPIR schemes based on generalised quadrangles (GQ).
We prove that while the privacy of users in the previously proposed schemes
could be compromised by a single user, the new GQ-UPIR schemes proposed in this
paper maintain privacy with high probability even when up to $O(n^{1/4 -
\epsilon})$ users collude, where $n$ is the total number of users in the
scheme.
Authors' comments: To appear in the proceedings of The Tenth International Workshop on
Coding and Cryptography 2017. This version contains proofs of some elementary
results on generalised quadrangles omitted from the submitted version for
reasons of space
Halyun Jeong, C. Sinan Güntürk
The classical Kaczmarz iteration and its randomized variants are popular
tools for fast inversion of linear overdetermined systems. This method extends
naturally to the setting of the phase retrieval problem via substituting at
each iteration the phase of any measurement of the available approximate
solution for the unknown phase of the measurement of the true solution. Despite
the simplicity of the method, rigorous convergence guarantees that are
available for the classical linear setting have not been established so far for
the phase retrieval setting. In this short note, we provide a convergence
result for the randomized Kaczmarz method for phase retrieval in
$\mathbb{R}^d$. We show that with high probability a random measurement system
of size $m \asymp d$ will be admissible for this method in the sense that
convergence in the mean square sense is guaranteed with any prescribed
probability. The convergence is exponential and comparable to the linear
setting.
Authors' comments: 13 pages, a mistake corrected, an appendix added
Teng Zhang
This paper considers the problem of phase retrieval, where the goal is to recover a signal $z\in C^n$ from the observations $y_i=|a_i^* z|$, $i=1,2,\cdots,m$. While many algorithms have been proposed, the alternating minimization algorithm has been one of the most commonly used methods, and it is very simple to implement. Current work has proved that when the observation vectors $\{a_i\}_{i=1}^m$ are sampled from a complex Gaussian distribution $N(0, I)$, it recovers the underlying signal with a good initialization when $m=O(n)$, or with random initialization when $m=O(n^2)$, and it conjectured that random initialization succeeds with $m=O(n)$. This work proposes a modified alternating minimization method in a batch setting, and proves that when $m=O(n\log^{3}n)$, the proposed algorithm with random initialization recovers the underlying signal with high probability. The proof is based on the observation that after each iteration of alternating minimization, with high probability, the angle between the estimated signal and the underlying signal is reduced.
Ravi Tandon
The problem of cache enabled private information retrieval (PIR) is considered in which a user wishes to privately retrieve one out of $K$ messages, each of size $L$ bits from $N$ distributed databases. The user has a local cache of storage $SL$ bits which can be used to store any function of the $K$ messages. The main contribution of this work is the exact characterization of the capacity of cache aided PIR as a function of the storage parameter $S$. In particular, for a given cache storage parameter $S$, the information-theoretically optimal download cost $D^{*}(S)/L$ (or the inverse of capacity) is shown to be equal to $(1- \frac{S}{K})\left(1+ \frac{1}{N}+ \ldots + \frac{1}{N^{K-1}}\right)$. Special cases of this result correspond to the settings when $S=0$, for which the optimal download cost was shown by Sun and Jafar to be $\left(1+ \frac{1}{N}+ \ldots + \frac{1}{N^{K-1}}\right)$, and the case when $S=K$, i.e., cache size is large enough to store all messages locally, for which the optimal download cost is $0$. The intermediate points $S\in (0, K)$ can be readily achieved through a simple memory-sharing based PIR scheme. The key technical contribution of this work is the converse, i.e., a lower bound on the download cost as a function of storage $S$ which shows that memory sharing is information-theoretically optimal.
Wengang Zhou, Houqiang Li, Qi Tian
The explosive increase and ubiquitous accessibility of visual data on the Web
have led to the prosperity of research activity in image search or retrieval.
With the ignorance of visual content as a ranking clue, methods with text
search techniques for visual retrieval may suffer inconsistency between the
text words and visual content. Content-based image retrieval (CBIR), which
makes use of the representation of visual content to identify relevant images,
has attracted sustained attention in recent two decades. Such a problem is
challenging due to the intention gap and the semantic gap problems. Numerous
techniques have been developed for content-based image retrieval in the last
decade. The purpose of this paper is to categorize and evaluate those
algorithms proposed during the period of 2003 to 2016. We conclude with several
promising directions for future research.
Authors' comments: 22 pages
Tomaso Aste, T. Di Matteo
We investigate how efficiently a known underlying sparse causality structure
of a simulated multivariate linear process can be retrieved from the analysis
of time-series of short lengths. Causality is quantified from conditional
transfer entropy and the network is constructed by retaining only the
statistically validated contributions. We compare results from three
methodologies: two commonly used regularization methods, Glasso and ridge, and
a newly introduced technique, LoGo, based on the combination of information
filtering network and graphical modelling. For these three methodologies we
explore the regions of time series lengths and model-parameters where a
significant fraction of true causality links is retrieved. We conclude that,
when time-series are short, with their lengths shorter than the number of
variables, sparse models are better suited to uncover true causality links with
LoGo retrieving the true causality network more accurately than Glasso and
ridge.
Authors' comments: 17 pages, 5 figures, 5 tables
Vincent Christlein, Martin Gropp, Stefan Fiel, Andreas Maier
Deep Convolutional Neural Networks (CNN) have shown great success in
supervised classification tasks such as character classification or dating.
Deep learning methods typically need a lot of annotated training data, which is
not available in many scenarios. In these cases, traditional methods are often
better than or equivalent to deep learning methods. In this paper, we propose a
simple, yet effective, way to learn CNN activation features in an unsupervised
manner. Therefore, we train a deep residual network using surrogate classes.
The surrogate classes are created by clustering the training dataset, where
each cluster index represents one surrogate class. The activations from the
penultimate CNN layer serve as features for subsequent classification tasks. We
evaluate the feature representations on two publicly available datasets. The
focus lies on the ICDAR17 competition dataset on historical document writer
identification (Historical-WI). We show that the activation features trained
without supervision are superior to descriptors of state-of-the-art writer
identification methods. Additionally, we achieve comparable results in the case
of handwriting classification using the ICFHR16 competition dataset on
historical Latin script types (CLaMM16).
Authors' comments: ICDAR2017 camera ready (fixed p@2 values, missing table references)
Morteza Babaie, Shivam Kalra, Aditya Sriram, Christopher Mitcheltree, Shujin Zhu, Amin Khatami, Shahryar Rahnamayan, H. R. Tizhoosh
In this paper, we introduce a new dataset, \textbf{Kimia Path24}, for image
classification and retrieval in digital pathology. We use the whole scan images
of 24 different tissue textures to generate 1,325 test patches of size
1000$\times$1000 (0.5mm$\times$0.5mm). Training data can be generated according
to preferences of algorithm designer and can range from approximately 27,000 to
over 50,000 patches if the preset parameters are adopted. We propose a compound
patch-and-scan accuracy measurement that makes achieving high accuracies quite
challenging. In addition, we set the benchmarking line by applying LBP,
dictionary approach and convolutional neural nets (CNNs) and report their
results. The highest accuracy was 41.80\% for CNN.
Authors' comments: Accepted for presentation at Workshop for Computer Vision for
Microscopy Image Analysis (CVMI 2017) @ CVPR 2017, Honolulu, Hawaii
Mihail Eric, Christopher D. Manning
Neural task-oriented dialogue systems often struggle to smoothly interface with a knowledge base. In this work, we seek to address this problem by proposing a new neural dialogue agent that is able to effectively sustain grounded, multi-domain discourse through a novel key-value retrieval mechanism. The model is end-to-end differentiable and does not need to explicitly model dialogue state or belief trackers. We also release a new dataset of 3,031 dialogues that are grounded through underlying knowledge bases and span three distinct tasks in the in-car personal assistant space: calendar scheduling, weather information retrieval, and point-of-interest navigation. Our architecture is simultaneously trained on data from all domains and significantly outperforms a competitive rule-based system and other existing neural dialogue architectures on the provided domains according to both automatic and human evaluation metrics.
Kohei Kawabata, Yuto Ashida, Masahito Ueda
By investigating information flow between a general parity-time (PT)
-symmetric non-Hermitian system and an environment, we find that the complete
information retrieval from the environment can be achieved in the PT-unbroken
phase, whereas no information can be retrieved in the PT-broken phase. The
PT-transition point thus marks the reversible-irreversible criticality of
information flow, around which many physical quantities such as the recurrence
time and the distinguishability between quantum states exhibit power-law
behavior. Moreover, by embedding a PT-symmetric system into a larger Hilbert
space so that the entire system obeys unitary dynamics, we reveal that behind
the information retrieval lies a hidden entangled partner protected by PT
symmetry. Possible experimental situations are also discussed.
Authors' comments: 6+5 pages, 1+2 figures
Nir Shlezinger, Ron Dabora, Yonina C. Eldar
In phase retrieval problems, a signal of interest (SOI) is reconstructed
based on the magnitude of a linear transformation of the SOI observed with
additive noise. The linear transform is typically referred to as a measurement
matrix. Many works on phase retrieval assume that the measurement matrix is a
random Gaussian matrix, which, in the noiseless scenario with sufficiently many
measurements, guarantees invertability of the transformation between the SOI
and the observations, up to an inherent phase ambiguity. However, in many
practical applications, the measurement matrix corresponds to an underlying
physical setup, and is therefore deterministic, possibly with structural
constraints. In this work we study the design of deterministic measurement
matrices, based on maximizing the mutual information between the SOI and the
observations. We characterize necessary conditions for the optimality of a
measurement matrix, and analytically obtain the optimal matrix in the low
signal-to-noise ratio regime. Practical methods for designing general
measurement matrices and masked Fourier measurements are proposed. Simulation
tests demonstrate the performance gain achieved by the proposed techniques
compared to random Gaussian measurements for various phase recovery algorithms.
Authors' comments: This paper was presented in part at the 2017 International Symposium
on Information Theory
Masataka Yamaguchi, Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada
In this paper, we address the problem of spatio-temporal person retrieval
from multiple videos using a natural language query, in which we output a tube
(i.e., a sequence of bounding boxes) which encloses the person described by the
query. For this problem, we introduce a novel dataset consisting of videos
containing people annotated with bounding boxes for each second and with five
natural language descriptions. To retrieve the tube of the person described by
a given natural language query, we design a model that combines methods for
spatio-temporal human detection and multimodal retrieval. We conduct
comprehensive experiments to compare a variety of tube and text representations
and multimodal retrieval methods, and present a strong baseline in this task as
well as demonstrate the efficacy of our tube representation and multimodal
feature embedding technique. Finally, we demonstrate the versatility of our
model by applying it to two other important tasks.
Authors' comments: Accepted to ICCV2017