Weinan Zhang
Generative adversarial nets (GANs) have been widely studied during the recent
development of deep learning and unsupervised learning. With an adversarial
training mechanism, GAN manages to train a generative model to fit the
underlying unknown real data distribution under the guidance of the
discriminative model estimating whether a data instance is real or generated.
Such a framework was originally proposed for fitting continuous data
distributions such as images, so it cannot be applied directly to
information retrieval scenarios where the data is mostly discrete,
such as IDs, text and graphs. In this tutorial, we focus on discussing the GAN
techniques and the variants on discrete data fitting in various information
retrieval scenarios. (i) We introduce the fundamentals of GAN framework and its
theoretic properties; (ii) we carefully study the promising solutions to extend
GAN onto discrete data generation; (iii) we introduce IRGAN, the fundamental
GAN framework of fitting single ID data distribution and the direct application
on information retrieval; (iv) we further discuss sequential
discrete data generation tasks, e.g., text generation, and the corresponding
GAN solutions; (v) we present the most recent work on graph/network data
fitting with node embedding techniques by GANs. Meanwhile, we also introduce
the relevant open-source platforms such as IRGAN and Texygen to help the audience
conduct research experiments on GANs in information retrieval. Finally, we
conclude this tutorial with a comprehensive summary and an outlook on
further research directions for GANs in information retrieval.
Authors' comments: 4 pages, SIGIR 2018 tutorial
Ramina Ghods, Andrew S. Lan, Tom Goldstein, Christoph Studer
Phase retrieval refers to the problem of recovering real- or complex-valued
vectors from magnitude measurements. The best-known algorithms for this problem
are iterative in nature and rely on so-called spectral initializers that
provide accurate initialization vectors. We propose a novel class of estimators
suitable for general nonlinear measurement systems, called linear spectral
estimators (LSPEs), which can be used to compute accurate initialization
vectors for phase retrieval problems. The proposed LSPEs not only provide
accurate initialization vectors for noisy phase retrieval systems with
structured or random measurement matrices, but also enable the derivation of
sharp and nonasymptotic mean-squared error bounds. We demonstrate the efficacy
of LSPEs on synthetic and real-world phase retrieval problems, and show that
our estimators significantly outperform existing methods for structured
measurement systems that arise in practice.
Authors' comments: To appear at ICML 2018, extended version with supplementary material
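As a rough illustration of what a spectral initializer does, the sketch below implements the classical variant for real-valued signals: it takes the leading eigenvector of the measurement-weighted matrix Y = (1/m) sum_j y_j a_j a_j^T. This is not the LSPE proposed in the entry above. The Gaussian measurement vectors, the problem sizes, and the function name `spectral_initializer` are assumptions made for the example.

```python
import math
import random


def spectral_initializer(A, y, iters=200, seed=0):
    """Leading eigenvector of Y = (1/m) * sum_j y_j * a_j a_j^T (real case).

    A is a list of m measurement vectors a_j (each of length d) and
    y holds the squared-magnitude measurements y_j = (a_j . x)^2.
    """
    m, d = len(A), len(A[0])
    # Build the d x d weighted matrix Y.
    Y = [[0.0] * d for _ in range(d)]
    for a, yj in zip(A, y):
        for p in range(d):
            for q in range(d):
                Y[p][q] += yj * a[p] * a[q] / m
    # Power iteration: Y is positive semidefinite, so this converges
    # to the eigenvector of the largest eigenvalue.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(Y[p][q] * v[q] for q in range(d)) for p in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


# Toy problem (assumed setup): unit-norm signal, Gaussian measurement vectors.
rng = random.Random(1)
d, m = 4, 2000
x_true = [0.5, -0.5, 0.5, 0.5]
A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
y = [sum(ai * xi for ai, xi in zip(a, x_true)) ** 2 for a in A]
x0 = spectral_initializer(A, y)
# Magnitude measurements lose the global sign, so compare up to sign.
alignment = abs(sum(p * q for p, q in zip(x0, x_true)))
```

An `alignment` close to 1 means the initializer points almost along the true signal, which is the starting point an iterative phase retrieval algorithm would then refine.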
Jiedong Hao, Jing Dong, Wei Wang, Tieniu Tan
There is great demand for automatically regulating the inappropriate appearance
of shocking firearm images in social media and for identifying firearm types in
forensics. Image retrieval techniques have great potential to solve these
problems. To facilitate research in this area, we introduce Firearm 14k, a
large dataset consisting of over 14,000 images in 167 categories. It can be
used for both fine-grained recognition and retrieval of firearm images. Recent
advances in image retrieval are mainly driven by fine-tuning state-of-the-art
convolutional neural networks for the retrieval task. The conventional single
margin contrastive loss, known for its simplicity and good performance, has
been widely used. We find that it performs poorly on the Firearm 14k dataset
for two reasons: (1) the loss contributed by positive and negative image pairs is
unbalanced during training; (2) a huge domain gap exists between this dataset and
ImageNet. We propose to deal with the unbalanced loss by employing a double
margin contrastive loss. We tackle the domain gap issue with a two-stage
training strategy, where we first fine-tune the network for classification, and
then fine-tune it for retrieval. Experimental results show that our approach
outperforms the conventional single margin approach by a large margin (up to
88.5% relative improvement) and even surpasses the strong triplet-loss-based
approach.
Authors' comments: 6 pages, 5 figures, accepted by ICPR 2018. Code is available at
https://github.com/jdhao/deep_firearm. Dataset is available at
http://forensics.idealtest.org/Firearm14k/
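To make the difference between the two losses in the entry above concrete, here is a minimal sketch of a single margin and a double margin contrastive loss for one image pair. The exact form used by the authors is not given in the abstract. The squared-hinge form, the margin values `m_pos` and `m_neg`, and the function names are assumptions.

```python
def single_margin_contrastive_loss(dist, is_positive, margin=1.5):
    """Conventional form: positive pairs are always pulled together."""
    if is_positive:
        return dist ** 2
    # Negative pairs are pushed apart until they exceed the margin.
    return max(0.0, margin - dist) ** 2


def double_margin_contrastive_loss(dist, is_positive, m_pos=0.5, m_neg=1.5):
    """Assumed double margin form with a second margin for positive pairs."""
    if is_positive:
        # Matching pairs are penalised only when farther apart than m_pos.
        return max(0.0, dist - m_pos) ** 2
    # Non-matching pairs are penalised only when closer than m_neg.
    return max(0.0, m_neg - dist) ** 2


# A positive pair that is already close contributes nothing under the
# double margin loss, but still contributes under the single margin loss.
close_positive_single = single_margin_contrastive_loss(0.3, True)
close_positive_double = double_margin_contrastive_loss(0.3, True)
```

With a positive margin, pairs that are already close stop contributing, which is one way to rebalance the loss between positive and negative pairs.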
Tiziano Zingales, Ingo Peter Waldmann
Atmospheric retrievals on exoplanets usually involve computationally
intensive Bayesian sampling methods. Large parameter spaces and increasingly
complex atmospheric models create a computational bottleneck forcing a
trade-off between statistical sampling accuracy and model complexity. This is
especially true for upcoming JWST and ARIEL observations. We introduce ExoGAN,
the Exoplanet Generative Adversarial Network, a new deep learning algorithm
able to recognise molecular features, atmospheric trace-gas abundances and
planetary parameters using unsupervised learning. Once trained, ExoGAN is
widely applicable to a large number of instruments and planetary types. The
ExoGAN retrievals constitute a significant speed improvement over traditional
retrievals and can be used either as a final atmospheric analysis or to provide
prior constraints for subsequent retrievals.
Authors' comments: 19 pages, 17 figures, 7 tables
Hsuan-Yin Lin, Siddhartha Kumar, Eirik Rosnes, Alexandre Graell i Amat
We consider private information retrieval (PIR) for distributed storage
systems (DSSs) with noncolluding nodes where data is stored using a linear code
that is not maximum distance separable (MDS). It was recently shown that if data is
stored using a particular class of non-MDS linear codes, the MDS-PIR capacity,
i.e., the maximum possible PIR rate for MDS-coded DSSs, can be achieved. For
this class of codes, we prove that the PIR capacity is indeed equal to the
MDS-PIR capacity, giving the first family of non-MDS codes for which the PIR
capacity is known. For other codes, we provide asymmetric PIR protocols that
achieve a strictly larger PIR rate compared to existing symmetric PIR
protocols.
Authors' comments: To be presented at 2018 IEEE Information Theory Workshop (ITW'18).
See arXiv:1808.09018 for its extended version
Seyed Pooya Shariatpanahi, Mahdi Jafari Siavoshani, Mohammad Ali Maddah-Ali
We consider the problem of private information retrieval (PIR) where a single user with private side information aims to retrieve multiple files from a library stored (uncoded) at a number of servers. We assume the side information at the user includes a subset of files stored privately (i.e., the servers do not know the indices of these files). In addition, we require that the identities of the requests and of the side information at the user are not revealed to any of the servers. The problem involves finding the minimum load to be transmitted from the servers to the user such that the requested files can be decoded with the help of the received data and the side information. By providing matching lower and upper bounds for certain regimes, we characterize the minimum load imposed on all the servers (i.e., the capacity of this PIR problem). Our result shows that the capacity is the same as the capacity of a multi-message PIR problem without private side information, but with a library of reduced size. The effective size of the library is equal to the original library size minus the size of the side information.
Keet Sugathadasa, Buddhi Ayesha, Nisansa de Silva, Amal Shehan Perera, Vindula Jayawardana, Dimuthu Lakmal, Madhavi Perera
Domain-specific information retrieval has been a prominent and ongoing research topic in the field of natural language processing. Many researchers have incorporated different techniques to overcome technical and domain specificity and to provide a mature model for various domains of interest. The main bottleneck in these studies is the heavy reliance on domain experts, which makes the entire process time-consuming and cumbersome. In this study, we develop three novel models for the legal domain and compare them against a gold standard generated from the online repositories provided. The three models use vector space representations of the legal domain: document vectors are generated by two different mechanisms and by an ensemble of the two. This study covers the research carried out on representing legal case documents in different vector spaces while incorporating semantic word measures and natural language processing techniques. The ensemble model built in this study shows a significantly higher accuracy level, which indicates the need to incorporate domain-specific semantic similarity measures into the information retrieval process. This study also shows the impact of varying the distribution of the word similarity measures against varying document vector dimensions, which can lead to improvements in legal information retrieval.
Xinyu Hua, Lu Wang
High quality arguments are essential elements for human reasoning and
decision-making processes. However, effective argument construction is a
challenging task for both humans and machines. In this work, we study a novel
task on automatically generating arguments of a different stance for a given
statement. We propose an encoder-decoder style neural network-based argument
generation model enriched with externally retrieved evidence from Wikipedia.
Our model first generates a set of talking point phrases as intermediate
representation, followed by a separate decoder producing the final argument
based on both input and the keyphrases. Experiments on a large-scale dataset
collected from Reddit show that our model constructs arguments with more
topic-relevant content than a popular sequence-to-sequence generation model
according to both automatic evaluation and human assessments.
Authors' comments: Accepted to ACL 2018
Xirong Li, Chaoxi Xu, Xiaoxu Wang, Weiyu Lan, Zhengxiong Jia, Gang Yang, Jieping Xu
This paper contributes to cross-lingual image annotation and retrieval in
terms of data and baseline methods. We propose COCO-CN, a novel dataset
enriching MS-COCO with manually written Chinese sentences and tags. For more
effective annotation acquisition, we develop a recommendation-assisted
collective annotation system, automatically providing an annotator with several
tags and sentences deemed to be relevant with respect to the pictorial content.
Having 20,342 images annotated with 27,218 Chinese sentences and 70,993 tags,
COCO-CN is currently the largest Chinese-English dataset that provides a
unified and challenging platform for cross-lingual image tagging, captioning
and retrieval. We develop conceptually simple yet effective methods per task
for learning from cross-lingual resources. Extensive experiments on the three
tasks justify the viability of the proposed dataset and methods. Data and code
are publicly available at https://github.com/li-xirong/coco-cn
Authors' comments: accepted for publication as a regular paper in the IEEE Transactions
on Multimedia
Meng Huang, Zhiqiang Xu
In this paper, we consider the generalized phase retrieval from affine
measurements. This problem aims to recover signals ${\mathbf x} \in {\mathbb
F}^d$ from the affine measurements $y_j=\|M_j^*{\mathbf x} +{\mathbf b}_j\|^2,\;
j=1,\ldots,m,$ where $M_j \in {\mathbb F}^{d\times r}, {\mathbf b}_j\in
{\mathbb F}^{r}, {\mathbb F}\in \{{\mathbb R},{\mathbb C}\}$; we call this problem
{\em generalized affine phase retrieval}. We develop a framework for
generalized affine phase retrieval by presenting necessary and sufficient
conditions for $\{(M_j,{\mathbf b}_j)\}_{j=1}^m$ having generalized affine
phase retrieval property. We also establish results on minimal measurement
number for generalized affine phase retrieval. In particular, we show that if
$\{(M_j,{\mathbf b}_j)\}_{j=1}^m \subset {\mathbb F}^{d\times r}\times {\mathbb
F}^{r}$ has the generalized affine phase retrieval property, then $m\geq
d+\lfloor d/r\rfloor$ for ${\mathbb F}={\mathbb R}$ ($m\geq 2d+\lfloor d/r\rfloor$ for
${\mathbb F}={\mathbb C}$). We also show that the bound is tight provided
$r\mid d$. These results imply that one can reduce the measurement number by
raising $r$, i.e. the rank of $M_j$. This highlights a notable difference
between generalized affine phase retrieval and generalized phase retrieval.
Furthermore, using tools of algebraic geometry, we show that $m\geq 2d$ (resp.
$m\geq 4d-1$) generic measurements ${\mathcal A}=\{(M_j,{\mathbf b}_j)\}_{j=1}^m$ have
the generalized phase retrieval property for ${\mathbb F}={\mathbb R}$ (resp.
${\mathbb F}={\mathbb C}$).
Authors' comments: 20 pages
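The measurement model and the lower bound from the entry above can be checked numerically with a small sketch. It covers the real case only, where M_j^* is the transpose of M_j. The function names and the toy values are assumptions made for the example.

```python
def affine_measurement(M, b, x):
    """y = ||M^T x + b||^2 for real M (d x r), b (length r), x (length d)."""
    d, r = len(M), len(M[0])
    z = [sum(M[i][k] * x[i] for i in range(d)) + b[k] for k in range(r)]
    return sum(c * c for c in z)


def measurement_lower_bound(d, r, field="R"):
    """Necessary measurement count stated in the abstract:
    d + floor(d/r) over the reals, 2d + floor(d/r) over the complex numbers."""
    base = d if field == "R" else 2 * d
    return base + d // r


# One measurement with d = 3, r = 2:
# M^T x = (4, 5), adding b = (0, 1) gives (4, 6), so y = 16 + 36 = 52.
y_example = affine_measurement([[1, 0], [0, 1], [1, 1]], [0, 1], [1, 2, 3])

# Raising the rank r lowers the bound for fixed d = 6: 12, 9, 8.
bounds_real = [measurement_lower_bound(6, r) for r in (1, 2, 3)]
```

The shrinking values in `bounds_real` reflect the observation in the abstract that the measurement number can be reduced by raising r.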
Christy Y. Li, Xiaodan Liang, Zhiting Hu, Eric P. Xing
Generating long and coherent reports to describe medical images poses the challenge of bridging visual patterns and informative human linguistic descriptions. We propose a novel Hybrid Retrieval-Generation Reinforced Agent (HRGR-Agent) which reconciles traditional retrieval-based approaches populated with human prior knowledge, with modern learning-based approaches to achieve structured, robust, and diverse report generation. HRGR-Agent employs a hierarchical decision-making procedure. For each sentence, a high-level retrieval policy module chooses to either retrieve a template sentence from an off-the-shelf template database, or invoke a low-level generation module to generate a new sentence. HRGR-Agent is updated via reinforcement learning, guided by sentence-level and word-level rewards. Experiments show that our approach achieves state-of-the-art results on two medical report datasets, generating well-balanced structured sentences with robust coverage of heterogeneous medical report contents. In addition, our model achieves the highest detection accuracy of medical terminologies, and improved human evaluation performance.
Chatdanai Dorkson, Siaw-Lynn Ng
Private Information Retrieval (PIR) schemes allow a user to retrieve a record from a server without revealing any information about which record is being downloaded. In this paper, we consider PIR schemes where the database is stored using Minimum Storage Regenerating (MSR) codes, a class of optimal regenerating codes that provide efficient repair when a node failure occurs in the system. We analyse the relationship between the costs of privacy, storage and repair, and also construct an explicit PIR scheme that uses the MSR codes from [3] to achieve the optimal curve of the trade-off.
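For readers unfamiliar with PIR, the basic idea can be sketched with the classical two-server scheme over a replicated database. This is not the MSR-coded scheme of the entry above. It assumes two noncolluding servers that each hold a full copy of the database, records stored as non-negative integers, and hypothetical function names.

```python
import secrets


def make_queries(n, i):
    """Return the index sets sent to server 1 and server 2 for wanted index i."""
    # Server 1 receives a uniformly random subset of {0, ..., n-1}.
    s1 = {k for k in range(n) if secrets.randbits(1)}
    # Server 2 receives the same subset with the membership of i flipped.
    s2 = s1 ^ {i}
    return s1, s2


def answer(database, query):
    """Server side: XOR of all requested records."""
    acc = 0
    for k in query:
        acc ^= database[k]
    return acc


def retrieve(database, i):
    """User side: every record except record i cancels in the XOR."""
    s1, s2 = make_queries(len(database), i)
    return answer(database, s1) ^ answer(database, s2)


database = [5, 17, 42, 8]
recovered = [retrieve(database, i) for i in range(len(database))]
```

Each server on its own sees a uniformly random subset of indices, so neither learns which record was wanted. The price is that each server answers with one record-sized value, which is the kind of download cost that PIR rate and capacity results quantify.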
Yixing Fan, Jiafeng Guo, Yanyan Lan, Jun Xu, Chengxiang Zhai, Xueqi Cheng
Assessing relevance between a query and a document is challenging in ad-hoc
retrieval because relevance patterns are diverse: a document can be relevant to a
query as a whole or only in part, as long as it provides sufficient information for
the user's need. Such diverse relevance patterns require an ideal retrieval model
to assess relevance adaptively at the right granularity.
Unfortunately, most existing retrieval models compute relevance at a single
granularity, either document-wide or passage-level, or use a fixed combination
strategy, restricting their ability to capture diverse relevance patterns. In
this work, we propose a data-driven method to allow relevance signals at
different granularities to compete with each other for final relevance
assessment. Specifically, we propose a HIerarchical Neural maTching model
(HiNT) which consists of two stacked components, namely a local matching layer
and a global decision layer. The local matching layer focuses on producing a set
of local relevance signals by modeling the semantic matching between a query
and each passage of a document. The global decision layer accumulates local
signals into different granularities and allows them to compete with each other
to decide the final relevance score. Experimental results demonstrate that our
HiNT model outperforms existing state-of-the-art retrieval models significantly
on benchmark ad-hoc retrieval datasets.
Authors' comments: The 41st International ACM SIGIR Conference on Research &
Development in Information Retrieval, SIGIR'18
Georgios Balikas, Charlotte Laclau, Ievgen Redko, Massih-Reza Amini
Many information retrieval algorithms rely on the notion of a good distance
that allows objects of different nature to be compared efficiently. Recently, a new
promising metric called Word Mover's Distance was proposed to measure the
divergence between text passages. In this paper, we demonstrate that this
metric can be extended to incorporate term-weighting schemes and provide more
accurate and computationally efficient matching between documents using
entropic regularization. We evaluate the benefits of both extensions in the
task of cross-lingual document retrieval (CLDR). Our experimental results on
eight CLDR problems suggest that the proposed methods achieve remarkable
improvements in terms of Mean Reciprocal Rank compared to several baselines.
Authors' comments: ECIR 2018
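The entropic regularization mentioned in the entry above is commonly computed with Sinkhorn iterations. The sketch below shows that standard algorithm on a generic cost matrix and is not the authors' implementation. In a Word Mover's Distance setting, `a` and `b` would hold normalised term weights (for example from a term-weighting scheme) and `C` the pairwise distances between word embeddings. The function name and the toy numbers are assumptions.

```python
import math


def sinkhorn_cost(a, b, C, reg=0.1, iters=200):
    """Entropy-regularised transport cost between weight vectors a and b.

    a and b are non-negative weights that each sum to 1, and C[i][j] is the
    cost of moving mass from item i of the first text to item j of the second.
    """
    n, m = len(a), len(b)
    # Gibbs kernel; very large C/reg ratios would underflow to zero.
    K = [[math.exp(-C[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    P = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    cost = sum(P[i][j] * C[i][j] for i in range(n) for j in range(m))
    return cost, P


# Identical two-word texts: almost all mass stays in place, so the cost is near 0.
same_cost, _ = sinkhorn_cost([0.5, 0.5], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]])

# One word spread over two words at distances 1 and 3: the only feasible
# plan moves half of the mass to each, giving 0.5 * 1 + 0.5 * 3 = 2.
split_cost, split_plan = sinkhorn_cost([1.0], [0.5, 0.5], [[1.0, 3.0]])
```

Each iteration costs only matrix-vector products, which is the source of the computational efficiency; smaller values of `reg` bring the result closer to the unregularised distance but need more care with numerical underflow.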
Lokesh Boominathan, Mayug Maniparambil, Honey Gupta, Rahul Baburajan, Kaushik Mitra
Fourier Ptychography is a recently proposed imaging technique that yields
high-resolution images by computationally transcending the diffraction blur of
an optical system. At the crux of this method is the phase retrieval algorithm,
which is used for computationally stitching together low-resolution images
taken under varying illumination angles of a coherent light source. However,
the traditional iterative phase retrieval technique relies heavily on the
initialization and also needs a good amount of overlap in the Fourier domain for
the successively captured low-resolution images, thus increasing the
acquisition time and data. We show that an auto-encoder based architecture can
be adaptively trained for phase retrieval under both low overlap, where
traditional techniques completely fail, and at higher levels of overlap. For
the low overlap case we show that a supervised deep learning technique using an
autoencoder generator is a good choice for solving the Fourier ptychography
problem. For the high overlap case, we show that optimizing the generator
for reducing the forward model error is an appropriate choice. Using
simulations for the challenging case of uncorrelated phase and amplitude, we
show that our method outperforms many of the previously proposed Fourier
ptychography phase retrieval techniques.
Authors' comments: Supplementary material attached after Reference section
Alessia Angeli, Massimo Ferri, Ivan Tomba
Persistence diagrams, combining geometry and topology for an effective shape
description used in pattern recognition, have already proven to be an effective
tool for shape representation with respect to a certain filtering function.
Comparing the persistence diagram of a query with those of a database allows
automatic classification or retrieval, but unfortunately, the standard method
for comparing persistence diagrams, the bottleneck distance, has a high
computational cost. A possible algebraic solution to this problem is to switch
to comparisons between the complex polynomials whose roots are the cornerpoints
of the persistence diagrams. This strategy significantly reduces the computational
cost, thereby making persistent homology based
applications suitable for large scale databases. The definition of new
distances in the polynomial framework poses some interesting problems, both
theoretical and practical in nature. In this paper, these questions are
addressed by considering possible transformations of the half-plane where the
persistence diagrams lie onto the complex plane, and by considering a certain
re-normalisation of the symmetric functions associated to the polynomial roots of
the resulting transformed polynomial. The encouraging numerical results,
obtained in a dermatology application test, suggest that the proposed method
may even improve the achievements obtained by the standard methods using
persistence diagrams and the bottleneck distance.
Authors' comments: 14 pages
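The polynomial idea in the entry above can be illustrated with a short sketch: each cornerpoint (u, v) becomes a complex root, and two diagrams are compared through the coefficients of their monic polynomials. The identity map (u, v) -> u + iv, the plain sum of coefficient differences, and the function names are assumptions. The transformations and re-normalisation studied by the authors are not reproduced here.

```python
def diagram_to_polynomial(cornerpoints):
    """Coefficients (highest degree first) of the monic polynomial whose
    roots are the cornerpoints, each mapped to the complex number u + iv."""
    coeffs = [1 + 0j]
    for u, v in cornerpoints:
        root = complex(u, v)
        # Multiply the current polynomial by (z - root).
        nxt = coeffs + [0j]
        for k, c in enumerate(coeffs):
            nxt[k + 1] -= root * c
        coeffs = nxt
    return coeffs


def coefficient_distance(diagram_a, diagram_b):
    """Sum of absolute coefficient differences (equal diagram sizes assumed)."""
    pa = diagram_to_polynomial(diagram_a)
    pb = diagram_to_polynomial(diagram_b)
    if len(pa) != len(pb):
        raise ValueError("this sketch only compares diagrams of equal size")
    return sum(abs(x - y) for x, y in zip(pa, pb))


# (z - (1+2i)) (z - (3+4i)) = z^2 - (4+6i) z + (-5+10i)
poly = diagram_to_polynomial([(1, 2), (3, 4)])
# Coefficients are symmetric functions of the roots, so the order of the
# cornerpoints does not matter.
reordered = coefficient_distance([(1, 2), (3, 4)], [(3, 4), (1, 2)])
```

Comparing coefficient vectors needs no matching between cornerpoints, which is where the saving over the bottleneck distance comes from.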
Nils Murrugarra-Llerena, Adriana Kovashka
How would you search for a unique, fashionable shoe that a friend wore and
you want to buy, but you didn't take a picture? Existing approaches propose
interactive image search as a promising avenue. However, they either entrust the
user with taking the initiative to provide informative feedback, or give all
control to the system which determines informative questions to ask. Instead,
we propose a mixed-initiative framework where both the user and system can be
active participants, depending on whose initiative will be more beneficial for
obtaining high-quality search results. We develop a reinforcement learning
approach which dynamically decides which of three interaction opportunities to
give to the user: drawing a sketch, providing free-form attribute feedback, or
answering attribute-based questions. By allowing these three options, our
system optimizes both the informativeness and exploration capabilities, allowing
faster image retrieval. We outperform three baselines on three datasets across
extensive experimental settings.
Authors' comments: In submission to BMVC 2018
Sonakshi Mathur, Mallika Chaudhary, Hemant Verma, Murari Mandal, S. K. Vipparthi, Subrahmanyam Murala
A novel color feature descriptor, Multichannel Distributed Local Pattern
(MDLP), is proposed in this manuscript. The MDLP combines the salient features
of both local binary and local mesh patterns in the neighborhood. The
multi-distance information computed by the MDLP aids in robust extraction of
the texture arrangement. Further, MDLP features are extracted for each color
channel of an image. The retrieval performance of the MDLP is evaluated on
three benchmark datasets for CBIR, namely Corel-5000, Corel-10000 and MIT-Color
Vistex. The proposed technique attains substantial improvement
compared to other state-of-the-art feature descriptors in terms of
evaluation measures such as ARP and ARR on the respective databases.
Authors' comments: Accepted in INDICON-2017
Robert Litschko, Goran Glavaš, Simone Paolo Ponzetto, Ivan Vulić
We propose a fully unsupervised framework for ad-hoc cross-lingual
information retrieval (CLIR) which requires no bilingual data at all. The
framework leverages shared cross-lingual word embedding spaces in which terms,
queries, and documents can be represented, irrespective of their actual
language. The shared embedding spaces are induced solely on the basis of
monolingual corpora in two languages through an iterative process based on
adversarial neural networks. Our experiments on the standard CLEF CLIR
collections for three language pairs of varying degrees of language similarity
(English-Dutch/Italian/Finnish) demonstrate the usefulness of the proposed
fully unsupervised approach. Our CLIR models with unsupervised cross-lingual
embeddings outperform baselines that utilize cross-lingual embeddings induced
relying on word-level and document-level alignments. We then demonstrate that
further improvements can be achieved by unsupervised ensemble CLIR models. We
believe that the proposed framework is the first step towards development of
effective CLIR models for language pairs and domains where parallel data are
scarce or non-existent.
Authors' comments: accepted at SIGIR'18 (preprint)
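Once a shared cross-lingual embedding space exists, retrieval in the entry above reduces to comparing vectors. The sketch below ranks documents by cosine similarity between averaged word vectors. The averaging step, the two-dimensional toy embeddings, and the function names are assumptions for illustration, and inducing the shared space itself is outside the scope of the example.

```python
import math


def embed_text(tokens, embeddings):
    """Average the vectors of the known tokens (an assumed aggregation)."""
    dim = len(next(iter(embeddings.values())))
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(dim)]


def cosine(p, q):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    norm_p = math.sqrt(sum(c * c for c in p))
    norm_q = math.sqrt(sum(c * c for c in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(p, q)) / (norm_p * norm_q)


def rank_documents(query_tokens, documents, embeddings):
    """Return document indices sorted from most to least similar."""
    q = embed_text(query_tokens, embeddings)
    scores = [(cosine(q, embed_text(doc, embeddings)), idx)
              for idx, doc in enumerate(documents)]
    return [idx for _, idx in sorted(scores, reverse=True)]


# Hypothetical shared English-Dutch space: translations lie close together.
shared_space = {
    "dog": [1.0, 0.0], "hond": [0.9, 0.1],
    "cat": [0.0, 1.0], "kat": [0.1, 0.9],
}
ranking = rank_documents(["dog"], [["kat"], ["hond"]], shared_space)
```

Here the English query "dog" ranks the Dutch document containing "hond" first without any translation step.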
Diyah Puspitaningrum
When collaborative teams work across locations, the (huge) federated data
sometimes come from heterogeneous cloud vendors. The concern is not only data
privacy but also how those federated data can be queried directly from the
cloud in a fast and secure way. One previous solution offered a hybrid
cloud combining public and trusted private clouds. Another used
encryption on the MapReduce framework. The challenge, however, is that we are
working on heterogeneous clouds. In this paper, we present a novel technique for
privacy-preserving querying.
Since we take execution time into account, our basic idea is to use a data
mining model built by partitioning the federated databases in order to reduce
search and query time. Using a model of the database means that we use only the
summary, i.e., the most characteristic patterns, of the database. Modeling is
Privacy-Preserving Stage I, since modeling symbolizes the data. We
implement encryption on the database as Privacy-Preserving Stage II. Our
system, called "cSELENE" (short for "cloud SELENE"), is designed to handle
federated data on heterogeneous clouds (AWS, Microsoft Azure, and Google Cloud
Platform) with the MapReduce technique.
In this paper we discuss the privacy-preserving system and threat model, the
format of the federated data, parallel programming (GPU programming and
shared-memory systems), the parallel and secure algorithm for the data mining model
in a distributed cloud, the cloud infrastructure/architecture, and the UIX design
of the cSELENE system. Other issues, such as the incremental method and the secure
design of the cloud architecture (cross-platform virtual machine design),
remain open for discussion. Our experiments should demonstrate the validity and
practicality of the proposed high performance computing scheme.
Authors' comments: The First International Workshop on Learning From Limited or Noisy
Data for Information Retrieval (LND4IR), Ann Arbor, Michigan, USA, July 2018
(SIGIR 2018), 6 pages