Minghua Zhang, Yunfang Wu
Question retrieval is a crucial subtask for community question answering. Previous research focus on supervised models which depend heavily on training data and manual feature engineering. In this paper, we propose a novel unsupervised framework, namely reduced attentive matching network (RAMN), to compute semantic matching between two questions. Our RAMN integrates together the deep semantic representations, the shallow lexical mismatching information and the initial rank produced by an external search engine. For the first time, we propose attention autoencoders to generate semantic representations of questions. In addition, we employ lexical mismatching to capture surface matching between two questions, which is derived from the importance of each word in a question. We conduct experiments on the open CQA datasets of SemEval-2016 and SemEval-2017. The experimental results show that our unsupervised model obtains comparable performance with the state-of-the-art supervised methods in SemEval-2016 Task 3, and outperforms the best system in SemEval-2017 Task 3 by a wide margin.
Shiv Ram Dubey, Snehasis Mukherjee
The local descriptors have gained wide range of attention due to their
enhanced discriminative abilities. It has been proved that the consideration of
multi-scale local neighborhood improves the performance of the descriptor,
though at the cost of increased dimension. This paper proposes a novel method
to construct a local descriptor using multi-scale neighborhood by finding the
local directional order among the intensity values at different scales in a
particular direction. Local directional order is the multi-radius relationship
factor in a particular direction. The proposed local directional order pattern
(LDOP) for a particular pixel is computed by finding the relationship between
the center pixel and local directional order indexes. It is required to
transform the center value into the range of neighboring orders. Finally, the
histogram of LDOP is computed over whole image to construct the descriptor. In
contrast to the state-of-the-art descriptors, the dimension of the proposed
descriptor does not depend upon the number of neighbors involved to compute the
order; it only depends upon the number of directions. The introduced descriptor
is evaluated over the image retrieval framework and compared with the
state-of-the-art descriptors over challenging face databases such as PaSC, LFW,
PubFig, FERET, AR, AT&T, and ExtendedYale. The experimental results confirm the
superiority and robustness of the LDOP descriptor.
Authors' comments: Published in Multimedia Tools and Applications, Springer
Christopher A. Metzler, Philip Schniter, Ashok Veeraraghavan, Richard G. Baraniuk
Phase retrieval algorithms have become an important component in many modern computational imaging systems. For instance, in the context of ptychography and speckle correlation imaging, they enable imaging past the diffraction limit and through scattering media, respectively. Unfortunately, traditional phase retrieval algorithms struggle in the presence of noise. Progress has been made recently on more robust algorithms using signal priors, but at the expense of limiting the range of supported measurement models (e.g., to Gaussian or coded diffraction patterns). In this work we leverage the regularization-by-denoising framework and a convolutional neural network denoiser to create prDeep, a new phase retrieval algorithm that is both robust and broadly applicable. We test and validate prDeep in simulation to demonstrate that it is robust to noise and can handle a variety of system models. A MatConvNet implementation of prDeep is available at https://github.com/ricedsp/prDeep.
Sarah A. Obead, Jörg Kliewer
We study the problem of private function retrieval (PFR) in a distributed
storage system. In PFR the user wishes to retrieve a linear combination of $M$
messages stored in non-colluding $(N,K)$ MDS coded databases while revealing no
information about the coefficients of the intended linear combination to any of
the individual databases. We present an achievable scheme for MDS coded PFR
with a rate that matches the capacity for coded private information retrieval
derived recently, $R=(1+R_c+R_c^2+\dots+R_c^{M-1})^{-1}=\frac{1-R_c}{1-R_c^M}$,
where $R_c=\frac{K}{N}$ is the rate of the MDS code. This achievable rate is
tight in some special cases.
Authors' comments: 5 pages, 1 table, submitted for publication
Akshay Mehra, Jihun Hamm, Mikhail Belkin
An interactive image retrieval system learns which images in the database
belong to a user's query concept, by analyzing the example images and feedback
provided by the user. The challenge is to retrieve the relevant images with
minimal user interaction. In this work, we propose to solve this problem by
posing it as a binary classification task of classifying all images in the
database as being relevant or irrelevant to the user's query concept. Our
method combines active learning with graph-based semi-supervised learning
(GSSL) to tackle this problem. Active learning reduces the number of user
interactions by querying the labels of the most informative points and GSSL
allows to use abundant unlabeled data along with the limited labeled data
provided by the user. To efficiently find the most informative point, we use an
uncertainty sampling based method that queries the label of the point nearest
to the decision boundary of the classifier. We estimate this decision boundary
using our heuristic of adaptive threshold. To utilize huge volumes of unlabeled
data we use an efficient approximation based method that reduces the complexity
of GSSL from $O(n^3)$ to $O(n)$, making GSSL scalable. We make the classifier
robust to the diversity and noisy labels associated with images in large
databases by incorporating information from multiple modalities such as visual
information extracted from deep learning based models and semantic information
extracted from the WordNet. High F1 scores within few relevance feedback rounds
in our experiments with concepts defined on AnimalWithAttributes and Imagenet
(1.2 million images) datasets indicate the effectiveness and scalability of our
approach.
Authors' comments: 15 Pages, Submitted to KDD 2018
Mucahid Kutlu, Vivek Khetan, Matthew Lease
Because researchers typically do not have the time or space to present more than a few evaluation metrics in any published study, it can be difficult to assess relative effectiveness of prior methods for unreported metrics when baselining a new method or conducting a systematic meta-review. While sharing of study data would help alleviate this, recent attempts to encourage consistent sharing have been largely unsuccessful. Instead, we propose to enable relative comparisons with prior work across arbitrary metrics by predicting unreported metrics given one or more reported metrics. In addition, we further investigate prediction of high-cost evaluation measures using low-cost measures as a potential strategy for reducing evaluation cost. We begin by assessing the correlation between 23 IR metrics using 8 TREC test collections. Measuring prediction error wrt. R-square and Kendall's tau, we show that accurate prediction of MAP, P@10, and RBP can be achieved using only 2-3 other metrics. With regard to lowering evaluation cost, we show that RBP(p=0.95) can be predicted with high accuracy using measures with only evaluation depth of 30. Taken together, our findings provide a valuable proof-of-concept which we expect to spur follow-on work by others in proposing more sophisticated models for metric prediction.
Spencer Cappallo, Stacey Svetlichnaya, Pierre Garrigues, Thomas Mensink, Cees G. M. Snoek
Over the past decade, emoji have emerged as a new and widespread form of digital communication, spanning diverse social networks and spoken languages. We propose to treat these ideograms as a new modality in their own right, distinct in their semantic structure from both the text in which they are often embedded as well as the images which they resemble. As a new modality, emoji present rich novel possibilities for representation and interaction. In this paper, we explore the challenges that arise naturally from considering the emoji modality through the lens of multimedia research. Specifically, the ways in which emoji can be related to other common modalities such as text and images. To do so, we first present a large scale dataset of real-world emoji usage collected from Twitter. This dataset contains examples of both text-emoji and image-emoji relationships. We present baseline results on the challenge of predicting emoji from both text and images, using state-of-the-art neural networks. Further, we offer a first consideration into the problem of how to account for new, unseen emoji - a relevant issue as the emoji vocabulary continues to expand on a yearly basis. Finally, we present results for multimedia retrieval using emoji as queries.
Steven Hickson, Anelia Angelova, Irfan Essa, Rahul Sukthankar
We consider the problem of retrieving objects from image data and learning to
classify them into meaningful semantic categories with minimal supervision. To
that end, we propose a fully differentiable unsupervised deep clustering
approach to learn semantic classes in an end-to-end fashion without individual
class labeling using only unlabeled object proposals. The key contributions of
our work are 1) a kmeans clustering objective where the clusters are learned as
parameters of the network and are represented as memory units, and 2)
simultaneously building a feature representation, or embedding, while learning
to cluster it. This approach shows promising results on two popular computer
vision datasets: on CIFAR10 for clustering objects, and on the more complex and
challenging Cityscapes dataset for semantically discovering classes which
visually correspond to cars, people, and bicycles. Currently, the only
supervision provided is segmentation objectness masks, but this method can be
extended to use an unsupervised objectness-based object generation mechanism
which will make the approach completely unsupervised.
Authors' comments: Camera-ready version for NIPS 2017 workshop Learning with Limited
Labeled Data
Pedro Saleiro
Online Reputation Monitoring (ORM) is concerned with the use of computational
tools to measure the reputation of entities online, such as politicians or
companies. In practice, current ORM methods are constrained to the generation
of data analytics reports, which aggregate statistics of popularity and
sentiment on social media. We argue that this format is too restrictive as end
users often like to have the flexibility to search for entity-centric
information that is not available in predefined charts. As such, we propose the
inclusion of entity retrieval capabilities as a first step towards the
extension of current ORM capabilities. However, an entity's reputation is also
influenced by the entity's relationships with other entities. Therefore, we
address the problem of Entity-Relationship (E-R) retrieval in which the goal is
to search for multiple connected entities. This is a challenging problem which
traditional entity search systems cannot cope with. Besides E-R retrieval we
also believe ORM would benefit of text-based entity-centric prediction
capabilities, such as predicting entity popularity on social media based on
news events or the outcome of political surveys. However, none of these tasks
can provide useful results if there is no effective entity disambiguation and
sentiment analysis tailored to the context of ORM. Consequently, this thesis
address two computational problems in Online Reputation Monitoring: Entity
Retrieval and Text Mining. We researched and developed methods to extract,
retrieve and predict entity-centric information spread across the Web.
Authors' comments: PhD Thesis
Joris Walraevens, Dieter Claeys, Tuan Phung-Duc
We calculate asymptotics of the distribution of the number of customers in
orbit in a two-class priority retrial $M/G/1$-type queueing model. In this
model, priority customers wait in line while non-priority customers join an
orbit and retry later. Although the generating function and moments of the
number of customers in orbit has been analyzed before, asymptotics of the
distribution have not been thoroughly investigated. We use singularity analysis
of the probability generating function to do just that. Our results show that
different regimes exist for these asymptotics in case of light-tailed service
times: in what we call the `priority regime', the tail asymptotics have the
same decay ($\sim cn^{-3/2}R^{-n}$) as in the priority non-retrial queue and
the retrial rate only influences the constant $c$. In the `retrial regime', the
retrial rate also influences the sub-exponential factor of the asymptotics. In
this regime, asymptotics are very similar to asymptotics in retrial queues
without (priority) waiting line. Finally, we also analyze the case that the
service time distribution is power law (with or without exponential cut-off)
using the same technique.
Authors' comments: Submitted for review
Fariborz Salehi, Ehsan Abbasi, Babak Hassibi
Recovering an unknown complex signal from the magnitude of linear combinations of the signal is referred to as phase retrieval. We present an exact performance analysis of a recently proposed convex-optimization-formulation for this problem, known as PhaseMax. Standard convex-relaxation-based methods in phase retrieval resort to the idea of "lifting" which makes them computationally inefficient, since the number of unknowns is effectively squared. In contrast, PhaseMax is a novel convex relaxation that does not increase the number of unknowns. Instead it relies on an initial estimate of the true signal which must be externally provided. In this paper, we investigate the required number of measurements for exact recovery of the signal in the large system limit and when the linear measurement matrix is random with iid standard normal entries. If $n$ denotes the dimension of the unknown complex signal and $m$ the number of phaseless measurements, then in the large system limit, $\frac{m}{n} > \frac{4}{\cos^2(\theta)}$ measurements is necessary and sufficient to recover the signal with high probability, where $\theta$ is the angle between the initial estimate and the true signal. Our result indicates a sharp phase transition in the asymptotic regime which matches the empirical result in numerical simulations.
Karim Banawan, Sennur Ulukus
We consider the problem of private information retrieval through wiretap
channel II (PIR-WTC-II). In PIR-WTC-II, a user wants to retrieve a single
message (file) privately out of $M$ messages, which are stored in $N$
replicated and non-communicating databases. An external eavesdropper observes a
fraction $\mu_n$ (of its choice) of the traffic exchanged between the $n$th
database and the user. In addition to the privacy constraint, the databases
should encode the returned answer strings such that the eavesdropper learns
absolutely nothing about the \emph{contents} of the databases. We aim at
characterizing the capacity of the PIR-WTC-II under the combined privacy and
security constraints. We obtain a general upper bound for the problem in the
form of a max-min optimization problem, which extends the converse proof of the
PIR problem under asymmetric traffic constraints. We propose an achievability
scheme that satisfies the security constraint by encoding a secret key, which
is generated securely at each database, into an artificial noise vector using
an MDS code. The user and the databases operate at one of the corner points of
the achievable scheme for the PIR under asymmetric traffic constraints such
that the retrieval rate is maximized under the imposed security constraint. The
upper bound and the lower bound match for the case of $M=2$ and $M=3$ messages,
for any $N$, and any $\boldsymbol{\mu}=(\mu_1, \cdots, \mu_N)$.
Authors' comments: Submitted to IEEE Transactions on Information Theory, January 2018
Yao Gu, Mayank Kejriwal
In social media like Twitter, hashtags carry a lot of semantic information
and can be easily distinguished from the main text. Exploring and visualizing
the space of hashtags in a meaningful way can offer important insights into a
dataset, especially in crisis situations. In this demonstration paper, we
present a functioning prototype, HashViz, that ingests a corpus of tweets
collected in the aftermath of a crisis situation (such as the Las Vegas
shootings) and uses the fastText bag-of-tricks semantic embedding algorithm
(from Facebook Research) to embed words and hashtags into a vector space.
Hashtag vectors obtained in this way can be visualized using the t-SNE
dimensionality reduction algorithm in 2D. Although multiple Twitter
visualization platforms exist, HashViz is distinguished by being simple,
scalable, interactive and portable enough to be deployed on a server for
million-tweet corpora collected in the aftermath of arbitrary disasters,
without special-purpose installation, technical expertise, manual supervision
or costly software or infrastructure investment. Although simple, we show that
HashViz offers an intuitive way to summarize, and gain insight into, a
developing crisis situation. HashViz is also completely unsupervised, requiring
no manual inputs to go from a raw corpus to a visualization and search
interface. Using the recent Las Vegas mass shooting massacre as a case study,
we illustrate the potential of HashViz using only a web browser on the client
side.
Authors' comments: 2 pages, 3 figures, Workshop on Social Web in Emergency and Disaster
Management at ACM WSDM 2018
Jibril Frej, Jean-Pierre Chevallet, Didier Schwab
In this paper, we explore the usage of Word Embedding semantic resources for Information Retrieval (IR) task. This embedding, produced by a shallow neural network, have been shown to catch semantic similarities between words (Mikolov et al., 2013). Hence, our goal is to enhance IR Language Models by addressing the term mismatch problem. To do so, we applied the model presented in the paper Integrating and Evaluating Neural Word Embedding in Information Retrieval by Zuccon et al. (2015) that proposes to estimate the translation probability of a Translation Language Model using the cosine similarity between Word Embedding. The results we obtained so far did not show a statistically significant improvement compared to classical Language Model.
Karim Banawan, Sennur Ulukus
We consider the classical setting of private information retrieval (PIR) of a
single message (file) out of $M$ messages from $N$ distributed databases under
the new constraint of \emph{asymmetric traffic} from databases. In this
problem, the \emph{ratios between the traffic} from the databases are
constrained, i.e., the ratio of the length of the answer string that the user
(retriever) receives from the $n$th database to the total length of all answer
strings from all databases is constrained to be $\tau_n$. This may happen if
the user's access to the databases is restricted due database availability,
channel quality to the databases, and other factors. For this problem, for
fixed $M$, $N$, we develop a general upper bound $\bar{C}(\boldsymbol{\tau})$,
which generalizes the converse proof of Sun-Jafar, where database symmetry was
inherently used. Our converse bound is a piece-wise affine function in the
traffic ratio vector $\boldsymbol{\tau}=(\tau_1, \cdots, \tau_N)$. For the
lower bound, we explicitly show the achievability of $\binom{M+N-1}{M}$ corner
points. For the remaining traffic ratio vectors, we perform time-sharing
between these corner points. The recursive structure of our achievability
scheme is captured via a system of difference equations. The upper and lower
bounds exactly match for $M=2$ and $M=3$ for any $N$ and any
$\boldsymbol{\tau}$. The results show strict loss of PIR capacity due to the
asymmetric traffic constraints compared with the symmetric case of Sun-Jafar
which implicitly uses $\tau_n=\frac{1}{N}$ for all $n$.
Authors' comments: Submitted to IEEE Transactions on Information Theory, January 2018
Ferenc Galkó, Carsten Eickhoff
The amount of publicly available biomedical literature has been growing
rapidly in recent years, yet question answering systems still struggle to
exploit the full potential of this source of data. In a preliminary processing
step, many question answering systems rely on retrieval models for identifying
relevant documents and passages. This paper proposes a weighted cosine distance
retrieval scheme based on neural network word embeddings. Our experiments are
based on publicly available data and tasks from the BioASQ biomedical question
answering challenge and demonstrate significant performance gains over a wide
range of state-of-the-art models.
Authors' comments: To appear in ECIR 2017
Sheikh Muhammad Sarwar, John Foley, James Allan
We address the role of a user in Contextual Named Entity Retrieval (CNER), showing (1) that user identification of important context-bearing terms is superior to automated approaches, and (2) that further gains are possible if the user indicates the relative importance of those terms. CNER is similar in spirit to List Question answering and Entity disambiguation. However, the main focus of CNER is to obtain user feedback for constructing a profile for a class of entities on the fly and use that to retrieve entities from free text. Given a sentence, and an entity selected from that sentence, CNER aims to retrieve sentences that have entities similar to query entity. This paper explores obtaining term relevance feedback and importance weighting from humans in order to improve a CNER system. We report our findings based on the efforts of IR researchers as well as crowdsourced workers.
Didac Surís, Amanda Duarte, Amaia Salvador, Jordi Torres, Xavier Giró-i-Nieto
The increasing amount of online videos brings several opportunities for
training self-supervised neural networks. The creation of large scale datasets
of videos such as the YouTube-8M allows us to deal with this large amount of
data in manageable way. In this work, we find new ways of exploiting this
dataset by taking advantage of the multi-modal information it provides. By
means of a neural network, we are able to create links between audio and visual
documents, by projecting them into a common region of the feature space,
obtaining joint audio-visual embeddings. These links are used to retrieve audio
samples that fit well to a given silent video, and also to retrieve images that
match a given a query audio. The results in terms of Recall@K obtained over a
subset of YouTube-8M videos show the potential of this unsupervised approach
for cross-modal feature learning. We train embeddings for both scales and
assess their quality in a retrieval problem, formulated as using the feature
extracted from one modality to retrieve the most similar videos based on the
features computed in the other modality.
Authors' comments: 6 pages, 3 figures
Michael J. Kurtz
What is "intelligent" information retrieval? Essentially this is asking what
is intelligence, in this article I will attempt to show some of the aspects of
human intelligence, as related to information retrieval. I will do this by the
device of a semi-imaginary Oracle. Every Observatory has an oracle, someone who
is a distinguished scientist, has great administrative responsibilities, acts
as mentor to a number of less senior people, and as trusted advisor to even the
most accomplished scientists, and knows essentially everyone in the field. In
an appendix I will present a brief summary of the Statistical Factor Space
method for text indexing and retrieval, and indicate how it will be used in the
Astrophysics Data System Abstract Service. 2018 Keywords: Personal Digital
Assistant; Supervised Topic Models
Authors' comments: Author copy; published 25 years ago at the beginning of the
Astrophysics Data System; 2018 keywords added
Xu Liu, Lili Zhao, Dajun Ding, Yajiao Dong
This paper proposes an end-to-end deep hashing framework with category mask for fast video retrieval. We train our network in a supervised way by fully exploiting inter-class diversity and intra-class identity. Classification loss is optimized to maximize inter-class diversity, while intra-pair is introduced to learn representative intra-class identity. We investigate the binary bits distribution related to categories and find out that the effectiveness of binary bits is highly correlated with data categories, and some bits may degrade classification performance of some categories. We then design hash code generation scheme with category mask to filter out bits with negative contribution. Experimental results demonstrate the proposed method outperforms several state-of-the-arts under various evaluation metrics on public datasets.