Mohammad Aliannejadi, Manajit Chakraborty, Esteban Andrés Ríssola, Fabio Crestani
With the improvements in speech recognition and voice generation technologies
over the last years, a lot of companies have sought to develop conversation
understanding systems that run on mobile phones or smart home devices through
natural language interfaces. Conversational assistants, such as Google
Assistant and Microsoft Cortana, can help users to complete various types of
tasks. This requires an accurate understanding of the user's information need
as the conversation evolves into multiple turns. Finding relevant context in a
conversation's history is challenging because of the complexity of natural
language and the evolution of a user's information need. In this work, we
present an extensive analysis of the language, relevance, and dependency of user
utterances in a multi-turn information-seeking conversation. To this aim, we
have annotated relevant utterances in the conversations released by the TREC
CaST 2019 track. The annotation labels determine which of the previous
utterances in a conversation can be used to improve the current one.
Furthermore, we propose a neural utterance relevance model based on BERT
fine-tuning, outperforming competitive baselines. We study and compare the
performance of multiple retrieval models, utilizing different strategies to
incorporate the user's context. The experimental results on both classification
and retrieval tasks show that our proposed approach can effectively identify
and incorporate the conversation context. We show that processing the current
utterance using the predicted relevant utterance leads to a 38% relative
improvement in terms of nDCG@20. Finally, to foster research in this area, we
have released the dataset of the annotations.
Authors' comments: To appear in ACM CHIIR 2020, Vancouver, BC, Canada
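The retrieval gain in the abstract above is reported as a relative improvement in nDCG@20. As a reminder of what that metric measures, here is a minimal sketch of nDCG@k and of a relative improvement. It is not the authors' evaluation code: the linear gain, the log2 discount, and the function names are assumptions made for illustration (some evaluation tools use an exponential gain instead), and the ideal ranking is built only from the labels passed in.

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k graded relevance labels."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_gains, k):
    """DCG of the given ranking divided by the DCG of the ideal reordering."""
    ideal = dcg_at_k(sorted(ranked_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0

def relative_improvement(new_score, old_score):
    """Relative change of new_score with respect to old_score."""
    return (new_score - old_score) / old_score

# Hypothetical relevance labels, listed in the order a system ranked the documents.
perfect = ndcg_at_k([3, 2, 1, 0], k=20)
swapped = ndcg_at_k([1, 3], k=20)
```

A perfectly ordered list scores 1, and any misordering of unequal labels scores below 1.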
Ahmed F. Al-Refaie, Quentin Changeat, Ingo P. Waldmann, Giovanna Tinetti
TauREx 3 is the next generation of the TauREx exoplanet atmospheric retrieval framework for Windows, Mac, and Linux. It is a complete rewrite with a full Python stack that makes it easy-to-use, high-performance, dynamic, and flexible. The new main TauREx program is built with modularity in mind, allowing the user to augment its functionalities with custom code and efficiently perform retrievals on custom parameters. We achieve this result by dynamic determination of fitting parameters, whereby TauREx 3 can detect new parameters for retrieval from user code through a simple interface. TauREx 3 can act as a library with a simple 'import taurex' command, providing a rich set of classes and functions related to atmospheric modelling. A 10x speedup in forward model computations is achieved as compared to the previous version with a sixfold reduction in retrieval times while maintaining robust results. TauREx 3 is intended as a standalone, all-in-one package for retrievals while the TauREx 3 Python library can build or augment a user's custom data pipeline easily.
Liang Pang, Jun Xu, Qingyao Ai, Yanyan Lan, Xueqi Cheng, Jirong Wen
In learning-to-rank for information retrieval, a ranking model is
automatically learned from the data and then utilized to rank the sets of
retrieved documents. Therefore, an ideal ranking model would be a mapping from
a document set to a permutation on the set, and should satisfy two critical
requirements: (1)~it should have the ability to model cross-document
interactions so as to capture local context information in a query; (2)~it
should be permutation-invariant, which means that any permutation of the
inputted documents would not change the output ranking. Previous studies on
learning-to-rank either design uni-variate scoring functions that score each
document separately, and thus fail to model the cross-document interactions,
or construct multivariate scoring functions that score documents sequentially,
which inevitably sacrifices the permutation invariance requirement. In this
paper, we propose a neural learning-to-rank model called SetRank which directly
learns a permutation-invariant ranking model defined on document sets of any
size. SetRank employs a stack of (induced) multi-head self attention blocks as
its key component for learning the embeddings for all of the retrieved
documents jointly. The self-attention mechanism not only helps SetRank to
capture the local context information from cross-document interactions, but
also to learn permutation-equivariant representations for the inputted
documents, thereby achieving a permutation-invariant ranking model.
Experimental results on three large-scale benchmarks showed that SetRank
significantly outperformed the baselines, including traditional
learning-to-rank models and state-of-the-art neural IR models.
Authors' comments: Accepted at SIGIR 2020
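The SetRank abstract above rests on the fact that self-attention is permutation-equivariant, so that scoring each output row independently gives a permutation-invariant ranking. The sketch below checks this numerically with a plain single-head attention layer in NumPy. It is an illustration only, not the SetRank architecture: the random weights, the linear scoring vector `w_score`, and all sizes are assumptions, and the induced multi-head blocks of the paper are not reproduced.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the rows of X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n_docs, d = 5, 4
X = rng.normal(size=(n_docs, d))            # one row per retrieved document
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
w_score = rng.normal(size=d)                # hypothetical per-document scorer
perm = rng.permutation(n_docs)

out = self_attention(X, Wq, Wk, Wv)
out_perm = self_attention(X[perm], Wq, Wk, Wv)

# Equivariance: permuting the input rows permutes the output rows the same way.
equivariant = np.allclose(out[perm], out_perm)

# Invariance of the ranking: map permuted positions back to document ids.
ranking = np.argsort(-(out @ w_score))
ranking_perm = perm[np.argsort(-(out_perm @ w_score))]
same_ranking = np.array_equal(ranking, ranking_perm)
```

The ranking comparison assumes no two documents receive numerically tied scores, which holds for generic random inputs.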
Vincent Christlein, Anguelos Nicolaou, Mathias Seuret, Dominique Stutzmann, Andreas Maier
This competition investigates the performance of large-scale retrieval of historical document images based on writing style. The large image data sets were provided by cultural heritage institutions and digital libraries and comprise a total of 20 000 document images representing about 10 000 writers, divided into three types: writers of (i) manuscript books, (ii) letters, (iii) charters and legal documents. We focus on the task of automatic image retrieval to simulate common scenarios of humanities research, such as writer retrieval. Most teams submitted traditional methods that do not use deep learning techniques. The competition results show that a combination of methods outperforms single methods. Furthermore, letters are much more difficult to retrieve than manuscripts.
Eitan Levin, Tamir Bendory
The properties of gradient techniques for the phase retrieval problem have received considerable attention in recent years. In almost all applications, however, the phase retrieval problem is solved using a family of algorithms that can be interpreted as variants of Douglas-Rachford splitting. In this work, we establish a connection between Douglas-Rachford and gradient algorithms. Specifically, we show that in some cases a generalization of Douglas-Rachford, called relaxed-reflect-reflect (RRR), can be viewed as gradient descent on a certain objective function. The solutions coincide with the critical points of that objective, which, in contrast to standard gradient techniques, are not its minimizers. Using the objective function, we give simple proofs of some basic properties of the RRR algorithm. Specifically, we describe its set of solutions, show a local convexity around any solution, and derive stability guarantees. Nevertheless, in its present state, the analysis does not elucidate the remarkable empirical performance of RRR and its global properties.
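To make the relaxed-reflect-reflect iteration mentioned above concrete, here is a minimal sketch of one common way to write it, x <- x + beta * (P1(2 P2(x) - x) - P2(x)), for a toy one-dimensional Fourier phase retrieval problem. The choice of a support constraint as the second set, the order in which the two projections are applied, the value of `beta`, and all function names are assumptions made for illustration; the abstract does not fix them.

```python
import numpy as np

def project_magnitude(x, b):
    """Keep the Fourier phases of x and impose the measured magnitudes b."""
    X = np.fft.fft(x)
    return np.fft.ifft(b * np.exp(1j * np.angle(X)))

def project_support(x, mask):
    """Keep the real part of x on the known support and zero it elsewhere."""
    return np.real(x) * mask

def rrr_step(x, b, mask, beta=0.5):
    """One RRR update: x + beta * (P1(2 * P2(x) - x) - P2(x))."""
    p2 = project_support(x, mask)
    p1 = project_magnitude(2 * p2 - x, b)
    return x + beta * (p1 - p2)

# Toy signal: real-valued and supported on the first three entries.
x_true = np.zeros(8)
x_true[:3] = [1.0, -2.0, 0.5]
mask = np.zeros(8)
mask[:3] = 1.0
b = np.abs(np.fft.fft(x_true))  # phaseless Fourier measurements

# A point satisfying both constraints is left unchanged by the update.
is_fixed_point = np.allclose(rrr_step(x_true, b, mask), x_true)
```

The check at the end only confirms that a signal consistent with both constraints is a fixed point of the update; it says nothing about convergence from an arbitrary starting point.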
Carlos Badenes-Olmedo, Jose-Luis Redondo-Garcia, Oscar Corcho
Cross-lingual annotations of legislative texts enable us to explore major
themes covered in multilingual legal data and are a key facilitator of semantic
similarity when searching for similar documents. Multilingual probabilistic
topic models have recently emerged as a group of semi-supervised machine
learning models that can be used to perform thematic explorations on
collections of texts in multiple languages. However, these approaches require
theme-aligned training data to create a language-independent space, which
limits the number of scenarios where this technique can be used. In this work,
we provide an unsupervised document similarity algorithm based on hierarchies
of multi-lingual concepts to describe topics across languages. The algorithm
does not require parallel or comparable corpora, or any other type of
translation resource. Experiments performed on the English, Spanish, French and
Portuguese editions of the JRC-Acquis corpora reveal promising results on
classifying and sorting documents by similar content.
Authors' comments: IberLegal Workshop co-located with Jurix 2019
Tien-Thanh Vu, Dat Quoc Nguyen
A price information retrieval (IR) system allows users to search and view
differences among prices of specific products. Building a product-price-driven
IR system is a challenging and active research area. Approaches that depend
entirely on product information provided by shops through an interface
run into database limitations, while automatic systems specifically require
product names and commercial websites as input. For both paradigms, approaches
to building a product-price IR system for Vietnamese are still very limited. In
this paper, we introduce an automatic Vietnamese product-price IR system that
identifies and stores XPath patterns to extract product prices from
commercial websites. Experiments with our system show promising results.
Authors' comments: In Proceedings of the 2011 IEEE International Conference on Granular
Computing (GrC 2011)
Karlheinz Gröchenig
We study the problem of recovering a function of the form $f(x) = \sum _{k\in \mathbb{Z} } c_k e^{-(x-k)^2}$ from its phaseless samples $|f(\lambda )|$ on some arbitrary countable set $\Lambda \subseteq \mathbb{R} $. For real-valued functions this is possible up to a sign for every separated set with Beurling density $D^-(\Lambda ) >2$. This result is sharp. For complex-valued functions we find all possible solutions with the same phaseless samples.
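The phrase "up to a sign" in the abstract above reflects the fact that f and -f always produce the same phaseless samples. The short sketch below evaluates a truncated version of the sum and checks this numerically. The finite coefficient set, the sample points, and the function name are assumptions for illustration; the paper works with the full sum over the integers and with general separated sampling sets.

```python
import numpy as np

def gaussian_shift_sum(x, coeffs, shifts):
    """Evaluate f(x) = sum_k c_k * exp(-(x - k)^2) for finitely many shifts k."""
    x = np.asarray(x, dtype=float)[:, None]
    coeffs = np.asarray(coeffs, dtype=float)[None, :]
    shifts = np.asarray(shifts, dtype=float)[None, :]
    return (coeffs * np.exp(-(x - shifts) ** 2)).sum(axis=1)

shifts = [-1, 0, 1]
coeffs = [0.5, -1.0, 2.0]                 # hypothetical real coefficients
samples = np.linspace(-3.0, 3.0, 13)      # hypothetical sampling set

f = gaussian_shift_sum(samples, coeffs, shifts)
f_neg = gaussian_shift_sum(samples, [-c for c in coeffs], shifts)

# The phaseless samples |f| cannot distinguish f from -f.
same_magnitudes = np.allclose(np.abs(f), np.abs(f_neg))
```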
Ruicong Xu, Li Niu, Jianfu Zhang, Liqing Zhang
The activity image-to-video retrieval task aims to retrieve videos containing
an activity similar to the one in the query image, which is challenging because videos
generally have many background segments irrelevant to the activity. In this
paper, we utilize the R-C3D model to represent a video by a bag of activity
proposals, which can filter out background segments to some extent. However,
there are still noisy proposals in each bag. Thus, we propose an Activity
Proposal-based Image-to-Video Retrieval (APIVR) approach, which incorporates
multi-instance learning into a cross-modal retrieval framework to address the
proposal noise issue. Specifically, we propose a Graph Multi-Instance Learning
(GMIL) module with a graph convolutional layer, and integrate this module with
classification loss, adversarial loss, and triplet loss in our cross-modal
retrieval framework. Moreover, we propose a geometry-aware triplet loss based on
point-to-subspace distance to preserve the structural information of activity
proposals. Extensive experiments on three widely-used datasets verify the
effectiveness of our approach.
Authors' comments: The Thirty-Fourth AAAI Conference on Artificial Intelligence
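The geometry-aware triplet loss in the abstract above is built on a point-to-subspace distance. The sketch below shows one standard way to compute such a distance, by projecting the point onto the span of a set of basis vectors, together with a generic hinge-style triplet loss. How the paper forms the subspaces from activity proposals and how it sets the margin are not stated in the abstract, so the function names, the margin value, and the example data are assumptions.

```python
import numpy as np

def point_to_subspace_distance(point, basis):
    """Euclidean distance from `point` to the span of the columns of `basis`.

    Assumes the columns of `basis` are linearly independent.
    """
    Q, _ = np.linalg.qr(basis)                # orthonormal basis of the span
    residual = point - Q @ (Q.T @ point)      # component orthogonal to the span
    return float(np.linalg.norm(residual))

def triplet_hinge(d_pos, d_neg, margin=1.0):
    """Generic triplet loss: penalize when the positive is not closer by `margin`."""
    return max(0.0, d_pos - d_neg + margin)

# Toy example: the subspace is the xy-plane in three dimensions.
plane = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])
query = np.array([1.0, 2.0, 3.0])
distance = point_to_subspace_distance(query, plane)
```

For the toy data the distance is simply the magnitude of the z-coordinate, since that is the only component of the query lying outside the plane.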
Saad Farooq
This survey paper discusses different forms of malicious techniques that can affect how an information retrieval model retrieves documents for a query and their remedies.
Chuyuan Xiong, Deyuan Zhang, Tao Liu, Xiaoyong Du
Cross-modal associations between voice and face from a person can be learnt algorithmically, which can benefit a lot of applications. The problem can be defined as voice-face matching and retrieval tasks. Much research attention has been paid to these tasks recently. However, this research is still at an early stage. Test schemes based on random tuple mining tend to have low test confidence. The generalization ability of models cannot be evaluated on small-scale datasets. Performance metrics on various tasks are scarce. A benchmark for this problem needs to be established. In this paper, first, a framework based on comprehensive studies is proposed for voice-face matching and retrieval. It achieves state-of-the-art performance with various performance metrics on different tasks and with high test confidence on large-scale datasets, which can be taken as a baseline for follow-up research. In this framework, a voice-anchored L2-norm constrained metric space is proposed, and cross-modal embeddings are learned with CNN-based networks and triplet loss in the metric space. The embedding learning process can be more effective and efficient with this strategy. Different network structures of the framework and the cross-language transfer abilities of the model are also analyzed. Second, a voice-face dataset (with 1.15M face data and 0.29M audio data) from Chinese speakers is constructed, and a convenient and quality-controllable dataset collection tool is developed. The dataset and source code of the paper will be published together with this paper.
Zhijie Lin, Zhou Zhao, Zhu Zhang, Qi Wang, Huasheng Liu
Video moment retrieval aims to find the moment that is most relevant to a
given natural language query. Existing methods are mostly trained in a
fully-supervised setting, which requires full annotation of the temporal
boundaries for each query. However, manually labeling these annotations is
time-consuming and expensive. In this paper, we propose a novel
weakly-supervised moment retrieval framework requiring only coarse video-level
annotations for training. Specifically, we devise a proposal generation module
that aggregates the context information to generate and score all candidate
proposals in one single pass. We then devise an algorithm that considers both
exploitation and exploration to select the top-K proposals. Next, we build a
semantic completion module to measure the semantic similarity between the
selected proposals and the query, compute a reward, and provide feedback to the
proposal generation module for scoring refinement. Experiments on the
ActivityCaptions and Charades-STA datasets demonstrate the effectiveness of our
proposed method.
Authors' comments: Accepted by AAAI 2020 as a full paper
Tamas Kovacs
We propose a novel method for describing system stability in extrasolar
planetary dynamics. Observations in this field mainly provide measurements of
radial velocity, transit time, and/or celestial
position. These scalar time series are used to build up the high-dimensional
phase space trajectory representing the dynamical evolution of planetary
motion. The framework of nonlinear time series analysis and Poincaré
recurrences allows us to transform the obtained univariate signals into complex
networks whose topology carries the dynamical properties of the underlying
system. The network-based analysis is able to distinguish regular from
chaotic behaviour not only for synthetic inputs but also for noisy and
irregularly sampled real-world observations. The proposed scheme requires
neither n-body integration nor a best-fitting planetary model to perform
the stability investigation; therefore, the computation time can be reduced
drastically compared to that of the standard numerical methods.
Authors' comments: 19 pages, 20 figures, accepted for publication in MNRAS
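The pipeline described in the abstract above, from a scalar time series to a phase space trajectory to a complex network, can be sketched with a time-delay embedding followed by a recurrence network. The version below is a generic illustration, not the author's code: the embedding dimension, the delay, the distance threshold `eps`, and the sine-wave input are all assumptions.

```python
import numpy as np

def delay_embed(series, dim, delay):
    """Time-delay embedding: row t is (s[t], s[t + delay], ..., s[t + (dim-1)*delay])."""
    series = np.asarray(series, dtype=float)
    n = len(series) - (dim - 1) * delay
    return np.column_stack([series[i * delay : i * delay + n] for i in range(dim)])

def recurrence_network(points, eps):
    """Adjacency matrix linking phase space points closer than eps (no self-loops)."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    adjacency = (dists < eps).astype(int)
    np.fill_diagonal(adjacency, 0)
    return adjacency

# Hypothetical regular signal standing in for, e.g., a radial velocity curve.
series = np.sin(np.linspace(0.0, 20.0, 200))
points = delay_embed(series, dim=3, delay=5)
adjacency = recurrence_network(points, eps=0.2)
degree = adjacency.sum(axis=1)  # one simple topological measure of the network
```

Network measures such as the degree distribution computed from `adjacency` are the kind of topological quantity the abstract refers to; which measures separate regular from chaotic motion is the subject of the paper itself.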
Qin Zou, Zheng Zhang, Ling Cao, Long Chen, Song Wang
Hash coding has been widely used in approximate nearest neighbor search for
large-scale image retrieval. Given semantic annotations such as class labels
and pairwise similarities of the training data, hashing methods can learn and
generate effective and compact binary codes. Because some newly introduced images
may carry undefined semantic labels (we call these unseen images), zero-shot
hashing techniques have been studied. However, existing zero-shot hashing
methods focus on the retrieval of single-label images, and cannot handle
multi-label images. In this paper, for the first time, a novel transductive
zero-shot hashing method is proposed for multi-label unseen image retrieval. In
order to predict the labels of the unseen/target data, a visual-semantic bridge
is built via instance-concept coherence ranking on the seen/source data. Then,
pairwise similarity loss and focal quantization loss are constructed for
training a hashing model using both the seen/source and unseen/target data.
Extensive evaluations on three popular multi-label datasets demonstrate that
the proposed hashing method achieves significantly better results than the
competing methods.
Authors' comments: 15 pages
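For context on how hash codes are used at retrieval time in the abstract above, here is a minimal sketch of ranking database items by Hamming distance to a query code. It does not implement the proposed zero-shot hashing model: the sign-threshold binarization, the hand-written codes, and the function names are assumptions for illustration.

```python
import numpy as np

def binarize(features):
    """Turn real-valued features into binary codes by thresholding at zero."""
    return (np.asarray(features) > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Return database indices sorted by Hamming distance to the query, and the distances."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable"), dists

# Hypothetical 4-bit codes for three database images and one query image.
db_codes = np.array([[0, 1, 1, 0],
                     [1, 1, 1, 1],
                     [0, 1, 0, 0]], dtype=np.uint8)
query_code = binarize([-0.3, 0.8, 0.1, -1.2])
order, dists = hamming_rank(query_code, db_codes)
```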
Sadik Bessou, Mohamed Touahria
This paper provides a method for indexing and retrieving Arabic texts, based on natural language processing. Our approach exploits the notion of template in word stemming and replaces the words by their stems. This technique has proven effective, returning relevant retrieval results by decreasing silence during the retrieval phase. A series of experiments has been conducted to test the performance of the proposed algorithm ESAIR (Enhanced Stemmer for Arabic Information Retrieval). The results indicate that the algorithm extracts the exact root with an accuracy of up to 96%, thereby improving information retrieval.
Giovanni Bruno, Nikole K. Lewis, Munazza K. Alam, Mercedes López-Morales, Joanna K. Barstow, Hannah R. Wakeford, David Sing, Gregory W. Henry et al.
We perform atmospheric retrievals on the full optical to infrared ($0.3-5 \,
\mu \mathrm{m}$) transmission spectrum of the inflated hot Jupiter WASP-52b by
combining HST/STIS, WFC3 IR, and Spitzer/IRAC observations. As WASP-52 is an
active star which shows both out-of-transit photometric variability and
starspot crossings during transits, we account for the contribution of
non-occulted active regions in the retrieval. We recover a $0.1-10\times$ solar
atmospheric composition, in agreement with core accretion predictions for giant
planets, and a weak contribution of aerosols. We also obtain a $<3000$ K
temperature for the starspots, a measure which is likely affected by the models
used to fit instrumental effects in the transits, and a 5% starspot fractional
coverage, compatible with expectations for the host star's spectral type. Such
constraints on the planetary atmosphere and on the activity of its host star
will inform future JWST GTO observations of this target.
Authors' comments: 16 pages, 11 figures, 1 table. Updated figure 5
Ledell Wu, Fabio Petroni, Martin Josifoski, Sebastian Riedel, Luke Zettlemoyer
This paper introduces a conceptually simple, scalable, and highly effective
BERT-based entity linking model, along with an extensive evaluation of its
accuracy-speed trade-off. We present a two-stage zero-shot linking algorithm,
where each entity is defined only by a short textual description. The first
stage does retrieval in a dense space defined by a bi-encoder that
independently embeds the mention context and the entity descriptions. Each
candidate is then re-ranked with a cross-encoder that concatenates the mention
and entity text. Experiments demonstrate that this approach is state of the art
on recent zero-shot benchmarks (6 point absolute gains) and also on more
established non-zero-shot evaluations (e.g. TACKBP-2010), despite its relative
simplicity (e.g. no explicit entity embeddings or manually engineered mention
tables). We also show that bi-encoder linking is very fast with nearest
neighbour search (e.g. linking with 5.9 million candidates in 2 milliseconds),
and that much of the accuracy gain from the more expensive cross-encoder can be
transferred to the bi-encoder via knowledge distillation. Our code and models
are available at https://github.com/facebookresearch/BLINK.
Authors' comments: accepted at EMNLP 2020
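The two-stage scheme in the abstract above, dense retrieval with a bi-encoder followed by cross-encoder re-ranking, can be sketched independently of any particular model. The code below is not the BLINK API: the embeddings are hand-written vectors, `cross_score` is a hypothetical stand-in for a cross-encoder, and the exhaustive inner-product search stands in for a nearest-neighbour index.

```python
import numpy as np

def retrieve_top_k(mention_vec, entity_matrix, k):
    """Stage 1: return the k entity indices with the highest inner-product score."""
    scores = entity_matrix @ mention_vec
    top = np.argpartition(-scores, k - 1)[:k]      # unordered top-k, requires k <= n
    top = top[np.argsort(-scores[top])]            # order the k candidates by score
    return [int(i) for i in top]

def rerank(candidates, cross_score):
    """Stage 2: reorder candidates with a more expensive scoring function."""
    return sorted(candidates, key=cross_score, reverse=True)

# Hypothetical entity embeddings (one row per entity) and a mention embedding.
entity_matrix = np.eye(4)
mention_vec = np.array([0.1, 0.9, 0.5, 0.0])
candidates = retrieve_top_k(mention_vec, entity_matrix, k=2)

# Hypothetical cross-encoder scores that disagree with the first-stage order.
cross_scores = {1: 0.2, 2: 0.8}
reranked = rerank(candidates, lambda entity_id: cross_scores[entity_id])
```

The design point is that the first stage scores every entity cheaply from precomputed embeddings, while the second stage is only run on the few surviving candidates.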
Andrew O. Arnold, William W. Cohen
We focus on the problem of search in the multilingual setting. Examining the problems of next-sentence prediction and inverse cloze, we show that at large scale, instance-based transfer learning is surprisingly effective in the multilingual setting, leading to positive transfer on all of the 35 target languages and two tasks tested. We analyze this improvement and argue that the most natural explanation, namely direct vocabulary overlap between languages, only partially explains the performance gains: in fact, we demonstrate target-language improvement can occur after adding data from an auxiliary language even with no vocabulary in common with the target. This surprising result is due to the effect of transitive vocabulary overlaps between pairs of auxiliary and target languages.
Peter J. Christopher, Timothy D. Wilkinson
Time-multiplexed approaches for high frame-rate holographic displays have been around since the invention of One-Step Phase-Retrieval (OSPR) in the early 2000s. At the time, formulations were developed for variance reduction, but other image quality metrics were ignored. This work sets out statistical models for the mean squared error (MSE) and structural similarity index (SSIM) behaviour of OSPR for a range of image types in order to better understand the effect of time multiplexing on visible images. We find that while the observed variance converges to zero as the number of frames per second increases, MSE converges to a non-zero value while SSIM converges quadratically to a non-unitary value.
Hiren Galiyawala, Mehul S Raval, Shivansh Dave
Visual appearance-based person retrieval is a challenging problem in
surveillance. It uses attributes like height, clothing color, clothing type and
gender to describe a human. Such attributes are known as soft biometrics. This
paper proposes person retrieval from surveillance video using height, torso
clothing type, torso clothing color and gender. The approach introduces an adaptive
torso patch extraction and bounding box regression to improve the retrieval.
The algorithm uses fine-tuned Mask R-CNN and DenseNet-169 for person detection
and attribute classification, respectively. The performance is analyzed on the AVSS
2018 challenge II dataset, where it achieves an 11.35% improvement over the
state of the art based on the average Intersection over Union measure.
Authors' comments: 11 pages
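The evaluation in the abstract above is based on average Intersection over Union. As a reference for how that measure is computed for a pair of bounding boxes, here is a minimal sketch. The (x1, y1, x2, y2) corner format and the function names are assumptions; the challenge's own evaluation script may differ in detail.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0

def average_iou(predicted_boxes, ground_truth_boxes):
    """Mean IoU over paired predicted and ground-truth boxes."""
    pairs = list(zip(predicted_boxes, ground_truth_boxes))
    return sum(iou(p, g) for p, g in pairs) / len(pairs)

# Two 2x2 boxes overlapping in a 1x1 square: intersection 1, union 7.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```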