Xueli Yu, Weizhi Xu, Zeyu Cui, Shu Wu, Liang Wang
The ad-hoc retrieval task is to rank related documents given a query and a
document collection. A series of deep learning based approaches have been
proposed to solve this problem and have gained much attention. However, we argue
that they are inherently based on local word sequences, ignoring the subtle
long-distance document-level word relationships. To solve the problem, we
explicitly model the document-level word relationship through the graph
structure, capturing the subtle information via graph neural networks. In
addition, due to the complexity and scale of the document collections, it is
worthwhile to explore the different grain-sized hierarchical matching signals
at a more general level. Therefore, we propose a Graph-based Hierarchical
Relevance Matching model (GHRM) for ad-hoc retrieval, by which we can capture
the subtle and general hierarchical matching signals simultaneously. We
validate the effectiveness of GHRM on two representative ad-hoc retrieval
benchmarks; comprehensive experiments and results demonstrate its
superiority over state-of-the-art methods.
Authors' comments: To appear at WWW 2021
Yijiang Lian, Yubo Liu, Zhicong Ye, Liang Yuan, Yanfeng Zhu, Min Zhao, Jianyi Cheng, Xinwei Feng
In sponsored search, retrieving synonymous keywords for the exact match type is important for accurately targeted advertising. Data-driven deep learning-based methods have been proposed to tackle this problem. An apparent disadvantage of these methods is their poor generalization performance on entity-level long-tail instances, even though these instances might share similar concept-level patterns with frequent instances. With the help of a large knowledge base, we find that most commercial synonymous query-keyword pairs can be abstracted into meaningful conceptual patterns through concept tagging. Based on this fact, we propose a novel knowledge-driven conceptual retrieval framework to mitigate this problem, which consists of three parts: data conceptualization, matching via conceptual patterns and concept-augmented discrimination. Both offline and online experiments show that our method is very effective. This framework has been successfully applied to Baidu's sponsored search system, where it yields a significant improvement in revenue.
Shova Bhandari, Rini Raju
Social networks are a rich source of data for analyzing user habits in all aspects of life. User behavior is a decisive component of the health system in various countries, and promoting good behavior can improve public health significantly. In this work, we develop a new model for social network analysis using a text analysis approach. We characterize each user's reaction to the global pandemic by analyzing their online behavior. Clustering online users with similar habits helps to reveal how the virus spreads in different societies. Promoting a healthy lifestyle among high-risk social media users can have a significant effect on public health and reduce the impact of the global pandemic. We introduce a new approach to clustering habits based on user activities on social media during the pandemic and recommend a machine learning model to promote health on online platforms.
Zhe Ma, Fenghao Liu, Jianfeng Dong, Xiaoye Qu, Yuan He, Shouling Ji
This paper addresses the language-based product image retrieval task. The
majority of previous works have made significant progress by designing network
structure, similarity measurement, and loss function. However, they typically
perform vision-text matching at a certain granularity regardless of the intrinsic
multiple granularities of images. In this paper, we focus on the cross-modal
similarity measurement, and propose a novel Hierarchical Similarity Learning
(HSL) network. HSL first learns multi-level representations of input data by
stacked encoders, and object-granularity similarity and image-granularity
similarity are computed at each level. All the similarities are combined as the
final hierarchical cross-modal similarity. Experiments on a large-scale product
retrieval dataset demonstrate the effectiveness of our proposed method. Code
and data are available at https://github.com/liufh1/hsl.
Authors' comments: Accepted by ICASSP 2021. Code and data will be available at
https://github.com/liufh1/hsl
Svitlana Vakulenko, Nikos Voskarides, Zhucheng Tu, Shayne Longpre
This paper describes the participation of the UvA.ILPS group at the TREC CAsT
2020 track. Our passage retrieval pipeline consists of (i) an initial retrieval
module that uses BM25, and (ii) a re-ranking module that combines the score of
a BERT ranking model with the score of a machine comprehension model adjusted
for passage retrieval. An important challenge in conversational passage
retrieval is that queries are often under-specified. Thus, we perform query
resolution, that is, add missing context from the conversation history to the
current turn query using QuReTeC, a term classification query resolution model.
We show that our best automatic and manual runs outperform the corresponding
median runs by a large margin.
Authors' comments: TREC 2020
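The two-stage pipeline described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' system: `bm25_scores` is a plain Okapi BM25 implementation with commonly used default parameters, and `rerank` assumes the two re-ranking scores are combined by a weighted sum, which the abstract does not specify. The function names, the `weight` parameter and the toy inputs are all hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rerank(candidates, ranker_scores, reader_scores, weight=0.5):
    """Re-order candidate ids by a weighted sum of two scores (assumed combination)."""
    combined = {
        c: weight * ranker_scores[c] + (1 - weight) * reader_scores[c]
        for c in candidates
    }
    return sorted(candidates, key=lambda c: combined[c], reverse=True)


# Toy usage: initial retrieval with BM25, then re-ranking of the matching passages.
docs = [["cat", "sat"], ["dog", "ran"], ["cat", "cat", "dog"]]
first_stage = bm25_scores(["cat"], docs)
candidates = [i for i, s in enumerate(first_stage) if s > 0]
ranking = rerank(candidates, {0: 0.1, 2: 0.9}, {0: 0.2, 2: 0.3})
```

In a real system the two dictionaries passed to `rerank` would hold model outputs; here they are made-up numbers.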
Osman Tursun, Simon Denman, Sridha Sridharan, Ethan Goan, Clinton Fookes
Recently, Zero-shot Sketch-based Image Retrieval (ZS-SBIR) has attracted the attention of the computer vision community due to its real-world applications and its more realistic and challenging setting than that of SBIR. ZS-SBIR inherits the main challenges of multiple computer vision problems, including content-based Image Retrieval (CBIR), zero-shot learning and domain adaptation. The majority of previous studies using deep neural networks have achieved improved results through either projecting sketches and images into a common low-dimensional space or transferring knowledge from seen to unseen classes. However, those approaches are trained with complex frameworks composed of multiple deep convolutional neural networks (CNNs) and are dependent on category-level word labels. This increases the requirements on training resources and datasets. In comparison, we propose a simple and efficient framework that does not require high computational training resources, and can be trained on datasets without semantic categorical labels. Furthermore, at training and inference stages our method only uses a single CNN. In this work, a pre-trained ImageNet CNN (e.g., ResNet50) is fine-tuned with three proposed learning objectives: domain-aware quadruplet loss, semantic classification loss, and semantic knowledge preservation loss. The domain-aware quadruplet and semantic classification losses are introduced to learn discriminative, semantic and domain invariant features through considering ZS-SBIR as an object detection and verification problem. ...
Nan Shao, Yiming Cui, Ting Liu, Shijin Wang, Guoping Hu
Retrieving information from correlative paragraphs or documents to answer
open-domain multi-hop questions is very challenging. To deal with this
challenge, most of the existing works consider paragraphs as nodes in a graph
and propose graph-based methods to retrieve them. However, in this paper, we
point out the intrinsic defect of such methods. Instead, we propose a new
architecture that models paragraphs as sequential data and considers multi-hop
information retrieval as a kind of sequence labeling task. Specifically, we
design a rewritable external memory to model the dependency among paragraphs.
Moreover, a threshold gate mechanism is proposed to eliminate the distraction
of noise paragraphs. We evaluate our method on both the full wiki and distractor
subtasks of HotpotQA, a public textual multi-hop QA dataset requiring multi-hop
information retrieval. Experiments show that our method achieves significant
improvement over the published state-of-the-art method in retrieval and
downstream QA task performance.
Authors' comments: 10 pages
Hai X. Pham, Ricardo Guerrero, Jiatong Li, Vladimir Pavlovic
Despite the abundance of multi-modal data, such as image-text pairs, there
has been little effort in understanding the individual entities and their
different roles in the construction of these data instances. In this work, we
endeavour to discover the entities and their corresponding importance in
cooking recipes automatically as a visual-linguistic association problem. More
specifically, we introduce a novel cross-modal learning framework to jointly
model the latent representations of images and text in the food image-recipe
association and retrieval tasks. This model allows one to discover complex
functional and hierarchical relationships between images and text, and among
textual parts of a recipe including title, ingredients and cooking
instructions. Our experiments show that by making use of efficient
tree-structured Long Short-Term Memory as the text encoder in our computational
cross-modal retrieval framework, we are not only able to identify the main
ingredients and cooking actions in the recipe descriptions without explicit
supervision, but we can also learn more meaningful feature representations of
food recipes, appropriate for challenging cross-modal retrieval and recipe
adaptation tasks.
Authors' comments: 22 pages, accepted in AAAI 2021
Yinbin Ma, Daniela Tuninetti
Coded caching aims to minimize the network's peak-time communication load by
leveraging the information pre-stored in the local caches at the users. The
original single file retrieval setting by Maddah-Ali and Niesen has been
recently extended to general Scalar Linear Function Retrieval (SLFR) by Wan et
al., who proposed a linear scheme that surprisingly achieves the same optimal
load (under the constraint of uncoded cache placement) as in single file
retrieval. This paper's goal is to characterize the conditions under which a
general SLFR linear scheme is optimal and gain practical insights into why the
specific choices made by Wan et al. work. This paper shows that the optimal
decoding coefficients are necessarily the product of two terms, one only
involving the encoding coefficients and the other only the demands. In
addition, the relationships among the encoding coefficients are shown to be
captured by the cycles of certain graphs. Thus, a general linear scheme for
SLFR can be found by solving a spanning tree problem.
Authors' comments: Submitted to ISIT 2021
Al-Fahad M. Al-Qadhi, Carey E. Priebe, Hayden S. Helm, Vince Lyzinski
This paper introduces the subgraph nomination inference task, in which
example subgraphs of interest are used to query a network for similarly
interesting subgraphs. This type of problem appears time and again in real
world problems connected to, for example, user recommendation systems and
structural retrieval tasks in social and biological/connectomic networks. We
formally define the subgraph nomination framework with an emphasis on the
notion of a user-in-the-loop in the subgraph nomination pipeline. In this
setting, a user can provide additional post-nomination light supervision that
can be incorporated into the retrieval task. After introducing and formalizing
the retrieval task, we examine the nuanced effect that user-supervision can
have on performance, both analytically and across real and simulated data
examples.
Authors' comments: 37 pages, 11 figures
Yufeng Zhang, Jinghao Zhang, Zeyu Cui, Shu Wu, Liang Wang
To retrieve more relevant, appropriate and useful documents given a query,
finding clues about that query through the text is crucial. Recent deep
learning models regard the task as a term-level matching problem, which seeks
exact or similar query patterns in the document. However, we argue that they
are inherently based on local interactions and do not generalise to ubiquitous,
non-consecutive contextual relationships. In this work, we propose a novel
relevance matching model based on graph neural networks to leverage the
document-level word relationships for ad-hoc retrieval. In addition to the
local interactions, we explicitly incorporate all contexts of a term through
the graph-of-word text format. Matching patterns can be revealed accordingly to
provide a more accurate relevance score. Our approach significantly outperforms
strong baselines on two ad-hoc benchmarks. We also experimentally compare our
model with BERT and show our advantages on long documents.
Authors' comments: To appear at AAAI 2021
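The graph-of-word text format mentioned in this abstract can be illustrated with a small sketch. It assumes the common construction in which each unique word is a node and two words are linked when they co-occur within a sliding window; the window size, the edge weighting and the function name `graph_of_word` are assumptions, not details taken from the paper.

```python
from collections import defaultdict


def graph_of_word(tokens, window=3):
    """Build an undirected weighted word graph.

    Each unique word is one node; two different words are linked when they
    occur within `window` consecutive tokens, and the edge weight counts how
    often that happens.
    """
    adj = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            v = tokens[j]
            if v == w:
                continue  # no self-loops
            adj[w][v] += 1
            adj[v][w] += 1
    return {w: dict(nbrs) for w, nbrs in adj.items()}


# Repeated words collapse into one node, so contexts that are far apart in
# the text end up attached to the same node.
tokens = "graph neural networks model word graph".split()
g = graph_of_word(tokens, window=2)
```

Because both occurrences of "graph" share a node, its neighbours in `g` come from the start and the end of the token sequence, which is the kind of non-consecutive relationship the abstract refers to.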
Jiansheng Fang, Huazhu Fu, Jiang Liu
Deep hashing methods have been shown to be the most efficient approximate
nearest neighbor search techniques for large-scale image retrieval. However,
existing deep hashing methods have a poor small-sample ranking performance for
case-based medical image retrieval. The top-ranked images in the returned query
results may belong to a different class than the query image. This ranking problem
is caused by classification, regions of interest (ROI), and small-sample
information loss in the hashing space. To address the ranking problem, we
propose an end-to-end framework, called Attention-based Triplet Hashing (ATH)
network, to learn low-dimensional hash codes that preserve the classification,
ROI, and small-sample information. We embed a spatial-attention module into the
network structure of our ATH to focus on ROI information. The spatial-attention
module aggregates the spatial information of feature maps by utilizing
max-pooling, element-wise maximum, and element-wise mean operations jointly
along the channel axis. The triplet cross-entropy loss can help to map the
classification information of images and similarity between images into the
hash codes. Extensive experiments on two case-based medical datasets
demonstrate that our proposed ATH can further improve the retrieval performance
compared to the state-of-the-art deep hashing methods and boost the ranking
performance for small samples. Compared to other loss functions, the triplet
cross-entropy loss can enhance the classification performance and hash
code discriminability.
Authors' comments: 12 pages, 6 figures, MedIA Journal
Tim Ziemer, Pattararat Kiattipadungkul, Tanyarin Karuchit
In the recording studio, producers of Electronic Dance Music (EDM) spend more
time creating, shaping, mixing and mastering sounds than on compositional
aspects or arrangement. They tune the sound by close listening and by
leveraging audio metering and audio analysis tools, until they successfully
create the desired sound aesthetics. DJs of EDM tend to play sets of songs that
meet their sound ideal. We therefore suggest using audio metering and
monitoring tools from the recording studio to analyze EDM, instead of relying
on conventional low-level audio features. We test our novel set of features on
a simple classification task: attributing songs to the DJs who would play
them. This new set of features and the focus on DJ sets is targeted at
EDM as it takes the producer and DJ culture into account. With simple
dimensionality reduction and machine learning these features enable us to
attribute a song to a DJ with an accuracy of 63%. The features from the audio
metering and monitoring tools in the recording studio could serve for many
applications in Music Information Retrieval, such as genre, style and era
classification and music recommendation for both DJs and consumers of
electronic dance music.
Authors' comments: 13 pages, 9 figures, Meeting of the Acoustical Society of America,
Dec. 2020
Luyu Gao, Zhuyun Dai, Jamie Callan
Pre-trained deep language models (LMs) have advanced the state of the art in
text retrieval. Rerankers fine-tuned from deep LMs estimate candidate relevance
based on rich contextualized matching signals. Meanwhile, deep LMs can also be
leveraged to improve the search index, building retrievers with better recall. One
would expect a straightforward combination of both in a pipeline to yield an
additive performance gain. In this paper, we find otherwise: popular
rerankers cannot fully exploit the improved retrieval results. We,
therefore, propose a Localized Contrastive Estimation (LCE) for training
rerankers and demonstrate it significantly improves deep two-stage models.
Authors' comments: ECIR 2021
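The abstract does not state the LCE objective. One plausible reading, given here only as an assumption, is a softmax cross-entropy over a small group of reranker scores made of one relevant passage and several negatives taken from the retriever's own top results. The function name `lce_loss` and the input layout are hypothetical.

```python
import math


def lce_loss(groups):
    """Mean contrastive loss over query groups.

    Each group is (positive_score, [negative_scores]); the loss for a group
    is -log softmax(positive) computed over all scores in that group.
    """
    total = 0.0
    for pos, negs in groups:
        scores = [pos] + list(negs)
        m = max(scores)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - pos
    return total / len(groups)


# Made-up reranker scores: the loss shrinks as the positive passage is
# scored further above the negatives sampled for the same query.
weak = lce_loss([(0.0, [0.0, 0.0])])
strong = lce_loss([(5.0, [0.0, 0.0])])
```

Under this reading, where the negatives are sampled from is what makes the estimate "localized": they come from the retriever's top candidates rather than from random passages.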
Yutao Zhu, Jian-Yun Nie, Kun Zhou, Pan Du, Zhicheng Dou
Grounding human-machine conversation in a document is an effective way to
improve the performance of retrieval-based chatbots. However, only a part of
the document content may be relevant to help select the appropriate response in
a given round. It is thus crucial to select the part of document content relevant to
the current conversation context. In this paper, we propose a document content
selection network (CSN) to perform explicit selection of relevant document
contents, and filter out the irrelevant parts. We show in experiments on two
public document-grounded conversation datasets that CSN can effectively help
select the relevant document contents to the conversation context, and it
produces better results than the state-of-the-art approaches. Our code and
datasets are available at https://github.com/DaoD/CSN.
Authors' comments: ECIR 2021 Camera Ready
Robert Litschko, Ivan Vulić, Simone Paolo Ponzetto, Goran Glavaš
Pretrained multilingual text encoders based on neural Transformer
architectures, such as multilingual BERT (mBERT) and XLM, have achieved strong
performance on a myriad of language understanding tasks. Consequently, they
have been adopted as a go-to paradigm for multilingual and cross-lingual
representation learning and transfer, rendering cross-lingual word embeddings
(CLWEs) effectively obsolete. However, questions remain as to what extent this
finding generalizes 1) to unsupervised settings and 2) to ad-hoc cross-lingual
IR (CLIR) tasks. Therefore, in this work we present a systematic empirical
study focused on the suitability of the state-of-the-art multilingual encoders
for cross-lingual document and sentence retrieval tasks across a large number
of language pairs. In contrast to supervised language understanding, our
results indicate that for unsupervised document-level CLIR -- a setup with no
relevance judgments for IR-specific fine-tuning -- pretrained encoders fail to
significantly outperform models based on CLWEs. For sentence-level CLIR, we
demonstrate that state-of-the-art performance can be achieved. However, the
peak performance is not met using the general-purpose multilingual text
encoders `off-the-shelf', but rather relying on their variants that have been
further specialized for sentence understanding tasks.
Authors' comments: accepted at ECIR'21 (preprint)
Mikaela Angelina Uy, Vladimir G. Kim, Minhyuk Sung, Noam Aigerman, Siddhartha Chaudhuri, Leonidas Guibas
We propose a novel technique for producing high-quality 3D models that match
a given target object image or scan. Our method is based on retrieving an
existing shape from a database of 3D models and then deforming its parts to
match the target shape. Unlike previous approaches that independently focus on
either shape retrieval or deformation, we propose a joint learning procedure
that simultaneously trains the neural deformation module along with the
embedding space used by the retrieval module. This enables our network to learn
a deformation-aware embedding space, so that retrieved models are more amenable
to match the target after an appropriate deformation. In fact, we use the
embedding space to guide the shape pairs used to train the deformation module,
so that it invests its capacity in learning deformations between meaningful
shape pairs. Furthermore, our novel part-aware deformation module can work with
inconsistent and diverse part-structures on the source shapes. We demonstrate
the benefits of our joint training not only on our novel framework, but also on
other state-of-the-art neural deformation modules proposed in recent years.
Lastly, we also show that our jointly-trained method outperforms various
non-joint baselines.
Authors' comments: CVPR '21 accepted paper
Jack Tyler, Alexander Wittig
In recent years, the retrieval of entire asteroids has received significant
attention, with many approaches leveraging the invariant manifolds of the
Circular-Restricted Three-body Problem to capture an asteroid into a periodic
orbit about the $L_1$ or $L_2$ points of the Sun-Earth system. Previous works
defined an `Easily Retrievable Object' (ERO) as any Near-Earth Object (NEO)
which is retrievable using these invariant manifolds with an impulsive $\Delta
v$ of less than $500$ m/s. We extend the previous literature by analysing, for
the first time, the Pareto fronts for the discovered EROs, using
high-performance computing to lift optimisation constraints used in previous
literature, and modifying the method used to filter unsuitable NEOs from the
NEO catalogue. In doing so, we can demonstrate that EROs have approximately the
same transfer cost for almost any possible transfer time, including
single-impulse transfers, which could offer significant flexibility to mission
designers. We also identify $44$ EROs, of which $27$ are new, and improve on
previously-known transfer solutions by up to $443$ m/s, including $17$ new
capture trajectories with $\Delta v$ costs of less than $100$ m/s.
Authors' comments: Updated to the accepted manuscript; to be published in Acta
Astronautica
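As an illustration of what a Pareto front over transfer time and $\Delta v$ means, the sketch below filters a list of (transfer time, $\Delta v$) pairs down to the non-dominated ones. It is a generic two-objective filter, not the authors' optimisation method, and the numbers in the usage example are invented.

```python
def pareto_front(points):
    """Return the non-dominated (time, delta_v) pairs, lower being better in both.

    A point is kept only if no other point is at least as good in both
    coordinates; the result is sorted by the first coordinate.
    """
    front = []
    best_second = float("inf")
    for p in sorted(set(points)):
        # After sorting by time, a point is non-dominated exactly when its
        # delta_v is strictly lower than every delta_v seen so far.
        if p[1] < best_second:
            front.append(p)
            best_second = p[1]
    return front


# Invented (transfer time in days, delta_v in m/s) candidates for one object.
candidates = [(100, 400), (200, 300), (150, 450), (300, 300), (250, 100)]
front = pareto_front(candidates)
```

A nearly flat front, where $\Delta v$ barely changes as transfer time varies, corresponds to the abstract's observation that cost is approximately the same for almost any transfer time.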
Svitlana Vakulenko, Nikos Voskarides, Zhucheng Tu, Shayne Longpre
Conversational passage retrieval relies on question rewriting to modify the
original question so that it no longer depends on the conversation history.
Several methods for question rewriting have recently been proposed, but they
were compared under different retrieval pipelines. We bridge this gap by
thoroughly evaluating those question rewriting methods on the TREC CAsT 2019
and 2020 datasets under the same retrieval pipeline. We analyze the effect of
different types of question rewriting methods on retrieval performance and show
that by combining question rewriting methods of different types we can achieve
state-of-the-art performance on both datasets.
Authors' comments: ECIR 2021 short paper
Haim Kaplan, Jay Tenenbaum
Locality Sensitive Hashing (LSH) is an effective method of indexing a set of items to support efficient nearest-neighbor queries in high-dimensional spaces. The basic idea of LSH is that similar items should produce hash collisions with higher probability than dissimilar items. We study LSH for (not necessarily convex) polygons, and use it to give efficient data structures for similar shape retrieval. Arkin et al. represent polygons by their "turning function", a function that follows the angle between the polygon's tangent and the $ x $-axis while traversing the perimeter of the polygon. They define the distance between polygons to be variations of the $ L_p $ (for $p=1,2$) distance between their turning functions. This metric is invariant under translation, rotation and scaling (and the selection of the initial point on the perimeter) and therefore models well the intuitive notion of shape resemblance. We develop and analyze LSH near neighbor data structures for several variations of the $ L_p $ distance for functions (for $p=1,2$). By applying our schemes to the turning functions of a collection of polygons we obtain efficient near neighbor LSH-based structures for polygons. To tune our structures to turning functions of polygons, we prove some new properties of these turning functions that may be of independent interest. As part of our analysis, we address the following problem, which is of independent interest: find the vertical translation of a function $ f $ that is closest in $ L_1 $ distance to a function $ g $. We prove tight bounds on the approximation guarantee obtained by the translation that equals the difference between the averages of $ g $ and $ f $.
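The vertical-translation problem at the end of this abstract can be made concrete with a discretised sketch. It assumes both functions are sampled on the same uniform grid, so the $L_1$ distance becomes a mean of absolute differences. In that discrete setting the optimal shift is a median of the pointwise differences (a standard fact about minimising a sum of absolute deviations), while the shift studied in the abstract is the difference of the averages. The function names and sample values are hypothetical.

```python
from statistics import mean, median


def l1_distance(f, g, shift=0.0):
    """Discretised L1 distance between f + shift and g on a shared uniform grid."""
    return sum(abs(a + shift - b) for a, b in zip(f, g)) / len(f)


def optimal_shift(f, g):
    """Shift minimising the discretised L1 distance: a median of g - f."""
    return median(b - a for a, b in zip(f, g))


def mean_shift(f, g):
    """The cheaper candidate shift: difference between the averages of g and f."""
    return mean(g) - mean(f)


# Made-up samples where g differs from f by a single large spike.
f = [0.0, 0.0, 0.0, 0.0, 0.0]
g = [0.0, 0.0, 0.0, 0.0, 10.0]
d_opt = l1_distance(f, g, optimal_shift(f, g))
d_mean = l1_distance(f, g, mean_shift(f, g))
```

The mean-based shift is never better than the median-based one in this discrete setting; how much worse it can be is the quantity the abstract reports tight bounds for.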