Qingnan Jiang, Mingxuan Wang, Jun Cao, Shanbo Cheng, Shujian Huang, Lei Li
How to effectively adapt neural machine translation (NMT) models according to
emerging cases without retraining? Despite the great success of neural machine
translation, updating the deployed models online remains a challenge. Existing
non-parametric approaches that retrieve similar examples from a database to
guide the translation process are promising but are prone to overfitting the
retrieved examples. In this work, we propose to learn Kernel-Smoothed
Translation with Example Retrieval (KSTER), an effective approach to adapt
neural machine translation models online. Experiments on domain adaptation and
multi-domain machine translation datasets show that even without expensive
retraining, KSTER achieves improvements of 1.1 to 1.5 BLEU points over
the best existing online adaptation methods. The code and trained models are
released at https://github.com/jiangqn/KSTER.
Authors' comments: EMNLP 2021
Zhijian Hou, Chong-Wah Ngo, Wing Kwong Chan
This paper tackles a recently proposed Video Corpus Moment Retrieval task.
This task is essential because advanced video retrieval applications should
enable users to retrieve a precise moment from a large video corpus. We propose
a novel CONtextual QUery-awarE Ranking (CONQUER) model for effective moment
localization and ranking. CONQUER explores query context for multi-modal fusion
and representation learning in two different steps. The first step derives
fusion weights for the adaptive combination of multi-modal video content. The
second step performs bi-directional attention to tightly couple video and query
as a single joint representation for moment localization. As query context is
fully engaged in video representation learning, from feature fusion to
transformation, the resulting feature is user-centered and has a larger
capacity for capturing multi-modal signals specific to the query. We conduct studies
on two datasets, TVR for closed-world TV episodes and DiDeMo for open-world
user-generated videos, to investigate the potential advantages of fusing video
and query online as a joint representation for moment retrieval.
Authors' comments: 10 pages, 4 figures, 2021 MultiMedia, code:
https://github.com/houzhijian/CONQUER
Pranav Aggarwal, Ritiz Tambi, Ajinkya Kale
There has been a recent spike in interest in multi-modal Language and Vision
problems. On the language side, most of these models primarily focus on English
since most multi-modal datasets are monolingual. We try to bridge this gap with
a zero-shot approach for learning multi-modal representations using
cross-lingual pre-training on the text side. We present a simple yet practical
approach for building a cross-lingual image retrieval model which trains on a
monolingual training dataset but can be used in a zero-shot cross-lingual
fashion during inference. We also introduce a new objective function which
tightens the text embedding clusters by pushing dissimilar texts away from each
other. For evaluation, we introduce a new 1K multi-lingual MSCOCO2014 caption
test dataset (XTD10) in 7 languages that we collected using a crowdsourcing
platform. We use this as the test set for zero-shot model performance across
languages. We also demonstrate how a cross-lingual model can be used for
downstream tasks like multi-lingual image tagging in a zero-shot manner. The XTD10
dataset is made publicly available here:
https://github.com/adobe-research/Cross-lingual-Test-Dataset-XTD10.
Authors' comments: Presented at Workshop on Multilingual Search, in conjunction with
30th The Web Conference 2021. arXiv admin note: substantial text overlap with
arXiv:2012.05107
David Bartusel, Hartmut Führ, Vignon Oussa
We study phase retrieval for group frames arising from permutation
representations, focusing on the action of the affine group of a finite field.
We investigate various versions of the phase retrieval problem, including
conjugate phase retrieval, sign retrieval, and matrix recovery. Our main result
establishes that the canonical irreducible representation of the affine group
$\mathbb{Z}_p \rtimes \mathbb{Z}_p^\ast$ (with $p$ prime), acting on the
vectors in $\mathbb{C}^{p}$ with zero sum, has the strongest retrieval
property, allowing the reconstruction of matrices from scalar products with a group
orbit consisting of rank-one projections. We explicitly characterize the
generating vectors that ensure this property, provide a linear matrix recovery
algorithm and explicit examples of vectors that allow matrix recovery. We also
comment on more general permutation representations.
Authors' comments: Slightly updated file, but no substantial changes
Misael Mongiovì, Aldo Gangemi
Verifying the veracity of claims requires reasoning over a large knowledge base, often in the form of corpora of trustworthy sources. A common approach consists in retrieving short portions of relevant text from the reference documents and giving them as input to a natural language inference module that determines whether the claim can be inferred from or is contradicted by them. This approach, however, struggles when multiple pieces of evidence need to be collected and combined from different documents, since the single documents are often barely related to the target claim and hence they are left out by the retrieval module. We conjecture that a graph-based approach can be beneficial to identify fragmented evidence. We tested this hypothesis by building, over the whole corpus, a large graph that interconnects text portions by means of mentioned entities and exploiting such a graph for identifying candidate sets of evidence from multiple sources. Our experiments show that leveraging a graph structure is beneficial in identifying a reasonably small portion of passages related to a claim.
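The graph idea in the abstract above can be illustrated with a minimal sketch. It assumes entity mentions have already been extracted for each passage; the function names, the `hops` parameter, and the toy passages are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Set


def build_entity_index(passages: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each entity to the ids of the passages that mention it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for pid, entities in passages.items():
        for entity in entities:
            index[entity].add(pid)
    return index


def candidate_evidence(claim_entities: Set[str],
                       passages: Dict[str, Set[str]],
                       hops: int = 2) -> Set[str]:
    """Collect passages reachable from the claim's entities within `hops` steps.

    Two passages count as connected when they mention a common entity
    (an assumption made for this illustration).
    """
    index = build_entity_index(passages)
    frontier = set(claim_entities)
    seen_entities: Set[str] = set()
    found: Set[str] = set()
    for _ in range(hops):
        next_frontier: Set[str] = set()
        for entity in frontier - seen_entities:
            seen_entities.add(entity)
            for pid in index.get(entity, ()):
                if pid not in found:
                    found.add(pid)
                    # Entities of a newly found passage seed the next hop.
                    next_frontier |= passages[pid]
        frontier = next_frontier
    return found


# Toy data: passage id -> entities mentioned in it (hypothetical).
toy_passages = {
    "p1": {"Marie Curie", "Nobel Prize"},
    "p2": {"Nobel Prize", "Stockholm"},
    "p3": {"Paris", "Eiffel Tower"},
}
one_hop = candidate_evidence({"Marie Curie"}, toy_passages, hops=1)
two_hops = candidate_evidence({"Marie Curie"}, toy_passages, hops=2)
```

With one hop only `p1` is found; the second hop also reaches `p2`, a passage that shares no entity with the claim itself, which is the kind of fragmented evidence the abstract refers to.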
Oleg Borisov, Mohammad Aliannejadi, Fabio Crestani
Recent research has shown that mixed-initiative conversational search, based
on the interaction between users and computers to clarify and improve a query,
provides enormous advantages. Nonetheless, incorporating additional information
provided by the user from the conversation poses some challenges. In fact,
further interactions could confuse the system as a user might use words
irrelevant to the information need but crucial for correct sentence
construction in the context of multi-turn conversations. To this end, in this
paper, we collect two conversational keyword extraction datasets and
propose an end-to-end document retrieval pipeline incorporating them.
Furthermore, we study the performance of two neural keyword extraction models,
namely, BERT and sequence to sequence, in terms of extraction accuracy and
human annotation. Finally, we study the effect of keyword extraction on the
end-to-end neural IR performance and show that our approach beats
state-of-the-art IR models. We make the two datasets publicly available to
foster research in this area.
Authors' comments: Accepted in IIR 2021
O. Munoz, E. Frattin, T. Jardiel, J. C. Gomez-Martin, F. Moreno, J. L. Ramos, D. Guirado, M. Peiteado et al.
We present the experimental phase function, degree of linear polarization
(DLP), and linear depolarization (deltaL) curves of a set of forsterite samples
representative of low-absorbing cosmic dust particles. The samples are prepared
using state-of-the-art size-segregating techniques to obtain narrow size
distributions spanning a broad range of the scattering size parameter domain.
We conclude that the behavior of the phase function at the side- and
back-scattering regions provides information on the size regime, the position
and magnitude of the maximum of the DLP curve are strongly dependent on
particle size, the negative polarization branch is mainly produced by particles
with size parameters in the range of approximately 6 to 20, and the deltaL is strongly
dependent on particle size at all measured phase angles except for the exact
backward direction. From a direct comparison of the experimental data with
computations for spherical particles, it becomes clear that the use of the
spherical model for simulating the phase function and DLP curves of irregular
dust produces dramatic errors in the retrieved composition and size of the
scattering particles: The experimental phase functions are reproduced by
assuming unrealistically high values of the imaginary part of the refractive
index. The spherical model does not reproduce the bell-shaped DLP curve of dust
particles with sizes in the resonance and/or geometric optics size domain.
Thus, the use of the Mie model for analyzing polarimetric observations might
prevent locating dust particles with sizes of the order of or larger than the
wavelength of the incident light.
Authors' comments: Published in ApJS
Eunhyek Joa, Yibo Sun, Francesco Borrelli
We address the problem of finding the current position and heading angle of an autonomous vehicle in real-time using a single camera. Compared to methods which require LiDARs and high definition (HD) 3D maps in real-time, the proposed approach is easily scalable and computationally efficient, at the price of lower precision. The new method combines and adapts existing algorithms in three different fields: image retrieval, mapping database, and particle filtering. The result is a simple, real-time localization method using an image retrieval method whose performance is comparable to other monocular camera localization methods which use a map built with LiDARs. We evaluate the proposed method using the KITTI odometry dataset and via closed-loop experiments with an indoor 1:10 autonomous vehicle. The tests demonstrate real-time capability and 10 cm-level accuracy. Also, experimental results of the closed-loop indoor tests show the presence of a positive feedback loop between the localization error and the control error. This phenomenon is analysed in detail at the end of the article.
Hao Fu, Yan Wang, Ruihua Song, Tianran Hu, Jianyun Nie
The ability of a dialog system to express consistent language style during conversations has a direct, positive impact on its usability and on user satisfaction. Although previous studies have demonstrated that style transfer is feasible with a large amount of parallel data, it is often impossible to collect such data for different styles. In this paper, instead of manually constructing conversation data with a certain style, we propose a flexible framework that adapts a generic retrieval-based dialogue system to mimic the language style of a specified persona without any parallel data. Our approach is based on automatic generation of stylized data by learning the usage of jargon, and then rewriting the generic conversations into stylized ones by incorporating the jargon. In experiments, we implemented dialogue systems with five distinct language styles, and the results show that our framework significantly outperforms baselines in terms of the average score of responses' relevance and style degree, and content diversity. A/B testing on a commercial chatbot shows that users are more satisfied with our system. This study demonstrates the feasibility of building stylistic dialogue systems by simple data augmentation.
Tianyi Liu, Andreas M. Tillmann, Yang Yang, Yonina C. Eldar, Marius Pesavento
Phase retrieval aims at reconstructing unknown signals from magnitude
measurements of linear mixtures. In this paper, we consider the phase retrieval
with dictionary learning problem, which includes the additional prior
information that the measured signal admits a sparse representation over an
unknown dictionary. The task is to jointly estimate the dictionary and the
sparse representation from magnitude-only measurements. To this end, we study
two complementary formulations and develop efficient parallel algorithms by
extending the successive convex approximation framework using a smooth
majorization. The first algorithm is termed compact-SCAphase and is preferable
in the case of less diverse mixture models. It employs a compact formulation
that avoids the use of auxiliary variables. The proposed algorithm is highly
scalable and has reduced parameter tuning cost. The second algorithm, referred
to as SCAphase, uses auxiliary variables and is favorable in the case of highly
diverse mixture models. It also allows simple incorporation of additional side
constraints. The performance of both methods is evaluated when applied to blind
sparse channel estimation from subband magnitude measurements in a
multi-antenna random access network. Simulation results demonstrate the
efficiency of the proposed techniques compared to state-of-the-art methods.
Authors' comments: © 2023 IEEE. Personal use of this material is permitted.
Permission from IEEE must be obtained for all other uses, in any current or
future media, including reprinting/republishing this material for advertising
or promotional purposes, creating new collective works, for resale or
redistribution to servers or lists, or reuse of any copyrighted component of
this work in other works
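The setting described in the phase retrieval with dictionary learning abstract above can be written compactly as follows. The symbols $A$, $D$, $Z$, $Y$ and $\lambda$ are notation assumed here for illustration, and the regularized least-squares objective is only one plausible way to pose the joint estimation; the paper's two formulations may differ in detail.

```latex
% Plain phase retrieval: recover a signal x from magnitudes of linear mixtures,
% where A is the known mixing (measurement) matrix and |.| acts entrywise.
y = |A x|

% Prior: each signal is sparse over an unknown dictionary D, so a batch of
% signals is X = D Z with sparse codes Z, and the measurements become
Y = |A D Z|

% One possible joint estimation problem (lambda > 0 weights the sparsity term):
\min_{D,\,Z} \; \tfrac{1}{2}\,\bigl\| Y - |A D Z| \bigr\|_F^2 + \lambda \,\| Z \|_1
```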
Jinpeng Wang, Ziyun Zeng, Bin Chen, Tao Dai, Shu-Tao Xia
The high efficiency in computation and storage makes hashing (including
binary hashing and quantization) a common strategy in large-scale retrieval
systems. To alleviate the reliance on expensive annotations, unsupervised deep
hashing has become an important research problem. This paper provides a novel
solution to unsupervised deep quantization, namely Contrastive Quantization
with Code Memory (MeCoQ). Different from existing reconstruction-based
strategies, we learn unsupervised binary descriptors by contrastive learning,
which can better capture discriminative visual semantics. In addition, we find
that codeword diversity regularization is critical to prevent contrastive
learning-based quantization from model degeneration. Moreover, we introduce a
novel quantization code memory module that boosts contrastive learning with
lower feature drift than conventional feature memories. Extensive experiments
on benchmark datasets show that MeCoQ outperforms state-of-the-art methods.
Code and configurations are publicly available at
https://github.com/gimpong/AAAI22-MeCoQ.
Authors' comments: Accepted for AAAI'22 (Oral). 9 pages, 4 figures, 3 tables
Weiqin Zou, Enming Li, Chunrong Fang
Static bug localization techniques that locate bugs at method granularity have gained much attention from both researchers and practitioners. For a static method-level bug localization technique, a key but challenging step is to fully retrieve the semantics of methods and bug reports. Currently, existing studies mainly use the same bag-of-words space to represent the semantics of methods and bug reports without considering structural information of methods and textual contexts of bug reports, which largely and negatively affects bug localization performance. To address this problem, we develop BLESER, a new bug localization technique based on enhanced semantic retrieval. Specifically, we use an AST-based code embedding model (capturing code structure better) to retrieve the semantics of methods, and word embedding models (capturing textual contexts better) to represent the semantics of bug reports. Then, a deep learning model is built on the enhanced semantic representations. During model building, we compare five typical word embedding models in representing bug reports and try to explore the usefulness of re-sampling strategies and cost-sensitive strategies in handling class imbalance problems. We evaluate BLESER on five Java projects from the Defects4J dataset. We find that: (1) On the whole, the word embedding model ELMo outperformed the other four models (including word2vec, BERT, etc.) in facilitating bug localization techniques. (2) Among four strategies aiming at solving class imbalance problems, the strategy ROS (random over-sampling) performed much better than the other three strategies (including random under-sampling, Focal Loss, etc.). (3) By integrating ELMo and ROS into BLESER, at method-level bug localization, we could achieve MAP of 0.108-0.504, MRR of 0.134-0.510, and Accuracy@1 of 0.125-0.5 on five Defects4J projects.
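The three ranking metrics reported in the abstract above (MAP, MRR, Accuracy@1) follow standard definitions, sketched below for a ranked list of candidate methods per bug report. The function names and the toy rankings are hypothetical; average precision is normalized here by the number of relevant items, which is one common convention and may differ from the evaluation script used in the paper.

```python
from typing import List, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is ranked."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@rank taken at the rank of each relevant item."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def accuracy_at_1(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1.0 if the top-ranked item is relevant, else 0.0."""
    return 1.0 if ranked and ranked[0] in relevant else 0.0


def mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


# Toy data: (ranked method ids, ids of the truly buggy methods) per bug report.
reports: List[Tuple[List[str], Set[str]]] = [
    (["m3", "m1", "m7"], {"m1"}),
    (["m2", "m9", "m4"], {"m2", "m4"}),
]
mrr = mean([reciprocal_rank(r, rel) for r, rel in reports])
map_score = mean([average_precision(r, rel) for r, rel in reports])
acc_at_1 = mean([accuracy_at_1(r, rel) for r, rel in reports])
```

Each metric is averaged over bug reports, so a range such as "MAP of 0.108-0.504" in the abstract reads as one averaged value per project.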
Man Luo, Yankai Zeng, Pratyay Banerjee, Chitta Baral
Knowledge-based visual question answering (VQA) requires answering questions
with external knowledge in addition to the content of images. One dataset that
is mostly used in evaluating knowledge-based VQA is OK-VQA, but it lacks a gold
standard knowledge corpus for retrieval. Existing work leverages different
knowledge bases (e.g., ConceptNet and Wikipedia) to obtain external knowledge.
Because of varying knowledge bases, it is hard to fairly compare models'
performance. To address this issue, we collect a natural language knowledge
base that can be used for any VQA system. Moreover, we propose a Visual
Retriever-Reader pipeline to approach knowledge-based VQA. The visual retriever
aims to retrieve relevant knowledge, and the visual reader seeks to predict
answers based on given knowledge. We introduce various ways to retrieve
knowledge using text and images and two reader styles: classification and
extraction. Both the retriever and reader are trained with weak supervision.
Our experimental results show that a good retriever can significantly improve
the reader's performance on the OK-VQA challenge. The code and corpus are
provided at https://github.com/luomancs/retriever_reader_for_okvqa.git
Authors' comments: accepted at EMNLP 2021
Young Kyun Jang, Nam Ik Cho
Supervised deep learning-based hash and vector quantization are enabling fast
and large-scale image retrieval systems. By fully exploiting label annotations,
they are achieving outstanding retrieval performance compared to the
conventional methods. However, it is painstaking to assign labels precisely for
a vast amount of training data, and also, the annotation process is
error-prone. To tackle these issues, we propose the first deep unsupervised
image retrieval method dubbed Self-supervised Product Quantization (SPQ)
network, which is label-free and trained in a self-supervised manner. We design
a Cross Quantized Contrastive learning strategy that jointly learns codewords
and deep visual descriptors by comparing individually transformed images
(views). Our method analyzes the image contents to extract descriptive
features, allowing us to understand image representations for accurate
retrieval. By conducting extensive experiments on benchmarks, we demonstrate
that the proposed method yields state-of-the-art results even without
supervised pretraining.
Authors' comments: ICCV 2021
Peter C. Dillinger, Lorenz Hübschle-Schneider, Peter Sanders, Stefan Walzer
A retrieval data structure for a static function $f:S\rightarrow \{0,1\}^r$ supports queries that return $f(x)$ for any $x \in S$. Retrieval data structures can be used to implement a static approximate membership query data structure (AMQ), i.e., a Bloom filter alternative, with false positive rate $2^{-r}$. The information-theoretic lower bound for both tasks is $r|S|$ bits. While succinct theoretical constructions using $(1+o(1))r|S|$ bits were known, these could not achieve very small overheads in practice because they have an unfavorable space--time tradeoff hidden in the asymptotic costs or because small overheads would only be reached for physically impossible input sizes. With bumped ribbon retrieval (BuRR), we present the first practical succinct retrieval data structure. In an extensive experimental evaluation BuRR achieves space overheads well below 1% while being faster than most previously used retrieval data structures (typically with space overheads at least an order of magnitude larger) and faster than classical Bloom filters (with space overhead $\geq 44\,\%$). This efficiency, including favorable constants, stems from a combination of simplicity, word parallelism, and high locality. We additionally describe homogeneous ribbon filter AMQs, which are even simpler and faster at the price of slightly larger space overhead.
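The 44 % overhead quoted for classical Bloom filters in the abstract above matches the standard space analysis of a Bloom filter, sketched here. The symbols $m$ (number of bits), $n = |S|$ (number of keys), $k$ (number of hash functions) and $\varepsilon$ (false positive rate) are notation introduced for this derivation, which uses the usual approximation for the false positive rate.

```latex
% With the optimal number of hash functions k = (m/n) \ln 2, a Bloom filter
% has false positive rate approximately
\varepsilon \approx 2^{-k} = 2^{-(m/n)\ln 2}

% Solving for the number of bits m:
m = \frac{n \,\log_2(1/\varepsilon)}{\ln 2} = \log_2(e)\; n \,\log_2(1/\varepsilon)

% For the target rate \varepsilon = 2^{-r} and n = |S|:
m = \log_2(e)\; r\,|S| \approx 1.44\; r\,|S|

% This is about 44% above the information-theoretic lower bound r|S|.
```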
Zixian Huang, Ao Wu, Yulin Shen, Gong Cheng, Yuzhong Qu
Scenario-based question answering (SQA) requires retrieving and reading
paragraphs from a large corpus to answer a question which is contextualized by
a long scenario description. Since a scenario contains both keyphrases for
retrieval and much noise, retrieval for SQA is extremely difficult. Moreover,
it can hardly be supervised due to the lack of relevance labels of paragraphs
for SQA. To meet the challenge, in this paper we propose a joint
retriever-reader model called JEEVES where the retriever is implicitly
supervised only using QA labels via a novel word weighting mechanism. JEEVES
significantly outperforms a variety of strong baselines on multiple-choice
questions in three SQA datasets.
Authors' comments: 10 pages, accepted to Findings of EMNLP 2021
Michael Glass, Gaetano Rossiello, Md Faisal Mahbub Chowdhury, Alfio Gliozzo
Automatically inducing high quality knowledge graphs from a given collection
of documents remains a challenging problem in AI. One way to make headway
on this problem is through advances in a related task known as slot
filling. In this task, given an entity query in the form of [Entity, Slot, ?], a
system is asked to fill the slot by generating or extracting the missing value
exploiting evidence extracted from relevant passage(s) in the given document
collection. Recent work in the field tries to solve this task in an
end-to-end fashion using retrieval-based language models. In this paper, we
present a novel approach to zero-shot slot filling that extends dense passage
retrieval with hard negatives and robust training procedures for retrieval
augmented generation models. Our model reports large improvements on both T-REx
and zsRE slot filling datasets, improving both passage retrieval and slot value
generation, and ranking at the top-1 position in the KILT leaderboard.
Moreover, we demonstrate the robustness of our system showing its domain
adaptation capability on a new variant of the TACRED dataset for slot filling,
through a combination of zero/few-shot learning. We release the source code and
pre-trained models.
Authors' comments: Accepted at EMNLP 2021. arXiv admin note: substantial text overlap
with arXiv:2104.08610
HongChien Yu, Chenyan Xiong, Jamie Callan
Dense retrieval systems conduct first-stage retrieval using embedded
representations and simple similarity metrics to match a query to documents.
Their effectiveness depends on the encoded embeddings capturing the semantics of
queries and documents, a challenging task due to the shortness and ambiguity of
search queries. This paper proposes ANCE-PRF, a new query encoder that uses
pseudo relevance feedback (PRF) to improve query representations for dense
retrieval. ANCE-PRF uses a BERT encoder that consumes the query and the top
retrieved documents from a dense retrieval model, ANCE, and it learns to
produce better query embeddings directly from relevance labels. It also keeps
the document index unchanged to reduce overhead. ANCE-PRF significantly
outperforms ANCE and other recent dense retrieval systems on several datasets.
Analysis shows that the PRF encoder effectively captures the relevant and
complementary information from PRF documents, while ignoring the noise with its
learned attention mechanism.
Authors' comments: Accepted at CIKM 2021
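The first-stage dense retrieval step described in the ANCE-PRF abstract above (embedded representations matched with a simple similarity metric) can be sketched as follows. The dot product is assumed as the similarity, and the tiny hand-written vectors stand in for encoder outputs; the function names and data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def dense_retrieve(query_vec: Sequence[float],
                   doc_index: Dict[str, Sequence[float]],
                   k: int = 2) -> List[Tuple[str, float]]:
    """Return the k documents whose embeddings score highest against the query."""
    scored = [(doc_id, dot(query_vec, vec)) for doc_id, vec in doc_index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical precomputed document embeddings (the fixed document index).
doc_index = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.1, 0.8, 0.1],
    "d3": [0.7, 0.2, 0.1],
}
query_vec = [1.0, 0.0, 0.0]
top_docs = dense_retrieve(query_vec, doc_index, k=2)
```

In a pseudo relevance feedback setup of the kind the abstract describes, only `query_vec` would be replaced by a refined embedding computed from the query and the top retrieved documents, while `doc_index` stays unchanged.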
Shengyao Zhuang, Guido Zuccon
Passage retrieval and ranking is a key task in open-domain question answering
and information retrieval. Current effective approaches mostly rely on
pre-trained deep language model-based retrievers and rankers. These methods
have been shown to effectively model the semantic matching between queries and
passages, even in the presence of keyword mismatch, i.e. passages that are relevant
to a query but do not contain important query keywords. In this paper we
consider the Dense Retriever (DR), a passage retrieval method, and the BERT
re-ranker, a popular passage re-ranking method. In this context, we formally
investigate how these models respond and adapt to a specific type of keyword
mismatch -- that caused by keyword typos occurring in queries. Through
empirical investigation, we find that typos can lead to a significant drop in
retrieval and ranking effectiveness. We then propose a simple typos-aware
training framework for DR and BERT re-ranker to address this issue. Our
experimental results on the MS MARCO passage ranking dataset show that, with
our proposed typos-aware training, DR and BERT re-ranker can become robust to
typos in queries, resulting in significantly improved effectiveness compared to
models trained without appropriately accounting for typos.
Authors' comments: Short paper, accepted at EMNLP2021 main conference
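One way to picture the typos-aware training mentioned in the abstract above is as query augmentation: during training, some queries are replaced by copies containing a synthetic typo. The sketch below is an assumption-laden illustration, not the authors' implementation; the four typo operations, the `typo_prob` parameter, and all function names are hypothetical.

```python
import random
import string
from typing import List


def insert_char(word: str, pos: int, ch: str) -> str:
    """Insert ch before position pos."""
    return word[:pos] + ch + word[pos:]


def delete_char(word: str, pos: int) -> str:
    """Drop the character at position pos."""
    return word[:pos] + word[pos + 1:]


def substitute_char(word: str, pos: int, ch: str) -> str:
    """Replace the character at position pos with ch."""
    return word[:pos] + ch + word[pos + 1:]


def swap_neighbors(word: str, pos: int) -> str:
    """Swap the characters at pos and pos + 1 (no-op at the last position)."""
    if pos + 1 >= len(word):
        return word
    return word[:pos] + word[pos + 1] + word[pos] + word[pos + 2:]


def add_typo(query: str, rng: random.Random) -> str:
    """Apply one random typo to one word longer than two characters."""
    words = query.split()
    candidates = [i for i, w in enumerate(words) if len(w) > 2]
    if not candidates:
        return query
    i = rng.choice(candidates)
    word = words[i]
    pos = rng.randrange(len(word) - 1)
    ch = rng.choice(string.ascii_lowercase)
    op = rng.choice(("insert", "delete", "substitute", "swap"))
    if op == "insert":
        words[i] = insert_char(word, pos, ch)
    elif op == "delete":
        words[i] = delete_char(word, pos)
    elif op == "substitute":
        words[i] = substitute_char(word, pos, ch)
    else:
        words[i] = swap_neighbors(word, pos)
    return " ".join(words)


def augment_queries(queries: List[str], typo_prob: float = 0.5,
                    seed: int = 0) -> List[str]:
    """Replace each query by a typo'd variant with probability typo_prob."""
    rng = random.Random(seed)
    return [add_typo(q, rng) if rng.random() < typo_prob else q for q in queries]
```

The augmented queries would then be fed to the retriever or re-ranker with the same relevance labels as the clean queries, so the model sees both spellings during training.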
N S Kamal, Barathi Ganesh HB, Sajith Variyar VV, Sowmya V, Soman KP
Manufacturing industries have widely adopted the reuse of machine parts as a
method to reduce costs and as a sustainable manufacturing practice.
Identifying reusable features in the design of the parts and finding
similar features in the database is an important part of this process.
In this project, with the help of fully convolutional geometric features, we
are able to extract and learn the high level semantic features from CAD models
with inductive transfer learning. The extracted features are then compared with
those of other CAD models from the database using the Frobenius norm, and identical
features are retrieved. We then passed the extracted features to a deep
convolutional neural network with a spatial pyramid pooling layer and the
performance of the feature retrieval increased significantly. It was evident
from the results that the model could effectively capture the geometrical
elements from machining features.
Authors' comments: Submitted to 9th International Conference on Frontiers of Intelligent
Computing: Theory and Applications (FICTA 2021)
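The Frobenius-norm comparison mentioned in the abstract above can be illustrated with a small sketch: each feature is a matrix, and the database entry with the smallest Frobenius distance to the query is returned. The 2x2 matrices and the feature names are hypothetical placeholders for the learned features, and the function names are not from the paper.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def frobenius_distance(a: Matrix, b: Matrix) -> float:
    """Frobenius norm of (a - b): square root of the sum of squared entry differences."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(a, b)
                         for x, y in zip(ra, rb)))


def most_similar(query: Matrix, database: Dict[str, Matrix]) -> Tuple[str, float]:
    """Return the name and distance of the database feature closest to the query."""
    return min(((name, frobenius_distance(query, feat))
                for name, feat in database.items()),
               key=lambda pair: pair[1])


# Hypothetical feature matrices for three stored machining features.
database = {
    "slot": [[1.0, 0.0], [0.0, 1.0]],
    "hole": [[0.0, 1.0], [1.0, 0.0]],
    "pocket": [[1.0, 0.0], [0.0, 0.0]],
}
query = [[1.0, 0.0], [0.0, 1.0]]
best_name, best_distance = most_similar(query, database)
```

A distance of zero corresponds to the "identical features" case in the abstract; a threshold on the distance would be needed to retrieve merely similar features.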