Chao-Chun Hsu, Eric Lind, Luca Soldaini, Alessandro Moschitti
Recent advancements in transformer-based models have greatly improved the
ability of Question Answering (QA) systems to provide correct answers; in
particular, answer sentence selection (AS2) models, core components of
retrieval-based systems, have achieved impressive results. While generally
effective, these models fail to provide a satisfying answer when all retrieved
candidates are of poor quality, even if they contain correct information. In
AS2, models are trained to select the best answer sentence among a set of
candidates retrieved for a given question. In this work, we propose to generate
answers from a set of AS2 top candidates. Rather than selecting the best
candidate, we train a sequence-to-sequence transformer model to generate an
answer from a candidate set. Our tests on three English AS2 datasets show
improvements of up to 32 absolute points in accuracy over the state of the art.
Authors' comments: Short paper, Accepted at Findings of ACL 2021
Ikuya Yamada, Akari Asai, Hannaneh Hajishirzi
Most state-of-the-art open-domain question answering systems use a neural
retrieval model to encode passages into continuous vectors and extract them
from a knowledge source. However, such retrieval models often require large
memory to run because of the massive size of their passage index. In this
paper, we introduce Binary Passage Retriever (BPR), a memory-efficient neural
retrieval model that integrates a learning-to-hash technique into the
state-of-the-art Dense Passage Retriever (DPR) to represent the passage index
using compact binary codes rather than continuous vectors. BPR is trained with
a multi-task objective over two tasks: efficient candidate generation based on
binary codes and accurate reranking based on continuous vectors. Compared with
DPR, BPR substantially reduces the memory cost from 65GB to 2GB without a loss
of accuracy on two standard open-domain question answering benchmarks: Natural
Questions and TriviaQA. Our code and trained models are available at
https://github.com/studio-ousia/bpr.
Authors' comments: ACL 2021
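The two-stage idea described above (candidate generation on binary codes, then reranking with continuous vectors) can be sketched as follows. This is a minimal illustration, not the released BPR code: the sign-based hashing, the choice to rerank with the continuous query vector against the passage codes mapped to +1/-1, and all names and toy values are assumptions made for the example.

```python
def binarize(vec):
    """Sign-based hashing (assumed here): one bit per dimension, first dimension = highest bit."""
    code = 0
    for value in vec:
        code = (code << 1) | (1 if value >= 0 else 0)
    return code


def hamming(a, b):
    """Hamming distance between two binary codes stored as Python ints."""
    return bin(a ^ b).count("1")


def code_to_signs(code, n_bits):
    """Map a binary code back to a +1/-1 vector, in the same bit order as binarize."""
    return [1 if (code >> (n_bits - 1 - i)) & 1 else -1 for i in range(n_bits)]


def two_stage_search(query_vec, passage_codes, n_candidates, top_k):
    n_bits = len(query_vec)
    query_code = binarize(query_vec)
    # Stage 1: cheap candidate generation by Hamming distance on binary codes.
    candidates = sorted(
        range(len(passage_codes)),
        key=lambda i: hamming(query_code, passage_codes[i]),
    )[:n_candidates]

    # Stage 2: rerank candidates with the continuous query vector.
    def score(i):
        signs = code_to_signs(passage_codes[i], n_bits)
        return sum(q * s for q, s in zip(query_vec, signs))

    return sorted(candidates, key=score, reverse=True)[:top_k]


# Toy index of four 4-bit passage codes (hypothetical values).
passage_codes = [0b1110, 0b0010, 0b0101, 0b1100]
query = [0.1, -0.9, 0.8, -0.1]
result = two_stage_search(query, passage_codes, n_candidates=3, top_k=2)
print(result)
```

In this toy index, passages 0 and 1 are equally close to the query in Hamming distance, and the reranking step separates them because the continuous query keeps magnitude information that the binary code discards.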
Fangwei Ye, Salim El Rouayheb
We study the problem of intermittent private information retrieval with multiple servers, in which a user consecutively requests one of K messages from N replicated databases such that some of the requests need to be protected while others do not need privacy. Motivated by the location privacy application, the correlation between requests is modeled by a Markov chain. We propose an intermittent private information retrieval scheme that concatenates an obfuscation scheme and a private information retrieval scheme for the time period when privacy is not needed, to prevent leakage incurred by the correlation over time. Finally, we illustrate how the proposed scheme for intermittent private information retrieval with Markov-structured correlation can be applied to design a location privacy protection mechanism.
Juliang Li, P. Barry, C. Chang
This article presents a pedagogical explanation of retrieving the resonance parameters $Q_{L}$, $Q_{o}$ and $Q_{c}$ from both reflection and transmission measurements of a microwave resonator. Here $Q_{L}$ stands for the total or loaded quality factor (Q), $Q_{o}$ is the internal Q and $Q_{c}$ is the coupling or external Q. MATLAB code based on the methods is available for download for direct calculation of the Qs.\cite{lighq}
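The three quality factors are linked by the standard relation $1/Q_{L} = 1/Q_{o} + 1/Q_{c}$, so any two determine the third. The short sketch below only encodes that relation; it is not the fitting procedure of the article, and the function names and numerical values are illustrative assumptions.

```python
def loaded_q(q_internal, q_coupling):
    """Loaded Q from the standard relation 1/Q_L = 1/Q_o + 1/Q_c."""
    return 1.0 / (1.0 / q_internal + 1.0 / q_coupling)


def internal_q(q_loaded, q_coupling):
    """Invert the same relation to recover Q_o from Q_L and Q_c."""
    return 1.0 / (1.0 / q_loaded - 1.0 / q_coupling)


# Hypothetical resonator: internal Q of 50,000 and coupling Q of 20,000.
q_l = loaded_q(5.0e4, 2.0e4)
print(q_l, internal_q(q_l, 2.0e4))
```

When $Q_{o} = Q_{c}$ the loaded Q is half of either value, which is a quick sanity check on fitted results.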
Shuai Bai, Zhedong Zheng, Xiaohan Wang, Junyang Lin, Zhu Zhang, Chang Zhou, Yi Yang, Hongxia Yang
Vehicle search is a basic task for efficient traffic management in the
context of the AI City. Most existing practices focus on image-based vehicle
matching, including vehicle re-identification and vehicle tracking. In this
paper, we apply a new modality, i.e., the language description, to search for the
vehicle of interest and explore the potential of this task in real-world
scenarios. Natural language-based vehicle search poses a new challenge of
fine-grained understanding of both vision and language modalities. To connect
language and vision, we propose to jointly train state-of-the-art vision
models with a transformer-based language model in an end-to-end manner.
Beyond the network structure design and the training strategy, several
optimization objectives are also revisited in this work. Qualitative and
quantitative experiments verify the effectiveness of the proposed method. Our
proposed method achieved 1st place in the 5th AI City Challenge,
yielding a competitive 18.69% MRR accuracy on the private test set.
We hope this work can pave the way for the future study on using language
description effectively and efficiently for real-world vehicle retrieval
systems. The code will be available at
https://github.com/ShuaiBai623/AIC2021-T5-CLV.
Authors' comments: CVPR 2021 AI CITY CHALLENGE Natural Language-Based Vehicle Retrieval
Top 1
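For readers unfamiliar with the MRR figure quoted above, mean reciprocal rank averages 1/rank of the correct item over all queries. The sketch below is a generic implementation with made-up toy data; it is not the challenge's evaluation script, and the function and variable names are assumptions.

```python
def mean_reciprocal_rank(ranked_lists, targets):
    """Average of 1/rank of the target in each ranking (0 if the target is missing)."""
    total = 0.0
    for ranking, target in zip(ranked_lists, targets):
        for position, item in enumerate(ranking, start=1):
            if item == target:
                total += 1.0 / position
                break
    return total / len(ranked_lists)


# Three toy text queries, each with a ranked list of vehicle track ids.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
targets = ["a", "a", "z"]  # the last target is never retrieved
mrr = mean_reciprocal_rank(rankings, targets)
print(mrr)
```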
Papri Dey, Dan Edidin
Let ${\mathcal A} = \{A_{1},\dots,A_{r}\}$ be a collection of linear
operators on ${\mathbb R}^m$. The degeneracy locus of ${\mathcal A}$ is defined
as the set of points $x \in {\mathbb P}^{m-1}$ for which
$\operatorname{rank}([A_1 x \ \dots \ A_{r} x]) \leq m-1$. Motivated by results in phase retrieval, we study
degeneracy loci of four linear operators on ${\mathbb R}^3$ and prove that the
degeneracy locus consists of 6 real points obtained by intersecting four real
lines if and only if the collection of matrices lies in the linear span of four
fixed rank one operators. We also relate such {\em quadrilateral
configurations} to the singularity locus of the corresponding Cayley cubic
symmetroid. More generally, we show that if $A_i , i = 1, \dots, m + 1$ are in
the linear span of $m + 1$ fixed rank-one matrices, the degeneracy locus
determines a {\em generalized Desargues configuration} which corresponds to a
Sylvester spectrahedron.
Authors' comments: This paper is incomplete
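The definition of the degeneracy locus can be checked numerically on a small example. The sketch below assumes four rank-one operators $A_i = u_i v_i^T$ on ${\mathbb R}^3$ with hand-picked integer vectors (these particular vectors are an assumption for illustration, not taken from the paper). Since $A_i x = (v_i \cdot x) u_i$, two columns of $[A_1 x \ \dots \ A_4 x]$ vanish wherever two of the lines $v_i \cdot x = 0$ meet, which gives the six intersection points of four lines.

```python
from fractions import Fraction
from itertools import combinations


def rank(rows):
    """Exact matrix rank by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


# Hypothetical rank-one operators A_i = u_i v_i^T (illustrative choice).
U = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
V = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]


def stacked_matrix(x):
    """The 3 x 4 matrix [A_1 x ... A_4 x], using A_i x = (v_i . x) u_i."""
    columns = [[dot(v, x) * c for c in u] for u, v in zip(U, V)]
    return [list(row) for row in zip(*columns)]


def in_degeneracy_locus(x):
    return rank(stacked_matrix(x)) <= len(x) - 1


# Pairwise intersections of the four lines v_i . x = 0 in the projective plane.
points = [cross(V[i], V[j]) for i, j in combinations(range(4), 2)]
print(points, [in_degeneracy_locus(p) for p in points])
```

A generic point such as (1, 2, 3) lies on none of the four lines, so all four columns are nonzero multiples of the $u_i$ and the rank is 3.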
Shuhuai Ren, Junyang Lin, Guangxiang Zhao, Rui Men, An Yang, Jingren Zhou, Xu Sun, Hongxia Yang
Despite the achievements of large-scale multimodal pre-training approaches,
cross-modal retrieval, e.g., image-text retrieval, remains a challenging task.
To bridge the semantic gap between the two modalities, previous studies mainly
focus on word-region alignment at the object level, lacking the matching
between the linguistic relation among the words and the visual relation among
the regions. The neglect of such relation consistency impairs the
contextualized representation of image-text pairs and hinders the model
performance and the interpretability. In this paper, we first propose a novel
metric, Intra-modal Self-attention Distance (ISD), to quantify the relation
consistency by measuring the semantic distance between linguistic and visual
relations. In response, we present Inter-modal Alignment on Intra-modal
Self-attentions (IAIS), a regularized training method to optimize the ISD and
calibrate intra-modal self-attentions from the two modalities mutually via
inter-modal alignment. The IAIS regularizer boosts the performance of
prevailing models on Flickr30k and MS COCO datasets by a considerable margin,
which demonstrates the superiority of our approach.
Authors' comments: Accepted by ACL-IJCNLP 2021 main conference (Long Paper)
Yijiang Lian, Shuang Li, Chaobing Feng, YanFeng Zhu
Synonymous keyword retrieval has become an important problem for sponsored search ever since major search engines relaxed the exact match product's matching requirement to a synonymous level. Since the synonymous relations between queries and keywords are quite scarce, the traditional information retrieval framework is inefficient in this scenario. In this paper, we propose a novel quotient space-based retrieval framework to address this problem. Considering the synonymy among keywords as a mathematical equivalence relation, we can compress the synonymous keywords into one representative, and the corresponding quotient space would greatly reduce the size of the keyword repository. Then an embedding-based retrieval is directly conducted between queries and the keyword representatives. To mitigate the semantic gap of the quotient space-based retrieval, a single semantic siamese model is utilized to detect both the keyword-keyword and query-keyword synonymous relations. The experiments show that with our quotient space-based retrieval method, the synonymous keyword retrieving performance can be greatly improved in terms of memory cost and recall efficiency. This method has been successfully implemented in Baidu's online sponsored search system and has yielded a significant improvement in revenue.
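Treating synonymy as an equivalence relation means the keyword repository can be collapsed to one representative per class. A union-find structure is one common way to build such classes; the sketch below is an illustration under that assumption, with invented keywords and a lexicographic rule for picking representatives, and does not reflect the production system.

```python
class SynonymClasses:
    """Union-find over keywords; each equivalence class has one representative."""

    def __init__(self):
        self.parent = {}

    def find(self, keyword):
        self.parent.setdefault(keyword, keyword)
        root = keyword
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node directly at the root.
        while self.parent[keyword] != root:
            self.parent[keyword], keyword = root, self.parent[keyword]
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            # Deterministic (assumed) rule: keep the lexicographically smaller root.
            keep, drop = sorted((root_a, root_b))
            self.parent[drop] = keep


keywords = ["cheap flights", "low cost flights", "budget flights",
            "car rental", "rent a car", "hotel deals"]
classes = SynonymClasses()
classes.union("cheap flights", "low cost flights")
classes.union("budget flights", "low cost flights")
classes.union("car rental", "rent a car")

# The quotient space: one representative per synonym class.
representatives = sorted({classes.find(k) for k in keywords})
print(representatives)
```

Retrieval then only needs to index the representatives, and any keyword in a matched class can be recovered from the class membership.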
Xingyi Yang, Muchao Ye, Quanzeng You, Fenglong Ma
Medical report generation is one of the most challenging tasks in medical
image analysis. Although existing approaches have achieved promising results,
they either require a predefined template database in order to retrieve
sentences or ignore the hierarchical nature of medical report generation. To
address these issues, we propose MedWriter, which incorporates a novel
hierarchical retrieval mechanism to automatically extract both report-level and
sentence-level templates for clinically accurate report generation. MedWriter
first employs the Visual-Language Retrieval (VLR) module to retrieve the most
relevant reports for the given images. To guarantee the logical coherence
between sentences, the Language-Language Retrieval (LLR) module is introduced
to retrieve relevant sentences based on the previously generated description.
Finally, a language decoder fuses image features and features from retrieved
reports and sentences to generate meaningful medical reports. We verified the
effectiveness of our model by automatic evaluation and human evaluation on two
datasets, i.e., Open-I and MIMIC-CXR.
Authors' comments: Accepted by ACL 2021, Camera-ready version
Ming-Hsun Yang, Y. -W. Peter Hong, Jwo-Yuh Wu
Conventional sparse phase retrieval schemes can recover sparse signals from the magnitude of linear measurements only up to a global phase ambiguity. This work proposes a novel approach that instead utilizes the magnitude of affine measurements to achieve ambiguity-free signal reconstruction. The proposed method relies on a two-stage approach that consists of support identification followed by the exact recovery of nonzero signal entries. In the noise-free case, perfect support identification using a simple counting rule is guaranteed subject to a mild condition on the signal sparsity, and subsequent exact recovery of the nonzero signal entries can be obtained in closed form. The proposed approach is then extended to two noisy scenarios, namely, sparse noise (or outliers) and non-sparse bounded noise. For both cases, perfect support identification is still ensured under mild conditions on the noise model, namely, the support size for sparse outliers and the power of the bounded noise. Under perfect support identification, exact signal recovery can be achieved using a simple majority rule for the sparse noise scenario, and reconstruction up to a bounded error can be achieved using linear least-squares (LS) estimation for the non-sparse bounded noise scenario. The obtained analytic performance guarantee for the latter case also sheds light on the construction of the sensing matrix and bias vector. In fact, we show that near-optimal performance can be achieved with high probability by the random generation of the nonzero entries of the sparse sensing matrix and bias vector according to the uniform distribution over a circle. Computer simulations using both synthetic and real-world data sets are provided to demonstrate the effectiveness of the proposed scheme.
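The global phase ambiguity mentioned above is easy to see numerically: rotating the signal by a unit-modulus factor leaves every linear magnitude $|a^T x|$ unchanged, while affine magnitudes $|a^T x + b|$ generally change. The sketch below only demonstrates this property; it is not the recovery algorithm of the paper, and the sensing matrix, bias vector and signal are arbitrary illustrative values.

```python
import cmath
import math


def linear_magnitudes(A, x):
    """Magnitudes |a_i^T x| of linear measurements."""
    return [abs(sum(a * xi for a, xi in zip(row, x))) for row in A]


def affine_magnitudes(A, b, x):
    """Magnitudes |a_i^T x + b_i| of affine measurements."""
    return [abs(sum(a * xi for a, xi in zip(row, x)) + bi)
            for row, bi in zip(A, b)]


# Arbitrary example values (assumptions for illustration).
A = [[1, 2, 0], [0, 1, -1], [3, 0, 1]]
b = [1, 1, 1]
x = [1 + 1j, 0, 2]                      # sparse signal with one zero entry
phase = cmath.exp(1j * 0.7)             # global phase factor of unit modulus
x_rotated = [phase * xi for xi in x]

same_linear = all(
    math.isclose(p, q, rel_tol=1e-9)
    for p, q in zip(linear_magnitudes(A, x), linear_magnitudes(A, x_rotated))
)
differs_affine = any(
    abs(p - q) > 1e-6
    for p, q in zip(affine_magnitudes(A, b, x), affine_magnitudes(A, b, x_rotated))
)
print(same_linear, differs_affine)
```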
Satya Rajendra Singh, Shiv Ram Dubey, Shruthi MS, Sairathan Ventrapragada, Saivamshi Salla Dasharatha
Deep learning has greatly improved the performance of visual tasks. Image retrieval is the task of extracting visually similar images from a database for a query image. Feature matching is performed to rank the images. Various hand-designed features have been derived in the past to represent images. Nowadays, the power of deep learning is being utilized for automatic feature learning from data in the field of biomedical image analysis. Autoencoder and Siamese networks are two deep learning models for learning the latent space (i.e., features or embeddings). An autoencoder works by reconstructing the image from the latent space. A Siamese network utilizes triplets to learn intra-class similarity and inter-class dissimilarity. Moreover, the autoencoder is unsupervised, whereas the Siamese network is supervised. We propose a Joint Triplet Autoencoder Network (JTANet) by facilitating triplet learning in the autoencoder framework. Supervised learning for the Siamese network and unsupervised learning for the autoencoder are performed jointly. Moreover, the encoder network of the autoencoder is shared with the Siamese network and referred to as the Siamcoder network. The features are extracted using the trained Siamcoder network for retrieval. The experiments are performed on the Histopathological Routine Colon Cancer dataset. We observe promising performance of the proposed JTANet model against the Autoencoder and Siamese models for colon cancer nuclei retrieval in histopathological images.
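A joint objective of this kind can be written as a triplet term on the shared encoder's embeddings plus a reconstruction term from the decoder. The abstract does not give the exact loss, so the sketch below assumes a standard hinge triplet loss on squared Euclidean distance, a mean squared reconstruction error, and a hypothetical `weight` parameter balancing the two.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: pull the positive closer than the negative by a margin."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)


def reconstruction_loss(image, reconstruction):
    """Mean squared error between an image and its decoded reconstruction."""
    return squared_distance(image, reconstruction) / len(image)


def joint_loss(embeddings, images, reconstructions, margin=1.0, weight=1.0):
    """Triplet term on (anchor, positive, negative) embeddings plus weighted reconstruction term."""
    anchor, positive, negative = embeddings
    recon = sum(reconstruction_loss(img, rec)
                for img, rec in zip(images, reconstructions)) / len(images)
    return triplet_loss(anchor, positive, negative, margin) + weight * recon


# Toy 2-D embeddings and 2-pixel "images" (illustrative values only).
embeddings = ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0])
images = [[1.0, 2.0], [0.0, 0.0], [3.0, 3.0]]
reconstructions = [[1.0, 4.0], [0.0, 0.0], [3.0, 3.0]]
loss = joint_loss(embeddings, images, reconstructions, margin=1.0, weight=0.5)
print(loss)
```

In a real model both terms would be computed on mini-batches and back-propagated through the shared encoder; the toy values here only show how the terms combine.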
Giovanni Bonetta, Rossella Cancelliere, Ding Liu, Paul Vozila
Transformer-based models have demonstrated excellent capabilities of
capturing patterns and structures in natural language generation and achieved
state-of-the-art results in many tasks. In this paper we present a
transformer-based model for multi-turn dialog response generation. Our solution
is based on a hybrid approach which augments a transformer-based generative
model with a novel retrieval mechanism, which leverages the memorized
information in the training data via k-Nearest Neighbor search. Our system is
evaluated on two datasets of customer/assistant dialogs: Taskmaster-1,
released by Google and containing high-quality, goal-oriented conversational
data, and a proprietary dataset collected from a real customer service call
center. On both, it achieves better BLEU scores than strong baselines.
Authors' comments: The International FLAIRS Conference Proceedings volume 34 issue 1
Jiansheng Fang, Huazhu Fu, Dan Zeng, Xiao Yan, Yuguang Yan, Jiang Liu
When encountering a dubious diagnostic case, medical instance retrieval can
help radiologists make evidence-based diagnoses by finding images containing
instances similar to a query case from a large image database. The similarity
between the query case and retrieved similar cases is determined by visual
features extracted from pathologically abnormal regions. However, the
manifestation of these regions often lacks specificity, i.e., different
diseases can have the same manifestation, and different manifestations may
occur at different stages of the same disease. To combat the manifestation
ambiguity in medical instance retrieval, we propose a novel deep framework
called Y-Net, encoding images into compact hash-codes generated from
convolutional features by feature aggregation. Y-Net can learn highly
discriminative convolutional features by unifying the pixel-wise segmentation
loss and classification loss. The segmentation loss allows exploring subtle
spatial differences for good spatial-discriminability while the classification
loss utilizes class-aware semantic information for good semantic-separability.
As a result, Y-Net can enhance the visual features in pathologically abnormal
regions and suppress background interference during model training,
which could effectively embed discriminative features into the hash-codes in
the retrieval stage. Extensive experiments on two medical image datasets
demonstrate that Y-Net can alleviate the ambiguity of pathologically abnormal
regions and its retrieval performance outperforms the state-of-the-art method
by an average of 9.27\% on the returned list of 10.
Authors' comments: 11 pages, 8 figures, JBHI Journal
Jun-Woo Tak, Sang-Hyo Kim, Yongjune Kim, Jong-Seon No
Private information retrieval (PIR) is a protocol that guarantees the privacy of a user who is in communication with databases. The user wants to download one of the messages stored in the databases while hiding the identity of the desired message. Recently, the benefits that can be obtained by weakening the privacy requirement have been studied, but the definition of weak privacy needs to be elaborated upon. In this paper, we attempt to quantify the weak privacy (i.e., information leakage) in PIR problems by using the Rényi divergence, which generalizes the Kullback-Leibler divergence. By introducing the Rényi divergence into the existing PIR problem, the tradeoff relationship between privacy (information leakage) and PIR performance (download cost) is characterized via convex optimization. Furthermore, we propose an alternative PIR scheme with smaller message sizes than the Tian-Sun-Chen (TSC) scheme. The proposed scheme cannot achieve the PIR capacity of perfect privacy since the message size of the TSC scheme is the minimum to achieve the PIR capacity. However, we show that the proposed scheme can be better than the TSC scheme in the weak-privacy PIR setting, especially under a low download cost regime.
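For reference, the Rényi divergence of order $\alpha$ between discrete distributions $P$ and $Q$ is $D_\alpha(P\|Q) = \frac{1}{\alpha-1}\log\sum_i p_i^{\alpha} q_i^{1-\alpha}$, and it tends to the Kullback-Leibler divergence as $\alpha \to 1$. The sketch below evaluates both on toy distributions; how the paper maps query distributions to $P$ and $Q$ is not stated in the abstract, so the example values are purely illustrative.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha > 0; alpha = 1 falls back to KL."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if abs(alpha - 1.0) < 1e-12:
        return kl_divergence(p, q)
    total = sum(pi ** alpha * qi ** (1.0 - alpha)
                for pi, qi in zip(p, q) if pi > 0)
    return math.log(total) / (alpha - 1.0)


# Toy distributions (illustrative only).
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
for alpha in (0.5, 1.0, 2.0):
    print(alpha, renyi_divergence(p, q, alpha))
```

The divergence is nondecreasing in $\alpha$, so larger orders give a more conservative measure of how distinguishable the two distributions are.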
Shuo Zhang, Krisztian Balog
Tables on the Web contain a vast amount of knowledge in a structured form. To
tap into this valuable resource, we address the problem of table retrieval:
answering an information need with a ranked list of tables. We investigate this
problem in two different variants, based on how the information need is
expressed: as a keyword query or as an existing table ("query-by-table"). The
main novel contribution of this work is a semantic table retrieval framework
for matching information needs (keyword or table queries) against tables.
Specifically, we (i) represent queries and tables in multiple semantic spaces
(both discrete sparse and continuous dense vector representations) and (ii)
introduce various similarity measures for matching those semantic
representations. We consider all possible combinations of semantic
representations and similarity measures and use these as features in a
supervised learning model. Using two purpose-built test collections based on
Wikipedia tables, we demonstrate significant and substantial improvements over
state-of-the-art baselines.
Authors' comments: ACM Transactions on the Web (TWEB). arXiv admin note: substantial
text overlap with arXiv:1802.06159
Yan Xu, Etsuko Ishii, Samuel Cahyawijaya, Zihan Liu, Genta Indra Winata, Andrea Madotto, Dan Su, Pascale Fung
To diversify and enrich generated dialogue responses, knowledge-grounded
dialogue has been investigated in recent years. The existing methods tackle the
knowledge grounding challenge by retrieving the relevant sentences over a large
corpus and augmenting the dialogues with explicit extra information. Despite
their success, however, the existing works have drawbacks in inference
efficiency. This paper proposes KnowExpert, a framework to bypass the explicit
retrieval process and inject knowledge into the pre-trained language models
with lightweight adapters and adapt to the knowledge-grounded dialogue task. To
the best of our knowledge, this is the first attempt to tackle this challenge
without retrieval in this task under an open-domain chit-chat scenario. The
experimental results show that KnowExpert performs comparably with some
retrieval-based baselines while being time-efficient in inference,
demonstrating the effectiveness of our proposed method.
Authors' comments: The first two authors contribute equally; Accepted in ACL 2022
DialDoc Workshop (Best Student Paper Award)
Patricio E. Cubillos, Dylan Keating, Nicolas B. Cowan, Johanna M. Vos, Ben Burningham, Marie Ygouf, Theodora Karalidi, Yifan Zhou et al.
Thermal phase variations of short period planets indicate that they are not
spherical cows: day-to-night temperature contrasts range from hundreds to
thousands of degrees, rivaling their vertical temperature contrasts.
Nonetheless, the emergent spectra of short-period planets have typically been
fit using one-dimensional (1D) spectral retrieval codes that only account for
vertical temperature gradients. The popularity of 1D spectral retrieval codes
is easy to understand: they are robust and have a rich legacy in Solar System
atmospheric studies. Exoplanet researchers have recently introduced
multi-dimensional retrieval schemes for interpreting the spectra of
short-period planets, but these codes are necessarily more complex and
computationally expensive than their 1D counterparts. In this paper we present
an alternative: phase-dependent spectral observations are inverted to produce
longitudinally resolved spectra that can then be fitted using standard 1D
spectral retrieval codes. We test this scheme on the iconic phase-resolved
spectra of WASP-43b and on simulated JWST observations using the open-source
pyratbay 1D spectral retrieval framework. Notably, we take the model complexity
of the simulations one step further over previous studies by allowing for
longitudinal variations in composition in addition to temperature. We show that
performing 1D spectral retrieval on longitudinally resolved spectra is more
accurate than applying 1D spectral retrieval codes to disk-integrated emission
spectra, despite being identical in terms of computational load. We find that
for the extant Hubble and Spitzer observations of WASP-43b the difference
between the two approaches is negligible but that JWST phase measurements
should be treated with longitudinally \textbf{re}solved \textbf{spect}ral
retrieval (ReSpect).
Authors' comments: Accepted for publication at The Astrophysical Journal
Zhusheng Wang, Sennur Ulukus
We consider the problem of symmetric private information retrieval (SPIR) with user-side common randomness. In SPIR, a user retrieves a message out of $K$ messages from $N$ non-colluding and replicated databases in such a way that no single database knows the retrieved message index (user privacy), and the user gets to know nothing further than the retrieved message (database privacy). SPIR has a capacity smaller than the PIR capacity which requires only user privacy, is infeasible in the case of a single database, and requires shared common randomness among the databases. We introduce a new variant of SPIR where the user is provided with a random subset of the shared database common randomness, which is unknown to the databases. We determine the exact capacity region of the triple $(d, \rho_S, \rho_U)$, where $d$ is the download cost, $\rho_S$ is the amount of shared database (server) common randomness, and $\rho_U$ is the amount of available user-side common randomness. We show that with a suitable amount of $\rho_U$, this new SPIR achieves the capacity of conventional PIR. As a corollary, single-database SPIR becomes feasible. Further, the presence of user-side $\rho_U$ reduces the amount of required server-side $\rho_S$.
Barbara Rychalska, Mikolaj Wieczorek, Jacek Dabrowski
The key challenge in cross-modal retrieval is to find similarities between
objects represented with different modalities, such as image and text. However,
each modality's embeddings stem from unrelated feature spaces, which causes the
notorious 'heterogeneity gap'. Currently, many cross-modal systems try to
bridge the gap with self-attention. However, self-attention has been widely
criticized for its quadratic complexity, which prevents many real-life
applications. In response to this, we propose T-EMDE - a neural density
estimator inspired by the recently introduced Efficient Manifold Density
Estimator (EMDE) from the area of recommender systems. EMDE operates on
sketches - representations especially suitable for multimodal operations.
However, EMDE is non-differentiable and ingests precomputed, static embeddings.
With T-EMDE we introduce a trainable version of EMDE which allows full
end-to-end training. In contrast to self-attention, the complexity of our
solution is linear in the number of tokens/segments. As such, T-EMDE is a
drop-in replacement for the self-attention module, with beneficial influence on
both speed and metric performance in cross-modal settings. It facilitates
communication between modalities, as each global text/image representation is
expressed with a standardized sketch histogram which represents the same
manifold structures irrespective of the underlying modality. We evaluate T-EMDE
by introducing it into two recent cross-modal SOTA models, achieving new
state-of-the-art results on multiple datasets and decreasing model latency by
up to 20%.
Authors' comments: 10 pages, 5 figures, 4 tables, 1 code snippet
Chen Qu, Hamed Zamani, Liu Yang, W. Bruce Croft, Erik Learned-Miller
In this work, we address multi-modal information needs that contain text
questions and images by focusing on passage retrieval for outside-knowledge
visual question answering. This task requires access to outside knowledge,
which in our case we define to be a large unstructured passage collection. We
first conduct sparse retrieval with BM25 and study expanding the question with
object names and image captions. We verify that visual clues play an important
role and captions tend to be more informative than object names in sparse
retrieval. We then construct a dual-encoder dense retriever, with the query
encoder being LXMERT, a multi-modal pre-trained transformer. We further show
that dense retrieval significantly outperforms sparse retrieval that uses
object expansion. Moreover, dense retrieval matches the performance of sparse
retrieval that leverages human-generated captions.
Authors' comments: Accepted to SIGIR'21 as a short paper
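The sparse-retrieval step described above, BM25 over passages with the question expanded by an image caption, can be sketched in a few lines. This is a generic BM25 with one common IDF variant and default-style parameters `k1` and `b`; the whitespace tokenization, the toy passages and the caption are assumptions for illustration and not the paper's setup.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """BM25 score of each tokenized document for the query (one common IDF variant)."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    doc_freq = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in set(query_tokens):
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5)
                           / (doc_freq[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


passages = [p.split() for p in (
    "a zebra will eat grass and leaves",
    "a lion will eat meat and bones",
    "grass grows in warm climates",
)]
question = "what does it eat".split()
caption = "a zebra standing on grass".split()  # hypothetical image caption

question_only = bm25_scores(question, passages)
expanded = bm25_scores(question + caption, passages)
print(question_only, expanded)
```

With the question alone, the first two passages are indistinguishable because only "eat" matches; adding caption terms supplies the visual clue that separates them.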