Jenifer Kalafatovich, Minji Lee, Seong-Whan Lee
Many studies have explored brain signals during the performance of a memory
task to predict later remembered items. However, prediction methods are still
poorly used in real life and are not practical due to the use of
electroencephalography (EEG) recorded from the scalp. Ear-EEG has been recently
used to measure brain signals due to its flexibility when applying it to real
world environments. In this study, we attempt to predict whether a shown
stimulus is going to be remembered or forgotten using ear-EEG and compare its
performance with scalp-EEG. Our results showed that there was no significant
difference between ear-EEG and scalp-EEG. In addition, a convolutional neural
network achieved higher prediction accuracy (pre-stimulus: 74.06%, on-going
stimulus: 69.53%) than the other baseline methods it was compared against.
These results showed that it is possible to predict performance of a
memory task using ear-EEG signals and it could be used for predicting memory
retrieval in a practical brain-computer interface.
Authors' comments: Accepted for publication at EMBC 2020
Wonseok Hwang, Jinyeong Yim, Seunghyun Park, Minjoon Seo
Deep learning approaches to semantic parsing require a large amount of
labeled data, but annotating complex logical forms is costly. Here, we propose
Syntactic Question Abstraction and Retrieval (SQAR), a method to build a neural
semantic parser that translates a natural language (NL) query to a SQL logical
form (LF) with less than 1,000 annotated examples. SQAR first retrieves a
logical pattern from the train data by computing the similarity between NL
queries and then grounds lexical information on the retrieved pattern in
order to generate the final LF. We validate SQAR by training models using
various small subsets of WikiSQL train data achieving up to 4.9% higher LF
accuracy compared to the previous state-of-the-art models on WikiSQL test set.
We also show that by using query-similarity to retrieve logical patterns, SQAR
can leverage a paraphrasing dataset achieving up to 5.9% higher LF accuracy
compared to the case where SQAR is trained by using only WikiSQL data. In
contrast to a simple pattern classification approach, SQAR can generate unseen
logical patterns upon the addition of new examples without re-training the
model. We also discuss an ideal way to create cost efficient and robust train
datasets when the data distribution can be approximated under a data-hungry
setting.
Authors' comments: Accepted to AKBC 2020 (conference paper)
Yi Luan, Jacob Eisenstein, Kristina Toutanova, Michael Collins
Dual encoders perform retrieval by encoding documents and queries into dense
low-dimensional vectors, scoring each document by its inner product with the
query. We investigate the capacity of this architecture relative to sparse
bag-of-words models and attentional neural networks. Using both theoretical and
empirical analysis, we establish connections between the encoding dimension,
the margin between gold and lower-ranked documents, and the document length,
suggesting limitations in the capacity of fixed-length encodings to support
precise retrieval of long documents. Building on these insights, we propose a
simple neural model that combines the efficiency of dual encoders with some of
the expressiveness of more costly attentional architectures, and explore
sparse-dense hybrids to capitalize on the precision of sparse retrieval. These
models outperform strong alternatives in large-scale retrieval.
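
The scoring rule described above (each document scored by the inner product of its vector with the query vector) can be illustrated with a minimal sketch. The three-dimensional vectors and document ids below are made-up values for illustration, not learned encodings, and the function names are hypothetical.

```python
def inner_product(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted by inner product, best first."""
    scored = [(doc_id, inner_product(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical fixed-length encodings; a real system would obtain these
# from trained query and document encoders.
docs = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.2, 0.8, 0.1],
    "d3": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]
ranking = rank_documents(query, docs)
```

Because every document is reduced to one fixed-length vector before the query is seen, the document side can be indexed offline; this is the efficiency that the abstract contrasts with attentional architectures.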
Authors' comments: To appear in TACL 2020. The arXiv version is a pre-MIT Press
publication version
Wen Yu Kon, Charles Ci Wen Lim
Private information retrieval (PIR) is a database query protocol that
provides user privacy, in that the user can learn a particular entry of the
database of his interest but his query would be hidden from the data centre.
Symmetric private information retrieval (SPIR) takes PIR further by
additionally offering database privacy, where the user cannot learn any
additional entries of the database. Unconditionally secure SPIR solutions with
multiple databases are known classically, but are unrealistic because they
require long shared secret keys between the parties for secure communication
and shared randomness in the protocol. Here, we propose using quantum key
distribution (QKD) instead for a practical implementation, which can realise
both the secure communication and shared randomness requirements. We prove that
QKD maintains the security of the SPIR protocol and that it is also secure
against any external eavesdropper. We also show how such a classical-quantum
system could be implemented practically, using the example of a two-database
SPIR protocol with keys generated by measurement device-independent QKD.
Through key rate calculations, we show that such an implementation is feasible
at the metropolitan level with current QKD technology.
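
For readers unfamiliar with multi-database PIR, the following is a minimal sketch of the textbook two-server XOR scheme, assuming two non-colluding servers that hold identical copies of the database. It is not the SPIR protocol of this paper: it provides user privacy only, with no database privacy, no shared randomness between servers, and no QKD-generated keys. All function names are hypothetical.

```python
import secrets


def make_queries(n, index):
    """Build two query bit-vectors that differ only at the wanted index.

    Each vector on its own is uniformly random, so a single server
    learns nothing about which entry the user wants.
    """
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = list(q1)
    q2[index] ^= 1
    return q1, q2


def answer(database, query):
    """Server side: XOR together the entries selected by the query bits."""
    acc = 0
    for entry, bit in zip(database, query):
        if bit:
            acc ^= entry
    return acc


def retrieve(database, index):
    """User side: XOR the two answers; all entries except `index` cancel."""
    q1, q2 = make_queries(len(database), index)
    return answer(database, q1) ^ answer(database, q2)


database = [17, 42, 7, 99, 3]  # illustrative integer entries
```

Note that each answer is an XOR over many entries, which is why the plain scheme can leak additional database information to the user; the symmetric variant discussed in the abstract addresses exactly this.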
Authors' comments: 19 pages
Wentao Ma, Yiming Cui, Ting Liu, Dong Wang, Shijin Wang, Guoping Hu
Human conversations contain many types of information, e.g., knowledge,
common sense, and language habits. In this paper, we propose a conversational
word embedding method named PR-Embedding, which utilizes the conversation pairs
$ \left\langle{post, reply} \right\rangle$ to learn word embedding. Different
from previous works, PR-Embedding uses the vectors from two different semantic
spaces to represent the words in post and reply. To catch the information among
the pair, we first introduce the word alignment model from statistical machine
translation to generate the cross-sentence window, then train the embedding on
word-level and sentence-level. We evaluate the method on single-turn and
multi-turn response selection tasks for retrieval-based dialog systems. The
experiment results show that PR-Embedding can improve the quality of the
selected response. PR-Embedding source code is available at
https://github.com/wtma/PR-Embedding
Authors' comments: To appear at ACL 2020
Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov
Matching and retrieving previously translated segments from a Translation
Memory is the key functionality in Translation Memories systems. However, this
matching and retrieving process is still limited to algorithms based on edit
distance, which we have identified as a major drawback in Translation Memories
systems. In this paper we introduce sentence encoders to improve the matching
and retrieving process in Translation Memories systems - an effective and
efficient solution to replace edit distance based algorithms.
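
As background for the edit-distance baseline mentioned above, here is a minimal sketch of word-level Levenshtein matching against a small translation memory. The segments are invented examples and the helper names are hypothetical; real Translation Memory tools typically use normalized fuzzy-match scores rather than the raw distance.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def best_match(segment, memory):
    """Return the stored segment with the smallest word-level distance."""
    words = segment.split()
    return min(memory, key=lambda s: edit_distance(words, s.split()))


memory = ["Press the red button", "Close the window", "Open the file menu"]
match = best_match("Press the green button", memory)
```

Edit distance only counts surface differences, so a paraphrase that shares few words with a stored segment scores poorly even when the meaning is the same; this is presumably the kind of limitation that motivates sentence encoders.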
Authors' comments: Accepted to EAMT 2020
Amir Vakili Tahami, Kamyar Ghajar, Azadeh Shakery
Response retrieval is a subset of neural ranking in which a model selects a
suitable response from a set of candidates given a conversation history.
Retrieval-based chat-bots are typically employed in information seeking
conversational systems such as customer support agents. In order to make
pairwise comparisons between a conversation history and a candidate response,
two approaches are common: cross-encoders performing full self-attention over
the pair and bi-encoders encoding the pair separately. The former gives better
prediction quality but is too slow for practical use. In this paper, we propose
a new cross-encoder architecture and transfer knowledge from this model to a
bi-encoder model using distillation. This effectively boosts bi-encoder
performance at no cost during inference time. We perform a detailed analysis of
this approach on three response retrieval datasets.
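
The abstract does not state which distillation objective is used. As one common formulation, assumed here purely for illustration, the student (bi-encoder) can be trained to match the teacher's (cross-encoder's) softened score distribution over the candidate responses, using a KL divergence. The scores and temperature below are hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_scores, student_scores, temperature=2.0):
    """KL(teacher || student) over candidate responses.

    Assumed objective for illustration; not taken from the paper.
    """
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical scores for three candidate responses.
teacher = [4.0, 1.0, -2.0]
student = [2.0, 1.5, 0.0]
loss = distillation_loss(teacher, student)
```

The teacher is needed only while training, which is consistent with the claim that the improvement comes at no cost during inference.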
Authors' comments: Accepted for publication in the 43rd International ACM SIGIR
Conference on Research and Development in Information Retrieval (SIGIR '20)
E. L. Sirks, P. Clark, R. J. Massey, S. J. Benton, A. M. Brown, C. J. Damaren, T. Eifler, A. A. Fraisse et al.
We present a publicly-available toolkit of flight-proven hardware and software to retrieve 5 TB of data or small physical samples from a stratospheric balloon platform. Before launch, a capsule is attached to the balloon, and rises with it. Upon remote command, the capsule is released and descends via parachute, continuously transmitting its location. Software to predict the trajectory can be used to select a safe but accessible landing site. We dropped two such capsules from the SuperBIT telescope, in September 2019. The capsules took ~37 minutes to descend from ~30 km altitude. They drifted 32 km and 19 km horizontally, but landed within 300 m and 600 m of their predicted landing sites. We found them easily, and successfully recovered the data. We welcome interest from other balloon teams for whom the technology would be useful.
Matthew C. Nixon, Nikku Madhusudhan
Atmospheric retrieval of exoplanets from spectroscopic observations requires
an extensive exploration of a highly degenerate and high-dimensional parameter
space to accurately constrain atmospheric parameters. Retrieval methods
commonly conduct Bayesian parameter estimation and statistical inference using
sampling algorithms such as Markov Chain Monte Carlo (MCMC) or Nested Sampling.
Recently several attempts have been made to use machine learning algorithms
either to complement or replace fully Bayesian methods. While much progress has
been made, these approaches are still at times unable to accurately reproduce
results from contemporary Bayesian retrievals. The goal of our present work is
to investigate the efficacy of machine learning for atmospheric retrieval. As a
case study, we use the Random Forest supervised machine learning algorithm
which has been applied previously with some success for atmospheric retrieval
of the hot Jupiter WASP-12b using its near-infrared transmission spectrum. We
reproduce previous results using the same approach and the same semi-analytic
models, and subsequently extend this method to develop a new algorithm that
results in a closer match to a fully Bayesian retrieval. We combine this new
method with a fully numerical atmospheric model and demonstrate excellent
agreement with a Bayesian retrieval of the transmission spectrum of another hot
Jupiter, HD 209458b. Despite this success, and achieving high computational
efficiency, we still find that the machine learning approach is computationally
prohibitive for high-dimensional parameter spaces that are routinely explored
with Bayesian retrievals with modest computational resources. We discuss the
trade offs and potential avenues for the future.
Authors' comments: 13 pages, 14 figures. Accepted for publication in MNRAS
Daniel Yang, Thitaree Tanprasert, Teerapat Jenrungrot, Mengyi Shan, TJ Tsai
This paper investigates a cross-modal retrieval problem in which a user would
like to retrieve a passage of music from a MIDI file by taking a cell phone
picture of a physical page of sheet music. While audio-sheet music retrieval
has been explored by a number of works, this scenario is novel in that the
query is a cell phone picture rather than a digital scan. To solve this
problem, we introduce a mid-level feature representation called a bootleg score
which explicitly encodes the rules of Western musical notation. We convert both
the MIDI and the sheet music into bootleg scores using deterministic rules of
music and classical computer vision techniques for detecting simple geometric
shapes. Once the MIDI and cell phone image have been converted into bootleg
scores, we estimate the alignment using dynamic programming. The most notable
characteristic of our system is that it does test-time adaptation and has no
trainable weights at all -- only a set of about 30 hyperparameters. On a
dataset containing 1000 cell phone pictures taken of 100 scores of classical
piano music, our system achieves an F measure score of .869 and outperforms
baseline systems based on commercial optical music recognition software.
Authors' comments: 8 pages, 8 figures, 1 table. Accepted paper at the International
Society for Music Information Retrieval Conference (ISMIR) 2019
Federico Vaccaro, Marco Bertini, Tiberio Uricchio, Alberto Del Bimbo
In this paper, we address the problem of image retrieval by learning image
representations based on the activations of a Convolutional Neural Network. We
present an end-to-end trainable network architecture that exploits a novel
multi-scale local pooling based on NetVLAD and a triplet mining procedure based
on sample difficulty to obtain an effective image representation. Extensive
experiments show that our approach is able to reach state-of-the-art results on
three standard datasets.
Authors' comments: Accepted at ICMR 2020
Nicola Messina, Fabrizio Falchi, Andrea Esuli, Giuseppe Amato
Image-text matching is an interesting and fascinating task in modern AI
research. Despite the evolution of deep-learning-based image and text
processing systems, multi-modal matching remains a challenging problem. In this
work, we consider the problem of accurate image-text matching for the task of
multi-modal large-scale information retrieval. State-of-the-art results in
image-text matching are achieved by inter-playing image and text features from
the two different processing pipelines, usually using mutual attention
mechanisms. However, this invalidates any chance to extract separate visual and
textual features needed for later indexing steps in large-scale retrieval
systems. In this regard, we introduce the Transformer Encoder Reasoning Network
(TERN), an architecture built upon one of the modern relationship-aware
self-attentive architectures, the Transformer Encoder (TE). This architecture
is able to separately reason on the two different modalities and to enforce a
final common abstract concept space by sharing the weights of the deeper
transformer layers. Thanks to this design, the implemented network is able to
produce compact and very rich visual and textual features available for the
successive indexing step. Experiments are conducted on the MS-COCO dataset, and
we evaluate the results using a discounted cumulative gain metric with
relevance computed exploiting caption similarities, in order to assess possibly
non-exact but relevant search results. We demonstrate that on this metric we
are able to achieve state-of-the-art results in the image retrieval task. Our
code is freely available at https://github.com/mesnico/TERN.
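
For reference, a discounted cumulative gain metric with graded relevance can be sketched as follows. This uses the standard DCG definition with a log2 rank discount and its normalized form; the relevance values are placeholders, whereas the paper derives them from caption similarities.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain of a ranked list of relevance scores."""
    rels = relevances if k is None else relevances[:k]
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(rels, start=1))


def ndcg(relevances, k=None):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded relevances of retrieved items, in ranked order.
ranked_relevances = [0.0, 1.0]
score = ndcg(ranked_relevances)
```

Unlike recall at k with a single exact match, a graded metric of this kind gives partial credit to results that are relevant but not the exact ground-truth item.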
Authors' comments: Presented at ICPR 2020
Huy Manh Nguyen, Tomo Miyazaki, Yoshihiro Sugaya, Shinichiro Omachi
Visual-semantic embedding aims to learn a joint embedding space where related
video and sentence instances are located close to each other. Most existing
methods put instances in a single embedding space. However, they struggle to
embed instances due to the difficulty of matching visual dynamics in videos to
textual features in sentences. A single space is not enough to accommodate
various videos and sentences. In this paper, we propose a novel framework that
maps instances into multiple individual embedding spaces so that we can capture
multiple relationships between instances, leading to compelling video
retrieval. We propose to produce a final similarity between instances by fusing
similarities measured in each embedding space using a weighted sum strategy. We
determine the weights according to the sentence, so we can flexibly
emphasize an embedding space. We conducted sentence-to-video retrieval
experiments on a benchmark dataset. The proposed method achieved superior
performance, and the results are competitive to state-of-the-art methods. These
experimental results demonstrated the effectiveness of the proposed multiple
embedding approach compared to existing methods.
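
The weighted-sum fusion of per-space similarities can be sketched as below. Cosine similarity, the two-dimensional embeddings, and the fixed weights are assumptions made for illustration; in the described framework the weights would be produced from the sentence by the model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def fused_similarity(video_embs, sentence_embs, weights):
    """Weighted sum of the similarities measured in each embedding space.

    video_embs[i] and sentence_embs[i] are the embeddings of the same
    video and sentence in the i-th space.
    """
    sims = [cosine(v, s) for v, s in zip(video_embs, sentence_embs)]
    return sum(w * s for w, s in zip(weights, sims))


# Two hypothetical embedding spaces.
video = [[1.0, 0.0], [0.0, 1.0]]
sentence = [[1.0, 0.0], [1.0, 0.0]]
weights = [0.75, 0.25]  # placeholder for sentence-dependent weights
similarity = fused_similarity(video, sentence, weights)
```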
Authors' comments: 8 pages, 5 figures
Uma Roy, Noah Constant, Rami Al-Rfou, Aditya Barua, Aaron Phillips, Yinfei Yang
We present LAReQA, a challenging new benchmark for language-agnostic answer retrieval from a multilingual candidate pool. Unlike previous cross-lingual tasks, LAReQA tests for "strong" cross-lingual alignment, requiring semantically related cross-language pairs to be closer in representation space than unrelated same-language pairs. Building on multilingual BERT (mBERT), we study different strategies for achieving strong alignment. We find that augmenting training data via machine translation is effective, and improves significantly over using mBERT out-of-the-box. Interestingly, the embedding baseline that performs the best on LAReQA falls short of competing baselines on zero-shot variants of our task that only target "weak" alignment. This finding underscores our claim that language-agnostic retrieval is a substantively new kind of cross-lingual evaluation.
Rui Zhao, Kecheng Zheng, Zheng-jun Zha
Existing dominant approaches for the cross-modal video-text retrieval task
learn a joint embedding space to measure the cross-modal similarity. However,
these methods rarely explore long-range dependency inside video frames or
textual words, leading to insufficient textual and visual details. In this
paper, we propose a stacked convolutional deep encoding network for the
video-text retrieval task, which simultaneously encodes long-range and
short-range dependency in the videos and texts. Specifically, a multi-scale
dilated convolutional (MSDC) block within our approach is able to encode
short-range temporal cues between video frames or text words by adopting
different scales of kernel size and dilation size of convolutional layer. A
stacked structure is designed to expand the receptive fields by repeatedly
adopting the MSDC block, which further captures the long-range relations
between these cues. Moreover, to obtain more robust textual representations, we
fully utilize the powerful language model named Transformer in two stages:
pretraining phase and fine-tuning phase. Extensive experiments on two
different benchmark datasets (MSR-VTT, MSVD) show that our proposed method
outperforms other state-of-the-art approaches.
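
To see why stacking dilated convolutions expands the receptive field, consider a sequential stack of 1-D convolutions with stride 1: each layer with kernel size k and dilation d adds (k - 1) * d positions. The sketch below computes this; the layer configuration is hypothetical and is not the MSDC block's actual configuration.

```python
def receptive_field(layers):
    """Receptive field of sequentially stacked stride-1 1-D convolutions.

    `layers` is a list of (kernel_size, dilation) pairs.
    """
    field = 1
    for kernel_size, dilation in layers:
        field += (kernel_size - 1) * dilation
    return field


# Hypothetical stack: three layers of kernel size 3 with growing dilation.
stack = [(3, 1), (3, 2), (3, 4)]
span = receptive_field(stack)
```

With these illustrative values, three layers already cover 15 consecutive frames or words, whereas three undilated layers of the same kernel size would cover only 7.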
Authors' comments: 6 pages
Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih
Open-domain question answering relies on efficient passage retrieval to
select candidate contexts, where traditional sparse vector space models, such
as TF-IDF or BM25, are the de facto method. In this work, we show that
retrieval can be practically implemented using dense representations alone,
where embeddings are learned from a small number of questions and passages by a
simple dual-encoder framework. When evaluated on a wide range of open-domain QA
datasets, our dense retriever outperforms a strong Lucene-BM25 system
by 9%-19% absolute in terms of top-20 passage retrieval accuracy, and helps our
end-to-end QA system establish new state-of-the-art on multiple open-domain QA
benchmarks.
Authors' comments: EMNLP 2020
Bing Bai, Guanhua Zhang, Ye Lin, Hao Li, Kun Bai, Bo Luo
Nowadays, news apps have taken over the popularity of paper-based media, providing a great opportunity for personalization. Recurrent Neural Network (RNN)-based sequential recommendation is a popular approach that utilizes users' recent browsing history to predict future items. This approach is limited in that it does not consider the societal influences of news consumption, i.e., users may follow popular topics that are constantly changing, while certain hot topics might be spreading only among specific groups of people. Such societal impact is difficult to predict given only users' own reading histories. On the other hand, the traditional User-based Collaborative Filtering (UserCF) makes recommendations based on the interests of the "neighbors", which provides the possibility to supplement the weaknesses of RNN-based methods. However, conventional UserCF only uses a single similarity metric to model the relationships between users, which is too coarse-grained and thus limits the performance. In this paper, we propose a framework of deep neural networks to integrate the RNN-based sequential recommendations and the key ideas from UserCF, to develop Collaborative Sequential Recommendation Networks (CSRNs). Firstly, we build a directed co-reading network of users, to capture the fine-grained topic-specific similarities between users in a vector space. Then, the CSRN model encodes users with RNNs, and learns to attend to neighbors and summarize what news they are reading at the moment. Finally, news articles are recommended according to both the user's own state and the summarized state of the neighbors. Experiments on two public datasets show that the proposed model outperforms the state-of-the-art approaches significantly.
Yulan Feng, Shikib Mehri, Maxine Eskenazi, Tiancheng Zhao
This paper discusses the importance of uncovering uncertainty in end-to-end
dialog tasks, and presents our experimental results on uncertainty
classification on the Ubuntu Dialog Corpus. We show that, instead of retraining
models for this specific purpose, the original retrieval model's underlying
confidence concerning the best prediction can be captured with trivial
additional computation.
Authors' comments: Accepted to ACL 2020 as short paper
Shuhei Yokoo, Kohei Ozaki, Edgar Simo-Serra, Satoshi Iizuka
We propose an efficient pipeline for large-scale landmark image retrieval
that addresses the diversity of the dataset through two-stage discriminative
re-ranking. Our approach is based on embedding the images in a feature-space
using a convolutional neural network trained with a cosine softmax loss. Due to
the variance of the images, which include extreme viewpoint changes such as
having to retrieve images of the exterior of a landmark from images of the
interior, this is very challenging for approaches based exclusively on visual
similarity. Our proposed re-ranking approach improves the results in two steps:
in the sort-step, we apply $k$-nearest neighbor search with soft-voting to sort the
retrieved results based on their label similarity to the query images, and in
the insert-step, we add additional samples from the dataset that were not
retrieved by image-similarity. This approach allows overcoming the low visual
diversity in retrieved images. In-depth experimental results show that the
proposed approach significantly outperforms existing approaches on the
challenging Google Landmarks Datasets. Using our methods, we achieved 1st place
in the Google Landmark Retrieval 2019 challenge and 3rd place in the Google
Landmark Recognition 2019 challenge on Kaggle. Our code is publicly available
here: https://github.com/lyakaap/Landmark2019-1st-and-3rd-Place-Solution
Authors' comments: 10 pages, 5 figures
Lifa Hu, Wen Shena, Wenchao Ma, Dongting Hu, Xinyu Liu
The traditional phase-shifting interferometry technique cannot be used to
measure time-varying phase distributions, but single-shot techniques can
resolve this problem. Many efforts have been made on phase retrieval methods
from a single-shot interferogram. In this paper, a simple and effective method
is presented that requires no complex computation. The interference fringe is
converted to a phase distribution with a look-up table. It is then divided into
different regions according to the parity of every pixel. The pixels in the
same region have the same parity, which determines the wrapped phase.
Additionally, the light spot displacement of a local wavefront is obtained to
resolve the global sign ambiguity. The theoretical simulation results indicate
that the PV of the wavefront error is 0.00054(lambda) and the rms is
0.000125(lambda), which is much better than the results from the Fast Fourier
Transform method. We also apply it to an experimentally measured
interferogram. Our algorithm has the advantages of simplicity, high precision,
and effectiveness for both open and closed interferometer fringes, which will
be valuable for real-time monitoring of the shape of optical elements during
their processing.
Authors' comments: 12 pages, 8 figures