Şeyma Bodur, Edgar Martínez-Moro, Diego Ruano
A Private Information Retrieval (PIR) scheme allows users to retrieve data from a database without disclosing to the server information about the identity of the data retrieved. A coded storage in a distributed storage system with colluding servers is considered in this work, namely the approach in [$t$-private information retrieval schemes using transitive codes, IEEE Trans. Inform. Theory, vol. 65, no. 4, pp. 2107-2118, 2019], which considers a storage and retrieval code with a transitive group and provides binary PIR schemes with the highest possible rate. Reed-Muller codes were considered in that same reference. In this work, we consider cyclic codes and show that binary PIR schemes using cyclic codes provide a larger constellation of PIR parameters and may outperform those coming from Reed-Muller codes in some cases.
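To make the PIR idea in the first sentence concrete, here is a minimal sketch of the classic two-server XOR scheme over a replicated database. It is not the coded, $t$-private construction from transitive or cyclic codes discussed in the abstract: it assumes two servers that do not collude and hold identical copies, and all function names are illustrative.

```python
import secrets

def make_queries(n, index):
    """Build one query per server for a database of n records.

    Server A receives a uniformly random bit vector; server B receives the
    same vector with the bit at `index` flipped. Each vector on its own is
    uniformly random, so a single server learns nothing about `index`.
    """
    query_a = [secrets.randbelow(2) for _ in range(n)]
    query_b = list(query_a)
    query_b[index] ^= 1
    return query_a, query_b

def answer(database, query):
    """Server side: XOR together the records selected by the query bits."""
    acc = 0
    for record, bit in zip(database, query):
        if bit:
            acc ^= record
    return acc

def retrieve(database, index):
    """User side: every record except the wanted one cancels in the XOR."""
    query_a, query_b = make_queries(len(database), index)
    return answer(database, query_a) ^ answer(database, query_b)
```

Calling `retrieve([5, 17, 42, 99], 2)` returns the record at position 2. If the two servers compared their queries, the index would be exposed, which is the collusion problem that $t$-private coded schemes are designed to handle.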
Alvet Miranda, Shah Jahan Miah
Objective: Our study objective is to design a feasible technology solution for health organizations to remove barriers to evidence-based clinical information retrieval, and improve Evidence-Based Practice. Methods: Literature from 2010 to 2020 was reviewed to define problems in evidence-based clinical information retrieval with recommendations from literature used to define solution objectives. Design Science Research is used to complete three projects in a research stream using cloud services such as Web-Scale Discovery, Content Management System, Federated Access, Global Knowledgebase, and Document Delivery. Design thinking, systems thinking, and user-oriented theory of information need are adopted to construct a design theory. Results: The research stream produced three novel and innovative artefacts: a contextual model, a unified architecture, and a context-aware unified architecture which we evaluate as part of academic reviews, scholarly publications, and conference proceedings in various research stream stages. A fourth artefact or design theory is presented to generalize results as mature knowledge.
Valentin Gabeur, Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid
Pre-training on large scale unlabelled datasets has shown impressive
performance improvements in the fields of computer vision and natural language
processing. Given the advent of large-scale instructional video datasets, a
common strategy for pre-training video encoders is to use the accompanying
speech as weak supervision. However, as speech is used to supervise the
pre-training, it is never seen by the video encoder, which does not learn to
process that modality. We address this drawback of current pre-training
methods, which fail to exploit the rich cues in spoken language. Our proposal
is to pre-train a video encoder using all the available video modalities as
supervision, namely, appearance, sound, and transcribed speech. We mask an
entire modality in the input and predict it using the other two modalities.
This encourages each modality to collaborate with the others, and our video
encoder learns to process appearance and audio as well as speech. We show the
superior performance of our "modality masking" pre-training approach for video
retrieval on the How2R, YouCook2 and Condensed Movies datasets.
Authors' comments: Accepted at WACV 2022
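The masking step described in the abstract above (hide one entire modality, predict it from the other two) can be sketched as follows. This is only an illustration of how a training example might be built: the modality names, the flat feature lists, and the use of a zero vector as the mask are assumptions, not details taken from the paper.

```python
import random

# Assumed modality names; the abstract lists appearance, sound, and transcribed speech.
MODALITIES = ("appearance", "audio", "speech")

def mask_one_modality(sample, rng):
    """Return (inputs, target_name, label) for one pre-training example.

    `sample` maps each modality name to a feature list. One modality is
    chosen at random, replaced by zeros in the inputs (a hypothetical mask
    value), and its original features become the prediction target.
    """
    target = rng.choice(MODALITIES)
    inputs = {
        name: [0.0] * len(features) if name == target else list(features)
        for name, features in sample.items()
    }
    return inputs, target, list(sample[target])
```

A model trained on such examples has to read the two visible modalities to reconstruct the hidden one, so speech is sometimes an input rather than only a supervision signal.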
Sarthak Mittal, Sharath Chandra Raparthy, Irina Rish, Yoshua Bengio, Guillaume Lajoie
Multi-head, key-value attention is the backbone of the widely successful Transformer model and its variants. This attention mechanism uses multiple parallel key-value attention blocks (called heads), each performing two fundamental computations: (1) search - selection of a relevant entity from a set via query-key interactions, and (2) retrieval - extraction of relevant features from the selected entity via a value matrix. Importantly, standard attention heads learn a rigid mapping between search and retrieval. In this work, we first highlight how this static nature of the pairing can potentially: (a) lead to learning of redundant parameters in certain tasks, and (b) hinder generalization. To alleviate this problem, we propose a novel attention mechanism, called Compositional Attention, that replaces the standard head structure. The proposed mechanism disentangles search and retrieval and composes them in a dynamic, flexible and context-dependent manner through an additional soft competition stage between the query-key combination and value pairing. Through a series of numerical experiments, we show that it outperforms standard multi-head attention on a variety of tasks, including some out-of-distribution settings. Through our qualitative analysis, we demonstrate that Compositional Attention leads to dynamic specialization based on the type of retrieval needed. Our proposed mechanism generalizes multi-head attention, allows independent scaling of search and retrieval, and can easily be implemented in lieu of standard attention heads in any network architecture.
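The following is a rough pure-Python sketch of one possible reading of the mechanism described above: each search (query-key attention) is paired with every retrieval (value matrix), and a second softmax over retrievals picks the mix for each token. The parameter names, shapes, and the way retrieval queries and keys are formed are assumptions made for illustration, not the authors' reference implementation.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matmul(a, b):
    # List-of-lists matrix product: (n x k) times (k x m) gives (n x m).
    return [[sum(row[t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for row in a]

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def search(x, wq, wk):
    # Search: scaled dot-product attention weights, one row per token.
    q, k = matmul(x, wq), matmul(x, wk)
    scale = math.sqrt(len(wq[0]))
    return [softmax([dot(qi, kj) / scale for kj in k]) for qi in q]

def compositional_attention(x, searches, retrievals, rq_mats, rk_mat):
    """x: n x d tokens; searches: list of (wq, wk); retrievals: list of wv.

    rq_mats holds one retrieval-query matrix per search (d x dr) and rk_mat
    maps retrieved features to retrieval keys (dv x dr); both are assumed.
    Returns one n x dv output per search.
    """
    outputs = []
    for (wq, wk), rq in zip(searches, rq_mats):
        attn = search(x, wq, wk)
        # Pair this search with every retrieval to get candidate outputs.
        candidates = [matmul(attn, matmul(x, wv)) for wv in retrievals]
        r_query = matmul(x, rq)
        r_keys = [matmul(c, rk_mat) for c in candidates]
        scale = math.sqrt(len(rk_mat[0]))
        head = []
        for t in range(len(x)):
            # Soft competition: per-token softmax over the retrievals.
            w = softmax([dot(r_query[t], rk[t]) / scale for rk in r_keys])
            head.append([sum(w[j] * candidates[j][t][c] for j in range(len(candidates)))
                         for c in range(len(candidates[0][0]))])
        outputs.append(head)
    return outputs
```

With a single retrieval the competition is trivial and each output reduces to ordinary attention for that search, which is a quick sanity check on the sketch. The number of searches and the number of retrievals are independent arguments, mirroring the independent scaling mentioned in the abstract.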
Panupong Pasupat, Yuan Zhang, Kelvin Guu
In practical applications of semantic parsing, we often want to rapidly
change the behavior of the parser, such as enabling it to handle queries in a
new domain, or changing its predictions on certain targeted queries. While we
can introduce new training examples exhibiting the target behavior, a mechanism
for enacting such behavior changes without expensive model re-training would be
preferable. To this end, we propose ControllAble Semantic Parser via Exemplar
Retrieval (CASPER). Given an input query, the parser retrieves related
exemplars from a retrieval index, augments them to the query, and then applies
a generative seq2seq model to produce an output parse. The exemplars act as a
control mechanism over the generic generative model: by manipulating the
retrieval index or how the augmented query is constructed, we can manipulate
the behavior of the parser. On the MTOP dataset, in addition to achieving
state-of-the-art on the standard setup, we show that CASPER can parse queries
in a new domain, adapt the prediction toward the specified patterns, or adapt
to new semantic schemas without having to further re-train the model.
Authors' comments: EMNLP 2021
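The retrieve-then-augment step described in the abstract above can be sketched as follows. The token-overlap retriever, the `@@` and `##` separators, and the example parses are all hypothetical stand-ins chosen for illustration; the abstract does not specify the retriever or the augmented-query format, and the seq2seq model itself is omitted.

```python
def token_overlap(a, b):
    """Jaccard overlap of lowercase word sets (a toy similarity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def retrieve_exemplars(query, index, k=2):
    """Return the k (utterance, parse) pairs most similar to the query."""
    ranked = sorted(index, key=lambda ex: token_overlap(query, ex[0]), reverse=True)
    return ranked[:k]

def augment(query, exemplars):
    """Concatenate the query with its exemplars into one model input string."""
    parts = [query] + [f"{utterance} ## {parse}" for utterance, parse in exemplars]
    return " @@ ".join(parts)

# Hypothetical retrieval index of (utterance, parse) pairs.
INDEX = [
    ("set an alarm for 7 am", "[IN:CREATE_ALARM [SL:DATE_TIME for 7 am ] ]"),
    ("play some jazz", "[IN:PLAY_MUSIC [SL:MUSIC_GENRE jazz ] ]"),
    ("what is the weather", "[IN:GET_WEATHER ]"),
]
```

For the query "set an alarm for 9 am", `retrieve_exemplars(query, INDEX, k=1)` returns the alarm exemplar, and `augment` places it after the query. Editing `INDEX` changes what the generator sees without any re-training, which is the control mechanism the abstract describes.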
Bhargavi Paranjape, Matthew Lamm, Ian Tenney
Deep NLP models have been shown to learn spurious correlations, leaving them
brittle to input perturbations. Recent work has shown that counterfactual or
contrastive data -- i.e. minimally perturbed inputs -- can reveal these
weaknesses, and that data augmentation using counterfactuals can help
ameliorate them. Proposed techniques for generating counterfactuals rely on
human annotations, perturbations based on simple heuristics, and meaning
representation frameworks. We focus on the task of creating counterfactuals for
question answering, which presents unique challenges related to world
knowledge, semantic diversity, and answerability. To address these challenges,
we develop a Retrieve-Generate-Filter (RGF) technique to create counterfactual
evaluation and training data with minimal human supervision. Using an
open-domain QA framework and question generation model trained on original task
data, we create counterfactuals that are fluent, semantically diverse, and
automatically labeled. Data augmentation with RGF counterfactuals improves
performance on out-of-domain and challenging evaluation sets over and above
existing methods, in both the reading comprehension and open-domain QA
settings. Moreover, we find that RGF data leads to significant improvements in
a model's robustness to local perturbations.
Authors' comments: ACL 2022 Camera-ready version
Tian Lan, Deng Cai, Yan Wang, Yixuan Su, Heyan Huang, Xian-Ling Mao
Recent progress in deep learning has continuously improved the accuracy of
dialogue response selection. In particular, sophisticated neural network
architectures are leveraged to capture the rich interactions between dialogue
context and response candidates. While remarkably effective, these models also
bring in a steep increase in computational cost. Consequently, such models can
only be used as a re-rank module in practice. In this study, we present a
solution to directly select proper responses from a large corpus or even a
nonparallel corpus that only consists of unpaired sentences, using a dense
retrieval model. To push the limits of dense retrieval, we design an
interaction layer upon the dense retrieval models and apply a set of
tailor-designed learning strategies. Our model shows superiority over strong
baselines on the conventional re-rank evaluation setting, which is remarkable
given its efficiency. To verify the effectiveness of our approach in realistic
scenarios, we also conduct full-rank evaluation, where the target is to select
proper responses from a full candidate pool that may contain millions of
candidates and evaluate them fairly through human annotations. Our proposed
model notably outperforms pipeline baselines that integrate fast recall and
expressive re-rank modules. Human evaluation results show that enlarging the
candidate pool with nonparallel corpora improves response quality further.
Authors' comments: 11 pages, 4 figures, 6 tables
Sulaiman Adesegun Kukoyi, O. F. W Onifade, Kamorudeen A. Amuda
Voice information retrieval is a technique that gives an information retrieval system the capacity to transcribe spoken queries and use the text output for information search. Collaborative information seeking (CIS) is a field of research that studies the situations, motivations, and methods of people working in a collaborative group on information-seeking projects, as well as the building of systems to support such activities. Humans find it easier to communicate and express ideas via speech. Existing voice search services, such as Google's and other mainstream offerings, do not support collaborative search. Spoken queries are passed through an automatic speech recognition (ASR) component that uses MFCC for feature extraction and an HMM with the Viterbi algorithm for pattern matching. The ASR output is then passed as input to the CIS system, and the results are filtered to produce an aggregate result. Simulation results show that our model achieved 81.25% transcription accuracy.
Shiv Ram Dubey, Satish Kumar Singh, Wei-Ta Chu
Deep learning has driven tremendous growth in hashing techniques for image
retrieval. Recently, the Transformer has emerged as a new architecture that
utilizes self-attention without convolution. The Transformer has also been
extended to the Vision Transformer (ViT) for visual recognition, with promising
performance on ImageNet. In this paper, we propose Vision Transformer based
Hashing (VTS) for image retrieval. We utilize a ViT pre-trained on ImageNet as
the backbone network and add a hashing head. The proposed VTS model is
fine-tuned for hashing under six different image retrieval frameworks, including
Deep Supervised Hashing (DSH), HashNet, GreedyHash, Improved Deep Hashing
Network (IDHN), Deep Polarized Network (DPN) and Central Similarity Quantization
(CSQ), with their objective functions. We perform extensive experiments on the
CIFAR10, ImageNet, NUS-Wide, and COCO datasets. The proposed VTS-based image
retrieval outperforms recent state-of-the-art hashing techniques by a large
margin. We also find that the proposed VTS model is a better backbone network
than existing networks such as AlexNet and ResNet. The code is released at
https://github.com/shivram1987/VisionTransformerHashing.
Authors' comments: Accepted in IEEE International Conference on Multimedia and Expo
(ICME), 2022
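For readers unfamiliar with hashing-based retrieval, the sketch below shows the generic retrieval step that follows a hashing head: real-valued outputs are binarized by sign, and database items are ranked by Hamming distance to the query code. It is a generic illustration, not code from the linked repository, and the thresholding at zero is an assumption.

```python
def binarize(outputs):
    """Turn real-valued hashing-head outputs into a binary code (sign threshold)."""
    return [1 if value >= 0 else 0 for value in outputs]

def hamming(code_a, code_b):
    """Number of bit positions where two equal-length codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))

def rank_by_hamming(query_code, database_codes):
    """Return database indices ordered from nearest to farthest code."""
    return sorted(range(len(database_codes)),
                  key=lambda i: hamming(query_code, database_codes[i]))
```

Because `sorted` is stable, items at the same Hamming distance keep their database order. In practice the codes would be packed into integers and compared with XOR and a popcount, but the list form keeps the logic visible.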
Vivek Gupta, Akshat Shrivastava, Adithya Sagar, Armen Aghajanyan, Denis Savenkov
While large pre-trained language models accumulate a lot of knowledge in
their parameters, it has been demonstrated that augmenting them with
non-parametric retrieval-based memory has a number of benefits, from accuracy
improvements to data efficiency, for knowledge-focused tasks such as question
answering. In this paper, we apply retrieval-based modeling ideas to the
problem of multi-domain task-oriented semantic parsing for conversational
assistants. Our approach, RetroNLU, extends a sequence-to-sequence model
architecture with a retrieval component, used to fetch existing similar
examples and provide them as an additional input to the model. In particular,
we analyze two settings, where we augment an input with (a) retrieved nearest
neighbor utterances (utterance-nn), and (b) ground-truth semantic parses of
nearest neighbor utterances (semparse-nn). Our technique outperforms the
baseline method by 1.5% absolute macro-F1, especially in the low-resource
setting, where it matches the baseline model accuracy with only 40% of the data.
Furthermore, we analyze the nearest neighbor retrieval component's quality,
model sensitivity and break down the performance for semantic parses of
different utterance complexity.
Authors' comments: 12 pages, 9 figures, 5 Tables
Giuseppe Ortolano, Pauline Boucher, Ivano Ruo Berchera, Silvania F. Pereira, Marco Genovese
Quantum correlations, such as entanglement and squeezing, have been shown to improve phase estimation in interferometric setups on the one hand, and non-interferometric imaging of amplitude objects on the other. In the latter case, quantum correlation between a pair of beams leads to a sub-shot-noise readout of the image intensity pattern, in which weak details, otherwise hidden in the noise, can be discerned. In this paper we propose a technique that exploits entanglement to enhance quantitative phase retrieval of an object in a non-interferometric setting, i.e., by measuring only the propagated intensity pattern after interaction with the object. The method exploits existing technology; it operates in wide-field mode, so it does not require time-consuming raster scanning, and it can operate with small spatial coherence of the incident field. This protocol can find application in optical microscopy and X-ray imaging, reducing the photon dose necessary to achieve a fixed signal-to-noise ratio.
Ievgeniia Kuzminykh, Dan Shevchuk, Stavros Shiaeles, Bogdan Ghita
Modern streaming services are increasingly labeling videos based on their
visual or audio content. This typically augments the use of technologies such
as AI and ML by allowing natural speech to be used for searching by keywords
and video descriptions. Prior research has successfully provided a number of
solutions for speech-to-text in the case of human speech, but this article
aims to investigate possible solutions for retrieving sound events based on a
natural language query, and to estimate how effective and accurate they are. In
this study, we specifically focus on the YamNet, AlexNet, and ResNet-50
pre-trained models to automatically classify audio samples using their
respective melspectrograms into a number of predefined classes. The predefined
classes can represent sounds associated with actions within a video fragment.
Two tests are conducted to evaluate the performance of the models on two
separate problems: audio classification and intervals retrieval based on a
natural language query. Results show that the benchmarked models are comparable
in terms of performance, with YamNet slightly outperforming the other two
models. YamNet was able to classify single fixed-size audio samples with 92.7%
accuracy and 68.75% precision while its average accuracy on intervals retrieval
was 71.62% and precision was 41.95%. The investigated method may be embedded
into an automated event marking architecture for streaming services.
Authors' comments: 20th International Conference on Next Generation Teletraffic and
Wired/Wireless Advanced Networks and Systems, NEW2AN 2020 and 13th Conference
on the Internet of Things and Smart Spaces, ruSMART 2020
Yohan Jo, Haneul Yoo, JinYeong Bak, Alice Oh, Chris Reed, Eduard Hovy
Finding counterevidence to statements is key to many tasks, including
counterargument generation. We build a system that, given a statement,
retrieves counterevidence from diverse sources on the Web. At the core of this
system is a natural language inference (NLI) model that determines whether a
candidate sentence is valid counterevidence or not. Most NLI models to date,
however, lack proper reasoning abilities necessary to find counterevidence that
involves complex inference. Thus, we present a knowledge-enhanced NLI model
that aims to handle causality- and example-based inference by incorporating
knowledge graphs. Our NLI model outperforms baselines for NLI tasks, especially
for instances that require the targeted inference. In addition, this NLI model
further improves the counterevidence retrieval system, notably finding complex
counterevidence better.
Authors' comments: To appear in Findings of EMNLP 2021
Christopher Sciavolino, Zexuan Zhong, Jinhyuk Lee, Danqi Chen
Open-domain question answering has exploded in popularity recently due to the
success of dense retrieval models, which have surpassed sparse models using
only a few supervised training examples. However, in this paper, we demonstrate
that current dense models are not yet the holy grail of retrieval. We first
construct EntityQuestions, a set of simple, entity-rich questions based on
facts from Wikidata (e.g., "Where was Arve Furset born?"), and observe that
dense retrievers drastically underperform sparse methods. We investigate this
issue and uncover that dense retrievers can only generalize to common entities
unless the question pattern is explicitly observed during training. We discuss
two simple solutions towards addressing this critical problem. First, we
demonstrate that data augmentation is unable to fix the generalization problem.
Second, we argue a more robust passage encoder helps facilitate better question
adaptation using specialized question encoders. We hope our work can shed light
on the challenges in creating a robust, universal dense retriever that works
well across different input distributions.
Authors' comments: EMNLP 2021. The code and data are publicly available at
https://github.com/princeton-nlp/EntityQuestions
Ying Wang, Tingzhen Liu, Zepeng Bu, Yuhui Huang, Lizhong Gao, Qiao Wang
In large-scale image retrieval, many indexing methods have been proposed to
narrow down the search scope of retrieval. The features extracted from
images are usually high-dimensional or of variable size due to the existence of
key points. Most existing index structures suffer from the curse of
dimensionality, the variable feature size and/or the loss of semantic
similarity. In this paper, a new classification-based indexing structure, called
Semantic Indexing Structure (SIS), is proposed, in which we utilize semantic
categories rather than clustering centers to create database partitions, such
that the proposed index SIS can be combined with feature extractors without
any restriction on dimensions. In addition, it is observed that the size of each
semantic partition is positively correlated with the semantic distribution of
the database. Along the way, we found that when the partition number is
normalized to five, the proposed algorithm performed very well in all the tests.
Compared with state-of-the-art models, SIS achieves outstanding performance.
Authors' comments: 12 pages, 6 figures
Aashi Jain, Mandy Guo, Krishna Srinivasan, Ting Chen, Sneha Kudugunta, Chao Jia, Yinfei Yang, Jason Baldridge
Both image-caption pairs and translation pairs provide the means to learn deep representations of and connections between languages. We use both types of pairs in MURAL (MUltimodal, MUltitask Representations Across Languages), a dual encoder that solves two tasks: 1) image-text matching and 2) translation pair matching. By incorporating billions of translation pairs, MURAL extends ALIGN (Jia et al. PMLR'21)--a state-of-the-art dual encoder learned from 1.8 billion noisy image-text pairs. When using the same encoders, MURAL's performance matches or exceeds ALIGN's cross-modal retrieval performance on well-resourced languages across several datasets. More importantly, it considerably improves performance on under-resourced languages, showing that text-text learning can overcome a paucity of image-caption examples for these languages. On the Wikipedia Image-Text dataset, for example, MURAL-base improves zero-shot mean recall by 8.1% on average for eight under-resourced languages and by 6.8% on average when fine-tuning. We additionally show that MURAL's text representations cluster not only with respect to genealogical connections but also based on areal linguistics, such as the Balkan Sprachbund.
Md Rizwan Parvez, Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang
Software developers write a lot of source code and documentation during
software development. Developers often recall parts of source
code or code summaries that they have written in the past while implementing
or documenting software. To mimic developers' code or summary generation
behavior, we propose a retrieval-augmented framework, REDCODER, that retrieves
relevant code or summaries from a retrieval database and provides them as a
supplement to code generation or summarization models. REDCODER is unique in
two respects. First, it extends the state-of-the-art dense retrieval technique to
search for relevant code or summaries. Second, it can work with retrieval
databases that include unimodal (only code or natural language description) or
bimodal instances (code-description pairs). We conduct experiments and
extensive analysis on two benchmark datasets of code generation and
summarization in Java and Python, and the promising results endorse the
effectiveness of our proposed retrieval augmented framework.
Authors' comments: accepted in EMNLP-Findings 2021
Antoine Louis, Gerasimos Spanakis
Statutory article retrieval is the task of automatically retrieving law
articles relevant to a legal question. While recent advances in natural
language processing have sparked considerable interest in many legal tasks,
statutory article retrieval remains primarily untouched due to the scarcity of
large-scale and high-quality annotated datasets. To address this bottleneck, we
introduce the Belgian Statutory Article Retrieval Dataset (BSARD), which
consists of 1,100+ French native legal questions labeled by experienced jurists
with relevant articles from a corpus of 22,600+ Belgian law articles. Using
BSARD, we benchmark several state-of-the-art retrieval approaches, including
lexical and dense architectures, both in zero-shot and supervised setups. We
find that fine-tuned dense retrieval models significantly outperform other
systems. Our best performing baseline achieves 74.8% R@100, which is promising
for the feasibility of the task and indicates there is still room for
improvement. Given the specificity of the domain and the task addressed, BSARD
presents a unique challenge for future research on legal information
retrieval. Our dataset and source code are publicly available.
Authors' comments: ACL 2022. Code and dataset are available at
https://github.com/maastrichtlawtech/bsard
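The R@100 figure quoted above is a recall-at-k measure. A minimal sketch of the standard per-question definition follows; the article identifiers are made up, and whether the benchmark averages per-question recall in exactly this way is an assumption here.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant articles that appear in the top-k ranking."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("recall is undefined without relevant items")
    hits = len(set(ranked_ids[:k]) & relevant)
    return hits / len(relevant)

def mean_recall_at_k(runs, k):
    """Average recall@k over (ranked_ids, relevant_ids) pairs, one per question."""
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)
```

For a ranking `["a12", "a7", "a3", "a99"]` with relevant set `{"a7", "a99", "a50"}`, recall at 2 is 1/3 and recall at 4 is 2/3, since "a50" is never retrieved.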
Nicola Tonellotto, Craig Macdonald
Recent advances in dense retrieval techniques have offered the promise of being able not just to re-rank documents using contextualised language models such as BERT, but also to use such models to identify documents from the collection in the first place. However, when using dense retrieval approaches that use multiple embedded representations for each query, a large number of documents can be retrieved for each query, hindering the efficiency of the method. Hence, this work is the first to consider efficiency improvements in the context of a dense retrieval approach (namely ColBERT), by pruning query term embeddings that are estimated not to be useful for retrieving relevant documents. Our proposed query embedding pruning reduces the cost of the dense retrieval operation, as well as reducing the number of documents that are retrieved and hence need to be fully scored. Experiments conducted on the MSMARCO passage ranking corpus demonstrate that, when reducing the number of query embeddings used from 32 to 3 based on the collection frequency of the corresponding tokens, query embedding pruning results in no statistically significant differences in effectiveness, while reducing the number of documents retrieved by 70%. In terms of mean response time for the end-to-end system, this results in a 2.65x speedup.
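A minimal sketch of pruning by collection frequency is given below. It works on tokens as a stand-in for their embeddings and assumes that the rarest tokens are the ones kept; the abstract only states that selection is based on collection frequency, and the frequency values shown are invented for the example.

```python
def prune_query_tokens(tokens, collection_freq, keep=3):
    """Keep the `keep` query tokens with the lowest collection frequency.

    Tokens missing from `collection_freq` are treated as frequency 0
    (an assumption). The surviving tokens are returned in query order.
    """
    by_rarity = sorted(range(len(tokens)),
                       key=lambda i: collection_freq.get(tokens[i], 0))
    kept_positions = sorted(by_rarity[:keep])
    return [tokens[i] for i in kept_positions]

# Hypothetical collection frequencies.
FREQ = {
    "what": 900_000, "is": 5_000_000, "the": 9_000_000, "capital": 40_000,
    "of": 8_000_000, "burkina": 900, "faso": 800,
}
```

With the query tokens `["what", "is", "the", "capital", "of", "burkina", "faso"]`, the function keeps `["capital", "burkina", "faso"]`. Only the embeddings of those tokens would then be used for the nearest-neighbour lookups, which is where the reduction in retrieved documents comes from.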
Yuhao Zhou, Huanhuan Fan, Shuang Gao, Yuchen Yang, Xudong Zhang, Jijunnan Li, Yandong Guo
Accurate visual re-localization is critical to many artificial
intelligence applications, such as augmented reality, virtual reality, robotics
and autonomous driving. To accomplish this task, we propose an integrated
visual re-localization method called RLOCS by combining image retrieval,
semantic consistency and geometry verification to achieve accurate estimations.
The localization pipeline is designed as a coarse-to-fine paradigm. In the
retrieval part, we cascade the architecture of ResNet101-GeM-ArcFace and employ
DBSCAN followed by spatial verification to obtain a better initial coarse pose.
We design a module called observation constraints, which combines geometry
information and semantic consistency for filtering outliers. Comprehensive
experiments are conducted on open datasets, including retrieval on R-Oxford5k
and R-Paris6k, semantic segmentation on Cityscapes, localization on Aachen
Day-Night and InLoc. By modifying individual modules of the overall
pipeline, our method achieves performance improvements on the challenging
localization benchmarks.
Authors' comments: Accepted by the 2021 International Conference on Robotics and
Automation (ICRA2021)