Huang Xie, Samuel Lipping, Tuomas Virtanen
Language-based audio retrieval is a task in which natural language textual
captions are used as queries to retrieve audio signals from a dataset. It was
first introduced in the DCASE 2022 Challenge as Subtask 6B of Task 6, which
aims at developing computational systems to model relationships between audio
signals and free-form textual descriptions. Compared with audio captioning
(Subtask 6A), which is about generating audio captions for audio signals,
language-based audio retrieval (Subtask 6B) focuses on ranking audio signals
according to their relevance to natural language textual captions. In the DCASE
2022 Challenge, the provided baseline system for Subtask 6B was significantly
outperformed, with the top system reaching 0.276 in mAP@10. This paper presents
the outcome of Subtask 6B in terms of submitted systems' performance and
analysis.
Authors' comments: Accepted at DCASE 2022 Workshop
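The mAP@10 metric quoted in the abstract above can be illustrated with a minimal sketch. The function names, the toy clip identifiers, and the normalization by `min(number of relevant items, k)` are assumptions made for illustration; the challenge's official evaluation code may differ in detail.

```python
def average_precision_at_k(ranked_ids, relevant_ids, k=10):
    """AP@k for one query: precision is accumulated at each rank that holds a relevant item."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    # Assumes ranked_ids contains no duplicates.
    for rank, item in enumerate(ranked_ids[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    # Assumed normalization: the best achievable number of hits within the top k.
    return precision_sum / min(len(relevant), k)


def mean_average_precision_at_k(rankings, ground_truth, k=10):
    """Mean of AP@k over all queries (captions)."""
    scores = [average_precision_at_k(rankings[q], ground_truth[q], k) for q in rankings]
    return sum(scores) / len(scores)


# Hypothetical toy data: two caption queries, each with one relevant audio clip.
rankings = {"caption_1": ["clip_a", "clip_b", "clip_c"],
            "caption_2": ["clip_x", "clip_y", "clip_z"]}
ground_truth = {"caption_1": ["clip_b"], "caption_2": ["clip_z"]}
print(mean_average_precision_at_k(rankings, ground_truth))
```

When each query has exactly one relevant item, AP@10 reduces to the reciprocal of that item's rank, or zero if it falls outside the top 10.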
Lorenzo Leone, Salvatore F. E. Oliviero, Stefano Piemontese, Sarah True, Alioscia Hamma
In a seminal paper [JHEP09(2007)120], Hayden and Preskill showed that information can be retrieved from a black hole that is sufficiently scrambling, assuming that the retriever has perfect control of the emitted Hawking radiation and perfect knowledge of the internal dynamics of the black hole. In this paper, we show that for $t$-doped Clifford black holes - that is, black holes modeled by random Clifford circuits doped with an amount $t$ of non-Clifford resources - an information retrieval decoder can be learned with fidelity scaling as $\exp(-\alpha t)$ using quantum machine learning while having access only to out-of-time-order correlation functions. We show that the crossover between learnability and non-learnability is driven by the amount of non-stabilizerness present in the black hole and sketch a different approach to quantum complexity.
M. C. Diamantini, C. A. Trugenberger
We show that a quantum state can be perfectly cloned up to global mirroring with a unitary transformation that depends on a single parameter. We then show that this is equivalent to "perfect" cloning for quantum associative memories which, as a consequence, efficiently hold exponentially more information than their classical counterparts. Finally, we present a quantum associative retrieval algorithm which can correct corrupted inputs and is exponentially faster than the Grover algorithm.
Hyeonsu B. Kang, Sheshera Mysore, Kevin Huang, Haw-Shiuan Chang, Thorben Prein, Andrew McCallum, Aniket Kittur, Elsa Olivetti
Exposure to ideas in domains outside a scientist's own may benefit her in
reformulating existing research problems in novel ways and discovering new
application domains for existing solution ideas. While improved performance in
scholarly search engines can help scientists efficiently identify relevant
advances in domains they may already be familiar with, it may fall short of
helping them explore diverse ideas \textit{outside} such domains. In this paper
we explore the design of systems aimed at augmenting the end-user ability in
cross-domain exploration with flexible query specification. To this end, we
develop an exploratory search system in which end-users can select a portion of
text core to their interest from a paper abstract and retrieve papers that have
a high similarity to the user-selected core aspect but differ in terms of
domains. Furthermore, end-users can `zoom in' to specific domain clusters to
retrieve more papers from them and understand nuanced differences within the
clusters. Our case studies with scientists uncover opportunities and design
implications for systems aimed at facilitating cross-domain exploration and
inspiration.
Authors' comments: NLP+HCI Workshop at NAACL 2022
Dingmin Wang, Shengchao Liu, Hanchen Wang, Bernardo Cuenca Grau, Linfeng Song, Jian Tang, Song Le, Qi Liu
Graph Neural Networks (GNNs) are effective tools for graph representation
learning. Most GNNs rely on a recursive neighborhood aggregation scheme called
message passing, so their theoretical expressive power is bounded by the
first-order Weisfeiler-Lehman test (1-WL). An effective approach to this
challenge is to explicitly retrieve annotated examples and use them to enhance GNN
models. While retrieval-enhanced models have proved effective in
many language and vision domains, it remains an open question how effective
retrieval-enhanced GNNs are when applied to graph datasets. Motivated by this,
we want to explore how the retrieval idea can help augment the useful
information learned in the graph neural networks, and we design a
retrieval-enhanced scheme called GRAPHRETRIEVAL, which is agnostic to the
choice of graph neural network models. In GRAPHRETRIEVAL, for each input graph,
similar graphs together with their ground-truth labels are retrieved from an
existing database. These retrieved examples can then act as an enhancement for
various graph property prediction tasks. We conduct comprehensive experiments
over 13 datasets, and we observe that GRAPHRETRIEVAL is able to reach
substantial improvements over existing GNNs. Moreover, our empirical study also
illustrates that retrieval enhancement is a promising remedy for alleviating
the long-tailed label distribution problem.
Authors' comments: Accepted by ECAI 2023
Man Luo
An information retriever (IR) aims to find the documents (e.g.
snippets, passages, and articles) relevant to a given query at large scale. IR plays an
important role in many tasks such as open domain question answering and
dialogue systems, where external knowledge is needed. In the past, search
algorithms based on term matching have been widely used. Recently, neural
algorithms (termed neural retrievers), which can mitigate the limitations of
traditional methods, have gained more attention. Despite the success
achieved by neural retrievers, they still face many challenges, e.g. suffering
from a small amount of training data and failing to answer simple
entity-centric questions. Furthermore, most existing neural retrievers
are developed for pure-text queries. This prevents them from handling
multi-modality queries (i.e. queries composed of a textual description and
images). This proposal has two goals. First, we introduce methods to address
the abovementioned issues of neural retrievers from three angles: new model
architectures, IR-oriented pretraining tasks, and the generation of large-scale
training data. Second, we identify future research directions and propose
potential corresponding solutions.
Authors' comments: Accepted to NAACL 2022 SRW
Tobias Uelwer, Sebastian Konietzny, Stefan Harmeling
Phase retrieval is the problem of reconstructing images from magnitude-only
measurements. In many real-world applications the problem is underdetermined.
When training data is available, generative models allow optimization in a
lower-dimensional latent space, thereby constraining the solution set to those
images that can be synthesized by the generative model. However, not all
possible solutions are within the range of the generator. Instead, they are
represented with some error. To reduce this representation error in the context
of phase retrieval, we first leverage a novel variation of intermediate layer
optimization (ILO) to extend the range of the generator while still producing
images consistent with the training data. Second, we introduce new
initialization schemes that further improve the quality of the reconstruction.
With extensive experiments on the Fourier phase retrieval problem and thorough
ablation studies, we show the benefits of our modified ILO and the new
initialization schemes. Additionally, we analyze the performance of our
approach on the Gaussian phase retrieval problem.
Authors' comments: Published in Transactions on Machine Learning Research (TMLR). First
two authors contributed equally
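Why magnitude-only measurements leave the problem underdetermined can be seen in a small one-dimensional sketch. It is only an illustration of the ambiguity, not the paper's method: the naive DFT, the function name, and the toy signal are assumptions. A circularly shifted or reversed copy of a real signal produces exactly the same Fourier magnitudes as the original.

```python
import cmath


def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform, computed naively in O(n^2)."""
    n = len(signal)
    mags = []
    for k in range(n):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        mags.append(abs(coeff))
    return mags


# Hypothetical toy signal standing in for an image row.
signal = [1.0, 3.0, 0.0, 2.0, 5.0, 0.0]
shifted = signal[2:] + signal[:2]   # circular shift by two samples
flipped = signal[::-1]              # reversal of the real-valued signal

reference = dft_magnitudes(signal)
for name, candidate in [("shifted", shifted), ("flipped", flipped)]:
    same = all(abs(a - b) < 1e-9
               for a, b in zip(reference, dft_magnitudes(candidate)))
    print(name, "has identical Fourier magnitudes:", same)
```

Because several distinct signals share one set of measurements, a prior such as the range of a generative model is needed to pick a plausible reconstruction.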
Jin Chen, Defu Lian, Yucheng Li, Baoyun Wang, Kai Zheng, Enhong Chen
Recommender retrievers aim to rapidly retrieve a fraction of items from the
entire item corpus in response to a user query; the representative approach is a
two-tower model trained with the log softmax loss. To train
recommender retrievers efficiently on modern hardware, inbatch sampling, where the items
in the mini-batch are shared as negatives to estimate the softmax function, has
attracted growing interest. However, existing inbatch sampling based strategies
only correct the sampling bias of inbatch items with item frequency; they are
unable to distinguish the user queries within the mini-batch and still
incur significant bias from the softmax. In this paper, we propose a
Cache-Augmented Inbatch Importance Resampling (XIR) for training recommender
retrievers, which not only offers different negatives to user queries with
inbatch items, but also adaptively achieves a more accurate estimation of the
softmax distribution. Specifically, XIR resamples items for the given
mini-batch training pairs based on certain probabilities, where a cache with
more frequently sampled items is adopted to augment the candidate item set,
with the purpose of reusing the historical informative samples. XIR makes it possible to
sample query-dependent negatives based on inbatch items and to capture dynamic
changes of model training, which leads to a better approximation of the softmax
and further contributes to better convergence. Finally, we conduct experiments
to validate the superior performance of the proposed XIR compared with
competitive approaches.
Authors' comments: 18 pages
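The frequency-corrected inbatch softmax that the abstract above describes as the existing baseline can be sketched as follows. This is not XIR itself: the function name, the list-of-lists score layout, and the use of a log-probability correction are assumptions made for illustration.

```python
import math


def inbatch_softmax_loss(scores, item_probs):
    """Mean in-batch softmax loss with a frequency-based bias correction.

    scores[i][j]  : similarity between query i and item j of the same mini-batch;
                    item i is assumed to be the positive for query i.
    item_probs[j] : estimated probability that item j appears in a batch.
    """
    n = len(scores)
    total = 0.0
    for i in range(n):
        # Subtract log-probabilities so frequent items are not over-penalized.
        corrected = [scores[i][j] - math.log(item_probs[j]) for j in range(n)]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(corrected)
        log_z = m + math.log(sum(math.exp(c - m) for c in corrected))
        total += log_z - corrected[i]
    return total / n


# Hypothetical batch of two (query, item) pairs with equally frequent items.
print(inbatch_softmax_loss([[5.0, 0.0], [0.0, 5.0]], [0.5, 0.5]))
```

Note that every query in the batch shares the same set of negatives here, which is the limitation the abstract attributes to this family of strategies.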
Hernan C. Vazquez, J. Andres Diaz Pace, Claudia Marcos, Santiago Vidal
The selection of software technologies is an important but complex task. We consider developers of JavaScript (JS) applications, for whom the assessment of JS libraries has become difficult and time-consuming due to the growing number of technology options available. A common strategy is to browse software repositories via search engines (e.g., NPM or Google), although this brings some problems. First, given a technology need, the engines might return a long list of results, which often causes information overload. Second, the results should be ranked according to criteria of interest for the developer. However, deciding how to weight these criteria to make a decision is not straightforward. In this work, we propose a two-phase approach for assisting developers in retrieving and ranking JS technologies in a semi-automated fashion. The first phase (ST-Retrieval) uses a meta-search technique for collecting JS technologies that meet the developer's needs. The second phase (ST-Rank) relies on a machine learning technique to infer, based on criteria used by other projects on the Web, a ranking of the output of ST-Retrieval. We evaluated our approach with NPM and obtained satisfactory results in terms of the accuracy of the technologies retrieved and the order in which they were ranked.
Sanku Satya Uday, Satti Thanuja Pavani, T. Jaya Lakshmi, Rohit Chivukula
The novel coronavirus disease (COVID-19) began in Wuhan, China, in late 2019 and to date has infected over 148M people worldwide, resulting in 3.12M deaths. On March 10, 2020, the World Health Organisation (WHO) declared it a global pandemic. Many academics and researchers started to publish papers describing the latest discoveries on COVID-19. The large influx of publications made it hard for other researchers to go through the large amount of data and find the papers relevant to their research. The proposed model therefore attempts to extract relevant titles from the large corpus of research publications, which eases the researchers' job. The Allen Institute for AI released the CORD-19 dataset, which consists of 200,000 journal articles on coronavirus-related research from PubMed's PMC, WHO, bioRxiv, and medRxiv pre-prints. Along with this document corpus, they also provided a topics dataset named topics-rnd3 consisting of a list of topics. Each topic has three representations: query, question, and narrative. These datasets are open for research, and a TREC-COVID competition was also released on Kaggle. Using these topics as queries, our goal is to find the relevant documents in the CORD-19 dataset for each topic posed in topics-rnd3. The proposed model uses Natural Language Processing (NLP) techniques such as Bag-of-Words, Average Word2Vec, Average BERT Base, and Tf-Idf weighted Word2Vec to construct vectors for the query, question, narrative, and combinations of them. Vectors are constructed in the same way for the titles in the CORD-19 dataset. Cosine similarity is then computed between pairs of vectors, which helps us find the relevant documents for a given topic.
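The Bag-of-Words variant of the pipeline described above can be sketched in a few lines. The whitespace tokenizer, the function names, and the example query and titles are hypothetical and are not taken from CORD-19 or topics-rnd3.

```python
import math
from collections import Counter


def bow_vector(text):
    """Bag-of-Words vector as a term -> count mapping (naive whitespace tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_titles(query, titles):
    """Return (score, title) pairs sorted from most to least similar to the query."""
    q = bow_vector(query)
    scored = [(cosine_similarity(q, bow_vector(t)), t) for t in titles]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical query and titles for illustration only.
titles = ["origin of the coronavirus", "deep learning for images"]
for score, title in rank_titles("coronavirus origin", titles):
    print(round(score, 3), title)
```

Swapping `bow_vector` for an averaged word-embedding or Tf-Idf weighted representation would leave the ranking step unchanged, provided `cosine_similarity` is adapted to dense vectors.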
Xiang Chen, Lei Li, Ningyu Zhang, Xiaozhuan Liang, Shumin Deng, Chuanqi Tan, Fei Huang, Luo Si et al.
Prompt learning approaches have made waves in natural language processing by
inducing better few-shot performance, but they still follow a parametric
learning paradigm, in which forgetting and rote memorization may
lead to unstable generalization. Specifically, vanilla prompt learning
may struggle to utilize atypical instances by rote during fully-supervised
training or overfit shallow patterns with low-shot data. To alleviate such
limitations, we develop RetroPrompt with the motivation of decoupling knowledge
from memorization to help the model strike a balance between generalization and
memorization. In contrast with vanilla prompt learning, RetroPrompt constructs
an open-book knowledge-store from training instances and implements a retrieval
mechanism during the process of input, training and inference, thus equipping
the model with the ability to retrieve related contexts from the training
corpus as cues for enhancement. Extensive experiments demonstrate that
RetroPrompt can obtain better performance in both few-shot and zero-shot
settings. Besides, we further illustrate that our proposed RetroPrompt can
yield better generalization abilities with new datasets. Detailed analysis of
memorization indeed reveals that RetroPrompt can reduce the reliance of language
models on memorization, thus improving generalization for downstream tasks.
Code is available at
https://github.com/zjunlp/PromptKG/tree/main/research/RetroPrompt.
Authors' comments: NeurIPS 2022 (Spotlight)
Hui Wan, Siva Sankalp Patel, J. William Murdock, Saloni Potdar, Sachindra Joshi
Dialogue systems can benefit from being able to search through a corpus of
text to find information relevant to user requests, especially when
encountering a request for which no manually curated response is available. The
state-of-the-art technology for neural dense retrieval or re-ranking involves
deep learning models with hundreds of millions of parameters. However, it is
difficult and expensive to get such models to operate at an industrial scale,
especially for cloud services that often need to support a large number of
individually customized dialogue systems, each with its own text corpus. We
report our work on enabling advanced neural dense retrieval systems to operate
effectively at scale on relatively inexpensive hardware. We compare with
leading alternative industrial solutions and show that we can provide a
solution that is effective, fast, and cost-efficient.
Authors' comments: Accepted to appear in NAACL-HLT 2022 Industry Track
Wilson Silva, Maria Carvalho, Carlos Mavioso, Maria J. Cardoso, Jaime S. Cardoso
Treatments for breast cancer have continued to evolve and improve in recent years, resulting in a substantial increase in survival rates, with approximately 80\% of patients having a 10-year survival period. Given the serious impact that breast cancer treatments can have on a patient's body image, consequently affecting her self-confidence and sexual and intimate relationships, it is paramount to ensure that women receive the treatment that optimizes both survival and aesthetic outcomes. Currently, there is no gold standard for evaluating the aesthetic outcome of breast cancer treatment. In addition, there is no standard way to show patients the potential outcome of surgery. The presentation of similar cases from the past would be extremely important to manage women's expectations of the possible outcome. In this work, we propose a deep neural network to perform the aesthetic evaluation. As a proof-of-concept, we focus on a binary aesthetic evaluation. Besides its use for classification, this deep neural network can also be used to find the most similar past cases by searching for nearest neighbours in the highly semantic space before classification. We performed the experiments on a dataset consisting of 143 photos of women after conservative treatment for breast cancer. The results for accuracy and balanced accuracy showed the superior performance of our proposed model compared to the state of the art in aesthetic evaluation of breast cancer treatments. In addition, the model showed a good ability to retrieve similar previous cases, with the retrieved cases having the same or adjacent class (in the 4-class setting) and having similar types of asymmetry. Finally, a qualitative interpretability assessment was also performed to analyse the robustness and trustworthiness of the model.
Mujeen Sung, Jungsoo Park, Jaewoo Kang, Danqi Chen, Jinhyuk Lee
Recent developments of dense retrieval rely on quality representations of
queries and contexts from pre-trained query and context encoders. In this
paper, we introduce TOUR (Test-Time Optimization of Query Representations),
which further optimizes instance-level query representations guided by signals
from test-time retrieval results. We leverage a cross-encoder re-ranker to
provide fine-grained pseudo labels over retrieval results and iteratively
optimize query representations with gradient descent. Our theoretical analysis
reveals that TOUR can be viewed as a generalization of the classical Rocchio
algorithm for pseudo relevance feedback, and we present two variants that
leverage pseudo-labels as hard binary or soft continuous labels. We first apply
TOUR on phrase retrieval with our proposed phrase re-ranker, and also evaluate
its effectiveness on passage retrieval with an off-the-shelf re-ranker. TOUR
greatly improves end-to-end open-domain question answering accuracy, as well as
passage retrieval performance. TOUR also consistently improves direct
re-ranking by up to 2.0% while running 1.3-2.4x faster with an efficient
implementation.
Authors' comments: Findings of ACL 2023
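For reference, the classical Rocchio update that the abstract above says TOUR generalizes can be sketched as below. This is the textbook algorithm, not TOUR; the function name, the default weights, and the toy vectors are assumptions for illustration.

```python
def rocchio_update(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Classical Rocchio: move the query toward the centroid of relevant vectors
    and away from the centroid of non-relevant vectors."""
    def centroid(vectors):
        if not vectors:
            return [0.0] * len(query)
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    rel_c = centroid(relevant)
    non_c = centroid(non_relevant)
    return [alpha * q + beta * r - gamma * n
            for q, r, n in zip(query, rel_c, non_c)]


# Hypothetical 2-d example: in pseudo relevance feedback, the top-ranked results
# are treated as relevant without human judgments.
new_query = rocchio_update([1.0, 0.0],
                           relevant=[[0.0, 1.0], [0.0, 3.0]],
                           non_relevant=[[2.0, 0.0]],
                           alpha=1.0, beta=0.5, gamma=0.5)
print(new_query)
```

In the setting of the abstract, the relevant and non-relevant sets would come from re-ranker pseudo labels, and the single closed-form step would be replaced by iterative gradient updates of the query representation.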
Itay Harel, Hagai Taitelbaum, Idan Szpektor, Oren Kurland
We address the task of sentence retrieval for open-ended dialogues. The goal is to retrieve sentences from a document corpus that contain information useful for generating the next turn in a given dialogue. Prior work on dialogue-based retrieval focused on specific types of dialogues: either conversational QA or conversational search. To address a broader scope of this task where any type of dialogue can be used, we constructed a dataset that includes open-ended dialogues from Reddit, candidate sentences from Wikipedia for each dialogue, and human annotations for the sentences. We report the performance of several retrieval baselines, including neural retrieval models, over the dataset. To adapt neural models to the types of dialogues in the dataset, we explored an approach to induce large-scale weakly supervised training data from Reddit. Using this training set significantly improved the performance over training on the MS MARCO dataset.
Stanislav Dereka, Ivan Karpukhin, Sergey Kolesnikov
Large-scale datasets are essential for the success of deep learning in image retrieval. However, manual assessment errors and semi-supervised annotation techniques can lead to label noise even in popular datasets. As previous works primarily studied annotation quality in image classification tasks, it is still unclear how label noise affects deep learning approaches to image retrieval. In this work, we show that image retrieval methods are less robust to label noise than image classification ones. Furthermore, we, for the first time, investigate different types of label noise specific to image retrieval tasks and study their effect on model performance.
Qiuliang Ye, Li-Wen Wang, Daniel P. K. Lun
With the success of deep learning methods in many image processing tasks, deep learning approaches have also recently been introduced to the phase retrieval problem. These approaches differ from the traditional iterative optimization methods in that they usually require only one intensity measurement and can reconstruct phase images in real-time. However, because of tremendous domain discrepancy, the quality of the reconstructed images given by these approaches still has much room to improve to meet the general application requirements. In this paper, we design a novel deep neural network structure named SiSPRNet for phase retrieval based on a single Fourier intensity measurement. To effectively utilize the spectral information of the measurements, we propose a new feature extraction unit using the Multi-Layer Perceptron (MLP) as the front end. It allows all pixels of the input intensity image to be considered together for exploring their global representation. The size of the MLP is carefully designed to facilitate the extraction of the representative features while reducing noise and outliers. A dropout layer is also included to mitigate the possible overfitting problem in training the MLP. To promote the global correlation in the reconstructed images, a self-attention mechanism is introduced to the Up-sampling and Reconstruction (UR) blocks of the proposed SiSPRNet. These UR blocks are inserted into a residual learning structure to prevent the weak information flow and vanishing gradient problems due to their complex layer structure. Extensive evaluations of the proposed model are performed using different testing datasets of phase-only images and images with linearly related magnitude and phase. Experiments were conducted on an optical experimentation platform to understand the performance of different deep learning methods when working in a practical environment.
Zelong Zeng, Zheng Wang, Fan Yang, Shin'ichi Satoh
The large variation of viewpoint and irrelevant content around the target
always hinder accurate image retrieval and its subsequent tasks. In this paper,
we investigate an extremely challenging task: given a ground-view image of a
landmark, we aim to achieve cross-view geo-localization by searching out its
corresponding satellite-view images. Specifically, the challenge comes from the
gap between ground-view and satellite-view, which includes not only large
viewpoint changes (some parts of the landmark may be invisible from front view
to top view) but also highly irrelevant background (the target landmark tends to
be hidden in other surrounding buildings), making it difficult to learn a
common representation or a suitable mapping.
To address this issue, we take advantage of drone-view information as a
bridge between ground-view and satellite-view domains. We propose a Peer
Learning and Cross Diffusion (PLCD) framework. PLCD consists of three parts: 1)
a peer learning across ground-view and drone-view to find visible parts to
benefit ground-drone cross-view representation learning; 2) a patch-based
network for satellite-drone cross-view representation learning; 3) a cross
diffusion between ground-drone space and satellite-drone space. Extensive
experiments conducted on the University-Earth and University-Google datasets
show that our method outperforms state-of-the-art methods significantly.
Authors' comments: 13 pages, 10 figures
Zhiruo Wang, Zhengbao Jiang, Eric Nyberg, Graham Neubig
Tables are an important form of structured data for human and machine
readers alike, providing answers to questions that cannot, or cannot easily, be
found in texts. Recent work has designed special models and training paradigms
for table-related tasks such as table-based question answering and table
retrieval. Though effective, they add complexity in both modeling and data
acquisition compared to generic text solutions and obscure which elements are
truly beneficial. In this work, we focus on the task of table retrieval, and
ask: "is table-specific model design necessary for table retrieval, or can a
simpler text-based model be effectively used to achieve a similar result?"
First, we perform an analysis on a table-based portion of the Natural Questions
dataset (NQ-table), and find that structure plays a negligible role in more
than 70% of the cases. Based on this, we experiment with a general Dense
Passage Retriever (DPR) based on text and a specialized Dense Table Retriever
(DTR) that uses table-specific model designs. We find that DPR performs well
without any table-specific design and training, and even achieves superior
results compared to DTR when fine-tuned on properly linearized tables. We then
experiment with three modules to explicitly encode table structures, namely
auxiliary row/column embeddings, hard attention masks, and soft relation-based
attention biases. However, none of these yielded significant improvements,
suggesting that table-specific model design may not be necessary for table
retrieval.
Authors' comments: 11 pages total, 4 figures
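What "linearizing" a table for a text-only retriever might look like is sketched below. The abstract does not specify the format, so the function name, the `[SEP]` separator string, the `|` cell delimiter, and the example table are all assumptions for illustration.

```python
def linearize_table(title, header, rows, sep=" [SEP] ", cell_sep=" | "):
    """Flatten a table into one string: title, then header, then each row.

    The separator strings are hypothetical choices; a real system would use
    whatever special tokens its text encoder expects.
    """
    parts = [title, cell_sep.join(header)]
    parts.extend(cell_sep.join(str(cell) for cell in row) for row in rows)
    return sep.join(parts)


# Hypothetical table for illustration only.
text = linearize_table("Cities",
                       ["name", "population"],
                       [["Oslo", 700000], ["Bergen", 290000]])
print(text)
```

The resulting string can be fed to a generic passage encoder in the same way as any other text passage, which is the setting in which the abstract reports that a text-based retriever performs well.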
Keshav Santhanam, Omar Khattab, Christopher Potts, Matei Zaharia
Pre-trained language models are increasingly important components across
multiple information retrieval (IR) paradigms. Late interaction, introduced
with the ColBERT model and recently refined in ColBERTv2, is a popular paradigm
that holds state-of-the-art status across many benchmarks. To dramatically
speed up the search latency of late interaction, we introduce the
Performance-optimized Late Interaction Driver (PLAID). Without impacting
quality, PLAID swiftly eliminates low-scoring passages using a novel centroid
interaction mechanism that treats every passage as a lightweight bag of
centroids. PLAID uses centroid interaction as well as centroid pruning, a
mechanism for sparsifying the bag of centroids, within a highly-optimized
engine to reduce late interaction search latency by up to 7$\times$ on a GPU
and 45$\times$ on a CPU against vanilla ColBERTv2, while continuing to deliver
state-of-the-art retrieval quality. This allows the PLAID engine with ColBERTv2
to achieve latency of tens of milliseconds on a GPU and tens or just a few
hundred milliseconds on a CPU at large scale, even at the largest scales we
evaluate with 140M passages.
Authors' comments: Preprint. Omar and Keshav contributed equally to this work
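The late interaction scoring that PLAID accelerates can be sketched with the standard MaxSim operator from ColBERT: each query token embedding is matched to its most similar passage token embedding, and the maxima are summed. This shows only the vanilla scoring, not PLAID's centroid interaction or pruning; the function name and toy embeddings are assumptions.

```python
def maxsim_score(query_vecs, doc_vecs):
    """Late interaction score: sum over query tokens of the maximum
    dot product with any passage token."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Hypothetical 2-d token embeddings for one query and one passage.
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_vecs = [[1.0, 0.0], [0.5, 0.5]]
print(maxsim_score(query_vecs, doc_vecs))
```

Scoring every passage this way requires touching all of its token embeddings, which is the cost that eliminating low-scoring passages early is meant to avoid.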