Katherine Thai, Yapei Chang, Kalpesh Krishna, Mohit Iyyer
Humanities scholars commonly provide evidence for claims that they make about
a work of literature (e.g., a novel) in the form of quotations from the work.
We collect a large-scale dataset (RELiC) of 78K literary quotations and
surrounding critical analysis and use it to formulate the novel task of
literary evidence retrieval, in which models are given an excerpt of literary
analysis surrounding a masked quotation and asked to retrieve the quoted
passage from the set of all passages in the work. Solving this retrieval task
requires a deep understanding of complex literary and linguistic phenomena,
which proves challenging to methods that overwhelmingly rely on lexical and
semantic similarity matching. We implement a RoBERTa-based dense passage
retriever for this task that outperforms existing pretrained information
retrieval baselines; however, experiments and analysis by human domain experts
indicate that there is substantial room for improvement over our dense
retriever.
Authors' comments: ACL 2022 camera ready (19 pages)
Shuai Lu, Nan Duan, Hojae Han, Daya Guo, Seung-won Hwang, Alexey Svyatkovskiy
Code completion, which aims to predict the following code token(s) according
to the code context, can improve the productivity of software development.
Recent work has proved that statistical language modeling with transformers can
greatly improve the performance in the code completion task via learning from
large-scale source code datasets. However, current approaches focus only on
code context within the file or project, i.e. internal context. Our distinction
is utilizing "external" context, inspired by human behaviors of copying from
the related code snippets when writing code. Specifically, we propose a
retrieval-augmented code completion framework, leveraging both lexical copying
and referring to code with similar semantics by retrieval. We adopt a
stage-wise training approach that combines a source code retriever and an
auto-regressive language model for programming language. We evaluate our
approach in the code completion task in Python and Java programming languages,
achieving state-of-the-art performance on the CodeXGLUE benchmark.
Authors' comments: Published in ACL 2022
Qiang Wang, Yanhao Zhang, Yun Zheng, Pan Pan, Xian-Sheng Hua
Cross-modality interaction is a critical component in Text-Video Retrieval
(TVR), yet there has been little examination of how different influencing
factors for computing interaction affect performance. This paper first studies
the interaction paradigm in depth, where we find that its computation can be
split into two terms, the interaction contents at different granularity and the
matching function to distinguish pairs with the same semantics. We also observe
that the single-vector representation and implicit intensive function
substantially hinder the optimization. Based on these findings, we propose a
disentangled framework to capture a sequential and hierarchical representation.
Firstly, considering the natural sequential structure in both text and video
inputs, a Weighted Token-wise Interaction (WTI) module is performed to decouple
the content and adaptively exploit the pair-wise correlations. This interaction
can form a better disentangled manifold for sequential inputs. Secondly, we
introduce a Channel DeCorrelation Regularization (CDCR) to minimize the
redundancy between the components of the compared vectors, which facilitates
learning a hierarchical representation. We demonstrate the effectiveness of the
disentangled representation on various benchmarks, e.g., surpassing CLIP4Clip
largely by +2.9%, +3.1%, +7.9%, +2.3%, +2.8% and +6.5% R@1 on the MSR-VTT,
MSVD, VATEX, LSMDC, ActivityNet, and DiDeMo, respectively.
Authors' comments: 22 pages, 11 figures, Tech report
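The token-wise interaction described in the abstract above can be illustrated with a minimal sketch. It assumes one plausible reading: each text token takes its best-matching video frame (and vice versa), the maxima are combined with per-token weights, and the two directions are averaged. The function name, the fixed uniform weights, and the toy vectors are hypothetical; in the paper the weights are learned, and this is not the authors' implementation.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def token_wise_score(text_tokens, video_frames, text_weights, video_weights):
    """Weighted max-similarity matching in both directions, then averaged."""
    # Similarity of every text token to every video frame.
    sims = [[dot(t, f) for f in video_frames] for t in text_tokens]
    # Each text token keeps its best frame; weights say how much it counts.
    text_to_video = sum(w * max(row) for w, row in zip(text_weights, sims))
    # Each frame keeps its best text token.
    video_to_text = sum(
        w * max(row[j] for row in sims) for j, w in enumerate(video_weights)
    )
    return 0.5 * (text_to_video + video_to_text)

# Hypothetical toy embeddings: 2 text tokens, 3 video frames, uniform weights.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
video_frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
score = token_wise_score(text_tokens, video_frames, [0.5, 0.5], [1 / 3, 1 / 3, 1 / 3])
print(score)
```

Compared with a single dot product between pooled vectors, this keeps the sequential structure of both inputs visible to the matching function.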
Tong Yu, Pietro Mascagni, Juan Verde, Jacques Marescaux, Didier Mutter, Nicolas Padoy
Searching through large volumes of medical data to retrieve relevant
information is a challenging yet crucial task for clinical care. However, the
primitive and most common approach to retrieval, involving text in the form of
keywords, is severely limited when dealing with complex media formats.
Content-based retrieval offers a way to overcome this limitation, by using rich
media as the query itself. Surgical video-to-video retrieval in particular is a
new and largely unexplored research problem with high clinical value,
especially in the real-time case: using real-time video hashing, search can be
achieved directly inside of the operating room. Indeed, the process of hashing
converts large data entries into compact binary arrays or hashes, enabling
large-scale search operations at a very fast rate. However, due to fluctuations
over the course of a video, not all bits in a given hash are equally reliable.
In this work, we propose a method capable of mitigating this uncertainty while
maintaining a light computational footprint. We present superior retrieval
results (3-4% top-10 mean average precision) on a multi-task evaluation
protocol for surgery, using cholecystectomy phases, bypass phases, and critical
events across six different surgery types, the last drawn from an entirely new
dataset introduced here. Success on this multi-task benchmark shows the
generalizability of our approach for surgical video retrieval.
Authors' comments: 16 pages, 13 figures
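To make the hashing step in the abstract above concrete, here is a minimal sketch of search over compact binary hashes using Hamming distance. The clip names and 8-bit codes are hypothetical, and the sketch treats every bit as equally reliable; it does not reproduce the uncertainty-aware method the authors propose.

```python
def hamming(a, b):
    """Number of differing bits between two hashes stored as integers."""
    return bin(a ^ b).count("1")

def search(query, database, k=2):
    """Return the k database keys whose hashes are closest to the query."""
    return sorted(database, key=lambda name: hamming(query, database[name]))[:k]

# Hypothetical 8-bit hashes for three video clips.
database = {"clip_a": 0b10110010, "clip_b": 0b10110011, "clip_c": 0b01001101}
print(search(0b10110110, database))
```

Because each comparison is an XOR followed by a bit count, a scan of this kind stays cheap even for large collections, which is what makes real-time use plausible.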
Ensheng Shi, Yanlin Wang, Wei Tao, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun
Commit messages are important for software development and maintenance. Many
neural network-based approaches have been proposed and shown promising results
on automatic commit message generation. However, the generated commit messages
could be repetitive or redundant. In this paper, we propose RACE, a new
retrieval-augmented neural commit message generation method, which treats the
retrieved similar commit as an exemplar and leverages it to generate an
accurate commit message. As the retrieved commit message may not always
accurately describe the content/intent of the current code diff, we also
propose an exemplar guider, which learns the semantic similarity between the
retrieved and current code diff and then guides the generation of commit
message based on the similarity. We conduct extensive experiments on a large
public dataset with five programming languages. Experimental results show that
RACE can outperform all baselines. Furthermore, RACE can boost the performance
of existing Seq2Seq models in commit message generation.
Authors' comments: Accepted by EMNLP 2022 (The 2022 Conference on Empirical Methods in
Natural Language Processing)
V. I. Yukalov, S. Gluzman
Methods of determining, from small-variable asymptotic expansions, the
characteristic exponents for variables tending to infinity are analyzed. The
following methods are considered: diff-log Padé summation, self-similar
factor approximation, self-similar diff-log summation, self-similar Borel
summation, and self-similar Borel-Leroy summation. Several typical problems are
treated. The comparison of the results shows that all these methods provide
close estimates for the large-variable exponents. The reliable estimates are
obtained when different methods of summation are compatible with each other.
Authors' comments: Latex file, 19 pages
Simon Welker, Tal Peer, Henry N. Chapman, Timo Gerkmann
One of the most prominent challenges in the field of diffractive imaging is
the phase retrieval (PR) problem: In order to reconstruct an object from its
diffraction pattern, the inverse Fourier transform must be computed. This is
only possible given the full complex-valued diffraction data, i.e. magnitude
and phase. However, in diffractive imaging, generally only magnitudes can be
directly measured while the phase needs to be estimated. In this work we
specifically consider ptychography, a sub-field of diffractive imaging, where
objects are reconstructed from multiple overlapping diffraction images. We
propose an augmentation of existing iterative phase retrieval algorithms with a
neural network designed for refining the result of each iteration. For this
purpose we adapt and extend a recently proposed architecture from the speech
processing field. Evaluation results show the proposed approach delivers
improved convergence rates in terms of both iteration count and algorithm
runtime.
Authors' comments: © 2022 IEEE. Personal use of this material is permitted.
Permission from IEEE must be obtained for all other uses, in any current or
future media, including reprinting/republishing this material for advertising
or promotional purposes, creating new collective works, for resale or
redistribution to servers or lists, or reuse of any copyrighted component of
this work in other works
Kaiyi Zhang, Ximing Yang, Yuan Wu, Cheng Jin
Given partial objects and some complete ones as references, point cloud
completion aims to recover authentic shapes. However, existing methods pay
little attention to general shapes, which leads to the poor authenticity of
completion results. Besides, the missing patterns are diverse in reality, but
existing methods can only handle fixed ones, which means a poor generalization
ability. Considering that a partial point cloud is a subset of the
corresponding complete one, we regard them as different samples of the same
distribution and propose Structure Retrieval based Point Completion Network
(SRPCN). It first uses k-means clustering to extract structure points and
disperses them into distributions, and then KL Divergence is used as a metric
to find the complete structure point cloud that best matches the input in a
database. Finally, a PCN-like decoder network is adopted to generate the final
results based on the retrieved structure point clouds. As structure plays an
important role in describing the general shape of an object and the proposed
structure retrieval method is robust to missing patterns, experiments show that
our method can generate more authentic results and has a stronger
generalization ability.
Authors' comments: I think the proposed method has some defects
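The retrieval step in the abstract above, which uses KL divergence to pick the best-matching entry in a database, can be sketched as follows. This is a simplified illustration that assumes each shape has already been summarised as a discrete distribution over a few bins; the bin values, the class names, and the direction KL(query || entry) are assumptions, not details taken from the paper.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0
    )

def retrieve(query, database):
    """Return the key of the entry with the smallest KL(query || entry)."""
    return min(database, key=lambda name: kl_divergence(query, database[name]))

# Hypothetical structure distributions for two complete shapes.
database = {
    "chair": [0.5, 0.3, 0.2],
    "table": [0.1, 0.1, 0.8],
}
print(retrieve([0.45, 0.35, 0.20], database))
```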
Huayang Li, Yixuan Su, Deng Cai, Yan Wang, Lemao Liu
Recently, retrieval-augmented text generation has attracted increasing attention
from the computational linguistics community. Compared with conventional
generation models, retrieval-augmented text generation has remarkable
advantages and particularly has achieved state-of-the-art performance in many
NLP tasks. This paper aims to conduct a survey about retrieval-augmented text
generation. It first highlights the generic paradigm of retrieval-augmented
generation, and then it reviews notable approaches according to different tasks
including dialogue response generation, machine translation, and other
generation tasks. Finally, it points out some important directions on top of
recent methods to facilitate future research.
Authors' comments: all authors contributed equally
Philippe Weinzaepfel, Thomas Lucas, Diane Larlus, Yannis Kalantidis
Methods that combine local and global features have recently shown excellent
performance on multiple challenging deep image retrieval benchmarks, but their
use of local features raises at least two issues. First, these local features
simply boil down to the localized map activations of a neural network, and
hence can be extremely redundant. Second, they are typically trained with a
global loss that only acts on top of an aggregation of local features; by
contrast, testing is based on local feature matching, which creates a
discrepancy between training and testing. In this paper, we propose a novel
architecture for deep image retrieval, based solely on mid-level features that
we call Super-features. These Super-features are constructed by an iterative
attention module and constitute an ordered set in which each element focuses on
a localized and discriminant image pattern. For training, they require only
image labels. A contrastive loss operates directly at the level of
Super-features and focuses on those that match across images. A second
complementary loss encourages diversity. Experiments on common landmark
retrieval benchmarks validate that Super-features substantially outperform
state-of-the-art methods when using the same number of features, and only
require a significantly smaller memory footprint to match their performance.
Code and models are available at: https://github.com/naver/FIRe.
Authors' comments: ICLR 2022
Samuel Pinilla, Kumar Vijay Mishra, Brian M. Sadler, Henry Arguello
The ability of a radar to discriminate in both range and Doppler velocity is
completely characterized by the ambiguity function (AF) of its transmit
waveform. Mathematically, it is obtained by correlating the waveform with its
Doppler-shifted and delayed replicas. We consider the inverse problem of
designing a radar transmit waveform that satisfies the specified AF magnitude.
This process can be viewed as a signal reconstruction with some variation of
phase retrieval methods. We provide a trust-region algorithm that minimizes a
smoothed non-convex least-squares objective function to iteratively recover the
underlying signal-of-interest for either time- or band-limited support. The
method first approximates the signal using an iterative spectral algorithm and
then refines the attained initialization based upon a sequence of gradient
iterations. Our theoretical analysis shows that unique signal reconstruction is
possible using signal samples no more than thrice the number of signal
frequencies or time samples. Numerical experiments demonstrate that our method
recovers both time- and band-limited signals from even sparsely and randomly
sampled AFs with mean-square-error of $1\times 10^{-6}$ and $9\times 10^{-2}$
for the full noiseless samples and sparse noisy samples, respectively.
Authors' comments: 18 pages, 12 figures, 1 table
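The abstract above describes the ambiguity function as the correlation of a waveform with its delayed and Doppler-shifted replicas. A minimal discrete sketch follows, assuming a circular delay and one common sign convention for the Doppler term; conventions differ between texts, and the test waveform is hypothetical.

```python
import cmath

def ambiguity(x, delay, doppler):
    """One AF sample: correlate x with a circularly delayed, Doppler-shifted copy."""
    n = len(x)
    total = 0j
    for t in range(n):
        shifted = x[(t - delay) % n]  # circular delay (an assumption)
        phase = cmath.exp(-2j * cmath.pi * doppler * t / n)
        total += x[t] * shifted.conjugate() * phase
    return total

# Hypothetical waveform: one cycle of a complex exponential over four samples.
x = [1 + 0j, 1j, -1 + 0j, -1j]
energy = sum(abs(v) ** 2 for v in x)
print(abs(ambiguity(x, 0, 0)), energy)
```

At zero delay and zero Doppler the sample equals the signal energy, which bounds the AF magnitude everywhere else. The inverse problem in the abstract asks for a waveform `x` whose AF magnitudes match a prescribed pattern.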
Bing Gao
We consider the problem of recovering a signal from the magnitudes of affine
measurements, which is also known as affine phase retrieval. In this
paper, we formulate affine phase retrieval as an optimization problem and
develop a second-order algorithm based on Newton's method to solve it. Besides
being able to convert into a phase retrieval problem, affine phase retrieval
has its unique advantages in its solution. For example, the linear information
in the observation makes it possible to solve this problem with second-order
algorithms under complex measurements. Another advantage is that our algorithm
doesn't have any special requirements for the initial point, while an
appropriate initial value is essential for most non-convex phase retrieval
algorithms. Starting from zero, our algorithm generates iterates by
Newton's method, and we prove that the algorithm can quadratically converge to
the true signal without any ambiguity for both Gaussian measurements and CDP
measurements. In addition, we also use some numerical simulations to verify the
conclusions and to show the effectiveness of the algorithm.
Authors' comments: 15 pages, 2 figures
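As a toy illustration of the idea in the abstract above, the sketch below applies Newton's method, started from zero, to a real scalar instance of affine phase retrieval. The least-squares objective on squared magnitudes, the measurement values, and the function name are assumptions chosen for illustration; the paper treats vector signals with Gaussian and CDP measurements, which this sketch does not cover.

```python
def newton_affine_pr(a, b, y, steps=20):
    """Minimise f(x) = sum_j ((a_j*x + b_j)**2 - y_j)**2 by Newton's method from x = 0."""
    x = 0.0
    for _ in range(steps):
        grad = 0.0
        hess = 0.0
        for aj, bj, yj in zip(a, b, y):
            u = aj * x + bj
            grad += 4.0 * (u * u - yj) * u * aj          # first derivative of f
            hess += 4.0 * aj * aj * (3.0 * u * u - yj)   # second derivative of f
        if hess == 0.0:
            break  # Newton step undefined; stop rather than divide by zero
        x -= grad / hess
    return x

# Hypothetical measurements of the signal x_true = 2.0.
a = [1.0, -1.0, 0.5]
b = [1.0, 2.0, -1.0]
x_true = 2.0
y = [(aj * x_true + bj) ** 2 for aj, bj in zip(a, b)]
print(newton_affine_pr(a, b, y))
```

The offsets `b` are what remove the sign ambiguity here: with `b` set to zero, both `x` and `-x` would fit the magnitudes equally well.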
Jianfeng Gao, Chenyan Xiong, Paul Bennett, Nick Craswell
A conversational information retrieval (CIR) system is an information
retrieval (IR) system with a conversational interface which allows users to
interact with the system to seek information via multi-turn conversations of
natural language, in spoken or written form. Recent progress in deep learning
has brought tremendous improvements in natural language processing (NLP) and
conversational AI, leading to a plethora of commercial conversational services
that allow naturally spoken and typed interaction, increasing the need for more
human-centric interactions in IR. As a result, we have witnessed a resurgent
interest in developing modern CIR systems in both research communities and
industry. This book surveys recent advances in CIR, focusing on neural
approaches that have been developed in the last few years. This book is based
on the authors' tutorial at SIGIR'2020 (Gao et al., 2020b), with IR and NLP
communities as the primary target audience. However, readers with other
backgrounds, such as machine learning and human-computer interaction, will also
find it an accessible introduction to CIR. We hope that this book will prove a
valuable resource for students, researchers, and software developers. This
manuscript is a working draft. Comments are welcome.
Authors' comments: Book Draft
Tatiana Latychevskaia
Modern microscopy techniques are developing towards high-resolution imaging, and tremendous progress has been made in past decades; however, the imaging of individual biological macromolecules at atomic resolution using short-wavelength radiation such as electrons or X-rays has not yet been achieved. The construction of free-electron lasers in many countries around the world arises from the desire to develop new imaging techniques by employing coherent radiation to image individual macromolecules. This work deals with coherent imaging and related phase retrieval techniques, with an emphasis on their application in the imaging of individual biological macromolecules.
Sebastian Hofstätter, Sophia Althammer, Mete Sertkan, Allan Hanbury
We present strong Transformer-based re-ranking and dense retrieval baselines
for the recently released TripClick health ad-hoc retrieval collection. We
improve the - originally too noisy - training data with a simple negative
sampling policy. We achieve large gains over BM25 in the re-ranking task of
TripClick, which were not achieved with the original baselines. Furthermore, we
study the impact of different domain-specific pre-trained models on TripClick.
Finally, we show that dense retrieval outperforms BM25 by considerable margins,
even with simple training procedures.
Authors' comments: Accepted at ECIR 2022
Hai Su, Meiyin Han, Junle Liang, Jun Liang, Songsen Yu
Compared with traditional hashing methods, deep hashing methods generate hash codes with rich semantic information and greatly improve performance in image retrieval. However, current deep hashing methods predict the similarity of hard examples poorly. Two main factors limit the ability to learn from hard examples: weak extraction of key features and the shortage of hard examples. In this paper, we present a novel end-to-end model that extracts key features from hard examples and obtains hash codes with accurate semantic information. In addition, we redesign a hard pair-wise loss function to assess the degree of hardness and update the penalty weights of examples, which effectively alleviates the shortage of hard examples. Experimental results on CIFAR-10 and NUS-WIDE demonstrate that our model outperforms mainstream hashing-based image retrieval methods.
Simion-Vlad Bogolin, Ioana Croitoru, Hailin Jin, Yang Liu, Samuel Albanie
Profiting from large-scale training datasets, advances in neural architecture
design and efficient inference, joint embeddings have become the dominant
approach for tackling cross-modal retrieval. In this work we first show that,
despite their effectiveness, state-of-the-art joint embeddings suffer
significantly from the longstanding "hubness problem" in which a small number
of gallery embeddings form the nearest neighbours of many queries. Drawing
inspiration from the NLP literature, we formulate a simple but effective
framework called Querybank Normalisation (QB-Norm) that re-normalises query
similarities to account for hubs in the embedding space. QB-Norm improves
retrieval performance without requiring retraining. Unlike prior
work, we show that QB-Norm works effectively without concurrent access to any
test set queries. Within the QB-Norm framework, we also propose a novel
similarity normalisation method, the Dynamic Inverted Softmax, that is
significantly more robust than existing approaches. We showcase QB-Norm across
a range of cross modal retrieval models and benchmarks where it consistently
enhances strong baselines beyond the state of the art. Code is available at
https://vladbogo.github.io/QB-Norm/.
Authors' comments: Accepted at CVPR 2022
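The re-normalisation idea in the abstract above can be sketched with a plain inverted softmax: a gallery item that is close to many querybank queries (a hub) has its score divided by a large normaliser. This is a minimal sketch under assumptions; the temperature `beta`, the similarity values, and the function name are hypothetical, and the Dynamic Inverted Softmax variant proposed by the authors is not reproduced here.

```python
import math

def inverted_softmax(query_sims, bank_sims, beta=5.0):
    """Divide each exp-scaled similarity by the item's total mass over the querybank."""
    scores = []
    for j, s in enumerate(query_sims):
        hub_mass = sum(math.exp(beta * row[j]) for row in bank_sims)
        scores.append(math.exp(beta * s) / hub_mass)
    return scores

# Gallery item 0 is a hub: every querybank query is close to it.
bank_sims = [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1], [0.9, 0.3, 0.1]]
query_sims = [0.7, 0.6, 0.1]

raw_best = max(range(3), key=lambda j: query_sims[j])
normed = inverted_softmax(query_sims, bank_sims)
normed_best = max(range(3), key=lambda j: normed[j])
print(raw_best, normed_best)
```

In this toy case the raw similarities pick the hub (item 0), while the normalised scores prefer item 1, which is close to the test query but not to the querybank. No retraining of the embedding model is involved.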
Young Kyun Jang, Geonmo Gu, Byungsoo Ko, Isaac Kang, Nam Ik Cho
In hash-based image retrieval systems, degraded or transformed inputs usually
generate different codes from the original, deteriorating the retrieval
accuracy. To mitigate this issue, data augmentation can be applied during
training. However, even if augmented samples of an image are similar in real
feature space, the quantization can scatter them far away in Hamming space.
This results in representation discrepancies that can impede training and
degrade performance. In this work, we propose a novel self-distilled hashing
scheme to minimize the discrepancy while exploiting the potential of augmented
data. By transferring the hash knowledge of the weakly-transformed samples to
the strong ones, we make the hash code insensitive to various transformations.
We also introduce hash proxy-based similarity learning and binary cross
entropy-based quantization loss to provide fine quality hash codes. Ultimately,
we construct a deep hashing framework that not only improves the existing deep
hashing approaches, but also achieves the state-of-the-art retrieval results.
Extensive experiments are conducted and confirm the effectiveness of our work.
Authors' comments: ECCV2022
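The abstract above notes that augmented samples can be close in real feature space yet far apart in Hamming space after quantization. A small sketch shows the effect, assuming sign-based binarisation and hypothetical feature values; it does not implement the self-distilled hashing scheme itself.

```python
def binarize(features):
    """Quantize a real-valued feature vector to a binary code by sign."""
    return [1 if v >= 0 else 0 for v in features]

def hamming(code_a, code_b):
    """Number of positions where two binary codes differ."""
    return sum(x != y for x, y in zip(code_a, code_b))

# Hypothetical features of an image and a lightly augmented copy.
original = [0.02, -0.01, 0.50, -0.40]
augmented = [-0.03, 0.02, 0.48, -0.41]

euclid = sum((u - v) ** 2 for u, v in zip(original, augmented)) ** 0.5
bits_flipped = hamming(binarize(original), binarize(augmented))
print(euclid, bits_flipped)
```

The two vectors are nearly identical in Euclidean terms, yet half of their bits differ because two coordinates sit near the quantization threshold.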
Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, Edouard Grave
Recently, information retrieval has seen the emergence of dense retrievers, using neural networks, as an alternative to classical sparse methods based on term frequency. These models have obtained state-of-the-art results on datasets and tasks where large training sets are available. However, they do not transfer well to new applications with no training data, and are outperformed by unsupervised term-frequency methods such as BM25. In this work, we explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings. On the BEIR benchmark, our unsupervised model outperforms BM25 on 11 out of 15 datasets for Recall@100. When used as pre-training before fine-tuning, either on a few thousand in-domain examples or on the large MS MARCO dataset, our contrastive model leads to improvements on the BEIR benchmark. Finally, we evaluate our approach for multi-lingual retrieval, where training data is even scarcer than for English, and show that our approach leads to strong unsupervised performance. Our model also exhibits strong cross-lingual transfer when fine-tuned on supervised English data only and evaluated on low-resource languages such as Swahili. We show that our unsupervised models can perform cross-lingual retrieval between different scripts, such as retrieving English documents from Arabic queries, which would not be possible with term matching methods.
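The abstract above does not spell out its contrastive objective. As an illustration only, the sketch below uses the widely used InfoNCE form: the loss is the negative log-probability of the positive passage under a softmax over the positive and the negatives. The temperature, the toy embeddings, and the function names are assumptions.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def info_nce(query, positive, negatives, temperature=0.1):
    """InfoNCE loss for one query with one positive and several negatives."""
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_norm - logits[0]

q = [1.0, 0.0]
# Positive aligned with the query: low loss.
good = info_nce(q, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
# Positive orthogonal to the query while a negative is aligned: high loss.
bad = info_nce(q, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(good < bad)
```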
Jianmo Ni, Chen Qu, Jing Lu, Zhuyun Dai, Gustavo Hernández Ábrego, Ji Ma, Vincent Y. Zhao, Yi Luan et al.
It has been shown that dual encoders trained on one domain often fail to generalize to other domains for retrieval tasks. One widespread belief is that the bottleneck layer of a dual encoder, where the final score is simply a dot-product between a query vector and a passage vector, is too limited to make dual encoders an effective retrieval model for out-of-domain generalization. In this paper, we challenge this belief by scaling up the size of the dual encoder model while keeping the bottleneck embedding size fixed. With multi-stage training, surprisingly, scaling up the model size brings significant improvement on a variety of retrieval tasks, especially for out-of-domain generalization. Experimental results show that our dual encoders, Generalizable T5-based dense Retrievers (GTR), significantly outperform existing sparse and dense retrievers on the BEIR dataset. Most surprisingly, our ablation study finds that GTR is very data efficient, as it only needs 10% of MS Marco supervised data to achieve the best out-of-domain performance. All the GTR models are released at https://tfhub.dev/google/collections/gtr/1.
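The scoring bottleneck described in the abstract above, a dot product between a query vector and a passage vector, can be sketched in a few lines. The three-dimensional vectors and passage ids are hypothetical stand-ins for encoder outputs; no GTR model is loaded here.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def rank(query_vec, passage_vecs):
    """Return passage ids sorted by descending dot-product score."""
    return sorted(
        passage_vecs,
        key=lambda pid: dot(query_vec, passage_vecs[pid]),
        reverse=True,
    )

# Hypothetical fixed-size embeddings produced by the two encoders.
passages = {"p1": [0.1, 0.9, 0.0], "p2": [0.8, 0.1, 0.1], "p3": [0.3, 0.3, 0.4]}
print(rank([1.0, 0.0, 0.2], passages))
```

Whatever the size of the encoders, the interaction between query and passage is confined to this one operation on fixed-size vectors, which is why the embedding dimension is called the bottleneck.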