Chengyuan Qian, Ruida Zhou, Chao Tian, Tie Liu
We study the problem of weakly private information retrieval (W-PIR), where a
user wishes to retrieve a desired message from $N$ non-colluding servers in a
way that the privacy leakage regarding the desired message's identity is less
than or equal to a threshold. We propose a new code construction which
significantly improves upon the best known result in the literature, based on
the following critical observation. In previous constructions, for the extreme
case of minimum download, the retrieval pattern is to download the message
directly from $N-1$ servers; however, this causes leakage to all these $N-1$
servers, and a better retrieval pattern for this extreme case is to download
the message directly from a single server. The proposed code construction
allows a natural transition to such a pattern, and for both the maximal leakage
metric and the mutual information leakage metric, significant improvements can
be obtained. We provide explicit solutions, in contrast to a previous work by
Lin et al., where only numerical solutions were obtained.
Authors' comments: 6 pages, 1 figure, accepted at ISIT 2022
Michael Christ, Ben Pineau, Mitchell A. Taylor
Examples are constructed of infinite-dimensional subspaces $V\subset L^2(\mu)$ with the property that for any $f,g\in V$, if $|f|$ is approximately equal to $|g|$ with respect to the $L^2$ norm, then there exists a unimodular scalar $z$ such that $f$ is approximately equal to $zg$.
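One common way to make the property above precise is a Lipschitz-type stability inequality. The block below is a sketch of that reading, not a statement taken from the abstract: the constant $C$ and the linear form of the bound are assumptions, since the abstract only says "approximately".

```latex
\[
\exists\, C > 0 \ \text{such that}\quad
\min_{|z| = 1} \| f - z g \|_{L^2(\mu)}
\;\le\; C \,\bigl\| \, |f| - |g| \, \bigr\|_{L^2(\mu)}
\qquad \text{for all } f, g \in V .
\]
```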
Ziyue Wang, Aozhu Chen, Fan Hu, Xirong Li
Negation is a common linguistic skill that allows humans to express what we do
NOT want. Naturally, one might expect video retrieval to support
natural-language queries with negation, e.g., finding shots of kids sitting on
the floor and not playing with a dog. However, the state-of-the-art deep
learning based video retrieval models lack such ability, as they are typically
trained on video description datasets such as MSR-VTT and VATEX that lack
negated descriptions. Their retrieved results basically ignore the negator in
the sample query, incorrectly returning videos showing kids playing with a dog.
This paper presents the first study on learning to understand negation in video
retrieval and makes the following contributions. By re-purposing two existing
datasets (MSR-VTT and VATEX), we propose a new evaluation protocol for video
retrieval with negation. We propose a learning based method for training a
negation-aware video retrieval model. The key idea is to first construct a soft
negative caption for a specific training video by partially negating its
original caption, and then compute a bidirectionally constrained loss on the
triplet. This auxiliary loss is added, with a weight, to a standard retrieval loss.
Experiments on the re-purposed benchmarks show that re-training the CLIP
(Contrastive Language-Image Pre-Training) model by the proposed method clearly
improves its ability to handle queries with negation. In addition, the model
performance on the original benchmarks is also improved.
Authors' comments: Accepted by ACMMM2022
Hansi Zeng, Hamed Zamani, Vishwa Vinay
Recent work has shown that more effective dense retrieval models can be
obtained by distilling ranking knowledge from an existing base re-ranking
model. In this paper, we propose a generic curriculum learning based
optimization framework called CL-DRD that controls the difficulty level of
training data produced by the re-ranking (teacher) model. CL-DRD iteratively
optimizes the dense retrieval (student) model by increasing the difficulty of
the knowledge distillation data made available to it. In more detail, we
initially provide the student model coarse-grained preference pairs between
documents in the teacher's ranking and progressively move towards finer-grained
pairwise document ordering requirements. In our experiments, we apply a simple
implementation of the CL-DRD framework to enhance two state-of-the-art dense
retrieval models. Experiments on three public passage retrieval datasets
demonstrate the effectiveness of our proposed framework.
Authors' comments: Accepted to SIGIR 2022
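The coarse-to-fine curriculum described in the CL-DRD abstract can be illustrated with a small sketch. It assumes that "coarse-grained" means splitting the teacher's ranking into a few contiguous groups and only requiring the student to order documents across groups; the function name `preference_pairs` and the grouping rule are hypothetical and are not taken from the paper.

```python
from itertools import combinations

def preference_pairs(ranking, num_groups):
    """Split a teacher ranking into num_groups contiguous groups and return
    (preferred, other) pairs for documents that fall in different groups."""
    size = -(-len(ranking) // num_groups)  # ceiling division
    group_of = {doc: i // size for i, doc in enumerate(ranking)}
    # combinations() keeps ranking order, so `a` is always ranked above `b`.
    return [(a, b) for a, b in combinations(ranking, 2)
            if group_of[a] < group_of[b]]

teacher_ranking = ["d1", "d2", "d3", "d4"]
# Curriculum: start with 2 coarse groups, then move to one group per document.
curriculum = [preference_pairs(teacher_ranking, g) for g in (2, 4)]
```

With more groups, more pairs become training constraints, so later stages ask the student to reproduce finer distinctions in the teacher's ordering.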
Antonio Mallia, Joel Mackenzie, Torsten Suel, Nicola Tonellotto
Neural information retrieval architectures based on transformers such as BERT
are able to significantly improve system effectiveness over traditional sparse
models such as BM25. Though highly effective, these neural approaches are very
expensive to run, making them difficult to deploy under strict latency
constraints. To address this limitation, recent studies have proposed new
families of learned sparse models that try to match the effectiveness of
learned dense models, while leveraging the traditional inverted index data
structure for efficiency. Current learned sparse models learn the weights of
terms in documents and, sometimes, queries; however, they exploit different
vocabulary structures, document expansion techniques, and query expansion
strategies, which can make them slower than traditional sparse models such as
BM25. In this work, we propose a novel indexing and query processing technique
that exploits a traditional sparse model's "guidance" to efficiently traverse
the index, allowing the more effective learned model to execute fewer scoring
operations. Our experiments show that our guided processing heuristic is able
to boost the efficiency of the underlying learned sparse model by a factor of
four without any measurable loss of effectiveness.
Authors' comments: Accepted at SIGIR 2022
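The idea of letting a cheap traditional model decide which documents the learned model scores can be sketched as follows. This is a deliberately simplified candidate-filtering version, not the authors' index traversal algorithm: the dictionary-based indexes, the `budget` parameter, and the function name `guided_top_k` are all assumptions made for illustration.

```python
import heapq

def guided_top_k(query_terms, bm25_index, learned_index, k, budget):
    """Both indexes map term -> {doc_id: weight}.
    BM25 weights choose which documents to visit; learned weights rank them."""
    # Cheap pass: accumulate BM25 scores over the query's posting lists.
    cheap = {}
    for term in query_terms:
        for doc, weight in bm25_index.get(term, {}).items():
            cheap[doc] = cheap.get(doc, 0.0) + weight
    candidates = heapq.nlargest(budget, cheap, key=cheap.get)

    # Expensive pass: the learned model scores only the selected candidates.
    def learned_score(doc):
        return sum(learned_index.get(t, {}).get(doc, 0.0) for t in query_terms)

    return heapq.nlargest(k, candidates, key=learned_score)
```

The number of learned scoring operations is bounded by `budget` rather than by the length of the posting lists, which is the kind of saving the abstract refers to.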
Fernando Diaz, Andres Ferraro
Offline evaluation of information retrieval and recommendation has
traditionally focused on distilling the quality of a ranking into a scalar
metric such as average precision or normalized discounted cumulative gain. We
can use this metric to compare the performance of multiple systems for the same
request. Although evaluation metrics provide a convenient summary of system
performance, they also collapse subtle differences across users into a single
number and can carry assumptions about user behavior and utility not supported
across retrieval scenarios. We propose recall-paired preference (RPP), a
metric-free evaluation method based on directly computing a preference between
ranked lists. RPP simulates multiple user subpopulations per query and compares
systems across these pseudo-populations. Our results across multiple search and
recommendation tasks demonstrate that RPP substantially improves discriminative
power while correlating well with existing metrics and being equally robust to
incomplete data.
Authors' comments: to appear at SIGIR 2022
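A preference between two ranked lists can be computed without a scalar metric roughly as sketched below. The sketch assumes that each pseudo-population corresponds to one recall level (a user who stops after finding the i-th relevant item) and that the system reaching that level at a shallower rank is preferred; these assumptions, and the names `recall_positions` and `rpp`, are illustrative and not quoted from the abstract.

```python
def recall_positions(ranking, relevant):
    """1-based ranks at which the 1st, 2nd, ... relevant items appear."""
    return [i for i, doc in enumerate(ranking, start=1) if doc in relevant]

def rpp(ranking_a, ranking_b, relevant):
    """Mean preference in [-1, 1]; positive values favour ranking_a.
    `relevant` must be a non-empty set of relevant document ids."""
    pos_a = recall_positions(ranking_a, relevant)
    pos_b = recall_positions(ranking_b, relevant)
    inf = float("inf")  # a recall level that is never reached
    total = 0
    for level in range(len(relevant)):
        a = pos_a[level] if level < len(pos_a) else inf
        b = pos_b[level] if level < len(pos_b) else inf
        total += (b > a) - (a > b)  # +1 if A is shallower, -1 if B is
    return total / len(relevant)
```

For example, with relevant items `{"x", "y"}`, the list `["x", "n", "y"]` is preferred over `["n", "x", "y"]` at the first recall level and tied at the second.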
Meng Huang, Zhiqiang Xu
The recovery of a signal from the intensity measurements with some entries
being known in advance is termed {\em affine phase retrieval}. In this
paper, we prove that a natural least squares formulation for the affine phase
retrieval is strongly convex on the entire space under some mild conditions,
provided the measurements are complex Gaussian random vectors and the
measurement number $m \gtrsim d \log d$ where $d$ is the dimension of signals.
Based on the result, we prove that the simple gradient descent method for the
affine phase retrieval converges linearly to the target solution with high
probability from an arbitrary initial point. These results show an essential
difference between the affine phase retrieval and the classical phase
retrieval, where the least squares formulations for the classical phase
retrieval are non-convex.
Authors' comments: 32 pages
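For orientation, a least squares formulation of affine phase retrieval and the corresponding gradient descent iteration are usually written as below. The symbols $a_j$, $b_j$, $y_j$, the step size $\mu$, and the $1/m$ normalization are assumptions introduced here; the abstract does not fix notation.

```latex
\[
y_j = \bigl| \langle a_j, x \rangle + b_j \bigr|^2, \qquad j = 1, \dots, m,
\]
\[
f(z) = \frac{1}{m} \sum_{j=1}^{m}
\Bigl( \bigl| \langle a_j, z \rangle + b_j \bigr|^2 - y_j \Bigr)^2,
\qquad
z_{k+1} = z_k - \mu \, \nabla f(z_k).
\]
```

Setting every $b_j = 0$ gives the classical phase retrieval objective, which the abstract contrasts with the affine case as being non-convex.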
Xun Wang, Bingqing Ke, Xuanping Li, Fangyu Liu, Mingyu Zhang, Xiao Liang, Qiushi Xiao, Cheng Luo et al.
Video search has become the main routine for users to discover videos
relevant to a text query on large short-video sharing platforms. While
training a query-video bi-encoder model using online search logs, we identify a
modality bias phenomenon: the video encoder relies almost entirely on text
matching, neglecting other modalities of the videos such as vision and audio. This
modality imbalance results from a) modality gap: the relevance between a query
and a video text is much easier to learn as the query is also a piece of text,
with the same modality as the video text; b) data bias: most training samples
can be solved solely by text matching. Here we share our practices to improve
the first retrieval stage including our solution for the modality imbalance
issue. We propose MBVR (short for Modality Balanced Video Retrieval) with two
key components: manually generated modality-shuffled (MS) samples and a dynamic
margin (DM) based on visual relevance. They can encourage the video encoder to
pay balanced attention to each modality. Through extensive experiments on a
real-world dataset, we show empirically that our method is both effective and
efficient in solving the modality bias problem. We have also deployed our MBVR in a
large video platform and observed a statistically significant boost over a highly
optimized baseline in an A/B test and manual GSB evaluations.
Authors' comments: Accepted by SIGIR-2022, short paper
Bill Yuchen Lin, Kangmin Tan, Chris Miller, Beiwen Tian, Xiang Ren
Humans can perform unseen tasks by recalling relevant skills acquired
previously and then generalizing them to the target tasks, even if there is no
supervision at all. In this paper, we aim to improve this kind of cross-task
generalization ability of massive multi-task language models, such as T0 and
FLAN, in an unsupervised setting. We propose a retrieval-augmentation method
named ReCross that takes a few unlabelled examples as queries to retrieve a
small subset of upstream data and uses them to update the multi-task model for
better generalization. ReCross is a straightforward yet effective retrieval
method that combines both efficient dense retrieval and effective pair-wise
reranking. Our results and analysis show that it significantly outperforms both
non-retrieval methods and other baseline methods.
Authors' comments: Accepted to NeurIPS 2022. Website: https://inklab.usc.edu/ReCross/
Andrei Neculai, Yanbei Chen, Zeynep Akata
Existing works in image retrieval often consider retrieving images with one
or two query inputs, which do not generalize to multiple queries. In this work,
we investigate a more challenging scenario for composing multiple multimodal
queries in image retrieval. Given an arbitrary number of query images and (or)
texts, our goal is to retrieve target images containing the semantic concepts
specified in multiple multimodal queries. To learn an informative embedding
that can flexibly encode the semantics of various queries, we propose a novel
multimodal probabilistic composer (MPC). Specifically, we model input images
and texts as probabilistic embeddings, which can be further composed by a
probabilistic composition rule to facilitate image retrieval with multiple
multimodal queries. We propose a new benchmark based on the MS-COCO dataset and
evaluate our model on various setups that compose multiple images and (or) text
queries for multimodal image retrieval. Without bells and whistles, we show
that our probabilistic model formulation significantly outperforms existing
related methods on multimodal image retrieval while generalizing well to queries
with different numbers of inputs given in arbitrary visual and (or) textual
modalities. Code is available here: https://github.com/andreineculai/MPC.
Authors' comments: CVPR2022 MULA workshop
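One familiar way to compose several probabilistic embeddings is to multiply Gaussian densities, which yields another Gaussian with a precision-weighted mean. The sketch below shows that rule for a single dimension. Whether MPC uses exactly this rule is not stated in the abstract, so treat it as an assumption; the function name `compose_gaussians` is hypothetical.

```python
def compose_gaussians(means, variances):
    """Combine 1-D Gaussian embeddings by multiplying their densities.
    Returns the (mean, variance) of the resulting Gaussian."""
    precision = sum(1.0 / v for v in variances)  # precisions add up
    mean = sum(m / v for m, v in zip(means, variances)) / precision
    return mean, 1.0 / precision

# Two equally confident queries: the result sits halfway and is sharper.
mean, var = compose_gaussians([0.0, 2.0], [1.0, 1.0])
```

Because precisions simply add, the same function handles any number of query inputs, which matches the "arbitrary number of query images and (or) texts" setting described above.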
Jiangui Chen, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Xueqi Cheng
Fact verification (FV) is a challenging task which aims to verify a claim
using multiple evidential sentences from trustworthy corpora, e.g., Wikipedia.
Most existing approaches follow a three-step pipeline framework, including
document retrieval, sentence retrieval and claim verification. High-quality
evidences provided by the first two steps are the foundation of the effective
reasoning in the last step. Despite being important, high-quality evidences are
rarely studied by existing works for FV, which often adopt the off-the-shelf
models to retrieve relevant documents and sentences in an
"index-retrieve-then-rank" fashion. This classical approach has clear drawbacks
as follows: i) a large document index as well as a complicated search process
is required, leading to considerable memory and computational overhead; ii)
independent scoring paradigms fail to capture the interactions among documents
and sentences in ranking; iii) a fixed number of sentences are selected to form
the final evidence set. In this work, we propose GERE, the first system that
retrieves evidences in a generative fashion, i.e., generating the document
titles as well as evidence sentence identifiers. This enables us to mitigate
the aforementioned technical issues since: i) the memory and computational cost
is greatly reduced because the document index is eliminated and the heavy
ranking process is replaced by a light generative process; ii) the dependency
between documents and that between sentences can be captured via a sequential
generation process; iii) the generative formulation allows us to dynamically
select a precise set of relevant evidences for each claim. The experimental
results on the FEVER dataset show that GERE achieves significant improvements
over the state-of-the-art baselines, with both time-efficiency and
memory-efficiency.
Authors' comments: Accepted by SIGIR 2022
Paulina Lewandowska, Ryszard Kukulski, Łukasz Pawela, Zbigniew Puchała
This work examines the problem of learning an unknown von Neumann measurement
of dimension $d$ from a finite number of copies. To obtain a faithful
approximation of the given measurement we are allowed to use it $N$ times. Our
main goal is to estimate the asymptotic behavior of the maximum value of the
average fidelity function $F_d$ for a general $N \rightarrow 1$ learning
scheme. We show that $F_d = 1 - \Theta\left(\frac{1}{N^2}\right)$ for arbitrary
but fixed dimension $d$. In addition to that, we compared various learning
schemes for $d=2$. We observed that the learning scheme based on deterministic
port-based teleportation is asymptotically optimal but performs poorly for low
$N$. In particular, we discovered a parallel learning scheme, which despite its
lack of asymptotic optimality, provides a high value of the fidelity for low
values of $N$ and uses only two-qubit entangled memory states.
Authors' comments: 19 pages, 9 figures
Katherine Thai, Yapei Chang, Kalpesh Krishna, Mohit Iyyer
Humanities scholars commonly provide evidence for claims that they make about
a work of literature (e.g., a novel) in the form of quotations from the work.
We collect a large-scale dataset (RELiC) of 78K literary quotations and
surrounding critical analysis and use it to formulate the novel task of
literary evidence retrieval, in which models are given an excerpt of literary
analysis surrounding a masked quotation and asked to retrieve the quoted
passage from the set of all passages in the work. Solving this retrieval task
requires a deep understanding of complex literary and linguistic phenomena,
which proves challenging to methods that overwhelmingly rely on lexical and
semantic similarity matching. We implement a RoBERTa-based dense passage
retriever for this task that outperforms existing pretrained information
retrieval baselines; however, experiments and analysis by human domain experts
indicate that there is substantial room for improvement over our dense
retriever.
Authors' comments: ACL 2022 camera ready (19 pages)
Shuai Lu, Nan Duan, Hojae Han, Daya Guo, Seung-won Hwang, Alexey Svyatkovskiy
Code completion, which aims to predict the following code token(s) according
to the code context, can improve the productivity of software development.
Recent work has proved that statistical language modeling with transformers can
greatly improve the performance in the code completion task via learning from
large-scale source code datasets. However, current approaches focus only on
code context within the file or project, i.e., internal context. Our approach
differs in utilizing "external" context, inspired by the human behavior of copying
from related code snippets when writing code. Specifically, we propose a
retrieval-augmented code completion framework, leveraging both lexical copying
and referring to code with similar semantics by retrieval. We adopt a
stage-wise training approach that combines a source code retriever and an
auto-regressive language model for programming languages. We evaluate our
approach on the code completion task in the Python and Java programming languages,
achieving state-of-the-art performance on the CodeXGLUE benchmark.
Authors' comments: Published in ACL 2022
Qiang Wang, Yanhao Zhang, Yun Zheng, Pan Pan, Xian-Sheng Hua
Cross-modality interaction is a critical component in Text-Video Retrieval
(TVR), yet there has been little examination of how different influencing
factors for computing interaction affect performance. This paper first studies
the interaction paradigm in depth, where we find that its computation can be
split into two terms, the interaction contents at different granularity and the
matching function to distinguish pairs with the same semantics. We also observe
that the single-vector representation and implicit intensive function
substantially hinder the optimization. Based on these findings, we propose a
disentangled framework to capture a sequential and hierarchical representation.
Firstly, considering the natural sequential structure in both text and video
inputs, a Weighted Token-wise Interaction (WTI) module is performed to decouple
the content and adaptively exploit the pair-wise correlations. This interaction
can form a better disentangled manifold for sequential inputs. Secondly, we
introduce a Channel DeCorrelation Regularization (CDCR) to minimize the
redundancy between the components of the compared vectors, which facilitates
learning a hierarchical representation. We demonstrate the effectiveness of the
disentangled representation on various benchmarks, e.g., surpassing CLIP4Clip
by a large margin of +2.9%, +3.1%, +7.9%, +2.3%, +2.8% and +6.5% R@1 on MSR-VTT,
MSVD, VATEX, LSMDC, ActivityNet, and DiDeMo, respectively.
Authors' comments: 22 pages, 11 figures, Tech report
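A weighted token-wise interaction between a text and a video can be sketched as follows. The sketch assumes that each text token is matched to its most similar frame (and each frame to its most similar token), that the per-token weights are given, and that the two directions are averaged; none of these details is specified in the abstract, and `wti_score` is a hypothetical name.

```python
def wti_score(text_tokens, video_frames, text_weights, frame_weights):
    """text_tokens / video_frames: lists of equal-length embedding vectors.
    text_weights / frame_weights: one non-negative weight per token / frame."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    # Pairwise similarity matrix: rows are text tokens, columns are frames.
    sim = [[dot(t, f) for f in video_frames] for t in text_tokens]
    # Text-to-video: each token keeps its best-matching frame.
    t2v = sum(w * max(row) for w, row in zip(text_weights, sim))
    # Video-to-text: each frame keeps its best-matching token.
    v2t = sum(w * max(col) for w, col in zip(frame_weights, zip(*sim)))
    return 0.5 * (t2v + v2t)

score = wti_score(
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    [0.5, 0.5],
    [0.25, 0.25, 0.5],
)
```

In contrast to comparing two single vectors, this keeps the sequential structure of both inputs and lets the weights decide which tokens or frames matter.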
Tong Yu, Pietro Mascagni, Juan Verde, Jacques Marescaux, Didier Mutter, Nicolas Padoy
Searching through large volumes of medical data to retrieve relevant
information is a challenging yet crucial task for clinical care. However, the
primitive and most common approach to retrieval, involving text in the form of
keywords, is severely limited when dealing with complex media formats.
Content-based retrieval offers a way to overcome this limitation, by using rich
media as the query itself. Surgical video-to-video retrieval in particular is a
new and largely unexplored research problem with high clinical value,
especially in the real-time case: using real-time video hashing, search can be
achieved directly inside of the operating room. Indeed, the process of hashing
converts large data entries into compact binary arrays or hashes, enabling
large-scale search operations at a very fast rate. However, due to fluctuations
over the course of a video, not all bits in a given hash are equally reliable.
In this work, we propose a method capable of mitigating this uncertainty while
maintaining a light computational footprint. We present superior retrieval
results (3-4% top-10 mean average precision) on a multi-task evaluation
protocol for surgery, using cholecystectomy phases, bypass phases, and, from an
entirely new dataset introduced here, critical events across six
different surgery types. Success on this multi-task benchmark shows the
generalizability of our approach for surgical video retrieval.
Authors' comments: 16 pages, 13 figures
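Hash-based search with unequally reliable bits can be illustrated with a weighted Hamming distance. This is only a sketch of the general idea: the per-bit `reliability` weights are taken as given, and the weighting scheme is an assumption, not the method proposed in the paper.

```python
def weighted_hamming(query_bits, item_bits, reliability):
    """Sum of reliability weights over positions where the hashes differ."""
    return sum(w for q, x, w in zip(query_bits, item_bits, reliability)
               if q != x)

def search(query_bits, reliability, database, k=1):
    """database maps a clip name to its bit list; returns the k closest names."""
    return sorted(
        database,
        key=lambda name: weighted_hamming(query_bits, database[name], reliability),
    )[:k]

query = [1, 0, 1, 1]
reliability = [1.0, 1.0, 0.1, 0.1]  # the last two bits fluctuate a lot
database = {"clip_a": [1, 0, 0, 0], "clip_b": [0, 0, 1, 1]}
best = search(query, reliability, database)
```

Under a plain Hamming distance `clip_b` would be closer (one differing bit versus two), but down-weighting the unreliable bits makes `clip_a` the better match.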
Ensheng Shi, Yanlin Wang, Wei Tao, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun
Commit messages are important for software development and maintenance. Many
neural network-based approaches have been proposed and have shown promising results
on automatic commit message generation. However, the generated commit messages
could be repetitive or redundant. In this paper, we propose RACE, a new
retrieval-augmented neural commit message generation method, which treats the
retrieved similar commit as an exemplar and leverages it to generate an
accurate commit message. As the retrieved commit message may not always
accurately describe the content/intent of the current code diff, we also
propose an exemplar guider, which learns the semantic similarity between the
retrieved and current code diff and then guides the generation of commit
message based on the similarity. We conduct extensive experiments on a large
public dataset with five programming languages. Experimental results show that
RACE can outperform all baselines. Furthermore, RACE can boost the performance
of existing Seq2Seq models in commit message generation.
Authors' comments: Accepted by EMNLP 2022 (The 2022 Conference on Empirical Methods in
Natural Language Processing)
V. I. Yukalov, S. Gluzman
Methods of determining, from small-variable asymptotic expansions, the
characteristic exponents for variables tending to infinity are analyzed. The
following methods are considered: diff-log Pad\'e summation, self-similar
factor approximation, self-similar diff-log summation, self-similar Borel
summation, and self-similar Borel-Leroy summation. Several typical problems are
treated. The comparison of the results shows that all these methods provide
close estimates for the large-variable exponents. The reliable estimates are
obtained when different methods of summation are compatible with each other.
Authors' comments: Latex file, 19 pages
Simon Welker, Tal Peer, Henry N. Chapman, Timo Gerkmann
One of the most prominent challenges in the field of diffractive imaging is
the phase retrieval (PR) problem: In order to reconstruct an object from its
diffraction pattern, the inverse Fourier transform must be computed. This is
only possible given the full complex-valued diffraction data, i.e. magnitude
and phase. However, in diffractive imaging, generally only magnitudes can be
directly measured while the phase needs to be estimated. In this work we
specifically consider ptychography, a sub-field of diffractive imaging, where
objects are reconstructed from multiple overlapping diffraction images. We
propose an augmentation of existing iterative phase retrieval algorithms with a
neural network designed for refining the result of each iteration. For this
purpose we adapt and extend a recently proposed architecture from the speech
processing field. Evaluation results show the proposed approach delivers
improved convergence rates in terms of both iteration count and algorithm
runtime.
Authors' comments: \copyright{} 2022 IEEE.
Kaiyi Zhang, Ximing Yang, Yuan Wu, Cheng Jin
Given partial objects and some complete ones as references, point cloud
completion aims to recover authentic shapes. However, existing methods pay
little attention to general shapes, which leads to poor authenticity of
completion results. Besides, the missing patterns are diverse in reality, but
existing methods can only handle fixed ones, which means poor generalization
ability. Considering that a partial point cloud is a subset of the
corresponding complete one, we regard them as different samples of the same
distribution and propose Structure Retrieval based Point Completion Network
(SRPCN). It first uses k-means clustering to extract structure points and
disperses them into distributions, and then KL Divergence is used as a metric
to find the complete structure point cloud that best matches the input in a
database. Finally, a PCN-like decoder network is adopted to generate the final
results based on the retrieved structure point clouds. As structure plays an
important role in describing the general shape of an object and the proposed
structure retrieval method is robust to missing patterns, experiments show that
our method can generate more authentic results and has a stronger
generalization ability.
Authors' comments: I think the proposed method has some defects