Yisha Yao
In this paper, we provide a general methodology to draw statistical inferences on individual signal coordinates or linear combinations of them in sparse phase retrieval. Given an initial estimator for the targeting parameter (some simple function of the signal), which is generated by some existing algorithm, we can modify it in a way that the modified version is asymptotically normal and unbiased. Then confidence intervals and hypothesis testings can be constructed based on this asymptotic normality. For conciseness, we focus on confidence intervals in this work, while a similar procedure can be adopted for hypothesis testings. Under some mild assumptions on the signal and sample size, we establish theoretical guarantees for the proposed method. These assumptions are generally weak in the sense that the dimension could exceed the sample size and many non-zero small coordinates are allowed. Furthermore, theoretical analysis reveals that the modified estimators for individual coordinates have uniformly bounded variance, and hence simultaneous interval estimation is possible. Numerical simulations in a wide range of settings are supportive of our theoretical results.
Davis Liang, Peng Xu, Siamak Shakeri, Cicero Nogueira dos Santos, Ramesh Nallapati, Zhiheng Huang, Bing Xiang
Passage retrieval addresses the problem of locating relevant passages, usually from a large corpus, given a query. In practice, lexical term-matching algorithms like BM25 are popular choices for retrieval owing to their efficiency. However, term-based matching algorithms often miss relevant passages that have no lexical overlap with the query and cannot be finetuned to downstream datasets. In this work, we consider the embedding-based two-tower architecture as our neural retrieval model. Since labeled data can be scarce and because neural retrieval models require vast amounts of data to train, we propose a novel method for generating synthetic training data for retrieval. Our system produces remarkable results, significantly outperforming BM25 on 5 out of 6 datasets tested, by an average of 2.45 points for Recall@1. In some cases, our model trained on synthetic data can even outperform the same model trained on real data
Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen
We propose Generation-Augmented Retrieval (GAR) for answering open-domain
questions, which augments a query through text generation of heuristically
discovered relevant contexts without external resources as supervision. We
demonstrate that the generated contexts substantially enrich the semantics of
the queries and GAR with sparse representations (BM25) achieves comparable or
better performance than state-of-the-art dense retrieval methods such as DPR.
We show that generating diverse contexts for a query is beneficial as fusing
their results consistently yields better retrieval accuracy. Moreover, as
sparse and dense representations are often complementary, GAR can be easily
combined with DPR to achieve even better performance. GAR achieves
state-of-the-art performance on Natural Questions and TriviaQA datasets under
the extractive QA setup when equipped with an extractive reader, and
consistently outperforms other retrieval methods when the same generative
reader is used.
Authors' comments: Minor format updates
Shen Yan, Yang Pen, Shiming Lai, Yu Liu, Maojun Zhang
Conventional image retrieval techniques for Structure-from-Motion (SfM) suffer from the limit of effectively recognizing repetitive patterns and cannot guarantee to create just enough match pairs with high precision and high recall. In this paper, we present a novel retrieval method based on Graph Convolutional Network (GCN) to generate accurate pairwise matches without costly redundancy. We formulate image retrieval task as a node binary classification problem in graph data: a node is marked as positive if it shares the scene overlaps with the query image. The key idea is that we find that the local context in feature space around a query image contains rich information about the matchable relation between this image and its neighbors. By constructing a subgraph surrounding the query image as input data, we adopt a learnable GCN to exploit whether nodes in the subgraph have overlapping regions with the query photograph. Experiments demonstrate that our method performs remarkably well on the challenging dataset of highly ambiguous and duplicated scenes. Besides, compared with state-of-the-art matchable retrieval methods, the proposed approach significantly reduces useless attempted matches without sacrificing the accuracy and completeness of reconstruction.
Aran Komatsuzaki
We classify and re-examine some of the current approaches to improve the performance-computes trade-off of language models, including (1) non-causal models (such as masked language models), (2) extension of batch length with efficient attention, (3) recurrence, (4) conditional computation and (5) retrieval. We identify some limitations (1) - (4) suffer from. For example, (1) currently struggles with open-ended text generation with the output loosely constrained by the input as well as performing general textual tasks like GPT-2/3 due to its need for a specific fine-tuning dataset. (2) and (3) do not improve the prediction of the first $\sim 10^3$ tokens. Scaling up a model size (e.g. efficiently with (4)) still results in poor performance scaling for some tasks. We argue (5) would resolve many of these limitations, and it can (a) reduce the amount of supervision and (b) efficiently extend the context over the entire training dataset and the entire past of the current sample. We speculate how to modify MARGE to perform unsupervised causal modeling that achieves (b) with the retriever jointly trained.
Zhihua Xia, Qi Gu, Lizhi Xiong, Wenhao Zhou, Jian Weng
The rapid growth of digital images motivates individuals and organizations to
upload their images to the cloud server. To preserve privacy, image owners
would prefer to encrypt the images before uploading, but it would strongly
limit the efficient usage of images. Plenty of existing schemes on
privacy-preserving Content-Based Image Retrieval (PPCBIR) try to seek the
balance between security and retrieval ability. However, compared to the
advanced technologies in CBIR like Convolutional Neural Network (CNN), the
existing PPCBIR schemes are far deficient in both accuracy and efficiency. With
more cloud service providers, the collaborative secure image retrieval service
provided by multiple cloud servers becomes possible. In this paper, inspired by
additive secret sharing technology, we propose a series of additive secure
computing protocols on numbers and matrices with better efficiency, and then
show their application in PPCBIR. Specifically, we extract CNN features,
decrease the dimension of features and build the index securely with the help
of our protocols, which include the full process of image retrieval in the
plaintext domain. The experiments and security analysis demonstrate the
efficiency, accuracy, and security of our scheme.
Authors' comments: This version repairs many mistakes
Rima Alaifari, Francesca Bartolucci, Matthias Wellershoff
We study the recovery of square-integrable signals from the absolute values
of their wavelet transforms, also called wavelet phase retrieval. We present a
new uniqueness result for wavelet phase retrieval. To be precise, we show that
any wavelet with finitely many vanishing moments allows for the unique recovery
of real-valued bandlimited signals up to global sign. Additionally, we present
the first uniqueness result for sampled wavelet phase retrieval in which the
underlying wavelets are allowed to be complex-valued and we present a
uniqueness result for phase retrieval from sampled Cauchy wavelet transform
measurements.
Authors' comments: 18 pages
Rolando Perez
We prove that if f and g are holomorphic functions on an open connected domain, with the same moduli on two intersecting segments, then f = g up to the multiplication of a unimodular constant, provided the segments make an angle that is an irrational multiple of $\pi$. We also prove that if f and g are functions in the Nevanlinna class, and if |f | = |g| on the unit circle and on a circle inside the unit disc, then f = g up to the multiplication of a unimodular constant.
Marcelo de Rezende Martins, Marco A. Gerosa
Software developers routinely search for code using general-purpose search engines. However, these search engines cannot find code semantically unless it has an accompanying description. We propose a technique for semantic code search: A Convolutional Neural Network approach to code retrieval (CoNCRA). Our technique aims to find the code snippet that most closely matches the developer's intent, expressed in natural language. We evaluated our approach's efficacy on a dataset composed of questions and code snippets collected from Stack Overflow. Our preliminary results showed that our technique, which prioritizes local interactions (words nearby), improved the state-of-the-art (SOTA) by 5% on average, retrieving the most relevant code snippets in the top 3 (three) positions by almost 80% of the time. Therefore, our technique is promising and can improve the efficacy of semantic code retrieval.
Surgan Jandial, Pinkesh Badjatiya, Pranit Chawla, Ayush Chopra, Mausoom Sarkar, Balaji Krishnamurthy
The ability to efficiently search for images is essential for improving the
user experiences across various products. Incorporating user feedback, via
multi-modal inputs, to navigate visual search can help tailor retrieved results
to specific user queries. We focus on the task of text-conditioned image
retrieval that utilizes support text feedback alongside a reference image to
retrieve images that concurrently satisfy constraints imposed by both inputs.
The task is challenging since it requires learning composite image-text
features by incorporating multiple cross-granular semantic edits from text
feedback and then applying the same to visual features. To address this, we
propose a novel framework SAC which resolves the above in two major steps:
"where to see" (Semantic Feature Attention) and "how to change" (Semantic
Feature Modification). We systematically show how our architecture streamlines
the generation of text-aware image features by removing the need for various
modules required by other state-of-art techniques. We present extensive
quantitative, qualitative analysis, and ablation studies, to show that our
architecture SAC outperforms existing techniques by achieving state-of-the-art
performance on 3 benchmark datasets: FashionIQ, Shoes, and Birds-to-Words,
while supporting natural language feedback of varying lengths.
Authors' comments: Surgan Jandial, Pinkesh Badjatiya, Pranit Chawla, and Ayush Chopra
contributed equally to this work. Work accepted at WACV 2022
Lei Zhang, Zhenwei He, Yi Yang, Liang Wang, Xinbo Gao
The traditional object retrieval task aims to learn a discriminative feature
representation with intra-similarity and inter-dissimilarity, which supposes
that the objects in an image are manually or automatically pre-cropped exactly.
However, in many real-world searching scenarios (e.g., video surveillance), the
objects (e.g., persons, vehicles, etc.) are seldom accurately detected or
annotated. Therefore, object-level retrieval becomes intractable without
bounding-box annotation, which leads to a new but challenging topic, i.e.
image-level search. In this paper, to address the image search issue, we first
introduce an end-to-end Integrated Net (I-Net), which has three merits: 1) A
Siamese architecture and an on-line pairing strategy for similar and dissimilar
objects in the given images are designed. 2) A novel on-line pairing (OLP) loss
is introduced with a dynamic feature dictionary, which alleviates the
multi-task training stagnation problem, by automatically generating a number of
negative pairs to restrict the positives. 3) A hard example priority (HEP)
based softmax loss is proposed to improve the robustness of classification task
by selecting hard categories. With the philosophy of divide and conquer, we
further propose an improved I-Net, called DC-I-Net, which makes two new
contributions: 1) two modules are tailored to handle different tasks separately
in the integrated framework, such that the task specification is guaranteed. 2)
A class-center guided HEP loss (C2HEP) by exploiting the stored class centers
is proposed, such that the intra-similarity and inter-dissimilarity can be
captured for ultimate retrieval. Extensive experiments on famous image-level
search oriented benchmark datasets demonstrate that the proposed DC-I-Net
outperforms the state-of-the-art tasks-integrated and tasks-separated image
search models.
Authors' comments: To appear in IEEE TPAMI, 18 pages
Mohammad Keshavarzi, Clayton Hutson, Chin-Yi Cheng, Mehdi Nourbakhsh, Michael Bergin, Mohammad Rahmani Asl
Developing fully parametric building models for performance-based generative design tasks often requires proficiency in many advanced 3D modeling and visual programming, limiting its use for many building designers. Moreover, iterations of such models can be time-consuming tasks and sometimes limiting, as major changes in the layout design may result in remodeling the entire parametric definition. To address these challenges, we introduce a novel automated generative design system, which takes a basic floor plan sketch as an input and provides a parametric model prepared for multi-objective building optimization as output. Furthermore, the user-designer can assign various design variables for its desired building elements by using simple annotations in the drawing. The system would recognize the corresponding element and define variable constraints to prepare for a multi-objective optimization problem.
Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä
The query-based moment retrieval is a problem of localising a specific clip
from an untrimmed video according a query sentence. This is a challenging task
that requires interpretation of both the natural language query and the video
content. Like in many other areas in computer vision and machine learning, the
progress in query-based moment retrieval is heavily driven by the benchmark
datasets and, therefore, their quality has significant impact on the field. In
this paper, we present a series of experiments assessing how well the benchmark
results reflect the true progress in solving the moment retrieval task. Our
results indicate substantial biases in the popular datasets and unexpected
behaviour of the state-of-the-art models. Moreover, we present new sanity check
experiments and approaches for visualising the results. Finally, we suggest
possible directions to improve the temporal sentence grounding in the future.
Our code for this paper is available at
https://mayu-ot.github.io/hidden-challenges-MR .
Authors' comments: British Machine Vision Conference (BMVC), 2020. (v2) added references
Ronak Gupta, Prerana Mukherjee, Brejesh Lall, Varshul Gupta
Monument classification can be performed on the basis of their appearance and
shape from coarse to fine categories. Although there is much semantic
information present in the monuments which is reflected in the eras they were
built, its type or purpose, the dynasty which established it, etc.
Particularly, Indian subcontinent exhibits a huge deal of variation in terms of
architectural styles owing to its rich cultural heritage. In this paper, we
propose a framework that utilizes hierarchy to preserve semantic information
while performing image classification or image retrieval. We encode the learnt
deep embeddings to construct a dictionary of images and then utilize a
re-ranking framework on the the retrieved results using DeLF features. The
semantic information preserved in these embeddings helps to classify unknown
monuments at higher level of granularity in hierarchy. We have curated a large,
novel Indian heritage monuments dataset comprising of images of historical,
cultural and religious importance with subtypes of eras, dynasties and
architectural styles. We demonstrate the performance of the proposed framework
in image classification and retrieval tasks and compare it with other competing
methods on this dataset.
Authors' comments: Accepted in Structuring and Understanding of Multimedia heritAge
Contents (SUMAC2020), ACM Multimedia Workshops, Seattle, United States,
October 2020
Olga Moskvyak, Frederic Maire, Feras Dayoub, Mahsa Baktashmotlagh
Learning embeddings that are invariant to the pose of the object is crucial
in visual image retrieval and re-identification. The existing approaches for
person, vehicle, or animal re-identification tasks suffer from high intra-class
variance due to deformable shapes and different camera viewpoints. To overcome
this limitation, we propose to align the image embedding with a predefined
order of the keypoints. The proposed keypoint aligned embeddings model
(KAE-Net) learns part-level features via multi-task learning which is guided by
keypoint locations. More specifically, KAE-Net extracts channels from a feature
map activated by a specific keypoint through learning the auxiliary task of
heatmap reconstruction for this keypoint. The KAE-Net is compact, generic and
conceptually simple. It achieves state of the art performance on the benchmark
datasets of CUB-200-2011, Cars196 and VeRi-776 for retrieval and
re-identification tasks.
Authors' comments: 8 pages, 7 figures, accepted to WACV 2021
Kaihua Qin, Henryk Hadass, Arthur Gervais, Joel Reardon
Lightweight Bitcoin clients execute a Simple Payment Verification (SPV) protocol to verify the validity of transactions related to a particular user. Currently, lightweight clients use Bloom filters to significantly reduce the amount of bandwidth required to validate a particular transaction. This is despite the fact that research has shown that Bloom filters are insufficient at preserving the privacy of clients' queries. In this paper we describe our design of an SPV protocol that leverages Private Information Retrieval (PIR) to create fully private and performant queries. We show that our protocol has a low bandwidth and latency cost; properties that make our protocol a viable alternative for lightweight Bitcoin clients and other cryptocurrencies with a similar SPV model. In contract to Bloom filters, our PIR-based approach offers deterministic privacy to the user. Among our results, we show that in the worst case, clients who would like to verify 100 transactions occurring in the past week incurs a bandwidth cost of 33.54 MB with an associated latency of approximately 4.8 minutes, when using our protocol. The same query executed using the Bloom-filter-based SPV protocol incurs a bandwidth cost of 12.85 MB; this is a modest overhead considering the privacy guarantees it provides.
SeungKee Jeon
This paper presents the 1st place solution to the Google Landmark Retrieval
2020 Competition on Kaggle. The solution is based on metric learning to
classify numerous landmark classes, and uses transfer learning with two train
datasets, fine-tuning on bigger images, adjusting loss weight for cleaner
samples, and esemble to enhance the model's performance further. Finally, it
scored 0.38677 mAP@100 on the private leaderboard.
Authors' comments: 3 pages
Paul Hand, Oscar Leong, Vladislav Voroninski
Advances in compressive sensing provided reconstruction algorithms of sparse
signals from linear measurements with optimal sample complexity, but natural
extensions of this methodology to nonlinear inverse problems have been met with
potentially fundamental sample complexity bottlenecks. In particular, tractable
algorithms for compressive phase retrieval with sparsity priors have not been
able to achieve optimal sample complexity. This has created an open problem in
compressive phase retrieval: under generic, phaseless linear measurements, are
there tractable reconstruction algorithms that succeed with optimal sample
complexity? Meanwhile, progress in machine learning has led to the development
of new data-driven signal priors in the form of generative models, which can
outperform sparsity priors with significantly fewer measurements. In this work,
we resolve the open problem in compressive phase retrieval and demonstrate that
generative priors can lead to a fundamental advance by permitting optimal
sample complexity by a tractable algorithm in this challenging nonlinear
inverse problem. We additionally provide empirics showing that exploiting
generative priors in phase retrieval can significantly outperform sparsity
priors. These results provide support for generative priors as a new paradigm
for signal recovery in a variety of contexts, both empirically and
theoretically. The strengths of this paradigm are that (1) generative priors
can represent some classes of natural signals more concisely than sparsity
priors, (2) generative priors allow for direct optimization over the natural
signal manifold, which is intractable under sparsity priors, and (3) the
resulting non-convex optimization problems with generative priors can admit
benign optimization landscapes at optimal sample complexity, perhaps
surprisingly, even in cases of nonlinear measurements.
Authors' comments: 62 pages, 3 figures
Ke Mei, Lei li, Jinchang Xu, Yanhua Cheng, Yugeng Lin
Image retrieval is a fundamental problem in computer vision. This paper presents our 3rd place detailed solution to the Google Landmark Retrieval 2020 challenge. We focus on the exploration of data cleaning and models with metric learning. We use a data cleaning strategy based on embedding clustering. Besides, we employ a data augmentation method called Corner-Cutmix, which improves the model's ability to recognize multi-scale and occluded landmark images. We show in detail the ablation experiments and results of our method.
Michal Sedlák, Mário Ziman
Probabilistic storage and retrieval (PSR) of unitary quantum dynamics is
possible with exponentially small failure probability with respect to the
number of systems used as a quantum memory [PRL 122, 170502 (2019)]. Here we
study improvements due to a priori knowledge about the unitary transformation
to be stored. In particular, we study $N \rightarrow 1$ PSR of qubit phase
gates, i.e. qubit rotations a round $Z$ axis with an unknown angle, and show
that if we access the gate only $N$-times, the optimal probability of perfect
retrieving of its single use is $N/(N+1)$. We propose a quantum circuit
realization for the optimal protocol and show that programmable phase gate [PRL
88, 047905 (2002)] can be turned into $(2^k-1)\rightarrow 1$ optimal PSR of
phase gates and requires only $k$ CNOT gates, while having exponentially small
failure probability in $k$.
Authors' comments: 9 pages, 7 figures