Seito Kasai, Yuchi Ishikawa, Masaki Hayashi, Yoshimitsu Aoki, Kensho Hara, Hirokatsu Kataoka
In this paper, we present a framework that jointly retrieves and
spatiotemporally highlights actions in videos by enhancing current deep
cross-modal retrieval methods. Our work takes on the novel task of action
highlighting, which visualizes where and when actions occur in an untrimmed
video setting. Action highlighting is a fine-grained task, compared to
conventional action recognition tasks which focus on classification or
window-based localization. Leveraging weak supervision from annotated captions,
our framework acquires spatiotemporal relevance maps and generates local
embeddings which relate to the nouns and verbs in captions. Through
experiments, we show that our model generates various maps conditioned on
different actions, whereas conventional visual reasoning methods only go as
far as to show a single deterministic saliency map. Also, our model improves
retrieval recall over our baseline without alignment by 2-3% on the MSR-VTT
dataset.
Authors' comments: Accepted to ICIP 2020
Wenhao Yu, Lingfei Wu, Qingkai Zeng, Shu Tao, Yu Deng, Meng Jiang
Answer retrieval is to find the most aligned answer from a large set of
candidates given a question. Learning vector representations of
questions/answers is the key factor. Question-answer alignment and
question/answer semantics are two important signals for learning the
representations. Existing methods learned semantic representations with dual
encoders or dual variational auto-encoders. The semantic information was
learned from language models or question-to-question (answer-to-answer)
generative processes. However, the alignment and semantics were too separate to
capture the aligned semantics between question and answer. In this work, we
propose to cross variational auto-encoders by generating questions with aligned
answers and generating answers with aligned questions. Experiments show that
our method outperforms the state-of-the-art answer retrieval method on SQuAD.
Authors' comments: Accepted to ACL 2020
Luyu Gao, Zhuyun Dai, Tongfei Chen, Zhen Fan, Benjamin Van Durme, Jamie Callan
This paper presents CLEAR, a retrieval model that seeks to complement
classical lexical exact-match models such as BM25 with semantic matching
signals from a neural embedding matching model. CLEAR explicitly trains the
neural embedding to encode language structures and semantics that lexical
retrieval fails to capture with a novel residual-based embedding learning
method. Empirical evaluations demonstrate the advantages of CLEAR over
state-of-the-art retrieval models, and that it can substantially improve the
end-to-end accuracy and efficiency of reranking pipelines.
Authors' comments: ECIR 2021
Zhuolin Jiang, Amro El-Jaroudi, William Hartmann, Damianos Karakos, Lingjun Zhao
Multiple neural language models have been developed recently, e.g., BERT and XLNet, and achieved impressive results in various NLP tasks including sentence classification, question answering and document ranking. In this paper, we explore the use of the popular bidirectional language model, BERT, to model and learn the relevance between English queries and foreign-language documents in the task of cross-lingual information retrieval. A deep relevance matching model based on BERT is introduced and trained by finetuning a pretrained multilingual BERT model with weak supervision, using home-made CLIR training data derived from parallel corpora. Experimental results of the retrieval of Lithuanian documents against short English queries show that our model is effective and outperforms the competitive baseline approaches.
Ankit Singh Rawat, Aditya Krishna Menon, Andreas Veit, Felix Yu, Sashank J. Reddi, Sanjiv Kumar
Modern retrieval problems are characterised by training sets with potentially billions of labels, and heterogeneous data distributions across subpopulations (e.g., users of a retrieval system may be from different countries), each of which poses a challenge. The first challenge concerns scalability: with a large number of labels, standard losses are difficult to optimise even on a single example. The second challenge concerns uniformity: one ideally wants good performance on each subpopulation. While several solutions have been proposed to address the first challenge, the second challenge has received relatively less attention. In this paper, we propose doubly-stochastic mining (S2M), a stochastic optimization technique that addresses both challenges. In each iteration of S2M, we compute a per-example loss based on a subset of hardest labels, and then compute the minibatch loss based on the hardest examples. We show theoretically and empirically that by focusing on the hardest examples, S2M ensures that all data subpopulations are modelled well.
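The two-level mining step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the hinge loss, the uniform sampling of negative labels, and all function and parameter names (`s2m_minibatch_loss`, `num_sampled_labels`, `top_labels`, `top_examples`) are assumptions made for illustration only.

```python
import random


def hinge_loss(score_pos, score_neg, margin=1.0):
    """Pairwise hinge loss between the true label and one negative label (assumed loss)."""
    return max(0.0, margin - score_pos + score_neg)


def s2m_minibatch_loss(scores, positives, num_sampled_labels,
                       top_labels, top_examples, rng):
    """Sketch of doubly-stochastic mining on one minibatch.

    scores[i][j] is the model score of label j for example i, and
    positives[i] is the index of the true label of example i.
    """
    per_example = []
    for row, pos in zip(scores, positives):
        negatives = [j for j in range(len(row)) if j != pos]
        # First level: sample a subset of labels, keep only the hardest ones.
        sampled = rng.sample(negatives, min(num_sampled_labels, len(negatives)))
        losses = sorted((hinge_loss(row[pos], row[j]) for j in sampled),
                        reverse=True)
        hardest = losses[:top_labels]
        per_example.append(sum(hardest) / len(hardest))
    # Second level: average only over the hardest examples in the minibatch.
    hardest_examples = sorted(per_example, reverse=True)[:top_examples]
    return sum(hardest_examples) / len(hardest_examples)


scores = [[2.0, 0.0, 1.5], [0.0, 1.0, 3.0]]
positives = [0, 1]
loss = s2m_minibatch_loss(scores, positives, num_sampled_labels=2,
                          top_labels=1, top_examples=1,
                          rng=random.Random(0))
print(loss)  # 3.0: only the harder second example contributes
```

Averaging over only the hardest examples means a minibatch loss cannot be driven down by easy examples alone, which is the intuition the abstract gives for good performance on every subpopulation.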
Jiawang Bai, Bin Chen, Yiming Li, Dongxian Wu, Weiwei Guo, Shu-tao Xia, En-hui Yang
The deep hashing based retrieval method is widely adopted in large-scale
image and video retrieval. However, there is little investigation on its
security. In this paper, we propose a novel method, dubbed deep hashing
targeted attack (DHTA), to study the targeted attack on such retrieval.
Specifically, we first formulate the targeted attack as a point-to-set
optimization, which minimizes the average distance between the hash code of an
adversarial example and those of a set of objects with the target label. Then
we design a novel component-voting scheme to obtain an anchor code as the
representative of the set of hash codes of objects with the target label, whose
optimality guarantee is also theoretically derived. To balance the performance
and perceptibility, we propose to minimize the Hamming distance between the
hash code of the adversarial example and the anchor code under the
$\ell^\infty$ restriction on the perturbation. Extensive experiments verify
that DHTA is effective in attacking both deep hashing based image retrieval and
video retrieval.
Authors' comments: Accepted by ECCV 2020 as Oral
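The anchor-code step in the DHTA abstract above can be sketched as a per-bit majority vote. This is an illustrative assumption rather than the authors' code: the abstract names a component-voting scheme without giving details, so the tie-breaking rule, the {-1, +1} code convention, and the function names `component_vote` and `hamming` are hypothetical.

```python
def component_vote(codes):
    """Return an anchor code by majority vote on each bit.

    codes is a list of equal-length hash codes with entries in {-1, +1}.
    Ties are broken toward +1 (an arbitrary choice made for this sketch).
    """
    n_bits = len(codes[0])
    anchor = []
    for k in range(n_bits):
        total = sum(code[k] for code in codes)
        anchor.append(1 if total >= 0 else -1)
    return anchor


def hamming(a, b):
    """Number of positions at which two codes differ."""
    return sum(x != y for x, y in zip(a, b))


# Hash codes of three objects that share the target label (toy data).
target_codes = [[1, -1, 1, 1], [1, 1, -1, 1], [-1, -1, 1, 1]]
anchor = component_vote(target_codes)
print(anchor)  # [1, -1, 1, 1]
print(sum(hamming(anchor, c) for c in target_codes))  # 3
```

Because the Hamming distance decomposes over bits, voting on each bit independently minimizes the total (and hence average) Hamming distance to the set, which matches the point-to-set objective stated in the abstract.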
Stefan Steinerberger
Phase retrieval is concerned with recovering a function $f$ from the absolute value of its Fourier transform $|\widehat{f}|$. We study the stability properties of this problem in Lebesgue spaces. Our main result shows that $$ \| f-g\|_{L^2(\mathbb{R}^n)} \leq 2\cdot \| |\widehat{f}| - |\widehat{g}| \|_{L^2(\mathbb{R}^n)} + h_f\left( \|f-g\|_{L^p(\mathbb{R}^n)}\right) + J(\widehat{f}, \widehat{g}),$$ where $1 \leq p < 2$, $h_f$ is an explicit nonlinear function depending on the smoothness of $f$ and $J$ is an explicit term capturing the invariance under translations. A noteworthy aspect is that the stability is phrased in terms of $L^p$ for $1 \leq p < 2$: while, usually, $L^p$ cannot be used to control $L^2$, the stability estimate has the flavor of an inverse H\"older inequality. It seems conceivable that the estimate is optimal up to constants.
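The role of the translation term $J$ in the abstract above can be illustrated numerically. The sketch below uses a discrete, circular analogue (a finite DFT rather than the Fourier transform on $\mathbb{R}^n$), which is an assumption made for illustration; the signal values and the helper names `dft` and `fourier_magnitudes` are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fourier_magnitudes(x):
    """Absolute values of the DFT coefficients, i.e. the phaseless data."""
    return [abs(c) for c in dft(x)]


f = [0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 0.0, 0.0]
g = f[3:] + f[:3]  # circular translate of f

# The two signals differ, yet their Fourier magnitudes coincide.
gap = max(abs(a - b) for a, b in zip(fourier_magnitudes(f), fourier_magnitudes(g)))
print(f != g, gap < 1e-9)
```

Since a translate has the same Fourier magnitudes as the original, the magnitude difference alone cannot bound $\|f-g\|_{L^2}$, which is why a stability estimate of this kind needs a separate term accounting for translations.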
Mikaela Angelina Uy, Jingwei Huang, Minhyuk Sung, Tolga Birdal, Leonidas Guibas
We introduce a new problem of retrieving 3D models that are deformable to a
given query shape and present a novel deep deformation-aware embedding to solve
this retrieval task. 3D model retrieval is a fundamental operation for
recovering a clean and complete 3D model from a noisy and partial 3D scan.
However, given a finite collection of 3D shapes, even the closest model to a
query may not be satisfactory. This motivates us to apply 3D model deformation
techniques to adapt the retrieved model so as to better fit the query. Yet,
certain restrictions are enforced in most 3D deformation techniques to preserve
important features of the original model that prevent a perfect fitting of the
deformed model to the query. This gap between the deformed model and the query
induces asymmetric relationships among the models, which cannot be handled by
typical metric learning techniques. Thus, to retrieve the best models for
fitting, we propose a novel deep embedding approach that learns the asymmetric
relationships by leveraging location-dependent egocentric distance fields. We
also propose two strategies for training the embedding network. We demonstrate
that both of these approaches outperform other baselines in our experiments
with both synthetic and real data. Our project page can be found at
https://deformscan2cad.github.io/.
Authors' comments: Accepted for publication at ECCV 2020. Project page under
https://deformscan2cad.github.io
Joanna K. Barstow, Kevin Heng
Spectral retrieval has long been a powerful tool for interpreting planetary
remote sensing observations. Flexible, parameterised, agnostic models are
coupled with inversion algorithms in order to infer atmospheric properties
directly from observations, with minimal reliance on physical assumptions. This
approach, originally developed for application to Earth satellite data and
subsequently observations of other Solar System planets, has been recently and
successfully applied to transit, eclipse and phase curve spectra of transiting
exoplanets. In this review, we present the current state-of-the-art in terms of
our ability to accurately retrieve information about atmospheric chemistry,
temperature, clouds and spatial variability; we discuss the limitations of
this, both in the available data and modelling strategies used; and we
recommend approaches for future improvement.
Authors' comments: 30 pages, 6 figures. Accepted by Space Science Reviews
Niloofar Tavakolian, Azadeh Nazemi, Donal Fitzpatrick
Information is frequently retrieved from valid personal ID cards by the
authorised organisation to address different purposes. Successful
information retrieval (IR) depends on the accuracy and timing of the process. A
process which necessitates a long time to respond is frustrating for both sides
in the exchange of data. This paper aims to propose a series of
state-of-the-art methods for the journey of an Identification card (ID) from
the scanning or capture phase to the point before Optical character recognition
(OCR). The key factors for this proposal are the accuracy and speed of the
process during the journey. The experimental results of this research prove
that utilising methods based on deep learning, such as the Efficient and
Accurate Scene Text (EAST) detector and a Deep Neural Network (DNN) for face
detection, instead of traditional methods increases the efficiency considerably.
Authors' comments: 6 pages, 10 figures, conference
Yujie Zhong, Relja Arandjelović, Andrew Zisserman
The objective of this work is to learn a compact embedding of a set of
descriptors that is suitable for efficient retrieval and ranking, whilst
maintaining discriminability of the individual descriptors. We focus on a
specific example of this general problem -- that of retrieving images
containing multiple faces from a large scale dataset of images. Here the set
consists of the face descriptors in each image, and given a query for multiple
identities, the goal is then to retrieve, in order, images which contain all
the identities, all but one, etc.
To this end, we make the following contributions: first, we propose a CNN
architecture -- SetNet -- to achieve the objective: it learns face
descriptors and their aggregation over a set to produce a compact fixed length
descriptor designed for set retrieval, and the score of an image is a count of
the number of identities that match the query; second, we show that this
compact descriptor has minimal loss of discriminability up to two faces per
image, and degrades slowly after that -- far exceeding a number of baselines;
third, we explore the speed vs. retrieval quality trade-off for set retrieval
using this compact descriptor; and, finally, we collect and annotate a large
dataset of images containing varying numbers of celebrities, which we use for
evaluation and release publicly.
Authors' comments: 20 pages
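The idea of scoring an image by the number of query identities it contains, as described in the SetNet abstract above, can be sketched with a toy aggregation. This is not the SetNet architecture: sum aggregation, the one-hot "face descriptors", and the names `aggregate` and `match_score` are assumptions chosen so that the count is exact in the example.

```python
def aggregate(descriptors):
    """Sum a set of descriptors into one fixed-length set descriptor (assumed aggregation)."""
    dim = len(descriptors[0])
    return [sum(d[i] for d in descriptors) for i in range(dim)]


def match_score(query_descriptors, image_descriptors):
    """Dot product of the two set descriptors.

    With orthonormal descriptors this equals the number of shared identities.
    """
    q = aggregate(query_descriptors)
    s = aggregate(image_descriptors)
    return sum(a * b for a, b in zip(q, s))


# Toy orthonormal descriptors, one per identity (hypothetical data).
alice, bob, carol, dave = [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]
query = [alice, bob]
images = {"img1": [alice, bob, carol], "img2": [alice, dave], "img3": [carol]}

ranking = sorted(images, key=lambda name: match_score(query, images[name]),
                 reverse=True)
print(ranking)  # ['img1', 'img2', 'img3']
```

Real descriptors are not orthogonal, so the dot product only approximates the count as more faces are aggregated; the abstract reports that discriminability degrades slowly beyond two faces per image.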
Yaotian Wang, Xiaohang Sun, Jason W. Fleischer
Recovering a signal from its Fourier intensity underlies many important applications, including lensless imaging and imaging through scattering media. Conventional algorithms for retrieving the phase suffer when noise is present but display global convergence when given clean data. Neural networks have been used to improve algorithm robustness, but efforts to date are sensitive to initial conditions and give inconsistent performance. Here, we combine iterative methods from phase retrieval with image statistics from deep denoisers, via regularization-by-denoising. The resulting methods inherit the advantages of each approach and outperform other noise-robust phase retrieval algorithms. Our work paves the way for hybrid imaging methods that integrate machine-learned constraints in conventional algorithms.
Luca Pedrelli, Phillip D. Keathley, Laura Cattaneo, Franz X. Kärtner, Ursula Keller
Coherent, broadband pulses of extreme ultraviolet (XUV) light provide a new and exciting tool for exploring attosecond electron dynamics. Using photoelectron streaking, interferometric spectrograms can be generated that contain a wealth of information about the phase properties of the photoionization process. If properly retrieved, this phase information reveals attosecond dynamics during photoelectron emission such as multielectron dynamics and resonance processes. However, until now, the full retrieval of the continuous electron wavepacket phase from isolated attosecond pulses has remained challenging. Here, after elucidating key approximations and limitations that hinder one from extracting the coherent electron wavepacket dynamics using available retrieval algorithms, we present a new method called Absolute Complex Dipole transmission matrix element reConstruction (ACDC). We apply the ACDC method to experimental spectrograms to resolve the phase and group delay difference between photoelectrons emitted from Ne and Ar. Our results reveal subtle dynamics in this group delay difference of photoelectrons emitted from Ar. These group delay dynamics were not resolvable with prior methods that were only able to extract phase information at discrete energy levels, emphasizing the importance of a complete and continuous phase retrieval technique such as ACDC. Here we also make this new ACDC retrieval algorithm available with appropriate citation in return.
Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang
Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering. However, this knowledge is stored implicitly in the parameters of a neural network, requiring ever-larger networks to cover more facts. To capture knowledge in a more modular and interpretable way, we augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia, used during pre-training, fine-tuning and inference. For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner, using masked language modeling as the learning signal and backpropagating through a retrieval step that considers millions of documents. We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA). We compare against state-of-the-art models for both explicit and implicit knowledge storage on three popular Open-QA benchmarks, and find that we outperform all previous methods by a significant margin (4-16% absolute accuracy), while also providing qualitative benefits such as interpretability and modularity.
Jonas Kornprobst, Alexander Paulus, Josef Knapp, Thomas F. Eibert
Phase retrieval is in general a non-convex and non-linear task and the
corresponding algorithms struggle with the issue of local minima. We consider
the case where the measurement samples within typically very small and
disconnected subsets are coherently linked to each other - which is a
reasonable assumption for our objective of antenna measurements. Two classes of
measurement setups are discussed which can provide this kind of extra
information: multi-probe systems and holographic measurements with multiple
reference signals. We propose several formulations of the corresponding phase
retrieval problem. The simplest of these formulations poses a linear system of
equations similar to an eigenvalue problem where a unique non-trivial
null-space vector needs to be found. Accurate phase reconstruction for
partially coherent observations is, thus, possible by a reliable solution
process and with judgment of the solution quality. Under ideal, noise-free
conditions, the required sampling density is less than two times the number of
unknowns. Noise and other observation errors increase this value slightly.
Simulations for Gaussian random matrices and for antenna measurement scenarios
demonstrate that reliable phase reconstruction is possible with the presented
approach.
Authors' comments: 12 pages, 14 figures
Joanna K. Barstow, Quentin Changeat, Ryan Garland, Michael R. Line, Marco Rocchetto, Ingo P. Waldmann
Over the last several years, spectroscopic observations of transiting
exoplanets have begun to uncover information about their atmospheres, including
atmospheric composition and indications of the presence of clouds and hazes.
Spectral retrieval is the leading technique for interpretation of transmission
spectra and is employed by several teams using a variety of forward models and
parameter estimation algorithms. However, different model suites have mostly
been used in isolation and so it is unknown whether the results from each are
comparable. As we approach the launch of the James Webb Space Telescope we
anticipate advances in wavelength coverage, precision, and resolution of
transit spectroscopic data, so it is important that the tools that will be used
to interpret these information-rich spectra are validated. To this end, we
present an inter-model comparison of three retrieval suites: TauREx, NEMESIS
and CHIMERA. We demonstrate that the forward model spectra are in good
agreement (residual deviations on the order of 20 - 40 ppm), and discuss the
results of cross retrievals between the three tools. Generally, the constraints
from the cross retrievals are consistent with each other and with input values
to within 1 sigma. However, for high precision scenarios with error envelopes of
order 30 ppm, subtle differences in the simulated spectra result in
discrepancies between the different retrieval suites, and inaccuracies in
retrieved values of several sigma. This can be considered analogous to
substantial systematic/astrophysical noise in a real observation, or
errors/omissions in a forward model such as molecular linelist incompleteness
or missing absorbers.
Authors' comments: 25 pages, 21 figures. Accepted in MNRAS
Kaitao Zhang, Chenyan Xiong, Zhenghao Liu, Zhiyuan Liu
This paper democratizes neural information retrieval to scenarios where large
scale relevance training signals are not available. We revisit the classic IR
intuition that anchor-document relations approximate query-document relevance
and propose a reinforcement weak supervision selection method, ReInfoSelect,
which learns to select anchor-document pairs that best weakly supervise the
neural ranker (action), using the ranking performance on a handful of relevance
labels as the reward. Iteratively, for a batch of anchor-document pairs,
ReInfoSelect back propagates the gradients through the neural ranker, gathers
its NDCG reward, and optimizes the data selection network using policy
gradients, until the neural ranker's performance peaks on target relevance
metrics (convergence). In our experiments on three TREC benchmarks, neural
rankers trained by ReInfoSelect, with only publicly available anchor data,
significantly outperform feature-based learning to rank methods and match the
effectiveness of neural rankers trained with private commercial search logs.
Our analyses show that ReInfoSelect effectively selects weak supervision
signals based on the stage of the neural ranker training, and intuitively picks
anchor-document pairs similar to query-document pairs.
Authors' comments: Accepted by WWW 2020
Ori Shmuel, Asaf Cohen
Consider the problem of Private Information Retrieval (PIR), where a user wishes to retrieve a single message from $N$ non-communicating and non-colluding databases (servers). All servers store the same set of $M$ messages and they respond to the user through a block fading Gaussian Multiple Access Channel (MAC). The goal in this setting is to keep the index of the required message private from the servers while minimizing the overall communication overhead. This work provides joint privacy and channel coding retrieval schemes for the Gaussian MAC with and without fading. The schemes exploit the linearity of the channel while using the Compute and Forward (CF) coding scheme. Consequently, single-user encoding and decoding are performed to retrieve the private message. In the case of a channel without fading, the achievable retrieval rate is shown to outperform a separation-based scheme, in which the retrieval and the channel coding are designed separately. Moreover, this rate is asymptotically optimal as the SNR grows, and is within a constant gap of $2$ bits per channel use from the channel capacity without privacy constraints, for all SNR values. When the channel suffers from fading, the asymmetry between the servers' channels forces a more complicated solution, which involves a hard optimization problem. Nevertheless, we provide a coding scheme and lower bounds on the expected achievable retrieval rate which are shown to have the same scaling laws as the channel capacity, both in the number of servers and the SNR.
Yongcheng Ding, José D. Martín-Guerrero, Mikel Sanz, Rafael Magdalena-Benedicto, Xi Chen, Enrique Solano
Active learning is a machine learning method aiming at optimal design for model training. At variance with supervised learning, which labels all samples, active learning provides an improved model by labeling samples with maximal uncertainty according to the estimation model. Here, we propose the use of active learning for efficient quantum information retrieval, which is a crucial task in the design of quantum experiments. Meanwhile, when dealing with large data output, we employ active learning for the sake of classification with minimal cost in fidelity loss. Indeed, labeling only 5% of the samples, we achieve almost 90% rate estimation. The introduction of active learning methods in the data analysis of quantum experiments will enhance applications of quantum technologies.
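The selection rule mentioned in the abstract above, labeling the samples on which the current model is most uncertain, can be sketched for a binary classifier. This is a generic uncertainty-sampling illustration, not the authors' procedure: the distance-to-0.5 uncertainty measure, the function name `most_uncertain`, and the toy probabilities are assumptions.

```python
def most_uncertain(probabilities, budget):
    """Return indices of the `budget` samples whose predicted probability
    of the positive class is closest to 0.5 (assumed uncertainty measure)."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:budget]


# Predicted positive-class probabilities for five unlabeled samples (toy data).
predicted = [0.95, 0.52, 0.10, 0.45, 0.70]
print(most_uncertain(predicted, budget=2))  # [1, 3]
```

In an active learning loop, the selected samples would be labeled, added to the training set, and the model refitted before the next selection round.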
Tobias Uelwer, Alexander Oberstraß, Stefan Harmeling
In this paper, we propose the application of conditional generative
adversarial networks to solve various phase retrieval problems. We show that
including knowledge of the measurement process at training time leads to an
optimization at test time that is more robust to initialization than existing
approaches involving generative models. In addition, conditioning the generator
network on the measurements enables us to achieve much more detailed results.
We empirically demonstrate that these advantages provide meaningful solutions
to the Fourier and the compressive phase retrieval problem and that our method
outperforms well-established projection-based methods as well as existing
methods that are based on neural networks. Like other deep learning methods,
our approach is very robust to noise and can therefore be very useful for
real-world applications.
Authors' comments: Accepted at the 25th International Conference on Pattern Recognition
2020 (ICPR)