Joanna K. Barstow, Quentin Changeat, Katy L. Chubb, Patricio E. Cubillos, Billy Edwards, Ryan J. MacDonald, Michiel Min, Ingo P. Waldmann
The Ariel mission, due to launch in 2029, will obtain spectroscopic
information for 1000 exoplanets, providing an unprecedented opportunity for
comparative exoplanetology. Retrieval codes - parametric atmospheric models
coupled with an inversion algorithm - represent the tool of choice for
interpreting Ariel data. Ensuring that reliable and consistent results can be
produced by these tools is a critical preparatory step for the mission. Here,
we present the results of a retrieval challenge. We use five different
exoplanet retrieval codes to analyse the same synthetic datasets, and test a)
the ability of each to recover the correct input solution and b) the
consistency of the results. We find that generally there is very good agreement
between the five codes, and in the majority of cases the correct solutions are
recovered. This demonstrates the reproducibility of retrievals for transit
spectra of exoplanets, even when the codes have not previously been benchmarked
against each other.
Authors' comments: 28 pages, 14 figures. Accepted in Experimental Astronomy (2022)
Ian Dobbs-Dixon, Jasmina Blecic
We present a novel physically motivated, parametrized temperature model for
phase-curve retrieval, able to self-consistently assess the variation in
thermal structure in multidimensions. To develop this approach, we drew
motivation from both full three-dimensional general circulation models and
analytic formulations, accounting for the dominant dynamical feature of tidally
locked planets, the planetary jet. Our formulation shows notable flexibility.
It can generate planetary jets of various characteristics and redistribution
efficiencies seen in the literature, including both standard eastward and
unusual westward offset hotspots, as well as more exotic configurations for
potential future observations. In our modeling scheme we utilize a tractable
set of parameters efficient enough to enable future Bayesian analysis and, in
addition to the resolved temperature structure, we return physical insights not
yet derived from retrievals: the amplitude and the phase offset, and the
location and the extent of the equatorial jet.
Authors' comments: 10 pages, 12 figures
Alexander Long, Wei Yin, Thalaiyasingam Ajanthan, Vu Nguyen, Pulak Purkait, Ravi Garg, Alan Blair, Chunhua Shen et al.
We introduce Retrieval Augmented Classification (RAC), a generic approach to augmenting standard image classification pipelines with an explicit retrieval module. RAC consists of a standard base image encoder fused with a parallel retrieval branch that queries a non-parametric external memory of pre-encoded images and associated text snippets. We apply RAC to the problem of long-tail classification and demonstrate a significant improvement over previous state-of-the-art on Places365-LT and iNaturalist-2018 (14.5% and 6.7% respectively), despite using only the training datasets themselves as the external information source. We demonstrate that RAC's retrieval module, without prompting, learns a high level of accuracy on tail classes. This, in turn, frees the base encoder to focus on common classes, and improve its performance thereon. RAC represents an alternative approach to utilizing large, pretrained models without requiring fine-tuning, as well as a first step towards more effectively making use of external memory within common computer vision architectures.
Zhichao Geng, Hang Yan, Zhangyue Yin, Chenxin An, Xipeng Qiu
Chinese NER is a difficult undertaking due to the ambiguity of Chinese characters and the absence of word boundaries. Previous work on Chinese NER focuses on lexicon-based methods to introduce boundary information and reduce out-of-vocabulary (OOV) cases during prediction. However, it is expensive to obtain and dynamically maintain high-quality lexicons in specific domains, which motivates us to utilize more general knowledge resources, e.g., search engines. In this paper, we propose TURNER: The Uncertainty-based Retrieval framework for Chinese NER. The idea behind TURNER is to imitate human behavior: we frequently retrieve auxiliary knowledge as assistance when encountering an unknown or uncertain entity. To improve the efficiency and effectiveness of retrieval, we first propose two types of uncertainty sampling methods for selecting the most ambiguous entity-level uncertain components of the input text. Then, the Knowledge Fusion Model re-predicts the uncertain samples by combining retrieved knowledge. Experiments on four benchmark datasets demonstrate TURNER's effectiveness. TURNER outperforms existing lexicon-based approaches and achieves the new SOTA.
Yang Shi, Young-joo Chung
Cross-modal retrieval aims to search for data with similar semantic meanings
across different content modalities. However, cross-modal retrieval requires
huge amounts of storage and retrieval time since it needs to process data in
multiple modalities. Existing works focused on learning single-source compact
features such as binary hash codes that preserve similarities between different
modalities. In this work, we propose a jointly learned deep hashing and
quantization network (HQ) for cross-modal retrieval. We simultaneously learn
binary hash codes and quantization codes to preserve semantic information in
multiple modalities by an end-to-end deep learning architecture. At the
retrieval step, binary hashing is used to retrieve a subset of items from the
search space, then quantization is used to re-rank the retrieved items. We
theoretically and empirically show that this two-stage retrieval approach
provides faster retrieval results while preserving accuracy. Experimental
results on the NUS-WIDE, MIR-Flickr, and Amazon datasets demonstrate that HQ
achieves boosts of more than 7% in precision compared to supervised neural
network-based compact coding models.
Authors' comments: Accepted at BMVC 2021
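The two-stage retrieval described in the abstract above (binary hashing to shortlist candidates, then quantization to re-rank them) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy hash codes, the codebook, and the function names are hypothetical, and the learned HQ network itself is not reproduced.

```python
from typing import List, Sequence


def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary codes stored as integers."""
    return bin(a ^ b).count("1")


def sq_dist(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def two_stage_search(
    query_hash: int,
    query_vec: Sequence[float],
    db_hashes: Sequence[int],
    db_codes: Sequence[int],
    codebook: Sequence[Sequence[float]],
    shortlist_size: int,
    top_k: int,
) -> List[int]:
    """Return item indices: Hamming shortlist first, quantized re-ranking second."""
    # Stage 1: cheap coarse filter using Hamming distance on binary codes.
    shortlist = sorted(
        range(len(db_hashes)),
        key=lambda i: (hamming(query_hash, db_hashes[i]), i),
    )[:shortlist_size]
    # Stage 2: re-rank only the shortlist by distance to each item's
    # quantized reconstruction (its assigned codeword).
    reranked = sorted(
        shortlist,
        key=lambda i: (sq_dist(query_vec, codebook[db_codes[i]]), i),
    )
    return reranked[:top_k]


# Hypothetical toy database: 4 items, 4-bit hashes, 3 codewords in 2-D.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
db_hashes = [0b0000, 0b0001, 0b1111, 0b0011]
db_codes = [0, 1, 1, 2]

result = two_stage_search(0b0001, [0.9, 1.1], db_hashes, db_codes, codebook, 3, 2)
print(result)  # [1, 3]
```

In this toy run, item 2 shares a codeword with the best match but is never re-ranked because its hash is far from the query. That is the speed-versus-recall trade-off a two-stage scheme accepts.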
Soo Min Kwon, Xin Li, Anand D. Sarwate
We study the low-rank phase retrieval problem, where the objective is to
recover a sequence of signals (typically images) given the magnitude of linear
measurements of those signals. Existing solutions involve recovering a matrix
constructed by vectorizing and stacking each image. These algorithms model this
matrix to be low-rank and leverage the low-rank property to decrease the sample
complexity required for accurate recovery. However, when the number of
available measurements is more limited, these low-rank matrix models can often
fail. We propose an algorithm called Tucker-Structured Phase Retrieval (TSPR)
that models the sequence of images as a tensor, rather than a matrix, and
factorizes it using the Tucker decomposition. This factorization reduces the number
of parameters that need to be estimated, allowing for a more accurate
reconstruction in the under-sampled regime. Interestingly, we observe that this
structure also has improved performance in the over-determined setting when the
Tucker ranks are chosen appropriately. We demonstrate the effectiveness of our
approach on real video datasets under several different measurement models.
Authors' comments: A shorter version of this paper is in 2022 International Conference
on Acoustics, Speech, and Signal Processing (ICASSP)
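The claim that the Tucker factorization reduces the number of parameters to estimate can be made concrete with a small parameter count. The sketch below uses the standard counts for a rank-r factorization of the stacked (n1*n2) x q matrix and for a Tucker decomposition of an n1 x n2 x q tensor with ranks (r1, r2, r3). The image size, sequence length, and ranks are hypothetical values chosen only for illustration, not figures from the paper.

```python
def low_rank_matrix_params(n1: int, n2: int, q: int, r: int) -> int:
    """Entries in the two factors of a rank-r model of the (n1*n2) x q matrix."""
    return r * (n1 * n2 + q)


def tucker_params(n1: int, n2: int, q: int, r1: int, r2: int, r3: int) -> int:
    """Entries in the Tucker core plus the three factor matrices."""
    core = r1 * r2 * r3
    factors = n1 * r1 + n2 * r2 + q * r3
    return core + factors


# Hypothetical setting: 100 frames of 64 x 64 pixels.
matrix_count = low_rank_matrix_params(64, 64, 100, r=5)
tucker_count = tucker_params(64, 64, 100, r1=10, r2=10, r3=5)
print(matrix_count, tucker_count)  # 20980 2280
```

With these assumed ranks the Tucker model has roughly a ninth of the parameters, which is the kind of reduction that would help in the under-sampled regime.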
Pranav Kadam, Qingyang Zhou, Shan Liu, C. -C. Jay Kuo
An unsupervised point cloud object retrieval and pose estimation method,
called PCRP, is proposed in this work. It is assumed that there exists a
gallery point cloud set that contains point cloud objects with given pose
orientation information. PCRP attempts to register the unknown point cloud
object with those in the gallery set so as to achieve content-based object
retrieval and pose estimation jointly, where the point cloud registration task
is built upon an enhanced version of the unsupervised R-PointHop method.
Experiments on the ModelNet40 dataset demonstrate the superior performance of
PCRP in comparison with traditional and learning based methods.
Authors' comments: 8 pages, 3 figures
Licheng Yu, Jun Chen, Animesh Sinha, Mengjiao MJ Wang, Hugo Chen, Tamara L. Berg, Ning Zhang
We introduce CommerceMM - a multimodal model capable of providing a diverse
and granular understanding of commerce topics associated with the given piece of
content (image, text, image+text), and having the capability to generalize to a
wide range of tasks, including Multimodal Categorization, Image-Text Retrieval,
Query-to-Product Retrieval, Image-to-Product Retrieval, etc. We follow the
pre-training + fine-tuning training regime and present 5 effective pre-training
tasks on image-text pairs. To embrace more common and diverse commerce data
with text-to-multimodal, image-to-multimodal, and multimodal-to-multimodal
mapping, we propose another 9 novel cross-modal and cross-pair retrieval tasks,
called Omni-Retrieval pre-training. The pre-training is conducted in an
efficient manner with only two forward/backward updates for the combined 14
tasks. Extensive experiments and analysis show the effectiveness of each task.
When combining all pre-training tasks, our model achieves state-of-the-art
performance on 7 commerce-related downstream tasks after fine-tuning.
Additionally, we propose a novel approach of modality randomization to
dynamically adjust our model under different efficiency constraints.
Authors' comments: 10 pages, 7 figures. Commerce Multimodal Model towards Real
Applications at Facebook
Sungdong Kim, Gangwoo Kim
Conversational search (CS) needs a holistic understanding of conversational
inputs to retrieve relevant passages. In this paper, we demonstrate the
existence of a retrieval shortcut in CS, which causes models to retrieve
passages solely relying on partial history while disregarding the latest
question. With in-depth analysis, we first show that naively trained dense
retrievers heavily exploit the shortcut and hence perform poorly when asked to
answer history-independent questions. To build more robust models against
shortcut dependency, we explore various hard negative mining strategies.
Experimental results show that training with the model-based hard negatives
effectively mitigates the dependency on the shortcut, significantly improving
dense retrievers on recent CS benchmarks. In particular, our retriever
outperforms the previous state-of-the-art model by 11.0 in Recall@10 on QReCC.
Authors' comments: Accepted to EMNLP 2022 main conference
Maurits Bleeker, Maarten de Rijke
The triplet loss with semi-hard negatives has become the de facto choice for
image-caption retrieval (ICR) methods that are optimized from scratch. Recent
progress in metric learning has given rise to new loss functions that
outperform the triplet loss on tasks such as image retrieval and representation
learning. We ask whether these findings generalize to the setting of ICR by
comparing three loss functions on two ICR methods. We answer this question
negatively: the triplet loss with semi-hard negative mining still outperforms
newly introduced loss functions from metric learning on the ICR task. To gain a
better understanding of these outcomes, we introduce an analysis method to
compare loss functions by counting how many samples contribute to the gradient
w.r.t. the query representation during optimization. We find that loss
functions that result in lower evaluation scores on the ICR task, in general,
take too many (non-informative) samples into account when computing a gradient
w.r.t. the query representation, which results in sub-optimal performance. The
triplet loss with semi-hard negatives is shown to outperform the other loss
functions, as it only takes one (hard) negative into account when computing the
gradient.
Authors' comments: Accepted to ECIR 2022 Reproducibility track
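The counting idea in the abstract above (how many samples contribute to the gradient w.r.t. the query) can be illustrated with a hinge-based triplet loss on similarity scores. This is a simplified sketch, not the authors' analysis code: the similarity values and margin are made up, and mining only the hardest negative is used here as an assumed stand-in for the mining strategy.

```python
from typing import Sequence


def active_negatives_sum(pos_sim: float, neg_sims: Sequence[float], margin: float) -> int:
    """Count negatives with a non-zero hinge term when all triplets are summed.

    Each negative n adds max(0, margin - pos_sim + sim(q, n)) to the loss,
    so it influences the gradient only when that term is positive.
    """
    return sum(1 for s in neg_sims if margin - pos_sim + s > 0)


def active_negatives_hardest(pos_sim: float, neg_sims: Sequence[float], margin: float) -> int:
    """Count contributing negatives when only the hardest negative is mined."""
    hardest = max(neg_sims)
    return 1 if margin - pos_sim + hardest > 0 else 0


# Hypothetical similarities between one query and its candidates.
pos_sim = 0.6
neg_sims = [0.55, 0.5, 0.1, 0.45]
margin = 0.2

n_sum = active_negatives_sum(pos_sim, neg_sims, margin)
n_hard = active_negatives_hardest(pos_sim, neg_sims, margin)
print(n_sum, n_hard)  # 3 1
```

Here the summed variant lets three negatives shape the gradient, while the mined variant uses one, which matches the contrast the abstract draws.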
Anatoli S. Kheifets, Rickson Wielian, Igor A. Ivanov, Anna Li Wang, Agostino Marinelli, James P. Cryan
We demonstrate an accurate phase retrieval of XUV atomic ionization by
streaking the photoelectron in a circularly polarized IR laser field. The
streaking phase can then be converted to the atomic time delay containing the
Wigner and continuum-continuum components. Our demonstration is based on a
numerical solution of the time-dependent Schrödinger equation. We test this
technique using the hydrogen atom ionized by an isolated attosecond XUV pulse
across a wide range of photon energies. In parallel, we run a series of RABBITT
simulations and demonstrate equivalence of the phase and timing information
provided by the two methods. This validates the proposed technique and makes it
a useful tool that can be applied to a broad range of atomic and molecular
targets exposed to XUV radiation from novel free-electron laser sources.
Authors' comments: 7 pages, 5 figures
Luiz Bonifacio, Hugo Abonizio, Marzieh Fadaee, Rodrigo Nogueira
The information retrieval community has recently witnessed a revolution due to large pretrained transformer models. Another key ingredient for this revolution was the MS MARCO dataset, whose scale and diversity have enabled zero-shot transfer learning to various tasks. However, not all IR tasks and domains can benefit from one single dataset equally. Extensive research in various NLP tasks has shown that using domain-specific training data, as opposed to a general-purpose one, improves the performance of neural models. In this work, we harness the few-shot capabilities of large pretrained language models as synthetic data generators for IR tasks. We show that models finetuned solely on our unsupervised dataset outperform strong baselines such as BM25 as well as recently proposed self-supervised dense retrieval methods. Furthermore, retrievers finetuned on both supervised and our synthetic data achieve better zero-shot transfer than models finetuned only on supervised data. Code, models, and data are available at https://github.com/zetaalphavector/inpars .
Eva Agapaki, Ioannis Brilakis
This paper devises, implements and benchmarks a novel shape retrieval method that can accurately match individual labelled point clusters (instances) of existing industrial facilities with their respective CAD models. It employs a combination of image and point cloud deep learning networks to classify and match instances to their geometrically similar CAD model. It extends our previous research on geometric digital twin generation from point cloud data, which currently is a tedious, manual process. Experiments with our joint network reveal that it can reliably retrieve CAD models at 85.2% accuracy. The proposed research is a fundamental framework to enable the geometric Digital Twin (gDT) pipeline and incorporate the real geometric configuration into the Digital Twin.
Jinpeng Wang, Bin Chen, Dongliang Liao, Ziyun Zeng, Gongfu Li, Shu-Tao Xia, Jin Xu
With the recent boom of video-based social platforms (e.g., YouTube and
TikTok), video retrieval using sentence queries has become an important demand
and attracts increasing research attention. Despite their decent performance,
existing text-video retrieval models in vision and language communities are
impractical for large-scale Web search because they adopt brute-force search
based on high-dimensional embeddings. To improve efficiency, Web search engines
widely apply vector compression libraries (e.g., FAISS) to post-process the
learned embeddings. Unfortunately, separating compression from feature encoding
degrades the robustness of representations and incurs performance decay. To
pursue a better balance between performance and efficiency, we propose the
first quantized representation learning method for cross-view video retrieval,
namely Hybrid Contrastive Quantization (HCQ). Specifically, HCQ learns both
coarse-grained and fine-grained quantizations with transformers, which provide
complementary understandings for texts and videos and preserve comprehensive
semantic information. By performing Asymmetric-Quantized Contrastive Learning
(AQ-CL) across views, HCQ aligns texts and videos at coarse-grained and
multiple fine-grained levels. This hybrid-grained learning strategy serves as
strong supervision on the cross-view video quantization model, where
contrastive learning at different levels can be mutually promoted. Extensive
experiments on three Web video benchmark datasets demonstrate that HCQ achieves
competitive performance with state-of-the-art non-compressed retrieval methods
while showing high efficiency in storage and computation. Code and
configurations are available at https://github.com/gimpong/WWW22-HCQ.
Authors' comments: Accepted to The Web Conference 2022 (WWW'22). 11 pages, 5 tables, 6
figures
Ashish Rana, Deepanshu Khanna, Tirthankar Ghosal, Muskaan Singh, Harpreet Singh, Prashant Singh Rana
Exponential growth in digital information outlets and the race to publish has
made scientific misinformation more prevalent than ever. However, the task to
fact-verify a given scientific claim is not straightforward even for
researchers. Scientific claim verification requires in-depth knowledge and
great labor from domain experts to substantiate supporting and refuting
evidence from credible scientific sources. The SciFact dataset and
corresponding task provide a benchmarking leaderboard to the community to
develop automatic scientific claim verification systems via extracting and
assimilating relevant evidence rationales from source abstracts. In this work,
we propose a modular approach that sequentially carries out binary
classification for every prediction subtask as in the SciFact leaderboard. Our
simple classifier-based approach uses reduced abstract representations to
retrieve relevant abstracts. These are further used to train the relevant
rationale-selection model. Finally, we carry out two-step stance predictions
that first differentiate non-relevant rationales and then identify supporting
or refuting rationales for a given claim. Experimentally, our system RerrFact
with no fine-tuning, simple design, and a fraction of model parameters fares
competitively on the leaderboard against large-scale, modular, and joint
modeling approaches. We make our codebase available at
https://github.com/ashishrana160796/RerrFact.
Authors' comments: Accepted in the AAAI-22 Workshop on Scientific Document Understanding
at the Thirty-Sixth AAAI Conference on Artificial Intelligence (SDU@AAAI-22)
Akshat Shrivastava, Shrey Desai, Anchit Gupta, Ali Elkahky, Aleksandr Livshits, Alexander Zotov, Ahmed Aly
Task-oriented semantic parsing models have achieved strong results in recent years, but unfortunately do not strike an appealing balance between model size, runtime latency, and cross-domain generalizability. We tackle this problem by introducing scenario-based semantic parsing: a variant of the original task which first requires disambiguating an utterance's "scenario" (an intent-slot template with variable leaf spans) before generating its frame, complete with ontology and utterance tokens. This formulation enables us to isolate coarse-grained and fine-grained aspects of the task, each of which we solve with off-the-shelf neural modules, also optimizing for the axes outlined above. Concretely, we create a Retrieve-and-Fill (RAF) architecture comprised of (1) a retrieval module which ranks the best scenario given an utterance and (2) a filling module which imputes spans into the scenario to create the frame. Our model is modular, differentiable, interpretable, and allows us to garner extra supervision from scenarios. RAF achieves strong results in high-resource, low-resource, and multilingual settings, outperforming recent approaches by wide margins despite using base pre-trained encoders, small sequence lengths, and parallel decoding.
Jishnu Ray Chowdhury, Yong Zhuang, Shuyi Wang
Paraphrase generation is a fundamental and long-standing task in natural
language processing. In this paper, we concentrate on two contributions to the
task: (1) we propose Retrieval Augmented Prompt Tuning (RAPT) as a
parameter-efficient method to adapt large pre-trained language models for
paraphrase generation; (2) we propose Novelty Conditioned RAPT (NC-RAPT) as a
simple model-agnostic method of using specialized prompt tokens for controlled
paraphrase generation with varying levels of lexical novelty. By conducting
extensive experiments on four datasets, we demonstrate the effectiveness of the
proposed approaches for retaining the semantic content of the original text
while inducing lexical novelty in the generation.
Authors' comments: Accepted by AAAI 2022 (Oral)
Uri Alon, Frank F. Xu, Junxian He, Sudipta Sengupta, Dan Roth, Graham Neubig
Retrieval-based language models (R-LM) model the probability of natural
language text by combining a standard language model (LM) with examples
retrieved from an external datastore at test time. While effective, a major
bottleneck of using these models in practice is the computationally costly
datastore search, which can be performed as frequently as every time step. In
this paper, we present RetoMaton - retrieval automaton - which approximates the
datastore search, based on (1) saving pointers between consecutive datastore
entries, and (2) clustering of entries into "states". This effectively results
in a weighted finite automaton built on top of the datastore, instead of
representing the datastore as a flat list. The creation of the automaton is
unsupervised, and a RetoMaton can be constructed from any text collection:
either the original training corpus or from another domain. Traversing this
automaton at inference time, in parallel to the LM inference, reduces its
perplexity by up to 1.85, or alternatively saves up to 83% of the nearest
neighbor searches over $k$NN-LM (Khandelwal et al., 2020) without hurting
perplexity. Our code and trained models are available at
https://github.com/neulab/retomaton .
Authors' comments: Accepted to ICML'2022. Code and models are available at
https://github.com/neulab/retomaton
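As a rough illustration of the two ideas in the abstract above, the Python sketch below shows (a) the kNN-LM style interpolation of a base LM distribution with a distribution built from retrieved neighbors, and (b) pointer following between consecutive datastore entries, which is what lets a search be skipped. It is a minimal sketch under stated assumptions, not the released implementation: the function names, the toy datastore, and the interpolation weight are hypothetical, and the clustering of entries into states is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple


def knn_distribution(neighbors: Sequence[Tuple[float, str]]) -> Dict[str, float]:
    """Softmax over negative distances, aggregated per stored next token."""
    weights = [math.exp(-dist) for dist, _ in neighbors]
    total = sum(weights)
    probs: Dict[str, float] = {}
    for weight, (_, token) in zip(weights, neighbors):
        probs[token] = probs.get(token, 0.0) + weight / total
    return probs


def interpolate(p_lm: Dict[str, float], p_knn: Dict[str, float], lam: float) -> Dict[str, float]:
    """Mix the two distributions: lam * p_knn + (1 - lam) * p_lm."""
    vocab = set(p_lm) | set(p_knn)
    return {t: lam * p_knn.get(t, 0.0) + (1 - lam) * p_lm.get(t, 0.0) for t in vocab}


def follow_pointers(active: Sequence[int], values: Sequence[str], emitted: str) -> List[int]:
    """Advance each active entry whose stored next token matches the emitted one.

    Entry i points to entry i + 1 (the consecutive datastore entry), so the
    surviving pointers give candidates for the next step without a new search.
    """
    return [i + 1 for i in active if values[i] == emitted and i + 1 < len(values)]


# Hypothetical toy data: two retrieved neighbors at equal distance.
p_knn = knn_distribution([(0.0, "cat"), (0.0, "dog")])
mixed = interpolate({"cat": 0.2, "dog": 0.8}, p_knn, lam=0.5)

# Datastore values are the next tokens of consecutive training positions.
values = ["the", "cat", "sat", "the", "cat", "ran"]
next_active = follow_pointers([1, 4], values, "cat")
print(next_active)  # [2, 5]
```

If no active entry matches the emitted token, the list comes back empty, and a full nearest-neighbor search would be needed again at that step.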
Kanishak Vaidya, B Sundar Rajan
We consider the problem of multi-access cache-aided multi-user Private
Information Retrieval (MuPIR). In this problem, several files are replicated
across multiple servers. There are $K$ users and $C$ cache nodes. Each user can
access $L$ cache nodes, and every cache node can be accessed by several users.
Each user wants to retrieve one file from the servers, but the users do not
want the servers to know their demands. Before the users decide their
respective demands, servers will fill the cache nodes from the content of the
files. Users will then request their desired files from the servers. Servers
will perform coded transmissions, and all the users should get their desired
files from these transmissions and the content placed in the caches they are
accessing. It is required that any individual server should not get any
information about the demands of the users. This problem is an extension of the
dedicated cache-aided MuPIR problem, which itself generalizes the widely
studied single user PIR setup. In this paper, we propose a MuPIR scheme which
utilizes a multi-access setup of the coded caching problem. The presented
scheme is order optimal when there are $K=\binom{C}{L}$ users. We also characterize
the rate of the scheme for the special case of cyclic wraparound multi-access
setup, where $C=K$ and each user accesses $L$ consecutive cache nodes in cyclic
wraparound fashion.
Authors' comments: 15 pages, 11 figures, 2 tables. Fixed minor errors in the previous
version and improved the presentation
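The two user-to-cache access structures mentioned in the abstract above can be enumerated in a few lines of Python. This sketch assumes that in the $K=\binom{C}{L}$ setting each user is connected to a distinct $L$-subset of the $C$ cache nodes. The abstract does not spell this out, so treat it as an assumption. The function names are hypothetical, and no part of the PIR scheme itself is implemented.

```python
from itertools import combinations
from math import comb
from typing import List, Tuple


def combinatorial_access(C: int, L: int) -> List[Tuple[int, ...]]:
    """One user per distinct L-subset of the C cache nodes (assumed topology)."""
    return list(combinations(range(C), L))


def cyclic_access(C: int, L: int) -> List[Tuple[int, ...]]:
    """C = K users; user k accesses nodes k, k+1, ..., k+L-1 modulo C."""
    return [tuple((k + j) % C for j in range(L)) for k in range(C)]


subsets = combinatorial_access(4, 2)
print(len(subsets), comb(4, 2))  # 6 6
print(cyclic_access(4, 2))       # [(0, 1), (1, 2), (2, 3), (3, 0)]
```

In both layouts every cache node is shared by several users, which is the multi-access property the problem statement relies on.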
Robert Beinert, Michael Quellmalz
In optical diffraction tomography (ODT), the three-dimensional scattering potential of a microscopic object rotating around its center is recovered by a series of illuminations with coherent light. Reconstruction algorithms such as the filtered backpropagation require knowledge of the complex-valued wave at the measurement plane, whereas often only intensities, i.e., phaseless measurements, are available in practice. We propose a new reconstruction approach for ODT with unknown phase information based on three key ingredients. First, the light propagation is modeled using Born's approximation enabling us to use the Fourier diffraction theorem. Second, we stabilize the inversion of the non-uniform discrete Fourier transform via total variation regularization utilizing a primal-dual iteration, which also yields a novel numerical inversion formula for ODT with known phase. The third ingredient is a hybrid input-output scheme. We achieved convincing numerical results, which indicate that ODT with phaseless data is possible. The so-obtained 2D and 3D reconstructions are even comparable to the ones with known phase.