Ayan Kumar Bhunia, Subhadeep Koley, Abdullah Faiz Ur Rahman Khilji, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song
Sketching enables many exciting applications, notably, image retrieval. The
fear-to-sketch problem (i.e., "I can't sketch") has however proven to be fatal
for its widespread adoption. This paper tackles this "fear" head on, and for
the first time, proposes an auxiliary module for existing retrieval models that
predominantly lets the users sketch without having to worry. We first conducted
a pilot study that revealed the secret lies in the presence of noisy strokes,
rather than in the "I can't sketch" problem itself. We consequently design a stroke subset
selector that detects noisy strokes, leaving only those which make a positive
contribution towards successful retrieval. Our Reinforcement Learning based
formulation quantifies the importance of each stroke present in a given subset,
based on the extent to which that stroke contributes to retrieval. When
combined with pre-trained retrieval models as a pre-processing module, we
achieve a significant gain of 8%-10% over standard baselines and in turn report
new state-of-the-art performance. Last but not least, we demonstrate that the
selector, once trained, can also be used in a plug-and-play manner to empower
various sketch applications in ways that were not previously possible.
Authors' comments: IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2022
Code: https://github.com/AyanKumarBhunia/Stroke_Subset_Selector-for-FGSBIR
Zhirong Xu, Shiyang Wen, Junshan Wang, Guojun Liu, Liang Wang, Zhi Yang, Lei Ding, Yan Zhang et al.
Graph embedding based retrieval has become one of the most popular techniques
in the information retrieval community and search engine industry. The
classical paradigm mainly relies on the flat Euclidean geometry. In recent
years, hyperbolic (negative curvature) and spherical (positive curvature)
representation methods have shown their superiority to capture hierarchical and
cyclic data structures respectively. However, in industrial scenarios such as
e-commerce sponsored search platforms, the large-scale heterogeneous
query-item-advertisement interaction graphs often have multiple structures
coexisting. Existing methods either consider only a single geometric space or
combine several spaces manually, and thus cannot flexibly model the complexity
and heterogeneity of real scenarios. To tackle this
challenge, we present a web-scale Adaptive Mixed-Curvature ADvertisement
retrieval system (AMCAD) to automatically capture the complex and heterogeneous
graph structures in non-Euclidean spaces. Specifically, entities are
represented in adaptive mixed-curvature spaces, where the types and curvatures
of the subspaces are trained to be optimal combinations. Besides, an attentive
edge-wise space projector is designed to model the similarities between
heterogeneous nodes according to local graph structures and the relation types.
Moreover, to deploy AMCAD in Taobao, one of the largest e-commerce platforms
with hundreds of millions of users, we design an efficient two-layer online
retrieval framework for the task of graph based advertisement retrieval.
Extensive evaluations on real-world datasets and A/B tests on online traffic
are conducted to illustrate the effectiveness of the proposed system.
Authors' comments: To appear in ICDE 2022
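As a rough illustration of what a mixed-curvature representation involves, the sketch below computes a distance in a product of hyperbolic, spherical, and Euclidean subspaces. It is the standard product-manifold construction and not AMCAD's implementation: the function names, the Poincare-ball parameterization, and the fixed curvature values are assumptions made here, and the adaptive curvature training and attentive edge-wise projector described in the abstract are left out.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def euclidean_distance(x, y):
    """Flat (zero-curvature) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def hyperbolic_distance(x, y, c=1.0):
    """Geodesic distance in a Poincare ball of curvature -c (needs c * ||x||^2 < 1)."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    denom = (1.0 - c * dot(x, x)) * (1.0 - c * dot(y, y))
    return math.acosh(1.0 + 2.0 * c * sq_diff / denom) / math.sqrt(c)


def spherical_distance(x, y, c=1.0):
    """Great-circle distance on a sphere of curvature c (needs ||x||^2 = 1 / c)."""
    cos_angle = max(-1.0, min(1.0, c * dot(x, y)))  # clamp against rounding
    return math.acos(cos_angle) / math.sqrt(c)


def product_distance(x_parts, y_parts, kinds, curvatures):
    """Combine per-subspace distances: sqrt of the sum of squared distances.

    kinds holds one of "H" (hyperbolic), "S" (spherical), "E" (Euclidean)
    per subspace; these labels are hypothetical, chosen for this sketch.
    """
    total = 0.0
    for xp, yp, kind, c in zip(x_parts, y_parts, kinds, curvatures):
        if kind == "H":
            d = hyperbolic_distance(xp, yp, c)
        elif kind == "S":
            d = spherical_distance(xp, yp, c)
        else:
            d = euclidean_distance(xp, yp)
        total += d * d
    return math.sqrt(total)


# Two entities, each embedded as (hyperbolic part, spherical part, Euclidean part).
entity_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
entity_b = [[0.5, 0.0], [0.0, 1.0], [3.0, 4.0]]
distance = product_distance(entity_a, entity_b, ["H", "S", "E"], [1.0, 1.0, 0.0])
```

In a learned system the curvature values would be trainable parameters rather than constants, which is the part this sketch does not attempt.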
Min Cao, Shiping Li, Juntao Li, Liqiang Nie, Min Zhang
In the past few years, cross-modal image-text retrieval (ITR) has experienced
increased interest in the research community due to its excellent research
value and broad real-world application. It is designed for the scenarios where
the queries are from one modality and the retrieval galleries from another
modality. This paper presents a comprehensive and up-to-date survey on the ITR
approaches from four perspectives. By dissecting an ITR system into two
processes: feature extraction and feature alignment, we summarize the recent
advances in ITR approaches from these two perspectives. On top of this, the
efficiency-focused study on the ITR system is introduced as the third
perspective. To keep pace with the times, we also provide a pioneering overview
of the cross-modal pre-training ITR approaches as the fourth perspective.
Finally, we outline the common benchmark datasets and evaluation metrics for ITR,
and conduct an accuracy comparison among the representative ITR approaches.
Some critical yet less studied issues are discussed at the end of the paper.
Authors' comments: Accepted by IJCAI'2022 survey track
Kithmini Herath, Udith Haputhanthri, Ramith Hettiarachchi, Hasindu Kariyawasam, Raja N. Ahmad, Azeem Ahmad, Balpreet S. Ahluwalia, Chamira U. S. Edussooriya et al.
Since the late 16th century, scientists have continuously innovated and developed new microscope types for various applications. Creating a new architecture from the ground up requires substantial scientific expertise and creativity, often spanning years or even decades. In this study, we propose an alternative approach called "Differentiable Microscopy," which introduces a top-down design paradigm for optical microscopes. Using all-optical phase retrieval as an illustrative example, we demonstrate the effectiveness of data-driven microscopy design through $\partial\mu$. Furthermore, we conduct comprehensive comparisons with competing methods, showcasing the consistent superiority of our learned designs across multiple datasets, including biological samples. To substantiate our ideas, we experimentally validate the functionality of one of the learned designs, providing a proof of concept. The proposed differentiable microscopy framework supplements the creative process of designing new optical systems and would perhaps lead to unconventional but better optical designs.
Tim Beyer, Angela Dai
CAD model retrieval to real-world scene observations has shown strong promise
as a basis for 3D perception of objects and a clean, lightweight mesh-based
scene representation; however, current approaches to retrieve CAD models to a
query scan rely on expensive manual annotations of 1:1 associations of CAD-scan
objects, which typically contain strong lower-level geometric differences. We
thus propose a new weakly-supervised approach to retrieve semantically and
structurally similar CAD models to a query 3D scanned scene without requiring
any CAD-scan associations, and only object detection information as oriented
bounding boxes. Our approach leverages a fully-differentiable top-$k$ retrieval
layer, enabling end-to-end training guided by geometric and perceptual
similarity of the top retrieved CAD models to the scan queries. We demonstrate
that our weakly-supervised approach can outperform fully-supervised retrieval
methods on challenging real-world ScanNet scans, and maintain robustness for
unseen class categories, achieving significantly improved performance over
fully-supervised state of the art in zero-shot CAD retrieval.
Authors' comments: Accompanying video at https://youtu.be/3bCUMxpscdQ
Matthias Hagen, Maik Fröbe, Artur Jurk, Martin Potthast
We introduce and study the task of clickbait spoiling: generating a short
text that satisfies the curiosity induced by a clickbait post. Clickbait links
to a web page and advertises its contents by arousing curiosity instead of
providing an informative summary. Our contributions are approaches to classify
the type of spoiler needed (i.e., a phrase or a passage), and to generate
appropriate spoilers. A large-scale evaluation and error analysis on a new
corpus of 5,000 manually spoiled clickbait posts -- the Webis Clickbait
Spoiling Corpus 2022 -- shows that our spoiler type classifier achieves an
accuracy of 80%, while the question answering model DeBERTa-large outperforms
all others in generating spoilers for both types.
Authors' comments: Accepted at ACL 2022
Alex Falcon, Giuseppe Serra, Oswald Lanz
Due to the amount of videos and related captions uploaded every hour, deep
learning-based solutions for cross-modal video retrieval are attracting more
and more attention. A typical approach consists in learning a joint text-video
embedding space, where the similarity of a video and its associated caption is
maximized, whereas a lower similarity is enforced with all the other captions,
called negatives. This approach assumes that only the video and caption pairs
in the dataset are valid, but different captions - positives - may also
describe its visual contents, hence some of them may be wrongly penalized. To
address this shortcoming, we propose the Relevance-Aware Negatives and
Positives mining (RANP) which, based on the semantics of the negatives,
improves their selection while also increasing the similarity of other valid
positives. We explore the influence of these techniques on two video-text
datasets: EPIC-Kitchens-100 and MSR-VTT. By using the proposed techniques, we
achieve considerable improvements in terms of nDCG and mAP, leading to
state-of-the-art results, e.g. +5.3% nDCG and +3.0% mAP on EPIC-Kitchens-100.
We share code and pretrained models at
https://github.com/aranciokov/ranp.
Authors' comments: Accepted at 21st International Conference on Image Analysis and
Processing (ICIAP 2021)
Arian Eamaz, Farhang Yeganegi, Mojtaba Soltanalian
The classical problem of phase retrieval has found a wide array of applications in optics, imaging and signal processing. In this paper, we consider the phase retrieval problem in a one-bit setting, where the signals are sampled using one-bit analog-to-digital converters (ADCs). A significant advantage of deploying one-bit ADCs in signal processing systems is their superior sampling rates as compared to their high-resolution counterparts. This leads to an enormous amount of one-bit samples gathered at the output of the ADC in a short period of time. We demonstrate that this advantage pays extraordinary dividends when it comes to convex phase retrieval formulations, namely that the often encountered matrix semi-definiteness constraints as well as rank constraints (that are computationally prohibitive to enforce), become redundant for phase retrieval in the face of a growing sample size. Several numerical results are presented to illustrate the effectiveness of the proposed methodologies.
Shunyu Zhang, Yaobo Liang, Ming Gong, Daxin Jiang, Nan Duan
Dense retrieval has achieved impressive advances in first-stage retrieval
from large-scale document collections. It is built on a bi-encoder
architecture that produces a single vector representation of each query and document.
However, a document can usually answer multiple potential queries from
different views. As a result, a single vector representation of a document is hard to
match against multi-view queries and suffers from a semantic mismatch problem. This
paper proposes a multi-view document representation learning framework, aiming
to produce multi-view embeddings to represent documents and enforce them to
align with different queries. First, we propose a simple yet effective method
of generating multiple embeddings through viewers. Second, to prevent
multi-view embeddings from collapsing to the same one, we further propose a
global-local loss with annealed temperature to encourage the multiple viewers
to better align with different potential queries. Experiments show our method
outperforms recent works and achieves state-of-the-art results.
Authors' comments: ACL 2022
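The semantic mismatch described above can be seen in a toy example. The sketch below scores a document by its best-matching view embedding and compares that with collapsing the views into one averaged vector. Taking the maximum over views is an assumption made here for illustration (the abstract does not state how views are aggregated), and the embeddings, document names, and function names are all hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def multi_view_score(query, doc_views):
    """Score a document by its best-matching view embedding (assumed aggregation)."""
    return max(dot(query, view) for view in doc_views)


def mean_vector(vectors):
    """Collapse several view embeddings into a single averaged vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Toy corpus: document "A" covers two distinct topics, "B" is moderately
# related to everything.
corpus = {
    "A": [[0.0, 1.0], [0.9, 0.1]],
    "B": [[0.6, 0.4], [0.5, 0.5]],
}
query = [1.0, 0.0]

# Rank documents from best to worst under each representation.
multi_view_ranking = sorted(
    corpus, key=lambda d: multi_view_score(query, corpus[d]), reverse=True
)
single_vector_ranking = sorted(
    corpus, key=lambda d: dot(query, mean_vector(corpus[d])), reverse=True
)
```

Here the averaged vector for "A" dilutes the view that matches the query, so the single-vector ranking prefers "B", while the multi-view ranking puts "A" first.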
Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, Jong C. Park
Dense retrieval models, which aim at retrieving the most relevant document
for an input query on a dense representation space, have gained considerable
attention for their remarkable success. Yet, dense models require a vast amount
of labeled training data for notable performance, whereas it is often
challenging to acquire query-document pairs annotated by humans. To tackle this
problem, we propose a simple but effective Document Augmentation for dense
Retrieval (DAR) framework, which augments the representations of documents with
their interpolation and perturbation. We validate the performance of DAR on
retrieval tasks with two benchmark datasets, showing that the proposed DAR
significantly outperforms relevant baselines on the dense retrieval of both the
labeled and unlabeled documents.
Authors' comments: ACL 2022
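A minimal sketch of the two augmentation ideas named above, applied directly to document embedding vectors. The abstract does not specify the exact operations, so this assumes a mixup-style convex combination for interpolation and a dropout-style mask for perturbation; the function names and parameters are hypothetical.

```python
import random


def interpolate(doc_a, doc_b, lam):
    """Mixup-style convex combination of two document embeddings.

    lam in [0, 1] is the weight on doc_a (an assumed parameterization).
    """
    return [lam * a + (1.0 - lam) * b for a, b in zip(doc_a, doc_b)]


def perturb(doc, drop_prob, rng):
    """Dropout-style perturbation: zero random coordinates, rescale the rest."""
    keep = 1.0 - drop_prob
    return [x / keep if rng.random() >= drop_prob else 0.0 for x in doc]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
positive_doc = [1.0, 2.0, 3.0, 4.0]
negative_doc = [4.0, 3.0, 2.0, 1.0]

mixed_doc = interpolate(positive_doc, negative_doc, lam=0.5)
noisy_doc = perturb(positive_doc, drop_prob=0.5, rng=rng)
```

With interpolation, the training target for the mixed document would typically be mixed with the same weight; that step depends on the loss used and is not shown.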
Xuandong Zhao, Zhiguo Yu, Ming Wu, Lei Li
How can we learn highly compact yet effective sentence representations?
Pre-trained language models have been effective in many NLP tasks. However,
these models are often huge and produce large sentence embeddings. Moreover,
there is a big performance gap between large and small models. In this paper,
we propose Homomorphic Projective Distillation (HPD) to learn compressed
sentence embeddings. Our method augments a small Transformer encoder model with
learnable projection layers to produce compact representations while mimicking
a large pre-trained language model to retain the sentence representation
quality. We evaluate our method with different model sizes on both semantic
textual similarity (STS) and semantic retrieval (SR) tasks. Experiments show
that our method achieves 2.7-4.5 points performance gain on STS tasks compared
with previous best representations of the same size. In SR tasks, our method
improves retrieval speed (8.2$\times$) and memory usage (8.0$\times$) compared
with state-of-the-art large models.
Authors' comments: Findings of ACL 2022
Simran Arora, Patrick Lewis, Angela Fan, Jacob Kahn, Christopher Ré
Users and organizations are generating ever-increasing amounts of private data from a wide range of sources. Incorporating private data is important to personalize open-domain applications such as question-answering, fact-checking, and personal assistants. State-of-the-art systems for these tasks explicitly retrieve relevant information to a user question from a background corpus before producing an answer. While today's retrieval systems assume the corpus is fully accessible, users are often unable or unwilling to expose their private data to entities hosting public data. We first define the PUBLIC-PRIVATE AUTOREGRESSIVE INFORMATION RETRIEVAL (PAIR) privacy framework for the novel retrieval setting over multiple privacy scopes. We then argue that an adequate benchmark is missing to study PAIR since existing textual benchmarks require retrieving from a single data distribution. However, public and private data intuitively reflect different distributions, motivating us to create ConcurrentQA, the first textual QA benchmark to require concurrent retrieval over multiple data-distributions. Finally, we show that existing systems face large privacy vs. performance tradeoffs when applied to our proposed retrieval setting and investigate how to mitigate these tradeoffs.
Luyu Gao, Xueguang Ma, Jimmy Lin, Jamie Callan
Recent rapid advancements in deep pre-trained language models and the introduction of large datasets have powered research in embedding-based dense retrieval. While several good research papers have emerged, many of them come with their own software stacks. These stacks are typically optimized for some particular research goals instead of efficiency or code structure. In this paper, we present Tevatron, a dense retrieval toolkit optimized for efficiency, flexibility, and code simplicity. Tevatron provides a standardized pipeline for dense retrieval including text processing, model training, corpus/query encoding, and search. This paper presents an overview of Tevatron and demonstrates its effectiveness and efficiency across several IR and QA data sets. We also show how Tevatron's flexible design enables easy generalization across datasets, model architectures, and accelerator platforms (GPU/TPU). We believe Tevatron can serve as an effective software foundation for dense retrieval system research including design, modeling, and optimization.
Dingkun Long, Qiong Gao, Kuan Zou, Guangwei Xu, Pengjun Xie, Ruijie Guo, Jian Xu, Guanjun Jiang et al.
Passage retrieval is a fundamental task in information retrieval (IR)
research, which has drawn much attention recently. For English, the
availability of large-scale annotated datasets (e.g., MS MARCO) and the emergence
of deep pre-trained language models (e.g., BERT) have resulted in a substantial
improvement of existing passage retrieval systems. However, for Chinese,
especially in specific domains, passage retrieval systems are still
immature because high-quality annotated datasets are limited in scale. Therefore, in
this paper, we present a novel multi-domain Chinese dataset for passage
retrieval (Multi-CPR). The dataset is collected from three different domains,
including E-commerce, Entertainment video and Medical. Each dataset contains
millions of passages and a certain amount of human-annotated related
query-passage pairs. We implement various representative passage retrieval methods as
baselines. We find that the performance of retrieval models trained on
general-domain datasets inevitably decreases on specific domains. Nevertheless,
a passage retrieval system built on an in-domain annotated dataset can achieve
significant improvement, which demonstrates the necessity of domain-labeled
data for further optimization. We hope the release of the Multi-CPR
dataset can serve as a benchmark for Chinese passage retrieval in specific domains and
also advance future studies.
Authors' comments: SIGIR 2022 Resource Track
Ioannis Dimitriou
We introduce a novel single-server queue with general retrial times and event-dependent arrivals. This is a versatile model for the study of service systems, in which the server needs a non-negligible time to retrieve waiting customers upon a service completion, while future arrivals depend on the last realized event. Such a model is motivated by the customers' behaviour in service systems where they decide to join based on the last realized event. We investigate the necessary and sufficient stability condition and derive the stationary distribution both at service completion epochs, and at an arbitrary epoch using the supplementary variable technique. We also study the asymptotic behaviour under a high rate of retrials. Performance measures are explicitly derived and extensive numerical examples are performed to investigate the impact of event-dependency. Moreover, constrained optimisation problems are formulated and solved with the ultimate goal of investigating the admission control problem.
Yanik-Pascal Förster, Alessia Annibale, Luca Gamberi, Evan Tzanis, Pierpaolo Vivo
We introduce a model for the retrieval of information hidden in legal texts.
These are typically organised in a hierarchical (tree) structure, which a
reader interested in a given provision needs to explore down to the "deepest"
level (articles, clauses,...). We assess the structural complexity of legal
trees by computing the mean first-passage time a random reader takes to
retrieve information planted in the leaves. The reader is assumed to skim
through the content of a legal text based on their interests/keywords, and be
drawn towards the sought information based on keyword affinity, i.e. how well
the chapter/section headers of the hierarchy seem to match the informational
content of the leaves. Using randomly generated keyword patterns, we
investigate the effect of two main features of the text -- the horizontal and
vertical coherence -- on the searching time, and consider ways to validate our
results using real legal texts. We obtain numerical and analytical results, the
latter based on a mean-field approximation on the level of patterns, which lead
to an explicit expression for the complexity of legal trees as a function of
the structural parameters of the model. Policy implications of our results are
briefly discussed.
Authors' comments: 47 pages, 17 figures
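To make the mean first-passage time concrete, the sketch below computes it for a random walker on a small tree by solving the usual linear system T(target) = 0 and T(i) = 1 + sum_j P(i, j) T(j). This is a generic illustration, not the paper's model: the toy tree, the function name, and the optional `weight` callback (a stand-in for keyword affinity) are assumptions made here.

```python
def mean_first_passage_times(neighbors, target, weight=None):
    """Expected number of steps to reach `target` from every node.

    neighbors: dict mapping node -> list of adjacent nodes.
    weight: optional function (node, neighbor) -> positive number used to
            bias the walk; uniform if omitted (hypothetical affinity hook).
    """
    nodes = [n for n in neighbors if n != target]
    index = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)

    # Augmented matrix for (I - Q) T = 1, with Q the walk restricted to
    # non-target nodes.
    rows = [[0.0] * (size + 1) for _ in range(size)]
    for n in nodes:
        i = index[n]
        rows[i][i] += 1.0
        rows[i][size] = 1.0
        w = [weight(n, m) if weight else 1.0 for m in neighbors[n]]
        total = sum(w)
        for m, wm in zip(neighbors[n], w):
            if m != target:
                rows[i][index[m]] -= wm / total

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [v / p for v in rows[col]]
        for r in range(size):
            if r != col and rows[r][col] != 0.0:
                f = rows[r][col]
                rows[r] = [vr - f * vc for vr, vc in zip(rows[r], rows[col])]

    times = {n: rows[index[n]][size] for n in nodes}
    times[target] = 0.0
    return times


# Toy tree: root 0 has children 1 and 2; node 1 has leaves 3 and 4.
tree = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
times = mean_first_passage_times(tree, target=3)
```

For this unbiased walk, the reader starting at the root needs 10 steps on average to reach leaf 3; passing a `weight` function that favours branches leading to the target would shorten that time.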
Klara Krieg, Emilia Parada-Cabaleiro, Markus Schedl, Navid Rekabsaz
This work investigates the effect of gender-stereotypical biases in the
content of retrieved results on the relevance judgement of users/annotators. In
particular, since relevance in information retrieval (IR) is a
multi-dimensional concept, we study whether the value and quality of the
retrieved documents for some bias-sensitive queries can be judged differently
when the content of the documents represents different genders. To this aim, we
conduct a set of experiments where the genders of the participants are known as
well as experiments where the participants' genders are not specified. The set
of experiments comprises retrieval tasks, where participants perform a rated
relevance judgement for different search query and search result document
compilations. The shown documents contain different gender indications and are
either relevant or non-relevant to the query. The results show the differences
between the average judged relevance scores among documents with various gender
contents. Our work initiates further research on the connection of the
perception of gender stereotypes in users with their judgements and effects on
IR systems, and aims to raise awareness about the possible biases in this
domain.
Authors' comments: Accepted at workshop on Algorithmic Bias in Search and Recommendation
at ECIR 2022
Jake Taylor
Inverse techniques are used to extract information about an exoplanet's
atmosphere. These techniques are prone to biased results if the appropriate
forward model is not used. One common forward-model assumption is
that the radius of the planet is constant with wavelength; however, a more
realistic assumption is that the photospheric radius varies with
wavelength. We explore the bias induced when attempting to extract the
molecular abundance from an emission spectrum which was generated with a
variable radius. We find that for low gravity planets, the retrieval model is
not able to fit the data if a constant radius model is used. We find that
biased results are obtained when studying a typical hot Jupiter in the MIRI LRS
wavelength range. Finally, we show that high gravity planets do not suffer a
bias. We recommend that future spectral retrievals that interpret exoplanet
emission spectra should take into account a variable radius.
Authors' comments: 4 pages, 4 figures, 3 tables. Accepted MNRAS letters
Samuel Pinilla, Kumar Vijay Mishra, Igor Shevkunov, Mojtaba Soltanalian, Vladimir Katkovnik, Karen Egiazarian
Phase retrieval in optical imaging refers to the recovery of a complex signal
from phaseless data acquired in the form of its diffraction patterns. These
patterns are acquired through a system with a coherent light source that
employs a diffractive optical element (DOE) to modulate the scene resulting in
coded diffraction patterns at the sensor. Recently, the hybrid approach of
model-driven network or deep unfolding has emerged as an effective alternative
to conventional model-based and learning-based phase retrieval techniques
because it allows for bounding the complexity of algorithms while also
retaining their efficacy. Additionally, such hybrid approaches have shown
promise in improving the design of DOEs that follow theoretical uniqueness
conditions. There are opportunities to exploit novel experimental setups and
resolve even more complex DOE phase retrieval applications. This paper presents
an overview of algorithms and applications of deep unfolding for bootstrapped -
regardless of near, middle, and far zones - phase retrieval.
Authors' comments: 13 pages, 11 figures, 1 table
Francisco Ardevol Martinez, Michiel Min, Inga Kamp, Paul I. Palmer
Exoplanet observations are currently analysed with Bayesian retrieval
techniques. Due to the computational load of the models used, a compromise is
needed between model complexity and computing time. Analysis of data from
future facilities will need more complex models, which will increase the
computational load of retrievals, prompting the search for a faster approach
for interpreting exoplanet observations. Our goal is to compare machine
learning retrievals of exoplanet transmission spectra with nested sampling, and
understand if machine learning can be as reliable as Bayesian retrievals for a
statistically significant sample of spectra while being orders of magnitude
faster. We generate grids of synthetic transmission spectra and their
corresponding planetary and atmospheric parameters, one using free chemistry
models, and the other using equilibrium chemistry models. Each grid is
subsequently rebinned to simulate both HST/WFC3 and JWST/NIRSpec observations,
yielding four datasets in total. Convolutional neural networks (CNNs) are
trained with each of the datasets. We perform retrievals on 1,000 simulated
observations for each combination of model type and instrument with nested
sampling and machine learning. We also use both methods to perform retrievals
on real WFC3 transmission spectra. Finally, we test how robust machine learning
and nested sampling are against incorrect assumptions in our models. CNNs reach
a lower coefficient of determination between predicted and true values of the
parameters. Nested sampling underestimates the uncertainty in ~8% of
retrievals, whereas CNNs estimate them correctly. For real WFC3 observations,
nested sampling and machine learning agree within $2\sigma$ for ~86% of
spectra. When doing retrievals with incorrect assumptions, nested sampling
underestimates the uncertainty in ~12% to ~41% of cases, whereas this is always
below ~10% for the CNN.
Authors' comments: Accepted for publication in A&A
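For reference, the coefficient of determination used above to compare predicted and true parameter values can be computed as follows. This is the textbook definition, R^2 = 1 - SS_res / SS_tot; the function name and the toy values are hypothetical and not taken from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination between true and predicted values."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1.0 - ss_res / ss_tot


# Toy example: true parameter values versus a method's point estimates.
true_values = [1.0, 2.0, 3.0]
estimates = [1.0, 2.0, 4.0]
score = r_squared(true_values, estimates)
```

A value of 1 means the predictions match the true values exactly, while 0 means they do no better than always predicting the mean of the true values.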