Zichao Wang, Weili Nie, Zhuoran Qiao, Chaowei Xiao, Richard Baraniuk, Anima Anandkumar
Generating new molecules with specified chemical and biological properties
via generative models has emerged as a promising direction for drug discovery.
However, existing methods require extensive training/fine-tuning with a large
dataset, often unavailable in real-world generation tasks. In this work, we
propose a new retrieval-based framework for controllable molecule generation.
We use a small set of exemplar molecules, i.e., those that (partially) satisfy
the design criteria, to steer the pre-trained generative model towards
synthesizing molecules that satisfy the given design criteria. We design a
retrieval mechanism that retrieves and fuses the exemplar molecules with the
input molecule, which is trained by a new self-supervised objective that
predicts the nearest neighbor of the input molecule. We also propose an
iterative refinement process to dynamically update the generated molecules and
retrieval database for better generalization. Our approach is agnostic to the
choice of generative models and requires no task-specific fine-tuning. On
various tasks ranging from simple design criteria to a challenging real-world
scenario for designing lead compounds that bind to the SARS-CoV-2 main
protease, we demonstrate our approach extrapolates well beyond the retrieval
database, and achieves better performance and wider applicability than previous
methods. Code is available at https://github.com/NVlabs/RetMol.
Authors' comments: ICLR 2023
Shijie Wang, Jianlong Chang, Zhihui Wang, Haojie Li, Wanli Ouyang, Qi Tian
Fine-grained object retrieval aims to learn discriminative representation to
retrieve visually similar objects. However, existing top-performing works
usually impose pairwise similarities on the semantic embedding spaces or design
a localization sub-network to continually fine-tune the entire model in limited
data scenarios, thus resulting in convergence to suboptimal solutions. In this
paper, we develop Fine-grained Retrieval Prompt Tuning (FRPT), which steers a
frozen pre-trained model to perform the fine-grained retrieval task from the
perspectives of sample prompting and feature adaptation. Specifically, FRPT
only needs to learn fewer parameters in the prompt and adaptation instead of
fine-tuning the entire model, thus solving the issue of convergence to
suboptimal solutions caused by fine-tuning the entire model. Technically, a
discriminative perturbation prompt (DPP) is introduced and deemed as a sample
prompting process, which amplifies and even exaggerates some discriminative
elements contributing to category prediction via a content-aware inhomogeneous
sampling operation. In this way, DPP brings the fine-grained retrieval task,
aided by the perturbation prompts, close to the task solved during the original
pre-training. It thereby preserves the generalization and discrimination of
representation extracted from input samples. Besides, a category-specific
awareness head is proposed and regarded as feature adaptation, which removes
the species discrepancies in features extracted by the pre-trained model using
category-guided instance normalization, so that the optimized features
include only the discrepancies among subcategories. Extensive
experiments demonstrate that FRPT, with fewer learnable parameters, achieves
state-of-the-art performance on three widely used fine-grained datasets.
Authors' comments: Accepted by AAAI 2023
Shervin Ardeshir, Nagendra Kamath, Hossein Taghavi
We explore retrieving character-focused video frames as candidates for being
video thumbnails. To evaluate each frame of the video based on the character(s)
present in it, characters (faces) are evaluated in two aspects:
Facial-expression: We train a CNN model to measure whether a face has an
acceptable facial expression for being in a video thumbnail. This model is
trained to distinguish faces extracted from artworks/thumbnails, from faces
extracted from random frames of videos. Prominence and interactions:
Character(s) in the thumbnail should be important character(s) in the video, to
prevent the algorithm from suggesting non-representative frames as candidates.
We use face clustering to identify the characters in the video, and form a
graph in which the prominence (frequency of appearance) of the character(s),
and their interactions (co-occurrence) are captured. We use this graph to infer
the relevance of the characters present in each candidate frame. Once every
face is scored based on the two criteria above, we infer frame level scores by
combining the scores for all the faces within a frame.
Authors' comments: International Conference on Machine Learning. Machine Learning for
Media Discovery (ML4MD) Workshop 2020
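The frame-scoring idea in the abstract above (score every face, then combine the face scores within a frame) can be sketched as follows. This is a minimal illustration, not the authors' method: the `Face` type, the use of appearance frequency as the only prominence signal, and the sum-of-products combination rule are all assumptions, and the interaction (co-occurrence) part of the graph is omitted.

```python
from dataclasses import dataclass


@dataclass
class Face:
    character_id: str        # hypothetical cluster label from face clustering
    expression_score: float  # hypothetical classifier output in [0, 1]


def character_prominence(frames):
    """Fraction of frames in which each character appears at least once."""
    counts = {}
    for faces in frames:
        for cid in {f.character_id for f in faces}:
            counts[cid] = counts.get(cid, 0) + 1
    n = len(frames)
    return {cid: c / n for cid, c in counts.items()}


def frame_score(faces, prominence):
    """Assumed combination rule: sum of expression score times prominence."""
    return sum(f.expression_score * prominence.get(f.character_id, 0.0)
               for f in faces)


frames = [
    [Face("a", 0.6)],
    [Face("a", 0.5), Face("b", 0.8)],
]
prominence = character_prominence(frames)
scores = [frame_score(faces, prominence) for faces in frames]
best_frame = max(range(len(frames)), key=lambda i: scores[i])
```

Other combination rules (for example, taking the maximum over faces, or weighting by co-occurrence) would fit the same interface.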
Xiao Han, Sen He, Li Zhang, Yi-Zhe Song, Tao Xiang
Interactive garment retrieval (IGR) aims to retrieve a target garment image
based on a reference garment image along with user feedback on what to change
on the reference garment. Two IGR tasks have been studied extensively:
text-guided garment retrieval (TGR) and visually compatible garment retrieval
(VCR). The user feedback for the former indicates what semantic attributes to
change with the garment category preserved, while the category is the only
thing to be changed explicitly for the latter, with an implicit requirement on
style preservation. Despite the similarity between these two tasks and the
practical need for an efficient system tackling both, they have never been
unified and modeled jointly. In this paper, we propose a Unified Interactive
Garment Retrieval (UIGR) framework to unify TGR and VCR. To this end, we first
contribute a large-scale benchmark suited for both problems. We further propose
a strong baseline architecture to integrate TGR and VCR in one model. Extensive
experiments suggest that unifying the two tasks in one framework is not only
more efficient, since it requires only a single model, but also leads to better
performance. Code and datasets are available at
https://github.com/BrandonHanx/CompFashion.
Authors' comments: CVPRW 2022
Seongwon Lee, Hongje Seong, Suhyeon Lee, Euntai Kim
Geometric verification is considered a de facto solution for the re-ranking
task in image retrieval. In this study, we propose a novel image retrieval
re-ranking network named Correlation Verification Networks (CVNet). Our
proposed network, comprising deeply stacked 4D convolutional layers, gradually
compresses dense feature correlation into image similarity while learning
diverse geometric matching patterns from various image pairs. To enable
cross-scale matching, it builds feature pyramids and constructs cross-scale
feature correlations within a single inference, replacing costly multi-scale
inferences. In addition, we use curriculum learning with the hard negative
mining and Hide-and-Seek strategy to handle hard samples without losing
generality. Our proposed re-ranking network shows state-of-the-art performance
on several retrieval benchmarks with a significant margin (+12.6% in mAP on
ROxford-Hard+1M set) over state-of-the-art methods. The source code and models
are available online: https://github.com/sungonce/CVNet.
Authors' comments: Accepted to CVPR 2022 (Oral Presentation)
Benno Krojer, Vaibhav Adlakha, Vibhav Vineet, Yash Goyal, Edoardo Ponti, Siva Reddy
The ability to integrate context, including perceptual and temporal cues,
plays a pivotal role in grounding the meaning of a linguistic utterance. In
order to measure to what extent current vision-and-language models master this
ability, we devise a new multimodal challenge, Image Retrieval from Contextual
Descriptions (ImageCoDe). In particular, models are tasked with retrieving the
correct image from a set of 10 minimally contrastive candidates based on a
contextual description. As such, each description contains only the details
that help distinguish between images. Because of this, descriptions tend to be
complex in terms of syntax and discourse and require drawing pragmatic
inferences. Images are sourced from both static pictures and video frames. We
benchmark several state-of-the-art models, including both cross-encoders such
as ViLBERT and bi-encoders such as CLIP, on ImageCoDe. Our results reveal that
these models dramatically lag behind human performance: the best variant
achieves an accuracy of 20.9 on video frames and 59.4 on static pictures,
compared with 90.8 in humans. Furthermore, we experiment with new model
variants that are better equipped to incorporate visual and temporal context
into their representations, which achieve modest gains. Our hope is that
ImageCoDe will foster progress in grounded language understanding by
encouraging models to focus on fine-grained visual differences.
Authors' comments: accepted to ACL 2022
Siyu Lou, Xuenan Xu, Mengyue Wu, Kai Yu
Audio-text retrieval based on natural language descriptions is a challenging task. It involves learning cross-modality alignments between long sequences under inadequate data conditions. In this work, we investigate several audio features as well as sequence aggregation methods for better audio-text alignment. Moreover, through a qualitative analysis we observe that semantic mapping is more important than temporal relations in contextual retrieval. Using pre-trained audio features and a descriptor-based aggregation method, we build our contextual audio-text retrieval system. Specifically, we utilize PANNs features pre-trained on a large sound event dataset and NetRVLAD pooling, which directly works with averaged descriptors. Experiments are conducted on the AudioCaps and CLOTHO datasets, and results are compared with the previous state-of-the-art system. With our proposed system, a significant improvement has been achieved on bidirectional audio-text retrieval, on all metrics including recall, median and mean rank.
Tamir Bendory, Dan Edidin
The purpose of this article is to discuss recent advances in the growing field of phase retrieval, and to publicize open problems that we believe will be of interest to mathematicians in general, and algebraists in particular.
Jacopo Surace, Matteo Scandi
In the context of irreversible dynamics, associating a physical process with its intuitive reverse can be quite an ambiguous task. It is a standard choice to define the reverse process using Bayes' theorem, but, in general, this choice is not optimal. In this work we explore whether it is possible to characterise an optimal reverse map building on the concept of state retrieval maps. In doing so, we propose a set of principles that state retrieval maps should satisfy. We find that the Bayes-inspired reverse is just one case in a whole class of possible choices, which can be optimised to give a map retrieving the initial state more precisely than the Bayes rule. Our analysis has the advantage of naturally extending to the quantum regime. In fact, we find a class of reverse transformations containing the Petz recovery map as a particular case, corroborating its interpretation as the quantum analogue of the Bayes retrieval. Finally, we present numerical evidence that by adding a single extra axiom one can isolate the usual reverse process derived from Bayes' theorem.
Hartmut Führ, Vignon Oussa
We study the phase retrieval property for orbits of general irreducible
representations of nilpotent groups, for the classes of connected, simply
connected Lie groups, and for finite groups. We prove by induction that in the
Lie group case, all irreducible representations do phase retrieval.
For the finite group case, we mostly focus on $p$-groups. Here our main
result states that every irreducible representation of an arbitrary $p$-group
with exponent $p$ and size $\le p^{2+p/2}$ does phase retrieval.
Despite the fundamental differences between the two settings, our inductive
proof methods are remarkably similar.
Authors' comments: Revised version, correcting some insufficient assumptions made in the
previous version. In particular, the general theorem about $p$-groups is only
established for $p$-groups of exponent $p$
Arthur Câmara, Claudia Hauff
Word embeddings, made widely popular in 2013 with the release of word2vec,
have become a mainstay of NLP engineering pipelines. Recently, with the release
of BERT, word embeddings have moved from the term-based embedding space to the
contextual embedding space -- each term is no longer represented by a single
low-dimensional vector but instead each term and its context determine
the vector weights. BERT's setup and architecture have been shown to be general
enough to be applicable to many natural language tasks. Importantly for
Information Retrieval (IR), in contrast to prior deep learning solutions to IR
problems which required significant tuning of neural net architectures and
training regimes, "vanilla BERT" has been shown to outperform existing
retrieval algorithms by a wide margin, including on tasks and corpora that have
long resisted retrieval effectiveness gains over traditional IR baselines (such
as Robust04). In this paper, we employ the recently proposed axiomatic dataset
analysis technique -- that is, we create diagnostic datasets that each fulfil a
retrieval heuristic (both term matching and semantic-based) -- to explore what
BERT is able to learn. In contrast to our expectations, we find BERT, when
applied to a recently released large-scale web corpus with ad-hoc topics, to
not adhere to any of the explored axioms. At the same time, BERT
outperforms the traditional query likelihood retrieval model by 40%. This
means that the axiomatic approach to IR (and its extension of diagnostic
datasets created for retrieval heuristics) may in its current form not be
applicable to large-scale corpora. Additional -- different -- axioms are
needed.
Authors' comments: Published at ECIR 2020
Chitrank Gupta, Yash Jain
Graph Retrieval has witnessed continued interest and progress in the past few
years. In this report, we focus on neural network based approaches for graph
matching and retrieving similar graphs from a corpus of graphs. We explore
methods which can soft predict the similarity between two graphs. Later, we
gauge the power of a particular baseline (Shortest Path Kernel) and try to model
it in our product graph random walks setting while making it more generalised.
Authors' comments: BS Thesis
Ahtsham Manzoor, Dietmar Jannach
Conversational recommender systems have attracted immense attention recently.
The most recent approaches rely on neural models trained on recorded dialogs
between humans, implementing an end-to-end learning process. These systems are
commonly designed to generate responses given the user's utterances in natural
language. One main challenge is that these generated responses must be both
appropriate for the given dialog context and grammatically and
semantically correct. An alternative to such generation-based approaches is to
retrieve responses from pre-recorded dialog data and to adapt them if needed.
Such retrieval-based approaches were successfully explored in the context of
general conversational systems, but have received limited attention in recent
years for CRS. In this work, we re-assess the potential of such approaches and
design and evaluate a novel technique for response retrieval and ranking. A
user study (N=90) revealed that the responses by our system were on average of
higher quality than those of two recent generation-based systems. We
furthermore found that the quality ranking of the two generation-based
approaches is not aligned with the results from the literature, which points to
open methodological questions. Overall, our research underlines that
retrieval-based approaches should be considered an alternative or complement to
language generation approaches.
Authors' comments: 29 pages, 5 figures, 7 tables
Sachin Pathiyan Cherumanal, Damiano Spina, Falk Scholer, W. Bruce Croft
Existing commercial search engines often struggle to represent different
perspectives of a search query. Argument retrieval systems address this
limitation of search engines and provide both positive (PRO) and negative (CON)
perspectives about a user's information need on a controversial topic (e.g.,
climate change). The effectiveness of such argument retrieval systems is
typically evaluated based on topical relevance and argument quality, without
taking into account the often differing number of documents shown for the
argument stances (PRO or CON). Therefore, systems may retrieve relevant
passages, but with a biased exposure of arguments. In this work, we analyze a
range of non-stochastic fairness-aware ranking and diversity metrics to
evaluate the extent to which argument stances are fairly exposed in argument
retrieval systems.
Using the official runs of the argument retrieval task Touché at CLEF 2020,
as well as synthetic data to control the amount and order of argument stances
in the rankings, we show that systems with the best effectiveness in terms of
topical relevance are not necessarily the most fair or the most diverse in
terms of argument stance. The relationships we found between (un)fairness and
diversity metrics shed light on how to evaluate group fairness -- in addition
to topical relevance -- in argument retrieval settings.
Authors' comments: Accepted at CIKM 2021
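To make the notion of biased stance exposure in the abstract above concrete, here is a minimal sketch of one possible exposure measure. It is an assumption for illustration, not necessarily one of the metrics analyzed in the paper: each rank position receives a logarithmic discount of 1/log2(rank + 1), and the function `stance_exposure` (a hypothetical name) reports each stance's share of the total discounted exposure.

```python
import math


def stance_exposure(ranking):
    """Share of log-discounted exposure per stance label.

    `ranking` is a list of stance labels (e.g. "PRO"/"CON") ordered from
    rank 1 downwards. The discount 1 / log2(rank + 1) is an assumed choice.
    """
    exposure = {}
    for rank, stance in enumerate(ranking, start=1):
        weight = 1.0 / math.log2(rank + 1)
        exposure[stance] = exposure.get(stance, 0.0) + weight
    total = sum(exposure.values())
    return {stance: value / total for stance, value in exposure.items()}


# Both rankings contain two PRO and two CON arguments, but the first one
# places both PRO arguments at the top and therefore exposes PRO more.
top_heavy = stance_exposure(["PRO", "PRO", "CON", "CON"])
interleaved = stance_exposure(["PRO", "CON", "CON", "PRO"])
```

Two rankings with identical stance counts can thus differ in exposure, which is why the order of stances matters in addition to their number.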
Shi Yu, Zhenghao Liu, Chenyan Xiong, Tao Feng, Zhiyuan Liu
Dense retrieval (DR) has the potential to resolve the query understanding
challenge in conversational search by matching in the learned embedding space.
However, this adaptation is challenging due to DR models' extra needs for
supervision signals and the long-tail nature of conversational search. In this
paper, we present a Conversational Dense Retrieval system, ConvDR, that learns
contextualized embeddings for multi-turn conversational queries and retrieves
documents solely using embedding dot products. In addition, we grant ConvDR
few-shot ability using a teacher-student framework, where we employ an ad hoc
dense retriever as the teacher, inherit its document encodings, and learn a
student query encoder to mimic the teacher embeddings on oracle reformulated
queries. Our experiments on TREC CAsT and OR-QuAC demonstrate ConvDR's
effectiveness in both few-shot and fully-supervised settings. It outperforms
previous systems that operate in the sparse word space, matches the retrieval
accuracy of oracle query reformulations, and is also more efficient thanks to
its simplicity. Our analyses reveal that the advantages of ConvDR come from its
ability to capture informative context while ignoring the unrelated context in
previous conversation rounds. This makes ConvDR more effective as conversations
evolve while previous systems may get confused by the increased noise from
previous turns. Our code is publicly available at
https://github.com/thunlp/ConvDR.
Authors' comments: Accepted by SIGIR 2021
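Retrieval "solely using embedding dot products", as described in the abstract above, amounts to scoring every document embedding against the query embedding and keeping the highest-scoring documents. The sketch below illustrates only that scoring step; the function names and the toy two-dimensional vectors are assumptions, and a real system would obtain the embeddings from trained encoders and use an approximate nearest-neighbor index.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def retrieve_top_k(query_emb, doc_embs, k=2):
    """Return (doc_index, score) pairs for the k highest dot-product scores."""
    scored = [(i, dot(query_emb, d)) for i, d in enumerate(doc_embs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (hypothetical values, for illustration only).
doc_embs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query_emb = [1.0, 0.2]
top = retrieve_top_k(query_emb, doc_embs, k=2)
```

Because the document embeddings do not depend on the query, they can be computed once and reused across conversation turns.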
Yifan Gao, Jingjing Li, Chien-Sheng Wu, Michael R. Lyu, Irwin King
In conversational machine reading, systems need to interpret natural language rules, answer high-level questions such as "May I qualify for VA health care benefits?", and ask follow-up clarification questions whose answers are necessary to answer the original question. However, existing works assume the rule text is provided for each user question, which neglects the essential retrieval step in real scenarios. In this work, we propose and investigate an open-retrieval setting of conversational machine reading. In the open-retrieval setting, the relevant rule texts are unknown, so a system needs to retrieve question-relevant evidence from a collection of rule texts and answer users' high-level questions according to multiple retrieved rule texts in a conversational manner. We propose MUDERN, a Multi-passage Discourse-aware Entailment Reasoning Network, which extracts conditions in the rule texts through discourse segmentation and conducts multi-passage entailment reasoning to either answer user questions directly or ask clarification follow-up questions to inquire for more information. On our created OR-ShARC dataset, MUDERN achieves state-of-the-art performance, outperforming existing single-passage conversational machine reading models as well as a new multi-passage conversational machine reading baseline by a large margin. In addition, we conduct in-depth analyses to provide new insights into this new setting and our model.
Sidharth Gupta, Ivan Dokmanić
We address the phase retrieval problem with errors in the sensing vectors. A number of recent methods for phase retrieval are based on least squares (LS) formulations which assume errors in the quadratic measurements. We extend this approach to handle errors in the sensing vectors by adopting the total least squares (TLS) framework that is used in linear inverse problems with operator errors. We show how gradient descent and the specific geometry of the phase retrieval problem can be used to obtain a simple and efficient TLS solution. Additionally, we derive the gradients of the TLS and LS solutions with respect to the sensing vectors and measurements which enables us to calculate the solution errors. By analyzing these error expressions we determine conditions under which each method should outperform the other. We run simulations to demonstrate that our method can lead to more accurate solutions. We further demonstrate the effectiveness of our approach by performing phase retrieval experiments on real optical hardware which naturally contains both sensing vector and measurement errors.
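For orientation, the plain least squares (LS) baseline mentioned in the abstract above can be sketched with gradient descent. This is not the TLS method of the paper: it assumes real-valued signals, exact sensing vectors, and the common intensity loss f(x) = (1/2m) Σ_i ((a_i · x)^2 − y_i)^2. All function names are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def ls_loss(x, sensing, y):
    """Assumed LS loss: (1 / 2m) * sum_i ((a_i . x)^2 - y_i)^2."""
    m = len(y)
    return sum((dot(a, x) ** 2 - yi) ** 2 for a, yi in zip(sensing, y)) / (2 * m)


def ls_gradient(x, sensing, y):
    """Gradient of ls_loss: (2 / m) * sum_i ((a_i . x)^2 - y_i) (a_i . x) a_i."""
    m = len(y)
    grad = [0.0] * len(x)
    for a, yi in zip(sensing, y):
        p = dot(a, x)
        coeff = 2.0 * (p * p - yi) * p / m
        for j, aj in enumerate(a):
            grad[j] += coeff * aj
    return grad


def gradient_step(x, sensing, y, step=0.05):
    """One plain gradient descent update on the LS loss."""
    g = ls_gradient(x, sensing, y)
    return [xj - step * gj for xj, gj in zip(x, g)]


# Toy problem: three real sensing vectors and noiseless quadratic measurements.
sensing = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_true = [1.0, 2.0]
y = [dot(a, x_true) ** 2 for a in sensing]

x0 = [1.1, 1.9]                     # initial guess near the true signal
x1 = gradient_step(x0, sensing, y)  # one update lowers the loss
```

Since x and −x produce identical measurements, the signal can only be recovered up to a global sign (a global phase in the complex case).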
Ruida Zhou, Chao Tian, Hua Sun, James Plank
In the conventional robust $T$-colluding private information retrieval (PIR) system, the user needs to retrieve one of the possible messages while keeping the identity of the requested message private from any $T$ colluding servers. Motivated by the possible heterogeneous privacy requirements for different messages, we consider the $(N, T_1:K_1, T_2:K_2)$ two-level PIR system with a total of $K_2$ messages in the system, where $T_1\geq T_2$ and $K_1\leq K_2$. Any one of the $K_1$ messages needs to be retrieved privately against $T_1$ colluding servers, and any one of the full set of $K_2$ messages needs to be retrieved privately against $T_2$ colluding servers. We obtain a lower bound to the capacity by proposing two novel coding schemes, namely the non-uniform successive cancellation scheme and the non-uniform block cancellation scheme. A capacity upper bound is also derived. The gap between the upper bound and the lower bounds is analyzed, and shown to vanish when $T_1=T_2$. Lastly, we show that the upper bound is in general not tight by providing a stronger bound for a special setting.
Wei Qu, Xiao-Yun Sun, Guan-Tie Deng
This paper concerns the reconstruction of a function $f$ in the Hardy
space of the unit disc $\mathbb{D}$ from intensity measurements $|f(z)|,\ z\in \mathbb{D}$.
This is known as the problem of phase retrieval. We transform it into solving for the
corresponding outer and inner functions through the Nevanlinna factorization
theorem. The outer function is established using the mechanical
quadrature method, while we use two different ways to find the zeros
of the Blaschke product, thereby computing the inner function under the assumption
that the singular inner function part is trivial. Then the concrete algorithms
and illustrative experiments follow. Finally, we give a sparse representation
of $f$ by introducing the unwinding adaptive Fourier decomposition.
Authors' comments: 10 pages, 25 figures
Carlos Alexandre Brasil, Miled Hassan Youssef Moussa, Reginaldo de Jesus Napolitano
We present an analytic method, based on the Bohmian equations for quantum
mechanics, for approaching the phase-retrieval problem in the following
formulation: By knowing the probability density $\left\vert
\psi\left(\overrightarrow{r},t\right)\right\vert ^{2}$ and the energy potential
$V\left(\overrightarrow{r},t\right)$ of a system, how can one determine the
complex state $\psi\left(\overrightarrow{r},t\right)$? We illustrate our method
with three classic examples involving Gaussian states, suggesting applications
to quantum state and Hamiltonian engineering.
Authors' comments: 9 pages