Xiao Han, Sen He, Li Zhang, Yi-Zhe Song, Tao Xiang
Interactive garment retrieval (IGR) aims to retrieve a target garment image
based on a reference garment image along with user feedback on what to change
on the reference garment. Two IGR tasks have been studied extensively:
text-guided garment retrieval (TGR) and visually compatible garment retrieval
(VCR). The user feedback for the former indicates what semantic attributes to
change with the garment category preserved, while the category is the only
thing to be changed explicitly for the latter, with an implicit requirement on
style preservation. Despite the similarity between these two tasks and the
practical need for an efficient system tackling both, they have never been
unified and modeled jointly. In this paper, we propose a Unified Interactive
Garment Retrieval (UIGR) framework to unify TGR and VCR. To this end, we first
contribute a large-scale benchmark suited for both problems. We further propose
a strong baseline architecture to integrate TGR and VCR in one model. Extensive
experiments suggest that unifying two tasks in one framework is not only more
efficient by requiring a single model only, it also leads to better
performance. Code and datasets are available at
https://github.com/BrandonHanx/CompFashion.
Authors' comments: CVPRW 2022
Seongwon Lee, Hongje Seong, Suhyeon Lee, Euntai Kim
Geometric verification is considered a de facto solution for the re-ranking
task in image retrieval. In this study, we propose a novel image retrieval
re-ranking network named Correlation Verification Networks (CVNet). Our
proposed network, comprising deeply stacked 4D convolutional layers, gradually
compresses dense feature correlation into image similarity while learning
diverse geometric matching patterns from various image pairs. To enable
cross-scale matching, it builds feature pyramids and constructs cross-scale
feature correlations within a single inference, replacing costly multi-scale
inferences. In addition, we use curriculum learning with the hard negative
mining and Hide-and-Seek strategy to handle hard samples without losing
generality. Our proposed re-ranking network shows state-of-the-art performance
on several retrieval benchmarks with a significant margin (+12.6% in mAP on
ROxford-Hard+1M set) over state-of-the-art methods. The source code and models
are available online: https://github.com/sungonce/CVNet.
Authors' comments: Accepted to CVPR 2022 (Oral Presentation)
Benno Krojer, Vaibhav Adlakha, Vibhav Vineet, Yash Goyal, Edoardo Ponti, Siva Reddy
The ability to integrate context, including perceptual and temporal cues,
plays a pivotal role in grounding the meaning of a linguistic utterance. In
order to measure to what extent current vision-and-language models master this
ability, we devise a new multimodal challenge, Image Retrieval from Contextual
Descriptions (ImageCoDe). In particular, models are tasked with retrieving the
correct image from a set of 10 minimally contrastive candidates based on a
contextual description. As such, each description contains only the details
that help distinguish between images. Because of this, descriptions tend to be
complex in terms of syntax and discourse and require drawing pragmatic
inferences. Images are sourced from both static pictures and video frames. We
benchmark several state-of-the-art models, including both cross-encoders such
as ViLBERT and bi-encoders such as CLIP, on ImageCoDe. Our results reveal that
these models dramatically lag behind human performance: the best variant
achieves an accuracy of 20.9 on video frames and 59.4 on static pictures,
compared with 90.8 in humans. Furthermore, we experiment with new model
variants that are better equipped to incorporate visual and temporal context
into their representations, which achieve modest gains. Our hope is that
ImageCoDe will foster progress in grounded language understanding by
encouraging models to focus on fine-grained visual differences.
Authors' comments: accepted to ACL 2022
Siyu Lou, Xuenan Xu, Mengyue Wu, Kai Yu
Audio-text retrieval based on natural language descriptions is a challenging task. It involves learning cross-modality alignments between long sequences under inadequate data conditions. In this work, we investigate several audio features as well as sequence aggregation methods for better audio-text alignment. Moreover, through a qualitative analysis we observe that semantic mapping is more important than temporal relations in contextual retrieval. Using pre-trained audio features and a descriptor-based aggregation method, we build our contextual audio-text retrieval system. Specifically, we utilize PANNs features pre-trained on a large sound event dataset and NetRVLAD pooling, which directly works with averaged descriptors. Experiments are conducted on the AudioCaps and CLOTHO datasets, and results are compared with the previous state-of-the-art system. With our proposed system, a significant improvement has been achieved on bidirectional audio-text retrieval, on all metrics including recall, median and mean rank.
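The metrics named above (recall, median rank and mean rank) can be illustrated with a minimal sketch. It assumes a square similarity matrix in which query i has exactly one correct candidate, candidate i; this pairing and the function name `retrieval_metrics` are assumptions for illustration, not the authors' evaluation code.

```python
import numpy as np

def retrieval_metrics(sim, ks=(1, 5, 10)):
    """sim[i, j] = similarity of query i and candidate j.

    Assumption: the only correct candidate for query i is candidate i.
    """
    sim = np.asarray(sim, dtype=float)
    # Rank of the true match = 1 + number of candidates scored strictly higher.
    true_scores = np.diag(sim)[:, None]
    ranks = 1 + (sim > true_scores).sum(axis=1)
    metrics = {f"R@{k}": float((ranks <= k).mean()) for k in ks}
    metrics["median_rank"] = float(np.median(ranks))
    metrics["mean_rank"] = float(ranks.mean())
    return metrics

# Toy example: 3 queries against 3 candidates.
sim = np.array([[0.9, 0.1, 0.2],
                [0.8, 0.3, 0.1],
                [0.2, 0.4, 0.7]])
m = retrieval_metrics(sim, ks=(1, 2))
```

For bidirectional retrieval, the same function can be applied to `sim.T` to score the opposite direction.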
Tamir Bendory, Dan Edidin
The purpose of this article is to discuss recent advances in the growing field of phase retrieval, and to publicize open problems that we believe will be of interest to mathematicians in general, and algebraists in particular.
Jacopo Surace, Matteo Scandi
In the context of irreversible dynamics, associating a physical process with its intuitive reverse can be quite an ambiguous task. It is a standard choice to define the reverse process using Bayes' theorem, but, in general, this choice is not optimal. In this work we explore whether it is possible to characterise an optimal reverse map building from the concept of state retrieval maps. In doing so, we propose a set of principles that state retrieval maps should satisfy. We find that the Bayes-inspired reverse is just one case in a whole class of possible choices, which can be optimised to give a map retrieving the initial state more precisely than the Bayes rule. Our analysis has the advantage of naturally extending to the quantum regime. In fact, we find a class of reverse transformations containing the Petz recovery map as a particular case, corroborating its interpretation as the quantum analogue of the Bayes retrieval. Finally, we present numerical evidence that by adding a single extra axiom one can isolate the usual reverse process derived from Bayes' theorem.
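The standard Bayes-based reverse mentioned above can be sketched for a classical stochastic map as follows. It shows only the textbook Bayes reverse, not the optimised retrieval maps of the paper; the function name `bayes_reverse`, the column-stochastic convention and the toy numbers are assumptions, and every output is assumed to have nonzero probability.

```python
import numpy as np

def bayes_reverse(channel, prior):
    """Bayes reverse of a classical channel.

    channel[y, x] = P(y | x), so each column sums to 1.
    prior[x]      = reference distribution p(x).
    Returns reverse[x, y] = P(x | y) = P(y | x) p(x) / p(y).
    Assumes every output y has p(y) > 0.
    """
    channel = np.asarray(channel, dtype=float)
    prior = np.asarray(prior, dtype=float)
    joint = channel * prior[None, :]   # joint[y, x] = P(y | x) p(x)
    p_y = joint.sum(axis=1)            # output marginal p(y)
    return (joint / p_y[:, None]).T    # reverse[x, y]

channel = np.array([[0.9, 0.2],
                    [0.1, 0.8]])
prior = np.array([0.5, 0.5])
reverse = bayes_reverse(channel, prior)

# Applying the reverse to the propagated prior returns the prior itself.
recovered = reverse @ (channel @ prior)
```

By construction this reverse retrieves the reference prior exactly. For any other input state, the recovered state generally differs from the initial one.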
Hartmut Führ, Vignon Oussa
We study the phase retrieval property for orbits of general irreducible
representations of nilpotent groups, for the classes of simply connected
connected Lie groups, and for finite groups. We prove by induction that in the
Lie group case, all irreducible representations do phase retrieval.
For the finite group case, we mostly focus on $p$-groups. Here our main
result states that every irreducible representation of an arbitrary $p$-group
with exponent $p$ and size $\le p^{2+p/2}$ does phase retrieval.
Despite the fundamental differences between the two settings, our inductive
proof methods are remarkably similar.
Authors' comments: Revised version, correcting some insufficient assumptions made in the
previous version. In particular, the general theorem about $p$-groups is only
established for $p$-groups of exponent $p$
Arthur Câmara, Claudia Hauff
Word embeddings, made widely popular in 2013 with the release of word2vec,
have become a mainstay of NLP engineering pipelines. Recently, with the release
of BERT, word embeddings have moved from the term-based embedding space to the
contextual embedding space -- each term is no longer represented by a single
low-dimensional vector but instead each term and \emph{its context} determine
the vector weights. BERT's setup and architecture have been shown to be general
enough to be applicable to many natural language tasks. Importantly for
Information Retrieval (IR), in contrast to prior deep learning solutions to IR
problems which required significant tuning of neural net architectures and
training regimes, "vanilla BERT" has been shown to outperform existing
retrieval algorithms by a wide margin, including on tasks and corpora that have
long resisted retrieval effectiveness gains over traditional IR baselines (such
as Robust04). In this paper, we employ the recently proposed axiomatic dataset
analysis technique -- that is, we create diagnostic datasets that each fulfil a
retrieval heuristic (both term matching and semantic-based) -- to explore what
BERT is able to learn. In contrast to our expectations, we find BERT, when
applied to a recently released large-scale web corpus with ad-hoc topics, to
\emph{not} adhere to any of the explored axioms. At the same time, BERT
outperforms the traditional query likelihood retrieval model by 40\%. This
means that the axiomatic approach to IR (and its extension of diagnostic
datasets created for retrieval heuristics) may in its current form not be
applicable to large-scale corpora. Additional -- different -- axioms are
needed.
Authors' comments: Published at ECIR 2020
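For reference, the query likelihood model used as the traditional baseline above can be sketched as follows. Dirichlet smoothing, the value of `mu` and the toy collection are assumptions made for illustration; the abstract does not state which smoothing or parameters were used.

```python
import math
from collections import Counter

def query_likelihood(query, doc, collection_tf, collection_len, mu=2000.0):
    """Log-probability of the query under a Dirichlet-smoothed document model.

    p(w | d) = (tf(w, d) + mu * p(w | C)) / (|d| + mu)
    """
    tf = Counter(doc)
    score = 0.0
    for term in query:
        p_coll = collection_tf.get(term, 0) / collection_len
        p = (tf[term] + mu * p_coll) / (len(doc) + mu)
        if p == 0.0:
            # Term unseen in the whole collection: the query has zero likelihood.
            return float("-inf")
        score += math.log(p)
    return score

# Toy collection of two tokenised documents (hypothetical data).
docs = [["phase", "retrieval", "problem"], ["dense", "retrieval", "model"]]
collection_tf = Counter(t for d in docs for t in d)
collection_len = sum(len(d) for d in docs)

scores = [
    query_likelihood(["dense", "retrieval"], d, collection_tf, collection_len, mu=1.0)
    for d in docs
]
```

Documents are ranked by descending score. Here the second document ranks first, because it contains both query terms.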
Chitrank Gupta, Yash Jain
Graph Retrieval has witnessed continued interest and progress in the past few
years. In this report, we focus on neural network based approaches for Graph
matching and retrieving similar graphs from a corpus of graphs. We explore
methods which can soft predict the similarity between two graphs. Later, we
gauge the power of a particular baseline (Shortest Path Kernel) and try to model
it in our product graph random walks setting while making it more generalised.
Authors' comments: BS Thesis
Ahtsham Manzoor, Dietmar Jannach
Conversational recommender systems have attracted immense attention recently.
The most recent approaches rely on neural models trained on recorded dialogs
between humans, implementing an end-to-end learning process. These systems are
commonly designed to generate responses given the user's utterances in natural
language. One main challenge is that these generated responses both have to be
appropriate for the given dialog context and must be grammatically and
semantically correct. An alternative to such generation-based approaches is to
retrieve responses from pre-recorded dialog data and to adapt them if needed.
Such retrieval-based approaches were successfully explored in the context of
general conversational systems, but have received limited attention in recent
years for CRS. In this work, we re-assess the potential of such approaches and
design and evaluate a novel technique for response retrieval and ranking. A
user study (N=90) revealed that the responses by our system were on average of
higher quality than those of two recent generation-based systems. We
furthermore found that the quality ranking of the two generation-based
approaches is not aligned with the results from the literature, which points to
open methodological questions. Overall, our research underlines that
retrieval-based approaches should be considered an alternative or complement to
language generation approaches.
Authors' comments: 29 pages, 5 figures, 7 tables
Sachin Pathiyan Cherumanal, Damiano Spina, Falk Scholer, W. Bruce Croft
Existing commercial search engines often struggle to represent different
perspectives of a search query. Argument retrieval systems address this
limitation of search engines and provide both positive (PRO) and negative (CON)
perspectives about a user's information need on a controversial topic (e.g.,
climate change). The effectiveness of such argument retrieval systems is
typically evaluated based on topical relevance and argument quality, without
taking into account the often differing number of documents shown for the
argument stances (PRO or CON). Therefore, systems may retrieve relevant
passages, but with a biased exposure of arguments. In this work, we analyze a
range of non-stochastic fairness-aware ranking and diversity metrics to
evaluate the extent to which argument stances are fairly exposed in argument
retrieval systems.
Using the official runs of the argument retrieval task Touch\'e at CLEF 2020,
as well as synthetic data to control the amount and order of argument stances
in the rankings, we show that systems with the best effectiveness in terms of
topical relevance are not necessarily the most fair or the most diverse in
terms of argument stance. The relationships we found between (un)fairness and
diversity metrics shed light on how to evaluate group fairness -- in addition
to topical relevance -- in argument retrieval settings.
Authors' comments: Accepted at CIKM 2021
Shi Yu, Zhenghao Liu, Chenyan Xiong, Tao Feng, Zhiyuan Liu
Dense retrieval (DR) has the potential to resolve the query understanding
challenge in conversational search by matching in the learned embedding space.
However, this adaptation is challenging due to DR models' extra needs for
supervision signals and the long-tail nature of conversational search. In this
paper, we present a Conversational Dense Retrieval system, ConvDR, that learns
contextualized embeddings for multi-turn conversational queries and retrieves
documents solely using embedding dot products. In addition, we grant ConvDR
few-shot ability using a teacher-student framework, where we employ an ad hoc
dense retriever as the teacher, inherit its document encodings, and learn a
student query encoder to mimic the teacher embeddings on oracle reformulated
queries. Our experiments on TREC CAsT and OR-QuAC demonstrate ConvDR's
effectiveness in both few-shot and fully-supervised settings. It outperforms
previous systems that operate in the sparse word space, matches the retrieval
accuracy of oracle query reformulations, and is also more efficient thanks to
its simplicity. Our analyses reveal that the advantages of ConvDR come from its
ability to capture informative context while ignoring the unrelated context in
previous conversation rounds. This makes ConvDR more effective as conversations
evolve while previous systems may get confused by the increased noise from
previous turns. Our code is publicly available at
https://github.com/thunlp/ConvDR.
Authors' comments: Accepted by SIGIR 2021
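Retrieval "solely using embedding dot products", as described above, can be sketched as follows. The embeddings here are small hand-written vectors and the function name `dense_retrieve` is hypothetical; in ConvDR the vectors would come from learned query and document encoders, which are not reproduced here.

```python
import numpy as np

def dense_retrieve(query_emb, doc_embs, top_k=2):
    """Rank documents by the dot product between query and document embeddings."""
    scores = doc_embs @ query_emb          # one score per document
    order = np.argsort(-scores)[:top_k]    # indices of the highest scores first
    return [(int(i), float(scores[i])) for i in order]

# Hypothetical 2-dimensional embeddings for three documents and one query.
doc_embs = np.array([[1.0, 0.0],
                     [0.6, 0.8],
                     [0.0, 1.0]])
query_emb = np.array([0.8, 0.6])
hits = dense_retrieve(query_emb, doc_embs)
```

Because document embeddings do not depend on the query, they can be computed once and indexed offline. Only the query encoder has to run at search time.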
Yifan Gao, Jingjing Li, Chien-Sheng Wu, Michael R. Lyu, Irwin King
In conversational machine reading, systems need to interpret natural language rules, answer high-level questions such as "May I qualify for VA health care benefits?", and ask follow-up clarification questions whose answer is necessary to answer the original question. However, existing works assume the rule text is provided for each user question, which neglects the essential retrieval step in real scenarios. In this work, we propose and investigate an open-retrieval setting of conversational machine reading. In the open-retrieval setting, the relevant rule texts are unknown, so a system needs to retrieve question-relevant evidence from a collection of rule texts, and answer users' high-level questions according to multiple retrieved rule texts in a conversational manner. We propose MUDERN, a Multi-passage Discourse-aware Entailment Reasoning Network which extracts conditions in the rule texts through discourse segmentation, conducts multi-passage entailment reasoning to answer user questions directly, or asks clarification follow-up questions to request more information. On our created OR-ShARC dataset, MUDERN achieves state-of-the-art performance, outperforming existing single-passage conversational machine reading models as well as a new multi-passage conversational machine reading baseline by a large margin. In addition, we conduct in-depth analyses to provide new insights into this new setting and our model.
Sidharth Gupta, Ivan Dokmanić
We address the phase retrieval problem with errors in the sensing vectors. A number of recent methods for phase retrieval are based on least squares (LS) formulations which assume errors in the quadratic measurements. We extend this approach to handle errors in the sensing vectors by adopting the total least squares (TLS) framework that is used in linear inverse problems with operator errors. We show how gradient descent and the specific geometry of the phase retrieval problem can be used to obtain a simple and efficient TLS solution. Additionally, we derive the gradients of the TLS and LS solutions with respect to the sensing vectors and measurements which enables us to calculate the solution errors. By analyzing these error expressions we determine conditions under which each method should outperform the other. We run simulations to demonstrate that our method can lead to more accurate solutions. We further demonstrate the effectiveness of our approach by performing phase retrieval experiments on real optical hardware which naturally contains both sensing vector and measurement errors.
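The least squares (LS) formulation that the abstract takes as its starting point can be sketched for the real-valued case as follows. This is the plain LS baseline with gradient descent, not the authors' TLS method; the real-valued setting, the step size, the problem sizes and the near-solution initialisation are all assumptions chosen for illustration.

```python
import numpy as np

def ls_loss(x, A, y):
    """LS objective: (1 / 4m) * sum_i ((a_i^T x)^2 - y_i)^2."""
    r = (A @ x) ** 2 - y
    return 0.25 * np.mean(r ** 2)

def ls_grad(x, A, y):
    """Gradient of ls_loss: (1 / m) * sum_i ((a_i^T x)^2 - y_i) (a_i^T x) a_i."""
    Ax = A @ x
    return A.T @ ((Ax ** 2 - y) * Ax) / len(y)

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 5))          # rows are the sensing vectors a_i
x_true = rng.standard_normal(5)
x_true /= np.linalg.norm(x_true)          # unit-norm signal
y = (A @ x_true) ** 2                     # noiseless quadratic measurements

# Start near the solution (assumption); the sign of x cannot be recovered.
x = x_true + 0.1 * rng.standard_normal(5)
loss_start = ls_loss(x, A, y)
for _ in range(200):
    x = x - 0.05 * ls_grad(x, A, y)
loss_end = ls_loss(x, A, y)
```

In this sketch the sensing matrix `A` is treated as exactly known. The TLS framework described above instead allows for errors in `A` as well as in `y`.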
Ruida Zhou, Chao Tian, Hua Sun, James Plank
In the conventional robust $T$-colluding private information retrieval (PIR) system, the user needs to retrieve one of the possible messages while keeping the identity of the requested message private from any $T$ colluding servers. Motivated by the possible heterogeneous privacy requirements for different messages, we consider the $(N, T_1:K_1, T_2:K_2)$ two-level PIR system with a total of $K_2$ messages in the system, where $T_1\geq T_2$ and $K_1\leq K_2$. Any one of the $K_1$ messages needs to be retrieved privately against $T_1$ colluding servers, and any one of the full set of $K_2$ messages needs to be retrieved privately against $T_2$ colluding servers. We obtain a lower bound to the capacity by proposing two novel coding schemes, namely the non-uniform successive cancellation scheme and the non-uniform block cancellation scheme. A capacity upper bound is also derived. The gap between the upper bound and the lower bounds is analyzed, and shown to vanish when $T_1=T_2$. Lastly, we show that the upper bound is in general not tight by providing a stronger bound for a special setting.
Wei Qu, Xiao-Yun Sun, Guan-Tie Deng
This paper concerns the study of reconstructing a function $f$ in the Hardy
space of the unit disc $\D$ from intensity measurements $|f(z)|,\ z\in \D.$
This is known as the problem of phase retrieval. We transform it into solving for the
corresponding outer and inner functions through the Nevanlinna factorization
theorem. The outer function is established based on the mechanical
quadrature method, while we use two different ways to find the zeros
of the Blaschke product, thereby computing the inner function under the assumption
that the singular inner function part is trivial. Then the concrete algorithms
and illustrative experiments follow. Finally, we give a sparse representation
of $f$ by introducing the unwinding adaptive Fourier decomposition.
Authors' comments: 10 pages, 25 figures
Carlos Alexandre Brasil, Miled Hassan Youssef Moussa, Reginaldo de Jesus Napolitano
We present an analytic method, based on the Bohmian equations for quantum
mechanics, for approaching the phase-retrieval problem in the following
formulation: By knowing the probability density $\left\vert
\psi\left(\overrightarrow{r},t\right)\right\vert ^{2}$ and the energy potential
$V\left(\overrightarrow{r},t\right)$ of a system, how can one determine the
complex state $\psi\left(\overrightarrow{r},t\right)$? We illustrate our method
with three classic examples involving Gaussian states, suggesting applications
to quantum state and Hamiltonian engineering.
Authors' comments: 9 pages
Eileen Gonzales, Ben Burningham, Jackie Faherty, Colleen Cleary, Channon Visscher, Mark Marley, Roxana Lupu, Richard Freedman
We present the distance-calibrated spectral energy distribution (SED) of the
d/sdL7 SDSS J14162408+1348263A (J1416A) and an updated SED for SDSS
J14162408+1348263B (J1416B). We also present the first retrieval analysis of
J1416A using the Brewster retrieval code base and the second retrieval of
J1416B. We find that the primary is best fit by a non-grey cloud opacity with a
power-law wavelength dependence, but the types of cloud parameterization are
indistinguishable. J1416B is best fit by a cloud-free model, consistent
with the results from Line et al. (2017). Most fundamental parameters derived
via SEDs and retrievals are consistent within 1 sigma for both J1416A and
J1416B. The exceptions include the radius of J1416A, where the retrieved radius
is smaller than the evolutionary model-based radius from the SED for the deck
cloud model, and the bolometric luminosity which is consistent within 2.5 sigma
for both cloud models. The pair's metallicity and Carbon-to-Oxygen (C/O) ratio
point towards formation and evolution as a system. By comparing the retrieved
alkali abundances while using two opacity models, we are able to evaluate how
the opacities behave for the L and T dwarf. Lastly, we find that relatively
small changes in composition can drive major observable differences for lower
temperature objects.
Authors' comments: 40 pages, 25 figures
Samarth Rawal, Chitta Baral
Information Retrieval (IR) is the task of obtaining pieces of data (such as documents or snippets of text) that are relevant to a particular query or need from a large repository of information. While a combination of traditional keyword- and modern BERT-based approaches has been shown to be effective in recent work, there are often nuances in identifying what information is "relevant" to a particular query, which can be difficult to properly capture using these systems. This work introduces the concept of a Multi-Perspective IR system, a novel methodology that combines multiple deep learning and traditional IR models to better predict the relevance of a query-sentence pair, along with a standardized framework for tuning this system. This work is evaluated on the BioASQ Biomedical IR + QA challenges.
Xukang Wei, H. Paul Urbach, Peter van der Walle, Wim M. J. Coene
We present a parameter retrieval method which combines ptychography and additional prior knowledge about the object. The proposed method is applied to two applications: (1) parameter retrieval of small particles from Fourier ptychographic dark field measurements; (2) parameter retrieval of a rectangle with real-space ptychography. The influence of Poisson noise is discussed in the second part of the paper. The Cram\'{e}r-Rao lower bound in both applications is computed and Monte Carlo analysis is used to verify the calculated lower bound. With the computation results we report the lower bound for various noise levels and the correlation of particles in Application 1. For Application 2 the correlation of the parameters of the rectangle is discussed.