Joshua J. C. Hayes, E. Kerins, S. Awiphan, I. McDonald, J. S. Morgan, P. Chuanraksasat, S. Komonjinda, N. Sanguansak et al.
One of the principal bottlenecks to atmosphere characterisation in the era of
all-sky surveys is the availability of fast, autonomous and robust atmospheric
retrieval methods. We present a new approach using unsupervised machine
learning to generate informed priors for retrieval of exoplanetary atmosphere
parameters from transmission spectra. We use principal component analysis (PCA)
to efficiently compress the information content of a library of transmission
spectra forward models generated using the PLATON package. We then apply a
$k$-means clustering algorithm in PCA space to segregate the library into
discrete classes. We show that our classifier is almost always able to
instantaneously place a previously unseen spectrum into the correct class, for
low-to-moderate spectral resolutions, $R$, in the range $30 \leq R \leq 300$ and noise
levels up to 10 per cent of the peak-to-trough spectrum amplitude. The
distribution of physical parameters for all members of the class therefore
provides an informed prior for standard retrieval methods such as nested
sampling. We benchmark our informed-prior approach against a standard
uniform-prior nested sampler, finding that our approach is up to a factor of two
faster, with negligible reduction in accuracy. We demonstrate the application
of this method to existing and near-future observatories, and show that it is
suitable for real-world application. Our general approach is not specific to
transmission spectroscopy and should be more widely applicable to cases that
involve repetitive fitting of trusted high-dimensional models to large data
catalogues, including beyond exoplanetary science.
Authors' comments: Accepted for publication in MNRAS
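The classification step described above can be illustrated with a minimal NumPy sketch: PCA (via SVD) compresses a library of model spectra, Lloyd's k-means clusters the library in PCA space, and an unseen spectrum inherits the parameter samples of its nearest cluster as an informed prior. Everything here is an assumption for illustration: the helper names (`pca_fit`, `kmeans`, `informed_prior`), the number of components and clusters, and the toy sinusoidal "spectra" that stand in for PLATON forward models.

```python
import numpy as np


def pca_fit(X, n_components):
    """Mean and leading principal axes of the rows of X (computed with an SVD)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, axes):
    return (X - mean) @ axes.T


def kmeans(Z, k, n_iter=50, seed=0):
    """Lloyd's algorithm in PCA space; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = Z[labels == j].mean(axis=0)
    return centroids, labels


def informed_prior(spectrum, mean, axes, centroids, labels, params):
    """Assign an unseen spectrum to the nearest class; return the class and its members' parameters."""
    z = pca_transform(spectrum[None, :], mean, axes)[0]
    cls = int(((centroids - z) ** 2).sum(axis=1).argmin())
    return cls, params[labels == cls]
```

The returned parameter samples would then be turned into prior ranges or densities for a nested sampler; how the authors perform that conversion is not shown here.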
Zhanghui Kuang, Yiming Gao, Guanbin Li, Ping Luo, Yimin Chen, Liang Lin, Wayne Zhang
Matching clothing images from customers and online shopping stores has rich
applications in e-commerce. Existing algorithms encode an image as a global
feature vector and perform retrieval with this global representation. However,
discriminative local information about clothes is submerged in the global
representation, resulting in sub-optimal performance. To address this issue, we
propose a novel Graph Reasoning Network (GRNet) on a Similarity Pyramid, which
learns similarities between a query and a gallery cloth by using both global
and local representations in multiple scales. The similarity pyramid is
represented by a similarity graph, where nodes represent similarities
between clothing components at different scales, and the final matching score
is obtained by message passing along edges. In GRNet, graph reasoning is solved
by training a graph convolutional network, enabling it to align salient clothing
components to improve clothing retrieval. To facilitate future research, we
introduce a new benchmark FindFashion, containing rich annotations of bounding
boxes, views, occlusions, and cropping. Extensive experiments show that GRNet
obtains new state-of-the-art results on two challenging benchmarks, e.g.,
pushing the top-1, top-20, and top-50 accuracies on DeepFashion to 26%, 64%,
and 75% (i.e., 4%, 10%, and 10% absolute improvements), outperforming
competitors by large margins. On FindFashion, GRNet achieves considerable
improvements on all empirical settings.
Authors' comments: ICCV 2019 (oral)
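A rough sketch of the similarity-pyramid idea follows. It assumes, hypothetically, that each node holds the element-wise squared difference between pooled query and gallery region features at one scale, that edges form a fully connected graph weighted by a softmax over node affinities, and that a single untrained graph-convolution step followed by mean pooling yields the matching score. GRNet's actual node, edge and training definitions may differ, and the weights below are random placeholders.

```python
import numpy as np


def region_features(fmap, grid):
    """Average-pool a (C, H, W) feature map over a grid x grid partition -> (grid*grid, C)."""
    _, h, w = fmap.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            block = fmap[:, i * h // grid:(i + 1) * h // grid, j * w // grid:(j + 1) * w // grid]
            feats.append(block.mean(axis=(1, 2)))
    return np.stack(feats)


def similarity_nodes(fq, fg, scales=(1, 2, 4)):
    """One node per region per scale: squared difference of query and gallery region features."""
    nodes = [(region_features(fq, s) - region_features(fg, s)) ** 2 for s in scales]
    return np.concatenate(nodes)


def gcn_score(nodes, w_proj, w_out):
    """One graph-convolution step over a fully connected similarity graph, then pooling."""
    h = nodes @ w_proj
    logits = h @ h.T
    adj = np.exp(logits - logits.max(axis=1, keepdims=True))
    adj /= adj.sum(axis=1, keepdims=True)  # row-normalised edge weights
    h = np.maximum(adj @ h, 0.0)  # message passing followed by ReLU
    return float(1.0 / (1.0 + np.exp(-(h.mean(axis=0) @ w_out))))
```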
Christian Joppi, Marco Godi, Andrea Giachetti, Fabio Pellacini, Marco Cristani
Capturing the essence of a textile image in a robust way is important to
retrieve it in a large repository, especially if it has been acquired in the
wild (by taking a photo of the textile of interest). In this paper we show that
a texel-based representation fits well with this task. In particular, we refer
to Texel-Att, a recent texel-based descriptor that has been shown to capture
fine-grained variations of a texture, for retrieval purposes. After a brief
explanation of Texel-Att, we will show in our experiments that this descriptor
is robust to distortions resulting from acquisitions in the wild by setting up
an experiment in which textures from the ElBa (an Element-Based texture
dataset) are artificially distorted and then used to retrieve the original
image. We compare our approach with existing descriptors using a simple ranking
framework based on distance functions. Results show that even under extreme
conditions (such as down-sampling by a factor of 10), our approach performs better
than the alternatives.
Authors' comments: ICIAP - International Conference on Image Analysis and Processing
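The ranking framework based on distance functions can be sketched as below. The descriptor vectors are placeholders (Texel-Att itself is not implemented), and pairing distorted query i with original gallery image i is an assumption about how the distortion experiment is laid out.

```python
import numpy as np


def rank_gallery(query_desc, gallery_descs, metric="euclidean"):
    """Return gallery indices sorted from most to least similar to the query."""
    if metric == "euclidean":
        dist = np.linalg.norm(gallery_descs - query_desc, axis=1)
    elif metric == "cosine":
        g = gallery_descs / np.linalg.norm(gallery_descs, axis=1, keepdims=True)
        q = query_desc / np.linalg.norm(query_desc)
        dist = 1.0 - g @ q
    else:
        raise ValueError(f"unknown metric: {metric}")
    return np.argsort(dist)


def top1_accuracy(query_descs, gallery_descs, metric="euclidean"):
    """Fraction of distorted queries whose original (gallery index == query index) ranks first."""
    hits = [rank_gallery(q, gallery_descs, metric)[0] == i for i, q in enumerate(query_descs)]
    return float(np.mean(hits))
```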
Giovanni Angelo Meles, Lele Zhang, Jan Thorbecke, Kees Wapenaar, Evert Slob
Seismic images provided by reverse time migration can be contaminated by
artefacts associated with the migration of multiples. Multiples can corrupt
seismic images, producing both false positives, i.e. by focusing energy at
unphysical interfaces, and false negatives, i.e. by destructively interfering
with primaries. Multiple prediction / primary synthesis methods are usually
designed to operate on point source gathers, and can therefore be
computationally demanding when large problems are considered. A computationally
attractive scheme that operates on plane-wave datasets is derived by adapting a
data-driven method for point-source gathers, based on convolutions and
cross-correlations of the reflection response with itself, to include
plane-wave concepts. As a result, the presented algorithm allows fully
data-driven synthesis of primary reflections associated with plane-wave source
responses. Once primary plane-wave responses are estimated, they are used for
multiple-free imaging via plane-wave reverse time migration. Numerical tests of
increasing complexity demonstrate the potential of the proposed algorithm to
produce multiple-free images from only a small number of plane-wave datasets.
Authors' comments: 20 pages, 8 figures
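As background for the plane-wave datasets mentioned above, a common way to obtain a plane-wave source gather from point-source gathers is delay-and-sum: each source gather is delayed by the ray parameter p times its source position, and the results are summed. The sketch below does this with FFT phase shifts. It is a generic illustration, not the authors' primary-synthesis algorithm, and the function name and array layout are assumptions. Because FFT delays are circular, real data would need zero-padding in time.

```python
import numpy as np


def plane_wave_synthesis(gathers, src_x, p, dt):
    """Delay-and-sum point-source gathers into one plane-wave source gather.

    gathers: (n_src, n_rec, n_t) array; src_x: source positions in m;
    p: ray parameter in s/m; dt: time sampling in s.
    """
    n_t = gathers.shape[-1]
    omega = 2.0 * np.pi * np.fft.rfftfreq(n_t, dt)
    spec = np.fft.rfft(gathers, axis=-1)
    # A delay tau multiplies the spectrum by exp(-i * omega * tau); here tau = p * x_s.
    shift = np.exp(-1j * omega[None, :] * (p * src_x)[:, None])  # (n_src, n_freq)
    summed = (spec * shift[:, None, :]).sum(axis=0)
    return np.fft.irfft(summed, n=n_t, axis=-1)
```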
Umut Özaydın, Theodoros Georgiou, Michael Lew
Feature detectors and descriptors have been successfully used for various
computer vision tasks, such as video object tracking and content-based image
retrieval. Many methods use image gradients in different stages of the
detection-description pipeline to describe local image structures. Recently,
some, or all, of these stages have been replaced by convolutional neural
networks (CNNs) in order to increase their performance. Detection is defined
as a selection problem, which makes it more challenging to implement as a CNN.
CNN detectors are therefore generally formulated as regressors that convert input
images to score maps, from which keypoints are selected with non-maximum suppression.
paper discusses and compares several recent methods that use CNNs for keypoint
detection. Experiments are performed on both the CNN-based approaches and a
selection of conventional methods. In addition to qualitative measures
defined on keypoints and descriptors, the bag-of-words (BoW) model is used to
implement an image retrieval application, in order to determine how the methods
perform in practice. The results show that each type of feature performs best in
different contexts.
Authors' comments: 5 pages, 3 figures, 3 tables, CBMI 2019
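The regressor-plus-selection scheme described above ends with non-maximum suppression on a score map. A minimal NumPy version might look like this, assuming a square neighbourhood of a hypothetical `radius` and keeping only strict local maxima above a threshold; the papers compared may use different window sizes or thresholds.

```python
import numpy as np


def nms_keypoints(score_map, radius=1, threshold=0.0, top_k=None):
    """Keep pixels that are strict maxima in their (2r+1)x(2r+1) window and exceed `threshold`.

    Returns (row, col) tuples sorted by descending score.
    """
    scores = np.asarray(score_map, dtype=float)
    h, w = scores.shape
    padded = np.pad(scores, radius, mode="constant", constant_values=-np.inf)
    neighbour_max = np.full_like(scores, -np.inf)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue  # compare against neighbours only, not the pixel itself
            shifted = padded[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            neighbour_max = np.maximum(neighbour_max, shifted)
    keep = (scores > neighbour_max) & (scores > threshold)
    rows, cols = np.nonzero(keep)
    order = np.argsort(-scores[rows, cols], kind="stable")
    if top_k is not None:
        order = order[:top_k]
    return list(zip(rows[order].tolist(), cols[order].tolist()))
```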
Akhilesh Sudhakar, Bhargav Upadhyay, Arjun Maheswaran
Text style transfer is the task of rewriting text that has certain stylistic
attributes into a different style while preserving its non-stylistic (content)
information. In this work we introduce the Generative Style Transformer (GST),
a new approach to rewriting sentences to a target style in the absence of
parallel style corpora. GST leverages both large unsupervised pre-trained
language models and the Transformer. GST is part of a
larger `Delete Retrieve Generate' framework, in which we also propose a novel
method of deleting style attributes from the source sentence by exploiting the
inner workings of the Transformer. Our models outperform state-of-the-art systems
across 5 datasets on sentiment, gender and political slant transfer. We also
propose the GLEU metric as an automatic evaluation metric for style transfer,
which we found to agree better with human ratings than the predominantly used
BLEU score.
Authors' comments: 11 pages, 6 Tables, 2 Figures, Accepted at 2019 Conference on
Empirical Methods in Natural Language Processing (EMNLP - 2019)
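For context on the "Delete" step, a simple baseline used in earlier Delete-Retrieve-Generate work scores each n-gram by a smoothed ratio of its frequency in the source-style corpus to its frequency in the other corpora, and deletes high-salience tokens. The sketch below uses unigrams only; the smoothing `lam` and threshold `gamma` values are assumptions, and this is not GST's attention-based deletion.

```python
from collections import Counter


def salience(corpora, target, lam=1.0):
    """Smoothed ratio of each unigram's count in the `target` style corpus to its count elsewhere."""
    counts = {style: Counter(w for sent in sents for w in sent.split())
              for style, sents in corpora.items()}
    vocab = set().union(*counts.values())
    return {
        w: (counts[target][w] + lam)
        / (sum(c[w] for style, c in counts.items() if style != target) + lam)
        for w in vocab
    }


def delete_attributes(sentence, sal, gamma=2.0):
    """Drop tokens whose salience for the source style reaches the threshold gamma."""
    return " ".join(w for w in sentence.split() if sal.get(w, 0.0) < gamma)
```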
Tong Guo, Huilin Gao
This paper studies the performance of BERT combined with a tree structure on the short-sentence ranking task. In a retrieval-based question answering system, we retrieve the question most similar to the query question by ranking all the questions in the dataset. Ranking all the sentences with a neural ranker requires scoring every sentence pair, which is very time-consuming. We therefore design a specific tree for searching and combine it with a deep model to solve this problem. We fine-tune BERT on the training data to obtain semantic vectors, or sentence embeddings, for the test data. We use all the sentence embeddings of the test data to build our tree based on k-means, and perform beam search at prediction time when given a sentence as the query. We conduct experiments on the semantic textual similarity dataset Quora Question Pairs, processing the dataset for sentence ranking. Experimental results show that our methods outperform the strong baseline. Our tree accelerates prediction by 500%-1000% without losing much ranking accuracy.
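The k-means tree and beam search described above can be sketched as follows, under several assumptions: random unit vectors stand in for fine-tuned BERT embeddings, children are scored by the dot product of their centroid with the query, and the branching factor, leaf size and beam width are illustrative values. The names `build_tree` and `beam_search` are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X))
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels


def build_tree(X, ids, k=4, leaf_size=8):
    """Recursively split embeddings with k-means; leaves hold sentence ids."""
    if len(ids) <= leaf_size:
        return {"leaf": ids}
    centroids, labels = kmeans(X[ids], k)
    if len(np.unique(labels)) < 2:  # degenerate split (e.g. duplicate embeddings): stop here
        return {"leaf": ids}
    children = [(centroids[j], build_tree(X, ids[labels == j], k, leaf_size))
                for j in range(len(centroids)) if np.any(labels == j)]
    return {"children": children}


def beam_search(tree, X, query, beam=2, top_n=5):
    """Descend the tree keeping the `beam` best children per level, then rank reached leaves."""
    frontier, candidates = [tree], []
    while frontier:
        expanded = []
        for node in frontier:
            if "leaf" in node:
                candidates.extend(node["leaf"].tolist())
            else:
                expanded.extend(node["children"])
        expanded.sort(key=lambda child: -float(child[0] @ query))
        frontier = [subtree for _, subtree in expanded[:beam]]
    candidates = np.array(candidates, dtype=int)
    scores = X[candidates] @ query
    return candidates[np.argsort(-scores)][:top_n].tolist()
```

With a very large beam the search visits every leaf and becomes exhaustive; a small beam trades some ranking accuracy for speed, which is the trade-off the abstract reports.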
Quentin Changeat, Luke Keyte, Ingo P Waldmann, Giovanna Tinetti
In current models used to interpret exoplanet atmospheric observations, the
planet mass is treated as a prior and is estimated independently with external
methods, such as RV or TTV techniques. This approach is necessary as available
spectroscopic data do not have sufficient wavelength coverage and/or SNR to
infer the planetary mass. We examine here the impact of mass uncertainties on
spectral retrieval analyses for a host of atmospheric scenarios. Our approach
is both analytical and numerical: we first use simple approximations to extract
analytically the influence of each parameter on the wavelength-dependent
transit depth. We then adopt a fully Bayesian retrieval model to quantify the
propagation of the mass uncertainty onto other atmospheric parameters. We found
that for clear-sky, gaseous atmospheres the posterior distributions are the
same when the mass is known or retrieved. The retrieved mass is very accurate,
with a precision of more than 10%, provided the wavelength coverage and S/N are
adequate. When opaque clouds are included in the simulations, the uncertainties
in the retrieved mass increase, especially for high-altitude clouds. However,
atmospheric parameters such as the temperature and trace-gas abundances are
unaffected by the knowledge of the mass. Secondary atmospheres are more
challenging because the main atmospheric component is unknown, which adds
degrees of freedom. For a broad wavelength range and adequate SNR, the
mass can still be retrieved accurately and precisely if clouds are not present,
and so are all the other atmospheric/planetary parameters. When clouds are
added, we find that the mass uncertainties may substantially impact the
retrieval of the mean molecular weight: an independent characterisation of the
mass would therefore be helpful to capture/confirm the main atmospheric
constituent.
Authors' comments: 19 pages, 12 figures, Accepted in ApJ
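The analytical link between mass and transit depth runs through the atmospheric scale height, $H = k_B T / (\mu m_u g)$ with $g = G M_p / R_p^2$; a common approximation puts the amplitude of a spectral feature at about $2 n H R_p / R_s^2$ for a feature spanning $n$ scale heights. The sketch below evaluates these expressions in SI units; $n = 5$ is an assumed typical value, and the code illustrates the scaling rather than the paper's retrieval model. It also makes the mass and mean-molecular-weight degeneracy explicit, since $H$ depends only on the combination $T R_p^2 / (\mu M_p)$.

```python
G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
K_B = 1.380649e-23     # Boltzmann constant, J/K
AMU = 1.66053907e-27   # atomic mass unit, kg


def surface_gravity(mass_kg, radius_m):
    return G * mass_kg / radius_m**2


def scale_height(temp_k, mu, mass_kg, radius_m):
    """Isothermal scale height H = k_B T / (mu m_u g), in metres."""
    return K_B * temp_k / (mu * AMU * surface_gravity(mass_kg, radius_m))


def feature_amplitude(temp_k, mu, mass_kg, rp_m, rs_m, n_scale_heights=5.0):
    """Approximate transit-depth change of a feature spanning n scale heights: 2 n H Rp / Rs^2."""
    return 2.0 * n_scale_heights * scale_height(temp_k, mu, mass_kg, rp_m) * rp_m / rs_m**2
```

For a Jupiter-mass, Jupiter-radius planet at 1000 K with $\mu = 2.3$ this gives $H$ of roughly 146 km and a feature of about 200 ppm around a Sun-sized star; halving the mass while doubling $\mu$ leaves both unchanged.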
Michael Wray, Diane Larlus, Gabriela Csurka, Dima Damen
We address the problem of cross-modal fine-grained action retrieval between
text and video. Cross-modal retrieval is commonly achieved through learning a
shared embedding space that can embed either modality. In this paper,
we propose to enrich the embedding by disentangling parts-of-speech (PoS) in
the accompanying captions. We build a separate multi-modal embedding space for
each PoS tag. The outputs of multiple PoS embeddings are then used as input to
an integrated multi-modal space, where we perform action retrieval. All
embeddings are trained jointly through a combination of PoS-aware and
PoS-agnostic losses. Our proposal enables learning specialised embedding spaces
that offer multiple views of the same embedded entities.
We report the first retrieval results on fine-grained actions for the
large-scale EPIC dataset, in a generalised zero-shot setting. Results show the
advantage of our approach for both video-to-text and text-to-video action
retrieval. We also demonstrate the benefit of disentangling the PoS for the
generic task of cross-modal video retrieval on the MSR-VTT dataset.
Authors' comments: Accepted for presentation at ICCV. Project Page:
https://mwray.github.io/FGAR
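A minimal sketch of PoS-specific embedding spaces scored with a bidirectional triplet (hinge) loss. As simplifying assumptions, each PoS tag gets its own pair of linear projections, the per-PoS cosine similarities are simply summed instead of being fed into a learned integrated space, and the projection matrices are random placeholders rather than trained weights.

```python
import numpy as np


def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def pos_similarity(video, text_by_pos, w_video, w_text):
    """Sum over PoS tags of cosine similarities computed in one embedding space per tag."""
    total = 0.0
    for pos, text in text_by_pos.items():
        v = l2_normalize(video @ w_video[pos])  # videos projected into the PoS-specific space
        t = l2_normalize(text @ w_text[pos])    # caption features for this PoS, projected likewise
        total = total + v @ t.T
    return total


def triplet_loss(sim, margin=0.2):
    """Bidirectional hinge loss; sim[i, i] are the matching video/caption pairs."""
    pos = np.diag(sim)
    cost_v2t = np.maximum(0.0, margin + sim - pos[:, None])
    cost_t2v = np.maximum(0.0, margin + sim - pos[None, :])
    np.fill_diagonal(cost_v2t, 0.0)
    np.fill_diagonal(cost_t2v, 0.0)
    return float(cost_v2t.sum() + cost_t2v.sum()) / sim.shape[0]
```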
Xinwei He, Tengteng Huang, Song Bai, Xiang Bai
How to aggregate multi-view representations of a 3D object into an
informative and discriminative one remains a key challenge for multi-view 3D
object retrieval. Existing methods either use view-wise pooling strategies
which neglect the spatial information across different views or employ
recurrent neural networks, which can be inefficient. To address
these issues, we propose an effective and efficient framework called View
N-gram Network (VNN). Inspired by n-gram models in natural language processing,
VNN divides the view sequence into a set of visual n-grams, which involve
overlapping consecutive view sub-sequences. By doing so, spatial information
across multiple views is captured, which helps to learn a discriminative global
embedding for each 3D object. Experiments on 3D shape retrieval benchmarks,
including ModelNet10, ModelNet40 and ShapeNetCore55 datasets, demonstrate the
superiority of our proposed method.
Authors' comments: The paper was accepted to ICCV 2019
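The visual n-gram idea can be sketched as sliding a window of n consecutive views over the ordered view sequence. The wrap-around indexing (treating the cameras as lying on a circle), the max pooling within each n-gram and the mean over n-grams are all assumptions standing in for VNN's learned layers.

```python
import numpy as np


def view_ngrams(view_feats, n=3, stride=1):
    """Group an ordered (V, C) view sequence into overlapping n-grams of consecutive views.

    Indices wrap around, assuming the V cameras lie on a circle around the object.
    """
    num_views = view_feats.shape[0]
    grams = [view_feats[[(start + i) % num_views for i in range(n)]]
             for start in range(0, num_views, stride)]
    return np.stack(grams)  # (num_grams, n, C)


def vnn_embedding(view_feats, n=3):
    """Global descriptor: max-pool inside each n-gram, then average over n-grams."""
    gram_feats = view_ngrams(view_feats, n).max(axis=1)
    return gram_feats.mean(axis=0)
```

With wrap-around n-grams, changing which view comes first does not change the global embedding, while the order of views inside each n-gram still determines which views are pooled together.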
Li Yuan, Tao Wang, Xiaopeng Zhang, Francis EH Tay, Zequn Jie, Wei Liu, Jiashi Feng
Existing data-dependent hashing methods usually learn hash functions from
pairwise or triplet data relationships, which only capture the data similarity
locally, and often suffer from low learning efficiency and low collision rate.
In this work, we propose a new global similarity metric, termed
central similarity, with which the hash codes of similar data pairs are
encouraged to approach a common center and those for dissimilar pairs to
converge to different centers, to improve hash learning efficiency and
retrieval accuracy. We principally formulate the computation of the proposed
central similarity metric by introducing a new concept, the hash
center, which refers to a set of data points scattered in the Hamming space with
a sufficient mutual distance between each other. We then provide an efficient
method to construct well separated hash centers by leveraging the Hadamard
matrix and Bernoulli distributions. Finally, we propose the Central Similarity
Quantization (CSQ) that optimizes the central similarity between data points
w.r.t. their hash centers instead of optimizing the local similarity. CSQ is
generic and applicable to both image and video hashing scenarios. Extensive
experiments on large-scale image and video retrieval tasks demonstrate that CSQ
can generate cohesive hash codes for similar data pairs and dispersed hash
codes for dissimilar pairs, achieving a noticeable boost in retrieval
performance, i.e. 3%-20% in mAP over the previous state of the art. The code
is at: https://github.com/yuanli2333/Hadamard-Matrix-for-hashing
Authors' comments: CVPR2020, Codes:
https://github.com/yuanli2333/Hadamard-Matrix-for-hashing
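A sketch of hash-center construction from a Sylvester-type Hadamard matrix: distinct rows of a $2^m \times 2^m$ Hadamard matrix differ in exactly half of their entries, so the rows of $H$ and $-H$ give up to $2 \cdot 2^m$ centers with pairwise Hamming distance at least $n_{bits}/2$. Falling back to Bernoulli(0.5) random codes when more centers are needed follows the abstract's description, but the exact procedure in the released code may differ; the function names are hypothetical.

```python
import numpy as np


def sylvester_hadamard(n):
    """n x n Hadamard matrix with +/-1 entries (Sylvester construction, n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h


def hash_centers(num_classes, n_bits, seed=0):
    """One +/-1 hash center per class."""
    h = sylvester_hadamard(n_bits)
    pool = np.concatenate([h, -h])  # 2 * n_bits candidate centers
    if num_classes <= len(pool):
        return pool[:num_classes]
    rng = np.random.default_rng(seed)
    # Bernoulli(0.5) fallback: random codes are far apart on average, but not guaranteed to be.
    return rng.choice(np.array([-1, 1]), size=(num_classes, n_bits))


def hamming(a, b):
    return int(np.sum(a != b))
```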
Basma El Amel Boussaha, Nicolas Hernandez, Christine Jacquin, Emmanuel Morin
Building dialogue systems that converse naturally with humans is an attractive and active research domain. Multiple systems are designed every day and several datasets are available, which makes it hard to keep an up-to-date view of the state of the art. In this work, we present the latest and most relevant retrieval-based dialogue systems and the datasets available to build and evaluate them. We discuss their limitations and provide insights and guidelines for future work.
Byungsoo Ko, Minchul Shin, Geonmo Gu, HeeJae Jun, Tae Kwan Lee, Youngjoon Kim
Many studies have been performed on metric learning, which has become a key ingredient in top-performing methods of instance-level image retrieval. Meanwhile, less attention has been paid to pre-processing and post-processing tricks that can significantly boost performance. Furthermore, we found that most previous studies used small-scale datasets to simplify processing. Because the behavior of a feature representation in a deep learning model depends on both domain and data, it is important to understand how models behave in large-scale environments when a proper combination of retrieval tricks is used. In this paper, we extensively analyze the effect of well-known pre-processing and post-processing tricks, and their combination, for large-scale image retrieval. We found that proper use of these tricks can significantly improve model performance without necessitating complex architectures or new loss functions, as confirmed by achieving a competitive result on the Google Landmark Retrieval Challenge 2019.
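The abstract does not list the individual tricks, so as a hedged illustration here are two widely used post-processing steps for global descriptors: database-side augmentation (DBA), which averages each database vector with its nearest neighbours, and average query expansion (AQE), which averages the query with its top results. Both assume L2-normalised descriptors compared by dot product; the `k` values are illustrative, and whether the paper used exactly these variants is not stated here.

```python
import numpy as np


def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def search(query, db):
    """Database indices sorted by descending dot-product similarity."""
    return np.argsort(-(db @ query))


def database_augmentation(db, k=2):
    """DBA: replace each vector by the mean of itself and its k nearest neighbours."""
    nn = np.argsort(-(db @ db.T), axis=1)[:, :k + 1]  # column 0 is normally the vector itself
    return l2_normalize(db[nn].mean(axis=1))


def average_query_expansion(query, db, k=2):
    """AQE: average the query with its top-k retrieved vectors, then re-normalise."""
    top = search(query, db)[:k]
    return l2_normalize(np.vstack([query[None, :], db[top]]).mean(axis=0))
```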
Mo Deng, Shuai Li, Alexandre Goy, Iksung Kang, George Barbastathis
The quality of inverse problem solutions obtained through deep learning [Barbastathis et al, 2019] is limited by the nature of the priors learned from examples presented during the training phase. In the case of quantitative phase retrieval [Sinha et al, 2017, Goy et al, 2019], in particular, spatial frequencies that are underrepresented in the training database, most often at the high band, tend to be suppressed in the reconstruction. Ad hoc solutions have been proposed, such as pre-amplifying the high spatial frequencies in the examples [Li et al, 2018]; however, while that strategy improves resolution, it also leads to high-frequency artifacts as well as low-frequency distortions in the reconstructions. Here, we present a new approach that learns separately how to handle the two frequency bands, low and high, and also learns how to synthesize these two bands into the full-band reconstructions. We show that this "learning to synthesize" (LS) method yields artifact-free phase reconstructions of high spatial resolution, and that it is also resilient to high-noise conditions, e.g. very low photon flux. In addition to the problem of quantitative phase retrieval, the LS method is applicable, in principle, to any inverse problem where the forward operator treats different frequency bands unevenly, i.e. is ill-posed.
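The frequency-band split that the LS method builds on can be sketched with a radial mask in the Fourier domain. The hard cutoff and the plain sum used to recombine the bands are assumptions: in the paper, handling the two bands and synthesizing them are both learned, and those networks are not shown.

```python
import numpy as np


def split_bands(image, cutoff=0.1):
    """Split a 2-D array into low- and high-spatial-frequency parts.

    cutoff is a radial spatial frequency in cycles per pixel; the two masks are
    complementary, so low + high reproduces the input exactly.
    """
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    low_mask = np.hypot(fx, fy) <= cutoff
    spectrum = np.fft.fft2(image)
    low = np.fft.ifft2(spectrum * low_mask).real
    high = np.fft.ifft2(spectrum * ~low_mask).real
    return low, high


def synthesize(low, high):
    """Placeholder for the learned synthesis step: here simply the sum of the two bands."""
    return low + high
```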
Zhihao Shen, Wan Du, Xi Zhao, Jianhua Zou
Retrieving similar trajectories from a large trajectory dataset is important
for a variety of applications, like transportation planning and mobility
analysis. Unlike previous works based on fine-grained GPS trajectories, this
paper investigates the feasibility of identifying similar trajectories from
cellular data observed by mobile infrastructure, which provides more
comprehensive coverage. To handle the large localization errors and low sample
rates of cellular data, we develop a holistic system, cellSim, which seamlessly
integrates map matching and similar trajectory search. A set of map matching
techniques is proposed to transform cell tower sequences into moving
trajectories on a road map by considering the unique features of cellular data,
like the dynamic density of cell towers and bidirectional roads. To further
improve the accuracy of similarity search, map matching outputs M trajectory
candidates of different confidence, and a new similarity measure scheme is
developed to process the map matching results. Meanwhile, M is dynamically
adapted to maintain a low false positive rate of the similarity search, and two
pruning schemes are proposed to minimize the computation overhead. Extensive
experiments on a large-scale dataset and real-world trajectories totalling 1701 km
reveal that cellSim provides high accuracy (precision of 62.4% and recall of
89.8%).
Authors' comments: This paper has been submitted to IEEE Transactions on Mobile
Computing
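The abstract does not give the similarity measure itself, so the following is a purely hypothetical illustration of how M map-matching candidates with different confidences could be combined: a confidence-weighted average of the Jaccard similarity between the road-segment sets of every candidate pair. The candidate format and function names are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of road-segment ids."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_similarity(cands_a, cands_b):
    """Confidence-weighted similarity between two map-matched trajectories.

    Each trajectory is a list of (confidence, [road_segment_id, ...]) candidates,
    e.g. the top-M outputs of a map matcher (hypothetical format).
    """
    total_a = sum(conf for conf, _ in cands_a)
    total_b = sum(conf for conf, _ in cands_b)
    score = sum(ca * cb * jaccard(sa, sb) for ca, sa in cands_a for cb, sb in cands_b)
    return score / (total_a * total_b)
```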
Zheng Liu, Yu Xing, Jianxun Lian, Defu Lian, Ziyao Li, Xing Xie
Candidate retrieval is a fundamental issue in recommendation systems. Given a user's recommendation request, relevant candidates need to be retrieved in real time for subsequent ranking operations. Because retrieval is conducted over a very large number of items, it has to be both precise and scalable so that high-quality candidates can be acquired within tolerable latency. Unfortunately, conventional methods trade precision for running efficiency, which leads to inferior retrieval quality. In contrast, deep learning-based approaches can be highly accurate in identifying relevant items, yet they are unsuitable for candidate retrieval because of their inherent limitations in scalability. In this work, a novel framework is proposed to address these challenges. The underlying intuition is to use a well-trained ranking model to supervise an efficient retrieval model, so that scalability and precision are unified. We have implemented our conceptual framework and evaluated it comprehensively, achieving promising results against representative baselines. Our work is undergoing anonymous review and will be released after the notification. If you are also interested in this problem, please feel free to contact us.
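Letting a well-trained ranking model supervise an efficient retrieval model resembles knowledge distillation. The sketch below, which is an assumption and not the authors' framework, fits dot-product user and item embeddings so that the softmax over student scores matches the softmax over hypothetical teacher ranking scores; retrieval then reduces to a fast nearest-neighbour search over item embeddings. Embedding size, learning rate and step count are illustrative.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(teacher_scores, U, V):
    """Mean over users of the CE between teacher and student softmax distributions over items."""
    target = softmax(teacher_scores)
    return float(-(target * np.log(softmax(U @ V.T))).sum(axis=1).mean())


def distill(teacher_scores, dim=8, lr=0.1, steps=300, seed=0):
    """Fit user embeddings U and item embeddings V so that softmax(U V^T) mimics the teacher."""
    n_users, n_items = teacher_scores.shape
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.normal(size=(n_users, dim))
    V = 0.1 * rng.normal(size=(n_items, dim))
    target = softmax(teacher_scores)
    for _ in range(steps):
        grad = (softmax(U @ V.T) - target) / n_users  # gradient of the mean CE w.r.t. the logits
        U, V = U - lr * grad @ V, V - lr * grad.T @ U
    return U, V


def retrieve(U, V, user, k):
    """Top-k items for a user by dot product; a production system would use an ANN index."""
    return np.argsort(-(V @ U[user]))[:k]
```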
Guoping Zhao, Mingyu Zhang, Jiajun Liu, Ji-Rong Wen
Studies show that Deep Neural Network (DNN)-based image classification models are vulnerable to maliciously constructed adversarial examples. However, little effort has been made to investigate how DNN-based image retrieval models are affected by such attacks. In this paper, we introduce Unsupervised Adversarial Attacks with Generative Adversarial Networks (UAA-GAN) to attack deep feature-based image retrieval systems. UAA-GAN is an unsupervised learning model that requires only a small amount of unlabeled data for training. Once trained, it produces query-specific perturbations for query images to form adversarial queries. The core idea is to ensure that the attached perturbation is barely perceptible to humans yet effective in pushing the query away from its original position in the deep feature space. UAA-GAN works with various application scenarios that are based on deep features, including image retrieval, person Re-ID and face search. Empirical results show that UAA-GAN cripples retrieval performance without significant visual changes in the query images. UAA-GAN-generated adversarial examples are less distinguishable because they tend to incorporate subtle perturbations in textured or salient areas of the images, such as key human body parts, dominant structural patterns/textures or edges, rather than in visually insignificant areas (e.g., background and sky). This tendency indicates that the model has indeed learned how to fool both image retrieval systems and human eyes.
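To illustrate the objective of pushing the query away from its original position in feature space, here is a projected sign-gradient (PGD-style) attack on a toy linear feature extractor under an L-infinity budget. This is a swapped-in technique: UAA-GAN learns a generator that produces perturbations in a single forward pass, which is not reproduced here, and the linear extractor, budget and step values are assumptions.

```python
import numpy as np


def feature_push_attack(x, W, eps=0.03, step=0.01, n_steps=20, seed=0):
    """Find x_adv with |x_adv - x|_inf <= eps that moves W @ x_adv away from W @ x.

    W is a toy linear feature extractor; pixel values are assumed to lie in [0, 1].
    """
    rng = np.random.default_rng(seed)
    delta = rng.uniform(-eps, eps, size=x.shape)  # random start: the gradient is zero at delta = 0
    for _ in range(n_steps):
        grad = 2.0 * W.T @ (W @ delta)  # gradient of ||W (x + delta) - W x||^2 w.r.t. delta
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)
        delta = np.clip(x + delta, 0.0, 1.0) - x  # keep the perturbed image in the valid range
    return x + delta
```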
Minchul Shin, Sanghyuk Park, Taeksoo Kim
With the growing demand for image-based search, many works have studied the
task of fashion instance-level image retrieval (FIR). Furthermore, recent
works have introduced the concept of fashion attribute manipulation (FAM), which
manipulates a specific attribute (e.g., color) of a fashion item while
maintaining the rest of the attributes (e.g., shape and pattern). In this way,
users can search not only "the same" items but also "similar" items with the
desired attributes. FAM is a challenging task in that the attributes are hard
to define, and the unique characteristics of a query are hard to preserve.
Although both FIR and FAM are important in real-life applications, most of the
previous studies have focused on only one of these problems. In this study, we
aim to achieve competitive performance on both FIR and FAM. To do so, we
propose a novel method that converts a query into a representation with the
desired attributes. We introduce a new idea of attribute manipulation at the
feature level, by matching the distribution of manipulated features with real
features. In this fashion, the attribute manipulation can be done independently
from learning a representation from the image. By introducing the feature-level
attribute manipulation, the previous methods for FIR can perform attribute
manipulation without sacrificing their retrieval performance.
Authors' comments: Accepted to BMVC 2019
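Matching the distribution of manipulated features to that of real features needs a distribution distance. As an assumption (the abstract does not name one), the sketch uses the squared maximum mean discrepancy (MMD) with an RBF kernel, together with a hypothetical manipulation that simply translates features along an attribute direction.

```python
import numpy as np


def rbf_mmd2(X, Y, sigma=1.0):
    """Biased estimate of the squared MMD between samples X (n, d) and Y (m, d), RBF kernel."""
    def kernel(A, B):
        sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dist / (2.0 * sigma**2))
    return float(kernel(X, X).mean() + kernel(Y, Y).mean() - 2.0 * kernel(X, Y).mean())


def manipulate(features, direction, strength=1.0):
    """Hypothetical feature-level attribute manipulation: translate along an attribute direction."""
    return features + strength * direction
```

In a learned setting the manipulation would be a network trained to minimise such a distance; the fixed translation here only shows what "matching the distribution" measures.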
Amin Ahmad, Noah Constant, Yinfei Yang, Daniel Cer
Popular QA benchmarks like SQuAD have driven progress on the task of identifying answer spans within a specific passage, with models now surpassing human performance. However, retrieving relevant answers from a huge corpus of documents is still a challenging problem, and places different requirements on the model architecture. There is growing interest in developing scalable answer retrieval models trained end-to-end, bypassing the typical document retrieval step. In this paper, we introduce Retrieval Question-Answering (ReQA), a benchmark for evaluating large-scale sentence-level answer retrieval models. We establish baselines using both neural encoding models as well as classical information retrieval techniques. We release our evaluation code to encourage further work on this challenging task.
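Sentence-level answer retrieval is typically scored with ranking metrics such as mean reciprocal rank (MRR) and precision at 1. The sketch below computes both from question and answer embeddings under dot-product scoring; it is not the released ReQA evaluation code, and resolving ties in favour of the gold answer is a simplifying assumption.

```python
import numpy as np


def reciprocal_ranks(question_emb, answer_emb, gold):
    """1/rank of each question's gold answer sentence under dot-product scoring."""
    scores = question_emb @ answer_emb.T
    ranks = [1 + int((scores[i] > scores[i, g]).sum()) for i, g in enumerate(gold)]
    return np.array([1.0 / r for r in ranks])


def mean_reciprocal_rank(question_emb, answer_emb, gold):
    return float(reciprocal_ranks(question_emb, answer_emb, gold).mean())


def precision_at_1(question_emb, answer_emb, gold):
    return float((reciprocal_ranks(question_emb, answer_emb, gold) == 1.0).mean())
```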
Xian Chen, Zhe-Feng Shen
Gravitational waves (GWs) encode important information about the mass of the
source. For binary black holes (BBHs), the templates used to retrieve
the masses are normally developed under the assumption of a vacuum environment.
However, theories suggest that some BBHs form in gas-rich environments. Here we
study the effect of hydrodynamic drag on the chirp signal of a stellar-mass BBH
and the impact on the measurement of the mass. Based on theoretical arguments,
we show that the waveform of a BBH in gas resembles that of a more massive BBH
residing in a vacuum. The effect is important for LISA sources but negligible
for LIGO/Virgo binaries. Furthermore, we carry out a matched-filtering search
for the best-fitting parameters. We find that the best-fit chirp mass could be
significantly greater than the real mass if the gas effect is not appropriately
accounted for. Our results have important implications for the future joint
observation of BBHs using both ground- and space-based detectors.
Authors' comments: 5 pages, 1 figure. This is a contribution to the conference
proceedings: Recent Progress in Relativistic Astrophysics, Fudan University,
China
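The mass bias can be illustrated with the leading-order (quadrupole) relation between chirp mass and frequency evolution, $\dot f = \frac{96}{5}\pi^{8/3} (G\mathcal{M}/c^3)^{5/3} f^{11/3}$, with $\mathcal{M} = (m_1 m_2)^{3/5}/(m_1+m_2)^{1/5}$. Inverting it shows that extra frequency drift, modelled here (as an assumption) as a fractional increase $\epsilon$ in $\dot f$ caused by gas drag, inflates the chirp mass a vacuum template would infer by a factor $(1+\epsilon)^{3/5}$. The paper's drag model and matched-filtering search are not reproduced.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8        # speed of light, m/s
M_SUN = 1.989e30   # solar mass, kg


def chirp_mass(m1, m2):
    """Chirp mass, in the same units as m1 and m2."""
    return (m1 * m2) ** 0.6 / (m1 + m2) ** 0.2


def fdot_vacuum(mc_kg, f_hz):
    """Leading-order GW frequency derivative (Hz/s) of a binary in vacuum."""
    return 96.0 / 5.0 * math.pi ** (8.0 / 3.0) * (G * mc_kg / C**3) ** (5.0 / 3.0) * f_hz ** (11.0 / 3.0)


def chirp_mass_from_fdot(f_hz, fdot):
    """Invert the vacuum relation: the chirp mass (kg) a vacuum template would infer."""
    return (C**3 / G) * (5.0 / 96.0 * math.pi ** (-8.0 / 3.0) * f_hz ** (-11.0 / 3.0) * fdot) ** 0.6
```

For example, a 10% excess in $\dot f$ yields an inferred chirp mass about 5.9% too large, since $1.1^{0.6} \approx 1.059$.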