Cheng Chang, Himanshu Rai, Satya Krishna Gorti, Junwei Ma, Chundi Liu, Guangwei Yu, Maksims Volkovs
We present our solution to Landmark Image Retrieval Challenge 2019. This challenge was based on the large Google Landmarks Dataset V2[9]. The goal was to retrieve all database images containing the same landmark for every provided query image. Our solution is a combination of global and local models to form an initial KNN graph. We then use a novel extension of the recently proposed graph traversal method EGT [1] referred to as semi-supervised EGT to refine the graph and retrieve better candidates.
Xinyu Hua, Zhe Hu, Lu Wang
Automatic argument generation is an appealing but challenging task. In this
paper, we study the specific problem of counter-argument generation, and
present a novel framework, CANDELA. It consists of a powerful retrieval system
and a novel two-step generation model, where a text planning decoder first
decides on the main talking points and a proper language style for each
sentence, then a content realization decoder reflects the decisions and
constructs an informative paragraph-level argument. Furthermore, our generation
model is empowered by a retrieval system indexed with 12 million articles
collected from Wikipedia and popular English news media, which provides access
to high-quality content with diversity. Automatic evaluation on a large-scale
dataset collected from Reddit shows that our model yields significantly higher
BLEU, ROUGE, and METEOR scores than the state-of-the-art and non-trivial
comparisons. Human evaluation further indicates that our system arguments are
more appropriate for refutation and richer in content.
Authors' comments: Accepted as a long paper to ACL 2019
Bor-Chun Chen, Zuxuan Wu, Larry S. Davis, Ser-Nam Lim
Detecting spliced images is one of the emerging challenges in computer vision. Unlike prior methods that focus on detecting low-level artifacts generated during the manipulation process, we use an image retrieval approach to tackle this problem. When given a spliced query image, our goal is to retrieve the original image from a database of authentic images. To achieve this goal, we propose representing an image by its constituent objects based on the intuition that the finest granularity of manipulations is oftentimes at the object-level. We introduce a framework, object embeddings for spliced image retrieval (OE-SIR), that utilizes modern object detectors to localize object regions. Each region is then embedded and collectively used to represent the image. Further, we propose a student-teacher training paradigm for learning discriminative embeddings within object regions to avoid expensive multiple forward passes. Detailed analysis of the efficacy of different feature embedding models is also provided in this study. Extensive experimental results show that the OE-SIR achieves state-of-the-art performance in spliced image retrieval.
Estefania Talavera, Petia Radeva, Nicolai Petkov
The availability and use of egocentric data are rapidly increasing due to the growing use of wearable cameras. Our aim is to study the effect (positive, neutral or negative) of egocentric images or events on an observer. Given egocentric photostreams capturing the wearer's days, we propose a method that aims to assign sentiment to events extracted from egocentric photostreams. Such moments can be candidates to retrieve according to their possibility of representing a positive experience for the camera's wearer. The proposed approach obtained a classification accuracy of 75% on the test set, with a deviation of 8%. Our model makes a step forward opening the door to sentiment recognition in egocentric photostreams.
Philippe Jaming, Karim Kellay, Rolando Perez
This study investigates the phase retrieval problem for wide-band signals. We solve the following problem: given f $\in$ L 2 (R) with Fourier transform in L 2 (R, e^{2c|x|} dx), we find all functions g $\in$ L 2 (R) with Fourier transform in L 2 (R, e^{2c|x| dx}), such that |f (x)| = |g(x)| for all x $\in$ R. To do so, we first translate the problem to functions in the Hardy spaces on the disc via a conformal bijection, and take advantage of the inner-outer factorization. We also consider the same problem with additional constraints involving some transforms of f and g, and determine if these constraints force uniqueness of the solution.
Çağatay Işıl, Figen S. Oktem, Aykut Koç
Classical phase retrieval problem is the recovery of a constrained image from
the magnitude of its Fourier transform. Although there are several well-known
phase retrieval algorithms including the hybrid input-output (HIO) method, the
reconstruction performance is generally sensitive to initialization and
measurement noise. Recently, deep neural networks (DNNs) have been shown to
provide state-of-the-art performance in solving several inverse problems such
as denoising, deconvolution, and superresolution. In this work, we develop a
phase retrieval algorithm that utilizes two DNNs together with the model-based
HIO method. First, a DNN is trained to remove the HIO artifacts and is used
iteratively with the HIO method to improve the reconstructions. After this
iterative phase, a second DNN is trained to remove the remaining artifacts.
Numerical results demonstrate the effectiveness of ourapproach, which has
little additional computational cost compared to the HIO method. Our approach
not only achieves state-of-the-art reconstruction performance but also is more
robust to different initialization and noise levels.
Authors' comments: 14 pages, 8 figures, published in Applied Optics (Vol. 58, Issue 20,
pp. 5422-5431 (2019))
Yinchuan Li, Xu Zhang, Zegang Ding, Xiaodong Wang
This paper concerns the problem of estimating multidimensional (MD) frequencies using prior knowledge of the signal spectral sparsity from partial time samples. In many applications, such as radar, wireless communications, and super-resolution imaging, some structural information about the signal spectrum might be known beforehand. Suppose that the frequencies lie in given intervals, the goal is to improve the frequency estimation performance by using the prior information. We study the MD Vandermonde decomposition of block Toeplitz matrices in which the frequencies are restricted to given intervals. We then propose to solve the frequency-selective atomic norm minimization by converting them into semidefinite program based on the MD Vandermonde decomposition. Numerical simulation results are presented to illustrate the good performance of the proposed method.
Bing Gao, Xinwei Sun, Yang Wang, Zhiqiang Xu
In this paper, we propose a new non-convex algorithm for solving the phase retrieval problem, i.e., the reconstruction of a signal $ \vx\in\H^n $ ($\H=\R$ or $\C$) from phaseless samples $ b_j=\abs{\langle \va_j, \vx\rangle } $, $ j=1,\ldots,m $. The proposed algorithm solves a new proposed model, perturbed amplitude-based model, for phase retrieval and is correspondingly named as {\em Perturbed Amplitude Flow} (PAF). We prove that PAF can recover $c\vx$ ($\abs{c} = 1$) under $\mathcal{O}(n)$ Gaussian random measurements (optimal order of measurements). Starting with a designed initial point, our PAF algorithm iteratively converges to the true solution at a linear rate for both real and complex signals. Besides, PAF algorithm needn't any truncation or re-weighted procedure, so it enjoys simplicity for implementation. The effectiveness and benefit of the proposed method are validated by both the simulation studies and the experiment of recovering natural images.
Liu Yang, Junjie Hu, Minghui Qiu, Chen Qu, Jianfeng Gao, W. Bruce Croft, Xiaodong Liu, Yelong Shen et al.
Intelligent personal assistant systems that are able to have multi-turn
conversations with human users are becoming increasingly popular. Most previous
research has been focused on using either retrieval-based or generation-based
methods to develop such systems. Retrieval-based methods have the advantage of
returning fluent and informative responses with great diversity. However, the
performance of the methods is limited by the size of the response repository.
On the other hand, generation-based methods can produce highly coherent
responses on any topics. But the generated responses are often generic and not
informative due to the lack of grounding knowledge. In this paper, we propose a
hybrid neural conversation model that combines the merits of both response
retrieval and generation methods. Experimental results on Twitter and
Foursquare data show that the proposed model outperforms both retrieval-based
methods and generation-based methods (including a recently proposed
knowledge-grounded neural conversation model) under both automatic evaluation
metrics and human evaluation. We hope that the findings in this study provide
new insights on how to integrate text retrieval and text generation models for
building conversation systems.
Authors' comments: Accepted as a Full Paper in CIKM 2019. 10 pages
Siddhant Arora, Andrew Yates
We consider algorithm selection in the context of ad-hoc information
retrieval. Given a query and a pair of retrieval methods, we propose a
meta-learner that predicts how to combine the methods' relevance scores into an
overall relevance score. Inspired by neural models' different properties with
regard to IR axioms, these predictions are based on features that quantify
axiom-related properties of the query and its top ranked documents. We conduct
an evaluation on TREC Web Track data and find that the meta-learner often
significantly improves over the individual methods. Finally, we conduct feature
and query weight analyses to investigate the meta-learner's behavior.
Authors' comments: Algorithm Selection and Meta-Learning in Information Retrieval
(AMIR'19) workshop at ECIR'19
Zhan Yang, Osolo Ian Raymond, WuQing Sun, Jun Long
Due to its fast retrieval and storage efficiency capabilities, hashing has
been widely used in nearest neighbor retrieval tasks. By using deep learning
based techniques, hashing can outperform non-learning based hashing technique
in many applications. However, we argue that the current deep learning based
hashing methods ignore some critical problems (e.g., the learned hash codes are
not discriminative due to the hashing methods being unable to discover rich
semantic information and the training strategy having difficulty optimizing the
discrete binary codes). In this paper, we propose a novel image hashing method,
termed as \textbf{\underline{A}}symmetric \textbf{\underline{D}}eep
\textbf{\underline{S}}emantic \textbf{\underline{Q}}uantization
(\textbf{ADSQ}). \textbf{ADSQ} is implemented using three stream frameworks,
which consist of one \emph{LabelNet} and two \emph{ImgNets}. The
\emph{LabelNet} leverages the power of three fully-connected layers, which are
used to capture rich semantic information between image pairs. For the two
\emph{ImgNets}, they each adopt the same convolutional neural network
structure, but with different weights (i.e., asymmetric convolutional neural
networks). The two \emph{ImgNets} are used to generate discriminative compact
hash codes. Specifically, the function of the \emph{LabelNet} is to capture
rich semantic information that is used to guide the two \emph{ImgNets} in
minimizing the gap between the real-continuous features and the discrete binary
codes. Furthermore, \textbf{ADSQ} can utilize the most critical semantic
information to guide the feature learning process and consider the consistency
of the common semantic space and Hamming space. Experimental results on three
benchmarks (i.e., CIFAR-10, NUS-WIDE, and ImageNet) demonstrate that the
proposed \textbf{ADSQ} can outperforms current state-of-the-art methods.
Authors' comments: Accepted to IEEE ACCESS. arXiv admin note: text overlap with
arXiv:1812.01404
Philipp Grohs, Martin Rathmair
In recent work [P. Grohs and M. Rathmair. Stable Gabor Phase Retrieval and Spectral Clustering. Communications on Pure and Applied Mathematics (2018)] the instabilities of the Gabor phase retrieval problem, i.e., the problem of reconstructing a function $f$ from its spectrogram $|\mathcal{G}f|$, where $$ \mathcal{G}f(x,y)=\int_{\mathbb{R}^d} f(t) e^{-\pi|t-x|^2} e^{-2\pi i t\cdot y} dt, \quad x,y\in \mathbb{R}^d, $$ have been completely classified in terms of the disconnectedness of the spectrogram. These findings, however, were crucially restricted to the onedimensional case ($d=1$) and therefore not relevant for many practical applications. In the present paper we not only generalize the aforementioned results to the multivariate case but also significantly improve on them. Our new results have comprehensive implications in various applications such as ptychography, a highly popular method in coherent diffraction imaging.
David A. Barmherzig, Ju Sun, T. J. Lane, Po-Nan Li, Emmanuel J. Candès
A general mathematical framework and recovery algorithm is presented for the
holographic phase retrieval problem. In this problem, which arises in
holographic coherent diffraction imaging, a "reference" portion of the signal
to be recovered via phase retrieval is a priori known from experimental design.
A generic formula is also derived for the expected recovery error when the
measurement data is corrupted by Poisson shot noise. This facilitates an
optimization perspective towards reference design and analysis. We employ this
optimization perspective towards quantifying the performance of various
reference choices.
Authors' comments: 27 pages, 10 figures
Karim Banawan, Sennur Ulukus
We consider the problem of private information retrieval (PIR) of a single message out of $K$ messages from $N$ non-colluding and non-replicated databases. Different from the majority of the existing literature, which considers the case of replicated databases where all databases store the same content in the form of all $K$ messages, here, we consider the case of non-replicated databases under a special non-replication structure where each database stores $M$ out of $K$ messages and each message is stored across $R$ different databases. This generates an $R$-regular graph structure for the storage system where the vertices of the graph are the messages and the edges are the databases. We derive a general upper bound for $M=2$ that depends on the graph structure. We then specialize the problem to storage systems described by two special types of graph structures: cyclic graphs and \emph{fully-connected graphs}. We prove that the PIR capacity for the case of cyclic graphs is $\frac{2}{K+1}$, and the PIR capacity for the case of fully-connected graphs is $\min\{\frac{2}{K},\frac{1}{2}\}$. To that end, we propose novel achievable schemes for both graph structures that are capacity-achieving. The central insight in both schemes is to introduce dependency in the queries submitted to databases that do not contain the desired message, such that the requests can be compressed. In both cases, the results show severe degradation in PIR capacity due to non-replication.
Jie Li, Rongrong Ji, Hong Liu, Xiaopeng Hong, Yue Gao, Qi Tian
Universal adversarial perturbations (UAPs), a.k.a. input-agnostic
perturbations, has been proved to exist and be able to fool cutting-edge deep
learning models on most of the data samples. Existing UAP methods mainly focus
on attacking image classification models. Nevertheless, little attention has
been paid to attacking image retrieval systems. In this paper, we make the
first attempt in attacking image retrieval systems. Concretely, image retrieval
attack is to make the retrieval system return irrelevant images to the query at
the top ranking list. It plays an important role to corrupt the neighbourhood
relationships among features in image retrieval attack. To this end, we propose
a novel method to generate retrieval-against UAP to break the neighbourhood
relationships of image features via degrading the corresponding ranking metric.
To expand the attack method to scenarios with varying input sizes or
untouchable network parameters, a multi-scale random resizing scheme and a
ranking distillation strategy are proposed. We evaluate the proposed method on
four widely-used image retrieval datasets, and report a significant performance
drop in terms of different metrics, such as mAP and mP@10. Finally, we test our
attack methods on the real-world visual search engine, i.e., Google Images,
which demonstrates the practical potentials of our methods.
Authors' comments: ICCV 2019
Daniel Gillick, Alessandro Presta, Gaurav Singh Tomar
Most text-based information retrieval (IR) systems index objects by words or phrases. These discrete systems have been augmented by models that use embeddings to measure similarity in continuous space. But continuous-space models are typically used just to re-rank the top candidates. We consider the problem of end-to-end continuous retrieval, where standard approximate nearest neighbor (ANN) search replaces the usual discrete inverted index, and rely entirely on distances between learned embeddings. By training simple models specifically for retrieval, with an appropriate model architecture, we improve on a discrete baseline by 8% and 26% (MAP) on two similar-question retrieval tasks. We also discuss the problem of evaluation for retrieval systems, and show how to modify existing pairwise similarity datasets for this purpose.
Frank Soboczenski, Michael D. Himes, Molly D. O'Beirne, Simone Zorzan, Atilim Gunes Baydin, Adam D. Cobb, Yarin Gal, Daniel Angerhausen et al.
Over the past decade, the study of extrasolar planets has evolved rapidly
from plain detection and identification to comprehensive categorization and
characterization of exoplanet systems and their atmospheres. Atmospheric
retrieval, the inverse modeling technique used to determine an exoplanetary
atmosphere's temperature structure and composition from an observed spectrum,
is both time-consuming and compute-intensive, requiring complex algorithms that
compare thousands to millions of atmospheric models to the observational data
to find the most probable values and associated uncertainties for each model
parameter. For rocky, terrestrial planets, the retrieved atmospheric
composition can give insight into the surface fluxes of gaseous species
necessary to maintain the stability of that atmosphere, which may in turn
provide insight into the geological and/or biological processes active on the
planet. These atmospheres contain many molecules, some of them biosignatures,
spectral fingerprints indicative of biological activity, which will become
observable with the next generation of telescopes. Runtimes of traditional
retrieval models scale with the number of model parameters, so as more
molecular species are considered, runtimes can become prohibitively long.
Recent advances in machine learning (ML) and computer vision offer new ways to
reduce the time to perform a retrieval by orders of magnitude, given a
sufficient data set to train with. Here we present an ML-based retrieval
framework called Intelligent exoplaNet Atmospheric RetrievAl (INARA) that
consists of a Bayesian deep learning model for retrieval and a data set of
3,000,000 synthetic rocky exoplanetary spectra generated using the NASA
Planetary Spectrum Generator. Our work represents the first ML retrieval model
for rocky, terrestrial exoplanets and the first synthetic data set of
terrestrial spectra generated at this scale.
Authors' comments: Third workshop on Bayesian Deep Learning (NeurIPS 2018), Montreal,
Canada
Julien Lavauzelle, Razane Tajeddine, Ragnar Freij-Hollanti, Camilla Hollanti
A private information retrieval (PIR) scheme allows a user to retrieve a file from a database without revealing any information on the file being requested. As of now, PIR schemes have been proposed for several kinds of storage systems, including replicated and MDS-coded data. In this paper, the problem of constructing a PIR scheme on regenerating codes is considered. A regenerating code is a storage code whose codewords are distributed among $n$ nodes, enabling efficient storage of files, as well as low-bandwidth retrieval of files and repair of nodes. In this work, a PIR scheme on regenerating codes is constructed, using the product-matrix (PM) framework of Rashmi, Shah and Kumar. Both the minimum-bandwidth (MBR) and minimum-storage (MSR) settings are considered, and the structure given by the PM framework is used in order to reduce the download communication complexity of our schemes.
Zhongdao Wang, Liang Zheng, Shengjin Wang
Feature fusion is a commonly used strategy in image retrieval tasks, which aggregates the matching responses of multiple visual features. Feasible sets of features can be either descriptors (SIFT, HSV) for an entire image or the same descriptor for different local parts (face, body). Ideally, the to-be-fused heterogeneous features are pre-assumed to be discriminative and complementary to each other. However, the effectiveness of different features varies dramatically according to different queries. That is to say, for some queries, a feature may be neither discriminative nor complementary to existing ones, while for other queries, the feature suffices. As a result, it is important to estimate the effectiveness of features in a query-adaptive manner. To this end, this article proposes a new late fusion scheme at the score level. We base our method on the observation that the sorted score curves contain patterns that describe their effectiveness. For example, an "L"-shaped curve indicates that the feature is discriminative while a gradually descending curve suggests a bad feature. As such, this paper introduces a query-adaptive late fusion pipeline. In the hand-crafted version, it can be an unsupervised approach to tasks like particular object retrieval. In the learning version, it can also be applied to supervised tasks like person recognition and pedestrian retrieval, based on a trainable neural module. Extensive experiments are conducted on two object retrieval datasets and one person recognition dataset. We show that our method is able to highlight the good features and suppress the bad ones, is resilient to distractor features, and achieves very competitive retrieval accuracy compared with the state of the art. In an additional person re-identification dataset, the application scope and limitation of the proposed method are studied.
Alvaro Henrique Chaim Correia, Jorge Luiz Moreira Silva, Thiago de Castro Martins, Fabio Gagliardi Cozman
Recurrent neural networks are now the state-of-the-art in natural language
processing because they can build rich contextual representations and process
texts of arbitrary length. However, recent developments on attention mechanisms
have equipped feedforward networks with similar capabilities, hence enabling
faster computations due to the increase in the number of operations that can be
parallelized. We explore this new type of architecture in the domain of
question-answering and propose a novel approach that we call Fully Attention
Based Information Retriever (FABIR). We show that FABIR achieves competitive
results in the Stanford Question Answering Dataset (SQuAD) while having fewer
parameters and being faster at both learning and inference than rival methods.
Authors' comments: Accepted for presentation at the International Joint Conference on
Neural Networks (IJCNN) 2018