Sean MacAvaney, Andrew Yates, Arman Cohan, Luca Soldaini, Kai Hui, Nazli Goharian, Ophir Frieder
Complex answer retrieval (CAR) is the process of retrieving answers to
questions that have multifaceted or nuanced answers. In this work, we present
two novel approaches for CAR based on the observation that question facets can
vary in utility: from structural (facets that can apply to many similar topics,
such as 'History') to topical (facets that are specific to the question's
topic, such as the 'Westward expansion' of the United States). We first explore
a way to incorporate facet utility into ranking models during query term score
combination. We then explore a general approach to reform the structure of
ranking models to aid in learning of facet utility in the query-document term
matching phase. When we use our techniques with a leading neural ranker on the
TREC CAR dataset, our methods rank first in the 2017 TREC CAR benchmark, and
yield up to 26% higher performance than the next best method.
Authors' comments: 4 pages; SIGIR 2018 Short Paper
Yale Song, Mohammad Soleymani
Traditional cross-modal retrieval assumes explicit association of concepts across modalities, where there is no ambiguity in how the concepts are linked to each other, e.g., when we do the image search with a query "dogs", we expect to see dog images. In this paper, we consider a different setting for cross-modal retrieval where data from different modalities are implicitly linked via concepts that must be inferred by high-level reasoning; we call this setting implicit concept association. To foster future research in this setting, we present a new dataset containing 47K pairs of animated GIFs and sentences crawled from the web, in which the GIFs depict physical or emotional reactions to the scenarios described in the text (called "reaction GIFs"). We report on a user study showing that, despite the presence of implicit concept association, humans are able to identify video-sentence pairs with matching concepts, suggesting the feasibility of our task. Furthermore, we propose a novel visual-semantic embedding network based on multiple instance learning. Unlike traditional approaches, we compute multiple embeddings from each modality, each representing different concepts, and measure their similarity by considering all possible combinations of visual-semantic embeddings in the framework of multiple instance learning. We evaluate our approach on two video-sentence datasets with explicit and implicit concept association and report competitive results compared to existing approaches on cross-modal retrieval.
Hieu Thao Nguyen, D. Russell Luke, Oleg Soloviev, Michel Verhaegen
For the first time, this paper investigates the phase retrieval problem with the assumption that the phase (of the complex signal) is sparse in contrast to the sparsity assumption on the signal itself as considered in the literature of sparse signal processing. The intended application of this new problem model, which will be conducted in a follow-up paper, is to practical phase retrieval problems where the aberration phase is sparse with respect to the orthogonal basis of Zernike polynomials. Such a problem is called sparse phase retrieval (SPR) problem in this paper. When the amplitude modulation at the exit pupil is uniform, a new scheme of sparsity regularization on phase is proposed to capture the sparsity property of the SPR problem. Based on this regularization scheme, we design and analyze an efficient solution method, named SROP algorithm, for solving SPR given only a single intensity point-spread-function image. The algorithm is a combination of the Gerchberg-Saxton algorithm with the newly proposed sparsity regularization on the phase. The latter regularization step is mathematically a rotation but with direction varying in iterations. Surprisingly, this rotation is shown to be a metric projection on an auxiliary set which is independent of iterations. As a consequence, SROP algorithm is proved to be the cyclic projections algorithm for solving a feasibility problem involving three auxiliary sets. Analyzing regularity properties of the latter auxiliary sets, we obtain convergence results for SROP algorithm based on recent convergence theory for the cyclic projections algorithm. Numerical results show clear effectiveness of the new regularization scheme for solving the SPR problem.
Sherry Xue-Ying Ni, Man-Chung Yue, Kam-Fung Cheung, Anthony Man-Cho So
The problem of phase retrieval is revisited and studied from a fresh perspective. In particular, we establish a connection between the phase retrieval problem and the sensor network localization problem, which allows us to utilize the vast theoretical and algorithmic literature on the latter to tackle the former. Leveraging this connection, we develop a two-stage algorithm for phase retrieval that can provably recover the desired signal. In both sparse and dense settings, our proposed algorithm improves upon prior approaches simultaneously in the number of required measurements for recovery and the reconstruction time. We present numerical results to corroborate our theory and to demonstrate the efficiency of the proposed algorithm. As a side result, we propose a new form of phase retrieval problem and connect it to the complex rigidity theory proposed by Gortler and Thurston.
Anh Nguyen, Thanh-Toan Do, Ian Reid, Darwin G. Caldwell, Nikos G. Tsagarakis
We address the problem of jointly learning vision and language to understand
the object in a fine-grained manner. The key idea of our approach is the use of
object descriptions to provide the detailed understanding of an object. Based
on this idea, we propose two new architectures to solve two related problems:
object captioning and natural language-based object retrieval. The goal of the
object captioning task is to simultaneously detect the object and generate its
associated description, while in the object retrieval task, the goal is to
localize an object given an input query. We demonstrate that both problems can
be solved effectively using hybrid end-to-end CNN-LSTM networks. The
experimental results on our new challenging dataset show that our methods
outperform recent methods by a fair margin, while providing a detailed
understanding of the object and having fast inference time. The source code
will be made available.
Authors' comments: 8 pages, 8 figures
Minh Pham, Penghang Yin, Arjun Rana, Stanley Osher, Jiawei Miao
In this paper, we report the development of the generalized proximal
smoothing (GPS) algorithm for phase retrieval of noisy data. GPS is a
optimization-based algorithm, in which we relax both the Fourier magnitudes and
object constraints. We relax the object constraint by introducing the
generalized Moreau-Yosida regularization and heat kernel smoothing. We are able
to readily handle the associated proximal mapping in the dual variable by using
an infimal convolution. We also relax the magnitude constraint into a least
squares fidelity term, whose proximal mapping is available. GPS alternatively
iterates between the two proximal mappings in primal and dual spaces,
respectively. Using both numerical simulation and experimental data, we show
that GPS algorithm consistently outperforms the classical phase retrieval
algorithms such as hybrid input-output (HIO) and oversampling smoothness (OSS),
in terms of the convergence speed, consistency of the phase retrieval, and
robustness to noise.
Authors' comments: 12 pages, 38 figures
Abhijit Suprem, Polo Chau
Traditional image recognition involves identifying the key object in a portrait-type image with a single object focus (ILSVRC, AlexNet, and VGG). More recent approaches consider dense image recognition - segmenting an image with appropriate bounding boxes and performing image recognition within these bounding boxes (Semantic segmentation). The Visual Genome dataset [5] is an attempt to bridge these various approaches to a cohesive dataset for each subtask - bounding box generation, image recognition, captioning, and a new operation: scene graph generation. Our focus is on using such scene graphs to perform graph search on image databases to holistically retrieve images based on a search criteria. We develop a method to store scene graphs and metadata in graph databases (using Neo4J) and to perform fast approximate retrieval of images based on a graph search query. We process more complex queries than single object search, e.g. "girl eating cake" retrieves images that contain the specified relation as well as variations.
Junfeng Su, Datang Xu, Guoxiang Huang
We investigate the memory of surface polariton (SP) via the electromagnetically induced transparency (EIT) of quantum emitters doped at the interface between a dielectric and a metamaterial. We show that, due to the strong mode confinement provided by the interface, the EIT effect can be largely enhanced; furthermore, the storage and retrieval of the SP can be realized by switching off and on of a control laser field; additionally, the efficiency and fidelity of the SP memory can be improved much by using a weak microwave field. The results reported here are helpful not only for enhancing the understanding of SP property but also for promising applications in quantum information processing and transmission.
Dayong Tian
In social networks, heterogeneous multimedia data correlate to each other, such as videos and their corresponding tags in YouTube and image-text pairs in Facebook. Nearest neighbor retrieval across multiple modalities on large data sets becomes a hot yet challenging problem. Hashing is expected to be an efficient solution, since it represents data as binary codes. As the bit-wise XOR operations can be fast handled, the retrieval time is greatly reduced. Few existing multimodal hashing methods consider the correlation among hashing bits. The correlation has negative impact on hashing codes. When the hashing code length becomes longer, the retrieval performance improvement becomes slower. In this paper, we propose a minimum correlation regularization (MCR) for multimodal hashing. First, the sigmoid function is used to embed the data matrices. Then, the MCR is applied on the output of sigmoid function. As the output of sigmoid function approximates a binary code matrix, the proposed MCR can efficiently decorrelate the hashing codes. Experiments show the superiority of the proposed method becomes greater as the code length increases.
Shuo Zhang, Krisztian Balog
We introduce and address the problem of ad hoc table retrieval: answering a
keyword query with a ranked list of tables. This task is not only interesting
on its own account, but is also being used as a core component in many other
table-based information access scenarios, such as table completion or table
mining. The main novel contribution of this work is a method for performing
semantic matching between queries and tables. Specifically, we (i) represent
queries and tables in multiple semantic spaces (both discrete sparse and
continuous dense vector representations) and (ii) introduce various similarity
measures for matching those semantic representations. We consider all possible
combinations of semantic representations and similarity measures and use these
as features in a supervised learning model. Using a purpose-built test
collection based on Wikipedia tables, we demonstrate significant and
substantial improvements over a state-of-the-art baseline.
Authors' comments: The web conference 2018 (WWW'18)
Vladimir Katkovnik, Karen Egiazarian
The phase retrieval from multi-frequency intensity (power) observations is
considered. The object to be reconstructed is complex-valued. A novel algorithm
is presented that accomplishes both the object phase (absolute phase) retrieval
and denoising for Poissonian and Gaussian measurements. The algorithm is
derived from the maximum likelihood formulation with Block Matching 3D (BM3D)
sparsity priors. These priors result in two filtering: one is in the complex
domain for complex-valued multi-frequency object images and another one in the
real domain for the object phase. The algorithm is iterative with alternating
projections between the object and measurement variables. The simulation
experiments are produced for Fourier transform image formation and random phase
modulations of the object, then the observations are random object diffraction
patterns. The results demonstrate the success of the algorithm for
reconstruction of the complex phase objects with the high-accuracy performance
even for very noisy data.
Authors' comments: 5 pages, 3 figures
Tanya Piplani, David Bamman
Most of the internet today is composed of digital media that includes videos
and images. With pixels becoming the currency in which most transactions happen
on the internet, it is becoming increasingly important to have a way of
browsing through this ocean of information with relative ease. YouTube has 400
hours of video uploaded every minute and many million images are browsed on
Instagram, Facebook, etc. Inspired by recent advances in the field of deep
learning and success that it has gained on various problems like image
captioning and, machine translation , word2vec , skip thoughts, etc, we present
DeepSeek a natural language processing based deep learning model that allows
users to enter a description of the kind of images that they want to search,
and in response the system retrieves all the images that semantically and
contextually relate to the query. Two approaches are described in the following
sections.
Authors' comments: arXiv admin note: text overlap with arXiv:1706.06064 by other authors
Zhuoran Yang, Lin F. Yang, Ethan X. Fang, Tuo Zhao, Zhaoran Wang, Matey Neykov
Existing nonconvex statistical optimization theory and methods crucially rely
on the correct specification of the underlying "true" statistical models. To
address this issue, we take a first step towards taming model misspecification
by studying the high-dimensional sparse phase retrieval problem with
misspecified link functions. In particular, we propose a simple variant of the
thresholded Wirtinger flow algorithm that, given a proper initialization,
linearly converges to an estimator with optimal statistical accuracy for a
broad family of unknown link functions. We further provide extensive numerical
experiments to support our theoretical findings.
Authors' comments: 56 pages
Zhangjie Cao, Mingsheng Long, Chao Huang, Jianmin Wang
Hashing is widely applied to large-scale image retrieval due to the storage and retrieval efficiency. Existing work on deep hashing assumes that the database in the target domain is identically distributed with the training set in the source domain. This paper relaxes this assumption to a transfer retrieval setting, which allows the database and the training set to come from different but relevant domains. However, the transfer retrieval setting will introduce two technical difficulties: first, the hash model trained on the source domain cannot work well on the target domain due to the large distribution gap; second, the domain gap makes it difficult to concentrate the database points to be within a small Hamming ball. As a consequence, transfer retrieval performance within Hamming Radius 2 degrades significantly in existing hashing methods. This paper presents Transfer Adversarial Hashing (TAH), a new hybrid deep architecture that incorporates a pairwise $t$-distribution cross-entropy loss to learn concentrated hash codes and an adversarial network to align the data distributions between the source and target domains. TAH can generate compact transfer hash codes for efficient image retrieval on both source and target domains. Comprehensive experiments validate that TAH yields state of the art Hamming space retrieval performance on standard datasets.
Liang Zhang, Gang Wang, Georgios B. Giannakis, Jie Chen
The problem of reconstructing a sparse signal vector from magnitude-only measurements (a.k.a., compressive phase retrieval), emerges naturally in diverse applications, but it is NP-hard in general. Building on recent advances in nonconvex optimization, this paper puts forth a new algorithm that is termed compressive reweighted amplitude flow and abbreviated as CRAF, for compressive phase retrieval. Specifically, CRAF operates in two stages. The first stage seeks a sparse initial guess via a new spectral procedure. In the second stage, CRAF implements a few hard thresholding based iterations using reweighted gradients. When there are sufficient measurements, CRAF provably recovers the underlying signal vector exactly with high probability under suitable conditions. Moreover, its sample complexity coincides with that of the state-of-the-art procedures. Finally, substantial simulated tests showcase remarkable performance of the new spectral initialization, as well as improved exact recovery relative to competing alternatives.
Qing Qu, Yuqian Zhang, Yonina C. Eldar, John Wright
We study the convolutional phase retrieval problem, of recovering an unknown
signal $\mathbf x \in \mathbb C^n $ from $m$ measurements consisting of the
magnitude of its cyclic convolution with a given kernel $\mathbf a \in \mathbb
C^m $. This model is motivated by applications such as channel estimation,
optics, and underwater acoustic communication, where the signal of interest is
acted on by a given channel/filter, and phase information is difficult or
impossible to acquire. We show that when $\mathbf a$ is random and the number
of observations $m$ is sufficiently large, with high probability $\mathbf x$
can be efficiently recovered up to a global phase shift using a combination of
spectral initialization and generalized gradient descent. The main challenge is
coping with dependencies in the measurement operator. We overcome this
challenge by using ideas from decoupling theory, suprema of chaos processes and
the restricted isometry property of random circulant matrices, and recent
analysis of alternating minimization methods.
Authors' comments: 64 pages , 9 figures, appeared in NeurIPS 2017. Accepted at IEEE
Transactions on Information Theory. This is the final (minor) update: fixed
typos and grammar issues
Aniruddha Tammewar, Monik Pamecha, Chirag Jain, Apurva Nagvenkar, Krupal Modi
In this paper, we present a hybrid model that combines a neural
conversational model and a rule-based graph dialogue system that assists users
in scheduling reminders through a chat conversation. The graph based system has
high precision and provides a grammatically accurate response but has a low
recall. The neural conversation model can cater to a variety of requests, as it
generates the responses word by word as opposed to using canned responses. The
hybrid system shows significant improvements over the existing baseline system
of rule based approach and caters to complex queries with a domain-restricted
neural model. Restricting the conversation topic and combination of graph based
retrieval system with a neural generative model makes the final system robust
enough for a real world application.
Authors' comments: DEEPDIAL-18, AAAI-2018
Binanda Sengupta, Sushmita Ruj
Cloud servers offer data outsourcing facility to their clients. A client
outsources her data without having any copy at her end. Therefore, she needs a
guarantee that her data are not modified by the server which may be malicious.
Data auditing is performed on the outsourced data to resolve this issue.
Moreover, the client may want all her data to be stored untampered. In this
chapter, we describe proofs of retrievability (POR) that convince the client
about the integrity of all her data.
Authors' comments: A version has been published as a book chapter in Guide to Security
Assurance for Cloud Computing (Springer International Publishing Switzerland
2015)
Daniel Heestermans Svendsen, Luca Martino, Manuel Campos-Taberner, Francisco Javier García-Haro, Gustau Camps-Valls
Solving inverse problems is central to geosciences and remote sensing.
Radiative transfer models (RTMs) represent mathematically the physical laws
which govern the phenomena in remote sensing applications (forward models). The
numerical inversion of the RTM equations is a challenging and computationally
demanding problem, and for this reason, often the application of a nonlinear
statistical regression is preferred. In general, regression models predict the
biophysical parameter of interest from the corresponding received radiance.
However, this approach does not employ the physical information encoded in the
RTMs. An alternative strategy, which attempts to include the physical
knowledge, consists in learning a regression model trained using data simulated
by an RTM code. In this work, we introduce a nonlinear nonparametric regression
model which combines the benefits of the two aforementioned approaches. The
inversion is performed taking into account jointly both real observations and
RTM-simulated data. The proposed Joint Gaussian Process (JGP) provides a solid
framework for exploiting the regularities between the two types of data. The
JGP automatically detects the relative quality of the simulated and real data,
and combines them accordingly. This occurs by learning an additional
hyper-parameter w.r.t. a standard GP model, and fitting parameters through
maximizing the pseudo-likelihood of the real observations. The resulting scheme
is both simple and robust, i.e., capable of adapting to different scenarios.
The advantages of the JGP method compared to benchmark strategies are shown
considering RTM-simulated and real observations in different experiments.
Specifically, we consider leaf area index (LAI) retrieval from Landsat data
combined with simulated data generated by the PROSAIL model.
Authors' comments: 21 pages single column, Accepted for publication in IEEE Transactions
on Geoscience and Remote Sensing
Damek Davis, Dmitriy Drusvyatskiy, Courtney Paquette
We consider a popular nonsmooth formulation of the real phase retrieval
problem. We show that under standard statistical assumptions, a simple
subgradient method converges linearly when initialized within a constant
relative distance of an optimal solution. Seeking to understand the
distribution of the stationary points of the problem, we complete the paper by
proving that as the number of Gaussian measurements increases, the stationary
points converge to a codimension two set, at a controlled rate. Experiments on
image recovery problems illustrate the developed algorithm and theory.
Authors' comments: 42 Pages, 15 figures