Bernhard Kratzwald, Stefan Feuerriegel
State-of-the-art systems in deep question answering proceed as follows: (1)
an initial document retrieval selects relevant documents, which (2) are then
processed by a neural network in order to extract the final answer. Yet the
exact interplay between both components is poorly understood, especially
concerning the number of candidate documents that should be retrieved. We show
that choosing a static number of documents -- as used in prior research --
suffers from a noise-information trade-off and yields suboptimal results. As a
remedy, we propose an adaptive document retrieval model. This learns the
optimal candidate number for document retrieval, conditional on the size of the
corpus and the query. We report extensive experimental results showing that our
adaptive approach outperforms state-of-the-art methods on multiple benchmark
datasets, as well as in the context of corpora with variable sizes.
Authors' comments: EMNLP 2018
Behrooz Tahmasebi, Mohammad Ali Maddah-Ali, Seyed Abolfazl Motahari
The objective of a genome-wide association study (GWAS) is to associate subsequences of individuals' genomes to the observable characteristics called phenotypes (e.g., high blood pressure). Motivated by the GWAS problem, in this paper we introduce the information-theoretic problem of \emph{associated subsequence retrieval}, where a dataset of $N$ (possibly high-dimensional) sequences of length $G$, and their corresponding observable (binary) characteristics is given. The sequences are chosen independently and uniformly at random from $\mathcal{X}^G$, where $\mathcal{X}$ is a finite alphabet. The observable (binary) characteristic is only related to a specific unknown subsequence of length $L$ of the sequences, called \textit{associated subsequence}. For each sequence, if the associated subsequence of it belongs to a universal finite set, then it is more likely to display the observable characteristic (i.e., it is more likely that the observable characteristic is one). The goal is to retrieve the associated subsequence using a dataset of $N$ sequences and their observable characteristics. We demonstrate that as the parameters $N$, $G$, and $L$ grow, a threshold effect appears in the curve of probability of error versus the rate which is defined as ${Gh(L/G)}/{N}$, where $h(\cdot)$ is the binary entropy function. This effect allows us to define the capacity of associated subsequence retrieval. We develop an achievable scheme and a matching converse for this problem, and thus characterize its capacity in two scenarios: the zero-error-rate and the $\epsilon$-error-rate.
Bochen Guan, Hanrong Ye, Hong Liu, William A. Sethares
Estimation of the frequency and duration of logos in videos is important and
challenging in the advertisement industry as a way of estimating the impact of
ad purchases. Since logos occupy only a small area in the videos, the popular
methods of image retrieval could fail. This paper develops an algorithm called
Video Logo Retrieval (VLR), which is an image-to-video retrieval algorithm
based on the spatial distribution of local image descriptors that measure the
distance between the query image (the logo) and a collection of video images.
VLR uses local features to overcome the weakness of global feature-based models
such as convolutional neural networks (CNN). Meanwhile, VLR is flexible and
does not require training after setting some hyper-parameters. The performance
of VLR is evaluated on two challenging open benchmark tasks (SoccerNet and
Standford I2V), and compared with other state-of-the-art logo retrieval or
detection algorithms. Overall, VLR shows significantly higher accuracy compared
with the existing methods.
Authors' comments: Accepted by ICIP 20. Contact author: Bochen Guan (gbochen@wisc.edu)
Gilles Baechler, Miranda Kreković, Juri Ranieri, Amina Chebira, Yue M. Lu, Martin Vetterli
In a variety of fields, in particular those involving imaging and optics, we often measure signals whose phase is missing or has been irremediably distorted. Phase retrieval attempts to recover the phase information of a signal from the magnitude of its Fourier transform to enable the reconstruction of the original signal. Solving the phase retrieval problem is equivalent to recovering a signal from its auto-correlation function. In this paper, we assume the original signal to be sparse; this is a natural assumption in many applications, such as X-ray crystallography, speckle imaging and blind channel estimation. We propose an algorithm that resolves the phase retrieval problem in three stages: i) we leverage the finite rate of innovation sampling theory to super-resolve the auto-correlation function from a limited number of samples, ii) we design a greedy algorithm that identifies the locations of a sparse solution given the super-resolved auto-correlation function, iii) we recover the amplitudes of the atoms given their locations and the measured auto-correlation function. Unlike traditional approaches that recover a discrete approximation of the underlying signal, our algorithm estimates the signal on a continuous domain, which makes it the first of its kind. Along with the algorithm, we derive its performance bound with a theoretical analysis and propose a set of enhancements to improve its computational complexity and noise resilience. Finally, we demonstrate the benefits of the proposed method via a comparison against Charge Flipping, a notable algorithm in crystallography.
Rafael Valle
This paper describes computational methods for the visual display and analysis of music information. We provide a concise description of software, music descriptors and data visualization techniques commonly used in music information retrieval. Finally, we provide use cases where the described software, descriptors and visualizations are showcased.
Samuel Pinilla, Jorge Bacca, Henry Arguello
Phase retrieval (PR) is an ill-conditioned inverse problem which can be found in various science and engineering applications. Assuming sparse priority over the signal of interest, recent algorithms have been developed to solve the phase retrieval problem. Some examples include SparseAltMinPhase (SAMP), Sparse Wirtinger flow (SWF) and Sparse Truncated Amplitude flow (SPARTA). However, the optimization cost functions of the mentioned algorithms are non-convex and non-smooth. In order to fix the non-smoothness of the cost function, the SPARTA method uses truncation thresholds to calculate a truncated step update direction. In practice, the truncation procedure requires calculating more parameters to obtain a desired performance in the phase recovery. Therefore, this paper proposes an algorithm called SPRSF (Sparse Phase retrieval via Smoothing Function) to solve the sparse PR problem by introducing a smoothing function. SPRSF is an iterative algorithm where the update step is obtained by a hard thresholding over a gradient descent direction. Theoretical analyses show that the smoothing function uniformly approximates the non-convex and non-smooth sparse PR optimization problem. Moreover, SPRSF does not require the truncation procedure used in SPARTA. Numerical tests demonstrate that SPRSF performs better than state-of-the-art methods, especially when there is no knowledge about the sparsity $k$. In particular, SPRSF attains a higher mean recovery rate in comparison with SPARTA, SAMP and SWF methods, when the sparsity varies for the real and complex cases. Further, in terms of the sampling complexity, the SPRSF method outperforms its competitive alternatives.
Paul Hand, Oscar Leong, Vladislav Voroninski
The phase retrieval problem asks to recover a natural signal $y_0 \in
\mathbb{R}^n$ from $m$ quadratic observations, where $m$ is to be minimized. As
is common in many imaging problems, natural signals are considered sparse with
respect to a known basis, and the generic sparsity prior is enforced via
$\ell_1$ regularization. While successful in the realm of linear inverse
problems, such $\ell_1$ methods have encountered possibly fundamental
limitations, as no computationally efficient algorithm for phase retrieval of a
$k$-sparse signal has been proven to succeed with fewer than $O(k^2\log n)$
generic measurements, exceeding the theoretical optimum of $O(k \log n)$. In
this paper, we propose a novel framework for phase retrieval by 1) modeling
natural signals as being in the range of a deep generative neural network $G :
\mathbb{R}^k \rightarrow \mathbb{R}^n$ and 2) enforcing this prior directly by
optimizing an empirical risk objective over the domain of the generator. Our
formulation has provably favorable global geometry for gradient methods, as
soon as $m = O(kd^2\log n)$, where $d$ is the depth of the network.
Specifically, when suitable deterministic conditions on the generator and
measurement matrix are met, we construct a descent direction for any point
outside of a small neighborhood around the unique global minimizer and its
negative multiple, and show that such conditions hold with high probability
under Gaussian ensembles of multilayer fully-connected generator networks and
measurement matrices. This formulation for structured phase retrieval thus has
two advantages over sparsity based methods: 1) deep generative priors can more
tightly represent natural signals and 2) information theoretically optimal
sample complexity. We corroborate these results with experiments showing that
exploiting generative models in phase retrieval tasks outperforms sparse phase
retrieval methods.
Authors' comments: 28 pages, 5 figures
Manos Schinas, Symeon Papadopoulos, Yiannis Kompatsiaris, Pericles Mitkas
In the recent years, we have witnessed the rapid adoption of social media platforms, such as Twitter, Facebook and YouTube, and their use as part of the everyday life of billions of people worldwide. Given the habit of people to use these platforms to share thoughts, daily activities and experiences it is not surprising that the amount of user generated content has reached unprecedented levels, with a substantial part of that content being related to real-world events, i.e. actions or occurrences taking place at a certain time and location. Given the key role of events in our life, the task of annotating and organizing social media content around them is of crucial importance for ensuring real-time and future access to multimedia content about an event of interest. In this chapter, we present several research efforts from recent years that tackle two main problems: a) event detection and b) event-based media retrieval and summarization. Given archived collections or live streams of social media items, the purpose of event detection methods is to identify previously unknown events in the form of sets of items that describe them. In general, the events could be of any type, but there are also approaches aiming at events of specific type. Given a target event the goal of event summarization is first to identify relevant content and then to represent it in a concise way, selecting the most appealing and representative content.
Zehong Cao, Mukesh Prasad, M. Tanveer, Chin-Teng Lin
Prior studies have proposed methods to recover multi-channel electroencephalography (EEG) signal ensembles from their partially sampled entries. These methods depend on spatial scenarios, yet few approaches aiming to a temporal reconstruction with lower loss. The goal of this study is to retrieve the temporal EEG signals independently which was overlooked in data pre-processing. We considered EEG signals are impinging on tensor-based approach, named nonlinear Canonical Polyadic Decomposition (CPD). In this study, we collected EEG signals during a resting-state task. Then, we defined that the source signals are original EEG signals and the generated tensor is perturbed by Gaussian noise with a signal-to-noise ratio of 0 dB. The sources are separated using a basic non-negative CPD and the relative errors on the estimates of the factor matrices. Comparing the similarities between the source signals and their recovered versions, the results showed significantly high correlation over 95%. Our findings reveal the possibility of recoverable temporal signals in EEG applications.
Rawad Bitar, Salim El Rouayheb
We consider the problem of designing private information retrieval (PIR)
schemes on data of $m$ files replicated on $n$ servers that can possibly
collude. We focus on devising robust PIR schemes that can tolerate stragglers,
i.e., slow or unresponsive servers. In many settings, the number of stragglers
is not known a priori or may change with time. We define universally robust PIR
as schemes that achieve PIR capacity asymptotically in $m$ and simultaneously
for any number of stragglers up to a given threshold. We introduce
Staircase-PIR schemes and prove that they are universally robust. Towards that
end, we establish an equivalence between robust PIR and communication efficient
secret sharing.
Authors' comments: Extended version of a paper accepted in IEEE Information Theory
workshop 2018
Ali Ahmed, Alireza Aghasi, Paul Hand
We consider the task of recovering two real or complex $m$-vectors from phaseless Fourier measurements of their circular convolution. Our method is a novel convex relaxation that is based on a lifted matrix recovery formulation that allows a nontrivial convex relaxation of the bilinear measurements from convolution. We prove that if the two signals belong to known random subspaces of dimensions $k$ and $n$, then they can be recovered up to the inherent scaling ambiguity with $m >> (k+n) \log^2 m$ phaseless measurements. Our method provides the first theoretical recovery guarantee for this problem by a computationally efficient algorithm and does not require a solution estimate to be computed for initialization. Our proof is based Rademacher complexity estimates. Additionally, we provide an ADMM implementation of the method and provide numerical experiments that verify the theory.
Rima Alaifari, Philipp Grohs
The problem of reconstructing a function from the magnitudes of its frame
coefficients has recently been shown to be never uniformly stable in
infinite-dimensional spaces [5]. This result also holds for frames that are
possibly continuous [2]. On the other hand, the problem is always stable in
finite-dimensional settings. A prominent example of such a phase retrieval
problem is the recovery of a signal from the modulus of its Gabor transform. In
this paper, we study Gabor phase retrieval and ask how the stability degrades
on a natural family of finite-dimensional subspaces of the signal domain
$L^2(\mathbb{R})$. We prove that the stability constant scales at least
quadratically exponentially in the dimension of the subspaces. Our construction
also shows that typical priors such as sparsity or smoothness promoting
penalties do not constitute regularization terms for phase retrieval.
Authors' comments: 17 pages
Sean MacAvaney, Andrew Yates, Arman Cohan, Luca Soldaini, Kai Hui, Nazli Goharian, Ophir Frieder
Complex answer retrieval (CAR) is the process of retrieving answers to
questions that have multifaceted or nuanced answers. In this work, we present
two novel approaches for CAR based on the observation that question facets can
vary in utility: from structural (facets that can apply to many similar topics,
such as 'History') to topical (facets that are specific to the question's
topic, such as the 'Westward expansion' of the United States). We first explore
a way to incorporate facet utility into ranking models during query term score
combination. We then explore a general approach to reform the structure of
ranking models to aid in learning of facet utility in the query-document term
matching phase. When we use our techniques with a leading neural ranker on the
TREC CAR dataset, our methods rank first in the 2017 TREC CAR benchmark, and
yield up to 26% higher performance than the next best method.
Authors' comments: 4 pages; SIGIR 2018 Short Paper
Yale Song, Mohammad Soleymani
Traditional cross-modal retrieval assumes explicit association of concepts across modalities, where there is no ambiguity in how the concepts are linked to each other, e.g., when we do the image search with a query "dogs", we expect to see dog images. In this paper, we consider a different setting for cross-modal retrieval where data from different modalities are implicitly linked via concepts that must be inferred by high-level reasoning; we call this setting implicit concept association. To foster future research in this setting, we present a new dataset containing 47K pairs of animated GIFs and sentences crawled from the web, in which the GIFs depict physical or emotional reactions to the scenarios described in the text (called "reaction GIFs"). We report on a user study showing that, despite the presence of implicit concept association, humans are able to identify video-sentence pairs with matching concepts, suggesting the feasibility of our task. Furthermore, we propose a novel visual-semantic embedding network based on multiple instance learning. Unlike traditional approaches, we compute multiple embeddings from each modality, each representing different concepts, and measure their similarity by considering all possible combinations of visual-semantic embeddings in the framework of multiple instance learning. We evaluate our approach on two video-sentence datasets with explicit and implicit concept association and report competitive results compared to existing approaches on cross-modal retrieval.
Hieu Thao Nguyen, D. Russell Luke, Oleg Soloviev, Michel Verhaegen
For the first time, this paper investigates the phase retrieval problem with the assumption that the phase (of the complex signal) is sparse in contrast to the sparsity assumption on the signal itself as considered in the literature of sparse signal processing. The intended application of this new problem model, which will be conducted in a follow-up paper, is to practical phase retrieval problems where the aberration phase is sparse with respect to the orthogonal basis of Zernike polynomials. Such a problem is called sparse phase retrieval (SPR) problem in this paper. When the amplitude modulation at the exit pupil is uniform, a new scheme of sparsity regularization on phase is proposed to capture the sparsity property of the SPR problem. Based on this regularization scheme, we design and analyze an efficient solution method, named SROP algorithm, for solving SPR given only a single intensity point-spread-function image. The algorithm is a combination of the Gerchberg-Saxton algorithm with the newly proposed sparsity regularization on the phase. The latter regularization step is mathematically a rotation but with direction varying in iterations. Surprisingly, this rotation is shown to be a metric projection on an auxiliary set which is independent of iterations. As a consequence, SROP algorithm is proved to be the cyclic projections algorithm for solving a feasibility problem involving three auxiliary sets. Analyzing regularity properties of the latter auxiliary sets, we obtain convergence results for SROP algorithm based on recent convergence theory for the cyclic projections algorithm. Numerical results show clear effectiveness of the new regularization scheme for solving the SPR problem.
Sherry Xue-Ying Ni, Man-Chung Yue, Kam-Fung Cheung, Anthony Man-Cho So
The problem of phase retrieval is revisited and studied from a fresh perspective. In particular, we establish a connection between the phase retrieval problem and the sensor network localization problem, which allows us to utilize the vast theoretical and algorithmic literature on the latter to tackle the former. Leveraging this connection, we develop a two-stage algorithm for phase retrieval that can provably recover the desired signal. In both sparse and dense settings, our proposed algorithm improves upon prior approaches simultaneously in the number of required measurements for recovery and the reconstruction time. We present numerical results to corroborate our theory and to demonstrate the efficiency of the proposed algorithm. As a side result, we propose a new form of phase retrieval problem and connect it to the complex rigidity theory proposed by Gortler and Thurston.
Anh Nguyen, Thanh-Toan Do, Ian Reid, Darwin G. Caldwell, Nikos G. Tsagarakis
We address the problem of jointly learning vision and language to understand
the object in a fine-grained manner. The key idea of our approach is the use of
object descriptions to provide the detailed understanding of an object. Based
on this idea, we propose two new architectures to solve two related problems:
object captioning and natural language-based object retrieval. The goal of the
object captioning task is to simultaneously detect the object and generate its
associated description, while in the object retrieval task, the goal is to
localize an object given an input query. We demonstrate that both problems can
be solved effectively using hybrid end-to-end CNN-LSTM networks. The
experimental results on our new challenging dataset show that our methods
outperform recent methods by a fair margin, while providing a detailed
understanding of the object and having fast inference time. The source code
will be made available.
Authors' comments: 8 pages, 8 figures
Minh Pham, Penghang Yin, Arjun Rana, Stanley Osher, Jiawei Miao
In this paper, we report the development of the generalized proximal
smoothing (GPS) algorithm for phase retrieval of noisy data. GPS is a
optimization-based algorithm, in which we relax both the Fourier magnitudes and
object constraints. We relax the object constraint by introducing the
generalized Moreau-Yosida regularization and heat kernel smoothing. We are able
to readily handle the associated proximal mapping in the dual variable by using
an infimal convolution. We also relax the magnitude constraint into a least
squares fidelity term, whose proximal mapping is available. GPS alternatively
iterates between the two proximal mappings in primal and dual spaces,
respectively. Using both numerical simulation and experimental data, we show
that GPS algorithm consistently outperforms the classical phase retrieval
algorithms such as hybrid input-output (HIO) and oversampling smoothness (OSS),
in terms of the convergence speed, consistency of the phase retrieval, and
robustness to noise.
Authors' comments: 12 pages, 38 figures
Abhijit Suprem, Polo Chau
Traditional image recognition involves identifying the key object in a portrait-type image with a single object focus (ILSVRC, AlexNet, and VGG). More recent approaches consider dense image recognition - segmenting an image with appropriate bounding boxes and performing image recognition within these bounding boxes (Semantic segmentation). The Visual Genome dataset [5] is an attempt to bridge these various approaches to a cohesive dataset for each subtask - bounding box generation, image recognition, captioning, and a new operation: scene graph generation. Our focus is on using such scene graphs to perform graph search on image databases to holistically retrieve images based on a search criteria. We develop a method to store scene graphs and metadata in graph databases (using Neo4J) and to perform fast approximate retrieval of images based on a graph search query. We process more complex queries than single object search, e.g. "girl eating cake" retrieves images that contain the specified relation as well as variations.
Junfeng Su, Datang Xu, Guoxiang Huang
We investigate the memory of surface polariton (SP) via the electromagnetically induced transparency (EIT) of quantum emitters doped at the interface between a dielectric and a metamaterial. We show that, due to the strong mode confinement provided by the interface, the EIT effect can be largely enhanced; furthermore, the storage and retrieval of the SP can be realized by switching off and on of a control laser field; additionally, the efficiency and fidelity of the SP memory can be improved much by using a weak microwave field. The results reported here are helpful not only for enhancing the understanding of SP property but also for promising applications in quantum information processing and transmission.