Maxim Pisov, Gleb Makarchuk, Valery Kostjuchenko, Alexandra Dalechina, Andrey Golanov, Mikhail Belyaev
Classification-based image retrieval systems are built by training convolutional neural networks (CNNs) on a relevant classification problem and using the distance in the resulting feature space as a similarity metric. However, in practical applications, it is often desirable to have representations which take into account several aspects of the data (e.g., brain tumor type and its localization). In our work, we extend the classification-based approach with multitask learning: we train a CNN on brain MRI scans with heterogeneous labels and implement a corresponding tumor image retrieval system. We validate our approach on brain tumor data which contains information about tumor types, shapes and localization. We show that our method allows us to build representations that contain more relevant information about tumors than single-task classification-based approaches.
Meng Huang, Ming-Jun Lai, Abraham Varghese, Zhiqiang Xu
In this paper, we develop a new computational approach which is based on
minimizing the difference of two convex functionals (DC) to solve a broader
class of phase retrieval problems. The approach splits a standard nonlinear
least squares minimizing function associated with the phase retrieval problem
into the difference of two convex functions and then solves a sequence of
convex minimization sub-problems. For each subproblem, the Nesterov's
accelerated gradient descent algorithm or the Barzilai-Borwein (BB) algorithm
is used. In the setting of sparse phase retrieval, a standard $\ell_1$ norm
term is added into the minimization mentioned above. The subproblem is
approximated by a proximal gradient method which is solved by the
shrinkage-threshold technique directly without iterations. In addition, a
modified Attouch-Peypouquet technique is used to accelerate the iterative
computation. These lead to more effective algorithms than the Wirtinger flow
(WF) algorithm and the Gauss-Newton (GN) algorithm and etc.. A convergence
analysis of both DC based algorithms shows that the iterative solutions is
convergent linearly to a critical point and will be closer to a global
minimizer than the given initial starting point. Our study is a deterministic
analysis while the study for the Wirtinger flow (WF) algorithm and its
variants, the Gauss-Newton (GN) algorithm, the trust region algorithm is based
on the probability analysis. In particular, the DC based algorithms are able to
retrieve solutions using a number $m$ of measurements which is about twice of
the number $n$ of entries in the solution with high frequency of successes.
When $m\approx n$, the $\ell_1$ DC based algorithm is able to retrieve sparse
signals.
Authors' comments: 28 pages
Razane Tajeddine, Antonia Wachter-Zeh, Camilla Hollanti
In this paper, the problem of providing privacy to users requesting data over a network from a distributed storage system (DSS) is considered. The DSS, which is considered as the multi-terminal destination of the network from the user's perspective, is encoded by a maximum rank distance (MRD) code to store the data on these multiple servers. A private information retrieval (PIR) scheme ensures that a user can request a file without revealing any information on which file is being requested to any of the servers. In this paper, a novel PIR scheme is proposed, allowing the user to recover a file from a storage system with low communication cost, while allowing some servers in the system to collude in the quest of revealing the identity of the requested file. The network is modeled as a random linear network, i.e., all nodes of the network forward random (unknown) linear combinations of incoming packets. Both error-free and erroneous random linear networks are considered.
David Semedo, João Magalhães
Multimedia information have strong temporal correlations that shape the way
modalities co-occur over time. In this paper we study the dynamic nature of
multimedia and social-media information, where the temporal dimension emerges
as a strong source of evidence for learning the temporal correlations across
visual and textual modalities. So far, cross-media retrieval models, explored
the correlations between different modalities (e.g. text and image) to learn a
common subspace, in which semantically similar instances lie in the same
neighbourhood. Building on such knowledge, we propose a novel temporal
cross-media neural architecture, that departs from standard cross-media
methods, by explicitly accounting for the temporal dimension through temporal
subspace learning. The model is softly-constrained with temporal and
inter-modality constraints that guide the new subspace learning task by
favouring temporal correlations between semantically similar and temporally
close instances. Experiments on three distinct datasets show that accounting
for time turns out to be important for cross-media retrieval. Namely, the
proposed method outperforms a set of baselines on the task of temporal
cross-media retrieval, demonstrating its effectiveness for performing temporal
subspace learning.
Authors' comments: To appear in ACM MM 2018
Jianfeng Dong, Xirong Li, Chaoxi Xu, Shouling Ji, Yuan He, Gang Yang, Xun Wang
This paper attacks the challenging problem of zero-example video retrieval.
In such a retrieval paradigm, an end user searches for unlabeled videos by
ad-hoc queries described in natural language text with no visual example
provided. Given videos as sequences of frames and queries as sequences of
words, an effective sequence-to-sequence cross-modal matching is required. The
majority of existing methods are concept based, extracting relevant concepts
from queries and videos and accordingly establishing associations between the
two modalities. In contrast, this paper takes a concept-free approach,
proposing a dual deep encoding network that encodes videos and queries into
powerful dense representations of their own. Dual encoding is conceptually
simple, practically effective and end-to-end. As experiments on three
benchmarks, i.e. MSR-VTT, TRECVID 2016 and 2017 Ad-hoc Video Search show, the
proposed solution establishes a new state-of-the-art for zero-example video
retrieval.
Authors' comments: Accepted by CVPR 2019. Code and data are available at
https://github.com/danieljf24/dual_encoding
Elham Ashoori, Terry Rudolph
There have been suggestions within the Information Retrieval (IR) community that quantum mechanics (QM) can be used to help formalise the foundations of IR. The invoked connection to QM is mathematical rather than physical. The proposed ideas are concerned with information which is encoded, processed and accessed in classical computers. However, some of the suggestions have been thoroughly muddled with questions about applying techniques of quantum information theory in IR, and it is often unclear whether or not the suggestion is to perform actual quantum information processing on the information. This paper is an attempt to provide some conceptual clarity on the emerging issues.
Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Ioannis Patras, Ioannis Kompatsiaris
This paper introduces the problem of Fine-grained Incident Video Retrieval (FIVR). Given a query video, the objective is to retrieve all associated videos, considering several types of associations that range from duplicate videos to videos from the same incident. FIVR offers a single framework that contains several retrieval tasks as special cases. To address the benchmarking needs of all such tasks, we construct and present a large-scale annotated video dataset, which we call FIVR-200K, and it comprises 225,960 videos. To create the dataset, we devise a process for the collection of YouTube videos based on major news events from recent years crawled from Wikipedia and deploy a retrieval pipeline for the automatic selection of query videos based on their estimated suitability as benchmarks. We also devise a protocol for the annotation of the dataset with respect to the four types of video associations defined by FIVR. Finally, we report the results of an experimental study on the dataset comparing five state-of-the-art methods developed based on a variety of visual descriptors, highlighting the challenges of the current problem.
Alexander Barnett, Charles L. Epstein, Leslie Greengard, Jeremy Magland
One of the most powerful approaches to imaging at the nanometer or
subnanometer length scale is coherent diffraction imaging using X-ray sources.
For amorphous (non-crystalline) samples, the raw data can be interpreted as the
modulus of the continuous Fourier transform of the unknown object. Making use
of prior information about the sample (such as its support), a natural goal is
to recover the phase through computational means, after which the unknown
object can be visualized at high resolution. While many algorithms have been
proposed for this phase retrieval problem, careful analysis of its
well-posedness has received relatively little attention. In this paper, we show
that the problem is, in general, not well-posed and describe some of the
underlying issues that are responsible for the ill-posedness. We then show how
this analysis can be used to develop experimental protocols that lead to better
conditioned inverse problems.
Authors' comments: This is a substantial revision of version 1
Bernhard Kratzwald, Stefan Feuerriegel
State-of-the-art systems in deep question answering proceed as follows: (1)
an initial document retrieval selects relevant documents, which (2) are then
processed by a neural network in order to extract the final answer. Yet the
exact interplay between both components is poorly understood, especially
concerning the number of candidate documents that should be retrieved. We show
that choosing a static number of documents -- as used in prior research --
suffers from a noise-information trade-off and yields suboptimal results. As a
remedy, we propose an adaptive document retrieval model. This learns the
optimal candidate number for document retrieval, conditional on the size of the
corpus and the query. We report extensive experimental results showing that our
adaptive approach outperforms state-of-the-art methods on multiple benchmark
datasets, as well as in the context of corpora with variable sizes.
Authors' comments: EMNLP 2018
Behrooz Tahmasebi, Mohammad Ali Maddah-Ali, Seyed Abolfazl Motahari
The objective of a genome-wide association study (GWAS) is to associate subsequences of individuals' genomes to the observable characteristics called phenotypes (e.g., high blood pressure). Motivated by the GWAS problem, in this paper we introduce the information-theoretic problem of \emph{associated subsequence retrieval}, where a dataset of $N$ (possibly high-dimensional) sequences of length $G$, and their corresponding observable (binary) characteristics is given. The sequences are chosen independently and uniformly at random from $\mathcal{X}^G$, where $\mathcal{X}$ is a finite alphabet. The observable (binary) characteristic is only related to a specific unknown subsequence of length $L$ of the sequences, called \textit{associated subsequence}. For each sequence, if the associated subsequence of it belongs to a universal finite set, then it is more likely to display the observable characteristic (i.e., it is more likely that the observable characteristic is one). The goal is to retrieve the associated subsequence using a dataset of $N$ sequences and their observable characteristics. We demonstrate that as the parameters $N$, $G$, and $L$ grow, a threshold effect appears in the curve of probability of error versus the rate which is defined as ${Gh(L/G)}/{N}$, where $h(\cdot)$ is the binary entropy function. This effect allows us to define the capacity of associated subsequence retrieval. We develop an achievable scheme and a matching converse for this problem, and thus characterize its capacity in two scenarios: the zero-error-rate and the $\epsilon$-error-rate.
Bochen Guan, Hanrong Ye, Hong Liu, William A. Sethares
Estimation of the frequency and duration of logos in videos is important and
challenging in the advertisement industry as a way of estimating the impact of
ad purchases. Since logos occupy only a small area in the videos, the popular
methods of image retrieval could fail. This paper develops an algorithm called
Video Logo Retrieval (VLR), which is an image-to-video retrieval algorithm
based on the spatial distribution of local image descriptors that measure the
distance between the query image (the logo) and a collection of video images.
VLR uses local features to overcome the weakness of global feature-based models
such as convolutional neural networks (CNN). Meanwhile, VLR is flexible and
does not require training after setting some hyper-parameters. The performance
of VLR is evaluated on two challenging open benchmark tasks (SoccerNet and
Standford I2V), and compared with other state-of-the-art logo retrieval or
detection algorithms. Overall, VLR shows significantly higher accuracy compared
with the existing methods.
Authors' comments: Accepted by ICIP 20. Contact author: Bochen Guan (gbochen@wisc.edu)
Gilles Baechler, Miranda Kreković, Juri Ranieri, Amina Chebira, Yue M. Lu, Martin Vetterli
In a variety of fields, in particular those involving imaging and optics, we often measure signals whose phase is missing or has been irremediably distorted. Phase retrieval attempts to recover the phase information of a signal from the magnitude of its Fourier transform to enable the reconstruction of the original signal. Solving the phase retrieval problem is equivalent to recovering a signal from its auto-correlation function. In this paper, we assume the original signal to be sparse; this is a natural assumption in many applications, such as X-ray crystallography, speckle imaging and blind channel estimation. We propose an algorithm that resolves the phase retrieval problem in three stages: i) we leverage the finite rate of innovation sampling theory to super-resolve the auto-correlation function from a limited number of samples, ii) we design a greedy algorithm that identifies the locations of a sparse solution given the super-resolved auto-correlation function, iii) we recover the amplitudes of the atoms given their locations and the measured auto-correlation function. Unlike traditional approaches that recover a discrete approximation of the underlying signal, our algorithm estimates the signal on a continuous domain, which makes it the first of its kind. Along with the algorithm, we derive its performance bound with a theoretical analysis and propose a set of enhancements to improve its computational complexity and noise resilience. Finally, we demonstrate the benefits of the proposed method via a comparison against Charge Flipping, a notable algorithm in crystallography.
Rafael Valle
This paper describes computational methods for the visual display and analysis of music information. We provide a concise description of software, music descriptors and data visualization techniques commonly used in music information retrieval. Finally, we provide use cases where the described software, descriptors and visualizations are showcased.
Samuel Pinilla, Jorge Bacca, Henry Arguello
Phase retrieval (PR) is an ill-conditioned inverse problem which can be found in various science and engineering applications. Assuming sparse priority over the signal of interest, recent algorithms have been developed to solve the phase retrieval problem. Some examples include SparseAltMinPhase (SAMP), Sparse Wirtinger flow (SWF) and Sparse Truncated Amplitude flow (SPARTA). However, the optimization cost functions of the mentioned algorithms are non-convex and non-smooth. In order to fix the non-smoothness of the cost function, the SPARTA method uses truncation thresholds to calculate a truncated step update direction. In practice, the truncation procedure requires calculating more parameters to obtain a desired performance in the phase recovery. Therefore, this paper proposes an algorithm called SPRSF (Sparse Phase retrieval via Smoothing Function) to solve the sparse PR problem by introducing a smoothing function. SPRSF is an iterative algorithm where the update step is obtained by a hard thresholding over a gradient descent direction. Theoretical analyses show that the smoothing function uniformly approximates the non-convex and non-smooth sparse PR optimization problem. Moreover, SPRSF does not require the truncation procedure used in SPARTA. Numerical tests demonstrate that SPRSF performs better than state-of-the-art methods, especially when there is no knowledge about the sparsity $k$. In particular, SPRSF attains a higher mean recovery rate in comparison with SPARTA, SAMP and SWF methods, when the sparsity varies for the real and complex cases. Further, in terms of the sampling complexity, the SPRSF method outperforms its competitive alternatives.
Paul Hand, Oscar Leong, Vladislav Voroninski
The phase retrieval problem asks to recover a natural signal $y_0 \in
\mathbb{R}^n$ from $m$ quadratic observations, where $m$ is to be minimized. As
is common in many imaging problems, natural signals are considered sparse with
respect to a known basis, and the generic sparsity prior is enforced via
$\ell_1$ regularization. While successful in the realm of linear inverse
problems, such $\ell_1$ methods have encountered possibly fundamental
limitations, as no computationally efficient algorithm for phase retrieval of a
$k$-sparse signal has been proven to succeed with fewer than $O(k^2\log n)$
generic measurements, exceeding the theoretical optimum of $O(k \log n)$. In
this paper, we propose a novel framework for phase retrieval by 1) modeling
natural signals as being in the range of a deep generative neural network $G :
\mathbb{R}^k \rightarrow \mathbb{R}^n$ and 2) enforcing this prior directly by
optimizing an empirical risk objective over the domain of the generator. Our
formulation has provably favorable global geometry for gradient methods, as
soon as $m = O(kd^2\log n)$, where $d$ is the depth of the network.
Specifically, when suitable deterministic conditions on the generator and
measurement matrix are met, we construct a descent direction for any point
outside of a small neighborhood around the unique global minimizer and its
negative multiple, and show that such conditions hold with high probability
under Gaussian ensembles of multilayer fully-connected generator networks and
measurement matrices. This formulation for structured phase retrieval thus has
two advantages over sparsity based methods: 1) deep generative priors can more
tightly represent natural signals and 2) information theoretically optimal
sample complexity. We corroborate these results with experiments showing that
exploiting generative models in phase retrieval tasks outperforms sparse phase
retrieval methods.
Authors' comments: 28 pages, 5 figures
Manos Schinas, Symeon Papadopoulos, Yiannis Kompatsiaris, Pericles Mitkas
In the recent years, we have witnessed the rapid adoption of social media platforms, such as Twitter, Facebook and YouTube, and their use as part of the everyday life of billions of people worldwide. Given the habit of people to use these platforms to share thoughts, daily activities and experiences it is not surprising that the amount of user generated content has reached unprecedented levels, with a substantial part of that content being related to real-world events, i.e. actions or occurrences taking place at a certain time and location. Given the key role of events in our life, the task of annotating and organizing social media content around them is of crucial importance for ensuring real-time and future access to multimedia content about an event of interest. In this chapter, we present several research efforts from recent years that tackle two main problems: a) event detection and b) event-based media retrieval and summarization. Given archived collections or live streams of social media items, the purpose of event detection methods is to identify previously unknown events in the form of sets of items that describe them. In general, the events could be of any type, but there are also approaches aiming at events of specific type. Given a target event the goal of event summarization is first to identify relevant content and then to represent it in a concise way, selecting the most appealing and representative content.
Zehong Cao, Mukesh Prasad, M. Tanveer, Chin-Teng Lin
Prior studies have proposed methods to recover multi-channel electroencephalography (EEG) signal ensembles from their partially sampled entries. These methods depend on spatial scenarios, yet few approaches aiming to a temporal reconstruction with lower loss. The goal of this study is to retrieve the temporal EEG signals independently which was overlooked in data pre-processing. We considered EEG signals are impinging on tensor-based approach, named nonlinear Canonical Polyadic Decomposition (CPD). In this study, we collected EEG signals during a resting-state task. Then, we defined that the source signals are original EEG signals and the generated tensor is perturbed by Gaussian noise with a signal-to-noise ratio of 0 dB. The sources are separated using a basic non-negative CPD and the relative errors on the estimates of the factor matrices. Comparing the similarities between the source signals and their recovered versions, the results showed significantly high correlation over 95%. Our findings reveal the possibility of recoverable temporal signals in EEG applications.
Rawad Bitar, Salim El Rouayheb
We consider the problem of designing private information retrieval (PIR)
schemes on data of $m$ files replicated on $n$ servers that can possibly
collude. We focus on devising robust PIR schemes that can tolerate stragglers,
i.e., slow or unresponsive servers. In many settings, the number of stragglers
is not known a priori or may change with time. We define universally robust PIR
as schemes that achieve PIR capacity asymptotically in $m$ and simultaneously
for any number of stragglers up to a given threshold. We introduce
Staircase-PIR schemes and prove that they are universally robust. Towards that
end, we establish an equivalence between robust PIR and communication efficient
secret sharing.
Authors' comments: Extended version of a paper accepted in IEEE Information Theory
workshop 2018
Ali Ahmed, Alireza Aghasi, Paul Hand
We consider the task of recovering two real or complex $m$-vectors from phaseless Fourier measurements of their circular convolution. Our method is a novel convex relaxation that is based on a lifted matrix recovery formulation that allows a nontrivial convex relaxation of the bilinear measurements from convolution. We prove that if the two signals belong to known random subspaces of dimensions $k$ and $n$, then they can be recovered up to the inherent scaling ambiguity with $m >> (k+n) \log^2 m$ phaseless measurements. Our method provides the first theoretical recovery guarantee for this problem by a computationally efficient algorithm and does not require a solution estimate to be computed for initialization. Our proof is based Rademacher complexity estimates. Additionally, we provide an ADMM implementation of the method and provide numerical experiments that verify the theory.
Rima Alaifari, Philipp Grohs
The problem of reconstructing a function from the magnitudes of its frame
coefficients has recently been shown to be never uniformly stable in
infinite-dimensional spaces [5]. This result also holds for frames that are
possibly continuous [2]. On the other hand, the problem is always stable in
finite-dimensional settings. A prominent example of such a phase retrieval
problem is the recovery of a signal from the modulus of its Gabor transform. In
this paper, we study Gabor phase retrieval and ask how the stability degrades
on a natural family of finite-dimensional subspaces of the signal domain
$L^2(\mathbb{R})$. We prove that the stability constant scales at least
quadratically exponentially in the dimension of the subspaces. Our construction
also shows that typical priors such as sparsity or smoothness promoting
penalties do not constitute regularization terms for phase retrieval.
Authors' comments: 17 pages