Sreyasi Nag Chowdhury, Mateusz Malinowski, Andreas Bulling, Mario Fritz
The widespread integration of cameras in hand-held and head-worn devices as
well as the ability to share content online enables a large and diverse visual
capture of the world that millions of users build up collectively every day. We
envision these images as well as associated meta information, such as GPS
coordinates and timestamps, to form a collective visual memory that can be
queried while automatically taking the ever-changing context of mobile users
into account. As a first step towards this vision, in this work we present
Xplore-M-Ego: a novel media retrieval system that allows users to query a
dynamic database of images and videos using spatio-temporal natural language
queries. We evaluate our system using a new dataset of real user queries as
well as through a usability study. One key finding is that there is a
considerable amount of inter-user variability, for example in the resolution of
spatial relations in natural language utterances. We show that our retrieval
system can cope with this variability using personalisation through an online
learning-based retrieval formulation.
Authors' comments: 8 pages, 9 figures, 1 table
Andreas M. Tillmann, Yonina C. Eldar, Julien Mairal
We propose a new algorithm to learn a dictionary for reconstructing and sparsely encoding signals from measurements without phase. Specifically, we consider the task of estimating a two-dimensional image from squared-magnitude measurements of a complex-valued linear transformation of the original image. Several recent phase retrieval algorithms exploit underlying sparsity of the unknown signal in order to improve recovery performance. In this work, we consider such a sparse signal prior in the context of phase retrieval, when the sparsifying dictionary is not known in advance. Our algorithm jointly reconstructs the unknown signal - possibly corrupted by noise - and learns a dictionary such that each patch of the estimated image can be sparsely represented. Numerical experiments demonstrate that our approach can obtain significantly better reconstructions for phase retrieval problems with noise than methods that cannot exploit such "hidden" sparsity. Moreover, on the theoretical side, we provide a convergence result for our method.
Hua Sun, Syed A. Jafar
Blind interference alignment (BIA) refers to interference alignment schemes that are designed only based on channel coherence pattern knowledge at the transmitters (the "blind" transmitters do not know the exact channel values). Private information retrieval (PIR) refers to the problem where a user retrieves one out of K messages from N non-communicating databases (each holds all K messages) without revealing anything about the identity of the desired message index to any individual database. In this paper, we identify an intriguing connection between PIR and BIA. Inspired by this connection, we characterize the information theoretic optimal download cost of PIR, when we have K = 2 messages and the number of databases, N, is arbitrary.
Jameson Cahill, Peter G. Casazza, Ingrid Daubechies
The main result of this paper states that phase retrieval in infinite-dimensional Hilbert spaces is never uniformly stable, in sharp contrast to the finite dimensional setting in which phase retrieval is always stable. This leads us to derive stability results for signals depending on how well they are approximated by finite expansions.
Marc Sloan, Jun Wang
Theoretical frameworks like the Probability Ranking Principle and its more
recent Interactive Information Retrieval variant have guided the development of
ranking and retrieval algorithms for decades, yet they are not capable of
helping us model problems in Dynamic Information Retrieval which exhibit the
following three properties; an observable user signal, retrieval over multiple
stages and an overall search intent. In this paper a new theoretical framework
for retrieval in these scenarios is proposed. We derive a general dynamic
utility function for optimizing over these types of tasks, that takes into
account the utility of each stage and the probability of observing user
feedback. We apply our framework to experiments over TREC data in the dynamic
multi page search scenario as a practical demonstration of its effectiveness
and to frame the discussion of its use, its limitations and to compare it
against the existing frameworks.
Authors' comments: ACM SIGIR International Conference on the Theory of Information
Retrieval (ICTIR), 10 pages, 4 figures, 2 algorithms, 3 tables. in
Proceedings of the 2015 International Conference on The Theory of Information
Retrieval
Tao Lei, Hrishikesh Joshi, Regina Barzilay, Tommi Jaakkola, Katerina Tymoshenko, Alessandro Moschitti, Lluis Marquez
Question answering forums are rapidly growing in size with no effective
automated ability to refer to and reuse answers already available for previous
posted questions. In this paper, we develop a methodology for finding
semantically related questions. The task is difficult since 1) key pieces of
information are often buried in extraneous details in the question body and 2)
available annotations on similar questions are scarce and fragmented. We design
a recurrent and convolutional model (gated convolution) to effectively map
questions to their semantic representations. The models are pre-trained within
an encoder-decoder framework (from body to title) on the basis of the entire
raw corpus, and fine-tuned discriminatively from limited annotations. Our
evaluation demonstrates that our model yields substantial gains over a standard
IR baseline and various neural network architectures (including CNNs, LSTMs and
GRUs).
Authors' comments: NAACL 2016
Philipp Walk, Henning Becker, Peter Jung
Pilot-aided channel estimation is nowadays a standard component in each
wireless receiver enabling coherent transmission of complex-valued
constellations, only affected by noise and interference. Whenever these
disturbances are sufficiently small and long data frames are used, high data
rates can be achieved and the resource overhead due to the pilots vanishes
asymptotically. On the other, it is expected that for the next generation of
mobile networks not only data rate is in the main focus but also low latency,
short and sporadic messages, massive connectivity, distributed & adhoc
processing and robustness with respect to asynchronism. Therefore a review of
several well-established principles in communication has been started already.
A particular implication when using complex-valued pilots is that these values
have to be known at the receiver and therefore these resources can not be used
simultaneously for user data. For an OFDM-like multicarrier scheme this means
that pilot tones (usually placed equidistantly according to the Nyquist
theorem) are allocated with globally known amplitudes and phases to reconstruct
the channel impulse response. Phases are designed and allocated globally which
is in contrast to a distributed infrastructure. In this work we present
therefore a new phaseless pilot scheme where only pilot amplitudes need to be
known at the receiver, i.e., phases are available again and can be used for
various other purposes. The idea is based on a phase retrieval result for
symmetrized and zero-padded magnitude Fourier measurements obtained by two of
the authors. The phases on the pilot tones can now be used to carry additional
user-specific data or compensate for other signal characteristics, like the
PAPR.
Authors' comments: 8 pages, 2 figures, presented on Asilomar 2015
Tianyu Qiu, Prabhu Babu, Daniel P. Palomar
This paper considers the phase retrieval problem in which measurements consist of only the magnitude of several linear measurements of the unknown, e.g., spectral components of a time sequence. We develop low-complexity algorithms with superior performance based on the majorization-minimization (MM) framework. The proposed algorithms are referred to as PRIME: Phase Retrieval vIa the Majorization-minimization techniquE. They are preferred to existing benchmark methods since at each iteration a simple surrogate problem is solved with a closed-form solution that monotonically decreases the original objective function. In total, four algorithms are proposed using different majorization-minimization techniques. Experimental results validate that our algorithms outperform existing methods in terms of successful recovery and mean square error under various settings.
Shelby Kimmel, Yi-Kai Liu
We consider a variant of the phase retrieval problem, where vectors are
replaced by unitary matrices, i.e., the unknown signal is a unitary matrix U,
and the measurements consist of squared inner products |Tr(C*U)|^2 with unitary
matrices C that are chosen by the observer. This problem has applications to
quantum process tomography, when the unknown process is a unitary operation.
We show that PhaseLift, a convex programming algorithm for phase retrieval,
can be adapted to this matrix setting, using measurements that are sampled from
unitary 4- and 2-designs. In the case of unitary 4-design measurements, we show
that PhaseLift can reconstruct all unitary matrices, using a near-optimal
number of measurements. This extends previous work on PhaseLift using spherical
4-designs.
In the case of unitary 2-design measurements, we show that PhaseLift still
works pretty well on average: it recovers almost all signals, up to a constant
additive error, using a near-optimal number of measurements. These 2-design
measurements are convenient for quantum process tomography, as they can be
implemented via randomized benchmarking techniques. This is the first positive
result on PhaseLift using 2-designs.
Authors' comments: 21 pages; v3: minor revisions, to appear at SampTA 2017; v2:
rewritten to focus on phase retrieval, with new title, improved error bounds,
and numerics; v1: original version, titled "Quantum Compressed Sensing Using
2-Designs"
Kishore Jaganathan, Yonina C. Eldar, Babak Hassibi
The problem of phase retrieval is a classic one in optics and arises when one is interested in recovering an unknown signal from the magnitude (intensity) of its Fourier transform. While there have existed quite a few approaches to phase retrieval, recent developments in compressed sensing and convex optimization-based signal recovery have inspired a host of new ones. This work presents an overview of these approaches. Since phase retrieval, by its very nature, is ill-posed, to make the problem meaningful one needs to either assume prior structure on the signal (e.g., sparsity) or obtain additional measurements (e.g., masks, structured illuminations). For both the cases, we review conditions for the identifiability of the signal, as well as practical algorithms for signal recovery. In particular, we demonstrate that it is possible to robustly and efficiently identify an unknown signal solely from phaseless Fourier measurements, a fact with potentially far-reaching implications.
Artem Babenko, Victor Lempitsky
Several recent works have shown that image descriptors produced by deep
convolutional neural networks provide state-of-the-art performance for image
classification and retrieval problems. It has also been shown that the
activations from the convolutional layers can be interpreted as local features
describing particular image regions. These local features can be aggregated
using aggregation approaches developed for local features (e.g. Fisher
vectors), thus providing new powerful global descriptors.
In this paper we investigate possible ways to aggregate local deep features
to produce compact global descriptors for image retrieval. First, we show that
deep features and traditional hand-engineered features have quite different
distributions of pairwise similarities, hence existing aggregation methods have
to be carefully re-evaluated. Such re-evaluation reveals that in contrast to
shallow features, the simple aggregation method based on sum pooling provides
arguably the best performance for deep convolutional features. This method is
efficient, has few parameters, and bears little risk of overfitting when e.g.
learning the PCA matrix. Overall, the new compact global descriptor improves
the state-of-the-art on four common benchmarks considerably.
Authors' comments: accepted for ICCV 2015
Yusuke Matsui, Kota Ito, Yuji Aramaki, Toshihiko Yamasaki, Kiyoharu Aizawa
Manga (Japanese comics) are popular worldwide. However, current e-manga
archives offer very limited search support, including keyword-based search by
title or author, or tag-based categorization. To make the manga search
experience more intuitive, efficient, and enjoyable, we propose a content-based
manga retrieval system. First, we propose a manga-specific image-describing
framework. It consists of efficient margin labeling, edge orientation histogram
feature description, and approximate nearest-neighbor search using product
quantization. Second, we propose a sketch-based interface as a natural way to
interact with manga content. The interface provides sketch-based querying,
relevance feedback, and query retouch. For evaluation, we built a novel dataset
of manga images, Manga109, which consists of 109 comic books of 21,142 pages
drawn by professional manga artists. To the best of our knowledge, Manga109 is
currently the biggest dataset of manga images available for research. We
conducted a comparative study, a localization evaluation, and a large-scale
qualitative study. From the experiments, we verified that: (1) the retrieval
accuracy of the proposed method is higher than those of previous methods; (2)
the proposed method can localize an object instance with reasonable runtime and
accuracy; and (3) sketch querying is useful for manga search.
Authors' comments: 13 pages
Çağkan Yapar, Volker Pohl, Holger Boche
This contribution proposes a two stage strategy to allow for phase retrieval
in state of the art sub-Nyquist sampling schemes for sparse multiband signals.
The proposed strategy is based on data acquisition via modulated wideband
converters known from sub-Nyquist sampling. This paper describes how the
modulators have to be modified such that signal recovery from sub-Nyquist
amplitude samples becomes possible and a corresponding recovery algorithm is
given which is computational efficient. In addition, the proposed strategy is
fairly general, allowing for several constructions and recovery algorithms.
Authors' comments: Submitted to ICASSP 2016
Vidyadhar Rao, Prateek Jain, C. V Jawahar
Typical retrieval systems have three requirements: a) Accurate retrieval
i.e., the method should have high precision, b) Diverse retrieval, i.e., the
obtained set of points should be diverse, c) Retrieval time should be small.
However, most of the existing methods address only one or two of the above
mentioned requirements. In this work, we present a method based on randomized
locality sensitive hashing which tries to address all of the above requirements
simultaneously. While earlier hashing approaches considered approximate
retrieval to be acceptable only for the sake of efficiency, we argue that one
can further exploit approximate retrieval to provide impressive trade-offs
between accuracy and diversity. We extend our method to the problem of
multi-label prediction, where the goal is to output a diverse and accurate set
of labels for a given document in real-time. Moreover, we introduce a new
notion to simultaneously evaluate a method's performance for both the precision
and diversity measures. Finally, we present empirical results on several
different retrieval tasks and show that our method retrieves diverse and
accurate images/labels while ensuring $100x$-speed-up over the existing diverse
retrieval approaches.
Authors' comments: 10 pages
Gerard t Hooft
The mechanism by which black holes return the absorbed information to the
outside world is reconsidered, and described in terms of a set of mutually
non-interacting modes. Our mechanism is based on the mostly classical
gravitational back-reaction. The diagonalized formalism is particularly useful
for further studies of this process. Although no use is made of string theory,
our analysis appears to point towards an ensuing string-like interaction. It is
shown how black hole entropy can be traced down to classical gravitational
back-reaction.
Authors' comments: 10 pages, no figures
Jinma Guo, Jianmin Li
Along with data on the web increasing dramatically, hashing is becoming more
and more popular as a method of approximate nearest neighbor search. Previous
supervised hashing methods utilized similarity/dissimilarity matrix to get
semantic information. But the matrix is not easy to construct for a new
dataset. Rather than to reconstruct the matrix, we proposed a straightforward
CNN-based hashing method, i.e. binarilizing the activations of a fully
connected layer with threshold 0 and taking the binary result as hash codes.
This method achieved the best performance on CIFAR-10 and was comparable with
the state-of-the-art on MNIST. And our experiments on CIFAR-10 suggested that
the signs of activations may carry more information than the relative values of
activations between samples, and that the co-adaption between feature extractor
and hash functions is important for hashing.
Authors' comments: 16 pages, 6 figures
Ingo P. Waldmann, Marco Rocchetto, Giovanna Tinetti, Emma J. Barton, Sergey N. Yurchenko, Jonathan Tennyson
Tau-REx (Tau Retrieval of Exoplanets) is a novel, fully Bayesian atmospheric
retrieval code custom built for extrasolar atmospheres. In Waldmann et al.
(2015) the transmission spectroscopic case was introduced, here we present the
emission spectroscopy spectral retrieval for the Tau-REx framework. Compared to
transmission spectroscopy, the emission case is often significantly more
degenerate due to the need to retrieve the full atmospheric
temperature-pressure (TP) profile. This is particularly true in the case of
current measurements of exoplanetary atmospheres, which are either of low
signal-to-noise, low spectral resolution or both. Here we present a new way of
combining two existing approaches to the modelling of the said TP profile: 1)
the parametric profile, where the atmospheric TP structure is analytically
approximated by a few model parameters, 2) the Layer-by-Layer approach, where
individual atmospheric layers are modelled. Both these approaches have distinct
advantages and disadvantages in terms of convergence properties and potential
model biases. The Tau-REx hybrid model presented here is a new two-stage TP
profile retrieval, which combines the robustness of the analytic solution with
the accuracy of the Layer-by-Layer approach. The retrieval process is
demonstrated using simulations of the hot-Jupiter WASP-76b and the hot
SuperEarth 55 Cnc e, as well as on the secondary eclipse measurements of
HD189733b.
Authors' comments: ApJ accepted
Christina Lioma, Jakob Grue Simonsen, Birger Larsen, Niels Dalum Hansen
Modelling term dependence in IR aims to identify co-occurring terms that are too heavily dependent on each other to be treated as a bag of words, and to adapt the indexing and ranking accordingly. Dependent terms are predominantly identified using lexical frequency statistics, assuming that (a) if terms co-occur often enough in some corpus, they are semantically dependent; (b) the more often they co-occur, the more semantically dependent they are. This assumption is not always correct: the frequency of co-occurring terms can be separate from the strength of their semantic dependence. E.g. "red tape" might be overall less frequent than "tape measure" in some corpus, but this does not mean that "red"+"tape" are less dependent than "tape"+"measure". This is especially the case for non-compositional phrases, i.e. phrases whose meaning cannot be composed from the individual meanings of their terms (such as the phrase "red tape" meaning bureaucracy). Motivated by this lack of distinction between the frequency and strength of term dependence in IR, we present a principled approach for handling term dependence in queries, using both lexical frequency and semantic evidence. We focus on non-compositional phrases, extending a recent unsupervised model for their detection [21] to IR. Our approach, integrated into ranking using Markov Random Fields [31], yields e?ectiveness gains over competitive TREC baselines, showing that there is still room for improvement in the very well-studied area of term dependence in IR.
Zehra Camlica, H. R. Tizhoosh, Farzad Khalvati
Content-based image retrieval (CBIR) of medical images is a crucial task that
can contribute to a more reliable diagnosis if applied to big data. Recent
advances in feature extraction and classification have enormously improved CBIR
results for digital images. However, considering the increasing accessibility
of big data in medical imaging, we are still in need of reducing both memory
requirements and computational expenses of image retrieval systems. This work
proposes to exclude the features of image blocks that exhibit a low encoding
error when learned by a $n/p/n$ autoencoder ($p\!<\!n$). We examine the
histogram of autoendcoding errors of image blocks for each image class to
facilitate the decision which image regions, or roughly what percentage of an
image perhaps, shall be declared relevant for the retrieval task. This leads to
reduction of feature dimensionality and speeds up the retrieval process. To
validate the proposed scheme, we employ local binary patterns (LBP) and support
vector machines (SVM) which are both well-established approaches in CBIR
research community. As well, we use IRMA dataset with 14,410 x-ray images as
test data. The results show that the dimensionality of annotated feature
vectors can be reduced by up to 50% resulting in speedups greater than 27% at
expense of less than 1% decrease in the accuracy of retrieval when validating
the precision and recall of the top 20 hits.
Authors' comments: To appear in proceedings of The 5th International Conference on Image
Processing Theory, Tools and Applications (IPTA'15), Nov 10-13, 2015,
Orleans, France
Leonardo A. Duarte, Otávio A. B. Penatti, Jurandy Almeida
Often, videos are composed of multiple concepts or even genres. For instance, news videos may contain sports, action, nature, etc. Therefore, encoding the distribution of such concepts/genres in a compact and effective representation is a challenging task. In this sense, we propose the Bag of Genres representation, which is based on a visual dictionary defined by a genre classifier. Each visual word corresponds to a region in the classification space. The Bag of Genres video vector contains a summary of the activations of each genre in the video content. We evaluate the proposed method for video genre retrieval using the dataset of MediaEval Tagging Task of 2012 and for video event retrieval using the EVVE dataset. Results show that the proposed method achieves results comparable or superior to state-of-the-art methods, with the advantage of providing a much more compact representation than existing features.