Shahriar Shariat, Vladimir Pavlovic
Traditional pairwise sequence alignment is based on matching individual samples from two sequences, under time monotonicity constraints. However, in many application settings matching subsequences (segments) instead of individual samples may bring in additional robustness to noise or local non-causal perturbations. This paper presents an approach to segmental sequence alignment that jointly segments and aligns two sequences, generalizing the traditional per-sample alignment. To accomplish this task, we introduce a distance metric between segments based on average pairwise distances and then present a modified pair-HMM (PHMM) that incorporates the proposed distance metric to solve the joint segmentation and alignment task. We also propose a relaxation to our model that improves the computational efficiency of the generic segmental PHMM. Our results demonstrate that this new measure of sequence similarity can lead to improved classification performance, while being resilient to noise, on a variety of sequence retrieval problems, from EEG to motion sequence classification.
Karim Banawan, Sennur Ulukus
We consider the problem of private information retrieval (PIR) over a
distributed storage system. The storage system consists of $N$ non-colluding
databases, each storing a coded version of $M$ messages. In the PIR problem,
the user wishes to retrieve one of the available messages without revealing the
message identity to any individual database. We derive the
information-theoretic capacity of this problem, which is defined as the maximum
number of bits of the desired message that can be privately retrieved per one
bit of downloaded information. We show that the PIR capacity in this case is
$C=\left(1+\frac{K}{N}+\frac{K^2}{N^2}+\cdots+\frac{K^{M-1}}{N^{M-1}}\right)^{-1}=(1+R_c+R_c^2+\cdots+R_c^{M-1})^{-1}=\frac{1-R_c}{1-R_c^M}$,
where $R_c$ is the rate of the $(N,K)$ code used. The capacity is a function of
the code rate and the number of messages only regardless of the explicit
structure of the storage code. The result implies a fundamental tradeoff
between the optimal retrieval cost and the storage cost. The result generalizes
the achievability and converse results for the classical PIR with replicating
databases to the case of coded databases.
Authors' comments: Submitted to IEEE Transactions on Information Theory, September 2016
Yen-Chen Wu, Tzu-Hsiang Lin, Yang-De Chen, Hung-Yi Lee, Lin-Shan Lee
User-machine interaction is important for spoken content retrieval. For text
content retrieval, the user can easily scan through and select on a list of
retrieved item. This is impossible for spoken content retrieval, because the
retrieved items are difficult to show on screen. Besides, due to the high
degree of uncertainty for speech recognition, the retrieval results can be very
noisy. One way to counter such difficulties is through user-machine
interaction. The machine can take different actions to interact with the user
to obtain better retrieval results before showing to the user. The suitable
actions depend on the retrieval status, for example requesting for extra
information from the user, returning a list of topics for user to select, etc.
In our previous work, some hand-crafted states estimated from the present
retrieval results are used to determine the proper actions. In this paper, we
propose to use Deep-Q-Learning techniques instead to determine the machine
actions for interactive spoken content retrieval. Deep-Q-Learning bypasses the
need for estimation of the hand-crafted states, and directly determine the best
action base on the present retrieval status even without any human knowledge.
It is shown to achieve significantly better performance compared with the
previous hand-crafted states.
Authors' comments: Accepted conference paper: "The Annual Conference of the
International Speech Communication Association (Interspeech), 2016"
Hamid R. Tizhoosh, Christopher Mitcheltree, Shujin Zhu, Shamak Dutta
Using content-based binary codes to tag digital images has emerged as a
promising retrieval technology. Recently, Radon barcodes (RBCs) have been
introduced as a new binary descriptor for image search. RBCs are generated by
binarization of Radon projections and by assembling them into a vector, namely
the barcode. A simple local thresholding has been suggested for binarization.
In this paper, we put forward the idea of "autoencoded Radon barcodes". Using
images in a training dataset, we autoencode Radon projections to perform
binarization on outputs of hidden layers. We employed the mini-batch stochastic
gradient descent approach for the training. Each hidden layer of the
autoencoder can produce a barcode using a threshold determined based on the
range of the logistic function used. The compressing capability of autoencoders
apparently reduces the redundancies inherent in Radon projections leading to
more accurate retrieval results. The IRMA dataset with 14,410 x-ray images is
used to validate the performance of the proposed method. The experimental
results, containing comparison with RBCs, SURF and BRISK, show that autoencoded
Radon barcode (ARBC) has the capacity to capture important information and to
learn richer representations resulting in lower retrieval errors for image
retrieval measured with the accuracy of the first hit only.
Authors' comments: o appear in proceedings of the 23rd International Conference on
Pattern Recognition (ICPR 2016), Cancun, Mexico, December 2016
Irène Waldspurger
We consider a phase retrieval problem, where we want to reconstruct a $n$-dimensional vector from its phaseless scalar products with $m$ sensing vectors. We assume the sensing vectors to be independently sampled from complex normal distributions. We propose to solve this problem with the classical non-convex method of alternating projections. We show that, when $m\geq Cn$ for $C$ large enough, alternating projections succeed with high probability, provided that they are carefully initialized. We also show that there is a regime in which the stagnation points of the alternating projections method disappear, and the initialization procedure becomes useless. However, in this regime, $m$ has to be of the order of $n^2$. Finally, we conjecture from our numerical experiments that, in the regime $m=O(n)$, there are stagnation points, but the size of their attraction basin is small if $m/n$ is large enough, so alternating projections can succeed with probability close to $1$ even with no special initialization.
Pavel Kliuiev, Tatiana Latychevskaia, Juerg Osterwalder, Matthias Hengsberger, Luca Castiglioni
Electronic wave functions of planar molecules can be reconstructed via
inverse Fourier transform of angle-resolved photoelectron spectroscopy (ARPES)
data, provided the phase of the electron wave in the detector plane is known.
Since the recorded intensity is proportional to the absolute square of the
Fourier transform of the initial state wave function, information about the
phase distribution is lost in the measurement. It was shown that the phase can
be retrieved in some cases by iterative algorithms using a priori information
about the object such as its size and symmetry. We suggest a more generalized
and robust approach for the reconstruction of molecular orbitals based on
state-of-the-art phase-retrieval algorithms currently used in coherent
diffraction imaging. We draw an analogy between the phase problem in molecular
orbital imaging by ARPES and of that in optical coherent diffraction imaging by
performing an optical analogue experiment on micrometer-sized structures. We
successfully reconstruct amplitude and phase of both the micrometer-sized
objects and a molecular orbital from the optical and photoelectron far-field
intensity distributions, respectively, without any prior information about the
shape of the objects.
Authors' comments: 13 pages, 5 figures
Aravind Sesagiri Raamkumar, Schubert Foo, Natalie Pang
Information retrieval (IR) and recommender systems (RS) have been employed
for addressing search tasks executed during literature review and the overall
scholarly communication lifecycle. Majority of the studies have concentrated on
algorithm design for improving the accuracy and usefulness of these systems.
Contextual elements related to the scholarly tasks have been largely ignored.
In this paper, we propose a framework called the Scientific Paper Recommender
and Retrieval Framework (SPRRF) that combines aspects of user role modeling and
user-interface features with IR/RS components. The framework is based on eight
emergent themes identified from participants feedback in a user evaluation
study conducted with a prototype assistive system. 119 researchers participated
in the study for evaluating the prototype system that provides recommendations
for two literature review and one manuscript writing tasks. This holistic
framework is meant to guide future studies in this area.
Authors' comments: Technical Report, 2 tables, 1 figure
Guanghan Ning, Zhi Zhang, Xiaobo Ren, Haohong Wang, Zhihai He
In this work, we propose a joint audio-video fingerprint Automatic Content
Recognition (ACR) technology for media retrieval. The problem is focused on how
to balance the query accuracy and the size of fingerprint, and how to allocate
the bits of the fingerprint to video frames and audio frames to achieve the
best query accuracy. By constructing a novel concept called Coverage, which is
highly correlated to the query accuracy, we are able to form a rate-coverage
model to translate the original problem into an optimization problem that can
be resolved by dynamic programming. To the best of our knowledge, this is the
first work that uses joint audio-video fingerprint ACR technology for media
retrieval with a theoretical problem formulation. Experimental results indicate
that compared to reference algorithms, the proposed method has up to 25% query
accuracy improvement while using 60% overall bit-rates, and 25% bit-rate
reduction while achieving 85% accuracy, and it significantly outperforms the
solution with single audio or video source fingerprint.
Authors' comments: 12 pages, 14 figures
Dennis J. Lee, Andrew M. Weiner
We perform quantitative phase imaging using phase retrieval to implement
synthetic aperture imaging. Compared to digital holography, the developed
technique is simpler, less expensive, and more stable.
Authors' comments: 2 pages, Conference on Lasers and Electro-optics: 2014, OSA Technical
Digest (Optical Society of America, 2014), paper JTu4A.75
Judit Bar-Ilan, Rob Koopman, Shenghui Wang, Andrea Scharnhorst, Marcus John, Philipp Mayr, Dietmar Wolfram
This panel brings together experts in bibliometrics and information retrieval
to discuss how each of these two important areas of information science can
help to inform the research of the other. There is a growing body of literature
that capitalizes on the synergies created by combining methodological
approaches of each to solve research problems and practical issues related to
how information is created, stored, organized, retrieved and used. The session
will begin with an overview of the common threads that exist between IR and
metrics, followed by a summary of findings from the BIR workshops and examples
of research projects that combine aspects of each area to benefit IR or metrics
research areas, including search results ranking, semantic indexing and
visualization. The panel will conclude with an engaging discussion with the
audience to identify future areas of research and collaboration.
Authors' comments: 4 pages, accepted at the 2016 Annual Meeting of the Association for
Information Science and Technology (ASIST 2016) in Copenhagen
Cristian Reyes, Eva Mohedano, Kevin McGuinness, Noel E. O'Connor, Xavier Giro-i-Nieto
This work presents a retrieval pipeline and evaluation scheme for the problem
of finding the last appearance of personal objects in a large dataset of images
captured from a wearable camera. Each personal object is modelled by a small
set of images that define a query for a visual search engine.The retrieved
results are reranked considering the temporal timestamps of the images to
increase the relevance of the later detections. Finally, a temporal
interleaving of the results is introduced for robustness against false
detections. The Mean Reciprocal Rank is proposed as a metric to evaluate this
problem. This application could help into developing personal assistants
capable of helping users when they do not remember where they left their
personal belongings.
Authors' comments: Lifelogging Tools and Applications Workshop (LTA'16) at ACM
Multimedia 2016
Bing Gao, Qiyu Sun, Yang Wang, Zhiqiang Xu
In this paper, we consider the phase retrieval problem in which one aims to
recover a signal from the magnitudes of affine measurements. Let $\{{\mathbf
a}_j\}_{j=1}^m \subset {\mathbb H}^d$ and ${\mathbf b}=(b_1, \ldots,
b_m)^\top\in{\mathbb H}^m$, where ${\mathbb H}={\mathbb R}$ or ${\mathbb C}$.
We say $\{{\mathbf a}_j\}_{j=1}^m$ and $\mathbf b$ are affine phase retrievable
for ${\mathbb H}^d$ if any ${\mathbf x}\in{\mathbb H}^d$ can be recovered from
the magnitudes of the affine measurements $\{|<{\mathbf a}_j,{\mathbf
x}>+b_j|,\, 1\leq j\leq m\}$. We develop general framework for affine phase
retrieval and prove necessary and sufficient conditions for $\{{\mathbf
a}_j\}_{j=1}^m$ and $\mathbf b$ to be affine phase retrievable. We establish
results on minimal measurements and generic measurements for affine phase
retrieval as well as on sparse affine phase retrieval. In particular, we also
highlight some notable differences between affine phase retrieval and the
standard phase retrieval in which one aims to recover a signal $\mathbf x$ from
the magnitudes of its linear measurements. In standard phase retrieval, one can
only recover $\mathbf x$ up to a unimodular constant, while affine phase
retrieval removes this ambiguity. We prove that unlike standard phase
retrieval, the affine phase retrievable measurements $\{{\mathbf
a}_j\}_{j=1}^m$ and $\mathbf b$ do not form an open set in ${\mathbb
H}^{m\times d}\times {\mathbb H}^m$. Also in the complex setting, the standard
phase retrieval requires $4d-O(\log_2d)$ measurements, while the affine phase
retrieval only needs $m=3d$ measurements.
Authors' comments: 17 pages
Liang Zheng, Yi Yang, Qi Tian
In the early days, content-based image retrieval (CBIR) was studied with
global features. Since 2003, image retrieval based on local descriptors (de
facto SIFT) has been extensively studied for over a decade due to the advantage
of SIFT in dealing with image transformations. Recently, image representations
based on the convolutional neural network (CNN) have attracted increasing
interest in the community and demonstrated impressive performance. Given this
time of rapid evolution, this article provides a comprehensive survey of
instance retrieval over the last decade. Two broad categories, SIFT-based and
CNN-based methods, are presented. For the former, according to the codebook
size, we organize the literature into using large/medium-sized/small codebooks.
For the latter, we discuss three lines of methods, i.e., using pre-trained or
fine-tuned CNN models, and hybrid methods. The first two perform a single-pass
of an image to the network, while the last category employs a patch-based
feature extraction scheme. This survey presents milestones in modern instance
retrieval, reviews a broad selection of previous works in different categories,
and provides insights on the connection between SIFT and CNN-based methods.
After analyzing and comparing retrieval performance of different categories on
several datasets, we discuss promising directions towards generic and
specialized instance retrieval.
Authors' comments: Accepted to IEEE Transactions on Pattern Analysis and Machine
Intelligence
Sonal Goel, Niharika Sachdeva, Ponnurangam Kumaraguru, A V Subramanyam, Divam Gupta
First responders are increasingly using social media to identify and reduce crime for well-being and safety of the society. Images shared on social media hurting religious, political, communal and other sentiments of people, often instigate violence and create law & order situations in society. This results in the need for first responders to inspect the spread of such images and users propagating them on social media. In this paper, we present a comparison between different hand-crafted features and a Convolutional Neural Network (CNN) model to retrieve similar images, which outperforms state-of-art hand-crafted features. We propose an Open-Source-Intelligent (OSINT) real-time image search system, robust to retrieve modified images that allows first responders to analyze the current spread of images, sentiments floating and details of users propagating such content. The system also aids officials to save time of manually analyzing the content by reducing the search space on an average by 67%.
Jian Zhang, Yuxin Peng
Hashing methods have been widely used for efficient similarity retrieval on
large scale image database. Traditional hashing methods learn hash functions to
generate binary codes from hand-crafted features, which achieve limited
accuracy since the hand-crafted features cannot optimally represent the image
content and preserve the semantic similarity. Recently, several deep hashing
methods have shown better performance because the deep architectures generate
more discriminative feature representations. However, these deep hashing
methods are mainly designed for supervised scenarios, which only exploit the
semantic similarity information, but ignore the underlying data structures. In
this paper, we propose the semi-supervised deep hashing (SSDH) approach, to
perform more effective hash function learning by simultaneously preserving
semantic similarity and underlying data structures. The main contributions are
as follows: (1) We propose a semi-supervised loss to jointly minimize the
empirical error on labeled data, as well as the embedding error on both labeled
and unlabeled data, which can preserve the semantic similarity and capture the
meaningful neighbors on the underlying data structures for effective hashing.
(2) A semi-supervised deep hashing network is designed to extensively exploit
both labeled and unlabeled data, in which we propose an online graph
construction method to benefit from the evolving deep features during training
to better capture semantic neighbors. To the best of our knowledge, the
proposed deep network is the first deep hashing method that can perform hash
code learning and feature learning simultaneously in a semi-supervised fashion.
Experimental results on 5 widely-used datasets show that our proposed approach
outperforms the state-of-the-art hashing methods.
Authors' comments: 14 pages, accepted by IEEE Transactions on Circuits and Systems for
Video Technology
Ji Li, Tie Zhou
Wavefront phase retrieval from a set of intensity measurements can be
formulated as an optimization problem. Two nonconvex objective models (MLP and
its variants LS) based on maximum likelihood estimation are investigated. We
develop numerical optimization algorithms for real-valued function of complex
variables and apply them to solve the wavefront phase retrieval problem
efficiently. Numerical simulation is given with application to three wavefront
phase retrieval problems. LS model shows better numerical performances than MLP
model. An explanation for this is that the distribution of the eigenvalues of
Hessian matrix of LS model is more clustered than MLP model. LBFGS shows more
robust performance and takes fewer calculations than other line search methods.
Authors' comments: 23 pages, 6 figures, submitting to Inverse Problems and Imaging
Ji Li, Tie Zhou
In this paper, we study the generalized phase retrieval problem: to recover a
signal $\bm{x}\in\mathbb{C}^n$ from the measurements $y_r=\lvert
\langle\bm{a}_r,\bm{x}\rangle\rvert^2$, $r=1,2,\ldots,m$. The problem can be
reformulated as a least-squares minimization problem. Although the cost
function is nonconvex, the global convergence of gradient descent algorithm
from a random initialization is studied, when $m$ is large enough. We improve
the known result of the local convergence from a spectral initialization. When
the signal $\bm{x}$ is real-valued, we prove that the cost function is local
convex near the solution $\{\pm\bm{x}\}$. To accelerate the gradient descent,
we review and apply several efficient line search methods. We also perform a
comparative numerical study of the line search methods and the alternative
projection method. Numerical simulations demonstrate the superior ability of
LBFGS algorithm than other algorithms.
Authors' comments: 14 pages, 14 figures
Le Dong, Xiuyuan Chen, Mengdie Mao, Qianni Zhang
This paper proposes a classification network to image semantic retrieval
(NIST) framework to counter the image retrieval challenge. Our approach
leverages the successful classification network GoogleNet based on
Convolutional Neural Networks to obtain the semantic feature matrix which
contains the serial number of classes and corresponding probabilities. Compared
with traditional image retrieval using feature matching to compute the
similarity between two images, NIST leverages the semantic information to
construct semantic feature matrix and uses the semantic distance algorithm to
compute the similarity. Besides, the fusion strategy can significantly reduce
storage and time consumption due to less classes participating in the last
semantic distance computation. Experiments demonstrate that our NIST framework
produces state-of-the-art results in retrieval experiments on MIRFLICKR-25K
dataset.
Authors' comments: 4 pages
Elaheh Mohammadi, Alireza Fallah, Farokh Marvasti
Consider a continuous signal that cannot be observed directly. Instead, one
has access to multiple corrupted versions of the signal. The available
corrupted signals are correlated because they carry information about the
common remote signal. The goal is to reconstruct the original signal from the
data collected from its corrupted versions. The information theoretic
formulation of the remote reconstruction problem assumes that the corrupted
signals are uniformly sampled and the focus is on optimal compression of the
samples. In this paper we revisit this problem from a sampling perspective. We
look at the problem of finding the best sampling locations for each signal to
minimize the total reconstruction distortion of the remote signal. In finding
the sampling locations, one can take advantage of the correlation among the
corrupted signals. Our main contribution is a fundamental lower bound on the
reconstruction distortion for any arbitrary nonuniform sampling strategy. This
lower bound is valid for any sampling rate. Furthermore, it is tight and
matches the optimal reconstruction distortion in low and high sampling rates.
Moreover, it is shown that in the low sampling rate region, it is optimal to
use a certain nonuniform sampling scheme on all the signals. On the other hand,
in the high sampling rate region, it is optimal to uniformly sample all the
signals. We also consider the problem of finding the optimal sampling locations
to recover the set of corrupted signals, rather than the remote signal. Unlike
the information theoretic formulation of the problem in which these two
problems were equivalent, we show that they are not equivalent in our setting.
Authors' comments: Under review
Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, Xueqi Cheng
Deep neural networks have been successfully applied to many text matching
tasks, such as paraphrase identification, question answering, and machine
translation. Although ad-hoc retrieval can also be formalized as a text
matching task, few deep models have been tested on it. In this paper, we study
a state-of-the-art deep matching model, namely MatchPyramid, on the ad-hoc
retrieval task. The MatchPyramid model employs a convolutional neural network
over the interactions between query and document to produce the matching score.
We conducted extensive experiments to study the impact of different pooling
sizes, interaction functions and kernel sizes on the retrieval performance.
Finally, we show that the MatchPyramid models can significantly outperform
several recently introduced deep matching models on the retrieval task, but
still cannot compete with the traditional retrieval models, such as BM25 and
language models.
Authors' comments: Neu-IR '16 SIGIR Workshop on Neural Information Retrieval