Judit Bar-Ilan, Rob Koopman, Shenghui Wang, Andrea Scharnhorst, Marcus John, Philipp Mayr, Dietmar Wolfram
This panel brings together experts in bibliometrics and information retrieval
to discuss how each of these two important areas of information science can
help to inform the research of the other. There is a growing body of literature
that capitalizes on the synergies created by combining methodological
approaches of each to solve research problems and practical issues related to
how information is created, stored, organized, retrieved and used. The session
will begin with an overview of the common threads that exist between IR and
metrics, followed by a summary of findings from the BIR workshops and examples
of research projects that combine aspects of each area to benefit IR or metrics
research areas, including search results ranking, semantic indexing and
visualization. The panel will conclude with an engaging discussion with the
audience to identify future areas of research and collaboration.
Authors' comments: 4 pages, accepted at the 2016 Annual Meeting of the Association for
Information Science and Technology (ASIST 2016) in Copenhagen
Cristian Reyes, Eva Mohedano, Kevin McGuinness, Noel E. O'Connor, Xavier Giro-i-Nieto
This work presents a retrieval pipeline and evaluation scheme for the problem
of finding the last appearance of personal objects in a large dataset of images
captured from a wearable camera. Each personal object is modelled by a small
set of images that define a query for a visual search engine.The retrieved
results are reranked considering the temporal timestamps of the images to
increase the relevance of the later detections. Finally, a temporal
interleaving of the results is introduced for robustness against false
detections. The Mean Reciprocal Rank is proposed as a metric to evaluate this
problem. This application could help into developing personal assistants
capable of helping users when they do not remember where they left their
personal belongings.
Authors' comments: Lifelogging Tools and Applications Workshop (LTA'16) at ACM
Multimedia 2016
Bing Gao, Qiyu Sun, Yang Wang, Zhiqiang Xu
In this paper, we consider the phase retrieval problem in which one aims to
recover a signal from the magnitudes of affine measurements. Let $\{{\mathbf
a}_j\}_{j=1}^m \subset {\mathbb H}^d$ and ${\mathbf b}=(b_1, \ldots,
b_m)^\top\in{\mathbb H}^m$, where ${\mathbb H}={\mathbb R}$ or ${\mathbb C}$.
We say $\{{\mathbf a}_j\}_{j=1}^m$ and $\mathbf b$ are affine phase retrievable
for ${\mathbb H}^d$ if any ${\mathbf x}\in{\mathbb H}^d$ can be recovered from
the magnitudes of the affine measurements $\{|<{\mathbf a}_j,{\mathbf
x}>+b_j|,\, 1\leq j\leq m\}$. We develop general framework for affine phase
retrieval and prove necessary and sufficient conditions for $\{{\mathbf
a}_j\}_{j=1}^m$ and $\mathbf b$ to be affine phase retrievable. We establish
results on minimal measurements and generic measurements for affine phase
retrieval as well as on sparse affine phase retrieval. In particular, we also
highlight some notable differences between affine phase retrieval and the
standard phase retrieval in which one aims to recover a signal $\mathbf x$ from
the magnitudes of its linear measurements. In standard phase retrieval, one can
only recover $\mathbf x$ up to a unimodular constant, while affine phase
retrieval removes this ambiguity. We prove that unlike standard phase
retrieval, the affine phase retrievable measurements $\{{\mathbf
a}_j\}_{j=1}^m$ and $\mathbf b$ do not form an open set in ${\mathbb
H}^{m\times d}\times {\mathbb H}^m$. Also in the complex setting, the standard
phase retrieval requires $4d-O(\log_2d)$ measurements, while the affine phase
retrieval only needs $m=3d$ measurements.
Authors' comments: 17 pages
Liang Zheng, Yi Yang, Qi Tian
In the early days, content-based image retrieval (CBIR) was studied with
global features. Since 2003, image retrieval based on local descriptors (de
facto SIFT) has been extensively studied for over a decade due to the advantage
of SIFT in dealing with image transformations. Recently, image representations
based on the convolutional neural network (CNN) have attracted increasing
interest in the community and demonstrated impressive performance. Given this
time of rapid evolution, this article provides a comprehensive survey of
instance retrieval over the last decade. Two broad categories, SIFT-based and
CNN-based methods, are presented. For the former, according to the codebook
size, we organize the literature into using large/medium-sized/small codebooks.
For the latter, we discuss three lines of methods, i.e., using pre-trained or
fine-tuned CNN models, and hybrid methods. The first two perform a single-pass
of an image to the network, while the last category employs a patch-based
feature extraction scheme. This survey presents milestones in modern instance
retrieval, reviews a broad selection of previous works in different categories,
and provides insights on the connection between SIFT and CNN-based methods.
After analyzing and comparing retrieval performance of different categories on
several datasets, we discuss promising directions towards generic and
specialized instance retrieval.
Authors' comments: Accepted to IEEE Transactions on Pattern Analysis and Machine
Intelligence
Sonal Goel, Niharika Sachdeva, Ponnurangam Kumaraguru, A V Subramanyam, Divam Gupta
First responders are increasingly using social media to identify and reduce crime for well-being and safety of the society. Images shared on social media hurting religious, political, communal and other sentiments of people, often instigate violence and create law & order situations in society. This results in the need for first responders to inspect the spread of such images and users propagating them on social media. In this paper, we present a comparison between different hand-crafted features and a Convolutional Neural Network (CNN) model to retrieve similar images, which outperforms state-of-art hand-crafted features. We propose an Open-Source-Intelligent (OSINT) real-time image search system, robust to retrieve modified images that allows first responders to analyze the current spread of images, sentiments floating and details of users propagating such content. The system also aids officials to save time of manually analyzing the content by reducing the search space on an average by 67%.
Jian Zhang, Yuxin Peng
Hashing methods have been widely used for efficient similarity retrieval on
large scale image database. Traditional hashing methods learn hash functions to
generate binary codes from hand-crafted features, which achieve limited
accuracy since the hand-crafted features cannot optimally represent the image
content and preserve the semantic similarity. Recently, several deep hashing
methods have shown better performance because the deep architectures generate
more discriminative feature representations. However, these deep hashing
methods are mainly designed for supervised scenarios, which only exploit the
semantic similarity information, but ignore the underlying data structures. In
this paper, we propose the semi-supervised deep hashing (SSDH) approach, to
perform more effective hash function learning by simultaneously preserving
semantic similarity and underlying data structures. The main contributions are
as follows: (1) We propose a semi-supervised loss to jointly minimize the
empirical error on labeled data, as well as the embedding error on both labeled
and unlabeled data, which can preserve the semantic similarity and capture the
meaningful neighbors on the underlying data structures for effective hashing.
(2) A semi-supervised deep hashing network is designed to extensively exploit
both labeled and unlabeled data, in which we propose an online graph
construction method to benefit from the evolving deep features during training
to better capture semantic neighbors. To the best of our knowledge, the
proposed deep network is the first deep hashing method that can perform hash
code learning and feature learning simultaneously in a semi-supervised fashion.
Experimental results on 5 widely-used datasets show that our proposed approach
outperforms the state-of-the-art hashing methods.
Authors' comments: 14 pages, accepted by IEEE Transactions on Circuits and Systems for
Video Technology
Ji Li, Tie Zhou
Wavefront phase retrieval from a set of intensity measurements can be
formulated as an optimization problem. Two nonconvex objective models (MLP and
its variants LS) based on maximum likelihood estimation are investigated. We
develop numerical optimization algorithms for real-valued function of complex
variables and apply them to solve the wavefront phase retrieval problem
efficiently. Numerical simulation is given with application to three wavefront
phase retrieval problems. LS model shows better numerical performances than MLP
model. An explanation for this is that the distribution of the eigenvalues of
Hessian matrix of LS model is more clustered than MLP model. LBFGS shows more
robust performance and takes fewer calculations than other line search methods.
Authors' comments: 23 pages, 6 figures, submitting to Inverse Problems and Imaging
Ji Li, Tie Zhou
In this paper, we study the generalized phase retrieval problem: to recover a
signal $\bm{x}\in\mathbb{C}^n$ from the measurements $y_r=\lvert
\langle\bm{a}_r,\bm{x}\rangle\rvert^2$, $r=1,2,\ldots,m$. The problem can be
reformulated as a least-squares minimization problem. Although the cost
function is nonconvex, the global convergence of gradient descent algorithm
from a random initialization is studied, when $m$ is large enough. We improve
the known result of the local convergence from a spectral initialization. When
the signal $\bm{x}$ is real-valued, we prove that the cost function is local
convex near the solution $\{\pm\bm{x}\}$. To accelerate the gradient descent,
we review and apply several efficient line search methods. We also perform a
comparative numerical study of the line search methods and the alternative
projection method. Numerical simulations demonstrate the superior ability of
LBFGS algorithm than other algorithms.
Authors' comments: 14 pages, 14 figures
Le Dong, Xiuyuan Chen, Mengdie Mao, Qianni Zhang
This paper proposes a classification network to image semantic retrieval
(NIST) framework to counter the image retrieval challenge. Our approach
leverages the successful classification network GoogleNet based on
Convolutional Neural Networks to obtain the semantic feature matrix which
contains the serial number of classes and corresponding probabilities. Compared
with traditional image retrieval using feature matching to compute the
similarity between two images, NIST leverages the semantic information to
construct semantic feature matrix and uses the semantic distance algorithm to
compute the similarity. Besides, the fusion strategy can significantly reduce
storage and time consumption due to less classes participating in the last
semantic distance computation. Experiments demonstrate that our NIST framework
produces state-of-the-art results in retrieval experiments on MIRFLICKR-25K
dataset.
Authors' comments: 4 pages
Elaheh Mohammadi, Alireza Fallah, Farokh Marvasti
Consider a continuous signal that cannot be observed directly. Instead, one
has access to multiple corrupted versions of the signal. The available
corrupted signals are correlated because they carry information about the
common remote signal. The goal is to reconstruct the original signal from the
data collected from its corrupted versions. The information theoretic
formulation of the remote reconstruction problem assumes that the corrupted
signals are uniformly sampled and the focus is on optimal compression of the
samples. In this paper we revisit this problem from a sampling perspective. We
look at the problem of finding the best sampling locations for each signal to
minimize the total reconstruction distortion of the remote signal. In finding
the sampling locations, one can take advantage of the correlation among the
corrupted signals. Our main contribution is a fundamental lower bound on the
reconstruction distortion for any arbitrary nonuniform sampling strategy. This
lower bound is valid for any sampling rate. Furthermore, it is tight and
matches the optimal reconstruction distortion in low and high sampling rates.
Moreover, it is shown that in the low sampling rate region, it is optimal to
use a certain nonuniform sampling scheme on all the signals. On the other hand,
in the high sampling rate region, it is optimal to uniformly sample all the
signals. We also consider the problem of finding the optimal sampling locations
to recover the set of corrupted signals, rather than the remote signal. Unlike
the information theoretic formulation of the problem in which these two
problems were equivalent, we show that they are not equivalent in our setting.
Authors' comments: Under review
Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, Xueqi Cheng
Deep neural networks have been successfully applied to many text matching
tasks, such as paraphrase identification, question answering, and machine
translation. Although ad-hoc retrieval can also be formalized as a text
matching task, few deep models have been tested on it. In this paper, we study
a state-of-the-art deep matching model, namely MatchPyramid, on the ad-hoc
retrieval task. The MatchPyramid model employs a convolutional neural network
over the interactions between query and document to produce the matching score.
We conducted extensive experiments to study the impact of different pooling
sizes, interaction functions and kernel sizes on the retrieval performance.
Finally, we show that the MatchPyramid models can significantly outperform
several recently introduced deep matching models on the retrieval task, but
still cannot compete with the traditional retrieval models, such as BM25 and
language models.
Authors' comments: Neu-IR '16 SIGIR Workshop on Neural Information Retrieval
Bing Gao
This article presents new results concerning the recovery of a signal from
magnitude only measurements where the signal is not sparse in an orthonormal
basis but in a redundant dictionary. To solve this phaseless problem, we
analyze the $ \ell_1 $-analysis model. Firstly we investigate the noiseless
case with presenting a null space property of the measurement matrix under
which the $ \ell_1 $-analysis model provide an exact recovery. Secondly we
introduce a new property (S-DRIP) of the measurement matrix. By solving the $
\ell_1 $-analysis model, we prove that this property can guarantee a stable
recovery of real signals that are nearly sparse in highly overcomplete
dictionaries.
Authors' comments: 16 pages
Dong Yin, Kangwook Lee, Ramtin Pedarsani, Kannan Ramchandran
In this paper, we tackle the compressive phase retrieval problem in the presence of noise. The noisy compressive phase retrieval problem is to recover a $K$-sparse complex signal $s \in \mathbb{C}^n$, from a set of $m$ noisy quadratic measurements: $ y_i=| a_i^H s |^2+w_i$, where $a_i^H\in\mathbb{C}^n$ is the $i$th row of the measurement matrix $A\in\mathbb{C}^{m\times n}$, and $w_i$ is the additive noise to the $i$th measurement. We consider the regime where $K=\beta n^\delta$, with constants $\beta>0$ and $\delta\in(0,1)$. We use the architecture of PhaseCode algorithm, and robustify it using two schemes: the almost-linear scheme and the sublinear scheme. We prove that with high probability, the almost-linear scheme recovers $s$ with sample complexity $\Theta(K \log(n))$ and computational complexity $\Theta(n \log(n))$, and the sublinear scheme recovers $s$ with sample complexity $\Theta(K\log^3(n))$ and computational complexity $\Theta(K\log^3(n))$. To the best of our knowledge, this is the first scheme that achieves sublinear computational complexity for compressive phase retrieval problem. Finally, we provide simulation results that support our theoretical contributions.
Yang Wang, Zhiqiang Xu
In this paper, we develop a framework of generalized phase retrieval in which
one aims to reconstruct a vector ${\mathbf x}$ in ${\mathbb R}^d$ or ${\mathbb
C}^d$ through quadratic samples ${\mathbf x}^*A_1{\mathbf x}, \dots, {\mathbf
x}^*A_N{\mathbf x}$. The generalized phase retrieval includes as special cases
the standard phase retrieval as well as the phase retrieval by orthogonal
projections. We first explore the connections among generalized phase
retrieval, low-rank matrix recovery and nonsingular bilinear form. Motivated by
the connections, we present results on the minimal measurement number needed
for recovering a matrix that lies in a set $W\in {\mathbb C}^{d\times d}$.
Applying the results to phase retrieval, we show that generic $d \times d$
matrices $A_1,\ldots, A_N$ have the phase retrieval property if $N\geq 2d-1$ in
the real case and $N \geq 4d-4$ in the complex case for very general classes of
$A_1,\ldots,A_N$, e.g. matrices with prescribed ranks or orthogonal
projections. Our method also leads to a novel proof for the classical
Stiefel-Hopf condition on nonsingular bilinear form. We also give lower bounds
on the minimal measurement number required for generalized phase retrieval. For
several classes of dimensions $d$ we obtain the precise values of the minimal
measurement number. Our work unifies and enhances results from the standard
phase retrieval, phase retrieval by projections and low-rank matrix recovery.
Authors' comments: 31 pages, add Theorem 5.2, 5.3
Lan Li, Cheng Cheng, Deguang Han, Qiyu Sun, Guangming Shi
In this paper, we introduce two symmetric directed graphs depending on supports of signals and windows, and we show that the connectivity of those graphs provides either necessary and sufficient conditions to phase retrieval of a signal from magnitude measurements of its multiple-window short-time Fourier transform. Also we propose an algebraic reconstruction algorithm, and provide an error estimate to our algorithm when magnitude measurements are corrupted by deterministic/random noises.
Hua Sun, Syed A. Jafar
Private information retrieval (PIR) is the problem of retrieving as efficiently as possible, one out of $K$ messages from $N$ non-communicating replicated databases (each holds all $K$ messages) while keeping the identity of the desired message index a secret from each individual database. The information theoretic capacity of PIR (equivalently, the reciprocal of minimum download cost) is the maximum number of bits of desired information that can be privately retrieved per bit of downloaded information. $T$-private PIR is a generalization of PIR to include the requirement that even if any $T$ of the $N$ databases collude, the identity of the retrieved message remains completely unknown to them. Robust PIR is another generalization that refers to the scenario where we have $M \geq N$ databases, out of which any $M - N$ may fail to respond. For $K$ messages and $M\geq N$ databases out of which at least some $N$ must respond, we show that the capacity of $T$-private and Robust PIR is $\left(1+T/N+T^2/N^2+\cdots+T^{K-1}/N^{K-1}\right)^{-1}$. The result includes as special cases the capacity of PIR without robustness ($M=N$) or $T$-privacy constraints ($T=1$).
Yu Wu, Wei Wu, Zhoujun Li, Ming Zhou
We consider incorporating topic information into message-response matching to
boost responses with rich content in retrieval-based chatbots. To this end, we
propose a topic-aware convolutional neural tensor network (TACNTN). In TACNTN,
matching between a message and a response is not only conducted between a
message vector and a response vector generated by convolutional neural
networks, but also leverages extra topic information encoded in two topic
vectors. The two topic vectors are linear combinations of topic words of the
message and the response respectively, where the topic words are obtained from
a pre-trained LDA model and their weights are determined by themselves as well
as the message vector and the response vector. The message vector, the response
vector, and the two topic vectors are fed to neural tensors to calculate a
matching score. Empirical study on a public data set and a human annotated data
set shows that TACNTN can significantly outperform state-of-the-art methods for
message-response matching.
Authors' comments: under reviewed of AAAI 2017
Andre Araujo, Jason Chaves, Haricharan Lakshman, Roland Angst, Bernd Girod
We consider the problem of using image queries to retrieve videos from a database. Our focus is on large-scale applications, where it is infeasible to index each database video frame independently. Our main contribution is a framework based on Bloom filters, which can be used to index long video segments, enabling efficient image-to-video comparisons. Using this framework, we investigate several retrieval architectures, by considering different types of aggregation and different functions to encode visual information -- these play a crucial role in achieving high performance. Extensive experiments show that the proposed technique improves mean average precision by 24% on a public dataset, while being 4X faster, compared to the previous state-of-the-art.
Xiu-Shen Wei, Jian-Hao Luo, Jianxin Wu, Zhi-Hua Zhou
Deep convolutional neural network models pre-trained for the ImageNet
classification task have been successfully adopted to tasks in other domains,
such as texture description and object proposal generation, but these tasks
require annotations for images in the new domain. In this paper, we focus on a
novel and challenging task in the pure unsupervised setting: fine-grained image
retrieval. Even with image labels, fine-grained images are difficult to
classify, let alone the unsupervised retrieval task. We propose the Selective
Convolutional Descriptor Aggregation (SCDA) method. SCDA firstly localizes the
main object in fine-grained images, a step that discards the noisy background
and keeps useful deep descriptors. The selected descriptors are then aggregated
and dimensionality reduced into a short feature vector using the best practices
we found. SCDA is unsupervised, using no image label or bounding box
annotation. Experiments on six fine-grained datasets confirm the effectiveness
of SCDA for fine-grained image retrieval. Besides, visualization of the SCDA
features shows that they correspond to visual attributes (even subtle ones),
which might explain SCDA's high mean average precision in fine-grained
retrieval. Moreover, on general image retrieval datasets, SCDA achieves
comparable retrieval results with state-of-the-art general image retrieval
approaches.
Authors' comments: IEEE Transactions on Image Processing (TIP), 2017, 26(6): 2868-2881
Shujin Zhu, H. R. Tizhoosh
For more than two decades, research has been performed on content-based image
retrieval (CBIR). By combining Radon projections and the support vector
machines (SVM), a content-based medical image retrieval method is presented in
this work. The proposed approach employs the normalized Radon projections with
corresponding image category labels to build an SVM classifier, and the Radon
barcode database which encodes every image in a binary format is also generated
simultaneously to tag all images. To retrieve similar images when a query image
is given, Radon projections and the barcode of the query image are generated.
Subsequently, the k-nearest neighbor search method is applied to find the
images with minimum Hamming distance of the Radon barcode within the same class
predicted by the trained SVM classifier that uses Radon features. The
performance of the proposed method is validated by using the IRMA 2009 dataset
with 14,410 x-ray images in 57 categories. The results demonstrate that our
method has the capacity to retrieve similar responses for the correctly
identified query image and even for those mistakenly classified by SVM. The
approach further is very fast and has low memory requirement.
Authors' comments: To appear in proceedings of The 2016 IEEE International Joint
Conference on Neural Networks (IJCNN 2016), July 24-29, 2016, Vancouver,
Canada