Sailesh Conjeti, Anees Kazi, Nassir Navab, Amin Katouzian
This paper presents a new scalable algorithm for cross-modal similarity preserving retrieval in a learnt manifold space. Unlike existing approaches that compromise between preserving global and local geometries, the proposed technique respects both simultaneously during manifold alignment. The global topologies are maintained by recovering underlying mapping functions in the joint manifold space by deploying partially corresponding instances. The inter-, and intra-modality affinity matrices are then computed to reinforce original data skeleton using perturbed minimum spanning tree (pMST), and maximizing the affinity among similar cross-modal instances, respectively. The performance of proposed algorithm is evaluated upon two multimodal image datasets (coronary atherosclerosis histology and brain MRI) for two applications: classification, and regression. Our exhaustive validations and results demonstrate the superiority of our technique over comparative methods and its feasibility for improving computer-assisted diagnosis systems, where disease-specific complementary information shall be aggregated and interpreted across modalities to form the final decision.
Matthias Dorfer, Andreas Arzt, Gerhard Widmer
This paper demonstrates the feasibility of learning to retrieve short
snippets of sheet music (images) when given a short query excerpt of music
(audio) -- and vice versa --, without any symbolic representation of music or
scores. This would be highly useful in many content-based musical retrieval
scenarios. Our approach is based on Deep Canonical Correlation Analysis (DCCA)
and learns correlated latent spaces allowing for cross-modality retrieval in
both directions. Initial experiments with relatively simple monophonic music
show promising results.
Authors' comments: In NIPS 2016 End-to-end Learning for Speech and Audio Processing
Workshop, Barcelona, Spain
Ivan Maric
A heuristic procedure based on novel recursive formulation of sinusoid (RFS)
and on regression with predictive least-squares (LS) enables to decompose both
uniformly and nonuniformly sampled 1-d signals into a sparse set of sinusoids
(SSS). An optimal SSS is found by Levenberg-Marquardt (LM) optimization of RFS
parameters of near-optimal sinusoids combined with common criteria for the
estimation of the number of sinusoids embedded in noise. The procedure
estimates both the cardinality and the parameters of SSS. The proposed
algorithm enables to identify the RFS parameters of a sinusoid from a data
sequence containing only a fraction of its cycle. In extreme cases when the
frequency of a sinusoid approaches zero the algorithm is able to detect a
linear trend in data. Also, an irregular sampling pattern enables the algorithm
to correctly reconstruct the under-sampled sinusoid. Parsimonious nature of the
obtaining models opens the possibilities of using the proposed method in
machine learning and in expert and intelligent systems needing analysis and
simple representation of 1-d signals. The properties of the proposed algorithm
are evaluated on examples of irregularly sampled artificial signals in noise
and are compared with high accuracy frequency estimation algorithms based on
linear prediction (LP) approach, particularly with respect to Cramer-Rao Bound
(CRB).
Authors' comments: 16 pages, 17 figures, Expert Systems with Applications - published
Irène Waldspurger
We consider a phase retrieval problem, where we want to reconstruct a
$n$-dimensional vector from its phaseless scalar products with $m$ sensing
vectors, independently sampled from complex normal distributions. We show that,
with a suitable initalization procedure, the classical algorithm of alternating
projections succeeds with high probability when $m\geq Cn$, for some $C>0$. We
conjecture that this result is still true when no special initialization
procedure is used, and present numerical experiments that support this
conjecture.
Authors' comments: Short version of arXiv:1609.03088v1
Yuping Duan, Chunlin Wu, Zhi-Feng Pang, Huibin Chang
We study the problem of recovering the underlining sparse signals from clean
or noisy phaseless measurements. Due to the sparse prior of signals, we adopt
an L0regularized variational model to ensure only a small number of nonzero
elements being recovered in the signal and two different formulations are
established in the modeling based on the choices of data fidelity, i.e., L2and
L1norms. We also propose efficient algorithms based on the Alternating
Direction Method of Multipliers (ADMM) with convergence guarantee and nearly
optimal computational complexity. Thanks to the existence of closed-form
solutions to all subproblems, the proposed algorithm is very efficient with low
computational cost in each iteration. Numerous experiments show that our
proposed methods can recover sparse signals from phaseless measurements with
higher successful recovery rates and lower computation cost compared with the
state-of-art methods.
Authors' comments: 11 pages
Jian Zhang, Yuxin Peng
Hashing methods have attracted much attention for large scale image
retrieval. Some deep hashing methods have achieved promising results by taking
advantage of the strong representation power of deep networks recently.
However, existing deep hashing methods treat all hash bits equally. On one
hand, a large number of images share the same distance to a query image due to
the discrete Hamming distance, which raises a critical issue of image retrieval
where fine-grained rankings are very important. On the other hand, different
hash bits actually contribute to the image retrieval differently, and treating
them equally greatly affects the retrieval accuracy of image. To address the
above two problems, we propose the query-adaptive deep weighted hashing (QaDWH)
approach, which can perform fine-grained ranking for different queries by
weighted Hamming distance. First, a novel deep hashing network is proposed to
learn the hash codes and corresponding class-wise weights jointly, so that the
learned weights can reflect the importance of different hash bits for different
image classes. Second, a query-adaptive image retrieval method is proposed,
which rapidly generates hash bit weights for different query images by fusing
its semantic probability and the learned class-wise weights. Fine-grained image
retrieval is then performed by the weighted Hamming distance, which can provide
more accurate ranking than the traditional Hamming distance. Experiments on
four widely used datasets show that the proposed approach outperforms eight
state-of-the-art hashing methods.
Authors' comments: 13 pages, submitted to IEEE Transactions On Multimedia
Donghun Ryu, Zihao Wang, Kuan He, Roarke Horstmeyer, Oliver Cossairt
On-chip holographic video is a convenient way to monitor biological samples simultaneously at high spatial resolution and over a wide field-of-view. However, due to the limited readout rate of digital detector arrays, one often faces a tradeoff between the per-frame pixel count and frame rate of the captured video. In this report, we propose a subsampled phase retrieval (SPR) algorithm to overcome the spatial-temporal trade-off in holographic video. Compared to traditional phase retrieval approaches, our SPR algorithm uses over an order of magnitude less pixel measurements while maintaining suitable reconstruction quality. We use an on-chip holographic video setup with pixel sub-sampling to experimentally demonstrate a factor of 5.5 increase in sensor frame rate while monitoring the in vivo movement of Peranema microorganisms.
Ruicong Xu, Yang Yang, Yadan Luo, Fumin Shen, Zi Huang, Heng Tao Shen
The query-by-image video retrieval (QBIVR) task has been attracting considerable research attention recently. However, most existing methods represent a video by either aggregating or projecting all its frames into a single datum point, which may easily cause severe information loss. In this paper, we propose an efficient QBIVR framework to enable an effective and efficient video search with image query. We first define a similarity-preserving distance metric between an image and its orthogonal projection in the subspace of the video, which can be equivalently transformed to a Maximum Inner Product Search (MIPS) problem. Besides, to boost the efficiency of solving the MIPS problem, we propose two asymmetric hashing schemes, which bridge the domain gap of images and videos. The first approach, termed Inner-product Binary Coding (IBC), preserves the inner relationships of images and videos in a common Hamming space. To further improve the retrieval efficiency, we devise a Bilinear Binary Coding (BBC) approach, which employs compact bilinear projections instead of a single large projection matrix. Extensive experiments have been conducted on four real-world video datasets to verify the effectiveness of our proposed approaches as compared to the state-of-the-arts.
Alexandr Notchenko, Ermek Kapushev, Evgeny Burnaev
In this paper we present results of performance evaluation of S3DCNN - a
Sparse 3D Convolutional Neural Network - on a large-scale 3D Shape benchmark
ModelNet40, and measure how it is impacted by voxel resolution of input shape.
We demonstrate comparable classification and retrieval performance to
state-of-the-art models, but with much less computational costs in training and
inference phases. We also notice that benefits of higher input resolution can
be limited by an ability of a neural network to generalize high level features.
Authors' comments: 8 pages, 3 figures, 2 tables, accepted to 3D Deep Learning Workshop
at NIPS 2016
Ang Li, Jin Sun, Joe Yue-Hei Ng, Ruichi Yu, Vlad I. Morariu, Larry S. Davis
Spatial relationships between objects provide important information for
text-based image retrieval. As users are more likely to describe a scene from a
real world perspective, using 3D spatial relationships rather than 2D
relationships that assume a particular viewing direction, one of the main
challenges is to infer the 3D structure that bridges images with users' text
descriptions. However, direct inference of 3D structure from images requires
learning from large scale annotated data. Since interactions between objects
can be reduced to a limited set of atomic spatial relations in 3D, we study the
possibility of inferring 3D structure from a text description rather than an
image, applying physical relation models to synthesize holistic 3D abstract
object layouts satisfying the spatial constraints present in a textual
description. We present a generic framework for retrieving images from a
textual description of a scene by matching images with these generated abstract
object layouts. Images are ranked by matching object detection outputs
(bounding boxes) to 2D layout candidates (also represented by bounding boxes)
which are obtained by projecting the 3D scenes with sampled camera directions.
We validate our approach using public indoor scene datasets and show that our
method outperforms baselines built upon object occurrence histograms and
learned 2D pairwise relations.
Authors' comments: CVPR 2017
Karl Ni, Kyle Zaragoza, Charles Foster, Carmen Carrano, Barry Chen, Yonas Tesfaye, Alex Gude
Traditional image tagging and retrieval algorithms have limited value as a result of being trained with heavily curated datasets. These limitations are most evident when arbitrary search words are used that do not intersect with training set labels. Weak labels from user generated content (UGC) found in the wild (e.g., Google Photos, FlickR, etc.) have an almost unlimited number of unique words in the metadata tags. Prior work on word embeddings successfully leveraged unstructured text with large vocabularies, and our proposed method seeks to apply similar cost functions to open source imagery. Specifically, we train a deep learning image tagging and retrieval system on large scale, user generated content (UGC) using sampling methods and joint optimization of word embeddings. By using the Yahoo! FlickR Creative Commons (YFCC100M) dataset, such an approach builds robustness to common unstructured data issues that include but are not limited to irrelevant tags, misspellings, multiple languages, polysemy, and tag imbalance. As a result, the final proposed algorithm will not only yield comparable results to state of the art in conventional image tagging, but will enable new capability to train algorithms on large, scale unstructured text in the YFCC100M dataset and outperform cited work in zero-shot capability.
Hua Sun, Syed A. Jafar
The capacity has recently been characterized for the private information retrieval (PIR) problem as well as several of its variants. In every case it is assumed that all the queries are generated by the user simultaneously. Here we consider multiround PIR, where the queries in each round are allowed to depend on the answers received in previous rounds. We show that the capacity of multiround PIR is the same as the capacity of single-round PIR (the result is generalized to also include $T$-privacy constraints). Combined with previous results, this shows that there is no capacity advantage from multiround over single-round schemes, non-linear over linear schemes or from $\epsilon$-error over zero-error schemes. However, we show through an example that there is an advantage in terms of storage overhead. We provide an example of a multiround, non-linear, $\epsilon$-error PIR scheme that requires a strictly smaller storage overhead than the best possible with single-round, linear, zero-error PIR schemes.
Ragnar Freij-Hollanti, Oliver Gnilke, Camilla Hollanti, David Karpuk
We present a general framework for Private Information Retrieval (PIR) from arbitrary coded databases, that allows one to adjust the rate of the scheme according to the suspected number of colluding servers. If the storage code is a generalized Reed-Solomon code of length n and dimension k, we design PIR schemes which simultaneously protect against t colluding servers and provide PIR rate 1-(k+t-1)/n, for all t between 1 and n-k. This interpolates between the previously studied cases of t=1 and k=1 and asymptotically achieves the known capacity bounds in both of these cases, as the size of the database grows.
Goetz E. Pfander, Palina Salanevich
We address the problem of signal reconstruction from intensity measurements
with respect to a measurement frame. This non-convex inverse problem is known
as phase retrieval. The case considered in this paper concerns phaseless
measurements taken with respect to a Gabor frame. It arises naturally in many
practical applications, such as diffraction imaging and speech recognition. We
present a reconstruction algorithm that uses a nearly optimal number of
phaseless time-frequency structured measurements and discuss its robustness in
the case when the measurements are corrupted by noise. We show how geometric
properties of the measurement frame are related to the robustness of the
phaseless reconstruction. The presented algorithm is based on the idea of
polarization as proposed by Alexeev, Bandeira, Fickus, and Mixon.
Authors' comments: 27 pager, 6 figures
Murtuza Ali Abidini, Onno Boxma, Bara Kim, Jeongsim Kim, Jacques Resing
We consider gated polling systems with two special features: (i) retrials, and (ii) glue or reservation periods. When a type-$i$ customer arrives, or retries, during a glue period of station $i$, it will be served in the next visit period of the server to that station. Customers arriving at station $i$ in any other period join the orbit of that station and retry after an exponentially distributed time. Such polling systems can be used to study the performance of certain switches in optical communication systems. For the case of exponentially distributed glue periods, we present an algorithm to obtain the moments of the number of customers in each station. For generally distributed glue periods, we consider the distribution of the total workload in the system, using it to derive a pseudo conservation law which in its turn is used to obtain accurate approximations of the individual mean waiting times. We also consider the problem of choosing the lengths of the glue periods, under a constraint on the total glue period per cycle, so as to minimize a weighted sum of the mean waiting times.
Miguel Miranda, Joao Penedones, Chen Guo, Anne Harth, Maite Louisy, Lana Neoricic, Anne L'Huillier, Cord L. Arnold
We present a retrieval algorithm based on generalized projections for ultrashort pulse characterization using dispersion scan (d-scan). The new algorithm is tested on several simulated cases and in two different experimental cases in the few-cycle regime. The proposed algorithm is much faster and leads to a drastic reduction of retrieval times, but performs less robust in the retrieval of noisy d-scan traces compared to the standard algorithm.
Qiwen Wang, Mikael Skoglund
A user wants to retrieve a file from a database without revealing the identity of the file retrieved at the database, which is known as the problem of private information retrieval (PIR). If it is further required that the user obtains no information about the database other than the desired file, the concept of symmetric private information retrieval (SPIR) is introduced to guarantee privacy for both parties. In this paper, the problem of SPIR is studied for a database stored among $N$ nodes in a distributed way, by using an $(N,M)$-MDS storage code. The information-theoretic capacity of SPIR, defined as the maximum number of symbols of the desired file retrieved per downloaded symbol, for the coded database is derived. It is shown that the SPIR capacity for coded database is $1-\frac{M}{N}$, when the amount of the shared common randomness of distributed nodes (unavailable at the user) is at least $\frac{M}{N-M}$ times the file size. Otherwise, the SPIR capacity for the coded database equals zero.
Sohail Bahmani, Justin Romberg
We propose a flexible convex relaxation for the phase retrieval problem that
operates in the natural domain of the signal. Therefore, we avoid the
prohibitive computational cost associated with "lifting" and semidefinite
programming (SDP) in methods such as PhaseLift and compete with recently
developed non-convex techniques for phase retrieval. We relax the quadratic
equations for phaseless measurements to inequality constraints each of which
representing a symmetric "slab". Through a simple convex program, our proposed
estimator finds an extreme point of the intersection of these slabs that is
best aligned with a given anchor vector. We characterize geometric conditions
that certify success of the proposed estimator. Furthermore, using classic
results in statistical learning theory, we show that for random measurements
the geometric certificates hold with high probability at an optimal sample
complexity. Phase transition of our estimator is evaluated through simulations.
Our numerical experiments also suggest that the proposed method can solve phase
retrieval problems with coded diffraction measurements as well.
Authors' comments: Accepted in AISTATS 2017. Extended the discussion of related work and
added a few more references. Clarified some of the statements and notations
J. Shane Culpepper, Charles L. A. Clarke, Jimmy Lin
Modern multi-stage retrieval systems are comprised of a candidate generation stage followed by one or more reranking stages. In such an architecture, the quality of the final ranked list may not be sensitive to the quality of initial candidate pool, especially in terms of early precision. This provides several opportunities to increase retrieval efficiency without significantly sacrificing effectiveness. In this paper, we explore a new approach to dynamically predicting two different parameters in the candidate generation stage which can directly affect the overall efficiency and effectiveness of the entire system. Previous work exploring this tradeoff has focused on global parameter settings that apply to all queries, even though optimal settings vary across queries. In contrast, we propose a technique which makes a parameter prediction that maximizes efficiency within a effectiveness envelope on a per query basis, using only static pre-retrieval features. The query-specific tradeoff point between effectiveness and efficiency is decided using a classifier cascade that weighs possible efficiency gains against effectiveness losses over a range of possible parameter cutoffs to make the prediction. The interesting twist in our new approach is to train classifiers without requiring explicit relevance judgments. We show that our framework is generalizable by applying it to two different retrieval parameters - selecting k in common top-k query retrieval algorithms, and setting a quality threshold, $\rho$, for score-at-a-time approximate query evaluation algorithms. Experimental results show that substantial efficiency gains are achievable depending on the dynamic parameter choice. In addition, our framework provides a versatile tool that can be used to estimate the effectiveness-efficiency tradeoffs that are possible before selecting and tuning algorithms to make machine learned predictions.
Yusuke Uchida, Shigeyuki Sakazawa, Shin'ichi Satoh
Recently, the Fisher vector representation of local features has attracted much attention because of its effectiveness in both image classification and image retrieval. Another trend in the area of image retrieval is the use of binary features such as ORB, FREAK, and BRISK. Considering the significant performance improvement for accuracy in both image classification and retrieval by the Fisher vector of continuous feature descriptors, if the Fisher vector were also to be applied to binary features, we would receive similar benefits in binary feature based image retrieval and classification. In this paper, we derive the closed-form approximation of the Fisher vector of binary features modeled by the Bernoulli mixture model. We also propose accelerating the Fisher vector by using the approximate value of posterior probability. Experiments show that the Fisher vector representation significantly improves the accuracy of image retrieval compared with a bag of binary words approach.