Meng Huang, Yi Rong, Yang Wang, Zhiqiang Xu
The aim of generalized phase retrieval is to recover $\mathbf{x}\in
\mathbb{F}^d$ from the quadratic measurements
$\mathbf{x}^*A_1\mathbf{x},\ldots,\mathbf{x}^*A_N\mathbf{x}$, where $A_j\in
\mathbf{H}_d(\mathbb{F})$ and $\mathbb{F}=\mathbb{R}$ or $\mathbb{C}$. In this
paper, we study the matrix set $\mathcal{A}=(A_j)_{j=1}^N$ which has the almost
everywhere phase retrieval property. For the case $\mathbb{F}=\mathbb{R}$, we
show that $N\geq d+1$ generic matrices with prescribed ranks have the almost
everywhere phase retrieval property. We also extend this result to the case
where $A_1,\ldots,A_N$ are orthogonal matrices and hence establish the almost
everywhere phase retrieval property for the fusion frame phase retrieval. For
the case where $\mathbb{F}=\mathbb{C}$, we obtain similar results under the
assumption of $N\geq 2d$. We lower the measurement number $d+1$ (resp. $2d$)
by showing that there exist $N=d$ (resp. $2d-1$) matrices $A_1,\ldots, A_N\in
\mathbf{H}_d(\mathbb{R})$ (resp. $\mathbf{H}_d(\mathbb{C})$) which have the
almost everywhere phase retrieval property. Our results are an extension of
almost everywhere phase retrieval from the standard phase retrieval to the
general setting, and the proofs are often based on some new ideas about the
determinant variety.
Authors' comments: 27 pages
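To make the measurement model concrete, the following is a minimal sketch for the real case $\mathbb{F}=\mathbb{R}$. The function name `quadratic_measurements` and the three example matrices are illustrative assumptions, not taken from the paper. It evaluates $\mathbf{x}^*A_j\mathbf{x}$ and shows that $\mathbf{x}$ and $-\mathbf{x}$ produce identical measurements, which is why recovery is only possible up to a global sign (a unimodular constant in the complex case).

```python
def quadratic_measurements(x, matrices):
    """Return [x^T A_j x] for a real vector x and real symmetric matrices A_j."""
    d = len(x)
    measurements = []
    for A in matrices:
        # Compute A x, then the inner product x^T (A x).
        Ax = [sum(A[i][k] * x[k] for k in range(d)) for i in range(d)]
        measurements.append(sum(x[i] * Ax[i] for i in range(d)))
    return measurements


# Hypothetical example with d = 2 and N = d + 1 = 3 symmetric matrices.
A1 = [[1.0, 0.0], [0.0, 0.0]]
A2 = [[0.0, 0.0], [0.0, 1.0]]
A3 = [[0.0, 1.0], [1.0, 0.0]]
x = [2.0, -3.0]

y = quadratic_measurements(x, [A1, A2, A3])
y_neg = quadratic_measurements([-v for v in x], [A1, A2, A3])
print(y, y_neg)  # both lists are identical: the global sign is lost
```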
Shahrzad Naseri, Sheikh Muhammad Sarwar, James Allan
A common approach for knowledge-base entity search is to consider an entity as a document with multiple fields. Models that focus on matching query terms in different fields are popular choices for searching such entity representations. An instance of such a model is FSDM (Fielded Sequential Dependence Model). We propose to integrate field-level semantic features into FSDM. We use FSDM to retrieve a pool of documents, and then use semantic field-level features to re-rank those documents. We propose to represent queries as bags of terms as well as bags of entities, and eventually use their dense vector representations to compute semantic features based on query-document similarity. Our proposed re-ranking approach achieves significant improvement in entity retrieval on the DBpedia-Entity (v2) dataset over the existing FSDM model. Specifically, for all queries we achieve 2.5% and 1.2% significant improvement in NDCG@10 and NDCG@100, respectively.
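The NDCG@k metric reported above can be sketched as follows. This is a minimal illustration under two assumptions: it uses the linear-gain form of DCG, and it computes the ideal ranking from the retrieved pool only, whereas a full evaluation would use all judged documents for the query. The relevance labels are made up for illustration and are not results from the paper.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded labels of a retrieved pool, before and after re-ranking.
initial_ranking = [1, 0, 3, 2]
reranked = [3, 2, 1, 0]

ndcg_initial = ndcg_at_k(initial_ranking, 10)
ndcg_reranked = ndcg_at_k(reranked, 10)
print(ndcg_initial, ndcg_reranked)
```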
Sho Maeoki, Kohei Uehara, Tatsuya Harada
Now that everyone can easily record videos, the quantity of which is continuously increasing, research on methods for improved video retrieval is important in the contemporary world. In cases where target videos are to be identified within a large collection gathered by individuals, the appropriate information must be obtained to retrieve the correct video within a large number of similar items in the target database. The purpose of this research is to retrieve target videos in such cases by introducing an interaction, or a dialog, between the system and the user. We propose a system to retrieve videos by asking questions about the content of the videos and leveraging the user's responses to the questions. Additionally, we confirmed the usefulness of the proposed system through experiments using the dataset called AVSD which includes videos and dialogs about the videos.
Da Li, Zhang Zhang
The Large-Scale Pedestrian Retrieval Competition (LSPRC) focuses on person retrieval, an important end application in intelligent vision systems for surveillance. Person retrieval aims at searching for a target of interest using specific visual attributes or images. Low image quality, varied camera viewpoints, large pose variations, and occlusions in real scenes make it a challenging problem. By providing large-scale surveillance data from real scenes and standard evaluation methods that are closer to real applications, the competition aims to improve the robustness of related algorithms and to better handle the complicated situations found in real applications. LSPRC includes two kinds of tasks, i.e., Attribute based Pedestrian Retrieval (PR-A) and Re-IDentification (ReID) based Pedestrian Retrieval (PR-ID). The standard evaluation index, i.e., mean Average Precision (mAP), is used to measure the performance of the two tasks under various scales, poses, and occlusions. In addition, a system evaluation method is introduced to evaluate the person retrieval system, in which the related algorithms of the two tasks are integrated into a large-scale video parsing platform (named ISEE) combined with a pedestrian detection algorithm.
Seyedehsara Nayer, Praneeth Narayanamurthy, Namrata Vaswani
We study the Low Rank Phase Retrieval (LRPR) problem defined as follows:
recover an $n \times q$ matrix $X^*$ of rank $r$ from a different and
independent set of $m$ phaseless (magnitude-only) linear projections of each of
its columns. To be precise, we need to recover $X^*$ from $y_k := |A_k{}'
x^*_k|, k=1,2,\dots, q$ when the measurement matrices $A_k$ are mutually
independent. Here $y_k$ is an $m$ length vector, $A_k$ is an $n \times m$
matrix, and $'$ denotes matrix transpose. The question is when can we solve
LRPR with $m \ll n$? A reliable solution can enable fast and low-cost phaseless
dynamic imaging, e.g., Fourier ptychographic imaging of live biological
specimens. In this work, we develop the first provably correct approach for
solving this LRPR problem. Our proposed algorithm, Alternating Minimization for
Low-Rank Phase Retrieval (AltMinLowRaP), is an AltMin based solution and hence
is also provably fast (converges geometrically). Our guarantee shows that
AltMinLowRaP solves LRPR to $\epsilon$ accuracy, with high probability, as long
as $m q \ge C n r^4 \log(1/\epsilon)$, the matrices $A_k$ contain i.i.d.
standard Gaussian entries, and the right singular vectors of $X^*$ satisfy the
incoherence assumption from matrix completion literature. Here $C$ is a
numerical constant that only depends on the condition number of $X^*$ and on
its incoherence parameter. Its time complexity is only $ C mq nr
\log^2(1/\epsilon)$. Since even the linear (with phase) version of the above
problem is not fully solved, the above result is also the first complete
solution and guarantee for the linear case. Finally, we also develop a simple
extension of our results for the dynamic LRPR setting.
Authors' comments: A short version of this work is in ICML 2019; this longer version was
published in IEEE Trans. Info. Th in March 2020. Fixing minor but important
errors in Lemmas 3.10, 3.11, 3.12 statements and in proof of the Term1 bound.
No change to Theorem statement
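The LRPR measurement model described above can be sketched as follows. This is a small real-valued simulation under stated assumptions: the helper name `lrpr_measurements` and the dimensions are hypothetical, and the code only generates the phaseless data $y_k = |A_k{}' x^*_k|$. It does not implement AltMinLowRaP.

```python
import random


def lrpr_measurements(X, A_list):
    """Return y_k = |A_k' x_k| for each column x_k of X.

    X is an n x q matrix (list of rows); each A_k is an n x m matrix.
    """
    n, q = len(X), len(X[0])
    Y = []
    for k in range(q):
        A = A_list[k]
        m = len(A[0])
        x_k = [X[i][k] for i in range(n)]
        # Entry j of A_k' x_k is the inner product of column j of A_k with x_k.
        Y.append([abs(sum(A[i][j] * x_k[i] for i in range(n))) for j in range(m)])
    return Y


rng = random.Random(0)
n, q, r, m = 6, 4, 2, 3  # hypothetical sizes, with m < n

# Rank-r matrix X = U B with Gaussian factors.
U = [[rng.gauss(0, 1) for _ in range(r)] for _ in range(n)]
B = [[rng.gauss(0, 1) for _ in range(q)] for _ in range(r)]
X = [[sum(U[i][l] * B[l][k] for l in range(r)) for k in range(q)] for i in range(n)]

# A different, independent i.i.d. standard Gaussian matrix for every column.
A_list = [[[rng.gauss(0, 1) for _ in range(m)] for _ in range(n)] for _ in range(q)]

Y = lrpr_measurements(X, A_list)
```

Each column is observed through only `m` magnitudes, so the sign of every column is lost; the low-rank structure shared across the `q` columns is what the abstract relies on to make $m \ll n$ feasible.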
Philipp Grohs, Sarah Koppensteiner, Martin Rathmair
The problem of phase retrieval, i.e., the problem of recovering a function
from the magnitudes of its Fourier transform, naturally arises in various
fields of physics, such as astronomy, radar, speech recognition, quantum
mechanics and, perhaps most prominently, diffraction imaging. The mathematical
study of phase retrieval problems possesses a long history with a number of
beautiful and deep results drawing from different mathematical fields, such as
harmonic analysis, complex analysis, or Riemannian geometry. The present paper
aims to present a summary of some of these results with an emphasis on recent
activities. In particular we aim to summarize our current understanding of
uniqueness and stability properties of phase retrieval problems.
Authors' comments: 52 pages, 3 figures
Hugo Germain, Guillaume Bourmaud, Vincent Lepetit
Outdoor visual localization is a crucial component of many computer vision systems. We propose an approach to localization from images that is designed to explicitly handle the strong variations in appearance that occur between daytime and nighttime. As revealed by recent long-term localization benchmarks, both traditional feature-based and retrieval-based approaches still struggle to handle such changes. Our novel localization method combines a state-of-the-art image retrieval architecture with condition-specific sub-networks, allowing the computation of global image descriptors that are explicitly dependent on the capturing conditions. We show that our approach improves localization by a factor of almost 300\% compared to the popular VLAD-based methods on nighttime localization.
Mansi Butola, Sunaina, Kedar Khare
Iterative phase retrieval methods based on the Gerchberg-Saxton (GS) or Fienup algorithm require a large number of iterations to converge to a meaningful solution. For complex-valued or phase objects, these approaches also suffer from stagnation problems, where the solution does not change much from iteration to iteration but the resultant solution shows artifacts such as the presence of a twin image. We introduce a complexity parameter $\zeta$ that can be computed directly from the Fourier magnitude data and provides a measure of fluctuations in the desired phase retrieval solution. It is observed that when initiated with a uniformly random phase map, the complexity of the Fienup solution containing stagnation artifacts stabilizes at a numerical value that is much higher than $\zeta$. We propose a modified Fienup algorithm that uses a controlled sparsity enhancing step such that in every iteration the complexity of the resulting solution is explicitly made close to $\zeta$. This approach, which we refer to as complexity guided phase retrieval (CGPR), is seen to significantly reduce the number of phase retrieval iterations required for convergence to a meaningful solution and automatically addresses the stagnation problems. The CGPR methodology can enable new applications of iterative phase retrieval that are considered practically difficult due to the large number of iterations required for a reliable phase recovery.
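For context, the baseline that the abstract builds on can be sketched as a plain error-reduction (GS-type) iteration with a support constraint, started from a uniformly random phase map. This is only an illustrative 1-D sketch with a naive DFT. It is not CGPR: the complexity parameter $\zeta$ and the sparsity enhancing step are not implemented, since the abstract does not define them. All names and the test object are assumptions.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive unitary discrete Fourier transform (O(n^2), fine for small n)."""
    n = len(x)
    sign = 1 if inverse else -1
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(sign * 2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def error_reduction(magnitudes, support, iterations, seed=0):
    """Alternate between the Fourier-magnitude and object-support constraints."""
    rng = random.Random(seed)
    n = len(magnitudes)
    # Start from the measured magnitudes with a uniformly random phase map.
    G = [magnitudes[f] * cmath.exp(2j * math.pi * rng.random()) for f in range(n)]
    g = dft(G, inverse=True)
    errors = []
    for _ in range(iterations):
        # Object-domain constraint: zero everything outside the known support.
        g = [g[t] if support[t] else 0 for t in range(n)]
        G = dft(g)
        # Fourier-domain error of the current support-constrained estimate.
        errors.append(math.sqrt(sum((abs(G[f]) - magnitudes[f]) ** 2 for f in range(n))))
        # Fourier-domain constraint: keep the phase, impose the measured magnitude.
        G = [magnitudes[f] * (G[f] / abs(G[f]) if abs(G[f]) > 0 else 1) for f in range(n)]
        g = dft(G, inverse=True)
    # Return an estimate that satisfies the support constraint.
    g = [g[t] if support[t] else 0 for t in range(n)]
    return g, errors


n = 16
support = [t < 4 for t in range(n)]
true_object = [1.0, 2.0 + 1.0j, -1.5j, 0.5] + [0.0] * (n - 4)  # hypothetical object
magnitudes = [abs(v) for v in dft(true_object)]
estimate, errors = error_reduction(magnitudes, support, iterations=30)
```

The Fourier-domain error of this iteration cannot increase from one step to the next, but it can plateau well above zero, which is the stagnation behaviour the abstract refers to.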
Shirley Anugrah Hayati, Raphael Olivier, Pravalika Avvaru, Pengcheng Yin, Anthony Tomasic, Graham Neubig
In models to generate program source code from natural language, representing
this code in a tree structure has been a common approach. However, existing
methods often fail to generate complex code correctly due to a lack of ability
to memorize large and complex structures. We introduce ReCode, a method based
on subtree retrieval that makes it possible to explicitly reference existing
code examples within a neural code generation model. First, we retrieve
sentences that are similar to input sentences using a dynamic-programming-based
sentence similarity scoring method. Next, we extract n-grams of action
sequences that build the associated abstract syntax tree. Finally, we increase
the probability of actions that cause the retrieved n-gram action subtree to be
in the predicted code. We show that our approach improves the performance on
two code generation tasks by up to +2.6 BLEU.
Authors' comments: This paper is accepted in EMNLP 2018. It has 6 pages
Lluís Gómez, Andrés Mafla, Marçal Rusiñol, Dimosthenis Karatzas
Textual information found in scene images provides high level semantic
information about the image and its context and it can be leveraged for better
scene understanding. In this paper we address the problem of scene text
retrieval: given a text query, the system must return all images containing the
queried text. The novelty of the proposed model consists in the usage of a
single shot CNN architecture that predicts at the same time bounding boxes and
a compact text representation of the words in them. In this way, the text based
image retrieval task can be cast as a simple nearest neighbor search of the
query text representation over the outputs of the CNN over the entire image
database. Our experiments demonstrate that the proposed architecture
outperforms previous state-of-the-art while it offers a significant increase in
processing speed.
Authors' comments: ECCV 2018
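The retrieval step described above, a nearest neighbor search of the query text representation over the per-image outputs, can be sketched as follows. The descriptor used here (`char_histogram`, a normalized letter count) is a hypothetical stand-in chosen only to keep the example self-contained. It is not the compact text representation predicted by the CNN in the paper, and the image database is made up.

```python
import math


def char_histogram(word):
    """Hypothetical stand-in descriptor: L2-normalized counts of the letters a-z."""
    counts = [0.0] * 26
    for ch in word.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


def rank_images(query, image_descriptors):
    """Rank images by the distance from the query to their closest word descriptor."""
    q = char_histogram(query)
    scored = []
    for image_id, descriptors in image_descriptors.items():
        # An image with no predicted words is pushed to the end of the ranking.
        best = min((math.dist(q, d) for d in descriptors), default=math.inf)
        scored.append((best, image_id))
    return [image_id for _, image_id in sorted(scored)]


# Hypothetical database: descriptors of the words predicted in each image.
database = {
    "img1": [char_histogram("hotel"), char_histogram("open")],
    "img2": [char_histogram("pizza")],
    "img3": [char_histogram("exit")],
    "img4": [],
}
ranking = rank_images("pizza", database)
print(ranking)
```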
Jochen L. Leidner
There has been a recent trend to migrate IT infrastructure into the cloud. In
this paper, we discuss the impact of this trend on searching for textual and
other data, i.e. the distributed indexing and retrieval of information, from an
organizational context.
Keywords: information retrieval (IR); federated search; cloud search.
Authors' comments: 6 pages, 1 figure, 1 table
Hussein Suleman
Developing Information Retrieval (IR) tools and techniques in African languages suffers from the dual problems of a lack of algorithms and very small test data collections. This affects the creation of practical IR systems and limits the ability to apply IR to address human and socio-economic problems, which is an urgent need in poor countries. This position paper presents an overview of recent and current work conducted at the University of Cape Town in this area. While many problems have been investigated at an early stage, limited dataset sizes for local African languages still persist as a significant limitation and stumbling block.
Chene Tradonsky, Oren Raz, Vishwa Pal, Ronen Chriki, Asher A. Friesem, Nir Davidson
Reconstructing an object solely from its scattered intensity distribution is a common problem that occurs in many applications. Currently, there are no efficient direct methods to reconstruct the object, though in many cases, with some prior knowledge, iterative algorithms result in reasonable reconstructions. Unfortunately, even with advanced computational resources, these algorithms are highly time-consuming. Here we present a novel rapid all-optical method based on a digital degenerate cavity laser, whose most probable lasing mode well approximates the object. We present experimental results showing the high speed (<100 ns) and efficiency of our method, in agreement with our numerical simulations and analysis. The method is scalable and is applicable to any two-dimensional object with known compact support, including complex-valued objects.
Xiaoxiao Guo, Hui Wu, Yu Cheng, Steven Rennie, Gerald Tesauro, Rogerio Schmidt Feris
Existing methods for interactive image retrieval have demonstrated the merit
of integrating user feedback, improving retrieval results. However, most
current systems rely on restricted forms of user feedback, such as binary
relevance responses, or feedback based on a fixed set of relative attributes,
which limits their impact. In this paper, we introduce a new approach to
interactive image search that enables users to provide feedback via natural
language, allowing for more natural and effective interaction. We formulate the
task of dialog-based interactive image retrieval as a reinforcement learning
problem, and reward the dialog system for improving the rank of the target
image during each dialog turn. To mitigate the cumbersome and costly process of
collecting human-machine conversations as the dialog system learns, we train
our system with a user simulator, which is itself trained to describe the
differences between target and candidate images. The efficacy of our approach
is demonstrated in a footwear retrieval application. Experiments on both
simulated and real-world data show that 1) our proposed learning framework
achieves better accuracy than other supervised and reinforcement learning
baselines and 2) user feedback based on natural language rather than
pre-specified attributes leads to more effective retrieval results, and a more
natural and expressive communication interface.
Authors' comments: accepted at NeurIPS 2018
Sara Botelho-Andrade, Peter G. Casazza, Desai Cheng, John Haas, Tin T. Tran
We will review the major results in finite dimensional real phase retrieval for vectors and projections. We then (1) prove that many of these theorems hold in infinite dimensions, (2) give counter-examples to show that many others fail in infinite dimensions, and (3) list finite dimensional results that are unknown for $\ell_2$.
Ziyang Yuan, Hongxia Wang
The phase retrieval problem has been studied in various applications. It is an inverse problem without the standard uniqueness guarantee. Carrying out a complete theoretical analysis and devising efficient algorithms to recover the signal is difficult. In this paper, we come up with a model called \textit{phase retrieval with background information}, which recovers the signal with the known background information from the intensity of their combinational Fourier transform spectrum. We prove that the uniqueness of phase retrieval can be guaranteed, even considering those trivial solutions, when the background information is sufficient. Under this condition, we construct a loss function and utilize the projected gradient descent method to search for the ground truth. We prove that the stationary point is the global optimum with probability 1. Numerical simulations demonstrate that the projected gradient descent method performs well for both 1-D and 2-D signals. Furthermore, this method is quite robust to Gaussian noise and to bias in the background information.
Tom Kenter, Alexey Borisov, Christophe Van Gysel, Mostafa Dehghani, Maarten de Rijke, Bhaskar Mitra
Machine learning plays a role in many aspects of modern IR systems, and deep
learning is applied in all of them. The fast pace of modern-day research has
given rise to many approaches to many IR problems. The amount of information
available can be overwhelming both for junior students and for experienced
researchers looking for new research topics and directions. The aim of this
full-day tutorial is to give a clear overview of current tried-and-trusted
neural methods in IR and how they benefit IR.
Authors' comments: Overview of full-day tutorial at WSDM 2018
Rohan Chandra, Ziyuan Zhong, Justin Hontz, Val McCulloch, Christoph Studer, Tom Goldstein
Phase retrieval deals with the estimation of complex-valued signals solely from the magnitudes of linear measurements. While there has been a recent explosion in the development of phase retrieval algorithms, the lack of a common interface has made it difficult to compare new methods against the state-of-the-art. The purpose of PhasePack is to create a common software interface for a wide range of phase retrieval algorithms and to provide a common testbed using both synthetic data and empirical imaging datasets. PhasePack is able to benchmark a large number of recent phase retrieval methods against one another to generate comparisons using a range of different performance metrics. The software package handles single method testing as well as multiple method comparisons. The algorithm implementations in PhasePack differ slightly from their original descriptions in the literature in order to achieve faster speed and improved robustness. In particular, PhasePack uses adaptive stepsizes, line-search methods, and fast eigensolvers to speed up and automate convergence.
Ferréol Soulez, Éric Thiébaut, Antony Schutz, André Ferrari, Frédéric Courbin, Michael Unser
We present a new formulation of a family of proximity operators that generalize the projector step for phase retrieval. These proximity operators for noisy intensity measurements can replace the classical "noise free" projection in any projection-based algorithm. They are derived from a maximum likelihood formulation and admit closed form solutions for both the Gaussian and the Poisson cases. In addition, we extend these proximity operators to undersampled intensity measurements. To assess their performance, these operators are exploited in a classical Gerchberg-Saxton algorithm. We present numerical experiments showing that the complex amplitudes reconstructed with these proximity operators are always better than those obtained with the classical intensity projector, while the computational overhead is moderate.
Ryota Hinami, Yusuke Matsui, Shin'ichi Satoh
The region-based image retrieval (RBIR) technique is revisited. In early attempts
at RBIR in the late 90s, researchers found many ways to specify region-based
queries and spatial relationships; however, the ways to characterize the
regions, such as by using color histograms, were very poor at that time. Here,
we revisit RBIR by incorporating semantic specification of objects and
intuitive specification of spatial relationships. Our contributions are the
following. First, to support multiple aspects of semantic object specification
(category, instance, and attribute), we propose a multitask CNN feature that
allows us to use deep learning techniques and to jointly handle multi-aspect
object specification. Second, to help users specify spatial relationships among
objects in an intuitive way, we propose techniques for recommending spatial
relationships. In particular, by mining the search results, a system can
recommend feasible spatial relationships among the objects. The system can also
recommend likely spatial relationships from the assigned object category names, based
on a language prior. Moreover, object-level inverted indexing supports very fast
shortlist generation, and re-ranking based on spatial constraints provides
users with instant RBIR experiences.
Authors' comments: To appear in ACM Multimedia 2017 (Oral)