Jonathan J. Fortney, Joanna K. Barstow, Nikku Madhusudhan
This brief review focuses on methods and applications of modeling
exoplanetary atmospheres. We discuss various kinds of state of the art
self-consistent and retrieval models in 1D and multi-D with a focus on open
questions and short- and long-term goals in the field. Expertise previously
developed in modeling cool stellar atmospheres and in modeling solar system
planetary atmospheres has proven valuable to the field, and will continue to do
so. We described upcoming opportunities for making progress in our
understanding of atmospheres, and close with what we see as the field's
challenges.
Authors' comments: Accepted in Autumn of 2020. To appear as a book chapter in
"ExoFrontiers: Big questions in exoplanetary science", Ed. N Madhusudhan
(Bristol: IOP Publishing Ltd) AAS-IOP ebooks
https://iopscience.iop.org/bookListInfo/aas-iop-astronomy
Daniil Mamaev
A smooth function f in a neighbourhood of the unit sphere $S^{n - 1}$ is said
to admit index $\lambda$ if it can be extended to a function F in the unit ball
$B^n$ such that F has a unique critical point p and the Morse index of p is
equal to $\lambda$. It is easy to see that a function f cannot admit two
indices of different parity. We prove that for any two indices that differ by
two there exists a function f that admits both of them.
Authors' comments: 15 pages, 3 figures
Tor Lattimore, Botao Hao
We study a bandit version of phase retrieval where the learner chooses actions $(A_t)_{t=1}^n$ in the $d$-dimensional unit ball and the expected reward is $\langle A_t, \theta_\star\rangle^2$ where $\theta_\star \in \mathbb R^d$ is an unknown parameter vector. We prove that the minimax cumulative regret in this problem is $\smash{\tilde \Theta(d \sqrt{n})}$, which improves on the best known bounds by a factor of $\smash{\sqrt{d}}$. We also show that the minimax simple regret is $\smash{\tilde \Theta(d / \sqrt{n})}$ and that this is only achievable by an adaptive algorithm. Our analysis shows that an apparently convincing heuristic for guessing lower bounds can be misleading and that uniform bounds on the information ratio for information-directed sampling are not sufficient for optimal regret.
Haonan Wang, Chang Zhou, Carl Yang, Hongxia Yang, Jingrui He
In this paper, we identify and study an important problem of gradient item
retrieval. We define the problem as retrieving a sequence of items with a
gradual change on a certain attribute, given a reference item and a
modification text. For example, after a customer saw a white dress, she/he
wants to buy a similar one but more floral on it. The extent of "more floral"
is subjective, thus prompting one floral dress is hard to satisfy the
customer's needs. A better way is to present a sequence of products with
increasingly floral attributes based on the white dress, and allow the customer
to select the most satisfactory one from the sequence. Existing item retrieval
methods mainly focus on whether the target items appear at the top of the
retrieved sequence, but ignore the demand for retrieving a sequence of products
with gradual change on a certain attribute. To deal with this problem, we
propose a weakly-supervised method that can learn a disentangled item
representation from user-item interaction data and ground the semantic meaning
of attributes to dimensions of the item representation. Our method takes a
reference item and a modification as a query. During inference, we start from
the reference item and "walk" along the direction of the modification in the
item representation space to retrieve a sequence of items in a gradient manner.
We demonstrate our proposed method can achieve disentanglement through weak
supervision. Besides, we empirically show that an item sequence retrieved by
our method is gradually changed on an indicated attribute and, in the item
retrieval task, our method outperforms existing approaches on three different
datasets.
Authors' comments: Accepted by The International World Wide Web Conference (WWW), 2021
Karim Banawan, Ahmed Arafa, Sennur Ulukus
We introduce the problem of \emph{timely} private information retrieval (PIR)
from $N$ non-colluding and replicated servers. In this problem, a user desires
to retrieve a message out of $M$ messages from the servers, whose contents are
continuously updating. The retrieval process should be executed in a timely
manner such that no information is leaked about the identity of the message. To
assess the timeliness, we use the \emph{age of information} (AoI) metric.
Interestingly, the timely PIR problem reduces to an AoI minimization subject to
PIR constraints under \emph{asymmetric traffic}. We explicitly characterize the
optimal tradeoff between the PIR rate and the AoI metric (peak AoI or average
AoI) for the case of $N=2$, $M=3$. Further, we provide some structural insights
on the general problem with arbitrary $N$, $M$.
Authors' comments: Accepted for presentation in ISIT 2021
Òscar Lorente, Ian Riera, Shauryadeep Chaudhuri, Oriol Catalan, Víctor Casales
To retrieve images based on their content is one of the most studied topics in the field of computer vision. Nowadays, this problem can be addressed using modern techniques such as feature extraction using machine learning, but over the years different classical methods have been developed. In this paper, we implement a query by example retrieval system for finding paintings in a museum image collection using classic computer vision techniques. Specifically, we study the performance of the color, texture, text and feature descriptors in datasets with different perturbations in the images: noise, overlapping text boxes, color corruption and rotation. We evaluate each of the cases using the Mean Average Precision (MAP) metric, and we obtain results that vary between 0.5 and 1.0 depending on the problem conditions.
Xuyang Chang, Liheng Bian, Jun Zhang
High-throughput computational imaging requires efficient processing algorithms to retrieve multi-dimensional and multi-scale information. In computational phase imaging, phase retrieval (PR) is required to reconstruct both amplitude and phase in complex space from intensity-only measurements. The existing PR algorithms suffer from the tradeoff among low computational complexity, robustness to measurement noise and strong generalization on different modalities. In this work, we report an efficient large-scale phase retrieval technique termed as LPR. It extends the plug-and-play generalized-alternating-projection framework from real space to nonlinear complex space. The alternating projection solver and enhancing neural network are respectively derived to tackle the measurement formation and statistical prior regularization. This framework compensates the shortcomings of each operator, so as to realize high-fidelity phase retrieval with low computational complexity and strong generalization. We applied the technique for a series of computational phase imaging modalities including coherent diffraction imaging, coded diffraction pattern imaging, and Fourier ptychographic microscopy. Extensive simulations and experiments validate that the technique outperforms the existing PR algorithms with as much as 17dB enhancement on signal-to-noise ratio, and more than one order-of-magnitude increased running efficiency. Besides, we for the first time demonstrate ultra-large-scale phase retrieval at the 8K level (7680$\times$4320 pixels) in minute-level time.
David J. Brady, Timothy J. Schulz, Chengyu Wang
While characterization of coherent wavefields is essential to laser, x-ray and electron imaging, sensors measure the squared magnitude of the field, rather than the field itself. Holography or phase retrieval must be used to characterize the field. The need for a reference severely restricts the utility of holography. Phase retrieval, in contrast, is theoretically consistent with sensors that directly measure coherent or partially coherent fields with no prior assumptions. Unfortunately, phase retrieval has not yet been successfully implemented for large-scale fields. Here we show that both holography and phase retrieval are capable of quantum-limited coherent signal estimation and we describe phase retrieval strategies that approach the quantum limit for >1 megapixel fields. These strategies rely on group testing using networks of interferometers, such as might be constructed using emerging integrated photonic, plasmonic and/or metamaterial devices. Phase-sensitive sensor planes using such devices could eliminate the need both for lenses and reference signals, creating a path to large aperture diffraction limited laser imaging.
Nicola De Cao, Gautier Izacard, Sebastian Riedel, Fabio Petroni
Entities are at the center of how we represent and aggregate knowledge. For
instance, Encyclopedias such as Wikipedia are structured by entities (e.g., one
per Wikipedia article). The ability to retrieve such entities given a query is
fundamental for knowledge-intensive tasks such as entity linking and
open-domain question answering. Current approaches can be understood as
classifiers among atomic labels, one for each entity. Their weight vectors are
dense entity representations produced by encoding entity meta information such
as their descriptions. This approach has several shortcomings: (i) context and
entity affinity is mainly captured through a vector dot product, potentially
missing fine-grained interactions; (ii) a large memory footprint is needed to
store dense representations when considering large entity sets; (iii) an
appropriately hard set of negative data has to be subsampled at training time.
In this work, we propose GENRE, the first system that retrieves entities by
generating their unique names, left to right, token-by-token in an
autoregressive fashion. This mitigates the aforementioned technical issues
since: (i) the autoregressive formulation directly captures relations between
context and entity name, effectively cross encoding both; (ii) the memory
footprint is greatly reduced because the parameters of our encoder-decoder
architecture scale with vocabulary size, not entity count; (iii) the softmax
loss is computed without subsampling negative data. We experiment with more
than 20 datasets on entity disambiguation, end-to-end entity linking and
document retrieval tasks, achieving new state-of-the-art or very competitive
results while using a tiny fraction of the memory footprint of competing
systems. Finally, we demonstrate that new entities can be added by simply
specifying their names. Code and pre-trained models at
https://github.com/facebookresearch/GENRE.
Authors' comments: Accepted (spotlight) at International Conference on Learning
Representations (ICLR) 2021. Code at
https://github.com/facebookresearch/GENRE. 20 pages, 9 figures, 8 tables
Martin Reiche, Peter Jung
This paper shows how data-driven deep generative models can be utilized to
solve challenging phase retrieval problems, in which one wants to reconstruct a
signal from only few intensity measurements. Classical iterative algorithms are
known to work well if initialized close to the optimum but otherwise suffer
from non-convexity and often get stuck in local minima. We therefore propose
DeepInit Phase Retrieval, which uses regularized gradient descent under a deep
generative data prior to compute a trained initialization for a fast classical
algorithm (e.g. the randomized Kaczmarz method). We empirically show that our
hybrid approach is able to deliver very high reconstruction results at low
sampling rates even when there is significant generator model error.
Conceptually, learned initializations may therefore help to overcome the
non-convexity of the problem by starting classical descent steps closer to the
global optimum. Also, our idea demonstrates superior runtime performance over
conventional gradient-based reconstruction methods. We evaluate our method for
generic measurements and show empirically that it is also applicable to
diffraction-type measurement models which are found in terahertz single-pixel
phase retrieval.
Authors' comments: 9 pages, 12 figures
Sajani Vithana, Karim Banawan, Sennur Ulukus
We investigate the problem of semantic private information retrieval
(semantic PIR). In semantic PIR, a user retrieves a message out of $K$
independent messages stored in $N$ replicated and non-colluding databases
without revealing the identity of the desired message to any individual
database. The messages come with \emph{different semantics}, i.e., the messages
are allowed to have \emph{non-uniform a priori probabilities} denoted by
$(p_i>0,\: i \in [K])$, which are a proxy for their respective popularity of
retrieval, and \emph{arbitrary message sizes} $(L_i,\: i \in [K])$. This is a
generalization of the classical private information retrieval (PIR) problem,
where messages are assumed to have equal a priori probabilities and equal
message sizes. We derive the semantic PIR capacity for general $K$, $N$. The
results show that the semantic PIR capacity depends on the number of databases
$N$, the number of messages $K$, the a priori probability distribution of
messages $p_i$, and the message sizes $L_i$. We present two achievable semantic
PIR schemes: The first one is a deterministic scheme which is based on message
asymmetry. This scheme employs non-uniform subpacketization. The second scheme
is probabilistic and is based on choosing one query set out of multiple options
at random to retrieve the required message without the need for exponential
subpacketization. We derive necessary and sufficient conditions for the
semantic PIR capacity to exceed the classical PIR capacity with equal priors
and sizes. Our results show that the semantic PIR capacity can be larger than
the classical PIR capacity when longer messages have higher popularities.
However, when messages are equal-length, the non-uniform priors cannot be
exploited to improve the retrieval rate over the classical PIR capacity.
Authors' comments: submitted for publication
Yiwei Zhang, Eitan Yaakobi, Tuvi Etzion
A \emph{private proximity retrieval} (\emph{PPR}) scheme is a protocol which allows a user to retrieve the identities of all records in a database that are within some distance $r$ from the user's record $x$. The user's \emph{privacy} at each server is given by the fraction of the record $x$ that is kept private. We assume that each server stores a copy of the database. In this paper, this research is initiated and protocols that offer trade-offs between privacy and computational complexity and storage are studied. In particular, we study the required minimum number of servers by our protocol which provides a given privacy level. Each server gets a query in the protocol and the set of queries form a code. We study the family of codes generated by the set of queries and in particular the minimum number of codewords in such a code which is the minimum number of servers required for the protocol. These codes are closely related to a family of codes known as \emph{covering designs}. We introduce several lower bounds on the sizes of such codes as well as several constructions. This work focuses on the case when the records are binary vectors together with the Hamming distance. Other metrics such as the Johnson metric are also investigated.
Marcello Balduccini, Emily LeBlanc
Information Retrieval (IR) aims at retrieving documents that are most
relevant to a query provided by a user. Traditional techniques rely mostly on
syntactic methods. In some cases, however, links at a deeper semantic level
must be considered. In this paper, we explore a type of IR task in which
documents describe sequences of events, and queries are about the state of the
world after such events. In this context, successfully matching documents and
query requires considering the events' possibly implicit, uncertain effects and
side-effects. We begin by analyzing the problem, then propose an action
language based formalization, and finally automate the corresponding IR task
using Answer Set Programming.
Authors' comments: Under consideration in Theory and Practice of Logic Programming
(TPLP)
Fabio Crestani, Stefano Mizzaro, Ivan Scagnetto
Mobile Information Retrieval (Mobile IR) is a relatively recent branch of
Information Retrieval (IR) that is concerned with enabling users to carry out,
using a mobile device, all the classical IR operations that they were used to
carry out on a desktop. This includes finding content available on local
repositories or on the web in response to a user query, interacting with the
system in an explicit or implicit way, reformulate the query and/or visualise
the content of the retrieved documents, as well as providing relevance
judgments to improve the retrieval process.
This book is structured as follows. Chapter 2 provides a very brief overview
of IR and of Mobile IR, briefly outlining what in Mobile IR is different from
IR. Chapter 3 provides the foundations of Mobile IR, looking at the
characteristics of mobile devices and what they bring to IR, but also looking
at how the concept of relevance changed from standard IR to Mobile IR. Chapter
4 presents an overview of the document collections that are searchable by a
Mobile IR system, and that are somehow different from classical IR ones;
available for experimentation, including collections of data that have become
complementary to Mobile IR. Similarly, Chapter 5 reviews mobile information
needs studies and users log analysis. Chapter 6 reviews studies aimed at
adapting and improving the users interface to the needs of Mobile IR. Chapter
7, instead, reviews work on context awareness, which studies the many aspects
of the user context that Mobile IR employs. Chapter 8 reviews some of
evaluation work done in Mobile IR, highlighting the distinctions with classical
IR from the perspectives of two main IR evaluation methodologies: users studies
and test collections. Finally, Chapter 9 reports the conclusions of this
review, highlighting briefly some trends in Mobile IR that we believe will
drive research in the next few years.
Authors' comments: 116 pages, published in 2017
Hsuan-Yin Lin, Siddhartha Kumar, Eirik Rosnes, Alexandre Graell i Amat, Eitan Yaakobi
Private information retrieval (PIR) protocols make it possible to retrieve a
file from a database without disclosing any information about the identity of
the file being retrieved. These protocols have been rigorously explored from an
information-theoretic perspective in recent years. While existing protocols
strictly impose that no information is leaked on the file's identity, this work
initiates the study of the tradeoffs that can be achieved by relaxing the
requirement of perfect privacy. In case the user is willing to leak some
information on the identity of the retrieved file, we study how the PIR rate,
as well as the upload cost and access complexity, can be improved. For the
particular case of replicated servers, we propose two weakly-private
information retrieval schemes based on two recent PIR protocols and a family of
schemes based on partitioning. Lastly, we compare the performance of the
proposed schemes.
Authors' comments: To be presented at 2019 IEEE International Symposium on Information
Theory (ISIT)
Subhadip Mukherjee, Chandra Sekhar Seelamantula
We address the problem of phase retrieval (PR) from quantized measurements. The goal is to reconstruct a signal from quadratic measurements encoded with a finite precision, which is indeed the case in many practical applications. We develop a rank-1 projection algorithm that recovers the signal subject to ensuring consistency with the measurement, that is, the recovered signal when encoded must yield the same set of measurements that one started with. The rank-1 projection stems from the idea of lifting, originally proposed in the context of PhaseLift. The consistency criterion is enforced using a one-sided quadratic cost. We also determine the probability with which different vectors lead to the same set of quantized measurements, which makes it impossible to resolve them. Naturally, this probability depends on how correlated such vectors are, and how coarsely/finely the measurements get quantized. The proposed algorithm is also capable of incorporating a sparsity constraint on the signal. An analysis of the cost function reveals that it is bounded, both above and below, by functions that are dependent on how well correlated the estimate is with the ground truth. We also derive the Cram\'er-Rao lower bound (CRB) on the achievable reconstruction accuracy. A comparison with the state-of-the- art algorithms shows that the proposed algorithm has a higher reconstruction accuracy and is about 2 to 3 dB away from the CRB. The edge, in terms of the reconstruction signal-to-noise ratio, over the competing algorithms is higher (about 5 to 6 dB) when the quantization is coarse.
Nikku Madhusudhan
Exoplanetary atmospheric retrieval refers to the inference of atmospheric
properties of an exoplanet given an observed spectrum. The atmospheric
properties include the chemical compositions, temperature profiles,
clouds/hazes, and energy circulation. These properties, in turn, can provide
key insights into the atmospheric physicochemical processes of exoplanets as
well as their formation mechanisms. Major advancements in atmospheric retrieval
have been made in the last decade, thanks to a combination of state-of-the-art
spectroscopic observations and advanced atmospheric modeling and statistical
inference methods. These developments have already resulted in key constraints
on the atmospheric H2O abundances, temperature profiles, and other properties
for several exoplanets. Upcoming facilities such as the JWST will further
advance this area. The present chapter is a pedagogical review of this exciting
frontier of exoplanetary science. The principles of atmospheric retrievals of
exoplanets are discussed in detail, including parametric models and statistical
inference methods, along with a review of key results in the field. Some of the
main challenges in retrievals with current observations are discussed along
with new directions and the future landscape.
Authors' comments: 30 pages, 3 figures, Published in Springer Handbook of Exoplanets
Shah Nawaz, Muhammad Kamran Janjua, Alessandro Calefati, Ignazio Gallo
This paper proposes a cross-modal retrieval system that leverages on image
and text encoding. Most multimodal architectures employ separate networks for
each modality to capture the semantic relationship between them. However, in
our work image-text encoding can achieve comparable results in terms of
cross-modal retrieval without having to use a separate network for each
modality. We show that text encodings can capture semantic relationships
between multiple modalities. In our knowledge, this work is the first of its
kind in terms of employing a single network and fused image-text embedding for
cross-modal retrieval. We evaluate our approach on two famous multimodal
datasets: MS-COCO and Flickr30K.
Authors' comments: 14 pages. Under review at ECCVW (MULA 2018)
Ramina Ghods, Andrew S. Lan, Tom Goldstein, Christoph Studer
Phase retrieval deals with the recovery of complex- or real-valued signals
from magnitude measurements. As shown recently, the method PhaseMax enables
phase retrieval via convex optimization and without lifting the problem to a
higher dimension. To succeed, PhaseMax requires an initial guess of the
solution, which can be calculated via spectral initializers. In this paper, we
show that with the availability of an initial guess, phase retrieval can be
carried out with an ever simpler, linear procedure. Our algorithm, called
PhaseLin, is the linear estimator that minimizes the mean squared error (MSE)
when applied to the magnitude measurements. The linear nature of PhaseLin
enables an exact and nonasymptotic MSE analysis for arbitrary measurement
matrices. We furthermore demonstrate that by iteratively using PhaseLin, one
arrives at an efficient phase retrieval algorithm that performs on par with
existing convex and nonconvex methods on synthetic and real-world data.
Authors' comments: To be presented at CISS 2018 (http://ee-ciss.princeton.edu/)
Mahtab Mirmohseni, Mohammad Ali Maddah-Ali
The widespread use of cloud computing services raises the question of how one can delegate the processing tasks to the untrusted distributed parties without breeching the privacy of its data and algorithms. Motivated by the algorithm privacy concerns in a distributed computing system, in this paper, we introduce the private function retrieval (PFR) problem, where a user wishes to efficiently retrieve a linear function of $K$ messages from $N$ non-communicating replicated servers while keeping the function hidden from each individual server. The goal is to find a scheme with minimum communication cost. To characterize the fundamental limits of the communication cost, we define the capacity of PFR problem as the size of the message that can be privately retrieved (which is the size of one file) normalized to the required downloaded information bits. We first show that for the PFR problem with $K$ messages, $N=2$ servers and a linear function with binary coefficients the capacity is $C=\frac{1}{2}\Big(1-\frac{1}{2^K}\Big)^{-1}$. Interestingly, this is the capacity of retrieving one of $K$ messages from $N=2$ servers while keeping the index of the requested message hidden from each individual server, the problem known as private information retrieval (PIR). Then, we extend the proposed achievable scheme to the case of arbitrary number of servers and coefficients in the field $GF(q)$ with arbitrary $q$ and obtain $R=\Big(1-\frac{1}{N}\Big)\Big(1+\frac{\frac{1}{N-1}}{(\frac{q^K-1}{q-1})^{N-1}}\Big)$.