Eileen Gonzales, Ben Burningham, Jackie Faherty, Colleen Cleary, Channon Visscher, Mark Marley, Roxana Lupu, Richard Freedman
We present the distance-calibrated spectral energy distribution (SED) of the
d/sdL7 SDSS J14162408+1348263A (J1416A) and an updated SED for SDSS
J14162408+1348263B (J1416B). We also present the first retrieval analysis of
J1416A using the Brewster retrieval code base and the second retrieval of
J1416B. We find that the primary is best fit by a non-grey cloud opacity with a
power-law wavelength dependence, but the fits cannot distinguish between the
types of cloud parameterization. J1416B is best fit by a cloud-free model, consistent
with the results from Line et al. (2017). Most fundamental parameters derived
via SEDs and retrievals are consistent within 1 sigma for both J1416A and
J1416B. The exceptions include the radius of J1416A, where the retrieved radius
is smaller than the evolutionary model-based radius from the SED for the deck
cloud model, and the bolometric luminosity, which is consistent within 2.5 sigma
for both cloud models. The pair's metallicity and carbon-to-oxygen (C/O) ratio
point towards formation and evolution as a system. By comparing the retrieved
alkali abundances while using two opacity models, we are able to evaluate how
the opacities behave for the L and T dwarf. Lastly, we find that relatively
small changes in composition can drive major observable differences for lower
temperature objects.
Authors' comments: 40 pages, 25 figures
Samarth Rawal, Chitta Baral
Information Retrieval (IR) is the task of obtaining pieces of data (such as documents or snippets of text) that are relevant to a particular query or need from a large repository of information. While a combination of traditional keyword- and modern BERT-based approaches has been shown to be effective in recent work, there are often nuances in identifying what information is "relevant" to a particular query, which can be difficult to properly capture using these systems. This work introduces the concept of a Multi-Perspective IR system, a novel methodology that combines multiple deep learning and traditional IR models to better predict the relevance of a query-sentence pair, along with a standardized framework for tuning this system. This work is evaluated on the BioASQ Biomedical IR + QA challenges.
Xukang Wei, H. Paul Urbach, Peter van der Walle, Wim M. J. Coene
We present a parameter retrieval method which combines ptychography and additional prior knowledge about the object. The proposed method is applied to two applications: (1) parameter retrieval of small particles from Fourier ptychographic dark field measurements; (2) parameter retrieval of a rectangle with real-space ptychography. The influence of Poisson noise is discussed in the second part of the paper. The Cram\'{e}r-Rao Lower Bound for both applications is computed, and Monte Carlo analysis is used to verify the calculated lower bound. From the computed results we report the lower bound for various noise levels and the correlation of particles in Application 1. For Application 2 the correlation of the parameters of the rectangle is discussed.
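As a rough illustration of checking a Cram\'{e}r-Rao bound by Monte Carlo under Poisson noise, the sketch below uses a deliberately simple stand-in model, not the ptychographic model of the paper: expected counts are assumed to be theta * pattern[k], so the Fisher information is sum_k (dI_k/dtheta)^2 / I_k. The names poisson_sample, crlb_scale and monte_carlo_variance, and all numbers, are hypothetical.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw one Poisson variate with Knuth's method (adequate for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def crlb_scale(theta, pattern):
    """Cramer-Rao bound on theta when counts n_k ~ Poisson(theta * pattern[k])."""
    # Fisher information for Poisson data: sum_k (dI_k/dtheta)^2 / I_k,
    # with I_k = theta * pattern[k] and dI_k/dtheta = pattern[k].
    fisher = sum((s ** 2) / (theta * s) for s in pattern)
    return 1.0 / fisher


def monte_carlo_variance(theta, pattern, trials, seed=0):
    """Empirical variance of the maximum-likelihood estimate of theta."""
    rng = random.Random(seed)
    total = sum(pattern)
    estimates = []
    for _ in range(trials):
        counts = [poisson_sample(theta * s, rng) for s in pattern]
        estimates.append(sum(counts) / total)  # ML estimate for this toy model
    mean = sum(estimates) / trials
    return sum((e - mean) ** 2 for e in estimates) / (trials - 1)


theta, pattern = 2.0, [1.0, 2.0, 3.0, 4.0]  # hypothetical toy values
bound = crlb_scale(theta, pattern)
empirical = monte_carlo_variance(theta, pattern, trials=5000)
print(f"CRLB = {bound:.4f}, Monte Carlo variance = {empirical:.4f}")
```

In this toy model the maximum-likelihood estimator is efficient, so the empirical variance should sit close to the bound. For a nonlinear model the two would be expected to agree only approximately, and mainly at high counts.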
Yu Xia, Zhiqiang Xu
The aim of sparse phase retrieval is to recover a $k$-sparse signal
$\mathbf{x}_0\in \mathbb{C}^{d}$ from quadratic measurements $|\langle
\mathbf{a}_i,\mathbf{x}_0\rangle|^2$ where $\mathbf{a}_i\in \mathbb{C}^d,
i=1,\ldots,m$. Noting $|\langle
\mathbf{a}_i,\mathbf{x}_0\rangle|^2={\text{Tr}}(A_iX_0)$ with
$A_i=\mathbf{a}_i\mathbf{a}_i^*\in \mathbb{C}^{d\times d},
X_0=\mathbf{x}_0\mathbf{x}_0^*\in \mathbb{C}^{d\times d}$, one can recast
sparse phase retrieval as a problem of recovering a rank-one sparse matrix from
linear measurements. Yin and Xin introduced PhaseLiftOff, which provides a proxy
for the rank-one condition via the difference of trace and Frobenius norm. By adding
a sparsity penalty to PhaseLiftOff, in this paper, we present a novel model to
recover sparse signals from quadratic measurements. Theoretical analysis shows
that the solution to our model provides the stable recovery of $\mathbf{x}_0$
under almost optimal sampling complexity $m=O(k\log(d/k))$. The computation of
our model is carried out by the difference of convex function algorithm (DCA).
Numerical experiments demonstrate that our algorithm outperforms other
state-of-the-art algorithms used for solving sparse phase retrieval.
Authors' comments: 23 pages, 5 figures
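The lifting identity and the trace-minus-Frobenius proxy mentioned in the abstract can be checked numerically on a toy example. The sketch below is only an illustration with hypothetical vectors and helper names (outer, trace_of_product, trace_minus_frobenius). It does not implement the paper's model or the DCA solver.

```python
import math


def outer(u):
    """Return the rank-one matrix u u^* as a list of rows."""
    return [[ui * uj.conjugate() for uj in u] for ui in u]


def trace_of_product(a, b):
    """Tr(A B) for square matrices stored as lists of rows."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))


def trace_minus_frobenius(m):
    """Tr(M) - ||M||_F: zero for rank-one PSD M, positive for higher-rank PSD M."""
    n = len(m)
    trace = sum(m[i][i] for i in range(n)).real
    frobenius = math.sqrt(sum(abs(m[i][j]) ** 2 for i in range(n) for j in range(n)))
    return trace - frobenius


x0 = [1 + 2j, 0j, 0j, -1j]            # hypothetical 2-sparse signal in C^4
a = [0.5 - 1j, 2 + 0j, 1j, 1 + 1j]    # hypothetical measurement vector

inner = sum(ai.conjugate() * xi for ai, xi in zip(a, x0))   # <a, x0>
quadratic = abs(inner) ** 2                                  # |<a, x0>|^2
lifted = trace_of_product(outer(a), outer(x0)).real          # Tr(A X0)

rank_one_gap = trace_minus_frobenius(outer(x0))
rank_two = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(outer(x0), outer(a))]
rank_two_gap = trace_minus_frobenius(rank_two)

print(quadratic, lifted, rank_one_gap, rank_two_gap)
```

The proxy works because, for a positive semidefinite matrix, the trace is the sum of the eigenvalues and the Frobenius norm is the square root of the sum of their squares. The two coincide exactly when at most one eigenvalue is nonzero.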
Pierre-Emmanuel Emeriau, Mark Howard, Shane Mansfield
Random access codes have provided many examples of quantum advantage in
communication, but concern only one kind of information retrieval task. We
introduce a related task - the Torpedo Game - and show that it admits greater
quantum advantage than the comparable random access code. Perfect quantum
strategies involving prepare-and-measure protocols with experimentally
accessible three-level systems emerge via analysis in terms of the discrete
Wigner function. The example is leveraged to an operational advantage in a
pacifist version of the strategy game Battleship. We pinpoint a characteristic
of quantum systems that enables quantum advantage in any bounded-memory
information retrieval task. While preparation contextuality has previously been
linked to advantages in random access coding we focus here on a different
characteristic called sequential contextuality. It is shown not only to be
necessary and sufficient for quantum advantage, but also to quantify the degree
of advantage. Our perfect qutrit strategy for the Torpedo Game entails the
strongest type of inconsistency with non-contextual hidden variables, revealing
logical paradoxes with respect to those assumptions.
Authors' comments: 15 pages, 11 figures; new presentation, additional figures and
references
Kai Wan, Hua Sun, Mingyue Ji, Daniela Tuninetti, Giuseppe Caire
Coded caching is a promising technique to smooth out network traffic by
storing part of the library content at the users' local caches. The seminal
work on coded caching for single file retrieval by Maddah-Ali and Niesen (MAN)
showed the existence of a global caching gain that scales with the total memory
in the system, in addition to the known local caching gain in uncoded systems.
This paper formulates a novel cache-aided matrix multiplication retrieval
problem, relevant for data analytics and machine learning applications. In the
considered problem, each cache-aided user requests the product of two matrices
from the library. A structure-agnostic solution is to treat each possible
matrix product as an independent file and use the MAN coded caching scheme for
single file retrieval. This paper proposes two structure-aware schemes, which
partition each matrix in the library by either rows or columns and let a subset
of users cache some sub-matrices, that improve on the structure-agnostic
scheme. For the case where the library matrices are "fat" matrices, the
structure-aware row-partition scheme is shown to be order optimal under some
constraint.
Authors' comments: 41 pages, 5 figures, submitted to Transactions on Information Theory
Namrata Vaswani
Phase retrieval (PR), also sometimes referred to as quadratic sensing, is a
problem that occurs in numerous signal and image acquisition domains ranging
from optics, X-ray crystallography, Fourier ptychography, and sub-diffraction
imaging to astronomy. In each of these domains, the physics of the
acquisition system dictates that only the magnitude (intensity) of certain
linear projections of the signal or image can be measured. Without any
assumptions on the unknown signal, accurate recovery necessarily requires an
over-complete set of measurements. The only way to reduce the
measurements/sample complexity is to place extra assumptions on the unknown
signal/image. A simple and practically valid set of assumptions is obtained by
exploiting the structure inherently present in many natural signals or
sequences of signals. Two commonly used structural assumptions are (i) sparsity
of a given signal/image or (ii) a low rank model on the matrix formed by a set,
e.g., a time sequence, of signals/images. Both have been explored for solving
the PR problem in a sample-efficient fashion. This article describes this work,
with a focus on non-convex approaches that come with sample complexity
guarantees under simple assumptions. We also briefly describe other
types of structural assumptions that have been used in recent literature.
Authors' comments: to appear in IEEE Signal Processing Magazine (Special Issue on
Non-Convex Optimization for Signal Processing and Machine Learning)
Islam Samy, Mohamed A. Attia, Ravi Tandon, Loukas Lazos
Information-theoretic formulations of the private information retrieval (PIR) problem have been investigated under a variety of scenarios. Symmetric private information retrieval (SPIR) is a variant where a user is able to privately retrieve one out of $K$ messages from $N$ non-colluding replicated databases without learning anything about the remaining $K-1$ messages. However, the goal of perfect privacy can be too taxing for certain applications. In this paper, we investigate if the information-theoretic capacity of SPIR (equivalently, the inverse of the minimum download cost) can be increased by relaxing both user and DB privacy definitions. Such relaxation is relevant in applications where privacy can be traded for communication efficiency. We introduce and investigate the Asymmetric Leaky PIR (AL-PIR) model with different privacy leakage budgets in each direction. For user privacy leakage, we bound the probability ratios between all possible realizations of DB queries by a function of a non-negative constant $\epsilon$. For DB privacy, we bound the mutual information between the undesired messages, the queries, and the answers, by a function of a non-negative constant $\delta$. We propose a general AL-PIR scheme that achieves an upper bound on the optimal download cost for arbitrary $\epsilon$ and $\delta$. First, we show that the optimal download cost of AL-PIR is upper-bounded as $D^{*}(\epsilon,\delta)\leq 1+\frac{1}{N-1}-\frac{\delta e^{\epsilon}}{N^{K-1}-1}$. Second, we obtain an information-theoretic lower bound on the download cost as $D^{*}(\epsilon,\delta)\geq 1+\frac{1}{Ne^{\epsilon}-1}-\frac{\delta}{(Ne^{\epsilon})^{K-1}-1}$. The gap analysis between the two bounds shows that our AL-PIR scheme is optimal when $\epsilon =0$, i.e., under perfect user privacy, and that it is optimal within a maximum multiplicative gap of $\frac{N-e^{-\epsilon}}{N-1}$ for any $(\epsilon,\delta)$.
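The two bounds and the gap expression quoted in the abstract are easy to evaluate numerically. The sketch below simply transcribes them. The function names and the sample values of $N$, $K$, $\epsilon$ and $\delta$ are hypothetical, and the admissible range of $\delta$ is not stated in the abstract, so the chosen value is an assumption.

```python
import math


def alpir_upper_bound(n, k, eps, delta):
    """Upper bound on D*(eps, delta) as quoted in the abstract (needs n >= 2, k >= 2)."""
    return 1 + 1 / (n - 1) - delta * math.exp(eps) / (n ** (k - 1) - 1)


def alpir_lower_bound(n, k, eps, delta):
    """Lower bound on D*(eps, delta) as quoted in the abstract (needs n >= 2, k >= 2)."""
    scaled = n * math.exp(eps)
    return 1 + 1 / (scaled - 1) - delta / (scaled ** (k - 1) - 1)


def max_gap(n, eps):
    """Maximum multiplicative gap (N - e^{-eps}) / (N - 1) quoted in the abstract."""
    return (n - math.exp(-eps)) / (n - 1)


# Hypothetical example values; delta = 0.1 is assumed to be admissible.
n, k, delta = 2, 3, 0.1

# With eps = 0 the two expressions coincide, matching the optimality claim.
tight_upper = alpir_upper_bound(n, k, 0.0, delta)
tight_lower = alpir_lower_bound(n, k, 0.0, delta)

# With eps = 1 the ratio of the bounds stays below the quoted gap for these values.
ratio = alpir_upper_bound(n, k, 1.0, delta) / alpir_lower_bound(n, k, 1.0, delta)
print(tight_upper, tight_lower, ratio, max_gap(n, 1.0))
```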
Djamal Belazzougui, Gregory Kucherov
We study a document retrieval problem in the new framework where $D$ text
documents are organized in a {\em category tree} with a pre-defined number $h$
of categories. This situation occurs e.g. with taxonomic trees in biology or
subject classification systems for scientific literature. Given a string
pattern $p$ and a category (level in the category tree), we wish to efficiently
retrieve the $t$ \emph{categorical units} containing this pattern and belonging
to the category. We propose several efficient solutions for this problem. One
of them uses $n(\log\sigma(1+o(1))+\log D+O(h)) + O(\Delta)$ bits of space and
$O(|p|+t)$ query time, where $n$ is the total length of the documents, $\sigma$
the size of the alphabet used in the documents and $\Delta$ is the total number
of nodes in the category tree. Another solution uses
$n(\log\sigma(1+o(1))+O(\log D))+O(\Delta)+O(D\log n)$ bits of space and
$O(|p|+t\log D)$ query time. We finally propose other solutions which are more
space-efficient at the expense of a slight increase in query time.
Authors' comments: Full version of a paper accepted for presentation at the 31st Annual
Symposium on Combinatorial Pattern Matching (CPM 2020)
Chen Qu, Liu Yang, Cen Chen, Minghui Qiu, W. Bruce Croft, Mohit Iyyer
Conversational search is one of the ultimate goals of information retrieval.
Recent research approaches conversational search through simplified settings of
response ranking and conversational question answering, where an answer is
either selected from a given candidate set or extracted from a given passage.
These simplifications neglect the fundamental role of retrieval in
conversational search. To address this limitation, we introduce an
open-retrieval conversational question answering (ORConvQA) setting, where we
learn to retrieve evidence from a large collection before extracting answers,
as a further step towards building functional conversational search systems. We
create a dataset, OR-QuAC, to facilitate research on ORConvQA. We build an
end-to-end system for ORConvQA, featuring a retriever, a reranker, and a reader
that are all based on Transformers. Our extensive experiments on OR-QuAC
demonstrate that a learnable retriever is crucial for ORConvQA. We further show
that our system can make a substantial improvement when we enable history
modeling in all system components. Moreover, we show that the reranker
component contributes to the model performance by providing a regularization
effect. Finally, further in-depth analyses are performed to provide new
insights into ORConvQA.
Authors' comments: Accepted to SIGIR'20
Emma J. Gerritse, Faegheh Hasibi, Arjen P. de Vries
In this research, we improve upon the current state of the art in entity retrieval by re-ranking the result list using graph embeddings. The paper shows that graph embeddings are useful for entity-oriented search tasks. We demonstrate empirically that encoding information from the knowledge graph into (graph) embeddings contributes to a higher increase in effectiveness of entity retrieval results than using plain word embeddings. We analyze the impact of the accuracy of the entity linker on the overall retrieval effectiveness. Our analysis further deploys the cluster hypothesis to explain the observed advantages of graph embeddings over the more widely used word embeddings, for user tasks involving ranking entities.
Mark Loyman, Hayit Greenspan
Content-based image retrieval (CBIR) provides the clinician with visual information that can support, and hopefully improve, his or her decision-making process. Given an input query image, a CBIR system provides as its output a set of images, ranked by similarity to the query image. Retrieved images may come with relevant information, such as biopsy-based malignancy labeling, or categorization. Ground truth on similarity between dataset elements (e.g. between nodules) is not readily available, thus greatly challenging machine learning methods. Such annotations are particularly difficult to obtain, due to the subjective nature of the task, with high inter-observer variability requiring multiple expert annotators. Consequently, past approaches have focused on manual feature extraction, while current approaches use auxiliary tasks, such as a binary classification task (e.g. malignancy), for which ground truth is more readily accessible. However, in a previous study, we have shown that binary auxiliary tasks are inferior to the usage of rough similarity estimates that are derived from data annotations. The current study suggests a semi-supervised approach that involves two steps: 1) Automatic annotation of a given partially labeled dataset; 2) Learning a semantic similarity metric space based on the predicted annotations. The proposed system is demonstrated in lung nodule retrieval using the LIDC dataset, and shows that it is feasible to learn an embedding from predicted ratings. The semi-supervised approach has demonstrated a significantly higher discriminative ability than the fully-unsupervised reference.
Vaibhav Pandey, Nitish Nag, Ramesh Jain
Knowing the state of our health at every moment in time is critical for
advances in health science. Using data obtained outside an episodic clinical
setting is the first step towards building a continuous health estimation
system. In this paper, we explore a system that allows users to combine events
and data streams from different sources to retrieve complex biological events,
such as cardiovascular volume overload. These complex events, which have been
explored in biomedical literature and which we call interface events, have a
direct causal impact on relevant biological systems. They are the interface
through which the lifestyle events influence our health. We retrieve the
interface events from existing events and data streams by encoding domain
knowledge using an event operator language.
Authors' comments: ACM International Conference on Multimedia Retrieval 2020 (ICMR
2020), held in Dublin, Ireland from June 8-11, 2020
Albert Fannjiang, Thomas Strohmer
Phase retrieval, i.e., the problem of recovering a function from the squared magnitude of its Fourier transform, arises in many applications such as X-ray crystallography, diffraction imaging, optics, quantum mechanics, and astronomy. This problem has confounded engineers, physicists, and mathematicians for many decades. Recently, phase retrieval has seen a resurgence in research activity, ignited by new imaging modalities and novel mathematical concepts. As our scientific experiments produce larger and larger datasets and we aim for faster and faster throughput, it becomes increasingly important to study the involved numerical algorithms in a systematic and principled manner. Indeed, the last decade has witnessed a surge in the systematic study of computational algorithms for phase retrieval. In this paper we will review these recent advances from a numerical viewpoint.
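One reason the problem is hard can be seen in a small discrete example: distinct signals can share exactly the same squared Fourier magnitudes. The sketch below uses a naive finite-length DFT as a stand-in for the Fourier transform and a hypothetical test signal. It illustrates the ambiguity only, not any algorithm from the review.

```python
import cmath


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a finite sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def squared_magnitudes(signal):
    """The phaseless measurements: |F(signal)|^2 at every frequency."""
    return [abs(c) ** 2 for c in dft(signal)]


def all_close(u, v, tol=1e-9):
    return all(abs(p - q) <= tol for p, q in zip(u, v))


x = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0]            # hypothetical signal
shifted = x[2:] + x[:2]                         # circular shift by two samples
phased = [cmath.exp(0.7j) * v for v in x]       # multiplication by a global phase

reference = squared_magnitudes(x)
print(all_close(reference, squared_magnitudes(shifted)))
print(all_close(reference, squared_magnitudes(phased)))
```

A circular shift multiplies each DFT coefficient by a unit-modulus factor, and a global phase multiplies all of them by the same unit-modulus factor, so both leave the magnitudes unchanged. Recovery is therefore only possible up to such trivial ambiguities.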
Noemi Mauro, Liliana Ardissono, Adriano Savoca
Textual queries are largely employed in information retrieval to let users specify search goals in a natural way. However, differences in user and system terminologies can challenge the identification of the user's information needs, and thus the generation of relevant results. We argue that the explicit management of ontological knowledge, and of the meaning of concepts (by integrating linguistic and encyclopedic knowledge in the system ontology), can improve the analysis of search queries, because it enables a flexible identification of the topics the user is searching for, regardless of the adopted vocabulary. This paper proposes an information retrieval support model based on semantic concept identification. Starting from the recognition of the ontology concepts that the search query refers to, this model exploits the qualifiers specified in the query to select information items on the basis of possibly fine-grained features. Moreover, it supports query expansion and reformulation by suggesting the exploration of semantically similar concepts, as well as of concepts related to those referred to in the query through thematic relations. A test on a data-set collected using the OnToMap Participatory GIS has shown that this approach provides accurate results.
Karam Abdulahhad
Concepts are used to solve the term-mismatch problem. However, we need an
effective similarity measure between concepts. Word embedding presents a
promising solution. We present in this study three approaches to build concept
vectors based on word vectors. We use a vector-based measure to estimate
inter-concept similarity. Our experiments show promising results. Furthermore,
words and concepts become comparable. This could be used to improve the conceptual
indexing process.
Authors' comments: 6 pages
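The abstract does not say which three constructions are used, so the sketch below shows just one plausible option, assumed here for illustration: a concept vector is the average of the vectors of the words in its label, and similarity is the cosine between vectors. The toy embeddings, concept names and helper functions are all hypothetical.

```python
import math


def average_vectors(vectors):
    """Component-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(p * q for p, q in zip(u, v))
    return dot / (math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(q * q for q in v)))


# Hypothetical 3-dimensional word embeddings, for illustration only.
word_vectors = {
    "heart": [1.0, 0.0, 0.0],
    "attack": [0.8, 0.2, 0.0],
    "cardiac": [0.9, 0.1, 0.0],
    "arrest": [0.7, 0.3, 0.0],
    "river": [0.0, 0.0, 1.0],
    "bank": [0.0, 0.2, 0.9],
}

# Hypothetical concepts, each described by the words of its label.
concept_words = {
    "heart_attack": ["heart", "attack"],
    "cardiac_arrest": ["cardiac", "arrest"],
    "river_bank": ["river", "bank"],
}

concept_vectors = {
    name: average_vectors([word_vectors[w] for w in words])
    for name, words in concept_words.items()
}

related = cosine(concept_vectors["heart_attack"], concept_vectors["cardiac_arrest"])
unrelated = cosine(concept_vectors["heart_attack"], concept_vectors["river_bank"])
# Words and concepts live in the same space, so they can be compared directly.
word_to_concept = cosine(word_vectors["cardiac"], concept_vectors["heart_attack"])
print(related, unrelated, word_to_concept)
```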
Islam Samy, Mohamed A. Attia, Ravi Tandon, Loukas Lazos
In many applications, content accessed by users (movies, videos, news articles, etc.) can leak sensitive latent attributes, such as religious and political views, sexual orientation, ethnicity, gender, and others. To prevent such information leakage, the goal of classical PIR is to hide the identity of the content/message being accessed, which subsequently also hides the latent attributes. This solution, while private, can be too costly, particularly when perfect (information-theoretic) privacy constraints are imposed. For instance, for a single database holding $K$ messages, privately retrieving one message is possible if and only if the user downloads the entire database of $K$ messages. Retrieving content privately, however, may not be necessary to perfectly hide the latent attributes. Motivated by the above, we formulate and study the problem of latent-variable private information retrieval (LV-PIR), which aims at allowing the user to efficiently retrieve one out of $K$ messages (indexed by $\theta$) without revealing any information about the latent variable (modeled by $S$). We focus on the practically relevant setting of a single database and show that one can significantly reduce the download cost of LV-PIR (compared to the classical PIR) based on the correlation between $\theta$ and $S$. We present a general scheme for LV-PIR as a function of the statistical relationship between $\theta$ and $S$, and also provide new results on the capacity/download cost of LV-PIR. Several open problems and new directions are also discussed.
Yen-Liang Lin, Son Tran, Larry S. Davis
Complementary fashion item recommendation is critical for fashion outfit
completion. Existing methods mainly focus on outfit compatibility prediction
but not in a retrieval setting. We propose a new framework for outfit
complementary item retrieval. Specifically, a category-based subspace attention
network is presented, which is a scalable approach for learning the subspace
attentions. In addition, we introduce an outfit ranking loss that better models
the item relationships of an entire outfit. We evaluate our method on the
outfit compatibility, FITB and new retrieval tasks. Experimental results
demonstrate that our approach outperforms state-of-the-art methods in both
compatibility prediction and complementary item retrieval.
Authors' comments: Accepted by CVPR 2020
Sadegh Fadaei, Abdolreza Rashno, Elyas Rashno
Content-based image retrieval (CBIR) is the task of retrieving images based on their contents. Since retrieval is time-consuming in large image databases, acceleration methods can be very useful. This paper presents a novel method to speed up CBIR systems. In the proposed method, Zernike moments are first extracted from the query image and an interval is calculated for that query. Database images that fall outside the interval are ignored during retrieval. The database is therefore reduced before retrieval, which speeds up the process. It is shown that, in the reduced database, images relevant to the query image are preserved and irrelevant images are discarded. The proposed method therefore speeds up retrieval while preserving CBIR accuracy.
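The prune-then-rank idea can be sketched in a few lines. The abstract does not specify how the interval is derived from the Zernike moments, so everything below is an assumption made for illustration: each image is represented by a precomputed feature vector, the scalar summary is its Euclidean norm, and the interval is the query's summary plus or minus a fixed half_width. The names summary, prune and rank are hypothetical.

```python
import math


def summary(features):
    """Scalar summary of a feature vector (assumed here: its Euclidean norm)."""
    return math.sqrt(sum(f * f for f in features))


def prune(database, query_features, half_width):
    """Keep only images whose summary lies inside the query's interval."""
    center = summary(query_features)
    return {
        name: features
        for name, features in database.items()
        if abs(summary(features) - center) <= half_width
    }


def rank(candidates, query_features):
    """Order the remaining images by Euclidean distance to the query."""
    return sorted(candidates, key=lambda name: math.dist(candidates[name], query_features))


# Hypothetical precomputed feature vectors standing in for Zernike moments.
database = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [5.0, 5.0],
    "d": [0.0, 3.0],
}
query = [1.0, 0.05]

reduced = prune(database, query, half_width=0.5)
ranking = rank(reduced, query)
print(sorted(reduced), ranking)
```

The expensive distance computation in rank only runs on the reduced set, which is where the speed-up would come from. Whether relevant images survive the pruning depends entirely on how the interval is chosen.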
Siamak Shakeri, Abhinav Sethy, Cheng Cheng
Complex deep learning models now achieve state-of-the-art performance for
many document retrieval tasks. The best models process the query or claim
jointly with the document. However, for fast scalable search it is desirable to
have document embeddings which are independent of the claim. In this paper we
show that knowledge distillation can be used to encourage a model that
generates claim-independent document encodings to mimic the behavior of a more
complex model which generates claim-dependent encodings. We explore this
approach in document retrieval for a fact extraction and verification task. We
show that by using the soft labels from a complex cross-attention teacher
model, the performance of claim-independent student LSTM or CNN models is
improved across all the ranking metrics. The student models we use are 12x
faster in runtime and 20x smaller in number of parameters than the teacher.
Authors' comments: Published at Amazon Machine Learning Conference (AMLC) 2019
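Training a student on a teacher's soft labels is commonly done with a temperature-scaled cross-entropy between the two score distributions. The sketch below shows that generic form over the candidate documents for one claim. The exact loss, temperature and score values used in the paper are not given in the abstract, so these are assumptions.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax of scores divided by a temperature."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_loss(student_scores, teacher_scores, temperature=2.0):
    """Cross-entropy of the student distribution against the teacher's soft labels."""
    teacher = softmax(teacher_scores, temperature)
    student = softmax(student_scores, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))


# Hypothetical relevance scores for three candidate documents and one claim.
teacher_scores = [4.0, 1.0, 0.5]
agreeing_student = [4.0, 1.0, 0.5]
disagreeing_student = [0.5, 1.0, 4.0]

low = soft_label_loss(agreeing_student, teacher_scores)
high = soft_label_loss(disagreeing_student, teacher_scores)
print(low, high)
```

The loss is smallest when the student reproduces the teacher's ranking distribution, which is what pushes a claim-independent encoder towards the behavior of the claim-dependent one.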