Yuxiang Lu, Zhuqing Jia, Syed A. Jafar
Double blind $T$-private information retrieval (DB-TPIR) enables two users,
each of whom specifies an index ($\theta_1, \theta_2$, resp.), to efficiently
retrieve a message $W(\theta_1,\theta_2)$ labeled by the two indices, from a
set of $N$ servers that store all messages $W(k_1,k_2),
k_1\in\{1,2,\cdots,K_1\}, k_2\in\{1,2,\cdots,K_2\}$, such that the two users'
indices are kept private from any set of up to $T_1,T_2$ colluding servers,
respectively, as well as from each other. A DB-TPIR scheme based on
cross-subspace alignment is proposed in this paper, and shown to be
capacity-achieving in the asymptotic setting of large number of messages and
bounded latency. The scheme is then extended to $M$-way blind $X$-secure
$T$-private information retrieval (MB-XS-TPIR) with multiple ($M$) indices,
each belonging to a different user, arbitrary privacy levels for each index
($T_1, T_2,\cdots, T_M$), and arbitrary level of security ($X$) of data
storage, so that the message $W(\theta_1,\theta_2,\cdots, \theta_M)$ can be
efficiently retrieved while the stored data is held secure against collusion
among up to $X$ colluding servers, the $m^{th}$ user's index is private against
collusion among up to $T_m$ servers, and each user's index $\theta_m$ is
private from all other users. The general scheme relies on a tensor-product
based extension of cross-subspace alignment and retrieves
$1-(X+T_1+\cdots+T_M)/N$ bits of desired message per bit of download.
Authors' comments: Accepted for publication in IEEE Journal on Selected Areas in
Information Theory (JSAIT)
Xueya Zhang, Tong Zhang, Xiaobin Hong, Zhen Cui, Jian Yang
Movie graphs play an important role to bridge heterogenous modalities of videos and texts in human-centric retrieval. In this work, we propose Graph Wasserstein Correlation Analysis (GWCA) to deal with the core issue therein, i.e, cross heterogeneous graph comparison. Spectral graph filtering is introduced to encode graph signals, which are then embedded as probability distributions in a Wasserstein space, called graph Wasserstein metric learning. Such a seamless integration of graph signal filtering together with metric learning results in a surprise consistency on both learning processes, in which the goal of metric learning is just to optimize signal filters or vice versa. Further, we derive the solution of the graph comparison model as a classic generalized eigenvalue decomposition problem, which has an exactly closed-form solution. Finally, GWCA together with movie/text graphs generation are unified into the framework of movie retrieval to evaluate our proposed method. Extensive experiments on MovieGrpahs dataset demonstrate the effectiveness of our GWCA as well as the entire framework.
Alexander Lebedev, Andrei Khrennikov
Recently people started to understand that applications of the mathematical formalism of quantum theory are not reduced to physics. Nowadays, this formalism is widely used outside of quantum physics, in particular, in cognition, psychology, decision making, information processing, especially information retrieval. The latter is very promising. The aim of this brief introductory review is to stimulate research in this exciting area of information science. This paper is not aimed to present a complete review on the state of art in quantum information retrieval.
Ming-Wei Li, Qing-Yuan Jiang, Wu-Jun Li
Due to its low storage cost and fast query speed, hashing has been widely
used in large-scale image retrieval tasks. Hash bucket search returns data
points within a given Hamming radius to each query, which can enable search at
a constant or sub-linear time cost. However, existing hashing methods cannot
achieve satisfactory retrieval performance for hash bucket search in complex
scenarios, since they learn only one hash code for each image. More
specifically, by using one hash code to represent one image, existing methods
might fail to put similar image pairs to the buckets with a small Hamming
distance to the query when the semantic information of images is complex. As a
result, a large number of hash buckets need to be visited for retrieving
similar images, based on the learned codes. This will deteriorate the
efficiency of hash bucket search. In this paper, we propose a novel hashing
framework, called multiple code hashing (MCH), to improve the performance of
hash bucket search. The main idea of MCH is to learn multiple hash codes for
each image, with each code representing a different region of the image.
Furthermore, we propose a deep reinforcement learning algorithm to learn the
parameters in MCH. To the best of our knowledge, this is the first work that
proposes to learn multiple hash codes for each image in image retrieval.
Experiments demonstrate that MCH can achieve a significant improvement in hash
bucket search, compared with existing methods that learn only one hash code for
each image.
Authors' comments: 12 pages, 9 figures, 3 tables
Te-Yuan Liu, Ata Mahjoubfar, Daniel Prusinski, Luis Stevens
Neuromorphic computing mimics the neural activity of the brain through emulating spiking neural networks. In numerous machine learning tasks, neuromorphic chips are expected to provide superior solutions in terms of cost and power efficiency. Here, we explore the application of Loihi, a neuromorphic computing chip developed by Intel, for the computer vision task of image retrieval. We evaluated the functionalities and the performance metrics that are critical in content-based visual search and recommender systems using deep-learning embeddings. Our results show that the neuromorphic solution is about 2.5 times more energy-efficient compared with an ARM Cortex-A72 CPU and 12.5 times more energy-efficient compared with NVIDIA T4 GPU for inference by a lightweight convolutional neural network without batching while maintaining the same level of matching accuracy. The study validates the potential of neuromorphic computing in low-power image retrieval, as a complementary paradigm to the existing von Neumann architectures.
Rakib Hyder, Zikui Cai, M. Salman Asif
Fourier phase retrieval is a classical problem that deals with the recovery
of an image from the amplitude measurements of its Fourier coefficients.
Conventional methods solve this problem via iterative (alternating)
minimization by leveraging some prior knowledge about the structure of the
unknown image. The inherent ambiguities about shift and flip in the Fourier
measurements make this problem especially difficult; and most of the existing
methods use several random restarts with different permutations. In this paper,
we assume that a known (learned) reference is added to the signal before
capturing the Fourier amplitude measurements. Our method is inspired by the
principle of adding a reference signal in holography. To recover the signal, we
implement an iterative phase retrieval method as an unrolled network. Then we
use back propagation to learn the reference that provides us the best
reconstruction for a fixed number of phase retrieval iterations. We performed a
number of simulations on a variety of datasets under different conditions and
found that our proposed method for phase retrieval via unrolled network and
learned reference provides near-perfect recovery at fixed (small) computational
cost. We compared our method with standard Fourier phase retrieval methods and
observed significant performance enhancement using the learned reference.
Authors' comments: Accepted to ECCV 2020. Code is available at
https://github.com/CSIPlab/learnPR_reference
Craig Macdonald, Nicola Tonellotto
The advent of deep machine learning platforms such as Tensorflow and Pytorch, developed in expressive high-level languages such as Python, have allowed more expressive representations of deep neural network architectures. We argue that such a powerful formalism is missing in information retrieval (IR), and propose a framework called PyTerrier that allows advanced retrieval pipelines to be expressed, and evaluated, in a declarative manner close to their conceptual design. Like the aforementioned frameworks that compile deep learning experiments into primitive GPU operations, our framework targets IR platforms as backends in order to execute and evaluate retrieval pipelines. Further, we can automatically optimise the retrieval pipelines to increase their efficiency to suite a particular IR platform backend. Our experiments, conducted on TREC Robust and ClueWeb09 test collections, demonstrate the efficiency benefits of these optimisations for retrieval pipelines involving both the Anserini and Terrier IR platforms.
Valentin Gabeur, Chen Sun, Karteek Alahari, Cordelia Schmid
The task of retrieving video content relevant to natural language queries
plays a critical role in effectively handling internet-scale datasets. Most of
the existing methods for this caption-to-video retrieval problem do not fully
exploit cross-modal cues present in video. Furthermore, they aggregate
per-frame visual features with limited or no temporal information. In this
paper, we present a multi-modal transformer to jointly encode the different
modalities in video, which allows each of them to attend to the others. The
transformer architecture is also leveraged to encode and model the temporal
information. On the natural language side, we investigate the best practices to
jointly optimize the language embedding together with the multi-modal
transformer. This novel framework allows us to establish state-of-the-art
results for video retrieval on three datasets. More details are available at
http://thoth.inrialpes.fr/research/MMT.
Authors' comments: ECCV 2020 (spotlight paper)
Mikolaj Jankowski, Deniz Gunduz, Krystian Mikolajczyk
We study the image retrieval problem at the wireless edge, where an edge device captures an image, which is then used to retrieve similar images from an edge server. These can be images of the same person or a vehicle taken from other cameras at different times and locations. Our goal is to maximize the accuracy of the retrieval task under power and bandwidth constraints over the wireless link. Due to the stringent delay constraint of the underlying application, sending the whole image at a sufficient quality is not possible. We propose two alternative schemes based on digital and analog communications, respectively. In the digital approach, we first propose a deep neural network (DNN) aided retrieval-oriented image compression scheme, whose output bit sequence is transmitted over the channel using conventional channel codes. In the analog joint source and channel coding (JSCC) approach, the feature vectors are directly mapped into channel symbols. We evaluate both schemes on image based re-identification (re-ID) tasks under different channel conditions, including both static and fading channels. We show that the JSCC scheme significantly increases the end-to-end accuracy, speeds up the encoding process, and provides graceful degradation with channel conditions. The proposed architecture is evaluated through extensive simulations on different datasets and channel conditions, as well as through ablation studies.
Hsuan-Yin Lin, Siddhartha Kumar, Eirik Rosnes, Alexandre Graell i Amat, Eitan Yaakobi
Private information retrieval (PIR) protocols ensure that a user can download
a file from a database without revealing any information on the identity of the
requested file to the servers storing the database. While existing protocols
strictly impose that no information is leaked on the file's identity, this work
initiates the study of the tradeoffs that can be achieved by relaxing the
perfect privacy requirement. We refer to such protocols as weakly-private
information retrieval (WPIR) protocols. In particular, for the case of multiple
noncolluding replicated servers, we study how the download rate, the upload
cost, and the access complexity can be improved when relaxing the full privacy
constraint. To quantify the information leakage on the requested file's
identity we consider mutual information (MI), worst-case information leakage,
and maximal leakage (MaxL). We present two WPIR schemes, denoted by Scheme A
and Scheme B, based on two recent PIR protocols and show that the download rate
of the former can be optimized by solving a convex optimization problem. We
also show that Scheme A achieves an improved download rate compared to the
recently proposed scheme by Samy et al. under the so-called $\epsilon$-privacy
metric. Additionally, a family of schemes based on partitioning is presented.
Moreover, we provide an information-theoretic converse bound for the maximum
possible download rate for the MI and MaxL privacy metrics under a practical
restriction on the alphabet size of queries and answers. For two servers and
two files, the bound is tight under the MaxL metric, which settles the WPIR
capacity in this particular case. Finally, we compare the performance of the
proposed schemes and their gap to the converse bound.
Authors' comments: To appear in IEEE Transactions on Information Theory. arXiv admin
note: text overlap with arXiv:1901.06730
Xiao Wang, Craig Macdonald, Iadh Ounis
Query reformulations have long been a key mechanism to alleviate the
vocabulary-mismatch problem in information retrieval, for example by expanding
the queries with related query terms or by generating paraphrases of the
queries. In this work, we propose a deep reinforced query reformulation (DRQR)
model to automatically generate new reformulations of the query. To encourage
the model to generate queries which can achieve high performance when
performing the retrieval task, we incorporate query performance prediction into
our reward function. In addition, to evaluate the quality of the reformulated
query in the context of information retrieval, we first train our DRQR model,
then apply the retrieval ranking model on the obtained reformulated query.
Experiments are conducted on the TREC 2020 Deep Learning track MSMARCO document
ranking dataset. Our results show that our proposed model outperforms several
query reformulation model baselines when performing retrieval task. In
addition, improvements are also observed when combining with various retrieval
models, such as query expansion and BERT.
Authors' comments: 10 pages, 4 figures
Golsa Tahmasebzadeh, Sherzod Hakimov, Eric Müller-Budack, Ralph Ewerth
Content-based information retrieval is based on the information contained in
documents rather than using metadata such as keywords. Most information
retrieval methods are either based on text or image. In this paper, we
investigate the usefulness of multimodal features for cross-lingual news search
in various domains: politics, health, environment, sport, and finance. To this
end, we consider five feature types for image and text and compare the
performance of the retrieval system using different combinations. Experimental
results show that retrieval results can be improved when considering both
visual and textual information. In addition, it is observed that among textual
features entity overlap outperforms word embeddings, while geolocation
embeddings achieve better performance among visual features in the retrieval
task.
Authors' comments: CLEOPATRA Workshop co-located with ESWC 2020
Zakaria Laskar, Juho Kannala
Recent advances in deep learning has lead to rapid developments in the field
of image retrieval. However, the best performing architectures incur
significant computational cost. Recent approaches tackle this issue using
knowledge distillation to transfer knowledge from a deeper and heavier
architecture to a much smaller network. In this paper we address knowledge
distillation for metric learning problems. Unlike previous approaches, our
proposed method jointly addresses the following constraints i) limited queries
to teacher model, ii) black box teacher model with access to the final output
representation, and iii) small fraction of original training data without any
ground-truth labels. In addition, the distillation method does not require the
student and teacher to have same dimensionality. Addressing these constraints
reduces computation requirements, dependency on large-scale training datasets
and addresses practical scenarios of limited or partial access to private data
such as teacher models or the corresponding training data/labels. The key idea
is to augment the original training set with additional samples by performing
linear interpolation in the final output representation space. Distillation is
then performed in the joint space of original and augmented teacher-student
sample representations. Results demonstrate that our approach can match
baseline models trained with full supervision. In low training sample settings,
our approach outperforms the fully supervised approach on two challenging image
retrieval datasets, ROxford5k and RParis6k \cite{Roxf} with the least possible
teacher supervision.
Authors' comments: 10 pages, 2 figures. Edited figure 7
Raul Gomez, Jaume Gibert, Lluis Gomez, Dimosthenis Karatzas
People from different parts of the globe describe objects and concepts in distinct manners. Visual appearance can thus vary across different geographic locations, which makes location a relevant contextual information when analysing visual data. In this work, we address the task of image retrieval related to a given tag conditioned on a certain location on Earth. We present LocSens, a model that learns to rank triplets of images, tags and coordinates by plausibility, and two training strategies to balance the location influence in the final ranking. LocSens learns to fuse textual and location information of multimodal queries to retrieve related images at different levels of location granularity, and successfully utilizes location information to improve image tagging.
Emory Hufbauer, Hana Khamfroush
Twitter, like many social media and data brokering companies, makes their data available through a search API (application programming interface). In addition to filtering results by date and location, researchers can search for tweets with specific content with a boolean text query, using {\it AND}, {\it OR}, and {\it NOT} operators to select the combinations of phrases which must, or must not, appear in matching tweets. This boolean text search system is not at all unique to Twitter and is found in many different contexts, including academic, legal, and medical databases, however it is stretched to its limits in Twitter's use case because of the relative volume and brevity of tweets. In addition, the semi-automated use of such systems was well studied under the topic of Information Retrieval during the 1980s and 1990s, however the study of such systems has greatly declined since that time. As such, we propose updated methods for automatically selecting and refining complex boolean search queries that can isolate relevant results with greater specificity and completeness. Furthermore, we present preliminary results of using an optimized query to collect a sample of traffic-incident-related tweets, along with the results of manually classifying and analyzing them.
Jui-Ting Huang, Ashish Sharma, Shuying Sun, Li Xia, David Zhang, Philip Pronin, Janani Padmanabhan, Giuseppe Ottaviano et al.
Search in social networks such as Facebook poses different challenges than in
classical web search: besides the query text, it is important to take into
account the searcher's context to provide relevant results. Their social graph
is an integral part of this context and is a unique aspect of Facebook search.
While embedding-based retrieval (EBR) has been applied in eb search engines for
years, Facebook search was still mainly based on a Boolean matching model. In
this paper, we discuss the techniques for applying EBR to a Facebook Search
system. We introduce the unified embedding framework developed to model
semantic embeddings for personalized search, and the system to serve
embedding-based retrieval in a typical search system based on an inverted
index. We discuss various tricks and experiences on end-to-end optimization of
the whole system, including ANN parameter tuning and full-stack optimization.
Finally, we present our progress on two selected advanced topics about
modeling. We evaluated EBR on verticals for Facebook Search with significant
metrics gains observed in online A/B experiments. We believe this paper will
provide useful insights and experiences to help people on developing
embedding-based retrieval systems in search engines.
Authors' comments: 9 pages, 3 figures, 3 tables, to be published in KDD '20
Seyedehsara Nayer, Namrata Vaswani
This work studies the Low Rank Phase Retrieval (LRPR) problem: recover an $n
\times q$ rank-$r$ matrix $X^*$ from $y_k = |A_k^\top x^*_k|$, $k=1, 2,..., q$,
when each $y_k$ is an m-length vector containing independent phaseless linear
projections of $x^*_k$. The different matrices $A_k$ are i.i.d. and each
contains i.i.d. standard Gaussian entries. We obtain an improved guarantee for
AltMinLowRaP, which is an Alternating Minimization solution to LRPR that was
introduced and studied in our recent work. As long as the right singular
vectors of $X^*$ satisfy the incoherence assumption, we can show that the
AltMinLowRaP estimate converges geometrically to $X^*$ if the total number of
measurements $mq \gtrsim nr^2 (r + \log(1/\epsilon))$. In addition, we also
need $m \gtrsim max(r, \log q, \log n)$ because of the specific asymmetric
nature of our problem. Compared to our recent work, we improve the sample
complexity of the AltMin iterations by a factor of $r^2$, and that of the
initialization by a factor of $r$. We also extend our result to the noisy case;
we prove stability to corruption by small additive noise.
Authors' comments: Revised for IEEE Trans. Info. Th
Jie Yang, J. Pedro F. Nunes, Kathryn Ledbetter, Elisa Biasin, Martin Centurion, Zhijiang Chen, Amy A. Cordones, Christ Crissman et al.
Electron scattering on liquid samples has been enabled recently by the development of ultrathin liquid sheet technologies. The data treatment of liquid-phase electron scattering has been mostly reliant on methodologies developed for gas electron diffraction, in which theoretical inputs and empirical fittings are often needed to account for the atomic form factor and remove the inelastic scattering background. The accuracy and impact of these theoretical and empirical inputs has not been benchmarked for liquid-phase electron scattering data. In this work, we present an alternative data treatment method that requires neither theoretical inputs nor empirical fittings. The merits of this new method are illustrated through the retrieval of real-space molecular structure from experimental electron scattering patterns of liquid water, carbon tetrachloride, chloroform, and dichloromethane.
Fan Wu, Patrick Rebeschini
We consider the problem of reconstructing an $n$-dimensional $k$-sparse signal from a set of noiseless magnitude-only measurements. Formulating the problem as an unregularized empirical risk minimization task, we study the sample complexity performance of gradient descent with Hadamard parametrization, which we call Hadamard Wirtinger flow (HWF). Provided knowledge of the signal sparsity $k$, we prove that a single step of HWF is able to recover the support from $k(x^*_{max})^{-2}$ (modulo logarithmic term) samples, where $x^*_{max}$ is the largest component of the signal in magnitude. This support recovery procedure can be used to initialize existing reconstruction methods and yields algorithms with total runtime proportional to the cost of reading the data and improved sample complexity, which is linear in $k$ when the signal contains at least one large component. We numerically investigate the performance of HWF at convergence and show that, while not requiring any explicit form of regularization nor knowledge of $k$, HWF adapts to the signal sparsity and reconstructs sparse signals with fewer measurements than existing gradient based methods.
Ashlee Milton, Maria Soledad Pera
Evaluation of information retrieval systems (IRS) is a prominent topic among
information retrieval researchers--mainly directed at a general population.
Children require unique IRS and by extension different ways to evaluate these
systems, but as a large population that use IRS have largely been ignored on
the evaluation front. In this position paper, we explore many perspectives that
must be considered when evaluating IRS; we specially discuss problems faced by
researchers who work with children IRS, including lack of evaluation
frameworks, limitations of data, and lack of user judgment understanding.
Authors' comments: Accepted at the 4th International and Interdisciplinary Perspectives
on Children & Recommender and Information Retrieval Systems (KidRec '20),
co-located with the 19th ACM International Conference on Interaction Design
and Children (IDC '20), https://kidrec.github.io/