William Thong, Cees G. M. Snoek, Arnold W. M. Smeulders
The goal of this paper is to retrieve an image based on instance, attribute and category similarity notions. Different from existing works, which usually address only one of these entities in isolation, we introduce a cooperative embedding to integrate them while preserving their specific level of semantic representation. An algebraic structure defines a superspace filled with instances. Attributes are axis-aligned to form subspaces, while categories influence the arrangement of similar instances. These relationships enable them to cooperate for their mutual benefit in image retrieval. We derive a proxy-based softmax embedding loss to simultaneously learn all similarity measures in both the superspace and the subspaces. We evaluate our model on datasets from two different domains. Experiments on image retrieval tasks show the benefits of the cooperative embeddings for modeling multiple image similarities, and for discovering style evolution of instances between and within categories.
Jun Yu, Xiao-Jun Wu
With the advantages of low storage cost and high efficiency, hashing learning
has received much attention in the domain of Big Data. In this paper, we
propose a novel unsupervised hashing learning method to cope with the open
problem of directly preserving the manifold structure by hashing. To address this
problem, both the semantic correlation in the textual space and the local
geometric structure in the visual space are explored simultaneously in our
framework. Besides, the $\ell_{2,1}$-norm constraint is imposed on the projection
matrices to learn the discriminative hash function for each modality. Extensive
experiments are performed to evaluate the proposed method on three publicly
available datasets and the experimental results show that our method can
achieve superior performance over the state-of-the-art methods.
Authors' comments: 4 pages, 4 figures
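For reference, under the common row-wise convention the $\ell_{2,1}$-norm of a matrix is the sum of the Euclidean norms of its rows, so penalizing it tends to shrink whole rows of a projection matrix towards zero. The sketch below only illustrates that definition. The function name `l21_norm` and the toy matrix are illustrative assumptions, not taken from the paper.

```python
import math


def l21_norm(matrix):
    """Sum of the Euclidean (L2) norms of the rows of a matrix (list of lists)."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


# Toy projection matrix (hypothetical values): the rows have norms 5, 0 and 13.
W = [[3.0, 4.0], [0.0, 0.0], [5.0, 12.0]]
print(l21_norm(W))  # 18.0
```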
Wei Yang, Haotian Zhang, Jimmy Lin
Following recent successes in applying BERT to question answering, we explore simple applications to ad hoc document retrieval. This required confronting the challenge posed by documents that are typically longer than the length of input BERT was designed to handle. We address this issue by applying inference on sentences individually, and then aggregating sentence scores to produce document scores. Experiments on TREC microblog and newswire test collections show that our approach is simple yet effective, as we report the highest average precision on these datasets by neural approaches that we are aware of.
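The aggregation step can be sketched as follows. The abstract does not state the aggregation rule, so this example assumes one plausible choice: a weighted sum of the top-scoring sentences, linearly interpolated with a first-stage document score. The names `aggregate_document_score` and `rank_documents`, the weights and `alpha` are all hypothetical.

```python
def aggregate_document_score(doc_score, sentence_scores, alpha=0.5,
                             weights=(1.0, 0.5, 0.25)):
    """Combine a document-level score with the best sentence-level scores.

    Assumption: linear interpolation; the abstract does not specify the rule.
    Setting alpha=0 uses sentence scores only.
    """
    top = sorted(sentence_scores, reverse=True)[:len(weights)]
    sentence_part = sum(w * s for w, s in zip(weights, top))
    return alpha * doc_score + (1.0 - alpha) * sentence_part


def rank_documents(candidates, **kwargs):
    """candidates maps doc_id -> (doc_score, [sentence scores]); best id first."""
    scored = {doc_id: aggregate_document_score(d, s, **kwargs)
              for doc_id, (d, s) in candidates.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical scores: d2 has a weaker document score but stronger sentences.
candidates = {
    "d1": (0.60, [0.20, 0.10]),
    "d2": (0.40, [0.90, 0.80, 0.10]),
}
print(rank_documents(candidates))  # ['d2', 'd1']
```

With `alpha=1.0` the sentence evidence is ignored and the first-stage order is returned unchanged.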
HeeJae Jun, Byungsoo Ko, Youngjoon Kim, Insik Kim, Jongtack Kim
Recent studies on the image retrieval task have shown that ensembling different models and combining multiple global descriptors lead to performance improvements. However, training different models for the ensemble is not only difficult but also inefficient with respect to time and memory. In this paper, we propose a novel framework that exploits multiple global descriptors to get an ensemble effect while it can be trained in an end-to-end manner. The proposed framework is flexible and expandable by the global descriptor, CNN backbone, loss, and dataset. Moreover, we investigate the effectiveness of combining multiple global descriptors with quantitative and qualitative analysis. Our extensive experiments show that the combined descriptor outperforms a single global descriptor, as it can utilize different types of feature properties. In the benchmark evaluation, the proposed framework achieves state-of-the-art performance on the CARS196, CUB200-2011, In-shop Clothes, and Stanford Online Products datasets for image retrieval tasks. Our model implementations and pretrained models are publicly available.
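One way to picture a combined descriptor is sketched below. The abstract does not name the descriptors it combines, so this example assumes three common pooling choices (average, max and generalized-mean pooling), L2-normalizes each, and concatenates them. The function names and the toy feature map are hypothetical, and activations are assumed to be non-negative, as after a ReLU.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def generalized_mean_pool(feature_map, p):
    """Pool each channel's spatial activations with a generalized mean of power p.

    p = 1 is average pooling and p = inf is max pooling.
    """
    pooled = []
    for channel in feature_map:
        if math.isinf(p):
            pooled.append(max(channel))
        else:
            pooled.append((sum(a ** p for a in channel) / len(channel)) ** (1.0 / p))
    return pooled


def combined_descriptor(feature_map, powers=(1.0, math.inf, 3.0)):
    """Concatenate one normalized global descriptor per pooling power."""
    parts = [l2_normalize(generalized_mean_pool(feature_map, p)) for p in powers]
    return l2_normalize([v for part in parts for v in part])


# Toy feature map: 2 channels, each flattened to 2 spatial positions.
feature_map = [[1.0, 3.0], [2.0, 2.0]]
descriptor = combined_descriptor(feature_map)
print(len(descriptor))  # 6
```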
Seunghoan Song, Masahito Hayashi
We study the capacity of quantum private information retrieval (QPIR) with multiple servers. In the QPIR problem with multiple servers, a user retrieves a classical file by downloading quantum systems from multiple servers, each of which contains a copy of a classical file set, while the identity of the downloaded file is not leaked to each server. The QPIR capacity is defined as the maximum rate of the file size over the whole dimension of the downloaded quantum systems. When the servers are assumed to share prior entanglement, we prove that the QPIR capacity with multiple servers is 1 regardless of the number of servers and files. We construct a rate-one protocol with only two servers. This capacity-achieving protocol outperforms its classical counterpart in the sense of capacity, server secrecy, and upload cost. The strong converse bound is derived concisely without using any secrecy condition. We also prove that the capacity of multi-round QPIR is 1.
Joel Brogan, Aparna Bharati, Daniel Moreira, Kevin Bowyer, Patrick Flynn, Anderson Rocha, Walter Scheirer
Images from social media can reflect diverse viewpoints, heated arguments, and expressions of creativity, adding new complexity to retrieval tasks. Researchers working on Content-Based Image Retrieval (CBIR) have traditionally tuned their algorithms to match filtered results with user search intent. However, we are now bombarded with composite images of unknown origin, authenticity, and even meaning. With such uncertainty, users may not have an initial idea of what the results of a search query should look like. For instance, hidden people, spliced objects, and subtly altered scenes can be difficult for a user to detect initially in a meme image, but may contribute significantly to its composition. We propose a new approach for spatial verification that aims at modeling object-level regions by dynamically clustering keypoints in a 2D Hough space, which are then used to accurately weight small contributing objects within the results, without the need for costly object detection steps. We call this method Objects in Scene to Objects in Scene (OS2OS) score, and it is optimized for fast matrix operations on CPUs. OS2OS performs comparably to state-of-the-art methods in classic CBIR problems, on the Oxford5K, Paris 6K, and Google-Landmarks datasets, without the need for bounding boxes. It also succeeds in emerging retrieval tasks such as image composite matching in the NIST MFC2018 dataset and meme-style composite imagery from Reddit.
Christy Y. Li, Xiaodan Liang, Zhiting Hu, Eric P. Xing
Generating long and semantic-coherent reports to describe medical images poses great challenges towards bridging visual and linguistic modalities, incorporating medical domain knowledge, and generating realistic and accurate descriptions. We propose a novel Knowledge-driven Encode, Retrieve, Paraphrase (KERP) approach which reconciles traditional knowledge- and retrieval-based methods with modern learning-based methods for accurate and robust medical report generation. Specifically, KERP decomposes medical report generation into explicit medical abnormality graph learning and subsequent natural language modeling. KERP first employs an Encode module that transforms visual features into a structured abnormality graph by incorporating prior medical knowledge; then a Retrieve module that retrieves text templates based on the detected abnormalities; and lastly, a Paraphrase module that rewrites the templates according to specific cases. The core of KERP is a proposed generic implementation unit---Graph Transformer (GTR) that dynamically transforms high-level semantics between graph-structured data of multiple domains such as knowledge graphs, images and sequences. Experiments show that the proposed approach generates structured and robust reports supported with accurate abnormality description and explainable attentive regions, achieving the state-of-the-art results on two medical report benchmarks, with the best medical abnormality and disease classification accuracy and improved human evaluation performance.
Raffaele Imbriaco, Clint Sebastian, Egor Bondarev, Peter H. N. de With
Remote Sensing Image Retrieval remains a challenging topic due to the special
nature of Remote Sensing Imagery. Such images contain many different
semantic objects, which clearly complicates the retrieval task. In this paper,
we present an image retrieval pipeline that uses attentive, local convolutional
features and aggregates them using the Vector of Locally Aggregated Descriptors
(VLAD) to produce a global descriptor. We study various system parameters such
as the multiplicative and additive attention mechanisms and descriptor
dimensionality. We propose a query expansion method that requires no external
inputs. Experiments demonstrate that even without training, the local
convolutional features and global representation outperform other systems.
After system tuning, we can achieve state-of-the-art or competitive results.
Furthermore, we observe that our query expansion method increases overall
system performance by about 3%, using only the top-three retrieved images.
Finally, we show how dimensionality reduction produces compact descriptors with
increased retrieval performance and fast retrieval computation times, e.g. 50%
faster than the current systems.
Authors' comments: Published in Remote Sensing. The first two authors have equal
contribution
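The VLAD aggregation step mentioned in this abstract can be sketched as follows: each local descriptor is assigned to its nearest centroid, the residuals are summed per centroid, and the concatenated sums are L2-normalized. This is a minimal illustration only. The centroids are fixed by hand rather than learned, the attention weighting from the paper is omitted, and the function name `vlad` and all values are hypothetical.

```python
import math


def vlad(local_descriptors, centroids):
    """Aggregate local descriptors into a single L2-normalized VLAD vector."""
    dim = len(centroids[0])
    residual_sums = [[0.0] * dim for _ in centroids]
    for desc in local_descriptors:
        # Assign the descriptor to its nearest centroid (squared Euclidean distance).
        k = min(range(len(centroids)),
                key=lambda j: sum((d - c) ** 2 for d, c in zip(desc, centroids[j])))
        # Accumulate the residual between the descriptor and that centroid.
        for i in range(dim):
            residual_sums[k][i] += desc[i] - centroids[k][i]
    flat = [v for row in residual_sums for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Toy setup: two hand-picked centroids and three 2-D local descriptors.
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 2.0], [9.0, 10.0]]
global_descriptor = vlad(descriptors, centroids)
print(len(global_descriptor))  # 4 = number of centroids * descriptor dimension
```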
Jiafeng Guo, Yixing Fan, Liang Pang, Liu Yang, Qingyao Ai, Hamed Zamani, Chen Wu, W. Bruce Croft et al.
Ranking models lie at the heart of research on information retrieval (IR). During the past decades, different techniques have been proposed for constructing ranking models, from traditional heuristic methods, probabilistic methods, to modern machine learning methods. Recently, with the advance of deep learning technology, we have witnessed a growing body of work in applying shallow or deep neural networks to the ranking problem in IR, referred to as neural ranking models in this paper. The power of neural ranking models lies in the ability to learn from the raw text inputs for the ranking problem to avoid many limitations of hand-crafted features. Neural networks have sufficient capacity to model complicated tasks, which is needed to handle the complexity of relevance estimation in ranking. Since there have been a large variety of neural ranking models proposed, we believe it is the right time to summarize the current status, learn from existing methodologies, and gain some insights for future development. In contrast to existing reviews, in this survey, we will take a deep look into the neural ranking models from different dimensions to analyze their underlying assumptions, major design principles, and learning strategies. We compare these models through benchmark tasks to obtain a comprehensive empirical understanding of the existing techniques. We will also discuss what is missing in the current literature and what are the promising and desired future directions.
Ziyu Yao, Jayavardhan Reddy Peddamail, Huan Sun
To accelerate software development, much research has been performed to help
people understand and reuse the huge amount of available code resources. Two
important tasks have been widely studied: code retrieval, which aims to
retrieve code snippets relevant to a given natural language query from a code
base, and code annotation, where the goal is to annotate a code snippet with a
natural language description. Despite their advancement in recent years, the
two tasks are mostly explored separately. In this work, we investigate a novel
perspective of Code annotation for Code retrieval (hence called `CoaCor'),
where a code annotation model is trained to generate a natural language
annotation that can represent the semantic meaning of a given code snippet and
can be leveraged by a code retrieval model to better distinguish relevant code
snippets from others. To this end, we propose an effective framework based on
reinforcement learning, which explicitly encourages the code annotation model
to generate annotations that can be used for the retrieval task. Through
extensive experiments, we show that code annotations generated by our framework
are much more detailed and more useful for code retrieval, and they can further
improve the performance of existing code retrieval models significantly.
Authors' comments: 10 pages, 2 figures. Accepted by The Web Conference (WWW) 2019
Junjie Ma, Rishabh Dudeja, Ji Xu, Arian Maleki, Xiaodong Wang
Phase retrieval refers to the problem of recovering a signal
$\mathbf{x}_{\star}\in\mathbb{C}^n$ from its phaseless measurements
$y_i=|\mathbf{a}_i^{\mathrm{H}}\mathbf{x}_{\star}|$, where
$\{\mathbf{a}_i\}_{i=1}^m$ are the measurement vectors. Many popular phase
retrieval algorithms are based on the following two-step procedure: (i)
initialize the algorithm based on a spectral method, (ii) refine the initial
estimate by a local search algorithm (e.g., gradient descent). The quality of
the spectral initialization step can have a major impact on the performance of
the overall algorithm. In this paper, we focus on the model where the
measurement matrix $\mathbf{A}=[\mathbf{a}_1,\ldots,\mathbf{a}_m]^{\mathrm{H}}$
has orthonormal columns, and study the spectral initialization under the
asymptotic setting $m,n\to\infty$ with $m/n\to\delta\in(1,\infty)$. We use the
expectation propagation framework to characterize the performance of spectral
initialization for Haar distributed matrices. Our numerical results confirm
that the predictions of the EP method are accurate not only for Haar
distributed matrices, but also for realistic Fourier based models (e.g. the
coded diffraction model). The main findings of this paper are the following:
(1) There exists a threshold on $\delta$ (denoted as
$\delta_{\mathrm{weak}}$) below which the spectral method cannot produce a
meaningful estimate. We show that $\delta_{\mathrm{weak}}=2$ for the
column-orthonormal model. In contrast, previous results by Mondelli and
Montanari show that $\delta_{\mathrm{weak}}=1$ for the i.i.d. Gaussian model.
(2) The optimal design for the spectral method coincides with that for the
i.i.d. Gaussian model, where the latter was recently introduced by Luo,
Alghamdi and Lu.
Authors' comments: Accepted by IEEE Transactions on Information Theory
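Step (i) of the two-step procedure, spectral initialization, can be illustrated with a short sketch. It rests on simplifying assumptions that differ from the paper's setting: a real-valued signal, i.i.d. Gaussian measurement vectors rather than a column-orthonormal matrix, and the untuned weighting $T(y)=y^2$ rather than the optimal design. The estimate is the leading eigenvector of $\frac{1}{m}\sum_i y_i^2\,\mathbf{a}_i\mathbf{a}_i^{\mathrm{T}}$, found by power iteration. All names and sizes are hypothetical.

```python
import math
import random


def spectral_initialization(A, y, iterations=100):
    """Leading eigenvector of D = (1/m) * sum_i y_i^2 * a_i a_i^T via power iteration."""
    m, n = len(A), len(A[0])
    D = [[0.0] * n for _ in range(n)]
    for a, yi in zip(A, y):
        w = yi * yi / m
        for r in range(n):
            for c in range(n):
                D[r][c] += w * a[r] * a[c]
    v = [1.0 / math.sqrt(n)] * n  # arbitrary unit-norm starting vector
    for _ in range(iterations):
        v = [sum(D[r][c] * v[c] for c in range(n)) for r in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


random.seed(0)
n, m = 4, 2000                      # heavily oversampled toy problem
x_star = [0.5, -0.5, 0.5, 0.5]      # unit-norm ground-truth signal
A = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
y = [abs(sum(a_k * x_k for a_k, x_k in zip(a, x_star))) for a in A]  # phaseless

x_hat = spectral_initialization(A, y)
# The sign is unrecoverable from |a^T x|, so compare up to a global sign.
overlap = abs(sum(p * q for p, q in zip(x_hat, x_star)))
print(round(overlap, 3))
```

An overlap close to 1 means the estimate is a good starting point for the local refinement in step (ii).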
Chao Li, Cheng Deng, Lei Wang, De Xie, Xianglong Liu
In recent years, hashing has attracted more and more attention owing to its low storage cost and high query efficiency in large-scale cross-modal retrieval. Benefiting from deep learning, increasingly compelling results have been achieved in the cross-modal retrieval community. However, existing deep cross-modal hashing methods either rely on large amounts of labeled information or are unable to learn an accurate correlation between different modalities. In this paper, we propose Unsupervised coupled Cycle generative adversarial Hashing networks (UCH) for cross-modal retrieval, where an outer-cycle network is used to learn a powerful common representation, and an inner-cycle network is employed to generate reliable hash codes. Specifically, our proposed UCH seamlessly couples these two networks with a generative adversarial mechanism, which can be optimized simultaneously to learn representation and hash codes. Extensive experiments on three popular benchmark datasets show that the proposed UCH outperforms the state-of-the-art unsupervised cross-modal hashing methods.
Gregory R. Brady, Christopher Moriarty, Peter Petrone, Iva Laginja, Keira Brooks, Tom Comeau, Lucie Leboulleux, Remi Soummer
We discuss the use of parametric phase-diverse phase retrieval as an in-situ
high-fidelity wavefront measurement method to characterize and optimize the
transmitted wavefront of a high-contrast coronagraphic instrument. We apply our
method to correct the transmitted wavefront of the HiCAT (High contrast imager
for Complex Aperture Telescopes) coronagraphic testbed. This correction
requires a series of calibration steps, which we describe. The correction
improves the system wavefront from 16 nm RMS to 3.0 nm RMS.
Authors' comments: 13 pages, 12 figures
Svebor Karaman, Xudong Lin, Xuefeng Hu, Shih-Fu Chang
We propose an unsupervised hashing method which aims to produce binary codes that preserve the ranking induced by a real-valued representation. Such compact hash codes enable the complete elimination of real-valued feature storage and allow for significant reduction of the computation complexity and storage cost of large-scale image retrieval applications. Specifically, we learn a neural network-based model, which transforms the input representation into a binary representation. We formalize the training objective of the network in an intuitive and effective way, considering each training sample as a query and aiming to obtain the same retrieval results using the produced hash codes as those obtained with the original features. This training formulation directly optimizes the hashing model for the target usage of the hash codes it produces. We further explore the addition of a decoder trained to obtain an approximated reconstruction of the original features. At test time, we retrieve the most promising database samples with an efficient graph-based search procedure using only our hash codes and perform re-ranking using the reconstructed features, thus without needing to access the original features at all. Experiments conducted on multiple publicly available large-scale datasets show that our method consistently outperforms all compared state-of-the-art unsupervised hashing methods and that the reconstruction procedure can effectively boost the search accuracy with a minimal constant additional cost.
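Retrieval with binary codes can be sketched as below. This is not the learned model from the abstract. As stand-ins, it assumes codes obtained by thresholding features at zero, and an exhaustive Hamming-distance scan in place of the graph-based search. The re-ranking step is omitted, and all names and values are hypothetical.

```python
def binarize(vec):
    """Pack a real-valued vector into an integer code: bit i is 1 when vec[i] > 0."""
    code = 0
    for i, v in enumerate(vec):
        if v > 0:
            code |= 1 << i
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def search(query_code, database_codes, k):
    """Indices of the k database codes closest to the query in Hamming distance."""
    order = sorted(range(len(database_codes)),
                   key=lambda i: (hamming(query_code, database_codes[i]), i))
    return order[:k]


# Toy real-valued features (hypothetical); only their binary codes are stored.
database = [[0.9, -0.2, 0.4, -0.7], [-0.5, 0.3, -0.1, 0.8], [0.7, -0.1, -0.3, -0.6]]
database_codes = [binarize(v) for v in database]
query_code = binarize([1.0, -0.4, 0.2, -0.9])
print(search(query_code, database_codes, k=2))  # [0, 2]
```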
Ji Liu, Lei Zhang
Recently, learning to hash has been widely studied for image retrieval thanks to the computation and storage efficiency of binary codes. For most existing learning to hash methods, sufficient training images are required and used to learn precise hashing codes. However, in some real-world applications, there are not always sufficient training images in the domain of interest. In addition, some existing supervised approaches need a large amount of labeled data, which is expensive to obtain in terms of time, labeling effort and human expertise. To handle such problems, inspired by transfer learning, we propose a simple yet effective unsupervised hashing method named Optimal Projection Guided Transfer Hashing (GTH), where we borrow the images of a different but related domain, i.e., the source domain, to help learn precise hashing codes for the domain of interest, i.e., the target domain. Besides, we propose to seek the maximum likelihood estimation (MLE) solution of the hashing functions of the target and source domains due to the domain gap. Furthermore, an alternating optimization method is adopted to obtain the two projections of the target and source domains such that the domain hashing disparity is reduced gradually. Extensive experiments on various benchmark databases verify that our method outperforms many state-of-the-art learning to hash methods. The implementation details are available at https://github.com/liuji93/GTH.
Jinho Choi
In this paper, we propose a compressive random access (CRA) scheme using
multiple resource blocks (RBs) to support massive connections for machine type
communications (MTC). The proposed CRA scheme is scalable. As a result, if the
number of devices increases, more RBs can be added to support them. Thanks to
multiple RBs, we can employ fast retrial between RBs for re-transmissions of
collided packets, which can result in short access delay. For stable CRA with
fast retrial, we derive conditions (with a rate control scheme), and analyze
the steady state performance to find the throughput and delay. Through analysis
and simulation results, we can see that the proposed scheme can perform better
than conventional multichannel ALOHA and enjoy a trade-off between the
performance and complexity in terms of the number of RBs.
Authors' comments: 11 pages, 8 figures
Hongtao Lin, Jun Yan, Meng Qu, Xiang Ren
Relation extraction is an important task in structuring content of text data,
and becomes especially challenging when learning with weak supervision---where
only a limited number of labeled sentences are given and a large number of
unlabeled sentences are available. Most existing work exploits unlabeled data
based on the ideas of self-training (i.e., bootstrapping a model) and
multi-view learning (e.g., ensembling multiple model variants). However, these
methods either suffer from the issue of semantic drift, or do not fully capture
the problem characteristics of relation extraction. In this paper, we leverage
a key insight that retrieving sentences expressing a relation is a dual task of
predicting relation label for a given sentence---two tasks are complementary to
each other and can be optimized jointly for mutual enhancement. To model this
intuition, we propose DualRE, a principled framework that introduces a
retrieval module which is jointly trained with the original relation prediction
module. In this way, high-quality samples selected by retrieval module from
unlabeled data can be used to improve prediction module, and vice versa.
Experimental results on two public datasets as well as case studies demonstrate
the effectiveness of the DualRE approach. Code and data can be found at
https://github.com/INK-USC/DualRE.
Authors' comments: 10 pages, 2-page references. Accepted to The Web Conference 2019
Mohammad Hossein Mousavi, Mohammad Ali Maddah-Ali, Mahtab Mirmohseni
In this paper, we argue that in many basic algorithms for machine learning, including support vector machine (SVM) for classification, principal component analysis (PCA) for dimensionality reduction, and regression for dependency estimation, we need the inner products of the data samples, rather than the data samples themselves. Motivated by the above observation, we introduce the problem of private inner product retrieval for distributed machine learning, where we have a system including a database of some files, duplicated across some non-colluding servers. A user intends to retrieve a subset of specific size of the inner products of the data files with minimum communication load, without revealing any information about the identity of the requested subset. For achievability, we use the algorithms for multi-message private information retrieval. For converse, we establish that as the length of the files becomes large, the set of all inner products converges to independent random variables with uniform distribution, and derive the rate of convergence. To prove that, we construct special dependencies among sequences of the sets of all inner products with different length, which forms a time-homogeneous irreducible Markov chain, without affecting the marginal distribution. We show that this Markov chain has a uniform distribution as its unique stationary distribution, with rate of convergence dominated by the second largest eigenvalue of the transition probability matrix. This allows us to develop a converse, which converges to a tight bound in some cases, as the size of the files becomes large. While this converse is based on the one in multi-message private information retrieval, due to the nature of retrieving inner products instead of data itself some changes are made to reach the desired result.
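The motivating observation, that inner products of the samples are enough for such algorithms, can be illustrated with a short sketch. It shows that pairwise squared distances, and a dual-form linear decision function of the kind used by SVMs, can both be evaluated from the Gram matrix alone. It does not implement the private retrieval scheme. The dual coefficients are hand-picked rather than trained, and all names and values are hypothetical.

```python
def gram_matrix(samples):
    """G[i][j] = <x_i, x_j> for every pair of samples."""
    return [[sum(a * b for a, b in zip(x, z)) for z in samples] for x in samples]


def squared_distances(G):
    """||x_i - x_j||^2 = G_ii + G_jj - 2 * G_ij, using inner products only."""
    n = len(G)
    return [[G[i][i] + G[j][j] - 2.0 * G[i][j] for j in range(n)] for i in range(n)]


def dual_decision(G_row, alphas, labels, bias):
    """f(x) = sum_i alpha_i * y_i * <x_i, x> + b, where G_row[i] = <x_i, x>."""
    return sum(a * y * g for a, y, g in zip(alphas, labels, G_row)) + bias


samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
G = gram_matrix(samples)

# From here on the raw samples are no longer needed.
D2 = squared_distances(G)
print(D2[0][1], D2[0][2])  # 2.0 1.0

alphas, labels, bias = [1.0, 1.0, 0.0], [1, -1, 1], 0.0  # hand-picked, not trained
print([dual_decision(row, alphas, labels, bias) for row in G])  # [1.0, -1.0, 0.0]
```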
Federico Simonetta, Stavros Ntalampiras, Federico Avanzini
Towards improving the performance in various music information processing tasks, recent studies exploit different modalities able to capture diverse aspects of music. Such modalities include audio recordings, symbolic music scores, mid-level representations, motion and gestural data, video recordings, editorial or cultural tags, lyrics and album cover art. This paper critically reviews the various approaches adopted in Music Information Processing and Retrieval and highlights how multimodal algorithms can help Music Computing applications. First, we categorize the related literature based on the application it addresses. Subsequently, we analyze existing information fusion approaches, and we conclude with the set of challenges that the Music Information Retrieval and Sound and Music Computing research communities should focus on in the coming years.
Ralph Gasser, Luca Rossetto, Heiko Schuldt
The growth of multimedia collections - in terms of size, heterogeneity, and variety of media types - necessitates systems that are able to conjointly deal with several forms of media, especially when it comes to searching for particular objects. However, existing retrieval systems are organized in silos and treat different media types separately. As a consequence, retrieval across media types is either not supported at all or subject to major limitations. In this paper, we present vitrivr, a content-based multimedia information retrieval stack. As opposed to the keyword search approach implemented by most media management systems, vitrivr makes direct use of the object's content to facilitate different types of similarity search, such as Query-by-Example or Query-by-Sketch, for and, most importantly, across different media types - namely, images, audio, videos, and 3D models. Furthermore, we introduce a new web-based user interface that enables easy-to-use, multimodal retrieval from and browsing in mixed media collections. The effectiveness of vitrivr is shown on the basis of a user study that involves different query and media types. To the best of our knowledge, the full vitrivr stack is unique in that it is the first multimedia retrieval system that seamlessly integrates support for four different types of media. As such, it paves the way towards an all-purpose, content-based multimedia information retrieval system.