Jiansheng Fang, Yanwu Xu, Xiaoqing Zhang, Yan Hu, Jiang Liu
Deep hashing methods have proved effective for large-scale medical image
search, which assists clinicians in reference-based diagnosis. However, when
the salient region plays the dominant discriminative role in an ophthalmic
image, existing deep hashing methods do not fully exploit the learning ability
of the deep network to specifically capture the features of salient regions.
Ophthalmic images of different grades or classes may share similar overall
performance but have subtle differences that can be distinguished by mining
salient regions. To address this issue, we propose a
novel end-to-end network, named Attention-based Saliency Hashing (ASH), for
learning compact hash-codes to represent ophthalmic images. ASH embeds a
spatial-attention module to focus more on the representation of salient regions
and highlights their essential role in differentiating ophthalmic images.
Benefiting from the spatial-attention module, the information of salient
regions can be mapped into the hash-code for similarity calculation. In the
training stage, image pairs are fed through the network with shared weights,
and a pairwise loss is designed to maximize the discriminability of the
hash-code. In the retrieval stage, ASH obtains the hash-code of an input
image in an end-to-end manner; the hash-code is then used for similarity
calculation to return the most similar images. Extensive experiments on
ophthalmic image datasets of two different modalities demonstrate that the
proposed ASH further improves retrieval performance over state-of-the-art
deep hashing methods, owing to the contribution of the spatial-attention
module.
Authors' comments: 8 pages, 4 figures, BIBM2020 conference
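The retrieval stage above ranks database images by the similarity of their hash-codes. A minimal sketch of that step, assuming binary codes compared by Hamming distance (the abstract does not name the similarity measure, and `hamming_rank` and the toy codes are hypothetical):

```python
import numpy as np

def hamming_rank(query_code, database_codes, top_k=3):
    """Return indices and distances of the top_k codes closest to the query.

    Codes are 0/1 arrays. Hamming distance is an assumed similarity measure,
    not one confirmed by the abstract.
    """
    query_code = np.asarray(query_code, dtype=np.uint8)
    database_codes = np.asarray(database_codes, dtype=np.uint8)
    # Count the bits that differ between the query and each database code.
    distances = np.count_nonzero(database_codes != query_code, axis=1)
    # A stable sort keeps database order among equally distant codes.
    order = np.argsort(distances, kind="stable")[:top_k]
    return order, distances[order]

# Toy 8-bit codes standing in for the output of a hashing network.
database = np.array([
    [0, 1, 1, 0, 1, 0, 0, 1],
    [1, 1, 1, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 0, 1, 1],
])
query = np.array([0, 1, 1, 0, 1, 0, 0, 1])
indices, dists = hamming_rank(query, database, top_k=3)
```

Because the codes are compact, the comparison reduces to bit counting, which is what makes hashing attractive for large image collections.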
Chung-Wei Weng, Yauhen Yakimenka, Hsuan-Yin Lin, Eirik Rosnes, Joerg Kliewer
We propose to extend the concept of private information retrieval by allowing
for distortion in the retrieval process and relaxing the perfect privacy
requirement at the same time. In particular, we study the trade-off between
download rate, distortion, and user privacy leakage, and show that in the limit
of large file sizes this trade-off can be captured via a novel
information-theoretical formulation for datasets with a known distribution.
Moreover, for scenarios where the statistics of the dataset are unknown, we
propose a new deep learning framework by leveraging a generative adversarial
network approach, which allows the user to learn efficient schemes from the
data itself. We evaluate the performance of the scheme on a synthetic Gaussian
dataset as well as on the MNIST, CIFAR-10, and LSUN datasets. For the MNIST,
CIFAR-10, and LSUN datasets, the data-driven approach significantly outperforms
a nonlearning-based scheme which combines source coding with the download of
multiple files.
Authors' comments: Accepted for publication in IEEE Transactions on Information
Forensics and Security (TIFS)
Dara Entekhabi, Alexandra Konings, Maria Piles, Narendra Das
Over land, the vegetation canopy affects the microwave brightness temperature by emission, scattering and attenuation of surface soil emission. The questions addressed in this study are: 1) what is the transparency of the vegetation canopy for different biomes around the globe at the low-frequency L-band? 2) what is the seasonal amplitude of vegetation microwave optical depth for different biomes? 3) what is the effective scattering at this frequency for different vegetation types? 4) what is the impact of imprecise characterization of vegetation microwave properties on retrieval of soil surface conditions? These questions are addressed based on one recently completed full annual cycle of measurements from the NASA Soil Moisture Active Passive (SMAP) mission.
Parshwa Shah, Arpit Garg, Vandit Gajjar
A person is usually characterized by descriptors like age, gender, height,
clothing type, pattern, color, etc. Such descriptors are known as attributes
and/or soft-biometrics. They bridge the semantic gap between a person's
description and retrieval in video surveillance. Retrieving a specific person
with the query of semantic description has an important application in video
surveillance. Using computer vision to fully automate the person retrieval task
has been gathering interest within the research community. However, the
current trend mainly focuses on retrieving persons with image-based queries,
which have major limitations for practical usage. Instead of using an image
query, in this paper, we study the problem of person retrieval in video
surveillance with a semantic description. To solve this problem, we develop a
deep learning-based cascade filtering approach (PeR-ViS), which uses Mask R-CNN
[14] (person detection and instance segmentation) and DenseNet-161 [16]
(soft-biometric classification). On the standard person retrieval dataset of
SoftBioSearch [6], we achieve 0.566 Average IoU and 0.792 %w $IoU > 0.4$,
surpassing the current state-of-the-art by a large margin. We hope our simple,
reproducible, and effective approach will help ease future research in the
domain of person retrieval in video surveillance. The source code and
pretrained weights are available at https://parshwa1999.github.io/PeR-ViS/.
Authors' comments: 10 pages, 6 figures, 3 tables; Human Activity Detection in
multi-camera, Continuous, long-duration Video (HADCV'21) under the IEEE Winter
Conf. on Applications of Computer Vision (WACV), Virtual Conference, January
5, 2021
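The metrics reported above (Average IoU and the share of results with IoU > 0.4) both build on intersection over union. A minimal sketch, assuming axis-aligned boxes given as `(x1, y1, x2, y2)`; the box format and the helper names `box_iou` and `summarize` are illustrative assumptions, not taken from the paper:

```python
def box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def summarize(ious, threshold=0.4):
    """Return the mean IoU and the fraction of IoUs above the threshold."""
    average = sum(ious) / len(ious)
    fraction_above = sum(iou > threshold for iou in ious) / len(ious)
    return average, fraction_above

# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
example_iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
```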
Wenchao Gu, Zongjie Li, Cuiyun Gao, Chaozheng Wang, Hongyu Zhang, Zenglin Xu, Michael R. Lyu
Code retrieval is a common practice for programmers to reuse existing code snippets in open-source repositories. Given a user query (i.e., a natural language description), code retrieval aims at searching for the most relevant ones from a set of code snippets. The main challenge of effective code retrieval lies in mitigating the semantic gap between natural language descriptions and code snippets. With the ever-increasing amount of available open-source code, recent studies resort to neural networks to learn the semantic matching relationships between the two sources. The statement-level dependency information, which highlights the dependency relations among the program statements during the execution, reflects the structural importance of one statement in the code, which is favorable for accurately capturing the code semantics but has never been explored for the code retrieval task. In this paper, we propose CRaDLe, a novel approach for Code Retrieval based on statement-level semantic Dependency Learning. Specifically, CRaDLe distills code representations through fusing both the dependency and semantic information at the statement level and then learns a unified vector representation for each code and description pair for modeling the matching relationship. Comprehensive experiments and analysis on real-world datasets show that the proposed approach can accurately retrieve code snippets for a given query and significantly outperform the state-of-the-art approaches to the task.
Allen Schmaltz, Andrew Beam
We present a novel end-to-end language model for joint retrieval and
classification, unifying the strengths of bi- and cross-encoders into a single
language model via a coarse-to-fine memory matching search procedure for
learning and inference. Evaluated on the standard blind test set of the FEVER
fact verification dataset, classification accuracy is significantly higher than
approaches that only rely on the language model parameters as a knowledge base,
and approaches some recent multi-model pipeline systems, using only a single
BERT base model augmented with memory layers. We further demonstrate how
coupled retrieval and classification can be leveraged to identify low
confidence instances, and we extend exemplar auditing to this setting for
analyzing and constraining the model. As a result, our approach yields a means
of updating language model behavior through two distinct mechanisms: The
retrieved information can be updated explicitly, and the model behavior can be
modified via the exemplar database.
Authors' comments: 19 pages, 3 figures, 7 tables (main: 11 pages, 2 figures, 4 tables)
Auxiliadora Padrón-Brito, Roberto Tricarico, Pau Farrera, Emanuele Distante, Klara Theophilo, Darrick Chang, Hugues de Riedmatten
We study the photon statistics of weak coherent pulses propagating through a
cold Rydberg atomic ensemble in the regime of Rydberg electromagnetically
induced transparency. We show experimentally that the value of the second-order
autocorrelation function of the transmitted light strongly depends on the
position within the pulse and heavily varies during the transients of the
pulse. In particular, we show that the falling edge of the transmitted pulse
displays much lower values than the rest of the pulse. We derive a theoretical
model that quantitatively predicts our results and explains the physical
behavior involved. Finally, we use this effect to generate single photons
localized within a pulse from the atomic ensemble. We show that by selecting
only the last part of the transmitted pulse, the single photons show an
antibunching parameter as low as 0.12 and a generation efficiency per trial
larger than possible with probabilistic generation schemes with atomic
ensembles.
Authors' comments: 21 pages, 11 figures
Frederik Warburg, Martin Jørgensen, Javier Civera, Søren Hauberg
Uncertainty quantification in image retrieval is crucial for downstream decisions, yet it remains a challenging and largely unexplored problem. Current methods for estimating uncertainties are poorly calibrated, computationally expensive, or based on heuristics. We present a new method that views image embeddings as stochastic features rather than deterministic features. Our two main contributions are (1) a likelihood that matches the triplet constraint and that evaluates the probability of an anchor being closer to a positive than a negative; and (2) a prior over the feature space that justifies the conventional l2 normalization. To ensure computational efficiency, we derive a variational approximation of the posterior, called the Bayesian triplet loss, that produces state-of-the-art uncertainty estimates and matches the predictive performance of current state-of-the-art methods.
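The likelihood described above scores the probability that an anchor lies closer to a positive than to a negative when embeddings are stochastic. A minimal Monte Carlo sketch of that quantity, assuming isotropic Gaussian embeddings with a shared standard deviation; the sampling estimator and the name `triplet_probability` are illustrative assumptions and are not the paper's variational derivation:

```python
import numpy as np

def triplet_probability(mu_a, mu_p, mu_n, sigma, num_samples=20000, seed=0):
    """Estimate P(||a - p||^2 < ||a - n||^2) for Gaussian embeddings.

    Each embedding is drawn from N(mu, sigma^2 I). This is a sampling
    estimate under assumed Gaussians, not a closed-form likelihood.
    """
    rng = np.random.default_rng(seed)

    def sample(mu):
        mu = np.asarray(mu, dtype=float)
        return mu + sigma * rng.standard_normal((num_samples, mu.size))

    a, p, n = sample(mu_a), sample(mu_p), sample(mu_n)
    d_pos = np.sum((a - p) ** 2, axis=1)
    d_neg = np.sum((a - n) ** 2, axis=1)
    return float(np.mean(d_pos < d_neg))

# Low embedding noise gives a confident triplet; high noise does not.
confident = triplet_probability([0, 0], [0.1, 0], [3, 0], sigma=0.1)
uncertain = triplet_probability([0, 0], [0.1, 0], [3, 0], sigma=5.0)
```

The same means produce very different probabilities as `sigma` grows, which is the kind of signal an uncertainty-aware retrieval system can pass on to downstream decisions.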
Pierre-Hugo Vial, Paul Magron, Thomas Oberlin, Cédric Févotte
Phase retrieval aims to recover a signal from magnitude or power spectra
measurements. It is often addressed by considering a minimization problem
involving a quadratic cost function. We propose a different formulation based
on Bregman divergences, which encompass divergences that are appropriate for
audio signal processing applications. We derive a fast gradient algorithm to
solve this problem.
Authors' comments: in Proceedings of iTWIST'20, Paper-ID: 16, Nantes, France,
December 2-4, 2020
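For readers unfamiliar with Bregman divergences, the sketch below evaluates the general definition D(x, y) = phi(x) - phi(y) - <grad phi(y), x - y> for two standard generators. It recovers the quadratic cost and the generalized Kullback-Leibler divergence as special cases. The function name and the example vectors are illustrative, and this is not the gradient algorithm proposed in the paper:

```python
import numpy as np

def bregman_divergence(phi, grad_phi, x, y):
    """Bregman divergence generated by the convex function phi."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(phi(x) - phi(y) - np.dot(grad_phi(y), x - y))

# phi(v) = ||v||^2 yields the squared Euclidean distance.
def squared_norm(v):
    return np.sum(v ** 2)

def squared_norm_grad(v):
    return 2.0 * v

# phi(v) = sum v log v yields the generalized Kullback-Leibler divergence.
def neg_entropy(v):
    return np.sum(v * np.log(v))

def neg_entropy_grad(v):
    return np.log(v) + 1.0

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 2.0, 1.0])
quadratic = bregman_divergence(squared_norm, squared_norm_grad, x, y)
gen_kl = bregman_divergence(neg_entropy, neg_entropy_grad, x, y)
```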
Xirong Li, Fangming Zhou, Chaoxi Xu, Jiaqi Ji, Gang Yang
Retrieving unlabeled videos by textual queries, known as Ad-hoc Video Search
(AVS), is a core theme in multimedia data management and retrieval. The success
of AVS counts on cross-modal representation learning that encodes both query
sentences and videos into common spaces for semantic similarity computation.
Inspired by the initial success of a few previous works in combining multiple
sentence encoders, this paper takes a step forward by developing a new and
general method for effectively exploiting diverse sentence encoders. The
novelty of the proposed method, which we term Sentence Encoder Assembly (SEA),
is two-fold. First, different from prior art that uses only a single common
space, SEA supports text-video matching in multiple encoder-specific common
spaces. Such a property prevents the matching from being dominated by a
specific encoder that produces an encoding vector much longer than other
encoders. Second, in order to explore complementarities among the individual
common spaces, we propose multi-space multi-loss learning. As extensive
experiments on four benchmarks (MSR-VTT, TRECVID AVS 2016-2019, TGIF and MSVD)
show, SEA surpasses the state-of-the-art. In addition, SEA is extremely easy to
implement. All this makes SEA an appealing solution for AVS and promising for
continuously advancing the task by harvesting new sentence encoders.
Authors' comments: accepted for publication as a REGULAR paper in the IEEE Transactions
on Multimedia
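The first point above, matching in several encoder-specific spaces so that no long encoding vector dominates, can be illustrated with a short sketch. It assumes the per-space scores are cosine similarities that are simply summed; the combination rule and the function names are assumptions for illustration and are not stated in the abstract:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors of the same length."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def multi_space_similarity(text_embeddings, video_embeddings):
    """Sum cosine similarities computed separately in each common space.

    text_embeddings[i] and video_embeddings[i] live in the i-th space, and
    the spaces may have different dimensionalities.
    """
    return sum(cosine(t, v) for t, v in zip(text_embeddings, video_embeddings))

# Space 0 is 2-D and space 1 is 3-D; each contributes a score in [-1, 1].
text = [[1.0, 0.0], [0.0, 2.0, 0.0]]
video = [[2.0, 0.0], [0.0, 0.0, 3.0]]
score = multi_space_similarity(text, video)
```

Since each space contributes a bounded score, a high-dimensional encoder cannot outweigh the others through vector length alone.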
Sitan Chen, Xiaoxiao Li, Zhao Song, Danyang Zhuo
In this work, we examine the security of InstaHide, a scheme recently
proposed by [Huang, Song, Li and Arora, ICML'20] for preserving the security of
private datasets in the context of distributed learning. To generate a
synthetic training example to be shared among the distributed learners,
InstaHide takes a convex combination of private feature vectors and randomly
flips the sign of each entry of the resulting vector with probability 1/2. A
salient question is whether this scheme is secure in any provable sense,
perhaps under a plausible hardness assumption and assuming the distributions
generating the public and private data satisfy certain properties.
We show that the answer to this appears to be quite subtle and closely
related to the average-case complexity of a new multi-task, missing-data
version of the classic problem of phase retrieval. Motivated by this
connection, we design a provable algorithm that can recover private vectors
using only the public vectors and synthetic vectors generated by InstaHide,
under the assumption that the private and public vectors are isotropic
Gaussian.
Authors' comments: 30 pages, to appear in ICLR 2021, v2: updated discussion of follow-up
work
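The encoding step described above (a convex combination of private vectors followed by independent sign flips with probability 1/2) can be sketched as follows. The number of mixed vectors `k`, the Dirichlet draw for the mixing weights, and the function name are assumptions for illustration; the abstract does not specify how the weights are chosen:

```python
import numpy as np

def instahide_style_encode(private_vectors, k=2, rng=None):
    """Mix k private vectors convexly, then flip each entry's sign w.p. 1/2."""
    rng = np.random.default_rng() if rng is None else rng
    private_vectors = np.asarray(private_vectors, dtype=float)
    # Pick k distinct private vectors to combine.
    chosen = rng.choice(len(private_vectors), size=k, replace=False)
    # Nonnegative weights summing to one (assumed Dirichlet draw).
    weights = rng.dirichlet(np.ones(k))
    mixed = weights @ private_vectors[chosen]
    # Independent random sign per entry, each with probability 1/2.
    signs = rng.choice([-1.0, 1.0], size=mixed.shape)
    return signs * mixed, chosen, weights

rng = np.random.default_rng(0)
private = rng.standard_normal((5, 8))
encoded, chosen, weights = instahide_style_encode(private, k=2, rng=rng)
```

The sign flips leave the entrywise magnitudes of the mixture intact, which is what ties an attack on the scheme to a phase-retrieval-like recovery problem.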
Eunju Cha, Chanseok Lee, Mooseok Jang, Jong Chul Ye
Fourier phase retrieval is a classical problem of restoring a signal only from the measured magnitude of its Fourier transform. Although Fienup-type algorithms, which use prior knowledge in both spatial and Fourier domains, have been widely used in practice, they can often stall in local minima. Modern methods such as PhaseLift and PhaseCut may offer performance guarantees with the help of convex relaxation. However, these algorithms are usually computationally intensive for practical use. To address this problem, we propose a novel, unsupervised, feed-forward neural network for Fourier phase retrieval which enables immediate high-quality reconstruction. Unlike the existing deep learning approaches that use a neural network as a regularization term or an end-to-end blackbox model for supervised training, our algorithm is a feed-forward neural network implementation of the PhaseCut algorithm in an unsupervised learning framework. Specifically, our network is composed of two generators: one for the phase estimation using PhaseCut loss, followed by another generator for image reconstruction, all of which are trained simultaneously using a cycleGAN framework without matched data. The link to the classical Fienup-type algorithms and the recent symmetry-breaking learning approach is also revealed. Extensive experiments demonstrate that the proposed method outperforms all existing approaches in Fourier phase retrieval problems.
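For context on the Fienup-type baselines mentioned above, here is a minimal sketch of the classical error-reduction iteration, which alternates between the measured Fourier magnitude and a spatial-domain prior. The support mask, the non-negativity prior and the test image are assumptions for illustration; this is not the proposed network:

```python
import numpy as np

def error_reduction(magnitude, support, num_iters=200, seed=0):
    """Alternate between Fourier-magnitude and object-domain constraints.

    magnitude: measured |FFT| of the unknown image.
    support: boolean mask of pixels allowed to be non-zero (assumed known).
    """
    rng = np.random.default_rng(seed)
    estimate = rng.random(magnitude.shape) * support
    errors = []
    for _ in range(num_iters):
        spectrum = np.fft.fft2(estimate)
        # Track the mismatch with the measured magnitude.
        errors.append(float(np.linalg.norm(np.abs(spectrum) - magnitude)))
        # Fourier constraint: keep the current phase, impose the magnitude.
        spectrum = magnitude * np.exp(1j * np.angle(spectrum))
        candidate = np.real(np.fft.ifft2(spectrum))
        # Object constraint: zero outside the support and where negative.
        estimate = np.where(support & (candidate > 0), candidate, 0.0)
    return estimate, errors

truth = np.zeros((16, 16))
truth[4:8, 5:9] = np.arange(16).reshape(4, 4) + 1.0
support = np.zeros((16, 16), dtype=bool)
support[2:10, 3:11] = True
measured = np.abs(np.fft.fft2(truth))
estimate, errors = error_reduction(measured, support, num_iters=100)
```

The recorded error does not increase from one iteration to the next, but it can plateau above zero, which is the stalling behaviour the abstract refers to.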
Zhaoqun Li
How to obtain a desirable representation of a 3D shape is a key challenge in the 3D shape retrieval task. Most existing 3D shape retrieval methods focus on capturing shape representation with different neural network architectures, while the learning ability of each layer in the network is neglected. A common and tough issue that limits the capacity of the network is overfitting. To tackle this, L2 regularization is applied widely in existing deep learning frameworks. However, the effect of L2 regularization on generalization ability is limited, as it only penalizes large parameter values. To close this gap, in this paper, we propose a novel regularization term called Gram regularization, which reinforces the learning ability of the network by encouraging the weight kernels to extract different information on the corresponding feature map. By forcing the variance between weight kernels to be large, the regularizer can help to extract discriminative features. The proposed Gram regularization is data independent and can converge stably and quickly without bells and whistles. Moreover, it can be easily plugged into existing off-the-shelf architectures. Extensive experimental results on the popular 3D object retrieval benchmark ModelNet demonstrate the effectiveness of our method.
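One way to read the idea above is as a penalty on the similarity between the weight kernels of a layer. The sketch below flattens and normalizes the kernels, forms their Gram matrix, and sums the squared off-diagonal entries. This particular formula and the name `gram_penalty` are assumptions for illustration; the abstract does not give the exact form of the regularizer:

```python
import numpy as np

def gram_penalty(weights):
    """Sum of squared off-diagonal entries of the normalized Gram matrix.

    weights: array of shape (num_kernels, ...), one kernel per leading index.
    The penalty is zero when all kernels are mutually orthogonal.
    """
    w = np.asarray(weights, dtype=float)
    flat = w.reshape(w.shape[0], -1)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    flat = flat / np.maximum(norms, 1e-12)
    gram = flat @ flat.T
    off_diagonal = gram - np.diag(np.diag(gram))
    return float(np.sum(off_diagonal ** 2))

# Two identical kernels are maximally redundant; orthogonal ones are not.
redundant = gram_penalty([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
diverse = gram_penalty([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
```

A term of this kind depends only on the weights, which matches the abstract's description of the regularizer as data independent.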
Yuhe Zhang, Mike Andreas Noack, Patrik Vagovic, Kamel Fezzaa, Francisco Garcia-Moreno, Tobias Ritschel, Pablo Villanueva-Perez
Phase retrieval approaches based on deep learning (DL) provide a framework to obtain phase information from an intensity hologram or diffraction pattern in a robust manner and in real time. However, current DL architectures applied to the phase problem i) rely on paired datasets, i.e., they are only applicable when a satisfactory solution of the phase problem has been found, and ii) mostly ignore the physics of the imaging process. Here, we present PhaseGAN, a new DL approach based on Generative Adversarial Networks, which allows the use of unpaired datasets and includes the physics of image formation. Including the image formation physics enhances the performance of our approach, which provides phase reconstructions when conventional phase retrieval algorithms fail, such as in ultra-fast experiments. Thus, PhaseGAN offers the opportunity to address the phase problem when no phase reconstructions are available, but good simulations of the object or data from other experiments are, enabling us to obtain results not possible before.
Wissam Bejjani, Wisdom C. Agboh, Mehmet R. Dogar, Matteo Leonetti
We address the manipulation task of retrieving a target object from a cluttered shelf. When the target object is hidden, the robot must search through the clutter to retrieve it. Solving this task requires reasoning over the likely locations of the target object. It also requires physics reasoning over multi-object interactions and future occlusions. In this work, we present a data-driven hybrid planner for generating occlusion-aware actions in a closed loop. The hybrid planner explores likely locations of the occluded target object as predicted by a learned distribution from the observation stream. The search is guided by a heuristic trained with reinforcement learning to act on observations with occlusions. We evaluate our approach in different simulation and real-world settings (video available at https://youtu.be/dY7YQ3LUVQg). The results validate that our approach can search for and retrieve a target object in near real time in the real world while only being trained in simulation.
Sandra Obermeier, Max Berrendorf, Peer Kröger
The reverse k-nearest neighbor (RkNN) query is an established query type with various applications, ranging from identifying highly influential objects and incrementally updating kNN graphs to optimizing sensor communication and outlier detection. State-of-the-art solutions exploit the fact that the k-distances in real-world datasets often follow a power-law distribution, and bound them with lines in log-log space. In this work, we investigate this assumption and uncover that it is violated in regions of changing density, which we show are typical for real-life datasets. Towards a generic solution, we pose the estimation of k-distances as a regression problem. Thereby, we enable harnessing the abundance of available Machine Learning models and profiting from their advancement. We propose a flexible approach which allows steering the trade-off between performance and memory consumption, and in particular finding good solutions within a fixed memory budget, which is crucial in the context of edge computing. Moreover, we show how to obtain and improve guaranteed bounds essential to exact query processing. In experiments on real-world datasets, we demonstrate how this framework can significantly reduce the index memory consumption and strongly reduce the candidate set size. We publish our code at https://github.com/sobermeier/nonlinear-kdist.
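The power-law assumption examined above says that a point's k-distance is roughly linear in k once both are plotted in log-log space. A minimal sketch of how that baseline fit can be computed, using brute-force distances and an ordinary least-squares line; the helper names and the toy data are hypothetical, and this is not the regression framework from the paper's repository:

```python
import numpy as np

def k_distances(points, max_k):
    """Distance from each point to its 1st..max_k-th nearest neighbor."""
    points = np.asarray(points, dtype=float)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt(np.sum(diffs ** 2, axis=-1))
    dists.sort(axis=1)
    # Column 0 holds each point's zero distance to itself.
    return dists[:, 1:max_k + 1]

def fit_loglog_line(kdist_row):
    """Least-squares line log(kdist) = slope * log(k) + intercept."""
    ks = np.arange(1, len(kdist_row) + 1)
    slope, intercept = np.polyfit(np.log(ks), np.log(kdist_row), deg=1)
    return float(slope), float(intercept)

rng = np.random.default_rng(0)
points = rng.random((200, 2))
kdist = k_distances(points, max_k=16)
slope, intercept = fit_loglog_line(kdist[0])
```

Plotting the residuals of such a fit for points near a density change is one simple way to see where a single line stops being an adequate description.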
Muntaha Iqbal, Kamran Amjad, Bilal Tahir, Muhammad Amir Mehmood
Urdu is a widely spoken language with 163 million speakers worldwide. Information Retrieval (IR) for Urdu deserves special consideration from the research community due to its rich morphological features and large number of speakers. In general, the IR evaluation task is not extensively explored for Urdu. The most important missing element is a standardized evaluation corpus specific to Urdu. In this research work, we propose and construct a standard test collection of Urdu documents for IR evaluation and name it Collection for Urdu Retrieval Evaluation (CURE). We select 1,096 unique documents against 50 diverse queries from a large collection of 0.5 million crawled documents using two IR models. The purpose of the test collection is the evaluation of IR models, ranking algorithms, and different natural language processing techniques. Next, we perform binary relevance judgment on the selected documents. We also build two other language resources for lemmatization and query expansion specific to our test collection. Evaluation of the test collection is carried out using four retrieval models as well as the stop-words list, lemmatization, and query expansion. Furthermore, error analysis is performed for each query with different NLP techniques. To the best of our knowledge, this work is the first attempt at preparing a standardized information retrieval evaluation test collection for the Urdu language.
Minz Won, Sergio Oramas, Oriol Nieto, Fabien Gouyon, Xavier Serra
Tag-based music retrieval is crucial to browse large-scale music libraries
efficiently. Hence, automatic music tagging has been actively explored, mostly
as a classification task, which has an inherent limitation: a fixed vocabulary.
On the other hand, metric learning enables flexible vocabularies by using
pretrained word embeddings as side information. Also, metric learning has
already proven its suitability for cross-modal retrieval tasks in other domains
(e.g., text-to-image) by jointly learning a multimodal embedding space. In this
paper, we investigate three ideas to successfully introduce multimodal metric
learning for tag-based music retrieval: elaborate triplet sampling, acoustic
and cultural music information, and domain-specific word embeddings. Our
experimental results show that the proposed ideas enhance the retrieval system
quantitatively and qualitatively. Furthermore, we release the MSD500, a subset
of the Million Song Dataset (MSD) containing 500 cleaned tags, 7 manually
annotated tag categories, and user taste profiles.
Authors' comments: 5 pages, 2 figures, submitted to ICASSP 2021
Wen-Ting Tseng, Tien-Hong Lo, Yung-Chang Hsu, Berlin Chen
Frequently asked question (FAQ) retrieval, with the purpose of providing information on frequent questions or concerns, has far-reaching applications in many areas, where a collection of question-answer (Q-A) pairs compiled a priori can be employed to retrieve an appropriate answer in response to a user's query that is likely to reoccur frequently. To this end, predominant approaches to FAQ retrieval typically rank question-answer pairs by considering either the similarity between the query and a question (q-Q), the relevance between the query and the associated answer of a question (q-A), or a combination of the clues gathered from the q-Q similarity measure and the q-A relevance measure. In this paper, we extend this line of research by combining the clues gathered from the q-Q similarity measure and the q-A relevance measure while injecting extra word interaction information, distilled from a generic (open domain) knowledge base, into a contextual language model for inferring the q-A relevance. Furthermore, we also explore capitalizing on domain-specific topically-relevant relations between words in an unsupervised manner, acting as a surrogate to the supervised domain-specific knowledge base information. As such, it enables the model to equip sentence representations with the knowledge about domain-specific and topically-relevant relations among words, thereby providing a better q-A relevance measure. We evaluate variants of our approach on a publicly-available Chinese FAQ dataset, and further apply and contextualize it to a large-scale question-matching task, which aims to search questions from a QA dataset that have a similar intent as an input query. Extensive experimental results on these two datasets confirm the promising performance of the proposed approach in relation to some state-of-the-art ones.
Liang Yao, Baosong Yang, Haibo Zhang, Weihua Luo, Boxing Chen
As a crucial component of cross-language information retrieval (CLIR), query
translation faces three main challenges: 1) the adequacy of translation; 2) the
lack of in-domain parallel training data; and 3) the requirement of low latency.
To this end, existing CLIR systems mainly exploit statistical machine
translation (SMT) rather than the advanced neural machine translation (NMT),
limiting further improvements in both translation and retrieval quality. In
this paper, we investigate how to incorporate a neural query translation model
into a CLIR system. Specifically, we propose a novel data augmentation method
that extracts query translation pairs according to user clickthrough data,
thereby alleviating the problem of domain adaptation in NMT. Then, we introduce
an asynchronous strategy which is able to leverage the real-time advantage of
SMT and the veracity of NMT. Experimental results reveal that the proposed
approach yields better retrieval quality than strong baselines and can be well
applied to a real-world CLIR system, i.e., the Aliexpress e-Commerce search
engine. Readers can examine and test their cases on our website:
https://aliexpress.com .
Authors' comments: SIGIR eCom 2020