Micael Carvalho, Rémi Cadène, David Picard, Laure Soulier, Matthieu Cord
Recent advances in the machine learning community have allowed new use cases
to emerge, such as its association with domains like cooking, which gave rise
to computational cuisine. In this paper, we tackle the picture-recipe alignment
problem, with large-scale retrieval (finding a recipe given a picture, and vice
versa) as the target application. Our approach is validated on the
Recipe1M dataset, composed of one million image-recipe pairs and additional
class information, for which we achieve state-of-the-art results.
Authors' comments: Published at DECOR / ICDE 2018. Extended version accepted at SIGIR
2018, available here: arXiv:1804.11146
Meghana Dinesh Kumar, Morteza Babaie, Hamid Tizhoosh
We investigate the concept of deep barcodes and propose two methods to
generate them in order to expedite the process of classification and retrieval
of histopathology images. Since binary search is computationally less
expensive, in terms of both speed and storage, deep barcodes could be useful
when dealing with big data retrieval. Our experiments use the dataset Kimia
Path24 to test three pre-trained networks for image retrieval. The dataset
consists of 27,055 training images in 24 different classes with large
variability, and 1,325 test images. Apart from high speed and
efficiency, results show a surprising retrieval accuracy of 71.62% for deep
barcodes, as compared to 68.91% for deep features and 68.53% for compressed
deep features.
Authors' comments: Accepted for publication in proceedings of the IEEE World Congress on
Computational Intelligence (IEEE WCCI), Rio de Janeiro, Brazil, 8-13 July,
2018
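The abstract does not specify the two barcode constructions, so the following is only a minimal Python sketch under stated assumptions. Deep features are binarized by the sign of the difference between neighboring feature values (a hypothetical choice), and retrieval ranks database images by Hamming distance. The function names `barcode` and `hamming_retrieve` are illustrative and do not come from the paper.

```python
import numpy as np

def barcode(features: np.ndarray) -> np.ndarray:
    """Binarize a (num_images, dim) feature matrix into (num_images, dim - 1) bits.

    Hypothetical scheme: bit i is 1 when feature i + 1 is larger than feature i,
    i.e. the sign of the discrete derivative along each feature vector.
    """
    return (np.diff(features, axis=1) > 0).astype(np.uint8)

def hamming_retrieve(query_code: np.ndarray, db_codes: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k database barcodes with the smallest Hamming distance."""
    distances = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(distances, kind="stable")[:k]  # stable: ties keep database order

# Usage with random stand-ins for deep features (e.g. 1024-dim CNN activations).
rng = np.random.default_rng(0)
db_features = rng.random((100, 1024))
db_codes = barcode(db_features)
query = db_features[42] + 1e-3 * rng.standard_normal(1024)  # near-duplicate of image 42
print(hamming_retrieve(barcode(query[None, :])[0], db_codes, k=3))
```

Counting mismatched bits is what makes barcode search cheap; packing the bits with `np.packbits` would further reduce storage to one bit per dimension.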
Lin Wu, Yang Wang, Ling Shao
In this paper, we propose a novel deep generative approach to cross-modal
retrieval to learn hash functions in the absence of paired training samples
through the cycle consistency loss. Our proposed approach employs an adversarial
training scheme to learn a pair of hash functions enabling translation between
modalities while assuming the underlying semantic relationship. To imbue the
hash codes of each input-output pair with semantics, a cycle consistency loss is
further imposed on top of the adversarial training to strengthen the correlations
between inputs and corresponding outputs. Our approach is generative: it learns
hash functions such that the learned hash codes maximally correlate each
input-output correspondence, while also regenerating the inputs so as to
minimize information loss. The learning-to-hash embedding is thus performed
to jointly optimize the parameters of the hash functions across modalities as
well as the associated generative models. Extensive experiments on a variety of
large-scale cross-modal data sets demonstrate that our proposed method achieves
better retrieval results than state-of-the-art methods.
Authors' comments: To appear in IEEE Trans. Image Processing. arXiv admin note: text
overlap with arXiv:1703.10593 by other authors
Qiwen Wang, Hua Sun, Mikael Skoglund
We consider the problem of private information retrieval (PIR) with colluding servers and eavesdroppers (abbreviated as ETPIR). The ETPIR problem consists of $K$ messages; $N$ servers, each of which stores all $K$ messages; a user who wants to retrieve one of the $K$ messages without revealing the desired message index to any set of $T$ colluding servers; and an eavesdropper who can listen to the queries and answers of any $E$ servers but must not learn any information about the messages. The information-theoretic capacity of ETPIR is defined as the maximum number of desired message symbols retrieved privately per information symbol downloaded. We show that the capacity of ETPIR is $C = \left( 1- \frac{E}{N} \right) \left(1 + \frac{T-E}{N-E} + \cdots + \left( \frac{T-E}{N-E} \right)^{K-1} \right)^{-1}$ when $E < T$, and $C = \left( 1 - \frac{E}{N} \right)$ when $E \geq T$. To achieve the capacity, the servers need to share a common random variable (independent of the messages), and its size must be at least $\frac{E}{N} \cdot \frac{1}{C}$ symbols per message symbol. Otherwise, with less shared common randomness, ETPIR is not feasible and the capacity reduces to zero. An interesting observation is that the ETPIR capacity expression takes different forms in two regimes. When $E < T$, the capacity is proportional to the inverse of the sum of a geometric series with $K$ terms and decreases with $K$; this form is typical for capacity expressions of PIR. When $E \geq T$, the capacity depends on neither $K$ nor $T$; independence from $K$ is typical for capacity expressions of SPIR (symmetric PIR, which further requires data privacy, i.e., the user learns no information about the undesired messages). In addition, the ETPIR capacity result includes multiple previous PIR and SPIR capacity results as special cases.
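The capacity expression can be evaluated exactly with rational arithmetic. This Python sketch (function names are illustrative) implements the two regimes stated above together with the minimum shared common randomness, (E/N)·(1/C) symbols per message symbol. It assumes 0 ≤ E < N so that the capacity is nonzero.

```python
from fractions import Fraction

def etpir_capacity(N: int, K: int, T: int, E: int) -> Fraction:
    """ETPIR capacity for N servers, K messages, T colluding servers, E eavesdropped servers."""
    if not (0 <= E < N and 1 <= T <= N and K >= 1):
        raise ValueError("expected 0 <= E < N, 1 <= T <= N and K >= 1")
    leading = 1 - Fraction(E, N)
    if E >= T:
        return leading  # SPIR-like regime: depends on neither K nor T
    ratio = Fraction(T - E, N - E)
    geometric_sum = sum(ratio ** k for k in range(K))  # 1 + r + ... + r^(K-1)
    return leading / geometric_sum  # PIR-like regime: decreases with K

def min_common_randomness(N: int, K: int, T: int, E: int) -> Fraction:
    """Minimum shared randomness, in symbols per message symbol: (E/N) * (1/C)."""
    return Fraction(E, N) / etpir_capacity(N, K, T, E)

print(etpir_capacity(3, 2, 2, 1))         # 4/9
print(min_common_randomness(3, 2, 2, 1))  # 3/4
```

Setting E = 0 and T = 1 recovers the classical PIR capacity (1 + 1/N + ... + 1/N^(K-1))^(-1), one of the special cases mentioned in the abstract.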
Aaron Jaech, Shobhit Hathi, Mari Ostendorf
This paper addresses the problem of community membership detection using only
text features in a scenario where a small number of positive labeled examples
defines the community. The solution introduces an unsupervised proxy task for
learning user embeddings: user re-identification. Experiments with 16 different
communities show that the resulting embeddings are more effective for community
membership identification than common unsupervised representations.
Authors' comments: NAACL 2018
Huijuan Xu, Kun He, Bryan A. Plummer, Leonid Sigal, Stan Sclaroff, Kate Saenko
We address the problem of text-based activity retrieval in video. Given a
sentence describing an activity, our task is to retrieve matching clips from an
untrimmed video. To capture the inherent structures present in both text and
video, we introduce a multilevel model that integrates vision and language
features earlier and more tightly than prior work. First, we inject text
features early on when generating clip proposals, to help eliminate unlikely
clips and thus speed up processing and boost performance. Second, to learn a
fine-grained similarity metric for retrieval, we use visual features to
modulate the processing of query sentences at the word level in a recurrent
neural network. A multi-task loss is also employed by adding query
re-generation as an auxiliary task. Our approach significantly outperforms
prior work on two challenging benchmarks: Charades-STA and ActivityNet
Captions.
Authors' comments: AAAI 2019
Jingyi Zhang, Masao Utiyama, Eiichiro Sumita, Graham Neubig, Satoshi Nakamura
One of the difficulties of neural machine translation (NMT) is the recall and
appropriate translation of low-frequency words or phrases. In this paper, we
propose a simple, fast, and effective method for recalling previously seen
translation examples and incorporating them into the NMT decoding process.
Specifically, for an input sentence, we use a search engine to retrieve
sentence pairs whose source sides are similar to the input sentence, and then
collect $n$-grams that are both in the retrieved target sentences and aligned
with words that match in the source sentences, which we call "translation
pieces". We compute pseudo-probabilities for each retrieved sentence based on
similarities between the input sentence and the retrieved source sentences, and
use these to weight the retrieved translation pieces. Finally, an existing NMT
model is used to translate the input sentence, with an additional bonus given
to outputs that contain the collected translation pieces. We show our method
improves NMT translation results by up to 6 BLEU points on three narrow-domain
translation tasks where repetitiveness of the target sentences is particularly
salient. It also causes little increase in the translation time, and compares
favorably to another alternative retrieval-based method with respect to
accuracy, speed, and simplicity of implementation.
Authors' comments: NAACL 2018
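The weighting of translation pieces can be sketched in Python as follows, as a simplified illustration under explicit assumptions. Sentence similarity is taken to be a word-level fuzzy-match score, 1 - edit_distance / max_length; every target n-gram of a retrieved sentence is kept (the word-alignment filter described above is omitted); each piece keeps the highest similarity among the sentences it came from; and the bonus is applied to whole hypotheses rather than inside the decoder's beam search. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        cur = [i]
        for j, tok_b in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                       # deletion
                           cur[j - 1] + 1,                    # insertion
                           prev[j - 1] + (tok_a != tok_b)))   # substitution
        prev = cur
    return prev[-1]

def similarity(x: Sequence[str], y: Sequence[str]) -> float:
    """Assumed fuzzy-match score in [0, 1]."""
    return 1.0 - edit_distance(x, y) / max(len(x), len(y), 1)

def ngrams(tokens: Sequence[str], max_n: int = 4) -> set:
    return {tuple(tokens[i:i + n]) for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)}

def translation_pieces(source: List[str], retrieved: List[Tuple[List[str], List[str]]],
                       max_n: int = 4) -> Dict[Tuple[str, ...], float]:
    """Map each target n-gram to the best similarity of a retrieved pair containing it."""
    pieces: Dict[Tuple[str, ...], float] = {}
    for ret_src, ret_tgt in retrieved:
        score = similarity(source, ret_src)
        for gram in ngrams(ret_tgt, max_n):
            pieces[gram] = max(pieces.get(gram, 0.0), score)
    return pieces

def bonus(hypothesis: List[str], pieces: Dict[Tuple[str, ...], float],
          weight: float = 1.0, max_n: int = 4) -> float:
    """Extra score for a hypothesis that contains collected translation pieces."""
    return weight * sum(pieces.get(g, 0.0) for g in ngrams(hypothesis, max_n))

source = "the cat sat on the mat".split()
retrieved = [("the cat sat on a mat".split(), "le chat était assis sur un tapis".split())]
pieces = translation_pieces(source, retrieved, max_n=2)
print(round(pieces[("le", "chat")], 3))  # 0.833
```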
Peng Xu, Yongye Huang, Tongtong Yuan, Kaiyue Pang, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales, Zhanyu Ma et al.
We propose a deep hashing framework for sketch retrieval that, for the first
time, works on a multi-million-scale human sketch dataset. Leveraging this
large dataset, we explore a few sketch-specific traits that were otherwise
under-studied in prior literature. Instead of following the conventional sketch
recognition task, we introduce the novel problem of sketch hashing retrieval
which is not only more challenging, but also offers a better testbed for
large-scale sketch analysis, since: (i) more fine-grained sketch feature
learning is required to accommodate the large variations in style and
abstraction, and (ii) a compact binary code needs to be learned at the same
time to enable efficient retrieval. Key to our network design is the embedding
of unique characteristics of human sketch, where (i) a two-branch CNN-RNN
architecture is adapted to explore the temporal ordering of strokes, and (ii) a
novel hashing loss is specifically designed to accommodate both the temporal
and abstract traits of sketches. By working with a 3.8M sketch dataset, we show
that state-of-the-art hashing models specifically engineered for static images
fail to perform well on temporal sketch data. Our network, on the other hand, not
only offers the best retrieval performance on various code sizes, but also
yields the best generalization performance under a zero-shot setting and when
re-purposed for sketch recognition. This superior performance demonstrates
the benefit of our sketch-specific design.
Authors' comments: Accepted by CVPR2018
Chao Li, Cheng Deng, Ning Li, Wei Liu, Xinbo Gao, Dacheng Tao
Thanks to the success of deep learning, cross-modal retrieval has made significant progress recently. However, a crucial bottleneck remains: how to bridge the modality gap to further enhance retrieval accuracy. In this paper, we propose a self-supervised adversarial hashing (SSAH) approach, which is among the early attempts to incorporate adversarial learning into cross-modal hashing in a self-supervised fashion. The primary contribution of this work is that two adversarial networks are leveraged to maximize the semantic correlation and consistency of the representations between different modalities. In addition, we harness a self-supervised semantic network to discover high-level semantic information in the form of multi-label annotations. Such information guides the feature learning process and preserves the modality relationships in both the common semantic space and the Hamming space. Extensive experiments carried out on three benchmark datasets validate that the proposed SSAH surpasses the state-of-the-art methods.
Bin Liu, Yiqiang Q. Zhao
In this paper, we study the asymptotic behavior of the tail probability of
the number of customers in the steady-state $M/G/1$ retrial queue with
Bernoulli schedule, under the assumption that the service time distribution has
a regularly varying tail. Detailed tail asymptotic properties are obtained for
the (conditional and unconditional) probability of the number of customers in
the (priority) queue, orbit and system, respectively.
Authors' comments: 18 pages; revised version: 20 pages
Shiv Ram Dubey, Soumendu Chakraborty
Convolutional neural networks (CNNs) such as AlexNet, GoogLeNet and VGGNet
extract highly discriminative features for many computer vision problems.
However, a CNN model trained on one dataset may perform reasonably well there,
whereas on another dataset of a similar type a hand-designed feature descriptor
can outperform the same trained model. The Rectified Linear Unit (ReLU) layer
discards negative values in order to introduce non-linearity. In this paper, it
is proposed that the discriminative ability of deep image representations from
a trained model can be improved by using an Average Biased ReLU (AB-ReLU) in the
last few layers. AB-ReLU improves the discriminative ability in two ways: 1)
it exploits some of the discriminative negative information that ReLU discards,
and 2) it ignores irrelevant positive information that ReLU would retain. The
VGGFace model trained in MatConvNet over the VGG-Face dataset is used
as the feature descriptor for face retrieval over other face datasets. The
proposed approach is tested over six challenging, unconstrained and robust face
datasets (PubFig, LFW, PaSC, AR, FERET and ExtYale) and also on a large scale
face dataset (PolyUNIR) in retrieval framework. It is observed that the AB-ReLU
outperforms the ReLU when used with a pre-trained VGGFace model over the face
datasets. The validation error by training the network after replacing all
ReLUs with AB-ReLUs is also observed to be favorable over each dataset. The
AB-ReLU even outperforms the state-of-the-art activation functions, such as
Sigmoid, ReLU, Leaky ReLU and Flexible ReLU over all seven face datasets.
Authors' comments: Published by Multimedia Tools and Applications, Springer
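A minimal NumPy sketch of one reading of AB-ReLU follows: the input is shifted by a data-dependent bias beta, taken here as alpha times the mean of the input array, and then clipped at zero. The exact definition of beta in the paper may differ, and `alpha` is a hypothetical scaling parameter. When the mean is negative, some negative activations that ReLU would zero out survive; when it is positive, small positive activations are suppressed, matching the two effects described above.

```python
import numpy as np

def ab_relu(x: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Average-biased ReLU sketch: max(x - beta, 0) with beta = alpha * mean(x).

    Assumption: beta is computed over the whole input array (e.g. one layer's
    activations for one image); alpha is a hypothetical scaling knob.
    """
    beta = alpha * x.mean()
    return np.maximum(x - beta, 0.0)

x_neg_mean = np.array([-5.0, -0.5, 0.5, 1.0])  # mean is -1.0
print(np.maximum(x_neg_mean, 0.0))  # plain ReLU zeroes the -0.5 entry
print(ab_relu(x_neg_mean))          # AB-ReLU keeps it, shifted to 0.5
```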
Maher Abdullah, Mohammed GH. I. Al Zamil
Large amounts of unstructured information are difficult to deal with.
Obtaining specific information is a hard task and takes a lot of time.
Information Retrieval (IR) systems are one way to solve this kind of problem.
IR is a good mechanism but does not provide a perfect solution, so other
techniques have been combined with IR to improve the results. One such
technique is text classification, whose task is to assign a document to one or
more categories, either manually or algorithmically. Text classification
enhances the output of the retrieval process by narrowing down the result set.
This study shows that text classification has a positive influence on
Information Retrieval Systems.
Authors' comments: the paper consists of 16 pages. It presents an idea that is expected
to be expanded in the near future
Deyue Zhang, Yukun Guo, Jingzhi Li, Hongyu Liu
This paper is concerned with the inverse source problem of reconstructing an
unknown acoustic excitation from phaseless measurements of the radiated fields
away from the source at multiple frequencies. It is well known that
non-uniqueness is a major challenge associated with such an inverse problem. We
develop a novel strategy to overcome this challenge by recovering the radiated
fields via adding some reference point sources as extra artificial sources to
the inverse source system. This reference source technique requires only a
small amount of extra data and yields a simple phase retrieval formula. The
stability of this phase retrieval approach is rigorously analyzed. After the
reacquisition of the phase information, the multi-frequency inverse source
problem with recovered phase information is solved by the Fourier method, which
is non-iterative, fast and easy to implement. Several numerical examples are
presented to demonstrate the feasibility and effectiveness of the proposed
method.
Authors' comments: 27 pages, 4 figures
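The role of the reference sources can be illustrated with a generic polarization-identity argument, which is an assumption here and not necessarily the paper's formula. If s is the known field of a reference source at the measurement points, then |u + s|^2 = |u|^2 + |s|^2 + 2 Re(u conj(s)); measuring with the reference field i·s as well gives Im(u conj(s)), and the complex field u then follows by division. This Python sketch assumes noise-free data and a reference field that is nonzero at every measurement point.

```python
import numpy as np

def recover_field(abs_u, abs_u_plus_s, abs_u_plus_is, s):
    """Recover complex values u from |u|, |u + s| and |u + i s| (noise-free sketch).

    s is the known, nonzero reference field at the same measurement points.
    """
    s_sq = np.abs(s) ** 2
    re_w = (abs_u_plus_s ** 2 - abs_u ** 2 - s_sq) / 2.0   # Re(u * conj(s))
    im_w = (abs_u_plus_is ** 2 - abs_u ** 2 - s_sq) / 2.0  # Im(u * conj(s))
    return (re_w + 1j * im_w) * s / s_sq                   # u = w / conj(s) = w s / |s|^2

# Synthetic check: phaseless data generated from a random complex field.
rng = np.random.default_rng(1)
u_true = rng.standard_normal(8) + 1j * rng.standard_normal(8)
s = np.full(8, 2.0 + 0.5j)
u_rec = recover_field(np.abs(u_true), np.abs(u_true + s), np.abs(u_true + 1j * s), s)
print(np.allclose(u_rec, u_true))  # True
```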
Filip Radenović, Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, Ondřej Chum
In this paper we address issues with image retrieval benchmarking on standard
and popular Oxford 5k and Paris 6k datasets. In particular, annotation errors,
the size of the dataset, and the level of challenge are addressed: new
annotation for both datasets is created with extra attention to the
reliability of the ground truth. Three new protocols of varying difficulty are
introduced. The protocols allow fair comparison between different methods,
including those using a dataset pre-processing stage. For each dataset, 15 new
challenging queries are introduced. Finally, a new set of 1M hard,
semi-automatically cleaned distractors is selected.
An extensive comparison of the state-of-the-art methods is performed on the
new benchmark. Different types of methods are evaluated, ranging from
local-feature-based to modern CNN based methods. The best results are achieved
by taking the best of the two worlds. Most importantly, image retrieval appears
far from being solved.
Authors' comments: CVPR 2018
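Retrieval benchmarks of this kind are commonly scored with mean average precision (mAP). The following Python sketch computes non-interpolated AP per query and skips images marked as ignored for a given protocol; this treatment of "junk" images is an assumption, and official evaluation code may differ in interpolation details. All names are illustrative.

```python
from typing import Iterable, List, Sequence

def average_precision(ranking: Sequence[int], positives: Iterable[int],
                      junk: Iterable[int] = ()) -> float:
    """Non-interpolated AP of a ranked list; junk images are skipped entirely."""
    positives, junk = set(positives), set(junk)
    if not positives:
        raise ValueError("query has no positive images")
    hits, rank, precision_sum = 0, 0, 0.0
    for image_id in ranking:
        if image_id in junk:
            continue  # ignored: neither rewarded nor penalized
        rank += 1
        if image_id in positives:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(positives)  # unretrieved positives contribute 0

def mean_average_precision(rankings: List[Sequence[int]], positives_per_query: List[Iterable[int]],
                           junk_per_query: List[Iterable[int]]) -> float:
    aps = [average_precision(r, p, j)
           for r, p, j in zip(rankings, positives_per_query, junk_per_query)]
    return sum(aps) / len(aps)

print(round(average_precision([3, 1, 2, 4], positives=[1, 4], junk=[3]), 4))  # 0.8333
```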
Haotian Zhang, Gordon V. Cormack, Maura R. Grossman, Mark D. Smucker
This study uses a novel simulation framework to evaluate whether the time and
effort necessary to achieve high recall using active learning is reduced by
presenting the reviewer with isolated sentences, as opposed to full documents,
for relevance feedback. Under the weak assumption that more time and effort is
required to review an entire document than a single sentence, simulation
results indicate that the use of isolated sentences for relevance feedback can
yield comparable accuracy and higher efficiency, relative to the
state-of-the-art Baseline Model Implementation (BMI) of the AutoTAR Continuous
Active Learning ("CAL") method employed in the TREC 2015 and 2016 Total Recall
Track.
Authors' comments: 25 pages
Jiaxing Wang, Jihua Zhu, Shanmin Pang, Zhongyu Li, Yaochen Li, Xueming Qian
Aggregating deep convolutional features into a global image vector has
attracted sustained attention in image retrieval. In this paper, we propose an
efficient unsupervised aggregation method that uses an adaptive Gaussian filter
and an element-value sensitive vector to co-weight deep features. Specifically,
the Gaussian filter assigns large weights to features in regions of interest
(RoIs) by adaptively determining the RoI's center, while the element-value
sensitive channel vector suppresses the burstiness phenomenon by assigning small
weights to feature maps whose values, summed over all locations, are large.
Experimental results on benchmark datasets validate that both proposed weighting
schemes effectively improve the discriminative power of image vectors. Furthermore,
with the same experimental setting, our method outperforms other very recent
aggregation approaches by a considerable margin.
Authors' comments: 6 pages, 5 figures, ICME 2018 poster
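The co-weighting idea can be sketched in Python as weighted sum pooling of a (C, H, W) convolutional feature map. The abstract does not give the formulas, so every specific here is an assumption: the RoI center is placed at the location with the largest channel-summed activation, the Gaussian width is a fixed fraction of the map size (`sigma_scale`, hypothetical), and channel weights follow an inverse-frequency form, log(total sum / channel sum), so that channels with large summed values receive small weights. Features are assumed nonnegative (post-ReLU).

```python
import numpy as np

def co_weighted_pooling(features: np.ndarray, sigma_scale: float = 1 / 3,
                        eps: float = 1e-6) -> np.ndarray:
    """Aggregate a nonnegative (C, H, W) feature map into an L2-normalized C-dim vector."""
    _, H, W = features.shape
    # Spatial weights: Gaussian centered on the strongest location (assumed RoI center).
    saliency = features.sum(axis=0)
    ci, cj = np.unravel_index(np.argmax(saliency), saliency.shape)
    sigma = sigma_scale * max(H, W)
    ii, jj = np.mgrid[0:H, 0:W]
    spatial_w = np.exp(-((ii - ci) ** 2 + (jj - cj) ** 2) / (2.0 * sigma ** 2))
    # Channel weights: small for channels whose activations sum to a large value.
    channel_sums = features.sum(axis=(1, 2))
    channel_w = np.log((channel_sums.sum() + eps) / (channel_sums + eps))
    pooled = (features * spatial_w).sum(axis=(1, 2)) * channel_w
    return pooled / (np.linalg.norm(pooled) + eps)

rng = np.random.default_rng(0)
vec = co_weighted_pooling(rng.random((512, 7, 7)))
print(vec.shape)  # (512,)
```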
Jasmina Blecic, Ian Dobbs-Dixon, Thomas Greene
Using the atmospheric structure from a 3D global radiation-hydrodynamic
simulation of HD 189733b and the open-source BART code, we investigate the
difference between the secondary-eclipse temperature structure produced with a
3D simulation and the best-fit 1D retrieved model. Synthetic data are generated
by integrating the 3D models over the Spitzer, HST, and JWST bandpasses,
covering the wavelength range between 1 and 11 µm. Using the data from
different observing instruments, we present detailed comparisons between the
temperature-pressure profiles recovered by BART and those from the 3D
simulations. We calculate several averages of the 3D thermal structure and
implement two temperature parameterizations to investigate different thermal
profile shapes. To assess which part of the thermal structure is best
constrained by the data, we generate contribution functions for both our
theoretical model and each of our retrieved models. Our conclusions are
strongly affected by the spectral resolution of the instruments included, their
wavelength coverage, and the number of data points combined. We also see some
limitations in each of the temperature parametrizations. The results show that
our 1D retrieval is recovering a temperature and pressure profile that most
closely matches the arithmetic average of the 3D thermal structure. When we use
a higher resolution, more data points, and a parametrized temperature profile
that allows more flexibility in the middle part of the atmosphere, we find a
better match between the retrieved temperature and pressure profile and the
arithmetic average.
Authors' comments: 23 pages
Xinwei He, Yang Zhou, Zhichao Zhou, Song Bai, Xiang Bai
Most existing 3D object recognition algorithms focus on leveraging the strong
discriminative power of deep learning models with softmax loss for the
classification of 3D data, while learning discriminative features with deep
metric learning for 3D object retrieval is more or less neglected. In this
paper, we study variants of deep metric learning losses for 3D object
retrieval, which have received little attention in this area. First, two
kinds of representative losses, triplet loss and center loss, are introduced
which could learn more discriminative features than traditional classification
loss. Then, we propose a novel loss named triplet-center loss, which can
further enhance the discriminative power of the features. The proposed
triplet-center loss learns a center for each class and requires that the
distances between samples and centers from the same class are closer than those
from different classes. Extensive experimental results on two popular 3D object
retrieval benchmarks and two widely-adopted sketch-based 3D shape retrieval
benchmarks consistently demonstrate the effectiveness of our proposed loss, and
significant improvements have been achieved compared with state-of-the-art
methods.
Authors' comments: accepted by CVPR2018
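For a sample with embedding f_i, label y_i and learned class centers c_1, ..., c_K, one way to write the requirement above as a hinge loss with margin m is max(D(f_i, c_{y_i}) + m - min_{j != y_i} D(f_i, c_j), 0). This NumPy sketch computes the batch-averaged loss; taking D as half the squared Euclidean distance is an assumption, and the center updates that happen during training are not shown.

```python
import numpy as np

def triplet_center_loss(features: np.ndarray, labels: np.ndarray,
                        centers: np.ndarray, margin: float = 1.0) -> float:
    """Mean triplet-center loss.

    features: (B, D) embeddings; labels: (B,) ints in [0, K); centers: (K, D), K >= 2.
    """
    diffs = features[:, None, :] - centers[None, :, :]  # (B, K, D)
    dists = 0.5 * (diffs ** 2).sum(axis=2)              # (B, K), assumed distance D
    idx = np.arange(len(labels))
    pos = dists[idx, labels]                            # distance to own class center
    masked = dists.copy()
    masked[idx, labels] = np.inf                        # exclude own center
    neg = masked.min(axis=1)                            # nearest other-class center
    return float(np.maximum(pos + margin - neg, 0.0).mean())

centers = np.array([[0.0, 0.0], [4.0, 0.0]])
print(triplet_center_loss(np.array([[2.0, 0.0]]), np.array([0]), centers))  # 1.0
```

A sample exactly halfway between two centers is penalized by the full margin, which is what pushes embeddings toward their own class center.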
Sean Billings
This paper explores the use of autoencoders for semantic hashing in the context of Information Retrieval. It summarizes how to efficiently train an autoencoder to create meaningful, low-dimensional encodings of data, and demonstrates how computing and storing the closest encodings to an input query can speed up search and improve the quality of search results. The novel contributions of this paper involve using the representation of the data learned by an autoencoder to augment the search query in various ways. I present and evaluate the new gradient search augmentation (GSA) approach, as well as the better-known pseudo-relevance feedback (PRF) adjustment. I find that GSA helps to improve the performance of the TF-IDF based information retrieval system, and that PRF combined with GSA works best overall for the systems compared in this paper.
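The abstract does not define GSA, so only the pseudo-relevance-feedback side is sketched here, using a classic Rocchio-style update as an assumed stand-in for the paper's PRF adjustment: the top-k documents of an initial TF-IDF search are treated as relevant, and their centroid is added to the query vector before re-ranking. `k`, `alpha` and `beta` are hypothetical defaults, and the autoencoder-based semantic hashing is not modeled.

```python
import math
from collections import Counter
import numpy as np

def vectorize(tokens, index, idf):
    """L2-normalized TF-IDF vector; out-of-vocabulary tokens are dropped."""
    v = np.zeros(len(index))
    for t, c in Counter(tokens).items():
        if t in index:
            v[index[t]] = c * idf[index[t]]
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def build_index(docs):
    """Vocabulary index, smoothed IDF weights and a matrix of unit-length document rows."""
    vocab = sorted({t for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}
    df = Counter(t for d in docs for t in set(d))
    n = len(docs)
    idf = np.array([math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab])
    matrix = np.vstack([vectorize(d, index, idf) for d in docs])
    return index, idf, matrix

def search(query_vec, matrix):
    """Document ids sorted by cosine similarity (rows are unit-length)."""
    return np.argsort(-(matrix @ query_vec), kind="stable")

def prf_search(query_tokens, index, idf, matrix, k=2, alpha=1.0, beta=0.75):
    """Rocchio-style PRF: expand the query with the centroid of the top-k hits."""
    q = vectorize(query_tokens, index, idf)
    top_k = search(q, matrix)[:k]
    q_expanded = alpha * q + beta * matrix[top_k].mean(axis=0)
    return search(q_expanded, matrix)

docs = [["cat", "sat", "mat"], ["dog", "sat", "log"],
        ["cat", "dog", "play"], ["stock", "market", "crash"]]
index, idf, matrix = build_index(docs)
print(prf_search(["cat"], index, idf, matrix))
```

In this toy corpus, the expanded query also surfaces document 1, which shares "sat" and "dog" with the feedback documents but does not contain "cat" itself.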
Won-Kwang Park
In this paper, the direct sampling method is considered for determining the locations of a set of small, linear, perfectly conducting cracks from far-field data collected for an incident field. To show the feasibility of the direct sampling method, this study proves that its indicator function can be represented by the Bessel function of order zero and the crack lengths. Numerical simulations support the finding that the imaging performance is highly dependent on the crack lengths. To explain why the imaging performance is also highly dependent on the rotation of the cracks, the direct sampling method is further analyzed by establishing a representation using Bessel functions of orders zero and one. Based on the derived representation of the indicator function, we design improved direct sampling methods by applying incident fields with multiple directions and multiple frequencies. The corresponding analysis of the indicator functions and simulation results are presented to demonstrate the effectiveness of these improvements.
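The appearance of the Bessel function of order zero can be traced to a standard integral over the unit circle. Assume, for illustration only, a commonly used form of the direct sampling indicator and a far-field pattern that behaves like a sum of point-source contributions located at the crack centers $y_m$, with amplitudes proportional to the crack lengths $\ell_m$ (the constants $c_m$ are hypothetical). Under these assumptions, one obtains:

```latex
\int_{\mathbb{S}^1} e^{ik\,\hat{x}\cdot(z-y)}\,\mathrm{d}s(\hat{x}) = 2\pi J_0(k|z-y|),
\qquad
u_\infty(\hat{x}) \approx \sum_{m} c_m \ell_m\, e^{-ik\,\hat{x}\cdot y_m}
\;\Longrightarrow\;
\mathcal{I}(z) := \left|\int_{\mathbb{S}^1} u_\infty(\hat{x})\, e^{ik\,\hat{x}\cdot z}\,\mathrm{d}s(\hat{x})\right|
\approx 2\pi \left|\sum_{m} c_m \ell_m\, J_0(k|z-y_m|)\right|.
```

Since $J_0$ attains its maximum value 1 at zero, the indicator peaks near the crack centers, and longer cracks produce larger peaks, consistent with the dependence on crack length. This simplified model ignores crack orientation, which is where the order-one Bessel terms enter.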