Alexander Bartl, Gerasimos Spanakis
Finding semantically rich and computer-understandable representations for
textual dialogues, utterances and words is crucial for dialogue systems (or
conversational agents), as their performance mostly depends on understanding
the context of conversations. Recent research aims at finding distributed
vector representations (embeddings) for words, such that semantically similar
words are relatively close within the vector space. Encoding the "meaning" of
text into vectors is a current trend, and text can range from words, phrases
and documents to actual human-to-human conversations. In recent research
approaches, responses have been generated utilizing a decoder architecture,
given the vector representation of the current conversation. In this paper, the
utilization of embeddings for answer retrieval is explored by using
Locality-Sensitive Hashing Forest (LSH Forest), an Approximate Nearest Neighbor
(ANN) model, to find similar conversations in a corpus and rank possible
candidates. Experimental results on the well-known Ubuntu Corpus (in English)
and a customer service chat dataset (in Dutch) show that, in combination with a
candidate selection method, retrieval-based approaches outperform generative
ones and reveal promising future research directions towards the usability of
such a system.
Authors' comments: A shorter version is accepted at ICMLA2017 conference;
acknowledgement added; typos corrected
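The retrieval step described in the abstract above can be illustrated with a minimal sketch. It uses plain random-hyperplane LSH with several hash tables, not the LSH Forest variant the paper names, and toy 3-dimensional vectors stand in for conversation embeddings. The class `HyperplaneLSH`, its parameters, and the example keys are all hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


class HyperplaneLSH:
    """Toy random-hyperplane LSH index (an assumption, not LSH Forest)."""

    def __init__(self, dim, n_bits=4, n_tables=6, seed=0):
        rng = random.Random(seed)
        # One independent set of random hyperplanes per hash table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.vectors = {}

    def _signature(self, vec, planes):
        # Each bit records on which side of a hyperplane the vector lies.
        return tuple(
            sum(p * x for p, x in zip(plane, vec)) >= 0.0 for plane in planes
        )

    def add(self, key, vec):
        self.vectors[key] = vec
        for planes, table in zip(self.planes, self.tables):
            table[self._signature(vec, planes)].append(key)

    def query(self, vec, top_k=3):
        # Collect every item sharing a bucket with the query in any table,
        # then rank those candidates by exact cosine similarity.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._signature(vec, planes), []))
        ranked = sorted(
            candidates, key=lambda k: cosine(vec, self.vectors[k]), reverse=True
        )
        return ranked[:top_k]


index = HyperplaneLSH(dim=3)
index.add("reset_password", [0.9, 0.1, 0.0])
index.add("install_driver", [0.0, 1.0, 0.2])
index.add("mount_disk", [-0.7, 0.0, 0.7])
best_match = index.query([0.9, 0.1, 0.0], top_k=1)
```

Hashing only narrows the candidate set; the final ranking still uses an exact similarity, which mirrors the "find similar conversations, then rank candidates" split in the abstract.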
Nikhil Mukund, Saurabh Thakur, Sheelu Abraham, A. K. Aniyan, Sanjit Mitra, Ninan Sajeeth Philip, Kaustubh Vaghmare, D. P. Acharjya
We present a machine learning based information retrieval system for
astronomical observatories that tries to address user defined queries related
to an instrument. In the modern instrumentation scenario, where heterogeneous
systems and talents are simultaneously at work, the ability to supply the
right information helps speed up detector maintenance operations.
Enhancing the detector uptime leads to increased coincidence observation and
improves the likelihood of detecting astrophysical signals. Besides,
such efforts will efficiently disseminate technical knowledge to a wider
audience and will help ongoing efforts to build upcoming detectors, such as
LIGO-India, foresee possible challenges even at the design phase. The
proposed method analyses existing documented efforts at the site to
intelligently group together information related to a query and to present it
online to the user. The user can then follow links of interest
and find already developed solutions or probable ways to address the present
situation optimally. A web application that incorporates the above idea has
been implemented and tested for LIGO Livingston, LIGO Hanford and Virgo
observatories.
Authors' comments: 14 pages, 7 figures
Oussama Dhifallah, Christos Thrampoulidis, Yue M. Lu
A recently proposed convex formulation of the phase retrieval problem estimates the unknown signal by solving a simple linear program. This new scheme, known as PhaseMax, is computationally efficient compared to standard convex relaxation methods based on lifting techniques. In this paper, we present an exact performance analysis of PhaseMax under Gaussian measurements in the large system limit. In contrast to previously known performance bounds in the literature, our results are asymptotically exact and they also reveal a sharp phase transition phenomenon. Furthermore, the geometrical insights gained from our analysis led us to a novel nonconvex formulation of the phase retrieval problem and an accompanying iterative algorithm based on successive linearization and maximization over a polytope. This new algorithm, which we call PhaseLamp, has provably superior recovery performance over the original PhaseMax method.
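For reference, the linear program behind PhaseMax can be written out in the real-valued case. This is a sketch under assumed notation, none of which appears in the abstract: $x_0 \in \mathbb{R}^n$ is the unknown signal, $a_i$ are the sensing vectors, $b_i = |\langle a_i, x_0 \rangle|$ are the $m$ magnitude measurements, and $\hat{x}$ is an initial guess correlated with $x_0$.

$$
\max_{x \in \mathbb{R}^n} \ \langle \hat{x}, x \rangle
\quad \text{subject to} \quad
-b_i \le \langle a_i, x \rangle \le b_i, \qquad i = 1, \dots, m.
$$

The objective is linear and each magnitude constraint splits into two linear inequalities, so the feasible set is a polytope. In the complex case the constraints $|\langle a_i, x \rangle| \le b_i$ remain convex but are no longer linear.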
Jordan A Comins, Stephanie A Carmack, Loet Leydesdorff
One essential component in the construction of patent landscapes in biomedical research and development (R&D) is identifying the most seminal patents. Hitherto, the identification of seminal patents required subject matter experts within biomedical areas. In this brief communication, we report an analytical method and tool, Patent Citation Spectroscopy (PCS), for rapidly identifying landmark patents in user-specified areas of biomedical innovation. PCS mines the cited references within large sets of patents and provides an estimate of the most historically impactful prior work. The efficacy of PCS is shown in two case studies of biomedical innovation with clinical relevance: (1) RNA interference and (2) cholesterol. PCS mined and analyzed 4,065 cited references related to patents on RNA interference and correctly identified the foundational patent of this technology, as independently reported by subject matter experts on RNAi intellectual property. Secondly, PCS was applied to a broad set of patents dealing with cholesterol - a case study chosen to reflect a more general, as opposed to expert, patent search query. PCS mined through 11,326 cited references and identified the seminal patent as that for Lipitor, the groundbreaking medication for treating high cholesterol as well as the pair of patents underlying Repatha. These cases suggest that PCS provides a useful method for identifying seminal patents in areas of biomedical innovation and therapeutics. The interactive tool is free-to-use at: www.leydesdorff.net/pcs/.
Qiwen Wang, Mikael Skoglund
The problem of private information retrieval (PIR) is to retrieve one message out of $K$ messages replicated at $N$ databases, without revealing the identity of the desired message to the databases. We consider the problem of PIR with colluding servers and eavesdroppers, named T-EPIR. Specifically, any $T$ out of $N$ databases may collude, i.e. they may communicate their interactions with the user to guess the identity of the requested message. An eavesdropper is curious to know the database and can tap in on the incoming and outgoing transmissions of any $E$ databases. The databases share some common randomness unknown to the eavesdropper and the user, and use the common randomness to generate the answers, such that the eavesdropper can learn no information about the $K$ messages. Define $R^*$ as the optimal ratio of the number of the desired message information bits to the number of total downloaded bits, and $\rho^*$ to be the optimal ratio of the information bits of the shared common randomness to the information bits of the desired file. In our previous work, we found that when $E \geq T$, the optimal ratio that can be achieved equals $1-\frac{E}{N}$. In this work, we focus on the case when $E \leq T$. We derive an outer bound $R^* \leq (1-\frac{T}{N}) \frac{1-\frac{E}{N} \cdot (\frac{T}{N})^{K-1}}{1-(\frac{T}{N})^K}$. We also obtain a lower bound of $\rho^* \geq \frac{\frac{E}{N}(1-(\frac{T}{N})^K)}{(1-\frac{T}{N})(1-\frac{E}{N} \cdot (\frac{T}{N})^{K-1})}$. For the achievability, we propose a scheme which achieves the rate (inner bound) $R=\frac{1-\frac{T}{N}}{1-(\frac{T}{N})^K}-\frac{E}{KN}$. The amount of shared common randomness used in the achievable scheme is $\frac{\frac{E}{N}(1-(\frac{T}{N})^K)}{1-\frac{T}{N}-\frac{E}{KN}(1-(\frac{T}{N})^K)}$ times the file size. The gap between the derived inner and outer bounds vanishes as the number of messages $K$ tends to infinity.
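The closing claim, that the gap between the inner and outer bounds vanishes as $K$ grows, can be checked numerically from the two formulas given in the abstract. The sketch below is only an illustration: the function names and the parameter choice $N=5$, $T=2$, $E=1$ are hypothetical.

```python
def outer_bound(N, T, E, K):
    """Outer bound on R* from the abstract (case E <= T)."""
    t, e = T / N, E / N
    return (1 - t) * (1 - e * t ** (K - 1)) / (1 - t ** K)


def achievable_rate(N, T, E, K):
    """Rate of the proposed achievable scheme (inner bound) from the abstract."""
    t, e = T / N, E / N
    return (1 - t) / (1 - t ** K) - e / K


# Hypothetical parameters satisfying E <= T < N.
N, T, E = 5, 2, 1
gaps = {
    K: outer_bound(N, T, E, K) - achievable_rate(N, T, E, K)
    for K in (2, 10, 100, 1000)
}
for K, gap in gaps.items():
    print(f"K={K:5d}  gap={gap:.6f}")
```

With $E = 0$ both expressions reduce to $\frac{1-\frac{T}{N}}{1-(\frac{T}{N})^K}$, so the two bounds coincide when there is no eavesdropper.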
Zhizhong Zhang, Yuan Xie, Wensheng Zhang, Qi Tian
Multi-index fusion has demonstrated impressive performance in retrieval tasks
by integrating different visual representations in a unified framework.
However, previous works mainly consider propagating similarities via neighbor
structure, ignoring the high order information among different visual
representations. In this paper, we propose a new multi-index fusion scheme for
image retrieval. By formulating this procedure as a multilinear based
optimization problem, the complementary information hidden in different indexes
can be explored more thoroughly. Specifically, we first build our multiple indexes
from various visual representations. Then a so-called index-specific functional
matrix, which aims to propagate similarities, is introduced for updating the
original index. The functional matrices are then optimized in a unified tensor
space to achieve a refinement, such that the relevant images can be pushed
closer together. The optimization problem can be efficiently solved by the augmented
Lagrangian method with a theoretical convergence guarantee. Unlike the
traditional multi-index fusion scheme, our approach embeds the multi-index
subspace structure into the new indexes with a sparse constraint, so it incurs
little additional memory consumption in the online query stage. Experimental
evaluation on three benchmark datasets reveals that the proposed approach
achieves the state-of-the-art performance, i.e., N-score 3.94 on UKBench, mAP
94.1\% on Holiday and 62.39\% on Market-1501.
Authors' comments: 12 pages
Luke Evans, Chun-Kit Lai
In this paper, we will introduce the notion of {\it conjugate phase
retrieval}, which is a relaxed definition of phase retrieval allowing recovery
of signals up to conjugacy as well as a global phase factor. It is known that
frames of real vectors are never phase retrievable on ${\mathbb C}^M$ in the
ordinary sense, but we show that they can be conjugate phase retrievable in
complex vector spaces. We continue to develop the theory on conjugate phase
retrievable real frames. In particular, a complete characterization of
conjugate phase retrievable real frames on ${\mathbb C}^2$ and ${\mathbb C}^3$
is given. Furthermore, we show that a generic real frame with at least $4M - 6$
measurements is conjugate phase retrievable in ${\mathbb C}^M$ for $M \ge 4$.
Authors' comments: Minor revisions, typos fixed
Shiv Ram Dubey
Face recognition is still a very demanding area of research. This problem
becomes more challenging in unconstrained environments and in the presence of
several variations such as pose, illumination, and expression. Local descriptors
are widely used for this task. Most existing local descriptors
consider only a few immediate local neighbors and are not able to utilize the wider
local information to make the descriptor more discriminative. Descriptors based
on wider local information mainly suffer from increased
dimensionality. In this paper, this problem is solved by encoding the
relationship among directional neighbors in an efficient manner. The
relationship between the center pixel and the encoded directional neighbors is
utilized further to form the proposed local directional relation pattern
(LDRP). The descriptor is inherently invariant to uniform illumination changes. A
multi-scale mechanism is also adopted to further boost the discriminative
ability of the descriptor. The proposed descriptor is evaluated under the image
retrieval framework over face databases. Very challenging databases such as PaSC,
LFW, PubFig, ESSEX, FERET, AT&T, and FaceScrub are used to test the
discriminative ability and robustness of the LDRP descriptor. Results are also
compared with recent state-of-the-art face descriptors such as LBP, LTP,
LDP, LDN, LVP, DCP, LDGP and LGHP. The proposed descriptor shows very promising
performance on these face databases compared to the
existing face descriptors. The proposed LDRP descriptor also outperforms the
pre-trained ImageNet CNN models on the large-scale FaceScrub face dataset.
Moreover, it also outperforms the deep learning based DLib face descriptor in
many scenarios.
Authors' comments: Multimedia Tools and Applications
Keunwoo Choi, György Fazekas, Kyunghyun Cho, Mark Sandler
Following their success in Computer Vision and other areas, deep learning techniques have recently become widely adopted in Music Information Retrieval (MIR) research. However, the majority of works aim to adopt and assess methods that have been shown to be effective in other domains, while there is still a great need for more original research focusing on music primarily and utilising musical knowledge and insight. The goal of this paper is to boost the interest of beginners by providing a comprehensive tutorial and reducing the barriers to entry into deep learning for MIR. We lay out the basic principles and review prominent works in this hard-to-navigate field. We then outline the network structures that have been successful in MIR problems and facilitate the selection of building blocks for the problems at hand. Finally, guidelines for new tasks and some advanced topics in deep learning are discussed to stimulate new research in this fascinating field.
Longhui Wei, Shiliang Zhang, Hantao Yao, Wen Gao, Qi Tian
The huge variance of human pose and the misalignment of detected human images
significantly increase the difficulty of person Re-Identification (Re-ID).
Moreover, efficient Re-ID systems are required to cope with the massive visual
data being produced by video surveillance systems. To address these
problems, this work proposes a Global-Local-Alignment Descriptor (GLAD) and an
efficient indexing and retrieval framework, respectively. GLAD explicitly
leverages the local and global cues in the human body to generate a discriminative
and robust representation. It consists of part extraction and descriptor
learning modules, where several part regions are first detected and then deep
neural networks are designed for representation learning on both the local and
global regions. A hierarchical indexing and retrieval framework is designed to
eliminate the huge redundancy in the gallery set, and accelerate the online
Re-ID procedure. Extensive experimental results show GLAD achieves competitive
accuracy compared to the state-of-the-art methods. Our retrieval framework
significantly accelerates the online Re-ID procedure without loss of accuracy.
Therefore, this work has the potential to perform better on person Re-ID tasks in real
scenarios.
Authors' comments: Accepted by ACM MM2017, 9 pages, 5 figures
Dan Li, Evangelos Kanoulas
Evaluation is crucial in Information Retrieval. The development of models, tools and methods has significantly benefited from the availability of reusable test collections formed through a standardized and thoroughly tested methodology, known as the Cranfield paradigm. Constructing these collections requires obtaining relevance judgments for a pool of documents retrieved by systems participating in an evaluation task, and thus involves immense human labor. To alleviate this effort, different methods for constructing collections have been proposed in the literature, falling under two broad categories: (a) sampling, and (b) active selection of documents. The former devises a smart sampling strategy by choosing only a subset of documents to be assessed and inferring evaluation measures on the basis of the obtained sample; the sampling distribution is fixed at the beginning of the process. The latter recognizes that systems contributing documents to be judged vary in quality, and actively selects documents from good systems. The quality of systems is measured every time a new document is judged. In this paper we seek to solve the problem of large-scale retrieval evaluation by combining the two approaches. We devise an active sampling method that avoids the bias of the active selection methods towards good systems, and at the same time reduces the variance of the current sampling approaches by placing a distribution over systems, which varies as judgments become available. We validate the proposed method using TREC data and demonstrate the advantages of this new method compared to past approaches.
Noa Garcia, George Vogiatzis
Measuring visual similarity between two or more instances within a data
distribution is a fundamental task in image retrieval. Theoretically,
non-metric distances are able to generate a more complex and accurate
similarity model than metric distances, provided that the non-linear data
distribution is precisely captured by the system. In this work, we explore
neural network models for learning a non-metric similarity function for
instance search. We argue that non-metric similarity functions based on neural
networks can build a better model of human visual perception than standard
metric distances. As our proposed similarity function is differentiable, we
explore a real end-to-end trainable approach for image retrieval, i.e. we learn
the weights from the input image pixels to the final similarity score.
Experimental evaluation shows that non-metric similarity networks are able to
learn visual similarities between images and improve performance on top of
state-of-the-art image representations, boosting results in standard image
retrieval datasets with respect to standard metric distances.
Authors' comments: Image and Vision Computing (2019)
Esmerando Escoto, Tamas Nagy, Ayhan Tajalli, Günter Steinmeyer
Dispersion scan is a self-referenced measurement technique for ultrashort pulses. Similar to frequency-resolved optical gating, the dispersion scan technique records nonlinearly generated spectra as a function of a parameter. For the two mentioned techniques, these parameters are the delay and the dispersion, respectively. While dispersion scan seems to offer a number of potential advantages over other characterization methods, in particular for measuring few-cycle pulses, retrieval of the spectral phase from the measured traces has so far mostly relied on the Nelder-Mead algorithm, which has a tendency to stagnate in a local minimum and may produce ghost satellites in the retrieval of pulses with complex spectra. We evaluate three different strategies to overcome these retrieval problems, namely regularization, use of a generalized-projections algorithm, and an evolutionary retrieval algorithm. While all these measures are found to improve the precision and convergence of dispersion scan retrieval, differential evolution is found to provide the best performance, enabling the near-perfect retrieval of the phase of complex supercontinuum pulses within less than ten seconds, even in the presence of strong detection noise and limited phase-matching bandwidth of the nonlinear process.
M. Mansuripur, P. K. Khulbe, S. M. Kuebler, J. W. Perry, M. S. Giridhar, J. Kevin Erwin, Kibyung Seong, Seth Marder et al.
To store information at extremely high density and data rate, we propose to
adapt, integrate, and extend the techniques developed by chemists and molecular
biologists for the purpose of manipulating biological and other macromolecules.
In principle, volumetric densities in excess of 10^21 bits/cm^3 can be achieved
when individual molecules having dimensions below a nanometer or so are used to
encode the 0's and 1's of a binary string of data. In practice, however, given
the limitations of electron-beam lithography, thin film deposition and
patterning technologies, molecular manipulation in submicron dimensions, etc.,
we believe that volumetric storage densities on the order of 10^16 bits/cm^3
(i.e., petabytes per cubic centimeter) should be readily attainable, leaving
plenty of room for future growth. The unique feature of the proposed new
approach is its focus on the feasibility of storing bits of information in
individual molecules, each only a few angstroms in size.
Authors' comments: 13 pages, 5 references, 13 figures
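The density figures quoted in the abstract above can be sanity-checked with a short calculation. The sketch assumes one bit per cubic nanometer for the in-principle limit (the abstract only says molecules "below a nanometer or so") and decimal petabytes of $10^{15}$ bytes; the variable names are illustrative.

```python
# Unit conversion: 1 cm = 10**7 nm, so 1 cm^3 = 10**21 nm^3.
NM_PER_CM = 10 ** 7
nm3_per_cm3 = NM_PER_CM ** 3

# Assumption: one bit stored per cubic nanometer of material.
ideal_bits_per_cm3 = nm3_per_cm3

# Practical density quoted in the abstract, converted to decimal petabytes.
practical_bits_per_cm3 = 10 ** 16
practical_bytes_per_cm3 = practical_bits_per_cm3 // 8
practical_petabytes_per_cm3 = practical_bytes_per_cm3 / 10 ** 15

# Remaining headroom between the practical and in-principle figures.
headroom_factor = ideal_bits_per_cm3 // practical_bits_per_cm3

print(ideal_bits_per_cm3, practical_petabytes_per_cm3, headroom_factor)
```

Under these assumptions the practical figure is about one petabyte per cubic centimeter, and the in-principle figure leaves five orders of magnitude of room for growth.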
Marco Mondelli, Andrea Montanari
In phase retrieval we want to recover an unknown signal $\boldsymbol
x\in\mathbb C^d$ from $n$ quadratic measurements of the form $y_i =
|\langle{\boldsymbol a}_i,{\boldsymbol x}\rangle|^2+w_i$ where $\boldsymbol
a_i\in \mathbb C^d$ are known sensing vectors and $w_i$ is measurement noise.
We ask the following weak recovery question: what is the minimum number of
measurements $n$ needed to produce an estimator $\hat{\boldsymbol
x}(\boldsymbol y)$ that is positively correlated with the signal $\boldsymbol
x$? We consider the case of Gaussian vectors $\boldsymbol a_i$. We prove that -
in the high-dimensional limit - a sharp phase transition takes place, and we
locate the threshold in the regime of vanishingly small noise. For $n\le
d-o(d)$ no estimator can do significantly better than random and achieve a
strictly positive correlation. For $n\ge d+o(d)$ a simple spectral estimator
achieves a positive correlation. Surprisingly, numerical simulations with the
same spectral estimator demonstrate promising performance with realistic
sensing matrices. Spectral methods are used to initialize non-convex
optimization algorithms in phase retrieval, and our approach can boost the
performance in this setting as well.
Our impossibility result is based on classical information-theory arguments.
The spectral algorithm computes the leading eigenvector of a weighted empirical
covariance matrix. We obtain a sharp characterization of the spectral
properties of this random matrix using tools from free probability and
generalizing a recent result by Lu and Li. Both the upper and lower bounds
generalize beyond phase retrieval to measurements $y_i$ produced according to a
generalized linear model. As a byproduct of our analysis, we compare the
threshold of the proposed spectral method with that of a message passing
algorithm.
Authors' comments: 63 pages, 3 figures, presented at COLT'18 and accepted at Foundations
of Computational Mathematics
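The spectral algorithm mentioned in the abstract above (leading eigenvector of a weighted empirical covariance matrix) can be sketched in a few lines. The sketch makes several simplifying assumptions that are not from the paper: real-valued signals instead of complex ones, noiseless measurements, the plain weighting $D = \frac{1}{n}\sum_i y_i \boldsymbol a_i \boldsymbol a_i^{\mathsf T}$ rather than any optimized preprocessing of $y_i$, and a sample size far above the threshold. The function name `spectral_estimate` is hypothetical.

```python
import math
import random


def spectral_estimate(sensing, y, n_iter=200):
    """Leading eigenvector of D = (1/n) * sum_i y_i a_i a_i^T via power iteration."""
    n, d = len(sensing), len(sensing[0])
    # Build the weighted empirical covariance matrix D.
    D = [[0.0] * d for _ in range(d)]
    for a, w in zip(sensing, y):
        for j in range(d):
            wj = w * a[j] / n
            for k in range(d):
                D[j][k] += wj * a[k]
    # Power iteration; D is positive semidefinite because all weights y_i >= 0.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        u = [sum(D[j][k] * v[k] for k in range(d)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in u))
        v = [c / norm for c in u]
    return v


rng = random.Random(0)
d, n = 8, 800  # toy sizes, chosen so that n is much larger than d
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
sensing = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
# Noiseless quadratic measurements y_i = <a_i, x>^2.
y = [sum(a_j * x_j for a_j, x_j in zip(a, x)) ** 2 for a in sensing]

x_hat = spectral_estimate(sensing, y)
x_norm = math.sqrt(sum(c * c for c in x))
# The sign of x cannot be recovered, so the absolute correlation is reported.
correlation = abs(sum(p * q for p, q in zip(x_hat, x))) / x_norm
print(f"absolute correlation with the true signal: {correlation:.3f}")
```

For real Gaussian sensing vectors the expectation of $D$ is $\|\boldsymbol x\|^2 I + 2\boldsymbol x \boldsymbol x^{\mathsf T}$, whose leading eigenvector is aligned with $\boldsymbol x$, which is why the estimate correlates with the signal when $n$ is large.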
Xuelong Li, Di Hu, Xiaoqiang Lu
Images are usually taken to express certain emotions or purposes,
such as love or celebrating Christmas. A better way is to combine
the image with a relevant song to amplify the expression, which has recently
drawn much attention on social networks. Hence, automatic selection of
songs is desirable. In this paper, we propose to retrieve semantically
relevant songs from an image query alone, which we call the image2song
problem. Motivated by the requirements of establishing correlation in
semantic/content, we build a semantic-based song retrieval framework, which
learns the correlation between image content and lyric words. This model uses a
convolutional neural network to generate rich tags from image regions, a
recurrent neural network to model lyrics, and then establishes correlation via a
multi-layer perceptron. To reduce the content gap between image and lyrics, we
propose to make the lyric modeling focus on the main image content via tag
attention. We collect a dataset from social-sharing multimodal data to
study the proposed problem, which consists of (image, music clip, lyric)
triplets. We demonstrate that our proposed model shows noticeable results in
the image2song retrieval task and provides suitable songs. Besides, the
song2image task is also performed.
Authors' comments: 13 pages, 13 figures, accepted by ICCV 2017
Jian-Feng Cai, Haixia Liu, Yang Wang
The phase retrieval problem is a fundamental problem in many fields, which is
appealing for investigation. It is to recover the signal vector
$\tilde{x}\in\mathbb{C}^d$ from a set of $N$ measurements
$b_n=|f^*_n\tilde{x}|^2,\ n=1,\cdots, N$, where $\{f_n\}_{n=1}^N$ forms a frame
of $\mathbb{C}^d$. It is generally a non-convex minimization problem, which is
NP-hard. Existing algorithms usually use a least squares fitting to the
measurements, yielding a quartic polynomial minimization. In this paper, we
employ a new strategy by splitting the variables, and we solve a bi-variate
optimization problem that is quadratic in each of the variables. An alternating
gradient descent algorithm is proposed, and its convergence for any
initialization is established. Since a larger step size is allowed due to the
smaller Hessian, the alternating gradient descent algorithm converges faster
than the gradient descent algorithm (known as the Wirtinger flow algorithm)
applied to the quartic objective without splitting the variables. Numerical
results illustrate that our proposed algorithm needs fewer iterations than
Wirtinger flow to achieve the same accuracy.
Authors' comments: 18 pages, 5 figures
Oussama Dhifallah, Yue M. Lu
We consider a recently proposed convex formulation, known as the PhaseMax method, for solving the phase retrieval problem. Using the replica method from statistical mechanics, we analyze the performance of PhaseMax in the high-dimensional limit. Our analysis predicts the \emph{exact} asymptotic performance of PhaseMax. In particular, we show that a sharp phase transition phenomenon takes place, with a simple analytical formula characterizing the phase transition boundary. This result shows that the oversampling ratio required by existing performance bounds in the literature can be significantly reduced. Numerical results confirm the validity of our replica analysis, showing that the theoretical predictions are in excellent agreement with the actual performance of the algorithm, even for moderate signal dimensions.
Xin Jin, Shiming Ge, Chenggen Song
Recently, cloud storage and processing have been widely adopted. Mobile users
in one family or one team may automatically back up their photos to the same
shared cloud storage space. A powerful face detector trained and provided by
a third party may be used to retrieve the photo collection which contains a
specific group of persons from the cloud storage server. However, the privacy
of the mobile users may be leaked to the cloud server providers.
Meanwhile, the copyright of the face detector should be protected. Thus, in
this paper, we propose a protocol of privacy preserving face retrieval in the
cloud for mobile users, which protects the user photos and the face detector
simultaneously. The cloud server only provides the resources of storage and
computing and cannot learn anything about the user photos and the face detector.
We test our protocol inside several families and classes. The experimental
results reveal that our protocol can successfully retrieve the proper photos
from the cloud server and protect the user photos and the face detector.
Authors' comments: Abuse Preventive Data Mining (APDM2017, IJCAI Workshop), 19-25
August, 2017 Melbourne, Australia
Xin Huang, Yuxin Peng, Mingkuan Yuan
Cross-modal retrieval has drawn wide interest for retrieval across different
modalities of data. However, existing methods based on DNN face the challenge
of insufficient cross-modal training data, which limits the training
effectiveness and easily leads to overfitting. Transfer learning can
relieve the problem of insufficient training data, but it mainly focuses on
knowledge transfer from large-scale datasets in a single-modal source domain
to a single-modal target domain. Such large-scale single-modal datasets also
contain rich modal-independent semantic knowledge that can be shared across
different modalities. Besides, large-scale cross-modal datasets are very
labor-consuming to collect and label, so it is important to fully exploit the
knowledge in single-modal datasets for boosting cross-modal retrieval. This
paper proposes the modal-adversarial hybrid transfer network (MHTN), which to the
best of our knowledge is the first work to realize knowledge transfer from a
single-modal source domain to a cross-modal target domain, and to learn a cross-modal
common representation. It is an end-to-end architecture with two subnetworks:
(1) A modal-sharing knowledge transfer subnetwork is proposed to jointly transfer
knowledge from a large-scale single-modal dataset in the source domain to all
modalities in the target domain with a star network structure, which distills
modal-independent supplementary knowledge for promoting cross-modal common
representation learning. (2) A modal-adversarial semantic learning subnetwork is
proposed to construct an adversarial training mechanism between the common
representation generator and the modality discriminator, making the common
representation discriminative for semantics but indiscriminative for modalities
to enhance cross-modal semantic consistency during the transfer process.
Comprehensive experiments on 4 widely-used datasets show its effectiveness and
generality.
Authors' comments: 12 pages, submitted to IEEE Transactions on Cybernetics