Weitsung Lin, Tinghsuan Chao, Jianmin Wu, Tianhuang Su
As emojis are widely used in social media, people not only use an emoji to
express their emotions or mention things but also extend its usage to represent
complicated emotions, concepts or activities by combining multiple emojis. In
this work, we study how emoji combinations, i.e., consecutive emoji sequences, are
used like a new language. We propose a novel algorithm called Retrieval
Strategy to predict which emoji combination follows a short text given as
context. Our algorithm treats emoji combinations as phrases in a language, ranking
sets of emoji combinations like retrieving words from a dictionary. We show that
our algorithm substantially improves the F1 score from 0.141 to 0.204 on the emoji
combination prediction task.
Authors' comments: 4 pages, 2 figures, published in anlp.jp 2019
Donghuo Zeng
A cross-modal retrieval process is to use a query in one modality to obtain
relevant data in another modality. The challenging issue of cross-modal
retrieval lies in bridging the heterogeneous gap for similarity computation,
which has been broadly discussed in image-text, audio-text, and video-text
cross-modal multimedia data mining and retrieval. However, the gap in temporal
structures of different data modalities is not well addressed due to the lack
of alignment relationship between temporal cross-modal structures. Our research
focuses on learning the correlation between different modalities for the task
of cross-modal retrieval. We have proposed an architecture: Supervised-Deep
Canonical Correlation Analysis (S-DCCA), for cross-modal retrieval. In this
forum paper, we will talk about how to exploit triplet neural networks (TNN) to
enhance the correlation learning for cross-modal retrieval. The experimental
results show that the proposed TNN-based supervised correlation learning
architecture achieves the best results when the data representation is extracted by
supervised learning.
Authors' comments: 3 pages, 1 figure, Submitted to ICDM2019 Ph.D. Forum session
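For readers unfamiliar with triplet networks, the following is a minimal sketch of the standard hinge-style triplet loss on which such architectures are usually trained. The squared Euclidean distance, the margin value, and the function name `triplet_loss` are assumptions made for illustration; the abstract does not specify the exact loss used with S-DCCA.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on batches of embeddings (one row per sample).

    Assumed form: max(0, d(a, p) - d(a, n) + margin) with squared Euclidean d.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

# Toy embeddings, e.g. an audio anchor with a matching and a non-matching video.
anchor = np.array([[0.0, 0.0]])
positive = np.array([[0.1, 0.0]])
negative = np.array([[3.0, 0.0]])

loss_easy = triplet_loss(anchor, positive, negative)  # negative already far away
loss_hard = triplet_loss(anchor, negative, positive)  # roles swapped: large loss
```

The loss is zero once the matching pair is closer than the non-matching pair by at least the margin, which is what pushes correlated cross-modal items together in the shared space.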
Stanislav Morozov, Artem Babenko
In many machine learning applications, the most relevant items for a particular query should be efficiently extracted, while the relevance function is based on a highly nonlinear model, e.g., DNNs or GBDTs. Due to the high computational complexity of such models, exhaustive search is infeasible even for medium-scale problems. To address this issue, we introduce Relevance Proximity Graphs (RPG): an efficient non-exhaustive approach that provides a high-quality approximate solution for maximal relevance retrieval. Namely, we extend the recent similarity graphs framework to the setting where no similarity measure is defined on item pairs, which is a common practical use-case. By design, our approach directly maximizes off-the-shelf relevance functions and does not require any proxy auxiliary models. Via extensive experiments, we show that the developed method provides excellent retrieval accuracy while requiring only a few model computations, outperforming indirect models. We open-source our implementation as well as two large-scale datasets to support further research on relevance retrieval.
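The general idea of non-exhaustive search over a graph with a black-box relevance function can be sketched as a greedy walk. This is only an illustration of the search principle: the hand-made toy graph, the quadratic `relevance` stand-in, and all function names are assumptions, and the graph construction procedure of RPG itself is not reproduced here.

```python
def build_graph(n_items):
    """Toy graph (assumption): item i links to i-10, i-1, i+1, i+10 when they exist."""
    return {
        i: [j for j in (i - 10, i - 1, i + 1, i + 10) if 0 <= j < n_items]
        for i in range(n_items)
    }

def relevance(query, item):
    """Stand-in for an expensive black-box model such as a DNN or GBDT."""
    return -(item - query) ** 2

def greedy_graph_search(graph, relevance, query, start):
    """Move to the most relevant neighbor until no neighbor improves the score."""
    scored = {start: relevance(query, start)}  # cache: one model call per item
    current = start
    while True:
        for nb in graph[current]:
            if nb not in scored:
                scored[nb] = relevance(query, nb)
        best_nb = max(graph[current], key=scored.get, default=current)
        if scored[best_nb] <= scored[current]:
            return current, scored[current], len(scored)
        current = best_nb

graph = build_graph(100)
item, score, n_evals = greedy_graph_search(graph, relevance, query=57, start=0)
```

In this toy setting the walk reaches the most relevant item while calling the relevance function on only a fraction of the 100 items, and it never needs a similarity measure between items at query time.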
Rodrigo Nogueira
A goal shared by artificial intelligence and information retrieval is to create an oracle, that is, a machine that can answer our questions, no matter how difficult they are. A more limited, but still instrumental, version of this oracle is a question-answering system, in which an open-ended question is given to the machine, and an answer is produced based on the knowledge it has access to. Such systems already exist and are increasingly capable of answering complicated questions. This progress can be partially attributed to the recent success of machine learning and to the efficient methods for storing and retrieving information, most notably through web search engines. One can imagine that this general-purpose question-answering system can be built as a billion-parameter neural network trained end-to-end with a large number of pairs of questions and answers. We argue, however, that although this approach has been very successful for tasks such as machine translation, storing the world's knowledge as parameters of a learning machine can be very hard. A more efficient way is to train an artificial agent on how to use an external retrieval system to collect relevant information. This agent can leverage the effort that has been put into designing and running efficient storage and retrieval systems by learning how to best utilize them to accomplish a task. ...
Felix Hamann, Nadja Kurz, Adrian Ulges
In retrieval applications, binary hashes are known to offer significant
improvements in terms of both memory and speed. We investigate the compression
of sentence embeddings using a neural encoder-decoder architecture, which is
trained by minimizing reconstruction error. Instead of employing the original
real-valued embeddings, we use latent representations in Hamming space produced
by the encoder for similarity calculations.
In quantitative experiments on several benchmarks for semantic similarity
tasks, we show that our compressed Hamming embeddings yield performance
comparable to that of uncompressed embeddings (Sent2Vec, InferSent, GloVe-BoW), at
compression ratios of up to 256:1. We further demonstrate that our model
strongly decorrelates input features, and that the compressor generalizes well
when pre-trained on Wikipedia sentences. We publish the source code on GitHub
and all experimental results.
Authors' comments: 4 Pages, 9 Figures, 1 Table
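To make the memory and speed argument concrete, here is a small sketch of similarity search in Hamming space. The sign-threshold `binarize` function is a deliberately crude stand-in for the learned encoder described above, and the array sizes and names are assumptions chosen for illustration.

```python
import numpy as np

def binarize(embeddings):
    """Stand-in encoder (assumption): threshold at zero, then pack 8 bits per byte."""
    bits = (embeddings > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)

def hamming_distances(query_code, codes):
    """Number of differing bits between one packed code and each row of `codes`."""
    xor = np.bitwise_xor(codes, query_code)
    return np.unpackbits(xor, axis=1).sum(axis=1)

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 64))      # five real-valued 64-d embeddings
codes = binarize(embeddings)               # five 64-bit codes, 8 bytes each

query_code = binarize(embeddings[2:3])[0]  # query with the code of item 2
dists = hamming_distances(query_code, codes)
nearest = int(np.argmin(dists))
```

Each comparison reduces to an XOR followed by a bit count, which is where the speed advantage over floating-point dot products comes from.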
Rahaf Aljundi, Lucas Caccia, Eugene Belilovsky, Massimo Caccia, Min Lin, Laurent Charlin, Tinne Tuytelaars
Continual learning, the setting where a learning agent is faced with a never-ending stream of data, continues to be a great challenge for modern machine learning systems. In particular, the online or "single-pass through the data" setting has gained attention recently as a natural setting that is difficult to tackle. Methods based on replay, either generative or from a stored memory, have been shown to be effective approaches for continual learning, matching or exceeding the state of the art in a number of standard benchmarks. These approaches typically rely on randomly selecting samples from the replay memory or from a generative model, which is suboptimal. In this work, we consider a controlled sampling of memories for replay. We retrieve the samples which are most interfered, i.e., whose prediction will be most negatively impacted by the foreseen parameter update. We show a formulation for this sampling criterion in both the generative replay and the experience replay setting, producing consistent gains in performance and greatly reduced forgetting. We release an implementation of our method at https://github.com/optimass/Maximally_Interfered_Retrieval.
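The selection criterion for the experience replay case can be illustrated with a small sketch: take a virtual gradient step on the incoming batch, then pick the stored samples whose loss increases the most under that step. The linear model, squared loss, learning rate, and all names below are assumptions made to keep the example short; they are not taken from the released implementation.

```python
import numpy as np

def per_sample_loss(w, X, y):
    """Squared loss of a linear model, one value per sample."""
    return 0.5 * (X @ w - y) ** 2

def most_interfered(w, X_new, y_new, X_mem, y_mem, k, lr=0.1):
    """Indices of the k memory samples most hurt by the foreseen update."""
    # Virtual update: the step the incoming batch would cause.
    grad = X_new.T @ (X_new @ w - y_new) / len(y_new)
    w_virtual = w - lr * grad
    # Interference: loss increase on each stored sample under the virtual update.
    interference = per_sample_loss(w_virtual, X_mem, y_mem) - per_sample_loss(w, X_mem, y_mem)
    top = np.argsort(-interference)[:k]
    return top, interference

rng = np.random.default_rng(0)
w = rng.normal(size=3)
X_mem, y_mem = rng.normal(size=(20, 3)), rng.normal(size=20)  # replay memory
X_new, y_new = rng.normal(size=(5, 3)), rng.normal(size=5)    # incoming batch

top, interference = most_interfered(w, X_new, y_new, X_mem, y_mem, k=4)
```

The selected samples would then be replayed together with the incoming batch in the actual parameter update, in place of a random draw from the memory.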
Darío Garigliotti, Dyaa Albakour, Miguel Martinez, Krisztian Balog
Monitoring entities in media streams often relies on rich entity
representations, like structured information available in a knowledge base
(KB). For long-tail entities, such monitoring is highly challenging, due to
their limited, if not entirely missing, representation in the reference KB. In
this paper, we address the problem of retrieving textual contexts for
monitoring long-tail entities. We propose an unsupervised method to overcome
the limited representation of long-tail entities by leveraging established
entities and their contexts as support information. Evaluation on a
purpose-built test collection shows the suitability of our approach and its
robustness for out-of-KB entities.
Authors' comments: Proceedings of the 2019 ACM International Conference on Theory of
Information Retrieval (ICTIR '19)
Hossein S. Aghamiry, Ali Gholami, Stéphane Operto
An extended formulation of Full Waveform Inversion (FWI), called Wavefield Reconstruction Inversion (WRI), offers the potential benefit of decreasing the nonlinearity of the inverse problem by replacing the explicit inverse of the ill-conditioned wave-equation operator of classical FWI (the oscillating Green functions) with a suitably defined data-driven regularized inverse. This regularization relaxes the wave-equation constraint to reconstruct wavefields that match the data, hence mitigating the risk of cycle skipping. The subsurface model parameters are then updated in a direction that reduces these constraint violations. However, in the case of a rough initial model, the phase errors in the reconstructed wavefields may trap the waveform inversion in a local minimum, leading to inaccurate subsurface models. In this paper, in order to avoid matching such incorrect phase information during the early WRI iterations, we design a new cost function based upon phase retrieval, that is, a process which seeks to reconstruct a signal from the amplitude of linear measurements. This new formulation, called Wavefield Inversion with Phase Retrieval (WIPR), further improves the robustness of the parameter estimation subproblem by a suitable phase correction. We implement the resulting WIPR problem with an alternating-direction approach, which combines the Majorization-Minimization (MM) algorithm to linearise the phase-retrieval term and a variable splitting technique based upon the alternating direction method of multipliers (ADMM). This new workflow, equipped with Tikhonov-total variation (TT) regularization, which is the combination of second-order Tikhonov and total variation regularizations and bound constraints, successfully reconstructs the 2004 BP salt model from a sparse fixed-spread acquisition using a 3 Hz starting frequency and a homogeneous initial velocity model.
Mehdi Amara, Christine Opagiste, Rose-Marie Galera
The reported temperature variations of CeB6's magnetic entropy are
inconsistent with the fourfold degeneracy of the crystal field ground state.
This old question is here addressed through new specific heat measurements and
an improved description, in the cage context, of both the phonon and crystal
field contributions to the specific heat. The antiferromagnetic transition is
characterized as first-order and its latent heat is determined. From the phonon
dispersion for a cage compound, the lattice specific heat contribution is
derived from the LaB6 data. Once corrected for the first-order transition and
lattice contributions, the magnetic entropy displays the characteristic plateau
of the quadruplet crystal field ground state, but at temperatures in excess of
30 K. Below 30 K, as the ordering temperature is approached, the magnetic
entropy is substantially reduced. This anomalous temperature dependence is
consistent with a crystal field ground state split by the rare-earth movement,
a phenomenon specific to rare-earth cage compounds.
Authors' comments: 11 double column pages, 9 figures, latex for PRB
Yinfei Yang, Daniel Cer, Amin Ahmad, Mandy Guo, Jax Law, Noah Constant, Gustavo Hernandez Abrego, Steve Yuan et al.
We introduce two pre-trained retrieval-focused multilingual sentence encoding
models, respectively based on the Transformer and CNN model architectures. The
models embed text from 16 languages into a single semantic space using a
multi-task trained dual-encoder that learns tied representations using
translation-based bridge tasks (Chidambaram et al., 2018). The models provide
performance that is competitive with the state-of-the-art on: semantic
retrieval (SR), translation pair bitext retrieval (BR) and retrieval question
answering (ReQA). On English transfer learning tasks, our sentence-level
embeddings approach, and in some cases exceed, the performance of monolingual,
English only, sentence embedding models. Our models are made available for
download on TensorFlow Hub.
Authors' comments: 6 pages, 6 tables, 2 listings, and 1 figure
Cristian Rusu
In this note, we discuss the shift retrieval problems, both classical and
compressed, and provide connections between them using circulant matrices. We
review the properties of circulant matrices necessary for our calculations and
then show how shifts can be recovered from a single measurement.
Authors' comments: arXiv admin note: substantial text overlap with arXiv:1812.01115
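As a small illustration of the link between shifts and circulant matrices, the sketch below recovers a circular shift in the classical, noiseless setting. Correlating a measurement with every shifted copy of the reference is a product with a circulant matrix, which the FFT diagonalizes. The signal length, the shift value, and the function names are assumptions, and the compressed variant discussed in the note is not covered.

```python
import numpy as np

def recover_shift(x, y):
    """Estimate s such that y is approximately np.roll(x, s), via FFT correlation."""
    corr = np.fft.ifft(np.fft.fft(y) * np.conj(np.fft.fft(x)))
    return int(np.argmax(corr.real))

def circulant_rows(x):
    """Matrix whose k-th row is x circularly shifted by k (a circulant matrix)."""
    return np.stack([np.roll(x, k) for k in range(len(x))])

rng = np.random.default_rng(0)
x = rng.normal(size=64)   # reference signal
y = np.roll(x, 17)        # measurement: x circularly shifted by 17 samples

shift = recover_shift(x, y)

# The same correlations, computed explicitly as a circulant matrix-vector product.
corr_explicit = circulant_rows(x) @ y
```

The explicit product costs O(n^2) operations while the FFT route costs O(n log n), and both peak at the true shift.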
Felix Krahmer, Dominik Stöger
Phase retrieval refers to the problem of reconstructing an unknown vector
$x_0 \in \mathbb{C}^n$ or $x_0 \in \mathbb{R}^n $ from $m$ measurements of the
form $y_i = \big\vert \langle \xi^{\left(i\right)}, x_0 \rangle \big\vert^2 $,
where $ \left\{ \xi^{\left(i\right)} \right\}^m_{i=1} \subset \mathbb{C}^n $
are known measurement vectors. While Gaussian measurements allow for recovery
of arbitrary signals provided the number of measurements scales at least
linearly in the number of dimensions, it has been shown that ambiguities may
arise for certain other classes of measurements $ \left\{ \xi^{\left(i\right)}
\right\}^{m}_{i=1}$ such as Bernoulli measurements or Fourier measurements. In
this paper, we will prove that even when a subgaussian vector $
\xi^{\left(i\right)} \in \mathbb{C}^n $ does not fulfill a small-ball
probability assumption, the PhaseLift method is still able to reconstruct a
large class of signals $x_0 \in \mathbb{R}^n$ from the measurements. This
extends recent work by Krahmer and Liu from the real-valued to the
complex-valued case. However, our proof strategy is quite different and we
expect some of the new proof ideas to be useful in several other measurement
scenarios as well. We then extend our results to $x_0 \in \mathbb{C}^n $ up to an
additional assumption which, as we show, is necessary.
Authors' comments: 25 pages
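For context, the lifting idea behind PhaseLift can be written out as follows. This is the commonly used formulation of the method, stated here as background and not quoted from the paper; the symbol $X$ for the lifted variable is introduced for this sketch.

```latex
% Lifting: with X = x_0 x_0^*, each quadratic measurement is linear in X.
\[
  y_i = \big\vert \langle \xi^{(i)}, x_0 \rangle \big\vert^2
      = \big(\xi^{(i)}\big)^* X \, \xi^{(i)}
      = \operatorname{tr}\!\big( \xi^{(i)} \big(\xi^{(i)}\big)^* X \big),
  \qquad X = x_0 x_0^* .
\]
% Convex relaxation: drop the rank-one constraint and minimize the trace.
\[
  \min_{X \succeq 0} \ \operatorname{tr}(X)
  \quad \text{subject to} \quad
  \operatorname{tr}\!\big( \xi^{(i)} \big(\xi^{(i)}\big)^* X \big) = y_i ,
  \qquad i = 1, \dots, m .
\]
```

If the solution is the rank-one matrix $x_0 x_0^*$, then $x_0$ is recovered from it up to a global phase factor.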
Cheng Chang, Himanshu Rai, Satya Krishna Gorti, Junwei Ma, Chundi Liu, Guangwei Yu, Maksims Volkovs
We present our solution to the Landmark Image Retrieval Challenge 2019. This challenge was based on the large Google Landmarks Dataset V2 [9]. The goal was to retrieve all database images containing the same landmark for every provided query image. Our solution combines global and local models to form an initial KNN graph. We then use a novel extension of the recently proposed graph traversal method EGT [1], referred to as semi-supervised EGT, to refine the graph and retrieve better candidates.
Xinyu Hua, Zhe Hu, Lu Wang
Automatic argument generation is an appealing but challenging task. In this
paper, we study the specific problem of counter-argument generation, and
present a novel framework, CANDELA. It consists of a powerful retrieval system
and a novel two-step generation model, where a text planning decoder first
decides on the main talking points and a proper language style for each
sentence, then a content realization decoder reflects the decisions and
constructs an informative paragraph-level argument. Furthermore, our generation
model is empowered by a retrieval system indexed with 12 million articles
collected from Wikipedia and popular English news media, which provides access
to high-quality content with diversity. Automatic evaluation on a large-scale
dataset collected from Reddit shows that our model yields significantly higher
BLEU, ROUGE, and METEOR scores than the state-of-the-art and non-trivial
comparisons. Human evaluation further indicates that our system arguments are
more appropriate for refutation and richer in content.
Authors' comments: Accepted as a long paper to ACL 2019
Bor-Chun Chen, Zuxuan Wu, Larry S. Davis, Ser-Nam Lim
Detecting spliced images is one of the emerging challenges in computer vision. Unlike prior methods that focus on detecting low-level artifacts generated during the manipulation process, we use an image retrieval approach to tackle this problem. When given a spliced query image, our goal is to retrieve the original image from a database of authentic images. To achieve this goal, we propose representing an image by its constituent objects based on the intuition that the finest granularity of manipulations is oftentimes at the object-level. We introduce a framework, object embeddings for spliced image retrieval (OE-SIR), that utilizes modern object detectors to localize object regions. Each region is then embedded and collectively used to represent the image. Further, we propose a student-teacher training paradigm for learning discriminative embeddings within object regions to avoid expensive multiple forward passes. Detailed analysis of the efficacy of different feature embedding models is also provided in this study. Extensive experimental results show that the OE-SIR achieves state-of-the-art performance in spliced image retrieval.
Estefania Talavera, Petia Radeva, Nicolai Petkov
The availability and use of egocentric data are rapidly increasing due to the growing use of wearable cameras. Our aim is to study the effect (positive, neutral or negative) of egocentric images or events on an observer. Given egocentric photostreams capturing the wearer's days, we propose a method that aims to assign sentiment to events extracted from egocentric photostreams. Such moments can be candidates to retrieve according to their possibility of representing a positive experience for the camera's wearer. The proposed approach obtained a classification accuracy of 75% on the test set, with a deviation of 8%. Our model makes a step forward opening the door to sentiment recognition in egocentric photostreams.
Philippe Jaming, Karim Kellay, Rolando Perez
This study investigates the phase retrieval problem for wide-band signals. We solve the following problem: given $f \in L^2(\mathbb{R})$ with Fourier transform in $L^2(\mathbb{R}, e^{2c|x|}\,dx)$, we find all functions $g \in L^2(\mathbb{R})$ with Fourier transform in $L^2(\mathbb{R}, e^{2c|x|}\,dx)$ such that $|f(x)| = |g(x)|$ for all $x \in \mathbb{R}$. To do so, we first translate the problem to functions in the Hardy spaces on the disc via a conformal bijection, and take advantage of the inner-outer factorization. We also consider the same problem with additional constraints involving some transforms of $f$ and $g$, and determine if these constraints force uniqueness of the solution.
Çağatay Işıl, Figen S. Oktem, Aykut Koç
The classical phase retrieval problem is the recovery of a constrained image from
the magnitude of its Fourier transform. Although there are several well-known
phase retrieval algorithms including the hybrid input-output (HIO) method, the
reconstruction performance is generally sensitive to initialization and
measurement noise. Recently, deep neural networks (DNNs) have been shown to
provide state-of-the-art performance in solving several inverse problems such
as denoising, deconvolution, and superresolution. In this work, we develop a
phase retrieval algorithm that utilizes two DNNs together with the model-based
HIO method. First, a DNN is trained to remove the HIO artifacts and is used
iteratively with the HIO method to improve the reconstructions. After this
iterative phase, a second DNN is trained to remove the remaining artifacts.
Numerical results demonstrate the effectiveness of our approach, which has
little additional computational cost compared to the HIO method. Our approach
not only achieves state-of-the-art reconstruction performance but also is more
robust to different initialization and noise levels.
Authors' comments: 14 pages, 8 figures, published in Applied Optics (Vol. 58, Issue 20,
pp. 5422-5431 (2019))
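For reference, the model-based HIO iteration that the DNNs are combined with can be sketched as below. This is the textbook hybrid input-output update for a real, non-negative image with a known support; the image size, the value of `beta`, the iteration count, and the function name are assumptions, and the two DNN stages of the paper are not included.

```python
import numpy as np

def hio_step(x, magnitude, support, beta=0.9):
    """One hybrid input-output update for a real, non-negative image."""
    X = np.fft.fft2(x)
    # Fourier-domain step: keep the current phase, impose the measured magnitude.
    x_proj = np.real(np.fft.ifft2(magnitude * np.exp(1j * np.angle(X))))
    # Object-domain step: accept pixels that satisfy the constraints,
    # apply the negative-feedback correction everywhere else.
    ok = support & (x_proj >= 0)
    return np.where(ok, x_proj, x - beta * x_proj)

rng = np.random.default_rng(0)
truth = np.zeros((16, 16))
truth[4:12, 4:12] = rng.random((8, 8)) + 0.1  # positive image on a known support
support = truth > 0
magnitude = np.abs(np.fft.fft2(truth))        # phaseless Fourier measurements

x = rng.random((16, 16))                      # random initialization
for _ in range(200):
    x = hio_step(x, magnitude, support)

# Relative mismatch between the measured and the reconstructed Fourier magnitudes.
fourier_error = np.linalg.norm(np.abs(np.fft.fft2(x)) - magnitude) / np.linalg.norm(magnitude)
```

The result depends on the random initialization, which is the sensitivity the abstract refers to; the true image itself is a fixed point of the update.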
Yinchuan Li, Xu Zhang, Zegang Ding, Xiaodong Wang
This paper concerns the problem of estimating multidimensional (MD) frequencies using prior knowledge of the signal spectral sparsity from partial time samples. In many applications, such as radar, wireless communications, and super-resolution imaging, some structural information about the signal spectrum might be known beforehand. Supposing that the frequencies lie in given intervals, the goal is to improve the frequency estimation performance by using this prior information. We study the MD Vandermonde decomposition of block Toeplitz matrices in which the frequencies are restricted to given intervals. We then propose to solve the frequency-selective atomic norm minimization by converting it into a semidefinite program based on the MD Vandermonde decomposition. Numerical simulation results are presented to illustrate the good performance of the proposed method.
Bing Gao, Xinwei Sun, Yang Wang, Zhiqiang Xu
In this paper, we propose a new non-convex algorithm for solving the phase retrieval problem, i.e., the reconstruction of a signal $ \vx\in\H^n $ ($\H=\R$ or $\C$) from phaseless samples $ b_j=\abs{\langle \va_j, \vx\rangle } $, $ j=1,\ldots,m $. The proposed algorithm solves a newly proposed perturbed amplitude-based model for phase retrieval and is accordingly named {\em Perturbed Amplitude Flow} (PAF). We prove that PAF can recover $c\vx$ ($\abs{c} = 1$) under $\mathcal{O}(n)$ Gaussian random measurements (optimal order of measurements). Starting with a designed initial point, our PAF algorithm iteratively converges to the true solution at a linear rate for both real and complex signals. Moreover, the PAF algorithm needs no truncation or re-weighting procedure, so it is simple to implement. The effectiveness and benefit of the proposed method are validated by both simulation studies and an experiment on recovering natural images.
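The reason recovery is stated only up to a factor $c$ with $\abs{c} = 1$ can be checked numerically: phaseless samples cannot distinguish a signal from any globally phase-rotated copy. The sketch below illustrates the measurement model and the phase-invariant distance usually used to assess such reconstructions; it does not implement PAF, and the dimensions and function names are assumptions.

```python
import numpy as np

def phaseless_samples(A, x):
    """b_j = |<a_j, x>| where the rows of A are the measurement vectors a_j."""
    return np.abs(A.conj() @ x)

def distance_up_to_phase(x, z):
    """min over |c| = 1 of ||z - c x||, attained at c = x^H z / |x^H z|."""
    inner = np.vdot(x, z)
    c = inner / abs(inner) if abs(inner) > 0 else 1.0
    return np.linalg.norm(z - c * x)

rng = np.random.default_rng(0)
n, m = 8, 32
x = rng.normal(size=n) + 1j * rng.normal(size=n)            # unknown signal
A = rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))  # Gaussian measurements

z = np.exp(1j * 0.7) * x                                    # globally rotated copy
same_data = np.allclose(phaseless_samples(A, x), phaseless_samples(A, z))
plain_distance = np.linalg.norm(z - x)
aligned_distance = distance_up_to_phase(x, z)
```

Both signals produce identical data, so the plain Euclidean distance is misleading and reconstruction error is measured after aligning the global phase.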