Li-Phen Yen, Zhen-Yu Wu, Kuan-Yu Chen
Recent developments in deep learning have led to significant innovations in various classic and practical subjects, including speech recognition, computer vision, question answering, and information retrieval. In natural language processing (NLP), language representations have been highly successful in many downstream tasks, and this line of work has become a major stream of research. Because large volumes of multimedia data containing speech are now part of daily life, spoken document retrieval (SDR) has been an important research subject for the past decades. To enhance SDR performance, this paper proposes a neural retrieval framework that combines the merits of the language modeling (LM) mechanism in SDR with the abstractive information learned by language representation models. To our knowledge, this is a pioneering study on supervised training of a neural LM-based SDR framework, especially one combined with pretrained language representation methods.
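For readers unfamiliar with the LM mechanism in retrieval, the following is a minimal sketch of classical query-likelihood ranking, not the neural framework proposed in the paper. It assumes a unigram model with Jelinek-Mercer smoothing; the function name, the smoothing weight `lam`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, collection, lam=0.5):
    """Log P(query | doc) under a unigram LM with Jelinek-Mercer smoothing.

    lam weights the document model against the collection model (assumed value).
    """
    doc_tf = Counter(doc)
    coll_tf = Counter(collection)
    score = 0.0
    for word in query:
        p_doc = doc_tf[word] / len(doc) if doc else 0.0
        p_coll = coll_tf[word] / len(collection)
        p = lam * p_doc + (1.0 - lam) * p_coll
        if p == 0.0:
            # Word unseen in the whole collection: the query cannot be generated.
            return float("-inf")
        score += math.log(p)
    return score

# Toy "transcripts" standing in for spoken documents (hypothetical data).
docs = {
    "d1": "spoken document retrieval with language models".split(),
    "d2": "image retrieval over wireless channels".split(),
}
collection = [w for d in docs.values() for w in d]
query = "spoken retrieval".split()

ranking = sorted(
    docs,
    key=lambda name: query_log_likelihood(query, docs[name], collection),
    reverse=True,
)
print(ranking)
```

Documents are ranked by how likely their language model is to generate the query; smoothing keeps a document from being ruled out by a single missing query word.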
Siddharth Gandhi, Nikku Madhusudhan, George Hawker, Anjali Piette
High-resolution Doppler spectroscopy has been used to detect several chemical
species in exoplanetary atmospheres. Such detections have traditionally relied
on cross correlation of observed spectra against spectral model templates, an
approach that is successful for detecting chemical species but not optimised
for constraining abundances. Recent work has explored ways to perform
atmospheric retrievals on high-resolution spectra (HRS) and combine them with
retrievals routinely performed for low-resolution spectra (LRS) by developing a
mapping from the cross correlation function to a likelihood metric. We build
upon previous studies and report HyDRA-H, a hybrid retrieval code for
simultaneous analysis of low- and high-resolution thermal emission spectra of
exoplanets in a fully Bayesian approach. We demonstrate HyDRA-H on the hot
Jupiter HD 209458b as a case study. We validate our HRS retrieval capability by
confirming previous results and report a simultaneous hybrid retrieval using
both HRS and LRS data. The LRS data span the HST WFC3 (1.1-1.7 $\mu$m) and
Spitzer photometry (IRAC 3.6-8 $\mu$m) bands, while the HRS data were obtained
with CRIRES on VLT at 2.3 $\mu$m. The constraints on the composition and
temperature profiles for the hybrid retrieval are more stringent than
retrievals with either LRS or HRS datasets individually. We retrieve abundances
of $\log(\mathrm{H_2O}) = -4.11^{+0.91}_{-0.30}$ and $\log(\mathrm{CO}) =
{-2.16}^{+0.99}_{-0.47}$, and $\mathrm{C/O} = 0.99^{+0.01}_{-0.02}$, consistent
with previous works. We constrain the photospheric temperature to be
$1498^{+216}_{-57}$ K, consistent with the equilibrium temperature. Our results
demonstrate the significant advantages of hybrid retrievals by combining
strengths of both HRS and LRS observations which probe complementary aspects of
exoplanetary atmospheres.
Authors' comments: 16 pages, 9 figures, accepted for publication in The Astronomical
Journal
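The mapping from a cross correlation to a likelihood mentioned in this abstract can be illustrated with a short sketch. The abstract does not state the formula used by HyDRA-H; the expression below, $\log L = -\frac{N}{2}\log(s_f^2 - 2R + s_g^2)$, is one form that appears in the high-resolution retrieval literature and is assumed here purely for illustration. The function name and the toy spectra are hypothetical.

```python
import math

def ccf_log_likelihood(data, model):
    """Assumed CCF-to-likelihood mapping: -(N/2) * log(s_f^2 - 2R + s_g^2).

    s_f^2 and s_g^2 are the variances of the mean-subtracted data and model,
    and R is their cross-covariance (the un-normalised cross correlation).
    """
    n = len(data)
    mean_f = sum(data) / n
    mean_g = sum(model) / n
    f = [v - mean_f for v in data]
    g = [v - mean_g for v in model]
    s_f2 = sum(v * v for v in f) / n
    s_g2 = sum(v * v for v in g) / n
    r = sum(a * b for a, b in zip(f, g)) / n
    return -0.5 * n * math.log(s_f2 - 2.0 * r + s_g2)

# Toy mean-subtracted "spectra" (hypothetical numbers, not real observations).
data = [0.0, 1.0, 0.1, -1.0, 0.0, 0.9]
good_model = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
bad_model = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0]

print(ccf_log_likelihood(data, good_model) > ccf_log_likelihood(data, bad_model))
```

Because the argument of the logarithm equals the mean squared difference between the mean-subtracted data and model, a template that correlates well with the data receives a higher likelihood, which is what lets a Bayesian sampler use high-resolution spectra alongside low-resolution ones.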
Mikolaj Jankowski, Deniz Gunduz, Krystian Mikolajczyk
Motivated by surveillance applications with wireless cameras or drones, we consider the problem of image retrieval over a wireless channel. Conventional systems apply lossy compression on query images to reduce the data that must be transmitted over the bandwidth- and power-limited wireless link. We first note that reconstructing the original image is not needed for retrieval tasks; hence, we introduce a deep neural network (DNN) based compression scheme targeting the retrieval task. Then, we completely remove the compression step, and propose another DNN-based communication scheme that directly maps the feature vectors to channel inputs. This joint source-channel coding (JSCC) approach not only improves the end-to-end accuracy, but also simplifies and speeds up the encoding operation, which is highly beneficial for power- and latency-constrained IoT applications.
Chao Tian
We consider the fundamental tradeoff between the storage cost and the
download cost in private information retrieval systems, without any explicit
structural restrictions on the storage codes, such as maximum distance
separable codes or uncoded storage. Two novel outer bounds are provided, which
have the following implications. When the messages are stored without any
redundancy across the databases, the optimal PIR strategy is to download all
the messages; on the other hand, for PIR capacity-achieving codes, each
database can reduce the storage cost, from storing all the messages, by no more
than one message on average. We then focus on the two-message two-database
case, and show that a stronger outer bound can be derived through a novel
pseudo-message technique. This stronger outer bound suggests that a precise
characterization of the storage-download tradeoff may require non-Shannon type
inequalities, or at least more sophisticated bounding techniques.
Authors' comments: 17 pages, 3 figures
Daniele Giunchi, Stuart James, Donald Degraen, Anthony Steed
Drawing tools for Virtual Reality (VR) enable users to model 3D designs from
within the virtual environment itself. These tools employ sketching and
sculpting techniques known from desktop-based interfaces and apply them to
hand-based controller interaction. While these techniques allow for mid-air
sketching of basic shapes, it remains difficult for users to create detailed
and comprehensive 3D models. In our work, we focus on supporting the user in
designing the virtual environment around them by enhancing sketch-based
interfaces with a supporting system for interactive model retrieval. Through
sketching, an immersed user can query a database containing detailed 3D models
and place them into the virtual environment. To understand supportive
sketching within a virtual environment, we compare different methods of sketch
interaction, i.e., 3D mid-air sketching, 2D sketching on a virtual tablet, 2D
sketching on a fixed virtual whiteboard, and 2D sketching on a real tablet.
Our results show that 3D mid-air sketching is considered
to be a more intuitive method to search a collection of models while the
addition of physical devices creates confusion due to the complications of
their inclusion within a virtual environment. While we pose our work as a
retrieval problem for 3D models of chairs, our results can be extrapolated to
other sketching tasks for virtual environments.
Authors' comments: 10 pages
Kiryung Lee, Sohail Bahmani, Yonina Eldar, Justin Romberg
We study the low-rank phase retrieval problem, where we try to recover a $d_1\times d_2$ low-rank matrix from a series of phaseless linear measurements. This is a fourth-order inverse problem, as we are trying to recover factors of a matrix that have been put through a quadratic nonlinearity after being multiplied together. We propose a solution to this problem using the recently introduced technique of anchored regression. This approach uses two different types of convex relaxations: we replace the quadratic equality constraints for the phaseless measurements by a search over a polytope, and enforce the rank constraint through nuclear norm regularization. The result is a convex program that works in the space of $d_1 \times d_2$ matrices. We analyze two specific scenarios. In the first, the target matrix is rank-$1$, and the observations are structured to correspond to a phaseless blind deconvolution. In the second, the target matrix has general rank, and we observe the magnitudes of the inner products against a series of independent Gaussian random matrices. In each of these problems, we show that anchored regression returns an accurate estimate from a near-optimal number of measurements, provided that we have access to an anchor matrix of sufficient quality. We also show how to create such an anchor in the phaseless blind deconvolution problem, again from an optimal number of measurements, and present a partial result in this direction for the general rank problem.
Bolin Wei
Code comment generation is a crucial task in the field of automatic software
development. Most previous neural comment generation systems used an
encoder-decoder neural network and encoded only information from source code as
input. Software reuse is common in software development. However, this feature
has not been introduced to existing systems. Inspired by the traditional
IR-based approaches, we propose to use the existing comments of similar source
code as exemplars to guide the comment generation process. Based on an open
source search engine, we first retrieve similar code and treat its comment as
an exemplar. Then we apply a seq2seq neural network to conduct
exemplar-based comment generation. We evaluate our approach on a large-scale
Java corpus, and experimental results demonstrate that our model significantly
outperforms the state-of-the-art methods.
Authors' comments: To appear at ASE 2019 Student Research Competition
Alireza Mohammadshahi, Remi Lebret, Karl Aberer
In this paper, we propose a new approach to learn multimodal multilingual embeddings for matching images and their relevant captions in two languages. We combine two existing objective functions to make images and captions close in a joint embedding space while adapting the alignment of word embeddings between existing languages in our model. We show that our approach enables better generalization, achieving state-of-the-art performance on text-to-image and image-to-text retrieval tasks and on the caption-caption similarity task. Two multimodal multilingual datasets are used for evaluation: Multi30k with German and English captions and Microsoft-COCO with English and Japanese captions.
Yao Wan, Jingdong Shu, Yulei Sui, Guandong Xu, Zhou Zhao, Jian Wu, Philip S. Yu
Code retrieval techniques and tools play a key role in helping software developers retrieve existing code fragments from available open-source repositories given a user query. Despite existing efforts to improve the effectiveness of code retrieval, two main issues still hinder these techniques from accurately retrieving satisfactory code fragments from large-scale repositories when answering complicated queries. First, existing approaches consider only shallow features of source code, such as method names and code tokens, and ignore structured features such as abstract syntax trees (ASTs) and control-flow graphs (CFGs), which contain rich and well-defined semantics of source code. Second, although deep learning-based approaches perform well at representing source code, they lack explainability, making it hard to interpret the retrieval results and almost impossible to understand which features of source code contribute more to the final results. To tackle these two issues, this paper proposes MMAN, a novel Multi-Modal Attention Network for semantic source code retrieval. A comprehensive multi-modal representation is developed for representing unstructured and structured features of source code, with one LSTM for the sequential tokens of code, a Tree-LSTM for the AST of code and a GGNN (Gated Graph Neural Network) for the CFG of code. Furthermore, a multi-modal attention fusion layer is applied to assign weights to different parts of each modality of source code and then integrate them into a single hybrid representation. Comprehensive experiments and analysis on a large-scale real-world dataset show that our proposed model can accurately retrieve code snippets and outperforms the state-of-the-art methods.
Tao Guo, Ruida Zhou, Chao Tian
We consider information leakage to the user in private information retrieval
(PIR) systems. Information leakage can be measured in terms of individual
message leakage or total leakage. Individual message leakage, or simply
individual leakage, is defined as the amount of information that the user can
obtain on any individual message that is not being requested, and the total
leakage is defined as the amount of information that the user can obtain about
all the other messages except the one being requested. In this work, we
characterize the tradeoff between the minimum download cost and the individual
leakage, and that for the total leakage, respectively. New codes are proposed
to achieve these optimal tradeoffs, which are also shown to be optimal in terms
of the message size. We further characterize the optimal tradeoff between the
minimum amount of common randomness and the total leakage. Moreover, we show
that under individual leakage, common randomness is in fact unnecessary when
there are more than two messages.
Authors' comments: 14 double-column pages, 5 figures, submitted to IEEE Transactions on
Information Forensics & Security
Jie Li, David Karpuk, Camilla Hollanti
Private information retrieval (PIR) is the problem of privately retrieving
one out of $M$ original files from $N$ servers, i.e., each individual server
learns nothing about the file that the user is requesting. Usually, the $M$
files are replicated or encoded by a maximum distance separable (MDS) code and
then stored across the $N$ servers. Compared to mere replication, MDS coded
servers can significantly reduce the storage overhead. Particularly, PIR from
minimum storage regenerating (MSR) coded servers can simultaneously reduce the
repair bandwidth when repairing failed servers. Existing PIR schemes from MSR
coded servers either require large sub-packetization levels or are not
capacity-achieving. In this paper, a PIR protocol from MDS array codes is
proposed, subsuming PIR from MSR coded servers as a special case. Particularly,
the case of non-colluding, honest-but-curious servers is considered. The
retrieval rate of the new PIR protocol achieves the capacity of PIR from
MDS/MSR coded servers. By choosing different MDS array codes, the new PIR
protocol can have some advantages when compared with existing protocols, e.g.,
1) small sub-packetization, 2) (near-) optimal repair bandwidth, 3)
implementable over the binary field $\mathbf{F}_2$.
Authors' comments: Accepted for publication in the IEEE Transactions on Communications
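To make the PIR definition in this abstract concrete, here is a minimal sketch of the classic two-server XOR scheme over replicated (uncoded) files. It is not the MDS array code protocol of the paper and is not capacity-achieving; it only illustrates how a user can retrieve one file while each non-colluding server sees a uniformly random query. All function names and the toy files are hypothetical.

```python
import secrets

def make_queries(num_files, wanted):
    """Build one query per server; each is a uniformly random bit vector on its own."""
    q1 = [secrets.randbelow(2) for _ in range(num_files)]
    q2 = list(q1)
    q2[wanted] ^= 1  # the two queries differ only at the wanted index
    return q1, q2

def answer(files, query):
    """Server side: XOR of all files whose query bit is set."""
    acc = 0
    for content, bit in zip(files, query):
        if bit:
            acc ^= content
    return acc

# Both servers store the same files (replication); files are small integers here.
files = [0b1010, 0b0111, 0b1100]
wanted = 1

q1, q2 = make_queries(len(files), wanted)
recovered = answer(files, q1) ^ answer(files, q2)
print(recovered == files[wanted])
```

Every file other than the wanted one appears in both answers or in neither, so it cancels in the final XOR. The user downloads two file-sized answers to obtain one file, which is the kind of download cost that coded and capacity-achieving protocols aim to reduce.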
Dimitri Gominski, Martyna Poreba, Valérie Gouet-Brunet, Liming Chen
This article proposes to study the behavior of recent and efficient
state-of-the-art deep-learning based image descriptors for content-based image
retrieval, facing a panel of complex variations appearing in heterogeneous
image datasets, in particular in cultural collections that may involve
multi-source, multi-date and multi-view
Authors' comments: SUMAC '19, 2019
Rinat Khaziev, Bryce Casavant, Pearce Washabaugh, Amy A. Winecoff, Matthew Graham
Information retrieval (IR) systems often leverage query data to suggest relevant items to users. This introduces the possibility of unfairness if the query (i.e., input) and the resulting recommendations unintentionally correlate with latent factors that are protected variables (e.g., race, gender, and age). For instance, a visual search system for fashion recommendations may pick up on features of the human models rather than fashion garments when generating recommendations. In this work, we introduce a statistical test for "distribution parity" in the top-K IR results, which assesses whether a given set of recommendations is fair with respect to a specific protected variable. We evaluate our test using both simulated and empirical results. First, using artificially biased recommendations, we demonstrate the trade-off between statistically detectable bias and the size of the search catalog. Second, we apply our test to a visual search system for fashion garments, specifically testing for recommendation bias based on the skin tone of fashion models. Our distribution parity test can help ensure that IR systems' results are fair and produce a good experience for all users.
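The abstract does not describe the statistic behind the distribution parity test, so the following is only a sketch of one plausible check under stated assumptions: a chi-square goodness-of-fit statistic comparing the protected-group counts in the top-K results with the proportions in the whole catalog. The function name and the toy group labels are hypothetical.

```python
from collections import Counter

def chi_square_parity(top_k_groups, catalog_groups):
    """Chi-square statistic of top-K group counts against catalog proportions.

    Assumes every group in the top-K also occurs in the catalog.
    """
    k = len(top_k_groups)
    observed = Counter(top_k_groups)
    catalog = Counter(catalog_groups)
    total = len(catalog_groups)
    stat = 0.0
    for group, count in catalog.items():
        expected = k * count / total
        stat += (observed.get(group, 0) - expected) ** 2 / expected
    return stat

# Hypothetical catalog with two equally frequent protected groups.
catalog = ["A"] * 50 + ["B"] * 50
balanced_top_k = ["A"] * 5 + ["B"] * 5
skewed_top_k = ["A"] * 9 + ["B"] * 1

# 3.841 is the 5% critical value of the chi-square distribution with 1 degree of freedom.
CRITICAL_1DF_5PCT = 3.841
print(chi_square_parity(balanced_top_k, catalog) > CRITICAL_1DF_5PCT)
print(chi_square_parity(skewed_top_k, catalog) > CRITICAL_1DF_5PCT)
```

With small K the chi-square approximation is rough, so an exact or permutation test may be preferable in practice; the sketch only shows the shape of the comparison between recommended and catalog distributions.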
Zihao Wang, Xihui Liu, Hongsheng Li, Lu Sheng, Junjie Yan, Xiaogang Wang, Jing Shao
Text-image cross-modal retrieval is a challenging task in the field of
language and vision. Most previous approaches independently embed images and
sentences into a joint embedding space and compare their similarities. However,
previous approaches rarely explore the interactions between images and
sentences before calculating similarities in the joint space. Intuitively, when
matching between images and sentences, human beings would alternately attend
to regions in images and words in sentences, and select the most salient
information considering the interaction between both modalities. In this paper,
we propose Cross-modal Adaptive Message Passing (CAMP), which adaptively
controls the information flow for message passing across modalities. Our
approach not only takes comprehensive and fine-grained cross-modal interactions
into account, but also properly handles negative pairs and irrelevant
information with an adaptive gating scheme. Moreover, instead of conventional
joint embedding approaches for text-image matching, we infer the matching score
based on the fused features, and propose a hardest negative binary
cross-entropy loss for training. Results on COCO and Flickr30k significantly
surpass state-of-the-art methods, demonstrating the effectiveness of our
approach.
Authors' comments: Accepted by ICCV 2019
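The hardest negative binary cross-entropy loss named in this abstract is not spelled out there, so the sketch below shows one plausible reading under stated assumptions: each matched image-sentence pair is pushed towards a score of 1, and only the highest-scoring mismatched sentence and mismatched image are pushed towards 0. The function name and the toy score matrices are hypothetical.

```python
import math

def hardest_negative_bce(scores):
    """Assumed form of a hardest-negative BCE loss.

    scores[i][j] is the predicted match probability in (0, 1) for image i and
    sentence j; diagonal entries are the true pairs. Needs at least 2 pairs.
    """
    n = len(scores)
    total = 0.0
    for i in range(n):
        positive = scores[i][i]
        # Hardest negatives: the mismatched items the model scores highest.
        hardest_sentence = max(scores[i][j] for j in range(n) if j != i)
        hardest_image = max(scores[j][i] for j in range(n) if j != i)
        total += (
            -math.log(positive)
            - math.log(1.0 - hardest_sentence)
            - math.log(1.0 - hardest_image)
        )
    return total / n

# Hypothetical score matrices for a batch of two image-sentence pairs.
well_separated = [[0.9, 0.1], [0.2, 0.8]]
confused = [[0.6, 0.7], [0.5, 0.4]]

print(hardest_negative_bce(well_separated) < hardest_negative_bce(confused))
```

Restricting the negative term to the hardest example concentrates the training signal on the mismatches the model currently finds most confusing, rather than averaging over many easy negatives.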
Xinyu Zhang, Rufeng Zhang, Jiewei Cao, Dong Gong, Mingyu You, Chunhua Shen
Vehicle instance retrieval often requires one to recognize the fine-grained
visual differences between vehicles. Besides the holistic appearance of
vehicles which is easily affected by the viewpoint variation and distortion,
vehicle parts also provide crucial cues to differentiate near-identical
vehicles. Motivated by these observations, we introduce a Part-Guided Attention
Network (PGAN) to pinpoint the prominent part regions and effectively combine
the global and part information for discriminative feature learning. PGAN first
detects the locations of different part components and salient regions
regardless of the vehicle identity, which serve as the bottom-up attention to
narrow down the possible searching regions. To estimate the importance of
detected parts, we propose a Part Attention Module (PAM) to adaptively locate
the most discriminative regions with high-attention weights and suppress the
distraction of irrelevant parts with relatively low weights. The PAM is guided
by the instance retrieval loss and therefore provides top-down attention that
enables attention to be calculated at the level of car parts and other salient
regions. Finally, we aggregate the global appearance and part features to
improve the feature performance further. The PGAN combines part-guided
bottom-up and top-down attention, global and part visual features in an
end-to-end framework. Extensive experiments demonstrate that the proposed
method achieves new state-of-the-art vehicle instance retrieval performance on
four large-scale benchmark datasets.
Authors' comments: 12 pages
Zheyuan Zhu, Yangyang Sun, Johnathon White, Zenghu Chang, Shuo Pang
Signal retrieval from a series of indirect measurements is a common task in
many imaging, metrology and characterization platforms in science and
engineering. Because most of the indirect measurement processes are
well-described by physical models, signal retrieval can be solved with an
iterative optimization that enforces measurement consistency and prior
knowledge on the signal. These iterative processes are time-consuming and only
accommodate a linear measurement process and convex signal constraints.
Recently, neural networks have been widely adopted to supersede iterative
signal retrieval methods by approximating the inverse mapping of the
measurement model. However, networks with deterministic processes have failed
to distinguish signal ambiguities in an ill-posed measurement system, and
retrieved signals often lack consistency with the measurement. In this work we
introduce a variational generative model to capture the distribution of all
possible signals, given a particular measurement. By exploiting the known
measurement model in the variational generative framework, our signal retrieval
process resolves the ambiguity in the forward process, and learns to retrieve
signals that satisfy the measurement with high fidelity in a variety of linear
and nonlinear ill-posed systems, including ultrafast pulse retrieval, coded
aperture compressive video sensing and image retrieval from Fresnel holograms.
Authors' comments: 8 pages, 5 figures. Initial submission to IEEE Transactions on
Computational Imaging
Zihui Wu, Yu Sun, Jiaming Liu, Ulugbek S. Kamilov
Regularization by denoising (RED) is a powerful framework for solving imaging
inverse problems. Most RED algorithms are iterative batch procedures, which
limits their applicability to very large datasets. In this paper, we address
this limitation by introducing a novel online RED (On-RED) algorithm, which
processes a small subset of the data at a time. We establish the theoretical
convergence of On-RED in convex settings and empirically discuss its
effectiveness in non-convex ones by illustrating its applicability to phase
retrieval. Our results suggest that On-RED is an effective alternative to the
traditional RED algorithms when dealing with large datasets.
Authors' comments: Accepted ICCVW 2019 (LCI)
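The idea of processing a small subset of the data at each RED iteration can be sketched on a toy problem. This is not the authors' implementation: it assumes the common RED gradient form $\nabla g(x) + \tau\,(x - D(x))$, replaces the full data-fidelity gradient with a minibatch estimate, and uses a circular moving average as a stand-in for a learned denoiser. All names, step sizes, and the toy signal are hypothetical.

```python
import math
import random

def denoise(x):
    """Toy denoiser D: circular 3-tap moving average (stand-in for a learned one)."""
    n = len(x)
    return [(x[i - 1] + x[i] + x[(i + 1) % n]) / 3.0 for i in range(n)]

def on_red_sketch(y, steps=300, batch=8, gamma=4.0, tau=0.05, seed=0):
    """Minibatch RED-style iteration for the data term (1/n) * sum_i 0.5*(x_i - y_i)^2."""
    rng = random.Random(seed)
    n = len(y)
    x = [0.0] * n
    for _ in range(steps):
        dx = denoise(x)
        chosen = set(rng.sample(range(n), batch))  # measurements used this step
        for i in range(n):
            # Unbiased minibatch estimate of the data-fidelity gradient.
            data_grad = (x[i] - y[i]) / batch if i in chosen else 0.0
            x[i] -= gamma * (data_grad + tau * (x[i] - dx[i]))
    return x

def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

# Toy problem: noisy samples of one period of a sine wave (hypothetical data).
noise_rng = random.Random(1)
n = 32
x_true = [math.sin(2.0 * math.pi * i / n) for i in range(n)]
y = [v + noise_rng.gauss(0.0, 0.1) for v in x_true]

x_hat = on_red_sketch(y)
print(mse(x_hat, x_true) < mse([0.0] * n, x_true))
```

Each step touches only `batch` of the `n` measurements, so the per-iteration cost of the data term no longer grows with the dataset size, which is the motivation the abstract gives for the online variant.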
Yang Chen, Cheng Cheng, Qiyu Sun
The phase retrieval problem in the classical setting is to reconstruct real/complex functions from the magnitudes of their Fourier/frame measurements. In this paper, we consider a new phase retrieval paradigm in the complex/quaternion/vector-valued setting, and we provide several characterizations to determine complex/quaternion/vector-valued functions $f$ in a linear space $S$ of (in)finite dimensions, up to a trivial ambiguity, from the magnitudes $\|\phi(f)\|$ of their linear measurements $\phi(f), \phi\in \Phi$. Our characterization in the scalar setting implies the well-known equivalence between the complement property for linear measurements $\Phi$ and the phase retrieval of linear space $S$. In this paper, we also discuss the affine phase retrieval of vector-valued functions in a linear space and the reconstruction of vector fields on a graph, up to an orthogonal matrix, from their absolute magnitudes at vertices and relative magnitudes between neighboring vertices.
Tianlang Chen, Zhaowen Wang, Ning Xu, Hailin Jin, Jiebo Luo
Font selection is one of the most important steps in a design workflow.
Traditional methods rely on ordered lists which require significant domain
knowledge and are often difficult to use even for trained professionals. In
this paper, we address the problem of large-scale tag-based font retrieval
which aims to bring semantics to the font selection process and enable people
without expert knowledge to use fonts effectively. We collect a large-scale
font tagging dataset of high-quality professional fonts. The dataset contains
nearly 20,000 fonts, 2,000 tags, and hundreds of thousands of font-tag
relations. We propose a novel generative feature learning algorithm that
leverages the unique characteristics of fonts. The key idea is that font images
are synthetic and can therefore be controlled by the learning algorithm. We
design an integrated rendering and learning process so that the visual feature
from one image can be used to reconstruct another image with different text.
The resulting feature captures important font design details while being robust to
nuisance factors such as text. We propose a novel attention mechanism to
re-weight the visual feature for joint visual-text modeling. We combine the
feature and the attention mechanism in a novel recognition-retrieval model.
Experimental results show that our method significantly outperforms the
state-of-the-art for the important problem of large-scale tag-based font
retrieval.
Authors' comments: accepted by ICCV 2019
Tommaso Teofili, Niyati Chhaya
Distributed representations of words have been shown to improve the effectiveness of IR systems in many sub-tasks such as query expansion, retrieval and ranking. Algorithms like word2vec, GloVe and others are also key factors in many improvements in different NLP tasks. One common issue with such embedding models is that words like happy and sad appear in similar contexts and hence are wrongly clustered close together in the embedding space. In this paper we leverage Aff2Vec, a set of word embedding models that include affect information, to better capture the affect aspect of news text and achieve better results in information retrieval tasks; such embeddings are also less affected by the synonym/antonym issue. We evaluate their effectiveness on two IR-related tasks (query expansion and ranking) over the New York Times dataset (TREC-core '17), comparing them against other word-embedding-based models and classic ranking models.
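Embedding-based query expansion, one of the two tasks evaluated here, can be sketched as a nearest-neighbour lookup by cosine similarity. The vectors below are hand-made toy values, not Aff2Vec embeddings; their third coordinate is an assumed "affect" dimension added to show how affect information can separate antonyms that otherwise share contexts. The function names are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def expand_query(terms, embeddings, k=1):
    """Append the k nearest embedding neighbours of each known query term."""
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue
        candidates = [w for w in embeddings if w != term and w not in expanded]
        candidates.sort(key=lambda w: cosine(embeddings[term], embeddings[w]), reverse=True)
        expanded.extend(candidates[:k])
    return expanded

# Toy vectors: the first two coordinates mimic shared context, the third is affect.
toy_embeddings = {
    "happy": [1.0, 0.9, 1.0],
    "joyful": [0.9, 1.0, 0.9],
    "sad": [1.0, 0.9, -1.0],
}

print(expand_query(["happy"], toy_embeddings))
```

Without the third coordinate, happy and sad would have identical toy vectors and sad would be as good an expansion candidate as joyful; with it, the antonym drops well below the synonym in similarity.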