Yuting Chen, Joseph Wang, Yannan Bai, Gregory Castañón, Venkatesh Saligrama
We present a novel framework for finding complex activities matching
user-described queries in cluttered surveillance videos. The wide diversity of
queries coupled with unavailability of annotated activity data limits our
ability to train activity models. To bridge the semantic gap we propose to let
users describe an activity as a semantic graph with object attributes and
inter-object relationships associated with nodes and edges, respectively. We
learn node/edge-level visual predictors during training and, at test-time,
propose to retrieve activity by identifying likely locations that match the
semantic graph. We formulate a novel CRF based probabilistic activity
localization objective that accounts for mis-detections, mis-classifications
and track-losses, and outputs a likelihood score for a candidate grounded
location of the query in the video. We seek groundings that maximize overall
precision and recall. To handle the combinatorial search over all
high-probability groundings, we propose a highest precision subgraph matching
algorithm. Our method outperforms existing retrieval methods on benchmarked
datasets.
Authors' comments: 1520-9210 (c) 2018 IEEE. This paper has been accepted by IEEE
Transactions on Multimedia. Print ISSN: 1520-9210. Online ISSN: 1941-0077.
Preprint link is https://ieeexplore.ieee.org/document/8438958/
Avikalp Srivastava, Madhav Datt
Semantic similarity based retrieval is playing an increasingly important role
in many IR systems such as modern web search, question-answering, similar
document retrieval etc. Improvements in retrieval of semantically similar
content are very significant to applications like Quora, Stack Overflow, Siri
etc. We propose a novel unsupervised model for semantic similarity based
content retrieval, where we construct semantic flow graphs for each query, and
introduce the concept of "soft seeding" in graph based semi-supervised learning
(SSL) to convert this into an unsupervised model.
We demonstrate the effectiveness of our model on an equivalent question
retrieval problem on the Stack Exchange QA dataset, where our unsupervised
approach significantly outperforms the state-of-the-art unsupervised models,
and produces comparable results to the best supervised models. Our research
provides a method to tackle semantic similarity based retrieval without any
training data, and allows seamless extension to different domain QA
communities, as well as to other semantic equivalence tasks.
Authors' comments: Published in Proceedings of the 2017 ACM Conference on Information
and Knowledge Management (CIKM '17)
Denis Fedorenko, Nikita Smetanin, Artem Rodichev
Retrieval-based conversation systems generally tend to highly rank responses that are semantically similar or even identical to the given conversation context. While the system's goal is to find the most appropriate response, rather than the most semantically similar one, this tendency results in low-quality responses. We refer to this challenge as the echoing problem. To mitigate this problem, we utilize a hard negative mining approach at the training stage. The evaluation shows that the resulting model reduces echoing and achieves better results in terms of Average Precision and Recall@N metrics, compared to the models trained without the proposed approach.
Stéphane Larouche, Vesna Radisic
Electromagnetic metamaterials offer a great avenue to engineer and amplify the nonlinear response of materials. Their electric, magnetic, and magneto-electric linear and nonlinear responses are related to their structure, providing unprecedented liberty to control those properties. Both the linear and the nonlinear properties of metamaterials are typically anisotropic. While the methods to retrieve the effective linear properties are well established, existing nonlinear retrieval methods have serious limitations. In the present work, we generalize a nonlinear transfer matrix approach to account for all nonlinear susceptibility terms and show how to use this approach to retrieve all effective nonlinear susceptibilities of metamaterial elements. The approach is demonstrated using sum frequency generation, but can be applied to other second order or higher order processes.
Milad Bakhshizadeh, Arian Maleki, Shirin Jalali
Compressive phase retrieval refers to the problem of recovering a structured
$n$-dimensional complex-valued vector from its phase-less under-determined
linear measurements. The non-linearity of measurements makes designing
theoretically-analyzable efficient phase retrieval algorithms challenging. As a
result, to a great extent, algorithms designed in this area are developed to
take advantage of simple structures such as sparsity and its convex
generalizations. The goal of this paper is to move beyond simple models through
employing compression codes. Such codes are typically developed to take
advantage of complex signal models to represent the signals as efficiently as
possible. In this work, it is shown how an existing compression code can be
treated as a black box and integrated into an efficient solution for phase
retrieval. First, COmpressive PhasE Retrieval (COPER) optimization, a
computationally-intensive compression-based phase retrieval method, is
proposed. COPER provides a theoretical framework for studying compression-based
phase retrieval. The number of measurements required by COPER is connected to
$\kappa$, the $\alpha$-dimension (closely related to the rate-distortion
dimension) of the given family of compression codes. To find the solution of
COPER, an efficient iterative algorithm called gradient descent for COPER
(GD-COPER) is proposed. It is proven that under some mild conditions on the
initialization, if the number of measurements is larger than $ C \kappa^2
\log^2 n$, where $C$ is a constant, GD-COPER obtains an accurate estimate of
the input vector in polynomial time. In the simulation results, JPEG2000 is
integrated in GD-COPER to confirm the superb performance of the resulting
algorithm on real-world images.
Authors' comments: 43 pages
Ragnar Freij-Hollanti, Oliver W. Gnilke, Camilla Hollanti, Anna-Lena Horlemann-Trautmann, David Karpuk, Ivo Kubjas
This paper presents private information retrieval (PIR) schemes for coded storage with colluding servers, which are not restricted to maximum distance separable (MDS) codes. PIR schemes for general linear codes are constructed and the resulting PIR rate is calculated explicitly. It is shown that codes with transitive automorphism groups yield the highest possible rates obtainable with the proposed scheme. This rate coincides with the known asymptotic PIR capacity for MDS-coded storage systems without collusion. While many PIR schemes in the literature require field sizes that grow with the number of servers and files in the system, we focus especially on the case of a binary base field, for which Reed-Muller codes serve as an important and explicit class of examples.
Yj Dong, JG Li
Recently, with the enormous growth of online videos, fast video retrieval research has received increasing attention. As an extension of image hashing techniques, traditional video hashing methods mainly depend on hand-crafted features and transform the real-valued features into binary hash codes. As videos provide far more diverse and complex visual information than images, extracting features from videos is much more challenging than from images. Therefore, high-level semantic features are needed to represent videos, rather than low-level hand-crafted ones. In this paper, a deep convolutional neural network is proposed to extract high-level semantic features, and a binary hash function is then integrated into this framework to achieve end-to-end optimization. In particular, our approach combines a triplet loss function, which preserves the relative similarity and difference of videos, with a classification loss function as the optimization objective. Experiments have been performed on two public datasets, and the results demonstrate the superiority of our proposed method compared with other state-of-the-art video retrieval methods.
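The triplet term mentioned in this abstract can be illustrated with the standard margin-based formulation. This is a minimal sketch, not the authors' implementation: the squared Euclidean distance, the `margin` value, and the function names are assumptions chosen for the example, and the abstract does not state which distance or margin the paper uses.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss (assumed form, not taken from the paper).

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin`; otherwise it grows with the violation.
    """
    violation = (squared_distance(anchor, positive)
                 - squared_distance(anchor, negative)
                 + margin)
    return max(0.0, violation)
```

A triplet whose negative is already far from the anchor contributes no loss, so training effort concentrates on triplets that still violate the desired ordering.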
Xi Zhang, Siyu Zhou, Jiashi Feng, Hanjiang Lai, Bo Li, Yan Pan, Jian Yin, Shuicheng Yan
With the rapid growth of multi-modal data, hashing methods for cross-modal
retrieval have received considerable attention. Deep-networks-based cross-modal
hashing methods are appealing as they can integrate feature learning and hash
coding into end-to-end trainable frameworks. However, it is still challenging
to find content similarities between different modalities of data due to the
heterogeneity gap. To further address this problem, we propose an adversarial
hashing network with attention mechanism to enhance the measurement of content
similarities by selectively focusing on informative parts of multi-modal data.
The proposed new adversarial network, HashGAN, consists of three building
blocks: 1) the feature learning module to obtain feature representations, 2)
the generative attention module to generate an attention mask, which is used to
obtain the attended (foreground) and the unattended (background) feature
representations, 3) the discriminative hash coding module to learn hash
functions that preserve the similarities between different modalities. In our
framework, the generative module and the discriminative module are trained in
an adversarial way: the generator is trained so that the discriminator cannot
preserve the similarities of multi-modal data w.r.t. the background feature
representations, while the discriminator aims to preserve the similarities of
multi-modal data w.r.t. both the foreground and the background feature
representations. Extensive evaluations on several benchmark datasets
demonstrate that the proposed HashGAN brings substantial improvements over
other state-of-the-art cross-modal hashing methods.
Authors' comments: 10 pages, 8 figures, 3 tables
Danping Liao, Yuntao Qian
Academic literature retrieval is concerned with the selection of papers that are most likely to match a user's information needs. Most retrieval systems are limited to list-output models, in which the retrieval results are isolated from each other. In this work, we aim to uncover the relationships among the retrieval results and propose a method for building structured retrieval results for academic literature, which we call a paper evolution graph (PEG). A PEG describes the evolution of the diverse aspects of input queries through several evolution chains of papers. By utilizing the author, citation and content information, PEGs can uncover the various underlying relationships among the papers and present the evolution of articles from multiple viewpoints. Our system supports three types of input queries: keyword, single-paper and two-paper queries. The construction of a PEG mainly consists of three steps. First, the papers are soft-clustered into communities via metagraph factorization, during which the topic distribution of each paper is obtained. Second, topically cohesive evolution chains are extracted from the communities that are relevant to the query. Each chain focuses on one aspect of the query. Finally, the extracted chains are combined to generate a PEG, which fully covers all the topics of the query. The experimental results on a real-world dataset demonstrate that the proposed method is able to construct meaningful PEGs.
Jiafeng Guo, Yixing Fan, Qingyao Ai, W. Bruce Croft
In recent years, deep neural networks have led to exciting breakthroughs in
speech recognition, computer vision, and natural language processing (NLP)
tasks. However, there have been few positive results of deep models on ad-hoc
retrieval tasks. This is partially due to the fact that many important
characteristics of the ad-hoc retrieval task have not been well addressed in
deep models yet. Typically, the ad-hoc retrieval task is formalized as a
matching problem between two pieces of text in existing work using deep models,
and treated as equivalent to many NLP tasks such as paraphrase identification,
question answering and automatic conversation. However, we argue that the
ad-hoc retrieval task is mainly about relevance matching while most NLP
matching tasks concern semantic matching, and there are some fundamental
differences between these two matching tasks. Successful relevance matching
requires proper handling of the exact matching signals, query term importance,
and diverse matching requirements. In this paper, we propose a novel deep
relevance matching model (DRMM) for ad-hoc retrieval. Specifically, our model
employs a joint deep architecture at the query term level for relevance
matching. By using matching histogram mapping, a feed forward matching network,
and a term gating network, we can effectively deal with the three relevance
matching factors mentioned above. Experimental results on two representative
benchmark collections show that our model can significantly outperform some
well-known retrieval models as well as state-of-the-art deep matching models.
Authors' comments: CIKM 2016, long paper
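The matching histogram mapping named in this abstract can be sketched as follows. This is an illustrative assumption of how such a mapping might look, not the authors' code: it assumes that each query term has already been compared with every document term to give similarity scores in [-1, 1], and that exact matches (similarity 1) get their own bin so the exact matching signal is kept separate. The function name and the default of five bins are hypothetical.

```python
def matching_histogram(similarities, num_bins=5):
    """Map query-term/document-term similarities in [-1, 1] to bin counts.

    The last bin is reserved for exact matches (similarity >= 1); the
    remaining num_bins - 1 bins split [-1, 1) into equal-width intervals.
    """
    counts = [0] * num_bins
    width = 2.0 / (num_bins - 1)
    for s in similarities:
        if s >= 1.0:
            counts[-1] += 1  # exact match gets a dedicated bin
        else:
            idx = int((s + 1.0) / width)
            # Clamp so out-of-range inputs still land in a valid bin.
            counts[min(max(idx, 0), num_bins - 2)] += 1
    return counts
```

Because the histogram has a fixed length regardless of document length, it can be fed to a feed-forward network for each query term, which is the role the abstract assigns to the matching network.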
Christophe Van Gysel
Search engines rely heavily on term-based approaches that represent queries
and documents as bags of words. Text---a document or a query---is represented
by a bag of its words that ignores grammar and word order, but retains word
frequency counts. When presented with a search query, the engine then ranks
documents according to their relevance scores by computing, among other things,
the matching degrees between query and document terms. While term-based
approaches are intuitive and effective in practice, they are based on the
hypothesis that documents that exactly contain the query terms are highly
relevant regardless of query semantics. Conversely, term-based approaches assume
that documents that do not contain query terms are irrelevant. However, it is
known that a high matching degree at the term level does not necessarily mean
high relevance and, vice versa, documents that match no query terms may still be
relevant. Consequently, there exists a vocabulary gap between queries and
documents that occurs when both use different words to describe the same
concepts. It is the alleviation of the effect brought forward by this
vocabulary gap that is the topic of this dissertation. More specifically, we
propose (1) methods to formulate an effective query from complex textual
structures and (2) latent vector space models that circumvent the vocabulary
gap in information retrieval.
Authors' comments: PhD thesis
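The bag-of-words representation and the vocabulary gap described in this abstract can be made concrete with a small sketch. The whitespace tokenizer and the raw term-overlap score below are simplifying assumptions for illustration only; they are not the ranking functions studied in the dissertation, and the function names are hypothetical.

```python
from collections import Counter


def bag_of_words(text):
    """Represent text as word counts, ignoring grammar and word order."""
    return Counter(text.lower().split())


def term_overlap_score(query, document):
    """Sum the document frequencies of the distinct query terms (toy scoring)."""
    query_terms = bag_of_words(query)
    doc_terms = bag_of_words(document)
    # Counter returns 0 for missing terms, so unmatched terms add nothing.
    return sum(doc_terms[term] for term in query_terms)
```

Under this scoring, the query "car repair" gives a score of zero to the document "automobile maintenance guide" even though both describe the same concept, which is the vocabulary gap in miniature.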
Wasuwee Sodsong, Bernhard Scholz, Sanjay Chawla
Program analysis is a technique to reason about programs without executing them, and it has various applications in compilers, integrated development environments, and security. In this work, we present a machine learning pipeline that induces a security analyzer for programs by example. The security analyzer determines whether a program is secure or insecure based on symbolic rules that were deduced by our machine learning pipeline. The pipeline has two stages: a Recurrent Neural Network (RNN) and an Extractor that converts the RNN to symbolic rules. To evaluate the quality of the learned symbolic rules, we propose a sampling-based similarity measurement between two infinite regular languages. We conduct a case study using real-world data. In this work, we discuss the limitations of existing techniques and possible improvements in the future. The results show that, with sufficient training data and a fair distribution of program paths, it is feasible to deduce symbolic security rules for the OpenJDK library with millions of lines of code.
Filip Radenović, Giorgos Tolias, Ondřej Chum
Image descriptors based on activations of Convolutional Neural Networks
(CNNs) have become dominant in image retrieval due to their discriminative
power, compactness of representation, and search efficiency. Training of CNNs,
either from scratch or fine-tuning, requires a large amount of annotated data,
where a high quality of annotation is often crucial. In this work, we propose
to fine-tune CNNs for image retrieval on a large collection of unordered images
in a fully automated manner. Reconstructed 3D models obtained by the
state-of-the-art retrieval and structure-from-motion methods guide the
selection of the training data. We show that both hard-positive and
hard-negative examples, selected by exploiting the geometry and the camera
positions available from the 3D models, enhance the performance of
particular-object retrieval. CNN descriptor whitening discriminatively learned
from the same training data outperforms commonly used PCA whitening. We propose
a novel trainable Generalized-Mean (GeM) pooling layer that generalizes max and
average pooling and show that it boosts retrieval performance. Applying the
proposed method to the VGG network achieves state-of-the-art performance on the
standard benchmarks: Oxford Buildings, Paris, and Holidays datasets.
Authors' comments: TPAMI 2018. arXiv admin note: substantial text overlap with
arXiv:1604.02426
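The claim that Generalized-Mean (GeM) pooling generalizes max and average pooling can be checked numerically with the usual generalized-mean formula, (mean of x^p)^(1/p). The sketch below is a plain-Python illustration on a single feature map, not the authors' trainable layer: here `p` is a fixed argument rather than a learned parameter, and the `eps` clamp and default values are assumptions for this example.

```python
def gem_pool(activations, p=3.0, eps=1e-6):
    """Generalized-mean pooling over one feature map.

    p = 1 gives average pooling; as p grows, the result approaches max
    pooling. Activations are clamped to eps so the power is well defined
    for non-positive inputs.
    """
    clamped = [max(a, eps) for a in activations]
    mean_of_powers = sum(a ** p for a in clamped) / len(clamped)
    return mean_of_powers ** (1.0 / p)
```

With activations `[1.0, 2.0, 3.0]`, `p=1` returns the average of 2, while a large exponent such as `p=50` returns a value just under the maximum of 3.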
Björn Barz, Joachim Denzler
Query images presented to content-based image retrieval systems often have
various different interpretations, making it difficult to identify the search
objective pursued by the user. We propose a technique for overcoming this
ambiguity, while keeping the amount of required user interaction at a minimum.
To achieve this, the neighborhood of the query image is divided into coherent
clusters from which the user may choose the relevant ones. A novel feedback
integration technique is then employed to re-rank the entire database with
regard to both the user feedback and the original query. We evaluate our
approach on the publicly available MIRFLICKR-25K dataset, where it leads to a
relative improvement of average precision by 23% over the baseline retrieval,
which does not distinguish between different image senses.
Authors' comments: VISAPP 2018 paper, 8 pages, 5 figures. Source code:
https://github.com/cvjena/aid
Zhuoxiang Chen, Zhe Xu, Ya Zhang, Xiao Gu
Image-based clothing retrieval is receiving increasing interest with the
growth of online shopping. In practice, users may often have a desired piece of
clothing in mind (e.g., either having seen it before on the street or requiring
certain specific clothing attributes) but may be unable to supply an image as a
query. We model this problem as a new type of image retrieval task in which the
target image resides only in the user's mind (called "mental image retrieval"
hereafter). Because of the absence of an explicit query image, we propose to
solve this problem through relevance feedback. Specifically, a new Bayesian
formulation is proposed that simultaneously models the retrieval target and its
high-level representation in the mind of the user (called the "user metric"
hereafter) as posterior distributions of pre-fetched shop images and
heterogeneous features extracted from multiple clothing attributes,
respectively. Requiring only clicks as user feedback, the proposed algorithm is
able to account for the variability in human decision-making. Experiments with
real users demonstrate the effectiveness of the proposed algorithm.
Authors' comments: 12 pages, under review at IEEE Transactions on Multimedia
Max H. Quinn, Erik Conser, Jordan M. Witte, Melanie Mitchell
We describe a novel architecture for semantic image retrieval---in particular, retrieval of instances of visual situations. Visual situations are concepts such as "a boxing match," "walking the dog," "a crowd waiting for a bus," or "a game of ping-pong," whose instantiations in images are linked more by their common spatial and semantic structure than by low-level visual similarity. Given a query situation description, our architecture---called Situate---learns models capturing the visual features of expected objects as well as the expected spatial configuration of relationships among objects. Given a new image, Situate uses these models in an attempt to ground (i.e., to create a bounding box locating) each expected component of the situation in the image via an active search procedure. Situate uses the resulting grounding to compute a score indicating the degree to which the new image is judged to contain an instance of the situation. Such scores can be used to rank images in a collection as part of a retrieval system. In the preliminary study described here, we demonstrate the promise of this system by comparing Situate's performance with that of two baseline methods, as well as with a related semantic image-retrieval system based on "scene graphs."
Vikash Singh
In this paper we introduce the FlashText algorithm for replacing or finding keywords in a given text. FlashText can search or replace keywords in one pass over a document. The time complexity of this algorithm does not depend on the number of terms being searched or replaced. For a document of size N (characters) and a dictionary of M keywords, the time complexity will be O(N). This algorithm is much faster than regex, because regex time complexity is O(MxN). It also differs from the Aho-Corasick algorithm, as it doesn't match substrings. FlashText is designed to match only complete words (words with boundary characters on both sides). For an input dictionary of {Apple}, this algorithm won't match it to 'I like Pineapple'. This algorithm is also designed to go for the longest match first. For an input dictionary {Machine, Learning, Machine learning} and the string 'I like Machine learning', it will only consider the longest match, which is Machine learning. We have made a Python implementation of this algorithm available as open source on GitHub, released under the permissive MIT License.
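The whole-word, longest-match behavior described in this abstract can be sketched with a character trie. This is a minimal illustration, not the released FlashText code: the class name `KeywordMatcher`, its methods, case-insensitive matching, and the choice of letters, digits, and underscore as word characters are all assumptions made for this example.

```python
def _is_word_char(ch):
    """Assumed word characters: letters, digits, underscore."""
    return ch.isalnum() or ch == "_"


class KeywordMatcher:
    """Trie-based whole-word keyword matcher (illustrative sketch only)."""

    _END = "_end_"  # multi-character key, so it never collides with a single char

    def __init__(self):
        self.trie = {}

    def add(self, keyword):
        node = self.trie
        for ch in keyword.lower():
            node = node.setdefault(ch, {})
        node[self._END] = keyword  # store the original spelling at the leaf

    def extract(self, text):
        lowered = text.lower()
        n = len(lowered)
        found = []
        i = 0
        while i < n:
            # Matching may only start at the first character of a word.
            at_word_start = _is_word_char(lowered[i]) and (
                i == 0 or not _is_word_char(lowered[i - 1]))
            if not at_word_start:
                i += 1
                continue
            node, j = self.trie, i
            best, best_end = None, i
            while j < n and lowered[j] in node:
                node = node[lowered[j]]
                j += 1
                # Accept only if the keyword ends at a word boundary;
                # later (longer) acceptances overwrite earlier ones.
                if self._END in node and (j == n or not _is_word_char(lowered[j])):
                    best, best_end = node[self._END], j
            if best is not None:
                found.append(best)
                i = best_end
            else:
                while i < n and _is_word_char(lowered[i]):
                    i += 1  # no keyword starts here; skip the rest of the word
        return found
```

With the keywords {Machine, Learning, Machine learning}, `extract("I like Machine learning")` yields only the longest match, and a matcher holding {Apple} finds nothing in 'I like Pineapple'. Each step is a dictionary lookup, so the per-character work in this sketch does not depend on how many keywords are stored.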
Philipp Mayr, Ingo Frommholz, Guillaume Cabanac
Bibliometric-enhanced Information Retrieval (BIR) workshops serve as the
annual gathering of IR researchers who address various information-related
tasks on scientific corpora and bibliometrics. The workshop features original
approaches to search, browse, and discover value-added knowledge from
scientific documents and related information networks (e.g., terms, authors,
institutions, references). We welcome contributions elaborating on dedicated IR
systems, as well as studies revealing original characteristics on how
scientific knowledge is created, communicated, and used. In this paper we
introduce the BIR workshop series and discuss some selected papers presented at
previous BIR workshops.
Authors' comments: 6 pages, workshop paper accepted at 39th European Conference on IR
Research, ECIR 2017
Sahil Agarwal, John S. Wettlaufer
We extend a data-based model-free multifractal method of exoplanet detection to probe exoplanetary atmospheres. Whereas the transmission spectrum is studied during the primary eclipse, we analyze the emission spectrum during the secondary eclipse, thereby probing the atmospheric limb. In addition to the spectral structure of exoplanet atmospheres, the approach provides information to study phenomena such as atmospheric flows, tidal-locking behavior, and the dayside-nightside redistribution of energy. The approach is demonstrated using Spitzer data for exoplanet HD189733b. The central advantage of the method is the lack of model assumptions in the detection and observational schemes.
Noa Garcia, George Vogiatzis
This work proposes a system for retrieving clothing and fashion products from video content. Although films and television are the perfect showcase for fashion brands to promote their products, spectators are not always aware of where to buy the latest trends they see on screen. Here, a framework for bridging the gap between fashion products shown in videos and users is presented. By relating clothing items and video frames in an indexed database and performing frame retrieval with temporal aggregation and fast indexing techniques, we can find fashion products from videos in a simple and non-intrusive way. Experiments on a large-scale dataset show that, by using the proposed framework, memory requirements can be reduced by 42.5X with respect to linear search, whereas accuracy is maintained at around 90%.