Marvin Teichmann, Andre Araujo, Menglong Zhu, Jack Sim
Retrieving object instances among cluttered scenes efficiently requires
compact yet comprehensive regional image representations. Intuitively, object
semantics can help build the index that focuses on the most relevant regions.
However, due to the lack of bounding-box datasets for objects of interest among
retrieval benchmarks, most recent work on regional representations has focused
on either uniform or class-agnostic region selection. In this paper, we first
fill the void by providing a new dataset of landmark bounding boxes, based on
the Google Landmarks dataset, that includes $86k$ images with manually curated
boxes from $15k$ unique landmarks. Then, we demonstrate how a trained landmark
detector, using our new dataset, can be leveraged to index image regions and
improve retrieval accuracy while being much more efficient than existing
regional methods. In addition, we introduce a novel regional aggregated
selective match kernel (R-ASMK) to effectively combine information from
detected regions into an improved holistic image representation. R-ASMK boosts
image retrieval accuracy substantially with no dimensionality increase, while
even outperforming systems that index image regions independently. Our complete
image retrieval system improves upon the previous state-of-the-art by
significant margins on the Revisited Oxford and Paris datasets. Code and data
available at the project webpage:
https://github.com/tensorflow/models/tree/master/research/delf.
Authors' comments: CVPR 2019. Code and dataset available:
https://github.com/tensorflow/models/tree/master/research/delf
Teng Zhang
We consider a phase retrieval problem, where the goal is to reconstruct an $n$-dimensional complex vector from its phaseless scalar products with $m$ sensing vectors, independently sampled from complex normal distributions. We show that, with a random initialization, the classical algorithm of alternating minimization succeeds with high probability as $n,m\rightarrow\infty$ when ${m}/{\log^3m}\geq Mn^{3/2}\log^{1/2}n$ for some $M>0$. This is a step toward proving the conjecture in \cite{Waldspurger2016}, which states that the algorithm succeeds when $m=O(n)$. The analysis depends on an approach that enables the decoupling of the dependency between the algorithmic iterates and the sensing vectors.
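As an illustration of the algorithm discussed in this abstract, the following is a minimal sketch of alternating minimization for phase retrieval with a random starting point. It assumes NumPy is available; the function name `alternating_minimization`, the iteration count, and the seed are choices made for this example and are not taken from the paper.

```python
import numpy as np


def alternating_minimization(A, b, num_iters=200, seed=0):
    """Estimate x from b = |A x| by alternating phase and least-squares steps.

    A: (m, n) complex sensing matrix, b: (m,) non-negative magnitudes.
    Returns the estimate and the residual || |A x| - b || after each iteration.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Random initialization (illustrative choice of distribution).
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    A_pinv = np.linalg.pinv(A)  # least-squares solution operator, computed once
    residuals = []
    for _ in range(num_iters):
        # Step 1: with x fixed, choose the phases that best match A x.
        phases = np.exp(1j * np.angle(A @ x))
        # Step 2: with the phases fixed, solve the least-squares problem for x.
        x = A_pinv @ (phases * b)
        residuals.append(float(np.linalg.norm(np.abs(A @ x) - b)))
    return x, residuals
```

Each of the two steps can only decrease the objective $\|Ax - u \circ b\|$ (with $u$ the vector of unit-modulus phases), so the recorded residuals are non-increasing. Whether the iterates reach the true vector, up to a global phase, depends on the number of measurements, which is the question the abstract addresses.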
Tatsunori B. Hashimoto, Kelvin Guu, Yonatan Oren, Percy Liang
For the task of generating complex outputs such as source code, editing
existing outputs can be easier than generating complex outputs from scratch.
With this motivation, we propose an approach that first retrieves a training
example based on the input (e.g., natural language description) and then edits
it to the desired output (e.g., code). Our contribution is a computationally
efficient method for learning a retrieval model that embeds the input in a
task-dependent way without relying on a hand-crafted metric or incurring the
expense of jointly training the retriever with the editor. Our
retrieve-and-edit framework can be applied on top of any base model. We show
that on a new autocomplete task for GitHub Python code and the Hearthstone
cards benchmark, retrieve-and-edit significantly boosts the performance of a
vanilla sequence-to-sequence model on both tasks.
Authors' comments: To appear, NeurIPS 2018
Qiuchi Li, Yijun Yu, Dawei Song, Bashar Nuseibeh
During software maintenance and evolution, developers need to deal with a
large number of change requests by modifying existing code or adding code into
the system. Handling change requests efficiently calls for accurate
localisation of software changes, i.e. identifying which code is problematic and
where new files should be added for any type of change request at hand, such as
a bug report or a feature request. Existing automatic techniques for this
change localisation problem are limited in two aspects: on the one hand, they
can only tackle a specific type of change request; on the other
hand, they are focused on finding files that should be modified for a change
request, yet barely capable of recommending what files or packages might be
newly created. To address these limitations, we propose a
generalised change localisation approach to identify the to-be-modified files
(mostly for bugs), and at the same time point out where new files or packages
should be created (mostly for new feature requests) for an arbitrary type of
change request. In order to tackle the key challenge of predicting
to-be-created program elements, our proposed SeekChanges approach leverages the
hierarchical package structure for Java projects, and models the change
localisation problem as a structured information retrieval (IR) task. A
systematic investigation of three structured IR strategies is carried out for
scoring and ranking both the files that should be modified and the software
packages in which the new files should be created to address change requests.
Extensive experiments on four open source Java projects from the Apache
Software Foundation demonstrate that structured IR strategies have a good
performance on recommending newly created files, while the overall performance
of localising change requests is equally satisfactory.
Authors' comments: This paper has been withdrawn. This work is not mature enough
Nikit Begwani, Shrutendra Harsola, Rahul Agrawal
Retrieval models such as CLSM are trained on click-through data which treats
each clicked query-document pair as equivalent. While training on click-through
data is reasonable, this paper argues that it is sub-optimal because of its
noisy and long-tail nature (especially for sponsored search). In this paper, we
discuss the impact of incorporating or disregarding the long tail pairs in the
training set. Also, we propose a weighting-based strategy with which we can
learn semantic representations for tail pairs without compromising the quality
of retrieval. We conducted our experiments on Bing sponsored search and also on
Amazon product recommendation to demonstrate that the methodology is domain
agnostic.
Online A/B testing on live search engine traffic showed improvements in
clicks (11.8\% higher CTR) as well as an improvement in quality (8.2\% lower
bounce rate) when compared to the unweighted model. We also conduct the
experiment on Amazon Product Recommendation data where we see slight
improvements in NDCG scores calculated by retrieving among co-purchased
products.
Authors' comments: 7 pages, 5 figures, DAPA, WSDM Workshop
Stefano Marchesin
The goal of case-based retrieval is to assist physicians in the clinical decision making process, by finding relevant medical literature in large archives. We propose research that aims at improving the effectiveness of case-based retrieval systems through the use of automatically created document-level semantic networks. The proposed research tackles different aspects of information systems and leverages the recent advancements in information extraction and relational learning to revisit and advance the core ideas of concept-centered hypertext models. We propose a two-step methodology that first addresses the automatic creation of document-level semantic networks and then designs methods that exploit such document representations to retrieve relevant cases from medical literature. For the automatic creation of documents' semantic networks, we design a combination of information extraction techniques and relational learning models. Mining concepts and relations from text, information extraction techniques represent the core of the document-level semantic networks' building process. On the other hand, relational learning models have the task of enriching the graph with additional connections that have not been detected by information extraction algorithms and strengthening the confidence score of extracted relations. For the retrieval of relevant medical literature, we investigate methods that are capable of comparing the documents' semantic networks in terms of structure and semantics. The automatic extraction of semantic relations from documents, and their centrality in the creation of the documents' semantic networks, represent our attempt to go one step further than previous graph-based approaches.
Tianwei Shen, Zixin Luo, Lei Zhou, Runze Zhang, Siyu Zhu, Tian Fang, Long Quan
Convolutional Neural Networks (CNNs) have achieved superior performance on
object image retrieval, while Bag-of-Words (BoW) models with handcrafted local
features still dominate the retrieval of overlapping images in 3D
reconstruction. In this paper, we narrow down this gap by presenting an
efficient CNN-based method to retrieve images with overlaps, which we refer to
as the matchable image retrieval problem. Different from previous methods that
generate training data based on sparse reconstruction, we create a large-scale
image database with rich 3D geometry and exploit information from surface
reconstruction to obtain fine-grained training data. We propose a batched
triplet-based loss function combined with mesh re-projection to effectively
learn the CNN representation. The proposed method significantly accelerates the
image retrieval process in 3D reconstruction and outperforms the
state-of-the-art CNN-based and BoW methods for matchable image retrieval. The
code and data are available at https://github.com/hlzz/mirror.
Authors' comments: accepted by ACCV 2018
Kateřina Jiráková, Karol Bartkiewicz, Antonín Černoch, Karel Lemr
The concept of quantum money (QM) was proposed by Wiesner in the 1970s. Its
main advantage is that every attempt to copy QM unavoidably leads to imperfect
counterfeits. In Wiesner's protocol, quantum banknotes need to be delivered
to the issuing bank for verification. Thus, QM requires quantum communication
whose range is limited by noise and losses. Recently, Bozzio et al. (2018) have
demonstrated experimentally how to replace challenging quantum verification
with a classical channel and a quantum retrieval game (QRG). This brings QM
significantly closer to practical realisation, but a thorough analysis of
the revised QM scheme is still required before it can be considered secure. We
address this problem by presenting a proof-of-concept attack on QRG-based QM
schemes, where we show that even imperfect quantum cloning can, under some
circumstances, provide enough information to break a QRG-based QM scheme.
Authors' comments: 6 pages, 7 figures
Suwon Shon, Younggun Lee, Taesu Kim
This paper describes a fast speaker search system to retrieve segments of the
same voice identity in large-scale data. A recent study shows that Locality
Sensitive Hashing (LSH) enables quick retrieval of a relevant voice in
large-scale data in conjunction with i-vectors while maintaining accuracy. In
this paper, we propose Random Speaker-variability Subspace (RSS) projection to
map data into LSH-based hash tables. We hypothesize that, rather than
projecting onto a completely random subspace without considering the data,
projecting onto a randomly generated speaker-variability space gives a better
chance of putting the same speaker representation into the same hash bins, so
that fewer hash tables are needed. Multiple RSSs can be generated by randomly
selecting a subset of speakers from a large speaker cohort. In our experiments,
the proposed approach is 100 times and 7 times faster than linear search
and LSH, respectively.
Authors' comments: Interspeech 2019
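To make the LSH baseline mentioned in this abstract concrete, here is a minimal sketch of random-hyperplane hashing into a single hash table. This is the generic data-independent variant, not the RSS projection itself; under the abstract's proposal the projection directions would instead come from a speaker-variability subspace estimated on a random subset of speakers. The function names and parameters are hypothetical.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian directions in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_key(vector, hyperplanes):
    """Map a vector to a bit string: one bit per hyperplane, set by the sign."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


def build_table(vectors, hyperplanes):
    """Group vector indices by hash key so a query only scans one bucket."""
    table = {}
    for idx, vec in enumerate(vectors):
        table.setdefault(hash_key(vec, hyperplanes), []).append(idx)
    return table
```

A query is hashed with the same hyperplanes and compared only against the indices stored in its bucket. In practice several tables with independent projections are used to raise recall, and the abstract's hypothesis is that data-aware projections reduce how many tables are needed.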
Rana Jafari, Rick Trebino
Frequency-resolved optical gating (FROG) is widely used to measure ultrashort laser pulses, also providing an excellent indication of pulse-shape instabilities by disagreement between measured and retrieved FROG traces. FROG, however, requires -- but currently lacks -- an extremely reliable pulse-retrieval algorithm. So, this work provides one. It uses a simple procedure for directly retrieving the precise pulse spectrum from the measured trace. Additionally, it implements a multi-grid scheme, also quickly yielding a vastly improved guess for the spectral phase before implementing the entire measured trace. As a result, it achieves 100% convergence for the three most common variants of FROG for pulses with time-bandwidth products as high as 100, even with traces contaminated with noise. Here we consider the polarization-gate (PG) and transient-grating (TG) variants of FROG, which measure amplified, UV, and broadly tunable pulses. Convergence occurs for all of the >20,000 simulated noisy PG/TG FROG traces considered and is also faster.
Zhaoqun Li, Cheng Xu, Biao Leng
Obtaining a desirable representation of a 3D shape, one that is
discriminative across categories and compact within classes, is a
significant challenge in 3D shape retrieval. Most existing 3D shape retrieval
methods focus on capturing strong discriminative shape representation with
softmax loss for the classification task, while the shape feature learning with
metric loss is neglected for 3D shape retrieval. In this paper, we address this
problem based on the intuition that the cosine distance of shape embeddings
should be close enough within the same class and far away across categories.
Since most 3D shape retrieval tasks use the cosine distance of shape features
for measuring shape similarity, we propose a novel metric loss named angular
triplet-center loss, which directly optimizes the cosine distances between the
features. It inherits the triplet-center loss property to achieve larger
inter-class distance and smaller intra-class distance simultaneously. Unlike
previous metric loss utilized in 3D shape retrieval methods, where Euclidean
distance is adopted and the margin design is difficult, the proposed method is
more convenient to train feature embeddings and more suitable for 3D shape
retrieval. Moreover, the angle margin is adopted to replace the cosine margin
in order to provide more explicit discriminative constraints on an embedding
space. Extensive experimental results on two popular 3D object retrieval
benchmarks, ModelNet40 and ShapeNetCore 55, demonstrate the effectiveness of
our proposed loss, and our method has achieved state-of-the-art results on
various 3D shape datasets.
Authors' comments: Accepted by AAAI 2019
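The idea of an angular triplet-center loss described in this abstract can be sketched as follows. This is one plausible per-sample formulation, assumed for illustration: the angle to the sample's own class center, plus an angular margin, should not exceed the angle to the nearest other center. The exact form used in the paper may differ, and the function names and margin value are hypothetical.

```python
import math


def angle(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # guard rounding
    return math.acos(cosine)


def angular_triplet_center_loss(feature, label, centers, margin):
    """Hinge on angles: own-center angle + margin vs. nearest other center."""
    positive = angle(feature, centers[label])
    negative = min(
        angle(feature, center)
        for index, center in enumerate(centers)
        if index != label
    )
    return max(0.0, positive + margin - negative)
```

Because the loss depends only on angles, it is unaffected by the norm of the feature, which matches the abstract's point that retrieval compares features by cosine distance.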
Sean MacAvaney, Andrew Yates, Arman Cohan, Luca Soldaini, Kai Hui, Nazli Goharian, Ophir Frieder
Many questions cannot be answered simply; their answers must include numerous
nuanced details and additional context. Complex Answer Retrieval (CAR) is the
retrieval of answers to such questions. In their simplest form, these questions
are constructed from a topic entity (e.g., `cheese') and a facet (e.g., `health
effects'). While topic matching has been thoroughly explored, we observe that
some facets use general language that is unlikely to appear verbatim in
answers. We call these low-utility facets. In this work, we present an approach
to CAR that identifies and addresses low-utility facets. We propose two
estimators of facet utility. These include exploiting the hierarchical
structure of CAR queries and using facet frequency information from training
data. To improve the retrieval performance on low-utility headings, we also
include entity similarity scores using knowledge graph embeddings. We apply our
approaches to a leading neural ranking technique, and evaluate using the TREC
CAR dataset. We find that our approach performs significantly better than the
unmodified neural ranker and other leading CAR techniques. We also provide a
detailed analysis of our results, and verify that low-utility facets are indeed
more difficult to match, and that our approach improves the performance for
these difficult queries.
Authors' comments: This is a pre-print of an article published in Information Retrieval
Journal. The final authenticated version (including additional experimental
results, analysis, etc.) is available online at:
https://doi.org/10.1007/s10791-018-9343-0
Xin Zhang, Qian Wang, Toby Breckon, Ioannis Ivrissimtzis
We present a method for reading digital data embedded in planar 3D printed surfaces. The data are organised in binary arrays and embedded as surface textures in a way inspired by QR codes. At the core of the retrieval method lies a Convolutional Neural Network, outputting a confidence map of the location of the surface textures encoding value 1 bits. Subsequently, the bit array is retrieved through a series of simple image processing and statistical operations applied on the confidence map. Extensive experimentation with images captured from various camera views, under various illumination conditions and from objects printed with various material colours, shows that the proposed method generalizes well and achieves the level of accuracy required in practical applications.
Jian Xu, Chunheng Wang, Cunzhao Shi, Baihua Xiao
In recent years, compact representations based on activations of
Convolutional Neural Networks (CNNs) have achieved remarkable performance in image
retrieval. However, retrieving an object of interest that occupies only a
small part of the whole image is still a challenging problem. Therefore, it is
important to extract discriminative representations that contain regional
information of the pivotal small object. In this paper, we propose a novel
adversarial soft-detection-based aggregation (ASDA) method free from bounding
box annotations for image retrieval, based on an adversarial detector and a soft
region proposal layer. Our trainable adversarial detector generates semantic
maps based on an adversarial erasing strategy to preserve more discriminative and
detailed information. Computed from semantic maps corresponding to various
discriminative patterns and semantic contents, our soft region proposal has an
arbitrary shape rather than only a rectangle, and it reflects the significance of
objects. The aggregation based on the trainable soft region proposal highlights
discriminative semantic contents and suppresses background noise.
We conduct comprehensive experiments on standard image retrieval datasets.
Our weakly supervised ASDA method achieves state-of-the-art performance on most
datasets. The results demonstrate that the proposed ASDA method is effective
for image retrieval.
Authors' comments: 10 pages, 6 figures
Dan Edidin
We consider the geometry associated to the ambiguities of the one-dimensional
Fourier phase retrieval problem for vectors in ${\mathbb C}^{N+1}$. Our first
result states that the space of signals has a finite covering (which we call
the root covering) where any two signals in the covering space with the same
Fourier intensity function differ by a trivial covering ambiguity. Next we use
the root covering to study how the non-trivial ambiguities of a signal vary as
the signal varies. This is done by describing the incidence variety of pairs
of signals with the same Fourier intensity function modulo global phase. As an
application we give a criterion for a real subvariety of the space of signals
to admit generic phase retrieval. The extension of this result to multi-vectors
played an important role in the author's work with Bendory and Eldar on blind
phaseless short-time Fourier transform recovery.
Authors' comments: To appear in SIAM Journal of Algebra and Geometry
Sagar Uprety, Dimitris Gkoumas, Dawei Song
Relevance judgment in Information Retrieval is influenced by multiple
factors. These include not only the topicality of the documents but also other
user-oriented factors like trust, user interest, etc. Recent works have
grouped these various factors into seven dimensions of relevance. In a
previous work, these relevance dimensions were quantified and the user's cognitive
state with respect to a document was represented as a state vector in a Hilbert
space, with each relevance dimension representing a basis. It was observed that
relevance dimensions are incompatible for some documents when making a
judgment. Incompatibility being a fundamental feature of Quantum Theory, this
motivated us to test the Quantum nature of relevance judgments using Bell type
inequalities. However, none of the Bell-type inequalities tested have shown any
violation. We discuss our methodology to construct incompatible bases for
documents from real world query log data, the experiments to test Bell
inequalities on this dataset and possible reasons for the lack of violation.
Authors' comments: 11th Quantum Interaction Conference, Nice, France
Xing Wei, Carsten Eickhoff
Neural network representation learning frameworks have recently been shown to be
highly effective at a wide range of tasks ranging from radiography
interpretation via data-driven diagnostics to clinical decision support. This
often superior performance comes at the price of dramatically increased
training data requirements that cannot be satisfied in every given institution
or scenario. As a means of countering such data sparsity effects, distant
supervision alleviates the need for scarce in-domain data by relying on a
related, resource-rich, task for training.
This study presents an end-to-end neural clinical decision support system
that recommends relevant literature for individual patients (few available
resources) via distant supervision on the well-known MIMIC-III collection
(abundant resource). Our experiments show significant improvements in retrieval
effectiveness over traditional statistical as well as purely locally supervised
retrieval models.
Authors' comments: Published in AMIA Annual Symposium 2018
Dae Hoon Park, Yi Chang
Ad-hoc retrieval models with implicit feedback often have problems, e.g., the
imbalanced classes in the data set. Too few clicked documents may hurt
generalization ability of the models, whereas too many non-clicked documents
may harm effectiveness of the models and efficiency of training. In addition,
recent neural network-based models are vulnerable to adversarial examples due
to their linear nature. To solve these problems at the same time, we
propose an adversarial sampling and training framework to learn ad-hoc
retrieval models with implicit feedback. Our key idea is (i) to augment clicked
examples by adversarial training for better generalization and (ii) to obtain
highly informative non-clicked examples by adversarial sampling and training.
Experiments are performed on benchmark data sets for common ad-hoc retrieval
tasks such as Web search, item recommendation, and question answering.
Experimental results indicate that the proposed approaches significantly
outperform strong baselines especially for high-ranked documents, and they
outperform IRGAN in NDCG@5 using only 5% of labeled data for the Web search
task.
Authors' comments: Published in WWW 2019
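For readers unfamiliar with the NDCG@5 metric reported in this abstract, the following is a minimal sketch of NDCG@k. It assumes the linear-gain variant of DCG (gain equal to the relevance grade). Some evaluations use an exponential gain instead, and the abstract does not say which variant was used.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results, in ranked order."""
    return sum(
        rel / math.log2(rank + 2)  # rank is 0-based, so the top result has discount 1
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Here `relevances` lists the graded relevance of each returned document in the order the system ranked them. A perfectly ordered list scores 1.0, and placing relevant documents lower reduces the score.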
Roshanak Zakizadeh, Yu Qian, Michele Sasdelli, Eduard Vazquez
In this paper, we present a method for instance ranking and retrieval at a fine-grained level based on the global features extracted from a multi-attribute recognition model that does not depend on landmark information or part-based annotations. Further, we make this architecture suitable for mobile-device applications by adopting the bilinear CNN to make the multi-attribute recognition model smaller (in terms of the number of parameters). The experiments run on the Dress category of DeepFashion In-Shop Clothes Retrieval and CUB200 datasets show that the results of instance retrieval at a fine-grained level are promising for these datasets, especially in terms of texture and color.
Aniket Bhatnagar, Sanchit Aggarwal
The ability to correctly classify and retrieve apparel images has a variety
of applications important to e-commerce, online advertising and internet
search. In this work, we propose a robust framework for fine-grained apparel
classification and in-shop and cross-domain retrieval, which eliminates the
requirement for rich annotations like bounding boxes, human joints, or
clothing landmarks, and for training bounding-box or key-landmark detectors for the
same. Factors such as subtle appearance differences, variations in human poses,
different shooting angles, apparel deformations, and self-occlusion add to the
challenges in classification and retrieval of apparel items. Cross-domain
retrieval is even harder due to the large variation between online
shopping images, usually taken with ideal lighting, pose, positive angle and
clean background, and street photos captured by users in
complicated conditions with poor lighting and cluttered scenes. Our framework
uses a compact bilinear CNN with the tensor sketch algorithm to generate embeddings
that capture local pairwise feature interactions in a translationally invariant
manner. For apparel classification, we pass the feature embeddings through a
softmax classifier, while the in-shop and cross-domain retrieval pipelines use
a triplet-loss based optimization approach, such that the squared Euclidean
distance between embeddings measures the dissimilarity between the images.
Unlike previous works that relied on bounding box, key clothing landmarks or
human joint detectors to assist the final deep classifier, the proposed framework
can be trained directly on the provided category labels or generated triplets
for triplet loss optimization. Lastly, experimental results on the DeepFashion
fine-grained categorization, and in-shop and consumer-to-shop retrieval
datasets provide a comparative analysis with previous work performed in the
domain.
Authors' comments: 14 pages, 6 figures, 3 tables, Submitted to Springer Journal of
Applied Intelligence
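The triplet-loss objective mentioned in this abstract, with squared Euclidean distance as the dissimilarity between embeddings, can be sketched as below. This is the standard hinge form for a single triplet. The margin value and function names are assumptions for illustration and are not taken from the paper.

```python
def squared_euclidean(u, v):
    """Squared Euclidean distance between two embeddings of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Penalize triplets where the positive is not closer than the negative by margin."""
    gap = squared_euclidean(anchor, positive) - squared_euclidean(anchor, negative)
    return max(0.0, gap + margin)
```

The loss is zero once the matching item is closer to the anchor than the non-matching item by at least the margin, so training concentrates on triplets that still violate this ordering.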