Changjoo Nam, Jinhwi Lee, Sang Hun Cheong, Brian Y. Cho, ChangHwan Kim
This paper presents a task and motion planning (TAMP) framework for a robotic
manipulator in order to retrieve a target object from clutter. We consider a
configuration of densely packed objects in a confined space such that no
collision-free path to the target exists. The robot must relocate some objects
to retrieve the target without collisions. For fast completion of object
rearrangement, the robot aims to minimize the number of pick-and-place actions,
which often determines the efficiency of a TAMP framework.
We propose a task planner incorporating motion planning to generate
executable plans which aims to minimize the number of pick-and-place actions.
In addition to fully known and static environments, our method can deal with
uncertain and dynamic situations incurred by occluded views. Our method is
shown to reduce the number of pick-and-place actions compared to baseline
methods (e.g., a reduction of at least 28.0% in a known static environment with
20 objects).
Authors' comments: 2020 IEEE International Conference on Robotics and Automation (ICRA).
arXiv admin note: text overlap with arXiv:1907.03956
Du Su, Ali Yekkehkhany, Yi Lu, Wenmiao Lu
We propose a new application of embedding techniques for problem retrieval in adaptive tutoring. The objective is to retrieve problems whose mathematical concepts are similar. There are two challenges. First, like sentences, problems helpful to tutoring are never exactly the same in terms of the underlying concepts; instead, good problems mix concepts in innovative ways while still displaying continuity in their relationships. Second, it is difficult for humans to determine a similarity score that is consistent across a large enough training set. We propose a hierarchical problem embedding algorithm, called Prob2Vec, that consists of abstraction and embedding steps. Prob2Vec achieves 96.88% accuracy on a problem similarity test, in contrast to 75% from directly applying state-of-the-art sentence embedding methods. Interestingly, Prob2Vec is able to distinguish very fine-grained differences among problems, an ability humans need time and effort to acquire. In addition, the sub-problem of concept labeling with an imbalanced training data set is interesting in its own right. It is a multi-label problem suffering from dimensionality explosion, which we propose ways to ameliorate. We propose a novel negative pre-training algorithm that dramatically reduces false-negative and false-positive ratios for classification, using an imbalanced training data set.
Advait Madhavan, Mark D. Stiles
We extend the reach of temporal computing schemes by developing a memory for
multi-channel temporal patterns or "wavefronts." This temporal memory
re-purposes conventional one-transistor-one-resistor (1T1R) memristor crossbars
for use in an arrival-time coded, single-event-per-wire temporal computing
environment. The memristor resistances and the associated circuit capacitances
provide the necessary time constants, enabling the memory array to store and
retrieve wavefronts. The retrieval operation of such a memory is naturally in
the temporal domain and the resulting wavefronts can be used to trigger
time-domain computations. While recording the wavefronts can be done using
standard digital techniques, that approach has substantial translation costs
between temporal and digital domains. To avoid these costs, we propose a
spike-timing-dependent plasticity (STDP) inspired wavefront recording scheme to
capture incoming wavefronts. We simulate these designs with experimentally
validated memristor models and analyze the effects of memristor non-idealities
on the operation of such a memory.
Authors' comments: 5 pages, 4 figures
Giovanna Castellano, Eufemia Lella, Gennaro Vessio
Visual arts are of inestimable importance for the cultural, historic and
economic growth of our society. One of the building blocks of most analyses in
visual arts is finding similarity relationships among paintings of different
artists and painting schools. To help art historians better understand visual
arts, this paper presents a framework for visual link retrieval and knowledge
discovery in digital painting datasets. Visual link retrieval is accomplished
by using a deep convolutional neural network to perform feature extraction and
a fully unsupervised nearest neighbor mechanism to retrieve links among
digitized paintings. Historical knowledge discovery is achieved by performing a
graph analysis that makes it possible to study influences among artists. An
experimental evaluation on a database collecting paintings by very popular
artists shows the effectiveness of the method. The unsupervised strategy makes
the method particularly attractive in cases where metadata are scarce,
unavailable or difficult to collect.
Authors' comments: Published in Multimedia Tools and Applications. Modified references.
Corrected typos. Added observations in response to reviewers.
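To illustrate the retrieval step, here is a minimal sketch of unsupervised nearest-neighbor link retrieval. It assumes each painting has already been mapped to a feature vector by a pretrained CNN (the toy vectors in the usage example stand in for those features); cosine similarity and the helper name `retrieve_links` are illustrative choices, not details taken from the paper.

```python
import numpy as np

def retrieve_links(features: np.ndarray, k: int = 2) -> list[list[int]]:
    """Return, for each painting, the indices of its k most similar paintings.

    `features` is an (n_paintings, dim) array, assumed to hold CNN descriptors.
    Similarity is cosine similarity; a painting is never linked to itself.
    """
    # L2-normalize rows so that a dot product equals cosine similarity
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-matches
    # sort each row by descending similarity and keep the top-k neighbors
    order = np.argsort(-sim, axis=1)
    return [row[:k].tolist() for row in order]

# Usage: two pairs of near-duplicate toy "paintings"
links = retrieve_links(np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]), k=1)
```

Each returned index list can be read as the outgoing edges of one painting in a link graph, which is the kind of structure a subsequent graph analysis could operate on.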
Nathanial Wilson, Christopher Perrella, Russell Anderson, André Luiten, Philip Light
We develop and demonstrate a new protocol that allows sensing of magnetic fields in an extraordinary regime for atomic magnetometry. Until now, the demonstrated bandwidth for atomic magnetometry has been limited to frequencies below the natural precession frequency of atomic spins in a magnetic field, known as the Larmor frequency. We demonstrate a new approach that tracks the instantaneous phase of atomic spins to measure arbitrarily modulated magnetic fields with frequencies up to fifty times higher than the Larmor frequency. By accessing this regime, we demonstrate magnetic-field measurements across four decades in frequency up to 400 kHz, a range over three orders of magnitude wider than that of conventional atomic magnetometers. Furthermore, we demonstrate that our protocol can linearly detect transient fields 100-fold higher in amplitude than conventional methods. We highlight the bandwidth and dynamic range of the technique by measuring a magnetic field with a broad and dynamical spectrum.
Naveed Naimipour, Shahin Khobahi, Mojtaba Soltanalian
The problem of phase retrieval has intrigued researchers for decades due to its appearance in a wide range of applications. The task of a phase retrieval algorithm is typically to recover a signal from the magnitudes of its linear measurements, with the phase information lost. In this paper, we approach the problem by proposing a hybrid model-based data-driven deep architecture, referred to as Unfolded Phase Retrieval (UPR), that shows potential for improving the performance of state-of-the-art phase retrieval algorithms. Specifically, the proposed method benefits from the versatility and interpretability of well-established model-based algorithms, while simultaneously benefiting from the expressive power of deep neural networks. Our numerical results illustrate the effectiveness of such hybrid deep architectures and showcase the untapped potential of data-aided methodologies to enhance existing phase retrieval algorithms.
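For context, the sketch below shows a generic real-valued phase retrieval setup of the kind that model-based unfolding typically starts from: intensity measurements y_i = (a_i^T x)^2, a least-squares loss on those intensities, and a fixed number of gradient steps treated as "layers". This is an assumption-laden illustration, not UPR itself; in an unfolded network the per-layer step sizes (and possibly other parameters) would be learned from data, whereas here they are passed in by hand, and all helper names are hypothetical.

```python
import numpy as np

def measure(A, x):
    """Phaseless measurements y_i = |<a_i, x>|^2 of a real signal x."""
    return (A @ x) ** 2

def loss(A, y, z):
    """Intensity least-squares loss f(z) = (1/4m) * sum_i (<a_i, z>^2 - y_i)^2."""
    r = (A @ z) ** 2 - y
    return np.sum(r ** 2) / (4 * len(y))

def grad(A, y, z):
    """Gradient of `loss`: (1/m) * sum_i (<a_i, z>^2 - y_i) * <a_i, z> * a_i."""
    Az = A @ z
    return A.T @ ((Az ** 2 - y) * Az) / len(y)

def unfolded_descent(A, y, z0, step_sizes):
    """Apply one gradient step per 'layer'. In an unfolded network the
    per-layer step sizes would be trained; here they are simply given."""
    z = z0.copy()
    for mu in step_sizes:
        z = z - mu * grad(A, y, z)
    return z
```

Note that measure(A, x) and measure(A, -x) coincide, so any such method can recover x only up to a global sign.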
Fuxiang Huang, Lei Zhang, Yang Yang, Xichuan Zhou
Domain adaptive image retrieval includes single-domain retrieval and
cross-domain retrieval. Most existing image retrieval methods focus only
on single-domain retrieval, which assumes that the distributions of retrieval
databases and queries are similar. However, in practical applications, the
discrepancy between retrieval databases, often captured under ideal
illumination/pose/background/camera conditions, and queries, usually obtained
under uncontrolled conditions, is very large. In this paper, motivated by
practical applications, we focus on the challenging cross-domain retrieval problem. To
address the problem, we propose an effective method named Probability Weighted
Compact Feature Learning (PWCF), which provides inter-domain correlation
guidance to promote cross-domain retrieval accuracy and learns a series of
compact binary codes to improve the retrieval speed. First, we derive our loss
functions through Maximum A Posteriori (MAP) estimation from a Bayesian
Perspective (BP): a BP-induced focal-triplet loss, a BP-induced quantization
loss and a BP-induced classification loss. Second, we propose a common manifold
structure between domains to explore the potential correlation across domains.
Because the original feature representation is biased by the inter-domain
discrepancy, the manifold structure is difficult to construct. Therefore, we
propose a new feature named Histogram Feature of
Neighbors (HFON) from the sample statistics perspective. Extensive experiments
on various benchmark databases validate that our method outperforms many
state-of-the-art image retrieval methods for domain adaptive image retrieval.
The source code is available at https://github.com/fuxianghuang1/PWCF
Authors' comments: Accepted by CVPR 2020; The source code is available at
https://github.com/fuxianghuang1/PWCF
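The speed benefit of compact binary codes comes from comparing items by Hamming distance, which only counts differing bits. A minimal sketch, assuming codes are obtained by simply thresholding real-valued features at zero (PWCF learns its codes through its loss functions; the sign rule and helper names here are only stand-ins):

```python
import numpy as np

def to_binary_codes(features):
    """Binarize real-valued features by sign: bit = 1 if value > 0 else 0."""
    return (features > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Return database indices sorted by Hamming distance to the query,
    together with the sorted distances (ties keep database order)."""
    dist = np.count_nonzero(db_codes != query_code, axis=1)
    order = np.argsort(dist, kind="stable")
    return order, dist[order]

# Usage: rank a tiny database against one query
db = to_binary_codes(np.array([[0.5, -1.0, 2.0, -0.3],
                               [-0.5, 1.0, -2.0, 0.3],
                               [0.4, -0.2, 1.0, 0.1]]))
q = to_binary_codes(np.array([1.0, -1.0, 1.0, -1.0]))
order, dists = hamming_rank(q, db)
```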
Igor Shalyminov, Alessandro Sordoni, Adam Atkinson, Hannes Schulz
Domain adaptation has recently become a key problem in dialogue systems
research. Deep learning, while being the preferred technique for modeling such
systems, works best given massive training data. However, in real-world
scenarios, such resources are not available for every new domain, so the ability
to train with a few dialogue examples can be considered essential. Pre-training
on large data sources and adapting to the target data has become the standard
method for few-shot problems within the deep learning framework. In this paper,
we present the winning entry at the fast domain adaptation task of DSTC8, a
hybrid generative-retrieval model based on GPT-2 fine-tuned to the multi-domain
MetaLWOz dataset. Robust and diverse in response generation, our model uses
retrieval logic as a fallback and achieves state-of-the-art results on MetaLWOz
in human evaluation (>4% improvement over the 2nd place system), while attaining
competitive generalization performance in adaptation to the unseen MultiWOZ dataset.
Authors' comments: Presented at DSTC8@AAAI 2020
Quentin Changeat, Ahmed Al-Refaie, Lorenzo V. Mugnai, Billy Edwards, Ingo P. Waldmann, Enzo Pascale, Giovanna Tinetti
In this work, we present Alfnoor, a dedicated tool optimised for population
studies of exoplanet atmospheres. Alfnoor combines the latest version of the
retrieval algorithm TauREx 3, with the instrument noise simulator ArielRad and
enables the simultaneous retrieval analysis of a large sample of
exo-atmospheres. We applied this tool to the Ariel list of planetary candidates
and focused on hydrogen-dominated, cloudy atmospheres observed in transit with
the Tier-2 mode (medium Ariel resolution). As a first experiment, we randomised
the abundances, ranging from 10$^{-7}$ to 10$^{-2}$, of the trace gases,
which include H$_2$O, CH$_4$, CO, CO$_2$ and NH$_3$. This exercise allowed us to
estimate the detection limits for Ariel Tier-2 and Tier-3 modes when clouds are
present. In a second experiment, we imposed an arbitrary trend between a
chemical species and the effective temperature of the planet. In a final
experiment, we required the molecular abundances to be dictated by equilibrium
chemistry at a given temperature. Our results demonstrate the ability of Ariel Tier-2
and Tier-3 surveys to reveal trends between the chemistry and associated
planetary parameters. Future work will focus on eclipse data and on atmospheres
heavier than hydrogen, and the tool will also be applied to other observatories.
Authors' comments: 33 pages, 24 figures, Accepted in AJ
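The first experiment's randomised abundances can be sketched as follows. The abstract gives only the range (10^-7 to 10^-2); sampling uniformly in log10 space and independently for each gas is an assumption made here, and `sample_abundances` is a hypothetical helper, not part of Alfnoor, TauREx 3 or ArielRad.

```python
import numpy as np

# Trace gases listed in the abstract
TRACE_GASES = ("H2O", "CH4", "CO", "CO2", "NH3")

def sample_abundances(n_planets, low=1e-7, high=1e-2, seed=None):
    """Draw one abundance per trace gas per planet, uniform in log10 space
    between `low` and `high` (log-uniform sampling is an assumption)."""
    rng = np.random.default_rng(seed)
    log_vmr = rng.uniform(np.log10(low), np.log10(high),
                          size=(n_planets, len(TRACE_GASES)))
    # one {gas: abundance} dictionary per simulated planet
    return [dict(zip(TRACE_GASES, 10.0 ** row)) for row in log_vmr]

samples = sample_abundances(100, seed=1)
```

Sampling in log space spreads the draws evenly across the five decades of the range, whereas a linear uniform draw would place almost all values near 10^-2.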
David Pickup, Xianfang Sun, Paul L Rosin, Ralph R Martin, Z Cheng, Zhouhui Lian, Masaki Aono, A Ben Hamza et al.
3D models of humans are commonly used within computer graphics and vision,
and so the ability to distinguish between body shapes is an important shape
retrieval problem. We extend our recent paper, which provided a benchmark for
testing non-rigid 3D shape retrieval algorithms on 3D human models. This
benchmark provided a far stricter challenge than previous shape benchmarks. We
have added 145 new models for use as a separate training set, in order to
standardise the training data used and provide a fairer comparison. We have
also included experiments with the FAUST dataset of human scans. All
participants of the previous benchmark study have taken part in the new tests
reported here, many providing updated results using the new data. In addition,
further participants have also taken part, and we provide extra analysis of the
retrieval results. In total, 25 different shape retrieval methods are evaluated.
Authors' comments: International Journal of Computer Vision, 2016
Shizhe Chen, Yida Zhao, Qin Jin, Qi Wu
Cross-modal retrieval between videos and texts has attracted growing
attention due to the rapid emergence of videos on the web. The current
dominant approach for this problem is to learn a joint embedding space to
measure cross-modal similarities. However, simple joint embeddings are
insufficient to represent complicated visual and textual details, such as
scenes, objects, actions and their compositions. To improve fine-grained
video-text retrieval, we propose a Hierarchical Graph Reasoning (HGR) model,
which decomposes video-text matching into global-to-local levels. Specifically,
the model disentangles text into a hierarchical semantic graph with three
levels (events, actions and entities) and relationships across levels.
Attention-based graph reasoning is used to generate hierarchical
textual embeddings, which can guide the learning of diverse and hierarchical
video representations. The HGR model aggregates matchings from different
video-text levels to capture both global and local details. Experimental
results on three video-text datasets demonstrate the advantages of our model.
Such hierarchical decomposition also enables better generalization across
datasets and improves the ability to distinguish fine-grained semantic
differences.
Authors' comments: To appear in CVPR 2020
Fahad Shamshad, Ali Ahmed
In this paper, we consider the highly ill-posed problem of jointly recovering
two real-valued signals from the phaseless measurements of their circular
convolution. The problem arises in various imaging modalities such as Fourier
ptychography and X-ray crystallography, and in visible light communication. We
propose to solve this inverse problem using an alternating gradient descent
algorithm with two pretrained deep generative networks as priors; one is
trained on sharp images and the other on blur kernels. The proposed recovery
algorithm strives to find a sharp image and a blur kernel in the range of the
respective pretrained generators that best explain the forward measurement
model. In doing so, we are able to reconstruct high-quality image estimates.
Moreover, numerical results show that the proposed approach performs well on
challenging measurement models that reflect physically realizable imaging
systems and is also robust to noise.
Authors' comments: 10 pages
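The measurement model can be made concrete with the convolution theorem: the DFT of a circular convolution is the pointwise product of the two DFTs, so the phaseless measurements equal |F x| times |F h|. The sketch below (signal lengths and helper names are illustrative) checks this numerically and hints at why the problem is ill-posed: swapping x and h, or circularly shifting one of the signals, leaves the measurements unchanged.

```python
import numpy as np

def circular_convolution(x, h):
    """Direct O(n^2) circular convolution: (x ⊛ h)[i] = sum_k x[k] * h[(i - k) mod n]."""
    n = len(x)
    return np.array([sum(x[k] * h[(i - k) % n] for k in range(n)) for i in range(n)])

def phaseless_measurements(x, h):
    """Fourier magnitudes of the circular convolution, |F(x ⊛ h)|."""
    return np.abs(np.fft.fft(circular_convolution(x, h)))

rng = np.random.default_rng(0)
x, h = rng.standard_normal(8), rng.standard_normal(8)
y = phaseless_measurements(x, h)
```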
Yan Feng, Bin Chen, Tao Dai, Shutao Xia
Deep product quantization network (DPQN) has recently received much attention
in fast image retrieval tasks due to its efficiency in encoding
high-dimensional visual features, especially when dealing with large-scale
datasets. Recent studies show that deep neural networks (DNNs) are vulnerable
to inputs with small, maliciously designed perturbations (a.k.a. adversarial
examples). This phenomenon raises security concerns for DPQN at the
testing/deployment stage as well. However, little effort has been devoted to
investigating how adversarial examples affect DPQN. To this end, we propose
product quantization adversarial generation (PQ-AG), a simple yet effective
method to generate adversarial examples for product quantization based
retrieval systems. PQ-AG aims to generate imperceptible adversarial
perturbations for query images to form adversarial queries, whose nearest
neighbors from a targeted product quantization model are not semantically
related to those from the original queries. Extensive experiments show that our
PQ-AG successfully creates adversarial examples that mislead targeted product
quantization retrieval models. Moreover, we find that PQ-AG significantly
degrades retrieval performance in both white-box and black-box settings.
Authors' comments: Accepted at AAAI20
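As background for what PQ-AG attacks, a minimal sketch of product quantization encoding and asymmetric distance computation (ADC) follows. The codebooks are assumed to be given (in practice they would be learned, e.g. by k-means or, in a DPQN, by the network itself); array shapes and helper names are illustrative.

```python
import numpy as np

def pq_encode(x, codebooks):
    """Encode vector x with product quantization.

    codebooks has shape (M, K, d_sub): M subspaces with K centroids each.
    x (length M * d_sub) is split into M sub-vectors, and each is replaced
    by the index of its nearest centroid in the matching codebook.
    """
    M, K, d_sub = codebooks.shape
    subs = x.reshape(M, d_sub)
    # squared distance from each sub-vector to every centroid: shape (M, K)
    d2 = ((codebooks - subs[:, None, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def pq_decode(codes, codebooks):
    """Approximate the original vector by concatenating the chosen centroids."""
    return np.concatenate([codebooks[m, c] for m, c in enumerate(codes)])

def asymmetric_distances(query, db_codes, codebooks):
    """Squared distances from an uncompressed query to PQ-encoded items,
    computed from per-subspace lookup tables (asymmetric distance computation)."""
    M, K, d_sub = codebooks.shape
    table = ((codebooks - query.reshape(M, 1, d_sub)) ** 2).sum(axis=2)  # (M, K)
    # table[m, db_codes[n, m]] summed over m gives the distance to item n
    return table[np.arange(M), db_codes].sum(axis=1)

cb = np.array([[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [-1.0, -1.0]]])
x = np.array([0.9, 1.1, -0.8, -1.2])
codes = pq_encode(x, cb)
```

Retrieval ranks items by these table-lookup distances, so a small perturbation that flips the query's nearest centroids can change which database items come back, which is the vulnerability the abstract describes.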
Zalan Fabian, Justin Haldar, Richard Leahy, Mahdi Soltanolkotabi
Imaging 3D nano-structures at very high resolution is crucial in a variety of scientific fields. However, due to fundamental limitations of light propagation, we can only measure the object indirectly via 2D intensity measurements of the 3D specimen through highly nonlinear projection mappings in which a variety of information (including phase) is lost. Reconstruction therefore involves inverting highly nonlinear and seemingly non-invertible mappings. In this paper, we introduce a novel technique where the 3D object is directly reconstructed from an accurate nonlinear propagation model. Furthermore, we characterize the ambiguities of this model and leverage a priori knowledge to mitigate their effect and also significantly reduce the required number of measurements and hence the acquisition time. We demonstrate the performance of our algorithm via numerical experiments aimed at nano-scale reconstruction of 3D integrated circuits. Moreover, we provide rigorous theoretical guarantees for convergence to stationarity.
Young Kyun Jang, Nam Ik Cho
Image retrieval methods that employ hashing or vector quantization have
achieved great success by taking advantage of deep learning. However, these
approaches do not meet expectations unless sufficient, expensive label
information is available. To resolve this issue, we propose the first quantization-based
semi-supervised image retrieval scheme: the Generalized Product Quantization (GPQ)
network. We design a novel metric learning strategy that preserves semantic
similarity between labeled data, and employ an entropy regularization term to
fully exploit the inherent potential of unlabeled data. Our solution increases the
generalization capacity of the quantization network, which allows it to overcome
previous limitations in the retrieval community. Extensive experimental results
demonstrate that GPQ yields state-of-the-art performance on large-scale real
image benchmark datasets.
Authors' comments: 10 pages, 10 figures, Computer Vision and Pattern Recognition (CVPR)
2020 accepted paper
Tamir Bendory, Dan Edidin
Motivated by X-ray crystallography, a technology for determining the atomic structure of biological molecules, we study the crystallographic phase retrieval problem, arguably the leading and hardest phase retrieval setup. This problem entails recovering a K-sparse signal of length N from its Fourier magnitude or, equivalently, from its periodic auto-correlation. Specifically, this work focuses on the fundamental question of uniqueness: what is the maximal sparsity level K/N that allows a unique mapping between a signal and its Fourier magnitude, up to intrinsic symmetries? We design a systematic computational technique to affirm uniqueness for any specific pair (K,N), and arrive at the following conjecture: the Fourier magnitude determines a generic signal uniquely, up to intrinsic symmetries, as long as K<=N/2. Based on group-theoretic considerations and an additional computational technique, we formulate a second conjecture: if K<N/2, then for any signal the set of solutions to the crystallographic phase retrieval problem has measure zero in the set of all signals with a given Fourier magnitude. Together, these conjectures constitute the first attempt to establish a mathematical theory for the crystallographic phase retrieval problem.
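The equivalence between the Fourier magnitude and the periodic auto-correlation, and the intrinsic symmetries of a real signal (circular shift, reflection and global sign change all preserve the Fourier magnitude), can be checked numerically. This is a minimal illustration with hypothetical helper names; it says nothing about uniqueness, which is the subject of the conjectures above.

```python
import numpy as np

def periodic_autocorrelation(x):
    """a[l] = sum_n x[n] * x[(n + l) mod N] for a real signal x."""
    # np.roll(x, -l)[n] == x[(n + l) mod N]
    return np.array([np.dot(x, np.roll(x, -l)) for l in range(len(x))])

def fourier_magnitude(x):
    return np.abs(np.fft.fft(x))

def intrinsic_symmetries(x, shift=1):
    """Signals with the same Fourier magnitude as x: a circular shift,
    the reflection x[-n mod N], and the sign flip -x."""
    reflected = np.roll(x[::-1], 1)  # element n equals x[(-n) mod N]
    return [np.roll(x, shift), reflected, -x]

# A K-sparse signal of length N
rng = np.random.default_rng(1)
N, K = 12, 4
x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
```

For real x, the DFT of the periodic auto-correlation equals |F x|^2, which is why knowing one is the same as knowing the other.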
Hadi Abdi Khojasteh, Ebrahim Ansari, Parvin Razzaghi, Akbar Karimi
This paper considers the task of matching images and sentences by learning a
visual-textual embedding space for cross-modal retrieval. Finding such a space
is a challenging task since the features and representations of text and image
are not comparable. In this work, we introduce an end-to-end deep multimodal
convolutional-recurrent network for learning both vision and language
representations simultaneously to infer image-text similarity. The model learns
which pairs are a match (positive) and which ones are a mismatch (negative)
using a hinge-based triplet ranking loss. To learn the joint representations,
we leverage our newly extracted collection of tweets from Twitter. The main
characteristic of our dataset is that the images and tweets are not
standardized in the same way as in the benchmarks. Furthermore, there can be a
higher semantic correlation between the pictures and tweets, in contrast to
benchmarks in which the descriptions are well organized. Experimental results on the MS-COCO
benchmark dataset show that our model outperforms certain methods presented
previously and has competitive performance compared to the state-of-the-art.
The code and dataset have been made available publicly.
Authors' comments: 6 pages and 2 figures, Learn more about this project at
https://iasbs.ac.ir/~ansari/deeptwitter
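A hinge-based triplet ranking loss of the kind mentioned in this abstract can be sketched as follows. The bidirectional form, the summation over all in-batch negatives, cosine similarity and the margin of 0.2 are assumptions (common choices in visual-semantic embedding work), not details taken from this paper.

```python
import numpy as np

def triplet_ranking_loss(img, txt, margin=0.2):
    """Bidirectional hinge-based triplet ranking loss over a batch.

    img, txt: (B, d) embeddings where row i of each forms a matching pair;
    every other row in the batch serves as a negative.
    """
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    s = img @ txt.T                    # s[i, j] = cosine(image i, text j)
    pos = np.diag(s)                   # similarities of the matching pairs
    # image i as anchor: penalize text j scoring within `margin` of the positive
    cost_txt = np.maximum(0.0, margin - pos[:, None] + s)
    # text j as anchor: penalize image i scoring within `margin` of the positive
    cost_img = np.maximum(0.0, margin - pos[None, :] + s)
    mask = np.eye(len(s), dtype=bool)  # positives are not negatives
    cost_txt[mask] = 0.0
    cost_img[mask] = 0.0
    return cost_txt.sum() + cost_img.sum()
```

A well-trained model drives the loss to zero once every matching pair beats every mismatch by at least the margin; shuffled pairs produce a positive loss.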
Peng Xu, Kun Liu, Tao Xiang, Timothy M. Hospedales, Zhanyu Ma, Jun Guo, Yi-Zhe Song
Existing sketch-analysis work studies sketches depicting static objects or scenes. In this work, we propose a novel cross-modal retrieval problem of fine-grained instance-level sketch-based video retrieval (FG-SBVR), where a sketch sequence is used as a query to retrieve a specific target video instance. Compared with sketch-based still image retrieval and coarse-grained category-level video retrieval, this is more challenging, as both visual appearance and motion need to be matched simultaneously at a fine-grained level. We contribute the first FG-SBVR dataset with rich annotations. We then introduce a novel multi-stream multi-modality deep network to perform FG-SBVR under both strong and weakly supervised settings. The key component of the network is a relation module, designed to prevent model over-fitting given scarce training data. We show that this model significantly outperforms a number of existing state-of-the-art models designed for video analysis.
Zheng Chen, Sadid A. Hasan, Joey Liu, Vivek Datla, Md Shamsuzzaman, Hafiz Khan, Mohammad S Sorower, Gabe Mankovich et al.
This paper presents an ontology-driven treatment article retrieval system developed and evaluated using the data and ground truth provided by the TREC 2017 precision medicine track. The key aspects of our system include: meaningful integration of various disease, gene, and drug name ontologies; training of a novel perceptron model for article relevance labeling; a ranking module that considers additional factors such as journal impact and article publication year; and comprehensive query matching rules. Experimental results demonstrate that our proposed system considerably outperforms the best participating system of the TREC 2017 precision medicine challenge.
Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, Sanjiv Kumar
We consider the large-scale query-document retrieval problem: given a query
(e.g., a question), return the set of relevant documents (e.g., paragraphs
containing the answer) from a large document corpus. This problem is often
solved in two steps. The retrieval phase first reduces the solution space,
returning a subset of candidate documents. The scoring phase then re-ranks the
documents. Critically, the retrieval algorithm must not only achieve high recall
but also be highly efficient, returning candidates in time sublinear in
the number of documents. Unlike the scoring phase, which has recently seen
significant advances due to BERT-style pre-training tasks on cross-attention
models, the retrieval phase remains less well studied. Most previous works rely
on classic Information Retrieval (IR) methods such as BM-25 (token matching +
TF-IDF weights). These models only accept sparse handcrafted features and
cannot be optimized for different downstream tasks of interest. In this paper, we
conduct a comprehensive study on the embedding-based retrieval models. We show
that the key ingredient of learning a strong embedding-based Transformer model
is the set of pre-training tasks. With adequately designed paragraph-level
pre-training tasks, the Transformer models can remarkably improve over the
widely-used BM-25 as well as embedding models without Transformers. The
paragraph-level pre-training tasks we studied are Inverse Cloze Task (ICT),
Body First Selection (BFS), Wiki Link Prediction (WLP), and the combination of
all three.
Authors' comments: Accepted by ICLR 2020
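For reference, the BM-25 baseline that the embedding models are compared against scores a document by combining term frequency, inverse document frequency and document-length normalization. A minimal sketch, using the commonly chosen values k1 = 1.5 and b = 0.75 and the non-negative IDF variant (both are assumptions; implementations differ in these details):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25.

    Uses the non-negative IDF variant log(1 + (N - n + 0.5) / (n + 0.5)),
    where N is the number of documents and n the number containing the term.
    """
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for q in query:
            if tf[q] == 0:
                continue  # token matching: absent terms contribute nothing
            idf = math.log(1 + (N - df[q] + 0.5) / (df[q] + 0.5))
            norm = tf[q] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[q] * (k1 + 1) / norm
        scores.append(s)
    return scores

docs = [["the", "cat", "sat"], ["the", "dog", "barked"], ["cats", "and", "dogs"]]
scores = bm25_scores(["cat"], docs)
```

Note that "cats" does not match "cat": the score depends only on exact token overlap, which is the kind of limitation that motivates learned embedding-based retrieval. In a two-stage pipeline, such a scorer would only produce the candidate set for a heavier re-ranking model.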