Daichi Ono, Hiroyuki Yabe, Tsutomu Horikawa
We propose the method that uses only computer graphics datasets to parse the real world 3D scenes. 3D scene parsing based on semantic segmentation is required to implement the categorical interaction in the virtual world. Convolutional Neural Networks (CNNs) have recently shown state-of-theart performance on computer vision tasks including semantic segmentation. However, collecting and annotating a huge amount of data are needed to train CNNs. Especially in the case of semantic segmentation, annotating pixel by pixel takes a significant amount of time and often makes mistakes. In contrast, computer graphics can generate a lot of accurate annotated data and easily scale up by changing camera positions, textures and lights. Despite these advantages, models trained on computer graphics datasets cannot perform well on real data, which is known as the domain shift. To address this issue, we first present that depth modal and synthetic noise are effective to reduce the domain shift. Then, we develop the class-wise adaptation which obtains domain invariant features of CNNs. To reduce the domain shift, we create computer graphics rooms with a lot of props, and provide photo-realistic rendered images.We also demonstrate the application which is combined semantic segmentation with Simultaneous Localization and Mapping (SLAM). Our application performs accurate 3D scene parsing in real-time on an actual room.
Sergio Vitale, Davide Cozzolino, Giuseppe Scarpa, Luisa Verdoliva, Giovanni Poggi
We propose a new method for SAR image despeckling which leverages information drawn from co-registered optical imagery. Filtering is performed by plain patch-wise nonlocal means, operating exclusively on SAR data. However, the filtering weights are computed by taking into account also the optical guide, which is much cleaner than the SAR data, and hence more discriminative. To avoid injecting optical-domain information into the filtered image, a SAR-domain statistical test is preliminarily performed to reject right away any risky predictor. Experiments on two SAR-optical datasets prove the proposed method to suppress very effectively the speckle, preserving structural details, and without introducing visible filtering artifacts. Overall, the proposed method compares favourably with all state-of-the-art despeckling filters, and also with our own previous optical-guided filter.
Aadirupa Saha, Aditya Gopalan
We consider the problem of probably approximately correct (PAC) ranking $n$
items by adaptively eliciting subset-wise preference feedback. At each round,
the learner chooses a subset of $k$ items and observes stochastic feedback
indicating preference information of the winner (most preferred) item of the
chosen subset drawn according to a Plackett-Luce (PL) subset choice model
unknown a priori. The objective is to identify an $\epsilon$-optimal ranking of
the $n$ items with probability at least $1 - \delta$. When the feedback in each
subset round is a single Plackett-Luce-sampled item, we show $(\epsilon,
\delta)$-PAC algorithms with a sample complexity of
$O\left(\frac{n}{\epsilon^2} \ln \frac{n}{\delta} \right)$ rounds, which we
establish as being order-optimal by exhibiting a matching sample complexity
lower bound of $\Omega\left(\frac{n}{\epsilon^2} \ln \frac{n}{\delta}
\right)$---this shows that there is essentially no improvement possible from
the pairwise comparisons setting ($k = 2$). When, however, it is possible to
elicit top-$m$ ($\leq k$) ranking feedback according to the PL model from each
adaptively chosen subset of size $k$, we show that an $(\epsilon, \delta)$-PAC
ranking sample complexity of $O\left(\frac{n}{m \epsilon^2} \ln
\frac{n}{\delta} \right)$ is achievable with explicit algorithms, which
represents an $m$-wise reduction in sample complexity compared to the pairwise
case. This again turns out to be order-wise unimprovable across the class of
symmetric ranking algorithms. Our algorithms rely on a novel {pivot trick} to
maintain only $n$ itemwise score estimates, unlike $O(n^2)$ pairwise score
estimates that has been used in prior work. We report results of numerical
experiments that corroborate our findings.
Authors' comments: In 22nd International Conference on Artificial Intelligence and
Statistics (AISTATS), 2019. (44 pages, 8 figures). arXiv admin note: text
overlap with arXiv:1808.04008
Carlos-Emiliano González-Gallardo, Juan-Manuel Torres-Moreno
Sentence Boundary Detection (SBD) has been a major research topic since
Automatic Speech Recognition transcripts have been used for further Natural
Language Processing tasks like Part of Speech Tagging, Question Answering or
Automatic Summarization. But what about evaluation? Do standard evaluation
metrics like precision, recall, F-score or classification error; and more
important, evaluating an automatic system against a unique reference is enough
to conclude how well a SBD system is performing given the final application of
the transcript? In this paper we propose Window-based Sentence Boundary
Evaluation (WiSeBE), a semi-supervised metric for evaluating Sentence Boundary
Detection systems based on multi-reference (dis)agreement. We evaluate and
compare the performance of different SBD systems over a set of Youtube
transcripts using WiSeBE and standard metrics. This double evaluation gives an
understanding of how WiSeBE is a more reliable metric for the SBD task.
Authors' comments: In proceedings of the 17th Mexican International Conference on
Artificial Intelligence (MICAI), 2018
Homanga Bharadhwaj
In this paper, we tackle the problem of explanations in a deep-learning based
model for recommendations by leveraging the technique of layer-wise relevance
propagation. We use a Deep Convolutional Neural Network to extract relevant
features from the input images before identifying similarity between the images
in feature space. Relationships between the images are identified by the model
and layer-wise relevance propagation is used to infer pixel-level details of
the images that may have significantly informed the model's choice. We evaluate
our method on an Amazon products dataset and demonstrate the efficacy of our
approach.
Authors' comments: Accepted in Proceedings of the EARS Workshop at SIGIR 2018
Papri Dey, Paul Görlach, Nidhi Kaihnsa
We introduce and study coordinate-wise powers of subvarieties of
$\mathbb{P}^n$, i.e. varieties arising from raising all points in a given
subvariety of $\mathbb{P}^n$ to the $r$-th power, coordinate by coordinate.
This corresponds to studying the image of a subvariety of $\mathbb{P}^n$ under
the quotient of $\mathbb{P}^n$ by the action of the finite group
$\mathbb{Z}_r^{n+1}$. We determine the degree of coordinate-wise powers and
study their defining equations, particularly for hypersurfaces and linear
spaces. Applying these results, we compute the degree of the variety of
orthostochastic matrices and determine iterated dual and reciprocal varieties
of power sum hypersurfaces. We also establish a link between coordinate-wise
squares of linear spaces and the study of real symmetric matrices with a
degenerate eigenspectrum.
Authors' comments: 26 pages
Ryan O'Donnell, Yu Zhao
A probability distribution over {-1, 1}^n is (eps, k)-wise uniform if, roughly, it is eps-close to the uniform distribution when restricted to any k coordinates. We consider the problem of how far an (eps, k)-wise uniform distribution can be from any globally k-wise uniform distribution. We show that every (eps, k)-wise uniform distribution is O(n^(k/2) eps)-close to a k-wise uniform distribution in total variation distance. In addition, we show that this bound is optimal for all even k: we find an (eps, k)-wise uniform distribution that is Omega(n^(k/2) eps)-far from any k-wise uniform distribution in total variation distance. For k = 1, we get a better upper bound of O(eps), which is also optimal. One application of our closeness result is to the sample complexity of testing whether a distribution is k-wise uniform or delta-far from k-wise uniform. We give an upper bound of O(n^k/delta^2) (or O(log n/delta^2) when k = 1) on the required samples. We show an improved upper bound of O~(n^(k/2)/delta^2) for the special case of testing fully uniform vs. delta-far from k-wise uniform. Finally, we complement this with a matching lower bound of Omega(n/delta^2) when k = 2. Our results improve upon the best known bounds from [AAK+07], and have simpler proofs.
Michael Motro, Joydeep Ghosh
Handling object interaction is a fundamental challenge in practical
multi-object tracking, even for simple interactive effects such as one object
temporarily occluding another. We formalize the problem of occlusion in
tracking with two different abstractions. In object-wise occlusion, objects
that are occluded by other objects do not generate measurements. In
measurement-wise occlusion, a previously unstudied approach, all objects may
generate measurements but some measurements may be occluded by others. While
the relative validity of each abstraction depends on the situation and sensor,
measurement-wise occlusion fits into probabilistic multi-object tracking
algorithms with much looser assumptions on object interaction. Its value is
demonstrated by showing that it naturally derives a popular approximation for
lidar tracking, and by an example of visual tracking in image space.
Authors' comments: presenting at 21st International Conference on Information Fusion,
2018
Xiangyu Zhao, Long Xia, Liang Zhang, Zhuoye Ding, Dawei Yin, Jiliang Tang
Recommender systems can mitigate the information overload problem by
suggesting users' personalized items. In real-world recommendations such as
e-commerce, a typical interaction between the system and its users is -- users
are recommended a page of items and provide feedback; and then the system
recommends a new page of items. To effectively capture such interaction for
recommendations, we need to solve two key problems -- (1) how to update
recommending strategy according to user's \textit{real-time feedback}, and 2)
how to generate a page of items with proper display, which pose tremendous
challenges to traditional recommender systems. In this paper, we study the
problem of page-wise recommendations aiming to address aforementioned two
challenges simultaneously. In particular, we propose a principled approach to
jointly generate a set of complementary items and the corresponding strategy to
display them in a 2-D page; and propose a novel page-wise recommendation
framework based on deep reinforcement learning, DeepPage, which can optimize a
page of items with proper display based on real-time feedback from users. The
experimental results based on a real-world e-commerce dataset demonstrate the
effectiveness of the proposed framework.
Authors' comments: arXiv admin note: text overlap with arXiv:1802.06501
Jennie Paine, Jeremy Darling, Alexandra Truebenbach
The Gaia mission has detected a large number of active galactic nuclei (AGN)
and galaxies, but these objects must be identified among the 1000-fold more
numerous stars. Extant astrometric AGN catalogs do not have the uniform sky
coverage required to detect and characterize the all-sky low-multipole proper
motion signals produced by the barycenter motion, gravitational waves, and
cosmological effects. To remedy this, we present an all-sky sample of 567,721
AGN in Gaia Data Release 1, selected using WISE two-color criteria. The catalog
has fairly uniform sky coverage beyond the Galactic plane, with a mean density
of 12.8 AGN per square degree. The objects have magnitudes ranging from $G=8.8$
down to Gaia's magnitude limit, $G=20.7$. The catalog is approximately 50%
complete but suffers from low stellar contamination, roughly 0.2%. We predict
that the end-of-mission Gaia proper motions for this catalog will enable
detection of the secular aberration drift to high significance (23$\sigma$) and
will limit the anisotropy of the Hubble expansion to about 2%.
Authors' comments: 11 pages, 7 figures; Submitted to ApJS
Yair Censor, Howard Heaton, Reinhard Schulte
Superiorization reduces, not necessarily minimizes, the value of a target
function while seeking constraints-compatibility. This is done by taking a
solely feasibility-seeking algorithm, analyzing its perturbations resilience,
and proactively perturbing its iterates accordingly to steer them toward a
feasible point with reduced value of the target function. When the perturbation
steps are computationally efficient, this enables generation of a superior
result with essentially the same computational cost as that of the original
feasibility-seeking algorithm. In this work, we refine previous formulations of
the superiorization method to create a more general framework, enabling target
function reduction steps that do not require partial derivatives of the target
function. In perturbations that use partial derivatives the step-sizes in the
perturbation phase of the superiorization method are chosen independently from
the choice of the nonascent directions. This is no longer true when
component-wise perturbations are employed. In that case, the step-sizes must be
linked to the choice of the nonascent direction in every step. Besides
presenting and validating these notions, we give a computational demonstration
of superiorization with component-wise perturbations for a problem of
computerized tomography image reconstruction.
Authors' comments: Numerical Algorithms, accepted for publication
Xiangyu Zhao, Liang Zhang, Long Xia, Zhuoye Ding, Dawei Yin, Jiliang Tang
Recommender systems play a crucial role in mitigating the problem of information overload by suggesting users' personalized items or services. The vast majority of traditional recommender systems consider the recommendation procedure as a static process and make recommendations following a fixed strategy. In this paper, we propose a novel recommender system with the capability of continuously improving its strategies during the interactions with users. We model the sequential interactions between users and a recommender system as a Markov Decision Process (MDP) and leverage Reinforcement Learning (RL) to automatically learn the optimal strategies via recommending trial-and-error items and receiving reinforcements of these items from users' feedbacks. In particular, we introduce an online user-agent interacting environment simulator, which can pre-train and evaluate model parameters offline before applying the model online. Moreover, we validate the importance of list-wise recommendations during the interactions between users and agent, and develop a novel approach to incorporate them into the proposed framework LIRD for list-wide recommendations. The experimental results based on a real-world e-commerce dataset demonstrate the effectiveness of the proposed framework.
Keith Frankston, Jeff Kahn, Bhargav Narayanan
Ellis and the third author showed, verifying a conjecture of Frankl, that any $3$-wise intersecting family of subsets of $\{1,2,\dots,n\}$ admitting a transitive automorphism group has cardinality $o(2^n)$, while a construction of Frankl demonstrates that the same conclusion need not hold under the weaker constraint of being regular. Answering a question of Cameron, Frankl and Kantor from 1989, we show that the restriction of admitting a transitive automorphism group may be relaxed significantly: we prove that any $3$-wise intersecting family of subsets of $\{1,2,\dots,n\}$ that is regular and increasing has cardinality $o(2^n)$.
Zhiyun Ren, Xia Ning, Huzefa Rangwala
There is a critical need to develop new educational technology applications
that analyze the data collected by universities to ensure that students
graduate in a timely fashion (4 to 6 years); and they are well prepared for
jobs in their respective fields of study. In this paper, we present a novel
approach for analyzing historical educational records from a large, public
university to perform next-term grade prediction; i.e., to estimate the grades
that a student will get in a course that he/she will enroll in the next term.
Accurate next-term grade prediction holds the promise for better student degree
planning, personalized advising and automated interventions to ensure that
students stay on track in their chosen degree program and graduate on time. We
present a factorization-based approach called Matrix Factorization with
Temporal Course-wise Influence that incorporates course-wise influence effects
and temporal effects for grade prediction. In this model, students and courses
are represented in a latent "knowledge" space. The grade of a student on a
course is modeled as the similarity of their latent representation in the
"knowledge" space. Course-wise influence is considered as an additional factor
in the grade prediction. Our experimental results show that the proposed method
outperforms several baseline approaches and infer meaningful patterns between
pairs of courses within academic programs.
Authors' comments: 8 pages, 5 figures
Lixue Zhuang, Yi Xu, Bingbing Ni, Hongteng Xu
How to effectively approximate real-valued parameters with binary codes plays
a central role in neural network binarization. In this work, we reveal an
important fact that binarizing different layers has a widely-varied effect on
the compression ratio of network and the loss of performance. Based on this
fact, we propose a novel and flexible neural network binarization method by
introducing the concept of layer-wise priority which binarizes parameters in
inverse order of their layer depth. In each training step, our method selects a
specific network layer, minimizes the discrepancy between the original
real-valued weights and its binary approximations, and fine-tunes the whole
network accordingly. During the iteration of the above process, it is
significant that we can flexibly decide whether to binarize the remaining
floating layers or not and explore a trade-off between the loss of performance
and the compression ratio of model. The resulting binary network is applied for
efficient pedestrian detection. Extensive experimental results on several
benchmarks show that under the same compression ratio, our method achieves much
lower miss rate and faster detection speed than the state-of-the-art neural
network binarization method.
Authors' comments: More experiments on image classification are planned
Zhao Zhong, Junjie Yan, Wei Wu, Jing Shao, Cheng-Lin Liu
Convolutional neural networks have gained a remarkable success in computer
vision. However, most usable network architectures are hand-crafted and usually
require expertise and elaborate design. In this paper, we provide a block-wise
network generation pipeline called BlockQNN which automatically builds
high-performance networks using the Q-Learning paradigm with epsilon-greedy
exploration strategy. The optimal network block is constructed by the learning
agent which is trained sequentially to choose component layers. We stack the
block to construct the whole auto-generated network. To accelerate the
generation process, we also propose a distributed asynchronous framework and an
early stop strategy. The block-wise generation brings unique advantages: (1) it
performs competitive results in comparison to the hand-crafted state-of-the-art
networks on image classification, additionally, the best network generated by
BlockQNN achieves 3.54% top-1 error rate on CIFAR-10 which beats all existing
auto-generate networks. (2) in the meanwhile, it offers tremendous reduction of
the search space in designing networks which only spends 3 days with 32 GPUs,
and (3) moreover, it has strong generalizability that the network built on
CIFAR also performs well on a larger-scale ImageNet dataset.
Authors' comments: Accepted to CVPR 2018
Lina Wei, Shanshan Zhao, Omar El Farouk Bourahla, Xi Li, Fei Wu
In this paper, we propose an end-to-end group-wise deep co-saliency detection
approach to address the co-salient object discovery problem based on the fully
convolutional network (FCN) with group input and group output. The proposed
approach captures the group-wise interaction information for group images by
learning a semantics-aware image representation based on a convolutional neural
network, which adaptively learns the group-wise features for co-saliency
detection. Furthermore, the proposed approach discovers the collaborative and
interactive relationships between group-wise feature representation and
single-image individual feature representation, and model this in a
collaborative learning framework. Finally, we set up a unified end-to-end deep
learning scheme to jointly optimize the process of group-wise feature
representation learning and the collaborative learning, leading to more
reliable and robust co-saliency detection results. Experimental results
demonstrate the effectiveness of our approach in comparison with the
state-of-the-art approaches.
Authors' comments: IJCAI 2017
Jialei Wang, Weiran Wang, Dan Garber, Nathan Srebro
We develop and analyze efficient "coordinate-wise" methods for finding the leading eigenvector, where each step involves only a vector-vector product. We establish global convergence with overall runtime guarantees that are at least as good as Lanczos's method and dominate it for slowly decaying spectrum. Our methods are based on combining a shift-and-invert approach with coordinate-wise algorithms for linear regression.
Siddharth Agrawal, Ambedkar Dukkipati
Variational autoencoders (VAEs), that are built upon deep neural networks have emerged as popular generative models in computer vision. Most of the work towards improving variational autoencoders has focused mainly on making the approximations to the posterior flexible and accurate, leading to tremendous progress. However, there have been limited efforts to replace pixel-wise reconstruction, which have known shortcomings. In this work, we use real-valued non-volume preserving transformations (real NVP) to exactly compute the conditional likelihood of the data given the latent distribution. We show that a simple VAE with this form of reconstruction is competitive with complicated VAE structures, on image modeling tasks. As part of our model, we develop powerful conditional coupling layers that enable real NVP to learn with fewer intermediate layers.
David Ellis, Bhargav Narayanan
A family of sets is said to be symmetric if its automorphism group is
transitive, and $3$-wise intersecting if any three sets in the family have
nonempty intersection. Frankl conjectured in 1981 that if $\mathcal{A}$ is a
symmetric $3$-wise intersecting family of subsets of $\{1,2,\dots,n\}$, then
$|\mathcal{A}| = o(2^n)$. Here, we give a short proof of Frankl's conjecture
using a 'sharp threshold' result of Friedgut and Kalai.
Authors' comments: 7 pages, typo corrected in description of 'tree' construction in
Section 3. Proc. Amer. Math. Soc