Yuchen Fan, Jiahui Yu, Ding Liu, Thomas S. Huang
While scale-invariant modeling has substantially boosted the performance of
visual recognition tasks, it remains largely under-explored in deep networks
based image restoration. Naively applying those scale-invariant techniques
(e.g. multi-scale testing, random-scale data augmentation) to image restoration
tasks usually leads to inferior performance. In this paper, we show that
properly modeling scale-invariance into neural networks can bring significant
benefits to image restoration performance. Inspired by spatial-wise
convolution for shift-invariance, "scale-wise convolution" is proposed to
convolve across multiple scales for scale-invariance. In our scale-wise
convolutional network (SCN), we first map the input image to the feature space
and then progressively build a feature pyramid representation via bilinear
down-scaling. The feature pyramid is then passed to a residual network with
scale-wise convolutions. The proposed scale-wise convolution learns to
dynamically activate and aggregate features from different input scales in each
residual building block, in order to exploit contextual information on multiple
scales. In experiments, we compare the restoration accuracy and parameter
efficiency among our model and many different variants of multi-scale neural
networks. The proposed network with scale-wise convolution achieves superior
performance in multiple image restoration tasks including image
super-resolution, image denoising and image compression artifacts removal. Code
and models are available at: https://github.com/ychfan/scn_sr
Authors' comments: AAAI 2020
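The pyramid-building step described in the abstract above can be illustrated with a minimal sketch. It assumes a single-channel feature map stored as nested lists and a fixed down-scaling factor of 2; under the half-pixel-center convention, bilinear sampling at that factor reduces to averaging each 2x2 block. The names `downscale2x` and `build_pyramid` are hypothetical and are not taken from the authors' repository.

```python
def downscale2x(fmap):
    """Halve a 2-D feature map by averaging each 2x2 block.

    Assumption: factor-2 bilinear down-scaling with half-pixel centers,
    which samples exactly at the center of every 2x2 block.
    """
    h, w = len(fmap) // 2, len(fmap[0]) // 2
    return [
        [
            (fmap[2 * i][2 * j] + fmap[2 * i][2 * j + 1]
             + fmap[2 * i + 1][2 * j] + fmap[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(w)
        ]
        for i in range(h)
    ]


def build_pyramid(fmap, levels):
    """Return [fmap, fmap/2, fmap/4, ...] with `levels` entries."""
    pyramid = [fmap]
    for _ in range(levels - 1):
        pyramid.append(downscale2x(pyramid[-1]))
    return pyramid


# A 4x4 toy feature map with values 0..15.
features = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pyramid = build_pyramid(features, levels=3)
```

In the network itself, each pyramid level would then be processed by residual blocks that exchange information across scales; that part is not sketched here.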
Ahmad Abdi, Gérard Cornuéjols, Tony Huynh, Dabeen Lee
A clutter is \emph{$k$-wise intersecting} if every $k$ members have a common
element, yet no element belongs to all members. We conjecture that, for some
integer $k\geq 4$, every $k$-wise intersecting clutter is non-ideal. As
evidence for our conjecture, we prove it for $k=4$ for the class of binary
clutters. Two key ingredients for our proof are Jaeger's $8$-flow theorem for
graphs, and Seymour's characterization of the binary matroids with the sums of
circuits property. As further evidence for our conjecture, we also note that it
follows from an unpublished conjecture of Seymour from 1975. We also discuss
connections to the chromatic number of a clutter, projective geometries over
the two-element field, uniform cycle covers in graphs, and quarter-integral
packings of value two in ideal clutters.
Authors' comments: 20 pages, 2 figures. An extended abstract under the same title
appeared in the 21st Conference on Integer Programming and Combinatorial
Optimization
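The definition of a $k$-wise intersecting clutter given in the abstract above can be checked by brute force on small examples. The following is a minimal sketch; the function name is hypothetical, and it does not verify that the input is actually a clutter (that no member contains another).

```python
from itertools import combinations


def is_k_wise_intersecting(members, k):
    """True if every k members share an element but no element is in all members."""
    members = [frozenset(m) for m in members]
    if frozenset.intersection(*members):
        return False  # some element belongs to every member
    return all(
        frozenset.intersection(*group) for group in combinations(members, k)
    )


# The three edges of a triangle: pairwise intersecting, no common vertex.
triangle = [{1, 2}, {2, 3}, {1, 3}]
# A star: every member contains element 1, so the second condition fails.
star = [{1, 2}, {1, 3}]

triangle_is_2_wise = is_k_wise_intersecting(triangle, 2)
triangle_is_3_wise = is_k_wise_intersecting(triangle, 3)
star_is_2_wise = is_k_wise_intersecting(star, 2)
```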
Jason O'Neill, Jacques Verstraete
For an integer $d \geq 2$, a family $\mathcal{F}$ of sets is $\textit{$d$-wise intersecting}$ if for any distinct sets $A_1,A_2,\dots,A_d \in \mathcal{F}$, $A_1 \cap A_2 \cap \dots \cap A_d \neq \emptyset$, and $\textit{non-trivial}$ if $\bigcap \mathcal{F} = \emptyset$. Hilton and Milner conjectured that for $k \geq d \geq 2$ and large enough $n$, the extremal non-trivial $d$-wise intersecting family of $k$-element subsets of $[n]$ is one of the following two families: \begin{align*} &\mathcal{H}(k,d) = \{A \in \binom{[n]}{k} : [d-1] \subset A, A \cap [d,k+1] \neq \emptyset\} \cup \{[k+1] \setminus \{i \} : i \in [d - 1]\} \\ &\mathcal{A}(k,d) = \{ A \in \binom{[n]}{k} : |A \cap [d+1]| \geq d \}. \end{align*} The celebrated Hilton-Milner Theorem states that $\mathcal{H}(k,2)$ is the unique extremal non-trivial intersecting family for $k>3$. We prove the conjecture and prove a stability theorem, stating that any large enough non-trivial $d$-wise intersecting family of $k$-element subsets of $[n]$ is a subfamily of $\mathcal{A}(k,d)$ or $\mathcal{H}(k,d)$.
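For small parameters, the two families $\mathcal{H}(k,d)$ and $\mathcal{A}(k,d)$ from the abstract above can be enumerated directly. This is a minimal brute-force sketch, with hypothetical function names, that builds both families and checks that they are non-trivial and $d$-wise intersecting.

```python
from itertools import combinations


def family_H(n, k, d):
    """H(k,d): k-sets containing [d-1] and meeting [d, k+1], plus [k+1] minus {i}."""
    core = set(range(1, d))          # [d-1]
    window = set(range(d, k + 2))    # [d, k+1]
    main = {
        frozenset(A)
        for A in combinations(range(1, n + 1), k)
        if core <= set(A) and window & set(A)
    }
    extra = {frozenset(range(1, k + 2)) - {i} for i in core}
    return main | extra


def family_A(n, k, d):
    """A(k,d): k-sets with at least d elements in [d+1]."""
    head = set(range(1, d + 2))      # [d+1]
    return {
        frozenset(A)
        for A in combinations(range(1, n + 1), k)
        if len(head & set(A)) >= d
    }


def is_nontrivial_d_wise_intersecting(family, d):
    """Every d distinct sets intersect, but the whole family has empty intersection."""
    sets = list(family)
    if frozenset.intersection(*sets):
        return False
    return all(frozenset.intersection(*g) for g in combinations(sets, d))


H = family_H(6, 3, 2)
A = family_A(6, 3, 2)
```

For $n = 6$, $k = 3$, $d = 2$ both families contain 10 sets, and both pass the check.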
Cyprien Ruffino, Romain Hérault, Eric Laloy, Gilles Gasso
Generative Adversarial Networks (GANs) have proven successful for unsupervised image generation. Several works have extended GANs to image inpainting by conditioning the generation on parts of the image one wants to reconstruct. However, these methods have limitations in settings where only a small subset of the image pixels is known beforehand. In this paper, we study the effectiveness of conditioning GANs by adding an explicit regularization term to enforce pixel-wise conditions when very few pixel values are provided. We also investigate the influence of this regularization term on the quality of the generated images and the satisfaction of the conditions. Experiments conducted on MNIST and FashionMNIST show evidence that this regularization term allows for controlling the trade-off between quality of the generated images and constraint satisfaction.
Tapas Kumar Mishra
Let $L = \{\frac{a_1}{b_1}, \ldots , \frac{a_s}{b_s}\}$, where for every $i
\in [s]$, $\frac{a_i}{b_i} \in [0,1)$ is an irreducible fraction. Let
$\mathcal{F} = \{A_1, \ldots , A_m\}$ be a family of subsets of $[n]$. We say
$\mathcal{F}$ is an \emph{r-wise fractional $L$-intersecting family} if for
every distinct $i_1,i_2, \ldots,i_r \in [m]$, there exists an $\frac{a}{b} \in
L$ such that $|A_{i_1} \cap A_{i_2} \cap \ldots \cap A_{i_r}| \in \{
\frac{a}{b}|A_{i_1}|, \frac{a}{b} |A_{i_2}|,\ldots, \frac{a}{b} |A_{i_r}| \}$.
In this paper, we introduce and study the notion of r-wise fractional
$L$-intersecting families. This is a generalization of the notion of fractional
$L$-intersecting families studied in [Niranjan et al., Fractional
$L$-intersecting families, The Electronic Journal of Combinatorics, 2019].
Authors' comments: 8 pages
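The definition in the abstract above translates into a direct brute-force check for small families. This is a minimal sketch with a hypothetical function name; exact rational arithmetic via `fractions.Fraction` avoids floating-point comparisons.

```python
from fractions import Fraction
from itertools import combinations


def is_r_wise_fractional_L_intersecting(family, L, r):
    """Check the r-wise fractional L-intersecting condition by enumeration.

    For every choice of r distinct sets, the size of their common
    intersection must equal (a/b) * |A| for some a/b in L and some
    A among the chosen sets.
    """
    sets = [frozenset(A) for A in family]
    for group in combinations(sets, r):
        size = len(frozenset.intersection(*group))
        allowed = {frac * len(A) for frac in L for A in group}
        if size not in allowed:
            return False
    return True


family = [{1, 2}, {2, 3}, {1, 3}]
pairwise_half = is_r_wise_fractional_L_intersecting(family, [Fraction(1, 2)], 2)
triple_half = is_r_wise_fractional_L_intersecting(family, [Fraction(1, 2)], 3)
triple_zero = is_r_wise_fractional_L_intersecting(family, [Fraction(0)], 3)
```

Any two of the three sets share exactly one element, which is half of each set's size, so the family satisfies the condition for $r = 2$ with $L = \{1/2\}$. The three sets have empty common intersection, so for $r = 3$ the condition holds with $L = \{0\}$ but not with $L = \{1/2\}$.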
Ricard Durall, Franz-Josef Pfreundt, Ullrich Köthe, Janis Keuper
Recent deep learning based approaches have shown remarkable success on object segmentation tasks. However, there is still room for further improvement. Inspired by generative adversarial networks, we present a generic end-to-end adversarial approach, which can be combined with a wide range of existing semantic segmentation networks to improve their segmentation performance. The key element of our method is to replace the commonly used binary adversarial loss with a high resolution pixel-wise loss. In addition, we train our generator with stochastic weight averaging, which further enhances the predicted output label maps, leading to state-of-the-art results. We show that this combination of pixel-wise adversarial training and weight averaging leads to significant and consistent gains in segmentation performance compared to the baseline models.
Raniere de Menezes, Harold A. Peña-Herazo, Ezequiel J. Marchesini, Raffaele D'Abrusco, Nicola Masetti, Rodrigo Nemmen, Francesco Massaro, Federica Ricci et al.
Over the last decade more than five thousand gamma-ray sources were detected
by the Large Area Telescope (LAT) on board the Fermi Gamma-ray Space Telescope.
Given the positional uncertainty of the telescope, nearly 30% of these sources
remain without an obvious counterpart in lower energies. This motivated the
release of new catalogs of gamma-ray counterpart candidates and several follow
up campaigns in the last decade. Recently, two new catalogs of blazar
candidates were released: the improved and expanded version of the
WISE Blazar-Like Radio-Loud Sources (WIBRaLS2) catalog and the Kernel Density
Estimation selected candidate BL Lacs (KDEBLLACS) catalog, both selecting
blazar-like sources based on their infrared colors from the Wide-field Infrared
Survey Explorer (WISE). In this work we characterized these two catalogs,
clarifying the true nature of their sources based on their optical spectra from
SDSS data release 15, thus testing how efficient they are in selecting true
blazars. We first selected all WIBRaLS2 and KDEBLLACS sources with available
optical spectra in the footprint of Sloan Digital Sky Survey data release 15.
Then we analyzed these spectra to verify the nature of each selected candidate
and see which fraction of the catalogs is composed of spectroscopically
confirmed blazars. Finally, we evaluated the impact of selection effects,
especially those related to optical colors of WIBRaLS2/KDEBLLACS sources and
their optical magnitude distributions. We found that at least ~ 30% of each
catalog is composed of confirmed blazars, with quasars being the major
contaminants in the case of WIBRaLS2 (~ 58%) and normal galaxies in the case of
KDEBLLACS (~ 38.2%). The spectral analysis also allowed us to identify the
nature of 11 blazar candidates of uncertain type (BCUs) from the Fermi-LAT 4th
Point Source Catalog (4FGL) and to find 25 new BL Lac objects.
Authors' comments: 11 pages, 11 figures
Sachin Mehta, Hannaneh Hajishirzi, Mohammad Rastegari
We introduce a novel and generic convolutional unit, DiCE unit, that is built
using dimension-wise convolutions and dimension-wise fusion. The dimension-wise
convolutions apply light-weight convolutional filtering across each dimension
of the input tensor while dimension-wise fusion efficiently combines these
dimension-wise representations, allowing the DiCE unit to efficiently encode
spatial and channel-wise information contained in the input tensor. The DiCE
unit is simple and can be seamlessly integrated with any architecture to
improve its efficiency and performance. Compared to depth-wise separable
convolutions, the DiCE unit shows significant improvements across different
architectures. When DiCE units are stacked to build the DiCENet model, we
observe significant improvements over state-of-the-art models across various
computer vision tasks including image classification, object detection, and
semantic segmentation. On the ImageNet dataset, the DiCENet delivers 2-4%
higher accuracy than state-of-the-art manually designed models (e.g.,
MobileNetv2 and ShuffleNetv2). Also, DiCENet generalizes better to tasks (e.g.,
object detection) that are often used in resource-constrained devices in
comparison to state-of-the-art separable convolution-based efficient networks,
including neural search-based methods (e.g., MobileNetv3 and MixNet). Our source
code in PyTorch is open-source and is available at
https://github.com/sacmehta/EdgeNets/
Authors' comments: Accepted at IEEE Transactions on Pattern Analysis and Machine
Intelligence (TPAMI)
Radhika Vasisht, Ruchi Das
In this paper, we study continuum-wise expansive non-autonomous discrete
dynamical systems. We discuss various properties of such non-autonomous
systems. We further obtain results for cw-expansive non-autonomous systems with
shadowing property and obtain an important equivalence.
\keywords{Non-autonomous dynamical systems, continuum-wise expansive,
shadowing property}
Authors' comments: Some results may not be correct
George Retsinas, Athena Elafrou, Georgios Goumas, Petros Maragos
In this paper, we introduce Channel-wise recurrent convolutional neural networks (RecNets), a family of novel, compact neural network architectures for computer vision tasks inspired by recurrent neural networks (RNNs). RecNets build upon Channel-wise recurrent convolutional (CRC) layers, a novel type of convolutional layer that splits the input channels into disjoint segments and processes them in a recurrent fashion. In this way, we simulate wide, yet compact models, since the number of parameters is vastly reduced via the parameter sharing of the RNN formulation. Experimental results on the CIFAR-10 and CIFAR-100 image classification tasks demonstrate the superior size-accuracy trade-off of RecNets compared to other compact state-of-the-art architectures.
Roger Behling, J. -Yunier Bello-Cruz, Luiz-Rafael Santos
The elementary Euclidean concept of circumcenter has recently been employed to improve two aspects of the classical Douglas--Rachford method for projecting onto the intersection of affine subspaces. The so-called circumcentered-reflection method is able to both accelerate the average reflection scheme by the Douglas--Rachford method and cope with the intersection of more than two affine subspaces. We now introduce the technique of circumcentering in blocks, which, more than just an option over the basic algorithm of circumcenters, turns out to be an elegant manner of generalizing the method of alternating projections. Linear convergence for this novel block-wise circumcenter framework is derived and illustrated numerically. Furthermore, we prove that the original circumcentered-reflection method essentially finds the best approximation solution in one single step if the given affine subspaces are hyperplanes.
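For context, the classical method of alternating projections, which the abstract above says the block-wise framework generalizes, can be sketched for the hyperplane case. This is not the circumcentered-reflection method or the block-wise scheme itself; it is only the baseline, written with hypothetical function names and a toy problem chosen for illustration.

```python
def project_onto_hyperplane(x, a, b):
    """Orthogonal projection of x onto the hyperplane {y : a . y = b}."""
    residual = sum(ai * xi for ai, xi in zip(a, x)) - b
    norm_sq = sum(ai * ai for ai in a)
    return [xi - residual / norm_sq * ai for ai, xi in zip(a, x)]


def alternating_projections(x0, hyperplanes, sweeps):
    """Cyclically project onto each hyperplane for a fixed number of sweeps."""
    x = list(x0)
    for _ in range(sweeps):
        for a, b in hyperplanes:
            x = project_onto_hyperplane(x, a, b)
    return x


# Toy problem in R^2: the lines x + y = 2 and x = 1 meet at (1, 1).
hyperplanes = [([1.0, 1.0], 2.0), ([1.0, 0.0], 1.0)]
x = alternating_projections([5.0, -3.0], hyperplanes, sweeps=60)
```

On this toy problem the iterates approach the intersection point (1, 1) linearly, with the error roughly halving on each sweep because the two lines meet at 45 degrees.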
Qian Lou, Feng Guo, Lantao Liu, Minje Kim, Lei Jiang
Network quantization is one of the most hardware friendly techniques to
enable the deployment of convolutional neural networks (CNNs) on low-power
mobile devices. Recent network quantization techniques quantize each weight
kernel in a convolutional layer independently for higher inference accuracy,
since the weight kernels in a layer exhibit different variances and hence have
different amounts of redundancy. The quantization bitwidth or bit number (QBN)
directly decides the inference accuracy, latency, energy and hardware overhead.
To effectively reduce the redundancy and accelerate CNN inferences, various
weight kernels should be quantized with different QBNs. However, prior works
use only one QBN to quantize each convolutional layer or the entire CNN,
because the design space of searching a QBN for each weight kernel is too
large. The hand-crafted heuristic of the kernel-wise QBN search is so
sophisticated that domain experts can obtain only sub-optimal results. It is
difficult even for deep reinforcement learning (DRL) agents based on Deep
Deterministic Policy Gradient (DDPG) to find a kernel-wise QBN configuration that can
achieve reasonable inference accuracy. In this paper, we propose a
hierarchical-DRL-based kernel-wise network quantization technique, AutoQ, to
automatically search a QBN for each weight kernel, and choose another QBN for
each activation layer. Compared to the models quantized by the state-of-the-art
DRL-based schemes, on average, the same models quantized by AutoQ reduce the
inference latency by 54.06\%, and decrease the inference energy consumption by
50.69\%, while achieving the same inference accuracy.
Authors' comments: 10 pages, 12 figures
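To make the idea of kernel-wise bitwidths concrete, here is a minimal sketch in which each weight kernel of a layer is quantized with its own QBN. The quantizer shown (symmetric uniform quantization scaled by the kernel's largest absolute weight) is an assumption chosen for illustration; the abstract does not specify the quantizer, and the search procedure that picks the QBNs is not shown. All function names are hypothetical.

```python
def quantize_kernel(weights, bits):
    """Symmetric uniform quantization of one kernel to `bits` bits (bits >= 2).

    Assumption: values are mapped to integer levels in
    [-(2**(bits-1) - 1), 2**(bits-1) - 1] and then rescaled.
    """
    levels = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


def quantize_layer(kernels, qbns):
    """Quantize each kernel with its own bitwidth (one QBN per kernel)."""
    return [quantize_kernel(k, b) for k, b in zip(kernels, qbns)]


# Two flattened toy kernels: the first keeps 8 bits, the second only 2.
kernels = [[0.5, -0.25, 0.125], [1.0, -1.0, 0.3]]
quantized = quantize_layer(kernels, qbns=[8, 2])
```

With 2 bits the second kernel collapses to the values -1, 0, and 1, while the 8-bit kernel stays close to its original weights.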
Dániel Gerbner, Dániel T. Nagy, Balázs Patkós, Máté Vizer
In many proofs concerning extremal parameters of Berge hypergraphs, one starts
by analyzing the part of the shadow graph that is contained in many
hyperedges. Capturing this phenomenon we introduce two new types of
hypergraphs. A hypergraph $\mathcal{H}$ is a $t$-heavy copy of a graph $F$ if
there is a copy of $F$ on its vertex set such that each edge of $F$ is
contained in at least $t$ hyperedges of $\mathcal{H}$. $\mathcal{H}$ is a
$t$-wise Berge copy of $F$ if additionally for distinct edges of $F$ those $t$
hyperedges are distinct.
We extend known upper bounds on the Tur\'an number of Berge hypergraphs to
the $t$-wise Berge hypergraphs case. We asymptotically determine the Tur\'an
number of $t$-heavy and $t$-wise Berge copies of long paths and cycles and
exactly determine the Tur\'an number of $t$-heavy and $t$-wise Berge copies of
cliques.
In the case of 3-uniform hypergraphs, we consider the problem in more detail
and obtain additional results.
Authors' comments: 20 pages
Yingquan Wu, Eyal En Gad
In this paper we comprehensively investigate block-wise product (BWP) BCH
codes, wherein raw data is arranged in the form of a block-wise matrix and each
row and column BCH codes intersect on one data block. We first devise efficient
BCH decoding algorithms, including reduced-1-bit decoding, extra-1-bit list
decoding, and extra-2-bit list decoding. We next present a systematic
construction of BWP-BCH codes upon given message and parity lengths that takes
into account performance, implementation and scalability, rather than
focusing on a regularly defined BWP-BCH code. It can easily accommodate
different message lengths or parity lengths with minimal changes. It employs
extended BCH codes instead of BCH codes to reduce miscorrection rate and an
inner RS code to lower error floor. We also describe a high-speed scalable
encoder. We finally present a novel iterative decoding algorithm which is
divided into three phases. The first phase iteratively applies reduced BCH
correction capabilities to correct lightly corrupted rows/columns while
suppressing miscorrection, until the process stalls. The second phase
iteratively decodes up to the designed correction capabilities, until the
process stalls. The last phase iteratively applies the proposed list decoding
in a novel manner which effectively determines the correct candidate. The key
idea is to use cross decoding upon each list candidate to pick the candidate
which enables the maximum number of successful cross decodings. Our simulations
show that the proposed algorithm provides a significant performance boost
compared to the state-of-the-art algorithms.
Authors' comments: Submitted to IEEE trans. Info. Theory
Hyungtae Lee, Heesung Kwon, Wonkook Kim
Pixel-wise classification in remote sensing identifies entities in
large-scale satellite-based images at the pixel level. Few fully annotated
large-scale datasets for pixel-wise classification exist due to the challenges
of annotating individual pixels. Training data scarcity inevitably ensues from
the annotation challenge, leading to overfitting classifiers and degraded
classification performance. The lack of annotated pixels also necessarily
results in few hard examples of various entities critical for generating a
robust classification hyperplane. To overcome the problem of the data scarcity
and lack of hard examples in training, we introduce a two-step hard example
generation (HEG) approach that first generates hard example candidates and then
mines actual hard examples. In the first step, a generator that creates hard
example candidates is learned via the adversarial learning framework by fooling
a discriminator and a pixel-wise classification model at the same time. In the
second step, mining is performed to build a fixed number of hard examples from
a large pool of real and artificially generated examples. To evaluate the
effectiveness of the proposed HEG approach, we design a 9-layer fully
convolutional network suitable for pixel-wise classification. Experiments show
that using generated hard examples from the proposed HEG approach improves the
pixel-wise classification model's accuracy on red tide detection and
hyperspectral image classification tasks.
Authors' comments: IEEE Journal of Selected Topics in Applied Earth Observations and
Remote Sensing (JSTARS)
Daichi Ono, Hiroyuki Yabe, Tsutomu Horikawa
We propose a method that uses only computer graphics datasets to parse real-world 3D scenes. 3D scene parsing based on semantic segmentation is required to implement categorical interaction in the virtual world. Convolutional Neural Networks (CNNs) have recently shown state-of-the-art performance on computer vision tasks including semantic segmentation. However, training CNNs requires collecting and annotating a huge amount of data. Especially in the case of semantic segmentation, annotating pixel by pixel takes a significant amount of time and is error-prone. In contrast, computer graphics can generate a large amount of accurately annotated data and easily scale up by changing camera positions, textures and lights. Despite these advantages, models trained on computer graphics datasets do not perform well on real data, a problem known as domain shift. To address this issue, we first show that the depth modality and synthetic noise are effective in reducing the domain shift. Then, we develop a class-wise adaptation that obtains domain-invariant features of CNNs. To reduce the domain shift, we create computer graphics rooms with many props and provide photo-realistic rendered images. We also demonstrate an application that combines semantic segmentation with Simultaneous Localization and Mapping (SLAM). Our application performs accurate 3D scene parsing in real time on an actual room.
Sergio Vitale, Davide Cozzolino, Giuseppe Scarpa, Luisa Verdoliva, Giovanni Poggi
We propose a new method for SAR image despeckling which leverages information drawn from co-registered optical imagery. Filtering is performed by plain patch-wise nonlocal means, operating exclusively on SAR data. However, the filtering weights are computed by taking into account also the optical guide, which is much cleaner than the SAR data, and hence more discriminative. To avoid injecting optical-domain information into the filtered image, a SAR-domain statistical test is preliminarily performed to reject right away any risky predictor. Experiments on two SAR-optical datasets show that the proposed method suppresses speckle very effectively, preserving structural details without introducing visible filtering artifacts. Overall, the proposed method compares favourably with all state-of-the-art despeckling filters, and also with our own previous optical-guided filter.
Aadirupa Saha, Aditya Gopalan
We consider the problem of probably approximately correct (PAC) ranking $n$
items by adaptively eliciting subset-wise preference feedback. At each round,
the learner chooses a subset of $k$ items and observes stochastic feedback
indicating preference information of the winner (most preferred) item of the
chosen subset drawn according to a Plackett-Luce (PL) subset choice model
unknown a priori. The objective is to identify an $\epsilon$-optimal ranking of
the $n$ items with probability at least $1 - \delta$. When the feedback in each
subset round is a single Plackett-Luce-sampled item, we show $(\epsilon,
\delta)$-PAC algorithms with a sample complexity of
$O\left(\frac{n}{\epsilon^2} \ln \frac{n}{\delta} \right)$ rounds, which we
establish as being order-optimal by exhibiting a matching sample complexity
lower bound of $\Omega\left(\frac{n}{\epsilon^2} \ln \frac{n}{\delta}
\right)$---this shows that there is essentially no improvement possible from
the pairwise comparisons setting ($k = 2$). When, however, it is possible to
elicit top-$m$ ($\leq k$) ranking feedback according to the PL model from each
adaptively chosen subset of size $k$, we show that an $(\epsilon, \delta)$-PAC
ranking sample complexity of $O\left(\frac{n}{m \epsilon^2} \ln
\frac{n}{\delta} \right)$ is achievable with explicit algorithms, which
represents an $m$-wise reduction in sample complexity compared to the pairwise
case. This again turns out to be order-wise unimprovable across the class of
symmetric ranking algorithms. Our algorithms rely on a novel pivot trick to
maintain only $n$ itemwise score estimates, unlike the $O(n^2)$ pairwise score
estimates that have been used in prior work. We report results of numerical
experiments that corroborate our findings.
Authors' comments: In 22nd International Conference on Artificial Intelligence and
Statistics (AISTATS), 2019. (44 pages, 8 figures). arXiv admin note: text
overlap with arXiv:1808.04008
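The two feedback types in the abstract above, winner feedback and top-$m$ ranking feedback under a Plackett-Luce model, can be simulated with a short sketch. It assumes the standard PL parametrization in which each item has a positive score and is chosen from a subset with probability proportional to that score. The names `theta`, `pl_winner`, and `pl_top_m` are hypothetical, and the ranking algorithms themselves are not shown.

```python
import random


def pl_winner(subset, theta, rng):
    """Sample the winner of `subset`: item i wins w.p. theta[i] / sum of theta over subset."""
    weights = [theta[i] for i in subset]
    return rng.choices(subset, weights=weights, k=1)[0]


def pl_top_m(subset, theta, m, rng):
    """Sample a top-m ranking by repeatedly drawing a winner without replacement."""
    remaining = list(subset)
    ranking = []
    for _ in range(m):
        winner = pl_winner(remaining, theta, rng)
        ranking.append(winner)
        remaining.remove(winner)
    return ranking


rng = random.Random(0)
theta = {0: 4.0, 1: 2.0, 2: 1.0, 3: 1.0}   # toy PL scores
items = [0, 1, 2, 3]

# Item 0 holds half of the total score, so it should win about half the rounds.
wins = sum(pl_winner(items, theta, rng) == 0 for _ in range(20000))
top2 = pl_top_m(items, theta, 2, rng)
```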
Carlos-Emiliano González-Gallardo, Juan-Manuel Torres-Moreno
Sentence Boundary Detection (SBD) has been a major research topic since
Automatic Speech Recognition transcripts have been used for further Natural
Language Processing tasks like Part of Speech Tagging, Question Answering or
Automatic Summarization. But what about evaluation? Are standard evaluation
metrics like precision, recall, F-score or classification error, and, more
importantly, evaluation of an automatic system against a unique reference, enough
to conclude how well an SBD system is performing given the final application of
the transcript? In this paper we propose Window-based Sentence Boundary
Evaluation (WiSeBE), a semi-supervised metric for evaluating Sentence Boundary
Detection systems based on multi-reference (dis)agreement. We evaluate and
compare the performance of different SBD systems over a set of YouTube
transcripts using WiSeBE and standard metrics. This double evaluation gives an
understanding of how WiSeBE is a more reliable metric for the SBD task.
Authors' comments: In proceedings of the 17th Mexican International Conference on
Artificial Intelligence (MICAI), 2018
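The standard single-reference metrics that the abstract above questions can be computed as follows. This sketch covers only precision, recall, and F-score against one reference, not WiSeBE itself. It assumes boundaries are represented as token positions, which is an illustrative choice, and the function name is hypothetical.

```python
def boundary_scores(predicted, reference):
    """Precision, recall, and F1 of predicted boundary positions against one reference."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: 2 of 4 predicted boundaries match the 3 reference boundaries.
p, r, f = boundary_scores(predicted=[4, 9, 15, 20], reference=[4, 10, 15])
```

A prediction one token away from a reference boundary (9 versus 10 here) counts as a complete miss under these metrics, and scoring against a different annotator's reference could change the result. That sensitivity is the kind of limitation a multi-reference metric is meant to address.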
Homanga Bharadhwaj
In this paper, we tackle the problem of explanations in a deep-learning based
model for recommendations by leveraging the technique of layer-wise relevance
propagation. We use a Deep Convolutional Neural Network to extract relevant
features from the input images before identifying similarity between the images
in feature space. Relationships between the images are identified by the model
and layer-wise relevance propagation is used to infer pixel-level details of
the images that may have significantly informed the model's choice. We evaluate
our method on an Amazon products dataset and demonstrate the efficacy of our
approach.
Authors' comments: Accepted in Proceedings of the EARS Workshop at SIGIR 2018