Sukmin Yun, Jongjin Park, Kimin Lee, Jinwoo Shin
Deep neural networks with millions of parameters may suffer from poor
generalization due to overfitting. To mitigate the issue, we propose a new
regularization method that penalizes the predictive distribution between
similar samples. In particular, we distill the predictive distribution between
different samples of the same label during training. This results in
regularizing the dark knowledge (i.e., the knowledge on wrong predictions) of a
single network (i.e., a self-knowledge distillation) by forcing it to produce
more meaningful and consistent predictions in a class-wise manner.
Consequently, it mitigates overconfident predictions and reduces intra-class
variations. Our experimental results on various image classification tasks
demonstrate that the simple yet powerful method can significantly improve not
only the generalization ability but also the calibration performance of modern
convolutional neural networks.
Authors' comments: Accepted to CVPR 2020. Code is available at
https://github.com/alinlab/cs-kd
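The class-wise regularizer described in this abstract can be sketched roughly as follows. This is a minimal illustration, assuming a temperature-softened softmax and a KL-divergence penalty between the predictions for two samples that share a label. The function names, the temperature value, the T^2 scaling, and the example logits are illustrative assumptions, not taken from the released code.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def class_wise_penalty(logits_a, logits_b, temperature=4.0):
    """Penalty between predictions for two samples with the same label.

    logits_b plays the role of a fixed target (in a real training loop no
    gradient would flow through it). The T^2 factor is a common
    distillation convention and is an assumption here.
    """
    target = softmax(logits_b, temperature)
    pred = softmax(logits_a, temperature)
    return temperature ** 2 * kl_divergence(target, pred)

# Hypothetical logits for two images of the same class.
same = class_wise_penalty([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
diff = class_wise_penalty([2.0, 0.5, -1.0], [2.0, -1.0, 0.5])
```

In this sketch the penalty vanishes when both samples yield identical predictions and grows as their distributions over the wrong classes diverge, which is the sense in which the "dark knowledge" is made consistent within a class.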
Qihang Yu, Yingwei Li, Jieru Mei, Yuyin Zhou, Alan L. Yuille
3D Convolution Neural Networks (CNNs) have been widely applied to 3D scene
understanding, such as video analysis and volumetric image recognition.
However, 3D networks can easily lead to over-parameterization which incurs
expensive computation cost. In this paper, we propose Channel-wise Automatic
KErnel Shrinking (CAKES), to enable efficient 3D learning by shrinking standard
3D convolutions into a set of economical operations, e.g., 1D and 2D convolutions.
Unlike previous methods, CAKES performs channel-wise kernel shrinkage, which
enjoys the following benefits: 1) enabling operations deployed in every layer
to be heterogeneous, so that they can extract diverse and complementary
information to benefit the learning process; and 2) allowing for an efficient
and flexible replacement design, which can be generalized to both
spatial-temporal and volumetric data. Further, we propose a new search space
based on CAKES, so that the replacement configuration can be determined
automatically for simplifying 3D networks. CAKES shows superior performance to
other methods with similar model size, and it also achieves comparable
performance to the state of the art with far fewer parameters and lower computational
costs on tasks including 3D medical imaging segmentation and video action
recognition. Codes and models are available at
https://github.com/yucornetto/CAKES
Authors' comments: AAAI 2021
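To make the parameter savings concrete, here is a small arithmetic sketch. It assumes that each output channel of a layer is assigned exactly one kernel shape and that biases are ignored; the channel counts and the 50/50 split between 2D and 1D kernels are hypothetical and are not the configuration found by the paper's search.

```python
def conv_params(in_channels, kernel_shapes):
    """Weight count for a layer where each output channel has its own
    (time, height, width) kernel shape. Biases are ignored."""
    return sum(in_channels * kt * kh * kw for kt, kh, kw in kernel_shapes)

# Standard layer: 64 output channels, all with a full 3x3x3 kernel.
full_3d = conv_params(64, [(3, 3, 3)] * 64)

# Hypothetical heterogeneous layer: half 2D (1x3x3), half 1D (3x1x1).
mixed = conv_params(64, [(1, 3, 3)] * 32 + [(3, 1, 1)] * 32)

reduction = full_3d / mixed
```

Under these assumptions the standard layer holds 110592 weights and the mixed layer 24576, a 4.5x reduction.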
Aliaksei L. Petsiuk, Joshua M. Pearce
The paper describes an open source computer vision-based hardware structure
and software algorithm, which analyzes layer-wise the 3-D printing processes,
tracks printing errors, and generates appropriate printer actions to improve
reliability. This approach is built upon multiple-stage monocular image
examination, which allows monitoring both the external shape of the printed
object and internal structure of its layers. Starting with the side-view height
validation, the developed program analyzes the virtual top view for outer shell
contour correspondence using the multi-template matching and iterative closest
point algorithms, as well as inner layer texture quality, by clustering the
spatial-frequency filter responses with Gaussian mixture models and segmenting
structural anomalies with the agglomerative hierarchical clustering algorithm.
This allows evaluation of both global and local parameters of the printing
modes. The experimentally-verified analysis time per layer is less than one
minute, which can be considered a quasi-real-time process for large prints. The
system can work as an intelligent printing suspension tool designed to save
time and material. However, the results show the algorithm provides a means to
systematize in situ printing data as a first step in a fully open source
failure correction algorithm for additive manufacturing.
Authors' comments: 29 pages, 19 figures
Khoa D. Doan, Saurav Manchanda, Sarkhan Badirli, Chandan K. Reddy
Image hashing is one of the fundamental problems that demand both efficient and effective solutions for various practical scenarios. Adversarial autoencoders are shown to be able to implicitly learn a robust, locality-preserving hash function that generates balanced and high-quality hash codes. However, the existing adversarial hashing methods are too inefficient to be employed in large-scale image retrieval applications. Specifically, they require an exponential number of samples to be able to generate optimal hash codes and a significantly high computational cost to train. In this paper, we show that the high sample-complexity requirement often results in sub-optimal retrieval performance of the adversarial hashing methods. To address this challenge, we propose a new adversarial-autoencoder hashing approach that has a much lower sample requirement and computational cost. Specifically, by exploiting the desired properties of the hash function in the low-dimensional, discrete space, our method efficiently estimates a better variant of Wasserstein distance by averaging a set of easy-to-compute one-dimensional Wasserstein distances. The resulting hashing approach has an order-of-magnitude better sample complexity, and thus a better generalization property, compared to the other adversarial hashing methods. In addition, the computational cost is significantly reduced using our approach. We conduct experiments on several real-world datasets and show that the proposed method outperforms the competing hashing methods, achieving up to 10% improvement over the current state-of-the-art image hashing methods. The code accompanying this paper is available on GitHub (https://github.com/khoadoan/adversarial-hashing).
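The idea of averaging one-dimensional Wasserstein distances can be illustrated with the sketch below. It assumes that the one-dimensional problems are obtained by projecting both samples onto random unit directions (the sliced-Wasserstein construction) and that both samples have the same size; the abstract does not specify how the one-dimensional distances are formed, so this is an assumption, and all names are hypothetical.

```python
import math
import random

def wasserstein_1d(xs, ys):
    """W1 distance between two equal-size 1D samples: sort both and
    match points in order."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def random_unit_vector(dim, rng):
    """Direction drawn uniformly from the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def averaged_1d_wasserstein(sample_a, sample_b, num_projections=50, seed=0):
    """Average of 1D Wasserstein distances over random projections."""
    rng = random.Random(seed)
    dim = len(sample_a[0])
    total = 0.0
    for _ in range(num_projections):
        d = random_unit_vector(dim, rng)
        proj_a = [sum(x * w for x, w in zip(p, d)) for p in sample_a]
        proj_b = [sum(x * w for x, w in zip(p, d)) for p in sample_b]
        total += wasserstein_1d(proj_a, proj_b)
    return total / num_projections

# Toy 2D "codes" and a shifted copy, purely for illustration.
codes = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shifted = [[x + 3.0, y] for x, y in codes]
d_same = averaged_1d_wasserstein(codes, codes)
d_shift = averaged_1d_wasserstein(codes, shifted)
```

Each one-dimensional distance costs only a sort, which is why such an average is cheap compared with solving a full optimal-transport problem in the original space.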
Tomáš Dlask, Tomáš Werner
Coordinate-wise minimization is a simple popular method for large-scale
optimization. Unfortunately, for general (non-differentiable) convex problems
it may not find global minima. We present a class of linear programs that
coordinate-wise minimization solves exactly. We show that dual LP relaxations
of several well-known combinatorial optimization problems are in this class and
the method finds a global minimum with sufficient accuracy in reasonable
runtimes. Moreover, for extensions of these problems that no longer are in this
class the method yields reasonably good suboptima. Though the presented LP
relaxations can be solved by more efficient methods (such as max-flow), our
results are theoretically non-trivial and can lead to new large-scale
optimization algorithms in the future.
Authors' comments: The final authenticated version is available online at
https://doi.org/10.1007/978-3-030-53552-0_8
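The failure mode mentioned in this abstract, where coordinate-wise minimization stalls on a non-differentiable convex function, can be reproduced with a toy example. The objective below is an illustrative choice and does not come from the paper; the one-dimensional minimizer uses a simple ternary search, which is valid here because each restriction to a coordinate is convex.

```python
def f(x, y):
    """Convex but non-differentiable objective (illustrative choice)."""
    return abs(x - y) + 0.5 * abs(x + y)

def argmin_1d(g, lo=-10.0, hi=10.0, iters=200):
    """Ternary search for the minimizer of a convex 1D function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) <= g(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def coordinate_minimization(x, y, sweeps=20):
    """Alternately minimize f exactly over x, then over y."""
    for _ in range(sweeps):
        x = argmin_1d(lambda t: f(t, y))
        y = argmin_1d(lambda t: f(x, t))
    return x, y

x_cd, y_cd = coordinate_minimization(1.0, 1.0)
stuck_value = f(x_cd, y_cd)   # stays near f(1, 1) = 1
global_value = f(0.0, 0.0)    # the true minimum is 0
```

Starting from (1, 1), neither coordinate can be improved on its own, so the iterates never leave the neighbourhood of that point even though the global minimum lies at the origin.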
Hung-Yu Tseng, Hsin-Ying Lee, Jia-Bin Huang, Ming-Hsuan Yang
Few-shot classification aims to recognize novel categories with only a few
labeled images in each class. Existing metric-based few-shot classification
algorithms predict categories by comparing the feature embeddings of query
images with those from a few labeled images (support examples) using a learned
metric function. While promising performance has been demonstrated, these
methods often fail to generalize to unseen domains due to the large discrepancy in
the feature distribution across domains. In this work, we address the problem
of few-shot classification under domain shifts for metric-based methods. Our
core idea is to use feature-wise transformation layers for augmenting the image
features using affine transforms to simulate various feature distributions
under different domains in the training stage. To capture variations of the
feature distributions under different domains, we further apply a
learning-to-learn approach to search for the hyper-parameters of the
feature-wise transformation layers. We conduct extensive experiments and
ablation studies under the domain generalization setting using five few-shot
classification datasets: mini-ImageNet, CUB, Cars, Places, and Plantae.
Experimental results demonstrate that the proposed feature-wise transformation
layer is applicable to various metric-based models, and provides consistent
improvements on the few-shot classification performance under domain shift.
Authors' comments: ICLR 2020 (Spotlight). Project page:
http://vllab.ucmerced.edu/ym41608/projects/CrossDomainFewShot Code:
https://github.com/hytseng0509/CrossDomainFewShot
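A feature-wise transformation layer of the kind described in this abstract can be sketched as a per-channel affine map with randomly sampled scale and bias. The sampling scheme below (Gaussian scale around 1 and bias around 0, with standard deviations given by a softplus of learnable hyper-parameters) is an assumption for illustration, as are all function and parameter names.

```python
import math
import random

def softplus(x):
    """Smooth positive mapping used here for the standard deviations."""
    return math.log1p(math.exp(x))

def feature_wise_transform(features, theta_gamma, theta_beta, rng):
    """Apply a randomly sampled per-channel affine transform.

    features: list of channels, each a list of activation values.
    theta_gamma / theta_beta: per-channel hyper-parameters controlling the
    spread of the sampled scale and bias (sampling scheme is an assumption).
    """
    out = []
    for channel, tg, tb in zip(features, theta_gamma, theta_beta):
        gamma = rng.gauss(1.0, softplus(tg))  # scale sampled around 1
        beta = rng.gauss(0.0, softplus(tb))   # bias sampled around 0
        out.append([gamma * v + beta for v in channel])
    return out

rng = random.Random(0)
features = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
augmented = feature_wise_transform(features, [0.0, 0.0], [0.0, 0.0], rng)
```

In a training pipeline this perturbation would be applied only during the training stage; the hyper-parameters that set the spread are what the learning-to-learn procedure would tune.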
Koki Madono, Masayuki Tanaka, Masaki Onishi, Tetsuji Ogawa
In this study, a perceptually hidden object-recognition method is
investigated to generate secure images recognizable by humans but not machines.
Hence, both the perceptual information hiding and the corresponding object
recognition methods should be developed. Block-wise image scrambling is
introduced to hide perceptual information from a third party. In addition, an
adaptation network is proposed to recognize those scrambled images.
Experimental comparisons conducted using CIFAR datasets demonstrated that the
proposed adaptation network performed well in incorporating simple perceptual
information hiding into DNN-based image classification.
Authors' comments: 6 pages, Artificial Intelligence of Things (AAAI-2020 WS)
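Block-wise scrambling can be illustrated with the sketch below, which takes it to mean permuting the positions of fixed-size blocks using a secret key. The abstract does not spell out the exact scrambling operations, so this interpretation, the helper names, and the assumption that the image dimensions are divisible by the block size are all illustrative.

```python
import random

def split_blocks(image, block):
    """Split a 2D list image into block x block tiles in row-major order."""
    h, w = len(image), len(image[0])
    return [[row[c:c + block] for row in image[r:r + block]]
            for r in range(0, h, block) for c in range(0, w, block)]

def join_blocks(blocks, h, w, block):
    """Reassemble row-major tiles into an h x w image."""
    image = [[None] * w for _ in range(h)]
    per_row = w // block
    for idx, tile in enumerate(blocks):
        r0, c0 = (idx // per_row) * block, (idx % per_row) * block
        for dr, tile_row in enumerate(tile):
            image[r0 + dr][c0:c0 + block] = tile_row
    return image

def scramble(image, block, key):
    """Permute block positions with a permutation derived from the key."""
    h, w = len(image), len(image[0])
    blocks = split_blocks(image, block)
    perm = list(range(len(blocks)))
    random.Random(key).shuffle(perm)
    return join_blocks([blocks[i] for i in perm], h, w, block), perm

def unscramble(scrambled, block, perm):
    """Invert scramble() given the permutation."""
    h, w = len(scrambled), len(scrambled[0])
    blocks = split_blocks(scrambled, block)
    restored = [None] * len(blocks)
    for pos, src in enumerate(perm):
        restored[src] = blocks[pos]
    return join_blocks(restored, h, w, block)

image = [[r * 4 + c for c in range(4)] for r in range(4)]
scrambled, perm = scramble(image, block=2, key=42)
```

The pixel values are preserved and only their arrangement changes, so a holder of the key can invert the operation exactly.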
S. J. Curran
Machine learning techniques, specifically the k-nearest neighbour algorithm
applied to optical band colours, have had some success in predicting
photometric redshifts of quasi-stellar objects (QSOs): Although the mean of
differences between the spectroscopic and photometric redshifts is close to
zero, the distribution of these differences remains wide and distinctly
non-Gaussian. As per our previous empirical estimate of photometric redshifts,
we find that the predictions can be significantly improved by adding colours
from other wavebands, namely the near-infrared and ultraviolet. Self-testing
this, by using half of the 33 643 strong QSO sample to train the algorithm,
results in a significantly narrower spread for the remaining half of the
sample. Using the whole QSO sample to train the algorithm, the same set of
magnitudes returns a similar spread for a sample of radio sources (quasars).
Although the matching coincidence is relatively low (739 of the 3663 sources
having photometry in the relevant bands), this is still significantly larger
than from the empirical method (2%) and thus may provide a method with which to
obtain redshifts for the vast number of continuum radio sources expected to be
detected with the next generation of large radio telescopes.
Authors' comments: Accepted by MNRAS
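The k-nearest neighbour prediction step can be sketched as below. It assumes Euclidean distance in colour space and a plain mean over the k nearest training objects; the toy colours and redshifts are made up for illustration and are not data from the paper.

```python
import math

def knn_predict(train_colours, train_redshifts, query, k=3):
    """Predict a redshift as the mean over the k nearest training objects."""
    dists = sorted(
        (math.dist(colours, query), z)
        for colours, z in zip(train_colours, train_redshifts)
    )
    nearest = dists[:k]
    return sum(z for _, z in nearest) / len(nearest)

# Made-up two-colour training set with known (spectroscopic) redshifts.
train_colours = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
train_redshifts = [1.0, 1.2, 1.1, 3.0]
z_phot = knn_predict(train_colours, train_redshifts, [0.0, 0.0], k=3)
```

Adding colours from further wavebands, as the abstract describes, corresponds here to lengthening each colour vector, which changes which training objects count as nearest.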
Dan Liu, Libo Zhang, Tiejian Luo, Lili Tao, Yanjun Wu
The lack of interpretability of existing CNN-based hand detection methods
makes it difficult to understand the rationale behind their predictions. In
this paper, we propose a novel neural network model, which introduces
interpretability into hand detection for the first time. The main improvements
include: (1) Detect hands at pixel level to explain which pixels are the basis
for its decision and improve transparency of the model. (2) The explainable
Highlight Feature Fusion block highlights distinctive features among multiple
layers and learns discriminative ones to gain robust performance. (3) We
introduce a transparent representation, the rotation map, to learn rotation
features instead of complex and non-transparent rotation and derotation layers.
(4) Auxiliary supervision accelerates the training process, which saves more
than 10 hours in our experiments. Experimental results on the VIVA and Oxford
hand detection and tracking datasets show competitive accuracy of our method
compared with state-of-the-art methods with higher speed.
Authors' comments: Accepted to Pattern Recognition
Xin Zhou, Dejing Dou, Boyang Li
Search space is a key consideration for neural architecture search. Recently, Xie et al. (2019) found that randomly generated networks from the same distribution perform similarly, which suggests we should search for random graph distributions instead of graphs. We propose the graphon as a new search space. A graphon is the limit of a Cauchy sequence of graphs and a scale-free probabilistic distribution, from which graphs with different numbers of nodes can be drawn. By utilizing properties of the graphon space and the associated cut-distance metric, we develop theoretically motivated techniques that search for and scale up small-capacity stage-wise graphs found on small datasets to large-capacity graphs that can handle ImageNet. The scaled stage-wise graphs outperform DenseNet and randomly wired Watts-Strogatz networks, indicating the benefits of graphon theory in NAS applications.
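Drawing graphs of different sizes from one graphon follows the standard sampling construction: assign each node a uniform latent position in [0, 1] and connect each pair independently with probability given by the graphon at their positions. The sketch below shows that construction; the example graphon W(x, y) = x * y is an arbitrary illustrative choice, not one used in the paper.

```python
import random

def sample_graph_from_graphon(W, n, rng):
    """Sample an n-node undirected graph from a symmetric graphon
    W: [0, 1]^2 -> [0, 1]. Returns a set of edges (i, j) with i < j."""
    u = [rng.random() for _ in range(n)]  # latent node positions
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < W(u[i], u[j]):
                edges.add((i, j))
    return edges

# The same graphon yields graphs with any number of nodes.
small = sample_graph_from_graphon(lambda x, y: x * y, 8, random.Random(0))
large = sample_graph_from_graphon(lambda x, y: x * y, 32, random.Random(0))
```

Because the node count is a free argument of the sampler, a graphon found with small graphs can be reused to draw larger ones, which is the property the scaling procedure relies on.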
Seokju Lee, Sunghoon Im, Stephen Lin, In So Kweon
We present an end-to-end joint training framework that explicitly models
6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular
camera setup without supervision. Our technical contributions are three-fold.
First, we propose a differentiable forward rigid projection module that plays a
key role in our instance-wise depth and motion learning. Second, we design an
instance-wise photometric and geometric consistency loss that effectively
decomposes background and moving object regions. Lastly, we introduce a new
auto-annotation scheme to produce video instance segmentation maps that will be
utilized as input to our training pipeline. These proposed elements are
validated in a detailed ablation study. Through extensive experiments conducted
on the KITTI dataset, our framework is shown to outperform the state-of-the-art
depth and motion estimation methods. Our code and dataset will be available at
https://github.com/SeokjuLee/Insta-DM.
Authors' comments: Project page at https://sites.google.com/site/seokjucv/home/instadm
Bingqing Xie, Pei Niu, Ting Su, Valérie Kaftandjian, Loic Boussel, Philippe Douek, Feng Yang, Philippe Duvauchelle, Yuemin Zhu
Spectral photon-counting X-ray CT (sCT) opens up new possibilities for the quantitative measurement of materials in an object, compared to conventional energy-integrating CT or dual energy CT. However, achieving reliable and accurate material decomposition in sCT is extremely challenging, due to the similarity between different basis materials, strong quantum noise and photon-counting detector limitations. We propose a novel material decomposition method that works in a region-wise manner. The method consists of optimizing basis materials based on spatio-energy segmentation of regions of interest (ROIs) in sCT images and performing a fine material decomposition involving an optimized decomposition matrix and sparsity regularization. The effectiveness of the proposed method was validated on both digital and physical data. The results showed that the proposed ROI-wise material decomposition method presents clearly higher reliability and accuracy compared to common decomposition methods based on total variation (TV) or L1-norm (lasso) regularization.
Sebastian Guendel, Andreas Maier
The current accessibility to large medical datasets for training
convolutional neural networks is tremendously high. The associated dataset
labels are always considered to be the real "ground truth". However, the
labeling procedures often seem to be inaccurate and many wrong labels are
integrated. This may have fatal consequences on the performance of both
training and evaluation. In this paper, we show the impact of label noise in
the training set on a specific medical problem based on chest X-ray images.
With a simple one-class problem, the classification of tuberculosis, we measure
the performance on a clean evaluation set when training with label-corrupt
data. We develop a method to cope with incorrectly labeled data during
training by randomly attacking labels on individual epochs. The network tends
to be robust when flipping correct labels for a single epoch and initiates a
good step to the optimal minimum on the error surface when flipping noisy
labels. On a baseline with an AUC (Area under Curve) score of 0.924, the
performance drops to 0.809 when 30% of our training data is misclassified. With
our approach, the baseline performance could almost be maintained: the
performance rose to 0.918.
Authors' comments: Accepted at BVM 2020
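The per-epoch label attack can be sketched as follows, assuming binary labels and a fixed fraction of labels flipped afresh in every epoch. The flip fraction, the function name, and the toy label list are hypothetical; the abstract does not state how many labels are flipped.

```python
import random

def flip_labels_for_epoch(labels, flip_fraction, rng):
    """Return a copy of binary labels with a random subset flipped.

    The original list is left untouched so that each epoch draws a new
    subset from the same stored labels.
    """
    n_flip = int(round(flip_fraction * len(labels)))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]

rng = random.Random(0)
labels = [0, 1] * 50
for epoch in range(3):
    epoch_labels = flip_labels_for_epoch(labels, 0.1, rng)
    # A real loop would train on epoch_labels for this epoch only.
```

Since a flipped label lasts for a single epoch, a correct label is only briefly corrupted, while a noisy label has a chance of being corrected for that epoch.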
Tejus Gupta, Abhishek Sinha, Nupur Kumari, Mayank Singh, Balaji Krishnamurthy
We present an algorithm for computing class-specific universal adversarial perturbations for deep neural networks. Such perturbations can induce misclassification in a large fraction of images of a specific class. Unlike previous methods that use iterative optimization for computing a universal perturbation, the proposed method employs a perturbation that is a linear function of weights of the neural network and hence can be computed much faster. The method does not require any training data and has no hyper-parameters. The attack obtains 34% to 51% fooling rate on state-of-the-art deep neural networks on ImageNet and transfers across models. We also study the characteristics of the decision boundaries learned by standard and adversarially trained models to understand the universal adversarial perturbations.
Yousef Atoum, Mao Ye, Liu Ren, Ying Tai, Xiaoming Liu
Absence of nearby light sources while capturing an image will degrade the
visibility and quality of the captured image, making computer vision tasks
difficult. In this paper, a color-wise attention network (CWAN) is proposed for
low-light image enhancement based on convolutional neural networks. Motivated
by the human visual system when looking at dark images, CWAN learns an
end-to-end mapping between low-light and enhanced images while searching for
any useful color cues in the low-light image to aid in the color enhancement
process. Once these regions are identified, CWAN attention will be mainly
focused on synthesizing these local regions, as well as the global image. Both
quantitative and qualitative experiments on challenging datasets demonstrate
the advantages of our method in comparison with state-of-the-art methods.
Authors' comments: 8 pages, 9 figures
Yiyao Shi, Jian Wang, Xiangyang Xue
In this paper, a learning-free color constancy algorithm called the
Patch-wise Bright Pixels (PBP) is proposed. In this algorithm, an input image
is first downsampled and then cut equally into a few patches. After that,
according to the modified brightness of each patch, a proper fraction of
brightest pixels in the patch is selected. Finally, Gray World (GW)-based
methods are applied to the selected bright pixels to estimate the illuminant of
the scene. Experiments on NUS 8-Camera Dataset show that the PBP algorithm
outperforms the state-of-the-art learning-free methods as well as a broad range
of learning-based ones. In particular, PBP processes a 1080p image within two
milliseconds, which is hundreds of times faster than the existing learning-free
ones. Our algorithm offers a potential solution to the full-screen smart phones
whose screen-to-body ratio is 100%.
Authors' comments: 7 figures and 4 tables
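The pipeline in this abstract (downsample, cut into patches, keep the brightest pixels of each patch, apply a Gray World estimate) can be sketched as below. The sketch uses a fixed fraction of brightest pixels per patch in place of the brightness-dependent fraction the abstract mentions, uses the channel sum as brightness, and applies the plain Gray World mean; these choices and all names are assumptions.

```python
def downsample(image, step):
    """Keep every step-th pixel in both directions."""
    return [row[::step] for row in image[::step]]

def patches(image, size):
    """Yield the pixels of each size x size patch as a flat list."""
    h, w = len(image), len(image[0])
    for r in range(0, h, size):
        for c in range(0, w, size):
            yield [px for row in image[r:r + size] for px in row[c:c + size]]

def brightest(pixels, fraction):
    """Select the brightest fraction of pixels (brightness = R + G + B)."""
    keep = max(1, int(len(pixels) * fraction))
    return sorted(pixels, key=sum, reverse=True)[:keep]

def estimate_illuminant(image, step=2, patch_size=2, fraction=0.25):
    """Gray World estimate restricted to the selected bright pixels,
    normalized so that the three channels sum to 1."""
    selected = []
    for patch in patches(downsample(image, step), patch_size):
        selected.extend(brightest(patch, fraction))
    means = [sum(px[ch] for px in selected) / len(selected) for ch in range(3)]
    total = sum(means)
    return [m / total for m in means]

# Toy gray scene lit by a reddish illuminant proportional to (2, 1, 1).
image = [[(2.0 * (r + c + 1), float(r + c + 1), float(r + c + 1))
          for c in range(8)] for r in range(8)]
illuminant = estimate_illuminant(image)
```

For this toy scene the estimate recovers the (0.5, 0.25, 0.25) colour cast, since every pixel carries the same channel ratios.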
Lu Wang, Jie Yang
Large-scale cross-modal hashing similarity retrieval has attracted more and
more attention in modern search applications such as search engines and
autopilot, showing great superiority in computation and storage. However,
current unsupervised cross-modal hashing methods still have some limitations:
(1) many methods relax the discrete constraints to solve the optimization
objective, which may significantly degrade the retrieval performance; (2) most
existing hashing models project heterogeneous data into a common latent space,
which may lose sight of the diversity in heterogeneous data; (3) transforming
real-valued data points to binary codes always results in an abundant loss of
information, producing a suboptimal continuous latent space. To overcome
the above problems, in this paper, a novel Cluster-wise Unsupervised Hashing (CUH)
method is proposed. Specifically, CUH jointly performs the multi-view
clustering that projects the original data points from different modalities
into their own low-dimensional latent semantic spaces and finds the cluster
centroid points and the common clustering indicators in these low-dimensional
spaces, and learns the compact hash codes and the corresponding linear hash
functions. A discrete optimization framework is developed to learn the unified
binary codes across modalities under the guidance of cluster-wise code-prototypes.
The reasonableness and effectiveness of CUH are well demonstrated by
comprehensive experiments on diverse benchmark datasets.
Authors' comments: 13 pages, 26 figures
Pavel Sulimov, Elena Sukmanova, Roman Chereshnev, Attila Kertesz-Farkas
Training of deep models for classification tasks is hindered by local minima problems and vanishing gradients, while unsupervised layer-wise pretraining does not exploit information from class labels. Here, we propose a new regularization technique, called diversifying regularization (DR), which applies a penalty on hidden units at any layer if they obtain similar features for different types of data. For generative models, DR is defined as divergence over the variational posterior distributions and included in the maximum likelihood estimation as a prior. Thus, DR includes class label information for greedy pretraining of deep belief networks, which results in a better weight initialization for fine-tuning methods. On the other hand, for discriminative training of deep neural networks, DR is defined as a distance over the features and included in the learning objective. With our experimental tests, we show that DR can help the backpropagation to cope with vanishing gradient problems and to provide faster convergence and smaller generalization errors.
Yuhu Shan
Among the neural network compression techniques, knowledge distillation is an effective one which forces a simpler student network to mimic the output of a larger teacher network. However, most of such model distillation methods focus on the image-level classification task. Directly adapting these methods to the task of semantic segmentation only brings marginal improvements. In this paper, we propose a simple, yet effective knowledge representation referred to as pixel-wise feature similarities (PFS) to tackle the challenging distillation problem of semantic segmentation. The developed PFS encodes spatial structural information for each pixel location of the high-level convolutional features, which helps guide the distillation process in an easier way. Furthermore, a novel weighted pixel-level soft prediction imitation approach is proposed to enable the student network to selectively mimic the teacher network's output, according to their pixel-wise knowledge-gaps. Extensive experiments are conducted on the challenging datasets of Pascal VOC 2012, ADE20K and Pascal Context. Our approach brings significant performance improvements compared to several strong baselines and achieves new state-of-the-art results.
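One way to picture pixel-wise feature similarities is as a matrix of pairwise similarities between the feature vectors at different pixel locations, which the student is then trained to match. The sketch below assumes cosine similarity and a mean squared difference between the student and teacher similarity matrices; both choices, and all names, are assumptions rather than the paper's stated formulation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def pairwise_similarity(pixel_features):
    """Similarity of every pixel's feature vector with every other's."""
    return [[cosine(a, b) for b in pixel_features] for a in pixel_features]

def similarity_distillation_loss(student_feats, teacher_feats):
    """Mean squared difference between the two similarity matrices."""
    s = pairwise_similarity(student_feats)
    t = pairwise_similarity(teacher_feats)
    n = len(s)
    return sum((s[i][j] - t[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)

# Three pixel locations; teacher has 2 channels, student has 3.
teacher = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
student_good = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [2.0, 2.0, 0.0]]
student_bad = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
loss_good = similarity_distillation_loss(student_good, teacher)
loss_bad = similarity_distillation_loss(student_bad, teacher)
```

A similarity matrix has one entry per pair of pixel locations regardless of channel count, so the student and teacher features can be compared even when their channel dimensions differ.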
Shiyu Chang, Yang Zhang, Mo Yu, Tommi S. Jaakkola
Selection of input features such as relevant pieces of text has become a
common technique of highlighting how complex neural predictors operate. The
selection can be optimized post-hoc for trained models or incorporated directly
into the method itself (self-explaining). However, an overall selection does
not properly capture the multi-faceted nature of useful rationales such as pros
and cons for decisions. To this end, we propose a new game theoretic approach
to class-dependent rationalization, where the method is specifically trained to
highlight evidence supporting alternative conclusions. Each class involves
three players set up competitively to find evidence for factual and
counterfactual scenarios. We show theoretically in a simplified scenario how
the game drives the solution towards meaningful class-dependent rationales. We
evaluate the method in single- and multi-aspect sentiment classification tasks
and demonstrate that the proposed method is able to identify both factual
(justifying the ground truth label) and counterfactual (countering the ground
truth label) rationales consistent with human rationalization. The code for our
method is publicly available.
Authors' comments: Accepted by Neural Information Processing Systems (NeurIPS 2019),
Vancouver, Canada