Sebastián Donoso, Lei Jin, Alejandro Maass, Yixiao Qiao
We study directional mean dimension of $\mathbb{Z}^k$-actions (where $k$ is a
positive integer). On the one hand, we show that there is a
$\mathbb{Z}^2$-action whose directional mean dimension (considered as a
$[0,+\infty]$-valued function on the torus) is not continuous. On the other
hand, we prove that if a $\mathbb{Z}^k$-action is continuum-wise expansive,
then the values of its $(k-1)$-dimensional directional mean dimension are
bounded. This is a generalization (with a view towards Meyerovitch and
Tsukamoto's theorem on mean dimension and expansive multiparameter actions) of
a classical result due to Ma\~n\'e: Any compact metrizable space admitting an
expansive homeomorphism (with respect to a compatible metric) is
finite-dimensional.
Authors' comments: Comments welcome!
Weilun Wang, Wengang Zhou, Jianmin Bao, Dong Chen, Houqiang Li
Contrastive learning shows great potential in unpaired image-to-image
translation, but sometimes the translated results are of poor quality and the
contents are not preserved consistently. In this paper, we uncover that the
negative examples play a critical role in the performance of contrastive
learning for image translation. The negative examples in previous methods are
randomly sampled from the patches of different positions in the source image,
which are not effective at pushing the positive examples close to the query
examples. To address this issue, we present instance-wise hard Negative Example
Generation for Contrastive learning in Unpaired image-to-image Translation
(NEGCUT). Specifically, we train a generator to produce negative examples
online. The generator is novel from two perspectives: 1) it is instance-wise
which means that the generated examples are based on the input image, and 2) it
can generate hard negative examples since it is trained with an adversarial
loss. With the generator, the performance of unpaired image-to-image
translation is significantly improved. Experiments on three benchmark datasets
demonstrate that the proposed NEGCUT framework achieves state-of-the-art
performance compared to previous methods.
Authors' comments: Accepted by ICCV 2021
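The role of negative examples described in the NEGCUT abstract above can be illustrated with a generic InfoNCE-style contrastive loss. This is a minimal sketch, not the authors' implementation: the function name `info_nce`, the use of cosine similarity, and the temperature `tau=0.07` are assumptions made for illustration. It only shows how the loss depends on the query, the positive, and the set of negatives.

```python
import numpy as np


def info_nce(query, positive, negatives, tau=0.07):
    """InfoNCE-style loss for one query (illustrative; tau is an assumed value).

    query, positive: 1-D feature vectors; negatives: array of shape (n_neg, dim).
    """
    # Normalize so that dot products are cosine similarities.
    q = query / np.linalg.norm(query)
    p = positive / np.linalg.norm(positive)
    n = negatives / np.linalg.norm(negatives, axis=1, keepdims=True)

    # Index 0 holds the positive logit, the rest are the negative logits.
    logits = np.concatenate(([q @ p], n @ q)) / tau
    logits -= logits.max()  # numerical stability

    # Negative log-softmax probability of the positive.
    return float(-(logits[0] - np.log(np.exp(logits).sum())))
```

With negatives that lie close to the query, the loss is larger than with randomly drawn negatives, so the gradient signal that pulls the positive towards the query is stronger. That is the general intuition behind hard negatives; how NEGCUT generates them is not reproduced here.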
Haozhe Jia, Haoteng Tang, Guixiang Ma, Weidong Cai, Heng Huang, Liang Zhan, Yong Xia
Automated and accurate segmentation of the infected regions in computed tomography (CT) images is critical for the prediction of the pathological stage and treatment response of COVID-19. Several deep convolutional neural networks (DCNNs) have been designed for this task, whose performance, however, tends to be suppressed by their limited local receptive fields and insufficient global reasoning ability. In this paper, we propose a pixel-wise sparse graph reasoning (PSGR) module and insert it into a segmentation network to enhance the modeling of long-range dependencies for COVID-19 infected region segmentation in CT images. In the PSGR module, a graph is first constructed by projecting each pixel on a node based on the features produced by the segmentation backbone, and then converted into a sparsely-connected graph by keeping only K strongest connections to each uncertain pixel. The long-range information reasoning is performed on the sparsely-connected graph to generate enhanced features. The advantages of this module are two-fold: (1) the pixel-wise mapping strategy not only avoids imprecise pixel-to-node projections but also preserves the inherent information of each pixel for global reasoning; and (2) the sparsely-connected graph construction results in effective information retrieval and reduction of the noise propagation. The proposed solution has been evaluated against four widely-used segmentation models on three public datasets. The results show that the segmentation model equipped with our PSGR module can effectively segment COVID-19 infected regions in CT images, outperforming all other competing models.
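The sparsification step in the PSGR abstract above (keeping only the K strongest connections) can be sketched as a row-wise top-K filter on a dense affinity matrix. This is a simplified illustration under stated assumptions: the function name `sparsify_top_k` is hypothetical, the affinity matrix is taken as given, and the filter is applied to every node, whereas the abstract restricts it to uncertain pixels (a selection rule the abstract does not specify).

```python
import numpy as np


def sparsify_top_k(affinity, k):
    """Keep the k largest entries in each row of a dense (n, n) affinity matrix."""
    n = affinity.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of nodes")

    # Column indices of the k largest affinities per row (order within the k is arbitrary).
    idx = np.argpartition(affinity, -k, axis=1)[:, -k:]

    # Copy only the selected entries; all other connections are dropped (set to zero).
    sparse = np.zeros_like(affinity)
    rows = np.arange(n)[:, None]
    sparse[rows, idx] = affinity[rows, idx]
    return sparse
```

Graph reasoning (for example, one step of message passing) would then run on the sparse matrix instead of the dense one, which is what limits noise propagation from weak connections.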
Qin Wang, Jun Wei, Boyuan Wang, Zhen Li, Sheng Wang, Shuguang Cu
Protein secondary structure prediction (PSSP) is essential for protein
function analysis. However, for low homologous proteins, the PSSP suffers from
insufficient input features. In this paper, we explicitly import external
self-supervised knowledge for low homologous PSSP under the guidance of
residue-wise profile fusion. In practice, we first demonstrate the
superiority of profile over Position-Specific Scoring Matrix (PSSM) for low
homologous PSSP. Based on this observation, we introduce the novel
self-supervised BERT features as the pseudo profile, which implicitly involves
the residue distribution in all native discovered sequences as the
complementary features. Furthermore, a novel residue-wise attention is
specially designed to adaptively fuse different features (i.e., original
low-quality profile, BERT based pseudo profile), which not only takes full
advantage of each feature but also avoids noise disturbance. Besides, the
feature consistency loss is proposed to accelerate the model learning from
multiple semantic levels. Extensive experiments confirm that our method
outperforms state-of-the-art methods (i.e., by 4.7% for extremely low homologous
cases on the BC40 dataset).
Authors' comments: Accepted in IJCAI-21
Cong Liu, Chuang Zhang, Zhuoyi Yin, Xiaopeng Liu, Zhihong Xu
In fringe projection profilometry, the high-order harmonics of non-sinusoidal fringes lead to errors in the phase estimation. To solve this problem, this paper proposes a point-wise posterior phase estimation (PWPPE) method based on deep learning. A feedforward neural network models the complex nonlinear mapping between the multiple gray values and the sine/cosine values of the phase. After training, the model estimates the phase value at each pixel location with higher accuracy than the point-wise least-square (PWLS) method. To further verify the effectiveness of this method, a face mask is measured with both the traditional PWLS method and the proposed PWPPE method. The comparison shows that the traditional method suffers from periodic phase errors, while the proposed PWPPE method effectively eliminates such phase errors caused by non-sinusoidal fringes.
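The periodic phase error attributed to non-sinusoidal fringes in the abstract above can be reproduced with the textbook N-step least-squares phase estimator. This sketch is an assumption-laden illustration: it uses the standard equal-step estimator, which may differ in detail from the PWLS baseline of the paper, and the fringe model (offset 0.5, modulation 0.4, an optional second harmonic) and all function names are hypothetical.

```python
import numpy as np


def simulate_fringes(phase, n_steps, second_harmonic=0.0):
    """Phase-shifted intensities, shape (n_steps, n_pixels); the fringe model is assumed."""
    delta = 2 * np.pi * np.arange(n_steps) / n_steps
    arg = phase[None, :] + delta[:, None]
    return 0.5 + 0.4 * np.cos(arg) + second_harmonic * np.cos(2 * arg)


def ls_phase(intensities):
    """Point-wise least-squares phase for equally spaced shifts 2*pi*n/N."""
    n_steps = intensities.shape[0]
    delta = 2 * np.pi * np.arange(n_steps) / n_steps
    s = np.tensordot(np.sin(delta), intensities, axes=1)
    c = np.tensordot(np.cos(delta), intensities, axes=1)
    # For I_n = A + B*cos(phi + delta_n): sum(I_n*sin) = -(N/2)*B*sin(phi),
    # and sum(I_n*cos) = (N/2)*B*cos(phi), hence the minus sign below.
    return np.arctan2(-s, c)


def wrapped_error(estimate, truth):
    """Phase difference wrapped to (-pi, pi]."""
    return np.angle(np.exp(1j * (estimate - truth)))
```

With three steps and a purely sinusoidal fringe, the estimator recovers the phase to numerical precision. Adding a second harmonic makes the error oscillate with the phase, which is the kind of periodic error a learned point-wise mapping is meant to remove.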
Jinlei Hou, Yingying Zhang, Qiaoyong Zhong, Di Xie, Shiliang Pu, Hong Zhou
Reconstruction-based methods play an important role in unsupervised anomaly
detection in images. Ideally, we expect a perfect reconstruction for normal
samples and poor reconstruction for abnormal samples. Since the
generalizability of deep neural networks is difficult to control, existing
models such as autoencoders do not work well. In this work, we interpret the
reconstruction of an image as a divide-and-assemble procedure. Surprisingly, by
varying the granularity of division on feature maps, we are able to modulate
the reconstruction capability of the model for both normal and abnormal
samples. That is, finer granularity leads to better reconstruction, while
coarser granularity leads to poorer reconstruction. With proper granularity,
the gap between the reconstruction error of normal and abnormal samples can be
maximized. The divide-and-assemble framework is implemented by embedding a
novel multi-scale block-wise memory module into an autoencoder network.
Besides, we introduce adversarial learning and explore the semantic latent
representation of the discriminator, which improves the detection of subtle
anomalies. We achieve state-of-the-art performance on the challenging MVTec AD
dataset. Remarkably, we improve the vanilla autoencoder model by 10.1% in terms
of the AUROC score.
Authors' comments: accepted by ICCV 2021
David Bonet, Antonio Ortega, Javier Ruiz-Hidalgo, Sarath Shekkizhar
State-of-the-art neural network architectures continue to scale in size and
deliver impressive generalization results, although this comes at the expense
of limited interpretability. In particular, a key challenge is to determine
when to stop training the model, as this has a significant impact on
generalization. Convolutional neural networks (ConvNets) comprise
high-dimensional feature spaces formed by the aggregation of multiple channels,
where analyzing intermediate data representations and the model's evolution can
be challenging owing to the curse of dimensionality. We present channel-wise
DeepNNK (CW-DeepNNK), a novel channel-wise generalization estimate based on
non-negative kernel regression (NNK) graphs with which we perform local
polytope interpolation on low-dimensional channels. This method leads to
instance-based interpretability of both the learned data representations and
the relationship between channels. Motivated by our observations, we use
CW-DeepNNK to propose a novel early stopping criterion that (i) does not
require a validation set, (ii) is based on a task performance metric, and (iii)
allows stopping to be reached at different points for each channel. Our
experiments demonstrate that our proposed method has advantages as compared to
the standard criterion based on validation set performance.
Authors' comments: Submitted to APSIPA 2021
Andrés Gómez, Thomas Genevois, Jerome Lussereau, Christian Laugier
Object detection is a critical problem for the safe interaction between
autonomous vehicles and road users. Deep-learning methodologies allowed the
development of object detection approaches with better performance. However,
there is still the challenge to obtain more characteristics from the objects
detected in real-time. The main reason is that more information from the
environment's objects can improve the autonomous vehicle capacity to face
different urban situations. This paper proposes a new approach to detect static
and dynamic objects in front of an autonomous vehicle. Our approach can also
get other characteristics from the objects detected, like their position,
velocity, and heading. We develop our proposal by fusing the
environment interpretations obtained from YoloV3 and a Bayesian filter. To
demonstrate our proposal's performance, we assess it through a benchmark dataset
and real-world data obtained from an autonomous platform. We compared the
results achieved with another approach.
Authors' comments: 6 pages, 7 figures
Yuxin Chen, Ziqi Zhang, Chunfeng Yuan, Bing Li, Ying Deng, Weiming Hu
Graph convolutional networks (GCNs) have been widely used and achieved
remarkable results in skeleton-based action recognition. In GCNs, graph
topology dominates feature aggregation and therefore is the key to extracting
representative features. In this work, we propose a novel Channel-wise Topology
Refinement Graph Convolution (CTR-GC) to dynamically learn different topologies
and effectively aggregate joint features in different channels for
skeleton-based action recognition. The proposed CTR-GC models channel-wise
topologies through learning a shared topology as a generic prior for all
channels and refining it with channel-specific correlations for each channel.
Our refinement method introduces few extra parameters and significantly reduces
the difficulty of modeling channel-wise topologies. Furthermore, via
reformulating graph convolutions into a unified form, we find that CTR-GC
relaxes strict constraints of graph convolutions, leading to stronger
representation capability. Combining CTR-GC with temporal modeling modules, we
develop a powerful graph convolutional network named CTR-GCN which notably
outperforms state-of-the-art methods on the NTU RGB+D, NTU RGB+D 120, and
NW-UCLA datasets.
Authors' comments: Accepted to ICCV2021. Camera-ready version with supplementary
materials. Code is available at https://github.com/Uason-Chen/CTR-GCN
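The idea in the CTR-GC abstract above, a topology shared by all channels that is refined per channel, can be sketched in a few lines. This is a heavily simplified illustration and not the released implementation: the function name `ctr_gc`, the scalar `alpha`, and the choice of `tanh` over pairwise feature differences as the channel-specific correlation are assumptions, and the learned feature transforms of a real layer are omitted.

```python
import numpy as np


def ctr_gc(x, shared, alpha=1.0):
    """Aggregate joint features with a per-channel refined topology.

    x:      (channels, joints) features, one scalar per joint and channel.
    shared: (joints, joints) topology shared by all channels.
    alpha:  strength of the channel-specific refinement (assumed scalar).
    """
    # Assumed channel-specific correlation: tanh of pairwise feature differences.
    corr = np.tanh(x[:, :, None] - x[:, None, :])   # (C, V, V)
    refined = shared[None, :, :] + alpha * corr     # (C, V, V)
    # Each channel aggregates its joint features with its own topology.
    return np.einsum("cuv,cv->cu", refined, x)
```

Setting `alpha=0` recovers an ordinary graph convolution in which every channel uses the same topology, which is the constraint the abstract describes CTR-GC as relaxing.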
Napat Wanchaitanawong, Masayuki Tanaka, Takashi Shibata, Masatoshi Okutomi
The combined use of multiple modalities enables accurate pedestrian detection
under poor lighting conditions by using the high visibility areas from these
modalities together. The vital assumption for the combined use is that there
is no or only a weak misalignment between the two modalities. In general,
however, this assumption often breaks in actual situations. Due to this
assumption's breakdown, the position of the bounding boxes does not match
between the two modalities, resulting in a significant decrease in detection
accuracy, especially in regions where the amount of misalignment is large. In
this paper, we propose a multi-modal Faster-RCNN that is robust against large
misalignment. The keys are 1) modal-wise regression and 2) multi-modal IoU for
mini-batch sampling. To deal with large misalignment, we perform bounding box
regression for both the RPN and detection-head with both modalities. We also
propose a new sampling strategy called "multi-modal mini-batch sampling" that
integrates the IoU for both modalities. We demonstrate that the proposed
method's performance is much better than that of the state-of-the-art methods
for data with large misalignment through actual image experiments.
Authors' comments: Accepted by MVA2021
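The multi-modal IoU mentioned in the abstract above builds on the ordinary intersection-over-union between two boxes, computed once per modality. The sketch below shows that building block. The abstract does not say how the two IoU values are integrated, so the mean used in `multi_modal_iou` is purely a placeholder assumption, and both function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def multi_modal_iou(anchor, gt_modality_a, gt_modality_b):
    """Placeholder combination (mean) of the per-modality IoUs; an assumption."""
    return 0.5 * (iou(anchor, gt_modality_a) + iou(anchor, gt_modality_b))
```

When the ground-truth boxes of the two modalities are misaligned, an anchor can overlap well with one and poorly with the other; a combined score lets mini-batch sampling take both into account.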
Hiroki Ito, MaungMaung AprilPyone, Hitoshi Kiya
Since production-level trained deep neural networks (DNNs) are of great
business value, protecting such DNN models against copyright infringement and
unauthorized access is in rising demand. However, conventional model
protection methods have focused only on the image classification task, and
these protection methods have never been applied to semantic segmentation
although it has an increasing number of applications. In this paper, we
propose, for the first time, to protect semantic segmentation models from
unauthorized access by utilizing a block-wise transformation with a secret key.
Protected models are trained by using transformed images. Experiment results
show that the proposed protection method allows rightful users with the correct
key to access the model to full capacity and deteriorates the performance for
unauthorized users. However, protected models show a slight drop in
segmentation performance compared to non-protected models.
Authors' comments: To appear in 2021 International Workshop on Smart Info-Media Systems
in Asia (SISA 2021)
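A block-wise transformation with a secret key, as described in the abstract above, can be sketched as keyed pixel shuffling inside each block. The abstract does not state which transformation is used, so pixel shuffling is an assumption here, as are the function name `blockwise_shuffle`, the block size of 4, and the use of the key as a NumPy random seed.

```python
import numpy as np


def blockwise_shuffle(image, key, block=4, inverse=False):
    """Shuffle the values inside every block x block patch with a keyed permutation.

    image: (H, W, C) array with H and W divisible by `block`.
    key:   integer secret key, used here as the seed of the permutation (assumed).
    """
    h, w, c = image.shape
    if h % block or w % block:
        raise ValueError("image height and width must be divisible by the block size")

    # The same key-derived permutation is applied to every block.
    perm = np.random.default_rng(key).permutation(block * block * c)
    if inverse:
        perm = np.argsort(perm)

    # (H, W, C) -> (H/b, W/b, b*b*C): one flattened vector per block.
    x = image.reshape(h // block, block, w // block, block, c).transpose(0, 2, 1, 3, 4)
    x = x.reshape(h // block, w // block, -1)[..., perm]

    # Back to the original (H, W, C) layout.
    x = x.reshape(h // block, w // block, block, block, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, c)
```

In this setting a model would be trained on images transformed with the owner's key, and test images would be transformed with the same key before inference. Images transformed with a wrong key no longer match what the model saw during training.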
Xu Li, Xixin Wu, Hui Lu, Xunying Liu, Helen Meng
Existing approaches for anti-spoofing in automatic speaker verification (ASV)
still lack generalizability to unseen attacks. The Res2Net approach designs a
residual-like connection between feature groups within one block, which
increases the possible receptive fields and improves the system's detection
generalizability. However, such a residual-like connection is performed by a
direct addition between feature groups without channel-wise priority. We argue
that the information across channels may not contribute to spoofing cues
equally, and the less relevant channels are expected to be suppressed before
adding onto the next feature group, so that the system can generalize better to
unseen attacks. This argument motivates the current work that presents a novel,
channel-wise gated Res2Net (CG-Res2Net), which modifies Res2Net to enable a
channel-wise gating mechanism in the connection between feature groups. This
gating mechanism dynamically selects channel-wise features based on the input,
to suppress the less relevant channels and enhance the detection
generalizability. Three gating mechanisms with different structures are
proposed and integrated into Res2Net. Experimental results on
ASVspoof 2019 logical access (LA) demonstrate that the proposed CG-Res2Net
significantly outperforms Res2Net on both the overall LA evaluation set and
individual difficult unseen attacks, and also outperforms other
state-of-the-art single systems, demonstrating the effectiveness of our method.
Authors' comments: Accepted to INTERSPEECH 2021
Laurent Lejeune, Raphael Sznitman
The ability to quickly annotate medical imaging data plays a critical role in training deep learning frameworks for segmentation. Doing so for image volumes or video sequences is even more pressing as annotating these is particularly burdensome. To alleviate this problem, this work proposes a new method to efficiently segment medical imaging volumes or videos using point-wise annotations only. This allows annotations to be collected extremely quickly and remains applicable to numerous segmentation tasks. Our approach trains a deep learning model using an appropriate Positive/Unlabeled objective function using sparse point-wise annotations. While most methods of this kind assume that the proportion of positive samples in the data is known a-priori, we introduce a novel self-supervised method to estimate this prior efficiently by combining a Bayesian estimation framework and new stopping criteria. Our method iteratively estimates appropriate class priors and yields high segmentation quality for a variety of object types and imaging modalities. In addition, by leveraging a spatio-temporal tracking framework, we regularize our predictions by leveraging the complete data volume. We show experimentally that our approach outperforms state-of-the-art methods tailored to the same problem.
Chong Tang, Wenda Li, Shelly Vishwakarma, Fangzhan Shi, Simon Julier, Kevin Chetty
Micro-Doppler signatures contain considerable information about target dynamics. However, radar sensing systems are easily affected by noisy surroundings, resulting in uninterpretable motion patterns on the micro-Doppler spectrogram. Meanwhile, radar returns often suffer from multipath, clutter and interference. These issues make tasks such as motion feature extraction and activity classification using micro-Doppler signatures ($\mu$-DS) difficult. In this paper, we propose a latent feature-wise mapping strategy, called Feature Mapping Network (FMNet), to transform measured spectrograms so that they more closely resemble the output from a simulation under the same conditions. Based on a measured spectrogram and the matched simulated data, our framework contains three parts: an Encoder, which extracts latent representations/features; a Decoder, which outputs a reconstructed spectrogram according to the latent features; and a Discriminator, which minimizes the distance between the latent features of measured and simulated data. We demonstrate FMNet with data from six activities and two experimental scenarios; the final results show strongly enhanced patterns while keeping the actual motion information to the greatest extent. In addition, we propose a novel idea that trains a classifier with only simulated data and predicts new measured samples after cleaning them up with FMNet. The final classification results show significant improvements.
Zeyu Wu, Cheng Wang, Weidong Liu
In this paper, we estimate the high dimensional precision matrix under the
weak sparsity condition where many entries are nearly zero. We revisit the
sparse column-wise inverse operator (SCIO) estimator \cite{liu2015fast} and
derive its general error bounds under the weak sparsity condition. A unified
framework is established to deal with various cases including the heavy-tailed
data, the non-paranormal data, and the matrix variate data. These new methods
can achieve the same convergence rates as the existing methods and can be
implemented efficiently.
Authors' comments: 29 pages, 5 figures
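The column-wise estimator revisited in the abstract above can be sketched with plain coordinate descent. The sketch assumes the usual SCIO formulation as we understand it from the cited reference: for each column i, minimize (1/2) b' S b - e_i' b + lam * ||b||_1, where S is the sample covariance and e_i the i-th unit vector. The function names, the fixed number of sweeps, and the omission of any symmetrization step are simplifications made here.

```python
import numpy as np


def soft_threshold(value, lam):
    """Scalar soft-thresholding operator."""
    return np.sign(value) * max(abs(value) - lam, 0.0)


def scio(sigma, lam, n_sweeps=200):
    """Column-wise sparse estimate of the inverse of a covariance matrix `sigma`.

    Column i minimizes 0.5 * b' sigma b - e_i' b + lam * ||b||_1 by coordinate descent.
    """
    p = sigma.shape[0]
    omega = np.zeros((p, p))
    for i in range(p):
        beta = np.zeros(p)
        for _ in range(n_sweeps):
            for j in range(p):
                # e_i[j] minus the contribution of all coordinates other than j.
                target = 1.0 if i == j else 0.0
                residual = target - sigma[j] @ beta + sigma[j, j] * beta[j]
                beta[j] = soft_threshold(residual, lam) / sigma[j, j]
        omega[:, i] = beta
    return omega
```

With `lam=0` and a positive definite input, the iteration reduces to Gauss-Seidel and converges to the exact inverse. Increasing `lam` shrinks small entries to zero, which is how the estimator adapts to a precision matrix with many nearly-zero entries.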
Michael C. Cushing, Adam C. Schneider, J. Davy Kirkpatrick, Caroline V. Morley, Mark S. Marley, Christopher R. Gelino, Gregory N. Mace, Edward L. Wright et al.
We present a Hubble Space Telescope/Wide-Field Camera 3 near infrared
spectrum of the archetype Y dwarf WISEP 182831.08+265037.8. The spectrum covers
the 0.9-1.7 um wavelength range at a resolving power of lambda/Delta lambda
~180 and is a significant improvement over the previously published spectrum
because it covers a broader wavelength range and is uncontaminated by light
from a background star. The spectrum is unique for a cool brown dwarf in that
the flux peaks in the Y, J, and H band are of near equal intensity in units of
f_lambda. We fail to detect any absorption bands of NH_3 in the spectrum, in
contrast to the predictions of chemical equilibrium models, but tentatively
identify CH_4 as the carrier of an unknown absorption feature centered at 1.015
um. Using previously published ground- and space-based photometry, and using a
Rayleigh Jeans tail to account for flux emerging longward of 4.5 um, we compute
a bolometric luminosity of log (L_bol/L_sun)=-6.50+-0.02 which is significantly
lower than previously published results. Finally, we compare the spectrum and
photometry to two sets of atmospheric models and find that the best overall match
to the observed properties of WISEP 182831.08+265037.8 is a ~1 Gyr old binary
composed of two T_eff~325 K, ~5 M_Jup brown dwarfs with subsolar [C/O] ratios.
Authors' comments: Accepted for publication in the Astrophysical Journal
Yuming Zhang, Davide Cucci, Roberto Molinari, Stéphane Guerrier
The increased use of low-cost gyroscopes within inertial sensors for
navigation purposes, among others, has led to a considerable amount of
research on improving their measurement precision. Aside from developing
methods that make it possible to model and account for the deterministic
and stochastic components that contribute to the measurement errors of these
devices, an approach that has been put forward in recent years is to make use
of arrays of such sensors in order to combine their measurements, thereby
reducing the impact of individual sensor noise. Nevertheless, combining these
measurements is not straightforward given the complex stochastic nature of
these errors and, although some solutions have been suggested, these are
limited to certain specific settings which do not yield solutions in
more general and common circumstances. Hence, in this work we put forward a
non-parametric method that makes use of the wavelet cross-covariance at
different scales to combine the measurements coming from an array of gyroscopes
in order to deliver an optimal measurement signal without needing any
assumption on the processes underlying the individual error signals. We also
study an appropriate non-parametric approach for the estimation of the
asymptotic covariance matrix of the wavelet cross-covariance estimator which
has important applications beyond the scope of this work. The theoretical
properties of the proposed approach are studied and are supported by
simulations and real applications, indicating that this method represents an
appropriate and general tool for the construction of optimal virtual signals
that are particularly relevant for arrays of gyroscopes. Moreover, our results
can support the creation of optimal signals for types of inertial sensors
other than gyroscopes as well as for redundant measurements in domains
other than navigation.
Authors' comments: 18 pages, 10 figures
Meiling Fang, Naser Damer, Fadi Boutros, Florian Kirchbuchner, Arjan Kuijper
Iris presentation attack detection (PAD) plays a vital role in iris
recognition systems. Most existing CNN-based iris PAD solutions 1) perform only
binary label supervision during the training of CNNs, serving global
information learning but weakening the capture of local discriminative
features, 2) prefer the stacked deeper convolutions or expert-designed
networks, raising the risk of overfitting, 3) fuse multiple PAD systems or
various types of features, increasing difficulty for deployment on mobile
devices. Hence, we propose a novel attention-based deep pixel-wise binary
supervision (A-PBS) method. Pixel-wise supervision is first able to capture the
fine-grained pixel/patch-level cues. Then, the attention mechanism guides the
network to automatically find regions that most contribute to an accurate PAD
decision. Extensive experiments are performed on LivDet-Iris 2017 and three
other publicly available databases to show the effectiveness and robustness of
the proposed A-PBS method. For instance, the A-PBS model achieves an HTER of
6.50% on the IIITD-WVU database, outperforming state-of-the-art methods.
Authors' comments: To appear at the 2021 International Joint Conference on Biometrics
(IJCB 2021)
Xianjing Liu, Bo Li, Esther Bron, Wiro Niessen, Eppo Wolvius, Gennady Roshchupkin
Confounding bias is a crucial problem when applying machine learning to
practice, especially in clinical practice. We consider the problem of learning
representations independent of multiple biases. In the literature, this is
mostly solved by purging the bias information from learned representations. We,
however, expect this strategy to harm the diversity of information in the
representation, and thus to limit its prospective usage (e.g., interpretation).
Therefore, we propose to mitigate the bias while keeping almost all information
in the latent representations, which enables us to observe and interpret them
as well. To achieve this, we project latent features onto a learned vector
direction, and enforce the independence between biases and projected features
rather than all learned features. To interpret the mapping between projected
features and input data, we propose projection-wise disentangling: a sampling
and reconstruction along the learned vector direction. The proposed method was
evaluated on the analysis of 3D facial shape and patient characteristics
(N=5011). Experiments showed that this conceptually simple method achieved
state-of-the-art fair prediction performance and interpretability, showing its
great potential for clinical applications.
Authors' comments: Accepted at MICCAI 2021
Yan Liu, Zheng Li, Lin Li, Qingyang Hong
This paper proposes a multi-task learning network with phoneme-aware and channel-wise attentive learning strategies for text-dependent Speaker Verification (SV). In the proposed structure, frame-level multi-task learning along with segment-level adversarial learning is adopted for speaker embedding extraction. Phoneme-aware attentive pooling is applied to frame-level features in the main network for the speaker classifier, with the corresponding posterior probability for the phoneme distribution in the auxiliary subnet. Further, the introduction of a Squeeze-and-Excitation block (SE-block) performs dynamic channel-wise feature recalibration, which improves the representational ability. The proposed method exploits speaker idiosyncrasies associated with pass-phrases, and is further improved by the phoneme-aware attentive pooling and SE-block from temporal and channel-wise aspects, respectively. Experiments conducted on the RSR2015 Part 1 database confirm that the proposed system achieves outstanding results for text-dependent SV.
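The channel-wise feature recalibration performed by an SE-block, as mentioned in the abstract above, follows a squeeze, excite, and scale pattern. The sketch below shows that generic pattern on a (channels, frames) feature map. It is not the network from the paper: the function name `se_block`, the weight shapes, and the bottleneck layout are assumptions for illustration.

```python
import numpy as np


def se_block(features, w1, b1, w2, b2):
    """Squeeze-and-Excitation on a (channels, frames) feature map.

    w1: (hidden, channels), b1: (hidden,), w2: (channels, hidden), b2: (channels,).
    Returns the recalibrated features and the per-channel gates.
    """
    # Squeeze: average each channel over time into one descriptor.
    squeezed = features.mean(axis=1)
    # Excite: bottleneck MLP with ReLU, then a sigmoid gate per channel.
    hidden = np.maximum(w1 @ squeezed + b1, 0.0)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))
    # Scale: reweight every frame of a channel by that channel's gate.
    return features * gates[:, None], gates
```

Because each gate lies strictly between 0 and 1 and depends on the input, channels judged less informative for a given utterance are attenuated while the others pass through largely unchanged.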