Pourya Shamsolmoali, Masoumeh Zareapoor, Eric Granger, Huiyu Zhou
Skin lesion detection in dermoscopic images is essential for the accurate and early computer-aided diagnosis of skin cancer. Current skin lesion segmentation approaches perform poorly in challenging circumstances, such as indistinct lesion boundaries, low contrast between the lesion and the surrounding area, or heterogeneous backgrounds, which cause over- or under-segmentation of the skin lesion. To accurately distinguish the lesion from neighboring regions, we propose a dilated scale-wise feature fusion network based on convolution factorization. Our network is designed to simultaneously extract features at different scales, which are systematically fused for better detection. The proposed model achieves satisfactory accuracy and efficiency. Various lesion segmentation experiments are performed, along with comparisons with state-of-the-art models. Our proposed model consistently achieves state-of-the-art results.
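Convolution factorization replaces a k×k kernel with a k×1 pass followed by a 1×k pass. The pure-Python sketch below assumes a separable (rank-1) kernel, zero "same" padding, and scale-wise fusion by element-wise summation over the dilation rates (1, 2, 4). All function names and these design choices are illustrative assumptions, not the network described above.

```python
from typing import List, Sequence

Grid = List[List[float]]


def conv2d_dilated(img: Grid, kernel: Grid, dilation: int) -> Grid:
    """Full k x k dilated cross-correlation with zero 'same' padding."""
    h, w, k = len(img), len(img[0]), len(kernel)
    c = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for s in range(k):
                for t in range(k):
                    y, x = i + (s - c) * dilation, j + (t - c) * dilation
                    if 0 <= y < h and 0 <= x < w:
                        out[i][j] += kernel[s][t] * img[y][x]
    return out


def conv_rows(img: Grid, taps: Sequence[float], dilation: int) -> Grid:
    """1 x k dilated pass along every row (zero padding)."""
    h, w, c = len(img), len(img[0]), len(taps) // 2
    return [[sum(taps[t] * img[i][j + (t - c) * dilation]
                 for t in range(len(taps)) if 0 <= j + (t - c) * dilation < w)
             for j in range(w)] for i in range(h)]


def conv_cols(img: Grid, taps: Sequence[float], dilation: int) -> Grid:
    """k x 1 dilated pass along every column (zero padding)."""
    h, w, c = len(img), len(img[0]), len(taps) // 2
    return [[sum(taps[s] * img[i + (s - c) * dilation][j]
                 for s in range(len(taps)) if 0 <= i + (s - c) * dilation < h)
             for j in range(w)] for i in range(h)]


def factorized_conv(img: Grid, col_taps, row_taps, dilation: int) -> Grid:
    """Equals conv2d_dilated with kernel[s][t] = col_taps[s] * row_taps[t]."""
    return conv_cols(conv_rows(img, row_taps, dilation), col_taps, dilation)


def scale_wise_fusion(img: Grid, col_taps, row_taps, rates=(1, 2, 4)) -> Grid:
    """Apply the factorized conv at several dilation rates and sum the maps."""
    maps = [factorized_conv(img, col_taps, row_taps, d) for d in rates]
    return [[sum(m[i][j] for m in maps) for j in range(len(img[0]))]
            for i in range(len(img))]
```

The factorized pass costs 2k multiplications per pixel instead of k², and larger dilation rates enlarge the receptive field without adding weights. This is why the two ideas combine well for multi-scale feature extraction.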
Yiheng Liu, Enjie Ge, Mengshen He, Zhengliang Liu, Shijie Zhao, Xintao Hu, Dajiang Zhu, Tianming Liu et al.
Using deep learning models to recognize functional brain networks (FBNs) in
functional magnetic resonance imaging (fMRI) has been attracting increasing
interest recently. However, most existing work focuses on detecting static FBNs
from entire fMRI signals, such as correlation-based functional connectivity.
The sliding window is a widely used strategy to capture the dynamics of FBNs,
but it is still limited in representing intrinsic functional interactive
dynamics at each time step. In addition, the number of FBNs usually needs to be
set manually. Moreover, due to the complexity of dynamic interactions in the
brain, traditional linear and shallow models are insufficient for identifying
complex and spatially overlapping FBNs at each time step. In this paper, we
propose a novel
Spatial and Channel-wise Attention Autoencoder (SCAAE) for discovering FBNs
dynamically. The core idea of SCAAE is to apply an attention mechanism to FBN
construction. Specifically, we designed two attention modules: 1) a
spatial-wise attention (SA) module to discover FBNs in the spatial domain and
2) a channel-wise attention (CA) module to weigh the channels for selecting the
FBNs automatically. We evaluated our approach on the ADHD200 dataset, and our
results
indicate that the proposed SCAAE method can effectively recover the dynamic
changes of the FBNs at each fMRI time step, without using sliding windows. More
importantly, our proposed hybrid attention modules (SA and CA) do not enforce
the assumptions of linearity and independence that previous methods do, and
thus provide
a novel approach to better understanding dynamic functional brain networks.
Authors' comments: 12 pages, 6 figures, submitted to 36th Conference on Neural
Information Processing Systems (NeurIPS 2022)
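To make the two attention modules concrete, here is a minimal pure-Python sketch of generic sigmoid gating in the squeeze-and-excitation style. It assumes features stored as [channel][voxel] and gates computed from simple means with hypothetical scalar `gain` and `bias` parameters. SCAAE's actual layers are learned and likely differ.

```python
import math
from typing import List, Tuple

Features = List[List[float]]  # indexed as [channel][voxel]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(feats: Features, gain: float = 1.0,
                      bias: float = 0.0) -> Tuple[Features, List[float]]:
    """Gate each channel (a candidate FBN) by the sigmoid of its mean activation."""
    weights = [sigmoid(gain * sum(ch) / len(ch) + bias) for ch in feats]
    return [[w * v for v in ch] for w, ch in zip(weights, feats)], weights


def spatial_attention(feats: Features, gain: float = 1.0,
                      bias: float = 0.0) -> Tuple[Features, List[float]]:
    """Gate each voxel by the sigmoid of its mean activation across channels."""
    n_ch, n_vox = len(feats), len(feats[0])
    gates = [sigmoid(gain * sum(feats[c][v] for c in range(n_ch)) / n_ch + bias)
             for v in range(n_vox)]
    return [[g * x for g, x in zip(gates, ch)] for ch in feats], gates


def hybrid_attention(feats: Features) -> Features:
    """Spatial gating followed by channel gating, in the spirit of SA + CA."""
    spatial_out, _ = spatial_attention(feats)
    out, _ = channel_attention(spatial_out)
    return out
```

Because the gates lie in (0, 1), weakly activated channels are softly suppressed instead of being hard-selected. This is one way a model can avoid fixing the number of FBNs in advance.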
Sean McBane, Youngsoo Choi, Karen Willcox
Lattice-like structures can provide a combination of high stiffness with
light weight that is useful in many applications, but a resolved finite element
mesh of such structures results in a computationally expensive discretization.
This computational expense may be particularly burdensome in many-query
applications, such as optimization. We develop a stress-constrained topology
optimization method for lattice-like structures that uses component-wise
reduced order models as a cheap surrogate, providing accurate computation of
stress fields while greatly reducing run time relative to a full order model.
We demonstrate the ability of our method to produce large reductions in mass
while respecting a constraint on the maximum stress in a pair of test problems.
The ROM methodology provides a speedup of about 150x in forward solves compared
to full order static condensation and provides a relative error of less than 5%
in the relaxed stress.
Authors' comments: 25 pages, 9 figures; submitted to Computer Methods in Applied
Mechanics and Engineering
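The full-order baseline mentioned above, static condensation, eliminates interior degrees of freedom through the Schur complement K_bb - K_bi K_ii^{-1} K_ib. The sketch below uses exact rational arithmetic and dense Gauss-Jordan elimination on a two-spring chain. The helper names and the toy problem are illustrative assumptions; the component-wise ROM that replaces this step is not shown.

```python
from fractions import Fraction
from typing import List, Sequence

Matrix = List[List[Fraction]]


def submatrix(k: Matrix, rows: Sequence[int], cols: Sequence[int]) -> Matrix:
    return [[k[r][c] for c in cols] for r in rows]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A X = B by Gauss-Jordan elimination (A assumed nonsingular)."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def static_condensation(k: Matrix, boundary: Sequence[int],
                        interior: Sequence[int]) -> Matrix:
    """Return the Schur complement K_bb - K_bi K_ii^{-1} K_ib."""
    k_bb = submatrix(k, boundary, boundary)
    k_bi = submatrix(k, boundary, interior)
    x = solve(submatrix(k, interior, interior), submatrix(k, interior, boundary))
    return [[k_bb[i][j] - sum(k_bi[i][t] * x[t][j] for t in range(len(interior)))
             for j in range(len(boundary))] for i in range(len(boundary))]


def spring_chain(k1: Fraction, k2: Fraction) -> Matrix:
    """Stiffness matrix of springs k1 (nodes 0-1) and k2 (nodes 1-2) in series."""
    return [[k1, -k1, 0], [-k1, k1 + k2, -k2], [0, -k2, k2]]
```

Condensing the middle node of `spring_chain(2, 3)` leaves a single effective spring of stiffness 2·3/(2+3) = 6/5 between the end nodes. Repeating such solves for every component is what makes the full-order model expensive in an optimization loop.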
Junkang Wei, Jin Xiao, Siyuan Chen, Licheng Zong, Xin Gao, Yu Li
The rapid growth in the number of experimentally determined and predicted
protein structures, together with their increasing complexity, makes it
challenging for computational biologists to exploit structural information and
protein surface property representations. Recently, AlphaFold2 predictions were
released for the comprehensive proteomes of various species, and protein
surface property representation plays a crucial role in predicting
protein-molecule interactions such as protein-protein, protein-nucleic acid,
and protein-compound interactions. Here, we propose the first comprehensive
database, namely ProNet DB, which incorporates multiple protein surface
representations and RNA-binding landscape for more than 326,175 protein
structures covering 16 model organism proteomes from AlphaFold Protein
Structure Database (AlphaFold DB) and experimentally validated protein
structures deposited in the Protein Data Bank (PDB). For each protein, we
provide the original protein structure, surface property representations
(hydrophobicity, charge distribution, hydrogen bonds, and interacting faces),
and the RNA-binding landscape (RNA-binding sites and RNA-binding preference). To
help users interpret the surface property representations and the RNA-binding
landscape intuitively, we also integrate Mol* and Online 3D Viewer to visualize the
representations on the protein surface. The pre-computed features are
instantly available to users and can boost computational biology research,
including molecular mechanism exploration, geometry-based drug discovery, and
novel therapeutics development. The server is now available at
https://proj.cse.cuhk.edu.hk/aihlab/pronet/.
Authors' comments: 12 pages, 6 figures
Menno van den Hout, Sjoerd van der Heide, Sebastiaan Goossens, Chigo Okonkwo
A predistorter for transmitter nonlinearities is applied when evaluating a
geometrically shaped constellation, so that the constellation points are
transmitted correctly.
Authors' comments: Accepted for presentation at SPPCom 2022
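The idea of predistortion can be illustrated with a hypothetical memoryless transmitter that compresses each quadrature with tanh. The predistorter applies the exact inverse (atanh), so that the transmitted points land on the designed ones. The tanh model, the gain `g = 0.8`, and the square 16-point grid standing in for a geometrically shaped constellation are all assumptions. Real transmitters also have memory effects that this sketch ignores.

```python
import math
from typing import List


def transmitter(z: complex, g: float = 0.8) -> complex:
    """Hypothetical memoryless transmitter: tanh compression of each quadrature."""
    return complex(math.tanh(g * z.real) / g, math.tanh(g * z.imag) / g)


def predistort(p: complex, g: float = 0.8) -> complex:
    """Exact inverse of transmitter(); requires |g * coordinate| < 1."""
    return complex(math.atanh(g * p.real) / g, math.atanh(g * p.imag) / g)


# A square 16-point grid stands in for the geometrically shaped constellation.
levels = [-0.9, -0.3, 0.3, 0.9]
constellation: List[complex] = [complex(i, q) for i in levels for q in levels]

error_without = max(abs(transmitter(p) - p) for p in constellation)
error_with = max(abs(transmitter(predistort(p)) - p) for p in constellation)
```

Without predistortion, the outer points are pulled inward (about 0.18 off at the corners here). With it, the error drops to floating-point rounding, so a shaped constellation is evaluated as designed rather than as distorted by the transmitter.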
Arya Bangun, Oleh Melnyk, Benjamin März, Benedikt Diederichs, Alexander Clausen, Dieter Weber, Frank Filbir, Knut Müller-Caspary
We propose algorithms based on an optimisation method for inverse multislice ptychography, e.g. in electron microscopy. The multislice method is widely used to model the interaction between relativistic electrons and thick specimens. Since only the intensity of diffraction patterns can be recorded, the challenge in applying inverse multislice ptychography is to uniquely reconstruct the electrostatic potential in each slice up to some ambiguities. In this conceptual study, we show that a unique separation of atomic layers for simulated data is possible when considering a low acceleration voltage. We also introduce an adaptation for estimating the illuminating probe. To demonstrate practical applicability, we finally present slice reconstructions using experimental 4D scanning transmission electron microscopy (STEM) data.
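The forward multislice model alternates a thin-slice phase transmission exp(iσV_n) with Fresnel propagation to the next slice. The detector then records only the far-field intensity |FT(ψ)|², which is why the inverse problem is hard. The 1D sketch below uses a naive unitary DFT and dimensionless units. The parameters `sigma`, `dz`, and `wavelength` are hypothetical, and the optimisation-based inversion itself is not shown.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Unitary discrete Fourier transform (O(N^2), fine for a sketch)."""
    n = len(x)
    sign = 1 if inverse else -1
    return [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
            / math.sqrt(n) for k in range(n)]


def propagate(psi: Sequence[complex], dz: float, wavelength: float = 1.0) -> List[complex]:
    """Fresnel propagation over dz, applied as a phase factor in Fourier space."""
    n = len(psi)
    freqs = [(k if k <= n // 2 else k - n) / n for k in range(n)]
    spec = [s * cmath.exp(-1j * math.pi * wavelength * dz * f * f)
            for s, f in zip(dft(psi), freqs)]
    return dft(spec, inverse=True)


def multislice(probe: Sequence[complex], potentials: Sequence[Sequence[float]],
               sigma: float = 1.0, dz: float = 1.0) -> List[float]:
    """Return the far-field diffraction intensity for one probe position."""
    psi = list(probe)
    for v in potentials:
        psi = [p * cmath.exp(1j * sigma * vi) for p, vi in zip(psi, v)]  # transmission
        psi = propagate(psi, dz)                                          # to next slice
    return [abs(a) ** 2 for a in dft(psi)]  # only the intensity is recorded
```

Both steps are phase-only, so total intensity is conserved while the phase information that distinguishes one slice from another is lost at the detector. Recovering it from many scan positions is the task of the inverse algorithms.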
Zhaofeng Si, Honggang Qi, Xiaoyu Song
Convolutional neural networks are prevalent in deep learning tasks. However,
they incur massive computational costs when deployed on mobile devices. Network
pruning is an effective model compression method for handling such problems.
This paper presents a novel structured network pruning method with auxiliary
gating structures that assign importance scores to the blocks of the backbone
network as a pruning criterion. Block-wise pruning is then realized by the
proposed voting strategy, unlike prevailing methods that prune a model at a
finer granularity, e.g., channel-wise. We further develop a three-stage
training schedule for the proposed architecture that incorporates knowledge
distillation for better performance. Our experiments demonstrate that our
method achieves state-of-the-art compression performance on classification
tasks. In addition, our approach can integrate synergistically with other
pruning methods by providing pretrained models, thus achieving better
performance than the unpruned model with over 93\% of FLOPs removed.
Authors' comments: 7 pages, 7 figures, 2 tables
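A block-wise voting step could look like the sketch below. It assumes each mini-batch produces one gate score per backbone block, each batch votes for its `n_prune` lowest-scoring blocks, and the most-voted blocks are removed. The function name, the tie-breaking rule, and the example scores are hypothetical; the paper's exact voting strategy may differ.

```python
from collections import Counter
from typing import List, Sequence


def vote_blocks_to_prune(gate_scores: Sequence[Sequence[float]], n_prune: int) -> List[int]:
    """Each row holds one batch's gate scores per block. Every batch votes for its
    n_prune least important blocks; the n_prune most-voted blocks are pruned."""
    votes: Counter = Counter()
    for scores in gate_scores:
        ranked = sorted(range(len(scores)), key=lambda b: scores[b])
        votes.update(ranked[:n_prune])
    # Ties are broken by lower block index so the result is deterministic.
    ordered = sorted(votes, key=lambda b: (-votes[b], b))
    return sorted(ordered[:n_prune])


batch_scores = [
    [0.9, 0.1, 0.8, 0.2, 0.7],
    [0.8, 0.2, 0.9, 0.6, 0.1],
    [0.7, 0.3, 0.9, 0.1, 0.8],
]
pruned = vote_blocks_to_prune(batch_scores, n_prune=2)
kept = [b for b in range(5) if b not in pruned]
```

Voting across batches makes the decision robust to a single batch in which an otherwise important block happens to score low. Removing whole blocks, rather than channels, shortens the network depth directly.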
Alberto Mellone, Giordano Scarciotti
We address the path-wise control of systems described by a set of nonlinear stochastic differential equations. For this class of systems, we introduce a notion of stochastic relative degree and a change of coordinates which transforms the dynamics to a stochastic normal form. The normal form is instrumental for the design of a state-feedback control which linearises the dynamics and makes them deterministic. We observe that this control is idealistic, i.e. it is not practically implementable because it employs a feedback of the Brownian motion (which is never available) to cancel the noise. Using the idealistic control as a starting point, we introduce a hybrid control architecture which achieves \emph{practical} path-wise control. This hybrid controller uses measurements of the state to perform periodic compensations for the noise contribution to the dynamics. We prove that the hybrid controller recovers the idealistic performance in the limit as the compensating period approaches zero. We address the problem of asymptotic output tracking, solving it in the idealistic and in the practical framework. We finally validate the theory by means of a numerical example.
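The effect of periodic compensation can be seen in a toy scalar simulation. The sketch assumes a closed loop dx = -x dt + σ dW (as if a hypothetical linearising feedback were already applied), uses Euler-Maruyama integration, and models each compensation as an impulsive correction that uses the measured state to cancel the noise accumulated since the last one. This is not the authors' hybrid controller, only an illustration of the limit it describes.

```python
import math
import random


def max_noise_deviation(period: int, steps: int = 1000, dt: float = 1e-3,
                        sigma: float = 0.5, seed: int = 0) -> float:
    """Largest gap between the noisy state and the noise-free (idealistic) path
    when the accumulated noise is compensated every `period` steps."""
    rng = random.Random(seed)
    x = x_det = 1.0  # measured state and idealistic deterministic trajectory
    worst = 0.0
    for k in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment
        x += -x * dt + sigma * dw           # Euler-Maruyama step of the noisy loop
        x_det += -x_det * dt                # same dynamics without noise
        worst = max(worst, abs(x - x_det))
        if (k + 1) % period == 0:
            x = x_det  # compensation: cancel the noise accumulated since the last one
    return worst


for period in (200, 50, 10, 1):
    print(period, round(max_noise_deviation(period), 4))
```

Shrinking `period` toward a single step shrinks the deviation toward the size of one Brownian increment. This mirrors, in a toy way, the limit in which the practical controller recovers the idealistic behaviour.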
Yajing Feng, Qian Hu, Zhenzhou Tang
Vacant parking space (VPS) prediction is one of the key issues in intelligent parking guidance systems. Accurately predicting VPS information plays a crucial role in such systems, as it can help drivers find parking spaces quickly, reducing wasted time and unnecessary environmental pollution. A simple analysis of historical data shows not only an obvious temporal correlation within each parking lot but also a clear spatial correlation between different parking lots. In view of this, this paper proposes a graph-based model, ST-GBGRU (Spatial-Temporal Graph Based Gated Recurrent Unit), which predicts the number of VPSs both in the short term (i.e., within 30 min) and in the long term (i.e., over 30 min). On the one hand, the temporal correlation of historical VPS data is extracted by the GRU; on the other hand, the spatial correlation is extracted by a GCN inside the GRU. Two prediction methods, namely direct prediction and iterative prediction, are combined with the proposed model. Finally, the model is applied to predict the number of VPSs in 8 public parking lots in Santa Monica. The results show that the ST-GBGRU model achieves high accuracy in both the short-term and long-term prediction tasks and has good application prospects.
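A GCN inside a GRU can be sketched by letting every gate see graph-convolved inputs, using the standard propagation matrix D^{-1/2}(A + I)D^{-1/2}. The sketch also shows iterative prediction, where each one-step forecast is fed back as the next input. It is a pure-Python illustration with one scalar feature per parking lot. The scalar weights `w`, the identity readout, and all function names are hypothetical, not ST-GBGRU's exact layers.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def normalized_adjacency(adj: Sequence[Sequence[float]]) -> Matrix:
    """D^{-1/2} (A + I) D^{-1/2}, the usual GCN propagation matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def graph_conv(a_norm: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(a_norm[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def gc_gru_step(a_norm: Matrix, x: Sequence[float], h: Sequence[float],
                w: Dict[str, float]) -> List[float]:
    """One GRU step whose gates read graph-convolved inputs and hidden states."""
    gx, gh = graph_conv(a_norm, x), graph_conv(a_norm, h)
    z = [sigmoid(w["zx"] * a + w["zh"] * b) for a, b in zip(gx, gh)]  # update gate
    r = [sigmoid(w["rx"] * a + w["rh"] * b) for a, b in zip(gx, gh)]  # reset gate
    grh = graph_conv(a_norm, [ri * hi for ri, hi in zip(r, h)])
    c = [math.tanh(w["cx"] * a + w["ch"] * b) for a, b in zip(gx, grh)]  # candidate
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h, c)]


def iterative_forecast(a_norm: Matrix, history: Sequence[Sequence[float]],
                       w: Dict[str, float], horizon: int) -> List[List[float]]:
    """Iterative prediction: each one-step output is fed back as the next input."""
    h = [0.0] * len(history[0])
    for x in history:
        h = gc_gru_step(a_norm, x, h, w)
    preds = []
    for _ in range(horizon):
        h = gc_gru_step(a_norm, h, h, w)  # hypothetical identity readout
        preds.append(h)
    return preds
```

Direct prediction would instead map the encoded history to all horizon steps at once. Iterative prediction reuses one model but can accumulate error over long horizons.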
E. Glikman, M. Lacy, S. LaMassa, C. Bradley, S. G. Djorgovski, T. Urrutia, E. L. Gates, M. J. Graham et al.
We present a highly complete sample of broad-line (Type 1) QSOs out to z ~ 3
selected by their mid-infrared colors, a method that is minimally affected by
dust reddening. We remove host galaxy emission from the spectra and fit for
excess reddening in the residual QSOs, resulting in a Gaussian distribution of
colors for unreddened (blue) QSOs, with a tail extending toward heavily
reddened (red) QSOs, defined as having E(B - V) > 0.25. This radio-independent
selection method enables us to compare red and blue QSO radio properties in
both the FIRST (1.4 GHz) and VLASS (2 - 4 GHz) surveys. Consistent with recent
results from optically-selected QSOs from SDSS, we find that red QSOs have a
significantly higher detection fraction and a higher fraction of compact radio
morphologies at both frequencies. We employ radio stacking to investigate the
median radio properties of the QSOs including those that are undetected in
FIRST and VLASS, finding that red QSOs have significantly brighter radio
emission and steeper radio spectral slopes compared with blue QSOs. Finally, we
find that the incidence of red QSOs is strongly luminosity dependent, where red
QSOs make up > 40% of all QSOs at the highest luminosities. Overall, red QSOs
comprise ~ 40% of higher luminosity QSOs, dropping to only a few percent at
lower luminosities. Furthermore, red QSOs make up a larger percentage of the
radio-detected QSO population. We argue that dusty AGN-driven winds are
responsible for both the obscuration and the excess radio emission seen in red
QSOs.
Authors' comments: Accepted for publication in ApJ; 35 pages, 24 Figures, 6 Tables
Sarem Seitz
Gaussian Processes (GPs) are a versatile and popular method in Bayesian Machine Learning. A common modification is the Sparse Variational Gaussian Process (SVGP), which is well suited to large datasets. While GPs elegantly handle Gaussian-distributed target variables in closed form, their applicability can be extended to non-Gaussian data as well. These extensions usually cannot be treated in closed form and hence require approximate solutions. This paper proposes to approximate the inverse-link function, which is necessary when working with non-Gaussian likelihoods, by a piece-wise constant function. It will be shown that this yields a closed-form solution for the corresponding SVGP lower bound. In addition, it is demonstrated how the piece-wise constant function itself can be optimized, resulting in an inverse-link function that can be learnt from the data at hand.
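The closed form comes from the fact that a piece-wise constant function only needs bin probabilities under the Gaussian variational marginal q(f) = N(μ, s²). If the inverse link equals c_k on the bin [b_k, b_{k+1}), then E_q[log p(y | f)] = Σ_k P_k log p_k(y), with P_k = Φ((b_{k+1} - μ)/s) - Φ((b_k - μ)/s). The sketch below computes this expected log-likelihood term for a Bernoulli likelihood. The breakpoints and levels are hypothetical learnable parameters, and the KL term of the SVGP bound is not shown.

```python
import math
from typing import List, Sequence


def normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bin_probs(mu: float, std: float, breakpoints: Sequence[float]) -> List[float]:
    """P(f in each bin) for f ~ N(mu, std^2); bins are (-inf, b0), [b0, b1), ..., [b_last, inf)."""
    cdf = [0.0] + [normal_cdf((b - mu) / std) for b in breakpoints] + [1.0]
    return [cdf[i + 1] - cdf[i] for i in range(len(cdf) - 1)]


def expected_bernoulli_loglik(y: int, mu: float, std: float,
                              breakpoints: Sequence[float],
                              levels: Sequence[float]) -> float:
    """Closed-form E_q[log p(y | f)] when p(y = 1 | f) = levels[k] on bin k."""
    probs = bin_probs(mu, std, breakpoints)
    return sum(p * math.log(c if y == 1 else 1.0 - c) for p, c in zip(probs, levels))
```

No quadrature or Monte Carlo sampling is needed. Because the expression is differentiable in μ, s, and the breakpoints, the levels and breakpoints can be optimized alongside the variational parameters.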
Chen Tang, Haoyu Zhai, Kai Ouyang, Zhi Wang, Yifei Zhu, Wenwu Zhu
Conventional model quantization methods apply a fixed quantization scheme to all data samples, which ignores the inherent differences in "recognition difficulty" between samples. We propose to apply varying quantization schemes to different data samples to achieve data-dependent dynamic inference at a fine-grained layer level. However, enabling this adaptive inference with changeable layer-wise quantization schemes is challenging because the number of combinations of bit-widths and layers grows exponentially, making it extremely difficult to train a single model in such a vast search space and use it in practice. To solve this problem, we present the Arbitrary Bit-width Network (ABN), where the bit-widths of a single deep network can change at runtime for different data samples, with layer-wise granularity. Specifically, we first build a weight-shared, layer-wise quantizable "super-network" in which each layer can be allocated multiple bit-widths and thus quantized differently on demand. The super-network provides a considerably large number of combinations of bit-widths and layers, each of which can be used during inference without retraining or storing myriad models. Second, based on the well-trained super-network, each layer's runtime bit-width selection decision is modeled as a Markov Decision Process (MDP) and solved by a corresponding adaptive inference strategy. Experiments show that the super-network can be built without accuracy degradation, and that the bit-width allocation of each layer can be adjusted to deal with various inputs on the fly. On ImageNet classification, we achieve a 1.1% top-1 accuracy improvement while saving 36.2% of BitOps.
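The quantities being traded off can be sketched in a few lines: a uniform quantizer applied with a per-layer bit-width, and a BitOps count taken as MACs × weight bits × activation bits summed over layers (a common convention, assumed here). The layer sizes and bit-width assignments are hypothetical. The MDP-based runtime selection is not shown.

```python
from typing import List, Sequence


def quantize(values: Sequence[float], bits: int) -> List[float]:
    """Uniform quantization onto 2**bits levels spanning [-s, s], s = max|v|."""
    s = max(abs(v) for v in values) or 1.0
    n = 2 ** bits - 1  # number of intervals between adjacent levels
    return [round((v + s) / (2 * s) * n) / n * 2 * s - s for v in values]


def bitops(macs_per_layer: Sequence[int], weight_bits: Sequence[int],
           act_bits: Sequence[int]) -> int:
    """BitOps = sum over layers of MACs * weight bit-width * activation bit-width."""
    return sum(m * wb * ab for m, wb, ab in zip(macs_per_layer, weight_bits, act_bits))


macs = [1000, 2000, 500]                       # hypothetical per-layer MAC counts
uniform_8bit = bitops(macs, [8, 8, 8], [8, 8, 8])
mixed = bitops(macs, [8, 4, 8], [8, 4, 8])     # drop only the largest layer to 4 bits
saving = 1 - mixed / uniform_8bit
```

Lowering the bit-width of the most expensive layer alone cuts BitOps by about 43% here. This illustrates why choosing bit-widths per layer, and per input, gives a much finer cost-accuracy trade-off than a single global setting.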
Sidike Paheding, Abel A. Reyes, Anush Kasaragod, Thomas Oommen
Hyperspectral image (HSI) classification is the most vibrant area of research
in the hyperspectral community, because the rich spectral information contained
in HSI can greatly aid in identifying objects of interest. However, inherent
non-linearity between materials and the corresponding spectral profiles brings
two major challenges in HSI classification: interclass similarity and
intraclass variability. Many advanced deep learning methods have attempted to
address these issues from the perspective of a region/patch-based approach,
instead of a pixel-based alternative. However, the patch-based approaches
hypothesize that the neighboring pixels of a target pixel in a fixed spatial
window belong to the same class, and this assumption does not always hold. To
address this problem, we herein propose a new deep learning architecture,
namely Gramian Angular Field encoded Neighborhood Attention U-Net (GAF-NAU),
for pixel-based HSI classification. The proposed method does not require
regions or patches centered around a raw target pixel to perform 2D-CNN based
classification; instead, our approach transforms the 1D pixel vector in HSI into
a 2D angular feature space using the Gramian Angular Field (GAF) and then embeds
it into a new neighborhood attention network that suppresses irrelevant angular
features while emphasizing pertinent features useful for the HSI classification
task.
Evaluation results on three publicly available HSI datasets demonstrate the
superior performance of the proposed model.
Authors' comments: 8 Pages, 9 Figures
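The Gramian Angular Field step turns a 1D spectral vector into a 2D image. The vector is min-max rescaled to [-1, 1], each value is mapped to an angle φ = arccos(x), and entry (i, j) is cos(φ_i + φ_j) for the summation field (GASF) or sin(φ_i - φ_j) for the difference field (GADF). The sketch below follows these standard definitions. The short example spectrum is hypothetical, and the paper's choice between GASF and GADF is not assumed.

```python
import math
from typing import List, Sequence


def rescale(x: Sequence[float]) -> List[float]:
    """Min-max rescale to [-1, 1] (assumes x is not constant)."""
    lo, hi = min(x), max(x)
    return [(2 * v - hi - lo) / (hi - lo) for v in x]


def gramian_angular_field(x: Sequence[float], kind: str = "summation") -> List[List[float]]:
    """Encode a 1-D vector as a 2-D GAF image via phi = arccos(rescaled x)."""
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in rescale(x)]  # clamp rounding
    if kind == "summation":   # GASF[i][j] = cos(phi_i + phi_j)
        return [[math.cos(a + b) for b in phi] for a in phi]
    if kind == "difference":  # GADF[i][j] = sin(phi_i - phi_j)
        return [[math.sin(a - b) for b in phi] for a in phi]
    raise ValueError(f"unknown GAF kind: {kind}")


spectrum = [0.12, 0.30, 0.25, 0.08, 0.40]  # hypothetical 5-band pixel
gasf = gramian_angular_field(spectrum)
```

A pixel with B spectral bands becomes a B×B image whose entries encode pairwise band relationships. This lets a 2D network operate on a single pixel without borrowing neighbors from a spatial patch.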
Xun Gong, Yizhou Lu, Zhikai Zhou, Yanmin Qian
Accent variability has posed a huge challenge to automatic speech
recognition~(ASR) modeling. Although one-hot accent vector based adaptation
systems are commonly used, they require prior knowledge about the target accent
and cannot handle unseen accents. Furthermore, simply concatenating accent
embeddings does not make good use of accent knowledge and yields only limited
improvements. In this work, we aim to tackle these problems with a novel
layer-wise adaptation structure injected into the encoder of the E2E ASR model. The
adapter layer encodes an arbitrary accent in the accent space and assists the
ASR model in recognizing accented speech. Given an utterance, the adaptation
structure extracts the corresponding accent information and transforms the
input acoustic feature into an accent-related feature through the linear
combination of all accent bases. We further explore the injection position of
the adaptation layer, the number of accent bases, and different types of accent
bases to achieve better accent adaptation. Experimental results show that the
proposed adaptation structure brings 12\% and 10\% relative word error
rate~(WER) reduction on the AESRC2020 accent dataset and the Librispeech
dataset, respectively, compared to the baseline.
Authors' comments: Accepted by Interspeech2021
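The linear combination of accent bases can be sketched as follows. It assumes each basis is a linear map on the acoustic feature, and that the combination weights come from a softmax over dot products between an utterance-level accent embedding and one key per basis. The keys, the scoring rule, and all names are hypothetical stand-ins for the learned adapter.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def accent_adapt(h: Sequence[float], utterance_emb: Sequence[float],
                 accent_keys: Sequence[Sequence[float]],
                 bases: Sequence[Matrix]) -> Tuple[Vector, Vector]:
    """Transform feature h by a weighted combination of accent bases."""
    scores = [sum(k * e for k, e in zip(key, utterance_emb)) for key in accent_keys]
    weights = softmax(scores)                 # soft assignment over the accent space
    outputs = [matvec(b, h) for b in bases]   # each basis transforms h
    return [sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(len(h))], weights
```

Because the weights are continuous rather than a one-hot accent label, an unseen accent is represented as a mixture of the learned bases. No prior knowledge of the target accent is required at test time.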
Shana Moothedath, Namrata Vaswani
This work develops a provably accurate, fully decentralized alternating projected gradient descent (GD) algorithm for recovering a low rank (LR) matrix from mutually independent projections of each of its columns, in a fast and communication-efficient fashion. To the best of our knowledge, this work is the first attempt to develop a provably correct decentralized algorithm (i) for any problem involving the use of an alternating projected GD algorithm, and (ii) for any problem in which the constraint set to be projected onto is a non-convex set.
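In a fully decentralized setting, no node can compute a network-wide gradient sum directly, so a common building block is consensus averaging. In each round, every node moves toward its neighbors' values, x_i ← x_i + ε Σ_{j∈N(i)} (x_j - x_i). The sketch below shows this generic step, with scalars standing in for the per-node gradient matrices. Whether the paper uses exactly this update, and the weight ε = 0.3 on a 5-node ring, are assumptions.

```python
from typing import List, Sequence


def consensus_step(values: Sequence[float], neighbors: Sequence[Sequence[int]],
                   weight: float) -> List[float]:
    """One averaging round: x_i <- x_i + weight * sum_{j in N(i)} (x_j - x_i)."""
    return [v + weight * sum(values[j] - v for j in neighbors[i])
            for i, v in enumerate(values)]


def run_consensus(values: Sequence[float], neighbors: Sequence[Sequence[int]],
                  weight: float, rounds: int) -> List[float]:
    """Repeat consensus rounds; with symmetric neighbor lists the sum is preserved."""
    current = list(values)
    for _ in range(rounds):
        current = consensus_step(current, neighbors, weight)
    return current


ring = [[1, 4], [0, 2], [1, 3], [2, 4], [3, 0]]  # symmetric 5-node ring
final = run_consensus([1.0, 2.0, 3.0, 4.0, 5.0], ring, weight=0.3, rounds=100)
```

With symmetric neighbor lists and ε below 1/(max degree), every node converges to the network average. The number of rounds per GD iteration is what drives the communication cost that such algorithms try to keep low.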
Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Yan Wang, Liujuan Cao, Yongjian Wu, Feiyue Huang, Rongrong Ji
Despite its exciting performance, the Transformer is criticized for its excessive parameters and computation cost. However, compressing the Transformer remains an open problem due to the internal complexity of its layer designs, i.e., Multi-Head Attention (MHA) and Feed-Forward Network (FFN). To address this issue, we introduce Group-wise Transformation towards a universal yet lightweight Transformer for vision-and-language tasks, termed LW-Transformer. LW-Transformer applies Group-wise Transformation to reduce both the parameters and computations of the Transformer, while also preserving its two main properties, i.e., the efficient attention modeling on diverse subspaces of MHA, and the expanding-scaling feature transformation of FFN. We apply LW-Transformer to a set of Transformer-based networks, and quantitatively evaluate them on three vision-and-language tasks and six benchmark datasets. Experimental results show that while saving a large number of parameters and computations, LW-Transformer achieves very competitive performance against the original Transformer networks for vision-and-language tasks. To examine its generalization ability, we also apply our optimization strategy to a recently proposed image Transformer called Swin-Transformer for image classification, where its effectiveness is also confirmed.
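The parameter saving of a group-wise transformation is easy to quantify. Splitting a d_in → d_out linear layer into G independent groups, each mapping d_in/G inputs to d_out/G outputs, divides the weight count by G and is equivalent to a block-diagonal dense layer. The sketch below uses hypothetical sizes (512 → 2048, G = 4) and does not reproduce LW-Transformer's exact placement of groups inside MHA and FFN.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def dense_params(d_in: int, d_out: int) -> int:
    return d_in * d_out


def group_params(d_in: int, d_out: int, groups: int) -> int:
    """Each of `groups` blocks maps d_in/groups inputs to d_out/groups outputs."""
    assert d_in % groups == 0 and d_out % groups == 0
    return groups * (d_in // groups) * (d_out // groups)


def group_linear(x: Sequence[float], weights: Sequence[Matrix]) -> List[float]:
    """weights[g] is a (d_out/G) x (d_in/G) matrix applied to the g-th input slice."""
    size = len(x) // len(weights)
    out: List[float] = []
    for g, w in enumerate(weights):
        chunk = x[g * size:(g + 1) * size]
        out.extend(sum(a * b for a, b in zip(row, chunk)) for row in w)
    return out
```

With G = 4, the 512 → 2048 layer drops from 1,048,576 to 262,144 weights. Each group still sees its own subspace, which is the property the abstract aims to preserve for MHA and FFN.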
Saira Soomro, Arjumand Bano Soomro, Tarique Bhatti, Yonis Gulzar
Blended learning (BL) is a recent trend among the many options that can best
fit learners' needs, regardless of time and place. This study aimed to discover
students' perceptions of BL and the challenges they face while using
technology. This quantitative study used data gathered from 300 students
enrolled in four public universities in the Sindh province of Pakistan. The
findings show that students were comfortable with the use of technology and
that it had a positive effect on their academic experience. The study also
showed that the use of technology encourages peer collaboration. The challenges
found include: neither teacher support nor a training program was provided to
the students for courses that needed to shift from a traditional face-to-face
paradigm to a blended format; there was a lack of space and a lack of skills
among laboratory assistants for the blended-format courses; and there was a
shortage of high-tech computer laboratories / computer units to run these
courses. Therefore, it is recommended that the authorities develop and
incorporate a comprehensive mechanism for the effective implementation of BL in
the teaching-learning process, and that heads of departments provide
additional computing infrastructure to their departments.
Authors' comments: 5 pages
Zhaowei Cai, Gukyeong Kwon, Avinash Ravichandran, Erhan Bas, Zhuowen Tu, Rahul Bhotika, Stefano Soatto
In this paper, we study challenging instance-wise vision-language tasks, in which free-form language must be aligned with objects rather than with the whole image. To address these tasks, we propose X-DETR, whose architecture has three major components: an object detector, a language encoder, and vision-language alignment. The vision and language streams remain independent until the end, where they are aligned using an efficient dot-product operation. The whole network is trained end-to-end, so that the detector is optimized for the vision-language tasks instead of being an off-the-shelf component. To overcome the limited size of paired object-language annotations, we leverage other weak types of supervision to expand the knowledge coverage. This simple yet effective X-DETR architecture shows good accuracy and fast speed on multiple instance-wise vision-language tasks, e.g., 16.4 AP on LVIS detection of 1.2K categories at ~20 frames per second without using any LVIS annotation during training.
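The dot-product alignment step scores every detected object embedding against a text embedding and picks the best match. The sketch below assumes both streams have already produced vectors of the same dimension. The toy embeddings are hypothetical, and the learned encoders are not shown.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def align(object_embs: Sequence[Sequence[float]],
          text_emb: Sequence[float]) -> Tuple[List[float], int]:
    """Score every detected object against a text query with one dot product each."""
    scores = [dot(o, text_emb) for o in object_embs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores, best


scores, best = align([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]],
                     [0.1, 0.9, 0.0])
```

Because the streams only interact through this dot product, text embeddings can be computed once and reused across images. Many queries against many objects then reduce to a single matrix product, which is what keeps inference fast.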
Wenjing Chen, Ruida Zhou, Chao Tian, Cong Shen
We analyze the performance of the Borda counting algorithm in a non-parametric model. The algorithm needs to use probabilistic rankings of the items within $m$-sized subsets to accurately determine which items are the overall top-$k$ items among a total of $n$ items. The Borda counting algorithm simply counts the cumulative scores for each item from these partial ranking observations. This generalizes previous work of a similar nature by Shah et al. that used probabilistic pairwise comparison data. The performance of the Borda counting algorithm critically depends on the associated score separation $\Delta_k$ between the $k$-th item and the $(k+1)$-th item. Specifically, we show that if $\Delta_k$ is greater than a certain value, then the top-$k$ items selected by the algorithm are asymptotically accurate almost surely; if $\Delta_k$ is below a certain value, then the result will be inaccurate with a constant probability. In the special case of $m=2$, i.e., pairwise comparison, the resulting bound is tighter than that given by Shah et al., leading to a reduced gap between the upper and lower bounds on the error probability. These results are further extended to the approximate top-$k$ selection setting. Numerical experiments demonstrate the effectiveness and accuracy of the Borda counting algorithm compared with the spectral MLE-based algorithm, particularly when the data do not necessarily follow an assumed parametric model.
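The Borda counting rule is simple enough to state in a few lines. In this sketch, each observation is an $m$-sized subset listed from best to worst, and the item in position p earns m - 1 - p points. Ties are broken by item label, and totals are not normalized by how often an item is observed; both choices are assumptions of the sketch.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def borda_top_k(rankings: Sequence[Sequence[Hashable]], k: int) -> List[Hashable]:
    """Accumulate Borda points over partial rankings and return the top-k items."""
    scores: Dict[Hashable, int] = defaultdict(int)
    for ranking in rankings:
        m = len(ranking)
        for pos, item in enumerate(ranking):
            scores[item] += m - 1 - pos  # best of m gets m - 1, worst gets 0
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]


observations = [
    ["a", "b", "c"],
    ["a", "d", "e"],
    ["b", "c", "d"],
    ["c", "a", "e"],
    ["b", "e", "d"],
]
top3 = borda_top_k(observations, k=3)
```

Here the totals are a = 5, b = 5, c = 3, d = 1, e = 1. The margin between the $k$-th and $(k+1)$-th totals plays the role of the separation $\Delta_k$ that governs whether the selected set is reliable.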
Jianan Wang, Guansong Lu, Hang Xu, Zhenguo Li, Chunjing Xu, Yanwei Fu
Existing text-guided image manipulation methods aim to modify the appearance
of the image or to edit a few objects in a virtual or simple scenario, which is
far from practical application. In this work, we study a novel task:
text-guided image manipulation at the entity level in the real world. The task
imposes three basic requirements: (1) to edit the entity consistently with the
text descriptions, (2) to preserve the text-irrelevant regions, and (3) to
merge the manipulated entity into the image naturally. To this end, we propose
a new transformer-based framework based on the two-stage image synthesis
method, namely \textbf{ManiTrans}, which can not only edit the appearance of
entities but also generate new entities corresponding to the text guidance. Our
framework incorporates a semantic alignment module to locate the image regions
to be manipulated, and a semantic loss to help align the relationship between
the vision and language. We conduct extensive experiments on the real-world
CUB, Oxford, and COCO datasets to verify that our method can distinguish the
relevant and irrelevant regions and achieve more precise and flexible
manipulation compared with baseline methods. The project homepage is
\url{https://jawang19.github.io/manitrans}.
Authors' comments: Accepted by CVPR2022 (Oral)