Liang Li
Gamma-ray bursts (GRBs) exhibit a diversity of spectra. Several spectral
models (e.g., Band, cutoff power-law, and blackbody) and their hybrid versions
(e.g., Band+blackbody) have been widely used to fit the observed GRB spectra.
Here, we attempt to collect all the bursts detected by {\it Fermi}-GBM with
known redshifts from July 2008 to May 2022, motivated to (i) provide a
parameter catalog independent from the official \emph{Fermi}/GBM team and (ii)
achieve a ``clean'' model-based GRB spectral-energy correlation analysis. A
nearly complete GRB sample was created, containing 153 such bursts (136 long
gamma-ray bursts and 17 short gamma-ray bursts). Using the sample and by
performing detailed spectral analysis and model comparisons, we investigate two
GRB spectral-energy correlations: the cosmological rest-frame peak energy
($E_{\rm p,z}$) of the $\nu F_\nu$ prompt emission spectrum correlated with (i)
the isotropic-bolometric-equivalent emission energy $E_{\gamma, \rm iso}$ (the
Amati relation), and (ii) the isotropic-bolometric-equivalent peak luminosity
$L_{\rm p, iso}$ (the Yonetoku relation). From a linear regression analysis, a
tight correlation between $E_{\rm p,z}$ and $E_{\gamma, \rm iso}$ (and
$L_{\rm p, iso}$) is found for both the Band-like and CPL-like bursts. More
interestingly, the CPL-like bursts do not fall on the Band-like burst Amati and
Yonetoku correlations, suggesting distinct radiation processes and indicating
that these spectral-energy correlations depend strongly on the model-wise
properties.
Authors' comments: 35 pages, 7 figures including 24 panels, 7 tables, accepted for
publication in The Astrophysical Journal Supplement Series
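As a rough illustration of the linear regression mentioned in the abstract above, the sketch below fits a straight line in log-log space, which is the usual way such spectral-energy correlations are expressed. The function name `fit_log_log`, the units, and the data points are assumptions made for this example; the points are synthetic and noise-free, not values from the catalog.

```python
import math

def fit_log_log(e_iso, e_peak):
    """Ordinary least squares for log10(e_peak) = a + b * log10(e_iso)."""
    xs = [math.log10(v) for v in e_iso]
    ys = [math.log10(v) for v in e_peak]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Synthetic points (NOT catalog values), generated from an arbitrary power law
# e_peak = 100 * e_iso ** 0.5, so the fit should recover a = 2 and b = 0.5.
synthetic_e_iso = [0.1, 1.0, 10.0, 100.0]
synthetic_e_peak = [100.0 * v ** 0.5 for v in synthetic_e_iso]
intercept, slope = fit_log_log(synthetic_e_iso, synthetic_e_peak)
```

Fitting the Band-like and CPL-like subsamples separately with such a routine would give one (intercept, slope) pair per model, which is the kind of comparison the abstract describes.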
Jongho Park, Jinchao Xu, Xiaofeng Xu
In this paper, we propose a novel algorithm called Neuron-wise Parallel
Subspace Correction Method (NPSC) for the finite neuron method that
approximates numerical solutions of partial differential equations (PDEs) using
neural network functions. Despite extensive research on applying neural
networks to numerical PDEs, there is still a serious lack of
effective training algorithms that can achieve adequate accuracy, even for
one-dimensional problems. Based on recent results on the spectral properties of
linear layers and landscape analysis for single neuron problems, we develop a
special type of subspace correction method that optimizes the linear layer and
each neuron in the nonlinear layer separately. An optimal preconditioner that
resolves the ill-conditioning of the linear layer is presented for
one-dimensional problems, so that the linear layer is trained in a uniform
number of iterations with respect to the number of neurons. In each single
neuron problem, a good local minimum that avoids flat energy regions is found
by a superlinearly convergent algorithm. Numerical experiments on function
approximation problems and PDEs demonstrate better performance of the proposed
method than other gradient-based methods.
Authors' comments: 24 pages, 6 figures
Amit Daniely, Elad Granot
We investigate the sample complexity of bounded two-layer neural networks
using different activation functions.
In particular, we consider the class
$$ \mathcal{H} = \left\{\textbf{x}\mapsto \langle \textbf{v}, \sigma \circ
W\textbf{x} + \textbf{b} \rangle : \textbf{b}\in\mathbb{R}^d, W \in
\mathbb{R}^{\mathcal{T}\times d}, \textbf{v} \in
\mathbb{R}^{\mathcal{T}}\right\} $$
where the spectral norm of $W$ and $\textbf{v}$ is bounded by $O(1)$, the
Frobenius norm of $W$ is bounded from its initialization by $R > 0$, and
$\sigma$ is a Lipschitz activation function.
We prove that if $\sigma$ is element-wise, then the sample complexity of
$\mathcal{H}$ has only logarithmic dependency in width and that this complexity
is tight, up to logarithmic factors.
We further show that the element-wise property of $\sigma$ is essential for a
logarithmic dependency bound in width, in the sense that there exist
non-element-wise activation functions whose sample complexity is linear in
width, for widths that can be up to exponential in the input dimension.
For the upper bound, we use the recent approach for norm-based bounds named
Approximate Description Length (ADL) by arXiv:1910.05697.
We further develop new techniques and tools for this approach that will
hopefully inspire future works.
Authors' comments: 13 pages
Kang Fu, Jianwei Hu, Seydou Keita, Hao Liu
The stochastic block model is a popular tool for detecting community structures in network data. Detecting the difference between two community structures is an important issue for stochastic block models. However, two-sample testing for these models remains largely under-explored. In this article, based on the maximum entry-wise deviation of the two centered and rescaled adjacency matrices, we propose a novel test statistic to test two samples of stochastic block models. We prove that the null distribution of the proposed test statistic converges in distribution to a Gumbel distribution, and we show that a change between the two samples from stochastic block models can be tested via the proposed method. Then, we show that the proposed test has an asymptotic power guarantee against alternative models. One noticeable advantage of the proposed test statistic is that the number of communities can be allowed to grow linearly up to a logarithmic factor. Further, we extend the proposed method to the degree-corrected stochastic block model. Both simulation studies and real-world data examples indicate that the proposed method works well.
Mu Chen, Zhedong Zheng, Yi Yang, Tat-Seng Chua
Unsupervised Domain Adaptation (UDA) aims to enhance the generalization of the learned model to other domains. The domain-invariant knowledge is transferred from the model trained on labeled source domain, e.g., video game, to unlabeled target domains, e.g., real-world scenarios, saving annotation expenses. Existing UDA methods for semantic segmentation usually focus on minimizing the inter-domain discrepancy of various levels, e.g., pixels, features, and predictions, for extracting domain-invariant knowledge. However, the primary intra-domain knowledge, such as context correlation inside an image, remains underexplored. In an attempt to fill this gap, we propose a unified pixel- and patch-wise self-supervised learning framework, called PiPa, for domain adaptive semantic segmentation that facilitates intra-image pixel-wise correlations and patch-wise semantic consistency against different contexts. The proposed framework exploits the inherent structures of intra-domain images, which: (1) explicitly encourages learning the discriminative pixel-wise features with intra-class compactness and inter-class separability, and (2) motivates the robust feature learning of the identical patch against different contexts or fluctuations. Extensive experiments verify the effectiveness of the proposed method, which obtains competitive accuracy on the two widely-used UDA benchmarks, i.e., 75.6 mIoU on GTA to Cityscapes and 68.2 mIoU on Synthia to Cityscapes. Moreover, our method is compatible with other UDA approaches to further improve the performance without introducing extra parameters.
Stephanie Schoch, Haifeng Xu, Yangfeng Ji
Data valuation, or the valuation of individual datum contributions, has seen
growing interest in machine learning due to its demonstrable efficacy for tasks
such as noisy label detection. In particular, due to the desirable axiomatic
properties, several Shapley value approximation methods have been proposed. In
these methods, the value function is typically defined as the predictive
accuracy over the entire development set. However, this limits the ability to
differentiate between training instances that are helpful or harmful to their
own classes. Intuitively, instances that harm their own classes may be noisy or
mislabeled and should receive a lower valuation than helpful instances. In this
work, we propose CS-Shapley, a Shapley value with a new value function that
discriminates between training instances' in-class and out-of-class
contributions. Our theoretical analysis shows the proposed value function is
(essentially) the unique function that satisfies two desirable properties for
evaluating data values in classification. Further, our experiments on two
benchmark evaluation tasks (data removal and noisy label detection) and four
classifiers demonstrate the effectiveness of CS-Shapley over existing methods.
Lastly, we evaluate the "transferability" of data values estimated from one
classifier to others, and our results suggest Shapley-based data valuation is
transferable for application across different models.
Authors' comments: Accepted to NeurIPS 2022
Yuto Watanabe, Kazunori Sakurama
This paper addresses a distributed convex optimization problem with a class
of coupled constraints, which arise in a multi-agent system composed of
multiple communities modeled by cliques. First, we propose a fully distributed
gradient-based algorithm with a novel operator inspired by the convex
projection, called the clique-based projection. Next, we scrutinize the
convergence properties for both diminishing and fixed step sizes. For
diminishing ones, we show the convergence to an optimal solution under the
assumptions of the smoothness of an objective function and the compactness of
the constraint set. Additionally, when the objective function is strongly
monotone, the strict convergence to the unique solution is proved without the
assumption of compactness. For fixed step sizes, we prove the non-ergodic
convergence rate of O(1/k) concerning the objective residual under the
assumption of the smoothness of the objective function. Furthermore, we apply
Nesterov's acceleration method to the proposed algorithm and establish the
convergence rate of O(1/k^2). Numerical experiments illustrate the
effectiveness of the proposed method.
Authors' comments: 8 pages, 5 figures
Savinay Nagendra, Chaopeng Shen, Daniel Kifer
The purpose of binary segmentation models is to determine which pixels belong
to an object of interest (e.g., which pixels in an image are part of roads).
The models assign a logit score (i.e., probability) to each pixel and these are
converted into predictions by thresholding (i.e., each pixel with logit score
$\geq \tau$ is predicted to be part of a road). However, a common phenomenon in
current and former state-of-the-art segmentation models is spatial bias -- in
some patches, the logit scores are consistently biased upwards and in others
they are consistently biased downwards. These biases cause false positives and
false negatives in the final predictions. In this paper, we propose
PatchRefineNet (PRN), a small network that sits on top of a base segmentation
model and learns to correct its patch-specific biases. Across a wide variety of
base models, PRN consistently helps them improve mIoU by 2-3\%. One of the key
ideas behind PRN is the addition of a novel supervision signal during training.
Given the logit scores produced by the base segmentation model, each pixel is
given a pseudo-label that is obtained by optimally thresholding the logit
scores in each image patch. Incorporating these pseudo-labels into the loss
function of PRN helps correct systematic biases and reduce false
positives/negatives. Although we mainly focus on binary segmentation, we also
show how PRN can be extended to saliency detection and few-shot segmentation.
We also discuss how the ideas can be extended to multiclass segmentation.
Authors' comments: 16 pages, 12 figures, 7 tables (Added supplementary material)
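The patch-wise pseudo-labelling idea in the abstract above can be sketched as follows. This is a minimal illustration, assuming that "optimal" means the threshold with the highest pixel accuracy against the ground truth inside the patch; the paper may use a different criterion, and the function names are hypothetical.

```python
def best_patch_threshold(scores, labels):
    """Return the threshold on `scores` that maximizes pixel accuracy in one patch.

    Assumption: optimality is measured by agreement with the ground-truth labels.
    Ties are broken in favour of the smallest candidate threshold.
    """
    candidates = sorted(set(scores)) + [float("inf")]
    best_tau, best_correct = None, -1
    for tau in candidates:
        correct = sum((s >= tau) == bool(y) for s, y in zip(scores, labels))
        if correct > best_correct:
            best_tau, best_correct = tau, correct
    return best_tau

def patch_pseudo_labels(scores, labels):
    """Threshold one flattened patch at its own best threshold."""
    tau = best_patch_threshold(scores, labels)
    return [int(s >= tau) for s in scores]

# One flattened patch whose scores are biased upwards: a global threshold of 0.5
# would mark every pixel as positive.
patch_scores = [0.9, 0.8, 0.7, 0.6]
patch_truth = [1, 1, 0, 0]
pseudo = patch_pseudo_labels(patch_scores, patch_truth)
```

Here the per-patch threshold moves up to 0.8, so the two lower-scoring pixels are labelled negative even though their scores exceed a global 0.5 cut.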
Juan Sebastián Salcedo Gallo, Jesús Solano, Javier Hernán García, David Zarruk-Valencia, Alejandro Correa-Bahnsen
In this work, we propose a framework relying solely on chat-based customer
support (CS) interactions for predicting the recommendation decision of
individual users. For our case study, we analyzed a total number of 16.4k users
and 48.7k customer support conversations within the financial vertical of a
large e-commerce company in Latin America. Consequently, our main contributions
and objectives are to use Natural Language Processing (NLP) to assess and
predict the recommendation behavior where, in addition to using static
sentiment analysis, we exploit the predictive power of each user's sentiment
dynamics. Our results show that, with respective feature interpretability, it
is possible to predict the likelihood of a user to recommend a product or
service, based solely on the message-wise sentiment evolution of their CS
conversations in a fully automated way.
Authors' comments: 10 pages, 4 figures, 1 table. Already accepted at NeurIPS 2022,
LatinX in AI Workshop
Gao Xueqi, Xu Chao, Song Yihang, Hu Jing, Xiao Jian, Meng Zhaopeng
Road rage is a social problem that deserves attention, but little research has been done so far. In this paper, based on the biological topology of multi-channel EEG signals, we propose a model which combines transferable attention (TA) and a regularized graph neural network (RGNN). First, topology-aware information aggregation is performed on EEG signals, and complex relationships between channels are dynamically learned. Then, the transferability of each channel is quantified based on the results of the node-wise domain classifier, which is used as the attention score. We recruited 10 subjects and collected their EEG signals in pleasure and rage states under simulated driving conditions. We verify the effectiveness of our method on this dataset and compare it with other methods. The results indicate that our method is simple and efficient, with 85.63% accuracy in cross-subject experiments. It can be used to identify road rage. Our data and code are available at https://github.com/1CEc0ffee/dataAndCode.git
Tatsuya Chuman, Hitoshi Kiya
Privacy-preserving deep neural networks (DNNs) have been proposed for
protecting data privacy in the cloud server. Although several encryption
schemes for visual protection have been proposed for privacy-preserving DNNs,
several attacks can restore visual information from encrypted images. On
the other hand, it has been confirmed that the block-wise image encryption
scheme which utilizes block and pixel shuffling is robust against several
attacks. In this paper, we propose a jigsaw puzzle solver-based attack to
restore visual information from encrypted images including block and pixel
shuffling. In experiments, images encrypted by using the block-wise image
encryption are mostly restored by using the proposed attack.
Authors' comments: To appear in IWAIT2023
Sofoklis Kakouros, Themos Stafylakis, Ladislav Mosner, Lukas Burget
When recognizing emotions from speech, we encounter two common problems: how
to optimally capture emotion-relevant information from the speech signal and
how to best quantify or categorize the noisy subjective emotion labels.
Self-supervised pre-trained representations can robustly capture information
from speech enabling state-of-the-art results in many downstream tasks
including emotion recognition. However, better ways of aggregating the
information across time need to be considered as the relevant emotion
information is likely to appear piecewise and not uniformly across the signal.
For the labels, we need to take into account that there is a substantial degree
of noise that comes from the subjective human annotations. In this paper, we
propose a novel approach to attentive pooling based on correlations between the
representations' coefficients combined with label smoothing, a method aiming to
reduce the confidence of the classifier on the training labels. We evaluate our
proposed approach on the benchmark dataset IEMOCAP, and demonstrate high
performance surpassing that in the literature. The code to reproduce the
results is available at github.com/skakouros/s3prl_attentive_correlation.
Authors' comments: Submitted to IEEE-ICASSP 2023
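Label smoothing, as mentioned in the abstract above, replaces a one-hot training target with a slightly softened distribution. The sketch below shows the standard uniform variant; the function name, the value of `epsilon`, and the choice of four classes are assumptions for illustration and are not taken from the paper.

```python
def smooth_labels(class_index, num_classes, epsilon=0.1):
    """Return a smoothed target distribution for one training example.

    Every class receives epsilon / num_classes, and the annotated class
    additionally receives the remaining 1 - epsilon of the probability mass.
    """
    off_value = epsilon / num_classes
    target = [off_value] * num_classes
    target[class_index] += 1.0 - epsilon
    return target

# Example with four classes and the annotated class at index 2.
smoothed = smooth_labels(class_index=2, num_classes=4, epsilon=0.1)
```

Training against `smoothed` instead of a one-hot vector penalizes over-confident predictions, which is the effect the abstract refers to when it speaks of reducing the classifier's confidence on noisy labels.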
Karthik Natarajan, Arjun Kodagehalli Ramachandra, Colin Tan
A collection of $n$ random events is said to be $(n - 1)$-wise independent if
any $n - 1$ events among them are mutually independent. We characterise all
probability measures with respect to which $n$ random events are $(n - 1)$-wise
independent. We provide sharp upper and lower bounds on the probability that at
least $k$ out of $n$ events with given marginal probabilities occur over these
probability measures. The bounds are shown to be computable in polynomial time.
Authors' comments: 18 pages, 2 tables
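A small worked example may help with the definition in the abstract above. For $n = 3$, $(n - 1)$-wise independence is pairwise independence, and the classic two-coin construction below gives three events that are pairwise independent but not mutually independent. The event names and helper functions are chosen for this illustration only.

```python
from fractions import Fraction
from itertools import combinations, product

# Sample space: two fair coin flips, each outcome has probability 1/4.
outcomes = list(product([0, 1], repeat=2))
prob = Fraction(1, len(outcomes))

events = {
    "A": {w for w in outcomes if w[0] == 1},     # first flip is heads
    "B": {w for w in outcomes if w[1] == 1},     # second flip is heads
    "C": {w for w in outcomes if w[0] != w[1]},  # exactly one head
}

def p(event):
    """Probability of an event under the uniform measure."""
    return prob * len(event)

def mutually_independent(names):
    """Check the product rule for every sub-collection of size >= 2."""
    for size in range(2, len(names) + 1):
        for subset in combinations(names, size):
            joint = set(outcomes)
            product_of_marginals = Fraction(1)
            for name in subset:
                joint &= events[name]
                product_of_marginals *= p(events[name])
            if p(joint) != product_of_marginals:
                return False
    return True

names = sorted(events)
# (n - 1)-wise independence for n = 3: every pair is independent.
pairwise = all(mutually_independent(list(pair)) for pair in combinations(names, 2))
# Mutual independence of all three events fails: P(A and B and C) = 0, not 1/8.
full = mutually_independent(names)
```

Exact rational arithmetic via `Fraction` avoids floating-point comparisons when checking the product rule.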
Yongzhi Su, Yan Di, Fabian Manhardt, Guangyao Zhai, Jason Rambach, Benjamin Busam, Didier Stricker, Federico Tombari
Despite monocular 3D object detection having recently made a significant leap forward thanks to the use of pre-trained depth estimators for pseudo-LiDAR recovery, such two-stage methods typically suffer from overfitting and are incapable of explicitly encapsulating the geometric relation between depth and object bounding box. To overcome this limitation, we instead propose OPA-3D, a single-stage, end-to-end, Occlusion-Aware Pixel-Wise Aggregation network that jointly estimates dense scene depth with depth-bounding box residuals and object bounding boxes, allowing a two-stream detection of 3D objects and leading to significantly more robust detections. The first stream, denoted as the Geometry Stream, combines visible depth and depth-bounding box residuals to recover the object bounding box via explicit occlusion-aware optimization. In addition, a bounding-box-based geometry projection scheme is employed in an effort to enhance distance perception. The second stream, named the Context Stream, directly regresses 3D object location and size. This novel two-stream representation further enables us to enforce cross-stream consistency terms which align the outputs of both streams, improving the overall performance. Extensive experiments on the public benchmark demonstrate that OPA-3D outperforms state-of-the-art methods on the main Car category, whilst keeping a real-time inference speed. We plan to release all code and trained models soon.
Zihao Tang, Xinyi Wang, Lihaowen Zhu, Mariano Cabezas, Dongnan Liu, Michael Barnett, Weidong Cai, Chengyu Wang
Diffusion Weighted Imaging (DWI) is an advanced imaging technique commonly
used in neuroscience and neurological clinical research through a Diffusion
Tensor Imaging (DTI) model. Volumetric scalar metrics including fractional
anisotropy, mean diffusivity, and axial diffusivity can be derived from the DTI
model to summarise water diffusivity and other quantitative microstructural
information for clinical studies. However, clinical practice constraints can
lead to sub-optimal DWI acquisitions with missing slices (either due to a
limited field of view or the acquisition of disrupted slices). To avoid
discarding valuable subjects for group-wise studies, we propose a novel 3D
Tensor-Wise Brain-Aware Gate network (TW-BAG) for inpainting disrupted DTIs.
The proposed method is tailored to the problem with a dynamic gate mechanism
and independent tensor-wise decoders. We evaluated the proposed method on the
publicly available Human Connectome Project (HCP) dataset using common image
similarity metrics derived from the predicted tensors and scalar DTI metrics.
Our experimental results show that the proposed approach can reconstruct the
original brain DTI volume and recover relevant clinical imaging information.
Authors' comments: Accepted by The 2022 International Conference on Digital Image
Computing: Techniques and Applications (DICTA 2022)
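The scalar metrics named in the abstract above are standard functions of the three eigenvalues of a diffusion tensor. The sketch below uses the textbook definitions; the function names and example eigenvalues are illustrative and not taken from the paper's code.

```python
import math

def mean_diffusivity(eigenvalues):
    """Average of the three tensor eigenvalues."""
    return sum(eigenvalues) / 3.0

def axial_diffusivity(eigenvalues):
    """Largest eigenvalue, i.e. diffusivity along the principal direction."""
    return max(eigenvalues)

def fractional_anisotropy(eigenvalues):
    """FA = sqrt(3/2 * sum((l_i - MD)^2) / sum(l_i^2)), in the range [0, 1]."""
    md = mean_diffusivity(eigenvalues)
    numerator = sum((lam - md) ** 2 for lam in eigenvalues)
    denominator = sum(lam ** 2 for lam in eigenvalues)
    if denominator == 0.0:
        return 0.0  # convention for an all-zero tensor
    return math.sqrt(1.5 * numerator / denominator)

# Isotropic diffusion gives FA = 0; diffusion along a single axis gives FA = 1.
fa_isotropic = fractional_anisotropy((1.0, 1.0, 1.0))
fa_linear = fractional_anisotropy((1.0, 0.0, 0.0))
```

Evaluating such metrics on predicted and reference tensors is one way to compare an inpainted DTI volume with the original, in the spirit of the evaluation the abstract mentions.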
Ruoyang Liu, Chenhan Wei, Yixiong Yang, Wenxun Wang, Huazhong Yang, Yongpan Liu
Data quantization is an effective method to accelerate neural network
training and reduce power consumption. However, it is challenging to perform
low-bit quantized training: the conventional equal-precision quantization will
lead to either high accuracy loss or limited bit-width reduction, while
existing mixed-precision methods offer high compression potential but fail to
perform accurate and efficient bit-width assignment. In this work, we propose
DYNASTY, a block-wise dynamic-precision neural network training framework.
DYNASTY provides accurate data sensitivity information through fast online
analytics, and maintains stable training convergence with an adaptive bit-width
map generator. Network training experiments on the CIFAR-100 and ImageNet datasets
are carried out, and compared to an 8-bit quantization baseline, DYNASTY brings up
to $5.1\times$ speedup and $4.7\times$ energy consumption reduction with no
accuracy drop and negligible hardware overhead.
Authors' comments: 7 pages, to be published in 28th Asia and South Pacific Design
Automation Conference (ASP-DAC 2023)
Mi Qian, Yao Ge, Miaowen Wen, Fei Ji
As a promising technique for high-mobility wireless communications, orthogonal time frequency space (OTFS) has been shown to offer clear advantages over traditional orthogonal frequency division multiplexing (OFDM). However, a challenging problem is to design efficient systems to further improve the performance. In this paper, we propose a novel block-wise index modulation (IM) scheme for OTFS systems, named Doppler-IM with OTFS (DoIM-OTFS), where a block of Doppler resource bins is activated simultaneously. For practical implementation, we develop a low-complexity customized message passing (CMP) algorithm for our proposed DoIM-OTFS scheme. Simulation results demonstrate that our proposed DoIM-OTFS system outperforms the traditional OTFS system without IM. The proposed CMP algorithm achieves the desired performance and is robust to imperfect channel state information (CSI).
Vardhan Dongre, Abhinav Thimma Reddy, Nikhitha Reddeddy
DeepFake Audio, unlike DeepFake images and videos, has been relatively less
explored from a detection perspective, and the existing solutions for
synthetic speech classification either use complex networks or do not generalize
to different varieties of synthetic speech obtained using different generative
and optimization-based methods. Through this work, we propose a channel-wise
recalibration of features using attention feature fusion for synthetic speech
detection and compare its performance against different detection methods
including End2End models and Resnet-based models on synthetic speech generated
using Text to Speech and Vocoder systems like WaveNet, WaveRNN, Tacotron, and
WaveGlow. We also experiment with Squeeze Excitation (SE) blocks in our Resnet
models and find that the combination achieves better performance. In
addition to the analysis, we also demonstrate that the combination of Linear
frequency cepstral coefficients (LFCC) and Mel Frequency cepstral coefficients
(MFCC) using the attentional feature fusion technique creates better input
feature representations which can help even simpler models generalize well on
synthetic speech classification tasks. Our models (Resnet based using feature
fusion) were trained on the Fake or Real (FoR) dataset and achieved 95% test
accuracy on the FoR data, and an average of 90% accuracy on samples we
generated using different generative models after adapting this framework.
Authors' comments: 7 pages, 8 figures, 4 tables
Pouria Mehrabi, Hamid D. Taghirad
In both terrestrial and extraterrestrial environments, a precise and informative model of the ground and the surface ahead is crucial for navigation and obstacle avoidance. The ground surface is not always flat; it may be sloped, bumpy and rough, especially in off-road terrestrial scenes. In bumpy and rough scenes the functional relationship of the surface-related features may vary in different areas of the ground, as the structure of the ground surface may change suddenly and the measured point cloud of the ground is not smooth. Thus, the ground-related features must be obtained based on local estimates or even point estimates. To tackle this problem, a segment-wise GP-based ground segmentation method with local smoothness estimation is proposed. This method is an extension of our previous method, in which a realistic measurement of the length-scale values was provided for the covariance kernel in each line-segment to give a precise estimation of the ground for sloped terrains. In this extension, the value of the length-scale is estimated locally for each data point, which makes it much more precise for rough scenes while remaining computationally simple and more robust to under-segmentation, sparsity and under-represent-ability. The segment-wise task is performed to estimate a partial continuous model of the ground for each radial range segment. Simulation results show the effectiveness of the proposed method in giving a continuous and precise estimation of the ground surface in rough and bumpy scenes while being fast enough for real-world applications.
Chenghao Yang, Xuezhe Ma
Fine-tuning over large pretrained language models (PLMs) has established many
state-of-the-art results. Despite its superior performance, such fine-tuning
can be unstable, resulting in significant variance in performance and potential
risks for practical applications. Previous works have attributed such
instability to the catastrophic forgetting problem in the top layers of PLMs,
which indicates that iteratively fine-tuning layers in a top-down manner is a
promising solution. In this paper, we first point out that this method does not
always work out due to the different convergence speeds of different
layers/modules. Inspired by this observation, we propose a simple
component-wise gradient norm clipping method to adjust the convergence speed
for different components. Experimental results demonstrate that our method
achieves consistent improvements in terms of generalization performance,
convergence speed, and training stability. The codebase can be found at
https://github.com/yangalan123/FineTuningStability.
Authors' comments: EMNLP 2022 Camera Ready
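The idea of clipping gradient norms per component, rather than once over the whole model, can be sketched in a few lines. This is a generic illustration and not the authors' implementation (their code is in the repository linked above); the function name, the component names, and the per-component limits are all hypothetical.

```python
import math

def clip_component_gradients(gradients, max_norms):
    """Rescale each component's gradient so its L2 norm is at most its own limit.

    `gradients` maps a component name to a flat list of gradient values and
    `max_norms` maps the same names to that component's norm limit.
    """
    clipped = {}
    for name, grad in gradients.items():
        norm = math.sqrt(sum(g * g for g in grad))
        limit = max_norms[name]
        # Components already within their limit are left unchanged.
        scale = min(1.0, limit / norm) if norm > 0.0 else 1.0
        clipped[name] = [g * scale for g in grad]
    return clipped

# Hypothetical components with very different gradient magnitudes.
grads = {"attention": [3.0, 4.0], "feed_forward": [0.3, 0.4]}
limits = {"attention": 1.0, "feed_forward": 1.0}
clipped = clip_component_gradients(grads, limits)
```

With a single global clip, the large "attention" gradient would dominate the shared norm and shrink the small "feed_forward" gradient as well; clipping per component leaves the latter untouched, which is one way to moderate differing convergence speeds across components.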