Fangyuan Wang, Bo Xu, Bo Xu
Currently, chunk-wise schemes are often used to make Automatic Speech
Recognition (ASR) models support streaming deployment. However, existing
approaches are unable to capture the global context, lack support for parallel
training, or exhibit quadratic complexity for the computation of multi-head
self-attention (MHSA). On the other hand, causal convolution, which uses no
future context, has become the de facto module in streaming Conformer. In this
paper, we propose SSCFormer to push the limit of chunk-wise Conformer for
streaming ASR using the following two techniques: 1) A novel cross-chunks
context generation method, named Sequential Sampling Chunk (SSC) scheme, to
re-partition chunks from regular partitioned chunks to facilitate efficient
long-term contextual interaction within local chunks. 2) The Chunked Causal
Convolution (C2Conv) is designed to concurrently capture the left context and
chunk-wise future context. Evaluations on AISHELL-1 show that an End-to-End
(E2E) CER of 5.33% can be achieved, which even outperforms a strong
time-restricted baseline, U2. Moreover, the chunk-wise MHSA computation in our
model enables it to train with a large batch size and perform inference with
linear complexity.
Authors' comments: This manuscript has been accepted by SPL
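The complexity claim in this abstract can be illustrated by counting attention score pairs. The following is a minimal sketch, not the SSCFormer implementation: the function names and the chunk size of 16 are hypothetical, and it assumes regular, non-overlapping chunks in which each frame attends only to frames of its own chunk.

```python
def full_attention_pairs(num_frames: int) -> int:
    """Score pairs when every frame attends to every frame (quadratic)."""
    return num_frames * num_frames


def chunkwise_attention_pairs(num_frames: int, chunk_size: int) -> int:
    """Score pairs when each frame attends only within its own chunk.

    Assumes regular, non-overlapping chunks; the last chunk may be shorter.
    """
    full_chunks, remainder = divmod(num_frames, chunk_size)
    return full_chunks * chunk_size**2 + remainder**2


if __name__ == "__main__":
    # Hypothetical chunk size; doubling the frames doubles the chunk-wise
    # cost but quadruples the full-attention cost.
    for n in (1_000, 2_000, 4_000):
        print(n, full_attention_pairs(n), chunkwise_attention_pairs(n, chunk_size=16))
```

For a fixed chunk size the chunk-wise count grows in proportion to the number of frames, which is the sense in which chunk-wise attention is linear in sequence length.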
Chang Zhang, Guo-Yin Zhang, Jin-Zeng Li, Jing-Hua Yuan
Massive young stellar objects (MYSOs) play a crucial role in star formation.
Given that MYSOs were previously identified based on extended structure and
that observational data for them are limited, screening Wide-field Infrared
Survey Explorer (WISE) objects that show green features (so called because the
4.6 $\mu$m band is commonly coded as the green channel in three-color composite
WISE images) will yield more MYSO candidates. Using WISE images of the whole
Galactic Plane ($0^\circ<l<360^\circ$ and $\mid b \mid <2^\circ$), we identified
sources with strong emission in the 4.6 $\mu$m band and then divided them into
three groups according to their morphological features. We present a catalog of
2135 WISE Green Objects
(WGOs). Of these, 264 WGOs have an extended structure, 1366 WGOs show a compact
green feature but no extended structure, and 505 WGOs have neither an extended
structure nor a green feature, but their intensity at 4.6 $\mu$m is numerically
at least 4.5 times that at 3.4 $\mu$m. According to the analysis of the
coordinates of WGOs, we find WGOs are mainly distributed in $\mid l \mid<
60^\circ$, coincident with the position of the giant molecular clouds in $\mid
l \mid> 60^\circ$. Matching results with various masers show that those three
groups of WGOs are at different evolutionary stages. After cross-matching WGOs
with published YSO survey catalogs, we infer that $\sim$50% of WGOs are newly
discovered YSOs. In addition, 1260 WGOs are associated with Hi-GAL sources;
based on physical parameters estimated by spectral energy distribution
fitting, 231 of these are classified as robust MYSOs and 172 as candidate
MYSOs.
Authors' comments: 25 pages, 9 figures, accepted for publication in ApJS
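The three-way grouping described in this abstract can be sketched as a small classifier. This is only an illustration under stated assumptions: the group labels, the function name, the order of the checks, and the boolean morphology flags (which would come from inspecting the images) are hypothetical; only the 4.5 ratio threshold and the group counts come from the abstract.

```python
def classify_wgo(has_extended_structure: bool,
                 has_compact_green_feature: bool,
                 intensity_4p6: float,
                 intensity_3p4: float,
                 min_ratio: float = 4.5) -> str:
    """Assign a source to one of the three WGO groups (labels are hypothetical)."""
    if has_extended_structure:
        return "extended"
    if has_compact_green_feature:
        return "compact_green"
    # Third group: no morphological signature, selected by the band ratio alone.
    if intensity_3p4 > 0 and intensity_4p6 >= min_ratio * intensity_3p4:
        return "ratio_selected"
    return "not_selected"


# Group sizes quoted in the abstract; they add up to the catalog size.
GROUP_COUNTS = {"extended": 264, "compact_green": 1366, "ratio_selected": 505}
assert sum(GROUP_COUNTS.values()) == 2135
```

For example, `classify_wgo(False, False, 9.0, 2.0)` falls into the ratio-selected group because 9.0 is exactly 4.5 times 2.0.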
E. J. Marchesini, V. Reynaldi, F. Vieyro, J. Saponara, I. Andruchow, I. E. López, P. Benaglia, S. A. Cellone et al.
Context. The gamma-ray emitting source WISE J141046.00+740511.2 has been
associated with a Fermi-LAT detection by crossmatching with Swift/XRT data. It
has shown all the canonical observational characteristics of a BL Lac source,
including a power-law, featureless optical spectrum. However, it was only
recently detected at radio frequencies and its radio flux is significantly low.
Aims. Given that a radio detection is fundamental to associate lower-energy
counterparts to Fermi-LAT sources, we aim to unambiguously classify this source
by performing a multiwavelength analysis based on contemporaneous data.
Methods. By using multifrequency observations at the Jansky Very Large Array,
Giant Metrewave Radio Telescope, Gran Telescopio Canarias, Gemini, William
Herschel Telescope and Liverpool observatories, together with Fermi-LAT and
Swift data, we carried out two kinds of analyses. On the one hand, we studied
several known parameters that characterize radio loudness or weakness
and their application to blazars in general and to our source in particular.
On the other hand, we built and analyzed the
observed spectral energy distribution (SED) of this source to try to explain
its peculiar characteristics. Results. The multiwavelength analysis indicates
that WISE J141046.00+740511.2 is a blazar of the high-frequency peaked (HBL)
type that emits highly polarized light and that is likely located at a low
redshift. In addition, the one-zone model parameters that best fit its SED are
those of an extreme HBL (EHBL); this blazar type has been extensively predicted
in theory to be lacking in the radio emission that is otherwise typical of
canonical gamma-ray blazars. Conclusions. We confirm that WISE
J141046.00+740511.2 is indeed a highly polarized BL Lac of the HBL type.
Further studies will be conducted to explain the atypically low radio flux
detected for this source.
Authors' comments: accepted for publication in A&A, in press
Babhrubahan Bose
We study the relationship between the point-wise symmetry of Birkhoff-James orthogonality and the geometry of the space of operators $\mathbb{B}(\ell_\infty^n,\ell_1^m)$. We show that any non-zero left-symmetric point in this space is a smooth point. We also show that for $n\geq4$, any unit norm right-symmetric point of this space is an extreme point of the closed unit ball. This marks the first step towards characterizing the extreme points of these unit balls and finding the Grothendieck constants $G(m,n)$ using Birkhoff-James orthogonality techniques.
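For readers unfamiliar with the terminology, the standard definitions behind this abstract can be written out as follows. This is background stated in commonly used notation, not notation taken from the paper; the symbol $\perp_B$ and the scalar $\lambda$ are assumptions of this sketch.

```latex
% Birkhoff-James orthogonality in a normed space X:
x \perp_B y \iff \|x + \lambda y\| \ge \|x\| \quad \text{for all scalars } \lambda .

% x is a left-symmetric point if
x \perp_B y \implies y \perp_B x \quad \text{for all } y \in X ,

% and x is a right-symmetric point if
y \perp_B x \implies x \perp_B y \quad \text{for all } y \in X .
```

The relation is not symmetric in general, which is why the left-symmetric and right-symmetric points are singled out and studied separately.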
Yucong Lin, Jinhua Su, Yuhang Li, Yuhao Wei, Hanchao Yan, Saining Zhang, Jiaan Luo, Danni Ai et al.
Deep learning methods have contributed substantially to the rapid advancement of medical image segmentation, the quality of which relies on the suitable design of loss functions. Popular loss functions, including the cross-entropy and Dice losses, often fall short in boundary detection, thereby limiting high-resolution downstream applications such as automated diagnoses and procedures. We developed a novel loss function that is tailored to reflect boundary information and thereby enhance boundary detection. As the contrast between segmentation and background regions along the classification boundary naturally induces heterogeneity over the pixels, we propose the piece-wise two-sample t-test augmented (PTA) loss that is infused with the statistical test for such heterogeneity. We demonstrate the improved boundary detection power of the PTA loss compared to benchmark losses without a t-test component.
Ruoyi Zhang, Haibo Yuan
We have obtained accurate dust reddening from far-ultraviolet (UV) to the
mid-infrared (IR) for up to 5 million stars by the star-pair algorithm based on
LAMOST stellar parameters along with GALEX, Pan-STARRS 1, Gaia, SDSS, 2MASS,
and WISE photometric data. The typical errors are between 0.01 and 0.03 mag for
most colors. We derived the empirical reddening coefficients for 21 colors both
in the traditional (single valued) way and as a function of Teff and E(B-V) by
using the largest samples of accurate reddening measurements, together with the
extinction values from Schlegel et al. The corresponding extinction
coefficients have also been obtained. The results are compared with model
predictions and are generally in good agreement. Comparisons with measurements in
the literature show that the Teff- and E(B-V)-dependent coefficients explain
the discrepancies between different measurements naturally, i.e., using sample
stars of different temperatures and reddening. Our coefficients are mostly
valid in the extinction range of 0-0.5 mag and the temperature range of
4000-10000 K. We recommend that the new Teff- and E(B-V)-dependent reddening and
extinction coefficients be used in the future. A Python package is also
provided for using the coefficients.
Authors' comments: 21 pages, 9 figures, 4 tables (including appendix). Published in the
Astrophysical Journal Supplement Series
Yiming Li, Zhifang Guo, Zhirong Ye, Xiangdong Wang, Hong Liu, Yueliang Qian, Rui Tao, Long Yan et al.
In this paper, we describe in detail our system for DCASE 2022 Task 4. The
system combines two considerably different models: an end-to-end Sound Event
Detection Transformer (SEDT) and a frame-wise model, Metric Learning and Focal
Loss CNN (MLFL-CNN). The former is an event-wise model which learns event-level
representations and predicts sound event categories and boundaries directly,
while the latter is based on the widely adopted frame-classification scheme,
under which each frame is classified into event categories and event boundaries
are obtained by post-processing such as thresholding and smoothing. For SEDT,
self-supervised pre-training using unlabeled data is applied, and
semi-supervised learning is adopted by using an online teacher, which is
updated from the student model using the Exponential Moving Average (EMA)
strategy and generates reliable pseudo labels for weakly-labeled and unlabeled
data. For the frame-wise model, the ICT-TOSHIBA system of DCASE 2021 Task 4 is
used. Experimental results show that the hybrid system considerably outperforms
either individual model and achieves psds1 of 0.420 and psds2 of 0.783 on the
validation set without external data. The code is available at
https://github.com/965694547/Hybrid-system-of-frame-wise-model-and-SEDT.
Authors' comments: 5 pages, 2 figures, accepted for publication in DCASE2022 Workshop
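The Exponential Moving Average (EMA) teacher update mentioned in this abstract can be sketched in a few lines. This is a generic mean-teacher style update, not the authors' code: the function name, the flat dictionary of scalar parameters, and the decay value of 0.999 are assumptions for illustration.

```python
def ema_update(teacher: dict, student: dict, decay: float = 0.999) -> dict:
    """Return new teacher parameters as an EMA of the student parameters.

    Each parameter moves a fraction (1 - decay) of the way toward the student.
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


if __name__ == "__main__":
    teacher = {"w": 1.0, "b": 0.0}
    student = {"w": 0.0, "b": 2.0}
    for _ in range(3):
        teacher = ema_update(teacher, student, decay=0.9)
    print(teacher)
```

Because the teacher changes slowly, its predictions are smoother over training steps than the student's, which is the usual motivation for using it to generate pseudo labels.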
Haibo Yang, Shengjie Zhang, Xiaoyang Han, Botao Zhao, Yan Ren, Yaru Sheng, Xiao-Yong Zhang
Small lesions in magnetic resonance imaging (MRI) images are crucial for
clinical diagnosis of many kinds of diseases. However, MRI quality can be
easily degraded by various types of noise, which can greatly affect the accuracy
of diagnosis of small lesions. Although some methods for denoising MR images have
been proposed, task-specific denoising methods for improving the diagnosis
confidence of small lesions are lacking. In this work, we propose a voxel-wise
hybrid residual MLP-CNN model to denoise three-dimensional (3D) MR images with
small lesions. We combine two basic deep learning architectures, MLP and CNN, to
obtain an appropriate inherent bias for image denoising and integrate each of
the output layers in the MLP and CNN by adding residual connections to leverage
long-range information. We evaluate the proposed method on 720 T2-FLAIR brain
images with small lesions at different noise levels. The results show the
superiority of our method in both quantitative and visual evaluations on
the testing dataset compared to state-of-the-art methods. Moreover, two experienced
radiologists agreed that at moderate and high noise levels, our method
outperforms other methods in terms of recovery of small lesions and overall
image denoising quality. The implementation of our method is available at
https://github.com/laowangbobo/Residual_MLP_CNN_Mixer.
Authors' comments: accepted by MICCAI 2022
Se-In Jang, Tinsu Pan, Ye Li, Pedram Heidari, Junyu Chen, Quanzheng Li, Kuang Gong
Positron emission tomography (PET) is widely used in clinics and research due
to its quantitative merits and high sensitivity, but suffers from low
signal-to-noise ratio (SNR). Recently, convolutional neural networks (CNNs) have
been widely used to improve PET image quality. Though successful and efficient
in local feature extraction, CNN cannot capture long-range dependencies well
due to its limited receptive field. Global multi-head self-attention (MSA) is a
popular approach to capture long-range information. However, the calculation of
global MSA for 3D images has high computational costs. In this work, we
propose an efficient spatial and channel-wise encoder-decoder transformer,
Spach Transformer, that can leverage spatial and channel information based on
local and global MSAs. Experiments based on datasets of different PET tracers,
i.e., $^{18}$F-FDG, $^{18}$F-ACBC, $^{18}$F-DCFPyL, and $^{68}$Ga-DOTATATE,
were conducted to evaluate the proposed framework. Quantitative results show
that the proposed Spach Transformer framework outperforms state-of-the-art deep
learning architectures. Our code is available at
https://github.com/sijang/SpachTransformer
Authors' comments: 15 pages
S. Rahmani, C. W. Xiao
The branching ratios of semileptonic decays of the charm mesons, such as $D^0 \to K^- l^+ \nu$, $D^0 \to \pi^- l^+ \nu$, $D_s \to K^0 l^+ \nu$ and $D_s \to \eta l^+ \nu$, are analyzed using three different models for the Isgur-Wise functions, and the form factors of these decays are discussed. The mass spectra of the charm mesons are obtained. We use a potential quark model and consider the non-relativistic Hamiltonian of the charm meson as a bound state of the quark-antiquark system. We take into account the harmonic-type confinement and also the Hellmann potential, which is a superposition of the Coulomb and the Yukawa potentials. Using the variational approach along with the harmonic oscillator wave functions, we evaluate the mass spectra of the charm mesons, the form factors and the semileptonic decay widths of $D_{(s)}$. We present our results for masses of $D, D_s$ and $\eta$, the Isgur-Wise functions, the form factors of the semileptonic decays, and the branching fractions of the semileptonic decays of $D$ and $D_s$. Our results are motivating.
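As a reminder of what "a superposition of the Coulomb and the Yukawa potential" means, the Hellmann potential is commonly written in the form below. The parameter names $a$, $b$ and $c$ are assumptions of this sketch; the abstract does not give the parametrization or the values used, and the harmonic-type confinement term is not shown.

```latex
V_{\mathrm{H}}(r) \;=\; \underbrace{-\frac{a}{r}}_{\text{Coulomb}}
\;+\; \underbrace{\frac{b\, e^{-c r}}{r}}_{\text{Yukawa}}
```

Here $a$ and $b$ set the strengths of the two terms and $c$ is the screening parameter that controls the range of the Yukawa part.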
Toshihiro Kasuga, Joseph R. Masiero
We present space-based thermal infrared observations of the presumably
Geminid-associated asteroids (3200) Phaethon, 2005 UD and 1999 YC using
WISE/NEOWISE. The images were taken in the four wavelength bands
3.4 $\mu$m (W1), 4.6 $\mu$m (W2), 12 $\mu$m (W3), and 22 $\mu$m (W4). We find no
evidence of lasting mass-loss in the asteroids over the decadal multi-epoch
datasets. We set an upper limit to the mass-loss rate in dust of
$Q<2$ kg s$^{-1}$ for Phaethon and $<0.1$ kg s$^{-1}$ for both 2005 UD and
1999 YC, with little dependence on the observed heliocentric distances of
$R=1.0-2.3$ au. For Phaethon, even if the maximum mass loss were sustained over
the 1000(s) yr dynamical age of the Geminid stream, it would be more than two
orders of magnitude too small to supply the reported stream mass
($10^{13-14}$ kg). The Phaethon-associated dust trail (Geminid stream) is not
detected at $R=2.3$ au, corresponding to an upper limit on the optical depth of
$\tau<7\times10^{-9}$. Additionally, no co-moving asteroids with radii
$r<650$ m were found. The DESTINY+ dust analyzer would be capable of detecting
several of the 10 $\mu$m-sized interplanetary dust particles when at far
distances (>50,000 km) from Phaethon. For 2005 UD, if the mass-loss rate lasted
over the 10,000 yr dynamical age of the Daytime Sextantid meteoroid stream, the
mass of the stream would be ~$10^{10}$ kg. The 1999 YC images showed neither the
related dust trail ($\tau<2\times10^{-8}$) nor co-moving objects with
radii $r<170$ m at $R=1.6$ au. Estimated physical parameters from these limits do not
explain the production mechanism of the Geminid meteoroid stream. Lastly, to
explore the origin of the Geminids, we discuss the implications for our data in
relation to the possibly sodium (Na)-driven perihelion activity of Phaethon.
Authors' comments: Accepted for publication in The Astronomical Journal, 8 tables, 7
figures
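The order-of-magnitude argument in this abstract can be checked with a short calculation. This is a back-of-the-envelope sketch, not the authors' analysis: it assumes a constant mass-loss rate at the quoted upper limits, takes exactly 1000 yr for the Geminid stream age and the lower bound of the reported stream mass, and uses a Julian year; the names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, in seconds


def total_mass_kg(rate_kg_per_s: float, duration_yr: float) -> float:
    """Mass released at a constant rate over the given number of years."""
    return rate_kg_per_s * duration_yr * SECONDS_PER_YEAR


# Phaethon: upper limit of 2 kg/s sustained over 1000 yr (about 6.3e10 kg).
phaethon_max_kg = total_mass_kg(2.0, 1_000)

# 2005 UD: upper limit of 0.1 kg/s sustained over 10,000 yr (about 3.2e10 kg).
ud_2005_max_kg = total_mass_kg(0.1, 10_000)

# Lower bound of the reported Geminid stream mass range.
geminid_stream_min_kg = 1e13

# Factor by which the Phaethon estimate falls short of the stream mass.
shortfall = geminid_stream_min_kg / phaethon_max_kg

if __name__ == "__main__":
    print(f"{phaethon_max_kg:.2e} {ud_2005_max_kg:.2e} {shortfall:.0f}")
```

Under these assumptions the Phaethon estimate falls short of the lower stream-mass bound by a factor above 100, and the 2005 UD estimate is of order $10^{10}$ kg, consistent with the figures quoted in the abstract.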
Eeshan Bhaduri, Shagufta Pal, Arkopal Kishore Goswami
The study examines heterogeneity in travel behaviour among ride-hailing services (RHS) users by including attitudes, in order to reinforce conventional user-segmentation approaches. Simultaneously, prioritization of ride-hailing specific attributes was carried out to assess how RHS will operate in a sustainable way.
Katsuhisa Ouchi, Hiroyuki Masuyama
This paper considers the level-increment (LI) truncation approximation of
M/G/1-type Markov chains. The LI truncation approximation is useful for
implementing the M/G/1 paradigm, which is the framework for computing the
stationary distribution of M/G/1-type Markov chains. The main result of this
paper is a subgeometric convergence formula for the total variation distance
between the original stationary distribution and its LI truncation
approximation. Suppose that the equilibrium level-increment distribution is
subexponential, and that the downward transition matrix is rank one. We then
show that the convergence rate of the total variation error of the LI
truncation approximation is equal to that of the tail of the equilibrium
level-increment distribution and that of the tail of the original stationary
distribution.
Authors' comments: 20 pages. This is a revised version of the paper to appear in JORSJ,
Vol. 65, No. 4, 2022
Fabian Duffhauss, Tobias Demmler, Gerhard Neumann
Estimating 6D poses of objects is an essential computer vision task. However,
most conventional approaches rely on camera data from a single perspective and
therefore suffer from occlusions. We overcome this issue with our novel
multi-view 6D pose estimation method called MV6D which accurately predicts the
6D poses of all objects in a cluttered scene based on RGB-D images from
multiple perspectives. We base our approach on the PVN3D network that uses a
single RGB-D image to predict keypoints of the target objects. We extend this
approach by using a combined point cloud from multiple views and fusing the
images from each view with a DenseFusion layer. In contrast to current
multi-view pose detection networks such as CosyPose, our MV6D can learn the
fusion of multiple perspectives in an end-to-end manner and does not require
multiple prediction stages or subsequent fine tuning of the prediction.
Furthermore, we present three novel photorealistic datasets of cluttered scenes
with heavy occlusions. All of them contain RGB-D images from multiple
perspectives and the ground truth for instance semantic segmentation and 6D
pose estimation. MV6D significantly outperforms the state-of-the-art in
multi-view 6D pose estimation even in cases where the camera poses are known
inaccurately. Furthermore, we show that our approach is robust towards dynamic
camera setups and that its accuracy increases incrementally with an increasing
number of perspectives.
Authors' comments: Accepted at IROS 2022
Naoki Makishima, Satoshi Suzuki, Atsushi Ando, Ryo Masumura
In this paper, we investigate the semi-supervised joint training of text to
speech (TTS) and automatic speech recognition (ASR), where a small amount of
paired data and a large amount of unpaired text data are available.
Conventional studies form a cycle called the TTS-ASR pipeline, where the
multispeaker TTS model synthesizes speech from text with a reference speech and
the ASR model reconstructs the text from the synthesized speech, after which
both models are trained with a cycle-consistency loss. However, the synthesized
speech does not reflect the speaker characteristics of the reference speech and
the synthesized speech becomes overly easy for the ASR model to recognize after
training. This not only decreases the TTS model quality but also limits the ASR
model improvement. To solve this problem, we propose improving the
cycle-consistency-based training with a speaker consistency loss and step-wise
optimization. The speaker consistency loss brings the speaker characteristics
of the synthesized speech closer to those of the reference speech. In the
step-wise optimization, we first freeze the parameters of the TTS model before
both models are trained to avoid over-adaptation of the TTS model to the ASR
model. Experimental results demonstrate the efficacy of the proposed method.
Authors' comments: Accepted to INTERSPEECH 2022
Geonho Cha, Chaehun Shin, Sungroh Yoon, Dongyoon Wee
To estimate the volume density and color of a 3D point in multi-view image-based rendering, a common approach is to check whether a consensus exists among the given source image features, which is one of the informative cues for the estimation procedure. To this end, most previous methods utilize equally weighted aggregation features. However, this can make it hard to check for consensus when the source image feature set includes outliers, which frequently arise from occlusions. In this paper, we propose a novel source-view-wise feature aggregation method, which allows us to find the consensus robustly by leveraging local structures in the feature set. For the proposed aggregation, we first calculate the source-view-wise distance distribution for each source feature. After that, the distance distribution is converted to several similarity distributions with the proposed learnable similarity mapping functions. Finally, for each element in the feature set, the aggregation features are extracted by calculating the weighted means and variances, where the weights are derived from the similarity distributions. In experiments, we validate the proposed method on various benchmark datasets, including synthetic and real image scenes. The experimental results demonstrate that incorporating the proposed features improves performance by a large margin, resulting in state-of-the-art performance.
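The three steps in this abstract (per-view distances, a similarity mapping, weighted means and variances) can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the Euclidean distance, the fixed softmax-style mapping with a `temperature` parameter (standing in for the learnable similarity mapping functions), and all function names are hypothetical.

```python
import math


def similarity_weights(distances, temperature=1.0):
    """Map distances to normalized weights; smaller distance gives larger weight."""
    scores = [-d / temperature for d in distances]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_mean_and_variance(features, weights):
    """Per-channel weighted mean and variance over the source views."""
    dim = len(features[0])
    mean = [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]
    var = [sum(w * (f[k] - mean[k]) ** 2 for w, f in zip(weights, features))
           for k in range(dim)]
    return mean, var


def aggregate_per_view(features, temperature=1.0):
    """For each source view, aggregate all views weighted by similarity to it."""
    results = []
    for anchor in features:
        distances = [math.dist(anchor, other) for other in features]
        weights = similarity_weights(distances, temperature)
        results.append(weighted_mean_and_variance(features, weights))
    return results


if __name__ == "__main__":
    # Three consistent views and one outlier (e.g. an occluded view).
    views = [[1.0], [1.0], [1.0], [10.0]]
    for mean, var in aggregate_per_view(views):
        print(mean, var)
```

With equal weights the mean of the example views would be 3.25; with similarity weights the aggregate computed for each consistent view stays close to 1.0, because the outlier receives a very small weight.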
Xiaofeng Gao, Xingwei Wu, Samson Ho, Teruhisa Misu, Kumar Akash
Although partially autonomous driving (AD) systems are already available in
production vehicles, drivers are still required to maintain a sufficient level
of situational awareness (SA) during driving. Previous studies have shown that
providing information about the AD's capability using user interfaces can
improve the driver's SA. However, displaying too much information increases the
driver's workload and can distract or overwhelm the driver. Therefore, to
design an efficient user interface (UI), it is necessary to understand its
effect under different circumstances. In this paper, we focus on a UI based on
augmented reality (AR), which can highlight potential hazards on the road. To
understand the effect of highlighting on drivers' SA for objects with different
types and locations under various traffic densities, we conducted an in-person
experiment with 20 participants on a driving simulator. Our study results show
that the effects of highlighting on drivers' SA varied by traffic densities,
object locations and object types. We believe our study can provide guidance in
selecting which object to highlight for the AR-based driver-assistance
interface to optimize SA for drivers driving and monitoring partially
autonomous vehicles.
Authors' comments: 10 pages, 11 figures, IV 2022
Iasson Karafyllis, Pierdomenico Pepe, Yuan Wang, Antoine Chaillet
For time-delay systems, it is known that global asymptotic stability is guaranteed by the existence of a Lyapunov-Krasovskii functional that dissipates in a point-wise manner along solutions, namely whose dissipation rate involves only the current value of the solution's norm. So far, the extension of this result to global exponential stability (GES) holds only for systems ruled by a globally Lipschitz vector field and remains largely open for the input-to-state stability (ISS) property. In this paper, we rely on the notion of exponential ISS to extend the class of systems for which GES or ISS can be concluded from a point-wise dissipation. Our results in turn show that these properties still hold in the presence of a sufficiently small additional term involving the whole state history norm. We provide explicit estimates of the tolerable magnitude of this extra term and show through an example how it can be used to assess robustness with respect to modeling uncertainties.
Qiuyi Wu, Julie Bessac, Whitney Huang, Jiali Wang
This study develops a statistical conditional approach to evaluate climate model performance in wind speed and direction and to project their future changes under the representative concentration pathway 8.5 scenario over inland and offshore locations across the Continental United States. The proposed conditional approach extends the scope of existing studies by characterizing the changes of the full range of the joint wind speed and direction distribution. Directional wind speed distributions are estimated using two statistical methods: a Weibull distributional regression model and a quantile regression model, both of which enforce the circular constraint on their resulting estimates of directional distributions. Projected uncertainties associated with different climate models and model internal variability are investigated and compared with the climate change signal to quantify the statistical significance of the future projections. In particular, this work extends the concept of internal variability to the standard deviation and high quantiles to assess their magnitudes relative to their projected changes. The evaluation results show that the studied climate models capture historical wind speed, wind direction, and their dependencies reasonably well over both inland and offshore locations. In the future, most of the locations show no significant changes in mean wind speeds in both winter and summer, although the standard deviation and 95th quantile show some robust changes over certain locations in winter. The proposed conditional approach enables the characterization of the directional wind speed distributions, which offers additional insights for the joint assessment of speed and direction.
Christoph Wehner, Francis Powlesland, Bashar Altakrouri, Ute Schmid
Artificial Intelligence and Digital Twins play an integral role in driving innovation in the domain of intelligent driving. Long short-term memory (LSTM) is a leading driver in the field of lane change prediction for manoeuvre anticipation. However, the decision-making process of such models is complex and non-transparent, hence reducing the trustworthiness of the smart solution. This work presents an innovative approach and a technical implementation for explaining lane change predictions of layer normalized LSTMs using Layer-wise Relevance Propagation (LRP). The core implementation includes consuming live data from a digital twin on a German highway, live predictions and explanations of lane changes by extending LRP to layer normalized LSTMs, and an interface for communicating and explaining the predictions to a human user. We aim to demonstrate faithful, understandable, and adaptable explanations of lane change prediction to increase the adoption and trustworthiness of AI systems that involve humans. Our research also emphasizes that explainability and state-of-the-art performance of ML models for manoeuvre anticipation go hand in hand without negatively affecting predictive effectiveness.