Zihong Yan, Xiaoyi Wu, Zhuozhu Jian, Bin Lan, Xueqian Wang, Bin Liang
Mobile robots navigating in outdoor environments frequently encounter undesired traces left by dynamic objects, which appear as obstacles on the map and impede accurate localization and effective navigation. To tackle this problem, a novel map construction framework based on a 3D region-wise hash map structure (RH-Map) is proposed. It consists of a front-end scan fresher and back-end removal modules, and realizes real-time map construction and online dynamic object removal (DOR). First, a two-layer 3D region-wise hash map structure for map management is proposed for effective online DOR. Then, in the scan fresher, region-wise ground plane estimation (R-GPE) is adopted to estimate and preserve ground information, and Scan-to-Map Removal (S2M-R) is proposed to discriminate and remove dynamic regions. Moreover, a lightweight back-end removal module that maintains keyframes is proposed for further DOR. As experimentally verified on SemanticKITTI, the proposed framework yields promising performance on online DOR during map construction compared with state-of-the-art methods. We also validate the proposed framework in real-world environments.
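As a rough illustration of the two-layer idea, the sketch below keys coarse regions by integer indices and stores the occupied voxel indices of each region in a set, so that a whole region can be dropped in one operation. The class name, the `region_size` and `voxel_size` values, and the removal rule are all assumptions made for this example; they are not taken from RH-Map.

```python
from collections import defaultdict
from math import floor


class RegionHashMap:
    """Two-layer map: region index -> set of voxel indices (illustrative only)."""

    def __init__(self, region_size=2.0, voxel_size=0.2):
        # Both cell sizes (in metres) are hypothetical placeholders.
        self.region_size = region_size
        self.voxel_size = voxel_size
        self.regions = defaultdict(set)

    @staticmethod
    def _key(point, cell):
        # Integer grid index of a 3D point for a given cell size.
        return tuple(floor(c / cell) for c in point)

    def insert(self, point):
        region = self._key(point, self.region_size)
        voxel = self._key(point, self.voxel_size)
        self.regions[region].add(voxel)

    def remove_region(self, point):
        """Drop the region containing `point`; return how many voxels it held."""
        region = self._key(point, self.region_size)
        return len(self.regions.pop(region, set()))


demo_map = RegionHashMap()
for p in [(0.1, 0.1, 0.0), (0.5, 0.3, 0.0), (5.0, 5.0, 0.0)]:
    demo_map.insert(p)

# Pretend the region around (0.2, 0.2, 0.0) was judged dynamic and clear it.
removed = demo_map.remove_region((0.2, 0.2, 0.0))
```

Hashing on the region index keeps lookup and removal of a region at constant expected cost, which is the property an online removal step would rely on.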
Manorama Jha, Bhaskar Banerjee
We present an efficient and robust point cloud registration (PCR) workflow
for part-wise rigid point cloud alignment using the Microsoft HoloLens 2. Point
Cloud Registration (PCR) is an important problem in Augmented and Mixed Reality
use cases, and we present a study for a special class of non-rigid
transformations. Many commonly encountered objects, such as robots with
manipulators and machines with hinges, are composed of rigid parts that move
relative to one another about joints, resulting in non-rigid deformation of the
whole object. The workflow presented allows us to register the point cloud with
various configurations of the point cloud.
Authors' comments: Accepted for presentation at WiCV @ CVPR 2023
Marco Braun, Moritz Luszek, Mirko Meuter, Dominic Spata, Kevin Kollek, Anton Kummert
Current Deep Learning methods for environment segmentation and velocity
estimation rely on Convolutional Recurrent Neural Networks to exploit
spatio-temporal relationships within obtained sensor data. These approaches
derive scene dynamics implicitly by correlating novel input and memorized data
utilizing ConvNets. We show how ConvNets suffer from architectural restrictions
for this task. Based on these findings, we then provide solutions to various
issues on exploiting spatio-temporal correlations in a sequence of sensor
recordings by presenting a novel Recurrent Neural Network unit utilizing
Transformer mechanisms. Within this unit, object encodings are tracked across
consecutive frames by correlating key-query pairs derived from sensor inputs
and memory states, respectively. We then use resulting tracking patterns to
obtain scene dynamics and regress velocities. In a last step, the memory state
of the Recurrent Neural Network is projected based on extracted velocity
estimates to resolve aforementioned spatio-temporal misalignment.
Authors' comments: Preprint submitted to 2022 IEEE 25th International Conference on
Intelligent Transportation Systems (ITSC), Macau, China, 7 pages
Shivang Rawat, Stefano Martiniani
Stochasticity plays a central role in nearly every biological process, and the noise power spectral density (PSD) is a critical tool for understanding variability and information processing in living systems. In steady-state, many such processes can be described by stochastic linear time-invariant (LTI) systems driven by Gaussian white noise, whose PSD is a complex rational function of the frequency that can be concisely expressed in terms of their Jacobian, dispersion, and diffusion matrices, fully defining the statistical properties of the system's dynamics at steady-state. Here, we arrive at compact element-wise solutions of the rational function coefficients for the auto- and cross-spectrum that enable the explicit analytical computation of the PSD in dimensions n=2,3,4. We further present a recursive Leverrier-Faddeev-type algorithm for the exact computation of the rational function coefficients. Crucially, both solutions are free of matrix inverses. We illustrate our element-wise and recursive solutions by considering the stochastic dynamics of neural systems models, namely Fitzhugh-Nagumo (n=2), Hindmarsh-Rose (n=3), Wilson-Cowan (n=4), and the Stabilized Supralinear Network (n=22), as well as an evolutionary game-theoretic model with mutations (n=5, 31). We extend our approach to derive a recursive method for calculating the coefficients in the power series expansion of the integrated covariance matrix for interacting spiking neurons modeled as Hawkes processes on arbitrary directed graphs.
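The inverse-free idea can be sketched with the classical Faddeev-LeVerrier recursion, which returns the coefficients of det(sI - A) together with the matrix coefficients of adj(sI - A). The PSD convention used here, S(w) = (iwI - A)^{-1} D (iwI - A)^{-H} for a real Jacobian `A` and a noise covariance `D`, is an assumption of this example, as are the function names; the paper's element-wise formulas and its exact recursion are not reproduced.

```python
import numpy as np


def faddeev_leverrier(A):
    """Return ([1, c_1, ..., c_n], [M_1, ..., M_n]) with
    det(sI - A) = s^n + c_1 s^(n-1) + ... + c_n and
    adj(sI - A) = M_1 s^(n-1) + ... + M_n, using no matrix inverse."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    coeffs = [1.0]
    M = np.eye(n)
    adj_terms = [M.copy()]
    for k in range(1, n + 1):
        AM = A @ M
        c = -np.trace(AM) / k
        coeffs.append(c)
        if k < n:
            M = AM + c * np.eye(n)
            adj_terms.append(M.copy())
    return np.array(coeffs), adj_terms


def psd(A, D, omega):
    """PSD matrix at angular frequency omega (assumed convention, see lead-in)."""
    coeffs, adj_terms = faddeev_leverrier(A)
    s = 1j * omega
    n = len(adj_terms)
    # Numerator: adj(sI - A) evaluated at s = i*omega.
    N = sum(Mk * s ** (n - 1 - k) for k, Mk in enumerate(adj_terms))
    # Denominator: |det(sI - A)|^2.
    p = np.polyval(coeffs, s)
    return N @ D @ N.conj().T / abs(p) ** 2


A_demo = np.array([[-1.0, 2.0], [-3.0, -4.0]])
D_demo = np.eye(2)
coeffs_demo, _ = faddeev_leverrier(A_demo)
S_demo = psd(A_demo, D_demo, 0.7)
```

Because numerator and denominator are both polynomials in the frequency, the same coefficients can be reused for every frequency once the recursion has run.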
Domenico Iuso, Soumick Chatterjee, Sven Cornelissen, Dries Verhees, Jan De Beenhouwer, Jan Sijbers
Additive Manufacturing (AM) has emerged as a manufacturing process that allows the direct production of samples from digital models. To ensure that quality standards are met in all manufactured samples of a batch, X-ray computed tomography (X-CT) is often used in combination with automated anomaly detection. For the latter, deep learning (DL) anomaly detection techniques are increasingly used, as they can be trained to be robust to the material being analysed and resilient towards poor image quality. Unfortunately, most recent and popular DL models have been developed for 2D image processing, thereby disregarding valuable volumetric information. This study revisits recent supervised (UNet, UNet++, UNet 3+, MSS-UNet) and unsupervised (VAE, ceVAE, gmVAE, vqVAE) DL models for porosity analysis of AM samples from X-CT images and extends them to accept 3D input data with a 3D-patch pipeline for lower computational requirements, improved efficiency and generalisability. The supervised models were trained using the Focal Tversky loss to address the class imbalance that arises from the low porosity in the training datasets. The output of the unsupervised models is post-processed to reduce misclassifications caused by their inability to adequately represent the object surface. The findings were cross-validated in a 5-fold fashion and include: a performance benchmark of the DL models, an evaluation of the post-processing algorithm, and an evaluation of the effect of training supervised models with the output of unsupervised models. In a final performance benchmark on a test set with poor image quality, the best performing supervised model was UNet++ with an average precision of 0.751 $\pm$ 0.030, while the best unsupervised model was the post-processed ceVAE with 0.830 $\pm$ 0.003. The VAE/ceVAE models demonstrated superior capabilities, particularly when leveraging post-processing techniques.
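For readers unfamiliar with the Focal Tversky loss, a minimal NumPy sketch for a binary mask follows. The weights `alpha` and `beta`, the exponent `gamma`, and the smoothing term `eps` are placeholder values chosen for illustration, not the settings of the study; note also that some formulations write the exponent as 1/gamma.

```python
import numpy as np


def focal_tversky_loss(probs, target, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-7):
    """Focal Tversky loss for one binary class (illustrative parameters)."""
    probs = np.asarray(probs, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    tp = np.sum(probs * target)            # soft true positives
    fn = np.sum((1.0 - probs) * target)    # soft false negatives
    fp = np.sum(probs * (1.0 - target))    # soft false positives
    # alpha > beta penalises missed foreground more than false alarms,
    # which is the usual choice when the foreground class is rare.
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return (1.0 - tversky) ** gamma


mask = np.array([1.0, 0.0, 1.0, 0.0])
loss_perfect = focal_tversky_loss(mask, mask)
loss_inverted = focal_tversky_loss(1.0 - mask, mask)
```

A perfect prediction gives a loss of zero, while a fully inverted prediction gives a loss close to one.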
Di Hong, Jiangrong Shen, Yu Qi, Yueming Wang
Spiking Neural Networks (SNNs) are biologically realistic and practically promising for low-power computation because of their event-driven mechanism. Usually, training SNNs incurs an accuracy loss on various tasks, yielding inferior performance compared with ANNs. A conversion scheme has been proposed to obtain competitive accuracy by mapping trained ANNs' parameters to SNNs with the same structures. However, an enormous number of time steps are required for these converted SNNs, thus losing the energy-efficiency benefit. To combine the accuracy advantages of ANNs with the computing efficiency of SNNs, a novel SNN training framework is proposed, namely layer-wise ANN-to-SNN knowledge distillation (LaSNN). To achieve competitive accuracy and reduced inference latency, LaSNN transfers the learning from a well-trained ANN to a small SNN by distilling the knowledge rather than converting the parameters of the ANN. The information gap between the heterogeneous ANN and SNN is bridged by introducing an attention scheme; the knowledge in an ANN is effectively compressed and then efficiently transferred using our layer-wise distillation paradigm. We conduct detailed experiments to demonstrate the effectiveness, efficacy, and scalability of LaSNN on three benchmark data sets (CIFAR-10, CIFAR-100, and Tiny ImageNet). We achieve competitive top-1 accuracy compared to ANNs and 20x faster inference than converted SNNs with similar performance. More importantly, LaSNN is flexible and extensible: it can be readily applied to SNNs with different architectures/depths and input encoding methods, contributing to their potential development.
Muhammad Febrian Rachmadi, Charissa Poon, Henrik Skibbe
In this paper, we propose a novel two-component loss for biomedical image
segmentation tasks called the Instance-wise and Center-of-Instance (ICI) loss,
a loss function that addresses the instance imbalance problem commonly
encountered when using pixel-wise loss functions such as the Dice loss. The
Instance-wise component improves the detection of small instances or ``blobs"
in image datasets with both large and small instances. The Center-of-Instance
component improves the overall detection accuracy. We compared the ICI loss
with two existing losses, the Dice loss and the blob loss, in the task of
stroke lesion segmentation using the ATLAS R2.0 challenge dataset from MICCAI
2022. Compared to the other losses, the ICI loss provided a better balanced
segmentation, and significantly outperformed the Dice loss with an improvement
of $1.7-3.7\%$ and the blob loss by $0.6-5.0\%$ in terms of the Dice similarity
coefficient on both validation and test set, suggesting that the ICI loss is a
potential solution to the instance imbalance problem.
Authors' comments: conference
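The instance imbalance problem mentioned above can be seen with a plain Dice similarity coefficient: missing a small instance entirely barely changes the score when a large instance is segmented correctly. The toy masks below are invented for this illustration, and the ICI loss itself is not implemented here.

```python
import numpy as np


def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)


# Ground truth: one large instance (100 pixels) and one small one (4 pixels).
truth = np.zeros((20, 20), dtype=bool)
truth[0:10, 0:10] = True
truth[15:17, 15:17] = True

# Prediction that finds the large instance but misses the small one.
prediction = np.zeros((20, 20), dtype=bool)
prediction[0:10, 0:10] = True

dice_missing_small = dice_coefficient(prediction, truth)
```

Here the score stays at 200/204, roughly 0.98, even though half of the instances were not detected.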
Amirhossein Rasoulian, Soorena Salari, Yiming Xiao
Intracranial hemorrhage (ICH) is a life-threatening medical emergency caused by various factors. Timely and precise diagnosis of ICH is crucial for administering effective treatment and improving patient survival rates. While deep learning techniques have emerged as the leading approach for medical image analysis and processing, the most commonly employed supervised learning often requires large, high-quality annotated datasets that can be costly to obtain, particularly for pixel/voxel-wise image segmentation. To address this challenge and facilitate ICH treatment decisions, we propose a novel weakly supervised ICH segmentation method that leverages a hierarchical combination of head-wise gradient-infused self-attention maps obtained from a Swin transformer. The transformer is trained using an ICH classification task with categorical labels. To build and validate the proposed technique, we used two publicly available clinical CT datasets, namely RSNA 2019 Brain CT hemorrhage and PhysioNet. Additionally, we conducted an exploratory study comparing two learning strategies - binary classification and full ICH subtyping - to assess their impact on self-attention and our weakly supervised ICH segmentation framework. The proposed algorithm was compared against the popular U-Net with full supervision, as well as a similar weakly supervised approach using Grad-CAM for ICH segmentation. With a mean Dice score of 0.47, our technique achieved ICH segmentation performance similar to that of the U-Net and outperformed the Grad-CAM based approach, demonstrating the excellent potential of the proposed framework in challenging medical image segmentation tasks.
Anja Rappl, Thomas Kneib, Stefan Lang, Elisabeth Bergherr
Joint models for longitudinal and time-to-event data have seen many developments in recent years. However, spatial joint models are still rare, and the traditional proportional hazards formulation of the time-to-event part of the model is accompanied by computational challenges. We propose a joint model with a piece-wise exponential formulation of the hazard using the counting process representation of a hazard and structured additive predictors able to estimate (non-)linear, spatial and random effects. Its capabilities are assessed in a simulation study comparing our approach to an established one and highlighted by an example on physical functioning after cardiovascular events from the German Ageing Survey. The Structured Piecewise Additive Joint Model yielded good estimation performance, especially for spatial effects, while being twice as fast as the chosen benchmark approach and performing stably in imbalanced data settings with few events.
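To make the piece-wise exponential formulation concrete: the hazard is constant within each time interval, so the cumulative hazard is a sum of rate times exposure over the intervals and the survival probability is its negative exponential. The sketch below shows only this building block; the cut points and rates are made-up numbers, and the structured additive predictors of the joint model are not included.

```python
import numpy as np


def piecewise_exponential_survival(t, cut_points, hazards):
    """S(t) for a hazard that is constant on [0, c1), [c1, c2), ..., [cK, inf).

    `hazards` must have one more entry than `cut_points`.
    """
    edges = np.concatenate(([0.0], np.asarray(cut_points, dtype=float), [np.inf]))
    hazards = np.asarray(hazards, dtype=float)
    widths = edges[1:] - edges[:-1]
    # Time spent in each interval up to t (zero for intervals starting after t).
    exposure = np.clip(t - edges[:-1], 0.0, widths)
    return float(np.exp(-np.sum(hazards * exposure)))


# Hypothetical example: rates 0.1, 0.2, 0.3 on [0, 1), [1, 2), [2, inf).
survival_demo = piecewise_exponential_survival(2.5, [1.0, 2.0], [0.1, 0.2, 0.3])
```

For the example values the cumulative hazard at t = 2.5 is 0.1 + 0.2 + 0.15 = 0.45.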
Gokularam Muthukrishnan, Sheetal Kalyani
Conventionally, in a differentially private additive noise mechanism, independent and identically distributed (i.i.d.) noise samples are added to each coordinate of the response. In this work, we formally present the addition of noise that is independent but not identically distributed (i.n.i.d.) across the coordinates to achieve a tighter privacy-accuracy trade-off by exploiting coordinate-wise disparity in privacy leakage. In particular, we study the i.n.i.d. Gaussian and Laplace mechanisms and obtain the conditions under which these mechanisms guarantee privacy. The optimal choices of parameters that ensure these conditions are derived considering (weighted) mean squared and $\ell_{p}^{p}$-errors as measures of accuracy. Theoretical analyses and numerical simulations demonstrate that the i.n.i.d. mechanisms achieve higher utility for the given privacy requirements compared to their i.i.d. counterparts. One of the interesting observations is that the Laplace mechanism outperforms the Gaussian mechanism even in high dimensions, contrary to popular belief, if the irregularity in coordinate-wise sensitivities is exploited. We also demonstrate how the i.n.i.d. noise can improve the performance in private (a) coordinate descent, (b) principal component analysis, and (c) deep learning with group clipping.
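Mechanically, the only change from the i.i.d. case is that each coordinate gets its own noise scale. The sketch below adds Gaussian noise with a per-coordinate standard deviation; the function name and the `sigmas` values are hypothetical, and choosing scales that actually satisfy a privacy guarantee is the subject of the paper and is not attempted here.

```python
import numpy as np


def inid_gaussian_mechanism(response, sigmas, rng):
    """Add independent Gaussian noise with a separate scale per coordinate.

    This only shows the sampling step; it makes no privacy claim for the
    given `sigmas`.
    """
    response = np.asarray(response, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    if response.shape != sigmas.shape:
        raise ValueError("response and sigmas must have the same shape")
    return response + rng.normal(loc=0.0, scale=sigmas)


rng_demo = np.random.default_rng(0)
# Hypothetical scales: less noise on the first coordinate, more on the last.
noisy_demo = inid_gaussian_mechanism([1.0, 2.0, 3.0], [0.1, 1.0, 5.0], rng_demo)
```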
Baichuan Mo, Qing Yi Wang, Xiaotong Guo, Matthias Winkenbach, Jinhua Zhao
In last-mile delivery, drivers frequently deviate from planned delivery routes because of their tacit knowledge of the road and curbside infrastructure, customer availability, and other characteristics of the respective service areas. Hence, the actual stop sequences chosen by an experienced human driver may be potentially preferable to the theoretical shortest-distance routing under real-life operational conditions. Thus, being able to predict the actual stop sequence that a human driver would follow can help to improve route planning in last-mile delivery. This paper proposes a pair-wise attention-based pointer neural network for this prediction task using drivers' historical delivery trajectory data. In addition to the commonly used encoder-decoder architecture for sequence-to-sequence prediction, we propose a new attention mechanism based on an alternative specific neural network to capture the local pair-wise information for each pair of stops. To further capture the global efficiency of the route, we propose a new iterative sequence generation algorithm that is used after model training to identify the first stop of a route that yields the lowest operational cost. Results from an extensive case study on real operational data from Amazon's last-mile delivery operations in the US show that our proposed method can significantly outperform traditional optimization-based approaches and other machine learning methods (such as the Long Short-Term Memory encoder-decoder and the original pointer network) in finding stop sequences that are closer to high-quality routes executed by experienced drivers in the field. Compared to benchmark models, the proposed model can increase the average prediction accuracy of the first four stops from around 0.2 to 0.312, and reduce the disparity between the predicted route and the actual route by around 15%.
Chunyu Qiang, Peng Yang, Hao Che, Xiaorui Wang, Zhongyuan Wang
Cross-speaker style transfer in speech synthesis aims at transferring a style
from a source speaker to synthesised speech with a target speaker's timbre. Most
previous approaches rely on data with style labels, but manually-annotated
labels are expensive and not always reliable. In response to this problem, we
propose Style-Label-Free, a cross-speaker style transfer method, which can
realize the style transfer from source speaker to target speaker without style
labels. Firstly, a reference encoder structure based on quantized variational
autoencoder (Q-VAE) and style bottleneck is designed to extract discrete style
representations. Secondly, a speaker-wise batch normalization layer is proposed
to reduce the source speaker leakage. In order to improve the style extraction
ability of the reference encoder, a style invariant and contrastive data
augmentation method is proposed. Experimental results show that the method
outperforms the baseline. We provide a website with audio samples.
Authors' comments: Published to ISCSLP 2022
Fangyuan Wang, Bo Xu, Bo Xu
Currently, chunk-wise schemes are often used to make Automatic Speech
Recognition (ASR) models support streaming deployment. However, existing
approaches are unable to capture the global context, lack support for parallel
training, or exhibit quadratic complexity for the computation of multi-head
self-attention (MHSA). On the other hand, causal convolution, which uses no
future context, has become the de facto module in the streaming Conformer. In this
paper, we propose SSCFormer to push the limit of chunk-wise Conformer for
streaming ASR using the following two techniques: 1) A novel cross-chunks
context generation method, named Sequential Sampling Chunk (SSC) scheme, to
re-partition chunks from regular partitioned chunks to facilitate efficient
long-term contextual interaction within local chunks. 2) The Chunked Causal
Convolution (C2Conv), designed to concurrently capture the left context and
chunk-wise future context. Evaluations on AISHELL-1 show that an End-to-End
(E2E) CER of 5.33% can be achieved, which even outperforms a strong time-restricted
baseline U2. Moreover, the chunk-wise MHSA computation in our model enables it
to train with a large batch size and perform inference with linear complexity.
Authors' comments: This manuscript has been accepted by SPL
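The linear-complexity claim for chunk-wise MHSA can be illustrated with a plain chunk attention mask, in which every frame attends only to frames of its own chunk. This is the regular partition only; the Sequential Sampling Chunk re-partitioning and C2Conv are not reproduced, and the function name is an assumption of this example.

```python
import numpy as np


def chunk_attention_mask(num_frames, chunk_size):
    """Boolean mask where entry (i, j) is True if frame i may attend to frame j."""
    chunk_ids = np.arange(num_frames) // chunk_size
    return chunk_ids[:, None] == chunk_ids[None, :]


mask_demo = chunk_attention_mask(num_frames=5, chunk_size=2)
# Each row allows at most `chunk_size` positions, so the number of attended
# pairs grows linearly with the number of frames.
allowed_per_frame = mask_demo.sum(axis=1)
```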
Chang Zhang, Guo-Yin Zhang, Jin-Zeng Li, Jing-Hua Yuan
Massive young stellar objects (MYSOs) play a crucial role in star formation.
Given that MYSOs were previously identified based on the extended structure and
the observational data for them is limited, screening the Wide-field Infrared
Survey Explorer (WISE) objects showing green features (for the common coding of
the 4.6 $\mu$m band as green channel in three-color composite WISE images) will
yield more MYSO candidates. Using WISE images of the whole Galactic Plane
($0^\circ<l<360^\circ$ and $\mid b \mid <2^\circ$), we identified sources with
strong emission in the 4.6 $\mu$m band and divided them into three groups
according to their morphological features. We present a catalog of 2135 WISE
Green Objects (WGOs): 264 WGOs have an extended structure; 1366 WGOs show a
compact green feature but no extended structure; and 505 WGOs have neither an
extended structure nor a green feature, but their intensity at 4.6 $\mu$m is
numerically at least 4.5 times that at 3.4 $\mu$m. According to the analysis of the
coordinates of WGOs, we find WGOs are mainly distributed in $\mid l \mid<
60^\circ$, coincident with the position of the giant molecular clouds in $\mid
l \mid> 60^\circ$. Matching results with various masers show that those three
groups of WGOs are at different evolutionary stages. After cross-matching WGOs
with published YSO survey catalogs, we infer that $\sim$50% of WGOs are samples
of newly discovered YSOs. In addition, 1260 WGOs are associated with Hi-GAL
sources, of which 231 are classified as robust MYSOs and 172 as candidate
MYSOs according to physical parameters estimated by spectral energy
distribution fitting.
Authors' comments: 25 pages, 9 figures, accepted for publication in ApJS
E. J. Marchesini, V. Reynaldi, F. Vieyro, J. Saponara, I. Andruchow, I. E. López, P. Benaglia, S. A. Cellone et al.
Context. The gamma-ray emitting source WISE J141046.00+740511.2 has been
associated with a Fermi-LAT detection by crossmatching with Swift/XRT data. It
has shown all the canonical observational characteristics of a BL Lac source,
including a power-law, featureless optical spectrum. However, it was only
recently detected at radio frequencies and its radio flux is significantly low.
Aims. Given that a radio detection is fundamental to associate lower-energy
counterparts to Fermi-LAT sources, we aim to unambiguously classify this source
by performing a multiwavelength analysis based on contemporaneous data.
Methods. By using multifrequency observations at the Jansky Very Large Array,
Giant Metrewave Radio Telescope, Gran Telescopio Canarias, Gemini, William
Herschel Telescope and Liverpool observatories, together with Fermi-LAT and
Swift data, we carried out two kinds of analyses. On one hand, we studied
several known parameters that account for the radio loudness or weakness
characterization and their application to blazars (in general) and to our
source (in particular). And, on the other hand, we built and analyzed the
observed spectral energy distribution (SED) of this source to try to explain
its peculiar characteristics. Results. The multiwavelength analysis indicates
that WISE J141046.00+740511.2 is a blazar of the high-frequency peaked (HBL)
type that emits highly polarized light and that is likely located at a low
redshift. In addition, the one-zone model parameters that best fit its SED are
those of an extreme HBL (EHBL); this blazar type has been extensively predicted
in theory to be lacking in the radio emission that is otherwise typical of
canonical gamma-ray blazars. Conclusions. We confirm that WISE
J141046.00+740511.2 is indeed a highly polarized BL Lac of the HBL type.
Further studies will be conducted to explain the atypical low radio flux
detected for this source.
Authors' comments: accepted for publication in A&A, in press
Babhrubahan Bose
We study the relationship between the point-wise symmetry of Birkhoff-James orthogonality and the geometry of the space of operators $\mathbb{B}(\ell_\infty^n,\ell_1^m)$. We show that any non-zero left-symmetric point in this space is a smooth point. We also show that for $n\geq4$, any unit norm right-symmetric point of this space is an extreme point of the closed unit ball. This marks the first step towards characterizing the extreme points of these unit balls and finding the Grothendieck constants $G(m,n)$ using Birkhoff-James orthogonality techniques.
Yucong Lin, Jinhua Su, Yuhang Li, Yuhao Wei, Hanchao Yan, Saining Zhang, Jiaan Luo, Danni Ai et al.
Deep learning methods have contributed substantially to the rapid advancement of medical image segmentation, the quality of which relies on the suitable design of loss functions. Popular loss functions, including the cross-entropy and Dice losses, often fall short in boundary detection, thereby limiting high-resolution downstream applications such as automated diagnoses and procedures. We developed a novel loss function that is tailored to reflect boundary information and thereby enhance boundary detection. As the contrast between segmentation and background regions along the classification boundary naturally induces heterogeneity over the pixels, we propose the piece-wise two-sample t-test augmented (PTA) loss that is infused with the statistical test for such heterogeneity. We demonstrate the improved boundary detection power of the PTA loss compared to benchmark losses without a t-test component.
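As background for the statistical ingredient, the sketch below computes a two-sample t statistic (Welch's unequal-variance form) between two groups of pixel values, for example intensities on either side of a boundary. Which variant of the test the PTA loss uses, and how it enters the loss, is not stated in the abstract, so this shows only the generic statistic with invented sample values.

```python
import math


def welch_t_statistic(a, b):
    """Two-sample t statistic with unequal variances (Welch's form)."""
    n_a, n_b = len(a), len(b)
    mean_a = sum(a) / n_a
    mean_b = sum(b) / n_b
    # Unbiased sample variances.
    var_a = sum((x - mean_a) ** 2 for x in a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (n_b - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / n_a + var_b / n_b)


# Hypothetical pixel values from two sides of a boundary.
t_demo = welch_t_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0])
```

A large absolute value of the statistic indicates a clear contrast between the two groups.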
Ruoyi Zhang, Haibo Yuan
We have obtained accurate dust reddening from far-ultraviolet (UV) to the
mid-infrared (IR) for up to 5 million stars by the star-pair algorithm based on
LAMOST stellar parameters along with GALEX, Pan-STARRS 1, Gaia, SDSS, 2MASS,
and WISE photometric data. The typical errors are between 0.01 and 0.03 mag for
most colors. We derived the empirical reddening coefficients for 21 colors both
in the traditional (single valued) way and as a function of Teff and E(B-V) by
using the largest samples of accurate reddening measurements, together with the
extinction values from Schlegel et al. The corresponding extinction
coefficients have also been obtained. The results are compared with model
predictions and are generally in good agreement. Comparisons with measurements in
the literature show that the Teff- and E(B-V)-dependent coefficients explain
the discrepancies between different measurements naturally, i.e., using sample
stars of different temperatures and reddening. Our coefficients are mostly
valid in the extinction range of 0-0.5 mag and the temperature range of
4000-10000 K. We recommend that the new Teff and E(B-V) dependent reddening and
extinction coefficients should be used in the future. A Python package is also
provided for the usage of the coefficients.
Authors' comments: 21 pages, 9 figures, 4 tables (including appendix). Published in the
Astrophysical Journal Supplement Series
Yiming Li, Zhifang Guo, Zhirong Ye, Xiangdong Wang, Hong Liu, Yueliang Qian, Rui Tao, Long Yan et al.
In this paper, we describe in detail our system for DCASE 2022 Task 4. The
system combines two considerably different models: an end-to-end Sound Event
Detection Transformer (SEDT) and a frame-wise model, Metric Learning and Focal
Loss CNN (MLFL-CNN). The former is an event-wise model which learns event-level
representations and predicts sound event categories and boundaries directly,
while the latter is based on the widely adopted frame-classification scheme,
under which each frame is classified into event categories and event boundaries
are obtained by post-processing such as thresholding and smoothing. For SEDT,
self-supervised pre-training using unlabeled data is applied, and
semi-supervised learning is adopted by using an online teacher, which is
updated from the student model using the Exponential Moving Average (EMA)
strategy and generates reliable pseudo labels for weakly-labeled and unlabeled
data. For the frame-wise model, the ICT-TOSHIBA system of DCASE 2021 Task 4 is
used. Experimental results show that the hybrid system considerably outperforms
either individual model and achieves psds1 of 0.420 and psds2 of 0.783 on the
validation set without external data. The code is available at
https://github.com/965694547/Hybrid-system-of-frame-wise-model-and-SEDT.
Authors' comments: 5 pages, 2 figures, accepted for publication in DCASE2022 Workshop
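The online teacher described above follows the usual exponential moving average rule: after each student update, every teacher parameter moves a small step towards the corresponding student parameter. A minimal sketch with parameters held in a dictionary of NumPy arrays follows; the dictionary layout and the `decay` values are assumptions for illustration.

```python
import numpy as np


def ema_update(teacher, student, decay=0.999):
    """Return new teacher parameters: decay * teacher + (1 - decay) * student."""
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


teacher_demo = {"w": np.array([1.0, 1.0])}
student_demo = {"w": np.array([0.0, 2.0])}
# A small decay is used here only to make the effect visible.
teacher_demo = ema_update(teacher_demo, student_demo, decay=0.9)
```

With a decay close to one the teacher changes slowly, which is what makes its pseudo labels more stable than the student's raw predictions.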
Haibo Yang, Shengjie Zhang, Xiaoyang Han, Botao Zhao, Yan Ren, Yaru Sheng, Xiao-Yong Zhang
Small lesions in magnetic resonance imaging (MRI) images are crucial for the
clinical diagnosis of many kinds of diseases. However, MRI quality is easily
degraded by various kinds of noise, which can greatly affect the accuracy of
diagnosing small lesions. Although some methods for denoising MR images have
been proposed, task-specific denoising methods for improving the diagnostic
confidence for small lesions are lacking. In this work, we propose a voxel-wise
hybrid residual MLP-CNN model to denoise three-dimensional (3D) MR images with
small lesions. We combine two basic deep learning architectures, MLP and CNN, to
obtain an appropriate inherent bias for image denoising, and integrate the
output layers of the MLP and CNN by adding residual connections to leverage
long-range information. We evaluate the proposed method on 720 T2-FLAIR brain
images with small lesions at different noise levels. The results show the
superiority of our method in both quantitative and visual evaluations on the
testing dataset compared to state-of-the-art methods. Moreover, two experienced
radiologists agreed that at moderate and high noise levels, our method
outperforms other methods in terms of recovery of small lesions and overall
image denoising quality. The implementation of our method is available at
https://github.com/laowangbobo/Residual_MLP_CNN_Mixer.
Authors' comments: accepted by MICCAI 2022