Helal El-Zaatari, Fei Yu, Michael R Kosorok
Statistical analysis of social networks provides valuable insights into
complex network interactions across various scientific disciplines. However,
accurate modeling of networks remains challenging due to the heavy
computational burden and the need to account for observed network dependencies.
Exponential Random Graph Models (ERGMs) have emerged as a promising technique
used in social network modeling to capture network dependencies by
incorporating endogenous variables. Nevertheless, using ERGMs poses multiple
challenges, including the occurrence of ERGM degeneracy, which generates
unrealistic and meaningless network structures. To address these challenges and
enhance the modeling of collaboration networks, we propose and test a novel
approach that focuses on endogenous variable selection within ERGMs. Our method
aims to overcome the computational burden and improve the accommodation of
observed network dependencies, thereby facilitating more accurate and
meaningful interpretations of network phenomena in various scientific fields.
We conduct empirical testing and rigorous analysis to contribute to the
advancement of statistical techniques and offer practical insights for network
analysis.
Authors' comments: 23 pages, 6 tables and 18 figures
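For readers new to ERGMs, the model in the abstract above assigns a graph y the probability P(Y = y) = exp(θᵀg(y)) / κ(θ). Here g(y) collects endogenous network statistics, such as edge and triangle counts, and κ(θ) sums over every possible graph. The minimal Python sketch below covers only this standard ERGM definition and does not implement the authors' variable-selection method; the helper names are hypothetical. It computes two common statistics and the exact log-probability by brute-force enumeration, which is feasible only for a handful of nodes. That limitation is one concrete source of the computational burden mentioned above.

```python
import math
from itertools import combinations

def ergm_stats(n, edges):
    """Edge and triangle counts of an undirected graph on nodes 0..n-1."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    n_triangles = sum(
        1
        for a, b, c in combinations(range(n), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )
    return (n_edges, n_triangles)

def ergm_log_prob(n, edges, theta):
    """Exact log P(Y = y) for an edges + triangles ERGM, enumerating all graphs."""
    pairs = list(combinations(range(n), 2))

    def score(es):
        # theta^T g(y) for the graph with edge list es
        return sum(t * s for t, s in zip(theta, ergm_stats(n, es)))

    # log kappa(theta) via log-sum-exp over all 2^(n(n-1)/2) graphs
    scores = [
        score([pairs[i] for i in range(len(pairs)) if (mask >> i) & 1])
        for mask in range(1 << len(pairs))
    ]
    m = max(scores)
    log_kappa = m + math.log(sum(math.exp(s - m) for s in scores))
    return score(edges) - log_kappa
```

With n = 4 there are only 2^6 = 64 graphs to enumerate. At n = 20 there are already 2^190, which is why practical ERGM fitting relies on MCMC approximations rather than exact normalization.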
Yosuke Shinya
Scale-wise evaluation of object detectors is important for real-world
applications. However, existing metrics are either coarse or not sufficiently
reliable. In this paper, we propose novel scale-wise metrics that strike a
balance between fineness and reliability, using a filter bank consisting of
triangular and trapezoidal band-pass filters. We conduct experiments with two
methods on two datasets and show that the proposed metrics can highlight the
differences between the methods and between the datasets. Code is available at
https://github.com/shinya7y/UniverseNet .
Authors' comments: Honorable Mention Solution Award in Small Object Detection Challenge
for Spotting Birds, International Conference on Machine Vision Applications
(MVA) 2023
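A filter bank of triangular and trapezoidal band-pass filters can be pictured as assigning each object a soft membership in several scale bands. The sketch below is an illustration only and rests on several assumptions. Scale is measured as log2 of the square root of the box area, band centres are evenly spaced, interior bands are triangles, and the two outermost bands are trapezoids that stay at 1 beyond their centres. The paper's actual band placement, and the way these weights enter the metric, may differ. `scale_weights` is a hypothetical helper.

```python
import math

def scale_weights(area, centers):
    """Soft membership of an object with the given area in each scale band.

    centers: at least two evenly spaced, increasing band centres in
    log2(sqrt(area)) units. Interior bands are triangular; the first and
    last bands are trapezoidal (flat at 1 on their outer side).
    """
    x = math.log2(math.sqrt(area))
    d = centers[1] - centers[0]  # assumed uniform spacing
    weights = []
    for i, c in enumerate(centers):
        if (i == 0 and x <= c) or (i == len(centers) - 1 and x >= c):
            w = 1.0  # flat top of an outer trapezoid
        else:
            w = max(0.0, 1.0 - abs(x - c) / d)  # triangular slope
        weights.append(w)
    return weights
```

With evenly spaced centres, the weights of any object sum to 1. An object lying between two centres is therefore split between neighbouring bands instead of being counted in a single hard bin, which makes fine bands less noisy.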
Zhuoling Li, Chunrui Han, Zheng Ge, Jinrong Yang, En Yu, Haoqian Wang, Hengshuang Zhao, Xiangyu Zhang
Efficiency is crucial for 3D lane detection because of practical deployment demands. In this work, we propose a simple, fast, and end-to-end detector that still maintains high detection precision. Specifically, we devise a set of fully convolutional heads based on row-wise classification. In contrast to previous counterparts, ours supports recognizing both vertical and horizontal lanes. Moreover, our method is the first to perform row-wise classification in bird's-eye view. In the heads, we split the features into multiple groups, and each group corresponds to a lane instance. During training, the predictions are associated with lane labels using the proposed single-win one-to-one matching to compute the loss, and no post-processing is required at inference. In this way, our proposed fully convolutional detector, GroupLane, realizes end-to-end detection like DETR. Evaluated on three real-world 3D lane benchmarks, OpenLane, Once-3DLanes, and OpenLane-Huawei, GroupLane with a ConvNeXt-Base backbone outperforms the published state-of-the-art PersFormer by 13.6% F1 score on the OpenLane validation set. Moreover, GroupLane with ResNet18 still surpasses PersFormer by 4.9% F1 score, while its inference is nearly 7x faster and its FLOPs are only 13.3% of PersFormer's.
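Row-wise classification replaces per-pixel segmentation with one classification per row. For each row of the grid (here, in bird's-eye view), the head scores which column the lane passes through. The Python sketch below shows only this generic decoding step for a single lane instance (one feature group). It assumes per-row column logits and a per-row existence probability with a 0.5 threshold; the names and the threshold are hypothetical. The sketch does not cover GroupLane's handling of vertical lanes or its single-win matching.

```python
def decode_row_wise(row_logits, row_presence, threshold=0.5):
    """Decode one lane instance from row-wise classification outputs.

    row_logits[r][c]: score that the lane crosses grid row r at column c.
    row_presence[r]: estimated probability that the lane exists in row r.
    Returns the (row, column) cells the lane is predicted to pass through.
    """
    points = []
    for r, (logits, presence) in enumerate(zip(row_logits, row_presence)):
        if presence < threshold:
            continue  # lane treated as absent in this row
        best_col = max(range(len(logits)), key=logits.__getitem__)
        points.append((r, best_col))
    return points
```

In this simplified picture, each feature group yields its own lane directly. No clustering step is needed to separate instances.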
Liu Liu, Shuaifeng Zhi, Zhenhua Du, Li Liu, Xinyu Zhang, Kai Huo, Weidong Jiang
Radars, due to their robustness to adverse weather conditions and ability to measure object motions, have served in autonomous driving and intelligent agents for years. However, Radar-based perception suffers from unintuitive sensing data, which lack semantic and structural information about scenes. To tackle this problem, camera and Radar sensor fusion has been investigated as a trending strategy offering low cost, high reliability, and ease of maintenance. While most recent works explore how to exploit Radar point clouds and images, the rich contextual information within Radar observations is discarded. In this paper, we propose a hybrid point-wise Radar-Optical fusion approach for object detection in autonomous driving scenarios. The framework benefits from dense contextual information from both the range-Doppler spectrum and images, which are integrated to learn a multi-modal feature representation. Furthermore, we propose a novel local coordinate formulation, tackling the object detection task in an object-centric coordinate system. Extensive results show that, with the information gained from optical images, we achieve leading object detection performance (97.69% recall) compared to the recent state-of-the-art method FFT-RadNet (82.86% recall). Ablation studies verify the key design choices and the practicability of our approach given machine-generated imperfect detections. The code will be available at https://github.com/LiuLiu-55/ROFusion.
Yasar Abbas Ur Rehman, Yan Gao, Pedro Porto Buarque de Gusmão, Mina Alibeigi, Jiajun Shen, Nicholas D. Lane
The ubiquity of camera-enabled devices has led to large amounts of unlabeled image data being produced at the edge. The integration of self-supervised learning (SSL) and federated learning (FL) into one coherent system can potentially offer data privacy guarantees while also advancing the quality and robustness of the learned visual representations without needing to move data around. However, client bias and divergence during FL aggregation caused by data heterogeneity limit the performance of learned visual representations on downstream tasks. In this paper, we propose a new aggregation strategy termed Layer-wise Divergence Aware Weight Aggregation (L-DAWA) to mitigate the influence of client bias and divergence during FL aggregation. The proposed method aggregates weights at the layer level according to the angular divergence between each client's model and the global model. Extensive experiments in cross-silo and cross-device settings on the CIFAR-10/100 and Tiny ImageNet datasets demonstrate that our method is effective and achieves new SOTA performance with both contrastive and non-contrastive SSL approaches.
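The layer-wise weighting idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' L-DAWA formula. Angular divergence is represented by the cosine similarity between a client's layer and the corresponding global layer, negative similarities are clipped to zero, and each layer of the new global model is the similarity-weighted average of the client layers. The function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two flat weight vectors (0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def layerwise_divergence_aggregate(global_model, client_models):
    """Aggregate client models layer by layer, weighting each client by how well
    its layer aligns with the current global layer.

    Models are dicts mapping layer name -> flat list of weights.
    """
    new_global = {}
    for name, g in global_model.items():
        # Assumption of this sketch: negative alignment is clipped to zero.
        sims = [max(0.0, cosine_similarity(c[name], g)) for c in client_models]
        total = sum(sims)
        if total == 0.0:
            # No client aligns with the global layer: fall back to plain averaging.
            sims, total = [1.0] * len(client_models), float(len(client_models))
        new_global[name] = [
            sum(s * c[name][j] for s, c in zip(sims, client_models)) / total
            for j in range(len(g))
        ]
    return new_global
```

A client whose layer points away from the global model thus contributes little to that layer, yet it can still contribute fully to other layers where it agrees. This per-layer granularity is what distinguishes the idea from weighting whole models.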
Pierre-Louis Lions, Benjamin Seeger
We consider linear and nonlinear transport equations with irregular velocity fields, motivated by models coming from mean field games. The velocity fields are assumed to increase in each coordinate, and the divergence therefore fails to be absolutely continuous with respect to the Lebesgue measure in general. For such velocity fields, the well-posedness of first- and second-order linear transport equations in Lebesgue spaces is established, as well as the existence and uniqueness of regular ODE and SDE Lagrangian flows. These results are then applied to the study of certain nonconservative, nonlinear systems of transport type, which are used to model mean field games in a finite state space. A notion of weak solution is identified for which unique minimal and maximal solutions exist, which in general do not coincide. A selection-by-noise result is established for a relevant example to demonstrate that different types of noise can select any of the admissible solutions in the vanishing noise limit.
Weiliang Chan, Qianqian Ren
Urban region embedding is an important yet highly challenging problem due to the complexity and constantly changing nature of urban data. To address these challenges, we propose Region-Wise Multi-View Representation Learning (ROMER) to capture multi-view dependencies and learn expressive representations of urban regions without the constraints of rigid neighbourhood region conditions. Our model focuses on learning urban region representations from multi-source urban data. First, we capture the multi-view correlations from mobility flow patterns, POI semantics, and check-in dynamics. Then, we adopt global graph attention networks to learn the similarity between any two vertices in graphs. To comprehensively consider and share features of multiple views, a two-stage fusion module is further proposed to learn weights with external attention to fuse multi-view embeddings. Extensive experiments on two downstream tasks with real-world datasets demonstrate that our model outperforms state-of-the-art methods, with improvements of up to 17%.
Yuyuan Li, Jiaming Zhang, Yixiu Liu, Chaochao Chen
Privacy concerns associated with machine learning models have driven research into machine unlearning, which aims to erase the memory of specific target training data from already trained models. This issue also arises in federated learning, creating the need to address the federated unlearning problem. However, federated unlearning remains a challenging task. On the one hand, current research primarily focuses on unlearning all data from a client, overlooking more fine-grained unlearning targets, e.g., class-wise and sample-wise removal. On the other hand, existing methods suffer from imprecise estimation of data influence and impose significant computational or storage burdens. To address these issues, we propose a neuro-inspired federated unlearning framework based on active forgetting, which is independent of the model architecture and suitable for fine-grained unlearning targets. Our framework distinguishes itself from existing methods by utilizing new memories to overwrite old ones. These new memories are generated through teacher-student learning. We further utilize refined elastic weight consolidation to mitigate catastrophic forgetting of non-target data. Extensive experiments on benchmark datasets demonstrate the efficiency and effectiveness of our method, achieving satisfactory unlearning completeness against backdoor attacks.
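Elastic weight consolidation (EWC) protects parameters that mattered for previously learned data by adding a quadratic penalty to the training loss, weighted by a diagonal Fisher information estimate. The sketch below shows only the standard EWC penalty. The paper uses a refined variant whose details the abstract does not give, and the function names here are hypothetical.

```python
def ewc_penalty(params, anchor_params, fisher, lam):
    """Standard EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i) ** 2.

    params: current parameters theta (flat list).
    anchor_params: parameters theta* to stay close to.
    fisher: diagonal Fisher information F_i, one importance value per parameter.
    lam: strength of the consolidation term.
    """
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )

def total_loss(task_loss, params, anchor_params, fisher, lam):
    # Assumption: in an unlearning setting, task_loss would be the loss on the
    # new "overwriting" memories; the penalty keeps parameters that are
    # important for non-target data near their previous values.
    return task_loss + ewc_penalty(params, anchor_params, fisher, lam)
```

Parameters with large Fisher values are expensive to move, so updates concentrate on parameters that matter little for the data to be retained.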
Peter A. Monkewitz
The scaling of Reynolds stresses in turbulent wall-bounded flows is the
subject of a long-running debate. In the near-wall "inner" region, a sizeable
group, inspired by the "attached eddy model", has advocated the unlimited
growth of $\langle uu\rangle^+$, and in particular of its inner peak at
$y^+\approx 15$, with $\ln \mathrm{Re}_\tau$ (see e.g. Smits et al. 2021 and
references therein). Only recently have Chen & Sreenivasan (2021, 2022)
argued, on the basis of bounded dissipation, that $\langle uu\rangle^+$
remains finite in the inner near-wall region for $\mathrm{Re}_\tau\rightarrow\infty$,
with finite Reynolds number corrections of order $\mathrm{Re}_\tau^{-1/4}$. In this
paper, the overlap between the two-term inner expansion $f_0(y^+) +
f_1(y^+)/\mathrm{Re}_\tau^{1/4}$ of Monkewitz (2022) and the leading-order outer
expansion for $\langle uu\rangle^+$ is shown to be of the form $C_0 +
C_1\,(y^+/\mathrm{Re}_\tau)^{1/4}$. With a new indicator function, overlaps of this form
are reliably identified in $\langle uu\rangle^+$ profiles for channels and
pipes, while the situation in boundary layers requires further clarification.
On the other hand, the standard logarithmic indicator function, evaluated for
the same data, shows no sign of a logarithmic law connecting an inner expansion
of $\langle uu\rangle^+$ growing as $\ln \mathrm{Re}_\tau$ to an outer expansion of
order unity.
Authors' comments: 10 pages, 5 figures
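The form of the overlap follows from matching: the inner expansion, taken at large $y^+$, must agree with the leading-order outer expansion. The derivation below spells out that step. As an illustration only (not necessarily the new indicator function used in the paper), it also shows why $y^{+\,3/4}$ times the wall-normal derivative plays the same role for a quarter-power overlap that the premultiplied derivative plays for a logarithmic one.

```latex
% Matching: for large y^+ the inner terms must behave as
f_0(y^+) \to C_0, \qquad f_1(y^+) \sim C_1\,(y^+)^{1/4},
% so that
f_0(y^+) + \frac{f_1(y^+)}{\mathrm{Re}_\tau^{1/4}}
  \simeq C_0 + C_1 \left(\frac{y^+}{\mathrm{Re}_\tau}\right)^{1/4}.
% Differentiating the overlap form gives
\frac{d\langle uu\rangle^+}{dy^+}
  = \frac{C_1}{4}\,\mathrm{Re}_\tau^{-1/4}\,(y^+)^{-3/4}
\quad\Longrightarrow\quad
(y^+)^{3/4}\,\frac{d\langle uu\rangle^+}{dy^+} = \frac{C_1}{4}\,\mathrm{Re}_\tau^{-1/4},
% i.e. a plateau in y^+. By contrast, the standard logarithmic indicator
\Xi = y^+\,\frac{d\langle uu\rangle^+}{dy^+}
% is constant (\Xi = A) only for a law \langle uu\rangle^+ = A \ln y^+ + B;
% for the quarter-power overlap it grows as (C_1/4)\,(y^+/\mathrm{Re}_\tau)^{1/4}.
```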
Can Cui, Ruining Deng, Quan Liu, Tianyuan Yao, Shunxing Bao, Lucas W. Remedios, Yucheng Tang, Yuankai Huo
The Segment Anything Model (SAM) is a recently proposed prompt-based model for generic zero-shot segmentation. With its zero-shot segmentation capability, SAM has achieved impressive flexibility and precision on various segmentation tasks. However, the current pipeline requires manual prompts during the inference stage, which is still resource-intensive for biomedical image segmentation. In this paper, instead of using prompts during the inference stage, we introduce a pipeline, called all-in-SAM, that utilizes SAM throughout the entire AI development workflow (from annotation generation to model finetuning) without requiring manual prompts during the inference stage. Specifically, SAM is first employed to generate pixel-level annotations from weak prompts (e.g., points, bounding boxes). Then, the pixel-level annotations are used to finetune the SAM segmentation model rather than to train a model from scratch. Our experimental results reveal two key findings: 1) the proposed pipeline surpasses the state-of-the-art (SOTA) methods in a nuclei segmentation task on the public MoNuSeg dataset, and 2) using weak and few annotations for SAM finetuning achieves competitive performance compared to using strong pixel-wise annotated data.
Dejene Zewdie, Roberto J. Assef, Chiara Mazzucchelli, Manuel Aravena, Andrew W. Blain, Tanio Díaz-Santos, Peter R. M. Eisenhardt, Hyunsung D. Jun et al.
We report the identification of Lyman Break Galaxy (LBG) candidates around
the most luminous Hot Dust-Obscured Galaxy (Hot DOG) known, WISE
J224607.56$-$052634.9 (W2246$-$0526) at $z=4.601$, using deep $r$-,
$i$-, and $z$-band imaging from the Gemini Multi-Object
Spectrograph South (GMOS-S). We use the surface density of LBGs to probe the
Mpc-scale environment of W2246$-$0526 to characterize its richness and
evolutionary state. We identify LBG candidates in the vicinity of W2246$-$0526
using the selection criteria developed by Ouchi et al. (2004) and
Yoshida et al. (2006) in the Subaru Deep Field and in the Subaru XMM-Newton Deep
Field, slightly modified to account for the difference between the filters
used, and we find 37 and 55 LBG candidates, respectively. Matching to the
$z$-band depths of those studies, this corresponds to $\delta =
5.8^{+2.4}_{-1.9}$ times the surface density of LBGs expected in the field.
Interestingly, neither the Hot DOG itself nor a confirmed neighbor satisfies
either of the LBG selection criteria, suggesting we may be missing a large
number of companion galaxies. Our analysis shows that we are most likely only
finding those with higher-than-average IGM optical depth or moderately high
dust obscuration. The number density of LBG candidates is not concentrated
around W2246$-$0526, suggesting either an early evolutionary stage for the
proto-cluster or that the Hot DOG may not be the most massive galaxy, or that
the Hot DOG may be affecting the IGM transparency in its vicinity. The
overdensity around W2246$-$0526 is comparable to overdensities found around
other Hot DOGs and is somewhat higher than typically found for radio galaxies
and luminous quasars at a similar redshift.
Authors' comments: 20 pages, 15 figures. The main results are in Figures 9 and 12.
Accepted for publication in A&A
Meng Xiao, Dongjie Wang, Min Wu, Kunpeng Liu, Hui Xiong, Yuanchun Zhou, Yanjie Fu
Feature transformation aims to reconstruct an effective representation space
by mathematically refining the existing features. It serves as a pivotal
approach to combat the curse of dimensionality, enhance model generalization,
mitigate data sparsity, and extend the applicability of classical models.
Existing research predominantly focuses on domain knowledge-based feature
engineering or learning latent representations. However, these methods, while
insightful, lack full automation and fail to yield a traceable and optimal
representation space. A key question arises: can we concurrently
address these limitations when reconstructing a feature space for a
machine-learning task? Our initial work took a pioneering step towards this
challenge by introducing a novel self-optimizing framework. This framework
leverages the power of three cascading reinforced agents to automatically
select candidate features and operations for generating improved feature
transformation combinations. Despite the impressive strides made, there remained
room to improve its effectiveness and generalization capability. In this
extended journal version, we advance our initial work from two distinct yet
interconnected perspectives: 1) We propose a refinement of the original
framework, which integrates a graph-based state representation method to
capture feature interactions more effectively and develops different
Q-learning strategies to further alleviate Q-value overestimation. 2) We
utilize a new optimization technique (actor-critic) to train the entire
self-optimizing framework in order to accelerate the model convergence and
improve the feature transformation performance. Finally, to validate the
improved effectiveness and generalization capability of our framework, we
perform extensive experiments and conduct comprehensive analyses.
Authors' comments: 21 pages, submitted to TKDD. arXiv admin note: text overlap with
arXiv:2209.08044, arXiv:2205.14526
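Q-value overestimation arises because the standard target takes a maximum over noisy value estimates. One widely used remedy is double Q-learning, which selects the next action with one estimator and evaluates it with another. The Python sketch below contrasts the two targets. It illustrates the general technique only, since the abstract does not say which Q-learning strategies the extended framework adopts, and the function names are hypothetical.

```python
def standard_target(reward, next_q_target, gamma, done):
    """Q-learning target: r + gamma * max_a Q_target(s', a)."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)

def double_q_target(reward, next_q_online, next_q_target, gamma, done):
    """Double Q-learning target: choose a* with the online estimates,
    then evaluate a* with the target estimates."""
    if done:
        return reward
    a_star = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[a_star]
```

For example, with reward 1 and gamma 0.5, suppose the online estimates are [1.0, 3.0] and the target estimates are [2.0, 0.5]. The online network prefers action 1, which the target network values at only 0.5. The double target is therefore 1.25, below the max-based target of 2.0. Decoupling selection from evaluation in this way reduces the upward bias.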
Alireza Daneshyar, Leon Herrmann, Stefan Kollmannsberger
Ductile damage models and cohesive laws incorporate material plasticity, which entails the growth of irrecoverable deformations even after complete failure. This unrealistic growth remains concealed until the unilateral effects arising from crack closure emerge. We address this issue by proposing a new strategy to cope with the entire process of failure, from its very inception in the form of diffuse damage to the final stage, i.e. the emergence of sharp cracks. To this end, we introduce a new strain field, termed discontinuity strain, into the conventional additive strain decomposition to account for discontinuities in a continuous sense, so that the standard principle of virtual work applies. We treat this strain field similarly to a strong discontinuity, yet without introducing new kinematic variables or nonlinear boundary conditions. In this paper, we demonstrate the effectiveness of this new strategy on a simple ductile damage constitutive model. The model uses a scalar damage index to control the degradation process. The discontinuity strain field is injected into the strain decomposition when this damage index exceeds a certain threshold. The threshold corresponds to the limit at which the induced imperfections merge and form a discrete crack. With three-point bending tests under pure mode I and mixed-mode conditions, we demonstrate that this augmentation does not exhibit the early crack closure artifact that plastic damage formulations wrongly predict at load reversal. For comparison, we also use the concrete damaged plasticity model provided in the commercial finite element program Abaqus. Lastly, a high-intensity low-cycle fatigue test demonstrates the unilateral effects resulting from the complete closure of the induced crack.
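In symbols, the augmented decomposition described above can be written as follows. The notation is ours, not the paper's: $D$ is the scalar damage index and $D_{\mathrm{crit}}$ the threshold. The stress relation on the second line is a common choice for scalar damage models. It is shown only as an assumption, to indicate where the extra strain enters.

```latex
\begin{aligned}
\boldsymbol{\varepsilon} &= \boldsymbol{\varepsilon}^{e} + \boldsymbol{\varepsilon}^{p} + \boldsymbol{\varepsilon}^{dis},
  \qquad \boldsymbol{\varepsilon}^{dis} = \mathbf{0} \ \text{ while } D < D_{\mathrm{crit}}, \\
\boldsymbol{\sigma} &= (1 - D)\,\mathbb{C} : \boldsymbol{\varepsilon}^{e}
  = (1 - D)\,\mathbb{C} : \bigl(\boldsymbol{\varepsilon} - \boldsymbol{\varepsilon}^{p} - \boldsymbol{\varepsilon}^{dis}\bigr).
\end{aligned}
```

Because $\boldsymbol{\varepsilon}^{dis}$ is simply another additive strain field, the displacement field remains the only kinematic unknown. This is consistent with the abstract's statement that no new kinematic variables are introduced.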
Yulan Liu, Yuyang Zhou, Rongrong Lin
This paper characterizes the proximal operator of the piece-wise exponential function $1\!-\!e^{-|x|/\sigma}$ with a given shape parameter $\sigma\!>\!0$, which is a popular nonconvex surrogate of the $\ell_0$-norm in support vector machines, zero-one programming problems, compressed sensing, and other applications. Although Malek-Mohammadi et al. [IEEE Transactions on Signal Processing, 64(21):5657--5671, 2016] previously worked on this problem, the expressions they derived were unfortunately inaccurate: one case was missing. Using the Lambert W function and an extensive study of the piece-wise exponential function, we correct the formulation of its proximal operator in light of their work. We also undertake a thorough analysis of this operator. Finally, as an application in compressed sensing, an iterative shrinkage and thresholding algorithm (ISTA) for the piece-wise exponential function regularization problem is developed and fully investigated. A comparative study of ISTA with nine popular nonconvex penalties in compressed sensing demonstrates the advantage of the piece-wise exponential penalty.
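For reference, the operator in question, and one route by which the Lambert W function enters, can be written out. The derivation below uses our own notation: regularization weight $\lambda > 0$ and input $y \ge 0$, with negative $y$ following by symmetry. It shows only how the stationarity condition leads to Lambert W. It does not reproduce the paper's complete case analysis, which determines when each candidate is the actual minimizer.

```latex
\operatorname{prox}_{\lambda f}(y)
  = \arg\min_{x \in \mathbb{R}} \; \tfrac{1}{2}(x - y)^2 + \lambda\bigl(1 - e^{-|x|/\sigma}\bigr),
\qquad f(x) = 1 - e^{-|x|/\sigma}.
% For y >= 0 a minimizer lies in [0, y]. A stationary point x > 0 satisfies
x - y + \frac{\lambda}{\sigma}\, e^{-x/\sigma} = 0 .
% Substituting x = y + \sigma w turns this into w e^{w} = z, hence
x = y + \sigma\, W_0(z) \quad\text{or}\quad x = y + \sigma\, W_{-1}(z),
\qquad z = -\frac{\lambda}{\sigma^{2}}\, e^{-y/\sigma},
% real only when z >= -1/e. The prox is whichever of the positive candidates
% and x = 0 gives the smallest objective value.
% ISTA for min_x (1/2)||Ax - b||_2^2 + lambda * sum_i f(x_i), step size t:
x^{(k+1)} = \operatorname{prox}_{t\lambda f}\!\left(x^{(k)} - t\,A^{\top}\bigl(A x^{(k)} - b\bigr)\right)
  \quad \text{(applied componentwise)}.
```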
Mi Qian, Fei Ji, Yao Ge, Miaowen Wen, Xiang Cheng, H. Vincent Poor
As a promising technique for high-mobility wireless communications,
orthogonal time frequency space (OTFS) has been shown to offer significant
advantages over traditional orthogonal frequency division
multiplexing (OFDM). Although multiple studies have considered index modulation
(IM) based OTFS (IM-OTFS) schemes to further improve system performance, a
challenging and open problem is the development of effective IM schemes and
efficient receivers for practical OTFS systems that must operate in the
presence of channel delays and Doppler shifts. In this paper, we propose two
novel block-wise IM schemes for OTFS systems, named delay-IM with OTFS
(DeIM-OTFS) and Doppler-IM with OTFS (DoIM-OTFS), where a block of
delay/Doppler resource bins is activated simultaneously. Based on a maximum
likelihood (ML) detector, we analyze upper bounds on the average bit error
rates for the proposed DeIM-OTFS and DoIM-OTFS schemes, and verify their
performance advantages over the existing IM-OTFS systems. We also develop a
multi-layer joint symbol and activation pattern detection (MLJSAPD) algorithm
and a customized message passing detection (CMPD) algorithm for our proposed
DeIM-OTFS and DoIM-OTFS systems with low complexity. Simulation results
demonstrate that our proposed MLJSAPD and CMPD algorithms can achieve the desired
performance while remaining robust to imperfect channel state information (CSI).
Authors' comments: arXiv admin note: text overlap with arXiv:2210.13454
Jiuyu Liu, Yi Ma, Rahim Tafazolli
Numerous low-complexity iterative algorithms have been proposed to approach the
performance of linear multiple-input multiple-output (MIMO) detectors while bypassing
the channel matrix inversion. These algorithms exhibit fast convergence in
well-conditioned MIMO channels. However, in the emerging MIMO paradigm
utilizing extremely large aperture arrays (ELAA), the wireless channel may
become ill-conditioned because of spatial non-stationarity, which results in a
considerably slower convergence rate for these algorithms. In this paper, we
propose a novel ELAA-MIMO detection scheme that leverages user-wise singular
value decomposition (UW-SVD) to accelerate the convergence of these iterative
algorithms. By applying UW-SVD, the MIMO signal model can be converted into an
equivalent form featuring a better-conditioned transfer function. Then,
existing iterative algorithms can be utilized to recover the transmitted signal
from the converted signal model with accelerated convergence towards
zero-forcing performance. Our simulation results indicate that the proposed UW-SVD
scheme can significantly accelerate the convergence of the iterative algorithms
in spatially non-stationary ELAA channels. Moreover, the computational
complexity of UW-SVD is small compared with the inherent
complexity of the iterative algorithms.
Authors' comments: Legend correction to Fig. 2(b)
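The sensitivity of iterative detectors to conditioning can be seen on the smallest possible example. The Python sketch below does not implement UW-SVD. Instead, it runs a plain Richardson iteration on a diagonal system with condition number κ; we chose this iteration as a generic stand-in for the iterative algorithms mentioned above. With the optimal fixed step 2/(λ_min + λ_max), the error contracts by (κ − 1)/(κ + 1) per iteration. Improving the conditioning, which is what the UW-SVD transformation aims at, therefore cuts the iteration count sharply.

```python
def richardson_iterations(eigenvalues, tol=1e-6, max_iter=100_000):
    """Count Richardson iterations x <- x + w * (b - A x) needed until
    max|b - A x| < tol, for A = diag(eigenvalues), b = ones, x0 = 0,
    using the optimal fixed step w = 2 / (lambda_min + lambda_max).
    Returns None if tol is not reached within max_iter iterations."""
    w = 2.0 / (min(eigenvalues) + max(eigenvalues))
    b = [1.0] * len(eigenvalues)
    x = [0.0] * len(eigenvalues)
    for k in range(max_iter + 1):
        # residual b - A x for the diagonal system
        residual = [bi - lam * xi for bi, lam, xi in zip(b, eigenvalues, x)]
        if max(abs(r) for r in residual) < tol:
            return k
        x = [xi + w * r for xi, r in zip(x, residual)]
    return None
```

On this toy problem, the well-conditioned system (eigenvalues 1 and 2, κ = 2) converges in 13 iterations. The ill-conditioned one (eigenvalues 1 and 100) needs about 690.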
Paul Primus, Gerhard Widmer
Varying conditions between the data seen at training and at application time
remain a major challenge for machine learning. We study this problem in the
context of Acoustic Scene Classification (ASC) with mismatching recording
devices. Previous works successfully employed frequency-wise normalization of
inputs and hidden layer activations in convolutional neural networks to reduce
the recording device discrepancy. The main objective of this work was to adopt
frequency-wise normalization for Audio Spectrogram Transformers (ASTs), which
have recently become the dominant model architecture in ASC. To this end, we
first investigate how recording device characteristics are encoded in the
hidden layer activations of ASTs. We find that recording device information is
initially encoded in the frequency dimension; however, after the first
self-attention block, it is largely transformed into the token dimension. Based
on this observation, we conjecture that suppressing recording device
characteristics in the input spectrogram is most effective. We propose a
frequency-centering operation for spectrograms that improves ASC
performance on unseen recording devices by up to 18.2 percentage
points on average.
Authors' comments: EUSIPCO 2023
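A recording device acts roughly like a fixed filter, which in a log-magnitude spectrogram adds an approximately constant offset to each frequency bin. One simple way to suppress such an offset is to subtract each bin's mean over time. The sketch below implements that idea under this assumption. It is our illustration of per-frequency centering and may differ from the exact operation proposed in the paper, for example in whether the mean is also taken over a batch or whether the variance is normalized. `frequency_center` is a hypothetical name.

```python
def frequency_center(log_spec):
    """Subtract each frequency bin's mean over time.

    log_spec: list of rows, one per frequency bin, each a list of
    log-magnitude values over time frames.
    """
    centered = []
    for row in log_spec:
        mean = sum(row) / len(row)
        centered.append([v - mean for v in row])
    return centered
```

Adding any per-bin constant (a crude stand-in for a device's frequency response) leaves the output unchanged. This is the invariance such a centering is meant to provide.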
Lan Anh Thi Nguy, Bach Nguyen Gia, Thanh Tu Thi Nguyen, Kamioka Eiji, Tan Xuan Phan
Eye blinking detection in the wild plays an essential role in deception detection, driving fatigue detection, etc. Although numerous attempts have already been made, most of them have encountered difficulties. For example, the resolution of the derived eye images varies as the distance between the face and the camera changes, and a lightweight detection model with a short inference time is required to run in real time. In this research, two problems are addressed: how the eye blinking detection model can learn efficiently from eye images of different resolutions in diverse conditions, and how to reduce the size of the detection model for faster inference. As one potential solution to the first problem, we propose upsampling and downsampling the input eye images to the same resolution, and then determine which interpolation method yields the highest performance of the detection model. For the second problem, although a recent spatiotemporal convolutional neural network used for eye blinking detection has a strong capacity to extract both spatial and temporal characteristics, it still has a large number of network parameters, leading to long inference time. Therefore, this paper considers using depthwise separable convolution instead of conventional convolution layers inside each branch as a feasible solution.
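The parameter savings of depthwise separable convolution follow from simple counting. A standard convolution learns one kernel per input/output channel pair. The separable version learns one kernel per input channel (depthwise) plus a 1×1 channel-mixing layer (pointwise), assuming a depth multiplier of 1. The sketch below counts weights, with biases omitted, for arbitrary kernel shapes, so it covers both 2D kernels and the 3D kernels a spatiotemporal network might use. The channel sizes in the example are illustrative, not taken from the paper.

```python
from math import prod

def standard_conv_params(kernel, c_in, c_out):
    """Weights of a standard convolution with the given kernel shape (bias omitted)."""
    return prod(kernel) * c_in * c_out

def depthwise_separable_params(kernel, c_in, c_out):
    """Depthwise (one kernel per input channel) + pointwise 1x1 mixing (bias omitted)."""
    return prod(kernel) * c_in + c_in * c_out
```

For a 3×3 kernel mapping 64 to 128 channels, the counts are 73,728 versus 8,768 weights, roughly 8.4x fewer. For a 3×3×3 kernel they are 221,184 versus 9,920, roughly 22x fewer.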
Dong Xing, Pengjie Gu, Qian Zheng, Xinrun Wang, Shanqi Liu, Longtao Zheng, Bo An, Gang Pan
Ad hoc teamwork requires an agent to cooperate with unknown teammates without
prior coordination. Many works propose to abstract teammate instances into
high-level representations of types and then pre-train the best response for
each type. However, most of them do not consider the distribution of teammate
instances within a type. This could expose the agent to the hidden risk of
\emph{type confounding}. In the worst case, the best response for an abstract
teammate type could be the worst response for all specific instances of that
type. This work addresses the issue from the lens of causal inference. We first
theoretically demonstrate that this phenomenon is due to the spurious
correlation brought by uncontrolled teammate distribution. Then, we propose our
solution, CTCAT, which disentangles such correlation through an instance-wise
teammate feedback rectification. This operation reweights the interaction of
teammate instances within a shared type to reduce the influence of type
confounding. The effect of CTCAT is evaluated in multiple domains, including
classic ad hoc teamwork tasks and real-world scenarios. Results show that CTCAT
is robust to the influence of type confounding, a practical issue that directly
threatens the robustness of our trained agents but went unnoticed in previous
works.
Authors' comments: Accepted by ICML 2023
Claudia Chinea Hammecher, Karin van Garderen, Marion Smits, Pieter Wesseling, Bart Westerman, Pim French, Mathilde Kouwenhoven, Roel Verhaak et al.
Glioma growth may be quantified with longitudinal image registration.
However, the large mass effects and tissue changes across images pose an added
challenge. Here, we propose a longitudinal, learning-based, and groupwise
registration method for the accurate and unbiased registration of glioma MRI.
We evaluate the method on a dataset from the Glioma Longitudinal AnalySiS consortium and
compare it to classical registration methods. We achieve comparable Dice
coefficients, with more detailed registrations, while significantly reducing
the runtime to under a minute. The proposed method may serve as an alternative
to classical toolboxes and provide further insight into glioma growth.
Authors' comments: Digital poster presented at the annual meeting of the International
Society for Magnetic Resonance in Medicine (ISMRM) 2023. A 6-minute video
about this work is available on the conference website (Program
number: 4361)
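The Dice coefficient quoted in the glioma abstract measures the overlap between two segmentations A and B as 2|A ∩ B| / (|A| + |B|). It ranges from 0 (no overlap) to 1 (identical). In registration studies it is commonly computed between a structure segmented in one scan and the corresponding structure warped from another scan. The minimal sketch below computes it for flattened binary masks. Treating two empty masks as a perfect match is a convention we chose.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice = 2|A ∩ B| / (|A| + |B|) for flattened binary masks of equal length."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_sum = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    return 1.0 if size_sum == 0 else 2.0 * intersection / size_sum
```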