Naoki Matsunaga, Masato Ishii, Akio Hayakawa, Kenji Suzuki, Takuya Narihira
Our goal is to develop fine-grained real-image editing methods suitable for
real-world applications. In this paper, we first summarize four requirements
for these methods and propose a novel diffusion-based image editing framework
with pixel-wise guidance that satisfies these requirements. Specifically, we
train pixel-classifiers on a small amount of annotated data and then infer the
segmentation map of a target image. Users then manipulate the map to instruct
how the image will be edited. We utilize a pre-trained diffusion model to
generate edited images aligned with the user's intention with pixel-wise
guidance. The effective combination of the proposed guidance and other techniques
enables highly controllable editing while preserving the region outside the edited
area, thereby meeting our requirements. The experimental results
demonstrate that our proposal outperforms the GAN-based method in both editing
quality and speed.
Authors' comments: Accepted by AI for Content Creation (AI4CC) workshop at CVPR 2023
Jiyan He, Xuechen Li, Da Yu, Huishuai Zhang, Janardhan Kulkarni, Yin Tat Lee, Arturs Backurs, Nenghai Yu et al.
Differentially private deep learning has recently witnessed advances in
computational efficiency and privacy-utility trade-off. We explore whether
further improvements along the two axes are possible and provide affirmative
answers leveraging two instantiations of \emph{group-wise clipping}. To reduce
the compute time overhead of private learning, we show that \emph{per-layer
clipping}, where the gradient of each neural network layer is clipped
separately, allows clipping to be performed in conjunction with backpropagation
in differentially private optimization. This results in private learning that
is as memory-efficient and almost as fast per training update as non-private
learning for many workflows of interest. While per-layer clipping with constant
thresholds tends to underperform standard flat clipping, per-layer clipping
with adaptive thresholds matches or outperforms flat clipping under given
training epoch constraints, hence attaining similar or better task performance
within less wall time. To explore the limits of scaling (pretrained) models in
differentially private deep learning, we privately fine-tune the 175
billion-parameter GPT-3. We bypass scaling challenges associated with clipping
gradients that are distributed across multiple devices with \emph{per-device
clipping} that clips the gradient of each model piece separately on its host
device. Privately fine-tuning GPT-3 with per-device clipping achieves a task
performance at $\epsilon=1$ better than what is attainable by non-privately
fine-tuning the largest GPT-2 on a summarization task.
Authors' comments: 25 pages
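The difference between flat clipping and per-layer clipping described in the abstract above can be illustrated with a minimal NumPy sketch. The function names, thresholds, and toy gradient shapes are assumptions made for this example and are not taken from the paper; noise addition and the adaptive threshold update are omitted.

```python
import numpy as np

def flat_clip(per_example_grads, threshold):
    """Clip each example's full gradient (all layers together) to `threshold`."""
    # per_example_grads: list of arrays, one per layer, each shaped (batch, ...)
    batch = per_example_grads[0].shape[0]
    squared = sum((g.reshape(batch, -1) ** 2).sum(axis=1) for g in per_example_grads)
    scale = np.minimum(1.0, threshold / (np.sqrt(squared) + 1e-12))
    return [g * scale.reshape((batch,) + (1,) * (g.ndim - 1)) for g in per_example_grads]

def per_layer_clip(per_example_grads, thresholds):
    """Clip each layer's per-example gradient separately, one threshold per layer."""
    clipped = []
    for g, c in zip(per_example_grads, thresholds):
        batch = g.shape[0]
        norms = np.linalg.norm(g.reshape(batch, -1), axis=1)
        scale = np.minimum(1.0, c / (norms + 1e-12))
        clipped.append(g * scale.reshape((batch,) + (1,) * (g.ndim - 1)))
    return clipped
```

Because `per_layer_clip` needs only one layer's gradient at a time, the clipping can in principle be interleaved with backpropagation, which is the property the abstract highlights.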
Qianwen Meng, Hangwei Qian, Yong Liu, Lizhen Cui, Yonghui Xu, Zhiqi Shen
Learning semantic-rich representations from raw unlabeled time series data is
critical for downstream tasks such as classification and forecasting.
Contrastive learning has recently shown its promising representation learning
capability in the absence of expert annotations. However, existing contrastive
approaches generally treat each instance independently, which leads to false
negative pairs that share the same semantics. To tackle this problem, we
propose MHCCL, a Masked Hierarchical Cluster-wise Contrastive Learning model,
which exploits semantic information obtained from the hierarchical structure
consisting of multiple latent partitions for multivariate time series.
Motivated by the observation that fine-grained clustering preserves higher
purity while coarse-grained clustering reflects higher-level semantics, we propose a
novel downward masking strategy to filter out fake negatives and supplement
positives by incorporating the multi-granularity information from the
clustering hierarchy. In addition, a novel upward masking strategy is designed
in MHCCL to remove outliers of clusters at each partition to refine prototypes,
which helps speed up the hierarchical clustering process and improves the
clustering quality. We conduct experimental evaluations on seven widely-used
multivariate time series datasets. The results demonstrate the superiority of
MHCCL over the state-of-the-art approaches for unsupervised time series
representation learning.
Authors' comments: accepted by AAAI 2023
Yassine Kamri, Julien M. Hendrickx, François Glineur
We propose a unifying framework for the automated computer-assisted worst-case analysis of cyclic block coordinate algorithms in the unconstrained smooth convex optimization setup. We compute exact worst-case bounds for the cyclic coordinate descent and the alternating minimization algorithms over the class of smooth convex functions, and provide sublinear upper and lower bounds on the worst-case rate for the standard class of functions with coordinate-wise Lipschitz gradients. We obtain in particular a new upper bound for cyclic coordinate descent that outperforms the best available ones by an order of magnitude. We also demonstrate the flexibility of our approach by providing new numerical bounds using simpler and more natural assumptions than those normally made for the analysis of block coordinate algorithms. Finally, we provide numerical evidence for the fact that a standard scheme that provably accelerates random coordinate descent to an $O(1/k^2)$ complexity is actually inefficient when used in a (deterministic) cyclic algorithm.
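For readers unfamiliar with the algorithm being analyzed, here is a minimal sketch of cyclic coordinate descent. It assumes a convex quadratic objective $f(x) = \frac{1}{2} x^\top A x - b^\top x$ with symmetric positive definite $A$, for which the coordinate-wise Lipschitz constant of coordinate $i$ is $A_{ii}$. The objective and the function name are illustrative assumptions, not the setting or code of the paper.

```python
import numpy as np

def cyclic_coordinate_descent(A, b, x0, n_cycles):
    """Minimize 0.5 * x^T A x - b^T x by sweeping the coordinates in a fixed order."""
    x = np.array(x0, dtype=float)
    for _ in range(n_cycles):
        for i in range(len(x)):
            grad_i = A[i] @ x - b[i]      # partial derivative along coordinate i
            x[i] -= grad_i / A[i, i]      # step size 1 / L_i with L_i = A[i, i]
    return x
```

The coordinates are always visited in the same deterministic order, in contrast to random coordinate descent, which samples the coordinate at each step.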
Yuyuan Liu, Choubo Ding, Yu Tian, Guansong Pang, Vasileios Belagiannis, Ian Reid, Gustavo Carneiro
Semantic segmentation models classify pixels into a set of known
(``in-distribution'') visual classes. When deployed in an open world, the
reliability of these models depends on their ability not only to classify
in-distribution pixels but also to detect out-of-distribution (OoD) pixels.
Historically, the poor OoD detection performance of these models has motivated
the design of methods based on model re-training using synthetic training
images that include OoD visual objects. Although successful, these re-trained
methods have two issues: 1) their in-distribution segmentation accuracy may
drop during re-training, and 2) their OoD detection accuracy does not
generalise well to new contexts (e.g., country surroundings) outside the
training set (e.g., city surroundings). In this paper, we mitigate these issues
with: (i) a new residual pattern learning (RPL) module that assists the
segmentation model to detect OoD pixels without affecting the inlier
segmentation performance; and (ii) a novel context-robust contrastive learning
(CoroCL) that enforces RPL to robustly detect OoD pixels among various
contexts. Our approach improves on the previous state of the art by around
10\% FPR and 7\% AuPRC on the Fishyscapes, Segment-Me-If-You-Can, and RoadAnomaly
datasets. Our code is available at: https://github.com/yyliu01/RPL.
Authors' comments: The paper contains 16 pages and it is accepted by ICCV'23
Liang Li
Gamma-ray bursts (GRBs) exhibit a diversity of spectra. Several spectral
models (e.g., Band, cutoff power-law, and blackbody) and their hybrid versions
(e.g., Band+blackbody) have been widely used to fit the observed GRB spectra.
Here, we attempt to collect all the bursts detected by {\it Fermi}-GBM with
known redshifts from July 2008 to May 2022, motivated to (i) provide a
parameter catalog independent from the official \emph{Fermi}/GBM team and (ii)
achieve a ``clean'' model-based GRB spectral-energy correlation analysis. A
nearly complete GRB sample was created, containing 153 such bursts (136 long
gamma-ray bursts and 17 short gamma-ray bursts). Using the sample and by
performing detailed spectral analysis and model comparisons, we investigate two
GRB spectral-energy correlations: the cosmological rest-frame peak energy
($E_{\rm p,z}$) of the $\nu F_\nu$ prompt emission spectrum correlated with (i)
the isotropic-bolometric-equivalent emission energy $E_{\gamma, \rm iso}$ (the
Amati relation), and (ii) the isotropic-bolometric-equivalent peak luminosity
$L_{\rm p, iso}$ (the Yonetoku relation). From a linear regression analysis, a
tight correlation between $E_{\rm p,z}$ and $E_{\gamma, \rm iso}$ (and
$L_{\rm p, iso}$) is found for both the Band-like and CPL-like bursts. More
interestingly, the CPL-like bursts do not fall on the Band-like burst Amati and
Yonetoku correlations, suggesting distinct radiation processes and indicating
that these spectral-energy correlations depend strongly on
the model-wise properties.
Authors' comments: 35 pages, 7 figures including 24 panels, 7 tables, accepted for
publication in The Astrophysical Journal Supplement Series
Jongho Park, Jinchao Xu, Xiaofeng Xu
In this paper, we propose a novel algorithm called Neuron-wise Parallel
Subspace Correction Method (NPSC) for the finite neuron method that
approximates numerical solutions of partial differential equations (PDEs) using
neural network functions. Despite extensive research on applying neural
networks to numerical PDEs, there is still a serious lack of
effective training algorithms that can achieve adequate accuracy, even for
one-dimensional problems. Based on recent results on the spectral properties of
linear layers and landscape analysis for single neuron problems, we develop a
special type of subspace correction method that optimizes the linear layer and
each neuron in the nonlinear layer separately. An optimal preconditioner that
resolves the ill-conditioning of the linear layer is presented for
one-dimensional problems, so that the linear layer is trained in a uniform
number of iterations with respect to the number of neurons. In each single
neuron problem, a good local minimum that avoids flat energy regions is found
by a superlinearly convergent algorithm. Numerical experiments on function
approximation problems and PDEs demonstrate better performance of the proposed
method than other gradient-based methods.
Authors' comments: 24 pages, 6 figures
Amit Daniely, Elad Granot
We investigate the sample complexity of bounded two-layer neural networks
using different activation functions.
In particular, we consider the class
$$ \mathcal{H} = \left\{\textbf{x}\mapsto \langle \textbf{v}, \sigma \circ
W\textbf{x} + \textbf{b} \rangle : \textbf{b}\in\mathbb{R}^d, W \in
\mathbb{R}^{\mathcal{T}\times d}, \textbf{v} \in
\mathbb{R}^{\mathcal{T}}\right\} $$
where the spectral norm of $W$ and $\textbf{v}$ is bounded by $O(1)$, the
Frobenius norm of $W$ is bounded from its initialization by $R > 0$, and
$\sigma$ is a Lipschitz activation function.
We prove that if $\sigma$ is element-wise, then the sample complexity of
$\mathcal{H}$ has only logarithmic dependency in width and that this complexity
is tight, up to logarithmic factors.
We further show that the element-wise property of $\sigma$ is essential for a
logarithmic dependency bound in width, in the sense that there exist
non-element-wise activation functions whose sample complexity is linear in
width, for widths that can be up to exponential in the input dimension.
For the upper bound, we use the recent approach for norm-based bounds named
Approximate Description Length (ADL) by arXiv:1910.05697.
We further develop new techniques and tools for this approach that will
hopefully inspire future works.
Authors' comments: 13 pages
Kang Fu, Jianwei Hu, Seydou Keita, Hao Liu
The stochastic block model is a popular tool for detecting community structures in network data. Detecting the difference between two community structures is an important issue for stochastic block models. However, the two-sample test remains largely under-explored. In this article, based on the maximum entry-wise deviation of the two centered and rescaled adjacency matrices, we propose a novel test statistic for testing two samples of stochastic block models. We prove that the null distribution of the proposed test statistic converges in distribution to a Gumbel distribution, and we show that a change between the two samples from stochastic block models can be tested via the proposed method. Then, we show that the proposed test has an asymptotic power guarantee against alternative models. One noticeable advantage of the proposed test statistic is that the number of communities can be allowed to grow linearly up to a logarithmic factor. Further, we extend the proposed method to the degree-corrected stochastic block model. Both simulation studies and real-world data examples indicate that the proposed method works well.
Mu Chen, Zhedong Zheng, Yi Yang, Tat-Seng Chua
Unsupervised Domain Adaptation (UDA) aims to enhance the generalization of the learned model to other domains. The domain-invariant knowledge is transferred from the model trained on a labeled source domain, e.g., video games, to unlabeled target domains, e.g., real-world scenarios, saving annotation expenses. Existing UDA methods for semantic segmentation usually focus on minimizing the inter-domain discrepancy at various levels, e.g., pixels, features, and predictions, for extracting domain-invariant knowledge. However, the primary intra-domain knowledge, such as context correlation inside an image, remains underexplored. In an attempt to fill this gap, we propose a unified pixel- and patch-wise self-supervised learning framework, called PiPa, for domain adaptive semantic segmentation that facilitates intra-image pixel-wise correlations and patch-wise semantic consistency against different contexts. The proposed framework exploits the inherent structures of intra-domain images, which: (1) explicitly encourages learning the discriminative pixel-wise features with intra-class compactness and inter-class separability, and (2) motivates the robust feature learning of the identical patch against different contexts or fluctuations. Extensive experiments verify the effectiveness of the proposed method, which obtains competitive accuracy on the two widely-used UDA benchmarks, i.e., 75.6 mIoU on GTA to Cityscapes and 68.2 mIoU on Synthia to Cityscapes. Moreover, our method is compatible with other UDA approaches to further improve the performance without introducing extra parameters.
Stephanie Schoch, Haifeng Xu, Yangfeng Ji
Data valuation, or the valuation of individual datum contributions, has seen
growing interest in machine learning due to its demonstrable efficacy for tasks
such as noisy label detection. In particular, due to the desirable axiomatic
properties, several Shapley value approximation methods have been proposed. In
these methods, the value function is typically defined as the predictive
accuracy over the entire development set. However, this limits the ability to
differentiate between training instances that are helpful or harmful to their
own classes. Intuitively, instances that harm their own classes may be noisy or
mislabeled and should receive a lower valuation than helpful instances. In this
work, we propose CS-Shapley, a Shapley value with a new value function that
discriminates between training instances' in-class and out-of-class
contributions. Our theoretical analysis shows the proposed value function is
(essentially) the unique function that satisfies two desirable properties for
evaluating data values in classification. Further, our experiments on two
benchmark evaluation tasks (data removal and noisy label detection) and four
classifiers demonstrate the effectiveness of CS-Shapley over existing methods.
Lastly, we evaluate the "transferability" of data values estimated from one
classifier to others, and our results suggest Shapley-based data valuation is
transferable for application across different models.
Authors' comments: Accepted to NeurIPS 2022
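As background for the abstract above, the standard Shapley value averages each player's marginal contribution over all orderings of the players. The sketch below computes it exactly for a small set. It is the generic definition only: the CS-Shapley value function is not given in the abstract, so the `value` callable and the toy additive game in the usage note are assumptions for illustration.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a number."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        previous = value(coalition)
        for p in order:
            coalition = coalition | {p}
            current = value(coalition)
            totals[p] += current - previous   # marginal contribution of p
            previous = current
    return {p: total / len(orderings) for p, total in totals.items()}
```

In data valuation, the players would be training instances and `value` would return, for example, the development-set accuracy of a model trained on the given subset. Exact enumeration costs factorial time, which is why approximation methods are used in practice.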
Yuto Watanabe, Kazunori Sakurama
This paper addresses a distributed convex optimization problem with a class
of coupled constraints, which arise in a multi-agent system composed of
multiple communities modeled by cliques. First, we propose a fully distributed
gradient-based algorithm with a novel operator inspired by the convex
projection, called the clique-based projection. Next, we scrutinize the
convergence properties for both diminishing and fixed step sizes. For
diminishing ones, we show the convergence to an optimal solution under the
assumptions of the smoothness of an objective function and the compactness of
the constraint set. Additionally, when the objective function is strongly
monotone, the strict convergence to the unique solution is proved without the
assumption of compactness. For fixed step sizes, we prove the non-ergodic
convergence rate of O(1/k) concerning the objective residual under the
assumption of the smoothness of the objective function. Furthermore, we apply
Nesterov's acceleration method to the proposed algorithm and establish the
convergence rate of O(1/k^2). Numerical experiments illustrate the
effectiveness of the proposed method.
Authors' comments: 8 pages, 5 figures
Savinay Nagendra, Chaopeng Shen, Daniel Kifer
The purpose of binary segmentation models is to determine which pixels belong
to an object of interest (e.g., which pixels in an image are part of roads).
The models assign a logit score (i.e., probability) to each pixel and these are
converted into predictions by thresholding (i.e., each pixel with logit score
$\geq \tau$ is predicted to be part of a road). However, a common phenomenon in
current and former state-of-the-art segmentation models is spatial bias -- in
some patches, the logit scores are consistently biased upwards and in others
they are consistently biased downwards. These biases cause false positives and
false negatives in the final predictions. In this paper, we propose
PatchRefineNet (PRN), a small network that sits on top of a base segmentation
model and learns to correct its patch-specific biases. Across a wide variety of
base models, PRN consistently helps them improve mIoU by 2-3\%. One of the key
ideas behind PRN is the addition of a novel supervision signal during training.
Given the logit scores produced by the base segmentation model, each pixel is
given a pseudo-label that is obtained by optimally thresholding the logit
scores in each image patch. Incorporating these pseudo-labels into the loss
function of PRN helps correct systematic biases and reduce false
positives/negatives. Although we mainly focus on binary segmentation, we also
show how PRN can be extended to saliency detection and few-shot segmentation.
We also discuss how the ideas can be extended to multiclass segmentation.
Authors' comments: 16 pages, 12 figures, 7 tables (Added supplementary material)
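The patch-wise pseudo-labelling idea in the abstract above can be sketched as follows. The abstract does not say which criterion defines the optimal threshold, so this sketch assumes it is the threshold, chosen from a candidate grid, that maximizes IoU against the ground truth within each patch. The function names and the grid are hypothetical.

```python
import numpy as np

def iou(pred, target):
    """Intersection over union of two boolean masks (1.0 if both are empty)."""
    union = np.logical_or(pred, target).sum()
    return 1.0 if union == 0 else np.logical_and(pred, target).sum() / union

def patch_pseudo_labels(scores, target, patch, candidates):
    """Threshold each patch of `scores` at the candidate that best matches `target`."""
    height, width = scores.shape
    pseudo = np.zeros((height, width), dtype=bool)
    for r in range(0, height, patch):
        for c in range(0, width, patch):
            s = scores[r:r + patch, c:c + patch]
            t = target[r:r + patch, c:c + patch]
            # Assumed criterion: the threshold with the highest patch-level IoU.
            best = max(candidates, key=lambda tau: iou(s >= tau, t))
            pseudo[r:r + patch, c:c + patch] = s >= best
    return pseudo
```

When the scores are biased upwards in some patches and downwards in others, a single global threshold misclassifies pixels that per-patch thresholds can still separate.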
Juan Sebastián Salcedo Gallo, Jesús Solano, Javier Hernán García, David Zarruk-Valencia, Alejandro Correa-Bahnsen
In this work, we propose a framework relying solely on chat-based customer
support (CS) interactions for predicting the recommendation decision of
individual users. For our case study, we analyzed a total of 16.4k users
and 48.7k customer support conversations within the financial vertical of a
large e-commerce company in Latin America. Our main contribution
is to use Natural Language Processing (NLP) to assess and
predict recommendation behavior where, in addition to using static
sentiment analysis, we exploit the predictive power of each user's sentiment
dynamics. Our results show that, with interpretable features, it
is possible to predict the likelihood of a user to recommend a product or
service, based solely on the message-wise sentiment evolution of their CS
conversations in a fully automated way.
Authors' comments: 10 pages, 4 figures, 1 table. Already accepted at NeurIPS 2022,
LatinX in AI Workshop
Gao Xueqi, Xu Chao, Song Yihang, Hu Jing, Xiao Jian, Meng Zhaopeng
Road rage is a social problem that deserves attention, but little research has been done so far. In this paper, based on the biological topology of multi-channel EEG signals, we propose a model that combines transferable attention (TA) and a regularized graph neural network (RGNN). First, topology-aware information aggregation is performed on EEG signals, and complex relationships between channels are dynamically learned. Then, the transferability of each channel is quantified based on the results of the node-wise domain classifier, which is used as the attention score. We recruited 10 subjects and collected their EEG signals in pleasure and rage states under simulated driving conditions. We verify the effectiveness of our method on this dataset and compare it with other methods. The results indicate that our method is simple and efficient, with 85.63% accuracy in cross-subject experiments. It can be used to identify road rage. Our data and code are available at https://github.com/1CEc0ffee/dataAndCode.git
Tatsuya Chuman, Hitoshi Kiya
Privacy-preserving deep neural networks (DNNs) have been proposed for
protecting data privacy in the cloud server. Although several encryption
schemes for visual protection have been proposed for privacy-preserving DNNs,
several attacks can restore visual information from encrypted images. On
the other hand, it has been confirmed that the block-wise image encryption
scheme which utilizes block and pixel shuffling is robust against several
attacks. In this paper, we propose a jigsaw puzzle solver-based attack to
restore visual information from images encrypted with block and pixel
shuffling. In experiments, images encrypted by using the block-wise image
encryption are mostly restored by using the proposed attack.
Authors' comments: To appear in IWAIT2023
Sofoklis Kakouros, Themos Stafylakis, Ladislav Mosner, Lukas Burget
When recognizing emotions from speech, we encounter two common problems: how
to optimally capture emotion-relevant information from the speech signal and
how to best quantify or categorize the noisy subjective emotion labels.
Self-supervised pre-trained representations can robustly capture information
from speech enabling state-of-the-art results in many downstream tasks
including emotion recognition. However, better ways of aggregating the
information across time need to be considered as the relevant emotion
information is likely to appear piecewise and not uniformly across the signal.
For the labels, we need to take into account that there is a substantial degree
of noise that comes from the subjective human annotations. In this paper, we
propose a novel approach to attentive pooling based on correlations between the
representations' coefficients combined with label smoothing, a method aiming to
reduce the confidence of the classifier on the training labels. We evaluate our
proposed approach on the benchmark dataset IEMOCAP, and demonstrate high
performance surpassing that in the literature. The code to reproduce the
results is available at github.com/skakouros/s3prl_attentive_correlation.
Authors' comments: Submitted to IEEE-ICASSP 2023
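Label smoothing, mentioned in the abstract above as a way to reduce classifier confidence on noisy training labels, is commonly implemented by mixing the one-hot target with a uniform distribution. The sketch below shows that common formulation. The function name and the smoothing weight `alpha` are assumptions, and the paper may use a different variant.

```python
import numpy as np

def smooth_labels(labels, num_classes, alpha):
    """Return soft targets: (1 - alpha) * one_hot + alpha / num_classes."""
    one_hot = np.eye(num_classes)[labels]          # shape (n_samples, num_classes)
    return (1.0 - alpha) * one_hot + alpha / num_classes
```

Each row still sums to one, but the labelled class receives less than full probability mass, so the cross-entropy loss no longer rewards fully confident predictions.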
Karthik Natarajan, Arjun Kodagehalli Ramachandra, Colin Tan
A collection of $n$ random events is said to be $(n - 1)$-wise independent if
any $n - 1$ events among them are mutually independent. We characterise all
probability measures with respect to which $n$ random events are $(n - 1)$-wise
independent. We provide sharp upper and lower bounds on the probability that at
least $k$ out of $n$ events with given marginal probabilities occur over these
probability measures. The bounds are shown to be computable in polynomial time.
Authors' comments: 18 pages, 2 tables
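A classical instance of the definition above, for $n = 3$, is given by two fair coin flips and the event that they differ: any two of the three events are independent, but the three together are not. The sketch below checks this by exact enumeration. The event names and helper functions are chosen for this illustration and do not come from the paper.

```python
from fractions import Fraction
from itertools import combinations, product

outcomes = list(product([0, 1], repeat=2))       # two fair coin flips
prob = Fraction(1, len(outcomes))                # uniform measure
events = {
    "A": {w for w in outcomes if w[0] == 1},     # first flip is heads
    "B": {w for w in outcomes if w[1] == 1},     # second flip is heads
    "C": {w for w in outcomes if w[0] ^ w[1]},   # the two flips differ
}

def p(event):
    return prob * len(event)

def is_mutually_independent(names):
    """Check the product rule for every sub-collection of size >= 2."""
    for k in range(2, len(names) + 1):
        for subset in combinations(names, k):
            joint = set(outcomes)
            expected = Fraction(1)
            for name in subset:
                joint &= events[name]
                expected *= p(events[name])
            if p(joint) != expected:
                return False
    return True

def is_k_wise_independent(names, k):
    """Every choice of k events must be mutually independent."""
    return all(is_mutually_independent(sub) for sub in combinations(names, k))
```

Here `is_k_wise_independent(["A", "B", "C"], 2)` holds while `is_mutually_independent(["A", "B", "C"])` fails, because the three events can never occur simultaneously.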
Yongzhi Su, Yan Di, Fabian Manhardt, Guangyao Zhai, Jason Rambach, Benjamin Busam, Didier Stricker, Federico Tombari
Despite monocular 3D object detection having recently made a significant leap forward thanks to the use of pre-trained depth estimators for pseudo-LiDAR recovery, such two-stage methods typically suffer from overfitting and are incapable of explicitly encapsulating the geometric relation between depth and object bounding box. To overcome this limitation, we instead propose OPA-3D, a single-stage, end-to-end, Occlusion-Aware Pixel-Wise Aggregation network that jointly estimates dense scene depth with depth-bounding box residuals and object bounding boxes, allowing a two-stream detection of 3D objects and leading to significantly more robust detections. The first stream, denoted the Geometry Stream, combines visible depth and depth-bounding box residuals to recover the object bounding box via explicit occlusion-aware optimization. In addition, a bounding box based geometry projection scheme is employed in an effort to enhance distance perception. The second stream, named the Context Stream, directly regresses 3D object location and size. This novel two-stream representation further enables us to enforce cross-stream consistency terms which align the outputs of both streams, improving the overall performance. Extensive experiments on the public benchmark demonstrate that OPA-3D outperforms state-of-the-art methods on the main Car category, whilst keeping a real-time inference speed. We plan to release all code and trained models soon.
Zihao Tang, Xinyi Wang, Lihaowen Zhu, Mariano Cabezas, Dongnan Liu, Michael Barnett, Weidong Cai, Chengyu Wang
Diffusion Weighted Imaging (DWI) is an advanced imaging technique commonly
used in neuroscience and neurological clinical research through a Diffusion
Tensor Imaging (DTI) model. Volumetric scalar metrics including fractional
anisotropy, mean diffusivity, and axial diffusivity can be derived from the DTI
model to summarise water diffusivity and other quantitative microstructural
information for clinical studies. However, clinical practice constraints can
lead to sub-optimal DWI acquisitions with missing slices (either due to a
limited field of view or the acquisition of disrupted slices). To avoid
discarding valuable subjects for group-wise studies, we propose a novel 3D
Tensor-Wise Brain-Aware Gate network (TW-BAG) for inpainting disrupted DTIs.
The proposed method is tailored to the problem with a dynamic gate mechanism
and independent tensor-wise decoders. We evaluated the proposed method on the
publicly available Human Connectome Project (HCP) dataset using common image
similarity metrics derived from the predicted tensors and scalar DTI metrics.
Our experimental results show that the proposed approach can reconstruct the
original brain DTI volume and recover relevant clinical imaging information.
Authors' comments: Accepted by The 2022 International Conference on Digital Image
Computing: Techniques and Applications (DICTA 2022)
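The scalar metrics named in the abstract above have standard definitions in terms of the eigenvalues $\lambda_1 \ge \lambda_2 \ge \lambda_3$ of the diffusion tensor: mean diffusivity is their average, axial diffusivity is $\lambda_1$, and fractional anisotropy is a normalized measure of their spread. The sketch below computes them for a single $3 \times 3$ tensor. The function name is an assumption, and this is not the evaluation code of the paper.

```python
import numpy as np

def dti_scalar_metrics(tensor):
    """Return (FA, MD, AD) for one symmetric 3x3 diffusion tensor."""
    eigvals = np.linalg.eigvalsh(tensor)[::-1]     # sorted in descending order
    md = eigvals.mean()                            # mean diffusivity
    ad = eigvals[0]                                # axial diffusivity (largest)
    norm = np.sqrt((eigvals ** 2).sum())
    # FA = sqrt(3/2) * ||lambda - MD|| / ||lambda||, defined as 0 for a zero tensor
    fa = 0.0 if norm == 0 else np.sqrt(1.5 * ((eigvals - md) ** 2).sum()) / norm
    return fa, md, ad
```

FA ranges from 0 for isotropic diffusion to 1 for diffusion along a single direction, which makes it a convenient check on whether an inpainted tensor is plausible.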