Jingquan Luo, Qisheng Wang, Lvzhou Li
We explore potential quantum speedups for the fundamental problem of testing
the properties of closeness and $k$-wise uniformity of probability
distributions.
\textit{Closeness testing} is the problem of distinguishing whether two
$n$-dimensional distributions are identical or at least $\varepsilon$-far in
$\ell^1$- or $\ell^2$-distance. We show that the quantum query complexities for
$\ell^1$- and $\ell^2$-closeness testing are $O\rbra{\sqrt{n}/\varepsilon}$ and
$O\rbra{1/\varepsilon}$, respectively, both of which achieve optimal dependence
on $\varepsilon$, improving the prior best results of
\hyperlink{cite.gilyen2019distributional}{Gily{\'e}n and Li~(2019)}.
\textit{$k$-wise uniformity testing} is the problem of distinguishing whether
a distribution over $\cbra{0, 1}^n$ is uniform when restricted to any $k$
coordinates or $\varepsilon$-far from any such distribution. We propose the
first quantum algorithm for this problem with query complexity
$O\rbra{\sqrt{n^k}/\varepsilon}$, achieving a quadratic speedup over the
state-of-the-art classical algorithm with sample complexity
$O\rbra{n^k/\varepsilon^2}$ by \hyperlink{cite.o2018closeness}{O'Donnell and
Zhao (2018)}. Moreover, when $k = 2$ our quantum algorithm outperforms any
classical one because of the classical lower bound
$\Omega\rbra{n/\varepsilon^2}$.
All our quantum algorithms are fairly simple and time-efficient, using only
basic quantum subroutines such as amplitude estimation.
Authors' comments: We have added the proof of lower bounds and have polished the
language
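The two properties defined in the abstract above can be made concrete with a small classical sketch. It only illustrates the definitions ($\ell^1$/$\ell^2$ distance and $k$-wise uniformity) by brute force on explicitly given distributions; it is not the quantum algorithm, and the function names and the dictionary representation are assumptions made for this example.

```python
from itertools import combinations, product
from math import sqrt


def l1_distance(p, q):
    """l1 distance between two distributions given as equal-length lists."""
    return sum(abs(a - b) for a, b in zip(p, q))


def l2_distance(p, q):
    """l2 distance between two distributions given as equal-length lists."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def is_k_wise_uniform(dist, n, k, tol=1e-12):
    """Check that the marginal on every set of k coordinates is uniform.

    dist maps bit tuples of length n to probabilities (missing keys mean 0).
    """
    for coords in combinations(range(n), k):
        marginal = {}
        for x, prob in dist.items():
            key = tuple(x[i] for i in coords)
            marginal[key] = marginal.get(key, 0.0) + prob
        for key in product((0, 1), repeat=k):
            if abs(marginal.get(key, 0.0) - 2 ** -k) > tol:
                return False
    return True


# Hypothetical example: uniform over even-parity strings of length 3.
even_parity = {x: 0.25 for x in product((0, 1), repeat=3) if sum(x) % 2 == 0}
```

In this toy example, `even_parity` is 2-wise uniform but not 3-wise uniform, since every pair of coordinates looks uniform while odd-parity strings have probability zero.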
Remi Luschei, Werner Brannath
The population-wise error rate (PWER) is a type I error rate for clinical
trials with multiple target populations. In such trials, one treatment is
tested for its efficacy in each population. The PWER is defined as the
probability that a randomly selected, future patient will be exposed to an
inefficient treatment based on the study results. The PWER can be understood
and computed as an average of strata-specific family-wise error rates and
involves the prevalences of these strata. A major issue of this concept is that
the population prevalences needed to determine this average are usually not
known in practice, so that the PWER cannot be directly controlled. Instead, one
could use an estimator of the prevalences based on the given sample, like their
maximum-likelihood estimator. In this paper we show in simulations that this
does not substantially inflate the true PWER. We differentiate between the
expected PWER, which is almost perfectly controlled, and study-specific values
of the PWER, which are conditioned on the given sample sizes and vary within a
narrow range. Thereby, we consider up to eight different overlapping patient
populations and moderate to large sample sizes.
Authors' comments: 10 pages, 5 figures
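The description of the PWER as a prevalence-weighted average of strata-specific family-wise error rates can be sketched in a few lines. This is a minimal illustration assuming disjoint strata whose prevalences sum to one; the function names and all numbers are hypothetical, and the prevalence estimator shown is the sample proportion, which is the maximum-likelihood estimator under multinomial sampling.

```python
def pwer(prevalences, strata_fwer):
    """Prevalence-weighted average of strata-specific family-wise error rates."""
    if len(prevalences) != len(strata_fwer):
        raise ValueError("need one FWER per stratum")
    if abs(sum(prevalences) - 1.0) > 1e-9:
        raise ValueError("prevalences must sum to 1")
    return sum(p * f for p, f in zip(prevalences, strata_fwer))


def estimate_prevalences(stratum_counts):
    """Maximum-likelihood estimate: observed stratum proportions in the sample."""
    total = sum(stratum_counts)
    return [count / total for count in stratum_counts]


# Hypothetical study: three disjoint strata with made-up counts and FWERs.
counts = [50, 30, 20]
fwer_per_stratum = [0.02, 0.03, 0.05]
estimated_pwer = pwer(estimate_prevalences(counts), fwer_per_stratum)
```

With these made-up inputs the estimated PWER is approximately 0.029, that is, 0.5 x 0.02 + 0.3 x 0.03 + 0.2 x 0.05.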
Kyu Beom Han, Olivia G. Odenthal, Woo Jae Kim, Sung-Eui Yoon
Auxiliary features such as geometric buffers (G-buffers) and path descriptors
(P-buffers) have been shown to significantly improve Monte Carlo (MC)
denoising. However, recent approaches implicitly learn to exploit auxiliary
features for denoising, which could lead to insufficient utilization of each
type of auxiliary features. To overcome such an issue, we propose a denoising
framework that relies on an explicit pixel-wise guidance for utilizing
auxiliary features. First, we train two denoisers, each with a different
auxiliary feature (i.e., G-buffers or P-buffers). Then we design our ensembling
network to obtain per-pixel ensembling weight maps, which represent pixel-wise
guidance for which auxiliary feature should dominate in reconstructing each
individual pixel, and use them to ensemble the two denoised results of our
denoisers. We also propagate our pixel-wise guidance to the denoisers by
jointly training the denoisers and the ensembling network, further guiding the
denoisers to focus on regions where G-buffers or P-buffers are relatively
important for denoising. Our results show considerable improvement in
denoising performance compared to the baseline denoising model using both
G-buffers and P-buffers.
Authors' comments: 19 pages
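The per-pixel ensembling step described above can be pictured as a convex blend of the two denoised images. The sketch below assumes a single weight map with values in [0, 1] for the G-buffer denoiser and the complementary weight for the P-buffer denoiser; this normalization, the function name, and the nested-list image representation are assumptions of the example, not details taken from the paper.

```python
def ensemble_pixelwise(denoised_g, denoised_p, weight_g):
    """Blend two denoised images with a per-pixel weight map in [0, 1].

    denoised_g, denoised_p: images as lists of rows of floats.
    weight_g: per-pixel weight of the G-buffer result; the P-buffer
    result receives 1 - weight_g (an assumption of this sketch).
    """
    return [
        [w * g + (1.0 - w) * p for g, p, w in zip(row_g, row_p, row_w)]
        for row_g, row_p, row_w in zip(denoised_g, denoised_p, weight_g)
    ]


# Hypothetical 1x2 image: the first pixel trusts the G-buffer denoiser fully,
# the second pixel averages both results.
blended = ensemble_pixelwise([[1.0, 2.0]], [[3.0, 4.0]], [[1.0, 0.5]])
```

Here `blended` is `[[1.0, 3.0]]`. In a learned setting the weight map would come from the ensembling network rather than being fixed by hand.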
Pu Li, Marie Roch, Holger Klinck, Erica Fleishman, Douglas Gillespie, Eva-Marie Nosal, Yu Shiu, Xiaobai Liu
Whistle contour extraction aims to derive animal whistles from time-frequency
spectrograms as polylines. For toothed whales, whistle extraction results can
serve as the basis for analyzing animal abundance, species identity, and social
activities. During the last few decades, as long-term recording systems have
become affordable, automated whistle extraction algorithms were proposed to
process large volumes of recording data. Recently, a deep learning-based method
demonstrated superior performance in extracting whistles under varying noise
conditions. However, training such networks requires a large amount of
labor-intensive annotation, which is not available for many species. To
overcome this limitation, we present a framework of stage-wise generative
adversarial networks (GANs), which compile new whistle data suitable for deep
model training via three stages: generation of background noise in the
spectrogram, generation of whistle contours, and generation of whistle signals.
By separating the generation of different components in the samples, our
framework composes visually promising whistle data and labels even when little
expert-annotated data is available. Regardless of the amount of
human-annotated data, the proposed data augmentation framework leads to a
consistent improvement in performance of the whistle extraction model, with a
maximum increase of 1.69 in the whistle extraction mean F1-score. Our
stage-wise GAN also surpasses one single GAN in improving whistle extraction
models with augmented data. The data and code will be available at
https://github.com/Paul-LiPu/CompositeGAN\_WhistleAugment.
Authors' comments: Accepted by IEEE Transactions on Multimedia (2023)
Per Calissendorff, Matthew De Furio, Michael Meyer, Loïc Albert, Christian Aganze, Mohamad Ali-Dib, Daniella C. Bardalez Gagliuffi, Frederique Baron et al.
We report the discovery of the first brown dwarf binary system with a Y dwarf
primary, WISE J033605.05$-$014350.4, observed with NIRCam on JWST with the
F150W and F480M filters. We employed an empirical point spread function binary
model to identify the companion, located at a projected separation of 84
milliarcseconds, position angle of 295 degrees, and with contrast of 2.8 and
1.8 magnitudes in F150W and F480M, respectively. At a distance of 10$\,$pc
based on its Spitzer parallax, and assuming a random inclination distribution,
the physical separation is approximately 1$\,$au. Evolutionary models predict
that, for an age of 1-5 Gyr, the companion mass is about 4-12.5 Jupiter masses
around the 7.5-20 Jupiter mass primary, corresponding to a companion-to-host
mass fraction of $q=0.61\pm0.05$. Under the assumption of a Keplerian orbit, the
period for this extreme binary is in the range of 5-9 years. The system joins a
small but growing sample of ultracool dwarf binaries with effective
temperatures of a few hundred kelvin. Brown dwarf binaries lie at the nexus
of importance for understanding the formation mechanisms of these elusive
objects, as they allow us to investigate whether the companions formed as stars
or as planets in a disk around the primary.
Authors' comments: 8 pages, 3 figures, 1 table. Accepted for publication in
Astrophysical Journal Letters
Midia Reshadi, David Gregg
Sparse tensor computing is a core computational part of numerous applications in areas such as data science, graph processing, and scientific computing. Sparse tensors offer the potential of skipping unnecessary computations caused by zero values. In this paper, we propose a new strategy for extending row-wise product sparse tensor accelerators. We propose a new processing element called Maple that uses multiple multiply-accumulate (MAC) units to exploit local clusters of non-zero values to increase parallelism and reduce data movement. Maple works on the compressed sparse row (CSR) format and calculates only non-zero elements of the input matrices based on the sparsity pattern. Furthermore, we may employ Maple as a basic building block in a variety of spatial tensor accelerators that operate based on a row-wise product approach. As a proof of concept, we utilize Maple in two reference accelerators: Extensor and Matraptor. Our experiments show that using Maple in Matraptor and Extensor achieves 50% and 60% energy benefit and 15% and 22% speedup over the baseline designs, respectively. Employing Maple also results in 5.9x and 15.5x smaller area consumption in Matraptor and Extensor compared with the baseline structures, respectively.
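The row-wise product dataflow on CSR inputs that the abstract above builds on can be sketched in software as follows. This is a plain reference version of the row-wise (Gustavson-style) sparse matrix product, assuming each matrix is given as a hypothetical `(indptr, indices, data)` tuple; it does not model the Maple processing element, its MAC units, or any accelerator detail.

```python
def spgemm_rowwise(a, b):
    """Multiply two CSR matrices with the row-wise product dataflow.

    Each matrix is a tuple (indptr, indices, data). Row i of the result is
    the sum of rows k of b scaled by a[i, k], visiting only stored non-zeros.
    """
    a_ptr, a_idx, a_val = a
    b_ptr, b_idx, b_val = b
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}  # sparse accumulator for output row i
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            # Scale row k of b by a[i, k] and merge it into the accumulator.
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[q]
        for j in sorted(acc):
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val


# A = [[1, 0], [0, 2]] and B = [[0, 3], [4, 0]] in CSR form.
A = ([0, 1, 2], [0, 1], [1.0, 2.0])
B = ([0, 1, 2], [1, 0], [3.0, 4.0])
C = spgemm_rowwise(A, B)
```

For these inputs `C` encodes `[[0, 3], [8, 0]]`. The inner loop over row `k` of `b` is the part where nearby non-zeros could be processed in parallel, which is the kind of locality the abstract refers to.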
Ziwei Liu, Yongtao Wang, Xiaojie Chu
Knowledge distillation is a popular technique for transferring the knowledge
from a large teacher model to a smaller student model by mimicking. However,
distillation by directly aligning the feature maps between teacher and student
may impose overly strict constraints on the student and thus degrade the
performance of the student model. To alleviate the above feature misalignment
issue, existing works mainly focus on spatially aligning the feature maps of
the teacher and the student with pixel-wise transformations. In this paper, we
find that aligning the feature maps between teacher and student along the
channel-wise dimension is also effective for addressing the feature
misalignment issue. Specifically, we propose a learnable nonlinear channel-wise
transformation to align the features of the student and the teacher model.
Based on it, we further propose a simple and generic framework for feature
distillation, with only one hyper-parameter to balance the distillation loss
and the task specific loss. Extensive experimental results show that our method
achieves significant performance improvements in various computer vision tasks
including image classification (+3.28% top-1 accuracy for MobileNetV1 on
ImageNet-1K), object detection (+3.9% bbox mAP for ResNet50-based Faster-RCNN
on MS COCO), instance segmentation (+2.8% Mask mAP for ResNet50-based
Mask-RCNN), and semantic segmentation (+4.66% mIoU for ResNet18-based PSPNet on
Cityscapes), which demonstrates the effectiveness and
the versatility of the proposed method. The code will be made publicly
available.
Authors' comments: 13 pages
Shenghai Liao, Xuya Liu, Ruyi Han, Shujun Fu, Yuanfeng Zhou, Yuliang Li
Digital image inpainting is an interpolation problem, inferring the content
in the missing (unknown) region to agree with the known region data such that
the interpolated result fulfills some prior knowledge. Low-rank and nonlocal
self-similarity are two important priors for image inpainting. Based on the
nonlocal self-similarity assumption, an image is divided into overlapped square
target patches (submatrices) and the similar patches of any target patch are
reshaped as vectors and stacked into a patch matrix. Such a patch matrix
usually enjoys a property of low rank or approximately low rank, and its
missing entries are recovered by low-rank matrix approximation (LRMA)
algorithms. Traditionally, $n$ nearest neighbor similar patches are searched
within a local window centered at a target patch. However, for an image with
missing lines, the generated patch matrix is prone to having entirely-missing
rows such that the downstream low-rank model fails to reconstruct it well. To
address this problem, we propose a region-wise matching (RwM) algorithm that
divides the neighborhood of a target patch into multiple subregions and then
searches for the most similar patch within each subregion. A non-convex weighted
low-rank decomposition (NC-WLRD) model for LRMA is also proposed to reconstruct
all degraded patch matrices grouped by the proposed RwM algorithm. We solve the
proposed NC-WLRD model by the alternating direction method of multipliers
(ADMM) and analyze the convergence in detail. Numerous experiments on line
inpainting (entire-row/column missing) demonstrate the superiority of our
method over other competitive inpainting algorithms. Unlike other
low-rank-based matrix completion methods and inpainting algorithms, the
proposed model NC-WLRD is also effective for removing random-valued impulse
noise and structural noise (stripes).
Authors' comments: region-wise matching algorithm, image inpainting, 20 pages, 18
figures
Elliot Vincent, Jean Ponce, Mathieu Aubry
Improvements in Earth observation by satellites allow for imagery of ever
higher temporal and spatial resolution. Leveraging this data for agricultural
monitoring is key for addressing environmental and economic challenges. Current
methods for crop segmentation using temporal data either rely on annotated data
or are heavily engineered to compensate for the lack of supervision. In this paper,
we present and compare datasets and methods for both supervised and
unsupervised pixel-wise segmentation of satellite image time series (SITS). We
also introduce an approach to add invariance to spectral deformations and
temporal shifts to classical prototype-based methods such as K-means and
Nearest Centroid Classifier (NCC). We study different levels of supervision and
show this simple and highly interpretable method achieves the best performance
in the low data regime and significantly improves the state of the art for
unsupervised classification of agricultural time series on four recent SITS
datasets.
Authors' comments: Revised version. Added references and baselines. Corrected typos.
Added discussion section and Appendix A, B and C
Bencheng Liao, Shaoyu Chen, Bo Jiang, Tianheng Cheng, Qian Zhang, Wenyu Liu, Chang Huang, Xinggang Wang
Online lane graph construction is a promising but challenging task in
autonomous driving. Previous methods usually model the lane graph at the pixel
or piece level, and recover the lane graph by pixel-wise or piece-wise
connection, which breaks down the continuity of the lane and results in
suboptimal performance. Human drivers focus on and drive along the continuous
and complete paths instead of considering lane pieces. Autonomous vehicles also
require path-specific guidance from the lane graph for trajectory planning. We
argue that the path, which indicates the traffic flow, is the primitive of the
lane graph. Motivated by this, we propose to model the lane graph in a novel
path-wise manner, which well preserves the continuity of the lane and encodes
traffic information for planning. We present a path-based online lane graph
construction method, termed LaneGAP, which end-to-end learns the path and
recovers the lane graph via a Path2Graph algorithm. We qualitatively and
quantitatively demonstrate the superior accuracy and efficiency of LaneGAP over
conventional pixel-based and piece-based methods on the challenging nuScenes
and Argoverse2 datasets under controllable and fair conditions. Compared to the
recent state-of-the-art piece-wise method TopoNet on the OpenLane-V2 dataset,
LaneGAP still outperforms by 1.6 mIoU, further validating the effectiveness of
path-wise modeling. Abundant visualizations in the supplementary material show
LaneGAP can cope with diverse traffic conditions. Code is released at
\url{https://github.com/hustvl/LaneGAP}.
Authors' comments: Accepted to ECCV 2024
Kira Maag, Tobias Riedlinger
In recent years, deep neural networks have defined the state-of-the-art in semantic segmentation where their predictions are constrained to a predefined set of semantic classes. They are to be deployed in applications such as automated driving, although their categorically confined expressive power runs contrary to such open world scenarios. Thus, the detection and segmentation of objects from outside their predefined semantic space, i.e., out-of-distribution (OoD) objects, is of highest interest. Since uncertainty estimation methods like softmax entropy or Bayesian models are sensitive to erroneous predictions, these methods are a natural baseline for OoD detection. Here, we present a method for obtaining uncertainty scores from pixel-wise loss gradients which can be computed efficiently during inference. Our approach is simple to implement for a large class of models, does not require any additional training or auxiliary data and can be readily used on pre-trained segmentation models. Our experiments show the ability of our method to identify wrong pixel classifications and to estimate prediction quality at negligible computational overhead. In particular, we observe superior performance in terms of OoD segmentation to comparable baselines on the SegmentMeIfYouCan benchmark, clearly outperforming other methods.
Zheqi Zhu, Yuchen Shi, Jiajun Luo, Fei Wang, Chenghui Peng, Pingyi Fan, Khaled B. Letaief
Federated learning (FL) has prevailed as an efficient and privacy-preserving scheme for distributed learning. In this work, we mainly focus on the optimization of computation and communication in FL from the perspective of pruning. By adopting layer-wise pruning in local training and federated updating, we formulate an explicit FL pruning framework, FedLP (Federated Layer-wise Pruning), which is model-agnostic and universal for different types of deep learning models. Two specific schemes of FedLP are designed for scenarios with homogeneous local models and heterogeneous ones. Both theoretical and experimental evaluations are developed to verify that FedLP relieves the system bottlenecks of communication and computation with marginal performance decay. To the best of our knowledge, FedLP is the first framework that formally introduces layer-wise pruning into FL. Within the scope of federated learning, more variants and combinations can be further designed based on FedLP.
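One way to picture layer-wise pruning in federated updating is a server that averages each layer only over the clients that actually uploaded it. The sketch below is an illustration under that assumption and is not claimed to be the FedLP scheme itself; the function name, the dictionary-of-lists model format, and the rule that untouched layers keep their previous global weights are all hypothetical choices.

```python
def aggregate_layerwise(client_updates, global_model):
    """Average each layer over the clients that uploaded it.

    client_updates: list of dicts mapping layer name -> list of floats;
    a layer pruned by a client is simply absent from that client's dict.
    Layers that no client uploaded keep the previous global weights
    (an assumption of this sketch).
    """
    new_model = {}
    for name, old_weights in global_model.items():
        uploads = [update[name] for update in client_updates if name in update]
        if not uploads:
            new_model[name] = list(old_weights)
            continue
        new_model[name] = [sum(vals) / len(uploads) for vals in zip(*uploads)]
    return new_model


# Hypothetical round: client 0 pruned "fc", nobody uploaded "head".
global_model = {"conv": [0.0, 0.0], "fc": [1.0], "head": [7.0]}
updates = [
    {"conv": [1.0, 3.0]},
    {"conv": [3.0, 5.0], "fc": [2.0]},
]
new_global = aggregate_layerwise(updates, global_model)
```

In this toy round, `conv` is averaged over both clients, `fc` comes from the single client that kept it, and `head` is left unchanged.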
ZongTan Li
Multi-Object Tracking (MOT) has gained extensive attention in recent years
due to its potential applications in traffic and pedestrian detection. We note
that tracking by detection may suffer from errors generated by noisy detectors,
such as an imprecise bounding box before an occlusion, and observe that in
most tracking scenarios, objects tend to move and be lost within specific
locations. To counter this, we present a novel tracker that copes with poor
detections and occlusions. First, we propose a location-wise sub-region
recognition method that divides the frame equally into sub-regions, which we
call a mesh. Then we propose corresponding location-wise loss management
strategies and different matching strategies. Ablation studies demonstrate the
effectiveness of the resulting tracker, Mesh-SORT, which reduces fragmentation
by 3% and ID switches by 7.2% and improves MOTA by 0.4% compared to the baseline
on the MOT17 dataset. Finally, we analyze its limitations in specific scenes and
discuss possible directions for future work.
Authors' comments: 14 pages 18 figs
Kai Zhai, Qiang Nie, Bo Ouyang, Xiang Li, Shanlin Yang
2D-to-3D human pose lifting is fundamental for 3D human pose estimation
(HPE), for which graph convolutional networks (GCNs) have proven inherently
suitable for modeling the human skeletal topology. However, the current
GCN-based 3D HPE methods update the node features by aggregating their
neighbors' information without considering the interaction of joints in
different joint synergies. Although some studies have proposed importing limb
information to learn the movement patterns, the latent synergies among joints,
such as maintaining balance, are seldom investigated. We propose the Hop-wise
GraphFormer with Intragroup Joint Refinement (HopFIR) architecture to tackle
the 3D HPE problem. HopFIR mainly consists of a novel hop-wise GraphFormer
(HGF) module and an intragroup joint refinement (IJR) module. The HGF module
groups the joints by k-hop neighbors and applies a hop-wise transformer-like
attention mechanism to these groups to discover latent joint synergies. The IJR
module leverages the prior limb information for peripheral joint refinement.
Extensive experimental results show that HopFIR outperforms the SOTA methods by
a large margin, with a mean per-joint position error (MPJPE) on the Human3.6M
dataset of 32.67 mm. We also demonstrate that the state-of-the-art GCN-based
methods can benefit from the proposed hop-wise attention mechanism with a
significant improvement in performance: SemGCN and MGCN are improved by 8.9%
and 4.5%, respectively.
Authors' comments: Accepted by ICCV 2023
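Grouping joints by k-hop neighbors, as described for the HGF module above, amounts to bucketing skeleton nodes by shortest-path distance from a chosen joint. The following sketch shows only that grouping step with a breadth-first search; the toy skeleton, the function name, and the choice of source joint are assumptions for illustration, and the attention mechanism applied to the groups is not modeled.

```python
from collections import deque


def group_by_hops(adjacency, source, max_hops):
    """Group nodes by shortest-path hop distance from source using BFS.

    Returns a list where entry h holds the sorted nodes exactly h hops away,
    for h = 0 .. max_hops.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # do not expand beyond the requested hop count
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    groups = [[] for _ in range(max_hops + 1)]
    for node, d in dist.items():
        groups[d].append(node)
    return [sorted(group) for group in groups]


# Hypothetical toy skeleton: chain 0-1-2-3 with an extra branch 1-4.
skeleton = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
hop_groups = group_by_hops(skeleton, source=0, max_hops=2)
```

For this toy skeleton, `hop_groups` is `[[0], [1], [2, 4]]`; joint 3 lies three hops away and is therefore left out.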
Shenwei Xie, Wanfeng Zheng, Zhenglin Xian, Junli Yang, Chuang Zhang, Ming Wu
Automatically extracting roads from satellite imagery is a fundamental yet
challenging computer vision task in the field of remote sensing. Pixel-wise
semantic segmentation-based approaches and graph-based approaches are two
prevailing schemes. However, prior work shows that semantic
segmentation-based approaches yield road graphs with low connectivity, while
graph-based methods with iterative exploration paradigms and smaller receptive
fields focus more on local information and are also time-consuming. In this
paper, we propose a new scheme for multi-task satellite imagery road
extraction, Patch-wise Road Keypoints Detection (PaRK-Detect). Building on top
of D-LinkNet architecture and adopting the structure of keypoint detection, our
framework predicts the position of patch-wise road keypoints and the adjacent
relationships between them to construct road graphs in a single pass.
Meanwhile, the multi-task framework also performs pixel-wise semantic
segmentation and generates road segmentation masks. We evaluate our approach
against the existing state-of-the-art methods on DeepGlobe, Massachusetts
Roads, and RoadTracer datasets and achieve competitive or better results. We
also demonstrate a considerable advantage in inference speed.
Authors' comments: Accepted at BMVC 2022 (Oral). 13 pages, 5 figures.
https://bmvc2022.mpi-inf.mpg.de/381/
Shancong Mou, Xiaoyi Gu, Meng Cao, Haoping Bai, Ping Huang, Jiulong Shan, Jianjun Shi
Generative adversarial networks (GANs), trained on a large-scale image dataset, can be a good approximator of the natural image manifold. GAN-inversion, using a pre-trained generator as a deep generative prior, is a promising tool for image restoration under corruptions. However, the performance of GAN-inversion can be limited by a lack of robustness to unknown gross corruptions, i.e., the restored image might easily deviate from the ground truth. In this paper, we propose a Robust GAN-inversion (RGI) method with a provable robustness guarantee to achieve image restoration under unknown \textit{gross} corruptions, where a small fraction of pixels are completely corrupted. Under mild assumptions, we show that the restored image and the identified corrupted region mask converge asymptotically to the ground truth. Moreover, we extend RGI to Relaxed-RGI (R-RGI) for generator fine-tuning to mitigate the gap between the GAN-learned manifold and the true image manifold while avoiding trivial overfitting to the corrupted input image, which further improves the image restoration and corrupted region mask identification performance. The proposed RGI/R-RGI method unifies two important applications with state-of-the-art (SOTA) performance: (i) mask-free semantic inpainting, where the corruptions are unknown missing regions and the restored background can be used to restore the missing content; (ii) unsupervised pixel-wise anomaly detection, where the corruptions are unknown anomalous regions and the retrieved mask can be used as the anomalous region's segmentation mask.
Matthew De Furio, Ben W. Lew, Charles A. Beichman, Thomas Roellig, Geoffrey Bryden, David R. Ciardi, Michael R. Meyer, Marcia J. Rieke et al.
The Y-dwarf WISE 1828+2650 is one of the coldest known brown dwarfs with an
effective temperature of $\sim$300 K. Located at a distance of just 10 pc,
previous model-based estimates suggest WISE 1828+2650 has a mass of $\sim$5-10
Mj, making it a valuable laboratory for understanding the formation, evolution
and physical characteristics of gas giant planets. However, previous photometry
and spectroscopy have presented a puzzle with the near-impossibility of
simultaneously fitting both the short (0.9-2.0 microns) and long wavelength
(3-5 microns) data. A potential solution to this problem has been the
suggestion that WISE 1828+2650 is a binary system whose composite spectrum
might provide a better match to the data. Alternatively, new models being
developed to fit JWST/NIRSpec and MIRI spectroscopy might provide new insights.
This article describes JWST/NIRCam observations of WISE 1828+2650 in 6 filters
to address the binarity question and to provide new photometry to be used in
model fitting. We also report Adaptive Optics imaging with the Keck 10 m
telescope. We find no evidence for multiplicity for a companion beyond 0.5 AU
with either JWST or Keck. Companion articles will present low and high
resolution spectra of WISE 1828+2650 obtained with both NIRSpec and MIRI.
Authors' comments: 15 pages, 9 figures, Accepted by ApJ on Feb. 21 2023
Shuai Tao, Himavanth Reddy, Jesper Rindom Jensen, Mads Græsbøll Christensen
In this work, we propose a frequency bin-wise method to estimate the
single-channel speech presence probability (SPP) with multiple deep neural
networks (DNNs) in the short-time Fourier transform domain. Since all frequency
bins are typically considered simultaneously as input features for conventional
DNN-based SPP estimators, high model complexity is inevitable. To reduce the
model complexity and the requirements on the training data, we take a single
frequency bin and some of its neighboring frequency bins into account to train
separate gated recurrent units. In addition, the noisy speech and the a
posteriori probability SPP representation are used to train our model. The
experiments were performed on the Deep Noise Suppression challenge dataset. The
experimental results show that the speech detection accuracy can be improved
when we employ the frequency bin-wise model. Finally, we also demonstrate that
our proposed method outperforms most of the state-of-the-art SPP estimation
methods in terms of speech detection accuracy and model complexity.
Authors' comments: Accepted for ICASSP 2023
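The idea of feeding each per-bin model a single frequency bin together with some of its neighbors can be sketched as a simple windowing step over one STFT frame. The sketch below assumes a symmetric neighborhood and handles the spectrum edges by clamping indices; both choices, as well as the function name, are assumptions of this example and are not taken from the paper.

```python
def binwise_inputs(frame, num_neighbors):
    """Build one input vector per frequency bin from a single STFT frame.

    frame: list of per-bin values (for example magnitudes).
    Each output vector holds the bin itself plus num_neighbors bins on each
    side; indices past the edges are clamped (an assumption of this sketch).
    """
    last = len(frame) - 1
    offsets = range(-num_neighbors, num_neighbors + 1)
    return [
        [frame[min(max(k + d, 0), last)] for d in offsets]
        for k in range(len(frame))
    ]


# Hypothetical 4-bin frame with one neighbor on each side.
features = binwise_inputs([1.0, 2.0, 3.0, 4.0], num_neighbors=1)
```

Each row of `features` would then be the input to the model dedicated to that bin, so every model sees only `2 * num_neighbors + 1` values instead of the full spectrum.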
Marco Landt-Hayen, Willi Rath, Martin Claus, Peer Kröger
Layer-wise relevance propagation (LRP) is a widely used and powerful
technique to reveal insights into various artificial neural network (ANN)
architectures. LRP is often used in the context of image classification. The
aim is to understand which parts of the input sample have the highest relevance
and hence most influence on the model prediction. Relevance can be traced back
through the network to attribute a certain score to each input pixel. Relevance
scores are then combined and displayed as heat maps and give humans an
intuitive visual understanding of classification models. Opening the black box
to understand the classification engine in great detail is essential for domain
experts to gain trust in ANN models. However, there are pitfalls in terms of
model-inherent artifacts included in the obtained relevance maps that can
easily be missed. For a valid interpretation, however, these artifacts must not
be ignored. Here, we apply and revise LRP on various ANN architectures trained as
classifiers on geospatial and synthetic data. Depending on the network
architecture, we show techniques to control model focus and give guidance to
improve the quality of obtained relevance maps to separate facts from
artifacts.
Authors' comments: Fixed typo
Wei Tang, Kangning Cui, Raymond H. Chan
Diabetic retinopathy (DR) is a leading global cause of blindness. Early
detection of hard exudates plays a crucial role in identifying DR, which aids
in treating diabetes and preventing vision loss. However, the unique
characteristics of hard exudates, ranging from their inconsistent shapes to
indistinct boundaries, pose significant challenges to existing segmentation
techniques. To address these issues, we present a novel supervised contrastive
learning framework to optimize hard exudate segmentation. Specifically, we
introduce a patch-wise density contrasting scheme to distinguish between areas
with varying lesion concentrations, and therefore improve the model's
proficiency in segmenting small lesions. To handle the ambiguous boundaries, we
develop a discriminative edge inspection module to dynamically analyze the
pixels that lie around the boundaries and accurately delineate the exudates.
Upon evaluation using the IDRiD dataset and comparison with state-of-the-art
frameworks, our method exhibits its effectiveness and shows potential for
computer-assisted hard exudate detection. The code to replicate experiments is
available at github.com/wetang7/HECL/.
Authors' comments: 8 pages, 3 figures, 2 tables. To appear in ISBI 2024