Weiye Zhao, Tairan He, Rui Chen, Tianhao Wei, Changliu Liu
Despite the tremendous success of Reinforcement Learning (RL) algorithms in
simulation environments, applying RL to real-world applications still faces
many challenges. A major concern is safety, in other words, constraint
satisfaction. State-wise constraints are one of the most common constraints in
real-world applications and one of the most challenging constraints in Safe RL.
Enforcing state-wise constraints is essential to many challenging
tasks such as autonomous driving and robot manipulation. This paper provides a
comprehensive review of existing approaches that address state-wise constraints
in RL. Under the framework of State-wise Constrained Markov Decision Process
(SCMDP), we will discuss the connections, differences, and trade-offs of
existing approaches in terms of (i) safety guarantee and scalability, (ii)
safety and reward performance, and (iii) safety after convergence and during
training. We also summarize limitations of current methods and discuss
potential future directions.
Authors' comments: IJCAI Survey Track 2023
T. H. Jarrett, M. E. Cluver, Edward N. Taylor, Sabine Bellstedt, A. S. G. Robotham, H. F. M. Yao
We derive new empirical scaling relations between WISE mid-infrared galaxy
photometry and well-determined stellar masses from SED modeling of a suite of
optical-infrared photometry provided by the DR4 Catalogue of the
GAMA-KiDS-VIKING survey of the southern G23 field. The mid-infrared source
extraction and characterization are drawn from the WISE Extended Source
Catalogue (WXSC) and the archival ALLWISE catalog, combining both resolved and
compact galaxies in the G23 sample to a redshift of 0.15. Three scaling
relations are derived: W1 3.4 micron luminosity versus stellar mass, and WISE
W1-W2, W1-W3 colors versus mass-to-light ratio (sensitive to a variety of
galaxy types from passive to star-forming). For each galaxy in the sample, we
then derive the combined stellar mass from these scaling relations, producing
Mstellar estimates with better than $\sim$25-30% accuracy for galaxies with
$>$10$^{9}$ Msolar and $<$40 - 50% for lower luminosity dwarf galaxies. We also
provide simple prescriptions for rest-frame corrections and estimating stellar
masses using only the W1 flux and the W1-W2 color, making stellar masses more
accessible to users of the WISE data. Given a redshift or distance, these new
scaling relations will enable stellar mass estimates for any galaxy in the sky
detected by WISE with high fidelity across a range of mass-to-light.
Authors' comments: Accepted for publication in the Astrophysical Journal (ApJ)
Norihide Tokushige
A family of $k$-element subsets of an $n$-element set is called 3-wise
intersecting if any three members in the family have non-empty intersection. We
determine the maximum size of such families exactly or asymptotically. One of
our results shows that for every $\epsilon>0$ there exists $n_0$ such that if
$n>n_0$ and $\frac25+\epsilon<\frac kn<\frac 12-\epsilon$ then the maximum size
is $4\binom{n-4}{k-3}+\binom{n-4}{k-4}$.
Authors' comments: 12 pages, fixed proof Theorem 4
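The size quoted above equals the number of $k$-subsets of $[n]$ that contain at least three elements of a fixed 4-set. The abstract does not name the extremal family, so reading the formula this way is an assumption. The brute-force sketch below, with hypothetical helper names, checks for small parameters that this family is 3-wise intersecting and has size $4\binom{n-4}{k-3}+\binom{n-4}{k-4}$.

```python
from itertools import combinations
from math import comb


def family_three_of_four(n, k):
    """k-subsets of {1..n} containing at least 3 elements of {1, 2, 3, 4}."""
    core = {1, 2, 3, 4}
    return [frozenset(s) for s in combinations(range(1, n + 1), k)
            if len(core & set(s)) >= 3]


def is_three_wise_intersecting(family):
    """True if every three distinct members share at least one element."""
    return all(a & b & c for a, b, c in combinations(family, 3))


def formula_size(n, k):
    """The count 4*C(n-4, k-3) + C(n-4, k-4) quoted in the abstract (needs k >= 4)."""
    return 4 * comb(n - 4, k - 3) + comb(n - 4, k - 4)


if __name__ == "__main__":
    n, k = 9, 4
    fam = family_three_of_four(n, k)
    print(len(fam), formula_size(n, k), is_three_wise_intersecting(fam))
```

The check only confirms the counting identity and the intersection property for this one family; it says nothing about optimality, which is what the paper proves.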
Yizeng Han, Zhihang Yuan, Yifan Pu, Chenhao Xue, Shiji Song, Guangyu Sun, Gao Huang
Spatial-wise dynamic convolution has become a promising approach to improving
the inference efficiency of deep networks. By allocating more computation to
the most informative pixels, such an adaptive inference paradigm reduces the
spatial redundancy in image features and saves a considerable amount of
unnecessary computation. However, the theoretical efficiency achieved by
previous methods can hardly translate into a realistic speedup, especially on
multi-core processors (e.g., GPUs). The key challenge is that the existing
literature has only focused on designing algorithms with minimal computation,
ignoring the fact that the practical latency can also be influenced by
scheduling strategies and hardware properties. To bridge the gap between
theoretical computation and practical efficiency, we propose a latency-aware
spatial-wise dynamic network (LASNet), which performs coarse-grained spatially
adaptive inference under the guidance of a novel latency prediction model. The
latency prediction model can efficiently estimate the inference latency of
dynamic networks by simultaneously considering algorithms, scheduling
strategies, and hardware properties. We use the latency predictor to guide both
the algorithm design and the scheduling optimization on various hardware
platforms. Experiments on image classification, object detection and instance
segmentation demonstrate that the proposed framework significantly improves the
practical inference efficiency of deep networks. For example, the average
latency of a ResNet-101 on the ImageNet validation set could be reduced by 36%
and 46% on a server GPU (Nvidia Tesla-V100) and an edge device (Nvidia Jetson
TX2 GPU) respectively without sacrificing the accuracy. Code is available at
https://github.com/LeapLabTHU/LASNet.
Authors' comments: NeurIPS 2022
Michael Panchenko, Anes Benmerzoug, Miguel de Benito Delgado
For many applications of probabilistic classifiers it is important that the
predicted confidence vectors reflect true probabilities (one says that the
classifier is calibrated). It has been shown that common models fail to satisfy
this property, making reliable methods for measuring and improving calibration
important tools. Unfortunately, obtaining these is far from trivial for
problems with many classes. We propose two techniques that can be used in
tandem. First, a reduced calibration method transforms the original problem
into a simpler one. We prove for several notions of calibration that solving
the reduced problem minimizes the corresponding notion of miscalibration in the
full problem, allowing the use of non-parametric recalibration methods that
fail in higher dimensions. Second, we propose class-wise calibration methods,
based on intuition building on a phenomenon called neural collapse and the
observation that most of the accurate classifiers found in practice can be
thought of as a union of K different functions which can be recalibrated
separately, one for each class. These typically out-perform their non
class-wise counterparts, especially for classifiers trained on imbalanced data
sets. Applying the two methods together results in class-wise reduced
calibration algorithms, which are powerful tools for reducing the prediction
and per-class calibration errors. We demonstrate our methods on real and
synthetic datasets and release all code as open source at
https://github.com/appliedAI-Initiative
Authors' comments: Accepted at the 21st IEEE International Conference on Machine
Learning and Applications, ICMLA 2022
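To make the notion of per-class calibration error concrete, here is a minimal sketch of a binned class-wise expected calibration error. It is a common estimator written from scratch for illustration, not the authors' implementation, and the function name, equal-width binning, and default bin count are assumptions.

```python
def classwise_ece(probs, labels, n_bins=10):
    """Binned class-wise calibration error, averaged over classes.

    probs  : list of confidence vectors, one per sample (each of length K)
    labels : list of integer class labels in range(K)
    """
    n, k = len(probs), len(probs[0])
    total = 0.0
    for c in range(k):
        # Per bin: [sum of predicted confidences for class c, count of true class c]
        bins = [[0.0, 0] for _ in range(n_bins)]
        for p, y in zip(probs, labels):
            b = min(int(p[c] * n_bins), n_bins - 1)  # p[c] == 1.0 goes to the last bin
            bins[b][0] += p[c]
            bins[b][1] += int(y == c)
        # |mean confidence - empirical frequency| weighted by bin mass
        # simplifies to |confidence sum - hit count| / n for each bin.
        total += sum(abs(conf_sum - hits) for conf_sum, hits in bins) / n
    return total / k


if __name__ == "__main__":
    probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
    labels = [0, 0, 1, 1]
    print(classwise_ece(probs, labels))
```

Because each class is scored separately, a classifier that is well calibrated on frequent classes but poorly calibrated on rare ones is penalized, which is the situation the class-wise methods above target.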
Jiaqi Liao, Mengyu Cao, Mei Lu
Let $t$, $r$, $k$ and $n$ be positive integers and $\mathcal{F}$ a family of
$k$-subsets of an $n$-set $V$. The family $\mathcal{F}$ is $r$-wise
$t$-intersecting if for any $F_1, \ldots, F_r \in \mathcal{F}$, we have
$|\bigcap_{i=1}^{r} F_i| \geq t$. An $r$-wise $t$-intersecting family of $r + 1$ sets
$\{T_1, \ldots, T_{r + 1}\}$ is called an $(r + 1,t)$-triangle if
$|T_1 \cap \cdots \cap T_{r + 1}| \leq t - 1$. In this paper, we prove that if
$n \geq n_0(r, t, k)$, then the $r$-wise $t$-intersecting family
$\mathcal{F} \subseteq \binom{[n]}{k}$ containing the most $(r + 1,t)$-triangles is
isomorphic to $\{F \in \binom{[n]}{k}: |F \cap [r + t]| \geq r + t - 1\}$.
This can also be regarded as a generalized Tur\'{a}n type result.
Authors' comments: 14 pages
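The definitions above can be explored by brute force for tiny parameters. The sketch below, with hypothetical helper names, builds the family of $k$-subsets meeting $[r+t]$ in at least $r+t-1$ elements, checks that it is $r$-wise $t$-intersecting, and counts its $(r+1,t)$-triangles. It illustrates the definitions only and does not verify the extremal statement.

```python
from itertools import combinations


def candidate_family(n, k, r, t):
    """k-subsets F of {1..n} with |F ∩ {1..r+t}| >= r + t - 1."""
    core = set(range(1, r + t + 1))
    return [frozenset(s) for s in combinations(range(1, n + 1), k)
            if len(core & set(s)) >= r + t - 1]


def common(sets):
    """Intersection of a non-empty collection of frozensets."""
    return frozenset.intersection(*sets)


def is_r_wise_t_intersecting(family, r, t):
    """Check every choice of r distinct members (sufficient when k >= t)."""
    return all(len(common(g)) >= t for g in combinations(family, r))


def count_triangles(family, r, t):
    """Count (r+1)-subfamilies whose common intersection has size <= t - 1.

    Any subfamily of an r-wise t-intersecting family is itself
    r-wise t-intersecting, so only the size condition needs checking.
    """
    return sum(1 for g in combinations(family, r + 1)
               if len(common(g)) <= t - 1)


if __name__ == "__main__":
    n, k, r, t = 6, 3, 2, 1
    fam = candidate_family(n, k, r, t)
    print(len(fam), is_r_wise_t_intersecting(fam, r, t), count_triangles(fam, r, t))
```

The running time grows combinatorially, so this is useful only as a sanity check on very small $n$.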
Zelin Zhao, Ze Wu, Yueqing Zhuang, Boxun Li, Jiaya Jia
Multi-object tracking (MOT) requires detecting and associating objects
through frames. Unlike tracking via detected bounding boxes or tracking objects
as points, we propose tracking objects as pixel-wise distributions. We
instantiate this idea on a transformer-based architecture, P3AFormer, with
pixel-wise propagation, prediction, and association. P3AFormer propagates
pixel-wise features guided by flow information to pass messages between frames.
Furthermore, P3AFormer adopts a meta-architecture to produce multi-scale object
feature maps. During inference, a pixel-wise association procedure is proposed
to recover object connections through frames based on the pixel-wise
prediction. P3AFormer yields 81.2\% in terms of MOTA on the MOT17 benchmark --
the first among all transformer networks to reach 80\% MOTA in literature.
P3AFormer also outperforms state-of-the-art methods on the MOT20 and KITTI benchmarks.
Authors' comments: Accepted in ECCV22 as an oral presentation paper. The code and project
page is at
https://github.com/dvlab-research/ECCV22-P3AFormer-Tracking-Objects-as-Pixel-wise-Distributions
Jingtang Liang, Chi-Man Pun
The image harmonization task aims to harmonize different composite foreground regions according to a specific background image. Previous methods tend to focus on improving the reconstruction ability of the generator through internal enhancements such as attention, adaptive normalization, and light adjustment. However, they pay less attention to discriminating the foreground and background appearance features within a restricted generator, which becomes a new challenge in the image harmonization task. In this paper, we propose a novel image harmonization framework with external style fusion and a region-wise contrastive learning scheme. For the external style fusion, we leverage the external background appearance from the encoder as the style reference to generate the harmonized foreground in the decoder. This approach enhances the harmonization ability of the decoder through external background guidance. Moreover, for the contrastive learning scheme, we design a region-wise contrastive loss function for the image harmonization task. Specifically, we first introduce a straightforward sample generation method that selects negative samples from the output harmonized foreground region and positive samples from the ground-truth background region. Our method attempts to bring together corresponding positive and negative samples by maximizing the mutual information between the foreground and background styles, which makes our harmonization network more robust in discriminating the foreground and background style features when harmonizing composite images. Extensive experiments on benchmark datasets show that our method achieves a clear improvement in harmonization quality and demonstrates good generalization capability in real-scenario applications.
Yue Guan, Zhengyi Li, Jingwen Leng, Zhouhan Lin, Minyi Guo
The Transformer architecture has become the de facto model for many machine
learning tasks in natural language processing and computer vision. As such,
improving its computational efficiency becomes paramount. One of the major
computational inefficiencies of Transformer-based models is that they spend an
identical amount of computation throughout all layers. Prior works have
proposed to augment the Transformer model with the capability of skimming
tokens to improve its computational efficiency. However, they lack an
effective end-to-end optimization of the discrete skimming
predictor. To address this limitation, we propose the Transkimmer
architecture, which learns to identify hidden state tokens that are not
required by each layer. The skimmed tokens are then forwarded directly to the
final output, thus reducing the computation of the successive layers. The key
idea in Transkimmer is to add a parameterized predictor before each layer that
learns to make the skimming decision. We also propose to adopt the
reparameterization trick and add a skim loss for the end-to-end training of
Transkimmer. Transkimmer achieves a 10.97x average speedup on the GLUE benchmark
compared with the vanilla BERT-base baseline with less than 1% accuracy
degradation.
Authors' comments: Published as a conference paper at ACL 2022
Hong-Yi Wang, Tian-Sheuan Chang
Following its success in natural language processing, the transformer for
vision applications has attracted significant attention in recent years due to
its excellent performance. However, existing deep learning hardware
accelerators for vision cannot execute this structure efficiently due to
significant model architecture differences. As a result, this paper proposes
a hardware accelerator for vision transformers with row-wise scheduling,
which decomposes major operations in vision transformers into a single dot
product primitive for a unified and efficient execution. Furthermore, by
sharing weights in columns, we can reuse the data and reduce the usage of
memory. The implementation with TSMC 40nm CMOS technology only requires 262K
gate count and 149KB SRAM buffer for 403.2 GOPS throughput at 600MHz clock
frequency.
Authors' comments: 5 pages, 6 figures, published in IEEE AICAS 2022
Lucas Rosenblatt, Joshua Allen, Julia Stoyanovich
Differentially private (DP) synthetic data generation is a practical method for improving access to data as a means to encourage productive partnerships. One issue inherent to DP is that the "privacy budget" is generally "spent" evenly across features in the data set. This leads to good statistical parity with the real data, but can undervalue the conditional probabilities and marginals that are critical for predictive quality of synthetic data. Further, loss of predictive quality may be non-uniform across the data set, with subsets that correspond to minority groups potentially suffering a higher loss. In this paper, we develop ensemble methods that distribute the privacy budget "wisely" to maximize predictive accuracy of models trained on DP data, and "fairly" to bound potential disparities in accuracy across groups and reduce inequality. Our methods are based on the insights that feature importance can inform how privacy budget is allocated, and, further, that per-group feature importance and fairness-related performance objectives can be incorporated in the allocation. These insights make our methods tunable to social contexts, allowing data owners to produce balanced synthetic data for predictive analysis.
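As an illustration of the idea that feature importance can inform how the privacy budget is allocated, the sketch below splits a total $\epsilon$ across features, partly evenly and partly in proportion to importance scores. This is a hypothetical allocation rule, not the authors' ensemble method; the function name, the `uniform_share` parameter, and the reliance on basic sequential composition (per-feature budgets adding up to the total) are all assumptions.

```python
def allocate_budget(total_epsilon, importances, uniform_share=0.2):
    """Split total_epsilon across features.

    A fraction `uniform_share` of the budget is spread evenly; the rest is
    split in proportion to the non-negative importance scores.
    """
    if total_epsilon <= 0:
        raise ValueError("total_epsilon must be positive")
    if not 0.0 <= uniform_share <= 1.0:
        raise ValueError("uniform_share must lie in [0, 1]")
    weight_sum = sum(importances.values())
    if not importances or weight_sum <= 0 or min(importances.values()) < 0:
        raise ValueError("importances must be non-negative with a positive sum")
    n = len(importances)
    return {
        name: total_epsilon * (uniform_share / n
                               + (1.0 - uniform_share) * w / weight_sum)
        for name, w in importances.items()
    }


if __name__ == "__main__":
    # Hypothetical importance scores for three features.
    budget = allocate_budget(1.0, {"age": 5.0, "income": 3.0, "zip": 2.0})
    for feature, eps in budget.items():
        print(feature, round(eps, 4))
```

Keeping a non-zero `uniform_share` guarantees that even a feature with zero importance receives some budget, which is one simple way to avoid starving features that matter only for a minority group.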
Martin Groenewegen
(abridged) Variability is a key property of stars on the asymptotic giant branch (AGB). Their pulsation period is related to the luminosity and mass-loss rate (MLR) of the star. Long-period variables (LPVs) and Mira variables are the most prominent of all types of variability of evolved stars. However, the reddest, most obscured AGB stars are too faint in the optical and have eluded large variability surveys. Our goal is to obtain a sample of LPVs with large MLRs by analysing WISE W1 and W2 light curves (LCs) for about 2000 sources, photometrically selected to include known C-stars with the 11.3 $\mu$m silicon carbide dust feature in absorption, and Galactic O-stars with periods longer than 1000 days. Epoch photometry was retrieved from the AllWISE and NEOWISE databases and fitted with a sine curve. Photometry from other variability surveys was also downloaded and fitted. For a subset of 316 of the reddest stars, spectral energy distributions (SEDs) were constructed, and, together with mid-infrared (MIR) spectra when available, fitted with a dust radiative transfer programme in order to derive MLRs. WISE-based LCs and fits to the data are presented for all stars. Inspired by a recent paper, a number of non-variable OH/IRs are identified. Based on a selection on amplitude, a sample of about 750 (candidate) LPVs is selected, of which 145 have periods over 1000 days, many of them being new. For the subset of the stars with the colours of C-rich extremely red objects (EROs), the fitting of the SEDs (and available MIR spectra) separates them into C- and O-rich objects. The number of Galactic EROs appears to be complete up to about 5 kpc, and a total dust return rate in the solar neighbourhood for this class is determined. Based on the EROs in the Magellanic Clouds, a bolometric period-luminosity relation is derived.
Jialong Tang, Hongyu Lin, Meng Liao, Yaojie Lu, Xianpei Han, Le Sun, Weijian Xie, Jin Xu
Procedural text understanding requires machines to reason about entity states
within dynamic narratives. Current procedural text understanding
approaches are commonly \textbf{entity-wise}, which separately track each
entity and independently predict different states of each entity. Such an
entity-wise paradigm does not consider the interaction between entities and
their states. In this paper, we propose a new \textbf{scene-wise} paradigm for
procedural text understanding, which jointly tracks states of all entities in a
scene-by-scene manner. Based on this paradigm, we propose \textbf{S}cene
\textbf{G}raph \textbf{R}easoner (\textbf{SGR}), which introduces a series of
dynamically evolving scene graphs to jointly formulate the evolution of
entities, states and their associations throughout the narrative. In this way,
the deep interactions between all entities and states can be jointly captured
and simultaneously derived from scene graphs. Experiments show that SGR not
only achieves the new state-of-the-art performance but also significantly
accelerates the speed of reasoning.
Authors' comments: 9 pages, 2 figures
Cristian Ivan
We introduce an algorithm where the individual bits representing the weights
of a neural network are learned. This method allows training weights with
integer values on arbitrary bit-depths and naturally uncovers sparse networks,
without additional constraints or regularization techniques. We show better
results than the standard training technique with fully connected networks and
similar performance as compared to standard training for convolutional and
residual networks. By training bits in a selective manner we found that the
biggest contribution to achieving high accuracy is given by the first three
most significant bits, while the rest provide an intrinsic regularization. As a
consequence more than 90\% of a network can be used to store arbitrary codes
without affecting its accuracy. These codes may be random noise, binary files
or even the weights of previously trained networks.
Authors' comments: 9 pages, 9 figures
Yoon-Jae Yeo, Min-Cheol Sagong, Seung Park, Sung-Jea Ko, Yong-Goo Shin
Region-adaptive normalization (RAN) methods have been widely used in the
generative adversarial network (GAN)-based image-to-image translation
technique. However, since these approaches need a mask image to infer the
pixel-wise affine transformation parameters, they cannot be applied to the
general image generation models having no paired mask images. To resolve this
problem, this paper presents a novel normalization method, called self
pixel-wise normalization (SPN), which effectively boosts the generative
performance by performing the pixel-adaptive affine transformation without the
mask image. In our method, the transforming parameters are derived from a
self-latent mask that divides the feature map into the foreground and
background regions. The visualization of the self-latent masks shows that SPN
effectively captures a single object to be generated as the foreground. Since
the proposed method produces the self-latent mask without external data, it is
easily applicable in the existing generative models. Extensive experiments on
various datasets reveal that the proposed method significantly improves the
performance of image generation techniques in terms of Fr\'{e}chet inception
distance (FID) and Inception score (IS).
Authors' comments: 13 pages, 8 figures
Denise Hung, Josef Hanuš, Joseph R. Masiero, David J. Tholen
We present new thermophysical model (TPM) fits of 1,847 asteroids, deriving
thermal inertia, diameter, and Bond and visible geometric albedo. We use
thermal flux measurements obtained by the Wide-field Infrared Survey Explorer
(WISE; Wright et al. 2010; Mainzer et al. 2011) during its fully cryogenic
phase, when both the 12$\mu$m (W3) and 22$\mu$m (W4) bands were available. We
take shape models and spin information from the Database of Asteroid Models
from Inversion Techniques (DAMIT; \v{D}urech et al. 2010) and derive new shape
models through lightcurve inversion and combining WISE photometry with existing
DAMIT lightcurves. When we limit our sample to the asteroids with the most
reliable shape models and thermal flux measurements, we find broadly consistent
thermal inertia relations with recent studies. We apply fits to the diameters
$D$ (km) and thermal inertia $\Gamma$ (J m$^{-2}$ s$^{-0.5}$ K$^{-1}$)
normalized to 1 au with a linear relation of the form
$\log[\Gamma]=\alpha+\beta\log[D]$, where we find $\alpha = 2.667 \pm 0.059$
and $\beta = -0.467 \pm 0.044$ for our sample alone and $\alpha = 2.509 \pm
0.017$ and $\beta = -0.352 \pm 0.012$ when combined with other literature
estimates. We find little evidence of any correlation between rotation period
and thermal inertia, owing to the small number of slow rotators to consider in
our sample. While the large uncertainties on the majority of our derived
thermal inertia only allow us to identify broad trends between thermal inertia
and other physical parameters, we can expect a significant increase in
high-quality thermal flux measurements and asteroid shape models with upcoming
infrared and wide-field surveys, enabling even more thermophysical modeling of
higher precision in the future.
Authors' comments: 86 pages, 17 figures, 6 tables, accepted by PSJ with proof
corrections. Table 5, now corrected, was erroneously published with a
different definition for "dense lightcurve", using 15 data points as a
minimum rather than 30, so several asteroids had higher numbers of dense
lightcurves than presented elsewhere in the paper
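The fitted relation $\log[\Gamma]=\alpha+\beta\log[D]$ can be evaluated directly. The short sketch below plugs in the central values quoted in the abstract and ignores their uncertainties. It assumes the logarithm is base 10, which the abstract does not state, and the function and dictionary names are hypothetical.

```python
import math

# Central values (alpha, beta) quoted in the abstract; uncertainties are ignored here.
FITS = {
    "this_sample": (2.667, -0.467),
    "with_literature": (2.509, -0.352),
}


def thermal_inertia(diameter_km, alpha, beta):
    """Thermal inertia in J m^-2 s^-0.5 K^-1 (normalized to 1 au).

    Assumes log10 in log(Gamma) = alpha + beta * log(D), with D in km.
    """
    if diameter_km <= 0:
        raise ValueError("diameter must be positive")
    return 10.0 ** (alpha + beta * math.log10(diameter_km))


if __name__ == "__main__":
    for label, (alpha, beta) in FITS.items():
        for d in (1.0, 10.0, 100.0):
            print(label, d, round(thermal_inertia(d, alpha, beta), 1))
```

Since $\beta$ is negative in both fits, the predicted thermal inertia decreases as diameter increases.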
Daiqing Li, Huan Ling, Seung Wook Kim, Karsten Kreis, Adela Barriuso, Sanja Fidler, Antonio Torralba
Annotating images with pixel-wise labels is a time-consuming and costly
process. Recently, DatasetGAN showcased a promising alternative - to synthesize
a large labeled dataset via a generative adversarial network (GAN) by
exploiting a small set of manually labeled, GAN-generated images. Here, we
scale DatasetGAN to ImageNet scale of class diversity. We take image samples
from the class-conditional generative model BigGAN trained on ImageNet, and
manually annotate 5 images per class, for all 1k classes. By training an
effective feature segmentation architecture on top of BigGAN, we turn BigGAN
into a labeled dataset generator. We further show that VQGAN can similarly
serve as a dataset generator, leveraging the already annotated data. We create
a new ImageNet benchmark by labeling an additional set of 8k real images and
evaluate segmentation performance in a variety of settings. Through an
extensive ablation study we show big gains in leveraging a large generated
dataset to train different supervised and self-supervised backbone models on
pixel-wise tasks. Furthermore, we demonstrate that using our synthesized
datasets for pre-training leads to improvements over standard ImageNet
pre-training on several downstream datasets, such as PASCAL-VOC, MS-COCO,
Cityscapes and chest X-ray, as well as tasks (detection, segmentation). Our
benchmark will be made public, and we will maintain a leaderboard for this
challenging task. Project Page: https://nv-tlabs.github.io/big-datasetgan/
Authors' comments: https://nv-tlabs.github.io/big-datasetgan/
Zemin Liu, Yuan Fang, Chenghao Liu, Steven C. H. Hoi
Graph neural networks (GNNs) emerge as a powerful family of representation learning models on graphs. To derive node representations, they utilize a global model that recursively aggregates information from the neighboring nodes. However, different nodes reside at different parts of the graph in different local contexts, making their distributions vary across the graph. Ideally, how a node receives its neighborhood information should be a function of its local context, to diverge from the global GNN model shared by all nodes. To utilize node locality without overfitting, we propose a node-wise localization of GNNs by accounting for both global and local aspects of the graph. Globally, all nodes on the graph depend on an underlying global GNN to encode the general patterns across the graph; locally, each node is localized into a unique model as a function of the global model and its local context. Finally, we conduct extensive experiments on four benchmark graphs, and consistently obtain promising performance surpassing the state-of-the-art GNNs.
Tim Dettmers, Mike Lewis, Sam Shleifer, Luke Zettlemoyer
Stateful optimizers maintain gradient statistics over time, e.g., the
exponentially smoothed sum (SGD with momentum) or squared sum (Adam) of past
gradient values. This state can be used to accelerate optimization compared to
plain stochastic gradient descent but uses memory that might otherwise be
allocated to model parameters, thereby limiting the maximum size of models
trained in practice. In this paper, we develop the first optimizers that use
8-bit statistics while maintaining the performance levels of using 32-bit
optimizer states. To overcome the resulting computational, quantization, and
stability challenges, we develop block-wise dynamic quantization. Block-wise
quantization divides input tensors into smaller blocks that are independently
quantized. Each block is processed in parallel across cores, yielding faster
optimization and high precision quantization. To maintain stability and
performance, we combine block-wise quantization with two additional changes:
(1) dynamic quantization, a form of non-linear optimization that is precise for
both large and small magnitude values, and (2) a stable embedding layer to
reduce gradient variance that comes from the highly non-uniform distribution of
input tokens in language models. As a result, our 8-bit optimizers maintain
32-bit performance with a small fraction of the memory footprint on a range of
tasks, including 1.5B parameter language modeling, GLUE finetuning, ImageNet
classification, WMT'14 machine translation, MoCo v2 contrastive ImageNet
pretraining+finetuning, and RoBERTa pretraining, without changes to the
original optimizer hyperparameters. We open-source our 8-bit optimizers as a
drop-in replacement that only requires a two-line code change.
Authors' comments: ICLR2022 spotlight version
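The block-wise idea described above can be sketched in a few lines: split a tensor into fixed-size blocks and quantize each block against its own absolute maximum, so that an outlier in one block does not reduce precision in the others. This is a simplified illustration that uses plain linear 8-bit codes rather than the paper's dynamic quantization, and it is not the released library's API. All names and the tiny block size are assumptions.

```python
def quantize_blockwise(values, block_size=4):
    """Quantize a flat list of floats block by block to signed 8-bit codes.

    Returns a list of (absmax, codes) pairs, one per block.
    """
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        # Per-block scale; fall back to 1.0 for an all-zero block.
        absmax = max(abs(v) for v in block) or 1.0
        codes = [round(v / absmax * 127) for v in block]  # each code in [-127, 127]
        blocks.append((absmax, codes))
    return blocks


def dequantize_blockwise(blocks):
    """Map the 8-bit codes back to floats using each block's own scale."""
    return [code / 127 * absmax for absmax, codes in blocks for code in codes]


if __name__ == "__main__":
    # Small values in the first block, an outlier-heavy second block.
    state = [0.01, -0.02, 0.015, 0.0, 100.0, -50.0, 25.0, 0.0]
    restored = dequantize_blockwise(quantize_blockwise(state))
    for original, approx in zip(state, restored):
        print(original, approx)
```

With a single global scale, the small values in the first block would all round to zero next to the outlier of 100.0; with per-block scales, the rounding error in each block is bounded by that block's own absolute maximum divided by 254.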
Denishrouf Thesingarajah, Adam M. Johansen
Motivated by problems from neuroimaging in which existing approaches make use
of "mass univariate" analysis which neglects spatial structure entirely, but
the full joint modelling of all quantities of interest is computationally
infeasible, a novel method for incorporating spatial dependence within a
(potentially large) family of model-selection problems is presented. Spatial
dependence is encoded via a Markov random field model for which a variant of
the pseudo-marginal Markov chain Monte Carlo algorithm is developed and
extended by a further augmentation of the underlying state space. This approach
allows the exploitation of existing unbiased marginal likelihood estimators
used in settings in which spatial independence is normally assumed thereby
facilitating the incorporation of spatial dependence using non-spatial
estimates with minimal additional development effort.
The proposed algorithm can be realistically used for analysis of moderately
sized data sets such as $2$D slices of whole
$3$D dynamic PET brain images or other regions of interest. Principled
approximations of the proposed method, together with simple extensions based on
the augmented spaces, are investigated and shown to provide similar results to
the full pseudo-marginal method. Such approximations and extensions allow the
improved performance obtained by incorporating spatial dependence to be
obtained at negligible additional cost. An application to measured PET image
data shows notable improvements in revealing underlying spatial structure when
compared to current methods that assume spatial independence.
Authors' comments: 37 pages, 17 figures, 1 table