Jiaqi Liao, Mengyu Cao, Mei Lu
Let $t$, $r$, $k$ and $n$ be positive integers and $\mathcal{F}$ a family of
$k$-subsets of an $n$-set $V$. The family $\mathcal{F}$ is $r$-wise
$t$-intersecting if for any $F_1, \ldots, F_r \in \mathcal{F}$, we have
$|\bigcap_{i=1}^{r} F_i| \geq t$. An $r$-wise $t$-intersecting family of $r + 1$ sets
$\{T_1, \ldots, T_{r + 1}\}$ is called an $(r + 1, t)$-triangle if
$|T_1 \cap \cdots \cap T_{r + 1}| \leq t - 1$. In this paper, we prove that if
$n \geq n_0(r, t, k)$, then the $r$-wise $t$-intersecting family
$\mathcal{F} \subseteq \binom{[n]}{k}$ containing the most $(r + 1, t)$-triangles is
isomorphic to $\{F \in \binom{[n]}{k} : |F \cap [r + t]| \geq r + t - 1\}$.
This can also be regarded as a generalized Tur\'{a}n type result.
Authors' comments: 14 pages
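The extremal family described in the abstract above can be enumerated by brute force for small parameters. The sketch below is only an illustration, not the authors' proof technique; the function names and the small parameter choice ($n = 6$, $k = 3$, $r = 2$, $t = 1$) are assumptions made here, and the theorem itself only applies for $n \geq n_0(r, t, k)$.

```python
from itertools import combinations

def extremal_family(n, k, r, t):
    """All k-subsets F of {1, ..., n} with |F & [r+t]| >= r + t - 1."""
    core = set(range(1, r + t + 1))
    return [frozenset(F) for F in combinations(range(1, n + 1), k)
            if len(core & set(F)) >= r + t - 1]

def is_r_wise_t_intersecting(family, r, t):
    """Check that every choice of r distinct members shares at least t elements."""
    return all(len(frozenset.intersection(*sets)) >= t
               for sets in combinations(family, r))

def count_triangles(family, r, t):
    """Count (r+1)-subfamilies whose common intersection has at most t-1 elements.

    Any subfamily of an r-wise t-intersecting family is itself r-wise
    t-intersecting, so only the (r+1)-fold intersection needs checking.
    """
    return sum(1 for sets in combinations(family, r + 1)
               if len(frozenset.intersection(*sets)) <= t - 1)

family = extremal_family(n=6, k=3, r=2, t=1)
print(len(family), is_r_wise_t_intersecting(family, 2, 1), count_triangles(family, 2, 1))
```

For these parameters an $(r + 1, t)$-triangle is three pairwise-intersecting sets with empty common intersection, for example $\{1,2,4\}$, $\{1,3,5\}$, $\{2,3,6\}$.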
Zelin Zhao, Ze Wu, Yueqing Zhuang, Boxun Li, Jiaya Jia
Multi-object tracking (MOT) requires detecting and associating objects
through frames. Unlike tracking via detected bounding boxes or tracking objects
as points, we propose tracking objects as pixel-wise distributions. We
instantiate this idea on a transformer-based architecture, P3AFormer, with
pixel-wise propagation, prediction, and association. P3AFormer propagates
pixel-wise features guided by flow information to pass messages between frames.
Furthermore, P3AFormer adopts a meta-architecture to produce multi-scale object
feature maps. During inference, a pixel-wise association procedure is proposed
to recover object connections through frames based on the pixel-wise
prediction. P3AFormer yields 81.2\% MOTA on the MOT17 benchmark, making it
the first transformer network in the literature to reach 80\% MOTA.
P3AFormer also outperforms state-of-the-art methods on the MOT20 and KITTI benchmarks.
Authors' comments: Accepted at ECCV22 as an oral presentation paper. The code and project page are at
https://github.com/dvlab-research/ECCV22-P3AFormer-Tracking-Objects-as-Pixel-wise-Distributions
Jingtang Liang, Chi-Man Pun
The image harmonization task aims to harmonize different composite foreground regions according to a specific background image. Previous methods focus on improving the reconstruction ability of the generator through internal enhancements such as attention, adaptive normalization, and light adjustment. However, they pay less attention to discriminating between foreground and background appearance features within a restricted generator, which becomes a new challenge in the image harmonization task. In this paper, we propose a novel image harmonization framework with external style fusion and a region-wise contrastive learning scheme. For the external style fusion, we leverage the external background appearance from the encoder as the style reference to generate the harmonized foreground in the decoder. This approach enhances the harmonization ability of the decoder through external background guidance. Moreover, for the contrastive learning scheme, we design a region-wise contrastive loss function for the image harmonization task. Specifically, we first introduce a straightforward sample generation method that selects negative samples from the output harmonized foreground region and positive samples from the ground-truth background region. Our method attempts to bring together corresponding positive and negative samples by maximizing the mutual information between the foreground and background styles, which makes our harmonization network more robust in discriminating foreground and background style features when harmonizing composite images. Extensive experiments on benchmark datasets show that our method achieves a clear improvement in harmonization quality and demonstrates good generalization capability in real-scenario applications.
Yue Guan, Zhengyi Li, Jingwen Leng, Zhouhan Lin, Minyi Guo
The Transformer architecture has become the de facto model for many machine
learning tasks, from natural language processing to computer vision. As such,
improving its computational efficiency becomes paramount. One of the major
computational inefficiencies of Transformer-based models is that they spend an
identical amount of computation throughout all layers. Prior works have
proposed to augment the Transformer model with the capability of skimming
tokens to improve its computational efficiency. However, they suffer from not
having effectual and end-to-end optimization of the discrete skimming
predictor. To address the above limitations, we propose the Transkimmer
architecture, which learns to identify hidden state tokens that are not
required by each layer. The skimmed tokens are then forwarded directly to the
final output, thus reducing the computation of the successive layers. The key
idea in Transkimmer is to add a parameterized predictor before each layer that
learns to make the skimming decision. We also propose to adopt the
reparameterization trick and add a skim loss for the end-to-end training of
Transkimmer. Transkimmer achieves a 10.97x average speedup on the GLUE benchmark
compared with the vanilla BERT-base baseline, with less than 1% accuracy
degradation.
Authors' comments: Published as a conference paper at ACL 2022
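The skimming mechanism in the abstract above (a per-layer predictor decides which tokens skip the remaining layers) can be sketched in plain Python. This is a toy illustration under several assumptions that are not from the source: tokens are scalars, each "layer" acts on one token at a time, the predictor returns two logits ordered `[skim, keep]`, the discrete choice is sampled with a Gumbel-softmax relaxation, and the skim loss is taken to be the mean fraction of tokens kept per layer. Without automatic differentiation it shows only the forward routing, not the gradient flow that the reparameterization trick enables.

```python
import math
import random

def gumbel_softmax(logits, tau, rng):
    """Relaxed categorical sample: softmax((logits + Gumbel noise) / tau)."""
    noise = [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in logits]
    z = [(l + g) / tau for l, g in zip(logits, noise)]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def skim_forward(hidden, layers, predictors, tau=1.0, seed=0):
    """Run tokens through layers; skimmed tokens are passed to the output as-is."""
    rng = random.Random(seed)
    hidden = list(hidden)                  # do not mutate the caller's list
    active = list(range(len(hidden)))      # token indices still being processed
    kept_ratio = []
    for layer, predictor in zip(layers, predictors):
        still_active = []
        for i in active:
            p_skim, p_keep = gumbel_softmax(predictor(hidden[i]), tau, rng)
            if p_keep >= p_skim:
                still_active.append(i)
        kept_ratio.append(len(still_active) / len(hidden))
        for i in still_active:
            hidden[i] = layer(hidden[i])   # only kept tokens pay for this layer
        active = still_active              # a skimmed token stays skimmed
    skim_loss = sum(kept_ratio) / len(kept_ratio)  # assumed form of the skim loss
    return hidden, skim_loss
```

A token dropped at one layer is never revisited here, which mirrors the abstract's statement that skimmed tokens are forwarded directly to the final output.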
Hong-Yi Wang, Tian-Sheuan Chang
Following its success in natural language processing, the transformer for
vision applications has attracted significant attention in recent years due to
its excellent performance. However, existing deep learning hardware
accelerators for vision cannot execute this structure efficiently due to
significant model architecture differences. This paper therefore proposes
a hardware accelerator for vision transformers with row-wise scheduling,
which decomposes the major operations in vision transformers into a single dot
product primitive for unified and efficient execution. Furthermore, by
sharing weights in columns, we can reuse the data and reduce the usage of
memory. The implementation with TSMC 40nm CMOS technology only requires 262K
gate count and 149KB SRAM buffer for 403.2 GOPS throughput at 600MHz clock
frequency.
Authors' comments: 5 pages, 6 figures, published in IEEE AICAS 2022
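As a rough software analogy for the abstract above, a matrix product can be written so that every output element comes from one shared dot-product routine, and the quoted throughput can be converted to operations per clock cycle. This is a sketch only: it does not model the accelerator's actual row-wise schedule or weight sharing, and the function names are made up for illustration.

```python
def dot(u, v):
    """The single primitive: a dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def matmul_rowwise(A, B):
    """Compute A @ B one output row at a time, using only dot()."""
    cols = list(zip(*B))                       # columns of B
    return [[dot(row, col) for col in cols] for row in A]

def ops_per_cycle(throughput_gops, clock_mhz):
    """Operations completed per clock cycle at the given throughput and clock."""
    return (throughput_gops * 1e9) / (clock_mhz * 1e6)

print(matmul_rowwise([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
print(ops_per_cycle(403.2, 600))
```

With the figures from the abstract, 403.2 GOPS at 600 MHz works out to 672 operations per cycle. If one multiply-accumulate is counted as two operations (an assumption, since the abstract does not say), that would correspond to 336 multiply-accumulate units.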
Lucas Rosenblatt, Joshua Allen, Julia Stoyanovich
Differentially private (DP) synthetic data generation is a practical method for improving access to data as a means to encourage productive partnerships. One issue inherent to DP is that the "privacy budget" is generally "spent" evenly across features in the data set. This leads to good statistical parity with the real data, but can undervalue the conditional probabilities and marginals that are critical for predictive quality of synthetic data. Further, loss of predictive quality may be non-uniform across the data set, with subsets that correspond to minority groups potentially suffering a higher loss. In this paper, we develop ensemble methods that distribute the privacy budget "wisely" to maximize predictive accuracy of models trained on DP data, and "fairly" to bound potential disparities in accuracy across groups and reduce inequality. Our methods are based on the insights that feature importance can inform how privacy budget is allocated, and, further, that per-group feature importance and fairness-related performance objectives can be incorporated in the allocation. These insights make our methods tunable to social contexts, allowing data owners to produce balanced synthetic data for predictive analysis.
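One simple way to picture importance-informed budgeting is to split a total privacy budget across features in proportion to an importance score instead of evenly. The sketch below is a hypothetical illustration, not the ensemble methods of the paper: the proportional rule, the function name, and the example feature names and scores are all assumptions, and it relies on basic sequential composition, under which per-feature budgets add up to the total.

```python
def allocate_budget(total_epsilon, importance):
    """Split total_epsilon across features in proportion to their importance."""
    total_weight = sum(importance.values())
    if total_weight <= 0:
        raise ValueError("importance scores must sum to a positive value")
    return {name: total_epsilon * weight / total_weight
            for name, weight in importance.items()}

# Hypothetical importance scores for three features.
budget = allocate_budget(1.0, {"age": 3.0, "income": 5.0, "zip_code": 2.0})
print(budget)
```

A per-group variant could compute the importance scores separately for each group before allocating, which is the direction the abstract describes, but the details of how the paper does this are not given here.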
Martin Groenewegen
(abridged) Variability is a key property of stars on the asymptotic giant branch (AGB). Their pulsation period is related to the luminosity and mass-loss rate (MLR) of the star. Long-period variables (LPVs) and Mira variables are the most prominent of all types of variability of evolved stars. However, the reddest, most obscured AGB stars are too faint in the optical and have eluded large variability surveys. Our goal is to obtain a sample of LPVs with large MLRs by analysing WISE W1 and W2 light curves (LCs) for about 2000 sources, photometrically selected to include known C-stars with the 11.3 $\mu$m silicon carbide dust feature in absorption, and Galactic O-stars with periods longer than 1000 days. Epoch photometry was retrieved from the AllWISE and NEOWISE databases and fitted with a sine curve. Photometry from other variability surveys was also downloaded and fitted. For a subset of 316 of the reddest stars, spectral energy distributions (SEDs) were constructed and, together with mid-infrared (MIR) spectra when available, fitted with a dust radiative transfer programme in order to derive MLRs. WISE-based LCs and fits to the data are presented for all stars. Inspired by a recent paper, a number of non-variable OH/IR stars are identified. Based on a selection on amplitude, a sample of about 750 (candidate) LPVs is selected, of which 145 have periods over 1000 days, many of them new. For the subset of the stars with the colours of C-rich extremely red objects (EROs), the fitting of the SEDs (and available MIR spectra) separates them into C- and O-rich objects. The number of Galactic EROs appears to be complete up to about 5~kpc, and a total dust return rate in the solar neighbourhood for this class is determined. Based on the EROs in the Magellanic Clouds, a bolometric period-luminosity relation is derived.
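Fitting a single sinusoid to epoch photometry can be done with linear least squares once a trial period is fixed, because $m(t) = c_0 + c_1 \sin(\omega t) + c_2 \cos(\omega t)$ is linear in its coefficients. The sketch below scans a grid of trial periods and keeps the one with the smallest residual. It is a generic illustration, not the fitting code used in the paper; the function names, the period grid, and the synthetic light curve are assumptions.

```python
import math

def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_sine(times, mags, period):
    """Return (mean level, amplitude, sum of squared residuals) at a fixed period."""
    w = 2.0 * math.pi / period
    rows = [(1.0, math.sin(w * t), math.cos(w * t)) for t in times]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, mags)) for i in range(3)]
    c = solve3(ata, aty)                      # normal equations
    resid = sum((y - sum(ci * xi for ci, xi in zip(c, r))) ** 2
                for r, y in zip(rows, mags))
    return c[0], math.hypot(c[1], c[2]), resid

def best_period(times, mags, trial_periods):
    """Pick the trial period whose sinusoid fit leaves the smallest residual."""
    return min(trial_periods, key=lambda p: fit_sine(times, mags, p)[2])
```

Real survey data are sparsely and unevenly sampled, so a practical analysis would also need to guard against aliases and ill-conditioned fits, which this sketch does not attempt.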
Jialong Tang, Hongyu Lin, Meng Liao, Yaojie Lu, Xianpei Han, Le Sun, Weijian Xie, Jin Xu
Procedural text understanding requires machines to reason about entity states
within dynamic narratives. Current procedural text understanding
approaches are commonly \textbf{entity-wise}, which separately track each
entity and independently predict different states of each entity. Such an
entity-wise paradigm does not consider the interaction between entities and
their states. In this paper, we propose a new \textbf{scene-wise} paradigm for
procedural text understanding, which jointly tracks states of all entities in a
scene-by-scene manner. Based on this paradigm, we propose \textbf{S}cene
\textbf{G}raph \textbf{R}easoner (\textbf{SGR}), which introduces a series of
dynamically evolving scene graphs to jointly formulate the evolution of
entities, states and their associations throughout the narrative. In this way,
the deep interactions between all entities and states can be jointly captured
and simultaneously derived from scene graphs. Experiments show that SGR not
only achieves new state-of-the-art performance but also significantly
speeds up reasoning.
Authors' comments: 9 pages, 2 figures
Cristian Ivan
We introduce an algorithm where the individual bits representing the weights
of a neural network are learned. This method allows training weights with
integer values on arbitrary bit-depths and naturally uncovers sparse networks,
without additional constraints or regularization techniques. We show better
results than the standard training technique with fully connected networks and
similar performance as compared to standard training for convolutional and
residual networks. By training bits in a selective manner we found that the
biggest contribution to achieving high accuracy is given by the first three
most significant bits, while the rest provide an intrinsic regularization. As a
consequence more than 90\% of a network can be used to store arbitrary codes
without affecting its accuracy. These codes may be random noise, binary files
or even the weights of previously trained networks.
Authors' comments: 9 pages, 9 figures
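To make the bit-level picture in the abstract above concrete, the sketch below rebuilds an integer weight from individual bits and overwrites everything below the three most significant bits with an arbitrary payload. The sign-magnitude encoding, the 8-bit depth, and the helper names are assumptions chosen for illustration; the abstract does not specify the paper's exact bit representation, and the training procedure itself is not shown.

```python
def weight_from_bits(sign_bit, magnitude_bits):
    """Integer weight from a sign bit and magnitude bits (most significant first)."""
    value = 0
    for b in magnitude_bits:
        value = (value << 1) | b
    return -value if sign_bit else value

def overwrite_low_bits(magnitude_bits, payload_bits, keep_msb=3):
    """Keep the top keep_msb bits and replace the remaining bits with a payload."""
    free = len(magnitude_bits) - keep_msb
    if len(payload_bits) != free:
        raise ValueError("payload must fill exactly the low-order bits")
    return magnitude_bits[:keep_msb] + list(payload_bits)

bits = [1, 0, 1, 0, 0, 0, 0, 0]
stuffed = overwrite_low_bits(bits, [1, 1, 1, 1, 1])
print(weight_from_bits(0, bits), weight_from_bits(0, stuffed))
```

In this 8-bit example the payload can move the weight by at most 31 out of a maximum magnitude of 255, which gives an intuition for why the low-order bits matter less than the leading ones.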
Yoon-Jae Yeo, Min-Cheol Sagong, Seung Park, Sung-Jea Ko, Yong-Goo Shin
Region-adaptive normalization (RAN) methods have been widely used in the
generative adversarial network (GAN)-based image-to-image translation
technique. However, since these approaches need a mask image to infer the
pixel-wise affine transformation parameters, they cannot be applied to the
general image generation models having no paired mask images. To resolve this
problem, this paper presents a novel normalization method, called self
pixel-wise normalization (SPN), which effectively boosts the generative
performance by performing the pixel-adaptive affine transformation without the
mask image. In our method, the transforming parameters are derived from a
self-latent mask that divides the feature map into the foreground and
background regions. The visualization of the self-latent masks shows that SPN
effectively captures a single object to be generated as the foreground. Since
the proposed method produces the self-latent mask without external data, it is
easily applicable to existing generative models. Extensive experiments on
various datasets reveal that the proposed method significantly improves the
performance of image generation techniques in terms of Fr\'echet inception
distance (FID) and Inception score (IS).
Authors' comments: 13 pages, 8 figures
Denise Hung, Josef Hanuš, Joseph R. Masiero, David J. Tholen
We present new thermophysical model (TPM) fits of 1,847 asteroids, deriving
thermal inertia, diameter, and Bond and visible geometric albedo. We use
thermal flux measurements obtained by the Wide-field Infrared Survey Explorer
(WISE; Wright et al. 2010; Mainzer et al. 2011) during its fully cryogenic
phase, when both the 12$\mu$m (W3) and 22$\mu$m (W4) bands were available. We
take shape models and spin information from the Database of Asteroid Models
from Inversion Techniques (DAMIT; \v{D}urech et al. 2010) and derive new shape
models through lightcurve inversion and combining WISE photometry with existing
DAMIT lightcurves. When we limit our sample to the asteroids with the most
reliable shape models and thermal flux measurements, we find broadly consistent
thermal inertia relations with recent studies. We apply fits to the diameters
$D$ (km) and thermal inertia $\Gamma$ (J m$^{-2}$ s$^{-0.5}$ K$^{-1}$)
normalized to 1 au with a linear relation of the form
$\log[\Gamma]=\alpha+\beta\log[D]$, where we find $\alpha = 2.667 \pm 0.059$
and $\beta = -0.467 \pm 0.044$ for our sample alone and $\alpha = 2.509 \pm
0.017$ and $\beta = -0.352 \pm 0.012$ when combined with other literature
estimates. We find little evidence of any correlation between rotation period
and thermal inertia, owing to the small number of slow rotators to consider in
our sample. While the large uncertainties on the majority of our derived
thermal inertia only allow us to identify broad trends between thermal inertia
and other physical parameters, we can expect a significant increase in
high-quality thermal flux measurements and asteroid shape models with upcoming
infrared and wide-field surveys, enabling even more thermophysical modeling of
higher precision in the future.
Authors' comments: 86 pages, 17 figures, 6 tables, accepted by PSJ with proof
corrections. Table 5, now corrected, was erroneously published with a
different definition for "dense lightcurve", using 15 data points as a
minimum rather than 30, so several asteroids had higher numbers of dense
lightcurves than presented elsewhere in the paper
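The fitted relation $\log[\Gamma]=\alpha+\beta\log[D]$ quoted in the abstract above can be evaluated directly. The sketch below assumes the logarithm is base 10, which the abstract does not state explicitly, and uses the central values of the coefficients without propagating their uncertainties; the function name is illustrative.

```python
import math

def thermal_inertia(diameter_km, alpha=2.667, beta=-0.467):
    """Thermal inertia (J m^-2 s^-0.5 K^-1, normalized to 1 au) for a given diameter.

    Defaults are the coefficients quoted for the authors' sample alone;
    base-10 logarithms are an assumption.
    """
    return 10.0 ** (alpha + beta * math.log10(diameter_km))

for d in (1.0, 10.0, 100.0):
    own = thermal_inertia(d)
    combined = thermal_inertia(d, alpha=2.509, beta=-0.352)
    print(f"D = {d:6.1f} km  own sample: {own:7.1f}  with literature: {combined:7.1f}")
```

Because $\beta$ is negative in both fits, the predicted thermal inertia decreases with increasing diameter.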
Daiqing Li, Huan Ling, Seung Wook Kim, Karsten Kreis, Adela Barriuso, Sanja Fidler, Antonio Torralba
Annotating images with pixel-wise labels is a time-consuming and costly
process. Recently, DatasetGAN showcased a promising alternative - to synthesize
a large labeled dataset via a generative adversarial network (GAN) by
exploiting a small set of manually labeled, GAN-generated images. Here, we
scale DatasetGAN to ImageNet scale of class diversity. We take image samples
from the class-conditional generative model BigGAN trained on ImageNet, and
manually annotate 5 images per class, for all 1k classes. By training an
effective feature segmentation architecture on top of BigGAN, we turn BigGAN
into a labeled dataset generator. We further show that VQGAN can similarly
serve as a dataset generator, leveraging the already annotated data. We create
a new ImageNet benchmark by labeling an additional set of 8k real images and
evaluate segmentation performance in a variety of settings. Through an
extensive ablation study we show big gains in leveraging a large generated
dataset to train different supervised and self-supervised backbone models on
pixel-wise tasks. Furthermore, we demonstrate that using our synthesized
datasets for pre-training leads to improvements over standard ImageNet
pre-training on several downstream datasets, such as PASCAL-VOC, MS-COCO,
Cityscapes and chest X-ray, as well as tasks (detection, segmentation). We
will make our benchmark public and maintain a leaderboard for this challenging
task. Project Page: https://nv-tlabs.github.io/big-datasetgan/
Authors' comments: https://nv-tlabs.github.io/big-datasetgan/
Zemin Liu, Yuan Fang, Chenghao Liu, Steven C. H. Hoi
Graph neural networks (GNNs) emerge as a powerful family of representation learning models on graphs. To derive node representations, they utilize a global model that recursively aggregates information from the neighboring nodes. However, different nodes reside at different parts of the graph in different local contexts, making their distributions vary across the graph. Ideally, how a node receives its neighborhood information should be a function of its local context, to diverge from the global GNN model shared by all nodes. To utilize node locality without overfitting, we propose a node-wise localization of GNNs by accounting for both global and local aspects of the graph. Globally, all nodes on the graph depend on an underlying global GNN to encode the general patterns across the graph; locally, each node is localized into a unique model as a function of the global model and its local context. Finally, we conduct extensive experiments on four benchmark graphs, and consistently obtain promising performance surpassing the state-of-the-art GNNs.
Tim Dettmers, Mike Lewis, Sam Shleifer, Luke Zettlemoyer
Stateful optimizers maintain gradient statistics over time, e.g., the
exponentially smoothed sum (SGD with momentum) or squared sum (Adam) of past
gradient values. This state can be used to accelerate optimization compared to
plain stochastic gradient descent but uses memory that might otherwise be
allocated to model parameters, thereby limiting the maximum size of models
trained in practice. In this paper, we develop the first optimizers that use
8-bit statistics while maintaining the performance levels of using 32-bit
optimizer states. To overcome the resulting computational, quantization, and
stability challenges, we develop block-wise dynamic quantization. Block-wise
quantization divides input tensors into smaller blocks that are independently
quantized. Each block is processed in parallel across cores, yielding faster
optimization and high precision quantization. To maintain stability and
performance, we combine block-wise quantization with two additional changes:
(1) dynamic quantization, a form of non-linear quantization that is precise for
both large and small magnitude values, and (2) a stable embedding layer to
reduce gradient variance that comes from the highly non-uniform distribution of
input tokens in language models. As a result, our 8-bit optimizers maintain
32-bit performance with a small fraction of the memory footprint on a range of
tasks, including 1.5B parameter language modeling, GLUE finetuning, ImageNet
classification, WMT'14 machine translation, MoCo v2 contrastive ImageNet
pretraining+finetuning, and RoBERTa pretraining, without changes to the
original optimizer hyperparameters. We open-source our 8-bit optimizers as a
drop-in replacement that only requires a two-line code change.
Authors' comments: ICLR2022 spotlight version
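The block-wise idea in the abstract above (split a tensor into blocks and quantize each block independently) can be sketched with a simple linear scheme. This is a simplification, not the paper's method: it uses linear absmax scaling to signed 8-bit codes rather than the dynamic (non-linear) quantization described in the abstract, works on a flat Python list, and uses a tiny block size for readability.

```python
def quantize_blockwise(values, block_size=4):
    """Quantize each block to integer codes in [-127, 127] with its own absmax scale."""
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block) or 1.0   # avoid dividing by zero
        codes = [round(v / absmax * 127) for v in block]
        blocks.append((absmax, codes))
    return blocks

def dequantize_blockwise(blocks):
    """Map the integer codes back to floats using each block's stored scale."""
    return [c * absmax / 127 for absmax, codes in blocks for c in codes]

state = [0.01, -0.02, 0.015, 0.005, 100.0, -50.0, 25.0, 10.0]
restored = dequantize_blockwise(quantize_blockwise(state))
print(restored)
```

Because each block carries its own scale, the outlier 100.0 in the second block does not degrade the small values in the first block; with one global scale those small values would all round to zero.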
Denishrouf Thesingarajah, Adam M. Johansen
Motivated by problems from neuroimaging in which existing approaches make use
of "mass univariate" analysis which neglects spatial structure entirely, but
the full joint modelling of all quantities of interest is computationally
infeasible, a novel method for incorporating spatial dependence within a
(potentially large) family of model-selection problems is presented. Spatial
dependence is encoded via a Markov random field model for which a variant of
the pseudo-marginal Markov chain Monte Carlo algorithm is developed and
extended by a further augmentation of the underlying state space. This approach
allows the exploitation of existing unbiased marginal likelihood estimators
used in settings in which spatial independence is normally assumed thereby
facilitating the incorporation of spatial dependence using non-spatial
estimates with minimal additional development effort.
The proposed algorithm can be realistically used for analysis of
moderately sized data sets such as $2$D slices of whole
$3$D dynamic PET brain images or other regions of interest. Principled
approximations of the proposed method, together with simple extensions based on
the augmented spaces, are investigated and shown to provide similar results to
the full pseudo-marginal method. Such approximations and extensions allow the
improved performance obtained by incorporating spatial dependence to be
obtained at negligible additional cost. An application to measured PET image
data shows notable improvements in revealing underlying spatial structure when
compared to current methods that assume spatial independence.
Authors' comments: 37 pages, 17 figures, 1 table
Juan Miguel Valverde, Jussi Tohka
We propose Region-wise (RW) loss for biomedical image segmentation. Region-wise loss is versatile, can simultaneously account for class imbalance and pixel importance, and it can be easily implemented as the pixel-wise multiplication between the softmax output and a RW map. We show that, under the proposed RW loss framework, certain loss functions, such as Active Contour and Boundary loss, can be reformulated similarly with appropriate RW maps, thus revealing their underlying similarities and a new perspective to understand these loss functions. We investigate the observed optimization instability caused by certain RW maps, such as Boundary loss distance maps, and we introduce a mathematically-grounded principle to avoid such instability. This principle provides excellent adaptability to any dataset and practically ensures convergence without extra regularization terms or optimization tricks. Following this principle, we propose a simple version of boundary distance maps called rectified Region-wise (RRW) maps that, as we demonstrate in our experiments, achieve state-of-the-art performance with similar or better Dice coefficients and Hausdorff distances than Dice, Focal, weighted Cross entropy, and Boundary losses in three distinct segmentation tasks. We quantify the optimization instability provided by Boundary loss distance maps, and we empirically show that our RRW maps are stable to optimize. The code to run all our experiments is publicly available at: https://github.com/jmlipman/RegionWiseLoss.
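The abstract above describes the loss as a pixel-wise multiplication between the softmax output and a RW map. A minimal sketch of that computation follows. The averaging over pixels, the nested-list layout, and the example map (negative weight on the true class, positive weight elsewhere) are assumptions made for illustration; the authors' repository should be consulted for the actual definition and for the rectified RW maps.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    e = [math.exp(v - m) for v in logits]
    s = sum(e)
    return [v / s for v in e]

def region_wise_loss(logits, rw_map):
    """Mean over pixels of sum_k softmax(logits)[k] * rw_map[k].

    logits, rw_map: one list per pixel, each holding one value per class.
    """
    total = 0.0
    for pixel_logits, pixel_weights in zip(logits, rw_map):
        probs = softmax(pixel_logits)
        total += sum(p * w for p, w in zip(probs, pixel_weights))
    return total / len(logits)

# Two pixels, two classes; hypothetical map rewarding the true class (index 0).
print(region_wise_loss([[2.0, -1.0], [0.5, 0.5]], [[-1.0, 1.0], [-1.0, 1.0]]))
```

Different choices of `rw_map` change what the loss emphasizes, for example class frequency or distance to a boundary, which is the flexibility the abstract refers to.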
Samuel J. Goodman
I present WISEA J052305.94-015356.1 as a new candidate extremely metal-poor T
subdwarf (esdT), based on its distinctive infrared colours and high proper
motion ($\sim500\ $mas/yr). Spectroscopic follow-up is now needed to confirm it
is a member of this newly discovered class of substellar objects.
Authors' comments: Published in RNAAS
Yan Bin Ng, Basura Fernando
We present a new architecture for human action forecasting from videos. A temporal recurrent encoder captures temporal information of input videos, while a self-attention model attends to relevant feature dimensions of the input space. To handle temporal variations in observed video data, a feature masking technique is employed. We classify observed actions accurately using an auxiliary classifier, which helps to understand what has happened so far. The decoder then generates future actions based on the output of the recurrent encoder and the self-attention model. Experimentally, we validate each component of our architecture and observe the impact of self-attention for identifying relevant feature dimensions, of temporal masking, and of the auxiliary classifier for observed actions. We evaluate our method on two standard action forecasting benchmarks and obtain state-of-the-art results.
Sara Venkatesh
We investigate leaf-wise intersection points on hypersurfaces of contact type in monotone symplectic manifolds. We show that monotone Floer-essential Lagrangians detect periodic leaf-wise intersection points in hypersurfaces of contact type whose Reeb flow is Zoll. Examples include the prequantization bundles appearing in monotone toric negative line bundles. Generalizing, we prove the existence of leaf-wise intersection points for certain annulus subbundles in weak+-monotone negative line bundles, not necessarily toric. The proofs combine reduced symplectic cohomology with the original methods employed by Albers-Frauenfelder to prove global existence results of this kind.
Zhenliang He, Meina Kan, Shiguang Shan
Recent studies on Generative Adversarial Network (GAN) reveal that different
layers of a generative CNN hold different semantics of the synthesized images.
However, few GAN models have explicit dimensions to control the semantic
attributes represented in a specific layer. This paper proposes EigenGAN, which
is able to mine interpretable and controllable dimensions from
different generator layers in an unsupervised manner. Specifically, EigenGAN embeds one linear subspace
with orthogonal basis into each generator layer. Via generative adversarial
training to learn a target distribution, these layer-wise subspaces
automatically discover a set of "eigen-dimensions" at each layer corresponding
to a set of semantic attributes or interpretable variations. By traversing the
coefficient of a specific eigen-dimension, the generator can produce samples
with continuous changes corresponding to a specific semantic attribute. Taking
the human face for example, EigenGAN can discover controllable dimensions for
high-level concepts such as pose and gender in the subspace of deep layers, as
well as low-level concepts such as hue and color in the subspace of shallow
layers. Moreover, in the linear case, we theoretically prove that our algorithm
derives the principal components as PCA does. Code can be found at
https://github.com/LynnHo/EigenGAN-Tensorflow.
Authors' comments: ICCV 2021. Code: https://github.com/LynnHo/EigenGAN-Tensorflow