Pallavi Patil, Mark Whittle, Kristina Nyland, Carol Lonsdale, Mark Lacy, Amy E. Kimball, Colin Lonsdale, Wendy Peters et al.
We present radio spectra spanning $0.1 - 10$ GHz for the sample of heavily
obscured luminous quasars with extremely red mid-infrared-optical colors and
compact radio emission. The spectra are constructed from targeted 10 GHz
observations and archival radio survey data, which together yield $6-11$ flux
density measurements for each object. Our suite of Python tools for modeling
the radio spectra is publicly available on GitHub. Our primary result is that
most (61%) of the sample have peaked or curved radio spectra and many (36%)
could be classified as Gigahertz Peaked Spectrum (GPS) sources. This indicates
compact emission regions likely arising from recently triggered radio jets.
Assuming synchrotron self-absorption (SSA) generates the peaks, we infer
compact source sizes ($3 - 100$ pc) with strong magnetic fields ($6 - 100$ mG)
and young ages ($30 - 10^4$ years). Alternatively, free-free absorption (FFA)
could also create peaks due to the high column densities associated with the
deeply embedded nature of the sample. However, we find no correlations between
the existence or frequency of the peaks and any parameters of the MIR emission.
The high-frequency spectral indices are steep ($\alpha \approx -1$) and
correlate, weakly, with the ratio of MIR photon energy density to magnetic
energy density, suggesting that the spectral steepening could arise from
inverse Compton scattering off the intense MIR photon field. This study
provides a foundation for combining multi-frequency and mixed-resolution radio
survey data for understanding the impact of young radio jets on the ISM and
star formation rates of their host galaxies.
Authors' comments: 48 pages, 17 figures, published in Astrophysical Journal
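The spectral index quoted in the abstract above can be illustrated with a two-point estimate. This is a minimal sketch assuming the convention $S_\nu \propto \nu^{\alpha}$ (consistent with "steep" meaning $\alpha \approx -1$); the function name and the flux densities are hypothetical and are not taken from the authors' Python tools.

```python
import math

def spectral_index(s1, nu1, s2, nu2):
    """Two-point spectral index alpha, assuming S ~ nu**alpha.

    s1, s2 are flux densities (same units) at frequencies nu1, nu2 (same units).
    """
    return math.log(s2 / s1) / math.log(nu2 / nu1)

# Hypothetical flux densities (mJy) at 1.4 GHz and 10 GHz, for illustration only.
alpha = spectral_index(10.0, 1.4, 1.4, 10.0)
print(f"alpha = {alpha:.2f}")
```

A peaked spectrum would show a positive index below the turnover frequency and a negative one above it, so applying the same function to adjacent frequency pairs gives a rough curvature check.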
Chen Lin, Zheyang Li, Bo Peng, Haoji Hu, Wenming Tan, Ye Ren, Shiliang Pu
This paper introduces a post-training quantization (PTQ) method that achieves
highly efficient Convolutional Neural Network (CNN) quantization with high
performance. Previous PTQ methods usually reduce compression error by
calibrating parameters layer by layer. However, because extremely compressed
parameters (e.g., bit-widths below 4) have lower representational ability,
it is hard to eliminate all the layer-wise errors.
This work addresses the issue by proposing a unit-wise feature reconstruction
algorithm based on an observation about the second-order Taylor series expansion of
the unit-wise error. The expansion indicates that leveraging the interaction between
adjacent layers' parameters can compensate for layer-wise errors better. In this
paper, we define several adjacent layers as a Basic-Unit and present a
unit-wise post-training algorithm that minimizes quantization error. This
method achieves near-original accuracy on ImageNet and COCO when quantizing
FP32 models to INT4 and INT3.
Authors' comments: Accepted by BMVC 2021
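To make the bit-width issue in the abstract above concrete, the following minimal sketch applies plain symmetric uniform quantization to a random weight tensor and measures the reconstruction error at 8, 4, and 3 bits. It is not the unit-wise reconstruction algorithm of the paper; the per-tensor scale, the function names, and the random weights are assumptions made for illustration.

```python
import numpy as np

def quantize_symmetric(w, num_bits):
    """Uniform symmetric quantization of a float array to signed integers."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 7 for INT4
    scale = float(np.max(np.abs(w))) / qmax   # per-tensor scale (assumption)
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map the integer codes back to floating point."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)  # stand-in for a layer's weights

err = {}
for bits in (8, 4, 3):
    q, s = quantize_symmetric(w, bits)
    err[bits] = float(np.mean((w - dequantize(q, s)) ** 2))
    print(bits, err[bits])
```

The mean squared error grows quickly as the bit-width drops, which is the regime where calibrating each layer in isolation struggles.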
Surajit Borkotokey, Sujata Goala, Niharika Kakoty, Parishmita Boruah
We introduce the component-wise egalitarian Myerson value for network games. This new value, a convex combination of the Myerson value and the component-wise equal division rule, is a player-based allocation rule. In network games under the cooperative framework, the Myerson value is an extreme example of marginalism, while the equal division rule signifies egalitarianism. In the proposed component-wise egalitarian Myerson value, a convexity parameter combines these two attributes and determines the degree of solidarity among the players. Here, by solidarity, we mean the mutual support or compensation among the players in a network. We provide three axiomatic characterizations of the value. Further, we propose an implementation mechanism for the component-wise egalitarian Myerson value under subgame perfect Nash equilibrium.
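The convex combination described in the abstract above can be written out as follows. The notation is assumed for illustration and may differ from the paper's: $g$ is the network, $v$ the value function, $C_i(g)$ the component of $g$ containing player $i$, $Y^{MV}$ the Myerson value, and $\alpha \in [0,1]$ the convexity parameter. Which endpoint $\alpha$ weights is also a convention chosen here.

```latex
% Assumed notation; alpha = 1 recovers the Myerson value,
% alpha = 0 recovers component-wise equal division.
Y_i^{\alpha}(g, v) \;=\; \alpha\, Y_i^{MV}(g, v)
  \;+\; (1-\alpha)\, \frac{v\bigl(g|_{C_i(g)}\bigr)}{\lvert C_i(g)\rvert},
  \qquad \alpha \in [0,1].
```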
Théo Dessertaine, Jean-Philippe Bouchaud
We consider a simple model for multidimensional cone-wise linear dynamics
around cusp-like equilibria. We assume that the local linear evolution is
either $\mathbf{v}^\prime=\mathbb{A}\mathbf{v}$ or $\mathbb{B}\mathbf{v}$ (with
$\mathbb{A}$, $\mathbb{B}$ independently drawn from a rotationally invariant
ensemble of $N \times N$ matrices) depending on the sign of the first component
of $\mathbf{v}$. We establish strong connections with the random diffusion
persistence problem. When $N \to \infty$, we find that the Lyapunov exponent
is non-self-averaging, i.e., one can observe apparent stability and apparent
instability for the same system, depending on time and initial conditions.
Finite $N$ effects are also discussed, and lead to cone trapping phenomena.
Authors' comments: 5 pages, 4 figures
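The switching rule in the abstract above can be simulated directly. The sketch below rests on several assumptions that are not stated in the abstract: the prime is read as a time derivative and integrated with forward Euler steps, the rotationally invariant ensemble is represented by Gaussian matrices with i.i.d. $N(0, 1/N)$ entries, and the function name and parameters are hypothetical.

```python
import numpy as np

def lyapunov_estimate(n, steps, dt=0.01, seed=0):
    """Finite-time growth-rate estimate for cone-wise linear dynamics.

    The generator is A when the first component of v is positive, else B.
    """
    rng = np.random.default_rng(seed)
    # Assumption: i.i.d. Gaussian matrices stand in for the ensemble.
    a = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))
    b = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))
    v = rng.normal(size=n)
    v /= np.linalg.norm(v)
    log_growth = 0.0
    for _ in range(steps):
        m = a if v[0] > 0 else b          # choose the active cone
        v = v + dt * (m @ v)              # forward Euler step
        norm = np.linalg.norm(v)
        log_growth += np.log(norm)        # accumulate the log of the growth factor
        v /= norm                         # renormalize to avoid overflow
    return log_growth / (steps * dt)

# Different seeds draw different matrices and initial conditions.
estimates = [lyapunov_estimate(n=20, steps=5000, seed=s) for s in range(3)]
print(estimates)
```

Comparing estimates across seeds and run lengths gives a rough feel for how much a finite-time growth rate can fluctuate from one realization to another.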
Hao Sun, Taiyi Wang
Although it is well known that exploration plays a key role in Reinforcement Learning (RL), prevailing exploration strategies for continuous control tasks are mainly based on naive isotropic Gaussian noise, which ignores the causal relationship between the action space and the task and treats all action dimensions as equally important. In this work, we propose to conduct interventions on the primal action space to discover the causal relationship between the action space and the task reward. We propose the State-Wise Action Refined (SWAR) method, which addresses the issue of action space redundancy and promotes causality discovery in RL. We formulate causality discovery in RL tasks as a state-dependent action space selection problem and propose two practical algorithms as solutions. The first approach, TD-SWAR, detects task-related actions during temporal difference learning, while the second approach, Dyn-SWAR, reveals important actions through dynamic model prediction. Empirically, both methods provide ways to understand the decisions made by RL agents and improve learning efficiency in action-redundant tasks.
Hunmin Lee, Yueyang Liu, Donghyun Kim, Yingshu Li
Non-IID datasets and the heterogeneous environments of local clients are regarded as a major issue in Federated Learning (FL), slowing convergence and preventing satisfactory performance. In this paper, we propose a novel Label-wise clustering algorithm that guarantees trainability among geographically dispersed heterogeneous local clients by selecting only the local models trained on datasets whose class labels are approximately uniformly distributed, which is likely to minimize the loss faster and increase accuracy across the FL network. Through experiments on six suggested common non-IID scenarios, we empirically show that the vanilla FL aggregation model cannot achieve robust convergence: it generates biased pre-trained local models and, in the worst case, drifts the local weights so that training is misled. Moreover, we quantitatively estimate the expected performance of the local models before training, which allows the global server to select the optimal clients and saves additional computational costs. Ultimately, to resolve non-convergence in such non-IID situations, we design clustering algorithms based on local input class labels that accommodate client diversity and group the clients that can lead the overall system to swift convergence as global training continues. Through multiple experiments, we show that the proposed Label-wise clustering achieves prompt and robust convergence compared to other FL algorithms when local training datasets are non-IID or coexist with IID data.
Lanning Wei, Huan Zhao, Zhiqiang He
In recent years, Graph Neural Networks (GNNs) have shown superior performance in diverse applications on real-world datasets. To improve model capacity and alleviate the over-smoothing problem, several methods have been proposed that incorporate the intermediate layers through layer-wise connections. However, because graph types are highly diverse, the performance of existing methods varies across graphs, leading to a need for data-specific layer-wise connection methods. To address this problem, we propose a novel framework, LLC (Learn Layer-wise Connections), based on neural architecture search (NAS) to learn adaptive connections among intermediate layers in GNNs. LLC contains a novel search space, which consists of 3 types of blocks and learnable connections, and a differentiable search algorithm that enables an efficient search process. Extensive experiments on five real-world datasets are conducted, and the results show that the searched layer-wise connections can not only improve the performance but also alleviate the over-smoothing problem.
Norihide Tokushige
Let $\mathcal G$ be a family of subsets of an $n$-element set. The family $\mathcal G$ is called $3$-wise $t$-intersecting if the intersection of any three subsets in $\mathcal G$ is of size at least $t$. For a real number $p\in(0,1)$ we define the measure of the family by the sum of $p^{|G|}(1-p)^{n-|G|}$ over all $G\in\mathcal G$. We prove that if $t\geq 15$ and $p\leq 2/(\sqrt{4t+9}-1)$ then $p^t$ is the maximum measure of $3$-wise $t$-intersecting families, and the bound for $p$ is sharp. We also present the corresponding stability result for shifted families.
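The definitions in the abstract above can be checked by brute force on a small ground set. The sketch below computes the $p$-measure of the family of all subsets containing $\{1,\dots,t\}$ and confirms that it equals $p^t$. The small values of $n$, $t$, and $p$ are chosen only to illustrate the definitions; they lie outside the range $t \geq 15$ covered by the theorem, and the function names are hypothetical.

```python
from itertools import combinations

def measure(family, n, p):
    """Sum of p^|G| (1-p)^(n-|G|) over all G in the family."""
    return sum(p ** len(g) * (1 - p) ** (n - len(g)) for g in family)

def is_3wise_t_intersecting(family, t):
    """Check every ordered triple (repeats included) for |A & B & C| >= t."""
    fam = list(family)
    return all(len(a & b & c) >= t for a in fam for b in fam for c in fam)

def star_family(n, t):
    """All subsets of {1..n} that contain {1..t}."""
    core = frozenset(range(1, t + 1))
    rest = range(t + 1, n + 1)
    return [core | frozenset(extra)
            for r in range(len(rest) + 1)
            for extra in combinations(rest, r)]

n, t, p = 6, 2, 0.3
fam = star_family(n, t)
m = measure(fam, n, p)
print(len(fam), m, p ** t)
```

The equality holds because the factors for the elements outside the core sum to one over all choices of the extra elements.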
Srimanta Mandal, Kuldeep Purohit, A. N. Rajagopalan
In practice, images can contain different amounts of noise for different color channels, which is not acknowledged by existing super-resolution approaches. In this paper, we propose to super-resolve noisy color images by considering the color channels jointly. Noise statistics are blindly estimated from the input low-resolution image and are used to assign different weights to different color channels in the data cost. Implicit low-rank structure of visual data is enforced via nuclear norm minimization in association with adaptive weights, which is added as a regularization term to the cost. Additionally, multi-scale details of the image are added to the model through another regularization term that involves projection onto PCA basis, which is constructed using similar patches extracted across different scales of the input image. The results demonstrate the super-resolving capability of the approach in real scenarios.
Vijay Lingam, Chanakya Ekbote, Manan Sharma, Rahul Ragesh, Arun Iyer, Sundararajan Sellamanickam
Graph Neural Networks (GNNs) exploit signals from node features and the input
graph topology to improve node classification task performance. However, these
models tend to perform poorly on heterophilic graphs, where connected nodes
have different labels. Recently proposed GNNs work across graphs having varying
levels of homophily. Among these, models relying on polynomial graph filters
have shown promise. We observe that solutions to these polynomial graph filter
models are also solutions to an overdetermined system of equations. This suggests
that in some instances, the model needs to learn a reasonably high order
polynomial. On investigation, we find the proposed models ineffective at
learning such polynomials due to their designs. To mitigate this issue, we
perform an eigendecomposition of the graph and propose to learn multiple
adaptive polynomial filters acting on different subsets of the spectrum. We
theoretically and empirically show that our proposed model learns a better
filter, thereby improving classification accuracy. We study various aspects of
our proposed model, including its dependency on the number of eigencomponents
utilized, the latent polynomial filters learned, and the performance of the individual
polynomials on the node classification task. We further show that our model is
scalable by evaluating over large graphs. Our model achieves performance gains
of up to 5% over the state-of-the-art models and outperforms existing
polynomial filter-based approaches in general.
Authors' comments: 28 pages, 9 figures, Under Review
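The relationship between a polynomial graph filter and the eigendecomposition mentioned in the abstract above can be sketched with NumPy. This is a generic illustration and not the authors' model: the symmetric normalized adjacency, the toy path graph, the fixed coefficients, and the split of the spectrum at zero are all assumptions.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    return inv_sqrt[:, None] * adj * inv_sqrt[None, :]

def poly_filter(a_hat, x, coeffs):
    """Apply sum_k coeffs[k] * a_hat^k to the node features x."""
    out = np.zeros_like(x)
    power = x.copy()
    for c in coeffs:
        out += c * power
        power = a_hat @ power
    return out

def spectral_filter(a_hat, x, response):
    """Apply a filter defined by its response on the eigenvalues of a_hat."""
    lam, u = np.linalg.eigh(a_hat)
    return u @ (response(lam)[:, None] * (u.T @ x))

# Toy path graph on 4 nodes with 2 features per node.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
a_hat = normalized_adjacency(adj)
x = np.arange(8, dtype=float).reshape(4, 2)

coeffs = [0.5, 0.3, 0.2]  # lowest order first
direct = poly_filter(a_hat, x, coeffs)
spectral = spectral_filter(a_hat, x, lambda lam: np.polyval(coeffs[::-1], lam))

# Two different polynomials acting on different parts of the spectrum.
low, high = [1.0, 0.5], [1.0, -0.5]
piecewise = spectral_filter(
    a_hat, x,
    lambda lam: np.where(lam >= 0,
                         np.polyval(low[::-1], lam),
                         np.polyval(high[::-1], lam)))
```

A single polynomial applied in the vertex domain matches its spectral form, while the piecewise response shows how separate low-order polynomials on subsets of the spectrum can express a shape that one low-order polynomial cannot.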
Zhiguo Huang, Xiaowei Chen, Bojuan Wang
Numerous works have shown that existing neighbor-averaging Graph Neural
Networks (GNNs) cannot efficiently capture structural features, and many works show that
injecting structure, distance, position, or spatial features can significantly
improve the performance of GNNs. However, injecting overall structure and distance
into GNNs is an intuitive idea that remains unexplored. In this work, we shed
light on this direction. We first extract hop-wise structure information and
compute distance distribution information, combine them with each node's intrinsic
features, embed them into the same vector space, and then add them up. The
derived embedding vectors are then fed into GATs (such as GAT and AGDN) and then
Correct and Smooth. Experiments show that the resulting DHSEGATs achieve competitive
results. The code is available at https://github.com/hzg0601/DHSEGATs.
Authors' comments: 11 pages; 1 figure
Tal Shaharabany, Lior Wolf
The leading segmentation methods represent the output map as a pixel grid. We study an alternative representation in which the object edges are modeled, per image patch, as a polygon with $k$ vertices that is coupled with per-patch label probabilities. The vertices are optimized by employing a differentiable neural renderer to create a raster image. The delineated region is then compared with the ground truth segmentation. Our method obtains multiple state-of-the-art results: 76.26\% mIoU on the Cityscapes validation, 90.92\% IoU on the Vaihingen building segmentation benchmark, 66.82\% IoU for the MoNU microscopy dataset, and 90.91\% for the bird benchmark CUB. Our code for training and reproducing these results is attached as supplementary.
Kento Hasegawa, Kazuki Yamashita, Seira Hidano, Kazuhide Fukushima, Kazuo Hashimoto, Nozomu Togawa
In the fourth industrial revolution, securing the protection of the supply chain has become an ever-growing concern. One such cyber threat is a hardware Trojan (HT), a malicious modification to an IC. HTs are often identified in the hardware manufacturing process, but should be removed earlier, when the design is being specified. Machine learning-based HT detection in gate-level netlists is an efficient approach to identify HTs at the early stage. However, feature-based modeling has limitations in discovering an appropriate set of HT features. We thus propose NHTD-GL in this paper, a novel node-wise HT detection method based on graph learning (GL). Given the formal analysis of HT features obtained from domain knowledge, NHTD-GL bridges the gap between graph representation learning and feature-based HT detection. The experimental results demonstrate that NHTD-GL achieves 0.998 detection accuracy and outperforms state-of-the-art node-wise HT detection methods. NHTD-GL extracts HT features without heuristic feature engineering.
Hyunmin Lee, Jaesik Park
In this paper, we introduce a new dataset, named InstaOrder, that can be used
to understand the geometrical relationships of instances in an image. The
dataset consists of 2.9M annotations of geometric orderings for class-labeled
instances in 101K natural scenes. The scenes were annotated by 3,659
crowd-workers regarding (1) occlusion order that identifies occluder/occludee
and (2) depth order that describes ordinal relations that consider relative
distance from the camera. The dataset provides joint annotation of two kinds of
orderings for the same instances, and we discover that the occlusion order and
depth order are complementary. We also introduce a geometric order prediction
network called InstaOrderNet, which is superior to state-of-the-art approaches.
Moreover, we propose a dense depth prediction network called InstaDepthNet that
uses auxiliary geometric order loss to boost the accuracy of the
state-of-the-art depth prediction approach, MiDaS [56].
Authors' comments: Accepted to CVPR 2022. Code is available at
https://github.com/POSTECH-CVLab/InstaOrder
Barnabás Janzer
A family $\mathcal{F}$ of subsets of $\{1,\dots,n\}$ is called $k$-wise
intersecting if any $k$ members of $\mathcal{F}$ have non-empty intersection,
and it is called maximal $k$-wise intersecting if no family strictly containing
$\mathcal{F}$ satisfies this condition. We show that for each $k\geq 2$ there
is a maximal $k$-wise intersecting family of size $O(2^{n/(k-1)})$. Up to a
constant factor, this matches the best known lower bound, and answers an old
question of Erd\H{o}s and Kleitman, recently studied by Hendrey, Lund,
Tompkins, and Tran.
Authors' comments: 4 pages; added a new section about the non-existence of certain types
of constructions
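The two definitions in the abstract above can be checked by exhaustive search for very small $n$. This is a brute-force sketch for illustration only and has nothing to do with the paper's construction; the function names are hypothetical, and groups of $k$ members are taken with repetition, which is an assumed reading of "any $k$ members".

```python
from itertools import combinations, combinations_with_replacement

def is_k_wise_intersecting(family, k):
    """True if every group of k members (repeats allowed) has a common element."""
    fam = list(family)
    return all(frozenset.intersection(*group)
               for group in combinations_with_replacement(fam, k))

def all_subsets(n):
    """All subsets of {1..n} as frozensets."""
    items = range(1, n + 1)
    return [frozenset(c) for r in range(n + 1) for c in combinations(items, r)]

def is_maximal(family, k, n):
    """True if the family is k-wise intersecting and no further set can be added."""
    fam = set(family)
    if not is_k_wise_intersecting(fam, k):
        return False
    return all(not is_k_wise_intersecting(fam | {s}, k)
               for s in all_subsets(n) if s not in fam)

# All subsets of {1,2,3} containing 1: a maximal 2-wise intersecting family.
star = [s for s in all_subsets(3) if 1 in s]
single = [frozenset({1, 2, 3})]
print(is_maximal(star, 2, 3), is_maximal(single, 2, 3))
```

The search is exponential in $n$, so it is only useful as a sanity check on tiny examples.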
Marco Colussi, Stavros Ntalampiras
After constructing a deep neural network for urban sound classification, this work focuses on the sensitive application of assisting drivers suffering from hearing loss. As such, a clear etiology that justifies and interprets model predictions is a strong requirement. To this end, we use two different representations of audio signals, i.e., Mel and constant-Q spectrograms, while the decisions made by the deep neural network are explained via layer-wise relevance propagation. At the same time, frequency content assigned high relevance in both feature sets indicates extremely discriminative information characterizing the present classification task. Overall, we present an explainable AI framework for understanding deep urban sound classification.
Yuanhao Cai, Jing Lin, Xiaowan Hu, Haoqian Wang, Xin Yuan, Yulun Zhang, Radu Timofte, Luc Van Gool
Hyperspectral image (HSI) reconstruction aims to recover the 3D
spatial-spectral signal from a 2D measurement in the coded aperture snapshot
spectral imaging (CASSI) system. The HSI representations are highly similar and
correlated across the spectral dimension. Modeling the inter-spectra
interactions is beneficial for HSI reconstruction. However, existing CNN-based
methods show limitations in capturing spectral-wise similarity and long-range
dependencies. Besides, the HSI information is modulated by a coded aperture
(physical mask) in CASSI. Nonetheless, current algorithms have not fully
explored the guidance effect of the mask for HSI restoration. In this paper, we
propose a novel framework, Mask-guided Spectral-wise Transformer (MST), for HSI
reconstruction. Specifically, we present a Spectral-wise Multi-head
Self-Attention (S-MSA) that treats each spectral feature as a token and
calculates self-attention along the spectral dimension. In addition, we
customize a Mask-guided Mechanism (MM) that directs S-MSA to pay attention to
spatial regions with high-fidelity spectral representations. Extensive
experiments show that our MST significantly outperforms state-of-the-art (SOTA)
methods on simulation and real HSI datasets while requiring dramatically
cheaper computational and memory costs. Code and pre-trained models are
available at https://github.com/caiyuanhao1998/MST/
Authors' comments: CVPR 2022; The first Transformer-based method for snapshot
compressive imaging
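The idea of treating each spectral feature as a token, described in the abstract above, can be sketched in a few lines of NumPy. This is a simplified single-head illustration under stated assumptions and is not the code from the linked repository: there is no mask guidance, the learnable rescaling is replaced by a fixed `scale` argument, and the projection matrices are random.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spectral_self_attention(x, wq, wk, wv, scale=1.0):
    """Self-attention along the spectral dimension.

    x has shape (H*W, C): rows are spatial positions, columns are the C
    spectral channels, and each channel acts as one token.
    """
    q, k, v = x @ wq, x @ wk, x @ wv        # each (H*W, C)
    attn = softmax(scale * (k.T @ q))       # (C, C) channel-to-channel map
    out = v @ attn.T                        # each output channel mixes value channels
    return out, attn

rng = np.random.default_rng(0)
hw, c = 16, 4                               # hypothetical sizes
x = rng.normal(size=(hw, c))
wq, wk, wv = (rng.normal(size=(c, c)) for _ in range(3))
out, attn = spectral_self_attention(x, wq, wk, wv)
print(out.shape, attn.shape)
```

The attention map has size $C \times C$ rather than $HW \times HW$, so its cost grows with the number of spectral channels rather than with the number of pixels squared.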
Feng Liu, Zhe Kong, Haozhe Liu, Wentian Zhang, Linlin Shen
Due to the diversity of attack materials, automated fingerprint recognition systems
(AFRSs) are vulnerable to malicious attacks. It is thus important to propose
effective fingerprint presentation attack detection (PAD) methods for the
safety and reliability of AFRSs. However, current PAD methods often exhibit
poor robustness under new attack type settings. This paper thus proposes a
novel channel-wise feature denoising fingerprint PAD (CFD-PAD) method by
handling the redundant noise information ignored in previous studies. The
proposed method learns important features of fingerprint images by weighing the
importance of each channel and identifying discriminative channels and "noise"
channels. Then, the propagation of "noise" channels is suppressed in the
feature map to reduce interference. Specifically, a PA-Adaptation loss is
designed to constrain the feature distribution so that the features
of live fingerprints are more aggregated and those of spoof fingerprints more
dispersed. Experimental results evaluated on the LivDet 2017 dataset showed that
the proposed CFD-PAD can achieve a 2.53% average classification error (ACE) and
a 93.83% true detection rate when the false detection rate equals 1.0%
(TDR@FDR=1%). Also, the proposed method markedly outperforms the best
single-model-based methods in terms of ACE (2.53% vs. 4.56%) and
TDR@FDR=1% (93.83% vs. 73.32%), which demonstrates its effectiveness. Although
our result is comparable to that of the state-of-the-art
multiple-model-based methods, TDR@FDR=1% still increases from
91.19% to 93.83%. In addition, the proposed model is simpler, lighter and more
efficient and has achieved a 74.76% reduction in computation time compared with
the state-of-the-art multiple-model-based method. The source code is available
at https://github.com/kongzhecn/cfd-pad.
Authors' comments: 15 pages, 8 figures, Accepted by TIFS
Matteo Guarrera, Baihong Jin, Tung-Wei Lin, Maria Zuluaga, Yuxin Chen, Alberto Sangiovanni-Vincentelli
We consider the problem of detecting out-of-distribution (OoD) input data when
using deep neural networks, and we propose a simple yet effective way to
improve the robustness of several popular OoD detection methods against label
shift. Our work is motivated by the observation that most existing OoD
detection algorithms consider all training/test data as a whole, regardless of
which class entry each input activates (inter-class differences). Through
extensive experimentation, we have found that such practice leads to a detector
whose performance is sensitive and vulnerable to label shift. To address this
issue, we propose a class-wise thresholding scheme that can apply to most
existing OoD detection algorithms and can maintain similar OoD detection
performance even in the presence of label shift in the test distribution.
Authors' comments: 12 pages, 7 figures, 7 tables
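A class-wise thresholding scheme of the kind described in the abstract above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes a scalar detection score where higher means more in-distribution, a held-out in-distribution validation set, and a target of 95% retained samples per predicted class. The function names and the synthetic scores are hypothetical.

```python
import numpy as np

def fit_classwise_thresholds(scores, pred_classes, num_classes, keep=0.95):
    """One threshold per predicted class, keeping about `keep` of validation samples."""
    thresholds = np.full(num_classes, -np.inf)
    for c in range(num_classes):
        s = scores[pred_classes == c]
        if s.size:
            thresholds[c] = np.quantile(s, 1.0 - keep)
    return thresholds

def is_ood(scores, pred_classes, thresholds):
    """Flag a sample as OoD if its score is below its predicted class's threshold."""
    return scores < thresholds[pred_classes]

# Synthetic validation scores: class 0 is confidently scored, class 1 less so.
rng = np.random.default_rng(0)
val_scores = np.concatenate([rng.normal(0.9, 0.02, 1000),
                             rng.normal(0.6, 0.05, 1000)])
val_classes = np.repeat([0, 1], 1000)

thresholds = fit_classwise_thresholds(val_scores, val_classes, num_classes=2)
flagged = is_ood(val_scores, val_classes, thresholds)
rates = [float(flagged[val_classes == c].mean()) for c in (0, 1)]

# A single global threshold for comparison.
global_thr = np.quantile(val_scores, 0.05)
global_rates = [float((val_scores[val_classes == c] < global_thr).mean())
                for c in (0, 1)]
print(rates, global_rates)
```

With a single global threshold, nearly all rejected in-distribution samples come from the lower-scoring class, so the overall false rejection rate changes when the class mix changes. Per-class thresholds keep the rate roughly equal across classes.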
Wangyou Zhang, Zhuo Chen, Naoyuki Kanda, Shujie Liu, Jinyu Li, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao et al.
Multi-talker conversational speech processing has drawn much interest for
various applications such as meeting transcription. Speech separation is often
required to handle overlapped speech that is commonly observed in conversation.
Although the original utterance-level permutation invariant training-based
continuous speech separation approach has proven to be effective in various
conditions, it lacks the ability to leverage the long-span relationship of
utterances and is computationally inefficient due to the highly overlapped
sliding windows. To overcome these drawbacks, we propose a novel training
scheme named Group-PIT, which allows direct training of the speech separation
models on the long-form speech with a low computational cost for label
assignment. Two different speech separation approaches with Group-PIT are
explored, including direct long-span speech separation and short-span speech
separation with long-span tracking. The experiments on the simulated
meeting-style data demonstrate the effectiveness of our proposed approaches,
especially in dealing with a very long speech input.
Authors' comments: 5 pages, 3 figures, 3 tables, submitted to IEEE ICASSP 2022
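The utterance-level permutation invariant training that the abstract above takes as its starting point can be sketched as a search over speaker permutations. This illustrates the standard baseline objective only, not Group-PIT; the mean squared error stands in for whatever separation loss is actually used, and the function name and synthetic signals are hypothetical.

```python
from itertools import permutations
import numpy as np

def upit_mse(estimates, references):
    """Minimum MSE over all assignments of estimated to reference speakers.

    Both arrays have shape (num_speakers, num_samples).
    Returns (best_loss, best_perm), where best_perm[i] is the estimate index
    matched to reference i.
    """
    num_speakers = references.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in permutations(range(num_speakers)):
        loss = float(np.mean((estimates[list(perm)] - references) ** 2))
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm

rng = np.random.default_rng(0)
references = rng.normal(size=(2, 1600))
# Estimates are the references in swapped order plus a little noise.
estimates = references[::-1] + 0.01 * rng.normal(size=(2, 1600))
loss, perm = upit_mse(estimates, references)
print(loss, perm)
```

The number of permutations grows factorially with the number of output channels, and the search is repeated for every training segment, which is one source of the label-assignment cost the abstract refers to.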