Rui Li, Shenglong Zhou, Dong Liu
Video analysis tasks rely heavily on identifying the pixels from different
frames that correspond to the same visual target. To tackle this problem,
recent studies have advocated feature learning methods that aim to learn
distinctive representations to match the pixels, especially in a
self-supervised fashion. Unfortunately, these methods have difficulties for
tiny or even single-pixel visual targets. Pixel-wise video correspondences were
traditionally related to optical flows, which however lead to deterministic
correspondences and lack robustness on real-world videos. We address the
problem of learning features for establishing pixel-wise correspondences.
Motivated by optical flows as well as the self-supervised feature learning, we
propose to use not only labeled synthetic videos but also unlabeled real-world
videos for learning fine-grained representations in a holistic framework. We
adopt an adversarial learning scheme to enhance the generalization ability of
the learned features. Moreover, we design a coarse-to-fine framework to pursue
high computational efficiency. Our experimental results on a series of
correspondence-based tasks demonstrate that the proposed method outperforms
state-of-the-art rivals in both accuracy and efficiency.
Authors' comments: Accepted to ICCV 2023. Code and models are available at
https://github.com/qianduoduolr/FGVC
Chang-Bin Jeon, Kyogu Lee
The loudness war, an ongoing phenomenon in the music industry characterized
by the increasing final loudness of music while reducing its dynamic range, has
been a controversial topic for decades. Music mastering engineers have used
limiters to heavily compress and make music louder, which can induce ear
fatigue and hearing loss in listeners. In this paper, we introduce music
de-limiter networks that estimate uncompressed music from heavily compressed
signals. Inspired by the principle of a limiter, which performs sample-wise
gain reduction of a given signal, we propose the framework of sample-wise gain
inversion (SGI). We also present the musdb-XL-train dataset, consisting of 300k
segments created by applying a commercial limiter plug-in for training
real-world friendly de-limiter networks. Our proposed de-limiter network
achieves excellent performance with a scale-invariant source-to-distortion
ratio (SI-SDR) of 23.8 dB in reconstructing musdb-HQ from musdb-XL data, a
limiter-applied version of musdb-HQ. The training data, code, and model
weights are available in our repository
(https://github.com/jeonchangbin49/De-limiter).
Authors' comments: Accepted to IEEE Workshop on Applications of Signal Processing to
Audio and Acoustics (WASPAA) 2023
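The two quantities named in the abstract above can be illustrated with a minimal Python sketch. `si_sdr` follows the common definition of the metric (project the estimate onto the reference, then compare target energy with residual energy); removing the mean first is a convention assumed here. `invert_gain` is a hypothetical helper: it assumes a limiter that multiplies each sample by a gain in (0, 1] and that those gains are already known, whereas the de-limiter networks described above have to estimate them from the compressed signal.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant source-to-distortion ratio in dB for two sample lists."""
    if len(estimate) != len(reference) or not reference:
        raise ValueError("signals must be non-empty and of equal length")
    n = len(reference)
    # Assumed convention: remove the mean so a constant offset is ignored.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]
    ref_energy = sum(r * r for r in ref)
    if ref_energy == 0:
        raise ValueError("reference has no energy")
    # Optimal scaling of the reference makes the metric scale-invariant.
    alpha = sum(e * r for e, r in zip(est, ref)) / ref_energy
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    if residual_energy == 0:
        return math.inf
    return 10 * math.log10(target_energy / residual_energy)


def invert_gain(compressed, gains):
    """Undo sample-wise gain reduction, given per-sample gains in (0, 1]."""
    if len(compressed) != len(gains):
        raise ValueError("need one gain per sample")
    if any(g <= 0 or g > 1 for g in gains):
        raise ValueError("gains must lie in (0, 1]")
    return [x / g for x, g in zip(compressed, gains)]
```

Because the metric rescales the reference before comparing, an estimate that differs from the reference only by a constant gain scores arbitrarily high; only distortion that is not explained by scaling lowers the value.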
Hanyu Peng, Guanhua Fang, Ping Li
Instance-wise feature selection and ranking methods can achieve a good
selection of task-friendly features for each sample in the context of neural
networks. However, existing approaches that assume feature subsets to be
independent are imperfect when considering the dependency between features. To
address this limitation, we propose to incorporate the Gaussian copula, a
powerful mathematical technique for capturing correlations between variables,
into the current feature selection framework with no additional changes needed.
Experimental results on both synthetic and real datasets, in terms of
performance comparison and interpretability, demonstrate that our method is
capable of capturing meaningful correlations.
Authors' comments: 15 pages, UAI poster
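As a rough illustration of the idea in the abstract above, the following Python sketch draws a binary feature-selection mask whose entries are dependent through a Gaussian copula. The function names, the per-feature probabilities `probs`, and the correlation matrix `corr` are assumptions for illustration; how the paper produces these quantities and trains through the sampling step is not shown here.

```python
import math
import random


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def sample_copula_mask(probs, corr, rng):
    """Sample a 0/1 mask with marginals `probs` and Gaussian-copula dependence."""
    n = len(probs)
    lower = cholesky(corr)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Correlated standard normals: z = L @ noise.
    z = [sum(lower[i][k] * noise[k] for k in range(i + 1)) for i in range(n)]
    # Mapping through the normal CDF gives dependent Uniform(0, 1) variables.
    uniforms = [normal_cdf(value) for value in z]
    # Feature i is selected with probability probs[i].
    return [1 if u < p else 0 for u, p in zip(uniforms, probs)]
```

With an identity correlation matrix this reduces to independent Bernoulli selection, which is the independence assumption the abstract describes as a limitation.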
Yajie Cui, Zhaoxiang Liu, Shiguo Lian
Anomaly detection without priors on the anomalies is challenging. In unsupervised anomaly detection, the traditional auto-encoder (AE) approach tends to fail because it relies on the assumption that a model trained only on normal images will be unable to reconstruct abnormal images correctly. In contrast, we propose a novel patch-wise auto-encoder (Patch AE) framework, which aims to enhance the AE's ability to reconstruct anomalies rather than weaken it. Each image patch is reconstructed from the corresponding spatially distributed feature vector of the learned feature representation, i.e., patch-wise reconstruction, which ensures the anomaly sensitivity of the AE. Our method is simple and efficient. It advances the state-of-the-art performance on the MVTec AD benchmark, which demonstrates the effectiveness of our model, and it shows great potential in practical industrial application scenarios.
Chunjin Yang, Fanman Meng, Shuai Chen, Mingyu Liu, Runtong Zhang
Large-scale vision-language models (LVLMs) pretrained on massive image-text pairs have achieved remarkable success in visual representations. However, existing paradigms to transfer LVLMs to downstream tasks encounter two primary challenges. Firstly, the text features remain fixed after being calculated and cannot be adjusted according to image features, which decreases the model's adaptability. Secondly, the model's output solely depends on the similarity between the text and image features, leading to excessive reliance on LVLMs. To address these two challenges, we introduce a novel two-branch model named the Instance-Wise Adaptive Tuning and Caching (ATC). Specifically, one branch implements our proposed ConditionNet, which guides image features to form an adaptive textual cache that adjusts based on image features, achieving instance-wise inference and improving the model's adaptability. The other branch introduces the similarities between images and incorporates a learnable visual cache, designed to decouple new and previous knowledge, allowing the model to acquire new knowledge while preserving prior knowledge. The model's output is jointly determined by the two branches, thus overcoming the limitations of existing methods that rely solely on LVLMs. Additionally, our method requires limited computing resources to tune parameters, yet outperforms existing methods on 11 benchmark datasets.
Cheng Wen, Baosheng Yu, Rao Fu, Dacheng Tao
A generative model for high-fidelity point clouds is of great importance in synthesizing 3D environments for applications such as autonomous driving and robotics. Despite the recent success of deep generative models for 2D images, it is non-trivial to generate 3D point clouds without a comprehensive understanding of both local and global geometric structures. In this paper, we devise a new 3D point cloud generation framework using a divide-and-conquer approach, where the whole generation process can be divided into a set of patch-wise generation tasks. Specifically, all patch generators are based on learnable priors, which aim to capture the information of geometry primitives. We introduce point- and patch-wise transformers to enable the interactions between points and patches. Therefore, the proposed divide-and-conquer approach contributes to a new understanding of point cloud generation from the geometry constitution of 3D shapes. Experimental results on a variety of object categories from the most popular point cloud dataset, ShapeNet, show the effectiveness of the proposed patch-wise point cloud generation, where it clearly outperforms recent state-of-the-art methods for high-fidelity point cloud generation.
Yunhao Ge, Yuecheng Li, Shuo Ni, Jiaping Zhao, Ming-Hsuan Yang, Laurent Itti
Continual learning aims to emulate the human ability to continually
accumulate knowledge over sequential tasks. The main challenge is to maintain
performance on previously learned tasks after learning new tasks, i.e., to
avoid catastrophic forgetting. We propose a Channel-wise Lightweight
Reprogramming (CLR) approach that helps convolutional neural networks (CNNs)
overcome catastrophic forgetting during continual learning. We show that a CNN
model trained on an old task (or self-supervised proxy task) could be
"reprogrammed" to solve a new task by using our proposed lightweight (very
cheap) reprogramming parameter. With the help of CLR, we have a better
stability-plasticity trade-off to solve continual learning problems: To
maintain stability and retain previous task ability, we use a common
task-agnostic immutable part as the shared "anchor" parameter set. We then add
task-specific lightweight reprogramming parameters to reinterpret the outputs
of the immutable parts, to enable plasticity and integrate new knowledge. To
learn sequential tasks, we only train the lightweight reprogramming parameters
to learn each new task. Reprogramming parameters are task-specific and
exclusive to each task, which makes our method immune to catastrophic
forgetting. To minimize the parameter requirement of reprogramming to learn new
tasks, we make reprogramming lightweight by only adjusting essential kernels
and learning channel-wise linear mappings from anchor parameters to
task-specific domain knowledge. We show that, for general CNNs, the CLR
parameter increase is less than 0.6% for any new task. Our method outperforms
13 state-of-the-art continual learning baselines on a new challenging sequence
of 53 image classification datasets. Code and data are available at
https://github.com/gyhandy/Channel-wise-Lightweight-Reprogramming
Authors' comments: ICCV 2023
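The stability-plasticity split described in the abstract above can be sketched in a few lines of Python. This is a simplified illustration and not the authors' implementation: it assumes the task-specific mapping is a per-channel scale and shift applied to the output of a frozen anchor layer, and `channel_affine` and `TaskReprogrammer` are hypothetical names.

```python
def channel_affine(features, scales, shifts):
    """Apply y = scale * x + shift independently to each channel.

    `features` is a list of channels; each channel is a list of rows of floats.
    """
    if not (len(features) == len(scales) == len(shifts)):
        raise ValueError("need one scale and one shift per channel")
    return [
        [[scale * value + shift for value in row] for row in channel]
        for channel, scale, shift in zip(features, scales, shifts)
    ]


class TaskReprogrammer:
    """Keeps one small parameter set per task; the anchor output is never modified."""

    def __init__(self, num_channels):
        self.num_channels = num_channels
        self.task_params = {}

    def add_task(self, task_name):
        # Identity initialisation: a new task starts from the anchor's behaviour.
        self.task_params[task_name] = {
            "scales": [1.0] * self.num_channels,
            "shifts": [0.0] * self.num_channels,
        }

    def forward(self, anchor_features, task_name):
        params = self.task_params[task_name]
        return channel_affine(anchor_features, params["scales"], params["shifts"])
```

Since each task reads and writes only its own entry in `task_params`, training a later task cannot change the output for an earlier one, which is the property the abstract relies on to avoid forgetting.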
Yongjeong Oh, Jaeho Lee, Christopher G. Brinton, Yo-Seb Jeon
This paper proposes a novel communication-efficient split learning (SL) framework, named SplitFC, which reduces the communication overhead required for transmitting intermediate feature and gradient vectors during the SL training process. The key idea of SplitFC is to leverage different dispersion degrees exhibited in the columns of the matrices. SplitFC incorporates two compression strategies: (i) adaptive feature-wise dropout and (ii) adaptive feature-wise quantization. In the first strategy, the intermediate feature vectors are dropped with adaptive dropout probabilities determined based on the standard deviation of these vectors. Then, by the chain rule, the intermediate gradient vectors associated with the dropped feature vectors are also dropped. In the second strategy, the non-dropped intermediate feature and gradient vectors are quantized using adaptive quantization levels determined based on the ranges of the vectors. To minimize the quantization error, the optimal quantization levels of this strategy are derived in a closed-form expression. Simulation results on the MNIST, CIFAR-100, and CelebA datasets demonstrate that SplitFC outperforms state-of-the-art SL frameworks by significantly reducing communication overheads while maintaining high accuracy.
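The two compression strategies in the abstract above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the keep probability of each feature column is taken to be proportional to its standard deviation (capped at 1), and quantization is uniform between each column's minimum and maximum. The paper derives its own probabilities and closed-form optimal levels, which are not reproduced here; all function names are hypothetical.

```python
import random
import statistics


def column_keep_probs(matrix, target_kept):
    """Keep probability per column, proportional to the column's standard deviation."""
    columns = list(zip(*matrix))
    stds = [statistics.pstdev(column) for column in columns]
    total = sum(stds)
    if total == 0:
        # All columns are constant: fall back to a uniform probability.
        return [min(1.0, target_kept / len(columns))] * len(columns)
    return [min(1.0, target_kept * std / total) for std in stds]


def drop_columns(matrix, probs, rng):
    """Randomly keep columns; returns the kept indices and the reduced matrix."""
    kept = [j for j, p in enumerate(probs) if rng.random() < p]
    return kept, [[row[j] for j in kept] for row in matrix]


def quantize_column(column, levels):
    """Uniform quantization of one column onto `levels` points spanning its range."""
    if levels < 2:
        raise ValueError("need at least two quantization levels")
    low, high = min(column), max(column)
    if high == low:
        return list(column)
    step = (high - low) / (levels - 1)
    return [low + round((value - low) / step) * step for value in column]
```

Returning the kept indices from `drop_columns` matters for the backward pass: by the chain rule, only the gradient entries for those same columns need to be sent back.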
Wenyu Zhang, Qing Ding, Jian Hu, Yi Ma, Mingzhe Lu
Graph convolutional networks (GCNs) are widely used to handle irregular data because they update node features using the structural information of the graph. With iterated GCN layers, high-order information can be obtained to further enhance node representations. However, how to apply GCNs to structured data (such as images) has not been studied in depth. In this paper, we explore the application of graph attention networks (GAT) to image feature extraction. First, we propose a novel graph generation algorithm that converts images into graphs through matrix transformation. It is one order of magnitude faster than the algorithm based on K Nearest Neighbors (KNN). Then, GAT is applied to the generated graph to update the node features, yielding a more robust representation. These two steps are combined into a module called the pixel-wise graph attention module (PGA). Since the graph obtained by our graph generation algorithm can still be transformed back into an image after processing, PGA can be readily combined with CNNs. Based on these two modules, and drawing on ResNet, we design a pixel-wise graph attention network (PGANet). PGANet is applied to the task of person re-identification on the Market1501, DukeMTMC-reID, and Occluded-DukeMTMC datasets, where it outperforms the state of the art by 0.8%, 1.1%, and 11%, respectively, in mAP. The code is available at https://github.com/wenyu1009/PGANet.
Christian Herglotz, Sion Grosche, Akarsh Bharadwaj, André Kaup
This paper presents a novel method to estimate the power consumption of
distinct active components on an electronic carrier board by using thermal
imaging. The components and the board can be made of heterogeneous material
such as plastic, coated microchips, and metal bonds or wires, where a special
coating for high emissivity is not required. The thermal images are recorded
when the components on the board are dissipating power. In order to enable
reliable estimates, a segmentation of the thermal image must be available that
can be obtained by manual labeling, object detection methods, or exploiting
layout information. Evaluations show that with low-resolution consumer infrared
cameras and dissipated powers larger than 300 mW, mean estimation errors of 10%
can be achieved.
Authors' comments: 10 pages, 8 figures
Christian Bender, Steffen Meyer
We introduce and analyze a family of linear least-squares Monte Carlo schemes
for backward SDEs, which interpolate between the one-step dynamic programming
scheme of Lemor, Warin, and Gobet (Bernoulli, 2006) and the multi-step dynamic
programming scheme of Gobet and Turkedjiev (Mathematics of Computation, 2016).
Our algorithm approximates conditional expectations over segments of the time
grid. We discuss the optimal choice of the segment length depending on the
`smoothness' of the problem and show that, in typical situations, the
complexity can be reduced compared to the state-of-the-art multi-step dynamic
programming scheme.
Authors' comments: 35 pages
Javier Salazar Cavazos, Jeffrey A. Fessler, Laura Balzano
Principal component analysis (PCA) is a key tool in the field of data
dimensionality reduction that is useful for various data science problems.
However, many applications involve heterogeneous data that varies in quality
due to noise characteristics associated with different sources of the data.
Methods that deal with this mixed dataset are known as heteroscedastic methods.
Current methods like HePPCAT make Gaussian assumptions of the basis
coefficients that may not hold in practice. Other methods such as Weighted PCA
(WPCA) assume the noise variances are known, which may be difficult to know in
practice. This paper develops a PCA method that can estimate the sample-wise
noise variances and use this information in the model to improve the estimate
of the subspace basis associated with the low-rank structure of the data. This
is done without distributional assumptions of the low-rank component and
without assuming the noise variances are known. Simulations show the
effectiveness of accounting for such heteroscedasticity in the data and the
benefits of using such a method with all of the data versus retaining only good
data, and compare the method against other PCA methods established in the
literature, such as PCA, Robust PCA (RPCA), and HePPCAT. Code available at
https://github.com/javiersc1/ALPCAH
Authors' comments: This article has been accepted for publication in the Fourteenth
International Conference on Sampling Theory and Applications, accessible via
IEEE XPlore. See DOI section
Wenting Tang, Xingxing Wei, Bo Li
Structured network pruning is a practical approach to reduce computation cost directly while retaining the CNNs' generalization performance in real applications. However, identifying redundant filters is a core problem in structured network pruning, and current redundancy criteria only focus on individual filters' attributes. When pruning sparsity increases, these redundancy criteria are not effective or efficient enough. Since the filter-wise interaction also contributes to the CNN's prediction accuracy, we integrate the filter-wise interaction into the redundancy criterion. In our criterion, we introduce the filter importance and filter utilization strength to reflect the decision ability of individual and multiple filters. Utilizing this new redundancy criterion, we propose a structured network pruning approach, SNPFI (Structured Network Pruning by measuring Filter-wise Interaction). During pruning, SNPFI can automatically assign the proper sparsity based on the filter utilization strength and eliminate the useless filters by filter importance. After pruning, SNPFI can recover the pruned model's performance effectively without iterative training by minimizing the interaction difference. We empirically demonstrate the effectiveness of SNPFI with several commonly used CNN models, including AlexNet, MobileNetv1, and ResNet-50, on various image classification datasets, including MNIST, CIFAR-10, and ImageNet. For all experimental CNN models, network compression reduces computation by nearly 60% while the classification accuracy is maintained.
Runshi Tang, Ming Yuan, Anru R. Zhang
This paper introduces a novel framework called Mode-wise Principal Subspace
Pursuit (MOP-UP) to extract hidden variations in both the row and column
dimensions for matrix data. To enhance the understanding of the framework, we
introduce a class of matrix-variate spiked covariance models that serve as
inspiration for the development of the MOP-UP algorithm. The MOP-UP algorithm
consists of two steps: Average Subspace Capture (ASC) and Alternating
Projection (AP). These steps are specifically designed to capture the row-wise
and column-wise dimension-reduced subspaces which contain the most informative
features of the data. ASC utilizes a novel average projection operator as
initialization and achieves exact recovery in the noiseless setting. We analyze
the convergence and non-asymptotic error bounds of MOP-UP, introducing a
blockwise matrix eigenvalue perturbation bound that proves the desired bound,
where classic perturbation bounds fail. The effectiveness and practical merits
of the proposed framework are demonstrated through experiments on both
simulated and real datasets. Lastly, we discuss generalizations of our approach
to higher-order data.
Authors' comments: Journal of the Royal Statistical Society, Series B, to appear
Xingxing Wei, Shiji Zhao
Adversarial examples have attracted widespread attention in security-critical applications because of their transferability across different models. Although many methods have been proposed to boost adversarial transferability, a gap still exists between capabilities and practical demand. In this paper, we argue that the model-specific discriminative regions are a key factor causing overfitting to the source model, and thus reducing the transferability to the target model. For that, a patch-wise mask is utilized to prune the model-specific regions when calculating adversarial perturbations. To accurately localize these regions, we present a learnable approach to automatically optimize the mask. Specifically, we simulate the target models in our framework, and adjust the patch-wise mask according to the feedback of the simulated models. To improve the efficiency, the differential evolutionary (DE) algorithm is utilized to search for patch-wise masks for a specific image. During iterative attacks, the learned masks are applied to the image to drop out the patches related to model-specific regions, thus making the gradients more generic and improving the adversarial transferability. The proposed approach is a preprocessing method and can be integrated with existing methods to further boost the transferability. Extensive experiments on the ImageNet dataset demonstrate the effectiveness of our method. We incorporate the proposed approach with existing methods to perform ensemble attacks and achieve an average success rate of 93.01% against seven advanced defense methods, which can effectively enhance the state-of-the-art transfer-based attack performance.
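The patch-dropping step described in the abstract above can be sketched in Python as follows. The sketch only shows how a given binary patch-wise mask is applied to a single-channel image; the search for a good mask (done with differential evolution and simulated target models in the paper) is not shown, and `apply_patch_mask` is a hypothetical helper name.

```python
def apply_patch_mask(image, mask, patch_size):
    """Zero out image patches whose mask entry is 0.

    `image` is a list of rows of pixel values; `mask` has one 0/1 entry per
    patch, laid out as (height // patch_size) rows by (width // patch_size) columns.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be a multiple of patch_size")
    rows, cols = height // patch_size, width // patch_size
    if len(mask) != rows or any(len(mask_row) != cols for mask_row in mask):
        raise ValueError("mask shape does not match the patch grid")
    # Each pixel inherits the mask value of the patch it falls into.
    return [
        [image[y][x] * mask[y // patch_size][x // patch_size] for x in range(width)]
        for y in range(height)
    ]
```

In an iterative attack, the masked image would be fed to the source model at each step in place of the original, so that gradients from the dropped regions no longer contribute to the perturbation.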
Kovi Rose, Joshua Pritchard, Tara Murphy, Manisha Caleb, Dougal Dobie, Laura Driessen, Stefan W. Duchesne, David L. Kaplan et al.
We present the detection of rotationally modulated, circularly polarized
radio emission from the T8 brown dwarf WISE J062309.94-045624.6 between 0.9 and
2.0 GHz. We detected this high proper motion ultracool dwarf with the
Australian SKA Pathfinder in $1.36$ GHz imaging data from the Rapid ASKAP
Continuum Survey. We observed WISE J062309.94-045624.6 to have a time and
frequency averaged Stokes I flux density of $4.17\pm0.41$ mJy beam$^{-1}$, with
an absolute circular polarization fraction of $66.3\pm9.0\%$, and calculated a
specific radio luminosity of $L_{\nu}\sim10^{14.8}$ erg s$^{-1}$ Hz$^{-1}$. In
follow-up observations with the Australia Telescope Compact Array and MeerKAT
we identified a multi-peaked pulse structure, used dynamic spectra to place a
lower limit of $B>0.71$ kG on the dwarf's magnetic field, and measured a
$P=1.912\pm0.005$ h periodicity which we concluded to be due to rotational
modulation. The luminosity and period we measured are comparable to those of
other ultracool dwarfs observed at radio wavelengths. This implies that future
megahertz to gigahertz surveys, with increased cadence and improved
sensitivity, are likely to detect similar or later-type dwarfs. Our detection
of WISE J062309.94-045624.6 makes this dwarf the coolest and latest-type star
observed to produce radio emission.
Authors' comments: Accepted for publication in ApJ Letters; 11 pages, 3 figures and 2
tables
Lucile Ter-Minassian, Oscar Clivio, Karla Diaz-Ordaz, Robin J. Evans, Chris Holmes
Predictive black-box models can exhibit high accuracy but their opaque nature hinders their uptake in safety-critical deployment environments. Explanation methods (XAI) can provide confidence for decision-making through increased transparency. However, existing XAI methods are not tailored towards models in sensitive domains where one predictor is of special interest, such as a treatment effect in a clinical model, or ethnicity in policy models. We introduce Path-Wise Shapley effects (PWSHAP), a framework for assessing the targeted effect of a binary (e.g., treatment) variable from a complex outcome model. Our approach augments the predictive model with a user-defined directed acyclic graph (DAG). The method then uses the graph alongside on-manifold Shapley values to identify effects along causal pathways whilst maintaining robustness to adversarial attacks. We establish error bounds for the identified path-wise Shapley effects and for Shapley values. We show PWSHAP can perform local bias and mediation analyses with faithfulness to the model. Further, if the targeted variable is randomised we can quantify local effect modification. We demonstrate the resolution, interpretability, and true locality of our approach on examples and a real-world experiment.
Michael Loibl, Leonardo Leonetti, Alessandro Reali, Josef Kiendl
This work presents an efficient quadrature rule for shell analysis fully integrated in CAD by means of Isogeometric Analysis (IGA). General CAD-models may consist of trimmed parts such as holes, intersections, cut-offs etc. Therefore, IGA should be able to deal with these models in order to fulfil its promise of closing the gap between design and analysis. Trimming operations violate the tensor-product structure of the used Non-Uniform Rational B-spline (NURBS) basis functions and of typical quadrature rules. Existing efficient patch-wise quadrature rules consider actual knot vectors and are determined in 1D. They are extended to further dimensions by means of a tensor-product. Therefore, they are not directly applicable to trimmed structures. The herein proposed method extends patch-wise quadrature rules to trimmed surfaces. Thereby, the number of quadrature points can be significantly reduced. Geometrically linear and non-linear benchmarks of plane, plate and shell structures are investigated. The results are compared to a standard trimming procedure and a good performance is observed.
Seungjin Jung, Seungmo Seo, Yonghyun Jeong, Jongwon Choi
Class-wise training losses often diverge because of varying levels of
intra-class and inter-class appearance variation, and we find that these
diverging losses lead to predictions whose reliability is poorly
calibrated. To resolve this issue, we propose a new calibration method to
synchronize the class-wise training losses. We design a new training loss to
alleviate the variance of class-wise training losses by using multiple
class-wise scaling factors. Since our framework can compensate the training
losses of overfitted classes with those of under-fitted classes, the integrated
training loss is preserved, preventing the performance drop even after the
model calibration. Furthermore, our method can be easily employed in the
post-hoc calibration methods, allowing us to use the pre-trained model as an
initial model and reduce the additional computation for model calibration. We
validate the proposed framework by employing it in the various post-hoc
calibration methods, which generally improves calibration performance while
preserving accuracy, and discover through the investigation that our approach
performs well with unbalanced datasets and untuned hyperparameters.
Authors' comments: Published at ICML 2023. Camera ready version
Lin Li, Jianing Qiu, Michael Spratling
Deep neural networks are vulnerable to adversarial examples. Adversarial
training (AT) is an effective defense against adversarial examples. However, AT
is prone to overfitting which degrades robustness substantially. Recently, data
augmentation (DA) was shown to be effective in mitigating robust overfitting if
appropriately designed and optimized for AT. This work proposes a new method to
automatically learn online, instance-wise, DA policies to improve robust
generalization for AT. This is the first automated DA method specific for
robustness. A novel policy learning objective, consisting of Vulnerability,
Affinity and Diversity, is proposed and shown to be sufficiently effective and
efficient to be practical for automatic DA generation during AT. Importantly,
our method dramatically reduces the cost of policy search from the 5000 hours
of AutoAugment and the 412 hours of IDBH to 9 hours, making automated DA more
practical to use for adversarial robustness. This allows our method to
efficiently explore a large search space for a more effective DA policy and
evolve the policy as training progresses. Empirically, our method is shown to
outperform all competitive DA methods across various model architectures and
datasets. Our DA policy reinforced vanilla AT to surpass several
state-of-the-art AT methods regarding both accuracy and robustness. It can also
be combined with those advanced AT methods to further boost robustness. Code
and pre-trained models are available at https://github.com/TreeLLi/AROID.
Authors' comments: Published in IJCV (in press)