Peter Frankl, Andrey Kupavskii
A family of subsets of $[n]$ is $r$-wise agreeing if for any $r$ sets from the family there is an element $x$ that is either contained in all or contained in none of the $r$ sets. The study of such families is motivated by questions in discrete optimization. In this paper, we determine the size of the largest non-trivial $r$-wise agreeing family. This can be seen as a generalization of the classical Brace-Daykin theorem.
Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett
We study contextual bandits in the presence of a stage-wise constraint (a
constraint at each round), when the constraint must be satisfied both with high
probability and in expectation. Obviously the setting where the constraint is
in expectation is a relaxation of the one with high probability. We start with
the linear case where both the contextual bandit problem (reward function) and
the stage-wise constraint (cost function) are linear. In each of the high
probability and in expectation settings, we propose an upper-confidence bound
algorithm for the problem and prove a $T$-round regret bound for it. Our
algorithms balance exploration and constraint satisfaction using a novel idea
that scales the radii of the reward and cost confidence sets with different
scaling factors. We also prove a lower-bound for this constrained problem, show
how our algorithms and analyses can be extended to multiple constraints, and
provide simulations to validate our theoretical results. In the high
probability setting, we describe the minimum requirements for the action set in
order for our algorithm to be tractable. In the setting that the constraint is
in expectation, we further specialize our results to multi-armed bandits and
propose a computationally efficient algorithm for this setting with regret
analysis. Finally, we extend our results to the case where the reward and cost
functions are both non-linear. We propose an algorithm for this case and prove
a regret bound for it that characterize the function class complexity by the
eluder dimension.
Authors' comments: 53 pages. arXiv admin note: text overlap with arXiv:2006.10185
C. Contreras Peña, M. Ashraf, J. E. Lee, G. Herczeg, P. W. Lucas, Z. Guo, D. Johnstone, H. G. Lee et al.
This work presents four high-amplitude variable YSOs ($\simeq$ 3 mag at near-
or mid-IR wavelengths) arising from the SPICY catalog. Three outbursts show a
duration that is longer than 1 year, and are still ongoing. And additional YSO
brightened over the last two epochs of NEOWISE observations and the duration of
the outburst is thus unclear. Analysis of the spectra of the four sources
confirms them as new members of the eruptive variable class. We find two YSOs
that can be firmly classified as bona fide FUors and one object that falls in
the V1647 Ori-like class. Given the uncertainty in the duration of its
outburst, an additional YSO can only be classified as a candidate FUor.
Continued monitoring and follow-up of these particular sources is important to
better understand the accretion process of YSOs.
Authors' comments: 10 pages, 10 figures, Accepted for publication at the Journal of the
Korean Astronomical Society
Adam Dejl, Hamed Ayoobi, Matthew Williams, Francesca Toni
Feature attribution methods are widely used to explain neural models by determining the influence of individual input features on the models' outputs. We propose a novel feature attribution method, CAFE (Conflict-Aware Feature-wise Explanations), that addresses three limitations of the existing methods: their disregard for the impact of conflicting features, their lack of consideration for the influence of bias terms, and an overly high sensitivity to local variations in the underpinning activation functions. Unlike other methods, CAFE provides safeguards against overestimating the effects of neuron inputs and separately traces positive and negative influences of input features and biases, resulting in enhanced robustness and increased ability to surface feature conflicts. We show experimentally that CAFE is better able to identify conflicting features on synthetic tabular data and exhibits the best overall fidelity on several real-world tabular datasets, while being highly computationally efficient.
Tycho F. A. van der Ouderaa, Alexander Immer, Mark van der Wilk
Convolutions encode equivariance symmetries into neural networks leading to better generalisation performance. However, symmetries provide fixed hard constraints on the functions a network can represent, need to be specified in advance, and can not be adapted. Our goal is to allow flexible symmetry constraints that can automatically be learned from data using gradients. Learning symmetry and associated weight connectivity structures from scratch is difficult for two reasons. First, it requires efficient and flexible parameterisations of layer-wise equivariances. Secondly, symmetries act as constraints and are therefore not encouraged by training losses measuring data fit. To overcome these challenges, we improve parameterisations of soft equivariance and learn the amount of equivariance in layers by optimising the marginal likelihood, estimated using differentiable Laplace approximations. The objective balances data fit and model complexity enabling layer-wise symmetry discovery in deep networks. We demonstrate the ability to automatically learn layer-wise equivariances on image classification tasks, achieving equivalent or improved performance over baselines with hard-coded symmetry.
Yafang Zheng, Lei Lin, Zhaohong Lai, Binling Wang, Shan Liu, Biao Fu, Wenhao Rao, Peigen Ye et al.
Despite successes across a broad range of applications, sequence-to-sequence
models' construct of solutions are argued to be less compositional than
human-like generalization. There is mounting evidence that one of the reasons
hindering compositional generalization is representations of the encoder and
decoder uppermost layer are entangled. In other words, the syntactic and
semantic representations of sequences are twisted inappropriately. However,
most previous studies mainly concentrate on enhancing token-level semantic
information to alleviate the representations entanglement problem, rather than
composing and using the syntactic and semantic representations of sequences
appropriately as humans do. In addition, we explain why the entanglement
problem exists from the perspective of recent studies about training deeper
Transformer, mainly owing to the ``shallow'' residual connections and its
simple, one-step operations, which fails to fuse previous layers' information
effectively. Starting from this finding and inspired by humans' strategies, we
propose \textsc{FuSion} (\textbf{Fu}sing \textbf{S}yntactic and
Semant\textbf{i}c Representati\textbf{on}s), an extension to
sequence-to-sequence models to learn to fuse previous layers' information back
into the encoding and decoding process appropriately through introducing a
\emph{fuse-attention module} at each encoder and decoder layer. \textsc{FuSion}
achieves competitive and even \textbf{state-of-the-art} results on two
realistic benchmarks, which empirically demonstrates the effectiveness of our
proposal.
Authors' comments: work in progress. arXiv admin note: substantial text overlap with
arXiv:2305.12169
Roshan Sharma, Kenneth Zheng, Siddhant Arora, Shinji Watanabe, Rita Singh, Bhiksha Raj
End-to-end speech summarization has been shown to improve performance over
cascade baselines. However, such models are difficult to train on very large
inputs (dozens of minutes or hours) owing to compute restrictions and are hence
trained with truncated model inputs. Truncation leads to poorer models, and a
solution to this problem rests in block-wise modeling, i.e., processing a
portion of the input frames at a time. In this paper, we develop a method that
allows one to train summarization models on very long sequences in an
incremental manner. Speech summarization is realized as a streaming process,
where hypothesis summaries are updated every block based on new acoustic
information. We devise and test strategies to pass semantic context across the
blocks. Experiments on the How2 dataset demonstrate that the proposed
block-wise training method improves by 3 points absolute on ROUGE-L over a
truncated input baseline.
Authors' comments: Accepted at Interspeech 2023
Keke Huang, Jing Tang, Juncheng Liu, Renchi Yang, Xiaokui Xiao
Graph Neural Networks (GNNs) have shown superior performance for semi-supervised learning of numerous web applications, such as classification on web services and pages, analysis of online social networks, and recommendation in e-commerce. The state of the art derives representations for all nodes in graphs following the same diffusion (message passing) model without discriminating their uniqueness. However, (i) labeled nodes involved in model training usually account for a small portion of graphs in the semisupervised setting, and (ii) different nodes locate at different graph local contexts and it inevitably degrades the representation qualities if treating them undistinguishedly in diffusion. To address the above issues, we develop NDM, a universal node-wise diffusion model, to capture the unique characteristics of each node in diffusion, by which NDM is able to yield high-quality node representations. In what follows, we customize NDM for semisupervised learning and design the NIGCN model. In particular, NIGCN advances the efficiency significantly since it (i) produces representations for labeled nodes only and (ii) adopts well-designed neighbor sampling techniques tailored for node representation generation. Extensive experimental results on various types of web datasets, including citation, social and co-purchasing graphs, not only verify the state-of-the-art effectiveness of NIGCN but also strongly support the remarkable scalability of NIGCN. In particular, NIGCN completes representation generation and training within 10 seconds on the dataset with hundreds of millions of nodes and billions of edges, up to orders of magnitude speedups over the baselines, while achieving the highest F1-scores on classification.
Rodrigo Arruda, Bernardo Carvalho, Alberto Sarmiento
This paper discusses the dynamics of continuum-wise hyperbolic surface homeomorphisms. We prove that $cw_F$-hyperbolic surface homeomorphisms containing only a finite set of spines are $cw_2$-hyperbolic. In the case of $cw_3$-hyperbolic homeomorphisms we prove the finiteness of spines and, hence, that $cw_3$-hyperbolicity implies $cw_2$-hyperbolicity. In the proof, we adapt techniques of Hiraide [11] and Lewowicz [15] in the case of expansive surface homeomorphisms to prove that local stable/unstable continua of $cw_F$-hyperbolic homeomorphisms are continuous arcs. We also adapt techniques of Artigue, Pac\'ifico and Vieitez [6] in the case of N-expansive surface homeomorphisms to prove that the existence of spines is strongly related to the existence of bi-asymptotic sectors and conclude that spines are necessarily isolated from other spines.
Yuto Watanabe, Kazunori Sakurama
This study addresses a distributed optimization with a novel class of coupling of variables, called clique-wise coupling. A clique is a node set of a complete subgraph of an undirected graph. This setup is an extension of pairwise coupled optimization problems (e.g., consensus optimization) and allows us to handle coupling of variables consisting of more than two agents systematically. To solve this problem, we propose a clique-based linearized ADMM algorithm, which is proved to be distributed. Additionally, we consider objective functions given as a sum of nonsmooth and smooth convex functions and present a more flexible algorithm based on the FLiP-ADMM algorithm. Moreover, we provide convergence theorems of these algorithms. Notably, all the algorithmic parameters and the derived condition in the theorems depend only on local information, which means that each agent can choose the parameters in a distributed manner. Finally, we apply the proposed methods to a consensus optimization problem and demonstrate their effectiveness via numerical experiments.
Sri Charan Kattamuru, Kshitij Agrawal, Shyam Prasad Adhikari, Abhishek Bose, Hemant Misra
Images captured through smartphone cameras often suffer from degradation,
blur being one of the major ones, posing a challenge in processing these images
for downstream tasks. In this paper we propose low-compute lightweight
patch-wise features for image quality assessment. Using our method we can
discriminate between blur vs sharp image degradation. To this end, we train a
decision-tree based XGBoost model on various intuitive image features like gray
level variance, first and second order gradients, texture features like local
binary patterns. Experiments conducted on an open dataset show that the
proposed low compute method results in 90.1% mean accuracy on the validation
set, which is comparable to the accuracy of a compute-intensive VGG16 network
with 94% mean accuracy fine-tuned to this task. To demonstrate the
generalizability of our proposed features and model we test the model on BHBID
dataset and an internal dataset where we attain accuracy of 98% and 91%,
respectively. The proposed method is 10x faster than the VGG16 based model on
CPU and scales linearly to the input image size making it suitable to be
implemented on low compute edge devices.
Authors' comments: Accepted in AIMLSystems2022
Zeming Wei, Yifei Wang, Yiwen Guo, Yisen Wang
Adversarial training has been widely acknowledged as the most effective
method to improve the adversarial robustness against adversarial examples for
Deep Neural Networks (DNNs). So far, most existing works focus on enhancing the
overall model robustness, treating each class equally in both the training and
testing phases. Although revealing the disparity in robustness among classes,
few works try to make adversarial training fair at the class level without
sacrificing overall robustness. In this paper, we are the first to
theoretically and empirically investigate the preference of different classes
for adversarial configurations, including perturbation margin, regularization,
and weight averaging. Motivated by this, we further propose a
\textbf{C}lass-wise calibrated \textbf{F}air \textbf{A}dversarial training
framework, named CFA, which customizes specific training configurations for
each class automatically. Experiments on benchmark datasets demonstrate that
our proposed CFA can improve both overall robustness and fairness notably over
other state-of-the-art methods. Code is available at
\url{https://github.com/PKU-ML/CFA}.
Authors' comments: CVPR 2023
Kenyu Kobayashi, Renata Khasanova, Arno Schneuwly, Felix Schmidt, Matteo Casserini
Autoencoders are a powerful and versatile tool often used for various problems such as anomaly detection, image processing and machine translation. However, their reconstructions are not always trivial to explain. Therefore, we propose a fast explainability solution by extending the Layer-wise Relevance Propagation method with the help of Deep Taylor Decomposition framework. Furthermore, we introduce a novel validation technique for comparing our explainability approach with baseline methods in the case of missing ground-truth data. Our results highlight computational as well as qualitative advantages of the proposed explainability solution with respect to existing methods.
Kai Zhang, Yutong Dai, Hongyi Wang, Eric Xing, Xun Chen, Lichao Sun
Federated learning is a promising paradigm that allows multiple clients to collaboratively train a model without sharing the local data. However, the presence of heterogeneous devices in federated learning, such as mobile phones and IoT devices with varying memory capabilities, would limit the scale and hence the performance of the model could be trained. The mainstream approaches to address memory limitations focus on width-slimming techniques, where different clients train subnetworks with reduced widths locally and then the server aggregates the subnetworks. The global model produced from these methods suffers from performance degradation due to the negative impact of the actions taken to handle the varying subnetwork widths in the aggregation phase. In this paper, we introduce a memory-adaptive depth-wise learning solution in FL called FeDepth, which adaptively decomposes the full model into blocks according to the memory budgets of each client and trains blocks sequentially to obtain a full inference model. Our method outperforms state-of-the-art approaches, achieving 5% and more than 10% improvements in top-1 accuracy on CIFAR-10 and CIFAR-100, respectively. We also demonstrate the effectiveness of depth-wise fine-tuning on ViT. Our findings highlight the importance of memory-aware techniques for federated learning with heterogeneous devices and the success of depth-wise training strategy in improving the global model's performance.
Yucheng Xu, Mohammadreza Kasaei, Hamidreza Kasaei, Zhibin Li
Generating high-quality instance-wise grasp configurations provides critical information of how to grasp specific objects in a multi-object environment and is of high importance for robot manipulation tasks. This work proposed a novel \textbf{S}ingle-\textbf{S}tage \textbf{G}rasp (SSG) synthesis network, which performs high-quality instance-wise grasp synthesis in a single stage: instance mask and grasp configurations are generated for each object simultaneously. Our method outperforms state-of-the-art on robotic grasp prediction based on the OCID-Grasp dataset, and performs competitively on the JACQUARD dataset. The benchmarking results showed significant improvements compared to the baseline on the accuracy of generated grasp configurations. The performance of the proposed method has been validated through both extensive simulations and real robot experiments for three tasks including single object pick-and-place, grasp synthesis in cluttered environments and table cleaning task.
Weiye Zhao, Tairan He, Rui Chen, Tianhao Wei, Changliu Liu
Despite the tremendous success of Reinforcement Learning (RL) algorithms in
simulation environments, applying RL to real-world applications still faces
many challenges. A major concern is safety, in another word, constraint
satisfaction. State-wise constraints are one of the most common constraints in
real-world applications and one of the most challenging constraints in Safe RL.
Enforcing state-wise constraints is necessary and essential to many challenging
tasks such as autonomous driving, robot manipulation. This paper provides a
comprehensive review of existing approaches that address state-wise constraints
in RL. Under the framework of State-wise Constrained Markov Decision Process
(SCMDP), we will discuss the connections, differences, and trade-offs of
existing approaches in terms of (i) safety guarantee and scalability, (ii)
safety and reward performance, and (iii) safety after convergence and during
training. We also summarize limitations of current methods and discuss
potential future directions.
Authors' comments: IJCAI Survey Track 2023
T. H. Jarrett, M. E. Cluver, Edward N. Taylor, Sabine Bellstedt, A. S. G Robotham, H. F. M. Yao
We derive new empirical scaling relations between WISE mid-infrared galaxy
photometry and well-determined stellar masses from SED modeling of a suite of
optical-infrared photometry provided by the DR4 Catalogue of the
GAMA-KiDS-VIKING survey of the southern G23 field. The mid-infrared source
extraction and characterization are drawn from the WISE Extended Source
Catalogue (WXSC) and the archival ALLWISE catalog, combining both resolved and
compact galaxies in the G23 sample to a redshift of 0.15. Three scaling
relations are derived: W1 3.4 micron luminosity versus stellar mass, and WISE
W1-W2, W1-W3 colors versus mass-to-light ratio (sensitive to a variety of
galaxy types from passive to star-forming). For each galaxy in the sample, we
then derive the combined stellar mass from these scaling relations, producing
Mstellar estimates with better than $\sim$25-30% accuracy for galaxies with
$>$10$^{9}$ Msolar and $<$40 - 50% for lower luminosity dwarf galaxies. We also
provide simple prescriptions for rest-frame corrections and estimating stellar
masses using only the W1 flux and the W1-W2 color, making stellar masses more
accessible to users of the WISE data. Given a redshift or distance, these new
scaling relations will enable stellar mass estimates for any galaxy in the sky
detected by WISE with high fidelity across a range of mass-to-light.
Authors' comments: Accepted for publication in the Astrophysical Journal (ApJ)
Norihide Tokushige
A family of $k$-element subsets of an $n$-element set is called 3-wise
intersecting if any three members in the family have non-empty intersection. We
determine the maximum size of such families exactly or asymptotically. One of
our results shows that for every $\epsilon>0$ there exists $n_0$ such that if
$n>n_0$ and $\frac25+\epsilon<\frac kn<\frac 12-\epsilon$ then the maximum size
is $4\binom{n-4}{k-3}+\binom{n-4}{k-4}$.
Authors' comments: 12 pages, fixed proof Theorem 4
Yizeng Han, Zhihang Yuan, Yifan Pu, Chenhao Xue, Shiji Song, Guangyu Sun, Gao Huang
Spatial-wise dynamic convolution has become a promising approach to improving
the inference efficiency of deep networks. By allocating more computation to
the most informative pixels, such an adaptive inference paradigm reduces the
spatial redundancy in image features and saves a considerable amount of
unnecessary computation. However, the theoretical efficiency achieved by
previous methods can hardly translate into a realistic speedup, especially on
the multi-core processors (e.g. GPUs). The key challenge is that the existing
literature has only focused on designing algorithms with minimal computation,
ignoring the fact that the practical latency can also be influenced by
scheduling strategies and hardware properties. To bridge the gap between
theoretical computation and practical efficiency, we propose a latency-aware
spatial-wise dynamic network (LASNet), which performs coarse-grained spatially
adaptive inference under the guidance of a novel latency prediction model. The
latency prediction model can efficiently estimate the inference latency of
dynamic networks by simultaneously considering algorithms, scheduling
strategies, and hardware properties. We use the latency predictor to guide both
the algorithm design and the scheduling optimization on various hardware
platforms. Experiments on image classification, object detection and instance
segmentation demonstrate that the proposed framework significantly improves the
practical inference efficiency of deep networks. For example, the average
latency of a ResNet-101 on the ImageNet validation set could be reduced by 36%
and 46% on a server GPU (Nvidia Tesla-V100) and an edge device (Nvidia Jetson
TX2 GPU) respectively without sacrificing the accuracy. Code is available at
https://github.com/LeapLabTHU/LASNet.
Authors' comments: NeurIPS 2022
Michael Panchenko, Anes Benmerzoug, Miguel de Benito Delgado
For many applications of probabilistic classifiers it is important that the
predicted confidence vectors reflect true probabilities (one says that the
classifier is calibrated). It has been shown that common models fail to satisfy
this property, making reliable methods for measuring and improving calibration
important tools. Unfortunately, obtaining these is far from trivial for
problems with many classes. We propose two techniques that can be used in
tandem. First, a reduced calibration method transforms the original problem
into a simpler one. We prove for several notions of calibration that solving
the reduced problem minimizes the corresponding notion of miscalibration in the
full problem, allowing the use of non-parametric recalibration methods that
fail in higher dimensions. Second, we propose class-wise calibration methods,
based on intuition building on a phenomenon called neural collapse and the
observation that most of the accurate classifiers found in practice can be
thought of as a union of K different functions which can be recalibrated
separately, one for each class. These typically out-perform their non
class-wise counterparts, especially for classifiers trained on imbalanced data
sets. Applying the two methods together results in class-wise reduced
calibration algorithms, which are powerful tools for reducing the prediction
and per-class calibration errors. We demonstrate our methods on real and
synthetic datasets and release all code as open source at
https://github.com/appliedAI-Initiative
Authors' comments: Accepted at the 21st IEEE International Conference on Machine
Learning and Applications, ICMLA 2022