Amir Hadifar, Johannes Deleu, Chris Develder, Thomas Demeester
Neural networks have achieved state-of-the-art performance across a wide variety of machine learning tasks, often with large and computation-heavy models. Inducing sparseness as a way to reduce the memory and computation footprint of these models has seen significant research attention in recent years. In this paper, we present a new method for \emph{dynamic sparseness}, whereby part of the computations are omitted dynamically, based on the input. For efficiency, we combine the idea of dynamic sparseness with block-wise matrix-vector multiplications. In contrast to static sparseness, which permanently zeroes out selected positions in weight matrices, our method preserves the full network capabilities by potentially accessing any trained weights. Yet, matrix-vector multiplications are accelerated by omitting a pre-defined fraction of weight blocks from the matrix, based on the input. Experimental results on the task of language modeling, using recurrent and quasi-recurrent models, show that the proposed method can outperform a magnitude-based static sparseness baseline. In addition, our method achieves language modeling perplexities similar to those of the dense baseline, at half the computational cost at inference time.
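The block-wise, input-dependent skipping described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the weight matrix is split into column blocks and uses a hypothetical gate that scores each block by the norm of the matching input slice. The names `dynamic_block_matvec`, `block_size`, and `keep_fraction` are invented for this example, and the paper's actual selection mechanism may differ.

```python
import math


def dynamic_block_matvec(W, x, block_size, keep_fraction):
    """Approximate W @ x using only the highest-scoring column blocks.

    W is a list of rows (lists of floats) and x is a list of floats.
    The gate used here (L2 norm of each input slice) is an assumption
    made for illustration; it is not taken from the paper.
    """
    n_cols = len(x)
    assert n_cols % block_size == 0, "columns must split evenly into blocks"
    n_blocks = n_cols // block_size
    n_keep = max(1, math.ceil(keep_fraction * n_blocks))

    # Score every block from the input alone, so the selection is dynamic.
    scores = []
    for b in range(n_blocks):
        chunk = x[b * block_size:(b + 1) * block_size]
        scores.append(math.sqrt(sum(v * v for v in chunk)))

    # Keep a pre-defined number of blocks; the others are never multiplied.
    kept = sorted(range(n_blocks), key=lambda b: scores[b], reverse=True)[:n_keep]

    y = [0.0] * len(W)
    for b in kept:
        lo, hi = b * block_size, (b + 1) * block_size
        for r, row in enumerate(W):
            y[r] += sum(row[c] * x[c] for c in range(lo, hi))
    return y, sorted(kept)


# Toy inputs, chosen only to exercise the function.
W = [[1.0, 2.0, 3.0, 4.0],
     [0.5, 0.5, 0.5, 0.5]]
x = [0.1, 0.0, 2.0, 1.0]
y_sparse, kept_blocks = dynamic_block_matvec(W, x, block_size=2, keep_fraction=0.5)
y_dense, _ = dynamic_block_matvec(W, x, block_size=2, keep_fraction=1.0)
```

Because every block of `W` stays in memory and can be selected for some other input, no trained weight is permanently discarded, which is the contrast with static pruning that the abstract draws.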
Tuyen Trung Truong
Let $z=(x,y)$ be coordinates for the product space $\mathbb{R}^{m_1}\times
\mathbb{R}^{m_2}$. Let $f:\mathbb{R}^{m_1}\times \mathbb{R}^{m_2}\rightarrow
\mathbb{R}$ be a $C^1$ function, and $\nabla f=(\partial _xf,\partial _yf)$ its
gradient. Fix $0<\alpha <1$. For a point $(x,y) \in \mathbb{R}^{m_1}\times
\mathbb{R}^{m_2}$, a number $\delta >0$ satisfies Armijo's condition at $(x,y)$
if the following inequality holds: \begin{eqnarray*} f(x-\delta \partial
_xf,y-\delta \partial _yf)-f(x,y)\leq -\alpha \delta (||\partial
_xf||^2+||\partial _yf||^2). \end{eqnarray*}
When $f(x,y)=f_1(x)+f_2(y)$ is a coordinate-wise sum map, we propose the
following {\bf coordinate-wise} Armijo's condition. Fix again $0<\alpha <1$. A
pair of positive numbers $\delta _1,\delta _2>0$ satisfies the coordinate-wise
variant of Armijo's condition at $(x,y)$ if the following inequality holds:
\begin{eqnarray*} [f_1(x-\delta _1\nabla f_1(x))+f_2(y-\delta _2\nabla
f_2(y))]-[f_1(x)+f_2(y)]\leq -\alpha (\delta _1||\nabla f_1(x)||^2+\delta
_2||\nabla f_2(y)||^2). \end{eqnarray*}
We then extend our recent results on Backtracking
Gradient Descent and some variants to this setting. We show by an example the
advantage of using the coordinate-wise Armijo's condition over the usual Armijo's
condition.
Authors' comments: 6 pages
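The two conditions in the abstract above can be compared on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's algorithm or its example: it takes scalar $x$ and $y$, uses plain halving as the backtracking rule, and picks $f_1(x)=x^2$, $f_2(y)=100y^2$ as a hypothetical test case. If each $\delta_i$ satisfies Armijo's condition for $f_i$ alone, adding the two inequalities gives the coordinate-wise condition, so backtracking each coordinate separately is one sufficient way to find a valid pair.

```python
def armijo_step(f, grad, x, alpha=0.5, delta0=1.0, beta=0.5, max_halvings=50):
    """Shrink delta from delta0 until f(x - d*g) - f(x) <= -alpha * d * g**2."""
    g = grad(x)
    d = delta0
    for _ in range(max_halvings):
        if f(x - d * g) - f(x) <= -alpha * d * g * g:
            return d
        d *= beta
    return d


def joint_armijo_step(f1, g1, f2, g2, x, y, alpha=0.5, delta0=1.0, beta=0.5,
                      max_halvings=50):
    """Usual Armijo's condition: one shared delta for both coordinates."""
    gx, gy = g1(x), g2(y)
    base = f1(x) + f2(y)
    d = delta0
    for _ in range(max_halvings):
        new = f1(x - d * gx) + f2(y - d * gy)
        if new - base <= -alpha * d * (gx * gx + gy * gy):
            return d
        d *= beta
    return d


def coordinatewise_condition(f1, g1, f2, g2, x, y, d1, d2, alpha):
    """Check the coordinate-wise Armijo inequality for the pair (d1, d2)."""
    lhs = (f1(x - d1 * g1(x)) + f2(y - d2 * g2(y))) - (f1(x) + f2(y))
    rhs = -alpha * (d1 * g1(x) ** 2 + d2 * g2(y) ** 2)
    return lhs <= rhs


# Hypothetical test functions with very different curvature.
f1 = lambda x: x * x
g1 = lambda x: 2.0 * x
f2 = lambda y: 100.0 * y * y
g2 = lambda y: 200.0 * y

x0, y0, alpha = 1.0, 1.0, 0.5
d1 = armijo_step(f1, g1, x0, alpha)                       # 0.5
d2 = armijo_step(f2, g2, y0, alpha)                       # 0.00390625
d_joint = joint_armijo_step(f1, g1, f2, g2, x0, y0, alpha)  # 0.00390625
pair_ok = coordinatewise_condition(f1, g1, f2, g2, x0, y0, d1, d2, alpha)  # True
```

In this toy case the shared step is forced down to the small value dictated by the steep coordinate $y$, while the coordinate-wise pair lets $x$ take a step of $0.5$.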
Klas Leino, Emily Black, Matt Fredrikson, Shayak Sen, Anupam Datta
We study the phenomenon of bias amplification in classifiers, wherein a
machine learning model learns to predict classes with a greater disparity than
the underlying ground truth. We demonstrate that bias amplification can arise
via an inductive bias in gradient descent methods that results in the
overestimation of the importance of moderately-predictive "weak" features if
insufficient training data is available. This overestimation gives rise to
feature-wise bias amplification -- a previously unreported form of bias that
can be traced back to the features of a trained model. Through analysis and
experiments, we show that while some bias cannot be mitigated without
sacrificing accuracy, feature-wise bias amplification can be mitigated through
targeted feature selection. We present two new feature selection algorithms for
mitigating bias amplification in linear models, and show how they can be
adapted to convolutional neural networks efficiently. Our experiments on
synthetic and real data demonstrate that these algorithms consistently lead to
reduced bias without harming accuracy, in some cases eliminating predictive
bias altogether while providing modest gains in accuracy.
Authors' comments: Published in ICLR 2019
Yilin Song, Chenge Li, Yao Wang
In this paper, we propose a novel pixel-wise visual object tracking framework that can track any anonymous object in a noisy background. The framework consists of two submodels, a global attention model and a local segmentation model. The global model generates a region of interest (ROI) in which the object may lie in the new frame, based on the past object segmentation maps, while the local model segments the new image in the ROI. Each model uses an LSTM structure to model the temporal dynamics of the motion and appearance, respectively. To circumvent the dependency of the training data between the two models, we use an iterative update strategy. Once the models are trained, there is no need to refine them to track specific objects, making our method efficient compared to online learning approaches. We demonstrate our real-time pixel-wise object tracking framework on a challenging VOT dataset.
R. J. Assef, D. Stern, G. Noirot, H. D. Jun, R. M. Cutri, P. R. M. Eisenhardt
We present two large catalogs of AGN candidates identified across ~75% of the
sky from the Wide-field Infrared Survey Explorer's AllWISE Data Release. Both
catalogs, some of the largest such catalogs published to date, are selected
purely on the basis of mid-IR photometry in the WISE W1 and W2 bands. The
catalogs are designed to be appropriate for a broad range of scientific
investigations, with one catalog emphasizing reliability while the other
emphasizes completeness. Specifically, the R90 catalog consists of 4,543,530
AGN candidates with 90% reliability, while the C75 catalog consists of
20,907,127 AGN candidates with 75% completeness. We provide a detailed
discussion of potential artifacts, and excise portions of the sky close to the
Galactic Center, Galactic Plane, nearby galaxies, and other expected
contaminating sources. Our final catalogs cover 30,093 deg^2 of extragalactic
sky. These catalogs are expected to enable a broad range of science, and we
present a few simple illustrative cases. From the R90 sample we identify 45
highly variable AGN lacking radio counterparts in the FIRST survey, implying
they are unlikely to be blazars. One of these sources, WISEA
J142846.71+172353.1, is a mid-IR-identified changing-look quasar at z=0.104. We
characterize our catalogs by comparing them to large, wide-area AGN catalogs in
the literature, specifically UV-to-near-IR quasar selections from SDSS and
XDQSOz, mid-IR selection from Secrest et al. (2015) and X-ray selection from
ROSAT. From the latter work, we identify four ROSAT X-ray sources that each are
matched to three WISE-selected AGN in the R90 sample within 30". Palomar
spectroscopy reveals one of these systems, 2RXS J150158.6+691029, to consist of
a triplet of quasars at z=1.133 +/- 0.004, suggestive of a rich group or
forming galaxy cluster. (Abridged)
Authors' comments: Accepted for publication in the Astrophysical Journal Supplements.
Updated with comments from the referee. 20 pages, 15 figures, 8 tables. The
WISE AGN Catalogs can be made available upon request by writing to
roberto.assef@mail.udp.cl
Hidekazu Oiwa, Ryohei Fujimaki
Region-specific linear models are widely used in practical applications
because of their non-linear but highly interpretable model representations. One
of the key challenges in their use is non-convexity in simultaneous
optimization of regions and region-specific models. This paper proposes novel
convex region-specific linear models, which we refer to as partition-wise
linear models. Our key ideas are 1) assigning linear models not to regions but
to partitions (region-specifiers) and representing region-specific linear
models by linear combinations of partition-specific models, and 2) optimizing
regions via partition selection from a large number of given partition
candidates by means of convex structured regularizations. In addition to
providing initialization-free globally-optimal solutions, our convex
formulation makes it possible to derive a generalization bound and to use such
advanced optimization techniques as proximal methods and decomposition of the
proximal maps for sparsity-inducing regularizations. Experimental results
demonstrate that our partition-wise linear models perform better than or are at
least competitive with state-of-the-art region-specific or locally linear
models.
Authors' comments: 15 pages
Egor Ianovski
We consider equivalence relations and preorders complete for various levels of the arithmetical hierarchy under computable, component-wise reducibility. We show that implication in first-order logic is a complete preorder for $\Sigma^0_1$, the $\le^P_m$ relation on EXPTIME sets for $\Sigma^0_2$ and the embeddability of computable subgroups of $(\mathbb{Q},+)$ for $\Sigma^0_3$. In all cases, the symmetric fragment of the preorder is complete for equivalence relations on the same level. We present a characterisation of $\Pi^0_1$ equivalence relations which allows us to establish that equality of polynomial time functions and inclusion of polynomial time sets are complete for $\Pi^0_1$ equivalence relations and preorders respectively. We also show that this is the limit of the enquiry: for $n\geq 2$ there are no $\Pi^0_n$- nor $\Delta^0_n$-complete equivalence relations.
Willem-Jan Vriend, Edwin A. Valentijn, Andrey Belikov, Gijs A. Verdoes Kleijn
Astro-WISE is a scientific information system for the data processing of
optical images. In this paper we review the main features of Astro-WISE and
describe the current status of the system.
Authors' comments: 4 pages, Proc. of ADASS XXI, ASP Conference Series
David R. Hardoon, Kristiaan Pelcksman
This paper studies the problem of learning clusters which are consistently present in different (continuously valued) representations of observed data. Our setup differs slightly from the standard approach of (co-)clustering as we use the fact that some form of `labeling' becomes available in this setup: a cluster is only interesting if it has a counterpart in the alternative representation. The contribution of this paper is twofold: (i) the problem setting is explored and an analysis in terms of the PAC-Bayesian theorem is presented, (ii) a practical kernel-based algorithm is derived exploiting the inherent relation to Canonical Correlation Analysis (CCA), as well as its extension to multiple views. A content-based information retrieval (CBIR) case study is presented on the multi-lingual aligned Europarl document dataset which supports the above findings.
R. D. Kenway
After a brief introduction to the Heavy-Quark Effective Theory (HQET), I
review the extraction of the Isgur-Wise function from lattice QCD calculations
of the matrix elements for semi-leptonic decays of heavy-light pseudoscalar
mesons both into pseudoscalar and into vector mesons. This work is beginning to
test the heavy-quark spin-flavour symmetries around the charm mass and to
indicate the size of $O(1/m_c)$ corrections. An alternative approach to put the
HQET on the lattice offers the prospect of computing the Isgur-Wise function
directly.
Authors' comments: 6 pages, uuencoded compressed tar postscript file, Edinburgh preprint
93/535, Talk presented at LATTICE 93 Dallas
Mei Qiu, William Lorenz Reindl, Yaobin Chen, Stanley Chien, Shu Hu
This paper proposes a scalable and interpretable framework for lane-wise highway traffic anomaly detection, leveraging multi-modal time series data extracted from surveillance cameras. Unlike traditional sensor-dependent methods, our approach uses AI-powered vision models to extract lane-specific features, including vehicle count, occupancy, and truck percentage, without relying on costly hardware or complex road modeling. We introduce a novel dataset containing 73,139 lane-wise samples, annotated with four classes of expert-validated anomalies: three traffic-related anomalies (lane blockage and recovery, foreign object intrusion, and sustained congestion) and one sensor-related anomaly (camera angle shift). Our multi-branch detection system integrates deep learning, rule-based logic, and machine learning to improve robustness and precision. Extensive experiments demonstrate that our framework outperforms state-of-the-art methods in precision, recall, and F1-score, providing a cost-effective and scalable solution for real-world intelligent transportation systems.
Tejaswini Medi, Julia Grabinski, Margret Keuper
While being very successful in solving many downstream tasks, the application of deep neural networks is limited in real-life scenarios because of their susceptibility to domain shifts such as common corruptions and adversarial attacks. The existence of adversarial examples and data corruption significantly reduces the performance of deep classification models. Researchers have made strides in developing robust neural architectures to bolster decisions of deep classifiers. However, most of these works rely on effective adversarial training methods, and predominantly focus on overall model robustness, disregarding class-wise differences in robustness, which are critical. Exploiting weakly robust classes is a potential avenue for attackers to fool image recognition models. Therefore, this study investigates class-to-class biases across adversarially trained robust classification models to understand their latent space structures and analyze their strong and weak class-wise properties. We further assess the robustness of classes against common corruptions and adversarial attacks, recognizing that class vulnerability extends beyond the number of correct classifications for a specific class. We find that the number of false positives of classes as specific target classes significantly impacts their vulnerability to attacks. Through our analysis of the Class False Positive Score, we provide a fair evaluation of how susceptible each class is to misclassification.
Giuseppe Bisicchia, Giuseppe Clemente, Jose Garcia-Alonso, Juan Manuel Murillo Rodríguez, Massimo D'Elia, Antonio Brogi
The constraints of the NISQ (Noisy Intermediate-Scale Quantum) era, namely high
sensitivity to noise and limited qubit count, impose significant barriers on
the usability of QPU (Quantum Processing Unit) capabilities. To overcome these
challenges, researchers are exploring methods to maximize the utility of
existing QPUs despite their limitations. Building upon the idea that the
execution of a quantum circuit's shots need not be treated as a singular monolithic unit,
we propose a methodological framework, termed shot-wise, which enables the
distribution of shots for a single circuit across multiple QPUs. Our framework
features customizable policies to adapt to various scenarios. Additionally, it
introduces a calibration method to pre-evaluate the accuracy and reliability of
each QPU's output before the actual distribution process and an incremental
execution mechanism for dynamically managing the shot allocation and policy
updates. Such an approach enables flexible and fine-grained management of the
distribution process, taking into account various user-defined constraints and
(contrasting) objectives. Experimental findings show that while these
strategies generally do not exceed the best individual QPU results, they
maintain robustness and align closely with average outcomes. Overall, the
shot-wise methodology improves result stability and often outperforms single
QPU runs, offering a flexible approach to managing variability in quantum
computing.
Authors' comments: 22 pages, 7 figures
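The core idea of the abstract above, splitting the shots of one circuit across several QPUs and merging the outcomes, can be sketched as follows. This is a minimal illustration and not the authors' framework: the proportional split policy, the function names `split_shots` and `merge_counts`, the QPU labels, and the measurement counts are all hypothetical and exist only to exercise the code.

```python
from collections import Counter


def split_shots(total_shots, weights):
    """Split total_shots across QPUs in proportion to their weights.

    Uses the largest-remainder rule so the allocations sum exactly to
    total_shots. The weights stand in for a user-defined policy.
    """
    total_w = sum(weights.values())
    exact = {q: total_shots * w / total_w for q, w in weights.items()}
    alloc = {q: int(v) for q, v in exact.items()}
    leftover = total_shots - sum(alloc.values())
    by_remainder = sorted(weights, key=lambda q: exact[q] - alloc[q], reverse=True)
    for q in by_remainder[:leftover]:
        alloc[q] += 1
    return alloc


def merge_counts(per_qpu_counts):
    """Merge per-QPU measurement histograms by summing counts per bitstring."""
    merged = Counter()
    for counts in per_qpu_counts.values():
        merged.update(counts)
    return dict(merged)


# Hypothetical policy weights and made-up measurement counts (not real results).
weights = {"qpu_a": 2, "qpu_b": 1, "qpu_c": 1}
allocation = split_shots(1001, weights)

per_qpu_counts = {
    "qpu_a": {"00": 260, "11": 241},
    "qpu_b": {"00": 120, "11": 130},
    "qpu_c": {"00": 125, "11": 120, "01": 5},
}
merged = merge_counts(per_qpu_counts)
```

A calibration step or an incremental execution loop, as mentioned in the abstract, would adjust `weights` between rounds; that part is not modeled here.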
Brayan Monroy, Jorge Bacca
In this paper, we introduce an efficient algorithm for generating specific Hadamard rows, addressing the memory demands of pre-computing the entire matrix. Leveraging Sylvester's recursive construction, our method generates the required $i$-th row on demand, significantly reducing computational resources. The algorithm uses the Kronecker product to construct the desired row from the binary representation of the index, without creating the full matrix. This approach is particularly useful for single-pixel imaging systems that need only one row at a time.
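The construction described above can be sketched in a few lines. This is an illustrative version under the assumption of Sylvester ordering, $H_{2n} = H_2 \otimes H_n$, and is not the authors' code; the function name `hadamard_row` and its `order` parameter (so that the matrix has size $2^{\text{order}}$) are chosen for this example.

```python
def hadamard_row(i, order):
    """Return row i of the Sylvester Hadamard matrix of size 2**order.

    The row is built as a Kronecker product of 2-entry factors, one per
    bit of i, without ever forming the full matrix.
    """
    if not 0 <= i < 2 ** order:
        raise ValueError("row index out of range")
    row = [1]
    # Most significant bit first, matching H_{2n} = H_2 (x) H_n.
    for bit_pos in reversed(range(order)):
        bit = (i >> bit_pos) & 1
        factor = [1, -1] if bit else [1, 1]  # row `bit` of H_2
        # Kronecker product of the row built so far with the new factor.
        row = [a * b for a in row for b in factor]
    return row


row5 = hadamard_row(5, 3)  # [1, -1, 1, -1, -1, 1, -1, 1]
```

Only a list of length $2^{\text{order}}$ is ever held in memory, compared with $4^{\text{order}}$ entries for the full matrix.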
Kun Ma, Cong Xu, Zeyuan Chen, Wei Zhang
A transparent decision-making process is essential for developing reliable
and trustworthy recommender systems. For sequential recommendation, it means
that the model can identify key items that account for its recommendation
results. However, achieving both interpretability and recommendation
performance simultaneously is challenging, especially for models that take the
entire sequence of items as input without screening. In this paper, we propose
an interpretable framework (named PTSR) that enables a pattern-wise transparent
decision-making process without extra features. It breaks the sequence of items
into multi-level patterns that serve as atomic units throughout the
recommendation process. The contribution of each pattern to the outcome is
quantified in the probability space. With a carefully designed score correction
mechanism, the pattern contribution can be implicitly learned in the absence of
ground-truth key patterns. The final recommended items are those that most key
patterns strongly endorse. Extensive experiments on five public datasets
demonstrate remarkable recommendation performance, while statistical analysis
and case studies validate the model interpretability.
Authors' comments: This paper has been accepted by IEEE TKDE
Linara Adilova, Maksym Andriushchenko, Michael Kamp, Asja Fischer, Martin Jaggi
Averaging neural network parameters is an intuitive method for fusing the
knowledge of two independent models. It is most prominently used in federated
learning. If models are averaged at the end of training, this can only lead to
a well-performing model if the loss surface of interest is very particular,
i.e., the loss in the midpoint between the two models needs to be sufficiently
low. This is impossible to guarantee for the non-convex losses of
state-of-the-art networks. For averaging models trained on vastly different
datasets, it was proposed to average only the parameters of particular layers
or combinations of layers, resulting in better performing models. To get a
better understanding of the effect of layer-wise averaging, we analyse the
performance of the models that result from averaging single layers, or groups
of layers. Based on our empirical and theoretical investigation, we introduce a
novel notion of the layer-wise linear connectivity, and show that deep networks
do not have layer-wise barriers between them.
Authors' comments: published at ICLR24
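Averaging only selected layers, as discussed in the abstract above, can be sketched as follows. This is a simplified illustration, not the authors' code: models are represented as plain dictionaries mapping hypothetical layer names to flat parameter lists, and `interpolate_layers` is a name invented for this example.

```python
def interpolate_layers(model_a, model_b, layers, t=0.5):
    """Return a copy of model_a in which only `layers` are interpolated.

    Selected layers become (1 - t) * a + t * b; every other layer keeps
    the parameters of model_a. With t = 0.5 this is layer-wise averaging.
    """
    out = {}
    for name, params_a in model_a.items():
        if name in layers:
            params_b = model_b[name]
            out[name] = [(1 - t) * a + t * b for a, b in zip(params_a, params_b)]
        else:
            out[name] = list(params_a)
    return out


# Toy parameters, chosen only to exercise the function.
model_a = {"layer1": [1.0, 3.0], "layer2": [0.0]}
model_b = {"layer1": [3.0, 5.0], "layer2": [4.0]}
mixed = interpolate_layers(model_a, model_b, layers={"layer1"})
```

Evaluating a loss on `mixed` for several values of `t` would trace the loss along the path for that single layer, which is the kind of measurement a layer-wise connectivity analysis relies on.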
Weiye Zhao, Rui Chen, Yifan Sun, Tianhao Wei, Changliu Liu
Reinforcement Learning (RL) algorithms have shown tremendous success in
simulation environments, but their application to real-world problems faces
significant challenges, with safety being a major concern. In particular,
enforcing state-wise constraints is essential for many challenging tasks such
as autonomous driving and robot manipulation. However, existing safe RL
algorithms under the framework of Constrained Markov Decision Process (CMDP) do
not consider state-wise constraints. To address this gap, we propose State-wise
Constrained Policy Optimization (SCPO), the first general-purpose policy search
algorithm for state-wise constrained reinforcement learning. SCPO provides
guarantees for state-wise constraint satisfaction in expectation. In
particular, we introduce the framework of Maximum Markov Decision Process, and
prove that the worst-case safety violation is bounded under SCPO. We
demonstrate the effectiveness of our approach on training neural network
policies for extensive robot locomotion tasks, where the agent must satisfy a
variety of state-wise safety constraints. Our results show that SCPO
significantly outperforms existing methods and can handle state-wise
constraints in high-dimensional robotics tasks.
Authors' comments: arXiv admin note: text overlap with arXiv:2305.13681
Guanchu Wang, Ninghao Liu, Daochen Zha, Xia Hu
Anomaly detection, where data instances are discovered containing feature patterns different from the majority, plays a fundamental role in various applications. However, it is challenging for existing methods to handle scenarios where the instances are systems whose characteristics are not readily observed as data. Appropriate interactions with the systems are needed to identify those with abnormal responses. Detecting system-wise anomalies is challenging for several reasons, including how to formally define the system-wise anomaly detection problem; how to find an effective activation signal for interacting with systems to progressively collect the data and learn the detector; and how to guarantee stable training in such a non-stationary scenario with real-time interactions. To address these challenges, we propose InterSAD (Interactive System-wise Anomaly Detection). Specifically, first, we adopt a Markov decision process to model the interactive systems, and define anomalous systems as anomalous transition and anomalous reward systems. Then, we develop an end-to-end approach which includes an encoder-decoder module that learns system embeddings, and a policy network to generate effective activation for separating embeddings of normal and anomalous systems. Finally, we design a training method to stabilize the learning process, which includes a replay buffer to store historical interaction data and allow them to be re-sampled. Experiments on two benchmark environments, including identifying anomalous robotic systems and detecting user data poisoning in recommendation models, demonstrate the superiority of InterSAD compared with state-of-the-art baseline methods.
Xu Ma, Yuqian Zhou, Xingqian Xu, Bin Sun, Valerii Filev, Nikita Orlov, Yun Fu, Humphrey Shi
Image rasterization is a mature technique in computer graphics, while image
vectorization, the reverse path of rasterization, remains a major challenge.
Recent advanced deep learning-based models achieve vectorization and semantic
interpolation of vector graphs and demonstrate a better topology of generating
new figures. However, deep models cannot be easily generalized to out-of-domain
testing data. The generated SVGs also contain complex and redundant shapes that
are not quite convenient for further editing. Specifically, the crucial
layer-wise topology and fundamental semantics in images are still not well
understood and thus not fully explored. In this work, we propose Layer-wise
Image Vectorization, namely LIVE, to convert raster images to SVGs while
maintaining their image topology. LIVE can generate compact SVG forms
with layer-wise structures that are semantically consistent with human
perspective. We progressively add new Bézier paths and optimize these paths
with the layer-wise framework, newly designed loss functions, and a
component-wise path initialization technique. Our experiments demonstrate that
LIVE presents more plausible vectorized forms than prior works and can be
generalized to new images. With the help of this newly learned topology, LIVE
produces human-editable SVGs for both designers and other downstream
applications. Code is available at
https://github.com/Picsart-AI-Research/LIVE-Layerwise-Image-Vectorization.
Authors' comments: Accepted as Oral Presentation at CVPR 2022
Hyeokjun Kweon, Hyeonseong Kim, Yoonsu Kang, Youngho Yoon, Wooseong Jeong, Kuk-Jin Yoon
Image stitching aims at stitching the images taken from different viewpoints into an image with a wider field of view. Existing methods warp the target image to the reference image using the estimated warp function, and a homography is one of the most commonly used warping functions. However, when images have large parallax due to non-planar scenes and translational motion of a camera, the homography cannot fully describe the mapping between two images. Existing approaches based on global or local homography estimation are not free from this problem and suffer from undesired artifacts due to parallax. In this paper, instead of relying on the homography-based warp, we propose a novel deep image stitching framework exploiting the pixel-wise warp field to handle the large-parallax problem. The proposed deep image stitching framework consists of two modules: Pixel-wise Warping Module (PWM) and Stitched Image Generating Module (SIGMo). PWM employs an optical flow estimation model to obtain pixel-wise warp of the whole image, and relocates the pixels of the target image with the obtained warp field. SIGMo blends the warped target image and the reference image while eliminating unwanted artifacts such as misalignments, seams, and holes that harm the plausibility of the stitched result. For training and evaluating the proposed framework, we build a large-scale dataset that includes image pairs with corresponding pixel-wise ground truth warp and sample stitched result images. We show that the results of the proposed framework are qualitatively superior to those of the conventional methods, especially when the images have large parallax. The code and the proposed dataset will be publicly available soon.