Han Liu, Yangyang Guo, Jianhua Yin, Zan Gao, Liqiang Nie
Review-involved recommender systems, which utilize review information to enhance recommendation, have received increasing interest over the past few years. Within this line of work, one advanced branch extracts salient aspects from textual reviews (i.e., the item attributes that users express) and combines them with the matrix factorization technique. However, existing approaches all ignore the fact that semantically different reviews often include opposite aspect information. In particular, positive reviews usually express aspects that users prefer, while negative ones describe aspects that users reject. Ignoring this distinction may mislead recommender systems into making incorrect decisions when modeling user preferences. To this end, we propose a Review Polarity-wise Recommender model, dubbed RPR, that treats reviews with different polarities separately. Specifically, positive and negative reviews are gathered separately and used to model the user-preferred and user-rejected aspects, respectively. In addition, to overcome the imbalance between semantically different reviews, we develop an aspect-aware importance weighting approach to align the aspect importance for these two kinds of reviews. Extensive experiments on eight benchmark datasets demonstrate the superiority of our model over a series of state-of-the-art review-involved baselines. Moreover, our method can provide certain explanations for real-world rating prediction scenarios.
Alfonso Artigue, Bernardo Carvalho, Welington Cordeiro, José Vieitez
We introduce continuum-wise hyperbolicity, a generalization of hyperbolicity with respect to the continuum theory. We discuss similarities and differences between topological hyperbolicity and continuum-wise hyperbolicity. A shadowing lemma for cw-hyperbolic homeomorphisms is proved in the form of the L-shadowing property and a Spectral Decomposition is obtained in this scenario. In the proof we generalize the construction of Fathi \cite{Fat89} of a hyperbolic metric using only cw-expansivity, obtaining a hyperbolic cw-metric. We also introduce cwN-hyperbolicity, exhibit examples of these systems for arbitrarily large $N\in\mathbb{N}$ and obtain further dynamical properties of these systems such as finiteness of periodic points with the same period. We prove that homeomorphisms of $\mathbb{S}^2$ that are induced by topologically hyperbolic homeomorphisms of $\mathbb{T}^2$ are continuum-wise-hyperbolic and topologically conjugate to linear cw-Anosov diffeomorphisms of $\mathbb{S}^2$, being in particular cw2-hyperbolic.
Amir Hadifar, Johannes Deleu, Chris Develder, Thomas Demeester
Neural networks have achieved state-of-the-art performance across a wide variety of machine learning tasks, often with large and computation-heavy models. Inducing sparseness as a way to reduce the memory and computation footprint of these models has seen significant research attention in recent years. In this paper, we present a new method for \emph{dynamic sparseness}, whereby part of the computations are omitted dynamically, based on the input. For efficiency, we combine the idea of dynamic sparseness with block-wise matrix-vector multiplications. In contrast to static sparseness, which permanently zeroes out selected positions in weight matrices, our method preserves the full network capabilities by potentially accessing any trained weights. Yet, matrix-vector multiplications are accelerated by omitting a pre-defined fraction of weight blocks from the matrix, based on the input. Experimental results on the task of language modeling, using recurrent and quasi-recurrent models, show that the proposed method can outperform a magnitude-based static sparseness baseline. In addition, our method achieves language modeling perplexities similar to those of the dense baseline, at half the computational cost at inference time.
Tuyen Trung Truong
Let $z=(x,y)$ be coordinates for the product space $\mathbb{R}^{m_1}\times
\mathbb{R}^{m_2}$. Let $f:\mathbb{R}^{m_1}\times \mathbb{R}^{m_2}\rightarrow
\mathbb{R}$ be a $C^1$ function, and $\nabla f=(\partial _xf,\partial _yf)$ its
gradient. Fix $0<\alpha <1$. For a point $(x,y) \in \mathbb{R}^{m_1}\times
\mathbb{R}^{m_2}$, a number $\delta >0$ satisfies Armijo's condition at $(x,y)$
if the following inequality holds: \begin{eqnarray*} f(x-\delta \partial
_xf,y-\delta \partial _yf)-f(x,y)\leq -\alpha \delta (||\partial
_xf||^2+||\partial _yf||^2). \end{eqnarray*}
When $f(x,y)=f_1(x)+f_2(y)$ is a coordinate-wise sum map, we propose the
following {\bf coordinate-wise} Armijo's condition. Fix again $0<\alpha <1$. A
pair of positive numbers $\delta _1,\delta _2>0$ satisfies the coordinate-wise
variant of Armijo's condition at $(x,y)$ if the following inequality holds:
\begin{eqnarray*} [f_1(x-\delta _1\nabla f_1(x))+f_2(y-\delta _2\nabla
f_2(y))]-[f_1(x)+f_2(y)]\leq -\alpha (\delta _1||\nabla f_1(x)||^2+\delta
_2||\nabla f_2(y)||^2). \end{eqnarray*}
We then extend our recent results on Backtracking Gradient Descent and some
variants to this setting. We show by an example the advantage of using the
coordinate-wise Armijo's condition over the usual Armijo's condition.
Authors' comments: 6 pages
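The two conditions above can be compared on a small example. The sketch below is only an illustration under stated assumptions, not the paper's procedure: the toy objective $f_1(x)=x^2$, $f_2(y)=100y^2$, the shrink factor `beta`, the initial step `delta0` and all function names are hypothetical. It relies on one fact that follows directly from the definitions: if $\delta_1$ satisfies the usual Armijo's condition for $f_1$ and $\delta_2$ satisfies it for $f_2$, then adding the two inequalities gives the coordinate-wise condition for the pair $(\delta_1,\delta_2)$, so the two blocks can be backtracked independently.

```python
def sq_norm(v):
    """Squared Euclidean norm of a list of floats."""
    return sum(c * c for c in v)


def armijo_backtrack(f, grad, x, alpha=0.5, beta=0.5, delta0=1.0, max_iter=60):
    """Shrink delta by beta until f(x - delta*g) - f(x) <= -alpha*delta*||g||^2."""
    g = grad(x)
    g2 = sq_norm(g)
    fx = f(x)
    delta = delta0
    for _ in range(max_iter):
        trial = [xi - delta * gi for xi, gi in zip(x, g)]
        if f(trial) - fx <= -alpha * delta * g2:
            return delta
        delta *= beta
    return delta  # fallback if no admissible step was found within max_iter


# Hypothetical separable objective with very different curvature per block.
def f1(x):
    return x[0] ** 2


def grad_f1(x):
    return [2.0 * x[0]]


def f2(y):
    return 100.0 * y[0] ** 2


def grad_f2(y):
    return [200.0 * y[0]]


def f(z):
    return f1(z[:1]) + f2(z[1:])


def grad_f(z):
    return grad_f1(z[:1]) + grad_f2(z[1:])


x0, y0 = [1.0], [1.0]
# Usual Armijo's condition: a single delta shared by both blocks.
delta_joint = armijo_backtrack(f, grad_f, x0 + y0)
# Coordinate-wise variant: one delta per block, found independently.
delta_1 = armijo_backtrack(f1, grad_f1, x0)
delta_2 = armijo_backtrack(f2, grad_f2, y0)
print(delta_joint, delta_1, delta_2)
```

On this toy problem the stiff block $f_2$ forces the shared step down to $2^{-8}$, while the coordinate-wise variant keeps that small step for $y$ only and accepts $\delta_1 = 0.5$ for $x$.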
Klas Leino, Emily Black, Matt Fredrikson, Shayak Sen, Anupam Datta
We study the phenomenon of bias amplification in classifiers, wherein a
machine learning model learns to predict classes with a greater disparity than
the underlying ground truth. We demonstrate that bias amplification can arise
via an inductive bias in gradient descent methods that results in the
overestimation of the importance of moderately-predictive "weak" features if
insufficient training data is available. This overestimation gives rise to
feature-wise bias amplification -- a previously unreported form of bias that
can be traced back to the features of a trained model. Through analysis and
experiments, we show that while some bias cannot be mitigated without
sacrificing accuracy, feature-wise bias amplification can be mitigated through
targeted feature selection. We present two new feature selection algorithms for
mitigating bias amplification in linear models, and show how they can be
adapted to convolutional neural networks efficiently. Our experiments on
synthetic and real data demonstrate that these algorithms consistently lead to
reduced bias without harming accuracy, in some cases eliminating predictive
bias altogether while providing modest gains in accuracy.
Authors' comments: Published in ICLR 2019
Yilin Song, Chenge Li, Yao Wang
In this paper, we propose a novel pixel-wise visual object tracking framework that can track any anonymous object in a noisy background. The framework consists of two submodels, a global attention model and a local segmentation model. The global model generates a region of interest (ROI) in which the object may lie in the new frame, based on the past object segmentation maps, while the local model segments the new image within the ROI. Each model uses an LSTM structure to model the temporal dynamics of the motion and appearance, respectively. To circumvent the dependency of the training data between the two models, we use an iterative update strategy. Once the models are trained, there is no need to refine them to track specific objects, making our method efficient compared to online learning approaches. We demonstrate our real-time pixel-wise object tracking framework on a challenging VOT dataset.
R. J. Assef, D. Stern, G. Noirot, H. D. Jun, R. M. Cutri, P. R. M. Eisenhardt
We present two large catalogs of AGN candidates identified across ~75% of the
sky from the Wide-field Infrared Survey Explorer's AllWISE Data Release. Both
catalogs, some of the largest such catalogs published to date, are selected
purely on the basis of mid-IR photometry in the WISE W1 and W2 bands. The
catalogs are designed to be appropriate for a broad range of scientific
investigations, with one catalog emphasizing reliability while the other
emphasizes completeness. Specifically, the R90 catalog consists of 4,543,530
AGN candidates with 90% reliability, while the C75 catalog consists of
20,907,127 AGN candidates with 75% completeness. We provide a detailed
discussion of potential artifacts, and excise portions of the sky close to the
Galactic Center, Galactic Plane, nearby galaxies, and other expected
contaminating sources. Our final catalogs cover 30,093 deg^2 of extragalactic
sky. These catalogs are expected to enable a broad range of science, and we
present a few simple illustrative cases. From the R90 sample we identify 45
highly variable AGN lacking radio counterparts in the FIRST survey, implying
they are unlikely to be blazars. One of these sources, WISEA
J142846.71+172353.1, is a mid-IR-identified changing-look quasar at z=0.104. We
characterize our catalogs by comparing them to large, wide-area AGN catalogs in
the literature, specifically UV-to-near-IR quasar selections from SDSS and
XDQSOz, mid-IR selection from Secrest et al. (2015) and X-ray selection from
ROSAT. From the latter work, we identify four ROSAT X-ray sources that are each
matched to three WISE-selected AGN in the R90 sample within 30". Palomar
spectroscopy reveals one of these systems, 2RXS J150158.6+691029, to consist of
a triplet of quasars at z=1.133 +/- 0.004, suggestive of a rich group or
forming galaxy cluster. (Abridged)
Authors' comments: Accepted for publication in the Astrophysical Journal Supplements.
Updated with comments from the referee. 20 pages, 15 figures, 8 tables. The
WISE AGN Catalogs can be made available upon request by writing to
roberto.assef@mail.udp.cl
Hidekazu Oiwa, Ryohei Fujimaki
Region-specific linear models are widely used in practical applications
because of their non-linear but highly interpretable model representations. One
of the key challenges in their use is non-convexity in simultaneous
optimization of regions and region-specific models. This paper proposes novel
convex region-specific linear models, which we refer to as partition-wise
linear models. Our key ideas are 1) assigning linear models not to regions but
to partitions (region-specifiers) and representing region-specific linear
models by linear combinations of partition-specific models, and 2) optimizing
regions via partition selection from a large number of given partition
candidates by means of convex structured regularizations. In addition to
providing initialization-free globally-optimal solutions, our convex
formulation makes it possible to derive a generalization bound and to use such
advanced optimization techniques as proximal methods and decomposition of the
proximal maps for sparsity-inducing regularizations. Experimental results
demonstrate that our partition-wise linear models perform better than or are at
least competitive with state-of-the-art region-specific or locally linear
models.
Authors' comments: 15 pages
Egor Ianovski
We consider equivalence relations and preorders complete for various levels of the arithmetical hierarchy under computable, component-wise reducibility. We show that implication in first-order logic is a complete preorder for $\Sigma^0_1$, the $\le^P_m$ relation on EXPTIME sets for $\Sigma^0_2$ and the embeddability of computable subgroups of $(\mathbb{Q},+)$ for $\Sigma^0_3$. In all cases, the symmetric fragment of the preorder is complete for equivalence relations on the same level. We present a characterisation of $\Pi^0_1$ equivalence relations which allows us to establish that equality of polynomial-time functions and inclusion of polynomial-time sets are complete for $\Pi^0_1$ equivalence relations and preorders respectively. We also show that this is the limit of the enquiry: for $n\geq 2$ there are neither $\Pi^0_n$- nor $\Delta^0_n$-complete equivalence relations.
Willem-Jan Vriend, Edwin A. Valentijn, Andrey Belikov, Gijs A. Verdoes Kleijn
Astro-WISE is a scientific information system for the data processing of
optical images. In this paper we review the main features of Astro-WISE and
describe the current status of the system.
Authors' comments: 4 pages, Proc. of ADASS XXI, ASP Conference Series
David R. Hardoon, Kristiaan Pelcksman
This paper studies the problem of learning clusters which are consistently present in different (continuously valued) representations of observed data. Our setup differs slightly from the standard approach of (co-) clustering as we use the fact that some form of `labeling' becomes available in this setup: a cluster is only interesting if it has a counterpart in the alternative representation. The contribution of this paper is twofold: (i) the problem setting is explored and an analysis in terms of the PAC-Bayesian theorem is presented, (ii) a practical kernel-based algorithm is derived exploiting the inherent relation to Canonical Correlation Analysis (CCA), as well as its extension to multiple views. A content-based information retrieval (CBIR) case study is presented on the multi-lingual aligned Europarl document dataset, which supports the above findings.
R. D. Kenway
After a brief introduction to the Heavy-Quark Effective Theory (HQET), I
review the extraction of the Isgur-Wise function from lattice QCD calculations
of the matrix elements for semi-leptonic decays of heavy-light pseudoscalar
mesons both into pseudoscalar and into vector mesons. This work is beginning to
test the heavy-quark spin-flavour symmetries around the charm mass and to
indicate the size of $O(1/m_c)$ corrections. An alternative approach to put the
HQET on the lattice offers the prospect of computing the Isgur-Wise function
directly.
Authors' comments: 6 pages, uuencoded compressed tar postscript file, Edinburgh preprint
93/535, Talk presented at LATTICE 93 Dallas
Rowan Martnishn, Sean Anderson
As machine learning models grow in complexity, they increasingly struggle with three conflicting demands: the need for high accuracy, the requirement for hardware efficiency, and the necessity of functional stability. Traditional architectures often achieve performance at the expense of spiky or unpredictable behavior, where small changes in input lead to massive swings in output -- a critical flaw for real-world deployment in sensitive environments. This paper introduces ChainzRule (CR), a novel neural architecture designed to harmonize these competing goals. ChainzRule replaces standard piecewise-linear activations with a Polynomial Engine governed by Differential Regularization (DREG). Unlike traditional methods that impose global, coarse-grained constraints on a model's Lipschitz constant, DREG acts as a targeted regularization on intermediate derivatives. This approach suppresses extreme sensitivity without attenuating the representational power inherent in the Polynomial Engine. In head-to-head "Fair Fight" benchmarks, ChainzRule outperformed standard models while using 15.5x fewer parameters. On the MNIST dataset, it reduced peak gradient volatility by an average of 23.1%, ensuring a smoother and more predictable manifold. On Yelp Full ordinal regression under explicit DREG regularization, ChainzRule achieves 70.17% accuracy, validating that derivative-aware regularization is compatible with competitive performance on realistic tasks. By embedding gradient awareness into the architecture via DREG, ChainzRule demonstrates that stability and accuracy need not be competing objectives.
Authors' comments: Under Review at Neural Network Elsevier
Timo Kuosmanen, Juan F. Monge, José L. Ruiz, Xun Zhou
Isotonic regression provides a flexible, tuning-free approach to estimating monotonic functions without imposing global curvature constraints, yet the estimated regression function is inherently a step function. This paper addresses a key limitation of such estimators: their inability to provide meaningful marginal properties, such as shadow prices or elasticities. We propose a novel piece-wise linear smoothing framework that recovers meaningful marginal estimates even in non-convex settings. Building on the concept of conditional convexity originally developed in deterministic frontier analysis, we formulate the smoothing process as a bilevel optimization problem that fits a continuous, monotonic, piece-wise linear function to the initial isotonic regression predictions. Monte Carlo simulations demonstrate that the proposed approach can significantly improve estimation precision, reducing mean squared error in both convex and non-convex settings for univariate and multivariate data. We apply this approach to analyze agglomeration economies in Finnish municipalities, illustrating its practical value.
Mei Qiu, William Lorenz Reindl, Yaobin Chen, Stanley Chien, Shu Hu
This paper proposes a scalable and interpretable framework for lane-wise highway traffic anomaly detection, leveraging multi-modal time series data extracted from surveillance cameras. Unlike traditional sensor-dependent methods, our approach uses AI-powered vision models to extract lane-specific features, including vehicle count, occupancy, and truck percentage, without relying on costly hardware or complex road modeling. We introduce a novel dataset containing 73,139 lane-wise samples, annotated with four classes of expert-validated anomalies: three traffic-related anomalies (lane blockage and recovery, foreign object intrusion, and sustained congestion) and one sensor-related anomaly (camera angle shift). Our multi-branch detection system integrates deep learning, rule-based logic, and machine learning to improve robustness and precision. Extensive experiments demonstrate that our framework outperforms state-of-the-art methods in precision, recall, and F1-score, providing a cost-effective and scalable solution for real-world intelligent transportation systems.
Tejaswini Medi, Julia Grabinski, Margret Keuper
Although deep neural networks are very successful at solving many downstream tasks, their application in real-life scenarios is limited by their susceptibility to domain shifts, such as common corruptions, and to adversarial attacks. The existence of adversarial examples and data corruption significantly reduces the performance of deep classification models. Researchers have made strides in developing robust neural architectures to bolster decisions of deep classifiers. However, most of these works rely on effective adversarial training methods and predominantly focus on overall model robustness, disregarding class-wise differences in robustness, which are critical. Exploiting weakly robust classes is a potential avenue for attackers to fool image recognition models. Therefore, this study investigates class-to-class biases across adversarially trained robust classification models to understand their latent space structures and analyze their strong and weak class-wise properties. We further assess the robustness of classes against common corruptions and adversarial attacks, recognizing that class vulnerability extends beyond the number of correct classifications for a specific class. We find that the number of false positives of classes as specific target classes significantly impacts their vulnerability to attacks. Through our analysis of the Class False Positive Score, we provide a fair evaluation of how susceptible each class is to misclassification.
Giuseppe Bisicchia, Giuseppe Clemente, Jose Garcia-Alonso, Juan Manuel Murillo Rodríguez, Massimo D'Elia, Antonio Brogi
The constraints of the NISQ (Noisy Intermediate-Scale Quantum) era, namely high
sensitivity to noise and limited qubit count, impose significant barriers on the
usability of QPUs (Quantum Processing Units). To overcome these challenges,
researchers are exploring methods to maximize the utility of existing QPUs
despite their limitations. Building upon the idea that the execution of a
quantum circuit's shots need not be treated as a singular monolithic unit,
we propose a methodological framework, termed shot-wise, which enables the
distribution of shots for a single circuit across multiple QPUs. Our framework
features customizable policies to adapt to various scenarios. Additionally, it
introduces a calibration method to pre-evaluate the accuracy and reliability of
each QPU's output before the actual distribution process and an incremental
execution mechanism for dynamically managing the shot allocation and policy
updates. Such an approach enables flexible and fine-grained management of the
distribution process, taking into account various user-defined constraints and
(contrasting) objectives. Experimental findings show that while these
strategies generally do not exceed the best individual QPU results, they
maintain robustness and align closely with average outcomes. Overall, the
shot-wise methodology improves result stability and often outperforms single
QPU runs, offering a flexible approach to managing variability in quantum
computing.
Authors' comments: 22 pages, 7 figures
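The idea of distributing the shots of one circuit across several QPUs and then combining the outcomes can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the proportional split policy, the backend names, the weights and the function names `split_shots` and `merge_counts` are hypothetical and are not taken from the framework described above, and no real quantum SDK is called.

```python
from collections import Counter


def split_shots(total_shots, weights):
    """Split total_shots across backends in proportion to their weights.

    Uses the largest-remainder rule so the allocation sums to total_shots.
    """
    total_w = sum(weights.values())
    exact = {name: total_shots * w / total_w for name, w in weights.items()}
    alloc = {name: int(share) for name, share in exact.items()}
    leftover = total_shots - sum(alloc.values())
    # Hand the remaining shots to the backends with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


def merge_counts(per_backend_counts):
    """Sum measurement-outcome counts returned by the individual backends."""
    merged = Counter()
    for counts in per_backend_counts:
        merged.update(counts)
    return dict(merged)


# Hypothetical backends and weights (e.g. derived from a calibration step).
allocation = split_shots(1000, {"qpu_a": 1, "qpu_b": 1, "qpu_c": 2})
# Made-up per-backend results, standing in for real executions.
combined = merge_counts([{"00": 3, "11": 2}, {"00": 1, "01": 4}])
print(allocation, combined)
```

Other policies (equal split, noise-aware weights, incremental re-allocation) would only change how `weights` is chosen or updated between rounds.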
Brayan Monroy, Jorge Bacca
In this paper, we introduce an efficient algorithm for generating specific Hadamard rows, addressing the memory demands of pre-computing the entire matrix. Leveraging Sylvester's recursive construction, our method generates the required $i$-th row on demand, significantly reducing computational resources. The algorithm uses the Kronecker product to construct the desired row from the binary representation of the index, without creating the full matrix. This approach is particularly useful for single-pixel imaging systems that need only one row at a time.
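A minimal sketch of the on-demand row construction is given below. It assumes 0-based row indices and the natural (Sylvester) ordering, in which $H_{2N} = H_2 \otimes H_N$; the function name `hadamard_row` and the pure-Python list representation are choices made for this illustration and may differ from the authors' implementation.

```python
def hadamard_row(i, n):
    """Return row i (0-based) of the 2**n x 2**n Sylvester Hadamard matrix.

    Each bit of i selects one row of H_2: bit 0 -> [1, 1], bit 1 -> [1, -1].
    The requested row is the Kronecker product of the selected rows, taken
    from the most significant bit to the least significant bit.
    """
    if not 0 <= i < 2 ** n:
        raise ValueError("row index out of range")
    row = [1]
    for k in reversed(range(n)):
        factor = [1, -1] if (i >> k) & 1 else [1, 1]
        # Kronecker product of two row vectors.
        row = [a * b for a in row for b in factor]
    return row


print(hadamard_row(3, 2))
```

Only the $2^n$ entries of the requested row are stored, rather than the $2^n \times 2^n$ entries of the full matrix.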
Kun Ma, Cong Xu, Zeyuan Chen, Wei Zhang
A transparent decision-making process is essential for developing reliable
and trustworthy recommender systems. For sequential recommendation, it means
that the model can identify key items that account for its recommendation
results. However, achieving both interpretability and recommendation
performance simultaneously is challenging, especially for models that take the
entire sequence of items as input without screening. In this paper, we propose
an interpretable framework (named PTSR) that enables a pattern-wise transparent
decision-making process without extra features. It breaks the sequence of items
into multi-level patterns that serve as atomic units throughout the
recommendation process. The contribution of each pattern to the outcome is
quantified in the probability space. With a carefully designed score correction
mechanism, the pattern contribution can be implicitly learned in the absence of
ground-truth key patterns. The final recommended items are those that most key
patterns strongly endorse. Extensive experiments on five public datasets
demonstrate remarkable recommendation performance, while statistical analysis
and case studies validate the model interpretability.
Authors' comments: This paper has been accepted by IEEE TKDE
Linara Adilova, Maksym Andriushchenko, Michael Kamp, Asja Fischer, Martin Jaggi
Averaging neural network parameters is an intuitive method for fusing the
knowledge of two independent models. It is most prominently used in federated
learning. If models are averaged at the end of training, this can only lead to
a well-performing model if the loss surface of interest is very particular,
i.e., the loss in the midpoint between the two models needs to be sufficiently
low. This is impossible to guarantee for the non-convex losses of
state-of-the-art networks. For averaging models trained on vastly different
datasets, it was proposed to average only the parameters of particular layers
or combinations of layers, resulting in better performing models. To get a
better understanding of the effect of layer-wise averaging, we analyse the
performance of the models that result from averaging single layers, or groups
of layers. Based on our empirical and theoretical investigation, we introduce a
novel notion of the layer-wise linear connectivity, and show that deep networks
do not have layer-wise barriers between them.
Authors' comments: published at ICLR24
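Averaging only selected layers of two models can be sketched as follows. This is an illustration under assumptions, not the authors' code: models are represented as plain dictionaries mapping a layer name to a flat list of parameters, and the function name `average_layers`, the layer names and the interpolation `weight` are hypothetical.

```python
def average_layers(model_a, model_b, layers, weight=0.5):
    """Return a copy of model_a in which the listed layers are interpolated
    towards model_b; all other layers keep the parameters of model_a."""
    merged = {}
    for name, params_a in model_a.items():
        if name in layers:
            params_b = model_b[name]
            merged[name] = [(1 - weight) * a + weight * b
                            for a, b in zip(params_a, params_b)]
        else:
            merged[name] = list(params_a)
    return merged


# Two toy "models" with the same architecture (made-up parameter values).
model_a = {"layer1": [1.0, 2.0], "layer2": [0.0, 4.0]}
model_b = {"layer1": [3.0, 6.0], "layer2": [8.0, 0.0]}

# Average layer1 only; layer2 is taken unchanged from model_a.
merged = average_layers(model_a, model_b, layers={"layer1"})
print(merged)
```

Evaluating the loss of `merged` for `weight` between 0 and 1 and comparing it with the losses of the two original models is one way to probe whether a barrier appears when a single layer or a group of layers is interpolated.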