Takato Yasuno, Junichiro Fujii, Masazumi Amakata
Urban rivers provide a water environment that influences residential living.
River surface monitoring has become crucial for making decisions about where to
prioritize cleaning and when to automatically start the cleaning treatment. We
focus on the organic mud, or "scum", that accumulates on the river's surface,
contributes to the river's odor, and has external economic effects on the
landscape. Because scum is sparsely distributed and forms unstable, organic
shapes, automating the monitoring process has proved
difficult. We propose a patch-wise classification pipeline to detect scum
features on the river surface using mixture image augmentation to increase the
diversity between the scum floating on the river and the entangled background
on the river surface reflected by nearby structures like buildings, bridges,
poles, and barriers. Furthermore, we propose a scum cover index for rivers to
help monitor worsening grades online, collect floating scum, and decide on chemical
treatment policies. Finally, we demonstrate the application of our method on a
time series dataset with frames every ten minutes recording river scum events
over several days. We discuss the significance of our pipeline and its
experimental findings.
Authors' comments: 15 figures, 3 tables
Jinyi Hu, Xiaoyuan Yi, Wenhao Li, Maosong Sun, Xing Xie
The past several years have witnessed Variational Auto-Encoder's superiority
in various text generation tasks. However, due to the sequential nature of the
text, auto-regressive decoders tend to ignore latent variables and then reduce
to simple language models, known as the KL vanishing problem, which would
further deteriorate when VAE is combined with Transformer-based structures. To
ameliorate this problem, we propose DELLA, a novel variational Transformer
framework. DELLA learns a series of layer-wise latent variables with each
inferred from those of lower layers and tightly coupled with the hidden states
by low-rank tensor product. In this way, DELLA forces these posterior latent
variables to be fused deeply with the whole computation path and hence
incorporate more information. We theoretically demonstrate that our method can
be regarded as entangling latent variables to avoid posterior information
decrease through layers, enabling DELLA to get higher non-zero KL values even
without any annealing or thresholding tricks. Experiments on four unconditional
and three conditional generation tasks show that DELLA could better alleviate
KL vanishing and improve both quality and diversity compared to several strong
baselines.
Authors' comments: NAACL 2022
Lalita Kumari, Sukhdeep Singh, VVS Rathore, Anuj Sharma
Cursive handwritten text recognition is a challenging research problem in the domain of pattern recognition. The current state-of-the-art approaches include models based on convolutional recurrent neural networks and multi-dimensional long short-term memory recurrent neural networks. These methods are computationally expensive, and the models are complex at the design level. In recent studies, models combining convolutional neural networks with gated convolutional neural networks have required fewer parameters than models based on convolutional recurrent neural networks. To further reduce the total number of parameters to be trained, in this work we use depthwise convolutions in place of standard convolutions, in combination with a gated convolutional neural network and a bidirectional gated recurrent unit. Additionally, we include a lexicon-based word beam search decoder at the testing step, which also helps improve the overall accuracy of the model. We obtain a 3.84% character error rate and a 9.40% word error rate on the IAM dataset, and a 4.88% character error rate and a 14.56% word error rate on the George Washington dataset.
Chanyong Jung, Joonhyung Lee, Sunkyoung You, Jong Chul Ye
The acquisition conditions for low-dose and high-dose CT images are usually
different, so shifts in the CT numbers often occur. Accordingly,
unsupervised deep learning-based approaches, which learn the target image
distribution, often introduce CT number distortions and have detrimental
effects on diagnostic performance. To address this, here we propose a novel
unsupervised learning approach for low-dose CT reconstruction using patch-wise
deep metric learning. The key idea is to learn an embedding space by pulling together
the positive pairs of image patches that share the same anatomical structure, and
pushing apart the negative pairs that have the same noise level. Thereby, the
network is trained to suppress the noise level, while retaining the original
global CT number distributions even after the image translation. Experimental
results confirm that our deep metric learning plays a critical role in
producing high quality denoised images without CT number shift.
Authors' comments: MICCAI 2022
Dabao Wang, Hang Feng, Siwei Wu, Yajin Zhou, Lei Wu, Xingliang Yuan
The prosperity of decentralized finance motivates many investors to profit
via trading their crypto assets on decentralized applications (DApps for short)
of the Ethereum ecosystem. Apart from Ether (the native cryptocurrency of
Ethereum), many ERC20 (a widely used token standard on Ethereum) tokens obtain
vast market value in the ecosystem. Specifically, the approval mechanism is
used to delegate the privilege of spending users' tokens to DApps. By doing so,
the DApps can transfer these tokens to arbitrary receivers on behalf of the
users. To increase the usability, unlimited approval is commonly adopted by
DApps to reduce the required interaction between them and their users. However,
as shown in existing security incidents, this mechanism can be abused to steal
users' tokens.
In this paper, we present the first systematic study to quantify the risk of
unlimited approval of ERC20 tokens on Ethereum. Specifically, by evaluating
existing transactions up to 31st July 2021, we find that unlimited approval is
prevalent (60%, 15.2M/25.4M) in the ecosystem, and 22% of users face a high
risk of having their approved tokens stolen. After that, we investigate the
security issues that are involved in interacting with the UIs of 22
representative DApps and 9 famous wallets to prepare the approval transactions.
The result reveals the worrisome fact that all DApps request unlimited approval
from the front-end users and only 10% (3/31) of UIs provide explanatory
information for the approval mechanism. Meanwhile, only 16% (5/31) of UIs allow
users to modify their approval amounts. Finally, we take a further step to
categorize user behavior into five modes and formalize good practice,
i.e., on-demand approval and timely spending, for securely spending
approved tokens. However, the evaluation results suggest that only 0.2% of
users follow this good practice to mitigate the risk.
Authors' comments: 16 pages, 12 figures. Conference: The 25th International Symposium on
Research in Attacks, Intrusions and Defenses (RAID 2022), October 26--28,
2022, Limassol, Cyprus
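The approval mechanism described in the abstract above can be illustrated with a toy ledger. This is a minimal sketch, not the study's tooling: `ToyToken`, `is_unlimited`, and the account names are hypothetical, and treating an allowance of 2^256 - 1 as "unlimited" is an assumption about how such approvals are commonly encoded, not the paper's stated criterion.

```python
MAX_UINT256 = 2**256 - 1  # assumption: the usual encoding of an "unlimited" approval


class ToyToken:
    """In-memory toy model of ERC20 balances and allowances (not a real contract)."""

    def __init__(self):
        self.balances = {}
        self.allowances = {}  # (owner, spender) -> remaining allowance

    def mint(self, owner, amount):
        self.balances[owner] = self.balances.get(owner, 0) + amount

    def approve(self, owner, spender, amount):
        # Overwrites any previous allowance for this (owner, spender) pair.
        self.allowances[(owner, spender)] = amount

    def transfer_from(self, spender, owner, receiver, amount):
        allowed = self.allowances.get((owner, spender), 0)
        if amount > allowed or amount > self.balances.get(owner, 0):
            raise ValueError("insufficient allowance or balance")
        self.allowances[(owner, spender)] = allowed - amount
        self.balances[owner] -= amount
        self.balances[receiver] = self.balances.get(receiver, 0) + amount


def is_unlimited(amount):
    return amount == MAX_UINT256


# Unlimited approval: the spender can later move tokens the user did not hold yet.
risky = ToyToken()
risky.mint("alice", 100)
risky.approve("alice", "dapp", MAX_UINT256)
risky.transfer_from("dapp", "alice", "pool", 40)       # the intended spend
risky.mint("alice", 500)                                # alice receives more tokens
risky.transfer_from("dapp", "alice", "attacker", 560)  # a compromised DApp takes all

# On-demand approval: approve exactly what is about to be spent, then spend it.
safe = ToyToken()
safe.mint("alice", 100)
safe.approve("alice", "dapp", 40)
safe.transfer_from("dapp", "alice", "pool", 40)
```

In the on-demand case the allowance is back to zero after the spend, so a later call to `transfer_from` by the same spender is rejected.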
Menglong Zhang, Tao Feng
For a positive integer $d\geq 2$, a family $\mathcal F\subseteq \binom{[n]}{k}$ is said to be d-wise intersecting if $|F_1\cap F_2\cap \dots\cap F_d|\geq 1$ for all $F_1, F_2, \dots ,F_d\in \mathcal F$. A d-wise intersecting family $\mathcal F\subseteq \binom{[n]}{k}$ is called maximal if $\mathcal F\cup\{A\}$ is not d-wise intersecting for any $A\in\binom{[n]}{k}\setminus\mathcal F$. We provide a refinement of O'Neill and Verstra\"{e}te's Theorem about the structure of the largest and the second largest maximal non-trivial d-wise intersecting k-uniform families. We also determine the structure of the third largest and the fourth largest maximal non-trivial d-wise intersecting k-uniform families for any $k>d+1\geq 4$, and the fifth largest and the sixth largest maximal non-trivial 3-wise intersecting k-uniform families for any $k\geq 5$, in the asymptotic sense. Our proofs are applications of the $\Delta$-system method.
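The definitions in the abstract above (d-wise intersecting, non-trivial, maximal) can be checked by brute force on small cases. The sketch below only illustrates the definitions; the function names are hypothetical, and the small examples do not satisfy the parameter ranges of the paper's theorems.

```python
from itertools import combinations


def is_d_wise_intersecting(family, d):
    """True if every choice of d members of `family` shares at least one element."""
    sets = [frozenset(s) for s in family]
    if not sets:
        return True
    r = min(d, len(sets))  # with fewer than d members, intersect all of them
    return all(frozenset.intersection(*group) for group in combinations(sets, r))


def is_trivial(family):
    """True if all members share a common element (the family is a 'star')."""
    sets = [frozenset(s) for s in family]
    return bool(frozenset.intersection(*sets))


def is_maximal(family, n, k, d):
    """True if no further k-subset of {1..n} can be added while staying d-wise intersecting."""
    fam = {frozenset(s) for s in family}
    candidates = (frozenset(c) for c in combinations(range(1, n + 1), k))
    return all(
        not is_d_wise_intersecting(fam | {a}, d) for a in candidates if a not in fam
    )


# All 3-subsets of {1,2,3,4}: 3-wise intersecting and non-trivial, but not 4-wise.
all_triples = [set(c) for c in combinations(range(1, 5), 3)]

# The star of all 3-subsets of {1..5} containing element 1: trivial and maximal.
star = [set(c) for c in combinations(range(1, 6), 3) if 1 in c]
```

Here `is_d_wise_intersecting(all_triples, 3)` holds because three sets that each omit one element of a 4-element ground set still share the remaining element.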
N. Lodieu, M. R. Zapatero Osorio, E. L. Martin, R. Rebolo Lopez, B. Gauza
Our goal is to characterise the physical properties of the metal-poor brown
dwarf population. In particular, we focus on the recently discovered peculiar
dwarf WISE J1810055$-$1010023.
We collected optical iz and near-infrared J-band imaging on multiple
occasions over 1.5 years to derive accurate trigonometric parallax and proper
motion of the metal-depleted ultra-cool dwarf candidate WISE1810. We also
acquired low-resolution optical spectroscopy (0.6$-$1.0 $\mu$m) and new
infrared (0.9$-$1.3 $\mu$m) spectra of WISE1810 that were combined with our
photometry, other existing data from the literature and our trigonometric
distance to determine the object's luminosity from the integration of the
observed spectral energy distribution covering from 0.6 through 16$\mu$m. We
compared the full optical and infrared spectrum with state-of-the-art
atmosphere models to further constrain its effective temperature, surface
gravity and metallicity.
WISE1810 is detected in the $iz$ bands with AB magnitudes of
$i$=23.871$\pm$0.104 and $z$=20.147$\pm$0.083 mag in the PanSTARRS system. It
does not show any obvious photometric variability beyond 0.1$-$0.2 mag in any
of the $z$- and $J$-band filters. The very red $z-J \approx 2.9$ mag colour is
compatible with an ultra-cool dwarf nature. Fitting for parallax and proper
motion, we measure a trigonometric parallax of 112.5$^{+8.1}_{-8.0}$ mas for
WISE1810, placing the object at only 8.9$^{+0.7}_{-0.6}$ pc, about three times
closer than previously thought. We employed Monte Carlo methods to estimate the
error on the parallax and proper motion. The object's luminosity was determined
at log$L/L_\odot$=$-$5.78$\pm$0.11 dex. From the comparison to atmospheric
models, we infer a likely metallicity of [Fe/H] $\approx -1.5$ and an effective
temperature cooler than 1000K.
Abridged
Authors' comments: 12 pages, 15 figures, 6 tables, accepted for publication in A&A
Silpa Babu, Sajan Goud Lingala, Namrata Vaswani
This work develops a fast, memory-efficient, and general algorithm for
accelerated/undersampled dynamic MRI by assuming an approximate LR model on the
matrix formed by the vectorized images of the sequence. By general, we mean
that our algorithm can be used for multiple accelerated dynamic MRI
applications and multiple sampling rates (acceleration rates) and patterns with
a single choice of parameters (no parameter tuning). We show that our proposed
algorithms, alternating Gradient Descent (GD) and minimization for MRI
(altGDmin-MRI and altGDmin-MRI2), outperform many existing approaches while
also being faster than all of them, on average. This claim is based on
comparisons on 8 different retrospectively undersampled single- or multi-coil
dynamic MRI applications, undersampled using either 1D Cartesian or 2D
pseudo-radial undersampling at multiple sampling rates. All comparisons used
the same set of algorithm parameters. Our second contribution is a mini-batch
and a fully online extension that can process new measurements and return
reconstructions either as soon as measurements of a new image frame arrive, or
after a short delay.
Authors' comments: A duplicate submission exists on arXiv (arXiv:2212.09664)
Byeonggeun Kim, Seunghan Yang, Jangho Kim, Hyunsin Park, Juntae Lee, Simyung Chang
While using two-dimensional convolutional neural networks (2D-CNNs) in image
processing, it is possible to manipulate domain information using channel
statistics, and instance normalization has been a promising way to get
domain-invariant features. However, our analysis shows that, unlike in image
processing, domain-relevant information in an audio feature is dominated by frequency
statistics rather than channel statistics. Motivated by our analysis, we
introduce Relaxed Instance Frequency-wise Normalization (RFN): a plug-and-play,
explicit normalization module along the frequency axis which can eliminate
instance-specific domain discrepancy in an audio feature while relaxing
undesirable loss of useful discriminative information. Empirically, simply
adding RFN to networks shows clear margins compared to previous domain
generalization approaches on acoustic scene classification and yields improved
robustness for multiple audio devices. Notably, the proposed RFN won the
DCASE2021 challenge TASK1A, low-complexity acoustic scene classification with
multiple devices, with a clear margin. This work extends our
technical report.
Authors' comments: Proceedings of INTERSPEECH 2022
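A frequency-wise normalization with relaxation, as described in the abstract above, might look like the following sketch. It assumes that the relaxation is a convex combination of a layer-style normalization (LN, statistics over the whole feature map) and an instance frequency-wise normalization (IFN, statistics per frequency bin); the abstract does not state the formula, so the mixing weight `lam`, the function name `rfn`, and the single-channel input layout `x[frequency][time]` are assumptions.

```python
from statistics import fmean, pstdev

EPS = 1e-5  # small constant to avoid division by zero


def rfn(x, lam=0.5):
    """Relaxed frequency-wise normalization of one instance, x[frequency][time].

    Returns lam * LN(x) + (1 - lam) * IFN(x), where LN uses statistics of the
    whole map and IFN uses statistics of each frequency bin separately.
    """
    all_values = [v for row in x for v in row]
    mu_all, sd_all = fmean(all_values), pstdev(all_values)
    out = []
    for row in x:
        mu_f, sd_f = fmean(row), pstdev(row)
        ln_row = [(v - mu_all) / (sd_all + EPS) for v in row]
        ifn_row = [(v - mu_f) / (sd_f + EPS) for v in row]
        out.append([lam * a + (1 - lam) * b for a, b in zip(ln_row, ifn_row)])
    return out


# Two frequency bins whose scales differ by a factor of ten.
feature = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
pure_ifn = rfn(feature, lam=0.0)   # per-bin statistics are removed entirely
pure_ln = rfn(feature, lam=1.0)    # only the global statistics are removed
relaxed = rfn(feature, lam=0.5)    # keeps part of the per-bin differences
```

With `lam=0.0` the two bins become nearly identical, which removes the frequency-wise statistics but also discards any discriminative information they carried; intermediate values of `lam` keep some of it.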
József Balogh, Ce Chen, Haoran Luo
A family $\mathcal{F}$ on ground set $\{1,2,\ldots, n\}$ is maximal $k$-wise
intersecting if every collection of $k$ sets in $\mathcal{F}$ has non-empty
intersection, and no other set can be added to $\mathcal{F}$ while maintaining
this property. Erd\H{o}s and Kleitman asked for the minimum size of a maximal
$k$-wise intersecting family. Complementing earlier work of Hendrey, Lund,
Tompkins and Tran, who answered this question for $k=3$ and large even $n$, we
answer it for $k=3$ and large odd $n$. We show that the unique minimum family
is obtained by partitioning the ground set into two sets $A$ and $B$ with
almost equal sizes and taking the family consisting of all the proper supersets
of $A$ and of $B$. A key ingredient of our proof is the stability result by
Ellis and Sudakov about the so-called $2$-generator set systems.
Authors' comments: 11 pages
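The construction in the abstract above (all proper supersets of $A$ and of $B$ for a partition of the ground set) can be checked by brute force for a small odd $n$. This is only an illustration of the construction: the theorem concerns large $n$, the sketch does not verify minimality, and the helper names are hypothetical.

```python
from itertools import combinations


def powerset(ground):
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def proper_supersets_family(a, b):
    """All subsets of A ∪ B that properly contain A or properly contain B."""
    a, b = frozenset(a), frozenset(b)
    return {s for s in powerset(a | b) if a < s or b < s}


def is_k_wise_intersecting(family, k):
    sets = list(family)
    if not sets:
        return True
    r = min(k, len(sets))
    return all(frozenset.intersection(*group) for group in combinations(sets, r))


def is_maximal(family, ground, k):
    """True if adding any subset of the ground set outside the family breaks the property."""
    return all(
        not is_k_wise_intersecting(family | {s}, k)
        for s in powerset(ground)
        if s not in family
    )


# n = 5, split into parts of sizes 2 and 3.
A, B = {1, 2}, {3, 4, 5}
family = proper_supersets_family(A, B)
size = len(family)  # 2**len(A) + 2**len(B) - 3, since A ∪ B is counted once
```

For this split the family has 9 members: 7 proper supersets of $A$, 3 proper supersets of $B$, and the full ground set shared between the two groups.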
Matteo Risso, Alessio Burrello, Luca Benini, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari
Quantization is widely employed in both cloud and edge systems to reduce the memory occupation, latency, and energy consumption of deep neural networks. In particular, mixed-precision quantization, i.e., the use of different bit-widths for different portions of the network, has been shown to provide excellent efficiency gains with limited accuracy drops, especially with optimized bit-width assignments determined by automated Neural Architecture Search (NAS) tools. State-of-the-art mixed-precision quantization works layer-wise, i.e., it uses different bit-widths for the weight and activation tensors of each network layer. In this work, we widen the search space, proposing a novel NAS that selects the bit-width of each weight tensor channel independently. This gives the tool the additional flexibility of assigning a higher precision only to the weights associated with the most informative features. Testing on the MLPerf Tiny benchmark suite, we obtain a rich collection of Pareto-optimal models in the accuracy vs model size and accuracy vs energy spaces. When deployed on the MPIC RISC-V edge processor, our networks reduce the memory and energy for inference by up to 63% and 27% respectively compared to a layer-wise approach, for the same accuracy.
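The difference between layer-wise and channel-wise bit-width assignment in the abstract above can be sketched with a plain symmetric uniform quantizer. The search procedure itself is not shown; the bit-widths below are picked by hand, and the quantizer, function names, and weight values are illustrative assumptions rather than the paper's method.

```python
def quantize_channel(weights, bits):
    """Symmetric uniform quantization of one channel to `bits` bits (bits >= 2)."""
    levels = 2 ** (bits - 1) - 1          # largest positive integer code
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


def model_size_bits(channels, bit_widths):
    """Total weight storage when channel i is stored with bit_widths[i] bits."""
    return sum(len(ch) * b for ch, b in zip(channels, bit_widths))


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# One layer with two output channels (made-up weights).
channels = [
    [0.9, -0.35, 0.1, 0.62],       # treated here as the more informative channel
    [0.01, -0.02, 0.015, 0.005],   # small weights, given a low precision
]

layer_wise_bits = [8, 8]     # one bit-width shared by the whole layer
channel_wise_bits = [8, 2]   # an independent bit-width per channel

layer_wise_size = model_size_bits(channels, layer_wise_bits)
channel_wise_size = model_size_bits(channels, channel_wise_bits)

error_8bit = mse(channels[0], quantize_channel(channels[0], 8))
error_2bit = mse(channels[0], quantize_channel(channels[0], 2))
```

Keeping 8 bits only for the first channel reduces storage from 64 to 40 bits in this toy layer, while the error comparison shows why that channel would be a poor candidate for 2 bits.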
Laura Baldelli, Simone Ciani, Igor I. Skrypnik, Vincenzo Vespri
In this brief note we discuss local H\"older continuity for solutions to
anisotropic elliptic equations of the type $ \sum_{i=1}^s \partial_{ii} u+
\sum_{i=s+1}^N \partial_i \bigg(A_i(x,u,\nabla u) \bigg) =0,$ for $x \in \Omega
\subset \subset \mathbb{R}^N$ and $1\leq s \leq N-1$, where each operator $A_i$
behaves directionally as the singular $p$-Laplacian, $1< p < 2$ and the
supercritical condition $p+(N-s)(p-2)>0$ holds true. We show that the Harnack
inequality can be proved without the continuity of solutions and that in turn
this implies H\"older continuity of solutions.
Authors' comments: 17 pages
Roberto J. Assef, Franz E. Bauer, Andrew W. Blain, Murray Brightman, Tanio Díaz-Santos, Peter R. M. Eisenhardt, Hyunsung D. Jun, Daniel Stern et al.
We report on VLT/FORS2 imaging polarimetry observations in the $R_{\rm
special}$ band of WISE J011601.41-050504.0 (W0116-0505), a heavily obscured
hyper-luminous quasar at $z=3.173$ classified as a Hot, Dust-Obscured Galaxy
(Hot DOG) based on its mid-IR colors. Recently, Assef et al. (2020) identified
W0116-0505 as having excess rest-frame optical/UV emission, and concluded this
excess emission is most likely scattered light from the heavily obscured AGN.
We find that the broad-band rest-frame UV flux is strongly linearly polarized
(10.8$\pm$1.9\%, with a polarization angle of 74$\pm$9~deg), confirming this
conclusion. We analyze these observations in the context of a simple model
based on scattering either by free electrons or by optically thin dust,
assuming a classical dust torus with polar openings. Both can replicate the
degree of polarization and the luminosity of the scattered component for a
range of geometries and column densities, but we argue that optically thin dust
in the ISM is the more likely scenario. We also explore the possibility that
the scattering medium corresponds to an outflow recently identified for
W0116-0505. This is a feasible option if the outflow component is bi-conical
with most of the scattering occurring at the base of the receding outflow. In
this scenario the quasar would still be obscured even if viewed face on, but
might appear as a reddened type 1 quasar once the outflow has expanded. We
discuss a possible connection between blue-excess Hot DOGs, extremely red
quasars (ERQs), reddened type 1 quasars, and unreddened quasars that depends on
a combination of evolution and viewing geometry.
Authors' comments: 19 pages, 10 figures, 2 tables. Resubmitted to ApJ after first round
of referee comments
Imke Botha, Matthew P. Adams, Dang Khuong Tran, Frederick R. Bennett, Christopher Drovandi
The ensemble Kalman filter (EnKF) is a Monte Carlo approximation of the Kalman filter for high dimensional linear Gaussian state space models. EnKF methods have also been developed for parameter inference of static Bayesian models with a Gaussian likelihood, in a way that is analogous to likelihood tempering sequential Monte Carlo (SMC). These methods are commonly referred to as ensemble Kalman inversion (EKI). Unlike SMC, the inference from EKI is only asymptotically unbiased if the likelihood is linear Gaussian and the priors are Gaussian. However, EKI is significantly faster to run. Currently, a large limitation of EKI methods is that the covariance of the measurement error is assumed to be fully known. We develop a new method, which we call component-wise iterative ensemble Kalman inversion (CW-IEKI), that allows elements of the covariance matrix to be inferred alongside the model parameters at negligible extra cost. This novel method is compared to SMC on three different application examples: a model of nitrogen mineralisation in soil that is based on the Agricultural Production Systems Simulator (APSIM), a model predicting seagrass decline due to stress from water temperature and light, and a model predicting coral calcification rates. On all of these examples, we find that CW-IEKI has relatively similar predictive performance to SMC, albeit with greater uncertainty, and it has a significantly faster run time.
Fatih Furkan Yilmaz, Reinhard Heckel
The risk of overparameterized models, in particular deep neural networks, is
often double-descent shaped as a function of the model size. Recently, it was
shown that the risk as a function of the early-stopping time can also be
double-descent shaped, and this behavior can be explained as a superposition
of bias-variance tradeoffs. In this paper, we show that the risk of explicit
L2-regularized models can exhibit double descent behavior as a function of the
regularization strength, both in theory and practice. We find that for linear
regression, a double descent shaped risk is caused by a superposition of
bias-variance tradeoffs corresponding to different parts of the model and can
be mitigated by scaling the regularization strength of each part appropriately.
Motivated by this result, we study a two-layer neural network and show that
double descent can be eliminated by adjusting the regularization strengths for
the first and second layer. Lastly, we study a 5-layer CNN and ResNet-18
trained on CIFAR-10 with label noise, and CIFAR-100 without label noise, and
demonstrate that all exhibit double descent behavior as a function of the
regularization strength.
Authors' comments: To be published in the 2022 IEEE International Symposium on
Information Theory (ISIT) Proceedings
Qiwen Cui, Simon S. Du
This paper considers offline multi-agent reinforcement learning. We propose
the strategy-wise concentration principle which directly builds a confidence
interval for the joint strategy, in contrast to the point-wise concentration
principle that builds a confidence interval for each point in the joint action
space. For two-player zero-sum Markov games, by exploiting the convexity of the
strategy-wise bonus, we propose a computationally efficient algorithm whose
sample complexity enjoys a better dependency on the number of actions than the
prior methods based on the point-wise bonus. Furthermore, for offline
multi-agent general-sum Markov games, based on the strategy-wise bonus and a
novel surrogate function, we give the first algorithm whose sample complexity
scales only with $\sum_{i=1}^m A_i$, where $A_i$ is the action size of the $i$-th
player and $m$ is the number of players. In sharp contrast, the sample
complexity of methods based on the point-wise bonus would scale with the size
of the joint action space $\prod_{i=1}^m A_i$ due to the curse of multiagents.
Lastly, all of our algorithms can naturally take a pre-specified strategy class
$\Pi$ as input and output a strategy that is close to the best strategy in
$\Pi$. In this setting, the sample complexity only scales with $\log |\Pi|$
instead of $\sum_{i=1}^mA_i$.
Authors' comments: 34 pages; accepted by NeurIPS 2022
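The gap between the two scalings in the abstract above is easy to see numerically. The sketch below only compares the sum of the action sizes with the size of the joint action space; it ignores every other factor in the sample complexity, and the function names and player counts are illustrative.

```python
from math import prod


def strategy_wise_scale(action_sizes):
    """Sum of the per-player action sizes A_1 + ... + A_m."""
    return sum(action_sizes)


def point_wise_scale(action_sizes):
    """Size of the joint action space A_1 * ... * A_m."""
    return prod(action_sizes)


# Five players with ten actions each.
sizes = [10] * 5
additive = strategy_wise_scale(sizes)
multiplicative = point_wise_scale(sizes)
```

Adding a sixth player with ten actions raises the sum by 10 but multiplies the joint action space by 10.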
Dongjie Wang, Yanjie Fu, Kunpeng Liu, Xiaolin Li, Yan Solihin
Representation (feature) space is an environment where data points are
vectorized, distances are computed, patterns are characterized, and geometric
structures are embedded. Extracting a good representation space is critical to
address the curse of dimensionality, improve model generalization, overcome
data sparsity, and increase the availability of classic models. Existing
literature, such as feature engineering and representation learning, is limited
in achieving full automation (e.g., heavy reliance on intensive labor and
empirical experience), explainable explicitness (e.g., traceable
reconstruction process and explainable new features), and flexible optimality
(e.g., optimal feature space reconstruction is not embedded into downstream
tasks). Can we simultaneously address the automation, explicitness, and optimality
challenges in representation space reconstruction for a machine learning task?
To answer this question, we propose a group-wise reinforcement generation
perspective. We reformulate representation space reconstruction into an
interactive process of nested feature generation and selection, where feature
generation is to generate new meaningful and explicit features, and feature
selection is to eliminate redundant features to control feature sizes. We
develop a cascading reinforcement learning method that leverages three
cascading Markov Decision Processes to learn optimal generation policies to
automate the selection of features and operations and the feature crossing. We
design a group-wise generation strategy to cross a feature group, an operation,
and another feature group to generate new features and find the strategy that
can enhance exploration efficiency and augment reward signals of cascading
agents. Finally, we present extensive experiments to demonstrate the
effectiveness, efficiency, traceability, and explicitness of our system.
Authors' comments: KDD 2022
Biprateep Dey, David Zhao, Brett H. Andrews, Jeffrey A. Newman, Rafael Izbicki, Ann B. Lee
Key science questions, such as galaxy distance estimation and weather forecasting, often
require knowing the full predictive distribution of a target variable $y$ given
complex inputs $\mathbf{x}$. Despite recent advances in machine learning and
physics-based models, it remains challenging to assess whether an initial model
is calibrated for all $\mathbf{x}$, and when needed, to reshape the densities
of $y$ toward "instance-wise" calibration. This paper introduces the LADaR
(Local Amortized Diagnostics and Reshaping of Conditional Densities) framework
and proposes a new computationally efficient algorithm ($\texttt{Cal-PIT}$)
that produces interpretable local diagnostics and provides a mechanism for
adjusting conditional density estimates (CDEs). $\texttt{Cal-PIT}$ learns a
single interpretable local probability--probability (optimal transport) map
from calibration data that identifies where and how the initial model is
miscalibrated across feature space, which can be used to morph CDEs such that
they are well-calibrated. We illustrate the LADaR framework on synthetic
examples, including probabilistic forecasting from image sequences, akin to
predicting storm wind speed from satellite imagery. Our main science
application involves estimating the probability density functions of galaxy
distances given photometric data, where $\texttt{Cal-PIT}$ achieves better
instance-wise calibration than all 11 other literature methods in a benchmark
data challenge, demonstrating its utility for next-generation cosmological
analyses.
Authors' comments: Code available as a Python package
https://github.com/lee-group-cmu/Cal-PIT
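The calibration idea behind the abstract above builds on the probability integral transform (PIT): if a predictive distribution is calibrated, the predictive CDF evaluated at the observed value is uniformly distributed. The sketch below shows only this classic global check on synthetic Gaussian data. It is not the Cal-PIT algorithm, which learns a local map across feature space, and the data model, sample size, and function names are assumptions.

```python
import random
from statistics import NormalDist, fmean


def pit_values(ys, means, sigmas):
    """Predictive CDF evaluated at each observation, for Gaussian predictions."""
    return [NormalDist(m, s).cdf(y) for y, m, s in zip(ys, means, sigmas)]


def tail_fraction(pits, lo=0.1, hi=0.9):
    """Share of PIT values in the outer 20% of [0, 1]; about 0.2 when calibrated."""
    return sum(p < lo or p > hi for p in pits) / len(pits)


rng = random.Random(0)
n = 2000
xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
ys = [x + rng.gauss(0.0, 1.0) for x in xs]  # synthetic truth: y | x ~ N(x, 1)

# A calibrated model and an overconfident one (predicted spread too narrow).
calibrated = pit_values(ys, xs, [1.0] * n)
overconfident = pit_values(ys, xs, [0.5] * n)

calibrated_tails = tail_fraction(calibrated)
overconfident_tails = tail_fraction(overconfident)
```

The overconfident model piles PIT values up near 0 and 1, so its tail fraction is well above 0.2. A global check like this can still miss miscalibration that varies with $\mathbf{x}$, which is the gap that instance-wise diagnostics are meant to address.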
Xu Chen, Qiu Qiu, Changshan Li, Kunqing Xie
In recent years, the emergence and development of third-party platforms have
greatly facilitated the growth of the Online to Offline (O2O) business.
However, the large amount of transaction data raises new challenges for
retailers, especially anomaly detection in operating conditions. Thus,
platforms have begun to develop intelligent business assistants with embedded
anomaly detection methods to reduce the management burden on retailers.
Traditional time-series anomaly detection methods capture underlying patterns
from the perspectives of time and attributes, ignoring the difference between
retailers in this scenario. Besides, similar transaction patterns extracted by
the platforms can also provide guidance to individual retailers and enrich
their available information without privacy issues. In this paper, we pose an
entity-wise multivariate time-series anomaly detection problem that considers
the time-series of each unique entity. To address this challenge, we propose
GraphAD, a novel multivariate time-series anomaly detection model based on the
graph neural network. GraphAD decomposes the Key Performance Indicator (KPI)
into stable and volatile components and extracts their patterns in terms of
attributes, entities and temporal perspectives via graph neural networks. We
also construct a real-world entity-wise multivariate time-series dataset from
the business data of Ele.me. The experimental results on this dataset show that
GraphAD significantly outperforms existing anomaly detection methods.
Authors' comments: SIGIR'22 Short Paper
Akiyoshi Sannai, Yasunari Hikima, Ken Kobayashi, Akinori Tanaka, Naoki Hamada
In this paper, we propose a strategy to construct a multi-objective optimization algorithm from a single-objective optimization algorithm by using the B\'ezier simplex model. Also, we extend the stability of optimization algorithms in the sense of Probably Approximately Correct (PAC) learning and define the PAC stability. We prove that it leads to an upper bound on the generalization error with high probability. Furthermore, we show that multi-objective optimization algorithms derived from a gradient descent-based single-objective optimization algorithm are PAC stable. We conducted numerical experiments and demonstrated that our method achieved lower generalization errors than the existing multi-objective optimization algorithm.