Yule Wang, Xin Xin, Yue Ding, Yunzhe Li, Dong Wang
Recommender systems based on historical user-item interactions are of vital importance for web-based services. However, the observed data used to train the recommender model suffers from severe bias issues. In practice, the item frequency distribution of the dataset is a highly skewed power-law distribution: interactions with a small fraction of head items account for almost all of the training data. The normal training paradigm on such biased data tends to repeatedly generate recommendations from the head items, which further exacerbates the biases and hinders the exploration of potentially interesting items from the niche set. In this work, we explore recommendation debiasing from an item cluster-wise multi-objective optimization perspective. Aiming to balance the learning on item clusters that differ in popularity during the training process, we propose a model-agnostic framework named Item Cluster-Wise Pareto-Efficient Recommendation (ICPE). In detail, we define our item cluster-wise optimization target as follows: the recommender model should balance all item clusters that differ in popularity, so we treat the model learning on each item cluster as a unique optimization objective. To achieve this goal, we first explore items' popularity levels from a novel causal reasoning perspective. Then, we devise popularity discrepancy-based bisecting clustering to separate the item clusters. Next, we adaptively find the overall harmonious gradient direction for the cluster-wise optimization objectives with a Pareto-efficient solver. Finally, in the prediction stage, we perform counterfactual inference to further eliminate the impact of global propensity. Extensive experimental results verify the superiority of ICPE in overall recommendation performance and bias elimination.
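To illustrate what a "harmonious gradient direction" across objectives can look like in the ICPE abstract above, here is a minimal sketch for the two-objective case. It uses the well-known min-norm convex combination of two gradients, which is a swapped-in textbook technique and not necessarily the solver used in ICPE. The function name `min_norm_direction` and the toy gradients are hypothetical.

```python
import numpy as np

def min_norm_direction(g1, g2):
    """Return the minimum-norm convex combination of two gradients.

    Assumption: two objectives only (e.g. a head-item cluster and a
    niche-item cluster). This is an illustrative stand-in, not the
    solver described in the paper.
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:
        # Identical gradients: any convex weight gives the same direction.
        return g1.copy(), 0.5
    # Minimise ||a*g1 + (1-a)*g2||^2 over a in [0, 1] (closed form, then clip).
    alpha = float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))
    return alpha * g1 + (1.0 - alpha) * g2, alpha

# Toy gradients of the two cluster-wise losses (hypothetical values).
direction, weight = min_norm_direction([1.0, 0.0], [0.0, 1.0])
```

When the clipped weight lies strictly between 0 and 1, the resulting direction has a non-negative inner product with both gradients, so a small step against it does not increase either loss to first order.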
Negin Ghamsarian, Mario Taschwer, Doris Putzgruber-Adamitsch, Stephanie Sarny, Yosuf El-Shabrawi, Klaus Schoeffmann
Semantic segmentation in surgical videos is a prerequisite for a broad range
of applications towards improving surgical outcomes and surgical video
analysis. However, semantic segmentation in surgical videos involves many
challenges. In particular, in cataract surgery, various features of the
relevant objects such as blunt edges, color and context variation, reflection,
transparency, and motion blur pose a challenge for semantic segmentation. In
this paper, we propose a novel convolutional module, termed the ReCal
module, which can calibrate the feature maps by employing region
intra-and-inter-dependencies and channel-region cross-dependencies. This
calibration strategy can effectively enhance semantic representation by
correlating different representations of the same semantic label, considering a
multi-angle local view centering around each pixel. Thus the proposed module
can deal with distant visual characteristics of unique objects as well as
cross-similarities in the visual characteristics of different objects.
Moreover, we propose a novel network architecture based on the proposed module,
termed ReCal-Net. Experimental results confirm the superiority of ReCal-Net
compared to rival state-of-the-art approaches for all relevant objects in
cataract surgery. Moreover, ablation studies reveal the effectiveness of the
ReCal module in boosting semantic segmentation accuracy.
Authors' comments: 12 pages, 5 figures, accepted at the 28th International Conference on
Neural Information Processing (ICONIP), 2021
Lianbo Ma, Nan Li, Guo Yu, Xiaoyu Geng, Min Huang, Xingwei Wang
In the deployment of deep neural models, effectively and automatically finding feasible deep models under diverse design objectives is fundamental. Most existing neural architecture search (NAS) methods utilize surrogates to predict the detailed performance (e.g., accuracy and model size) of a candidate architecture during the search, which is complicated and inefficient. In contrast, we aim to learn an efficient Pareto classifier to simplify the search process of NAS by transforming the complex multi-objective NAS task into a simple Pareto-dominance classification task. To this end, we propose a classification-wise Pareto evolution approach for one-shot NAS, where an online classifier is trained to predict the dominance relationship between the candidate and constructed reference architectures, instead of using surrogates to fit the objective functions. The main contribution of this study is to change supernet adaptation into a Pareto classifier. In addition, we design two adaptive schemes: one selects the reference set of architectures for constructing the classification boundary, and the other regulates the ratio of positive to negative samples. We compare the proposed evolution approach with state-of-the-art approaches on widely used benchmark datasets, and experimental results indicate that the proposed approach outperforms other approaches and finds a number of neural architectures with model sizes ranging from 2M to 6M under diverse objectives and constraints.
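The Pareto-dominance classification target mentioned in the NAS abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: all objectives are minimised, the objective vectors are (error, model size in millions of parameters), and a pair is labelled 1 when the candidate dominates the reference. The function names and sample values are hypothetical and are not taken from the paper.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def dominance_labels(candidates, references):
    """Pairwise labels: 1 if the candidate dominates the reference, else 0.

    Such labels could serve as training targets for a dominance classifier;
    the labelling rule itself is an assumption made for this sketch.
    """
    return [[1 if dominates(c, r) else 0 for r in references] for c in candidates]

# Hypothetical (error, size in M parameters) measurements.
candidates = [(0.05, 3.0), (0.08, 5.0)]
references = [(0.06, 4.0)]
labels = dominance_labels(candidates, references)
```

The point of the reformulation is that the classifier only needs to learn this binary relation, rather than regress each objective value accurately.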
Yang Zhang, Yao Wang, Zhi Han, Xi'ai Chen, Yandong Tang
In recent years, there have been an increasing number of applications of tensor completion based on the tensor train (TT) format because of its efficiency and effectiveness in dealing with higher-order tensor data. However, existing tensor completion methods using TT decomposition have two obvious drawbacks. One is that they only consider mode weights according to the degree of mode balance, even though some elements are recovered better in an unbalanced mode. The other is that serious blocking artifacts appear when the missing element rate is relatively large. To remedy these two issues, we propose a novel tensor completion approach based on an element-wise weighting technique. Accordingly, a novel formulation for tensor completion and an effective optimization algorithm, called tensor completion by parallel weighted matrix factorization via tensor train (TWMac-TT), are proposed. In addition, we specifically consider the recovery quality of edge elements from adjacent blocks. Different from traditional reshaping and ket augmentation, we utilize a new tensor augmentation technique called overlapping ket augmentation, which can further avoid blocking artifacts. We then conduct extensive performance evaluations on synthetic data and several real image data sets. Our experimental results demonstrate that the proposed algorithm TWMac-TT outperforms several other competing tensor completion methods.
Ryoya Katafuchi, Terumasa Tokunaga
The utilization of prior knowledge about anomalies is an essential issue for anomaly detection. Recently, the visual attention mechanism has become a promising way to improve the performance of CNNs for some computer vision tasks. In this paper, we propose a novel model called Layer-wise External Attention Network (LEA-Net) for efficient image anomaly detection. The core idea relies on the integration of unsupervised and supervised anomaly detectors via the visual attention mechanism. Our strategy is as follows: (i) prior knowledge about anomalies is represented as the anomaly map generated by unsupervised learning of normal instances, (ii) the anomaly map is translated to an attention map by the external network, and (iii) the attention map is then incorporated into intermediate layers of the anomaly detection network. Notably, this layer-wise external attention can be applied to any CNN model in an end-to-end training manner. For a pilot study, we validate LEA-Net on color anomaly detection tasks. Through extensive experiments on the PlantVillage, MVTec AD, and Cloud datasets, we demonstrate that the proposed layer-wise visual attention mechanism consistently boosts the anomaly detection performance of an existing CNN model, even on imbalanced datasets. Moreover, we show that our attention mechanism successfully boosts the performance of several CNN models.
Haonan Wang, Peng Cao, Jiaqi Wang, Osmar R. Zaiane
Most recent semantic segmentation methods adopt a U-Net framework with an
encoder-decoder architecture. It is still challenging for U-Net with a simple
skip connection scheme to model the global multi-scale context: 1) Not every
skip connection setting is effective, owing to the incompatible feature
sets of the encoder and decoder stages, and some skip connections even negatively
influence the segmentation performance; 2) The original U-Net is worse than the
one without any skip connection on some datasets. Based on our findings, we
propose a new segmentation framework, named UCTransNet (with a proposed CTrans
module in U-Net), from the channel perspective with attention mechanism.
Specifically, the CTrans module is an alternative to the U-Net skip connections,
which consists of a sub-module to conduct the multi-scale Channel Cross fusion
with Transformer (named CCT) and a sub-module Channel-wise Cross-Attention
(named CCA) to guide the fused multi-scale channel-wise information to
effectively connect to the decoder features for eliminating the ambiguity.
Hence, the proposed connection consisting of the CCT and CCA is able to replace
the original skip connection to solve the semantic gaps for an accurate
automatic medical image segmentation. The experimental results suggest that our
UCTransNet produces more precise segmentation performance and achieves
consistent improvements over the state-of-the-art for semantic segmentation
across different datasets and conventional architectures involving transformer
or U-shaped framework. Code: https://github.com/McGregorWwww/UCTransNet.
Authors' comments: Accepted by AAAI 2022. Code is available at
https://github.com/McGregorWwww/UCTransNet
Ramiz Aktar, Li Xue, Tong Liu
We examine the properties of spiral shocks from a steady, adiabatic,
non-axisymmetric accretion disk around a compact star in a binary. For the first
time, we incorporate all the possible influences of the binary by adopting the
Roche potential and Coriolis forces in the basic conservation equations. To
simplify the study, we assume the spiral shocks to be point-wise self-similar
and the flow to be in vertical hydrostatic equilibrium. We also
investigate the mass outflow due to the shock compression and apply it to an
accreting white dwarf in a binary. We find that our model can help to
overcome the ad hoc assumption of optically thick wind generally used in the
studies of the progenitor of supernovae Ia.
Authors' comments: 17 pages, 7 figures, 1 appendix. Accepted for publication in ApJ
Gushu Li, Anbang Wu, Yunong Shi, Ali Javadi-Abhari, Yufei Ding, Yuan Xie
The quantum simulation kernel is an important subroutine appearing as a very long gate sequence in many quantum programs. In this paper, we propose Paulihedral, a block-wise compiler framework that can deeply optimize this subroutine by exploiting high-level program structure and optimization opportunities. Paulihedral first employs a new Pauli intermediate representation that can maintain the high-level semantics and constraints in quantum simulation kernels. This naturally enables new large-scale optimizations that are hard to implement at the low gate-level. In particular, we propose two technology-independent instruction scheduling passes, and two technology-dependent code optimization passes which reconcile the circuit synthesis, gate cancellation, and qubit mapping stages of the compiler. Experimental results show that Paulihedral can outperform state-of-the-art compiler infrastructures in a wide-range of applications on both near-term superconducting quantum processors and future fault-tolerant quantum computers.
Janine Witte, Ronja Foraita, Vanessa Didelez
Causal discovery algorithms estimate causal graphs from observational data.
This can provide a valuable complement to analyses focussing on the causal
relation between individual treatment-outcome pairs. Constraint-based causal
discovery algorithms rely on conditional independence testing when building the
graph. Until recently, these algorithms have been unable to handle missing
values. In this paper, we investigate two alternative solutions: Test-wise
deletion and multiple imputation. We establish necessary and sufficient
conditions for the recoverability of causal structures under test-wise
deletion, and argue that multiple imputation is more challenging in the context
of causal discovery than for estimation. We conduct an extensive comparison by
simulating from benchmark causal graphs: As one might expect, we find that
test-wise deletion and multiple imputation both clearly outperform list-wise
deletion and single imputation. Crucially, our results further suggest that
multiple imputation is especially useful in settings with a small number of
either Gaussian or discrete variables, but when the dataset contains a mix of
both neither method is uniformly best. The methods we compare include random
forest imputation and a hybrid procedure combining test-wise deletion and
multiple imputation. An application to data from the IDEFICS cohort study on
diet- and lifestyle-related diseases in European children serves as an
illustrating example.
Authors' comments: 38 pages, 11 figures
Ke Wang, Jonathan I Tamir, Alfredo De Goyeneche, Uri Wollner, Rafi Brada, Stella Yu, Michael Lustig
Purpose: To improve reconstruction fidelity of fine structures and textures
in deep learning (DL) based reconstructions.
Methods: A novel patch-based Unsupervised Feature Loss (UFLoss) is proposed
and incorporated into the training of DL-based reconstruction frameworks in
order to preserve perceptual similarity and high-order statistics. The UFLoss
provides instance-level discrimination by mapping similar instances to similar
low-dimensional feature vectors and is trained without any human annotation. By
adding an additional loss function on the low-dimensional feature space during
training, the reconstruction frameworks from under-sampled or corrupted data
can reproduce more realistic images that are closer to the original with finer
textures, sharper edges, and improved overall image quality. The performance of
the proposed UFLoss is demonstrated on unrolled networks for accelerated 2D and
3D knee MRI reconstruction with retrospective under-sampling. Quantitative
metrics including NRMSE, SSIM, and our proposed UFLoss were used to evaluate
the performance of the proposed method and compare it with others.
Results: In-vivo experiments indicate that adding the UFLoss encourages
sharper edges and more faithful contrasts compared to traditional and
learning-based methods with pure l2 loss. More detailed textures can be seen in
both 2D and 3D knee MR images. Quantitative results indicate that
reconstruction with UFLoss can provide comparable NRMSE and a higher SSIM while
achieving a much lower UFLoss value.
Conclusion: We present UFLoss, a patch-based unsupervised learned feature
loss, which allows the training of DL-based reconstruction to obtain more
detailed texture, finer features, and sharper edges with higher overall image
quality under DL-based reconstruction frameworks.
Authors' comments: 35 pages, 13 figures
Jakob Filser, Karsten Reuter, Harald Oberhofer
The multipole-expansion (MPE) model is an implicit solvation model used to
efficiently incorporate solvent effects in quantum chemistry. Even within the
recent direct approach, the multipole basis used in MPE to express the
dielectric response still solves the electrostatic problem inefficiently or not
at all for solutes larger than $\approx 10$ non-hydrogen atoms. In existing MPE
parameterizations, the resulting systematic underestimation of the
electrostatic solute-solvent interaction is presently compensated for by a
systematic overestimation of non-electrostatic attractive interactions. Even
though the MPE model can thus reproduce experimental free energies of solvation
of small molecules remarkably well, the inherent error cancellation makes it
hard to assign physical meaning to the individual free energy terms in the
model, raising concerns about transferability. Here, we resolve this issue by
solving the electrostatic problem piece-wise in 3D regions centered around all
non-hydrogen nuclei of the solute, ensuring reliable convergence of the
multipole series. The resulting method, which we call MPE-$n$c, thus allows for
a much improved reproduction of the dielectric response of a medium to a
solute. Employing a reduced non-electrostatic model with a single free
parameter, in addition to the density isovalue defining the solvation cavity,
MPE-$n$c yields free energies of solvation of neutral, anionic and cationic
solutes in water in good agreement with experiment.
Authors' comments: Journal of Chemical Theory and Computation, Accepted for Publication
Tejas Dastane, Varun Rao, Kartik Shenoy, Devendra Vyavaharkar
This paper presents a novel technique for skin colour segmentation that
overcomes the limitations faced by existing techniques such as Colour Range
Thresholding. Skin colour segmentation is affected by the varied skin colours
and surrounding lighting conditions, leading to poor skin segmentation for many
techniques. We propose a new two-stage Pixel Neighbourhood technique that
classifies any pixel as skin or non-skin based on its neighbourhood pixels. The
first step calculates the probability of each pixel being skin by passing HSV
values of the pixel to a Deep Neural Network model. In the next step, it
calculates the likelihood of the pixel being skin using these probabilities of
neighbouring pixels. This technique performs skin colour segmentation better
than the existing techniques.
Authors' comments: 5 pages
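The second stage of the Pixel Neighbourhood technique described in the skin segmentation abstract above can be sketched as a neighbourhood aggregation of per-pixel probabilities. The abstract does not specify the aggregation rule, so this sketch makes several assumptions: the first-stage probabilities are already available as a 2D array, the score is the mean over a square window that includes the centre pixel, borders are handled by edge replication, and the decision threshold is 0.5. All names and values are hypothetical.

```python
import numpy as np

def neighbourhood_score(prob, radius=1):
    """Mean skin probability over a (2*radius+1)^2 window around each pixel."""
    prob = np.asarray(prob, dtype=float)
    h, w = prob.shape
    padded = np.pad(prob, radius, mode="edge")  # replicate border pixels
    size = 2 * radius + 1
    total = np.zeros((h, w), dtype=float)
    # Sum shifted copies of the padded map, one per window offset.
    for dy in range(size):
        for dx in range(size):
            total += padded[dy:dy + h, dx:dx + w]
    return total / (size * size)

def skin_mask(prob, radius=1, threshold=0.5):
    """Classify each pixel as skin when its neighbourhood score is high enough."""
    return neighbourhood_score(prob, radius) >= threshold

# Hypothetical first-stage output: one low-probability pixel amid skin.
prob = np.ones((3, 3))
prob[1, 1] = 0.0
mask = skin_mask(prob)
```

In this toy case the isolated low-probability pixel is still classified as skin because its neighbours outvote it, which is the kind of smoothing a neighbourhood-based second stage is meant to provide.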
Keyang Wang, Lei Zhang, Wenli Song, Qinghai Lang, Lingyun Qin
The anchor-based detectors handle the problem of scale variation by building
the feature pyramid and directly setting different scales of anchors on each
cell in different layers. However, it is difficult for box-wise anchors to
guide the adaptive learning of scale-specific features in each layer because
there is no one-to-one correspondence between box-wise anchors and pixel-level
features. In order to alleviate the problem, in this paper, we propose a
scale-customized weak segmentation (SCWS) block at the pixel level for scale
customized object feature learning in each layer. By integrating the SCWS
blocks into the single-shot detector, a scale-aware object detector (SCOD) is
constructed to detect objects of different sizes naturally and accurately.
Furthermore, the standard location loss neglects the fact that the hard and
easy samples may be seriously imbalanced. As a result of this imbalance, it
cannot produce more accurate bounding boxes. To address
this problem, an adaptive IoU (AIoU) loss via a simple yet effective squeeze
operation is specified in our SCOD. Extensive experiments on PASCAL VOC and MS
COCO demonstrate the superiority of our SCOD.
Authors' comments: To appear in IEEE International Conference on Image Processing 2021
Naser Damer, Noemie Spiller, Meiling Fang, Fadi Boutros, Florian Kirchbuchner, Arjan Kuijper
A face morphing attack image can be verified to multiple identities, making
this attack a major vulnerability to processes based on identity verification,
such as border checks. Various methods have been proposed to detect face
morphing attacks, but they generalize poorly to unexpected
post-morphing processes. A major post-morphing process is the print and scan
operation performed in many countries when issuing a passport or identity
document. In this work, we address this generalization problem by adapting a
pixel-wise supervision approach where we train a network to classify each pixel
of the image into an attack or not, rather than only having one label for the
whole image. Our pixel-wise morphing attack detection (PW-MAD) solution proved
to perform more accurately than a set of established baselines. More
importantly, PW-MAD shows high generalizability in comparison to related works,
when evaluated on unknown re-digitized attacks. In addition to our PW-MAD
approach, we create a new face morphing attack dataset with digital and
re-digitized samples, namely the LMA-DRD dataset that is publicly available for
research purposes upon request.
Authors' comments: Accepted at the 16th International Symposium on Visual Computing
(ISVC 2021)
Md Amirul Islam, Matthew Kowal, Sen Jia, Konstantinos G. Derpanis, Neil D. B. Bruce
In this paper, we challenge the common assumption that collapsing the spatial
dimensions of a 3D (spatial-channel) tensor in a convolutional neural network
(CNN) into a vector via global pooling removes all spatial information.
Specifically, we demonstrate that positional information is encoded based on
the ordering of the channel dimensions, while semantic information is largely
not. Following this demonstration, we show the real world impact of these
findings by applying them to two applications. First, we propose a simple yet
effective data augmentation strategy and loss function which improves the
translation invariance of a CNN's output. Second, we propose a method to
efficiently determine which channels in the latent representation are
responsible for (i) encoding overall position information or (ii)
region-specific positions. We first show that semantic segmentation has a
significant reliance on the overall position channels to make predictions. We
then show for the first time that it is possible to perform a `region-specific'
attack, and degrade a network's performance in a particular part of the input.
We believe our findings and demonstrated applications will benefit research
areas concerned with understanding the characteristics of CNNs.
Authors' comments: ICCV 2021
Shubham Maheshwari, Khushbu Pahwa, Tavpritesh Sethi
Structure learning offers an expressive, versatile and explainable approach to causal and mechanistic modeling of complex biological data. We present wiseR, an open source application for learning, evaluating and deploying robust causal graphical models using graph neural networks and Bayesian networks. We demonstrate the utility of this application by applying it to biomarker discovery in a COVID-19 clinical dataset.
Mingcheng Chen, Zhenghui Wang, Zhiyun Zhao, Weinan Zhang, Xiawei Guo, Jian Shen, Yanru Qu, Jieli Lu et al.
Diabetes prediction is an important data science application in the social
healthcare domain. There exist two main challenges in the diabetes prediction
task: data heterogeneity since demographic and metabolic data are of different
types, and data insufficiency since the number of diabetes cases in a single
medical center is usually limited. To tackle the above challenges, we employ
gradient boosting decision trees (GBDT) to handle data heterogeneity and
introduce multi-task learning (MTL) to solve data insufficiency. To this end,
Task-wise Split Gradient Boosting Trees (TSGB) is proposed for the multi-center
diabetes prediction task. Specifically, we firstly introduce task gain to
evaluate each task separately during tree construction, with a theoretical
analysis of GBDT's learning objective. Secondly, we reveal a problem when
directly applying GBDT in MTL, i.e., the negative task gain problem. Finally,
we propose a novel split method for GBDT in MTL based on the task gain
statistics, named task-wise split, as an alternative to standard feature-wise
split to overcome the mentioned negative task gain problem. Extensive
experiments on a large-scale real-world diabetes dataset and a commonly used
benchmark dataset demonstrate TSGB achieves superior performance against
several state-of-the-art methods. Detailed case studies further support our
analysis of negative task gain problems and provide insightful findings. The
proposed TSGB method has been deployed as an online diabetes risk assessment
software for early diagnosis.
Authors' comments: 11 pages (2 pages of supplementary), 10 figures, 7 tables. Accepted
by ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2021)
Aditya Saini, Ranjitha Prasad
Despite the tremendous performance improvements in designing complex
artificial intelligence (AI) systems in data-intensive domains, the black-box
nature of these systems leads to a lack of trustworthiness. Post-hoc
interpretability methods explain the prediction of a black-box ML model for a
single instance, and such explanations are being leveraged by domain experts to
diagnose the underlying biases of these models. Despite their efficacy in
providing valuable insights, existing approaches fail to deliver consistent and
reliable explanations. In this paper, we propose an active learning-based
technique called UnRAvEL (Uncertainty driven Robust Active Learning Based
Locally Faithful Explanations), which consists of a novel acquisition function
that is locally faithful and uses uncertainty-driven sampling based on the
posterior distribution on the probabilistic locality using Gaussian process
regression (GPR). We present a theoretical analysis of UnRAvEL by treating it as
a local optimizer and analyzing its regret in terms of instantaneous regrets
over a global optimizer. We demonstrate the efficacy of the local samples
generated by UnRAvEL by incorporating different kernels such as the Matern and
linear kernels in GPR. Through a series of experiments, we show that UnRAvEL
outperforms the baselines with respect to stability and local fidelity on
several real-world models and datasets. We show that UnRAvEL is an efficient
surrogate dataset generator by deriving importance scores on this surrogate
dataset using sparse linear models. We also showcase the sample efficiency and
flexibility of the developed framework on the ImageNet dataset using a
pre-trained ResNet model.
Authors' comments: To be published in the main track of AIES'22
Slavche Pejoski, Zoran Hadzi-Velkov, Robert Schober
We propose a novel transmission protocol for harvest-then-transmit wireless powered communication networks, which takes into account the non-linearity of the energy harvesting (EH) process at the EH users and maximizes the sum rate in the uplink. We assume a piece-wise linear energy harvesting model and provide expressions for the optimal transmit power of the base station (BS), the duration of the EH phase, and the duration of the uplink information transmission phases of the users. The obtained solution provides insight regarding the significance of the non-linear EH model on the optimal resource allocation. Simulations unveil the growing impact of the saturation effect, which occurs for high received radio frequency powers, as the average and the maximum instantaneous transmit powers of the BS increase.
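A piece-wise linear energy harvesting model of the kind assumed in the wireless powered communication abstract above can be sketched with linear interpolation between breakpoints. The breakpoint values below are hypothetical and chosen only to show the saturation effect at high received power; they are not taken from the paper.

```python
import numpy as np

# Hypothetical breakpoints: received RF power (W) -> harvested power (W).
# The slope drops after 1 W and the output saturates beyond 2 W.
RF_BREAKPOINTS_W = np.array([0.0, 1.0, 2.0])
HARVESTED_W = np.array([0.0, 0.6, 0.8])

def harvested_power(p_rf):
    """Piece-wise linear EH model; np.interp holds the last value past 2 W."""
    return np.interp(p_rf, RF_BREAKPOINTS_W, HARVESTED_W)

# Below the first breakpoint the model is linear; far above the last
# breakpoint, extra transmit power yields no additional harvested energy.
low = harvested_power(0.5)
saturated = harvested_power(5.0)
```

Because harvested power stops growing once the input saturates, raising the base station's transmit power beyond that point brings no further benefit in the EH phase, which is the effect the abstract refers to.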
Chenyu You, Yuan Zhou, Ruihan Zhao, Lawrence Staib, James S. Duncan
Automated segmentation in medical image analysis is a challenging task that
requires a large amount of manually labeled data. However, most existing
learning-based approaches usually suffer from limited manually annotated
medical data, which poses a major practical problem for accurate and robust
medical image segmentation. In addition, most existing semi-supervised
approaches are usually not robust compared with the supervised counterparts,
and also lack explicit modeling of geometric structure and semantic
information, both of which limit the segmentation accuracy. In this work, we
present SimCVD, a simple contrastive distillation framework that significantly
advances state-of-the-art voxel-wise representation learning. We first describe
an unsupervised training strategy, which takes two views of an input volume and
predicts their signed distance maps of object boundaries in a contrastive
objective, using only two independent dropout masks. This simple approach
works surprisingly well, performing on the same level as previous fully
supervised methods with much less labeled data. We hypothesize that dropout can
be viewed as a minimal form of data augmentation and makes the network robust
to representation collapse. Then, we propose to perform structural distillation
by distilling pair-wise similarities. We evaluate SimCVD on two popular
datasets: the Left Atrial Segmentation Challenge (LA) and the NIH pancreas CT
dataset. The results on the LA dataset demonstrate that, with two
labeled-data ratios (i.e., 20% and 10%), SimCVD achieves average Dice scores of
90.85% and 89.03% respectively, a 0.91% and 2.22% improvement compared to
previous best results. Our method can be trained in an end-to-end fashion,
showing the promise of utilizing SimCVD as a general framework for downstream
tasks, such as medical image synthesis, enhancement, and registration.
Authors' comments: IEEE Transactions on Medical Imaging (IEEE-TMI) 2022
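For readers unfamiliar with the Dice score reported in the SimCVD abstract above, the standard definition for binary masks is 2|A ∩ B| / (|A| + |B|). The sketch below computes it with NumPy; the function name and the convention of returning 1.0 when both masks are empty are choices made here, not details from the paper.

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0
    return 2.0 * intersection / denom

# Toy 1D masks (hypothetical): one overlapping voxel out of three marked.
score = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```

The same function applies unchanged to 3D volumes, since it only counts overlapping and marked voxels.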