Ollie Ballinger
In the context of recent, highly destructive conflicts in Gaza and Ukraine, reliable estimates of building damage are essential for informed public discourse, human rights monitoring, and humanitarian aid provision. Given the contentious nature of conflict damage assessment, these estimates must be fully reproducible, explainable, and derived from open access data. This paper introduces a new method for building damage detection, the Pixel-Wise T-Test (PWTT), that satisfies these conditions. Using a combination of freely available synthetic aperture radar imagery and statistical change detection, the PWTT generates accurate conflict damage estimates across a wide area at regular time intervals. Accuracy is assessed using an original dataset of over half a million labeled building footprints spanning 12 cities across Ukraine, Palestine, Syria, and Iraq. Despite being simple and lightweight, the algorithm achieves building-level accuracy statistics (AUC=0.88 across Ukraine, 0.81 in Gaza) rivalling state-of-the-art methods that use deep learning and high-resolution imagery. The workflow is open source and deployed entirely within the Google Earth Engine environment, which enables interactive Battle Damage Dashboards for Ukraine and Gaza that update in near-real time and give the public and humanitarian practitioners immediate estimates of damaged buildings in a given area.
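As a rough illustration of the statistical idea behind a pixel-wise t-test, the sketch below compares a stack of pre-event values with a stack of post-event values at every pixel using Welch's t-statistic. The function name `pixelwise_t`, the nested-list image layout, and the toy backscatter values are assumptions made for illustration; the abstract does not specify the exact test variant, and the paper's Google Earth Engine workflow is not reproduced here.

```python
import math
from statistics import mean, variance

def pixelwise_t(pre, post):
    """Welch t-statistic per pixel.

    pre, post: lists of images (one per acquisition date); each image is a
    list of rows of floats. Returns one image of t-statistics.
    """
    height, width = len(pre[0]), len(pre[0][0])
    t_image = []
    for r in range(height):
        row = []
        for c in range(width):
            a = [img[r][c] for img in pre]   # pre-event time series at this pixel
            b = [img[r][c] for img in post]  # post-event time series at this pixel
            diff = mean(b) - mean(a)
            se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
            if se > 0:
                row.append(diff / se)
            elif diff == 0:
                row.append(0.0)  # both series constant and equal: no change
            else:
                row.append(math.copysign(math.inf, diff))
        t_image.append(row)
    return t_image

# Toy 1x2 scene (hypothetical values): the left pixel is stable,
# the right pixel drops sharply after the event.
pre = [[[-8.0, -7.9]], [[-8.2, -8.1]], [[-7.9, -8.0]]]
post = [[[-8.1, -12.0]], [[-8.0, -12.3]], [[-8.1, -11.8]]]
t = pixelwise_t(pre, post)
```

A large absolute t-value flags a pixel whose post-event signal differs from its own pre-event baseline by more than its usual variability; aggregating such values over building footprints would be a separate step.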
Lucas Gretta, William He, Angelos Pelecanos
We prove that the permutation computed by a reversible circuit with
$\tilde{O}(nk\cdot \log(1/\varepsilon))$ random $3$-bit gates is
$\varepsilon$-approximately $k$-wise independent. Our bound improves on
currently known bounds in the regime when the approximation error $\varepsilon$
is not too small. We obtain our results by analyzing the log-Sobolev constants
of appropriate Markov chains rather than their spectral gaps.
Authors' comments: 19 pages
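To make the object of study concrete, here is a minimal sketch of a reversible circuit built from random 3-bit gates acting on n-bit strings. Each gate picks three distinct wires and applies a uniformly random permutation of the eight local bit patterns. The helper names, the choice n = 5, and the gate count of 40 are arbitrary assumptions for illustration; the sketch only confirms that the circuit computes a permutation and does not test k-wise independence.

```python
import random

def random_gate(n, rng):
    """Pick 3 distinct wires and a uniformly random permutation of the 8 local states."""
    wires = rng.sample(range(n), 3)
    table = list(range(8))
    rng.shuffle(table)
    return wires, table

def apply_gate(gate, x):
    wires, table = gate
    # Read the 3 selected bits into a local index in 0..7.
    local = sum(((x >> w) & 1) << i for i, w in enumerate(wires))
    out = table[local]
    # Write the permuted bits back, leaving every other wire untouched.
    for i, w in enumerate(wires):
        x = (x & ~(1 << w)) | (((out >> i) & 1) << w)
    return x

def apply_circuit(gates, x):
    for gate in gates:
        x = apply_gate(gate, x)
    return x

rng = random.Random(0)
n = 5
circuit = [random_gate(n, rng) for _ in range(40)]
images = [apply_circuit(circuit, x) for x in range(2 ** n)]
```

Because every gate is a bijection on its three wires, the composed map is a permutation of all 2^n inputs regardless of the random choices.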
Kumar Shubham, Aishwarya Jayagopal, Syed Mohammed Danish, Prathosh AP, Vaibhav Rajan
Cancer, a leading cause of death globally, occurs due to genomic changes and
manifests heterogeneously across patients. To advance research on personalized
treatment strategies, the effectiveness of various drugs on cells derived from
cancers (`cell lines') is experimentally determined in laboratory settings.
Nevertheless, variations in the distribution of genomic data and drug responses
between cell lines and humans arise due to biological and environmental
differences. Moreover, while genomic profiles of many cancer patients are
readily available, the scarcity of corresponding drug response data limits the
ability to train machine learning models that can predict drug response in
patients effectively. Recent cancer drug response prediction methods have
largely followed the paradigm of unsupervised domain-invariant representation
learning followed by a downstream drug response classification step.
Introducing supervision in both stages is challenging due to heterogeneous
patient response to drugs and limited drug response data. This paper addresses
these challenges through a novel representation learning method in the first
phase and weak supervision in the second. Experimental results on real patient
data demonstrate the efficacy of our method (WISER) over state-of-the-art
alternatives on predicting personalized drug response.
Authors' comments: ICML 2024
Matías Suazo, Erik Zackrisson, Priyatam K. Mahto, Fabian Lundell, Carl Nettelblad, Andreas J. Korn, Jason T. Wright, Suman Majumdar
The search for extraterrestrial intelligence is currently being pursued using
multiple techniques and in different wavelength bands. Dyson spheres,
megastructures that could be constructed by advanced civilizations to harness
the radiation energy of their host stars, represent a potential
technosignature that, in principle, may be hiding in public data already
collected as part of large astronomical surveys. In this study, we present a
comprehensive search for partial Dyson spheres by analyzing optical and
infrared observations from Gaia, 2MASS, and WISE. We develop a pipeline that
employs multiple filters to identify potential candidates and reject
interlopers in a sample of five million objects, which incorporates a
convolutional neural network to help identify confusion in WISE data. Finally,
the pipeline identifies 7 candidates deserving of further analysis. All of
these objects are M-dwarfs, for which astrophysical phenomena cannot easily
account for the observed infrared excess emission.
Authors' comments: Accepted to be published in MNRAS
Ziyi Yin, Rafael Orozco, Felix J. Herrmann
We present a semi-amortized variational inference framework designed for computationally feasible uncertainty quantification in 2D full-waveform inversion to explore the multimodal posterior distribution without dimensionality reduction. The framework is called WISER, short for full-Waveform variational Inference via Subsurface Extensions with Refinements. WISER leverages the power of generative artificial intelligence to perform approximate amortized inference that is low-cost albeit showing an amortization gap. This gap is closed through non-amortized refinements that make frugal use of acoustic wave physics. Case studies illustrate that WISER is capable of full-resolution, computationally feasible, and reliable uncertainty estimates of velocity models and imaged reflectivities.
Daxin Li, Yuanchao Bai, Kai Wang, Junjun Jiang, Xianming Liu, Wen Gao
Transformer-based entropy models have gained prominence in recent years due
to their superior ability to capture long-range dependencies in probability
distribution estimation compared to convolution-based methods. However,
previous transformer-based entropy models suffer from a sluggish coding process
due to pixel-wise autoregression or duplicated computation during inference. In
this paper, we propose a novel transformer-based entropy model called
GroupedMixer, which enjoys both faster coding speed and better compression
performance than previous transformer-based methods. Specifically, our approach
builds upon group-wise autoregression by first partitioning the latent
variables into groups along spatial-channel dimensions, and then entropy coding
the groups with the proposed transformer-based entropy model. The global causal
self-attention is decomposed into more efficient group-wise interactions,
implemented using inner-group and cross-group token-mixers. The inner-group
token-mixer incorporates contextual elements within a group while the
cross-group token-mixer interacts with previously decoded groups. Alternating
the two token-mixers enables global contextual reference. To further
expedite the network inference, we introduce context cache optimization to
GroupedMixer, which caches attention activation values in cross-group
token-mixers and avoids complex and duplicated computation. Experimental
results demonstrate that the proposed GroupedMixer yields the state-of-the-art
rate-distortion performance with fast compression speed.
Authors' comments: Accepted by IEEE TCSVT
Jinming Cao, Sicheng Shen, Qiu Zhou, Yifang Yin, Yangyan Li, Roger Zimmermann
Photographing optoelectronic displays often introduces unwanted moir\'e
patterns due to analog signal interference between the pixel grids of the
display and the camera sensor arrays. This work identifies two problems that
are largely ignored by existing image demoir\'eing approaches: 1) moir\'e
patterns vary across different channels (RGB); 2) repetitive patterns are
constantly observed. However, employing conventional convolutional (CNN) layers
cannot address these problems. Instead, this paper presents the use of our
recently proposed \emph{Shape} concept. It was originally employed to model
consistent features from fragmented regions, particularly when identical or
similar objects coexist in an RGB-D image. Interestingly, we find that the
Shape information effectively captures the moir\'e patterns in artifact images.
Motivated by this discovery, we propose a new method, ShapeMoir\'e, for image
demoir\'eing. Beyond modeling shape features at the patch-level, we further
extend this to the global image-level and design a novel Shape-Architecture.
Consequently, our proposed method, equipped with both ShapeConv and
Shape-Architecture, can be seamlessly integrated into existing approaches
without introducing any additional parameters or computation overhead during
inference. We conduct extensive experiments on four widely used datasets, and
the results demonstrate that our ShapeMoir\'e achieves state-of-the-art
performance, particularly in terms of the PSNR metric.
Authors' comments: 19 pages
Shi-Yu Xia, Wenxuan Zhu, Xu Yang, Xin Geng
In practice, we usually need to build variable-sized models adapted to diverse resource constraints in different application scenarios, where weight initialization is an important step prior to training. The recently introduced Learngene framework first learns one compact part, termed learngene, from a large well-trained model, after which learngene is expanded to initialize variable-sized models. In this paper, we start by analysing the importance of guidance for the expansion of well-trained learngene layers, which inspires the design of a simple but highly effective Learngene approach termed SWS (Stage-wise Weight Sharing), where both learngene layers and their learning process critically contribute to providing knowledge and guidance for initializing models at varying scales. Specifically, to learn learngene layers, we build an auxiliary model comprising multiple stages where the layer weights in each stage are shared, after which we train it through distillation. Subsequently, we expand these learngene layers containing stage information at their corresponding stage to initialize models of variable depths. Extensive experiments on ImageNet-1K demonstrate that SWS achieves consistently better performance compared to many models trained from scratch, while reducing total training costs by around 6.6x. In some cases, SWS performs better after only 1 epoch of tuning. When initializing variable-sized models adapted to different resource constraints, SWS achieves better results while reducing the parameters stored to initialize these models by around 20x and pre-training costs by around 10x, in contrast to the pre-training and fine-tuning approach.
Hao Miao, Senzhang Wang, Meiyue Zhang, Diansheng Guo, Funing Sun, Fan Yang
Accurately forecasting traffic flows is critically important to many real
applications including public safety and intelligent transportation systems.
The challenges of this problem include both the dynamic mobility patterns of
the people and the complex spatial-temporal correlations of the urban traffic
data. Meanwhile, most existing models ignore the diverse impacts of the various
traffic observations (e.g. vehicle speed and road occupancy) on the traffic
flow prediction, and different traffic observations can be considered as
different channels of input features. We argue that analyzing
multiple-channel traffic observations might help to better address this
problem. In this paper, we study the novel problem of multi-channel traffic
flow prediction, and propose a deep \underline{M}ulti-\underline{V}iew
\underline{C}hannel-wise \underline{S}patio-\underline{T}emporal
\underline{Net}work (MVC-STNet) model to effectively address it. Specifically,
we first construct the localized and globalized spatial graph where the
multi-view fusion module is used to effectively extract the local and global
spatial dependencies. Then LSTM is used to learn the temporal correlations. To
effectively model the different impacts of various traffic observations on
traffic flow prediction, a channel-wise graph convolutional network is also
designed. Extensive experiments are conducted over the PEMS04 and PEMS08
datasets. The results demonstrate that the proposed MVC-STNet outperforms
state-of-the-art methods by a large margin.
Authors' comments: Accepted by AAAI 2020 workshop
Federico Marocco, J. Davy Kirkpatrick, Adam C. Schneider, Aaron M. Meisner, Mark Popinchalk, Christopher R. Gelino, Jacqueline K. Faherty, Adam J. Burgasser et al.
We present the discovery of 13 new widely separated T dwarf companions to M
dwarf primaries, identified using WISE/NEOWISE data by the CatWISE and Backyard
Worlds: Planet 9 projects. This sample represents a $\sim$60% increase in the
number of known M+T systems, and allows us to probe the most extreme products
of binary/planetary system formation, a discovery space made available by the
CatWISE2020 catalog and the Backyard Worlds: Planet 9 effort. Highlights among
the sample are WISEP J075108.79-763449.6, a previously known T9 thought to be
old due to its SED, which we now find is part of a common-proper-motion pair
with L 34-26 A, a well studied young M3 V star within 10 pc of the Sun; CWISE
J054129.32-745021.5 B and 2MASS J05581644-4501559 B, two T8 dwarfs possibly
associated with the very fast-rotating M4 V stars CWISE J054129.32-745021.5 A
and 2MASS J05581644-4501559 A; and UCAC3 52-1038 B, which is among the widest
late T companions to main sequence stars, with a projected separation of
$\sim$7100 au. The new benchmarks presented here are prime $JWST$ targets, and
can help us place strong constraints on formation and evolution theory of
substellar objects as well as on atmospheric models for these cold exoplanet
analogs.
Authors' comments: Accepted for publication in ApJ. 35 pages, 6 tables, 21 figures
Paulo Yanez Sarmiento, Simon Witzke, Nadja Klein, Bernhard Y. Renard
Explainability is a key component in many applications involving deep neural
networks (DNNs). However, current explanation methods for DNNs commonly leave
it to the human observer to distinguish relevant explanations from spurious
noise. This is no longer feasible when going from easily human-accessible
data such as images to more complex data such as genome sequences. To
facilitate the accessibility of DNN outputs from such complex data and to
increase explainability, we present a modification of the widely used
explanation method layer-wise relevance propagation. Our approach enforces
sparsity directly by pruning the relevance propagation for the different
layers. Thereby, we achieve sparser relevance attributions for the input
features as well as for the intermediate layers. As the relevance propagation
is input-specific, we aim to prune the relevance propagation rather than the
underlying model architecture. This allows different neurons to be pruned for
different inputs and hence might be more appropriate to the local nature of
explanation methods. To demonstrate the efficacy of our method, we evaluate it
on two types of data, images and genomic sequences. We show that our
modification indeed leads to noise reduction and concentrates relevance on the
most important features compared to the baseline.
Authors' comments: 15 pages, 5 figures
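The idea of pruning the relevance propagation rather than the model can be sketched for a single linear layer. The first function is the standard LRP-epsilon redistribution rule; the second keeps only the k largest-magnitude relevances and rescales them so the total is conserved. The top-k criterion, the rescaling, and all names and toy values are assumptions for illustration, since the abstract does not state the exact pruning rule used.

```python
def lrp_linear(activations, weights, relevance_out, eps=1e-9):
    """LRP-epsilon rule: redistribute output relevance to the inputs of a linear layer."""
    n_in, n_out = len(activations), len(relevance_out)
    z = [sum(activations[i] * weights[i][j] for i in range(n_in)) for j in range(n_out)]
    relevance_in = []
    for i in range(n_in):
        r = 0.0
        for j in range(n_out):
            stabilizer = eps if z[j] >= 0 else -eps  # avoids division by zero
            r += activations[i] * weights[i][j] / (z[j] + stabilizer) * relevance_out[j]
        relevance_in.append(r)
    return relevance_in

def prune_top_k(relevance, k):
    """Keep the k largest-magnitude relevances and rescale to preserve the total."""
    order = sorted(range(len(relevance)), key=lambda i: abs(relevance[i]), reverse=True)
    keep = set(order[:k])
    kept_sum = sum(relevance[i] for i in keep)
    scale = sum(relevance) / kept_sum if kept_sum != 0 else 0.0
    return [relevance[i] * scale if i in keep else 0.0 for i in range(len(relevance))]

# Hypothetical layer with 3 inputs and 2 outputs.
activations = [1.0, 2.0, 0.5]
weights = [[1.0, 0.0], [0.5, 1.0], [0.2, 0.4]]
dense = lrp_linear(activations, weights, [1.0, 1.0])
sparse = prune_top_k(dense, k=2)
```

Since the pruning acts on relevances computed for one input, a different input can keep a different set of neurons, which matches the input-specific behaviour the abstract describes.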
Davide Materia, Leonardo Ratini, Celestino Angeli, Leonardo Guidoni
The intersection of Quantum Chemistry and Quantum Computing has led to significant advancements in understanding the potential of using quantum devices for the efficient calculation of molecular energies. Simultaneously, this intersection is enhancing the comprehension of quantum chemical properties through the use of quantum computing and quantum information tools. This paper tackles a key question in this relationship: Is the nature of the orbital-wise electron correlations in wavefunctions of realistic prototypical cases classical or quantum? We delve into this inquiry with a comprehensive examination of molecular wavefunctions using Shannon and von Neumann entropies, alongside classical and quantum information theory. Our analysis reveals a notable distinction between classical and quantum mutual information in molecular systems when analyzed with Hartree-Fock canonical orbitals. However, this difference decreases dramatically, by approximately 100-fold, when Natural Orbitals are used as reference. This finding suggests that wavefunction correlations, when viewed through the appropriate orbital basis, are predominantly classical. This insight indicates that computational tasks in quantum chemistry could be significantly simplified by employing Natural Orbitals. Consequently, our study underscores the importance of using Natural Orbitals to accurately assess molecular wavefunction correlations and to avoid their overestimation. In summary, our results suggest a promising path for computational simplification in quantum chemistry, advocating for the wider adoption of Natural Orbitals and raising questions about the actual computational complexity of the multi-body problem in quantum chemistry.
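The gap between classical and quantum mutual information can be seen on a toy system. The sketch below takes a real two-qubit pure state, computes the classical mutual information from Shannon entropies of the computational-basis measurement probabilities, and the quantum mutual information from von Neumann entropies. The two-qubit setting, the restriction to real amplitudes, and the function names are assumptions for illustration; this is not a molecular wavefunction, and the paper's precise definition of the classical quantity may differ.

```python
import math

def shannon(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 1e-15)

def mutual_information(c00, c01, c10, c11):
    """Return (classical, quantum) mutual information of a real two-qubit pure state."""
    p = [[c00 ** 2, c01 ** 2], [c10 ** 2, c11 ** 2]]
    h_a = shannon([p[0][0] + p[0][1], p[1][0] + p[1][1]])
    h_b = shannon([p[0][0] + p[1][0], p[0][1] + p[1][1]])
    h_ab = shannon([p[0][0], p[0][1], p[1][0], p[1][1]])
    classical = h_a + h_b - h_ab

    # Reduced density matrix of subsystem A: 2x2, real symmetric, trace 1,
    # so its eigenvalues are (1 +/- sqrt(1 - 4 det)) / 2.
    off_diag = c00 * c10 + c01 * c11
    det = (p[0][0] + p[0][1]) * (p[1][0] + p[1][1]) - off_diag ** 2
    gap = math.sqrt(max(0.0, 1.0 - 4.0 * det))
    s_a = shannon([(1.0 + gap) / 2.0, (1.0 - gap) / 2.0])
    # For a pure joint state S(AB) = 0 and S(A) = S(B).
    quantum = 2.0 * s_a
    return classical, quantum

s = 1.0 / math.sqrt(2.0)
bell = mutual_information(s, 0.0, 0.0, s)           # (|00> + |11>) / sqrt(2)
product = mutual_information(1.0, 0.0, 0.0, 0.0)    # |00>
```

For the Bell state the classical value is about 1 bit while the quantum value is about 2 bits; for the product state both vanish. The classical value depends on the measurement basis, which is loosely analogous to the orbital-basis dependence discussed in the abstract.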
Wenqi Jia, Sian Jin, Jinzhen Wang, Wei Niu, Dingwen Tao, Miao Yin
The rapid expansion of computational capabilities and the ever-growing scale of modern HPC systems present formidable challenges in managing exascale scientific data. Faced with such vast datasets, traditional lossless compression techniques prove insufficient in reducing data size to a manageable level while preserving all information intact. In response, researchers have turned to error-bounded lossy compression methods, which offer a balance between data size reduction and information retention. However, despite their utility, these compressors employing conventional techniques struggle with limited reconstruction quality. To address this issue, we draw inspiration from recent advancements in deep learning and propose GWLZ, a novel group-wise learning-based lossy compression framework with multiple lightweight learnable enhancer models. Leveraging a group of neural networks, GWLZ significantly enhances the decompressed data reconstruction quality with negligible impact on the compression efficiency. Experimental results on different fields from the Nyx dataset demonstrate remarkable improvements by GWLZ, achieving up to 20% quality enhancements with negligible overhead as low as 0.0003x.
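For readers unfamiliar with the term, the "error-bounded" guarantee can be illustrated with plain uniform quantization: each value is mapped to the index of a bin of width twice the error bound, so the reconstruction never deviates by more than that bound. This is a generic textbook sketch with hypothetical function names and data; it is not GWLZ, and the learned enhancer models are not shown.

```python
def compress(values, error_bound):
    """Map each value to the integer index of a bin of width 2 * error_bound."""
    step = 2.0 * error_bound
    return [round(v / step) for v in values]

def decompress(codes, error_bound):
    """Reconstruct each value as the centre of its bin."""
    step = 2.0 * error_bound
    return [code * step for code in codes]

data = [0.113, -2.718, 3.14159, 42.0, 1e-4]
eb = 0.01
codes = compress(data, eb)
restored = decompress(codes, eb)
max_error = max(abs(a - b) for a, b in zip(data, restored))
```

The integer codes are what a real compressor would then entropy-code; a learned enhancer, as the abstract describes, would operate on the reconstructed values to improve their quality.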
Yuyan Shi, Jialu Ma, Jin Yang, Shasha Wang, Yichi Zhang
Medical image segmentation plays an important role in many image-guided clinical approaches. However, existing segmentation algorithms mostly rely on the availability of fully annotated images with pixel-wise annotations for training, which can be both labor-intensive and expertise-demanding, especially in the medical imaging domain where only experts can provide reliable and accurate annotations. To alleviate this challenge, there has been a growing focus on developing segmentation methods that can train deep models with weak annotations, such as image-level labels, bounding boxes, scribbles, and points. The emergence of vision foundation models, notably the Segment Anything Model (SAM), has introduced innovative capabilities for segmentation tasks using weak annotations for promptable segmentation enabled by large-scale pre-training. Adopting foundation models together with traditional learning methods has gained increasing interest in the research community and shown potential for real-world applications. In this paper, we present a comprehensive survey of recent progress on annotation-efficient learning for medical image segmentation utilizing weak annotations before and in the era of foundation models. Furthermore, we analyze and discuss several challenges of existing approaches, which we believe will provide valuable guidance for shaping the trajectory of foundational models to further advance the field of medical image segmentation.
Fang Guo, Wenyu Li, Honglei Zhuang, Yun Luo, Yafu Li, Le Yan, Qi Zhu, Yue Zhang
The most recent pointwise Large Language Model (LLM) rankers have achieved remarkable ranking results. However, these rankers are hindered by two major drawbacks: (1) they fail to follow standardized comparison guidance during the ranking process, and (2) they struggle with comprehensive considerations when dealing with complicated passages. To address these shortcomings, we propose to build a ranker that generates ranking scores based on a set of criteria from various perspectives. These criteria are intended to direct each perspective in providing a distinct yet synergistic evaluation. Our research, which examines eight datasets from the BEIR benchmark, demonstrates that incorporating this multi-perspective criteria ensemble approach markedly enhances the performance of pointwise LLM rankers.
Changsuk Oh, Dongseok Shim, Taekbeom Lee, H. Jin Kim
Object removal refers to the process of erasing designated objects from an image while preserving the overall appearance, and it is one area where image inpainting is widely used in real-world applications. The performance of an object remover is quantitatively evaluated by measuring the quality of object removal results, similar to how the performance of an image inpainter is gauged. Current works reporting quantitative performance evaluations utilize original images as references. In this letter, to show that the current evaluation methods cannot properly evaluate the performance of an object remover, we create a dataset with object removal ground truth and compare the evaluations made by the current methods using original images to those utilizing object removal ground truth images. The disparities between the two sets of evaluations confirm that the current methods are not suitable for measuring the performance of an object remover. Additionally, we propose new evaluation methods tailored to gauge the performance of an object remover. The proposed methods evaluate the performance through class-wise object removal results and utilize images without the target class objects as a comparison set. We confirm that the proposed methods can make judgments consistent with human evaluators on the COCO dataset, and that they can produce measurements aligning with those using object removal ground truth on the self-acquired dataset.
Junbiao Pang, Zailin Dong, Jiaxin Deng, Mengyuan Zhu, Yunwei Zhang
Parsing Computer-Aided Design (CAD) drawings is a fundamental step for CAD
revision, semantic-based management, and the generation of 3D prototypes in
both the architecture and engineering industries. Labeling symbols from a CAD
drawing is a notoriously challenging task from a practical point of view. In
this work, we propose to label and spot symbols from CAD images that are
converted from CAD drawings. The advantage of spotting symbols from CAD images
lies in the low demands on labelers and the low annotation cost. However,
spotting symbols pixel-wise in CAD images is challenging. We propose a
pixel-wise point location method based on Progressive Gaussian Kernels (PGK) to
balance training efficiency and location accuracy. In addition, we introduce a
local offset to the heatmap-based point location method. Based on keypoint
detection, we propose a symbol grouping method to redraw the rectangle symbols
in CAD images. We have released a dataset containing CAD images of equipment
rooms from telecommunication industrial CAD drawings. Extensive experiments on
this real-world dataset show that the proposed method has good generalization
ability.
Authors' comments: 10 pages, 10 figures, 6 tables
Yu Li, Han Jiang, Chuanyang Gong, Zhihua Wei
Despite the remarkable achievements of language models (LMs) across a broad spectrum of tasks, their propensity for generating toxic outputs remains a prevalent concern. Current solutions involving finetuning or auxiliary models usually require extensive computational resources, hindering their practicality in large language models (LLMs). In this paper, we propose DeStein, a novel method that detoxifies LMs by applying representation engineering in activation spaces with lower resource and time costs. Specifically, we derive detoxification vectors from self-induced, universal steering pairs through arithmetic operations in activation spaces. During inference, detoxification is achieved by fusing the detoxification vectors with the original representations in a head-wise manner. Empirical results demonstrate that our method significantly outperforms previous state-of-the-art approaches on various metrics, while also maintaining satisfactory generation quality and diversity. We further validate the practicality and scalability of DeStein with a series of white-box LLMs. The method is open-sourced at https://github.com/LizLizLi/DeStein. Warning: Some example model outputs may contain highly offensive or disturbing text.
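The general pattern of deriving a steering vector by arithmetic in activation space and fusing it head by head can be sketched as follows. The difference-of-means construction, the additive fusion with a scalar `alpha`, and every name and number below are assumptions for illustration; the abstract does not give DeStein's exact formulas, and the linked repository is the authoritative reference.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def steering_vector(nontoxic_acts, toxic_acts):
    """Difference of mean activations, pointing from toxic toward non-toxic."""
    good, bad = mean_vector(nontoxic_acts), mean_vector(toxic_acts)
    return [g - b for g, b in zip(good, bad)]

def steer_heads(head_outputs, head_vectors, alpha):
    """Add a scaled steering vector to each attention head's output separately."""
    return [
        [h + alpha * v for h, v in zip(head, vec)]
        for head, vec in zip(head_outputs, head_vectors)
    ]

# Hypothetical 2-dimensional activations collected from paired prompts.
nontoxic = [[1.0, 0.0], [3.0, 2.0]]
toxic = [[0.0, 1.0], [2.0, 3.0]]
vec = steering_vector(nontoxic, toxic)

# Two heads, here sharing the same vector purely to keep the example small.
steered = steer_heads([[0.0, 0.0], [1.0, 1.0]], [vec, vec], alpha=0.5)
```

In practice each head would have its own vector and possibly its own strength, and the activations would come from a real model's forward pass rather than hand-written lists.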
Raoul Prisant, Federica Garin, Paolo Frasca
In this paper we make use of graphon theory to study opinion dynamics on
large undirected networks. The opinion dynamics models that we take into
consideration allow for negative interactions between the individuals, i.e.
competing entities whose opinions can grow apart. We consider both the
repelling model and the opposing model that are studied in the literature. We
define the repelling and the opposing dynamics on graphons and we show that
their initial value problem's solutions exist and are unique. We then show that
the graphon dynamics well approximate the dynamics on large graphs that
converge to a graphon. This result applies to large random graphs that are
sampled according to a graphon. All these facts are illustrated in an extended
numerical example.
Authors' comments: 8 double-column pages. This revised version corrects several typos.
An abridged version is going to appear in the proceedings of the 2024 IEEE
Conference on Decision and Control
Jiing-Ping Wang, Ming-Guang Lin, An-Yeu Wu
With the rise of Transformer models in the NLP and CV domains, Multi-Head Attention (MHA) has been proven to be a game-changer. However, its expensive computation poses challenges to model throughput and efficiency, especially for long-sequence tasks. Exploiting the sparsity in attention has been proven to be an effective way to reduce computation. Nevertheless, prior works do not consider the various distributions among different heads and lack a systematic method to determine the threshold. To address these challenges, we propose Low-Precision Approximate Attention with Head-wise Trainable Threshold for Efficient Transformer (LATTE). LATTE employs a head-wise threshold-based filter with a low-precision dot product and a computation reuse mechanism to reduce the computation of MHA. Moreover, the trainable threshold is introduced to provide a systematic method for adjusting the thresholds and enable end-to-end optimization. Experimental results indicate LATTE can smoothly adapt to both NLP and CV tasks, offering significant computation savings with only a minor compromise in performance. Also, the trainable threshold is shown to be essential for the trade-off between performance and computation. As a result, LATTE filters up to 85.16% of keys with only a 0.87% accuracy drop in the CV task and 89.91% of keys with a 0.86 perplexity increase in the NLP task.
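A head-wise threshold filter can be sketched for a single query: each head drops the keys whose attention score falls below that head's own threshold and applies the softmax only to the survivors. The fixed thresholds, the fallback that keeps the best key, and the names below are assumptions for illustration; LATTE's low-precision score estimation, computation reuse, and threshold training are not shown.

```python
import math

def filtered_attention(scores, threshold):
    """Softmax over only the keys whose score reaches the head's threshold."""
    kept = [i for i, s in enumerate(scores) if s >= threshold]
    if not kept:  # keep the best key so the softmax stays defined
        kept = [max(range(len(scores)), key=lambda i: scores[i])]
    peak = max(scores[i] for i in kept)  # subtract the max for numerical stability
    exps = {i: math.exp(scores[i] - peak) for i in kept}
    total = sum(exps.values())
    return [exps.get(i, 0.0) / total for i in range(len(scores))]

# One query, two heads, four keys; each head has its own (hypothetical) threshold.
head_scores = [[2.0, -1.0, 0.5, -3.0], [0.1, 0.2, 0.0, 0.3]]
head_thresholds = [0.0, 0.15]
weights = [filtered_attention(s, t) for s, t in zip(head_scores, head_thresholds)]
kept_fraction = sum(w > 0 for row in weights for w in row) / 8
```

The two heads have very different score ranges, so a single shared threshold would either keep almost everything in one head or drop almost everything in the other, which is the motivation for per-head thresholds given in the abstract.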