Nooshin Yousefzadeh, Rahul Sengupta, Yashaswi Karnati, Anand Rangarajan, Sanjay Ranka
Traffic congestion has significant economic, environmental, and social
ramifications. Intersection traffic flow dynamics are influenced by numerous
factors. While microscopic traffic simulators are valuable tools, they are
computationally intensive and challenging to calibrate. Moreover, existing
machine-learning approaches struggle to provide lane-specific waveforms or
adapt to intersection topology and traffic patterns. In this study, we propose
two efficient and accurate "Digital Twin" models for intersections, leveraging
Graph Attention Neural Networks (GAT). These attentional graph auto-encoder
digital twins capture temporal, spatial, and contextual aspects of traffic
within intersections, incorporating various influential factors such as
high-resolution loop detector waveforms, signal state records, driving
behaviors, and turning-movement counts. Trained on diverse counterfactual
scenarios across multiple intersections, our models generalize well, enabling
the estimation of detailed traffic waveforms for any intersection approach and
exit lanes. Multi-scale error metrics demonstrate that our models perform
comparably to microsimulations. The primary application of our study lies in
traffic signal optimization, a pivotal area in transportation systems research.
These lightweight digital twins can seamlessly integrate into corridor and
network signal timing optimization frameworks. Furthermore, our study's
applications extend to lane reconfiguration, driving behavior analysis, and
facilitating informed decisions regarding intersection safety and efficiency
enhancements. A promising avenue for future research involves extending this
approach to urban freeway corridors and integrating it with measures of
effectiveness metrics.
Authors' comments: T-TIS Journal, 12 pages, 8 figures, 4 tables
Omid Ghahroodi, Marzia Nouri, Mohammad Vali Sanian, Alireza Sahebi, Doratossadat Dastgheib, Ehsaneddin Asgari, Mahdieh Soleymani Baghshah, Mohammad Hossein Rohban
Evaluating Large Language Models (LLMs) is challenging due to their generative nature, necessitating precise evaluation methodologies. Additionally, non-English LLM evaluation lags behind English, resulting in the absence or weakness of LLMs for many languages. In response to this necessity, we introduce Khayyam Challenge (also known as PersianMMLU), a meticulously curated collection comprising 20,192 four-choice questions sourced from 38 diverse tasks extracted from Persian examinations, spanning a wide spectrum of subjects, complexities, and ages. The primary objective of the Khayyam Challenge is to facilitate the rigorous evaluation of LLMs that support the Persian language. Distinctive features of the Khayyam Challenge are (i) its comprehensive coverage of various topics, including literary comprehension, mathematics, sciences, logic, and intelligence testing, aimed at assessing different facets of LLMs such as language comprehension, reasoning, and information retrieval across various educational stages, from lower primary school to upper secondary school; (ii) its inclusion of rich metadata such as human response rates, difficulty levels, and descriptive answers; (iii) its utilization of new data to avoid data contamination issues prevalent in existing frameworks; (iv) its use of original, non-translated data tailored for Persian speakers, ensuring the framework is free from translation challenges and errors while encompassing cultural nuances; and (v) its inherent scalability for future data updates and evaluations without requiring special human effort. Previous works lacked an evaluation framework that combined all of these features into a single comprehensive benchmark. Furthermore, we evaluate a wide range of existing LLMs that support the Persian language, with statistical analyses and interpretations of their outputs.
Zihao Wang, Bin Cui, Shaoduo Gan
Optimizing the Key-Value (KV) cache of the Large Language Model (LLM) has been considered critical to saving the cost of inference. Most existing KV-cache compression algorithms attempt to sparsify the sequence of tokens by taking advantage of the different importance of tokens. However, most of these methods treat all layers equally, allocating the same KV budget to each layer. This approach is suboptimal, as some layers may be less sensitive to input tokens yet still receive the same budget as others. In this work, we found that by identifying the importance of attention layers, we could optimize the KV-cache jointly from two dimensions, i.e., sequence-wise and layer-wise. Based on our observations regarding layer-wise importance in inference, we propose SqueezeAttention to precisely optimize the allocation of KV-cache budget among layers on-the-fly and then incorporate three representative sequence-wise algorithms to compress the KV-cache for each layer with its very own budget. Specifically, we first measure each layer's importance by calculating the cosine similarity of the input prompt differences before and after the self-attention layers. Based on this similarity, we then categorize the layers into two groups and adjust their KV budgets accordingly. By optimizing the KV-cache along both the sequence and layer dimensions, SqueezeAttention achieves memory reductions of around 30% to 70% and throughput improvements of up to 2.2 times across a wide range of LLMs and benchmarks. The code is available at https://github.com/hetailang/SqueezeAttention.
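The layer-importance step described in the abstract above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the SqueezeAttention code: the function names, the median split into two groups, the reading that a higher similarity means a less important layer, and the `shrink` factor used to move budget between groups.

```python
import numpy as np

def layer_cosine_similarity(h_in, h_out):
    """Mean cosine similarity between per-token hidden states
    before (h_in) and after (h_out) one self-attention layer."""
    num = np.sum(h_in * h_out, axis=-1)
    den = np.linalg.norm(h_in, axis=-1) * np.linalg.norm(h_out, axis=-1) + 1e-12
    return float(np.mean(num / den))

def allocate_budgets(similarities, base_budget, shrink=0.5):
    """Split layers into two groups at the median similarity (assumed rule).
    Layers whose attention changes the input least (high similarity) get a
    reduced KV budget; the freed budget is shared evenly by the other group."""
    sims = np.asarray(similarities, dtype=float)
    less_important = sims > np.median(sims)
    n_less = int(less_important.sum())
    n_more = sims.size - n_less
    budgets = np.full(sims.size, float(base_budget))
    if n_less == 0 or n_more == 0:
        return budgets  # nothing to rebalance
    budgets[less_important] = base_budget * shrink
    freed = base_budget * (1.0 - shrink) * n_less
    budgets[~less_important] += freed / n_more
    return budgets
```

In this sketch the total budget across layers is preserved; only its distribution changes.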
Mo Kordzanganeh, Danial Keshvary, Nariman Arian
Latent diffusion models are the state-of-the-art for synthetic image
generation. To align these models with human preferences, training the models
using reinforcement learning on human feedback is crucial. Black et al. (2024)
introduced denoising diffusion policy optimisation (DDPO), which accounts for
the iterative denoising nature of the generation by modelling it as a Markov
chain with a final reward. As the reward is a single value that determines the
model's performance on the entire image, the model has to navigate a very
sparse reward landscape and so requires a large sample count. In this work, we
extend the DDPO by presenting the Pixel-wise Policy Optimisation (PXPO)
algorithm, which can take feedback for each pixel, providing a more nuanced
reward to the model.
Authors' comments: 6 pages, 7 figures
Toshiyuki Mizuki, Munetake Momose, Masataka Aizawa, Hiroshi Kobayashi
More than a thousand warm debris disks have been detected as infrared excess
at mid-infrared wavelengths, and their frequencies have been obtained for
various spectral types of stars. However, the dependence of the frequencies on
spectral type is still debated because the number of stars with significant and
detectable infrared excess is limited. Herein, we present the largest
systematic search for infrared excess using data from Gaia, WISE, and Spitzer.
We identified 373, 485, and 255 reliable infrared excesses in the mid-infrared
archival data at wavelengths of 12, 22, and 24 $\mu$m for WISE/$W3$, $W4$, and
Spitzer/MIPS ch1, respectively. Although we confirmed that more massive stars
tend to show higher frequencies of debris disks, these disk frequencies are
relatively flat for both low- and intermediate-mass stars, with a jump at 7000
K for all three wavelengths. Assuming that bright, warm debris disks have
lifetimes of a few to several hundred million years, the disk frequency can be
understood as the ratio between the timescale and the upper limits of the
sample ages. We also found that intermediate-mass stars with infrared excess
tend to be bluer and fainter along the evolutionary track than those without,
implying that massive stars hosting debris disks are relatively young, with an
isochronal age of approximately 500 Myr. These tendencies are reasonably
explained by a standard scenario in which debris disks are likely to be
produced by collisions of planetesimals in early stages of stellar evolution,
such as the Late Heavy Bombardment.
Authors' comments: Accepted for publication in AJ. 27 pages, 19 figures, 5 tables
Francis Engelmann, Fabian Manhardt, Michael Niemeyer, Keisuke Tateno, Marc Pollefeys, Federico Tombari
Large visual-language models (VLMs), like CLIP, enable open-set image
segmentation to segment arbitrary concepts from an image in a zero-shot manner.
This goes beyond the traditional closed-set assumption, i.e., where models can
only segment classes from a pre-defined training set. More recently, first
works on open-set segmentation in 3D scenes have appeared in the literature.
These methods are heavily influenced by closed-set 3D convolutional approaches
that process point clouds or polygon meshes. However, these 3D scene
representations do not align well with the image-based nature of the
visual-language models. Indeed, point clouds and 3D meshes typically have a
lower resolution than images and the reconstructed 3D scene geometry might not
project well to the underlying 2D image sequences used to compute pixel-aligned
CLIP features. To address these challenges, we propose OpenNeRF, which naturally
operates on posed images and directly encodes the VLM features within the NeRF.
This is similar in spirit to LERF; however, our work shows that using pixel-wise
VLM features (instead of global CLIP features) results in an overall less
complex architecture without the need for additional DINO regularization. Our
OpenNeRF further leverages NeRF's ability to render novel views and extract
open-set VLM features from areas that are not well observed in the initial
posed images. For 3D point cloud segmentation on the Replica dataset, OpenNeRF
outperforms recent open-vocabulary methods such as LERF and OpenScene by at
least +4.9 mIoU.
Authors' comments: ICLR 2024, Project page: https://opennerf.github.io
Luca Comanducci, Fabio Antonacci, Augusto Sarti
Deep learning models are widely applied in the signal processing community, yet their inner workings are often treated as a black box. In this paper, we investigate the application of eXplainable Artificial Intelligence (XAI) techniques to learning-based end-to-end speech source localization models. We consider the Layer-wise Relevance Propagation (LRP) technique, which aims to determine which parts of the input are more important for the output prediction. Using LRP, we analyze two state-of-the-art models of differing architectural complexity that map audio signals acquired by the microphones to the Cartesian coordinates of the source. Specifically, we inspect the relevance associated with the input features of the two models and discover that both networks denoise and de-reverberate the microphone signals to compute more accurate statistical correlations between them and consequently localize the sources. To further demonstrate this fact, we estimate the Time Differences of Arrival (TDoAs) via the Generalized Cross Correlation with Phase Transform (GCC-PHAT) using both microphone signals and relevance signals extracted from the two networks, and show that the latter yield more accurate time-delay estimates.
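For readers unfamiliar with GCC-PHAT, the time-delay estimator mentioned above, a minimal NumPy sketch follows. It is the standard textbook formulation, not the authors' implementation; the function name `gcc_phat` and its parameters are chosen here for illustration.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref`
    using the Generalized Cross Correlation with Phase Transform."""
    n = len(sig) + len(ref)              # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12       # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs
```

A positive return value means `sig` lags `ref`. With two microphones, the estimated delay multiplied by the speed of sound gives the path-length difference used for localization.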
Yukun Yue
In this paper, we establish discrete versions of the Poincar\'e and trace inequalities for hybridizable finite element spaces. These spaces are made of piecewise polynomial functions defined both within the interiors of elements and across all faces in a mesh's skeleton, serving as the basis for both the hybridizable discontinuous Galerkin (HDG) and hybrid high-order (HHO) methods. Additionally, we present a specific adaptation of these inequalities for the HDG method and apply them to demonstrate the stability of the related numerical schemes for second-order elliptic equations under the minimal regularity assumptions for the source term and boundary data.
Natalie Lang, Alejandro Cohen, Nir Shlezinger
Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning. It typically involves a set of heterogeneous devices locally training neural network (NN) models in parallel with periodic centralized aggregations. As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers. Conventional approaches discard incomplete intra-model updates done by stragglers, alter the amount of local workload and architecture, or resort to asynchronous settings, all of which affect the trained model performance under tight training latency constraints. In this work, we propose straggler-aware layer-wise federated learning (SALF) that leverages the optimization procedure of NNs via backpropagation to update the global model in a layer-wise fashion. SALF allows stragglers to synchronously convey partial gradients, having each layer of the global model be updated independently with a different contributing set of users. We provide a theoretical analysis, establishing convergence guarantees for the global model under mild assumptions on the distribution of the participating devices, revealing that SALF converges at the same asymptotic rate as FL with no timing limitations. This insight is matched with empirical observations, demonstrating the performance gains of SALF compared to alternative mechanisms mitigating the device heterogeneity gap in FL.
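The layer-wise aggregation idea above can be sketched as follows. Because backpropagation computes gradients from the last layer backwards, a straggler that runs out of time holds gradients only for the deepest layers. The function name, the dictionary-based interface, and the plain per-layer averaging are assumptions for illustration; the paper's actual update rule may weight or correct contributions differently.

```python
import numpy as np

def layerwise_aggregate(global_layers, user_grads, lr=0.1):
    """Update each layer using only the users that reported a gradient for it.

    global_layers: list of weight arrays, one per layer.
    user_grads:    list of dicts {layer_index: gradient}; a straggler's dict
                   contains only the layers it finished backpropagating.
    """
    new_layers = []
    for idx, weights in enumerate(global_layers):
        contributions = [g[idx] for g in user_grads if idx in g]
        if contributions:
            # Plain average over the contributing users (assumed rule).
            weights = weights - lr * np.mean(contributions, axis=0)
        new_layers.append(weights)
    return new_layers
```

A layer that no user reached in a round is simply left unchanged in this sketch.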
Khunanon Thongkham, Anthony H. Gonzalez, Mark Brodwin, Ariane Trudeau, Ripon Saha, Peter Eisenhardt, S. A. Stanford, Emily Moravec et al.
The Massive and Distant Clusters of WISE Survey 2 (MaDCoWS2) is a new survey
designed as the successor of the original MaDCoWS survey. MaDCoWS2 improves
upon its predecessor by using deeper optical and infrared data and a more
powerful detection algorithm (PZWav). As input to the search, we use grz
photometry from DECaLS in combination with W1 and W2 photometry from the
CatWISE2020 catalog to derive the photometric redshifts with full redshift
probability distribution functions for WISE-selected galaxies. Cluster
candidates are then detected using the PZWav algorithm to find
three-dimensional galaxy overdensities from the sky positions and photometric
redshifts. This paper provides the first MaDCoWS2 data release, covering 1461
(1838 without masking) deg^2 centered on the Hyper Suprime-Cam Subaru Strategic
Program equatorial fields. Within this region, we derive a catalog of 22,970
galaxy cluster candidates detected at S/N>5. These clusters span the redshift
range 0.1<z<2, including 1312 candidates at z>1.5. We compare MaDCoWS2 to six
existing catalogs in the area. We rediscover 60%-92% of the clusters in these
surveys at S/N>5. The medians of the absolute redshift offset are <0.02
relative to these surveys, while the standard deviations are less than 0.06.
The median offsets between the detection position from MaDCoWS2 and other
surveys are less than 0.25 Mpc. We quantify the relation between S/N and gas
mass, total mass, luminosity, and richness from other surveys using a
redshift-dependent power law relation. We find that the S/N-richness relation
exhibits the lowest scatter.
Authors' comments: 27 pages, 7 figures. Typo corrected. Accepted for publication in ApJ
Yongqiang Wang, Haisheng Fu, Qi Cao, Shang Wang, Zhenjiao Chen, Feng Liang
Recently, deep learning technology has been successfully applied in the field of image compression, leading to superior rate-distortion performance. It is crucial to design an effective and efficient entropy model to estimate the probability distribution of the latent representation. However, the majority of entropy models primarily focus on one-dimensional correlation processing between channel and spatial information. In this paper, we propose an Adaptive Channel-wise and Global-inter attention Context (ACGC) entropy model, which can efficiently achieve dual feature aggregation in both inter-slice and intra-slice contexts. Specifically, we divide the latent representation into different slices and then apply the ACGC model in a parallel checkerboard context to achieve faster decoding speed and higher rate-distortion performance. In order to capture redundant global features across different slices, we utilize deformable attention in adaptive global-inter attention to dynamically refine the attention weights based on the actual spatial relationships and context. Furthermore, in the main transformation structure, we propose a high-performance S2LIC model. We introduce the residual SwinV2 Transformer model to capture global feature information and utilize a dense block network as the feature enhancement module to improve the nonlinear representation of the image within the transformation structure. Experimental results demonstrate that our method achieves faster encoding and decoding speeds and outperforms VTM-17.1 and some recent learned image compression methods in both PSNR and MS-SSIM metrics.
Nazmul Hasan, Apurba Kumar Saha, Andrew Wessman, Mohammed Shafae
Overheating anomaly detection is essential for the quality and reliability of
parts produced by laser powder bed fusion (LPBF) additive manufacturing (AM).
In this research, we focus on the detection of overheating anomalies using
photodiode sensor data. Photodiode sensors can collect high-frequency data from
the melt pool, reflecting the process dynamics and thermal history. Hence, the
proposed method offers a machine learning (ML) framework to utilize photodiode
sensor data for layer-wise detection of overheating anomalies. In doing so,
three sets of features are extracted from the raw photodiode data: MSMM (mean,
standard deviation, median, maximum), MSQ (mean, standard deviation,
quartiles), and MSD (mean, standard deviation, deciles). These three datasets
are used to train several ML classifiers. Cost-sensitive learning is used to
handle the class imbalance between the "anomalous" layers (affected by
overheating) and "nominal" layers in the benchmark dataset. To boost detection
accuracy, our proposed ML framework involves utilizing the majority voting
ensemble (MVE) approach. The proposed method is demonstrated using a case study
including an open benchmark dataset of photodiode measurements from an LPBF
specimen with deliberate overheating anomalies at some layers. The results from
the case study demonstrate that the MSD features yield the best performance for
all classifiers, and the MVE classifier (with a mean F1-score of 0.8654)
surpasses the individual ML classifiers. Moreover, our machine learning
methodology achieves superior results (9.66% improvement in mean F1-score) in
detecting layer-wise overheating anomalies, surpassing the existing methods in
the literature that use the same benchmark dataset.
Authors' comments: 12 pages (including references); 5 figures; 4 tables
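The MSD feature set and the majority voting ensemble described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the deciles are taken as the nine cut points at 10% to 90%, the standard deviation is the population form, and the function names are hypothetical.

```python
import numpy as np

def msd_features(layer_signal):
    """MSD features for one layer: mean, standard deviation, and the
    nine deciles (10th to 90th percentile) of the photodiode signal."""
    x = np.asarray(layer_signal, dtype=float)
    deciles = np.percentile(x, np.arange(10, 100, 10))
    return np.concatenate(([x.mean(), x.std()], deciles))

def majority_vote(predictions):
    """Combine binary labels (1 = anomalous, 0 = nominal) from several
    classifiers; `predictions` has shape (n_classifiers, n_layers).
    A layer is flagged only when a strict majority votes 1."""
    preds = np.asarray(predictions)
    return (preds.sum(axis=0) * 2 > preds.shape[0]).astype(int)
```

Each layer yields an 11-element feature vector here; the per-classifier predictions on those vectors are then combined by `majority_vote`.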
Peng Zhang, Ao Duan, Xianglu Zou, Yuhong Liu
Privacy-Preserving Neural Networks (PPNN) have been proposed to perform inference without breaching user privacy, and can serve as an essential tool for medical diagnosis to simultaneously achieve big data utility and privacy protection. As one of the key techniques to enable PPNN, Fully Homomorphic Encryption (FHE) faces the major challenge that homomorphic operations cannot be easily adapted for non-linear activation calculations. In this paper, batch-oriented element-wise data packing and approximate activation are proposed, which train linear low-degree polynomials to approximate the non-linear activation function ReLU. Compared with other approximate activation methods, the proposed fine-grained, trainable approximation scheme can effectively reduce the accuracy loss caused by approximation errors. Meanwhile, due to element-wise data packing, a large batch of images can be packed and inferred concurrently, leading to a much higher utility ratio of ciphertext slots. Therefore, although the total inference time increases sharply, the amortized time for each image actually decreases, especially when the batch size increases. Furthermore, knowledge distillation is adopted in the training process to further enhance the inference accuracy. Experimental results show that when ciphertext inference is performed on 4096 input images, compared with the current most efficient channel-wise method, the inference accuracy is improved by 1.65%, and the amortized inference time is reduced by 99.5%.
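To illustrate why a low-degree polynomial is a workable stand-in for ReLU under homomorphic encryption (a polynomial needs only additions and multiplications), the sketch below fits one by ordinary least squares. This is a simplification and not the trainable scheme of the paper, which learns the coefficients during network training; the function name, the degree, and the fitting interval `[-bound, bound]` are assumptions.

```python
import numpy as np

def fit_relu_poly(degree=2, bound=4.0, n=2001):
    """Least-squares polynomial approximation of ReLU on [-bound, bound]."""
    x = np.linspace(-bound, bound, n)
    coeffs = np.polyfit(x, np.maximum(x, 0.0), degree)
    return np.poly1d(coeffs)

# Example use: evaluate the approximation on a few activations.
relu_approx = fit_relu_poly()
values = relu_approx(np.array([-2.0, 0.0, 2.0]))
```

The approximation is only meaningful inside the fitting interval, which is why inputs to such activations are usually kept in a bounded range.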
Jiawei Li, Sitong Li, Shanshan Wang, Yicheng Zeng, Falong Tan, Chuanlong Xie
Deploying machine learning in open environments presents the challenge of encountering diverse test inputs that differ significantly from the training data. These out-of-distribution samples may exhibit shifts in local or global features compared to the training distribution. The machine learning (ML) community has responded with a number of methods aimed at distinguishing anomalous inputs from original training data. However, the majority of previous studies have primarily focused on the output layer or penultimate layer of pre-trained deep neural networks. In this paper, we propose a novel framework, Multitesting-based Layer-wise Out-of-Distribution (OOD) Detection (MLOD), to identify distributional shifts in test samples at different levels of features through a rigorous multiple testing procedure. Our approach distinguishes itself from existing methods as it does not require modifying the structure or fine-tuning of the pre-trained classifier. Through extensive experiments, we demonstrate that our proposed framework can seamlessly integrate with any existing distance-based inspection method while efficiently utilizing feature extractors of varying depths. Our scheme effectively enhances the performance of out-of-distribution detection when compared to baseline methods. In particular, MLOD-Fisher achieves superior performance in general. When trained using KNN on CIFAR10, MLOD-Fisher significantly lowers the false positive rate (FPR) from 24.09% to 7.47% on average compared to merely utilizing the features of the last layer.
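A rough sketch of how layer-wise scores might be combined through multiple testing is given below. The abstract names a Fisher variant but does not spell out the procedure, so the details here are assumptions: per-layer empirical p-values computed against in-distribution calibration scores, and Fisher's classical combination rule, which assumes independent tests (layer features are generally not independent, so the combined value is only indicative).

```python
import math
import numpy as np

def empirical_p_value(score, calibration_scores):
    """Fraction of in-distribution calibration scores at least as large as
    the test score (a larger score means a more anomalous sample)."""
    cal = np.asarray(calibration_scores)
    return (1 + np.sum(cal >= score)) / (1 + cal.size)

def fisher_combine(p_values):
    """Fisher's method: -2 * sum(log p) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # test statistic / 2
    # Closed-form chi-square survival function for even degrees of freedom.
    return math.exp(-half_stat) * sum(
        half_stat ** i / math.factorial(i) for i in range(k)
    )
```

A test sample would be flagged as out-of-distribution when the combined p-value falls below a chosen significance level.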
Nhan-Khanh Le, Erfaun Noorani, Sandra Hirche, John Baras
Real-world scenarios are characterized by timing uncertainties, e.g., delays, and disturbances. Algorithms with temporal robustness are crucial in guaranteeing the successful execution of tasks and missions in such scenarios. We study time-robust path planning for synthesizing robots' trajectories that adhere to spatial-temporal specifications expressed in Signal Temporal Logic (STL). In contrast to prior approaches that rely on discretized trajectories with fixed time steps, we leverage Piece-Wise Linear (PWL) signals for the synthesis. PWL signals represent a trajectory through a sequence of time-stamped waypoints. This allows us to encode the STL formula into a Mixed-Integer Linear Program (MILP) with fewer variables. This reduction is more pronounced for specifications with a long planning horizon. To that end, we define time-robustness for PWL signals. Subsequently, we propose quantitative semantics for PWL signals according to the recursive syntax of STL and prove their soundness. We then propose an encoding strategy to transform our semantics into a MILP. Our simulations showcase the soundness and the performance of our algorithm.
Haokun Lin, Haoli Bai, Zhili Liu, Lu Hou, Muyi Sun, Linqi Song, Ying Wei, Zhenan Sun
Vision-language pre-trained models have achieved impressive performance on
various downstream tasks. However, their large model sizes hinder their
utilization on platforms with limited computational resources. We find that
directly using smaller pre-trained models and applying magnitude-based pruning
on CLIP models leads to inflexibility and inferior performance. Recent efforts
for VLP compression either adopt uni-modal compression metrics resulting in
limited performance or involve costly mask-search processes with learnable
masks. In this paper, we first propose the Module-wise Pruning Error (MoPE)
metric, accurately assessing CLIP module importance by performance decline on
cross-modal tasks. Using the MoPE metric, we introduce a unified pruning
framework applicable to both pre-training and task-specific fine-tuning
compression stages. For pre-training, MoPE-CLIP effectively leverages knowledge
from the teacher model, significantly reducing pre-training costs while
maintaining strong zero-shot capabilities. For fine-tuning, consecutive pruning
from width to depth yields highly competitive task-specific models. Extensive
experiments in two stages demonstrate the effectiveness of the MoPE metric, and
MoPE-CLIP outperforms previous state-of-the-art VLP compression methods.
Authors' comments: 18 pages, 8 figures, Published in CVPR2024
Gabriel Toshio Hirokawa Higa, Rodrigo Stuqui Monzani, Jorge Fernando da Silva Cecatto, Maria Fernanda Balestieri Mariano de Souza, Vanessa Aparecida de Moraes Weber, Hemerson Pistori, Edson Takashi Matsubara
Smart indoor tourist attractions, such as smart museums and aquariums, usually require a significant investment in indoor localization devices. The use of smartphone Global Positioning System (GPS) receivers is unsuitable for scenarios where dense materials such as concrete and metal block or weaken GPS signals, which is the most common scenario in an indoor tourist attraction. Deep learning makes it possible to perform region-wise indoor localization using smartphone images. This approach does not require any investment in infrastructure, reducing the cost and time to turn museums and aquariums into smart museums or smart aquariums. This paper proposes using deep learning algorithms to classify locations using smartphone camera images for indoor tourism attractions. We evaluate our proposal in a real-world scenario in Brazil. We extensively collect images from ten different smartphones to classify biome-themed fish tanks inside the Pantanal Biopark, creating a new dataset of 3654 images. We tested seven state-of-the-art neural networks, three being transformer-based, achieving precision around 90% on average and recall and F-score around 89% on average. The results indicate good feasibility of the proposal in most indoor tourist attractions.
Vinay Chakravarthi Gogineni, Esmaeil S. Nadimi
Machine unlearning has garnered significant attention due to its ability to
selectively erase knowledge obtained from specific training data samples in an
already trained machine learning model. This capability enables data holders to
adhere strictly to data protection regulations. However, existing unlearning
techniques face practical constraints, often causing performance degradation,
demanding brief fine-tuning post unlearning, and requiring significant storage.
In response, this paper introduces a novel class of machine unlearning
algorithms. The first method is partial amnesiac unlearning, an integration of
layer-wise pruning with amnesiac unlearning. In this method, updates made to
the model during training are pruned and stored, then subsequently used to forget
specific data from the trained model. The second method assimilates layer-wise
partial-updates into label-flipping and optimization-based unlearning to
mitigate the adverse effects of data deletion on model efficacy. Through a
detailed experimental evaluation, we showcase the effectiveness of the proposed
unlearning methods. Experimental results highlight that the partial amnesiac
unlearning not only preserves model efficacy but also eliminates the necessity
for brief post fine-tuning, unlike conventional amnesiac unlearning. Moreover,
employing layer-wise partial updates in label-flipping and optimization-based
unlearning techniques demonstrates superiority in preserving model efficacy
compared to their naive counterparts.
Authors' comments: 16 pages, 4 figures
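The first method in the abstract above (pruning and storing training updates, then using them to forget) might look roughly like the sketch below. The magnitude-based pruning rule, the `keep_ratio` parameter, and both function names are assumptions made for illustration; the abstract does not specify how updates are pruned.

```python
import numpy as np

def prune_update(update, keep_ratio=0.5):
    """Keep only the largest-magnitude entries of one layer's update
    (assumed pruning rule), so less data has to be stored."""
    update = np.asarray(update, dtype=float)
    k = max(1, int(round(keep_ratio * update.size)))
    threshold = np.sort(np.abs(update), axis=None)[-k]
    return np.where(np.abs(update) >= threshold, update, 0.0)

def forget(weights, stored_updates):
    """Subtract the stored (pruned) updates that came from training
    batches containing the data to be forgotten."""
    weights = [np.array(w, dtype=float) for w in weights]
    for step in stored_updates:      # one dict {layer_index: pruned_update} per batch
        for idx, delta in step.items():
            weights[idx] -= delta
    return weights
```

During training, only updates from batches that contain potentially removable data would need to be pruned and stored in this scheme.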
José A. Vélez-Marulanda
Let $\mathbf{k}$ be a field and let $V: \mathscr{C} \to \mathbf{k}\textup{-Mod}$ be a point-wise finite dimensional persistence module, where $\mathscr{C}$ is a small category. Assume that for all local Artinian $\mathbf{k}$-algebras $R$ with residue field isomorphic to $\mathbf{k}$, there is a generalized persistence module $M: \mathscr{C} \to R\textup{-Mod}$, such that for all $x\in \mathrm{Ob}(\mathscr{C})$, $M(x)$ is free over $R$ with finite rank and $\mathbf{k}\otimes_R M(x)\cong V(x)$. If $V$ is a direct sum of indecomposable persistence modules $V_I: \mathscr{C}\to \mathbf{k}\textup{-Mod}$ with endomorphism ring isomorphic to $\mathbf{k}$, then $M$ is a direct sum of indecomposables $M_I:\mathscr{C}\to R\textup{-Mod}$ with endomorphism ring isomorphic to $R$.
Haochen Shi, Zhiyuan Sun, Xingdi Yuan, Marc-Alexandre Côté, Bang Liu
Embodied Instruction Following (EIF) is a crucial task in embodied learning, requiring agents to interact with their environment through egocentric observations to fulfill natural language instructions. Recent advancements have seen a surge in employing large language models (LLMs) within a framework-centric approach to enhance performance in embodied learning tasks, including EIF. Despite these efforts, there is no unified understanding of the impact of various components, ranging from visual perception to action execution, on task performance. To address this gap, we introduce OPEx, a comprehensive framework that delineates the core components essential for solving embodied learning tasks: Observer, Planner, and Executor. Through extensive evaluations, we provide a deep analysis of how each component influences EIF task performance. Furthermore, we innovate within this space by deploying a multi-agent dialogue strategy on a TextWorld counterpart, further enhancing task performance. Our findings reveal that LLM-centric design markedly improves EIF outcomes, identify visual perception and low-level action execution as critical bottlenecks, and demonstrate that augmenting LLMs with a multi-agent framework further elevates performance.