Ziyu Zhao, Tao Shen, Didi Zhu, Zexi Li, Jing Su, Xuwu Wang, Kun Kuang, Fei Wu
Low-Rank Adaptation (LoRA) has emerged as a popular technique for fine-tuning large language models (LLMs) to various domains due to its modular design and widespread availability on platforms like Hugging Face. This modularity has sparked interest in combining multiple LoRAs to enhance LLM capabilities. However, existing methods for LoRA composition primarily focus on task-specific adaptations that require additional training, and current model merging techniques often fail to fully leverage LoRA's modular nature, leading to parameter interference and performance degradation. In this paper, we investigate the feasibility of disassembling and reassembling multiple LoRAs at a finer granularity, analogous to assembling LEGO blocks. We introduce the concept of Minimal Semantic Units (MSUs), where the parameters corresponding to each rank in LoRA function as independent units. These MSUs exhibit permutation invariance and concatenation-summation equivalence, enabling flexible combinations to create new LoRAs. Building on these insights, we propose the LoRA-LEGO framework. This framework conducts rank-wise parameter clustering by grouping MSUs from different LoRAs into $k$ clusters. The centroid of each cluster serves as a representative MSU, enabling the assembly of a merged LoRA with an adjusted rank of $k$. Additionally, we apply a dual reweighting strategy to optimize the scale of the merged LoRA. Experiments across various benchmarks demonstrate that our method outperforms existing approaches in LoRA merging.
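The two MSU properties can be checked numerically. The NumPy sketch below assumes the usual convention ΔW = B A (B is d x r, A is r x n) and treats the pair (B[:, i], A[i, :]) as the i-th MSU. The merge function is a hypothetical stand-in for rank-wise clustering: it runs plain k-means on concatenated MSU vectors and keeps the k centroids as a rank-k LoRA, without the dual reweighting step.

```python
import numpy as np

def lora_delta(B, A):
    """Weight update of one LoRA module: Delta W = B @ A."""
    return B @ A

def concat_loras(loras):
    """Stack the MSUs (rank-wise column/row pairs) of several (B, A) LoRAs."""
    B = np.concatenate([b for b, _ in loras], axis=1)  # d x (r1 + r2 + ...)
    A = np.concatenate([a for _, a in loras], axis=0)  # (r1 + r2 + ...) x n
    return B, A

def kmeans_merge(loras, k, iters=50, seed=0):
    """Hypothetical rank-wise merge: k-means over MSUs, centroids form a rank-k LoRA."""
    B, A = concat_loras(loras)
    d = B.shape[0]
    msus = np.concatenate([B.T, A], axis=1)  # row i = [B[:, i], A[i, :]]
    rng = np.random.default_rng(seed)
    centroids = msus[rng.choice(len(msus), size=k, replace=False)]
    for _ in range(iters):
        dists = ((msus[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centroid if a cluster empties
                centroids[j] = msus[labels == j].mean(axis=0)
    return centroids[:, :d].T, centroids[:, d:]  # merged B (d x k), merged A (k x n)

rng = np.random.default_rng(0)
lora1 = (rng.normal(size=(8, 2)), rng.normal(size=(2, 6)))
lora2 = (rng.normal(size=(8, 3)), rng.normal(size=(3, 6)))
B, A = concat_loras([lora1, lora2])
# Concatenation-summation equivalence: the stacked LoRA reproduces the summed update.
assert np.allclose(lora_delta(B, A), lora_delta(*lora1) + lora_delta(*lora2))
# Permutation invariance: reordering MSUs consistently leaves Delta W unchanged.
perm = rng.permutation(B.shape[1])
assert np.allclose(lora_delta(B[:, perm], A[perm]), lora_delta(B, A))
B_merged, A_merged = kmeans_merge([lora1, lora2], k=3)
```

Because averaging MSUs rescales ΔW, a LoRA merged this way would still need a scale correction, which is the role the abstract assigns to dual reweighting.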
Xian Zhong, Shengwang Hu, Wenxuan Liu, Wenxin Huang, Jianhao Ding, Zhaofei Yu, Tiejun Huang
Spiking neural networks (SNNs) have garnered significant attention for their low power consumption and high biological interpretability. Their rich spatio-temporal information processing capability and event-driven nature make them well suited to neuromorphic datasets. However, current SNNs struggle to balance accuracy and latency when classifying these datasets. In this paper, we propose a Hybrid Step-wise Distillation (HSD) method, tailored for neuromorphic datasets, to mitigate the notable decline in performance at lower time steps. Our work decouples the number of event frames from the number of SNN time steps, using more event frames during training to improve performance and fewer event frames during inference to reduce latency. Nevertheless, the average output of an SNN across all time steps is susceptible to individual time steps with abnormal outputs, particularly at extremely low time steps. To tackle this issue, we implement a Step-wise Knowledge Distillation (SKD) module that considers variations in the output distribution of the SNN at each time step. Empirical evidence demonstrates that our method yields competitive performance in classification tasks on neuromorphic datasets, especially at lower time steps. Our code will be available at https://github.com/hsw0929/HSD
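One way to read the step-wise distillation idea is to apply the distillation loss to the SNN's output at every time step, rather than only to its time-averaged output, so that an abnormal step is penalized even when the average looks fine. The NumPy sketch below assumes a temperature-scaled KL divergence and uniform averaging over steps; the exact SKD weighting is not given in the abstract, and the function names are hypothetical.

```python
import numpy as np

def softmax(z, tau=1.0):
    z = z / tau
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    """KL(p || q) for each sample, summed over classes."""
    return (p * (np.log(p + eps) - np.log(q + eps))).sum(axis=-1)

def averaged_kd(teacher_logits, student_logits_t, tau=4.0):
    """Conventional KD on the time-averaged SNN output; student_logits_t has shape (T, N, C)."""
    p = softmax(teacher_logits, tau)
    q = softmax(student_logits_t.mean(axis=0), tau)
    return float(tau ** 2 * kl(p, q).mean())

def stepwise_kd(teacher_logits, student_logits_t, tau=4.0):
    """Hypothetical step-wise KD: distil into the output of every time step separately."""
    p = softmax(teacher_logits, tau)
    per_step = [kl(p, softmax(s, tau)).mean() for s in student_logits_t]
    return float(tau ** 2 * np.mean(per_step))

teacher = np.array([[2.0, 0.5, -1.0]])  # (N=1, C=3)
shift = np.array([[3.0, -3.0, 0.0]])
student = np.stack([teacher + shift, teacher - shift])  # T=2 abnormal steps that cancel on average
print(averaged_kd(teacher, student))  # 0.0: the average hides both deviations
print(stepwise_kd(teacher, student))  # positive: each abnormal step is penalized
```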
Junlin Lv, Yuan Feng, Xike Xie, Xin Jia, Qirong Peng, Guiming Xie
Large language models have achieved notable success across various domains, yet efficient inference is still limited by the quadratic computational complexity of the attention mechanism. Inference consists of prefilling and decoding phases. Although several attempts have been made to accelerate decoding, the inefficiency of the prefilling phase, especially for long-context tasks, remains a challenge. In this paper, we observe a locality in query criticality during the prefilling phase of long-context processing: adjacent query tokens tend to focus on similar subsets of the past Key-Value (KV) cache. Based on this observation, we propose CritiPrefill, a criticality-based segment-wise prefilling method. This method partitions the input sequence's queries into segments and its KV cache into blocks, using a segment-wise algorithm to estimate query criticality. By pruning non-critical computations between query segments and cache blocks in the self-attention mechanism, the prefilling process can be significantly accelerated. Extensive evaluations on multiple long-context datasets show up to 2.7x speedup on Llama3-8B and 3.0x speedup on Yi-9B for 128K context length on a single A100 GPU, with minimal quality degradation.
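The segment-wise pruning idea can be illustrated with a single-head NumPy sketch. The abstract does not specify how segments and blocks are scored, so the sketch assumes that a segment is summarized by its mean query and a block by its mean key, keeps the `keep` highest-scoring earlier blocks per segment plus the blocks overlapping the segment itself, and computes causal attention only over those blocks. Function names and parameters are illustrative.

```python
import numpy as np

def critical_blocks(Q, K, seg, blk, keep):
    """For each query segment, pick the KV blocks estimated to be most critical."""
    n = Q.shape[0]
    n_seg, n_blk = -(-n // seg), -(-K.shape[0] // blk)  # ceiling division
    seg_q = np.stack([Q[i * seg:(i + 1) * seg].mean(axis=0) for i in range(n_seg)])
    blk_k = np.stack([K[j * blk:(j + 1) * blk].mean(axis=0) for j in range(n_blk)])
    scores = seg_q @ blk_k.T  # coarse (n_seg, n_blk) criticality estimate
    selected = []
    for i in range(n_seg):
        first = (i * seg) // blk  # first block overlapping this segment
        last = min((min((i + 1) * seg, n) - 1) // blk, n_blk - 1)  # causal limit
        past = np.arange(first)  # strictly earlier blocks compete for `keep` slots
        top = past[np.argsort(scores[i, past])[::-1][:keep]]
        selected.append(np.union1d(top, np.arange(first, last + 1)))  # local blocks always kept
    return selected

def pruned_prefill_attention(Q, K, V, seg, blk, keep):
    """Causal attention in which each query segment only visits its selected KV blocks."""
    n, d = Q.shape
    out = np.zeros_like(V)
    for i, blocks in enumerate(critical_blocks(Q, K, seg, blk, keep)):
        rows = np.arange(i * seg, min((i + 1) * seg, n))
        cols = np.concatenate([np.arange(j * blk, min((j + 1) * blk, n)) for j in blocks])
        logits = Q[rows] @ K[cols].T / np.sqrt(d)
        logits[cols[None, :] > rows[:, None]] = -np.inf  # causal mask
        w = np.exp(logits - logits.max(axis=1, keepdims=True))
        out[rows] = (w / w.sum(axis=1, keepdims=True)) @ V[cols]
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(32, 8)) for _ in range(3))
out = pruned_prefill_attention(Q, K, V, seg=8, blk=4, keep=2)
```

With `keep` large enough to retain every block, the result equals dense causal attention; smaller values trade accuracy for fewer query-key products.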
Zhixing Hou, Maoxu Gao, Hang Yu, Mengyu Yang, Chio-In Ieong
This paper introduces a Spiking Diffusion Policy (SDP) learning method for robotic manipulation that integrates spiking neurons and Learnable Channel-wise Membrane Thresholds (LCMT) into the diffusion policy model, thereby enhancing computational efficiency while achieving high performance on the evaluated tasks. Specifically, the proposed SDP model employs the U-Net architecture as the backbone for diffusion learning within the Spiking Neural Network (SNN). It strategically places residual connections between the spike convolution operations and the Leaky Integrate-and-Fire (LIF) nodes, thereby preventing disruptions to the spiking states. Additionally, we introduce a temporal encoding block and a temporal decoding block that convert between static data and dynamic data with timestep $T_S$, enabling data to be transmitted within the SNN in spike format. Furthermore, we propose LCMT to enable the adaptive acquisition of membrane potential thresholds, thereby matching the varying membrane potentials and firing rates across channels and avoiding the cumbersome process of manually setting and tuning hyperparameters. Evaluating the SDP model on seven distinct tasks with SNN timestep $T_S=4$, we achieve results comparable to those of the ANN counterparts, along with faster convergence than the baseline SNN method. This improvement is accompanied by a 94.3\% reduction in dynamic energy consumption, as estimated on 45nm hardware.
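A Leaky Integrate-and-Fire neuron with one threshold per channel can be written in a few lines. This NumPy sketch is forward-only: in SDP the thresholds are learnable, which would require a surrogate gradient (assumed, not shown). The decay constant `tau`, the hard reset and the (T, C, ...) layout are illustrative choices.

```python
import numpy as np

def lif_forward(x_seq, thresholds, tau=2.0, v_reset=0.0):
    """Run LIF neurons over T steps.

    x_seq: input currents of shape (T, C, ...); thresholds: shape (C,), one per channel.
    Returns binary spikes with the same shape as x_seq.
    """
    T, C = x_seq.shape[:2]
    th = thresholds.reshape((C,) + (1,) * (x_seq.ndim - 2))  # broadcast over spatial dims
    v = np.full(x_seq.shape[1:], v_reset, dtype=float)
    spikes = np.zeros_like(x_seq, dtype=float)
    for t in range(T):
        v = v + (x_seq[t] - (v - v_reset)) / tau  # leaky integration
        fired = v >= th
        spikes[t] = fired
        v = np.where(fired, v_reset, v)  # hard reset after a spike
    return spikes

x = np.ones((4, 2, 3))  # T_S = 4 steps, 2 channels, 3 positions, constant input
spikes = lif_forward(x, thresholds=np.array([0.4, 0.9]))
print(spikes.mean(axis=(0, 2)))  # per-channel firing rates 1.0 and 0.25
```

The same input yields very different firing rates under different thresholds, which is why a per-channel threshold can adapt to channels with different membrane-potential statistics.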
Andrew W. Blain
Soon after the release of the WISE all-sky catalogue of 500 million
mid-infrared (IR) objects, suggestions were made that it could be used to
search for extrasolar devices constructed by an advanced civilization to
convert a significant fraction of their host star's luminosity into useful
work: "technostructures", "megastructures" or "Dyson spheres/structures",
hereafter DSMs, whose inevitable waste heat would be seen by WISE at mid-IR
wavelengths. However, a trawl of several million potentially habitable
Gaia-detected stars for mid-IR-excess signatures is fraught with danger, due to
both noise from such a large sample and, more importantly, confusion with the
emission from dusty background galaxies. In light of a recent claim of seven
potential DSMs in MNRAS, a brief rebuttal appeared on arXiv. Further to this
response, the relevance of WISE-detected galaxies is discussed in more detail,
leading to a seemingly tight limit on the number and lifetime of DSMs, and
indeed intelligent worlds, in the ~600-pc-radius region patrolled by Gaia.
However, the detectability of DSMs is questioned: a DSM might extinguish its
star at optical/near-IR wavelengths, and thus either not appear or appear
anomalously faint in a stellar catalogue. Moreover, a civilization advanced
enough to construct a DSM is likely to be advanced enough to use
countermeasures to mask its presence from us.
Authors' comments: 6 pages. No figures. Submitted to MNRAS, possibly as a Letter
Lee R. Martin, Andrew W. Blain, Tanio Díaz-Santos, Roberto J. Assef, Chao-Wei Tsai, Hyunsung D. Jun, Peter R. M. Eisenhardt, Jingwen Wu et al.
We present observations of mid-J (J=4-3 or J=5-4) carbon monoxide (CO) emission
lines and continuum emission from a sample of ten of the most luminous
(log(L/L_solar) ~ 14) Hot Dust-Obscured Galaxies (Hot DOGs) discovered by the
Wide-field Infrared Survey Explorer (WISE) with redshifts up to 4.6. We uncover
broad spectral lines (FWHM~400 km/s) in these objects, suggesting a turbulent
molecular interstellar medium (ISM) may be ubiquitous in Hot DOGs. A halo of
molecular gas, extending out to a radius of 5 kpc, is observed in W2305-0039,
likely supplied by 940 km/s molecular outflows. W0831+0140 is plausibly the
host of a merger between at least two galaxies, consistent with observations
made using ionized gas. These CO(4-3) observations contrast with previous
CO(1-0) studies of the same sources: the CO(4-3) to CO(1-0) luminosity ratios
exceed 300 in each source, suggesting that the lowest excited states of CO are
underluminous. These findings show that the molecular gas in Hot DOGs is
consistently turbulent, plausibly a consequence of AGN feedback, triggered by
galactic mergers.
Authors' comments: 19 pages (16 main text & 3 in Appendix), 9 figures, plus 3 in
Appendix. MNRAS in press
Tianyuan Zhang, Lu Wang, Jiaqi Kang, Xinwei Zhang, Siyuan Liang, Yuwei Chen, Aishan Liu, Xianglong Liu
Recent advances in deep learning have markedly improved autonomous driving
(AD) models, particularly end-to-end systems that integrate perception,
prediction, and planning stages, achieving state-of-the-art performance.
However, these models remain vulnerable to adversarial attacks, where
human-imperceptible perturbations can disrupt decision-making processes. While
adversarial training is an effective method for enhancing model robustness
against such attacks, no prior studies have focused on its application to
end-to-end AD models. In this paper, we take the first step toward adversarial
training for end-to-end AD models and present a novel Module-wise Adaptive
Adversarial Training (MA2T) method. Extending conventional adversarial training
to this context is highly non-trivial, however, as different stages within the
model have distinct objectives and are strongly interconnected. To address
these challenges, MA2T first introduces Module-wise Noise Injection, which
injects noise before the input of each module so that the model is trained
under the guidance of the overall objective rather than each independent module
loss. Additionally, we introduce Dynamic Weight Accumulation Adaptation, which
incorporates accumulated weight changes to adaptively learn and adjust the loss
weight of each module based on its contribution (accumulated reduction rate),
yielding better-balanced and more robust training. To demonstrate the efficacy of
our defense, we conduct extensive experiments on the widely-used nuScenes
dataset across several end-to-end AD models under both white-box and black-box
attacks, where our method outperforms other baselines by large margins
(+5-10%). Moreover, we validate the robustness of our defense through
closed-loop evaluation in the CARLA simulation environment, showing improved
resilience even against natural corruption.
Authors' comments: 14 pages
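The abstract does not give the exact update rule for Dynamic Weight Accumulation Adaptation. As an illustration only, the sketch below assumes that each module's relative loss reduction is accumulated across updates and that modules whose losses have fallen least receive larger weights through a temperature-scaled softmax, similar in spirit to dynamic weight averaging in multi-task learning. The class, module names and temperature are hypothetical.

```python
import numpy as np

class AccumulatedLossWeighter:
    """Hypothetical module loss weighting driven by accumulated loss reduction rates."""

    def __init__(self, modules, temperature=1.0):
        self.modules = list(modules)
        self.temperature = temperature
        self.prev = None  # module losses at the previous update
        self.acc = np.zeros(len(self.modules))  # accumulated relative reductions

    def update(self, losses):
        """losses: dict module -> current loss. Returns dict module -> loss weight."""
        cur = np.array([losses[m] for m in self.modules], dtype=float)
        if self.prev is not None:
            self.acc += (self.prev - cur) / self.prev  # relative reduction since last update
        self.prev = cur
        z = -self.acc / self.temperature  # modules that improved least get larger weights
        w = np.exp(z - z.max())
        w = len(self.modules) * w / w.sum()  # weights sum to the number of modules
        return dict(zip(self.modules, w.tolist()))

weighter = AccumulatedLossWeighter(["perception", "prediction", "planning"])
w0 = weighter.update({"perception": 1.0, "prediction": 1.0, "planning": 1.0})  # all 1.0
w1 = weighter.update({"perception": 0.5, "prediction": 0.8, "planning": 0.95})
print(w1)  # planning, whose loss fell least, receives the largest weight
```

The total training loss would then be the weighted sum of the module losses; normalizing the weights to sum to the number of modules keeps the overall loss scale comparable to an unweighted sum.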
Junhui He, Shangyu Wu, Weidong Wen, Chun Jason Xue, Qingan Li
Deploying large language models (LLMs) on edge devices presents significant challenges due to the substantial computational overhead and memory requirements. Activation sparsification can mitigate these resource challenges by reducing the number of activated neurons during inference. Existing methods typically employ thresholding-based sparsification based on the statistics of activation tensors. However, they do not model the impact of activation sparsification on performance, resulting in larger performance degradation than necessary. To address these limitations, this paper reformulates the activation sparsification problem to explicitly capture the relationship between activation sparsity and model performance. Then, this paper proposes CHESS, a general activation sparsification approach via CHannel-wise thrEsholding and Selective Sparsification. First, channel-wise thresholding assigns a unique threshold to each activation channel in the feed-forward network (FFN) layers. Then, selective sparsification applies thresholding-based activation sparsification to specific layers within the attention modules. Finally, we detail the implementation of sparse kernels to accelerate LLM inference. Experimental results demonstrate that the proposed CHESS achieves lower performance degradation across eight downstream tasks while activating fewer parameters than existing methods, thus speeding up LLM inference by up to 1.27x.
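Channel-wise thresholding can be sketched as calibrating one threshold per FFN activation channel and zeroing activations whose magnitude falls below it. This NumPy sketch assumes each threshold is a per-channel quantile of |activation| on calibration data, chosen to hit a target sparsity; CHESS derives its thresholds from its reformulated objective, which is not reproduced here, and the function names are hypothetical.

```python
import numpy as np

def calibrate_channel_thresholds(calib_acts, sparsity):
    """One threshold per channel: the `sparsity` quantile of |activation| on calibration data.

    calib_acts has shape (num_tokens, num_channels).
    """
    return np.quantile(np.abs(calib_acts), sparsity, axis=0)

def sparsify(acts, thresholds):
    """Zero activations whose magnitude falls below their channel's threshold."""
    return np.where(np.abs(acts) >= thresholds, acts, 0.0)

rng = np.random.default_rng(0)
scales = np.array([0.1, 1.0, 10.0])  # synthetic channels with very different magnitudes
calib = rng.normal(size=(10_000, 3)) * scales
test = rng.normal(size=(10_000, 3)) * scales
per_channel = (sparsify(test, calibrate_channel_thresholds(calib, 0.5)) == 0).mean(axis=0)
global_th = np.quantile(np.abs(calib), 0.5)  # a single tensor-wide threshold for comparison
global_only = (sparsify(test, global_th) == 0).mean(axis=0)
print(per_channel)  # about 0.5 in every channel
print(global_only)  # near 1.0 for the small channel, around 0.05 for the large one
```

The single threshold sparsifies the small-scale channel almost completely and the large-scale channel hardly at all; a per-channel threshold avoids this imbalance.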
Xi Xie, Yuebo Luo, Hongwu Peng, Caiwen Ding
Top-k algorithms are essential in various applications, from high-performance computing and information retrieval to big data and neural network model training. This paper introduces RTop-K, a highly efficient parallel row-wise top-k selection algorithm designed for GPUs. RTop-K employs a binary-search-based approach to optimize resource allocation and provides a scalable solution that significantly accelerates top-k operations. We perform a theoretical analysis of the effects of early stopping in our algorithm, demonstrating that it maintains the accuracy of neural network models while enhancing performance. Comprehensive tests show that our GPU implementation of RTop-K outperforms other row-wise top-k GPU implementations, with minimal impact on testing accuracy when early stopping is applied. Notably, RTop-K achieves speedups ranging from 4.245$\times$ to 9.506$\times$ with early stopping, and of 3.936$\times$ without early stopping, compared to state-of-the-art implementations. The proposed methods offer significant improvements in the training and inference of Graph Neural Networks (GNNs), addressing critical challenges in latency and throughput on GPU platforms.
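The core of a binary-search-based row-wise top-k is to search, per row, for a threshold value instead of sorting. The NumPy sketch below is a CPU illustration of that idea, not the authors' GPU kernel: it bisects between each row's minimum and maximum, stops after `max_iters` iterations (a simple form of early stopping), and returns every element at or above the current lower bound, so an early-stopped row may keep slightly more than k elements.

```python
import numpy as np

def rowwise_topk_mask(x, k, max_iters=32):
    """Boolean mask of the k largest entries per row, found by bisecting on a threshold.

    Invariants: at least k entries per row are >= lo, fewer than k are >= hi.
    Stopping after `max_iters` may leave a row with more than k survivors.
    """
    lo = x.min(axis=1)
    hi = np.nextafter(x.max(axis=1), np.inf)  # strictly above the row maximum
    for _ in range(max_iters):
        mid = (lo + hi) / 2
        count = (x >= mid[:, None]).sum(axis=1)
        enough = count >= k
        lo = np.where(enough, mid, lo)
        hi = np.where(enough, hi, mid)
        if np.all(count == k):  # every row has exactly k survivors: stop early
            break
    return x >= lo[:, None]

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 64))
exact = rowwise_topk_mask(x, k=8)
rough = rowwise_topk_mask(x, k=8, max_iters=3)  # aggressive early stopping
print(exact.sum(axis=1), rough.sum(axis=1))
```

Each iteration is a data-parallel count per row, which maps naturally onto GPU threads; the number of iterations, not the row length, controls the cost of the search.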
Haofeng Liu, Emad Alsusa, Arafat Al-Dweik
This paper investigates the bit error rate (BER) and outage probability performance of integrated sensing and communication (ISaC) in uplink non-orthogonal multiple access (NOMA) based Internet of Things (IoT) systems. Specifically, we consider an ISaC system where the radar signal is designed to be orthogonal to the communication signal over two symbol periods, so that its interference on the communication signal is completely eliminated when the data are detected in pairs of consecutive symbols. This is akin to multi-symbol-rate NOMA systems, except that, because the radar bears no data, its waveform can be manipulated to be orthogonal to the transmitted communication signal. To eliminate potential decision ambiguity during the pair-wise data detection, a constant phase offset between adjacent communication symbols is applied at the transmitter. The performance of such a system is analyzed by deriving analytical expressions for the exact BER of zero-forcing (ZF) based receivers. In addition, closed-form expressions for the BER upper bound and the outage probability of both the ZF and the joint maximum likelihood (JML) receivers are presented. The results show that the derived expressions match the simulation results perfectly. The obtained expressions provide insight into the performance of this novel ISaC system, including the impact of various parameters, and show how the ZF receiver provides a useful trade-off between performance and complexity relative to the JML receiver.
Zizheng Huang, Haoxing Chen, Jiaqi Li, Jun Lan, Huijia Zhu, Weiqiang Wang, Limin Wang
Recent Vision Mamba models not only have much lower complexity for processing higher-resolution images and longer videos but also achieve performance competitive with Vision Transformers (ViTs). However, they suffer from overfitting and have therefore only been presented up to base size (about 80M parameters). It is still unclear how vanilla Vision Mamba (Vim) can be efficiently scaled up to larger sizes, which is essential for further exploitation. In this paper, we propose a stochastic layer-wise shuffle regularization, which successfully scales non-hierarchical Vision Mamba to a large size (about 300M parameters) in a supervised setting. Specifically, our base and large-scale ShuffleMamba models outperform supervised ViTs of similar size by 0.8\% and 1.0\% classification accuracy on ImageNet1k, respectively, without auxiliary data. When evaluated on the ADE20K semantic segmentation and COCO detection tasks, our ShuffleMamba models also show significant improvements. Without bells and whistles, the stochastic layer-wise shuffle has the following highlights: (1) \textit{Plug and play:} it does not change the model architecture and is omitted at inference. (2) \textit{Simple but effective:} it mitigates overfitting in Vim training and only introduces random token permutation operations. (3) \textit{Intuitive:} token sequences in deeper layers are more likely to be shuffled, as they are expected to be more semantic and less sensitive to patch positions. Code and models will be available at https://github.com/huangzizheng01/ShuffleMamba.
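The stochastic layer-wise shuffle can be sketched as a wrapper around each sequence layer: during training, the input tokens are randomly permuted with a probability that grows with depth, the layer is applied, and the original order is restored. The sketch assumes a linear schedule p_l = p_max * l / (L - 1) and a generic `layer` callable standing in for a Vim block (both hypothetical); at inference no shuffling is applied.

```python
import numpy as np

def shuffle_prob(layer_idx, num_layers, p_max=0.5):
    """Assumed linear schedule: deeper layers are shuffled more often."""
    return p_max * layer_idx / max(num_layers - 1, 1)

def shuffled_layer_forward(layer, tokens, layer_idx, num_layers, rng,
                           training=True, p_max=0.5):
    """Apply `layer` to tokens of shape (seq_len, dim), possibly in a random token order."""
    if not training or rng.random() >= shuffle_prob(layer_idx, num_layers, p_max):
        return layer(tokens)
    perm = rng.permutation(tokens.shape[0])
    inverse = np.argsort(perm)
    return layer(tokens[perm])[inverse]  # restore the original token order

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 4))
scan = lambda t: np.cumsum(t, axis=0)  # toy order-dependent stand-in for a Mamba block
y_train = shuffled_layer_forward(scan, x, layer_idx=11, num_layers=12, rng=rng)
y_eval = shuffled_layer_forward(scan, x, layer_idx=11, num_layers=12, rng=rng, training=False)
```

Restoring the order after the layer keeps the wrapper transparent to the rest of the network, which is what lets the regularizer be added or removed without architectural changes.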
Sergey Karpov, Oleg Malkov, Alexandra Avdeeva
Sixty years after the prediction of brown dwarfs, the search for these objects
continues, particularly in the vicinity of the Sun. Objects near the Sun are
characterized by large proper motions, so they appear as fast-moving objects.
While the Gaia DR3 catalogue is a comprehensive source of proper motions, it
lacks the depth needed for discovering fainter objects. Modern multi-epoch
surveys, with their greater depth, offer a new opportunity for a systematic
search for ultra-cool dwarfs. The study aims to systematically search for high
proper motion objects using the newly released catalogue of epochal WISE data
in order to identify new brown dwarf candidates in the solar neighborhood,
estimate their spectral types, distances and spatial velocities. We used the
recently released unTimely catalogue of epochal detections in unWISE coadds to
search for objects with high proper motions using a simple motion-detection
algorithm. This method was used to identify objects with proper motions
exceeding approximately 0.6 arcseconds per year. The identified objects were
then cross-referenced with data from other large-scale sky surveys to further
analyze their characteristics. The search yielded 3245 moving objects with
significant proper motions, 32 of which had not been previously published.
Among these, at least 15 were identified as reliable new brown dwarf
candidates, with estimated distances closer than 50 parsecs and spectral types
later than T0.
Authors' comments: Table 1 is available online at https://zenodo.org/records/13362690.
Submitted to A&A
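Two basic quantities in this search can be illustrated with a short sketch: a proper motion from a straight-line fit to multi-epoch positions, and the tangential velocity from proper motion and distance, v_tan = 4.74 mu d (km/s, with mu in arcsec/yr and d in pc). The least-squares fit is an assumed stand-in for the paper's motion-detection algorithm, the positions are synthetic, and 0.6 arcsec/yr is the approximate limit quoted above.

```python
import numpy as np

KM_S_PER_AU_PER_YR = 4.74047  # 1 au/yr in km/s

def proper_motion(epochs_yr, ra_offsets_arcsec, dec_offsets_arcsec):
    """Least-squares linear fit of position vs. time in each coordinate.

    Offsets are tangent-plane offsets in arcsec (RA offsets already multiplied by cos(dec)).
    Returns the total proper motion in arcsec/yr.
    """
    mu_ra = np.polyfit(epochs_yr, ra_offsets_arcsec, 1)[0]
    mu_dec = np.polyfit(epochs_yr, dec_offsets_arcsec, 1)[0]
    return float(np.hypot(mu_ra, mu_dec))

def tangential_velocity(mu_arcsec_yr, distance_pc):
    """v_tan = 4.74 * mu * d, in km/s."""
    return KM_S_PER_AU_PER_YR * mu_arcsec_yr * distance_pc

rng = np.random.default_rng(1)
t = np.arange(0.0, 10.5, 0.5)  # synthetic epochs, years since the first one
ra = 0.6 * t + rng.normal(scale=0.05, size=t.size)  # true mu_ra* = 0.6 arcsec/yr
dec = 0.8 * t + rng.normal(scale=0.05, size=t.size)  # true mu_dec = 0.8 arcsec/yr
mu = proper_motion(t, ra, dec)  # close to 1.0 arcsec/yr
is_candidate = mu > 0.6  # approximate proper-motion cut quoted in the abstract
print(mu, is_candidate, tangential_velocity(mu, 20.0))  # v_tan near 95 km/s at 20 pc
```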
Langrui Zhou, Guang Li
Current mainstream multi-modal medical image-to-image translation methods
face a dilemma. Supervised methods with outstanding performance rely on
pixel-wise aligned training data to constrain the model optimization. However,
obtaining pixel-wise aligned multi-modal medical image datasets is challenging.
Unsupervised methods can be trained without paired data, but their reliability
cannot be guaranteed. At present, there is no ideal multi-modal medical
image-to-image translation method that can generate reliable translation
results without the need for pixel-wise aligned data. This work aims to develop
a novel medical image-to-image translation model that is independent of
pixel-wise aligned data (MITIA), enabling reliable multi-modal medical
image-to-image translation under the condition of misaligned training data. The
proposed MITIA model utilizes a prior extraction network composed of a
multi-modal medical image registration module and a multi-modal misalignment
error detection module to extract as much pixel-level prior information as
possible from training data with misalignment errors. The extracted prior
information is then used to construct a regularization term to constrain the
optimization of the unsupervised cycle-consistent GAN model, restricting its
solution space and thereby improving the performance and reliability of the
generator. We trained the MITIA model using six datasets containing different
misalignment errors and two well-aligned datasets. Subsequently, we compared
the proposed method with six other state-of-the-art image-to-image translation
methods. The results of both quantitative analysis and qualitative visual
inspection indicate that MITIA achieves superior performance compared to the
competing state-of-the-art methods, both on misaligned data and aligned data.
Authors' comments: This paper has been accepted as a research article by Medical Physics
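How extracted priors might constrain a cycle-consistent GAN can be sketched as an extra loss term. This NumPy sketch assumes the prior consists of a registered target image and a binary reliability mask from the misalignment error detection module (1 where the registration is trusted), and adds a masked L1 penalty to the usual adversarial and cycle-consistency terms. The weights `lambda_cyc` and `lambda_prior` and the function names are illustrative, not taken from the paper.

```python
import numpy as np

def prior_regularization(fake_target, registered_target, reliable_mask):
    """Masked L1 between the generator output and the registered target,
    counted only where the misalignment detector trusts the registration."""
    diff = np.abs(fake_target - registered_target) * reliable_mask
    return diff.sum() / max(reliable_mask.sum(), 1.0)

def generator_objective(adv_loss, cycle_loss, fake_target, registered_target,
                        reliable_mask, lambda_cyc=10.0, lambda_prior=5.0):
    """Unsupervised CycleGAN terms plus the prior-based regularizer."""
    return (adv_loss + lambda_cyc * cycle_loss
            + lambda_prior * prior_regularization(fake_target, registered_target, reliable_mask))

fake = np.zeros((4, 4))
registered = np.zeros((4, 4))
registered[0, 0] = 10.0  # one badly misaligned pixel in the registered target
mask = np.ones((4, 4))
mask[0, 0] = 0.0  # flagged by the error detector, so it is excluded
print(prior_regularization(fake, registered, mask))  # 0.0: the flagged pixel adds no penalty
```

Masking out flagged pixels is what lets misaligned data still contribute pixel-level supervision where it is reliable.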
Zhikai Li, Xuewen Liu, Dongrong Joe Fu, Jianquan Li, Qingyi Gu, Kurt Keutzer, Zhen Dong
The rapid advancement of visual generative models necessitates efficient and
reliable evaluation methods. Arena platforms, which gather user votes on model
comparisons, can rank models by human preference. However, traditional Arena
methods, while established, require an excessive number of comparisons for
ranking to converge and are vulnerable to preference noise in voting,
suggesting the need for better approaches tailored to contemporary evaluation
challenges. In this paper, we introduce K-Sort Arena, an efficient and reliable
platform based on a key insight: images and videos possess higher perceptual
intuitiveness than texts, enabling rapid evaluation of multiple samples
simultaneously. Consequently, K-Sort Arena employs K-wise comparisons, allowing
K models to engage in free-for-all competitions, which yield much richer
information than pairwise comparisons. To enhance the robustness of the system,
we leverage probabilistic modeling and Bayesian updating techniques. We propose
an exploration-exploitation-based matchmaking strategy to facilitate more
informative comparisons. In our experiments, K-Sort Arena exhibits 16.3x faster
convergence compared to the widely used Elo algorithm. To further validate its
superiority and obtain a comprehensive leaderboard, we collect human feedback
via crowdsourced evaluations of numerous cutting-edge text-to-image and
text-to-video models. Thanks to its high efficiency, K-Sort Arena can
continuously incorporate emerging models and update the leaderboard with
minimal votes. Our project has undergone several months of internal testing and
is now available at https://huggingface.co/spaces/ksort/K-Sort-Arena
Authors' comments: CVPR 2025. Project page:
https://huggingface.co/spaces/ksort/K-Sort-Arena
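To see why a K-wise vote carries more information than a pairwise one, note that a single ranking of K models implies K(K-1)/2 pairwise outcomes. The sketch below expands one ranking into those outcomes and feeds them to a standard Elo update. This is only the baseline rating scheme the abstract compares against, not K-Sort Arena's probabilistic Bayesian updating or its matchmaking; the K factor of 32 and the model names are illustrative.

```python
from itertools import combinations

def ranking_to_pairs(ranking):
    """Expand one best-to-worst ranking of K models into K(K-1)/2 (winner, loser) pairs."""
    return list(combinations(ranking, 2))

def elo_update(ratings, winner, loser, k_factor=32.0):
    """Standard Elo update for a single pairwise outcome (modifies `ratings` in place)."""
    expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
    delta = k_factor * (1.0 - expected_win)
    ratings[winner] += delta
    ratings[loser] -= delta

ratings = {m: 1000.0 for m in ["model_a", "model_b", "model_c", "model_d"]}
vote = ["model_c", "model_a", "model_d", "model_b"]  # one 4-wise vote, best first
pairs = ranking_to_pairs(vote)
for winner, loser in pairs:
    elo_update(ratings, winner, loser)
print(len(pairs))  # 6
print(sorted(ratings, key=ratings.get, reverse=True))  # ['model_c', 'model_a', 'model_d', 'model_b']
```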
Antón de la Fuente, Dan Jurafsky
This study asks how self-supervised speech models represent suprasegmental
categories like Mandarin lexical tone, English lexical stress, and English
phrasal accents. Through a series of probing tasks, we make layer-wise
comparisons of English and Mandarin 12-layer monolingual models. Our findings
suggest that 1) English and Mandarin wav2vec 2.0 models learn contextual
representations of abstract suprasegmental categories which are strongest in
the middle third of the network. 2) Models are better at representing features
that exist in the language of their training data, and this difference is
driven by enriched context in transformer blocks, not local acoustic
representation. 3) Fine-tuned wav2vec 2.0 improves performance in later layers
compared to pre-trained models mainly for lexically contrastive features like
tone and stress. 4) HuBERT and WavLM learn representations similar to those of
wav2vec 2.0, differing mainly in later-layer performance. Our results extend previous
understanding of how models represent suprasegmentals and offer new insights
into the language-specificity and contextual nature of these representations.
Authors' comments: 4 pages, 3 figures, to be published in Interspeech 2024 proceedings
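A layer-wise probe fits the same simple classifier on each layer's frozen representations and compares held-out accuracy across layers. The NumPy sketch below uses a closed-form ridge-regression classifier on one-hot labels as the probe and synthetic features in place of wav2vec 2.0 hidden states; the probe type, regularization and train/test split are assumptions, not the paper's setup.

```python
import numpy as np

def ridge_probe_accuracy(feats, labels, num_classes, reg=1.0, train_frac=0.8, seed=0):
    """Train a linear ridge classifier on one layer's features; return held-out accuracy."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    n_train = int(train_frac * len(labels))
    tr, te = idx[:n_train], idx[n_train:]
    X = np.hstack([feats, np.ones((len(feats), 1))])  # add a bias column
    Y = np.eye(num_classes)[labels]  # one-hot targets
    W = np.linalg.solve(X[tr].T @ X[tr] + reg * np.eye(X.shape[1]), X[tr].T @ Y[tr])
    return float(np.mean((X[te] @ W).argmax(axis=1) == labels[te]))

def layerwise_probe(hidden_states, labels, num_classes):
    """hidden_states: list with one (num_examples, dim) array per layer."""
    return [ridge_probe_accuracy(h, labels, num_classes) for h in hidden_states]

rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=1000)  # e.g. four Mandarin tone categories
class_means = rng.normal(size=(4, 16))
strength = [np.exp(-((l - 6) / 3.0) ** 2) for l in range(12)]  # synthetic: signal peaks mid-network
layers = [s * class_means[labels] + rng.normal(size=(1000, 16)) for s in strength]
accs = layerwise_probe(layers, labels, 4)
print(int(np.argmax(accs)))  # index of a middle layer
```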
Mirko Nardi, Lorenzo Valerio, Andrea Passarella
Federated Learning (FL) is a pivotal approach in decentralized machine learning, especially when data privacy is crucial and direct data sharing is impractical. While FL is typically associated with supervised learning, its potential in unsupervised scenarios is underexplored. This paper introduces a novel unsupervised federated learning methodology designed to identify the complete set of categories (global K) across multiple clients within label-free, non-uniform data distributions, a process known as Federated Clustering. Our approach, Federated Cluster-Wise Refinement (FedCRef), involves clients that collaboratively train models on clusters with similar data distributions. Initially, clients with diverse local data distributions (local K) train models on their clusters to generate compressed data representations. These local models are then shared across the network, enabling clients to compare them through reconstruction error analysis, leading to the formation of federated groups. In these groups, clients collaboratively train a shared model representing each data distribution, while continuously refining their local clusters to enhance data association accuracy. This iterative process allows our system to identify all potential data distributions across the network and develop robust representation models for each. To validate our approach, we compare it with traditional centralized methods, establishing a performance baseline and showcasing the advantages of our distributed solution. We also conduct experiments on the EMNIST and KMNIST datasets, demonstrating FedCRef's ability to refine and align cluster models with actual data distributions, significantly improving data representation precision in unsupervised federated settings.
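The reconstruction-error comparison that forms federated groups can be sketched with a linear autoencoder (PCA) standing in for each client's compression model. The sketch assumes a client groups with another client when that client's model reconstructs the local cluster data almost as well as its own model does; the ratio threshold, the PCA substitute and the function names are hypothetical.

```python
import numpy as np

def fit_linear_ae(data, dim):
    """PCA as a linear autoencoder: returns (mean, components)."""
    mean = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:dim]

def recon_error(model, data):
    mean, comps = model
    z = (data - mean) @ comps.T  # encode
    return float(np.mean(((z @ comps + mean) - data) ** 2))  # decode and compare

def form_groups(client_data, models, ratio=1.5):
    """Group client i with client j if j's model reconstructs i's data nearly as well as i's own."""
    groups = []
    for i, data in enumerate(client_data):
        own = recon_error(models[i], data)
        groups.append({j for j, m in enumerate(models) if recon_error(m, data) <= ratio * own})
    return groups

rng = np.random.default_rng(0)
basis_a, basis_b = np.eye(6)[:2], np.eye(6)[3:5]  # two different 2-D subspaces of R^6

def sample(basis, n=200):
    return rng.normal(size=(n, 2)) @ basis + 0.01 * rng.normal(size=(n, 6))

clients = [sample(basis_a), sample(basis_a), sample(basis_b)]
models = [fit_linear_ae(d, dim=2) for d in clients]
print(form_groups(clients, models))  # clients 0 and 1 group together; client 2 stays alone
```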
Zhanzhong Pang, Fadime Sener, Shrinivas Ramasubramanian, Angela Yao
Procedural activity videos often exhibit a long-tailed action distribution
due to varying action frequencies and durations. However, state-of-the-art
temporal action segmentation methods overlook the long tail and fail to
recognize tail actions. Existing long-tail methods make class-independent
assumptions and struggle to identify tail classes when applied to temporal
segmentation frameworks. This work proposes a novel group-wise temporal logit
adjustment~(G-TLA) framework that combines a group-wise softmax formulation
with activity information and action ordering for logit adjustment. The
proposed framework significantly improves the segmentation of tail actions
without any performance loss on head actions.
Authors' comments: Accepted by ECCV 2024
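Two ingredients named above can be illustrated separately: logit adjustment, which subtracts a scaled log class prior so that rare (tail) classes are not drowned out by frequent ones, and a group-wise softmax, which normalizes scores only within groups of classes. In this NumPy sketch the grouping, the prior and `tau` are illustrative; G-TLA additionally conditions the adjustment on activity information and action ordering, which is not shown.

```python
import numpy as np

def logit_adjust(logits, class_prior, tau=1.0):
    """Post-hoc logit adjustment: penalize frequent classes by tau * log(prior)."""
    return logits - tau * np.log(class_prior)

def groupwise_softmax(logits, groups):
    """Softmax applied independently inside each group of class indices."""
    probs = np.zeros_like(logits, dtype=float)
    for idx in groups:
        z = logits[..., idx]
        z = np.exp(z - z.max(axis=-1, keepdims=True))
        probs[..., idx] = z / z.sum(axis=-1, keepdims=True)
    return probs

prior = np.array([0.70, 0.25, 0.05])  # head, mid and tail action frequencies
frame_logits = np.array([1.2, 0.4, 0.0])  # the tail action scores slightly lower
print(frame_logits.argmax())  # 0: the head action wins
print(logit_adjust(frame_logits, prior).argmax())  # 2: the tail action wins
print(groupwise_softmax(frame_logits, [[0, 1], [2]]))  # each group sums to 1
```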
Jingcai Guo, Zhijie Rao, Zhi Chen, Song Guo, Jingren Zhou, Dacheng Tao
Zero-shot image recognition (ZSIR) aims to recognize and reason in unseen
domains by learning generalized knowledge from limited data in the seen domain.
The gist of ZSIR is constructing a well-aligned mapping between the input
visual space and the target semantic space, which is a bottom-up paradigm
inspired by the process by which humans observe the world. In recent years,
ZSIR has witnessed significant progress across a broad spectrum, from theory to
algorithm design, as well as widespread applications. However, to the best of
our knowledge, a systematic review of ZSIR from an element-wise perspective,
i.e., learning fine-grained elements of data and their inferential
associations, is still lacking. To fill the gap, this paper thoroughly
investigates recent advances in element-wise ZSIR and provides a sound basis
for its future development. Concretely, we first integrate three basic ZSIR
tasks, i.e., object recognition, compositional recognition, and foundation
model-based open-world recognition, into a unified element-wise paradigm and
provide a detailed taxonomy and analysis of the main approaches. Next, we
summarize the benchmarks, covering technical implementations, standardized
datasets, and further details, as a library. Last, we sketch out related
applications, discuss vital challenges, and suggest potential future
directions.
Authors' comments: 20 pages, 6 figures, and 4 tables
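The bottom-up mapping from visual space to semantic space can be illustrated with the simplest embedding-based zero-shot classifier: learn a linear map from visual features to class attribute vectors on seen classes, then label a test sample with the unseen class whose attribute vector is closest, by cosine similarity, to its projection. The ridge-regression map and the synthetic data are assumptions for illustration; the element-wise methods the survey covers are considerably richer.

```python
import numpy as np

def fit_visual_to_semantic(X, S_labels, reg=1.0):
    """Ridge regression W mapping visual features X (n, d) to attributes S_labels (n, a)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ S_labels)

def zero_shot_predict(X, W, class_attributes):
    """Return the index of the class whose attribute vector best matches X @ W."""
    P = X @ W
    P = P / np.linalg.norm(P, axis=1, keepdims=True)
    C = class_attributes / np.linalg.norm(class_attributes, axis=1, keepdims=True)
    return (P @ C.T).argmax(axis=1)

rng = np.random.default_rng(0)
attrs = rng.normal(size=(10, 5))  # 10 classes described by 5 attributes (synthetic)
M = rng.normal(size=(5, 20))  # hidden attribute-to-visual generator (synthetic)

def sample(c, n=100):
    return attrs[c] @ M + 0.1 * rng.normal(size=(n, 20))

seen, unseen = list(range(8)), [8, 9]
X_seen = np.vstack([sample(c) for c in seen])
S_seen = np.repeat(attrs[seen], 100, axis=0)
W = fit_visual_to_semantic(X_seen, S_seen)
X_test = np.vstack([sample(c) for c in unseen])
pred = zero_shot_predict(X_test, W, attrs[unseen])  # only unseen classes are candidates
acc = float(np.mean(pred == np.repeat([0, 1], 100)))
print(acc)
```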
Pengxiang Zhao, Hanyu Hu, Ping Li, Yi Zheng, Zhefeng Wang, Xiaoming Yuan
Pruning is a critical strategy for compressing trained large language models (LLMs), aiming for substantial memory savings and computational acceleration without compromising performance. However, existing pruning methods often necessitate inefficient retraining for billion-scale LLMs or rely on heuristic methods such as the optimal brain surgeon framework, which degrade performance. In this paper, we introduce FISTAPruner, the first post-training pruner based on convex optimization models and algorithms. Specifically, we propose a convex optimization model incorporating the $\ell_1$ norm to induce sparsity and utilize the FISTA solver for optimization. FISTAPruner incorporates an intra-layer cumulative error correction mechanism and supports parallel pruning. We comprehensively evaluate FISTAPruner on models such as OPT, LLaMA, LLaMA-2, and LLaMA-3 with 125M to 70B parameters under unstructured and 2:4 semi-structured sparsity, demonstrating superior performance over existing state-of-the-art methods across various language benchmarks.
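The convex model described above can be sketched as l1-regularized layer-wise reconstruction: find a sparse W minimizing 0.5 ||X W - X W0||_F^2 + lambda ||W||_1, where X holds calibration inputs and W0 the dense weights, solved with FISTA (a gradient step, soft-thresholding, and Nesterov-style momentum). This NumPy sketch omits FISTAPruner's intra-layer error correction, its tuning of lambda to reach a target sparsity, parallelism and 2:4 patterns; lambda, the iteration count and the data are illustrative.

```python
import numpy as np

def soft_threshold(z, thresh):
    """Proximal operator of thresh * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def fista_prune(X, W0, lam, iters=500):
    """Minimize 0.5*||X W - X W0||_F^2 + lam*||W||_1 with FISTA.

    X: calibration inputs (n, d_in); W0: dense weights (d_in, d_out).
    """
    G = X.T @ X
    L = np.linalg.eigvalsh(G).max()  # Lipschitz constant of the gradient
    W = W0.copy()
    Y, t = W.copy(), 1.0
    for _ in range(iters):
        grad = G @ (Y - W0)
        W_next = soft_threshold(Y - grad / L, lam / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Y = W_next + ((t - 1.0) / t_next) * (W_next - W)  # momentum step
        W, t = W_next, t_next
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 32))  # synthetic calibration inputs
W0 = rng.normal(size=(32, 16))  # synthetic dense weights
W = fista_prune(X, W0, lam=128.0)
sparsity = float(np.mean(W == 0))
rel_err = np.linalg.norm(X @ (W - W0)) / np.linalg.norm(X @ W0)
print(f"sparsity={sparsity:.2f}, relative output error={rel_err:.3f}")
```

Larger lambda yields more zeros at the cost of a larger output reconstruction error; the objective is convex, so FISTA converges to its global minimum for any fixed lambda.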
Wenhao Li, Jie Zhou, Chuan Luo, Chao Tang, Kun Zhang, Shixiong Zhao
In the realm of modern mobile E-commerce, providing users with nearby
commercial service recommendations through location-based online services has
become increasingly vital. While machine learning approaches have shown promise
in multi-scene recommendation, existing methodologies often struggle to address
cold-start problems in unprecedented scenes: the increasing diversity of
commercial choices, together with the short online lifespan of scenes, makes
effective recommendation in online, dynamic scenes complex. In
this work, we propose Scene-wise Adaptive Network (SwAN), a novel approach that
emphasizes high-performance cold-start online recommendations for new scenes.
Our approach introduces several crucial capabilities, including scene
similarity learning, user-specific scene transition cognition, scene-specific
information construction for the new scene, and enhancing the diverged logical
information between scenes. We demonstrate SwAN's potential to optimize dynamic
multi-scene recommendation problems by effectively handling cold-start
recommendations online for any newly arrived scene. More encouragingly, SwAN has been
successfully deployed in Meituan's online catering recommendation service,
which serves millions of customers per day, and SwAN has achieved a 5.64% CTR
index improvement relative to the baselines and a 5.19% increase in daily order
volume proportion.
Authors' comments: 10 pages, 6 figures, accepted by RecSys 2024