Chenhui Deng, Zichao Yue, Cunxi Yu, Gokce Sarar, Ryan Carey, Rajeev Jain, Zhiru Zhang
While graph neural networks (GNNs) have gained popularity for learning
circuit representations in various electronic design automation (EDA) tasks,
they face challenges in scalability when applied to large graphs and exhibit
limited generalizability to new designs. These limitations make them less
practical for addressing large-scale, complex circuit problems. In this work we
propose HOGA, a novel attention-based model for learning circuit
representations in a scalable and generalizable manner. HOGA first computes
hop-wise features per node prior to model training. Subsequently, the hop-wise
features are solely used to produce node representations through a gated
self-attention module, which adaptively learns important features among
different hops without involving the graph topology. As a result, HOGA is
adaptive to various structures across different circuits and can be efficiently
trained in a distributed manner. To demonstrate the efficacy of HOGA, we
consider two representative EDA tasks: quality of results (QoR) prediction and
functional reasoning. Our experimental results indicate that (1) HOGA reduces
estimation error over conventional GNNs by 46.76% for predicting QoR after
logic synthesis; (2) HOGA improves reasoning accuracy by 10.0% over GNNs for
identifying functional blocks on unseen gate-level netlists after complex
technology mapping; (3) the training time for HOGA decreases almost linearly
with an increase in computing resources.
Authors' comments: Published as a conference paper at Design Automation Conference (DAC)
2024
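The hop-wise precomputation described in the HOGA abstract above can be illustrated with a minimal sketch. The adjacency-list input format, the function name, and the plain neighbour-mean aggregation are assumptions made for illustration; the abstract does not specify how hop features are normalized.

```python
def hop_features(adj, feats, num_hops):
    """Return, for every node, its features aggregated over 0..num_hops hops.

    adj:   dict mapping node -> list of neighbour nodes (hypothetical format)
    feats: dict mapping node -> list of floats (initial node features)
    Aggregation is a plain neighbour mean, which is an assumption.
    """
    hops = [dict(feats)]  # hop 0 is the raw feature vector
    for _ in range(num_hops):
        prev = hops[-1]
        cur = {}
        for node, neighbours in adj.items():
            if not neighbours:
                # An isolated node simply keeps its previous features.
                cur[node] = list(prev[node])
                continue
            dim = len(prev[node])
            cur[node] = [
                sum(prev[n][d] for n in neighbours) / len(neighbours)
                for d in range(dim)
            ]
        hops.append(cur)
    # node -> list of (num_hops + 1) feature vectors, one per hop
    return {node: [h[node] for h in hops] for node in adj}
```

Because the result is a fixed-size stack of vectors per node, a downstream model can consume each node independently of the graph topology, which is what makes the distributed training mentioned in the abstract straightforward.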
Qiao Han, Mingqian Li, Yao Yang, Yiteng Zhai
Block-wise missing data poses significant challenges in real-world data imputation tasks. Compared to scattered missing data, block-wise gaps exacerbate adverse effects on subsequent analytic and machine learning tasks, as the lack of local neighboring elements significantly reduces the interpolation capability and predictive power. However, this issue has not received adequate attention. Most SOTA matrix completion methods appear less effective, primarily due to overreliance on neighboring elements for predictions. We systematically analyze the issue and propose a novel matrix completion method, "BlockEcho", for a more comprehensive solution. This method integrates Matrix Factorization (MF) within Generative Adversarial Networks (GAN) to explicitly retain long-distance inter-element relationships in the original matrix. In addition, we incorporate an additional discriminator for the GAN, comparing the generator's intermediate progress with pre-trained MF results to constrain high-order feature distributions. We then evaluate BlockEcho on public datasets across three domains. Results demonstrate superior performance over both traditional and SOTA methods when imputing block-wise missing data, especially at higher missing rates. The advantage also holds for scattered missing data at high missing rates. We also provide theoretical justification for the optimality and convergence of fusing MF and GAN for block-wise missing data.
Mohammad Sadil Khan, Elona Dupont, Sk Aziz Ali, Kseniya Cherenkova, Anis Kacem, Djamila Aouada
Reverse engineering in the realm of Computer-Aided Design (CAD) has been a longstanding aspiration, though not yet entirely realized. Its primary aim is to uncover the CAD process behind a physical object given its 3D scan. We propose CAD-SIGNet, an end-to-end trainable and auto-regressive architecture to recover the design history of a CAD model represented as a sequence of sketch-and-extrusion from an input point cloud. Our model learns visual-language representations by layer-wise cross-attention between point cloud and CAD language embedding. In particular, a new Sketch instance Guided Attention (SGA) module is proposed in order to reconstruct the fine-grained details of the sketches. Thanks to its auto-regressive nature, CAD-SIGNet not only reconstructs a unique full design history of the corresponding CAD model given an input point cloud but also provides multiple plausible design choices. This allows for an interactive reverse engineering scenario by providing designers with multiple next-step choices along with the design process. Extensive experiments on publicly available CAD datasets showcase the effectiveness of our approach against existing baseline models in two settings, namely, full design history recovery and conditional auto-completion from point clouds.
Tianjie Ju, Weiwei Sun, Wei Du, Xinwei Yuan, Zhaochun Ren, Gongshen Liu
Previous work has showcased the intriguing capability of large language
models (LLMs) in retrieving facts and processing context knowledge. However,
only limited research exists on the layer-wise capability of LLMs to encode
knowledge, which challenges our understanding of their internal mechanisms. In
this paper, we make the first attempt to investigate the layer-wise
capability of LLMs through probing tasks. We leverage the powerful generative
capability of ChatGPT to construct probing datasets, providing diverse and
coherent evidence corresponding to various facts. We employ $\mathcal V$-usable
information as the validation metric to better reflect the capability in
encoding context knowledge across different layers. Our experiments on
conflicting and newly acquired knowledge show that LLMs: (1) prefer to encode
more context knowledge in the upper layers; (2) primarily encode context
knowledge within knowledge-related entity tokens at lower layers while
progressively expanding more knowledge within other tokens at upper layers; and
(3) gradually forget the earlier context knowledge retained within the
intermediate layers when provided with irrelevant evidence. Code is publicly
available at https://github.com/Jometeorie/probing_llama.
Authors' comments: Accepted at LREC-COLING 2024 (Long Paper)
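For reference, the $\mathcal V$-usable information mentioned in the abstract above is commonly defined as follows. This is the standard definition from the literature on usable information under computational constraints; how the predictive family $\mathcal V$ is instantiated per layer in this work is not stated in the abstract, so the symbols below are a generic sketch.

```latex
% Predictive V-entropy of Y given side information X (or no input, \varnothing)
H_{\mathcal V}(Y \mid X) = \inf_{f \in \mathcal V} \mathbb{E}_{x, y}\bigl[-\log f[x](y)\bigr],
\qquad
H_{\mathcal V}(Y \mid \varnothing) = \inf_{f \in \mathcal V} \mathbb{E}_{y}\bigl[-\log f[\varnothing](y)\bigr]

% V-usable information from X to Y
I_{\mathcal V}(X \to Y) = H_{\mathcal V}(Y \mid \varnothing) - H_{\mathcal V}(Y \mid X)
```

A larger value means that a predictor restricted to the family $\mathcal V$ can extract more information about $Y$ from $X$, for example from the hidden states of a given layer.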
Jinxu Zhang
Understanding the contents of multimodal documents is essential to accurately extract relevant evidence and use it for reasoning. Existing document understanding models tend to generate answers with a single word or phrase directly, ignoring the source document's evidence and lacking interpretability. In this work, we address the lack of step-wise capabilities through data augmentation and extension. Specifically, we use Multi-modal Large Language Models (MLLMs), which have strong visual understanding and reasoning abilities, as data generators to generate step-wise question-and-answer pairs for document images and use a high-performance LLM as the error detector to filter out noisy data. This step-wise data generation pipeline is implemented using both template-based and few-shot methods. We then use the generated high-quality data to train a humanized document understanding and reasoning model, specifically designed to solve complex questions that require reasoning or multi-hop question answering, dubbed DocAssistant. Experimental results demonstrate the effectiveness and application value of step-wise generation, showing a 5 improvement on InfoVQA with complex layouts and a 7 improvement on ChartQA with complex reasoning, compared to directly generated answers. We hope our work highlights the potential of synthetic data and encourages further exploration of multi-modal document reasoning capabilities.
Yanan Wu, Jie Liu, Xingyuan Bu, Jiaheng Liu, Zhanhui Zhou, Yuanxing Zhang, Chenchen Zhang, Zhiqi Bai et al.
This paper introduces ConceptMath, a bilingual (English and Chinese),
fine-grained benchmark that evaluates concept-wise mathematical reasoning of
Large Language Models (LLMs). Unlike traditional benchmarks that evaluate
general mathematical reasoning with an average accuracy, ConceptMath
systematically organizes math problems under a hierarchy of math concepts, so
that mathematical reasoning can be evaluated at different granularity with
concept-wise accuracies. Based on ConceptMath, we evaluate a broad range
of LLMs, and we observe existing LLMs, though achieving high average accuracies
on traditional benchmarks, exhibit significant performance variations across
different math concepts and may even fail catastrophically on the most basic
ones. We also introduce an efficient fine-tuning strategy to address
the weaknesses of existing LLMs. Finally, we hope ConceptMath could guide the
developers to understand the fine-grained mathematical abilities of their
models and facilitate the growth of foundation models.
Authors' comments: The benchmark dataset will be released soon
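The idea of reporting concept-wise accuracies at several granularities, as described in the ConceptMath abstract above, can be sketched as below. The slash-separated concept paths and the function name are hypothetical; the benchmark's actual data format is not given in the abstract.

```python
from collections import defaultdict


def concept_accuracies(results):
    """Compute accuracy for every node of a concept hierarchy.

    results: iterable of (concept_path, is_correct) pairs, where
             concept_path is a hypothetical slash-separated path such as
             "algebra/equations".
    Each problem counts towards its leaf concept and all of its ancestors.
    """
    totals = defaultdict(lambda: [0, 0])  # path -> [num_correct, num_total]
    for path, is_correct in results:
        parts = path.split("/")
        for depth in range(1, len(parts) + 1):
            key = "/".join(parts[:depth])
            totals[key][0] += int(is_correct)
            totals[key][1] += 1
    return {key: correct / total for key, (correct, total) in totals.items()}
```

A single average over all problems would hide the case where one leaf concept scores near zero while its siblings score well; the per-node breakdown makes that visible.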
Chen Shenglun, Zhang Hong, Ma XinZhu, Wang Zhihui, Li Haojie
Depth completion is a long-standing challenge in computer vision, where
classification-based methods have made tremendous progress in recent years.
However, most existing classification-based methods rely on pre-defined
pixel-shared and discrete depth values as depth categories. This representation
fails to capture the continuous depth values that conform to the real depth
distribution, leading to depth smearing in boundary regions. To address this
issue, we revisit depth completion from the clustering perspective and propose
a novel clustering-based framework called CluDe which focuses on learning the
pixel-wise and continuous depth representation. The key idea of CluDe is to
iteratively update the pixel-shared and discrete depth representation to its
corresponding pixel-wise and continuous counterpart, driven by the real depth
distribution. Specifically, CluDe first utilizes depth value clustering to
learn a set of depth centers as the depth representation. While these depth
centers are pixel-shared and discrete, they are more in line with the real
depth distribution compared to pre-defined depth categories. Then, CluDe
estimates offsets for these depth centers, enabling their dynamic adjustment
along the depth axis of the depth distribution to generate the pixel-wise and
continuous depth representation. Extensive experiments demonstrate that CluDe
successfully reduces depth smearing around object boundaries by utilizing
pixel-wise and continuous depth representation. Furthermore, CluDe achieves
state-of-the-art performance on the VOID datasets and outperforms
classification-based methods on the KITTI dataset.
Authors' comments: Published in IEEE TCSVT, 15 pages, 12 figures
Xiaosa Li, Runze Zhao, Chengyue Lu, Xiao Xiao, Wenbo Ding
Surface vibration tactile feedback can convey various semantic information to humans via handheld electronic devices such as smartphones, touch panels, and game controllers. However, covering the whole contact surface of a device with a dense actuator arrangement can interfere with its normal use, so producing desired vibration patterns at any contact point with only a few sparse actuators deployed on the handheld device surface remains a significant challenge. In this work, we develop a smartphone-sized tactile feedback board with only five actuators, and achieve precise vibration pattern production that can focus at any desired position on the board. Specifically, we investigate the vibration characteristics of a single passive coil actuator and construct its vibration pattern model at any position on the feedback board surface. Optimal phase and amplitude modulation, found with the simulated annealing algorithm, is applied to the five actuators in a sparse array, and all actuators' vibration patterns are superimposed linearly to synthesize different onboard vibration energy distributions for tactile sensing. Experiments demonstrate that point-wise vibration pattern production on our tactile board achieves an average of about 0.9 in the Structural Similarity Index Measure (SSIM) evaluation when compared to the ideal single-point-focused target vibration pattern. The sparse actuator array can be easily embedded into common handheld electronic devices, which has significant implications for enriching their haptic interaction functionalities.
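The linear superposition of actuator patterns described in the abstract above can be illustrated with a small sketch. The complex per-actuator responses are hypothetical stand-ins for the measured vibration pattern model, and the sketch only evaluates the energy for given drive settings; it does not implement the simulated annealing search itself.

```python
import cmath


def energy_at_point(responses, amplitudes, phases):
    """Vibration energy at one board position under linear superposition.

    responses:  complex response of each actuator at this position
                (hypothetical values standing in for a measured model)
    amplitudes: per-actuator drive amplitudes
    phases:     per-actuator drive phases in radians
    """
    total = sum(
        a * cmath.exp(1j * p) * h
        for h, a, p in zip(responses, amplitudes, phases)
    )
    return abs(total) ** 2
```

With two identical actuators, in-phase drive gives constructive interference and anti-phase drive cancels. A search over amplitudes and phases, such as simulated annealing, would exploit this effect to concentrate energy at a target position.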
AprilPyone MaungMaung, Huy H. Nguyen, Hitoshi Kiya, Isao Echizen
We propose a method for generating spurious features by leveraging large-scale text-to-image diffusion models. Although previous work detects spurious features in a large-scale dataset like ImageNet and introduces Spurious ImageNet, we found that not all spurious images are spurious across different classifiers. Moreover, while spurious images help measure the reliance of a classifier, filtering many images from the Internet to find more spurious features is time-consuming. To this end, we utilize an existing approach of personalizing large-scale text-to-image diffusion models with available discovered spurious images and propose a new spurious feature similarity loss based on neural features of an adversarially robust model. Precisely, we fine-tune Stable Diffusion with several reference images from Spurious ImageNet with a modified objective incorporating the proposed spurious-feature similarity loss. Experimental results show that our method can generate spurious images that are consistently spurious across different classifiers. Moreover, the generated spurious images are visually similar to reference images from Spurious ImageNet.
Yuji Roh, Qingyun Liu, Huan Gui, Zhe Yuan, Yujin Tang, Steven Euijong Whang, Liang Liu, Shuchao Bi et al.
Fine-tuning is becoming widely used for leveraging the power of pre-trained foundation models in new downstream tasks. While there are many successes of fine-tuning on various tasks, recent studies have observed challenges in the generalization of fine-tuned models to unseen distributions (i.e., out-of-distribution; OOD). To improve OOD generalization, some previous studies identify the limitations of fine-tuning data and regulate fine-tuning to preserve the general representation learned from pre-training data. However, potential limitations in the pre-training data and models are often ignored. In this paper, we contend that overly relying on the pre-trained representation may hinder fine-tuning from learning essential representations for downstream tasks and thus hurt its OOD generalization. It can be especially catastrophic when new tasks are from different (sub)domains compared to pre-training data. To address the issues in both pre-training and fine-tuning data, we propose a novel generalizable fine-tuning method LEVI, where the pre-trained model is adaptively ensembled layer-wise with a small task-specific model, while preserving training and inference efficiencies. By combining two complementing models, LEVI effectively suppresses problematic features in both the fine-tuning data and pre-trained model and preserves useful features for new tasks. Broad experiments with large language and vision models show that LEVI greatly improves fine-tuning generalization via emphasizing different views from fine-tuning data and pre-trained features.
Umut Cem Entok, Firas Laakom, Farhad Pakdaman, Moncef Gabbouj
Most scenes are illuminated by several light sources, where the traditional
assumption of uniform illumination is invalid. This issue is ignored in most
color constancy methods, primarily due to the complex spatial impact of
multiple light sources on the image. Moreover, most existing multi-illuminant
methods fail to preserve the smooth change of illumination, which stems from
spatial dependencies in natural images. Motivated by this, we propose a novel
multi-illuminant color constancy method, by learning pixel-wise illumination
maps caused by multiple light sources. The proposed method enforces smoothness
within neighboring pixels, by regularizing the training with the total
variation loss. Moreover, a bilateral filter is provisioned further to enhance
the natural appearance of the estimated images, while preserving the edges.
Additionally, we propose a label-smoothing technique that enables the model to
generalize well despite the uncertainties in ground truth. Quantitative and
qualitative experiments demonstrate that the proposed method outperforms the
state-of-the-art.
Authors' comments: Copyright 2024 IEEE - Submitted to IEEE ICIP 2024
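The total variation regularizer mentioned in the abstract above can be sketched for a single-channel map as follows. The anisotropic L1 form and the function name are assumptions; the abstract does not say which variant of total variation is used.

```python
def total_variation(img):
    """Anisotropic total variation of a 2D map given as a list of rows.

    Sums absolute differences between vertically and horizontally
    adjacent pixels; smoother maps give smaller values.
    """
    rows, cols = len(img), len(img[0])
    tv = 0.0
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:
                tv += abs(img[r + 1][c] - img[r][c])  # vertical neighbour
            if c + 1 < cols:
                tv += abs(img[r][c + 1] - img[r][c])  # horizontal neighbour
    return tv
```

Adding a weighted version of this quantity to the training loss penalizes abrupt changes between neighbouring pixels of the estimated illumination map.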
Alexandra Saliba, Yuanchao Li, Ramon Sanabria, Catherine Lai
The efficacy of self-supervised speech models has been validated, yet the
optimal utilization of their representations remains challenging across diverse
tasks. In this study, we delve into Acoustic Word Embeddings (AWEs), a
fixed-length feature derived from continuous representations, to explore their
advantages in specific tasks. AWEs have previously shown utility in capturing
acoustic discriminability. In light of this, we propose measuring layer-wise
similarity between AWEs and word embeddings, aiming to further investigate the
inherent context within AWEs. Moreover, we evaluate the contribution of AWEs,
in comparison to other types of speech features, in the context of Speech
Emotion Recognition (SER). Through a comparative experiment and a layer-wise
accuracy analysis on two distinct corpora, IEMOCAP and ESD, we explore
differences between AWEs and raw self-supervised representations, as well as
the proper utilization of AWEs alone and in combination with word embeddings.
Our findings underscore the acoustic context conveyed by AWEs and showcase the
highly competitive SER accuracies by appropriately employing AWEs.
Authors' comments: Accepted to ICASSP2024 Self-supervision in Audio, Speech and Beyond
(SASB) workshop. First two authors contributed equally
Snir Ben Ovadia
We introduce the notion of tubular dimension, and give a formula for it. As an application we show that every invariant measure of a $C^{1+\gamma}$ diffeomorphism of a closed Riemannian manifold admits an asymptotic local product structure for conditional measures on intermediate foliations of unstable leaves. As a second application, we prove a bound on the gap between any two consecutive conditional entropies, in the form of volume growth. As a third application, for certain $C^\infty$ maps we compute all conditional entropies for the measure of maximal entropy; in particular, as a consequence, in a follow-up paper we compute the Hausdorff dimension of the equilibrium measure of holomorphic endomorphisms of $\mathbb{C}\mathbb{P}^k$, $k\geq 1$, giving a solution to the Binder-DeMarco conjecture, and answering a question of Forn{\ae}ss and Sibony.
Danning Lao, Qi Liu, Jiazi Bu, Junchi Yan, Wei Shen
As computer vision continues to advance and finds widespread applications across various domains, the need for interpretability in deep learning models becomes paramount. Existing methods often resort to post-hoc techniques or prototypes to explain the decision-making process, which can be indirect and lack intrinsic illustration. In this research, we introduce ViTree, a novel approach for fine-grained visual categorization that combines the popular vision transformer as a feature extraction backbone with neural decision trees. By traversing the tree paths, ViTree effectively selects patches from transformer-processed features to highlight informative local regions, thereby refining representations in a step-wise manner. Unlike previous tree-based models that rely on soft distributions or ensembles of paths, ViTree selects a single tree path, offering a clearer and simpler decision-making process. This patch and path selectivity enhances model interpretability of ViTree, enabling better insights into the model's inner workings. Remarkably, extensive experimentation validates that this streamlined approach surpasses various strong competitors and achieves state-of-the-art performance while maintaining exceptional interpretability which is proved by multi-perspective methods. Code can be found at https://github.com/SJTU-DeepVisionLab/ViTree.
Chak Fong Chong, Xinyi Fang, Jielong Guo, Yapeng Wang, Wei Ke, Chan-Tong Lam, Sio-Kei Im
Large-scale image datasets are often partially labeled, where only a few categories' labels are known for each image. Assigning pseudo-labels to unknown labels to gain additional training signals has become prevalent for training deep classification models. However, some pseudo-labels are inevitably incorrect, leading to a notable decline in the model classification performance. In this paper, we propose a novel method called Category-wise Fine-Tuning (CFT), aiming to reduce model inaccuracies caused by the wrong pseudo-labels. In particular, CFT employs known labels without pseudo-labels to fine-tune the logistic regressions of trained models individually to calibrate each category's model predictions. Genetic Algorithm, seldom used for training deep models, is also utilized in CFT to maximize the classification performance directly. CFT is applied to well-trained models, unlike most existing methods that train models from scratch. Hence, CFT is general and compatible with models trained with different methods and schemes, as demonstrated through extensive experiments. CFT requires only a few seconds for each category for calibration with consumer-grade GPUs. We achieve state-of-the-art results on three benchmarking datasets, including the CheXpert chest X-ray competition dataset (ensemble mAUC 93.33%, single model 91.82%), partially labeled MS-COCO (average mAP 83.69%), and Open Image V3 (mAP 85.31%), outperforming the previous bests by 0.28%, 2.21%, 2.50%, and 0.91%, respectively. The single model on CheXpert has been officially evaluated by the competition server, endorsing the correctness of the result. The outstanding results and generalizability indicate that CFT could be substantial and prevalent for classification model development. Code is available at: https://github.com/maxium0526/category-wise-fine-tuning.
Erik Duse
In this work we provide a survey of Fuglede's flux extensions of first order
partial differential operators, a concept largely forgotten today. Along the
way we also survey the classical weak and strong extensions of PDE operators
and the works of Friedrichs and H\"ormander. We give several applications of
this theory showing its usefulness, as well as connecting it to more recent
developments in connection to various sharp versions of the divergence theorem.
In particular, we use it to prove a generalization of Morera's theorem valid
for general first order operators. Using this theory we also prove a new local
limit formula for the maximal extension of a first order operator. We initiate
a study of this limit and connect it to the wave cone of the operator, a
concept that first arose in the theory of compensated compactness. Hopefully,
this will contribute to a revival of Fuglede's beautiful ideas.
Authors' comments: Feedback is welcome! Typos fixed
Nachuan Ma, Rui Fan, Lihua Xie
Over the past decade, automated methods have been developed to detect cracks more efficiently, accurately, and objectively, with the ultimate goal of replacing conventional manual visual inspection techniques. Among these methods, semantic segmentation algorithms have demonstrated promising results in pixel-wise crack detection tasks. However, training such data-driven algorithms requires a large amount of human-annotated datasets with pixel-level annotations, which is a highly labor-intensive and time-consuming process. Moreover, supervised learning-based methods often struggle with poor generalization ability in unseen datasets. Therefore, we propose an unsupervised pixel-wise road crack detection network, known as UP-CrackNet. Our approach first generates multi-scale square masks and randomly selects them to corrupt undamaged road images by removing certain regions. Subsequently, a generative adversarial network is trained to restore the corrupted regions by leveraging the semantic context learned from surrounding uncorrupted regions. During the testing phase, an error map is generated by calculating the difference between the input and restored images, which allows for pixel-wise crack detection. Our comprehensive experimental results demonstrate that UP-CrackNet outperforms other general-purpose unsupervised anomaly detection algorithms, and exhibits comparable performance and superior generalizability when compared with state-of-the-art supervised crack segmentation algorithms. Our source code is publicly available at mias.group/UP-CrackNet.
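The test-time step described in the abstract above, computing an error map from the input and restored images, can be sketched as follows. The absolute-difference error and the fixed threshold are assumptions for illustration; the abstract does not state how the error map is converted into a crack mask.

```python
def error_map(inp, restored):
    """Per-pixel absolute difference between input and restored images.

    Both images are lists of rows of grayscale values in [0, 1].
    """
    return [
        [abs(a - b) for a, b in zip(row_in, row_res)]
        for row_in, row_res in zip(inp, restored)
    ]


def crack_mask(err, threshold):
    """Mark pixels whose restoration error exceeds a (hypothetical) threshold."""
    return [[1 if e > threshold else 0 for e in row] for row in err]
```

Because the restoration network is trained only on undamaged road images, it tends to reproduce crack pixels poorly, so large restoration errors point to likely cracks.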
Haiyang Peng, Yi Zhan, Benkang Wang, Hongtao Zhang
In High-definition (HD) maps, lane elements constitute the majority of components and must meet stringent localization requirements to ensure safe vehicle navigation. Vision lane detection with LiDAR position assignment is a prevalent method to acquire initial lanes for HD maps. However, due to incorrect vision detection and coarse camera-LiDAR calibration, initial lanes may deviate from their true positions within an uncertain range. To mitigate the need for manual lane correction, we propose a patch-wise lane correction network (PLCNet) to automatically correct the positions of initial lane points in local LiDAR images that are transformed from point clouds. PLCNet first extracts multi-scale image features and crops patch (ROI) features centered at each initial lane point. By applying ROIAlign, the fixed-size ROI features are flattened into 1D features. Then, a 1D lane attention module is devised to compute instance-level lane features with adaptive weights. Finally, lane correction offsets are inferred by a multi-layer perceptron and used to correct the initial lane positions. Considering practical applications, our automatic method supports merging local corrected lanes into global corrected lanes. Through extensive experiments on a self-built dataset, we demonstrate that PLCNet achieves fast and effective initial lane correction.
Xinliang Frederick Zhang, Carter Blum, Temma Choji, Shalin Shah, Alakananda Vempala
Structural extraction of events within discourse is critical since it enables
a deeper understanding of communication patterns and behavior trends. Event
argument extraction (EAE), at the core of event-centric understanding, is the
task of identifying role-specific text spans (i.e., arguments) for a given
event. Document-level EAE (DocEAE) focuses on arguments that are scattered
across an entire document. In this work, we explore open-source Large Language
Models (LLMs) for DocEAE, and propose ULTRA, a hierarchical framework that
extracts event arguments more cost-effectively. Further, it alleviates the
positional bias issue intrinsic to LLMs. ULTRA sequentially reads text chunks
of a document to generate a candidate argument set, upon which non-pertinent
candidates are dropped through self-refinement. We introduce LEAFER to address
the challenge LLMs face in locating the exact boundary of an argument. ULTRA
outperforms strong baselines, including strong supervised models and ChatGPT,
by 9.8% when evaluated by Exact Match (EM).
Authors' comments: ACL'24 Findings
Jiabin Lin, Shana Moothedath
We present conservative distributed multi-task learning in stochastic linear contextual bandits with heterogeneous agents. This extends conservative linear bandits to a distributed setting where M agents tackle different but related tasks while adhering to stage-wise performance constraints. The exact context is unknown, and only a context distribution is available to the agents, as in many practical applications that involve a prediction mechanism to infer context, such as stock market prediction and weather forecasting. We propose a distributed upper confidence bound (UCB) algorithm, DiSC-UCB. Our algorithm constructs a pruned action set during each round to ensure the constraints are met. Additionally, it includes synchronized sharing of estimates among agents via a central server using well-structured synchronization steps. We prove regret and communication bounds for the algorithm. We extend the problem to a setting where the agents are unaware of the baseline reward. For this setting, we provide a modified algorithm, DiSC-UCB2, and we show that the modified algorithm achieves the same regret and communication bounds. We empirically validate the performance of our algorithm on synthetic data and real-world Movielens-100K data.