Hongyu Zhang, Yufan Deng, Zilin Pan, Peng-Tao Jiang, Bo Li, Qibin Hou, Zhiyang Dou, Zhen Dong et al.
Generating high-quality videos from complex temporal descriptions that contain multiple sequential actions is a key unsolved problem. Existing methods are constrained by an inherent trade-off: using multiple short prompts fed sequentially into the model improves action fidelity but compromises temporal consistency, while a single complex prompt preserves consistency at the cost of prompt-following capability. We attribute this problem to two primary causes: 1) temporal misalignment between video content and the prompt, and 2) conflicting attention coupling between motion-related visual objects and their associated text conditions. To address these challenges, we propose a novel, training-free attention mechanism, Temporal-wise Separable Attention (TS-Attn), which dynamically rearranges attention distribution to ensure temporal awareness and global coherence in multi-event scenarios. TS-Attn can be seamlessly integrated into various pre-trained text-to-video models, boosting StoryEval-Bench scores by 33.5% and 16.4% on Wan2.1-T2V-14B and Wan2.2-T2V-A14B with only a 2% increase in inference time. It also supports plug-and-play usage across models for multi-event image-to-video generation. The source code and project page are available at https://github.com/Hong-yu-Zhang/TS-Attn.
Authors' comments: ICLR 2026, code available at: https://github.com/Hong-yu-Zhang/TS-Attn
Haiyang Wu, Juan J. Gonzales Torres, George Vosselman, Ville Lehtola
Frame-wise semantic segmentation of indoor lidar scans is a fundamental step toward higher-level 3D scene understanding and mapping applications. However, acquiring frame-wise ground truth for training deep learning models is costly and time-consuming. This challenge is largely addressed, for imagery, by Visual Foundation Models (VFMs) which segment image frames. The same VFMs may be used to train a lidar scan frame segmentation model via a 2D-to-3D distillation pipeline. The success of such distillation has been shown for autonomous driving scenes, but not yet for indoor scenes. Here, we study the feasibility of repeating this success for indoor scenes, in a frame-wise distillation manner by coupling each lidar scan with a VFM-processed camera image. The evaluation is done using indoor SLAM datasets, where pseudo-labels are used for downstream evaluation. Also, a small manually annotated lidar dataset is provided for validation, as there are no other lidar frame-wise indoor datasets with semantics. Results show that the distilled model achieves up to 56% mIoU under pseudo-label evaluation and around 36% mIoU with real-label, demonstrating the feasibility of cross-modal distillation for indoor lidar semantic segmentation without manual annotations.
Junjie Wen, Junlin He, Fei Ma, Jinqiang Cui
Accurate open-vocabulary 3D scene understanding requires semantic representations that are both language-aligned and spatially precise at the pixel level, while remaining scalable when lifted to 3D space. However, existing representations struggle to jointly satisfy these requirements, and densely propagating pixel-wise semantics to 3D often results in substantial redundancy, leading to inefficient storage and querying in large-scale scenes. To address these challenges, we present \emph{PLAF}, a Pixel-wise Language-Aligned Feature extraction framework that enables dense and accurate semantic alignment in 2D without sacrificing open-vocabulary expressiveness. Building upon this representation, we further design an efficient semantic storage and querying scheme that significantly reduces redundancy across both 2D and 3D domains. Experimental results show that \emph{PLAF} provides a strong semantic foundation for accurate and efficient open-vocabulary 3D scene understanding. The codes are publicly available at https://github.com/RockWenJJ/PLAF.
Authors' comments: Accepted by ICCA 2026
Xiang Xia, Wuyang Zhang, Jiazheng Liu, Cheng Yan, Yanyong Zhang
Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive language generation due to their potential for parallel decoding and global refinement of the entire sequence. To unlock this potential, DLM inference must carefully balance generation quality and decoding speed. Recent block-wise DLM decoding methods improve this trade-off by performing diffusion-based decoding sequentially in blocks. However, existing methods typically rely on fixed block schedules or current-step local signals to determine block boundaries, and use conservative confidence-based parallel decoding to avoid conflicts, limiting the quality-speed trade-off. In this paper, we argue that block-wise DLM inference requires more suitable signals for its two core decisions: cross-step signals for determining block boundaries, and token-level conflict signals for parallel decoding. Based on this view, we propose DepCap, a training-free framework for efficient block-wise DLM inference. Specifically, DepCap instantiates the cross-step signal as the influence of the last decoded block and uses it to adaptively determine how far the next block should extend, while identifying a conflict-free subset of tokens for safe parallel decoding within each block, enabling substantial inference acceleration with negligible quality degradation. DepCap is a plug-and-play method applicable to various DLMs, and compatible with existing KV-cache strategies for block-wise DLM. An information-theoretic analysis further suggests that the cumulative last-block influence on a candidate block is approximately additive across tokens, supporting the proposed block-partitioning criterion. Experimental results show that DepCap achieves favorable speed-quality trade-offs across multiple DLM backbones and reasoning and coding benchmarks, with up to 5.63$\times$ speedup without significant performance degradation.
Jingyuan Wang, Meiyan Xu, Zhihao Jia, Chenyu Liu, Xinliang Zhou, Ziyu Jia, Yong Li, Fang Li et al.
EEG foundation models (FMs) achieve strong cross-subject and cross-task generalization but impose substantial computational and memory costs that hinder deployment on embedded BCI systems. Knowledge distillation is a natural solution; however, conventional methods fail for EEG FMs because task-relevant semantics are often distributed across intermediate layers, and aggressive dimensionality reduction can distort oscillatory structure via representational collapse and aliasing. To address these challenges, we propose DLink (Distilling Layer-wise and Dominant Knowledge), a unified framework for transferring knowledge from large EEG FMs to compact students with three key innovations: (1) a dynamic Router that adaptively aggregates teacher layers to capture dominant intermediate representations; (2) an EEG MiC student with a Mimic-then-Compress pipeline, which inherits high-dimensional teacher features and then applies structured spatio-temporal compression to avoid a heavy classification head; and (3) spectral distillation that aligns teacher-student representations in the frequency domain to regularize compression and mitigate aliasing and temporal jitter. Experiments on four EEG benchmarks show that DLink enables compact students to outperform lightweight baselines while approaching fully fine-tuned FM performance at substantially lower model size and inference cost.
Chenyan Liu, Yun Lin, Yuhuan Huang, Jiaxin Chang, Binhang Qi, Bo Jiang, Zhiyong Huang, Jin Song Dong
In industrial and open-source software engineering tasks, developers often perform project-wise code editing tasks, including feature enhancement, refactoring, and bug fixing, where the leading AI models are expected to support the productivity. Hence, researchers and practitioners have proposed and adopted many LLM-based solutions to facilitate their real-world development. However, they largely suffer from the balance among predicting scope, accuracy, and efficiency. For example, solutions like Cursor achieve high accuracy only in a local editing scope while its performance drops on cross-file edits. In contrast, solutions like CoEdPilot exhibit efficiency limitations when used to predict project-wise edits.
In this work, we propose TRACE (Tool-integrated RecommendAtion for Code Editing), a novel subsequent code editing solution to push the boundary of scope, accuracy, and efficiency. Our rationale lies in that code edits are triggered for either semantic or syntactic reasons. Therefore, TRACE predicts subsequent edits by interleaving neural-based induction for semantic edit prediction and tool-based deduction for syntactic edit prediction. The tools can be any IDE facilities, such as refactoring tools (e.g., rename) or linting tools (e.g., use-def), providing decent performance of deducing edit-location and edit-generation. Technically, we address the challenge of (1) when to interleave between neural-based and tool-based prediction and (2) how to further improve the performance of neural-based prediction. As for the former, we learn a neural model to detect when to invoke IDE editing tools. As for the latter, we propose a novel and fine-grained editing representation to further boost the performance of neural editing models. ......
Authors' comments: ASE 2025 conference paper, 13 pages
Haocheng Xi, Harman Singh, Yuezhou Hu, Coleman Hooper, Rishabh Tiwari, Aditya Tomar, Minjae Lee, Wonjun Kang et al.
Block-wise diffusion language models (DLMs) generate multiple tokens in any order, offering a promising alternative to the autoregressive decoding pipeline. However, they still remain bottlenecked by memory-bound attention in long-context scenarios. Naive sparse attention fails on DLMs due to a KV Inflation problem, where different queries select different prefix positions, making the union of accessed KV pages large. To address this, we observe that between consecutive denoising steps, only a small fraction of active tokens exhibit significant hidden-state changes, while the majority of stable tokens remain nearly constant. Based on this insight, we propose LOSA (Locality-aware Sparse Attention), which reuses cached prefix-attention results for stable tokens and applies sparse attention only to active tokens. This substantially shrinks the number of KV indices that must be loaded, yielding both higher speedup and higher accuracy. Across multiple block-wise DLMs and benchmarks, LOSA preserves near-dense accuracy while significantly improving efficiency, achieving up to +9 points in average accuracy at aggressive sparsity levels while maintaining 1.54x lower attention density. It also achieves up to 4.14x attention speedup on RTX A6000 GPUs, demonstrating the effectiveness of the proposed method.
Authors' comments: 16 pages, 11 figures, 6 tables
Aryan Singh Dalal, Maria Baloch, Asiyah Yu Lin, Anna Maria Masci, Kathleen M. Jagodnik, Hande Kucuk McGinty
The Semantic Web standardizes concept meaning for humans and machines, enabling machine-operable content and consistent interpretation that improves advanced analytics. Reusing ontologies speeds development and enforces consistency, yet selecting the optimal choice is challenging because authors lack systematic selection criteria and often rely on intuition that is difficult to justify, limiting reuse. To solve this, WiseOWL is proposed, a methodology with scoring and guidance to select ontologies for reuse. It scores four metrics: (i) Well-Described, measuring documentation coverage; (ii) Well-Defined, using state-of-the-art embeddings to assess label-definition alignment; (iii) Connection, capturing structural interconnectedness; and (iv) Hierarchical Breadth, reflecting hierarchical balance. WiseOWL outputs normalized 0-10 scores with actionable feedback. Implemented as a Streamlit app, it ingests OWL format, converts to RDF Turtle, and provides interactive visualizations. Evaluation across six ontologies, including the Plant Ontology (PO), Gene Ontology (GO), Semanticscience Integrated Ontology (SIO), Food Ontology (FoodON), Dublin Core (DC), and GoodRelations, demonstrates promising effectiveness.
Authors' comments: 7 pages, 2 figures. Submitted to a conference
Suyoung Kim, Sunghyun Wee, Hyeonjin Kim, Kyomin Hwang, Hyunho Lee, Nojun Kwak
Rotation-based Post-Training Quantization (PTQ) has emerged as a promising solution for mitigating activation outliers in the quantization of Large Language Models (LLMs). Global rotation methods achieve inference efficiency by fusing activation rotations into attention and FFN blocks, but suffer from limited expressivity as they are constrained to use a single learnable rotation matrix across all layers. To tackle this, layer-wise transformation methods emerged, achieving superior accuracy through localized adaptation. However, layer-wise methods cannot fuse activation rotation matrices into weights, requiring online computations and causing significant overhead. In this paper, we propose ReSpinQuant, a quantization framework that resolves such overhead by leveraging offline activation rotation fusion and matching basis using efficient residual subspace rotation. This design reconciles the high expressivity of layer-wise adaptation with only negligible inference overhead. Extensive experiments on W4A4 and W3A3 quantization demonstrate that ReSpinQuant achieves state-of-the-art performance, outperforming global rotation methods and matching the accuracy of computationally expensive layer-wise methods with minimal overhead.
Prasanjit Dubey, Xiaoming Huo
Simultaneously testing $K$ hypotheses while controlling the family-wise error rate is a fundamental problem in statistics. Existing procedures (Bonferroni, Holm, Hochberg, Hommel) provide valid control but sacrifice power, increasingly so as $K$ grows, because they base decisions on marginal $p$-value ranks rather than the joint likelihood. Rosset et al. (2022) formulated the most powerful family-wise-error-rate-controlling test as a dual program and proved the existence of an optimal dual vector $μ^*$, but left its computation as an open problem. We solve this problem for $K$ exchangeable hypotheses. The key insight is that the family-wise error rate constraint coefficients $b_{l,k}(\vec{u})$ admit closed-form expressions through elementary symmetric polynomials of the likelihood-ratio values $g(u_1), \ldots, g(u_K)$. This algebraic structure implies a global monotonicity theorem: the target functions $F_γ(μ) = {\rm FWER}_γ(\vec{D}^μ)$ are simultaneously non-increasing in every component of $μ$, for arbitrary $K$, which guarantees unique coordinate-wise roots and enables a bisection-based coordinate-descent algorithm with $O(\log \varepsilon^{-1})$ convergence rate. The relative power gain over Hommel's method grows from 15\% at $K{=}3$ to 84\% at $K{=}12$. Applications to replication studies, a clinical trial, and a replicability assessment illustrate both the power gains and the role of the exchangeability assumption.
Hanwen Liu
Motivated by Hartogs' theorem on separate holomorphicity, we establish two rigidity results in complex geometry.
First, we prove a lemma on regularity for weak solutions of algebraic differential equations, namely, we show that a fiber-wise distributional $L^2$ solution to an algebraic linear differential equation automatically upgrades to a global holomorphic solution on the total space, assuming the equation admits a compatible $L^2$ distributional solution along a transverse compact complex submanifold that maps birationally to the base.
Second, for a compact Kobayashi hyperbolic manifold fibered over $\mathbb{P}^1$, we prove that a continuous, fiber-wise holomorphic map of degree $1$ to a projective variety is a bi-holomorphic isomorphism, provided it is injective on a very ample hypersurface.
Authors' comments: 7 pages, 0 figure
Zhiqi Yang, Jin-Liang Xiao, Shan Yin, Liang-Jian Deng, Gemine Vivone
Pansharpening aims to generate high-resolution multispectral (HRMS) images by fusing low-resolution multispectral (LRMS) and high-resolution panchromatic (PAN) images while preserving both spectral and spatial information. Although deep learning (DL)-based pansharpening methods achieve impressive performance, they require high training cost and large datasets, and often degrade when the test distribution differs from training, limiting generalization. Recent zero-shot methods, trained on a single PAN/LRMS pair, offer strong generalization but suffer from limited fusion quality, high computational overhead, and slow convergence. To address these issues, we propose FMG-Pan, a fast and generalizable model-guided instance-wise adaptation framework for real-world pansharpening, achieving both cross-sensor generality and rapid training-inference. The framework leverages a pretrained model to guide a lightweight adaptive network through joint optimization with spectral and physical fidelity constraints. We further design a novel physical fidelity term to enhance spatial detail preservation. Extensive experiments on real-world datasets under both intra- and cross-sensor settings demonstrate state-of-the-art performance. On the WorldView-3 dataset, FMG-Pan completes training and inference for a 512x512x8 image within 3 seconds on an RTX 3090 GPU, significantly faster than existing zero-shot methods, making it suitable for practical deployment.
Kaiyuan Tian, Yu Tang, Gongqingjian Jiang, Baihui Liu, Yifu Gao, Xialin Su, Linbo Qiao, Dongsheng Li
Full-parameter fine-tuning of large language models is constrained by substantial GPU memory requirements. Low-rank adaptation methods mitigate this challenge by updating only a subset of parameters. However, these approaches often limit model expressiveness and yield lower performance than full-parameter fine-tuning. Layer-wise fine-tuning methods have emerged as an alternative, enabling memory-efficient training through static layer importance sampling strategies. However, these methods overlook variations in layer importance across tasks and training stages, resulting in suboptimal performance on downstream tasks. To address these limitations, we propose GRASS, a gradient-based adaptive layer-wise importance sampling framework. GRASS utilizes mean gradient norms as a task-aware and training-stage-aware metric for estimating layer importance. Furthermore, GRASS adaptively adjusts layer sampling probabilities through an adaptive training strategy. We also introduce a layer-wise optimizer state offloading mechanism that overlaps computation and communication to further reduce memory usage while maintaining comparable training throughput. Extensive experiments across multiple models and benchmarks demonstrate that GRASS consistently outperforms state-of-the-art methods, achieving an average accuracy improvement of up to 4.38 points and reducing memory usage by up to 19.97\%.
Authors' comments: Accepted by ACL 2026 Findings
Anuvab Sen, Mir Sayeed Mohammad, Saibal Mukhopadhyay
This paper presents RAVEN, a computationally efficient deep learning architecture for FMCW radar perception. The method processes raw ADC data in a chirp-wise streaming manner, preserves MIMO structure through independent receiver state-space encoders, and uses a learnable cross-antenna mixing module to recover compact virtual-array features. It also introduces an early-exit mechanism so the model can make decisions using only a subset of chirps when the latent state has stabilized. Across automotive radar benchmarks, the approach reports strong object detection and BEV free-space segmentation performance while substantially reducing computation and end-to-end latency compared with conventional frame-based radar pipelines.
Authors' comments: CVPR submission / conference paper
Tomoki Sadatoshi, Antonis Papachristodoulou, Yutaka Hori
Moment dynamics in stochastic chemical kinetics often involve an infinite chain of coupled equations, where lower-order moments depend on higher-order ones, making them analytically intractable. Moment bounding via semidefinite programming provides guaranteed upper and lower bounds on stationary moments. However, this formulation suffers from the rapidly growing size of semidefinite constraints due to the combinatorial growth of moments with the number of molecular species. In this paper, we propose a sparsity-exploiting matrix decomposition method for semidefinite constraints in stationary moment bounding problems to reduce the computational cost of the resulting semidefinite programs. Specifically, we characterize the sparsity structure of moment equations, where each reaction involves only a subset of variables determined by its reactants, and exploit this structure to decompose the semidefinite constraints into smaller ones. We demonstrate that the resulting formulation reduces the computational cost of the optimization problem while providing practically useful bounds.
Yuan-Hao Wei
This paper presents a Structured Source-Wise Adaptive Diffusion Framework for linear and nonlinear blind source separation. The framework interprets each latent dimension as a source component and assigns to it an individual adaptive diffusion mechanism, thereby establishing source-wise latent modeling rather than relying on a single shared latent prior. The resulting formulation learns source recovery and the mixing/reconstruction process jointly within a unified end-to-end objective, allowing model parameters and latent sources to adapt simultaneously during training. This yields a common framework for both linear and nonlinear blind source separation. In the present instantiation, each source is further equipped with its own adaptive Gaussian process (GP) prior to impose source-wise temporal structure on the latent trajectories, while the overall framework is not restricted to Gaussian process priors and can in principle accommodate other structured source priors. The proposed model thus provides a general structured diffusion-based route to unsupervised source recovery, with potential relevance beyond blind source separation to interpretable latent modeling, source-wise disentanglement, and potentially identifiable nonlinear latent-variable learning under appropriate structural conditions.
Daniel Andreas Moj
We show that cascade-free counting from carry theory is a special case of a general transfer matrix construction. For any binary stateful digit-wise operation with GEN/PROP/KILL decomposition, the number of cascade-free sequences of length $L$ depends on only two parameters: the alphabet size $N$ and the product $d = |\text{GEN}| \cdot |\text{PROP}|$. The resulting sequence satisfies $a(L) = N a(L-1) - d a(L-2)$ and equals a scaled Chebyshev polynomial of the second kind with coupling parameter $x = N/(2\sqrt{d}) \geq 1$. We instantiate this for digit-wise addition and doubling in base $p$. For odd primes the exact relation $a_{\text{carry}}(L) = p^L a_{\text{dbl}}(L)$ holds. For $p = 3$ the cascade-free doubling count equals the Fibonacci bisection $F(2L+2)$ via $U_L(3/2) = F(2L+2)$ (OEIS A001906); we are not aware of this interpretation in the existing literature. We analyse the dispersion index $D = \text{Var}(ν)/E[ν]$ of the state count for uniformly distributed inputs. For symmetric chains ($g = k$) the Poisson transition $D_\infty = 1$ occurs at $μ= 1/3$, corresponding to base 3 where the Fibonacci bisection appears. The finite Poisson transition point $μ^*(L)$ decreases strictly to $1/3$ with rate $1/(6L) + O(1/L^2)$. We generalise to state spaces $|S| > 2$ via state avoidance. The restricted transfer matrix has dimension $s-1$; the Chebyshev representation persists for $|S| = 3$.
Authors' comments: 20 pages, 5 tables. Submitted to Advances in Applied Mathematics
Chenghao Yue, Zhiyuan Ma, Zhongye Xia, Xinche Zhang, Yisi Zhang, Xinke Shen, Sen Song
Electroencephalography (EEG) provides a non-invasive window into brain activity, offering high temporal resolution crucial for understanding and interacting with neural processes through brain-computer interfaces (BCIs). Current dual-stream neural networks for EEG often process temporal and spatial features independently through parallel branches, delaying their integration until a final, late-stage fusion. This design inherently leads to an "information silo" problem, precluding intermediate cross-stream refinement and hindering spatial-temporal decompositions essential for full feature utilization. We propose LI-DSN, a layer-wise interactive dual-stream network that facilitates progressive, cross-stream communication at each layer, thereby overcoming the limitations of late-fusion paradigms. LI-DSN introduces a novel Temporal-Spatial Integration Attention (TSIA) mechanism, which constructs a Spatial Affinity Correlation Matrix (SACM) to capture inter-electrode spatial structural relationships and a Temporal Channel Aggregation Matrix (TCAM) to integrate cosine-gated temporal dynamics under spatial guidance. Furthermore, we employ an adaptive fusion strategy with learnable channel weights to optimize the integration of dual-stream features. Extensive experiments across eight diverse EEG datasets, encompassing motor imagery (MI) classification, emotion recognition, and steady-state visual evoked potentials (SSVEP), consistently demonstrate that LI-DSN significantly outperforms 13 state-of-the-art (SOTA) baseline models, showcasing its superior robustness and decoding performance. The code will be publicized after acceptance.
Xiang Yang, Feifei Li, Mi Zhang, Geng Hong, Xiaoyu You, Min Yang
Recent Text-to-Image (T2I) models based on rectified-flow transformers (e.g., SD3, FLUX) achieve high generative fidelity but remain vulnerable to unsafe semantics, especially when triggered by multi-token interactions. Existing mitigation methods largely rely on fine-tuning or attention modulation for concept unlearning; however, their expensive computational overhead and design tailored to U-Net-based denoisers hinder direct adaptation to transformer-based diffusion models (e.g., MMDiT). In this paper, we conduct an in-depth analysis of the attention mechanism in MMDiT and find that unsafe semantics concentrate within interpretable, low-dimensional subspaces at head level, where a finite set of safety-critical heads is responsible for unsafe feature extraction. We further observe that perturbing the Rotary Positional Embedding (RoPE) applied to the query and key vectors can effectively modify some specific concepts in the generated images. Motivated by these insights, we propose SafeRoPE, a lightweight and fine-grained safe generation framework for MMDiT. Specifically, SafeRoPE first constructs head-wise unsafe subspaces by decomposing unsafe embeddings within safety-critical heads, and computes a Latent Risk Score (LRS) for each input vector via projection onto these subspaces. We then introduce head-wise RoPE perturbations that can suppress unsafe semantics without degrading benign content or image quality. SafeRoPE combines both head-wise LRS and RoPE perturbations to perform risk-specific head-wise rotation on query and key vector embeddings, enabling precise suppression of unsafe outputs while maintaining generation fidelity. Extensive experiments demonstrate that SafeRoPE achieves SOTA performance in balancing effective harmful content mitigation and utility preservation for safe generation of MMDiT. Codes are available at https://github.com/deng12yx/SafeRoPE.
Authors' comments: CVPR26
Anika Goel, Samir Salim, Sara L. Ellison, Shobita Satyapal, Sheyda Salehirad, Robert W. Bickley, Christopher J. Agostino
In this paper, we investigate the robustness of WISE mid-IR color selection (W1-W2) for identifying obscured (Type 2) active galactic nuclei (AGNs) at low redshift (z<0.3), using a sample of ~360,000 SDSS galaxies classified via emission lines into Seyfert 2 (Sy2), LINER, and star-forming (BPT-SF) galaxies. We find that the K-correction is essential to remove non-AGN contamination, and once applied the simple W1-W2>0.5 selection emerges as optimal in terms of purity and completeness of AGN selection. However, we confirm that even this lenient cut selects only ~13% of Sy2 galaxies and that achieving W1-W2>0.5 requires AGN contributing >75% of the total infrared luminosity, which is uncommon. Although mid-IR-selected Sy2s tend to be luminous, the high [OIII] luminosity does not guarantee red W1-W2 (nor does any other tested global or NLR-scale parameter), suggesting the critical role of obscuration on smaller scales. <1% of BPT-SF systems (but making ~20% of all mid-IR selected galaxies) exhibit W1-W2>0.5 colors. Such colors cannot be reproduced by models of star-heated dust alone. Red BPT-SFs tend to have higher W4 luminosities than expected from SF, indicating true AGNs. Intriguingly, mid-IR AGNs in massive bulges ($M_{\mathrm{bulge}} \gtrsim 10^{10} M_{\odot}$) predominantly (84%) manifest themselves as BPT-AGNs, whereas those in low-mass bulges ($\lesssim 10^{10} M_{\odot}$) mostly (60%) manifest as BPT-SF. This BPT-AGN vs.\ BPT-SF dichotomy does not extend to total stellar mass. We conclude that although the mid-IR AGN selection is incomplete, its strength lies in identifying optically inconspicuous AGNs with low-mass bulges, regardless of the total mass.
Authors' comments: Submitted to ApJ. Comments welcome