Tao Guo, Ruida Zhou, Chao Tian
In a private information retrieval (PIR) system, the user needs to retrieve one of the possible messages from a set of storage servers, but wishes to keep the identity of the requested message private from any given server. Existing efforts in this area have made it clear that the efficiency of the retrieval will be impacted significantly by the amount of storage space allowed at the servers. In this work, we consider the tradeoff between the storage cost and the retrieval cost. We first present three fundamental results: 1) a regime-wise 2-approximate characterization of the optimal tradeoff, 2) a cyclic permutation lemma that can produce more sophisticated codes from simpler ones, and 3) a relaxed entropic linear program (LP) lower bound that has polynomial complexity. Equipped with the cyclic permutation lemma, we then propose two novel code constructions, and by applying the lemma, obtain new storage-retrieval points. Furthermore, we derive more explicit lower bounds by utilizing only a subset of the constraints in the relaxed entropic LP in a systematic manner. Though the new upper bound and lower bound do not lead to a more precise approximate characterization in general, they are significantly tighter than the existing art.
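For readers new to the PIR setting, the following is a minimal sketch of the classical two-server XOR scheme with fully replicated storage. It is not one of the code constructions of this work; it only illustrates the privacy requirement at the high-storage end of the tradeoff, where every server stores every message. The function names and the representation of messages as integers are assumptions made for the illustration.

```python
import random


def make_queries(num_messages, wanted, rng):
    """Build one query per server; each query alone is a uniformly random subset."""
    s1 = {i for i in range(num_messages) if rng.random() < 0.5}
    s2 = s1 ^ {wanted}  # symmetric difference: toggle the wanted index
    return s1, s2


def answer(messages, query):
    """Server side: XOR of all stored messages whose index is in the query."""
    acc = 0
    for i in query:
        acc ^= messages[i]
    return acc


def retrieve(messages, wanted, rng=None):
    """User side: XOR the two answers; every index except `wanted` cancels."""
    # random.Random is not cryptographically secure; it is used here only
    # to keep the sketch short.
    rng = rng or random.Random()
    q1, q2 = make_queries(len(messages), wanted, rng)
    return answer(messages, q1) ^ answer(messages, q2)
```

Neither server can tell which index was toggled, because each query on its own is a uniformly random subset of the indices. The price is that both servers hold all messages, which is the storage cost that the tradeoff studied above aims to reduce.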
Numan Khurshid, Talha Hanif, Mohbat Tharani, Murtaza Taj
Cross-modal retrieval aims to measure the content similarity between
different types of data. The idea has been previously applied to visual, text,
and speech data. In this paper, we present a novel cross-modal retrieval method
specifically for multi-view images, called Cross-view Image Retrieval (CVIR). Our
approach aims to find a feature space as well as an embedding space in which
samples from street-view images are compared directly to satellite-view images
(and vice-versa). For this comparison, a novel deep metric learning based
solution "DeepCVIR" has been proposed. Previous cross-view image datasets are
deficient in that they (1) lack class information; (2) were originally
collected for cross-view image geolocalization task with coupled images; (3) do
not include any images from off-street locations. To train, compare, and
evaluate the performance of cross-view image retrieval, we present a new
6-class cross-view image dataset termed CrossViewRet, which comprises images
of freeway, mountain, palace, river, ship, and stadium scenes, with 700
high-resolution dual-view images for each class. Results show that the proposed
DeepCVIR outperforms conventional matching approaches on the CVIR task for the
given dataset and would also serve as the baseline for future research.
Authors' comments: International Conference on Neural Information Processing
(ICONIP-2019)
Nils C. Geib, Matthias Zilk, Thomas Pertsch, Falk Eilenberger
We present a common pulse retrieval algorithm (COPRA) that can be used for a broad category of ultrashort laser pulse measurement schemes including frequency-resolved optical gating (FROG), interferometric FROG, dispersion scan, time domain ptychography, and pulse shaper assisted techniques such as multiphoton intrapulse interference phase scan (MIIPS). We demonstrate its properties in comprehensive numerical tests and show that it is fast, reliable and accurate in the presence of Gaussian noise. For FROG it outperforms retrieval algorithms based on generalized projections and ptychography. Furthermore, we discuss the pulse retrieval problem as a nonlinear least-squares problem and demonstrate the importance of obtaining a least-squares solution for noisy data. These results improve and extend the possibilities of numerical pulse retrieval. COPRA is faster and provides more accurate results in comparison to existing retrieval algorithms. Furthermore, it enables full pulse retrieval from measurements for which no retrieval algorithm was known before, e.g., MIIPS measurements.
Kyosuke Nishida, Itsumi Saito, Atsushi Otsuka, Hisako Asano, Junji Tomita
This study considers the task of machine reading at scale (MRS) wherein,
given a question, a system first performs the information retrieval (IR) task
of finding relevant passages in a knowledge source and then carries out the
reading comprehension (RC) task of extracting an answer span from the passages.
Previous MRS studies, in which the IR component was trained without considering
answer spans, struggled to accurately find a small number of relevant passages
from a large set of passages. In this paper, we propose a simple and effective
approach that incorporates the IR and RC tasks by using supervised multi-task
learning in order that the IR component can be trained by considering answer
spans. Experimental results on the standard benchmark, answering SQuAD
questions using the full Wikipedia as the knowledge source, showed that our
model achieved state-of-the-art performance. Moreover, we thoroughly evaluated
the individual contributions of our model components with our new Japanese
dataset and SQuAD. The results showed significant improvements in the IR task
and provided a new perspective on IR for RC: it is effective to teach which
part of the passage answers the question rather than to give only a relevance
score to the whole passage.
Authors' comments: 10 pages, 6 figures. Accepted as a full paper at CIKM 2018
Karim Banawan, Sennur Ulukus
We consider the problem of noisy private information retrieval (NPIR) from
$N$ non-communicating databases, each storing the same set of $M$ messages. In
this model, the answer strings are not returned through noiseless bit pipes,
but rather through \emph{noisy} memoryless channels. We aim at characterizing
the PIR capacity for this model as a function of the statistical information
measures of the noisy channels such as entropy and mutual information. We
derive a general upper bound for the retrieval rate in the form of a max-min
optimization. We use the achievable schemes for the PIR problem under
asymmetric traffic constraints and random coding arguments to derive a general
lower bound for the retrieval rate. The upper and lower bounds match for $M=2$
and $M=3$, for any $N$, and any noisy channel. The results imply that
separation between channel coding and retrieval is optimal except for adapting
the traffic ratio from the databases. We refer to this as \emph{almost
separation}. Next, we consider the private information retrieval problem from
multiple access channels (MAC-PIR). In MAC-PIR, the database responses reach
the user through a multiple access channel (MAC) that mixes the responses
together in a stochastic way. We show that for the additive MAC and the
conjunction/disjunction MAC, channel coding and retrieval scheme are
\emph{inseparable} unlike in NPIR. We show that the retrieval scheme depends on
the properties of the MAC, in particular on the linearity aspect. For both
cases, we provide schemes that achieve the full capacity without any loss due
to the privacy constraint, which implies that the user can exploit the nature
of the channel to improve privacy. Finally, we show that the full unconstrained
capacity is not always attainable by determining the capacity of the selection
channel.
Authors' comments: Submitted to IEEE Transactions on Information Theory, July 2018
Mohini P. Sardey, G. K. Kharate
A basic group of visual features such as color, shape, and texture is used in
Content-Based Image Retrieval (CBIR) to find images in an image database that
are similar to a query image or a subregion of an image. Relevance feedback is
often used in CBIR to help users express their preferences and improve query
results. In this paper, a new approach for image retrieval is proposed that is
based on three features: Color Histogram, Eigen Values, and Match Point.
Images from various types of databases are first identified by using edge
detection techniques. Once an image is identified, it is searched for in the
corresponding database, and all related images are displayed, which saves
retrieval time. To then retrieve the precise query image, any of the three
techniques can be used, and they are compared with respect to average
retrieval time. The Eigen value technique is found to be the best of the
three.
Authors' comments: 9 pages, 4 figures, 2 tables
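As an illustration of the Color Histogram feature mentioned above, the following is a minimal sketch of histogram-based image comparison. The abstract does not state which similarity measure is used, so histogram intersection, the bin count, and the function names are all assumptions made for the example.

```python
from collections import Counter


def color_histogram(pixels, bins=4):
    """Normalized histogram over quantized (r, g, b) values in 0..255."""
    step = 256 // bins  # width of one bin per channel (assumes bins divides 256)
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {key: count / total for key, count in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical normalized histograms."""
    return sum(min(value, h2.get(key, 0.0)) for key, value in h1.items())
```

A database image would then be ranked by its intersection score against the query histogram, with higher scores meaning a closer color distribution.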
C Ravindranath Chowdary, Anil Kumar Singh, Anil Nelakanti
Retrieval and content management are assumed to be mutually exclusive. In this paper we suggest that they need not be so. In the usual information retrieval scenario, some information about queries leading to a website (due to `hits' or `visits') is available to the server administrator of the concerned website. This information can be used to better present the content on the website. Further, we suggest that some more information can be shared by the retrieval system with the content provider. This will enable the content provider (any website) to have a more dynamic presentation of the content that is in tune with the query trends, without violating the privacy of the querying user. The result will be a better synchronization between retrieval systems and content providers, with the purpose of improving the user's web search experience. This will also give the content provider a say in this process, given that the content provider is the one who knows much more about the content than the retrieval system. It also means that the content presentation may change in response to a query. In the end, the user will be able to find the relevant content more easily and quickly.
Robin Aly, Maria Eskevich, Roeland Ordelman, Gareth J. F. Jones
This report describes metrics for the evaluation of the effectiveness of
segment-based retrieval based on existing binary information retrieval metrics.
These metrics are described in the context of a task for the hyperlinking of
video segments. This evaluation approach re-uses existing evaluation measures
from the standard Cranfield evaluation paradigm. Our adaptation approach can in
principle be used with any kind of effectiveness measure that uses binary
relevance, and for other segment-based retrieval tasks. In our video
hyperlinking setting, we use precision at a cut-off rank n and mean average
precision.
Authors' comments: Explanation of evaluation measures for the linking task of the
MediaEval Workshop 2013
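The two measures named in this report, precision at a cut-off rank n and mean average precision, can be sketched as follows for binary relevance. This is a minimal illustration of the standard definitions, not the report's segment-based adaptation: it assumes each returned segment has already been mapped to a binary judgement (1 relevant, 0 not), and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def precision_at_n(relevance: Sequence[int], n: int) -> float:
    """Fraction of the top-n ranked items that are relevant."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Ranks beyond the end of the list count as non-relevant.
    return sum(relevance[:n]) / n


def average_precision(relevance: Sequence[int], num_relevant: int) -> float:
    """Sum of precision at each relevant rank, divided by all relevant items."""
    if num_relevant == 0:
        return 0.0
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / num_relevant


def mean_average_precision(runs: Sequence[Tuple[Sequence[int], int]]) -> float:
    """Mean of average precision over (ranked judgements, num_relevant) pairs."""
    return sum(average_precision(rel, k) for rel, k in runs) / len(runs)
```

For a ranked list judged `[1, 0, 1, 0]` with two relevant items in total, precision at rank 2 is 0.5 and average precision is (1/1 + 2/3) / 2.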
Michael R. Line, Aaron Wolf, Xi Zhang, Heather Knutson, Joshua Kammer, Elias Ellison, Pieter Deroo, Dave Crisp et al.
Spectra of exoplanet atmospheres provide us the opportunity to improve our
understanding of these objects just as remote sensing in our own solar system
has increased our understanding of the solar system bodies. The challenge is to
quantitatively determine the range of temperatures and species abundances
allowed by the data. This challenge is often difficult given the low
information content of most exoplanet spectra which commonly leads to
degeneracies in the interpretation. A variety of temperature and abundance
retrieval approaches have been applied to exoplanet spectra, but no previous
investigations have sought to compare these approaches. In this investigation
we compare three different retrieval methods: Optimal Estimation, Differential
Evolution Markov Chain Monte Carlo, and Bootstrap Monte Carlo. We call our
suite of retrieval algorithms the Caltech Inverse Modeling and Retrieval
Algorithms (CHIMERA). We discuss what we can expect in terms of uncertainties
in abundances and temperatures given current observations as well as potential
future observations and what conclusions can be drawn given those
uncertainties. In general we find that the three approaches agree for high
quality spectra expected to come from potential future spaceborne missions, but
disagree for low quality spectra representative of current observations. We
also show that the Gaussian posterior probability distribution assumption made
in the Optimal Estimation approach is valid for high quality spectral data. We
also discuss the implications of our models for the inferred C to O ratios of
exoplanetary atmospheres, which of course are important for understanding
formation environments. More specifically we show that in the observational
limit of a few photometric points, the retrieved C/O is biased towards values
near solar and near one simply due to the assumption of uninformative priors.
Authors' comments: 27 pages, 13 figures
Konstantin Avrachenkov, Evsey Morozov
We consider a GI/G/c/K-type retrial queueing system with constant retrial rate. The system consists of a primary queue and an orbit queue. The primary queue has $c$ identical servers and can accommodate at most $K$ jobs. If a newly arriving job finds the primary queue full, it joins the orbit. The original primary jobs arrive at the system according to a renewal process. The jobs have general i.i.d. service times. A job at the front of the orbit queue retries to enter the primary queue after an exponentially distributed time independent of the orbit queue length. Telephone exchange systems, Medium Access Protocols and short TCP transfers are just some applications of the proposed queueing system. For this system we establish minimal sufficient stability conditions. Our model is very general. In addition to the known particular cases (e.g., M/G/1/1 or M/M/c/c systems), the proposed model covers as particular cases the deterministic service model and the Erlang model with constant retrial rate. The latter particular cases have not been considered in the past. The obtained stability conditions have a clear probabilistic interpretation.
Tewfik Kernane
We study the stability of single-server retrial queues under a general distribution for retrial times and stationary ergodic service times, for three main retrial policies studied in the literature: classical linear, constant and control policies. We use the renovating events approach to obtain sufficient stability conditions by strong coupling convergence of the process modeling the dynamics of the system to a unique stationary ergodic regime. We also obtain instability conditions by convergence in distribution to improper limiting sequences.
Jovan Pehcevski, James A. Thom, Anne-Marie Vercoustre
This paper investigates the impact of three approaches to XML retrieval:
using Zettair, a full-text information retrieval system; using eXist, a native
XML database; and using a hybrid system that takes full article answers from
Zettair and uses eXist to extract elements from those articles. For the
content-only topics, we undertake a preliminary analysis of the INEX 2003
relevance assessments in order to identify the types of highly relevant
document components. Further analysis identifies two complementary sub-cases of
relevance assessments ("General" and "Specific") and two categories of topics
("Broad" and "Narrow"). We develop a novel retrieval module that for a
content-only topic utilises the information from the resulting answer list of a
native XML database and dynamically determines the preferable units of
retrieval, which we call "Coherent Retrieval Elements". The results of our
experiments show that -- when each of the three systems is evaluated against
different retrieval scenarios (such as different cases of relevance
assessments, different topic categories and different choices of evaluation
metrics) -- the XML retrieval systems exhibit varying behaviour and the best
performance can be reached for different values of the retrieval parameters. In
the case of INEX 2003 relevance assessments for the content-only topics, our
newly developed hybrid XML retrieval system is substantially more effective
than either Zettair or eXist, and yields robust and very effective XML
retrieval.
Authors' comments: Postprint version. The editor version can be accessed through the DOI
Chenghao Yue, Zhiyuan Ma, Zhongye Xia, Xinche Zhang, Yisi Zhang, Xinke Shen, Sen Song
Electroencephalography (EEG) provides a non-invasive window into brain activity, offering high temporal resolution crucial for understanding and interacting with neural processes through brain-computer interfaces (BCIs). Current dual-stream neural networks for EEG often process temporal and spatial features independently through parallel branches, delaying their integration until a final, late-stage fusion. This design inherently leads to an "information silo" problem, precluding intermediate cross-stream refinement and hindering spatial-temporal decompositions essential for full feature utilization. We propose LI-DSN, a layer-wise interactive dual-stream network that facilitates progressive, cross-stream communication at each layer, thereby overcoming the limitations of late-fusion paradigms. LI-DSN introduces a novel Temporal-Spatial Integration Attention (TSIA) mechanism, which constructs a Spatial Affinity Correlation Matrix (SACM) to capture inter-electrode spatial structural relationships and a Temporal Channel Aggregation Matrix (TCAM) to integrate cosine-gated temporal dynamics under spatial guidance. Furthermore, we employ an adaptive fusion strategy with learnable channel weights to optimize the integration of dual-stream features. Extensive experiments across eight diverse EEG datasets, encompassing motor imagery (MI) classification, emotion recognition, and steady-state visual evoked potentials (SSVEP), consistently demonstrate that LI-DSN significantly outperforms 13 state-of-the-art (SOTA) baseline models, showcasing its superior robustness and decoding performance. The code will be publicized after acceptance.
Xiang Yang, Feifei Li, Mi Zhang, Geng Hong, Xiaoyu You, Min Yang
Recent Text-to-Image (T2I) models based on rectified-flow transformers (e.g., SD3, FLUX) achieve high generative fidelity but remain vulnerable to unsafe semantics, especially when triggered by multi-token interactions. Existing mitigation methods largely rely on fine-tuning or attention modulation for concept unlearning; however, their expensive computational overhead and design tailored to U-Net-based denoisers hinder direct adaptation to transformer-based diffusion models (e.g., MMDiT). In this paper, we conduct an in-depth analysis of the attention mechanism in MMDiT and find that unsafe semantics concentrate within interpretable, low-dimensional subspaces at head level, where a finite set of safety-critical heads is responsible for unsafe feature extraction. We further observe that perturbing the Rotary Positional Embedding (RoPE) applied to the query and key vectors can effectively modify some specific concepts in the generated images. Motivated by these insights, we propose SafeRoPE, a lightweight and fine-grained safe generation framework for MMDiT. Specifically, SafeRoPE first constructs head-wise unsafe subspaces by decomposing unsafe embeddings within safety-critical heads, and computes a Latent Risk Score (LRS) for each input vector via projection onto these subspaces. We then introduce head-wise RoPE perturbations that can suppress unsafe semantics without degrading benign content or image quality. SafeRoPE combines both head-wise LRS and RoPE perturbations to perform risk-specific head-wise rotation on query and key vector embeddings, enabling precise suppression of unsafe outputs while maintaining generation fidelity. Extensive experiments demonstrate that SafeRoPE achieves SOTA performance in balancing effective harmful content mitigation and utility preservation for safe generation of MMDiT. Codes are available at https://github.com/deng12yx/SafeRoPE.
Authors' comments: CVPR26
Anika Goel, Samir Salim, Sara L. Ellison, Shobita Satyapal, Sheyda Salehirad, Robert W. Bickley, Christopher J. Agostino
In this paper, we investigate the robustness of WISE mid-IR color selection (W1-W2) for identifying obscured (Type 2) active galactic nuclei (AGNs) at low redshift (z<0.3), using a sample of ~360,000 SDSS galaxies classified via emission lines into Seyfert 2 (Sy2), LINER, and star-forming (BPT-SF) galaxies. We find that the K-correction is essential to remove non-AGN contamination, and once applied the simple W1-W2>0.5 selection emerges as optimal in terms of purity and completeness of AGN selection. However, we confirm that even this lenient cut selects only ~13% of Sy2 galaxies and that achieving W1-W2>0.5 requires AGN contributing >75% of the total infrared luminosity, which is uncommon. Although mid-IR-selected Sy2s tend to be luminous, the high [OIII] luminosity does not guarantee red W1-W2 (nor does any other tested global or NLR-scale parameter), suggesting the critical role of obscuration on smaller scales. Fewer than 1% of BPT-SF systems (which nonetheless make up ~20% of all mid-IR selected galaxies) exhibit W1-W2>0.5 colors. Such colors cannot be reproduced by models of star-heated dust alone. Red BPT-SFs tend to have higher W4 luminosities than expected from SF, indicating true AGNs. Intriguingly, mid-IR AGNs in massive bulges ($M_{\mathrm{bulge}} \gtrsim 10^{10} M_{\odot}$) predominantly (84%) manifest themselves as BPT-AGNs, whereas those in low-mass bulges ($\lesssim 10^{10} M_{\odot}$) mostly (60%) manifest as BPT-SF. This BPT-AGN vs.\ BPT-SF dichotomy does not extend to total stellar mass. We conclude that although the mid-IR AGN selection is incomplete, its strength lies in identifying optically inconspicuous AGNs with low-mass bulges, regardless of the total mass.
Authors' comments: Submitted to ApJ. Comments welcome
Rui Chen, Nan Jiang
Chance-constrained programs (CCPs) provide a powerful modeling framework for decision-making under uncertainty, but their nonconvex feasible regions make them computationally challenging. A widely used convex inner approximation replaces chance constraints with Conditional Value-at-Risk (CVaR) constraints; however, the resulting solutions can be overly conservative and suboptimal. We propose a scenario-wise scaling approach that strengthens CVaR approximations for CCPs with finitely supported uncertainty. The method introduces scaling factors that reweight individual scenarios within the CVaR constraint, yielding a family of potentially tighter inner approximations. We establish sufficient conditions under which, for a suitable choice of scaling factors, the scaled CVaR approximation attains the same optimal value as the original CCP and admits a (near-)optimal solution of the CCP. We show that these conditions are tight and further relax them in the convex setting. We also show that optimizing over scenario-wise scaling factors is NP-hard. To address this computational challenge, we develop efficient heuristic and sequential convex approximation algorithms that iteratively update the scaling factors and generate improved feasible solutions. Numerical experiments demonstrate that the proposed methods consistently improve upon standard CVaR and state-of-the-art convex approximations, often reducing conservativeness while maintaining tractability.
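The conservativeness of the CVaR approximation, and the way scaling can reduce it, can be illustrated with a small numerical sketch. It assumes the chance constraint has the form P(g(x, xi) <= 0) >= 1 - eps over finitely many scenarios, evaluates CVaR with the Rockafellar-Uryasev formula at a fixed decision x, and applies a positive multiplicative factor to each scenario's constraint value. How the scaling factors enter in the paper is not spelled out in the abstract, so that form, the function names, and the numbers are assumptions.

```python
def cvar(losses, probs, eps):
    """CVaR at level 1 - eps: min over t of t + E[max(g - t, 0)] / eps."""
    def objective(t):
        return t + sum(p * max(g - t, 0.0) for g, p in zip(losses, probs)) / eps
    # The objective is convex and piecewise linear in t, so for a finitely
    # supported distribution a minimizer lies at one of the scenario values.
    return min(objective(t) for t in losses)


def violation_probability(losses, probs):
    """Probability that the constraint g <= 0 is violated."""
    return sum(p for g, p in zip(losses, probs) if g > 0)


def scale(losses, factors):
    """Multiply each scenario's value by a positive factor (sign is preserved)."""
    return [s * g for s, g in zip(factors, losses)]
```

With scenario values `[-4, -3, -0.5, 1]`, equal probabilities 0.25, and eps = 0.5, the violation probability is 0.25, so the chance constraint holds, yet the CVaR is 0.25 > 0 and the plain CVaR constraint rejects the point. Scaling the last scenario by 0.25 leaves the violation probability unchanged, because positive factors preserve signs, and lowers the CVaR to -0.125, so the scaled constraint accepts it.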
Daichi Yashima, Koki Seno, Shuhei Kurita, Yusuke Oda, Komei Sugiura
Coarse-to-fine autoregressive modeling has recently shown strong promise for visuomotor policy learning, combining the inference efficiency of autoregressive methods with the global trajectory coherence of diffusion-based policies. However, existing approaches rely on discrete action tokenizers that map continuous action sequences to codebook indices, a design inherited from image generation where learned compression is necessary for high-dimensional pixel data. We observe that robot actions are inherently low-dimensional continuous vectors, for which such tokenization introduces unnecessary quantization error and a multi-stage training pipeline. In this work, we propose Hierarchical Flow Policy (HiFlow), a tokenization-free coarse-to-fine autoregressive policy that operates directly on raw continuous actions. HiFlow constructs multi-scale continuous action targets from each action chunk via simple temporal pooling. Specifically, it averages contiguous action windows to produce coarse summaries that are refined at finer temporal resolutions. The entire model is trained end-to-end in a single stage, eliminating the need for a separate tokenizer. Experiments on MimicGen, RoboTwin 2.0, and real-world environments demonstrate that HiFlow consistently outperforms existing methods including diffusion-based and tokenization-based autoregressive policies.
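The temporal pooling step described above can be sketched in a few lines. This is a minimal illustration, not HiFlow's implementation: it assumes non-overlapping windows, a chunk length divisible by every window size, and a coarse-to-fine schedule of window sizes (4, 2, 1); the function names are hypothetical.

```python
def temporal_pool(chunk, window):
    """Average each run of `window` consecutive actions, dimension by dimension."""
    if len(chunk) % window != 0:
        raise ValueError("chunk length must be divisible by the window size")
    dim = len(chunk[0])
    pooled = []
    for start in range(0, len(chunk), window):
        block = chunk[start:start + window]
        pooled.append([sum(action[d] for action in block) / window for d in range(dim)])
    return pooled


def multi_scale_targets(chunk, windows=(4, 2, 1)):
    """Coarse-to-fine targets: one pooled sequence per window size."""
    return [temporal_pool(chunk, w) for w in windows]
```

For a chunk of four 2-D actions, this yields one global average, then two half-chunk averages, then the raw actions themselves, all of which stay continuous and need no codebook.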
Yuan-Hao Wei
We propose SAHMM-VAE, a source-wise adaptive Hidden Markov prior variational autoencoder for unsupervised blind source separation. Instead of treating the latent prior as a single generic regularizer, the proposed framework assigns each latent dimension its own adaptive regime-switching prior, so that different latent dimensions are pulled toward different source-specific temporal organizations during training. Under this formulation, source separation is not implemented as an external post-processing step; it is embedded directly into variational learning itself. The encoder, decoder, posterior parameters, and source-wise prior parameters are optimized jointly, where the encoder progressively learns an inference map that behaves like an approximate inverse of the mixing transformation, while the decoder plays the role of the generative mixing model. Through this coupled optimization, the gradual alignment between posterior source trajectories and heterogeneous HMM priors becomes the mechanism through which different latent dimensions separate into different source components. To instantiate this idea, we develop three branches within one common framework: a Gaussian-emission HMM prior, a Markov-switching autoregressive HMM prior, and an HMM state-flow prior with state-wise autoregressive flow transformations. Experiments show that the proposed framework achieves unsupervised source recovery while also learning meaningful source-wise switching structures. More broadly, the method extends our structured-prior VAE line from smooth, mixture-based, and flow-based latent priors to adaptive switching priors, and provides a useful basis for future work on interpretable and potentially identifiable latent source modeling.
Chia-Yu Lee, Huang-Cheng Chou, Tzu-Quan Lin, Yuanchao Li, Ya-Tse Wu, Shrikanth Narayanan, Chi-Chun Lee
Integrating Automatic Speech Recognition (ASR) into Speech Emotion Recognition (SER) enhances modeling by providing linguistic context. However, conventional feature fusion faces performance bottlenecks, and multi-task learning often suffers from optimization conflicts. While task vectors and model merging have addressed such conflicts in NLP and CV, their potential in speech tasks remains largely unexplored. In this work, we propose an Adaptive Layer-wise Task Vector Merging (AdaLTM) framework based on WavLM-Large. Instead of joint optimization, we extract task vectors from in-domain ASR and SER models fine-tuned on emotion datasets. These vectors are integrated into a frozen base model using layer-wise learnable coefficients. This strategy enables depth-aware balancing of linguistic and paralinguistic knowledge across transformer layers without gradient interference. Experiments on the MSP-Podcast demonstrate that the proposed approach effectively mitigates conflicts between ASR and SER.
Authors' comments: Submitted to Interspeech 2026
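The merging step can be sketched as follows. This is a minimal illustration under stated assumptions, not the AdaLTM implementation: weights are plain lists of floats keyed by layer name, each task has one scalar coefficient per layer, and the coefficients are fixed numbers here, whereas the abstract describes them as learnable. All names are hypothetical.

```python
def task_vector(finetuned, base):
    """Per-layer difference between fine-tuned and base weights."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def merge(base, vectors, coeffs):
    """Add each task vector to the frozen base, scaled by a per-layer coefficient."""
    merged = {}
    for name, weights in base.items():
        out = list(weights)  # copy so the base model stays untouched
        for task, vector in vectors.items():
            c = coeffs[task][name]
            out = [w + c * d for w, d in zip(out, vector[name])]
        merged[name] = out
    return merged
```

Because the base weights stay frozen and only the coefficients would be trained, the two tasks never push gradients through shared parameters, which is the sense in which gradient interference is avoided.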
Jimyung Hong, Jaehyung Kim
Large language models (LLMs) have demonstrated remarkable capabilities, but their massive scale poses significant challenges for practical deployment. Structured pruning offers a promising solution by removing entire dimensions or layers, yet existing methods face critical trade-offs: task-agnostic approaches cannot adapt to task-specific requirements, while task-aware methods require costly training to learn task adaptability. We propose DIET (Dimension-wise global pruning of LLMs via merging Task-wise importance scores), a training-free structured pruning method that combines dimension-level granularity with task-aware selection. DIET profiles activation magnitudes across tasks using only 100 samples per task, then applies majority voting to construct a single global mask. DIET does not incur large pre-computation or training costs. Experiments on seven zero-shot benchmarks using Gemma-2 2B and 9B models demonstrate the effectiveness of DIET; for example, at 20% sparsity on Gemma-2 2B, DIET achieves a nearly 10% average accuracy improvement compared to previous state-of-the-art structured pruning methods. This advantage persists across various sparsity levels and model scales, positioning DIET as a practical and robust choice for structured LLM pruning.
Authors' comments: 14 pages, 10 figures. Code available at https://github.com/Jimmy145123/DIET
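The voting step can be sketched as follows. This is a minimal illustration, not the code from the linked repository: it assumes each task votes to keep its highest-scoring dimensions and that a dimension survives when more than half of the tasks vote for it. The function names and the handling of the target sparsity are assumptions.

```python
def task_votes(scores, keep):
    """Indices of the `keep` highest-scoring dimensions for one task."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:keep])


def global_mask(per_task_scores, keep):
    """True for every dimension that a strict majority of tasks votes to keep."""
    n_tasks = len(per_task_scores)
    n_dims = len(per_task_scores[0])
    votes = [0] * n_dims
    for scores in per_task_scores:
        for i in task_votes(scores, keep):
            votes[i] += 1
    return [2 * v > n_tasks for v in votes]
```

In this simplified form the number of surviving dimensions need not equal `keep`, so a real implementation would need an extra rule to hit an exact sparsity level.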