Ming-qiu Huang, Cheng-zu Li, Yuan-ben Dai
We present a QCD sum rule calculation of the Isgur-Wise form factors
$\tau_1(v\cdot v')$ and $\tau_2(v\cdot v')$ for the semileptonic decays $B\to D_1(2420)\ell\bar\nu$
and $B\to D_2^*(2460)\ell\bar\nu$ in the framework of heavy quark effective theory.
These two universal functions, associated with the matching of the weak
currents in QCD onto those in the effective theory, appear at order $1/m_Q$
in the heavy quark expansion of meson weak decay form factors.
Authors' comments: RevTeX, 18 pages including 3 PS figures
Z. Ligeti, Y. Nir, M. Neubert
We calculate, in the framework of QCD sum rules and to next-to-leading order
in perturbation theory, the universal function $\xi_3(v\cdot v')$ which appears
at order $1/m_Q$ in the heavy quark expansion of meson weak decay form factors.
We find that radiative corrections of order $\alpha_s$ are very important. Over
the kinematic range accessible in semileptonic decays, $\xi_3(v\cdot v')$ is
proportional to the leading-order Isgur-Wise function $\xi(v\cdot v')$ to very
good accuracy. Taking into account all sources of uncertainty, we estimate
$\xi_3/\xi=(0.6\pm 0.2)$. This reduces the theoretical uncertainty in the
extraction of $|\,V_{cb}|$ from $\bar B\to D\,\ell\, \bar\nu$ transitions. A
measurement of the form factor ratio $A_2/A_1$ in $\bar B\to D^*\ell\,\bar\nu$
decays can be used to test our prediction.
Authors' comments: 14 pages (ReVTeX, 2 figures available), SLAC-PUB-6146,
WIS-93/33/May-PH
Zahid Hassan Tushar, Sanjay Purushotham
Aerosol Optical Depth (AOD) retrieval is essential for Earth observation, supporting applications from air quality monitoring to climate studies. Conventional physics-based AOD retrieval methods formulate the problem as a pixel-wise inversion, relying on radiative transfer modeling, memory-intensive look-up tables, and auxiliary meteorological data. While recent data-driven approaches have shown promise, many fail to exploit the spatial-spectral coherence of hyperspectral imagery, leading to spatially inconsistent and noise-sensitive retrievals. We present the first study exploring foundation AI models for AOD retrieval and propose ViTCG, a Vision Transformer-based spatial regression framework with channel-wise grouping that reduces retrieval bias and error. ViTCG takes hyperspectral top-of-atmosphere radiance as input and jointly models spatial context and spectral information. Validation with PACE radiance observations shows that ViTCG reduces mean squared error by 62% compared with state-of-the-art foundation models, including Prithvi, and produces spatially coherent AOD fields.
Authors' comments: 5 pages, 4 figures, to appear in 2026 IEEE International Geoscience and Remote Sensing Symposium
H. Kühnle, P. Patapis, P. Mollière, P. Tremblin, E. Matthews, A. M. Glauser, N. Whiteford, M. Vasist et al.
With a temperature of $\sim 285$ K, WISE0855 is the coldest brown dwarf
observed so far. Using the James Webb Space Telescope (JWST), we obtained
observations that allow us to characterize WISE0855's atmosphere, focusing on
vertical variation in the water vapor abundance, measuring trace gas abundances,
and deriving bulk parameters for this cold object. We observed the ultracool
dwarf WISE0855 using the Mid-Infrared Instrument Medium Resolution Spectrometer
(MIRI/MRS) onboard JWST at a spectral resolution of up to 3750. We combined these
observations with published data from the Near Infrared Spectrograph (NIRSpec)
G395M and PRISM modes, yielding a spectrum ranging from 0.8 to 22 μm. We apply
atmospheric retrievals using petitRADTRANS to measure atmospheric abundances,
the pressure-temperature structure, radius and gravity of the brown dwarf. We
also employ publicly available clear and cloudy self-consistent grid models to
estimate bulk properties of the atmosphere such as the effective temperature,
radius, gravity and metallicity. Atmospheric retrievals constrain a variable
water abundance profile in the atmosphere, as predicted by equilibrium
chemistry. We detect the $^{15}$NH$_3$ isotopologue and infer a mass-fraction ratio
$^{14}$NH$_3$/$^{15}$NH$_3 = 332^{+63}_{-43}$ for the clear retrieval. We measure the bolometric
luminosity by integrating the presented spectrum and obtain
$\log(L/L_{\odot}) = -7.291\pm 0.008$. The detected water depletion indicates
that water condenses out in the upper atmosphere due to the very low effective
temperature of WISE0855. The height in the atmosphere where this occurs is
covered by the MIRI/MRS data, which demonstrates the potential of MIRI to
characterize the atmospheres of cold gas giants. Comparing the data to retrievals and
self-consistent grid models, we do not detect signs of water ice clouds,
although their spectral features have been predicted in previous studies.
Authors' comments: Submitted to A&A, 29 pages, 21 figures
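Integrating an observed spectrum into a bolometric luminosity follows L = 4πd² ∫F_λ dλ. A minimal sketch, assuming the spectrum is given as F_λ in W m⁻² μm⁻¹ on a sorted wavelength grid and that the distance is known; the function names are hypothetical, the trapezoidal rule is our choice, and flux outside the sampled 0.8 to 22 μm range is simply ignored (the abstract does not say whether or how the authors extrapolated).

```python
import math

L_SUN_W = 3.828e26     # IAU 2015 nominal solar luminosity in watts
PARSEC_M = 3.0857e16   # one parsec in metres


def trapezoid(x, y):
    """Integrate samples y(x) with the trapezoidal rule; x must be sorted."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def log_bolometric_luminosity(wavelength_um, flux_w_m2_um, distance_pc):
    """Return log10(L / L_sun) for an observed F_lambda spectrum.

    Hypothetical helper: flux outside the sampled wavelength range is ignored.
    """
    f_bol = trapezoid(wavelength_um, flux_w_m2_um)   # integrated flux, W m^-2
    d_m = distance_pc * PARSEC_M
    lum_w = 4.0 * math.pi * d_m ** 2 * f_bol         # isotropic emission assumed
    return math.log10(lum_w / L_SUN_W)


# Toy input: a flat spectrum over 0.8 to 22 um at a made-up distance.
wl = [0.8, 5.0, 10.0, 22.0]
fl = [1e-15, 1e-15, 1e-15, 1e-15]
print(round(log_bolometric_luminosity(wl, fl, 2.0), 3))
```

The uncertainty quoted in the abstract would additionally require propagating flux and distance errors, which this sketch does not attempt.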
Chenyang Shao, Yong Li, Fengli Xu
The rapid development of AI agents has spurred the development of advanced research tools, such as Deep Research. Such tools require a nuanced understanding of the relations within scientific literature, which goes beyond the scope of keyword-based or embedding-based retrieval. Existing retrieval agents mainly focus on content-level similarities and are unable to decode critical relational dynamics, such as identifying corroborating or conflicting studies or tracing technological lineages, all of which are essential for a comprehensive literature review. This fundamental limitation often results in a fragmented knowledge structure, misleading sentiment interpretation, and inadequate modeling of collective scientific progress. To investigate relation-aware retrieval more deeply, we propose SciNetBench, the first Scientific Network Relation-aware Benchmark for literature retrieval agents. Constructed from a corpus of over 18 million AI papers, our benchmark systematically evaluates three levels of relations: ego-centric retrieval of papers with novel knowledge structures, pair-wise identification of scholarly relationships, and path-wise reconstruction of scientific evolutionary trajectories. Through extensive evaluation of three categories of retrieval agents, we find that their accuracy on relation-aware retrieval tasks often falls below 20%, revealing a core shortcoming of current retrieval paradigms. Notably, further experiments on literature review tasks demonstrate that providing agents with relational ground truth improves review quality by a substantial 23.4%, validating the critical importance of relation-aware retrieval. We publicly release our benchmark at https://anonymous.4open.science/r/SciNetBench/ to support future research on advanced retrieval systems.
Chuan Meng, Negar Arabzadeh, Mohammad Aliannejadi, Maarten de Rijke
Query performance prediction (QPP) is a core task in information retrieval.
The QPP task is to predict the retrieval quality of a search system for a query
without relevance judgments. Research has shown the effectiveness and
usefulness of QPP for ad-hoc search. Recent years have witnessed considerable
progress in conversational search (CS). Effective QPP could help a CS system
decide which action to take at the next turn. Despite its
potential, QPP for CS has been little studied. We address this research gap by
reproducing and studying the effectiveness of existing QPP methods in the
context of CS. While the task of passage retrieval remains the same in the two
settings, a user query in CS depends on the conversational history, introducing
novel QPP challenges. In particular, we seek to explore to what extent findings
from QPP methods for ad-hoc search generalize to three CS settings: (i)
estimating the retrieval quality of different query rewriting-based retrieval
methods, (ii) estimating the retrieval quality of a conversational dense
retrieval method, and (iii) estimating the retrieval quality for top ranks vs.
deeper-ranked lists. Our findings can be summarized as follows: (i) supervised
QPP methods distinctly outperform unsupervised counterparts only when a
large-scale training set is available; (ii) point-wise supervised QPP methods
outperform their list-wise counterparts in most cases; and (iii) retrieval
score-based unsupervised QPP methods show high effectiveness in assessing the
conversational dense retrieval method, ConvDR.
Authors' comments: Accepted for publication at SIGIR 2023
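As an illustration of a retrieval score-based unsupervised predictor, the sketch below computes NQC (normalized query commitment), a standard predictor from the ad-hoc QPP literature: the standard deviation of the top-k retrieval scores divided by a corpus-level score. The abstract does not list which score-based predictors were studied, so treat this as one representative example; the function name and default `k` are assumptions.

```python
import math


def nqc(scores, corpus_score, k=100):
    """Normalized Query Commitment: std of the top-k retrieval scores / |corpus score|.

    scores: retrieval scores for one query (any order).
    corpus_score: score of the whole collection treated as one document; it
        normalises away query-specific score scales.
    """
    top = sorted(scores, reverse=True)[:k]
    mean = sum(top) / len(top)
    std = math.sqrt(sum((s - mean) ** 2 for s in top) / len(top))
    return std / abs(corpus_score)


# A larger spread at the top of the ranking is usually taken as a sign of
# higher retrieval quality for that query.
print(nqc([12.1, 11.8, 9.4, 9.0, 8.7], corpus_score=6.5, k=3))
```

Because it needs only the scores the retriever already produces, a predictor of this kind can be applied unchanged to a dense retriever such as ConvDR.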
Qizhi Pei, Lijun Wu, Zhenyu He, Jinhua Zhu, Yingce Xia, Shufang Xie, Rui Yan
Drug-Target binding Affinity (DTA) prediction is essential for drug
discovery. Despite the application of deep learning methods to DTA prediction,
the achieved accuracy remains suboptimal. In this work, inspired by the recent
success of retrieval methods, we propose $k$NN-DTA, a non-parametric
embedding-based retrieval method applied to a pre-trained DTA prediction model,
which can extend the power of the DTA model at no or negligible cost.
Unlike existing methods, we introduce two ways of aggregating neighbors,
in embedding space and in label space, and integrate them into a unified
framework. Specifically, we propose a \emph{label aggregation} with
\emph{pair-wise retrieval} and a \emph{representation aggregation} with
\emph{point-wise retrieval} of the nearest neighbors. This method executes in
the inference phase and can efficiently boost the DTA prediction performance
with no training cost. In addition, we propose an extension, Ada-$k$NN-DTA, an
instance-wise and adaptive aggregation with lightweight learning. Results on
four benchmark datasets show that $k$NN-DTA brings significant improvements,
outperforming previous state-of-the-art (SOTA) results. For example, on the BindingDB
IC$_{50}$ and $K_i$ testbeds, $k$NN-DTA sets new records with RMSE of
$\bf{0.684}$ and $\bf{0.750}$, and the extended Ada-$k$NN-DTA further reduces
RMSE to $\bf{0.675}$ and $\bf{0.735}$. These results demonstrate
the effectiveness of our method. Results in other settings and
comprehensive studies/analyses also show the great potential of our $k$NN-DTA
approach.
Authors' comments: Accepted by 33rd ACM International Conference on Information and
Knowledge Management 2024 (CIKM 2024)
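The label-aggregation idea can be sketched as retrieving the k nearest training pairs in the model's embedding space and blending their affinity labels with the model's own prediction. This is a minimal sketch in the style of kNN-augmented prediction, assuming squared Euclidean distance, softmax weights over negative distances with a `temperature`, and a fixed interpolation weight `lam`; these choices and all names are hypothetical, not necessarily the paper's exact formulation.

```python
import math


def knn_aggregate(query_emb, datastore, model_pred, k=8, temperature=1.0, lam=0.5):
    """Blend a model's affinity prediction with its k nearest training neighbours.

    datastore: list of (embedding, affinity_label) pairs built from training data.
    temperature: softmax temperature over negative distances (hypothetical).
    lam: weight on the neighbour estimate; lam=0 returns the model prediction.
    """
    # Squared Euclidean distance from the query embedding to every stored embedding.
    scored = [(sum((q - e) ** 2 for q, e in zip(query_emb, emb)), label)
              for emb, label in datastore]
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]
    # Softmax over negative distances: closer neighbours receive larger weights.
    logits = [-dist / temperature for dist, _ in nearest]
    top = max(logits)
    weights = [math.exp(logit - top) for logit in logits]
    knn_pred = sum(w * label for w, (_, label) in zip(weights, nearest)) / sum(weights)
    return lam * knn_pred + (1.0 - lam) * model_pred


store = [([0.0, 0.0], 5.0), ([0.0, 0.0], 5.0), ([10.0, 10.0], 100.0)]
print(knn_aggregate([0.0, 0.0], store, model_pred=3.0, k=2))  # 4.0
```

No parameters are trained here, which matches the abstract's point that the basic method runs at inference time only; an adaptive variant would learn `lam` or the weights per instance.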
Michal Shlapentokh-Rothman, Prachi Garg, Yu-Xiong Wang, Derek Hoiem
Keyframe selection is a direct way to provide verifiable visual evidence for long-video question answering (QA). Queries differ in what they require, and finding the right frames depends on knowing what to look for. Existing keyframe selectors either score every frame against a single query or decompose the query into a fixed schema evaluated by a single visual tool. We propose ToolMerge, a keyframe retrieval method based on decomposition and merging: a large language model (LLM)-based planner decomposes the query into tool calls and specifies how their per-tool rankings are merged using boolean operators. To evaluate retrieval directly, we construct Molmo-2 Moments (M2M), a benchmark in which every question is anchored to a specific time interval by construction. Across QA, question retrieval, and caption retrieval, ToolMerge is competitive with prior keyframe selectors and is strongest on caption retrieval, where it outperforms other methods by 5%. Code and data can be found at https://github.com/michalsr/ToolMerge .
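One way to picture merging per-tool rankings with boolean operators: take each tool's top-k frames as a set, combine the sets with AND (intersection) or OR (union), and order the survivors by their best rank across tools. This is a hypothetical sketch; the abstract does not specify ToolMerge's actual merge rule, cut-off `k`, or tie-breaking.

```python
def merge_rankings(rankings, op, k):
    """Merge per-tool frame rankings (lists of frame indices, best first)."""
    top_sets = [set(r[:k]) for r in rankings]
    if op == "AND":
        selected = set.intersection(*top_sets)
    elif op == "OR":
        selected = set.union(*top_sets)
    else:
        raise ValueError(f"unsupported operator: {op}")

    def best_rank(frame):
        # Position of the frame in whichever tool ranks it highest.
        return min(r.index(frame) for r in rankings if frame in r)

    return sorted(selected, key=lambda f: (best_rank(f), f))


object_tool = [3, 1, 2, 0]   # hypothetical ranking from an object detector
text_tool = [1, 0, 3, 2]     # hypothetical ranking from a text/OCR matcher
print(merge_rankings([object_tool, text_tool], "AND", k=2))  # [1]
print(merge_rankings([object_tool, text_tool], "OR", k=2))   # [1, 3, 0]
```

AND favors precision (a frame must satisfy every sub-query), while OR favors recall; choosing between them per query is the part the planner would decide.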
Hamed Shirzad, Frederik Wenkel, Dominique Beaini, Danica J. Sutherland, Emmanuel Noutahi
Knowledge graphs (KGs) offer a rich representation for relational knowledge, but their irregular structure makes retrieval challenging: ego-graph expansion grows rapidly, and dense embedding methods struggle with multi-hop compositional queries. Existing agent-based graph exploration approaches, while expressive, are often too expensive for large-scale retrieval. We introduce SeedER (Seed-and-Expand Retrieval), a retrieval framework that explicitly leverages KG structure through iterative, low-cost expansion. SeedER first seeds a compact set of core nodes using lightweight dense and entity-based retrieval, then selectively expands this set via a learned graph-aware policy trained with reinforcement learning. This design decomposes global reasoning into reusable local decisions, enabling efficient discovery of query-relevant nodes while tightly controlling expansion cost. We show theoretical limitations of dense retrieval on compositional graph queries, and establish advantages of SeedER from both compositional generalization and graph-constrained submodular optimization perspectives. Empirically, SeedER substantially improves recall with compact candidate sets over strong dense and graph-augmented baselines, making it an effective first-stage retriever for knowledge-intensive reasoning systems.
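The seed-then-expand loop can be sketched as follows. SeedER learns its expansion policy with reinforcement learning; here a fixed, hypothetical `score` function stands in for that policy and a simple node `budget` stands in for expansion-cost control, so this shows only the control flow, not the learned component.

```python
def seed_and_expand(graph, seed_scores, num_seeds, budget, score):
    """Seed with the best-scoring nodes, then greedily add frontier nodes.

    graph: dict mapping node -> list of neighbouring nodes.
    seed_scores: dict mapping node -> first-stage (dense/entity) retrieval score.
    score: callable node -> float; a stand-in for a learned expansion policy.
    """
    seeds = sorted(seed_scores, key=lambda n: (-seed_scores[n], n))[:num_seeds]
    selected = list(seeds)
    chosen = set(seeds)
    while len(selected) < budget:
        # Candidates are the unvisited neighbours of everything selected so far.
        frontier = {nb for n in selected for nb in graph.get(n, []) if nb not in chosen}
        if not frontier:
            break
        best = min(frontier, key=lambda nb: (-score(nb), nb))  # highest score, ties by name
        selected.append(best)
        chosen.add(best)
    return selected


graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
seed_scores = {"a": 0.9, "x": 0.1}
policy_scores = {"b": 0.5, "c": 0.7, "d": 0.9}
print(seed_and_expand(graph, seed_scores, num_seeds=1, budget=3,
                      score=lambda n: policy_scores.get(n, 0.0)))  # ['a', 'c', 'b']
```

Each step is a local decision over the current frontier, which is what keeps the candidate set compact compared with expanding a full ego-graph.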
Jung Yi, Minjae Kim, Paul Hyunbin Cho, Wooseok Jang, Sangdoo Yun, Seungryong Kim
Autoregressive video diffusion models have enabled real-time, action-conditioned world generation. However, sustaining a persistent world, where revisiting a previously seen viewpoint yields consistent content, remains an open problem. Full KV-cache attention preserves this consistency but breaks real-time constraints: memory footprint and attention cost grow linearly with rollout length. Sliding-window inference restores throughput but discards long-term consistency. We propose WorldKV, a training-free framework with two components: World Retrieval and World Compression. World Retrieval stores evicted KV-cache chunks in GPU/CPU memory and selectively retrieves scene-relevant chunks via camera/action correspondence, inserting them back into the native attention window without re-encoding. World Compression prunes redundant tokens within each chunk via key-key similarity to an anchor frame, halving per-chunk storage to fit 2x more history under a fixed budget. On Matrix-Game-2.0 and LingBot-World-Fast, WorldKV matches or exceeds full-KV memory fidelity at roughly 2x the throughput, and is competitive with memory-trained baselines without any fine-tuning. Project Page: https://cvlab-kaist.github.io/WorldKV/
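The compression step can be pictured as scoring each token in a chunk by how similar its key is to the keys of an anchor frame and keeping only the least redundant half. A minimal sketch, assuming cosine similarity, a max over anchor keys, and a fixed keep ratio of 0.5 (matching the "halving" in the abstract); the function names and these choices are ours, not WorldKV's.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def prune_chunk(chunk_keys, anchor_keys, keep_ratio=0.5):
    """Return sorted indices of the chunk tokens to keep.

    A token whose key closely matches some anchor-frame key is treated as
    redundant, so the tokens with the lowest max-similarity are kept.
    Keys are assumed to be non-zero vectors.
    """
    redundancy = [max(cosine(k, a) for a in anchor_keys) for k in chunk_keys]
    n_keep = max(1, int(len(chunk_keys) * keep_ratio))
    keep = sorted(range(len(chunk_keys)), key=lambda i: (redundancy[i], i))[:n_keep]
    return sorted(keep)


anchor = [[1.0, 0.0]]
chunk = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.0, -1.0]]
print(prune_chunk(chunk, anchor))  # [1, 3]
```

In a real KV cache the same kept indices would be used to slice both the key and the value tensors, so the chunk can be reinserted without re-encoding.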
Haokun Wen, Xuemeng Song, Xinghao Xie, Xiaolin Chen, Xiangyu Zhao, Weili Guan
Fashion image retrieval is a cornerstone of modern e-commerce systems. A unified framework that supports diverse query formats and search intentions is highly desired in practice. However, existing approaches focus on narrow retrieval tasks and do not fully capture such diversity. Therefore, in this work, we aim to develop a unified framework capable of handling diverse realistic fashion retrieval scenarios, achieving truly versatile fashion image retrieval. To establish a data foundation, we first introduce U-FIRE, a comprehensive benchmark that consolidates fragmented fashion datasets into a unified collection, supplemented by two manually curated datasets for testing generalization. Building upon this, we propose FashionLens, a unified framework based on Multimodal Large Language Models. To handle divergent matching objectives, we design a Proposal-Guided Spherical Query Calibrator that dynamically shifts query representations into task-aligned metric spaces via adaptive spherical linear interpolation. Additionally, to mitigate the optimization imbalance caused by varying task complexities and data scales, we develop a Gradient-Guided Adaptive Sampling strategy that automatically re-weights tasks based on real-time learning difficulty and the data scale prior. Experiments on U-FIRE show that FashionLens achieves state-of-the-art performance across diverse retrieval scenarios and generalizes robustly to unseen tasks. The data and code are publicly released at https://github.com/haokunwen/FashionLens.
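Spherical linear interpolation (slerp) moves a unit vector along the great circle toward a target direction, so the result stays on the unit hypersphere, which suits normalized embeddings. A minimal sketch of slerp itself; how FashionLens derives the target direction from its proposals and chooses the adaptive weight `t` is not described in the abstract, so both are plain inputs here and the example vectors are hypothetical.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / n for x in v]


def slerp(p0, p1, t):
    """Interpolate from direction p0 (t=0) to direction p1 (t=1) along the unit sphere."""
    a, b = normalize(p0), normalize(p1)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    omega = math.acos(dot)  # angle between the two directions
    s = math.sin(omega)
    if s < 1e-6:
        if dot > 0:
            # Nearly parallel: plain linear interpolation is numerically safer.
            return normalize([(1 - t) * x + t * y for x, y in zip(a, b)])
        raise ValueError("opposite directions: the great-circle path is not unique")
    w0 = math.sin((1 - t) * omega) / s
    w1 = math.sin(t * omega) / s
    return [w0 * x + w1 * y for x, y in zip(a, b)]


query = [0.9, 0.1, 0.0]           # hypothetical query embedding
task_direction = [0.0, 0.6, 0.8]  # hypothetical task-aligned target direction
print(slerp(query, task_direction, 0.3))
```

Unlike linear interpolation followed by renormalization, slerp moves at a constant angular rate in `t`, so the weight maps directly onto how far the query is rotated toward the target.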
Wenhao Zhang, Ruihao Yu, Yi Bai, Zhumin Chen, Pengjie Ren
While generative retrieval (GR) demonstrates competitive performance on standard retrieval benchmarks, existing approaches directly map queries to document identifiers (docids) without intermediate deliberation, limiting their effectiveness for complex queries that require multi-step reasoning. As a preliminary study on integrating chain-of-thought (CoT) into generative retrieval, we introduce ThinkGR, a unified framework that interleaves CoT with docid generation, enabling iterative thinking and retrieval within a single generative process. To bridge the gap between free-form thought generation and structured retrieval targets, we design (1) a hybrid decoding strategy that dynamically switches between unconstrained thought generation and constrained docid decoding, and (2) a two-phase training approach that first aligns thought-retrieval patterns through supervised fine-tuning, then optimizes thought quality via retrieval-grounded reinforcement learning. Experiments on four multi-hop retrieval benchmarks demonstrate that ThinkGR achieves state-of-the-art performance with an average improvement of +6.86\%. Our work opens new avenues for enhancing generative retrieval with explicit deliberation capabilities, with promising implications for retrieval tasks requiring complex reasoning.
Authors' comments: This work was initially submitted to KDD 2026 in August 2025
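Constrained docid decoding is commonly implemented with a prefix trie over valid identifiers: at each step the model may only emit tokens that keep the partial output a prefix of some real docid. A minimal sketch of that constrained half only, with a hypothetical `score(prefix, token)` standing in for the language model and single characters as docid tokens; ThinkGR's switching into and out of free-form thought generation is not modelled.

```python
EOS = "<eos>"


def build_trie(docids):
    """Prefix trie over docid token sequences; EOS marks a complete docid."""
    trie = {}
    for docid in docids:
        node = trie
        for tok in docid:
            node = node.setdefault(tok, {})
        node[EOS] = {}
    return trie


def constrained_greedy_decode(trie, score):
    """Greedily decode one docid, only ever choosing tokens allowed by the trie.

    score(prefix, token) -> float is a hypothetical stand-in for model logits.
    """
    node, prefix = trie, []
    while True:
        allowed = sorted(node)  # tokens that keep the output a valid docid prefix
        tok = max(allowed, key=lambda t: score(prefix, t))
        if tok == EOS:
            return "".join(prefix)
        prefix.append(tok)
        node = node[tok]


trie = build_trie(["12", "13", "21"])
# A toy scorer that prefers "9", but "9" is never chosen because no docid allows it.
prefs = {"9": 10.0, "3": 2.0, "1": 1.0, "2": 0.5}
print(constrained_greedy_decode(trie, lambda prefix, tok: prefs.get(tok, 0.0)))  # 13
```

In a hybrid decoder, a special token emitted during free-form generation would switch into this constrained loop, and reaching EOS would switch back out.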
Joseph Arnold Riley, Christian Johnson-Richards, Noel Healy, Victor Pacheco-Peña
The advent of additive manufacturing has opened opportunities to rapidly prototype devices and products ranging from automotive and aerospace applications to micro/nanoscale metastructures. Three-dimensional (3D) printing has become relevant for electromagnetic structures and for integrated optics and photonics systems; however, the optical properties of commercially available 3D printed polymers at telecommunication wavelengths (λ = 1550 nm) are not always available. Given the importance of 3D printing technologies, in this work we evaluate, both theoretically and experimentally, the complex refractive index of four polymers, including some recycled versions (namely Butenediol Vinyl Alcohol (BVOH), Polylactic Acid (PLA), Recycled Polyethylene Terephthalate (rPET), and recycled Polylactic Acid (rPLA)), as potential candidates for photonics applications. The 3D printed samples have thicknesses from ~100 to 400 μm (~64λ to ~258λ, respectively). The experimental reflectance and transmittance spectra are extracted and used to retrieve the complex refractive index of each printed material, yielding extinction coefficients on the order of 10^-4 at λ = 1550 nm. The experimental results are validated using numerical simulations. Finally, as a proof of concept, a convex-planar lens and a Bragg mirror are designed and numerically evaluated, showing the potential of the proposed polymers for 3D printing photonic structures at telecommunication wavelengths.
Authors' comments: 23 pages, 6 figures
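The sample dimensions can be checked with two standard relations: thickness in wavelengths is d/λ (so ~64 and ~258 wavelengths at λ = 1550 nm correspond to about 100 and 400 μm), and an extinction coefficient κ gives an intensity absorption coefficient α = 4πκ/λ, so one pass through a slab transmits exp(-αd) of the light that enters it. A short sketch with hypothetical function names, ignoring interface and multiple reflections, which a retrieval from measured reflectance and transmittance must account for.

```python
import math


def thickness_in_wavelengths(thickness_um, wavelength_um):
    return thickness_um / wavelength_um


def internal_transmittance(kappa, thickness_um, wavelength_um):
    """Single-pass absorption loss only (Beer-Lambert); no Fresnel reflections."""
    alpha_per_um = 4.0 * math.pi * kappa / wavelength_um
    return math.exp(-alpha_per_um * thickness_um)


lam = 1.55  # telecom wavelength in micrometres
for d in (100.0, 400.0):
    print(d, round(thickness_in_wavelengths(d, lam), 1),
          round(internal_transmittance(1e-4, d, lam), 3))
```

With κ ~ 10^-4, even the thickest sample loses only roughly a quarter of the light to absorption in one pass, which is why such polymers remain usable for lenses and mirrors at this wavelength.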
Noelia Luna-Barahona, Antonio Ríos-Vila, David Rizo, Jorge Calvo-Zaragoza
The digitization of musical scores plays a crucial role in their preservation and accessibility, yet information retrieval still depends mainly on metadata searches, such as by title or composer. Content-based search in music score images remains underexplored compared to text documents, despite its potential value for musicians, musicologists, and educators. This work contributes to the field by first studying which characteristics of a score are most relevant for search and by defining a systematic method to build query datasets from any annotated corpus. We also consider diverse methods for content-based search on music score images, ranging from transcription-based approaches relying on Optical Music Recognition (OMR), to a transcription-free Transformer model trained to recognize queries directly from score images, and a text-prompted Large Language Model. Our experiments evaluate these models on four corpora exhibiting diverse characteristics in terms of dataset size, image quality, and typesetting mechanisms. Overall, each method excels under different conditions: OMR-based pipelines achieve higher in-domain retrieval, whereas transcription-free models handle domain variability more effectively.
Authors' comments: 17 pages (14 pages + references), 3 figures (with subfigures)
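For the transcription-based route, once OMR has produced a symbolic sequence, search reduces to pattern matching over that sequence. A hypothetical sketch, assuming scores and queries are reduced to MIDI pitch sequences and compared by melodic intervals so that a transposed query still matches; the abstract does not state which score characteristics or matching scheme the authors use.

```python
def intervals(pitches):
    """Successive pitch differences in semitones (transposition-invariant)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def find_query(score_pitches, query_pitches):
    """Return start indices in the score where the query's interval pattern occurs."""
    s, q = intervals(score_pitches), intervals(query_pitches)
    if not q:
        raise ValueError("query needs at least two notes")
    return [i for i in range(len(s) - len(q) + 1) if s[i:i + len(q)] == q]


transcribed = [60, 62, 64, 65, 67, 69, 71, 72]  # C major scale as an OMR output
print(find_query(transcribed, [67, 69, 71]))    # [0, 3, 4]
```

Any OMR error in a single note corrupts two intervals, which illustrates why transcription-based search can degrade when image quality or typesetting differs from the training domain.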
Kai Golan Hashiloni, Daniel Fadlon, Lior Livyatan, Ofri Hefetz, Jiahuan Pei, Kfir Bar
Idioms pose a fundamental challenge for language models, as their meaning cannot be inferred from surface form alone. Understanding such expressions, therefore, requires semantic abstraction beyond lexical overlap. We introduce IdioLink, a retrieval benchmark designed to test whether models can link idiomatic expressions to conceptually equivalent meanings expressed in literal or paraphrased forms. IdioLink comprises 10,700 documents and 2,140 queries, spanning 107 idioms with both literal and figurative uses. Each document and query is annotated with spans that convey the core meaning. Evaluating strong embedding baselines (e.g., BGE, E5, Contriever, and Qwen), we show that current models struggle to retrieve equivalent meanings across divergent surface realizations, relying instead on topical and shallow semantic cues. IdioLink exposes key gaps in idiom-aware semantic retrieval and provides a challenging testbed for future models.
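Benchmarks of this kind are typically scored by embedding queries and documents, ranking documents by cosine similarity, and checking whether a relevant document appears in the top k. A minimal sketch of such a hit rate at k, assuming that protocol; the abstract does not state IdioLink's exact metrics, and the toy vectors stand in for BGE/E5-style embeddings.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def hit_rate_at_k(query_vecs, doc_vecs, relevant, k):
    """Share of queries with at least one relevant document in their top-k.

    relevant[i] is the set of document indices that match query i.
    """
    hits = 0
    for qi, q in enumerate(query_vecs):
        ranked = sorted(range(len(doc_vecs)), key=lambda d: -cosine(q, doc_vecs[d]))
        if relevant[qi] & set(ranked[:k]):
            hits += 1
    return hits / len(query_vecs)


# Toy case: query 0 lies closest to doc 0, but only doc 1 carries its meaning.
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[1.0, 0.1], [0.0, 1.0]]
print(hit_rate_at_k(queries, docs, [{1}, {1}], k=1))  # 0.5
```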
Ningyuan Li, Haiyang Shen, Mugeng Liu, Yudong Han, Zhuofan Shi, Sixiong Xie, Yun Ma
Recent advances in large language models and tool-using agents have expanded the range of benchmarked web tasks. Yet an important class of specialized retrieval tasks remains undercharacterized. On many specialized data-retrieval websites, answer-bearing evidence becomes accessible only after establishing the correct site-specific retrieval state through filters, views, hierarchies, or scopes. We term this capability state-gated retrieval (SGR). We introduce SGR-Bench, a benchmark for this setting containing 100 expert-curated tasks spanning six source families and 12 public data ecosystems. Each task requires discovering the appropriate website and configuring its site-specific retrieval state to produce a structured answer. SGR-Bench pairs constraint-guided and goal-oriented formulations of the same underlying problems, enabling controlled comparisons between explicit and implicit guidance for state-gated retrieval. We evaluate eight CLI-based agentic LLM systems and three commercial search-agent products. On SGR-Bench, the strongest system reaches only 66.18% item-level F1, while row-level F1 remains much lower. A manual audit of 156 analyzable failed CLI trajectories shows why: agents often reach a relevant web source, but establish the wrong site-specific retrieval state. Retrieval-scope drift (37.2%) and criterion mismatch (27.6%) dominate, whereas final answer composition accounts for only 10.3%. The dataset and single-case evaluation instructions are available at https://huggingface.co/datasets/PKUAIWeb/SGR-BENCH.
Authors' comments: Work in Progress. 23 pages, 7 figures, preprint
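The gap between item-level and row-level F1 is easy to see with a small example. A hypothetical sketch, assuming answers are tables of rows, item-level F1 counts individual (column, value) cells, and row-level F1 requires a whole row to match exactly; SGR-Bench's precise definitions are not given in the abstract.

```python
from collections import Counter


def f1(pred, gold):
    """F1 between two multisets represented as Counters."""
    tp = sum((pred & gold).values())
    if tp == 0:
        return 0.0
    precision = tp / sum(pred.values())
    recall = tp / sum(gold.values())
    return 2 * precision * recall / (precision + recall)


def _cells(rows):
    # Each (column index, value) cell counts on its own.
    return Counter((j, v) for row in rows for j, v in enumerate(row))


def item_f1(pred_rows, gold_rows):
    return f1(_cells(pred_rows), _cells(gold_rows))


def row_f1(pred_rows, gold_rows):
    # A row only counts if every field in it is correct.
    return f1(Counter(map(tuple, pred_rows)), Counter(map(tuple, gold_rows)))


gold = [("a", "1"), ("b", "3")]
pred = [("a", "1"), ("b", "2")]
print(item_f1(pred, gold), row_f1(pred, gold))  # 0.75 0.5
```

A single wrong cell costs one item but a whole row, so an agent in a slightly wrong retrieval state can score well on items while scoring poorly on rows.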
Vinodh Kumar Sunkara, Satheeshkumar Karuppusamy, Hangjun Xu, Sai Deepika Regani, Kshitij Gupta, Gaby Nahum, Sneha Iyer, Jean-Baptiste Fiot et al.
Traditional ads recommendation systems have primarily focused on optimizing the prediction accuracy of click or conversion events, measured with canonical metrics such as recall or normalized discounted cumulative gain (NDCG). With the hypergrowth of ads inventory and liquidity driven by generative AI technologies, prediction stability and predictability are becoming increasingly critical. Intuitively, prediction stability and predictability quantify a system's robustness to minor or noisy perturbations of its inputs (ads, creatives); a lack of such robustness can lead to problems that advertisers notice, such as poor repeatability, cold start, and under-exploration. In this paper, we introduce a new evaluation framework for quantifying the stability and predictability of an ads recommender system, and present an online-validated semantic candidate generation framework powered by fine-tuned Large Language Models (LLMs) that significantly improves these metrics by fundamentally improving the semantic awareness of the system. The approach extracts hierarchical semantic attributes from ad creatives to obtain LLM representations, which serve as the foundation for graph-based expansion. This ensures that the retrieved candidates encapsulate the semantic variants of an ad, so that small creative variants from the advertiser yield consistent and explainable delivery results for the user. We tested this LLM ads retrieval framework in a large-scale industrial ads recommendation system and observed significant improvements in offline and online A/B experiments, with gains in both predictability and traditional performance metrics. Although evaluated in the ads stack, this general framework can be applied broadly to any large-scale recommendation or retrieval system facing similar scaling and predictability challenges.
Authors' comments: SIGIR 2026 AgentSearch Workshop, Melbourne, Australia
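The abstract does not spell out its stability and predictability metrics. One simple way to quantify stability under small creative perturbations, given here purely as a hypothetical illustration, is the Jaccard overlap between the top-k candidates retrieved for an ad and for a lightly edited variant of it, averaged over many such pairs.

```python
def topk_jaccard(candidates_a, candidates_b, k):
    """Jaccard overlap of two ranked candidate lists cut at k (1.0 = identical sets)."""
    a, b = set(candidates_a[:k]), set(candidates_b[:k])
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_stability(pairs, k):
    """Average top-k Jaccard over (original, perturbed) retrieval results."""
    return sum(topk_jaccard(x, y, k) for x, y in pairs) / len(pairs)


original = [101, 102, 103, 109]  # hypothetical candidate ids for an ad
variant = [102, 103, 104, 101]   # candidates after a minor creative edit
print(topk_jaccard(original, variant, k=3))
```

A score near 1.0 means small edits leave the retrieved set largely unchanged, which is the kind of consistency the abstract argues advertisers perceive.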
Mehrdad Saberi, Keivan Rezaei, Soheil Feizi
Large language models increasingly use external tools such as web search and document retrieval to solve information-intensive tasks. However, multi-hop tool use in complex tasks introduces substantial latency, since the model must repeatedly wait for tool observations before continuing. We study how to accelerate such trajectories without changing the final trajectory the model would have taken without acceleration, assuming access to faster but less reliable speculator tools. We develop a theoretical framework for lossless speculation in multi-hop tool-use settings, characterizing the optimal achievable latency gain. We propose SpecHop, a continuous speculation framework that maintains multiple speculative threads, verifies predicted observations asynchronously as target tool outputs arrive, commits correct branches, and rolls back incorrect ones. This preserves accuracy while reducing wall-clock latency. We show that SpecHop can approach oracle latency gains with enough active threads. Empirically, on retrieval-augmented multi-hop tasks, SpecHop closely matches theoretical predictions and reduces latency by up to 40\% in some settings. Code: https://github.com/mehrdadsaberi/spechop
Shao Kan
Medical RAG systems in high-risk QA settings are often evaluated through a single answer-or-abstain decision, but mixed evidence may support one claim, require conditions for another, and contradict a third. We study claim-selective certification: each response is decomposed into verifiable claims, scored against retrieved evidence, and mapped by an intent-aware selector to {full, partial, conflict, abstain}. On the primary weak-label certificate protocol, whose real-source-only dev/test rows cover the naturally occurring non-abstain actions, the full system records UCCR=0.0000, PAU=1.0000, PAU Precision=0.9901, and action accuracy=0.9204 on dev (n=314), and UCCR=0.0000, PAU=0.9967, PAU Precision=0.9739, and action accuracy=0.8997 on test (n=319). UCCR measures unsupported-claim risk within the certificate definition, and a source-missing counterfactual slice evaluates abstain under empty evidence. Shortcut controls quantify the action-label prior explained by source and intent metadata, while source/evidence-novel slices characterize transfer boundaries. The resulting interface separates action-label prediction from evidence-linked claim selection under mixed evidence.
Authors' comments: 22 pages, 7 figures, 11 tables
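The final selector maps per-claim verdicts to one of {full, partial, conflict, abstain}. A hypothetical rule-based sketch, assuming each claim has already been labeled supported, conditional, contradicted, or unsupported against the retrieved evidence; the paper's selector is intent-aware and its exact rules are not given in the abstract.

```python
def select_action(claim_labels, has_evidence):
    """Map per-claim evidence labels to a response-level certification action.

    claim_labels: labels such as "supported", "conditional", "contradicted",
        "unsupported" (a hypothetical label set).
    has_evidence: False when retrieval returned nothing usable.
    """
    if not has_evidence or not claim_labels:
        return "abstain"
    if any(label == "contradicted" for label in claim_labels):
        return "conflict"
    if all(label == "supported" for label in claim_labels):
        return "full"
    if any(label in ("supported", "conditional") for label in claim_labels):
        return "partial"
    return "abstain"


# Mixed evidence: one claim supported, one needing conditions.
print(select_action(["supported", "conditional"], has_evidence=True))  # partial
```

Fixed rules like these ignore the question's intent; the abstract's shortcut controls address exactly the risk that metadata alone, rather than evidence, predicts the action label.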
Chengcai Gao, Zhihong Sun, Xiaochuan Shi, Qiufeng Wang, Chao Liang
The growing adoption of Retrieval-Augmented Generation (RAG) has led to a rise in adversarial attacks. Existing defenses, relying on semantic analysis or voting, face a trade-off between high computational cost and limited robustness under strong poisoning attacks. Their fundamental limitation is the exclusive focus on semantic content relevance, while neglecting the retrieval context that is critically defined by ranking structures. To this end, we investigate the bidirectional ranking behavior of poisoned and benign documents, and discover a key discriminative pattern: poisoned documents exhibit significantly stronger alignment between their backward rankings and the query's forward ranking. Capitalizing on this, we propose BiRD, a bidirectional ranking defense mechanism built upon a dual-signal framework that leverages forward ranking to assess semantic content relevance and backward ranking to quantify ranking context consistency. This design directly addresses the fundamental limitation of prior approaches, enabling simultaneous efficiency and robustness. Extensive evaluation across 3 datasets with 3 retrievers and 3 LLMs under 2 attack scenarios validates BiRD's effectiveness. Notably, BiRD reduces the attack success rate of PoisonedRAG by up to 54% while simultaneously improving task accuracy by up to 56%, with average additional latency under 1 second.
Authors' comments: 17 pages, 10 figures and 8 tables