benty-fields - Search paper

7961. Unlocking Multimodal Document Intelligence: From Current Triumphs to Future Frontiers of Visual Document Retrieval

Yibo Yan, Jiahao Huo, Guanbo Feng, Mingdong Ou, Yi Cao, Xin Zou, Shuliang Liu, Yuanhuiyi Lyu et al.

arXiv:2602.19961v1

7962. Sculpting the Vector Space: Towards Efficient Multi-Vector Visual Document Retrieval via Prune-then-Merge Framework

Yibo Yan, Mingdong Ou, Yi Cao, Xin Zou, Jiahao Huo, Shuliang Liu, James Kwok, Xuming Hu

arXiv:2602.19549v1

7963. Learning to Reason for Multi-Step Retrieval of Personal Context in Personalized Question Answering

Maryam Amirizaniani, Alireza Salemi, Hamed Zamani

arXiv:2602.19317v1

7964. Topology of Reasoning: Retrieved Cell Complex-Augmented Generation for Textual Graph Question Answering

Sen Zhao, Lincheng Zhou, Yue Chen, Ding Zou

arXiv:2602.19240v1

7965. Retrieval Augmented Enhanced Dual Co-Attention Framework for Target Aware Multimodal Bengali Hateful Meme Detection

Raihan Tanvir, Md. Golam Rabiul Alam

arXiv:2602.19212v1

7966. AgenticRAGTracer: A Hop-Aware Benchmark for Diagnosing Multi-Step Retrieval Reasoning in Agentic RAG

Qijie You, Wenkai Yu, Wentao Zhang

arXiv:2602.19127v1

7967. Tempawral: A Time-Resolved Retrieval Framework for Variable Brown Dwarfs and Exoplanets

Fei Wang, Ben Burningham, Stuart Littlefair, Etienne Artigau, Yuka Fujii, Jacqueline K. Faherty, Johanna M. Vos

arXiv:2602.18984v1

7968. GraphSkill: Documentation-Guided Hierarchical Retrieval-Augmented Coding for Complex Graph Reasoning

Fali Wang, Chenglin Weng, Xianren Zhang, Siyuan Hong, Hui Liu, Suhang Wang

arXiv:2603.06620v1

7969. Decomposing Retrieval Failures in RAG for Long-Document Financial Question Answering

Amine Kobeissi, Philippe Langlais

arXiv:2602.17981v1

7970. Topological Exploration of High-Dimensional Empirical Risk Landscapes: general approach, and applications to phase retrieval

Antoine Maillard, Tony Bonnaire, Giulio Biroli

arXiv:2602.17779v1

We consider the landscape of empirical risk minimization for high-dimensional Gaussian single-index models (generalized linear models). The objective is to recover an unknown signal $\boldsymbol{\theta}^\star \in \mathbb{R}^d$ (where $d \gg 1$) from a loss function $\hat{R}(\boldsymbol{\theta})$ that depends on pairs of labels $(\mathbf{x}_i \cdot \boldsymbol{\theta}, \mathbf{x}_i \cdot \boldsymbol{\theta}^\star)_{i=1}^n$, with $\mathbf{x}_i \sim \mathcal{N}(0, I_d)$, in the proportional asymptotic regime $n \asymp d$. Using the Kac-Rice formula, we analyze different complexities of the landscape -- defined as the expected number of critical points -- corresponding to various types of critical points, including local minima. We first show that some variational formulas previously established in the literature for these complexities can be drastically simplified, reducing to explicit variational problems over a finite number of scalar parameters that we can efficiently solve numerically. Our framework also provides detailed predictions for properties of the critical points, including the spectral properties of the Hessian and the joint distribution of labels. We apply our analysis to the real phase retrieval problem for which we derive complete topological phase diagrams of the loss landscape, characterizing notably BBP-type transitions where the Hessian at local minima (as predicted by the Kac-Rice formula) becomes unstable in the direction of the signal. We test the predictive power of our analysis to characterize gradient flow dynamics, finding excellent agreement with finite-size simulations of local optimization algorithms, and capturing fine-grained details such as the empirical distribution of labels. Overall, our results open new avenues for the asymptotic study of loss landscapes and topological trivialization phenomena in high-dimensional statistical models.
Authors' comments: 43 pages, 14 figures

7971. Mine and Refine: Optimizing Graded Relevance in E-commerce Search Retrieval

Jiaqi Xi, Raghav Saboo, Luming Chen, Martin Wang, Sudeep Das

arXiv:2602.17654v1

7972. Enhancing Large Language Models (LLMs) for Telecom using Dynamic Knowledge Graphs and Explainable Retrieval-Augmented Generation

Dun Yuan, Hao Zhou, Xue Liu, Hao Chen, Yan Xin, Jianzhong Zhang

arXiv:2602.17529v1

7973. Beyond Pipelines: A Fundamental Study on the Rise of Generative-Retrieval Architectures in Web Research

Amirereza Abbasi, Mohsen Hooshmand

arXiv:2602.17450v1

7974. Visual Model Checking: Graph-Based Inference of Visual Routines for Image Retrieval

Adrià Molina, Oriol Ramos Terrades, Josep Lladós

arXiv:2602.17386v1

7975. WebFAQ 2.0: A Multilingual QA Dataset with Mined Hard Negatives for Dense Retrieval

Michael Dinzinger, Laura Caspari, Ali Salman, Irvin Topi, Jelena Mitrović, Michael Granitzer

arXiv:2602.17327v1

7976. NotebookRAG: Retrieving Multiple Notebooks to Augment the Generation of EDA Notebooks for Crowd-Wisdom

Yi Shan, Yixuan He, Zekai Shao, Kai Xu, Siming Chen

arXiv:2602.17215v1

7977. Beyond Chunk-Then-Embed: A Comprehensive Taxonomy and Evaluation of Document Chunking Strategies for Information Retrieval

Yongjie Zhou, Shuai Wang, Bevan Koopman, Guido Zuccon

arXiv:2602.16974v1

Document chunking is a critical preprocessing step in dense retrieval systems, yet the design space of chunking strategies remains poorly understood. Recent research has proposed several concurrent approaches, including LLM-guided methods (e.g., DenseX and LumberChunker) and contextualized strategies (e.g., Late Chunking), which generate embeddings before segmentation to preserve contextual information. However, these methods emerged independently and were evaluated on benchmarks with minimal overlap, making direct comparisons difficult. This paper reproduces prior studies in document chunking and presents a systematic framework that unifies existing strategies along two key dimensions: (1) segmentation methods, including structure-based methods (fixed-size, sentence-based, and paragraph-based) as well as semantically informed and LLM-guided methods; and (2) embedding paradigms, which determine the timing of chunking relative to embedding (pre-embedding chunking vs. contextualized chunking). Our reproduction evaluates these approaches in two distinct retrieval settings established in previous work: in-document retrieval (needle-in-a-haystack) and in-corpus retrieval (the standard information retrieval task). Our comprehensive evaluation reveals that optimal chunking strategies are task-dependent: simple structure-based methods outperform LLM-guided alternatives for in-corpus retrieval, while LumberChunker performs best for in-document retrieval. Contextualized chunking improves in-corpus effectiveness but degrades in-document retrieval. We also find that chunk size correlates moderately with in-document but weakly with in-corpus effectiveness, suggesting segmentation method differences are not purely driven by chunk size. Our code and evaluation benchmarks are publicly available at (anonymized).
Authors' comments: GitHub link will be added later, as it is anonymized at the moment
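
As a rough illustration of two of the structure-based segmentation methods named in the abstract (fixed-size and sentence-based), a minimal Python sketch might look like the following. The function names, the whitespace tokenization, and the regex sentence splitter are assumptions made for illustration only; they are not taken from the paper or its code.

```python
import re
from typing import List


def fixed_size_chunks(text: str, size: int = 4) -> List[str]:
    """Hypothetical helper: group whitespace-separated tokens into chunks of `size`."""
    if size <= 0:
        raise ValueError("size must be positive")
    tokens = text.split()
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)]


def sentence_chunks(text: str, max_sentences: int = 2) -> List[str]:
    """Hypothetical helper: group consecutive sentences into chunks.

    Sentences are split naively on whitespace that follows '.', '!' or '?'.
    This is an assumption; a real pipeline would likely use a proper sentence tokenizer.
    """
    if max_sentences <= 0:
        raise ValueError("max_sentences must be positive")
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]


if __name__ == "__main__":
    sample = "Chunking matters. It shapes retrieval! Does size matter? Perhaps."
    print(fixed_size_chunks(sample, size=4))
    print(sentence_chunks(sample, max_sentences=2))
```

In a pre-embedding pipeline, each returned chunk would then be embedded independently. Contextualized chunking, as the abstract describes it, instead generates embeddings before segmentation.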

7978. RankEvolve: Automating the Discovery of Retrieval Algorithms via LLM-Driven Evolution

Jinming Nian, Fangchen Li, Dae Hoon Park, Yi Fang

arXiv:2602.16932v1

7979. Enhancing Financial Report Question-Answering: A Retrieval-Augmented Generation System with Reranking Analysis

Zhiyuan Cheng, Longying Lai, Yue Liu, Kai Cheng, Xiaoxi Qi

arXiv:2603.16877v1

7980. Retrieval-Augmented Foundation Models for Matched Molecular Pair Transformations to Recapitulate Medicinal Chemistry Intuition

Bo Pan, Peter Zhiping Zhang, Hao-Wei Pang, Alex Zhu, Xiang Yu, Liying Zhang, Liang Zhao

arXiv:2602.16684v1
