benty-fields - Search paper

8981. Logical Consistency is Vital: Neural-Symbolic Information Retrieval for Negative-Constraint Queries

Ganlin Xu, Zhoujia Zhang, Wangyi Mei, Jiaqing Liang, Weijia Lu, Xiaodong Zhang, Zhifei Yang, Xiaofeng Ma et al.

arXiv:2505.22299v1

8982. Yambda-5B -- A Large-Scale Multi-modal Dataset for Ranking And Retrieval

A. Ploshkin, V. Tytskiy, A. Pismenny, V. Baikalov, E. Taychinov, A. Permiakov, D. Burlakov, E. Krofto et al.

arXiv:2505.22238v1

8983. Multilingual vs Crosslingual Retrieval of Fact-Checked Claims: A Tale of Two Approaches

Alan Ramponi, Marco Rovera, Robert Moro, Sara Tonelli

arXiv:2505.22118v1

8984. UAVPairs: A Challenging Benchmark for Match Pair Retrieval of Large-scale UAV Images

Junhuan Liu, San Jiang, Wei Ge, Wei Huang, Bingxuan Guo, Qingquan Li

arXiv:2505.22098v1

The primary contributions of this paper are a challenging benchmark dataset, UAVPairs, and a training pipeline designed for match pair retrieval of large-scale UAV images. First, the UAVPairs dataset, comprising 21,622 high-resolution images across 30 diverse scenes, is constructed; the 3D points and tracks generated by SfM-based 3D reconstruction are used to define the geometric similarity of image pairs, ensuring that genuinely matchable image pairs are used for training. Second, to reduce the high cost of global hard negative mining, a batched nontrivial sample mining strategy is proposed, which leverages the geometric similarity and multi-scene structure of UAVPairs to generate training samples and thereby accelerate training. Third, recognizing the limitation of pair-based losses, a ranked list loss is designed to improve the discrimination of image retrieval models by optimizing the global similarity structure constructed from the positive and negative sets. Finally, the effectiveness of the UAVPairs dataset and training pipeline is validated through comprehensive experiments on three distinct large-scale UAV datasets. The experimental results demonstrate that models trained with the UAVPairs dataset and the ranked list loss achieve significantly improved retrieval accuracy compared to models trained on existing datasets or with conventional losses. Furthermore, these improvements translate to enhanced view graph connectivity and higher-quality reconstructed 3D models. The models trained with the proposed approach are more robust than hand-crafted global features, particularly in challenging scenes with repetitive or weak textures. For match pair retrieval of large-scale UAV images, the trained image retrieval models offer an effective solution. The dataset will be made publicly available at https://github.com/json87/UAVPairs.
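
To make the idea of a ranked list loss concrete, here is a minimal sketch of the commonly used hinge-style formulation for a single query: positives are pulled inside a distance bound and negatives are pushed beyond a larger one. This is an illustration only and not necessarily the loss designed in the paper; the function names and the values of `alpha` and `margin` are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ranked_list_loss(query, positives, negatives, alpha=1.2, margin=0.4):
    """Hinge-style ranked list loss for one query (illustrative sketch).

    Positives should lie within distance (alpha - margin) of the query;
    negatives should lie farther away than alpha. Both default values
    are assumptions chosen for the example.
    """
    pos_bound = alpha - margin
    # Penalty for positives that are still too far from the query.
    pos_terms = [max(0.0, euclidean(query, p) - pos_bound) for p in positives]
    # Penalty for negatives that are still too close to the query.
    neg_terms = [max(0.0, alpha - euclidean(query, n)) for n in negatives]
    # Average only over samples that violate their bound.
    pos_active = [t for t in pos_terms if t > 0.0]
    neg_active = [t for t in neg_terms if t > 0.0]
    loss_pos = sum(pos_active) / len(pos_active) if pos_active else 0.0
    loss_neg = sum(neg_active) / len(neg_active) if neg_active else 0.0
    return loss_pos + loss_neg
```

Unlike a pair-based loss, every positive and negative of the query contributes to one objective, which is the sense in which the whole similarity structure is optimized at once.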

8985. Safeguarding Privacy of Retrieval Data against Membership Inference Attacks: Is This Query Too Close to Home?

Yujin Choi, Youngjoo Park, Junyoung Byun, Jaewook Lee, Jinseong Park

arXiv:2505.22061v1

8986. Decoding Dense Embeddings: Sparse Autoencoders for Interpreting and Discretizing Dense Retrieval

Seongwan Park, Taeklim Kim, Youngjoong Ko

arXiv:2506.00041v1

8987. Agent-UniRAG: A Trainable Open-Source LLM Agent Framework for Unified Retrieval-Augmented Generation Systems

Hoang Pham, Thuy-Duong Nguyen, Khac-Hoai Nam Bui

arXiv:2505.22571v2

8988. Rethinking Chunk Size For Long-Document Retrieval: A Multi-Dataset Analysis

Sinchana Ramakanth Bhat, Max Rudat, Jannis Spiekermann, Nicolas Flores-Herr

arXiv:2505.21700v1

8989. Query Drift Compensation: Enabling Compatibility in Continual Learning of Retrieval Embedding Models

Dipam Goswami, Liying Wang, Bartłomiej Twardowski, Joost van de Weijer

arXiv:2506.00037v1

Text embedding models enable semantic search, powering several NLP applications such as Retrieval Augmented Generation through efficient information retrieval (IR). However, text embedding models are commonly studied in scenarios where the training data is static, which limits their applicability to dynamic scenarios where new training data emerges over time. IR methods generally encode a huge corpus of documents into low-dimensional embeddings and store them in a database index. During retrieval, a semantic search over the corpus is performed and the document whose embedding is most similar to the query embedding is returned. When an embedding model is updated with new training data, using the already indexed corpus is suboptimal because of incompatibility: the model that produced the corpus embeddings has changed. While re-indexing the old corpus documents with the updated model restores compatibility, it requires much more computation and time. Thus, it is critical to study how the already indexed corpus can still be used effectively without re-indexing. In this work, we establish a continual learning benchmark with large-scale datasets, continually train dense retrieval embedding models on query-document pairs from a new dataset in each task, and observe forgetting on old tasks due to significant drift of embeddings. We employ embedding distillation on both query and document embeddings to maintain stability and propose a novel query drift compensation method that, during retrieval, projects new-model query embeddings into the old embedding space. This enables compatibility with previously indexed corpus embeddings extracted using the old model and thus reduces forgetting. We show that the proposed method significantly improves performance without any re-indexing. Code is available at https://github.com/dipamgoswami/QDC.
Authors' comments: Accepted at CoLLAs 2025
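
The retrieval and compensation steps described in this abstract can be sketched as follows. The sketch assumes the drift between the old and new models is roughly a constant offset, estimated as the mean difference over a sample of queries embedded by both models; that assumption, and every function name below, is hypothetical and not taken from the authors' code.

```python
import math


def mean_drift(old_queries, new_queries):
    """Average per-dimension shift from old-model to new-model query embeddings."""
    n = len(old_queries)
    dim = len(old_queries[0])
    return [
        sum(new[i] - old[i] for old, new in zip(old_queries, new_queries)) / n
        for i in range(dim)
    ]


def compensate(query_new, drift):
    """Map a new-model query embedding back toward the old embedding space."""
    return [q - d for q, d in zip(query_new, drift)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, corpus):
    """Index of the corpus embedding most similar to the query."""
    return max(range(len(corpus)), key=lambda i: cosine(query, corpus[i]))


# Corpus embeddings were indexed with the old model and are never recomputed.
old_corpus = [[1.0, 0.0], [0.0, 1.0]]
# A small sample of queries embedded by both the old and the new model.
drift = mean_drift([[1.0, 0.0], [0.0, 1.0]], [[1.5, 0.5], [0.5, 1.5]])
best = retrieve(compensate([1.5, 0.5], drift), old_corpus)
```

Only the query side is adjusted, so the cost is one vector subtraction per query rather than a full re-encoding of the corpus.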

8990. Faithfulness-Aware Uncertainty Quantification for Fact-Checking the Output of Retrieval Augmented Generation

Ekaterina Fadeeva, Aleksandr Rubashevskii, Roman Vashurin, Shehzaad Dhuliawala, Artem Shelmanov, Timothy Baldwin, Preslav Nakov, Mrinmaya Sachan et al.

arXiv:2505.21072v1

8991. ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval

Eric Xing, Pranavi Kolouju, Robert Pless, Abby Stylianou, Nathan Jacobs

arXiv:2505.20764v1

8992. UQLegalAI@COLIEE2025: Advancing Legal Case Retrieval with Large Language Models and Graph Neural Networks

Yanran Tang, Ruihong Qiu, Zi Huang

arXiv:2505.20743v1

8993. What LLMs Miss in Recommendations: Bridging the Gap with Retrieval-Augmented Collaborative Signals

Shahrooz Pouryousef

arXiv:2505.20730v1

8994. TeroSeek: An AI-Powered Knowledge Base and Retrieval Generation Platform for Terpenoid Research

Xu Kang, Siqi Jiang, Kangwei Xu, Jiahao Li, Ruibo Wu

arXiv:2505.20663v1

8995. Topology-Aware and Highly Generalizable Deep Reinforcement Learning for Efficient Retrieval in Multi-Deep Storage Systems

Funing Li, Yuan Tian, Ruben Noortwyck, Jifeng Zhou, Liming Kuang, Robert Schulz

arXiv:2506.14787v1

8996. Retrieval Visual Contrastive Decoding to Mitigate Object Hallucinations in Large Vision-Language Models

Jihoon Lee, Min Song

arXiv:2505.20569v1

8997. HAND Me the Data: Fast Robot Adaptation via Hand Path Retrieval

Matthew Hong, Anthony Liang, Kevin Kim, Harshitha Rajaprakash, Jesse Thomason, Erdem Bıyık, Jesse Zhang

arXiv:2505.20455v1

8998. Vertical Profile Corrected Satellite NH3 Retrievals Enable Accurate Agricultural Emission Characterization in China

Qiming Liu, Yilin Chen, Peng Xu, Huizhong Shen, Zelin Mai, Ruixin Zhang, Peng Guo, Zhiyu Zheng et al.

arXiv:2505.19942v1

Ammonia (NH3) emissions significantly contribute to atmospheric pollution, yet discrepancies exist between bottom-up inventories and satellite-constrained top-down estimates, with the latter typically one-third higher. This study quantifies how assumptions about NH3 vertical distribution in satellite retrievals contribute to this gap. By implementing spatially and temporally resolved vertical profiles from the Community Multiscale Air Quality model to replace steep gradients in Infrared Atmospheric Sounding Interferometer (IASI) retrievals, we reduced satellite-model column discrepancies from 71% to 18%. We subsequently constrained NH3 emissions across China using a hybrid inversion framework combining iterative mass balance and four-dimensional variational methods. Our posterior emissions showed agreement with the a priori inventory (7.9% lower), suggesting that discrepancies between inventory approaches were amplified by overestimation of near-surface NH3 in baseline satellite retrievals, potentially causing a 43% overestimation of growing season emissions. Evaluation against ground-based measurements confirmed improved model performance, with normalized root-mean-square error reductions of 1-27% across six months. These findings demonstrate that accurate representation of vertical profiles in satellite retrievals is critical for robust NH3 emission estimates and can reconcile the long-standing discrepancy between bottom-up and top-down approaches. Our hybrid inversion methodology, leveraging profile-corrected satellite data, reveals that China's NH3 emissions exhibit greater spatial concentration than previously recognized, reflecting agricultural intensification. This advancement enables timely and accurate characterization of rapidly changing agricultural emission patterns, critical for implementing effective nitrogen pollution control measures.
Authors' comments: 44 pages, 14 figures, 1 table. The main text spans pages 1-28 and includes 5 figures. The supplementary information (SI) spans pages 29-44, containing 9 figures and 1 table

8999. CPA-RAG: Covert Poisoning Attacks on Retrieval-Augmented Generation in Large Language Models

Chunyang Li, Junwei Zhang, Anda Cheng, Zhuo Ma, Xinghua Li, Jianfeng Ma

arXiv:2505.19864v1

9000. DGRAG: Distributed Graph-based Retrieval-Augmented Generation in Edge-Cloud Systems

Wenqing Zhou, Yuxuan Yan, Qianqian Yang

arXiv:2505.19847v1
