benty-fields - Search paper

8441. Mitigating Cross-modal Representation Bias for Multicultural Image-to-Recipe Retrieval

Qing Wang, Chong-Wah Ngo, Yu Cao, Ee-Peng Lim

arXiv:2510.20393v1

8442. Multimedia-Aware Question Answering: A Review of Retrieval and Cross-Modal Reasoning Architectures

Rahul Raja, Arpita Vats

arXiv:2510.20193v1

8443. XGen-Q: An Explainable Domain-Adaptive LLM Framework with Retrieval-Augmented Generation for Software Security

Hamed Jelodar, Mohammad Meymani, Roozbeh Razavi-Far, Ali A. Ghorbani

arXiv:2510.19006v1

8444. Sherlock Your Queries: Learning to Ask the Right Questions for Dialogue-Based Retrieval

Dong Yun, Marco Schouten, Dim Papadopoulos

arXiv:2510.18659v1

8445. EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval

Zebin Yang, Sunjian Zheng, Tong Xie, Tianshi Xu, Bo Yu, Fan Wang, Jie Tang, Shaoshan Liu et al.

arXiv:2510.18546v1

8446. Zero-Shot Vehicle Model Recognition via Text-Based Retrieval-Augmented Generation

Wei-Chia Chang, Yan-Ann Chen

arXiv:2510.18502v1

8447. Frame-Difference Guided Dynamic Region Perception for CLIP Adaptation in Text-Video Retrieval

Jiaao Yu, Mingjie Han, Tao Gong, Jian Zhang, Man Lan

arXiv:2510.21806v1

8448. KrishokBondhu: A Retrieval-Augmented Voice-Based Agricultural Advisory Call Center for Bengali Farmers

Mohd Ruhul Ameen, Akif Islam, Farjana Aktar, M. Saifuzzaman Rafat

arXiv:2510.18355v1

In Bangladesh, many farmers continue to face challenges in accessing timely, expert-level agricultural guidance. This paper presents KrishokBondhu, a voice-enabled, call-centre-integrated advisory platform built on a Retrieval-Augmented Generation (RAG) framework, designed specifically for Bengali-speaking farmers. The system aggregates authoritative agricultural handbooks, extension manuals, and NGO publications; applies Optical Character Recognition (OCR) and document-parsing pipelines to digitize and structure the content; and indexes this corpus in a vector database for efficient semantic retrieval. Through a simple phone-based interface, farmers can call the system to receive real-time, context-aware advice: speech-to-text converts the Bengali query, the RAG module retrieves relevant content, a large language model (Gemma 3-4B) generates a context-grounded response, and text-to-speech delivers the answer in natural spoken Bengali. In a pilot evaluation, KrishokBondhu produced high-quality responses for 72.7% of diverse agricultural queries covering crop management, disease control, and cultivation practices. Compared to the KisanQRS benchmark, the system achieved a composite score of 4.53 (vs. 3.13) on a 5-point scale, a 44.7% improvement, with especially large gains in contextual richness (+367%) and completeness (+100.4%), while maintaining comparable relevance and technical specificity. Semantic similarity analysis further revealed a strong correlation between retrieved context and answer quality, emphasizing the importance of grounding generative responses in curated documentation. KrishokBondhu demonstrates the feasibility of integrating call-centre accessibility, multilingual voice interaction, and modern RAG techniques to deliver expert-level agricultural guidance to remote Bangladeshi farmers, paving the way toward a fully AI-driven agricultural advisory ecosystem.
Authors' comments: 6 pages, 7 figures, 5 tables, submitted to the 11th IEEE International Women in Engineering (WIE) Conference on Electrical and Computer Engineering (WIECON-ECE 2025)
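
The query path described in the KrishokBondhu abstract (speech-to-text, retrieval from a vector index, grounded generation with an LLM, then text-to-speech) can be illustrated with a minimal Python sketch of the retrieval and prompt-assembly steps only. This is not the authors' implementation: the bag-of-words `embed` function is a hypothetical stand-in for a real embedding model and vector database, the corpus snippets are invented placeholders, and speech-to-text, Gemma 3-4B generation, and text-to-speech are omitted.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a neural embedding model: a bag-of-words vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (exhaustive search over the index)."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, contexts: list[str]) -> str:
    """Assemble a grounded prompt asking the LLM to answer only from retrieved text."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer the farmer's question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}\nAnswer:"
    )


# Invented placeholder chunks; a real system would index OCR-parsed handbooks.
corpus = [
    "Rice blast disease can be controlled by removing infected stubble and avoiding excess nitrogen.",
    "Jute seeds are usually sown in spring after the soil has been ploughed.",
    "Potato late blight spreads quickly in cool, humid weather.",
]
index = [(chunk, embed(chunk)) for chunk in corpus]

query = "How do I control blast disease in rice?"
top = retrieve(query, index, k=1)
print(top[0])
print(build_prompt(query, top))
```

The retrieved chunk is the rice-blast snippet, since it shares the most terms with the query. In a deployed pipeline the prompt would be passed to the LLM and its answer to text-to-speech; the exhaustive scan here would be replaced by the vector database's approximate nearest-neighbour search.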

8449. Proactive Reasoning-with-Retrieval Framework for Medical Multimodal Large Language Models

Lehan Wang, Yi Qin, Honglong Yang, Xiaomeng Li

arXiv:2510.18303v1

8450. From Retrieval to Generation: Unifying External and Parametric Knowledge for Medical Question Answering

Lei Li, Xiao Zhou, Yingying Zhang, Xian Wu

arXiv:2510.18297v1

8451. Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations

Tong Chen, Akari Asai, Luke Zettlemoyer, Hannaneh Hajishirzi, Faeze Brahman

arXiv:2510.17733v1

8452. ParaVul: A Parallel Large Language Model and Retrieval-Augmented Framework for Smart Contract Vulnerability Detection

Tenghui Huang, Jinbo Wen, Jiawen Kang, Siyong Chen, Zhengtao Li, Tao Zhang, Dongning Liu, Jiacheng Wang et al.

arXiv:2510.17919v1

8453. Right Answer at the Right Time - Temporal Retrieval-Augmented Generation via Graph Summarization

Zulun Zhu, Haoyu Liu, Mengke He, Siqiang Luo

arXiv:2510.16715v1

8454. RefAtomNet++: Advancing Referring Atomic Video Action Recognition using Semantic Retrieval based Multi-Trajectory Mamba

Kunyu Peng, Di Wen, Jia Fu, Jiamin Wu, Kailun Yang, Junwei Zheng, Ruiping Liu, Yufan Chen et al.

arXiv:2510.16444v1

Referring Atomic Video Action Recognition (RAVAR) aims to recognize fine-grained, atomic-level actions of a specific person of interest conditioned on natural language descriptions. Distinct from conventional action recognition and detection tasks, RAVAR emphasizes precise language-guided action understanding, which is particularly critical for interactive human action analysis in complex multi-person scenarios. In this work, we extend our previously introduced RefAVA dataset to RefAVA++, which comprises more than 2.9 million frames and more than 75.1k annotated persons in total. We benchmark this dataset using baselines from multiple related domains, including atomic action localization, video question answering, and text-video retrieval, as well as our earlier model, RefAtomNet. Although RefAtomNet surpasses other baselines by incorporating agent attention to highlight salient features, its ability to align and retrieve cross-modal information remains limited, leading to suboptimal performance in localizing the target person and predicting fine-grained actions. To overcome these limitations, we introduce RefAtomNet++, a novel framework that advances cross-modal token aggregation through a multi-hierarchical semantic-aligned cross-attention mechanism combined with multi-trajectory Mamba modeling at the partial-keyword, scene-attribute, and holistic-sentence levels. In particular, scanning trajectories are constructed by dynamically selecting the nearest visual spatial tokens at each timestep for both the partial-keyword and scene-attribute levels. Moreover, we design a multi-hierarchical semantic-aligned cross-attention strategy, enabling more effective aggregation of spatial and temporal tokens across different semantic hierarchies. Experiments show that RefAtomNet++ establishes new state-of-the-art results. The dataset and code are released at https://github.com/KPeng9510/refAVA2.
Authors' comments: Extended version of ECCV 2024 paper arXiv:2407.01872. The dataset and code are released at https://github.com/KPeng9510/refAVA2
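
The RefAtomNet++ abstract states that scanning trajectories are built by dynamically selecting, at each timestep, the nearest visual spatial token for the partial-keyword and scene-attribute levels. The minimal pure-Python sketch below illustrates only that selection step, under stated assumptions: "nearest" is taken to mean highest cosine similarity to a single text embedding (for example, one keyword), the tokens are tiny invented 2-D vectors, and the Mamba model that would consume the resulting sequence is not shown. The function names are hypothetical and are not taken from the released code.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_trajectory(text_emb: Vector, frames: Sequence[Sequence[Vector]]) -> list[int]:
    """For each timestep, pick the index of the spatial token closest to the text embedding.

    frames[t][i] is spatial token i at timestep t (assumed layout).
    Returns one token index per timestep; ties go to the lowest index.
    """
    trajectory = []
    for tokens in frames:
        scores = [cosine(text_emb, tok) for tok in tokens]
        trajectory.append(max(range(len(tokens)), key=scores.__getitem__))
    return trajectory


def gather(frames: Sequence[Sequence[Vector]], trajectory: list[int]) -> list[Vector]:
    """Collect the selected token from each timestep into one time-ordered scan sequence."""
    return [frames[t][i] for t, i in enumerate(trajectory)]


# Toy example: 3 timesteps, 3 spatial tokens each, 2-D embeddings (all invented).
keyword = [1.0, 0.0]
frames = [
    [[0.1, 0.9], [0.9, 0.1], [0.5, 0.5]],
    [[0.8, 0.2], [0.0, 1.0], [0.3, 0.7]],
    [[0.2, 0.8], [0.4, 0.6], [1.0, 0.1]],
]
traj = build_trajectory(keyword, frames)
print(traj)  # [1, 0, 2]
print(gather(frames, traj))
```

Selecting greedily per timestep keeps the scan sequence ordered in time while letting the chosen spatial location move from frame to frame. The paper's actual distance measure, token layout, and the way trajectories differ between the partial-keyword and scene-attribute levels are not specified here.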

8455. Memory-SAM: Human-Prompt-Free Tongue Segmentation via Retrieval-to-Prompt

Joongwon Chae, Lihui Luo, Xi Yuan, Dongmei Yu, Zhenglin Chen, Lian Zhang, Peiwu Qin

arXiv:2510.15849v1

8456. Demo: Guide-RAG: Evidence-Driven Corpus Curation for Retrieval-Augmented Generation in Long COVID

Philip DiGiacomo, Haoyang Wang, Jinrui Fang, Yan Leng, W Michael Brode, Ying Ding

arXiv:2510.15782v1

8457. Think Parallax: Solving Multi-Hop Problems via Multi-View Knowledge-Graph-Based Retrieval-Augmented Generation

Jinliang Liu

arXiv:2510.15552v1

8458. MSAM: Multi-Semantic Adaptive Mining for Cross-Modal Drone Video-Text Retrieval

Jinghao Huang, Yaxiong Chen, Ganchao Liu

arXiv:2510.15470v1

8459. GRank: Towards Target-Aware and Streamlined Industrial Retrieval with a Generate-Rank Framework

Yijia Sun, Shanshan Huang, Zhiyuan Guan, Qiang Luo, Ruiming Tang, Kun Gai, Guorui Zhou

arXiv:2510.15299v1

8460. Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding

Sensen Gao, Shanshan Zhao, Xu Jiang, Lunhao Duan, Yong Xien Chng, Qing-Guo Chen, Weihua Luo, Kaifu Zhang et al.

arXiv:2510.15253v1
