benty-fields - Search paper

Video moment retrieval aims to retrieve the moment in a video that corresponds to a given language query. The task poses two challenges: 1) localizing the relevant moment within an untrimmed video, and 2) bridging the semantic gap between the textual query and the video content. To tackle these problems, early approaches use sliding windows or uniform sampling to collect candidate video clips and then match each clip against the query. These strategies are time-consuming and often yield unsatisfactory localization accuracy, because the length of the ground-truth moment is unpredictable. To avoid these limitations, recent work directly predicts the boundaries of the relevant moment without first generating candidate clips. One mainstream approach fuses the query and video frames into a multimodal feature vector (e.g., by concatenation) and then applies regression to that vector to detect the boundaries. Although this approach has made some progress, we argue that such methods do not adequately capture the cross-modal interactions between the query and the video frames. In this paper, we propose an Attentive Cross-modal Relevance Matching (ACRM) model that predicts the temporal boundaries by modeling these interactions. In addition, an attention module assigns higher weights to query words with richer semantic cues, which are considered more important for finding relevant video content. Another contribution is an additional predictor that uses the internal frames during training to improve localization accuracy. Extensive experiments on two datasets, TACoS and Charades-STA, demonstrate the superiority of our method over several state-of-the-art methods. Ablation studies also examine the effectiveness of the different modules in the ACRM model.
Authors' comments: 12 pages; accepted by IEEE TMM
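The abstract combines two ideas: attention weights over query words, and a per-frame relevance score between the query and the video. The toy sketch that follows illustrates only those two ideas in plain Python, under loudly stated assumptions: dot-product scoring, a hypothetical learned scoring vector `w`, a sigmoid relevance score, and a simple thresholding rule (`predict_span`, also hypothetical) for turning frame scores into boundaries. It is not the ACRM architecture; the paper's layers, boundary predictor, and internal-frame predictor are not reproduced here.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_query(words: List[Vector], w: Vector) -> Tuple[Vector, List[float]]:
    """Pool word embeddings into one query vector using softmax attention.

    Assumption: `w` is a hypothetical learned scoring vector; words whose
    embeddings align with it get larger weights (a stand-in for "richer
    semantic cues").
    """
    weights = softmax([dot(word, w) for word in words])
    dim = len(words[0])
    pooled = [sum(a * word[d] for a, word in zip(weights, words)) for d in range(dim)]
    return pooled, weights


def frame_relevance(frames: List[Vector], query: Vector) -> List[float]:
    """Score every frame against the pooled query; the sigmoid maps scores to (0, 1)."""
    return [1.0 / (1.0 + math.exp(-dot(f, query))) for f in frames]


def predict_span(scores: List[float], threshold: float = 0.5) -> Tuple[int, int]:
    """Hypothetical decoder: inclusive (start, end) of the longest run above `threshold`.

    Falls back to the single highest-scoring frame when no frame clears the threshold.
    """
    top = max(range(len(scores)), key=lambda i: scores[i])
    best, best_len = (top, top), 0
    start = None
    for i, s in enumerate(scores + [float("-inf")]):  # sentinel closes a trailing run
        if s > threshold and start is None:
            start = i
        elif s <= threshold and start is not None:
            if i - start > best_len:
                best, best_len = (start, i - 1), i - start
            start = None
    return best


if __name__ == "__main__":
    # Toy 2-d embeddings: three query words and four video frames.
    words = [[1.0, 0.0], [0.0, 1.0], [0.2, 0.2]]
    w = [2.0, 0.0]
    frames = [[-1.0, -1.0], [2.0, 1.0], [3.0, 0.5], [-2.0, 0.0]]

    query, weights = attentive_query(words, w)
    scores = frame_relevance(frames, query)
    print(predict_span(scores))  # → (1, 2)
```

Thresholding per-frame relevance is just one simple way to turn frame scores into a moment; it is chosen here for readability, and the paper's own boundary prediction may work quite differently.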

5. Towards Accurate Pixel-wise Object Tracking by Attention Retrieval

Zhipeng Zhang, Bing Li, Weiming Hu, Houwen Peng

arXiv:2008.02745v3

6. Optimizing Retrieval Components for a Shared Backbone via Component-Wise Multi-Stage Training

Yunhan Li, Mingjie Xie, Zihan Gong, Zeyang Shi, Gengshen Wu, Min Yang

arXiv:2602.00805v1

7. STEPER: Step-wise Knowledge Distillation for Enhancing Reasoning Ability in Multi-Step Retrieval-Augmented Language Models

Kyumin Lee, Minjin Jeon, Sanghwan Jang, Hwanjo Yu

arXiv:2510.07923v1

8. Learning to Route Queries Across Knowledge Bases for Step-wise Retrieval-Augmented Reasoning

Chunyi Peng, Zhipeng Xu, Zhenghao Liu, Yishan Li, Yukun Yan, Shuo Wang, Zhiyuan Liu, Yu Gu et al.

arXiv:2505.22095v1

9. RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance

Avideep Mukherjee, Soumya Banerjee, Piyush Rai, Vinay P. Namboodiri

arXiv:2408.17095v2

10. LRP4RAG: Detecting Hallucinations in Retrieval-Augmented Generation via Layer-wise Relevance Propagation

Haichuan Hu, Yuhan Sun, Qunjun Zhang

arXiv:2408.15533v1

11. PromptDSI: Prompt-based Rehearsal-free Instance-wise Incremental Learning for Document Retrieval

Tuan-Luc Huynh, Thuy-Trang Vu, Weiqing Wang, Yinwei Wei, Trung Le, Dragan Gasevic, Yuan-Fang Li, Thanh-Toan Do

arXiv:2406.12593v1

12. TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision

Ruiwen Zhou, Yingxuan Yang, Muning Wen, Ying Wen, Wenhao Wang, Chunling Xi, Guoqiang Xu, Yong Yu et al.

arXiv:2403.06221v1

13. Information retrieval for label noise document ranking by bag sampling and group-wise loss

Chunyu Li, Jiajia Ding, Xing Hu, Fan Wang

arXiv:2203.06408v1

14. Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely

Siyun Zhao, Yuqing Yang, Zilong Wang, Zhiyuan He, Luna K. Qiu, Lili Qiu

arXiv:2409.14924v1

15. Probing the Heights and Depths of Y Dwarf Atmospheres: A Retrieval Analysis of the JWST Spectral Energy Distribution of WISE J035934.06−540154.6

Harshil Kothari, Michael C. Cushing, Ben Burningham, Samuel A. Beiler, J. Davy Kirkpatrick, Adam C. Schneider, Sagnick Mukherjee, Mark S. Marley

arXiv:2406.06493v1

16. Finite Field Multiple Access II: from Symbol-wise to Codeword-wise

Qi-yue Yu, Shi-wen Lin, Ting-wei Yang

arXiv:2503.09991v2

17. Layer-wise and Dimension-wise Locally Adaptive Federated Learning

Belhal Karimi, Ping Li, Xiaoyun Li

arXiv:2110.00532v3

18. A deep WISE search for very late type objects and the discovery of two halo/thick-disk T dwarfs: WISE 0013+0634 and WISE 0833+0052

D. J. Pinfield, J. Gomes, A. C. Day-Jones, S. K. Leggett, M. Gromadzki, B. Burningham, M. T. Ruiz, R. Kurtev et al.

arXiv:1308.0495v1

19. Diagnosing FP4 inference: a layer-wise and block-wise sensitivity analysis of NVFP4 and MXFP4

Musa Cim, Burak Topcu, Mahmut Taylan Kandemir

arXiv:2603.08747v1

20. Enhanced Multimodal Hate Video Detection via Channel-wise and Modality-wise Fusion

Yinghui Zhang, Tailin Chen, Yuchen Zhang, Zeyu Fu

arXiv:2505.12051v1

Benty-search

1. Channel-wise Retrieval for Multivariate Time Series Forecasting

arXiv:2604.05543v1

2. Token-wise Influential Training Data Retrieval for Large Language Models

arXiv:2405.11724v2

3. BagFormer: Better Cross-Modal Retrieval via bag-wise interaction

arXiv:2212.14322v1

4. Frame-wise Cross-modal Matching for Video Moment Retrieval

arXiv:2009.10434v2
