benty-fields - Search paper

Recently, large-scale pre-training methods like CLIP have made great progress in multi-modal research such as text-video retrieval. In CLIP, transformers are vital for modeling complex multi-modal relations. However, in the vision transformer of CLIP, the essential visual tokenization process, which produces discrete visual token sequences, generates many homogeneous tokens because consecutive, similar video frames are redundant. This significantly increases computation costs and hinders the deployment of video retrieval models in web applications. In this paper, to reduce the number of redundant video tokens, we design a multi-segment token clustering algorithm to find the most representative tokens and drop the non-essential ones. As frame redundancy occurs mostly in consecutive frames, we divide videos into multiple segments and conduct segment-level clustering. Center tokens from each segment are then concatenated into a new sequence, while their original spatial-temporal relations are well maintained. We instantiate two clustering algorithms to efficiently find deterministic medoids and iteratively partition groups in high-dimensional space. Through this token clustering and center selection procedure, we successfully reduce computation costs by removing redundant visual tokens. This method further enhances segment-level semantic alignment between video and text representations, enforcing the spatio-temporal interactions of tokens from within-segment frames. Our method, coined CenterCLIP, surpasses the existing state of the art by a large margin on typical text-video benchmarks, while reducing the training memory cost by 35% and accelerating the inference speed by 14% in the best case. The code is available at https://github.com/mzhaoshuai/CenterCLIP.
Authors' comments: accepted by SIGIR 2022, code is at https://github.com/mzhaoshuai/CenterCLIP
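
The segment-level clustering described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration and not the CenterCLIP code: the function names `k_medoids` and `cluster_video_tokens`, the evenly spaced initialisation, the squared Euclidean distance, and the plain alternating k-medoids update are all assumptions made for the example. It assumes `k` is no larger than the number of tokens in a segment.

```python
import numpy as np


def k_medoids(x, k, n_iter=10):
    """Plain alternating k-medoids on the rows of x; returns sorted medoid indices."""
    n = x.shape[0]
    # Pairwise squared Euclidean distances, shape (n, n).
    d = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    # Deterministic initialisation (an assumption of this sketch): evenly spaced rows.
    medoids = np.linspace(0, n - 1, k).astype(int)
    for _ in range(n_iter):
        # Assign every token to its nearest medoid.
        labels = d[:, medoids].argmin(axis=1)
        new = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                # New medoid: the member with the smallest total distance to its cluster.
                within = d[np.ix_(members, members)].sum(axis=1)
                new[j] = members[within.argmin()]
        if np.array_equal(new, medoids):
            break
        medoids = new
    # Sorting keeps the selected tokens in their original order.
    return np.sort(medoids)


def cluster_video_tokens(tokens, n_segments, k):
    """tokens: (frames, tokens_per_frame, dim) -> (n_segments * k, dim)."""
    t, _, dim = tokens.shape
    centers = []
    # Split the frames into contiguous temporal segments.
    for seg in np.array_split(np.arange(t), n_segments):
        flat = tokens[seg].reshape(-1, dim)   # all tokens of this segment
        centers.append(flat[k_medoids(flat, k)])
    # Concatenate the per-segment center tokens into one shorter sequence.
    return np.concatenate(centers, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.standard_normal((12, 49, 64))   # 12 frames, 49 tokens each
    reduced = cluster_video_tokens(video, n_segments=4, k=49)
    print(video.shape[0] * video.shape[1], "->", reduced.shape[0], "tokens")
```

With 12 frames of 49 tokens each, four segments, and 49 centers per segment, the sequence passed on to later transformer layers shrinks from 588 to 196 tokens in this toy setting. The figures in the abstract (35% memory, 14% speed) come from the paper's own experiments and are not reproduced by this sketch.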

6456. Relevance-based Margin for Contrastively-trained Video Retrieval Models

Alex Falcon, Swathikiran Sudhakaran, Giuseppe Serra, Sergio Escalera, Oswald Lanz

Proceedings of the 2022 International Conference on Multimedia Retrieval (2022)

arXiv:2204.13001v1

6457. Retrieving black hole information from the main Lorentzian saddle point

Cristiano Germani

arXiv:2204.13046v2

6458. A Thorough Examination on Zero-shot Dense Retrieval

Ruiyang Ren, Yingqi Qu, Jing Liu, Wayne Xin Zhao, Qifei Wu, Yuchen Ding, Hua Wu, Haifeng Wang et al.

arXiv:2204.12755v2

6459. Cross-Camera Trajectories Help Person Retrieval in a Camera Network

Xin Zhang, Xiaohua Xie, Jianhuang Lai, Wei-Shi Zheng

IEEE Transactions on Image Processing, 1-1 (2023)

arXiv:2204.12900v3

6460. Evaluating Interpolation and Extrapolation Performance of Neural Retrieval Models

Jingtao Zhan, Xiaohui Xie, Jiaxin Mao, Yiqun Liu, Jiafeng Guo, Min Zhang, Shaoping Ma

arXiv:2204.11447v2

Benty-search

6441. Two-Step Question Retrieval for Open-Domain QA

arXiv:2205.09393v1

6442. Health Information Retrieval -- State of the art report

arXiv:2205.09083v1

6443. VRAG: Region Attention Graphs for Content-Based Video Retrieval

arXiv:2205.09068v1

6444. Entailment Tree Explanations via Iterative Retrieval-Generation Reasoner

arXiv:2205.09224v2

6445. Debiasing Neural Retrieval via In-batch Balancing Regularization

arXiv:2205.09240v1

6446. Modeling Exemplification in Long-form Question Answering via Retrieval

arXiv:2205.09278v1

6447. A CLIP-Hitchhiker's Guide to Long Video Retrieval

arXiv:2205.08508v1

6448. Digital Blind Box: Random Symmetric Private Information Retrieval

arXiv:2205.07828v1

6449. Beyond Griffin-Lim: Improved Iterative Phase Retrieval for Speech

arXiv:2205.05496v1

6450. Parallel Private Retrieval of Merkle Proofs via Tree Colorings

arXiv:2205.05211v4

6451. Cross-lingual Adaptation for Recipe Retrieval with Mixup

arXiv:2205.03891v1

6452. Better Retrieval May Not Lead to Better Question Answering

arXiv:2205.03685v1

6453. Dimension Reduction for Efficient Dense Retrieval via Conditional Autoencoder

arXiv:2205.03284v2

6454. Relation Extraction as Open-book Examination: Retrieval-enhanced Prompt Tuning

arXiv:2205.02355v2

6455. CenterCLIP: Token Clustering for Efficient Text-Video Retrieval

arXiv:2205.00823v1

6456. Relevance-based Margin for Contrastively-trained Video Retrieval Models

arXiv:2204.13001v1

6457. Retrieving black hole information from the main Lorentzian saddle point

arXiv:2204.13046v2

6458. A Thorough Examination on Zero-shot Dense Retrieval

arXiv:2204.12755v2

6459. Cross-Camera Trajectories Help Person Retrieval in a Camera Network

arXiv:2204.12900v3

6460. Evaluating Interpolation and Extrapolation Performance of Neural Retrieval Models

arXiv:2204.11447v2
