benty-fields - Search paper

Current dense retrievers (DRs) are limited in their ability to effectively process misspelled queries, which constitute a significant portion of query traffic in commercial search engines. The main issue is that the pre-trained language model-based encoders used by DRs are typically trained and fine-tuned using clean, well-curated text data. Misspelled queries are typically not found in the data used for training these models, and thus misspelled queries observed at inference time are out-of-distribution compared to the data used for training and fine-tuning. Previous efforts to address this issue have focused on fine-tuning strategies, but their effectiveness on misspelled queries remains lower than that of pipelines that employ separate state-of-the-art spell-checking components. To address this challenge, we propose ToRoDer (TypOs-aware bottlenecked pre-training for RObust DEnse Retrieval), a novel pre-training strategy for DRs that increases their robustness to misspelled queries while preserving their effectiveness in downstream retrieval tasks. ToRoDer utilizes an encoder-decoder architecture where the encoder takes misspelled text with masked tokens as input and outputs bottlenecked information to the decoder. The decoder then takes as input the bottlenecked embeddings, along with token embeddings of the original text with the misspelled tokens masked out. The pre-training task is to recover the masked tokens for both the encoder and decoder. Our extensive experimental results and detailed ablation studies show that DRs pre-trained with ToRoDer exhibit significantly higher effectiveness on misspelled queries, considerably narrowing the gap with pipelines that use a separate, complex spell-checker component, while retaining their effectiveness on correctly spelled queries.
Authors' comments: 10 pages, accepted at SIGIR-AP
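
The abstract describes how the encoder and decoder inputs differ during pre-training. The following is a minimal sketch of that input construction only, not the authors' implementation. Everything concrete in it is an assumption made for illustration: whitespace tokens instead of subword tokens, a single typo type (adjacent character swap), the `typo_rate` and `mask_rate` values, and the names `add_typo` and `build_inputs`.

```python
import random

MASK = "[MASK]"


def add_typo(word, rng):
    """Swap two adjacent characters (one hypothetical typo type).

    If the two characters are identical, the word comes back unchanged.
    """
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def build_inputs(tokens, typo_rate=0.3, mask_rate=0.3, seed=0):
    """Build encoder/decoder inputs and recovery targets for one text.

    The rates are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    n = len(tokens)
    # Positions that receive a typo, and separate positions that are masked.
    typo_pos = {i for i in range(n) if rng.random() < typo_rate}
    mask_pos = {i for i in range(n)
                if i not in typo_pos and rng.random() < mask_rate}

    # Encoder sees the misspelled text with some tokens masked.
    encoder_input = [
        MASK if i in mask_pos else (add_typo(t, rng) if i in typo_pos else t)
        for i, t in enumerate(tokens)
    ]
    # Decoder sees the original text with the misspelled tokens masked out.
    decoder_input = [MASK if i in typo_pos else t for i, t in enumerate(tokens)]

    return {
        "encoder_input": encoder_input,
        "encoder_targets": {i: tokens[i] for i in mask_pos},
        "decoder_input": decoder_input,
        "decoder_targets": {i: tokens[i] for i in typo_pos},
    }


tokens = "dense retrievers struggle with misspelled queries".split()
example = build_inputs(tokens, seed=1)
print(example["encoder_input"])
print(example["decoder_input"])
```

In this sketch both models recover original tokens at their own masked positions; the bottlenecked embedding that the encoder passes to the decoder is left out, since the abstract does not specify its form.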

6186. Language Guided Local Infiltration for Interactive Image Retrieval

Fuxiang Huang, Lei Zhang

arXiv:2304.07747v1

6187. Enriching Simple Keyword Queries for Domain-Aware Narrative Retrieval

Hermann Kroll, Christin Katharina Kreutz, Pascal Sackhoff, Wolf-Tilo Balke

arXiv:2304.07604v2

6188. Deep Metric Multi-View Hashing for Multimedia Retrieval

Jian Zhu, Zhangmin Huang, Xiaohu Ruan, Yu Cui, Yongli Cheng, Lingfang Zeng

arXiv:2304.06358v1

6189. Unicom: Universal and Compact Representation Learning for Image Retrieval

Xiang An, Jiankang Deng, Kaicheng Yang, Jaiwei Li, Ziyong Feng, Jia Guo, Jing Yang, Tongliang Liu

arXiv:2304.05884v1

6190. Conjugate phase retrieval in a complex shift-invariant space

Yang Chen, Yanan Wang

arXiv:2304.06206v1

6191. TextANIMAR: Text-based 3D Animal Fine-Grained Retrieval

Trung-Nghia Le, Tam V. Nguyen, Minh-Quan Le, Trong-Thuan Nguyen, Viet-Tham Huynh, Trong-Le Do, Khanh-Duy Le, Mai-Khiem Tran et al.

arXiv:2304.06053v1

6192. SketchANIMAR: Sketch-based 3D Animal Fine-Grained Retrieval

Trung-Nghia Le, Tam V. Nguyen, Minh-Quan Le, Trong-Thuan Nguyen, Viet-Tham Huynh, Trong-Le Do, Khanh-Duy Le, Mai-Khiem Tran et al.

arXiv:2304.05731v1

6193. A Decision Tree to Shepherd Scientists through Data Retrievability

Andrea Bianchi, Giordano d'Aloisio, Francesca Marzi, Antinisca Di Marco

Second Workshop on Reproducibility and Replication of Research Results (RRRR 2023)

arXiv:2304.05767v1

6194. LADER: Log-Augmented DEnse Retrieval for Biomedical Literature Search

Qiao Jin, Andrew Shin, Zhiyong Lu

arXiv:2304.04590v1

Queries with similar information needs tend to have similar document clicks, especially in biomedical literature search engines where queries are generally short and top documents account for most of the total clicks. Motivated by this, we present a novel architecture for biomedical literature search, namely Log-Augmented DEnse Retrieval (LADER), which is a simple plug-in module that augments a dense retriever with the click logs retrieved from similar training queries. Specifically, LADER uses a dense retriever to find both documents and training queries that are similar to the given query. Then, LADER scores the relevant (clicked) documents of the similar queries, weighted by their similarity to the input query. The final document scores by LADER are the average of (1) the document similarity scores from the dense retriever and (2) the aggregated document scores from the click logs of similar queries. Despite its simplicity, LADER achieves new state-of-the-art (SOTA) performance on TripClick, a recently released benchmark for biomedical literature retrieval. On the frequent (HEAD) queries, LADER outperforms the best previous retrieval model by 39% relative NDCG@10 (0.338 vs. 0.243). LADER also achieves better performance on the less frequent (TORSO) queries, with an 11% relative NDCG@10 improvement over the previous SOTA (0.303 vs. 0.272). On the rare (TAIL) queries, where similar queries are scarce, LADER still compares favorably to the previous SOTA method (NDCG@10: 0.310 vs. 0.295). On all queries, LADER can improve the performance of a dense retriever by 24%-37% relative NDCG@10 while not requiring additional training, and further performance improvement is expected from more logs. Our regression analysis has shown that queries that are more frequent, have higher entropy of query similarity, and have lower entropy of document similarity tend to benefit more from log augmentation.
Authors' comments: SIGIR 2023
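
The scoring rule in the abstract (average of the dense-retriever score and a similarity-weighted click-log score) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the abstract does not say how the log scores are aggregated or normalized, so the sketch sums query similarities per clicked document and treats a missing score as 0. The function name `lader_scores` and the toy data are hypothetical.

```python
from collections import defaultdict


def lader_scores(doc_scores, similar_queries, click_log):
    """Combine dense-retriever scores with click-log scores.

    doc_scores:      {doc_id: similarity to the input query}
    similar_queries: {train_query_id: similarity to the input query}
    click_log:       {train_query_id: set of clicked doc_ids}
    """
    # (2) Score clicked documents of similar queries, weighted by the
    # similarity of each training query to the input query.
    # Assumption: weights are summed per document, without normalization.
    log_scores = defaultdict(float)
    for query_id, query_sim in similar_queries.items():
        for doc_id in click_log.get(query_id, ()):
            log_scores[doc_id] += query_sim

    # Final score: average of (1) the dense score and (2) the log score.
    # Assumption: a document missing from one source scores 0 there.
    candidates = set(doc_scores) | set(log_scores)
    return {
        doc_id: 0.5 * (doc_scores.get(doc_id, 0.0) + log_scores.get(doc_id, 0.0))
        for doc_id in candidates
    }


# Toy data, invented for illustration only.
doc_scores = {"d1": 0.8, "d2": 0.6}
similar_queries = {"q1": 0.9, "q2": 0.5}
click_log = {"q1": {"d2"}, "q2": {"d2", "d3"}}

scores = lader_scores(doc_scores, similar_queries, click_log)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In the toy data, d2 is clicked for both similar queries, so the log score lifts it above d1 even though the dense retriever alone prefers d1. No model is trained at any point, which matches the abstract's claim that the module needs no additional training.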

6195. Unsupervised Multi-Criteria Adversarial Detection in Deep Image Retrieval

Yanru Xiao, Cong Wang, Xing Gao

arXiv:2304.04228v1

6196. Memory Storage and Retrieval in Sparsely Connected Balanced Networks

Enrico Ventura

arXiv:2305.07656v1

6197. From Retrieval to Generation: Efficient and Effective Entity Set Expansion

Shulin Huang, Shirong Ma, Yangning Li, Yinghui Li, Hai-Tao Zheng

arXiv:2304.03531v3

6198. Noise-Robust Dense Retrieval via Contrastive Alignment Post Training

Daniel Campos, ChengXiang Zhai, Alessandro Magnani

arXiv:2304.03401v2

6199. Exposing and Mitigating Spurious Correlations for Cross-Modal Retrieval

Jae Myung Kim, A. Sophia Koepke, Cordelia Schmid, Zeynep Akata

arXiv:2304.03391v1

6200. An Intrinsic Framework of Information Retrieval Evaluation Measures

Fernando Giner

LNNS 822 (2024) 692-713

arXiv:2304.00615v1

6181. Is Cross-modal Information Retrieval Possible without Training?

arXiv:2304.11095v1

6182. Phase-Retrieval with Incomplete Autocorrelations Using Deep Convolutional Autoencoders

arXiv:2304.09303v2

6183. Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets

arXiv:2304.08742v2

6184. Statute-enhanced lexical retrieval of court cases for COLIEE 2022

arXiv:2304.08188v1

6185. Typos-aware Bottlenecked Pre-Training for Robust Dense Retrieval

arXiv:2304.08138v2
